Data science expert Mark Tenenholtz admits that 90% of the models he created were ineffective

The best models for audio datasets are ResNet and EffNet. Tenenholtz justified the usage of image models for audio datasets.

Kaggle Master and senior data scientist Mark Tenenholtz tweeted that, despite spending thousands of hours on ML models, 90 per cent of the models he used were ineffective. In the same thread, Tenenholtz listed the best baseline models for different types of datasets. He underlined the importance of a good baseline, describing it as a valuable asset for solving issues with ML models.

https://twitter.com/marktenenholtz/status/1501905740813848582?s=21

For tabular data, Tenenholtz said that XGBoost, LightGBM and random forest (RF) models are among the most commonly used. Ensemble tree-based models can outperform neural networks on such data, and XGBoost is the most popular choice among Kagglers.

For time series data, he stated that models like XGBoost, LightGBM and RF were the best, despite not being built for time series. Tenenholtz explained that even if the dataset is a time series, one can set a prediction horizon that matches the lag between the input and the output, which gives the user better control over the system. The dataset can then be treated as tabular data.
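As a rough illustration of this reframing, the Python sketch below turns a series into rows of lagged values, each paired with a target a fixed number of steps ahead. The function name make_lagged_table, the parameters n_lags and horizon, and the sample numbers are hypothetical choices for this example and are not taken from Tenenholtz's thread.

```python
from typing import List, Sequence, Tuple


def make_lagged_table(
    series: Sequence[float], n_lags: int, horizon: int
) -> Tuple[List[List[float]], List[float]]:
    """Turn a 1-D series into (features, targets) rows for a tabular model.

    Each feature row holds the n_lags most recent values up to time t,
    and its target is the value observed `horizon` steps after t.
    """
    if n_lags < 1 or horizon < 1:
        raise ValueError("n_lags and horizon must both be at least 1")
    rows: List[List[float]] = []
    targets: List[float] = []
    # t is the index of the last observed value in each feature row.
    for t in range(n_lags - 1, len(series) - horizon):
        rows.append(list(series[t - n_lags + 1 : t + 1]))
        targets.append(series[t + horizon])
    return rows, targets


# Hypothetical sample data, used only to show the shape of the output.
series = [10, 11, 13, 16, 20, 25, 31]
X, y = make_lagged_table(series, n_lags=3, horizon=2)
for features, target in zip(X, y):
    print(features, "->", target)
```

The resulting rows and targets could then be passed to any tabular learner, such as XGBoost, LightGBM or a random forest.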

For image datasets, ResNet and EfficientNet-B0 (EffNet-B0) are small, quick models that are effective for nearly any type of image data. A big advantage of these models is that they can be scaled up for greater accuracy.

For text datasets, Tenenholtz named DistilRoBERTa as the best model. It offers a combination of speed and accuracy, and its accuracy increases when it is scaled up.

The best models for audio datasets are ResNet and EffNet. Tenenholtz justified the use of image models for audio datasets: he said that he starts audio problems by converting the audio to a spectrogram and pairing it with an image model.
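The spectrogram step can be sketched in Python as follows. This is a minimal illustration that assumes NumPy and a plain short-time Fourier transform. The function name log_spectrogram, the frame length, the hop size and the synthetic 1 kHz tone are assumptions made for the example, and the thread does not say which tooling Tenenholtz uses.

```python
import numpy as np


def log_spectrogram(signal, frame_len=256, hop=128):
    """Return a (freq_bins, frames) log-magnitude spectrogram."""
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (signal.size - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # One FFT per frame; keep only the non-negative frequencies.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # log1p compresses the dynamic range, as is common for spectrogram inputs.
    return np.log1p(magnitude).T


# Hypothetical input: one second of a 1 kHz sine tone sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)
spec = log_spectrogram(tone)
print(spec.shape)
```

The resulting 2-D array can be treated like a single-channel image and passed to an image model such as ResNet.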

Poulomi Chatterjee
Poulomi is a Technology Journalist with Analytics India Magazine. Her fascination with tech and eagerness to dive into new areas led her to the dynamic world of AI and data analytics.
