Deep Learning, XGBoost, Or Both: What Works Best For Tabular Data?

When asked about his approach to data science problems, Sergey Yurgenson, the director of data science at DataRobot, said he would begin by creating a benchmark model using Random Forests or XGBoost with minimal feature engineering. A Harvard-trained neurobiologist, Sergey and his peers on Kaggle have used XGBoost (extreme gradient boosting), a gradient boosting framework available as an open-source library, in their winning solutions. The supremacy of XGBoost is not restricted to popular competition platforms; it has become the go-to solution for working with tabular data. For classification and regression problems on tabular data, tree ensemble models such as XGBoost are usually the recommended choice.

Today, XGBoost has grown into production-quality software that can process huge swathes of data on a cluster. In the last few years, XGBoost has added several major features, such as support for NVIDIA GPUs as hardware accelerators and for distributed computing platforms including Apache Spark and Dask.
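As a rough illustration of those features, here is a hedged sketch of the kind of parameter dictionary that enables GPU-accelerated training in XGBoost. The parameter names assume an XGBoost 1.x-era API (newer releases expose the GPU through `device="cuda"` with `tree_method="hist"`), and the numeric values are illustrative defaults, not tuned settings:

```python
# A minimal sketch of XGBoost parameters for GPU-accelerated training.
# Parameter names assume the XGBoost 1.x API; values are illustrative.
params = {
    "tree_method": "gpu_hist",       # histogram algorithm on an NVIDIA GPU
    "objective": "binary:logistic",  # binary classification
    "max_depth": 6,                  # a common starting depth for tabular data
    "learning_rate": 0.1,
    "n_estimators": 500,
}
# For distributed training on a Dask cluster, a parameter dict like this
# can be passed to the estimators in the xgboost.dask module.
```

The same dictionary works for single-machine and cluster training, which is part of why XGBoost scales from laptop experiments to production clusters with little code change.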

However, there have been several recent claims that deep learning (DL) models outperform XGBoost. To verify these claims, a team at Intel published a survey examining how well deep learning works for tabular data and whether XGBoost's supremacy is justified.

The authors explored whether DL models should be a recommended option for tabular data by rigorously comparing recent deep learning models to XGBoost on a variety of datasets. The study showed XGBoost outperformed the DL models across a wide range of datasets, and it required less tuning. However, the paper also found that an ensemble of the deep models and XGBoost performs better on these datasets than XGBoost alone. For the experiments, the authors examined DL models such as TabNet, NODE, DNF-Net and 1D-CNN, along with an ensemble of five classifiers: TabNet, NODE, DNF-Net, 1D-CNN and XGBoost. The ensemble is constructed as a weighted average of the individual trained models' predictions. The models were compared on the following attributes:

  • Accuracy
  • Inference efficiency
  • Hyperparameter tuning time (the shorter the optimization, the better).
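The ensemble described above is a weighted average of the individual models' predictions. A minimal sketch of that combination step follows; the model names match the paper's five classifiers, but the prediction values and equal weights are made up for illustration (the authors' actual weights are not given in the article):

```python
def weighted_ensemble(predictions, weights):
    """Combine per-model probability predictions with normalized weights.

    predictions: dict mapping model name -> list of predicted probabilities
    weights:     dict mapping model name -> non-negative weight
    """
    total = sum(weights.values())
    n = len(next(iter(predictions.values())))
    combined = [0.0] * n
    for name, preds in predictions.items():
        w = weights[name] / total  # normalize so weights sum to 1
        for i, p in enumerate(preds):
            combined[i] += w * p
    return combined

# Illustrative (made-up) probabilities from the five classifiers on 3 rows
preds = {
    "TabNet":  [0.9, 0.2, 0.6],
    "NODE":    [0.8, 0.3, 0.5],
    "DNF-Net": [0.7, 0.1, 0.4],
    "1D-CNN":  [0.6, 0.2, 0.7],
    "XGBoost": [0.9, 0.1, 0.6],
}
weights = {name: 1.0 for name in preds}  # equal weights for the sketch
blend = weighted_ensemble(preds, weights)
```

With equal weights this reduces to a plain average; in practice the weights would be chosen on a validation set, e.g. in proportion to each model's validation performance.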

Datasets used: Forest Cover Type, Higgs Boson, Year Prediction, Rossmann Store Sales, Gas Concentrations, Eye Movements, Gesture Phase, MSLR, Epsilon, Shrutime and Blastchar.

To their surprise, the authors found that XGBoost outperformed the DL models on datasets that did not appear in the DL models' original papers. Compared to XGBoost and the full ensemble, the single DL models are more dependent on specific datasets. The authors attributed the drop in performance to selection bias and differences in hyperparameter optimization. The obvious next step is to check the ensemble models. But which combination: XGBoost plus DL models, or an ensemble of non-DL models? The authors suggest picking a subset of models for the ensemble based on the following factors:

  • Validation loss (the lower, the better)
  • Model confidence (by some uncertainty measure)
  • Random order.
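The first criterion can be made concrete with a short sketch: rank the trained models by validation loss and keep the k best for the ensemble. The loss values below are invented for illustration, not taken from the paper:

```python
def select_by_val_loss(val_losses, k):
    """Pick the k models with the lowest validation loss."""
    ranked = sorted(val_losses.items(), key=lambda item: item[1])
    return [name for name, _ in ranked[:k]]

# Hypothetical validation losses for the five classifiers
val_losses = {"TabNet": 0.41, "NODE": 0.38, "DNF-Net": 0.45,
              "1D-CNN": 0.40, "XGBoost": 0.35}
chosen = select_by_val_loss(val_losses, k=3)
# chosen == ["XGBoost", "NODE", "1D-CNN"]
```

The confidence-based criterion would replace the loss ranking with an uncertainty measure, and the random-order baseline simply shuffles the model list before truncating to k.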

XGBoost vs. other ML algorithms (Image source: Vishal Morde)

One competing theory for the success of XGBoost is that tree-based methods like XGBoost are sample-efficient at learning decision rules from informative, feature-engineered data. XGBoost is considered extremely fast, stable, quick to tune and robust to randomness, all of which suits tabular data well. The preference for XGBoost over deep learning can be further understood through the lens of manifold learning.

Meanwhile, Dmitry Efimov, who heads the ML centre of excellence at American Express, said the Intel researchers missed the preprocessing aspect of neural networks. “From the problems we have solved recently, it’s pretty clear that if you just apply simple normalization to the tabular data and train any neural network, the decision trees would outperform. But if you apply more effort to preprocess data and reduce noisy information from the data, neural networks will outperform. The main question is how much effort you want to apply,” he explained. Addressing Efimov’s argument about “the right kind of preprocessing”, Bojan Tunguz, a Kaggle Grandmaster and a well-known face in the ML community, said that a ‘highly competent’ data scientist can massage data to take advantage of any algorithm’s unique characteristics. “Heck, I can do it in such a way to get a logistic regression to outperform XGBoost!” said Tunguz.
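Efimov's point can be illustrated with the simplest preprocessing case he mentions, per-feature standardization. The pure-Python sketch below is only the baseline he argues is insufficient; real pipelines would typically use something like scikit-learn's `StandardScaler` and go further, e.g. removing noisy features:

```python
def standardize(column):
    """Zero-mean, unit-variance scaling of one numeric feature column."""
    mean = sum(column) / len(column)
    var = sum((x - mean) ** 2 for x in column) / len(column)
    std = var ** 0.5 or 1.0  # guard against constant columns
    return [(x - mean) / std for x in column]

# Illustrative feature column (e.g. ages of three customers)
ages = [20.0, 30.0, 40.0]
scaled = standardize(ages)  # mean 0, unit variance
```

Efimov's claim is that stopping at this step leaves neural networks at a disadvantage against trees; the extra, noise-reducing preprocessing that flips the comparison is the effort he refers to.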

The debate between deep learning and its relatively simpler alternatives like XGBoost is nothing new. Though the paper explores something already widely known (or assumed), the authors do cut the DL models some slack. The Intel researchers admit that results can vary with the hyperparameter optimization process: the good results may stem from XGBoost’s already-robust initial hyperparameters, or the XGBoost model may have inherent characteristics that make it easier to optimize. The Intel team believes that combining neural networks and XGBoost can outperform other models. Even Tunguz, in his LinkedIn post, said “a weighted blend of XGBoost and neural networks is usually the way to go for the majority of problems.”

Key takeaways

While significant progress has been made using DL models for tabular data, the authors concluded that they still do not outperform XGBoost and that further research is warranted.

  • In many cases, the DL models perform worse on unseen datasets.
  • The XGBoost model generally outperformed the deep models.
  • No DL model consistently outperformed the others.
  • The ensemble of deep models and XGBoost outperforms the other models in most cases.

Ram Sagar
I have a master's degree in Robotics and I write about machine learning advancements.
