Deep learning (DL) models are known for capturing the nonlinearities in data that traditional estimators such as logistic regression cannot. However, doubts remain about the increased use of computationally intensive DL for simple classification tasks. To find out whether DL really outperforms shallow models significantly, researchers from the University of Pennsylvania compared three ML pipelines involving traditional methods, AutoML and DL in a paper titled ‘Is Deep Learning Necessary For Simple Classification Tasks’.
How Valid Is The Argument Of AutoML vs DL
The UPenn researchers note that a support-vector machine (SVM) model might predict susceptibility to a certain complex genetic disease more accurately than a gradient boosting model trained on the same dataset. Moreover, different hyperparameter choices within that SVM model can yield different performance. Hyperparameter optimisation is usually done through prior experience or brute-force search. This is where AutoML comes into the picture.
AutoML provides methods for automatically selecting these options from a multitude of possible architecture configurations.
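As a rough illustration of the brute-force search that AutoML automates away, a grid search over SVM hyperparameters might look like the following. This is a minimal sketch using scikit-learn, not the authors' code; the dataset and grid values are illustrative.

```python
# Sketch of brute-force hyperparameter search (illustrative, not from the paper).
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for a labelled classification dataset.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Every combination of C and kernel is evaluated with 5-fold cross-validation.
grid = GridSearchCV(
    SVC(),
    param_grid={"C": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]},
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```

Even this small grid requires 30 model fits (6 configurations times 5 folds), which is why the search space of full pipelines quickly becomes intractable by hand.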
For the experiments, the authors use TPOT (Tree-based Pipeline Optimization Tool), a Python-based AutoML tool that uses genetic programming to identify optimal ML pipelines for either regression or classification on a given labelled dataset.
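The core idea of such a genetic search can be pictured with a toy sketch. The code below is a pure-Python illustration, not TPOT's implementation: candidate "pipelines" are just hyperparameter pairs, and a synthetic fitness function stands in for cross-validated pipeline accuracy.

```python
# Toy genetic search over candidate "pipelines" (illustrative only).
import random

random.seed(0)

def fitness(pipeline):
    # Stand-in for cross-validated accuracy; peaked at C=1.0, depth=5.
    c, depth = pipeline
    return 1.0 / (1.0 + (c - 1.0) ** 2 + (depth - 5) ** 2)

def mutate(pipeline):
    # Small random perturbation of a surviving candidate.
    c, depth = pipeline
    return (c + random.gauss(0, 0.5), depth + random.choice([-1, 0, 1]))

# Initial random population of candidate pipelines.
population = [(random.uniform(0, 10), random.randint(1, 10)) for _ in range(20)]

for generation in range(30):
    # Keep the fittest half, refill the population with mutated copies.
    population.sort(key=fitness, reverse=True)
    survivors = population[:10]
    population = survivors + [mutate(random.choice(survivors)) for _ in range(10)]

best = max(population, key=fitness)
print(best, round(fitness(best), 3))
```

TPOT's real search additionally evolves the pipeline structure itself (which preprocessors and estimators appear, and in what order), not just scalar hyperparameters.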
The above figure illustrates the model pipelines used in this study. On the left is a deep learning pipeline with no AutoML. In the middle is a standard (no neural networks) TPOT pipeline containing a logistic regression classifier and a kernel SVM classifier. On the right is a TPOT-NN pipeline containing two multilayer perceptron estimators.
The tree-based tool, the authors write, maintains a balance between high performance and low model complexity.
The three model pipelines were tested on six different ‘hill’ and ‘valley’ datasets. The results show that the neural-network-only estimators had the lowest average prediction accuracy, while the TPOT-NN pipelines performed best. Despite having neural networks in the mix, the TPOT-NN pipelines performed only slightly better than the standard ones. However, the TPOT-NN pipelines outperformed the LR classifier in most cases.
Training time, a key differentiator for deep learning approaches, was compared too. The researchers found that the total training time for TPOT pipelines ranged from 4h 22m to 8d 19h 55m, with a mean training time of 1d 0h 49m. On average, training a single pipeline took 629% longer for neural network pipelines and 336% longer for TPOT-NN pipelines versus standard TPOT.
The parameters for this tree-based automation tool included 100 training generations, with a population size of 100 in each generation.
The study states that the NN-only models yielded highly inconsistent performance on several datasets. This is unsurprising, as these datasets contain sequence data, where an estimator must be able to identify a ‘hill’ or a ‘valley’ that could occur at any location in a sequence of 100 features.
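To see why location matters, consider a toy version of such a sequence task. This is an illustration, not the benchmark's actual generator: because the bump can sit anywhere among the 100 features, any fixed-position feature is uninformative, while a location-invariant summary separates the classes cleanly.

```python
# Toy hill/valley sequences (illustrative stand-in for the benchmark data).
import random

random.seed(0)

def make_sequence(label, n=100):
    """Flat noisy baseline with a hill (label=1) or valley (label=0)
    at a random position in the sequence."""
    seq = [random.gauss(0.0, 0.1) for _ in range(n)]
    pos = random.randrange(5, n - 5)
    bump = 3.0 if label == 1 else -3.0
    for i in range(pos - 3, pos + 3):
        seq[i] += bump
    return seq

def predict(seq):
    # Location-invariant rule: compare the largest upward and
    # downward deviations from the sequence median.
    med = sorted(seq)[len(seq) // 2]
    return 1 if max(seq) - med > med - min(seq) else 0

data = [(make_sequence(lbl), lbl) for lbl in [0, 1] * 50]
accuracy = sum(predict(s) == lbl for s, lbl in data) / len(data)
print(accuracy)
```

A model that instead learns weights tied to specific feature positions, as a small fully connected network tends to, must see bumps at many positions before it generalises, which is consistent with the inconsistent NN-only results.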
One possible explanation for the large difference in classification-accuracy variance between the NN-only and TPOT-NN pipelines is the consistently better performance of the heuristic and ensemble methods available to TPOT.
Based on the results, the authors conclude that AutoML (and TPOT-NN in particular) may be useful for discovering new neural network “motifs” to be composed into larger networks. For example, repeating one such internal architecture to a final depth of 152 hidden layers yields a network virtually identical to a version of ResNet.
Though DL on its own was outperformed by the AutoML tool, the authors recommend integrating AutoML and DL rather than using them separately. They also note that adding multilayer perceptron classifiers to the pool of available operators improves the performance of AutoML.
Link to the paper here.