
Does Deep Learning Suffer From Too Many Optimizers?


“There is no single optimizer that dominates its competitors across all tasks.”

Critics often call machine learning ‘glorified statistics’, and there is some merit to the argument. The fundamental function of any machine learning model is pattern recognition, which depends on fitting the model to data until training converges. To that end, neural networks rely on optimization methods, typically categorised as first-order, higher-order, and derivative-free. First-order methods such as gradient descent and its variants are by far the most popular.

Gradient descent updates the model’s parameters iteratively in the direction opposite to the gradient of the objective function: θ ← θ − η∇f(θ), where η is the learning rate. Each update nudges the parameters towards the target, gradually converging to a (local) optimum of the objective.
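
For concreteness, here is a minimal sketch of the vanilla gradient descent update on a toy least-squares objective. The data, objective, and learning rate are illustrative assumptions, not taken from the benchmark paper.

```python
import numpy as np

# Minimal sketch of vanilla gradient descent on a toy least-squares
# objective f(theta) = ||A @ theta - b||^2. The data and learning rate
# are illustrative assumptions only.
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 5))
b = rng.normal(size=20)

def grad(theta):
    # Gradient of the objective: 2 * A^T (A @ theta - b)
    return 2.0 * A.T @ (A @ theta - b)

theta = np.zeros(5)
lr = 0.01  # learning rate (step size)
for _ in range(500):
    theta -= lr * grad(theta)  # step against the gradient

print("final loss:", np.sum((A @ theta - b) ** 2))
```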

The stochastic gradient method replaces the full gradient with an unbiased estimate computed on a small mini-batch of samples. This reduces the cost of each update when the number of samples is large and removes a certain amount of computational redundancy. Numerous other variants then claim to improve on this basic scheme.
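
A mini-batch version of the same kind of toy problem shows where the savings come from: each update touches only a small random subset of the data, and the sampled gradient is an unbiased estimate of the full one. The dataset, batch size, and learning rate below are again illustrative assumptions.

```python
import numpy as np

# Minimal sketch of mini-batch SGD on a toy linear-regression problem.
# Each step uses a random batch; the batch gradient is an unbiased
# estimate of the full-data gradient.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=1000)

theta = np.zeros(5)
lr, batch_size = 0.01, 32
for _ in range(2000):
    idx = rng.integers(0, len(X), size=batch_size)  # sample a mini-batch
    grad = 2.0 * X[idx].T @ (X[idx] @ theta - y[idx]) / batch_size
    theta -= lr * grad

print("final mean-squared error:", np.mean((X @ theta - y) ** 2))
```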

Large-scale stochastic optimization drives a wide variety of machine learning tasks. The permutations and combinations quickly get out of hand once one stumbles upon the scores of benevolent-sounding optimizers in the literature. Fatigue by abundance is now a serious challenge for researchers, and choosing the right optimization method can be a nightmare. Add to that the tuning of hyperparameters, which heavily influences both the training speed and the final performance of the learned model. Both tasks are time- and resource-intensive.
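
To see why tuning is so expensive, note that even the simplest strategy, random search over the learning rate, requires one full training run per candidate value. The sketch below makes that pattern explicit on the toy SGD problem above; the search space and budget are assumptions chosen for illustration.

```python
import numpy as np

# Illustrative random search over the learning rate: every candidate
# costs a full training run, which is what makes fair tuning expensive.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=1000)

def train(lr, steps=2000, batch_size=32):
    theta = np.zeros(5)
    for _ in range(steps):
        idx = rng.integers(0, len(X), size=batch_size)
        grad = 2.0 * X[idx].T @ (X[idx] @ theta - y[idx]) / batch_size
        theta -= lr * grad
    return np.mean((X @ theta - y) ** 2)  # final training error

# Sample learning rates log-uniformly in [1e-4, 1e-1] and keep the best.
candidates = 10 ** rng.uniform(-4, -1, size=10)
results = {lr: train(lr) for lr in candidates}
best_lr = min(results, key=results.get)
print(f"best learning rate: {best_lr:.4g} (error {results[best_lr]:.4g})")
```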

[Figure: number of times arXiv titles and abstracts mention specific optimizers per year. Source: paper by Schmidt et al.]

Choosing the optimizer is one of the most crucial design decisions in deep learning, and it is not an easy one. The above illustration shows the number of times arXiv titles and abstracts mention specific optimizers per year; the growing literature now lists hundreds of optimisation methods. Researchers at the University of Tübingen performed an extensive, standardized benchmark of fifteen popular deep learning optimizers, one of the few works that focus on large-scale benchmarking of optimizers. According to the researchers, the objective is to help understand how optimization methods and their hyperparameters influence training performance.

“While some optimizers are frequently decent, they also generally perform similarly, often switching their positions in the ranking.”

Paper by Schmidt et al.

Aiming for generality, the researchers evaluated performance on eight diverse real-world deep learning problems from different disciplines. From a collection of more than a hundred deep learning optimizers, they selected fifteen of the most popular choices for benchmarking. “There are enough optimizers,” said the researchers. The authors also noted that the conclusions of the paper might not generalize to other workloads such as GANs, reinforcement learning, or applications where, for example, memory usage is crucial.
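
The benchmarking protocol itself is simple to sketch: train the same model on the same task with each optimizer, give every optimizer the same tuning and seed budget, and compare the resulting performance distributions. Below is a minimal illustration in PyTorch; the toy task, the three optimizers, their learning rates, and the budgets are assumptions for illustration and do not reproduce the paper’s eight problems or fifteen optimizers.

```python
import torch
from torch import nn

# Toy benchmarking sketch: same model and data, several optimizers,
# a few random seeds each. Task, budgets, and learning rates are
# illustrative assumptions only.
torch.manual_seed(0)
X = torch.randn(512, 10)
y = X @ torch.randn(10, 1) + 0.1 * torch.randn(512, 1)

def train(optimizer_cls, lr, seed, epochs=200):
    torch.manual_seed(seed)
    model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
    opt = optimizer_cls(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        opt.step()
    return loss.item()

optimizers = {
    "SGD": (torch.optim.SGD, 0.05),
    "Adam": (torch.optim.Adam, 1e-3),
    "RMSprop": (torch.optim.RMSprop, 1e-3),
}
for name, (cls, lr) in optimizers.items():
    losses = [train(cls, lr, seed) for seed in range(3)]
    print(f"{name:8s} final losses over 3 seeds:", [round(l, 4) for l in losses])
```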

The researchers analysed more than 50,000 individual runs and have open-sourced all the baseline results of their experiments. This seminal work underlines the dangers of chasing state-of-the-art hype and highlights the following:

  • There are now enough optimizers.
  • Optimizer performance varies greatly across tasks. 
  • There is no single optimizer that dominates its competitors across all tasks.
  • ADAM and ADABOUND consistently perform well.
  • Different optimizers exhibit a surprisingly similar performance distribution compared to a single method re-tuned or simply re-run with different random seeds. 
  • Having accurate baselines for existing optimizers can drastically reduce the computational budget required to evaluate new ones.

Given these results, the researchers question the rationale behind developing new methods when more fundamental problems remain unsolved. They hope their experiments will nudge the ML community to “move beyond inventing yet another optimizer and to focus on key challenges, such as automatic, inner-loop tuning for truly robust and efficient optimization.” The researchers also admitted that the creators of new optimizers cannot be expected to compare their work with every possible previous method. To help, the baselines of all the experiments have been open-sourced: the ML community can access a dataset of 53,760 unique runs, each consisting of thousands of individual data points such as per-iteration mini-batch training losses and epoch-wise performance measures, which can serve as competitive, well-tuned baselines for future benchmarks of new optimizers.

Know more in the paper by Schmidt et al.

PS: The story was written using a keyboard.

Ram Sagar

I have a master's degree in Robotics and I write about machine learning advancements.