How To Build Smaller, Faster, Better Deep Learning Models

Google researcher Gaurav Menghani has proposed methods to make deep learning models ‘smaller, faster, and better’.

Deep learning has broad applications in sentiment analysis, natural language understanding, computer vision, etc. The technology is growing at breakneck speed on the back of rapid innovation. However, such innovation demands ever more parameters and resources: a model’s quality gains increasingly come at the cost of a larger footprint.

Against this backdrop, Google researcher Gaurav Menghani has published a survey on model efficiency, covering the landscape from modelling techniques to hardware support. In it, he proposes ways to make ‘deep learning models smaller, faster, and better’.

Challenges

Menghani argues that while larger and more complicated models perform well on the tasks they are trained on, they may not show the same performance when applied to real-life situations.

Following are the challenges practitioners face while training and deploying models:

  • The cost of training and deploying large deep learning models is high. Large models are memory-intensive and leave a bigger carbon footprint.
  • A few deep learning applications need to run in real time on IoT and smart devices. This calls for optimising models for specific devices.
  • Models must be trained with as little data as possible when the user data is sensitive.
  • Off-the-shelf models may not always be able to address the constraints of new applications.
  • Training and deploying multiple models on the same infrastructure for different applications may exhaust available resources.

A mental model

Menghani presents a mental model complete with algorithms, techniques, and tools for efficient deep learning. It has five major areas: compression techniques, learning techniques, automation, efficient architectures, and infrastructure (hardware).

Compression techniques: These algorithms and techniques optimise the model’s architecture, typically by compressing its layers. One of the most popular compression techniques is quantisation, where a layer’s weights are compressed by reducing their numerical precision with minimal loss in quality.
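
As a rough illustration, the sketch below quantises a random float32 weight matrix to int8 with a single symmetric scale. The matrix is made up, and real toolchains (for example TensorFlow Lite or PyTorch) quantise per layer, often per channel, using calibration data.

```python
import numpy as np

# Illustrative symmetric 8-bit weight quantisation on a made-up weight matrix.
rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)  # stand-in for a layer's weights

scale = np.abs(w).max() / 127.0                     # map the widest weight onto the int8 range
w_int8 = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_dequant = w_int8.astype(np.float32) * scale       # what inference effectively sees

print(f"storage: {w.nbytes} -> {w_int8.nbytes} bytes")        # 4x smaller
print(f"mean abs error: {np.abs(w - w_dequant).mean():.5f}")  # small quality loss
```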

Learning techniques: These algorithms focus on training the model to make fewer prediction errors, require less data, and converge faster. The accuracy gained can then be traded away by trimming parameters, yielding a smaller footprint or a more efficient model. One example is distillation, which improves the accuracy of a smaller model by teaching it to mimic a larger one.
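
A common formulation of distillation (the survey discusses several variants) mixes a temperature-softened KL term against the teacher’s outputs with the usual cross-entropy on hard labels. The sketch below assumes PyTorch and hypothetical values for the temperature T and mixing weight alpha.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft targets: the student mimics the teacher's softened distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so soft and hard gradients are comparable
    # Hard targets: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Toy usage with random logits for a batch of 8 examples and 10 classes.
loss = distillation_loss(torch.randn(8, 10), torch.randn(8, 10),
                         torch.randint(0, 10, (8,)))
```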

Automation: Automated tools help improve the core metrics of a given model. Hyper-parameter optimisation (HPO) is one example, where tuning hyper-parameters such as the learning rate increases accuracy. Another technique is architecture search, where the model architecture itself is tuned and the search helps find a model that optimises both the loss and the footprint.
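
As a minimal illustration of HPO, the sketch below performs a plain random search. `train_and_evaluate` is a hypothetical stand-in for the practitioner’s own training-and-validation loop; dedicated libraries such as Optuna or Ray Tune implement far more sophisticated search strategies.

```python
import random

def random_search(train_and_evaluate, n_trials=20, seed=0):
    """Randomly sample configurations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = {
            "learning_rate": 10 ** rng.uniform(-5, -1),   # log-uniform sampling
            "batch_size": rng.choice([32, 64, 128, 256]),
            "dropout": rng.uniform(0.0, 0.5),
        }
        score = train_and_evaluate(config)  # e.g. validation accuracy
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score
```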

Efficient Architectures: These are fundamental blocks designed from scratch, such as convolutional and attention layers, which are superior to baseline methods such as fully connected layers and RNNs.
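
One canonical example of such a block, not specific to this paper, is the depthwise-separable convolution popularised by MobileNet; the PyTorch sketch below shows how it cuts parameters relative to a standard convolution.

```python
import torch.nn as nn

def separable_conv(in_ch, out_ch, k=3):
    """Depthwise convolution (one filter per channel) followed by a 1x1 pointwise mix."""
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, k, padding=k // 2, groups=in_ch),  # depthwise
        nn.Conv2d(in_ch, out_ch, 1),                               # pointwise
    )

standard = nn.Conv2d(64, 128, 3, padding=1)
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(standard), "vs", count(separable_conv(64, 128)))  # 73,856 vs 8,960 parameters
```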

Infrastructure: Building an efficient model also needs a strong foundation of infrastructure and tools, including model training frameworks such as TensorFlow and PyTorch.

Guide to efficiency

Menghani says the ultimate goal is to build Pareto-optimal models. Pareto efficiency means resources are allocated in the most economically efficient way: practitioners achieve the best possible result in one dimension while holding the other constant. Typically, one of these dimensions is quality (metrics such as accuracy, precision, and recall) and the other is footprint (metrics such as model size, latency, and RAM usage).
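
In code, finding the Pareto-optimal set amounts to discarding every candidate that another candidate beats on both axes; the sketch below uses made-up (accuracy, size) numbers.

```python
# Hypothetical model candidates measured on quality and footprint.
candidates = [
    {"name": "A", "accuracy": 0.91, "size_mb": 120},
    {"name": "B", "accuracy": 0.89, "size_mb": 30},
    {"name": "C", "accuracy": 0.88, "size_mb": 45},   # worse than B on both axes
    {"name": "D", "accuracy": 0.93, "size_mb": 240},
]

def dominates(a, b):
    """a dominates b if it is at least as good on both axes and strictly better on one."""
    return (a["accuracy"] >= b["accuracy"] and a["size_mb"] <= b["size_mb"]
            and (a["accuracy"] > b["accuracy"] or a["size_mb"] < b["size_mb"]))

pareto = [c for c in candidates if not any(dominates(o, c) for o in candidates)]
print([c["name"] for c in pareto])  # ['A', 'B', 'D']
```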

He suggests two methods to achieve the Pareto-optimal model:

Shrink and improve for footprint-sensitive models: This strategy is useful for practitioners who want to reduce a model’s footprint without compromising on quality. Shrinking helps with on-device deployment and server-side optimisation. Ideally, the shrinking should be minimally lossy; in a few cases, the lost capacity can be compensated for in the improve phase.

Grow, improve, and shrink for quality-sensitive models: This strategy can be used when a practitioner wants to deploy a better-quality model while keeping the same footprint. Here, capacity is first added by growing the model; quality is then improved via learning techniques, automation, etc.; and finally the model is shrunk back.

Read the full paper here.
