
Google Releases TensorFlow API To Develop Smaller & Faster ML Models


Google CEO Sundar Pichai speaks during the Google I/O 2019 keynote session at Shoreline Amphitheatre in Mountain View, California, on May 7, 2019. (Photo by Josh Edelson/AFP via Getty Images)

The TensorFlow Model Optimization team at Google recently released the Quantization Aware Training (QAT) API as part of the TensorFlow Model Optimization Toolkit. According to the team, the API enables training and deploying machine learning models with improved execution performance; the resulting models are more compact while losing little accuracy.

Quantization is the technique of transforming a machine learning model into an equivalent representation that uses parameters and computations at a lower precision. It improves the execution performance and efficiency of a model, and it allows models to run on specialized neural accelerators, such as the Edge TPU in Coral devices, which often support only a restricted set of data types.
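As a rough illustration of what lower precision means (this sketch is not part of the QAT API, and the scale and zero_point names below are conventional, not taken from the article), 8-bit affine quantization maps a floating-point range onto 256 integer buckets:

    import numpy as np

    def quantize_uint8(x, x_min, x_max):
        """Affine-quantize a float array to unsigned 8-bit integers."""
        scale = (x_max - x_min) / 255.0        # width of one integer bucket
        zero_point = np.round(-x_min / scale)  # integer that represents 0.0
        q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
        return q, scale, zero_point

    def dequantize(q, scale, zero_point):
        """Map the integers back to approximate floats."""
        return (q.astype(np.float32) - zero_point) * scale

    weights = np.array([-0.51, 0.02, 0.499, 1.0], dtype=np.float32)
    q, scale, zp = quantize_uint8(weights, x_min=-1.0, x_max=1.0)
    print(dequantize(q, scale, zp))  # close to, but not exactly, the originals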

Quantization Aware Training (QAT) API

Quantization Aware Training emulates inference-time quantization during training, creating a model that downstream tools can use to produce actually quantized models. These quantized models use lower precision (for example, 8-bit integers instead of 32-bit floats), which provides benefits during deployment. The technique is used in production for speech, vision, text, and translation use cases.

The team measured the accuracy of QAT-trained models using the default TensorFlow Lite configuration and compared them against floating-point baselines and post-training quantized models. The results showed that QAT-trained models achieve accuracy comparable to the floating-point baselines.

Why Use QAT API

As mentioned earlier, quantization transforms a machine learning model into an equivalent representation that uses parameters and computations at a lower precision. However, going from higher to lower precision is inherently lossy and can introduce noise.

This is because quantization squeezes a small range of floating-point values into a fixed number of information buckets. The parameters, or weights, of a model can then take only a small set of values, and the minute differences between them are lost. For example, with 8-bit quantization over the range [-1, 1], each bucket is about 0.008 wide, so nearby weights such as 0.100 and 0.103 collapse to the same integer. This, in turn, leads to information loss and introduces computational error.

Quantization Aware Training overcomes this loss by simulating low-precision, inference-time computation in the forward pass of the training process. Using this API, the model learns parameters that are more robust to quantization.
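Conceptually, this is done with "fake quantization" operations: values are quantized and immediately dequantized inside the forward pass, so the network trains against the rounding error it will face at inference time while gradients still flow in float. A minimal sketch using TensorFlow's built-in fake-quant op (the QAT API inserts such operations for you; the range [-1, 1] here is an assumption for illustration):

    import tensorflow as tf

    x = tf.constant([-0.51, 0.02, 0.499, 1.0])

    # Quantize to 8 bits over [-1, 1] and immediately dequantize: the result
    # stays float32, but only 256 distinct values are representable, so the
    # rounding error is exposed during training.
    fq = tf.quantization.fake_quant_with_min_max_args(x, min=-1.0, max=1.0,
                                                      num_bits=8)
    print(fq.numpy())  # inputs snapped to the nearest quantization bucket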

Features of Quantization Aware Training 

The goal of the API is to reduce model size, latency, and power consumption with negligible accuracy loss. Quantization Aware Training can be used in production for speech, vision, text, and translation use cases. According to the team, the tool is also useful for researchers and hardware designers who want to experiment with various quantization strategies and simulate how quantization affects accuracy on different hardware backends.

The QAT API is flexible and capable of handling complicated use cases. For instance, it allows a user to control quantization precisely within a layer, create custom quantization algorithms, and handle any custom layers they have written.
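For example, quantizing only part of a model is a matter of annotating the layers to be quantized and then applying the transformation. A minimal sketch, assuming the tensorflow-model-optimization package is installed (the layer sizes are placeholders):

    import tensorflow as tf
    import tensorflow_model_optimization as tfmot

    annotate = tfmot.quantization.keras.quantize_annotate_layer

    # Annotate only the first Dense layer; the output layer stays in float.
    model = tf.keras.Sequential([
        annotate(tf.keras.layers.Dense(128, activation='relu',
                                       input_shape=(784,))),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])

    # quantize_apply rewrites the annotated layers to train with
    # quantization awareness.
    quant_aware_model = tfmot.quantization.keras.quantize_apply(model)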

The QAT API provides a simple and highly flexible way to quantize any TensorFlow Keras model, which makes it easy to train with “quantization awareness” for an entire model or only parts of it, then export it for deployment with TensorFlow Lite.

Steps To Quantize the Entire Keras Model

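In outline, the whole model is wrapped with quantize_model and then fine-tuned as usual. A minimal sketch, assuming the tensorflow-model-optimization package is installed; the architecture and the random training data below are placeholders, not from the article:

    import numpy as np
    import tensorflow as tf
    import tensorflow_model_optimization as tfmot

    # 1. Build (or load) an ordinary Keras model.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])

    # 2. Wrap it so every supported layer emulates quantization in training.
    quant_aware_model = tfmot.quantization.keras.quantize_model(model)

    # 3. Compile and fine-tune as usual (placeholder random data).
    x_train = np.random.rand(60, 784).astype('float32')
    y_train = np.random.randint(0, 10, size=(60,))
    quant_aware_model.compile(optimizer='adam',
                              loss='sparse_categorical_crossentropy',
                              metrics=['accuracy'])
    quant_aware_model.fit(x_train, y_train, epochs=1)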

API Compatibility

Users can apply quantization with the following APIs:

  1. Model building: tf.keras, with only Sequential and Functional models
  2. TensorFlow versions: TF 2.x (tf-nightly at the time of release)
  3. TensorFlow execution mode: eager execution

Wrapping Up

By default, the QAT API is configured to work with the quantized execution support available in TensorFlow Lite. Going forward, the TensorFlow team plans to enhance the API in several areas: model building (clarifying that subclassed models have limited to no support), distributed training, model coverage (extending it to RNNs/LSTMs and general Concat support), and hardware acceleration (ensuring the TensorFlow Lite converter can produce full-integer models), among others.
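As a deployment sketch, continuing from the quantization-aware model above, the standard TensorFlow Lite converter with default optimizations produces the actually quantized model:

    import tensorflow as tf

    # Convert the quantization-aware Keras model into a quantized
    # TensorFlow Lite flatbuffer.
    converter = tf.lite.TFLiteConverter.from_keras_model(quant_aware_model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    tflite_model = converter.convert()

    with open('model_quant.tflite', 'wb') as f:
        f.write(tflite_model)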
