This Is What Google TensorFlow Is Giving Away For Free Now

After Quantization Aware Training (QAT) and Model Maker, tech giant Google has now open-sourced TensorFlow Runtime (TFRT), a new runtime that will replace the existing TensorFlow runtime. TFRT will be responsible for the efficient execution of kernels, that is, low-level device-specific primitives, on targeted hardware.

Machine learning is a complex domain: the way models are built and deployed keeps changing to meet dynamic needs as investment in the ML ecosystem grows. ML researchers have been inventing new algorithms that require more compute, while application developers are enhancing their products with these new techniques across edge and server.

The growing demand for computation and the rising cost of compute have sparked a proliferation of new hardware aimed at specific ML use cases. According to the developers, TFRT aims to provide a unified, extensible infrastructure layer that delivers high performance across a wide variety of domain-specific hardware.

At the virtual TensorFlow Dev Summit 2020 in March, Megan Kacholia, VP of Engineering at Google (Google Brain and TensorFlow), made several announcements, including the TensorFlow 2.2 pre-release, Model Maker, T5 (Text-to-Text Transfer Transformer) and TFRT. During the summit, Megan noted that developers and researchers will not be directly exposed to TFRT; instead, it will work under the covers to provide the best performance possible across a wide variety of domain-specific hardware.

Behind TensorFlow Runtime (TFRT)

TFRT is a new runtime that makes efficient use of multithreaded host CPUs, supports fully asynchronous programming models, and focuses on low-level efficiency. According to the developers, its design has three highlights:

  • To Achieve Higher Performance: TFRT has a lock-free graph executor that supports concurrent op execution with low synchronisation overhead. It also includes a thin eager op dispatch stack that makes eager API calls asynchronous and more efficient.
  • To Make Extending the TF Stack Easier: Device runtimes are decoupled from the host runtime, the core TFRT component that drives host CPU and I/O work.
  • To Get Consistent Behavior: TFRT leverages common abstractions, such as shape functions and kernels, across both eager and graph execution.
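The idea of concurrent op execution in the first highlight can be pictured with a small toy scheduler. This is not TFRT code (TFRT itself is written in C++), and it makes no attempt to be lock-free; it is only a minimal sketch, assuming a hypothetical graph format and a hypothetical `run_graph` helper, of dispatching each op to a thread pool as soon as all of its inputs are ready, so that independent ops can run at the same time.

```python
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from typing import Callable, Dict, List, Tuple

# Hypothetical toy graph format: op name -> (function, names of input ops).
Graph = Dict[str, Tuple[Callable[..., float], List[str]]]


def run_graph(graph: Graph, max_workers: int = 4) -> Dict[str, float]:
    """Run every op, dispatching each one as soon as its inputs are ready."""
    # How many inputs each op is still waiting for.
    pending = {name: len(inputs) for name, (_, inputs) in graph.items()}
    # Reverse edges: op name -> ops that consume its output.
    consumers: Dict[str, List[str]] = {name: [] for name in graph}
    for name, (_, inputs) in graph.items():
        for src in inputs:
            consumers[src].append(name)

    results: Dict[str, float] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        def submit(name: str) -> Future:
            fn, inputs = graph[name]
            # Inputs are resolved here, on the scheduling thread, before dispatch.
            return pool.submit(fn, *(results[i] for i in inputs))

        # Ops with no inputs can start immediately and run concurrently.
        running = {submit(n): n for n, count in pending.items() if count == 0}
        while running:
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                name = running.pop(fut)
                results[name] = fut.result()
                # Release any consumer whose last missing input just arrived.
                for consumer in consumers[name]:
                    pending[consumer] -= 1
                    if pending[consumer] == 0:
                        running[submit(consumer)] = consumer
    return results


graph: Graph = {
    "a": (lambda: 2.0, []),
    "b": (lambda: 3.0, []),
    "sum": (lambda x, y: x + y, ["a", "b"]),
    "sq": (lambda s: s * s, ["sum"]),
}
print(run_graph(graph)["sq"])  # 25.0
```

Here "a" and "b" have no dependencies and may run on different worker threads. All bookkeeping stays on the calling thread, so the toy needs no explicit locks of its own, although `concurrent.futures` uses locks internally.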

How It Works

TFRT plays a crucial part in both eager and graph execution. According to the developers, in eager execution, TensorFlow APIs call directly into the new runtime, while in graph execution, the computational graph of a program is lowered to an optimised, target-specific program and dispatched to TFRT. In both execution paths, the runtime invokes a set of kernels that call into the underlying hardware devices to complete the model execution.
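The two execution paths can be contrasted with a toy model in which a single kernel registry serves both. This is a hypothetical sketch, not TFRT's API: `KERNELS`, `eager_call` and `GraphBuilder` are illustrative names, and the recorded op list merely stands in for the lowered program (no optimisation or target-specific lowering is performed).

```python
from typing import Callable, Dict, List, Tuple, Union

# Hypothetical kernel registry shared by both execution paths.
KERNELS: Dict[str, Callable[..., float]] = {
    "add": lambda x, y: x + y,
    "mul": lambda x, y: x * y,
}

Ref = Union[str, int]  # str: named graph input; int: index of an earlier op's output


def eager_call(op: str, *args: float) -> float:
    """Eager path: the API call goes straight to the registered kernel."""
    return KERNELS[op](*args)


class GraphBuilder:
    """Graph path: ops are recorded first, then executed later as one program."""

    def __init__(self) -> None:
        self.program: List[Tuple[str, Tuple[Ref, ...]]] = []

    def add_op(self, op: str, *arg_refs: Ref) -> int:
        # Record the op and return the index its output will occupy.
        self.program.append((op, arg_refs))
        return len(self.program) - 1

    def run(self, feeds: Dict[str, float]) -> List[float]:
        values: List[float] = []
        for op, arg_refs in self.program:
            args = [feeds[a] if isinstance(a, str) else values[a] for a in arg_refs]
            values.append(KERNELS[op](*args))  # same kernels as the eager path
        return values


# (2 + 3) * 4 computed both ways.
eager = eager_call("mul", eager_call("add", 2.0, 3.0), 4.0)
g = GraphBuilder()
s = g.add_op("add", "x", "y")
g.add_op("mul", s, "z")
graph_result = g.run({"x": 2.0, "y": 3.0, "z": 4.0})[-1]
print(eager, graph_result)  # 20.0 20.0
```

Because both paths call the same kernel functions, they produce the same result, which is the kind of consistency the shared-abstraction design aims for.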

Wrapping Up

As part of a benchmarking study for TensorFlow Dev Summit 2020, developers at the tech giant integrated TFRT with TensorFlow Serving and measured the latency of sending requests to the model and getting prediction results back. Comparing GPU inference over TFRT with the current runtime, they observed a 28% improvement in average inference time. They stated that these early results are strong validation for the new runtime and that it is expected to provide a big boost to performance. The project has been made available on GitHub.
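The reported figure is a relative change in average latency. Assuming "improvement" means the percentage reduction in mean round-trip time, it is (T_baseline − T_TFRT) / T_baseline × 100, so a hypothetical drop in mean latency from 10 ms to 7.2 ms would be a 28% improvement. The helpers below are a hypothetical sketch of that measurement, not part of TensorFlow Serving and not Google's benchmark setup; `send_request` stands in for whatever client call sends a request and waits for the prediction.

```python
import statistics
import time
from typing import Callable, List


def measure_latencies_ms(send_request: Callable[[], object], n: int = 100) -> List[float]:
    """Time n round trips of `send_request` (request out, prediction back) in ms."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        send_request()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def relative_improvement_pct(baseline_ms: List[float], candidate_ms: List[float]) -> float:
    """Percentage reduction in mean latency of the candidate vs. the baseline."""
    base = statistics.mean(baseline_ms)
    return (base - statistics.mean(candidate_ms)) / base * 100.0


# Hypothetical latencies, not measured data.
print(round(relative_improvement_pct([10.0, 10.0], [7.2, 7.2]), 1))  # 28.0
```

In a real comparison, the same `send_request` workload would be timed against each runtime and the two latency lists passed to `relative_improvement_pct`.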

According to the developers at TensorFlow, the new runtime will help three groups: researchers looking for faster iteration time and better error reporting when developing complex new models in eager mode; application developers looking for improved performance when training and serving models in production; and hardware makers looking to integrate edge and datacenter devices into TensorFlow in a modular way.
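The modular integration described above rests on decoupling device runtimes from the host runtime. A minimal sketch of that pattern, assuming hypothetical `DeviceRuntime`, `HostRuntime` and `ToyAccelerator` classes (none of these are TFRT's actual interfaces), is a host that only knows an abstract device interface, so a hardware maker can plug in a new device without changing host code:

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class DeviceRuntime(ABC):
    """Hypothetical interface a hardware maker would implement for its device."""

    @abstractmethod
    def run_kernel(self, op: str, inputs: List[float]) -> float:
        ...


class HostRuntime:
    """Drives host-side work and forwards device ops to a registered runtime."""

    def __init__(self) -> None:
        self._devices: Dict[str, DeviceRuntime] = {}

    def register_device(self, name: str, runtime: DeviceRuntime) -> None:
        self._devices[name] = runtime

    def execute(self, device: str, op: str, inputs: List[float]) -> float:
        # The host depends only on the DeviceRuntime interface, not device internals.
        return self._devices[device].run_kernel(op, inputs)


class ToyAccelerator(DeviceRuntime):
    """Stand-in for a vendor's device runtime; it simply computes on the CPU."""

    def run_kernel(self, op: str, inputs: List[float]) -> float:
        if op == "sum":
            return float(sum(inputs))
        raise NotImplementedError(f"{op!r} is not supported on this device")


host = HostRuntime()
host.register_device("toy:0", ToyAccelerator())
print(host.execute("toy:0", "sum", [1.0, 2.0, 3.0]))  # 6.0
```

Adding support for another device would mean writing another `DeviceRuntime` subclass and registering it; `HostRuntime` itself stays unchanged.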

Ambika Choudhury

A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.