This Is What Google TensorFlow Is Giving Away For Free Now

After Quantization Aware Training (QAT) and Model Maker, tech giant Google has now open-sourced TensorFlow Runtime (TFRT), a new runtime that will replace the existing TensorFlow runtime. The new runtime will be responsible for performance-critical work such as the efficient execution of kernels and low-level device-specific primitives on targeted hardware.

Machine learning is a complex domain: how models are built and deployed keeps changing as investment in the ML ecosystem grows. While researchers at TensorFlow invent new algorithms that demand more compute, application developers are enhancing their products with new techniques across edge and server.

However, increasing computational needs and rising computing costs have sparked a proliferation of new hardware aimed at specific ML use cases. According to the developers, TensorFlow Runtime aims to provide a unified, extensible infrastructure layer with best-in-class performance across a wide variety of domain-specific hardware.


At the virtual TensorFlow Dev Summit 2020 in March, Megan Kacholia, VP of Engineering for Google Brain and TensorFlow, made several interesting announcements, including the TensorFlow 2.2 pre-release, Model Maker, T5 (Text-to-Text Transfer Transformer) and TFRT. During the summit, Megan stated, “You won’t be exposed to TFRT directly as a developer or a researcher, but it will be working under the covers to provide the best performance possible across a wide variety of domain-specific hardware.”

Behind TensorFlow Runtime (TFRT)

TFRT is a new runtime that makes efficient use of multithreaded host CPUs, supports fully asynchronous programming models and focuses on low-level efficiency. According to the developers, the new runtime includes three design highlights, outlined below:

  • To Achieve Higher Performance: The new runtime has a lock-free graph executor that supports concurrent op execution with low synchronisation overhead (a minimal sketch of this idea follows the list). It also includes a thin eager op dispatch stack that makes eager API calls asynchronous as well as more efficient.
  • To Make Extending the TF Stack Easier: Device runtimes are decoupled from the host runtime, the core TFRT component that drives host CPU and I/O work.
  • To Get Consistent Behaviour: The new runtime leverages common abstractions, such as shape functions and kernels, across both eager and graph execution.
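
The lock-free executor mentioned in the first point tracks, for each op, how many of its inputs are still pending and dispatches the op the moment that count reaches zero. Below is a minimal Python sketch of that dependency-counting idea, for illustration only; TFRT itself is written in C++ and uses atomic counters rather than a lock, and the names here (Op, GraphExecutor) are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Op:
    def __init__(self, name, fn, deps=()):
        self.name = name               # op identifier
        self.fn = fn                   # kernel to run once inputs are ready
        self.deps = list(deps)         # upstream ops this op waits on
        self.consumers = []            # downstream ops to notify on completion
        self.pending = len(self.deps)  # count of not-yet-ready inputs


class GraphExecutor:
    """Dispatches an op as soon as its pending-input count reaches zero."""

    def __init__(self, ops, workers=4):
        self.pool = ThreadPoolExecutor(max_workers=workers)
        self.lock = threading.Lock()   # stand-in for TFRT's lock-free atomics
        self.done = threading.Event()
        self.remaining = len(ops)
        self.ops = ops
        for op in ops:
            for dep in op.deps:
                dep.consumers.append(op)

    def run(self):
        for op in self.ops:
            if op.pending == 0:        # source ops are ready immediately
                self.pool.submit(self._execute, op)
        self.done.wait()

    def _execute(self, op):
        op.fn()
        with self.lock:
            for consumer in op.consumers:
                consumer.pending -= 1  # an atomic decrement in real TFRT
                if consumer.pending == 0:
                    self.pool.submit(self._execute, consumer)
            self.remaining -= 1
            if self.remaining == 0:
                self.done.set()


# Ops a and b run concurrently; c is dispatched once both have finished.
a = Op("a", lambda: print("a"))
b = Op("b", lambda: print("b"))
c = Op("c", lambda: print("c"), deps=[a, b])
GraphExecutor([a, b, c]).run()
```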

How It Works

Unlike the existing TensorFlow runtime, this new runtime plays a crucial part in both eager and graph execution. According to the developers, in eager execution, TensorFlow APIs call directly into the new runtime; in graph execution, the computational graph of a program is lowered to an optimised target-specific program and dispatched to TFRT. In both execution paths, the new runtime invokes a set of kernels that call into the underlying hardware devices to complete the model execution.
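
From the API side, the two execution paths look like this in standard TensorFlow 2.x code: plain calls run eagerly, op by op, while wrapping the same computation in tf.function traces it into a graph that can be lowered, optimised and dispatched as a whole. As Kacholia noted, the runtime swap happens beneath this layer, so the user code itself is unchanged.

```python
import tensorflow as tf

# Eager execution: each API call dispatches an op to the runtime immediately.
x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
y = tf.matmul(x, x) + 1.0           # runs op by op, result available at once
print(y.numpy())

# Graph execution: tf.function traces the Python code into a computational
# graph, which is optimised and handed to the runtime as a whole program.
@tf.function
def model(a):
    return tf.matmul(a, a) + 1.0

print(model(x).numpy())             # first call traces; later calls reuse the graph
```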

Wrapping Up

As part of a benchmarking study for TensorFlow Dev Summit 2020, developers at the tech giant integrated TFRT with TensorFlow Serving and measured the latency of sending requests to the model and getting prediction results back. Comparing the performance of GPU inference over TFRT with the current runtime, they observed a 28% improvement in average inference time. They stated that the early results are strong validation for the new runtime, which is expected to provide a significant boost to performance. The project has been made available on GitHub.
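
A client-side measurement along these lines can be reproduced against any TensorFlow Serving instance over its standard REST API. The sketch below times repeated predict requests; the host, model name ("my_model") and input payload are placeholder assumptions, not the team's actual benchmark harness.

```python
import json
import time
import urllib.request

# Hypothetical endpoint: a TensorFlow Serving instance hosting a model named
# "my_model" on the default REST port 8501.
URL = "http://localhost:8501/v1/models/my_model:predict"
payload = json.dumps({"instances": [[0.0, 0.0, 0.0]]}).encode("utf-8")

latencies = []
for _ in range(100):
    start = time.perf_counter()
    req = urllib.request.Request(
        URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()                       # prediction results
    latencies.append(time.perf_counter() - start)

print(f"mean latency: {sum(latencies) / len(latencies) * 1000:.2f} ms")
```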

According to the developers at TensorFlow, the new runtime will benefit three groups: researchers looking for faster iteration time and better error reporting when developing complex new models in eager mode; application developers looking for improved performance when training and serving models in production; and hardware makers looking to integrate edge and datacenter devices into TensorFlow in a modular way.

Ambika Choudhury
A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.
