What Are Google Cloud TPU VMs?

Google recently announced new Cloud TPU virtual machines, or TPU VMs, which give developers and researchers direct access to TPU host machines, where they can use Google's industry-leading TPU hardware to build machine learning models. Google is offering a new and improved user experience for developing and deploying TensorFlow, JAX, and PyTorch on Cloud TPUs.

Here are some of the highlights of Google Cloud TPU VMs: 

  • Write and debug a machine learning model seamlessly on a single TPU VM, then scale it up on a Cloud TPU Pod to take advantage of the super-fast TPU interconnect. 
  • Get direct access to every TPU VM you create, so you can install and run any code you wish in a tight loop with the TPU accelerators. 
  • Use local storage, execute custom code in your input pipelines, and integrate Cloud TPUs into your research and production workflows more efficiently. 
  • Build your own integrations via the new ‘libtpu’ shared library on the VM, in addition to the supported TensorFlow, PyTorch, and JAX integrations on Cloud TPU. 
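As a hedged sketch of the framework access described above (the function name is illustrative, not from the article): once logged into a TPU VM with Google's JAX wheel installed on top of `libtpu`, you might list the accelerator platforms the framework sees. On a machine without TPUs, JAX falls back to CPU, so the same snippet runs anywhere.

```python
# Illustrative sketch: list the accelerator platforms a framework sees on a TPU VM.
# Assumes JAX is installed; on hosts without a TPU, JAX falls back to CPU.
def list_accelerators():
    try:
        import jax
        # Each device reports its platform, e.g. 'tpu' on a TPU VM, 'cpu' elsewhere.
        return [d.platform for d in jax.devices()]
    except ImportError:
        # JAX is not installed on this machine.
        return []

print(list_accelerators())
```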

Aidan Gomez, co-founder and CEO of Cohere, said direct access to TPU VMs has changed how his team builds machine learning models on TPUs, adding that it has significantly enhanced both the developer experience and model performance. 

Cloud TPU architecture 

“Until now, you could only access Cloud TPU remotely. Typically, you would create one or more VMs that would then communicate with Cloud TPU host machines over the network using gRPC,” explained Google in its blog post. 

gRPC (gRPC Remote Procedure Calls) is a high-performance, open-source, universal RPC framework that can run in any environment. 

[Figure: Previous network-attached Cloud TPU system architecture]
(Source: Google) 

However, with the launch of Cloud TPU VMs, developers and researchers can now run their code directly on the TPU host machines that are physically attached to the TPU accelerators. The new setup is pictured below. 

[Figure: New Cloud TPU VM system architecture]
(Source: Google Cloud) 

Google claimed its new Cloud TPU system architecture is simpler and more flexible. Besides the cost benefits, developers gain performance because their code no longer needs to make round trips across the datacenter network to reach the TPUs. 

“If you previously needed a fleet of powerful Compute Engine VMs to feed data to remote hosts in a Cloud TPU Pod slice, you can now run that code directly on the Cloud TPU host systems and eliminate the need for additional Compute Engine VMs,” said Google.
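The performance argument can be illustrated with a toy model (purely illustrative; the latency figure and function names are mine, not Google's): in the old architecture every training step pays a network round trip to the remote TPU host, while in the new one the input pipeline runs on the host itself.

```python
import time

def preprocess(batch):
    # Stand-in for an input pipeline step (decoding, augmentation, etc.).
    return [x * 2 for x in batch]

ROUND_TRIP_S = 0.001  # illustrative gRPC round-trip latency per step

def remote_step(batch):
    """Old architecture: data crosses the datacenter network via gRPC."""
    time.sleep(ROUND_TRIP_S)  # simulated network hop to the TPU host
    return sum(preprocess(batch))

def local_step(batch):
    """New architecture: the same code runs directly on the TPU host."""
    return sum(preprocess(batch))

# Both compute the same result; only the per-step latency differs.
start = time.perf_counter()
for _ in range(100):
    remote_step([1, 2, 3])
remote_time = time.perf_counter() - start

start = time.perf_counter()
for _ in range(100):
    local_step([1, 2, 3])
local_time = time.perf_counter() - start

print(f"remote: {remote_time:.3f}s, local: {local_time:.3f}s")
```

Over many steps per second of a real training loop, that per-step hop is exactly the overhead Google says the new architecture removes.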

Customer feedback

Since October last year, Google has given early access to a select few customers, alongside several teams of researchers and engineers. 

Patrick von Platen, a research engineer at Hugging Face, said they recently integrated JAX, alongside TensorFlow and PyTorch, into their Transformers library. This has enabled the natural language processing (NLP) community to effectively train popular NLP models like BERT on Cloud TPU VMs.

Further, Platen believes easy access to Cloud TPU VMs will make pre-training of large language models possible for a much larger audience, including small startups and educational institutions. 

Hugging Face is an open-source provider of NLP technologies and the creator of the popular Transformers library.

The Sandbox@Alphabet team has also adopted TPUs for classical simulations of quantum computers and for large-scale quantum chemistry computations. 

Shrestha Basu Mallick, a product manager at the Sandbox@Alphabet team, said, “Thanks to Google Cloud TPU VMs, and the ability to scale from one to 2048 TPU cores, our team has built the most powerful classical simulator of quantum circuits. The simulator can evolve a wave function of 40 qubits, which entails manipulating one trillion complex amplitudes! Scalability has been key to enabling our team to perform quantum chemistry computations of huge molecules, with up to 500k orbitals.” 

Cloud TPU VMs are currently available for preview in the US and Europe regions, priced at $1.35 per hour per TPU host machine. Customers can use single Cloud TPU devices and Cloud TPU pod slices and choose TPU v2 or TPU v3 accelerator hardware.  
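At the quoted preview price, a quick back-of-the-envelope estimate is straightforward (my arithmetic and helper name, not a Google pricing tool):

```python
PRICE_PER_HOST_HOUR = 1.35  # USD per TPU host machine, preview pricing quoted above

def estimate_cost(hosts, hours):
    # Simple linear estimate; ignores any discounts or per-region variation.
    return PRICE_PER_HOST_HOUR * hosts * hours

# e.g. one TPU host for a 24-hour training run:
print(f"${estimate_cost(1, 24):.2f}")  # → $32.40
```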

Amit Raja Naik
Amit Raja Naik is a seasoned technology journalist who covers everything from data science to machine learning and artificial intelligence for Analytics India Magazine, where he examines the trends, challenges, ideas, and transformations across the industry.
