
The Era Of “CPU For All Tasks” Has Long Ended


The era of using a CPU for all computational tasks is long gone. At Google I/O 2018, the search giant announced a new generation of its Tensor Processing Unit (TPU), which will help turbocharge its various products. The TPUs will also be available on Google Cloud Platform for enterprises and machine learning researchers for a fee. Sundar Pichai, the CEO of Google, announced that the new TPUs are at least eight times more powerful than the previous edition and deliver up to 100 petaflops of performance. Reportedly, he also underscored that high-performance ML is a major differentiator for Google, and that this holds true for its Google Cloud customers as well.

When tech companies realised that Moore’s Law could no longer be counted on, they set out to build their own computing chips. Google, along with other tech giants like Microsoft and Tesla, has started designing its own silicon. Yann LeCun, Facebook’s AI chief, has said, “The amount of infrastructure if we use the current type of CPU is just going to be overwhelming.” LeCun went on to warn chip vendors, saying, “If they don’t, then we’ll have to go with an industry partner who will build hardware to specs, or we’ll build our own.” Google is also giving out free trial accounts, and the Google Cloud-hosted TPU v3 is available in alpha.

History And The Need For Tensor Processing Units

TPUs are application-specific integrated circuit (ASIC) chips custom-built for computationally heavy ML tasks. They were designed and deployed by an internal team at Google and, as mentioned earlier, have been used in multiple Google products. They are designed around how neural networks and other deep learning algorithms actually work, and they also support distributed training by leveraging high-performance computing technologies. John Leroy Hennessy, the legendary computer scientist known for his work in computer architecture, believes that we live in an era where we need to create task-specific chips.


Deep Learning And CPUs

Deep learning builds on the concept of neural networks, where neurons are arranged in many layers. There are three distinct groups of layers: the input layer, the hidden layers and the output layer. A “weight” is associated with each connection between neurons, and the crux of learning in a neural network is finding the values of these weights. The number of weights, or “parameters”, that need to be learnt often runs into the millions or even billions.
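To get a feel for how quickly these parameters add up, here is a minimal sketch (the layer sizes are purely hypothetical, not taken from any particular Google model) that counts the weights and biases of a small fully connected network:

```python
# Hypothetical illustration: counting the weights ("parameters") of a small
# fully connected network with an input layer, two hidden layers and an
# output layer. The layer sizes below are made up for illustration only.
layer_sizes = [784, 2048, 2048, 10]  # input -> hidden -> hidden -> output

total_params = 0
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    weights = n_in * n_out   # one weight per connection between the two layers
    biases = n_out           # one bias per neuron in the receiving layer
    total_params += weights + biases

print(f"Total parameters: {total_params:,}")  # ~5.8 million for this toy network
```

Even this toy network has several million parameters; production models are orders of magnitude larger, which is what makes the choice of hardware so consequential.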

For decades, ML researchers ran such large computations on CPUs. CPUs are general-purpose: they can do simple arithmetic, handle database entries, and even run the control systems of an aircraft. But precisely because they must handle so many different kinds of tasks, they are not designed to exploit the structure of any one of them. A neural network follows a well-defined algorithm and its future computations are highly predictable, yet a CPU, by virtue of its fundamental design, cannot take advantage of that predictability and therefore fails to deliver the needed speed-up.
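To see why this is slow, consider a plain scalar matrix multiplication of the kind a general-purpose processor executes one step at a time (a generic sketch, not code from any CPU vendor or from Google): every single multiply-add fetches its operands from memory and writes its partial result back before the next one can proceed.

```python
# Generic sketch of a scalar matrix multiply as a CPU would execute it:
# one multiply-add at a time, with every operand read from and every
# partial result written back through the memory system.
def naive_matmul(a, b):
    rows, inner, cols = len(a), len(b), len(b[0])
    c = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                # Each iteration: load a[i][k], load b[k][j], load c[i][j],
                # multiply, add, store c[i][j] -- memory traffic dominates.
                c[i][j] += a[i][k] * b[k][j]
    return c

a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
print(naive_matmul(a, b))  # [[19.0, 22.0], [43.0, 50.0]]
```

Scale this up to the billions of multiply-adds a neural network needs and the constant trips to memory, rather than the arithmetic itself, become the limiting factor.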

 

Deep Learning And TPUs

Compare this to a TPU, which is purpose-built for ML applications in which billions of parameters need to be calculated. Google calls this a “domain-specific architecture”: the general-purpose CPU is replaced with a matrix processor built only for neural network workloads. In contrast to CPUs, TPUs cannot run multiple applications. They cannot run word processors or other everyday software, but they do one thing extremely well — they run the billions of massive multiplications and additions that neural networks demand, with an emphasis on power efficiency. Because the TPU does only this one thing, it knows exactly what operations it will receive in the next instruction, which lets it sidestep the von Neumann bottleneck.


Google says, “…We were able to place thousands of multipliers and adders and connect them to each other directly to form a large physical matrix of those operators. This is called systolic array architecture. In case of Cloud TPU v2, there are two systolic arrays of 128 x 128, aggregating 32,768 ALUs for 16-bit floating point values in a single processor.”
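The figure in the quote follows directly from the array dimensions: two systolic arrays of 128 × 128 multiply-accumulate units give 2 × 128 × 128 = 32,768 ALUs.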

The TPU first loads the parameters (weights) from memory into this matrix of multipliers and adders. It then reads the input data from memory just once. As execution proceeds, each value is passed down to the next multiplier, which performs its multiplication and adds the result to the running sum, so the output that emerges is the sum of all the products of data and parameters. No memory access is needed during this whole process, which is what delivers the massive improvement in performance.
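A rough way to picture this dataflow in software (a simplified, hypothetical sketch of a weight-stationary systolic-style multiply-accumulate, not Google’s actual hardware design): the weights stay fixed in the array, the inputs flow through it, and the partial sums are handed from one stage to the next without ever going back to memory in between.

```python
import numpy as np

# Simplified, hypothetical sketch of a weight-stationary systolic-style
# matrix multiply. The weights are loaded into the "array" once; each input
# row then streams through, and partial sums are accumulated as they pass
# from one multiply-accumulate stage to the next, with no intermediate
# memory round-trips.
def systolic_style_matmul(x, w):
    n, k = x.shape           # n input rows, k activations per row
    k2, m = w.shape          # k-by-m weight matrix held stationary in the array
    assert k == k2
    y = np.zeros((n, m))
    for row in range(n):             # stream one input vector at a time
        partial = np.zeros(m)        # partial sums travelling through the array
        for i in range(k):           # step through the array, stage by stage
            # Stage i multiplies the passing activation x[row, i] by its
            # stationary weights w[i, :] and adds the products to the
            # moving partial sums.
            partial += x[row, i] * w[i, :]
        y[row] = partial             # the finished sums emerge at the far end
    return y

x = np.random.rand(4, 128)
w = np.random.rand(128, 16)
print(np.allclose(systolic_style_matmul(x, w), x @ w))  # True
```

The sketch only mimics the arithmetic order; in the real chip those stages are physical multiplier-adder cells wired directly to their neighbours, which is exactly why the intermediate results never touch memory.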

Conclusion

The achievements of the TPU underline the impact and importance of “domain-specific architectures”. Apart from other resources, TPUs are capable of saving money for enterprises. Stanford University’s DAWNBench contest, which closed in April 2018, reported that the lowest training cost on non-TPU processors was $72.40 (for training a ResNet-50 to 93% accuracy on ImageNet using spot instances). The same training on a Google Cloud TPU v2 cost around $12.87 — roughly $12.87 / $72.40 ≈ 18 percent, or less than one-fifth of the non-TPU cost. This is the power of domain-specific architecture for you. Hats off, Google.

Abhijeet Katte

As a thorough data geek, most of Abhijeet's day is spent in building and writing about intelligent systems. He also has deep interests in philosophy, economics and literature.