An AI accelerator is a class of specialised hardware accelerator or computer system designed to speed up artificial intelligence applications, particularly artificial neural networks, machine learning, robotics, and other data-intensive or sensor-driven tasks. These accelerators often use novel designs and typically focus on low-precision arithmetic, novel dataflow architectures, or in-memory computing.
As deep learning and artificial intelligence workloads grew in prominence over the last decade, specialised hardware units were designed, or adapted from existing products, to accelerate these tasks and to provide parallel, high-throughput systems for workstations targeting various applications, including neural network simulations. As of 2018, a typical AI integrated circuit chip contains billions of MOSFETs.
Hardware acceleration has many advantages, the main one being speed. Accelerators can greatly reduce the time it takes to train and run an AI model, and can make practical certain AI-based tasks that would be too slow to run on a CPU. Here, we look at the most popular hardware AI accelerators.
Graphics Processing Unit (GPU)
A graphics processing unit (GPU) is a specialised chip built for rapid processing, originally for rendering images. GPUs are used in hyperscale data centres as accelerators that speed up all sorts of tasks, from encryption to networking to AI. They have sparked an AI boom, become a key part of modern supercomputers, and continue to drive advances in gaming and professional graphics.
Modern GPUs are very efficient at computer graphics and image processing. Their highly parallel structure makes them more effective than general-purpose central processing units (CPUs) for algorithms that process large blocks of data in parallel. Multiple GPUs are used in supercomputers; in workstations for 3D rendering, VFX, simulations, and processing several video streams at once; and in AI for training workloads. In contrast to a CPU, which has a small number of powerful cores, an NVIDIA GPU, for example, contains many CUDA cores, each of which is a small processor that can execute some code.
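The data-parallel model described above can be sketched in plain Python. This is only an illustration: the names saxpy_kernel and launch are hypothetical, and the loop runs sequentially, whereas a real GPU would run the per-element calls across its cores at the same time.

```python
def saxpy_kernel(i, a, x, y, out):
    """Work done by one hypothetical 'thread': a single element of a*x + y."""
    out[i] = a * x[i] + y[i]


def launch(kernel, n, *args):
    """Stand-in for a GPU launch (assumption: not a real GPU API).

    A real device would execute these n independent calls in parallel.
    Here they run one after another; the result is the same because
    no call depends on another.
    """
    for i in range(n):
        kernel(i, *args)


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * len(x)

launch(saxpy_kernel, len(x), 2.0, x, y, out)
print(out)  # [12.0, 24.0, 36.0, 48.0]
```

The key property is that each call touches only its own index, which is what lets thousands of small cores work on the same problem without coordinating.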
Vision Processing Unit (VPU)
A vision processing unit (VPU) is an emerging class of microprocessor and a specific type of AI accelerator designed to speed up machine vision tasks. VPUs are reported to be well suited to running various kinds of machine vision algorithms. These devices may include dedicated resources for capturing visual data from cameras and are built for parallel processing. Some are low-power, high-performance parts that can be plugged into interfaces that allow programmable use.
Vision processing units are suited to machine vision algorithms such as convolutional neural networks (CNNs) and the scale-invariant feature transform (SIFT). They may include direct interfaces to take data from cameras (bypassing any off-chip buffers) and place greater emphasis on on-chip dataflow between many parallel execution units.
The factors driving VPU adoption include the growing use of smartphones, the rise of edge AI, and expanding demand for advanced computing capacity for computer vision. One example is Intel’s Movidius Myriad X VPU, which is used in many edge devices. Target markets include robotics, the internet of things, new classes of AR/VR devices, and machine vision acceleration in smartphones and other mobile devices.
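To make the CNN workload concrete, here is a minimal sketch of the sliding-window operation at the heart of a convolutional layer, written in plain Python. The function name conv2d, the tiny image and the edge_kernel filter are illustrative assumptions; a VPU would compute the many output positions in parallel rather than in nested loops.

```python
def conv2d(image, kernel):
    """Valid (no padding, stride 1) 2D cross-correlation, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0] * out_w for _ in range(out_h)]
    for r in range(out_h):
        for c in range(out_w):
            # Each output value is an independent multiply-accumulate over
            # one window of the image, so all positions can run in parallel.
            out[r][c] = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
    return out


# A 3x4 image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
edge_kernel = [[-1, 1]]  # responds where brightness increases left to right

result = conv2d(image, edge_kernel)
print(result)  # [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
```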
Field-Programmable Gate Array (FPGA)
A field-programmable gate array (FPGA) is an integrated circuit (IC) designed to be configured by a customer or designer after manufacturing, hence the term “field-programmable”. FPGAs contain an array of programmable logic blocks and a hierarchy of reconfigurable interconnects that allow the blocks to be wired together, like many logic gates that can be inter-wired in different configurations.
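One way to picture a programmable logic block is as a small lookup table whose contents decide which logic function it computes. The sketch below assumes a hypothetical 2-input lookup table (the class LUT2 and the half_adder wiring are illustrative, not a vendor API); real FPGA blocks are larger and are configured through vendor toolchains.

```python
class LUT2:
    """Hypothetical 2-input lookup table: four configuration bits define its function."""

    def __init__(self, truth_table):
        if len(truth_table) != 4:
            raise ValueError("a 2-input LUT needs exactly 4 configuration bits")
        self.table = list(truth_table)

    def __call__(self, a, b):
        # The two input bits form an index into the configuration bits.
        return self.table[(a << 1) | b]


# "Programming" the device: the same kind of block becomes a different gate
# depending on the bits loaded into it.
XOR = LUT2([0, 1, 1, 0])
AND = LUT2([0, 0, 0, 1])


def half_adder(a, b):
    """Wire two configured blocks together; returns (sum, carry)."""
    return XOR(a, b), AND(a, b)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, half_adder(a, b))
```

Loading different bits into the same block, for example [0, 1, 1, 1] for OR, changes the circuit without changing the hardware, which is the sense in which the device is reconfigurable.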
FPGAs can have an advantage over GPUs in interface flexibility, which is enhanced by integrating programmable logic with CPUs and standard peripherals. GPUs, by contrast, are optimised for parallel processing of floating-point operations using thousands of small cores. They also offer high processing capability with good power efficiency. FPGAs, which can perform a wide range of logical functions simultaneously, are sometimes considered less suitable for emerging technologies such as self-driving cars or deep learning applications.
Today’s FPGAs have large resources of logic gates and RAM blocks for implementing complex computations. Because they are programmable, FPGAs fit many different markets: they can be reprogrammed after manufacturing to meet the desired application or functionality requirements. This distinguishes FPGAs from application-specific integrated circuits (ASICs), which are custom manufactured for specific design tasks.
FPGAs are increasingly used to accelerate AI workloads in data centres for tasks such as machine learning inference. Hardware companies such as Xilinx have launched FPGA-based data centre accelerator cards to meet growing business demand for heterogeneous architectures and higher performance as customers run more AI workloads.
Application-Specific Integrated Circuit (ASIC)
Application-specific integrated circuits (ASICs) are a category of AI hardware accelerator that is gaining prominence. ASICs employ strategies such as optimised memory use and lower-precision arithmetic to accelerate calculation and increase computational throughput. Low-precision floating-point formats adopted for AI acceleration include half-precision and bfloat16. Hardware acceleration is used to speed up the computing processes present in an AI workflow.
For example, Intel released Nervana, an ASIC for inference that supports a large amount of parallelisation in server settings. Intel also revamped the chip structure considerably and built it on a 10nm manufacturing process. As with other accelerators, the main advantage of ASICs is speed: they can reduce the time it takes to train and run an AI model, and can also be used to execute specialised AI-based tasks.
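To show what lower precision means in practice, the sketch below converts a 32-bit float to bfloat16 by keeping only the top 16 bits of its bit pattern (1 sign bit, 8 exponent bits, 7 mantissa bits). The helper names are hypothetical, and plain truncation is a simplifying assumption: real hardware typically rounds to nearest instead.

```python
import struct


def to_bfloat16_bits(value):
    """Return the 16-bit bfloat16 pattern for value, by truncating float32.

    Simplification: the low 16 mantissa bits are dropped, not rounded.
    """
    (bits32,) = struct.unpack(">I", struct.pack(">f", value))
    return bits32 >> 16


def from_bfloat16_bits(bits16):
    """Expand a bfloat16 pattern back to a Python float by zero-filling."""
    (value,) = struct.unpack(">f", struct.pack(">I", bits16 << 16))
    return value


def roundtrip(value):
    """Value as it would be seen after storage in (truncated) bfloat16."""
    return from_bfloat16_bits(to_bfloat16_bits(value))


print(hex(to_bfloat16_bits(1.0)))  # 0x3f80
print(roundtrip(3.14159))          # 3.140625
```

Because bfloat16 keeps the full 8-bit exponent of float32, it covers the same numeric range and gives up only precision, whereas half-precision uses a 5-bit exponent and a 10-bit mantissa.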
Tensor Processing Unit (TPU)
A tensor processing unit (TPU) is a specialised circuit that implements the control and arithmetic logic needed to execute machine learning algorithms, typically by operating on predictive models such as artificial neural networks (ANNs) or random forests (RFs).
Google announced TPUs in 2016. Unlike GPUs, TPUs are custom-designed to handle operations such as the matrix multiplications used in neural network training. Google TPUs are available in two forms: Cloud TPU and Edge TPU. Cloud TPUs can be accessed from a Google Colab notebook, which gives users access to TPU hardware hosted in Google’s data centres. The Edge TPU, on the other hand, is a custom-built chip, available in development kits, that can be used to create specific applications on edge devices.
Tensors are multi-dimensional arrays, of which matrices are the two-dimensional case; they are the fundamental units that hold data such as the weights of a neural network’s nodes in row-and-column format. Basic calculation operations are performed on tensors. TPUs were used in DeepMind’s well-known AlphaGo, where an AI beat one of the world’s best Go players. They were also used in the AlphaZero system, which produced chess, shogi and Go-playing programs.
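As a small illustration of the matrix multiplication that TPUs are built to accelerate, the sketch below stores a layer's weights as a two-dimensional tensor (a list of rows) and multiplies it by an input. The matmul helper and the example values are assumptions made for illustration; a TPU performs the same multiply-accumulate pattern in dedicated hardware rather than in Python loops.

```python
def matmul(a, b):
    """Multiply matrix a (m x k) by matrix b (k x n), both given as lists of rows."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


# One input sample with two features.
inputs = [[1.0, 2.0]]

# Weights connecting 2 inputs to 3 nodes: one row per input, one column per node.
weights = [
    [0.5, -1.0, 0.0],
    [0.25, 1.0, 2.0],
]

activations = matmul(inputs, weights)
print(activations)  # [[1.0, 1.0, 4.0]]
```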