GPUs (graphics processing units) are chips built for parallel processing. Originally developed to accelerate 3D rendering in video games, GPUs have found major applications in deep learning, artificial intelligence and high-performance computing.
GPUs have become more flexible and programmable over time and have emerged as the architecture of choice for deep learning models: hundreds of simple GPU cores work in parallel to cut the time required for intensive computations.
In 2011, NVIDIA and Google collaborated on a study in which a computer vision model was trained on both CPUs and GPUs to distinguish cats from people. It took 2,000 CPUs to match the performance of 12 GPUs.
That said, we can’t write off CPUs for deep learning. Here is why.
CPUs for deep learning
GPUs cannot function as standalone chips and can only perform a limited set of operations. Because their cache memory is small, the bulk of the data has to be stored off-chip, leading to a lot of back and forth for data retrieval. The resulting bottleneck caps the speed at which GPUs can run deep learning algorithms.
While parallelism makes GPUs a good choice for deep learning, CPUs offer unique advantages. For example, task optimisation is much easier on a CPU than on a GPU. Though CPUs have fewer cores, each core is powerful and can carry out different instructions independently (MIMD architecture). GPU cores, by contrast, are organised in blocks of 32 that execute the same instruction in parallel. Parallelising dense networks under that constraint is highly complicated, so complex optimisation techniques are more difficult to implement on a GPU than on a CPU. Further, GPUs consume more power than CPUs.
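One way to see why lockstep execution complicates optimisation is a toy cost model of branching. The sketch below is not a GPU simulator: the function name `lockstep_steps` and the step costs are hypothetical, and it simply assumes that a group of 32 lanes must run every branch that any one of its lanes takes.

```python
from typing import List

GROUP_SIZE = 32  # block size mentioned above; all lanes in a group run in lockstep


def lockstep_steps(takes_if: List[bool], cost_if: int, cost_else: int) -> int:
    """Count steps when each group of GROUP_SIZE lanes shares one instruction stream.

    If any lane in a group takes the `if` path, the whole group spends cost_if
    steps on it; if any lane takes the `else` path, the whole group also spends
    cost_else steps. Groups are summed one after another (a simplification).
    """
    steps = 0
    for start in range(0, len(takes_if), GROUP_SIZE):
        group = takes_if[start:start + GROUP_SIZE]
        if any(group):
            steps += cost_if
        if not all(group):
            steps += cost_else
    return steps


# Every lane takes the same branch: the group pays for one path only.
uniform = [True] * GROUP_SIZE
# Lanes alternate between branches: the group pays for both paths.
divergent = [lane % 2 == 0 for lane in range(GROUP_SIZE)]

print(lockstep_steps(uniform, cost_if=10, cost_else=10))    # 10
print(lockstep_steps(divergent, cost_if=10, cost_else=10))  # 20
```

In this model, data-dependent branching doubles the cost of a group, whereas independent CPU cores would each follow only their own path. Real hardware is more nuanced, so treat the numbers as illustrative only.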
US-based AI startup Neural Magic’s suite of products allows clients to deploy deep learning models without specialised hardware, making AI more accessible and lowering costs. MIT professor Nir Shavit was working on research to reconstruct a brain map of a mouse when he hit on the idea. Having no knowledge of GPUs, Shavit had to opt for CPUs to execute the deep learning part of the research. “I realised that a CPU can do what a GPU does, if programmed in the right way,” he said. He parlayed the idea into a business, and Neural Magic was born.
In 2020, a group of researchers proposed the Sub-Linear Deep Learning Engine (SLIDE), which blends randomised algorithms with multi-core parallelism and workload optimisation and runs on a single CPU. The researchers found that SLIDE outperformed an optimised implementation of TensorFlow on the best GPU available. With fully connected architectures, training with SLIDE on a 44-core CPU was more than 3.5 times faster (1 hour vs 3.5 hours) than training the same network with TensorFlow on an NVIDIA Tesla V100 GPU. On the same CPU hardware, SLIDE was over ten times faster than TensorFlow.
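The randomised algorithms in SLIDE rely on locality-sensitive hashing (LSH) to pick out a small set of neurons likely to have large activations, so that only those are computed. The sketch below illustrates that general idea with SimHash (signed random projections) in plain Python. It is not the SLIDE implementation: the function names, the single hash table and the layer sizes are all assumptions made for illustration.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def simhash(v: Vector, planes: List[Vector]) -> Tuple[int, ...]:
    """One bit per random hyperplane: which side of the plane v falls on."""
    return tuple(1 if dot(v, p) >= 0 else 0 for p in planes)


def build_table(weights: List[Vector], planes: List[Vector]) -> Dict[Tuple[int, ...], List[int]]:
    """Bucket neuron ids by the hash of their weight vectors (done ahead of time)."""
    table: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for neuron_id, w in enumerate(weights):
        table[simhash(w, planes)].append(neuron_id)
    return table


def sparse_forward(x: Vector, weights: List[Vector], planes: List[Vector],
                   table: Dict[Tuple[int, ...], List[int]]) -> Dict[int, float]:
    """Compute activations only for neurons that share the input's bucket."""
    active = table.get(simhash(x, planes), [])
    return {n: dot(weights[n], x) for n in active}


# Hypothetical layer: 1,000 neurons, 16 inputs, 6 hash bits.
rng = random.Random(0)
dim, n_neurons, n_bits = 16, 1000, 6
planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]
weights = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_neurons)]
table = build_table(weights, planes)

x = [rng.gauss(0, 1) for _ in range(dim)]
activations = sparse_forward(x, weights, planes, table)
print(f"computed {len(activations)} of {n_neurons} neurons")
```

Vectors pointing in similar directions tend to share a bucket, so the neurons retrieved are biased towards those with large dot products against the input. The work per input then depends on the bucket size rather than the layer width, which is irregular, branch-heavy computation of the kind that suits CPU cores.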
CPUs offer reasonable speeds on a range of applications. Easy availability, software support and portability make CPUs a good choice for DL applications. Even companies like Amazon, Facebook, Google, Microsoft, and Samsung are benchmarking and optimising deep learning on CPUs.
However, optimising deep learning applications on CPUs requires carefully matching the strengths of CPUs to the architectural characteristics of the application. Systems at different scales (mobile vs data-centre, single-node vs multi-node) have different properties and challenges, as do different deep learning algorithms and applications (such as CNNs and RNNs) and the inference and training phases of DL.
In 2019, AI company PQ Labs introduced MagicNet, which it said ran deep learning applications on a CPU 199 times faster than on a GPU. Researchers showed MagicNet running at 718 fps on an Intel i7 and Tiny YOLO running at 292 fps on an NVIDIA TITAN X or 1080 Ti graphics card, with both achieving the same accuracy.
Of late, Israel-based deep learning startup Deci has achieved breakthrough performance using CPUs. The firm’s image classification models, called DeciNets, are used on Intel Cascade Lake CPUs.