GPUs were introduced in the late 1990s, originally to accelerate 3D rendering in video games. Over the years, their scope has expanded, and they are now used in domains such as deep learning, artificial intelligence, and other high-performance applications. However, a major limitation of specialised hardware such as GPUs or TPUs is the complexity it introduces into hardware and software infrastructures as “added devices,” complexity that ultimately translates into cost.
What if there was a way to use good old commodity CPUs while delivering the speed and efficiency of a GPU?
Neural Magic, founded by Nir Shavit and Alexander Matveev, is attempting to do just that. Analytics India Magazine recently interviewed Nir Shavit, who is also an MIT professor. Prof Shavit is a co-author of the book The Art of Multiprocessor Programming. He won the 2004 Gödel Prize in theoretical computer science for his work on applying tools from algebraic topology to model shared memory computability and the 2012 Dijkstra Prize in Distributed Computing for the introduction of Software Transactional Memory.
CPU vs GPU
“My primary field of work has been multicore processing. Back then, I was working on a project with Harvard University in computational neurobiology. For this project, we had to run machine learning algorithms at very large scales to extract neural structures in mouse brains from electron microscopy images. Being multicore guys, we decided to attempt designing the machine learning algorithms first on CPUs instead of GPUs, and later move them to GPUs. We observed that our algorithms, running on a multicore CPU, delivered the same performance as the best GPU at that time, the Pascal,” said Prof Shavit.
This experiment gave birth to the idea for Neural Magic – a company that would deliver GPU speeds for running neural networks on commodity CPUs. Parallel computing, which GPUs offer, is widely considered the best way to execute large numbers of instructions.
Drawing further on the analogy with the human brain, Prof Shavit said, “If we think that AI will take the path of mimicking human intelligence, we would have to focus on little compute and more memory. Our brain really is a device that uses specific areas for different tasks at different times. If all your brain was running at once, you could fry an egg on it — it would get so hot! Fortunately, our brain functions sparsely. Similarly, we have these big models that run through all the layers of the network at the same time; it is ridiculous.
“This is not sustainable, and soon we will have large models, but different parts of it will fire at different times. This is where we are headed, and that’s my prediction. Devices like CPUs are very good at caching while having a very large memory, and such devices are best suited for machine learning. I’m not saying that the CPU today has all the features that it needs but from an architectural point of view, I think it’s better suited for the future than 1000s of tiny cores, all running in parallel in synchronous mode. In this context, it is important to talk about Google’s Pathways model performing tasks on part of the model while remaining sparse in terms of computation. The models will get even more sparse.”
Future of ML is hardware agnostic
The GPU architecture was originally designed to accelerate image processing. In this application, parallelism is very important, as developers need to render images in a short amount of time. But when it comes to machine learning applications, Prof Shavit said that GPUs offer only a marginal advantage over CPUs. While hardware is important in machine learning, more efficient software makes a model superior. “Algorithms have many more orders of magnitude of freedom in which to look for better solutions than hardware. Once you design a piece of hardware, you have to now fit the software onto it. You are, in a way, committed to that piece of hardware. But if we use a general-purpose CPU, we can circumvent this challenge,” said Prof Shavit. Interestingly, he cites the work of NVIDIA and calls it a great software company, despite its reputation as one of the leading chipmakers in the world.
What is next for Neural Magic?
Prof Shavit, citing Marc Andreessen of Andreessen Horowitz (the Silicon Valley-based VC firm that has invested in Neural Magic), said, ‘software will eat the world’. He added that this trend offers the flexibility to simply download, install and run programs. This is the principle on which Prof Shavit’s company runs too. “As opposed to buying expensive GPUs for your machine learning tasks, you can go to our (Neural Magic) website and download software and run the inference immediately on commodity CPUs including the ones in your laptop. This makes a big difference, and with time, people are going to see its advantage,” he said.
Although the field is niche, Neural Magic does have competitors; what sets it apart from them is that its software delivers the same kind of performance as the accelerators. In addition, the tools offered by Neural Magic are flexible and easy to use.
Currently, the company is focused on building a community. All the software is open-sourced, except for the core engine. “We would like people to come, develop their models and do all the machine learning that they need on the Neural Magic Inference Engine,” said Prof Shavit. Neural Magic raised USD 30 million in its Series A funding round in October 2021, bringing the total raised to over USD 50 million.
Speaking about the company’s long-term goals, Prof Shavit described an upcoming wave of democratised software systems. “You need to be able to move your application, containerised or other, from place to place; it should be very easy to do that. This is where we’re headed. You don’t have to start renting or buying GPUs and installing them. You download the software, install, and you’re on your way. This is the vision towards which we are working,” he concluded.