“To resolve the efficiency challenge, AI software must communicate at a lower level with the hardware.” – Graphcore
For at least three decades, the AI community patiently waited for Moore’s law to catch up. With the advent of GPUs, TPUs and other exotic silicon supplements, research accelerated and machine learning models became more efficient. But this efficiency came at a cost: the models got larger. For instance, language models with billions of parameters, like GPT and BERT, outperformed other models. As the research moved from labs to enterprises, the heft of such models has become an issue. Smaller organisations have little option but to rely on pre-trained models or on licensed versions, as with OpenAI’s API, which gives access to its powerful GPT-3 model.
Hidden technical debt in ML
Data comes in different formats – images, video, text and tabular. A typical ML engineer spends significant time on “feature engineering”, and building a data integration pipeline is no small task. Additionally, velocity requirements (i.e., processing time or real-time low-latency demands) may call for big data techniques such as stream processing. This adds numerous challenges to building a large-scale deep learning system. Apart from extracting, transforming and loading (ETL) data, new distributed training algorithms might be needed. Deep learning techniques are not trivially parallelised and again need special supporting infrastructure.
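To make the “feature engineering” step concrete, here is a minimal sketch of turning raw tabular records into numeric feature vectors. The field names (`age`, `device`) and the scaling choices are illustrative assumptions, not from any particular pipeline; a production system would wrap many such steps in a managed ETL workflow.

```python
# A toy feature-engineering step for tabular data: scale a numeric field
# into [0, 1] and one-hot encode a categorical field. Field names and the
# value ranges below are illustrative assumptions.

def one_hot(value, categories):
    """Encode a categorical value as a one-hot vector."""
    return [1.0 if value == c else 0.0 for c in categories]

def min_max_scale(value, lo, hi):
    """Scale a numeric value into [0, 1]."""
    return (value - lo) / (hi - lo) if hi > lo else 0.0

def featurise(record, categories, age_range):
    """Turn one raw record into a flat numeric feature vector."""
    lo, hi = age_range
    return [min_max_scale(record["age"], lo, hi)] + one_hot(record["device"], categories)

raw = [
    {"age": 20, "device": "mobile"},
    {"age": 40, "device": "desktop"},
]
categories = ["mobile", "desktop", "tablet"]
features = [featurise(r, categories, (20, 40)) for r in raw]
print(features)  # [[0.0, 1.0, 0.0, 0.0], [1.0, 0.0, 1.0, 0.0]]
```

Multiply this by dozens of fields, several data formats and real-time latency budgets, and the scale of the pipeline-building effort becomes clear.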
Now, these processes have taken the automated route – AutoML. According to Peltarion, a no-code AI company, a critical difference between ML systems and non-ML systems is that data partly replaces code in an ML system: a learning algorithm automatically identifies patterns in the data instead of a programmer writing hard-coded rules.
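The “data replaces code” point can be shown in miniature. In the sketch below, a traditional rule is written by hand, while the ML version searches labelled data for the decision threshold; the data and the brute-force threshold search are invented for illustration only.

```python
# Contrast: a hand-written rule versus a rule learned from labelled data.
# The examples and threshold search below are illustrative assumptions.

def hard_coded_rule(score):
    # Traditional software: the threshold is chosen by a programmer.
    return score > 0.5

def learn_rule(examples):
    # ML system: search for the threshold that best fits the labelled data.
    best_t, best_correct = 0.0, -1
    for t in [x / 100 for x in range(101)]:
        correct = sum((score > t) == label for score, label in examples)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t

# Labelled examples: (score, is_positive)
data = [(0.1, False), (0.3, False), (0.7, True), (0.9, True)]
threshold = learn_rule(data)
print(threshold)  # 0.3 – the smallest threshold that separates the classes
```

Change the data and the learned rule changes with it; no code edit is needed, which is precisely what makes ML systems behave so differently from conventional software.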
Working with distributed data-processing systems such as Apache Spark, Distributed TensorFlow or TensorFlowOnSpark adds complexity. The cost of the associated hardware and software goes up too.
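The core idea behind synchronous data-parallel training, which frameworks like Distributed TensorFlow implement, can be sketched in a few lines: each “worker” computes a gradient on its own data shard, and the gradients are averaged before the weight update. The single-parameter model and the data below are toy assumptions; real systems add the network communication, scheduling and fault tolerance that make this hard.

```python
# Conceptual sketch of synchronous data-parallel training. Each "worker"
# holds one shard; gradients are averaged (an all-reduce in miniature)
# before the shared weight is updated. Model and data are toy assumptions.

def gradient(w, shard):
    # Gradient of mean squared error for the 1-D model y = w * x.
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def train_step(w, shards, lr=0.01):
    # Average the per-worker gradients, then apply one update.
    grads = [gradient(w, s) for s in shards]
    avg = sum(grads) / len(grads)
    return w - lr * avg

# Two workers, each holding a shard of data generated by y = 3x.
shards = [[(1, 3), (2, 6)], [(3, 9), (4, 12)]]
w = 0.0
for _ in range(200):
    w = train_step(w, shards)
print(round(w, 2))  # 3.0 – the workers jointly recover the true weight
```

Everything outside this loop – moving gradients between machines, keeping workers in sync, surviving node failures – is where the extra complexity and cost come from.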
Traditional software engineering typically assumes that hardware is at best a non-issue and at worst a static entity. In the context of machine learning, hardware performance directly translates to reduced training time. So, there is a great incentive for the software to follow the hardware development in lockstep.
“Because machine intelligence computing is so different, software has to work harder in AI and ML than it does in many other areas.” – Graphcore
Deep learning performance often scales directly with model size and the amount of data. As training times can be very long, there is a powerful motivation to maximise performance using the latest software and hardware. But changing the hardware and software can make results harder to reproduce, and keeping both up to date runs up significant engineering costs.
Building production-ready systems with deep learning components poses many challenges, especially if the company does not have a large research group and a highly developed supporting infrastructure. Recently, however, a new breed of startups has surfaced to address the software-hardware disconnect.
For Luis Ceze of OctoML, the biggest pain point is bridging the gap between data scientists and software engineers to deploy ML models effectively. According to Ceze, ML models are composed of high-level specifications of the model architecture, which need to be carefully translated into executable code, creating significant dependencies on frameworks like TensorFlow and PyTorch and on the code infrastructure.
With the growing set of hardware options such as GPUs, TPUs and other ML accelerators, the portability problem only worsens, as each of these hardware variants requires manual tuning of low-level code to achieve good performance. And that tuning has to be redone as models evolve. The largest tech companies solve this problem by throwing resources at it, but that is not a sustainable solution for them, nor a feasible one for most.
Apache TVM, for example, uses machine learning to optimise code generation. Since it cannot rely on human intuition and experience to pick the right parameters for model optimisation and code generation, it searches the parameter space efficiently by predicting how the hardware target would behave for each option.
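The flavour of that search can be illustrated with a deliberately simplified sketch: real on-device measurements are expensive, so a cost model predicts which candidate configuration is worth measuring next. The “hardware” below is a stand-in function, the nearest-neighbour cost model is a toy, and none of this is the actual TVM API – TVM’s learned cost models generalise far better.

```python
# Toy illustration of cost-model-guided tuning: spend scarce hardware
# measurements only on the candidates the model predicts to be fastest.
# The hardware, candidates and cost model are all illustrative assumptions.
import random

def measure_on_hardware(tile):
    # Stand-in for timing a compiled kernel on a device; we pretend a
    # loop tile size of 16 is optimal. A real run executes on hardware.
    return abs(tile - 16)

def predicted_cost(tile, history):
    # Toy cost model: predict the time of the nearest configuration
    # measured so far. Real cost models are learned from the history.
    return min(history, key=lambda rec: abs(rec[0] - tile))[1]

random.seed(0)
candidates = list(range(1, 65))           # e.g. candidate loop tile sizes
initial = random.sample(candidates, 8)    # a few real measurements to start
history = [(t, measure_on_hardware(t)) for t in initial]

for _ in range(8):
    measured = {t for t, _ in history}
    # Spend the next expensive measurement on the most promising candidate.
    guess = min((t for t in candidates if t not in measured),
                key=lambda t: predicted_cost(t, history))
    history.append((guess, measure_on_hardware(guess)))

best_tile = min(history, key=lambda rec: rec[1])[0]
print(best_tile)  # at least as close to the optimum as any initial sample
```

The payoff is that only 16 configurations are ever timed on the device instead of all 64 – exactly the economy that makes automated search practical across many hardware backends.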
According to Ceze, machine learning software stacks are significantly fragmented at the data science framework level (TensorFlow, PyTorch etc) and at the systems software level needed for production deployment, such as NVIDIA’s cuDNN.
There are no appropriate CI/CD integrations to keep up with model changes. OctoML’s open source solutions can make it easier for any ML developer to build models without being burdened by the hardware backend.

This whole subdomain of affordable, efficient ML deployment has gained traction of late. While some call it MLOps, others call it AIOps. Regardless of the moniker, the deep learning community has realised that the time is ripe to decouple software-hardware dependencies for progress; that is one reason companies like OctoML have been successful in attracting investors.

Firms like Graphcore have been bullish on this phenomenon. The team at Graphcore develops customised AI chips, and the word “customised” leans more towards the software end of the business. Dave Lacey, the chief architect at Graphcore, believes the best software not only makes AI processors much easier for developers to use but can also harness the full potential of the underlying hardware. “In future, the best AI chips will be those with the best software,” he said.