PyTorch, TensorFlow, Caffe2 and MXNet are among the most popular frameworks for deep learning models. Each framework has its own pros and cons. But what if we could combine the advantages of all these frameworks to optimise DL models? The Open Neural Network Exchange (ONNX) grew out of this idea.
What is the ONNX standard?
In September 2017, Microsoft and Facebook introduced the ONNX format — a standard for deep learning that enables models to be transferred between different frameworks. ONNX breaks the dependence between frameworks and hardware architectures. It has very quickly emerged as the default standard for portability and interoperability between deep learning frameworks.
Before ONNX, data scientists found it difficult to choose among the many AI frameworks available. Developers may prefer a certain framework at the outset of a project, during the research and development stage, but may require a completely different set of features for production. With no concrete solution to these problems, companies were forced to resort to creative and often cumbersome workarounds, including translating models by hand.
The ONNX standard aims to bridge this gap and enable AI developers to switch between frameworks based on the project’s current stage. Currently, the frameworks supported by ONNX include Caffe, Caffe2, Microsoft Cognitive Toolkit, MXNet and PyTorch. ONNX also offers connectors for other standard libraries and frameworks.
“ONNX is the first step toward an open ecosystem where AI developers can easily move between state-of-the-art tools and choose the combination that is best for them,” Facebook said in an earlier blog post. ONNX was designed specifically for the development of machine learning and deep learning models. It includes a definition of an extensible computation graph model, along with built-in operators and standard data types.
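To make the idea of a computation graph with built-in operators more concrete, here is a minimal sketch in plain Python. It is a toy illustration only, not the ONNX API or file format: the `Node` class, the `OPERATORS` registry and the `run_graph` function are hypothetical names invented for this example, and values are simple lists of floats rather than real tensors.

```python
# Toy illustration only: NOT the ONNX API or its file format.
# Node, OPERATORS and run_graph are hypothetical names for this sketch.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Node:
    op_type: str       # operator name, e.g. "Add"
    inputs: List[str]  # names of the values this node consumes
    output: str        # name of the value this node produces


# Registry of "built-in" operators; each works element-wise on lists of floats.
OPERATORS: Dict[str, Callable[..., List[float]]] = {
    "Add": lambda a, b: [x + y for x, y in zip(a, b)],
    "Mul": lambda a, b: [x * y for x, y in zip(a, b)],
    "Relu": lambda a: [max(0.0, x) for x in a],
}


def run_graph(nodes: List[Node], feeds: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Evaluate nodes in order; nodes are assumed to be topologically sorted."""
    values = dict(feeds)
    for node in nodes:
        if node.op_type not in OPERATORS:
            raise KeyError(f"unsupported operator: {node.op_type}")
        args = [values[name] for name in node.inputs]
        values[node.output] = OPERATORS[node.op_type](*args)
    return values


# Graph for y = Relu(x * w + b)
graph = [
    Node("Mul", ["x", "w"], "xw"),
    Node("Add", ["xw", "b"], "sum"),
    Node("Relu", ["sum"], "y"),
]

result = run_graph(
    graph,
    {"x": [1.0, -2.0, 3.0], "w": [2.0, 2.0, 2.0], "b": [0.5, 0.5, 0.5]},
)
print(result["y"])  # [2.5, 0.0, 6.5]
```

Because the graph is just data (operator names plus input and output names), any runtime that knows the same operator set could evaluate it. Adding an entry to the registry, for example `OPERATORS["Neg"] = lambda a: [-x for x in a]`, loosely mirrors what "extensible" means here.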
ONNX is a standard format for both DNN and traditional ML models. Its interoperability gives data scientists the flexibility to choose their own framework and tools, speeding up the path from research to production. It also allows hardware developers to optimise deep learning-focused hardware against a standard specification that is compatible with different frameworks.
Two examples of successful ONNX adoption are:
- TensorRT: NVIDIA’s platform for high-performance deep learning inference. It uses ONNX to support a wide range of deep learning frameworks.
- Qualcomm Snapdragon NPE: The Qualcomm neural processing engine (NPE) SDK adds support for neural network evaluation to mobile devices. While NPE directly supports only the Caffe, Caffe2 and TensorFlow frameworks, the ONNX format indirectly extends that support to a wider range of frameworks.
Optimising machine learning models for inference is difficult because it requires tuning both the model and the inference library to make the most of the hardware capabilities. The challenge grows when the goal is optimal performance across different platforms (cloud, CPU, GPU), as each platform has its own capabilities and characteristics. Complexity increases further when models from a variety of frameworks must run on different platforms, and optimising every combination of framework and hardware is a time-consuming task. The ONNX standard helps by allowing a model to be trained in the preferred framework and then run anywhere on the cloud. Models from frameworks including TensorFlow, PyTorch, Keras, MATLAB and SparkML can be exported or converted to the standard ONNX format. Once a model is in the ONNX format, it can run on different platforms and devices.
ONNX Runtime is the inference engine for deploying ONNX models to production. Its features include:
- It is written in C++ and has C, Python, C#, and Java APIs to be used in various environments.
- It can be used on both cloud and edge and works equally well on Linux, Windows, and Mac.
- ONNX Runtime supports DNN and traditional machine learning. It can integrate with accelerators on different hardware platforms such as NVIDIA GPUs, Intel processors, and DirectML on Windows.
- ONNX Runtime offers extensive production-grade optimisation, testing, and other improvements.
ONNX also has some limitations:
- The ONNX format is relatively new, and the lack of use cases may raise doubts about its reliability and ease of use.
- For easy usage, two conditions must be met: the model uses only supported data types and operations, and no specific layers or operations have been customised.
- Since its launch, the ONNX project has seen rapid development. New versions enhance compatibility between frameworks; however, where the two conditions above are not met, the developer needs to write custom implementations in the backend, which is a very time-consuming and laborious process.
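The "only supported operations" condition can be pictured as a simple pre-flight check before conversion. The sketch below is plain Python under stated assumptions: `SUPPORTED_OPS` is a small illustrative set chosen for this example, not an authoritative list for any ONNX version or backend, and `find_unsupported` and `MyCustomLayer` are hypothetical names.

```python
# Hypothetical sketch: SUPPORTED_OPS is an illustrative subset chosen for
# this example, not an authoritative list for any ONNX version or backend.
from typing import Iterable, List, Set

SUPPORTED_OPS: Set[str] = {"Conv", "Relu", "MaxPool", "Gemm", "Softmax"}


def find_unsupported(model_ops: Iterable[str],
                     supported: Set[str] = SUPPORTED_OPS) -> List[str]:
    """Return operator names, in first-seen order, missing from `supported`."""
    seen: Set[str] = set()
    missing: List[str] = []
    for op in model_ops:
        if op not in supported and op not in seen:
            seen.add(op)
            missing.append(op)
    return missing


# Operators used by an imaginary model, including one customised layer.
model_ops = ["Conv", "Relu", "MyCustomLayer", "Gemm", "MyCustomLayer", "Softmax"]

missing = find_unsupported(model_ops)
if missing:
    print("needs a custom implementation:", missing)
else:
    print("all operators supported")
```

In this made-up model the check flags `MyCustomLayer`, which is the kind of case where a developer would have to fall back on a custom backend implementation.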