OpenVINO stands for Open Visual Inference and Neural network Optimisation and is a popular computer vision toolkit from Intel. The main reason for its growing popularity is its use of state-of-the-art optimisation techniques that reduce the inference time of computer vision models. It can accelerate a model across devices such as CPUs, GPUs, FPGAs and VPUs. The toolkit also ships with a large collection of pre-trained models spanning domains such as face detection, person detection, pose estimation and instance segmentation.
In this article, I will demonstrate a working example of a face detection model from the OpenVINO toolkit. But before diving into the coding part, let’s understand the basics of OpenVINO.
The OpenVINO toolkit contains two major components: the Model Optimiser and the Inference Engine.
Model Optimiser
(Image source: Intel OpenVINO)
The Model Optimiser is used to convert a deep learning model into an Intermediate Representation (IR). OpenVINO supports various frameworks such as TensorFlow, Caffe, ONNX and MXNet. So, in addition to using the pre-trained models that come with the toolkit, we can also convert our own deep learning models, written in any of these frameworks, into an Intermediate Representation. The Inference Engine of OpenVINO understands only this IR format; it cannot work with native framework files.
Now let’s dive deep into the working of the Model Optimiser.
It handles the software-level optimisations of the deep learning model. A few of the operations it performs are:
Quantisation
Quantisation relates to the number of bits required to represent the weights and biases of our model. We usually train our models in the FP32 (32-bit floating point) format, which is ideal for training, but we don't need that level of precision for inference. So we can reduce the precision of our models without any substantial loss of accuracy.
Quantisation is carried out with the help of the Calibrate layer. Initially, we set a threshold for the acceptable accuracy drop; the Calibrate layer then takes a subset of the data and converts FP32 layers into FP16 or INT8. If the accuracy drop is less than the specified threshold, the conversion is carried out.
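The idea can be illustrated with a small, framework-free sketch in plain Python. This is not OpenVINO's actual calibration tool, which works on real layers and a calibration dataset; it only shows the threshold-gated accept/reject logic described above, using a simple symmetric INT8 scheme and mean absolute error as a stand-in for the accuracy metric.

```python
# Illustrative sketch of threshold-gated quantisation (not OpenVINO's
# Calibrate layer): quantise FP32 weights to INT8, then keep the
# conversion only if the reconstruction error stays under a threshold.

def quantise_int8(weights):
    """Map FP32 weights to INT8 with a symmetric scale, then dequantise."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]   # integer values in [-127, 127]
    return [v * scale for v in q]             # dequantised approximation

def accept_quantisation(weights, threshold=0.01):
    """Accept INT8 only if the mean absolute error is under the threshold."""
    dq = quantise_int8(weights)
    error = sum(abs(a - b) for a, b in zip(weights, dq)) / len(weights)
    return error <= threshold

weights = [0.52, -1.3, 0.07, 0.9, -0.44]
print(accept_quantisation(weights))                    # small error: accepted
print(accept_quantisation(weights, threshold=1e-9))    # strict limit: rejected
```

With a larger model, the "error" would instead be the accuracy drop measured on the calibration subset, but the gating decision works the same way.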
Fusion
Fusion refers to combining multiple layer operations into a single operation. For example, a batch normalisation layer, an activation layer, and a convolutional layer can be combined into one. This is particularly useful for GPU inference, where separate operations may run on separate GPU kernels, while a fused operation runs on a single kernel and so avoids the overhead of switching from one kernel to the next.
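The arithmetic behind conv + batch-norm fusion can be shown in a few lines of plain Python (a sketch, not OpenVINO's implementation). Since BN(y) = gamma * (y - mean) / sqrt(var + eps) + beta is an affine transform, it can be folded into the convolution's weights and bias, leaving a single operation. Here a single scalar weight stands in for one output channel:

```python
import math

def fuse_conv_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold a batch-norm layer into the preceding convolution (per channel).

    BN(w*x + b) = gamma*s*w*x + gamma*s*(b - mean) + beta,  s = 1/sqrt(var+eps)
    so the fused conv has weight w*gamma*s and bias (b - mean)*gamma*s + beta.
    """
    s = gamma / math.sqrt(var + eps)
    return w * s, (b - mean) * s + beta

# One output channel with a single weight, for illustration:
w_f, b_f = fuse_conv_bn(w=2.0, b=0.5, gamma=1.0, beta=0.0, mean=0.5, var=1.0)
x = 3.0
fused = w_f * x + b_f
separate = 1.0 * ((2.0 * x + 0.5) - 0.5) / math.sqrt(1.0 + 1e-5) + 0.0
print(abs(fused - separate) < 1e-9)  # the two paths agree
```

Real fusion does this per output channel of the convolution kernel, but the folding formula is the same.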
Inference Engine
(Image source: Intel OpenVINO)
The Inference Engine is a library written in C++. It provides APIs to read the Intermediate Representation of the model, handles the hardware-level optimisations, and provides different plugins for different devices.
Another important component of the Inference Engine is ‘Extensions’.
Extensions are used for extending the compatibility of layers: even if all the layers of a model have been converted into the IR format, it is not guaranteed that every layer will be supported on the target device. So, if we want to run a piece of optimised code on a CPU but one of the layers is not supported on the CPU by default, a CPU extension can extend the compatibility for us.
There are various types of extensions for different devices and different operating systems.
In this example, we will use libcpu_extension_sse4.so.
On Linux, there are two types of CPU extensions: AVX and SSE4.
- AVX works with Intel Core series processors (e.g. Core i5, Core i7); AVX systems can also use SSE4 extensions.
- SSE4 works with Intel Atom series processors.
Face Detection Program in Python
(Image source: Intel OpenVINO)
We will start off by downloading the face detection model. We can either use the model downloader script to download the model, or simply search for the model and download it directly.
Method 1: Using the model downloader
- Go to the model downloader directory, usually located at /opt/intel/openvino/deployment_tools/tools/model_downloader, and check for the downloader.py file.
- Run that file, passing the name of the model as a command-line argument:
sudo python downloader.py --name face-detection-adas-0001
Use the python downloader.py --print_all command to list all the available models.
Method 2: Using the direct link
The link to download the face detection is –
You can check out all the available pre-trained models here –
https://software.intel.com/openvino-toolkit/documentation/pretrained-models
The complete code for the face detection program can be found here –
https://github.com/Dhairya10/face-detection-open-vino
The two most important classes that we will interact with are:
- IECore, which is the Python wrapper to work with the Inference Engine
- IENetwork, which is what will initially hold the network and get loaded into IECore
Inference Engine Python API Documentation –
https://docs.openvinotoolkit.org/latest/_inference_engine_ie_bridges_python_docs_api_overview.html
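Putting IECore and IENetwork together, a minimal synchronous inference loop might look like the sketch below. It follows the 2019-era Inference Engine Python API that this article uses; the image paths, output file and confidence threshold are assumptions of mine (not taken from the linked repository), and newer OpenVINO releases have since replaced this API with openvino.runtime.

```python
# Minimal sketch of the inference flow, assuming the IR files have been
# downloaded into the working directory.
import cv2
from openvino.inference_engine import IECore, IENetwork

MODEL_XML = "face-detection-adas-0001.xml"
MODEL_BIN = "face-detection-adas-0001.bin"

net = IENetwork(model=MODEL_XML, weights=MODEL_BIN)   # holds the IR
ie = IECore()
# On older releases, unsupported CPU layers needed an explicit extension:
# ie.add_extension("libcpu_extension_sse4.so", "CPU")
exec_net = ie.load_network(network=net, device_name="CPU")

input_name = next(iter(net.inputs))
n, c, h, w = net.inputs[input_name].shape             # e.g. [1, 3, 384, 672]

frame = cv2.imread("face.jpg")                        # any test image
blob = cv2.resize(frame, (w, h)).transpose(2, 0, 1).reshape(n, c, h, w)

result = exec_net.infer({input_name: blob})
# The output shape is [1, 1, N, 7]; each detection row is
# [image_id, label, confidence, xmin, ymin, xmax, ymax], coords normalised.
for det in next(iter(result.values()))[0][0]:
    if det[2] > 0.5:                                  # confidence threshold
        xmin = int(det[3] * frame.shape[1])
        ymin = int(det[4] * frame.shape[0])
        xmax = int(det[5] * frame.shape[1])
        ymax = int(det[6] * frame.shape[0])
        cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), (0, 255, 0), 2)
cv2.imwrite("out.jpg", frame)
```

The `infer` call here is synchronous; the API also supports asynchronous requests for higher throughput, which the linked repository's code may use.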
Using the Neural Compute Stick 2 (Movidius)
Now, we will see how to run this code using Neural Compute Stick 2.
We will have to change the device name to 'MYRIAD'.
(Change line 15 in the main.py file: set open_vino_device = 'MYRIAD'.)
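Because the device is just a string passed to the Inference Engine, a small helper can validate it before loading the network. This is a sketch of my own, not part of the repository's main.py; the set of plugin names is an assumption based on commonly available OpenVINO devices:

```python
# Hypothetical helper: validate a device name before it is passed to
# IECore.load_network(). The plugin names listed are assumptions.
SUPPORTED_DEVICES = {"CPU", "GPU", "MYRIAD", "FPGA"}

def select_device(name="CPU"):
    """Return the device name if it looks like a known OpenVINO plugin."""
    if name not in SUPPORTED_DEVICES:
        raise ValueError("Unknown device: " + name)
    return name

open_vino_device = select_device("MYRIAD")   # NCS2 uses the MYRIAD plugin
print(open_vino_device)
```

Failing fast on a typo here is friendlier than the plugin-load error the Inference Engine would raise later.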
In order to set up the NCS2, follow the steps below.
Getting Started with NCS2
https://software.intel.com/en-us/articles/get-started-with-neural-compute-stick?elq_cid=3708810&erpm_id=6960228
Configuring NCS2
Conclusion
Computer vision applications are ubiquitous, and with the OpenVINO toolkit developers can make them more robust and scalable. From self-driving cars to the face-unlock feature on our phones, computer vision applications are deeply embedded in our day-to-day lives, and they can be optimised even further with the OpenVINO toolkit. So any developer working in the computer vision space should give it a shot.