Google recently announced an OpenCL-based mobile GPU inference engine for Android. The company claims that the engine offers up to ~2x speedup over the OpenGL backend on neural networks that have enough workload for the GPU. The engine is available in the latest version of the TensorFlow Lite (TFLite) library.
Understanding OpenGL & OpenCL
Open Graphics Library, or OpenGL, is a cross-platform API for rendering 2D and 3D vector graphics. It is a popular software interface that allows a programmer to communicate with graphics hardware.
OpenGL is used mainly for graphics programming: it lets you write programs that perform graphics operations. The OpenGL rendering system is carefully specified so that it can be implemented in hardware.
Open Computing Language or OpenCL is an open, royalty-free standard for cross-platform, parallel programming of diverse accelerators that are found in supercomputers, cloud servers, personal computers, mobile devices and embedded platforms.
This open standard helps in improving the speed as well as the responsiveness of a broad spectrum of applications in several market categories, such as professional creative tools, vision processing, neural network training, inferencing and more.
Why Use OpenCL?
According to the TensorFlow Lite GPU team, besides improving the existing OpenGL-based mobile GPU inference engine, they continuously investigate and experiment with other technologies. This is where the OpenCL-based mobile GPU inference engine comes in.
The team had been using OpenGL compute shaders to run general-purpose work on the GPU. A compute shader is a shader stage used for computing arbitrary information rather than for drawing, and the space it operates on is largely abstract, i.e. each compute shader decides what that space means.
According to the team, compute shaders were added in OpenGL ES 3.1, but the API's backward-compatible design decisions kept them from reaching the full potential of the GPU. OpenCL, by contrast, was designed from the beginning for computation on various accelerators and is thus more relevant to the domain of mobile GPU inference.
The team therefore experimented with an OpenCL-based inference engine and found that it brings several features that let the developers optimise mobile GPU inference.
Improvements Due to OpenCL
Several OpenCL features account for the improvement over the OpenGL backend. They are:
1| Performance Profiling:
According to the TFLite team, optimising the OpenCL backend was much easier than optimising the OpenGL one. This is because OpenCL offers excellent profiling features, and with these profiling APIs the developers can measure the performance of each kernel dispatch very precisely.
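To illustrate what per-dispatch profiling makes possible, here is a minimal sketch. OpenCL's event profiling reports start and end timestamps for each command in nanoseconds; the sketch assumes those timestamps have already been collected, and the `kernel_times_ms` helper and the sample kernel names are hypothetical, not part of TFLite.

```python
from typing import Dict, List, Tuple

# (kernel name, start timestamp, end timestamp), both in nanoseconds.
# On a device these would come from OpenCL event profiling; here they
# are made-up sample values.
Dispatch = Tuple[str, int, int]


def kernel_times_ms(dispatches: List[Dispatch]) -> List[Tuple[str, float]]:
    """Sum the time spent in each kernel and sort, slowest first."""
    totals: Dict[str, float] = {}
    for name, start_ns, end_ns in dispatches:
        totals[name] = totals.get(name, 0.0) + (end_ns - start_ns) / 1e6
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


sample = [
    ("conv2d", 1_000_000, 3_500_000),
    ("relu", 3_500_000, 3_700_000),
    ("conv2d", 3_700_000, 5_700_000),
]
report = kernel_times_ms(sample)
for name, ms in report:
    print(f"{name}: {ms:.2f} ms")
```

A report like this shows which kernels dominate the run time and are therefore worth tuning first.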
2| Optimised Workgroup Sizes:
The TFLite developers observed that the performance of TFLite GPU on Qualcomm Adreno GPUs is very sensitive to workgroup sizes. Picking the right workgroup size can boost performance, whereas picking the wrong one can degrade it by an equal amount.
With the help of the performance profiling features in OpenCL, the developers were able to implement an optimiser for workgroup sizes. This resulted in up to 50% speedup over the average.
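The idea behind such an optimiser can be sketched as an exhaustive search: time the same kernel with each candidate workgroup size and keep the fastest. This is only an illustration, not the TFLite implementation. The `measure` callback is an assumption standing in for a profiled kernel dispatch, and `fake_measure` is a made-up cost model used so the example runs anywhere.

```python
from typing import Callable, Iterable, Tuple

WorkgroupSize = Tuple[int, int, int]


def pick_workgroup_size(
    measure: Callable[[WorkgroupSize], float],
    candidates: Iterable[WorkgroupSize],
    repeats: int = 3,
) -> Tuple[WorkgroupSize, float]:
    """Return the candidate with the lowest best-of-`repeats` time."""
    best_size = None
    best_time = float("inf")
    for size in candidates:
        # Take the minimum over several runs to reduce measurement noise.
        elapsed = min(measure(size) for _ in range(repeats))
        if elapsed < best_time:
            best_size, best_time = size, elapsed
    if best_size is None:
        raise ValueError("no candidate workgroup sizes given")
    return best_size, best_time


def fake_measure(size: WorkgroupSize) -> float:
    """Hypothetical cost model: fastest when the group has 64 work-items."""
    work_items = size[0] * size[1] * size[2]
    return 1.0 + abs(work_items - 64) / 64.0


candidates = [(4, 4, 1), (8, 8, 1), (16, 16, 1)]
best_size, best_time = pick_workgroup_size(fake_measure, candidates)
print(best_size, best_time)
```

On real hardware, `measure` would dispatch the kernel and read its duration from the profiling API, and the winning size could be cached per device and per kernel.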
3| Native 16-bit Precision Floating Point (FP16):
The OpenCL backend supports FP16 natively, and OpenCL requires the accelerator to report whether the data type is available. According to the developers, with OpenCL even some older GPUs, for instance the Adreno 305, can perform at their full capabilities. OpenGL, by contrast, mostly relies on precision hints, which vendors can choose to ignore in their implementations, so it offers no performance guarantees.
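To make the FP16 trade-off concrete, the sketch below uses Python's standard `struct` module, whose `e` format code is IEEE 754 half precision. It shows that FP16 halves storage compared with FP32 at the cost of some rounding error. The weight values are arbitrary examples, and nothing here touches the GPU.

```python
import struct


def to_fp16_and_back(value: float) -> float:
    """Round a Python float to IEEE 754 half precision and widen it again."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


weights = [0.1, 1.0, 1000.123]  # arbitrary example values

# Storage needed for the same values in FP32 and FP16.
fp32_bytes = len(struct.pack(f"<{len(weights)}f", *weights))
fp16_bytes = len(struct.pack(f"<{len(weights)}e", *weights))

# Rounding error introduced by storing each value in FP16.
errors = [abs(w - to_fp16_and_back(w)) for w in weights]

print(fp32_bytes, fp16_bytes)
print(errors)
```

Values that FP16 can represent exactly, such as 1.0, survive the round trip unchanged, while the others pick up an error that grows with magnitude.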
4| Constant Memory:
According to the team, OpenCL has a concept of constant memory. Qualcomm added a physical memory to its chips whose properties make it ideal for use as OpenCL’s constant memory. As a result, OpenCL on Adreno can significantly outperform OpenGL by combining this physical constant memory with the aforementioned native FP16 support.
The TFLite developers evaluated the OpenCL inference engine on selected Android devices using a couple of popular neural networks: MNASNet 1.3 and SSD MobileNet v3 (large). The new OpenCL backend proved roughly twice as fast as the OpenGL backend, and it did particularly well on Adreno devices (those annotated with SD in the team's results).
One major issue that the TFLite developers faced while deploying the OpenCL inference engine is that OpenCL is not part of the standard Android distribution. While major Android vendors include OpenCL as part of their system library, it may not be available for some users.
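One way to cope with a missing OpenCL library is to probe for it at start-up and choose a backend accordingly. The sketch below is an assumption about how such a check might look, not TFLite's own logic: the library name `libOpenCL.so`, the `opencl_available` and `choose_backend` helpers, and the choice of OpenGL as the fallback are all illustrative.

```python
import ctypes
from typing import Callable


def opencl_available(
    load: Callable[[str], object] = ctypes.CDLL,
    library_name: str = "libOpenCL.so",  # assumed name; varies by platform
) -> bool:
    """Return True if the OpenCL shared library can be loaded."""
    try:
        load(library_name)
    except OSError:
        # ctypes raises OSError when the library cannot be found or loaded.
        return False
    return True


def choose_backend(load: Callable[[str], object] = ctypes.CDLL) -> str:
    """Prefer OpenCL, and fall back to OpenGL when it is missing."""
    return "opencl" if opencl_available(load) else "opengl"


print(choose_backend())
```

Passing the loader as a parameter keeps the check easy to exercise on machines without OpenCL installed.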