Today, a typical smartphone can scan faces, documents and QR codes, capture super-resolution photos, recognise gestures and voice, and perform many other tasks besides handling calls and texts.
These handheld devices are the epitome of software and hardware engineering, and to do these tasks they require state-of-the-art image recognition and NLP models running in the background. Image and language models are at the heart of many machine learning applications today, and training these models becomes a computational nightmare as the volume of data grows.
Google has been using TensorFlow Lite for taking pictures on its flagship Pixel phones. For Portrait mode on Pixel 3, TensorFlow Lite GPU inference accelerates the foreground-background segmentation model by over 4x and the new depth estimation model by over 10x compared with CPU inference at floating-point precision.
Apple says that it is using machine learning in the iPhone 11’s cameras to help process their images, and that the chip’s speed allows it to shoot 4K video at 60 fps with HDR.
Samsung’s Galaxy S10 series phones and Galaxy Fold, meanwhile, use neural processing units (NPUs) to power Scene Optimizer, a feature that improves the camera’s ability to recognise what is being photographed.
Therefore, to bridge the gap between the real-time magic that ML has to offer and hardware inadequacies, chipmakers and phone manufacturers are coming up with customised processors designed to deal with the demands of neural networks.
How The Adjustments Were Made To Meet The Demand
Even though deep learning algorithms have been around since the early 90s, the lack of the right kind of hardware was a primary hurdle for many developers, at least until 2009.
In 2015, Qualcomm kick-started the deep learning on mobiles movement with its efforts to accelerate models using mobile GPUs.
The most important milestone in this space occurred in 2017 with the introduction of TensorFlow Lite. This framework offered options optimised for on-device inference.
This library also got support for the Android Neural Networks API (NNAPI), allowing for access to the device’s AI hardware acceleration resources directly through the Android OS.
This made it possible to build an ML pipeline without using specialised vendor tools or SDKs.
Having said that, the choice between floating-point and quantized models for mobile devices has been a topic of discussion among developers and vendors.
With floating-point inference, the model stays in the same format in which it was originally trained on the server. However, models that perform high-resolution image transformations can require more than 6GB of RAM and enormous computational resources.
The quantized approach, by contrast, first converts the model from a floating-point type to int8 format. This reduces its size and RAM consumption by a factor of 2 relative to 16-bit floats (or 4 relative to 32-bit floats) and can potentially speed up execution by 2-3 times.
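The size reduction from quantization is simple arithmetic on bytes per weight, and the factor depends on the precision the model starts from. The sketch below is only an illustration: the weight count is a made-up figure, and it counts weights alone, ignoring activations and any framework overhead.

```python
# Bytes needed to store one weight in each numeric format.
BYTES_PER_WEIGHT = {"float32": 4, "float16": 2, "int8": 1}


def weights_size_mb(num_weights: int, dtype: str) -> float:
    """Return the storage needed for the weights alone, in megabytes (10**6 bytes)."""
    return num_weights * BYTES_PER_WEIGHT[dtype] / 1e6


# Hypothetical model size, chosen only for illustration.
NUM_WEIGHTS = 25_000_000

sizes = {dtype: weights_size_mb(NUM_WEIGHTS, dtype) for dtype in BYTES_PER_WEIGHT}
for dtype, size in sizes.items():
    print(f"{dtype:>8}: {size:6.1f} MB")

# The reduction factor is just the ratio of bytes per weight.
print("float32 -> int8 reduction:", sizes["float32"] / sizes["int8"])
print("float16 -> int8 reduction:", sizes["float16"] / sizes["int8"])
```

Under these assumptions, moving from 32-bit floats to int8 shrinks the weights fourfold, while moving from 16-bit floats to int8 halves them.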
The disadvantage is that reducing the bit-width of the network weights (from 16 to 8 bits) leads to a loss of accuracy. Even though extensive research is being done by the likes of Google and Qualcomm, a quantized inference solution suitable for large-scale deployment has yet to be found.
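To see where the accuracy loss comes from, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It is an illustrative assumption, not the scheme used by TensorFlow Lite or any particular vendor (production toolchains typically add zero points, per-channel scales and calibration data), and the weights are random made-up values.

```python
import random


def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One shared scale for the whole tensor; fall back to 1.0 for all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the int8 values."""
    return [q * scale for q in quantized]


random.seed(0)
# Made-up weights drawn from a narrow Gaussian, for illustration only.
weights = [random.gauss(0.0, 0.05) for _ in range(1000)]

quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# Rounding to the nearest integer step loses at most half a step per weight.
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(f"scale = {scale:.6f}")
print(f"largest round-trip error = {max_error:.6f}")
```

Each weight is off by at most half a quantization step after the round trip. These small per-weight errors accumulate through the layers of a network, which is one way the accuracy drop described above arises.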
What Do Experts Have To Say
(Image: performance evolution of mobile AI accelerators, via ETH Zurich)
To assess the state of deep learning in the era of smartphones, researchers from ETH Zurich, Google, Huawei, Qualcomm and other top companies collaborated to publish a paper.
The above picture compares the performance evolution of mobile AI accelerators. For this comparison, the mobile devices ran the FP16 model using TensorFlow Lite and NNAPI.
In this work, they evaluated and compared the performance of all chipsets from Qualcomm, HiSilicon, Samsung, MediaTek and Unisoc that provide hardware acceleration for AI inference.
The researchers list their findings as follows:
- When compared to the second generation of NPUs, the speed of floating-point and quantized inference has increased by more than 7.5 and 3.5 times, respectively, bringing the AI capabilities of smartphones to a substantially higher level.
- All flagship SoCs presented during the past 12 months show a performance equivalent to or higher than that of entry-level CUDA-enabled desktop GPUs and high-end CPUs.
- TensorFlow Lite is still the only major mobile deep learning library that provides reasonably high functionality and ease of deployment of deep learning models on smartphones.
Deep Learning Is Just A Touch Away
Apple, on the other hand, has been very vocal about its interest in building a next-generation machine learning platform. The enhancement of its hardware, combined with state-of-the-art software options, has put Apple at the frontier of machine learning advancement.
“The A13 Bionic is the fastest CPU ever in a smartphone,” Apple said at its recently concluded mega event, adding that it also has “the fastest GPU in a smartphone.”
The iPhone 11 is powered by Apple’s new A13 Bionic chip, which Apple touts as its fastest processor ever. As for battery life, the iPhone 11 lasts one hour longer than the iPhone XS.
The A13 also features an Apple-designed 64-bit ARMv8.3-A six-core CPU, with two high-performance cores called Lightning running at 2.65 GHz and four energy-efficient cores called Thunder. The two high-performance cores are 20% faster with a 30% reduction in power consumption, while the four high-efficiency cores are 20% faster with a 40% reduction in power consumption.
With SoC vendors and phone makers such as Apple and Samsung determined to bring AI to mobiles, the prospects for running state-of-the-art deep learning models on smartphones have changed radically in the last few years.
Today, devices with Qualcomm and other top systems on a chip (SoCs) come with dedicated AI hardware designed to run ML workloads on embedded AI accelerators. The latest Android 10, too, has an updated 1.2 version of NNAPI.
At the TensorFlow developer summit held earlier this year, along with TensorFlow 2.0, the team also announced the open sourcing of TensorFlow Lite for mobile devices and two development boards, SparkFun and Coral, which are based on TensorFlow Lite, for performing machine learning tasks on handheld devices like smartphones.
TensorFlow Lite aims to make smartphones the next best choice for running machine learning models. These developments suggest that in the coming two to three years, all mid-range and high-end chipsets will have enough power to run the vast majority of standard deep learning models developed by the research community and industry.
Progress is not coming from chipmakers alone; a lot is coming from the other end as well. Frameworks like TensorFlow are being developed to suit the demands of handheld devices. With advancements emerging from both ends, the goal of making smartphones the next hub for deploying ML models is soon going to be a reality.