Today, a typical smartphone can scan faces, documents and QR codes, capture super-resolution photos, recognise gestures and voice, and perform many other tasks besides handling calls and texts.

These handheld devices are the epitome of software and hardware engineering, and to do these tasks they rely on state-of-the-art image recognition and NLP models running in the background. Image and language models are at the heart of many machine learning applications today, and training them grows more computationally demanding as the volume of data increases.

Google has been using TensorFlow Lite for taking pictures on its flagship Pixel phones. For Portrait mode on Pixel 3, TensorFlow Lite GPU inference accelerates the foreground-background segmentation model by over 4x and the new depth estimation model by over 10x compared with CPU inference with floating-point precision.

Apple says that it is using machine learning in the iPhone 11’s cameras to help process their images, and that the chip’s speed allows it to shoot 4K video at 60 fps with HDR.

Samsung’s Galaxy S10 series phones and Galaxy Fold, meanwhile, use neural processing units (NPUs) to power Scene Optimizer, which enhances the camera’s ability to recognise the scene being photographed.

To bridge the gap between the real-time magic that ML has to offer and the limits of mobile hardware, chipmakers and phone manufacturers are coming up with customised processors designed to handle the demands of neural networks.

How The Adjustments Were Made To Meet The Demand

Even though deep learning algorithms have been around since the early 90s, the lack of the right kind of hardware remained a primary hurdle for many developers at least until 2009.

In 2015, Qualcomm kick-started the movement towards deep learning on mobile devices with its efforts to accelerate models using mobile GPUs.

The most important milestone in this space came in 2017 with the introduction of TensorFlow Lite.
This framework offered options optimised for on-device inference.

The library also gained support for the Android Neural Networks API (NNAPI), which gives access to the device’s AI hardware acceleration resources directly through the Android OS.

This made it possible to build an ML pipeline without using specialised vendor tools or SDKs.

That said, the choice between floating-point and quantized models for mobile devices has been a topic of discussion among developers and vendors.

With floating-point inference, the model stays in the same format in which it was originally trained on the server. However, models that perform high-resolution image transformations can require more than 6GB of RAM and enormous computational resources.

The quantized approach, by contrast, first converts the model from a floating-point type to int8 format. Compared with 32-bit floats, this reduces size and RAM consumption by a factor of 4 (a factor of 2 for 16-bit floats) and can potentially speed up inference by 2-3 times.

The disadvantage is that reducing the bit-width of the network weights to 8 bits leads to accuracy loss. Even though extensive research is being done by the likes of Google and Qualcomm, quantized inference has yet to mature into a solution fit for large-scale deployment.

What Do Experts Have To Say

via ETH Zurich

To assess the state of deep learning in the era of smartphones, researchers from ETH Zurich, Google, Huawei, Qualcomm and other top companies collaborated on a paper.

The picture above compares the performance evolution of mobile AI accelerators.
For this comparison, the mobile devices ran an FP16 model using TensorFlow Lite and NNAPI.

In this work, the researchers evaluated and compared the performance of all chipsets from Qualcomm, HiSilicon, Samsung, MediaTek and Unisoc that provide hardware acceleration for AI inference.

The researchers list their findings as follows:

When compared to the second generation of NPUs, the speed of floating-point and quantized inference has increased by more than 7.5 and 3.5 times, respectively, bringing the AI capabilities of smartphones to a substantially higher level.

All flagship SoCs presented during the past 12 months show a performance equivalent to or higher than that of entry-level CUDA-enabled desktop GPUs and high-end CPUs.

TensorFlow Lite is still the only major mobile deep learning library that provides reasonably high functionality and ease of deployment of deep learning models on smartphones.

Deep Learning Is Just A Touch Away

via Apple

Apple, for its part, has been very vocal about its interest in building a next-generation machine learning platform. The enhancement of its hardware, combined with state-of-the-art software options, has put Apple at the forefront of machine learning advancement.

“The A13 Bionic is the fastest CPU ever in a smartphone,” Apple said at its recently concluded launch event, adding that it also has “the fastest GPU in a smartphone.”

The iPhone 11 is powered by Apple’s new A13 Bionic chip, which Apple touts as its fastest processor ever. As for battery life, the iPhone 11 lasts one hour longer than the iPhone XR.

The A13 features an Apple-designed 64-bit ARMv8.3-A six-core CPU, with two high-performance cores called Lightning running at 2.65 GHz and four energy-efficient cores called Thunder.
The two high-performance cores are 20% faster with a 30% reduction in power consumption, and the four high-efficiency cores are 20% faster with a 40% reduction in power consumption.

With SoC vendors and phone makers such as Apple and Samsung committed to AI on mobile, the prospects for running state-of-the-art deep learning models on smartphones have changed radically in the last few years.

Today, devices built on Qualcomm and other top systems on a chip (SoCs) come with dedicated AI hardware designed to run ML workloads on embedded AI accelerators. Android 10 also ships with an updated version 1.2 of NNAPI.

At the TensorFlow developer summit held earlier this year, the team announced, along with TensorFlow 2.0, the open sourcing of TensorFlow Lite for mobile devices and two development boards, SparkFun and Coral, which are based on TensorFlow Lite and perform machine learning tasks on small devices.

TensorFlow Lite aims to make smartphones the next best choice for running machine learning models. These developments suggest that in the next two to three years, all mid-range and high-end chipsets will have enough power to run the vast majority of standard deep learning models developed by the research community and industry.

Progress is coming not only from chipmakers but from the software side as well. Frameworks like TensorFlow are being developed to suit the demands of handheld devices. With advancements emerging from both ends, the goal of making smartphones the next hub for deploying ML models is soon going to be a reality.
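
As a rough illustration of the floating-point versus quantized trade-off discussed earlier, the sketch below maps a list of float weights to int8 with a generic affine (scale and zero-point) scheme and then maps them back. It is a minimal example under stated assumptions, not the exact scheme TensorFlow Lite or any vendor uses; the function names `quantize_int8` and `dequantize` and the random weights are hypothetical and exist only for this illustration.

```python
import array
import random


def quantize_int8(weights):
    """Affine quantization of floats to int8 (illustrative scheme only)."""
    # Extend the range to include zero so that 0.0 is representable.
    w_min = min(min(weights), 0.0)
    w_max = max(max(weights), 0.0)
    # One int8 step covers `scale` units of the float range.
    scale = (w_max - w_min) / 255.0 or 1.0
    # The zero point is the int8 value that represents float 0.0.
    zero_point = round(-128 - w_min / scale)
    quantized = [
        max(-128, min(127, round(w / scale) + zero_point)) for w in weights
    ]
    return array.array("b", quantized), scale, zero_point


def dequantize(quantized, scale, zero_point):
    """Map int8 values back to approximate floats."""
    return [(q - zero_point) * scale for q in quantized]


# Hypothetical "weights": 1000 random values standing in for a layer.
random.seed(0)
weights = [random.uniform(-1.0, 1.0) for _ in range(1000)]

fp32 = array.array("f", weights)  # 4 bytes per weight
q, scale, zero_point = quantize_int8(weights)  # 1 byte per weight
restored = dequantize(q, scale, zero_point)

size_ratio = (len(fp32) * fp32.itemsize) / (len(q) * q.itemsize)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(f"size reduction vs 32-bit floats: {size_ratio:.1f}x")
print(f"largest round-trip error: {max_error:.5f} (one step = {scale:.5f})")
```

The storage saving comes purely from the narrower data type, while the round-trip error shows where the accuracy loss originates: every weight is snapped to one of 256 levels, so the error per weight is bounded by roughly one quantization step.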