
The Possibility of Training an AI Model in Smartphones and Sensors

A new technique enables AI models to learn from data collected on the devices themselves. However, like any new approach, it comes with its own set of challenges


One of the earliest substantial works in artificial intelligence (AI) was carried out in the mid-20th century by the British computer scientist Alan Mathison Turing. In 1935, Turing conceived an abstract computing machine consisting of an unlimited memory and a scanner that moves back and forth through it, reading and writing symbols. Because its instructions are held in that same memory, the machine can modify and improve its own program; this stored-program device is known as the Turing machine. In essence, all modern computers are universal Turing machines.

A series of such historical developments has brought us machines that can mimic aspects of human intelligence in real time. A true AI that functions like a human has not been achieved, at least not yet, but that does not stop us from adopting AI algorithms to achieve predefined goals: AI models are well suited to solving complex problems, offering higher efficiency and accuracy than simpler methods. Now, a new technique has enabled AI models to learn from data collected on the devices themselves. However, like any new approach, it comes with its own challenges.

Implementing AI on microcontrollers

Simply put, an AI model is a tool or algorithm that uses a certain data set to arrive at a decision, without the need for human intervention in the decision-making process. A machine-learning model trained on an intelligent edge device, however, can adapt to new data and make better predictions: a trained model on a smart keyboard, for example, could let the device gradually learn from the user’s writing. The catch is that the training process requires large amounts of memory, typically on powerful computers at a data center, even before the model is deployed on a device. This is expensive and raises privacy concerns.

Microcontrollers, on the other hand, are miniature computers that run simple commands and form the basis of billions of connected devices, from sensors in automobiles to internet-of-things (IoT) gadgets. But cheap, low-power microcontrollers have extremely limited memory and often no operating system, making it challenging to train AI models on such “edge devices”, which work independently of central computing resources.

To address the issue, researchers at MIT and the MIT-IBM Watson AI Lab developed a new method that enables on-device training using less than a quarter of a megabyte of memory. With it, an ML model can be trained on a microcontroller in a matter of minutes.

In the paper titled On-Device Training Under 256KB Memory, the team developed intelligent algorithms and a framework to reduce the amount of computation required to train a model, making the process faster and more memory-efficient. Training solutions designed for connected devices generally use more than 500 megabytes of memory, far exceeding the 256-kilobyte capacity of most microcontrollers.

Song Han, senior author and a member of the MIT-IBM Watson AI Lab, described the innovation: “Our study enables IoT devices to not only perform inference but also continuously update the AI models to newly collected data, paving the way for lifelong on-device learning. The low resource utilization makes deep learning more accessible and can have a broader reach, especially for low-power edge devices.”

Training neural networks

One of the most common types of machine-learning model is the neural network, which is loosely based on the human brain. These models contain layers of interconnected nodes, or neurons, that process information to carry out tasks such as identifying people in pictures.

For the model to learn a task, it must first be trained, which involves showing it millions of examples. The model may undergo hundreds of updates as it learns, and in each round the intermediate activations, the intermediate results produced by each layer of the network, must be stored. “Because there may be millions of weights and activations, training a model requires much more memory than running a pre-trained model,” Han said.
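To make that memory pressure concrete, here is a minimal NumPy sketch of a hypothetical multilayer perceptron (the layer sizes and batch size are illustrative, not taken from the paper). It contrasts inference, which can discard each layer’s output as soon as the next layer has consumed it, with training, which must keep every intermediate activation around for backpropagation.

```python
# Minimal NumPy sketch (hypothetical MLP, illustrative sizes): training must
# hold every layer's input for backpropagation, inference does not.
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [784, 256, 256, 10]                        # assumed toy network
weights = [(rng.standard_normal((m, n)) * 0.01).astype(np.float32)
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(x, keep_activations):
    """Run the MLP; optionally keep each layer's input for the backward pass."""
    saved = []
    for w in weights:
        if keep_activations:
            saved.append(x)                              # stored only when training
        x = np.maximum(x @ w, 0.0)                       # linear layer + ReLU
    return x, saved

x = rng.standard_normal((512, 784)).astype(np.float32)   # one assumed batch

_, saved_train = forward(x, keep_activations=True)
_, saved_infer = forward(x, keep_activations=False)

weight_bytes = sum(w.nbytes for w in weights)
activation_bytes = sum(a.nbytes for a in saved_train)
print(f"inference needs ~{weight_bytes / 1e6:.1f} MB (weights only)")
print(f"training needs ~{(weight_bytes + activation_bytes) / 1e6:.1f} MB "
      f"(weights + stored activations), before gradients and optimizer state")
```

Even in this toy case, training needs several times the memory of inference, and the gap only widens with the convolutional models and larger inputs used in practice.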

Source: Tiny Training Engine (TTE), MIT

Han’s team employed two algorithmic solutions to make the training process less memory-intensive and more efficient. The first selectively updates only the weights that matter most. “Updating the whole model is very expensive because there are a lot of activations, so people tend to update only the last layer, but as you can imagine, this hurts the accuracy. For our method, we selectively update those important weights and make sure the accuracy is fully preserved,” he added.
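To illustrate the general idea (though not the paper’s actual algorithm for choosing which weights matter), here is a minimal PyTorch sketch with a hypothetical tiny CNN: the whole network is frozen and gradients are re-enabled only for a hand-picked subset of layers plus the classifier, so backpropagation and the optimizer touch only a fraction of the parameters.

```python
# Minimal PyTorch sketch of selective weight updates (hypothetical model and
# hand-picked layers; the paper's importance-based selection is not shown).
import torch
import torch.nn as nn

model = nn.Sequential(                           # assumed tiny CNN
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 2),
)

for p in model.parameters():                     # freeze everything first
    p.requires_grad_(False)

important = [model[2], model[6]]                 # layers assumed to matter most
for layer in important:
    for p in layer.parameters():
        p.requires_grad_(True)                   # unfreeze only these

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.01)

x, y = torch.randn(4, 3, 32, 32), torch.randint(0, 2, (4,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()                                  # .grad filled only for unfrozen params
optimizer.step()

total = sum(p.numel() for p in model.parameters())
updated = sum(p.numel() for p in trainable)
print(f"updating {updated} of {total} parameters")
```

In the actual method, the important weights are chosen so that accuracy is preserved under the device’s memory budget, rather than picked by hand as in this sketch.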

The second solution involves quantised training, which simplifies the weights that are typically stored as 32-bit numbers. Through quantisation, an algorithm rounds the weights to eight bits, cutting the amount of memory needed for both training and inference. A technique called quantization-aware scaling (QAS) is then applied to avoid any drop in accuracy that quantised training might otherwise cause.
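The memory saving from that rounding step can be sketched in a few lines of NumPy. This toy per-tensor scheme is only an illustration of mapping 32-bit weights to eight bits; it does not implement the paper’s quantization-aware scaling.

```python
# Minimal NumPy sketch of rounding 32-bit weights to 8 bits (toy per-tensor
# quantisation; this is not the paper's quantization-aware scaling).
import numpy as np

w_fp32 = np.random.default_rng(1).standard_normal(1000).astype(np.float32)

scale = np.abs(w_fp32).max() / 127.0                  # one scale for the tensor
w_int8 = np.clip(np.round(w_fp32 / scale), -127, 127).astype(np.int8)
w_dequant = w_int8.astype(np.float32) * scale         # value the maths actually sees

print(f"fp32 weights: {w_fp32.nbytes} bytes, int8 weights: {w_int8.nbytes} bytes")
print(f"largest rounding error: {np.abs(w_fp32 - w_dequant).max():.4f}")
```

Rounding to eight bits cuts the weight memory by a factor of four; the role of QAS is then to keep accuracy from degrading when training proceeds on these coarser values.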

Moreover, a system called a ‘tiny training engine’ runs these algorithmic innovations on a simple microcontroller that lacks an operating system. The system reorders the steps of the training process so that more of the work is done at compile time, before the model is deployed on the edge device.
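What “doing more of the work at compile time” can look like is hinted at by the following conceptual Python sketch (a toy illustration, not TTE’s actual compiler): for a hypothetical four-layer network and an assumed set of layers selected for updating, the backward pass is pruned offline into a fixed schedule that the device simply replays during training.

```python
# Conceptual sketch (not TTE itself): prune the backward pass offline into a
# fixed schedule, so the runtime on the microcontroller only replays it.
def compile_backward_schedule(layers, trainable):
    """Walk from output to input, keeping only the gradient ops that get used."""
    schedule = []
    for i in range(len(layers) - 1, -1, -1):
        name = layers[i]
        upstream_needs_grad = any(l in trainable for l in layers[:i])
        if name in trainable:
            schedule.append(("weight_grad", name))   # this layer will be updated
        if upstream_needs_grad:
            schedule.append(("input_grad", name))    # pass gradients further back
        elif name not in trainable:
            break                                    # nothing earlier needs gradients
    return schedule

layers = ["conv1", "conv2", "conv3", "classifier"]   # hypothetical network
print(compile_backward_schedule(layers, trainable={"conv2", "classifier"}))
# Only four backward ops survive; conv1's backward work never runs on device.
```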

A successful speedup

To test the framework, the researchers trained a computer vision model to detect people in images. After just 10 minutes of training, it had learned the task successfully. The researchers claim the method trained a model 20 times faster than other approaches, and the optimisation required only 157 kilobytes of memory to train an ML model on a microcontroller. Other techniques designed for lightweight training would need between 300 and 600 megabytes of memory.

Having demonstrated the technique on computer vision models, the researchers now wish to apply it to language models and to other kinds of data, such as time-series data. They also hope to use what they have learnt to shrink larger models without compromising accuracy.

The new technique thus enables AI models to continually learn from data on intelligent edge devices like sensors and smartphones, minimizing energy costs and privacy risks. It could also help reduce the carbon footprint caused by training large-scale ML models. As models continue to learn from data directly on the device, the approach is likely to drive further advances in the near future.
