Why Google’s MobileNetV2 Is A Revolutionary Next Gen On-Device Computer Vision Network

Recently, a group of researchers from Google released MobileNetV2, a neural network architecture optimised for mobile devices. The architecture delivers high accuracy while keeping the number of parameters and mathematical operations as low as possible, with the aim of bringing deep neural networks to mobile devices.

Last year, the company introduced MobileNetV1 for TensorFlow, designed to support classification, detection, embedding and segmentation. “The ability to run deep networks on personal mobile devices improves user experience, offering anytime, anywhere access, with additional benefits for security, privacy, and energy consumption. As new applications emerge allowing users to interact with the real world in real time, so does the need for ever more efficient neural networks,” Google researchers Mark Sandler and Andrew Howard said in their research blog post.


The new mobile architecture, MobileNetV2, is an improved version of MobileNetV1 and has been released as part of the TensorFlow-Slim Image Classification Library. Developers can explore it in Colaboratory, or download the notebook and explore it locally using Jupyter. It is also available as modules on TensorFlow Hub, and the pretrained checkpoints can be found on GitHub.

What Is MobileNetV2?

MobileNets are small, low-latency, low-power models parameterised to meet the resource constraints of a variety of use cases. According to the research paper, MobileNetV2 improves the state-of-the-art performance of mobile models on multiple tasks and benchmarks, as well as across a spectrum of different model sizes. It is also a very effective feature extractor for object detection and segmentation. For instance, for detection, when paired with Single Shot Detector Lite, MobileNetV2 is about 35 percent faster than MobileNetV1 at the same accuracy.

MobileNetV2 builds upon the ideas from MobileNetV1, using depth-wise separable convolutions as efficient building blocks. However, Google says that the second version of MobileNet introduces two new features:

  • Linear bottlenecks between the layers: According to the paper, experimental evidence suggests that using linear layers is crucial, as it prevents non-linearities from destroying too much information. Using non-linear layers in bottlenecks hurts performance by several percent, which the researchers say further validates their hypothesis
  • Shortcut connections between the bottlenecks
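
To see why depth-wise separable convolutions count as efficient building blocks, it helps to count multiply-adds. The sketch below compares a standard k × k convolution with its depth-wise separable counterpart using the usual cost formulas. The function names and the example layer size are illustrative assumptions and do not come from Google's post.

```python
def standard_conv_madds(h, w, c_in, c_out, k=3):
    """Multiply-adds for a standard k x k convolution over an h x w output map."""
    return h * w * c_in * c_out * k * k


def separable_conv_madds(h, w, c_in, c_out, k=3):
    """Multiply-adds for a depth-wise k x k convolution plus a 1 x 1 point-wise convolution."""
    depthwise = h * w * c_in * k * k   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1 x 1 convolution mixes the channels
    return depthwise + pointwise


# Hypothetical layer: 56 x 56 feature map, 64 -> 128 channels, 3 x 3 kernel.
std = standard_conv_madds(56, 56, 64, 128)
sep = separable_conv_madds(56, 56, 64, 128)
print(f"standard: {std:,}  separable: {sep:,}  ratio: {std / sep:.1f}x")
```

For 3 × 3 kernels, the saving approaches a factor of nine as the number of output channels grows, because the ratio works out to k² · c_out / (k² + c_out).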

The Basic Structure of MobileNetV2

The bottlenecks of MobileNetV2 encode the model’s intermediate inputs and outputs, while the inner layer encapsulates the model’s ability to transform from lower-level concepts such as pixels to higher-level descriptors such as image categories. As with traditional residual connections, the shortcuts enable faster training and better accuracy.
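
The block described above can be outlined as a sequence of three layers. The sketch below is a minimal, framework-free illustration and not Google's implementation. It assumes the commonly described layout of a 1 × 1 expansion, a 3 × 3 depth-wise convolution and a 1 × 1 linear projection; the function name, the expansion factor of 6, the ReLU6 activations and the channel counts are assumptions chosen for the example.

```python
def bottleneck_block(c_in, c_out, stride=1, expansion=6):
    """Describe one bottleneck block as (layer, in_channels, out_channels, activation)."""
    hidden = c_in * expansion  # width of the expanded inner layer
    layers = [
        ("1x1 expansion conv", c_in, hidden, "ReLU6"),
        ("3x3 depth-wise conv", hidden, hidden, "ReLU6"),
        # Linear bottleneck: no non-linearity after the projection.
        ("1x1 projection conv", hidden, c_out, None),
    ]
    # Weights only (no biases or batch norm): 1x1 convs cost in*out,
    # the depth-wise conv costs 9 weights per channel.
    params = c_in * hidden + 9 * hidden + hidden * c_out
    # A shortcut can only add input to output when their shapes match.
    use_shortcut = stride == 1 and c_in == c_out
    return layers, params, use_shortcut


layers, params, use_shortcut = bottleneck_block(c_in=32, c_out=32)
for name, cin, cout, act in layers:
    print(f"{name:22s} {cin:4d} -> {cout:4d}  activation: {act or 'none (linear)'}")
print("weights:", params, "| shortcut:", use_shortcut)
```

Note that the shortcut joins the two narrow bottleneck ends of the block rather than the wide inner layer, which is what keeps the tensors carried between blocks small.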

Model Architecture

The basic building block is a bottleneck depth-separable convolution with residuals. The architecture of MobileNetV2 contains an initial fully convolutional layer with 32 filters, followed by 19 residual bottleneck layers. The researchers have tailored the architecture to different performance points by using the input image resolution and the width multiplier as tunable hyperparameters, which can be adjusted depending on the desired trade-off between accuracy and performance. The primary network (width multiplier 1, 224 × 224) has a computational cost of 300 million multiply-adds (MAdds) and uses 3.4 million parameters. Across configurations, the computational cost ranges from 7M to 585M MAdds, while the model size varies between 1.7M and 6.9M parameters.
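
As a rough illustration of how the two hyperparameters trade accuracy for cost, the sketch below scales the baseline figure of 300 million multiply-adds. It assumes that cost grows approximately with the square of the width multiplier and the square of the input resolution, which is a simplification (the first and last layers do not scale exactly this way). The function name and the example settings are assumptions for illustration only.

```python
BASE_MADDS = 300e6      # baseline cost quoted above (width multiplier 1, 224 x 224)
BASE_RESOLUTION = 224


def estimate_madds(width_multiplier, resolution):
    """Rough multiply-add estimate under a quadratic scaling assumption."""
    width_scale = width_multiplier ** 2
    resolution_scale = (resolution / BASE_RESOLUTION) ** 2
    return BASE_MADDS * width_scale * resolution_scale


# Illustrative settings only; real models deviate from this approximation.
for alpha, res in [(0.5, 128), (1.0, 224), (1.4, 224)]:
    madds = estimate_madds(alpha, res)
    print(f"width {alpha}, {res} x {res}: ~{madds / 1e6:.0f}M multiply-adds")
```

Under this approximation, halving either the width multiplier or the input resolution cuts the estimated cost to about a quarter, which is why the two knobs cover such a wide range of performance points.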

How Is It Different From MobileNetV1?

The MobileNetV2 models are much faster than their MobileNetV1 counterparts. They use 2 times fewer operations, need 30 percent fewer parameters and are about 30-40 percent faster on a Google Pixel phone, all while achieving higher accuracy.

To enable on-device semantic segmentation, the researchers used MobileNetV2 as a feature extractor in a reduced form of DeepLabv3 that controls the resolution of computed feature maps. On the semantic segmentation benchmark PASCAL VOC 2012, MobileNetV2 performed similarly to MobileNetV1 as a feature extractor, but the V2 version requires 5.3 times fewer parameters and 5.2 times fewer operations in terms of multiply-adds.

On A Concluding Note

The new version of MobileNet has several properties that make it suitable for mobile applications: it allows very memory-efficient inference and relies on standard operations present in all neural network frameworks. For the ImageNet dataset, MobileNetV2 improves the state of the art for a wide range of performance points. For the object detection task, it outperforms real-time detectors on the COCO dataset. MobileNetV2 provides a very efficient mobile-oriented model that can be used as a base for many visual recognition tasks, claims Google.

Smita Sinha
I have over three years of experience in editing and reporting. My career in journalism began with The Economic Times. When I am not busy, I read and binge-watch web series.
