Tech Behind Google’s New CNN, EfficientNetV2

EfficientNetV2 can train up to 11x faster than prior models, while being up to 6.8x smaller in parameter size.

Recently, Google introduced a family of convolutional networks known as EfficientNetV2. According to its developers, the EfficientNetV2 models significantly outperform previous models on the ImageNet and CIFAR/Cars/Flowers datasets.

There are many existing techniques to improve training efficiency. For example, ResNet-RS improves training efficiency by optimising the scaling hyperparameters, and Vision Transformers improve training efficiency on large-scale datasets by using Transformer blocks. However, these techniques often come with expensive overhead in parameter size. This is why Google released this new family of convolutional networks.

EfficientNetV2 vs EfficientNet

EfficientNetV2 is the successor of EfficientNets. Introduced in 2019, EfficientNet is a family of models optimised for FLOPs and parameter efficiency. It leverages neural architecture search to find the baseline EfficientNet-B0 model, which offers a better trade-off between accuracy and FLOPs.
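For background, EfficientNet grows this baseline into the larger B1–B7 models with a compound scaling rule that increases depth, width, and input resolution together. The sketch below is a rough illustration, assuming the coefficients α = 1.2, β = 1.1, γ = 1.15 reported in the EfficientNet paper; the helper name and defaults are hypothetical, not Google's implementation.

```python
# Minimal sketch of EfficientNet-style compound scaling.
# For a user-chosen coefficient phi, depth, width and resolution are
# scaled together as alpha**phi, beta**phi and gamma**phi, with the
# coefficients chosen so total FLOPs grow roughly by 2**phi
# (FLOPs scale with depth, width**2 and resolution**2).

ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # coefficients from the EfficientNet paper

def compound_scale(phi: float, base_resolution: int = 224):
    """Return (depth multiplier, width multiplier, input resolution)."""
    depth_mult = ALPHA ** phi                          # more layers per stage
    width_mult = BETA ** phi                           # more channels per layer
    resolution = int(base_resolution * GAMMA ** phi)   # larger input images
    return depth_mult, width_mult, resolution

for phi in range(4):  # roughly B0 -> B3
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, input {r}px")
```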

EfficientNetV2 overcomes some of the training bottlenecks in EfficientNet, such as:

  • Training with enormous image sizes is slow: EfficientNet’s large image sizes result in significant memory usage. As the total memory on a GPU or TPU is fixed, the researchers had to train the EfficientNet models with a smaller batch size, which slows down training.
  • Depthwise convolutions are slow in early layers: Another training bottleneck of EfficientNet comes from its extensive use of depthwise convolutions. Depthwise convolutions have fewer parameters and FLOPs than regular convolutions (see the sketch after this list), but they often cannot fully utilise modern accelerators.
  • Equally scaling up every stage is sub-optimal: EfficientNet scales up all stages equally using a simple compound scaling rule. However, these stages do not contribute equally to training speed and parameter efficiency.
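To make the depthwise-convolution trade-off concrete, here is a back-of-the-envelope comparison of parameter and FLOP counts for a regular versus a depthwise 3x3 convolution. The layer shape used (64 channels on a 56x56 feature map) is a hypothetical early-layer example, not a figure from the paper.

```python
# Rough cost comparison: regular vs depthwise 3x3 convolution
# on the same feature map (bias terms ignored).

def conv_cost(c_in, c_out, k, h, w, depthwise=False):
    """Return (parameter count, multiply-accumulate FLOPs)."""
    if depthwise:
        params = c_in * k * k          # one k x k filter per channel
    else:
        params = c_in * c_out * k * k  # filters mix all input channels
    flops = params * h * w             # each weight used at every spatial position
    return params, flops

channels, side = 64, 56  # hypothetical early-layer shape
for name, dw in [("regular 3x3", False), ("depthwise 3x3", True)]:
    p, f = conv_cost(channels, channels, 3, side, side, depthwise=dw)
    print(f"{name}: {p:,} params, {f:,} FLOPs")
```

The depthwise variant needs roughly 64x fewer FLOPs here, yet, as noted above, it often cannot keep modern accelerators busy, so the wall-clock speedup is much smaller than the FLOP count suggests.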

Based on these observations, the researchers designed a search space enriched with additional ops such as Fused-MBConv, and applied training-aware NAS and scaling to jointly optimise model accuracy, training speed, and parameter size. In addition, because EfficientNets aggressively scale up image size, leading to large memory consumption and slow training, the researchers slightly modified the scaling rule and restricted the maximum image size to a smaller value.
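The structural difference between the two block types can be sketched in PyTorch. This is a simplified, assumed layout for illustration only: squeeze-and-excitation, stochastic depth, and the exact stride/skip logic used in the paper are omitted.

```python
import torch
from torch import nn

def conv_bn_act(c_in, c_out, k, groups=1):
    """Convolution + BatchNorm + SiLU; padding keeps the spatial size."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, padding=k // 2, groups=groups, bias=False),
        nn.BatchNorm2d(c_out),
        nn.SiLU(),
    )

class MBConv(nn.Module):
    """1x1 expansion -> 3x3 depthwise conv -> 1x1 linear projection."""
    def __init__(self, c, expand=4):
        super().__init__()
        mid = c * expand
        self.block = nn.Sequential(
            conv_bn_act(c, mid, 1),                # pointwise expansion
            conv_bn_act(mid, mid, 3, groups=mid),  # depthwise 3x3
            nn.Conv2d(mid, c, 1, bias=False),      # linear projection
            nn.BatchNorm2d(c),
        )

    def forward(self, x):
        return x + self.block(x)  # residual connection

class FusedMBConv(nn.Module):
    """A single regular 3x3 conv replaces the expansion + depthwise pair."""
    def __init__(self, c, expand=4):
        super().__init__()
        mid = c * expand
        self.block = nn.Sequential(
            conv_bn_act(c, mid, 3),            # fused regular 3x3 conv
            nn.Conv2d(mid, c, 1, bias=False),  # linear projection
            nn.BatchNorm2d(c),
        )

    def forward(self, x):
        return x + self.block(x)

x = torch.randn(1, 24, 56, 56)
print(MBConv(24)(x).shape, FusedMBConv(24)(x).shape)
```

The fused block trades more parameters and FLOPs for a single dense convolution that accelerators execute efficiently, which is why the searched architecture uses Fused-MBConv only in the early stages.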

Tech behind EfficientNetV2

Deep learning models and training datasets are getting increasingly large, so training efficiency plays an important role. For instance, GPT-3, with its unprecedented model and training data sizes, demonstrates remarkable few-shot learning capability; however, it requires weeks of training with thousands of GPUs, making it difficult to retrain or improve.

To develop this model, the researchers used a combination of training-aware neural architecture search (NAS) and scaling to jointly optimise training speed and parameter efficiency.
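Concretely, the paper describes the search reward as a simple weighted product of the model accuracy A, the normalised training step time S, and the parameter size P, with exponents w = -0.07 and v = -0.05. A minimal sketch of that reward, with hypothetical candidate models:

```python
# Training-aware NAS reward: A * S**w * P**v, following the
# weighted product described in the EfficientNetV2 paper.

def search_reward(accuracy: float, step_time: float, params: float,
                  w: float = -0.07, v: float = -0.05) -> float:
    """Higher is better: rewards accuracy, penalises slow and large models."""
    return accuracy * (step_time ** w) * (params ** v)

# Hypothetical candidates: (top-1 accuracy, relative step time, params in M).
candidates = {
    "candidate_a": (0.83, 1.0, 24.0),
    "candidate_b": (0.84, 1.4, 55.0),  # slightly more accurate, but slower and larger
}
for name, (a, s, p) in candidates.items():
    print(name, round(search_reward(a, s, p), 4))
```

Because the exponents are small and negative, accuracy dominates the reward, but a candidate that is much slower or larger is penalised even if it is slightly more accurate.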

Contributions  

  • The researchers introduced EfficientNetV2, a new family of smaller and faster models that outperforms previous models in training speed and parameter efficiency.
  • They proposed an improved method of progressive learning, which adaptively adjusts regularisation along with image size, and showed that it speeds up training and simultaneously improves accuracy (sketched after this list).
  • They demonstrated up to 11x faster training speed and up to 6.8x better parameter efficiency on the ImageNet, CIFAR, Cars, and Flowers datasets.
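As a rough illustration of the improved progressive learning mentioned above, the sketch below linearly interpolates both the training image size and one regularisation knob (dropout rate, standing in for the paper’s combined regularisation settings) across training stages. The stage count and value ranges are illustrative assumptions, not the paper’s exact settings.

```python
# Sketch of a progressive learning schedule: early stages train on
# small images with weak regularisation, later stages on large
# images with strong regularisation.

def progressive_schedule(num_stages: int,
                         size_range=(128, 300),
                         dropout_range=(0.1, 0.3)):
    """Yield (stage index, image size, dropout rate) per training stage."""
    (s0, s1), (d0, d1) = size_range, dropout_range
    for i in range(num_stages):
        frac = i / max(num_stages - 1, 1)  # 0.0 -> 1.0 across stages
        yield i, int(s0 + (s1 - s0) * frac), d0 + (d1 - d0) * frac

for stage, size, drop in progressive_schedule(4):
    print(f"stage {stage}: {size}x{size} images, dropout {drop:.2f}")
```

The intuition is that small images carry less information and therefore need less regularisation; increasing both together avoids the accuracy drop that naive progressive resizing causes.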

Wrapping up

EfficientNets use NAS to construct a baseline network and “compound scaling” to jointly scale up the network’s depth, width, and input resolution. Training can be accelerated by progressively increasing the image size during training, but doing so naively leads to a drop in accuracy. To make up for this accuracy drop, the researchers proposed an improved method of progressive learning, which adaptively adjusts regularisation along with image size. Pretrained on the same ImageNet21k, EfficientNetV2 achieves 87.3% top-1 accuracy on ImageNet ILSVRC2012, outperforming the recent ViT by 2.0% accuracy while training 5x-11x faster using the same computing resources.

Ambika Choudhury

A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.