How Important Are The Early Phases of Neural Network Training?

In yet another effort to shed the image of black box models for machine learning, a group of researchers probed something fundamental this time – what happens at the initial stages of training and how effective is pre-training?

Historically, most work has focused on what happens during the later stages of training, while the initial phase has received far less attention.

To explore the initial phase of training, researchers from Facebook AI and MIT CSAIL collaborated on a unified framework for understanding it. To do this, they employed iterative magnitude pruning (IMP) with rewinding.



The authors built on previous results showing that rewinding the weights of a pruned network to their values from early in the training of the unpruned model, rather than to their initial values, leads to better performance on deeper networks such as ResNets.

In other words, this suggests that the changes the network undergoes during this initial phase are vital for successfully training small, pruned networks. If these claims hold, the implications are many: smaller models, shorter training times, more ML-friendly edge devices, and more explainable predictions.

This approach provides a simple scheme for measuring the importance of the weights at an early stage of training within an actionable and causal framework.

Overview Of The Approach

For experiments, the authors considered ResNet-20 and tracked the changes during the earliest phase of training by specifically focusing on the first 4,000 iterations (10 epochs). 

The procedure involves pruning 20% of weights and rewinding the remaining weights to their values from an earlier iteration during the pre-pruning training run. This process is then iterated.
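The pruning-and-rewinding loop described above can be sketched in a few lines. This is a minimal NumPy sketch, not the authors' implementation: `train_fn` is a hypothetical stand-in for a full training run that returns per-iteration weight snapshots, and the 20% pruning fraction follows the article.

```python
import numpy as np

def magnitude_prune(weights, mask, fraction=0.2):
    """Zero out the lowest-magnitude surviving weights (mask is boolean)."""
    surviving = np.abs(weights[mask])
    k = int(len(surviving) * fraction)
    if k == 0:
        return mask
    threshold = np.sort(surviving)[k - 1]
    return mask & (np.abs(weights) > threshold)

def imp_with_rewinding(train_fn, init_weights, rewind_iter, rounds=3):
    """Iterative magnitude pruning (IMP) with rewinding.

    `train_fn(weights, mask)` is a hypothetical interface assumed to run a
    full training pass and return a list of per-iteration weight snapshots.
    """
    mask = np.ones_like(init_weights, dtype=bool)
    weights = init_weights.copy()
    for _ in range(rounds):
        snapshots = train_fn(weights, mask)          # train to completion
        mask = magnitude_prune(snapshots[-1], mask)  # prune 20% of survivors
        # Rewind survivors to their *early-training* values, not to init.
        weights = snapshots[rewind_iter] * mask
    return weights, mask
```

The key difference from the original lottery-ticket procedure is the rewind target: `snapshots[rewind_iter]` from early training rather than `snapshots[0]`.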

During the first 4,000 iterations of training, the authors observed three sub-phases:

  1. The first phase, which lasts only the initial few iterations, during which gradient magnitudes are enormous and the network changes rapidly. 
  2. The second phase, spanning roughly the next 500 iterations, during which performance improves quickly, weight magnitudes increase, sign differences from initialization rise sharply, and gradient magnitudes dip to a minimum before settling.
  3. A final phase in which all of these quantities continue to change in the same direction but begin to decelerate.
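The statistics behind these sub-phases (gradient magnitude, weight magnitude, and sign differences from initialization) could be logged each iteration with something like the sketch below; the function name and the exact aggregation are assumptions, not the paper's code.

```python
import numpy as np

def early_phase_stats(weights, grads, init_weights):
    """Per-iteration statistics tracked over the early phase of training."""
    return {
        # huge in the first few iterations, then dips to a minimum
        "grad_magnitude": float(np.mean(np.abs(grads))),
        # grows through the second sub-phase
        "weight_magnitude": float(np.mean(np.abs(weights))),
        # fraction of weights whose sign differs from initialization
        "sign_flips": float(np.mean(np.sign(weights) != np.sign(init_weights))),
    }
```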

It is unclear, however, to what extent various aspects of the data distribution are necessary: notably, whether the change in weights during the early phase of training depends on p(x) or on p(y|x).

To investigate this, the authors pre-trained the network with techniques that ignore labels entirely (self-supervised pre-training), with random labels, or with blurred training examples. These experiments were done on the CIFAR-10 dataset.
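The two data ablations can be mocked up as below. This is a hedged sketch, not the authors' pipeline: `randomize_labels` destroys p(y|x), while a simple box blur (a stand-in for whatever blurring the authors applied) degrades p(x).

```python
import numpy as np

rng = np.random.default_rng(0)

def randomize_labels(labels, num_classes=10):
    """Replace every label with a uniform random one (destroys p(y|x))."""
    return rng.integers(0, num_classes, size=len(labels))

def blur_images(images, k=3):
    """Box-blur (N, H, W) images with a k x k mean filter (degrades p(x))."""
    pad = k // 2
    padded = np.pad(images, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    out = np.zeros(images.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[:, dy:dy + images.shape[1], dx:dx + images.shape[2]]
    return out / (k * k)
```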

The results show that pre-training on random labels provides no improvement over rewinding, and that pre-training for longer begins to hurt accuracy. Blurring the training examples, meanwhile, makes the IMP approach underperform regardless of how much pre-training is provided.

Key Takeaways

The contributions of this work can be summarised as follows:

  • An in-depth summary of learning over the early part of training, with an overview of various statistics
  • Evidence that deeper networks are not robust to reinitialization with random weights
  • Evidence that the distribution of weights after the early phase of training is already highly non-i.i.d. (independent and identically distributed)
  • A measurement of how dependent the early phase of training is on the data

The authors conclude that the weights are highly non-independent at the rewinding point. They claim that the weights at this point cannot be easily approximated, and that any approach aiming to skip directly to the rewinding point is unlikely to succeed. However, rewinding may not be necessary if networks are pre-trained appropriately.
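One way to probe this non-independence, in the spirit of the paper's ablations, is to permute a layer's weights: the permutation preserves the layer's marginal weight distribution but destroys any per-weight structure. A minimal sketch (not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)

def shuffle_within_layer(weights):
    """Randomly permute a layer's weights.

    The permutation preserves the layer's marginal weight distribution; if
    rewinding to shuffled weights hurts accuracy, the specific per-weight
    values matter, i.e. the weights are not i.i.d. draws from that marginal.
    """
    flat = weights.ravel().copy()
    rng.shuffle(flat)
    return flat.reshape(weights.shape)
```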

Know more here.


Ram Sagar
I have a master's degree in Robotics and I write about machine learning advancements.
