Since the AlexNet paper burst onto the computer vision scene, machine learning results have improved greatly thanks to deeper, more complex models, increased computational power and the availability of large-scale labelled data.
There is still room to refine neural networks, as they can end up using brute force for lightweight tasks. To address this, researchers at Google have come up with RigL, an algorithm for training sparse neural networks that uses a fixed parameter count and computational cost throughout training, without sacrificing accuracy.
So, what is sparse in the context of neural networks?
The connections between two layers of neurons in a network are represented by a weight matrix. Each entry in the matrix can be thought of as the strength of the connection between two neurons. A matrix in which most entries are 0 is called a sparse matrix. When a matrix is large and sparse, only the non-zero entries need to be stored and computed on, which makes both storage and computation more efficient. Neural networks can leverage this efficiency by assuming most connection weights are equal to 0.
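To make the storage argument concrete, here is a minimal sketch in plain Python. The helper names `to_sparse` and `sparse_matvec` are hypothetical and chosen for illustration only; real frameworks use more compact sparse formats.

```python
# Illustrative only: keep just the non-zero weights of a small layer.
# `to_sparse` and `sparse_matvec` are hypothetical helper names.

def to_sparse(dense):
    """Return {(row, col): weight} for the non-zero entries of a dense matrix."""
    return {
        (i, j): w
        for i, row in enumerate(dense)
        for j, w in enumerate(row)
        if w != 0.0
    }


def sparse_matvec(sparse, x, n_rows):
    """Multiply the sparse matrix by vector x, touching only stored weights."""
    y = [0.0] * n_rows
    for (i, j), w in sparse.items():
        y[i] += w * x[j]
    return y


dense = [
    [0.0, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, -1.0],
    [2.0, 0.0, 0.0, 0.0],
]
sparse = to_sparse(dense)
x = [1.0, 2.0, 3.0, 4.0]
y = sparse_matvec(sparse, x, n_rows=len(dense))

print(len(sparse), "stored weights instead of", len(dense) * len(dense[0]))
print(y)
```

Only 3 of the 12 entries are stored, and the matrix-vector product performs 3 multiplications instead of 12.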
In a typical neural network, every neuron on a given layer is connected to every neuron on the subsequent layer. This means that two adjacent layers of n neurons each have n^2 connections between them. As networks get larger, the number of connections grows, which in turn increases the size of the model. And, if you want this model to run on your smartphone to do some image processing on your photos in real-time, it might not work smoothly.
So what can be done? Researchers suggest making the network sparse, eliminating redundant connections. That’s where the RigL algorithm comes into the picture.
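As a rough illustration of how quickly the connection count grows, and how much a sparse layer saves, consider the following sketch. The layer size of 1,000 and the 90% sparsity level are assumptions picked for the example, not figures from the RigL work.

```python
# Illustrative arithmetic only; the sizes and sparsity level are assumptions.

def dense_connections(n_in, n_out):
    """Connections in a fully connected layer pair."""
    return n_in * n_out


def sparse_connections(n_in, n_out, sparsity_percent):
    """Connections kept when `sparsity_percent` of the weights are fixed at zero."""
    return n_in * n_out * (100 - sparsity_percent) // 100


n = 1000
full = dense_connections(n, n)
kept = sparse_connections(n, n, sparsity_percent=90)

print(full)  # 1000000
print(kept)  # 100000
```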
Working Of RigL
The RigL algorithm identifies which neurons should be active during training, which helps the optimisation process to utilise the most relevant connections and results in better sparse solutions. Trained with RigL, the sparse network learns to focus on the center of the images, discarding the uninformative pixels from the edges.
The RigL method starts with a network initialised with a random sparse topology. At regularly spaced intervals, it removes a fraction of the connections with the smallest weight magnitudes. RigL then activates new connections using instantaneous gradient information, i.e., without using past gradient information. It activates the connections with large gradients, since these connections are expected to decrease the loss most quickly. After updating the connectivity, training continues with the updated network until the next scheduled update.
In short, RigL begins with a random sparse initialisation of the network. It then trains the network and trims out the connections with the smallest weight magnitudes. Based on the gradients calculated for the new configuration, it grows new connections and trains again, repeating the cycle.
By changing the connectivity of the neurons dynamically during training, RigL helps the optimiser find better solutions.
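The drop-and-grow cycle can be sketched in a few lines of plain Python. This is a simplified illustration, not the authors' implementation: the function name `rigl_update`, the flat-list representation, the fixed `drop_fraction` and the choice to start newly grown connections at zero are all assumptions made for the example.

```python
# A simplified sketch of one connectivity update, under the assumptions
# stated above. Weights, gradients and the mask are flat lists of equal length.

def rigl_update(weights, grads, mask, drop_fraction):
    """Drop the weakest active connections and grow the same number of
    inactive connections with the largest gradient magnitude."""
    active = [i for i, m in enumerate(mask) if m]
    inactive = [i for i, m in enumerate(mask) if not m]
    k = min(int(drop_fraction * len(active)), len(inactive))

    # Drop: the k active connections with the smallest weight magnitude.
    dropped = sorted(active, key=lambda i: abs(weights[i]))[:k]
    # Grow: the k previously inactive connections with the largest gradient.
    grown = sorted(inactive, key=lambda i: abs(grads[i]), reverse=True)[:k]

    new_weights = list(weights)
    new_mask = list(mask)
    for i in dropped:
        new_mask[i] = False
        new_weights[i] = 0.0
    for i in grown:
        new_mask[i] = True
        new_weights[i] = 0.0  # assumption: new connections start at zero
    return new_weights, new_mask


weights = [0.9, 0.0, -0.05, 0.0, 0.4, 0.0]
mask = [True, False, True, False, True, False]
grads = [0.1, 0.7, 0.2, -1.5, 0.3, 0.05]

new_weights, new_mask = rigl_update(weights, grads, mask, drop_fraction=0.5)
print(new_mask)                    # [True, False, False, True, True, False]
print(sum(mask) == sum(new_mask))  # True
```

The number of active connections is the same before and after the update, which is what keeps the parameter count fixed throughout training. In a real training loop, this update would run only at the scheduled intervals, with ordinary gradient steps on the active weights in between.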
Combining RigL with sparse primitives to train extremely large sparse models remains to be explored, due to the lack of hardware and software support for sparsity. The authors believe that the performance of sparse networks will continue to improve on current hardware and on new types of hardware accelerators, which are expected to have better support for parameter sparsity.
Google’s New Algorithm ‘RigL’ Can
- Improve the accuracy of sparse models intended for deployment.
- Improve the accuracy of large sparse models that can only be trained for a limited number of iterations.
- Combine with sparse primitives to enable training of extremely large sparse models which otherwise would not be possible.
Read the original paper here.