# Google Introduces A New Algorithm For Training Sparse Neural Networks

Since the AlexNet paper burst onto the computer vision scene, machine learning results have improved greatly due to deeper models with high complexity, increased computational power and the availability of large-scale labelled data.

Room to refine neural networks still exists, as they sometimes fumble and end up using brute force for lightweight tasks. To address this, researchers at Google have come up with RigL, an algorithm for training sparse neural networks that uses a fixed parameter count and computational cost throughout training, without sacrificing accuracy.

So, what is sparse in the context of neural networks?

Each layer of neurons in a network is represented by a matrix. Each entry in the matrix can be thought of as the strength of the connection between two neurons. A matrix in which most entries are 0 is called a sparse matrix. When a matrix is large and sparse, storing only its nonzero entries makes both storage and computation more efficient. Neural networks can leverage this efficiency by assuming most connection weights are equal to 0.
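The storage saving can be sketched in a few lines of NumPy (a minimal illustration, not the paper's code): for a weight matrix where only ~1% of connections are active, we keep three small coordinate arrays instead of a million floats.

```python
import numpy as np

# A 1000x1000 weight matrix where roughly 1% of connections are kept.
rng = np.random.default_rng(0)
dense = rng.normal(size=(1000, 1000))
mask = rng.random((1000, 1000)) < 0.01   # ~1% of entries survive
sparse_weights = dense * mask            # everything else is exactly 0

# Coordinate (COO) representation: store only the nonzero entries
# as (row, col, value) triples.
rows, cols = np.nonzero(sparse_weights)
values = sparse_weights[rows, cols]

print("dense entries :", dense.size)     # 1,000,000 floats
print("stored entries:", len(values))    # roughly 10,000 triples
```

Libraries such as `scipy.sparse` implement the same idea with optimised formats (COO, CSR) and sparse matrix products.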

In a typical neural network, every neuron in a given layer is connected to every neuron in the subsequent layer. A pair of layers with n neurons each therefore requires n^2 connections. As networks grow wider, the number of connections grows quadratically, which in turn inflates the size of the model. And if you want this model to run on your smartphone to do some image processing on your photos in real time, it might not work smoothly.
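The quadratic growth, and what sparsity buys back, is easy to see with a quick back-of-the-envelope calculation (a hypothetical illustration; biases are ignored):

```python
def dense_params(n: int) -> int:
    """Connections between two fully connected layers of n neurons each."""
    return n * n

def sparse_params(n: int, sparsity: float) -> int:
    """Connections remaining after pruning a fraction `sparsity` of them."""
    return int(n * n * (1 - sparsity))

# Doubling the width quadruples the dense connection count;
# 90% sparsity cuts each count by an order of magnitude.
for n in (256, 1024, 4096):
    print(n, dense_params(n), sparse_params(n, 0.9))
```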

So what can be done? Researchers suggest making the network sparse, eliminating redundant connections. That’s where the RigL algorithm comes into the picture.

### Working Of RigL

The RigL algorithm identifies which neurons should be active during training, which helps the optimisation process to utilise the most relevant connections and results in better sparse solutions. Trained with RigL, the sparse network learns to focus on the center of the images, discarding the uninformative pixels from the edges.

The RigL method starts with a network initialised with a random sparse topology. At regularly spaced intervals, it removes the fraction of active connections with the smallest weight magnitudes. It then activates new connections with the largest gradient magnitudes, since these connections are expected to decrease the loss most quickly; only instantaneous gradient information is used, with no past gradients stored. After the connectivity update, training continues with the updated network until the next scheduled update.
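A single drop-and-grow step can be sketched as follows. This is a simplified NumPy illustration of the update rule described above, not the authors' implementation; the function name and arguments are assumptions for the example.

```python
import numpy as np

def rigl_update(weights, mask, grad, drop_fraction=0.3):
    """One simplified RigL connectivity update on a single weight matrix.

    weights: dense array of weight values
    mask:    0/1 array marking active connections
    grad:    dense gradient of the loss w.r.t. the weights at this step
    """
    n_active = int(mask.sum())
    n_drop = int(drop_fraction * n_active)
    if n_drop == 0:
        return weights * mask, mask.copy()

    # Drop: deactivate the active connections with the smallest magnitudes.
    magnitudes = np.where(mask != 0, np.abs(weights), np.inf)
    drop_idx = np.argsort(magnitudes, axis=None)[:n_drop]
    new_mask = mask.copy().ravel()
    new_mask[drop_idx] = 0

    # Grow: activate the inactive connections with the largest instantaneous
    # gradient magnitudes (expected to decrease the loss most quickly).
    grad_mag = np.where(new_mask.reshape(mask.shape) != 0, -np.inf, np.abs(grad))
    grow_idx = np.argsort(grad_mag, axis=None)[-n_drop:]
    new_mask[grow_idx] = 1

    # Freshly grown connections start at zero.
    new_weights = weights.ravel().copy()
    new_weights[grow_idx] = 0.0
    return (new_weights * new_mask).reshape(weights.shape), new_mask.reshape(mask.shape)

# Toy usage on an 8x8 layer with ~50% sparsity.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))
M = (rng.random((8, 8)) < 0.5).astype(float)
G = rng.normal(size=(8, 8))
W2, M2 = rigl_update(W, M, G)
```

Note that the number of dropped and grown connections is equal, so the parameter count stays fixed throughout training, which is the property the paper emphasises.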

RigL thus begins with a random sparse initialisation of the network. It trains the network and trims out the connections with the smallest weight magnitudes. Based on the gradients calculated for the new configuration, it grows new connections and trains again, repeating the cycle.

By changing the connectivity of the neurons dynamically during training, RigL helps optimise to find better solutions.

According to the authors, RigL is useful in three different scenarios:

• Improving the accuracy of sparse models intended for deployment.
• Improving the accuracy of large sparse models that can only be trained for a limited number of iterations.
• Combining with sparse primitives to enable training of extremely large sparse models, which would otherwise not be possible.

The third scenario remains largely unexplored due to the lack of hardware and software support for sparsity. The authors believe the performance of sparse networks will continue to improve, both on current hardware and on new types of hardware accelerators, which are expected to have better support for parameter sparsity.
