6 Techniques From Leading AI Scientists To Optimise Deep Neural Networks

Inspired by the human brain, Artificial Neural Networks (ANNs) are now being used by enterprises across the globe to solve complex computing tasks such as speech recognition, computer vision and stock market prediction.

In this article, we list six techniques that can be used to optimise deep neural networks.

1| Stochastic Gradient Descent

Backpropagation, or the backward propagation of errors, is one of the most common and popular techniques for training neural networks. It searches for the minimum of the error function in weight space using a technique known as gradient descent. The basic idea behind stochastic approximation can be traced back to the paper "A Stochastic Approximation Method" by Herbert Robbins and Sutton Monro, published in 1951.

Stochastic Gradient Descent (SGD) is a variant of gradient descent and one of the most popular iterative methods for optimising an objective function with suitable smoothness properties in a deep neural network. SGD replaces the actual gradient, computed from the full dataset, with an estimate computed from randomly selected data; in its purest form, it uses a single sample for each iteration.
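
To make the idea concrete, here is a minimal sketch of single-sample SGD on a toy least-squares problem, written in NumPy; the data, learning rate and number of epochs are assumptions made purely for illustration.

import numpy as np

# Toy data: y = 3*x + 2 plus noise (assumed purely for illustration)
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = 3.0 * X[:, 0] + 2.0 + 0.1 * rng.normal(size=200)

w, b = 0.0, 0.0          # model parameters
lr = 0.05                # learning rate (arbitrary choice)

for epoch in range(20):
    for i in rng.permutation(len(X)):      # visit samples in random order
        pred = w * X[i, 0] + b
        err = pred - y[i]                  # gradient of 0.5 * err**2 w.r.t. pred
        w -= lr * err * X[i, 0]            # single-sample parameter update
        b -= lr * err

print(f"w = {w:.2f}, b = {b:.2f}")         # should approach 3 and 2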

2| Limited memory BFGS (L-BFGS) & Conjugate gradient (CG)

Limited-memory BFGS (L-BFGS) and conjugate gradient (CG) are batch methods that can simplify and speed up the pretraining of deep networks. They were investigated by researchers Quoc V. Le, Jiquan Ngiam, Adam Coates, Abhik Lahiri, Bobby Prochnow and Andrew Y. Ng at Stanford University as a way to mitigate the tuning and parallelisation issues of the stochastic gradient descent technique.

L-BFGS is highly competitive with, and sometimes superior to, SGD and CG for low-dimensional problems where the number of parameters is relatively small, for instance convolutional models. For high-dimensional problems, on the other hand, CG is more competitive and usually outperforms both L-BFGS and stochastic gradient descent.
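
As a rough illustration (not code from the Stanford paper), the snippet below minimises a small logistic-regression loss with SciPy's L-BFGS and CG solvers; the synthetic data and the regularisation constant are assumptions.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
true_w = rng.normal(size=10)
y = (X @ true_w + 0.5 * rng.normal(size=500) > 0).astype(float)

def loss_and_grad(w, lam=1e-2):
    # Regularised logistic loss and its gradient, over the full batch
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12)) + lam * (w @ w)
    grad = X.T @ (p - y) / len(y) + 2 * lam * w
    return loss, grad

w0 = np.zeros(10)
for method in ("L-BFGS-B", "CG"):
    res = minimize(loss_and_grad, w0, jac=True, method=method)
    print(method, "final loss:", round(res.fun, 4))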

3| Mini-Batch Gradient Descent

Mini-batch gradient descent is a variant of the gradient descent algorithm that splits the training dataset into small batches. It reduces the variance of the parameter updates, which can lead to more stable convergence, and it can exploit the highly optimised matrix operations found in state-of-the-art deep learning libraries, which makes computing the gradient with respect to a mini-batch very efficient. Mini-batch gradient descent is typically the algorithm of choice when training a neural network, and the term SGD is usually employed even when mini-batches are used.
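
A minimal NumPy sketch of such a mini-batch loop follows; the batch size of 32, the learning rate and the toy data are arbitrary assumptions, not values prescribed by the method.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
true_w = np.array([1.0, -2.0, 0.5, 3.0, 0.0])
y = X @ true_w + 0.1 * rng.normal(size=1000)

w = np.zeros(5)
lr, batch_size = 0.1, 32                      # assumed hyperparameters

for epoch in range(50):
    order = rng.permutation(len(X))           # reshuffle the data every epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        Xb, yb = X[idx], y[idx]
        grad = Xb.T @ (Xb @ w - yb) / len(idx)    # mean-squared-error gradient over the batch
        w -= lr * grad

print(np.round(w, 2))                         # close to true_w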

4| Weight Initialization

Weight initialisation helps optimise deep neural networks by preventing layer activation outputs from exploding or vanishing during a forward pass through the network. Two basic schemes are zero weight initialisation and random weight initialisation. Zero weight initialisation was explored by Sarfaraz Masood and Pravin Chandra in their paper "Training Neural Network with Zero Weight Initialization".

In zero weight initialisation, all the weights are identical, so the activations of all hidden units are identical too, which makes the gradient with respect to each weight the same. In random weight initialisation, the weights are assigned small random values close to zero, which breaks this symmetry and gives better accuracy than zero initialisation.
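
For illustration, the snippet below contrasts zero initialisation with small random initialisation for a single dense layer; the 1/sqrt(fan_in) scaling is a common heuristic assumed here, not something prescribed by the paper cited above.

import numpy as np

rng = np.random.default_rng(0)
fan_in, fan_out = 256, 128

# Zero initialisation: every hidden unit computes the same function,
# so every weight receives the same gradient and symmetry is never broken.
W_zero = np.zeros((fan_in, fan_out))

# Random initialisation: small values close to zero break the symmetry.
# Scaling by 1/sqrt(fan_in) (a common heuristic) keeps the layer output
# from shrinking or blowing up as the signal passes through.
W_random = rng.normal(0.0, 1.0 / np.sqrt(fan_in), size=(fan_in, fan_out))

x = rng.normal(size=(32, fan_in))                  # a batch of inputs
print(np.std(x @ W_zero), np.std(x @ W_random))    # 0.0 versus roughly 1.0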

5| Synthetic Gradients

Researchers at Google's DeepMind developed synthetic gradients, an optimisation technique claimed to improve communication between the modules of a neural network. At each layer, the method uses the local activations to predict, or synthesise, the gradient for that layer, so the layer can be updated without waiting for the full backward pass. This can give quick and accurate results in complex neural network computations.
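
A heavily simplified NumPy sketch of the idea is given below; the linear layers, the linear synthesiser and all step sizes are assumptions made for illustration only. The first layer is updated with a predicted gradient, and the predictor is then regressed towards the true gradient once it becomes available.

import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid, d_out = 8, 16, 4
W1 = rng.normal(0, 0.1, (d_hid, d_in))     # first layer
W2 = rng.normal(0, 0.1, (d_out, d_hid))    # second layer
M = np.zeros((d_hid, d_hid))               # synthesiser: predicts dL/dh from the activation h
lr = 0.01

for step in range(2000):
    x = rng.normal(size=d_in)
    y = np.tanh(x[:d_out])                 # arbitrary regression target, for illustration only

    h = W1 @ x                             # forward pass through the first layer
    g_syn = M @ h                          # synthetic gradient: a prediction of dL/dh
    W1 -= lr * np.outer(g_syn, x)          # first layer updates without waiting for backprop

    y_hat = W2 @ h                         # forward pass through the second layer
    err = y_hat - y                        # dL/dy_hat for the squared-error loss
    W2 -= lr * np.outer(err, h)            # ordinary gradient update for the second layer

    g_true = W2.T @ err                    # the true dL/dh, available only after the second layer
    M -= lr * np.outer(g_syn - g_true, h)  # regress the synthesiser towards the true gradient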

6| Gradient Descent with Momentum

Gradient descent with momentum is used primarily to speed up the training of deep neural networks. It accelerates gradient descent by accumulating a velocity vector along directions of persistent reduction in the objective across iterations. However, if the input dataset is sparse, the performance of this method tends to be poor.
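
A minimal sketch of the classical momentum update in NumPy; the momentum coefficient of 0.9 and the learning rate are assumed, commonly used values, and the data is a toy regression problem.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
true_w = np.array([2.0, -1.0, 0.5, 0.0, 1.5])
y = X @ true_w + 0.1 * rng.normal(size=500)

w = np.zeros(5)
v = np.zeros(5)                           # velocity vector
lr, momentum = 0.05, 0.9                  # assumed hyperparameters

for step in range(300):
    grad = X.T @ (X @ w - y) / len(y)     # full-batch mean-squared-error gradient
    v = momentum * v - lr * grad          # accumulate velocity along persistent descent directions
    w += v                                # take the momentum step

print(np.round(w, 2))                     # close to true_w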
