What Are Activation Functions And When To Use Them

Action potentials in biological neurons can be thought of as activation functions in the case of neural networks. Just as any physical movement depends on the action potential at the neuron level, the path that fires in a network depends on the activation functions in the preceding layers.

Deep neural networks are trained by updating and adjusting the weights and biases of neurons, using the supervised back-propagation algorithm in conjunction with an optimisation technique such as stochastic gradient descent.

Each artificial neuron receives one or more input signals x1, x2, …, xm and outputs a value y to the neurons of the next layer. The output y is a nonlinear function of a weighted sum of the input signals. A neural network without an activation function would simply be a linear regression model; non-linearity is achieved by passing the linear sum through non-linear functions known as activation functions.
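
To make this concrete, here is a minimal sketch of a single artificial neuron in NumPy (the function and variable names are illustrative, not from any particular framework):

```python
import numpy as np

def sigmoid(z):
    # Squashes any real value into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(x, w, b, activation=sigmoid):
    # y = f(w . x + b): a nonlinear function of the weighted sum of inputs
    z = np.dot(w, x) + b      # linear weighted sum
    return activation(z)      # the non-linearity is what makes this more than linear regression

x = np.array([0.5, -1.2, 3.0])   # input signals x1, x2, x3
w = np.array([0.4, 0.6, -0.1])   # learned weights
b = 0.2                          # learned bias
print(neuron_output(x, w, b))
```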


Activation functions can broadly be divided into two types:

1. Linear Activation Function
2. Non-linear Activation Functions

ReLU, Sigmoid and Tanh are the three most popular non-linear activation functions used in deep learning architectures.

How Good Are Sigmoid And Tanh

The problem with using sigmoid is vanishing and exploding gradients. When neuron activations saturate close to either 0 or 1, the gradient at those points approaches zero, and when these small values are multiplied together during backpropagation, say in a recurrent neural network, the signal vanishes to zero. Added to this, the sigmoid output is not zero-centred: because the output is always positive, the gradients of the weights in a layer become either all positive or all negative, pushing the weight updates toward extremes in either direction. For these reasons, sigmoids are usually reserved for the last layers of a network.
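
A small numerical sketch makes the saturation visible; it uses the standard sigmoid derivative, sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # Derivative of sigmoid: sigma(z) * (1 - sigma(z)), at most 0.25 at z = 0
    s = sigmoid(z)
    return s * (1.0 - s)

for z in [0.0, 2.0, 5.0, 10.0]:
    print(f"z={z:5.1f}  sigmoid={sigmoid(z):.5f}  grad={sigmoid_grad(z):.6f}")
# As z grows, sigmoid saturates near 1 and the gradient shrinks toward 0.
# Multiplying many such gradients across layers or time steps drives the signal to zero.
```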

To avoid the problems faced with the sigmoid function, a hyperbolic tangent function (Tanh) is used.

Tanh gives results between -1 and 1 instead of 0 and 1, making it zero-centred and easier to optimise. But the vanishing gradient problem persists even in the case of Tanh.
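
The same sketch for Tanh shows the zero-centred outputs, while its derivative, 1 - tanh(z)^2, still vanishes for large |z|:

```python
import numpy as np

def tanh_grad(z):
    # Derivative of tanh: 1 - tanh(z)^2, equal to 1 at z = 0
    return 1.0 - np.tanh(z) ** 2

for z in [-3.0, -1.0, 0.0, 1.0, 3.0]:
    # Outputs are symmetric around 0 (zero-centred), unlike sigmoid...
    print(f"z={z:5.1f}  tanh={np.tanh(z):+.5f}  grad={tanh_grad(z):.6f}")
# ...but the gradient still shrinks toward 0 as |z| grows: vanishing gradients persist.
```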

Why ReLU

Rectified Linear Unit, or ReLU, is now one of the most widely used activation functions. The function computes max(0, x): anything less than zero is returned as 0, and the function is linear with a slope of 1 when the value is greater than 0. ReLU has also been reported to reach convergence rates around six times that of the Tanh function when applied to ImageNet classification.
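
In code, ReLU is a one-liner; a minimal NumPy sketch:

```python
import numpy as np

def relu(z):
    # max(0, z): zero for negative inputs, identity (slope 1) for positive inputs
    return np.maximum(0.0, z)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))   # [0.  0.  0.  0.5 2. ]
# The gradient is exactly 1 wherever z > 0, so it does not shrink
# the backpropagated signal the way saturating functions do.
```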

Learning with ReLU is faster, and it avoids the vanishing gradient problem. However, ReLU is used for the hidden layers, whereas a softmax function is used for the output layer in classification problems and a linear function in regression.
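
For illustration, here is a numerically stable softmax as it might appear in an output layer (a standard formulation, not tied to any particular framework):

```python
import numpy as np

def softmax(logits):
    # Subtracting the max is a standard trick to avoid overflow in exp()
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / np.sum(exps)   # outputs are positive and sum to 1

logits = np.array([2.0, 1.0, 0.1])   # raw scores from the last hidden layer
print(softmax(logits))               # ~[0.659 0.242 0.099] -- class probabilities
```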

The drawback of the ReLU function is its fragility: when a large gradient flows through a ReLU neuron, the weight update can leave the neuron unable to fire on any data point again for the rest of training. Leaky ReLU was introduced to address this problem.

So, unlike ReLU, where anything less than zero is returned as zero, the leaky version instead has a small negative slope. One more variant is the Maxout function, which is a generalisation of both ReLU and its leaky counterpart.
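
A sketch of both variants follows; the slope of 0.01 is a common default for leaky ReLU, and the two-input maxout shown here is a simplified illustration of the generalisation:

```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    # Small negative slope alpha keeps a gradient flowing when z < 0,
    # so the neuron can recover instead of "dying"
    return np.where(z > 0, z, alpha * z)

def maxout(z1, z2):
    # Maxout takes the max of several pre-activations; it generalises ReLU,
    # since relu(z) == maxout(z, 0) and leaky ReLU == maxout(z, alpha * z)
    return np.maximum(z1, z2)

z = np.array([-3.0, -1.0, 0.0, 2.0])
print(leaky_relu(z))        # [-0.03 -0.01  0.    2.  ]
print(maxout(z, 0.0 * z))   # recovers plain ReLU: [0. 0. 0. 2.]
```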

Based on its popularity in usage and its efficacy in the hidden layers, ReLU makes for the best choice in most cases.
