# Types Of Activation Functions In Neural Networks And The Rationale Behind Them

Activation functions are important for an Artificial Neural Network to learn and represent complex patterns. Their main purpose is to introduce non-linear properties into the network. A neuron calculates the ‘weighted sum’ of its inputs and adds a bias; the activation function then decides whether, and how strongly, to ‘fire’ that neuron. I’ll be explaining several kinds of non-linear activation functions: Sigmoid, Tanh, ReLU and Leaky ReLU. A non-linear activation function helps the model capture complexity that a purely linear model cannot, because a stack of linear layers without one collapses into a single linear transformation.

In this article, I describe some commonly used activation functions. There are many activation functions, and research is still ongoing to identify the best function for a given model. Once you understand the logic of your model, you can decide which activation to use.

### Sigmoid Function:

The sigmoid function is an activation function that squashes any real value into the range between 0 and 1. It is defined as

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

The above equation represents the sigmoid function. When we substitute the weighted sum for x, the output is scaled to lie between 0 and 1. Because of the exponential, the value never actually reaches 0 or 1. Large negative numbers are scaled towards 0 and large positive numbers are scaled towards 1.

In the above equation, as x goes to minus infinity, y goes to 0 (the neuron tends not to fire).
As x goes to infinity, y goes to 1 (the neuron tends to fire).
At x = 0, y = 1/2.
When a hard decision is needed, the threshold is set at 0.5: an output above 0.5 is treated as 1 and an output below 0.5 is treated as 0.
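The limiting behaviour described above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `sigmoid` and the sample inputs are illustrative choices, not part of the original article.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-x)), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use the equivalent form e^x / (1 + e^x)
    z = math.exp(x)
    return z / (1.0 + z)

# Sample inputs (arbitrary): large negative, around zero, large positive
for x in (-10.0, -1.0, 0.0, 1.0, 10.0):
    y = sigmoid(x)
    fires = y > 0.5  # hard decision at the 0.5 threshold
    print(f"x = {x:6.1f}  sigmoid(x) = {y:.5f}  fires = {fires}")
```

The two branches compute the same function; splitting on the sign of x only keeps `math.exp` from being called with a large positive argument.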

We can also flip the sign of the input to get the opposite behaviour. With a large positive input we then get an output close to 0, which tends not to fire, and with a large negative input we get an output close to 1, which tends to fire.

The beauty of the sigmoid function lies in its derivative, which can be written in terms of the function itself: $\sigma'(x) = \sigma(x)(1 - \sigma(x))$. Once the output is computed, it is easy to apply gradient descent during back propagation. Because the function is smooth, gradient descent can move gradually towards the minimum.
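To make the role of the derivative concrete, here is a small sketch that relies on the standard identity σ'(x) = σ(x)(1 − σ(x)) and compares it against a finite-difference estimate. The helper names `sigmoid_grad` and `numeric_grad` are hypothetical, chosen only for this illustration.

```python
import math

def sigmoid(x: float) -> float:
    # Simple form; adequate for the moderate inputs used here
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x: float) -> float:
    """Derivative of the sigmoid, reusing the forward value s = sigmoid(x)."""
    s = sigmoid(x)
    return s * (1.0 - s)

def numeric_grad(f, x: float, h: float = 1e-5) -> float:
    """Central finite-difference estimate of f'(x), used only as a sanity check."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

for x in (-4.0, 0.0, 4.0):
    print(x, sigmoid_grad(x), numeric_grad(sigmoid, x))
```

The gradient is largest at x = 0 and shrinks towards zero as |x| grows, which is why heavily saturated sigmoid units learn slowly.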

### Hyperbolic Tangent:

The Tanh function is an activation function which rescales the values between -1 and 1, squashing them just like a sigmoid function. Its advantage is that the output is zero-centered, which helps the next neuron during propagation.

The tanh function is defined as

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

When we apply tanh(x) to the weighted sum of the inputs, it rescales the values between -1 and 1. Large negative numbers are scaled towards -1 and large positive numbers are scaled towards 1. As x goes to minus infinity, tanh(x) goes to -1 (tends not to fire).
As x goes to infinity, tanh(x) goes to 1 (tends to fire).
At x = 0, tanh(x) = 0.
The threshold is set at 0: a value above 0 is scaled towards 1 and a value below 0 is scaled towards -1.
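As a rough illustration of the behaviour described above, the sketch below evaluates tanh from its exponential definition and compares it with `math.tanh` from the standard library. It also checks the known relationship tanh(x) = 2·σ(2x) − 1, which shows that tanh is a rescaled, zero-centered sigmoid. The function names are illustrative.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def tanh_from_exp(x: float) -> float:
    """tanh computed directly from its definition; fine for moderate |x|."""
    e_pos, e_neg = math.exp(x), math.exp(-x)
    return (e_pos - e_neg) / (e_pos + e_neg)

for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
    via_sigmoid = 2.0 * sigmoid(2.0 * x) - 1.0  # rescaled sigmoid
    print(f"x = {x:5.1f}  tanh = {math.tanh(x):+.5f}  "
          f"from definition = {tanh_from_exp(x):+.5f}  via sigmoid = {via_sigmoid:+.5f}")
```

In practice `math.tanh` (or the equivalent in a numerical library) is preferable, since the direct definition overflows for very large |x|.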

Like the sigmoid, tanh is a smooth curve, so it is easy to differentiate during back propagation, and gradient descent converges towards the minimum at a pace set by the learning rate.

### ReLU (Rectified Linear Unit):

This is one of the most widely used activation functions. The benefit of ReLU is sparsity: it passes only positive values and blocks negative ones, which speeds up computation. Because its gradient does not saturate for positive inputs, it also reduces the vanishing gradient problem seen with sigmoid and tanh.

$$f(x) = \max(0, x)$$

This function lets only positive values pass during forward propagation and sets negative values to zero. The drawback of ReLU is that its gradient is zero for negative inputs: a neuron whose input stays negative receives no weight updates during back propagation, stops moving towards the minimum, and becomes a dead neuron.
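The dead neuron effect can be sketched with a single neuron. In the example below, the weight `w`, bias `b`, and the inputs are made-up values picked so that the pre-activation is negative for every input; they are assumptions for illustration only.

```python
def relu(x: float) -> float:
    """ReLU: passes positive values, outputs zero otherwise."""
    return max(0.0, x)

def relu_grad(x: float) -> float:
    """Gradient of ReLU: 1 for positive inputs, 0 otherwise (0 chosen at x = 0)."""
    return 1.0 if x > 0 else 0.0

# Hypothetical single neuron: output = relu(w * x + b)
inputs = [0.5, 1.0, 2.0]
w, b = -1.0, -0.5  # chosen so that w * x + b < 0 for every input above

outputs = [relu(w * x + b) for x in inputs]
# Gradient of the output with respect to w is relu_grad(pre_activation) * x
weight_grads = [relu_grad(w * x + b) * x for x in inputs]

print("outputs:     ", outputs)
print("weight grads:", weight_grads)
```

Every gradient here is zero, so a gradient descent step would leave `w` unchanged no matter which loss is placed on top of the neuron.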

### Leaky ReLU:

The dead ReLU problem can be overcome by Leaky ReLU, which uses a small non-zero slope for negative inputs, so a small gradient still flows during back propagation. This lets a neuron that would otherwise be dead keep updating and eventually become active again.

$$f(x) = \mathbb{1}(x < 0)\,(\alpha x) + \mathbb{1}(x \ge 0)\,(x)$$

where α is a small constant.
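A minimal sketch of the formula above follows. The value `alpha = 0.01` is a commonly used default and an assumption here, since the article only says that α is a small constant; the neuron's weight, bias, and inputs are also made up for illustration.

```python
def leaky_relu(x: float, alpha: float = 0.01) -> float:
    """Leaky ReLU: x for x >= 0, alpha * x for x < 0."""
    return x if x >= 0 else alpha * x

def leaky_relu_grad(x: float, alpha: float = 0.01) -> float:
    """Gradient of Leaky ReLU: 1 for x >= 0, alpha for x < 0."""
    return 1.0 if x >= 0 else alpha

# Hypothetical neuron whose pre-activation is negative for every input
inputs = [0.5, 1.0, 2.0]
w, b = -1.0, -0.5

# Gradient of the output with respect to w is leaky_relu_grad(pre_activation) * x
weight_grads = [leaky_relu_grad(w * x + b) * x for x in inputs]
print("weight grads:", weight_grads)
```

Unlike plain ReLU, the gradients are small but non-zero, so the weight can still be updated even though every pre-activation is negative.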

Some people have reported success with this activation function, but the results are not always consistent. There is also a drawback related to training: if the learning rate is set very high, a weight update during back propagation can overshoot the minimum and kill the neuron. This happens when the learning rate is not set at a suitable level.

A low, well-chosen learning rate leads to a gradual descent towards the minimum.
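The effect of the learning rate can be sketched on a toy problem. The example below is not a neural network: it runs plain gradient descent on the hypothetical loss f(w) = w², whose gradient is 2w, and the two learning rates are arbitrary values chosen to contrast gradual descent with overshooting.

```python
def descend(lr: float, steps: int = 20, w: float = 1.0) -> float:
    """Run gradient descent on f(w) = w**2 and return the final w."""
    for _ in range(steps):
        grad = 2.0 * w      # derivative of w**2
        w = w - lr * grad   # standard gradient descent update
    return w

low = descend(lr=0.1)   # each step multiplies w by 0.8, so w shrinks towards 0
high = descend(lr=1.1)  # each step multiplies w by -1.2, so w flips sign and grows

print(f"lr = 0.1 -> final w = {low:.4f}")
print(f"lr = 1.1 -> final w = {high:.4f}")
```

With the small learning rate the iterate settles near the minimum at w = 0, while the large one jumps across the minimum on every step and moves further away each time.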

### Which is the right Activation Function?

We have seen several activation functions, and we need some domain knowledge to know which is the right one for our model. The choice depends on the problem at hand; no activation function yields perfect results in all models.

⦁ Sigmoid functions and their combinations usually work better for classification tasks, e.g. binary classification with 0s and 1s.

⦁ Tanh functions are sometimes avoided because of the vanishing gradient problem: the gradient becomes very small once the function saturates.

⦁ ReLU is a widely used activation function and often yields better results than Sigmoid and Tanh.

⦁ Leaky ReLU is a solution to the dead neuron problem that ReLU can cause in the hidden layers.

There are other activation functions, such as softmax, SELU, linear, identity, softplus and hard sigmoid, which can be used depending on your model.

Image Sources: Hackernoon, Towards Data Science and Aharley – Ryerson University

Kishan Maladkar holds a degree in Electronics and Communication Engineering, exploring the field of Machine Learning and Artificial Intelligence. A Data Science Enthusiast who loves to read about the computational engineering and contribute towards the technology shaping our world. He is a Data Scientist by day and Gamer by night.
