# How Do Activation Functions Introduce Non-Linearity In Neural Networks?

The main job of an activation function is to introduce non-linearity in a neural network.

A neural network is loosely modelled on the human brain and is built from units called neurons. Each neuron takes its inputs, multiplies them by weights, adds a bias, and passes the resulting sum through an activation function to produce its output. The activation function is therefore one of the most important components of a neural network: it determines how strongly a neuron is activated and what signal it passes on to the next layer. It is also referred to as a threshold or transformation for the neurons, as it decides whether the neuron’s input is relevant to the prediction or not.
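The computation performed by a single neuron can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function names `neuron` and `sigmoid`, and the input, weight and bias values, are assumptions made up for the example.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic sigmoid, used here only as an example activation."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

# Hypothetical values chosen only for illustration.
output = neuron(inputs=[1.0, 2.0], weights=[0.5, -0.25], bias=0.0)
print(output)  # 0.5, because the weighted sum is 0.5*1.0 - 0.25*2.0 = 0.0
```

Swapping the `activation` argument for another function changes how the same weighted sum is turned into an output.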

One way to see why non-linearity matters is that, without a non-linear activation function, a neural network behaves just like a single linear layer (in effect, a single-layer perceptron); it does not matter how many layers it has, because a stack of linear layers collapses into one linear transformation.
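The collapse can be written out for two layers with no activation between them. The symbols $W_1, b_1$ (first layer) and $W_2, b_2$ (second layer) are notation introduced here for illustration:

$$
W_2 (W_1 x + b_1) + b_2 = (W_2 W_1)\,x + (W_2 b_1 + b_2) = W' x + b'
$$

The result is again a single weight matrix $W'$ and a single bias $b'$, so the extra layer adds no expressive power. The same argument applies to any number of stacked linear layers.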

### Activation Function & Non-Linearity

At the most basic level, a neural network consists of three types of layers:

Input layer: This layer accepts input from the outside world to the network. No computation is performed here, and the only job is to pass the received information to the hidden layer.

Hidden layer: This layer accepts information from the input layer and performs most of the network’s computation. It is hidden from the outside world and transfers its result to the next layer, which may be another hidden layer or the output layer.

Output layer: It accepts the result from the last hidden layer, produces the final prediction, and relays it to the outside world.

The activation function is applied in the hidden layers (and usually in the output layer as well). In the hidden layers it cannot be linear because, irrespective of how complex the architecture is, a network with linear activations is effectively only one layer deep: a composition of linear functions is itself linear. Also, the real world and its associated problems are highly non-linear. The main situation where a linear activation proves useful is in the output layer of a regression model (think predicting housing prices), where the prediction can be any real number.

A linear activation function also gives backpropagation little to work with: its derivative is a constant, so the gradient carries no information about the input. Thus, it is not recommended for the hidden layers of a neural network. A model without a non-linear activation function can still perform a task, but only by fitting linear relationships, so it loses accuracy on anything more complex. The significance of the activation function lies in making a given model learn and execute difficult tasks. Further, a non-linear activation function allows the stacking of multiple layers of neurons to create a deep neural network, which is required to learn complex data sets with high accuracy.
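XOR is a classic example of a task that no purely linear model can represent, yet a tiny network with one non-linear hidden layer can. The sketch below uses hand-picked weights rather than trained ones, and the names `relu` and `xor_net` are made up for this illustration.

```python
def relu(z: float) -> float:
    """Rectified Linear Unit: zero for negative inputs, identity otherwise."""
    return max(0.0, z)

def xor_net(x1: float, x2: float) -> float:
    # Hidden layer: two neurons with hand-picked (not learned) parameters.
    h1 = relu(x1 + x2)        # weights (1, 1), bias 0
    h2 = relu(x1 + x2 - 1.0)  # weights (1, 1), bias -1
    # Output layer: a linear combination of the hidden activations.
    return h1 - 2.0 * h2

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))
```

If `relu` is replaced by the identity function, the output reduces to a linear expression in `x1` and `x2`, and the XOR pattern can no longer be produced.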

### Non-Linear Activation Functions

Examples of non-linear activation functions include:

Sigmoid function: A sigmoid is an S-shaped function whose output, depending on the variant, lies for example between 0 and 1 or between -1 and 1. In machine learning, the term generally refers to the logistic function, also called the logistic sigmoid function, whose output lies between 0 and 1; this makes it suitable for converting a real value to a probability. It is also the most widely used sigmoid function (others are the hyperbolic tangent and the arctangent).

A sigmoid function is often placed in the last layer of a binary classification model to convert the model’s output into a probability score, which is easier to work with and interpret.

Another reason to use it mostly in the output layer is that, in hidden layers, its gradient becomes very small for inputs of large magnitude, which can slow training down or cause it to stall.
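As a rough sketch of the output-layer use described above, the snippet below turns a raw model score into a probability and then into a class label. The helper `predict` and the 0.5 threshold are assumptions for illustration; the branch on the sign of `z` is just one common way to avoid overflow in `math.exp`.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(score: float, threshold: float = 0.5) -> int:
    """Turn a raw model output into a class label via its probability."""
    return 1 if sigmoid(score) >= threshold else 0

for score in (-3.0, 0.0, 3.0):
    print(score, round(sigmoid(score), 3), predict(score))
```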

TanH function: It is the hyperbolic tangent function, whose range lies between -1 and 1. Because its output is centred on zero, it is also called a zero-centred function, and it is much easier to model inputs with strongly negative, neutral or strongly positive values. TanH is used instead of the sigmoid function when the output should not be restricted to the range between 0 and 1. TanH functions usually find applications in recurrent neural networks (RNNs) for natural language processing and speech recognition tasks.

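On the downside, in the case of both Sigmoid and TanH, if the weighted sum input is very large in magnitude (strongly positive or strongly negative), the function saturates and its gradient becomes very close to zero. This is the source of the vanishing gradient problem in deep networks.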
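The saturation is easy to observe numerically. The sketch below evaluates the derivatives of both functions at a few inputs, using the standard identities $\sigma'(z) = \sigma(z)(1 - \sigma(z))$ and $\tanh'(z) = 1 - \tanh^2(z)$; the function names and the sample inputs are chosen only for this example.

```python
import math

def sigmoid_grad(z: float) -> float:
    """Derivative of the logistic sigmoid: s * (1 - s)."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

def tanh_grad(z: float) -> float:
    """Derivative of tanh: 1 - tanh(z)^2."""
    return 1.0 - math.tanh(z) ** 2

# Gradients are largest at z = 0 and shrink quickly as |z| grows.
for z in (0.0, 2.0, 5.0, 10.0):
    print(z, sigmoid_grad(z), tanh_grad(z))
```

At `z = 0` the gradients are 0.25 and 1.0 respectively; by `z = 10` both are many orders of magnitude smaller, which is why layers stuck in this regime learn very slowly.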

ReLU function: The Rectified Linear Unit, also called ReLU, is a widely favoured activation function for deep learning applications. It outputs the input unchanged when the input is positive and zero otherwise. Compared to the Sigmoid and TanH activation functions, ReLU generally has the upper hand in terms of performance and generalisation. In terms of computation too, ReLU is faster, as it does not compute exponentials and divisions. One disadvantage is that ReLU has been reported to overfit more easily than Sigmoid; another is that a neuron whose input stays negative outputs zero and stops learning.
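A minimal sketch of ReLU and its gradient shows both the cheap computation and the zero gradient on negative inputs. The names `relu` and `relu_grad` are illustrative, and taking the gradient at exactly zero to be 0 is a convention assumed here.

```python
def relu(z: float) -> float:
    """ReLU needs only a comparison: no exponentials, no divisions."""
    return max(0.0, z)

def relu_grad(z: float) -> float:
    """Gradient is 1 for positive inputs and 0 otherwise (0 at z == 0 by convention)."""
    return 1.0 if z > 0 else 0.0

values = [-2.0, 0.0, 3.0]
print([relu(z) for z in values])       # [0.0, 0.0, 3.0]
print([relu_grad(z) for z in values])  # [0.0, 0.0, 1.0]
```

Unlike Sigmoid and TanH, the gradient does not shrink for large positive inputs, but it is exactly zero for every negative input.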

Softmax function: It is used to build a multi-class classifier, which assigns an instance to one class when the number of possible classes is larger than two. Softmax exponentiates the score for each class and divides it by the sum of the exponentiated scores, so each output lies between 0 and 1 and the outputs sum to 1.
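The softmax computation can be sketched as follows. The scores are hypothetical, and subtracting the maximum score before exponentiating is a common numerical-stability choice that does not change the result.

```python
import math

def softmax(scores):
    """Exponentiate each score and normalise so the outputs sum to 1."""
    m = max(scores)  # shift by the max to keep math.exp from overflowing
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for three classes.
probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # roughly [0.659, 0.242, 0.099]
print(sum(probs))                    # 1.0 up to floating-point rounding
```

The predicted class is the index of the largest probability, which here is the first class.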

I am a technology journalist with AIM. I write stories focused on the AI landscape in India and around the world with a special interest in analysing its long term impact on individuals and societies. Reach out to me at shraddha.goled@analyticsindiamag.com.
