A neural network is loosely modelled after the human brain, which consists of interconnected neurons. To produce an output, a neural network takes an input, multiplies it by weights and adds a bias before passing the result on. An activation function is one of the most important components of a neural network: it is applied to this weighted sum before the output is produced. The activation function decides whether or not a neuron is activated and its signal passed to the next layer. It is also referred to as a threshold or transformation for the neurons, as it decides whether a neuron’s input is relevant to the prediction process or not.
The main job of an activation function is to introduce non-linearity into a neural network. One way to look at this is that without a non-linear activation function, a neural network behaves just like a single-layer perceptron, no matter how many layers it has.
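To see this concretely, here is a minimal NumPy sketch (the layer sizes and random values are arbitrary, purely for illustration): two linear layers composed without an activation function are exactly equivalent to a single linear layer.

```python
# Minimal sketch (NumPy): stacking purely linear layers collapses into one
# linear map, so depth adds no expressive power without non-linearity.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))            # a small batch of 4 inputs, 3 features each

W1, b1 = rng.normal(size=(3, 5)), rng.normal(size=5)
W2, b2 = rng.normal(size=(5, 2)), rng.normal(size=2)

# Two "layers" with no activation function...
two_layer = (x @ W1 + b1) @ W2 + b2

# ...are exactly one linear layer with combined weights and bias.
W, b = W1 @ W2, b1 @ W2 + b2
one_layer = x @ W + b

print(np.allclose(two_layer, one_layer))  # True
```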
Activation Function & Non-Linearity
At the most basic level, a neural network consists of three main layers:
Input layer: This layer accepts input from the outside world. No computation is performed here; its only job is to pass the received information on to the hidden layer.
Hidden layer: This layer accepts information from the input layer and performs all computations. This layer is hidden from the outside world and transfers the result to the next layer.
Output layer: It accepts the result from the hidden layer and relays it to the outside world.
The activation function is present in the hidden layers. It cannot be linear because, irrespective of how complex the architecture is, a stack of linear layers is effectively only one layer deep. Also, the real world and its associated problems are highly non-linear. The only situation where linearity may prove beneficial is in regression problems (think predicting housing prices), where the output layer itself can be linear.
A linear activation function also hampers backpropagation: its derivative is a constant, so the gradient carries no information about the input. For this reason it is not recommended in a neural network. A model may still perform a task without a non-linear activation function, but it would lack efficiency and accuracy. The significance of the activation function lies in making a given model learn and execute difficult tasks. Further, a non-linear activation function allows multiple layers of neurons to be stacked into a deep neural network, which is required to learn complex data sets with high accuracy.
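As a rough sketch of how this looks in code, the snippet below builds a small network with tf.keras; the layer sizes, input shape and the choice of ReLU hidden layers with a sigmoid output are assumptions for illustration, not a prescription.

```python
# Minimal sketch using tf.keras: non-linear activations in the hidden layers
# let the stacked Dense layers learn a non-linear decision boundary.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(8,)),               # 8 input features (assumed)
    tf.keras.layers.Dense(16, activation="relu"),    # hidden layer + non-linearity
    tf.keras.layers.Dense(16, activation="relu"),    # another hidden layer
    tf.keras.layers.Dense(1, activation="sigmoid"),  # output squashed to a probability
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```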
Non-Linear Activation Functions
Examples of non-linear activation functions include:
Sigmoid function: The sigmoid function squashes its input into the range between 0 and 1 and is used to convert a real value into a probability. In machine learning, “sigmoid” generally refers to the logistic function, also called the logistic sigmoid function; it is the most widely used member of the sigmoid family (others include the hyperbolic tangent, which ranges between -1 and 1, and the arctangent).
A sigmoid function is placed as the last layer of the model to convert the model’s output into a probability score, which is easier to work with and interpret.
Another reason it is used mostly in the output layer is that, placed elsewhere, it can cause a neural network to get stuck during training.
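A minimal NumPy sketch of the logistic sigmoid (the example logits below are arbitrary) shows how any real value is mapped into the 0 to 1 range:

```python
# Minimal sketch of the logistic sigmoid: squashes any real value into (0, 1)
# so it can be read as a probability score.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
print(sigmoid(logits))  # values strictly between 0 and 1, exactly 0.5 at z = 0
```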
TanH function: This is the hyperbolic tangent function, whose range lies between -1 and 1; because its outputs are centred around zero, it is also called a zero-centred function. Being zero-centred makes it much easier to model inputs with strongly negative, positive or neutral values. TanH is used instead of the sigmoid when outputs outside the 0 to 1 range are desired. TanH functions usually find applications in RNNs for natural language processing and speech recognition tasks.
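A quick NumPy comparison on the same (arbitrary) inputs illustrates the zero-centred range of TanH versus the strictly positive outputs of the sigmoid:

```python
# Minimal sketch: tanh outputs lie in (-1, 1) and are centred at zero,
# unlike the sigmoid, whose outputs are always positive.
import numpy as np

z = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
print(np.tanh(z))                # roughly [-0.999, -0.762, 0.0, 0.762, 0.999]
print(1.0 / (1.0 + np.exp(-z)))  # sigmoid of the same inputs, all positive
```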
On the downside, in the case of both sigmoid and TanH, if the weighted sum of inputs is very large or very small, the function’s gradient becomes very small and approaches zero; this is known as the vanishing gradient problem.
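The shrinking gradients can be seen directly from the derivatives of the two functions; the sketch below (with arbitrary input values) prints how quickly they decay:

```python
# Minimal sketch of the vanishing-gradient issue: for large |z| the derivatives
# of both sigmoid and tanh are close to zero, so little gradient flows back.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([0.0, 2.0, 5.0, 10.0])
sig_grad = sigmoid(z) * (1.0 - sigmoid(z))  # derivative of sigmoid
tanh_grad = 1.0 - np.tanh(z) ** 2           # derivative of tanh

print(sig_grad)   # ~[0.25, 0.105, 0.0066, 4.5e-05]
print(tanh_grad)  # ~[1.0, 0.071, 1.8e-04, 8.2e-09]
```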
ReLU function: The Rectified Linear Unit, also called ReLU, is a widely favoured activation function for deep learning applications. Compared to the sigmoid and TanH activation functions, ReLU offers an upper hand in terms of performance and generalisation. It is also faster to compute, as it involves no exponentials or divisions. The disadvantage is that ReLU tends to overfit more than the sigmoid.
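A minimal NumPy sketch of ReLU and its gradient (the input values are chosen arbitrarily):

```python
# Minimal sketch of ReLU: max(0, z) is cheap to compute (no exponentials)
# and keeps a constant gradient of 1 for positive inputs.
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

z = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(relu(z))                 # [0.  0.  0.  0.5 3. ]
print((z > 0).astype(float))   # its gradient: 0 for negative inputs, 1 for positive
```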
Softmax function: It is used to build a multi-class classifier, solving the problem of assigning an instance to one class when the number of possible classes is larger than two. The softmax function exponentiates the output for each class and divides it by the sum over all classes, squeezing every output between 0 and 1 while ensuring that the outputs sum to 1.
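A minimal NumPy sketch of softmax over a vector of (arbitrary) class scores:

```python
# Minimal sketch of softmax: exponentiate each class score, then divide by the
# sum so the outputs lie in (0, 1) and add up to 1.
import numpy as np

def softmax(scores):
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs)        # ~[0.659, 0.242, 0.099]
print(probs.sum())  # 1.0
```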