Everyone in data science wants a modelling procedure that predicts accurately. Neural networks have the potential to become highly accurate, but developers must deal with several problems to get optimal results from them. Overfitting is one such problem; it can occur because of the model or the data we are working with. In this article, we discuss overfitting and the methods that can be used to prevent a neural network from overfitting. The major points to be discussed in the article are listed below.
Table of contents
- About overfitting
- Methods to prevent overfitting of a neural network
- Method 1: Data augmentation
- Method 2: Simplifying neural network
- Method 3: Weight regularization
- Method 4: Dropouts
- Method 5: Early stopping
Let’s start with understanding overfitting.
In many modelling examples, we find that the model reports high accuracy on the training data but produces wrong outputs when asked to predict on new data. These are the situations where we can say that the model is overfitted. While modelling data, we mainly focus on estimating the distribution and probability underlying the data.
This estimation helps us create a model that can make predictions on similar unseen values. During training, a model can encounter a lot of noise in the training data, and this can make the model inaccurate because it has also tried to model that noise.
Overfitting occurs when the model tries to learn every detail and the noise of the training data, to the extent that it starts giving wrong predictions; in other words, this learning hurts the performance of the model on new data.
The above image is a representation of the overfitting problem: the red dots are the data points, the green line represents the underlying relationship in the data, and the blue line represents what the overfitted model has learned.
Generally, we find this problem with non-linear models, and since most neural networks are non-linear, they are prone to overfitting. Here the nonlinearity of the models means that they are flexible and can adapt closely to the data, which sometimes makes the model overfit. In this article, we will look at the steps that we should take to prevent the overfitting of a neural network.
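To make the idea of "learning the noise" concrete, here is a minimal sketch in plain Python. It is not a neural network: the `memorizer` function is a hypothetical stand-in for an overfitted model that simply returns the label of the nearest training point, and the toy data, noise pattern and function names are all assumptions made for illustration.

```python
# Toy data: the true relationship is y = 2x; training labels carry +1/-1 noise.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
noise = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
train_y = [2 * x + n for x, n in zip(train_x, noise)]

# Unseen points follow the true relationship without noise.
test_x = [x + 0.4 for x in train_x]
test_y = [2 * x for x in test_x]


def memorizer(x):
    """Overfitted stand-in: return the label of the nearest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def fit_line(xs, ys):
    """Ordinary least squares for a straight line; returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


def mse(predict, xs, ys):
    """Mean squared error of a prediction function on the given points."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


slope, intercept = fit_line(train_x, train_y)


def line(x):
    """Simple model: the fitted straight line."""
    return slope * x + intercept


results = {
    "memorizer": (mse(memorizer, train_x, train_y), mse(memorizer, test_x, test_y)),
    "line": (mse(line, train_x, train_y), mse(line, test_x, test_y)),
}
for name, (train_err, test_err) in results.items():
    print(f"{name}: train MSE = {train_err:.2f}, test MSE = {test_err:.2f}")
```

The memorizer reproduces the noisy training labels perfectly, so its training error is zero, yet on the unseen points it does worse than the simple straight line. That gap between training and unseen-data performance is the signature of overfitting.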
Methods to prevent overfitting of a neural network
In this section, we will take a look at some simple but important steps that a basic modelling procedure requires to prevent the overfitting of a neural network. We will start from the data side and move to the training side.
Method 1: Data augmentation
In one of our articles, we discussed that acquiring more data is a way to improve the accuracy of models. It is simple to understand that more data gives more detail about the task the model needs to perform. Data augmentation can be considered a way to enlarge our datasets.
As a simple example, while working with a small image dataset we can increase the number of images by adding flipped, rotated, and scaled versions of the images to the data. This increases the size of the dataset, and such techniques can improve the accuracy of the model while protecting it from overfitting.
This is a general step that can be used with every type of modelling, whether it is a neural network or a classical model like a random forest or a support vector machine. For classification data there are oversampling methods such as SMOTE, and one of our articles gives an idea of data augmentation with image data.
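The flipped and rotated copies described above can be sketched in plain Python, treating an image as a nested list of pixel values. This is only an illustration under that assumption; the function names are hypothetical, and in practice an image library or the augmentation utilities of a deep learning framework would do this work.

```python
def flip_horizontal(image):
    """Mirror the image left to right by reversing every row."""
    return [row[::-1] for row in image]


def flip_vertical(image):
    """Mirror the image top to bottom by reversing the row order."""
    return [row[:] for row in image[::-1]]


def rotate_90_clockwise(image):
    """Rotate by 90 degrees clockwise: reverse the rows, then transpose."""
    return [list(column) for column in zip(*image[::-1])]


def augment(dataset):
    """Return the original (image, label) pairs plus three transformed copies of each."""
    augmented = []
    for image, label in dataset:
        augmented.append((image, label))
        for transform in (flip_horizontal, flip_vertical, rotate_90_clockwise):
            augmented.append((transform(image), label))
    return augmented


# A single 2x3 "image" labelled as class 0.
tiny_dataset = [([[1, 2, 3], [4, 5, 6]], 0)]
augmented_dataset = augment(tiny_dataset)
print(len(augmented_dataset))  # 1 original + 3 transformed copies
```

Each transformed copy keeps the label of its source image, which is what makes this a safe way to grow the dataset for tasks where orientation does not change the class.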
Method 2: Simplifying neural network
This may seem like a step in the wrong direction, but it is one of the most basic and easy ways to prevent overfitting. It can be done in two ways: removing layers from the network, or reducing the number of neurons in the layers. In general modelling, we find that using complex models on simple data can increase overfitting, while simpler models can perform much better.
Before reducing the complexity of the network, we need to work out the input and output sizes of the layers. It is generally advisable to start with a simple network rather than a complex one. If the network is overfitting, we should try to make it simpler.
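One rough way to see how much simpler a network becomes is to count its trainable parameters from the layer input and output sizes. The sketch below assumes a plain fully connected network; the function name and the two example architectures are hypothetical and chosen only for illustration.

```python
def dense_parameter_count(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the number of units per layer, input first, output last.
    """
    total = 0
    for inputs, outputs in zip(layer_sizes, layer_sizes[1:]):
        total += inputs * outputs + outputs  # weight matrix plus one bias per unit
    return total


large_network = [784, 512, 512, 10]  # two wide hidden layers
small_network = [784, 64, 10]        # one narrow hidden layer

print(dense_parameter_count(large_network))  # 669706
print(dense_parameter_count(small_network))  # 50890
```

Removing one hidden layer and narrowing the other cuts the parameter count by more than a factor of ten, which leaves the network far less room to memorize noise.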
Method 3: Weight regularization
Weight regularization helps prevent overfitting by reducing the complexity of the model. There are various forms of regularization, such as L1 and L2 regularization. These methods work by penalizing large weights in the network, and the resulting smaller weights lead to simpler models. As discussed above, simpler models help in avoiding overfitting.
As the name suggests, this step adds a regularization term to the loss function so that the values in the weight matrices get smaller. The sum forms a cost function, which can be defined as follows:
Cost function = Loss + Regularization term
We can differentiate between the methods of regularization by looking at the regularization term.
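Using L1 regularization, we add a regularization term that is commonly written as follows, where λ is the regularization strength and w_i are the weights:
Regularization term = λ Σ |w_i|
Here we can see that this regularization tries to minimize the absolute values of the weights.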
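Using L2 regularization, we add a regularization term that is commonly written as follows, where λ is the regularization strength and w_i are the weights:
Regularization term = λ Σ w_i²
Here we can see that this regularization tries to minimize the squared magnitude of the weights.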
Both methods are popular. The main difference is that L1 regularization tends to drive some weights to exactly zero, which gives sparse models that are simple and easier to interpret, while L2 regularization shrinks all weights towards zero without removing them and so keeps the capacity to learn more complex patterns. The selection between them therefore depends on the complexity of the data. In one of our articles, we can find more information about regularization methods.
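The cost function described above can be sketched in a few lines of plain Python. The weight values, the loss value and the regularization strength are made-up numbers for illustration, and the helper names are hypothetical; deep learning frameworks provide their own regularizer objects for this.

```python
def l1_penalty(weights):
    """Sum of the absolute values of the weights."""
    return sum(abs(w) for w in weights)


def l2_penalty(weights):
    """Sum of the squared weights."""
    return sum(w * w for w in weights)


def cost(loss, weights, penalty, strength):
    """Cost function = loss + strength * regularization term."""
    return loss + strength * penalty(weights)


weights = [0.5, -2.0, 0.0, 3.0]  # illustrative weight values
data_loss = 1.0                  # pretend loss measured on the training data
strength = 0.5                   # hypothetical regularization strength (lambda)

print(cost(data_loss, weights, l1_penalty, strength))  # 1.0 + 0.5 * 5.5 = 3.75
print(cost(data_loss, weights, l2_penalty, strength))  # 1.0 + 0.5 * 13.25 = 7.625
```

Note how the largest weight (3.0) contributes 3.0 to the L1 term but 9.0 to the L2 term: L2 punishes large weights much more heavily than small ones, while L1 applies the same pressure to every non-zero weight.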
Method 4: Dropouts
Dropout helps prevent overfitting by temporarily removing neurons from the network while it is being trained. It is also a regularization technique, but instead of working on the cost function it works on the neurons.
The method is simple: it randomly drops neurons from the network at each training step. We can also think of this process as making the network simpler and different at every step of training, because each step effectively trains a new, smaller network. The net effect of applying dropout layers is reduced overfitting of the network. The below image can be considered a representation of how this step works.
The above image represents a model with 2 hidden layers whose complexity is reduced by removing some of the neurons. We can apply dropout in a TensorFlow network using the following layer.
tf.keras.layers.Dropout(rate, noise_shape=None, seed=None, **kwargs)
Here we need to set rate as a fraction between 0 and 1, the share of input units to drop, and the layer will automatically drop neurons at each step during training.
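To show what such a layer does with its rate, here is a minimal plain-Python sketch of the common "inverted dropout" variant, in which the surviving activations are scaled up so that their expected sum stays the same. The function name and arguments are hypothetical and this is not the TensorFlow implementation, only an illustration of the idea.

```python
import random


def dropout(activations, rate, training, rng):
    """Inverted dropout on a list of activations.

    During training each value is zeroed with probability `rate` and the
    survivors are scaled by 1 / (1 - rate). At inference time the
    activations pass through unchanged.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training:
        return list(activations)
    scale = 1.0 / (1.0 - rate)
    return [a * scale if rng.random() >= rate else 0.0 for a in activations]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
activations = [1.0] * 10

train_step_1 = dropout(activations, rate=0.5, training=True, rng=rng)
train_step_2 = dropout(activations, rate=0.5, training=True, rng=rng)
inference = dropout(activations, rate=0.5, training=False, rng=rng)

print(train_step_1)  # a mix of 0.0 and 2.0
print(train_step_2)  # a fresh random mask for the next step
print(inference)     # unchanged: dropout is switched off at inference
```

A new random mask is drawn on every call, which mirrors the point made above that a different, thinner network is trained at each step.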
Method 5: Early stopping
As the name suggests, this method stops the training of the network before it reaches the final planned epoch. We can compare it with cross-validation because it also holds out a portion of the training data as validation data, so that the performance of the model can be measured against it. Once the performance of the model on the validation data reaches its peak, training can be stopped.
This step also works while we train the model. As the model learns, we measure its performance on unseen validation data and keep training until the point where the model starts to do worse on it. If the performance on the validation set decreases or stays the same for a certain number of iterations, training is stopped.
The above image is a representation of the learning curve of a network where early stopping is applied. We can see that the early stopping point is set where the errors start increasing, and we can stop training the network at this point.
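The stopping rule described above can be sketched as a small plain-Python function that scans a list of validation losses. The function name, the `patience` parameter and the loss values are assumptions for illustration; a real training loop would compute each loss after an epoch rather than read it from a list.

```python
def early_stopping_epoch(validation_losses, patience):
    """Return (stop_epoch, best_epoch), both 0-based.

    Training stops once the validation loss has failed to improve on its best
    value for `patience` consecutive epochs. If that never happens, the stop
    epoch is the last epoch.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(validation_losses) - 1, best_epoch


# Hypothetical validation losses: they fall, then start rising.
losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.51, 0.56, 0.60]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=3)
print(stop_epoch, best_epoch)  # 6 3
```

Here the best validation loss is reached at epoch 3, and training stops at epoch 6 after three epochs without improvement. Waiting a few epochs instead of stopping at the first rise avoids reacting to small random fluctuations in the validation loss.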
For networks built using TensorFlow, we need to pass callbacks to the fit function. The callback can be defined using the following code.
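# the monitor and patience values here are illustrative; the original arguments were not preserved
callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3)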
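After setting the callback, we can pass it to the training using the following code.
# data, labels, val_data (an (inputs, targets) tuple) and epochs=100 are placeholders for your own dataset and settings
history = model.fit(data, labels, validation_data=val_data, epochs=100, callbacks=[callback])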
In the history object, all the records get saved, and we can check the number of epochs that actually ran by checking the length of one of the recorded metric lists, for example len(history.history['loss']).
In this article, we have discussed overfitting in neural networks, a general problem that can happen because of noisy data and non-linear models, and the steps that can be used to prevent our neural networks from overfitting.