An Artificial Neural Network (ANN) is probably the first stop for anyone entering the field of deep learning. Inspired by the structure of the natural neural networks in our body, an ANN mimics a similar structure and learning mechanism.

An ANN is just an algorithm for building an efficient predictive model. It gets its name because the algorithm, and hence its implementation, resembles a typical biological neural network. The functionality of an ANN can be explained in the 5 simple steps below:

1. Read the input data
2. Produce the predictive model (a mathematical function)
3. Measure the error in the predictive model
4. Inform and implement necessary corrections to the model repeatedly until a model with the least error is found
5. Use this model for predicting the unknown

A beginner in data science usually enters deep learning after going through concepts such as regression, classification and feature engineering. It is very beneficial to relate the functionality of deep learning algorithms to those concepts.

Before understanding ANN, let us understand the perceptron, which is the basic building block of an ANN. Perceptron is the name initially given to a binary classifier. However, we can view the perceptron as a function which takes certain inputs and produces a linear equation, which is nothing but a straight line. This line can be used to separate easily separable data, as shown in the figure. However, remember that in real-world scenarios, classes will not be so easily separable.

The structure of a perceptron can be visualised as below:

A typical neural network with multiple perceptrons in it looks like below:

This means generating multiple linear equations at multiple points. These perceptrons can also be called neurons or nodes, names borrowed from the basic building blocks of the natural neural network within our body. In the above figure, the first vertical set of 3 neurons is the input layer.
The next two vertical sets of neurons form the middle layers, usually referred to as hidden layers, and the last single neuron is the output layer. The neural network in the above figure is a 3-layered network, because the input layer is generally not counted as one of the network's layers. Each neuron in the input layer represents an attribute (column) in the input data (i.e., $x_1$, $x_2$, $x_3$ etc.). In this network, the input data is fed to a set of neurons, and each of them produces an output. These outputs are fed to other neurons, which in turn produce further outputs, which are finally fed to the output layer. The error calculated at the output layer is sent back through the network to refine the outputs of each neuron, which are again fed to the neuron in the output layer to produce a more refined output than before. As explained in the 5-step process above, this is repeated until we get an output with minimal error.

The process of producing outputs, calculating errors and feeding them back to produce a better output is generally confusing, especially for a beginner trying to visualise and understand it. Hence, an effort is made here to explain this process with just one neuron and one layer. Once this basic concept is understood, expanding it to a larger neural network is not difficult.

Simple linear regression is widely regarded as one of the simplest techniques in machine learning, or at least one of the first that most people learn. So, we will try to understand this concept of deep learning through simple linear regression as well, by solving a regression problem using an ANN.

Implementing ANN for Linear Regression

We have understood from the above that each neuron in the ANN, except those in the input layer, produces an output. The output depends on the function that the neuron uses. This function is generally referred to as the ‘Activation Function’.
ANNs are often used for classification, so the sigmoid function or similar nonlinear functions are generally used as activation functions. But, as we are now trying to solve a linear regression problem, our activation function here is nothing but a ‘Simple Linear Equation’ of the form:

$$y = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + \dots + w_n x_n$$

where $x_1, x_2, x_3, \dots, x_n$ are the independent attributes in the input data,

$w_1, w_2, \dots, w_n$ are the weights (coefficients) of the corresponding attributes, and

$w_0$ is the bias.

Because our output should just be a single straight line, we should configure our ANN with just 1 neuron. As the output of this 1 neuron is itself the line, this neuron will be placed in the output layer. Hidden layers are required when we try to classify objects using multiple lines (or curves), so we don’t need any hidden layers here either.

Hence the ANN to solve a linear regression problem consists of an input layer with all the input attributes and an output layer with just 1 neuron, as shown below:

Now that we have finalised the structure of our ANN, our next task is to actually write code to implement it. We will implement this simple ANN from scratch, as that helps in understanding a lot of the underlying concepts in existing ANN libraries.

Recall the 5 steps mentioned at the beginning. As mentioned there, the process involves feeding input to a neuron in the next layer to produce an output using an activation function. This process is called ‘Feed Forward’. After producing the output, the error (or loss) is calculated and a correction is sent back through the network. This process is called ‘Back Propagation’. We will also use some standard terminology for our ANN such as ‘Network’ and ‘Topology’, which we will see in the code.
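Before the full implementation, the feed-forward step for this one-neuron network can be sketched in a few lines of plain Python. This is only an illustration of the linear equation above, not the article's own code: the function name `feed_forward`, its parameters and the sample numbers are all hypothetical.

```python
def feed_forward(inputs, weights, bias):
    """Return bias + sum(w_i * x_i) for one row of input attributes."""
    if len(inputs) != len(weights):
        raise ValueError("need exactly one weight per input attribute")
    return bias + sum(w * x for w, x in zip(weights, inputs))


# Hypothetical row with three attributes and hand-picked weights.
row = [1.0, 2.0, 3.0]
weights = [0.5, -1.0, 2.0]
bias = 4.0

output = feed_forward(row, weights, bias)
print(output)
```

With no hidden layers and no nonlinearity, this weighted sum is the entire network, so training only has to find good values for the weights and the bias.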
With the terms we have learnt so far, let us implement the code:

1. Import the required libraries

2. Initialise the weights and other variables

In our approach, we will provide input to the code as a list such as [2,3,1]. Here, the number of values in the list (the list size) indicates the number of layers that we want to configure, and each number in the list indicates the number of neurons in that layer. So, the list [2,3,1] indicates that our network should consist of 3 layers, counting the input layer, in which the first layer has 2 neurons, the second layer has 3 neurons and the output layer has 1 neuron. This structure can be called the ‘network topology’. However, as we are solving a regression problem, we just need 1 neuron at the output layer and no hidden layers, as discussed above. So, the list we pass only needs the input layer followed by an output layer of 1 neuron.

In our approach to building a linear regression neural network, we will use Stochastic Gradient Descent (SGD), because it is the algorithm most commonly used to train deep neural networks (that is, networks with multiple layers and multiple neurons), including for classification problems. I will assume the reader is already aware of this algorithm and proceed with its implementation.

We will initialise all the weights to zero. Let us create a class called ‘Network’ and initialise all the required variables in the constructor as below:

The ‘self.output’ variable in the above code holds the outputs of each neuron. It is initialised as a list of sufficient size based on our input. The remaining variables are pretty self-explanatory.

3. Coding ‘fit’ function

We know that the gradient descent algorithm requires the ‘learning rate’ (eta) and the number of iterations (epochs) as inputs. We will pass all these values in a list to the program along with the training data.
Let us build a ‘fit’ method to construct a predictive model with all the inputs given:

4. Produce the Output and Correct the Error

I have mentioned above what ‘feed forward’ and ‘back propagation’ are. Let us implement those methods:

The above function just forms a simple linear equation of the y = mx + c kind and nothing more.

In the SGD algorithm, we continuously update the initialised weights in the negative direction of the slope to reach the minimum point.

Error function:

$$E(w) = \sum_{i=1}^{n} (w_0 + w_1 x_i - y_i)^2 = (w_0 + w_1 x_1 - y_1)^2 + (w_0 + w_1 x_2 - y_2)^2 + \dots + (w_0 + w_1 x_n - y_n)^2$$

where $(x_i, y_i)$ is the $i$-th row of a dataset with a single attribute.

Here, I have not taken ½ as a scaling factor in the equation. One may include it if desired. Also, in SGD only one row at a time is passed to the above error function to calculate the error. Hence, if we differentiate the error for a single row, say the first one, w.r.t. each of the weights $w_0$, $w_1$, $w_2$ etc., we get equations like

$$\frac{\partial E}{\partial w_0} = 2\,(w_0 + w_1 x_1 - y_1)$$

$$\frac{\partial E}{\partial w_1} = 2\,(w_0 + w_1 x_1 - y_1)\,x_1$$

After calculating the slope w.r.t. each of the weights, we update the weights with new values in the negative direction of the slope, scaled by the learning rate $\eta$ (eta):

$$w_j \leftarrow w_j - \eta\,\frac{\partial E}{\partial w_j}$$

Let us implement all this logic in the back propagate function as below:

In order to visualise the error at each step, let us quickly write functions to calculate the Mean Squared Error (for the full dataset) and the Squared Error (for each row), which will be called for each step in an epoch.

Having built the model in the above way, let us define a method which takes some input and predicts the output:

That’s it. We have built a simple neural network which builds a model for linear regression and also predicts values for unknowns.

5. Executing the program

In order to pass inputs and test the results, we need to write a few lines of code as below:

In the above code, a sample dataset of 10 rows is passed as input. The full code can be accessed and executed at Google Colab:

https://colab.research.google.com/drive/1f84s4nlKSas5LGpR8zdRxWOsKL5HIoyy

Sample outputs for the given inputs are as below:

The plot below shows how the error is reduced in each step as the weights are continuously updated and fed back into the system.

So, we have understood how we can build a simple neural network in a few lines of code. The same code can be extended to handle multiple layers with various activation functions so that it works just like a full-fledged ANN. I will implement that in my next article.
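Pulling the steps above together, a minimal from-scratch sketch might look like the following. It is not the code from the Colab notebook. The method names `feed_forward`, `back_propagate`, `mean_squared_error` and `predict`, the keyword arguments `eta`, `epochs` and `seed`, the `history` list and the 10-row dataset are all assumptions chosen for illustration, and the sketch uses only the Python standard library.

```python
import random


class Network:
    """One linear neuron trained with SGD (illustrative sketch only)."""

    def __init__(self, topology):
        # Assumed topology format: [number of inputs, 1 output neuron].
        self.n_inputs = topology[0]
        # weights[0] is the bias w0, weights[1:] are w1..wn; all start at zero.
        self.weights = [0.0] * (self.n_inputs + 1)
        self.history = []  # mean squared error recorded after each epoch

    def feed_forward(self, row):
        # y = w0 + w1*x1 + ... + wn*xn
        return self.weights[0] + sum(
            w * x for w, x in zip(self.weights[1:], row)
        )

    def back_propagate(self, row, target, eta):
        # Slope of the squared error for a single row, without the 1/2 factor.
        error = self.feed_forward(row) - target
        self.weights[0] -= eta * 2 * error
        for j, x in enumerate(row, start=1):
            self.weights[j] -= eta * 2 * error * x

    def mean_squared_error(self, X, y):
        total = sum((self.feed_forward(r) - t) ** 2 for r, t in zip(X, y))
        return total / len(X)

    def fit(self, X, y, eta=0.01, epochs=100, seed=0):
        rng = random.Random(seed)
        order = list(range(len(X)))
        for _ in range(epochs):
            rng.shuffle(order)  # visit the rows in a new random order
            for i in order:     # one weight update per row
                self.back_propagate(X[i], y[i], eta)
            self.history.append(self.mean_squared_error(X, y))
        return self

    def predict(self, row):
        return self.feed_forward(row)


# Made-up dataset of 10 rows lying exactly on the line y = 2x + 1.
X = [[float(i)] for i in range(10)]
y = [2.0 * row[0] + 1.0 for row in X]

model = Network([1, 1]).fit(X, y, eta=0.01, epochs=1000)
print("weights:", model.weights)
print("prediction for x = 12:", model.predict([12.0]))
```

Because the weights are updated after every single row, the recorded error can wobble from epoch to epoch instead of falling smoothly. The learning rate also has to stay small relative to the scale of the inputs, otherwise the updates overshoot and the error grows.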