An artificial neural network is a self-learning model that learns from its mistakes and arrives at the right answer by the end of the computation. In this article we explain how to build a neural network for the XOR gate in Python using basic mathematical computations.

XOR Gate:

An XOR gate takes two inputs, A and B, and produces an output Y, which is 1 when exactly one of the inputs is 1 and 0 otherwise.

The libraries required to build a neural network in Python include graphical representation libraries, which are used later to plot the sigmoid curve.

Activation Function And Its Derivation:

The activation function we will be implementing here is the sigmoid activation function, also called the logistic function. It scales values into the range between 0 and 1 using the exponential 'e':

sigmoid(x) = 1 / (1 + e^(-x))

For example, sigmoid(0) = 0.5, while large positive inputs give values close to 1 and large negative inputs give values close to 0.

As it scales the values between 0 and 1, it introduces non-linearity into the network, which lets the network model data that is not linearly separable, such as the XOR outputs. We need the derivative of the sigmoid function to calculate the error gradient used to correct the weights during back propagation. Differentiating the sigmoid shows that its derivative can be written in terms of the sigmoid function itself:

sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x))

The sigmoid and its derivative are each implemented as a function in Python.

Plotted, the sigmoid is an S-shaped curve rising from 0 to 1, and its derivative is a bell-shaped curve that peaks at 0.25 when x = 0.

Setting Up The Neural Network:

Let's start by assigning the inputs to an object 'X', which holds the four input combinations of the XOR gate, and the corresponding outputs to the object 'y'.

Now we need to design our neural network. Each input in 'X' has two values and each output in 'y' has one value.
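
Before moving on to the layer sizes, here is a minimal sketch of the pieces described so far: the sigmoid, its derivative, and the XOR inputs and outputs. The function names `sigmoid` and `sigmoid_derivative` are assumptions made for illustration, and only NumPy is imported here; the plotting libraries the article mentions are left out.

```python
import numpy as np

def sigmoid(x):
    """Logistic function: maps any real value into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(s):
    """Derivative of the sigmoid, written in terms of its output s = sigmoid(x)."""
    return s * (1.0 - s)

# XOR truth table: each row of X is one (A, B) input pair,
# and the matching row of y is the expected output Y.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])
```

Note that `sigmoid_derivative` takes a value that has already been passed through `sigmoid`, not the raw input. This is a common choice because back propagation already has the layer outputs at hand, so the exponential does not need to be computed again.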
The network will have two input nodes and one output node, with a hidden layer of size 3 (you can choose any size you want here, as long as the network is fully connected). Epochs are the number of iterations of front and back propagation the network runs, with the learning rate controlling how far each iteration moves the output towards the actual value. Since it is a simple computational model, we are opting for a high number of epochs.

To start off, we need to initialise some weights. This gives the neural network a starting point from which to measure how far it is from the actual output. We'll use random numbers in the range 0 to 1, drawn from a uniform distribution. The input size, hidden size and output size determine the shapes of the weight matrices: W1 has shape (input size, hidden size) and W2 has shape (hidden size, output size).

Once the weights are initialised we can work on building the front propagation and the back propagation of our network.

Front Propagation:

During front propagation, the input X is multiplied (dot product) with W1 and passed through the sigmoid activation function 'σ' to get the output of the hidden layer. This output is then multiplied by W2 and passed through 'σ' again to obtain the output yhat:

hidden output = σ(X · W1)
yhat = σ(hidden output · W2)

After computing this in Python we get the output yhat, but the actual output is 'y'.

So yhat differs from y (the actual output) by some decimal values. This is called the error 'E'.

Now let us look at by how much our output differs. All we need to find is:

E = y - yhat

Learning rate in simple words:

Let's take an example: say y = 5 and our model gives an output yhat = 3. So we get an error of E = y - yhat = 2.
The learning rate is the rate at which the output yhat will increase towards the actual value y. Consider the learning rate to be 0.1:

Rate = E * 0.1
     = 2 * 0.1
     = 0.2

So in the first iteration our output will increase to 3.2. In the next iteration the error has shrunk to 1.8, so the output increases by 0.18 to 3.38, and so on, moving closer to the value 5 with every iteration.

Back Propagation:

This is the process in which the error is corrected in every iteration (epoch). We need to find the gradient along which the error is to be corrected, so let us find the derivative of the error with respect to the weights. The derivative tells us by how much we need to change the weights in each epoch in order to reach the actual output.

Let us first find the derivative of the Error (E) with respect to W2 from the front propagation equations.

Now let us compute the derivative of the Error (E) with respect to W1.

Once we compute this, we can work on building the back propagation equations in Python.

The front and back propagation equations are built. Let us see the error change through the iterations. If the error decreases, then our neural network is working towards finding the right answer.

Now let us test the model with an input and see how accurate it is.

Finally we have our result, which is very close to 1. This is how we teach a neural network to recognise its mistake and make small changes in every iteration to reach the actual value. Here we have built a very simple network with few computations in front and back propagation, which is why it is fast. As we build more complex networks, such as Recurrent Neural Networks and Convolutional Neural Networks, the computations increase and the time taken to train the model becomes much longer. When it comes to Artificial Intelligence, the longer we train the model, the better the results tend to be.
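
The steps walked through above (weight initialisation, front propagation, error, back propagation and a final test) can be put together in one minimal sketch. It is not the article's original code: the variable names, the 50000 epochs, the fixed random seed and the test input [1, 0] are all assumptions made for illustration, the learning rate of 0.1 is borrowed from the worked example, and the error is taken as y - yhat, the convention under which that example (y = 5, yhat = 3) gives E = 2. Following the description in the text, the network has no bias terms, so how quickly the error falls depends on the random starting weights.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(s):
    # s is a value that has already been passed through sigmoid
    return s * (1.0 - s)

# XOR inputs and expected outputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

input_size, hidden_size, output_size = 2, 3, 1
epochs = 50000         # assumed value
learning_rate = 0.1    # assumed value, taken from the worked example

# Uniform random weights in [0, 1); the seed is an assumption for repeatability
rng = np.random.default_rng(0)
W1 = rng.uniform(size=(input_size, hidden_size))
W2 = rng.uniform(size=(hidden_size, output_size))

errors = []  # mean squared error recorded at every epoch
for epoch in range(epochs):
    # Front propagation
    hidden = sigmoid(X @ W1)
    yhat = sigmoid(hidden @ W2)

    # Error between the actual and the predicted output
    E = y - yhat
    errors.append(np.mean(E ** 2))

    # Back propagation: gradients for W2 and W1 via the chain rule
    d_output = E * sigmoid_derivative(yhat)
    d_hidden = (d_output @ W2.T) * sigmoid_derivative(hidden)

    # Move the weights a small step in the direction that reduces the error
    W2 += learning_rate * hidden.T @ d_output
    W1 += learning_rate * X.T @ d_hidden

    if epoch % 10000 == 0:
        print(f"epoch {epoch}: mean squared error {errors[-1]:.4f}")

# Test the trained network on a single input (assumed test case)
test_input = np.array([[1, 0]])
prediction = sigmoid(sigmoid(test_input @ W1) @ W2)
print("prediction for [1, 0]:", prediction)
```

The hidden-layer gradient is computed before W2 is updated, so both weight updates are based on the same front propagation pass. Printing the recorded error every few thousand epochs is one simple way to check that it is decreasing.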