What is a neural network? Just as neurons in the human brain are interconnected to help make decisions, artificial neural networks are built from interconnected nodes that help a machine make decisions or predictions. Each node is responsible for a simple calculation, and the combination of these calculations produces the desired result. In today’s machine learning and deep learning landscape, neural networks are among the most important and fastest-growing fields of study. There are many types of neural networks. Some important ones are artificial neural networks (ANN), convolutional neural networks (CNN) and recurrent neural networks (RNN).
This article assumes that the reader has a good knowledge of ANN, CNN and RNN. Our main aim here is to understand Bi-LSTM (bidirectional long short-term memory), so we suggest reading the ANN and CNN articles first to get the basic ideas and terms normally used in the neural network field.
The long short-term memory (LSTM) layer is used inside a recurrent neural network, so let’s first review the basic idea of a recurrent neural network. This will make the rest of the article easier to follow.
RNN (recurrent neural network)
An RNN (recurrent neural network) is a type of neural network commonly used to build speech recognition and natural language processing models. Recurrent neural networks take the order of the data into account and use patterns in the sequence to make a prediction.
An RNN uses feedback loops, which sets it apart from feed-forward neural networks. These loops let the RNN process a sequence of data: information from earlier steps is passed on to later steps, and predictions are made according to the information gathered so far. This mechanism can be thought of as memory.
The loop structure lets the network share information across time steps and accept a sequence of input data. The state an RNN computes at one step is fed back in as an input to the next step, so each output depends on the ones before it.
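To make the feedback loop concrete, here is a minimal NumPy sketch of a plain (Elman-style) recurrent layer. It is only an illustration: the function name `rnn_forward`, the layer sizes and the random weights are assumptions made for this example, not part of any library.

```python
import numpy as np

def rnn_forward(inputs, W_x, W_h, b):
    """Run a simple tanh RNN over a sequence and return every hidden state."""
    h = np.zeros(W_h.shape[0])        # the "memory" starts empty
    states = []
    for x_t in inputs:                # one loop iteration per time step
        # the new state depends on the current input AND the previous state
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states.append(h)
    return np.stack(states)

# Hypothetical sizes and random weights, chosen only for the demonstration.
rng = np.random.default_rng(0)
seq_len, input_size, hidden_size = 5, 3, 4
inputs = rng.normal(size=(seq_len, input_size))
W_x = rng.normal(size=(hidden_size, input_size)) * 0.1
W_h = rng.normal(size=(hidden_size, hidden_size)) * 0.1
b = np.zeros(hidden_size)

states = rnn_forward(inputs, W_x, W_h, b)
print(states.shape)  # (5, 4): one hidden state per time step
```

The same weights are reused at every step; only the hidden state `h` changes, and that state is what carries information forward through the sequence.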
As in the picture above, we can visualise an RNN as a network that takes an input and processes it in the loop. Whenever a new input arrives, it combines that input with the information held in the loop and gives a prediction.
For example, in the sentence “we are going to ………” we need to predict the word in the blank space. From this sentence alone, nobody can predict the word with confidence. But if the model has also seen the next sentence, “in school we enjoyed a lot”, it can predict that “school” fills the blank space.
In the above image, we can see in a block diagram how a recurrent neural network works: as sequential data passes through, the loop keeps information circulating, and the network accumulates knowledge of the data.
In the last few years, recurrent neural networks have been widely used to solve machine learning problems such as speech recognition, language modelling and image classification. One of the essential building blocks of many such networks is the LSTM (long short-term memory), which is what sets an LSTM network apart from a regular RNN model.
Long short-term memory networks, usually called LSTMs, are a special kind of RNN. They were introduced to avoid the long-term dependency problem. In a regular RNN, difficulties frequently arise when information from far back in the sequence has to be connected to new information. If RNNs could do this reliably, they would be very useful. This difficulty is called the long-term dependency problem.
The repeating module in a standard RNN contains a single layer.
Remembering information for long periods is the default behaviour of an LSTM. LSTM networks have a chain-like structure similar to a regular RNN, but the repeating module (the memory module) has a different internal structure. The block diagram of the repeating module looks like the image below.
The repeating module in an LSTM contains four interacting layers.
In the above diagram, each line carries an entire vector from the output of one node to the inputs of others. The neural network layers are learned, and the pointwise operations are element-wise mathematical operations such as vector addition. Merging lines denote the concatenation of vectors, and diverging lines send copies of the information to different nodes.
The horizontal line running through the top of the repeating module is the cell state, which acts as a conveyor belt for data. Gates in the lower part of the module control what information is let through, so an LSTM network can remove information from the cell state or add information to it. Each gate applies an activation function (sigmoid and tanh in the standard LSTM). This is a unidirectional LSTM network, where the network stores only the forward information.
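The gate mechanism described above can be sketched in a few lines of NumPy. This is a simplified illustration of one standard LSTM time step, not the Keras implementation: the helper names `sigmoid` and `lstm_step`, the gate ordering inside the stacked weight matrix `W`, and the sizes are all assumptions made for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step. W maps [h_prev, x_t] to four stacked gate pre-activations."""
    n = h_prev.shape[0]
    # merging lines in the diagram: concatenate previous output and current input
    z = W @ np.concatenate([h_prev, x_t]) + b
    f = sigmoid(z[0 * n:1 * n])   # forget gate: what to remove from the cell state
    i = sigmoid(z[1 * n:2 * n])   # input gate: how much new information to add
    g = np.tanh(z[2 * n:3 * n])   # candidate values to add
    o = sigmoid(z[3 * n:4 * n])   # output gate: what to expose as the output
    c_t = f * c_prev + i * g      # the "conveyor belt": old state scaled, new info added
    h_t = o * np.tanh(c_t)
    return h_t, c_t

# Hypothetical sizes and random weights, chosen only for the demonstration.
rng = np.random.default_rng(0)
input_size, hidden = 3, 4
W = rng.normal(size=(4 * hidden, hidden + input_size)) * 0.1
b = np.zeros(4 * hidden)

h, c = np.zeros(hidden), np.zeros(hidden)
for x_t in rng.normal(size=(5, input_size)):
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape, c.shape)  # (4,) (4,)
```

Because the cell state is updated only by an element-wise scaling and an addition, information can travel along it over many steps with little change, which is how the LSTM addresses the long-term dependency problem.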
Bi-LSTM (bidirectional long short-term memory)
Bidirectional long short-term memory (Bi-LSTM) is a way of giving a neural network sequence information in both directions: backward (future to past) and forward (past to future).
In a bidirectional LSTM, the input flows in two directions, which makes a Bi-LSTM different from a regular LSTM. With a regular LSTM, we can make the input flow in one direction only, either backward or forward. In a bidirectional LSTM, we make the input flow in both directions so that both past and future information is preserved. An example makes this clearer.
In the sentence “boys go to …..” we cannot fill the blank space. But when we also have a later sentence, “boys come out of school”, we can easily predict the earlier blank. This is what we want our model to do, and a bidirectional LSTM allows the neural network to do it.
Image for Bi-LSTM.
In the diagram, we can see the flow of information through the backward and forward layers. Bi-LSTM is usually employed for sequence-to-sequence tasks. This kind of network can be used in text classification, speech recognition and forecasting models. Next, we are going to build a bidirectional LSTM model using Python.
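The two-direction idea can be sketched independently of any framework. In the NumPy example below, a plain tanh recurrent step stands in for the LSTM cell to keep the code short; the function names `run_rnn` and `bidirectional`, the sizes and the random weights are assumptions for illustration. The outputs of the two passes are concatenated at each step, which is one common way to merge them.

```python
import numpy as np

def run_rnn(inputs, W_x, W_h):
    """Plain tanh recurrent pass (a stand-in for an LSTM), returning all states."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x_t in inputs:
        h = np.tanh(W_x @ x_t + W_h @ h)
        states.append(h)
    return np.stack(states)

def bidirectional(inputs, fwd_params, bwd_params):
    fwd = run_rnn(inputs, *fwd_params)               # reads past -> future
    # read the reversed sequence, then flip back so step t lines up with step t
    bwd = run_rnn(inputs[::-1], *bwd_params)[::-1]   # reads future -> past
    return np.concatenate([fwd, bwd], axis=-1)

def make_params(rng, input_size, hidden):
    return (rng.normal(size=(hidden, input_size)) * 0.5,
            rng.normal(size=(hidden, hidden)) * 0.5)

# Hypothetical sizes and random weights, chosen only for the demonstration.
rng = np.random.default_rng(0)
seq_len, input_size, hidden = 6, 3, 4
inputs = rng.normal(size=(seq_len, input_size))
fwd_params = make_params(rng, input_size, hidden)   # each direction has its own weights
bwd_params = make_params(rng, input_size, hidden)

out = bidirectional(inputs, fwd_params, bwd_params)
print(out.shape)  # (6, 8): forward and backward states side by side
```

At every position, the first half of the output vector summarises what came before and the second half summarises what comes after, which is exactly the information needed to fill in a blank in the middle of a sentence.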
Code Implementation of Bidirectional-LSTM
Setting up the environment in Google Colab.
Importing the libraries
import numpy as np
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Dropout, Embedding, LSTM, Bidirectional
from keras.datasets import imdb
Here we are going to use the IMDB data set for text classification using Keras and a Bi-LSTM network.
n_unique_words = 10000  # vocabulary size: keep only the most frequent words
maxlen = 200            # cut texts after this number of words
batch_size = 128
Above, we defined some constants that we will use in the next steps. Next, we load the data set from the Keras library.
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=n_unique_words)
To feed the data into a neural network, every review must have the same length, so we convert the data into fixed-length sequence matrices. For this, we use the pad_sequences function from keras.preprocessing.sequence.
# pad or truncate every review to exactly maxlen tokens
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
y_train = np.array(y_train)
y_test = np.array(y_test)
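To see what this padding step does to a single review, here is a small pure-Python sketch of the default behaviour of pad_sequences (zeros added at the front, and over-long sequences cut from the front). The helper `pad_pre` is a hypothetical stand-in written for this illustration, not the Keras function itself, and it assumes maxlen is at least 1.

```python
def pad_pre(seq, maxlen, value=0):
    """Mimic pad_sequences defaults for one sequence: pre-truncate, then pre-pad."""
    seq = list(seq)[-maxlen:]                    # keep only the LAST maxlen tokens
    return [value] * (maxlen - len(seq)) + seq   # fill the front with zeros

print(pad_pre([5, 8, 2], maxlen=5))              # [0, 0, 5, 8, 2]
print(pad_pre([1, 2, 3, 4, 5, 6, 7], maxlen=5))  # [3, 4, 5, 6, 7]
```

After this step every review is a vector of the same length, so the whole data set can be stacked into one matrix with one row per review.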
Next, we build a model with a Bi-LSTM layer.
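model = Sequential()
# map each word index to a 128-dimensional vector
model.add(Embedding(n_unique_words, 128, input_length=maxlen))
# 64-unit LSTM run in both directions
model.add(Bidirectional(LSTM(64)))
model.add(Dropout(0.5))
# single sigmoid unit for binary classification
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])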
In the code above, we added a Bi-LSTM layer to an otherwise regular sequential model using Keras. Keras (as shipped with TensorFlow) provides the Bidirectional wrapper class, which turns an LSTM layer into a Bi-LSTM.
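As a rough sanity check on the size of this model, the number of trainable parameters can be worked out by hand. The sketch below assumes the standard LSTM layout (four gates, each with input weights, recurrent weights and a bias) and that the two directions are concatenated, so the final dense layer sees 2 × 64 inputs. The helper `lstm_params` is a name introduced only for this example.

```python
def lstm_params(input_dim, units):
    # four gates, each with input weights, recurrent weights and a bias vector
    return 4 * (units * (input_dim + units) + units)

n_unique_words, embed_dim, units = 10000, 128, 64

embedding = n_unique_words * embed_dim        # one 128-d vector per word
bi_lstm = 2 * lstm_params(embed_dim, units)   # forward LSTM + backward LSTM
dense = 2 * units * 1 + 1                     # 128 weights + 1 bias
total = embedding + bi_lstm + dense           # Dropout adds no parameters

print(embedding, bi_lstm, dense, total)  # 1280000 98816 129 1378945
```

Most of the parameters sit in the embedding layer. The bidirectional layer has exactly twice the parameters of a single 64-unit LSTM because each direction has its own weights.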
In the next step, we fit the model with the data that we loaded from Keras.
history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=12,
                    validation_data=(x_test, y_test))  # Keras expects a tuple here
print(history.history['loss'])
print(history.history['accuracy'])
Here we can see that we have trained our model on the training data set for 12 epochs. The raw training output is hard to read because so many numbers are printed in one place, so we can use plots to examine the model’s performance.
from matplotlib import pyplot

pyplot.plot(history.history['loss'])
pyplot.plot(history.history['accuracy'])
pyplot.title('model loss vs accuracy')
pyplot.xlabel('epoch')
pyplot.legend(['loss', 'accuracy'], loc='upper right')
pyplot.show()
Here we can see the performance of the Bi-LSTM. The accuracy curve stays close to one throughout, and the loss is almost zero. Thus, the model has performed well in training.
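The curves above show training metrics only. Because validation data was passed to fit, the history should also hold validation entries (assumed here to be stored under the keys 'val_loss' and 'val_accuracy'), and comparing the two gives a better picture of how the model generalises. The sketch below shows one way to pick the epoch with the lowest validation loss. The helper `summarize` is a hypothetical name, and the numbers in `example` are placeholders for illustration only, not results from the model trained in this article.

```python
def summarize(history_dict):
    """Return (best epoch, train accuracy, validation accuracy) at the lowest val_loss."""
    val_loss = history_dict["val_loss"]
    best = min(range(len(val_loss)), key=val_loss.__getitem__)
    return best + 1, history_dict["accuracy"][best], history_dict["val_accuracy"][best]

# Placeholder numbers for illustration only, NOT output from the model above.
example = {
    "accuracy": [0.80, 0.90, 0.95, 0.98],
    "val_accuracy": [0.84, 0.86, 0.85, 0.84],
    "val_loss": [0.38, 0.34, 0.40, 0.52],
}
print(summarize(example))  # (2, 0.9, 0.86)
```

With a real run, the same function could be called as summarize(history.history). A widening gap between training and validation accuracy would suggest that the model is starting to memorise the training set.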
In this article, we have seen how RNN, LSTM and Bi-LSTM work internally and what makes them different from each other. In the final step, we created a basic Bi-LSTM model for text classification. The data was almost ideal for text classification, and most models will perform well on this kind of data. The real test of a model comes with real-life problems. This type of model is well suited to sequential data, so we can use it with text data, audio data, time series data and so on.