An autoencoder is an artificial neural network that learns to compress and then decompress its input data in an unsupervised manner. The compression and decompression are data-specific and lossy. The autoencoder learns a representation of a data set, known as the encoding, which typically results in dimensionality reduction; alongside this reduction, a reconstruction side is also learned. Data-specific means that an autoencoder can only compress data similar to what it was trained on: for example, an autoencoder trained on images of dogs will perform poorly on images of cats. Lossy means that the reconstruction is a degraded version of the input, much as an image shared on WhatsApp loses quality when it is uploaded. In the image below, compare the quality of the reconstructed image with that of the original.
The autoencoder is a specific type of feed-forward neural network in which the target output is the same as the input. As shown in the above figure, building an autoencoder requires an encoding method, a decoding method, and a loss function to compare the output with the target.
First, the input passes through the encoder, which is a fully connected artificial neural network that produces the code. The decoder, which has a similar ANN structure, then produces the output using only that code. Here the code is simply the compressed version of the input.
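The encoder, code, and decoder described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the weights are random and untrained, the layer sizes (784 inputs, a 15-value code) are borrowed from the example used later in this article, and the names `encode` and `decode` are hypothetical helpers, not part of any library.

```python
import numpy as np

rng = np.random.default_rng(0)

input_dim, code_dim = 784, 15  # flattened 28x28 image, small bottleneck

# Randomly initialised (untrained) weights, for illustration only
W_enc = rng.normal(scale=0.05, size=(input_dim, code_dim))
b_enc = np.zeros(code_dim)
W_dec = rng.normal(scale=0.05, size=(code_dim, input_dim))
b_dec = np.zeros(input_dim)

def encode(x):
    """Fully connected layer with ReLU: input -> code."""
    return np.maximum(0.0, x @ W_enc + b_enc)

def decode(code):
    """Fully connected layer with sigmoid: code -> reconstruction."""
    return 1.0 / (1.0 + np.exp(-(code @ W_dec + b_dec)))

x = rng.random((4, input_dim))   # stand-in batch of 4 "images" in [0, 1]
code = encode(x)
x_hat = decode(code)

# The loss compares the reconstruction with the input itself (the target)
mse = np.mean((x - x_hat) ** 2)
print(code.shape, x_hat.shape, float(mse))
```

The only thing the decoder sees is `code`, so everything needed to rebuild the input has to fit through the 15-value bottleneck.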
Code implementation:
Autoencoders are trained through backpropagation, in the same way as conventional ANNs.
We are mainly going to cover three autoencoders, i.e.
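To make the training idea concrete, here is a minimal sketch of backpropagation on a tiny linear autoencoder written directly in NumPy. The toy data, the layer sizes, the learning rate, and the use of mean squared error are all assumptions chosen for illustration; the Keras models below use non-linear layers and the Adam optimizer instead.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples in 8 dimensions that really live on a 2-D subspace
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 8))
X = latent @ mixing

n, d = X.shape
code_dim = 2
W_enc = rng.normal(scale=0.1, size=(d, code_dim))   # encoder weights
W_dec = rng.normal(scale=0.1, size=(code_dim, d))   # decoder weights
learning_rate = 0.01

losses = []
for step in range(2000):
    # Forward pass: input -> code -> reconstruction
    code = X @ W_enc
    X_hat = code @ W_dec
    error = X_hat - X
    losses.append(float(np.mean(error ** 2)))

    # Backward pass: gradients of the mean squared error
    grad_out = 2.0 * error / (n * d)
    grad_W_dec = code.T @ grad_out
    grad_W_enc = X.T @ (grad_out @ W_dec.T)

    # Plain gradient descent update
    W_enc -= learning_rate * grad_W_enc
    W_dec -= learning_rate * grad_W_dec

print(f"loss before training: {losses[0]:.4f}, after: {losses[-1]:.4f}")
```

The target in the loss is the input `X` itself, which is what distinguishes this from an ordinary supervised network.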
- Simple autoencoder
- Deep CNN autoencoder
- Denoising autoencoder
For the implementation part, we use the popular MNIST digits data set.
- Simple Autoencoder:
Import all the dependencies
from keras.layers import Dense, Conv2D, MaxPooling2D, UpSampling2D
from keras import Input, Model
from keras.models import Sequential
from keras.datasets import mnist
import numpy as np
import matplotlib.pyplot as plt
Build the model. The encoding dimension decides how much the image is compressed: the smaller the dimension, the greater the compression.
encoding_dim = 15
input_img = Input(shape=(784,))
# encoded representation of input
encoded = Dense(encoding_dim, activation='relu')(input_img)
# decoded representation of code
decoded = Dense(784, activation='sigmoid')(encoded)
# Model which takes the input image and shows decoded images
autoencoder = Model(input_img, decoded)
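To put a number on how much the encoding dimension compresses the image, the arithmetic can be sketched as below. It uses the sizes from the model above (784 input pixels, a 15-value code) and the usual parameter count for a fully connected layer, inputs × units plus one bias per unit.

```python
input_dim = 28 * 28      # 784 pixels per MNIST image
encoding_dim = 15        # size of the code used in this article

compression_ratio = input_dim / encoding_dim
print(f"compression ratio: {compression_ratio:.1f}x")   # about 52.3x

# Parameters of a Dense layer = inputs * units + units (one bias per unit)
encoder_params = input_dim * encoding_dim + encoding_dim   # 11,775
decoder_params = encoding_dim * input_dim + input_dim      # 12,544
total_params = encoder_params + decoder_params             # 24,319
print(encoder_params, decoder_params, total_params)
```

Halving `encoding_dim` doubles the compression ratio, at the cost of a code that can carry less information about each digit.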
Build the encoder model and the decoder model separately so that we can easily distinguish the encoded representation from the decoded output.
# This model shows encoded images
encoder = Model(input_img, encoded)
# Creating a decoder model
encoded_input = Input(shape=(encoding_dim,))
# last layer of the autoencoder model
decoder_layer = autoencoder.layers[-1]
# decoder model
decoder = Model(encoded_input, decoder_layer(encoded_input))
Compile the model with the Adam optimizer and the binary cross-entropy loss function, then fit it.
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
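(x_train, y_train), (x_test, y_test) = mnist.load_data()
# scale pixels to [0, 1] and flatten each 28x28 image into a 784-vector
x_train = x_train.astype('float32').reshape(-1, 784) / 255.
x_test = x_test.astype('float32').reshape(-1, 784) / 255.
autoencoder.fit(x_train, x_train, epochs=15, batch_size=256, validation_data=(x_test, x_test))
encoded_img = encoder.predict(x_test)
decoded_img = decoder.predict(encoded_img)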
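The binary cross-entropy loss named in the compile step treats every pixel as a value between 0 and 1 and penalises reconstructions that are far from the target. A rough NumPy sketch of the per-pixel formula follows; the helper name and the small clipping constant `eps` are assumptions for illustration, not a copy of the Keras implementation.

```python
import numpy as np

def binary_crossentropy(target, prediction, eps=1e-7):
    """Mean per-pixel binary cross-entropy; inputs are expected in [0, 1]."""
    prediction = np.clip(prediction, eps, 1.0 - eps)   # avoid log(0)
    return float(np.mean(
        -(target * np.log(prediction) + (1.0 - target) * np.log(1.0 - prediction))
    ))

target = np.array([0.0, 1.0, 1.0, 0.0])   # stand-in "pixels"
good = np.array([0.1, 0.9, 0.8, 0.2])     # close reconstruction
bad = np.array([0.9, 0.1, 0.2, 0.8])      # poor reconstruction

print(binary_crossentropy(target, good) < binary_crossentropy(target, bad))  # True
```

Because the formula assumes values in [0, 1], the pixel intensities need to be scaled to that range before training, and the output layer uses a sigmoid.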
Using a plot function, you can see the encoded and decoded images, respectively, as below.
- Deep CNN Autoencoder:
As the inputs are images, it makes more sense to use a convolutional network: the encoder consists of a stack of Conv2D and max-pooling layers, whereas the decoder consists of a stack of Conv2D and up-sampling layers.
model = Sequential()
# encoder network
model.add(Conv2D(30, 3, activation='relu', padding='same', input_shape=(28, 28, 1)))
model.add(MaxPooling2D(2, padding='same'))
model.add(Conv2D(15, 3, activation='relu', padding='same'))
model.add(MaxPooling2D(2, padding='same'))
# decoder network
model.add(Conv2D(15, 3, activation='relu', padding='same'))
model.add(UpSampling2D(2))
model.add(Conv2D(30, 3, activation='relu', padding='same'))
model.add(UpSampling2D(2))
model.add(Conv2D(1, 3, activation='sigmoid', padding='same'))  # output layer
model.compile(optimizer='adam', loss='binary_crossentropy')
model.summary()
Output:
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_17 (Conv2D)           (None, 28, 28, 30)        300
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 14, 14, 30)        0
_________________________________________________________________
conv2d_18 (Conv2D)           (None, 14, 14, 15)        4065
_________________________________________________________________
max_pooling2d_8 (MaxPooling2 (None, 7, 7, 15)          0
_________________________________________________________________
conv2d_19 (Conv2D)           (None, 7, 7, 15)          2040
_________________________________________________________________
up_sampling2d_7 (UpSampling2 (None, 14, 14, 15)        0
_________________________________________________________________
conv2d_20 (Conv2D)           (None, 14, 14, 30)        4080
_________________________________________________________________
up_sampling2d_8 (UpSampling2 (None, 28, 28, 30)        0
_________________________________________________________________
conv2d_21 (Conv2D)           (None, 28, 28, 1)         271
=================================================================
Total params: 10,756
Trainable params: 10,756
Non-trainable params: 0
_________________________________________________________________
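The shapes and parameter counts in this summary can be checked by hand. The sketch below walks through the same nine layers in plain Python, assuming 3×3 kernels with 'same' padding, 2×2 pooling, and 2× up-sampling as in the model definition; `conv2d_params` is a hypothetical helper written for this check, not a Keras function.

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights (kernel*kernel*in_channels per filter) plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

# (layer kind, filters) in the order used by the CNN autoencoder
layers = [("conv", 30), ("pool", None), ("conv", 15), ("pool", None),
          ("conv", 15), ("up", None), ("conv", 30), ("up", None), ("conv", 1)]

height, width, channels = 28, 28, 1
total = 0
per_layer = []
for kind, filters in layers:
    params = 0
    if kind == "conv":      # 3x3 kernel, padding='same' keeps height and width
        params = conv2d_params(3, channels, filters)
        channels = filters
    elif kind == "pool":    # 2x2 max-pooling halves height and width (28 -> 14 -> 7)
        height, width = height // 2, width // 2
    elif kind == "up":      # 2x up-sampling doubles height and width
        height, width = height * 2, width * 2
    total += params
    per_layer.append(params)
    print(f"{kind:5s} -> ({height}, {width}, {channels})  params={params}")

print("total params:", total)   # 10756, matching the summary
```

The narrowest point is the 7×7×15 tensor after the second pooling layer, which plays the role of the code in this model.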
Reshape the images to 28×28×1 (height, width, one channel), scale the values to the range 0 to 1, and fit the model
model.fit(x_train, x_train, epochs=15, batch_size=128, validation_data=(x_test, x_test))
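The reshaping and scaling step that has to run before this `model.fit` call is not shown, so here is one way it might look. It is a sketch that assumes you start from the raw `uint8` arrays returned by `mnist.load_data()`; the helper name `prepare_images` is hypothetical, and a random array stands in for MNIST so that the snippet runs on its own.

```python
import numpy as np

def prepare_images(images):
    """Scale uint8 pixels to [0, 1] and add a trailing channel axis."""
    images = images.astype("float32") / 255.0
    return images.reshape(-1, 28, 28, 1)

# Stand-in for the raw MNIST arrays (uint8 values in 0..255)
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(10, 28, 28)).astype(np.uint8)

x = prepare_images(raw)
print(x.shape, x.dtype)   # (10, 28, 28, 1) float32
```

If the arrays were already scaled to [0, 1] earlier, only the reshape is needed; dividing by 255 a second time would make the images almost black.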
Here are the input images and the decoded images produced by the CNN-based autoencoder
- Denoising autoencoder:
Let's check whether the autoencoder can deal with noise in images, such as blurry images, white marks on the images, changes in the color of the images, etc.
Here we add some noise to the original digits and then try to recover the original images as well as possible.
Introduce noise as below
noise_factor = 0.7
x_train_noisy = x_train + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=x_train.shape)
x_test_noisy = x_test + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=x_test.shape)
x_train_noisy = np.clip(x_train_noisy, 0., 1.)
x_test_noisy = np.clip(x_test_noisy, 0., 1.)
Here are some examples of noisy images
plt.figure(figsize=(20, 2))
for i in range(1, 5 + 1):
    ax = plt.subplot(1, 5, i)
    plt.imshow(x_test_noisy[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()
You can see that the digits are barely identifiable; we intentionally introduced a lot of noise to check how far the autoencoder can recover the image.
Modify the layers of the model defined above, for example by increasing the number of filters, so that the model performs better, and then fit the model
model.fit(x_train_noisy, x_train, epochs=15, batch_size=128, validation_data=(x_test_noisy, x_test))
pred = model.predict(x_test_noisy)
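The modified model itself is not listed, so the following is only one possible version, which would have to be defined before the `model.fit` call. It keeps the layout of the CNN autoencoder and raises the filter counts; the values 64 and 32 and the function name `build_denoising_autoencoder` are assumptions made for illustration, not the settings used to produce the results in this article.

```python
from keras import Input, Sequential
from keras.layers import Conv2D, MaxPooling2D, UpSampling2D

def build_denoising_autoencoder(filters=(64, 32)):
    """Same layout as the CNN autoencoder, with more filters per layer.

    The filter counts are illustrative assumptions, not values from the article.
    """
    outer, inner = filters
    model = Sequential([
        Input(shape=(28, 28, 1)),
        # encoder network
        Conv2D(outer, 3, activation='relu', padding='same'),
        MaxPooling2D(2, padding='same'),
        Conv2D(inner, 3, activation='relu', padding='same'),
        MaxPooling2D(2, padding='same'),
        # decoder network
        Conv2D(inner, 3, activation='relu', padding='same'),
        UpSampling2D(2),
        Conv2D(outer, 3, activation='relu', padding='same'),
        UpSampling2D(2),
        Conv2D(1, 3, activation='sigmoid', padding='same'),  # output layer
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy')
    return model

model = build_denoising_autoencoder()
model.summary()
```

Note that the model is fitted with the noisy images as input and the clean images as target, which is what pushes it to remove the noise rather than reproduce it.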
Plot function
plt.figure(figsize=(20, 4))
for i in range(5):
    # Display noisy input
    ax = plt.subplot(2, 5, i + 1)
    plt.imshow(x_test_noisy[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    # Display reconstruction
    ax = plt.subplot(2, 5, i + 1 + 5)
    plt.imshow(pred[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()
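Besides looking at the plots, a rough numeric check can show whether the reconstruction is closer to the clean digit than the noisy input was. The sketch below uses peak signal-to-noise ratio (PSNR), which is a metric swapped in here for illustration and not something the article's pipeline computes. The arrays are synthetic stand-ins: in practice `clean`, `noisy`, and `denoised_stub` would be `x_test`, `x_test_noisy`, and `pred`, and `denoised_stub` is made up rather than a real model output.

```python
import numpy as np

def psnr(clean, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB; higher means closer to `clean`."""
    mse = np.mean((clean - estimate) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))

# Synthetic stand-ins so the sketch runs on its own
rng = np.random.default_rng(0)
clean = rng.random((5, 28, 28, 1))
noisy = np.clip(clean + 0.7 * rng.normal(size=clean.shape), 0.0, 1.0)
denoised_stub = np.clip(clean + 0.05 * rng.normal(size=clean.shape), 0.0, 1.0)

print(f"noisy input:    {psnr(clean, noisy):.1f} dB")
print(f"reconstruction: {psnr(clean, denoised_stub):.1f} dB")
```

A denoising autoencoder that is doing its job should give a higher PSNR for the reconstruction than for the noisy input.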
Endnotes:
We have seen the structure of autoencoders and implemented some basic ones in practice. Autoencoders have a wide range of applications, such as dimensionality reduction, image compression, and recommendation systems. Here we trained our models for only a few epochs; training for more epochs, or increasing the size of the network, can improve the performance further.