Guide to Autoencoders, with Python code

The autoencoder is a specific type of feed-forward neural network whose target output is the same as its input.

Published on June 21, 2021

by Vijaysinh Lendave

An autoencoder is an artificial neural network that learns to compress and decompress its input data in an unsupervised manner. The compression and decompression are data-specific and lossy. The autoencoder aims to learn a representation of a set of data, known as the encoding, which typically amounts to dimensionality reduction; alongside this reduction side, the network also learns a reconstruction side. Data-specific means that an autoencoder can only compress data similar to what it was trained on: for example, an autoencoder trained on images of dogs will perform poorly on images of cats. Lossy means that the reconstruction is a degraded version of the input, much as the quality of an image drops when you share it on WhatsApp. In the image below, compare the quality of the reconstructed image with that of the original.

An autoencoder is a feed-forward neural network that is trained to reproduce its own input at the output. As shown in the above figure, to build an autoencoder we need three things: an encoding method, a decoding method, and a loss function to compare the output with the target.

First, the input passes through the encoder, a fully connected artificial neural network that produces the code. The decoder, which has a similar ANN structure, then produces the output using only that code. Here the code is simply the compressed version of the input.
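
The three ingredients described above (an encoding method, a decoding method and a loss function) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the weights are random and untrained, mean squared error is used as the loss for simplicity, and the names `encode`, `decode` and `reconstruction_loss` are hypothetical helpers, not part of Keras.

```python
import numpy as np

rng = np.random.default_rng(0)

input_dim, code_dim = 784, 15   # sizes chosen to match a flattened 28x28 image

# Randomly initialised (untrained) weights, for illustration only
W_enc = rng.normal(scale=0.01, size=(input_dim, code_dim))
W_dec = rng.normal(scale=0.01, size=(code_dim, input_dim))

def encode(x):
    """Compress the input into the code (ReLU activation)."""
    return np.maximum(0.0, x @ W_enc)

def decode(code):
    """Reconstruct the input from the code (sigmoid activation)."""
    return 1.0 / (1.0 + np.exp(-(code @ W_dec)))

def reconstruction_loss(x, x_hat):
    """Mean squared error between the input and its reconstruction."""
    return float(np.mean((x - x_hat) ** 2))

x = rng.random((4, input_dim))  # a batch of 4 fake "images" with values in [0, 1)
code = encode(x)
x_hat = decode(code)
loss = reconstruction_loss(x, x_hat)
print(code.shape, x_hat.shape, loss)
```

The code has only 15 values per image, yet the decoder has to rebuild all 784 pixels from it; training is what makes that reconstruction resemble the input.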

Code implementation:

Autoencoders are trained in the same way as conventional ANNs, through backpropagation.
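
To make the training step concrete, here is a minimal sketch of backpropagation on a tiny linear autoencoder, written in plain NumPy. Everything in it is an assumption made for illustration: the toy data, the 4 -> 2 -> 4 layer sizes, the learning rate and the number of steps are arbitrary, and the activations are left out so that the gradients stay short. Keras performs the equivalent steps automatically when `fit` is called.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 points in 4-D that really live on a 2-D subspace
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 4))
X = latent @ mixing

# Linear autoencoder 4 -> 2 -> 4 with small random initial weights
W_enc = rng.normal(scale=0.1, size=(4, 2))
W_dec = rng.normal(scale=0.1, size=(2, 4))
lr = 0.01

losses = []
for step in range(2000):
    code = X @ W_enc                  # forward pass: encode
    X_hat = code @ W_dec              # forward pass: decode
    err = X_hat - X
    losses.append(float(np.mean(err ** 2)))

    # Backward pass: gradients of the mean squared error
    grad_out = 2.0 * err / err.size
    grad_W_dec = code.T @ grad_out
    grad_W_enc = X.T @ (grad_out @ W_dec.T)

    # Gradient descent update
    W_dec -= lr * grad_W_dec
    W_enc -= lr * grad_W_enc

print(f"loss before training: {losses[0]:.4f}, after: {losses[-1]:.4f}")
```

Note that the target passed to the loss is the input `X` itself, which is what makes this an autoencoder rather than an ordinary regression network.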

We are mainly going to cover three autoencoders:

Simple autoencoder
Deep CNN autoencoder
Denoising autoencoder

For the implementation part, we are using the popular MNIST digits dataset.

Simple Autoencoder:

Import all the dependencies

 from keras.layers import Dense, Conv2D, MaxPooling2D, UpSampling2D
 from keras import Input, Model
 from keras.models import Sequential
 from keras.datasets import mnist
 import numpy as np
 import matplotlib.pyplot as plt

Build the model. The encoding dimension decides how much the image is compressed: the smaller the dimension, the greater the compression.

 encoding_dim = 15 
 input_img = Input(shape=(784,))
 # encoded representation of input
 encoded = Dense(encoding_dim, activation='relu')(input_img)
 # decoded representation of code 
 decoded = Dense(784, activation='sigmoid')(encoded)
 # Model that takes an input image and returns the decoded image
 autoencoder = Model(input_img, decoded)
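
To get a feel for how strongly this model compresses, the arithmetic can be worked out directly. The sketch below assumes the sizes used above (784 inputs, a 15-value code) and the usual parameter count of a fully connected layer, namely inputs times units plus one bias per unit.

```python
input_dim = 28 * 28   # a flattened MNIST image has 784 pixel values
encoding_dim = 15     # size of the code used in the model above

compression_factor = input_dim / encoding_dim

# A Dense layer has (inputs * units) weights plus one bias per unit
encoder_params = input_dim * encoding_dim + encoding_dim
decoder_params = encoding_dim * input_dim + input_dim

print(f"compression factor: {compression_factor:.1f}")   # 52.3
print(f"encoder parameters: {encoder_params}")           # 11775
print(f"decoder parameters: {decoder_params}")           # 12544
```

Halving `encoding_dim` would roughly double the compression factor, at the cost of a blurrier reconstruction.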

Build the encoder model and the decoder model separately so that the encoding and decoding steps can be run and inspected on their own

 # Model that takes an input image and returns its encoded representation
 encoder = Model(input_img, encoded)
 # Creating a decoder model
 encoded_input = Input(shape=(encoding_dim,))
 # last layer of the autoencoder model
 decoder_layer = autoencoder.layers[-1]
 # decoder model
 decoder = Model(encoded_input, decoder_layer(encoded_input))

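Compile the model with the Adam optimizer and the binary cross-entropy loss function, then scale and flatten the images and fit the model

 autoencoder.compile(optimizer='adam', loss='binary_crossentropy')

 (x_train, y_train), (x_test, y_test) = mnist.load_data()
 # scale pixel values to [0, 1] and flatten each 28x28 image to a 784-vector
 x_train = x_train.astype('float32').reshape(len(x_train), 784) / 255.
 x_test = x_test.astype('float32').reshape(len(x_test), 784) / 255.
 autoencoder.fit(x_train, x_train,
                 epochs=15,
                 batch_size=256,
                 validation_data=(x_test, x_test))
 encoded_img = encoder.predict(x_test)
 decoded_img = decoder.predict(encoded_img)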
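
The binary cross-entropy loss used in `compile` treats every pixel as a value between 0 and 1, which is why the output layer uses a sigmoid. The NumPy function below is a simplified sketch of that loss, not the Keras implementation; the helper name `binary_crossentropy` and the `eps` value are assumptions made for illustration.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy; eps keeps log() away from zero."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-(y_true * np.log(y_pred)
                           + (1.0 - y_true) * np.log(1.0 - y_pred))))

target = np.array([0.0, 1.0, 1.0, 0.0])   # four "pixels" of a clean image
good = np.array([0.1, 0.9, 0.8, 0.2])     # reconstruction close to the target
bad = np.array([0.9, 0.1, 0.2, 0.8])      # reconstruction far from the target

good_loss = binary_crossentropy(target, good)
bad_loss = binary_crossentropy(target, bad)
print(f"good reconstruction: {good_loss:.4f}")   # about 0.1643
print(f"bad reconstruction:  {bad_loss:.4f}")    # about 1.9560
```

The closer the reconstruction is to the target, the smaller the loss, which is the quantity the Adam optimizer drives down during `fit`.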

Using the plot function, you can see the encoded and decoded images, respectively, as below.

Deep CNN Autoencoder:

As the inputs are images, it makes more sense to use a convolutional network. The encoder consists of a stack of Conv2D and max-pooling layers, whereas the decoder consists of a stack of Conv2D and upsampling layers.

 model = Sequential()

 # encoder network
 model.add(Conv2D(30, 3, activation= 'relu', padding='same', input_shape = (28,28,1)))
 model.add(MaxPooling2D(2, padding= 'same'))
 model.add(Conv2D(15, 3, activation= 'relu', padding='same'))
 model.add(MaxPooling2D(2, padding= 'same'))

 #decoder network
 model.add(Conv2D(15, 3, activation= 'relu', padding='same'))
 model.add(UpSampling2D(2))
 model.add(Conv2D(30, 3, activation= 'relu', padding='same'))
 model.add(UpSampling2D(2))
 model.add(Conv2D(1,3,activation='sigmoid', padding= 'same')) # output layer
 model.compile(optimizer='adam', loss='binary_crossentropy')
 model.summary()

 Output:
 Model: "sequential"
 _________________________________________________________________
 Layer (type)                 Output Shape              Param #   
 =================================================================
 conv2d_17 (Conv2D)           (None, 28, 28, 30)        300       
 _________________________________________________________________
 max_pooling2d_7 (MaxPooling2 (None, 14, 14, 30)        0         
 _________________________________________________________________
 conv2d_18 (Conv2D)           (None, 14, 14, 15)        4065      
 _________________________________________________________________
 max_pooling2d_8 (MaxPooling2 (None, 7, 7, 15)          0         
 _________________________________________________________________
 conv2d_19 (Conv2D)           (None, 7, 7, 15)          2040      
 _________________________________________________________________
 up_sampling2d_7 (UpSampling2 (None, 14, 14, 15)        0         
 _________________________________________________________________
 conv2d_20 (Conv2D)           (None, 14, 14, 30)        4080      
 _________________________________________________________________
 up_sampling2d_8 (UpSampling2 (None, 28, 28, 30)        0         
 _________________________________________________________________
 conv2d_21 (Conv2D)           (None, 28, 28, 1)         271       
 =================================================================
 Total params: 10,756
 Trainable params: 10,756
 Non-trainable params: 0
 _________________________________________________________________
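
The numbers in the summary can be checked by hand. The sketch below assumes the standard parameter count for a convolution layer (kernel height times kernel width times input channels, per filter, plus one bias per filter); `conv2d_params` is a hypothetical helper written for this check, not a Keras function.

```python
def conv2d_params(kernel_size, in_channels, filters):
    """Weights (k * k * in_channels per filter) plus one bias per filter."""
    return kernel_size * kernel_size * in_channels * filters + filters

# (in_channels, filters) for the five Conv2D layers, all with 3x3 kernels
layers = [(1, 30), (30, 15), (15, 15), (15, 30), (30, 1)]
counts = [conv2d_params(3, c_in, f) for c_in, f in layers]

print(counts)        # [300, 4065, 2040, 4080, 271]
print(sum(counts))   # 10756

# Spatial size after each pooling / upsampling step (factor 2 each time)
sizes = [28]
for op in ("pool", "pool", "upsample", "upsample"):
    sizes.append(sizes[-1] // 2 if op == "pool" else sizes[-1] * 2)
print(sizes)         # [28, 14, 7, 14, 28]
```

The pooling and upsampling layers contribute no parameters; they only shrink the image to 7×7 in the middle of the network and restore it to 28×28 at the end.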

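Reshape the images to 28×28×1, scale the values to between 0 and 1, and fit the model

 (x_train, _), (x_test, _) = mnist.load_data()
 # add a channel axis and scale pixel values to [0, 1]
 x_train = x_train.astype('float32').reshape(-1, 28, 28, 1) / 255.
 x_test = x_test.astype('float32').reshape(-1, 28, 28, 1) / 255.
 model.fit(x_train, x_train,
                 epochs=15,
                 batch_size=128,
                 validation_data=(x_test, x_test))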

Here are the input images and the decoded images produced by the CNN-based autoencoder

Denoising autoencoder:

Let’s check whether the autoencoder can deal with noise in images. Noise here means things like blurry images, white marks on the images, changes to the colors of the images, etc.

Here we introduce some noise to our original digits and then try to recover the original images as well as possible.

Introduce noise as below

 noise_factor = 0.7
 x_train_noisy = x_train + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=x_train.shape) 
 x_test_noisy = x_test + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=x_test.shape) 
 x_train_noisy = np.clip(x_train_noisy, 0., 1.)
 x_test_noisy = np.clip(x_test_noisy, 0., 1.)
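
To see what `noise_factor` does, the sketch below applies the same recipe (Gaussian noise followed by clipping) at three strengths and reports how many pixels end up pinned at exactly 0 or 1. It is an illustration under assumptions: uniform random arrays stand in for the scaled MNIST images, `add_gaussian_noise` is a hypothetical helper, and a seeded NumPy generator is used instead of `np.random.normal` so that the run is repeatable.

```python
import numpy as np

def add_gaussian_noise(images, noise_factor, rng):
    """Add zero-mean Gaussian noise, then clip back into the valid [0, 1] range."""
    noisy = images + noise_factor * rng.normal(loc=0.0, scale=1.0, size=images.shape)
    return np.clip(noisy, 0.0, 1.0)

rng = np.random.default_rng(42)
clean = rng.random((10, 28, 28))   # stand-in for scaled images, values in [0, 1)

saturated_fractions = []
for factor in (0.1, 0.4, 0.7):
    noisy = add_gaussian_noise(clean, factor, rng)
    # Pixels pushed outside [0, 1] are clipped and lose their original value
    saturated = float(np.mean((noisy == 0.0) | (noisy == 1.0)))
    saturated_fractions.append(saturated)
    print(f"noise_factor={factor}: {saturated:.1%} of pixels saturated")
```

The larger the factor, the more pixels are clipped, so a value such as 0.7 destroys a large share of the original image information. Real MNIST digits contain many pixels that are already exactly 0, so the fractions there will differ from this toy run.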

Here are some examples of noisy images

 plt.figure(figsize=(20, 2))
 for i in range(1, 5 + 1):
     ax = plt.subplot(1, 5, i)
     plt.imshow(x_test_noisy[i].reshape(28, 28))
     plt.gray()
     ax.get_xaxis().set_visible(False)
     ax.get_yaxis().set_visible(False)
 plt.show()

You can see that we can barely identify the digits. We intentionally introduced a lot of noise to check to what extent the autoencoder can recover the images.

Modify the layers of the model defined above, for example by increasing the number of filters, so that the model performs at its best, and fit the model

 model.fit(x_train_noisy, x_train,
                 epochs=15,
                 batch_size=128,
                 validation_data=(x_test_noisy, x_test))
 pred = model.predict(x_test_noisy)

Plot function

 plt.figure(figsize=(20, 4))
 for i in range(5):
     # Display noisy input
     ax = plt.subplot(2, 5, i + 1)
     plt.imshow(x_test_noisy[i].reshape(28, 28))
     plt.gray()
     ax.get_xaxis().set_visible(False)
     ax.get_yaxis().set_visible(False)
     # Display reconstruction
     ax = plt.subplot(2, 5, i + 1 + 5)
     plt.imshow(pred[i].reshape(28, 28))
     plt.gray()
     ax.get_xaxis().set_visible(False)
     ax.get_yaxis().set_visible(False)
 plt.show()
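
Besides looking at the plots, the recovery can be put into a number by comparing the mean squared error of the noisy images and of the reconstructions against the clean images. The sketch below is a hedged illustration: `mean_squared_error` is a hypothetical helper, and the arrays are synthetic stand-ins so that the snippet runs on its own. With the trained model you would pass `x_test`, `x_test_noisy` and `pred` instead.

```python
import numpy as np

def mean_squared_error(a, b):
    """Average squared pixel difference between two image batches of equal size."""
    a = np.asarray(a, dtype="float64").reshape(len(a), -1)
    b = np.asarray(b, dtype="float64").reshape(len(b), -1)
    return float(np.mean((a - b) ** 2))

# Synthetic stand-ins for x_test, x_test_noisy and pred
rng = np.random.default_rng(0)
clean = rng.random((5, 28, 28, 1))
noisy = np.clip(clean + 0.7 * rng.normal(size=clean.shape), 0.0, 1.0)
fake_pred = 0.5 * (clean + noisy)   # hypothetical "partly denoised" output

noisy_error = mean_squared_error(noisy, clean)
pred_error = mean_squared_error(fake_pred, clean)
print(f"noisy vs clean: {noisy_error:.4f}")
print(f"pred  vs clean: {pred_error:.4f}")
```

If the denoising autoencoder is doing its job, the error of `pred` against the clean test images should be clearly lower than the error of the noisy inputs.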

Endnotes:

We have seen the structure of autoencoders and implemented some basic ones in practice. Autoencoders have a wide range of applications, such as dimensionality reduction, image compression, recommendation systems and so on. Here we trained our models for only a few epochs; training for more epochs, or increasing the size of the network, may improve performance further.