Guide to Autoencoders, with Python code

The autoencoder is a specific type of feed-forward neural network where input is the same as output.

An autoencoder is an artificial neural network that learns to compress and then decompress its input in an unsupervised manner. Both operations are data-specific and lossy. The network learns a representation of the data, known as the encoding, which typically has fewer dimensions than the input; alongside this reduction, it also learns a reconstruction side that maps the encoding back to the input space. Data-specific means that an autoencoder can only compress data similar to what it was trained on: for example, an autoencoder trained on images of dogs will perform poorly on images of cats. Lossy means that the reconstruction is a degraded version of the input, much as an image shared on WhatsApp loses quality. In the image below, compare the quality of the reconstructed image with the original carefully.

Fig1: Schematic of autoencoder 

The autoencoder is a specific type of feed-forward neural network in which the target output is the input itself. As shown in the figure above, building an autoencoder requires an encoding method, a decoding method and a loss function that compares the output with the target.

First, the input passes through the encoder, a fully connected artificial neural network that produces the code. The decoder, which has a similar ANN structure, then produces the output using only that code. Here, the code is simply the compressed version of the input.
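To make the data flow concrete, here is a minimal NumPy sketch of a single-layer encoder and decoder. It is only an illustration under stated assumptions: the weights are random and untrained, the sizes 784 and 15 are chosen to match a flattened 28×28 image and a small code, and the names `encode`, `decode`, `W_enc` and `W_dec` are hypothetical helpers, not part of any library.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: a flattened 28x28 image and an illustrative 15-value code
input_dim, code_dim = 784, 15

# Randomly initialised (untrained) weights, for shape illustration only
W_enc = rng.normal(scale=0.01, size=(input_dim, code_dim))
b_enc = np.zeros(code_dim)
W_dec = rng.normal(scale=0.01, size=(code_dim, input_dim))
b_dec = np.zeros(input_dim)

def encode(x):
    """Fully connected layer followed by ReLU: input -> code."""
    return np.maximum(0.0, x @ W_enc + b_enc)

def decode(code):
    """Fully connected layer followed by sigmoid: code -> reconstruction in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-(code @ W_dec + b_dec)))

x = rng.random((4, input_dim))   # a batch of 4 fake "images" with pixels in [0, 1)
code = encode(x)                 # compressed representation
x_hat = decode(code)             # reconstruction built from the code alone

# Mean squared reconstruction error between input and output
mse = np.mean((x - x_hat) ** 2)
print(code.shape, x_hat.shape, mse)
```

Training would adjust the weights so that `x_hat` moves closer to `x`; the Keras models later in this guide do that with backpropagation.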

Code implementation: 

Autoencoders are trained through backpropagation, in the same way as a conventional ANN.

We are mainly going to cover three autoencoders, i.e.

  1. Simple autoencoder 
  2. Deep CNN autoencoder 
  3. Denoising autoencoder

For the implementation part, we are using the popular MNIST digits data set.

  1. Simple Autoencoder:
Import all the dependencies:

 from keras.layers import Dense, Conv2D, MaxPooling2D, UpSampling2D
 from keras import Input, Model
 from keras.models import Sequential
 from keras.datasets import mnist
 import numpy as np
 import matplotlib.pyplot as plt

Build the model. The encoding dimension determines how much the image is compressed: the smaller the dimension, the greater the compression.

 encoding_dim = 15 
 input_img = Input(shape=(784,))
 # encoded representation of input
 encoded = Dense(encoding_dim, activation='relu')(input_img)
 # decoded representation of code 
 decoded = Dense(784, activation='sigmoid')(encoded)
 # model that takes an input image and returns its reconstruction
 autoencoder = Model(input_img, decoded) 
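
As a rough check on what the encoding dimension means in numbers, the short sketch below computes the compression ratio and the parameter counts implied by the two Dense layers above. It is plain arithmetic; only the variable names are introduced here for illustration.

```python
input_dim = 784      # 28 * 28 pixels per MNIST image
encoding_dim = 15    # size of the code, as in the model above

# How many input values are squeezed into each code value
compression_ratio = input_dim / encoding_dim

# A Dense layer has (inputs * units) weights plus one bias per unit
encoder_params = input_dim * encoding_dim + encoding_dim
decoder_params = encoding_dim * input_dim + input_dim
total_params = encoder_params + decoder_params

print(f"compression ratio: {compression_ratio:.1f}x")
print(f"encoder params: {encoder_params}, decoder params: {decoder_params}")
print(f"total params: {total_params}")
```

With an encoding dimension of 15, each image is reduced to roughly 1/52 of its original size, which is why some loss of detail in the reconstruction is expected.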

Build the encoder model and the decoder model separately so that the encoding and decoding steps can be run and inspected independently.

 # This model shows encoded images
 encoder = Model(input_img, encoded)
 # Creating a decoder model
 encoded_input = Input(shape=(encoding_dim,))
 # last layer of the autoencoder model
 decoder_layer = autoencoder.layers[-1]
 # decoder model
 decoder = Model(encoded_input, decoder_layer(encoded_input)) 

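Compile the model with the Adam optimizer and the binary cross-entropy loss function, then fit it. The images are first scaled to the range 0 to 1 and flattened into 784-element vectors to match the input layer.

 autoencoder.compile(optimizer='adam', loss='binary_crossentropy')

 (x_train, y_train), (x_test, y_test) = mnist.load_data()
 # scale pixel values to [0, 1] and flatten each 28x28 image to a 784-vector
 x_train = x_train.astype('float32').reshape(-1, 784) / 255.
 x_test = x_test.astype('float32').reshape(-1, 784) / 255.
 autoencoder.fit(x_train, x_train,
                 epochs=15,
                 batch_size=256,
                 validation_data=(x_test, x_test))
 encoded_img = encoder.predict(x_test)
 decoded_img = decoder.predict(encoded_img)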
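
The binary cross-entropy loss used in the compile step treats every pixel as a value between 0 and 1 and penalises reconstructions that are far from the target. The following is a simplified NumPy sketch of that idea, not the Keras implementation; the function name and the clipping constant `eps` are assumptions made for illustration.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean per-pixel binary cross-entropy between targets and predictions in [0, 1]."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # avoid log(0)
    loss = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return loss.mean()

# Four illustrative "pixels": two dark (0) and two bright (1)
target = np.array([0.0, 1.0, 1.0, 0.0])

# A reconstruction close to the target gives a small loss
good = binary_crossentropy(target, np.array([0.05, 0.95, 0.90, 0.10]))

# Predicting 0.5 everywhere gives ln(2), the loss of an uninformed guess
uninformed = binary_crossentropy(target, np.full(4, 0.5))

print(good, uninformed)
```

During training, the optimizer lowers this value averaged over all pixels and all images in a batch.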

Using a plot function, you can view the output for the encoded and decoded images.
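
One possible plot function is sketched below. It is an assumption about how such a helper could look, not the one used to produce the article's figures: `plot_reconstructions` is a hypothetical name, it shows originals next to reconstructions rather than the 15-value encodings, and random arrays stand in for real data so that the sketch runs on its own.

```python
import matplotlib
matplotlib.use("Agg")   # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

def plot_reconstructions(originals, reconstructions, n=5):
    """Show the first n originals (top row) above their reconstructions (bottom row)."""
    fig, axes = plt.subplots(2, n, figsize=(2 * n, 4), squeeze=False)
    for i in range(n):
        for row, images in enumerate((originals, reconstructions)):
            ax = axes[row, i]
            ax.imshow(images[i].reshape(28, 28), cmap="gray")
            ax.axis("off")
    return fig

# Stand-in data; with the trained model, pass x_test and decoded_img instead
rng = np.random.default_rng(0)
fake_originals = rng.random((5, 784))
fake_reconstructions = rng.random((5, 784))

fig = plot_reconstructions(fake_originals, fake_reconstructions)
```

In an interactive session, calling `plt.show()` afterwards displays the figure.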

  2. Deep CNN Autoencoder:

As the inputs are images, it makes more sense to use a convolutional network. The encoder consists of a stack of Conv2D and max-pooling layers, whereas the decoder consists of a stack of Conv2D and upsampling layers.
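
To see what the pooling and upsampling layers do to the image size, here is a small NumPy sketch of 2×2 max pooling and nearest-neighbour upsampling on a single 28×28 array. The helper names `max_pool_2x2` and `upsample_2x` are hypothetical, and the sketch ignores channels and convolutions; it only illustrates the 28 → 14 → 7 → 14 → 28 shape changes.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on an (H, W) array with even H and W."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample_2x(x):
    """Nearest-neighbour upsampling: repeat every row and every column twice."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

image = np.arange(28 * 28, dtype=float).reshape(28, 28)

pooled_once = max_pool_2x2(image)                    # encoder, first pooling
pooled_twice = max_pool_2x2(pooled_once)             # encoder, second pooling
restored = upsample_2x(upsample_2x(pooled_twice))    # decoder, two upsamplings

print(pooled_once.shape, pooled_twice.shape, restored.shape)
```

The restored array has the original shape but not the original values, which is why the decoder also needs trainable Conv2D layers to fill in the detail.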

 from keras.models import Sequential

 model = Sequential()

 # encoder network
 model.add(Conv2D(30, 3, activation='relu', padding='same', input_shape=(28, 28, 1)))
 model.add(MaxPooling2D(2, padding='same'))
 model.add(Conv2D(15, 3, activation='relu', padding='same'))
 model.add(MaxPooling2D(2, padding='same'))
 # decoder network
 model.add(Conv2D(15, 3, activation='relu', padding='same'))
 model.add(UpSampling2D(2))
 model.add(Conv2D(30, 3, activation='relu', padding='same'))
 model.add(UpSampling2D(2))
 model.add(Conv2D(1, 3, activation='sigmoid', padding='same'))  # output layer
 model.compile(optimizer='adam', loss='binary_crossentropy')
 model.summary()
 Output:
 Model: "sequential"
 _________________________________________________________________
 Layer (type)                 Output Shape              Param #   
 =================================================================
 conv2d_17 (Conv2D)           (None, 28, 28, 30)        300       
 _________________________________________________________________
 max_pooling2d_7 (MaxPooling2 (None, 14, 14, 30)        0         
 _________________________________________________________________
 conv2d_18 (Conv2D)           (None, 14, 14, 15)        4065      
 _________________________________________________________________
 max_pooling2d_8 (MaxPooling2 (None, 7, 7, 15)          0         
 _________________________________________________________________
 conv2d_19 (Conv2D)           (None, 7, 7, 15)          2040      
 _________________________________________________________________
 up_sampling2d_7 (UpSampling2 (None, 14, 14, 15)        0         
 _________________________________________________________________
 conv2d_20 (Conv2D)           (None, 14, 14, 30)        4080      
 _________________________________________________________________
 up_sampling2d_8 (UpSampling2 (None, 28, 28, 30)        0         
 _________________________________________________________________
 conv2d_21 (Conv2D)           (None, 28, 28, 1)         271       
 =================================================================
 Total params: 10,756
 Trainable params: 10,756
 Non-trainable params: 0
 _________________________________________________________________ 
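
The parameter counts in the summary can be checked by hand. The sketch below assumes the standard count for a convolutional layer with bias, kernel height × kernel width × input channels × filters plus one bias per filter; `conv2d_params` is a hypothetical helper written for this check.

```python
def conv2d_params(kernel_size, in_channels, filters):
    """Weights (k * k * in_channels per filter) plus one bias per filter."""
    return kernel_size * kernel_size * in_channels * filters + filters

# (in_channels, filters) for the five 3x3 Conv2D layers, in model order
layers = [(1, 30), (30, 15), (15, 15), (15, 30), (30, 1)]

counts = [conv2d_params(3, c_in, f) for c_in, f in layers]
total = sum(counts)

print(counts)   # [300, 4065, 2040, 4080, 271]
print(total)    # 10756
```

The pooling and upsampling layers contribute no parameters, so the five convolutional layers account for the full total of 10,756.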

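Reshape the images to 28×28×1, scale the values to between 0 and 1, and fit the model.

 # reload MNIST, add a channel axis and scale pixel values to [0, 1]
 (x_train, _), (x_test, _) = mnist.load_data()
 x_train = x_train.astype('float32').reshape(-1, 28, 28, 1) / 255.
 x_test = x_test.astype('float32').reshape(-1, 28, 28, 1) / 255.
 model.fit(x_train, x_train,
                 epochs=15,
                 batch_size=128,
                 validation_data=(x_test, x_test))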

Here are the input images and the decoded images produced by the CNN-based autoencoder.

  3. Denoising autoencoder:

Let’s check whether the autoencoder can deal with noise in images. Noise here means things like blurry images, white marks on the images, changes in the color of the images, etc.

Now we introduce some noise into the original digits and then try to recover the clean images as closely as possible.

Introduce noise as below:

 noise_factor = 0.7
 x_train_noisy = x_train + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=x_train.shape) 
 x_test_noisy = x_test + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=x_test.shape) 
 x_train_noisy = np.clip(x_train_noisy, 0., 1.)
 x_test_noisy = np.clip(x_test_noisy, 0., 1.) 

Here are some examples of noisy images:

 plt.figure(figsize=(20, 2))
 for i in range(1, 5 + 1):
     ax = plt.subplot(1, 5, i)
     plt.imshow(x_test_noisy[i].reshape(28, 28))
     plt.gray()
     ax.get_xaxis().set_visible(False)
     ax.get_yaxis().set_visible(False)
 plt.show() 

You can see that the digits are barely identifiable; this much noise is introduced intentionally to check how far the autoencoder can recover the image.

Modify the layers of the model defined above, for example by increasing the number of filters, so that the model performs as well as possible, then fit the model. The noisy images are the input and the clean images are the target.

 model.fit(x_train_noisy, x_train,
                 epochs=15,
                 batch_size=128,
                 validation_data=(x_test_noisy, x_test))
 pred = model.predict(x_test_noisy) 

Plot the noisy inputs and their reconstructions:

 plt.figure(figsize=(20, 4))
 for i in range(5):
     # Display noisy input
     ax = plt.subplot(2, 5, i + 1)
     plt.imshow(x_test_noisy[i].reshape(28, 28))
     plt.gray()
     ax.get_xaxis().set_visible(False)
     ax.get_yaxis().set_visible(False)
     # Display reconstruction
     ax = plt.subplot(2, 5, i + 1 + 5)
     plt.imshow(pred[i].reshape(28, 28))
     plt.gray()
     ax.get_xaxis().set_visible(False)
     ax.get_yaxis().set_visible(False)
 plt.show() 

Endnotes:

We have seen the structure of autoencoders and implemented some basic ones in practice. Autoencoders have a wide range of applications, such as dimensionality reduction, image compression, recommendation systems and so on. Here we trained our models for only a few epochs; training for more epochs or increasing the size of the network may improve performance.

Vijaysinh Lendave

Vijaysinh is an enthusiast in machine learning and deep learning. He is skilled in ML algorithms, data manipulation, handling and visualization, model building.