Generative adversarial networks (GANs) are an effective way to generate data for computer vision applications, and they can also be used to create synthetic datasets that meet a project's data requirements. A typical GAN architecture trains two models: a generator and a discriminator. The generator outputs a synthetic image, and the discriminator checks whether an image is real or fake; the discriminator's output is then used to train the generator. The two models are trained in an adversarial manner until they reach an equilibrium.
If you are new to GANs, I recommend this article for a proper understanding of them.
GANs can produce sharp, crisp images, but training becomes unstable when it comes to large, high-resolution images. Generating high-resolution images is challenging because the generator must learn both the overall structure and the fine details of the images. At high resolution, flaws in the generated images are easy for the discriminator to spot, so the whole training process can fail.
The solution to this problem is to train the network by progressively adding layers. Progressive GAN is an extension of the standard GAN that keeps the generator stable while it learns to produce large images. Training starts with very small images, such as 4 x 4 pixels, and successively adds blocks of layers that double the image size to 8 x 8, 16 x 16, and so on until the desired size is reached, as shown in the picture below. This makes Progressive GAN capable of generating high-resolution images such as 1024 x 1024.
As we can see from the above image, new convolutional layers are added to both the generator and the discriminator during training. This lets the model learn the coarse structure of the images first and then progressively finer pixel-level detail.
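The growth schedule described above can be sketched in a few lines of plain Python. This is only an illustration of the doubling pattern; the function name `resolution_schedule` and its parameters are hypothetical and are not part of any Progressive GAN library.

```python
def resolution_schedule(start=4, target=1024):
    """Return the image sizes visited when doubling from start to target.

    Hypothetical helper for illustration only.
    """
    if start <= 0 or target < start:
        raise ValueError("need 0 < start <= target")
    sizes = [start]
    while sizes[-1] < target:
        # Each new block of layers doubles the image width and height.
        sizes.append(sizes[-1] * 2)
    if sizes[-1] != target:
        raise ValueError("target must be start times a power of two")
    return sizes


schedule = resolution_schedule(4, 1024)
print(schedule)  # [4, 8, 16, 32, 64, 128, 256, 512, 1024]
print(len(schedule) - 1, "growth steps")
```

Going from 4 x 4 to 1024 x 1024 therefore takes eight growth steps, each of which adds layers to both the generator and the discriminator.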
In this article, we will use a pretrained Progressive GAN model with TensorFlow and see how it can be used to generate artificial faces.
Implementation of Progressive GAN:
The demonstration below uses a GAN-based TensorFlow model that maps an N-dimensional latent space vector to an RGB image.
The code shows two things: how to map a latent space vector to an image, and how to use gradient descent to find a latent vector that generates a given target image.
Here the latent space is simply a compressed representation of the data in which similar data points lie closer to each other. This is important because the GAN model takes a point from the latent space and generates an image based on it.
The code below is part of the official implementation of Progressive GAN in TensorFlow.
Install & Import all dependencies:
# imageio for creating animation
!pip -q install imageio
!pip -q install scikit-image
!pip install git+https://github.com/tensorflow/docs

from absl import logging
import imageio
import PIL.Image
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
tf.random.set_seed(42)
import tensorflow_hub as hub
from tensorflow_docs.vis import embed
import time

# files.upload() is only available when running in Google Colab.
try:
    from google.colab import files
except ImportError:
    pass

from IPython import display
from skimage import transform
The model uses a latent dimension of 512. In this section, we define helper functions to display an image, to animate a sequence of images so that we can see the changes, and to interpolate between two latent vectors.
latent_dim = 512

# Interpolate between two vectors while keeping the norm of v1 at every step.
def interpolating_vectors(v1, v2, num_steps):
    v1_n = tf.norm(v1)
    v2_n = tf.norm(v2)
    # Rescale v2 so that it has the same length as v1.
    v2_normal = v2 * (v1_n / v2_n)
    vect = []
    for step in range(num_steps):
        interpol = v1 + (v2_normal - v1) * step / (num_steps - 1)
        interpol_norm = tf.norm(interpol)
        interpol_normal = interpol * (v1_n / interpol_norm)
        vect.append(interpol_normal)
    return tf.stack(vect)

# Convert a float image to uint8 and return it as a PIL image.
def image_display(img):
    img = tf.constant(img)
    img = tf.image.convert_image_dtype(img, dtype=tf.uint8)
    return PIL.Image.fromarray(img.numpy())

# Save a sequence of images as a GIF and embed it in the notebook.
def animation(img):
    img = np.array(img)
    converted_images = np.clip(img * 255, 0, 255).astype(np.uint8)
    imageio.mimsave('./animation.gif', converted_images)
    return embed.embed_file('./animation.gif')
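To see what `interpolating_vectors` does without loading TensorFlow, here is a minimal pure-Python sketch of the same arithmetic on short lists. The names `norm` and `interpolate_on_sphere` are hypothetical and used only for illustration; the sketch assumes `num_steps` is at least 2 and that the two vectors do not point in exactly opposite directions.

```python
import math
import random


def norm(v):
    """Euclidean length of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def interpolate_on_sphere(v1, v2, num_steps):
    """Pure-Python counterpart of interpolating_vectors (illustration only)."""
    v1_n = norm(v1)
    # Rescale v2 so that both endpoints have the same length.
    v2_normal = [x * (v1_n / norm(v2)) for x in v2]
    vectors = []
    for step in range(num_steps):
        t = step / (num_steps - 1)
        # Straight-line interpolation between the two endpoints...
        interpol = [a + (b - a) * t for a, b in zip(v1, v2_normal)]
        # ...then rescaled back to the length of v1.
        scale = v1_n / norm(interpol)
        vectors.append([x * scale for x in interpol])
    return vectors


rng = random.Random(0)
v1 = [rng.gauss(0, 1) for _ in range(8)]
v2 = [rng.gauss(0, 1) for _ in range(8)]
path = interpolate_on_sphere(v1, v2, 5)
# Every interpolated vector has the same length as v1.
print([round(norm(v), 6) for v in path])
```

Rescaling each step keeps every intermediate vector at the same distance from the origin, so the interpolation does not pass through short vectors close to the origin, which are unlike the random normal vectors the generator is sampled with elsewhere in this article.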
Latent space interpolation:
Here we interpolate between two random vectors. To turn them into images, we use a TF-Hub module that provides a pretrained Progressive GAN model.
# Load the pretrained Progressive GAN (128 x 128 output) from TF-Hub.
progan = hub.load('https://tfhub.dev/google/progan-128/1').signatures['default']

def interpolating_between_vect():
    v1 = tf.random.normal([latent_dim])
    v2 = tf.random.normal([latent_dim])
    # 150 interpolation steps between v1 and v2.
    vect = interpolating_vectors(v1, v2, 150)
    # Generate one image per interpolated latent vector.
    interpolated_images = progan(vect)['default']
    return interpolated_images

interpolated_images = interpolating_between_vect()
animation(interpolated_images)
Below are the images generated by Progressive GAN from the randomly generated vectors:
Finding the closest vector in latent space:
Here we try to reproduce a target image from a latent space vector. The target can be an image generated by the module itself, or we can upload our own image by changing inside_image to False.
inside_image = True  # set to False to upload your own image

def from_module_space():
    vector = tf.random.normal([1, 512])
    # The module returns a batch; take the first (and only) image.
    images = progan(vector)['default'][0]
    return images

def upload_image():
    uploaded = files.upload()
    # Read the first uploaded file and resize it to the model's output size.
    img = imageio.imread(uploaded[list(uploaded.keys())[0]])
    return transform.resize(img, [128, 128])

if inside_image:
    target_image = from_module_space()
else:
    target_image = upload_image()

image_display(target_image)
As I have chosen to use the image from the module, here is our target image:
Below we generate the starting image from which the optimization tries to converge to the target image. We need to define a loss function between the target image and the image generated from the latent vector. Then, we can use gradient descent to find the latent vector that minimizes the loss.
tf.random.set_seed(4)
initial_vector = tf.random.normal([1, latent_dim])
# Take the first image of the batch before displaying it.
image_display(progan(initial_vector)['default'][0])
Below is our starting image:
Now we find the closest latent point for the target image:
def closest_vector(initial_vector, num_optimization_, steps_per_img):
    images = []
    losses = []
    vector = tf.Variable(initial_vector)
    optimizer = tf.optimizers.Adam(learning_rate=0.01)
    loss_fn = tf.losses.MeanAbsoluteError(reduction="sum")

    for step in range(num_optimization_):
        # Print one dot per step, with a line break every 100 steps.
        if (step % 100) == 0:
            print()
        print('.', end='')
        with tf.GradientTape() as tape:
            image = progan(vector.read_value())['default'][0]
            # Keep a snapshot every steps_per_img steps for the animation.
            if (step % steps_per_img) == 0:
                images.append(image.numpy())
            target_image_difference = loss_fn(image, target_image[:, :, :3])
            # Keep the vector length close to sqrt(latent_dim).
            regularizer = tf.abs(tf.norm(vector) - np.sqrt(latent_dim))
            loss = target_image_difference + regularizer
            losses.append(loss.numpy())
        grads = tape.gradient(loss, [vector])
        optimizer.apply_gradients(zip(grads, [vector]))
    return images, losses

num_optimization_ = 200
steps_per_img = 5
images, loss = closest_vector(initial_vector, num_optimization_, steps_per_img)
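The loss used above has two terms: the summed absolute difference between the generated and target pixels, and a regularizer that keeps the length of the latent vector close to sqrt(latent_dim). The sketch below re-implements both terms in plain Python on lists, purely for illustration; `sum_abs_error` and `norm_regularizer` are hypothetical names. It also shows the likely motivation for the regularizer, which is my reading rather than something stated in the code: vectors sampled from a standard normal distribution in 512 dimensions have a length close to sqrt(512), so the penalty is small for the kind of vectors the generator is normally fed.

```python
import math
import random


def sum_abs_error(generated, target):
    """Sum of absolute differences between two flattened images (lists)."""
    return sum(abs(g - t) for g, t in zip(generated, target))


def norm_regularizer(vector, latent_dim):
    """Distance between the vector's length and sqrt(latent_dim)."""
    length = math.sqrt(sum(x * x for x in vector))
    return abs(length - math.sqrt(latent_dim))


latent_dim = 512
rng = random.Random(0)
# 200 latent vectors drawn from a standard normal distribution.
samples = [[rng.gauss(0, 1) for _ in range(latent_dim)] for _ in range(200)]
mean_penalty = sum(norm_regularizer(z, latent_dim) for z in samples) / len(samples)

print(round(math.sqrt(latent_dim), 2))  # 22.63
print(mean_penalty)  # small compared with the vector length itself
```

A vector that drifts far from this typical length during optimization would be penalized, which nudges the search toward latent points that resemble the ones the model saw during training.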
Check out the animated image, which shows how the model converges to our target image, and compare the result side by side: the generated image is on the left and the target image is on the right.
image_display(np.concatenate([images[-1], target_image], axis=1))
In this article, we have seen how standard GAN architectures struggle with high-resolution images. Progressive GAN addresses this problem: progressively adding layers helps the generator remain stable throughout training and generate a reasonable image.