Following a similar effort from MIT, researchers at NVIDIA have recently developed a new augmentation method for training Generative Adversarial Networks (GANs) with a limited amount of data. The approach, an adaptive discriminator augmentation mechanism, significantly stabilises training in limited-data regimes.
Machine learning models are data-hungry. Over the past few years, models fed with large volumes of data have produced outstanding predictive results.
Alongside this growth, Generative Adversarial Networks have been used successfully for applications including high-fidelity natural image synthesis, data augmentation and image compression. From generating realistic facial expressions to exploring deep space, and from bridging the gap between humans and machines to introducing new and unique art forms, GANs have it all covered.
The Need For Less Data
Although deep neural network models, including GANs, have shown impressive results, collecting a large, application-specific dataset remains a challenge. In the case of GANs, the challenge is to collect a large enough set of images for a specific application, which places constraints on subject type, image quality, geographical location, time period, privacy and copyright status, among others.
Further, a key issue with small datasets is that the discriminator overfits to the training examples; its feedback to the generator then becomes meaningless and training starts to diverge.
According to the researchers, dataset augmentation is the standard solution against overfitting in almost all areas of deep learning. A GAN trained under similar dataset augmentations, however, learns to generate the augmented distribution; that is, the augmentations "leak" into the generated images, which is usually considered highly undesirable.
This motivated the new approach, which requires no changes to loss functions or network architectures and applies both when training from scratch and when fine-tuning an existing GAN on another dataset. The researchers also demonstrated how to use a wide range of augmentations to prevent the discriminator from overfitting while ensuring that none of the augmentations leak into the generated images.
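To make the idea concrete, here is a minimal NumPy sketch of stochastic discriminator augmentation. It is an illustration under stated assumptions, not the researchers' implementation: the function names `augment_batch` and `discriminator_inputs`, the choice of flips and 90-degree rotations, and the square-image requirement are all hypothetical. What it aims to show is that every image the discriminator sees, real or generated, passes through the same augmentations, each applied with probability `p`.

```python
import numpy as np

def augment_batch(images, p, rng):
    """Apply each augmentation independently with probability p per image.

    images: float array of shape (N, H, W, C); assumes H == W so that
    rotated images keep their shape (an assumption of this sketch).
    """
    out = images.copy()
    n = out.shape[0]

    # Horizontal flip on a random subset of the batch.
    flip = rng.random(n) < p
    out[flip] = out[flip][:, :, ::-1, :]

    # Rotation by 90, 180 or 270 degrees on another random subset.
    rotate = rng.random(n) < p
    for i in np.flatnonzero(rotate):
        k = int(rng.integers(1, 4))
        out[i] = np.rot90(out[i], k=k, axes=(0, 1)).copy()
    return out

def discriminator_inputs(reals, fakes, p, rng):
    # Real and generated images go through the same augmentation pipeline
    # before the discriminator evaluates them.
    return augment_batch(reals, p, rng), augment_batch(fakes, p, rng)
```

With `p = 0` the discriminator sees unmodified images; raising `p` makes the augmentations more frequent, which is the "strength" knob the researchers go on to study.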
Behind the Model
The researchers call this approach Adaptive Discriminator Augmentation (ADA). They tested it against a number of alternatives by artificially subsetting larger datasets such as FFHQ and LSUN CAT, to study how the quantity of available training data affects GAN training.
The popular StyleGAN2 and BigGAN architectures were considered as baselines; the researchers chose StyleGAN2 because it provided more predictable results with significantly lower variance between training runs. The adaptive augmentation was also compared against a wider set of alternatives, such as PA-GAN, WGAN-GP, zCR, auxiliary rotations and spectral normalisation.
During the process, the researchers studied the effectiveness of stochastic discriminator augmentation by performing exhaustive sweeps for different augmentation categories and dataset sizes. They observed that the optimal augmentation strength depends heavily on the amount of training data, and not all augmentation categories are equally useful in practice.
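Because the best strength varies with dataset size, a fixed augmentation probability would need re-tuning for every dataset; the adaptive variant adjusts it during training instead. The article does not describe the controller, so the following is only a sketch under assumptions: the overfitting indicator (the mean sign of the discriminator's outputs on real images), the target value of 0.6 and the fixed step size are my reading of the ADA paper and should be checked against it, and `update_p` is a hypothetical name.

```python
import numpy as np

def update_p(p, real_logits, target=0.6, step=0.01):
    """Nudge the augmentation probability p toward a target overfitting level.

    real_logits: discriminator outputs on real training images.
    target and step are assumed values for this sketch.
    """
    # Overfitting indicator: near 0 when the discriminator is unsure about
    # real images, near 1 when it is confident on almost all of them.
    r = float(np.mean(np.sign(real_logits)))
    # Too much overfitting -> augment more; too little -> augment less.
    p = p + step * np.sign(r - target)
    return float(np.clip(p, 0.0, 1.0))
```

Called every few minibatches, such a rule would raise `p` on small datasets where the discriminator overfits quickly and keep it low when data is plentiful.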
The results on the FFHQ and LSUN CAT datasets across training set sizes demonstrated that adaptive discriminator augmentation (ADA) substantially improves FID scores in limited-data scenarios. Also, with ADA the augmentations do not leak, and thus the same diverse set of augmentations can be safely used on all datasets.
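For readers unfamiliar with the metric, FID (Fréchet Inception Distance) compares the statistics of feature embeddings of real and generated images; lower is better. The sketch below computes the Fréchet distance between two sets of feature vectors with NumPy. It is illustrative only: in practice the features come from a pretrained Inception network, which is omitted here, and the function name `frechet_distance` is hypothetical.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape (num_samples, dim) with dim >= 2.
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)

    # Tr((cov_a cov_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative for
    # covariance matrices (clipping guards against rounding error).
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    mean_term = np.sum((mu_a - mu_b) ** 2)
    return float(mean_term + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)
```

Two identical feature sets give a distance of approximately zero, and the value grows as the generated distribution drifts away from the real one.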
Thus, the researchers showed that the adaptive discriminator augmentation reliably stabilises training and vastly improves the resulting quality when training data is in short supply. The contributions in this work make it easier to train high-quality generative models with custom sets of images.
They stated, “Of course, augmentation is not a substitute for real data— one should always try to collect a large, high-quality set of training data first, and only then fill the gaps using augmentation.” They added, “As future work, it would be worthwhile to search for the most effective set of augmentations and to see if recently published techniques, such as the U-net discriminator or multi-modal generator could also help with limited data.”
For more details, read the paper here.