The guiding principle of generative models is the ability to construct a convincing example of the data they are fed. The more realistic the generated image, the stronger the evidence that the model has grasped the objective.

Generative models offer an appealing alternative to self-supervised tasks in that they are trained to model the full data distribution without requiring any modification of the original data.

The generations-old dream of finding actionable insights from raw data alone has hardly been realised yet. So far, self-supervision has dominated representation learning in spite of the success of GANs.

The simplest objective for unsupervised learning is to train an algorithm to generate its own instances of data. Such generative models should not simply reproduce the data they are trained on, but rather build a model of the underlying class from which that data was drawn. For example, they should capture not a particular photograph of a horse or a rainbow, but the set of all photographs of horses and rainbows; not a specific utterance from a specific speaker, but the general distribution of spoken utterances.

Bi-directional Generative Adversarial Networks (BiGANs) were introduced a couple of years ago to learn the inverse mapping from data back to the latent space. They demonstrated that the resulting learned feature representation is useful for auxiliary supervised discrimination tasks, and competitive with contemporary approaches to unsupervised and self-supervised feature learning.

Intuitively, models trained to predict such semantic latent representations given data may serve as useful feature representations for auxiliary problems where semantics are relevant.

Now researchers have introduced BigBiGAN, which is built upon the state-of-the-art BigGAN model, extending it to representation learning by adding an encoder and modifying the discriminator.

What BigBiGAN Does To Visual Models
The overall framework of BigBiGAN remains the same as that of
BiGAN. The main change is an improved discriminator structure, which the researchers found leads to better representation learning results without compromising generation.

The above figure shows the structure of the BigBiGAN framework, in which a joint discriminator D is used to compute the loss. Its inputs are data-latent pairs, either (x ∼ Px, ẑ ∼ E(x)), sampled from the data distribution Px and the encoder E outputs, or (x̂ ∼ G(z), z ∼ Pz), sampled from the generator G outputs and the latent distribution Pz. The loss includes the unary data term Sx and the unary latent term Sz, as well as the joint term Sxz, which ties the data and latent distributions together.

BigBiGAN is trained on unlabeled ImageNet. Its learned representation is then frozen, and a linear classifier is trained on its outputs, fully supervised using all of the training set labels.

In the above figure, the top row shows real data; the bottom row shows the generated reconstructions of the images above them.
Unlike most explicit reconstruction costs (e.g., pixel-wise), the reconstruction cost implicitly minimized by a (Big)BiGAN tends to emphasize more semantic, high-level details.

The extent to which these reconstructions retain the high-level semantics of the inputs rather than the low-level details suggests that BigBiGAN training encourages the encoder to model the former more than the latter.

According to the authors, the following are the key objectives behind this work:

	Show that BigBiGAN (BiGAN with a BigGAN generator) matches the state of the art in unsupervised representation learning on ImageNet.
	Propose a more stable version of the joint discriminator for BigBiGAN.
	Perform a thorough empirical analysis and ablation study of model design choices.
	Show that the representation learning objective also helps unconditional image generation.

AI With Intuition: The Final Frontier
The ability to learn about the world without explicit supervision is fundamental to what is regarded as intelligence, and the above results resonate with popular intuitions about the human mind.

Though the reconstructions from the generative model are far from pixel-perfect, they may still provide some intuition for what features the encoder learns to model.

For example, when the input image contains a dog, a person, or a food item, the reconstruction is often a different instance of the same “category” with similar pose, position, and texture, such as a similar species of dog facing the same direction.

In this work, the researchers introduce BigBiGAN and show that progress in image generation quality translates to substantially improved representation learning performance. This is a new perspective considering how ambiguous the inner workings of deep networks are.

Read the full work here.
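
For readers who want a feel for how the unary terms Sx and Sz and the joint term Sxz described earlier could combine into a loss, here is a minimal sketch in plain Python. The article does not spell out the exact formula, so the hinge form used here is an assumption, as are the function names and the toy score values; the real model computes these scores with deep networks rather than hand-written numbers.

```python
def hinge(t):
    """Hinge function h(t) = max(0, 1 - t) (assumed loss shape)."""
    return max(0.0, 1.0 - t)


def discriminator_loss(s_x, s_z, s_xz, y):
    """Per-pair discriminator loss: each of the three scores gets its own hinge.

    y is +1 for an encoder pair (x, E(x)) and -1 for a generator pair (G(z), z).
    """
    return hinge(y * s_x) + hinge(y * s_z) + hinge(y * s_xz)


def encoder_generator_loss(s_x, s_z, s_xz, y):
    """Per-pair loss for the encoder and generator: signed sum of the scores."""
    return y * (s_x + s_z + s_xz)


def batch_mean(loss_fn, scores, y):
    """Average a per-pair loss over a batch of (s_x, s_z, s_xz) triples."""
    return sum(loss_fn(s_x, s_z, s_xz, y) for s_x, s_z, s_xz in scores) / len(scores)


# Hypothetical discriminator scores (s_x, s_z, s_xz), made up for illustration.
encoder_pairs = [(2.0, 1.5, 0.5), (0.0, 3.0, -1.0)]        # (x, E(x)) pairs
generator_pairs = [(-2.0, -0.5, 1.0), (-1.0, -1.0, -1.0)]  # (G(z), z) pairs

# The discriminator is trained to minimize d_loss, while the encoder and
# generator are trained to minimize eg_loss.
d_loss = (batch_mean(discriminator_loss, encoder_pairs, +1)
          + batch_mean(discriminator_loss, generator_pairs, -1))
eg_loss = (batch_mean(encoder_generator_loss, encoder_pairs, +1)
           + batch_mean(encoder_generator_loss, generator_pairs, -1))

print("discriminator loss:", d_loss)
print("encoder/generator loss:", eg_loss)
```

In this sketch the discriminator is pushed to score encoder pairs high and generator pairs low on all three terms, while the encoder and generator are pushed the opposite way. That opposition is what ties the data and latent distributions together through the joint term.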