Self-supervised learning (SSL) has become a handy technique in computer vision tasks. Thanks to significant advances, SSL methods can now learn representations that are invariant to distortions of the input samples. These distortions, also referred to as ‘data augmentations’, are handled by maximising the similarity of representations extracted from different distorted versions of a sample.
However, there is a slight hitch. This approach admits trivial solutions in which the network outputs the same constant representation for every input, a failure known as collapse. Currently, most methods avoid such collapsed solutions through careful implementation details. To address this, Yann LeCun and his team have introduced Barlow Twins, an objective function that avoids collapse by measuring the cross-correlation matrix between the outputs of two identical networks fed with distorted versions of a sample. The aim is to make this matrix as close as possible to the identity matrix, so that the representations of the two distorted versions are similar while the redundancy between the components of these vectors is minimised.
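To make the collapse problem concrete, here is a minimal pure-Python sketch. It assumes a plain cosine-similarity objective (not the loss of any particular published method) and a hypothetical `collapsed_encoder` that maps every input to the same vector. Such an encoder achieves the best possible score on a similarity-only objective while encoding nothing about the image.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def naive_similarity_loss(encoder, view_pairs):
    """Average of (1 - cosine similarity) over pairs of distorted views.

    Maximising agreement between the two views is the same as
    minimising this loss.
    """
    losses = [1.0 - cosine(encoder(a), encoder(b)) for a, b in view_pairs]
    return sum(losses) / len(losses)

def collapsed_encoder(x):
    # Hypothetical degenerate encoder: ignores its input entirely.
    return [3.0, 4.0, 0.0]

# Two distorted views each of two different toy 3-d "images".
pairs = [([0.9, 0.1, 0.0], [1.1, -0.1, 0.1]),
         ([0.0, 2.0, -1.0], [0.2, 1.8, -1.2])]

print(naive_similarity_loss(collapsed_encoder, pairs))  # 0.0, the best possible score
print(naive_similarity_loss(lambda x: x, pairs))        # positive for an informative map
```

Nothing in a similarity-only objective penalises the constant output, which is why extra machinery (negatives, asymmetry, or a redundancy term) is needed.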
Trivial Representations
Self-supervised learning has emerged as a promising answer to deep learning systems’ heavy dependence on labelled data. LeCun, referred to as one of the Godfathers of deep learning and widely credited with pioneering convolutional neural networks, first gave a glimpse of self-supervised learning in 2018 during his keynote speech at the AAAI conference.
Self-supervised learning helps in creating data-efficient artificial systems. This method learns valuable representations of the input data without relying on human annotations. Self-supervised learning has major applications in the field of natural language processing.
As discussed, maximising the similarity between augmented views also admits trivial, collapsed representations. In the past, there have been several attempts to overcome this problem, including:
- Contrastive methods define positive and negative sample pairs that are treated differently in the loss function.
- Clustering methods use one distorted version of a sample to compute ‘targets’ for the loss, and another distorted version of the same sample to predict these targets. This is combined with an alternating optimisation scheme such as k-means.
- BYOL and SimSiam are among the more recent methods. Both introduce asymmetry by modifying the network architecture and the parameter updates.
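The contrastive idea from the list above can be sketched as follows. This is an InfoNCE-style loss, one common contrastive formulation chosen here as an assumption; the article does not say which loss the contrastive methods use, and the temperature value is purely illustrative. The other images in the batch act as negatives, so a collapsed batch cannot score well.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def info_nce_loss(z_a, z_b, temperature=0.5):
    """InfoNCE-style contrastive loss over a batch of paired embeddings.

    z_a[i] and z_b[i] embed two distorted views of image i (the positive
    pair); z_b[j] for j != i serve as negatives for z_a[i].
    Embeddings are assumed to be L2-normalised already.
    """
    n = len(z_a)
    total = 0.0
    for i in range(n):
        logits = [dot(z_a[i], z_b[j]) / temperature for j in range(n)]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]  # -log softmax prob of the positive
    return total / n

matched = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
collapsed = [[1.0, 0.0], [1.0, 0.0]]

print(info_nce_loss(matched, matched))      # low: each positive beats its negative
print(info_nce_loss(collapsed, collapsed))  # about 0.693, i.e. log(2)
print(info_nce_loss(matched, swapped))      # high: positives and negatives confused
```

Because every extra negative is another image in the batch, formulations of this kind usually benefit from large batches.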
Barlow Twins
LeCun and his team have now proposed a new method called Barlow Twins. It is named after neuroscientist H. Barlow and draws heavily on his influential 1961 article, ‘Possible Principles Underlying the Transformation of Sensory Messages’, which argues that the goal of sensory processing is to recode highly redundant sensory input into a factorial code, that is, a code with statistically independent components.
The Barlow Twins method applies Barlow’s redundancy-reduction principle to self-supervised learning. In neuroscience, this principle has proved successful in explaining the organisation of the visual system, and it has inspired several algorithms for supervised and unsupervised learning.
Barlow Twins is conceptually simple, easy to implement and learns useful representations. The researchers propose an objective function that drives the cross-correlation matrix computed from the twin representations as close as possible to the identity matrix. Barlow Twins also benefits from very high-dimensional representations.
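An objective of this form can be sketched in pure Python. This sketch assumes the commonly described formulation: standardise each embedding dimension over the batch, compute the cross-correlation matrix C between the two views, then compute L = Σ_i (1 − C_ii)² + λ Σ_i Σ_{j≠i} C_ij². The function names and the default λ = 0.005 are assumptions for illustration, not the authors’ code; a real implementation would use a tensor library.

```python
import math

def standardise_columns(z, eps=1e-8):
    """Centre each embedding dimension on its batch mean and scale to unit std."""
    n, d = len(z), len(z[0])
    out = [[0.0] * d for _ in range(n)]
    for k in range(d):
        col = [z[b][k] for b in range(n)]
        mean = sum(col) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        for b in range(n):
            out[b][k] = (col[b] - mean) / (std + eps)
    return out

def cross_correlation(z_a, z_b):
    """C[i][j]: correlation between dim i of z_a and dim j of z_b over the batch."""
    a, b = standardise_columns(z_a), standardise_columns(z_b)
    n, d = len(a), len(a[0])
    return [[sum(a[s][i] * b[s][j] for s in range(n)) / n for j in range(d)]
            for i in range(d)]

def barlow_twins_loss(z_a, z_b, lambd=5e-3):
    c = cross_correlation(z_a, z_b)
    d = len(c)
    # Invariance term: pull diagonal entries towards 1 (the two views agree).
    invariance = sum((1.0 - c[i][i]) ** 2 for i in range(d))
    # Redundancy term: push off-diagonal entries towards 0 (decorrelated dims).
    redundancy = sum(c[i][j] ** 2 for i in range(d) for j in range(d) if i != j)
    return invariance + lambd * redundancy

decorrelated = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
redundant = [[1.0, 1.0], [-1.0, -1.0], [1.0, 1.0], [-1.0, -1.0]]
collapsed = [[1.0, 2.0]] * 4

print(barlow_twins_loss(decorrelated, decorrelated))  # close to 0
print(barlow_twins_loss(redundant, redundant))        # about 0.01: redundancy penalised
print(barlow_twins_loss(collapsed, collapsed))        # 2.0: every diagonal entry is 0
```

Collapsed embeddings have zero variance, so every entry of C is 0 and the loss equals the embedding dimension. The objective therefore penalises collapse directly rather than relying on negatives or architectural asymmetry.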
Barlow Twins operates on a joint embedding of distorted images. For each batch of images sampled from the dataset, it produces two distorted views of every image, using distortions drawn from a distribution of data augmentations. The two batches of distorted views are then fed to a deep network with trainable parameters, producing two batches of representations.
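The twin pipeline described above can be sketched with toy stand-ins. Both the `distort` augmentation (a random brightness scale plus Gaussian noise) and the single linear layer used as the “network” are hypothetical placeholders, not the augmentations or architecture from the paper. The key point is that both branches share the same function and weights.

```python
import random

def distort(image, rng):
    """Hypothetical augmentation: random brightness scale plus small noise."""
    scale = rng.uniform(0.8, 1.2)
    return [scale * p + rng.gauss(0.0, 0.05) for p in image]

def make_network(in_dim, out_dim, rng):
    """Hypothetical stand-in for the deep network: one linear layer whose
    weights are shared by both branches."""
    weights = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]
    def f(x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return f

def twin_forward(batch, f, rng):
    """Distort every image twice and embed both views with the same network f."""
    views_a = [distort(img, rng) for img in batch]
    views_b = [distort(img, rng) for img in batch]
    z_a = [f(v) for v in views_a]
    z_b = [f(v) for v in views_b]
    return z_a, z_b

rng = random.Random(0)
batch = [[rng.random() for _ in range(8)] for _ in range(4)]  # 4 toy "images"
f = make_network(8, 16, rng)
z_a, z_b = twin_forward(batch, f, rng)
print(len(z_a), len(z_a[0]))  # 4 16
```

The two resulting batches, z_a and z_b, are what a cross-correlation objective would then compare dimension by dimension.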
Among its advantages, the method does not require large batches, asymmetric mechanisms such as predictor networks, momentum encoders or stop-gradients, or non-differentiable operators.
Wrapping Up
Barlow Twins outperforms previous methods for semi-supervised ImageNet classification in the low-data regime, with the added advantages of being simpler and avoiding trivial representations. It is also on par with the current state of the art for ImageNet classification with a linear classifier head, and for transfer tasks in classification and object detection. The researchers believe that further refinement of the algorithm could lead to even more effective solutions.
Read the full paper here.