A new study on unsupervised deep learning indicates that the brain disentangles faces into semantically meaningful factors, such as age, at the level of single neurons.
The active appearance model (AAM) is a largely handcrafted framework, so it falls short of identifying a general learning principle. In recent years, deep neural networks have become popular computational models of the monkey ventral visual stream. Unlike AAM, these models are not domain-specific, and their tuning distributions are built by data-driven learning. Modern deep networks are trained on multiway object recognition tasks using high-density teaching signals, resulting in high-dimensional representations that closely match those found in biological systems.
On the other hand, deep classifiers do not currently explain single-neuron responses in the monkey face patch any better than AAM does. Additionally, the representational forms of deep classifiers and AAM are distinct. While deep classifiers generate high-dimensional representations multiplexed across many simulated neurons, AAM employs a low-dimensional coding scheme in which each dimension encodes orthogonal information.
According to a long-held view, the visual system employs self-supervision to recover the semantically interpretable structure of sensory data. While such an interpretable structure may appear deceptively easy for humans to perceive, it has proven challenging to recover in practice because of the highly complex non-linear transformations of pixel-level inputs that are required.
Source: Unsupervised deep learning
Recent advances in machine learning have provided an implementational blueprint for this theory by introducing deep self-supervised generative models. For example, the beta-variational autoencoder (β-VAE) is a model that learns to reconstruct sensory data faithfully from a low-dimensional embedding while simultaneously encouraging individual network units to code for semantically meaningful factors such as object colour, face gender, and scene layout.
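The trade-off described above, between reconstructing the input and keeping the embedding compact, can be sketched as a loss function. The following is a minimal NumPy sketch of the standard β-VAE objective, not the study's own code. It assumes a squared-error reconstruction term and a diagonal Gaussian posterior, and the function name `beta_vae_loss` and the default `beta=4.0` are illustrative choices.

```python
import numpy as np

def beta_vae_loss(x, x_recon, mu, logvar, beta=4.0):
    """Mean beta-VAE loss over a batch (illustrative sketch).

    x, x_recon : arrays of shape (batch, n_pixels)
    mu, logvar : arrays of shape (batch, n_latents) describing the
                 diagonal Gaussian posterior q(z | x)
    beta       : weight on the KL term; beta > 1 puts more pressure
                 on the latent code (the value 4.0 is an assumption)
    """
    # Reconstruction error per sample (assumes a squared-error likelihood).
    recon = np.sum((x - x_recon) ** 2, axis=1)
    # Closed-form KL divergence between N(mu, exp(logvar)) and N(0, I).
    kl = 0.5 * np.sum(mu ** 2 + np.exp(logvar) - logvar - 1.0, axis=1)
    return float(np.mean(recon + beta * kl))
```

With `beta=1.0` this reduces to the ordinary VAE objective. Larger values penalise the latent code more heavily, which is the pressure associated with individual units coding for separate factors.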
Deep Generative Models
These deep generative models continue the neuroscience community’s long legacy of building self-supervised vision models while also allowing for strong generalisation, imagination, abstract reasoning, compositional inference, and other characteristics of biological visual cognition. This study aims to discover whether a general learning objective can give rise to an encoding comparable to that employed by real neurons. The results suggest that the disentangling objective optimised by β-VAE is a plausible mechanism by which the ventral visual stream arrives at the low-dimensional face representations observed in neurons.
In contrast to previous studies, the findings demonstrate that a code learned through unsupervised deep learning can be meaningfully interpreted at the level of single neurons. Furthermore, the study demonstrates that the axes of variation of individual inferotemporal (IT) neurons correspond to single “disentangled” latent units. Thus, this research extends prior work on the coding properties of single neurons in the monkey face-patch area by demonstrating one-to-one correspondences between model units and neurons, rather than the previously reported few-to-one correspondences.
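To make the difference between one-to-one and few-to-one correspondence concrete, here is a small NumPy sketch. It is not the alignment score used in the study; it is a simplified, hypothetical measure (`one_to_one_fraction`) that asks, for each neuron, what share of its squared correlation with all latent units is carried by its single best-matching unit. The array names and shapes are assumptions.

```python
import numpy as np

def correlation_matrix(neurons, latents):
    """Pearson correlations between every neuron and every latent unit.

    neurons : array of shape (n_images, n_neurons)
    latents : array of shape (n_images, n_latents)
    Returns an array of shape (n_neurons, n_latents).
    """
    n = (neurons - neurons.mean(axis=0)) / neurons.std(axis=0)
    z = (latents - latents.mean(axis=0)) / latents.std(axis=0)
    return n.T @ z / neurons.shape[0]

def one_to_one_fraction(neurons, latents):
    """Share of each neuron's squared correlation held by its best latent.

    Values near 1 indicate that a neuron tracks a single latent unit;
    values near 1 / n_latents indicate that it mixes many units.
    """
    r2 = correlation_matrix(neurons, latents) ** 2
    return r2.max(axis=1) / r2.sum(axis=1)
```

For example, a simulated neuron whose response is a scaled copy of one latent unit scores close to 1, while a neuron that sums three independent units scores close to one third.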
For more information, refer to the article.
The raw responses of all models to the 2,162 face images used in this study have been deposited in the figshare repository: https://doi.org/10.6084/m9.figshare.c.5613197.v2.
The following databases contain the face image data used in this study:
FERET face database: https://www.nist.gov/itl/iad/image-group/color-feret-database
CVL face database
MR2 face database
PEAL face database
The code supporting the findings is available upon request from Irina Higgins (firstname.lastname@example.org), owing to its complexity and partial reliance on proprietary libraries. In addition, the β-VAE model, the alignment score, and the UDR measure are all available as open-source implementations at https://github.com/google-research/disentanglement_lib.