
Inside Multimodal Neurons, Some Of The Most Advanced Artificial Neurons Discovered By OpenAI

In a major breakthrough, researchers at OpenAI have discovered artificial neurons within an AI system that resemble neurons in the human brain. These multimodal neurons are among the most advanced artificial neurons found to date.

The researchers found that these neurons respond to a cluster of abstract concepts centred around a common high-level theme rather than to a specific visual feature. Like their biological counterparts, the neurons respond to a range of emotions, animals, photographs, drawings and famous people.



Researchers wrote that these neurons in CLIP respond to the same concept whether it is presented literally, symbolically, or conceptually.

The multimodal neurons were discovered in CLIP, a model that connects text and images and learns visual concepts from natural-language supervision. This general-purpose vision system can match the performance of a ResNet-50 while outperforming existing vision systems on some of the most challenging datasets. For instance, one 'Spider-Man' neuron responds to an image of a spider, the text 'spider', and the comic-book character Spider-Man.

The Study

The researchers found multimodal neurons in several CLIP models of varying sizes, but they focused their study on the mid-sized RN50x4 model. They employed two tools to understand the model's activations:

  • Feature visualisation, which maximises a neuron's firing through gradient-based optimisation of the input.
  • Dataset examples, which look at the distribution of maximally activating images for a neuron across a dataset.
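The first of these tools can be sketched in plain NumPy. The toy "neuron" below is a simple linear function standing in for a real CLIP unit (an assumption for illustration; no actual model is loaded), and gradient ascent on the input finds the pattern that fires it hardest:

```python
import numpy as np

def toy_neuron(x, w):
    """Stand-in for a single neuron's pre-activation: a dot product."""
    return float(w @ x)

def feature_visualisation(w, steps=100, lr=0.1, seed=0):
    """Gradient ascent on the input to maximise the neuron's firing.
    For a linear neuron, the gradient with respect to the input is w."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=w.shape)              # random starting "image"
    for _ in range(steps):
        grad = w                              # d(w @ x)/dx = w
        x = x + lr * grad                     # ascend the activation
        x = x / max(np.linalg.norm(x), 1e-8)  # keep the input bounded
    return x

# The optimised input aligns with w, the direction that
# maximally activates this (linear) neuron.
w = np.array([1.0, -2.0, 0.5])
x_opt = feature_visualisation(w)
```

Real feature visualisation optimises pixels through a deep network with autodiff and regularisers, but the core loop, ascending the gradient of one neuron's activation with respect to the input, is the same.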

The researchers carried out a series of carefully constructed experiments to find these neurons' unique capabilities in the convolutional layers, each of which consists of thousands of neurons. "For our preliminary analysis, we looked at feature visualisations, the dataset examples that most activated the neuron, and the English words that most activated the neuron when rastered as images," said researchers. Many of these neurons deal with sensitive topics, from political figures to emotions.

The experiment revealed an incredible diversity of features such as region neurons, person neurons, emotion neurons, art style neurons, time neurons, abstract neurons, colour neurons and more. 

Researchers found that a majority of neurons in CLIP are readily interpretable. “From an interpretability perspective, these neurons can be seen as extreme examples of “multi-faceted neurons” which respond to multiple distinct cases. Looking to neuroscience, they might sound like “grandmother neurons,” but their associative nature distinguishes them from how many neuroscientists interpret that term,” stated researchers. 

Researchers also studied how these multimodal neurons give insight into how CLIP performs classification of images and text.
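At inference time, CLIP-style classification reduces to comparing an image embedding against text embeddings of candidate labels and picking the closest. A minimal sketch, assuming hypothetical embedding vectors in place of real encoder outputs (no CLIP model is loaded here):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def zero_shot_classify(image_emb, label_embs):
    """Return the label whose text embedding is most similar to the
    image embedding -- the core of CLIP-style zero-shot classification."""
    scores = {label: cosine(image_emb, emb) for label, emb in label_embs.items()}
    return max(scores, key=scores.get), scores

# Hypothetical embeddings standing in for CLIP encoder outputs.
image_emb = np.array([0.9, 0.1, 0.0])
label_embs = {
    "a photo of a spider": np.array([1.0, 0.0, 0.1]),
    "a photo of a dog":    np.array([0.0, 1.0, 0.0]),
}
best_label, scores = zero_shot_classify(image_emb, label_embs)
# → best_label is "a photo of a spider"
```

In the actual model, the image and label strings are first passed through CLIP's image and text encoders; this sketch only shows the similarity-and-argmax step that follows.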

Not Fool-Proof

Neural networks process data on principles modelled on their biological counterparts. The drawback, however, is that it is difficult to understand why a network makes certain decisions and how it arrives at a particular conclusion.

The researchers said that despite being trained on a curated subset of the internet, the model still inherits many of its unchecked biases and associations. "…we have discovered several cases where CLIP holds associations that could result in representational harm, such as denigration of certain individuals or groups," researchers stated. For instance, a "Middle East" neuron was associated with terrorism, and an "immigration" neuron responded to Latin America.

Researchers said these biases and associations would remain in the system despite fine-tuning and the use of zero-shot techniques. The findings on CLIP are still evolving, and much research and understanding remains to be done on multimodal systems. In a bid to advance the area, the researchers have shared their tools, dataset examples, text feature visualisations, and more with the community.


Srishti Deoras
Srishti currently works as Associate Editor at Analytics India Magazine. When not covering the analytics news, editing and writing articles, she could be found reading or capturing thoughts into pictures.

