
How Classifiers Are Secretly Just Energy-Based Models


The popularity of GANs and generative modelling is clearly on the rise. However, there remains a significant performance gap between the strongest generative modelling approaches and the customised discriminative solutions built for each specific problem.

Practitioners have observed that the architectures of state-of-the-art generative models have diverged quite heavily from those of state-of-the-art discriminative models.

As a result, the performance of generative models on downstream discriminative tasks usually falls far below that of state-of-the-art discriminative models.

Though there have been efforts to close this gap by leveraging invertible architectures jointly trained as both generative and discriminative models, these methods still perform below par in comparison to their purely discriminative counterparts.

In a paper published last week by a team at the University of Toronto, standard classifiers were re-examined to uncover the generative model hidden within them.

By titling the paper “Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One”, the authors have made clear why they believe there is a need to reimagine the way we do deep learning research.

This paper advocates the use of energy-based models (EBMs) to help realise the potential of generative models on downstream discriminative problems.

Overview Of Energy-Based Models

(Image via DeepMind)

Energy-based models (EBMs) were formalised in a 2006 tutorial by Yann LeCun and his team. An EBM assigns a scalar energy to each configuration of its variables, with low energy indicating a compatible configuration, so prediction amounts to finding the values that minimise this energy.

Recently, energy-based models came into light when the researchers at DeepMind used these models to explore memory association in machines.

The researchers introduced a novel approach that leverages meta-learning to enable fast storage of patterns into the weights of energy-based memory models. The goal is to store patterns in the model’s weights as quickly as possible and then retrieve them from associative memory.

Energy-based models get their name from the energy function they are built around: a mathematical function, modelled by a neural network, that scores how well the variables of a problem fit together. In the DeepMind memory model, the writing rule is implemented as a weight update that produces new parameters from the initialisation.

A stark contrast can be observed in the way energy-based models go about classification tasks. For example, consider an image to be classified. On feeding this image to a convolutional neural network, dependencies at the granular level are captured and each class is given a probability score.

In the case of energy-based models, by contrast, classification is done solely on the basis of energy values: the predicted class is the one with the lowest energy.
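The idea above can be sketched in a few lines. This is a minimal illustration, not code from the paper: the energy values are made up, and the conversion to probabilities simply applies a Boltzmann (softmax-of-negative-energy) distribution.

```python
import numpy as np

def probabilities_from_energies(energies):
    """Turn per-class energies E(x, y) into a Boltzmann distribution:
    p(y | x) = exp(-E(x, y)) / sum_y' exp(-E(x, y')).
    Lower energy means a better fit, hence higher probability."""
    neg = -np.asarray(energies, dtype=float)
    neg -= neg.max()              # subtract the max to stabilise the exponentials
    p = np.exp(neg)
    return p / p.sum()

# Hypothetical energies for three classes; class 1 has the lowest energy,
# so it receives the highest probability.
probs = probabilities_from_energies([2.0, 0.5, 4.0])
print(probs.argmax())  # → 1
```

The predicted class is simply the argmin of the energies; the softmax step only matters when calibrated probabilities are needed.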

Re-imagining Classifiers

(Image via the paper by Will Grathwohl et al.)

In this model, the researchers make use of the extra degree of freedom hidden within the logits to define a density function over input examples as well as a joint density over examples and labels.

This work demonstrates how one can slightly re-interpret the logits of a standard classifier and re-use them to define an energy-based model of the joint distribution over data points and labels.
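Concretely, the paper defines p(x, y) ∝ exp(f(x)[y]) from the logits f(x), so that the energy of an input is E(x) = −logsumexp(f(x)) and the usual softmax classifier falls out as p(y | x). The sketch below uses hypothetical logit values to show this re-interpretation:

```python
import numpy as np

def log_sum_exp(v):
    """Numerically stable log(sum(exp(v)))."""
    m = v.max()
    return m + np.log(np.exp(v - m).sum())

# Hypothetical logits f(x) for a 3-class classifier.
logits = np.array([1.2, -0.3, 2.5])

# JEM's re-interpretation:
#   p(x, y) ∝ exp(f(x)[y])          -> joint energy  E(x, y) = -f(x)[y]
#   p(x)    ∝ sum_y exp(f(x)[y])    -> input energy  E(x)    = -logsumexp(f(x))
energy_x = -log_sum_exp(logits)

# Dividing p(x, y) by p(x) recovers the ordinary softmax classifier p(y | x),
# so the discriminative model is untouched.
p_y_given_x = np.exp(logits - log_sum_exp(logits))
print(p_y_given_x.argmax())  # → 2 (same prediction as the plain classifier)
```

The normalising constant of p(x) is intractable, which is why training the generative part requires sampling, as discussed below.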

To test the efficacy of JEM (the Joint Energy-based Model), the authors trained the model on CIFAR10, SVHN, and CIFAR100 and compared it against other hybrid models as well as standalone generative and discriminative models.

The authors report that the model performs near the state of the art on both tasks simultaneously, outperforming other hybrid models.

However, the authors acknowledge the limitations of energy-based models. The gradient estimators used to train JEM are quite unstable and prone to diverging if the sampling and optimisation parameters are not tuned correctly.
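The instability stems from the sampler used during training: the paper draws samples with a variant of Stochastic Gradient Langevin Dynamics (SGLD). The toy sketch below shows SGLD on a one-dimensional Gaussian; the step sizes and step counts are illustrative only, not the paper’s settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def sgld_sample(grad_log_p, x0, steps=500, step_size=0.05, noise=0.1):
    """Stochastic Gradient Langevin Dynamics: repeatedly move along the
    gradient of log p(x) while injecting Gaussian noise. If step_size or
    noise is set badly, the chain can diverge -- the instability the
    JEM authors warn about."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x += step_size * grad_log_p(x) + noise * rng.normal(size=x.shape)
    return x

# Toy target: a standard Gaussian, whose grad log p(x) is simply -x.
# Starting far from the mode, the chain drifts back toward 0.
sample = sgld_sample(lambda x: -x, x0=[5.0])
```

In JEM these samples stand in for draws from p(x) when estimating the gradient of the intractable log-likelihood, which is why sampler hyperparameters directly affect training stability.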

Key Takeaways

In this work the authors have:

  • Presented JEM, a joint energy-based model that is a novel reinterpretation of standard classifier architectures.
  • Demonstrated that this model retains the strong performance of SOTA discriminative models while adding the benefits of generative modelling approaches.
  • Demonstrated the utility of incorporating this type of generative training into discriminative models.

While many issues remain in training energy-based models (EBMs), the authors hope the results presented here will encourage the community to improve upon current approaches.


Ram Sagar

I have a master's degree in Robotics and I write about machine learning advancements.