How Classifiers Are Secretly Just Energy-Based Models

The popularity of GANs and generative modelling is clearly on the rise. However, there is still a significant performance gap between even the strongest generative modelling approaches and solutions customised for each specific discriminative problem.

One widely held explanation is that the architectures of state-of-the-art generative models have diverged quite heavily from those of state-of-the-art discriminative models.

Therefore, the performance of generative models on discriminative tasks usually falls far below that of state-of-the-art discriminative models.

Though there have been efforts to close this gap by leveraging invertible architectures, these methods still perform below par in comparison to their purely discriminative counterparts.

In a paper published last week, a team at the University of Toronto stripped classifiers down to investigate the generative model hidden within them.

By titling the paper “YOUR CLASSIFIER IS SECRETLY AN ENERGY BASED MODEL AND YOU SHOULD TREAT IT LIKE ONE”, the authors make clear why they believe the way we do deep learning research needs reimagining.

This paper advocates the use of energy-based models (EBMs) to help realise the potential of generative models on downstream discriminative problems.

Overview Of Energy-Based Models

(Image via DeepMind)

Energy-based models (EBMs) were formalised in a 2006 tutorial by Yann LeCun and his team. An EBM scores each configuration of its variables with a scalar energy; training shapes this energy function so that desired configurations receive low energy and undesired ones receive high energy, improving the model’s predictive quality.

Recently, energy-based models came into light when the researchers at DeepMind used these models to explore memory association in machines.

The researchers introduced a novel approach that leverages meta-learning to enable fast storage of patterns into the weights of energy-based memory models. The goal here is to store patterns as quickly as possible in the model’s weights and then retrieve them from associative memory.

These energy-based models get their name from the energy function that they capitalise on: a scalar-valued function, modelled by a neural network, that scores how well a configuration of variables fits together. In the memory setting, the writing rule is implemented as a weight update that produces the stored parameters from their initialisation, and retrieval amounts to descending the energy landscape towards a stored pattern.
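The idea of writing patterns into weights and reading them back by descending an energy function can be illustrated with a classical, much simpler cousin of the DeepMind model: a Hopfield-style associative memory (the network, patterns, and sizes below are illustrative assumptions, not the paper’s setup).

```python
import numpy as np

rng = np.random.default_rng(0)
patterns = rng.choice([-1.0, 1.0], size=(3, 64))   # three stored binary patterns

# "Write": a Hebbian weight update sets W from the stored patterns
W = patterns.T @ patterns / patterns.shape[1]
np.fill_diagonal(W, 0.0)

def energy(x):
    # Hopfield energy E(x) = -0.5 * x^T W x; stored patterns sit in minima
    return -0.5 * x @ W @ x

def retrieve(x, steps=10):
    # "Read": repeated sign updates move the state towards lower energy
    for _ in range(steps):
        x = np.sign(W @ x)
    return x

# Corrupt a stored pattern, then recover it from the energy landscape
noisy = patterns[0].copy()
flip = rng.choice(64, size=8, replace=False)
noisy[flip] *= -1

recovered = retrieve(noisy)
overlap = recovered @ patterns[0] / 64
print(overlap)  # close to 1.0: the corrupted pattern is restored
```

The weights here play the role of memory: storage is a single weight update, and retrieval is energy minimisation rather than a lookup.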

A stark contrast can be observed in the way energy-based models go about classification tasks. For example, consider an image that is to be classified. On feeding this image to a conventional convolutional neural network, dependencies at the granular level are captured and each class is assigned a probability score.

In the case of energy models, by contrast, classification is done based solely on energy values: the label whose pairing with the input yields the lowest energy wins.
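The two views are closely related. A minimal sketch, with made-up energy values standing in for a trained model’s outputs, shows that picking the lowest-energy label coincides with the softmax choice when probabilities are defined from negative energies:

```python
import numpy as np

# Hypothetical energies E(x, y) for one input x over four candidate
# labels (lower energy = better fit); dummy values for illustration.
energies = np.array([2.3, 0.4, 5.1, 1.7])

# An energy-based classifier simply picks the lowest-energy label...
prediction = int(np.argmin(energies))

# ...which matches the probabilistic choice under p(y|x) ∝ exp(-E(x, y))
probs = np.exp(-energies) / np.exp(-energies).sum()
print(prediction, int(np.argmax(probs)))  # both select label 1
```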

Re-imagining Classifiers

(Image via the paper by Will Grathwohl et al.)

In this model, the researchers make use of the extra degree of freedom hidden within the logits to define a density function over input examples as well as a joint density over examples and labels.

This work demonstrated how one can slightly re-interpret the logits and re-use them to define an energy-based model of the joint distribution of data points and labels.
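The re-interpretation can be sketched in a few lines. Using dummy logits in place of a real network, the usual classifier gives p(y|x) = softmax(f(x)); the same logits additionally define an unnormalised joint density p(x, y) ∝ exp(f(x)[y]) and an energy over inputs alone, E(x) = -logsumexp(f(x)):

```python
import numpy as np

def logsumexp(v):
    # Numerically stable log(sum(exp(v)))
    m = v.max()
    return m + np.log(np.exp(v - m).sum())

logits = np.array([1.2, -0.3, 2.5])   # stand-in for f(x) over three classes

p_y_given_x = np.exp(logits - logsumexp(logits))   # the standard softmax
energy_x = -logsumexp(logits)                      # energy of the input x alone

# Dividing the unnormalised joint exp(f(x)[y]) by exp(-E(x)) recovers
# the softmax exactly: the intractable normaliser over x cancels out,
# which is why a standard classifier hides a generative model.
recovered = np.exp(logits) / np.exp(-energy_x)
print(np.allclose(p_y_given_x, recovered))  # True
```

This is why training the classifier can be combined with maximising the density of the inputs themselves: both objectives read off the same logits.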

To test the efficacy of the JEM, the authors trained the model on CIFAR10, SVHN, and CIFAR100 and compared it against other hybrid models as well as standalone generative and discriminative models. 

The results, the authors report, are near the state of the art on both tasks simultaneously, outperforming other hybrid models.

However, the authors admit the limitations of energy-based models. The gradient estimators used to train JEM are quite unstable and prone to diverging if the sampling and optimisation parameters are not tuned correctly.
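The samples behind those gradient estimators come from Stochastic Gradient Langevin Dynamics (SGLD): a noisy gradient descent on the energy. A minimal sketch on a toy quadratic energy (assumed here purely for illustration; a real JEM would backpropagate through -logsumexp of the network's logits) shows the update rule and why step size matters:

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_energy(x):
    # Toy energy E(x) = 0.5 * ||x||^2, so grad E(x) = x.
    return x

def sgld_sample(x, step=0.1, n_steps=200):
    # SGLD update: x <- x - (step/2) * grad E(x) + sqrt(step) * noise.
    # Too large a step diverges; too small a step mixes very slowly,
    # which is the tuning sensitivity the authors point to.
    for _ in range(n_steps):
        x = x - 0.5 * step * grad_energy(x) + np.sqrt(step) * rng.standard_normal(x.shape)
    return x

# Run many independent chains; for this energy the samples should
# settle into roughly a standard normal distribution.
samples = np.stack([sgld_sample(rng.standard_normal(2)) for _ in range(500)])
print(samples.mean(), samples.std())  # roughly 0 and 1
```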

Key Takeaways

In this work the authors have:

  • Presented JEM, a joint energy model that is a novel reinterpretation of standard classifier architectures
  • Demonstrated that this model retains the strong performance of SOTA discriminative models while adding the benefits of generative modelling approaches
  • Demonstrated the utility of incorporating this type of training into discriminative models

While there exist many issues in training energy-based models (EBMs), the authors hope the results presented here will encourage the community to improve upon current approaches.

Ram Sagar
I have a master's degree in Robotics and I write about machine learning advancements.
