Is Hopfield Networks All You Need? LSTM Co-Creator Sepp Hochreiter Weighs In

Hopfield networks trace back to models introduced in the 1970s and were popularised by John Hopfield in 1982. For most of machine learning history, they have been sidelined, owing to their limited storage capacity and the arrival of superior architectures such as the Transformer (which now powers BERT and other models).

Sepp Hochreiter, co-creator of the LSTM, has revisited Hopfield networks with a team of researchers and arrived at some surprising conclusions. In a paper titled ‘Hopfield Networks is All You Need’, the authors introduce a few elements that make modern Hopfield networks interchangeable with state-of-the-art transformer models.

What’s New About Hopfield Networks

Source: Hubert Ramsauer et al.

The above figure depicts the relation between the binary modern Hopfield network, the new Hopfield network (which has continuous states and a new update rule), and the transformer.


The standard binary Hopfield network has an energy function that can be expressed as the sum of interaction functions F with F(x) = x^2. Modern Hopfield networks, called “dense associative memory” (DAM) models, use an energy function with interaction functions of the form F(x) = x^n and thereby achieve a storage capacity proportional to d^(n−1).
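As a rough illustration of the idea (our own sketch, not the authors' code), the DAM energy simply sums F over the overlaps between the state and each stored pattern; setting n = 2 recovers the classical Hopfield energy, while larger n sharpens the energy landscape and raises storage capacity:

```python
import numpy as np

def dam_energy(X, xi, n=2):
    """DAM energy E = -sum_i F(x_i . xi) with F(x) = x**n.

    X  : (d, N) matrix whose columns are stored binary (+/-1) patterns.
    xi : (d,) state vector. n=2 gives the classical Hopfield energy.
    """
    return -np.sum((X.T @ xi) ** n)

rng = np.random.default_rng(0)
X = rng.choice([-1, 1], size=(16, 5))     # 5 stored 16-dim binary patterns
xi = X[:, 0]                              # state equal to a stored pattern
noise = rng.choice([-1, 1], size=16)      # an unrelated random state

# A stored pattern sits far lower in the energy landscape than a random state.
e_stored, e_random = dam_energy(X, xi, n=4), dam_energy(X, noise, n=4)
```

Here `dam_energy`, `X`, and the dimensions are illustrative assumptions chosen for the demo, not values from the paper.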

The main contributions of the paper can be summarised as follows:

1| Introduction of a new energy function using the log-sum-exp function

2| The state ξ is updated by a new update rule, ξ_new = X softmax(β Xᵀξ), where the columns of X are the stored patterns and β is an inverse-temperature parameter:

3| The new energy function offers the following guarantees:

  • Global convergence to a local minimum 
  • Exponential storage capacity
  • Convergence after one update step
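The one-step convergence property can be demonstrated with a minimal numpy sketch of the update rule ξ_new = X softmax(β Xᵀξ) (an illustration under our own toy data, not the authors' implementation): a noisy query typically lands on the stored pattern after a single update.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                 # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def hopfield_update(X, xi, beta=8.0):
    """One step of the modern Hopfield update: xi_new = X softmax(beta * X^T xi)."""
    return X @ softmax(beta * (X.T @ xi))

rng = np.random.default_rng(1)
X = rng.standard_normal((64, 10))                 # 10 stored continuous patterns (columns)
xi = X[:, 3] + 0.1 * rng.standard_normal(64)      # noisy query near pattern 3
retrieved = hopfield_update(X, xi)                # one update retrieves the pattern
```

The pattern count, dimension, and β value here are arbitrary choices for the demo; the paper analyses precisely when and how fast such retrieval converges.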

In this work, the authors also provide a new PyTorch layer called “Hopfield”, which allows deep learning architectures to be equipped with modern Hopfield networks as powerful new building blocks for pooling, memory, and attention.

Why Use Them At All

“The modern Hopfield network gives the same results as the SOTA Transformer.”

Modern Hopfield networks were put to use by Hochreiter and his colleagues to find patterns in the immune repertoire of an individual. Their network, called DeepRC, implements what the researchers describe as ‘a transformer-like mechanism’, which is nothing but a modern Hopfield network.

The re-emergence of the once outdated Hopfield networks has created ripples within the ML community. 

On one popular forum, enthusiasts asked why anyone should bother replacing the attention layer with a Hopfield layer: “Am I correct that they are theoretically exactly the same operation, and there is no benefit to switching?”

One of the authors of the paper responded that there is no reason to replace existing transformer implementations with Hopfield layers. However, the Hopfield layer is more general: one can perform multiple update steps, adjust the β parameter, use static queries, and so on. Most importantly, the Hopfield interpretation allows one to gain new insights into the workings of transformers, characterised by the kinds of fixed points they converge to.
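The claimed equivalence is easy to check in a few lines of numpy (a sketch of our own, assuming the standard scaled dot-product attention formula): with β = 1/√d_k, a single batched Hopfield update over a set of queries is exactly transformer attention.

```python
import numpy as np

def row_softmax(Z):
    Z = Z - Z.max(axis=-1, keepdims=True)   # per-row shift for stability
    E = np.exp(Z)
    return E / E.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    """Standard scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    return row_softmax(Q @ K.T / np.sqrt(d_k)) @ V

def hopfield_batch_update(Q, K, V, beta):
    """One Hopfield update applied to a batch of query states."""
    return row_softmax(beta * (Q @ K.T)) @ V

rng = np.random.default_rng(2)
Q, K, V = (rng.standard_normal((6, 16)) for _ in range(3))
out_attn = attention(Q, K, V)
out_hopf = hopfield_batch_update(Q, K, V, beta=1.0 / np.sqrt(16))
```

The extra generality the author mentions then corresponds to iterating `hopfield_batch_update`, choosing a different β, or fixing Q as static patterns.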

Moreover, the Hopfield layer can be integrated flexibly in arbitrary deep network architectures, which the author thinks can open up new possibilities.

Regarding the computational cost of Hopfield networks, the researcher wrote that the Hopfield layer can be seen as a stand-alone module. That said, if one wants to replace a pooling layer, the Hopfield layer would require more compute than it does when replacing an LSTM layer.

Attention heads lie at the heart of successes such as BERT and other language models, and discovering that a nearly forgotten technique such as the Hopfield network is now on par with state-of-the-art models is remarkable. The researchers hope that this successful demonstration will encourage others to revisit fundamental methods hiding in plain sight.

Read the original paper here.

Ram Sagar
I have a master's degree in Robotics and I write about machine learning advancements.
