Is Hopfield Networks All You Need? LSTM Co-Creator Sepp Hochreiter Weighs In

Hopfield networks date back to the 1970s and were popularised by John Hopfield in 1982. For most of machine learning history, however, they have been sidelined owing to their limited storage capacity and the arrival of superior architectures such as the transformer (now used in BERT and other language models).

Sepp Hochreiter, co-creator of the LSTM, has revisited Hopfield networks with a team of researchers and come to some surprising conclusions. In a paper titled ‘Hopfield Networks is All You Need’, the authors introduce a couple of elements that make modern Hopfield networks interchangeable with the attention mechanism of state-of-the-art transformer models.

What’s New About Hopfield Networks

Source: Hubert Ramsauer et al.

The above figure depicts the relation between the binary modern Hopfield network, the new Hopfield network with continuous states and a new update rule, and the transformer.

The standard binary Hopfield network has an energy function that can be expressed as the sum of interaction functions F with F(x) = x^2. Modern Hopfield networks called “dense associative memory” (DAM) models use an energy function with interaction functions of form F(x) = x^n and, thereby, achieve a storage capacity proportional to d^(n−1).
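
Written out as a rough sketch in the paper’s notation (x_1, …, x_N are the N stored patterns, ξ is the current state, and d is the pattern dimension; constants are omitted):

```latex
% Energy as a sum of interaction functions F over the stored patterns:
E(\xi) = -\sum_{i=1}^{N} F\!\left(x_i^{\top}\,\xi\right)

% Classical binary Hopfield network:  F(a) = a^2
% Dense associative memory (DAM):     F(a) = a^n, giving storage capacity ~ d^{n-1}
```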

The main contributions of the paper can be summarised as follows:


1| Introduction of a new energy function using the log-sum-exp function

2| A new update rule for the state ξ, derived from this energy function (written out after this list)

3| The new energy function offers the following:

  • Global convergence to a local minimum 
  • Exponential storage capacity
  • Convergence after one update step
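
In the paper’s notation, with X = (x_1, …, x_N) the matrix of stored patterns, M the largest pattern norm and lse the log-sum-exp function, the new energy and the update rule referenced in point 2 take roughly the following form:

```latex
% New energy over continuous states:
E(\xi) = -\mathrm{lse}\!\left(\beta,\, X^{\top}\xi\right)
         + \tfrac{1}{2}\,\xi^{\top}\xi
         + \beta^{-1}\log N
         + \tfrac{1}{2}\,M^{2},
\qquad
\mathrm{lse}(\beta, z) = \beta^{-1}\log\sum_{i=1}^{N} e^{\beta z_i}

% New update rule (in practice a single step suffices):
\xi^{\mathrm{new}} = X\,\mathrm{softmax}\!\left(\beta\, X^{\top}\xi\right)
```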

In this work, the authors also provide a new PyTorch layer called “Hopfield”, which allows deep learning architectures to be equipped with modern Hopfield networks as powerful new building blocks for pooling, memory, and attention.
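
To make the retrieval mechanism concrete, here is a minimal PyTorch sketch of the update rule above, written from scratch rather than using the authors’ published layer (the function name and arguments are illustrative, not the paper’s API):

```python
import torch

def hopfield_retrieve(patterns: torch.Tensor,
                      state: torch.Tensor,
                      beta: float = 8.0,
                      n_steps: int = 1) -> torch.Tensor:
    """Sketch of modern Hopfield retrieval: xi <- X softmax(beta * X^T xi).

    patterns: (N, d) tensor, one stored pattern per row.
    state:    (d,) query vector xi (e.g. a noisy or partial pattern).
    """
    for _ in range(n_steps):
        scores = patterns @ state                       # similarity to each stored pattern, (N,)
        weights = torch.softmax(beta * scores, dim=-1)  # retrieval weights
        state = patterns.t() @ weights                  # weighted average of stored patterns, (d,)
    return state

# Toy usage: recover a stored pattern from a corrupted query.
torch.manual_seed(0)
stored = torch.randn(16, 64)                # 16 patterns of dimension 64
noisy = stored[3] + 0.3 * torch.randn(64)   # corrupted copy of pattern 3
retrieved = hopfield_retrieve(stored, noisy)
print(torch.allclose(retrieved, stored[3], atol=0.1))  # typically True
```

In line with the paper’s claim of convergence after one update, a single step is usually enough to snap the state onto the nearest stored pattern.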

Why Use Them At All

“The modern Hopfield network gives the same results as the SOTA Transformer.”

Hochreiter and his colleagues put modern Hopfield networks to use to find patterns in the immune repertoire of an individual. Their network, called DeepRC, implements what the researchers call ‘a transformer-like mechanism’, which is nothing but a modern Hopfield network.
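
At its core, this amounts to pooling a variable-size set of sequence embeddings with a learned, static query. A rough, illustrative sketch of such Hopfield-style pooling (not the DeepRC code; class and argument names are made up) might look like this:

```python
import torch
import torch.nn as nn

class HopfieldStylePooling(nn.Module):
    """Pool a variable-size set of embeddings with a single learned (static) query.

    Illustrative only -- in the spirit of the Hopfield/attention pooling used
    in DeepRC, not its actual implementation.
    """

    def __init__(self, dim: int, beta: float = 1.0):
        super().__init__()
        self.query = nn.Parameter(torch.randn(dim))  # learned static query
        self.beta = beta

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        # embeddings: (N, dim), one row per sequence in the repertoire
        scores = embeddings @ self.query                     # (N,)
        weights = torch.softmax(self.beta * scores, dim=-1)  # attention over the set
        return embeddings.t() @ weights                      # (dim,) pooled representation
```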

The re-emergence of the once-outdated Hopfield networks has created ripples within the ML community.

On one of the popular forums, enthusiasts asked why anyone should bother replacing the attention layer with a Hopfield layer: “Am I correct that they are theoretically exactly the same operation, and there is no benefit to switching?”

To this, one of the authors of the paper responded that there is no reason to replace existing transformer implementations with Hopfield layers. However, the Hopfield layer is more general: one can perform multiple update steps, adjust parameters such as β, use static queries, and so on. Most importantly, the Hopfield interpretation allows one to gain new insights into the workings of transformers, characterised by the kinds of fixed points the update converges to.
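
The equivalence under discussion is, loosely, the following: as the paper shows, if the state and stored patterns are linear projections of queries and keys, a single Hopfield update with β = 1/√d_k reproduces the familiar transformer attention formula:

```latex
% One Hopfield update applied to projected queries Q, keys K and values V,
% with beta = 1 / sqrt(d_k):
Z = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
```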

Moreover, the Hopfield layer can be integrated flexibly into arbitrary deep network architectures, which the author thinks could open up new possibilities.

Regarding the computational cost of Hopfield networks, the researcher wrote that the Hopfield layer can be seen as a stand-alone module. That said, if one wants to replace a pooling layer, the Hopfield layer would require more compute than it would when replacing an LSTM layer.

Attention heads lie at the heart of successes such as BERT and other language models. To find that an almost-forgotten technique such as the Hopfield network is now on par with state-of-the-art models is nothing short of remarkable. The researchers hope that this successful demonstration will encourage others to revisit fundamental methods hiding in plain sight.

Read the original paper here.

Ram Sagar
I have a master's degree in Robotics and I write about machine learning advancements.
