Sepp Hochreiter’s Quest to Kick OpenAI from Language Modelling Supermarket

“As a successor of LSTM, we have a new thing. It's not published, it's hidden. It's called XLSTM,” says the German computer scientist Sepp Hochreiter

Like humans, AI models do not restart learning from scratch every second. Instead, a certain type of neural network adds loops, so that each new observation is interpreted along with what has been observed before.

In the AI field, LSTM (Long Short-Term Memory) remarkably improved these networks, leading to a leap in accuracy. The model was developed by Dr. Sepp Hochreiter along with fellow German scientist Juergen Schmidhuber in the late 90s.
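
To make that “loop” concrete: a recurrent network such as an LSTM carries a hidden state from one timestep to the next, so each new observation is read in the context of everything seen so far. A minimal sketch in PyTorch (the layer sizes and data here are illustrative assumptions, not anything from Hochreiter's work):

import torch
import torch.nn as nn

# One-layer LSTM: 8-dimensional inputs, 16-dimensional memory (illustrative sizes).
lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

x = torch.randn(1, 5, 8)   # one sequence of 5 timesteps
state = None               # (h, c); PyTorch zero-initialises it on the first call
for t in range(x.size(1)):
    # Feed one observation at a time; the returned state carries
    # everything seen so far into the next step.
    out, state = lstm(x[:, t:t+1, :], state)
print(out.shape)           # torch.Size([1, 1, 16])

The state tuple is the network's “memory”: nothing is relearned from scratch at each step.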

“As a successor of LSTM, we have a new thing. It's not published, it's hidden. It's called XLSTM,” revealed Prof. Josef “Sepp” Hochreiter in an exclusive interview with AIM. The German computer scientist currently heads the Institute for Machine Learning at Johannes Kepler University in Linz.

Hochreiter and his team are, he claims, beating every transformer right now on smaller datasets with LSTMs. “We are so much better than GPT and want to kick OpenAI from the supermarket in autoregressive language modelling,” he said excitedly.

From being just another startup in Silicon Valley, the Sam Altman-led OpenAI has shot to fame since the release of its cash cow, the ChatGPT chatbot. According to Reuters, OpenAI is estimated to reach $1 billion in revenue by 2024, a sign of how strongly the market is backing the company.

Transformer, Not (Convincing) Enough 

Before LSTM became an integral part of language models, its application in reinforcement learning was strikingly successful in DeepMind's StarCraft II and OpenAI's Dota 2 agents.

Hochreiter said, “What was more surprising was how good it is for language, because it was not taught for language. It was time series prediction and sequence analysis.” Before the model became popular, he also used it for protein and DNA sequences.

The 55-year-old professor believes focusing on language is a good thing, because language already has abstractions: humans invented words for the objects we see in the real world. “These concepts, classes, and abstractions always come from humans, and I'm looking forward to seeing AI invent its own concepts and its own abstractions,” he added.

Today, apart from being the model behind Alexa, Siri, and Cortana, LSTM is used by government authorities across the world to predict floods and droughts. Hochreiter says he is not convinced that transformer technology is applicable everywhere. “I think for some engineering tasks, LSTMs interact design with conventional architectures with a better sense of new things,” he opined.

GPT Problems

The training data behind some of the largest language models remains a mystery. Hochreiter pointed out that efforts like the LAION initiative are emerging to create datasets that leave out content which should not be used for training. “It's a very complicated thing because different cultures would say a few different things as being appropriate or not appropriate. So that's one problem,” he said.

He further elaborated, “You're not allowed to use certain books in training. OpenAI was naive because they used all the data, and perhaps the lawsuits are coming soon. The more data you have, the better your model later becomes, but you have to be careful to summarise and select what content is allowed.”

Accusations against tech companies have been on the rise since the introduction of generative AI tools like Midjourney and ChatGPT. The latest addition to the list is American author and comedian Sarah Silverman.

“In language models, what you put in comes out at the other end. The first thing is to have some rules about what it can say,” said Hochreiter, as regulators worldwide grapple with the legal grey area AI sits in.

The Backstory

The young Hochreiter from Munich initially found computer science boring, until he discovered neural networks. “Everything in computer science was known for 30 years, but here you can do new things, and it was fascinating, unexplored,” he said.

The pioneer of deep learning also discovered the vanishing gradient problem before he proposed LSTMs. “As I wrote my diploma thesis, my supervisor moved to the States and was already a postdoc there. When he came back, we had a lot to write down, and then we tried to publish it at NeurIPS in 1995 but got rejected.”
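
The problem he identified is easy to reproduce: in a plain recurrent network, the gradient reaching early timesteps is a product of many Jacobians, and when their norms sit below one, the product shrinks towards zero as the sequence grows. A toy illustration in PyTorch (a vanilla tanh RNN with made-up sizes, not the original 1991 analysis):

import torch

torch.manual_seed(0)
W = torch.randn(16, 16) * 0.1        # small recurrent weights (toy scale)
h0 = torch.randn(16, requires_grad=True)

h = h0
for _ in range(100):                 # unroll a vanilla tanh RNN for 100 steps
    h = torch.tanh(h @ W)

h.sum().backward()
# The gradient at the first timestep is a product of ~100 Jacobians,
# each with norm below one, so it has all but vanished:
print(h0.grad.norm())                # prints a vanishingly small number

LSTM's gating is precisely what keeps this product from collapsing, by letting the cell state pass from step to step largely unchanged.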

It eventually became a NeurIPS paper in 1997. “Perhaps everybody has something like this,” said Hochreiter; getting a paper accepted can be difficult given the problems with the current peer-review system. Many notable papers had a difficult time getting accepted but ended up significantly impacting the field, like the PageRank paper, the Kalman filter paper, and LSTM.

Things changed around 2009 to 2011, when recurrent neural networks became popular again after a student of Schmidhuber, Alex Graves, worked with LSTM. Talking about the sudden popularity his work gained, Hochreiter said,

“It worked out well and all IT giants, from Google, Facebook and Meta to Amazon, jumped on the bandwagon to use it. Looking back, nobody was interested. I was surprised that it became so popular, because I knew from the beginning it's working. Also, it became much more powerful because of the compute, which grew, and more data.”

While he develops a GPT rival today, Hochreiter is unsure whether he wants to keep the technology hidden or commercialise it through a company. “I published LSTM and didn't get one cent,” he said.

“I want to see what I can make out of it without publishing. It helps me keep something in Europe, a new technology which is better in language modelling, but who knows, because I am only working on small datasets. I've not been saying much, but it's an LSTM with transformer ideas in it,” he concluded.

Tasmia Ansari

Tasmia is a tech journalist at AIM, looking to bring a fresh perspective to emerging technologies and trends in data science, analytics, and artificial intelligence.