Interview With Juergen Schmidhuber

There are close to 3.5 billion smartphone users in the world, and if you happen to be one of them, then the chances are high that Juergen Schmidhuber has already touched your life. 

Be it Apple’s Siri or Amazon’s Alexa, all the top speech and voice assistants run on Long Short-Term Memory (LSTM) networks. Though these digital assistants and other applications of LSTMs are fairly modern, their conception dates back to the early 1990s. Introduced by Juergen Schmidhuber and his doctoral student Sepp Hochreiter, these networks would become one of the most commercially successful techniques in the history of deep learning. However, the mastermind behind innovations such as LSTM, Dr. Schmidhuber, remains largely unknown outside research circles, unlike his contemporaries Yann LeCun, Andrew Ng and Geoff Hinton.

Analytics India Magazine got in touch with Dr. Schmidhuber for an interview, to present its readers with the story of a man who dreams of building intelligent machines that would one day surpass the intellectual might of his childhood role model, Albert Einstein.

1990-91: The Year Of Miracles

After a four-year study of computer science and mathematics at TU Munich, Dr. Schmidhuber went on to work on his PhD (1988-91) thesis, titled ‘Dynamic Neural Nets and the Fundamental Spatio-Temporal Credit Assignment Problem’. With the end of his doctoral duties at TU Munich in sight, Dr. Schmidhuber and his peers produced a flurry of papers that would eventually change the course of AI research.

Here are a few of those works:

  • First Deep Learner, Based on Unsupervised Pre-Training (1991)
  • The Fundamental Deep Learning Problem (Vanishing / Exploding Gradients, 1991)
  • Long Short-Term Memory: Supervised Very Deep Learning (basic insights since 1991)
  • Artificial Curiosity Through Adversarial Generative NNs (1990)
  • Adversarial Networks for Unsupervised Data Modeling (1991)
  • Learning Sequential Attention with NNs (1990)

A majority of the aforementioned works took decades to get noticed, and a few are still seeding new ideas. For instance, let’s take a look at LSTMs. Their basic insights date back to 1991, and they were formally introduced in 1997 to solve the vanishing and exploding gradient problem in recurrent neural networks (RNNs). LSTMs offered the human-like capability of holding on to relevant information through a mechanism called gates. It was an ingenious idea. However, the reception to LSTM had been lukewarm.
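The gating idea is compact enough to sketch in a few lines. Below is an illustrative single step of an LSTM cell in plain NumPy; the variable names, shapes and parameter layout are our own simplification, not taken from any particular paper or framework:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One step of a plain LSTM cell; W, U, b stack the parameters of the
    input (i), forget (f) and output (o) gates and the cell candidate (g)."""
    n = len(h_prev)
    z = W @ x + U @ h_prev + b      # all four pre-activations at once
    i = sigmoid(z[0*n:1*n])         # input gate: how much new info to write
    f = sigmoid(z[1*n:2*n])         # forget gate: how much old state to keep
    o = sigmoid(z[2*n:3*n])         # output gate: how much state to expose
    g = np.tanh(z[3*n:4*n])         # candidate content
    c = f * c_prev + i * g          # cell state: the long-term memory lane
    h = o * np.tanh(c)              # hidden state emitted at this step
    return h, c
```

The forget gate is what lets the cell state carry information across many time steps largely untouched, which is how LSTM sidesteps the vanishing-gradient problem mentioned above.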

But today, they are on almost every smartphone!

From Facebook’s automatic translation (4 billion translations per day) to Google’s speech recognition on 2 billion Android phones to Amazon’s Alexa, the use of LSTM has grown tremendously as hardware capabilities have improved.

Citations profile via Google Scholar

His work on artificial curiosity through adversarial generative neural networks, meanwhile, bears many similarities to the widely popular generative adversarial networks (GANs). These adversarial networks, though popularised by the work of Ian Goodfellow, are considered by some to be a re-invention, or an interesting application, of Dr. Schmidhuber’s work from 1990.

As can be seen in the picture above, Dr. Schmidhuber’s work continues to be among the most highly cited in the AI community, and his determination to build super-intelligent machines is still going strong.

Nowadays, Dr. Schmidhuber spends most of his time teaching at the university while attending to his duties as Director of the Swiss AI Lab IDSIA. He, along with four others, has also founded a deep learning company named NNAISENSE, based in Lugano, Switzerland.

Re-NNAISENSE Of Machines

Dr. Schmidhuber (far right) with NNAISENSE co-founders

Based out of Lugano, NNAISENSE (pronounced “nascence”) was founded by Dr. Schmidhuber, Faustino Gomez, Jan Koutnik, Jonathan Masci and Bas Steunebrink in 2014. The 30-member deep learning company has several major projects lined up, including work with German carmaker Audi.

Here are a few of NNAISENSE’s interesting accomplishments:

  • Its founders built the first very deep neural nets with hundreds of layers, and have received numerous awards, including the 2016 IEEE Neural Networks Pioneer Award “for pioneering contributions to deep learning and neural networks.”
  • NNAISENSE won the “Learning to Run” competition at the prestigious NIPS conference in  2017, beating over 400 competitors from industry and academia. 
  • It was also the first company to demonstrate physical real-world control through deep Reinforcement Learning: a project with AUDI, where highly novel model cars learned to park without a teacher.
  • The team at NNAISENSE built the brain that learned to control the sophisticated pneumatic Festo robot hand shown at Hannover Messe 2019.

Currently, the team is working with Sulzer Schmid, a leader in drone-based wind turbine inspection, along with other ambitious projects, including additive manufacturing and several undisclosed ones.

Dr. Schmidhuber’s motto since the 1970s has been to build an AI smarter than him so that he can retire

The team at NNAISENSE believes that they can go far beyond what’s possible today, and pull off the big practical breakthrough that will change everything.

At NNAISENSE, this handful of AI masterminds works tirelessly towards a singular goal, which also happens to be Dr. Schmidhuber’s childhood dream: to match or even surpass human-level intelligence in machines.

Most Essential Anti-Hero Of Our Time

Dr. Schmidhuber at his lab

While The New York Times called him ‘Dad’, Bloomberg glorified him as a godfather of AI. However, for Dr. Schmidhuber, who is not a huge fan of these idolatrous extravaganzas, Alexey Ivakhnenko is the true father of deep learning. Ivakhnenko was a Soviet mathematician whose work on deep networks dates back to 1965, but, like Dr. Schmidhuber, he remains largely unrecognised.

Dr. Schmidhuber firmly believes that the machine learning community can only gain from proper credit assignment to its members.

“The inventor of an important method should get credit for inventing it. If you ‘re-invent’ something that was already known, and only later become aware of this, you must at least make it clear later.”


Over the past decade, the AI community has become increasingly divided over Dr. Schmidhuber. While one group tries to paint him as an egomaniac who bites off more than he can chew, the rest are quite certain that he has been terribly ignored and that it is high time he got the credit.

So we got straight to the point and asked him to comment on the Turing award snub. We even tried our luck by bringing to his notice the idea of a ‘Schmidhuber award’, which is popular on forums such as Reddit.

Like the gentleman he is, Dr. Schmidhuber was gracious enough to accept the enthusiasm among his fans. “Generally speaking, I greatly appreciate the support from the machine learning community,” said Dr. Schmidhuber, acknowledging the efforts of individuals on Reddit.

“I am extremely grateful to my students and postdocs who have made all of this possible.”


He credits his students and postdocs for all the accolades he has amassed over the years, quite unlike the egotism his adversaries accuse him of!

Talking about adversaries in the context of deep learning, one can’t help but think of the sudden rise in popularity of the phenomenon that is Generative Adversarial Networks, or GANs.

Ian Goodfellow, who had been enjoying success as the poster child of GANs, was put to the test by Dr. Schmidhuber at the prestigious NIPS 2016 conference. Goodfellow’s presentation was interrupted by Dr. Schmidhuber, and 21st-century research circles were given a taste of insubordination that had been missing for over half a century.

However, this gimmick only drove the masses further away from Dr. Schmidhuber. The advancement of any research field relies on openness to criticism, and though Dr. Schmidhuber’s objections are mostly about the lack of recognition for the real pioneers, his old-school Swiss demeanour might have rubbed a few researchers the wrong way.

“Relatively young research areas such as machine learning should adopt the honor code of mature fields such as mathematics.”


The big 3 of the modern AI scene, Yann LeCun, Yoshua Bengio and Geoffrey Hinton, have been at the forefront of every major AI breakthrough of the past decade. In 2018, they were even awarded the prestigious Turing Award for their contributions to deep neural networks. However, the trio, LBH (LeCun-Bengio-Hinton), were accused of circular citations by Dr. Schmidhuber in 2015. In a post titled “Deep Learning Conspiracy”, he explains in detail how LBH have ignored the original inventors.

Photo by Andreas Gerbert

In this critique, he laments how misleading it is to cite Hinton’s 2012 paper in the context of convolutional neural networks. LBH, wrote Dr. Schmidhuber, mention pooling but not its pioneer (Weng, 1992), who replaced Fukushima’s (1979) spatial averaging with max-pooling, today widely used by many, including LBH themselves, who write: “ConvNets were largely forsaken by the mainstream computer-vision and machine-learning communities until the ImageNet competition in 2012,” citing Krizhevsky et al. (2012). He considers this very misleading.

“LBH may be backed by the best PR machines of the Western world (Google hired Hinton; Facebook hired LeCun). However, historic scientific facts will be stronger than any PR.”

Dr. Schmidhuber, “Deep Learning Conspiracy” (Nature 521, p. 436)

Though the contributions of LeCun, Bengio and Hinton to deep learning cannot be disputed, they are accused of inflating a citation bubble. Dr. Schmidhuber has been vociferous about the AI community’s neglect of the original inventors. He believes there is a long tradition of insights into deep learning, and that the community as a whole will only benefit from appreciating its historical foundations.

The idolization of an individual is a curse to any scientific community, but so is ignorance of the prime movers. Anyone even slightly familiar with the celebrities of AI would have come across epithets such as ‘godfather of AI’ or ‘GANfather’, which are thrown around a lot.

Dr. Schmidhuber, too, is no stranger to these superfluous titles, and when we insisted on knowing whether there is anything novel about Goodfellow’s work on GANs at all, he responded that GANs are “an interesting application of my adversarial curiosity principle published in 1990”. “One network probabilistically generates outputs, another network sees those outputs and predicts environmental reactions to them,” explained Dr. Schmidhuber.

Using gradient descent, he continued, the predictor network minimizes its error, while the generator network tries to produce outputs that maximize this error. One net’s loss is the other net’s gain. GANs are a special case of this, where the environment simply returns 1 or 0 depending on whether the generator’s output is in a given set.
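This description can be sketched as a toy numerical experiment. Everything below is our own illustrative construction, not Dr. Schmidhuber’s code: a one-parameter “generator” emits scalars, the environment returns 1 only when an output lands in a target interval (the special case he mentions), and a logistic “predictor” learns to predict that response while the generator ascends the predictor’s error:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

# Illustrative environment: reacts with 1 iff the output is in the set [2, 3].
def env(x):
    return 1.0 if 2.0 <= x <= 3.0 else 0.0

mu = -1.0        # generator parameter: mean of its output distribution
a, b = 0.1, 0.0  # predictor parameters: predicts env response as sigmoid(a*x + b)
lr = 0.05

for _ in range(1000):
    x = mu + 0.1 * rng.standard_normal()  # generator probabilistically emits
    y = env(x)                            # environment reacts
    p = sigmoid(a * x + b)                # predictor guesses the reaction
    err = p - y
    grad = 2.0 * err * p * (1.0 - p)      # d(err**2) w.r.t. the predictor logit
    a -= lr * grad * x                    # predictor descends its squared error...
    b -= lr * grad
    mu += lr * grad * a                   # ...generator ascends that same error
```

Both parameter updates use the same error gradient with opposite signs: one net’s loss is literally the other’s gain, the min-max structure GANs later made famous.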

“One should not be allowed to patent deep learning.”


Bringing the recent rush towards patenting AI to his attention, we asked Dr. Schmidhuber to comment on it, especially Goodfellow’s patenting of adversarial training, to which he quipped, “I am not a patent attorney. However, one should not be allowed to patent uncited prior art published by others.”

When asked whether any current researchers have impressed him, Dr. Schmidhuber didn’t name names, as he believes there are many impressive researchers out there.

However, he does admit that DeepMind is doing impressive work, such as beating a professional player at StarCraft II with AlphaStar in 2019. At the heart of this success, reminds Dr. Schmidhuber, sits an LSTM network, one that was also part of OpenAI Five when it defeated human experts at Dota 2!

Gödel Machines, AGI And The Future

From Stanley Kubrick’s 2001: A Space Odyssey

Back in 2003, Dr. Schmidhuber proposed the Gödel machine in an attempt to make a case for superintelligence in machines. A Gödel machine is capable of rewriting its own code as soon as it has found proof that the rewrite is useful.

This, in a nutshell, is the essence of artificial general intelligence: making machines learn how to learn.

Ever since he was a teenager, Dr. Schmidhuber has dreamt of building a self-improving AI that is smarter than him. Today, three decades after the conception of LSTM, he is still working towards artificial general intelligence with the help of his team at NNAISENSE.

In this decade, observes Dr. Schmidhuber, Active AI will invade the real world, driving industrial processes, machines and robots.

Although the real world is much more complex and less forgiving than virtual worlds, Dr. Schmidhuber predicts that the coming wave of “Real World AI”, or simply “Real AI”, will be much bigger than the previous AI wave, because it will affect all of production.

In the not too distant future, the creation of the “show-and-tell robotics” or “watch-and-learn robotics” or “see-and-do robotics”, as Dr.Schmidhuber likes to call them, will allow us to quickly teach a neural network to control a complex robot with many degrees of freedom to execute complex tasks, such as assembling a smartphone, solely by visual demonstration, and by talking to it, without touching or otherwise directly guiding the robot – a bit like we’d teach a kid. 

In a talk at the CogX event in London, Dr. Schmidhuber enthused the audience about how AI will eventually emigrate to other galaxies in a few thousand years. He believes there are places out in deep space that are richer in resources than Earth and could sustain self-replicating robots, which would eventually build factories, and so on and so forth.

“The delays between successive radical breakthroughs in computer science decrease exponentially: each new one comes roughly twice as fast as the previous one.”

Schmidhuber’s law

Dr. Schmidhuber firmly believes that human civilisation is on the verge of witnessing something spectacular, and he urges us to take pride in being fortunate enough to contribute to the birth of a new kind of being, a new civilisation.

It took more than 13 billion years after the Big Bang for primitive forms of intelligence to emerge on Earth. But it took only a fraction of that time for humans to rule the planet, and even less to create AI. At this pace, we can safely assume that the next 100 years are going to be quite eventful. And with preparations for the colonization of Mars and innovations like Neuralink around the corner, Dr. Schmidhuber might finally witness his childhood dream realised.

“Don’t think of humans as the crown of creation. Instead view human civilization as part of a much grander scheme, an important step on the path of the universe towards higher complexity.”

Dr. Schmidhuber for Scientific American, November 2017

As the 21st century gears up for the advent of Artificial Super Intelligence, it is almost inevitable that the next generation of students will soon have the fundamentals of AI included in their school curricula.

“Machine learning will seem trivial to you,” promises Dr. Schmidhuber, “if you learn basic math, including linear algebra, calculus and statistics, and then learn the basics of theoretical computer science. And maybe also the basics of physics.”
