This was a fireside chat like no other as Ilya Sutskever, OpenAI’s co-founder and chief scientist, and Jensen Huang, NVIDIA’s founder and CEO, sat down to discuss their shared passion for AI at NVIDIA GTC. As a mere spectator, it was impossible not to be drawn into their conversation. It felt like two long-time friends catching up on their new-found AI toys, while also digging into the limitations of human learning, the limitations of GPT-4, and more.
But what made it even more remarkable was the eerie sensation that these two giants of the AI world were almost talking to each other like an AI chatbot, exchanging ideas and insights with a level of precision and fluidity that left the layman in awe.
Interestingly, the session was recorded a day after the official launch of GPT-4, and an easy calm was visible on Sutskever’s face throughout the conversation. His eye-catching T-shirt, a funky take on Princess Diana’s iconic sheep jersey, also stood in stark contrast to Huang’s signature ‘tough guy’ black leather jacket.
The Birth of OpenAI
From the conversation, it was clear that Sutskever is a fan of supervised learning. He considers it to be the best approach for artificial intelligence because large and deep neural networks are required to represent good solutions, and a big dataset with a lot of computing power is needed to find those solutions.
“Optimisation is a bottleneck, but breakthrough optimisation methods have been developed to train large neural networks,” he said. He described the ImageNet dataset as ideal for training large convolutional neural networks.
However, in its early days, OpenAI had two big ideas to work with, the first of them in unsupervised learning. “The first big idea that we had, one which I was especially excited about very early on, is the idea of unsupervised learning through compression,” said the Russian-Canadian computer scientist.
Sutskever explained that compressing data well forces a model to extract the hidden secrets within it, which is the key to unsupervised learning. “Now neural nets, specifically GPT models, are now doing incredible things and are able to learn meaningful representations of language,” he added.
The second big idea was reinforcement learning. Sutskever’s account of how it became an important area of research at OpenAI stood out, particularly his mention of a major project to train a reinforcement learning agent to play Dota 2, a real-time strategy game.
But what led to the birth of the viral chatbot ChatGPT? “RL on Dota morphed into RL from human feedback. That combination gave us ChatGPT,” said Sutskever.
ChatGPT Vs GPT-4
While ChatGPT, built on GPT-3.5, could not clear many tests, including the UPSC exam, GPT-4 has shown remarkable results on tests such as the SAT and GRE, among others. Compared to its predecessor, GPT-4 predicts the next word in a text more accurately, which Sutskever ties to better understanding. He went a little off track to explain this with the analogy of reading a detective novel.
With his detective instinct awakened, Huang was quick to comment that deep learning is able to learn reasoning, and asked about the limitations of GPT-4’s reasoning capabilities. “Neural networks have limitations in their reasoning abilities and have not fully reached their potential. Reasoning capabilities could be improved through continued development,” said Sutskever.
But the problem OpenAI is currently striving to solve is how to make the models more reliable so that they give factually correct information. ChatGPT is infamous for hallucinations, just like its contemporaries Bard and Galactica. When Huang asked whether GPT-4 comes with built-in retrieval capability, Sutskever said that it does not. Rather, it is a “really really good next word predictor” that comes with the new feature of multimodality, as was rightly predicted by AIM.
But why is there hype around multimodality?
The first reason is that it is useful for a neural network to have vision because “humans are very visual creatures”. The second reason is that we learn more about the world by learning from images in addition to text. Adding vision to the learning process can teach things that are not captured in text, though there is no simple exchange rate between the two. Other sources of information become more important when learning from a limited number of words.
There were jokes throughout the exchange too, though they felt as if ChatGPT had generated them, and Sutskever agreed that the chatbot is a pro at jokes. It was almost as if the entire exchange had been carefully scripted to showcase the full range of GPT-4’s capabilities, leaving no stone unturned in its quest to impress.
If AI could talk, this is probably how it would be.