
The Truth Behind OpenAI’s Silence On GPT-4

In the process of making GPT-4 better than its predecessors, OpenAI might have bitten off more than it can chew


In March, OpenAI launched GPT-4 with much fanfare, but a dark cloud loomed over the horizon. Scientists and AI enthusiasts alike panned the company for not releasing any specifics about the model, such as its parameter count or architecture. Now, a top AI researcher has speculated about the inner workings of GPT-4, revealing why OpenAI chose to hide this information, and it's disappointing.

Speaking about the potential size of the model, OpenAI CEO Sam Altman famously said of GPT-4 that "people are begging to be disappointed, and they will be". Rumour mills ahead of the model's launch suggested it would have trillions of parameters and be the best thing the world had ever seen. The reality, however, is different. In the process of making GPT-4 better than GPT-3.5, OpenAI might have bitten off more than it could possibly chew.

8 GPTs in a trenchcoat

George Hotz, the world-renowned hacker and software engineer, recently appeared on a podcast to speculate about GPT-4's architecture. Hotz stated that the model might be a set of eight distinct models, each with 220 billion parameters. This speculation was later echoed by Soumith Chintala, co-founder of PyTorch.

While this puts GPT-4's total parameter count at roughly 1.76 trillion, the notable part is that these models don't all run at the same time. Instead, they are deployed in a mixture-of-experts (MoE) architecture, which splits the model into distinct components known as expert models. Each expert is fine-tuned for a specific purpose or field, allowing it to provide better responses within that domain. At inference time, the complete model then draws on the collective intelligence of these experts.
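For readers unfamiliar with the technique, here is a minimal sketch of a mixture-of-experts layer in PyTorch, where a small gating network routes each input to its top-scoring experts. It is a generic illustration of the idea described above, not OpenAI's implementation; the dimensions, expert count and top-k value are arbitrary placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, dim=512, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Gating network: scores how relevant each expert is for a given input.
        self.gate = nn.Linear(dim, num_experts)
        # Independent feed-forward "experts" (stand-ins for full sub-models).
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(num_experts)]
        )

    def forward(self, x):                                  # x: (batch, dim)
        scores = F.softmax(self.gate(x), dim=-1)           # (batch, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)     # keep only the top-k experts per input
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                      # inputs routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

layer = MoELayer()
print(layer(torch.randn(4, 512)).shape)   # torch.Size([4, 512])
```

Only the selected experts run for a given input, which is why an MoE model can carry far more parameters than it spends compute on for any single query.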

This approach has many benefits. One is more accurate responses, since each expert is fine-tuned on its own subject matter. The MoE architecture also lends itself to easier updates, as maintainers can improve the model in a modular fashion instead of retraining a monolithic one. Hotz also speculated that the model may rely on iterative inference for better outputs, a process in which the model's output, or inference result, is refined over multiple iterations.

This method might also allow GPT-4 to draw on inputs from each of its expert models, which could reduce the model's hallucinations. Hotz stated that this process might be run 16 times, which would vastly increase the model's operating cost. The approach recalls the old trope of three children in a trenchcoat masquerading as an adult: many have described GPT-4 as eight GPT-3s in a trenchcoat, trying to pull the wool over the world's eyes.
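The iterative-inference loop itself is easy to sketch. In the hypothetical Python snippet below, `generate` stands in for a real model call (for example, a wrapper around an LLM API), and the default of 16 rounds simply mirrors Hotz's speculation; nothing here is a documented OpenAI procedure.

```python
def iterative_inference(generate, question, rounds=16):
    """Refine an answer over several rounds by feeding it back to the model.

    `generate` is any callable that maps a prompt string to a completion
    string. Each round is another full inference pass, which is where the
    speculated cost multiplier comes from.
    """
    answer = generate(question)
    for _ in range(rounds - 1):
        prompt = (
            f"Question: {question}\n"
            f"Previous answer: {answer}\n"
            "Improve the previous answer, fixing any mistakes."
        )
        answer = generate(prompt)
    return answer

# Toy stand-in model so the sketch runs end to end.
calls = {"n": 0}
def toy_generate(prompt):
    calls["n"] += 1
    return f"answer after {calls['n']} passes"

print(iterative_inference(toy_generate, "What is a mixture of experts?", rounds=3))
# answer after 3 passes
```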

Cutting corners 

While GPT-4 aced benchmarks that GPT-3 struggled with, the MoE architecture seems to have become a pain point for OpenAI. In a now-deleted interview, Altman admitted to the scaling issues OpenAI is facing, especially in terms of GPU shortages.

Running inference 16 times on a model with an MoE architecture is sure to increase cloud costs on a similar scale. Multiplied across ChatGPT's millions of users, it's no surprise that even Azure's supercomputer fell short. This seems to be one of the biggest problems OpenAI currently faces, with Altman stating that a cheaper and faster GPT-4 is the company's top priority as of now.
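A rough back-of-the-envelope calculation shows how quickly this adds up. Every figure below is an assumption chosen purely to illustrate the arithmetic, not something reported by OpenAI.

```python
# Purely illustrative assumptions -- none of these figures come from OpenAI.
gpu_seconds_per_pass = 0.5        # assumed GPU time for one forward pass
passes_per_query = 16             # Hotz's speculated iteration count
queries_per_day = 10_000_000      # assumed daily ChatGPT queries

gpu_hours_per_day = gpu_seconds_per_pass * passes_per_query * queries_per_day / 3600
print(f"~{gpu_hours_per_day:,.0f} GPU-hours per day")   # ~22,222 under these assumptions
```

Whatever the real per-pass cost, a 16x multiplier on every query scales serving costs by roughly the same factor.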

This has also resulted in a reported degradation of quality in ChatGPT's output. All over the Internet, users have reported that the quality of even ChatGPT Plus' responses has gone down. We found a release note for ChatGPT that seems to confirm this, which stated, "We've updated performance of the ChatGPT model on our free plan in order to serve more users". In the same note, OpenAI also informed users that Plus subscribers would be defaulted to the "Turbo" variant of the model, which has been optimised for inference speed.

API users, on the other hand, seem to have avoided this problem altogether. Reddit users have noticed that other products built on the OpenAI API provide better answers to their queries than even ChatGPT Plus. This might be because API traffic is far lower in volume than ChatGPT's, leading OpenAI to cut costs on ChatGPT while leaving the API untouched.

In a mad rush to get GPT-4 to market, it seems that OpenAI has cut corners. While the purported MoE model is a good step towards making the GPT series more performant, the scaling issues it faces show that the company might just have bitten off more than it can chew.
