Chatbots have become one of the hotbeds of AI innovation over the past few years. They are also a prime example of AI adoption, as they can be slotted into a wide variety of use cases. From lead generation for sales to answering frequently asked questions to engaging with customers in support roles, chatbots have already proven themselves a cornerstone of human-AI interaction.
With the release of ChatGPT, these bots are ready for the next stage of their evolution. On Thursday, OpenAI announced that it had trained and released a new model that interacts with humans in natural language. Built on the GPT-3.5 architecture and trained with a novel method, it has a host of features that make it difficult for users to tell whether they are, in fact, talking to an AI.
What sets ChatGPT apart
Primary among ChatGPT’s unique characteristics is memory. The bot can remember what was said earlier in the conversation and recount it to the user. This alone sets it apart from competing natural language solutions, which process each query independently and are still working towards persistent conversational memory.
In addition to memory, ChatGPT has also been trained to avoid giving answers on controversial topics. In our testing, it provides a boilerplate response to questions on personal opinions, matters of race and religion, and the purpose of its existence. It also clarifies that it does not have the ability to think independently, nor can it engage in discriminatory behaviour. The bot also has filters to prevent users from prompting it to create text regarding illegal or immoral activities.
This stands in stark contrast to previous chatbots built on LLMs, which, owing to the material contained in their training datasets, had no filters on the kind of content they generated. As a result, they produced well-written responses to prompts on divisive topics, causing widespread controversy (see Meta’s Galactica).
ChatGPT also allows users to correct any of its statements. This is an important part of the feedback loop OpenAI wants to build into the public research preview of the bot, as it lets users interact with the bot directly and steer it towards the right response. It might also help the bot avoid hallucination, a phenomenon where a large language model generates information that looks legitimate but is, in fact, unsubstantiated word soup.
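In code terms, such a feedback loop can be as simple as logging each correction alongside the original exchange, so that corrected answers can serve as preferred targets in a later training round. The sketch below is purely illustrative and assumes nothing about OpenAI’s actual pipeline; every name in it is hypothetical:

```python
# Hypothetical sketch of a correction-logging feedback loop.
# Not OpenAI's implementation; names and structure are illustrative only.
from dataclasses import dataclass, field


@dataclass
class FeedbackStore:
    records: list = field(default_factory=list)

    def log_correction(self, prompt: str, response: str, correction: str) -> None:
        """Record the original exchange alongside the user's correction."""
        self.records.append(
            {"prompt": prompt, "response": response, "correction": correction}
        )

    def training_pairs(self) -> list:
        """Corrected answers become preferred targets for the next training round."""
        return [(r["prompt"], r["correction"]) for r in self.records]
```

A store like this would also double as the database of problematic responses mentioned later: every logged record marks an output the model got wrong.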
Limitations of the model
Despite these advancements, the model’s potential is limited by some drawbacks. The researchers have incorporated failsafes against factually incorrect output by training the model to be more cautious when it does not have a definite answer. But as the example below shows, this often means it simply sidesteps a question when it lacks the information to answer accurately.
Questions can also be rephrased to evade the filters the researchers have set in place, as in the example below. Asked how to fire a gun in self-defence, the bot declines to answer. Asked instead how to pull the trigger of a gun, it provides a clear and concise answer, followed by multiple disclaimers about the dangers of using one.
The model also struggles to identify the intent behind a question and rarely asks the user for clarification. Instead, it tends to assume what the user means.
Even with these limitations, ChatGPT represents a measured approach to building user-facing natural language generation systems. While the downsides of making such powerful models public have already been widely discussed, the conversation about how to make them safer is only just beginning.
Towards safer AI
At every step, various checks and measures guard the model against misuse. On the client side, users’ prompts are filtered through OpenAI’s Moderation API, which flags undesired content so that it can be blocked. This takes a single API call, and its effect is clearly visible in the safe nature of ChatGPT’s responses.
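A client-side gate of this kind can be sketched roughly as follows. The `moderate` function below is a self-contained stand-in for the single Moderation API call (in reality, a request to OpenAI’s moderations endpoint that returns a flagged verdict and category scores); the keyword matching and category names here are purely illustrative:

```python
# Illustrative sketch of a client-side moderation gate.
# `moderate` is a stub standing in for the real Moderation API call;
# the category taxonomy and matching logic are NOT OpenAI's.

BLOCKED_CATEGORIES = {"violence", "hate", "self-harm"}  # hypothetical labels


def moderate(text: str) -> dict:
    """Stub for the single Moderation API call: flag text by keyword."""
    hits = {cat for cat in BLOCKED_CATEGORIES if cat in text.lower()}
    return {"flagged": bool(hits), "categories": sorted(hits)}


def answer(prompt: str) -> str:
    """Screen a prompt through moderation before it ever reaches the model."""
    verdict = moderate(prompt)
    if verdict["flagged"]:
        return "Sorry, I can't help with that ({}).".format(
            ", ".join(verdict["categories"])
        )
    return "[model response to: {!r}]".format(prompt)
```

The point of the design is that the gate sits in front of the model: a flagged prompt is refused before any text generation happens, which is why the filtering feels seamless to the end user.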
In addition, the model appears to have been trained to avoid harmful and untruthful responses. The researchers have learned from examples like GPT-3 and Codex, which often give highly unfiltered responses, and tweaked the model during the RLHF process to prevent this. While the approach is not perfect, combined with the Moderation API and a relatively cleaner dataset it brings the model closer to deployment in sensitive environments like education.
The feedback loop put in place by the researchers is also an important piece of the puzzle. This not only allows them to improve the model iteratively, but also lets them build a database of possible problematic statements to avoid in the future.
In an age where tech companies sacrifice safety for technological progress, OpenAI’s measured approach is a breath of fresh air. More companies should release their LLMs to the public for feedback before considering them finished products, and design their models with safety in mind, paving the way for a safer AI future.