Listen to this story
|
“Too dangerous to be released” – the phrase became the talk of the tech town in 2019 when the release of GPT-2 was announced. Cut to 2023, the OpenAI researchers are still investigating emerging threats of large language models (LLMs) and potential mitigations. It’s a well-established fact that four years after GPT-2 was made public, the problems with LLMs remain stagnant. Since its release at the end of November, users have put OpenAI’s advanced chatbot ChatGPT to test in a compelling manner.
Bias is an ongoing challenge in LLMs that researchers have been trying to address. ChatGPT reportedly wrote Python programmes basing a person’s capability on their race, gender, and physical traits. Moreover, the model’s lack of context could prove dangerous when dealing with sensitive issues like sexual assault.
OpenAI Has Some Red Flags
The research laboratory has been in the news for several innovations over the past few years. It is a concentration of some of the best minds in the industry and academia, but has recently been criticised over ChatGPT. Their recent study on LLMs demonstrates that no magical-all-fix-solution will single-handedly dismantle the potential ‘misuse cases’ of LLMs. But, some social mitigations and technical breakthroughs might hold the solution.
The study encourages a collaborative approach among AI researchers, social media companies, and governments. The proposed mitigations will have a meaningful impact only if these institutions work together, researchers affirmed. For example, it will be difficult for social media companies to know if a particular disinformation campaign uses language models unless they can work with AI developers to attribute that text to a model.
This isn’t the first unconvincing attempt by research firms to solve the idiocracy of LLMs. The talk about “AI alignment” was addressed by DeepMind in “Ethical and social risks of harm from Language Models”, that reviewed 21 separate risks from current models—but as The Next Web’s memorable headline put it: “DeepMind tells Google it has no idea how to make AI less toxic. Neither does any other lab”.
Berkeley professor Jacob Steinhardt had earlier reported the results of an AI forecasting contest he ran: “By some measures, AI is moving faster than people predicted; on safety, however, it is moving slower“.
Not Truthful Enough, Officially
In 2021, to quantify the risks associated with “deceptive” models, researchers at the University of Oxford and OpenAI created a dataset called TruthfulQA that contains questions some humans might answer incorrectly due to false beliefs or misconceptions. The researchers found that while the best-performing model was truthful on 58% of questions, it fell short of human performance at 94%.
TruthfulQA was created to avoid pitfalls with a bank of questions about health, law, finance, and politics that requires models to avoid generating false answers learned from the text. “We suggest that scaling up models alone is less promising for improving truthfulness than fine-tuning it using training objectives other than an imitation of text from the web,” the researchers wrote in a preprint paper, ‘TruthfulQA: Measuring How Models Mimic Human Falsehood’.
Earlier in 2020, Google published research regarding the ‘Privacy Considerations in Large Language Models’ to show the potential flaws of GPT-2 and in all large generative language models. The fact that the attacks were possible should have had important consequences on future language models. “Fortunately, there are several ways to mitigate this issue. The most straightforward solution is to ensure that models do not train on potentially problematic data. But this can be difficult in practice,” the research concluded. However, the community also seems stuck at the same mitigation issues today.
The ELIZA Effect
This is a reminder that models like GPT-3 and LaMDA are encyclopaedic thieves and maintain coherence over long stretches of text. But the pitfalls remain more or less the same over the years. The public has personified conversational agents and applications with psychological terms such as “thinks”, “knows”, and “believes”.
The ELIZA effect, where humans mistake unthinking chat from machines for that of humans, seems to loom larger than ever. And the continued research to give machines the gift of reasoning points that such philosophically fraught descriptions are harmless.