Uncensored Models Outperform Aligned Language Models

Do you really want a chatbot to withhold the information you ask for, just to stay aligned?

When we hear the word “uncensored” in AI, we think of models that can be harmful and possibly biased. While that can absolutely be true, these uncensored models are increasingly outperforming their aligned counterparts, potential for harm notwithstanding.

According to the Open LLM Leaderboard on Hugging Face, uncensored models like Wizard-Vicuna-13B-Uncensored-HF, whose developer recently got into trouble for releasing the model to the public, have been competing with LLaMA and Falcon and rank among the top models.

Over-finetuning a model can handicap its capabilities, a phenomenon known as the alignment tax. When a model goes through round after round of tuning with humans in the loop trying to make it as aligned and as “politically correct” as possible, it loses a lot of its performance. Much of that loss comes from reinforcement learning from human feedback (RLHF).

The alignment tax is the performance an AI system gives up in order to stay aligned: the more a model is constrained to safe, balanced outputs, the more raw capability it sheds compared to an unaligned, or uncensored, version.
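To make the trade-off concrete, here is a minimal, illustrative sketch of the kind of objective RLHF optimises, which is where the tax shows up: the policy is rewarded for pleasing a learned reward model but penalised, via a KL-style term, for drifting away from the pretrained model. Every number below (the toy tensors, the reward score, the beta coefficient) is a made-up stand-in, not output from any real training run.

```python
import torch

torch.manual_seed(0)
vocab_size, seq_len = 8, 5

# Log-probabilities that the fine-tuned policy and the frozen pretrained
# (reference) model assign across the vocabulary at each position.
policy_logprobs = torch.log_softmax(torch.randn(seq_len, vocab_size), dim=-1)
ref_logprobs = torch.log_softmax(torch.randn(seq_len, vocab_size), dim=-1)

# Tokens of one sampled response, plus the reward model's scalar score for it.
tokens = torch.randint(0, vocab_size, (seq_len,))
reward_model_score = 1.7  # made-up value

# KL-style penalty: how far the policy has drifted from the base model
# on the tokens it actually produced.
positions = torch.arange(seq_len)
drift = (policy_logprobs[positions, tokens] - ref_logprobs[positions, tokens]).sum()

# RLHF maximises: please the reward model, but pay for moving away from the
# pretrained distribution. The smaller beta is, the harder the policy chases
# the reward model, and the more of the base model's behaviour it overwrites.
beta = 0.1
objective = reward_model_score - beta * drift.item()
print(f"reward={reward_model_score:.2f}, drift penalty={drift.item():.2f}, "
      f"objective={objective:.2f}")
```

Tuning beta is the bargain the article describes: a loose penalty lets the policy over-optimise the reward model and pay the alignment tax, while a tight one preserves the base model's capabilities but limits how aligned the result can be.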

Too much training

There is no doubt that OpenAI’s decision to use RLHF to train its GPT models gave birth to the much-hyped and much-loved ChatGPT. But even then, according to the GPT-4 paper, the model’s answers were more accurate, and its confidence in them better calibrated, before the researchers fine-tuned it with RLHF.
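Calibration here means the probability a model assigns to an answer matches how often that answer is actually right. A toy sketch of the idea, with invented numbers rather than anything from the GPT-4 report:

```python
import numpy as np

# Toy illustration of calibration: does stated confidence match accuracy?
# The GPT-4 report shows the base model's confidence tracking its accuracy
# closely, with RLHF weakening that link. These numbers are invented.
confidence = np.array([0.95, 0.90, 0.80, 0.75, 0.60, 0.55])
correct = np.array([1, 1, 1, 0, 1, 0])

# A well-calibrated model answering at 80% average confidence is right
# about 80% of the time; the gap below is a crude one-bin calibration error.
gap = abs(confidence.mean() - correct.mean())
print(f"mean confidence={confidence.mean():.2f}, "
      f"accuracy={correct.mean():.2f}, gap={gap:.2f}")
```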

The Sparks of AGI paper by Microsoft Research illustrates this phenomenon with GPT-4. The paper describes how, at an early stage of development, the model performed far better than the final version fine-tuned with RLHF. Even though the model is now more aligned and gives more balanced answers, its ability to answer was stronger before.

In a presentation of the paper, Sebastien Bubeck, one of its lead authors, described the problems that appeared after GPT-4 was trained for safety. Using the prompt “draw the unicorn” as an example, he showed how the quality of the output degraded significantly once the model was aligned for safety.

Many Reddit users have shared similar experiences in a post. In ChatGPT’s early days, it used to produce much better results. But after people started jailbreaking ChatGPT, OpenAI added more guardrails and restrictions to address the issue, and responses grew poorer over time.

Meta AI recently released LIMA, a LLaMA 65B model fine-tuned with plain supervised learning on just 1,000 carefully curated prompts, comparing what pre-training on raw data plus light tuning can achieve against large-scale instruction tuning and RLHF. According to the paper, LIMA’s responses were judged equivalent to or better than GPT-4’s in 43% of cases. Though the model is not exactly uncensored, the result suggests that RLHF might be hindering ChatGPT’s performance.
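For contrast with the RLHF objective above, here is a minimal sketch of the LIMA-style recipe: plain supervised fine-tuning on roughly 1,000 curated prompt–response pairs, with no reward model and no RLHF. The checkpoint name, data file, and hyperparameters are placeholder assumptions, not the paper’s actual artefacts.

```python
# Minimal LIMA-style supervised fine-tuning sketch (no reward model, no RLHF).
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "huggyllama/llama-65b"  # assumption: any causal-LM checkpoint works
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA tokenizers ship no pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# ~1,000 curated {"prompt": ..., "response": ...} records (hypothetical file).
data = load_dataset("json", data_files="curated_1k.json")["train"]

def tokenize(example):
    text = example["prompt"] + "\n" + example["response"]
    return tokenizer(text, truncation=True, max_length=2048)

train_set = data.map(tokenize, remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="lima-style-sft",
                           num_train_epochs=15,  # the LIMA paper fine-tunes for 15 epochs
                           per_device_train_batch_size=1),
    train_dataset=train_set,
    # Causal-LM collator: the labels are simply the input tokens.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The point of the recipe is its simplicity: with curation doing the heavy lifting, there is no human-feedback loop for an alignment tax to accrue in.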

Is poor performance a fair bargain for alignment?

As the world heads towards more AI regulation, it is important for models to be aligned with what developers and users want. Filtering out the misinformation these models are capable of producing requires humans in the loop who can bring hallucinating models back on track.

These models are essentially built on internet data. Alongside a great deal of useful information, that data is inadvertently scraped from misinformation-peddling websites too, which leads the models to produce falsehoods that, by all means, should be controlled. On the other hand, do you really want your chatbot to withhold the information you ask for? Even though ChatGPT-like models are designed by their developers not to spew controversial or misleading content, a lot of users have criticised the model for being too “woke”.

A paper titled Scaling Laws for Reward Model Overoptimisation explains how the preferences RLHF optimises for induce biases that hurt the models’ performance against the ground truth. Some people compare over-tuning a model to a lobotomy.
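The paper captures the effect in a simple functional form: as the policy’s distance from its initialisation grows (measured as d, the square root of the KL divergence), the true “gold” reward first rises and then falls, even while the proxy reward model’s score keeps climbing. A quick illustrative computation, with made-up coefficients rather than the paper’s fitted values:

```python
import numpy as np

# Gao et al. fit the gold (ground-truth) reward under RL fine-tuning as
# R(d) = d * (alpha - beta * log d), where d = sqrt(KL(policy, init)).
# alpha and beta below are illustrative guesses, not the paper's fits.
alpha, beta = 1.0, 0.3
d = np.linspace(0.01, 30.0, 600)
gold_reward = d * (alpha - beta * np.log(d))

# Past the peak, pushing harder against the proxy reward model keeps
# raising the proxy score while the true reward quietly degrades.
peak = d[np.argmax(gold_reward)]
print(f"gold reward peaks at d ≈ {peak:.2f}, then falls as optimisation continues")
```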

For and against

Since these uncensored models have been outperforming many of their censored counterparts, a case can be made that they should be allowed, and used to build chatbots. Uncensored models that do not filter out responses their creators would consider unsafe could also be valuable for researchers and scientists who want to explore the full capabilities of the field.

But, on the other hand, it comes with plenty of problems: misuse, bad actors, and AI models built with malicious intent. While we can make a case that the open source community is responsible enough not to misuse unaligned models, there is no guarantee of it.


Moreover, aligning models to represent a single viewpoint might not be the way forward. If OpenAI’s chatbot is too woke, there should be an alternative available for developers to build their own versions of ChatGPT.

Imagine if China releases a chatbot so aligned with the government’s beliefs that it is unable to speak openly or criticise anything in the country. Even if developers control the datasets and perform as much RLHF as they can, what is the possibility that such an AI model would be perfectly aligned and never spew out anything that its creators don’t want it to?
