Uncensored Models Outperform Aligned Language Models

Do you really want a chatbot that withholds the information you ask for, just to stay aligned?

When we hear the word “uncensored” in AI, we think of models that can be harmful and possibly biased. While that can absolutely be true, these uncensored models are increasingly outperforming their aligned counterparts, even if they remain potentially harmful.

According to the Open LLM Leaderboard on Hugging Face, uncensored models such as Wizard-Vicuna-13B-Uncensored-HF, whose developer recently got into trouble for releasing it to the public, have been competing with LLaMA and Falcon and rank among the top models.

Over-finetuning a model can handicap its capabilities, a phenomenon known as the alignment tax. When a model goes through round after round of human-in-the-loop tuning to make it as aligned and as “politically correct” as possible, it loses a lot of its raw performance. Much of this comes from reinforcement learning from human feedback (RLHF).

The alignment tax is the extra capability an AI system gives up in order to stay aligned: compared with an unaligned, or uncensored, model, the aligned version ends up with poorer performance.
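
To see where the trade-off comes from, here is a minimal sketch of the kind of objective RLHF optimises: the model is rewarded for pleasing a learned reward model, but penalised (via a KL term) for drifting away from the base model. The function name, the KL coefficient and the toy numbers below are illustrative assumptions, not values from any real training run.

```python
def rlhf_objective(reward, kl_to_base, kl_coef=0.1):
    """Simplified per-response RLHF objective: score from the learned
    reward model, minus a penalty for drifting away from the base model."""
    return reward - kl_coef * kl_to_base

# Toy illustration of the alignment tax: an over-tuned response pleases
# the reward model slightly more, but sits much further from what the
# capable base model would naturally produce.
close_to_base = rlhf_objective(reward=0.6, kl_to_base=2.0)    # 0.40
heavily_tuned = rlhf_objective(reward=0.9, kl_to_base=15.0)   # -0.60
print(close_to_base, heavily_tuned)
```

Pushing the reward term too hard relative to that penalty is, roughly, the mechanism behind the “tax” described above.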

Too much training

There is no doubt that OpenAI’s decision to use RLHF to train its GPT model gave birth to the much-hyped and loved ChatGPT. But even then, according to the GPT-4 technical report, the model was more accurate, more factual and better calibrated in its confidence before the researchers decided to use RLHF for fine-tuning.

The Sparks of AGI paper by Microsoft Research explains this GPT-4 phenomenon. It describes how, at an early stage of development, the model performed far better than the final version fine-tuned with RLHF. Even though the model now gives more aligned and balanced answers, its ability to answer was stronger before.

In a presentation of the paper, Sebastien Bubeck, one of its lead authors, described the problems that appeared after GPT-4 went through further training. He gave the example of the prompt, “draw the unicorn”, and explained how the quality of the output degraded significantly after the model was aligned for safety.

A similar pattern was reported by many Reddit users in a post. In the initial days after ChatGPT’s release, it used to provide much better results. But after people started jailbreaking ChatGPT, OpenAI added more guardrails and restrictions to address the issue, and responses got worse over time.

Meta AI recently released LIMA, a LLaMA 65B model fine-tuned on just 1,000 carefully curated prompts and responses, with no RLHF at all, to test how far a model pre-trained on raw unsupervised data can go without large-scale instruction tuning. According to the paper, LIMA’s responses were equivalent to or preferred over GPT-4’s in 43% of cases. Though LIMA is not an uncensored model as such, the result clearly suggests that RLHF might be hindering ChatGPT’s performance.
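
For context, the LIMA recipe is essentially plain supervised fine-tuning on a small, hand-curated instruction set, with no reward model and no RLHF step. The sketch below shows that shape using Hugging Face’s transformers and datasets libraries; the dataset file, field names, base checkpoint and hyperparameters are placeholders for illustration, not the values Meta used.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

# ~1,000 curated prompt/response pairs stored locally as JSON Lines
# (file name and field names are placeholders).
dataset = load_dataset("json", data_files="curated_1k.jsonl", split="train")

model_id = "huggyllama/llama-65b"  # placeholder base checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_id)

def tokenize(example):
    # Concatenate prompt and response into one training sequence.
    return tokenizer(example["prompt"] + "\n" + example["response"],
                     truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="lima-style-sft",
                           num_train_epochs=15,
                           per_device_train_batch_size=1,
                           learning_rate=1e-5),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # no reward model, no RLHF: just supervised fine-tuning
```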

Is poor performance a fair bargain for alignment?

As the world heads towards more AI regulation, it is important for models to be aligned with what developers and users want. To filter out the misinformation these models are capable of producing, it is necessary to have humans in the loop who can bring hallucinating models back on track.

These models are essentially built on internet data. Apart from a lot of necessary information, that data is also inadvertently scraped from websites that spread misinformation. The result is models that put out falsehoods, which by all means should be controlled. On the other hand, do you really want your chatbot to withhold the information you ask for? Even though ChatGPT-like models are designed by their developers not to spew controversial or misleading content, a lot of users have criticised the model for being too “woke”.

A paper titled “Scaling Laws for Reward Model Overoptimisation” explains how optimising against RLHF preference models induces bias that hinders the models’ ground-truth performance. Some people compare over-tuning a model to a lobotomy of the brain.
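
The shape of that overoptimisation is easy to picture. The sketch below uses a functional form of the kind the paper fits for RL, where the “gold” (true) reward first rises and then falls as the policy drifts further, in KL terms, from its initialisation; the coefficients are made-up illustrative values, not the paper’s fitted ones.

```python
import math

def gold_reward(kl, alpha=1.0, beta=0.5):
    """Gold-reward score as a function of how far (in KL) the policy has
    drifted from its initialisation, roughly the RL form the paper fits:
    R(d) = d * (alpha - beta * log d), with d = sqrt(KL).
    Coefficients here are invented for illustration."""
    d = math.sqrt(kl)
    return d * (alpha - beta * math.log(d)) if d > 0 else 0.0

for kl in (0.5, 2, 10, 25, 50, 100):
    print(f"KL = {kl:>5}: gold reward ~ {gold_reward(kl):+.2f}")
# The proxy reward model keeps scoring the policy higher as optimisation
# continues, but the gold reward peaks and then degrades: overoptimisation.
```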

For and against

Since these uncensored models have been outperforming many censored ones, a case can be made that they should be allowed and used to build chatbots. Censored models that filter out any response their creators deem unsafe might not be suitable for researchers and scientists who want to explore a model’s full capabilities.

But on the other hand, uncensored models come with a lot of problems: misuse, harmful actors, and AI systems built with malicious intent. While a case can be made that the open-source community is responsible enough not to misuse unaligned models, there is no guarantee of that.

Moreover, aligning models to represent a single viewpoint might not be the way forward. If OpenAI’s chatbot is too woke, there should be an alternative available for developers to build their own versions of ChatGPT.

Imagine if China releases a chatbot so aligned with the government’s beliefs that it cannot speak openly or criticise anything in the country. Even if developers try to control the datasets and apply as much RLHF as they can, what are the chances that such an AI model would be perfectly aligned and never spew out anything that its creators don’t want it to?

Mohit Pandey
Mohit dives deep into the AI world to bring out information in simple, explainable, and sometimes funny words. He also holds a keen interest in photography, filmmaking, and the gaming industry.
