Large language models have become the internet’s hottest commodity. The trend ignited by OpenAI’s ChatGPT is being carried forward by open-source models, as OpenAI declines to share the details of its own. Even though they cannot be used commercially, two models released in March – Vicuna and Alpaca – have caught the AI community’s attention.
Meta has broken the mould and shown its dedication to the academic community by open-sourcing its latest model, LLaMA. The weights of the model are available to researchers upon request, setting the stage for the newest contenders in the AI realm. Stanford’s Alpaca and Vicuna-13B, which is a collaborative work of UC Berkeley, CMU, Stanford, and UC San Diego researchers, gained momentum soon after their release.
GitHub and Code
The training code for both Vicuna and Alpaca is publicly available. Vicuna is fine-tuned on roughly 70k user-shared ChatGPT conversations, while Alpaca is trained on 52k instruction-following samples generated via self-instruct from OpenAI’s text-davinci-003 API.
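Alpaca’s released dataset stores each sample as a JSON record with `instruction`, `input` and `output` fields. The sketch below illustrates that schema and renders records into training prompts; the records and the `to_prompt` template here are simplified, hypothetical stand-ins (the actual Stanford prompt template is worded differently):

```python
# A minimal sketch of Alpaca-style instruction-tuning records, assuming
# the instruction/input/output JSON schema of the released dataset.
# The records and the prompt template below are illustrative, not the
# exact text used by the Stanford researchers.
records = [
    {
        "instruction": "Give three tips for staying healthy.",
        "input": "",  # empty when the instruction needs no extra context
        "output": "1. Eat a balanced diet. 2. Exercise regularly. 3. Sleep well.",
    },
    {
        "instruction": "Summarize the text below.",
        "input": "Large language models are trained on web-scale corpora...",
        "output": "LLMs learn from massive web text.",
    },
]

def to_prompt(rec):
    """Render one record into a single training-prompt string."""
    if rec["input"]:
        return (f"Instruction: {rec['instruction']}\n"
                f"Input: {rec['input']}\n"
                f"Response: {rec['output']}")
    return (f"Instruction: {rec['instruction']}\n"
            f"Response: {rec['output']}")

prompts = [to_prompt(r) for r in records]
```

The two-branch template mirrors how the dataset distinguishes instructions that stand alone from those that operate on a provided input.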
At the time of writing, Vicuna had 13.3k GitHub stars and Alpaca had 20.2k. The repositories contain the weights, fine-tuning code and data-generation code; an API is also available for Vicuna. Check out Vicuna and Alpaca’s GitHub repositories.
The Vicuna researchers evaluated their model using GPT-4 as a judge, while Alpaca was evaluated by one of its authors. However, evaluating AI chatbots is like judging a fish on its ability to climb a tree: many factors must be weighed, including language skills, reasoning and understanding of context. The models were evaluated across nine categories, ranging from common sense to maths.
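The GPT-4-as-judge setup boils down to prompting GPT-4 to score two candidate answers, then parsing the numeric scores from its reply. The sketch below shows that flow in miniature; the prompt wording, reply format and `parse_scores` helper are hypothetical simplifications, not the exact evaluation prompt used by the Vicuna team:

```python
import re

def judge_prompt(question, answer_a, answer_b):
    """Build a hypothetical GPT-4 judge prompt comparing two answers.
    The wording is illustrative, not the actual Vicuna evaluation prompt."""
    return (
        "You are an impartial judge of AI assistants.\n"
        f"Question: {question}\n"
        f"Assistant A: {answer_a}\n"
        f"Assistant B: {answer_b}\n"
        "Rate each assistant on a scale of 1-10. "
        "Reply strictly in the form 'A: x/10, B: y/10'."
    )

def parse_scores(reply):
    """Extract the two numeric scores from a judge reply such as
    'A: 7/10, B: 10/10'."""
    match = re.search(r"A:\s*(\d+)/10.*?B:\s*(\d+)/10", reply)
    if not match:
        raise ValueError("unparseable judge reply")
    return int(match.group(1)), int(match.group(2))

# In the real setup, the prompt would be sent to the GPT-4 API and the
# model's reply fed to parse_scores; here we parse a sample reply.
scores = parse_scores("A: 7/10, B: 10/10")
```

Averaging such scores per category over a set of questions yields the category-level comparison described above.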
As per GPT-4, Alpaca scored 7/10 in ‘writing’ while Vicuna-13B got a 10/10. The reason: Alpaca provided an overview of the requested travel blog post but did not actually compose it, hence the lower score. Vicuna, on the other hand, composed a detailed blog post about a recent trip to Hawaii, highlighting cultural experiences and must-see attractions, fully addressing the user’s request and earning the higher score.
Despite their capabilities, both models have their limitations. Vicuna is particularly vulnerable to training-data contamination, and new benchmarks may be needed to test it reliably.
In comparison, Alpaca’s answers are typically shorter than ChatGPT’s, reflecting text-davinci-003’s more concise outputs. The model also exhibits problems common to language models, including hallucination, toxicity and stereotyping. Hallucination in particular seems to be a frequent failure mode for Alpaca, even compared with text-davinci-003. For instance, Alpaca wrongly states that the capital of Tanzania is Dar es Salaam, which was the capital only until 1974, when it was replaced by Dodoma. The researchers noted that Alpaca likely has other limitations associated with both the underlying language model and the instruction-tuning data.