Large language models have become the internet’s hottest commodity. The trend sparked by OpenAI’s ChatGPT is now being carried forward by open-source models, as OpenAI declines to share the details of its own. Even though they cannot be used commercially, two models released in March – Vicuna and Alpaca – have managed to catch the AI community’s attention.
Meta has broken the mould and shown its commitment to the academic community by open-sourcing its latest model, LLaMA. The model’s weights are available to researchers upon request, setting the stage for the newest contenders in the AI realm. Stanford’s Alpaca and Vicuna-13B, a collaboration between researchers at UC Berkeley, CMU, Stanford, and UC San Diego, gained momentum soon after their release.

GitHub and Code
The best part about both models is that they are open source. The worst is that their terms of use do not let users commercialise them. The models have also made headlines for their low price tags: training Vicuna’s 7B and 13B-parameter versions cost $140 and $300, respectively, while Alpaca’s 7B-parameter model required $500 for data and $100 for training.
Vicuna and Alpaca’s training code is available for public use. Vicuna is trained on around 70k user-shared conversations. In contrast, Alpaca is trained on 52k samples generated through self-instruction using OpenAI’s text-davinci-003 API.
While this article was being written, Vicuna had 13.3k GitHub stars, while Alpaca had 20.2k. The repositories contain the weights, fine-tuning code, and data-generation code; an API is also available for Vicuna. Check out Vicuna and Alpaca’s GitHub repositories.
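To give a flavour of what self-instruction means in practice, here is a minimal Python sketch of the core bootstrapping step: asking text-davinci-003 to invent new tasks in the style of a few seeds. This is an illustration only, not Stanford’s actual pipeline (which lives in the Alpaca repository); the seed tasks, the prompt wording, and the helper name generate_new_instructions are invented for the example, and the code assumes the 2023-era openai SDK (pre-1.0) with a valid API key.

```python
import openai  # assumes the 2023-era openai SDK (pre-1.0) with the Completion endpoint

openai.api_key = "sk-..."  # placeholder; set your own key

# Two hand-written seed tasks for illustration. Alpaca's actual pipeline
# starts from 175 human-written seed instructions and grows them to 52k.
seed_tasks = [
    "Give three tips for staying healthy.",
    "Explain why the sky is blue.",
]

def generate_new_instructions(seeds, n=5):
    """Ask text-davinci-003 to propose fresh instructions in the style of
    the seeds: the core bootstrapping step of self-instruction."""
    prompt = (
        "Come up with new task instructions.\n"
        "Here are some examples:\n"
        + "\n".join(f"- {s}" for s in seeds)
        + f"\nList {n} new, diverse instructions, one per line:"
    )
    resp = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=256,
        temperature=0.9,  # a high temperature encourages diverse tasks
    )
    # Split the completion into one instruction per line
    return [line.lstrip("- ").strip()
            for line in resp.choices[0].text.splitlines() if line.strip()]

print(generate_new_instructions(seed_tasks))
```

In the full recipe, the generated instructions are answered by the same model and the resulting instruction-response pairs become the fine-tuning data.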
GPT-4 thinks…
While releasing Vicuna, the researchers evaluated it using GPT-4, whereas Alpaca was evaluated by its authors. Evaluating AI chatbots, however, is like trying to judge a fish on its ability to climb a tree: many things need to be considered, including language skills, reasoning, and understanding of context. The models were evaluated across nine categories, ranging from common sense to maths.
As per GPT-4, Alpaca scored 7/10 and Vicuna-13B got a 10/10 in ‘writing’. The reason: Alpaca provided an overview of the requested travel blog post but did not actually compose it, hence the lower score. Vicuna, on the other hand, composed a detailed blog post about a recent trip to Hawaii, highlighting cultural experiences and must-see attractions, which fully addressed the user’s request and earned the higher score.
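The idea behind Vicuna’s evaluation is to use GPT-4 as a judge: show it a question together with two models’ answers and ask it to score both. The snippet below is a minimal sketch of that idea, not the researchers’ exact prompt or pipeline (their evaluation code is in the Vicuna repository); the prompt wording, the helper name gpt4_judge, and the pre-1.0 openai SDK usage are assumptions for illustration.

```python
import openai  # again assuming the 2023-era openai SDK (pre-1.0)

openai.api_key = "sk-..."  # placeholder; set your own key

# An illustrative judging prompt; the researchers' actual prompt differs.
JUDGE_PROMPT = """[Question]
{question}

[Assistant 1's answer]
{answer_1}

[Assistant 2's answer]
{answer_2}

Rate the helpfulness, relevance, accuracy and level of detail of each
answer on a scale of 1 to 10. Output the two scores on the first line,
separated by a space, then a short explanation on the next line."""

def gpt4_judge(question, answer_1, answer_2):
    """Ask GPT-4 to score two chatbots' answers to the same question."""
    resp = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            question=question, answer_1=answer_1, answer_2=answer_2)}],
        temperature=0,  # deterministic scoring for repeatability
    )
    return resp.choices[0].message["content"]

print(gpt4_judge(
    "Write a travel blog post about a recent trip to Hawaii.",
    "Here is an overview of what such a post could cover...",
    "Aloha! Last month I finally made it to the Big Island...",
))
```

One caveat the researchers themselves acknowledge: a judge model can be biased, so its scores are a convenient proxy rather than a rigorous benchmark.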
Despite their capabilities, both models have their limitations. Vicuna is particularly vulnerable to training-data contamination, and new benchmarks may need to be created to test it reliably.
In comparison, Alpaca’s answers are typically shorter than ChatGPT’s, reflecting text-davinci-003’s shorter outputs. The model also exhibits problems common to language models, including hallucination, toxicity, and stereotyping. Hallucination, in particular, seems to be a common failure mode for Alpaca, even when compared to text-davinci-003. For instance, Alpaca wrongly states that the capital of Tanzania is Dar es Salaam, which was the capital only until 1974, when it was replaced by Dodoma. The researchers note that Alpaca likely has other limitations associated with both the underlying language model and the instruction-tuning data.
In conclusion, while both Vicuna and Alpaca have their strengths and limitations, it is essential to evaluate which model aligns with a particular project’s requirements. Vicuna’s user-shared training conversations and GPT-4-based assessment are advantageous, while Alpaca’s self-instruction from the text-davinci-003 API is a distinctive feature. Although the terms of use restrict commercialisation, the open-source nature of Vicuna and Alpaca remains valuable.