Meta’s AI research team has garnered a positive reputation for open-sourcing its models. The latest, LLaMA, made its weights available to academics and researchers on a case-by-case basis. However, one of those parties leaked the weights via GitHub, giving programmers all over the world open access to their first GPT-level LLM.
The developer community has since had a field day with this model, optimising it to run on the lowest-powered devices, adding functionality to the model, and even using it to create some new use cases for LLMs. The open-source community is the biggest multiplier for AI research, and developers are the reason behind it.
Optimising the model
When LLaMA launched, budding LLM enthusiasts found that the 7-billion-parameter version required more than 16GB of VRAM to run. The community quickly found ways to cut the model’s memory requirements. The first step was llama.cpp, a community project that rewrote the model’s inference code in C++.
This, along with a community effort to quantise the weights, allowed the model to run on a wide range of hardware. One programmer even ran the 7B model on a Google Pixel 5, generating 1 token per second. llama.cpp was then ported to Rust, allowing for faster inference on CPUs, but the community was just getting started.
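Quantisation is what made those memory savings possible: instead of storing every weight as a 16- or 32-bit float, each block of weights is stored as small integers plus one shared scale factor. The sketch below illustrates the idea in NumPy; the function names and block handling are illustrative and do not reproduce llama.cpp’s actual quantisation formats.

```python
import numpy as np

def quantize_block(weights, bits=4):
    """Map a block of float weights to signed integers plus a scale.

    Illustrative block-wise quantisation: the largest magnitude in the
    block maps to the edge of the integer range.
    """
    levels = 2 ** (bits - 1) - 1            # 7 for 4-bit signed values
    scale = np.abs(weights).max() / levels  # one float stored per block
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize_block(q, scale):
    # Recover approximate float weights at inference time.
    return q.astype(np.float32) * scale

w = np.random.randn(32).astype(np.float32)
q, s = quantize_block(w)
w_hat = dequantize_block(q, s)
# Each 4-bit weight takes a quarter of fp16's memory; the price is a
# rounding error of at most half the scale factor per weight.
```

Real schemes pack two 4-bit values per byte and tune block sizes carefully, but the memory arithmetic is the same: 7B parameters at 4 bits is roughly 3.5GB, which is how the model fit on a phone.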
Researchers at Stanford University then created another model, fine-tuned from LLaMA 7B. Using 52,000 instruction-following demonstrations generated with GPT-3.5, they trained LLaMA to produce outputs similar to OpenAI’s model. What’s more, the model, called Alpaca, cost under $600 to build, far less than the millions of dollars it takes to train such models from scratch.
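Each of those demonstrations is a small instruction/input/output record that gets rendered into a text prompt for fine-tuning. A sketch of the format, loosely following the template Stanford published with Alpaca (the exact wording here is an approximation, not a verbatim copy):

```python
# One record from an Alpaca-style instruction dataset; the "input"
# field is optional context and may be empty.
example = {
    "instruction": "Classify the sentiment of the sentence.",
    "input": "I loved this movie.",
    "output": "Positive",
}

def to_prompt(record):
    """Render a record into the prompt string used during fine-tuning."""
    prompt = (
        "Below is an instruction that describes a task"
        + (", paired with an input that provides further context"
           if record["input"] else "")
        + ". Write a response that appropriately completes the request.\n\n"
        + f"### Instruction:\n{record['instruction']}\n\n"
    )
    if record["input"]:
        prompt += f"### Input:\n{record['input']}\n\n"
    prompt += "### Response:\n"
    return prompt

# During training, the model learns to continue this prompt with
# record["output"]; at inference time, the "### Response:" header cues it
# to answer the instruction.
text = to_prompt(example)
```

Because the whole recipe is just 52K such records plus a few GPU-hours of supervised fine-tuning, reproducing it is within reach of a modest budget, which is exactly what made Alpaca so influential.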
Alpaca marked a democratisation of LLMs, bringing LLaMA to the masses. By cutting the fine-tuning cost to a few hundred dollars and open-sourcing the result, Alpaca put the power of LLMs in the hands of developers all over the world, who soon began extending it.
Once the model was open-sourced and researchers began to harness the power of Alpaca, programmers and developers started to see the use cases of this LLM. While it began small, with one developer using Alpaca to create a Homer Simpson bot, the model soon found many useful applications.
User ‘LXE’ on GitHub created a simple WebUI that allowed anyone in the community to fine-tune the model on their own text. Similarly, user ‘Sahil280114’ created CodeAlpaca, a code-generation model fine-tuned from Alpaca. LlamaIndex, a project to connect LLMs with external data, also migrated from GPT to LLaMA thanks to its open-source nature.
Dalai was launched as an easy way to get both Alpaca and LLaMA running on any platform with a single command, further lowering the barrier to entry for LLMs. Another model, GPT4All, built on Alpaca’s legacy: trained on around 800,000 GPT-3.5 generations, it pushed LLaMA’s capabilities further still. The use cases just kept pouring in.
Colossal-AI created a ChatGPT alternative by training LLaMA with reinforcement learning from human feedback (RLHF), and the community created LlamaHub to keep track of all the ways one can connect data to the model. The best part is that all of this happened within a month of the model’s release, showing the true power of the open-source community.
Open-source does the work
Not only has the community built on and improved the model Meta released, it has also created a host of use cases, all from one LLM. While LLMs might be all the rage right now, they are only one type of model in the vast AI landscape. Like LLaMA, another model that gained users, a community and various offshoots was Stable Diffusion.
While the open-source rise of Stable Diffusion warrants an essay of its own, suffice it to say that the model is the go-to option for image generation for developers. One only needs to look at the number of forks that the Stable Diffusion GitHub page has — over 7,600 at the time of writing — to see the impact this diffusion model has had on the open-source community.
As models get bigger, training them becomes more expensive and difficult, concentrating the power of LLMs in big-tech companies like OpenAI, Microsoft, Google and Meta. With models being open-sourced, more power shifts to the communities building products around them, eventually laying the groundwork for a free AI world.