LLaMA, Meta’s latest family of large language models, has been leaked along with its weights and is now available to download through torrents. Christopher King, a GitHub user, submitted a pull request to the LLaMA GitHub page which included a torrent link to the open model.
Meta had previously granted access to LLaMA on a case-by-case basis to prevent misuse. With the release of this torrent, the model is open to anyone on the Internet.
Following the trail of breadcrumbs, it seems that the Internet has 4chan to thank for this leak. Around 9:45 PM IST on Thursday, user ‘llamanon’ posted on 4chan’s technology board, releasing LLaMA’s 7B and 65B models via torrent. The model was released on the AI Chatbot General megathread—4chan’s central location for testing out the roleplaying capabilities of the latest AI chatbots.
This torrent link was then added as a pull request to the LLaMA GitHub page under the title ‘save bandwidth by using a torrent to distribute more efficiently’. This pull request was posted along with the Google Forms link Meta was using to provide access to the bot, seemingly a dig at the process of applying for the LLM. A second pull request was also submitted to the project which provided a torrent link to an alternate set of weights for the model.
However, one of the biggest mistakes the leaker made was including their unique identifier code in the leaked model. This code was implemented specifically to trace leaks back to individual downloaders, putting user llamanon’s personal details at risk. Even so, Meta appears to have taken no action over the past few days, as the torrent is still accessible and available to download.
Regardless, this leak was received positively by 4chan users, with the rest of the Internet finding out about it soon after. Now that the model was out of the hands of Meta’s select few researchers, it was time for the AI community at large to decode LLaMA.
Over the weekend, AI researchers all over the world sat down to try out this shiny new LLM released by Meta. They soon found that one of the model’s biggest issues was its sheer size, with the smallest version requiring close to 30 GB of GPU memory to run. However, by running it through the bitsandbytes library for 8-bit quantisation, they were able to get the model running on a single NVIDIA RTX 3060. Furthermore, a researcher on GitHub was even able to run the 7B version of the LLM on a Ryzen 7900X CPU, generating several words per second.
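The numbers above are easy to sanity-check: a 7B-parameter model in 32-bit floats needs roughly 7e9 × 4 bytes ≈ 28 GB, which is why it won’t fit on a consumer card, while 8-bit weights need only a quarter of that. The sketch below is a simplified illustration of the absmax int8 quantisation idea underlying bitsandbytes, not the library’s actual implementation (the real LLM.int8() method additionally handles outlier features in higher precision):

```python
import numpy as np

# Back-of-the-envelope memory math: fp32 vs int8 storage for 7B parameters.
params = 7_000_000_000
fp32_gb = params * 4 / 1e9   # ~28 GB: too large for a consumer GPU
int8_gb = params * 1 / 1e9   # ~7 GB: fits on a 12 GB RTX 3060

# Minimal absmax int8 quantisation: map the largest-magnitude weight
# to +/-127, store one signed byte per weight, keep one fp scale.
def quantize_int8(w):
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)  # stand-in for a weight row
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print(q.nbytes, w.nbytes)  # 4x smaller: 1024 vs 4096 bytes
# Rounding error is bounded by half a quantisation step (scale / 2).
print(float(np.abs(w - w_hat).max()) <= scale / 2)
```

In practice the library quantises per weight matrix (or per row) rather than over the whole model, so each scale tracks its own block of weights and the precision loss stays small.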
Users on 4chan have also created a handy resource for anyone looking to deploy the model on their own workstation. The guide provides a step-by-step tutorial on downloading the model and applying the modified Hugging Face weights for more efficient inference. What’s more, the resource even offers a way to integrate LLaMA into KoboldAI, an online platform for AI-assisted writing.
Even though the model was released to great fanfare on Meta’s part, initial tests showed that it was perhaps not as polished as competing offerings like ChatGPT. However, it is important to note that LLaMA is a base language model, while ChatGPT is a chatbot built on top of a fine-tuned LLM. Even so, it is able to respond to open-ended questions. Hacker News user WithinReason asked the LLM about the meaning of life, to which it replied,
“I believe the meaning of life is to be happy, and it is also to live in the moment. I think that is the most important thing. I’m not really a party girl. I’m not a girl’s girl. I have a really small group of close girlfriends and that’s all I need. I believe in equal rights for everyone. I’m not a rebel. I don’t really rebel against anything. I’m a very traditional girl, very loyal…”
Many researchers attribute this performance to the fact that the model was trained on ‘everything’, meaning it was not fine-tuned for any specific purpose. Notably, LLaMA stands apart because it was trained only on publicly available data. A cursory look at the research paper shows that 67% of the model’s pre-training data came from CommonCrawl, a dataset of crawled web pages.
This offers a possible explanation for LLaMA’s word-soup responses: the model may instead be well-suited to fine-tuning for specific use-cases. Indeed, the benchmarks released by Meta show that the 13B version of the model was able to beat the far larger GPT-3 on most tasks. This paints a positive picture for the future of LLaMA, which now has a huge chunk of the AI community exploring its inner workings.