At Microsoft’s Inspire event, Meta and Microsoft announced Llama 2, the latest version of Meta’s renowned open-source LLM, LLaMA. It comes with various improvements to performance and safety. Notably, it ships in pre-trained and fine-tuned variants with 7B, 13B, and 70B parameters, offering a substantial increase in pre-training data and using grouped-query attention (GQA) for faster inference.
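Grouped-query attention reduces memory use at inference time by letting several query heads share a single key/value head, shrinking the KV cache. Below is a minimal NumPy sketch of the idea for a single sequence, with no masking or batching; the function name and shapes are illustrative, not Llama 2’s actual implementation:

```python
import numpy as np

def grouped_query_attention(q, k, v, n_heads, n_kv_heads):
    """Toy single-sequence GQA: n_heads query heads share n_kv_heads
    key/value heads (n_heads must be a multiple of n_kv_heads)."""
    seq, d_model = q.shape
    head_dim = d_model // n_heads
    group = n_heads // n_kv_heads

    qh = q.reshape(seq, n_heads, head_dim)
    kh = k.reshape(seq, n_kv_heads, head_dim)
    vh = v.reshape(seq, n_kv_heads, head_dim)

    # Each group of query heads reuses the same shared K/V head,
    # shrinking the KV cache by a factor of n_heads / n_kv_heads.
    kh = np.repeat(kh, group, axis=1)
    vh = np.repeat(vh, group, axis=1)

    scores = np.einsum("qhd,khd->hqk", qh, kh) / np.sqrt(head_dim)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return np.einsum("hqk,khd->qhd", weights, vh).reshape(seq, d_model)
```

With 8 query heads sharing 2 KV heads, the key/value projections are a quarter of the size of standard multi-head attention, which is where the inference speed-up comes from.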
Llama 2 is available for both research and commercial use, accessible on platforms such as Microsoft Azure and Amazon SageMaker. It also runs on Windows tooling such as the Windows Subsystem for Linux (WSL), Windows Terminal, Microsoft Visual Studio, and VS Code.
The model has undergone meticulous optimisation for dialogue purposes, resulting in fine-tuned Llama 2-Chat models that set new benchmarks in the field of language processing and understanding. The collaboration between Meta and various other companies, including Amazon, Hugging Face, NVIDIA, Qualcomm, IBM, Zoom, Dropbox, and academic leaders, emphasises the importance of open-source software.
Here are some models and tools already built on Llama 2 that you can use to try out Meta’s latest offering.
Perplexity.ai is a unique chatbot platform that takes a search-engine-like approach: it scours the internet to find answers to user queries and cites the sources for the responses it generates. The platform has its own LLaMA chatbot at llama.perplexity.ai, where users can toggle between the 13-billion-parameter and 7-billion-parameter models to compare results.
Impressively, Perplexity released a chatbot built on Llama 2 within 24 hours of Meta open-sourcing the model.
Here’s the link to LLaMA by Perplexity: https://labs.perplexity.ai/
The LLaMA Chat, built on Llama 2, is currently in an experimental phase and is exclusively accessible via http://labs.pplx.ai. However, it is not available on their mobile apps at the moment.
One of the remarkable features of Perplexity is its generous offering: the Llama 2 models with 70 billion, 13 billion, and 7 billion parameters are all available for free, letting users experiment with and leverage these large language models. The chatbot also supports a context length of 4,096 tokens, enabling it to handle longer and more complex inputs and produce detailed, informative responses.
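A fixed context length like 4,096 tokens means a chat client has to trim older conversation history once it no longer fits. A hedged sketch of one common strategy, keeping the most recent messages that fit (this is not Perplexity’s actual implementation, and the whitespace token counter is a crude stand-in for a real tokenizer):

```python
def truncate_to_context(messages, max_tokens=4096,
                        count_tokens=lambda s: len(s.split())):
    """Keep the most recent messages that fit in the context window.

    Walks the history newest-first, accumulating token cost, and stops
    at the first message that would overflow the budget.
    """
    kept, total = [], 0
    for msg in reversed(messages):
        cost = count_tokens(msg)
        if total + cost > max_tokens:
            break
        kept.append(msg)
        total += cost
    return list(reversed(kept))  # restore chronological order
```

Real clients would count tokens with the model’s own tokenizer and usually pin the system prompt rather than letting it fall off the front.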
Overall, Perplexity.ai presents a novel approach to chatbots by incorporating search engine capabilities, and its fast adoption of Meta’s Llama 2 AI model showcases its commitment to providing cutting-edge technology and free access to users for experimentation.
Andrej Karpathy, former director of AI at Tesla, took on the ambitious task of implementing the Llama 2 architecture in pure C, a departure from his earlier GPT-2-focused work. The primary objective was to demonstrate that complex language models can run on resource-constrained devices through a minimalistic C implementation, and the model achieved impressive inference rates even on hardware with limited computational resources.
Here’s the GitHub link: https://github.com/karpathy/llama2.c
To accomplish this, Karpathy used his nanoGPT project as a starting point and trained a Llama 2 model with approximately 15 million parameters. Remarkably, the C implementation achieved an inference speed of around 100 tokens per second on an M1 MacBook Air, showing that compact models can run without powerful GPUs.
The “Baby Llama” approach involved training the Llama 2 architecture from scratch in PyTorch. Karpathy then wrote a concise C program, “run.c”, dedicated to inference. The emphasis was on a low memory footprint and zero external library dependencies, which let the model run on a single M1 laptop without GPU acceleration. Karpathy also explored various compilation flags to squeeze more performance out of the C code.
This highlights the tremendous potential of leveraging C code to run sophisticated language models on resource-constrained devices, a domain not traditionally associated with machine learning applications.
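At its core, run.c is a plain autoregressive loop: run a forward pass, pick the next token from the logits, append it, and repeat. A toy Python sketch of that loop follows; the `toy_forward` stand-in below is invented for illustration and is not Karpathy’s model:

```python
import numpy as np

def sample_loop(forward, prompt_tokens, steps, temperature=1.0, rng=None):
    """Autoregressive sampling in the spirit of run.c: one forward pass
    per step, then greedy argmax (temperature 0) or softmax sampling."""
    rng = rng or np.random.default_rng(0)
    tokens = list(prompt_tokens)
    for _ in range(steps):
        logits = forward(tokens)              # stand-in forward pass
        if temperature == 0.0:
            nxt = int(np.argmax(logits))      # greedy decoding
        else:
            scaled = logits / temperature
            p = np.exp(scaled - scaled.max())
            p /= p.sum()
            nxt = int(rng.choice(len(p), p=p))
        tokens.append(nxt)
    return tokens

# Toy "model": always prefers the token after the last one (mod vocab).
VOCAB = 32
toy_forward = lambda toks: np.eye(VOCAB)[(toks[-1] + 1) % VOCAB] * 10.0
```

The real run.c does the same thing with a hand-written transformer forward pass over memory-mapped weights, which is why it needs no external libraries.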
Poe, a chatbot platform, has recently added support for several Llama 2 models, including Llama-2-70b, Llama-2-13b, and Llama-2-7b. Among these, Poe recommends Llama-2-70b as it delivers the highest quality responses. The platform boasts unique features that make it stand out from others. Poe is possibly the only consumer product allowing users to employ Llama on native iOS or Android apps, upload and share files, and continue conversations seamlessly.
Unlike other chatbot platforms like ChatGPT or Google Bard, Poe does not create its own language models. Instead, it provides users with access to various pre-existing models. Some of Poe’s official bots include Llama 2, Google PaLM 2, GPT-4, GPT-3.5 Turbo, Claude 1.3, and Claude 2. Additionally, Poe offers an assistant bot as the default one, which is based on GPT-3.5 Turbo. Users can also create their own third-party bots with built-in prompts to fulfil specific tasks.
WizardLM models are trained on Llama-2 using brand-new Evol+ methods. The WizardLM-13B-V1.2 achieves impressive results, with a score of 7.06 on MT-Bench, 89.17% on Alpaca Eval, and 101.4% on WizardLM Eval. These models support a 4k context window and are licensed under the same terms as Llama-2. The core contributors are currently working on the 65B version and plan to empower WizardLM with the capability to perform instruction evolution autonomously, making it cost-effective for adapting to specific data.
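The Evol-Instruct family of methods behind WizardLM works by prompting an LLM with rewriting templates that make a seed instruction progressively harder, then training on the evolved instructions. A hedged sketch of what such a template step might look like (the wording below is illustrative, not WizardLM’s exact prompt):

```python
# Illustrative evolution template; a real pipeline would send the filled
# prompt to an LLM and keep its output as a new, harder instruction.
DEEPEN_TEMPLATE = (
    "Rewrite the following instruction into a more complex version by "
    "adding one extra constraint or requirement, keeping it answerable.\n\n"
    "#Instruction#:\n{instruction}\n\n#Rewritten Instruction#:"
)

def build_evolve_prompt(instruction, template=DEEPEN_TEMPLATE):
    """Fill an evolution template with a seed instruction."""
    return template.format(instruction=instruction)
```

Repeating this over several rounds, with filters to discard degenerate rewrites, is what makes the approach cheap to adapt to new domains without hand-writing complex instructions.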
Additionally, they have released WizardCoder-15B-V1.0, which outperforms other models on the HumanEval benchmark. The WizardLM-13B-V1.0 model has also achieved the top rank among open-source models on the AlpacaEval leaderboard. Performance comparisons show that WizardLM models consistently beat LLaMA models of the same size, particularly on NLP foundation and code-generation tasks, and WizardLM-30B outperforms Guanaco-65B.
Overall, WizardLM represents a significant advancement in large language models, particularly in following complex instructions and achieving impressive performance across various tasks.
Stable Beluga 2
Stable Beluga 2 is an open-access LLM based on the Llama 2 70B foundation model. It showcases remarkable reasoning capabilities across various benchmarks. The model is fine-tuned with supervised fine-tuning (SFT) on a synthetically generated dataset in standard Alpaca format, and its performance compares favourably with GPT-3.5 on certain tasks. Researchers attribute this to the rigorous synthetic-data training approach, making Stable Beluga 2 a significant milestone among open-access LLMs.
The training set is an internal Orca-style dataset, and training used supervised fine-tuning in mixed precision (BF16) with the AdamW optimiser; detailed hyperparameters for the procedure are outlined alongside the model. Conversations are started using the code snippets and prompt format provided with the release.
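The model card’s snippets assemble prompts in a `### System / ### User / ### Assistant` layout. A small helper that builds such a prompt follows; the exact section headers are taken from the published format, but the default system message here is illustrative:

```python
def beluga_prompt(user_message,
                  system_message="You are a helpful, honest assistant."):
    """Assemble a chat prompt in the '### System / ### User /
    ### Assistant' layout used by Stable Beluga's model card.
    The default system text above is an illustrative placeholder."""
    return (
        f"### System:\n{system_message}\n\n"
        f"### User:\n{user_message}\n\n"
        f"### Assistant:\n"
    )
```

The string returned by this helper would be tokenised and passed to the model; generation then continues from the trailing `### Assistant:` header.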
‘Luna AI Llama2 Uncensored’ is an advanced chat model based on Llama 2, fine-tuned on more than 40,000 lengthy chat discussions. Tap, the creator of Luna AI, led the fine-tuning process, producing an improved Llama 2 7B model that competes effectively with ChatGPT on various tasks.
Here’s the link to LunAI: https://huggingface.co/TheBloke/Luna-AI-Llama2-Uncensored-GGML
What sets this model apart are its extended responses (it can generate detailed and comprehensive answers), its low hallucination rate (it produces less fabricated or incorrect information), and its lack of censorship mechanisms, allowing open and unrestricted communication.
For training, an 8x A100 80GB machine was used for the fine-tuning process. The model was predominantly trained on synthetic outputs, meaning the training data was generated rather than collected solely from existing human conversations. This custom dataset was curated from diverse sources and comprised multiple rounds of conversations between humans and AI.
Redmond-Puffin-13B is a pioneering language model based on Llama-2, fine-tuned by Nous Research. The fine-tuning process involved a meticulously crafted dataset containing 3,000 high-quality examples. Many of these examples were designed to fully utilise the 4096 context length capability of Llama-2. LDJ took the lead in both training the model and curating the dataset, while J-Supha made significant contributions to dataset formation.
Here’s the link to the model: https://huggingface.co/TheBloke/Redmond-Puffin-13B-GGML
The computational resources for this project were generously sponsored by Redmond AI, and Emozilla provided valuable assistance during the training experiments, helping to address various issues encountered during the process. Moreover, Caseus and Teknium were recognised for their contributions to resolving specific training issues.
The model, named Redmond-Puffin-13B-V1.3, was trained for multiple epochs on the carefully curated dataset of 3,000 GPT-4 examples. These examples mainly comprised extensive conversations between real humans and GPT-4, allowing the model to grasp complex contexts effectively. Additionally, the training data was enriched with relevant sections extracted from datasets such as CamelAI’s Physics, Chemistry, Biology, and Math.