Not even halfway through the year, we have already seen back-to-back Large Language Models (LLMs) being released by tech giants. A month ago, Google’s 540-billion parameter Pathways Language Model (PaLM) came out, followed by DeepMind’s Chinchilla. Now, Meta AI has released Open Pretrained Transformer (OPT-175B), a language model with 175 billion parameters trained on publicly available data sets. Meta says that with this release, it aims to build more community engagement in understanding LLMs. In terms of performance, Meta claims that OPT-175B is comparable to GPT-3 but requires only 1/7th of the carbon footprint to develop.
There are so many LLMs. What makes this one special?
Is it just another LLM or does it set itself apart in some way? OPT is generating a lot of buzz globally as the model release is under a noncommercial license. In a move usually not followed by big tech companies, the access to the model will be given to academic researchers, those affiliated with organisations in the government, civil society, academia and industry research laboratories around the world. Meta goes as far as to call this “Democratising access to large-scale language models, to maintain integrity and prevent misuse.”
And while we're here, I'd like to add: "democratizing" is used in such a misleading way in "AI". "Democracy" entails shared *governance* not just "anyone can come play with it". (From the title of the Meta blog post.)https://t.co/cq10EDnnhn
— Emily M. Bender (@emilymbender) May 3, 2022
Along with this release, Meta is also releasing a suite of smaller-scale baseline models, trained on the same data set for researchers to study the effect of scale alone. Meta is releasing all the notes documenting the development process, including the full logbook detailing the day-to-day training process.
Subscribe to our Newsletter
Join our editors every weekday evening as they steer you through the most significant news of the day, introduce you to fresh perspectives, and provide unexpected moments of joy
Your newsletter subscriptions are subject to AIM Privacy Policy and Terms and Conditions.
Public models do exist, just not at this scale
Though not of this magnitude, companies have open-sourced their models in the past as well.

- GPT-NeoX-20B by EleutherAI – It is a 20 billion parameter autoregressive language model whose weights, training and evaluation code are open-source. During its submission, the researchers claimed that “to the best of our knowledge, it is the largest dense autoregressive model that is publicly available.”
- Salesforce Research – This paper’s researchers proposed a conversational program synthesis approach through large language models, which solves the challenges of “searching over a vast program space and user intent specification faced in prior approaches”. They train a family of LLMs called CodeGen on natural language and programming language data. CodeGen outperforms OpenAI’s Codex on the HumanEval benchmark. The training library JaxFormer, including checkpoints, is open-source.
- BigScience Research workshop – The BigScience project is an open collaboration boot-strapped by HuggingFace, GENCI and the Institute for Development and Resources in Intensive Scientific Computing (IDRIS). This project wants to change the fact that most of the control of the transformative changes that LLMs can bring is in the hands of tech mammoths, and wants to open up more participation in LLM research.
Why not grant access to everyone?
When such LLMs are released by tech leaders, most like to give access to their innovations to “selected” people, organisations and big research labs. LLMs are plagued with issues such as bias, toxicity, robustness, etc., as we are aware. How are these issues to be solved without open access to everyone? While giving access to a vast number of people and entities is a bold move on Meta AI’s part, the access is still made available on request, which has made some people question its intentions.
Why would @MetaAI not fully open source OPT-175b ?
— Sébastien Toth (@TothSebastien) May 3, 2022
What's the worse that could happen? A massive bunch of start-ups trying new innovations (and still failing) ?
Still, it's a huge step forward and a key contribution to OS.https://t.co/h25IFVL19U
Shortly after Meta’s announcement of OPT-175B, many enthusiasts who wanted to tinker with the model had a similar grievance. Some commentators on discussion forums wrote that they do not like the “available on request” clause. They questioned and complained that if someone is not an academic or researcher but is still interested if they will be considered for access. People also pondered about the minimum requirements needed to access it.
At least big tech is talking about the environmental impact of LLMs
“Training a single BERT base model (without hyperparameter tuning) on GPUs was estimated to require as much energy as a trans-American flight,” said the paper titled “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?” The environmental impact of building such kinds of models is huge and it is high time that tech firms acknowledge it and work on ways to reduce such effects.
Image: Carbon Emissions and Large Neural Network Training (arxiv.org)
So, the development of this new LLM (OPT-175B) had an estimated carbon emissions footprint (CO2eq) of 75 metric tons https://t.co/rhrTx1Hcu6 pic.twitter.com/KML51oSEzy
— Panos Bozelos (@BozelosP) May 3, 2022
It's great to see new AI papers discussing the carbon footprint of their models. Despite the very rough estimation, OPT-175B is presented as an alternative to GPT-3, with 1/7 of the carbon footprint 🌿
— Luís Cruz (@luismcruz) May 3, 2022
🔗pre-print: https://t.co/I4xjgEGRLp
Well done, @MetaAI#GreenAI
Limitations of OPT
Meta believes that this technology is premature for commercial deployment.
Let us look at a few of the limitations the model carries:
- OPT does not work well with declarative instructions or point-blank interrogatives. It also tends to be repetitive and can get stuck in a loop.
- When testing was conducted with the RealToxicityPrompts data set, OPT-175B showed a higher toxicity rate than either PaLM or Davinci.
Image: Meta
- It also has a high propensity to generate toxic language and reinforce harmful stereotypes, even when provided with a relatively innocuous prompt (Gehman et al., 2020), and adversarial prompts are trivial to find.