Not even halfway through the year, we have already seen back-to-back releases of Large Language Models (LLMs) by tech giants. A month ago, Google’s 540-billion-parameter Pathways Language Model (PaLM) came out, followed by DeepMind’s Chinchilla. Now, Meta AI has released Open Pretrained Transformer (OPT-175B), a language model with 175 billion parameters trained on publicly available data sets. Meta says that with this release, it aims to build more community engagement in understanding LLMs. In terms of performance, Meta claims that OPT-175B is comparable to GPT-3 while requiring only one-seventh of the carbon footprint to develop.
There are so many LLMs. What makes this one special?
Is it just another LLM, or does it set itself apart in some way? OPT is generating a lot of buzz globally because the model is being released under a noncommercial license. In a move unusual for big tech companies, access to the model will be granted to academic researchers and to those affiliated with government, civil society and industry research laboratories around the world. Meta goes as far as to call this “democratising access to large-scale language models, to maintain integrity and prevent misuse.”
Along with this release, Meta is also releasing a suite of smaller-scale baseline models, trained on the same data set for researchers to study the effect of scale alone. Meta is releasing all the notes documenting the development process, including the full logbook detailing the day-to-day training process.
Public models do exist, just not at this scale
Though not at this magnitude, companies have open-sourced their models in the past as well.
- GPT-NeoX-20B by EleutherAI – A 20-billion-parameter autoregressive language model whose weights, training code and evaluation code are open source. At the time of its release, the researchers claimed that “to the best of our knowledge, it is the largest dense autoregressive model that is publicly available.”
- CodeGen by Salesforce Research – Salesforce researchers proposed a conversational program synthesis approach using large language models, which addresses the challenges of “searching over a vast program space and user intent specification faced in prior approaches”. They trained a family of LLMs called CodeGen on natural language and programming language data. CodeGen outperforms OpenAI’s Codex on the HumanEval benchmark. The training library JaxFormer, including checkpoints, is open source.
- BigScience Research workshop – The BigScience project is an open collaboration bootstrapped by HuggingFace, GENCI and the Institute for Development and Resources in Intensive Scientific Computing (IDRIS). The project aims to change the fact that most control over the transformative changes LLMs can bring lies in the hands of tech giants, and to open up LLM research to broader participation.
Why not grant access to everyone?
When tech leaders release such LLMs, most grant access to their innovations only to “selected” people, organisations and big research labs. As we are aware, LLMs are plagued by issues such as bias, toxicity and a lack of robustness. How are these issues to be solved without open access for everyone? While giving access to a vast number of people and entities is a bold move on Meta AI’s part, access is still granted only on request, which has led some to question Meta’s intentions.
Shortly after Meta’s announcement of OPT-175B, many enthusiasts who wanted to tinker with the model voiced a similar grievance. Some commentators on discussion forums wrote that they did not like the “available on request” clause. They asked whether someone who is not an academic or researcher but is still interested would be considered for access. People also wondered what the minimum requirements for access were.
At least big tech is talking about the environmental impact of LLMs
“Training a single BERT base model (without hyperparameter tuning) on GPUs was estimated to require as much energy as a trans-American flight,” noted the paper titled “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?” The environmental impact of building such models is huge, and it is high time that tech firms acknowledged it and worked on ways to reduce these effects.
Limitations of OPT
Meta itself believes that the technology is not yet ready for commercial deployment.
Let us look at a few of the limitations the model carries:
- OPT does not work well with declarative instructions or point-blank interrogatives. It also tends to be repetitive and can get stuck in a loop.
- When testing was conducted with the RealToxicityPrompts data set, OPT-175B showed a higher toxicity rate than either PaLM or Davinci.
- It also has a high propensity to generate toxic language and reinforce harmful stereotypes, even when provided with a relatively innocuous prompt (Gehman et al., 2020), and adversarial prompts are trivial to find.