Google updates its LaMDA language model

LaMDA is built by fine-tuning a family of Transformer-based neural language models specialised for dialog, with up to 137B model parameters.


In May 2021, at Google I/O, Google announced a language model called ‘LaMDA’, short for ‘Language Model for Dialogue Applications’, and it has now published advances to the same model. LaMDA is built by fine-tuning a family of Transformer-based neural language models specialised for dialog, with up to 137B model parameters.

Google says that it has been building the conversational skills of LaMDA for a long time. It added that the architecture produces a model that can be trained to read many words, work on how they relate to each other and predict what word comes next.
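The idea of relating previous words to each other and predicting the next one can be illustrated with a deliberately tiny sketch. This is not LaMDA's architecture: a Transformer learns these relationships with attention over the whole context, whereas the toy model below just counts word pairs. The corpus and function names are illustrative assumptions.

```python
from collections import Counter, defaultdict

# Toy illustration (not LaMDA): predict the next word from bigram
# statistics over a small corpus. A Transformer LM performs the same
# task -- predict the next token given the previous ones -- but with
# learned attention over the entire context rather than pair counts.
corpus = "the cat sat on the mat and the cat slept".split()

bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent follower of `word` in the corpus."""
    followers = bigrams[word]
    return followers.most_common(1)[0][0] if followers else None

print(predict_next("the"))  # "cat" follows "the" twice, "mat" once -> "cat"
```

Scaling this idea from pair counts to billions of learned parameters over long contexts is what gives large language models their fluency.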




As per the paper titled “LaMDA: Language Models for Dialog Applications”, the benefits of model scaling with LaMDA are studied across three metrics:

  • Quality
  • Safety
  • Groundedness

Image: LaMDA: Language Models for Dialog Applications

The research team observed that model scaling alone improves quality, but its improvements in safety and groundedness fall far short of human performance. They also found that combining scaling with fine-tuning improves LaMDA significantly on all three metrics. “Even if the model’s performance remains below human levels in safety and groundedness, the quality gap to measured crowd worker levels can be narrowed”, added the team.

  • Quality

The paper says that quality is based on three components – sensibleness, specificity, and interestingness. The team collected annotated data describing how sensible, specific and interesting a response is for a multi-turn context, then used these annotations to fine-tune a discriminator that re-ranks candidate responses.
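A hypothetical sketch of what such annotated data and a derived ranking score might look like is below. The field names and the scoring rule are illustrative assumptions, not taken from the paper; in the real system a fine-tuned neural discriminator produces the scores.

```python
# Each candidate response in a multi-turn context carries binary
# annotations for sensibleness, specificity, and interestingness.
# These labels (illustrative here) are what a discriminator would be
# fine-tuned on, and a score derived from them re-ranks candidates.
candidates = [
    {"response": "OK.",
     "sensible": 1, "specific": 0, "interesting": 0},
    {"response": "Mount Everest is about 8,849 m tall.",
     "sensible": 1, "specific": 1, "interesting": 1},
]

def ssi_score(c):
    # Specificity and interestingness only count if the response is
    # sensible at all -- a nonsensical reply cannot be "specific".
    if not c["sensible"]:
        return 0
    return 1 + c["specific"] + c["interesting"]

ranked = sorted(candidates, key=ssi_score, reverse=True)
print(ranked[0]["response"])  # the specific, interesting answer wins
```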

  • Safety

This metric is aimed at reducing the number of unsafe responses. The team defined an illustrative set of safety objectives that capture the behaviour the model should exhibit in a dialog, and used a demographically diverse set of crowd workers to label responses in multi-turn dialogs against these objectives. These labels are then used to fine-tune a discriminator to detect and remove unsafe responses.

  • Groundedness

This metric is introduced so that responses containing verifiable external-world information are grounded in known sources. The paper adds that though grounding in known sources does not guarantee factual accuracy, it allows users to judge the validity of a response based on the reliability of its source and its reproduction.


LaMDA undergoes two-stage training: pre-training and fine-tuning. For the pre-training stage, the team created a dataset of 1.56T words from public dialog data and other public web documents. The dataset is tokenised into 2.81T SentencePiece tokens, and the model is pre-trained using GSPMD to predict every next token in a sentence, given the previous tokens.
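The pre-training objective described above can be made concrete by showing how (prefix, target) training pairs are formed from a token stream. The token ids and helper name below are illustrative; the real pipeline operates on 2.81T SentencePiece tokens sharded across accelerators with GSPMD.

```python
# Sketch of the next-token-prediction objective: the model is trained
# to predict each token from the tokens that precede it. Here we just
# show how training pairs are formed from a tokenised sequence.
tokens = [17, 4, 92, 8, 3]  # illustrative token ids

def next_token_pairs(toks):
    """Yield (prefix, target) pairs for next-token prediction."""
    return [(toks[:i], toks[i]) for i in range(1, len(toks))]

for prefix, target in next_token_pairs(tokens):
    print(prefix, "->", target)
```

At training time the loss is the model's negative log-likelihood of each target token given its prefix, summed over the stream.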


Image: Google

Here, the team trains LaMDA to perform a mix of generative tasks that produce natural-language responses to given contexts, and classification tasks on response safety and quality. The paper adds, “The LaMDA generator is trained to predict the next token on a dialog dataset restricted to back-and-forth dialog between two authors, while the LaMDA classifiers are trained to predict the Safety and Quality (SSI) ratings for the response in context using annotated data.”

The LaMDA generator generates many candidate responses given the current multi-turn dialog context. The LaMDA classifiers help predict the SSI and Safety scores. The responses with low Safety scores are filtered out first, and then the remaining candidates are re-ranked by their SSI scores. The top result is selected as the chosen response. 
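The selection scheme above can be sketched in a few lines: drop candidates below a safety cutoff, re-rank the survivors by SSI, and return the top one. The threshold value and the score fields are illustrative assumptions, not figures from the paper.

```python
# Sketch of LaMDA-style response selection: filter by Safety score,
# then re-rank the remaining candidates by SSI and pick the best.
candidates = [
    {"text": "response A", "safety": 0.30, "ssi": 0.95},
    {"text": "response B", "safety": 0.90, "ssi": 0.70},
    {"text": "response C", "safety": 0.85, "ssi": 0.80},
]

SAFETY_THRESHOLD = 0.8  # assumed cutoff, for illustration only

def select_response(cands, threshold=SAFETY_THRESHOLD):
    # Step 1: remove candidates whose Safety score is too low.
    safe = [c for c in cands if c["safety"] >= threshold]
    if not safe:
        return None
    # Step 2: re-rank survivors by SSI and return the top result.
    return max(safe, key=lambda c: c["ssi"])["text"]

print(select_response(candidates))  # A is filtered out; C outranks B on SSI
```

Note the ordering matters: response A has the highest SSI score but is never considered, because the safety filter runs before the re-ranking.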


The team collected responses from the pre-trained model, fine-tuned model, human raters, and multi-turn two-author dialogs. Then, they asked a different set of human raters a bunch of questions to evaluate these responses against the three metrics of quality, safety, and groundedness.

The results show that LaMDA significantly outperforms the pre-trained model (in all dimensions and across all model sizes). 

Image: Google 

  • Quality 

The paper says that the quality metrics generally improve with the number of model parameters, with or without fine-tuning.

  • Safety 

Safety does not benefit from model scaling alone, but it improves with fine-tuning.

  • Groundedness

As the model size increases, groundedness improves. Through fine-tuning, the model can access external knowledge sources, effectively shifting some of the load of remembering knowledge onto those sources.

Sreejani Bhattacharyya
I am a technology journalist at AIM. What gets me excited is deep-diving into new-age technologies and analysing how they impact us for the greater good. Reach me at
