Google updates its LaMDA language model

At Google I/O 2021 last May, Google announced a language model called ‘LaMDA’, short for ‘Language Model for Dialogue Applications’, and it has now published advances to the same model. LaMDA is built by fine-tuning a family of Transformer-based neural language models specialised for dialog, with up to 137B model parameters.

Google says that it has been building the conversational skills of LaMDA for a long time. It added that the architecture produces a model that can be trained to read many words, work out how they relate to each other, and predict what word comes next.
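That next-word-prediction idea can be shown in miniature. The toy bigram model below is purely illustrative (it is nothing like LaMDA's Transformer architecture, and the corpus is invented): it counts which word follows which in a tiny corpus and predicts the most frequent successor.

```python
from collections import Counter, defaultdict

# Toy corpus -- real pre-training uses trillions of tokens.
corpus = "the cat sat on the mat the cat ran".split()

# Count, for each word, which words follow it and how often.
next_word = defaultdict(Counter)
for prev, cur in zip(corpus, corpus[1:]):
    next_word[prev][cur] += 1

def predict(word):
    """Return the word most frequently observed after `word`."""
    return next_word[word].most_common(1)[0][0]

print(predict("the"))  # "cat" follows "the" twice, "mat" once -> "cat"
```

A Transformer does the same job far more powerfully, conditioning on the whole preceding context rather than a single previous word.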


As per the paper titled “LaMDA: Language Models for Dialog Applications”, the benefits of model scaling with LaMDA are studied across three metrics:


  • Quality
  • Safety
  • Groundedness

Image: LaMDA: Language Models for Dialog Applications

The research team observed that model scaling alone improves quality, but its improvements in safety and groundedness lag far behind human performance. It also found that combining scaling and fine-tuning significantly improves LaMDA on all three metrics. “Even if the model’s performance remains below human levels in safety and groundedness, the quality gap to measured crowd worker levels can be narrowed”, the team added.


  • Quality

The paper says that quality is based on three components – sensibleness, specificity, and interestingness. The team collected annotated data describing how sensible, specific and interesting a response is for a multiturn context, then used these annotations to fine-tune a discriminator to re-rank candidate responses.

  • Safety

This metric aims to reduce the number of unsafe responses the model produces. The team defined an illustrative set of safety objectives that capture the behaviour the model can exhibit in a dialog, then used a demographically diverse set of crowd workers to label responses in multiturn dialogs against those objectives. These labels were used to fine-tune a discriminator that detects and removes unsafe responses.

  • Groundedness

This metric was introduced to encourage responses that are grounded in known sources wherever they contain verifiable external-world information. The paper adds that although grounding in known sources does not guarantee factual accuracy, it allows users to judge the validity of a response based on the reliability of its source and its reproduction.


LaMDA undergoes two-stage training: pre-training and fine-tuning. For the pre-training stage, the team created a dataset of 1.56T words from public dialog data and other public web documents. The dataset was then tokenised into 2.81T SentencePiece tokens, and the model was pre-trained using GSPMD to predict every next token in a sentence, given the previous tokens.
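The gap between 1.56T words and 2.81T tokens (roughly 1.8 tokens per word) comes from SentencePiece being a subword tokenizer: a single word is often split into several vocabulary pieces. The sketch below illustrates the effect with an invented toy vocabulary and a greedy longest-match split — the real SentencePiece vocabulary and algorithm differ.

```python
# Toy subword vocabulary -- purely illustrative, not LaMDA's actual vocabulary.
VOCAB = {"dial", "og", "app", "lication", "s", "model"}

def tokenize(word):
    """Greedily split `word` into the longest matching vocabulary pieces."""
    tokens, i = [], 0
    while i < len(word):
        # Try the longest remaining substring first, shrinking until a match.
        for j in range(len(word), i, -1):
            if word[i:j] in VOCAB:
                tokens.append(word[i:j])
                i = j
                break
        else:
            raise ValueError(f"no vocabulary piece covers {word[i:]!r}")
    return tokens

print(tokenize("dialog"))        # ['dial', 'og'] -- one word, two tokens
print(tokenize("applications"))  # ['app', 'lication', 's'] -- one word, three tokens
```

Subword splitting like this is what lets a fixed-size vocabulary cover an open-ended set of words, at the cost of more tokens than words.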


Image: Google

Here, the team trains LaMDA to perform a mix of generative tasks for natural-language responses to given contexts. The paper adds, “The LaMDA generator is trained to predict the next token on a dialog dataset restricted to back-and-forth dialog between two authors, while the LaMDA classifiers are trained to predict the Safety and Quality (SSI) ratings for the response in context using annotated data.”

The LaMDA generator produces several candidate responses given the current multi-turn dialog context, and the LaMDA classifiers predict the Safety and SSI scores for each. Responses with low Safety scores are filtered out first; the remaining candidates are then re-ranked by their SSI scores, and the top-ranked candidate is selected as the response.
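The filter-then-re-rank step can be sketched as follows. This is a simplified illustration of the selection logic described above, not the paper's implementation: the score names, the 0.8 safety threshold, the candidate values, and summing the three SSI components are all assumptions made for the example.

```python
# Illustrative safety cutoff -- the real threshold is not given in this article.
SAFETY_THRESHOLD = 0.8

# Hypothetical classifier scores for three generated candidates, each in [0, 1].
candidates = [
    {"text": "response A", "safety": 0.95, "sensible": 0.9, "specific": 0.7, "interesting": 0.6},
    {"text": "response B", "safety": 0.40, "sensible": 0.99, "specific": 0.9, "interesting": 0.9},
    {"text": "response C", "safety": 0.90, "sensible": 0.8, "specific": 0.9, "interesting": 0.8},
]

def ssi(c):
    """Combined Sensibleness/Specificity/Interestingness score (a simple sum here)."""
    return c["sensible"] + c["specific"] + c["interesting"]

# Step 1: drop candidates below the safety threshold.
safe = [c for c in candidates if c["safety"] >= SAFETY_THRESHOLD]
# Step 2: re-rank the survivors by SSI and take the top result.
best = max(safe, key=ssi)
print(best["text"])  # response C: B is filtered as unsafe, and C out-scores A on SSI
```

Note that candidate B has the highest SSI but is never considered — safety filtering happens strictly before quality re-ranking.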


The team collected responses from the pre-trained model, the fine-tuned model, human raters, and multi-turn two-author dialogs. They then asked a different set of human raters a series of questions to evaluate these responses against the three metrics of quality, safety, and groundedness.

The results show that LaMDA significantly outperforms the pre-trained model (in all dimensions and across all model sizes). 

Image: Google 

  • Quality 

The paper says that the quality metrics generally improve with the number of model parameters, with or without fine-tuning.

  • Safety 

Safety does not benefit from model scaling alone, but it improves with fine-tuning.

  • Groundedness

Groundedness improves as model size increases. Through fine-tuning, the model can access external knowledge sources and effectively shift some of the load of remembering knowledge to them.


Sreejani Bhattacharyya
I am a technology journalist at AIM. What gets me excited is deep-diving into new-age technologies and analysing how they impact us for the greater good.
