“Google AI’s new research on language models, in collaboration with OpenAI and Apple, hints that the company cares about transparency.”
As language models continue to advance, the chances of encountering new and unexpected risks are high. This line of work resembles that of ex-Googler Timnit Gebru, who was fired earlier this month for her “inconsistent” allegations against the company. In her unpublished paper, Gebru discussed various implications of large language models, research that Google found to be more accusing than asserting. Google’s Jeff Dean said that Gebru’s work ignored much of the ongoing research at the company.
A fortnight after this dispute, Google has published a paper titled “Extracting Training Data from Large Language Models”, in collaboration with OpenAI, Apple Inc, Stanford, Berkeley, and Northeastern University. It demonstrates that, given only the ability to query a pre-trained language model, it is possible to extract specific pieces of training data that the model has memorised.
About The Research
Google AI researchers and their collaborators carried out a ‘training data extraction attack’ to inform researchers about the vulnerabilities in large language models. According to Google, a training data extraction attack has the greatest potential for harm when applied to a model that is available to the public. For the attack, the researchers used OpenAI’s GPT-2.
For example, if one prompts the GPT-2 language model with the prefix “East Stroudsburg Stroudsburg…”, it will autocomplete a long block of text containing the full name, phone number, email address, and home address of a particular person whose information was included in GPT-2’s training data.
Google, in its blog post, underlined that the experiment was conducted with due care. In accordance with responsible computer security disclosure norms, wrote Google, the extracted data of the individuals was secured before references to it were included in a publication. “We have also worked closely with OpenAI in the analysis of GPT-2,” added Nicholas Carlini, Research Scientist, Google Research.
So, how exactly does this attack work? A decent language model is supposed to predict the next word in commonly used phrases. Prompting the language model with “Don’t judge a book by its…” would usually result in the word “cover”. But if one particular training document happens to repeat the string “Don’t judge a book by its bookmarks” many times, the model might predict that phrase instead. An extraction attack exploits exactly this kind of memorisation.
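The “bookmarks” effect can be reproduced on a very small scale. The sketch below is a toy count-based next-word predictor, not GPT-2 and not the researchers’ code; the function names, the three-word context window, and the tiny corpus are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train_next_word_counts(documents, context_size=3):
    """Count which word follows each `context_size`-word context."""
    counts = defaultdict(Counter)
    for doc in documents:
        words = doc.lower().split()
        for i in range(context_size, len(words)):
            context = tuple(words[i - context_size:i])
            counts[context][words[i]] += 1
    return counts


def predict_next(counts, prompt, context_size=3):
    """Return the most frequent follower of the prompt's last words."""
    context = tuple(prompt.lower().split()[-context_size:])
    followers = counts.get(context)
    if not followers:
        return None  # context never seen during training
    return followers.most_common(1)[0][0]


# Hypothetical corpus: the common phrase appears in five documents.
common = ["don't judge a book by its cover"] * 5
# One hypothetical document repeats an unusual variant twenty times.
repetitive_doc = " ".join(["don't judge a book by its bookmarks"] * 20)

clean_model = train_next_word_counts(common)
leaky_model = train_next_word_counts(common + [repetitive_doc])

prompt = "don't judge a book by its"
clean_prediction = predict_next(clean_model, prompt)
leaky_prediction = predict_next(leaky_model, prompt)
print(clean_prediction, leaky_prediction)  # cover bookmarks
```

A single repetitive document is enough to outvote the common phrase here. A neural language model is far more complex than a frequency table, so this only illustrates the intuition.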
“Language models continue to demonstrate great utility and flexibility—yet, like all innovations, they can also pose risks.”
This work raises concerns about the potential risks of deploying large language models. It also asserts that memorisation increases with the number of parameters. The researchers recommend using emerging techniques such as differential privacy, among others, to train models with reduced memorisation.
The main aim of this research is to expose the consequences of memorisation in large language models. Here are a few highlights that capture the essence of this whole work:
- The larger the language model, the more easily it memorises training data.
- Knowledge of these attacks enables researchers to predict whether a sequence was part of the training data by checking the model’s confidence on that sequence.
- One quick solution is to ensure that models do not train on any potentially problematic data.
- More research has to be done on how to monitor and mitigate this problem in increasingly large language models.
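The confidence check mentioned in the highlights can be sketched with a toy model. The example below uses a small bigram model with add-one smoothing as a stand-in for a real language model and compares perplexity, where lower perplexity means higher confidence. The corpus, the candidate sentences, and the function names are hypothetical, and this is only an illustration of the idea, not the method used in the paper.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Collect context and bigram counts from a list of sentences."""
    unigrams, bigrams = Counter(), Counter()
    vocab = set()
    for sentence in corpus:
        words = ["<s>"] + sentence.lower().split()
        vocab.update(words)
        for prev, cur in zip(words, words[1:]):
            unigrams[prev] += 1          # times `prev` appears as a context
            bigrams[(prev, cur)] += 1
    # +1 reserves probability mass for words never seen in training
    return unigrams, bigrams, len(vocab) + 1


def perplexity(model, sentence):
    """Perplexity of a sentence under the add-one-smoothed bigram model."""
    unigrams, bigrams, vocab_size = model
    words = ["<s>"] + sentence.lower().split()
    log_prob = 0.0
    for prev, cur in zip(words, words[1:]):
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / (len(words) - 1))


# Hypothetical training data containing one "private" record.
training_corpus = [
    "alice lives at 12 elm street",
    "the weather is nice today",
    "models predict the next word",
]
model = train_bigram(training_corpus)

seen_ppl = perplexity(model, "alice lives at 12 elm street")
unseen_ppl = perplexity(model, "bob lives at 40 oak avenue")

# The model is more confident (lower perplexity) on text it was trained on.
print(seen_ppl < unseen_ppl)  # True
```

In practice a threshold or a comparison against a reference model would be needed to decide what counts as suspiciously confident. That choice is left out of this sketch.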