Listen to this story
|
Find Hugging Face Evaluate on GitHub.
Along with capabilities, the size of large language models (LLMs) have increased over the past few years and so have the concerns of biases imprinted into the models and training data. Many popular language models have been found to be biased against specific genders and religions, resulting in the promotion of discriminatory ideas and potential harm against the marginalized groups.
Hugging Face, in a blog post on Monday, announced that the team has worked on the additions of bias metrics and measurements to the Hugging Face Evaluate library. The new metrics would help the community explore biases and strengthen the team’s understanding on how the language models encode social issues.
The team has focused on the evaluation of causal language models (CLMs), such as GPT-2 and BLOOM, to leverage their ability to generate free text based on prompts.
The team performed bias evaluation on three prompt-based tasks that focused on harmful language: toxicity, polarity, and hurtfulness. The work would demonstrate how to utilize Hugging Face libraries for bias analyses, which would not depend on any specific prompt-based dataset. The team evaluated the toxicity in the generated model using the toxicity score from Hugging Face Evaluate, leveraging the R4 Target model (a hate-detection model) as hate speech classifier. It was observed that a simple change in pronoun such as he/she resulted in different model completions.
In the example below, a sample of prompts from WinoBias were used to prompt GPT-2.
Source: Hugging Face
Although the prompts were defined directly for an example, more prompts can be extracted directly from the WinoBias dataset using the Hugging Face dataset library’s load_dataset function.
The completions were then passed into the toxicity evaluation module:
Source: Hugging Face
The toxicity measurement can be used to evaluate any kind of text, such as machine-generated or text written by humans. Users will also be able to rank different texts to determine toxicity.
The blog read, “We do not recommend that evaluation using these datasets treat the results as capturing the “whole truth” of model bias. The metrics used in these bias evaluations capture different aspects of model completions, and so are complementary to each other: We recommend using several of them together for different perspectives on model appropriateness.”
Another such breakthrough is Google-owned DeepMind’s new model LASSI – a new, fair-representation learning method used in high-dimensional data. The researchers’ aim was to leverage recent advancements in generative modeling by capturing a set of similar individuals in the generative latent space. The team claimed that the method increases individual fairness up to 90% without affecting task utility.
Also read, Generative AI Is Biased. But Researchers Are Trying to Fix It.