Bias-Variance Tradeoff is Killing Your AI Models

Bias and variance are two fundamental sources of error in AI models. The problem is that in solving one, we worsen the other


Machine learning models have transformed various industries and applications, most notably through their biggest product: large language models (LLMs). One of the key challenges in building these models is striking the right balance between bias and variance, known as the bias-variance tradeoff. This tradeoff plays a crucial role in the performance of LLMs and in aligning them with human values.

Bias and variance are two fundamental sources of error in AI models. They represent the model’s ability to capture the true underlying patterns in the data and generalise well to unseen examples.

Bias refers to the error introduced when a model oversimplifies the underlying patterns in the data. A high-bias model is too simplistic and therefore underfits: it fails to capture important relationships and complexities in the data, resulting in poor performance. Mathematically, bias is the difference between the expected value of the model’s predictions and the true value.

Bias(ŷ) = E(ŷ) – y

where E(ŷ) is the expected value (or average) of all predictions made by the model, and y is the true value.
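
As a rough illustration, assuming NumPy and some made-up predictions for a single test point (none of this comes from the article itself), the formula can be computed directly:

```python
import numpy as np

# Hypothetical numbers: predictions ŷ for one test point, collected from a
# model trained on several different training sets.
y_true = 10.0                                  # true value y
y_preds = np.array([7.8, 8.1, 7.9, 8.3, 8.0])  # the model's predictions ŷ

expected_pred = y_preds.mean()   # E(ŷ), the average prediction
bias = expected_pred - y_true    # Bias(ŷ) = E(ŷ) - y
print(f"E(ŷ) = {expected_pred:.2f}, bias = {bias:.2f}")  # consistently undershooting: high bias
```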

In the context of LLMs, high bias means the model struggles to understand the complexities and nuances of natural language, either because it is too simplistic or because it has not seen enough examples to make sense of context. Consequently, it may produce generic and imprecise outputs, limiting its practical applications.

Variance, on the other hand, measures the sensitivity of a model’s predictions to changes in the training data. A high variance model tends to overfit the training data, capturing noise and idiosyncrasies specific to that dataset. Consequently, it fails to generalise well to new, unseen data. Mathematically, variance is the spread between individual predictions and the average prediction.

Var(Y_pred) = Σ [(Y_pred[i] – Y_mean)²] / n

where Y_pred represents the model’s predicted values, Y_mean is the mean of these predictions, and n is the number of data points.
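
The same hypothetical predictions can be plugged into the variance formula; a large spread would mean the model’s output swings with every change in the training data:

```python
import numpy as np

# The same hypothetical predictions used in the bias sketch above.
y_preds = np.array([7.8, 8.1, 7.9, 8.3, 8.0])

y_mean = y_preds.mean()                                    # Y_mean
variance = np.sum((y_preds - y_mean) ** 2) / len(y_preds)  # Σ[(Y_pred[i] - Y_mean)²] / n
print(f"variance = {variance:.4f}")  # a small spread: the model is stable across training sets
```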

In the context of LLMs, high variance can lead to overconfident responses that seem plausible but lack factual accuracy or coherence, in other words, hallucination. This happens because the model has absorbed so much data that it forces its responses to align with the desired outcome of the user. This can be particularly problematic when these models are used for critical applications, such as generating medical advice or legal documents.

Finding the balance is the tradeoff

The bias-variance tradeoff is not a straightforward choice between minimising bias or variance; rather, it involves finding the right balance between the two to optimise the model’s performance. Achieving this balance is critical in building high-performing LLMs.

Bias and variance are inversely connected, and it is practically impossible to have an ML model with both low bias and low variance. When we modify an ML algorithm to better fit a given dataset, the bias drops but the variance rises: the model matches that dataset more closely while the chances of inaccurate predictions on new data increase. The reverse applies when creating a low-variance model with higher bias: the risk of wildly inconsistent predictions falls, but the model no longer matches the dataset properly. Hence, it is a delicate balance between bias and variance.
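
A minimal sketch of this tradeoff, assuming scikit-learn and synthetic data (both illustrative choices, not anything used by the models discussed here): as the polynomial degree grows, training error falls while the gap to validation error tends to widen.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic, purely illustrative data: a noisy sine curve.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    val_mse = mean_squared_error(y_val, model.predict(X_val))
    # degree 1: too simple, both errors stay high (high bias, underfitting)
    # degree 15: training error keeps shrinking, but the train-validation gap
    #            tends to widen (high variance, overfitting)
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  val MSE={val_mse:.3f}")
```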

Models like GPT have billions of parameters, enabling them to process vast amounts of data and learn intricate patterns in language. However, these models are not immune to the bias-variance tradeoff. Moreover, it is possible that the larger the model, the higher the chances of it exhibiting problems of both bias and variance.

To tackle underfitting, especially when the training data contains biases or inaccuracies, it is important to include as many examples as possible. But since these models have enormous capacity, they can memorise and regurgitate specific phrases or biases present in the training data. Consequently, they may generate content that aligns with the training data but lacks diversity or robustness in handling new inputs.

For example, if a model is trained on a biased dataset that includes stereotypical gender roles, it may generate biased responses when prompted with gender-specific queries. This can have harmful implications when the model is deployed in real-world applications, perpetuating those stereotypes.

On the other hand, over-instructing models to perfectly align with human values can lead to an overfit model that produces mundane results representing only one point of view. This often happens because of RLHF, the key ingredient for LLMs like OpenAI’s GPT, which has frequently been criticised for being too politically correct when it shouldn’t be.

To mitigate overfitting, various techniques are employed, such as regularisation, early stopping, and data augmentation. At the other extreme, LLMs with high bias may struggle to comprehend the complexities and subtleties of human language. They may produce generic and contextually incorrect responses that do not align with human expectations.

For example, an AI model with high bias might fail to understand sarcasm or humour, leading to inappropriate or nonsensical responses in certain situations. Similarly, it may struggle with understanding context and producing relevant responses in conversational settings.
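
Of the overfitting remedies mentioned above, regularisation is the simplest to sketch. Assuming scikit-learn and the same kind of synthetic data as before (hypothetical choices for illustration), increasing the Ridge penalty alpha trades a little extra bias for more stable predictions across cross-validation folds.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Same kind of synthetic data as in the earlier sketch.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)

# A deliberately over-flexible degree-15 polynomial; alpha controls the
# strength of the L2 penalty that reins in its variance.
for alpha in (1e-4, 1.0, 100.0):
    model = make_pipeline(PolynomialFeatures(15), StandardScaler(), Ridge(alpha=alpha))
    scores = cross_val_score(model, X, y, scoring="neg_mean_squared_error", cv=5)
    # Heavier regularisation adds bias, but the cross-validation error usually
    # becomes lower and less spread out (lower variance), up to a point.
    print(f"alpha={alpha:>7}  mean MSE={-scores.mean():.3f}  spread={scores.std():.3f}")
```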

To address both these issues, researchers and developers have to make a tradeoff between bias and variance and decide when to stop fine-tuning a model and when to fine-tune it further. Model architectures are continuously refined and fine-tuned using diverse datasets covering a wide range of linguistic patterns and contexts. Additionally, advancements in pre-training techniques, such as transfer learning and self-supervised learning, help to alleviate bias by exposing the model to a diverse set of linguistic patterns during the pre-training phase.
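
One common way to decide when to stop is early stopping against a held-out validation set. The sketch below uses scikit-learn’s SGDRegressor on synthetic data purely as a stand-in for a real fine-tuning loop; the data, model, and thresholds are all hypothetical.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical data; the stopping rule is the point, not the model.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
y = X[:, 0] * 2.0 + rng.normal(scale=0.5, size=500)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

model = SGDRegressor(learning_rate="constant", eta0=0.01, random_state=0)
best_val, patience, stale = float("inf"), 5, 0

for epoch in range(200):
    model.partial_fit(X_train, y_train)   # another pass over the data pushes bias down
    val_mse = mean_squared_error(y_val, model.predict(X_val))
    if val_mse < best_val - 1e-4:
        best_val, stale = val_mse, 0      # still generalising better: keep training
    else:
        stale += 1
        if stale >= patience:
            # Validation error has stopped improving; further training would
            # mostly add variance (overfitting), so this is the time to stop.
            print(f"stopping after epoch {epoch}, best validation MSE {best_val:.3f}")
            break
```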

Mohit Pandey

Mohit dives deep into the AI world to bring out information in simple, explainable, and sometimes funny words. He also holds a keen interest in photography, filmmaking, and the gaming industry.