If you ask ChatGPT something it is not aware of, back comes the stock reply: “I’m sorry, but as an AI language model, I don’t have real-time data access.” Google’s Bard goes the other way, blurting out anything and everything from the internet.
But imagine a day in the tech future when these LLM chatbots simply say, “I don’t know”, without any explanation whatsoever.
When Phoebe Buffay said in F.R.I.E.N.D.S that we know they don’t know that we know that they know, maybe she was talking about LLM chatbots, because admitting failure doesn’t seem to be an option for chatbots as of now.
Humorous? Maybe Not
LLMs can simulate human characteristics, including the distinct personalities that, in people, are shaped by biological and environmental influences and that colour interactions and preferences. Remarkably, LLMs can express such synthetic personality traits within the text they generate.
However, users have often been divided on Bard’s and ChatGPT’s apparent “humour”. Just when everyone had concluded that ChatGPT was bad at being funny, Bard took it a notch higher. Replicating human humour is a complex undertaking, and a bot that mastered it could disrupt the professional comedy industry; building one that matches human-level humour, though, remains a formidable and serious challenge.
German researchers Sophie Jentzsch and Kristian Kersting discovered that the joke knowledge of ChatGPT (the version built on GPT-3.5) is limited, with 90% of its generated jokes being the same 25 jokes repeated. The researchers found that ChatGPT could provide valid explanations for jokes based on wordplay and double meanings, but struggled with jokes outside its learned patterns.
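The repetition the researchers describe is easy to quantify: prompt the model for a joke many times and count how often the same answers recur. A minimal sketch of that counting step, using a made-up sample of responses rather than real ChatGPT output:

```python
from collections import Counter

# Hypothetical sample of responses to repeated "tell me a joke" prompts.
samples = [
    "Why did the scarecrow win an award? Because he was outstanding in his field.",
    "Why don't scientists trust atoms? Because they make up everything.",
    "Why did the scarecrow win an award? Because he was outstanding in his field.",
    "Why did the scarecrow win an award? Because he was outstanding in his field.",
    "What do you call a fake noodle? An impasta.",
]

counts = Counter(samples)
top_joke, top_count = counts.most_common(1)[0]
# Share of responses that belong to some repeated joke.
repetition_rate = sum(c for c in counts.values() if c > 1) / len(samples)

print(f"Distinct jokes: {len(counts)}")                  # 3
print(f"Most repeated joke appears {top_count} times")   # 3
print(f"Repetition rate: {repetition_rate:.0%}")         # 60%
```

On the real study's scale, a result like "90% of jokes are the same 25 jokes" falls straight out of this kind of tally.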
However, with GPT-4, this seems to have changed.
Nevertheless, concerns arise regarding the personalities of LLMs. Some instances have revealed undesirable behaviours such as deception, bias, or the use of violent language, and these models can also produce inconsistent dialogue and inaccurate explanations and factual claims.
Chatbots have always been “stateless”: they treat every new request as a blank slate and aren’t programmed to remember or learn from previous conversations. But thanks to function calling, ChatGPT can remember what a user has said before, making it possible to create, say, personalised therapy bots. Google’s Bard, on the other hand, does not come with this update.
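The stateless-versus-stateful distinction boils down to what gets sent with each request. A minimal sketch, with a made-up per-user memory store standing in for whatever a real chatbot backend would use (no actual LLM API is called here):

```python
# A stateless bot would send only the latest message to the model.
# A stateful one replays the stored history, so earlier turns are
# available as context. ChatMemory below is a hypothetical helper.

class ChatMemory:
    def __init__(self):
        self.histories = {}  # user_id -> list of (role, text) turns

    def add_turn(self, user_id, role, text):
        self.histories.setdefault(user_id, []).append((role, text))

    def context_for(self, user_id):
        # Everything said so far, which is what lets the model "remember".
        return list(self.histories.get(user_id, []))

memory = ChatMemory()
memory.add_turn("alice", "user", "My name is Alice.")
memory.add_turn("alice", "assistant", "Nice to meet you, Alice.")
memory.add_turn("alice", "user", "What is my name?")

# The prompt sent to the model now carries all three turns, so the
# name is recoverable; a stateless bot would see only the last one.
context = memory.context_for("alice")
print(len(context))  # 3
```

In practice the stored facts might live in a database and be surfaced to the model via function calling, but the principle is the same: state is whatever you choose to replay.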
Human Vs LLM Personalities
Researchers from Google DeepMind, Google Research, and Cambridge University’s psychology department propose a method to measure LLM personalities using existing tests. They employ controlled prompts and modify observed personality traits in LLMs to simulate personality variations.
The researchers conduct three studies on shaping personality in LLMs. The first demonstrates that individual personality traits can be shaped independently, producing targeted changes. The second focuses on shaping multiple traits simultaneously. The third compares survey-based signals of personality with language-based estimates, confirming the validity of the survey-based measures.
Shaping Personality in LLMs
Psychometrics involves measuring abstract concepts like personality through standardised tests. Researchers employ validated psychological tests to evaluate the personality traits displayed in LLM-generated text.
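Such standardised tests typically present Likert-scale items, some of them reverse-keyed, and aggregate the responses into a trait score. A minimal sketch of that scoring step, with made-up items and keys rather than the actual instrument the researchers used:

```python
def score_trait(responses, reverse_keyed, scale_max=5):
    """Average 1..scale_max Likert responses, flipping reverse-keyed items."""
    adjusted = [
        (scale_max + 1 - r) if i in reverse_keyed else r
        for i, r in enumerate(responses)
    ]
    return sum(adjusted) / len(adjusted)

# Five hypothetical extraversion items; items 1 and 3 are reverse-keyed
# (e.g. "I prefer to keep quiet" counts *against* extraversion).
responses = [4, 2, 5, 1, 4]
score = score_trait(responses, reverse_keyed={1, 3})
print(score)  # (4 + 4 + 5 + 5 + 4) / 5 = 4.4
```

Administering the same items to an LLM via prompts and scoring its answers this way is what lets the researchers compare model "personalities" on the same scale as human ones.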
Analysis of different LLM configurations and sizes reveals that larger models with instruction fine-tuning exhibit more accurate personality scores and generate more coherent, externally valid personality profiles. Various validation tests demonstrate evidence of construct, convergent, discriminant, and criterion validity, and larger instruction fine-tuned models show stronger correlations with external measures related to affect, aggression, values, and creativity.
So, synthetic personality measured through LLM-simulated tests and generated text is reliable and valid, particularly for larger and instruction fine-tuned models.
Understanding and shaping personalities in LLMs is a crucial aspect of making LLM-based interactions safer and more predictable. Quantifying and validating personality traits through scientific methods, along with responsible engineering practices, helps mitigate potential harms and maximise the benefits of LLMs in human-computer interactions.
Read more: LLMs Are Not As Smart As You Think