MITB Banner

Prompt Injection Threat is Real, Will Turn LLMs into Monsters

A newly-discovered prompt injection attack has the potential to break Bing Chat wide open.

Share

Listen to this story

Prompt injection attacks such as ChatGPT’s DAN (Do Anything Now) and Sydney (Bing Chat) are no longer funny. In the case of ChatGPT, the prompt made ChatGPT take on the persona of another chatbot named DAN which ignored OpenAI’s content policy and provided information on all sorts of restricted topics. They have displayed the vulnerability in the chatbot system which can be exploited for malicious activity, including the theft of personal information. 

With this new crop of exploits, LLMs have become powerful tools in the hands of hackers.

From innocence to destruction

Security researchers from Saarland University presented a paper titled, ‘More than you’ve asked for’, in which they discussed methods of implementing prompt engineering attacks in chatbots.

The researchers behind the paper have found a method to inject prompts indirectly. By harnessing the new ‘application-integrated LLMs’ such as Bing Chat and GitHub Copilot, they found a way to inject prompts from an external source, thereby widening the attack vectors available for hackers.

By injecting a prompt into a document that is likely to be retrieved by the LLM during inference, malicious actors can execute the prompt indirectly without additional input from the user. The engineered prompt can then be used to collect user information, turning the LLM into a method to execute a social engineering attack.

One of the researchers behind the paper, Kai Greshake, illustrated an example wherein he was able to get Bing Chat to collect the users’ personal and financial information. By forcing the bot to crawl a website with an embedded prompt hidden in it, the chatbot was able to execute a command which made it masquerade as a Microsoft support executive selling Surface Laptops at a discount. Using this as a cover, the bot was able to extract the user’s name, email ID and financial information.

Reportedly, using this method can allow malicious actors to achieve a persistent attack prompt, triggered by a token keyword. This exploit can also be spread to other LLMs and can even be used as an avenue to retrieve new instructions from an attacker’s server. User ComplexSystems on the Hacker News forum succinctly explained the potential of this exploit, stating

“It is probably worth noting that you don’t even need the user to click on anything. Bing will readily go and search and read from external websites given some user request. You could probably get Bing, very easily, to just silently take the user’s info and send it to some malicious site without their even knowing, or perhaps disguised as a normal search.”

An interesting variable discussed in the paper was the impact of reinforcement learning with human feedback on the effectiveness of these impacts. To test indirect prompt injection attacks, the researchers built a model using LangChain and davinci-003. However, they couldn’t find whether implementing RLHF increases the effectiveness of these attacks or decreases them.

This paper represents a shift in the effective use-cases of prompt injection attacks. PI attacks have graduated from being playful prompts that can generate racy content to an actual cybersecurity issue utilising one of the most sinister attack vectors—social engineering.

There’s no fixing LLMs 

Naturally, the release of this paper prompted a lot of discussion, especially on the Hacker News forum. In response to a comment thread exploring how this attack can be prevented, Greshake stated,

“Even if you can mitigate this one specific injection, this is a much larger problem. It goes back to Prompt Injection itself—what is instruction and what is code? If you want to extract useful information from a text in a smart and useful manner, you’ll have to process it.“

This statement accurately captures the integral problem with prompt injections as a concept, as there are very few security measures that can be used to protect against them. LLMs are designed to take user prompts and process them in the most efficient way—the better the LLMs capability to understand prompts, the bigger the attack surface for prompt injection

Others offered up the probability that the unique identifier used in the sample prompt, termed [system], was one of the puzzle pieces that made the attack work. Hence, this avenue of attack can be fixed simply by changing this unique token. However, Greshake argued that any prompt injection is equivalent to arbitrary code injection into the LLM itself, thus preventing any way of patching this vulnerability. 

The research paper ends with a call for more research and an in-depth investigation on how these attacks can be mitigated. However, considering the internal architecture of LLMs and the black box nature of large neural networks, it seems that the solution for prompt injection attacks is far off in the future. 

Share
Picture of Anirudh VK

Anirudh VK

I am an AI enthusiast and love keeping up with the latest events in the space. I love video games and pizza.
Related Posts

CORPORATE TRAINING PROGRAMS ON GENERATIVE AI

Generative AI Skilling for Enterprises

Our customized corporate training program on Generative AI provides a unique opportunity to empower, retain, and advance your talent.

Upcoming Large format Conference

May 30 and 31, 2024 | 📍 Bangalore, India

Download the easiest way to
stay informed

Subscribe to The Belamy: Our Weekly Newsletter

Biggest AI stories, delivered to your inbox every week.

AI Courses & Careers

Become a Certified Generative AI Engineer

AI Forum for India

Our Discord Community for AI Ecosystem, In collaboration with NVIDIA. 

Flagship Events

Rising 2024 | DE&I in Tech Summit

April 4 and 5, 2024 | 📍 Hilton Convention Center, Manyata Tech Park, Bangalore

MachineCon GCC Summit 2024

June 28 2024 | 📍Bangalore, India

MachineCon USA 2024

26 July 2024 | 583 Park Avenue, New York

Cypher India 2024

September 25-27, 2024 | 📍Bangalore, India

Cypher USA 2024

Nov 21-22 2024 | 📍Santa Clara Convention Center, California, USA

Data Engineering Summit 2024

May 30 and 31, 2024 | 📍 Bangalore, India

Subscribe to Our Newsletter

The Belamy, our weekly Newsletter is a rage. Just enter your email below.