
LMQL: The Cure for LLM Chatbot Hallucination?

LMQL takes a hybrid approach to prompting, combining natural language prompts with a programming language to get more accurate responses from language models


Large language models don’t always respond correctly to questions, and they are difficult to control because users cannot fully see what goes on inside them. Recently there have been many complaints about LLM chatbots hallucinating and giving unsatisfactory responses, and one good way to address this is better prompting techniques. Language Model Query Language (LMQL) tackles the problem by combining natural language prompting with simple scripting. 

Researchers from ETH Zurich wrote a paper titled ‘Prompting Is Programming: A Query Language for Large Language Models’ on the emerging discipline of clever prompting, an intuitive combination of natural language and programmatic prompts. Users can specify constraints on a language model’s output and get it to perform multiple tasks at once by providing high-level semantics. 

How does it work?

LMQL is a declarative programming language, meaning a query states what the end result of the task should be while abstracting away the control flow needed to get there. It is inspired by SQL but builds on Python, and queries can contain both natural language text and code. 
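For instance, a minimal query looks roughly like this (the model name here is only illustrative): the top-level string is the prompt, and the bracketed [ANSWER] is a hole the model fills in.

argmax
   "Q: What is the capital of France?\n"
   "A: [ANSWER]"
from
   "openai/text-davinci-003"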

The language grammar, according to the paper, has five essential parts. The decoder clause specifies the decoding algorithm used to generate text, such as greedy argmax, sampling, or beam search, which trades off the quality and diversity of the output. 

The query block, written in Python syntax, serves as the core interaction mechanism with the language model: each top-level string within it is a direct query to the model. The model (or from) clause specifies which underlying language model is used for text generation. The where clause, in turn, lets users define constraints on the generated output, forcing the model to stick to the desired qualities. 

Finally, the optional distribution instruction guides the distribution of generated values. It defines how the generated results should be distributed and presented, giving the user control over the outcome’s format and structure. 
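Put together, a sketch of a query exercising these parts (the review text and model name are illustrative assumptions) could look like this:

# decoder clause: use greedy decoding
argmax
   # query block: each top-level string is sent to the model,
   # and [CLS] is a placeholder the model fills in
   "Review: Great stay, the staff was lovely.\n"
   "The sentiment of this review is [CLS]"
# model clause: the underlying LLM to query
from
   "openai/text-davinci-003"
# distribution instruction: score only these candidate values
distribution
   CLS in [" positive", " neutral", " negative"]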

Control the interaction

For simple queries, users can guide the language model with natural language alone, but as tasks grow in complexity, or when the user needs answers to specific questions, it is better to take full control of the query. Even for something as simple as asking the model to tell a joke, a technically inclined user can fully control the shape of the result. 
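The paper itself opens with a joke-telling query along these lines (reproduced here approximately), where the where clause keeps the model inside a strict question-and-punchline format:

argmax
   "A list of good dad jokes. A indicates the punchline.\n"
   "Q: How does a penguin build its house?\n"
   "A: Igloos it together.\n"
   "Q: [JOKE]\n"
   "A: [PUNCHLINE]"
from
   "openai/text-davinci-003"
where
   len(JOKE) < 120 and STOPS_AT(JOKE, "?") and
   STOPS_AT(PUNCHLINE, "\n") and len(PUNCHLINE) > 1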

LMQL offers a dedicated Playground IDE to make query development easier. Users can examine the interpreter’s state, validation outcomes, and model responses at any stage of text generation. This includes the ability to analyze and explore the various hypotheses produced during beam search, providing useful insight for refining the language model’s behavior.

Efficiency and performance remain a big challenge, according to the paper. Even with model-specific optimizations, the inference step in modern language models relies on costly, high-end GPUs to achieve satisfactory performance. 

With LMQL, text closely aligned with the desired output can be generated on the first attempt, eliminating the need for repeated iterations. The paper’s evaluations show that LMQL improves accuracy on various tasks while significantly reducing computational costs with pay-to-use APIs, translating to impressive cost savings of 13% to 85%.

One of the authors of LMQL said on HackerNews, “Cost is definitely a dimension we are considering (research has limited funding after all), especially with the OpenAI API. Lock-step token-level control is difficult to implement with the very limited OpenAI API. As a solution to this, we implement speculative execution, allowing us to lazily validate constraints against the generated output, while still failing early if necessary. This means, we don’t re-query the API for each token (very expensive), but rather can do it in segments of continuous token streams, and backtrack where necessary.”

Language Model Programming

This isn’t the first hybrid approach to prompt engineering; Jargon, SudoLang, and prlang all do something similar. “LLMs+PLs is a very interesting field right now, with lots of directions to explore,” said another author of LMQL. These languages let users express both common and advanced prompting techniques in a simple and concise manner. 

But if you can use any programming language on LLMs, why learn a specific query language like LMQL?

LMQL gives you a concise way to define multi-part prompts and enforce constraints on LLM output. For instance, you can make sure the model always adheres to a specific output format, with parsing of the output handled automatically. It also abstracts away details such as backend APIs versus local models, tokenisation and optimisation, and it makes tool integration (e.g., function calls during LLM reasoning) much easier. LMQL is also language-model agnostic, so the same query can run across LLMs, improving portability.
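As a small sketch of that format enforcement (the question and model name are illustrative; INT is one of LMQL’s built-in constraints), restricting a variable to integers means the result comes back already parsed as a number rather than as raw text:

argmax
   "Q: How many days are there in a leap year?\n"
   "A: [ANSWER]"
from
   "openai/text-davinci-003"
where
   INT(ANSWER)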

Language Model Programming (LMP) makes it easier to adapt language models for different tasks while abstracting the model’s internals and providing high-level semantics. LMQL represents a promising development, as evidenced by its ability to enhance the efficiency and accuracy of language model programming. It empowers users to achieve their desired results with fewer resources, making text generation more accessible and efficient.


K L Krithika

K L Krithika is a tech journalist at AIM. Apart from writing tech news, she enjoys reading sci-fi and pondering impossible technologies, trying not to confuse them with reality.