
LMQL: The Cure for LLM Chatbot Hallucination?

LMQL takes a hybrid approach to prompting, combining natural language prompts with a programming language to get more accurate responses from language models


Large language models don’t always respond correctly to questions, and they are difficult to control because users cannot fully see what goes on inside them. Recently there have been many complaints about LLM chatbots hallucinating and giving unsatisfactory responses, and one good way to address this is better prompting techniques. Language Model Query Language (LMQL) tackles the problem by combining natural language prompting with simple scripting. 

Researchers from ETH Zurich wrote a paper titled ‘Prompting Is Programming: A Query Language for Large Language Models’ on the emerging discipline of clever prompting, an intuitive combination of natural language and programmatic prompts. Users can specify constraints on a language model’s output and get it to perform multiple tasks at once by providing high-level semantics. 

How does it work?

LMQL is a declarative programming language, meaning a query states what the end result of the task should be while abstracting away the control flow needed to get there. It is inspired by SQL but builds on Python, and queries can contain both natural language text and code. 
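For instance, a minimal query looks roughly like this (the model name here is only illustrative): the top-level string is the prompt, and the bracketed [ANSWER] is a hole the model fills in.

argmax
   "Q: What is the capital of France?\n"
   "A: [ANSWER]"
from
   "openai/text-davinci-003"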

The language grammar, according to the paper, has five essential parts. The decoder clause specifies the decoding algorithm used to generate text, such as greedy argmax, sampling, or beam search, which trades off the quality and diversity of the output. 

The query block, written in Python syntax, serves as the core interaction mechanism with the language model: each top-level string within it is a direct query to the model. The model (or from) clause specifies which underlying language model is used for text generation. The where clause, in turn, lets users define constraints on the generated output, forcing the model to stick to the desired qualities. 

Finally, the optional distribution instruction guides the distribution of generated values. It defines how the generated results should be distributed and presented, giving the user control over the outcome’s format and structure. 
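Put together, a sketch of a query exercising these parts (the review text and model name are illustrative assumptions) could look like this:

# decoder clause: use greedy decoding
argmax
   # query block: each top-level string is sent to the model,
   # and [CLS] is a placeholder the model fills in
   "Review: Great stay, the staff was lovely.\n"
   "The sentiment of this review is [CLS]"
# model clause: the underlying LLM to query
from
   "openai/text-davinci-003"
# distribution instruction: score only these candidate values
distribution
   CLS in [" positive", " neutral", " negative"]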

Control the interaction

For simple queries, users can guide the language model with natural language alone, but as tasks grow in complexity, or when the user needs answers to specific questions, it is better to take full control of the query. Even for something as simple as asking the model to tell a joke, a technically inclined user can fully control the shape of the result. 
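The paper itself opens with a joke-telling query along these lines (reproduced here approximately), where the where clause keeps the model inside a strict question-and-punchline format:

argmax
   "A list of good dad jokes. A indicates the punchline.\n"
   "Q: How does a penguin build its house?\n"
   "A: Igloos it together.\n"
   "Q: [JOKE]\n"
   "A: [PUNCHLINE]"
from
   "openai/text-davinci-003"
where
   len(JOKE) < 120 and STOPS_AT(JOKE, "?") and
   STOPS_AT(PUNCHLINE, "\n") and len(PUNCHLINE) > 1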

LMQL offers a dedicated Playground IDE to make query development easier. Users can examine the interpreter’s state, validation outcomes, and model responses at any stage of text generation. This includes the ability to analyze and explore the various hypotheses produced during beam search, providing useful insight for refining the language model’s behavior.

Efficiency and performance remain a big challenge, according to the paper. Even with model-specific optimizations, the inference step in modern language models relies on costly, high-end GPUs to achieve satisfactory performance. 

With LMQL, text closely aligned with the desired output can be generated on the first attempt, eliminating the need for repeated iterations. The paper’s evaluations show that LMQL improves accuracy on various tasks while significantly reducing computational costs with pay-to-use APIs, translating to impressive cost savings of 13% to 85%.

One of the authors of LMQL said on HackerNews, “Cost is definitely a dimension we are considering (research has limited funding after all), especially with the OpenAI API. Lock-step token-level control is difficult to implement with the very limited OpenAI API. As a solution to this, we implement speculative execution, allowing us to lazily validate constraints against the generated output, while still failing early if necessary. This means, we don’t re-query the API for each token (very expensive), but rather can do it in segments of continuous token streams, and backtrack where necessary.”

Language Model Programming

This isn’t the first hybrid approach to prompt engineering; Jargon, SudoLang, and prlang all do something similar. “LLMs+PLs is a very interesting field right now, with lots of directions to explore,” said another author of LMQL. These languages let users express both common and advanced prompting techniques in a simple and concise manner. 

But if you can use any programming language on LLMs, why learn a specific query language like LMQL?

LMQL gives you a concise way to define multi-part prompts and enforce constraints on LLM output. For instance, you can make sure the model always adheres to a specific output format, with parsing of the output handled automatically. It also abstracts away details such as backend APIs versus local models, tokenisation and optimisation, and it makes tool integration (e.g., function calls during LLM reasoning) much easier. LMQL is also language-model agnostic, so the same query can run across LLMs, improving portability.
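As a small sketch of that format enforcement (the question and model name are illustrative; INT is one of LMQL’s built-in constraints), restricting a variable to integers means the result comes back already parsed as a number rather than as raw text:

argmax
   "Q: How many days are there in a leap year?\n"
   "A: [ANSWER]"
from
   "openai/text-davinci-003"
where
   INT(ANSWER)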

Language Model Programming (LMP) makes it easier to adapt language models for different tasks while abstracting the model’s internals and providing high-level semantics. LMQL represents a promising development, as evidenced by its ability to enhance the efficiency and accuracy of language model programming. It empowers users to achieve their desired results with fewer resources, making text generation more accessible and efficient.


K L Krithika

K L Krithika is a tech journalist at AIM. Apart from writing tech news, she enjoys reading sci-fi and pondering impossible technologies, trying not to confuse them with reality.