AI Chatbots are not Ready for Customer Service

A software engineer prompted a car dealership chatbot to sell him a Chevy for just a dollar.

Share

Published on January 8, 2024

by Pritam Bordoloi

After the launch of ChatGPT, the discussion surrounding AI replacing customer service jobs gained considerable traction. However, the situation intensified in July 2023, when Dukaan, a DIY platform for online stores, terminated 90 percent of its support staff, opting to replace them with an AI chatbot called Lina.

After this development, concerns were raised about the potential job displacement of customer agents, particularly in economies like India. However, industry experts interviewed by AIM expressed the view that at its current stage, AI is not equipped to replace humans in the context of BPO operations.

Moreover, recent occurrences, such as a software engineer successfully manipulating a car dealership chatbot powered by OpenAI’s GPT models to sell him a Chevy for just a dollar, confirm their assertions. While LLMs bring numerous benefits to those in customer service, instances like this do raise one moot question: Are LLMs reliable for customer-facing sales and support roles?

Generative AI for customer service

Ethan Mollick, an associate professor at the Wharton School of the University of Pennsylvania thinks LLMs are not ready for external facing sales and support roles. “They are gullible and hallucinate,” he posted on X. Persistent hallucinations continue to pose a major challenge for LLMs, and with the industry yet to find a solution, this issue may persist as we progress.

Currently, the models like GPT-4 are being fine tuned and trained with enterprise data to make them ready for enterprise application, however, it does not eliminate the risks. Despite being fine-tuned, it does not eliminate the chances of the bot providing plausible-sounding but incorrect or nonsensical answers.

Sanjeev Menon, co-founder and head of product & tech at E42.ai believes using generative AI such as ChatGPT in customer service does elevate efficiency and experience, provided enough thought is put into the design, along with fine tuning with specific data.

“However, generative AI is not a panacea for all maladies in customer support—clarity on the capabilities and limitations is very essential,” he told AIM.

The reality is many enterprises today have an LLM-powered chatbot integrated into their platform, and the number is only going to increase. Moreover, it can be argued that not most customers interacting with a car dealership bot will not ask for a python script.

“The behaviour does not reflect what normal shoppers do. Most people use it to ask a question like, ‘My brake light is on, what do I do?’ or ‘I need to schedule a service appointment,” Aharon Horwitz, CEO at Fullpath told Business Insider.

Yet that does not mean the risks associated with it can be ignored. As more enterprises embrace these bots, AI mishaps might only increase.

Humans in the loop

Hence, according to Menon, human intervention still plays a significant role despite the advancements in AI. He said that checks on prompt toxicity, data updates, and supervision during complex or sensitive situations are paramount to guaranteeing customers a positive and secure experience.

“In doing so, we not only enhance efficiency but also eliminate risks associated with the use of language models, ensuring a seamless and reliable customer service interaction.”

Gaurav Singh, founder and chief executive officer at Verloop.io, also said empowering agents as gatekeepers ensures quality control. “More than 90 percent of queries can be effectively handled by LLM-powered Conversational AI, but in instances of uncertainty, seamless transfer of queries to agents allows verification and editing, maintaining accuracy in automated responses for optimal query resolution,” he told AIM.

While there have been instances like Dukaan, contrary to earlier fears, widespread job loss has not occurred. Furthermore, occurrences such as chatbots recommending poison recipes underscore the crucial need for human intervention and caution against excessive reliance on AI chatbots.

“It’s important to strike a balance and use human agents where emotional intelligence, nuanced understanding, and complex problem-solving are required. A hybrid approach that combines the strengths of both AI and human agents may be the most effective solution for providing excellent customer service,” Beerud Sheth, co-founder and CEO at Gupshup, told AIM.

Are Small Language Models the answer?

LLMs like GPT-4 have billions of parameters and are trained on terabytes of data scraped from the web. These models have world knowledge, meaning they know everything from historical facts to contemporary events, providing a vast understanding of diverse topics.

However, does an enterprise need such worldly knowledge? Does a car dealership chatbot know Python script? No, and this is where Small Language Models (SLM) come in.

A SLM can be fine-tuned or trained specifically for a particular industry or domain. This enables the model to better understand industry-specific terminology, customer inquiries, and context, leading to more accurate and relevant responses.

These models also allow enterprises more control over the training process and can customise the model to align with their specific customer service needs.

“Leveraging domain-specific data and knowledge, these models ensure that their generated outputs align precisely with customers’ queries, industry standards, and specific requirements,” Rashid Khan, co-founder and CPO at Yellow.ai, told AIM.

However, the problem of hallucination pertains even in SLMs. While it has posed challenges to end hallucinations in these models, it certainly can be reduced. For instance, Yellow.ai, leverages a maker-checker model setup.

“One model generates responses, while another validates their relevance and accuracy. We also implemented the RAG architecture, ensuring fact-based answers to reduce hallucination chances and refining our model to provide accurate responses from a given paragraph,” Khan added.

Nonetheless, a domain-specific model with a human in the loop might still be the best approach for enterprises to mitigate risk. “With domain adaptation for precision, strict moderation for safety, and calculated human involvement for accountability, one can balance the efficiency of AI while guarding against unforeseen issues,” Sheth said.

Access all our open Survey & Awards Nomination forms in one place