The Missing Link for Indian Language Chatbots: Indic Data 

“You will see many claiming that they can make a chatbot or LLM for Indian languages; 99% of those are transient,” said Raj Dabre.
Image by Raghavendra Rao
Efforts to build Indic language models have picked up noticeably in recent times. Although some of these models are adequate for a range of tasks, their adoption remains abysmally low compared to their ‘superior’ English counterparts. A major obstacle is the limited availability of Indic-language datasets.

In a conversation with AIM, Raj Dabre, a prominent researcher at NICT in Kyoto, adjunct faculty at IIT Madras and a visiting professor at IIT Bombay, discussed the complexities of developing chatbots for Indian languages. “These models [GPT-3] have seen close to tens of trillions of tokens or words in English. Unless you have seen the entirety of the web, or more or less all of it, none of these models will be able to actually solve the generative AI problem
Mohit Pandey
Mohit writes about AI in simple, explainable, and often funny words. He's especially passionate about chatting with those building AI for Bharat, with the occasional detour into AGI.