Published on February 21, 2024

IIT Hyderabad Professor Believes Indic Data is All You Need

“One of the things that we are focused on is developing models that reach a vast audience, even in remote villages,” said Professor Maunendra Sankar Desarkar.

Image by Raghavendra Rao

By Mohit Pandey

While fine-tuning existing English models on Indic language tokens is a viable approach, building foundational models from scratch offers several advantages, and that is what BharatGPT aims to do.

“Existing models may not adequately represent the Indian cultural and linguistic diversity, which can lead to biases and limitations in their applicability,” said Professor Maunendra Sankar Desarkar of IIT Hyderabad, who is also a core team member of the BharatGPT initiative. “Moreover, fine-tuning may not fully address the unique linguistic challenges posed by Indic languages,” he added.

He further said that building foundational models tailored to the Indian context can ensure greater inclusivity and effectiveness across diverse linguistic communities, which would deliver AI in the best possible way in India.

“We're sourcing data from various repositories available on the web, including digitised books and datasets,” Desarkar added.


Mohit Pandey

Mohit writes about AI in simple, explainable, and often funny words. He's especially passionate about chatting with those building AI for Bharat, with the occasional detour into AGI.