
AI4Bharat Releases Airavata: An Instruction-tuned Hindi LLM

The research lab has also released the instruction-tuning datasets to enable further research on IndicLLMs

AI4Bharat, an AI research lab incubated at IIT Madras, has released Airavata, an instruction-tuned model for Hindi. The model was built by fine-tuning Sarvam AI’s OpenHathi on diverse Hindi instruction-tuning datasets to make it better suited for assistive tasks, the lab said in a blog post.

Along with Airavata, AI4Bharat has also released the instruction-tuning datasets used to train the model, to enable more innovation in the IndicLLM space.

“We rely on human-curated, license-friendly instruction-tuned datasets to build ‘Airavata’. We do not use data generated from proprietary models like GPT-4 etc. We think this is a more sustainable way of building instruction-tuned models at scale for most Indic languages, where relying on distilled data from commercial models would increase costs and restrict their free usage in downstream applications due to licensing restrictions,” it said.

LLMs depend heavily on high-quality instruction-tuning datasets to perform well. Unfortunately, diverse datasets of this kind are scarce for Hindi.

AI4Bharat’s approach to developing Airavata involves translating well-constructed English supervised instruction-tuning datasets into Hindi. For the translation, the lab said it used IndicTrans2, a state-of-the-art open-source machine translation model designed specifically for Indian languages.
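To make the translation step concrete, here is a minimal Python sketch of how an English instruction dataset could be passed through a translation model field by field. It is an illustration only, not AI4Bharat's pipeline: the record layout (`instruction`, `response`, `source`), the `translate_dataset` helper and the `toy_translate` stand-in are all assumptions, and the `translate` callable is a placeholder for a real model such as IndicTrans2 rather than its actual interface.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical record layout: one instruction/response pair per dict.
Record = Dict[str, str]


def translate_dataset(
    records: Iterable[Record],
    translate: Callable[[str], str],
    fields: Tuple[str, ...] = ("instruction", "response"),
) -> List[Record]:
    """Return copies of `records` with the chosen text fields translated.

    `translate` stands in for any English-to-Hindi MT system; it is a
    placeholder here, not the actual IndicTrans2 interface.
    """
    translated = []
    for record in records:
        new_record = dict(record)  # copy so the English original is untouched
        for field in fields:
            if field in new_record:
                new_record[field] = translate(new_record[field])
        translated.append(new_record)
    return translated


def toy_translate(text: str) -> str:
    # Stand-in translator used only to make the sketch runnable.
    return f"[hi] {text}"


english = [
    {"instruction": "Name a primary colour.", "response": "Red.", "source": "demo"},
]
hindi = translate_dataset(english, toy_translate)
print(hindi[0]["instruction"])
```

In practice, `toy_translate` would be replaced by a call into an actual machine translation model, and fields such as `source` that carry metadata rather than text would be left untranslated, as they are here.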

Previously, AI4Bharat introduced Chitralekha, an open-source AI-powered video transcreation platform developed in partnership with EkStep.

It has an integrated workforce management system, which enables end-to-end transcreation of a video from one language to another through the stages of transcription, translation and voice-over for the translated language. 

Earlier this month, AI4Bharat announced the hiring process for its AI resident (and associates) programme for 2024-25. This year-long, pre-doctoral programme focuses on intensive work in NLP, speech, and vision projects.


Pritam Bordoloi

I have a keen interest in creative writing and artificial intelligence. As a journalist, I deep dive into the world of technology and analyse how it’s restructuring business models and reshaping society.