
Tech Mahindra to Launch OpenAI Rival ‘Project Indus’ Early Next Year

Project Indus team has gathered 1.2 terabytes of data in Hindi and its related dialects


Indian IT firm Tech Mahindra intends to launch Project Indus, its LLM designed for Hindi and its 37 dialects, by the end of December or early January, The Economic Times reported. The planned launch comes four months after the company, India's fifth-largest software services firm, announced Project Indus as a strategic effort to develop a foundational model for Indian languages.

Over the last two months, the 15-member Project Indus team has gathered 1.2 terabytes of data in Hindi and related dialects. The team is now refining this data into web text, which it plans to release as open source by the end of November, said Nikhil Malhotra, global head of Makers Lab at Tech Mahindra, according to the report.

“In the meantime, we have started constructing the model… We are looking at probably the end of December or starting of January, we will release the model for at least Hindi and its dialects. And then the other work starts for other dialects in other regions,” Malhotra said.

The team encountered difficulties related to data availability and collection. “In Hindi, the maximum number of tokens available is about 2.8 billion, which doesn’t meet the model’s requirements. For instance, to create a 7 billion parameter model, I would need at least around 100 billion tokens,” explained Malhotra.
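To put the gap Malhotra describes in perspective, the short Python sketch below works through the arithmetic using only the figures quoted above. The variable and function names are illustrative assumptions, and the tokens-per-parameter ratio is simply what the quoted numbers imply, not a figure stated by Tech Mahindra.

```python
# Figures as quoted by Malhotra in the article (approximate by his own wording).
AVAILABLE_HINDI_TOKENS = 2.8e9   # "about 2.8 billion" tokens available in Hindi
REQUIRED_TOKENS = 100e9          # "at least around 100 billion" tokens needed
MODEL_PARAMETERS = 7e9           # target size: 7 billion parameters


def token_gap(available: float, required: float) -> tuple[float, float]:
    """Return (shortfall in tokens, fraction of the requirement already covered)."""
    shortfall = required - available
    coverage = available / required
    return shortfall, coverage


shortfall, coverage = token_gap(AVAILABLE_HINDI_TOKENS, REQUIRED_TOKENS)

# Ratio implied by the quote; an inference from the two numbers, not a stated target.
tokens_per_parameter = REQUIRED_TOKENS / MODEL_PARAMETERS

print(f"Shortfall: {shortfall / 1e9:.1f} billion tokens")
print(f"Coverage of requirement: {coverage:.1%}")
print(f"Implied tokens per parameter: {tokens_per_parameter:.1f}")
```

On these numbers, existing Hindi text covers under 3% of the stated requirement, which helps explain why the team turned to crowdsourcing and field collection.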

Early on, the team set up a portal to crowdsource voice samples in local dialects. It drew 1,500 responses in the first two days, but submissions then tapered off, and only 6,000 samples were received in total, Malhotra told ET.

To address this, teams were dispatched to regions like Uttar Pradesh, Madhya Pradesh, Haryana, and Jammu to collect data in person. Additionally, the Hyderabad campus of Tech Mahindra organized a camp where employees contributed samples in dialects like Hyderabadi Dakhini.

According to Tech Mahindra chief CP Gurnani, the model will be the biggest Indic LLM and could cater to 25% of the world’s population. Tech Mahindra has not revealed the cost of the project. The aim is to build a 7-billion-parameter LLM to begin with, Malhotra told AIM in an exclusive interview last month.


Siddharth Jindal

Siddharth is a media graduate who loves to explore tech through journalism and to put forward ideas worth pondering in the era of artificial intelligence.