TIME recently published a brilliant piece on how rural workers are contributing to the development of AI systems such as ChatGPT. It not only highlighted the rising demand for training datasets in people’s own native languages, but also emphasised the need for more inclusive solutions. The organisation behind this work is Karya, a nonprofit launched in Bangalore that is working towards accelerating social mobility in the country via AI training and upskilling.
Karya’s founders, Manu and Vivek, were Microsoft researchers. From 2017 to 2020 Karya was a Microsoft-incubated centre, after which it has been functioning as a separate private entity. A Microsoft project that involved collecting data in particular languages required partnering with people from rural areas. However, the approach could not be a direct one: building trust was the main challenge.
That is where Jeevitha Satheeshkumar, currently the Director of Operations at Karya, joined them. With over 10 years of experience as a software trainer, she began experimenting with projects with a social implication very early on. She has been part of campaigns such as ‘Plastic Free India’, which spread awareness amongst mothers about using stainless steel feeding bottles instead of plastic ones. Her foray into language training happened when she started freelancing as a Tamil linguist. With the company requiring over 100 people for language transcription, she then looked into tapping the rural market.
“It is difficult for people who live in rural areas to get opportunities outside their area, and with online job frauds and scams, trust becomes a problem. I wanted to change that, so I started training people on data annotating or labelling.”
From here, for over six years, Satheeshkumar worked as a second-party vendor for Microsoft, Google and other companies, and expanded the training to over 5,000 people who work with multiple languages. She eventually built her own company and gave opportunities to rural people, homemakers and even those with physical disabilities. Understanding the cyclical nature of demand for this service, she decided to build it as a product offering. “In India, only two or three companies have their own transcription tool or AI tool to do this kind of work.”
The tool, named ‘Labely’, was built to perform functions such as transcription and various types of annotation. Multiple modules, such as manager, proofreader and transcriber, were built into the platform. “Any company that wishes to create a data set can use this tool.”
“Having a shared vision of creating the same kind of social impact, creating opportunities for rural people and helping communities looking for a job or some kind of learning, we joined hands,” said Satheeshkumar. “Last year, we successfully completed our first pilot project with IIT Madras.”
“It is not possible to go directly to them and offer them a job, as there is no trust element built between the parties. So, we connected with local NGOs, as they are aware of the right set of people who would benefit from this. They started in a remote area in Rajasthan and gave data collection work in Rajasthani dialect,” said Satheeshkumar. Karya has completed over 30 million digital tasks.
“Our goal is to reach 100,000 rural Indians by the end of this fiscal year, 1.5 million rural Indians by next fiscal year and 100 million rural Indians by 2030. Fundamentally, Karya’s goal is to use technology to accelerate social mobility in rural India. It currently takes an average low-income Indian over 7 generations to make $1500 in savings and a Karya worker can make the same amount in less than a year. We think of Karya not as a job, but as societal wealth distribution,” said Chopra, in an exclusive interview with AIM.
Designed for Rural India
The Karya application has been built so that anyone can use it with ease. The app has a simple user interface, where a person can read and record the text shown in the app, with an option to re-record as well. A basic Android phone is required for using the app. However, Karya is working towards supporting low-end phone models too, since people in its rural target group might not have access to an Android phone and may only own a more basic device.
Earlier, voice recordings were made on phones and shared via Google Drive, but with the Karya app, they can be recorded directly. Once a recording is submitted, it moves to a ‘proofreader validator’. First, an automated process checks the quality of the audio, looking for missing speech or non-technical inputs. After validation, the recording moves to the ‘transcriber’. This stage involves segmentation (classification) and transcription, where the recorded speech is labelled and transcribed exactly, word by word. It then reaches the final validation step, and the file is converted into either a txt or JSON format, depending on the customer’s requirement.
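The flow described above can be sketched roughly in code. This is only an illustrative outline, not Karya’s actual implementation: the `Recording` class, its fields, the `speech_ratio` metric, the 0.5 threshold and all function names are assumptions made for the example.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Recording:
    """A submitted voice recording (all field names are hypothetical)."""
    worker_id: str
    prompt_text: str      # text the worker was asked to read aloud
    duration_sec: float   # length of the captured audio
    speech_ratio: float   # assumed metric: fraction of the audio containing speech
    segments: list = field(default_factory=list)
    status: str = "submitted"


def automated_quality_check(rec: Recording, min_speech_ratio: float = 0.5) -> bool:
    """Stage 1: automated check that rejects empty or mostly silent audio."""
    return rec.duration_sec > 0 and rec.speech_ratio >= min_speech_ratio


def transcribe(rec: Recording, segment_texts: list) -> Recording:
    """Stage 2: store the word-by-word transcript, one entry per segment."""
    rec.segments = [{"index": i, "text": t} for i, t in enumerate(segment_texts)]
    rec.status = "transcribed"
    return rec


def final_validation(rec: Recording) -> bool:
    """Stage 3: accept only recordings where every segment has text."""
    ok = bool(rec.segments) and all(s["text"].strip() for s in rec.segments)
    rec.status = "validated" if ok else "rejected"
    return ok


def export(rec: Recording, fmt: str) -> str:
    """Stage 4: convert a validated recording to txt or JSON."""
    if rec.status != "validated":
        raise ValueError("only validated recordings can be exported")
    if fmt == "txt":
        return "\n".join(s["text"] for s in rec.segments)
    if fmt == "json":
        payload = {"worker_id": rec.worker_id, "segments": rec.segments}
        return json.dumps(payload, ensure_ascii=False)
    raise ValueError(f"unsupported format: {fmt}")
```

In this sketch each stage is a separate function, so a recording that fails the automated check never reaches a human transcriber, and only validated recordings can be exported in the format the customer asks for.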
Rural Supersedes Urban
People in rural areas show better prowess when it comes to learning and implementing new skills. “For language recordings, people who live in rural areas would be able to do it better than their urban counterparts. Owing to having studied and schooled in their native language, they know the language better than others.” Satheeshkumar has also learned that people who live in villages grasp new skills more quickly than those from cities. “The people there are focussed and always ready to do anything that will help them. As opposed to people in urban areas who can get distracted, the people here are fully focussed during training and listen attentively without batting an eyelid. Though such opportunities are high-hanging fruits for them, we want to bring it to their doorsteps,” said Satheeshkumar.
Socially Fueling the Indian AI Ecosystem
In addition to working with big tech companies and foundations such as Microsoft and the Bill and Melinda Gates Foundation, Karya has also partnered with universities such as MIT, Stanford, IIT Madras and IIT Bombay. Currently, it is working with AI4Bharat, which builds open-source AI for Indian languages. Touted as a promising player for revolutionising the LLM space in India, AI4Bharat has over 200 translators across 22 Indian languages. Karya generates and provides datasets for this Indic platform.
With a simple linear model of functioning, the workers who provide their service are rewarded justly: pay starts at a minimum of $5 per hour and can go as high as $30 per hour, depending on skill sets. “Other companies who outsource such projects involve multiple vendors with multiple stages of transfer, ultimately leading to the last man receiving $1 or $2. However, in Karya, the structure is simple with no middle vendors. It goes: customer, Karya, then workers.” If a person is able to provide transcription worth $30 daily, in a month they can receive up to INR 50,000. The payment process is automated as well. After the tasks completed by the workers are validated, payment is made within 10 to 15 days through UPI or bank transfer.
Identifying the Workers
As there will never be a shortage of people for these tasks, Karya ensures that it reaches the right set of people. When the app was first launched, the first 100 downloads came from men in urban areas who are able to get high-level jobs, which defeats the purpose. Since then, all task requirements have been routed through NGOs in the relevant areas, who help Karya connect with the right audience.
Karya has over 180 NGO partners. “Currently we are working on two projects that require a minimum of 800 to 1,000 people in each district. India has 766 districts, and employing 1,000 in each would require our NGO partners for the right identification.”
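To give a sense of the scale those figures imply, the headcount for one such project can be worked out directly. The snippet below is only illustrative arithmetic using the numbers quoted above, and it assumes a project covers all 766 districts; the function name is hypothetical.

```python
def project_headcount(districts: int, min_per_district: int,
                      max_per_district: int) -> tuple[int, int]:
    """Return the (lowest, highest) total headcount across all districts."""
    return districts * min_per_district, districts * max_per_district


# Figures quoted in the article: 766 districts, 800 to 1,000 people in each.
low, high = project_headcount(766, 800, 1000)
print(f"{low:,} to {high:,} people per project")
```

Under that assumption, a single project would need between 612,800 and 766,000 people, which is why identification through NGO partners matters at this scale.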
Speaking about future plans, Manu said, “To scale our operations, we want to capture a bigger section of the global AI training data market. To do this, we have to raise more awareness about our work and work with many more big tech companies.”