An AI or machine learning model is only as good as the data it is trained on. As things stand, 80 percent of the time on an AI project goes to wrangling training data, including data labelling. Data labelling takes up the bulk of data scientists’ time, which could otherwise be devoted to building the algorithm.
Many companies outsource data labelling and annotation so the data scientists can focus on algorithm development and avoid project delays. According to a 2019 Cognilytica report, the market valuation of third-party data labelling services is projected to cross $1 billion by 2023.
India has emerged as the top outsourcing destination for data labelling for clear reasons: globalisation, a demographic advantage, and cheap labour, to name a few. Thanks to the BPO boom in the country at the turn of the century, the Indian workforce was more than ready to take up data labelling jobs.
Since data labelling is a process-driven task, even a person with a high-school education can pick up the required skills. The market for data annotation and labelling is exploding in India.
“AI requires properly annotated, classified and anonymised data. For this, whether you like it or not, you will use automation, but you will also have to use a skilled human workforce, and that is the opportunity it presents for India,” said Sangeeta Gupta, Senior VP at NASSCOM, in an earlier interview.
In the initial stages, Amazon’s MTurk was the go-to portal for finding data labelling and annotation jobs. Freelancers were paid based on the tasks they completed. However, Amazon soon placed a restriction on non-US workers, which it later lifted.
Amazon MTurk paved the way for similar organisations. The popular ones include:
Playment: A complete data labelling platform founded by ex-Flipkart employees Siddharth Mall, Ajinkya Malasane and Akshay Kumar Lal. It breaks down large labelling tasks into micro-tasks and distributes them among its community of trained annotators. As of 2019, Playment’s platform had 300,000 labellers and annotators. A labeller working with Playment can earn up to 30,000 per month.
iMerit: iMerit’s data labelling services are used in advanced machine learning algorithms, computer vision, natural language processing, augmented reality, and data analytics. According to the company, its workforce is trained to label data for transformative applications such as cancer research, driverless car training, and crop yield optimisation. The company is funded by the Omidyar Network and the Michael & Susan Dell Foundation.
Infolks: Founded in 2016, Infolks is among India’s top data labelling companies. It offers services in machine learning, artificial intelligence, training data as a service, image annotation and data categorisation.
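To make the micro-task idea mentioned for Playment concrete, here is a minimal Python sketch of how a large labelling job could be split into small batches and handed out to annotators in turn. It is an illustration only, not any company's actual system: the function names, the file names, the batch size and the round-robin assignment are all assumptions made for the example.

```python
from itertools import cycle


def split_into_microtasks(items, batch_size):
    """Split a list of items to label into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def assign_round_robin(microtasks, annotators):
    """Hand out micro-tasks to annotators in turn (a hypothetical assignment policy)."""
    if not annotators:
        raise ValueError("at least one annotator is required")
    assignments = {name: [] for name in annotators}
    for task, name in zip(microtasks, cycle(annotators)):
        assignments[name].append(task)
    return assignments


# Hypothetical job: ten images to label, three per micro-task, two annotators.
images = [f"image_{n:03d}.jpg" for n in range(10)]
microtasks = split_into_microtasks(images, batch_size=3)
assignments = assign_round_robin(microtasks, ["annotator_a", "annotator_b"])

for name, tasks in assignments.items():
    print(name, tasks)
```

A real platform would likely also weigh annotator skill, availability and quality checks when assigning work; the round-robin rule here is only the simplest possible stand-in.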
“Numerous data labelling firms have sprung up to address this growing need, and many of them are tapping into a global pool of ‘gig workers’ that can get this done effectively. Software and algorithms make it easier to divvy up tasks and have people work at their convenience. India offers a huge talent pool with ready access to smartphones and the ability to tap into a new income source or to supplement their earnings. Time difference, in this case, can even be an asset,” said Girish Muckai, Chief Sales & Marketing Officer of HEAL Software Inc.
“Training AI models to deliver high levels of accuracy is critical to success. However, labelling training data sets is tedious work. It’s time-consuming, complex and requires significant workforce. The tech industry’s outsourcing boom in India and its large population make it a growing hotbed of this precision work. Its people and skills position India as a key resource for years to come in an increasingly digital world,” said Lori McKellar, Senior Director, Product Marketing at OpenText.
“India has emerged as a huge pool of employable workers to undertake data labelling jobs. The reasons are essentially the same which led to the expansion of the BPO/KPO service industry in India in the past 20 years:
- Cost-effective workforce
- English literacy and basic computing skills
- High speed and cheap internet
- Stable economy – compared to some other East-European/African/South-Asian countries
The need to provide a reliable and cheap way to produce training data is paramount now. Most of these are quite low pay + low skill jobs (compared to an average software developer) and require considerable basic training for the employee to become autonomous. Very soon, other developing economies like Romania, Indonesia, Vietnam, the Philippines etc. are likely to follow through and join this sphere, mostly due to the same factors/reasons. If India wants to maintain a lead in this market, we’ll have to keep evolving consistently by providing similar support to other AI operations which require more complexity and mid to high level of technical competency,” said Shishir Thakur, CEO and founder, Cranberry Tech.