A slew of startups today offer automated services that promise quality candidates to companies at lower cost, eliminating the need for expensive recruitment and talent-acquisition teams. However, the machine learning algorithms behind these ‘automated’ functions need to be trained on relevant data, and that process involves many steps in the data preparation stage, some of which may require human input and control.
Data annotation and labelling is one such area; it is reportedly an emerging sector in India that has been creating jobs in recent years. In fact, according to a study by Grand View Research, the global market for data annotation tools is expected to reach $2.57 billion by 2027. Amid mass layoffs and increasing automation, India can use the opportunity this market presents to give its workforce a leg up as existing jobs become scarce.
Potential Of Data Labelling As Automation Looms
Machine learning algorithms must be trained on vast amounts of data that has been precisely marked and appropriately labelled. In an image recognition system built with supervised learning, for instance, large volumes of clean, properly labelled images are required to train the model over multiple iterations so that it can accurately recognise images it has not seen before.
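As a toy illustration of why labels matter in supervised learning, here is a minimal sketch of a perceptron that learns from a handful of labelled examples over several passes (epochs) through the data. The two features per example (an object's width and height in metres, as might be measured from an annotated image), the values and the function names are all hypothetical; real image recognition systems train far larger models, such as neural networks, on far more data.

```python
from typing import List, Tuple

# A labelled example: (feature vector, label), where the label is
# +1 for "vehicle" and -1 for "pedestrian". All values are invented.
Example = Tuple[List[float], int]


def train_perceptron(data: List[Example], epochs: int = 20,
                     lr: float = 0.1) -> Tuple[List[float], float]:
    """Learn weights and a bias with the classic perceptron update rule."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):  # multiple passes (iterations) over the labelled data
        for features, label in data:
            activation = sum(w * x for w, x in zip(weights, features)) + bias
            # Adjust the model only when it gets this labelled example wrong.
            if label * activation <= 0:
                weights = [w + lr * label * x for w, x in zip(weights, features)]
                bias += lr * label
    return weights, bias


def predict(weights: List[float], bias: float, features: List[float]) -> int:
    """Classify a new example as +1 (vehicle) or -1 (pedestrian)."""
    activation = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if activation > 0 else -1


if __name__ == "__main__":
    labelled_data: List[Example] = [
        ([4.0, 1.5], 1), ([4.5, 1.8], 1), ([3.8, 1.4], 1),     # wide objects: vehicles
        ([0.6, 1.7], -1), ([0.5, 1.6], -1), ([0.7, 1.8], -1),  # narrow objects: pedestrians
    ]
    w, b = train_perceptron(labelled_data)
    print(predict(w, b, [4.2, 1.6]))    # unseen wide object -> 1
    print(predict(w, b, [0.55, 1.75]))  # unseen narrow object -> -1
```

If some of the labels were wrong, the same training loop would faithfully learn the mistakes, which is why clean, accurate labels matter as much as volume.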
But data labelling is a time-consuming, labour-intensive part of the overall machine learning project cycle. Although companies like Google, Amazon and Facebook employ user-driven labelling (CAPTCHA being a case in point), even they still rely heavily on human input to meet the data labelling needs of their ML projects.
This presents a unique opportunity for India to channel its workforce's cognitive skills into data labelling for machine learning. In addition to relying on customers and their own employees for labelling work, many companies also turn to outsourced, third-party human labour for well-labelled data to help them build robust ML capabilities.
They do so because access to good-quality data is critical for their ML projects. Customers, who in most cases do not even know they are participating in the labelling process, cannot ensure accuracy, and the same may hold true for in-house staff who are readily available but reluctant to do the work. Moreover, since labelling demands careful decision-making, ensuring quality is of utmost importance, especially in large-scale labelling efforts.
Data Labelling Providers In India
There are several firms in India that provide data annotation and labelling services to large enterprises like Microsoft, TripAdvisor, eBay and Autodesk, among others. A leading example is iMerit, which reportedly employs more than 2,500 people in its Kolkata office and serves over 100 global clients. Its all-female workforce labels data that powers ML, computer vision and natural language processing (NLP) algorithms, and also verifies and audits the annotations.
The company has also partnered with AWS to offer its services through Amazon SageMaker Ground Truth. This lets iMerit serve customers who use SageMaker as well, expanding its revenue streams and giving it more exposure.
From training driverless cars to building better risk analysis models in insurance, iMerit's services support a number of critical use cases today.

For instance, one of the largest market segments for iMerit has been autonomous vehicles. How does the company provide image recognition labelling and annotation here? Its employees study image and sensor data to segment and annotate over 200 million images and videos to power computer vision (CV) algorithms.
The company's CV tools can also handle tricky edge cases and multi-frame sequences, giving clients an advantage. In fact, clients from diverse fields, including medical imagery analysis, augmented reality, agriculture, robotics and even sports, use these capabilities. According to the company, these workflows can be quite complex and hence demand deep expertise.
According to a report, iMerit aims to expand its workforce to 10,000 employees soon and, since its launch, has opened offices in the US and Bhutan as well.
Another startup offering similar data preparation services to big clients is Kerala-based Infolks. Started in 2015 from modest beginnings, it currently employs over 250 people and is poised to expand with the automation boom.
Staying with autonomous vehicles, where data labelling is critical, the startup provides clients with a wide range of datasets for training and validation. From annotating vehicles and pedestrians on the street to number plate detection, traffic signal (semaphore) analysis, lane differentiation, in-cabin monitoring, sensor modelling and more, it combines a range of annotation techniques to serve clients' various use cases.
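To make "annotating objects" a little more concrete, here is a minimal sketch of how a single bounding-box annotation might be represented and sanity-checked before delivery, in the spirit of the verification and auditing work described above. The label set, field names and checks are hypothetical and are not the actual formats used by Infolks or iMerit; real projects typically follow a schema agreed with the client (COCO-style JSON is one widely used convention).

```python
from dataclasses import dataclass
from typing import List

# Hypothetical label set for a street-scene project (an assumption, not a real client taxonomy).
ALLOWED_LABELS = {"vehicle", "pedestrian", "number_plate", "traffic_signal", "lane_marking"}


@dataclass
class BoxAnnotation:
    image_id: str
    label: str
    x_min: float   # left edge of the box, in pixels
    y_min: float   # top edge of the box, in pixels
    width: float   # box width, in pixels
    height: float  # box height, in pixels


def validate(ann: BoxAnnotation, image_width: int, image_height: int) -> List[str]:
    """Return the problems found; an empty list means the annotation passes these basic checks."""
    problems: List[str] = []
    if ann.label not in ALLOWED_LABELS:
        problems.append(f"unknown label {ann.label!r}")
    if ann.width <= 0 or ann.height <= 0:
        problems.append("box has non-positive size")
    if ann.x_min < 0 or ann.y_min < 0:
        problems.append("box starts outside the image")
    if ann.x_min + ann.width > image_width or ann.y_min + ann.height > image_height:
        problems.append("box extends past the image edge")
    return problems


if __name__ == "__main__":
    good = BoxAnnotation("frame_0001", "pedestrian", 120, 80, 40, 110)
    bad = BoxAnnotation("frame_0001", "pedestrain", 1900, 80, 60, 110)  # misspelt label, box too wide
    print(validate(good, 1920, 1080))  # []
    print(validate(bad, 1920, 1080))
```

Automated checks like these can only catch mechanical errors; whether a box actually surrounds a pedestrian still takes a human reviewer.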