Why Are Data Labelling Firms Eyeing the Indian Market?

Toloka, a crowdsourced data-labelling service provider incorporated in Switzerland, has announced plans to tap into India’s billion-strong worker base. The company helps businesses that use artificial intelligence and machine learning to refine and improve the quality of their data.

“We’re excited to continue our push into India, as well as into the surrounding areas, including Pakistan, Myanmar, Bangladesh, and Indonesia. Our 300,000 Tolokers from this region have already proven extremely valuable to our customers, but a great deal of untapped potential remains in this market. We hope to see new talents from India and beyond on our platform soon and look forward to setting new industry standards together,” said Olga Megorskaya, Toloka CEO.

Analytics India Magazine spoke with Olga Megorskaya, Founder and CEO of Toloka, to understand the company’s plan, partners and expansion strategy.



Bet on the Indian market

The Indian market seems to be exciting for Olga Megorskaya. She said, “Its sheer size and prevalence of STEM education mean that there is a vast pool of talents, who could potentially take on data-labelling tasks. We call such people Tolokers. While there are already around 300,000 Tolokers from India and the surrounding region on our platform, we believe that the area’s full potential remains largely untapped.”

The company hopes to see new Tolokers from India join this growing industry, gain data-labelling skills and earn extra income on a schedule that works for them. Toloka is open to new partnerships that fit its model and ethos in India. “However, we do work with a wide variety of companies around the world that use AI in their processes,” said Olga. 

The Indian artificial intelligence market was valued at $6.4 billion in 2020 and is expected to grow further, according to a report from AIMResearch and Jigsaw Academy.

“AI and Machine Learning rely on vast streams of high-quality data to carry out calculations and improve their outcomes. A team of human data labellers, or Tolokers, work behind the scenes to ensure the quality of the data is untarnished. Tolokers complete data-labelling tasks, which then go through a quality control mechanism, ensuring only the highest quality data is fed back to our clients and their AI and ML services,” Olga Megorskaya said.
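The article does not describe how Toloka's quality control mechanism works. One common approach in crowdsourced labelling is to give the same item to several labellers and keep a label only when enough of them agree. The sketch below illustrates that general idea; the function name `aggregate_labels` and the `min_agreement` threshold are assumptions made for illustration, not part of Toloka's platform or API.

```python
from collections import Counter


def aggregate_labels(votes, min_agreement=0.6):
    """Majority-vote aggregation for one item (illustrative only).

    votes: list of labels submitted by different labellers.
    Returns (label, agreement). The label is None when the share of
    labellers backing the most common answer is below min_agreement.
    """
    if not votes:
        raise ValueError("votes must not be empty")
    label, count = Counter(votes).most_common(1)[0]
    agreement = count / len(votes)
    accepted = label if agreement >= min_agreement else None
    return accepted, agreement


# Two of three labellers agree, so the label is accepted.
accepted_label, accepted_share = aggregate_labels(["cat", "cat", "dog"])

# No answer reaches the threshold, so the item would be sent for re-labelling.
rejected_label, rejected_share = aggregate_labels(["cat", "dog", "bird"])
```

Items that come back as `None` could be routed to additional labellers rather than passed on to the client.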

The expansion opens up many employment opportunities. According to Megorskaya, individuals can choose which tasks they’d like to complete on the platform and earn extra income while contributing to the advancement of AI and ML technology. “While we are looking forward to training more Tolokers from India and the surrounding countries, I’d like to stress that Toloka is a truly global platform,” she added.

Use cases

“One case study that demonstrates Toloka’s capabilities is our work with chatbots. We’ve all seen how difficult it could be for chatbots to carry out a conversation that mimics human speech. In addition to issues with authenticity, there have been instances where chatbots trained on open-source materials have engaged in inappropriate and offensive speech. Toloka can help perfect chatbots and we have demonstrated our abilities in this area in a joint programme with DeepHack hackathon. Contestants in the hackathon created their own chatbots, which engaged in conversations with our real-life Tolokers.

“Our Tolokers, meanwhile, were rating every response from the chatbot, providing it with the data needed to learn and improve its ability to conduct lifelike and accurate responses. Over 4 days, 200 Tolokers rated 1,800 dialogues, with excellent results – the quality of conversation from the chatbots was dramatically improved,” Olga said.
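As a rough illustration of how per-response ratings like these might be summarised, the sketch below averages scores for each chatbot. The record layout `(bot_id, dialogue_id, score)`, the sample values, and the function name `mean_rating_per_bot` are hypothetical; the article does not say how the hackathon ratings were stored or scored.

```python
from collections import defaultdict
from statistics import mean


def mean_rating_per_bot(ratings):
    """Average the human ratings collected for each chatbot.

    ratings: iterable of (bot_id, dialogue_id, score) tuples.
    Returns a dict mapping bot_id to its mean score.
    """
    scores_by_bot = defaultdict(list)
    for bot_id, _dialogue_id, score in ratings:
        scores_by_bot[bot_id].append(score)
    return {bot_id: mean(scores) for bot_id, scores in scores_by_bot.items()}


# Made-up sample data, for illustration only.
sample_ratings = [
    ("bot_a", "dialogue_1", 4),
    ("bot_a", "dialogue_2", 2),
    ("bot_b", "dialogue_3", 5),
]
summary = mean_rating_per_bot(sample_ratings)
```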

“Toloka’s services were also used successfully to improve self-driving technology. An important task for the creator of a self-driving vehicle is to train it to extract information about its surroundings from the data it receives from sensors. During the ride, the car records everything it sees around it. This data is uploaded to the cloud, where the preliminary analysis is completed, and then it goes to post-processing, which includes labeling the data. The labeled data is sent to the machine learning algorithms, the result is returned to the vehicle, and the cycle repeats, improving the quality of object detection through multiple iterations. Tolokers labeled tens of thousands of images to train the neural networks to recognize the objects a car might encounter on the streets.

“To do this, the developers added their own visual editor, which has layers, transparency, selection, zoom, and classification (you can embed any interface in Toloka and send data via the API). This increased the speed and quality of the data labeling by a long way. In addition, the API allows you to automatically split tasks into simpler ones and then piece the results together.

“For example, before labelling an image, you can select what objects there are in it. This will make it clear which classes to use for labelling the image. In addition to human Tolokers, neural networks can also be used to perform labelling. Some networks have already learned to do this task as well as people do, but the quality of their work also needs to be evaluated. That’s why tasks have a mix of images labelled by Tolokers and by a neural network. This way, Toloka is integrated directly into the training of neural networks and becomes part of the general machine learning pipeline,” Olga explained.
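One simple way to evaluate a network's labels in a mixed task set like the one described is to measure how often they match human labels on the images both have labelled. The sketch below assumes labels are stored as dictionaries keyed by image ID; the function name `model_agreement` and the sample data are illustrative and do not come from Toloka.

```python
def model_agreement(human_labels, model_labels):
    """Share of commonly labelled images where the model matches the humans.

    human_labels, model_labels: dicts mapping image ID to class label.
    Only images present in both dicts are compared.
    """
    shared_ids = human_labels.keys() & model_labels.keys()
    if not shared_ids:
        raise ValueError("no overlapping images to compare")
    matches = sum(
        human_labels[image_id] == model_labels[image_id]
        for image_id in shared_ids
    )
    return matches / len(shared_ids)


# Made-up sample data, for illustration only.
human = {"img1": "car", "img2": "pedestrian", "img3": "bicycle"}
model = {"img1": "car", "img2": "car", "img3": "bicycle", "img4": "car"}
agreement = model_agreement(human, model)  # img4 is ignored: no human label
```

A low agreement score would suggest the network's labels still need human review before they are used for training.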


Kumar Gandharv
Kumar Gandharv, PGD in English Journalism (IIMC, Delhi), is setting out on a journey as a tech Journalist at AIM. A keen observer of National and IR-related news.
