5 Data Labelling Projects That Impacted The AI Industry The Most

This article looks at five data labelling projects and the practical problems they set out to solve.

Data labelling is a key process in machine learning. It supplies the annotated examples that models are trained on, and so speeds up the development of artificial intelligence. Data annotation is frequently outsourced to data labelling firms, which annotate images, video, audio and text. Beyond providing outsourced annotation services, these companies have also partnered with other organisations to enable research and innovation in data annotation and AI. This article presents the top five data labelling projects of 2021.

Scale AI and Oxford University’s Reddit Data Set 

Scale AI, a data annotation platform, has collaborated with Oxford University to build a comprehensive dataset of online debates and discourse. Natural language processing (NLP) is still maturing, and NLP models often struggle to understand the context of online exchanges. For example, they tend to mishandle slang, sarcasm, context-specific jokes and the many different ways people interact online.

Scale AI and Oxford University created a dataset, ‘Debagreement’, containing comment-reply interactions across five subreddits: Democrats, Republicans, Black Lives Matter, Brexit, and Climate. Each comment-reply interaction is annotated with an “agree,” “disagree,” “neutral,” or “unsure” label by at least three raters, so that ML models can learn to detect the stance Redditors take in online discourse. The collaborative project has been viewed as a first step towards training socially aware language models.
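The article does not say how the three or more ratings per interaction are combined into one label. As an illustration only, a simple majority vote could look like the sketch below; the function name, the `min_raters` parameter and the strict-majority rule are assumptions for this example, not the method used by Scale AI or Oxford University.

```python
from collections import Counter

# The four stance labels named in the article.
LABELS = ("agree", "disagree", "neutral", "unsure")


def aggregate_stance(ratings, min_raters=3):
    """Return the label chosen by a strict majority of raters, or None.

    `ratings` is a list of label strings, one per rater. The strict-majority
    rule is an assumption for this sketch, not the dataset's documented method.
    """
    if len(ratings) < min_raters:
        raise ValueError(f"expected at least {min_raters} ratings")
    unknown = set(ratings) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    label, votes = Counter(ratings).most_common(1)[0]
    # Require more than half of the raters to agree; otherwise report no consensus.
    return label if votes > len(ratings) / 2 else None


if __name__ == "__main__":
    print(aggregate_stance(["agree", "agree", "disagree"]))
    print(aggregate_stance(["agree", "disagree", "neutral"]))
```

Returning `None` when the raters split keeps ambiguous interactions visible instead of forcing them into a class.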



CrowdAI’s Response to Hurricane Ida 

CrowdAI is a computer vision annotation platform. Hurricane Ida was one of the deadliest and most destructive hurricanes to hit the United States. To support humanitarian aid and disaster response efforts, CrowdAI has open-sourced its Hurricane Ida building damage data. The company produced the data with a multiclass image segmentation model trained to detect wind damage to buildings in aerial imagery.
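To make the idea of a multiclass segmentation output concrete, the sketch below summarises a per-pixel class mask as the share of building pixels marked as damaged. The class ids, the mask layout (a list of rows) and the function name are hypothetical; CrowdAI's actual data format is not described in the article.

```python
from collections import Counter

# Hypothetical class ids for this sketch; the real dataset may use others.
BACKGROUND, UNDAMAGED, DAMAGED = 0, 1, 2


def damaged_building_share(mask):
    """Return the fraction of building pixels labelled as damaged.

    `mask` is a 2D list of class ids, one per pixel. Returns None when the
    mask contains no building pixels at all.
    """
    counts = Counter(pixel for row in mask for pixel in row)
    building_pixels = counts[UNDAMAGED] + counts[DAMAGED]
    if building_pixels == 0:
        return None
    return counts[DAMAGED] / building_pixels


if __name__ == "__main__":
    tile = [
        [0, 1, 1],
        [0, 2, 1],
        [0, 0, 2],
    ]
    print(damaged_building_share(tile))
```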

icometrix and Aidoc Collaboration 

icometrix, the world leader in imaging AI solutions for people with neurological conditions, has announced a partnership with Aidoc, a leading provider of enterprise AI solutions for medical imaging. The two AI platforms are on a joint mission to equip clinicians and radiologists with AI imaging solutions for triaging stroke patients.

To assess the severity of the damage, icometrix’s stroke solution ‘icobrain’ quantifies the volume of the core and perfusion lesions. The Aidoc stroke solution complements this with real-time notifications of patients with suspected large vessel occlusion and intracranial haemorrhage. Together, the two products form one of the most comprehensive neuro suites.

Spacept and SuperAnnotate Collaboration 

Spacept is a platform for AI-automated satellite analysis, which uses satellite images to help prevent fires and power outages caused by extreme weather. Global warming is increasing the need for frequent inspections. SuperAnnotate has helped speed up these inspections by providing high-quality computer vision annotations.

Toloka’s New Dataset 

Toloka, a data annotation platform, has announced a new dataset, ‘IMDB-WIKI-SbS’, that focuses on subjective human responses to improve human-centric AI systems. For the project, Toloka has partnered with Data-Centric AI, and the two are on a joint mission to make data ownership universally available. The dataset contains over 9,000 images, which annotators compared side by side in 250,249 pairs. The project is expected to support the development of systems for retail, e-commerce, recommendations and rankings.
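Side-by-side judgments like these are usually turned into an ordering of the items. With at least 9,000 images there are more than 40 million possible pairs, so 250,249 pairs cover under 1% of them, and some aggregation step is needed. The sketch below ranks items by win rate; the `(left, right, winner)` tuple format and the function name are hypothetical, and win rate is a deliberately simple stand-in rather than the method Toloka uses.

```python
from collections import defaultdict


def rank_by_win_rate(comparisons):
    """Rank items from pairwise judgments, best first.

    `comparisons` is an iterable of (left, right, winner) tuples, where
    `winner` must equal `left` or `right`. Ties in win rate are broken
    by item name so that the result is deterministic.
    """
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for left, right, winner in comparisons:
        if winner not in (left, right):
            raise ValueError(f"winner {winner!r} is not part of the pair")
        appearances[left] += 1
        appearances[right] += 1
        wins[winner] += 1
    rates = {item: wins[item] / appearances[item] for item in appearances}
    return sorted(rates, key=lambda item: (-rates[item], item))


if __name__ == "__main__":
    judgments = [
        ("img_a", "img_b", "img_a"),
        ("img_a", "img_c", "img_a"),
        ("img_b", "img_c", "img_b"),
    ]
    print(rank_by_win_rate(judgments))
```

Win rate ignores how strong each opponent was; a probabilistic model such as Bradley-Terry is a common alternative when pairs are sampled unevenly.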

Emerging Data Labelling Projects

Today, plenty of other data annotation projects are solving real-world problems. With annotated data, companies can evaluate the damage caused by a wildfire or a hurricane. They can monitor the Earth with near-real-time imagery to improve situational awareness. Medical AI projects are saving thousands of lives, with iCovid, a lung imaging AI solution for the analysis of chest CT scans, being a leading example. In each case, data annotation speeds up work that directly benefits people.

Abhishree Choudhary
Abhishree is a budding tech journalist with a UGD in Political Science. In her free time, Abhishree can be found watching French new wave classic films and playing with dogs.
