
5 Data Labelling Projects That Impacted The AI Industry The Most

This article explores emerging data labelling projects that offer innovative ways to make everyday life more convenient for people.


Data labelling is a key process in machine learning. It facilitates the training of machine learning models and accelerates the development of artificial intelligence. Data annotation is frequently outsourced to data labelling firms, which annotate images, video, audio and text. In addition to providing outsourced annotation services, data labelling companies have also partnered with other firms to enable research and innovation in data annotation and AI. This article presents the top five data labelling projects of 2021.

Scale AI and Oxford University’s Reddit Data Set 

Scale AI, a data annotation platform, has collaborated with Oxford University to build a comprehensive dataset on online debates and discourse. Natural language processing (NLP) is still maturing, and NLP models often struggle to understand the context of online exchanges. For example, they frequently fail to interpret slang, sarcasm, context-specific jokes and the sheer diversity of online interactions.

Scale AI and Oxford University created a dataset, ‘Debagreement’, containing comment-reply interactions from five subreddits: Democrats, Republicans, Black Lives Matter, Brexit and Climate. Each comment-reply interaction is labelled “agree,” “disagree,” “neutral” or “unsure” by at least three raters, so that ML models trained on the data can learn to detect the stance Redditors take in online discourse. The project has been viewed as a first step towards training socially aware language models.
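Because every comment-reply pair carries labels from at least three raters, those votes have to be combined into a single training label. The article does not say how Debagreement does this, so the sketch below assumes a simple majority vote: a pair is kept only if one label wins a strict majority and that label is not “unsure”. The function name `aggregate_stance` and the sample votes are hypothetical.

```python
from collections import Counter
from typing import Optional

# The four labels described for Debagreement.
LABELS = {"agree", "disagree", "neutral", "unsure"}


def aggregate_stance(votes: list[str], min_raters: int = 3) -> Optional[str]:
    """Return the majority stance label, or None if the pair should be dropped.

    Hypothetical aggregation rule (not necessarily the one Debagreement uses):
    - require at least `min_raters` votes,
    - require a strict majority (more than half of the votes),
    - drop pairs whose majority label is "unsure".
    """
    if len(votes) < min_raters:
        return None
    unknown = set(votes) - LABELS
    if unknown:
        raise ValueError(f"unexpected labels: {unknown}")
    label, count = Counter(votes).most_common(1)[0]
    if count * 2 <= len(votes) or label == "unsure":
        return None
    return label


if __name__ == "__main__":
    # Hypothetical rater votes for three comment-reply pairs.
    pairs = {
        "pair-1": ["agree", "agree", "neutral"],
        "pair-2": ["agree", "disagree", "neutral"],
        "pair-3": ["unsure", "unsure", "disagree"],
    }
    for pair_id, votes in pairs.items():
        print(pair_id, aggregate_stance(votes))
```

Majority voting is the simplest option; with only three raters per item, a stricter rule (for example, requiring unanimity) would yield cleaner but fewer training examples.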

CrowdAI’s Response to Hurricane Ida 

CrowdAI is a computer vision annotation platform. Hurricane Ida was one of the deadliest and most destructive hurricanes to hit the United States. To support humanitarian aid and disaster response efforts, CrowdAI has open-sourced its Hurricane Ida building damage data. The company produced the data using a multiclass image segmentation model trained to detect wind damage to buildings in aerial imagery.

icometrix and Aidoc Collaboration 

icometrix, a self-described world leader in imaging AI solutions for people with neurological conditions, has announced a partnership with Aidoc, a leading provider of enterprise AI solutions for medical imaging. The two companies are on a joint mission to equip clinicians and radiologists with AI imaging solutions for triaging stroke patients.

To assess the severity of the damage, icometrix’s stroke solution ‘icobrain’ quantifies the volumes of the core and perfusion lesions. Complementing it, Aidoc’s stroke solution sends real-time notifications for patients with suspected large vessel occlusion and intracranial haemorrhage. Together, the two products form one of the most comprehensive neuro imaging suites available.

Spacept and SuperAnnotate Collaboration

Spacept is a platform for AI-automated satellite image analysis that helps prevent fires and power outages caused by extreme weather. As global warming makes extreme weather more common, the need for frequent inspections is growing. SuperAnnotate has helped speed up these inspections by providing high-quality computer vision annotations.

Toloka’s New Dataset 

Toloka, a data annotation platform, has announced a new dataset, ‘IMDB-WIKI-SbS’, that focuses on subjective human responses in order to improve human-centric AI systems. For the project, Toloka has partnered with Data-Centric AI, and the two are on a joint mission to make data ownership universally available. The dataset contains over 9,000 images annotated as 250,249 pairwise comparisons. The project is expected to eventually support the development of systems for retail, e-commerce, recommendation and ranking.
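Pairwise annotations like these are typically turned into a ranking before they feed a recommendation or ranking system. The sketch below assumes each comparison is stored as a `(winner, loser)` tuple, which is a hypothetical record format rather than the dataset’s actual schema, and ranks items by the simple fraction of comparisons they won. Win rate is a deliberately basic stand-in; statistical models such as Bradley-Terry are a more principled way to aggregate pairwise judgements. The function name and image IDs are hypothetical.

```python
from collections import defaultdict


def rank_by_win_rate(comparisons):
    """Rank items by the fraction of pairwise comparisons they won.

    `comparisons` is an iterable of (winner, loser) tuples; this record
    format is a hypothetical stand-in, not the dataset's actual schema.
    """
    wins = defaultdict(int)   # comparisons each item won
    games = defaultdict(int)  # comparisons each item appeared in
    for winner, loser in comparisons:
        wins[winner] += 1
        games[winner] += 1
        games[loser] += 1
    # Sort by win rate (highest first), breaking ties by item id for determinism.
    return sorted(games, key=lambda item: (-wins[item] / games[item], item))


if __name__ == "__main__":
    # Hypothetical judgements: in each tuple the first image "won" the comparison.
    judgements = [
        ("img_a", "img_b"),
        ("img_a", "img_c"),
        ("img_b", "img_c"),
        ("img_c", "img_b"),
    ]
    print(rank_by_win_rate(judgements))
```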

Emerging Data Labelling Projects

Today, plenty of other data annotation projects are solving real-world problems. Through data annotation, companies can evaluate the damage caused by a wildfire or a hurricane, and they can monitor the Earth with near real-time imagery to support situational awareness. Medical AI projects are also helping to save lives, with iCovid, a lung imaging AI solution for analysing chest CT scans, being a leading example. Data annotation projects are thus making everyday life safer and more convenient for people.


Abhishree Choudhary

Abhishree is a budding tech journalist with an undergraduate degree in Political Science. In her free time, she can be found watching classic French New Wave films and playing with dogs.