Data labelling is a key process in machine learning. It supplies the annotated examples that models are trained on and so accelerates the development of artificial intelligence. Data annotation is frequently outsourced to data labelling firms, which annotate images, video, audio and text. Beyond providing outsourced annotation services, data labelling companies have also partnered with other organisations to enable research and innovation in data annotation and AI. This article presents the top five data labelling projects of 2021.
Scale AI and Oxford University’s Reddit Data Set
Scale AI, a data annotation platform, has collaborated with Oxford University to build a comprehensive dataset on online debates and discourse. Natural language processing is still in a nascent stage, and NLP models often struggle to understand the context of online exchanges. For example, they frequently fail to handle slang, sarcasm, context-specific jokes, and the sheer variety of online interactions.
Scale AI and Oxford University created a dataset, ‘Debagreement’, containing comment-reply interactions across five subreddits: Democrats, Republicans, Black Lives Matter, Brexit, and Climate. Each comment-reply interaction is annotated with an “agree,” “disagree,” “neutral,” or “unsure” label by at least three raters, so that ML models trained on it can learn to detect the stance of Redditors in online discourse. The collaborative project has been viewed as a first step in training socially aware language models.
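To make the multi-rater labelling concrete, here is a minimal sketch of how several rater votes on one comment-reply pair could be collapsed into a single label by majority vote. This is an illustration only: the article does not say how the Debagreement authors aggregated votes, and the function name, the tie-handling rule and the sample data below are all assumptions.

```python
from collections import Counter


def majority_label(votes):
    """Return the most common label, or "unsure" when the top labels tie.

    Falling back to "unsure" on a tie is an assumption made for this
    sketch, not a rule taken from the dataset. Expects at least one vote.
    """
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return "unsure"
    return counts[0][0]


# Hypothetical comment-reply pairs, each with three rater votes.
annotations = {
    "pair_1": ["agree", "agree", "neutral"],
    "pair_2": ["disagree", "agree", "neutral"],
}

gold = {pair: majority_label(votes) for pair, votes in annotations.items()}
print(gold)
```

Majority voting is only one option; weighting raters by their historical accuracy is a common alternative when rater quality varies.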
CrowdAI’s Response to Hurricane Ida
CrowdAI is a computer vision annotation platform. Hurricane Ida was one of the deadliest and most destructive hurricanes to hit the United States. To support humanitarian aid and disaster response efforts, CrowdAI has open-sourced its Hurricane Ida building damage data. The company produced the data with a multiclass image segmentation model trained to detect wind damage to buildings in aerial imagery.
icometrix and Aidoc Collaboration
icometrix, the world leader in imaging AI solutions for people with neurological conditions, has announced a partnership with Aidoc, a leading provider of enterprise AI solutions for medical imaging. The two AI platforms are on a joint mission to equip clinicians and radiologists with AI imaging solutions for triaging stroke patients.
To assess the severity of the damage, icometrix’s stroke solution ‘icobrain’ quantifies the volume of the core and perfusion lesions. Complementing this, the Aidoc stroke solution provides real-time notifications of patients with suspected large vessel occlusion and intracranial haemorrhage. The combined effort creates one of the most comprehensive neuro suites.
Spacept and SuperAnnotate Collaboration
Spacept is a platform for AI-automated satellite image analysis, which it uses to help prevent fires and power outages caused by extreme weather. Due to global warming, there is an increasing need for frequent inspections. SuperAnnotate has helped speed up these inspections by providing high-quality computer vision annotations.
Toloka’s New Dataset
Toloka, a data annotation platform, has announced a new dataset, ‘IMDB-WIKI-SbS’, that focuses on subjective human responses to improve human-centric AI systems. For the project, Toloka has partnered with Data-Centric AI, and the two companies are on a joint mission to make data ownership universally available. The dataset contains over 9,000 images, which annotators compared side by side in 250,249 pairs. The project is expected to facilitate the development of systems for retail, e-commerce, recommendations, rankings and similar applications.
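As a rough illustration of why pairwise annotations are useful for ranking systems, the sketch below turns side-by-side judgements into a simple ordering by win rate. It is a minimal sketch under stated assumptions: the tuple layout, the image identifiers and the win-rate heuristic are hypothetical and are not taken from the IMDB-WIKI-SbS release.

```python
from collections import defaultdict


def win_rates(comparisons):
    """Compute each item's share of comparisons won.

    `comparisons` is an iterable of (left, right, winner) tuples, a
    hypothetical layout chosen for this example.
    """
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for left, right, winner in comparisons:
        appearances[left] += 1
        appearances[right] += 1
        wins[winner] += 1
    return {item: wins[item] / appearances[item] for item in appearances}


# Hypothetical side-by-side judgements over three images.
comparisons = [
    ("img_a", "img_b", "img_a"),
    ("img_a", "img_c", "img_c"),
    ("img_b", "img_c", "img_c"),
]

ranking = sorted(win_rates(comparisons).items(), key=lambda kv: kv[1], reverse=True)
print(ranking)
```

Win rate is the simplest possible aggregation; with hundreds of thousands of noisy pairs, a probabilistic model such as Bradley-Terry would usually be a more robust choice.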
Emerging Data Labelling Projects
Plenty of other data annotation projects are tackling real-world problems today. With annotated data, companies can evaluate the damage caused by a wildfire or a hurricane, or monitor the earth through near-real-time imagery to support situational awareness. Medical AI projects are helping to save lives, with iCovid, a lung imaging AI solution for the analysis of chest CT scans, being a leading example. Data annotation projects are thus making a growing range of practical applications possible.