Last updated November 16, 2021

How This Delhi-based Startup Is Building A Time Machine For AI

DataNeuron reported a 97 per cent reduction in the number of paragraphs validated compared with manual human-in-loop labelling.

Published on November 16, 2021

by Amit Raja Naik

With the rapid adoption of AI across enterprises, the need for labelled data has increased significantly over the last few years. However, many companies still rely on manual annotation. This introduces human bias, which affects model accuracy. Studies show that 85 per cent of machine learning (ML) and artificial intelligence (AI) projects fail or do not progress beyond the minimum viable product (MVP) stage because of problems with the quality and quantity of labelled data.

This is where New Delhi and Palo Alto-based DataNeuron comes into play. The company accelerates and automates human-in-loop labelling for developing AI solutions. It automates data labelling, model creation, and end-to-end ML lifecycle management. In other words, it is building a time machine for AI.

Founded in May 2021, DataNeuron was started by Bharath Rao (also the founder of Precily AI), alongside Nishant Chhetri, Rohit Goyal, Anil Advani and Rohit Adlakha. Soon, the company is roping in Sheetal D as a co-founder. It is currently backed by early-stage venture capital firm Windrose Capital and global technology law firm Inventus Law.

However, DataNeuron is not alone in this space. Other global players include Snorkel, Scale AI, IBM Watson, and Appen.

Team DataNeuron said that existing data annotation platforms offer only a limited set of features, such as named entity recognition and ML transcription. In addition, companies do not have a secure platform for data exchange and model creation.

This is where DataNeuron differs from other players in the market.

Some of the key highlights include:

DataNeuron provides fully automated annotation/labelling with minimal validation based on label heuristics and parameters.
The platform needs only a Masterlist, whereas other platforms require the user to define multiple weak-learner labelling functions.
It supports incremental and evolving annotation: the Masterlist can be tweaked in real time, a feature known as Dynamic Masterlist Support.
It is an end-to-end ML lifecycle management platform with AutoML, no-code prediction, and optimisation.
It allows prediction on data without writing any code, and AI/Masterlist Suggestions improve model performance.
The platform supports strategic annotation, using active learning to capture more information from less data.

Here’s a sneak peek of its platform:

DataNeuron’s Masterlist (Source: DataNeuron/Microsoft Azure Marketplace)

DataNeuron Tech Stack

Rao told Analytics India Magazine that DataNeuron runs multiple tech stacks. For user interaction and workflow, the platform is built on the MERN stack (MongoDB, Express, React, Node.js). For infrastructure, it is deployed on Microsoft Azure, and data is stored in secure Azure cloud storage.

“We have various algorithms from unsupervised to context-based filtering algorithms built grounds-up to automate the entire data annotation pipeline,”
– Bharath Rao, founder and CEO at DataNeuron.


Tech behind DataNeuron

Rao said DataNeuron uses self-supervised learning and has made a significant breakthrough with its automated learning platform (ALP), which automates data labelling and eliminates human-in-the-loop annotation.

DataNeuron ALP produces labelled data using an ensemble algorithm that analyses the Masterlist and the relevant label parameters. Interestingly, the platform does not require any pre-training or rules. “We have ‘active learning’ to retrain the model from the validations and reduce the user interaction,” said Rao.
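
To illustrate the general idea behind active learning, here is a minimal sketch of uncertainty sampling, one common strategy. It is not DataNeuron's implementation, which the company has not published; the function name, the paragraph IDs, and the probabilities are all hypothetical.

```python
from typing import Dict, List, Tuple


def select_for_validation(
    predictions: Dict[str, Dict[str, float]],
    budget: int,
) -> List[Tuple[str, float]]:
    """Return the `budget` paragraphs whose top label is least certain."""
    # Confidence of a paragraph = probability of its most likely label.
    scored = [(pid, max(probs.values())) for pid, probs in predictions.items()]
    # Least-confident paragraphs come first; a human validates these.
    scored.sort(key=lambda item: item[1])
    return scored[:budget]


# Hypothetical model outputs for four paragraphs (made-up numbers).
predictions = {
    "p1": {"tax": 0.96, "legal": 0.04},
    "p2": {"tax": 0.55, "legal": 0.45},
    "p3": {"tax": 0.10, "legal": 0.90},
    "p4": {"tax": 0.51, "legal": 0.49},
}

queue = select_for_validation(predictions, budget=2)
print(queue)
```

Only the paragraphs the model is least sure about are sent to a person, and the validated labels are then used to retrain the model. This is how active learning can cut the number of items a human has to review.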

Further, he claimed that the platform cut the number of paragraphs validated by 97 per cent compared with manual human-in-loop labelling. “We have tested on multiple domains and datasets: ALP has achieved comparable accuracy (within ~1-2 per cent margins) to the state-of-the-art solutions with just 2 per cent of the labelled data when compared to human-in-loop labelling,” said Rao.

DataNeuron says the platform delivers 90 per cent first-pass machine accuracy relative to manual effort. The team also claimed that it reduces project staffing by 70-90 per cent and delivers an RoI of 200-400 per cent.

Eyes Expansion

According to Grand View Research, the global data annotation tools market is expected to grow at a CAGR of 27.1 per cent from 2021 to 2028. The market was valued at $494 million in 2020. The sector’s growth is driven by the massive adoption of data annotation tools in the automotive, retail, and healthcare sectors.
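
For a sense of what that growth rate implies, the sketch below compounds the 2020 valuation at the quoted CAGR. It assumes 2020 is the base year and that growth compounds once a year through 2028; the function name is illustrative, and the printed figures are a projection from those two numbers alone, not Grand View Research's own forecast.

```python
def project_market_size(base_value: float, cagr: float, years: int) -> float:
    """Compound `base_value` at annual rate `cagr` for `years` years."""
    return base_value * (1.0 + cagr) ** years


BASE_2020_USD_M = 494.0  # market size in 2020, in millions of dollars
CAGR = 0.271             # 27.1 per cent a year

# Assumption: each year from 2021 to 2028 adds one compounding period.
for year in range(2021, 2029):
    size = project_market_size(BASE_2020_USD_M, CAGR, year - 2020)
    print(f"{year}: ${size:,.0f} million")
```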

DataNeuron is looking to target customers from ITeS, data science, knowledge-based industries, life sciences, and tax and legal domains. It is now available on Microsoft Azure Marketplace. “We are currently acquiring customers through direct contacts, partners, through leadership and advisors,” said Rao. However, it is planning to start targeted advertising campaigns in 2022.

Road Ahead

“We are currently working towards auto-validation to further reduce the human-in-loop validation based on accuracy/confidence,” said Rao.
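
One simple way confidence-based auto-validation could work is to accept predictions above a threshold and send the rest to a human. The sketch below illustrates that idea only; the threshold, function name, and data are assumptions rather than details DataNeuron has shared.

```python
from typing import Dict, List, Tuple


def route_by_confidence(
    predictions: Dict[str, Tuple[str, float]],
    threshold: float = 0.9,
) -> Tuple[List[str], List[str]]:
    """Split paragraph IDs into auto-accepted and needs-review lists."""
    auto_accepted: List[str] = []
    needs_review: List[str] = []
    for pid, (_label, confidence) in predictions.items():
        # Predictions at or above the threshold skip human validation.
        if confidence >= threshold:
            auto_accepted.append(pid)
        else:
            needs_review.append(pid)
    return auto_accepted, needs_review


# Hypothetical (label, confidence) pairs for three paragraphs.
predictions = {
    "p1": ("tax", 0.97),
    "p2": ("legal", 0.62),
    "p3": ("legal", 0.91),
}

accepted, review = route_by_confidence(predictions, threshold=0.9)
print("auto-accepted:", accepted)
print("needs review:", review)
```

Raising the threshold sends more items to human reviewers and lowers the risk of accepting a wrong label; lowering it does the opposite.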

He said they are also launching an Advanced Masterlist to support subjective labelling of datasets (where clear class distribution is missing), custom NER, model versioning to support datasets that require constant changes to support incremental learning cycles, and lastly, multi-user validation (weighted voting).
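
Weighted voting among several validators can be sketched in a few lines. This is a generic illustration, not DataNeuron's planned design: the validator names and weights are made up, and a validator with no recorded weight is assumed to count as 1.0.

```python
from collections import defaultdict
from typing import Dict


def weighted_vote(votes: Dict[str, str], weights: Dict[str, float]) -> str:
    """Return the label with the highest total validator weight."""
    totals: Dict[str, float] = defaultdict(float)
    for validator, label in votes.items():
        # Assumption: validators without a recorded weight count as 1.0.
        totals[label] += weights.get(validator, 1.0)
    # On a tie, the label that was voted for first wins.
    return max(totals, key=lambda label: totals[label])


# Hypothetical validators; a weight might reflect past accuracy.
votes = {"alice": "tax", "bob": "legal", "carol": "legal"}
weights = {"alice": 0.9, "bob": 0.4, "carol": 0.3}

print(weighted_vote(votes, weights))
```

Here the single high-weight validator outvotes two lower-weight ones, which is the difference between weighted voting and a simple majority.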

DataNeuron believes that the focus within machine learning will shift from algorithms to high-value and explainable data. “We are continuing our research in artificial general intelligence (AGI) to enable 100 per cent automation in data labelling required to scale supervised learning-based algorithms,” said Rao.

Further, he said their goal is to scale the development of AI models by providing better data, explainability of AI and reducing the opinion bias caused by human-in-loop labelling. “We also want to scale DataNeuron’s capabilities beyond NLP, with possible applications in computer vision, audio, and image labelling,” said Rao, sharing the roadmap.

PS: The story was written using a keyboard.


Amit Raja Naik

Amit Raja Naik is a seasoned technology journalist who covers everything from data science to machine learning and artificial intelligence for Analytics India Magazine, where he examines the trends, challenges, ideas, and transformations across the industry.
