How This Delhi-based Startup Is Building A Time Machine For AI

DataNeuron witnessed a 97 per cent reduction in the number of paragraphs validated when compared to manual human-in-loop labelling.

With the rapid adoption of AI across enterprises, the need for labelled data has increased significantly over the last few years. However, many companies still rely on manual annotation, which introduces human bias and hurts model accuracy. Studies show that 85 per cent of machine learning (ML) and artificial intelligence (AI) projects fail or do not progress beyond the minimum viable product (MVP) stage because of the quality and quantity of labelled data.

This is where New Delhi and Palo Alto-based DataNeuron comes into play. The company helps accelerate and automate human-in-loop labelling for developing AI solutions. It automates data labelling, the creation of models, and end-to-end lifecycle management of ML. In other words, it is building a time machine for AI.

Founded in May 2021, DataNeuron was started by Bharath Rao (also the founder of Precily AI), alongside Nishant Chhetri, Rohit Goyal, Anil Advani and Rohit Adlakha. The company will soon bring on Sheetal D as a co-founder. It is currently backed by early-stage venture capital firm Windrose Capital and global technology law firm Inventus Law.

However, DataNeuron is not alone in this space. Other global players include Snorkel, Scale AI, IBM Watson, and Appen.

The DataNeuron team said that existing data annotation platforms provide only a limited set of features, such as named entity recognition and ML transcription. In addition, companies do not have a secure platform for data exchange and model creation.

This is where DataNeuron differs from other players in the market.

Some of the key highlights include:

  • DataNeuron provides fully automated annotation/labelling with minimal validation, based on label heuristics and parameters.
  • The platform needs only a Masterlist, whereas other platforms require the user to define multiple weak-learner labelling functions.
  • Annotation is incremental and evolving: the platform tweaks the Masterlist in real time, a feature known as Dynamic Masterlist Support.
  • It is an end-to-end ML lifecycle management platform with AutoML, no-code prediction, and optimisation.
  • It allows data prediction without writing any code, and AI/Masterlist Suggestions improve model performance.
  • The platform supports strategic annotation, using active learning to capture more information from less data.
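
The article does not describe the Masterlist's actual format or how the platform scores it. As a rough illustration only, assuming a Masterlist is a mapping from class names to keywords, the idea of labelling from a single list instead of many hand-written labelling functions might be sketched as below. The `MASTERLIST` contents and the `label_paragraph` helper are hypothetical and are not DataNeuron's implementation.

```python
import re
from collections import Counter

# Hypothetical Masterlist: class name -> keywords.
# This structure is an assumption made for illustration.
MASTERLIST = {
    "tax": ["tax", "deduction", "gst", "filing"],
    "legal": ["contract", "clause", "liability", "court"],
}


def label_paragraph(text, masterlist=MASTERLIST):
    """Return (label, score) for the class with the most keyword hits.

    Returns (None, 0) when no keyword matches, so the paragraph can be
    left for a human validator instead of being guessed.
    """
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {
        label: sum(tokens[keyword] for keyword in keywords)
        for label, keywords in masterlist.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        return None, 0  # abstain: nothing in the Masterlist matched
    return best, scores[best]


print(label_paragraph("The contract includes a liability clause."))
print(label_paragraph("Lunch was served at noon."))
```

In this toy version, changing the labelling behaviour only means editing the keyword lists, which is the kind of workflow a single evolving Masterlist suggests.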

Here’s a sneak peek of its platform:

DataNeuron’s Masterlist (Source: DataNeuron/Microsoft Azure Marketplace)

DataNeuron Tech Stack 

Rao told Analytics India Magazine that they have multiple tech stacks running within DataNeuron. For user interaction and workflow, its platform is built on the MERN stack. For infrastructure requirements, the platform is deployed on Microsoft Azure. Data is stored on secured Azure cloud storage.

“We have various algorithms from unsupervised to context-based filtering algorithms built grounds-up to automate the entire data annotation pipeline,” said Bharath Rao, founder and CEO at DataNeuron.

Check out the complete details of how DataNeuron works here

The DataNeuron Pipeline (Source: DataNeuron)  

Tech behind DataNeuron

Rao said DataNeuron uses self-supervised learning and has made a significant breakthrough with its automated learning platform (ALP), which automates data labelling and eliminates human-in-the-loop annotation. 

DataNeuron ALP produces labelled data using an ensemble algorithm that analyses the Masterlist and the relevant label parameters. Interestingly, its platform does not require any pre-training or rules. “We have ‘active learning’ to retrain the model from the validations and reduce the user interaction,” said Rao.
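
The article does not say which active learning strategy ALP uses. One common approach is margin-based uncertainty sampling, where only the predictions the model is least sure about are sent to a human for validation. The sketch below illustrates that general technique under this assumption; `select_for_validation` and the sample probabilities are hypothetical.

```python
def select_for_validation(probabilities, budget):
    """Return indices of the `budget` least certain predictions.

    `probabilities` is a list of per-item class probability lists, each
    with at least two classes. Certainty is measured as the margin
    between the two highest class probabilities: a small margin means
    the model is torn between two labels.
    """
    def margin(probs):
        top_two = sorted(probs, reverse=True)[:2]
        return top_two[0] - top_two[1]

    ranked = sorted(
        range(len(probabilities)),
        key=lambda index: margin(probabilities[index]),
    )
    return ranked[:budget]


predictions = [
    [0.98, 0.02],  # confident
    [0.55, 0.45],  # uncertain
    [0.70, 0.30],
    [0.51, 0.49],  # most uncertain
]
print(select_for_validation(predictions, budget=2))  # [3, 1]
```

The validated items would then be added to the training set and the model retrained, so that each round of human effort goes to the paragraphs where it changes the model most.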

Further, he claimed that their platform saw a 97 per cent reduction in the number of paragraphs validated when compared to manual human-in-loop labelling. “We have tested on multiple domains and datasets: ALP has achieved comparable accuracy (within ~1-2 per cent margins) to the state-of-the-art solutions with just 2 per cent of the labelled data when compared to human-in-loop labelling,” said Rao.

DataNeuron is set to increase efficiency by delivering 90 per cent first-pass machine accuracy relative to manual effort. The team said it also reduces project staffing by 70-90 per cent and delivers an RoI of 200-400 per cent.

Eyes Expansion 

According to Grand View Research, the global data annotation tools market is expected to grow at a CAGR of 27.1 per cent from 2021 to 2028. In 2020, the market was valued at $494 million. The sector’s growth is driven by the massive adoption of data annotation tools in the automotive, retail, and healthcare sectors.
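
To put the growth rate in perspective, the figures above can be compounded forward. The sketch below assumes the 27.1 per cent rate applies to the 2020 base for each of the eight years from 2021 to 2028; the function name is illustrative and the result is simple arithmetic, not a figure quoted from the report.

```python
def project_market_size(base_value, cagr, years):
    """Compound `base_value` at an annual growth rate `cagr` for `years` years."""
    return base_value * (1 + cagr) ** years


base_2020 = 494.0  # market size in 2020, USD million (from the article)
cagr = 0.271       # 27.1 per cent per year (from the article)
years = 2028 - 2020

size_2028 = project_market_size(base_2020, cagr, years)
print(f"Implied 2028 market size: ${size_2028:,.0f} million")
```

Under these assumptions, the implied 2028 market size is about $3.36 billion, nearly seven times the 2020 value.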

DataNeuron is looking to target customers from ITeS, data science, knowledge-based industries, life sciences, and tax and legal domains. It is now available on Microsoft Azure Marketplace. “We are currently acquiring customers through direct contacts, partners, through leadership and advisors,” said Rao. The company also plans to start targeted advertising campaigns in 2022.

Road Ahead 

“We are currently working towards auto-validation to further reduce the human-in-loop validation based on accuracy/confidence,” said Rao.
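
The article gives no detail on how auto-validation will work. A simple way to picture confidence-based validation is a threshold: predictions above it are accepted automatically, and the rest go to a human. The sketch below assumes that approach; the `split_by_confidence` helper, the 0.9 threshold and the sample data are all hypothetical.

```python
def split_by_confidence(predictions, threshold=0.9):
    """Split predictions into auto-accepted and needs-review lists.

    `predictions` is a list of (item_id, label, confidence) tuples.
    The default threshold is an arbitrary illustrative value.
    """
    auto_accepted, needs_review = [], []
    for item_id, label, confidence in predictions:
        target = auto_accepted if confidence >= threshold else needs_review
        target.append((item_id, label))
    return auto_accepted, needs_review


sample = [
    ("p1", "tax", 0.97),
    ("p2", "legal", 0.62),
    ("p3", "tax", 0.90),
]
accepted, review = split_by_confidence(sample, threshold=0.9)
print("auto-accepted:", accepted)
print("needs review:", review)
```

Raising the threshold sends more items to human review and lowers the risk of accepting a wrong label; lowering it does the opposite.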

He said they are also launching an Advanced Masterlist to support subjective labelling of datasets (where clear class distribution is missing), custom NER, model versioning to support datasets that require constant changes to support incremental learning cycles, and lastly, multi-user validation (weighted voting). 
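
The weighting scheme for multi-user validation is not described in the article. As a generic illustration of weighted voting, each validator's label can be counted with a weight (for example, reflecting their past accuracy), and the label with the highest total wins. The `weighted_vote` helper and the weights below are hypothetical.

```python
from collections import defaultdict


def weighted_vote(votes):
    """Return the label with the highest total weight.

    `votes` is a non-empty list of (label, weight) pairs, one per validator.
    """
    totals = defaultdict(float)
    for label, weight in votes:
        totals[label] += weight
    return max(totals, key=totals.get)


votes = [("legal", 0.9), ("tax", 0.6), ("tax", 0.5)]
print(weighted_vote(votes))  # tax: 1.1 in total beats legal: 0.9
```

Here, two lower-weight validators who agree outvote a single higher-weight validator, which is the main difference from trusting only the most reliable annotator.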

DataNeuron believes that the focus within machine learning will shift from algorithms to high-value and explainable data. “We are continuing our research in artificial general intelligence (AGI) to enable 100 per cent automation in data labelling required to scale supervised learning-based algorithms,” said Rao. 

Further, he said their goal is to scale the development of AI models by providing better data, explainability of AI and reducing the opinion bias caused by human-in-loop labelling. “We also want to scale DataNeuron’s capabilities beyond NLP, with possible applications in computer vision, audio, and image labelling,” said Rao, sharing the roadmap. 

PS: The story was written using a keyboard.

Amit Raja Naik

Amit Raja Naik is a seasoned technology journalist who covers everything from data science to machine learning and artificial intelligence for Analytics India Magazine, where he examines the trends, challenges, ideas, and transformations across the industry.