With the rapid adoption of AI across enterprises, the need for labelled data has increased significantly over the last few years. However, many companies still rely on manual annotation. This introduces human bias, which affects model accuracy. Studies show that 85 per cent of machine learning (ML) and artificial intelligence (AI) projects fail or do not progress beyond minimum viable product (MVP) due to the quality and quantity of labelled data.
This is where New Delhi and Palo Alto-based DataNeuron comes into play. The company helps in accelerating and automating human-in-loop labelling for developing AI solutions. It automates data labelling, the creation of models, and end-to-end lifecycle management of ML. In other words, it is building a time machine for AI.
Founded in May 2021, DataNeuron was started by Bharath Rao (also the founder of Precily AI), alongside Nishant Chhetri, Rohit Goyal, Anil Advani and Rohit Adlakha. The company will soon bring in Sheetal D as a co-founder. It is currently backed by early-stage venture capital firm Windrose Capital and global technology law firm Inventus Law.
However, DataNeuron is not alone in this space. Other global players include Snorkel, Scale AI, IBM Watson, and Appen.
Team DataNeuron said that existing data annotation platforms provide only a limited set of features, such as named entity recognition and ML transcription. In addition, companies do not have a secure platform for data exchange and model creation.
This is where DataNeuron differs from other players in the market.
Some of the key highlights include:
- DataNeuron provides fully automated annotation/labelling with minimal validation based on label heuristics and parameters.
- The platform needs only a Masterlist, unlike other platforms, which require the user to define multiple weak learner labelling functions.
- Annotation is incremental and evolving: the Masterlist can be tweaked in real time, a feature known as Dynamic Masterlist Support.
- It is an end-to-end ML lifecycle management platform with AutoML, no-code prediction, and optimisation.
- It allows data prediction without writing any code, and AI/Masterlist Suggestions improve the model performance.
- The platform supports strategic annotation, using active learning to capture more information from less data.
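For contrast, the weak learner labelling functions that the list above attributes to other platforms can be sketched roughly as follows. This is a minimal illustration in plain Python, not any vendor's API: the function names, the keyword rules and the toy "tax" versus "legal" task are all hypothetical.

```python
from collections import Counter

ABSTAIN = None  # a labelling function returns this when it has no opinion

# Hypothetical keyword rules for a toy "tax" vs "legal" task.
# None of these come from DataNeuron or from any real platform.
def lf_mentions_tax(text):
    return "tax" if "tax" in text.lower() else ABSTAIN

def lf_mentions_court(text):
    return "legal" if "court" in text.lower() else ABSTAIN

def lf_mentions_deduction(text):
    return "tax" if "deduction" in text.lower() else ABSTAIN

LABELLING_FUNCTIONS = [lf_mentions_tax, lf_mentions_court, lf_mentions_deduction]

def weak_label(text, functions=LABELLING_FUNCTIONS):
    """Combine noisy votes by simple majority; return ABSTAIN if nobody votes.

    On a tie, the label that received its first vote earliest wins.
    """
    votes = [label for label in (f(text) for f in functions) if label is not ABSTAIN]
    if not votes:
        return ABSTAIN
    return Counter(votes).most_common(1)[0][0]
```

Every rule here has to be written and maintained by the user, which is the overhead the Masterlist approach is said to avoid.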
Here’s a sneak peek of its platform:
DataNeuron Tech Stack
Rao told Analytics India Magazine that they have multiple tech stacks running within DataNeuron. For user interaction and workflow, its platform is built on the MERN framework. For infrastructure requirements, the platform is deployed on Microsoft Azure. Data is stored on Azure secured cloud storage.
“We have various algorithms, from unsupervised to context-based filtering algorithms, built from the ground up to automate the entire data annotation pipeline,” said Bharath Rao, founder and CEO at DataNeuron.
Tech behind DataNeuron
Rao said DataNeuron uses self-supervised learning and has made a significant breakthrough with its automated learning platform (ALP), which automates data labelling and eliminates human-in-the-loop annotation.
DataNeuron ALP produces labelled data using an ensemble algorithm that analyses the Masterlist and the relevant label parameters. Interestingly, its platform does not require any pre-training or rules. “We have ‘active learning’ to retrain the model from the validations and reduce the user interaction,” said Rao.
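The article does not describe how ALP chooses what to send for validation. As a rough illustration of the general idea behind active learning, the sketch below uses margin-based uncertainty sampling, a common textbook strategy that is assumed here and not confirmed as DataNeuron's method. The function names, paragraph ids and probabilities are hypothetical.

```python
def margin(probabilities):
    """Gap between the two most likely classes (needs at least two classes).

    A small gap means the model is unsure which label applies.
    """
    top_two = sorted(probabilities, reverse=True)[:2]
    return top_two[0] - top_two[1]

def pick_for_validation(predictions, budget):
    """Return the ids of the `budget` paragraphs the model is least sure about."""
    ranked = sorted(predictions, key=lambda pid: margin(predictions[pid]))
    return ranked[:budget]

# Hypothetical class probabilities for four unlabelled paragraphs.
predictions = {
    "p1": [0.98, 0.01, 0.01],
    "p2": [0.40, 0.35, 0.25],
    "p3": [0.55, 0.44, 0.01],
    "p4": [0.70, 0.20, 0.10],
}

to_validate = pick_for_validation(predictions, budget=2)
print(to_validate)
```

Only the selected paragraphs go to a human. Their validated labels are added to the training set and the model is retrained, so each round of validation is spent where the model is weakest.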
Further, he claimed that their platform saw a reduction of 97 per cent in the number of paragraphs validated when compared to manual human-in-loop labelling. “We have tested on multiple domains and datasets: ALP has achieved comparable accuracy (within ~1-2 per cent margins) to the state-of-the-art solutions with just 2 per cent of the labelled data when compared to human-in-loop labelling,” said Rao.
DataNeuron aims to increase efficiency by delivering 90 per cent first-pass machine accuracy relative to manual effort. The team said this reduces project staffing by 70-90 per cent and yields an RoI of 200-400 per cent.
According to Grand View Research, the global data annotation tools market was valued at $494 million in 2020 and is expected to grow at a CAGR of 27.1 per cent from 2021 to 2028. The sector’s growth is driven by the massive adoption of data annotation tools in the automotive, retail, and healthcare sectors.
DataNeuron is looking to target customers from ITeS, data science, knowledge-based industries, life sciences, and tax and legal domains. It is now available on Microsoft Azure Marketplace. “We are currently acquiring customers through direct contacts, partners, through leadership and advisors,” said Rao. However, it is planning to start targeted advertising campaigns in 2022.
“We are currently working towards auto-validation to further reduce the human-in-loop validation based on accuracy/confidence,” said Rao.
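Confidence-based auto-validation of the kind Rao describes could, in principle, look like the sketch below. It is an assumption about the general technique, not DataNeuron's implementation: the function name, the 0.9 threshold and the sample predictions are all hypothetical.

```python
def split_by_confidence(predictions, threshold=0.9):
    """Auto-accept confident predictions and route the rest to human review.

    `predictions` is a list of (item_id, label, confidence) tuples.
    The 0.9 default is an arbitrary illustrative threshold.
    """
    auto_accepted, needs_review = [], []
    for item_id, label, confidence in predictions:
        if confidence >= threshold:
            auto_accepted.append((item_id, label))
        else:
            needs_review.append((item_id, label))
    return auto_accepted, needs_review

# Hypothetical model output for three paragraphs.
sample = [
    ("p1", "tax", 0.97),
    ("p2", "legal", 0.62),
    ("p3", "tax", 0.91),
]

accepted, review = split_by_confidence(sample)
```

Raising the threshold sends more items to reviewers and lowers the risk of accepting a wrong label, so the setting is a trade-off between human effort and label quality.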
He said they are also launching an Advanced Masterlist to support subjective labelling of datasets (where clear class distribution is missing), custom NER, model versioning to support datasets that require constant changes to support incremental learning cycles, and lastly, multi-user validation (weighted voting).
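The article gives no detail on how the planned multi-user validation would weight its voters. A minimal sketch of weighted voting in general, with hypothetical validator names and weights, might look like this:

```python
from collections import defaultdict

def weighted_vote(votes, weights):
    """Pick the label with the highest total validator weight.

    `votes` maps validator -> label; `weights` maps validator -> weight.
    Validators missing from `weights` count as 1.0 (an assumption of this sketch).
    """
    if not votes:
        raise ValueError("at least one vote is required")
    totals = defaultdict(float)
    for validator, label in votes.items():
        totals[label] += weights.get(validator, 1.0)
    return max(totals, key=totals.get)

# Hypothetical example: one senior reviewer outweighs two junior ones.
votes = {"alice": "tax", "bob": "legal", "carol": "legal"}
weights = {"alice": 3.0, "bob": 1.0, "carol": 1.0}

winner = weighted_vote(votes, weights)
```

With equal weights the same votes would resolve to the majority label instead, which is the difference weighting makes.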
DataNeuron believes that the focus within machine learning will shift from algorithms to high-value and explainable data. “We are continuing our research in artificial general intelligence (AGI) to enable 100 per cent automation in data labelling required to scale supervised learning-based algorithms,” said Rao.
Further, he said their goal is to scale the development of AI models by providing better data, explainability of AI and reducing the opinion bias caused by human-in-loop labelling. “We also want to scale DataNeuron’s capabilities beyond NLP, with possible applications in computer vision, audio, and image labelling,” said Rao, sharing the roadmap.