Guide To Dataturks – The Human-in-the-Loop Data Annotation Platform

Dataturks makes machine learning data annotation easier with its AutoML features and human-in-the-loop interactions. The AI startup was founded in 2018 by Gajendra Dadheech and Mohan Gupta, both of whom previously held executive positions at Flipkart. Initially headquartered in Bangalore, India, it was later acquired by Walmart Labs.

Dataturks lets companies, developers and researchers upload image, video or text data and quickly start annotating it through the web interface, improving the quality of the datasets used for training. Projects can be staffed by team members or by third-party annotators supplied by the platform. A separate team then reviews the annotations, and the platform generates real-time reports. Users can also use pre-annotated datasets to auto-label their data or adapt them to their own use cases, and some of these open datasets are available on Kaggle. Dataturks claims a tenfold improvement in ROI (return on investment) for data labelling.

Dataturks has open-sourced its full platform on GitHub. The cloud version provides the complete service on the web, but some companies need to run annotation in-house. For them, a Docker-based version works fully offline while offering the same services as the web version. It runs on both Linux and Windows. A guide to the complete process is provided in this blog.

Image and Video Annotation

Images can be uploaded as a zip file, in which case they are stored on Dataturks servers and the exported annotations refer to them by their Dataturks URLs. Alternatively, users can upload a text file in which each line is the public URL of an image to be tagged; Dataturks then displays each image in the tool directly from its URL. The available tools cover image classification, image segmentation, object detection with polygons and bounding boxes, and OCR. Annotations can be exported in Pascal VOC or TensorFlow format.
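Both upload routes can be prepared with a short script. The sketch below is a minimal Python example, assuming a local folder of images for the zip route and a list of candidate URLs for the text-file route; the function names and the accepted file suffixes are illustrative choices, not part of Dataturks.

```python
import zipfile
from pathlib import Path
from urllib.parse import urlparse

# Assumption: adjust to the image formats your project actually uses.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}


def build_url_list(urls):
    """Return the contents of a URL-list upload file: one public http(s) image URL per line."""
    lines = []
    for url in urls:
        url = url.strip()
        parsed = urlparse(url)
        # The tool loads each image straight from its URL, so keep only absolute http(s) links.
        if parsed.scheme in ("http", "https") and parsed.netloc:
            lines.append(url)
    return "".join(line + "\n" for line in lines)


def zip_images(image_dir, zip_path):
    """Pack every image file directly inside image_dir into one zip archive; return how many were added."""
    count = 0
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(Path(image_dir).iterdir()):
            if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
                archive.write(path, arcname=path.name)  # store flat, without parent folders
                count += 1
    return count
```

Writing build_url_list(...) to a .txt file gives the URL-list upload; zip_images produces the archive for the zip route.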

Image Classification

Objects with Multiple Labels Marked by Bounding Boxes

Image Segmentation: Polygons
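To show what a Pascal VOC export holds, the sketch below writes one image's bounding boxes as VOC-style XML and reduces a segmentation polygon to its enclosing box. It follows the common VOC layout (filename, size, object, bndbox) rather than Dataturks' own exporter, whose exact fields may differ; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def polygon_to_bbox(points):
    """Smallest axis-aligned box (xmin, ymin, xmax, ymax) enclosing a polygon's (x, y) vertices."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def to_pascal_voc(filename, width, height, objects, depth=3):
    """Build a VOC-style annotation; objects is a list of (label, (xmin, ymin, xmax, ymax)) in pixels."""
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    for tag, value in (("width", width), ("height", height), ("depth", depth)):
        ET.SubElement(size, tag).text = str(value)
    for label, (xmin, ymin, xmax, ymax) in objects:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = label
        ET.SubElement(obj, "difficult").text = "0"
        box = ET.SubElement(obj, "bndbox")
        # VOC stores integer pixel coordinates, so round any fractional box corners.
        for tag, value in (("xmin", xmin), ("ymin", ymin), ("xmax", xmax), ("ymax", ymax)):
            ET.SubElement(box, tag).text = str(int(round(value)))
    return ET.tostring(root, encoding="unicode")
```

A polygon can be passed as a box with to_pascal_voc(name, w, h, [("cat", polygon_to_bbox(points))]), at the cost of losing its exact outline.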

Text Annotation

Browsers cannot display the contents of PDF or Doc files as raw text, and they can fail on special characters in plain-text files. Dataturks provides a utility tool that converts such files to raw text and applies the preprocessing needed to make the text display correctly in a web browser. The text annotation options include document annotation (PDF, Doc, CSV or any other text format), sub-labels, named entity recognition (NER), part-of-speech (PoS) tagging, text classification, text summarization and content moderation. Annotations can be exported as JSON, in a standard NLP format, or in spaCy format.
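The kind of preprocessing involved can be illustrated with a small Python function. This is a hypothetical sketch (decoding, Unicode normalization, line-ending and control-character cleanup), not Dataturks' actual utility, and it assumes the input is UTF-8 text already extracted from the PDF or Doc file.

```python
import unicodedata


def to_browser_safe_text(raw: bytes) -> str:
    """Decode bytes and normalize the text so it renders predictably in a browser-based tool."""
    # Undecodable bytes become U+FFFD instead of raising an error.
    text = raw.decode("utf-8", errors="replace")
    # NFKC folds compatibility forms, e.g. a non-breaking space to a space and the "fi" ligature to "fi".
    text = unicodedata.normalize("NFKC", text)
    # Unify Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Drop control and invisible format characters (Unicode category C*), keeping newlines and tabs.
    return "".join(ch for ch in text if ch in "\n\t" or not unicodedata.category(ch).startswith("C"))
```

Note that NFKC also changes some visible characters (for example superscript digits), so a project that must preserve them would use NFC instead.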

Overlapping and Multi-label Entities

NER/PoS Tagging

Text Summarization
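NER exports are commonly converted to spaCy's training format before model training. The sketch below assumes a JSON-lines layout of the kind found in Dataturks' public NER datasets: a content string plus an annotation list whose entries hold a label list and points with start, end (assumed inclusive) and text. Verify this against your own export before relying on it; the function name is hypothetical.

```python
import json


def dataturks_line_to_spacy(line):
    """Convert one JSON line of an (assumed) Dataturks NER export into spaCy's (text, {"entities": [...]}) form."""
    record = json.loads(line)
    text = record["content"]
    entities = []
    # Assumption: unannotated items may carry null instead of an empty list.
    for annotation in record.get("annotation") or []:
        labels = annotation["label"]
        if isinstance(labels, str):
            labels = [labels]
        for point in annotation["points"]:
            start = point["start"]
            end = point["end"] + 1  # assumed inclusive end offset; spaCy expects an exclusive end
            # Multi-label entities yield one span per label. spaCy's NER needs non-overlapping
            # spans, so filter overlaps before training if the project allowed them.
            for label in labels:
                entities.append((start, end, label))
    return text, {"entities": sorted(entities)}
```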

The project report provides real-time analysis and visualization of project statistics, including each team member's contributions.
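The per-member figures in such a report come down to simple counting. As a hypothetical illustration (the annotator and status record fields are invented for this sketch and are not Dataturks' schema), the function below computes how many completed items each team member contributed and their share of the total.

```python
from collections import Counter


def contributions(items):
    """Return (annotator, completed_count, share) tuples, largest contributor first."""
    # Hypothetical record shape: {"annotator": name, "status": "done" | other}.
    done = Counter(item["annotator"] for item in items if item["status"] == "done")
    total = sum(done.values())
    if total == 0:
        return []
    return [(name, count, count / total) for name, count in done.most_common()]
```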

API

Authentication Request

Every API request is authenticated by sending the account's API key and secret in the key and secret headers.

Project Creation

curl --location --request POST 'https://dataturks.com/dtAPI/v1/Jayita/create_my_Project' \
--header 'Content-Type: application/json' \
--header 'key: KEY' \
--header 'secret: SECRET' \
--data-raw '{
"name_of_project": "My Project",
"task_Type": "NER_POS_TAGGING",
"access_Type" : "RESTRICTED",
"short_Descrip": "Short description of the project",
"details": "Detailed project description",
"rules" : "{\"tags\": \"Tag1, Tag2, Tag3, Tag4\",\"instructions\": \"Tag all relevant entities present in the text\"}"
}'
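The same project-creation call can be built with Python's standard library. This sketch reuses the endpoint, headers and field names from the curl command; the build_request helper is hypothetical, and the request is only constructed, not sent. The rules field is a JSON document embedded as a string, which is why it appears escaped in the curl body.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://dataturks.com/dtAPI/v1"  # taken from the curl examples in this article

# Field names copied from the curl body. "rules" is itself JSON, sent as a string.
EXAMPLE_PAYLOAD = {
    "name_of_project": "My Project",
    "task_Type": "NER_POS_TAGGING",
    "access_Type": "RESTRICTED",
    "short_Descrip": "Short description of the project",
    "details": "Detailed project description",
    "rules": json.dumps({"tags": "Tag1, Tag2, Tag3, Tag4",
                         "instructions": "Tag all relevant entities present in the text"}),
}


def build_request(path_segments, api_key, api_secret, payload=None):
    """Build (but do not send) an authenticated POST to BASE_URL/<segment>/<segment>/..."""
    # Percent-encode each segment so names containing spaces still form a valid URL.
    path = "/".join(quote(segment, safe="") for segment in path_segments)
    headers = {"key": api_key, "secret": api_secret}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(f"{BASE_URL}/{path}", data=data, headers=headers, method="POST")


create_project = build_request(["Jayita", "create_my_Project"], "KEY", "SECRET", EXAMPLE_PAYLOAD)
```

Passing create_project to urllib.request.urlopen would send it.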

Upload data

curl --location --request POST 'https://dataturks.com/dtAPI/v1/Jayita/NER%20Tagging%20Project/upload' \
--header 'secret: SECRET' \
--header 'key: KEY' \
--form 'file=@/path/to/file'
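curl's --form option sends the file as multipart/form-data. A standard-library equivalent is sketched below, assuming a single form field named file as in the curl command; build_upload_request is a hypothetical helper that only constructs the request. It also percent-encodes the project name, since a raw space is not valid in a URL.

```python
import urllib.request
import uuid
from urllib.parse import quote

BASE_URL = "https://dataturks.com/dtAPI/v1"  # taken from the curl examples in this article


def build_upload_request(user, project, file_name, file_bytes, api_key, api_secret):
    """Build (but do not send) a multipart/form-data POST carrying one file in the "file" field."""
    boundary = uuid.uuid4().hex  # random boundary that will not appear in ordinary file content
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        file_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    url = f"{BASE_URL}/{quote(user, safe='')}/{quote(project, safe='')}/upload"
    headers = {
        "key": api_key,
        "secret": api_secret,
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

The file bytes would typically come from pathlib.Path("/path/to/file").read_bytes().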

Upload Pre-Annotated Data

curl --location --request POST 'https://dataturks.com/dtAPI/v1/Jayita/NER%20Tagging%20Project/upload' \
--header 'secret: SECRET' \
--header 'key: KEY' \
--form 'file=@/path/to/file'
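A pre-annotated file has to carry the existing labels in a layout the platform can read. Assuming pre-annotated NER uploads use a JSON-lines layout with a content string and annotation entries holding a label list and points with an inclusive end offset (an assumption to check against a real export), spans held as (start, end, label) tuples with exclusive ends could be serialized as below; the function name is hypothetical.

```python
import json


def spacy_to_dataturks_line(text, entities):
    """Serialize (start, end_exclusive, label) spans as one JSON line in the assumed Dataturks NER layout."""
    annotation = [
        {
            "label": [label],
            # Assumed inclusive end offset in the Dataturks layout, hence end - 1.
            "points": [{"start": start, "end": end - 1, "text": text[start:end]}],
        }
        for start, end, label in entities
    ]
    return json.dumps({"content": text, "annotation": annotation})
```

Writing one such line per document gives the file passed to the upload request.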

Download Data

curl --location --request POST 'https://dataturks.com/dtAPI/v1/Jayita/NER%20Tagging%20Project/download' \
--header 'secret: SECRET' \
--header 'key: KEY'
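The download call and a first look at its output might be sketched as follows. The request mirrors the curl command; label_counts assumes the download is JSON lines, each with a content string and an annotation list of entries carrying a label field, which should be verified against a real export. Both helper names are hypothetical.

```python
import json
import urllib.request
from collections import Counter
from urllib.parse import quote

BASE_URL = "https://dataturks.com/dtAPI/v1"  # taken from the curl examples in this article


def build_download_request(user, project, api_key, api_secret):
    """Build (but do not send) the POST <BASE_URL>/<user>/<project>/download request."""
    url = f"{BASE_URL}/{quote(user, safe='')}/{quote(project, safe='')}/download"
    return urllib.request.Request(url, headers={"key": api_key, "secret": api_secret}, method="POST")


def label_counts(export_text):
    """Tally how often each label appears in a JSON-lines export (assumed layout)."""
    counts = Counter()
    for line in export_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines between records
        record = json.loads(line)
        for annotation in record.get("annotation") or []:
            labels = annotation["label"]
            counts.update([labels] if isinstance(labels, str) else labels)
    return counts
```

Sending the request with urllib.request.urlopen and passing the decoded response body to label_counts gives a quick check of label balance before training.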

Conclusion

Since its acquisition by Walmart Labs in 2019, Dataturks has continued to build machine learning solutions around data annotation. Open-source work on GitHub is ongoing, aimed at making high-quality datasets available, supporting collaborative team workflows and tracking model performance.

Jayita Bhattacharyya

Machine learning and data science enthusiast. Eager to learn about new advances in technology. A self-taught techie who loves using technology to build things that are fun and worthwhile.