Guide To Dataturks – The Human-in-the-Loop Data Annotation Platform

Dataturks makes machine learning data annotation easier with its auto-ML features and human-in-the-loop interactions. The AI startup was founded in 2018 by Gajendra Dadheech and Mohan Gupta, both of whom previously held executive positions at Flipkart. It was initially headquartered in Bangalore, India, and was later acquired by Walmart Labs.

Dataturks allows companies, developers and researchers to upload image, video or text data and start annotating it quickly through the web interface, which helps improve the quality of the datasets used for training. Projects can be handled by the user's own team members or by third-party annotators provided by the platform itself; the annotations are then reviewed by separate teams, and reports are generated in real time. Users can also access pre-annotated datasets to auto-label their data or use them as their use case requires. Some open datasets have been made available on Kaggle. Dataturks claims to have achieved ten times better ROI (return on investment) for data labelling.

Dataturks has made its platform fully open source on GitHub. The cloud version provides an end-to-end hosted service on the web, but in some situations companies need to run the solution internally. This is possible with the Docker version, which works fully offline yet offers the same services as the web version. It is supported on both Linux and Windows. A guide on the complete process has been provided in this blog.

Image and Video Annotation

Images can be uploaded in a zip file, in which case they are stored on Dataturks servers and the exported annotations refer to the images by their URLs on those servers. Alternatively, the user can upload a text file in which each line is the public URL of an image to be tagged; Dataturks then shows the images in the tool directly from these URLs. The available tools cover image classification, image segmentation, object detection using polygons and bounding boxes, and OCR. Results can be exported in Pascal VOC or TensorFlow format.
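
To make the Pascal VOC export concrete, here is a minimal Python sketch that reads bounding boxes out of one annotation. The XML layout follows the usual Pascal VOC convention (one object element per box, with a bndbox holding xmin, ymin, xmax and ymax). The sample file name, labels and coordinates are invented for illustration, and parse_voc is a hypothetical helper rather than part of Dataturks, so an actual export should be checked against this layout first.

```python
import xml.etree.ElementTree as ET

# Hypothetical Pascal VOC annotation for one image; the file name,
# labels and coordinates are made up for illustration.
VOC_XML = """
<annotation>
  <filename>street_001.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>car</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <bndbox><xmin>300</xmin><ymin>120</ymin><xmax>360</xmax><ymax>330</ymax></bndbox>
  </object>
</annotation>
"""


def parse_voc(xml_text):
    """Return (filename, [(label, xmin, ymin, xmax, ymax), ...])."""
    root = ET.fromstring(xml_text.strip())
    filename = root.findtext("filename")
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        bndbox = obj.find("bndbox")
        # Coordinates are sometimes written as floats, so parse leniently.
        coords = tuple(
            int(float(bndbox.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((label, *coords))
    return filename, boxes


filename, boxes = parse_voc(VOC_XML)
for label, xmin, ymin, xmax, ymax in boxes:
    print(f"{filename}: {label} box of {xmax - xmin}x{ymax - ymin} px")
```

Each box comes back as a plain tuple, which keeps the sketch independent of any particular training framework.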

Image Classification

Object with Multiple Labels with Bounding boxes

Image Segmentation: Polygons

Text Annotation

Browsers cannot display the raw text inside PDF or Doc files and can fail on special characters in text files. Dataturks provides a utility tool that converts all your files to raw text and performs the pre-processing needed to make the text compatible with a web browser. The text annotation features include document annotation (PDF, Doc, CSV or any other text format), sublabels, NER, PoS (part-of-speech) tagging, text classification, text summarization and content moderation. Results can be exported in JSON, Standard NLP or spaCy format.
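
As an illustration of moving between the JSON and spaCy export formats, the following Python sketch converts one JSON record with character-offset entity spans into the (text, {"entities": [...]}) tuples used for spaCy v2-style NER training. The record layout (content, annotation, label, points, start, end) and the inclusive end offset are assumptions made for this example, and to_spacy is a hypothetical helper; a real export should be inspected before adapting it.

```python
import json

# Hypothetical export line: one JSON object per document, with character
# offsets for each labeled span. The field names are assumptions.
EXPORT_LINE = json.dumps({
    "content": "Dataturks was founded in Bangalore",
    "annotation": [
        {"label": ["Org"],
         "points": [{"start": 0, "end": 8, "text": "Dataturks"}]},
        {"label": ["Location"],
         "points": [{"start": 25, "end": 33, "text": "Bangalore"}]},
    ],
})


def to_spacy(line, end_inclusive=True):
    """Convert one JSON record into a spaCy v2-style training tuple."""
    record = json.loads(line)
    text = record["content"]
    entities = []
    for annotation in record.get("annotation") or []:
        for point in annotation["points"]:
            # spaCy expects an exclusive end offset; shift by one if the
            # export uses an inclusive end (assumed here by default).
            end = point["end"] + 1 if end_inclusive else point["end"]
            for label in annotation["label"]:
                entities.append((point["start"], end, label))
    return text, {"entities": sorted(entities)}


text, annotations = to_spacy(EXPORT_LINE)
for start, end, label in annotations["entities"]:
    print(label, repr(text[start:end]))
```

Slicing the text with the converted offsets is a quick way to confirm that the inclusive or exclusive end assumption matches the data.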

Overlapping and Multi-label Entities

NER/PoS Tagging

Text Summarization

Project Report provides real-time project analysis and visualization with project statistics and each team member’s contributions.


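Authentication Request

Every API request is authenticated by passing the API key and secret as headers.

curl --location --request POST '' \
--header 'Content-Type: application/json' \
--header 'key: 56ca2058-7852-41f2-b04e-69d7cblp790e' \
--header 'secret: sVYaFGI1Ld23vE1FhpwGZIs8CxSzRkb11OVprfV1z7pmAVYIRBopi'

Project Creation

A project is created by sending its definition as the JSON body of an authenticated POST request.

curl --location --request POST '' \
--header 'Content-Type: application/json' \
--header 'key: 56ca2058-7852-41f2-b04e-69d7cblp790e' \
--header 'secret: sVYaFGI1Ld23vE1FhpwGZIs8CxSzRkb11OVprfV1z7pmAVYIRBopi' \
--data-raw '{
"name_of_project": "My Project",
"task_Type": "NER_POS_TAGGING",
"access_Type": "RESTRICTED",
"short_Descrip": "Short description of the project",
"details": "Detailed project description",
"rules": "{\"tags\": \"Tag1, Tag2, Tag3, Tag4\",\"instructions\": \"Tag all relevant entities present in the text\"}"
}'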

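The rules field in the project creation body is itself a JSON document stored as a string, which is why its inner quotes are escaped. A minimal Python sketch of how such a body could be assembled, reusing the field names exactly as they appear in the request above; nothing here is sent to a server, and the variable names are illustrative only.

```python
import json

# Labeling rules, written as a normal dictionary first.
rules = {
    "tags": "Tag1, Tag2, Tag3, Tag4",
    "instructions": "Tag all relevant entities present in the text",
}

# Field names are copied from the request body shown in this article.
payload = {
    "name_of_project": "My Project",
    "task_Type": "NER_POS_TAGGING",
    "access_Type": "RESTRICTED",
    "short_Descrip": "Short description of the project",
    "details": "Detailed project description",
    # Serialize the rules separately so they travel as a string;
    # the outer json.dumps call then escapes the inner quotes.
    "rules": json.dumps(rules),
}

body = json.dumps(payload, indent=2)
print(body)

# Reading it back takes two decoding steps: the body, then the rules.
decoded_rules = json.loads(json.loads(body)["rules"])
print(decoded_rules["tags"].split(", "))
```

Building the nested string with json.dumps avoids escaping the quotes by hand, which is easy to get wrong in a shell command.
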
Upload data

# Assumes the multipart form field is named "file"; the value in the source is garbled.
curl --location --request POST ' Tagging Project/upload' \
--header 'secret: SECRET' \
--header 'key: KEY no.' \
--form 'file=@/path/to/files'

Upload Pre-Annotated Data

# Assumes the multipart form field is named "file"; the value in the source is garbled.
curl --location --request POST ' Tagging Project/upload' \
--header 'secret: SECRET' \
--header 'key: KEY no.' \
--form 'file=@/path/to/files'

Download Data

curl --location --request POST ' Tagging Project/download' \
--header 'secret: SECRET' \
--header 'key: KEY no.'


Since its acquisition by Walmart Labs in 2019, Dataturks has been accelerating its machine learning solutions based on data annotation. Open-source contributions on GitHub continue, with the aim of making high-quality datasets and training methodologies available, supporting team collaboration for workflow management, and keeping track of model performance.


Copyright Analytics India Magazine Pvt Ltd
