Guide To Dataturks – The Human-in-the-Loop Data Annotation Platform

Dataturks makes machine learning data annotation easier with its auto-ML features and human-in-the-loop interactions. The startup was founded in 2018 by Gajendra Dadheech and Mohan Gupta, both of whom previously held executive positions at Flipkart. It was initially headquartered in Bangalore, India, and was later acquired by Walmart Labs.

Dataturks lets companies, developers and researchers upload image, video or text data and start annotating it quickly through its interface, which helps improve the quality of the datasets used for training. Projects can be shared with team members or with third-party annotators provided by the platform itself; their work is then reviewed by separate teams, and reports are generated in real time. Users can also access pre-annotated datasets for auto-labelling or for their own use cases. Some open datasets have been made available on Kaggle. Dataturks claims to deliver a ten times better ROI (return on investment) for data labelling.

Dataturks has made its platform fully open source on GitHub. The cloud version provides an end-to-end service on the web, but some companies need to run the solution internally. For that case there is a Docker version, which works fully offline while offering the same services as the web version. It is supported on both Linux and Windows. A guide to the complete process has been provided in this blog.

Image and Video Annotation

Images can be uploaded as a zip file, in which case they are stored on Dataturks servers and the exported annotations refer to the URLs of those stored copies. Alternatively, the user can upload a text file in which each line is the public URL of an image to be tagged; Dataturks then shows the images in the tool directly from these URLs. The available tools cover image classification, image segmentation, object detection with polygons and bounding boxes, and OCR. Results can be exported in Pascal VOC or TensorFlow format.
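The URL-list upload described above can be prepared with a few lines of Python. This is a minimal sketch, not part of Dataturks itself: the function name `write_url_list`, the output file name and the example.com URLs are all hypothetical, and the only rule taken from the article is "one public image URL per line".

```python
from pathlib import Path
from urllib.parse import urlparse


def write_url_list(urls, out_path):
    """Write one image URL per line, skipping blanks and non-HTTP(S) entries."""
    kept = []
    for raw in urls:
        url = raw.strip()
        parsed = urlparse(url)
        # Keep only absolute http/https URLs, since the tool loads images
        # directly from publicly reachable links.
        if parsed.scheme in ("http", "https") and parsed.netloc:
            kept.append(url)
    Path(out_path).write_text("\n".join(kept) + "\n", encoding="utf-8")
    return kept


if __name__ == "__main__":
    # Hypothetical image locations, used only for illustration.
    images = [
        "https://example.com/images/cat_001.jpg",
        "  https://example.com/images/dog_002.jpg  ",
        "file:///tmp/local_only.jpg",  # dropped: not reachable over HTTP(S)
        "",
    ]
    print(write_url_list(images, "image_urls.txt"))
```

Filtering before upload is a design choice made here for convenience; whether the platform itself rejects or silently skips unreachable URLs is not stated in this article.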

Image Classification

Object with Multiple Labels with Bounding boxes

Image Segmentation: Polygons

Text Annotation

Browsers cannot display raw text from PDF or DOC files directly, and they can fail on special characters in text files. Dataturks provides a utility tool that converts all your files to raw text and does the pre-processing needed to make the text compatible with a web browser. The text annotation features include document annotation (PDF, DOC, CSV or any other text format), sub-labels, NER, PoS (part-of-speech) tagging, text classification, text summarization and content moderation. Results can be exported in JSON, Standard NLP or spaCy format.
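To make the JSON and spaCy export options more concrete, here is a hedged sketch that turns line-delimited JSON entity annotations into spaCy-style `(text, {"entities": [...]})` tuples. The record layout (`content`, `annotation`, `label`, `points`, and an inclusive `end` offset) is an assumption about the export, not something this article confirms, so check it against a real export before relying on it. The function name `to_spacy_examples` is hypothetical.

```python
import json


def to_spacy_examples(lines):
    """Convert line-delimited JSON records into (text, {"entities": [...]}) tuples."""
    examples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        text = record["content"]  # assumed field holding the raw text
        entities = []
        for ann in record.get("annotation") or []:
            labels = ann["label"]
            if isinstance(labels, str):
                labels = [labels]
            for point in ann["points"]:
                for label in labels:
                    # Assumed: "end" is inclusive in the export, while
                    # spaCy-style spans use an exclusive end offset.
                    entities.append((point["start"], point["end"] + 1, label))
        examples.append((text, {"entities": sorted(entities)}))
    return examples


if __name__ == "__main__":
    sample = json.dumps({
        "content": "Walmart acquired Dataturks",
        "annotation": [
            {"label": ["ORG"], "points": [{"start": 0, "end": 6, "text": "Walmart"}]},
            {"label": ["ORG"], "points": [{"start": 17, "end": 25, "text": "Dataturks"}]},
        ],
    })
    print(to_spacy_examples([sample]))
```

If the export already uses exclusive end offsets, the `+ 1` should be dropped; slicing the text with each span and comparing it to the annotated string is a quick way to check.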

Overlapping and Multi-label Entities

NER/PoS Tagging

Text Summarization

The project report provides real-time analysis and visualization of a project, including project statistics and each team member's contributions.

API

Authentication Request

curl --location --request POST 'https://dataturks.com/dtAPI/v1/Jayita/create_my_Project' \
--header 'Content-Type: application/json' \
--header 'key: 56ca2058-7852-41f2-b04e-69d7cblp790e' \
--header 'secret: sVYaFGI1Ld23vE1FhpwGZIs8CxSzRkb11OVprfV1z7pmAVYIRBopi' \

Project Creation

--data-raw '{
"name_of_project": "My Project",
"task_Type": "NER_POS_TAGGING",
"access_Type" : "RESTRICTED",
"short_Descrip": "Short description of the project",
"details": "Detailed project description",
"rules" : "{\"tags\": \"Tag1, Tag2, Tag3, Tag4\",\"instructions\": \"Tag all relevant entities present in the text\"}"
}'
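The request body above nests one JSON document inside another: the value of "rules" is itself a JSON-encoded string, which is why its quotes are escaped. A small sketch of building that body programmatically follows. The field names are copied from the command above; the helper name `build_project_payload` is hypothetical, and nothing here contacts the API.

```python
import json


def build_project_payload(name, tags, instructions):
    """Return the project-creation body as a JSON string."""
    # "rules" is sent as a JSON-encoded string, so it is serialized
    # separately before being embedded in the outer document.
    rules = json.dumps({"tags": ", ".join(tags), "instructions": instructions})
    body = {
        "name_of_project": name,
        "task_Type": "NER_POS_TAGGING",
        "access_Type": "RESTRICTED",
        "short_Descrip": "Short description of the project",
        "details": "Detailed project description",
        "rules": rules,
    }
    return json.dumps(body)


if __name__ == "__main__":
    print(build_project_payload(
        "My Project",
        ["Tag1", "Tag2", "Tag3", "Tag4"],
        "Tag all relevant entities present in the text",
    ))
```

Letting `json.dumps` handle both levels avoids hand-escaping the inner quotes, which is the step most likely to go wrong when the body is typed directly into a shell command.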

Upload data

curl --location --request POST 'https://dataturks.com/dtAPI/v1/Jayita/NER%20Tagging%20Project/upload' \
--header 'secret: SECRET' \
--header 'key: KEY' \
--form 'file=@/path/to/files'

Upload Pre-Annotated Data

curl --location --request POST 'https://dataturks.com/dtAPI/v1/Jayita/NER%20Tagging%20Project/upload' \
--header 'secret: SECRET' \
--header 'key: KEY' \
--form 'file=@/path/to/files'

Download Data

curl --location --request POST 'https://dataturks.com/dtAPI/v1/Jayita/NER%20Tagging%20Project/download' \
--header 'secret: SECRET' \
--header 'key: KEY'
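The project name used in the commands above contains spaces, and spaces are not valid inside a URL path, so they need to be percent-encoded. The sketch below builds the request object with Python's standard library without sending it. The base URL, path layout and header names follow the commands in this article; the helper name `build_request` is hypothetical.

```python
from urllib.parse import quote
from urllib.request import Request

# Base path as it appears in the commands in this article.
BASE_URL = "https://dataturks.com/dtAPI/v1"


def build_request(org, project, action, key, secret):
    """Build (but do not send) a POST request for a project action."""
    # quote() percent-encodes spaces and other unsafe characters
    # in each path segment, e.g. "NER Tagging Project".
    path = "/".join(quote(part, safe="") for part in (org, project, action))
    return Request(
        f"{BASE_URL}/{path}",
        headers={"key": key, "secret": secret},
        method="POST",
    )


if __name__ == "__main__":
    req = build_request("Jayita", "NER Tagging Project", "download", "KEY", "SECRET")
    print(req.get_method(), req.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send it; that step is left out here because it needs real credentials and a reachable server.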

Conclusion

Since its acquisition by Walmart Labs in 2019, Dataturks has been accelerating its machine learning solutions built on data annotation. Open-source contributions on GitHub are ongoing, with the aim of making high-quality datasets and training methodologies available, supporting team collaboration and workflow management, and keeping track of model performance.
