Guide To Dataturks – The Human-in-the-Loop Data Annotation Platform

Dataturks makes machine learning data annotation easier with its auto-ML features and human-in-the-loop interactions. The AI startup was founded in 2018 by Gajendra Dadheech and Mohan Gupta, both of whom previously held executive positions at Flipkart. It was initially headquartered in Bangalore, India, and was later acquired by Walmart Labs.

Dataturks lets companies, developers and researchers upload image, video or text data and start annotating it quickly through its interface, which helps improve the quality of the datasets used for training. Projects can be worked on by team members or by third-party annotators provided through the platform itself; the annotations are then reviewed by separate teams, and reports are generated in real time. Users can also access pre-annotated datasets for auto-labelling or for their own use cases. Some open datasets have been made available on Kaggle. Dataturks claims a tenfold improvement in return on investment (ROI) for data labelling.

Dataturks has made its platform fully open source on GitHub. The cloud version provides an end-to-end service on the web, but some companies need to keep their annotation work in-house. For those cases there is a Docker version that works fully offline while offering the same services as the web version. It is supported on both Linux and Windows. A guide to the complete process has been provided in this blog.

Image and Video Annotation

Images can be uploaded in a zip file, in which case they are stored on Dataturks servers. When the results are downloaded, the exported annotations refer to the images by their URLs on these servers. Alternatively, the user can upload a text file in which each line is the public URL of an image to be tagged; Dataturks then displays the images in the tool directly from these URLs. The available tools cover image classification, image segmentation, object detection using polygons and bounding boxes, and OCR. Export formats include Pascal VOC and TensorFlow.
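Once annotations are exported in Pascal VOC format, the bounding boxes can be read back with a few lines of Python. The following is a minimal sketch, not Dataturks code: the sample XML is hypothetical and only follows the common Pascal VOC layout (filename, then one object element per box), and a real export may carry additional fields.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the usual Pascal VOC layout; a real export may differ.
SAMPLE_VOC = """<annotation>
  <filename>street.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>car</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <bndbox><xmin>300</xmin><ymin>120</ymin><xmax>360</xmax><ymax>400</ymax></bndbox>
  </object>
</annotation>"""


def read_voc_boxes(xml_text):
    """Return (filename, [(label, xmin, ymin, xmax, ymax), ...]) for one annotation."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        box = obj.find("bndbox")
        # Coordinates are sometimes written as floats, so go through float first.
        coords = tuple(
            int(float(box.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((label, *coords))
    return filename, boxes


if __name__ == "__main__":
    name, boxes = read_voc_boxes(SAMPLE_VOC)
    print(name)
    for box in boxes:
        print(box)
```

The function returns plain tuples, so the boxes can be passed straight to whichever training pipeline is in use.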

Image Classification

Object with Multiple Labels with Bounding boxes

Image Segmentation: Polygons

Text Annotation

Browsers cannot display the raw text inside PDF or Doc files and can fail on special characters in text files. Dataturks provides a utility tool that converts files to raw text and does the pre-processing needed to make the text compatible with a web browser. The text annotation features include document annotation (PDF, Doc, CSV or any other text format), sublabels, NER, PoS (part-of-speech) tagging, text classification, text summarization and content moderation. Export formats include JSON, Standard NLP and spaCy format.
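To illustrate how a JSON export of NER annotations relates to spaCy training data, here is a minimal sketch. The record layout (the "content", "annotation", "label" and "points" fields, and inclusive end offsets) is an assumption made for this example and should be checked against an actual export. The target shape, a (text, {"entities": [(start, end, label)]}) tuple with an exclusive end offset, is the classic spaCy training format.

```python
import json

# Hypothetical export line: one JSON object per document with character offsets.
# Field names and the inclusive "end" offset are assumptions for this sketch.
SAMPLE_LINE = json.dumps({
    "content": "Dataturks was acquired by Walmart Labs",
    "annotation": [
        {"label": ["ORG"], "points": [{"start": 0, "end": 8, "text": "Dataturks"}]},
        {"label": ["ORG"], "points": [{"start": 26, "end": 37, "text": "Walmart Labs"}]},
    ],
})


def to_spacy(line, end_inclusive=True):
    """Convert one exported JSON line to (text, {"entities": [...]})."""
    record = json.loads(line)
    text = record["content"]
    entities = []
    for ann in record.get("annotation") or []:
        for point in ann["points"]:
            # spaCy expects an exclusive end offset, so shift inclusive ends by one.
            end = point["end"] + 1 if end_inclusive else point["end"]
            for label in ann["label"]:
                entities.append((point["start"], end, label))
    return text, {"entities": sorted(entities)}


if __name__ == "__main__":
    text, annotations = to_spacy(SAMPLE_LINE)
    for start, end, label in annotations["entities"]:
        print(label, repr(text[start:end]))
```

Printing the sliced text for each entity is a quick way to confirm that the offset convention matches the data before training on it.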

Overlapping and Multi-label Entities

NER/PoS Tagging

Text Summarization

The Project Report provides real-time analysis and visualization of a project, including project statistics and each team member's contributions.

API

Authentication Request and Project Creation

The API key and secret are sent as request headers, and the project definition is sent as the JSON body of the same request:

curl --location --request POST 'https://dataturks.com/dtAPI/v1/Jayita/create_my_Project' \
--header 'Content-Type: application/json' \
--header 'key: 56ca2058-7852-41f2-b04e-69d7cblp790e' \
--header 'secret: sVYaFGI1Ld23vE1FhpwGZIs8CxSzRkb11OVprfV1z7pmAVYIRBopi' \
--data-raw '{
"name_of_project": "My Project",
"task_Type": "NER_POS_TAGGING",
"access_Type" : "RESTRICTED",
"short_Descrip": "Short description of the project",
"details": "Detailed project description",
"rules" : "{\"tags\": \"Tag1, Tag2, Tag3, Tag4\",\"instructions\": \"Tag all relevant entities present in the text\"}"
}'
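Note that the "rules" value in the request body above is itself a JSON document encoded as a string, which is why its inner quotes are escaped. A minimal Python sketch of building the same request follows. It only constructs the request object and sends nothing; the field names are copied from the example above, while the function name, its parameters and the plain "key" and "secret" header names are assumptions of this sketch.

```python
import json
import urllib.request


def build_create_request(url, key, secret, name, tags, instructions):
    """Build (but do not send) a project-creation request like the curl example."""
    # "rules" is encoded to a JSON string first, then embedded in the outer body.
    rules = json.dumps({"tags": ", ".join(tags), "instructions": instructions})
    payload = {
        "name_of_project": name,
        "task_Type": "NER_POS_TAGGING",
        "access_Type": "RESTRICTED",
        "short_Descrip": "Short description of the project",
        "details": "Detailed project description",
        "rules": rules,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "key": key, "secret": secret},
        method="POST",
    )


if __name__ == "__main__":
    req = build_create_request(
        "https://dataturks.com/dtAPI/v1/Jayita/create_my_Project",
        key="KEY",
        secret="SECRET",
        name="My Project",
        tags=["Tag1", "Tag2", "Tag3", "Tag4"],
        instructions="Tag all relevant entities present in the text",
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the returned object to urllib.request.urlopen would send it; that step is left out here so the sketch can be run without credentials.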

Upload data

curl --location --request POST 'https://dataturks.com/dtAPI/v1/Jayita/NER%20Tagging%20Project/upload' \
--header 'secret: SECRET' \
--header 'key: KEY' \
--form 'file=@/path/to/files'

Upload Pre-Annotated Data

curl --location --request POST 'https://dataturks.com/dtAPI/v1/Jayita/NER%20Tagging%20Project/upload' \
--header 'secret: SECRET' \
--header 'key: KEY' \
--form 'file=@/path/to/files'

Download Data

curl --location --request POST 'https://dataturks.com/dtAPI/v1/Jayita/NER%20Tagging%20Project/download' \
--header 'secret: SECRET' \
--header 'key: KEY'
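The upload and download endpoints share one pattern: base URL, organisation name, project name, then the action. A project name that contains spaces has to be percent-encoded before it goes into the URL path. The helper below is a small sketch of that; the function name and the BASE constant are illustrative, and the path layout is inferred only from the curl examples above.

```python
from urllib.parse import quote

# Base path as it appears in the curl examples.
BASE = "https://dataturks.com/dtAPI/v1"


def endpoint(org, project, action):
    """Join the URL parts, percent-encoding the organisation and project names."""
    # safe="" also encodes "/" so a name can never add an extra path segment.
    return "/".join([BASE, quote(org, safe=""), quote(project, safe=""), action])


if __name__ == "__main__":
    print(endpoint("Jayita", "NER Tagging Project", "upload"))
    print(endpoint("Jayita", "NER Tagging Project", "download"))
```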

Conclusion

Since its acquisition by Walmart Labs in 2019, Dataturks has continued to build its machine learning solutions around data annotation. Open-source contributions on GitHub are ongoing, with the aim of making high-quality datasets and training methodologies available, supporting team workflow management and keeping track of model performance.

Jayita Bhattacharyya
Machine learning and data science enthusiast. Eager to learn about new advances in technology. A self-taught techie who loves using technology for projects that are both fun and worthwhile.
