Dataturks makes machine learning data annotation easier with its auto-ML features and human-in-the-loop interactions. The AI startup was founded in 2018 by Gajendra Dadheech and Mohan Gupta, both of whom previously held executive positions at Flipkart. It was initially headquartered in Bangalore, India, and was later acquired by Walmart Labs.
Dataturks lets companies, developers and researchers upload image, video or text data and start annotating quickly through its interface, which helps improve the quality of the datasets used for training. Projects can be handled by a user's own team members or by third-party annotators provided through the platform. The annotations are then reviewed by separate teams, and the platform generates real-time reports. Users can also access pre-annotated datasets for auto-labelling or for their own use cases. Some open datasets have been made available on Kaggle. Dataturks claims to achieve ten times better ROI (return on investment) for data labelling.
Dataturks has made its platform fully open source on GitHub. The cloud version provides an end-to-end service on the web, but some companies need to run their annotation tooling internally. For these cases there is a Docker-based version that works fully offline yet offers the same services as the web version. It is supported on both Linux and Windows. A guide to the complete process has been provided in this blog.
Image and Video Annotation
Images can be uploaded as a zip file, in which case they are stored on Dataturks servers. When the results are downloaded, the exported annotations reference the images by their URLs on these servers. Alternatively, the user can upload a text file in which each line is the public URL of an image to be tagged. Dataturks then shows the images in the tool directly from these URLs. The available tools support image classification, image segmentation, object detection with polygons and bounding boxes, and OCR. Annotations can be exported in Pascal VOC or TensorFlow format.
Image Classification
Object with Multiple Labels with Bounding boxes
Image Segmentation: Polygons
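As a rough illustration of how a Pascal VOC export can be consumed downstream, the sketch below reads bounding boxes from a VOC-style XML document with the Python standard library. The sample annotation is hand-written for this example and is not an actual Dataturks export. The function name `read_voc_boxes` is hypothetical, and real exports may contain additional fields.

```python
import xml.etree.ElementTree as ET

# Hand-written Pascal VOC annotation used only as sample input (not a real export).
SAMPLE_VOC = """<annotation>
  <filename>street.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>car</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <bndbox><xmin>300</xmin><ymin>100</ymin><xmax>360</xmax><ymax>300</ymax></bndbox>
  </object>
</annotation>"""


def read_voc_boxes(xml_text):
    """Return (filename, [(label, xmin, ymin, xmax, ymax), ...]) from VOC XML."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        box = obj.find("bndbox")
        # Some tools write coordinates as floats, so parse via float first.
        coords = tuple(
            int(float(box.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((label, *coords))
    return filename, boxes


filename, boxes = read_voc_boxes(SAMPLE_VOC)
for label, xmin, ymin, xmax, ymax in boxes:
    print(f"{filename}: {label} at ({xmin}, {ymin})-({xmax}, {ymax})")
```

The resulting tuples can be passed on to whatever training pipeline expects box coordinates and class labels.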
Text Annotation
Browsers cannot display the raw text inside PDF or Doc files, and they can fail on special characters in text files. Dataturks provides a utility tool that converts files to raw text and applies the pre-processing needed to make the text compatible with a web browser. The text annotation features include document annotation (PDF, Doc, CSV or any other text format), sub-labels, NER, PoS (part-of-speech) tagging, text classification, text summarization and content moderation. Annotations can be exported in JSON, standard NLP or spaCy format.
Overlapping and Multi-label Entities
NER/PoS Tagging
Text Summarization
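To make the relationship between a JSON export and the spaCy format more concrete, here is a minimal sketch that converts NER annotations into spaCy-style `(text, {"entities": [...]})` tuples. It rests on assumptions that should be checked against a real export: that the JSON export holds one object per line, that the field names are `content`, `annotation`, `label`, `points`, `start` and `end`, and that `end` is an inclusive offset. The sample record is made up for illustration.

```python
import json

# Hypothetical export: one JSON object per line. The field names and the
# inclusive "end" offset are assumptions, not confirmed by the article.
SAMPLE_EXPORT = json.dumps({
    "content": "Asha works at Acme Corp",
    "annotation": [
        {"label": ["PERSON"], "points": [{"start": 0, "end": 3, "text": "Asha"}]},
        {"label": ["ORG"], "points": [{"start": 14, "end": 22, "text": "Acme Corp"}]},
    ],
})


def to_spacy_examples(export_text):
    """Convert line-delimited JSON records into (text, {"entities": [...]}) tuples."""
    examples = []
    for line in export_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        text = record["content"]
        entities = []
        for ann in record.get("annotation") or []:
            labels = ann["label"] if isinstance(ann["label"], list) else [ann["label"]]
            for point in ann["points"]:
                for label in labels:
                    # spaCy spans exclude the end offset, so shift the
                    # (assumed inclusive) end by one.
                    entities.append((point["start"], point["end"] + 1, label))
        examples.append((text, {"entities": sorted(entities)}))
    return examples


examples = to_spacy_examples(SAMPLE_EXPORT)
for text, meta in examples:
    for start, end, label in meta["entities"]:
        print(label, repr(text[start:end]))
```

If the platform's own spaCy export is available, it is likely the safer choice. A conversion like this is mainly useful when only the JSON export is at hand.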
The project report provides real-time project analysis and visualization, including project statistics and each team member's contributions.
API
Authentication Request
Every API request is authenticated by sending the account's key and secret as HTTP headers. The command below starts a project-creation request with these headers, and its JSON body follows under Project Creation.
curl --location --request POST 'https://dataturks.com/dtAPI/v1/Jayita/create_my_Project' \
  --header 'Content-Type: application/json' \
  --header 'key: 56ca2058-7852-41f2-b04e-69d7cblp790e' \
  --header 'secret: sVYaFGI1Ld23vE1FhpwGZIs8CxSzRkb11OVprfV1z7pmAVYIRBopi' \
Project Creation
  --data-raw '{
    "name_of_project": "My Project",
    "task_Type": "NER_POS_TAGGING",
    "access_Type": "RESTRICTED",
    "short_Descrip": "Short description of the project",
    "details": "Detailed project description",
    "rules": "{\"tags\": \"Tag1, Tag2, Tag3, Tag4\",\"instructions\": \"Tag all relevant entities present in the text\"}"
  }'
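One detail of the request body above is easy to miss: the `rules` value is itself a JSON document stored as a string, which is why its inner quotes are escaped. The sketch below builds an equivalent request in Python, letting `json.dumps` handle the escaping. The field names and URL are copied from the curl command. The helper name `build_create_project_request` and the placeholder credentials are hypothetical, and the request is only constructed, not sent.

```python
import json
import urllib.request


def build_create_project_request(url, key, secret, tags, instructions):
    """Build (but do not send) a project-creation request like the curl call."""
    # "rules" is a JSON document serialized to a string, then embedded in the body.
    rules = json.dumps({"tags": ", ".join(tags), "instructions": instructions})
    body = {
        "name_of_project": "My Project",
        "task_Type": "NER_POS_TAGGING",
        "access_Type": "RESTRICTED",
        "short_Descrip": "Short description of the project",
        "details": "Detailed project description",
        "rules": rules,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "key": key, "secret": secret},
        method="POST",
    )


request = build_create_project_request(
    "https://dataturks.com/dtAPI/v1/Jayita/create_my_Project",
    key="YOUR_KEY",        # placeholder credential
    secret="YOUR_SECRET",  # placeholder credential
    tags=["Tag1", "Tag2", "Tag3", "Tag4"],
    instructions="Tag all relevant entities present in the text",
)
print(request.data.decode("utf-8"))
# Passing `request` to urllib.request.urlopen would actually send it.
```

Serializing `rules` separately avoids hand-escaping quotes, which is a common source of malformed request bodies.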
Upload data
curl --location --request POST 'https://dataturks.com/dtAPI/v1/Jayita/NER%20Tagging%20Project/upload' \
  --header 'secret: SECRET' \
  --header 'key: KEY no.' \
  --form 'file=@/path/to/files'
Upload Pre-Annotated Data
curl --location --request POST 'https://dataturks.com/dtAPI/v1/Jayita/NER%20Tagging%20Project/upload' \
  --header 'secret: SECRET' \
  --header 'key: KEY no.' \
  --form 'file=@/path/to/files'
Download Data
curl --location --request POST 'https://dataturks.com/dtAPI/v1/Jayita/NER%20Tagging%20Project/download' \
  --header 'secret: SECRET' \
  --header 'key: KEY no.'
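The upload and download URLs contain a project name with spaces, and spaces are not valid in a URL unless they are percent-encoded. A minimal sketch of building such endpoints safely follows. The helper `project_endpoint` is hypothetical, the base path is taken from the requests above, and it is an assumption that the server accepts percent-encoded names.

```python
from urllib.parse import quote

BASE_URL = "https://dataturks.com/dtAPI/v1"  # base path as used in the curl calls


def project_endpoint(org, project, action):
    """Join organisation, project and action into a percent-encoded URL."""
    # safe="" also escapes "/", so a slash inside a name cannot add a path segment.
    return "/".join([BASE_URL, quote(org, safe=""), quote(project, safe=""), action])


upload_url = project_endpoint("Jayita", "NER Tagging Project", "upload")
download_url = project_endpoint("Jayita", "NER Tagging Project", "download")
print(upload_url)
print(download_url)
```

The same helper works for any project name, so the encoding does not have to be done by hand for each request.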
Conclusion
Since its acquisition by Walmart Labs in 2019, Dataturks has been accelerating its machine learning solutions based on data annotation. Open-source contributions on GitHub remain active, with the aim of providing high-quality datasets and training methodologies, supporting team collaboration and workflow management, and keeping track of model performance.