How Self-Supervised Text Annotation Works In TagTog


TagTog is an AI startup that makes NLP modelling easier with its text analytics, visualization and annotation system, democratized so that subject matter experts can bring in domain-specific insights. It can annotate text, PDFs, source code and web URLs manually, with semi-supervised learning, or automatically. It was launched in October 2017. Its founders, Jorge Campos Prieto and Dr Juan Miguel Cejuela, started it during their PhD research in biomedical text mining at the University of Munich. Dr Cejuela, along with some colleagues, presented a paper based on TagTog. TagTog is based in Munich (Germany) and Gdansk (Poland).

TagTog helps generate high-quality text datasets for training NLP algorithms, with moderation and customization. The platform uses ML-assisted models that learn from pre-annotated data to quickly annotate new data and extract the relevant information from the text. Manual annotation services that follow the customer's guidelines are also provided. TagTog specializes in text classification and annotation, entity extraction, entity normalisation, concept search (discovering patterns in unstructured text, identifying problems and realizing solutions), Big Texts, annotated corpora, semantic search, text mining, business intelligence and CRM data enrichment. Its automatic annotation review helps save costs and time. It also has an active open-source GitHub community.




Generate training data for ML methods and create labelled datasets:

  • Manage a team that annotates text manually, or import pre-annotated data.
  • Leverage machine-learning models with constant feedback to work at scale in a semi-supervised manner.
  • Find out whether the data is biased.
  • Annotate documents in multiple formats, not only plain text: annotate PDFs, or import text from HTML, TXT, CSV, Markdown and source code files.
  • Work with Unicode text in multiple languages (English, Spanish, Hindi, Bengali, French, Chinese, Japanese, Arabic, Swedish, Dutch, etc.).
  • Apply dictionary annotations, which use ML to learn from pre-annotated data and automatically generate similar annotations.
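The matching step behind dictionary annotations can be pictured as a search for known terms that records each hit's character offsets and label. The minimal Python sketch below leaves out the ML learning mentioned above and uses plain case-insensitive, word-boundary matching instead; the dictionary contents, the `Person` label and the function `dictionary_annotate` are hypothetical, not TagTog's API.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical dictionary: surface form -> entity label
DICTIONARY: Dict[str, str] = {
    "George Clooney": "Person",
    "Jennifer Aniston": "Person",
    "Angelina Jolie": "Person",
}


def dictionary_annotate(
    text: str, dictionary: Dict[str, str]
) -> List[Tuple[int, int, str, str]]:
    """Return (start, end, matched_text, label) for every dictionary hit, sorted by offset."""
    annotations = []
    for term, label in dictionary.items():
        # re.escape treats the term literally; \b keeps matches on word boundaries
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            annotations.append((match.start(), match.end(), match.group(0), label))
    return sorted(annotations)


text = "The filmstars are George Clooney, Jennifer Aniston and Angelina Jolie"
for start, end, surface, label in dictionary_annotate(text, DICTIONARY):
    print(start, end, surface, label)
```

Storing character offsets rather than bare strings keeps each annotation tied to one exact position in the document, which matters when the same name occurs more than once.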

Text corpus with ontology


Team Collaboration and Quality management – Invite team members to annotate text and build an annotated corpus. Specify instructions and roles for each user at any moment. Distribute tasks among users automatically according to your quality requirements, and follow their progress on dashboards.
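How TagTog decides who receives which document is configured in the platform and is not detailed here. As a minimal sketch of one possible policy, the snippet below assigns each document to a fixed number of distinct annotators in round-robin order, so that some documents are annotated by more than one person and their agreement can be measured. The function `distribute_tasks` and its parameters are hypothetical.

```python
from itertools import cycle
from typing import Dict, List


def distribute_tasks(
    documents: List[str], annotators: List[str], annotators_per_doc: int
) -> Dict[str, List[str]]:
    """Assign each document to `annotators_per_doc` distinct annotators, round-robin."""
    if not 1 <= annotators_per_doc <= len(annotators):
        raise ValueError("annotators_per_doc must be between 1 and the number of annotators")
    assignments: Dict[str, List[str]] = {name: [] for name in annotators}
    pool = cycle(annotators)
    for doc in documents:
        # Consecutive picks from the cycle are distinct because annotators_per_doc <= len(annotators)
        for _ in range(annotators_per_doc):
            assignments[next(pool)].append(doc)
    return assignments


assignments = distribute_tasks(["d1", "d2", "d3"], ["ann", "bob", "cat"], annotators_per_doc=2)
print(assignments)
```

With two annotators per document, every document gets a second opinion, at the cost of doubling the annotation work; a real project might overlap only a sample of documents.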

Track quality and compare performance – The interface evaluates the different annotators based on the inter-annotator agreement (IAA).
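Which agreement metric TagTog reports is not stated here. As an illustration only, Cohen's kappa is a widely used IAA measure for two annotators: it compares the observed agreement p_o with the agreement p_e expected by chance, kappa = (p_o - p_e) / (1 - p_e). The function name `cohens_kappa` is our own.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators who labelled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items that received the same label
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's own label frequencies
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout (kappa is undefined)
    return (p_o - p_e) / (1 - p_e)


annotator_1 = ["PER", "PER", "ORG", "ORG", "PER"]
annotator_2 = ["PER", "ORG", "ORG", "ORG", "PER"]
print(cohens_kappa(annotator_1, annotator_2))
```

Here the annotators agree on 4 of 5 items (p_o = 0.8), chance agreement is p_e = 0.48, and kappa is about 0.62, noticeably lower than the raw 80% agreement.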


Machine Learning with people in the loop – Feed the system with your ML model; a team of SMEs then gives feedback on its predictions, and that feedback is used for continuous training. This improves both the quality of the training data and the model's accuracy.
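A feedback round of this kind can be sketched as follows, assuming a model that outputs a label with a confidence score: confident predictions are accepted, uncertain ones go to a subject matter expert, and the corrected examples become training data for the next run. This is a generic uncertainty-based routing scheme, not TagTog's documented algorithm; the `Prediction` class, the threshold and the `review` callback are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Prediction:
    doc_id: str
    label: str
    confidence: float  # model probability for the predicted label, in [0, 1]


def route_predictions(
    predictions: List[Prediction], threshold: float
) -> Tuple[List[Prediction], List[Prediction]]:
    """Split into (accepted, to_review); the least confident predictions come first in to_review."""
    accepted = [p for p in predictions if p.confidence >= threshold]
    to_review = sorted(
        (p for p in predictions if p.confidence < threshold), key=lambda p: p.confidence
    )
    return accepted, to_review


def feedback_round(
    predictions: List[Prediction], threshold: float, review: Callable[[Prediction], str]
) -> List[Prediction]:
    """One human-in-the-loop round: keep confident predictions, ask an SME about the rest."""
    accepted, to_review = route_predictions(predictions, threshold)
    # review() stands in for the SME: it returns the correct label for an uncertain prediction
    corrected = [replace(p, label=review(p), confidence=1.0) for p in to_review]
    # Accepted plus corrected examples form the labelled data for the next training run
    return accepted + corrected


preds = [Prediction("d1", "POS", 0.95), Prediction("d2", "NEG", 0.55), Prediction("d3", "POS", 0.40)]
# d1 is kept as predicted; d3 and d2 fall below 0.8 and are relabelled by the reviewer
labelled = feedback_round(preds, threshold=0.8, review=lambda p: "NEG")
print(labelled)
```

Sending the least confident predictions to reviewers first spends expert time where the model is most likely to be wrong.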

Chatbot Training with overlapping entities

Host on a secure Cloud or On-premises – On the Cloud, there is nothing to install and no servers to run. On-premises, TagTog runs as a Docker container with SSO integration and requires no Internet access. In both cases, only a browser is needed.

Export Data – Export annotations in any of the numerous available formats, using either the API or the web interface.


In Python

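import requests
# Set this to the TagTog documents API endpoint
tagtogUrl = ""
# HTTP Basic authentication with your TagTog credentials
authr = requests.auth.HTTPBasicAuth(username="your-Username", password="your-Pswd")
# Query parameters identifying the project and its owner, plus format and output options
params = {"project": "ProjectName", "admin": "your-Username", "format": "formatted", "output": "null"}
# The plain text to upload to the project
payload = {
    "text": "The filmstars are George Clooney, Jennifer Aniston and Angelina Jolie"
}
# Send the text in a POST request and print the server's reply
responses = requests.post(tagtogUrl, params=params, auth=authr, data=payload)
print(responses.text)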

In Javascript

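// Take the first file selected in a file <input> element
var input = document.querySelector('input[type="file"]')
var form_data = new FormData()
form_data.append("file", input.files[0])
// POST the file to the TagTog documents API endpoint (set the URL first)
fetch('', {
  method: 'POST',
  headers: {'Authorization' : "Basic " + btoa('your_Username' + ":" + 'your_Password')},
  body: form_data
}).then(responses => responses.text()).then(text => {
  // Print the server's reply
  console.log(text)
}).catch(function(error) {
  console.log('Error: ', error);
});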
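Both snippets use HTTP Basic authentication: the Python one through `requests.auth.HTTPBasicAuth`, the JavaScript one by building the header value with `btoa`. For ASCII credentials, the value sent in the `Authorization` header is `Basic ` followed by the Base64 encoding of `username:password`. The small sketch below builds that header with the standard library so it can be inspected; the helper name `basic_auth_header` and the credentials are placeholders. Because the credentials are only encoded, not encrypted, such requests should go over HTTPS.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build an HTTP Basic Authorization header: "Basic " + base64("username:password")."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": "Basic " + token}


print(basic_auth_header("your_Username", "your_Password"))
```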


TagTog has worked with Fortune 500 companies and with organizations such as FlyBase, AWS, Lancaster University, Wolters Kluwer, Wevo, the University of North Carolina at Chapel Hill, the University of Copenhagen, the University of Luxembourg and the Center for Open Science.


Jayita Bhattacharyya
Machine learning and data science enthusiast, eager to learn about new technology advances. A self-taught techie who loves using technology to do cool and worthwhile things for fun.
