How Self-Supervised Text Annotation Works In TagTog

TagTog is an AI startup that aims to make NLP modelling easier with a text analytics, visualization and annotation system that lets subject matter experts bring in domain-specific insights.

TagTog can annotate text, PDFs, source code, or web URLs manually, with semi-supervised learning, or automatically. It was launched in October 2017. Its founders, Jorge Campos Prieto and Dr Juan Miguel Cejuela, began the work during their PhD research in text mining applied to biomedicine at the University of Munich. Dr Cejuela, along with some colleagues, had presented a paper based on TagTog. TagTog is based in Munich (Germany) and Gdansk (Poland).

TagTog helps generate high-quality text datasets for training NLP algorithms, with moderation and customization. The platform uses ML-assisted models that learn from pre-annotated data to annotate new data quickly and surface the relevant information in the text. Manual annotation services are also provided, following the customer's guidelines. TagTog specializes in text classification and annotation, entity extraction, entity normalisation, concept search (discovering patterns in unstructured text, identifying problems, finding solutions), big texts, annotated corpora, semantic search, text mining, business intelligence, and CRM data enrichment. Its automatic review of annotations helps save cost and time. The company also has an active open-source community on GitHub.


Generate training data for ML methods and create labelled datasets:

  • Manage a team that annotates text manually, or import pre-annotated data.
  • Leverage machine-learning models with constant feedback to work at scale in a semi-supervised manner.
  • Find out whether the data is biased.
  • Work with multiple document formats, not only plain text: annotate PDFs or import text from HTML, TXT, CSV, Markdown, and source code files.
  • Supports Unicode and multilingual text (English, Spanish, Hindi, Bengali, French, Chinese, Japanese, Arabic, Swedish, Dutch, etc.).
  • Dictionary annotations use ML to learn from pre-annotated data and automatically generate similar annotations.
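
The dictionary-annotation idea in the last bullet can be illustrated with a minimal sketch. This is not TagTog's implementation: it shows only a plain dictionary lookup with no learning step, and the function name `annotate_with_dictionary`, the `PERSON` label, and the dictionary contents are all hypothetical.

```python
import re


def annotate_with_dictionary(text, dictionary):
    """Return sorted (start, end, surface_text, label) spans for dictionary terms.

    `dictionary` maps a term to an entity label. Matching is
    case-insensitive and restricted to whole words.
    """
    spans = []
    for term, label in dictionary.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.append((match.start(), match.end(), match.group(0), label))
    return sorted(spans)


# Hypothetical dictionary and sentence, for illustration only.
person_dictionary = {"Jennifer Aniston": "PERSON", "Angelina Jolie": "PERSON"}
sentence = "The filmstars are George Cooney, Jennifer Aniston and Angelina Jolie"

annotations = annotate_with_dictionary(sentence, person_dictionary)
for span in annotations:
    print(span)
```

"George Cooney" is not in the dictionary, so it is left unannotated. A learning-based system would be expected to generalise to unseen names like this, which a pure lookup cannot do.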

Text corpus with ontology

Team Collaboration and Quality management – Invite team members to annotate text and create an annotated corpus. Assign instructions and roles to each user at any moment. Distribute tasks automatically among users based on your quality requirements, with progress reflected on dashboards.

Track quality and compare performance – The interface evaluates the different annotators based on inter-annotator agreement (IAA).
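
To make inter-annotator agreement concrete, here is a minimal sketch using Cohen's kappa, one common agreement measure for two annotators. The article does not say which metric TagTog computes, so the choice of kappa, the function name `cohen_kappa`, and the sample labels are assumptions made for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Fraction of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # Both annotators used a single identical label throughout.
    return (observed - expected) / (1 - expected)


# Hypothetical document-level labels from two annotators.
annotator_1 = ["pos", "pos", "neg", "neg"]
annotator_2 = ["pos", "neg", "neg", "neg"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(kappa)
```

A kappa of 1 means perfect agreement and 0 means agreement no better than chance. For span-level entity annotations, a pairwise F1 score is a common alternative.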


Machine Learning with people in the loop – Feed the system with an ML model and have a team of SMEs provide feedback on the predictions it makes as it is continuously retrained. Improve the quality of the training data and the model's accuracy.
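
One way to picture a people-in-the-loop workflow is the sketch below. It is an assumption-laden illustration, not TagTog's mechanism: the article does not describe how predictions are selected for review, so the confidence threshold, the function names `route_predictions` and `apply_feedback`, and the sample data are all hypothetical.

```python
def route_predictions(predictions, threshold=0.8):
    """Split model predictions into auto-accepted items and items for expert review."""
    accepted, needs_review = [], []
    for item in predictions:
        if item["confidence"] >= threshold:
            accepted.append(item)
        else:
            needs_review.append(item)
    return accepted, needs_review


def apply_feedback(needs_review, corrections):
    """Use the expert's label where one was given, otherwise keep the prediction."""
    reviewed = []
    for item in needs_review:
        label = corrections.get(item["text"], item["label"])
        reviewed.append({"text": item["text"], "label": label, "reviewed": True})
    return reviewed


# Hypothetical model output and expert corrections.
predictions = [
    {"text": "Great product", "label": "positive", "confidence": 0.95},
    {"text": "Not bad at all", "label": "negative", "confidence": 0.55},
]
corrections = {"Not bad at all": "positive"}

accepted, review_queue = route_predictions(predictions)
training_data = accepted + apply_feedback(review_queue, corrections)
print(training_data)
```

The merged `training_data` would then feed the next round of training, so each review cycle improves the data the model learns from.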

Chatbot Training with overlapping entities

Host on secure Cloud or On-premises – On the cloud, there is nothing to install and no servers to run. On-premises, it runs as a Docker container with SSO integration, and no Internet access is required. In both cases, only a browser is needed.

Export data annotations using the API or the web interface, in any of the numerous formats available.


In Python

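import requests

# The endpoint URL is empty in the source text; set it before running.
tagtogUrl = ""
# HTTP Basic authentication with your TagTog credentials
authr = requests.auth.HTTPBasicAuth(username="your-Username", password="your-Pswd")
params = {"project": "ProjectName", "admin": "your-Username", "format": "formatted", "output": "null"}
# Text to send for annotation
payload = {
    "text": "The filmstars are George Cooney, Jennifer Aniston and Angelina Jolie"
}
# Send the text to the API as a POST request
responses = requests.post(tagtogUrl, params=params, auth=authr, data=payload)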

In Javascript

// Take the file chosen in the page's file input
var input = document.querySelector('input[type="file"]')
var form_data = new FormData()
form_data.append("file", input.files[0])
// The endpoint URL is empty in the source text; fill it in before running.
fetch('', {
  method: 'POST',
  headers: {'Authorization': "Basic " + btoa('your_Username' + ":" + 'your_Password')},
  body: form_data
}).then(responses => responses.text()).then(text => {
  console.log(text);
}).catch(function(error) {
  console.log('Error: ', error);
});

TagTog has worked with Fortune 500 companies. Its users include FlyBase, AWS, Lancaster University, Wolters Kluwer, Wevo, the University of North Carolina at Chapel Hill, the University of Copenhagen, the University of Luxembourg, and the Center for Open Science.

Jayita Bhattacharyya
Machine learning and data science enthusiast. Eager to learn new technology advances. A self-taught techie who loves to do cool stuff using technology for fun and worthwhile.
