How to Easily Annotate Text Data with LightTag

LightTag is a text annotation tool designed to help developers and researchers get their work done easily. NLP (natural language processing) has advanced rapidly on the back of deep learning, and that progress depends on well-labelled text. LightTag aims for both speed and quality when annotating the text data used as ground truth for training machine learning models. LightTag was launched in 2018 by founder Tal Perry. It is headquartered in Berlin, Germany.

LightTag lets you form an annotation team to work on an NLP project. The project manager defines how many annotators are assigned to each example; LightTag then allocates the work automatically and lets the manager aggregate annotations or view them annotator by annotator. Review, reporting, and quality assurance features are built in. The hosted solution has a user-friendly UI, is fully managed, and includes daily backups with long retention times plus a redundant cluster of servers for high availability. The interface is optimised, with full Unicode support and no tokenization assumptions.
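To make the idea of aggregating several annotators' work concrete, here is a minimal sketch of majority-vote aggregation for one example. It is not LightTag's own aggregation logic, which this article does not describe; the annotator names, labels, and the `aggregate_by_majority` helper are all hypothetical.

```python
from collections import Counter


def aggregate_by_majority(labels_by_annotator):
    """Return (winning_label, agreement_ratio) for a single example.

    labels_by_annotator maps a (hypothetical) annotator name to the label
    that annotator chose. Ties go to the label that was seen first.
    """
    counts = Counter(labels_by_annotator.values())
    label, votes = counts.most_common(1)[0]
    return label, votes / len(labels_by_annotator)


# Three annotators were assigned to the same example.
example = {"ann_a": "POSITIVE", "ann_b": "POSITIVE", "ann_c": "NEGATIVE"}
label, agreement = aggregate_by_majority(example)
print(label, round(agreement, 2))  # POSITIVE 0.67
```

Keeping the agreement ratio next to the winning label makes it easy to route low-agreement examples to a reviewer.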



  • Annotation Types and Productivity

Span Annotation, Document Classifications, Document Tagging, Entity Annotations, Relationships Annotation.

Phrase and Subword Annotations, Document Metadata, Pre-Annotations, Very Long Class Lists, Guidelines, Keyboard Shortcuts, Auto Save, Search.

Team Collaboration, Automatic Scheduling & Task Assignment, Multiple Annotators Per Document, Role-Based Access Control, Teams Productivity Reports.

  • Multilingual

LightTag covers a wide range of languages and domain-specific jargon, such as Chinese legalese, Hebrew medical records, English financial jargon, and Arabic tweets.

  • Performance Dashboard

Inter-Annotator Agreement Reports, Review & Adjudication.
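The article does not say which agreement statistic LightTag reports. As an illustration only, Cohen's kappa is one common way to measure agreement between two annotators while correcting for chance; the label sequences below are made up.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Fraction of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:  # both annotators used one identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)


ann_a = ["PER", "ORG", "PER", "LOC", "PER", "ORG"]
ann_b = ["PER", "ORG", "LOC", "LOC", "PER", "PER"]
kappa = cohen_kappa(ann_a, ann_b)
print(round(kappa, 3))  # 0.478
```

A kappa near 1 indicates strong agreement, while a value near 0 means the annotators agree about as often as chance would predict.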

  • Evaluation Metrics: precision and recall reports, confusion matrices, and heatmaps, all of which can be downloaded to review the quality of the data.
  • Automation

LightTag suggests labels through its machine-in-the-loop system.
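One way to judge suggested labels is to compare them with what human annotators finally accepted, which is what precision and recall capture. The sketch below assumes spans are represented as `(start, end, label)` tuples; that representation and the sample spans are assumptions for illustration, not LightTag's report format.

```python
def precision_recall(suggested, accepted):
    """Compare suggested spans with human-accepted spans (exact match)."""
    suggested, accepted = set(suggested), set(accepted)
    true_positives = len(suggested & accepted)
    precision = true_positives / len(suggested) if suggested else 0.0
    recall = true_positives / len(accepted) if accepted else 0.0
    return precision, recall


# Hypothetical spans as (start, end, label).
suggested = [(0, 8, "ORG"), (20, 28, "LOC"), (40, 45, "PER")]
accepted = [(0, 8, "ORG"), (20, 28, "LOC"), (50, 58, "ORG"), (60, 66, "PER")]
precision, recall = precision_recall(suggested, accepted)
print(round(precision, 2), round(recall, 2))  # 0.67 0.5
```

Here two of the three suggestions were correct (precision), but the suggestions covered only two of the four accepted spans (recall).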

  • Review and Quality Assurance
  • A DevOps-free hosted solution for annotation projects: your own domain to work from anywhere, high availability through server replication, a separate database, and daily backups with a guaranteed 30-day retention.
  • Some data is too private or sensitive to put in the cloud. LightTag addresses this with an on-premise version that fits into an OpenShift, Kubernetes, or Docker Swarm cluster.
  • Unlike other annotation tools, LightTag avoids complex XML formats that combine annotations with the raw text. It exposes data, annotations, text, and metadata as JSON, so annotations can be used directly with ML code in PyTorch, TensorFlow, scikit-learn, or anywhere else.
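To show why JSON output is convenient for downstream ML code, here is a minimal sketch that turns character-offset annotations into `(text, label)` pairs. The JSON layout (`text`, `annotations`, `start`, `end`, `tag`, `metadata`) is an assumed shape for illustration; the actual LightTag export schema is not given in this article.

```python
import json

# Hypothetical export for one document; the field names are assumptions.
raw = """
{
  "text": "Will be leaving for New York today at 4:00 P.M.",
  "annotations": [
    {"start": 20, "end": 28, "tag": "LOCATION"}
  ],
  "metadata": {"source": "Twitter for Android"}
}
"""


def to_training_pairs(example):
    """Slice each annotated span out of the raw text and pair it with its tag."""
    text = example["text"]
    return [(text[a["start"]:a["end"]], a["tag"]) for a in example["annotations"]]


example = json.loads(raw)
pairs = to_training_pairs(example)
print(pairs)  # [('New York', 'LOCATION')]
```

Because the annotations refer to the raw text by character offsets, no tokenizer has to be chosen before the data reaches the model pipeline.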


Typical use cases by industry:

  • Finance: annotating chats, transcribed calls, or social media.
  • Legal: labelling contracts.
  • Marketing: searching and annotating social media for brand mentions and sentiment in any domain or language.
  • Pharma & Medical: annotating drug-to-drug interactions.


Client-Server Connection (Authentication with API Key)

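import requests
import pandas as pd

LIGHTTAG_DOMAIN_SETUP = 'demo_setup'  # should be your LightTag domain
SERVER = 'https://{domain}'.format(domain=LIGHTTAG_DOMAIN_SETUP)
# The original snippet lost the request call. The POST below and its
# username/password payload are assumptions; the credentials are placeholders.
LT_USERNAME = 'your_username'
LT_PASSWORD = 'your_password'
response = requests.post(SERVER + '/auth/token/login/',
                         json={'username': LT_USERNAME, 'password': LT_PASSWORD})
assert response.status_code == 200, "Could not authenticate"
auth_details = response.json()
token = auth_details['key']
assert auth_details['is_manager'] == 1, "not a manager"  # Check you are a manager
# Set up a requests session so the token does not have to be repeated on every call
session = requests.session()
session.headers.update({"Authorization": "Token {token}".format(token=token)})
# Try it out (the call itself is missing from the source; its sample output was):
# [{'id': '2789ca38-69p9-4c96-9z31-df6f4069b027',
#   'slug': 'default',
#   'url': '',
#   'name': 'default'}]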

Preparing Dataset

import json
from pprint import pprint

# Load the dataset and close the file once it has been read
with open('./billboard.json') as f:
    all_data = json.load(f)

pprint(all_data[0])  # Inspect the first example
# {'created_at': 'Tue Dec 01 13:37:52 +0000 2020',
#  'date': '2020-12-01',
#  'favorite_count': 52035,
#  'id_str': '947824196909961',
#  'in_reply_to_user_id_str': None,
#  'is_retweet': False,
#  'retweet_count': 8678,
#  'source': 'Twitter for Android',
#  'text': 'Will be leaving for New York today at 4:00 P.M. '
#          'Lot of work to be done, still it will be a wonderful New Year!',
#  'time': 16148138720000000}

print("total of {num} examples".format(num=len(all_data)))
# total of 2789 examples

# Split at a single index so train and test do not overlap:
# 2056 train examples, the remaining 733 test examples
train, test = all_data[:2056], all_data[2056:]
exploratory = train[:90]  # Take 90 examples from the training set for exploratory work
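The split above slices the file in its stored order. If the examples are sorted, for instance by date, a shuffled split with a fixed seed may give more representative sets. The following is a sketch only: `split_dataset` is a hypothetical helper, and the stand-in records replace the real JSON file.

```python
import random


def split_dataset(examples, train_size, seed=0):
    """Shuffle a copy of the examples reproducibly, then cut once so the sets cannot overlap."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:train_size], shuffled[train_size:]


# Stand-in records with the same count as the dataset described above.
records = [{"id_str": str(i)} for i in range(2789)]
train, test = split_dataset(records, train_size=2056)
exploratory = train[:90]
print(len(train), len(test), len(exploratory))  # 2056 733 90
```

Fixing the seed keeps the split stable between runs, so annotation results on the exploratory set stay comparable.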

Partnered Companies

Hoodline, PitchBook, Harvard Law School, Newsela, Numerator, MIT, Microsoft

Copyright Analytics India Magazine Pvt Ltd
