
How to Easily Annotate Text Data with LightTag

LightTag is a text annotation tool designed to help developers and researchers get their work done easily


Advances in deep learning have made natural language processing (NLP) central to how text and communication data are handled. LightTag aims to deliver both speed and quality when annotating text, producing ground-truth data for training machine learning models. LightTag was launched in 2018 by founder Tal Perry and is headquartered in Berlin, Germany.

LightTag lets you form an annotation team to work on an NLP project. The project manager defines how many annotators will work on each example. LightTag then allocates the work automatically and lets you aggregate the annotations or view them annotator by annotator. It also provides review, reporting and quality assurance features. The hosted solution is fully managed, with a user-friendly UI, daily backups with long retention times for restoring work, and a redundant cluster of servers to ensure high availability. The interface is optimised, with full Unicode support and no tokenization assumptions.
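To make the idea of aggregating several annotators' work concrete, here is a minimal sketch of majority voting plus a simple pairwise agreement score. It does not reproduce LightTag's own aggregation logic. The data layout and the function names `majority_label` and `pairwise_agreement` are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def majority_label(labels):
    """Return the most common label and the fraction of annotators who chose it."""
    counts = Counter(labels)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(labels)


def pairwise_agreement(labels):
    """Fraction of annotator pairs that assigned the same label."""
    pairs = list(combinations(labels, 2))
    if not pairs:
        return 1.0  # a single annotator trivially agrees with themselves
    return sum(a == b for a, b in pairs) / len(pairs)


# Hypothetical data: example id -> labels from three annotators
annotations = {
    "tweet-1": ["positive", "positive", "negative"],
    "tweet-2": ["neutral", "neutral", "neutral"],
}

for example_id, labels in annotations.items():
    label, support = majority_label(labels)
    print(example_id, label, round(support, 2), round(pairwise_agreement(labels), 2))
```

Examples with low agreement are natural candidates for review, which is the kind of case a review and adjudication step is meant to catch.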

Features

  • Annotation Types and Productivity

Span Annotation, Document Classification, Document Tagging, Entity Annotation, Relationship Annotation.

Phrase and Subword Annotations, Document Metadata, Pre-Annotations, Very Long Class Lists, Guidelines, Keyboard Shortcuts, Auto Save, Search.

Team Collaboration, Automatic Scheduling & Task Assignment, Multiple Annotators Per Document, Role-Based Access Control, Teams Productivity Reports.

  • Multilingual

Works across languages and specialised domains, such as Chinese legalese, Hebrew medical records, English financial jargon and Arabic tweets.

  • Performance Dashboard

Inter-Annotator Agreement Reports, Review & Adjudication.

  • Evaluation Metrics – Precision and recall reports, confusion matrices and heatmaps, all of which can be downloaded to review the quality of the data.
  • Automation

LightTag suggests labels through its machine-in-the-loop system.

  • Review and Quality Assurance
  • DevOps-free hosted solution for annotation projects: your own domain (you.lighttag.io) to work from anywhere, high availability through server replication, a separate database, and daily backups with a guaranteed 30-day retention.
  • On-premise deployment: when data privacy or sensitivity makes it problematic to put data in the cloud, LightTag offers an on-premise version that runs on an OpenShift, Kubernetes or Docker Swarm cluster.
  • JSON output: unlike annotation tools that rely on complex XML mixing annotations with the raw text, LightTag provides data, annotations, text and metadata as JSON, so annotations can be used directly with ML libraries such as PyTorch, TensorFlow or scikit-learn.
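As an illustration of why JSON annotations are convenient for ML work, the sketch below converts character-offset spans into per-token BIO tags, a common input format for sequence labelling models. The record layout (`text`, `annotations`, `start`, `end`, `tag`) is an assumed schema for illustration, not LightTag's documented export format, and whitespace tokenization is a simplification chosen here.

```python
import json
import re


def spans_to_bio(text, spans):
    """Convert character-offset spans into (token, BIO tag) pairs using whitespace tokens."""
    tagged = []
    for match in re.finditer(r"\S+", text):
        tag = "O"
        for span in spans:
            # A token belongs to a span when it lies fully inside the span's offsets.
            if match.start() >= span["start"] and match.end() <= span["end"]:
                prefix = "B" if match.start() == span["start"] else "I"
                tag = f"{prefix}-{span['tag']}"
                break
        tagged.append((match.group(), tag))
    return tagged


# Hypothetical export record; the field names are assumptions, not LightTag's schema.
record = json.loads("""
{
  "text": "Leaving for New York today",
  "annotations": [{"start": 12, "end": 20, "tag": "LOCATION"}]
}
""")
print(spans_to_bio(record["text"], record["annotations"]))
```

Spans that start or end in the middle of a token (subword annotations) are tagged "O" by this sketch. A real pipeline would need to align spans with whatever tokenizer the model uses.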

Industries

  • Finance: annotating chats, call transcripts or social media.
  • Legal: labelling contracts.
  • Marketing: searching and annotating social media for brand mentions and sentiment in any domain or language.
  • Pharma & Medical: annotating drug-to-drug interactions.

API

Client-Server Connection (Authentication with API Key)

import requests

LIGHTTAG_DOMAIN_SETUP = 'demo_setup'  # replace with your own LightTag domain
SERVER = 'https://{domain}.lighttag.io/api/'.format(domain=LIGHTTAG_DOMAIN_SETUP)
API_BASE = SERVER + 'v1/'
MY_USER = 'MY_USER_ID'
MY_PSWD = 'MY_PSWD_HERE'

# Log in with username and password to obtain an API token
response = requests.post(SERVER + 'auth/token/login/',
                         json={"username": MY_USER, "password": MY_PSWD})
assert response.status_code == 200, "Could not authenticate"
auth_details = response.json()
token = auth_details['key']
assert auth_details['is_manager'] == 1, "not a manager"  # Check you are a manager

# A requests session carries the token, so it need not be repeated on every call
session = requests.session()
session.headers.update({"Authorization": "Token {token}".format(token=token)})

# Try it out
session.get(API_BASE + 'projects/').json()
[{'id': '2789ca38-69p9-4c96-9z31-df6f4069b027',
  'slug': 'default',
  'url': 'https://demo.lighttag.io/api/v1/projects/default/',
  'name': 'default'}]

Preparing Dataset

import json
from pprint import pprint

# Load the dataset; the with-block closes the file once it has been read
with open('./billboard.json') as f:
    all_data = json.load(f)
pprint(all_data[0])
print("total of {num} examples".format(num=len(all_data)))
{'created_at': 'Tue Dec 01 13:37:52 +0000 2020',
 'date': '2020-12-01',
 'favorite_count': 52035,
 'id_str': '947824196909961',
 'in_reply_to_user_id_str': None,
 'is_retweet': False,
 'retweet_count': 8678,
 'source': 'Twitter for Android',
 'text': 'Will be leaving for New York today at 4:00 P.M. '
         'Lot of work to be done, still it will be a wonderful New Year!',
 'time': 16148138720000000}
total of 2789 examples
train, test = all_data[:2056], all_data[2056:]  # 2056 train examples; the remaining 733 are held out for testing
exploratory = train[:90]  # Take 90 examples from the training set for exploratory work
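Slicing takes examples in file order. If the file happens to be sorted (for example by date), a shuffled split with a fixed seed may give more representative train and test sets and guarantees that they do not overlap. The sketch below uses stand-in data, and the helper name `shuffled_split` is an assumption for illustration.

```python
import random


def shuffled_split(examples, train_size, seed=0):
    """Shuffle a copy of the examples and split it into disjoint train and test lists."""
    shuffled = list(examples)  # copy, so the caller's list keeps its order
    random.Random(seed).shuffle(shuffled)
    return shuffled[:train_size], shuffled[train_size:]


# Stand-in data; in the article this would be the list loaded from billboard.json.
all_data = [{"id_str": str(i)} for i in range(2789)]
train, test = shuffled_split(all_data, train_size=2056)
print(len(train), len(test))
```

Fixing the seed keeps the split reproducible across runs, so annotators and models always see the same held-out examples.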

Partnered Companies

Hoodline, PitchBook, Harvard Law School, Newsela, Numerator, MIT, Microsoft


Jayita Bhattacharyya

Machine learning and data science enthusiast. Eager to learn new technology advances. A self-taught techie who loves to do cool stuff using technology for fun and worthwhile.