Flair: Hands-on Guide to Robust NLP Framework Built Upon PyTorch


Flair is a powerful open-source library for natural language processing. It is mainly used for extracting insight from text through word embeddings, named entity recognition, part-of-speech tagging, and text classification, and it ships with pre-trained models for these tasks. It also supports biomedical text, with support for more than 32 biomedical datasets. Flair is built on PyTorch, which makes it easy to integrate its word and document embeddings into PyTorch-based NLP work.

Flair is developed mainly by Humboldt University of Berlin and friends. The university maintains the library, which has already been used in more than a hundred industry and research projects.

GitHub: https://github.com/flairNLP/flair


Research Papers:

  1. http://alanakbik.github.io/papers/coling2018.pdf
  2. https://www.aclweb.org/anthology/N19-4010/
  3. https://www.aclweb.org/anthology/C18-1139/

The table below shows Flair’s accuracy on NLP tasks such as named entity recognition, part-of-speech tagging, and chunking.





Installation:

Using pip:

pip install flair

Source:https://pypi.org/project/flair/

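Using conda:

conda install -c conda-forge python-flair

Source:https://anaconda.org/conda-forge/python-flair

Note that the package named flair on the bioconda channel is an unrelated bioinformatics tool, not this NLP library.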

Flair Model:

First, import Sentence from flair.data and SequenceTagger from flair.models. Create a Sentence object from your text, load the pre-trained named entity recognition model into a SequenceTagger, then call predict on the sentence.

For an example of the flair model, see the code below.

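 from flair.data import Sentence
 from flair.models import SequenceTagger
 # make a sentence
 sentence = Sentence('I love India .')
 # load the pre-trained NER tagger (downloaded on first use)
 tagger = SequenceTagger.load('ner')
 # run NER over the sentence
 tagger.predict(sentence)
 # print each entity span the tagger found
 for entity in sentence.get_spans('ner'):
     print(entity)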
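
A sequence tagger assigns a label to every token, and entity spans are then read off those labels. The sketch below shows one common way to do this for the BIO tagging scheme, in plain Python with no Flair dependency. The function name bio_to_spans and the hand-written tags are hypothetical and only for illustration; they are neither Flair's internals nor real model output.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group per-token BIO tags into (entity text, entity type) pairs."""
    spans = []
    current_tokens, current_type = [], None
    # A trailing "O" sentinel flushes an entity that ends the sentence.
    for token, tag in zip(tokens + [""], tags + ["O"]):
        prefix, _, entity_type = tag.partition("-")
        continues = prefix == "I" and entity_type == current_type
        if not continues:
            # The running entity (if any) is finished: store it and reset.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
        if prefix in ("B", "I"):
            # "B-" starts an entity; a stray "I-" is treated as a start too.
            current_tokens.append(token)
            current_type = entity_type
    return spans


# Hand-written tags for illustration, not the output of a real model.
tokens = "I love India .".split()
tags = ["O", "O", "B-LOC", "O"]
print(bio_to_spans(tokens, tags))
```

With these hand-written tags, the only span returned is "India" with the type LOC.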

Flair provides pre-trained models for the following NLP tasks, and also supports training custom models:

  • Named Entity Recognition
  • Part-of-Speech Tagging
  • Text Classification

Tokenization:

The Flair library includes a predefined tokenizer based on the segtok Python library. To tokenize text, set the “use_tokenizer” flag to True when creating a Sentence; to skip tokenization, set it to False. Tags can also be attached to individual tokens with the add_tag function, and a whole sentence can be given a label, such as its topic, with add_label.

For example, see the code below:

 from flair.data import Sentence
 # Make a sentence object by passing an untokenized string and the 'use_tokenizer' flag
 untokenized_sentence = Sentence('The grass is green.', use_tokenizer=False)
 # Print the object to see what's in there
 print(untokenized_sentence)

In this case, no tokenization occurs because use_tokenizer is False; the text is only split on whitespace, so "green." stays a single token.
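
To make the effect of the flag concrete, here is a minimal sketch in plain Python that contrasts whitespace splitting with a simple punctuation-aware tokenizer. Both helper functions are hypothetical, and the regular expression is only a rough stand-in for a real tokenizer; it does not reproduce segtok's actual rules.

```python
import re
from typing import List


def whitespace_split(text: str) -> List[str]:
    """Split on whitespace only, as happens when no tokenizer is applied."""
    return text.split()


def simple_tokenize(text: str) -> List[str]:
    """Rough stand-in for a real tokenizer: separates punctuation from words."""
    return re.findall(r"\w+|[^\w\s]", text)


text = "The grass is green."
print(whitespace_split(text))  # ['The', 'grass', 'is', 'green.']
print(simple_tokenize(text))   # ['The', 'grass', 'is', 'green', '.']
```

The first call yields four tokens with the period attached to "green", while the second yields five tokens with the period split off.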

Source:https://github.com/flairNLP/flair/blob/master/resources/docs/TUTORIAL_1_BASICS.md

Word Embeddings:

Flair provides several kinds of embeddings. This section looks at Flair embeddings in detail, along with their code implementation.

Flair Embedding:

Flair embeddings are contextual string embeddings that capture latent syntactic-semantic information that goes beyond standard word embeddings. The main differences are:

(1) they are trained without any explicit notion of words, and thus fundamentally model words as sequences of characters.

(2) they are contextualized by their surrounding text, meaning that the same word will have different embeddings depending on its contextual use.

Code:

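 from flair.data import Sentence
 from flair.embeddings import FlairEmbeddings
 # init embedding
 flair_embedding_forward = FlairEmbeddings('news-forward')
 # create a sentence
 sentence = Sentence('The grass is green .')
 # embed words in sentence
 flair_embedding_forward.embed(sentence)
 # each token now carries its contextual embedding vector
 for token in sentence:
     print(token, token.embedding.shape)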
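
The two properties above (character-level modelling and dependence on context) can be illustrated with a toy example that needs no Flair installation. The sketch below is not Flair's model: it is a tiny hand-rolled recurrence with fixed pseudo-random inputs instead of a trained character language model, and every name in it (HIDDEN_SIZE, char_vector, step, forward_word_embeddings) is hypothetical. It only shows why reading text character by character and taking the state at the end of a word gives the same word different vectors in different contexts.

```python
import math
import random
from typing import List, Tuple

HIDDEN_SIZE = 4  # tiny on purpose; real models use far larger states


def char_vector(ch: str) -> List[float]:
    """Fixed pseudo-random input vector per character (stand-in for learned weights)."""
    rng = random.Random(ord(ch))
    return [rng.uniform(-1.0, 1.0) for _ in range(HIDDEN_SIZE)]


def step(state: List[float], ch: str) -> List[float]:
    """One recurrent update: mix the current character with the previous state."""
    x = char_vector(ch)
    n = len(state)
    return [math.tanh(x[i] + 0.8 * state[(i + 1) % n]) for i in range(n)]


def forward_word_embeddings(text: str) -> List[Tuple[str, List[float]]]:
    """Read text left to right; a word's vector is the state after its last character."""
    state = [0.0] * HIDDEN_SIZE
    embeddings = []
    word = ""
    for ch in text + " ":
        if ch == " ":
            if word:
                embeddings.append((word, state))
                word = ""
        else:
            word += ch
        state = step(state, ch)
    return embeddings


a = dict(forward_word_embeddings("the river bank"))
b = dict(forward_word_embeddings("the savings bank"))
# "bank" has a different left context in each sentence, so its vectors differ.
print(a["bank"] != b["bank"])
# "the" has the same (empty) left context in both, so its vectors match.
print(a["the"] == b["the"])
```

Because this toy model only reads left to right, a word's vector depends solely on the text before it, which mirrors the idea behind a forward model such as 'news-forward'.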

Training a Text Classification Model:

We train a text classifier on the TREC-6 corpus. The code uses simple GloVe word embeddings; Flair embeddings can be added to the same list to combine the two.

The code imports Corpus and the TREC_6 dataset, the WordEmbeddings, FlairEmbeddings, and DocumentRNNEmbeddings classes, and the TextClassifier model and ModelTrainer.

Code:

 from flair.data import Corpus
 from flair.datasets import TREC_6
 from flair.embeddings import WordEmbeddings, FlairEmbeddings, DocumentRNNEmbeddings
 from flair.models import TextClassifier
 from flair.trainers import ModelTrainer
 # 1. get the corpus
 corpus: Corpus = TREC_6()
 # 2. create the label dictionary
 label_dict = corpus.make_label_dictionary()
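 # 3. make a list of word embeddings
 word_embeddings = [
     WordEmbeddings('glove'),
     # uncomment to combine GloVe with Flair embeddings (slower to train)
     # FlairEmbeddings('news-forward'),
     # FlairEmbeddings('news-backward'),
 ]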
 # 4. initialize document embedding by passing a list of word embeddings
 # Can choose between many RNN types (GRU by default, to change use rnn_type parameter)
 document_embeddings = DocumentRNNEmbeddings(word_embeddings, hidden_size=256)
 # 5. create the text classifier
 classifier = TextClassifier(document_embeddings, label_dictionary=label_dict)
 # 6. initialize the text classifier trainer
 trainer = ModelTrainer(classifier, corpus)
 # 7. start the training
 trainer.train('resources/taggers/trec',
               learning_rate=0.1,
               mini_batch_size=32,
               anneal_factor=0.5,
               patience=5,
               max_epochs=150) 

Source:https://github.com/flairNLP/flair/blob/master/resources/docs/TUTORIAL_7_TRAINING_A_MODEL.md
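
The learning_rate, anneal_factor, and patience arguments work together: the learning rate starts at its initial value and is scaled down when the model stops improving. The sketch below simulates that idea in plain Python. It is a simplified assumption about the schedule, not Flair's actual trainer code; the function name annealed_learning_rates, the exact reduction rule, and the sample scores are all made up for illustration.

```python
from typing import List


def annealed_learning_rates(dev_scores: List[float],
                            learning_rate: float = 0.1,
                            anneal_factor: float = 0.5,
                            patience: int = 5) -> List[float]:
    """Return the learning rate in effect for each epoch, given per-epoch dev scores.

    Assumed rule: once the score has failed to improve for more than
    `patience` consecutive epochs, multiply the rate by `anneal_factor`.
    """
    rates = []
    best = float("-inf")
    bad_epochs = 0
    for score in dev_scores:
        rates.append(learning_rate)  # rate used during this epoch
        if score > best:
            best, bad_epochs = score, 0
        else:
            bad_epochs += 1
        if bad_epochs > patience:
            learning_rate *= anneal_factor
            bad_epochs = 0
    return rates


# Made-up dev scores; patience is lowered to 2 so the effect shows quickly.
scores = [0.70, 0.75, 0.75, 0.74, 0.75, 0.76]
print(annealed_learning_rates(scores, patience=2))
# [0.1, 0.1, 0.1, 0.1, 0.1, 0.05]
```

In this run the score stalls for three epochs after reaching 0.75, which exceeds the patience of 2, so the rate is halved from 0.1 to 0.05 for the final epoch.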

Summary

We learned about the Flair open-source library for NLP. We covered the NLP tasks Flair addresses and its use in industry, and walked through some important Flair pipelines, with code, that build on its pre-trained NLP models.


Amit Singh
Amit Singh is a Data Scientist who graduated in Computer Science and Engineering, and a Data Science writer at Analytics India Magazine.
