Named Entity Recognition (NER) is one of the fundamental tasks required in many NLP pipelines, and achieving state-of-the-art performance on it is difficult. Transformers are pre-trained models used across data science to reach high performance, and they can also be applied to NER. In this article, we will discuss how a BERT transformer can be used to perform NER in a very easy way. The major points to be discussed in the article are listed below.
Table of contents
- What is Named Entity Recognition (NER)?
- What is BERT?
- Applying BERT for NER
Let’s start by introducing Named Entity Recognition (NER).
What is Named Entity Recognition (NER)?
In one of the previous articles, we covered that in any text document, named entities are the words that refer to real-world objects. Examples of named entities are the names of people, places, or things. Named entities have their own identity in the corpus; Virat Kohli, Delhi, and Lenovo laptop are examples of named entities in text data.
Named entities can belong to different classes: for example, Virat Kohli is the name of a person and Lenovo is the name of a company. The process of recognizing such entities along with their class is called Named Entity Recognition. In traditional ways of performing NER, we mostly find libraries such as spaCy and NLTK, as in the small sketch below.
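For reference, a minimal sketch of this traditional approach with spaCy might look like the following (this assumes the small English model en_core_web_sm has already been downloaded):

import spacy

# Load a small pre-trained English pipeline (assumes it was installed
# beforehand with: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")
doc = nlp("Virat Kohli bought a Lenovo laptop in Delhi.")

# Each detected entity carries its text span and a label such as PERSON, ORG or GPE
for ent in doc.ents:
    print(ent.text, ent.label_)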
There are a variety of applications of NER in natural language processing: for example, summarizing information from documents, search engine optimization, content recommendation, and identifying entities in biomedical text.
In this article, we aim to make the implementation of NER easy, and transformers like BERT let us do exactly that. Since the implementation will be performed using BERT, we first need to know what BERT is, which we explain in the next section.
What is BERT?
In one of the previous articles, we had a detailed introduction to BERT. BERT stands for Bidirectional Encoder Representations from Transformers and is one of the most popular transformer models in NLP. Like other transformers, it is a pre-trained model: it learns deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context, and it was trained on English Wikipedia (2,500M words) and BooksCorpus (800M words).
Talking about the variants of BERT, we get the BERT BASE and BERT LARGE pre-trained models. The BERT BASE variant has 12 encoder layers, 12 attention heads, and a hidden size of 768, while the BERT LARGE variant has 24 encoder layers, 16 attention heads, and a hidden size of 1024. Along with this article, you can also refer to a beginner's guide to using BERT for text classification. The small sketch below shows one way to confirm these configuration values. Next, we will use the BERT model for the NER process of NLP; let's see how we can do this.
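As a quick, optional check, the published configuration of the BERT BASE checkpoint can be inspected with the transformers library (the checkpoint name bert-base-uncased is used here only as an example):

from transformers import BertConfig

# Load the published configuration of the BERT BASE checkpoint
config = BertConfig.from_pretrained("bert-base-uncased")
print(config.num_hidden_layers)    # 12 encoder layers
print(config.num_attention_heads)  # 12 attention heads
print(config.hidden_size)          # hidden size of 768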
Applying BERT for NER
This article is focused on making the procedure of NER easy and reliable. Given the success of transformers in machine learning, using a transformer for NER is a dependable choice, and BERT is one of the most successful transformers for achieving state-of-the-art performance on different NLP tasks. We will proceed by installing the transformers library, through which we can use many transformer models.
Installation
Installation of the transformers library can be performed using the following line of code.
!pip install transformers
After installation, we are ready to use the BERT transformer for Named Entity Recognition.
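To verify that the installation succeeded, we can optionally import the library and print its version:

import transformers

# A quick sanity check that the library is importable
print(transformers.__version__)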
Importing libraries
In this implementation, we are going to use modules only from the transformers library.
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline
From the library, we have imported an auto tokenizer for tokenizing the text and an auto model for token classification.
Instantiation of BERT
In this implementation, we are going to use a variant of the BERT model named bert-base-NER, which is a BERT model fine-tuned for Named Entity Recognition. We can achieve state-of-the-art performance in NER tasks using this model. It also comes in two variants, base and large, like the BERT variants discussed above. We can instantiate the base model using the following lines of code.
tokenizer = AutoTokenizer.from_pretrained("dslim/bert-base-NER")
model = AutoModelForTokenClassification.from_pretrained("dslim/bert-base-NER")
After instantiation, we are ready to use bert-base-NER.
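As an optional check, the tag set that the fine-tuned model predicts can be read from its configuration (a small sketch; the exact mapping depends on the checkpoint you loaded):

# Inspect the tag set the model was fine-tuned with
print(model.config.id2label)
# For this checkpoint, the mapping is expected to contain tags such as
# O, B-PER, I-PER, B-ORG, I-ORG, B-LOC, I-LOC, B-MISC and I-MISC
# (see the abbreviation table later in this article)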
Defining pipeline and example
Here we will use the pipeline module of the transformers library to define a pipeline through which we can pass our data.
NER = pipeline("ner", model=model, tokenizer=tokenizer)
example = "My name is yugesh and I live in India"
In the above code, we have defined a pipeline and an example sentence for testing the results.
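Before running the pipeline, it can be instructive to see how the tokenizer splits the example into word pieces; names that are not in BERT's vocabulary are broken into sub-tokens, which is why the raw NER output can contain pieces starting with "##":

# Optional: inspect the word pieces the tokenizer produces for the example
print(tokenizer.tokenize(example))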
Fitting and evaluating pipeline
After performing the above steps, we are ready to run the pipeline on the example.
results = NER(example)
results
Output:

Here we can see the results of NER: the output of the pipeline shows each detected entity along with its predicted class and probability. The model was trained with the following tag set:
Abbreviation | Description
O | Outside of a named entity
B-MISC | Beginning of a miscellaneous entity right after another miscellaneous entity
I-MISC | Miscellaneous entity
B-PER | Beginning of a person's name right after another person's name
I-PER | Person's name
B-ORG | Beginning of an organization right after another organization
I-ORG | Organization
B-LOC | Beginning of a location right after another location
I-LOC | Location
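If we prefer whole entities instead of individual word-piece tags, recent versions of the transformers pipeline can merge the B-/I- pieces for us (older releases used grouped_entities=True instead of aggregation_strategy, so treat this as a sketch for your installed version):

# Re-create the pipeline so that consecutive B-/I- pieces are merged into spans
NER_grouped = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

# Each returned dictionary now describes a full entity span (e.g. PER or LOC)
print(NER_grouped(example))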
Final words
In this article, we have gone through a short introduction to Named Entity Recognition and the BERT model. Through a hands-on implementation, we saw how a transformer can be used for named entity recognition in a very easy way and in just a few steps.