Texthero Guide: A Python Toolkit for Text Processing


Text processing is the practice of extracting and analyzing information from textual datasets, that is, datasets that store their information as text. Before such data can be used in machine learning models, it has to be cleaned, analyzed, and visualized.

Texthero is a Python package built for this purpose: it helps us analyze and process textual datasets quickly and efficiently, taking them from zero to hero.

In this article, we will explore Texthero and its text processing capabilities, and see how easily we can process data with it.




As with any other library, we first need to install Texthero, using pip install texthero.

  1. Importing required libraries

We will be importing texthero for text processing and pandas for loading the dataset and manipulating it.  


import pandas as pd

import texthero as hero

  1. Loading the dataset

The dataset we will be using here can be downloaded from Kaggle. It contains several attributes, but we will mainly focus on the 'content' column.

df = pd.read_csv('text.csv')


Dataset Used

  1. Processing the dataset

We can see that our dataset contains sentiment analysis data for tweets by different authors. We will focus on the tweets and apply the different text processing functions that Texthero provides.

We will start by cleaning the text in the 'content' column, which holds the users' tweets, and store the cleaned text in a new column.

  1. Preprocessing the Text
  • Cleaning the text

df['clean_content'] = hero.clean(df['content']) 


The clean function applies a predefined set of steps: it removes stopwords, punctuation, digits, extra whitespace, etc., and converts the text to lowercase. Each of these steps is also available as a separate function, so we can apply them individually as needed.
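
To make the individual steps concrete, here is a minimal sketch in plain Python of what such a cleaning pass does to a single string. It is not Texthero's implementation: the function name simple_clean and the tiny STOPWORDS set are assumptions made only for illustration.

```python
import re
import string

# Tiny illustrative stopword list (an assumption); real lists are much longer.
STOPWORDS = {"a", "an", "the", "is", "are", "in", "at", "to", "and", "of", "i", "my"}

# Translation table that turns every ASCII punctuation mark into a space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def simple_clean(text):
    """Apply a few cleaning steps one after another (illustrative only)."""
    text = text.lower()                    # convert to lowercase
    text = re.sub(r"\d+", " ", text)       # remove digits
    text = text.translate(PUNCT_TO_SPACE)  # remove punctuation
    words = [w for w in text.split() if w not in STOPWORDS]  # remove stopwords
    return " ".join(words)                 # collapse extra whitespace


print(simple_clean("The 2 cats are   sleeping, in MY garden!"))
```

Running the steps in this order matters: lowercasing first means the stopword check only has to handle lowercase words.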

  • Tokenize the text

The tokenize function returns a pandas Series where each row contains a list of tokens.

hero.tokenize(df['clean_content'])


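As a rough illustration of what "a list of tokens per row" means, the sketch below tokenizes two sample strings with a regular expression. The helper simple_tokenize and the sample tweets are hypothetical, and the splitting rule is an assumption that may differ from the one Texthero uses.

```python
import re


def simple_tokenize(text):
    """Split text into word tokens and standalone punctuation marks."""
    # \w+ matches a run of word characters; [^\w\s] matches one punctuation mark.
    return re.findall(r"\w+|[^\w\s]", text)


tweets = ["feeling great today", "cant wait, for the weekend!"]
tokens = [simple_tokenize(t) for t in tweets]

for row in tokens:
    print(row)
```

Each input string becomes one list, mirroring the one-list-per-row shape of the Series described above.
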
  • Stemming

Stemming means removing the ends of words using a heuristic process. The stem function supports two NLTK stemming algorithms, the Snowball stemmer and the Porter stemmer.

hero.stem(df['clean_content'], stem='snowball')
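
The idea of heuristic suffix removal can be shown with a deliberately naive stemmer. This toy version is not the Snowball or Porter algorithm; the suffix list, the minimum stem length of three characters, and the name toy_stem are all assumptions chosen for illustration.

```python
# Suffixes are checked in this order, so longer ones should come first.
SUFFIXES = ("ing", "ed", "ly", "es", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


for w in ["playing", "played", "plays", "quickly", "bus", "running"]:
    print(w, "->", toy_stem(w))
```

Note that "running" becomes "runn" here. Real stemmers add many more rules to handle cases like doubled consonants, which is why a library implementation is preferable in practice.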

  1. Visualize the Cleaned Text

There are many ways to visualize textual data. Here we will use a word cloud to visualize the cleaned text we created.

hero.visualization.wordcloud(df['clean_content'], width=250, height=150, max_words=200, background_color='WHITE')

Word-cloud of Clean Text, Texthero

Similarly, we can find the most frequently used words with Texthero's top_words function, which returns a pandas Series of word counts that can then be plotted.

hero.visualization.top_words(df['clean_content'])


Top words visualization, Texthero
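
The counting behind a top-words chart can be sketched with the standard library alone. The helper count_top_words and the sample strings below are hypothetical; this only illustrates the idea and is not how Texthero computes its result.

```python
from collections import Counter


def count_top_words(texts, n=2):
    """Return the n most frequent whitespace-separated words across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(text.split())
    return counts.most_common(n)


cleaned = ["good morning world", "good night", "morning coffee good"]
print(count_top_words(cleaned))
```

The same frequencies are what a word cloud encodes visually, with more frequent words drawn larger.
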
  1. NLP Operations on Text

Now we will apply some of the NLP operations provided by Texthero to our data.

  • Named Entities

The named_entities function returns a pandas Series where each row contains a list of tuples with information about the named entities found in that row. We will use spaCy as the underlying package here.

hero.named_entities(df['clean_content'], package='spacy')

  • Noun Chunks

The noun_chunks function returns groups of consecutive words that belong together. As our dataset is quite large, we will analyze the noun chunks in only the first 100 rows.

hero.noun_chunks(df['clean_content'].head(100))


Noun Chunks, Texthero


In this article, we learned about Texthero, a Python library for text processing. We saw how to use it for basic preprocessing and visualization, and then performed some NLP operations on the text. Texthero is simple and easy to use, with a wide variety of text processing functions.


Himanshu Sharma
An aspiring Data Scientist currently Pursuing MBA in Applied Data Science, with an Interest in the financial markets. I have experience in Data Analytics, Data Visualization, Machine Learning, Creating Dashboards and Writing articles related to Data Science.
