
Guide to Regular Expressions (Regex) with Python Code


Introduction

A regular expression (regex, regexp) is a sequence of characters that defines a search pattern over strings. These patterns are usually used in find or find-and-replace operations.

Regular expressions are commonly used in search engines, text processing, web scraping, pattern matching etc. With a regex, we specify rules that describe a set of possible strings, and then ask questions such as “does this string belong to that set?” or “where in this text does the pattern occur?”. Regex can also be used to modify strings in various ways. This article covers some common uses of regular expressions with the re module in Python.

Python has a built-in package called re for regex, which provides functions such as findall, search, split and sub for searching a string for one or more patterns and for modifying it.

Let’s get started with the regular expression (re) module

The re module ships with Python’s standard library, so nothing needs to be installed before using it.

Import the re module.

import re

After importing the re module let’s look at a basic search operation: 

 pattern = "^analytics.*magazine$"
 test_string = 'analytics india magazine'
 result = re.search(pattern, test_string)
 if result:
   print("Search successful.")
 else:
   print("Search unsuccessful.")  

Output:

 Search successful.

In this example, the pattern matches any string that starts with analytics and ends with magazine, so the search on ‘analytics india magazine’ succeeds. To go further with re, knowledge of metacharacters is necessary. Metacharacters are characters with a special meaning in a pattern; they control how the regular expression matches and are mostly used to define the pattern for a search or manipulation.

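Below is a list of metacharacters:

  • [] : a set of characters, e.g. [a-m]
  • \ : starts a special sequence or escapes a special character, e.g. \d
  • . : any character except a newline, e.g. he..o
  • ^ : starts with, e.g. ^hello
  • $ : ends with, e.g. world$
  • * : zero or more occurrences, e.g. aix*
  • + : one or more occurrences, e.g. aix+
  • ? : zero or one occurrence, e.g. magazines?
  • {} : exactly the specified number of occurrences, e.g. al{2}
  • | : either or, e.g. falls|stays
  • () : capture and group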

For some examples of metacharacter, you can go through this link.
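As a rough illustration of how a few metacharacters behave, here is a minimal sketch. The sample strings are made up for this example and are not taken from any dataset.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "analytics india magazine"

# '^' and '$' anchor the pattern to the start and end of the string;
# '.' matches any character except a newline and '*' repeats it zero or more times.
starts_and_ends = re.search(r"^analytics.*magazine$", text)

# '?' makes the preceding item optional, so both singular and plural match.
plural = re.findall(r"magazines?", "one magazine, two magazines")

# '+' asks for one or more repetitions of the preceding item.
runs_of_o = re.findall(r"o+", "look, no loop")

# '|' means "either or"; '()' groups the alternatives.
either = re.findall(r"(india|magazine)", text)

# '{2}' asks for exactly two repetitions of the preceding item.
double_l = re.findall(r"l{2}", "hello all")

print(bool(starts_and_ends))  # True
print(plural)                 # ['magazine', 'magazines']
print(runs_of_o)              # ['oo', 'o', 'oo']
print(either)                 # ['india', 'magazine']
print(double_l)               # ['ll', 'll']
```

Writing the patterns as raw strings (the r prefix) is a common habit, since it stops Python from interpreting backslashes before the regex engine sees them.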

Special sequence 

There are some basic predefined character classes, which are represented by special sequences. Each special sequence has its own meaning and matches a particular kind of character or position in the string. A special sequence consists of a \ (backslash) followed by an alphabetic character.

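Regex special sequences and their meanings are given in the following list:

  • \A : matches only at the start of the string
  • \b : matches at a word boundary (the beginning or end of a word)
  • \B : matches at a position that is not a word boundary
  • \d : any digit (0-9)
  • \D : any character that is not a digit
  • \s : any whitespace character
  • \S : any character that is not whitespace
  • \w : any word character (letters, digits and the underscore)
  • \W : any character that is not a word character
  • \Z : matches only at the end of the string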

For some examples of special sequences, you can go through this link.
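A short sketch of some special sequences in action may help. The sentences used here are invented for the example.

```python
import re

# Hypothetical sample sentence, used only for illustration.
text = "The article has 3 sections and 12 examples from 2021"

digits = re.findall(r"\d+", text)           # runs of digits
words = re.findall(r"\w+", text)            # runs of word characters
pieces = re.split(r"\s+", "a  b\tc")        # split on any run of whitespace
at_start = re.search(r"\AThe", text)        # \A anchors to the start of the string

# \b restricts the match to whole words; without it, "in" is also
# found inside "india" and "indeed".
loose = re.findall(r"in", "in india, indeed in")
whole_words = re.findall(r"\bin\b", "in india, indeed in")

print(digits)            # ['3', '12', '2021']
print(len(words))        # 10
print(pieces)            # ['a', 'b', 'c']
print(bool(at_start))    # True
print(len(loose))        # 4
print(whole_words)       # ['in', 'in']
```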

Sets (character sets or character classes) 

A character set is a group of characters enclosed in square brackets. A set matches exactly one character out of those listed. Simply place the characters you want to allow inside the brackets. For example, we[ea]k matches either week or weak.

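This is a common feature of regular expressions. You can search for a word, even if it is misspelled. Some sets in Python regex and their meanings are listed below:

  • [arn] : one of the characters a, r or n
  • [a-n] : any lowercase letter from a to n
  • [^arn] : any character except a, r and n
  • [0-9] : any digit from 0 to 9
  • [a-zA-Z] : any letter, lowercase or uppercase
  • [+] : inside a set, characters such as +, *, ., |, (), $ and {} lose their special meaning, so [+] matches a literal +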

For some examples of character sets, you can go through this link.
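The we[ea]k example can be tried directly. The sketch below uses a made-up sentence and a few other sets of the same kind.

```python
import re

# Hypothetical sample text, used only for illustration.
text = "a weak week, in room 42B"

spellings = re.findall(r"we[ea]k", text)    # either 'weak' or 'week'
digits = re.findall(r"[0-9]", text)         # one digit per match
uppercase = re.findall(r"[A-Z]", text)      # uppercase letters only

# '^' as the first character inside a set negates it:
# anything that is neither a lowercase letter nor whitespace.
others = re.findall(r"[^a-z\s]", text)

print(spellings)   # ['weak', 'week']
print(digits)      # ['4', '2']
print(uppercase)   # ['B']
print(others)      # [',', '4', '2', 'B']
```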

The compile() function

re.compile(pattern, flags=0):

The compile function compiles a pattern into a pattern object, which can then be reused for further matching. In this article, we will also use this function together with the other functions. In Python, we can use it like this:

Input:

 import re
 pattern=re.compile('analytics india magazine')
 print(pattern) 

Output:

 re.compile('analytics india magazine')
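One reason to compile a pattern is to reuse it across many strings, optionally with a flag. The sketch below assumes a small, invented list of headlines and uses the re.IGNORECASE flag so that capitalisation does not matter.

```python
import re

# Compile once, reuse many times; re.IGNORECASE makes matching case-insensitive.
pattern = re.compile(r"analytics india magazine", re.IGNORECASE)

# Hypothetical headlines, invented for this example.
headlines = [
    "Analytics India Magazine turns ten",
    "a post about python",
    "Reading ANALYTICS INDIA MAGAZINE",
]

# A compiled pattern has its own search method, so re.search is not needed here.
matching = [h for h in headlines if pattern.search(h)]

print(pattern.pattern)   # analytics india magazine
print(matching)
```

The compiled object exposes the same operations as the module-level functions (search, findall, split, sub), so either style can be used.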

The findall() function

This function returns a list of all non-overlapping matches of the specified pattern in a string. Below is an example of it.

Input: 

 import re
 pattern=re.compile('analytics india magazine')
 txt = "The article in analytics india magazine"
 x = re.findall(pattern, txt)
 print(x) 

Output:

 ['analytics india magazine']

If the text contains no match, findall returns an empty list.

Input: 

 pattern=re.compile('nothing')
 txt = "The article in analytics india magazine"
 x = re.findall(pattern, txt)
 print(x) 

Output:

 []

In the above examples, findall receives the compiled pattern object in place of a pattern string.
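Two details of findall are easy to miss: when the pattern contains groups, it returns the groups rather than the whole match, and it does not report positions. The following sketch shows both, with re.finditer used as one way to get positions; the patterns are illustrative choices, not part of the original examples.

```python
import re

txt = "The article in analytics india magazine"

# No groups: findall returns the full text of each match.
words = re.findall(r"\w+", txt)

# Two groups: findall returns one tuple of group contents per match.
first_and_rest = re.findall(r"(\w)(\w*)", "india magazine")

# finditer yields match objects, which also carry the match position.
a_words = [(m.group(), m.start()) for m in re.finditer(r"\ba\w*", txt)]

print(words)           # ['The', 'article', 'in', 'analytics', 'india', 'magazine']
print(first_and_rest)  # [('i', 'ndia'), ('m', 'agazine')]
print(a_words)         # [('article', 4), ('analytics', 15)]
```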

The search() function

The search function scans the string for the first place where the pattern matches and returns a match object, which can report where that match starts.

Input: 

 txt = "i love the analytics india magazine"
 pattern=re.compile('analytics india magazine')
 x = re.search(pattern, txt)
 print("The string starts from the position:", x.start(), pattern)

Output:

 The string starts from the position: 11 re.compile('analytics india magazine')

If no match is found, search returns None.

Input:

 txt = "i love the analytics india magazine"
 pattern=re.compile('nothing')
 x = re.search(pattern, txt)
 print(x) 

Output:

 None
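Because search returns None when nothing matches, calling a method such as start() on the result without checking it first raises an AttributeError. A minimal sketch of a guard follows; find_position is a hypothetical helper written for this example, not part of the re module.

```python
import re

def find_position(pattern, text):
    """Return the start index of the first match, or -1 if there is none."""
    match = re.search(pattern, text)
    if match is None:
        return -1
    return match.start()

txt = "i love the analytics india magazine"

print(find_position("analytics", txt))  # 11
print(find_position("nothing", txt))    # -1

# re.match only looks at the beginning of the string, so it finds nothing here
# even though re.search does.
print(re.match("analytics", txt))       # None
```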

Match object

A match object is what the search function returns when it finds a match; it holds information about the search we have done. Three of its members are used here:

  • object.string
  • object.span()
  • object.group()

object.string

This attribute holds the original string that was passed to the search function. Note that string is an attribute, not a method, so it is used without parentheses.

Input:

 txt = "i love the analytics india magazine"
 pattern=re.compile('analytics india magazine')
 x = re.search(pattern, txt)
 print(x.string) 

Output:

 i love the analytics india magazine

object.span()

It returns a tuple holding the start and end positions of the match; the end position is exclusive.

Input:

 txt = "i love the analytics india magazine"
 pattern=re.compile('analytics india magazine')
 x = re.search(pattern, txt)
 print(x.span()) 

Output:

 (11, 35)

object.group()

This function returns the part of the string where the search function found the match.

Input :

 txt = "i love the analytics india magazine"
 pattern=re.compile('analytics india magazine')
 x = re.search(pattern, txt)
 print(x.group()) 

Output:

 analytics india magazine
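When the pattern contains parentheses, group() and span() can also address each captured group separately. The pattern below, including the group name last, is an illustrative choice and not part of the original examples.

```python
import re

txt = "i love the analytics india magazine"

# One numbered group before "india" and one named group after it.
x = re.search(r"(\w+) india (?P<last>\w+)", txt)

print(x.group())        # analytics india magazine  (the whole match)
print(x.group(1))       # analytics                 (first group)
print(x.group("last"))  # magazine                  (named group)
print(x.groups())       # ('analytics', 'magazine')
print(x.span(1))        # (11, 20)
print(x.span("last"))   # (27, 35)
```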

The split() function

The split function returns a list of strings, produced by splitting the text at each match of the specified separator pattern.

Input :

 txt = "analytics_india magazine"
 x = re.split(r"\s", txt)
 print(x) 

Output:

 ['analytics_india', 'magazine']

Splitting text using _(underscore) separator:

Input: 

 txt = "analytics_india magazine"
 x = re.split("_", txt)
 print(x) 

Output:

 ['analytics', 'india magazine']
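The two separators can also be handled in one call by putting them in a character set. The sketch below additionally shows the maxsplit argument and what happens when the separator is wrapped in a capturing group.

```python
import re

txt = "analytics_india magazine"

# Split on either an underscore or a whitespace character.
parts = re.split(r"[_\s]", txt)

# maxsplit limits the number of splits; the remainder stays in one piece.
first_only = re.split(r"[_\s]", txt, maxsplit=1)

# With a capturing group, the separators are kept in the result.
with_separators = re.split(r"([_\s])", txt)

print(parts)            # ['analytics', 'india', 'magazine']
print(first_only)       # ['analytics', 'india magazine']
print(with_separators)  # ['analytics', '_', 'india', ' ', 'magazine']
```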

The sub() function

The sub function replaces every match of the pattern in the text with the text of your choice.

Input:

 txt = "i love the analytics india magazine"
 x = re.sub(r"\s", "_", txt)
 print(x) 

Output:

 i_love_the_analytics_india_magazine
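Beyond a plain replacement string, sub accepts a count, backreferences to groups, and a function that computes each replacement. The sketch below illustrates these options on the same text; the specific patterns are assumptions made for the example.

```python
import re

txt = "i love the analytics india magazine"

# count limits how many matches are replaced.
first_two = re.sub(r"\s", "_", txt, count=2)

# \1 and \2 in the replacement refer to the captured groups,
# here used to swap the last two words.
swapped = re.sub(r"(\w+) (\w+)$", r"\2 \1", txt)

# The replacement can be a function that receives each match object.
title_case = re.sub(r"\b\w", lambda m: m.group().upper(), txt)

# subn also reports how many replacements were made.
replaced, how_many = re.subn(r"\s", "_", txt)

print(first_two)   # i_love_the analytics india magazine
print(swapped)     # i love the analytics magazine india
print(title_case)  # I Love The Analytics India Magazine
print(how_many)    # 5
```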

Application of Regex

  • Regex is widely used in text processing.
  • It is also widely used in search engines to make the user’s search experience better.
  • Regex is used in data scraping (web scraping), data munging, wrangling, and many other tasks.

These are the basics of regex covered in this article. We discussed and implemented some functions, metacharacters, special sequences and character sets. Regex has many use cases, and knowing what each function and special character does will help you get the most out of re.


Yugesh Verma

Yugesh is a graduate in automobile engineering and worked as a data analyst intern. He completed several Data Science projects. He has a strong interest in Deep Learning and writing blogs on data science and machine learning.