
# How To Perform Set Operations On Pandas DataFrames


In data science we often extract and scrape data from multiple sources. While analyzing this data, we regularly need to compare data frames, for example to check which rows appear in only one of them or which rows they have in common. Set operations such as union, intersection, and difference answer exactly these questions. Through this article, we will understand what these set operations are and how they are used for comparison. In this experiment, we will first create three data frames and then perform these set operations on them.


1. What are set operations on pandas DataFrames?
2. What is the union operation, and how is it performed?
3. What is the intersection operation, and how is it performed?
4. What is the difference operation, and how is it performed?

## What are Set Operations?

Set operations are mathematical operations for combining and comparing collections. Suppose we have three data frames, each with two columns holding the names and IDs of the students enrolled in one of three courses: Machine Learning, NLP, and Computer Vision. We might want the list of all students, or the students who are in ML but not in NLP, and other combinations like these. The tables below show the enrollment for the three courses.

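| Machine Learning | NLP | CV |
|---|---|---|
| Rohit (101) | Rohit (101) | Rohit (101) |
| Arpit (102) | Aman (105) | Arpit (102) |
| Chiranjeev (103) | Ayush (106) | Pawan (107) |
| Piyush (104) | Piyush (104) | Ayush (106) |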
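Before building these tables in pandas, it may help to see the three operations on plain Python sets. The sketch below uses only the student IDs of the three courses; the variable names `ml_ids`, `nlp_ids`, and `cv_ids` are illustrative and do not appear in the rest of the article.

```python
# Student IDs per course (illustrative names, same IDs as the course tables).
ml_ids = {"101", "102", "103", "104"}
nlp_ids = {"101", "105", "106", "104"}
cv_ids = {"101", "102", "107", "106"}

all_ids = ml_ids | nlp_ids | cv_ids   # union: enrolled in at least one course
ml_and_nlp = ml_ids & nlp_ids         # intersection: enrolled in both ML and NLP
ml_not_nlp = ml_ids - nlp_ids         # difference: enrolled in ML but not in NLP

print(sorted(all_ids))
print(sorted(ml_and_nlp))
print(sorted(ml_not_nlp))
```

The pandas versions in the following sections do the same thing, but on whole rows (name plus ID) rather than on bare IDs.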

Now we will create these three tables using pandas. The code below first imports the pandas package and then builds one data frame per course.

```python
import pandas as pd

# One data frame per course; Student_ID is stored as a string
ML_df = pd.DataFrame({"Name": ["Rohit", "Arpit", "Chiranjeev", "Piyush"],
                      "Student_ID": ["101", "102", "103", "104"]})
NLP_df = pd.DataFrame({"Name": ["Rohit", "Aman", "Ayush", "Piyush"],
                       "Student_ID": ["101", "105", "106", "104"]})
CV_df = pd.DataFrame({"Name": ["Rohit", "Arpit", "Pawan", "Ayush"],
                      "Student_ID": ["101", "102", "107", "106"]})
```

## What is the Union Operation? How to perform it?

The union operation collects every row that is present in any of the tables, keeping each row only once. In this case, if we need every student enrolled in at least one of the three courses, together with their ID, we use the union operation.

`All Students = ML ∪ NLP ∪ CV `

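The code below computes the union of all three data frames: `pd.concat` stacks them on top of each other, and `drop_duplicates` removes the students who appear more than once.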

`all_students = pd.concat([ML_df,NLP_df,CV_df], ignore_index = True)`

`all_students = all_students.drop_duplicates()`

`print(all_students)`

Output:
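As a self-contained sketch of the union step, the block below rebuilds the three data frames and chains the same two calls. Because `drop_duplicates()` keeps the original row labels, the sketch also calls `reset_index(drop=True)`; that call is an addition for tidiness and is not part of the original code.

```python
import pandas as pd

ML_df = pd.DataFrame({"Name": ["Rohit", "Arpit", "Chiranjeev", "Piyush"],
                      "Student_ID": ["101", "102", "103", "104"]})
NLP_df = pd.DataFrame({"Name": ["Rohit", "Aman", "Ayush", "Piyush"],
                       "Student_ID": ["101", "105", "106", "104"]})
CV_df = pd.DataFrame({"Name": ["Rohit", "Arpit", "Pawan", "Ayush"],
                      "Student_ID": ["101", "102", "107", "106"]})

all_students = (
    pd.concat([ML_df, NLP_df, CV_df], ignore_index=True)  # stack all rows
    .drop_duplicates()                                    # keep each (Name, Student_ID) once
    .reset_index(drop=True)                               # renumber rows 0..n-1 (added here)
)

print(all_students)
```

Note that `drop_duplicates()` compares whole rows. If the same ID could appear with differently spelled names, passing `subset=["Student_ID"]` would deduplicate on the ID alone.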

## What is the Intersection Operation? How to perform it?

Intersection is the counterpart of union: it keeps only the rows that are common to both data frames. Suppose we want the students who are enrolled in both ML and NLP, or in both ML and CV. Called without extra arguments, `merge` performs an inner join on all the columns the two frames share, so it returns exactly those common rows. Refer to the code below to see how to compute the intersection between two data frames.

`Common_ML_NLP = ML ∩ NLP `

`Common_ML_NLP = ML_df.merge(NLP_df)`

`print(Common_ML_NLP)`

Output:

`Common_ML_CV = ML ∩ CV`

`Common_ML_CV = ML_df.merge(CV_df)`

`print(Common_ML_CV)`

Output:
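The sketch below spells out the defaults that `merge` relies on in the calls above, an inner join on every shared column. The helper name `intersect` is hypothetical and introduced only for illustration.

```python
import pandas as pd

ML_df = pd.DataFrame({"Name": ["Rohit", "Arpit", "Chiranjeev", "Piyush"],
                      "Student_ID": ["101", "102", "103", "104"]})
NLP_df = pd.DataFrame({"Name": ["Rohit", "Aman", "Ayush", "Piyush"],
                       "Student_ID": ["101", "105", "106", "104"]})
CV_df = pd.DataFrame({"Name": ["Rohit", "Arpit", "Pawan", "Ayush"],
                      "Student_ID": ["101", "102", "107", "106"]})


def intersect(left, right):
    """Return the rows present in both frames, matched on all shared columns.

    Hypothetical helper: it makes explicit what `left.merge(right)` does by default.
    """
    shared = [col for col in left.columns if col in right.columns]
    return left.merge(right, how="inner", on=shared)


common_ml_nlp = intersect(ML_df, NLP_df)
common_ml_cv = intersect(ML_df, CV_df)

print(common_ml_nlp)
print(common_ml_cv)
```

Matching on both `Name` and `Student_ID` means a row counts as common only when both values agree; to match on the ID alone, one could pass `on="Student_ID"` instead.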

## What is the Difference Operation? How to perform it?

The difference operation keeps the rows of one data frame that do not appear in another. Suppose we want the students who are enrolled in ML but not in NLP, or in ML but not in CV. The code below filters `ML_df` down to the rows whose `Student_ID` is absent from the other data frame.

`ML_NLP = ML_df[~ML_df.Student_ID.isin(NLP_df.Student_ID)]`

`print(ML_NLP) `

Output:

`ML_CV = ML_df[~ML_df.Student_ID.isin(CV_df.Student_ID)]`

`print(ML_CV) `

Output:
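The `isin` filter compares only the `Student_ID` column. An alternative sketch, not taken from the original code, uses the `indicator=True` option of `merge`, which adds a `_merge` column marking each row as `left_only`, `right_only`, or `both`. The helper names `difference` and `symmetric_difference` are hypothetical.

```python
import pandas as pd

ML_df = pd.DataFrame({"Name": ["Rohit", "Arpit", "Chiranjeev", "Piyush"],
                      "Student_ID": ["101", "102", "103", "104"]})
NLP_df = pd.DataFrame({"Name": ["Rohit", "Aman", "Ayush", "Piyush"],
                       "Student_ID": ["101", "105", "106", "104"]})


def difference(left, right):
    """Rows of `left` that have no matching row in `right` (hypothetical helper)."""
    merged = left.merge(right, how="left", indicator=True)
    return merged[merged["_merge"] == "left_only"].drop(columns="_merge")


def symmetric_difference(left, right):
    """Rows that appear in exactly one of the two frames (hypothetical helper)."""
    merged = left.merge(right, how="outer", indicator=True)
    return merged[merged["_merge"] != "both"].drop(columns="_merge")


ml_only = difference(ML_df, NLP_df)
ml_xor_nlp = symmetric_difference(ML_df, NLP_df)

print(ml_only)
print(ml_xor_nlp)
```

The symmetric difference covers the case of students who are in ML or NLP but not in both, which the one-sided `isin` filter does not return on its own.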

## Conclusion

In this article, we discussed the basic set operations that can be performed between pandas data frames to find the combined, common, and differing rows. We first covered the union operation, followed by the intersection and difference operations. These operations are useful for comparing data frames and for understanding how data from different sources overlaps.


