My first Web Scraping project – Analyzing Flipkart Product Reviews using Text Mining

E-commerce websites generate large amounts of textual data. These firms hire data science professionals to refine this unstructured data and gather meaningful insights that help them understand the end user better. For example, by analyzing product reviews, Flipkart can learn how customers perceive a product, and Netflix can find out how much users like its content. It is hard to imagine doing this kind of analysis without text analytics.

Topics we cover in this article:

  • How to Extract Product reviews from Flipkart website
  • Preprocessing of the Extracted reviews
  • Extracting and Analyzing Positive reviews
  • Extracting and Analyzing Negative reviews

In this article, we will extract the reviews of the MacBook Air laptop from the Flipkart website and perform text analysis on them.

Hands-on implementation of Flipkart review scraping

#Importing required libraries
import requests   
from bs4 import BeautifulSoup as bs 
import re 
import nltk
import matplotlib.pyplot as plt
from wordcloud import WordCloud
import os

Extracting reviews from Flipkart for MacBook Air

Here we are going to extract the reviews of the MacBook Air laptop from its product reviews URL, one page at a time.

#Scraping reviews using BeautifulSoup
macbook_reviews=[]
for page in range(1,30):  # pages 1 to 29
  url="https://www.flipkart.com/apple-macbook-air-core-i5-5th-gen-8-gb-128-gb-ssd-mac-os-sierra-mqd32hn-a-a1466/product-reviews/itmevcpqqhf6azn3?pid=COMEVCPQBXBDFJ8C&page="+str(page)
  response = requests.get(url)
  soup = bs(response.content,"html.parser")# creating soup object to iterate over the extracted content
  # Extracting the review text from the div tags with the class "qwjRop"
  reviews = soup.find_all("div",attrs={"class":"qwjRop"})
  macbook_reviews.extend(review.text for review in reviews)
#here we are saving the extracted data
with open("macbook.txt","w",encoding='utf8') as output:
    output.write(str(macbook_reviews))

Up to this point, we have extracted the reviews from the website, collected them in the list macbook_reviews, and saved them to a file named macbook.txt.
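
Writing str(macbook_reviews) stores the printed form of the Python list, which is awkward to load back as a list later. A minimal sketch of one alternative, assuming JSON is an acceptable storage format. The helper names save_reviews and load_reviews are hypothetical and not part of the original code:

```python
import json

def save_reviews(reviews, path):
    """Write a list of review strings to `path` as JSON (hypothetical helper)."""
    with open(path, "w", encoding="utf8") as output:
        # ensure_ascii=False keeps emojis and other non-ASCII text readable in the file
        json.dump(reviews, output, ensure_ascii=False)

def load_reviews(path):
    """Read the list of review strings back from `path` (hypothetical helper)."""
    with open(path, "r", encoding="utf8") as source:
        return json.load(source)
```

Calling save_reviews(macbook_reviews, "macbook.json") after the scraping loop and load_reviews("macbook.json") in a later session would give back the same list without scraping again; the file name here is only an example.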

Preprocessing 

The extracted product reviews contain unwanted content such as extra spaces, capital letters, symbols, and smiley emojis. We don’t want to include these in the text analysis, so in preprocessing we clean the data by removing the unwanted characters and converting the text to lowercase.

os.getcwd()
os.chdir("/content/chider")  

# Joining all the reviews into single paragraph 
mac_rev_string = " ".join(macbook_reviews) 

# Replacing everything except letters with a space, then lowercasing
mac_rev_string = re.sub("[^A-Za-z]+"," ",mac_rev_string).lower()
# Removing digits (already covered by the step above, kept as a safeguard)
mac_rev_string = re.sub("[0-9]+"," ",mac_rev_string)

#here we are splitting the text into individual words; split() also drops empty strings
mac_reviews_words = mac_rev_string.split()

#removing the stop words
#stop_words = stopwords('english')
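
To make the effect of the cleaning steps concrete, here is a small worked example on a single made-up review. The function name clean_review_text and the sample text are assumptions for illustration only; the regular expression is the same one used above:

```python
import re

def clean_review_text(text):
    """Lowercase `text` and keep only runs of letters, returned as a list of words."""
    # Replace every run of non-letter characters (digits, symbols, emojis) with a space
    letters_only = re.sub("[^A-Za-z]+", " ", text).lower()
    # split() without arguments drops leading, trailing, and empty pieces
    return letters_only.split()

sample = "Awesome!! 😀 Battery lasts 10 hrs"  # made-up review, for illustration only
print(clean_review_text(sample))
# ['awesome', 'battery', 'lasts', 'hrs']
```

The punctuation, the emoji, and the number are gone, and only lowercase words remain.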

In the code snippet below, we remove the stop words from the reviews and display the remaining words using a word cloud.

# Reading the stop words into a set so that whole words are matched
# (assuming stop.txt lists the stop words separated by newlines or spaces)
with open("/content/stop.txt","r") as sw:
    stopwords = set(sw.read().split())
# Keeping only the words that are not stop words
mac_reviews_words = [w for w in mac_reviews_words if w not in stopwords]
mac_rev_string = " ".join(mac_reviews_words)
#creating word cloud for all words
wordcloud_mac = WordCloud(
                      background_color='black',
                      width=1800,
                      height=1400
                     ).generate(mac_rev_string)
plt.imshow(wordcloud_mac)
plt.axis("off")  # hiding the axes around the image
plt.show()

In the word cloud output, words like good, read, and laptop appear in a larger size, which shows that they occur more often in the MacBook Air reviews. We can also see highlighted words like performance, battery, delivery, and laptop, but from these alone we can’t conclude how well the battery works or how the laptop performs. To get insights from this output, we need to divide it into a positive and a negative word cloud.
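
Since the size of a word in the cloud follows how often it occurs, the same information can be read off as plain counts. A minimal sketch using collections.Counter from the standard library; the function name top_words and the sample data are made up for illustration. Note that checking membership against a set compares whole words, whereas `in` on a plain string performs a substring check.

```python
from collections import Counter

def top_words(words, stop_words, n=3):
    """Return the `n` most frequent words that are not stop words."""
    stop_set = set(stop_words)  # a set matches whole words only
    kept = [w for w in words if w not in stop_set]
    return Counter(kept).most_common(n)

# Made-up cleaned words and stop words, for illustration only
words = ["the", "battery", "is", "good", "the", "laptop", "is", "good", "good"]
stop_words = ["the", "is"]
print(top_words(words, stop_words))
# [('good', 3), ('battery', 1), ('laptop', 1)]
```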

In the code snippet below, we extract the positive words from the product reviews.

with open("/content/positive-words.txt","r") as pos:
  poswords = pos.read().split("\n")
  poswords = poswords[36:]  # skipping the first 36 lines, presumably the file header

# Choosing only the words which are present in poswords
mac_pos_in_pos = " ".join([w for w in mac_reviews_words if w in poswords])
wordcloud_pos_in_pos = WordCloud(
                      background_color='black',
                      width=1800,
                      height=1400
                     ).generate(mac_pos_in_pos)
plt.imshow(wordcloud_pos_in_pos)

#here we get the word cloud of all positive words in the reviews

Through this positive word cloud, we can get insights such as: the product is good, smooth, fast, and awesome; users recommend it to others; it is portable; and it is a beautiful product. These are the positive insights for the MacBook Air.
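
The positive word list above is read by skipping a fixed number of lines at the top of the file. A sketch of an alternative that skips header lines by their content instead, assuming the header lines start with ";" (an assumption about the file format that the article does not state). The function name parse_lexicon and the sample lines are hypothetical:

```python
def parse_lexicon(lines, comment_prefix=";"):
    """Return the set of words in `lines`, skipping blank lines and comment lines."""
    words = set()
    for line in lines:
        entry = line.strip()
        # Skip empty lines and header lines (assumed to start with `comment_prefix`)
        if not entry or entry.startswith(comment_prefix):
            continue
        words.add(entry.lower())
    return words

# Made-up lexicon content and review words, for illustration only
sample_lines = ["; header comment", ";", "", "awesome", "good", "smooth"]
poswords = parse_lexicon(sample_lines)
review_words = ["good", "laptop", "smooth", "good"]
print([w for w in review_words if w in poswords])
# ['good', 'smooth', 'good']
```

Returning a set also makes the `w in poswords` check fast when the review text is long.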

In the code snippet below, we extract the negative words from the product reviews.

with open("/content/negative-words.txt","r",encoding = "ISO-8859-1") as neg:
  negwords = neg.read().split("\n")
  negwords = negwords[37:]  # skipping the first 37 lines, presumably the file header

# negative word cloud
# Choosing only the words which are present in negwords
mac_neg_in_neg = " ".join([w for w in mac_reviews_words if w in negwords])

wordcloud_neg_in_neg = WordCloud(
                      background_color='black',
                      width=1800,
                      height=1400
                     ).generate(mac_neg_in_neg)
plt.imshow(wordcloud_neg_in_neg)

#here we get the word cloud of the most repeated negative words in the reviews

Through this negative word cloud, we can see that some reviewers found the product laggy and slow, experienced crashes and other issues, and considered it expensive or even pathetic.
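
The two word clouds can also be compared numerically by counting how often positive and negative words occur. A minimal sketch under the assumption that the cleaned words and the two word lists are already available; the function name sentiment_counts and all sample data are made up for illustration:

```python
from collections import Counter

def sentiment_counts(words, positive, negative):
    """Count how often each positive and each negative word occurs in `words`."""
    pos_counts = Counter(w for w in words if w in positive)
    neg_counts = Counter(w for w in words if w in negative)
    return pos_counts, neg_counts

# Made-up data, for illustration only
words = ["good", "slow", "fast", "good", "expensive", "laptop"]
positive = {"good", "fast", "smooth"}
negative = {"slow", "expensive", "lag"}

pos_counts, neg_counts = sentiment_counts(words, positive, negative)
print(sum(pos_counts.values()), pos_counts.most_common(1))  # 3 [('good', 2)]
print(sum(neg_counts.values()), neg_counts.most_common(1))  # 2 [('slow', 1)]
```

A count like this only says which words appear and how often; it does not account for negation such as "not good".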

Conclusion

By analyzing the product reviews using text mining, we gathered the most frequent positive and negative words and displayed them as word clouds. We can conclude that text mining offers insights into customer sentiment and can help companies address the problems customers raise. This gives them an opportunity to improve the overall customer experience, which can in turn improve profits.

Prudhvi varma

AI enthusiast, currently working with Analytics India Magazine. I have experience working with machine learning, real-time deep learning problems, neural networks, and structuring machine learning projects. I am a computer vision researcher and I am interested in solving real-time computer vision problems.