
My first Web Scraping project – Analyzing Flipkart Product Reviews using Text Mining


E-commerce websites generate large amounts of textual data. Companies hire data science professionals to refine this unstructured data and extract insights that help them understand their end users better. For example, by analyzing product reviews, Flipkart can learn what customers think of a product, and Netflix can find out how much users like its content. Analysis at this scale is hard to imagine without text analytics.

Topics we cover in this article:

  • How to extract product reviews from the Flipkart website
  • Preprocessing of the extracted reviews
  • Extracting and analyzing positive reviews
  • Extracting and analyzing negative reviews

In this article, we will extract the reviews of the MacBook Air laptop from the Flipkart website and perform text analysis on them.

Hands-on implementation of Flipkart review scraping

#Importing required libraries
import requests   
from bs4 import BeautifulSoup as bs 
import re 
import nltk
import matplotlib.pyplot as plt
from wordcloud import WordCloud
import os

Extracting reviews from Flipkart for MacBook Air

Here we are going to extract the reviews of the MacBook Air laptop from its Flipkart review pages, requesting one page of reviews at a time.

# Scraping reviews using BeautifulSoup
macbook_reviews = []
base_url = "https://www.flipkart.com/apple-macbook-air-core-i5-5th-gen-8-gb-128-gb-ssd-mac-os-sierra-mqd32hn-a-a1466/product-reviews/itmevcpqqhf6azn3?pid=COMEVCPQBXBDFJ8C&page="
for page in range(1, 30):  # review pages 1 to 29
    response = requests.get(base_url + str(page))
    # Creating a soup object to iterate over the extracted content
    soup = bs(response.content, "html.parser")
    # Extracting the review text; attrs maps the attribute name to the value
    # to match. The class name comes from Flipkart's page markup and may change.
    reviews = soup.find_all("div", attrs={"class": "qwjRop"})
    # A separate loop variable avoids shadowing the page counter
    for review in reviews:
        macbook_reviews.append(review.text)

# Saving the extracted data
with open("macbook.txt", "w", encoding="utf8") as output:
    output.write(str(macbook_reviews))
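
The loop above always requests a fixed number of pages. As a minimal sketch, the same pagination idea can be written so that it stops as soon as a page returns no reviews. The names `collect_reviews`, `fetch_page`, `extract` and `fake_site` are hypothetical and used only for illustration; in a real run, `fetch_page` would wrap the `requests.get` call and `extract` would wrap the BeautifulSoup lookup. Here both are replaced by stand-ins so the sketch runs without network access.

```python
from typing import Callable, Iterable, List


def collect_reviews(
    fetch_page: Callable[[int], str],
    extract: Callable[[str], Iterable[str]],
    max_pages: int = 29,
) -> List[str]:
    """Walk review pages 1..max_pages and stop at the first empty page."""
    collected: List[str] = []
    for page in range(1, max_pages + 1):
        html = fetch_page(page)
        # Keep only non-empty review texts, stripped of surrounding whitespace
        page_reviews = [text.strip() for text in extract(html) if text.strip()]
        if not page_reviews:
            break  # no reviews found: past the last page, or the markup changed
        collected.extend(page_reviews)
    return collected


# Stand-in "website" (hypothetical data): page number -> raw page content
fake_site = {1: "Great laptop|Fast delivery", 2: "Battery is good", 3: ""}

reviews = collect_reviews(
    fetch_page=lambda page: fake_site.get(page, ""),
    extract=lambda html: html.split("|"),
)
print(reviews)
```

Passing the fetching and parsing steps in as functions also makes the loop easy to check on sample data before pointing it at the live site.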

So far, we have extracted the reviews from the website and stored them in a file named macbook.txt.
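
Writing the list with `str()` makes it awkward to load back later. As an alternative sketch (not the approach used in this article), the reviews can be saved as JSON, which restores them as a Python list. The file name `macbook_reviews.json` and the sample reviews are assumptions made for illustration.

```python
import json
import os
import tempfile

# Hypothetical sample standing in for the scraped macbook_reviews list
sample_reviews = ["Great laptop, very fast", "Battery life is good"]

# Writing to a temporary folder so the sketch does not touch real files
path = os.path.join(tempfile.mkdtemp(), "macbook_reviews.json")

# ensure_ascii=False keeps emojis and other non-ASCII characters readable
with open(path, "w", encoding="utf8") as output:
    json.dump(sample_reviews, output, ensure_ascii=False)

# Reading the file back gives a list of strings again
with open(path, "r", encoding="utf8") as source:
    restored = json.load(source)

print(restored == sample_reviews)
```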

Preprocessing 

The extracted product reviews contain unwanted elements such as extra spaces, capital letters, symbols, and emojis. We do not want these in the text analysis, so the preprocessing step cleans the data by removing them.

os.getcwd()
os.chdir("/content/chider")

# Joining all the reviews into a single paragraph
mac_rev_string = " ".join(macbook_reviews)

# Keeping only letters and lowercasing the text; this also drops digits,
# symbols and emojis, so no separate digit-removal step is needed
mac_rev_string = re.sub("[^A-Za-z]+", " ", mac_rev_string).lower()

# Splitting the text into individual words; split() with no argument
# also discards the empty strings left by leading or trailing spaces
mac_reviews_words = mac_rev_string.split()

# Removing the stop words (done in the next step with a stop-word file)
# stop_words = stopwords('english')

In the code snippet below, we remove the stop words from the reviews and display the remaining words as a word cloud.

# Reading the stop words into a set for exact-word lookups
# (assuming stop.txt lists one word per line)
with open("/content/stop.txt", "r") as sw:
    stopwords = set(sw.read().split())

# Keeping only the words that are not stop words
mac_reviews_words = [w for w in mac_reviews_words if w not in stopwords]
mac_rev_string = " ".join(mac_reviews_words)

# Creating a word cloud for all words
wordcloud_mac = WordCloud(
                      background_color='black',
                      width=1800,
                      height=1400
                     ).generate(mac_rev_string)
plt.imshow(wordcloud_mac)
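
One detail of stop-word filtering deserves a closer look. In Python, testing a word against one long string is a substring check, so a word like "gain" is dropped just because the stop word "again" contains it. Testing against a set of words compares whole words only. The short sketch below shows the difference; the helper name `remove_stop_words` and the tiny stop list are assumptions for illustration.

```python
def remove_stop_words(words, stop_words):
    """Return the words that are not in stop_words, using exact matches."""
    stop_set = set(stop_words)
    return [w for w in words if w not in stop_set]


# Hypothetical stop-word file content, one word per line
stop_text = "a\nagain\nis\nthe\nthis\n"
words = ["this", "is", "a", "gain", "laptop"]

# Substring check against the raw text: "gain" is wrongly removed
substring_result = [w for w in words if w not in stop_text]

# Exact-word check against the split list: "gain" is kept
set_result = remove_stop_words(words, stop_text.split())

print(substring_result)  # ['laptop']
print(set_result)        # ['gain', 'laptop']
```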

In the word cloud output, words such as good, read, and laptop appear in a larger size, which shows that they occur more often in the MacBook Air reviews. The cloud also highlights words such as performance, battery, and delivery, but from these alone we cannot tell how long the battery lasts or how well the laptop performs. To get clearer insights, we split the words into a positive word cloud and a negative word cloud.
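
The size of a word in the cloud reflects how often it occurs. To see the actual numbers behind the picture, the words can be counted with `collections.Counter` from the standard library. This is a small sketch on a made-up word list, not on the scraped reviews.

```python
from collections import Counter

# Hypothetical cleaned words standing in for mac_reviews_words
words = "good laptop good battery good performance battery laptop good battery".split()

# Counting how many times each word occurs
word_counts = Counter(words)

# The three most frequent words with their counts
top_words = word_counts.most_common(3)
print(top_words)  # [('good', 4), ('battery', 3), ('laptop', 2)]
```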

In the code snippet below, we extract the positive words from the product reviews.

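# Reading the list of positive words, one per line
with open("/content/positive-words.txt","r") as pos:
  poswords = pos.read().split("\n")
  # Skipping the first 36 lines of the file, before the word list starts
  poswords = poswords[36:]

# Keeping only the review words that are present in poswords
mac_pos_in_pos = " ".join ([w for w in mac_reviews_words if w in poswords])
wordcloud_pos_in_pos = WordCloud(
                      background_color='black',
                      width=1800,
                      height=1400
                     ).generate(mac_pos_in_pos)
plt.imshow(wordcloud_pos_in_pos)

# Here we get the word cloud of all positive words in the reviews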

From this positive word cloud, we can draw insights such as: the product is good, smooth, fast, and awesome; reviewers recommend it to others; and it is portable and beautiful. These are the positive insights about the MacBook Air.
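
Skipping a fixed number of lines works only as long as the word-list file keeps exactly that layout. A possible alternative, sketched below, is to ignore blank lines and comment lines while reading. It assumes that the non-word lines at the top of the file start with ";", which should be checked against the actual file. The helper name `load_lexicon` and the sample content are hypothetical.

```python
import io


def load_lexicon(handle):
    """Read a word list, skipping blank lines and lines starting with ';'."""
    words = set()
    for line in handle:
        entry = line.strip()
        if entry and not entry.startswith(";"):
            words.add(entry)
    return words


# In-memory stand-in for a file such as positive-words.txt (made-up content)
sample_file = io.StringIO("; header comment\n;\n\nawesome\nfast\nsmooth\n")
poswords = load_lexicon(sample_file)

# Keeping only the review words found in the positive word list
review_words = ["laptop", "fast", "awesome", "slow", "fast"]
positive_hits = [w for w in review_words if w in poswords]
print(positive_hits)  # ['fast', 'awesome', 'fast']
```

Storing the words in a set also makes each lookup faster than searching a list, which helps when the reviews contain many words.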

In the code snippet below, we extract the negative words from the product reviews.

# Reading the list of negative words, one per line
with open("/content/negative-words.txt","r",encoding = "ISO-8859-1") as neg:
  negwords = neg.read().split("\n")
  # Skipping the first 37 lines of the file, before the word list starts
  negwords = negwords[37:]

# Negative word cloud
# Keeping only the review words that are present in negwords
mac_neg_in_neg = " ".join ([w for w in mac_reviews_words if w in negwords])

wordcloud_neg_in_neg = WordCloud(
                      background_color='black',
                      width=1800,
                      height=1400
                     ).generate(mac_neg_in_neg)
plt.imshow(wordcloud_neg_in_neg)

# Here we get the word cloud of the most repeated negative words

From this negative word cloud, we can see that some reviewers found the product laggy, slow, prone to crashing, expensive, or even pathetic, and that some reported issues with it.
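
The two word clouds show which positive and negative words occur, but not how they compare in number. As a rough sketch, the matches against each word list can be tallied to get a simple count for each side. The function name `sentiment_tally` and the sample data are assumptions for illustration, and a raw count like this ignores context such as negation ("not good").

```python
from collections import Counter


def sentiment_tally(words, positive, negative):
    """Count how many words appear in the positive and negative word lists."""
    counts = Counter()
    for w in words:
        if w in positive:
            counts["positive"] += 1
        elif w in negative:
            counts["negative"] += 1
    return counts


# Hypothetical cleaned review words and tiny word lists
review_words = ["good", "fast", "slow", "laptop", "awesome", "lag"]
positive = {"good", "fast", "awesome"}
negative = {"slow", "lag"}

tally = sentiment_tally(review_words, positive, negative)
print(tally["positive"], tally["negative"])  # 3 2
```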

Conclusion

By analyzing the product reviews using text mining, we gathered the most frequent positive and negative words and displayed them as word clouds. Text mining gives insight into customer sentiment and can help companies address the problems that customers raise. This offers an opportunity to improve the overall customer experience, which in turn can improve profits.



Prudhvi varma

AI enthusiast, Currently working with Analytics India Magazine. I have experience of working with Machine learning, Deep learning real-time problems, Neural networks, structuring and machine learning projects. I am a Computer Vision researcher and I am Interested in solving real-time computer vision problems.
