Last updated October 17, 2020

My first Web Scraping project – Analyzing Flipkart Product Reviews using Text Mining


Published on July 28, 2020

by Prudhvi varma

E-commerce websites generate large amounts of textual data. These firms hire data science professionals to refine this unstructured data and extract insights that help them understand their end users better. For example, by analyzing product reviews, Flipkart can learn what customers think of a product, and Netflix can find out how much users like its content. This kind of analysis is hard to imagine without text analytics.

Topics we cover in this article:

How to Extract Product reviews from Flipkart website
Preprocessing of the Extracted reviews
Extracting and Analyzing Positive reviews
Extracting and Analyzing Negative reviews

In this article, we will extract the reviews of the MacBook Air laptop from the Flipkart website and perform text analysis on them.

Hands-on implementation of Flipkart review scraping

#Importing required libraries
import requests   
from bs4 import BeautifulSoup as bs 
import re 
import nltk
import matplotlib.pyplot as plt
from wordcloud import WordCloud
import os

Extracting reviews from Flipkart for MacBook Air

Here we are going to extract reviews of the MacBook Air laptop from its product review URL on Flipkart, one page at a time.
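Before running the scraper against the live site, the extraction step can be tried on a small piece of HTML. The sketch below is only an illustration: SAMPLE_HTML and the helper extract_reviews are hypothetical, and the class name qwjRop is simply the one used in this article, which may not match the markup Flipkart serves today.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for one downloaded review page.
SAMPLE_HTML = """
<html><body>
  <div class="qwjRop"><div>Great battery life.</div></div>
  <div class="other">Not a review</div>
  <div class="qwjRop"><div>Too expensive!</div></div>
</body></html>
"""

def extract_reviews(html, css_class="qwjRop"):
    """Return the text of every <div> whose class matches css_class."""
    soup = BeautifulSoup(html, "html.parser")
    # attrs is a dict mapping the attribute name to the value to match
    blocks = soup.find_all("div", attrs={"class": css_class})
    return [block.get_text(strip=True) for block in blocks]

sample_reviews = extract_reviews(SAMPLE_HTML)
print(sample_reviews)
```

If the function returns an empty list on a real page, the class name has most likely changed and needs to be looked up again in the browser's inspector.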






#Scraping reviews using BeautifulSoup
macbook_reviews=[]
for page in range(1,30):  # review pages 1 to 29
  mac=[]
  url="https://www.flipkart.com/apple-macbook-air-core-i5-5th-gen-8-gb-128-gb-ssd-mac-os-sierra-mqd32hn-a-a1466/product-reviews/itmevcpqqhf6azn3?pid=COMEVCPQBXBDFJ8C&page="+str(page)
  response = requests.get(url)
  soup = bs(response.content,"html.parser")# creating soup object to iterate over the extracted content
  # Extracting the content under specific tags; attrs must be a dict of attribute name to value
  reviews = soup.find_all("div",attrs={"class":"qwjRop"})
  for review in reviews:
    mac.append(review.text)
  macbook_reviews=macbook_reviews+mac
#here we are saving the extracted data
with open("macbook.txt","w",encoding='utf8') as output:
    output.write(str(macbook_reviews))

So far, we have extracted the reviews from the website and stored them in a file named macbook.txt.
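The snippet above writes the Python list to the file as a single string, which is awkward to read back later. One possible alternative, shown here only as a sketch, is to store the list as JSON so it can be reloaded as a list. The helpers save_reviews and load_reviews, the demo data, and the demo file name are all hypothetical.

```python
import json
import os
import tempfile

def save_reviews(reviews, path):
    """Write a list of review strings to path as JSON."""
    with open(path, "w", encoding="utf8") as f:
        # ensure_ascii=False keeps symbols and emojis readable in the file
        json.dump(reviews, f, ensure_ascii=False)

def load_reviews(path):
    """Read the list of review strings back from path."""
    with open(path, "r", encoding="utf8") as f:
        return json.load(f)

# Hypothetical demo data and file location
demo_reviews = ["Good laptop", "Worth every ₹, battery lasts 10 hours"]
demo_path = os.path.join(tempfile.gettempdir(), "macbook_demo.json")

save_reviews(demo_reviews, demo_path)
restored = load_reviews(demo_path)
print(restored == demo_reviews)
```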

Preprocessing

The extracted product reviews contain unwanted characters such as extra spaces, capital letters, symbols, and emojis. We don’t want these in the text analysis, so in preprocessing we clean the data by removing them and converting the text to lowercase.

os.getcwd()
os.chdir("/content/chider")  

# Joining all the reviews into a single paragraph
mac_rev_string = " ".join(macbook_reviews)

# Removing everything except letters (symbols, digits, emojis) and converting to lowercase
mac_rev_string = re.sub("[^A-Za-z]+"," ",mac_rev_string).lower()

# Splitting the text into individual words; split() also drops empty strings
mac_reviews_words = mac_rev_string.split()

# The stop words are removed in the next step
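The cleaning steps above can also be wrapped in a small function and checked on a single review. This is a minimal sketch of the same idea; the function name clean_text and the sample sentence are made up for illustration.

```python
import re

def clean_text(text):
    """Lowercase the text and keep only letters, returning a list of words."""
    # Every run of non-letter characters (digits, symbols, emojis) becomes one space
    letters_only = re.sub("[^A-Za-z]+", " ", text)
    # split() without arguments discards leading, trailing and repeated spaces
    return letters_only.lower().split()

demo_tokens = clean_text("Awesome Laptop!! 10/10, battery lasts 12hrs :)")
print(demo_tokens)
# ['awesome', 'laptop', 'battery', 'lasts', 'hrs']
```

Note that digits are dropped together with the symbols, so "12hrs" becomes "hrs".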

In the code snippet below, we remove the stop words from the reviews and display the remaining words as a word cloud.

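# Reading the stop words into a list (assuming the file holds whitespace-separated words)
with open("/content/stop.txt","r") as sw:
    stopwords = sw.read().split()
# Keeping only the words that are not stop words.
# Checking against a list compares whole words; checking against the raw
# file string would also drop any word that is a substring of the file text.
mac_reviews_words = [w for w in mac_reviews_words if w not in stopwords]
mac_rev_string = " ".join(mac_reviews_words)
#creating word cloud for all words
wordcloud_mac = WordCloud(
                      background_color='black',
                      width=1800,
                      height=1400
                     ).generate(mac_rev_string)
plt.imshow(wordcloud_mac)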

In the word cloud output, words like good, read, and laptop appear in a bigger size, which shows that they are repeated more often in the MacBook Air reviews. The word cloud also highlights words like performance, battery, and delivery, but from these alone we can’t conclude how well the battery works or how the laptop performs. To get insights from this output, we need to divide it into a positive and a negative word cloud.
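The size of a word in the cloud reflects how often it occurs, so the same information can be read as plain counts. The sketch below uses collections.Counter on a made-up word list; top_words, demo_words and demo_stop are hypothetical names and are not part of the scraped data.

```python
from collections import Counter

def top_words(words, stop_words, n=3):
    """Return the n most frequent words that are not stop words."""
    kept = [w for w in words if w and w not in stop_words]
    return Counter(kept).most_common(n)

# Hypothetical cleaned words and stop words
demo_words = "good laptop the battery is good and the laptop is fast good".split()
demo_stop = {"the", "is", "and"}

demo_top = top_words(demo_words, demo_stop)
print(demo_top)
```

Printing the counts next to the word cloud makes it easier to compare words whose sizes look similar.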

In the code snippet below, we extract the positive words from the product reviews.

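with open("/content/positive-words.txt","r") as pos:
  poswords = pos.read().split("\n")
  # skipping the first 36 lines, assumed here to be the file header rather than words
  poswords = poswords[36:]

# Choosing only the words which are present in poswords
mac_pos_in_pos = " ".join([w for w in mac_reviews_words if w in poswords])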
wordcloud_pos_in_pos = WordCloud(
                      background_color='black',
                      width=1800,
                      height=1400
                     ).generate(mac_pos_in_pos)
plt.imshow(wordcloud_pos_in_pos)

#here we get the word cloud of all positive words in the reviews

From this positive word cloud, we can gather insights such as: the product is good, smooth, fast, and awesome; it is portable and beautiful; and reviewers recommend it to others. These are the positive insights about the MacBook Air.
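The filtering step behind this word cloud keeps only the review words that also appear in a word list. A minimal sketch of that step is shown below with tiny made-up lists; words_in_lexicon and the demo data are hypothetical. Converting the word list to a set is a design choice that makes each membership check fast when the lists are large.

```python
def words_in_lexicon(words, lexicon):
    """Keep the words that appear in the lexicon, preserving order and repeats."""
    lexicon_set = set(lexicon)  # set lookups are faster than list lookups
    return [w for w in words if w in lexicon_set]

# Hypothetical cleaned review words and a tiny positive word list
demo_review_words = ["good", "laptop", "fast", "battery", "good", "slow"]
demo_positive = ["good", "fast", "smooth", "awesome"]

demo_matches = words_in_lexicon(demo_review_words, demo_positive)
print(" ".join(demo_matches))
# good fast good
```

Repeats are kept on purpose, because the word cloud needs them to size frequent words larger.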

In the code snippet below, we extract the negative words from the product reviews.

with open("/content/negative-words.txt","r",encoding = "ISO-8859-1") as neg:
  negwords = neg.read().split("\n")
  # skipping the first 37 lines, assumed here to be the file header rather than words
  negwords = negwords[37:]

# negative word cloud
# Choosing only the words which are present in negwords
mac_neg_in_neg = " ".join([w for w in mac_reviews_words if w in negwords])

wordcloud_neg_in_neg = WordCloud(
                      background_color='black',
                      width=1800,
                      height=1400
                     ).generate(mac_neg_in_neg)
plt.imshow(wordcloud_neg_in_neg)

#here we get the word cloud of the most repeated negative words

From this negative word cloud, we can see that reviewers describe the product with words like lag, slow, crashed, issues, expensive, and pathetic.
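The snippets above skip the top of each word-list file with fixed offsets (36 and 37 lines), which breaks if the file layout changes. A possible alternative is sketched below under one loud assumption: that the header lines start with a ";" character. This should be checked against the actual files before use. The helper load_lexicon and the demo file are hypothetical.

```python
import os
import tempfile

def load_lexicon(path, comment_prefix=";", encoding="ISO-8859-1"):
    """Read one word per line, skipping blank lines and assumed header lines."""
    words = set()
    with open(path, "r", encoding=encoding) as f:
        for line in f:
            line = line.strip()
            # Assumption: header/comment lines begin with comment_prefix
            if line and not line.startswith(comment_prefix):
                words.add(line)
    return words

# Hypothetical demo file that imitates a header followed by words
demo_lexicon_path = os.path.join(tempfile.gettempdir(), "demo-negative-words.txt")
with open(demo_lexicon_path, "w", encoding="ISO-8859-1") as f:
    f.write("; hypothetical header line\n;\n\nlag\nslow\ncrashed\n")

demo_lexicon = load_lexicon(demo_lexicon_path)
print(sorted(demo_lexicon))
# ['crashed', 'lag', 'slow']
```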

Conclusion

By analyzing the product reviews with text mining, we gathered the most frequent positive and negative words and displayed them as word clouds. We can conclude that text mining offers insight into customer sentiment and can help companies address the problems customers raise. This technique provides an opportunity to improve the overall customer experience, which in turn can increase profits.


Prudhvi varma

AI enthusiast currently working with Analytics India Magazine. I have experience working with machine learning, deep learning, real-time problems, neural networks, and structuring machine learning projects. I am a computer vision researcher interested in solving real-time computer vision problems.