Advanced text mining by extracting insights from various forums helps in addressing and analyzing customer feedback



The client, the world’s leading computer technology company, faced issues classifying customer complaints:

  • The client was manually classifying customer issues and complaints posted on the company’s many Tech Forums and recorded in call center agent transcripts
  • The client wanted to automate the classification process of textual data
  • The client wanted a comprehensive dictionary based on a thorough understanding of the entire dataset so that this dictionary could be used for standardizing other mediums through which customer queries can be classified such as email, speech, and social media

The company captured volumes of “conversations” across different customer feedback forums, as well as agent notes related to customer calls, and stored them in databases. A large team extracted samples of these “conversations”, coded them into one or more pre-defined code frames, and categorized them accordingly.



The engagement scope covered building a “text mining and codification engine” that applies Natural Language Processing to automatically categorize conversations into different buckets. This categorization should in turn help Microsoft analyse the text data objectively and quantitatively. The focus was on improving the accuracy of the categorization throughout the engagement lifecycle. The task was also to identify key metrics and formulate visualization capabilities for insights from unstructured (primarily textual) data.
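As a rough illustration of what a dictionary-driven codification engine does, the sketch below assigns a conversation to the bucket whose keywords it mentions most often. The category names, keywords, and the `classify` function are hypothetical and are not taken from the engagement; the real engine used a much larger dictionary and further NLP steps.

```python
import re
from collections import Counter

# Hypothetical dictionary: bucket name -> keywords (illustrative only).
CATEGORY_KEYWORDS = {
    "installation": {"install", "setup", "installer"},
    "performance": {"slow", "freeze", "lag"},
    "billing": {"refund", "invoice", "charge"},
}

def classify(text, dictionary=CATEGORY_KEYWORDS):
    """Return the bucket whose keywords occur most often, or None if none match."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for category, keywords in dictionary.items():
        counts[category] = sum(1 for token in tokens if token in keywords)
    best, hits = max(counts.items(), key=lambda item: item[1])
    return best if hits > 0 else None
```

Returning `None` when nothing matches keeps unclassifiable conversations visible, so they can be reviewed and used to extend the dictionary.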

Solution:

The overall solution had four major milestones as defined below.


Step 1: Data Access

BMI accessed the textual data for the 6 levels of classification over a secured connection and downloaded it to a secure location within BMI premises

Step 2: Data Loading

Merged and cleaned forum and agent transcript data to create 130,000 rows

The data received covered a breadth of topics and had 6 levels of classification for each row. Level 5 had the highest number of distinct nodes at 4,277, followed by Level 3 at 616 nodes
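A profile like the one above (distinct nodes per classification level) can be computed in a few lines. This is a minimal sketch using a tiny made-up sample; the column names `level_1` to `level_3` and the sample rows are assumptions, not the client's schema or data.

```python
import csv
import io

# Hypothetical sample standing in for the merged forum and agent rows.
SAMPLE = """text,level_1,level_2,level_3
Cannot install update,Software,Updates,Install failure
Update stuck at 30 percent,Software,Updates,Download stalled
Laptop will not charge,Hardware,Power,Battery
"""

def distinct_nodes_per_level(rows, level_columns):
    """Count distinct non-empty labels in each classification column."""
    seen = {column: set() for column in level_columns}
    for row in rows:
        for column in level_columns:
            label = (row.get(column) or "").strip()
            if label:
                seen[column].add(label)
    return {column: len(labels) for column, labels in seen.items()}

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
counts = distinct_nodes_per_level(rows, ["level_1", "level_2", "level_3"])
```

Levels with thousands of distinct nodes are the ones where duplicate or overlapping categories are most likely to be hiding.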

Step 3: Classification Design

Evaluated TAXIS at 3 levels

Used comprehensive Natural Language Processing techniques to ensure most categories are captured:

  • Verb replacement
  • Lemmatization
  • Normalization

Used a stemming approach to reduce inflected verb forms

Enhanced the results further by combining the earlier techniques, for example Basic and NLP-refined
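To make the normalization and stemming steps concrete, here is a minimal sketch that chains them. The suffix list and function names are hypothetical; the article does not say which stemmer or lemmatizer was used, and a real project would more likely rely on an established NLP library than on this naive suffix stripping.

```python
import re

# Hypothetical suffix list for a deliberately naive stemmer (assumption).
SUFFIXES = ("ing", "ed", "es", "s")

def normalize(text):
    """Lowercase, replace non-alphanumeric characters, collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()

def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Combine the steps: normalize first, then stem each token."""
    return [stem(token) for token in normalize(text).split()]
```

Running normalization before stemming means that variants such as "Crashed," and "crashing" reduce to the same token, so a single dictionary entry can match both.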

Converted multi-class records into a single class by applying a basic technique as well as a weightage technique
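The article does not define the basic or the weightage technique, so the sketch below shows one plausible reading of each and should be treated as an assumption: the basic rule keeps the first label assigned, while the weighted rule sums a score per label and keeps the highest total. Function names and scores are hypothetical.

```python
from collections import defaultdict

def basic_single_class(labels):
    """Assumed 'basic' rule: keep the first label assigned to the record."""
    return labels[0] if labels else None

def weighted_single_class(label_scores):
    """Assumed 'weightage' rule: sum scores per label and keep the highest.

    label_scores is a list of (label, score) pairs; on a tie the label
    seen first wins, because max() returns the first maximal key.
    """
    totals = defaultdict(float)
    for label, score in label_scores:
        totals[label] += score
    if not totals:
        return None
    return max(totals, key=totals.get)
```

The two rules can disagree on the same record, which is one reason to compare both against the hand-coded labels before choosing one.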

Step 4: Development Accuracy

Development set was fed into the classification engine (internal benchmark set at 80%)


Step 5: Validation Accuracy

Validation set was fed into the classification engine (internal benchmark set at 70%)
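Checking a labeled set against an internal benchmark comes down to comparing predicted and hand-coded labels. A minimal sketch, using the 80% and 70% benchmarks quoted for the development and validation sets; the function names are illustrative, not from the engagement.

```python
DEV_BENCHMARK = 0.80         # development benchmark quoted in the article
VALIDATION_BENCHMARK = 0.70  # validation benchmark quoted in the article

def accuracy(predicted, actual):
    """Fraction of records whose predicted label equals the hand-coded label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        return 0.0
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

def meets_benchmark(predicted, actual, benchmark):
    """True when accuracy reaches or exceeds the given benchmark."""
    return accuracy(predicted, actual) >= benchmark
```

Tracking this number per iteration and per classification level shows where the dictionary or the preprocessing needs more work.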

Step 6: Reporting

Reports were designed, as per the parameters specified in the reporting framework

Step 7: QA Design and Process

Tested using unseen or unlabeled data to check whether the auto-coded results generalize and can be used to accurately classify new data points
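Testing on unseen data only works if some records are kept away from the engine while it is being tuned. The sketch below shows one common way to do that with a seeded shuffle; the 60/20/20 split and the function name are assumptions, since the article does not state how the sets were drawn.

```python
import random

def split_records(records, dev_fraction=0.6, val_fraction=0.2, seed=0):
    """Shuffle reproducibly, then cut into development, validation, and held-out sets.

    The fractions are illustrative assumptions, not figures from the article.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    dev_end = round(len(shuffled) * dev_fraction)
    val_end = dev_end + round(len(shuffled) * val_fraction)
    return shuffled[:dev_end], shuffled[dev_end:val_end], shuffled[val_end:]
```

Fixing the seed keeps the held-out set stable across iterations, so an accuracy gain cannot come from a lucky reshuffle.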

Outcome:

The text mining framework was able to classify the texts automatically. There were five levels of classification, which helped ensure that the result was accurate. This enabled the company to make quick, informed decisions based on customer feedback.

  • Achieved accuracy close to 60% in the 1st iteration
  • Team built a comprehensive dictionary based on a thorough understanding of the entire dataset
  • The technology major has been able to identify the major points of concern for its customers
  • Time and cost for identification of customer issues have been reduced, as classified information from unofficial forums and other mediums is now readily available
  • Identified overlapping categories and organized under one category to represent the data
  • Identified duplicate categories and removed the additional categories
  • Overall, helped the client revisit their categorization process and established appropriate categories
  • Provided a roadmap to enable classification of issues on general blogs and forums
  • In the future, this process can be executed in a similar manner to potentially cover data from other sources like email, speech and other forms of social media
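The category clean-up mentioned in the outcomes (removing duplicates and folding overlapping categories into one) can be sketched as follows. The matching rule and the merge map are hypothetical; in practice, deciding which categories overlap is a judgment made with the client, not something this code infers.

```python
import re

def canonical(name):
    """Normalize a category name so trivially different spellings match."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()

def deduplicate(categories):
    """Drop categories whose canonical form was already seen, keeping order."""
    seen, kept = set(), []
    for name in categories:
        key = canonical(name)
        if key not in seen:
            seen.add(key)
            kept.append(name)
    return kept

def merge_overlapping(categories, merge_map):
    """Map overlapping categories onto one target name, then deduplicate.

    merge_map is a hand-built dict: canonical source name -> target name.
    """
    return deduplicate([merge_map.get(canonical(c), c) for c in categories])
```

For example, `merge_overlapping(["Slow startup", "Slow boot"], {"slow boot": "Slow startup"})` leaves a single "Slow startup" category.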


