The client, the world’s leading computer technology company, faced issues classifying customer complaints:

The company captured large volumes of “conversations” across different customer feedback forums, as well as agent notes from customer calls, all stored in databases. A large team extracted samples of these conversations, coded each one into one or more predefined code frames, and categorized them accordingly.

The engagement scope called for building a “text mining and codification engine” that applied Natural Language Processing to automatically sort conversations into different buckets. This categorization would in turn help Microsoft analyse the text data objectively and quantitatively. The focus was on improving categorization accuracy throughout the engagement lifecycle. The task also included identifying key metrics and building visualization capabilities to surface insights from unstructured (primarily textual) data.


The overall solution was delivered through the steps defined below.


Step 1: Data Access

BMI accessed the textual data for the 6 levels of classification over a secured connection and downloaded it to a secure location on BMI premises.

Step 2: Data Loading

Merged and cleaned the forum and agent transcript data into a dataset of 130,000 rows.

The data covered a broad range of topics, and each row carried 6 levels of classification. Level 5 had the highest number of distinct nodes (4,277), followed by Level 3 (616).
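The case study does not describe how the merge, cleaning, and node counts were implemented. As a minimal sketch, assuming each record is a dict with a `text` field and labels stored under hypothetical keys `level_1` to `level_6`, the loading step could tag each record with its source, normalize whitespace and case, drop empty and duplicate texts, and then count distinct nodes per level. The helper names `clean_text`, `merge_sources`, and `distinct_nodes_per_level` are illustrative, not taken from the engagement.

```python
import re
from collections import defaultdict

def clean_text(text):
    """Collapse runs of whitespace, trim, and lower-case a text."""
    return re.sub(r"\s+", " ", text).strip().lower()

def merge_sources(forum_rows, agent_rows):
    """Merge forum and agent records (hypothetical 'text' field), tagging each
    with its source and dropping empty texts and duplicates after cleaning."""
    merged, seen = [], set()
    for source, rows in (("forum", forum_rows), ("agent", agent_rows)):
        for row in rows:
            text = clean_text(row.get("text", ""))
            if not text or text in seen:
                continue
            seen.add(text)
            merged.append({**row, "text": text, "source": source})
    return merged

def distinct_nodes_per_level(rows, num_levels=6):
    """Count distinct labels per level, assuming hypothetical keys 'level_1'..'level_N'."""
    nodes = defaultdict(set)
    for row in rows:
        for level in range(1, num_levels + 1):
            label = row.get(f"level_{level}")
            if label:
                nodes[level].add(label)
    return {level: len(nodes[level]) for level in range(1, num_levels + 1)}

if __name__ == "__main__":
    forum = [{"text": "  Screen   flickers ", "level_1": "Hardware", "level_2": "Display"}]
    agent = [
        {"text": "screen flickers", "level_1": "Hardware", "level_2": "Display"},
        {"text": "Battery drains fast", "level_1": "Hardware", "level_2": "Battery"},
        {"text": "Cannot log in", "level_1": "Software", "level_2": "Login"},
    ]
    rows = merge_sources(forum, agent)
    print(len(rows))  # 3: the duplicate "screen flickers" is dropped
    print(distinct_nodes_per_level(rows, num_levels=2))  # {1: 2, 2: 3}
```

Sorting the returned dict by value would rank the levels by node count, which is how a figure such as “Level 5 has the most nodes, followed by Level 3” can be read off.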

Step 3: Classification Design

Evaluated TAXIS at 3 levels

Used a range of Natural Language Processing techniques to capture as many categories as possible:

Used stemming to reduce inflected word forms (for example, verb forms) to a common stem

Enhanced the results further by combining the earlier techniques, for example the basic and NLP-refined approaches

Converted multi-class records into single-class records by applying both a basic technique and a weighting technique
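The stemming and multi-class-to-single-class steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the engine BMI built: it uses a deliberately crude toy suffix stripper (not the Porter stemmer), a hypothetical keyword-based code frame `CODE_FRAME` with made-up weights, a “basic” pass that returns every matching category (possibly several), and a weighting pass that keeps only the highest-scoring category.

```python
import re
from collections import Counter

# Toy suffix stripper for illustration only; it is NOT the Porter stemmer.
SUFFIXES = ("ing", "ed", "es", "s")

def stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Hypothetical code frame: category -> {keyword: weight}.
CODE_FRAME = {
    "Billing": {"refund": 2.0, "bill": 1.0},
    "Installation": {"install": 1.0, "download": 1.0},
    "Stability": {"crash": 2.0, "hang": 1.5},
}

def tokens(text):
    """Lower-case, split into alphabetic words, and stem each word."""
    return [stem(w) for w in re.findall(r"[a-z]+", text.lower())]

def basic_labels(text):
    """Basic pass: every category with at least one keyword match (may be multi-class)."""
    stems = set(tokens(text))
    return sorted(cat for cat, kws in CODE_FRAME.items()
                  if stems & {stem(k) for k in kws})

def weighted_label(text):
    """Weighting pass: score = sum of keyword weight times occurrences; keep the
    top category (ties broken alphabetically). Returns None if nothing matches."""
    counts = Counter(tokens(text))
    scores = {}
    for cat, kws in CODE_FRAME.items():
        score = sum(w * counts[stem(k)] for k, w in kws.items())
        if score > 0:
            scores[cat] = score
    if not scores:
        return None
    return min(scores, key=lambda cat: (-scores[cat], cat))

if __name__ == "__main__":
    text = "The app crashed while installing, and it crashes again after I downloaded it"
    print(basic_labels(text))    # ['Installation', 'Stability']
    print(weighted_label(text))  # Stability (score 4.0 vs 2.0)
```

Stemming lets “crashed” and “crashes” count as the same keyword, which is what pushes the weighted score toward Stability. A production system would use a proper stemmer or lemmatizer and weights learned from the coded data rather than hand-set values.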

Step 4: Development Accuracy

The development set was fed into the classification engine (internal accuracy benchmark: 80%).

Step 5: Validation Accuracy

The validation set was fed into the classification engine (internal accuracy benchmark: 70%).
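The case study gives the two internal benchmarks but not the exact metric. Assuming plain exact-match accuracy against the reference codes, checking a run against the 80% development and 70% validation thresholds might look like the sketch below; `accuracy`, `check_benchmark`, and the `BENCHMARKS` mapping are hypothetical names.

```python
# Internal accuracy benchmarks quoted in the case study.
BENCHMARKS = {"development": 0.80, "validation": 0.70}

def accuracy(predicted, actual):
    """Share of records whose predicted label matches the reference label."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

def check_benchmark(split, predicted, actual):
    """Return (accuracy, passed) for the 'development' or 'validation' split."""
    acc = accuracy(predicted, actual)
    return acc, acc >= BENCHMARKS[split]

if __name__ == "__main__":
    print(check_benchmark("development",
                          ["A", "B", "A", "C", "A"],
                          ["A", "B", "A", "C", "B"]))  # (0.8, True)
    print(check_benchmark("validation",
                          ["A", "B", "C", "C"],
                          ["A", "B", "A", "B"]))       # (0.5, False)
```

A lower bar for the validation set than for the development set is typical, since the engine is tuned on the development data and is expected to score somewhat lower on data it has not been tuned on.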

Step 6: Reporting

Reports were designed according to the parameters specified in the reporting framework.

Step 7: QA Design and Process

Tested the engine on unseen, unlabeled data to check whether the auto-coded results generalize and can accurately classify new data points.
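Because unlabeled data has no reference codes, one common way to run this kind of QA is to have reviewers check a random sample of the auto-coded records and measure how often they agree with the engine. The source does not say how the QA was performed, so the sketch below is only one plausible approach; `qa_sample`, `agreement_rate`, and the fixed `seed` are hypothetical.

```python
import random

def qa_sample(auto_coded, sample_size, seed=42):
    """Draw a reproducible random sample of auto-coded records for manual review."""
    rng = random.Random(seed)
    return rng.sample(auto_coded, min(sample_size, len(auto_coded)))

def agreement_rate(reviews):
    """reviews: (auto_label, reviewer_label) pairs; returns the share that agree."""
    if not reviews:
        raise ValueError("no reviewed records")
    return sum(auto == human for auto, human in reviews) / len(reviews)

if __name__ == "__main__":
    records = [f"conversation-{i}" for i in range(1000)]
    batch = qa_sample(records, 20)
    print(len(batch))  # 20
    print(agreement_rate([("A", "A"), ("B", "A"), ("C", "C"), ("D", "D")]))  # 0.75
```

Fixing the seed makes the review batch reproducible, so the same sample can be re-checked after the engine is retuned.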


The text mining framework classified the texts automatically across five levels of classification. This enabled the company to make quick, informed decisions based on customer feedback.
