- The client was manually classifying customer issues and complaints posted on the company’s many tech forums and recorded in call center agent transcripts
- The client wanted to automate the classification process of textual data
- The client wanted a comprehensive dictionary, grounded in a thorough understanding of the entire dataset, that could be used to standardize the classification of customer queries arriving through other mediums such as email, speech, and social media
The company captured volumes of “conversations” across different customer feedback forums, as well as agent notes related to customer calls, and stored them in databases. A large team extracted samples of these “conversations” and coded them into one or more pre-defined code frames, categorizing them accordingly.
The engagement scope was to build a “text mining and codification engine” that applies Natural Language Processing to automatically categorize conversations into different buckets. This categorization should in turn help Microsoft analyse the text data objectively and quantitatively. The focus was to improve the accuracy of the categorization throughout the engagement lifecycle. The task was also to identify key metrics and formulate visualization capabilities for insights from unstructured (primarily textual) data.
The overall solution had four major milestones as defined below.
Step 1: Data Access
BMI accessed textual data for the 6 levels of classification using a secured connection and downloaded it onto a secure location within BMI premises
Step 2: Data Loading
Merged and cleaned forum and agent transcript data to create 130,000 rows
The data received covered a breadth of topics and had 6 levels of classification for each row. Level 5 had the highest number of distinct nodes at 4,277, followed by Level 3 at 616 nodes
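As an illustration of how node counts per classification level might be profiled, here is a minimal Python sketch. The column names (`level_1`, `level_2`) and the toy rows are hypothetical and are not taken from the engagement data; the actual tooling used is not described in this case study.

```python
from collections import defaultdict


def distinct_nodes_per_level(rows, levels):
    """Return {level_name: count of distinct non-empty labels} across rows."""
    seen = defaultdict(set)
    for row in rows:
        for level in levels:
            # Normalize case and whitespace so trivially different
            # spellings of the same label are counted once.
            label = (row.get(level) or "").strip().lower()
            if label:
                seen[level].add(label)
    return {level: len(seen[level]) for level in levels}


# Hypothetical toy rows standing in for merged forum and agent-transcript data.
rows = [
    {"text": "Charged twice this month", "level_1": "Billing", "level_2": "Refund"},
    {"text": "Invoice shows wrong amount", "level_1": "Billing", "level_2": "Invoice"},
    {"text": "Setup fails at 90 percent", "level_1": "Setup", "level_2": "Install error"},
    {"text": "Need my money back", "level_1": "billing ", "level_2": "refund"},
]

node_counts = distinct_nodes_per_level(rows, ["level_1", "level_2"])
print(node_counts)
```

A profile like this makes it easy to see which levels are the most fragmented before any classifier is designed.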
Step 3: Classification Design
Evaluated TAXIS at 3 levels
Used comprehensive Natural Language Processing techniques to ensure most categories are captured:
- Verb replacement
Used a stemming approach to reduce inflected verb forms to a common stem
Enhanced the results further by combining the earlier techniques, for example the basic and NLP-refined approaches
Converted multi-class records into a single class by applying a basic technique as well as a weighting technique
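The stemming and weighting ideas in this step can be sketched roughly as follows. This is only an illustrative Python sketch under stated assumptions: the suffix-stripping function is a crude stand-in for a real stemmer, and the keyword dictionary, category names, and weights are all hypothetical rather than the dictionary built during the engagement.

```python
from collections import Counter

# Crude suffix list for illustration only; a production system would
# use an established stemmer instead of this stand-in.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip one common inflection suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical dictionary: stemmed keyword -> (category, weight).
KEYWORDS = {
    "crash": ("Stability", 2.0),
    "freez": ("Stability", 1.5),
    "install": ("Setup", 1.0),
    "download": ("Setup", 1.0),
    "refund": ("Billing", 2.0),
    "bill": ("Billing", 1.0),
}


def classify(text):
    """Score every matching category, then keep the highest-weighted one.

    A record that matches several categories (multi-class) is reduced to
    a single class by summed keyword weight. Returns None if nothing matches.
    """
    scores = Counter()
    for token in text.lower().split():
        stem = crude_stem(token.strip(".,!?"))
        if stem in KEYWORDS:
            category, weight = KEYWORDS[stem]
            scores[category] += weight
    if not scores:
        return None
    return max(scores, key=scores.get)


print(classify("The app crashed twice after installing the update"))
```

In this sketch the example sentence matches both a stability keyword and a setup keyword, and the weighting resolves it to the single category with the larger total.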
Step 4: Development Accuracy
Development set was fed into the classification engine (internal benchmark set at 80%)
Step 5: Validation Accuracy
Validation set was fed into the classification engine (internal benchmark set at 70%)
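The accuracy checks in Steps 4 and 5 amount to comparing predicted categories with the manually coded ones and testing the result against the internal benchmark. A minimal Python sketch follows; the function names and the toy labels are illustrative assumptions, while the 80% and 70% thresholds come from the steps above.

```python
# Internal benchmarks stated for the development and validation sets.
BENCHMARKS = {"development": 0.80, "validation": 0.70}


def accuracy(predicted, actual):
    """Fraction of records whose predicted category equals the coded one."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        return 0.0
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def meets_benchmark(split, predicted, actual):
    """True if accuracy on the given split reaches its internal benchmark."""
    return accuracy(predicted, actual) >= BENCHMARKS[split]


# Hypothetical toy labels: three of four predictions are correct.
predicted = ["Billing", "Setup", "Stability", "Setup"]
actual = ["Billing", "Setup", "Stability", "Billing"]

print(accuracy(predicted, actual))
print(meets_benchmark("development", predicted, actual))
print(meets_benchmark("validation", predicted, actual))
```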
Step 6: Reporting
Reports were designed, as per the parameters specified in the reporting framework
Step 7: QA Design and Process
Tested using unseen or unlabeled data to check whether the auto-coded results generalize and can be used to accurately classify data points
The text mining framework was able to classify the texts automatically. There were five levels of classification to help ensure that the result was accurate. This enabled the company to make quick, informed decisions based on customer feedback.
- Achieved accuracy close to 60% in the first iteration
- Team built a comprehensive dictionary based on a thorough understanding of the entire dataset
- The technology major has been able to identify the major points of concern for its customers
- Time and cost for identification of customer issues have been reduced, as classified information from unofficial forums and mediums is now readily available
- Identified overlapping categories and organized them under one category to represent the data
- Identified duplicate categories and removed the additional categories
- Overall, helped the client revisit their categorization process and establish appropriate categories
- Provided a roadmap to enable classification of issues on general blogs and forums
- In the future, this process can be executed in a similar manner to potentially cover data from other sources like email, speech and other forms of social media