In The Wake Of BigBasket Data Breach, Strong Data Classification Policy Is The Need Of The Hour

BigBasket Data Breach

Top online grocer BigBasket recently suffered a massive data breach that left the data of 20 million users exposed. According to the cybersecurity firm Cyble, the breach occurred on October 14 and was made public on November 7. The firm also claimed that users’ private information, including full names, email addresses, dates of birth, and the IP addresses of user devices, was compromised and put up for sale on the dark web for $40,000.

BigBasket joins the ranks of organisations/websites such as PayTM, Dr.Reddy’s Laboratories, IRCTC, Bharat Matrimony, and PM Narendra Modi’s, which have suffered major data breaches in the last few months.

A report by IBM found that, across the 17 countries surveyed, a data breach cost $3.92 million on average. The damage extends well beyond financial loss: a breach can expose millions, in some cases billions, of private records and sensitive data, affecting not only the organisation but also the individuals whose information is stolen. Organisations therefore need strong data protection techniques in place, and one of the major techniques is data classification.

Data classification is the process of organising data into relevant categories so that it can be used and protected efficiently. It supports data security, compliance, and risk management. Experts advise companies to invest in a strong data classification policy to protect their data from breaches.

Data Classification 

A data classification policy helps an enterprise understand and prioritise data by classifying it based on its importance and sensitivity. Without defined policies to categorise company data systematically and continuously, organisations often fail to understand where their sensitive data resides, which leaves it highly susceptible to breaches.

The policy maps out the organisation’s various data components and classifies them according to storage and permission rights. It also takes into account any specific classification levels or categories required by industry regulation or adopted as part of the organisation’s strategy.

There are broadly three types of industry-standard approaches for classification: 

  • Content-based classification: Here, the data is inspected and interpreted to look for sensitive information using methods such as document fingerprinting and regular expressions.
  • Context-based classification: This type assesses the application, location, and creator, among other variables, as indirect indicators of sensitive information.
  • User-based classification: This approach relies on the knowledge of the users who create, edit, review, and disseminate data to flag sensitive information. Unlike content- and context-based approaches, which can be automated, user-based classification is largely a manual process.
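
As a rough illustration of the content-based approach, the sketch below scans text with regular expressions for a few kinds of sensitive values. The pattern names, the patterns themselves, and the `find_sensitive` function are hypothetical and deliberately simplified; a production tool would need validated, locale-aware rules and would typically combine them with fingerprinting.

```python
import re

# Hypothetical, simplified patterns for illustration only.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    # 13 to 16 digits, optionally separated by spaces or hyphens.
    "card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def find_sensitive(text):
    """Return the sorted names of all patterns that match somewhere in the text."""
    return sorted(name for name, pattern in PATTERNS.items() if pattern.search(text))

print(find_sensitive("Contact asha@example.com from 10.0.0.5"))
print(find_sensitive("Weekly offers on vegetables"))
```

A document with any match would then be a candidate for a stricter classification level, while a document with no matches would still need context-based or user-based review before being treated as public.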

Using the above three approaches, the data is then classified into one of the following types:

  • Public: This kind of data can be openly shared on the company’s website and be discussed in public with anyone. It does not require any additional control when used.
  • Internal: Internal information needs to be protected with limited controls. This data may include the employee handbook, policies, and company-wide memos. Even if disclosed, internal information has minimal impact on the business.
  • Confidential: This type of information must be contained within the business. It may include pricing, marketing materials, and contact information. Any disclosure on this front may tarnish the company image.
  • Restricted: This includes highly sensitive information, and its use needs to be limited on a need-to-know basis. Sometimes, it is even protected under a non-disclosure agreement (NDA). It includes data such as personally identifiable information (PII), cardholder data, and health information.
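
One way to picture the four levels is as an ordered scale, where a document that mixes several kinds of data takes the strictest level among them. The sketch below assumes exactly that rule. The category names, the `LEVEL_BY_CATEGORY` mapping, and the choice to treat unrecognised data as Internal until reviewed are illustrative assumptions, not part of any standard.

```python
from enum import IntEnum

class Level(IntEnum):
    """The four levels described above, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3

# Hypothetical mapping, loosely following the examples given for each level.
LEVEL_BY_CATEGORY = {
    "website_copy": Level.PUBLIC,
    "company_memo": Level.INTERNAL,
    "pricing": Level.CONFIDENTIAL,
    "pii": Level.RESTRICTED,
    "cardholder_data": Level.RESTRICTED,
    "health_information": Level.RESTRICTED,
}

# Assumption: anything unrecognised is treated as Internal until someone reviews it.
DEFAULT_LEVEL = Level.INTERNAL

def classify(categories):
    """Return the strictest level among the given categories."""
    levels = [LEVEL_BY_CATEGORY.get(c, DEFAULT_LEVEL) for c in categories]
    return max(levels, default=DEFAULT_LEVEL)

print(classify(["pricing", "pii"]).name)
```

Because `IntEnum` members compare as integers, `max` picks the most sensitive level without any extra ordering logic.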

Automated Data Classification

As the amount of data generated and stored grows, manual data classification becomes highly difficult. The manual method is also prone to inaccuracy, inconsistency, inflexibility, and outright failure.

In view of these shortcomings, many companies favour automated data classification processes, or hybrid processes that combine manual and automated classification.

Automated data classification methods use machine learning to train the software on a number of file types and taxonomies. These tools modify the metadata of files, such as PDFs, to label them according to the level of classification. Automation also allows for faster and more accurate data extraction, yielding clear and actionable data, and it helps reduce costs. Some tools get smarter as they process more files, learning the organisation’s classification and routing rules. Popular automated data classification products include SoftWorks AI’s Trapeze Classification Module, Boldon James, PKWARE, and SealPath.
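
To make the labelling and hybrid ideas concrete, here is a minimal sketch in which an automated label is recorded for a file unless a human reviewer supplies a different one. It does not reflect how any of the products named above work. Instead of editing a file's embedded metadata, it writes the label to a sidecar JSON file, a simpler stand-in chosen for illustration, and all function names are hypothetical.

```python
import json
from pathlib import Path

def _sidecar(path):
    """Hypothetical convention: the label lives next to the file as <name>.label.json."""
    return Path(str(path) + ".label.json")

def write_label(path, level, source):
    """Record the classification level and whether it came from a tool or a person."""
    sidecar = _sidecar(path)
    sidecar.write_text(json.dumps({"level": level, "source": source}))
    return sidecar

def read_label(path):
    """Return the stored label as a dict, or None if the file has not been labelled."""
    sidecar = _sidecar(path)
    return json.loads(sidecar.read_text()) if sidecar.exists() else None

def label_document(path, auto_level, reviewer_level=None):
    """Hybrid flow: keep the automated label unless a reviewer provides one."""
    if reviewer_level is not None:
        return write_label(path, reviewer_level, "manual")
    return write_label(path, auto_level, "automated")
```

Recording the source alongside the level keeps an audit trail of which labels were set by software and which were confirmed or corrected by a person.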

Wrapping Up

Companies can derive multiple benefits from adopting a data classification policy. It helps establish the location, access level, integrity, and security level required for a particular piece of data. It is an effective way of protecting data because it sorts data into public, internal, confidential, and restricted information and lets the organisation determine which security measures to invest in for each category.

Shraddha Goled
I am a technology journalist with AIM. I write stories focused on the AI landscape in India and around the world with a special interest in analysing its long term impact on individuals and societies. Reach out to me at
