
In The Wake Of BigBasket Data Breach, Strong Data Classification Policy Is The Need Of The Hour



Top online grocer BigBasket recently suffered a massive data breach that exposed the data of 20 million users. According to the cybersecurity firm Cyble, the breach occurred on October 14 and was made public on November 7. The firm also claimed that users' private information, including full names, email addresses, dates of birth, and the IP addresses of user devices, was compromised and put up for sale on the dark web for $40,000.

BigBasket joins the ranks of organisations and websites such as PayTM, Dr. Reddy’s Laboratories, IRCTC, Bharat Matrimony, and PM Narendra Modi’s narendramodi.in, all of which have suffered major data breaches in the last few months.

A report by IBM found that, across the 17 countries surveyed, a data breach cost $3.92 million on average. The damage extends well beyond financial loss: a breach can expose millions, and in some cases billions, of private records and sensitive data, affecting not only the organisation but also the individuals whose information was stolen. Organisations must therefore have strong data protection techniques in place, and one of the most important is data classification.

Data classification is the process of organising data into relevant categories so that it can be used and protected efficiently. It supports data security, compliance, and risk management. Experts advise companies to invest in a strong data classification policy to protect their data from breaches.

Data Classification 

A data classification policy helps an enterprise understand and prioritise data by classifying it based on its importance and sensitivity. Without defined policies to systematically and continuously categorise company data, organisations often fail to understand where their sensitive data sits, which leaves it highly susceptible to breaches.

The policy maps out a variety of data components in an organisation and classifies them according to storage and permission rights. It also takes into consideration any specific classification levels or categories required by industry regulation or strategy.

There are broadly three types of industry-standard approaches for classification: 

  • Content-based classification: Here, the data is inspected and interpreted to look for sensitive information using methods such as document fingerprinting and regular expressions.
  • Context-based classification: This type assesses the application, location, and creator, among other variables, as indirect indicators of sensitive information.
  • User-based classification: This approach relies on user knowledge for creation, editing, reviewing, and dissemination for flagging sensitive information. Unlike content- and context-based approaches which can be automated, user-based is largely a manual classification process.
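As a rough illustration of the first two approaches, the sketch below scans text with regular expressions (content-based) and checks a file's location and creator (context-based). The patterns, folder names, and function names are assumptions made for this example; a real policy would need vetted, locale-specific rules.

```python
import re
from pathlib import PurePosixPath

# Hypothetical patterns for illustration only.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "date_of_birth": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}

# Hypothetical folder and department names treated as sensitive.
SENSITIVE_CONTEXTS = {"hr", "finance", "payroll"}


def content_hits(text):
    """Content-based: return every pattern name that matches, with its matches."""
    hits = {}
    for name, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[name] = matches
    return hits


def context_flag(path, creator_department):
    """Context-based: flag a file by where it lives or who created it."""
    parts = {part.lower() for part in PurePosixPath(path).parts}
    return bool(parts & SENSITIVE_CONTEXTS) or creator_department.lower() in SENSITIVE_CONTEXTS


if __name__ == "__main__":
    print(content_hits("Contact asha@example.com from 10.0.0.5"))
    print(context_flag("/shared/HR/salaries.csv", "Engineering"))
```

The content check reads the data itself, while the context check never opens the file. This is why the two are often combined, with user-based review reserved for cases neither can settle.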

Using the three classification approaches above, the data is then classified into one of the following types:

  • Public: This kind of data can be openly shared on the company’s website and be discussed in public with anyone. It does not require any additional control when used.
  • Internal: Internal information needs to be protected with limited controls. This data may include the employee handbook, policies, and company-wide memos. Even if disclosed, internal information has minimal impact on the business.
  • Confidential: This type of information must be contained within the business. It may include pricing, marketing materials, and contact information. Any disclosure on this front may tarnish the company’s image.
  • Restricted: This includes highly sensitive information, and its use needs to be limited on a need-to-know basis. Sometimes, it is even protected under a non-disclosure agreement (NDA). It includes data such as personally identifiable information (PII), cardholder data, and health information.
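One way to picture these four levels is as an ordered scale, where a document that contains several kinds of data takes the strictest level among them. The sketch below assumes a hypothetical mapping from data kinds to levels, and it falls back to Internal for anything unrecognised, which is a conservative choice made for this example rather than a rule from any standard.

```python
from enum import IntEnum


class Level(IntEnum):
    """The four levels, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical mapping from a detected data kind to its level.
KIND_TO_LEVEL = {
    "press_release": Level.PUBLIC,
    "employee_handbook": Level.INTERNAL,
    "pricing": Level.CONFIDENTIAL,
    "cardholder_data": Level.RESTRICTED,
    "health_information": Level.RESTRICTED,
}


def classify(kinds):
    """Return the strictest level among the data kinds found in a document.

    Unknown kinds, and documents with no detected kinds, default to INTERNAL
    (an assumption for this sketch).
    """
    levels = [KIND_TO_LEVEL.get(kind, Level.INTERNAL) for kind in kinds]
    return max(levels, default=Level.INTERNAL)


if __name__ == "__main__":
    print(classify(["pricing", "cardholder_data"]).name)
```

Because the levels are ordered, a policy check such as "is this document Confidential or above?" becomes a simple comparison.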

Automated Data Classification

With the growing amount of data generated and stored, manual data classification becomes highly difficult. The manual method also brings challenges of its own, such as inaccuracy, inconsistency, and inflexibility, and it is prone to failure.

In view of these shortcomings, many companies favour automated data classification or hybrid processes that combine manual and automated classification.
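A hybrid process can be sketched as a routing step: labels the automated classifier is confident about are accepted, and the rest are queued for a person to review. The `Prediction` record, the `route` function, and the 0.8 threshold are all assumptions made for illustration, not features of any particular product.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    document: str
    label: str
    confidence: float  # 0.0 to 1.0, as reported by a hypothetical automated classifier


REVIEW_THRESHOLD = 0.8  # assumed cut-off; a real policy would tune this


def route(predictions, threshold=REVIEW_THRESHOLD):
    """Split predictions into auto-accepted ones and ones needing manual review."""
    accepted, needs_review = [], []
    for prediction in predictions:
        if prediction.confidence >= threshold:
            accepted.append(prediction)
        else:
            needs_review.append(prediction)
    return accepted, needs_review


if __name__ == "__main__":
    batch = [
        Prediction("handbook.pdf", "internal", 0.95),
        Prediction("scan_0042.pdf", "restricted", 0.55),
    ]
    accepted, needs_review = route(batch)
    print([p.document for p in accepted], [p.document for p in needs_review])
```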

Automated data classification methods use machine learning to train the software on a number of file types and taxonomies. These tools modify the metadata of files such as PDFs to label them according to their level of classification. They also allow faster and more accurate data extraction, producing clear and actionable data. Some tools improve as they process more files by learning the organisation’s classification and routing rules, and automation also helps reduce costs. Popular automated data classification products include SoftWorks AI’s Trapeze Classification Module, Boldon James, PKWARE, and SealPath.
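To give a feel for how software can learn labels from examples, here is a minimal sketch of a naive Bayes text classifier with add-one smoothing, written with the standard library only. It is a toy stand-in chosen for this example and does not describe how any of the products named above work; the training samples are invented.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """Count words per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocabulary = set()
    for text, label in samples:
        words = text.lower().split()
        word_counts[label].update(words)
        label_counts[label] += 1
        vocabulary.update(words)
    return word_counts, label_counts, vocabulary


def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    # Invented training data for illustration.
    model = train([
        ("quarterly pricing sheet for partners", "confidential"),
        ("pricing and discount schedule", "confidential"),
        ("press release product launch", "public"),
        ("public blog post announcement", "public"),
    ])
    print(predict(model, "new pricing schedule"))
```

Feeding the model more labelled files changes the word counts, which is the simplest sense in which such a tool "gets smarter" as it processes more documents.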

Wrapping Up

By adopting a data classification policy, companies can derive multiple benefits. It helps in determining the location, access level, integrity, and security level required for a particular piece of data. It is among the most effective ways of protecting data, as it sorts data into public, internal, confidential, and restricted information and lets the organisation decide which security measures to invest in for each category.


Shraddha Goled

I am a technology journalist with AIM. I write stories focused on the AI landscape in India and around the world with a special interest in analysing its long term impact on individuals and societies. Reach out to me at shraddha.goled@analyticsindiamag.com.