Top online grocer BigBasket recently suffered a massive data breach that exposed the data of 20 million users. According to the cybersecurity firm Cyble, the breach occurred on October 14 and was made public on November 7. The firm also claimed that users' private information, such as full names, email addresses, dates of birth, and the IP addresses of their devices, had been compromised and put up for sale on the dark web for $40,000.
BigBasket joins the ranks of organisations and websites such as Paytm, Dr. Reddy's Laboratories, IRCTC, Bharat Matrimony, and PM Narendra Modi's narendramodi.in, all of which have suffered major data breaches in the last few months.
A report by IBM found that, across the 17 countries surveyed, the average cost of a data breach was $3.92 million. The damage extends well beyond direct financial losses: a breach can expose millions, and in some cases billions, of private records and sensitive data, affecting not only the organisation but also the individuals whose information may have been stolen. Organisations therefore need strong data protection techniques in place, and one of the most important of these is data classification.
Data classification is the process of organising data into relevant categories so that it can be used and protected efficiently. It supports data security, compliance, and risk management, and experts advise companies to invest in a strong data classification policy to protect their data from breaches.
A data classification policy helps an enterprise understand and prioritise its data by classifying it according to its importance and sensitivity. Without defined policies for categorising company data systematically and continuously, organisations often lose track of where their sensitive data resides, leaving it highly susceptible to breaches.
Such a policy maps out the data held across the various components of an organisation and classifies it according to storage and permission rights. It also takes into account any specific classification levels or categories mandated by industry regulations or adopted as part of the organisation's strategy.
There are broadly three industry-standard approaches to classification:
- Content-based classification: The data itself is inspected and interpreted to find sensitive information, using methods such as document fingerprinting and regular expressions.
- Context-based classification: This approach uses the application, storage location, and creator of the data, among other variables, as indirect indicators of sensitive information.
- User-based classification: This approach relies on the knowledge of the users who create, edit, review, and distribute data to flag sensitive information. Unlike the content- and context-based approaches, which can be automated, user-based classification is a largely manual process.
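The content- and context-based approaches lend themselves to automation. The following Python sketch is a minimal, hypothetical illustration, not the method of any particular product: it flags email addresses with a regular expression, flags card numbers with a regular expression plus a Luhn checksum to cut false positives, matches exact fingerprints (SHA-256 hashes of normalised text) of documents already known to be sensitive, and applies simple context rules on file path, application, and creator role. The patterns, the `SENSITIVE_*` rule tables, and all function names are assumptions made for illustration.

```python
import hashlib
import re

# Illustrative detection patterns (assumption); commercial tools ship far larger rule sets.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")

# Hypothetical context rules; a real deployment would load these from its policy.
SENSITIVE_PATH_PARTS = ("/finance/", "/hr/")
SENSITIVE_APPLICATIONS = {"payroll", "crm"}
SENSITIVE_CREATOR_ROLES = {"hr_manager", "finance_controller"}


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def fingerprint(text: str) -> str:
    """Exact-match fingerprint: SHA-256 of lower-cased, whitespace-normalised text."""
    normalised = " ".join(text.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def content_findings(text: str, known_fingerprints: set) -> set:
    """Content-based: inspect the text itself for sensitive information."""
    findings = set()
    if EMAIL_RE.search(text):
        findings.add("email")
    if any(luhn_valid(m.group()) for m in CARD_RE.finditer(text)):
        findings.add("card_number")
    if fingerprint(text) in known_fingerprints:
        findings.add("fingerprint_match")
    return findings


def context_findings(path: str, application: str, creator_role: str) -> set:
    """Context-based: use indirect indicators instead of reading the content."""
    findings = set()
    if any(part in path.lower() for part in SENSITIVE_PATH_PARTS):
        findings.add("sensitive_location")
    if application.lower() in SENSITIVE_APPLICATIONS:
        findings.add("sensitive_application")
    if creator_role.lower() in SENSITIVE_CREATOR_ROLES:
        findings.add("sensitive_creator")
    return findings
```

In a fuller system the findings from both functions would feed the choice of classification level, and a user-based step would let a person confirm or override the automated result.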
Using these three approaches, data is then assigned to one of the following classification levels:
- Public: This data can be shared openly, for example on the company's website, and discussed with anyone. It requires no additional controls when used.
- Internal: This information needs to be protected, but only with limited controls. It may include the employee handbook, policies, and company-wide memos. If disclosed, internal information has minimal impact on the business.
- Confidential: This information must be kept within the business. It may include pricing, marketing materials, and contact information, and any disclosure may tarnish the company's image.
- Restricted: This is highly sensitive information whose use must be limited on a need-to-know basis; sometimes it is also protected by a non-disclosure agreement (NDA). It includes data such as personally identifiable information (PII), cardholder data, and health information.
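Because the four levels form an ordered scale, one convenient way to model them is as an ordered enumeration in which a document or dataset inherits the most sensitive level of anything it contains. The Python sketch below assumes this "highest level wins" rule. The data-type names follow the examples above, while the `CONTROLS` table and the choice of `INTERNAL` as the default for unrecognised data are hypothetical policy decisions, not an industry standard.

```python
from enum import IntEnum


class Level(IntEnum):
    """Classification levels, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Example data types drawn from the level descriptions above.
DATA_TYPE_LEVELS = {
    "website_content": Level.PUBLIC,
    "employee_handbook": Level.INTERNAL,
    "company_memo": Level.INTERNAL,
    "pricing": Level.CONFIDENTIAL,
    "marketing_materials": Level.CONFIDENTIAL,
    "contact_information": Level.CONFIDENTIAL,
    "pii": Level.RESTRICTED,
    "cardholder_data": Level.RESTRICTED,
    "health_information": Level.RESTRICTED,
}

# Hypothetical baseline controls per level; each organisation's policy defines its own.
CONTROLS = {
    Level.PUBLIC: frozenset(),
    Level.INTERNAL: frozenset({"authentication"}),
    Level.CONFIDENTIAL: frozenset({"authentication", "access_logging"}),
    Level.RESTRICTED: frozenset(
        {"authentication", "access_logging", "encryption_at_rest", "need_to_know_approval"}
    ),
}

# Assumption: unrecognised data is never treated as public by default.
UNKNOWN_DEFAULT = Level.INTERNAL


def classify(data_types) -> Level:
    """Return the most sensitive level among the data types present."""
    return max(
        (DATA_TYPE_LEVELS.get(t, UNKNOWN_DEFAULT) for t in data_types),
        default=Level.PUBLIC,
    )


def required_controls(level: Level) -> frozenset:
    """Look up the controls the (hypothetical) policy requires for a level."""
    return CONTROLS[level]
```

For instance, `classify(["employee_handbook", "pricing"])` returns `Level.CONFIDENTIAL`, and a spreadsheet holding both pricing and cardholder data would be classified as `Level.RESTRICTED` and would need every control in that level's set.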
Automated Data Classification
As the amount of data generated and stored keeps growing, manual data classification becomes increasingly difficult. The manual method also suffers from other problems, such as inaccuracy, inconsistency, inflexibility, and outright failure.
Given these shortcomings, many companies favour automated data classification or hybrid processes, which combine manual and automated classification.
Automated data classification tools use machine learning to train software on a range of file types and taxonomies. They then label files, including PDFs, by modifying their metadata to record the classification level. Automation also enables faster and more accurate data extraction, producing clear and actionable data. Some tools get smarter as they process more files, learning the organisation's classification and routing rules, and automation also helps reduce costs. Popular automated data classification products include SoftWorks AI's Trapeze Classification Module, Boldon James, PKWARE, and SealPath.
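Commercial tools do not publish their models, so the following is only a minimal Python sketch of the idea, assuming a simple multinomial naive Bayes text classifier as a stand-in for whatever machine learning a given product actually uses. It learns word frequencies from a few hand-labelled example documents, predicts a level together with a confidence score, and, reflecting the hybrid approach, sends low-confidence predictions to manual review. The training snippets, the 0.8 review threshold, and names such as `label_file` are hypothetical; writing the label into a file's or PDF's metadata is product-specific and is represented here by the returned record.

```python
import math
import re
from collections import Counter, defaultdict

REVIEW_THRESHOLD = 0.8  # hypothetical: below this confidence, a person decides


def tokenize(text: str) -> list:
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # documents seen per label
        self.word_counts = defaultdict(Counter)  # token counts per label
        self.vocab = set()

    def train(self, text: str, label: str) -> None:
        tokens = tokenize(text)
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokens)
        self.vocab.update(tokens)

    def predict(self, text: str):
        """Return (best_label, posterior probability of that label)."""
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)  # smoothed likelihood
            log_scores[label] = score
        best = max(log_scores, key=log_scores.get)
        # Normalise in log space to avoid underflow: P(best) = 1 / sum(exp(s - s_best)).
        z = sum(math.exp(s - log_scores[best]) for s in log_scores.values())
        return best, 1.0 / z


def label_file(clf: NaiveBayesClassifier, path: str, text: str) -> dict:
    """Hybrid step: auto-label confident predictions, queue the rest for a human."""
    label, confidence = clf.predict(text)
    return {
        "path": path,
        "classification": label,
        "confidence": round(confidence, 3),
        "needs_manual_review": confidence < REVIEW_THRESHOLD,
    }


def build_demo_classifier() -> NaiveBayesClassifier:
    """Toy training set (hypothetical); real tools train on far more data."""
    clf = NaiveBayesClassifier()
    examples = [
        ("patient health record with diagnosis", "restricted"),
        ("customer card number cvv and expiry date", "restricted"),
        ("employee handbook and leave policy", "internal"),
        ("company memo on office holiday schedule", "internal"),
        ("press release on new store launch", "public"),
        ("blog post with weekend recipe ideas", "public"),
    ]
    for text, label in examples:
        clf.train(text, label)
    return clf
```

Retraining on documents that reviewers have labelled manually is one simple way such a tool could become more accurate as it processes more files.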
By adopting a data classification policy, companies can derive multiple benefits. It helps determine where a particular piece of data is located, what access level it should have, and what level of integrity and security it requires. It is one of the most effective ways of protecting data, because categorising data into levels such as public, internal, confidential, and restricted lets the organisation decide which security measures to invest in for each category.