In recent years, the debates and developments surrounding data, privacy, and big tech have brought up the “right to be forgotten”, which essentially means that users should have the right to decide whether and when their personal data is made inaccessible or deleted. In the context of artificial intelligence, this means that machine learning models should be designed to forget and discard such information.
Companies like Amazon and Flipkart spend millions of dollars building recommendation engines that suggest products by tracking customer choices. An e-commerce website recommending products based on information you provide to it may feel like a fair bargain.
The idea of machine learning is essentially to feed chunks of information into a machine so that it can remember them and produce the results you want. This information, called data, is one of the biggest factors deciding the efficiency and accuracy of a model. But can machines unlearn the data they were trained on if a controversy arises?
Though this “duty to forget” is not currently being ranked alongside human rights, the idea still strikes one as meaningful. Researchers and scientists should delve into developing methods by which machines can not only learn but also “unlearn”, i.e. remove the influence of input data, and thus settle the debate about AI putting privacy at risk.
Machine unlearning can improve data privacy by allowing machine learning models to forget sensitive information that may have been included, even inadvertently, in their training data. This is particularly important in fields like healthcare, where personal information must be kept private.
Researchers have been trying to find an effective and efficient way to tackle this challenge and build a “machine unlearning” algorithm. In 2020, researchers from the University of Toronto and the University of Wisconsin proposed SISA (Sharded, Isolated, Sliced, and Aggregated) training, a method for removing specific users’ data from a trained model, primarily to address privacy concerns. But since then, little progress has been made in the field.
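The core SISA pattern can be sketched in a few lines: shard the training set, train one constituent model per shard in isolation, retrain only the affected shard when a record must be forgotten, and aggregate predictions by vote. The toy `CentroidModel` classifier below is an illustrative stand-in, not the paper’s actual models.

```python
# Minimal sketch of SISA-style unlearning (toy classifier, for illustration).
from statistics import mode

class CentroidModel:
    """Toy 1-D classifier: predicts the label whose mean feature is closest."""
    def fit(self, data):  # data: list of (x, label) pairs
        sums, counts = {}, {}
        for x, y in data:
            sums[y] = sums.get(y, 0.0) + x
            counts[y] = counts.get(y, 0) + 1
        self.centroids = {y: sums[y] / counts[y] for y in sums}
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda y: abs(self.centroids[y] - x))

class SISA:
    def __init__(self, data, n_shards):
        # Shard the training set and train one model per shard, in isolation.
        self.shards = [data[i::n_shards] for i in range(n_shards)]
        self.models = [CentroidModel().fit(s) for s in self.shards]

    def unlearn(self, point):
        # Retrain ONLY the shard that held the deleted record,
        # instead of retraining on the entire dataset.
        for i, shard in enumerate(self.shards):
            if point in shard:
                shard.remove(point)
                self.models[i] = CentroidModel().fit(shard)
                break

    def predict(self, x):
        # Aggregate the constituent models by majority vote.
        return mode(m.predict(x) for m in self.models)

data = [(0.1, "a"), (0.2, "a"), (0.9, "b"), (1.0, "b"), (0.15, "a"), (0.95, "b")]
ensemble = SISA(data, n_shards=3)
ensemble.unlearn((0.2, "a"))   # forget one user's record cheaply
print(ensemble.predict(0.12))  # prints: a
```

The point of the sharding is cost: forgetting one record retrains a model on one shard’s worth of data, not on the whole dataset.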
Why is it necessary?
We’ve all heard about the controversy that erupted when Facebook leaked 87 million users’ personal information for political advertising, resulting in a stream of lawsuits and users exiting the platform. In 2020, Facebook revealed the “clear history” button on its website, which was supposed to delete a user’s data; in practice, all it did was remove the user’s ability to check whether the data was still there. It is not easy for users to delete data once access to it has been granted to companies and their models.
However, removing the data that a machine has been trained on is not an easy task. The goal of “machine unlearning” is to remove the influence of specific training data without hurting the performance of the model. Beyond privacy, machine learning models are prone to biases that often stem from skewed data or from underfitting or overfitting, resulting in a system that performs poorly at test time. The developers then have to start from scratch, select or build another dataset, and train the model again, which is a cumbersome process.
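The retrain-from-scratch baseline that makes this so cumbersome can be sketched as follows. The `train` function here is a hypothetical stand-in for any training procedure; the point is that every surviving record must be reprocessed just to forget one.

```python
# Naive "unlearning": drop the record, then retrain on everything that remains.
# Cost scales with the size of the full dataset, not with what was deleted.

def train(data):
    """Stand-in for any training procedure: here, a per-label running mean."""
    totals = {}
    for x, y in data:
        s, n = totals.get(y, (0.0, 0))
        totals[y] = (s + x, n + 1)
    return {y: s / n for y, (s, n) in totals.items()}

def forget_and_retrain(data, record):
    remaining = [d for d in data if d != record]
    return remaining, train(remaining)  # every surviving point is reprocessed

data = [(1.0, "spam"), (3.0, "spam"), (10.0, "ham")]
data, model = forget_and_retrain(data, (3.0, "spam"))
print(model)  # {'spam': 1.0, 'ham': 10.0}
```

Approaches like SISA exist precisely to avoid paying this full-retrain cost on every deletion request.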
Machine unlearning is not a new topic, but it is territory that definitely needs more exploration. An important use case is removing unwanted data from a dataset to improve accuracy even further. For example, Amazon’s recruitment system, which was biased against women when scanning their profiles, had been trained on data from the largely male-dominated engineering field. Cleaning the data a model is trained on is thus essential to removing such bias, and this is where machine unlearning algorithms can greatly improve models without rebuilding them from scratch.
Though the question of whether the data these models were trained on remains stored on company servers is still unresolved, the algorithms the models are built on can certainly still access it.
The recent draft of India’s Digital Personal Data Protection Bill addresses the privacy of an individual’s data and requires organisations to delete data that is no longer needed; this can include the right to access, correct, or delete personal information. It pushes researchers in this field to innovate and figure out how to delete data from ML models, improving user privacy while making models perform better.
Machine unlearning has proven to be a hard challenge, and the approaches tested so far still require a lot of improvement. With increasing regulations, policies, and parameters for machine learning models, the ability of AI to unlearn is the need of the hour.