Researchers at IBM have developed a method that reduces the amount of personal data required to train machine learning models while preserving high levels of accuracy.
The General Data Protection Regulation (GDPR) and the California Privacy Rights Act (CPRA) require companies to meet data protection and data privacy obligations. However, companies find it difficult to determine the minimum amount of personal data needed to train their ML models. The aim of machine learning is to achieve the highest possible accuracy in predictions or classifications, regardless of the amount of data used.
‘Data minimisation’ is a core requirement of both the GDPR and the CPRA. The researchers claim the new method will reduce the amount of data needed to train machine learning models.
In the study, prediction accuracy never dropped below 33%, even after the entire dataset had been generalised and no traces of the original data remained. In some cases, the researchers achieved 100% accuracy with generalised data.
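To give a rough sense of what generalising data can mean, the Python sketch below replaces exact ages with the midpoint of a wider range and checks how often a simple rule still gives the same answer. It is an illustration only, not the IBM researchers' method: the function names, the age threshold, the sample ages and the choice to measure accuracy as agreement with predictions on the exact values are all assumptions made for this example.

```python
# Hypothetical illustration; none of these names or values come from the study.

def generalise(value, width):
    """Replace an exact number with the midpoint of its width-wide range."""
    lower = (value // width) * width
    return lower + width / 2


def predict(age):
    """Stand-in for a trained model: a fixed rule that flags ages 40 and over."""
    return age >= 40


def accuracy_after_generalisation(ages, width):
    """Share of records whose prediction is unchanged after generalising."""
    expected = [predict(a) for a in ages]  # predictions on exact data
    actual = [predict(generalise(a, width)) for a in ages]
    matches = sum(e == a for e, a in zip(expected, actual))
    return matches / len(ages)


# Made-up sample ages, assumed purely for demonstration.
AGES = [23, 37, 41, 44, 58, 62, 35, 49]

if __name__ == "__main__":
    for width in (10, 30, 50):
        print(width, accuracy_after_generalisation(AGES, width))
```

With these made-up ages, 10-year ranges leave every prediction unchanged, while 30-year and 50-year ranges reduce agreement to 0.75 and 0.625. The sketch shows only the general trade-off: coarser ranges reveal less about each individual but can cost accuracy.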
The researchers also claim the method will reduce dependency on large datasets and bring down data storage and management costs.