In a recent announcement, Google released an open-source version of the differential privacy library that it currently uses to power some of its core products, such as Google Maps. The library lets developers and organisations implement privacy features that are otherwise difficult to build from scratch, promising ease of use and deployment.
As the company explains, differential privacy strategically adds random noise to user information stored in databases so that companies can still analyse it without being able to single people out. Open-sourcing the library can, therefore, help other developers achieve the same level of differential privacy protection. The idea is to make it possible for companies to mine and analyse their database information without invasive identity profiles or tracking. The tech giant is hopeful that it can significantly help mitigate data breaches.
How It Works
Differentially private data analysis enables organisations to learn from their data while simultaneously ensuring that those results do not allow any individual’s data to be distinguished or re-identified. The goal of differential privacy for machine learning is to only “encode general patterns rather than facts about specific training examples.”
This allows user data to remain private while the overall system still learns from general behaviour. The library not only offers the equations and models needed to set boundaries and constraints on identifying data, but also includes an interface that makes it easier for more developers to actually implement the protections.
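To make the idea of "adding random noise" concrete, here is a minimal sketch of the classic Laplace mechanism applied to a counting query. It is an illustration only and does not use the API of Google's library: the function names, the sample data and the assumption that each person contributes at most one record are all hypothetical. Plain floating-point noise like this is also not hardened against known attacks, so it should not be treated as production-ready.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records that match `predicate`.

    Assumption: each person contributes at most one record, so adding or
    removing one person changes the true count by at most 1 (sensitivity 1).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.SystemRandom()
    true_count = sum(1 for r in records if predicate(r))
    # Smaller epsilon means a larger noise scale, i.e. more privacy, less accuracy.
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical data: the ages of ten users.
ages = [23, 35, 41, 29, 52, 61, 19, 33, 47, 38]
noisy = private_count(ages, lambda a: a >= 40, epsilon=0.5, rng=random.Random(0))
print(f"noisy count of users aged 40 or over: {noisy:.2f}")
```

The analyst sees only the noisy value, so the answer looks almost the same whether or not any single user is in the data set, while averages over many users stay useful.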
Google has been working on differential privacy alongside other privacy efforts such as Federated Learning and its Responsible AI Practices. It currently uses the technique to protect many types of information, such as location data generated by its Google Fi mobile customers.
The library's key features include:
- It provides statistical functions that support most common data science operations, such as computing counts, averages, medians, percentiles and more.
- It includes an extensible ‘Stochastic Differential Privacy Model Checker library’, in addition to an extensive test suite, to help prevent mistakes.
- Query engines are a major analytical tool for data scientists, and one of the most common ways for analysts to write queries is Structured Query Language (SQL). The release includes a PostgreSQL extension along with common recipes to help developers get started, making it ready to use.
- Google researchers have designed the library so that it can be extended with other functionalities such as additional mechanisms, aggregation functions or privacy budget management.
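The features above can be illustrated with a small sketch of a differentially private average combined with a simple privacy budget tracker. Everything here is an assumption made for illustration: `PrivacyBudget` and `private_bounded_mean` are hypothetical names rather than the library's API, each person is assumed to contribute one value, and the budget uses plain sequential composition (the epsilons of successive queries simply add up).

```python
import random


class PrivacyBudget:
    """Tracks the remaining epsilon under simple sequential composition."""

    def __init__(self, total_epsilon):
        self.remaining = total_epsilon

    def spend(self, epsilon):
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self.remaining:
            raise ValueError("privacy budget exhausted")
        self.remaining -= epsilon


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale)."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_bounded_mean(values, lower, upper, epsilon, budget, rng=None):
    """Noisy mean of `values` after clamping each one to [lower, upper]."""
    if not lower < upper:
        raise ValueError("lower must be smaller than upper")
    budget.spend(epsilon)
    rng = rng or random.SystemRandom()
    midpoint = (lower + upper) / 2.0
    # Clamping caps one person's influence on the centred sum at (upper - lower) / 2.
    centred = [min(max(v, lower), upper) - midpoint for v in values]
    half = epsilon / 2.0  # split the epsilon between the sum and the count
    noisy_sum = sum(centred) + laplace_noise((upper - lower) / 2.0 / half, rng)
    noisy_count = len(values) + laplace_noise(1.0 / half, rng)
    mean = midpoint + noisy_sum / max(1.0, noisy_count)
    # Post-processing (clamping the result) does not weaken the guarantee.
    return min(max(mean, lower), upper)


# Hypothetical data and budget.
ages = [23, 35, 41, 29, 52, 61, 19, 33, 47, 38]
budget = PrivacyBudget(total_epsilon=1.0)
mean_age = private_bounded_mean(ages, 0, 100, epsilon=0.5, budget=budget,
                                rng=random.Random(0))
print(f"noisy mean age: {mean_age:.1f}, remaining epsilon: {budget.remaining}")
```

Requiring explicit bounds is the key design choice: without clamping, a single extreme value could shift the average arbitrarily, and no finite amount of noise would hide it.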
8 Ways It Can Help Developers
- It will allow developers to build their own tools on top of the library, aggregating and analysing personal data without revealing personally identifiable information, either inside or outside their companies, and without compromising the privacy of the people whose data they are working with.
- It will put strong privacy protections in place, helping organisations make the most of their data while maintaining citizen trust.
- It will add to Google's existing privacy offerings such as TensorFlow Privacy and TensorFlow Federated, which it announced last year.
- TensorFlow Federated is an open-source framework that implements an approach called Federated Learning and allows experimentation with machine learning and other computations on decentralised data. TensorFlow Privacy, on the other hand, is an open-source library that allows developers to train machine-learning models with privacy.
- Its flexibility will ensure that it is applicable to as many database features and products as possible.
- Differential privacy is complicated, and a differentially private system is difficult to design from scratch. Google hopes that its open-source tool will be easy enough to use to serve as a one-stop shop for developers.
- The system captures most data analysis tasks based on aggregations, performs well for typical use cases, and provides a mechanism to deduce privacy parameters from accuracy requirements, allowing a principled choice of those parameters.
- It can be used to produce aggregate statistics over numeric data sets containing private or sensitive information.
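As a rough illustration of what deducing privacy parameters from accuracy requirements can look like, the sketch below works out the smallest epsilon for a Laplace-noised query given a tolerated error and a failure probability. It relies on the standard Laplace tail bound, P(|noise| > t) = exp(-t * epsilon / sensitivity); the function name and the numbers are hypothetical, and Google's system may derive its parameters differently.

```python
import math


def epsilon_for_accuracy(max_error, failure_prob, sensitivity=1.0):
    """Smallest epsilon for which Laplace noise of scale sensitivity / epsilon
    stays within `max_error` with probability at least 1 - failure_prob."""
    if max_error <= 0 or sensitivity <= 0:
        raise ValueError("max_error and sensitivity must be positive")
    if not 0 < failure_prob < 1:
        raise ValueError("failure_prob must be strictly between 0 and 1")
    # Solve exp(-max_error * epsilon / sensitivity) = failure_prob for epsilon.
    return sensitivity * math.log(1.0 / failure_prob) / max_error


# Hypothetical requirement: a count that is off by more than 10
# in at most 5% of releases.
eps = epsilon_for_accuracy(max_error=10, failure_prob=0.05)
print(f"required epsilon: {eps:.3f}")
```

A tighter error bound or a lower failure probability pushes epsilon up, which makes the trade-off between accuracy and privacy explicit before any data is queried.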
Srishti currently works as Associate Editor at Analytics India Magazine. When not covering analytics news or editing and writing articles, she can be found reading or capturing thoughts in pictures.