Facebook AI recently open-sourced Opacus, a new high-speed library for training PyTorch models with differential privacy (DP). According to its developers, the library is more scalable than existing state-of-the-art methods.
According to the developers at the social media giant, differential privacy is a mathematically rigorous framework for quantifying the anonymisation of sensitive data. It is often used in analytics, and interest in it is growing within the machine learning (ML) community.
Differential privacy constitutes a strong standard for privacy guarantees for algorithms on aggregate databases. It is usually defined in terms of the application-specific concept of adjacent databases. The framework has several properties that make it particularly useful in applications, such as group privacy and robustness to auxiliary information.
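For readers who want the formal statement, the standard textbook definition can be written as follows. The notation is an assumption not taken from the article: $M$ is a randomised mechanism, $D$ and $D'$ are any two adjacent databases, $S$ is any set of possible outputs, and $\varepsilon$ and $\delta$ are the privacy parameters.

```latex
\[
\Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S] + \delta
\]
```

Smaller values of $\varepsilon$ and $\delta$ mean the outputs on adjacent databases are harder to tell apart, which is the sense in which the guarantee is strong.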
Opacus is a library that facilitates the training of PyTorch models with differential privacy. It requires minimal code changes and has little impact on training performance.
The library lets the client track the privacy budget spent at any given moment. Opacus also ships with pre-trained and fine-tuned models, tutorials for large-scale models, and infrastructure designed for privacy research experiments.
The developers stated that the core idea behind the algorithm Opacus uses is to protect the privacy of a training dataset. It does so by intervening on the parameter gradients that the model uses to update its weights, rather than on the data directly.
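What intervening on the gradients can look like is sketched below in plain Python, assuming the common clip-then-add-noise recipe: each sample's gradient is clipped to a maximum L2 norm, the clipped gradients are summed, Gaussian noise is added, and the result is averaged. This is an illustrative sketch, not Opacus code; the function names `clip_to_norm` and `private_gradient`, and the parameters `max_grad_norm` and `noise_multiplier` as used here, are hypothetical.

```python
import math
import random
from typing import List, Sequence


def clip_to_norm(grad: Sequence[float], max_norm: float) -> List[float]:
    """Scale a gradient down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def private_gradient(
    per_sample_grads: Sequence[Sequence[float]],
    max_grad_norm: float,
    noise_multiplier: float,
    rng: random.Random,
) -> List[float]:
    """Hypothetical helper: clip each sample's gradient, sum, add noise, average."""
    clipped = [clip_to_norm(g, max_grad_norm) for g in per_sample_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # Noise scale is tied to the clipping bound, so no single sample dominates.
    sigma = noise_multiplier * max_grad_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(per_sample_grads) for v in noisy]


# Two toy per-sample gradients; the first one exceeds the clipping bound.
grads = [[3.0, 4.0], [0.3, 0.4]]
update = private_gradient(
    grads, max_grad_norm=1.0, noise_multiplier=1.1, rng=random.Random(0)
)
```

The seeded `random.Random(0)` is used only to make the sketch reproducible. A security-sensitive implementation would want a cryptographically secure source; in the standard library, `random.SystemRandom()` can be passed in its place.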
According to a blog post, the Opacus library is aimed mainly at two target audiences.
- Machine learning practitioners, who will find the library a gentle introduction to training a model with differential privacy.
- Differential privacy scientists, who will find it easy to experiment and tinker with.
As mentioned by the developers, Opacus defines a lightweight API by introducing the PrivacyEngine abstraction, which takes care of both tracking the privacy budget and working on the gradients of the model.
The PrivacyEngine is attached to a standard PyTorch optimiser, which makes training with Opacus easier than with traditional methods. After training, the resulting artefact is a standard PyTorch model, so deploying a private model requires no extra steps.
Features of this Library
Opacus provides a number of intuitive features, such as:
- Speed: Opacus computes batched per-sample gradients by leveraging the Autograd hooks in PyTorch. This results in an order of magnitude speedup compared with existing differential privacy libraries that rely only on micro-batching.
- Safety: Opacus uses a cryptographically safe pseudo-random number generator for its security-critical code.
- Flexibility: Using Opacus, developers can quickly prototype their ideas by mixing and matching the code with PyTorch code and pure Python code.
- Productivity: Opacus comes with various tutorials as well as helper functions that warn about incompatible layers before training starts.
- Interactivity: Opacus keeps track of how much of the privacy budget has been spent at any given point in time, enabling early stopping and real-time monitoring. The privacy budget is a core mathematical concept in differential privacy.
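To make the idea of spending a privacy budget and stopping early more concrete, here is a toy sketch. It assumes basic sequential composition, where the epsilon costs of successive steps simply add up; Opacus itself is understood to use tighter accounting, so this is not its method or its API. The class `BudgetTracker`, its methods, and the per-step cost of 0.25 are all hypothetical.

```python
class BudgetTracker:
    """Toy accountant under basic sequential composition (epsilons add up)."""

    def __init__(self, target_epsilon: float) -> None:
        self.target_epsilon = target_epsilon
        self.spent = 0.0

    def can_afford(self, epsilon: float) -> bool:
        """Return True if one more step of this cost stays within budget."""
        return self.spent + epsilon <= self.target_epsilon

    def spend(self, epsilon: float) -> None:
        """Record the privacy cost of one completed step."""
        self.spent += epsilon


tracker = BudgetTracker(target_epsilon=1.0)
step_cost = 0.25  # assumed, fixed per-step privacy cost
steps_run = 0

for _ in range(100):
    if not tracker.can_afford(step_cost):
        break  # early stopping: the next step would exceed the budget
    # ... one private training step would run here ...
    tracker.spend(step_cost)
    steps_run += 1

print(f"ran {steps_run} steps, spent epsilon = {tracker.spent}")
```

Checking the budget before each step, rather than after, means training halts without ever exceeding the target.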
According to the developers, the goal behind the development of Opacus is to preserve the privacy of each training sample while limiting the impact on the accuracy of the final model. The library accomplishes this by modifying a standard PyTorch optimiser in order to enforce (and measure) DP during training.
Opacus is open source and licensed under Apache-2.0. The latest release can be installed via pip:
pip install opacus