The National Institute of Standards and Technology (NIST), founded in 1901 and now part of the US Department of Commerce, is one of the nation's oldest physical science laboratories.
Differential privacy as a concept has been around since the early 2000s, and as a technology it was named in the 2020 Gartner Hype Cycle. It is founded on the principle of protecting private data. In an era where data is an industry of its own, whether it is a population census or customer feedback on app stores, no company should be able to trace a record back to its source easily, at least not without consent. This is where differential privacy comes into the picture. Companies like Google and Apple are actively researching this domain, and the effort has now gained national prominence in the US.
To know more, we got in touch with Christine Task of Knexus Research and Gary S. Howarth of the National Institute of Standards and Technology (NIST). Christine is the technical lead of the NIST differential privacy challenge, and Gary is the prize challenge specialist at NIST. They are organising the largest challenge yet on differential privacy, and we wanted to know why a premier government institution is interested in differential privacy and how significant machine learning is to it. In this interview, Gary and Christine talk about the challenge, the current state of differential privacy and the future of machine learning.
How NIST Is Leading Differential Privacy Efforts
Talking about the significance of differential privacy in governance, Gary said that it is one of the tools for protecting sensitive data, and that public safety agencies are data generators. He continued: public safety agencies collect information on emergency response calls for service, disaster response and assistance, crime and health statistics, and many other records. These data are invaluable to policy planners, researchers and the public at large. However, these records often contain personal information about individuals. Differential privacy provides a way of de-identifying those records such that we can guarantee that personal information remains private, protecting individuals from re-linkage attacks, while still allowing the data to be utilised by the widest possible audience.
“NIST’s Big Data Interoperability Framework was published in 2019, and big data is an important consideration for privacy engineering.”
Though the benefits of differential privacy are well documented, Christine admitted that the adoption would take time. “It is still a relatively young but rapidly developing field, and it is beginning to see a significant transition from theory to practice. Major tech companies, including Google, Facebook and Apple, as well as government agencies, including the US Census Bureau and the National Science Foundation, are exploring approaches to integrate differential privacy protections into their data workflows,” she explained.
As data collection becomes more comprehensive, protecting data with simple anonymisation techniques becomes intractable. Research has shown that anonymised individuals can be uniquely identified with only a small handful of time-stamped GPS locations or credit card transactions. Talking about the challenges involved in implementing differential privacy, Christine noted that though differential privacy guarantees stronger privacy that prevents the risk of re-identification, it isn’t a simple solution.
“If your data modelling approach is too sensitive to individual data points, then it is inherently in conflict with privacy, and it will provide very poor utility if it is made differentially private. Generating high-quality privatised data requires finding efficient, stable ways to model the underlying patterns in human data without being overly sensitive to small variations in the input data. More robust machine learning techniques will produce better privacy tools,” said Christine, hinting at what lies ahead for differential privacy.
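To make the sensitivity idea concrete, here is a minimal sketch (not from the interview; the function name and parameters are illustrative) of the classic Laplace mechanism applied to a count query. A count changes by at most 1 when any one person's record is added or removed, so noise scaled to 1/ε suffices for ε-differential privacy:

```python
import numpy as np

def dp_count(data, predicate, epsilon, rng=None):
    """Release an epsilon-differentially private count.

    A count query has sensitivity 1: adding or removing one person's
    record changes the true answer by at most 1, so Laplace noise with
    scale 1/epsilon masks any individual's contribution.
    """
    rng = rng or np.random.default_rng()
    true_count = sum(1 for row in data if predicate(row))
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Example: a private count of records with age over 30.
ages = [25, 31, 47, 52, 38]
private_answer = dp_count(ages, lambda age: age > 30, epsilon=1.0)
```

Smaller ε means stronger privacy but noisier answers, which is exactly the privacy-utility tension Christine describes.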
About The Challenge
The Privacy Temporal Map Challenge, said Gary, is a series of contests, with prize awards up to $276,000 for excellence in formal data privacy. This challenge focuses on de-identifying public safety data sets that contain geographic and time data. The contests include a metrics development contest (Better Meter Stick Contest) in the form of a white paper, a series of algorithm sprints that will explore new methods in differential privacy, and a contest designed to improve the usability of the participants’ source code, open to participants of the algorithm sprints.
“If we only consider privacy, then protecting data is simple — never share the data publicly. The research and engineering feat is to protect privacy while still maintaining the utility of the dataset for public analysis. This challenge invites competitors to produce high-quality temporal map data while satisfying the rigorous differential privacy guarantee,” explained Christine.
Through this challenge, NIST is trying to understand the data users’ needs better and enable the public to develop tools that can assist public safety agencies in releasing de-identified data sets.
“Specifically, through the Better Meter Stick contest, we anticipate data users offering new methods of evaluating the quality and utility of de-identified data sets. In our algorithm contest, we anticipate innovations in the computational methods used to apply differential privacy. And through the open-source and development contest, we are incentivising our solvers to make their software available and accessible to the public at large. It is our hope that these efforts will result in usable software that public safety agencies can use to make more data available,” added Gary.
ML For Differential Privacy And Future Direction
We recently learnt that, for the first time in its 200-year history, the US Census Bureau has announced that this year's survey will implement new standards to safeguard citizen data. The bureau even called differential privacy the new gold standard for data protection.
“I can’t speak for the whole US government, but differential privacy shows great promise as one tool for protecting privacy in massive datasets,” said Gary. That’s why the project undertaken by NIST is aimed at exploring differential privacy as a way to balance protecting private information in public safety datasets with retaining usable information. “Our project is focused on the fact that the large amounts of data collected by the technology we interact with every day, and the increased availability of that data, significantly increase the vulnerability of anonymised data sets to re-identification attacks. This challenge is part of NIST’s research into differential privacy as one approach to generating de-identified data without these vulnerabilities,” he added.
“Nearly any machine learning technique can be modified to provide differential privacy.”
The advance of differential privacy also coincides with the rising popularity of machine learning as a field. “Nearly any machine learning technique can be modified to provide differential privacy,” explained Christine. “However, some techniques will provide better or worse utility for a given data set and problem definition. These challenges provide a unique opportunity for researchers to pit different techniques directly against each other on real-world problems, and for the research community as a whole to learn what works well.”
When asked about common misconceptions among those outside machine learning, Christine noted that ML topics such as Probabilistic Graphical Models or Generative Adversarial Networks refer to specific techniques. Differential privacy differs in that it defines a relationship between the sensitive input data and the privatised output, but it doesn’t provide any guidance on how that relationship should be achieved. So nearly any technique, including Probabilistic Graphical Models and Generative Adversarial Networks, can be modified to become differentially private.
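As an illustration of that point (a hypothetical sketch, not code from the challenge), a standard recipe for privatising a learning algorithm is to bound each person's influence and then add noise, the idea behind DP-SGD. Here one gradient step for a linear model is modified that way; the function name and parameters are ours:

```python
import numpy as np

def dp_sgd_step(w, X, y, clip_norm, noise_mult, lr, rng):
    """One differentially private gradient step for squared loss."""
    # Per-example gradients: each row is one person's contribution.
    grads = (X @ w - y)[:, None] * X
    # Clip each example's gradient so no individual can move the
    # model by more than clip_norm.
    norms = np.maximum(np.linalg.norm(grads, axis=1, keepdims=True), 1e-12)
    grads = grads * np.minimum(1.0, clip_norm / norms)
    # Add Gaussian noise calibrated to the clipping bound.
    noisy_sum = grads.sum(axis=0) + rng.normal(
        0.0, noise_mult * clip_norm, size=w.shape)
    return w - lr * noisy_sum / len(X)
```

The ordinary SGD update is unchanged in spirit; only the clipping and noise are new, which is why the same recipe transfers to GANs, graphical models and most other gradient-trained techniques.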
On a concluding note, we asked Christine what the future entails for differential privacy and machine learning in general. She said that machine learning is a comparatively new field and that she expects it to continue to evolve significantly over the coming decades. “I hope the insights that are uncovered while researching effective privacy-preserving data analytics will play a role in that evolution. I find many problem-solving approaches in AI are complementary to each other. Differential privacy research often taps into, and even creatively combines, techniques from a diverse variety of subdomains. I would be surprised to see any single AI approach dominate,” concluded Christine.