How Cryptogenomics realises data anonymization in genetic research

Stanford professor Gill Bejerano developed a method to analyse the DNA of large numbers of patients without storing or holding the DNA samples in a database.


Cryptogenomics combines cryptography with genomics to crunch DNA data. According to Stanford professor Gill Bejerano, the field could address the privacy concerns that arise in research on genetic diseases.

Researchers compare the DNA of healthy individuals with that of people suffering from genetic diseases to identify differences in their genetic makeup. These differences allow them to evaluate a patient's risk of developing a disease, predict how a patient will respond to treatment, and more. Over the last three decades, biobanking has grown into a discipline of its own to support genetic research, evolving from basic biological sample storage in university-based repositories into larger infrastructure networks.
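To illustrate the kind of comparison described above, the sketch below contrasts how often an alternate allele appears at a single DNA position in patients (cases) and in healthy individuals (controls). It is a simplified, hypothetical example: the 0/1/2 genotype encoding, the function names and the toy data are assumptions made for illustration, and a real study would also test many positions and check whether each difference is statistically significant.

```python
def alt_allele_frequency(genotypes):
    """Fraction of alternate alleles in a group.

    Each genotype is assumed (hypothetically) to be the number of alternate
    allele copies a person carries at this position: 0, 1 or 2.
    """
    return sum(genotypes) / (2 * len(genotypes))


def allelic_odds_ratio(cases, controls):
    """Odds of an allele being the alternate one in cases relative to controls."""
    alt_cases = sum(cases)
    ref_cases = 2 * len(cases) - alt_cases
    alt_controls = sum(controls)
    ref_controls = 2 * len(controls) - alt_controls
    if 0 in (ref_cases, alt_controls):
        raise ValueError("odds ratio is undefined when a denominator count is zero")
    return (alt_cases * ref_controls) / (ref_cases * alt_controls)


if __name__ == "__main__":
    cases = [2, 1, 1, 0, 2]     # toy genotypes of five patients
    controls = [0, 1, 0, 0, 1]  # toy genotypes of five healthy individuals
    print(alt_allele_frequency(cases))           # 0.6
    print(alt_allele_frequency(controls))        # 0.2
    print(allelic_odds_ratio(cases, controls))   # 6.0
```

An odds ratio well above 1 flags the position as a candidate for further study; on its own it says nothing about causation.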


Why is genetic privacy important? 

Every time DNA data sourced from biobanks is used for genetic research, it is shared among various stakeholders, putting at risk the privacy of the subjects who volunteered their DNA for the research.

Although researchers generally try to protect their subjects' privacy, accountability is lacking. Even if most researchers stay true to their noble purpose, a few bad apples could open the door to malicious third parties breaking into these systems, a risk worth taking seriously.

Criminal DNA databases in the United States retain the genetic samples of suspects even before conviction, leading to unfair profiling of innocent citizens.

Bejerano’s research

Gill Bejerano, a professor of developmental biology, computer science, pediatrics and biomedical data science at Stanford University, has used the principles of cryptogenomics to publish a method for analysing the DNA of large numbers of patients without storing or holding the DNA samples in a database.

The idea of marrying cryptography with genomics came to Bejerano while he was attending a talk by his colleague Dan Boneh of Stanford's Computer Science department. During the seminar, he learned that cryptography makes it possible to perform very complicated computations without ever revealing the inputs, and he realised the idea could be applied to genomics.

With cryptogenomics, as many people as the computation can efficiently handle can contribute their inputs (their genomes) to a study without that information ever being shared. In other words, researchers can ask exactly the same questions and solve exactly the same problems without ever sharing the subjects' genetic information among themselves.
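As a rough illustration of computing on inputs nobody reveals, the sketch below uses additive secret sharing, one of the simplest secure multiparty computation techniques, to count how many participants carry a given variant. It is not the protocol from Bejerano's papers; the 0/1 carrier flags, the modulus and the function names are hypothetical, the parties are simulated in a single process, and it assumes honest-but-curious participants who do not collude.

```python
import secrets

# Shares live in the integers modulo MODULUS; it must exceed the largest possible total.
MODULUS = 2**61 - 1


def split_into_shares(value, n_parties):
    """Split value into n_parties random shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_sum(private_inputs):
    """Return the sum of every party's private input.

    Each party splits its input into one share per party. Party j only ever
    sees the j-th share from everyone, each of which looks uniformly random,
    and publishes nothing but the sum of the shares it received.
    """
    n = len(private_inputs)
    received = [[] for _ in range(n)]  # received[j]: shares sent to party j
    for value in private_inputs:
        for j, share in enumerate(split_into_shares(value, n)):
            received[j].append(share)
    partial_sums = [sum(shares) % MODULUS for shares in received]
    return sum(partial_sums) % MODULUS


if __name__ == "__main__":
    carrier_flags = [1, 0, 0, 1, 1]  # hypothetical: 1 = carries the variant
    print(secure_sum(carrier_flags))  # 3
```

Only the final total is revealed; any individual's flag stays hidden unless all the other parties pool their shares against them.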

Usually, in genomics, you have to share all three billion data points of a genome to gain insight into two or three regions of it. With cryptogenomics, you don't have to share all three billion data points, most of which are irrelevant to the problem being solved. Instead, you can perform the same computation by finding and sharing only the handful of relevant data points.
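The sketch below combines the two ideas in this paragraph under stated assumptions: each patient's full genome stays local, and only the genotypes at the queried positions leave it, split into two random-looking shares sent to two servers assumed not to collude. The genome encoding (alternate-allele counts per position), the toy ten-position genomes, the query positions and the function names are all hypothetical; this is an illustrative technique, not the method from Bejerano's papers.

```python
import secrets

MODULUS = 2**61 - 1  # must exceed any aggregate count


def shares_for_loci(genotypes, loci):
    """Patient side: split only the queried genotypes into two additive shares.

    `genotypes` is the patient's full local genome, encoded (hypothetically)
    as alternate-allele counts per position. Positions outside `loci` never
    leave the patient.
    """
    to_server_a, to_server_b = {}, {}
    for pos in loci:
        mask = secrets.randbelow(MODULUS)
        to_server_a[pos] = mask
        to_server_b[pos] = (genotypes[pos] - mask) % MODULUS
    return to_server_a, to_server_b


def cohort_allele_counts(genomes, loci):
    """Each server adds up the shares it receives; combining the two totals
    reveals only the per-locus alternate-allele count across the cohort."""
    total_a = {pos: 0 for pos in loci}
    total_b = {pos: 0 for pos in loci}
    for genome in genomes:
        share_a, share_b = shares_for_loci(genome, loci)
        for pos in loci:
            total_a[pos] = (total_a[pos] + share_a[pos]) % MODULUS
            total_b[pos] = (total_b[pos] + share_b[pos]) % MODULUS
    return {pos: (total_a[pos] + total_b[pos]) % MODULUS for pos in loci}


if __name__ == "__main__":
    cohort = [
        [0, 0, 0, 1, 0, 0, 0, 2, 0, 0],
        [0, 1, 0, 1, 0, 0, 0, 1, 0, 0],
        [2, 0, 0, 2, 0, 0, 0, 0, 0, 1],
    ]
    print(cohort_allele_counts(cohort, [3, 7]))  # {3: 4, 7: 3}
```

In a real genome the list would have billions of entries, yet each patient still sends only two values per server for a two-position query.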

One of the basic building blocks of cryptography that cryptogenomics takes advantage of is oblivious transfer (OT), a two-party protocol between a sender and a receiver. In its simplest, 1-out-of-2 form, the sender holds two messages and the receiver obtains exactly one of them: the sender remains oblivious of which message the receiver received, and the receiver learns nothing about the other message.
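One classic way to build 1-out-of-2 OT is the RSA-based protocol of Even, Goldreich and Lempel. The toy sketch below follows that construction for honest-but-curious parties; it is not code from Bejerano's papers, and the parameters (two small Mersenne primes), the function names and the integer message encoding are illustrative assumptions. The parameters are far too weak for real use, and `pow(E, -1, PHI)` requires Python 3.8 or later.

```python
import math
import secrets

# Toy RSA key built from two known Mersenne primes: insecure, chosen only so
# the example runs instantly.
P = 2**61 - 1
Q = 2**31 - 1
N = P * Q
PHI = (P - 1) * (Q - 1)
E = 65537
assert math.gcd(E, PHI) == 1
D = pow(E, -1, PHI)  # private exponent, known only to the sender


def sender_setup():
    """Sender publishes one random value per message slot."""
    return secrets.randbelow(N), secrets.randbelow(N)


def receiver_choose(x0, x1, b):
    """Receiver hides its choice b (0 or 1) behind a random k encrypted under E.

    Because k**E mod N is uniformly random, v looks the same to the sender
    whichever b was chosen.
    """
    k = secrets.randbelow(N)
    xb = x1 if b else x0
    v = (xb + pow(k, E, N)) % N
    return v, k


def sender_respond(x0, x1, v, m0, m1):
    """Sender masks both messages; only the chosen slot's mask equals k."""
    k0 = pow((v - x0) % N, D, N)
    k1 = pow((v - x1) % N, D, N)
    return (m0 + k0) % N, (m1 + k1) % N


def receiver_recover(c0, c1, b, k):
    """Receiver can unmask only the message it chose (messages must be < N)."""
    return ((c1 if b else c0) - k) % N


def oblivious_transfer(m0, m1, b):
    x0, x1 = sender_setup()
    v, k = receiver_choose(x0, x1, b)  # the sender sees only v
    c0, c1 = sender_respond(x0, x1, v, m0, m1)
    return receiver_recover(c0, c1, b, k)


if __name__ == "__main__":
    print(oblivious_transfer(1111, 2222, 0))  # 1111
    print(oblivious_transfer(1111, 2222, 1))  # 2222
```

The receiver cannot strip the mask from the other message without the private exponent `D`, which is what keeps that message hidden under the RSA assumption.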

Applications

So far, Bejerano and his team have written three papers on cryptogenomics. The first two are in the medical domain and show how to work with genomes without sharing the information. The third is in law enforcement and demonstrates how you can profile people and canvass an entire neighborhood without having to collect everyone's data.
The code used in these papers is unpatented and freely available. The method can be applied now; it remains to be seen how long it will take to find real-world applications.

Srishti Mukherjee
