How Cryptogenomics realises data anonymization in genetic research

Stanford professor Gill Bejerano developed a method to analyse the DNA of large numbers of patients without storing or holding the DNA samples in a database.

Cryptogenomics combines cryptography with genomics to analyse DNA data. According to Stanford professor Gill Bejerano, the field could address the privacy concerns of researchers investigating genetic diseases.

Researchers compare the DNA data of healthy individuals with that of people suffering from genetic diseases to identify the differences in their genetic makeup. This allows them to evaluate a patient’s risk of a disease, predict how a patient will respond to treatment, and so on. Over the last three decades, biobanking has grown as a discipline to aid genetic research. It has evolved from basic biological sample storage in university-based repositories to larger infrastructure networks.

Why is genetic privacy important? 

Every time DNA data sourced from biobanks is used for genetic research, the information is shared between various stakeholders, which puts at risk the privacy of the subjects who volunteered their DNA data for the research.

Though researchers generally try to protect the privacy of their subjects, accountability is missing. Even if most researchers stay true to their purpose, a few bad apples can leave systems open to nefarious third parties, and that risk is worth looking into.

Criminal databases in the United States retain the genetic samples of suspects even before conviction, leading to unfair profiling of innocent citizens.

Bejerano’s research

Gill Bejerano, a professor of developmental biology, computer science, pediatrics, and biomedical data science at Stanford University, has published a method, based on the principles of cryptogenomics, to analyse the DNA of large numbers of patients without storing or holding the DNA samples in a database.

The idea to marry cryptography with genomics came to Bejerano while he was attending a talk by his colleague Dan Boneh, from the Computer Science department at Stanford. During this seminar, he learned that it is possible to perform very complicated computations in cryptography without ever revealing the inputs. He realised this has applications in genomics.

With cryptogenomics, you can have as many people as possible (depending on the efficiency of the computation) put their inputs (genomes) into research without ever having this information shared. In other words, researchers can ask the exact same questions and solve the exact same problems without ever having to share the genetic information of the subjects amongst themselves. 
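To make the idea of computing on inputs that are never shared more concrete, here is a minimal sketch in Python. It uses additive secret sharing, a simple and well-known technique chosen here purely for illustration; the article does not say which protocol Bejerano's papers use, so this should not be read as his method. The names `share` and `secure_sum`, the modulus, and the three computing parties are all assumptions made for the example.

```python
import secrets

# Illustration only: additive secret sharing, NOT the method from the papers.
# All arithmetic is done modulo a fixed prime (an assumed, arbitrary choice).
MODULUS = 2**61 - 1


def share(value, n_parties):
    """Split a private value into n random-looking shares that sum to it."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values, n_parties=3):
    """Add up private values without any single party seeing one of them."""
    # Each participant splits their value and sends one share to each party.
    all_shares = [share(v, n_parties) for v in private_values]
    # Each party only sees one share per participant and adds those up.
    partial_sums = [sum(column) % MODULUS for column in zip(*all_shares)]
    # Only the combined partial sums reveal the total, never the inputs.
    return sum(partial_sums) % MODULUS


# Hypothetical example: how many copies of a variant each volunteer carries.
variant_counts = [1, 0, 2, 1]
total = secure_sum(variant_counts)
```

Each individual share is a uniformly random number, so a single computing party learns nothing about any volunteer's value; only the final total is revealed.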

Usually, in genomics, you have to share all three billion data points of a genome to get insights about two or three locations in it. With cryptogenomics, you don’t have to share all three billion data points, most of which are irrelevant to the problem being solved. Instead, you can perform the same computation by finding and sharing only the few relevant data points.

One of the basic building blocks of cryptography that cryptogenomics takes advantage of is oblivious transfer (OT). This is a two-party procedure between a sender and a receiver in which the sender transfers one of several pieces of information to the receiver but remains oblivious to which piece the receiver gets.
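As a rough illustration of oblivious transfer, the sketch below implements a textbook 1-out-of-2 OT built on RSA in Python. It is a toy under stated assumptions, not the code from Bejerano's papers: the key is built from two small, well-known primes and is not secure, and every function name is hypothetical. The sender holds two messages, the receiver picks one, and the sender never learns which.

```python
import secrets

# Toy RSA key held by the sender. These primes are far too small and too
# structured for real use; they only keep the example self-contained.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def sender_offer():
    """Sender: publish one random value per message slot."""
    return secrets.randbelow(N), secrets.randbelow(N)


def receiver_choose(x0, x1, choice):
    """Receiver: hide the chosen slot behind a random blinding value k."""
    k = secrets.randbelow(N)
    v = ((x0, x1)[choice] + pow(k, E, N)) % N
    return v, k


def sender_respond(m0, m1, x0, x1, v):
    """Sender: mask both messages; only one mask equals the receiver's k."""
    k0 = pow((v - x0) % N, D, N)
    k1 = pow((v - x1) % N, D, N)
    return (m0 + k0) % N, (m1 + k1) % N


def receiver_recover(c0, c1, choice, k):
    """Receiver: unmask the chosen message; the other stays hidden."""
    return ((c0, c1)[choice] - k) % N


def oblivious_transfer(m0, m1, choice):
    """Run all four steps in one process, for demonstration only."""
    x0, x1 = sender_offer()
    v, k = receiver_choose(x0, x1, choice)
    c0, c1 = sender_respond(m0, m1, x0, x1, v)
    return receiver_recover(c0, c1, choice, k)


# Hypothetical usage: messages are small integers (they must be below N).
picked = oblivious_transfer(11, 22, choice=1)
```

In a real deployment the sender and receiver would run on separate machines and exchange only `x0`, `x1`, `v`, `c0` and `c1`; here they share one process to keep the example short.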


So far, Bejerano and his team have written three papers on cryptogenomics. The first two are in the medical domain and show how to work with genomes without sharing the information. The third paper is in law enforcement, and demonstrates how you can profile people and canvass an entire neighborhood without having to collect everyone’s data.

The code used in these papers is unpatented and freely available. The method is applicable now, and we just have to wait and see how long it takes for it to have real-world applications.

Srishti Mukherjee
Drowned in reading sci-fi, fantasy, and classics in equal measure; Srishti carries her bond with literature head-on into the world of science and tech, learning and writing about the fascinating possibilities in the fields of artificial intelligence and machine learning. Making hyperrealistic paintings of her dog Pickle and going through succession memes are her ideas of fun.
