Cryptogenomics combines cryptography with genomics to crunch DNA data. According to Stanford professor Gill Bejerano, the field could address the privacy concerns of researchers investigating genetic diseases.
Researchers compare the DNA data of healthy individuals with that of individuals suffering from genetic diseases. In doing so, they can identify the differences in genetic makeup between the two groups. This allows them to evaluate a patient’s risk of a disease, predict how a patient will respond to treatment, and so on. Over the last three decades, biobanking has grown as a discipline to aid genetic research. It has evolved from basic biological sample storage in university-based repositories to larger infrastructure networks.
Why is genetic privacy important?
Every time DNA data sourced from biobanks is used for genetic research, the information is shared between various stakeholders, compromising the privacy of subjects who volunteered their DNA data for the research.
Though researchers generally try to protect the privacy of the subjects, accountability is missing. While researchers might stay true to their noble purpose, the threat of nefarious third parties breaking into systems because of a few bad apples is worth looking into.
Criminal databases in the United States retain the genetic samples of suspects even before conviction, leading to unfair profiling of innocent citizens.
Bejerano’s research
Gill Bejerano, a professor of developmental biology, computer science, pediatrics, and biomedical data science at Stanford University, has used the principles of cryptogenomics to publish a method for analysing the DNA of large numbers of patients without storing or holding the DNA samples in a database.
The idea to marry cryptography with genomics came to Bejerano while he was attending a talk by his colleague, Dan Boneh, from the Computer Science department at Stanford. During this seminar, he learned that cryptography makes it possible to perform very complicated computations without ever revealing the inputs. He realised the idea had applications in genomics.
With cryptogenomics, you can have as many people as possible (depending on the efficiency of the computation) put their inputs (genomes) into research without ever having this information shared. In other words, researchers can ask the exact same questions and solve the exact same problems without ever having to share the genetic information of the subjects amongst themselves.
Usually, in genomics, you have to share three billion data points to get insights into two or three areas of the genome. With cryptogenomics, you don’t have to share all three billion data points, most of which are irrelevant to the problem being solved. Instead, you can perform the same computation by finding and sharing only the two or three relevant data points.
One of the basic tenets of cryptography (that cryptogenomics takes advantage of) is the concept of oblivious transfer (OT). This is a two-party procedure between a sender and a receiver in which the sender transfers some information to the receiver, but the sender remains oblivious of what information the receiver gets.
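To make the idea concrete, here is a minimal sketch of a textbook 1-out-of-2 oblivious transfer built on RSA. It is an illustration only and is not taken from Bejerano's papers: the function names, the toy primes, and the integer-coded messages are all assumptions chosen for readability, and the key is far too small to be secure. The sender holds two messages, the receiver picks one, and the sender never learns which one was picked.

```python
import secrets

# Toy RSA key owned by the sender (hypothetical, insecure demo values).
P, Q = 61, 53
N = P * Q          # public modulus
E = 17             # public exponent
D = 2753           # private exponent, E * D = 1 mod (P-1)*(Q-1)


def sender_setup():
    """Sender publishes two random values, one per message."""
    return secrets.randbelow(N), secrets.randbelow(N)


def receiver_choose(x0, x1, choice):
    """Receiver blinds the value matching its choice with a random k."""
    k = secrets.randbelow(N)
    x = x1 if choice else x0
    v = (x + pow(k, E, N)) % N
    return v, k    # v goes to the sender, k stays secret


def sender_respond(v, x0, x1, m0, m1):
    """Sender masks both messages; it cannot tell which mask equals k."""
    k0 = pow((v - x0) % N, D, N)
    k1 = pow((v - x1) % N, D, N)
    return (m0 + k0) % N, (m1 + k1) % N


def receiver_recover(c0, c1, choice, k):
    """Receiver can unmask only the message it chose."""
    c = c1 if choice else c0
    return (c - k) % N


def oblivious_transfer(m0, m1, choice):
    """Run the whole exchange in one process, for demonstration."""
    x0, x1 = sender_setup()
    v, k = receiver_choose(x0, x1, choice)
    c0, c1 = sender_respond(v, x0, x1, m0, m1)
    return receiver_recover(c0, c1, choice, k)


if __name__ == "__main__":
    # Messages are small integers below N, e.g. coded genotype values.
    print(oblivious_transfer(42, 99, choice=0))
    print(oblivious_transfer(42, 99, choice=1))
```

In this sketch the receiver recovers the chosen message because one of the sender's two masks equals the receiver's secret k, while the other mask is an unrelated value the receiver cannot compute without the private exponent. A real deployment would use a vetted cryptographic library and much larger keys.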
Applications
So far, Bejerano and his team have written three papers on cryptogenomics. The first two are in the medical domain and show how to work with genomes without sharing the information. The third paper is in law enforcement, and demonstrates how you can profile people and canvass an entire neighborhood without having to collect everyone’s data.
The code used in these papers is unpatented and freely available. The method can be applied today; what remains to be seen is how long it takes to reach real-world applications.