HasGeek is organizing The Fifth Elephant, a community-powered conference designed to encourage thinking about the big data ecosystem. As excitement builds for the event, Analytics India Magazine will interview conference speakers about the talks they are going to deliver.
First in the series is Ramesh Hariharan from Strand Life Sciences, who is speaking on “Big Data Challenges in Personalized Medicine”.
Analytics India Magazine (AIM): You are speaking at The Fifth Elephant about genomic measurement and Big Data Challenges in Personalized Medicine. Could you tell us what you mean by that?
Ramesh Hariharan (RH): We are at the threshold of a major revolution in health care. Thanks to two decades of explosive research in tools and techniques that interrogate living cells at the molecular level, doctors will soon have an invaluable tool added to their arsenal to help diagnose and cure disease: the genome of the patient. Several success stories have already emerged. For instance, a little boy underwent many futile operations before sequencing his genome indicated a defect in the immune system, which was then treated with a cord blood transplant.
The genome and its associated paraphernalia are quite large, and that naturally calls for Big Data techniques to manage and deliver genomic information to clinicians, consumers, and researchers. To give you a feel, sequencing machines generate upwards of 150GB of compressed data for a single individual, and analysing this data is equivalent to sifting through 30 finely shredded copies of a 200,000-page telephone directory!
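To put rough numbers on the telephone directory analogy, here is a back-of-envelope sketch. Every constant in it is an illustrative assumption rather than a figure from the interview: a genome of about 3.1 billion letters, the "30 copies" read as 30x sequencing coverage, and shredded fragments (reads) of 100 letters each.

```python
# Back-of-envelope figures; all constants are illustrative assumptions.
GENOME_BASES = 3_100_000_000  # approximate length of one human genome, in letters
COVERAGE = 30                 # the "30 copies" in the analogy, read as 30x coverage
READ_LENGTH = 100             # assumed length of one shredded fragment (read)
DIRECTORY_PAGES = 200_000     # pages in the directory from the analogy


def sequencing_scale(genome_bases, coverage, read_length, pages):
    """Return rough counts implied by the analogy for one individual."""
    total_bases = genome_bases * coverage
    return {
        "total_bases": total_bases,               # letters across all copies
        "reads": total_bases // read_length,      # fragments to sift through
        "letters_per_page": genome_bases // pages,  # one genome spread over the directory
    }


scale = sequencing_scale(GENOME_BASES, COVERAGE, READ_LENGTH, DIRECTORY_PAGES)
for name, value in scale.items():
    print(f"{name}: {value:,}")
```

Under these assumptions, the task amounts to reassembling roughly a billion small fragments, which is why the storage and analysis of a single genome already qualifies as a big data problem.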
AIM: Could you provide some insights into the award-winning Avadis® platform by Strand Life Sciences?
RH: Avadis was conceptualized as an analytics platform for data resulting from life sciences experiments. It specializes in making large amounts of data visually accessible to scientists and in bringing together analytical methods with knowledge mined from literature for effective discovery. Over the last several years, Avadis has fuelled several thousand discoveries published in scientific literature and is the leading platform worldwide on this score.
AIM: How did Strand Life Sciences come about, how has it evolved over the years, and what is the next step?
RH: Strand was conceived in 2000 by 4 founders, all faculty at the Indian Institute of Science. Strand’s vision was to bring together the best of Computer Science and the best of Biology to further research in understanding how life works, and in using this research for better health. In the last 10 years, Strand has achieved a leading position worldwide providing analytical tools for biological research. The next 5 years will see Strand broaden its activities towards clinical applications, i.e., applying genomics knowledge to avoid or better treat diseases.
AIM: Though you would be speaking about it in your talk, can you briefly provide an overview of techniques in handling large volumes of genomic data?
AIM: Could you give a brief overview of how genomic measurement has evolved over time?
RH: The microscope was invented in the 1500s. It allowed us to see microbes and cells and some cellular features like the nucleus, but not much more. In the 1800s, chemists figured out that the nucleus contained a substance rich in phosphorus and nitrogen, and simultaneously, great observations made by Mendel and Darwin showed that this material was the basis for heredity and for individual variations. It wasn’t until the 1950s that the gross structure of this DNA was determined, and much later, in the 1970s, that the program encoded by this DNA could be read, albeit on a small scale. Continuous improvements led to the whole human genome being read in 2002. This was DNA pooled from 5 individuals, and sequencing took hundreds of millions of dollars; this clearly didn’t scale to sequencing individuals. Further technology improvements over the last few years have brought the cost down to a few thousand dollars (and everyone expects the cost to be below $1000 soon), making it possible to sequence large numbers of individuals and study individual variations.
AIM: Would you like to share an example of a data-driven insight that turned into a huge success story in this area?
RH: Here are a few recent examples.
There is a disease called CHIME disease, characterized by holes in the eye, scaling of skin, mental retardation, and ear anomalies or epilepsy. Researchers used our tools to sequence individuals with this disease and identify the causative mutation in the PIGL gene. In each case, both parents had one copy of the mutation (so they were carriers), while the affected patients had two copies, one from each parent. Now that the causative mutation is known, it is possible to screen couples who are carriers so that this disease can effectively be abolished.
Another example is a young boy with a digestive system disorder who couldn’t tolerate whatever he ate. A hundred operations were performed on him in vain. Finally, doctors sequenced his genome and found an unusual mutation in the XIAP gene, which pointed to an immune system problem. This suggested a cord blood transplant, which worked, and the boy is now stable and growing up healthy.
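The carrier pattern described in the CHIME example (each parent with one copy of the mutation, the affected child with two) can be illustrated with a small sketch. It assumes the simplest textbook model, in which each parent passes on one of their two copies of the gene with equal chance; the function name and the allele labels "A" (working copy) and "a" (mutated copy) are hypothetical and chosen only for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_distribution(parent_a, parent_b):
    """Return the probability of each child genotype for a single gene.

    Each parent is a two-character string of alleles such as "Aa".
    Assumes each parent passes on one of their two alleles with equal chance.
    """
    # Enumerate the four equally likely allele pairings (a Punnett square),
    # normalising "aA" and "Aa" to the same genotype label.
    counts = Counter(
        "".join(sorted(pair)) for pair in product(parent_a, parent_b)
    )
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


# Two carrier parents, as in the cases described above.
carriers = offspring_distribution("Aa", "Aa")
for genotype, probability in sorted(carriers.items()):
    print(genotype, probability)
```

Under this model, a child of two carriers has a one in four chance of inheriting two mutated copies, which is why identifying carrier couples once the causative mutation is known matters for screening.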
AIM: What challenges have handling genome data posed for your team?
RH: Multi-disciplinarity is a challenge: handling genome data requires really skilled Computer Scientists as well as good biologists. It requires making the right algorithmic choices, the right hardware platform choices, and the right methods to interpret results. This being an emerging and highly dynamic area, one has to evolve with the field and learn at a fast rate. All of these are challenges that make this area very lively.