A wealth of biological data has enabled scientists and healthcare researchers to apply powerful computational methods to uncover new insights. High-performance computer systems and domain-specific software frameworks are driving this digital biology revolution.
Two of the world’s most powerful supercomputers, NVIDIA’s Cambridge-1 and Recursion’s BioHive-1, both based on the NVIDIA DGX SuperPOD standard architecture, were recently named in the TOP500 ranking of the most powerful systems. Last year, researchers at the Athinoula A. Martinos Center for Biomedical Imaging used several NVIDIA AI systems, including NVIDIA DGX-1, to accelerate their research on estimating lung disease severity from X-ray images to predict outcomes in COVID-19 patients.
Alongside these advances, next-generation sequencing (NGS) workloads are now being accelerated by the NVIDIA Clara Parabricks suite of genomics libraries and reference applications, pushing the boundaries of genetic research.
What is Clara Parabricks?
NVIDIA Clara Parabricks is a computing framework for genomics applications spanning both DNA and RNA. It provides GPU-accelerated libraries, pipelines, and reference application workflows for primary, secondary, and tertiary analysis, built on NVIDIA’s CUDA, AI, HPC, and data analytics stacks. It is positioned as a complete solution for genomics labs and as a foundation for new application development.
Clara Parabricks Pipelines provide GPU-accelerated versions of tools from the Broad Institute’s Genome Analysis Toolkit (GATK), along with third-party tools such as Google’s DeepVariant caller. Starting from DNA sequencing reads, Clara Parabricks maps, aligns, and filters the reads and then calls variants for either germline or somatic variant detection. For RNA-based projects, STAR and STAR-Fusion align the sequencing reads, allowing reads to be split across exon-intron boundaries, followed by variant calling.
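The two-stage germline flow described above (align reads, then call variants) can be sketched as a small Python helper that only builds the command lines and does not run them. This is an illustrative sketch, not NVIDIA's reference workflow: the `pbrun fq2bam` and `pbrun haplotypecaller` subcommands and their flags follow the publicly documented Parabricks command line as best understood here and should be checked against the installed version, while the function names and file names are hypothetical.

```python
import shlex
from typing import List


def fq2bam_command(ref: str, fastq_r1: str, fastq_r2: str, out_bam: str) -> List[str]:
    """Build the alignment step: paired FASTQ reads in, sorted BAM out.

    Flag names are assumed from the public Parabricks documentation.
    """
    return ["pbrun", "fq2bam",
            "--ref", ref,
            "--in-fq", fastq_r1, fastq_r2,
            "--out-bam", out_bam]


def haplotypecaller_command(ref: str, in_bam: str, out_vcf: str) -> List[str]:
    """Build the germline variant-calling step: BAM in, VCF out."""
    return ["pbrun", "haplotypecaller",
            "--ref", ref,
            "--in-bam", in_bam,
            "--out-variants", out_vcf]


def germline_steps(ref: str, fastq_r1: str, fastq_r2: str, sample: str) -> List[List[str]]:
    """Return the ordered commands for one sample (hypothetical naming scheme)."""
    bam = f"{sample}.bam"
    vcf = f"{sample}.vcf"
    return [
        fq2bam_command(ref, fastq_r1, fastq_r2, bam),
        haplotypecaller_command(ref, bam, vcf),
    ]


if __name__ == "__main__":
    # Print the commands instead of executing them, so the sketch
    # can be inspected on a machine without Parabricks installed.
    for cmd in germline_steps("ref.fa", "s1_R1.fq.gz", "s1_R2.fq.gz", "s1"):
        print(shlex.join(cmd))
```

Keeping command construction separate from execution makes it easy to review or log exactly what would run before handing the lists to something like `subprocess.run`.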
Accelerating research with Clara Parabricks
The Parabricks Pipelines are a robust set of genomics tools that can be customised to match the demands of scientific research and laboratories. Researchers run Parabricks Pipelines workloads on NVIDIA GPU systems ranging from desktop workstations to GPU-accelerated clouds and some of the world’s fastest supercomputers.
This month, Shanghai-based Mingma Biotechnology became the first research facility in China to deploy Clara Parabricks Pipelines, which it is using to support its precision medicine work. The deployment comes on the heels of large-scale genomics programmes launched earlier this year in Thailand and Japan.
Houston-based Greffex, meanwhile, began using Parabricks Pipelines and NVIDIA Clara Discovery to advance its work on a universal flu vaccine just weeks after getting started with an NVIDIA RTX data science workstation. The startup combines genomic sequences, molecular dynamics techniques, and wet-lab research to investigate how influenza strains evolve over time and how those changes affect vaccine effectiveness.
To monitor changes in the flu virus, Greffex collects tens of thousands of flu genomes from around the world and performs massive alignments on NVIDIA RTX 8000 GPUs to identify changes in the virus’s genetic code. Running its genomic workloads on GPUs saves the company up to 13 hours per sample, giving its team time to fine-tune the alignment results.
Genomic insights for population studies
On NVIDIA GPUs, researchers using Parabricks Pipelines can accelerate DNA- and RNA-based projects by up to 50 times, allowing scientists to extract as much usable information as possible from the hundreds of gigabytes of instrument data generated every day. This acceleration is especially significant for public health institutions and research labs running population studies that involve processing tens of thousands of genomes.
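To see why a speedup of this size matters at population scale, here is a back-of-the-envelope sketch of cohort processing time. The 50x figure is the "up to" value quoted above, so it is a best case. The 30-hour baseline per genome and the function name are hypothetical placeholders chosen only to make the arithmetic concrete.

```python
def cohort_days(n_genomes: int,
                baseline_hours_per_genome: float,
                speedup: float = 1.0,
                parallel_nodes: int = 1) -> float:
    """Estimate wall-clock days to process a cohort.

    Assumes every genome takes the same time and that work divides
    evenly across nodes, which real pipelines only approximate.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    if parallel_nodes < 1:
        raise ValueError("parallel_nodes must be at least 1")
    total_hours = n_genomes * baseline_hours_per_genome / speedup / parallel_nodes
    return total_hours / 24


if __name__ == "__main__":
    genomes = 10_000        # "tens of thousands" scale, illustrative
    baseline_hours = 30.0   # hypothetical CPU-only time per genome
    print(cohort_days(genomes, baseline_hours))                # no acceleration
    print(cohort_days(genomes, baseline_hours, speedup=50.0))  # best-case 50x
```

Under these assumed inputs, a single node drops from 12,500 days of processing to 250 days. Measured speedups below the quoted maximum would shrink that gap proportionally.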
Mingma Biotechnology has deployed Parabricks Pipelines on NVIDIA T4 Tensor Core GPUs to speed up its sequencing and multi-omic data analysis. The company helps medical institutions, pharmaceutical corporations, and researchers conduct medical research by providing genetic insights to identify and explore the root causes of diseases.
Powering genetic analysis
Genomics Thailand, meanwhile, is using an NVIDIA DGX A100 system at Thailand’s National Biobank in its effort to deliver genomic medicine as a standard healthcare service. The research organisation is using Parabricks Pipelines to analyse genetic variants in whole-genome sequencing data from 50,000 Thai participants.
NVIDIA DGX A100 is the world’s first five-petaFLOPS AI system, providing unprecedented compute density, performance, and flexibility. The NVIDIA DGX A100 includes the world’s most advanced accelerator, the NVIDIA A100 Tensor Core GPU, allowing businesses to combine training, inference, and analytics into a single, easy-to-deploy AI infrastructure with direct access to NVIDIA AI experts.
The combination of the DGX system with Parabricks Pipelines cut the project’s whole-genome data-processing time by four months. The findings of the study will help researchers better understand genetic variation in the Thai population.
In Japan, the Human Genome Center at the University of Tokyo recently introduced SHIROKANE, the country’s fastest supercomputer for life sciences. The DGX A100-powered system uses Parabricks Pipelines for whole-genome sequencing analysis of 92,000 patients, resulting in a database that will serve as the cornerstone for precision medicine efforts in cancer and other complex diseases.
NVIDIA’s Parabricks Pipelines are already a powerful tool in the hands of genetic researchers, and it will be interesting to see how future developments and updates shape their role in genetics and genomics.