As the world prepares for the coronavirus outbreak, Baidu’s AI team has released a tool, LinearFold, that cuts the time needed to predict the RNA secondary structure of 2019-nCoV from 55 minutes to 27 seconds.
The new, or “novel”, coronavirus, now called 2019-nCoV, had not been detected before the outbreak, which was first reported in Wuhan, China, in December 2019. It has now claimed nearly 500 lives, and the whole world is on high alert.
Compared to the SARS (severe acute respiratory syndrome) outbreak in 2003, which infected 8,098 people and killed 774 in 17 countries, 2019-nCoV has a longer incubation period, lasting up to two weeks, and is highly contagious.
Experts now believe that 2019-nCoV will likely continue to mutate, making it unpredictable and harder to control.
As medical experts try to work out a clear defence strategy against this outbreak, Baidu’s AI team has lent a helping hand in the form of LinearFold. According to the researchers, the work on LinearFold was accepted at the top academic conference in bioinformatics and published in the journal Bioinformatics.
According to the researchers, Baidu’s LinearFold takes 27 seconds to analyse the structural information of the virus. This efficiency is crucial for understanding the virus and developing a vaccine.
Overview Of LinearFold
The challenge with existing algorithms for RNA secondary structure prediction is that their runtime scales cubically with the RNA length. This has been a major obstacle to applying them to RNA viruses with large genomes, such as HIV, Ebola and, in particular, the coronavirus family, whose genomes range from 26 to 32 kilobases, the largest of any RNA virus.
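To make the scaling argument concrete, here is a small back-of-the-envelope sketch. The function name and the 3,000-nucleotide comparison length are assumptions chosen for illustration; only the roughly 30-kilobase genome size comes from the range quoted above.

```python
def growth_factor(old_len: int, new_len: int, exponent: int) -> float:
    """Factor by which the work grows when the sequence length goes from
    old_len to new_len, assuming work is proportional to length**exponent."""
    return (new_len / old_len) ** exponent


short_rna = 3_000   # hypothetical shorter RNA, for comparison only
genome = 30_000     # within the 26 to 32 kilobase range of coronaviruses

print(growth_factor(short_rna, genome, 3))  # cubic algorithm: 1000.0
print(growth_factor(short_rna, genome, 1))  # linear algorithm: 10.0
```

A tenfold longer sequence means roughly a thousand times more work for a cubic-time algorithm, but only about ten times more for a linear-time one.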
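According to its authors, LinearFold is the first RNA folding algorithm to achieve linear runtime. Given an RNA sequence x = x1 x2 … xn, where each nucleotide xi ∈ {A, C, G, U}, the secondary structure prediction problem is to find the best-scoring pseudoknot-free structure, that is, a set of base pairs that nest without crossing one another.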
In this framework, a score can be assigned to each type of base pair, and a penalty can be applied to each unpaired nucleotide.
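The scoring framework just described can be sketched with a classic Nussinov-style dynamic program. This is not LinearFold itself but the kind of cubic-time baseline it aims to speed up. The pair scores and the unpaired penalty below are made-up illustrative values, not the thermodynamic or learned parameters that real folding tools use, and the sketch ignores constraints such as minimum hairpin loop length.

```python
# Illustrative scores only (assumptions, not real energy parameters).
PAIR_SCORE = {
    ("C", "G"): 3.0, ("G", "C"): 3.0,
    ("A", "U"): 2.0, ("U", "A"): 2.0,
    ("G", "U"): 1.0, ("U", "G"): 1.0,
}
UNPAIRED_PENALTY = 0.5


def best_score(seq: str) -> float:
    """Best score of any pseudoknot-free structure for seq, in O(n^3) time."""
    n = len(seq)
    # best[i][j] holds the best score for the slice seq[i:j]; empty slices score 0.
    best = [[0.0] * (n + 1) for _ in range(n + 1)]
    for span in range(1, n + 1):
        for i in range(0, n - span + 1):
            j = i + span
            last = j - 1
            # Case 1: the last nucleotide of the slice stays unpaired.
            cand = best[i][last] - UNPAIRED_PENALTY
            # Case 2: the last nucleotide pairs with some earlier position k.
            for k in range(i, last):
                gain = PAIR_SCORE.get((seq[k], seq[last]))
                if gain is not None:
                    cand = max(cand, best[i][k] + gain + best[k + 1][last])
            best[i][j] = cand
    return best[0][n]


print(best_score("GAAAC"))  # 1.5: one G-C pair (3.0) minus three unpaired A's
```

The three nested loops over `span`, `i` and `k` are where the cubic runtime comes from.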
LinearFold borrows incremental parsing techniques from computational linguistics: it scans the RNA sequence from left to right and keeps only the most promising candidate structures at each step, which makes the scan much faster.
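The left-to-right idea can be illustrated with a heavily simplified sketch. It is not the authors' implementation: it omits the dynamic-programming merging of equivalent states that the real algorithm relies on, and the function name, scores and default beam size are assumptions made for illustration. Structures are written in the common dot-bracket notation, where `(` and `)` mark a base pair and `.` marks an unpaired nucleotide.

```python
# Illustrative scores only (assumptions, not real energy parameters).
PAIR_SCORE = {
    ("C", "G"): 3.0, ("G", "C"): 3.0,
    ("A", "U"): 2.0, ("U", "A"): 2.0,
    ("G", "U"): 1.0, ("U", "G"): 1.0,
}
UNPAIRED_PENALTY = 0.5


def beam_fold(seq: str, beam_size: int = 100) -> tuple[float, str]:
    """Scan seq left to right, keeping at most beam_size candidates per step."""
    # A state is (score, positions of still-open brackets, structure so far).
    beam = [(0.0, (), "")]
    for j, base in enumerate(seq):
        candidates = []
        for score, opens, struct in beam:
            # skip: leave nucleotide j unpaired.
            candidates.append((score - UNPAIRED_PENALTY, opens, struct + "."))
            # push: open a bracket at j, provisionally scored as unpaired.
            candidates.append((score - UNPAIRED_PENALTY, opens + (j,), struct + "("))
            # pop: close the most recent open bracket if the two bases can pair.
            if opens:
                gain = PAIR_SCORE.get((seq[opens[-1]], base))
                if gain is not None:
                    # Refund the penalty charged when the bracket was opened.
                    candidates.append(
                        (score + UNPAIRED_PENALTY + gain, opens[:-1], struct + ")")
                    )
        # Beam pruning: keep only the highest-scoring candidates.
        candidates.sort(key=lambda state: state[0], reverse=True)
        beam = candidates[:beam_size]
    score, opens, struct = beam[0]
    # Brackets that were never closed are reported as unpaired nucleotides.
    chars = list(struct)
    for i in opens:
        chars[i] = "."
    return score, "".join(chars)


print(beam_fold("GAAAC"))  # (1.5, '(...)')
```

Each position expands at most three candidates per state in the beam, so the number of states examined grows linearly with the sequence length for a fixed beam size (the string copies here are for readability and would be avoided in a serious implementation). The trade-off is that the search is approximate: with a very small beam, the best structure can be pruned before it is completed.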
Key Takeaways
According to the original paper, the authors list the following advantages of LinearFold:
- LinearFold uses only a fraction of the time and memory of existing algorithms.
- The accuracy improvement of LinearFold is more pronounced on families of longer sequences, such as rRNAs.
- LinearFold is also more accurate than the baselines at predicting long-range base pairs, which are challenging for current models.
- Although the performance of LinearFold depends on the beam size, the accuracy of the prediction is stable.
Genes are often expressed as RNAs (ribonucleic acids). RNAs play a key role in many biochemical reactions, and knowing their structure helps in inferring the role they play.
2019-nCoV belongs to a family of enveloped coronaviruses. Like HIV, Ebola and influenza, these are single-stranded RNA viruses, which tend to mutate faster and make vaccine development more difficult.
Obtaining the sequence of an RNA is therefore key to understanding its function. These sequences can be long and similar over great lengths, and the crucial chunk of a sequence can appear anywhere in the whole structure. Predicting this structure and then, in turn, the function of the RNA will help in designing drugs to control the catalysis of the enzymes that the viruses use for synthesis. Baidu’s LinearFold offers the much-needed accurate yet quick prediction that can cut down the diagnosis time.
Learn more about LinearFold here.