As researchers scour numerous databases to combat the threat of coronavirus, timely access to the right data has become critical. With the urgency of this public health crisis intensifying, it has become imperative that access to reliable public data be made open. Open access, in turn, is likely to foster crucial collaborations within the global research community and yield new insights to tackle the outbreak.
Open datasets — in their original, unabridged form — are essential to obtain a deeper understanding of the current crisis. This data, coupled with technological interventions like AI and natural language processing, has made it possible to improve forecasting models, make valuable predictions, and analyse the impact of the coronavirus.
What is more, given that the crisis is an ongoing problem that throws up new findings on a regular basis, maintaining reliable data assets that researchers can turn to has become paramount. Responding to the urgency of this crisis, several big organisations, including Google and Amazon, have offered researchers free access to their open datasets. These add to a treasure trove of datasets gathered by a coalition of leading research groups and institutes such as Johns Hopkins. Let us take a look at datasets that were recently released on Covid-19:
Google’s Covid-19 Public Dataset Program
Google has opened free access to its repository of Covid-19 public datasets until September 15. The program is aimed at researchers, data scientists, and data analysts for research and educational purposes, and the company has also encouraged them to use BigQuery ML to train advanced ML models for free under this program.
According to Google, this will allow greater participation among researchers as they collaborate to combat this crisis. With the launch of this program, researchers and data scientists can access the data from the Google Cloud Console. Alongside a description of the data, the program also provides sample queries to advance research.
AWS Covid-19 Data Lake
Amazon recently announced that it has made a public AWS data lake around Covid-19 available for free. The company calls it a central repository for ‘up-to-date and curated datasets’ on the disease, one that allows researchers to study and analyse the data efficiently in one place.
Hosted on the AWS cloud, the AWS Covid-19 data lake carries data from Johns Hopkins and The New York Times, along with information from over 45,000 research articles covering the disease. The company claims to be regularly updating this repository as more sources make their data public.
World Bank’s Covid-19 Data Catalog
The World Bank has joined tech companies and other institutions that are maintaining datasets, which researchers can take advantage of to appropriately respond to the Covid-19 pandemic. It has been curating datasets across various sectors, including healthcare and finance.
According to the institution, the datasets in the Covid-19 subset were sourced from World Bank Group research and various publications, as well as through metadata analysis using relevant keywords.
CAS Open Access Dataset
The American Chemical Society’s data division CAS has open-sourced its antiviral dataset to support research into Covid-19 treatments. This dataset carries information on 50,000 compounds that potentially have antiviral properties.
According to CAS, the dataset can enable researchers to use previously published chemical knowledge with emerging technologies like AI to accelerate research on treatments for Covid-19.
EU’s COVID-19 Coronavirus Data
This dataset, maintained by the European Centre for Disease Prevention and Control (ECDC), hosts the latest available data on Covid-19. It includes, but is not limited to, the epidemiological curve and the geographical distribution of cases across the EU and the world.
ECDC has been curating data around the numbers of active Covid-19 cases and related deaths using reports from global health authorities. The institute has been monitoring the outbreak with a keen eye to constantly refine this process, and ensure the accuracy of the data.
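To make the idea of an epidemiological curve and a geographical distribution concrete, here is a minimal Python sketch of how both could be derived from daily case reports. The record layout, the country labels "A" and "B", and all figures are hypothetical and chosen only for illustration; they are not real ECDC data, and the ECDC's actual file format may differ.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily reports: (report date, country, new cases that day).
# These numbers are made up for illustration and are NOT real ECDC figures.
RECORDS = [
    (date(2020, 3, 1), "A", 5),
    (date(2020, 3, 1), "B", 2),
    (date(2020, 3, 2), "A", 8),
    (date(2020, 3, 2), "B", 0),
    (date(2020, 3, 3), "A", 13),
    (date(2020, 3, 3), "B", 4),
]


def epidemic_curve(records):
    """Sum new cases per day across all countries, sorted by date."""
    daily = defaultdict(int)
    for day, _country, cases in records:
        daily[day] += cases
    return sorted(daily.items())


def cases_by_country(records):
    """Sum new cases per country over the whole reporting period."""
    totals = defaultdict(int)
    for _day, country, cases in records:
        totals[country] += cases
    return dict(totals)


if __name__ == "__main__":
    # Print a crude text version of the curve: one '#' per new case.
    for day, cases in epidemic_curve(RECORDS):
        print(day.isoformat(), "#" * cases, cases)
    print(cases_by_country(RECORDS))
```

The first function yields the curve (new cases over time), while the second gives the per-country totals behind a geographical breakdown. With real data, the same grouping would simply run over many more rows.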
Covid-Net
One of the key approaches taken to screen patients infected with Covid-19 has been the use of chest radiography images. This has spurred a number of AI systems that show promising results in accurately detecting Covid-19 infections from these images. Since these deep learning-based AI systems have been closed to the public, Covid-Net makes such solutions and discoveries available to the wider research community. It is a deep convolutional neural network design for the detection of this disease using a dataset comprising chest radiography images. As of now, this dataset contains ‘13,800 chest radiography images across 13,725 patient cases from three open access data repositories.’