Recently Released Datasets For Researchers To Fight Covid-19


As researchers scour numerous databases to combat the threat of coronavirus, timely access to the right data has become critical. With the urgency of this public health crisis intensifying, it has become imperative that reliable public data be made openly accessible. Open access, in turn, is likely to bring about crucial collaborations within the global research community and yield new insights to tackle the outbreak.

Open datasets in their original, unabridged form are essential to a deeper understanding of the current crisis. This data, coupled with technologies like AI and natural language processing, has made it possible to improve forecasting models, make valuable predictions, and analyse the impact of the coronavirus.

What is more, given that the crisis is ongoing and throws up new findings on a regular basis, maintaining reliable data assets that researchers can turn to has become paramount. Responding to this urgency, several big organisations, including Google and Amazon, have offered researchers free access to their open datasets. These add to a treasure trove of datasets gathered by a coalition of leading research groups and institutions like Johns Hopkins. Let us take a look at the datasets that were recently released on Covid-19:


Google’s Covid-19 Public Dataset Program

Google has opened free access to its repository of Covid-19 public datasets until September 15. The program is aimed at researchers, data scientists, and data analysts, for research and educational purposes. The company has also encouraged them to use BigQuery ML to train advanced ML models for free under this program.

According to Google, this will allow greater participation among researchers as they collaborate to combat the crisis. With the launch of this program, researchers and data scientists can access the data from the Google Cloud Console. In addition to a description of the data, the program also provides sample queries to advance research.

AWS Covid-19 Data Lake

Amazon recently announced that it has made a public AWS data lake around Covid-19 available for free. Calling it a central repository for ‘up-to-date and curated datasets’ on the disease, the company says it allows researchers to study and analyse the data efficiently in one place.

Hosted on the AWS cloud, the AWS Covid-19 data lake carries data from Johns Hopkins and The New York Times, along with information from over 45,000 research articles covering the disease. The company claims to be regularly updating this repository as more sources make their data public.

World Bank’s Covid-19 Data Catalog

The World Bank has joined tech companies and other institutions that are maintaining datasets, which researchers can take advantage of to appropriately respond to the Covid-19 pandemic. It has been curating datasets across various sectors, including healthcare and finance.

According to the institution, the datasets in the Covid-19 subset were sourced from World Bank Group research and various publications, as well as through metadata analysis using relevant keywords.

CAS Open Access Dataset

The American Chemical Society’s data division CAS has open-sourced its antiviral dataset to support research into Covid-19 treatments. This dataset carries information on 50,000 compounds that potentially have antiviral properties.

According to CAS, the dataset can enable researchers to use previously published chemical knowledge with emerging technologies like AI to accelerate research on treatments for Covid-19.

EU’s COVID-19 Coronavirus Data

This dataset, maintained by the European Centre For Disease Prevention And Control (ECDC), hosts the latest available data on Covid-19. This comprises, but is not limited to, the epidemiological curve and the geographical distribution of cases across the EU and the world.

ECDC has been curating data on the number of active Covid-19 cases and related deaths using reports from global health authorities. The institute has been monitoring the outbreak closely, constantly refining this process to ensure the accuracy of the data.
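
For readers unfamiliar with the term, an epidemiological curve is simply the count of new cases per reporting date, often shown alongside a running total. The following is a minimal Python sketch of that aggregation. The `REPORTS` records and the `epidemic_curve` helper are hypothetical and made up for illustration; they are not ECDC data, nor the ECDC's actual file format or method.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily case reports: (reporting date, country, new cases).
# These numbers are invented for illustration and are NOT ECDC data.
REPORTS = [
    (date(2020, 4, 1), "A", 10),
    (date(2020, 4, 1), "B", 5),
    (date(2020, 4, 2), "A", 14),
    (date(2020, 4, 2), "B", 7),
    (date(2020, 4, 3), "A", 9),
]


def epidemic_curve(reports):
    """Return (date, new_cases, cumulative_cases) rows in date order."""
    # Sum new cases across all countries for each reporting date.
    daily = defaultdict(int)
    for day, _country, cases in reports:
        daily[day] += cases

    # Walk the dates in order, keeping a running total of cases.
    curve = []
    total = 0
    for day in sorted(daily):
        total += daily[day]
        curve.append((day, daily[day], total))
    return curve


if __name__ == "__main__":
    for day, new_cases, total in epidemic_curve(REPORTS):
        print(day.isoformat(), new_cases, total)
```

Real case data would also need handling for missing days and retrospective corrections, which this sketch leaves out.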

Covid-Net Dataset

One of the key approaches taken to screen Covid-19 infected patients has been to use chest radiography images. This has spurred a number of AI systems that show promising results when it comes to accurately detecting Covid-19 infections using these images. Since most of these deep learning-based AI systems have been closed to the public, Covid-Net aims to make such solutions and discoveries available to the wider research community. It is a deep convolutional neural network design for the detection of this disease using a dataset comprising chest radiography images. As of now, this dataset contains ‘13,800 chest radiography images across 13,725 patient cases from three open access data repositories.’

Anu Thomas
Anu is a writer who stews in existential angst and actively seeks what’s broken. Lover of avant-garde films and BoJack Horseman fan theories, she has previously worked for Economic Times.
