Top Valuable Datasets For COVID-19 Researchers

COVID-19 developments are unfolding so quickly that maintaining reliable data assets has become increasingly urgent, and the medical research community has struggled to keep up. The freely available datasets below are offered to the global research community to produce new insights as the world continues its fight against COVID-19.

Here, we look at what these data assets are and where they can be found:

Visual Dashboard Dataset

This is the data repository for the Coronavirus Visual Dashboard, managed by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). Many organizations have used it extensively to track the geographic spread of the epidemic. The dataset is also supported by the ESRI Living Atlas Team and the Johns Hopkins University Applied Physics Lab (JHU APL).
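As a rough illustration of how case-tracking data of this kind might be prepared for analysis, the sketch below reshapes a small table from a wide layout (one column per date) into one record per location and date, then sums counts per date. The column names, the layout, and every value in the sample are assumptions made for the example, not an excerpt from the repository.

```python
import csv
import io

# Hypothetical sample in a wide layout: one row per location, one column per date.
# All names and numbers here are made up for illustration.
SAMPLE_CSV = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20
,Exampleland,10.0,20.0,0,2,5
North,Otherland,30.0,40.0,1,1,3
"""

# Columns that describe the location rather than a date (assumed names).
ID_COLUMNS = ("Province/State", "Country/Region", "Lat", "Long")


def wide_to_long(text):
    """Return one record per (location, date) pair from wide-format CSV text."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        for column, value in row.items():
            if column in ID_COLUMNS:
                continue  # skip location metadata, keep only date columns
            records.append({
                "country": row["Country/Region"],
                "province": row["Province/State"],
                "date": column,
                "count": int(value),
            })
    return records


def totals_by_date(records):
    """Sum the counts of all locations for each date."""
    totals = {}
    for record in records:
        totals[record["date"]] = totals.get(record["date"], 0) + record["count"]
    return totals


if __name__ == "__main__":
    for date, total in totals_by_date(wide_to_long(SAMPLE_CSV)).items():
        print(date, total)
```

The long form is usually easier to filter, group, and plot than a table that grows a new column every day.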

Research Articles Dataset

In response to the COVID-19 pandemic, the Allen Institute for AI, the White House, and a coalition of leading research groups developed the COVID-19 Open Research Dataset (CORD-19). CORD-19 comprises over 47,000 scholarly articles, including over 36,000 with full text, about COVID-19, SARS-CoV-2, and related coronaviruses.

CORD-19 is currently the most comprehensive machine-readable collection of coronavirus literature available for data mining. The Allen Institute for AI produced the dataset in cooperation with Microsoft Research, Georgetown University’s Center for Security and Emerging Technology, the Chan Zuckerberg Initiative, and the National Institutes of Health, in coordination with the White House Office of Science and Technology Policy in the US.

The World Health Organization (WHO) has also been gathering the latest scientific findings and knowledge on COVID-19 and organizing them in a database. WHO updates the database daily through searches of bibliographic databases, manual searches of the tables of contents of relevant scientific journals, and the addition of other relevant scientific articles. The database is not static: new research is added every day.
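To give a sense of what mining a machine-readable literature collection can look like, here is a minimal sketch of a case-insensitive phrase search over article metadata. The records and the field names (`title`, `abstract`, `has_full_text`) are hypothetical stand-ins chosen for the example; they are not taken from CORD-19 itself.

```python
import re

# Made-up records standing in for a literature metadata table.
# Field names are assumptions for this sketch.
PAPERS = [
    {"title": "Transmission dynamics of a novel coronavirus",
     "abstract": "We estimate the incubation period.",
     "has_full_text": True},
    {"title": "Imaging findings in viral pneumonia",
     "abstract": "Chest CT features are described.",
     "has_full_text": False},
    {"title": "Incubation period estimates",
     "abstract": "",
     "has_full_text": True},
]


def search(papers, phrase, full_text_only=False):
    """Return titles of papers whose title or abstract contains the phrase."""
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    hits = []
    for paper in papers:
        if full_text_only and not paper["has_full_text"]:
            continue  # optionally restrict results to papers with full text
        if pattern.search(paper["title"] + " " + paper["abstract"]):
            hits.append(paper["title"])
    return hits


if __name__ == "__main__":
    print(search(PAPERS, "incubation period"))
```

A real text-mining pipeline would go well beyond literal phrase matching, but the same load, filter, and match structure tends to be the starting point.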

Scan Images Dataset

The British Society of Thoracic Imaging (BSTI), in partnership with Cimar UK’s Imaging Cloud Technology (cimar.co.uk), built and deployed an anonymized, encrypted web portal for submitting and consulting images of patients with confirmed COVID-19. From these submissions, BSTI hopes to build an imaging database of confirmed UK patient cases for reference and teaching. The intention is to quickly disseminate clinical and diagnostic information to frontline healthcare workers in the UK.

Lan Dao, Joseph Paul Cohen and Paul Morrison from the University of Montreal have also created a database of reported COVID-19 cases with chest X-ray or CT images. The database contains images drawn from publications and has been released publicly in a GitHub repository. The researchers say the goal is to use these images to develop AI-based approaches to better predict and understand the infection.

Twitter Data

The repository comprises an ongoing collection of Tweet IDs associated with the novel coronavirus COVID-19 (SARS-CoV-2). Collection began on January 28, 2020. Emily Chen from the University of Southern California used Twitter’s search API to retrieve Tweets from the preceding seven days, so the earliest Tweets in the dataset date back to January 22, 2020. Twitter’s streaming API was then used to follow specified accounts and to collect real-time Tweets mentioning specific keywords. To comply with Twitter’s Terms of Service, only the Tweet IDs of the collected Tweets are released publicly, for non-commercial research use.
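Because only Tweet IDs are released, anyone reusing the data has to look the Tweets up again before analysis. The sketch below covers just the preparatory step: cleaning a list of IDs and splitting it into fixed-size batches for later lookup. The one-ID-per-line input format and the batch size of 100 are assumptions for illustration, and no call to any Twitter API is made here.

```python
# Hypothetical batch size; check the limits of whichever lookup service is used.
BATCH_SIZE = 100


def read_tweet_ids(lines):
    """Return unique numeric Tweet IDs, in first-seen order, from an iterable of lines."""
    seen = set()
    ids = []
    for line in lines:
        tweet_id = line.strip()
        if not tweet_id.isdigit() or tweet_id in seen:
            continue  # skip blanks, malformed entries, and duplicates
        seen.add(tweet_id)
        ids.append(tweet_id)
    return ids


def batch(items, size=BATCH_SIZE):
    """Split a list into consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


if __name__ == "__main__":
    # Made-up IDs; a real run would read lines from a downloaded ID file.
    sample = ["1001\n", "1002\n", "\n", "not-an-id\n", "1002\n", "1003"]
    for chunk in batch(read_tweet_ids(sample), size=2):
        print(",".join(chunk))
```

Keeping IDs as strings avoids any loss of precision if they are later passed through tools that treat large integers as floating-point numbers.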

Genome Sequences Data

Laboratories around the world are generating and sharing a growing number of hCoV-19 genome sequences, along with clinical and epidemiological data associated with the novel coronavirus, through GISAID. These genome sequences are essential for developing and assessing diagnostic tests, tracking and tracing the ongoing outbreak, and identifying possible intervention options. The GISAID initiative supports the global sharing of all influenza virus sequences and of the associated clinical and epidemiological data linked with human viruses to help researchers.

Vishal Chawla

Vishal Chawla is a senior tech journalist at Analytics India Magazine and writes about AI, data analytics, cybersecurity, cloud computing, and blockchain. Vishal also hosts AIM's video podcast, Simulated Reality, featuring tech leaders, AI experts, and innovative startups of India.