In a recent post on its Big Data Blog, AWS announced that it has made a public ‘AWS COVID-19 data lake’ available for free to help fight the disease. According to the post, the data lake is a centralised repository of up-to-date, curated datasets on or related to the spread and characteristics of the novel coronavirus and the disease it causes, COVID-19.
The blog stated: “As the COVID-19 pandemic continues to threaten and take lives around the world, we must work together across organisations and scientific disciplines to fight this disease. Innumerable healthcare workers, medical researchers, scientists, and public health officials are already on the front lines caring for patients, searching for therapies, educating the public, and helping to set policy.”
It further stated, “At AWS, we believe that one way we can help is to provide these experts with the data and tools needed to better understand, track, plan for, and eventually contain and neutralise the virus that causes COVID-19.”
AWS said it has been working with its partners to make this crucial data, which is hosted on the AWS cloud, freely available and to keep it up to date. The company has seeded the curated data lake with COVID-19 case tracking data from Johns Hopkins and The New York Times, hospital bed availability from Definitive Healthcare, and over 45,000 research articles about COVID-19 and related coronaviruses from the Allen Institute for AI.
“We will regularly add to this data lake as other reliable sources make their data publicly available,” said the company on the blog.
AWS’s public COVID-19 data lake is designed to let users quickly run analyses on the data in place, without spending time extracting and wrangling data from each of the available sources. Users can also use AWS or third-party tools to perform trend analysis, run keyword searches, perform question/answer analysis, build and run machine learning models, or run custom analyses to meet their specific needs.
Users can choose to work with the public data lake as it is, combine it with their own data, or subscribe to the source datasets directly through AWS Data Exchange.
AWS expects local health authorities to build dashboards to track infections and to collaborate on deploying vital resources such as hospital beds and ventilators efficiently. Epidemiologists could also use the data lake to complement their own models and datasets and generate better forecasts of hotspots and trends.
To access the data lake, users need an AWS account and permissions to create an AWS CloudFormation stack and AWS Glue resources.
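For readers who script their AWS setup, the stack creation step might look roughly like the Python sketch below, which uses the boto3 SDK. It is an illustration only, not AWS’s published procedure: the stack name and template URL are hypothetical placeholders (the announcement as quoted here gives neither), and the capability flag is an assumption about what such a template could require.

```python
from typing import Any, Dict

# Hypothetical placeholders: replace with the stack name you want and the
# S3 URL of the template that AWS publishes for the data lake.
STACK_NAME = "covid-lake-example"
TEMPLATE_URL = "https://example.com/path/to/data-lake-template.json"


def build_stack_request(stack_name: str, template_url: str) -> Dict[str, Any]:
    """Assemble the keyword arguments for CloudFormation's create_stack call."""
    if not stack_name or not template_url:
        raise ValueError("stack_name and template_url are both required")
    return {
        "StackName": stack_name,
        "TemplateURL": template_url,
        # Assumption: the template may create named IAM resources, which
        # CloudFormation only allows when this capability is acknowledged.
        "Capabilities": ["CAPABILITY_NAMED_IAM"],
    }


def deploy(stack_name: str = STACK_NAME, template_url: str = TEMPLATE_URL) -> str:
    """Create the stack and return its ID.

    Requires AWS credentials whose permissions cover creating a
    CloudFormation stack and AWS Glue resources.
    """
    import boto3  # imported here so build_stack_request works without boto3

    client = boto3.client("cloudformation")
    response = client.create_stack(**build_stack_request(stack_name, template_url))
    return response["StackId"]
```

Calling `deploy()` with real values submits the request; if the account lacks the permissions mentioned above, the call fails with an access error instead of creating the stack.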