Last updated January 10, 2022

The two papers awarded under the Datasets & Benchmarks track at NeurIPS 2021

One of the highlights of NeurIPS 2021 was the introduction of a new award category: the Datasets and Benchmarks track.

Published on December 18, 2021

by Sohini Das

The 35th edition of NeurIPS (Neural Information Processing Systems), one of the world’s most prestigious industry and academic gatherings, recently concluded. NeurIPS 2021 received 9,122 submissions, of which 2,344 (about 26 per cent) were accepted, with 3 per cent designated as spotlight papers. The acceptance rate was a slight increase from last year and the highest since 2013.

One of the highlights of this year’s conference was the introduction of a new award category: the Datasets and Benchmarks track. Two papers received awards under this category.

Idea behind announcing a new category

NeurIPS wrote in a blog post that the Datasets and Benchmarks track would act as a novel venue for high-quality publications and talks on valuable ML datasets and benchmarks. It would also serve as a forum for discussions on how to improve dataset development. Datasets and benchmarks are important for the development of machine learning methods but require their own reviewing guidelines. They also require additional specific checks, such as a proper description of the collected data with respect to accessibility and bias. Submissions to this track were reviewed according to a set of criteria designed specifically for datasets and benchmarks.

The following two papers were recognised in the new category of Datasets & Benchmarks Best Paper Awards:

Reduced, Reused and Recycled: The Life of a Dataset in Machine Learning Research

This paper was published by a group of researchers from the University of California, Los Angeles, and Google Research. It explores the use of datasets within different machine learning subcommunities and the interaction between dataset adoption and creation. It calls for researchers to select benchmark datasets with greater care and to promote the creation of new and more diverse datasets.

The paper found that despite the foundational role of benchmarking practices in ML research, little attention has been paid to the dynamics of benchmark dataset use and reuse. The researchers studied how usage patterns differed across ML subcommunities between 2015 and 2020. They found increasing concentration on fewer datasets within task communities, adoption of datasets from other tasks, and concentration across the field on datasets introduced by researchers at a small number of elite institutions. The results of this study have implications for scientific evaluation, AI ethics, and equity/access within the field.

ATOM3D: Tasks on Molecules in Three Dimensions

The ATOM3D database contains datasets that describe the three-dimensional structure of biomolecules, including proteins, small molecules, and nucleic acids. They represent a variety of important structural, functional, and engineering challenges and serve as a benchmark for machine learning methods that operate on molecular structure. A Python package is also provided with all datasets, including processing code, utilities, models, and data loaders for common machine learning frameworks such as PyTorch. ATOM3D’s datasets are updated as the field progresses, and tasks are added according to the project’s needs.

At the moment, ATOM3D contains eight datasets, which can roughly be categorised into four sections. Together they cover a wide range of problems, from single molecular structures and interactions between biomolecules to molecular functional properties and design/engineering tasks.

