The two papers awarded under the Datasets and Benchmarks track at NeurIPS 2021

One of the highlights of NeurIPS 2021 was the introduction of a new award category for the Datasets and Benchmarks track.

The 35th edition of NeurIPS (Neural Information Processing Systems), one of the world’s most prestigious industry and academic gatherings, recently concluded. NeurIPS 2021 received 9,122 submissions, of which 2,344 were accepted. That is an acceptance rate of about 26 per cent (with 3 per cent designated as spotlight papers), a slight increase from last year and the highest since 2013.
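The acceptance figure follows directly from the two counts quoted above. As a quick check, a minimal sketch (the function name is ours, purely for illustration):

```python
def acceptance_rate(accepted: int, submissions: int) -> float:
    """Return the fraction of submissions that were accepted."""
    if submissions <= 0:
        raise ValueError("submissions must be positive")
    return accepted / submissions


# Counts as reported for NeurIPS 2021 in the article.
rate = acceptance_rate(accepted=2344, submissions=9122)
print(f"Acceptance rate: {rate:.1%}")  # rounds to roughly 26 per cent
```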

One of the highlights of this year’s conference was the introduction of a new award category for the Datasets and Benchmarks track. Two papers were awarded under this category.

The idea behind the new category

NeurIPS wrote in a blog post that the Datasets and Benchmarks track would act as a new venue for high-quality publications and talks on valuable ML datasets and benchmarks. It would also serve as a forum for discussions on how to improve dataset development. Datasets and benchmarks are important for the development of machine learning methods, but they require their own reviewing guidelines. They also call for additional checks, such as a proper description of how the data was collected, whether it is accessible, and whether it contains biases. Submissions to this track were reviewed according to a set of criteria designed specifically for datasets and benchmarks.

The following two papers were recognised in the new category of Datasets & Benchmarks Best Paper Awards:

Reduced, Reused and Recycled: The Life of a Dataset in Machine Learning Research

This paper was published by a group of researchers from the University of California, Los Angeles, and Google Research. It explores how datasets are used within different machine learning subcommunities and how dataset adoption interacts with dataset creation. It calls on researchers to select benchmark datasets with greater care and to promote the creation of new and more diverse datasets.

The paper notes that, despite the foundational role of benchmarking practices in ML research, little attention has been paid to the dynamics of benchmark dataset use and reuse. The researchers studied how usage patterns differed across ML subcommunities from 2015 to 2020. They found increasing concentration on fewer datasets within task communities, adoption of datasets from other tasks, and concentration across the field on datasets introduced by researchers at a small number of elite institutions. The results have implications for scientific evaluation, AI ethics, and equity/access within the field.
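To make the idea of "concentration on fewer datasets" concrete, here is a minimal sketch of one way such concentration could be measured. It assumes the Gini coefficient as the metric, which is an illustrative choice here rather than a restatement of the paper's exact method, and the usage counts and dataset names below are hypothetical, not taken from the study.

```python
def gini(counts):
    """Gini coefficient of non-negative usage counts.

    0.0 means usage is spread evenly across datasets; values close to
    (n - 1) / n mean almost all usage falls on a single dataset.
    """
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Standard formula on ascending values:
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with i starting at 1
    weighted = sum(i * x for i, x in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical number of papers using each benchmark in one task community.
usage_even = {"dataset_a": 25, "dataset_b": 25, "dataset_c": 25, "dataset_d": 25}
usage_skewed = {"dataset_a": 94, "dataset_b": 3, "dataset_c": 2, "dataset_d": 1}

print("even usage:  ", round(gini(usage_even.values()), 3))
print("skewed usage:", round(gini(usage_skewed.values()), 3))
```

A higher value for the skewed community reflects the pattern the researchers describe, in which a handful of datasets account for most of the benchmarking activity.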

ATOM3D: Tasks on Molecules in Three Dimensions

The ATOM3D database contains datasets that describe the three-dimensional structure of biomolecules, including proteins, small molecules, and nucleic acids. The datasets represent a variety of important structural, functional, and engineering challenges and serve as a benchmark for machine learning methods that operate on molecular structure. All datasets come with a Python package that includes processing code, utilities, models, and data loaders for common machine learning frameworks such as PyTorch. ATOM3D’s datasets are updated as the field progresses, and tasks are added according to the project’s needs.

At the moment, ATOM3D contains eight datasets, which can roughly be grouped into four categories. Together they cover problems ranging from single molecular structures and interactions between biomolecules to molecular functional properties and design/engineering tasks.

Sohini Das

Sohini graduated from the University of Kalyani with a master's degree in nanosciences and nanotechnology. She hopes to become a tech journalist one day. Her work focuses on digital transformation, geopolitics, and emerging technologies.