5 Best ML Research Papers At ICML 2021

In ICML 2020, Google topped the charts of total research papers submitted, followed by DeepMind, Microsoft, Facebook, and Spotify.

The International Conference on Machine Learning (ICML) has announced its best paper awards. The 38th edition of ICML, one of the fastest-growing artificial intelligence conferences in the world, saw participation from academics, industrial researchers, entrepreneurs, engineers, graduate students and postdocs.


ICML is renowned for presenting and publishing cutting-edge research on all aspects of machine learning.

Last year, the ICML conference attracted close to 4,990 submissions, of which 1,088 were accepted, for an acceptance rate of 21.8%, lower than the previous year’s 22.6%. In ICML 2020, Google topped the charts of total research papers submitted, followed by DeepMind, Microsoft, Facebook, and Spotify.

Here is the list of papers that won ICML 2021 awards: 

Outstanding paper 

Unbiased Gradient Estimation In Unrolled Computation Graphs With Persistent Evolution 

Researchers from Google Brain and the University of Toronto, Paul Vicol, Luke Metz and Jascha Sohl-Dickstein, introduced a method for unbiased gradient estimation in unrolled computation graphs, called Persistent Evolution Strategies (PES).

PES obtains gradients from truncated unrolls, which speeds up optimisation by allowing for frequent parameter updates while not suffering from truncation bias that affects many competing approaches. The researchers showed PES is broadly applicable, with experiments demonstrating its application to an RNN-like task, hyperparameter optimisation, reinforcement learning, and meta-training of learned optimisers. 
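To make the idea concrete, here is a minimal sketch of a PES-style estimator in plain Python. It is an illustration under stated assumptions rather than the authors' implementation: the scalar inner problem in `unroll`, the function names, and all hyperparameter values are hypothetical. Each particle keeps its own state and a running sum of the perturbations it has received since the start of the unroll; weighting the truncated loss by that running sum, rather than by the latest perturbation alone, is what is meant to correct for truncation bias.

```python
import random


def unroll(state, theta, steps, target=1.0, decay=0.9):
    """Hypothetical inner problem: a scalar state is driven towards `target`."""
    loss = 0.0
    for _ in range(steps):
        state = decay * state + theta
        loss += (state - target) ** 2
    return state, loss


def pes_gradient(theta, states, xis, trunc_len, sigma, rng):
    """One gradient estimate from a truncated unroll (antithetic pairs).

    `states` and `xis` persist across calls and are updated in place;
    `xis[j]` accumulates every perturbation particle j has seen so far.
    The number of particles is assumed to be even.
    """
    n = len(states)
    grad = 0.0
    for i in range(0, n, 2):
        eps = rng.gauss(0.0, sigma)
        for j, e in ((i, eps), (i + 1, -eps)):
            states[j], loss = unroll(states[j], theta + e, trunc_len)
            xis[j] += e
            grad += xis[j] * loss
    return grad / (n * sigma ** 2)


def train(total_steps=20, trunc_len=5, n_particles=16, sigma=0.1,
          lr=1e-4, outer_iters=100, seed=0):
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(outer_iters):
        # Restart the inner problem: reset particle states and accumulators.
        states = [0.0] * n_particles
        xis = [0.0] * n_particles
        for _ in range(total_steps // trunc_len):
            # Update theta after every truncated unroll, not only at the end.
            theta -= lr * pes_gradient(theta, states, xis, trunc_len, sigma, rng)
    return theta
```

Calling `train()` and then comparing `unroll(0.0, theta, 20)` with `unroll(0.0, 0.0, 20)` shows whether the full-horizon loss has dropped relative to the starting parameter.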

Outstanding paper honorable mention

Oops I took a gradient: Scalable sampling for discrete distributions

Google Brain researchers Will Grathwohl, Kevin Swersky, Milad Hashemi, David Duvenaud and Chris J. Maddison proposed a general and scalable approximate sampling strategy for probabilistic models with discrete variables. Their approach uses gradients of the likelihood function with respect to its discrete inputs to propose updates in a Metropolis-Hastings sampler.

The researchers showed empirically that this approach outperforms generic samplers in many complex settings, including Ising models, Potts models, restricted Boltzmann machines, and factorial hidden Markov models. They also demonstrated the use of their improved sampler for training deep energy-based models (EBMs) on high-dimensional discrete data. Further, this approach outperforms variational auto-encoders and existing EBMs.
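The gradient-informed proposal can be sketched for a small Ising model as follows. This is a simplified illustration, not the paper's code: the ring topology, the coupling and field values, and the function names are assumptions. For binary inputs, the gradient of the log-probability gives an estimate of how much each single-bit flip would change the log-probability; a softmax over half of those estimates picks the bit to flip, and a standard Metropolis-Hastings test accepts or rejects the move.

```python
import math
import random


def log_prob(x, coupling=0.5, field=0.1):
    """Unnormalised log-probability of a ring Ising model over bits in {0, 1}."""
    z = [2 * b - 1 for b in x]
    n = len(z)
    return sum(coupling * z[i] * z[(i + 1) % n] + field * z[i] for i in range(n))


def grad_log_prob(x, coupling=0.5, field=0.1):
    """Gradient of log_prob with respect to x, treating x as continuous."""
    z = [2 * b - 1 for b in x]
    n = len(z)
    return [2 * (coupling * (z[i - 1] + z[(i + 1) % n]) + field) for i in range(n)]


def flip_scores(x):
    """First-order estimate of the log-prob change from flipping each bit."""
    return [-(2 * b - 1) * g for b, g in zip(x, grad_log_prob(x))]


def flip_proposal(x):
    """Probability of proposing each bit: softmax of half the flip scores."""
    d = flip_scores(x)
    m = max(d)
    w = [math.exp((di - m) / 2) for di in d]
    total = sum(w)
    return [wi / total for wi in w]


def sampler_step(x, rng):
    """One Metropolis-Hastings step with the gradient-informed proposal."""
    q_fwd = flip_proposal(x)
    i = rng.choices(range(len(x)), weights=q_fwd)[0]
    y = list(x)
    y[i] = 1 - y[i]
    q_rev = flip_proposal(y)
    log_accept = (log_prob(y) - log_prob(x)
                  + math.log(q_rev[i]) - math.log(q_fwd[i]))
    if rng.random() < math.exp(min(0.0, log_accept)):
        return y
    return x
```

For this particular model the log-probability is linear in each individual bit, so the first-order flip scores coincide with the exact change; for deeper models they are only an approximation, which the accept/reject step accounts for.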

Optimal complexity in decentralised training 

Researchers at Cornell University, Yucheng Lu and Christopher De Sa, showed how decentralisation is a promising method of scaling up parallel machine learning systems. The researchers provided a tight lower bound on the iteration complexity for such methods in a stochastic non-convex setting.

The paper stated that the lower bound revealed a theoretical gap in the known convergence rates of many existing decentralised training algorithms, such as D-PSGD. The researchers proved the lower bound is tight and achievable.

The researchers further proposed DeTAG, a practical gossip-style decentralised algorithm that achieves the lower bound with only a logarithmic gap. Empirically, they compared DeTAG with other decentralised algorithms on image classification tasks and noted that DeTAG enjoys faster convergence than baselines, especially on unshuffled data and sparse networks.
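For readers unfamiliar with gossip-style training, the sketch below shows a D-PSGD-style baseline update, not DeTAG itself. Everything in it is an illustrative assumption: workers sit on a ring, each holds a scalar model and a private quadratic objective with its own target (a stand-in for unshuffled, heterogeneous data), and each step averages a worker's model with its two neighbours before applying the worker's local stochastic gradient.

```python
import random


def ring_mixing_matrix(n):
    """Doubly stochastic weights: each worker averages itself and two neighbours."""
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            weights[i][j % n] += 1.0 / 3.0
    return weights


def dpsgd(targets, steps=200, lr=0.1, noise_std=0.0, seed=0):
    """Gossip averaging plus local gradient steps on f_i(x) = 0.5 * (x - target_i)^2."""
    rng = random.Random(seed)
    n = len(targets)
    weights = ring_mixing_matrix(n)
    models = [0.0] * n
    for _ in range(steps):
        # Local stochastic gradients, evaluated at each worker's own model.
        grads = [models[i] - targets[i] + rng.gauss(0.0, noise_std) for i in range(n)]
        # Communication: every worker only talks to its ring neighbours.
        mixed = [sum(weights[i][j] * models[j] for j in range(n)) for i in range(n)]
        models = [mixed[i] - lr * grads[i] for i in range(n)]
    return models
```

With `noise_std=0.0`, the average of the returned models approaches the mean of `targets`, while the individual workers still disagree slightly because their local objectives differ. Sparse topologies such as this ring mix information slowly, which is the regime where the article reports DeTAG doing best.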

Understanding self-supervised learning dynamics without contrastive pairs

Facebook AI researchers Yuandong Tian, Xinlei Chen, and Surya Ganguli examined several self-supervised learning (SSL) methods and proposed a novel, theoretically motivated approach, DirectPred, that directly sets the linear predictor based on the statistics of its inputs, without gradient training.

On the ImageNet dataset, DirectPred performed comparably with more complex two-layer non-linear predictors that employ BatchNorm, and outperformed a linear predictor by 2.5 percent in 300-epoch training (and by 5 percent in 60-epoch training). The researchers said DirectPred is motivated by their theoretical study of the non-linear learning dynamics of non-contrastive SSL in simple linear networks.

Further, the study offered conceptual insights into how non-contrastive SSL methods learn, how they avoid representational collapse, and how multiple factors, like predictor networks, stop-gradients, exponential moving averages, and weight decay, all come into play. Finally, the researchers said their simple theory recapitulates the results of real-world ablation studies on both STL-10 and ImageNet. The source code is released on GitHub.


Solving high-dimensional parabolic PDEs using the tensor train format

Lorenz Richter, Leon Sallandt, and Nikolas Nüsken showed that tensor trains provide an appealing approximation framework for parabolic partial differential equations (PDEs). Combining reformulations in terms of backward stochastic differential equations with regression-type methods in the tensor format holds the promise of leveraging latent low-rank structures, enabling both compression and efficient computation.

In line with this, the researchers developed novel iterative schemes involving either explicit and fast or implicit and accurate updates. Their methods achieve a favourable trade-off between accuracy and computational efficiency compared with state-of-the-art neural network-based approaches.
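The compression that the tensor train format offers can be illustrated with a toy example, separate from the PDE solvers in the paper. In this format, each entry of a d-dimensional tensor is computed as a product of small matrices, one per dimension. The sketch below builds rank-2 cores for the hypothetical tensor whose entry at index (i1, ..., id) is i1 + ... + id; the function names and the choice of tensor are assumptions made for illustration.

```python
def sum_tensor_cores(dim, mode_size):
    """Tensor train cores (rank 2) for A[i1, ..., id] = i1 + ... + id.

    Each core maps an index value to a small matrix; `dim` must be at least 2.
    """
    first = [[[float(i), 1.0]] for i in range(mode_size)]                 # 1 x 2
    middle = [[[1.0, 0.0], [float(i), 1.0]] for i in range(mode_size)]    # 2 x 2
    last = [[[1.0], [float(i)]] for i in range(mode_size)]                # 2 x 1
    return [first] + [middle] * (dim - 2) + [last]


def tt_entry(cores, index):
    """Evaluate one tensor entry as a chain of vector-matrix products."""
    vec = [1.0]
    for core, i in zip(cores, index):
        mat = core[i]
        vec = [sum(vec[a] * mat[a][b] for a in range(len(vec)))
               for b in range(len(mat[0]))]
    return vec[0]


def tt_num_params(cores):
    """Total number of stored values across all cores."""
    return sum(len(mat) * len(mat[0]) for core in cores for mat in core)


cores = sum_tensor_cores(dim=10, mode_size=4)
print(tt_entry(cores, (0, 1, 2, 3, 0, 1, 2, 3, 0, 1)))  # 13.0
print(tt_num_params(cores), "values stored instead of", 4 ** 10)  # 144 vs 1048576
```

The storage cost grows linearly with the number of dimensions for a fixed rank, rather than exponentially, which is why low-rank structure matters in high-dimensional problems.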

Amit Raja Naik
Amit Raja Naik is a seasoned technology journalist who covers everything from data science to machine learning and artificial intelligence for Analytics India Magazine, where he examines the trends, challenges, ideas, and transformations across the industry.
