5 Best ML Research Papers At ICML 2021

In ICML 2020, Google topped the charts of total research papers submitted, followed by DeepMind, Microsoft, Facebook, and Spotify.

The International Conference on Machine Learning (ICML) has announced its best paper awards. The 38th edition of ICML, one of the fastest-growing artificial intelligence conferences in the world, saw participation from academics, industrial researchers, entrepreneurs, engineers, graduate students and postdocs.

https://twitter.com/icmlconf/status/1417110371161317378

ICML is renowned for presenting and publishing cutting-edge research on all aspects of machine learning.

Last year, the ICML conference attracted close to 4,990 submissions, of which 1,088 were accepted at a 21.8% acceptance rate, lower than the previous year’s 22.6%. In ICML 2020, Google topped the charts of total research papers submitted, followed by DeepMind, Microsoft, Facebook, and Spotify.

Here is the list of papers that won ICML 2021 awards: 

Outstanding paper 

Unbiased Gradient Estimation In Unrolled Computation Graphs With Persistent Evolution Strategies

Researchers from Google Brain and the University of Toronto, Paul Vicol, Luke Metz and Jascha Sohl-Dickstein, introduced a method for unbiased gradient estimation in unrolled computation graphs, called Persistent Evolution Strategies (PES).

PES obtains gradients from truncated unrolls, which speeds up optimisation by allowing for frequent parameter updates while not suffering from truncation bias that affects many competing approaches. The researchers showed PES is broadly applicable, with experiments demonstrating its application to an RNN-like task, hyperparameter optimisation, reinforcement learning, and meta-training of learned optimisers. 
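
To give a flavour of the method, below is a minimal sketch of a PES-style estimator on a toy unrolled system: per-segment perturbations are accumulated in a persistent buffer so that each truncated segment still contributes an unbiased piece of the full-unroll gradient. The toy dynamics, dimensions and constants are illustrative assumptions, not the authors' experimental setup.

```python
import numpy as np

# Minimal sketch of a Persistent Evolution Strategies (PES) style estimator on
# a toy unrolled system. The dynamics, loss, and all constants below are
# illustrative assumptions, not the paper's experiments.

rng = np.random.default_rng(0)
D = 5              # number of parameters theta
N = 256            # number of antithetic sample pairs
SIGMA = 0.1        # perturbation scale
K = 4              # truncation length (steps per unroll segment)
T = 20             # total unroll length

def step(state, theta):
    """Toy inner dynamics: one unroll step returns (new_state, loss)."""
    new_state = 0.9 * state + theta
    loss = np.sum(new_state ** 2, axis=-1)
    return new_state, loss

def pes_gradient(theta):
    """Estimate d(total loss)/d(theta) from truncated unrolls."""
    # Antithetic pairs: particle i uses +eps, particle N+i uses -eps.
    states = np.zeros((2 * N, D))
    xi = np.zeros((2 * N, D))          # persistent accumulated perturbations
    grad = np.zeros(D)
    for start in range(0, T, K):
        eps = SIGMA * rng.standard_normal((N, D))
        pert = np.concatenate([eps, -eps])   # fresh perturbation per segment
        xi += pert                           # accumulate across segments
        seg_loss = np.zeros(2 * N)
        for _ in range(K):
            states, loss = step(states, theta + pert)
            seg_loss += loss
        # PES estimate for this segment; summing over segments approximates
        # the gradient of the full-unroll loss (up to Monte-Carlo noise).
        grad += (xi * seg_loss[:, None]).mean(axis=0) / SIGMA ** 2
    return grad

print(pes_gradient(np.full(D, 0.5)))
```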

Check out the full research paper here

Outstanding paper honorable mention

Oops I took a gradient: Scalable sampling for discrete distributions

Google Brain researchers Will Grathwohl, Kevin Swersky, Milad Hashemi, David Duvenaud and Chris J. Maddison proposed a general and scalable approximate sampling strategy for probabilistic models with discrete variables. Their approach uses gradients of the likelihood function with respect to its discrete inputs to propose updates in a Metropolis-Hastings sampler.

The researchers showed empirically that this approach outperforms generic samplers in many complex settings, including Ising models, Potts models, restricted Boltzmann machines, and factorial hidden Markov models. They also demonstrated the use of their improved sampler for training deep energy-based models (EBMs) on high-dimensional discrete data, where it outperforms variational auto-encoders and existing EBM approaches.
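
To make the idea concrete, here is a minimal sketch of a gradient-informed Metropolis-Hastings step for binary variables in the spirit of the paper: a first-order estimate of the change in log-probability from flipping each bit defines the proposal, followed by a standard accept/reject correction. The Ising-like energy, sizes and constants are illustrative assumptions.

```python
import torch

# Sketch of a gradient-informed Metropolis-Hastings step for binary variables.
# The Ising-like model and all constants below are illustrative assumptions.

torch.manual_seed(0)
D = 16
J = torch.randn(D, D) * 0.1
J = (J + J.T) / 2                      # symmetric couplings

def log_prob(x):
    """Unnormalised log-probability f(x) of a small Ising-like model."""
    return torch.einsum("...i,ij,...j->...", x, J, x)

def flip_scores(x):
    """First-order estimate of f(x with bit i flipped) - f(x), for every i."""
    x = x.clone().requires_grad_(True)
    f_x = log_prob(x)
    (grad,) = torch.autograd.grad(f_x, x)
    return f_x.detach(), -(2.0 * x.detach() - 1.0) * grad

def sampler_step(x):
    """One step: gradient-informed flip proposal + Metropolis-Hastings test."""
    f_x, delta = flip_scores(x)
    q_fwd = torch.softmax(delta / 2.0, dim=-1)
    i = torch.multinomial(q_fwd, 1).item()

    x_new = x.clone()
    x_new[i] = 1.0 - x_new[i]          # flip the chosen bit
    f_new, delta_new = flip_scores(x_new)
    q_rev = torch.softmax(delta_new / 2.0, dim=-1)

    # Accept or reject the proposed flip.
    log_accept = f_new - f_x + torch.log(q_rev[i]) - torch.log(q_fwd[i])
    return x_new if torch.rand(()) < log_accept.exp() else x

x = (torch.rand(D) < 0.5).float()
for _ in range(100):
    x = sampler_step(x)
print(x)
```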

Check out the full paper here

Optimal complexity in decentralised training 

Researchers at Cornell University, Yucheng Lu and Christopher De Sa, showed how decentralisation is a promising method of scaling up parallel machine learning systems. The researchers provided a tight lower bound on the iteration complexity for such methods in a stochastic non-convex setting.

The paper stated this lower bound reveals a theoretical gap in the known convergence rates of many existing decentralised training algorithms, such as D-PSGD. The researchers proved the lower bound is tight and achievable.

The researchers further proposed DeTAG, a practical gossip-style decentralised algorithm that achieves the lower bound with only a logarithmic gap. Empirically, they compared DeTAG with other decentralised algorithms on image classification tasks and noted that DeTAG enjoys faster convergence than baselines, especially on unshuffled data and sparse networks.
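
For context, the sketch below shows a generic gossip-style decentralised SGD update (in the family of D-PSGD), where each worker mixes its parameters with its neighbours through a mixing matrix and then takes a local gradient step; it is meant only to illustrate what "gossip-style" means here and is not the DeTAG algorithm itself. The ring topology, toy losses and step size are illustrative assumptions.

```python
import numpy as np

# Generic gossip-style decentralised SGD (D-PSGD-like), for illustration only.
# The ring topology, quadratic losses, and step size are assumptions.

rng = np.random.default_rng(0)
N_WORKERS, D, LR = 4, 3, 0.1

# Symmetric, doubly-stochastic mixing matrix for a ring: each worker averages
# itself with its two neighbours.
W = np.zeros((N_WORKERS, N_WORKERS))
for i in range(N_WORKERS):
    W[i, i] = 0.5
    W[i, (i - 1) % N_WORKERS] = 0.25
    W[i, (i + 1) % N_WORKERS] = 0.25

targets = rng.standard_normal((N_WORKERS, D))   # each worker's local optimum
params = np.zeros((N_WORKERS, D))               # one parameter copy per worker

def local_grad(theta, target):
    """Stochastic gradient of a toy local loss ||theta - target||^2 / 2."""
    return (theta - target) + 0.01 * rng.standard_normal(D)

for _ in range(200):
    grads = np.stack([local_grad(params[i], targets[i]) for i in range(N_WORKERS)])
    # Gossip step: mix parameters with neighbours, then take a local SGD step.
    params = W @ params - LR * grads

print("consensus parameters:", params.mean(axis=0))
print("average of local optima:", targets.mean(axis=0))
```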

Check out the full research paper here

Understanding self-supervised learning dynamics without contrastive pairs

Facebook AI researchers Yuandong Tian, Xinlei Chen, and Surya Ganguli discussed various methods around self-supervised learning (SSL) and proposed a novel theoretical approach, DirectPred, which directly sets the linear predictor based on the statistics of its inputs, without gradient training.

On the ImageNet dataset, it performed comparably with more complex two-layer non-linear predictors that employ BatchNorm and outperformed a linear predictor by 2.5 percent in 300-epoch training (and 5 percent in 60-epoch). The researchers said DirectPred is motivated by their theoretical study of the non-linear learning dynamics of non-contrastive SSL in simple linear networks. 

Further, the study offered conceptual insights into how non-contrastive SSL methods learn, how they avoid representational collapse, and how multiple factors, such as predictor networks, stop-gradients, exponential moving averages, and weight decay, come into play. Finally, the researchers said their simple theory recapitulates the results of real-world ablation studies on both STL-10 and ImageNet. The source code is released on GitHub.
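
A minimal sketch of the DirectPred idea is shown below: rather than training the linear predictor by gradient descent, it is set directly from a moving-average correlation matrix of the online network's outputs, sharing that matrix's eigenbasis. The exact eigenvalue scaling and the constants used are illustrative assumptions, not the paper's precise recipe.

```python
import numpy as np

# Sketch of the DirectPred idea: set the linear predictor from the correlation
# statistics of the online network's outputs instead of training it by SGD.
# The eigenvalue scaling and constants below are illustrative assumptions.

rng = np.random.default_rng(0)
DIM, RHO, EPS = 8, 0.3, 0.1

F = np.eye(DIM)                                  # running estimate of E[z z^T]

def direct_pred_update(z_batch, F):
    """Update the correlation EMA and return a freshly-set predictor W_p."""
    corr = z_batch.T @ z_batch / len(z_batch)
    F = (1 - RHO) * F + RHO * corr               # exponential moving average
    eigvals, eigvecs = np.linalg.eigh(F)
    eigvals = np.clip(eigvals, 0.0, None)
    # Scale eigenvalues relative to the largest one, with a small floor.
    p = np.sqrt(eigvals / eigvals.max()) + EPS
    W_p = eigvecs @ np.diag(p) @ eigvecs.T       # predictor shares F's eigenbasis
    return F, W_p

z = rng.standard_normal((64, DIM))               # stand-in for online-network outputs
F, W_p = direct_pred_update(z, F)
print(W_p.shape)
```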

Check out the research paper here

Solving high-dimensional parabolic PDEs using the tensor train format

Lorenz Richter, Leon Sallandt, and Nikolas Nüsken showed that tensor trains provide an appealing approximation framework for parabolic partial differential equations (PDEs): the combination of reformulations in terms of backward stochastic differential equations and regression-type methods in the tensor-train format holds the promise of leveraging latent low-rank structures, enabling both compression and efficient computation.

In line with this, the researchers developed novel iterative schemes involving either explicit and fast or implicit and accurate updates. Their methods achieve a favourable trade-off between accuracy and computational efficiency compared with state-of-the-art neural network-based approaches.
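
For readers unfamiliar with the format, the sketch below shows the tensor-train representation itself: a d-dimensional tensor is stored as a chain of small three-way cores, and a single entry is recovered by multiplying one small matrix per dimension, which is what enables the compression the paper exploits. The dimensions and ranks are illustrative assumptions; the BSDE reformulation and regression updates from the paper are not shown.

```python
import numpy as np

# Sketch of the tensor-train (TT) format: a d-dimensional tensor stored as a
# chain of small 3-way "cores". Dimensions and ranks are illustrative.

rng = np.random.default_rng(0)
D, N, R = 6, 4, 3        # number of dimensions, mode size, TT rank

# Core k has shape (r_{k-1}, n, r_k); boundary ranks are 1.
ranks = [1] + [R] * (D - 1) + [1]
cores = [rng.standard_normal((ranks[k], N, ranks[k + 1])) for k in range(D)]

def tt_entry(cores, index):
    """Evaluate one entry T[i1, ..., id] as a product of per-mode matrices."""
    vec = np.ones((1, 1))
    for core, i in zip(cores, index):
        vec = vec @ core[:, i, :]    # shape stays (1, r_k)
    return float(vec[0, 0])

# Storage cost: roughly D * R^2 * N numbers instead of N^D for the full tensor.
print(tt_entry(cores, [0, 1, 2, 3, 0, 1]))
print("TT parameters:", sum(c.size for c in cores), "vs full tensor:", N ** D)
```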

Check out the full paper here


Amit Raja Naik

Amit Raja Naik is a seasoned technology journalist who covers everything from data science to machine learning and artificial intelligence for Analytics India Magazine, where he examines the trends, challenges, ideas, and transformations across the industry.