5 Best ML Research Papers At ICML 2021

In ICML 2020, Google topped the charts of total research papers submitted, followed by DeepMind, Microsoft, Facebook, and Spotify.

The International Conference on Machine Learning (ICML) has announced its best paper awards. The 38th edition of ICML, one of the fastest-growing artificial intelligence conferences in the world, saw participation from academics, industrial researchers, entrepreneurs, engineers, graduate students and postdocs.

https://twitter.com/icmlconf/status/1417110371161317378

ICML is renowned for presenting and publishing cutting-edge research on all aspects of machine learning.

Last year, the ICML conference attracted close to 4,990 submissions, of which 1,088 were accepted at a 21.8% acceptance rate, lower than the previous year’s 22.6%. In ICML 2020, Google topped the charts of total research papers submitted, followed by DeepMind, Microsoft, Facebook, and Spotify.

Here is the list of papers that won ICML 2021 awards: 

Outstanding paper 

Unbiased Gradient Estimation In Unrolled Computation Graphs With Persistent Evolution Strategies

Researchers from Google Brain and the University of Toronto, Paul Vicol, Luke Metz and Jascha Sohl-Dickstein, introduced a method for unbiased gradient estimation in unrolled computation graphs, called Persistent Evolution Strategies (PES).

PES obtains gradients from truncated unrolls, which speeds up optimisation by allowing for frequent parameter updates while not suffering from truncation bias that affects many competing approaches. The researchers showed PES is broadly applicable, with experiments demonstrating its application to an RNN-like task, hyperparameter optimisation, reinforcement learning, and meta-training of learned optimisers. 
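
To give a flavour of the method, below is a minimal sketch of a PES-style estimator on a toy unrolled system: per-segment perturbations are accumulated in a persistent buffer so that each truncated segment still contributes an unbiased piece of the full-unroll gradient. The toy dynamics, dimensions and constants are illustrative assumptions, not the authors' experimental setup.

```python
import numpy as np

# Minimal sketch of a Persistent Evolution Strategies (PES) style estimator on
# a toy unrolled system. The dynamics, loss, and all constants below are
# illustrative assumptions, not the paper's experiments.

rng = np.random.default_rng(0)
D = 5              # number of parameters theta
N = 256            # number of antithetic sample pairs
SIGMA = 0.1        # perturbation scale
K = 4              # truncation length (steps per unroll segment)
T = 20             # total unroll length

def step(state, theta):
    """Toy inner dynamics: one unroll step returns (new_state, loss)."""
    new_state = 0.9 * state + theta
    loss = np.sum(new_state ** 2, axis=-1)
    return new_state, loss

def pes_gradient(theta):
    """Estimate d(total loss)/d(theta) from truncated unrolls."""
    # Antithetic pairs: particle i uses +eps, particle N+i uses -eps.
    states = np.zeros((2 * N, D))
    xi = np.zeros((2 * N, D))          # persistent accumulated perturbations
    grad = np.zeros(D)
    for start in range(0, T, K):
        eps = SIGMA * rng.standard_normal((N, D))
        pert = np.concatenate([eps, -eps])   # fresh perturbation per segment
        xi += pert                           # accumulate across segments
        seg_loss = np.zeros(2 * N)
        for _ in range(K):
            states, loss = step(states, theta + pert)
            seg_loss += loss
        # PES estimate for this segment; summing over segments approximates
        # the gradient of the full-unroll loss (up to Monte-Carlo noise).
        grad += (xi * seg_loss[:, None]).mean(axis=0) / SIGMA ** 2
    return grad

print(pes_gradient(np.full(D, 0.5)))
```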

Check out the full research paper here

Outstanding paper honorable mention

Oops I took a gradient: Scalable sampling for discrete distributions

Google Brain researchers Will Grathwohl, Kevin Swersky, Milad Hashemi, David Duvenaud and Chris J. Maddison proposed a general and scalable approximate sampling strategy for probabilistic models with discrete variables. Their approach uses gradients of the likelihood function with respect to its discrete inputs to propose updates in a Metropolis-Hastings sampler.

The researchers showed empirically that this approach outperforms generic samplers in many complex settings, including Ising models, Potts models, restricted Boltzmann machines, and factorial hidden Markov models. They also demonstrated the use of their improved sampler for training deep energy-based models (EBMs) on high-dimensional discrete data, where it outperforms variational auto-encoders and existing EBM approaches.
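
To make the idea concrete, here is a minimal sketch of a gradient-informed Metropolis-Hastings step for binary variables in the spirit of the paper: a first-order estimate of the change in log-probability from flipping each bit defines the proposal, followed by a standard accept/reject correction. The Ising-like energy, sizes and constants are illustrative assumptions.

```python
import torch

# Sketch of a gradient-informed Metropolis-Hastings step for binary variables.
# The Ising-like model and all constants below are illustrative assumptions.

torch.manual_seed(0)
D = 16
J = torch.randn(D, D) * 0.1
J = (J + J.T) / 2                      # symmetric couplings

def log_prob(x):
    """Unnormalised log-probability f(x) of a small Ising-like model."""
    return torch.einsum("...i,ij,...j->...", x, J, x)

def flip_scores(x):
    """First-order estimate of f(x with bit i flipped) - f(x), for every i."""
    x = x.clone().requires_grad_(True)
    f_x = log_prob(x)
    (grad,) = torch.autograd.grad(f_x, x)
    return f_x.detach(), -(2.0 * x.detach() - 1.0) * grad

def sampler_step(x):
    """One step: gradient-informed flip proposal + Metropolis-Hastings test."""
    f_x, delta = flip_scores(x)
    q_fwd = torch.softmax(delta / 2.0, dim=-1)
    i = torch.multinomial(q_fwd, 1).item()

    x_new = x.clone()
    x_new[i] = 1.0 - x_new[i]          # flip the chosen bit
    f_new, delta_new = flip_scores(x_new)
    q_rev = torch.softmax(delta_new / 2.0, dim=-1)

    # Accept or reject the proposed flip.
    log_accept = f_new - f_x + torch.log(q_rev[i]) - torch.log(q_fwd[i])
    return x_new if torch.rand(()) < log_accept.exp() else x

x = (torch.rand(D) < 0.5).float()
for _ in range(100):
    x = sampler_step(x)
print(x)
```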

Check out the full paper here

Optimal complexity in decentralised training 

Researchers at Cornell University, Yucheng Lu and Christopher De Sa, showed how decentralisation is a promising method of scaling up parallel machine learning systems. The researchers provided a tight lower bound on the iteration complexity for such methods in a stochastic non-convex setting.

The paper stated this lower bound reveals a theoretical gap in the known convergence rates of many existing decentralised training algorithms, such as D-PSGD. The researchers proved the lower bound is tight and achievable.

The researchers further proposed DeTAG, a practical gossip-style decentralised algorithm that achieves the lower bound with only a logarithmic gap. Empirically, they compared DeTAG with other decentralised algorithms on image classification tasks and noted that DeTAG enjoys faster convergence than baselines, especially on unshuffled data and sparse networks.
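
For context, the sketch below shows a generic gossip-style decentralised SGD update (in the family of D-PSGD), where each worker mixes its parameters with its neighbours through a mixing matrix and then takes a local gradient step; it is meant only to illustrate what "gossip-style" means here and is not the DeTAG algorithm itself. The ring topology, toy losses and step size are illustrative assumptions.

```python
import numpy as np

# Generic gossip-style decentralised SGD (D-PSGD-like), for illustration only.
# The ring topology, quadratic losses, and step size are assumptions.

rng = np.random.default_rng(0)
N_WORKERS, D, LR = 4, 3, 0.1

# Symmetric, doubly-stochastic mixing matrix for a ring: each worker averages
# itself with its two neighbours.
W = np.zeros((N_WORKERS, N_WORKERS))
for i in range(N_WORKERS):
    W[i, i] = 0.5
    W[i, (i - 1) % N_WORKERS] = 0.25
    W[i, (i + 1) % N_WORKERS] = 0.25

targets = rng.standard_normal((N_WORKERS, D))   # each worker's local optimum
params = np.zeros((N_WORKERS, D))               # one parameter copy per worker

def local_grad(theta, target):
    """Stochastic gradient of a toy local loss ||theta - target||^2 / 2."""
    return (theta - target) + 0.01 * rng.standard_normal(D)

for _ in range(200):
    grads = np.stack([local_grad(params[i], targets[i]) for i in range(N_WORKERS)])
    # Gossip step: mix parameters with neighbours, then take a local SGD step.
    params = W @ params - LR * grads

print("consensus parameters:", params.mean(axis=0))
print("average of local optima:", targets.mean(axis=0))
```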

Check out the full research paper here

Understanding self-supervised learning dynamics without contrastive pairs

Facebook AI researchers Yuandong Tian, Xinlei Chen, and Surya Ganguli discussed various methods around self-supervised learning (SSL) and proposed a novel theoretical approach, DirectPred, which directly sets the linear predictor based on the statistics of its inputs, without gradient training.

On the ImageNet dataset, it performed comparably with more complex two-layer non-linear predictors that employ BatchNorm and outperformed a linear predictor by 2.5 percent in 300-epoch training (and 5 percent in 60-epoch). The researchers said DirectPred is motivated by their theoretical study of the non-linear learning dynamics of non-contrastive SSL in simple linear networks. 

Further, the study offered conceptual insights into how non-contrastive SSL methods learn, how they avoid representational collapse, and how multiple factors, such as predictor networks, stop-gradients, exponential moving averages, and weight decay, come into play. Finally, the researchers said their simple theory recapitulates the results of real-world ablation studies on both STL-10 and ImageNet. The source code is released on GitHub.
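
A minimal sketch of the DirectPred idea is shown below: rather than training the linear predictor by gradient descent, it is set directly from a moving-average correlation matrix of the online network's outputs, sharing that matrix's eigenbasis. The exact eigenvalue scaling and the constants used are illustrative assumptions, not the paper's precise recipe.

```python
import numpy as np

# Sketch of the DirectPred idea: set the linear predictor from the correlation
# statistics of the online network's outputs instead of training it by SGD.
# The eigenvalue scaling and constants below are illustrative assumptions.

rng = np.random.default_rng(0)
DIM, RHO, EPS = 8, 0.3, 0.1

F = np.eye(DIM)                                  # running estimate of E[z z^T]

def direct_pred_update(z_batch, F):
    """Update the correlation EMA and return a freshly-set predictor W_p."""
    corr = z_batch.T @ z_batch / len(z_batch)
    F = (1 - RHO) * F + RHO * corr               # exponential moving average
    eigvals, eigvecs = np.linalg.eigh(F)
    eigvals = np.clip(eigvals, 0.0, None)
    # Scale eigenvalues relative to the largest one, with a small floor.
    p = np.sqrt(eigvals / eigvals.max()) + EPS
    W_p = eigvecs @ np.diag(p) @ eigvecs.T       # predictor shares F's eigenbasis
    return F, W_p

z = rng.standard_normal((64, DIM))               # stand-in for online-network outputs
F, W_p = direct_pred_update(z, F)
print(W_p.shape)
```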

Check out the research paper here

Solving high-dimensional parabolic PDEs using the tensor train format

Lorenz Richter, Leon Sallandt, and Nikolas Nüsken showed that tensor trains provide an appealing approximation framework for parabolic partial differential equations (PDEs): the combination of reformulations in terms of backward stochastic differential equations and regression-type methods in the tensor-train format holds the promise of leveraging latent low-rank structures, enabling both compression and efficient computation.

In line with this, the researchers developed novel iterative schemes involving either explicit and fast or implicit and accurate updates. Their methods achieve a favourable trade-off between accuracy and computational efficiency compared with state-of-the-art neural network-based approaches.
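
For readers unfamiliar with the format, the sketch below shows the tensor-train representation itself: a d-dimensional tensor is stored as a chain of small three-way cores, and a single entry is recovered by multiplying one small matrix per dimension, which is what enables the compression the paper exploits. The dimensions and ranks are illustrative assumptions; the BSDE reformulation and regression updates from the paper are not shown.

```python
import numpy as np

# Sketch of the tensor-train (TT) format: a d-dimensional tensor stored as a
# chain of small 3-way "cores". Dimensions and ranks are illustrative.

rng = np.random.default_rng(0)
D, N, R = 6, 4, 3        # number of dimensions, mode size, TT rank

# Core k has shape (r_{k-1}, n, r_k); boundary ranks are 1.
ranks = [1] + [R] * (D - 1) + [1]
cores = [rng.standard_normal((ranks[k], N, ranks[k + 1])) for k in range(D)]

def tt_entry(cores, index):
    """Evaluate one entry T[i1, ..., id] as a product of per-mode matrices."""
    vec = np.ones((1, 1))
    for core, i in zip(cores, index):
        vec = vec @ core[:, i, :]    # shape stays (1, r_k)
    return float(vec[0, 0])

# Storage cost: roughly D * R^2 * N numbers instead of N^D for the full tensor.
print(tt_entry(cores, [0, 1, 2, 3, 0, 1]))
print("TT parameters:", sum(c.size for c in cores), "vs full tensor:", N ** D)
```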

Check out the full paper here


Amit Raja Naik

Amit Raja Naik is a seasoned technology journalist who covers everything from data science to machine learning and artificial intelligence for Analytics India Magazine, where he examines the trends, challenges, ideas, and transformations across the industry.