5 Best ML Research Papers At ICML 2021

In ICML 2020, Google topped the charts of total research papers submitted, followed by DeepMind, Microsoft, Facebook, and Spotify.

The International Conference on Machine Learning (ICML) has announced its best paper awards. The 38th edition of ICML, one of the fastest-growing artificial intelligence conferences in the world, saw participation from academics, industrial researchers, entrepreneurs, engineers, graduate students and postdocs.

https://twitter.com/icmlconf/status/1417110371161317378

ICML is renowned for presenting and publishing cutting-edge research on all aspects of machine learning.

Last year, the ICML conference attracted close to 4,990 submissions, of which 1,088 were accepted, at a 21.8% acceptance rate, lower than the previous year’s 22.6%. In ICML 2020, Google topped the charts of total research papers submitted, followed by DeepMind, Microsoft, Facebook, and Spotify.

Here is the list of papers that won ICML 2021 awards: 

Outstanding paper 

Unbiased Gradient Estimation In Unrolled Computation Graphs With Persistent Evolution 

Researchers from Google Brain and the University of Toronto, Paul Vicol, Luke Metz and Jascha Sohl-Dickstein, introduced a method for unbiased gradient estimation in unrolled computation graphs, called Persistent Evolution Strategies (PES).

PES obtains gradients from truncated unrolls, which speeds up optimisation by allowing for frequent parameter updates while not suffering from truncation bias that affects many competing approaches. The researchers showed PES is broadly applicable, with experiments demonstrating its application to an RNN-like task, hyperparameter optimisation, reinforcement learning, and meta-training of learned optimisers. 
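The truncation bias mentioned above can be illustrated with a toy example. The sketch below is not PES and is not taken from the paper; it only shows, under an assumed one-parameter system (state s_t = s_{t-1} + theta, loss equal to the sum of s_t squared), how resetting gradient flow every few steps changes the gradient. The function names and the toy system are hypothetical.

```python
def full_gradient(theta, steps):
    """Gradient of sum_t s_t**2 w.r.t. theta, differentiating through every step.

    Toy system (an assumption for illustration): s_0 = 0, s_t = s_{t-1} + theta.
    """
    s, ds_dtheta, grad = 0.0, 0.0, 0.0
    for _ in range(steps):
        s += theta
        ds_dtheta += 1.0              # ds_t/dtheta = ds_{t-1}/dtheta + 1
        grad += 2.0 * s * ds_dtheta   # d(s_t**2)/dtheta
    return grad


def truncated_gradient(theta, steps, window):
    """Same gradient, but the dependence on theta is cut every `window` steps."""
    s, ds_dtheta, grad = 0.0, 0.0, 0.0
    for t in range(steps):
        if t % window == 0:
            ds_dtheta = 0.0           # truncation: incoming state treated as a constant
        s += theta
        ds_dtheta += 1.0
        grad += 2.0 * s * ds_dtheta
    return grad


if __name__ == "__main__":
    print(full_gradient(1.0, 4))          # 60.0
    print(truncated_gradient(1.0, 4, 1))  # 20.0
    print(truncated_gradient(1.0, 4, 2))  # 32.0
```

In this toy case, shorter windows allow more frequent updates but underestimate the true gradient, which is the trade-off the paragraph above says PES is designed to avoid.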

Check out the full research paper here

Outstanding paper honorable mention

Oops I took a gradient: Scalable sampling for discrete distributions

Google Brain researchers Will Grathwohl, Kevin Swersky, Milad Hashemi, David Duvenaud and Chris J. Maddison proposed a general and scalable approximate sampling strategy for probabilistic models with discrete variables. Their approach uses gradients of the likelihood function with respect to its discrete inputs to propose updates in a Metropolis-Hastings sampler.

The researchers showed empirically that this approach outperforms generic samplers in many complex settings, including Ising models, Potts models, restricted Boltzmann machines, and factorial hidden Markov models. They also demonstrated the use of their improved sampler for training deep energy-based models (EBMs) on high-dimensional discrete data. Further, this approach outperforms variational auto-encoders and existing EBMs.
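The idea of using gradients to propose updates inside a Metropolis-Hastings sampler can be sketched on a tiny binary model. This is an illustrative sketch only, not the authors' implementation: the quadratic log-probability, the softmax over first-order flip scores, and all function names are assumptions made for this example.

```python
import math
import random


def log_prob(x, W, b):
    """Unnormalised log-probability f(x) = x^T W x + b^T x for binary x (W symmetric)."""
    n = len(x)
    quad = sum(W[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    return quad + sum(b[i] * x[i] for i in range(n))


def grad_log_prob(x, W, b):
    """Gradient of f with x treated as continuous: 2 W x + b."""
    n = len(x)
    return [2.0 * sum(W[i][j] * x[j] for j in range(n)) + b[i] for i in range(n)]


def flip_proposal(x, W, b):
    """Probability of proposing each bit flip, from a first-order estimate of the change in f."""
    g = grad_log_prob(x, W, b)
    # Flipping bit i changes x_i by (1 - 2 * x_i); halve the estimated change (assumed tempering).
    scores = [(1 - 2 * x[i]) * g[i] / 2.0 for i in range(len(x))]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def mh_step(x, W, b, rng):
    """One Metropolis-Hastings step with a gradient-informed single-bit-flip proposal."""
    q_fwd = flip_proposal(x, W, b)
    i = rng.choices(range(len(x)), weights=q_fwd)[0]
    x_new = list(x)
    x_new[i] = 1 - x_new[i]
    q_rev = flip_proposal(x_new, W, b)  # probability of flipping bit i back
    log_accept = (log_prob(x_new, W, b) - log_prob(x, W, b)
                  + math.log(q_rev[i]) - math.log(q_fwd[i]))
    if rng.random() < math.exp(min(0.0, log_accept)):
        return x_new
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    W = [[0.0, 0.5], [0.5, 0.0]]
    b = [0.2, -0.3]
    x = [0, 0]
    for _ in range(5):
        x = mh_step(x, W, b, rng)
        print(x)
```

The acceptance test corrects for the asymmetric proposal, so the chain still targets the distribution defined by `log_prob`; the gradient only steers which flips are proposed.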

Check out the full paper here

Optimal complexity in decentralised training 

Researchers at Cornell University, Yucheng Lu and Christopher De Sa, showed how decentralisation is a promising method of scaling up parallel machine learning systems. The researchers provided a tight lower bound on the iteration complexity for such methods in a stochastic non-convex setting.

The paper stated the lower bound revealed a theoretical gap in the known convergence rate of many existing decentralised training algorithms, such as D-PSGD. The researchers proved the lower bound is tight and achievable.

The researchers further proposed DeTAG, a practical gossip-style decentralised algorithm that achieves the lower bound with only a logarithmic gap. Empirically, they compared DeTAG with other decentralised algorithms on image classification tasks and noted that DeTAG enjoys faster convergence than baselines, especially on unshuffled data and sparse networks.
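For readers unfamiliar with the term, "gossip-style" communication means workers repeatedly average with their neighbours instead of going through a central server. The sketch below shows plain gossip averaging on a ring of workers; it is not DeTAG or D-PSGD, and the ring topology, the mixing weight of 1/3, and the function names are assumptions chosen for illustration.

```python
def gossip_round(values, weight=1.0 / 3.0):
    """One synchronous gossip step on a ring: each worker mixes with its two neighbours."""
    n = len(values)
    return [
        (1.0 - 2.0 * weight) * values[i]
        + weight * values[(i - 1) % n]
        + weight * values[(i + 1) % n]
        for i in range(n)
    ]


def run_gossip(values, rounds):
    """Repeat gossip steps; the workers' values drift toward the global average."""
    for _ in range(rounds):
        values = gossip_round(values)
    return values


if __name__ == "__main__":
    start = [0.0, 4.0, 8.0, 12.0]   # one scalar "parameter" per worker
    print(run_gossip(start, 1))
    print(run_gossip(start, 50))    # every entry is close to the mean, 6.0
```

Each round only uses neighbour-to-neighbour communication, yet the average across workers is preserved and every worker converges to it. How quickly that happens depends on the network, which is why sparse topologies are the harder case.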

Check out the full research paper here

Understanding self-supervised learning dynamics without contrastive pairs

Facebook AI researchers Yuandong Tian, Xinlei Chen, and Surya Ganguli discussed various methods around self-supervised learning (SSL) and proposed a novel approach, DirectPred, which directly sets the linear predictor based on the statistics of its inputs, without gradient training.

On the ImageNet dataset, it performed comparably with more complex two-layer non-linear predictors that employ BatchNorm and outperformed a linear predictor by 2.5 percent in 300-epoch training (and 5 percent in 60-epoch). The researchers said DirectPred is motivated by their theoretical study of the non-linear learning dynamics of non-contrastive SSL in simple linear networks. 

Further, the study offered conceptual insights into how non-contrastive SSL methods learn, how they avoid representational collapse, and how multiple factors, like predictor networks, stop-gradients, exponential moving averages, and weight decay, all come into play. Finally, the researchers said their simple theory recapitulates the results of real-world ablation studies in both STL-10 and ImageNet. The source code is released on GitHub.

 

Check out the research paper here

Solving high-dimensional parabolic PDEs using the tensor train format

Lorenz Richter, Leon Sallandt, and Nikolas Nüsken showed tensor trains provide an appealing approximation framework for parabolic partial differential equations (PDEs): the combination of reformulations in terms of backward stochastic differential equations and regression-type methods in the tensor format holds the promise of leveraging latent low-rank structures enabling both compression and efficient computation. 

In line with this, the researchers developed novel iterative schemes involving either explicit and fast or implicit and accurate updates. Their methods achieve a favourable trade-off between accuracy and computational efficiency compared with state-of-the-art neural network-based approaches.

Check out the full paper here

Amit Raja Naik
Amit Raja Naik is a seasoned technology journalist who covers everything from data science to machine learning and artificial intelligence for Analytics India Magazine, where he examines the trends, challenges, ideas, and transformations across the industry.
