7 outstanding papers at ICLR 2022

The International Conference on Learning Representations (ICLR) has announced the ICLR 2022 Outstanding Paper Awards. The selection committee consisted of Andreas Krause (ETH-Zurich), Atlas Wang (UT Austin), Been Kim (Google Brain), Bo Li (University of Illinois Urbana-Champaign), Bohyung Han (Seoul National University), He He (New York University), and Zaid Harchaoui (University of Washington). 

The outstanding papers are listed below:


Analytic-DPM: an Analytic Estimate of the Optimal Reverse Variance in Diffusion Probabilistic Models

  • Bo Zhang, Department of Computer Science & Technology, Institute for AI, Tsinghua-Huawei Joint Center for AI
  • Fan Bao, Department of Computer Science & Technology, Institute for AI, Tsinghua-Huawei Joint Center for AI
  • Chongxuan Li, Gaoling School of Artificial Intelligence, Renmin University of China, Beijing
  • Jun Zhu, Department of Computer Science & Technology, Institute for AI, Tsinghua-Huawei Joint Center for AI

Diffusion probabilistic models (DPMs), first proposed by Sohl-Dickstein et al. (2015), are a class of generative models. Inference with DPMs is expensive because it requires iterating over thousands of timesteps, and it needs an estimate of the variance at each timestep of the reverse process. Until now, most work has used a handcrafted variance value for all timesteps (Nichol & Dhariwal, 2021).
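To make "handcrafted variance" concrete, the sketch below computes the two fixed choices that are commonly used for the reverse process, following the linear noise schedule of Ho et al., 2020. The function names and the schedule endpoints are illustrative assumptions, and this is not the Analytic-DPM estimate itself.

```python
def linear_betas(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule beta_1..beta_T (endpoints assumed from Ho et al., 2020)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def handcrafted_variances(betas):
    """Return the two common handcrafted reverse variances for every timestep."""
    large, small = [], []
    alpha_bar_prev = 1.0  # cumulative product of (1 - beta) before timestep t
    for beta in betas:
        alpha_bar = alpha_bar_prev * (1.0 - beta)
        large.append(beta)  # choice 1: sigma_t^2 = beta_t
        # choice 2: sigma_t^2 = (1 - alpha_bar_{t-1}) / (1 - alpha_bar_t) * beta_t
        small.append((1.0 - alpha_bar_prev) / (1.0 - alpha_bar) * beta)
        alpha_bar_prev = alpha_bar
    return large, small


betas = linear_betas()
large, small = handcrafted_variances(betas)
print(large[-1], small[-1])
```

Both lists are fixed in advance and independent of the data, which is the design choice the paper replaces with an analytic estimate.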

Here, the researchers behind the paper titled Analytic-DPM: an Analytic Estimate of the Optimal Reverse Variance in Diffusion Probabilistic Models have proposed Analytic-DPM, a “training-free inference framework that estimates the analytic forms of the variance and Kullback–Leibler divergence (KL divergence) using the Monte Carlo method and a pretrained score-based model.” The researchers also said Analytic-DPM applies to various DPMs (Ho et al., 2020; Song et al., 2020a; Nichol & Dhariwal, 2021) in a plug-and-play manner. Analytic-DPM improves the log-likelihood of various DPMs, produces high-quality samples, and achieves a 20× to 80× speed-up.

Hyperparameter Tuning with Renyi Differential Privacy


  • Nicolas Papernot, Google Research, Brain Team
  • Thomas Steinke, Google Research, Brain Team

Differential privacy is a framework for publicly sharing information about a dataset by describing patterns of groups within it while withholding information about the individuals in it.

Noisy (stochastic) gradient descent is a popular method for ensuring differential privacy (Song et al., 2013; Bassily et al., 2014; Abadi et al., 2016). In the paper titled Hyperparameter Tuning with Renyi Differential Privacy, the researchers pointed out that this method, known as DP-SGD, differs from standard gradient descent in three ways:

  • The gradients are computed on a per-example basis.
  • Individual gradients are clipped so that their 2-norm is bounded.
  • Gaussian noise is added to the gradients.

Together, these changes bound the sensitivity of each update, so that the added noise ensures differential privacy. The researchers showed how setting hyperparameters based on non-private training runs could leak private information. The team also provided privacy guarantees for hyperparameter search procedures within the framework of Renyi Differential Privacy. The results improve on and extend the work of Liu and Talwar (STOC 2019).
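The three DP-SGD ingredients listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's code: the function name, the argument names, and the convention of scaling the noise by the clipping norm are assumptions, the per-example gradients are taken as given, and no privacy accounting is performed.

```python
import math
import random


def dp_sgd_step(params, per_example_grads, clip_norm, noise_multiplier, lr, rng):
    """One noisy SGD step: clip each example's gradient, sum, add Gaussian noise."""
    dim = len(params)
    total = [0.0] * dim
    for grad in per_example_grads:
        # Clip so that the 2-norm of each example's gradient is at most clip_norm.
        norm = math.sqrt(sum(g * g for g in grad))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i in range(dim):
            total[i] += grad[i] * scale
    # Gaussian noise calibrated to the clipping norm (the per-example sensitivity).
    sigma = noise_multiplier * clip_norm
    batch = len(per_example_grads)
    noisy_mean = [(t + rng.gauss(0.0, sigma)) / batch for t in total]
    return [p - lr * g for p, g in zip(params, noisy_mean)]


rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]  # hypothetical per-example gradients
print(dp_sgd_step([0.0, 0.0], grads, clip_norm=1.0, noise_multiplier=1.1, lr=0.1, rng=rng))
```

Because every clipped gradient has norm at most `clip_norm`, adding or removing one example changes the summed gradient by at most that amount, which is what lets the noise scale be tied to it.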

Learning Strides in Convolutional Neural Networks

CNNs have wide applications in image and text classification, speech recognition, translation and more. The paper, Learning Strides in Convolutional Neural Networks, addresses a critical issue in using CNNs: setting the strides in a principled way instead of by trial and error.

What is a stride?

DeepAI defines “stride” as a parameter of a neural network’s filter that sets how far the filter moves across the image or video at each step.
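A small one-dimensional example may help show what the stride controls. The sketch below is an illustrative assumption rather than code from the paper: it slides a filter over a signal without padding, and a larger stride visits fewer positions and so yields a shorter output.

```python
def conv1d(signal, kernel, stride=1):
    """Unpadded 1-D cross-correlation: move the kernel `stride` positions per step."""
    k = len(kernel)
    out = []
    for start in range(0, len(signal) - k + 1, stride):
        out.append(sum(signal[start + j] * kernel[j] for j in range(k)))
    return out


signal = [1, 2, 3, 4, 5, 6, 7, 8]
kernel = [1, 0, -1]
print(len(conv1d(signal, kernel, stride=1)))  # 6 output positions
print(len(conv1d(signal, kernel, stride=2)))  # 3 output positions
```

In general the output length is floor((n - k) / stride) + 1 for a signal of length n and a kernel of length k, so the stride acts as an integer downsampling factor that normally has to be chosen by hand.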

Inspired by the work titled Spectral Representations for Convolutional Neural Networks, the researchers proposed “DiffStride, the first downsampling layer with learnable strides”. Instead of cropping with a fixed bounding box controlled by a striding hyperparameter, DiffStride learns the size of its cropping box through backpropagation. 

Expressiveness and Approximation Properties of Graph Neural Networks 


  • Floris Geerts, Department of Computer Science, University of Antwerp, Belgium
  • Juan L. Reutter, School of Engineering, Pontificia Universidad Catolica de Chile, Chile & IMFD

The separation power of GNN architectures, meaning their ability to tell graphs apart, is usually characterised in terms of graph algorithms such as color refinement (CR) and the k-dimensional Weisfeiler-Leman tests (k-WL). Establishing the separation power of a given GNN architecture typically requires complex proofs tailored to the specifics of that architecture.
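For readers unfamiliar with color refinement, the following sketch shows the basic procedure on small graphs given as adjacency lists. It is a plain illustration of CR under assumed function names, not the tensor-language machinery from the paper.

```python
def color_refinement(adjacency):
    """Run color refinement and return a stable color for every node.

    `adjacency` maps each node to a list of its neighbours.
    """
    colors = {node: 0 for node in adjacency}
    for _ in range(len(adjacency)):
        # Signature: own color plus the sorted multiset of neighbour colors.
        signatures = {
            node: (colors[node], tuple(sorted(colors[n] for n in adjacency[node])))
            for node in adjacency
        }
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        new_colors = {node: palette[signatures[node]] for node in adjacency}
        # Stop once the number of color classes no longer grows.
        if len(set(new_colors.values())) == len(set(colors.values())):
            return new_colors
        colors = new_colors
    return colors


def cr_distinguishes(graph_a, graph_b):
    """True if CR assigns different color multisets to the two graphs."""
    # Refine both graphs together so that their colors are comparable.
    union = {("a", v): [("a", u) for u in nbrs] for v, nbrs in graph_a.items()}
    union.update({("b", v): [("b", u) for u in nbrs] for v, nbrs in graph_b.items()})
    colors = color_refinement(union)
    hist_a = sorted(c for (tag, _), c in colors.items() if tag == "a")
    hist_b = sorted(c for (tag, _), c in colors.items() if tag == "b")
    return hist_a != hist_b


six_cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
six_path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
print(cr_distinguishes(six_cycle, two_triangles))
print(cr_distinguishes(six_cycle, six_path))
```

The six-cycle and the pair of triangles are both 2-regular, so every node keeps the same color and CR cannot separate them, while the path is separated because its end nodes have a different degree.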

In the paper titled, Expressiveness and Approximation Properties of Graph Neural Networks, the researchers proposed a tensor language-based technique to analyse the separation power of general GNNs. The approach also provides a toolbox with which GNN architecture designers can analyse the separation power of their GNNs without the need to figure out the intricacies of the WL-tests.

Comparing Distributions by Measuring Differences that Affect Decision Making


  • Yutong He, Department of Computer Science, Stanford University
  • Jiaming Song, Department of Computer Science, Stanford University
  • Stefano Ermon, Department of Computer Science, Stanford University

Quantifying the discrepancy between two probability distributions is a fundamental challenge in machine learning. The paper, Comparing Distributions by Measuring Differences that Affect Decision Making, introduces a new class of discrepancies based on the optimal loss for a decision task. By suitably choosing the decision task, it generalises the Jensen-Shannon divergence and the maximum mean discrepancy family.
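As a point of reference for one of the special cases mentioned, the sketch below computes the Jensen-Shannon divergence between two discrete distributions. It is written as the entropy of the mixture minus the average entropy, where Shannon entropy can be read as the best achievable expected log loss; the function names are assumptions, and the paper's general discrepancy class is not implemented here.

```python
import math


def entropy(p):
    """Shannon entropy in nats: the smallest achievable expected log loss under p."""
    return -sum(x * math.log(x) for x in p if x > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    mixture = [(a + b) / 2 for a, b in zip(p, q)]
    return entropy(mixture) - (entropy(p) + entropy(q)) / 2


print(js_divergence([0.5, 0.5], [0.5, 0.5]))  # identical distributions
print(js_divergence([1.0, 0.0], [0.0, 1.0]))  # disjoint support, close to log(2)
```

Read this way, the divergence measures how much harder the prediction task becomes when the two distributions are mixed, which is the kind of decision-based view the paper builds on.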

By applying this approach to two-sample tests and various benchmarks, the team has achieved superior test power compared to competing methods. 

Neural Collapse Under MSE Loss


  • David L. Donoho, Stanford University
  • X.Y. Han, Cornell University
  • Vardan Papyan, University of Toronto

The paper titled Neural Collapse Under MSE Loss: Proximity to and Dynamics on the Central Path explains that during neural collapse, last-layer features collapse to their class-means, while both the classifiers and the class-means collapse to the same Simplex Equiangular Tight Frame. The classifier behaviour collapses to the nearest-class-mean decision rule.
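The geometry of a Simplex Equiangular Tight Frame can be checked numerically. The sketch below builds one using the standard construction sqrt(C / (C - 1)) * (I - 11ᵀ / C) for C classes and applies a nearest-class-mean rule to it; the function names are illustrative assumptions and nothing here reproduces the paper's analysis.

```python
import math


def simplex_etf(num_classes):
    """Rows are the vertices of a Simplex Equiangular Tight Frame in R^C."""
    c = num_classes
    scale = math.sqrt(c / (c - 1))
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / c) for j in range(c)]
        for i in range(c)
    ]


def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_class_mean(feature, class_means):
    """Index of the class mean closest to `feature` in Euclidean distance."""
    distances = [sum((f - m) ** 2 for f, m in zip(feature, mean)) for mean in class_means]
    return distances.index(min(distances))


means = simplex_etf(4)
print(cosine(means[0], means[1]))  # close to -1 / (C - 1) = -1/3
print(nearest_class_mean(means[2], means))
```

Every vertex has unit norm and every pair of vertices meets at the same angle, which is the maximally separated configuration that the class-means are described as approaching.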

The paper proposes a new theoretical construct, the “central path”, along which the linear classifier stays MSE-optimal for the feature activations throughout the dynamics.

Bootstrapped Meta-Learning


Meta-learning essentially means 'learning to learn'. The paper titled Bootstrapped Meta-Learning outlines several challenges in meta-optimisation that degrade performance. One is that an update rule must first be applied before it can be evaluated, which comes with high computational costs. To address this, the researchers have proposed an algorithm that lets the meta-learner teach itself.

The algorithm first bootstraps a target from the meta-learner and then optimises the meta-learner by minimising the distance to that target under a chosen (pseudo) metric. The bootstrapping mechanism can extend the effective meta-learning horizon without requiring backpropagation through all updates. The researchers achieved a new state of the art for model-free agents on the Atari ALE benchmark and showed that the method yields both performance and efficiency gains in multi-task meta-learning.
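The bootstrapping idea can be illustrated on a deliberately tiny problem. In the sketch below, the meta-parameter is a learning rate `eta` for gradient descent on f(x) = 0.5 * x², the target is obtained by unrolling a few extra steps and is then held fixed, and the distance is squared error. All names and the toy objective are assumptions made for illustration; this is a heavily simplified reading of the description above, not the authors' algorithm.

```python
def unroll(x, eta, steps):
    """Apply `steps` gradient steps on f(x) = 0.5 * x**2 with learning rate eta."""
    for _ in range(steps):
        x = x - eta * x  # the gradient of f at x is x
    return x


def bootstrapped_meta_step(eta, x0, k_steps, l_steps, meta_lr):
    """Move eta so that K steps land closer to a target bootstrapped from K + L steps."""
    x_k = unroll(x0, eta, k_steps)
    # Bootstrapped target: continue for L more steps, then treat it as a constant.
    target = unroll(x_k, eta, l_steps)
    # Analytic derivative of x_k = x0 * (1 - eta)**K with respect to eta.
    dxk_deta = -k_steps * x0 * (1.0 - eta) ** (k_steps - 1)
    # Gradient of (x_k - target)**2, without differentiating through the target.
    meta_grad = 2.0 * (x_k - target) * dxk_deta
    return eta - meta_lr * meta_grad


eta = 0.1
for _ in range(10):
    eta = bootstrapped_meta_step(eta, x0=1.0, k_steps=1, l_steps=1, meta_lr=0.5)
print(eta)
```

Only the first K steps are differentiated, while the target still reflects what happens K + L steps ahead, which is how the horizon is extended without backpropagating through every update. In this toy setting the learning rate is pushed towards 1, the value that solves the quadratic in a single step.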

Sreejani Bhattacharyya
I am a technology journalist at AIM. What gets me excited is deep-diving into new-age technologies and analysing how they impact us for the greater good. Reach me at sreejani.bhattacharyya@analyticsindiamag.com
