The International Conference on Learning Representations (ICLR) has announced the ICLR 2022 Outstanding Paper Awards. The selection committee consisted of Andreas Krause (ETH-Zurich), Atlas Wang (UT Austin), Been Kim (Google Brain), Bo Li (University of Illinois Urbana-Champaign), Bohyung Han (Seoul National University), He He (New York University), and Zaid Harchaoui (University of Washington).
The outstanding papers are listed below:
Analytic-DPM: an Analytic Estimate of the Optimal Reverse Variance in Diffusion Probabilistic Models
- Bo Zhang, Department of Computer Science & Technology, Institute for AI, Tsinghua-Huawei Joint Center for AI
- Fan Bao, Department of Computer Science & Technology, Institute for AI, Tsinghua-Huawei Joint Center for AI
- Chongxuan Li, Gaoling School of Artificial Intelligence, Renmin University of China, Beijing
- Jun Zhu, Department of Computer Science & Technology, Institute for AI, Tsinghua-Huawei Joint Center for AI
Diffusion probabilistic models (DPMs), first proposed by Sohl-Dickstein et al. (2015), are a class of generative models. Inference with DPMs is expensive because it iterates over thousands of timesteps, and the variance of the reverse process has to be estimated at each of them. Until now, most work has used a handcrafted value for all timesteps (Nichol & Dhariwal, 2021).
Here, the researchers of the paper titled Analytic-DPM: an Analytic Estimate of the Optimal Reverse Variance in Diffusion Probabilistic Models have proposed Analytic-DPM, a “training-free inference framework that estimates the analytic forms of the variance and Kullback–Leibler divergence (KL divergence) using the Monte Carlo method and a pretrained score-based model.” The researchers also said Analytic-DPM applies to various DPMs (Ho et al., 2020; Song et al., 2020a; Nichol & Dhariwal, 2021) in a plug-and-play manner. Analytic-DPM improves the log-likelihood of various DPMs, produces high-quality samples, and delivers a 20× to 80× speed-up.
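For context, here is a minimal sketch of what a handcrafted reverse variance can look like. It computes the two fixed choices commonly used in DDPM-style samplers (Ho et al., 2020) from a noise schedule. The linear schedule, its endpoints, and the function names are assumptions made for illustration; this is not Analytic-DPM's estimator.

```python
def linear_betas(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (assumed here; a commonly used default)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def handcrafted_variances(betas):
    """Return the two usual fixed choices for the reverse variance.

    large[t] = beta_t
    small[t] = beta_t * (1 - alpha_bar_{t-1}) / (1 - alpha_bar_t)
    """
    large, small = [], []
    alpha_bar_prev = 1.0  # cumulative product before the first step
    for beta in betas:
        alpha_bar = alpha_bar_prev * (1.0 - beta)
        large.append(beta)
        small.append(beta * (1.0 - alpha_bar_prev) / (1.0 - alpha_bar))
        alpha_bar_prev = alpha_bar
    return large, small


betas = linear_betas()
large, small = handcrafted_variances(betas)
# In relative terms the two choices differ most at early timesteps
# and nearly coincide at the last one.
print(large[1], small[1])
print(large[-1], small[-1])
```

Neither choice depends on the data or on the trained model, which is the gap an analytic estimate of the optimal variance is meant to close.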
Hyperparameter Tuning with Renyi Differential Privacy
- Nicolas Papernot, Google Research, Brain Team
- Thomas Steinke, Google Research, Brain Team
Differential privacy is a system for publicly sharing information about a dataset by disclosing patterns in the groups but not revealing information about individual entities in the dataset.
Noisy (stochastic) gradient descent is a popular method for ensuring differential privacy (Song et al., 2013; Bassily et al., 2014; Abadi et al., 2016). In the paper titled Hyperparameter Tuning with Renyi Differential Privacy, the researchers pointed out that DP-SGD differs from standard gradient descent in three ways:
- The gradients are computed on a per-example basis.
- Each individual gradient is clipped so that its 2-norm is bounded.
- Gaussian noise is added to the gradients.
Together, these changes bound the sensitivity of each update, so that the added noise ensures differential privacy. The researchers showed how setting hyperparameters based on non-private training runs could leak private information. The team also provided privacy guarantees for hyperparameter search procedures within the framework of Renyi Differential Privacy. The results improve on and extend the work of Liu and Talwar (STOC 2019).
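The three DP-SGD modifications can be sketched in a few lines. The example below is a minimal illustration on least-squares regression using only the Python standard library; the function names, the toy batch, and the parameter values are assumptions, and the sketch does no privacy accounting.

```python
import math
import random


def clip(grad, max_norm):
    """Scale a gradient vector so that its 2-norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(weights, batch, lr, max_norm, noise_multiplier, rng):
    """One noisy SGD step for least-squares regression on (x, y) pairs."""
    summed = [0.0] * len(weights)
    for x, y in batch:
        # 1. Per-example gradient of 0.5 * (w.x - y)^2.
        error = sum(w * xi for w, xi in zip(weights, x)) - y
        grad = [error * xi for xi in x]
        # 2. Clip each example's gradient before aggregation.
        grad = clip(grad, max_norm)
        summed = [s + g for s, g in zip(summed, grad)]
    # 3. Add Gaussian noise scaled to the clipping bound, then average.
    sigma = noise_multiplier * max_norm
    noisy = [(s + rng.gauss(0.0, sigma)) / len(batch) for s in summed]
    return [w - lr * g for w, g in zip(weights, noisy)]


rng = random.Random(0)
batch = [([1.0, 2.0], 3.0), ([0.5, -1.0], 0.0), ([2.0, 0.0], 1.0)]
weights = dp_sgd_step([0.0, 0.0], batch, lr=0.1, max_norm=1.0,
                      noise_multiplier=1.1, rng=rng)
print(weights)
```

Because every clipped gradient has norm at most `max_norm`, no single example can move the summed gradient by more than that amount, which is why the noise scale is tied to the clipping bound.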
Learning Strides in Convolutional Neural Networks
- Olivier Teboul, Google Research
- David Grangier, Google Research
- Neil Zeghidour, Google Research
CNNs have wide applications in image and text classification, speech recognition, translation, etc. The paper, Learning Strides in Convolutional Neural Networks, addresses a critical issue in using CNNs: setting the strides in a principled way rather than by trial and error.
What is a stride?
DeepAI defines “stride” as a neural network’s filter parameter that modifies the amount of movement over the image or video.
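To make the definition concrete, here is a small sketch of a 1-D convolution with a fixed integer stride. The function name and the toy signal are illustrative assumptions; it shows a conventional, non-learnable stride, not DiffStride.

```python
def conv1d(signal, kernel, stride=1):
    """Valid (no padding) 1-D cross-correlation with a fixed integer stride."""
    span = len(signal) - len(kernel) + 1
    return [
        sum(k * signal[start + j] for j, k in enumerate(kernel))
        for start in range(0, span, stride)
    ]


signal = [1, 2, 3, 4, 5, 6, 7, 8]
kernel = [1, 0, -1]
print(conv1d(signal, kernel, stride=1))  # 6 outputs
print(conv1d(signal, kernel, stride=2))  # 3 outputs
```

Doubling the stride halves the number of outputs, which is why the stride acts as a downsampling hyperparameter that normally has to be chosen by hand.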
Inspired by the work titled Spectral Representations for Convolutional Neural Networks, the researchers proposed “DiffStride, the first downsampling layer with learnable strides”. Instead of cropping with a fixed bounding box controlled by a striding hyperparameter, DiffStride learns the size of its cropping box through backpropagation.
Expressiveness and Approximation Properties of Graph Neural Networks
- Floris Geerts, Department of Computer Science, University of Antwerp, Belgium
- Juan L. Reutter, School of Engineering, Pontificia Universidad Catolica de Chile, Chile & IMFD
The separation power of GNN architectures is typically characterised in terms of graph algorithms such as color refinement (CR) and the k-dimensional Weisfeiler-Leman tests (k-WL). Understanding the separation power of a given GNN architecture usually requires complex proofs tailored to the specifics of that architecture.
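As an illustration of what separation power means, the sketch below implements plain color refinement on small undirected graphs and checks whether it can tell two graphs apart. The graph encoding (a dictionary of neighbor lists) and all function names are assumptions made for this example; it is not the tensor-language technique from the paper.

```python
from collections import Counter


def refine(adjacency):
    """Run color refinement until the coloring is stable; return {node: color}."""
    colors = {node: 0 for node in adjacency}  # uniform initial coloring
    while True:
        # Signature: own color plus the sorted multiset of neighbor colors.
        signatures = {
            node: (colors[node], tuple(sorted(colors[n] for n in adjacency[node])))
            for node in adjacency
        }
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        if len(palette) == len(set(colors.values())):
            return colors  # no class was split, so the partition is stable
        colors = {node: palette[signatures[node]] for node in adjacency}


def cr_distinguishes(graph_a, graph_b):
    """True if color refinement gives the two graphs different color histograms."""
    # Refine both graphs jointly so that colors are comparable across them.
    union = {("a", v): [("a", u) for u in nbrs] for v, nbrs in graph_a.items()}
    union.update({("b", v): [("b", u) for u in nbrs] for v, nbrs in graph_b.items()})
    colors = refine(union)
    hist_a = Counter(c for (side, _), c in colors.items() if side == "a")
    hist_b = Counter(c for (side, _), c in colors.items() if side == "b")
    return hist_a != hist_b


def cycle(n, offset=0):
    """Cycle graph on n nodes labeled offset .. offset + n - 1."""
    return {offset + i: [offset + (i - 1) % n, offset + (i + 1) % n] for i in range(n)}


hexagon = cycle(6)
two_triangles = {**cycle(3), **cycle(3, offset=3)}
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

print(cr_distinguishes(hexagon, two_triangles))  # False: both are 2-regular
print(cr_distinguishes(hexagon, path))           # True: degrees differ
```

The hexagon and the pair of triangles are not isomorphic, yet color refinement cannot separate them, which is the kind of limitation that results on separation power make precise.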
In the paper titled, Expressiveness and Approximation Properties of Graph Neural Networks, the researchers proposed a tensor language-based technique to analyse the separation power of general GNNs. The approach also provides a toolbox with which GNN architecture designers can analyse the separation power of their GNNs without the need to figure out the intricacies of the WL-tests.
Comparing Distributions by Measuring Differences that Affect Decision Making
- Shengjia Zhao, Department of Computer Science, Stanford University
- Abhishek Sinha, Department of Computer Science, Stanford University
- Yutong He, Department of Computer Science, Stanford University
- Aidan Perreault, Department of Computer Science, Stanford University
- Jiaming Song, Department of Computer Science, Stanford University
- Stefano Ermon, Department of Computer Science, Stanford University
Quantifying the discrepancy between two probability distributions is a fundamental problem in machine learning. The paper, Comparing Distributions by Measuring Differences that Affect Decision Making, introduces a new class of discrepancies based on the optimal loss for a decision task. By suitably choosing the decision task, it generalises the Jensen-Shannon divergence and the maximum mean discrepancy family.
By applying this approach to two-sample tests and various benchmarks, the team has achieved superior test power compared to competing methods.
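One way to see the link between discrepancies and optimal losses is the Jensen-Shannon divergence itself. Shannon entropy is the best achievable expected log loss when predicting a draw from a distribution, and the Jensen-Shannon divergence is the gap between the entropy of the mixture and the average entropy of the two distributions. The sketch below covers only this standard special case for discrete distributions; the function names are assumptions, and it is not the paper's general construction.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence written as an entropy gap."""
    mixture = [(a + b) / 2 for a, b in zip(p, q)]
    return entropy(mixture) - (entropy(p) + entropy(q)) / 2


p = [0.5, 0.5, 0.0]
q = [0.0, 0.5, 0.5]
print(js_divergence(p, p))  # 0.0 for identical distributions
print(js_divergence(p, q))
print(js_divergence([1.0, 0.0], [0.0, 1.0]))  # log(2), the maximum
```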
Neural Collapse Under MSE Loss
- David L. Donoho, Stanford University
- X.Y. Han, Cornell University
- Vardan Papyan, University of Toronto
The paper titled Neural Collapse Under MSE Loss: Proximity to and Dynamics on the Central Path studies neural collapse, in which last-layer features collapse to their class-means while the class-means themselves converge to the vertices of a Simplex Equiangular Tight Frame. The classifier's behaviour then collapses to the nearest-class-mean decision rule.
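The geometry described above can be checked numerically. The sketch below builds a standard Simplex Equiangular Tight Frame for K classes (K unit vectors with pairwise cosine -1/(K-1)) and applies a nearest-class-mean rule to a perturbed feature. The construction is the textbook one; the function names and the toy feature vector are assumptions for illustration.

```python
import math


def simplex_etf(num_classes):
    """Rows are K unit vectors in R^K with pairwise cosine -1/(K-1)."""
    k = num_classes
    scale = math.sqrt(k / (k - 1))
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
        for i in range(k)
    ]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def nearest_class_mean(feature, class_means):
    """Predict the class whose mean is closest in Euclidean distance."""
    def sq_dist(mean):
        return sum((f - m) ** 2 for f, m in zip(feature, mean))
    return min(range(len(class_means)), key=lambda c: sq_dist(class_means[c]))


means = simplex_etf(4)
print(dot(means[0], means[0]))  # squared norm of a class-mean
print(dot(means[0], means[1]))  # cosine between two different class-means

# A feature that has almost collapsed onto the mean of class 2.
feature = [m + 0.05 for m in means[2]]
print(nearest_class_mean(feature, means))
```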
The paper proposes a new theoretical construct, the “central path”, on which the linear classifier stays MSE-optimal for the feature activations throughout the dynamics.
Bootstrapped Meta-Learning
- Sebastian Flennerhag, Research Scientist at DeepMind
- Yannick Schroecker, Research Scientist at DeepMind
- Tom Zahavy, Senior Research Scientist at DeepMind
- Hado van Hasselt, Senior Staff Research Scientist at DeepMind
- Satinder Singh Baveja, Research Scientist at DeepMind
- David Silver, Principal Research Scientist, DeepMind
Meta-learning essentially means ‘learning to learn’. The paper titled Bootstrapped Meta-Learning outlines the challenges that crop up in meta-learning. An update rule must first be applied before it can be evaluated, which often comes with high computational costs, and several difficulties in meta-optimisation degrade performance. To address this, the researchers have proposed an algorithm that lets the meta-learner teach itself.
The algorithm first bootstraps a target from the meta-learner and then optimises the meta-learner by minimising the distance to that target under a chosen (pseudo) metric. The bootstrapping mechanism can extend the effective meta-learning horizon without requiring backpropagation through all updates. The researchers achieved a new state of the art for model-free agents on the Atari ALE benchmark and showed that the approach yields both performance and efficiency gains in multi-task meta-learning.
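The bootstrap-then-match idea can be illustrated with a deliberately tiny example. The sketch below meta-learns the step size of gradient descent on a 1-D quadratic: it takes one learner step, builds a target by taking one further step from the result, treats that target as a constant, and moves the step size to reduce the squared distance to it. Everything here (the objective, the squared distance, the single-step horizon, the hand-derived gradient, and all names) is an assumption chosen for readability, not the algorithm as specified in the paper.

```python
def loss_grad(x):
    """Gradient of the toy objective f(x) = 0.5 * x**2."""
    return x


def bootstrapped_meta_step(x, lr, meta_lr, target_lr=0.5):
    """One meta-update of the learner's step size `lr` (all names hypothetical)."""
    # Learner update with the current meta-parameter.
    x_next = x - lr * loss_grad(x)
    # Bootstrap a target by taking one further step from x_next.
    # The target is treated as a constant: no gradient flows through it.
    target = x_next - target_lr * loss_grad(x_next)
    # Meta-loss: squared distance between x_next(lr) and the fixed target.
    # Since d(x_next)/d(lr) = -loss_grad(x), the meta-gradient is:
    meta_grad = 2.0 * (x_next - target) * (-loss_grad(x))
    return x_next, lr - meta_lr * meta_grad


x, lr = 1.0, 0.01
for step in range(5):
    x, lr = bootstrapped_meta_step(x, lr, meta_lr=0.1)
    print(step, round(x, 4), round(lr, 4))
```

In this toy setting the target always lies further along the descent direction than the learner's own step, so the meta-update keeps increasing a step size that started out too small.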