The International Conference on Machine Learning (ICML) 2021 is back with its 38th edition, being held virtually from July 18 – 24, 2021. The conference features paper presentations on all topics related to machine learning.
Recent work from Vitaly Feldman, Audra McMillan and Kunal Talwar shows how random shuffling strengthens the differential privacy guarantees of locally randomised data. Such amplification implies significantly stronger privacy guarantees for systems where data is contributed anonymously, and it has sparked interest in the shuffle model of privacy. The researchers provide a new analysis of privacy amplification by shuffling.
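The shuffle model composes a local randomiser with an anonymising shuffler that breaks the link between reports and the clients who sent them. A minimal Python sketch of that pipeline, using classic binary randomised response as the local randomiser (the function names and the choice of randomiser are illustrative, not the paper's construction or its amplification analysis):

```python
import math
import random

def randomized_response(bit, epsilon0):
    """Local randomiser: report the true bit with probability
    e^eps0 / (e^eps0 + 1), otherwise flip it. This satisfies
    epsilon0-local differential privacy."""
    p_true = math.exp(epsilon0) / (math.exp(epsilon0) + 1)
    return bit if random.random() < p_true else 1 - bit

def shuffled_reports(bits, epsilon0):
    """Each client randomises locally; the shuffler then permutes the
    reports, discarding any association between report and client.
    Amplification results show the shuffled output enjoys a much
    stronger central-DP guarantee than epsilon0 alone."""
    reports = [randomized_response(b, epsilon0) for b in bits]
    random.shuffle(reports)
    return reports
```

The server only ever sees the anonymised multiset of reports, which is exactly what the amplification-by-shuffling analysis reasons about.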
Gavin Brown, Mark Bun, Vitaly Feldman, Adam Smith and Kunal Talwar conducted the research. Training algorithms operate on a huge range of prediction tasks, from image classification to language translation, often involving highly sensitive data. To succeed, models must contain information about the data they were trained on, so such models can fairly be said to memorise at least part of their training data. Commonly, however, memorisation is an implicit, unintended side effect. In this paper, the researchers aim to understand when this sort of memorisation is unavoidable. They present natural prediction problems in which every reasonably accurate training algorithm must encode, in the prediction model, information about a large subset of its training examples.
Etai Littwin, Omid Saremi, Shuangfei Zhai, Vimal Thilak, Hanlin Goh, Joshua M. Susskind and Greg Yang investigated the effect of applying a bottleneck in an otherwise infinite width network. “The theoretical analysis reveals novel insights regarding the behaviour of input-output Jacobians, both at initialisation and training. Though stated for shallow, single hidden layer networks post bottleneck, we expect our results to hold in more general cases. Empirically, we observe that infinite width networks with bottlenecks train much faster than their fully infinite counterparts, while typically achieving better overall performance.”
Researchers including Shih-Yu Sun, Vimal Thilak, Etai Littwin, Omid Saremi and Joshua M. Susskind studied implicit regularisation induced by deep linear networks at autoencoder bottlenecks, revealing that latent codes are biased towards low-rank structures through greedy learning. Further, they showed that orthogonal initialisation removes prior spectral bias and significantly improves training stability across linear network depths when combined with principled learning speed adjustment.
Mitchell Wortsman, Maxwell Horton, Carlos Guestrin, Ali Farhadi and Mohammad Rastegari have studied the neural network optimisation landscape. Recent observations have surfaced the existence of paths of high accuracy containing diverse solutions, and of wider minima offering improved performance. The researchers aim to leverage both of these properties with a single method, in a single training run, as opposed to previous methods that require multiple training runs.
Locally Differentially Private (LDP) reports are often used for the collection of statistics and for machine learning. The best-known LDP algorithms frequently necessitate transmitting unreasonably large messages from the client to the server (such as when constructing histograms over large domains or learning a high-dimensional model). The communication cost of LDP algorithms can be reduced, but doing so typically results in utility loss. Vitaly Feldman and Kunal Talwar have come up with a general approach that compresses every efficient LDP algorithm with low loss in privacy and utility guarantees, under standard cryptographic assumptions.
Researchers including Hilal Asi, John Duchi, Alireza Fallah, Omid Javidbakht and Kunal Talwar introduce Pagan (Private AdaGrad with Adaptive Noise), a new differentially private variant of stochastic gradient descent and AdaGrad. They propose a new private adaptive optimisation algorithm analogous to AdaGrad, showing that under certain natural distributional assumptions on the problem (similar to those that separate AdaGrad from non-adaptive methods [LD19]), the private versions of adaptive methods significantly outperform standard non-adaptive private algorithms.
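The general recipe behind private adaptive optimisers can be sketched as: clip each gradient, add Gaussian noise, and precondition coordinates by the accumulated squared (noisy) gradients, as AdaGrad does. The toy step below is only illustrative; the actual Pagan algorithm and its noise calibration follow the paper's analysis, and all hyperparameter names here are assumptions:

```python
import math
import random

def private_adagrad_step(w, grad, state, lr=0.1, clip=1.0, sigma=1.0, eps=1e-8):
    """One illustrative DP-AdaGrad-style update.
    w     : current parameter vector (list of floats)
    grad  : raw gradient
    state : running sum of squared noisy gradients (AdaGrad accumulator)
    sigma : Gaussian noise multiplier (0 makes the step deterministic)."""
    # Clip the gradient to bound each example's sensitivity.
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip / (norm + 1e-12))
    # Add Gaussian noise calibrated to the clipping threshold.
    noisy = [g * scale + random.gauss(0.0, sigma * clip) for g in grad]
    # AdaGrad accumulator and per-coordinate preconditioned step.
    state = [s + g * g for s, g in zip(state, noisy)]
    w = [wi - lr * g / (math.sqrt(si) + eps)
         for wi, g, si in zip(w, noisy, state)]
    return w, state
```

The adaptive preconditioner is what the paper's distributional assumptions exploit: on problems where AdaGrad beats plain SGD, its noisy counterpart retains much of that advantage.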
Convex optimisation is one of the most well-studied problems in private data analysis. Existing works have largely studied optimisation problems over ℓ2-bounded domains. However, several machine learning applications, such as LASSO and minimisation over the probability simplex, involve optimisation over ℓ1-bounded domains. In this work, researchers including Hilal Asi, Vitaly Feldman, Tomer Koren and Kunal Talwar study the problem of differentially private stochastic convex optimisation (DP-SCO) over ℓ1-bounded domains.
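Optimisation over ℓ1-bounded domains such as the probability simplex needs a feasibility step after each (noisy) gradient update. The sketch below shows a generic noisy-gradient step followed by the standard sort-based Euclidean projection onto the simplex; it is a hedged illustration of the setting, not the algorithm the researchers analyse, and the function names are assumptions:

```python
import random

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex
    {x : x_i >= 0, sum_i x_i = 1}, via the standard sort-based rule."""
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        css += ui
        t = (css - 1.0) / i
        if ui - t > 0:          # largest i with positive gap defines the shift
            theta = t
    return [max(x - theta, 0.0) for x in v]

def dp_sgd_simplex_step(x, grad, lr=0.1, sigma=0.0):
    """One noisy gradient step, then projection back onto the simplex.
    sigma is the Gaussian noise scale (0 makes the step deterministic)."""
    noisy = [g + random.gauss(0.0, sigma) for g in grad]
    return project_simplex([xi - lr * g for xi, g in zip(x, noisy)])
```

For example, `project_simplex([2.0, 0.0])` returns `[1.0, 0.0]`: the excess mass is shifted off until the coordinates are non-negative and sum to one.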
In this work, researchers leverage uncertainty estimation to detect and down-weight out-of-distribution (OOD) backups in the Bellman squared loss for offline RL. The proposed technique, UWAC (Uncertainty Weighted Actor-Critic), achieves superior performance and improved training stability without introducing any additional models or losses. Furthermore, they experimentally demonstrate the effectiveness of dropout-based uncertainty estimation at detecting OOD samples in offline RL, and show that UWAC can also be applied to stabilise other actor-critic methods.
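The core idea can be sketched as weighting each squared TD error by the estimated uncertainty of its target, so that likely-OOD backups contribute less to the loss. The Python sketch below uses the population variance of Monte-Carlo samples of the target Q-value as the uncertainty proxy; the weight formula and names are simplifications for illustration, not the paper's exact dropout-based estimator:

```python
import statistics

def uncertainty_weighted_bellman_loss(td_errors, q_samples, beta=1.0):
    """Down-weight each squared TD error by the variance of Monte-Carlo
    (e.g. dropout) samples of its target Q-value: high-variance targets,
    which likely correspond to OOD state-action pairs, contribute less.
    beta controls how aggressively high-variance backups are suppressed."""
    losses = []
    for err, samples in zip(td_errors, q_samples):
        var = statistics.pvariance(samples)
        weight = beta / (beta + var)      # in (0, 1]; shrinks as variance grows
        losses.append(weight * err * err)
    return sum(losses) / len(losses)
```

A backup whose Q-samples all agree keeps its full weight, while one with widely scattered samples is suppressed, which is what stabilises training on offline datasets.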