More than 14,000 artificial intelligence papers are published each year, and the field attracts some of the most productive research groups in the world.
AI conferences like NeurIPS, ICML, ICLR, ACL and MLDS, among others, attract scores of interesting papers every year. The year 2019 saw an increase in the number of submissions.
This year also saw noticeable trends, among them a 194% increase in the use of PyTorch as a research framework.
The papers published this year included exceptional breakthroughs, ingenious architectures and thought-provoking satire.
Single Headed Attention RNN: Stop Thinking With Your Head
In this work of art, the author, Harvard graduate Stephen “Smerity” Merity, investigates the current state of NLP, the models in use and alternative approaches. In the process, he tears down conventional methods from top to bottom, including their etymology.
The author also voices the need for a Moore’s Law for machine learning that encourages a minicomputer future, and announces plans to rebuild the codebase from the ground up, both as an educational tool for others and as a strong platform for future work in academia and industry.
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
Mingxing Tan and Quoc V. Le
In this work, the authors propose a compound scaling method that scales the depth, width and resolution of a network together, using a single coefficient, instead of tuning each dimension on its own.
Convolutional Neural Networks (CNNs) are at the heart of many machine vision applications.
EfficientNets are reported to surpass state-of-the-art accuracy with up to 10x better efficiency (smaller and faster).
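The compound scaling idea can be sketched in a few lines. The base coefficients below are the ones reported in the EfficientNet paper for the B0 baseline; the function name and the baseline depth, width and resolution are illustrative assumptions, not the real network configuration.

```python
# Base coefficients for depth, width and resolution (as reported for EfficientNet-B0).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scale(phi, base_depth=18, base_width=32, base_resolution=224):
    """Scale all three dimensions together with one coefficient phi.

    The base_* defaults are hypothetical placeholders for a small baseline.
    """
    depth = round(base_depth * ALPHA ** phi)
    width = round(base_width * BETA ** phi)
    resolution = round(base_resolution * GAMMA ** phi)
    return depth, width, resolution


# Compute grows roughly with depth * width^2 * resolution^2, so every unit of
# phi multiplies the cost by about ALPHA * BETA**2 * GAMMA**2 (close to 2).
flops_factor = ALPHA * BETA ** 2 * GAMMA ** 2

for phi in range(4):
    print(phi, compound_scale(phi))
```

The point of the design is that one knob, phi, decides how much extra compute to spend, while the fixed ratios decide how that budget is split across the three dimensions.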
Deep Double Descent By OpenAI
Mikhail Belkin, Daniel Hsu, Siyuan Ma, Soumik Mandal
In this paper, an attempt has been made to reconcile classical understanding and modern practice within a unified performance curve.
The “double descent” curve subsumes the classic U-shaped bias-variance trade-off curve by showing how increasing model capacity beyond the point of interpolation results in improved performance.
The Lottery Ticket Hypothesis
Jonathan Frankle, Michael Carbin
Neural network pruning techniques can reduce the parameter counts of trained networks by over 90%, decreasing storage requirements and improving computational performance of inference without compromising accuracy.
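The authors find that a standard pruning technique naturally uncovers subnetworks whose initializations made them capable of training effectively. Based on these results, they introduce the “lottery ticket hypothesis”: dense, randomly initialized, feed-forward networks contain subnetworks (“winning tickets”) that, when trained in isolation, reach test accuracy comparable to the original network in a similar number of iterations.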
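A minimal sketch of the prune-and-reset step behind this idea, assuming a flat list of weights and simple magnitude pruning. The `magnitude_mask` helper is hypothetical, and the "training" here is random noise standing in for real gradient descent.

```python
import random


def magnitude_mask(weights, prune_fraction):
    """Return a 0/1 mask that drops the smallest-magnitude weights."""
    n_prune = int(len(weights) * prune_fraction)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]


random.seed(0)
initial = [random.gauss(0.0, 1.0) for _ in range(10)]  # weights at initialization

# Stand-in for training: a real run would update the weights by gradient descent.
trained = [w + random.gauss(0.0, 0.5) for w in initial]

# Prune based on the TRAINED magnitudes...
mask = magnitude_mask(trained, prune_fraction=0.8)

# ...then reset the surviving connections to their ORIGINAL initial values.
# This masked, re-initialized network is the candidate "winning ticket".
ticket = [w0 * m for w0, m in zip(initial, mask)]

print(sum(mask), "of", len(mask), "weights kept")
```

The reset is the important detail: the surviving subnetwork is retrained from the same initial values it started with, not from fresh random ones.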
On The Measure Of Intelligence
This work summarizes and critically assesses the definitions of intelligence and evaluation approaches, while making apparent the historical conceptions of intelligence that have implicitly guided them.
The author, François Chollet, who is also the creator of Keras, introduces a formal definition of intelligence based on Algorithmic Information Theory. Using this definition, he proposes a set of guidelines for what a general AI benchmark should look like.
Zero-Shot Word Sense Disambiguation Using Sense Definition Embeddings via IISc Bangalore & CMU
Sawan Kumar, Sharmistha Jat, Karan Saxena and Partha Talukdar
Word Sense Disambiguation (WSD) is a longstanding but open problem in Natural Language Processing (NLP). Current supervised WSD methods treat senses as discrete labels and also resort to predicting the Most-Frequent-Sense (MFS) for words unseen during training.
The researchers from IISc Bangalore in collaboration with Carnegie Mellon University propose Extended WSD Incorporating Sense Embeddings (EWISE), a supervised model to perform WSD by predicting over a continuous sense embedding space as opposed to a discrete label space.
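To see why predicting in an embedding space helps with unseen senses, here is a toy sketch. The three-dimensional vectors, the sense keys and the `disambiguate` helper are all made up for illustration; in EWISE the sense embeddings are learned from definitions and the context vector comes from a trained encoder.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Hypothetical sense embeddings (real ones would be derived from definitions).
sense_embeddings = {
    "bank%finance": [0.9, 0.1, 0.0],
    "bank%river": [0.0, 0.2, 0.9],
}


def disambiguate(context_vector, candidate_senses):
    """Pick the candidate sense whose embedding best matches the context."""
    return max(candidate_senses, key=lambda s: dot(context_vector, sense_embeddings[s]))


river_context = [0.1, 0.1, 0.8]  # stand-in for an encoded sentence
print(disambiguate(river_context, ["bank%finance", "bank%river"]))

# A sense that never appeared in training only needs an embedding to be
# scored; no new output label has to be added to the model.
sense_embeddings["bank%tilt"] = [0.0, 0.9, 0.1]
tilt_context = [0.0, 1.0, 0.0]
print(disambiguate(tilt_context, list(sense_embeddings)))
```

With a discrete label space, the third sense would be unreachable without retraining the classifier; here it is just one more vector to compare against.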
Deep Equilibrium Models
Shaojie Bai, J. Zico Kolter and Vladlen Koltun
Motivated by the observation that the hidden layers of many existing deep sequence models converge towards some fixed point, the researchers at Carnegie Mellon University present a new approach to modeling sequential data: the deep equilibrium model (DEQ).
Using this approach, training and prediction in these networks require only constant memory, regardless of the effective “depth” of the network.
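The fixed-point view can be illustrated with a tiny weight-tied layer. This sketch uses naive repeated application as a stand-in for the root solver used in the paper, and it says nothing about how gradients are computed; the matrices `W` and `U` are arbitrary small values chosen so that the iteration converges.

```python
import math


def layer(z, x, W, U):
    """One weight-tied layer: z_next = tanh(W z + U x)."""
    return [
        math.tanh(
            sum(W[i][j] * z[j] for j in range(len(z)))
            + sum(U[i][k] * x[k] for k in range(len(x)))
        )
        for i in range(len(z))
    ]


def solve_equilibrium(x, W, U, tol=1e-9, max_iter=1000):
    """Apply the same layer until the hidden state stops changing."""
    z = [0.0] * len(W)
    for _ in range(max_iter):
        z_next = layer(z, x, W, U)
        if max(abs(a - b) for a, b in zip(z_next, z)) < tol:
            return z_next
        z = z_next  # only the current state is kept, never the whole stack
    return z


W = [[0.2, -0.1], [0.1, 0.3]]  # small weights, so the map is a contraction
U = [[1.0], [0.5]]
x = [0.5]

z_star = solve_equilibrium(x, W, U)
print(z_star)
```

Because only the current state is stored, the memory used here does not depend on how many times the layer is applied, which is the intuition behind the constant-memory claim.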
IMAGENET-Trained CNNs are Biased Towards Texture
Robert G, Patricia R, Claudio M, Matthias Bethge, Felix A. W and Wieland B
Convolutional Neural Networks (CNNs) are commonly thought to recognise objects by learning increasingly complex representations of object shapes. In this paper, the authors evaluate CNNs and human observers on images with a texture-shape cue conflict. They show that ImageNet-trained CNNs are strongly biased towards recognising textures rather than shapes, which is in stark contrast to human behavioural evidence.
A Geometric Perspective on Optimal Representations for Reinforcement Learning
Marc G. B, Will D, Robert D, Adrien A T, Pablo S C, Nicolas Le R, Dale S, Tor L, Clare L
The authors propose a new perspective on representation learning in reinforcement learning based on geometric properties of the space of value functions. This work shows that adversarial value functions exhibit interesting structure, and are good auxiliary tasks when learning a representation of an environment. The authors believe this work opens up the possibility of automatically generating auxiliary tasks in deep reinforcement learning.
Weight Agnostic Neural Networks
Adam Gaier & David Ha
In this work, the authors explore whether neural network architectures alone, without learning any weight parameters, can encode solutions for a given task. They propose a search method for neural network architectures that can already perform a task without any explicit weight training.
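One way to picture "architecture without weight training" is to give every connection the same shared weight and check how a topology behaves across several values of that weight. The toy task, the two topologies and the list of weight values below are illustrative assumptions, not the search procedure from the paper.

```python
import math

# Assumed sweep of shared weight values used to score an architecture.
SHARED_WEIGHTS = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]


def one_hop(x, w):
    """Input -> output through a single connection."""
    return math.tanh(w * x)


def two_hop(x, w):
    """Input -> hidden -> output, both connections using the same weight."""
    return math.tanh(w * math.tanh(w * x))


def mean_accuracy(network, inputs):
    """Average accuracy on the toy task 'is x positive?' over all shared weights."""
    scores = []
    for w in SHARED_WEIGHTS:
        correct = sum((network(x, w) > 0) == (x > 0) for x in inputs)
        scores.append(correct / len(inputs))
    return sum(scores) / len(scores)


inputs = [-3.0, -1.0, -0.2, 0.2, 1.0, 3.0]
print("one_hop:", mean_accuracy(one_hop, inputs))
print("two_hop:", mean_accuracy(two_hop, inputs))
```

The two-hop topology solves the task for every shared weight because the sign of the weight cancels out across the two connections, while the one-hop topology only works when the weight happens to be positive. The structure, not the weight value, carries the solution.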
Stand-Alone Self-Attention in Vision Models
Prajit Ramachandran, Niki P, Ashish Vaswani, Irwan Bello, Anselm Levskaya, Jonathon S
In this work, the Google researchers verify that content-based interactions can serve as the primary building block of vision models. The proposed stand-alone local self-attention layer achieves competitive predictive performance on ImageNet classification and COCO object detection tasks while requiring fewer parameters and floating-point operations than the corresponding convolution baselines. Results show that attention is especially effective in the later parts of the network.
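A rough sketch of what a local self-attention layer computes, simplified to one dimension. It assumes queries, keys and values are the raw feature vectors themselves, with no learned projections and no relative position information, so it shows the idea rather than the layer described in the paper.

```python
import math


def local_self_attention(features, radius=1):
    """Each position attends only to neighbours within `radius` steps."""
    outputs = []
    for i, query in enumerate(features):
        lo, hi = max(0, i - radius), min(len(features), i + radius + 1)
        window = features[lo:hi]

        # Content-based scores: dot product between the query and each neighbour.
        scores = [sum(q * k for q, k in zip(query, key)) for key in window]

        # Numerically stable softmax over the local window.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]

        # Output is the attention-weighted average of the neighbours.
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, window)) for d in range(len(query))]
        )
    return outputs


features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
for row in local_self_attention(features):
    print([round(v, 3) for v in row])
```

Unlike a convolution, the mixing weights here are not fixed parameters; they are recomputed from the content of each neighbourhood.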
High-Fidelity Image Generation With Fewer Labels
Mario Lucic, Michael Tschannen, Marvin Ritter, Xiaohua Z, Olivier B and Sylvain Gelly
Modern-day models can produce high-quality, close-to-reality images when fed a vast quantity of labelled data. To reduce this dependency on large labelled datasets, researchers from Google released this work to demonstrate how recent advances in self- and semi-supervised learning can be used to outperform the state of the art on unsupervised ImageNet synthesis as well as in the conditional setting.
The proposed approach is able to match the sample quality of the current state-of-the-art conditional model BigGAN on ImageNet using only 10% of the labels and outperform it using 20% of the labels.
ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations
Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin G, Piyush Sharma and Radu S
The authors present two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT. These techniques address the challenges posed by increasing model size: GPU/TPU memory limitations, longer training times, and unexpected model degradation.
As a result, this proposed model establishes new state-of-the-art results on the GLUE, RACE, and SQuAD benchmarks while having fewer parameters compared to BERT-large.
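The two techniques in the ALBERT paper are factorized embedding parameterization and cross-layer parameter sharing. The arithmetic below gives a feel for the savings; the sizes are illustrative assumptions, and the per-layer count is a rough estimate that ignores biases and layer normalization.

```python
# Illustrative sizes (assumptions, not the exact configuration from the paper).
vocab_size = 30_000
hidden_size = 768
embedding_size = 128
num_layers = 12

# 1) Factorized embeddings: go through a small embedding size E first,
#    instead of mapping the vocabulary straight to the hidden size H.
direct_embedding = vocab_size * hidden_size
factorized_embedding = vocab_size * embedding_size + embedding_size * hidden_size

# 2) Cross-layer sharing: one set of layer weights reused by every layer.
#    Rough per-layer estimate: 4*H^2 for attention plus 8*H^2 for the feed-forward block.
per_layer = 12 * hidden_size ** 2
unshared_layers = num_layers * per_layer
shared_layers = per_layer

print("embedding parameters:", direct_embedding, "->", factorized_embedding)
print("encoder parameters:  ", unshared_layers, "->", shared_layers)
```

Sharing reduces the number of stored parameters, not the amount of computation: the input still passes through the same number of layers.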
GauGAN: Semantic Image Synthesis with Spatially-Adaptive Normalization
Taesung Park, Ming-Yu Liu, Ting-Chun Wang and Jun-Yan Zhu
Nvidia in collaboration with UC Berkeley and MIT proposed a model which has a spatially-adaptive normalization layer for synthesizing photorealistic images given an input semantic layout.
This model retains visual fidelity and alignment with challenging input layouts while allowing the user to control both semantics and style.
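The core of a spatially-adaptive normalization layer can be sketched on a single feature channel: normalize the activations, then scale and shift each pixel with values that depend on the semantic label at that location. In this sketch the per-label lookup tables are a hypothetical stand-in for the small convolutional network that would produce the scale and shift from the layout.

```python
import math


def spatially_adaptive_norm(activations, layout, gamma_by_label, beta_by_label, eps=1e-5):
    """Normalize one channel, then modulate each pixel based on its label."""
    flat = [v for row in activations for v in row]
    mean = sum(flat) / len(flat)
    var = sum((v - mean) ** 2 for v in flat) / len(flat)
    std = math.sqrt(var + eps)
    return [
        [
            gamma_by_label[label] * (value - mean) / std + beta_by_label[label]
            for value, label in zip(act_row, label_row)
        ]
        for act_row, label_row in zip(activations, layout)
    ]


activations = [[1.0, 3.0], [1.0, 3.0]]
layout = [["sky", "sky"], ["grass", "grass"]]  # the input semantic layout
gamma_by_label = {"sky": 1.0, "grass": 2.0}    # hypothetical per-label scale
beta_by_label = {"sky": 0.0, "grass": 0.5}     # hypothetical per-label shift

for row in spatially_adaptive_norm(activations, layout, gamma_by_label, beta_by_label):
    print([round(v, 3) for v in row])
```

Because the scale and shift vary from pixel to pixel, the layout information is re-injected after normalization instead of being washed out by it.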