Top 14 Machine Learning Research Papers Of 2019

More than 14,000 papers are published in artificial intelligence each year, and the field attracts some of the most productive research groups in the world.

AI conferences like NeurIPS, ICML, ICLR, ACL and MLDS, among others, attract scores of interesting papers every year. The year 2019 saw an increase in the number of submissions.



via Oreilly

This year also saw notable trends, among them a 194% increase in the usage of PyTorch as a framework for research.

The papers published this year included exceptional breakthroughs, ingenious architectures and thought-provoking satire.

Single Headed Attention RNN: Stop Thinking With Your Head 

Stephen Merity

November 2019

In this work of art, the Harvard grad author, Stephen “Smerity” Merity, investigated the current state of NLP, the models being used and other alternate approaches. In this process, he tears down the conventional methods from top to bottom, including etymology.

The author also voices the need for a Moore’s Law for machine learning that encourages a minicomputer future. He announces plans to rebuild the codebase from the ground up, both as an educational tool for others and as a strong platform for future work in academia and industry.

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

Mingxing Tan and Quoc V. Le 

November 2019

In this work, the authors propose a compound scaling method that scales the depth, width and input resolution of a network together, rather than tuning each dimension independently.

Convolutional Neural Networks (CNNs) are at the heart of many machine vision applications.

EfficientNets are reported to surpass state-of-the-art accuracy with up to 10x better efficiency (smaller and faster).
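The compound scaling idea can be illustrated with a short sketch. It assumes the base multipliers reported in the EfficientNet paper (1.2 for depth, 1.1 for width, 1.15 for resolution); the function names, the baseline values and the 224-pixel starting resolution are illustrative choices, not the released model configurations.

```python
# Compound scaling sketch: a single coefficient phi grows depth, width and
# input resolution together. ALPHA, BETA and GAMMA are the base multipliers
# reported in the paper; everything else here is an illustrative assumption.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scale(phi, base_depth=1.0, base_width=1.0, base_resolution=224):
    """Return (depth multiplier, width multiplier, resolution) for a given phi."""
    depth = base_depth * ALPHA ** phi
    width = base_width * BETA ** phi
    resolution = round(base_resolution * GAMMA ** phi)
    return depth, width, resolution


def flops_multiplier(phi):
    """Approximate cost growth: FLOPs scale with depth * width^2 * resolution^2."""
    return (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi


if __name__ == "__main__":
    for phi in range(4):
        depth, width, resolution = compound_scale(phi)
        print(phi, round(depth, 2), round(width, 2), resolution,
              round(flops_multiplier(phi), 2))
```

Because the product of depth, squared width and squared resolution multipliers is close to 2, each unit increase of phi roughly doubles the compute budget in this sketch.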

Deep Double Descent By OpenAI

Mikhail Belkin, Daniel Hsu, Siyuan Ma, Soumik Mandal

September 2019

In this paper, the authors attempt to reconcile classical understanding and modern practice within a unified performance curve.

The “double descent” curve subsumes the classic U-shaped bias-variance trade-off curve by showing how increasing model capacity beyond the point of interpolation results in improved performance.

The Lottery Ticket Hypothesis

Jonathan Frankle, Michael Carbin

March 2019

Neural network pruning techniques can reduce the parameter counts of trained networks by over 90%, decreasing storage requirements and improving computational performance of inference without compromising accuracy. 

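The authors find that a standard pruning technique naturally uncovers subnetworks whose initializations made them capable of training effectively. Based on these results, they introduce the “lottery ticket hypothesis”: dense, randomly initialized networks contain subnetworks (“winning tickets”) that, when trained in isolation, reach test accuracy comparable to the original network in a similar number of iterations.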
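The prune-and-reset step behind this result can be sketched in a few lines. This is a minimal illustration on a flat list of weights, assuming one-shot magnitude pruning; the function names are hypothetical and the training loop itself is left out.

```python
def magnitude_mask(trained_weights, prune_fraction):
    """Return a 0/1 mask that drops the smallest-magnitude trained weights."""
    n_prune = int(len(trained_weights) * prune_fraction)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(trained_weights)),
                   key=lambda i: abs(trained_weights[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(trained_weights))]


def winning_ticket(initial_weights, trained_weights, prune_fraction):
    """Keep the surviving connections, reset to their ORIGINAL initial values."""
    mask = magnitude_mask(trained_weights, prune_fraction)
    ticket = [w0 * m for w0, m in zip(initial_weights, mask)]
    return ticket, mask


if __name__ == "__main__":
    initial = [0.5, -0.3, 0.1, 0.8]      # hypothetical initialization
    trained = [0.05, -1.2, 0.9, -0.01]   # hypothetical weights after training
    print(winning_ticket(initial, trained, prune_fraction=0.5))
```

The candidate subnetwork is then retrained from these reset values with the mask held fixed, which is what distinguishes this procedure from simply fine-tuning a pruned network.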

On The Measure Of Intelligence 

Francois Chollet

November 2019

This work summarizes and critically assesses the definitions of intelligence and evaluation approaches, while making apparent the historical conceptions of intelligence that have implicitly guided them.

The author, who is also the creator of Keras, introduces a formal definition of intelligence based on Algorithmic Information Theory. Using this definition, he proposes a set of guidelines for what a general AI benchmark should look like.

Zero-Shot Word Sense Disambiguation Using Sense Definition Embeddings via IISc Bangalore & CMU

Sawan Kumar, Sharmistha Jat, Karan Saxena and Partha Talukdar

August 2019

Word Sense Disambiguation (WSD) is a longstanding but open problem in Natural Language Processing (NLP). Current supervised WSD methods treat senses as discrete labels and also resort to predicting the Most-Frequent-Sense (MFS) for words unseen during training.

The researchers from IISc Bangalore in collaboration with Carnegie Mellon University propose Extended WSD Incorporating Sense Embeddings (EWISE), a supervised model to perform WSD by predicting over a continuous sense embedding space as opposed to a discrete label space.
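To make the difference between a discrete label space and a continuous sense embedding space concrete, here is a toy sketch. The two-dimensional vectors and sense names are made up for illustration; in a real system the context vector and the definition embeddings would come from trained encoders.

```python
def dot(u, v):
    """Plain dot product between two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def predict_sense(context_vector, sense_embeddings):
    """Pick the candidate sense whose embedding best matches the context.

    sense_embeddings maps a sense name to a vector derived from its
    definition, so a sense never seen in training can still be scored.
    """
    return max(sense_embeddings,
               key=lambda sense: dot(context_vector, sense_embeddings[sense]))


if __name__ == "__main__":
    # Hypothetical embeddings of the definitions of two senses of "bank".
    senses = {
        "bank (financial institution)": [1.0, 0.0],
        "bank (side of a river)": [0.0, 1.0],
    }
    context = [0.2, 0.9]  # hypothetical encoding of "sat on the bank fishing"
    print(predict_sense(context, senses))
```

Since scoring only needs an embedding for each candidate sense, adding a new sense means adding one vector rather than retraining a classifier with one more output label.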

Deep Equilibrium Models 

Shaojie Bai, J. Zico Kolter and Vladlen Koltun 

October 2019 

Motivated by the observation that the hidden layers of many existing deep sequence models converge towards some fixed point, the researchers at Carnegie Mellon University present a new approach to modeling sequential data: the deep equilibrium model (DEQ).

Using this approach, training and prediction in these networks require only constant memory, regardless of the effective “depth” of the network.
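The fixed-point idea can be sketched with a scalar toy layer. This example uses naive repeated application of the layer as a stand-in for the solver in the paper, and it does not reproduce the implicit backward pass; the layer, its weights `w` and `u`, and the function names are all hypothetical.

```python
import math


def layer(z, x, w=0.5, u=1.0):
    """One weight-tied layer: the same transformation applied at every 'depth'."""
    return math.tanh(w * z + u * x)


def solve_equilibrium(x, tol=1e-10, max_iter=1000):
    """Iterate z <- layer(z, x) until it stops changing.

    Only the current value of z is stored, so memory use does not grow
    with the number of iterations (the effective depth).
    """
    z = 0.0
    for _ in range(max_iter):
        z_next = layer(z, x)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z


if __name__ == "__main__":
    z_star = solve_equilibrium(0.3)
    # At equilibrium, applying the layer once more leaves z unchanged.
    print(z_star, abs(layer(z_star, 0.3) - z_star))
```

The toy layer is a contraction because `w` is smaller than 1 in magnitude, so the iteration is guaranteed to converge here; that guarantee is a property of this example, not of equilibrium models in general.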

ImageNet-Trained CNNs are Biased Towards Texture

Robert G, Patricia R, Claudio M, Matthias Bethge, Felix A. W and Wieland B

September 2019

Convolutional Neural Networks (CNNs) are commonly thought to recognise objects by learning increasingly complex representations of object shapes. The authors of this paper evaluate CNNs and human observers on images with a texture-shape cue conflict. They show that ImageNet-trained CNNs are strongly biased towards recognising textures rather than shapes, which is in stark contrast to human behavioural evidence.

A Geometric Perspective on Optimal Representations for Reinforcement Learning 

Marc G. B , Will D , Robert D , Adrien A T , Pablo S C , Nicolas Le R , Dale S, Tor L, Clare L

June 2019

The authors propose a new perspective on representation learning in reinforcement learning based on geometric properties of the space of value functions. This work shows that adversarial value functions exhibit interesting structure, and are good auxiliary tasks when learning a representation of an environment. The authors expect this work to open up the possibility of automatically generating auxiliary tasks in deep reinforcement learning.

Weight Agnostic Neural Networks 

Adam Gaier & David Ha

September 2019

In this work, the authors explore whether neural network architectures alone, without learning any weight parameters, can encode solutions for a given task. They propose a search method for neural network architectures that can already perform a task without any explicit weight training.

Stand-Alone Self-Attention in Vision Models 

Prajit Ramachandran, Niki P, Ashish Vaswani, Irwan Bello, Anselm Levskaya, Jonathon S

June 2019

In this work, the Google researchers verify that content-based interactions can serve as a building block of vision models in their own right, not only as an add-on to convolutions. The proposed stand-alone local self-attention layer achieves competitive predictive performance on ImageNet classification and COCO object detection tasks while requiring fewer parameters and floating-point operations than the corresponding convolution baselines. Results show that attention is especially effective in the later parts of the network.
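A local self-attention layer can be sketched as follows. This is a deliberately simplified one-dimensional version: it omits the learned query, key and value projections and any positional information, and the function names and the `radius` parameter are illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def local_self_attention(features, radius=1):
    """Each position attends only to neighbours within `radius` of itself.

    features is a list of vectors (lists of floats), one per position.
    """
    outputs = []
    for i, query in enumerate(features):
        lo, hi = max(0, i - radius), min(len(features), i + radius + 1)
        neighbours = features[lo:hi]
        # Content-based weights: similarity between this position and each neighbour.
        scores = [sum(q * k for q, k in zip(query, key)) for key in neighbours]
        weights = softmax(scores)
        # Output is the attention-weighted average of the neighbouring vectors.
        outputs.append([sum(w * v[d] for w, v in zip(weights, neighbours))
                        for d in range(len(query))])
    return outputs


if __name__ == "__main__":
    print(local_self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]))
```

Unlike a convolution, whose weights are fixed per offset, the weights here are recomputed from the content at every position.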

High-Fidelity Image Generation With Fewer Labels 

Mario Lucic, Michael Tschannen, Marvin Ritter, Xiaohua Z, Olivier B and Sylvain Gelly 

March 2019

Modern-day models can produce high-quality, close-to-real images when fed with a vast quantity of labelled data. To reduce this dependency on labelled data, researchers from Google released this work to demonstrate how one can benefit from recent work on self- and semi-supervised learning to outperform the state of the art on both unsupervised ImageNet synthesis and in the conditional setting.

The proposed approach is able to match the sample quality of the current state-of-the-art conditional model BigGAN on ImageNet using only 10% of the labels and outperform it using 20% of the labels.

ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations

Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin G, Piyush Sharma and Radu S

September 2019

The authors present two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT. The techniques address the challenges posed by increasing model size: GPU/TPU memory limitations, longer training times and unexpected model degradation.

As a result, this proposed model establishes new state-of-the-art results on the GLUE, RACE, and SQuAD benchmarks while having fewer parameters compared to BERT-large.
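The article does not name the two techniques; in the ALBERT paper they are a factorized embedding parameterization and cross-layer parameter sharing. The arithmetic behind the first can be sketched as below, where the vocabulary size, hidden size and embedding size are illustrative values and the function names are hypothetical.

```python
def embedding_params(vocab_size, hidden_size):
    """Parameters in a single vocab-by-hidden embedding matrix."""
    return vocab_size * hidden_size


def factorized_embedding_params(vocab_size, embedding_size, hidden_size):
    """Parameters when a small embedding is projected up to the hidden size."""
    return vocab_size * embedding_size + embedding_size * hidden_size


if __name__ == "__main__":
    V, H, E = 30000, 1024, 128  # illustrative sizes, assumed for this sketch
    print(embedding_params(V, H))                # 30720000
    print(factorized_embedding_params(V, E, H))  # 3971072
```

With these sizes the factorized form needs roughly an eighth of the embedding parameters, and the saving grows as the hidden size increases.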

GauGANs-Semantic Image Synthesis with Spatially-Adaptive Normalization 

Taesung Park, Ming-Yu Liu, Ting-Chun Wang and Jun-Yan Zhu

November 2019

Nvidia in collaboration with UC Berkeley and MIT proposed a model which has a spatially-adaptive normalization layer for synthesizing photorealistic images given an input semantic layout.

This model retains visual fidelity and alignment with challenging input layouts while allowing the user to control both semantics and style.
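The core of a spatially-adaptive normalization layer can be sketched on a tiny grid. As a simplification, the per-pixel scale and shift are looked up from a table keyed by semantic label rather than produced by learned convolutions over the layout; the labels, values and function name are hypothetical.

```python
import math


def spade_normalize(activations, layout, gamma, beta, eps=1e-5):
    """Normalize activations, then modulate each pixel by its semantic label.

    activations: 2-D list of floats (a single channel).
    layout:      2-D list of labels with the same shape.
    gamma, beta: dicts mapping a label to its scale and shift.
    """
    flat = [a for row in activations for a in row]
    mean = sum(flat) / len(flat)
    var = sum((a - mean) ** 2 for a in flat) / len(flat)
    std = math.sqrt(var + eps)
    # The scale and shift vary across the image, following the layout.
    return [[gamma[label] * (a - mean) / std + beta[label]
             for a, label in zip(act_row, label_row)]
            for act_row, label_row in zip(activations, layout)]


if __name__ == "__main__":
    acts = [[1.0, 3.0], [1.0, 3.0]]
    layout = [["sky", "sky"], ["grass", "grass"]]
    print(spade_normalize(acts, layout,
                          gamma={"sky": 1.0, "grass": 2.0},
                          beta={"sky": 0.0, "grass": 10.0}))
```

Because the modulation depends on the layout at every pixel, the semantic information is reapplied after normalization instead of being washed out by it.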

Ram Sagar
I have a master's degree in Robotics and I write about machine learning advancements.
