A Ten-Year-Old ML Paper That Is So Influential Today: NeurIPS Test Of Time Award Winner 2021

All four judges unanimously selected Online Learning for Latent Dirichlet Allocation as the winner

NeurIPS (Neural Information Processing Systems), one of the leading annual machine learning conferences, was held between December 6 and December 14 this year. It recognised a decade-old paper, ‘Online Learning for Latent Dirichlet Allocation’, with its test of time award.

This year, a total of 9,122 papers were submitted to the 35th edition of the conference, and 2,344 were accepted. The acceptance rate stood at 26 percent, the highest since 2013.

The top three contributing companies were Google (177 papers accepted), Microsoft (116), and DeepMind (81). MIT (142 papers accepted), Stanford University (139), and Carnegie Mellon University (117), on the other hand, were some of the top academic institutions that led the way at NeurIPS. 

Paper that stood the test of time

Among the large number of papers accepted, six were announced as outstanding submissions. Two papers received the newly introduced datasets and benchmarks best paper awards. The test of time award went to ‘Online Learning for Latent Dirichlet Allocation’, published in 2010 and authored by Matthew Hoffman, David Blei, and Francis Bach of Princeton University and INRIA.

The test of time award is given to a paper published at the NeurIPS conference ten years ago that remains influential even today. For this year’s award, the panel of judges ranked all the NeurIPS 2010 papers according to citation count.

In this category, 16 other papers were competing, and the cutoff threshold for consideration was 500 citations. The judges unanimously selected ‘Online Learning for Latent Dirichlet Allocation’ as the test of time award winner of this year’s NeurIPS.

In Brief 

Hierarchical Bayesian modelling is a statistical approach in which the model is written in multiple levels, or hierarchical form, and Bayesian methods are used to estimate the posterior distribution of its parameters. It has become an important technique in machine learning and applied statistics. Bayesian models encode assumptions about observed data, and examine the posterior distribution of model parameters and latent variables conditioned on a set of observations.

Topic modelling is a technique for finding recurring themes in a collection of documents, such as a set of reviews. Search engines, for example, can use it to focus on the most important topics in a document.

For applications like topic modelling, the posterior distribution reveals a latent semantic structure that is used in many downstream tasks. However, computing the posterior exactly is a major challenge, and researchers generally rely on approximate posterior inference.

Approximate posterior inference algorithms can be broadly divided into two main categories – sampling approaches and optimisation approaches. The sampling approaches are generally based on Markov Chain Monte Carlo (MCMC) sampling. On the other hand, the optimisation approaches are based on variational inference, also called Variational Bayes (VB). 

This VB approach is also used in Bayesian hierarchical models. While MCMC methods seek to generate independent samples from the posterior, VB optimises a simplified parametric distribution to be close to the posterior in Kullback-Leibler divergence.
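To make the Kullback-Leibler objective concrete, here is a minimal sketch for discrete distributions. The three-outcome "posterior" and the two candidate approximations are made-up numbers for illustration only; real VB optimises continuous variational parameters rather than choosing from a short list.

```python
import numpy as np

def kl_divergence(q, p):
    """KL(q || p) for two discrete distributions given as probability vectors."""
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    mask = q > 0  # terms where q_i = 0 contribute nothing to the sum
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

# Hypothetical target posterior and two candidate approximations (illustrative values).
posterior = np.array([0.7, 0.2, 0.1])
candidates = {
    "uniform": np.array([1 / 3, 1 / 3, 1 / 3]),
    "peaked": np.array([0.6, 0.3, 0.1]),
}

# Pick the candidate q with the smallest KL(q || posterior).
best = min(candidates, key=lambda name: kl_divergence(candidates[name], posterior))
```

Here the "peaked" candidate wins because it places its mass roughly where the target does, which is the sense in which VB looks for the closest member of a simplified family.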

Between the two approaches, VB has been shown empirically to be faster than MCMC and as accurate, making it an attractive option for applying Bayesian models to large datasets.

That said, large-scale data analysis with VB is computationally difficult. Standard VB algorithms iterate between analysing each observation and updating dataset-wide variational parameters, so every update requires a full pass over the data. This batch approach becomes impractical for large datasets, and that is particularly true in topic modelling, which summarises the latent structure of massive document collections that cannot be annotated by hand. The main challenge here is to fit models efficiently to ever larger corpora.

To deal with this, the authors of the paper developed an online variational Bayes algorithm for Latent Dirichlet Allocation (LDA), one of the simplest topic models. The algorithm is based on online stochastic optimisation: it updates the topic parameters after each small batch of documents rather than after a full pass over the corpus, and still produces good parameter estimates. This method is considerably faster than batch algorithms on large datasets.
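The core of the online update can be sketched in a few lines, assuming the blending rule described in the paper: the running topic parameters are averaged with an estimate from the latest mini-batch, using a weight that decays over time. The function names, array sizes, and the random stand-in for the per-batch estimate are all hypothetical; the real per-batch estimate comes from a variational E-step that is omitted here.

```python
import numpy as np

def step_size(t, tau0=1.0, kappa=0.7):
    """Decaying weight rho_t = (tau0 + t) ** (-kappa) given to mini-batch t.

    kappa is assumed to lie in (0.5, 1], the range tied to convergence in the paper.
    """
    return (tau0 + t) ** (-kappa)

def online_update(lam, lam_hat, t, tau0=1.0, kappa=0.7):
    """Blend the running topic parameters with the estimate from one mini-batch."""
    rho = step_size(t, tau0, kappa)
    return (1.0 - rho) * lam + rho * lam_hat

rng = np.random.default_rng(0)
lam = rng.gamma(100.0, 0.01, size=(2, 5))  # hypothetical: 2 topics, 5-word vocabulary
for t in range(3):
    # Placeholder for the estimate a real E-step would compute from mini-batch t.
    lam_hat = rng.gamma(100.0, 0.01, size=lam.shape)
    lam = online_update(lam, lam_hat, t)
```

Because the weight shrinks as more batches arrive, early batches move the parameters a lot and later ones only refine them.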

Online LDA can analyse massive collections of documents. It need not store or collect documents locally; they can arrive in a stream and be discarded after one look. The researchers were able to show that the algorithm converges to a stationary point of the variational objective function. The team was also able to show that online LDA finds topic models that are on par with, if not better than, those found with batch VB.
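As a rough illustration of the streaming workflow, the sketch below uses scikit-learn's `LatentDirichletAllocation`, whose `partial_fit` method updates the model one mini-batch at a time. This is not the authors' original code; the toy sentences, the `stream_of_batches` helper, and every parameter value (including `total_samples=1000`) are assumptions chosen only to keep the example small.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import HashingVectorizer

# A stateless vectoriser keeps the feature space fixed across batches.
# alternate_sign=False and norm=None yield non-negative raw counts, as LDA expects.
vectorizer = HashingVectorizer(n_features=256, alternate_sign=False, norm=None)

lda = LatentDirichletAllocation(
    n_components=2,
    learning_method="online",
    total_samples=1000,  # assumed rough size of the full corpus
    random_state=0,
)

def stream_of_batches():
    """Hypothetical document stream; each batch is discarded once it is used."""
    yield ["the cat sat on the mat", "dogs and cats are pets"]
    yield ["stocks fell as markets closed", "the bank raised interest rates"]

for batch in stream_of_batches():
    X = vectorizer.transform(batch)
    lda.partial_fit(X)  # one online update per mini-batch

# Topic proportions for a new, unseen document.
doc_topics = lda.transform(vectorizer.transform(["interest rates and markets"]))
```

Each batch is vectorised, used for a single update, and then dropped, so memory use depends on the batch size rather than the corpus size.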

Shraddha Goled
I am a technology journalist with AIM. I write stories focused on the AI landscape in India and around the world with a special interest in analysing its long term impact on individuals and societies. Reach out to me at shraddha.goled@analyticsindiamag.com.