HOGWILD! Wild as it sounds, the paper that goes by the same name was supposed to be an art project by Christopher Re, an associate professor at Stanford AI Lab, and his peers. Little did they know that the paper would change the way we do machine learning. Ten years later, it even bagged the prestigious “Test of Time” award at the latest NeurIPS conference.
To identify the most impactful paper of the past decade, the conference organisers shortlisted the 12 papers published at NeurIPS 2009, NeurIPS 2010 and NeurIPS 2011 with the highest citation counts since publication. They also collected recent citation counts for each of these papers by aggregating the citations they had received in the past two years at NeurIPS, ICML and ICLR. The organisers then asked the entire senior program committee, made up of 64 senior area chairs (SACs), to vote for up to three of these papers to help pick the winner.
Much of machine learning comes down to finding the right values for a model's variables so that its predictions converge towards reasonable ones. Hogwild! is a method for finding those values very efficiently. “The reason it had such a crazy name, to begin with, was it was intentionally a crazy idea,” said Re in an interview for Stanford AI.
With its small memory footprint, robustness against noise and rapid learning rates, Stochastic Gradient Descent (SGD) has proved well suited to data-intensive machine learning tasks. However, SGD’s scalability is limited by its inherently sequential nature: each step updates the parameters that the next step reads, which makes it difficult to parallelise. A decade ago, when hardware was still playing catch-up with the algorithms, the key objective for scalable analysis of vast datasets was to minimise the overhead caused by locking. Back then, the proposed ways of parallelising SGD relied on memory locking, which degraded performance. Locking was considered essential to stop processes from overwriting each other’s updates, but it added latency whenever processes had to wait for one another.
Re and his colleagues used novel theoretical analysis, algorithms and an implementation to show that stochastic gradient descent can be implemented without any locking.
In Hogwild!, all processors have equal access to shared memory and can update individual components of it at will. The risk is that such a lock-free scheme could fail, because processors may overwrite each other’s progress. “However, when the data access is sparse, meaning that individual SGD steps only modify a small part of the decision variable, we show that memory overwrites are rare and that they introduce barely any error into the computation when they do occur,” explained the authors.
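To make the lock-free idea concrete, here is a minimal Python sketch of a Hogwild!-style update loop. It is an illustration under stated assumptions, not the authors' implementation: the toy problem (noise-free sparse least squares), the function names make_sparse_problem and hogwild_sgd, and all parameter values are hypothetical choices made for this example. Several threads share one weight vector and each update writes only the few coordinates that an example touches, with no lock. Because of CPython's global interpreter lock, the threads here do not truly run in parallel, so the sketch shows the update pattern rather than the speed-up.

```python
import random
import threading


def make_sparse_problem(n_examples, n_features, nnz_per_example, seed=0):
    """Hypothetical toy data: noise-free sparse least squares, y = w_true . x."""
    rng = random.Random(seed)
    w_true = [rng.uniform(-1.0, 1.0) for _ in range(n_features)]
    data = []
    for _ in range(n_examples):
        # Each example touches only a few coordinates (a sparse support).
        idx = rng.sample(range(n_features), nnz_per_example)
        x = [(j, rng.uniform(0.5, 1.5)) for j in idx]
        y = sum(w_true[j] * v for j, v in x)
        data.append((x, y))
    return data, w_true


def hogwild_sgd(data, n_features, n_threads=4, epochs=10, step=0.1, seed=0):
    """Hypothetical Hogwild!-style SGD: threads update a shared list without locks."""
    w = [0.0] * n_features  # shared parameter vector; no lock protects it

    def worker(shard, worker_seed):
        rng = random.Random(worker_seed)
        order = list(range(len(shard)))
        for _ in range(epochs):
            rng.shuffle(order)
            for i in order:
                x, y = shard[i]
                # Read the current (possibly stale) values of the touched coordinates.
                residual = sum(w[j] * v for j, v in x) - y
                # Write back only those coordinates, without acquiring a lock;
                # another thread may interleave between the read and the write.
                for j, v in x:
                    w[j] -= step * residual * v

    # Split the examples into one shard per thread.
    shards = [data[k::n_threads] for k in range(n_threads)]
    threads = [
        threading.Thread(target=worker, args=(shards[k], seed + k))
        for k in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return w


if __name__ == "__main__":
    data, w_true = make_sparse_problem(2000, 50, 3, seed=1)
    w = hogwild_sgd(data, 50, n_threads=4, epochs=10, step=0.1, seed=2)
    print("max coordinate error:", max(abs(a - b) for a, b in zip(w, w_true)))
```

In this toy setting each example touches only three of the 50 coordinates, so two threads rarely write the same coordinate at the same moment, which mirrors the sparsity condition the authors describe. An occasional lost or stale update is simply corrected by later steps.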
When asked about the weird exclamation point at the end of an already weird name, Re quipped: “I thought the phrase ‘going hog-wild’ was hysterical to describe what we were trying. So I thought an exclamation point would just make it better.”
Although the paper has been honoured as a catalyst of the ML revolution, Re believes that this change would have happened with or without it. What really stands out, according to him, is that “odd-ball”, goofy-sounding research is still being recognised a decade later. It is a testament to the old adage: there is no such thing as a bad idea!
Here are the past winners of the “Test of Time” award: