Can We Speed Up Matrix Multiplication?

Matrix multiplication is among the most fundamental and compute-intensive operations in machine learning.

The fundamental operations of any typical neural network reduce to a large number of additions and multiplications, and the networks themselves can be expressed in terms of matrices. Matrix multiplication is therefore one of the most important mathematical operations in deep learning: whether it is a convolution in a CNN recognising images or a language model performing sentiment analysis, these basic arithmetic operations play a huge role. Modern large datasets, too, are often expressed as large matrices. In the bag-of-words model, for instance, each row of the matrix corresponds to one document, while in image analysis, feature values are represented by a matrix whose rows correspond to images. Even Google's TPUs are defined in terms of matrices: TPUs operate on tensors, which are multi-dimensional arrays, and the basic operations performed on them, including addition, element-wise multiplication, and matrix multiplication, are inseparable from the world of deep learning.
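To make this concrete, here is a minimal NumPy sketch (illustrative shapes only) showing how a single fully connected layer reduces to one matrix multiplication plus cheap elementwise operations:

```python
import numpy as np

# A fully connected layer is one matrix multiply plus a bias add.
# Shapes are illustrative: 4 input samples, 3 features, 2 output units.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))    # data matrix: one row per sample
W = rng.standard_normal((3, 2))    # learned weights (the linear operator)
b = np.zeros(2)

hidden = X @ W + b                 # the matrix multiplication at the layer's core
activated = np.maximum(hidden, 0)  # ReLU: elementwise, cheap next to the matmul

print(activated.shape)             # (4, 2): one output row per input sample
```

The matmul dominates the cost here, which is why accelerating it accelerates the whole network.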

Because matrix multiplication is one of the fundamental operations of a deep neural network, any speed-up here can cut down the long training times DNNs are usually blamed for. Previous efforts to accelerate deep learning have developed high-speed matrix multiplication libraries, designed custom hardware to accelerate matrix multiplication, and devised efficient Approximate Matrix Multiplication (AMM) algorithms under various assumptions. AMM, according to Ye et al., makes matrix computation practical for large-scale datasets.



Recently, researchers Davis Blalock and John Guttag introduced a learning-based AMM algorithm that greatly outperforms existing methods. It departs from existing AMM approaches in that it relies on hashing and table lookups rather than multiply-add operations.

Existing methods try to minimise computation time by approximating linear operations. This challenge usually arises in machine learning when one has a data matrix, say A, whose rows are samples, and a known linear operator, say B, which could be a linear classifier, regressor, or embedding matrix. The new method by Blalock and Guttag, known as MADDNESS, uses a nonlinear preprocessing function and reduces the problem to table lookups.
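The AMM problem setup can be illustrated with a classic sampling-based baseline (this is not the authors' method, just a simple stand-in): approximate A @ B by multiplying only a sampled subset of A's columns against the matching rows of B.

```python
import numpy as np

# AMM setup: data matrix A (rows = samples) times a known operator B.
# Illustration only: approximate A @ B by sampling k of the d inner
# dimensions, a classic randomized AMM baseline (not MADDNESS).
rng = np.random.default_rng(42)
n, d, m, k = 200, 100, 50, 80

A = rng.standard_normal((n, d))
B = rng.standard_normal((d, m))

idx = rng.choice(d, size=k, replace=False)  # sampled inner-dimension indices
scale = d / k                               # rescale so the estimate is unbiased
approx = scale * A[:, idx] @ B[idx, :]      # costs O(n*k*m) instead of O(n*d*m)

exact = A @ B
rel_err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
# Accuracy improves as k approaches d; speed improves as k shrinks.
```

Any AMM method lives on this trade-off curve between compute cost and approximation error; MADDNESS's contribution is reaching far better points on that curve.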

When the linear operator is known ahead of time, as it is when applying a trained linear model to new data, the MADDNESS algorithm requires no multiply-add operations at all. “Our method is most closely related to vector quantisation methods used for similarity search. However, instead of using an expensive quantisation function that requires many multiply-adds, we introduce a family of quantisation functions that require no multiply-adds,” the researchers said in their recent work, presented at the ICML 2021 conference. The quantisation scheme at the heart of the algorithm builds on classic vector quantisation techniques used for approximating inner products and Euclidean distances.
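A simplified sketch of the vector-quantisation lookup-table idea may help. Note the heavy hedging: MADDNESS's actual encoder uses fast learned hash functions, whereas this sketch uses nearest-prototype encoding with randomly chosen prototypes, so the accuracy shown here is purely illustrative.

```python
import numpy as np

# Sketch of lookup-table AMM: encode rows of A as prototype indices per
# subspace, precompute prototype-times-B tables, then estimate A @ B by
# table lookups and additions alone. (Simplified; not MADDNESS itself.)
rng = np.random.default_rng(1)
n, d, m = 500, 32, 8
C = 4            # number of subspaces (codebooks)
K = 16           # prototypes per subspace
sub = d // C     # dimensions per subspace

A = rng.standard_normal((n, d))
B = rng.standard_normal((d, m))

# "Train": pick K prototypes per subspace (random rows stand in for learning).
protos = np.stack([A[rng.choice(n, K, replace=False), c*sub:(c+1)*sub]
                   for c in range(C)])                      # (C, K, sub)

# Encode each row of A: index of its nearest prototype in every subspace.
codes = np.empty((n, C), dtype=np.int64)
for c in range(C):
    dists = ((A[:, None, c*sub:(c+1)*sub] - protos[c][None]) ** 2).sum(-1)
    codes[:, c] = dists.argmin(1)

# Precompute lookup tables: prototype dot B for each subspace, shape (C, K, m).
tables = np.stack([protos[c] @ B[c*sub:(c+1)*sub] for c in range(C)])

# Query time: approximate A @ B with lookups and adds, no multiplies.
approx = sum(tables[c][codes[:, c]] for c in range(C))

exact = A @ B
rel_err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
```

The expensive part of this classic scheme is the encoding step (the distance computations); replacing it with multiply-free hash functions is precisely the gap MADDNESS targets.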

(Image: paper by Blalock and Guttag)

To assess MADDNESS’s effectiveness, the researchers implemented their method and existing algorithms in C++ and Python, running on a single thread of a MacBook Pro with a 2.6GHz Intel Core i7-4960HQ processor. On the choice of CPUs over GPUs, the authors noted that trained ML models are commonly deployed on billions of smartphones that have only CPUs. This simulated a real-world scenario while also making it easy for future researchers to benchmark against their work without an expensive GPU setup.

Validating MADDNESS on hundreds of matrices from diverse domains, Blalock and Guttag observed that their method “often” runs 100× faster than exact matrix products and 10× faster than current approximate methods.

That said, the authors admit that their method falls short on state-of-the-art models, because the convolution operations in SOTA architectures exploit structure that is not available to general matrix multiplication algorithms; matching that performance would require modifications to the MADDNESS approach. However, Blalock and Guttag believe their method can inspire further research that eventually accelerates convolution operations and other workloads bottlenecked by linear transforms.

Ram Sagar
I have a master's degree in Robotics and I write about machine learning advancements.
