Yann LeCun and team introduce an efficient method for training deep networks with unitary matrices

Researchers from MIT and Facebook AI have introduced projUNN, an efficient method for training deep networks with unitary matrices.

Researchers from MIT and Facebook AI have introduced projUNN, an efficient method for training deep networks with unitary matrices. In their paper, the authors, who also include Yann LeCun, introduce two variants – Direct (projUNN-D) and Tangent (projUNN-T) – to parameterise full N-dimensional unitary or orthogonal matrices, with training runtime scaling as O(kN²), where k is the rank of the low-rank approximation of the gradient.

Vanishing and exploding gradient problem

When networks are deep, or the inputs are long sequences of data, learning in neural networks can become unstable. In recurrent neural networks, for example, the recurrent state evolves through repeated application of a linear transformation followed by a pointwise nonlinearity, and this becomes unstable when the eigenvalues of the linear transformation are not of unit magnitude. Unitary matrices, whose eigenvalues all have magnitude one, avoid this instability and are commonly used to overcome the vanishing and exploding gradients problem.
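
To see why, here is a small NumPy sketch (our illustration, not from the paper): repeatedly applying a transition matrix whose eigenvalues are smaller or larger than one in magnitude makes the recurrent state vanish or explode, while an orthogonal matrix preserves its norm exactly. The nonlinearity is omitted so the effect of the eigenvalue magnitudes stays visible.

```python
# Illustrative sketch only (not from the projUNN paper): effect of repeatedly
# applying a linear transition map to a recurrent state.
import numpy as np

rng = np.random.default_rng(0)
N, T = 64, 100                                     # hidden size, number of time steps
h0 = rng.standard_normal(N)

def final_norm(W, h, steps):
    """Norm of the state after repeatedly applying h <- W h (nonlinearity omitted)."""
    for _ in range(steps):
        h = W @ h
    return np.linalg.norm(h)

A = rng.standard_normal((N, N)) / np.sqrt(N)       # random matrix, eigenvalues spread inside the unit disc
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))   # random orthogonal matrix: all eigenvalues have magnitude 1

print("contracting (0.9*A):", final_norm(0.9 * A, h0, T))  # shrinks toward zero
print("expanding   (1.1*A):", final_norm(1.1 * A, h0, T))  # blows up
print("orthogonal  (Q):    ", final_norm(Q, h0, T))        # norm preserved exactly
```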

For the uninitiated, a gradient is the derivative of the loss function with respect to the weights. It is used to update the weights to minimise the loss function during backpropagation in neural networks. A vanishing gradient occurs when the derivative, or slope, steadily gets smaller as we move backwards through the layers. When the weight updates become exponentially small, training becomes extremely slow and, in the worst case, may stop the network from learning altogether. On the other hand, exploding gradients occur when the slope gets larger with every layer during backpropagation (the opposite of what happens with vanishing gradients). With very large weights, the gradient never converges; it oscillates around the minima without ever settling at a global minimum.
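
The same behaviour shows up in the gradients themselves. In the toy sketch below (our own illustration, with random stand-in Jacobians), the backpropagated gradient is multiplied by one layer Jacobian per layer, so its norm grows or shrinks roughly geometrically with depth.

```python
# Toy illustration (not from the paper): gradient norm after backpropagating
# through many layers with differently scaled stand-in Jacobians.
import numpy as np

rng = np.random.default_rng(1)
N, depth = 64, 50
g = rng.standard_normal(N)                                    # gradient arriving at the last layer

def backprop_norm(scale):
    grad = g.copy()
    for _ in range(depth):
        J = scale * rng.standard_normal((N, N)) / np.sqrt(N)  # stand-in Jacobian of one layer
        grad = J.T @ grad                                     # chain rule: multiply by the layer Jacobian
    return np.linalg.norm(grad)

print("vanishing (scale 0.5):", backprop_norm(0.5))   # shrinks by roughly 0.5**50
print("exploding (scale 2.0):", backprop_norm(2.0))   # grows by roughly 2**50
print("unit-ish  (scale 1.0):", backprop_norm(1.0))   # stays in a manageable range
```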

Unitary neural networks were initially developed to address the problem of vanishing and exploding gradients in RNNs, letting them learn from long sequences of data more efficiently than existing architectures such as the LSTM. In previous studies, unitarity is maintained by constructing a series of parameterised unitary transformations. One such popular method is the efficient unitary recurrent neural network (EUNN), which parameterises unitary matrices by composing unitary transformations such as Givens rotations and Fourier transforms.
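
As a toy illustration of that composition idea (a sketch under our own assumptions, not the EUNN reference implementation), an orthogonal matrix can be built as a product of parameterised Givens rotations; orthogonality then holds by construction for any choice of the rotation angles.

```python
# Toy sketch: composing parameterised Givens rotations yields an orthogonal
# matrix for any angle values (the building block mentioned above).
import numpy as np

def givens(N, i, j, theta):
    """N x N rotation by angle theta in the (i, j) coordinate plane."""
    G = np.eye(N)
    c, s = np.cos(theta), np.sin(theta)
    G[i, i], G[j, j] = c, c
    G[i, j], G[j, i] = -s, s
    return G

def compose_rotations(N, angles):
    """Product of Givens rotations over adjacent coordinate pairs."""
    W = np.eye(N)
    for k, theta in enumerate(angles):
        i = k % (N - 1)
        W = givens(N, i, i + 1, theta) @ W
    return W

rng = np.random.default_rng(2)
N = 8
W = compose_rotations(N, rng.uniform(-np.pi, np.pi, size=3 * N))
print(np.allclose(W.T @ W, np.eye(N)))   # True: W is orthogonal regardless of the angles
```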

While employing unitary matrices in each layer is effective, maintaining long-range stability by restricting network parameters to be strictly unitary comes at the cost of expensive parameterisations and increased training runtime.

What is projUNN?

RNNs are ‘notoriously difficult’ to train. When the eigenvalues of the hidden-to-hidden weight matrix deviate from an absolute value of 1, optimisation becomes difficult, especially when trying to learn long-term dependencies.

In the RNN setting, earlier algorithms applying N×N unitary matrices have parameterised them as layers of unitary/orthogonal transformations. In this layer-wise setting, unitarity is enforced for all values of the parameters, but many layers are required to form a composition that can recreate any desired unitary matrix.

The authors of the current study propose projUNN, in which matrices are updated directly via gradient-based optimisation and then either projected back to the closest unitary matrix (projUNN-D) or transported in the direction of the gradient (projUNN-T). The authors claim that projUNN is especially effective in the extreme case where gradients are approximated by rank-one matrices. On RNN benchmarks, projUNN matches or exceeds the performance of existing state-of-the-art unitary neural network algorithms.
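
One simplified way to picture the projUNN-D update (our own naive sketch, not the authors' code) is: take a gradient step using a rank-one approximation of the gradient, then project the result back onto the nearest orthogonal matrix via its polar factor. The full SVD used below costs O(N³); the paper's contribution is performing this kind of low-rank update and projection in O(kN²).

```python
# Naive illustration of a projUNN-D-style step (assumptions ours, not the
# authors' implementation): rank-one gradient step + projection back to the
# nearest orthogonal matrix.
import numpy as np

rng = np.random.default_rng(3)
N, lr = 32, 0.1

def project_to_orthogonal(M):
    """Nearest orthogonal matrix to M (polar factor via a full SVD)."""
    U, _, Vt = np.linalg.svd(M)
    return U @ Vt

def rank_one_approx(G):
    """Best rank-one approximation of the gradient G (top singular pair)."""
    U, S, Vt = np.linalg.svd(G)
    return S[0] * np.outer(U[:, 0], Vt[0])

# Start from a random orthogonal matrix and a dense stand-in 'gradient'.
W, _ = np.linalg.qr(rng.standard_normal((N, N)))
G = rng.standard_normal((N, N))

W_new = project_to_orthogonal(W - lr * rank_one_approx(G))   # projUNN-D-style update
print(np.allclose(W_new.T @ W_new, np.eye(N)))               # True: unitarity preserved
```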

“Our PROJUNN shows that one need not sacrifice performance or runtime in training unitary neural network architectures,” the authors wrote. They also claim that projUNN takes advantage of the approximate low-rank structure of parameter gradients to perform updates in almost optimal runtime.

Read the full paper here.

Shraddha Goled

I am a technology journalist with AIM. I write stories focused on the AI landscape in India and around the world with a special interest in analysing its long term impact on individuals and societies. Reach out to me at shraddha.goled@analyticsindiamag.com.