Researchers have consistently demonstrated a positive correlation between the depth of a neural network and its ability to reach high-accuracy solutions. However, quantifying this advantage still eludes researchers, as it remains unclear how many layers are needed to make a given prediction accurately. There is therefore always a risk of adopting overly complicated DNN architectures that exhaust computational resources and make the whole training process a costly affair. The drive to remove neural network redundancy, in whichever way possible, has led researchers to probe the depths of neural network topology: subspaces.
Interest in neural network subspaces has been around for over two decades, but their significance has become more apparent with the increasing size of deep neural networks. Apple’s machine learning team, in particular, recently showcased its work on neural network subspaces at this year’s ICML conference. In this work, the researchers showed that neural network subspaces contain diverse, accurate solutions, without the cost of training multiple networks independently.
To understand where subspaces come into the picture, consider a neural network with many inputs and a few nodes in the hidden layer. Each node computes the dot product of the inputs, say “X”, with a weight vector, say “W”. The output or prediction of the network, say “Y”, is then an activation function applied to this weighted sum of inputs, i.e. Y = f(WX).
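As a rough illustration (the layer sizes and the ReLU activation below are arbitrary choices for this sketch, not taken from the paper), the computation looks like this:

```python
import numpy as np

rng = np.random.default_rng(0)

X = rng.normal(size=100)        # 100 input features
W = rng.normal(size=(4, 100))   # weight vectors for 4 hidden nodes
Z = W @ X                       # dot products of weights and inputs
Y = np.maximum(Z, 0.0)          # activation function (here ReLU): Y = f(WX)

print(Y.shape)                  # (4,)
```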
In the early 90s, researchers from Carnegie Mellon University posited that the success of neural networks implies that there exists (somewhere) a relevant subspace of low dimension. A consequence of the layered structure of a neural network is that its output depends only on X and W. Even though the inputs lie in a high-dimensional input space, the researchers argued that there is some low-dimensional “relevant subspace” that actually matters for the prediction. Since optimising a neural network is all about finding local minima in an objective landscape, understanding the geometric properties of this landscape, including its subspaces, has emerged as an important goal.
As the Apple researchers note, independently trained models are connected by curves in weight space along which the loss remains low. A better understanding of the geometry of this local-minima landscape can therefore enable better convergence (and accuracy) of training procedures in deep learning.
At the same time, according to the researchers at Apple, if two models are trained independently, the straight line that connects them in weight space does not, in general, contain high-accuracy solutions. To address this, the researchers have come up with a new training method that learns a region of weight space that does contain high-accuracy solutions. Using this method on the CIFAR-10 dataset, the researchers were able to find accurate solutions throughout a triangle spanned by three endpoints. Training a subspace is different from traditional training methods, and the results show it: “Training a subspace and taking the model in the centre of the subspace allows us to exceed the accuracy of an independently trained model,” claimed the researchers.
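A minimal sketch of how one might probe the straight line between two models is shown below; the tiny linear model and random test batch are placeholders standing in for two independently trained networks and a real test set, and none of this is Apple’s released code:

```python
import copy
import torch
import torch.nn as nn

def linear_path(model_a, model_b, alpha):
    """Copy of model_a whose weights are (1 - alpha) * w_a + alpha * w_b."""
    mixed = copy.deepcopy(model_a)
    sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
    mixed.load_state_dict({k: (1 - alpha) * sd_a[k] + alpha * sd_b[k] for k in sd_a})
    return mixed

# Placeholder "independently trained" models and a random test batch.
model_a, model_b = nn.Linear(20, 2), nn.Linear(20, 2)
x, y = torch.randn(128, 20), torch.randint(0, 2, (128,))

for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):   # walk along the connecting line
    model = linear_path(model_a, model_b, alpha)
    acc = (model(x).argmax(dim=1) == y).float().mean().item()
    print(f"alpha={alpha:.2f}  accuracy={acc:.3f}")
```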
The subspace training method can be summarised as follows:
- A subspace (weight space region) of neural networks is parameterised.
- Each point in a subspace represents the weights of a neural network.
- The number of endpoints determines the dimensionality of the subspace: two endpoints span a line, and three endpoints span a triangle.
- At each training step, a point in the subspace is sampled and a standard forward pass and backward pass are performed with those weights, updating the endpoints (see the sketch after this list).
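Below is a minimal sketch of this loop for a one-dimensional subspace (a line between two endpoints), using a single linear layer and synthetic data; the architecture, shapes, and optimiser settings are assumptions for illustration, not the paper’s implementation:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
X = torch.randn(512, 784)                 # synthetic inputs
y = torch.randint(0, 10, (512,))          # synthetic labels

# Two endpoints of the line; each is a full set of weights for the toy network.
w1 = torch.nn.Parameter(0.01 * torch.randn(10, 784))
w2 = torch.nn.Parameter(0.01 * torch.randn(10, 784))
optimizer = torch.optim.SGD([w1, w2], lr=0.1)

for step in range(200):
    idx = torch.randint(0, 512, (64,))    # random mini-batch
    alpha = torch.rand(()).item()         # sample a point on the line
    w = (1 - alpha) * w1 + alpha * w2     # weights at that point of the subspace
    logits = X[idx] @ w.t()               # standard forward pass
    loss = F.cross_entropy(logits, y[idx])
    optimizer.zero_grad()
    loss.backward()                       # gradients flow back to both endpoints
    optimizer.step()

# Any point on the learned line is a usable model, e.g. its midpoint:
w_mid = 0.5 * (w1 + w2)
```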
In their work, the researchers have even explored larger subspaces beyond one dimension, spanned by more than two endpoints.
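Extending the same sampling idea to three endpoints gives a triangle in weight space; one hypothetical way to sample a point inside it is with Dirichlet-distributed mixing coefficients (the sampling choice and shapes below are illustrative assumptions):

```python
import torch

# Three endpoints span a triangle (a 2-simplex) in weight space.
w1, w2, w3 = (torch.nn.Parameter(0.01 * torch.randn(10, 784)) for _ in range(3))

# Dirichlet-distributed coefficients sum to one, so the sampled point
# stays inside the triangle spanned by the three endpoints.
coeffs = torch.distributions.Dirichlet(torch.ones(3)).sample()
w = coeffs[0] * w1 + coeffs[1] * w2 + coeffs[2] * w3
```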
Key Takeaways
In this work, the researchers departed from traditional methods such as Stochastic Weight Averaging and instead leveraged subspaces to train more accurate models. They believe their work on subspaces will lead to advancements that help produce better and more reliable machine learning models for users. Learning neural network subspaces can have two advantages over traditional training:
- Helps figure out which weights contribute to high-accuracy solutions.
- Training subspaces can lead to robustness, for example to inaccurate labelling, which fits well into the current data-centric movement.
To know more about the mathematics of subspaces, check here.