Why Are Apple’s Researchers Interested In Neural Network Subspaces?

Neural network subspaces contain diverse solutions that can be ensembled, approaching the ensemble performance of independently trained networks without the training cost.

Researchers have consistently demonstrated a positive correlation between the depth of a neural network and its ability to reach high-accuracy solutions. However, quantifying this advantage still eludes researchers, as it remains unclear how many layers are needed to make a given prediction accurately. There is thus always a risk of adopting overly complicated DNN architectures that exhaust computational resources and make the whole training process a costly affair. The desire to remove such redundancy in whichever way possible has led researchers to probe the geometry of neural network weight space: subspaces.

Interest in neural network subspaces has been prevalent for over a couple of decades now, but their significance has become more obvious with the increasing size of deep neural networks. Apple’s machine learning team, in particular, recently showcased its work on neural network subspaces at this year’s ICML conference. In this work, the researchers showed how neural network subspaces contain diverse solutions that can be ensembled, approaching the performance of an ensemble of independently trained networks without the added training cost.

To understand where subspaces come into the picture, consider a neural network with many inputs and a few nodes in the hidden layer. Each node computes the dot product of the inputs, say “X”, and a weight vector, say “W”. The output or prediction of the network, say “Y”, is nothing but an activation function applied to that dot product (Y = f(W·X)).
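This forward pass can be written as a minimal NumPy sketch (the inputs, weights, and sigmoid activation here are illustrative choices, not from the paper):

```python
import numpy as np

def forward(X, W, b=0.0):
    """Single-node forward pass: an activation function applied to
    the dot product of weights and inputs, Y = f(W.X + b)."""
    z = W @ X + b                   # dot product of weights and inputs
    return 1.0 / (1.0 + np.exp(-z)) # sigmoid activation

X = np.array([0.5, -1.2, 3.0])  # example inputs
W = np.array([0.1, 0.4, -0.2])  # example weights for one hidden node
Y = forward(X, W)
```

Every choice of W corresponds to one point in weight space; a subspace is a whole region of such points.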

(Image credits: Google Cloud Blog)

In the early 90s, researchers from Carnegie Mellon University posited that the success of neural networks implies that there exists (somewhere) a relevant subspace of low dimension. An implication of the layered neural network structure is that the output depends only on X and W. Even though the inputs lie in a high-dimensional input space, the researchers argued that there is some low-dimensional “relevant subspace”. Since optimising a neural network is all about finding local minima in an objective landscape, understanding the geometric properties of this landscape (subspaces) has emerged as an important goal.

According to Apple researchers, independently trained models are connected by a curve in weight space along which loss remains low. Thus, a better understanding of the geometry of the local minima landscape can enable better convergence (or accuracy) of training procedures in deep learning.

Image credits: Apple ML Research

According to researchers at Apple, if two models are trained independently, the straight line that connects them in weight space doesn’t contain high-accuracy solutions. To address this, the researchers came up with a new training method that makes a whole region of weight space contain high-accuracy solutions. Using this method on the CIFAR-10 dataset, the researchers were able to find accurate solutions throughout the triangle spanned by three endpoints. Training a subspace is different from traditional training methods, and the results prove the same. “Training a subspace and taking the model in the centre of the subspace allows us to exceed the accuracy of an independently trained model,” claimed the researchers.
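The barrier along the straight line between two independently trained models can be illustrated with a toy loss surface (a hypothetical stand-in with two separate minima; real networks show the same qualitative behaviour):

```python
import numpy as np

def loss(w):
    """Toy loss with two distinct low-loss minima at w = +1 and w = -1,
    standing in for two independently trained networks."""
    return min(np.sum((w - 1.0) ** 2), np.sum((w + 1.0) ** 2))

w_a = np.ones(4)   # "model A" weights (one minimum)
w_b = -np.ones(4)  # "model B" weights (another minimum)

# Evaluate the loss along the straight line connecting them in weight space.
losses = [loss((1 - t) * w_a + t * w_b) for t in np.linspace(0, 1, 11)]
# Both endpoints are low-loss, but the midpoint sits on a loss barrier.
```

The subspace training method below is designed so that no such barrier appears inside the trained region.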

The subspace training method can be summarised as follows:

  • A subspace (a region of weight space) of neural networks is parameterised by a set of endpoint networks.
  • Each point in the subspace represents the weights of a neural network. 
  • The number of endpoints determines the subspace’s dimensionality: two endpoints give a line, and three give a triangle, as shown above. 
  • A standard forward pass and backward pass are performed at a point sampled from the subspace.
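The steps above can be sketched as a minimal NumPy training loop for a one-dimensional subspace (a line between two endpoint weight vectors), with toy linear-regression data standing in for CIFAR-10; the actual paper also adds a regulariser to keep the endpoints diverse, which is omitted here:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two endpoint weight vectors parameterise a line (1-D subspace).
w1 = rng.normal(size=3)
w2 = rng.normal(size=3)

# Toy regression data (hypothetical stand-in for a real dataset).
X = rng.normal(size=(64, 3))
y = X @ np.array([0.5, -1.0, 2.0])

lr = 0.1
for _ in range(200):
    # 1. Sample a point in the subspace: w(a) = (1 - a) * w1 + a * w2.
    a = rng.uniform()
    w = (1 - a) * w1 + a * w2
    # 2. Standard forward pass with the sampled weights.
    pred = X @ w
    # 3. Backward pass: gradient of the MSE w.r.t. w, routed to the
    #    endpoints by the chain rule (dw/dw1 = 1 - a, dw/dw2 = a).
    grad_w = 2 * X.T @ (pred - y) / len(y)
    w1 -= lr * (1 - a) * grad_w
    w2 -= lr * a * grad_w

# The midpoint of the trained subspace should now fit the data well.
w_mid = 0.5 * (w1 + w2)
mse = np.mean((X @ w_mid - y) ** 2)
```

Because every sampled point along the line is trained, the centre of the subspace ends up accurate rather than sitting on a loss barrier.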

The researchers in their work have even explored larger subspaces beyond one dimension. 
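For a subspace with more than two endpoints, a point can be sampled as a convex combination of the endpoint weights. A minimal sketch for three endpoints (drawing the coefficients from a Dirichlet distribution, a standard way to sample a simplex uniformly):

```python
import numpy as np

rng = np.random.default_rng(1)

# Three endpoint weight vectors span a triangle (2-D simplex) in weight space.
endpoints = [rng.normal(size=5) for _ in range(3)]

# Convex-combination coefficients: non-negative and summing to one.
# Dirichlet(1, 1, 1) samples uniformly over the triangle.
alphas = rng.dirichlet(np.ones(3))
w = sum(a * e for a, e in zip(alphas, endpoints))
```

The same forward and backward passes then apply at the sampled point, with each endpoint’s gradient scaled by its coefficient.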

Key Takeaways

In this work, the researchers departed from traditional methods such as Stochastic Weight Averaging and instead leveraged subspaces to train more accurate models. They believe their work on subspaces will lead to advancements that can help produce better and more reliable machine learning models for users. Learning from neural network subspaces can have two advantages over traditional training:

  • It helps figure out which weights contribute to high-accuracy solutions.
  • Trained subspaces can exhibit robustness, for example to inaccurate labelling (label noise), which fits well into the current data-centric movement.

To know more about the mathematics of subspaces, check here.

Ram Sagar
I have a master's degree in Robotics and I write about machine learning advancements.
