Google Wants To Improve AI’s Multi-Tasking

Task Affinity Groupings method shows which tasks should be trained together in multi-task neural networks


Which tasks should be trained together in multi-task neural networks? Google AI has a new method, Task Affinity Groupings (TAG), to answer this. In multi-task learning, information learnt by one task can also benefit the training of other tasks. TAG measures inter-task affinity by training all tasks together in a single multi-task network and finding the degree to which one task's gradient update on the model's parameters affects the losses of the other tasks in the network. This quantity is averaged across training, and tasks are then grouped together to maximise the affinity for each task.

Why is Multi-task learning important?

According to the research, multi-task learning helps improve modelling performance by:

  • Introducing an inductive bias to prefer hypothesis classes that explain multiple objectives
  • Focusing on relevant features

When tasks compete for model capacity or are unable to build a shared representation that generalises to all objectives, performance may degrade. It therefore becomes important to find groups of tasks that benefit from co-training.

But, as per the research, a human's understanding of task similarity is driven by experience and intuition. Moreover, co-training's benefit or harm depends on non-trivial factors such as dataset characteristics, model architecture, hyperparameters, capacity, and convergence. This makes it crucial to have a technique for determining which tasks should be trained together in a multi-task neural network.

MAML as inspiration

The researchers were inspired by meta-learning for this method. One such meta-learning algorithm, Model-Agnostic Meta-Learning (MAML), first applies a gradient update to the model's parameters for a collection of tasks. It then updates its original set of parameters to minimise the loss for a subset of tasks in that collection, computed at the updated parameter values. MAML effectively trains the model to learn representations that minimise the loss not for its current set of weights, but for the weights after one or more training steps.
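The inner-then-outer update can be sketched on a toy problem. Below is a minimal first-order MAML sketch in plain Python, where each "task" is a scalar quadratic loss; the targets, learning rates, and first-order approximation are illustrative assumptions, not Google's actual setup.

```python
# Minimal first-order MAML sketch: each "task" i has loss L_i(w) = (w - t_i)^2.
targets = [1.0, -1.0]  # toy tasks pulling the weight in opposite directions

def grad(w, t):
    # d/dw of (w - t)^2
    return 2.0 * (w - t)

def maml_step(w, tasks, inner_lr=0.1, meta_lr=0.05):
    """One meta-update: adapt per task, then update the original weight
    using gradients evaluated at the adapted weights (first-order MAML)."""
    meta_grad = 0.0
    for t in tasks:
        w_adapted = w - inner_lr * grad(w, t)  # inner adaptation step
        meta_grad += grad(w_adapted, t)        # loss gradient at adapted weights
    return w - meta_lr * meta_grad

w = 5.0
for _ in range(200):
    w = maml_step(w, targets)
# w converges toward 0.0, the point that minimises post-adaptation loss
```

The point mirrored from MAML: the outer update uses gradients taken at the adapted weights, so the learned weight is good after a training step rather than before it.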

What is TAG exactly?

TAG follows a similar model to MAML. Here is what it does:

  • Updates the model’s parameters with respect to a single task
  • Observes how this change would affect the other tasks in the multi-task neural network
  • Undoes the update
  • Repeats the process for every other task to collect information on how each task in the network would interact with any other task
  • Updates the model’s shared parameters with respect to every task in the network
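The steps above can be sketched with toy quadratic losses sharing a single scalar parameter. The affinity formula (one minus the ratio of the lookahead loss to the current loss) follows the paper's definition of inter-task affinity; the tasks and learning rate here are illustrative assumptions.

```python
# Sketch of the lookahead affinity computation on toy tasks that share
# one scalar parameter w, each with loss L_i(w) = (w - t_i)^2.
targets = {"A": 1.0, "B": 1.2, "C": -1.0}

def loss(w, t):
    return (w - t) ** 2

def grad(w, t):
    return 2.0 * (w - t)

def inter_task_affinity(w, tasks, lr=0.1):
    """affinity[i][j]: effect of a lookahead update on task i upon task j's loss.
    Positive means task i's update also reduced task j's loss."""
    affinity = {}
    for i, ti in tasks.items():
        w_lookahead = w - lr * grad(w, ti)  # update the shared parameter for task i only
        affinity[i] = {
            j: 1.0 - loss(w_lookahead, tj) / loss(w, tj)
            for j, tj in tasks.items() if j != i
        }
        # the update is "undone" simply by discarding w_lookahead
    return affinity

aff = inter_task_affinity(w=0.0, tasks=targets)
# Tasks A and B pull the shared weight the same way (positive affinity);
# A and C pull in opposite directions (negative affinity).
```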

This reveals that certain tasks consistently exhibit beneficial relationships, while others are antagonistic towards each other. As per the research, “A network selection algorithm can leverage this data in order to group tasks together that maximise inter-task affinity, subject to a practitioner’s choice of how many multi-task networks can be used during inference.”
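The network selection step can be illustrated with a brute-force sketch: given pairwise affinity scores, enumerate partitions of the tasks and keep the one maximising total within-group affinity, subject to a cap on the number of networks. The affinity values and the exhaustive search are illustrative assumptions; the paper's actual selection algorithm is more elaborate.

```python
# Brute-force sketch of network selection: choose the partition of tasks
# that maximises total within-group inter-task affinity, using at most
# `max_networks` groups. The affinity scores below are made up for illustration.
aff = {"A": {"B": 0.3, "C": -0.4},
       "B": {"A": 0.3, "C": -0.5},
       "C": {"A": -0.4, "B": -0.5}}

def partitions(items):
    """Yield every way to split `items` into non-empty groups."""
    if not items:
        yield []
        return
    head, *rest = items
    for part in partitions(rest):
        yield [[head]] + part                        # head starts a new group
        for i in range(len(part)):                   # or joins an existing one
            yield part[:i] + [[head] + part[i]] + part[i + 1:]

def total_affinity(part):
    return sum(aff[a][b] for group in part for a in group for b in group if a != b)

def best_grouping(tasks, max_networks):
    candidates = (p for p in partitions(tasks) if len(p) <= max_networks)
    return max(candidates, key=total_affinity)

best = best_grouping(["A", "B", "C"], max_networks=2)
# With these scores, A and B co-train while C gets its own network.
```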

Image: Google (Overview of TAG. First, tasks are trained together in the same network while computing inter-task affinities. Second, the network selection algorithm finds task groupings that maximise inter-task affinity. Third, the resulting multi-task networks are trained and deployed)

What did Google find out?

The researchers found that TAG selects very strong task groupings. On the CelebA and Taskonomy datasets, TAG was competitive with the prior state-of-the-art while operating 32x and 11.5x faster, respectively. On the Taskonomy dataset, the speedup translated to 2,008 fewer Tesla V100 GPU hours to find task groupings.

The empirical findings indicate the approach is highly competitive: it outperforms multi-task training augmentations such as Uncertainty Weights, GradNorm, and PCGrad, and performs competitively with grouping methods like HOA while improving computational efficiency by over an order of magnitude.

The research also showed that inter-task affinity scores can identify close-to-optimal auxiliary tasks and implicitly measure generalisation capability among tasks.

Challenges

Though identifying task groupings in multi-task learning can save significant time and computational resources, there are risks too. Inter-task affinities can be mistakenly interpreted as “task similarity”, creating an association and/or causation relationship among tasks with high mutual inter-task affinity scores. This can be a problem for datasets that contain sensitive prediction quantities related to race, gender, religion, age, status, physical traits, etc., where inter-task affinities could be misused to support an unfounded conclusion that posits similarity among tasks. Acknowledging these risks is a good move, as it reduces the chances of abuse.


Sreejani Bhattacharyya
I am a technology journalist at AIM. What gets me excited is deep-diving into new-age technologies and analysing how they impact us for the greater good. Reach me at sreejani.bhattacharyya@analyticsindiamag.com
