Recently, Amazon AI, along with SenseTime Research and the Chinese University of Hong Kong, introduced a new framework that leverages web data to train AI video recognition models. A notable strength of this framework is that it overcomes the barriers between different data formats in webly-supervised learning.
Representation learning has gained a lot of traction in image recognition and video classification over the past few years. However, developing AI models requires large-scale human-labelled datasets, which are both time-consuming and costly to collect. According to the researchers, collecting these datasets is even more difficult in the domain of trimmed video recognition, since most online videos contain numerous shots with multiple concepts.
OmniSource is a unified framework for video classification that simultaneously utilizes multiple sources of web data, including images, trimmed videos and untrimmed videos. To enhance data efficiency, the researchers proposed a task-driven data collection approach: querying with class labels and keeping only the topmost results, so that the supervision is as informative as possible. The framework works under the semi-supervised setting, where labelled and unlabelled data from the web co-exist.
The OmniSource framework consists mainly of three steps:
- One or more teacher networks are trained on the labelled target dataset
- For each source of collected web data, the corresponding teacher network is applied to obtain pseudo-labels and to filter out irrelevant samples with low confidence scores
- Different transforms are used to convert each type of web data (for example, images) into the input format required by the target task (such as video clips), which is then used to train the student network
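The filtering step in the pipeline above can be sketched as follows. This is a minimal illustration, not the authors' actual implementation: the function name and the confidence threshold of 0.8 are assumptions for the example.

```python
import numpy as np

def filter_and_pseudo_label(teacher_logits, threshold=0.8):
    """Keep web samples whose teacher confidence exceeds the threshold,
    and assign the teacher's top prediction as the pseudo-label.
    (Hypothetical helper; threshold value is an assumption.)"""
    # Softmax over class logits gives per-sample class probabilities.
    shifted = teacher_logits - teacher_logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    confidence = probs.max(axis=1)          # teacher's confidence per sample
    pseudo_labels = probs.argmax(axis=1)    # teacher's top prediction
    keep = confidence >= threshold          # drop low-confidence web samples
    return keep, pseudo_labels

# Example: 3 crawled web samples, 4 action classes.
logits = np.array([[8.0, 0.1, 0.2, 0.1],   # confident prediction, kept
                   [1.0, 1.1, 0.9, 1.0],   # ambiguous, filtered out
                   [0.2, 0.1, 7.5, 0.3]])  # confident prediction, kept
keep, labels = filter_and_pseudo_label(logits)
```

The surviving samples, paired with their pseudo-labels, then join the labelled target data for training the student network.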
The researchers used the Kinetics-400 dataset, one of the most extensive video datasets, containing around 240K, 19K and 38K videos in the training, validation and testing subsets, respectively. They also used the YouTube-car dataset, a fine-grained video recognition task covering 196 different types of cars, and the UCF101 dataset, a small-scale video recognition dataset with 101 classes.
Features Of OmniSource
OmniSource adopts several good practices in joint training, including data balancing, resampling and cross-dataset mixup. According to the researchers, the framework is also data-efficient in training.
The researchers claimed that with only 3.5M images and 800K minutes of video crawled from the internet without human labelling (less than 2% of the data used in prior works), the models learned with OmniSource improve the Top-1 accuracy of 2D- and 3D-ConvNet baseline models by 3.0% and 3.9%, respectively, on the Kinetics-400 benchmark. With the help of this framework, one can also establish new records under different pre-training strategies for video recognition.
Contributions In This Project
Here are some of the contributions mentioned by the researchers of this project:
- The OmniSource framework distils a mixture of web data forms, including images, trimmed videos and untrimmed videos, into one student network
- The researchers proposed several good practices to deal with problems during joint training on data from multiple sources, including source-target balancing, resampling and cross-dataset mixup.
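Of the practices listed above, cross-dataset mixup is the most concrete: a clip from the labelled target dataset is blended with a web-sourced clip, along with their labels. A minimal sketch follows, assuming standard mixup with a Beta-distributed mixing ratio; the function name and parameters are illustrative, not taken from the paper.

```python
import numpy as np

def cross_dataset_mixup(clip_a, label_a, clip_b, label_b,
                        num_classes, alpha=0.2, rng=None):
    """Blend a target-dataset clip with a web-sourced clip.
    (Hypothetical helper; alpha=0.2 is an assumed hyperparameter.)"""
    if rng is None:
        rng = np.random.default_rng()
    # Mixing ratio drawn from Beta(alpha, alpha), as in standard mixup.
    lam = rng.beta(alpha, alpha)
    mixed_clip = lam * clip_a + (1 - lam) * clip_b
    # Labels are mixed as soft one-hot vectors with the same ratio.
    one_hot = np.eye(num_classes)
    mixed_label = lam * one_hot[label_a] + (1 - lam) * one_hot[label_b]
    return mixed_clip, mixed_label
```

The mixed pair is then fed to the student network in place of the original samples, which helps smooth over the distribution gap between web data and the target dataset.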
- The models trained by OmniSource achieve state-of-the-art performance on the Kinetics-400 benchmark for all pre-training strategies
The researchers proposed a unified framework for omni-sourced webly-supervised video recognition which exploits web data of various forms, such as images, trimmed videos and untrimmed videos, from multiple sources, such as search engines, social media and video-sharing platforms, all in an integrated way. Due to its data-efficient nature, the framework reduces the amount of data required to train a model. OmniSource achieves a Top-1 accuracy of 83.6%, establishing a new record on the Kinetics-400 benchmark.
Read the paper here.