
Guide To Google’s AudioSet Datasets With Implementation in PyTorch

The AudioSet dataset was developed by Google's Sound and Video Understanding team, with Jort F. Gemmeke as one of its core members.


The AudioSet dataset was developed by Google's Sound and Video Understanding team. Its core members include Jort F. Gemmeke, Daniel P. W. Ellis, Dylan Freedman, Aren Jansen, Manoj Plakal, Marvin Ritter, Shawn Hershey, and other members of the team. Twelve further contributors helped build the pipeline that stores the data as a YouTube video ID, start time, end time, and label set. The AudioSet ontology defines more than 600 classes of annotated sound, and the released dataset spans nearly 6,000 hours of audio across 2,084,320 annotated YouTube clips labelled with 527 classes. Each clip is a 10-second sound excerpt extracted from a YouTube video and assigned to classes for the training and testing splits.

The AudioSet ontology is a hierarchical, organized collection of sound classes. It covers a wide range of sounds, from human voices to pets and other animals to natural sounds. The hierarchy lets you browse the ontology and select the type of data you need to develop your model or for research purposes.
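The ontology itself is also published as a machine-readable ontology.json file in the GitHub repository linked below. As a minimal sketch (assuming the file's layout: a flat list of records with id, name, and child_ids fields, and that the file sits at the repository root on the master branch), it can be loaded and walked like this:

 import json
 import urllib.request

 ONTOLOGY_URL = "https://raw.githubusercontent.com/audioset/ontology/master/ontology.json"
 with urllib.request.urlopen(ONTOLOGY_URL) as f:
     ontology = json.load(f)          # flat list of class records

 by_id = {node["id"]: node for node in ontology}

 def print_tree(node_id, depth=0):
     # Print a class and, recursively, all of its children, indented by depth.
     node = by_id[node_id]
     print("  " * depth + node["name"])
     for child_id in node.get("child_ids", []):
         print_tree(child_id, depth + 1)

Calling print_tree on a top-level class id prints that branch of the hierarchy.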

Research Paper: https://research.google/pubs/pub45857/

Github: https://github.com/audioset/ontology

Download: https://research.google.com/audioset/download.html

About the Dataset:

It is available in two formats for research use:

  1. As CSV files listing each segment's YouTube video ID, start time, end time, and positive labels.
  2. As 128-dimensional audio features extracted with a VGG-like acoustic model and stored as TensorFlow Record files.

Visit here: https://github.com/tensorflow/models/tree/master/research/audioset

The dataset is split into three disjoint sets, each provided as a CSV file (a short reading sketch follows the list):

  1. Evaluation 
  2. Balanced Train
  3. Unbalanced Train
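As a minimal sketch of reading one of these splits with pandas, assuming the published CSV layout (three comment lines, then YTID, start_seconds, end_seconds, and a quoted, comma-separated positive_labels field); the exact parsing options may need small adjustments:

 import pandas as pd

 cols = ["YTID", "start_seconds", "end_seconds", "positive_labels"]
 segments = pd.read_csv(
     "balanced_train_segments.csv",   # downloaded from the AudioSet download page
     skiprows=3, header=None, names=cols,
     skipinitialspace=True, quotechar='"',
 )
 print(len(segments), "clips in the balanced training split")
 print(segments.head())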

Let's see how the ontology helps us find the right data for training a deep learning model.

Suppose we want to build a train-horn detector.

Visit the ontology page of the AudioSet dataset at https://research.google.com/audioset/ontology/index.html and select "Sounds of things", as shown in the image.

When you select the "Vehicle" section, you will see the different transportation systems: waterways, airways, roadways, and railways. Select "Rail transport", as shown in the image below.

After selecting "Rail transport", you will see the different rail vehicle classes, such as train wagon, railroad car, subway/metro, and train. Choose "Train", as shown in the image.

Under "Train" there are two train sounds: the first is "Train whistle" and the second is "Train horn". Choose "Train horn", as shown in the image below.

"Train horn" is exactly the class we need to build horn detection. Select it and go one step further to learn more about its data, as shown in the figure below.

After selecting "Train horn", you will see the overall details of the class: the number of videos available and how many of them fall into each split of the dataset. The total duration is sufficient to train a model.
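The same selection can also be done programmatically. A minimal sketch, reusing the ontology list and segments table from the earlier snippets and assuming the class is named exactly "Train horn" in ontology.json:

 # Look up the machine ID (MID) of the "Train horn" class in the ontology,
 # then keep only the clips whose label list contains that MID.
 train_horn = next(node for node in ontology if node["name"] == "Train horn")
 horn_mid = train_horn["id"]

 has_horn = segments["positive_labels"].str.split(",").apply(lambda mids: horn_mid in mids)
 horn_segments = segments[has_horn]
 print(len(horn_segments), "clips labelled with Train horn")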

We have learned how to select the right data using the AudioSet ontology. If you have any queries, reach out to audioset-users@googlegroups.com.

Using PyTorch:

 import torchaudio
 import numpy as np
 import pandas as pd
 from torch.utils.data import DataLoader, Dataset

 # Read the segment metadata downloaded locally (see the CSV sketch above).
 data = pd.read_csv("filepath")  # replace with the path to your segments CSV
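From here, a minimal Dataset sketch can wrap the segment table. This assumes the column names from the CSV sketch above and that the 10-second clips have already been downloaded locally as WAV files named after their YouTube IDs (AudioSet ships only the metadata, not the audio):

 class AudioSetClips(Dataset):
     # Minimal sketch: pairs segment metadata with locally downloaded WAV clips.
     def __init__(self, segments, audio_dir):
         self.segments = segments.reset_index(drop=True)
         self.audio_dir = audio_dir

     def __len__(self):
         return len(self.segments)

     def __getitem__(self, idx):
         row = self.segments.iloc[idx]
         # Hypothetical file layout: one WAV per clip, named after its YouTube ID.
         waveform, sample_rate = torchaudio.load(f"{self.audio_dir}/{row.YTID}.wav")
         return waveform, row.positive_labels

 dataset = AudioSetClips(data, "clips")
 waveform, labels = dataset[0]   # one 10-second waveform and its label string

A DataLoader can batch these clips as usual, provided every clip is resampled to a common rate and padded or trimmed to a common length.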

For implementation in PyTorch:

Github: https://github.com/qiuqiangkong/audioset_classification

Applications:

1. Building horn detection:

Horn detection can be built with a neural network that recognizes the type of train from its sound. It can help people at a railway crossing judge a train's approach and speed.
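A minimal sketch of such a detector, assuming mono 16 kHz waveforms shaped (batch, 1, samples) and a hypothetical binary horn/no-horn target, could pair a log-mel spectrogram front end with a small CNN:

 import torch
 import torch.nn as nn
 import torchaudio

 class HornDetector(nn.Module):
     # Sketch of a small binary classifier on log-mel spectrograms.
     def __init__(self, sample_rate=16000):
         super().__init__()
         self.melspec = torchaudio.transforms.MelSpectrogram(sample_rate, n_mels=64)
         self.net = nn.Sequential(
             nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
             nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
             nn.AdaptiveAvgPool2d(1), nn.Flatten(),
             nn.Linear(32, 1),                  # single horn / no-horn logit
         )

     def forward(self, waveform):
         spec = torch.log(self.melspec(waveform) + 1e-6)   # (batch, 1, mels, frames)
         return self.net(spec)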

2. Bird sound detection:

Detect bird sounds and infer the birds' state from them, for example whether they are singing, calling to other birds, or sounding an alarm.

Conclusion:

We have learned about the AudioSet dataset: who created it, how to download it from the source in its different file formats, how to use the AudioSet ontology to choose the right data, and how to start an implementation in PyTorch. Many other real-world applications in daily life can be built using the AudioSet dataset.

To participate in the competition: https://www.kaggle.com/c/birdsong-recognition


Amit Singh

Amit Singh is a Data Scientist with a degree in Computer Science and Engineering, and a Data Science writer at Analytics India Magazine.