Guide To FreeSound Datasets With Implementation In PyTorch

The FreeSound Dataset (FSD) is a hierarchical collection of more than 600 sound classes populated with 297,144 audio samples.

The FreeSound Dataset (FSD) is a hierarchical collection of more than 600 sound classes populated with 297,144 audio samples. The annotation process has generated 685,403 candidate annotations that express the potential presence of sound sources in the audio clips. The dataset covers everyday sounds, from human and animal sounds to music and sounds made by things.

Freesound is developed by the Music Technology Group (MTG) at Universitat Pompeu Fabra, Barcelona.

To download the FreeSound datasets for a research project, refer to the following links.

Datasets: https://github.com/MTG/freesound-datasets.

Research/Publications: https://www.upf.edu/web/mtg/research/publications.

FreeSound GitHub: https://github.com/MTG/freesound.

The MTG collects data for the following purposes:

1. Artistic creation

2. Cultural preservation

3. Education

4. Health and well-being

5. Sustainable development

The Music Technology Group is organized into four labs, each one led by a faculty member.

1. Audio Signal Processing Lab: Xavier Serra, Head of the lab.

The lab concentrates on advancing the understanding of sound and music signals by combining signal processing and machine learning methods.

2. Music Information Research Lab: Emilia Gomez, Head of the lab.

The lab works on topics such as sound and music description, music information retrieval, vocalization synthesis, audio source separation, and music and audio processing.

3. Music and Multimodal Interaction Lab: Sergi Jorda, Head of the lab.

The lab focuses on multimodal interactive technologies and how to use them for music creation.

4. Music and Machine Learning Lab: Rafael Ramírez, Head of the lab.

The lab concentrates on the intersection of music technology, AI, deep learning, and neuroscience, and their applications.

Download Size: 20 GB
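
Before building the DataLoader, it is worth sanity-checking the download. The short sketch below assumes the FSDKaggle2018-style layout used in the loader that follows (an ../input directory containing train.csv and an audio_train folder of WAV files); adjust data_root to wherever the archive was unpacked.

 import os
 import pandas as pd
 from scipy.io import wavfile

 data_root = "../input"  # assumed location of the unpacked dataset
 train_csv = pd.read_csv(os.path.join(data_root, "train.csv"))

 print(train_csv.shape)               # number of labelled clips
 print(train_csv["label"].nunique())  # number of distinct sound classes (41)
 print(train_csv.head())

 # read one clip to confirm the audio files open correctly
 fname = train_csv["fname"][0]
 rate, wav = wavfile.read(os.path.join(data_root, "audio_train", fname))
 print(rate, wav.shape, wav.dtype)    # the clips are sampled at 44100 Hz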

DataLoader:

Using PyTorch:
 import sys, os
 import torch
 import librosa
 import numpy as np
 import pandas as pd
 from torch import Tensor
 from scipy.io import wavfile
 from torchvision import transforms
 from torch.utils.data import DataLoader
 from torch.utils.data.dataset import Dataset
 class Freesound(Dataset):
     def __init__(self, transform=None, mode="train"):
         # setting directories for data
         data_root = "../input"
         self.mode = mode
         if self.mode == "train":
             self.data_dir = os.path.join(data_root, "audio_train")
             self.csv_file = pd.read_csv(os.path.join(data_root, "train.csv"))
         elif self.mode == "test":
             self.data_dir = os.path.join(data_root, "audio_test")
             self.csv_file = pd.read_csv(os.path.join(data_root, "sample_submission.csv"))
         # dict for mapping class names into indices. can be obtained by 
         # {cls_name:i for i, cls_name in enumerate(csv_file["label"].unique())}
         self.classes = {
             'Acoustic_guitar': 38, 'Applause': 37, 'Bark': 19, 'Bass_drum': 21,
             'Burping_or_eructation': 28, 'Bus': 22, 'Cello': 4, 'Chime': 20,
             'Clarinet': 7, 'Computer_keyboard': 8, 'Cough': 17, 'Cowbell': 33,
             'Double_bass': 29, 'Drawer_open_or_close': 36, 'Electric_piano': 34,
             'Fart': 14, 'Finger_snapping': 40, 'Fireworks': 31, 'Flute': 16,
             'Glockenspiel': 3, 'Gong': 26, 'Gunshot_or_gunfire': 6, 'Harmonica': 25,
             'Hi-hat': 0, 'Keys_jangling': 9, 'Knock': 5, 'Laughter': 12, 'Meow': 35,
             'Microwave_oven': 27, 'Oboe': 15, 'Saxophone': 1, 'Scissors': 24,
             'Shatter': 30, 'Snare_drum': 10, 'Squeak': 23, 'Tambourine': 32,
             'Tearing': 13, 'Telephone': 18, 'Trumpet': 2, 'Violin_or_fiddle': 39,
             'Writing': 11}
         self.transform = transform
     def __len__(self):
         return self.csv_file.shape[0] 
     def __getitem__(self, idx):
         filename = self.csv_file["fname"][idx]
         rate, data = wavfile.read(os.path.join(self.data_dir, filename))
         if self.transform is not None:
             data = self.transform(data)
         if self.mode == "train":
             label = self.classes[self.csv_file["label"][idx]]
             return data, label
         elif self.mode == "test":
             return data
 if __name__ == '__main__':
     import matplotlib.pyplot as plt
     tsfm = transforms.Compose([
         lambda x: x.astype(np.float32) / np.max(np.abs(x)),  # rescale to [-1, 1]
         lambda x: librosa.feature.mfcc(y=x, sr=44100, n_mfcc=40),  # 40 MFCCs per frame
         lambda x: Tensor(x)
         ])
     # todo: multiprocessing, padding data
     dataloader = DataLoader(
         Freesound(transform=tsfm, mode="train"), 
         batch_size=1,
         shuffle=True, 
         num_workers=0)
     for index, (data, label) in enumerate(dataloader):
         print(label.numpy())
         print(data.shape)
         plt.imshow(data.numpy()[0, :, :])
         plt.show()
         if index == 0:
             break
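
The todo note in the script above points out that variable-length clips still need padding before batch_size can be raised above 1. A minimal sketch of one way to do this, assuming the same MFCC transform as above (each item is a tensor of shape (40, T) with T varying per clip), is a custom collate_fn that zero-pads every item in a batch to the length of the longest clip:

 import torch
 from torch.utils.data import DataLoader

 def pad_collate(batch):
     # batch is a list of (mfcc, label) pairs; each mfcc has shape (n_mfcc, T)
     data, labels = zip(*batch)
     max_len = max(d.shape[-1] for d in data)
     padded = torch.zeros(len(data), data[0].shape[0], max_len)
     for i, d in enumerate(data):
         padded[i, :, :d.shape[-1]] = d
     return padded, torch.tensor(labels)

 # usage (hypothetical):
 # dataloader = DataLoader(Freesound(transform=tsfm, mode="train"),
 #                         batch_size=16, shuffle=True, collate_fn=pad_collate)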

Application:

1. Audio Tagging System

Audio tagging is a technique for updating the metadata fields in MP3 and other compressed audio files. An audio tag detector is used to correct the metadata of individual files or to apply a category to a group of files. A minimal classifier sketch for this task appears after this list.

2. Emotion and Theme Recognition

This involves predicting the moods and themes conveyed by a music track, given the raw audio.

3. Automatic Assessment System for Musical Exercises

Music Critic is used to assess musical exercises sung or played by students and to provide meaningful feedback. It can easily be integrated into online applications and education platforms.

4. Animal Sound Recognition

The ability to automatically recognise a wide range of animal sounds helps in analysing the habits and distributions of animals, making it possible to monitor and protect them effectively.
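
To make the applications above concrete, here is the minimal classifier sketch referenced under the first item. It is only an illustrative assumption, not a prescribed architecture: a small CNN that consumes the padded (batch, 40, T) MFCC tensors produced by the DataLoader above and ends in 41 logits for the FSDKaggle2018 classes. For a multi-label task such as emotion and theme recognition, the same backbone would instead end in sigmoid outputs trained with BCEWithLogitsLoss.

 import torch
 import torch.nn as nn

 class MfccTagger(nn.Module):
     """Illustrative CNN for clip-level audio tagging on MFCC inputs."""
     def __init__(self, n_classes=41):
         super().__init__()
         self.features = nn.Sequential(
             nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
             nn.MaxPool2d(2),
             nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
             nn.AdaptiveAvgPool2d(1),  # pool over both the MFCC and time axes
         )
         self.fc = nn.Linear(64, n_classes)

     def forward(self, x):
         # x: (batch, n_mfcc, T) from the DataLoader; add a channel dimension
         x = x.unsqueeze(1)
         return self.fc(self.features(x).flatten(1))

 # usage sketch:
 # logits = MfccTagger()(padded)                  # padded batch from pad_collate above
 # loss = nn.CrossEntropyLoss()(logits, labels)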

Conclusion:

We have learned about the FreeSound dataset: how to download it from the source, who created it and leads its research, how to implement a PyTorch DataLoader for the audio tagging task, and some applications of the FreeSound datasets.

Amit Singh
Amit Singh is a Data Scientist with a degree in Computer Science and Engineering, and a Data Science writer at Analytics India Magazine.
