Guide To FreeSound Datasets With Implementation In PyTorch


The FreeSound Dataset is a hierarchical collection of more than 600 sound classes, populated with 297,144 audio samples. This process generated 685,403 candidate annotations that express the potential presence of sound sources in audio clips. The dataset covers everyday sounds, from human and animal sounds to music and sounds made by things.

Freesound is developed by the Music Technology Group at Pompeu Fabra University, Barcelona.

To download the FreeSound dataset for a research project, refer to the following.

FreeSound GitHub:

The group collects data for the following purposes:

  1. Artistic creation

  2. Cultural preservation

  3. Education

  4. Health and well-being

  5. Sustainable development

The Music Technology Group is organized into four labs, each led by a faculty member.

  1. Audio Signal Processing Lab: Xavier Serra, Head of the lab.

The lab aims to advance the understanding of sound and music signals by combining signal processing and machine learning methods.

  2. Music Information Research Lab: Emilia Gómez, Head of the lab.

The lab works on topics such as sound and music description, music information retrieval, vocalization synthesis, audio source separation, and music and audio processing.

  3. Music and Multimodal Interaction Lab: Sergi Jordà, Head of the lab.

The lab focuses on multimodal interactive technologies and how to use them for music creation.

  4. Music and Machine Learning Lab: Rafael Ramírez, Head of the lab.

The lab works at the intersection of music technology, AI, deep learning, and neuroscience, and on their applications.

Download Size: 20 GB
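The loader below hard-codes a dictionary that maps class names to indices, and its comment notes that the dictionary can be derived from the `label` column of `train.csv`. As a minimal sketch of that idea, the snippet here builds the same kind of mapping with only the standard library. The three-row CSV excerpt, its file names, and the helper name `build_class_index` are hypothetical and used only for illustration.

```python
import csv
import io

# Hypothetical three-row excerpt in the layout of train.csv (fname, label).
sample_csv = io.StringIO(
    "fname,label\n"
    "a.wav,Hi-hat\n"
    "b.wav,Saxophone\n"
    "c.wav,Hi-hat\n"
)


def build_class_index(csv_file):
    """Map each distinct label to an integer, in order of first appearance."""
    labels = [row["label"] for row in csv.DictReader(csv_file)]
    # dict.fromkeys keeps first-seen order and drops duplicates.
    return {name: i for i, name in enumerate(dict.fromkeys(labels))}


classes = build_class_index(sample_csv)
print(classes)
```

With pandas, the expression in the loader's comment, `{cls_name: i for i, cls_name in enumerate(csv_file["label"].unique())}`, produces the same ordering, since `unique()` also returns values in order of first appearance.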


Using PyTorch:
 import os
 import librosa
 import numpy as np
 import pandas as pd
 from scipy.io import wavfile
 from torch import Tensor
 from torch.utils.data import DataLoader, Dataset
 from torchvision import transforms
 class Freesound(Dataset):
     def __init__(self, transform=None, mode="train"):
         # setting directories for data
         data_root = "../input"
         self.mode = mode
         # compare strings with ==, not "is"
         if self.mode == "train":
             self.data_dir = os.path.join(data_root, "audio_train")
             self.csv_file = pd.read_csv(os.path.join(data_root, "train.csv"))
         elif self.mode == "test":
             self.data_dir = os.path.join(data_root, "audio_test")
             self.csv_file = pd.read_csv(os.path.join(data_root, "sample_submission.csv"))
         else:
             raise ValueError("mode must be 'train' or 'test'")
         # dict for mapping class names into indices. can be obtained by 
         # {cls_name:i for i, cls_name in enumerate(csv_file["label"].unique())}
         self.classes = {'Acoustic_guitar': 38, 'Applause': 37, 'Bark': 19, 'Bass_drum': 21, 
 'Burping_or_eructation': 28, 'Bus': 22, 'Cello': 4, 'Chime': 20, 'Clarinet': 7,'Computer_keyboard': 8, 'Cough': 17, 'Cowbell': 33, 'Double_bass': 29, 'Drawer_open_or_close': 36, 'Electric_piano': 34, 'Fart': 14, 'Finger_snapping': 40, 'Fireworks': 31, 'Flute': 16, 'Glockenspiel': 3, 'Gong': 26, 'Gunshot_or_gunfire': 6, 'Harmonica': 25, 'Hi-hat': 0, 'Keys_jangling': 9, 'Knock': 5, 'Laughter': 12, 'Meow': 35, 'Microwave_oven': 27, 'Oboe': 15, 'Saxophone': 1, 'Scissors': 24, 'Shatter': 30, 'Snare_drum': 10, 'Squeak': 23, 'Tambourine': 32, 'Tearing': 13, 'Telephone': 18, 'Trumpet': 2, 'Violin_or_fiddle': 39,  'Writing': 11}
         self.transform = transform
     def __len__(self):
         return self.csv_file.shape[0]
     def __getitem__(self, idx):
         filename = self.csv_file["fname"][idx]
         # read the raw waveform; rate is the sampling rate in Hz
         rate, data = wavfile.read(os.path.join(self.data_dir, filename))
         if self.transform is not None:
             data = self.transform(data)
         if self.mode == "train":
             label = self.classes[self.csv_file["label"][idx]]
             return data, label
         elif self.mode == "test":
             return data
 if __name__ == '__main__':
     import matplotlib.pyplot as plt
     tsfm = transforms.Compose([
         lambda x: x.astype(np.float32), # convert integer samples to float first
         lambda x: x / np.max(np.abs(x)), # rescale to -1 to 1
         lambda x: librosa.feature.mfcc(y=x, sr=44100, n_mfcc=40), # MFCC 
         lambda x: Tensor(x)
     ])
     # todo: multiprocessing, padding data
     dataloader = DataLoader(
         Freesound(transform=tsfm, mode="train"), 
         batch_size=1, # assumption: original arguments were lost; clips differ in length, so load one at a time
     )
     for index, (data, label) in enumerate(dataloader):
         plt.imshow(data.numpy()[0, :, :])
         plt.show()
         if index == 0:
             break # assumption: display only the first clip, then stop
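The `todo` comment in the loader above mentions padding. Clips differ in length, so their MFCC matrices have different numbers of frames and cannot be stacked into one batch as they are. One possible approach, sketched here under the assumption that each item is an `(mfcc, label)` pair with `mfcc` of shape `(n_mfcc, n_frames)`, is a custom collate function that zero-pads every clip to the longest one in the batch. The name `pad_collate` and the fake input tensors are hypothetical.

```python
import torch
from torch.nn.utils.rnn import pad_sequence


def pad_collate(batch):
    """Collate (mfcc, label) pairs whose MFCC tensors differ in frame count.

    Returns zero-padded features of shape (batch, n_mfcc, max_frames),
    the original frame counts, and the labels.
    """
    features, labels = zip(*batch)
    lengths = torch.tensor([f.shape[1] for f in features])
    # pad_sequence pads along the first dimension, so move time to the front.
    padded = pad_sequence([f.transpose(0, 1) for f in features], batch_first=True)
    # Back to (batch, n_mfcc, max_frames).
    padded = padded.transpose(1, 2)
    return padded, lengths, torch.tensor(labels)


# Two fake clips with 40 MFCC coefficients and different lengths.
batch = [(torch.ones(40, 5), 3), (torch.ones(40, 8), 7)]
padded, lengths, labels = pad_collate(batch)
print(padded.shape, lengths, labels)
```

To try it with the loader, pass `collate_fn=pad_collate` together with a `batch_size` greater than 1 to `DataLoader`. Returning the original lengths lets a model ignore the padded frames if it needs to.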


Applications of FreeSound Datasets:

  1. Audio tagging system

Audio tagging is a technique for updating the metadata fields in MP3 and other compressed audio files. An audio tagging system can correct the metadata in individual files or apply a category to a group of files.

  2. Emotion and theme recognition

This involves predicting the moods and themes conveyed by a music track, given the raw audio.

  3. Automatic assessment system for musical exercises

Music Critic is used to assess musical exercises sung by students and give them meaningful feedback. It can be easily integrated into online applications and education platforms.

  4. Animal sound recognition

Automatically recognizing a wide range of animal sounds makes it possible to analyze the habits and distributions of animals, and so to monitor and protect them effectively.


We have covered the FreeSound dataset and how to download it from the source, the group that created it and its research labs, a PyTorch data loader for audio tagging, and some applications of FreeSound datasets.


Copyright Analytics India Magazine Pvt Ltd
