Guide To LibriSpeech Datasets With Implementation in PyTorch and TensorFlow

LibriSpeech (resource SLR12 on OpenSLR) is a corpus of read English speech recorded from audiobooks.

LibriSpeech was prepared by Vassil Panayotov with the assistance of Daniel Povey and is distributed through OpenSLR. Daniel Povey, then an assistant professor at the Center for Language and Speech Processing at Johns Hopkins University, is a speech recognition researcher. The corpus contains approximately 1,000 hours of 16 kHz read English speech derived from LibriVox audiobooks. It is used mainly for automatic speech recognition, and also for applications such as speaker recognition and automatic speaker verification.

Paper: "Librispeech: an ASR corpus based on public domain audio books" (ICASSP 2015), listed at https://www.danielpovey.com/publications.html

Download size: approx. 58 GB

Dataset size: approx. 305 GB

To download the dataset to a local computer, visit:

http://www.openslr.org/12
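The OpenSLR page offers each subset as a separate archive. The sketch below builds the download URL for one subset and fetches it with the standard library. It assumes the archives are served as `<subset>.tar.gz` under `https://www.openslr.org/resources/12`; `subset_url` and `download_subset` are hypothetical helper names, not part of any library.

```python
import urllib.request
from pathlib import Path

# Subset names as listed on http://www.openslr.org/12
SUBSETS = (
    "dev-clean", "dev-other", "test-clean", "test-other",
    "train-clean-100", "train-clean-360", "train-other-500",
)

# Assumption: archives live in this directory as <subset>.tar.gz
BASE_URL = "https://www.openslr.org/resources/12"


def subset_url(subset: str) -> str:
    """Return the download URL of one LibriSpeech subset archive."""
    if subset not in SUBSETS:
        raise ValueError(f"unknown LibriSpeech subset: {subset!r}")
    return f"{BASE_URL}/{subset}.tar.gz"


def download_subset(subset: str, dest_dir: str = ".") -> Path:
    """Download one archive into dest_dir (training archives are tens of GB)."""
    target = Path(dest_dir) / f"{subset}.tar.gz"
    urllib.request.urlretrieve(subset_url(subset), target)
    return target
```

For example, `download_subset("dev-clean")` fetches the small development set, which is a convenient way to test a pipeline before downloading the training data. If you use PyTorch, `torchaudio.datasets.LIBRISPEECH(root, url="dev-clean", download=True)` can download and load a subset for you instead.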

About the Dataset:

OpenSLR (Open Speech and Language Resources) hosts 93 resources (SLRs), including software and audio, music, speech, and text datasets, all open for download. LibriSpeech is SLR12. Its audio is stored as FLAC (Free Lossless Audio Codec) files, so no quality or original audio data is lost in compression.
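Once an archive is extracted, each subset directory contains one folder per speaker and one sub-folder per chapter, with a `<speaker>-<chapter>.trans.txt` file holding one `<utterance-id> TRANSCRIPT` line per FLAC file. The sketch below indexes a subset under that assumed layout; `index_subset` and `parse_transcript_lines` are hypothetical names.

```python
from pathlib import Path


def parse_transcript_lines(lines):
    """Map utterance IDs to transcripts from one *.trans.txt file.

    Each line is assumed to be "<speaker>-<chapter>-<utterance> TRANSCRIPT".
    """
    transcripts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        utt_id, _, text = line.partition(" ")
        transcripts[utt_id] = text
    return transcripts


def index_subset(subset_dir):
    """Return sorted (utt_id, flac_path, transcript) triples for one subset.

    Assumed layout: <subset_dir>/<speaker>/<chapter>/<speaker>-<chapter>.trans.txt
    stored next to the matching <utt_id>.flac files.
    """
    items = []
    for trans_file in sorted(Path(subset_dir).glob("*/*/*.trans.txt")):
        with open(trans_file, encoding="utf-8") as f:
            transcripts = parse_transcript_lines(f)
        for utt_id, text in transcripts.items():
            items.append((utt_id, trans_file.parent / f"{utt_id}.flac", text))
    return sorted(items)
```

The FLAC paths in the index can then be decoded with an audio library such as `torchaudio.load` or `soundfile.read`.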

Using PyTorch:

 import torch
 from torch.utils.data import DataLoader
 from torch.nn.utils.rnn import pad_sequence
 # OnlineDataset (and the preprocessing helpers it relies on) is defined in the
 # original project and is not shown here; it yields one waveform tensor of
 # shape (seq_len, channel) per utterance.
 HALF_BATCHSIZE_TIME = 3000
 SPEAKER_THRESHOLD = 0
 def get_online_Dataloader(args, config, is_train=True):
     # create waveform dataset
     dataset = OnlineDataset(**config['online'])
     print('[Dataset] - Using Online Dataset.')
     # create dataloader for extracting features
     def collate_fn(samples):
         # samples: [(seq_len, channel), ...]
         samples = pad_sequence(samples, batch_first=True)
         # samples: (batch_size, max_len, channel)
         # return: (batch_size, channel, max_len)
         return samples.transpose(-1, -2).contiguous()
     dataloader = DataLoader(dataset, batch_size=config['dataloader']['batch_size'],
                             shuffle=is_train, num_workers=config['dataloader']['n_jobs'],
                             pin_memory=True, collate_fn=collate_fn)
     return dataloader
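The snippet defines `HALF_BATCHSIZE_TIME` without using it. It appears to be a length threshold above which batches are halved, so that padding very long utterances does not exhaust GPU memory. The plain-Python sketch below illustrates that idea under this assumption (lengths counted in frames); `make_buckets` is a hypothetical helper, not code from the original project.

```python
HALF_BATCHSIZE_TIME = 3000  # length threshold, value taken from the snippet


def make_buckets(lengths, batch_size, half_threshold=HALF_BATCHSIZE_TIME):
    """Group utterance indices into batches of similar length.

    Utterances are sorted longest-first; whenever the longest utterance
    of a batch exceeds half_threshold, that batch is halved.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    batches = []
    start = 0
    while start < len(order):
        size = batch_size
        if lengths[order[start]] > half_threshold:
            size = max(1, batch_size // 2)
        batches.append(order[start:start + size])
        start += size
    return batches


print(make_buckets([100, 4000, 3500, 200, 150, 50], batch_size=4))
# [[1, 2], [3, 4, 0, 5]]
```

Because each inner list is a batch of dataset indices, the result could be passed to `DataLoader(dataset, batch_sampler=batches, collate_fn=collate_fn)`; sorting by length also keeps padding within each batch small.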

For the full implementation in PyTorch, visit the following link.

Using TensorFlow:

 import os
 import sys
 import numpy as np
 import tensorflow as tf
 import yaml
 import argparse
 # Make the repository root importable; Dataset, CTC and mkdir_join below
 # are modules of the original project.
 sys.path.append(os.path.abspath('../../../'))
 from experiments.librispeech.data.load_dataset_ctc import Dataset
 from models.ctc.vanilla_ctc import CTC
 from utils.directory import mkdir_join
 parser = argparse.ArgumentParser()
 parser.add_argument('--epoch', type=int, default=-1,
                     help='the epoch to restore')
 parser.add_argument('--model_path', type=str,
                     help='path to the model to evaluate')
 # the batch size is an integer, not a string
 parser.add_argument('--eval_batch_size', type=int, default=1,
                     help='the size of mini-batch in evaluation')
 # parse the command-line flags
 args = parser.parse_args()

For the full implementation in TensorFlow, visit the link.
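The TensorFlow script evaluates a CTC (Connectionist Temporal Classification) model. As an illustration of what the simplest CTC decoding step does, independent of that repository's own code, here is a minimal plain-Python sketch of greedy (best-path) decoding. It assumes label 0 is the CTC blank; the function name is hypothetical.

```python
def ctc_greedy_decode(frame_scores, blank=0):
    """Best-path CTC decoding.

    frame_scores: one list of per-label scores for every time step.
    Takes the arg-max label at each frame, merges consecutive repeats,
    then drops the blank label.
    """
    best_path = [max(range(len(scores)), key=scores.__getitem__)
                 for scores in frame_scores]
    decoded = []
    previous = None
    for label in best_path:
        # keep a label only when it differs from the previous frame and is not blank
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


frames = [[0.1, 0.8, 0.1], [0.2, 0.7, 0.1], [0.9, 0.05, 0.05],
          [0.1, 0.6, 0.3], [0.1, 0.2, 0.7], [0.1, 0.1, 0.8], [0.6, 0.2, 0.2]]
print(ctc_greedy_decode(frames))  # [1, 1, 2]
```

The blank between the two runs of label 1 is what lets CTC output a repeated symbol. Inside a TensorFlow graph, `tf.nn.ctc_greedy_decoder` performs the same best-path decoding on batched logits.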

Applications:

1. Kaldi Speech Recognition toolkit:

The Kaldi speech recognition toolkit includes a LibriSpeech recipe for training models that transcribe speech into text. Many speech-to-text systems have been built with Kaldi.

GitHub: https://github.com/kaldi-asr/kaldi
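Kaldi describes a dataset through plain-text files such as `text` (utterance ID and transcript), `wav.scp` (utterance ID and audio source) and `utt2spk` (utterance ID and speaker). The sketch below builds simplified versions of these from `(utt_id, flac_path, transcript)` triples, piping FLAC through the `flac` command-line decoder. It is not the recipe's own data-preparation script, which may group speakers differently; `kaldi_data_files` is a hypothetical name.

```python
def kaldi_data_files(utterances):
    """Build Kaldi-style text, wav.scp and utt2spk lines.

    utterances: iterable of (utt_id, flac_path, transcript) triples,
    where utt_id looks like "<speaker>-<chapter>-<utterance>".
    Lines are sorted by utterance ID, as Kaldi expects.
    """
    text, wav_scp, utt2spk = [], [], []
    for utt_id, flac_path, transcript in sorted(utterances):
        speaker = utt_id.split("-")[0]
        text.append(f"{utt_id} {transcript}")
        # decode FLAC to WAV on stdout (-c), decode (-d), silently (-s)
        wav_scp.append(f"{utt_id} flac -c -d -s {flac_path} |")
        utt2spk.append(f"{utt_id} {speaker}")
    return {"text": text, "wav.scp": wav_scp, "utt2spk": utt2spk}
```

Each list can be written to a file of the same name inside a Kaldi data directory, one entry per line.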

2. Sentiment classification of spoken speech:

LibriSpeech can be used to train speech models that are then adapted to detect the sentiment or emotion expressed in speech automatically. The corpus itself does not include emotion labels.

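3. Speaker recognition:

Every utterance is labelled with a speaker ID, and the corpus lists each speaker's gender, so LibriSpeech can be used to train models that identify or verify speakers and recognise their gender. Characteristics such as emotion, accent, or age range require additional labelled data.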
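Speaker metadata ships in a `SPEAKERS.TXT` file. The sketch below assumes its format: lines starting with `;` are comments, and data lines are pipe-separated as `ID | SEX | SUBSET | MINUTES | NAME`. The sample lines are made up for illustration, and `parse_speakers` is a hypothetical helper.

```python
def parse_speakers(lines):
    """Parse SPEAKERS.TXT-style lines into {speaker_id: (sex, subset, minutes)}."""
    speakers = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and comment/header lines
        fields = [field.strip() for field in line.split("|")]
        speaker_id, sex, subset, minutes = fields[:4]
        speakers[speaker_id] = (sex, subset, float(minutes))
    return speakers


sample = [
    ";ID  |SEX| SUBSET           |MINUTES| NAME",
    "1    | M | dev-clean        | 10.50 | Reader One",
    "2    | F | train-clean-100  | 25.00 | Reader Two",
]
speakers = parse_speakers(sample)
print(speakers["2"])  # ('F', 'train-clean-100', 25.0)
```

The resulting mapping gives per-speaker gender labels for a classifier and makes it easy to check that training and test speakers do not overlap.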

4. Automatic speech recognition:

Automatic speech recognition (ASR) converts speech to text (the reverse task, text to speech, is speech synthesis). LibriSpeech is one of the most widely used benchmarks for training and evaluating ASR models, which are built with speech toolkits such as Kaldi.
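ASR results on LibriSpeech are usually reported as word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. A minimal sketch using a word-level Levenshtein distance (the function name is illustrative):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j]: edit distance between the previous reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[len(hyp)] / max(len(ref), 1)


print(word_error_rate("THE CAT SAT ON THE MAT", "THE CAT SAT ON MAT"))
# 0.16666666666666666  (one deletion out of six reference words)
```

Corpus-level WER sums the edit counts over all utterances before dividing, rather than averaging per-utterance rates.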

Conclusion:

We have learned about the LibriSpeech dataset, who created it, and how to download it from the source. We also looked at a PyTorch data loader, the Kaldi speech recognition toolkit, TensorFlow for end-to-end speech recognition, and some everyday applications of the LibriSpeech dataset.

Amit Singh
Amit Singh is a data scientist with a degree in Computer Science and Engineering, and a Data Science writer at Analytics India Magazine.