Overview Of Spleeter: A Music Source Separation Engine

Spleeter is a source separation Python library created by the Deezer R&D team (Deezer is a music streaming platform like Spotify). It comes with pre-trained, state-of-the-art models, built using TensorFlow, for various source separation tasks. But what is source separation? Source separation can be thought of as speaker diarization, but for music. Speaker diarization models have to differentiate between the voices of different speakers and then split the original audio into multiple tracks, one per speaker. Similarly, source separation models have to differentiate between the different stems (sources) of audio in a music track; these stems can be the vocals, the sound of a particular instrument, or the sound of a group of instruments. Spleeter contains pre-trained models for the following source separation tasks:

  • 2 stems separation: vocals/accompaniment separation 
  • 4 stems separation: vocals, bass, drums, and other
  • 5 stems separation: vocals, bass, drums, piano, and other 

It is the first tool to offer 5 stems separation. Spleeter also allows you to train your own source separation models or fine-tune the pre-trained ones for specific use cases.
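
Training is exposed through Spleeter's command-line interface. A minimal sketch, where the config and dataset paths are placeholders (the JSON config describes the model architecture and the dataset layout):

spleeter train -p configs/musdb_config.json -d /path/to/musdb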

Architecture & Approach

The pre-trained models in Spleeter are U-Nets, i.e., encoder/decoder convolutional neural networks (CNNs) with skip connections. The U-Nets are 12 layers deep: 6 layers for the encoder and 6 for the decoder. The models were trained on an internal dataset from Deezer using an L1-norm loss between masked input mixture spectrograms and target spectrograms.
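
To make that objective concrete, below is a minimal sketch of a masked-spectrogram L1 loss in TensorFlow. This illustrates the idea rather than reproducing Spleeter's actual training code; the function name, tensor names, and shapes are assumptions.

import tensorflow as tf

# Assumed shapes: (batch, time, frequency) magnitude spectrograms.
# mix_spec is the input mixture, target_spec the isolated stem, and
# mask is the network output, one value in [0, 1] per time-frequency bin.
def masked_l1_loss(mix_spec, target_spec, mask):
    estimate = mask * mix_spec  # apply the soft mask to the mixture
    return tf.reduce_mean(tf.abs(estimate - target_spec))  # mean absolute (L1) error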

These models were compared with Open-Unmix, another openly available music source separation system with state-of-the-art performance, on the MUSDB18 dataset. The important point about this comparison is that the Spleeter models weren’t trained or optimized on this dataset. Standard source separation metrics were used for the comparison, namely Signal to Distortion Ratio (SDR), Signal to Artifacts Ratio (SAR), Signal to Interference Ratio (SIR), and source Image to Spatial distortion Ratio (ISR).

For most metrics, Spleeter is competitive with Open-Unmix, especially in terms of the Signal to Distortion Ratio. Spleeter is also very fast: it can separate a mixed audio file into 4 stems 100 times faster than real-time on a single GPU.
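
These metrics come from the BSS Eval family and are implemented in the open-source museval package, the official MUSDB18 evaluation toolkit. A minimal sketch of how such an evaluation could be run; the arrays here are random placeholders standing in for real ground-truth and separated stems:

import numpy as np
import museval  # pip install museval

# Placeholder arrays of shape (n_sources, n_samples, n_channels).
references = np.random.rand(2, 44100 * 5, 2)  # ground-truth stems
estimates = np.random.rand(2, 44100 * 5, 2)   # separated stems

# Each metric gets one score per source and per evaluation window.
sdr, isr, sir, sar = museval.evaluate(references, estimates)
print(sdr.mean(), sar.mean())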

Music Source Separation with Spleeter

  1. Install Spleeter

Spleeter has two dependencies:

  • ffmpeg
  • libsndfile (optional, only needed for evaluation)

Install it from PyPI:

pip install spleeter

  2. Get the audio file(s) for source separation. We’ll be using the demo audio file included in the Spleeter repository.

wget https://github.com/deezer/spleeter/raw/master/audio_example.mp3

  3. Run the default two-stem separation model

spleeter separate -o output/ audio_example.mp3

If you’re running this in a Windows CLI, you might run into an error. Use python -m spleeter instead of just spleeter to run the models.

This will create a new folder inside the -o directory, output/, named after the input track (audio_example in our case). Navigate to this folder and you should find two files: vocals.wav and accompaniment.wav.

[Figure: default two-stem Spleeter model output]

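The same separation can also be scripted through Spleeter's Python API, which mirrors the CLI call above:

from spleeter.separator import Separator

# Load the pre-trained two-stem (vocals/accompaniment) model.
separator = Separator('spleeter:2stems')

# Writes vocals.wav and accompaniment.wav under output/audio_example/.
separator.separate_to_file('audio_example.mp3', 'output/')
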
  4. Now, let’s try the five stems model

spleeter separate -o output -p spleeter:5stems audio_example.mp3

[Figure: Spleeter 5 stems model output]

This time, it will generate five files: vocals.wav, bass.wav, piano.wav, drums.wav, and other.wav.
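
If you want the stems in memory instead of on disk, Spleeter can also separate a loaded waveform directly. A short sketch; note that the audio adapter import path may differ slightly between Spleeter versions:

from spleeter.separator import Separator
from spleeter.audio.adapter import AudioAdapter

separator = Separator('spleeter:5stems')
audio_loader = AudioAdapter.default()

# The pre-trained models expect 44.1 kHz audio.
waveform, _ = audio_loader.load('audio_example.mp3', sample_rate=44100)

# Returns a dict mapping stem names ('vocals', 'bass', 'drums',
# 'piano', 'other') to waveform arrays.
prediction = separator.separate(waveform)
print(list(prediction.keys()))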

Last Epoch (Endnote)

This post discussed Spleeter, a tool for music source separation with pre-trained models. These pre-trained models have already been incorporated into several professional audio products from the likes of Acon Digital, VirtualDJ, and Algoriddim. Spleeter can also be used for a plethora of Music Information Retrieval (MIR) tasks, such as:

  • Vocal lyrics analysis tasks like audio-lyrics alignment and lyrics transcription
  • Singer identification
  • Mood or genre classification
  • Music transcription tasks like chord transcription, drum transcription, chord estimation, and beat tracking
  • Vocal melody extraction
