Overview Of Spleeter: A Music Source Separation Engine

Spleeter

Spleeter is a Python library for source separation created by the Deezer R&D team (Deezer is a music streaming platform like Spotify). It comes with pre-trained, state-of-the-art models built using TensorFlow for several source separation tasks.

But what is source separation? It can be thought of as speaker diarization for music. A speaker diarization model has to tell the voices of different speakers apart and then split the original audio into one track per speaker. Similarly, a source separation model has to tell apart the different stems (sources) of audio in a music track. A stem can be the vocals, the sound of a particular instrument, or the sound of a group of instruments.

Spleeter contains pre-trained models for the following source separation tasks:

  • 2 stems separation: vocals/accompaniment separation 
  • 4 stems separation: vocals, bass, drums, and other
  • 5 stems separation: vocals, bass, drums, piano, and other 

According to its authors, Spleeter was the first publicly released tool to offer 5-stem separation. It also lets you train your own source separation models or fine-tune the pre-trained ones for specific use cases.

Architecture & Approach

The pre-trained models in Spleeter are U-Nets, i.e., encoder/decoder convolutional neural networks (CNNs) with skip connections. Each U-Net is 12 layers deep: 6 layers for the encoder and 6 for the decoder. The models were trained on an internal Deezer dataset using an L1-norm loss between the masked input mix spectrograms and the target spectrograms.
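To make the training objective concrete, here is a minimal sketch of an L1 loss between a masked mix spectrogram and a target spectrogram. It uses plain Python lists instead of TensorFlow tensors, and the function name, the toy values, and the choice to average over all bins are illustrative assumptions rather than Spleeter's actual training code.

```python
def masked_l1_loss(mix_spec, mask, target_spec):
    """Mean absolute error between (mask * mix) and the target magnitudes."""
    total, count = 0.0, 0
    for mix_row, mask_row, target_row in zip(mix_spec, mask, target_spec):
        for mix_val, mask_val, target_val in zip(mix_row, mask_row, target_row):
            # The network predicts the mask; the estimate is mask * mix.
            total += abs(mask_val * mix_val - target_val)
            count += 1
    return total / count


# Toy 2x2 magnitude "spectrograms" (rows = frequency bins, columns = frames).
mix = [[1.0, 2.0], [4.0, 0.5]]
vocals = [[0.5, 2.0], [1.0, 0.0]]
# Hypothetical soft mask, standing in for a U-Net output.
mask = [[0.5, 1.0], [0.5, 0.0]]

loss = masked_l1_loss(mix, mask, vocals)
print(f"masked L1 loss: {loss:.2f}")
```

Only one bin is estimated incorrectly here (0.5 * 4.0 = 2.0 against a target of 1.0), so the loss is that single error averaged over the four bins.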

These models were compared with Open-Unmix on the musdb18 dataset. Open-Unmix is another openly available music source separation system with state-of-the-art performance. Importantly, the Spleeter models were neither trained nor optimized on this dataset. Standard source separation metrics were used for the comparison, namely Signal to Distortion Ratio (SDR), Signal to Artifacts Ratio (SAR), Signal to Interference Ratio (SIR), and source Image to Spatial distortion Ratio (ISR).
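For intuition about what SDR measures, the sketch below computes a simplified signal-to-distortion ratio in decibels: the energy of the reference stem divided by the energy of the estimation error. This is a deliberate simplification. The BSS Eval metrics typically reported on musdb18 decompose the error further (which is where SIR, SAR, and ISR come from) and are computed over windows, so the function below is an illustration, not the evaluation procedure used in the comparison. The sample values are made up.

```python
import math


def simple_sdr(reference, estimate):
    """Simplified SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2)."""
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if error_energy == 0:
        return math.inf  # perfect reconstruction
    return 10 * math.log10(signal_energy / error_energy)


# Toy reference stem and an estimate that is off by 0.1 on every sample.
reference = [1.0, -1.0, 1.0, -1.0]
estimate = [0.9, -1.1, 1.1, -0.9]

sdr_db = simple_sdr(reference, estimate)
print(f"simplified SDR: {sdr_db:.1f} dB")
```

A higher value means the estimate is closer to the reference; here the error energy is one hundredth of the signal energy, which gives roughly 20 dB.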

For most metrics, Spleeter is competitive with Open-Unmix, especially in terms of the Signal to Distortion Ratio. Spleeter is also very fast: it can separate a mixed audio file into 4 stems 100 times faster than real time on a single GPU.
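To put the speed claim in perspective, here is a back-of-the-envelope calculation. The 3.5-minute track length is an illustrative assumption, and actual throughput depends on the hardware and the model used.

```python
def processing_time_seconds(track_seconds, speedup_vs_real_time):
    """Time needed to process a track at a given real-time speedup factor."""
    return track_seconds / speedup_vs_real_time


track_seconds = 3.5 * 60  # hypothetical 3.5-minute song
estimated = processing_time_seconds(track_seconds, 100)
print(f"about {estimated:.1f} s to separate a {track_seconds:.0f} s track")
```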

Music Source Separation with Spleeter

  1. Install Spleeter

Spleeter has two dependencies:

  • ffmpeg
  • libsndfile (optional, only needed for evaluation)

Install from PyPI.

pip install spleeter

  2. Get the audio file(s) for source separation. We’ll be using the demo audio file included in the Spleeter repository.

wget https://github.com/deezer/spleeter/raw/master/audio_example.mp3

  3. Run the default two-stem separation model

spleeter separate -o output/ audio_example.mp3

If you’re running this from a Windows command line, the spleeter command might fail with an error. In that case, use python -m spleeter instead of just spleeter to run the models.

This will create a new folder inside the -o directory (output/), named after the input track (audio_example in our case). Navigate to this folder and you should find two files: vocals.wav and accompaniment.wav.
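If you would rather drive the command from a script, the following sketch wraps the same CLI call with Python's subprocess module. The helper names build_separate_command and run_separation are hypothetical, and the use_module switch simply selects the python -m spleeter form mentioned above. The command is only built here, not executed, so the snippet runs even before Spleeter is installed.

```python
import subprocess
import sys


def build_separate_command(audio_path, output_dir, stems=2, use_module=False):
    """Build the `spleeter separate` command line as a list of arguments."""
    # `python -m spleeter` is the fallback suggested for Windows shells.
    launcher = [sys.executable, "-m", "spleeter"] if use_module else ["spleeter"]
    return launcher + [
        "separate",
        "-o", output_dir,
        "-p", f"spleeter:{stems}stems",
        audio_path,
    ]


def run_separation(audio_path, output_dir, stems=2, use_module=False):
    """Run the separation; raises CalledProcessError if the command fails."""
    cmd = build_separate_command(audio_path, output_dir, stems, use_module)
    subprocess.run(cmd, check=True)


cmd = build_separate_command("audio_example.mp3", "output")
print(" ".join(cmd))
```

Calling run_separation("audio_example.mp3", "output") would then perform the same separation as the command above, assuming Spleeter and ffmpeg are installed.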

Figure: Default two-stem Spleeter model output

  4. Now, let’s try the five stems model

spleeter separate -o output -p spleeter:5stems audio_example.mp3

Figure: Spleeter five-stem model output

This time, it will generate five files: vocals.wav, bass.wav, piano.wav, drums.wav, and other.wav.
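The same separation can also be run from Python through Spleeter's Separator class. The sketch below is a minimal example under a few assumptions: the helper names separate and expected_outputs and the STEM_NAMES table are illustrative, and the output layout mirrors the folder structure described above. The Spleeter import is kept inside the function so the path helper can be used without Spleeter installed.

```python
import os

# Stem names produced by each pre-trained model, as listed earlier.
STEM_NAMES = {
    2: ["vocals", "accompaniment"],
    4: ["vocals", "bass", "drums", "other"],
    5: ["vocals", "bass", "drums", "piano", "other"],
}


def expected_outputs(audio_path, output_dir, stems):
    """Paths of the .wav files the chosen model is expected to write."""
    track = os.path.splitext(os.path.basename(audio_path))[0]
    return [
        os.path.join(output_dir, track, f"{name}.wav")
        for name in STEM_NAMES[stems]
    ]


def separate(audio_path, output_dir, stems=5):
    """Separate a track with a pre-trained model (needs `pip install spleeter`)."""
    from spleeter.separator import Separator  # imported lazily on purpose

    separator = Separator(f"spleeter:{stems}stems")
    separator.separate_to_file(audio_path, output_dir)
    return expected_outputs(audio_path, output_dir, stems)


paths = expected_outputs("audio_example.mp3", "output", 5)
for path in paths:
    print(path)
```

Calling separate("audio_example.mp3", "output") downloads the pre-trained model on first use and writes the five stems. On Windows, it is safer to make that call under an if __name__ == "__main__": guard.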

Last Epoch (Endnote)

This post discussed Spleeter, a tool for music source separation with pre-trained models. These pre-trained models have already been incorporated into several professional audio software products, such as those from Acon Digital, VirtualDJ, and Algoriddim. Spleeter can also be used for a range of Music Information Retrieval (MIR) tasks, such as:

  • Vocal lyrics analysis tasks like audio-lyrics alignment and lyrics transcription
  • Singer identification
  • Mood or genre classification
  • Music transcription tasks like chord transcription, drum transcription, chord estimation, and beat tracking
  • Vocal melody extraction