Spleeter is a source separation Python library created by the R&D team at Deezer, a music streaming platform like Spotify. It comes with pre-trained, state-of-the-art models built with TensorFlow for various source separation tasks. But what is source separation? Source separation can be thought of as speaker diarization, but for music. Speaker diarization models have to differentiate between the voices of different speakers and then split the original audio into multiple tracks, one per speaker. Similarly, source separation models have to differentiate between the different stems (sources) in a music track; these stems can be the vocals, the sound of a particular instrument, or the sound of a group of instruments. Spleeter contains pre-trained models for the following source separation tasks:
- 2 stems separation: vocals/accompaniment separation
- 4 stems separation: vocals, bass, drums, and other
- 5 stems separation: vocals, bass, drums, piano, and other
It is the first tool to offer 5-stem separation. Spleeter also allows you to train your own source separation models or fine-tune the pre-trained ones for specific use cases.
Architecture & Approach
The pre-trained models in Spleeter are U-Nets, i.e., encoder/decoder convolutional neural networks (CNNs) with skip connections. The U-Nets are 12 layers deep: 6 layers for the encoder and 6 for the decoder. The models were trained on an internal Deezer dataset using an L1-norm loss between the masked input mix spectrograms and the target stem spectrograms.
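The masking-and-L1-loss setup described above can be sketched in a few lines of NumPy. This is an illustrative toy, not Spleeter's actual training code: the network predicts a soft mask for each stem from the mixture spectrogram, the mask is multiplied with the mix to get the estimated stem, and training minimizes the L1 distance to the target stem's spectrogram.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy magnitude spectrograms with shape (frequency bins, time frames)
mix = rng.random((1025, 128))            # mixture spectrogram
target_vocals = rng.random((1025, 128))  # ground-truth vocal stem spectrogram

# Stand-in for the U-Net's output: a soft mask with values in [0, 1)
predicted_mask = rng.random((1025, 128))

# Applying the mask to the mix yields the estimated stem spectrogram
estimated_vocals = predicted_mask * mix

# L1-norm loss between the masked mix and the target spectrogram
l1_loss = np.abs(estimated_vocals - target_vocals).mean()
print(f"L1 loss: {l1_loss:.4f}")
```

In training, the gradient of this loss flows back through the mask into the U-Net's weights; at inference time, only the mask prediction and multiplication are needed.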
These models were compared with Open-Unmix on the MUSDB18 dataset. Open-Unmix is another openly available music source separation system with state-of-the-art performance. The important point about this comparison is that the Spleeter models weren't trained or optimized on this dataset. Standard source separation metrics were used for the comparison, namely Signal to Distortion Ratio (SDR), Signal to Artifacts Ratio (SAR), Signal to Interference Ratio (SIR), and source Image to Spatial distortion Ratio (ISR).
For most metrics, Spleeter is competitive with Open-Unmix, especially in terms of the Signal to Distortion Ratio. Spleeter is also very fast: it can separate a mixed audio file into 4 stems 100 times faster than real-time on a single GPU.
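To give a feel for the SDR metric mentioned above, here is a minimal sketch of its simplest form: SDR = 10·log10(‖target‖² / ‖target − estimate‖²), in decibels. (The BSS Eval metrics used in the actual comparison further decompose the error into interference, artifact, and spatial terms; this sketch omits that decomposition.)

```python
import numpy as np

def simple_sdr(target: np.ndarray, estimate: np.ndarray) -> float:
    """Signal-to-Distortion Ratio in dB between a reference and an estimate."""
    error = target - estimate
    return 10.0 * np.log10(np.sum(target**2) / np.sum(error**2))

# Toy example: the estimate is the target plus a little noise,
# so the SDR is high; more noise would drive it down.
rng = np.random.default_rng(0)
target = np.sin(np.linspace(0.0, 100.0, 44100))
estimate = target + 0.01 * rng.standard_normal(44100)

print(f"SDR: {simple_sdr(target, estimate):.1f} dB")
```

Higher SDR means the estimated stem is closer to the ground-truth stem, which is why it is the headline number in most source separation benchmarks.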
Music Source Separation with Spleeter
- Install Spleeter
Spleeter has two dependencies:
- ffmpeg
- libsndfile (optional, only needed for evaluation)
Install it from PyPI:

```shell
pip install spleeter
```
- Get the audio file(s) for source separation. We’ll be using the demo audio file included in the Spleeter repository.
- Run the default two-stem separation model:

```shell
spleeter separate -o output/ audio_example.mp3
```
If you're running this in a Windows CLI, you might run into an error. Use `python -m spleeter` instead of just `spleeter` to run the models.
This will create a new folder in `output/` with the name of the input track, `audio_example` in our case. Navigate to this folder and you should find two files: `vocals.wav` and `accompaniment.wav`.
- Now, let's try the five-stem model:

```shell
spleeter separate -o output -p spleeter:5stems audio_example.mp3
```
This time, it will generate five files: `vocals.wav`, `bass.wav`, `drums.wav`, `piano.wav`, and `other.wav`.
Last Epoch (Endnote)
This post discussed Spleeter, a tool for music source separation with pre-trained models. These models have already been incorporated into several professional audio products from companies like Acon Digital, VirtualDJ, and Algoriddim. Spleeter can also be used for a plethora of Music Information Retrieval (MIR) tasks, such as:
- Vocal lyrics analysis tasks like audio-lyrics alignment and lyrics transcription
- Singer identification
- Mood or genre classification
- Music transcription tasks like chord estimation, drum transcription, and beat tracking
- Vocal melody extraction