Overview

Spleeter is an audio source separation library released by Deezer under the MIT license. It is written in Python on top of TensorFlow and ships with pretrained models that can split standard digital audio files into separate vocals, piano, drums, bass, and other components. Most demonstrations use Spleeter directly as a command-line tool; this post instead shows how to use the pretrained models from within your own Python project.

Note: Spleeter requires FFmpeg to be installed on your machine.
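
It can help to check for FFmpeg before loading any audio. The following is a minimal sketch, assuming the `ffmpeg` executable is expected on the system PATH; `ffmpeg_available` is a hypothetical helper name used for illustration, not part of Spleeter.

```python
import shutil


def ffmpeg_available(executable='ffmpeg'):
    """Return True if the given executable can be found on the PATH."""
    return shutil.which(executable) is not None


if not ffmpeg_available():
    print('FFmpeg was not found on PATH; install it before using Spleeter.')
```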

Install

pip install spleeter

Import Dependencies

from spleeter.separator import Separator
from spleeter.audio.adapter import AudioAdapter
import numpy as np
import matplotlib.pyplot as plt

Set Up Model

# Load one of Spleeter's embedded models.
# The model is downloaded automatically the first time it is needed;
# after that, the local copy is reused.
# Options include:
# 'spleeter:2stems', 'spleeter:2stems-16kHz',
# 'spleeter:4stems', 'spleeter:4stems-16kHz',
# 'spleeter:5stems', 'spleeter:5stems-16kHz'

separator = Separator('spleeter:4stems')
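
The model names listed above follow a regular pattern, so the string can be built and validated before it is handed to `Separator`. This is only a sketch: `model_descriptor` and its `extended_bandwidth` flag are hypothetical names, and the reading of the `-16kHz` suffix as the variants that separate up to 16 kHz is my understanding rather than something stated in this post.

```python
# The six descriptors listed in this post
AVAILABLE_MODELS = (
    'spleeter:2stems', 'spleeter:2stems-16kHz',
    'spleeter:4stems', 'spleeter:4stems-16kHz',
    'spleeter:5stems', 'spleeter:5stems-16kHz',
)


def model_descriptor(stems, extended_bandwidth=False):
    """Build a descriptor such as 'spleeter:4stems' and validate it."""
    name = f'spleeter:{stems}stems'
    if extended_bandwidth:
        name += '-16kHz'
    if name not in AVAILABLE_MODELS:
        raise ValueError(f'Unknown model: {name}')
    return name


print(model_descriptor(4))
print(model_descriptor(2, extended_bandwidth=True))
```

The returned string can then be passed on as `Separator(model_descriptor(4))`.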

Apply Model to an Audio File

# Specify audio file directory and name
audio_file = './directory/audio.mp3'

# Initialize spleeter's built-in load/save interface
audio_loader = AudioAdapter.default()

# Load the audio file as an ndarray of shape (n_samples, n_channels); sample_rate is the sampling rate in Hz
waveform, sample_rate = audio_loader.load(audio_file)

# Separate the loaded waveform ndarray into a dict of ndarrays
prediction = separator.separate(waveform)
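
Before going further, it can be useful to sanity-check what was loaded. The sketch below assumes the waveform is laid out as (n_samples, n_channels); `summarize_waveform` is a hypothetical helper, and the shape in the example is made up rather than taken from a real file.

```python
def summarize_waveform(shape, sample_rate):
    """Summarize a waveform from its (n_samples, n_channels) shape."""
    n_samples, n_channels = shape
    return {
        'n_samples': n_samples,
        'n_channels': n_channels,
        'duration_s': n_samples / sample_rate,
    }


# Example with a made-up shape: 10 seconds of stereo audio at 44.1 kHz
print(summarize_waveform((441000, 2), 44100))
```

With the variables from this post, the call would be `summarize_waveform(waveform.shape, sample_rate)`. Each array in `prediction` should report the same shape as the input.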

Output of Model

# If using 2stems model
vocals = prediction['vocals']
accompaniment = prediction['accompaniment']

# If using 4stems model
vocals = prediction['vocals']
drums = prediction['drums']
bass = prediction['bass']
other = prediction['other']

# If using 5stems model
vocals = prediction['vocals']
piano = prediction['piano']
drums = prediction['drums']
bass = prediction['bass']
other = prediction['other']
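
Only the keys that belong to the loaded model exist in `prediction`, so indexing a stem from a different model raises a `KeyError`. A small check like the one below can catch that early. It is a sketch: `EXPECTED_STEMS` and `missing_stems` are hypothetical names, the stem lists are copied from the snippets above, and it assumes the `-16kHz` variants return the same stem names.

```python
# Stem names per model, as listed in this post
EXPECTED_STEMS = {
    'spleeter:2stems': ('vocals', 'accompaniment'),
    'spleeter:4stems': ('vocals', 'drums', 'bass', 'other'),
    'spleeter:5stems': ('vocals', 'piano', 'drums', 'bass', 'other'),
}


def missing_stems(prediction, model_name):
    """Return the expected stem names that are absent from prediction."""
    base_name = model_name.replace('-16kHz', '')
    return [name for name in EXPECTED_STEMS[base_name] if name not in prediction]


# Stand-in dict for illustration; real values would be ndarrays
fake_prediction = {'vocals': [], 'accompaniment': []}
print(missing_stems(fake_prediction, 'spleeter:2stems'))
print(missing_stems(fake_prediction, 'spleeter:4stems'))
```

When the model is not known ahead of time, iterating with `for name, stem in prediction.items()` avoids hard-coding the keys altogether.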

Plot Results

# Function to plot an audio waveform from a NumPy array
def plot_audio(audio_array, sr=44100, title='Audio Track'):
    # Time axis in seconds, one point per sample
    time = np.linspace(0, len(audio_array) / sr, num=len(audio_array))
    plt.figure(figsize=(15, 3))
    plt.title(title)
    plt.xlabel('Time (s)')
    plt.ylabel('Amplitude')
    plt.plot(time, audio_array)
    plt.xlim(0, time[-1])
    plt.show()

# Plot each key from spleeter prediction
for key in prediction:
    plot_audio(prediction[key], sr=sample_rate, title=key)