Core

AudioSignals

class nussl.core.AudioSignal(path_to_input_file=None, audio_data_array=None, stft=None, label=None, sample_rate=None, stft_params=None, offset=0, duration=None)[source]

Overview

AudioSignal is the main entry and exit point for all source separation algorithms in nussl. The AudioSignal class is a general container for all things related to audio data. It contains utilities for:

  • Input and output from an array or from a file,

  • Time-series and frequency domain manipulation,

  • Plotting and visualizing,

  • Playing audio within a terminal or jupyter notebook,

  • Applying a mask to estimate signals

Attributes

active_region_is_default

bool

audio_data

np.ndarray

file_name

str

freq_vector

np.ndarray

has_data

bool

is_mono

bool

is_stereo

bool

log_magnitude_spectrogram_data

np.ndarray

magnitude_spectrogram_data

np.ndarray

num_channels

int

power_spectrogram_data

np.ndarray

sample_rate

int

signal_duration

float

signal_length

int

stft_data

np.ndarray

stft_length

int

stft_params

STFTParams

time_bins_vector

np.ndarray

time_vector

np.ndarray

Methods

add(other)

Adds two audio signal objects.

apply_gain(value)

Apply a gain to audio_data

apply_mask(mask[, overwrite])

Applies the input mask to the time-frequency representation in this AudioSignal object and returns a new AudioSignal object with the mask applied.

concat(other)

Concatenate two AudioSignal objects (by concatenating audio_data).

crop_signal(before, after)

Get rid of samples before and after the signal on all channels.

embed_audio([ext, display])

Embeds the audio signal into a notebook, using nussl.play_utils.embed_audio.

get_channel(n)

Gets audio data of n-th channel from audio_data as a 1D np.ndarray of shape (n_samples,).

get_channels()

Generator that will loop through channels of audio_data.

get_magnitude_spectrogram_channel(n)

Returns the n-th channel from self.magnitude_spectrogram_data.

get_power_spectrogram_channel(n)

Returns the n-th channel from self.power_spectrogram_data.

get_stft_channel(n)

Returns STFT data of n-th channel from stft_data as a 2D np.ndarray.

get_stft_channels()

Generator that will loop through channels of stft_data.

get_window(window_type, window_length)

Wrapper around scipy.signal.get_window so one can also get the popular sqrt-hann window.

ipd_ild_features([ch_one, ch_two])

Computes interphase difference (IPD) and interlevel difference (ILD) for a stereo spectrogram.

istft([window_length, hop_length, …])

Computes and returns the inverse Short Time Fourier Transform (iSTFT).

load_audio_from_array(signal[, sample_rate])

Loads an audio signal from a np.ndarray.

load_audio_from_file(input_file_path, …)

Loads an audio signal into memory from a file on disc.

make_audio_signal_from_channel(n)

Makes a new AudioSignal object with data from channel n.

make_copy_with_audio_data(audio_data[, verbose])

Makes a copy of this AudioSignal object with audio_data initialized to the input :param:`audio_data` numpy array.

make_copy_with_stft_data(stft_data[, verbose])

Makes a copy of this AudioSignal object with stft_data initialized to the input :param:`stft_data` numpy array.

peak_normalize()

Peak normalizes the audio signal.

play()

Plays this audio signal, using nussl.play_utils.play.

resample(new_sample_rate, **kwargs)

Resample the data in audio_data to the new sample rate provided by :param:`new_sample_rate`.

rms([win_len, hop_len])

Calculates the root-mean-square of audio_data.

set_active_region(start, end)

Determines the bounds of what gets returned when you access audio_data.

set_active_region_to_default()

Resets the active region of this AudioSignal object to its default value of the entire audio_data array.

stft([window_length, hop_length, …])

Computes the Short Time Fourier Transform (STFT) of audio_data.

subtract(other)

Subtracts two audio signal objects.

to_mono([overwrite, keep_dims])

Converts audio_data to mono by averaging across channels at every sample.

truncate_samples(n_samples)

Truncates the signal leaving only the first n_samples samples.

truncate_seconds(n_seconds)

Truncates the signal leaving only the first n_seconds.

write_audio_to_file(output_file_path[, …])

Outputs the audio signal data in audio_data to a file at :param:`output_file_path` with sample rate of :param:`sample_rate`.

zero_pad(before, after)

Adds zeros before and after the signal to all channels.

and more. The AudioSignal class is used in all source separation objects in nussl.

The AudioSignal object stores time-series audio data as a 2D numpy array in audio_data (see audio_data for details) and stores Short-Time Fourier Transform data as a 3D numpy array in stft_data (see stft_data for details).

Initialization

There are a few options for initializing an AudioSignal object. The first is to initialize an empty AudioSignal object, with no parameters:

>>> import nussl
>>> signal = nussl.AudioSignal()

In this case, there is no data stored in audio_data or in stft_data, though these attributes can be updated at any time after the object has been created.

Additionally, an AudioSignal object can be loaded with exactly one of the following:

  1. A path to an input audio file (see load_audio_from_file() for details).

  2. A numpy array of 1D or 2D real-valued time-series audio data.

  3. A numpy array of 2D or 3D complex-valued time-frequency STFT data.

AudioSignal will throw an error if it is initialized with more than one of the previous at once.

Here are examples of all three of these cases:

import numpy as np
import nussl

# Initializing an empty AudioSignal object:
sig_empty = nussl.AudioSignal()

# Initializing from a path:
file_path = 'my/awesome/mixture.wav'
sig_path = nussl.AudioSignal(file_path)

# Initializing with a 1D or 2D numpy array containing audio data:
aud_1d = np.sin(np.linspace(0.0, 1.0, 48000))
sig_1d = nussl.AudioSignal(audio_data_array=aud_1d, sample_rate=48000)

# FYI: The shape doesn't matter, nussl will correct for it
aud_2d = np.array([aud_1d, -2 * aud_1d])
sig_2d = nussl.AudioSignal(audio_data_array=aud_2d)

# Initializing with a 2D or 3D numpy array containing STFT data:
stft_2d = np.random.rand(513, 300) + 1j * np.random.rand(513, 300)
sig_stft_2d = nussl.AudioSignal(stft=stft_2d)

# Two channels of STFT data:
stft_3d = nussl.utils.complex_randn((513, 300, 2))
sig_stft_3d = nussl.AudioSignal(stft=stft_3d)

# Initializing with more than one of the above methods will raise an exception:
sig_exception = nussl.AudioSignal(audio_data_array=aud_2d, stft=stft_2d)

When initializing from a path, AudioSignal can read many types of audio files, provided that your computer has the backends installed to understand the corresponding codecs. nussl uses librosa’s load function to read in audio data. See librosa’s documentation for details: https://github.com/librosa/librosa#audioread

Once initialized with a single type of data (time-series or time-frequency), there are methods to compute an STFT from time-series data (stft()) and vice versa (istft()).

Sample Rate

The sample rate of an AudioSignal object is set upon initialization. If initializing from a path, the sample rate of the AudioSignal object inherits the native sample rate from the file. If initialized with an audio or stft data array, the sample rate is passed in as an optional argument. In these cases, with no sample rate explicitly defined, the default sample rate is 44.1 kHz (CD quality). If this argument is provided when reading from a file and the provided sample rate does not match the native sample rate of the file, AudioSignal will resample the data from the file so that it matches the provided sample rate.
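
The behavior above can be sketched as follows (a hedged example; the file path is a placeholder):

import numpy as np
import nussl

# With no explicit sample rate, array-initialized signals fall back to 44.1 kHz:
sig = nussl.AudioSignal(audio_data_array=np.zeros(44100))
print(sig.sample_rate)  # 44100

# When reading a file, passing sample_rate resamples on load if it differs from
# the file's native rate ('my/awesome/mixture.wav' is a placeholder path):
sig_16k = nussl.AudioSignal('my/awesome/mixture.wav', sample_rate=16000)
print(sig_16k.sample_rate)  # 16000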

Notes

There is no guarantee that data in audio_data corresponds to data in stft_data. E.g., when an AudioSignal object is initialized with audio_data of an audio mixture, its stft_data is None until stft() is called. Once stft() is called and a mask is applied to stft_data (via some algorithm), the audio_data in this AudioSignal object still contains data from the original mixture that it was initialized with even though stft_data contains altered data. (To hear the results, simply call istft() on the AudioSignal object.) It is up to the user to keep track of the contents of audio_data and stft_data.
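
A minimal sketch of this workflow (the file path is a placeholder):

import nussl

mix = nussl.AudioSignal('my/awesome/mixture.wav')  # placeholder path
mix.stft()             # stft_data is now populated; audio_data still holds the mixture
mix.stft_data *= 0.5   # alter the STFT directly; audio_data does not change yet
mix.istft()            # propagate the altered STFT back into audio_data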

See also

For a walk-through of AudioSignal features, see audio_signal_basics and audio_signal_stft.

Parameters
  • path_to_input_file (str) – Path to an input file to load upon initialization. Audio gets loaded into audio_data.

  • audio_data_array (np.ndarray) – 1D or 2D numpy array containing a real-valued, time-series representation of the audio.

  • stft (np.ndarray) – 2D or 3D numpy array containing pre-computed complex-valued STFT data.

  • label (str) – A label for this AudioSignal object.

  • offset (float) – Starting point of the section to be extracted (in seconds) if initializing from a file.

  • duration (float) – Length of the signal to read from the file (in seconds). Defaults to full length of the signal (i.e., None).

  • sample_rate (int) – Sampling rate of this AudioSignal object.

path_to_input_file

Path to the input file. None if this AudioSignal never loaded a file, i.e., initialized with a np.ndarray.

Type

str

label

A user-definable label for this AudioSignal object.

Type

str

property active_region_is_default
bool

True if active region is the full length of audio_data. False otherwise.

add(other)[source]

Adds two audio signal objects.

This does element-wise addition on the audio_data array.

Raises

AudioSignalException – If self.sample_rate != other.sample_rate, self.num_channels != other.num_channels, or self.active_region_is_default is False.

Parameters

other (AudioSignal) – Other AudioSignal to add.

Returns

New AudioSignal object with the sum of self and other.

Return type

(AudioSignal)

apply_gain(value)[source]

Apply a gain to audio_data

Parameters

value (float) – amount to multiply self.audio_data by

Returns

This AudioSignal object with the gain applied.

Return type

(AudioSignal)

apply_mask(mask, overwrite=False)[source]

Applies the input mask to the time-frequency representation in this AudioSignal object and returns a new AudioSignal object with the mask applied. The mask is applied to the magnitude of the audio signal. The phase of the original audio signal is then applied to construct the masked STFT.

Parameters
  • mask (MaskBase-derived object) – A MaskBase-derived object containing a mask.

  • overwrite (bool) – If True, this will alter stft_data in self. If False, this function will create a new AudioSignal object with the mask applied.

Returns

A new AudioSignal object with the input mask applied to the STFT, iff overwrite is False.
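
A hedged sketch of the masking workflow described above (the file path is a placeholder and the random mask stands in for a mask produced by a separation algorithm):

import numpy as np
import nussl
from nussl.core.masks import SoftMask

mix = nussl.AudioSignal('my/awesome/mixture.wav')  # placeholder path
mix.stft()

# Build a soft mask with the same shape as stft_data, with values in [0.0, 1.0]:
soft_mask = SoftMask(np.random.rand(*mix.stft_data.shape))

masked = mix.apply_mask(soft_mask)  # new AudioSignal whose stft_data is masked
masked.istft()                      # fill masked.audio_data with the masked audio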

property audio_data
np.ndarray

Stored as a numpy np.ndarray, audio_data houses the raw, uncompressed time-domain audio data in the AudioSignal. Audio data is stored with shape (n_channels, n_samples) as an array of floats.

None by default, can be initialized upon object instantiation or set at any time by accessing this attribute or calling load_audio_from_array(). It is recommended to set audio_data by using load_audio_from_array() if this AudioSignal has been initialized without any audio or STFT data.

Raises

AudioSignalException – If set incorrectly, will raise an error. Expects a real, finite-valued 1D or 2D np.ndarray.

Warning

audio_data and stft_data are not automatically synchronized, meaning that if one of them is changed, those changes are not instantly reflected in the other. To propagate changes, either call stft() or istft().

Notes

  • This attribute only returns values within the active region. For more information,
    see set_active_region_to_default(). When setting this attribute, the active region is reset to default.

  • If audio_data is set with an improperly transposed array, it will be automatically
    transposed so that it is stored the expected way. A warning will be displayed on the console.

concat(other)[source]

Concatenate two AudioSignal objects (by concatenating audio_data).

Puts other.audio_data after audio_data.

Raises

AudioSignalException – If self.sample_rate != other.sample_rate, self.num_channels != other.num_channels, or self.active_region_is_default is False.

Parameters

other (AudioSignal) – AudioSignal to concatenate with the current one.

crop_signal(before, after)[source]

Get rid of samples before and after the signal on all channels. Contracts the length of audio_data by before + after. Useful to get rid of zero padding after the fact.

Parameters
  • before – (int) number of samples to remove at beginning of self.audio_data

  • after – (int) number of samples to remove at end of self.audio_data

embed_audio(ext='.mp3', display=True)[source]

Embeds the audio signal into a notebook, using nussl.play_utils.embed_audio.

Write a numpy array to a temporary mp3 file using ffmpy, then embeds the mp3 into the notebook.

Parameters
  • ext (str) – What extension to use when embedding. ‘.mp3’ is more lightweight, leading to smaller notebook sizes.

Example

>>> import nussl
>>> audio_file = nussl.efz_utils.download_audio_file('schoolboy_fascination_excerpt.wav')
>>> audio_signal = nussl.AudioSignal(audio_file)
>>> audio_signal.embed_audio()

This will show a little audio player where you can play the audio inline in the notebook.

property file_name
str

The name of the file associated with this object. Includes extension, but not the full path.

Notes

This will return None if this AudioSignal object was not loaded from a file.

See also

path_to_input_file for the full path.

property freq_vector
np.ndarray

A 1D numpy array with frequency values (in Hz) that correspond to each frequency bin (vertical axis) in stft_data. Assumes linearly spaced frequency bins.

Raises

AudioSignalException – If stft_data is None. Run stft() before accessing this.

get_channel(n)[source]

Gets audio data of n-th channel from audio_data as a 1D np.ndarray of shape (n_samples,).

Parameters

n (int) – index of channel to get. 0-based

Raises

AudioSignalException – If not 0 <= n < self.num_channels.

Returns

The audio data in the n-th channel of the signal, 1D

Return type

(np.array)

get_channels()[source]

Generator that will loop through channels of audio_data.

Yields

(np.array) – The audio data in the next channel of this signal as a 1D np.ndarray.
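
For example (a small sketch using a random stereo signal):

>>> import numpy as np
>>> import nussl
>>> stereo = nussl.AudioSignal(audio_data_array=np.random.rand(2, 44100))
>>> stereo.get_channel(0).shape     # one channel as a 1D array
(44100,)
>>> [ch.shape for ch in stereo.get_channels()]
[(44100,), (44100,)]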

get_magnitude_spectrogram_channel(n)[source]

Returns the n-th channel from self.magnitude_spectrogram_data.

Raises

Exception – If not 0 <= n < self.num_channels.

Parameters

n – (int) index of magnitude spectrogram channel to get (0-based)

Returns

the magnitude spectrogram data in the n-th channel of the signal, 2D

Return type

(np.array)

get_power_spectrogram_channel(n)[source]

Returns the n-th channel from self.power_spectrogram_data.

Raises

Exception – If not 0 <= n < self.num_channels.

Parameters

n – (int) index of power spectrogram channel to get (0-based)

Returns

the power spectrogram data in the n-th channel of the signal, 2D

Return type

(np.array)

get_stft_channel(n)[source]

Returns STFT data of n-th channel from stft_data as a 2D np.ndarray.

Parameters

n – (int) index of stft channel to get. 0-based

Raises

AudioSignalException – If not 0 <= n < self.num_channels.

Returns

the STFT data in the n-th channel of the signal, 2D

Return type

(np.array)

get_stft_channels()[source]

Generator that will loop through channels of stft_data.

Yields

(np.array) – The STFT data in the next channel of this signal as a 2D np.ndarray.

static get_window(window_type, window_length)[source]

Wrapper around scipy.signal.get_window so one can also get the popular sqrt-hann window.

Parameters
  • window_type (str) – Type of window to get (see constants.ALL_WINDOWS).

  • window_length (int) – Length of the window

Returns

Window returned by scipy.signal.get_window

Return type

np.ndarray
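
For example, retrieving the square-root Hann window ('sqrt_hann' is listed in ALL_WINDOWS under Constants below):

>>> import nussl
>>> window = nussl.AudioSignal.get_window('sqrt_hann', 2048)
>>> window.shape
(2048,)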

property has_data

bool

Returns False if audio_data and stft_data are empty. Else, returns True.

ipd_ild_features(ch_one=0, ch_two=1)[source]

Computes interphase difference (IPD) and interlevel difference (ILD) for a stereo spectrogram. If more than two channels, this by default computes IPD/ILD between the first two channels. This can be specified by the arguments ch_one and ch_two. If only one channel, this raises an error.

Parameters
  • ch_one (int) – index of first channel to compute IPD/ILD.

  • ch_two (int) – index of second channel to compute IPD/ILD.

Returns

ipd (np.ndarray): Interphase difference between selected channels

ild (np.ndarray): Interlevel difference between selected channels
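
A brief sketch on a two-channel signal:

import numpy as np
import nussl

stereo = nussl.AudioSignal(audio_data_array=np.random.rand(2, 44100))
stereo.stft()   # IPD/ILD are computed from the time-frequency representation
ipd, ild = stereo.ipd_ild_features(ch_one=0, ch_two=1)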

property is_mono
bool

Whether or not this signal is mono (i.e., has exactly one channel). First looks at audio_data, then (if that’s None) looks at stft_data.

property is_stereo
bool

Whether or not this signal is stereo (i.e., has exactly two channels). First looks at audio_data, then (if that’s None) looks at stft_data.

istft(window_length=None, hop_length=None, window_type=None, overwrite=True, truncate_to_length=None)[source]

Computes and returns the inverse Short Time Fourier Transform (iSTFT).

The results of the iSTFT calculation can be accessed from audio_data if audio_data is None prior to running this function or if overwrite == True.

Warning

If overwrite=True (default) this will overwrite any data in audio_data!

Parameters
  • window_length (int) – Amount of time (in samples) to do an FFT on

  • hop_length (int) – Amount of time (in samples) to skip ahead for the new FFT

  • window_type (str) – Type of scaling to apply to the window.

  • overwrite (bool) – Overwrite audio_data with the result of this calculation

  • truncate_to_length (int) – truncate resultant signal to specified length. Default None.

Returns

(np.ndarray) Calculated, real-valued iSTFT from stft_data, 2D numpy array with shape (n_channels, n_samples).
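
A round-trip sketch: compute the STFT, then invert it back to a time-domain signal of the original length:

import numpy as np
import nussl

sig = nussl.AudioSignal(audio_data_array=np.sin(np.linspace(0, 2 * np.pi * 440, 44100)))
sig.stft()                                                # populate stft_data
samples = sig.istft(truncate_to_length=sig.signal_length)
print(samples.shape)                                      # (n_channels, n_samples)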

load_audio_from_array(signal, sample_rate=44100)[source]

Loads an audio signal from a np.ndarray. :param:`sample_rate` is the sample rate of the signal.

Notes

Only accepts float arrays and int arrays of depth 16-bits.

Parameters
  • signal (np.ndarray) – Array containing the audio signal sampled at :param:`sample_rate`.

  • sample_rate (int) – The sample rate of signal. Default is constants.DEFAULT_SAMPLE_RATE (44.1kHz)
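
For example:

>>> import numpy as np
>>> import nussl
>>> sig = nussl.AudioSignal()
>>> sig.load_audio_from_array(np.random.rand(2, 32000), sample_rate=32000)
>>> sig.sample_rate
32000
>>> sig.num_channels
2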

load_audio_from_file(input_file_path: str, offset: float = 0, duration: Optional[float] = None, new_sample_rate: Optional[int] = None) → None[source]

Loads an audio signal into memory from a file on disc. The audio is stored in AudioSignal as a np.ndarray of floats. The sample rate is read from the file, and this AudioSignal object’s sample rate is set from it. If :param:`new_sample_rate` is neither None nor the same as the sample rate of the file, the audio will be resampled to the sample rate provided in the :param:`new_sample_rate` parameter. After reading the audio data into memory, the active region is set to default.

:param:`offset` and :param:`duration` allow the user to determine how much of the audio is read from the file. If those are non-default, then only the values provided will be stored in audio_data (unlike with the active region, which has the entire audio data stored in memory but only allows access to a subset of the audio).

Parameters
  • input_file_path (str) – Path to input file.

  • offset (float) – The starting point of the section to be extracted (seconds). Defaults to 0 seconds (i.e., the very beginning of the file).

  • duration (float) – Length of the signal to load, in seconds. A value of 0 means read the whole file. Defaults to the full length of the signal.

  • new_sample_rate (int) – If this parameter is neither None nor the same as the sample rate of the input file, the audio data will be resampled to the rate given by this parameter.
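
A hedged sketch reading a two-second excerpt that starts 1.5 seconds into the file and resampling it (the path is a placeholder):

import nussl

sig = nussl.AudioSignal()
sig.load_audio_from_file(
    'my/awesome/mixture.wav',   # placeholder path
    offset=1.5,                 # start 1.5 seconds into the file
    duration=2.0,               # read two seconds of audio
    new_sample_rate=16000,      # resample if the file's native rate differs
)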

property log_magnitude_spectrogram_data

Returns a real valued np.array with log magnitude spectrogram data.

The log magnitude spectrogram is defined as 20*log10(Abs(STFT)). Same shape as stft_data.

Raises

AudioSignalException – if stft_data is None. Run stft() before accessing this.

Type

(np.ndarray)

property magnitude_spectrogram_data
np.ndarray

Returns a real valued np.array with magnitude spectrogram data. The magnitude spectrogram is defined as abs(STFT), the element-wise absolute value of every item in the STFT. Same shape as stft_data.

Raises

AudioSignalException – if stft_data is None. Run stft() before accessing this.

make_audio_signal_from_channel(n)[source]

Makes a new AudioSignal object with data from channel n.

Parameters

n (int) – index of channel to make a new signal from. 0-based

Returns

(AudioSignal) new AudioSignal object with only data from channel n.

make_copy_with_audio_data(audio_data, verbose=True)[source]

Makes a copy of this AudioSignal object with audio_data initialized to the input :param:`audio_data` numpy array. The stft_data of the new AudioSignal object is None.

Parameters
  • audio_data (np.ndarray) – Audio data to be put into the new AudioSignal object.

  • verbose (bool) – If True prints warnings. If False, outputs nothing.

Returns

A copy of this AudioSignal object with audio_data initialized to the input :param:`audio_data` numpy array.

Return type

(AudioSignal)

make_copy_with_stft_data(stft_data, verbose=True)[source]

Makes a copy of this AudioSignal object with stft_data initialized to the input :param:`stft_data` numpy array. The audio_data of the new AudioSignal object is None.

Parameters

stft_data (np.ndarray) – STFT data to be put into the new AudioSignal object.

Returns

A copy of this AudioSignal object with stft_data initialized to the input :param:`stft_data` numpy array.

Return type

(AudioSignal)

property num_channels
int

Number of channels this AudioSignal has. Defaults to returning number of channels in audio_data. If that is None, returns number of channels in stft_data. If both are None then returns None.

peak_normalize()[source]

Peak normalizes the audio signal.

play()[source]

Plays this audio signal, using nussl.play_utils.play.

Plays an audio signal if ffplay from the ffmpeg suite of tools is installed. Otherwise, will fail. The audio signal is written to a temporary file and then played with ffplay.

property power_spectrogram_data
np.ndarray

Returns a real valued np.ndarray with power spectrogram data. The power spectrogram is defined as abs(STFT)^2, the element-wise squared magnitude of every entry in the STFT. Same shape as stft_data.

Raises

AudioSignalException – if stft_data is None. Run stft() before accessing this.

resample(new_sample_rate, **kwargs)[source]

Resample the data in audio_data to the new sample rate provided by :param:`new_sample_rate`. If the :param:`new_sample_rate` is the same as sample_rate then nothing happens.

Parameters
  • new_sample_rate (int) – The new sample rate of audio_data.

  • kwargs – Keyword arguments to librosa.resample.
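
For example:

import numpy as np
import nussl

sig = nussl.AudioSignal(audio_data_array=np.random.rand(44100), sample_rate=44100)
sig.resample(22050)     # audio_data is resampled in place
print(sig.sample_rate)  # 22050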

rms(win_len=None, hop_len=None)[source]

Calculates the root-mean-square of audio_data.

Returns

Root-mean-square of audio_data.

Return type

(float)

property sample_rate
int

Sample rate associated with this object. If audio was read from a file, the sample rate will be set to the sample rate associated with the file. If this object was initialized from an array then the sample rate is set upon init. This property is read-only. To change the sample rate, use resample().

Notes

This property is read-only and cannot be set directly. To change the sample rate, use resample().

set_active_region(start, end)[source]

Determines the bounds of what gets returned when you access audio_data. None of the data in audio_data is discarded when you set the active region, it merely becomes inaccessible until the active region is set back to default (i.e., the full length of the signal).

This is useful for reusing a single AudioSignal object to do multiple operations on only select parts of the audio data.

Warning

Many functions will raise exceptions while the active region is not default. Be aware that adding, subtracting, concatenating, truncating, and other utilities are not available when the active region is not default.

Examples

>>> import nussl
>>> import numpy as np
>>> n = nussl.constants.DEFAULT_SAMPLE_RATE  # 1 second of audio at 44.1kHz
>>> np_sin = np.sin(np.linspace(0, 100 * 2 * np.pi, n))  # sine wave @ 100 Hz
>>> sig = nussl.AudioSignal(audio_data_array=np_sin)
>>> sig.signal_duration
1.0
>>> sig.set_active_region(0, n // 2)
>>> sig.signal_duration
0.5

Parameters
  • start (int) – Beginning of active region (in samples). Cannot be less than 0.

  • end (int) – End of active region (in samples). Cannot be larger than signal_length.

set_active_region_to_default()[source]

Resets the active region of this AudioSignal object to its default value of the entire audio_data array.

property signal_duration
float

Duration of the active region of audio_data in seconds. The length of the audio signal represented by this object in seconds.

property signal_length
int

Number of samples in the active region of audio_data. The length of the audio signal represented by this object in samples.

stft(window_length=None, hop_length=None, window_type=None, overwrite=True)[source]

Computes the Short Time Fourier Transform (STFT) of audio_data. The results of the STFT calculation can be accessed from stft_data if stft_data is None prior to running this function or if overwrite == True.

Warning

If overwrite=True (default) this will overwrite any data in stft_data!

Parameters
  • window_length (int) – Amount of time (in samples) to do an FFT on

  • hop_length (int) – Amount of time (in samples) to skip ahead for the new FFT

  • window_type (str) – Type of scaling to apply to the window.

  • overwrite (bool) – Overwrite stft_data with current calculation

Returns

(np.ndarray) Calculated, complex-valued STFT from audio_data, 3D numpy array with shape (n_frequency_bins, n_hops, n_channels).

property stft_data
np.ndarray

Stored as a numpy np.ndarray, stft_data houses complex-valued data computed from a Short-time Fourier Transform (STFT) of audio data in the AudioSignal. None by default, this AudioSignal object can be initialized with STFT data upon initialization or it can be set at any time.

The STFT data is stored with shape (n_frequency_bins, n_hops, n_channels) as a complex-valued numpy array.

Raises

AudioSignalException – if set with an np.ndarray with one dimension or more than three dimensions.

See also

  • istft() to calculate the inverse STFT from this attribute and put it in audio_data.

  • magnitude_spectrogram() to calculate and get the magnitude spectrogram from stft_data.

  • power_spectrogram() to calculate and get the power spectrogram from stft_data.

Notes

  • audio_data and stft_data are not automatically synchronized, meaning that if one of them is changed, those changes are not instantly reflected in the other. To propagate changes, either call stft() or istft().

  • stft_data will expand a two dimensional array so that it has the expected shape (n_frequency_bins, n_hops, n_channels).

property stft_length
int

The length of stft_data along the time axis. In units of hops.

Raises

AudioSignalException – If self.stft_data is None. Run stft() before accessing this.

property stft_params

STFTParams

STFT parameters are kept in this property. STFT parameters are a namedtuple called STFTParams with the following signature:

STFTParams(
    window_length=2048,
    hop_length=512,
    window_type='hann'
)

The defaults are 32ms windows, 8ms hop, and a hann window.
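
As a hedged sketch (assuming the STFTParams namedtuple is importable from the package top level, e.g. nussl.STFTParams, and using a placeholder path):

import nussl

params = nussl.STFTParams(window_length=1024, hop_length=256, window_type='sqrt_hann')
sig = nussl.AudioSignal('my/awesome/mixture.wav', stft_params=params)  # placeholder path
sig.stft()  # uses the custom window length, hop length, and window type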

subtract(other)[source]

Subtracts two audio signal objects.

This does element-wise subtraction on the audio_data array.

Raises

AudioSignalException – If self.sample_rate != other.sample_rate, self.num_channels != other.num_channels, or self.active_region_is_default is False.

Parameters

other (AudioSignal) – Other AudioSignal to subtract.

Returns

New AudioSignal object with the difference between self and other.

Return type

(AudioSignal)

property time_bins_vector
np.ndarray

A 1D numpy array with time values (in seconds) that correspond to each time bin (horizontal/time axis) in stft_data.

Raises

AudioSignalException – If stft_data is None. Run stft() before accessing this.

property time_vector

np.ndarray

A 1D np.ndarray with timestamps (in seconds) for each sample in audio_data.

to_mono(overwrite=True, keep_dims=False)[source]

Converts audio_data to mono by averaging across channels at every sample.

Parameters
  • overwrite (bool) – If True this function will overwrite audio_data.

  • keep_dims (bool) – If False this function will return a 1D array, else it will return an array with shape (1, n_samples).

Warning

If overwrite=True (default) this will overwrite any data in audio_data!

Returns

Mono-ed version of AudioSignal, either in place or not.

Return type

(AudioSignal)
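
A small sketch (to_mono defaults to overwrite=True, so the conversion below happens in place):

import numpy as np
import nussl

stereo = nussl.AudioSignal(audio_data_array=np.random.rand(2, 44100))
print(stereo.is_mono)   # False
stereo.to_mono()        # average the channels; audio_data is overwritten
print(stereo.is_mono)   # True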

truncate_samples(n_samples)[source]

Truncates the signal leaving only the first n_samples samples. This can only be done if self.active_region_is_default is True. If n_samples > self.signal_length, then n_samples = self.signal_length (no truncation happens).

Raises

AudioSignalException – If self.active_region_is_default is False.

Parameters

n_samples – (int) number of samples that will be left.

truncate_seconds(n_seconds)[source]

Truncates the signal leaving only the first n_seconds. This can only be done if self.active_region_is_default is True.

Parameters

n_seconds – (float) number of seconds to truncate audio_data.

write_audio_to_file(output_file_path, sample_rate=None)[source]

Outputs the audio signal data in audio_data to a file at :param:`output_file_path` with sample rate of :param:`sample_rate`.

Parameters
  • output_file_path (str) – Filename where output file will be saved.

  • sample_rate (int) – The sample rate to write the file at. Default is sample_rate.

zero_pad(before, after)[source]

Adds zeros before and after the signal to all channels. Extends the length of self.audio_data by before + after.

Raises

Exception – If self.active_region_is_default is False.

Parameters
  • before – (int) number of zeros to be put before the current contents of self.audio_data

  • after – (int) number of zeros to be put after the current contents of self.audio_data
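
A short sketch pairing zero_pad() with crop_signal(), which undoes the padding:

import numpy as np
import nussl

sig = nussl.AudioSignal(audio_data_array=np.random.rand(44100))
original_length = sig.signal_length
sig.zero_pad(512, 512)      # add 512 zeros to both ends of every channel
print(sig.signal_length - original_length)    # 1024
sig.crop_signal(512, 512)   # remove the padding again
print(sig.signal_length == original_length)   # True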

Masks

Init for the masks module.

Classes

MaskBase([input_mask, mask_shape])

Parameters: input_mask – A 2- or 3-dimensional numpy ndarray representing a mask.

BinaryMask([input_mask, mask_shape])

Class for creating a Binary Mask to apply to a time-frequency representation of the audio.

SoftMask([input_mask, mask_shape])

A simple class for making a soft mask.

class nussl.core.masks.MaskBase(input_mask=None, mask_shape=None)[source]
Parameters

input_mask (np.ndarray) – A 2- or 3-dimensional numpy ndarray representing a mask.

Attributes

dtype

(str) Returns the data type of the values of the mask.

mask

The actual mask.

num_channels

(int) Number of channels this mask has.

shape

(tuple) Returns the shape of the whole mask. Identical to np.ndarray.shape.

Methods

get_channel(ch)

Gets mask channel ch and returns it as a 2D np.ndarray

inverse_mask()

Alias for invert_mask()

invert_mask()

Returns:

ones(shape)

Makes a mask with all ones with the specified shape.

zeros(shape)

Makes a mask with all zeros with the specified shape.

property dtype

(str) Returns the data type of the values of the mask.

get_channel(ch)[source]

Gets mask channel ch and returns it as a 2D np.ndarray

Parameters

ch (int) – Channel index to return (0-based).

Returns

np.array with the mask channel

Raises

ValueError

inverse_mask()[source]

Alias for invert_mask()

See also

invert_mask()

Returns:

invert_mask()[source]

Returns:

property mask

The actual mask. This is represented as a three dimensional numpy ndarray object. The input gets validated by _validate_mask(). In the case of separation.masks.binary_mask.BinaryMask the validation checks that the values are all 1 or 0 (or bools), in the case of separation.masks.soft_mask.SoftMask the validation checks that all values are within the domain [0.0, 1.0].

This base class will throw a NotImplementedError if instantiated directly.

Raises
  • ValueError

  • NotImplementedError

property num_channels

(int) Number of channels this mask has.

classmethod ones(shape)[source]

Makes a mask with all ones with the specified shape. Exactly the same as np.ones().

Parameters

shape (tuple) – Shape of the resultant mask.

Returns:

property shape

(tuple) Returns the shape of the whole mask. Identical to np.ndarray.shape.

classmethod zeros(shape)[source]

Makes a mask with all zeros with the specified shape. Exactly the same as np.zeros().

Parameters

shape (tuple) – Shape of the resultant mask.

Returns:

class nussl.core.masks.BinaryMask(input_mask=None, mask_shape=None)[source]

Class for creating a Binary Mask to apply to a time-frequency representation of the audio.

Parameters

input_mask (np.ndarray) – 2- or 3-D np.array that represents the mask.

Methods

invert_mask()

Makes a new BinaryMask object with a logical not applied to flip the values in this BinaryMask object.

mask_as_ints([channel])

Returns this BinaryMask as a numpy array of ints of 0’s and 1’s.

invert_mask()[source]

Makes a new BinaryMask object with a logical not applied to flip the values in this BinaryMask object.

Returns

A new BinaryMask object that has all of the boolean values flipped.

mask_as_ints(channel=None)[source]

Returns this BinaryMask as a numpy array of ints of 0’s and 1’s.

Returns

numpy ndarray of this BinaryMask represented as ints instead of bools.

class nussl.core.masks.SoftMask(input_mask=None, mask_shape=None)[source]

A simple class for making a soft mask. The soft mask is represented as a numpy array of floats between 0.0 and 1.0, inclusive.

Parameters

input_mask (np.ndarray) – 2- or 3-D np.array that represents the mask.

Methods

invert_mask()

Returns a new mask with inverted values set like 1 - mask for mask.

mask_to_binary([threshold])

Create a new separation.masks.binary_mask.BinaryMask object from this object’s data.

invert_mask()[source]

Returns a new mask with inverted values set like 1 - mask for mask.

Returns

A new SoftMask object with values set at 1 - mask.

mask_to_binary(threshold=0.5)[source]

Create a new separation.masks.binary_mask.BinaryMask object from this object’s data.

Parameters

threshold (float, Optional) – Threshold (between [0.0, 1.0]) to set the True/False cutoff for the binary mask.

Returns

A new separation.masks.binary_mask.BinaryMask object
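
A hedged sketch tying the mask classes together (the random values stand in for a mask estimated by a separation algorithm):

import numpy as np
from nussl.core.masks import SoftMask

# A soft mask shaped like stft_data: (n_frequency_bins, n_hops, n_channels)
soft = SoftMask(np.random.rand(513, 300, 1))

hard = soft.mask_to_binary(threshold=0.5)   # BinaryMask: True where the soft mask > 0.5
flipped = hard.invert_mask()                # logical not of the binary mask
residual = soft.invert_mask()               # SoftMask with values 1 - mask

print(type(hard).__name__, hard.shape)      # BinaryMask (513, 300, 1)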

Constants

A repository containing all of the constants frequently used in this wacky, mixed up source separation stuff.

Data

DEFAULT_SAMPLE_RATE

Default sample rate.

DEFAULT_WIN_LEN_PARAM

Default window length.

DEFAULT_BIT_DEPTH

Default bit depth.

DEFAULT_MAX_VAL

Max value of 16-bit audio file (unsigned)

EPSILON

epsilon for determining small values

MAX_FREQUENCY

Maximum frequency representable.

WINDOW_HAMMING

Name for calling Hamming window.

WINDOW_RECTANGULAR

Name for calling Rectangular window.

WINDOW_HANN

Name for calling Hann window.

WINDOW_BLACKMAN

Name for calling Blackman window.

WINDOW_TRIANGULAR

Name for calling Triangular window.

WINDOW_DEFAULT

Default window, sqrt-hann.

ALL_WINDOWS

list of all available windows in nussl

NUMPY_JSON_KEY

key used when turning numpy arrays into json

LEN_INDEX

Index of the number of samples in an audio signal.

CHAN_INDEX

Index of the number of channels in an audio signal.

STFT_VERT_INDEX

(int) Index of the number of frequency (vertical) values in a time-frequency representation.

STFT_LEN_INDEX

(int) Index of the number of time (horizontal) hops in a time-frequency representation.

STFT_CHAN_INDEX

(int) Index of the number of channels in a time-frequency representation.

nussl.core.constants.DEFAULT_SAMPLE_RATE = 44100

Default sample rate. 44.1 kHz, CD-quality

Type

(int)

nussl.core.constants.DEFAULT_WIN_LEN_PARAM = 0.032

Default window length. 32ms

Type

(float)

nussl.core.constants.DEFAULT_BIT_DEPTH = 16

Default bit depth. 16-bits, CD-quality

Type

(int)

nussl.core.constants.DEFAULT_MAX_VAL = 65536

Max value of 16-bit audio file (unsigned)

Type

(int)

nussl.core.constants.EPSILON = 1e-16

epsilon for determining small values

Type

(float)

nussl.core.constants.MAX_FREQUENCY = 22050

Maximum frequency representable. 22050 Hz

Type

(int)

nussl.core.constants.WINDOW_HAMMING = 'hamming'

Name for calling Hamming window. ‘hamming’

Type

(str)

nussl.core.constants.WINDOW_RECTANGULAR = 'rectangular'

Name for calling Rectangular window. ‘rectangular’

Type

(str)

nussl.core.constants.WINDOW_HANN = 'hann'

Name for calling Hann window. ‘hann’

Type

(str)

nussl.core.constants.WINDOW_BLACKMAN = 'blackman'

Name for calling Blackman window. ‘blackman’

Type

(str)

nussl.core.constants.WINDOW_TRIANGULAR = 'triang'

Name for calling Triangular window. ‘triang’

Type

(str)

nussl.core.constants.WINDOW_DEFAULT = 'sqrt_hann'

Default window, sqrt-hann.

Type

(str)

nussl.core.constants.ALL_WINDOWS = ['hamming', 'rectangular', 'hann', 'blackman', 'triang', 'sqrt_hann']

list of all available windows in nussl

Type

list(str)

nussl.core.constants.NUMPY_JSON_KEY = 'py/numpy.ndarray'

key used when turning numpy arrays into json

Type

(str)

nussl.core.constants.LEN_INDEX = 1

Index of the number of samples in an audio signal. Used in Introduction to AudioSignals

Type

(int)

nussl.core.constants.CHAN_INDEX = 0

Index of the number of channels in an audio signal. Used in Introduction to AudioSignals

Type

(int)

nussl.core.constants.STFT_VERT_INDEX = 0

(int) Index of the number of frequency (vertical) values in a time-frequency representation. Used in Introduction to AudioSignals and in mask_base.

nussl.core.constants.STFT_LEN_INDEX = 1

(int) Index of the number of time (horizontal) hops in a time-frequency representation. Used in Introduction to AudioSignals and in mask_base.

nussl.core.constants.STFT_CHAN_INDEX = 2

(int) Index of the number of channels in a time-frequency representation. Used in Introduction to AudioSignals and in mask_base.

External File Zoo

The nussl External File Zoo (EFZ) is a server that houses all files that are too large to bundle with nussl when distributing it through pip or Github. These types of files include audio examples, benchmark files for tests, and trained neural network models.

nussl has built-in utilities for accessing the EFZ through its API. Here, it is possible to see what files are available on the EFZ and download desired files. The EFZ utilities allow for such functionality.

Exceptions

FailedDownloadError

Exception class for failed file downloads.

MetadataError

Exception class for errors with metadata.

MismatchedHashError

Exception class for when a computed hash does not match a pre-computed hash.

NoConnectivityError

Exception class for lack of internet connection.

Functions

download_audio_file(audio_file_name[, …])

Downloads the specified audio file from the nussl External File Zoo (EFZ) server.

download_benchmark_file(benchmark_name[, …])

Downloads the specified benchmark file from the nussl External File Zoo (EFZ) server.

download_trained_model(model_name[, …])

Downloads the specified trained model from the nussl External File Zoo (EFZ) server.

get_available_audio_files()

Returns a list of dicts containing metadata of the available audio files on the nussl External File Zoo (EFZ) server (http://nussl.ci.northwestern.edu/).

get_available_benchmark_files()

Returns a list of dicts containing metadata of the available benchmark files for tests on the nussl External File Zoo (EFZ) server (http://nussl.ci.northwestern.edu/).

get_available_trained_models()

Returns a list of dicts containing metadata of the available trained models on the nussl External File Zoo (EFZ) server (http://nussl.ci.northwestern.edu/).

print_available_audio_files()

Prints a message to the console that shows all of the available audio files that are on the nussl External File Zoo (EFZ) server (http://nussl.ci.northwestern.edu/).

print_available_benchmark_files()

Prints a message to the console that shows all of the available benchmark files that are on the nussl External File Zoo (EFZ) server (http://nussl.ci.northwestern.edu/).

print_available_trained_models()

Prints a message to the console that shows all of the available trained models that are on the nussl External File Zoo (EFZ) server (http://nussl.ci.northwestern.edu/).

exception nussl.core.efz_utils.FailedDownloadError[source]

Exception class for failed file downloads.

exception nussl.core.efz_utils.MetadataError[source]

Exception class for errors with metadata.

exception nussl.core.efz_utils.MismatchedHashError[source]

Exception class for when a computed hash does not match a pre-computed hash.

exception nussl.core.efz_utils.NoConnectivityError[source]

Exception class for lack of internet connection.

nussl.core.efz_utils.download_audio_file(audio_file_name, local_folder=None, verbose=True)[source]

Downloads the specified audio file from the nussl External File Zoo (EFZ) server. The downloaded file is stored in :param:`local_folder` if a folder is provided. If a folder is not provided, nussl attempts to save the downloaded file in ~/.nussl/ (expanded) or in tmp/.nussl. If the requested file is already in :param:`local_folder` (or one of the two aforementioned directories) and the calculated hash matches the precomputed hash from the EFZ server metadata, then the file will not be downloaded.

Parameters
  • audio_file_name – (str) Name of the audio file to attempt to download.

  • local_folder – (str) Path to local folder in which to download the file. If no folder is provided, nussl will store the file in ~/.nussl/ (expanded) or in /tmp/.nussl.

  • verbose (bool) – If True prints the status of the download to the console.

Returns

(String) Full path to the requested file (whether downloaded or not).

Example

>>> import nussl
>>> piano_path = nussl.efz_utils.download_audio_file('K0140.wav')
>>> piano_signal = nussl.AudioSignal(piano_path)

nussl.core.efz_utils.download_benchmark_file(benchmark_name, local_folder=None, verbose=True)[source]

Downloads the specified benchmark file from the nussl External File Zoo (EFZ) server. The downloaded file is stored in :param:`local_folder` if a folder is provided. If a folder is not provided, nussl attempts to save the downloaded file in ~/.nussl/ (expanded) or in /tmp/.nussl. If the requested file is already in :param:`local_folder` (or one of the two aforementioned directories) and the calculated hash matches the precomputed hash from the EFZ server metadata, then the file will not be downloaded.

Parameters
  • benchmark_name – (str) Name of the trained model to attempt to download.

  • local_folder – (str) Path to local folder in which to download the file. If no folder is provided, nussl will store the file in ~/.nussl/ (expanded) or in tmp/.nussl.

  • verbose (bool) – If True prints the status of the download to the console.

Returns

(String) Full path to the requested file (whether downloaded or not).

Example

>>> import nussl
>>> import numpy as np
>>> stm_atn_path = nussl.efz_utils.download_benchmark_file('benchmark_sym_atn.npy')
>>> sym_atm = np.load(stm_atn_path)

nussl.core.efz_utils.download_trained_model(model_name, local_folder=None, verbose=True)[source]

Downloads the specified trained model from the nussl External File Zoo (EFZ) server. The downloaded file is stored in :param:`local_folder` if a folder is provided. If a folder is not provided, nussl attempts to save the downloaded file in ~/.nussl/ (expanded) or in tmp/.nussl. If the requested file is already in :param:`local_folder` (or one of the two aforementioned directories) and the calculated hash matches the precomputed hash from the EFZ server metadata, then the file will not be downloaded.

Parameters
  • model_name – (str) Name of the trained model to attempt to download.

  • local_folder – (str) Path to local folder in which to download the file. If no folder is provided, nussl will store the file in ~/.nussl/ (expanded) or in /tmp/.nussl.

  • verbose (bool) – If True prints the status of the download to the console.

Returns

(String) Full path to the requested file (whether downloaded or not).

Example

>>> import nussl
>>> model_path = nussl.efz_utils.download_trained_model('deep_clustering_model.h5')
>>> signal = nussl.AudioSignal()
>>> piano_signal = nussl.DeepClustering(signal, model_path=model_path)

nussl.core.efz_utils.get_available_audio_files()[source]

Returns a list of dicts containing metadata of the available audio files on the nussl External File Zoo (EFZ) server (http://nussl.ci.northwestern.edu/).

Each entry in the list is in the following format:

{
    u'file_length_seconds': 5.00390022675737,
    u'visible': True,
    u'file_name': u'K0140.wav',
    u'date_modified': u'2018-06-01',
    u'file_hash': u'f0d8d3c8d199d3790b0e42d1e5df50a6801f928d10f533149ed0babe61b5d7b5',
    u'file_size_bytes': 441388,
    u'file_description': u'Acoustic piano playing middle C.',
    u'audio_attributes': u'piano, middle C',
    u'file_size': u'431.0KiB',
    u'date_added': u'2018-06-01'
}

Returns

A list of dicts containing metadata of the available audio files on the nussl External File Zoo (EFZ) server (http://nussl.ci.northwestern.edu/).

Return type

(list)

nussl.core.efz_utils.get_available_benchmark_files()[source]

Returns a list of dicts containing metadata of the available benchmark files for tests on the nussl External File Zoo (EFZ) server (http://nussl.ci.northwestern.edu/).

Each entry in the list is in the following format:

{
    u'for_class': u'DuetUnitTests',
    u'visible': True,
    u'file_name': u'benchmark_atn_bins.npy',
    u'date_modified': u'2018-06-19',
    u'file_hash': u'cf7fef6f4ea9af3dbde8b9880602eeaf72507b6c78f04097c5e79d34404a8a1f',
    u'file_size_bytes': 488,
    u'file_description': u'Attenuation bins numpy array for DUET benchmark test.',
    u'file_size': u'488.0B',
    u'date_added': u'2018-06-19'
}

Notes

Most of the entries in the dictionary are self-explanatory, but note the for_class entry. The for_class entry specifies which nussl benchmark class will load the corresponding benchmark file. Make sure these match exactly when writing tests!

Returns

A list of dicts containing metadata of the available audio files on the nussl External File Zoo (EFZ) server (http://nussl.ci.northwestern.edu/).

Return type

(list)

nussl.core.efz_utils.get_available_trained_models()[source]

Returns a list of dicts containing metadata of the available trained models on the nussl External File Zoo (EFZ) server (http://nussl.ci.northwestern.edu/).

Each entry in the list is in the following format:

{
    u'for_class': u'DeepClustering',
    u'visible': True,
    u'file_name': u'deep_clustering_vocals_44k_long.model',
    u'date_modified': u'2018-06-01',
    u'file_hash': u'e09034c2cb43a293ece0b121f113b8e4e1c5a247331c71f40cb9ca38227ccc2c',
    u'file_size_bytes': 94543355,
    u'file_description': u'Deep clustering for vocal separation trained on augmented DSD100.',
    u'file_size': u'90.2MiB',
    u'date_added': u'2018-06-01'
}

Notes

Most of the entries in the dictionary are self-explanatory, but note the for_class entry. The for_class entry specifies which nussl separation class the given model will work with. Usually, nussl separation classes that require a model will default to retrieving a model from the EFZ server (if not already found on the user’s machine), but sometimes it is desirable to use a model other than the default one provided. In this case, the for_class entry lets the user know which class it is valid for use with. Additionally, trying to load a model into a class that it is not explicitly labeled for will raise an exception. Just don’t do it, ok?

Returns

A list of dicts containing metadata of the available trained models on the nussl External File Zoo (EFZ) server (http://nussl.ci.northwestern.edu/).

Return type

(list)

nussl.core.efz_utils.print_available_audio_files()[source]

Prints a message to the console that shows all of the available audio files that are on the nussl External File Zoo (EFZ) server (http://nussl.ci.northwestern.edu/).

Example

>>> import nussl
>>> nussl.efz_utils.print_available_audio_files()
File Name                                Duration (sec)  Size       Description
dev1_female3_inst_mix.wav                10.0            1.7MiB     Instantaneous mixture of three female speakers talking in a stereo field.
dev1_female3_synthconv_130ms_5cm_mix.wav 10.0            1.7MiB     Three female speakers talking in a stereo field, with 130ms of inter-channel delay.
K0140.wav                                5.0             431.0KiB   Acoustic piano playing middle C.
K0149.wav                                5.0             430.0KiB   Acoustic piano playing the A above middle C. (A440)

To download one of these files insert the file name as the first parameter to download_audio_file(), like so:

>>> nussl.efz_utils.download_audio_file('K0140.wav')

nussl.core.efz_utils.print_available_benchmark_files()[source]

Prints a message to the console that shows all of the available benchmark files that are on the nussl External File Zoo (EFZ) server (http://nussl.ci.northwestern.edu/).

Example

>>> import nussl
>>> nussl.efz_utils.print_available_benchmark_files()
File Name                                For Class            Size       Description
mix3_matlab_repet_foreground.mat         TestRepet            6.4MiB     Foreground matrix for Repet class benchmark test.
benchmark_atn_bins.npy                   DuetUnitTests        488.0B     Attenuation bins numpy array for DUET benchmark test.
benchmark_sym_atn.npy                    DuetUnitTests        3.4MiB     Symmetric attenuation histogram for the DUET benchmark test.
benchmark_wmat.npy                       DuetUnitTests        3.4MiB     Frequency matrix for the DUET benchmark test.

To download one of these files insert the file name as the first parameter to nussl.download_benchmark_file, like so:

>>> nussl.efz_utils.download_benchmark_file('example.npy')

Notes

Most of the entries in the printed list are self-explanatory, but note the for_class entry. The for_class entry specifies which nussl benchmark class will load the corresponding benchmark file. Make sure these match exactly when writing tests!

nussl.core.efz_utils.print_available_trained_models()[source]

Prints a message to the console that shows all of the available trained models that are on the nussl External File Zoo (EFZ) server (http://nussl.ci.northwestern.edu/).

Notes

Most of the entries in the dictionary are self-explanatory, but note the for_class entry. The for_class entry specifies which nussl separation class the given model will work with. Usually, nussl separation classes that require a model will default to retrieving a model from the EFZ server (if not already found on the user’s machine), but sometimes it is desirable to use a model other than the default one provided. In this case, the for_class entry lets the user know which class it is valid for use with. Additionally, trying to load a model into a class that it is not explicitly labeled for will raise an exception. Just don’t do it, ok?

Example

>>> import nussl
>>> nussl.efz_utils.print_available_trained_models()
File Name                                For Class            Size       Description
deep_clustering_model.model              DeepClustering       48.1MiB    example Deep Clustering model
deep_clustering_vocal_44k_long.model     DeepClustering       90.2MiB    trained DC model for vocal extraction

To download one of these files insert the file name as the first parameter to download_trained_model(), like so:

>>> nussl.efz_utils.download_trained_model('deep_clustering_model.h5')

General utilities

Provides utilities for running nussl algorithms that do not belong to any specific algorithm or that are shared between algorithms.

Functions

audio_signals_to_musdb_track(mixture, …)

Converts AudioSignal objects to musdb Track objects that contain the mixture, the ground truth sources, and the targets for use with the mus_eval implementation of BSS-Eval and musdb.

complex_randn(shape)

Returns a complex-valued numpy array of random values with shape :param:`shape`.

find_peak_indices(input_array, n_peaks[, …])

This function will find the indices of the peaks of an input n-dimensional numpy array.

musdb_track_to_audio_signals(track)

Converts a musdb track to a dictionary of AudioSignal objects.

seed(random_seed[, set_cudnn])

Seeds all random states in nussl with the same random seed for reproducibility.

verify_audio_signal_list_lax(audio_signal_list)

Verifies that an input (:param:`audio_signal_list`) is a list of AudioSignal objects.

verify_audio_signal_list_strict(…)

Verifies that an input (:param:`audio_signal_list`) is a list of AudioSignal objects and that they all have the same sample rate and same number of channels.

visualize_gradient_flow(named_parameters[, …])

Visualize the gradient flow through the named parameters of a PyTorch model.

visualize_sources_as_masks(audio_signals[, …])

Visualizes a dictionary or list of sources with overlapping waveforms with transparency.

visualize_sources_as_waveform(audio_signals)

Visualizes a dictionary or list of sources with overlapping waveforms with transparency.

visualize_spectrogram(audio_signal[, ch, …])

Wrapper around librosa.display.specshow for usage with AudioSignals.

visualize_waveform(audio_signal[, ch, …])

Wrapper around librosa.display.waveplot for usage with AudioSignals.

nussl.core.utils.audio_signals_to_musdb_track(mixture, sources_dict, targets_dict)[source]

Converts AudioSignal objects to musdb Track objects that contain the mixture, the ground truth sources, and the targets for use with the mus_eval implementation of BSS-Eval and musdb.

See also

  • More information on musdb: Github (https://github.com/sigsep/sigsep-mus-db) and documentation (http://musdb.readthedocs.io/)

  • More information on mus_eval: Github (https://github.com/sigsep/sigsep-mus-eval) and documentation (https://sigsep.github.io/sigsep-mus-eval/)

  • BSSEvalV4 for nussl’s interface to BSS-Eval v4.

Parameters
  • mixture (AudioSignal) – The AudioSignal object that contains the mixture.

  • sources_dict (dict) – Dictionary where the keys are the labels for the sources and values are the associated AudioSignal objects.

  • targets_dict (dict) – Dictionary where the keys are the labels for the sources (as above) and the values are weights.

Returns

(musdb.MultiTrack) populated as specified by inputs.

nussl.core.utils.complex_randn(shape)[source]

Returns a complex-valued numpy array of random values with shape :param:`shape`.

Parameters

shape (tuple) – Tuple of ints that will be the shape of the resultant complex numpy array.

Returns

a complex-valued numpy array of random values with shape shape

Return type

(np.ndarray)

nussl.core.utils.find_peak_indices(input_array, n_peaks, min_dist=None, do_min=False, threshold=0.5)[source]

This function will find the indices of the peaks of an input n-dimensional numpy array. This can be configured to find max or min peak indices, distance between the peaks, and a lower bound, at which the algorithm will stop searching for peaks (or upper bound if searching for max). Used exactly the same as find_peak_values().

This function currently only accepts 1-D and 2-D numpy arrays.

Notes

  • This function only returns the indices of peaks. If you want to find peak values,

use find_peak_values().

  • min_dist can be an int or a tuple of length 2.

    If input_array is 1-D, min_dist must be an integer. If input_array is 2-D, min_dist can be an integer, in which case the minimum distance in both dimensions will be equal. min_dist can also be a tuple if you want each dimension to have a different minimum distance between peaks. In that case, the 0th value in the tuple represents the first dimension, and the 1st value represents the second dimension in the numpy array.

Parameters
  • input_array – a 1- or 2- dimensional numpy array that will be inspected.

  • n_peaks – (int) maximum number of peaks to find

  • min_dist – (int) minimum distance between peaks. Default value: len(input_array) / 4

  • do_min – (bool) if True, finds indices at minimum value instead of maximum

  • threshold – (float) the value (scaled between 0.0 and 1.0) that acts as the bound at which the algorithm stops searching for peaks. Defaults to 0.5.

Returns

(list) list of the indices of the peak values

Return type

peak_indices (list)
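
Example

A minimal sketch on a toy 1-D array; the two most prominent peaks in this data sit at indices 1 and 4, and the exact result depends on the n_peaks, min_dist, and threshold settings.

>>> import numpy as np
>>> from nussl.core.utils import find_peak_indices
>>> data = np.array([0.0, 1.0, 0.2, 0.0, 0.8, 0.1, 0.0])
>>> peaks = find_peak_indices(data, n_peaks=2, min_dist=2)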

nussl.core.utils.musdb_track_to_audio_signals(track)[source]

Converts a musdb track to a dictionary of AudioSignal objects.

Parameters

track (musdb.audio_classes.MultiTrack) – MultiTrack object containing stems that will each be turned into AudioSignal objects.

Returns

tuple containing the mixture AudioSignal and a dictionary of the sources.

Return type

(2-tuple)
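
Example

A minimal sketch, assuming musdb is installed and that its downloadable 7-second preview set is used (download=True is a musdb option, not part of nussl).

>>> import musdb
>>> from nussl.core.utils import musdb_track_to_audio_signals
>>> db = musdb.DB(download=True)                 # 7-second preview tracks
>>> mixture, sources = musdb_track_to_audio_signals(db.tracks[0])
>>> list(sources.keys())                         # the stem names defined by musdb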

nussl.core.utils.seed(random_seed, set_cudnn=False)[source]

Seeds all random states in nussl with the same random seed for reproducibility. Seeds numpy, random and torch random generators.

For full reproducibility, two further options must be set according to the torch documentation:

https://pytorch.org/docs/stable/notes/randomness.html

To do this, set_cudnn must be True. It defaults to False, since setting it to True results in a performance hit.

Parameters
  • random_seed (int) – Integer corresponding to the random seed to use.

  • set_cudnn (bool) – Whether or not to set cudnn into deterministic mode and out of benchmark mode. Defaults to False.
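
Example

A minimal sketch of seeding nussl for reproducibility; the seed value is arbitrary.

>>> from nussl.core.utils import seed
>>> seed(42)                  # seeds numpy, random, and torch
>>> seed(42, set_cudnn=True)  # additionally pins cuDNN for full determinism (slower)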

nussl.core.utils.verify_audio_signal_list_lax(audio_signal_list)[source]

Verifies that an input (:param:`audio_signal_list`) is a list of AudioSignal objects. If it is not, this attempts to correct the list (if possible) and returns the corrected list.

Parameters

audio_signal_list (list) – List of AudioSignal objects

Returns

Verified list of AudioSignal objects.

Return type

audio_signal_list (list)

nussl.core.utils.verify_audio_signal_list_strict(audio_signal_list)[source]

Verifies that an input (:param:`audio_signal_list`) is a list of AudioSignal objects and that they all have the same sample rate and same number of channels. If this is not the case, it attempts to correct the list (if possible) and returns the corrected list.

Parameters

audio_signal_list (list) – List of AudioSignal objects

Returns

Verified list of AudioSignal objects, that all have the same sample rate and number of channels.

Return type

audio_signal_list (list)
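
Example

A minimal sketch; mix.wav is a hypothetical file, and the expectation that the lax variant wraps a bare AudioSignal into a one-element list is an assumption consistent with the description above.

>>> import nussl
>>> sig = nussl.AudioSignal('mix.wav')
>>> nussl.core.utils.verify_audio_signal_list_lax(sig)            # expected: [sig]
>>> nussl.core.utils.verify_audio_signal_list_strict([sig, sig])  # same rate and channel count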

nussl.core.utils.visualize_gradient_flow(named_parameters, n_bins=50)[source]

Visualize the gradient flow through the named parameters of a PyTorch model.

Plots the gradients flowing through the different layers of the network during training. Can be used to check for possible vanishing or exploding gradient problems.

Usage: call this function in your Trainer class after loss.backward(), as visualize_gradient_flow(self.model.named_parameters()), to visualize the gradient flow.

Parameters
  • named_parameters (generator) – Generator object yielding name and parameters for each layer in a PyTorch model.

  • n_bins (int) – Number of bins to use for each histogram. Defaults to 50.
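
Example

A minimal sketch; model, loss_fn, batch, and target are hypothetical placeholders for whatever PyTorch model and loss you are training.

>>> import matplotlib.pyplot as plt
>>> from nussl.core.utils import visualize_gradient_flow
>>> loss = loss_fn(model(batch), target)   # hypothetical forward pass and loss
>>> loss.backward()
>>> visualize_gradient_flow(model.named_parameters())
>>> plt.show()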

nussl.core.utils.visualize_sources_as_masks(audio_signals, ch=0, do_mono=False, x_axis='time', y_axis='linear', db_cutoff=-60, colors=None, alphas=None, alpha_amount=1.0, show_legend=True, **kwargs)[source]

Visualizes a dictionary or list of sources with overlapping waveforms with transparency.

The labels of each source are either the key, if a dictionary, or the path to the input audio file, if a list.

Parameters
  • audio_signals (list or dict) – List or dictionary of audio signal objects to be plotted.

  • ch (int, optional) – Which channel to plot. Defaults to 0.

  • do_mono (bool, optional) – Make each AudioSignal mono. Defaults to False.

  • x_axis (str, optional) – x_axis argument to librosa.display.specshow. Defaults to ‘time’.

  • colors (list, optional) – Sequence of colors to use for each signal. Defaults to None, which uses the default matplotlib color cycle.

  • alphas (list, optional) – Sequence of alpha transparency to use for each signal. Defaults to None.

  • kwargs – Additional keyword arguments to librosa.display.specshow.

nussl.core.utils.visualize_sources_as_waveform(audio_signals, ch=0, do_mono=False, x_axis='time', colors=None, alphas=None, show_legend=True, **kwargs)[source]

Visualizes a dictionary or list of sources with overlapping waveforms with transparency.

The labels of each source are either the key, if a dictionary, or the path to the input audio file, if a list.

Parameters
  • audio_signals (list or dict) – List or dictionary of audio signal objects to be plotted.

  • ch (int, optional) – Which channel to plot. Defaults to 0.

  • do_mono (bool, optional) – Make each AudioSignal mono. Defaults to False.

  • x_axis (str, optional) – x_axis argument to librosa.display.waveplot. Defaults to ‘time’.

  • colors (list, optional) – Sequence of colors to use for each signal. Defaults to None, which uses the default matplotlib color cycle.

  • alphas (list, optional) – Sequence of alpha transparency to use for each signal. Defaults to None.

  • kwargs – Additional keyword arguments to librosa.display.waveplot.
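
Example

A minimal sketch demonstrating both visualize_sources_as_masks and visualize_sources_as_waveform on a hypothetical dictionary of separated stems; the file names and figure layout are assumptions for illustration.

>>> import matplotlib.pyplot as plt
>>> import nussl
>>> estimates = {
...     'vocals': nussl.AudioSignal('vocals_estimate.wav'),
...     'accompaniment': nussl.AudioSignal('accompaniment_estimate.wav'),
... }
>>> plt.figure(figsize=(10, 6))
>>> plt.subplot(211)
>>> nussl.core.utils.visualize_sources_as_masks(estimates, db_cutoff=-60)
>>> plt.subplot(212)
>>> nussl.core.utils.visualize_sources_as_waveform(estimates, show_legend=True)
>>> plt.tight_layout()
>>> plt.show()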

nussl.core.utils.visualize_spectrogram(audio_signal, ch=0, do_mono=False, x_axis='time', y_axis='linear', **kwargs)[source]

Wrapper around librosa.display.specshow for usage with AudioSignals.

Parameters
  • audio_signal (AudioSignal) – AudioSignal to plot

  • ch (int, optional) – Which channel to plot. Defaults to 0.

  • do_mono (bool, optional) – Make the AudioSignal mono. Defaults to False.

  • x_axis (str, optional) – x_axis argument to librosa.display.specshow. Defaults to ‘time’.

  • y_axis (str, optional) – y_axis argument to librosa.display.specshow. Defaults to ‘linear’.

  • kwargs – Additional keyword arguments to librosa.display.specshow.

nussl.core.utils.visualize_waveform(audio_signal, ch=0, do_mono=False, x_axis='time', **kwargs)[source]

Wrapper around librosa.display.waveplot for usage with AudioSignals.

Parameters
  • audio_signal (AudioSignal) – AudioSignal to plot

  • ch (int, optional) – Which channel to plot. Defaults to 0.

  • do_mono (bool, optional) – Make the AudioSignal mono. Defaults to False.

  • x_axis (str, optional) – x_axis argument to librosa.display.waveplot. Defaults to ‘time’.

  • kwargs – Additional keyword arguments to librosa.display.waveplot.
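
Example

A minimal sketch plotting a spectrogram above a waveform for a single hypothetical file; the y_axis value is just one of the options librosa.display.specshow accepts.

>>> import matplotlib.pyplot as plt
>>> import nussl
>>> signal = nussl.AudioSignal('mix.wav')   # hypothetical input file
>>> plt.figure(figsize=(10, 6))
>>> plt.subplot(211)
>>> nussl.core.utils.visualize_spectrogram(signal, y_axis='log')
>>> plt.subplot(212)
>>> nussl.core.utils.visualize_waveform(signal)
>>> plt.tight_layout()
>>> plt.show()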

Mixing

Small collection of utilities for altering and remixing AudioSignal objects.

TODO: add pitch_shift, time_stretch, dynamic_range_compression, and apply_impulse_response

Functions

delay_audio_signal(audio_signal, …)

Delays an audio signal by the desired number of samples per channel.

pan_audio_signal(audio_signal, angle_in_degrees)

Pans an audio signal left or right by the desired number of degrees.

nussl.core.mixing.delay_audio_signal(audio_signal, delays_in_samples)[source]

Delays an audio signal by the desired number of samples per channel. This returns a copy of the input audio signal.

Delay must be positive. The end of the audio signal is truncated for that channel so that the length remains the same as the original.

Parameters
  • audio_signal (AudioSignal) – Audio signal to be delayed.

  • delays_in_samples (list of int) – List of delays to apply to each channel. Should have the same length as number of channels in the AudioSignal.

Raises

ValueError – If the length of delays_in_samples does not match the number of channels in the audio signal, if any items in delays_in_samples are of float type, or if any delays are negative.

Returns

Audio signal with each channel delayed by the specified number of samples in delays_in_samples.

Return type

AudioSignal
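
Example

A minimal sketch; stereo_mix.wav is a hypothetical two-channel file, and 441 samples corresponds to roughly 10 ms at 44.1 kHz.

>>> import nussl
>>> from nussl.core.mixing import delay_audio_signal
>>> stereo = nussl.AudioSignal('stereo_mix.wav')
>>> delayed = delay_audio_signal(stereo, [0, 441])   # delay only the second channel
>>> delayed.signal_length == stereo.signal_length    # length is preserved; the tail is truncated
True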

nussl.core.mixing.pan_audio_signal(audio_signal, angle_in_degrees)[source]

Pans an audio signal left or right by the desired number of degrees. This returns a copy of the input audio signal.

Use negative numbers to pan left, positive to pan right. Angles outside of the range [-45, 45] raise an error.

Parameters
  • audio_signal (AudioSignal) – Audio signal to be panned.

  • angle_in_degrees (float) – Angle in degrees to pan by, between -45 and 45.

Raises

ValueError – Angles outside of the range [-45, 45] raise an error.

Returns

Audio signal panned by angle_in_degrees.

Return type

AudioSignal
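
Example

A minimal sketch; mix.wav is a hypothetical input file.

>>> import nussl
>>> from nussl.core.mixing import pan_audio_signal
>>> signal = nussl.AudioSignal('mix.wav')
>>> panned_right = pan_audio_signal(signal, 30)    # positive angles pan right
>>> panned_left = pan_audio_signal(signal, -30)    # negative angles pan left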

Playing and embedding audio

These are optional utilities included in nussl that allow one to embed an AudioSignal as a playable object in a Jupyter notebook, or to play audio from the terminal.

Functions

embed_audio(audio_signal[, ext, display])

Writes a numpy array to a temporary mp3 file using ffmpy, then embeds the mp3 into the notebook.

multitrack(audio_signals[, names, ext, display])

Takes a bunch of audio sources, converts them to mp3 to make them smaller, and creates a multitrack audio player in the notebook that lets you toggle between the sources and the mixture.

play(audio_signal)

Plays an audio signal if ffplay from the ffmpeg suite of tools is installed.

nussl.core.play_utils.embed_audio(audio_signal, ext='.mp3', display=True)[source]

Writes a numpy array to a temporary mp3 file using ffmpy, then embeds the mp3 into the notebook.

Parameters
  • audio_signal (AudioSignal) – AudioSignal object containing the data.

  • ext (str) – What extension to use when embedding. ‘.mp3’ is more lightweight, leading to smaller notebook sizes. Defaults to ‘.mp3’.

  • display (bool) – Whether or not to display the object immediately, or to return the html object for display later by the end user. Defaults to True.

Example

>>> import nussl
>>> audio_file = nussl.efz_utils.download_audio_file('schoolboy_fascination_excerpt.wav')
>>> audio_signal = nussl.AudioSignal(audio_file)
>>> audio_signal.embed_audio()

This will show a little audio player where you can play the audio inline in the notebook.

nussl.core.play_utils.multitrack(audio_signals, names=None, ext='.mp3', display=True)[source]

Takes a bunch of audio sources, converts them to mp3 to make them smaller, and creates a multitrack audio player in the notebook that lets you toggle between the sources and the mixture. Heavily adapted from https://github.com/binarymind/multitrackHTMLPlayer, designed by Bastien Liutkus.

Parameters
  • audio_signals (list) – List of AudioSignal objects that add up to the mixture.

  • names (list) – List of names to give to each object (e.g. foreground, background).

  • ext (str) – What extension to use when embedding. ‘.mp3’ is more lightweight, leading to smaller notebook sizes.

  • display (bool) – Whether or not to display the object immediately, or to return the html object for display later by the end user.
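
Example

A minimal sketch for use in a Jupyter notebook; the stem files and names are hypothetical.

>>> import nussl
>>> vocals = nussl.AudioSignal('vocals_estimate.wav')
>>> accompaniment = nussl.AudioSignal('accompaniment_estimate.wav')
>>> nussl.core.play_utils.multitrack([vocals, accompaniment], names=['vocals', 'accompaniment'])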

nussl.core.play_utils.play(audio_signal)[source]

Plays an audio signal if ffplay from the ffmpeg suite of tools is installed. Otherwise, it will fail. The audio signal is written to a temporary file and then played with ffplay.

Parameters

audio_signal (AudioSignal) – AudioSignal object to be played.
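
Example

A minimal sketch; mix.wav is a hypothetical file, and ffplay must be available on the PATH for playback to work.

>>> import nussl
>>> signal = nussl.AudioSignal('mix.wav')
>>> nussl.core.play_utils.play(signal)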