Machine Learning¶
SeparationModel¶
-
class
nussl.ml.
SeparationModel
(config, verbose=False)[source]¶ SeparationModel takes a configuration file or dictionary that describes the model structure, which is some combination of MelProjection, Embedding, RecurrentStack, ConvolutionalStack, and other modules found in nussl.ml.networks.modules.
References
Hershey, J. R., Chen, Z., Le Roux, J., & Watanabe, S. (2016, March). Deep clustering: Discriminative embeddings for segmentation and separation. In Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on (pp. 31-35). IEEE.
Luo, Y., Chen, Z., Hershey, J. R., Le Roux, J., & Mesgarani, N. (2017, March). Deep clustering and conventional networks for music separation: Stronger together. In Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on (pp. 61-65). IEEE.
Methods
forward(data) – data (dict): a dictionary containing the input data for the model.
save(location[, metadata]) – Saves a SeparationModel to a location as a dictionary containing the weights and model configuration.
- Parameters
config – (str, dict) Either a config dictionary that defines the model and its connections, or the path to a json file containing the dictionary. If the latter, the path will be loaded and used.
See also
ml.register_module to register your custom modules with SeparationModel.
Examples
>>> config = nussl.ml.networks.builders.build_recurrent_dpcl(
...     num_features=512, hidden_size=300, num_layers=3, bidirectional=True,
...     dropout=0.3, embedding_size=20,
...     embedding_activation=['sigmoid', 'unit_norm'])
>>>
>>> model = SeparationModel(config)
Building blocks for SeparationModel¶
Helpers for common deep networks¶
Functions that make it easy to build commonly used source separation architectures. Currently contains mask inference, deep clustering, and chimera networks that are based on recurrent neural networks. These functions are a good place to start when creating your own network topologies. Since there can be dependencies between layers depending on input size, it’s good to work this out in a function like those below.
Functions
build_dual_path_recurrent_end_to_end – Builds a config for a dual path recurrent network that operates on the time-series.
build_open_unmix_like – This is a builder for an open-unmix LIKE (UMX) architecture for music source separation.
build_recurrent_chimera – Builds a config for a Chimera network that can be passed to SeparationModel.
build_recurrent_dpcl – Builds a config for a deep clustering network that can be passed to SeparationModel.
build_recurrent_end_to_end – Builds a config for a BLSTM-based network that operates on the time-series.
build_recurrent_mask_inference – Builds a config for a mask inference network that can be passed to SeparationModel.
-
nussl.ml.networks.builders.
build_dual_path_recurrent_end_to_end
(num_filters, filter_length, hop_length, chunk_size, hop_size, hidden_size, num_layers, bidirectional, bottleneck_size, num_sources, mask_activation, num_audio_channels=1, window_type='sqrt_hann', skip_connection=False, rnn_type='lstm', mix_key='mix_audio')[source]¶ Builds a config for a dual path recurrent network that operates on the time-series. Uses a learned filterbank within the network.
- Parameters
num_filters (int) – Number of learnable filters in the front end network.
filter_length (int) – Length of the filters.
hop_length (int) – Hop length between frames.
window_type (str) – Type of windowing function to apply to each frame.
hidden_size (int) – Hidden size of the RNN.
num_layers (int) – Number of layers in the RNN.
bidirectional (int) – Whether the RNN is bidirectional.
dropout (float) – Amount of dropout to be used between layers of RNN.
num_sources (int) – Number of sources to create masks for.
mask_activation (list of str) – Activation of the mask (‘sigmoid’, ‘softmax’, etc.). See nussl.ml.networks.modules.Embedding.
num_audio_channels (int) – Number of audio channels in input (e.g. mono or stereo). Defaults to 1.
rnn_type (str, optional) – RNN type, either ‘lstm’ or ‘gru’. Defaults to ‘lstm’.
normalization_class (str, optional) – Type of normalization to apply, either ‘InstanceNorm’ or ‘BatchNorm’. Defaults to ‘BatchNorm’.
mix_key (str, optional) – The key to look for in the input dictionary that contains the mixture audio. Defaults to ‘mix_audio’.
- Returns
- A TASNet configuration that can be passed to
SeparationModel.
- Return type
dict
-
nussl.ml.networks.builders.
build_open_unmix_like
(num_features, hidden_size, num_layers, bidirectional, dropout, num_sources, num_audio_channels=1, add_embedding=False, embedding_size=20, embedding_activation='sigmoid', rnn_type='lstm', mix_key='mix_magnitude')[source]¶ This is a builder for an open-unmix LIKE (UMX) architecture for music source separation.
The architecture is not exactly the same but is very similar for the most part. This architecture also has the option of having an embedding space attached to it, making it a UMX + Chimera network that you can regularize with a deep clustering loss.
- Parameters
num_features (int) – Number of features in the input spectrogram (usually means window length of STFT // 2 + 1.)
hidden_size (int) – Hidden size of the RNN. Will be hidden_size // 2 if bidirectional is True.
num_layers (int) – Number of layers in the RNN.
bidirectional (int) – Whether the RNN is bidirectional.
dropout (float) – Amount of dropout to be used between layers of RNN.
num_sources (int) – Number of sources to create masks for.
num_audio_channels (int) – Number of audio channels in input (e.g. mono or stereo). Defaults to 1.
add_embedding (bool) – Whether or not to add an embedding layer to make this a Chimera network. If True, then embedding_size and embedding_activation will be used to define it.
embedding_size (int) – Embedding dimensionality of the deep clustering network.
embedding_activation (list of str) – Activation of the embedding (‘sigmoid’, ‘softmax’, etc.). See nussl.ml.networks.modules.Embedding.
rnn_type (str, optional) – RNN type, either ‘lstm’ or ‘gru’. Defaults to ‘lstm’.
mix_key (str, optional) – The key to look for in the input dictionary that contains the mixture spectrogram. Defaults to ‘mix_magnitude’.
- Returns
- An OpenUnmix-like configuration that can be passed to
SeparationModel.
- Return type
dict
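As a rough illustration of the two size rules in the parameter list above, the plain-Python sketch below computes num_features from an STFT window length and the per-direction hidden size of a bidirectional RNN. The helper names are hypothetical and are not part of nussl; the sketch also assumes a one-sided STFT whose FFT size equals the window length.

```python
def num_features_for_window(window_length):
    """Frequency bins of a one-sided STFT: window_length // 2 + 1."""
    return window_length // 2 + 1


def per_direction_hidden_size(hidden_size, bidirectional):
    """Halve the hidden size when the RNN is bidirectional, as documented above."""
    return hidden_size // 2 if bidirectional else hidden_size


print(num_features_for_window(1024))         # 513
print(per_direction_hidden_size(300, True))  # 150
```

Working these numbers out in one place before calling a builder helps keep dependent layer sizes consistent.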
-
nussl.ml.networks.builders.
build_recurrent_chimera
(num_features, hidden_size, num_layers, bidirectional, dropout, embedding_size, embedding_activation, num_sources, mask_activation, num_audio_channels=1, rnn_type='lstm', normalization_class='BatchNorm', mix_key='mix_magnitude')[source]¶ Builds a config for a Chimera network that can be passed to SeparationModel. Chimera networks are so-called because they have two “heads” which can be trained via different loss functions. In traditional Chimera, one head is trained using a deep clustering loss while the other is trained with a mask inference loss. This Chimera network uses a recurrent neural network (RNN) to process the input representation.
- Parameters
num_features (int) – Number of features in the input spectrogram (usually means window length of STFT // 2 + 1.)
hidden_size (int) – Hidden size of the RNN.
num_layers (int) – Number of layers in the RNN.
bidirectional (int) – Whether the RNN is bidirectional.
dropout (float) – Amount of dropout to be used between layers of RNN.
embedding_size (int) – Embedding dimensionality of the deep clustering network.
embedding_activation (list of str) – Activation of the embedding (‘sigmoid’, ‘softmax’, etc.). See nussl.ml.networks.modules.Embedding.
num_sources (int) – Number of sources to create masks for.
mask_activation (list of str) – Activation of the mask (‘sigmoid’, ‘softmax’, etc.). See nussl.ml.networks.modules.Embedding.
num_audio_channels (int) – Number of audio channels in input (e.g. mono or stereo). Defaults to 1.
rnn_type (str, optional) – RNN type, either ‘lstm’ or ‘gru’. Defaults to ‘lstm’.
normalization_class (str, optional) – Type of normalization to apply, either ‘InstanceNorm’ or ‘BatchNorm’. Defaults to ‘BatchNorm’.
mix_key (str, optional) – The key to look for in the input dictionary that contains the mixture spectrogram. Defaults to ‘mix_magnitude’.
- Returns
- A recurrent Chimera network configuration that can be passed to
SeparationModel.
- Return type
dict
-
nussl.ml.networks.builders.
build_recurrent_dpcl
(num_features, hidden_size, num_layers, bidirectional, dropout, embedding_size, embedding_activation, num_audio_channels=1, rnn_type='lstm', normalization_class='BatchNorm', mix_key='mix_magnitude')[source]¶ Builds a config for a deep clustering network that can be passed to SeparationModel. This deep clustering network uses a recurrent neural network (RNN) to process the input representation.
- Parameters
num_features (int) – Number of features in the input spectrogram (usually means window length of STFT // 2 + 1.)
hidden_size (int) – Hidden size of the RNN.
num_layers (int) – Number of layers in the RNN.
bidirectional (int) – Whether the RNN is bidirectional.
dropout (float) – Amount of dropout to be used between layers of RNN.
embedding_size (int) – Embedding dimensionality of the deep clustering network.
embedding_activation (list of str) – Activation of the embedding (‘sigmoid’, ‘softmax’, etc.). See nussl.ml.networks.modules.Embedding.
num_audio_channels (int) – Number of audio channels in input (e.g. mono or stereo). Defaults to 1.
rnn_type (str, optional) – RNN type, either ‘lstm’ or ‘gru’. Defaults to ‘lstm’.
normalization_class (str, optional) – Type of normalization to apply, either ‘InstanceNorm’ or ‘BatchNorm’. Defaults to ‘BatchNorm’.
mix_key (str, optional) – The key to look for in the input dictionary that contains the mixture spectrogram. Defaults to ‘mix_magnitude’.
- Returns
- A recurrent deep clustering network configuration that can be passed to
SeparationModel.
- Return type
dict
-
nussl.ml.networks.builders.
build_recurrent_end_to_end
(num_filters, filter_length, hop_length, window_type, hidden_size, num_layers, bidirectional, dropout, num_sources, mask_activation, num_audio_channels=1, mask_complex=False, trainable=False, rnn_type='lstm', mix_key='mix_audio', normalization_class='BatchNorm')[source]¶ Builds a config for a BLSTM-based network that operates on the time-series. Uses an STFT within the network and can apply the mixture phase to the estimate, or can learn a mask on the phase as well as the magnitude.
- Parameters
num_filters (int) – Number of learnable filters in the front end network.
filter_length (int) – Length of the filters.
hop_length (int) – Hop length between frames.
window_type (str) – Type of windowing function to apply to each frame.
hidden_size (int) – Hidden size of the RNN.
num_layers (int) – Number of layers in the RNN.
bidirectional (int) – Whether the RNN is bidirectional.
dropout (float) – Amount of dropout to be used between layers of RNN.
num_sources (int) – Number of sources to create masks for.
mask_activation (list of str) – Activation of the mask (‘sigmoid’, ‘softmax’, etc.). See nussl.ml.networks.modules.Embedding.
num_audio_channels (int) – Number of audio channels in input (e.g. mono or stereo). Defaults to 1.
mask_complex (bool, optional) – Whether to also place a mask on the complex part, or whether to just use the mixture phase.
trainable (bool, optional) – Whether to learn the filters, which start from a Fourier basis.
rnn_type (str, optional) – RNN type, either ‘lstm’ or ‘gru’. Defaults to ‘lstm’.
normalization_class (str, optional) – Type of normalization to apply, either ‘InstanceNorm’ or ‘BatchNorm’. Defaults to ‘BatchNorm’.
mix_key (str, optional) – The key to look for in the input dictionary that contains the mixture audio. Defaults to ‘mix_audio’.
- Returns
- A recurrent end-to-end network configuration that can be passed to
SeparationModel.
- Return type
dict
-
nussl.ml.networks.builders.
build_recurrent_mask_inference
(num_features, hidden_size, num_layers, bidirectional, dropout, num_sources, mask_activation, num_audio_channels=1, rnn_type='lstm', normalization_class='BatchNorm', mix_key='mix_magnitude')[source]¶ Builds a config for a mask inference network that can be passed to SeparationModel. This mask inference network uses a recurrent neural network (RNN) to process the input representation.
- Parameters
num_features (int) – Number of features in the input spectrogram (usually means window length of STFT // 2 + 1.)
hidden_size (int) – Hidden size of the RNN.
num_layers (int) – Number of layers in the RNN.
bidirectional (int) – Whether the RNN is bidirectional.
dropout (float) – Amount of dropout to be used between layers of RNN.
num_sources (int) – Number of sources to create masks for.
mask_activation (list of str) – Activation of the mask (‘sigmoid’, ‘softmax’, etc.). See nussl.ml.networks.modules.Embedding.
num_audio_channels (int) – Number of audio channels in input (e.g. mono or stereo). Defaults to 1.
rnn_type (str, optional) – RNN type, either ‘lstm’ or ‘gru’. Defaults to ‘lstm’.
normalization_class (str, optional) – Type of normalization to apply, either ‘InstanceNorm’ or ‘BatchNorm’. Defaults to ‘BatchNorm’.
mix_key (str, optional) – The key to look for in the input dictionary that contains the mixture spectrogram. Defaults to ‘mix_magnitude’.
- Returns
- A recurrent mask inference network configuration that can be passed to
SeparationModel.
- Return type
dict
Confidence measures¶
There are ways to measure the quality of a separated source without requiring ground truth. These functions operate on the output of clustering-based separation algorithms and work by analyzing the clusterability of the feature space used to generate the separated sources.
Functions
dpcl_classic_confidence – Computes the clusterability in two steps: KMeans clustering, then the classic deep clustering loss.
jensen_shannon_confidence – Calculates the clusterability of a space by comparing a K-cluster GMM with a 1-cluster GMM on the same features.
jensen_shannon_divergence – Compute Jensen-Shannon (JS) divergence between two Gaussian Mixture Models via sampling.
loudness_confidence – Computes the clusterability of the feature space by comparing the absolute size of each cluster.
posterior_confidence – Calculates the clusterability of an embedding space by looking at the strength of the assignments of each point to a specific cluster.
silhouette_confidence – Uses the silhouette score to compute the clusterability of the feature space.
whitened_kmeans_confidence – Computes the clusterability in two steps: KMeans clustering, then the Whitened K-Means loss.
-
nussl.ml.confidence.
dpcl_classic_confidence
(audio_signal, features, num_sources, threshold=95, **kwargs)[source]¶ Computes the clusterability in two steps:
1. Cluster the feature space using KMeans into assignments.
2. Compute the classic deep clustering loss between the features and the assignments.
- Parameters
audio_signal (AudioSignal) – AudioSignal object which will be used to compute the mask over which to compute the confidence measure. This can be None if and only if representation is passed as a keyword argument to this function.
features (np.ndarray) – Numpy array containing the features to be clustered. Should have the same dimensions as the representation.
num_sources (int) – Number of sources to cluster the features into.
threshold (int, optional) – Threshold by loudness. Points below the threshold are excluded from being used in the confidence measure. Defaults to 95.
kwargs – Keyword arguments to _get_loud_bins_mask. Namely, representation can go here as a keyword argument.
- Returns
Confidence given by deep clustering loss.
- Return type
float
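The second step above can be sketched in plain Python. The classic deep clustering objective from Hershey et al. (2016) is the squared Frobenius norm of VVᵀ − YYᵀ, where V holds one embedding per row and Y the one-hot cluster assignments. It expands to ‖VᵀV‖² − 2‖VᵀY‖² + ‖YᵀY‖², which avoids building the large affinity matrices. This is a minimal sketch with hypothetical helper names; any normalization, thresholding, or mapping from loss to confidence that nussl applies is not shown.

```python
def gram(a, b):
    """Return a^T b for two row-major matrices with the same number of rows."""
    return [
        [sum(row_a[i] * row_b[j] for row_a, row_b in zip(a, b))
         for j in range(len(b[0]))]
        for i in range(len(a[0]))
    ]


def frobenius_sq(matrix):
    """Squared Frobenius norm of a row-major matrix."""
    return sum(x * x for row in matrix for x in row)


def deep_clustering_loss(embedding, assignments):
    """||V V^T - Y Y^T||_F^2 computed via the low-rank expansion."""
    vtv = frobenius_sq(gram(embedding, embedding))
    vty = frobenius_sq(gram(embedding, assignments))
    yty = frobenius_sq(gram(assignments, assignments))
    return vtv - 2 * vty + yty


assignments = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
perfect = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
confused = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
print(deep_clustering_loss(perfect, assignments))   # 0.0
print(deep_clustering_loss(confused, assignments))  # 4.0
```

A lower loss means the embeddings agree more closely with the cluster assignments, which is the sense in which the loss reflects clusterability.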
-
nussl.ml.confidence.
jensen_shannon_confidence
(audio_signal, features, num_sources, threshold=95, n_samples=100000, **kwargs)[source]¶ Calculates the clusterability of a space by comparing a K-cluster GMM with a 1-cluster GMM on the same features. This function fits two GMMs to all of the points that are above the specified threshold (defaults to 95, i.e. the 95th percentile of all the data). This saves on computation time and also allows the confidence measure to focus only on the louder, more perceptually important points.
References:
Seetharaman, Prem, Gordon Wichern, Jonathan Le Roux, and Bryan Pardo. “Bootstrapping Single-Channel Source Separation via Unsupervised Spatial Clustering on Stereo Mixtures”. 44th International Conference on Acoustics, Speech, and Signal Processing, Brighton, UK, May, 2019
Seetharaman, Prem. Bootstrapping the Learning Process for Computer Audition. Diss. Northwestern University, 2019.
- Parameters
audio_signal (AudioSignal) – AudioSignal object which will be used to compute the mask over which to compute the confidence measure. This can be None if and only if representation is passed as a keyword argument to this function.
features (np.ndarray) – Numpy array containing the features to be clustered. Should have the same dimensions as the representation.
num_sources (int) – Number of sources to cluster the features into.
threshold (int, optional) – Threshold by loudness. Points below the threshold are excluded from being used in the confidence measure. Defaults to 95.
kwargs – Keyword arguments to _get_loud_bins_mask. Namely, representation can go here as a keyword argument.
- Returns
Confidence given by Jensen-Shannon divergence.
- Return type
float
-
nussl.ml.confidence.
jensen_shannon_divergence
(gmm_p, gmm_q, n_samples=100000)[source]¶ Compute Jensen-Shannon (JS) divergence between two Gaussian Mixture Models via sampling. JS divergence is a symmetrized, smoothed variant of Kullback-Leibler divergence. It has no closed form in general for GMMs, so it is estimated by sampling.
- Parameters
gmm_p (GaussianMixture) – A GaussianMixture class fit to some data.
gmm_q (GaussianMixture) – Another GaussianMixture class fit to some data.
n_samples (int) – Number of samples to use to estimate JS divergence.
- Returns
JS divergence between gmm_p and gmm_q
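The sampling estimate can be sketched as follows. With M = (P + Q) / 2, JS(P, Q) = ½ KL(P‖M) + ½ KL(Q‖M), and each KL term is approximated by averaging log-density differences over samples drawn from P or Q. The sketch below is an assumption-laden stand-in, not nussl's code: it uses hand-written 1-D Gaussian mixtures (tuples of weights, means, and standard deviations) in place of fitted GaussianMixture objects, and all function names are hypothetical.

```python
import math
import random


def mixture_logpdf(x, weights, means, stds):
    """Log density of a 1-D Gaussian mixture at x (log-sum-exp for stability)."""
    terms = [
        math.log(w) - 0.5 * math.log(2 * math.pi * s * s) - (x - m) ** 2 / (2 * s * s)
        for w, m, s in zip(weights, means, stds)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def sample_mixture(rng, weights, means, stds, n):
    """Draw n samples: pick a component by weight, then sample from it."""
    components = rng.choices(range(len(weights)), weights=weights, k=n)
    return [rng.gauss(means[c], stds[c]) for c in components]


def js_divergence(p, q, n_samples=20000, seed=0):
    """Monte Carlo estimate of JS divergence (in nats) between mixtures p and q."""
    rng = random.Random(seed)

    def kl_to_midpoint(a, b):
        # Estimates KL(a || m) with m = (a + b) / 2, using samples from a.
        total = 0.0
        for x in sample_mixture(rng, *a, n_samples):
            log_a = mixture_logpdf(x, *a)
            log_b = mixture_logpdf(x, *b)
            top = max(log_a, log_b)
            log_m = top + math.log(math.exp(log_a - top) + math.exp(log_b - top)) - math.log(2)
            total += log_a - log_m
        return total / n_samples

    return 0.5 * (kl_to_midpoint(p, q) + kl_to_midpoint(q, p))


two_clusters = ([0.5, 0.5], [-3.0, 3.0], [1.0, 1.0])
one_cluster = ([1.0], [0.0], [1.0])
print(round(js_divergence(two_clusters, one_cluster, n_samples=2000), 3))
```

The estimate is bounded above by log 2 and is zero when both mixtures are identical, so a larger value indicates that a K-cluster model describes the data very differently from a single cluster.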
-
nussl.ml.confidence.
loudness_confidence
(audio_signal, features, num_sources, threshold=95, **kwargs)[source]¶ Computes the clusterability of the feature space by comparing the absolute size of each cluster.
References:
Seetharaman, Prem, Gordon Wichern, Jonathan Le Roux, and Bryan Pardo. “Bootstrapping Single-Channel Source Separation via Unsupervised Spatial Clustering on Stereo Mixtures”. 44th International Conference on Acoustics, Speech, and Signal Processing, Brighton, UK, May, 2019
Seetharaman, Prem. Bootstrapping the Learning Process for Computer Audition. Diss. Northwestern University, 2019.
- Parameters
audio_signal (AudioSignal) – AudioSignal object which will be used to compute the mask over which to compute the confidence measure. This can be None if and only if representation is passed as a keyword argument to this function.
features (np.ndarray) – Numpy array containing the features to be clustered. Should have the same dimensions as the representation.
num_sources (int) – Number of sources to cluster the features into.
threshold (int, optional) – Threshold by loudness. Points below the threshold are excluded from being used in the confidence measure. Defaults to 95.
kwargs – Keyword arguments to _get_loud_bins_mask. Namely, representation can go here as a keyword argument.
- Returns
Confidence given by size of smallest cluster.
- Return type
float
-
nussl.ml.confidence.
posterior_confidence
(audio_signal, features, num_sources, threshold=95, **kwargs)[source]¶ Calculates the clusterability of an embedding space by looking at the strength of the assignments of each point to a specific cluster. The more points that are “in between” clusters (e.g. no strong assignment), the lower the clusterability.
References:
Seetharaman, Prem, Gordon Wichern, Jonathan Le Roux, and Bryan Pardo. “Bootstrapping Single-Channel Source Separation via Unsupervised Spatial Clustering on Stereo Mixtures”. 44th International Conference on Acoustics, Speech, and Signal Processing, Brighton, UK, May, 2019
Seetharaman, Prem. Bootstrapping the Learning Process for Computer Audition. Diss. Northwestern University, 2019.
- Parameters
audio_signal (AudioSignal) – AudioSignal object which will be used to compute the mask over which to compute the confidence measure. This can be None if and only if representation is passed as a keyword argument to this function.
features (np.ndarray) – Numpy array containing the features to be clustered. Should have the same dimensions as the representation.
num_sources (int) – Number of sources to cluster the features into.
threshold (int, optional) – Threshold by loudness. Points below the threshold are excluded from being used in the confidence measure. Defaults to 95.
kwargs – Keyword arguments to _get_loud_bins_mask. Namely, representation can go here as a keyword argument.
- Returns
Confidence given by posteriors.
- Return type
float
-
nussl.ml.confidence.
silhouette_confidence
(audio_signal, features, num_sources, threshold=95, max_points=1000, **kwargs)[source]¶ Uses the silhouette score to compute the clusterability of the feature space.
The Silhouette Coefficient is calculated using the mean intra-cluster distance (a) and the mean nearest-cluster distance (b) for each sample. The Silhouette Coefficient for a sample is (b - a) / max(a, b). To clarify, b is the distance between a sample and the nearest cluster that the sample is not a part of. Note that Silhouette Coefficient is only defined if number of labels is 2 <= n_labels <= n_samples - 1.
References:
Seetharaman, Prem. Bootstrapping the Learning Process for Computer Audition. Diss. Northwestern University, 2019.
Peter J. Rousseeuw (1987). “Silhouettes: a Graphical Aid to the Interpretation and Validation of Cluster Analysis”. Journal of Computational and Applied Mathematics 20: 53-65.
- Parameters
audio_signal (AudioSignal) – AudioSignal object which will be used to compute the mask over which to compute the confidence measure. This can be None if and only if representation is passed as a keyword argument to this function.
features (np.ndarray) – Numpy array containing the features to be clustered. Should have the same dimensions as the representation.
num_sources (int) – Number of sources to cluster the features into.
threshold (int, optional) – Threshold by loudness. Points below the threshold are excluded from being used in the confidence measure. Defaults to 95.
kwargs – Keyword arguments to _get_loud_bins_mask. Namely, representation can go here as a keyword argument.
max_points (int, optional) – Maximum number of points to compute the Silhouette score for. Silhouette score is a costly operation. Defaults to 1000.
- Returns
Confidence given by Silhouette score.
- Return type
float
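The Silhouette Coefficient formula quoted in the description, (b - a) / max(a, b), can be sketched directly in plain Python. This is only an illustration of the definition on a handful of 2-D points, with a hypothetical function name; it is not how nussl computes the score, and it omits the loudness threshold and the max_points subsampling.

```python
import math


def silhouette_samples(points, labels):
    """Per-sample silhouette coefficient; requires at least two distinct labels."""
    scores = []
    for i, (point, label) in enumerate(zip(points, labels)):
        # Group distances from this point to every other point by cluster label.
        by_cluster = {}
        for j, (other, other_label) in enumerate(zip(points, labels)):
            if i != j:
                by_cluster.setdefault(other_label, []).append(math.dist(point, other))
        own = by_cluster.get(label)
        if not own:
            scores.append(0.0)  # Convention: a singleton cluster scores 0.
            continue
        a = sum(own) / len(own)  # Mean intra-cluster distance.
        b = min(sum(d) / len(d)  # Mean distance to the nearest other cluster.
                for name, d in by_cluster.items() if name != label)
        scores.append((b - a) / max(a, b))
    return scores


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_samples(points, [0, 0, 1, 1])
bad = silhouette_samples(points, [0, 1, 0, 1])
print(min(good) > 0.8, max(bad) < 0)  # True True
```

Scores near 1 indicate tight, well-separated clusters, while negative scores indicate points that sit closer to another cluster than to their own.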
-
nussl.ml.confidence.
whitened_kmeans_confidence
(audio_signal, features, num_sources, threshold=95, **kwargs)[source]¶ Computes the clusterability in two steps:
1. Cluster the feature space using KMeans into assignments.
2. Compute the Whitened K-Means loss between the features and the assignments.
- Parameters
audio_signal (AudioSignal) – AudioSignal object which will be used to compute the mask over which to compute the confidence measure. This can be None if and only if representation is passed as a keyword argument to this function.
features (np.ndarray) – Numpy array containing the features to be clustered. Should have the same dimensions as the representation.
num_sources (int) – Number of sources to cluster the features into.
threshold (int, optional) – Threshold by loudness. Points below the threshold are excluded from being used in the confidence measure. Defaults to 95.
kwargs – Keyword arguments to _get_loud_bins_mask. Namely, representation can go here as a keyword argument.
- Returns
Confidence given by whitened k-means loss.
- Return type
float
Training¶
Training¶
-
nussl.ml.train.
create_train_and_validation_engines
(train_func, val_func=None, device='cpu')[source]¶ Helper function for creating an ignite Engine object with helpful defaults. This sets up an Engine that has four handlers attached to it:
prepare_batch: before a batch is passed to train_func or val_func, this function runs, moving every item in the batch (which is a dictionary) to the appropriate device (‘cpu’ or ‘cuda’).
book_keeping: sets up some dictionaries that are used for bookkeeping so one can easily track the epoch and iteration losses for both training and validation.
add_to_iter_history: records the iteration, epoch, and past iteration losses into the dictionaries set up by book_keeping.
clear_iter_history: resets the current iteration history of losses after moving the current iteration history into past iteration history.
- Parameters
train_func (func) – Function that provides the closure for training for a single batch.
val_func (func, optional) – Function that provides the closure for validating a single batch. Defaults to None.
device (str, optional) – Device to move tensors to. Defaults to ‘cpu’.
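The bookkeeping handlers described above can be pictured with a small plain-Python stand-in. The class below is hypothetical: it only mirrors the described behavior (record each iteration's losses, then move the current history into the past history and reset it), and the attribute names are assumptions rather than a statement of what nussl stores on the engine.

```python
from collections import defaultdict


class LossBookkeeping:
    """Toy stand-in for the dictionaries set up by the book_keeping handler."""

    def __init__(self):
        self.iter_history = defaultdict(list)       # Losses seen in the current epoch.
        self.past_iter_history = defaultdict(list)  # One list of losses per finished epoch.

    def add_to_iter_history(self, losses):
        """Record the losses returned for a single batch."""
        for name, value in losses.items():
            self.iter_history[name].append(value)

    def clear_iter_history(self):
        """Move the current iteration history into the past history, then reset it."""
        for name, values in self.iter_history.items():
            self.past_iter_history[name].append(values)
        self.iter_history = defaultdict(list)


history = LossBookkeeping()
history.add_to_iter_history({"loss": 0.9})
history.add_to_iter_history({"loss": 0.7})
history.clear_iter_history()
print(dict(history.past_iter_history))  # {'loss': [[0.9, 0.7]]}
```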
-
nussl.ml.train.
add_tensorboard_handler
(tensorboard_folder, engine, every_iteration=False)[source]¶ Every key in engine.state.epoch_history[-1] is logged to TensorBoard.
- Parameters
tensorboard_folder (str) – Where the tensorboard logs should go.
engine (ignite.Engine) – The engine to log.
every_iteration (bool, optional) – Whether to also log the values at every iteration.
-
nussl.ml.train.
cache_dataset
(dataset)[source]¶ Runs through an entire dataset and caches it if nussl.datasets.transforms.Cache is in dataset.transform. If there is no caching, or dataset.cache_populated = True, then this function just iterates through the dataset and does nothing.
This function can also take a torch.utils.data.DataLoader object wrapped around a nussl.datasets.BaseDataset object.
- Parameters
dataset (nussl.datasets.BaseDataset) – Must be a subclass of nussl.datasets.BaseDataset.
-
nussl.ml.train.
add_validate_and_checkpoint
(output_folder, model, optimizer, train_data, trainer, val_data=None, validator=None)[source]¶ This adds the following handler to the trainer:
validate_and_checkpoint: this runs the validator on the validation dataset (val_data) using a defined validation process function val_func. These are optional. If these are not provided, then no validator is run and the model is simply checkpointed. The model is always saved to {output_folder}/checkpoints/latest.model.pth. If the model is also the one with the lowest validation loss, then it is also saved to {output_folder}/checkpoints/best.model.pth. This is attached to Events.EPOCH_COMPLETED on the trainer. After completion, it fires a ValidationEvents.VALIDATION_COMPLETED event.
- Parameters
model (torch.nn.Module) – Model that is being trained (typically a SeparationModel).
optimizer (torch.optim.Optimizer) – Optimizer being used to train.
train_data (BaseDataset) – dataset that is being used to train the model. This is to save additional metadata information alongside the model checkpoint such as the STFTParams, dataset folder, length, list of transforms, etc.
trainer (ignite.Engine) – Engine for trainer
validator (ignite.Engine, optional) – Engine for validation. Defaults to None.
val_data (torch.utils.data.Dataset, optional) – The validation data. Defaults to None.
-
nussl.ml.train.
add_stdout_handler
(trainer, validator=None)[source]¶ This adds the following handler to the trainer engine, and also sets up Timers:
log_epoch_to_stdout: This logs the results of a model after it has trained for a single epoch on both the training and validation set. The output typically looks like this:
EPOCH SUMMARY
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Epoch number: 0010 / 0010
- Training loss: 0.583591
- Validation loss: 0.137209
- Epoch took: 00:00:03
- Time since start: 00:00:32
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Saving to test.
Output @ tests/local/trainer
- Parameters
trainer (ignite.Engine) – Engine for trainer
validator (ignite.Engine, optional) – Engine for validation. Defaults to None.
-
nussl.ml.train.
add_progress_bar_handler
(*engines)[source]¶ Adds a progress bar to each engine. Keeps track of a running average of the loss as well.
Usage:
    tr_engine, val_engine = …
    add_progress_bar_handler(tr_engine, val_engine)
Loss functions¶
Classes
CombinationInvariantLoss – Variant on Permutation Invariant Loss where instead a combination of the sources output by the model are used.
DeepClusteringLoss – Computes the deep clustering loss with weights.
KLDivLoss
L1Loss
MSELoss
PermutationInvariantLoss – Computes the Permutation Invariant Loss (PIT) [1] by permuting the estimated sources and the reference sources.
SISDRLoss – Computes the Scale-Invariant Source-to-Distortion Ratio between a batch of estimated and reference audio signals.
WhitenedKMeansLoss – Computes the whitened K-Means loss with weights.
-
class
nussl.ml.train.loss.
CombinationInvariantLoss
(loss_function)[source]¶ Variant on Permutation Invariant Loss where instead a combination of the sources output by the model are used. This way a model can output more sources than there are in the ground truth. A subset of the output sources will be compared using Permutation Invariant Loss with the ground truth estimates.
Use this when you need to match the estimates to the sources but do not know the order in which your model outputs the estimates, and the model outputs more estimates than there are sources.
Methods
forward(estimates, targets) – Defines the computation performed at every call.
-
DEFAULT_KEYS
= {'estimates': 'estimates', 'source_magnitudes': 'targets'}¶
-
forward
(estimates, targets)[source]¶ Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
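The matching performed by CombinationInvariantLoss can be sketched in plain Python: try every subset of the estimates whose size equals the number of targets, try every ordering of that subset, and keep the assignment with the lowest loss. The function names and the mean-absolute-error loss are assumptions for illustration; the nussl class wraps a loss function and operates on batched tensors.

```python
from itertools import combinations, permutations


def mean_abs_error(estimate, target):
    """Stand-in loss between one estimate and one target."""
    return sum(abs(e - t) for e, t in zip(estimate, target)) / len(target)


def combination_invariant_loss(estimates, targets, loss_fn=mean_abs_error):
    """Return (best loss, chosen estimate indices in target order)."""
    best_loss, best_choice = None, None
    for subset in combinations(range(len(estimates)), len(targets)):
        for order in permutations(subset):
            loss = sum(loss_fn(estimates[i], t) for i, t in zip(order, targets)) / len(targets)
            if best_loss is None or loss < best_loss:
                best_loss, best_choice = loss, order
    return best_loss, best_choice


targets = [[1.0, 1.0], [0.0, 0.0]]
estimates = [[0.5, 0.5], [0.0, 0.1], [1.0, 0.9]]  # One more estimate than targets.
loss, choice = combination_invariant_loss(estimates, targets)
print(choice)  # (2, 1)
```

Here the third estimate is matched to the first target and the second estimate to the second target, while the remaining estimate is ignored.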
-
-
class
nussl.ml.train.loss.
DeepClusteringLoss
[source]¶ Computes the deep clustering loss with weights. Equation (7) in [1].
References:
- [1] Wang, Z. Q., Le Roux, J., & Hershey, J. R. (2018, April). Alternative Objective Functions for Deep Clustering. In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Methods
forward(embedding, assignments, weights) – Defines the computation performed at every call.
-
DEFAULT_KEYS
= {'embedding': 'embedding', 'ideal_binary_mask': 'assignments', 'weights': 'weights'}¶
-
forward
(embedding, assignments, weights)[source]¶ Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
-
class
nussl.ml.train.loss.
KLDivLoss
(size_average=None, reduce=None, reduction='mean')[source]¶
-
DEFAULT_KEYS
= {'estimates': 'input', 'source_magnitudes': 'target'}¶
-
-
class
nussl.ml.train.loss.
L1Loss
(size_average=None, reduce=None, reduction='mean')[source]¶
-
DEFAULT_KEYS
= {'estimates': 'input', 'source_magnitudes': 'target'}¶
-
-
class
nussl.ml.train.loss.
MSELoss
(size_average=None, reduce=None, reduction='mean')[source]¶
-
DEFAULT_KEYS
= {'estimates': 'input', 'source_magnitudes': 'target'}¶
-
-
class
nussl.ml.train.loss.
PermutationInvariantLoss
(loss_function)[source]¶ Computes the Permutation Invariant Loss (PIT) [1] by permuting the estimated sources and the reference sources. Takes the best permutation and only backprops the loss from that.
For when you’re trying to match the estimates to the sources but you don’t know the order in which your model outputs the estimates.
References:
- [1] Yu, Dong, Morten Kolbæk, Zheng-Hua Tan, and Jesper Jensen. “Permutation invariant training of deep models for speaker-independent multi-talker speech separation.” In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 241-245. IEEE, 2017.
Methods
forward(estimates, targets) – Defines the computation performed at every call.
-
DEFAULT_KEYS
= {'estimates': 'estimates', 'source_magnitudes': 'targets'}¶
-
forward
(estimates, targets)[source]¶ Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
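The permutation search described for PermutationInvariantLoss can be illustrated with a minimal pure-Python sketch. It is not nussl's implementation, which operates on batched PyTorch tensors; the helper names `l1_loss` and `permutation_invariant_loss` are hypothetical and chosen only for this example.

```python
from itertools import permutations


def l1_loss(estimate, target):
    """Mean absolute error between two equal-length sequences (stand-in loss)."""
    return sum(abs(e - t) for e, t in zip(estimate, target)) / len(estimate)


def permutation_invariant_loss(estimates, targets, loss_function=l1_loss):
    """Return (best_loss, best_permutation) over all orderings of the estimates.

    estimates, targets: lists of sources, each source a list of floats.
    best_permutation[i] is the index of the estimate matched to targets[i].
    """
    num_sources = len(targets)
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(num_sources)):
        # Average the per-source loss for this assignment of estimates to targets.
        loss = sum(
            loss_function(estimates[p], targets[i]) for i, p in enumerate(perm)
        ) / num_sources
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# The (hypothetical) model emitted the two sources in swapped order.
targets = [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]
estimates = [[0.1, 0.0, 0.1], [0.9, 1.0, 1.0]]
pit_loss, pit_perm = permutation_invariant_loss(estimates, targets)
```

Only the loss of the winning permutation is returned, which mirrors the "only backprops the loss from that" behavior described above. The number of permutations grows factorially with the number of sources, so this brute-force search is only practical for a handful of sources.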
-
class
nussl.ml.train.loss.
SISDRLoss
(scaling=True, return_scaling=False, reduction='mean', zero_mean=True)[source]¶ Computes the Scale-Invariant Source-to-Distortion Ratio between a batch of estimated and reference audio signals. Used in end-to-end networks. This is essentially a batch PyTorch version of the function
nussl.evaluation.bss_eval.scale_bss_eval
and can be used to compute SI-SDR or SNR.
- Parameters
scaling (bool, optional) – Whether to compute the scale-invariant SDR (True) or the plain signal-to-noise ratio (False). Defaults to True.
return_scaling (bool, optional) – Whether to return only the scaling factor that the estimate gets scaled by relative to the reference. This is only for monitoring the value during training; do not train with it. Defaults to False.
reduction (str, optional) – How to reduce across the batch (either ‘mean’, ‘sum’, or none). Defaults to ‘mean’.
zero_mean (bool, optional) – Zero mean the references and estimates before computing the loss. Defaults to True.
Attributes
dict() -> new empty dictionary
Methods
forward
(estimates, references)Defines the computation performed at every call.
-
DEFAULT_KEYS
= {'audio': 'estimates', 'source_audio': 'references'}¶
-
forward
(estimates, references)[source]¶ Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
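As a rough illustration of what the `scaling` and `zero_mean` options control, here is a single-signal, pure-Python sketch of the SI-SDR computation. It is an assumption-laden simplification: the function name `si_sdr` is hypothetical, it handles one pair of 1-D signals rather than a batch, and it returns the ratio in dB without the sign flip a loss used for minimization would typically apply.

```python
import math


def si_sdr(estimate, reference, scaling=True, zero_mean=True):
    """SI-SDR in dB (plain SNR when scaling=False) for one pair of 1-D signals."""
    if zero_mean:
        est_mean = sum(estimate) / len(estimate)
        ref_mean = sum(reference) / len(reference)
        estimate = [e - est_mean for e in estimate]
        reference = [r - ref_mean for r in reference]

    if scaling:
        # Project the estimate onto the reference to find the optimal scale.
        dot = sum(e * r for e, r in zip(estimate, reference))
        ref_energy = sum(r * r for r in reference)
        scale = dot / ref_energy
    else:
        scale = 1.0

    target = [scale * r for r in reference]
    residual = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(x * x for x in residual)
    return 10 * math.log10(target_energy / residual_energy)


reference = [1.0, -1.0, 2.0, -2.0]
estimate = [1.1, -0.9, 2.2, -1.8]
quiet_estimate = [0.5 * e for e in estimate]  # same estimate at half the gain

si_sdr_full = si_sdr(estimate, reference)
si_sdr_quiet = si_sdr(quiet_estimate, reference)
snr_full = si_sdr(estimate, reference, scaling=False)
snr_quiet = si_sdr(quiet_estimate, reference, scaling=False)
```

With `scaling=True` the two estimates score the same because the gain difference is absorbed by the scale factor. With `scaling=False` the quieter estimate is penalized for its level mismatch.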
-
class
nussl.ml.train.loss.
WhitenedKMeansLoss
[source]¶ Computes the whitened K-Means loss with weights. Equation (6) in [1].
References:
- [1] Wang, Z. Q., Le Roux, J., & Hershey, J. R. (2018, April).
Alternative Objective Functions for Deep Clustering. In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Attributes
dict() -> new empty dictionary
Methods
forward
(embedding, assignments, weights)Defines the computation performed at every call.
-
DEFAULT_KEYS
= {'embedding': 'embedding', 'ideal_binary_mask': 'assignments', 'weights': 'weights'}¶
-
forward
(embedding, assignments, weights)[source]¶ Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
Closures¶
Classes
Closure | Closures are used with ignite Engines to train a model given an optimizer and a set of loss functions.
TrainClosure | This closure takes an optimization step on a SeparationModel object given a loss.
ValidationClosure | This closure validates the model on some data dictionary.
Exceptions
ClosureException | Exception class for errors when working with closures in nussl.
-
class
nussl.ml.train.closures.
Closure
(loss_dictionary, combination_approach='combine_by_sum', *args, **kwargs)[source]¶ Closures are used with ignite Engines to train a model given an optimizer and a set of loss functions. Closures perform forward passes of models given the input data. The loss is computed via
self.compute_loss
. The forward pass is implemented via the object's
__call__
function.
This closure object provides a way to define the loss functions you want to use to train your model as a loss dictionary that is structured as follows:
loss_dictionary = {
    'LossClassName': {
        'weight': [how much to weight the loss in the sum, defaults to 1],
        'keys': [key mapping items in dictionary to arguments to loss],
        'args': [any positional arguments to the loss class],
        'kwargs': [keyword arguments to the loss class],
    }
}
Methods
combine_by_multiply
(loss_output)
combine_by_multitask
(loss_output) Implements a multitask learning objective [1] where each loss is weighted by a learned parameter.
combine_by_sum
(loss_output)
compute_loss
(output, target)
The keys value will default to
LossClassName.DEFAULT_KEYS
, which can be found in
nussl.ml.train.loss
within each available class. Here’s an example of a Chimera loss combining deep clustering with permutation invariant L1 loss:
loss_dictionary = {
    'DeepClusteringLoss': {
        'weight': .2,
    },
    'PermutationInvariantLoss': {
        'weight': .8,
        'args': ['L1Loss']
    }
}
Or if you’re using permutation invariant loss but need to specify arguments to the loss function being wrapped by PIT, you can do this:
loss_dictionary = {
    'PITLoss': {
        'class': 'PermutationInvariantLoss',
        'keys': {'audio': 'estimates', 'source_audio': 'targets'},
        'args': [{
            'class': 'SISDRLoss',
            'kwargs': {'scaling': False}
        }]
    }
}
If you have your own loss function classes you wish to use, you can pass those into the loss dictionary and make them discoverable by the closure by using ml.register_loss.
- Parameters
loss_dictionary (dict) – Dictionary of losses described above.
combination_approach (str) – How to combine losses, if there are multiple losses. The default is that the losses will be combined via a weighted sum (‘combine_by_sum’). Can also do ‘combine_by_multiply’. Defaults to ‘combine_by_sum’.
args – Positional arguments to
combination_approach
.
kwargs – Keyword arguments to
combination_approach
.
See also
ml.register_loss to register your loss functions with this closure.
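To make the roles of `weight`, `keys`, `args`, and `kwargs` concrete, the following is a simplified sketch of how a loss dictionary could be resolved and combined by weighted sum. It is not the Closure implementation: the stand-in `L1Loss` and `MSELoss` classes, the `LOSSES` registry, the `compute_weighted_sum` helper, and the `'loss'` output key are all assumptions made for illustration, and nested loss specifications inside `args` are not handled.

```python
# Hypothetical stand-in losses; the real ones live in nussl.ml.train.loss.
class L1Loss:
    DEFAULT_KEYS = {'estimates': 'input', 'source_magnitudes': 'target'}

    def __call__(self, input, target):
        return sum(abs(i - t) for i, t in zip(input, target)) / len(input)


class MSELoss:
    DEFAULT_KEYS = {'estimates': 'input', 'source_magnitudes': 'target'}

    def __call__(self, input, target):
        return sum((i - t) ** 2 for i, t in zip(input, target)) / len(input)


# Hypothetical registry mapping class names to loss classes.
LOSSES = {'L1Loss': L1Loss, 'MSELoss': MSELoss}


def compute_weighted_sum(loss_dictionary, data):
    """Evaluate each configured loss on `data` and combine by weighted sum."""
    output = {}
    total = 0.0
    for name, spec in loss_dictionary.items():
        # 'class' lets an entry use a custom name; otherwise the key is the class.
        loss_class = LOSSES[spec.get('class', name)]
        loss_fn = loss_class(*spec.get('args', []), **spec.get('kwargs', {}))
        keys = spec.get('keys', loss_class.DEFAULT_KEYS)
        # Map entries of the data dictionary onto the loss's argument names.
        loss_kwargs = {arg: data[key] for key, arg in keys.items()}
        output[name] = loss_fn(**loss_kwargs)
        total += spec.get('weight', 1) * output[name]
    output['loss'] = total
    return output


loss_dictionary = {'L1Loss': {'weight': 0.25}, 'MSELoss': {'weight': 0.75}}
data = {'estimates': [1.0, 2.0, 5.0], 'source_magnitudes': [1.0, 2.0, 3.0]}
result = compute_weighted_sum(loss_dictionary, data)
```

Because neither entry supplies `keys`, both fall back to their `DEFAULT_KEYS`, so `data['estimates']` is passed as `input` and `data['source_magnitudes']` as `target`.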
-
combine_by_multitask
(loss_output)[source]¶ Implements a multitask learning objective [1] where each loss is weighted by a learned parameter with the following function:
combined_loss = sum_i exp(-weight_i) * loss_i + weight_i
where i indexes each loss. The weights come from the loss dictionary and can point to nn.Parameter tensors that get learned jointly with the model.
References:
- [1] Kendall, Alex, Yarin Gal, and Roberto Cipolla.
“Multi-task learning using uncertainty to weigh losses for scene geometry and semantics.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
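The combination formula can be worked through numerically with a small sketch. It assumes the sum runs over both terms, that is, sum_i (exp(-weight_i) * loss_i + weight_i), and it uses plain floats where the real method would use tensors and learned nn.Parameter weights. The function name `multitask_combine` is hypothetical.

```python
import math


def multitask_combine(losses, weights):
    """Combine losses as sum_i (exp(-weight_i) * loss_i + weight_i)."""
    return sum(math.exp(-w) * loss + w for loss, w in zip(losses, weights))


losses = [2.0, 0.5]

# With all weights at zero, every loss is scaled by exp(0) = 1.
combined_unweighted = multitask_combine(losses, [0.0, 0.0])

# Setting weight_i = log(loss_i) minimizes each term with respect to weight_i.
optimal_weights = [math.log(loss) for loss in losses]
combined_optimal = multitask_combine(losses, optimal_weights)
```

Setting the derivative of exp(-w) * loss + w with respect to w to zero gives w = log(loss), so a larger loss is driven toward a larger weight and a smaller effective scale exp(-w). The additive weight term keeps the weights from growing without bound.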
-
exception
nussl.ml.train.closures.
ClosureException
[source]¶ Exception class for errors when working with closures in nussl.
-
class
nussl.ml.train.closures.
TrainClosure
(loss_dictionary, optimizer, model, *args, **kwargs)[source]¶ This closure takes an optimization step on a SeparationModel object given a loss.
- Parameters
loss_dictionary (dict) – Dictionary containing loss functions and specification.
optimizer (torch Optimizer) – Optimizer to use to train the model.
model (SeparationModel) – The model to be trained.
-
class
nussl.ml.train.closures.
ValidationClosure
(loss_dictionary, model, *args, **kwargs)[source]¶ This closure validates the model on some data dictionary.
- Parameters
loss_dictionary (dict) – Dictionary containing loss functions and specification.
model (SeparationModel) – The model to be validated.