AudioSignal Basics

The nussl.AudioSignal object is the main container for all things related to your audio data. It provides a lot of helpful utilities to make it easy to manipulate your audio. Because it is at the heart of all of the source separation algorithms in nussl, it is crucial to understand how it works. Here we provide a brief introduction to many common tasks.

Initialization from a file

It is easy to initialize an AudioSignal object by loading an audio file from a path. First, let’s use the external file zoo to get an audio file to play around with.

[1]:
import nussl
import time
start_time = time.time()
[2]:
input_file_path = nussl.efz_utils.download_audio_file(
    'schoolboy_fascination_excerpt.wav')
Matching file found at /home/pseetharaman/.nussl/audio/schoolboy_fascination_excerpt.wav, skipping download.

Now let’s initialize an AudioSignal object with the audio.

[3]:
signal1 = nussl.AudioSignal(input_file_path)

Now the AudioSignal object is ready with all of the information about the signal. Let’s also embed the audio signal in a playable object right inside this notebook so we can listen to it! We can also look at its attributes by printing it.

[4]:
signal1.embed_audio()
print(signal1)
AudioSignal (unlabeled): 15.000 sec @ /home/pseetharaman/.nussl/audio/schoolboy_fascination_excerpt.wav, 44100 Hz, 2 ch.

AudioSignals pack in a lot of useful functionality. For example:

[5]:
print("Duration: {} seconds".format(signal1.signal_duration))
print("Duration in samples: {} samples".format(signal1.signal_length))
print("Number of channels: {} channels".format(signal1.num_channels))
print("File name: {}".format(signal1.file_name))
print("Full path to input: {}".format(signal1.path_to_input_file))
print("Root mean square energy: {:.4f}".format(signal1.rms().mean()))
Duration: 15.0 seconds
Duration in samples: 661500 samples
Number of channels: 2 channels
File name: schoolboy_fascination_excerpt.wav
Full path to input: /home/pseetharaman/.nussl/audio/schoolboy_fascination_excerpt.wav
Root mean square energy: 0.1136

The actual signal data is in signal1.audio_data. It’s just a numpy array, so we can use it as such:

[6]:
signal1.audio_data
[6]:
array([[ 0.00213623, -0.04547119, -0.0513916 , ..., -0.24395752,
        -0.2310791 , -0.20785522],
       [-0.1791687 , -0.20150757, -0.20574951, ..., -0.23834229,
        -0.2156372 , -0.168396  ]], dtype=float32)
[7]:
signal1.audio_data.shape
[7]:
(2, 661500)

A few things to note here:

  1. When AudioSignal loads a file, it converts the data to floats in the range [-1, 1].

  2. The number of channels is the first dimension, the number of samples is the second.
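
To make these two points concrete, here is a minimal numpy-only sketch of what such a conversion could look like. It is not nussl’s actual loading code: the 16-bit PCM input, the divisor, and the variable names are all assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical 16-bit PCM data as it might come off disk:
# interleaved frames with shape (num_samples, num_channels).
pcm = np.array([[0, -32768],
                [16384, 32767],
                [-16384, 0]], dtype=np.int16)

# Scale by the int16 full-scale value so every sample lands in [-1, 1].
as_float = pcm.astype(np.float32) / 32768.0

# Transpose to the channels-first layout described above:
# (num_channels, num_samples).
channels_first = as_float.T

print(channels_first.shape)
print(channels_first.min(), channels_first.max())
```

The transpose at the end is the part worth remembering: many audio libraries hand back samples-first arrays, while audio_data is channels-first.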

Initialization from a numpy array

Another common way to initialize an AudioSignal object is by passing in a numpy array. Let’s first make a single channel signal within a numpy array.

[8]:
import numpy as np

sample_rate = 44100  # Hz
dt = 1.0 / sample_rate
dur = 2.0  # seconds
freq = 440  # Hz
x = np.arange(0.0, dur, dt)
x = np.sin(2 * np.pi * freq * x)

Cool! Now let’s put this into a new AudioSignal object.

[9]:
signal2 = nussl.AudioSignal(
    audio_data_array=x, sample_rate=sample_rate)
signal2.embed_audio()
print(signal2)
AudioSignal (unlabeled): 2.000 sec @ path unknown, 44100 Hz, 1 ch.

Note that we passed a sample rate along with the array. If no sample rate is given, the following default is used:

[10]:
print(f"Default sample rate: {nussl.constants.DEFAULT_SAMPLE_RATE}")
Default sample rate: 44100
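
The sample rate matters because a bare array carries no notion of time. As a small worked example (plain arithmetic, not a nussl call), the duration reported earlier follows directly from the sample count and the rate; the 22050 Hz figure below is a hypothetical alternative used only for comparison.

```python
num_samples = 661500  # length of the excerpt loaded earlier
sample_rate = 44100   # Hz

# Duration in seconds is the sample count divided by the sample rate.
duration = num_samples / sample_rate
print(duration)  # 15.0

# The same samples interpreted at a different (hypothetical) rate
# would play back for a different length of time.
half_rate_duration = num_samples / 22050
print(half_rate_duration)  # 30.0
```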

Other basic manipulations

If we want to add the audio data in these two signals, it’s simple. But there are some gotchas:

[11]:
signal3 = signal1 + signal2
---------------------------------------------------------------------------
AudioSignalException                      Traceback (most recent call last)
<ipython-input-11-2652ece2ee15> in <module>
----> 1 signal3 = signal1 + signal2

~/Dropbox/research/nussl_refactor/nussl/core/audio_signal.py in __add__(self, other)
   1611
   1612     def __add__(self, other):
-> 1613         return self.add(other)
   1614
   1615     def __radd__(self, other):

~/Dropbox/research/nussl_refactor/nussl/core/audio_signal.py in add(self, other)
   1240             return self
   1241
-> 1242         self._verify_audio_arithmetic(other)
   1243
   1244         new_signal = copy.deepcopy(self)

~/Dropbox/research/nussl_refactor/nussl/core/audio_signal.py in _verify_audio_arithmetic(self, other)
   1629
   1630     def _verify_audio_arithmetic(self, other):
-> 1631         self._verify_audio(other)
   1632
   1633         if self.signal_length != other.signal_length:

~/Dropbox/research/nussl_refactor/nussl/core/audio_signal.py in _verify_audio(self, other)
   1621     def _verify_audio(self, other):
   1622         if self.num_channels != other.num_channels:
-> 1623             raise AudioSignalException('Cannot do operation with two signals that have '
   1624                                        'a different number of channels!')
   1625

AudioSignalException: Cannot do operation with two signals that have a different number of channels!

Uh oh! It doesn’t make sense to add a stereo signal (signal1) and a mono signal (signal2). But if we really want to add these two signals, we can make one of them mono.

nussl does this by simply averaging the two channels at every sample. We have to explicitly tell nussl that we are okay with to_mono() changing audio_data. We do that like this:

[12]:
print(signal1.to_mono(overwrite=True))
AudioSignal (unlabeled): 15.000 sec @ /home/pseetharaman/.nussl/audio/schoolboy_fascination_excerpt.wav, 44100 Hz, 1 ch.
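
The channel averaging described above can be sketched in plain numpy. This is an illustration of the idea under the channels-first layout, not nussl’s own implementation; the tiny stereo array is made up.

```python
import numpy as np

# A tiny hypothetical stereo signal: shape (2 channels, 4 samples).
stereo = np.array([[0.2, 0.4, -0.6, 1.0],
                   [0.0, 0.0, 0.2, -1.0]])

# Average across the channel axis; keepdims retains the
# (1, num_samples) shape so the result is still channels-first.
mono = stereo.mean(axis=0, keepdims=True)

print(mono.shape)  # (1, 4)
```

Note that the last sample averages to zero: content that is equal and opposite in the two channels cancels out when downmixing this way.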

If we hadn’t set overwrite=True then to_mono() would just return a new audio signal that is an exact copy of signal1 except it is mono. You will see this pattern come up again. In certain places, AudioSignal’s default behavior is to overwrite its internal data, and in other places the default is to not overwrite data. See the reference pages for more info. Let’s try adding the signals again:

[13]:
signal3 = signal1 + signal2
---------------------------------------------------------------------------
AudioSignalException                      Traceback (most recent call last)
<ipython-input-13-2652ece2ee15> in <module>
----> 1 signal3 = signal1 + signal2

~/Dropbox/research/nussl_refactor/nussl/core/audio_signal.py in __add__(self, other)
   1611
   1612     def __add__(self, other):
-> 1613         return self.add(other)
   1614
   1615     def __radd__(self, other):

~/Dropbox/research/nussl_refactor/nussl/core/audio_signal.py in add(self, other)
   1240             return self
   1241
-> 1242         self._verify_audio_arithmetic(other)
   1243
   1244         new_signal = copy.deepcopy(self)

~/Dropbox/research/nussl_refactor/nussl/core/audio_signal.py in _verify_audio_arithmetic(self, other)
   1632
   1633         if self.signal_length != other.signal_length:
-> 1634             raise AudioSignalException('Cannot do arithmetic with signals of different length!')
   1635
   1636     def __iadd__(self, other):

AudioSignalException: Cannot do arithmetic with signals of different length!

Uh oh! Let’s fix this by truncating the longer signal1 to match the duration of signal2, in seconds.

[14]:
signal1.truncate_seconds(signal2.signal_duration)
print(signal1)
AudioSignal (unlabeled): 2.000 sec @ /home/pseetharaman/.nussl/audio/schoolboy_fascination_excerpt.wav, 44100 Hz, 1 ch.
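
Truncating by seconds amounts to converting a duration into a sample count and slicing along the sample axis. The following is a rough numpy sketch of that idea, with an all-zeros array standing in for the real audio; how nussl rounds the sample count internally is not shown here and may differ.

```python
import numpy as np

sample_rate = 44100  # Hz
# Stand-in for a 15-second mono signal, channels-first.
long_signal = np.zeros((1, 15 * sample_rate))
target_seconds = 2.0

# Convert the duration to a sample count, then slice the sample axis.
num_keep = int(target_seconds * sample_rate)
truncated = long_signal[:, :num_keep]

print(truncated.shape)  # (1, 88200)
```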

Now we can finally add them. The sum of these two signals clips, so let’s also peak normalize the audio data.

[15]:
signal3 = signal1 + signal2
signal3.peak_normalize()
signal3.embed_audio()
print(signal3)
AudioSignal (unlabeled): 2.000 sec @ /home/pseetharaman/.nussl/audio/schoolboy_fascination_excerpt.wav, 44100 Hz, 1 ch.

No exceptions this time! Great! signal3 is now a new AudioSignal object. We can similarly subtract two signals.
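
To see why the sum clips and what peak normalization does about it, here is a small numpy sketch. It uses one common definition of peak normalization (divide by the largest absolute sample); nussl’s exact implementation may differ, and the sample values are made up.

```python
import numpy as np

# Two hypothetical mono signals, each within [-1, 1] on its own.
a = np.array([[0.9, -0.5, 0.3]])
b = np.array([[0.6, -0.7, 0.1]])

# Sample-wise sum: the first two samples now fall outside [-1, 1].
mixed = a + b

# Rescale so the largest absolute sample becomes 1.
peak = np.abs(mixed).max()
# Guard against dividing by zero for an all-silent signal.
normalized = mixed / peak if peak > 0 else mixed

print(np.abs(mixed).max() > 1.0)  # True
print(np.abs(normalized).max())
```

Because every sample is scaled by the same factor, the relative balance between the two sources is preserved; only the overall level changes.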

Let’s write this to a file:

[16]:
signal3.write_audio_to_file('/tmp/signal3.wav')
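
For a sense of what writing float audio to a WAV file involves, here is a sketch using numpy and Python’s standard wave module. It is not how nussl writes files; the 16-bit format, the scaling factor, and the output file name are assumptions made for this example.

```python
import os
import tempfile
import wave

import numpy as np

sample_rate = 44100  # Hz
t = np.arange(sample_rate) / sample_rate            # one second of time
audio = 0.5 * np.sin(2 * np.pi * 440 * t)[None, :]  # shape (1, num_samples)

# Clip to [-1, 1], scale to 16-bit integers, and switch to the
# interleaved (num_samples, num_channels) layout WAV files expect.
pcm = (np.clip(audio, -1.0, 1.0) * 32767).astype('<i2').T

out_path = os.path.join(tempfile.gettempdir(), 'sketch_signal.wav')
with wave.open(out_path, 'wb') as wav_file:
    wav_file.setnchannels(pcm.shape[1])
    wav_file.setsampwidth(2)  # 2 bytes = 16 bits per sample
    wav_file.setframerate(sample_rate)
    wav_file.writeframes(pcm.tobytes())
```

This is the reverse of the float conversion done at load time: samples are clipped, scaled back to integers, and transposed from channels-first to interleaved frames.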

Awesome! Now let’s see how we can manipulate the audio in the frequency domain…

[17]:
end_time = time.time()
time_taken = end_time - start_time
print(f'Time taken: {time_taken:.4f} seconds')
Time taken: 1.3912 seconds