AudioSignal Basics
The nussl.AudioSignal object is the main container for all things related to your audio data. It provides a lot of helpful utilities to make it easy to manipulate your audio. Because it is at the heart of all of the source separation algorithms in nussl, it is crucial to understand how it works. Here we provide a brief introduction to many common tasks.
Initialization from a file
It is easy to initialize an AudioSignal object by loading an audio file from a path. First, let’s use the external file zoo to get an audio file to play around with.
[1]:
import nussl
import time
start_time = time.time()
[2]:
input_file_path = nussl.efz_utils.download_audio_file(
    'schoolboy_fascination_excerpt.wav')
Matching file found at /home/pseetharaman/.nussl/audio/schoolboy_fascination_excerpt.wav, skipping download.
Now let’s initialize an AudioSignal object with the audio.
[3]:
signal1 = nussl.AudioSignal(input_file_path)
Now the AudioSignal object is ready with all of the information about the signal. Let’s also embed the audio signal in a playable object right inside this notebook so we can listen to it! We can also look at its attributes by printing it.
[4]:
signal1.embed_audio()
print(signal1)
AudioSignal (unlabeled): 15.000 sec @ /home/pseetharaman/.nussl/audio/schoolboy_fascination_excerpt.wav, 44100 Hz, 2 ch.
AudioSignals pack in a lot of useful functionality. For example:
[5]:
print("Duration: {} seconds".format(signal1.signal_duration))
print("Duration in samples: {} samples".format(signal1.signal_length))
print("Number of channels: {} channels".format(signal1.num_channels))
print("File name: {}".format(signal1.file_name))
print("Full path to input: {}".format(signal1.path_to_input_file))
print("Root mean square energy: {:.4f}".format(signal1.rms().mean()))
Duration: 15.0 seconds
Duration in samples: 661500 samples
Number of channels: 2 channels
File name: schoolboy_fascination_excerpt.wav
Full path to input: /home/pseetharaman/.nussl/audio/schoolboy_fascination_excerpt.wav
Root mean square energy: 0.1136
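The numbers above are related in a simple way: the duration in seconds is the sample count divided by the sample rate. The sketch below checks that arithmetic and shows a plain per-channel RMS in numpy. The helper `rms_per_channel` is a hypothetical name, not part of nussl, and nussl’s own rms() may be computed differently (for example over windows), so treat this as an illustration of the idea rather than a reproduction of the value printed above.

```python
import numpy as np

sample_rate = 44100      # Hz, as reported for signal1
signal_length = 661500   # samples, as reported for signal1

# Duration in seconds is the sample count divided by the sample rate.
signal_duration = signal_length / sample_rate
print(signal_duration)


def rms_per_channel(audio_data):
    """Plain RMS over the sample axis of a (channels, samples) array.

    Hypothetical helper for illustration; not nussl's implementation.
    """
    return np.sqrt(np.mean(np.square(audio_data), axis=-1))


# Stand-in data: a full-scale 440 Hz sine, whose RMS is 1/sqrt(2).
t = np.arange(2 * sample_rate) / sample_rate
sine = np.sin(2 * np.pi * 440 * t)[np.newaxis, :]
print(rms_per_channel(sine))
```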
The actual signal data is in signal1.audio_data. It’s just a numpy array, so we can use it as such:
[6]:
signal1.audio_data
[6]:
array([[ 0.00213623, -0.04547119, -0.0513916 , ..., -0.24395752,
-0.2310791 , -0.20785522],
[-0.1791687 , -0.20150757, -0.20574951, ..., -0.23834229,
-0.2156372 , -0.168396 ]], dtype=float32)
[7]:
signal1.audio_data.shape
[7]:
(2, 661500)
A few things to note here:
When AudioSignal loads a file, it converts the data to floats in the range [-1, 1].
The number of channels is the first dimension; the number of samples is the second.
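To make these two notes concrete, here is a minimal numpy sketch of the kind of conversion involved. It assumes 16-bit PCM input stored as (samples, channels) and a divide-by-32768 scaling convention; the sample values are made up, and the actual loading code in nussl may scale or arrange the data differently.

```python
import numpy as np

# Hypothetical 16-bit PCM samples, laid out as (num_samples, num_channels),
# which is how interleaved audio is commonly read from a file.
pcm = np.array([[0, -32768],
                [16384, 32767],
                [-16384, 0]], dtype=np.int16)

# Scale the integers to floats in [-1, 1) and transpose so that channels
# come first: (num_channels, num_samples).
audio_data = (pcm.astype(np.float32) / 32768.0).T

print(audio_data.shape)
print(audio_data.min(), audio_data.max())
```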
Initialization from a numpy array
Another common way to initialize an AudioSignal object is by passing in a numpy array. Let’s first make a single channel signal within a numpy array.
[8]:
import numpy as np
sample_rate = 44100 # Hz
dt = 1.0 / sample_rate
dur = 2.0 # seconds
freq = 440 # Hz
x = np.arange(0.0, dur, dt)
x = np.sin(2 * np.pi * freq * x)
Cool! Now let’s put this into a new AudioSignal object.
[9]:
signal2 = nussl.AudioSignal(
    audio_data_array=x, sample_rate=sample_rate)
signal2.embed_audio()
print(signal2)
AudioSignal (unlabeled): 2.000 sec @ path unknown, 44100 Hz, 1 ch.
Note that we passed in a sample rate explicitly. If no sample rate is given, the following default is used:
[10]:
print(f"Default sample rate: {nussl.constants.DEFAULT_SAMPLE_RATE}")
Default sample rate: 44100
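The example above used a one-dimensional, single-channel array. As a sketch of how a multichannel array could be prepared, the snippet below stacks two sine waves in the channels-first layout noted earlier. The 660 Hz tone and the variable names are illustrative choices. Passing `stereo` as audio_data_array, as in the cell above, would be expected to give a two-channel signal, but that step is left out here so the snippet runs with numpy alone.

```python
import numpy as np

sample_rate = 44100  # Hz
dur = 2.0            # seconds

# Build the time axis from an integer sample count to avoid
# floating-point surprises in the array length.
num_samples = int(dur * sample_rate)
t = np.arange(num_samples) / sample_rate

left = np.sin(2 * np.pi * 440 * t)   # 440 Hz tone
right = np.sin(2 * np.pi * 660 * t)  # 660 Hz tone (illustrative choice)

# Channels first: one row per channel, one column per sample.
stereo = np.stack([left, right])     # shape (2, num_samples)
mono = left[np.newaxis, :]           # shape (1, num_samples)

print(stereo.shape, mono.shape)
```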
Other basic manipulations
If we want to add the audio data in these two signals, it’s simple. But there are some gotchas:
[11]:
signal3 = signal1 + signal2
---------------------------------------------------------------------------
AudioSignalException Traceback (most recent call last)
<ipython-input-11-2652ece2ee15> in <module>
----> 1 signal3 = signal1 + signal2
~/Dropbox/research/nussl_refactor/nussl/core/audio_signal.py in __add__(self, other)
1611
1612 def __add__(self, other):
-> 1613 return self.add(other)
1614
1615 def __radd__(self, other):
~/Dropbox/research/nussl_refactor/nussl/core/audio_signal.py in add(self, other)
1240 return self
1241
-> 1242 self._verify_audio_arithmetic(other)
1243
1244 new_signal = copy.deepcopy(self)
~/Dropbox/research/nussl_refactor/nussl/core/audio_signal.py in _verify_audio_arithmetic(self, other)
1629
1630 def _verify_audio_arithmetic(self, other):
-> 1631 self._verify_audio(other)
1632
1633 if self.signal_length != other.signal_length:
~/Dropbox/research/nussl_refactor/nussl/core/audio_signal.py in _verify_audio(self, other)
1621 def _verify_audio(self, other):
1622 if self.num_channels != other.num_channels:
-> 1623 raise AudioSignalException('Cannot do operation with two signals that have '
1624 'a different number of channels!')
1625
AudioSignalException: Cannot do operation with two signals that have a different number of channels!
Uh oh! It doesn’t make sense to add a stereo signal (signal1) and a mono signal (signal2). But if we really want to add these two signals, we can make one of them mono.
nussl does this by simply averaging the two channels at every sample. We have to explicitly tell nussl that we are okay with to_mono() changing audio_data. We do that like this:
[12]:
print(signal1.to_mono(overwrite=True))
AudioSignal (unlabeled): 15.000 sec @ /home/pseetharaman/.nussl/audio/schoolboy_fascination_excerpt.wav, 44100 Hz, 1 ch.
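The channel averaging described above can be sketched in plain numpy. The sample values here are made up for illustration, and this shows only the arithmetic, not nussl’s own code.

```python
import numpy as np

# A tiny made-up stereo signal: 2 channels, 3 samples.
stereo = np.array([[0.2, -0.4, 0.6],
                   [0.0,  0.4, 0.2]])

# Average across the channel axis; keepdims preserves the
# (channels, samples) layout, leaving a single channel.
mono = stereo.mean(axis=0, keepdims=True)

print(mono.shape)
print(mono)
```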
If we hadn’t set overwrite=True, then to_mono() would just return a new audio signal that is an exact copy of signal1 except that it is mono. You will see this pattern come up again. In certain places, AudioSignal’s default behavior is to overwrite its internal data, and in other places the default is to not overwrite data. See the reference pages for more info. Let’s try:
[13]:
signal3 = signal1 + signal2
---------------------------------------------------------------------------
AudioSignalException Traceback (most recent call last)
<ipython-input-13-2652ece2ee15> in <module>
----> 1 signal3 = signal1 + signal2
~/Dropbox/research/nussl_refactor/nussl/core/audio_signal.py in __add__(self, other)
1611
1612 def __add__(self, other):
-> 1613 return self.add(other)
1614
1615 def __radd__(self, other):
~/Dropbox/research/nussl_refactor/nussl/core/audio_signal.py in add(self, other)
1240 return self
1241
-> 1242 self._verify_audio_arithmetic(other)
1243
1244 new_signal = copy.deepcopy(self)
~/Dropbox/research/nussl_refactor/nussl/core/audio_signal.py in _verify_audio_arithmetic(self, other)
1632
1633 if self.signal_length != other.signal_length:
-> 1634 raise AudioSignalException('Cannot do arithmetic with signals of different length!')
1635
1636 def __iadd__(self, other):
AudioSignalException: Cannot do arithmetic with signals of different length!
Uh oh! Let’s fix this by truncating the longer signal1 to match signal2’s duration in seconds.
[14]:
signal1.truncate_seconds(signal2.signal_duration)
print(signal1)
AudioSignal (unlabeled): 2.000 sec @ /home/pseetharaman/.nussl/audio/schoolboy_fascination_excerpt.wav, 44100 Hz, 1 ch.
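The two tracebacks above show two checks before arithmetic: matching channel counts, then matching lengths. The numpy sketch below mirrors those checks and the truncation that satisfies the second one. Both helper functions are hypothetical stand-ins operating on raw arrays, and the rounding used when converting seconds to samples is an assumption.

```python
import numpy as np


def check_compatible(a, b):
    """Raise ValueError unless two (channels, samples) arrays can be added.

    Hypothetical stand-in for the checks seen in the tracebacks.
    """
    if a.shape[0] != b.shape[0]:
        raise ValueError('different number of channels')
    if a.shape[1] != b.shape[1]:
        raise ValueError('different length')


def truncate_seconds(audio_data, seconds, sample_rate):
    """Keep only the first `seconds` of a (channels, samples) array."""
    n_samples = int(seconds * sample_rate)  # assumed rounding: floor
    return audio_data[:, :n_samples]


sample_rate = 44100
long_signal = np.zeros((1, 15 * sample_rate))   # 15 s, mono
short_signal = np.zeros((1, 2 * sample_rate))   # 2 s, mono

trimmed = truncate_seconds(long_signal, 2.0, sample_rate)
check_compatible(trimmed, short_signal)  # passes silently
print(trimmed.shape)
```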
Now we can finally add them. Adding these two signals causes clipping, so let’s also peak normalize the audio data.
[15]:
signal3 = signal1 + signal2
signal3.peak_normalize()
signal3.embed_audio()
print(signal3)
AudioSignal (unlabeled): 2.000 sec @ /home/pseetharaman/.nussl/audio/schoolboy_fascination_excerpt.wav, 44100 Hz, 1 ch.
No exceptions this time! Great! signal3 is now a new AudioSignal object. We can similarly subtract two signals.
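To see why the sum can clip and what peak normalization does about it, here is a small numpy sketch. The two tones and their amplitudes are illustrative, and dividing by the largest absolute sample is the usual meaning of peak normalization; nussl’s peak_normalize() may differ in details.

```python
import numpy as np

sample_rate = 44100
t = np.arange(2 * sample_rate) / sample_rate

# Two tones that each stay within [-1, 1] on their own.
a = 0.8 * np.sin(2 * np.pi * 440 * t)[np.newaxis, :]
b = 0.8 * np.sin(2 * np.pi * 660 * t)[np.newaxis, :]

# Sample-wise addition can push the mix outside [-1, 1].
mix = a + b
peak = np.max(np.abs(mix))

# Peak normalization: rescale so the largest absolute sample is 1.
# Guard against an all-zero signal to avoid dividing by zero.
normalized = mix / peak if peak > 0 else mix

print(peak > 1.0, np.max(np.abs(normalized)))
```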
Let’s write this to a file:
[16]:
signal3.write_audio_to_file('/tmp/signal3.wav')
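For a sense of what writing a WAV file involves, the sketch below uses Python’s standard wave module rather than nussl. It assumes 16-bit PCM output and a temporary, hypothetical output path; it is an independent illustration and not how write_audio_to_file is implemented.

```python
import os
import tempfile
import wave

import numpy as np

sample_rate = 44100
t = np.arange(sample_rate) / sample_rate  # one second of time stamps
audio_data = 0.5 * np.sin(2 * np.pi * 440 * t)[np.newaxis, :]  # shape (1, N)

# Convert floats in [-1, 1] to little-endian 16-bit integers, arranged as
# (samples, channels) so that channels are interleaved on disk.
pcm = (np.clip(audio_data, -1.0, 1.0) * 32767).astype('<i2').T

path = os.path.join(tempfile.mkdtemp(), 'sketch.wav')  # hypothetical path
with wave.open(path, 'wb') as f:
    f.setnchannels(pcm.shape[1])
    f.setsampwidth(2)            # 2 bytes = 16 bits per sample
    f.setframerate(sample_rate)
    f.writeframes(pcm.tobytes())

# Read the header back to confirm what was written.
with wave.open(path, 'rb') as f:
    written = (f.getnchannels(), f.getframerate(), f.getnframes())
print(written)
```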
Awesome! Now let’s see how we can manipulate the audio in the frequency domain…
[17]:
end_time = time.time()
time_taken = end_time - start_time
print(f'Time taken: {time_taken:.4f} seconds')
Time taken: 1.3912 seconds