{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "AudioSignal Basics\n", "==================\n", "\n", "The ``nussl.AudioSignal`` object is the main container for all things related to your audio data. It provides a lot of\n", "helpful utilities to make it easy to manipulate your audio. Because it is at the heart of all of the source separation\n", "algorithms in *nussl*, it is crucial to understand how it works. Here we provide a brief introduction to many common\n", "tasks.\n", "\n", "Initialization from a file\n", "--------------------------\n", "\n", "It is easy to initialize an AudioSignal object by loading an audio file from a path. First, let's use the external file zoo (``nussl.efz_utils``) to download an audio file to play around with.\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import nussl\n", "import time\n", "start_time = time.time()" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Matching file found at /home/pseetharaman/.nussl/audio/schoolboy_fascination_excerpt.wav, skipping download.\n" ] } ], "source": [ "input_file_path = nussl.efz_utils.download_audio_file(\n", " 'schoolboy_fascination_excerpt.wav')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's initialize an AudioSignal object with the audio." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "signal1 = nussl.AudioSignal(input_file_path)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now the AudioSignal object holds all of the information about the signal. Let's also embed the audio in a playable object right inside this notebook so we can listen to it! We can also look at its attributes by printing it."
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "AudioSignal (unlabeled): 15.000 sec @ /home/pseetharaman/.nussl/audio/schoolboy_fascination_excerpt.wav, 44100 Hz, 2 ch.\n" ] } ], "source": [ "signal1.embed_audio()\n", "print(signal1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "AudioSignals pack in a lot of useful functionality. For example:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Duration: 15.0 seconds\n", "Duration in samples: 661500 samples\n", "Number of channels: 2 channels\n", "File name: schoolboy_fascination_excerpt.wav\n", "Full path to input: /home/pseetharaman/.nussl/audio/schoolboy_fascination_excerpt.wav\n", "Root mean square energy: 0.1136\n" ] } ], "source": [ "print(\"Duration: {} seconds\".format(signal1.signal_duration))\n", "print(\"Duration in samples: {} samples\".format(signal1.signal_length))\n", "print(\"Number of channels: {} channels\".format(signal1.num_channels))\n", "print(\"File name: {}\".format(signal1.file_name))\n", "print(\"Full path to input: {}\".format(signal1.path_to_input_file))\n", "print(\"Root mean square energy: {:.4f}\".format(signal1.rms().mean()))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The actual signal data is in ``signal1.audio_data``. 
It’s just a numpy array, so we can use it as such:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0.00213623, -0.04547119, -0.0513916 , ..., -0.24395752,\n", " -0.2310791 , -0.20785522],\n", " [-0.1791687 , -0.20150757, -0.20574951, ..., -0.23834229,\n", " -0.2156372 , -0.168396 ]], dtype=float32)" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "signal1.audio_data" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(2, 661500)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "signal1.audio_data.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A few things to note here:\n", "\n", "1. When AudioSignal loads a file, it converts the data to floats in the range [-1, 1].\n", "2. The number of channels is the first dimension; the number of samples is the second.\n", "\n", "Initialization from a numpy array\n", "---------------------------------\n", "\n", "Another common way to initialize an AudioSignal object is by passing in a numpy array. Let’s first make a single-channel signal in a numpy array: a 2-second sine tone at 440 Hz.\n" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "sample_rate = 44100 # Hz\n", "dt = 1.0 / sample_rate\n", "dur = 2.0 # seconds\n", "freq = 440 # Hz\n", "x = np.arange(0.0, dur, dt) # sample times in seconds\n", "x = np.sin(2 * np.pi * freq * x) # sine tone evaluated at those times" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Cool! Now let’s put this into a new AudioSignal object."
] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "AudioSignal (unlabeled): 2.000 sec @ path unknown, 44100 Hz, 1 ch.\n" ] } ], "source": [ "signal2 = nussl.AudioSignal(\n", " audio_data_array=x, sample_rate=sample_rate)\n", "signal2.embed_audio()\n", "print(signal2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that we passed the sample rate explicitly. If no sample rate is given, the following default is used:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Default sample rate: 44100\n" ] } ], "source": [ "print(f\"Default sample rate: {nussl.constants.DEFAULT_SAMPLE_RATE}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Other basic manipulations\n", "-------------------------\n", "\n", "Adding the audio data of two AudioSignal objects is as simple as using ``+``. 
But there are some gotchas:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "ename": "AudioSignalException", "evalue": "Cannot do operation with two signals that have a different number of channels!", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mAudioSignalException\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0msignal3\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0msignal1\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0msignal2\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;32m~/Dropbox/research/nussl_refactor/nussl/core/audio_signal.py\u001b[0m in \u001b[0;36m__add__\u001b[0;34m(self, other)\u001b[0m\n\u001b[1;32m 1611\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1612\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m__add__\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mother\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1613\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0madd\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mother\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1614\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1615\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m__radd__\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mother\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m~/Dropbox/research/nussl_refactor/nussl/core/audio_signal.py\u001b[0m in \u001b[0;36madd\u001b[0;34m(self, other)\u001b[0m\n\u001b[1;32m 1240\u001b[0m \u001b[0;32mreturn\u001b[0m 
\u001b[0mself\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1241\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1242\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_verify_audio_arithmetic\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mother\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1243\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1244\u001b[0m \u001b[0mnew_signal\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mcopy\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdeepcopy\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m~/Dropbox/research/nussl_refactor/nussl/core/audio_signal.py\u001b[0m in \u001b[0;36m_verify_audio_arithmetic\u001b[0;34m(self, other)\u001b[0m\n\u001b[1;32m 1629\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1630\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_verify_audio_arithmetic\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mother\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1631\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_verify_audio\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mother\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1632\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1633\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msignal_length\u001b[0m \u001b[0;34m!=\u001b[0m \u001b[0mother\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msignal_length\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m~/Dropbox/research/nussl_refactor/nussl/core/audio_signal.py\u001b[0m in \u001b[0;36m_verify_audio\u001b[0;34m(self, other)\u001b[0m\n\u001b[1;32m 1621\u001b[0m 
\u001b[0;32mdef\u001b[0m \u001b[0m_verify_audio\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mother\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1622\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mnum_channels\u001b[0m \u001b[0;34m!=\u001b[0m \u001b[0mother\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mnum_channels\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1623\u001b[0;31m raise AudioSignalException('Cannot do operation with two signals that have '\n\u001b[0m\u001b[1;32m 1624\u001b[0m 'a different number of channels!')\n\u001b[1;32m 1625\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mAudioSignalException\u001b[0m: Cannot do operation with two signals that have a different number of channels!" ] } ], "source": [ "signal3 = signal1 + signal2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Uh oh! It doesn’t make sense to add a stereo signal (``signal1``) and a mono signal (``signal2``).\n", "But if we really want to add these two signals, we can convert the stereo one to mono.\n", "\n", "*nussl* does this by simply averaging the\n", "two channels at every sample. By default, ``to_mono()`` does not modify ``audio_data``, so we have to explicitly tell *nussl*\n", "that we want it to overwrite the data. We do that like this:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "AudioSignal (unlabeled): 15.000 sec @ /home/pseetharaman/.nussl/audio/schoolboy_fascination_excerpt.wav, 44100 Hz, 1 ch.\n" ] } ], "source": [ "print(signal1.to_mono(overwrite=True))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we hadn’t set ``overwrite=True``, ``to_mono()`` would instead return a new AudioSignal \n", "that is a copy of ``signal1`` converted to mono, leaving ``signal1`` itself unchanged. 
You will see this pattern \n", "come up again. In certain places, ``AudioSignal``'s default behavior is to \n", "overwrite its internal data, and in other places the default is to\n", "**not** overwrite data. See the reference pages for more info. Now let's try adding the two signals again:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "ename": "AudioSignalException", "evalue": "Cannot do arithmetic with signals of different length!", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mAudioSignalException\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0msignal3\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0msignal1\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0msignal2\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;32m~/Dropbox/research/nussl_refactor/nussl/core/audio_signal.py\u001b[0m in \u001b[0;36m__add__\u001b[0;34m(self, other)\u001b[0m\n\u001b[1;32m 1611\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1612\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m__add__\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mother\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1613\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0madd\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mother\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1614\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1615\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m__radd__\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mother\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m~/Dropbox/research/nussl_refactor/nussl/core/audio_signal.py\u001b[0m in \u001b[0;36madd\u001b[0;34m(self, other)\u001b[0m\n\u001b[1;32m 1240\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1241\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1242\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_verify_audio_arithmetic\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mother\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1243\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1244\u001b[0m \u001b[0mnew_signal\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mcopy\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdeepcopy\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m~/Dropbox/research/nussl_refactor/nussl/core/audio_signal.py\u001b[0m in \u001b[0;36m_verify_audio_arithmetic\u001b[0;34m(self, other)\u001b[0m\n\u001b[1;32m 1632\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1633\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msignal_length\u001b[0m \u001b[0;34m!=\u001b[0m \u001b[0mother\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msignal_length\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1634\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mAudioSignalException\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'Cannot do arithmetic with signals of different length!'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1635\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1636\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m__iadd__\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0mother\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mAudioSignalException\u001b[0m: Cannot do arithmetic with signals of different length!" ] } ], "source": [ "signal3 = signal1 + signal2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Uh oh! The two signals also have different lengths. Let's fix this by truncating the longer ``signal1`` to the duration of ``signal2``." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "AudioSignal (unlabeled): 2.000 sec @ /home/pseetharaman/.nussl/audio/schoolboy_fascination_excerpt.wav, 44100 Hz, 1 ch.\n" ] } ], "source": [ "signal1.truncate_seconds(signal2.signal_duration)\n", "print(signal1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can finally add them. The sum of these two signals clips (it exceeds the [-1, 1] range), so let's also peak-normalize the audio data." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "AudioSignal (unlabeled): 2.000 sec @ /home/pseetharaman/.nussl/audio/schoolboy_fascination_excerpt.wav, 44100 Hz, 1 ch.\n" ] } ], "source": [ "signal3 = signal1 + signal2\n", "signal3.peak_normalize()\n", "signal3.embed_audio()\n", "print(signal3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "No exceptions this time! Great! ``signal3`` is now a new AudioSignal \n", "object. 
We can similarly subtract two signals.\n", "\n", "Let’s write this to a file:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "signal3.write_audio_to_file('/tmp/signal3.wav')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Awesome! Now let's see how we can manipulate the audio in the frequency domain..."
] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Time taken: 1.3912 seconds\n" ] } ], "source": [ "end_time = time.time()\n", "time_taken = end_time - start_time\n", "print(f'Time taken: {time_taken:.4f} seconds')" ] } ], "metadata": { "jupytext": { "encoding": "# -*- coding: utf-8 -*-", "formats": "ipynb,py:light" }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }