\\NoIze// Sound Classification Tool

Last updated: September 5, 2019


Welcome to a NoIze interactive notebook on sound classification. Here you can access the project's documentation or code repository.

To follow along with this demo, headphones are recommended so you can hear the sound examples. (Don't forget to turn down the volume first; you can always turn it back up.)

If you just want to read along and hear some audio, ignore the snippets of code, like the one below. However, I encourage you to fork this notebook so that you can experiment with the examples. You don't have to download or install anything onto your computer. If you don't have an account with 'notebooks.ai', you can create a free one here.

In [ ]:
# install what is required to use NoIze:
!pip install -r requirements.txt
import noize

# what is necessary to play audio files in this notebook:
import IPython.display as ipd
from scipy.io.wavfile import read

Set directories for training data

In [2]:
path2audiodata = './audiodata/'
path2_speechcommands_data = '{}speech_commands_sample/'.format(path2audiodata)
path2_backgroundnoise_data = '{}background_noise/'.format(path2audiodata)

Hear some examples:

Background Noise: buzzing

In [3]:
buzzing = '{}buzzing/118340__julien-matthey__jm-noiz-buzz-01-neon-light21.wav'.format(
    path2_backgroundnoise_data)
sr, samps = read(buzzing)
ipd.Audio(samps,rate=sr)
Out[3]:

Background Noise: street

In [4]:
street = '{}street/2019-08-19 10.10.433.wav'.format(
    path2_backgroundnoise_data)
sr, samps = read(street)
ipd.Audio(samps,rate=sr)
Out[4]:

Background Noise: train

In [5]:
train = '{}train/331877.wav'.format(
    path2_backgroundnoise_data)
sr, samps = read(train)
ipd.Audio(samps,rate=sr)
Out[5]:

Speech Commands: nine

In [6]:
nine = '{}nine/e269bac0_nohash_0.wav'.format(
    path2_speechcommands_data)
sr, samps = read(nine)
ipd.Audio(samps,rate=sr)
Out[6]:

Speech Commands: right

In [7]:
right = '{}right/d0ce2418_nohash_1.wav'.format(
    path2_speechcommands_data)
sr, samps = read(right)
ipd.Audio(samps,rate=sr)
Out[7]:

Speech Commands: zero

In [8]:
zero = '{}zero/b3bb4dd6_nohash_0.wav'.format(
    path2_speechcommands_data)
sr, samps = read(zero)
ipd.Audio(samps,rate=sr)
Out[8]:

Build a Sound Classifier!

In [9]:
from noize.templates import noizeclassifier

Set directory for saving newly created files

In [10]:
path2_features_models = './feats_models/'

Name Project

Tip: include something about the data used to train the classifier

In [11]:
project_backgroundnoise = 'background_noise'

Running the following code will extract 'mfcc' features from the audio data provided. These features are then used to train a convolutional neural network that classifies a sound as most similar to 'buzzing', 'street', or 'train' noise.

In [15]:
noizeclassifier(classifer_project_name = project_backgroundnoise, 
                headpath = path2_features_models,
                audiodir = path2_backgroundnoise_data,
                feature_type = 'mfcc')
multiple models found. chose this model:
feats_models/background_noise/models/mfcc_40_1.0/background_noise_model/bestmodel_background_noise_model.h5

Features have been extracted.

Loading corresponding feature settings.

Loading previously trained classifier.
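For intuition, here is roughly what the MFCC step looks like. The sketch below uses librosa directly, which is an assumption on my part (it is not part of NoIze, which performs its feature extraction internally); 40 coefficients matches the 'mfcc_40' in the model path printed above.

In [ ]:
import numpy as np
import librosa  # assumption: librosa is available; NoIze extracts features internally

# reuse the buzzing example loaded earlier in the notebook
sr, samps = read(buzzing)

# librosa expects floating point samples roughly in the range [-1, 1]
samps_float = samps.astype(np.float32) / 32768.0

# 40 MFCCs per frame, matching the 'mfcc_40' folder name above
mfccs = librosa.feature.mfcc(y=samps_float, sr=sr, n_mfcc=40)
print(mfccs.shape)  # (40, number_of_frames)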

Use the classifier to classify new data!

In [13]:
cafe_noise = '{}cafe18.wav'.format(path2audiodata)
sr, samps = read(cafe_noise)
ipd.Audio(samps,rate=sr)
Out[13]:
In [16]:
noizeclassifier(classifer_project_name = project_backgroundnoise, 
                headpath = path2_features_models,
                audiodir=path2_backgroundnoise_data,
                target_wavfile = cafe_noise, # the sound we want to classify
                feature_type='mfcc')
multiple models found. chose this model:
feats_models/background_noise/models/mfcc_40_1.0/background_noise_model/bestmodel_background_noise_model.h5

Features have been extracted.

Loading corresponding feature settings.

Loading previously trained classifier.

Label classified:  train

The classifier labeled the cafe noise as belonging to the class 'train'.

Challenges

1)

Try training the background noise classifier with the feature_type 'fbank' instead of 'mfcc'. Do you notice a difference? Does the cafe noise still get labeled as 'train' noise?
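If you would like a starting point, here is a minimal sketch of what that call might look like, reusing the paths defined above; the new project name is my own suggestion to keep the 'fbank' features and model separate from the 'mfcc' ones.

In [ ]:
# Challenge 1 sketch: same call as above, but with filterbank ('fbank') features
project_backgroundnoise_fbank = 'background_noise_fbank'  # hypothetical project name

noizeclassifier(classifer_project_name = project_backgroundnoise_fbank,
                headpath = path2_features_models,
                audiodir = path2_backgroundnoise_data,
                feature_type = 'fbank')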

2)

Collect a sound or two that you would like to classify with this classifier, for example from freesound.org. You will need to create a free account in order to download sounds, which I highly encourage. Note: as of now, NoIze can only process mono, 16-bit WAV files. The link offered should be set to show only sounds that meet those requirements.
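If the sound you download turns out to be stereo or stored as floating point samples, here is a minimal sketch of one way to convert it, using scipy (an assumption on my part; any audio editor or library works). The file name 'my_sound.wav' is hypothetical.

In [ ]:
import numpy as np
from scipy.io.wavfile import read, write

# 'my_sound.wav' is a hypothetical downloaded file; adjust the name as needed
sr, samps = read('my_sound.wav')

# if the samples came in as floats (e.g. a 32-bit float WAV), rescale to the int16 range
if np.issubdtype(samps.dtype, np.floating):
    samps = samps * 32767

# collapse stereo to mono by averaging the channels, if needed
if samps.ndim > 1:
    samps = samps.mean(axis=1)

samps = samps.astype(np.int16)
write('my_sound_mono16.wav', sr, samps)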

3)

Build a speech commands classifier using the data provided in the speech_commands_sample folder. Try adjusting the arguments for noizeclassifier, such as the features extracted ('mfcc' vs. 'fbank').

How do you think the classifier will classify the following words: 'cat', 'marvin', and 'wow'?

  • cat
In [54]:
cat = '{}cat.wav'.format(path2audiodata)
sr, samps = read(cat)
ipd.Audio(samps,rate=sr)
Out[54]:
  • marvin
In [55]:
marvin = '{}marvin.wav'.format(path2audiodata)
sr, samps = read(marvin)
ipd.Audio(samps,rate=sr)
Out[55]:
  • wow
In [56]:
wow = '{}wow.wav'.format(path2audiodata)
sr, samps = read(wow)
ipd.Audio(samps,rate=sr)
Out[56]:

And how does the classifier actually classify them? Are the classifications the same for both 'mfcc' and 'fbank' features? Which better matches your expectations?

4)

Advanced:

Adjust the model architecture in the file 'cnn.py', located in the directory './noize/models/'. You can try implementing another convolutional neural network (CNN) architecture or even adding a long short-term memory network (LSTM). The latter option requires some fiddling with the data input sizes.
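To illustrate what an alternative architecture could look like, here is a hedged sketch of a small Keras CNN. The layer choices, input shape, and function name are my own assumptions, not NoIze's actual 'cnn.py'; the saved '.h5' models suggest Keras is in use, but match the input shape to whatever NoIze feeds the model.

In [ ]:
# Hypothetical alternative CNN sketch (not NoIze's actual architecture).
# input_shape is an assumed placeholder; match it to the features NoIze produces.
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

def build_alternative_cnn(input_shape=(40, 40, 1), num_labels=3):
    model = Sequential()
    model.add(Conv2D(32, kernel_size=(3, 3), activation='relu',
                     input_shape=input_shape))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Flatten())
    model.add(Dense(64, activation='relu'))
    model.add(Dense(num_labels, activation='softmax'))  # 3 background noise classes
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam',
                  metrics=['accuracy'])
    return model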

A little prompt to get you started

You will need to indicate where the speech commands data are.

Hint: the variable holding the path was defined at the beginning of the notebook.

In [50]:
project_speechcommands = 'speech_commands'
In [ ]:
noizeclassifier(classifer_project_name = project_speechcommands, 
                headpath = path2_features_models,
                audiodir=,# YOUR CODE HERE!! where will the classifier find the speech commands data?
                target_wavfile = cat, # file for classification - test out the other words as well
                feature_type='mfcc' # try 'fbank' features and see if the validation score increases or decreases
               ) 