We took the dataset and ran it through the librosa library to convert the WAV files into MFCCs. Mel-Frequency Cepstral Coefficients (MFCCs) summarize the spectral envelope of a short audio frame (for speech, roughly the shape that distinguishes one phoneme from another) in a few numbers, so they are widely used as features in voice recognition and machine learning.
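For readers curious what happens inside that conversion, here is a minimal NumPy-only sketch of the usual MFCC pipeline: frame and window the signal, take each frame's power spectrum, pool it into mel bands, log-compress, and apply a DCT-II. It is an illustration under simplifying assumptions (an HTK-style mel scale, a plain `10 * log10` compression, and the hypothetical helper names `mfcc_sketch`, `mel_filterbank` and `dct_ii_ortho`), not librosa's implementation, so its values will not match `librosa.feature.mfcc` exactly. The default `n_fft`, `hop_length` and `n_mels` follow librosa's defaults.

```python
import numpy as np

def hz_to_mel(f):
    # Assumption: HTK-style mel formula; librosa defaults to the Slaney variant
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters mapping the n_fft // 2 + 1 FFT bins onto n_mels mel bands."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    # n_mels + 2 edge points, equally spaced on the mel scale, converted back to Hz
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def dct_ii_ortho(x):
    """Orthonormal DCT-II along axis 0 (same convention as norm='ortho')."""
    n = x.shape[0]
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    basis[0] /= np.sqrt(2.0)
    return basis @ x

def mfcc_sketch(y, sr, n_mfcc=13, n_fft=2048, hop_length=512, n_mels=128):
    # 1. Zero-pad so frames are centered on hop positions, then cut Hann-windowed frames
    y = np.pad(y, n_fft // 2)
    n_frames = 1 + (len(y) - n_fft) // hop_length
    window = np.hanning(n_fft)
    frames = np.stack([y[i * hop_length : i * hop_length + n_fft] * window
                       for i in range(n_frames)], axis=1)      # (n_fft, n_frames)
    # 2. Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, axis=0)) ** 2           # (n_fft // 2 + 1, n_frames)
    # 3. Mel band energies, log-compressed to decibels (floored to avoid log(0))
    mel = mel_filterbank(sr, n_fft, n_mels) @ power            # (n_mels, n_frames)
    log_mel = 10.0 * np.log10(np.maximum(mel, 1e-10))
    # 4. DCT-II across the mel bands; keep only the first n_mfcc coefficients
    return dct_ii_ortho(log_mel)[:n_mfcc]
```

With centered frames, a one-second clip at 22050 Hz yields 1 + 22050 // 512 = 44 frames, so `mfcc_sketch(y, sr=22050)` returns a 13 × 44 array: one column of 13 coefficients per frame.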
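import librosa
import librosa.display
import matplotlib.pyplot as plt
import IPython.display as ipd

def display_wav(signal, fn, sr=22050):
    # Plot the waveform (amplitude over time) and save the figure to the file `fn`
    # (librosa >= 0.10 removed waveplot; waveshow is its replacement)
    librosa.display.waveshow(signal, sr=sr)
    plt.xlabel("Time")
    plt.ylabel("Amplitude")
    plt.savefig(fn, bbox_inches='tight')
    plt.show()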
The following plots and audio clips show how much of the original signal survives in the MFCCs.
signal, sr = librosa.load("7383-3-0-0.wav", sr=22050)
display_wav(signal, '../images/dog_bark_plot.png')
import soundfile as sf
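# 13 MFCCs per frame; librosa >= 0.10 requires the audio as the keyword argument y
mfccs = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)
# Approximate inversion: MFCC -> mel spectrogram -> audio (phase estimated with Griffin-Lim)
wav = librosa.feature.inverse.mfcc_to_audio(mfccs, sr=sr)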
display_wav(wav, '../images/dog_bark_reversed.png')
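Part of the difference between the two waveforms comes from the MFCC step itself: keeping only the first 13 DCT coefficients of each log-mel frame discards the fine spectral detail. `librosa.feature.inverse.mfcc_to_audio` must also approximately invert the mel filterbank and estimate the phase with Griffin-Lim, so the audio loses more than this isolates. The sketch below looks at the truncation step alone, on a stand-in random matrix (hypothetical data, with hypothetical helpers `dct_basis` and `truncate_and_invert`), and measures how far the inverse DCT of the truncated coefficients lands from the original log-mel values.

```python
import numpy as np

def dct_basis(n):
    """Orthonormal DCT-II matrix: row k is the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    basis[0] /= np.sqrt(2.0)
    return basis

def truncate_and_invert(log_mel, n_keep):
    """Keep the first n_keep DCT coefficients of each frame (column), then invert.

    The basis is orthonormal, so its inverse is simply its transpose.
    """
    basis = dct_basis(log_mel.shape[0])
    coeffs = basis @ log_mel      # full set of cepstral coefficients per frame
    coeffs[n_keep:] = 0.0         # drop what an n_keep-coefficient MFCC discards
    return basis.T @ coeffs

# Hypothetical stand-in for a 128-band log-mel spectrogram with 10 frames
rng = np.random.default_rng(0)
log_mel = rng.normal(size=(128, 10))
for n_keep in (13, 40, 128):
    approx = truncate_and_invert(log_mel, n_keep)
    err = np.linalg.norm(log_mel - approx) / np.linalg.norm(log_mel)
    print(f"{n_keep:3d} coefficients -> relative error {err:.3f}")
```

Because the basis is orthonormal, the error can only shrink as more coefficients are kept, and it vanishes (up to rounding) when all 128 are kept. Random noise spreads its energy evenly over all coefficients, so it is close to a worst case; a real log-mel spectrogram is smoother, and more of its energy sits in the first few coefficients.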
Eunjeong Lee, ejlee127 at gmail dot com, last updated in Nov. 2020