1
/
5

Deep Learning Using Raw Audio Files

// Feed raw audio files directly into the deep neural network without any feature extraction. //

If you have observed, conventional audio and speech analysis systems are typically built using a pipeline structure, where the first step is to extract various low dimensional hand-crafted acoustic features (e.g., MFCC, pitch, RMSE, Chroma, and whatnot).

Although hand-crafted acoustic features are typically well designed, is still not possible to retain all useful information due to the human knowledge bias and the high compression ratio. And of course, the feature engineering you will have to perform will depend on the type of audio problem that you are working on.

But, how about learning directly from raw waveforms (i.e., raw audio files are directly fed into the deep neural network)?

In this post, let's take learnings from this paper and try to apply it to the following Kaggle dataset.

www.kaggle.com

Go ahead and download the Heatbeat Sounds dataset. Here is how one of the sample audio files from the dataset sounds like

Read more

Read on Medium

株式会社ゴーリストでは一緒に働く仲間を募集しています
同じタグの記事
今週のランキング