An End-to-End Audio Recognition Project using Deep Learning and Flask
The dataset is taken from UrbanSound8K: https://urbansounddataset.weebly.com/download-urbansound8k.html
Justin Salamon, Christopher Jacoby and Juan Pablo Bello
Music and Audio Research Laboratory (MARL), New York University
Center for Urban Science and Progress (CUSP), New York University
A Fully Connected Neural Network (FCN) based deep-learning architecture is used to solve the classification problem. The network has 78,017 trainable parameters in total and consists of 7 Dense layers, with a softmax activation function at the output to produce the class probabilities.
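As a rough illustration, a 7-layer fully connected classifier of this kind could be defined in Keras as below. The layer widths and the 40-dimensional input are assumptions made for the sketch and will not reproduce the exact 78,017-parameter count of the trained model.

```python
# Illustrative sketch of a 7-layer fully connected classifier.
# Layer widths and the 40-feature input are assumptions, so the parameter
# count will differ from the 78,017 of the trained model.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_FEATURES = 40   # e.g. 40 MFCC coefficients per clip (assumption)
NUM_CLASSES = 10    # UrbanSound8K has 10 sound classes

model = models.Sequential([
    layers.Input(shape=(NUM_FEATURES,)),
    layers.Dense(256, activation="relu"),
    layers.Dense(128, activation="relu"),
    layers.Dense(128, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(32, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),  # 7th Dense layer
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```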
- Open a command line (cmd) at the root of the repository.
- Run the command: pip install -r requirements.txt
- Open the notebook Training_Notebook.ipynb to follow all the preprocessing and training steps of the model.
  NOTE: To change paths, variables or any other related settings, edit the config.yaml file (a sketch of how such a config might be read is shown below).
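As a rough illustration, the notebook or app might read config.yaml roughly as follows; the keys shown here are assumptions, not the repository's actual configuration schema.

```python
# Illustrative sketch of reading config.yaml with PyYAML; the keys shown
# (dataset_path, model_path, num_mfcc) are assumptions, not the real schema.
import yaml

with open("config.yaml", "r") as f:
    config = yaml.safe_load(f)

dataset_path = config["dataset_path"]   # hypothetical key
model_path = config["model_path"]       # hypothetical key
num_mfcc = config["num_mfcc"]           # hypothetical key
print(dataset_path, model_path, num_mfcc)
```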
- A Dockerfile is provided which can be used for deployment. From this Dockerfile a Docker image can be created and deployed to the cloud, etc.
- To create a Docker image, first download Docker for your OS from the official Docker website.
- Then, open a command line (cmd) at the root of the repository and run the command: docker build -t audio_classification_image:v1 .
- Once the image is created, you can push it to Docker Hub after signing in, from where the image can be pulled and used.
- To run the Docker image, open a command line (cmd) at the root of the repository and run the command: docker run -p 5000:5000 audio_classification_image:v1
- Open the link http://127.0.0.1:5000/ in your preferred browser, or check the logs printed by Docker in the command line to find the link.
- A separate templates folder and app.py are also provided, which serve as the frontend and backend of a web application for uploading an audio file and getting back a prediction (a minimal sketch of such an app.py follows this step). To run the application, open a command line (cmd) at the root of the repository and run the command: flask run
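The following is only a sketch of what such a Flask backend could look like; the route names, template name, model path and feature-extraction step are assumptions and may differ from the repository's actual app.py.

```python
# Illustrative sketch of a Flask prediction backend; routes, template name,
# model path and feature extraction are assumptions, not the repo's exact code.
import numpy as np
import librosa
import tensorflow as tf
from flask import Flask, render_template, request

app = Flask(__name__)
model = tf.keras.models.load_model("saved_model.h5")  # hypothetical path

@app.route("/", methods=["GET"])
def index():
    # Serves the upload form from the templates folder (hypothetical name)
    return render_template("index.html")

@app.route("/predict", methods=["POST"])
def predict():
    audio_file = request.files["file"]
    # librosa resamples to 22050 Hz, downmixes to mono and scales to [-1, 1]
    signal, sr = librosa.load(audio_file, sr=22050)
    # Average the MFCCs over time to get a fixed-length feature vector
    mfccs = np.mean(librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=40), axis=1)
    probs = model.predict(mfccs.reshape(1, -1))
    return {"class_id": int(np.argmax(probs))}

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```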
- In the future, all models can be stored in the cloud so that a request can be sent and a response received for on-demand prediction (a sketch of such a request is shown below).
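For illustration, such a request against the locally running Flask app might look like the snippet below; the /predict route and the response format are assumptions carried over from the app.py sketch above, and the file path is a placeholder.

```python
# Illustrative client sketch; host, /predict route and response format are
# assumptions, not a documented API of this repository.
import requests

with open("path/to/audio_clip.wav", "rb") as f:
    response = requests.post(
        "http://127.0.0.1:5000/predict",
        files={"file": f},
    )
print(response.json())  # e.g. {"class_id": 3}
```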
- Samples of the deployed application are shown below.
Use of Librosa
We can use scipy or librosa to read the audio files, but librosa has the added advantage that it unifies the sample rate across all input audio files. Librosa also converts stereo (2-channel) audio to mono (single channel) and normalizes the input signal to the range [-1, 1].
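As a quick illustration of this behaviour (the file path is a placeholder):

```python
# Illustrative sketch; the file path is a placeholder.
import librosa

# librosa.load resamples to 22050 Hz by default, downmixes stereo to mono,
# and returns floating-point samples in the range [-1, 1].
signal, sample_rate = librosa.load("path/to/audio_clip.wav")

print(sample_rate)                  # 22050
print(signal.ndim)                  # 1 (mono)
print(signal.min(), signal.max())   # values lie within [-1, 1]
```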
The whole project is developed with Python 3.7.7 and pip 19.2.3.
In case of errors, feel free to contact us on LinkedIn at Adnan.