DAT255 course project at Western Norway University of Applied Sciences.
- `DeepShip` - Final project notebook and inference script
- `Experiments` - Notebooks with training experiments
- `dataset` - Datasets used by this project
- `docker` - Docker scripts for building the images and starting JupyterLab
- `models` - Link to trained models
- `RestClinet` - REST client written in Qt
For background on the project and some design considerations, please read here.
A ResNet50 image classification model, built with fastai, is trained on random spectrograms generated on the fly from the DeepShip dataset. 20% of the data is held out for validation, and the model achieves a validation accuracy of 94%.
This is a significant improvement over the original paper and, as far as I can tell, the state of the art for this dataset.
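For orientation, here is a minimal fastai sketch in the spirit of the training notebooks. The folder layout, image size, batch size, and epoch count are assumptions on my part; the actual notebooks generate spectrograms on the fly rather than loading pre-rendered images:

```python
from fastai.vision.all import *

# Hypothetical layout: spectrograms/<class>/<clip>.png, one folder per ship class.
path = Path('spectrograms')

dblock = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    get_y=parent_label,                               # class name = parent folder
    splitter=RandomSplitter(valid_pct=0.2, seed=42),  # 20% held out for validation
    item_tfms=Resize(224),
)
dls = dblock.dataloaders(path, bs=32)  # lower bs here if CUDA runs out of memory

learn = vision_learner(dls, resnet50, metrics=accuracy)
learn.fine_tune(5)  # fine-tune ImageNet weights on the spectrograms
```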
Further details
- notebook using regular STFTs (93.75% accuracy)
- notebook using a polar representation of CQT spectrograms (94% accuracy, possibly overfitted)

Note: The DeepShip dataset contains recordings of similar ships, captured at different points in time and under different conditions. That could be one of the reasons behind the unusually high accuracy score.
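The two notebooks above differ mainly in the time-frequency representation fed to the network. As a rough illustration, here is a minimal STFT spectrogram sketch using torchaudio; the FFT size, hop length, and file name are assumptions of mine, and the CQT notebook uses a different transform (nnAudio):

```python
import torchaudio

# Assumed parameters -- the notebooks pick their own FFT sizes and hop lengths.
N_FFT, HOP_LENGTH = 1024, 256

waveform, sr = torchaudio.load('clip.wav')  # (channels, samples)

# Magnitude spectrogram, then log scaling so the CNN sees a sane dynamic range.
spec = torchaudio.transforms.Spectrogram(n_fft=N_FFT, hop_length=HOP_LENGTH)(waveform)
spec_db = torchaudio.transforms.AmplitudeToDB()(spec)
```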
- Nvidia GPU with CUDA support
- Linux
- docker-compose with the NVIDIA Container Toolkit and runtime
- Nvidia setup guide
Note: Windows 11 should work according to this blog post, but I was not able to set it up.
Note: The only system configuration I have tested is Arch Linux with an RTX 3080. It should work on most CUDA-capable graphics cards; if you get a "CUDA out of memory" error, try reducing the batch-size and/or image-size parameters found in the notebooks and the inference script.
This project uses docker-compose for both training and inference, which makes the whole ML stack portable. More details on the Docker images can be found in the docker directory.
Run `docker-compose build fastai` to build the training and inference image.
Run `docker-compose up fastai` to start JupyterLab, then open the link printed in the terminal in a web browser. It should look something like `http://127.0.0.1:80/lab?token=45604507b9f4a392...`. From JupyterLab you can run the notebooks and train models.
In order to train, you need a copy of the DeepShip dataset, extracted into the `dataset/DeepShip` folder.
Run `docker-compose up rest` to start the REST API. This starts the REST server defined in `DeepShip/rest.py`.
The REST API accepts a sound file uploaded as multipart form data under the field name `file` on port 8088. The server processes the file, resampling it if needed, and returns the mean probabilities for each class as JSON. Any file type torchaudio supports should work for inference.
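The resampling step presumably looks something like the following sketch; the target sample rate and variable names here are assumptions, and the actual logic lives in `DeepShip/rest.py`:

```python
import torchaudio

TARGET_SR = 32_000  # assumed rate; check DeepShip/rest.py for the real value

# torchaudio decodes any format it has a backend for, hence the loose file-type support.
waveform, sr = torchaudio.load('upload.wav')
if sr != TARGET_SR:
    waveform = torchaudio.transforms.Resample(orig_freq=sr, new_freq=TARGET_SR)(waveform)
```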
Example using curl from the command line:

```sh
curl -F file=@'./hurtigbaat test mono.wav' http://127.0.0.1:8088
```
Note: This test file was recorded with a hydrophone from a pier in Bergen while a passenger ship was passing by. It was not used during training.
{"Cargo":0.0,"Passengership":0.9243250489234924,"Tanker":0.0,"Tug":0.0}
If there is an error processing the file, the server returns a message like this:

```json
{"error":"Something failed"}
```
- Test more signal processing before feeding the networks
- Extract envelope modulation as its own image layer
- Apply augmentation to the audio/spectrograms.
- Represent audio differently, using PCA or SVD.
- Fit random forests or a gradient-boosting algorithm (e.g. XGBoost)
- Test more network architectures: recurrent networks, transformers, and so on.
- Try the nnAudio-based approaches from https://github.com/KinWaiCheuk/pytorch_template
- Design a custom model that trains on features from different FFT sizes separately.
- Update the Docker images; new versions of both Ubuntu LTS and PyTorch are available.
- Merge the TensorFlow and PyTorch images. Both are based on Ubuntu, so it should be possible.