A parallel segmentation algorithm for a flowers dataset, using the Dask library in Python.
The problem addressed is a classic image segmentation problem. In the DaskFlowersSegmentation.py script, I apply some basic preprocessing (grayscale filters, for example) and then run a parallel pipeline that preprocesses and segments each flower image in the dataset.
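The per-image pipeline can be sketched with `dask.delayed`: each image becomes an independent task, and `dask.compute` runs them in parallel. This is a minimal sketch, not the code in DaskFlowersSegmentation.py. The segmentation method is an assumption (a mean-intensity threshold followed by counting 4-connected foreground regions), and the helper names `to_grayscale`, `count_segments`, `process_image` and `segment_dataset` are hypothetical.

```python
from collections import deque

import dask
import numpy as np
from dask import delayed


def to_grayscale(rgb):
    """Convert an H x W x 3 RGB array to grayscale using ITU-R BT.601 luma weights."""
    weights = np.array([0.299, 0.587, 0.114])
    return rgb[..., :3] @ weights


def count_segments(gray, threshold=None):
    """Assumed segmentation step: threshold the image (default: its mean),
    then count 4-connected foreground regions with a breadth-first flood fill."""
    if threshold is None:
        threshold = gray.mean()
    mask = gray > threshold
    visited = np.zeros(mask.shape, dtype=bool)
    h, w = mask.shape
    n_segments = 0
    for i in range(h):
        for j in range(w):
            if mask[i, j] and not visited[i, j]:
                # New, unvisited foreground pixel: start a new segment.
                n_segments += 1
                visited[i, j] = True
                queue = deque([(i, j)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not visited[ny, nx]:
                            visited[ny, nx] = True
                            queue.append((ny, nx))
    return n_segments


def process_image(rgb):
    """Preprocess (grayscale) and segment one image; returns its segment count."""
    return count_segments(to_grayscale(rgb))


def segment_dataset(images):
    """Build one lazy Dask task per image, then evaluate them all in parallel."""
    tasks = [delayed(process_image)(img) for img in images]
    return list(dask.compute(*tasks))


if __name__ == "__main__":
    # Synthetic image: two white squares on a black background.
    img = np.zeros((20, 20, 3), dtype=np.uint8)
    img[2:6, 2:6] = 255
    img[10:15, 10:15] = 255
    # The cropped copy keeps only the first square.
    print(segment_dataset([img, img[:, :8]]))  # [2, 1]
```

Because the flood fill is pure Python and holds the GIL, Dask's default threaded scheduler gives little speed-up here; passing `scheduler="processes"` to `dask.compute` is one option for CPU-bound per-image work like this.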
The final output is a histogram showing the distribution of the number of segments found per image across the flowers dataset. This result can be used for further analysis (for example, to identify clusters in the data).
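Once each image has been reduced to a segment count, the histogram is a frequency count over those integers. The sketch below assumes the counts arrive as a plain list of non-negative integers (for instance, the result of a parallel per-image step); `segment_histogram`, `plot_segment_histogram` and the output file name are hypothetical, and matplotlib is assumed to be installed only for the plotting function.

```python
import numpy as np


def segment_histogram(segment_counts):
    """Return (values, frequencies): for each k from 0 to max(count),
    how many images were split into exactly k segments."""
    counts = np.asarray(segment_counts, dtype=int)
    freqs = np.bincount(counts)
    values = np.arange(len(freqs))
    return values, freqs


def plot_segment_histogram(segment_counts, path="segments_histogram.png"):
    """Draw the distribution as a bar chart and save it to `path`."""
    import matplotlib.pyplot as plt  # imported lazily: only needed for plotting

    values, freqs = segment_histogram(segment_counts)
    fig, ax = plt.subplots()
    ax.bar(values, freqs, width=0.8)
    ax.set_xlabel("Number of segments per image")
    ax.set_ylabel("Number of images")
    ax.set_title("Distribution of segment counts")
    fig.savefig(path)
    plt.close(fig)


values, freqs = segment_histogram([2, 1, 2, 5])
print(freqs.tolist())  # [0, 1, 2, 0, 0, 1]
```

Using `np.bincount` gives one bar per integer segment count, which avoids the arbitrary bin edges that `np.histogram` would pick for discrete data.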
This algorithm was implemented as an academic project for the Big Data lecture given by Prof. Jean-Marc Gratien.
The data used in this project comes from the Oxford Flowers dataset.
Run the pipeline with: python DaskFlowersSegmentation.py
Requirements:
- Python > 3.6
- dask