Experiments with several machine learning models for tumor classification.
Two brain MRI datasets found on Kaggle were used.
The first dataset can be found here
The second dataset can be found here
About the data:
The first dataset contains 155 positive and 98 negative examples, for a total of 253 images. The folder yes contains the 155 brain MRI images that are tumorous, and the folder no contains the 98 brain MRI images that are non-tumorous.
The second dataset contains 100 positive and 100 negative examples, for a total of 200 images. It is split into test, train and validation folders, and each folder has a hemmorhage_data and a non_hemmorhage_data subfolder.
For every image, the following preprocessing steps were applied:
1. Resize to (250, 250, 3) (image_width, image_height, channels), because the images in the two datasets come in different sizes.
2. Convert the image from RGB to grayscale.
3. Use HOG (histogram of oriented gradients) for feature extraction.
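The three steps above can be sketched as follows. This is a minimal illustration, assuming scikit-image is used; the function name `extract_hog` and the random stand-in image are hypothetical, and every HOG parameter other than the 32x32 pixels per cell is left at the scikit-image default rather than taken from this project.

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.feature import hog
from skimage.transform import resize


def extract_hog(image_rgb, size=(250, 250), pixels_per_cell=(32, 32)):
    """Resize an RGB image, convert it to grayscale, and return its HOG vector."""
    # Step 1: resize to a common size; the channel axis is preserved.
    resized = resize(image_rgb, size, anti_aliasing=True)
    # Step 2: RGB -> grayscale (2-D float array).
    gray = rgb2gray(resized)
    # Step 3: HOG descriptor, flattened into a 1-D feature vector.
    return hog(gray, pixels_per_cell=pixels_per_cell)


# Random array standing in for an MRI image of arbitrary size (assumption).
rng = np.random.default_rng(0)
fake_image = rng.random((300, 400, 3))
features = extract_hog(fake_image)
print(features.shape)
```

Applying `extract_hog` to every image and stacking the results with `np.vstack` would give the feature matrix passed to the models.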
After preprocessing, the HOG features are fed to the models. There is also an option to apply principal component analysis (PCA) for feature reduction.
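The optional PCA step could look like the sketch below, assuming scikit-learn. The feature matrix is random stand-in data, and `n_components=20` is an arbitrary value chosen for illustration, not the setting used in these experiments.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for a matrix of HOG vectors: 60 images x 2025 features (assumption).
rng = np.random.default_rng(0)
hog_features = rng.random((60, 2025))

# Project the features onto the first 20 principal components.
pca = PCA(n_components=20)
reduced = pca.fit_transform(hog_features)
print(reduced.shape)
```

When PCA is combined with cross-validation, wrapping it together with the classifier in a `sklearn.pipeline.Pipeline` keeps the projection fitted on the training folds only.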
The output after applying HOG with pixels per cell set to 32x32
Experiments were run with SVM, Linear-SVM, Random Forest, and Logistic Regression using 5-fold cross-validation.
The reported metrics are accuracy, precision, recall, F-measure, and specificity.
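A 5-fold comparison of these four models might be set up as in the sketch below, assuming scikit-learn. The data comes from `make_classification` as a stand-in for the HOG features, and all hyperparameters are illustrative defaults, not the ones used in these experiments. Specificity is not a built-in scorer, so it is left out here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.svm import SVC, LinearSVC

# Synthetic stand-in for the HOG feature matrix and tumor labels (assumption).
X, y = make_classification(n_samples=200, n_features=50, random_state=0)

models = {
    "SVM": SVC(),
    "Linear-SVM": LinearSVC(max_iter=10000),
    "RF": RandomForestClassifier(n_estimators=50, random_state=0),
    "LR": LogisticRegression(max_iter=1000),
}

# Stratified folds keep the class balance similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scoring = ["accuracy", "precision", "recall", "f1"]

results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
    # Average each metric over the five folds.
    results[name] = {m: scores[f"test_{m}"].mean() for m in scoring}

for name, metrics in results.items():
    print(name, {m: round(v, 3) for m, v in metrics.items()})
```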
The goal is to make the recall equal to 1, so the number of false negatives (FN) must be 0. This way the classifier will always spot images that are tumorous.
Actual label | Predicted: No | Predicted: Yes |
---|---|---|
No | TN | FP |
Yes | FN=0 | TP |
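The five metrics follow directly from the four cells of the confusion matrix. The sketch below uses the standard definitions; the function name `metrics_from_confusion` and the example counts are made up for illustration and are not taken from the experiments.

```python
def metrics_from_confusion(tn, fp, fn, tp):
    """Compute the five reported metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # equals 1 exactly when FN = 0
    specificity = tn / (tn + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "specificity": specificity,
    }


# Hypothetical counts with no false negatives.
example = metrics_from_confusion(tn=8, fp=6, fn=0, tp=24)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

With FN = 0 the recall is 1 regardless of the other counts, while precision and specificity still depend on how many non-tumorous images are flagged.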
The threshold is selected based on the accuracy and the recall of the model on the validation set.
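One way to read this selection rule is: among the thresholds that give a recall of 1 on the validation set, keep the one with the best accuracy. The sketch below implements that reading; it is an assumption about the procedure, and `pick_threshold` and the validation scores are hypothetical.

```python
def pick_threshold(scores, labels):
    """Return (threshold, accuracy) with the best accuracy among recall-1 thresholds.

    scores: classifier scores for the positive class on the validation set.
    labels: true labels (1 = tumorous, 0 = non-tumorous).
    """
    best = None
    for t in sorted(set(scores)):
        preds = [1 if s >= t else 0 for s in scores]
        fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
        if fn > 0:
            continue  # a tumorous image was missed, so recall < 1
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if best is None or acc > best[1]:
            best = (t, acc)
    return best


# Hypothetical validation scores and labels.
val_scores = [0.1, 0.3, 0.35, 0.6, 0.8, 0.9]
val_labels = [0, 0, 1, 0, 1, 1]
threshold, accuracy = pick_threshold(val_scores, val_labels)
print(threshold, round(accuracy, 3))
```

In this toy example any threshold above 0.35 would miss the positive scored at 0.35, so the rule settles on 0.35 and accepts one false positive.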
Model | Accuracy | Precision | Recall | F-measure | Specificity |
---|---|---|---|---|---|
LR | 0.842 | 0.8 | 1.0 | 0.888 | 0.571 |
SVM | 0.815 | 0.774 | 1.0 | 0.872 | 0.5 |
Linear-SVM | 0.868 | 0.827 | 1.0 | 0.905 | 0.642 |
RF | 0.815 | 0.793 | 0.958 | 0.867 | 0.571 |
SVM-Additive Chi^2 | 0.894 | 0.857 | 1.0 | 0.923 | 0.714 |