While learning machine learning, I came across a few highly imbalanced datasets and got stuck at the very beginning. So I made this notebook as a quick reference for revising different ways to handle imbalanced datasets.
- Under-sampling
- Over-sampling
- imbalanced-learn module
- Random Over-sampling and under-sampling
- Tomek links
- SMOTE
- Over-sampling followed by under-sampling
- Using recall to measure model performance
- Trained a Logistic Regression model on each version of the preprocessed data
- Used recall score as the metric to measure how well each model performs
The dataset used can be downloaded from Kaggle.
Made with ❤️ by Sahil Chachra
MIT © Sahil Chachra