diabetes_prediction

Predicts whether a patient has diabetes using the Pima Indians Diabetes Database from Kaggle (https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database).

Three models were used for this binary classification problem: decision tree, random forest, and K-nearest neighbors. Each model has its own .py file. Several data cleaning and feature engineering techniques (normalization, outlier removal/replacement, oversampling, etc.) were also tried while testing these models; these can be found in the code as well.
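As a rough illustration of how normalization and K-nearest neighbors fit together, here is a minimal standard-library sketch. It is not the repository's code: the function names (`fit_min_max`, `apply_min_max`, `knn_predict`), the two-feature layout (glucose, BMI), and the toy values are all assumptions made up for this example, and the actual scripts may rely on a library instead.

```python
import math
from collections import Counter


def fit_min_max(rows):
    """Return the per-column minimum and maximum of the training rows."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return lows, highs


def apply_min_max(rows, lows, highs):
    """Rescale each column with the given bounds (training data maps to [0, 1])."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def knn_predict(train_X, train_y, query, k=3):
    """Predict the majority label among the k nearest training rows."""
    distances = sorted(
        (math.dist(row, query), label) for row, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: [glucose, BMI] with 1 = diabetic, 0 = not diabetic.
train_X = [[85, 22.0], [90, 24.5], [100, 26.0],
           [150, 33.0], [165, 35.5], [180, 38.0]]
train_y = [0, 0, 0, 1, 1, 1]

# Fit the scaling on the training rows only, then reuse it for new rows.
lows, highs = fit_min_max(train_X)
scaled_train = apply_min_max(train_X, lows, highs)
scaled_queries = apply_min_max([[160, 34.0], [95, 25.0]], lows, highs)

predictions = [knn_predict(scaled_train, train_y, q) for q in scaled_queries]
print(predictions)
```

Normalization matters for K-nearest neighbors in particular because the distance would otherwise be dominated by whichever feature has the largest numeric range.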

When run, the code should produce a confusion matrix and evaluation metrics (accuracy, precision, recall, F1-score) for the model. The other README file details the specifics of running the code files.
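For reference, the sketch below shows how the confusion matrix counts and the four metrics relate for a binary classifier. It uses only the standard library and hypothetical labels; the helper names `confusion_counts` and `classification_scores` are assumptions for this example and are not taken from the repository.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives, treating label 1 as positive."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_scores(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 1 = diabetic, 0 = not diabetic.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
scores = classification_scores(y_true, y_pred)

# Confusion matrix laid out as rows = actual, columns = predicted (0, then 1).
print([[tn, fp], [fn, tp]])
print(scores)
```

On an imbalanced dataset, precision and recall are usually more informative than accuracy alone, which is one reason to report all four.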