diff --git a/Supervised learning with scikit-learn/03-fine-tuning-your-model/AUC computation b/Supervised learning with scikit-learn/03-fine-tuning-your-model/AUC computation
new file mode 100644
index 0000000..49f4c01
--- /dev/null
+++ b/Supervised learning with scikit-learn/03-fine-tuning-your-model/AUC computation
@@ -0,0 +1,30 @@
+AUC computation:
+
+Say you have a binary classifier that is in fact just guessing at random. If it guesses each class with equal probability, it would be correct approximately 50% of the time, and the resulting ROC curve would be (approximately) a diagonal line along which the True Positive Rate and False Positive Rate are equal. The area under this ROC curve would be 0.5. This is one reason the AUC, which Hugo discussed in the video, is an informative metric for evaluating a model: if the AUC is greater than 0.5, the model is doing better than random guessing. Always a good sign!
+
+In this exercise, you'll calculate AUC scores using the roc_auc_score() function from sklearn.metrics, as well as by performing cross-validation on the diabetes dataset.
+
+X and y, along with the training and test sets X_train, X_test, y_train, y_test, have been pre-loaded for you, and a logistic regression classifier logreg has been fit to the training data.
+
+Instructions:
+
+1. Import roc_auc_score from sklearn.metrics and cross_val_score from sklearn.model_selection.
+2. Using the logreg classifier, which has been fit to the training data, compute the predicted probabilities of the positive class for the test set X_test. Save the result as y_pred_prob.
+3. Compute the AUC score using the roc_auc_score() function, the test set labels y_test, and the predicted probabilities y_pred_prob.
+4. Compute the AUC scores by performing 5-fold cross-validation. Use the cross_val_score() function and set the scoring parameter to 'roc_auc'.
+
+# Import necessary modules
+from sklearn.metrics import roc_auc_score
+from sklearn.model_selection import cross_val_score
+
+# Compute predicted probabilities of the positive class (column 1): y_pred_prob
+y_pred_prob = logreg.predict_proba(X_test)[:, 1]
+
+# Compute and print AUC score
+print("AUC: {}".format(roc_auc_score(y_test, y_pred_prob)))
+
+# Compute cross-validated AUC scores (logreg is refit on each fold): cv_auc
+cv_auc = cross_val_score(logreg, X, y, cv=5, scoring='roc_auc')
+
+# Print list of AUC scores
+print("AUC scores computed using 5-fold cross-validation: {}".format(cv_auc))
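
A minimal sketch for checking the exercise's claims outside the course environment, under stated assumptions: the pre-loaded diabetes data is not available here, so make_classification generates a synthetic stand-in binary dataset, and pairwise_auc is a hypothetical helper written only for illustration. The sketch compares the AUC of a fitted logistic regression with the AUC of scores drawn at random (which should land near 0.5), and checks roc_auc_score against the pairwise definition of AUC: the fraction of (positive, negative) pairs in which the positive sample receives the higher score.

```python
# Sketch: random scores give an AUC near 0.5; a fitted model does better.
# Assumption: synthetic data from make_classification stands in for the
# diabetes dataset used in the exercise.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=2000, n_features=8, n_informative=4,
                           n_clusters_per_class=1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42)

logreg = LogisticRegression(max_iter=1000)
logreg.fit(X_train, y_train)

# Probability of the positive class (column 1 of predict_proba)
y_pred_prob = logreg.predict_proba(X_test)[:, 1]
model_auc = roc_auc_score(y_test, y_pred_prob)

# "Random guessing": scores drawn independently of the true labels
rng = np.random.default_rng(0)
random_scores = rng.random(len(y_test))
random_auc = roc_auc_score(y_test, random_scores)


# Hypothetical helper: AUC as the fraction of (positive, negative) pairs
# where the positive sample scores higher; ties count as one half.
def pairwise_auc(y_true, scores):
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size


# cross_val_score clones and refits the estimator on each training fold
cv_auc = cross_val_score(logreg, X, y, cv=5, scoring="roc_auc")

print(f"Model AUC:            {model_auc:.3f}")
print(f"Random-score AUC:     {random_auc:.3f}")
print(f"Pairwise AUC check:   {pairwise_auc(y_test, y_pred_prob):.3f}")
print(f"5-fold CV AUC scores: {np.round(cv_auc, 3)}")
```

Exact numbers depend on the random seeds; the point is that the random-score AUC hovers around 0.5, while the fitted model's AUC sits well above it and matches the pairwise count.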