ROC curve and AUC score

ROC curves let us evaluate the performance of a model across all possible threshold choices. A ROC (Receiver Operating Characteristic) curve shows how well a model can separate two classes, positive and negative.

We need two metrics for ROC curves: TPR and FPR, or true positive rate and false positive rate.

True positive rate and false positive rate
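
Both rates come from the confusion table. TPR is the share of actual positives the model catches, TP / (TP + FN), and FPR is the share of actual negatives it wrongly flags as positive, FP / (FP + TN). The following is a minimal sketch at a single threshold, assuming toy labels and scores invented here for illustration; the names `y_score` and `threshold` are hypothetical and not part of the original code.

```python
import numpy as np

# Toy data, made up for illustration only (not the validation set).
y_true = np.array([1, 0, 1, 1, 0, 0, 0, 1])
y_score = np.array([0.9, 0.4, 0.35, 0.8, 0.1, 0.6, 0.2, 0.3])

threshold = 0.5  # assumed cut-off for this example
pred_pos = y_score >= threshold
actual_pos = y_true == 1

# The four cells of the confusion table
tp = (pred_pos & actual_pos).sum()
fn = (~pred_pos & actual_pos).sum()
fp = (pred_pos & ~actual_pos).sum()
tn = (~pred_pos & ~actual_pos).sum()

tpr = tp / (tp + fn)  # true positive rate
fpr = fp / (fp + tn)  # false positive rate
print(f"TPR={tpr:.2f}, FPR={fpr:.2f}")
```

With these toy values, two of the four positives score above the threshold and one of the four negatives does, so the sketch reports a TPR of 0.50 and an FPR of 0.25.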

Evaluating a model at multiple thresholds

The idea is similar to what we previously did with accuracy, but instead of recording just one value, we record all four outcomes of the confusion table at each threshold.

import numpy as np
import pandas as pd

def tpr_fpr_dt(y_true, y_pred):
    # Collect the confusion-table counts at each threshold
    scores = []
    for t in np.linspace(0, 1, 11):  # thresholds 0.0, 0.1, ..., 1.0
        predictions_true = y_pred >= t
        predictions_false = y_pred < t
        actual_true = y_true == 1
        actual_false = y_true == 0

        TN = (predictions_false & actual_false).sum()
        FP = (predictions_true & actual_false).sum()
        FN = (predictions_false & actual_true).sum()
        TP = (predictions_true & actual_true).sum()
        scores.append([t, TN, FP, FN, TP])

    scores = pd.DataFrame(data=scores, columns=["threshold", "TN", "FP", "FN", "TP"])
    # FPR: share of actual negatives predicted positive
    scores["FPR"] = scores["FP"] / (scores["FP"] + scores["TN"])
    # TPR: share of actual positives predicted positive
    scores["TPR"] = scores["TP"] / (scores["TP"] + scores["FN"])
    return scores

scores = tpr_fpr_dt(y_val, y_pred_val)
scores
    threshold    TN    FP    FN    TP       FPR       TPR
0         0.0     0  7994     0  1048  1.000000  1.000000
1         0.1  5154  2840   311   737  0.355266  0.703244
2         0.2  7421   573   609   439  0.071679  0.418893
3         0.3  7689   305   699   349  0.038154  0.333015
4         0.4  7821   173   801   247  0.021641  0.235687
5         0.5  7901    93   865   183  0.011634  0.174618
6         0.6  7950    44   917   131  0.005504  0.125000
7         0.7  7971    23   957    91  0.002877  0.086832
8         0.8  7980    14   996    52  0.001751  0.049618
9         0.9  7992     2  1040     8  0.000250  0.007634
10        1.0  7994     0  1048     0  0.000000  0.000000

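Both TPR and FPR start at 100% at a threshold of 0.0. There we predict ‘subscribe’ for everyone, so there are no negative predictions: every actual positive is a true positive and every actual negative is a false positive. As the threshold rises, both rates fall, and at 1.0 both reach 0% because no one is predicted positive.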
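
The two extreme rows can be checked by hand from their counts. The sketch below copies the counts from the threshold 0.0 and 1.0 rows of the table and recomputes the rates; the `rows` and `rates` dictionaries are hypothetical helpers introduced only for this check.

```python
# Counts copied from the threshold 0.0 and 1.0 rows of the table above.
rows = {
    0.0: {"TN": 0, "FP": 7994, "FN": 0, "TP": 1048},
    1.0: {"TN": 7994, "FP": 0, "FN": 1048, "TP": 0},
}

rates = {}
for t, c in rows.items():
    fpr = c["FP"] / (c["FP"] + c["TN"])  # false positives over all actual negatives
    tpr = c["TP"] / (c["TP"] + c["FN"])  # true positives over all actual positives
    rates[t] = (fpr, tpr)
    print(f"threshold={t}: FPR={fpr:.1f}, TPR={tpr:.1f}")
```

At 0.0 all 7994 actual negatives are false positives and all 1048 actual positives are true positives, so both rates are 1.0. At 1.0 the counts flip and both rates are 0.0.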

With FPR and TPR computed, let’s plot them against the threshold:

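import matplotlib.pyplot as plt
import seaborn as sns

# Passing data= by keyword also works on seaborn versions older than 0.12
sns.lineplot(data=scores, x='threshold', y="TPR", label="TPR")
sns.lineplot(data=scores, x='threshold', y="FPR", label="FPR", ls='--')
plt.legend()
plt.title("TPR and FPR")
plt.ylabel(None);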

[Figure: TPR and FPR plotted against the threshold]

ROC curve