ROC curve and AUC score

ROC curves let us evaluate the performance of a model across all possible threshold choices. A ROC (Receiver Operating Characteristic) curve shows how well a model can separate two classes, positive and negative.

We need two metrics for ROC curves: TPR and FPR, or true positive rate and false positive rate.

True positive rate and false positive rate

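False positive rate (FPR): the fraction of false positives among all negative examples, $FPR = \frac{FP}{FP+TN}$
True positive rate (TPR): the fraction of true positives among all positive examples, $TPR = \frac{TP}{TP+FN}$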
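
Both rates can be computed directly from the four confusion-table counts. The snippet below is a minimal sketch: the helper name `rates_from_counts` and the counts passed to it are hypothetical, chosen only to illustrate the arithmetic.

```python
def rates_from_counts(tp, fp, fn, tn):
    """Return (TPR, FPR) from the four confusion-table counts."""
    tpr = tp / (tp + fn)  # share of actual positives predicted as positive
    fpr = fp / (fp + tn)  # share of actual negatives predicted as positive
    return tpr, fpr


# Hypothetical counts, not taken from any real model
tpr, fpr = rates_from_counts(tp=40, fp=5, fn=10, tn=45)
print(f"TPR={tpr:.2f}, FPR={fpr:.2f}")
```

With these counts, 40 of the 50 actual positives are caught and 5 of the 50 actual negatives are flagged by mistake.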

Evaluating a model at multiple thresholds

The idea is similar to what we previously did with accuracy, but instead of recording just one value per threshold, we record all four outcomes of the confusion table.

import numpy as np
import pandas as pd

def tpr_fpr_dt(y_true, y_pred):
    # Collect one confusion-table row per threshold
    scores = []
    for t in np.linspace(0, 1, 11):
        # Predict positive when the score reaches the threshold
        predictions_true = y_pred >= t
        predictions_false = y_pred < t
        actual_true = y_true == 1
        actual_false = y_true == 0

        # Count the four outcomes of the confusion table
        TN = (predictions_false & actual_false).sum()
        FP = (predictions_true & actual_false).sum()
        FN = (predictions_false & actual_true).sum()
        TP = (predictions_true & actual_true).sum()
        scores.append([t, TN, FP, FN, TP])

    scores = pd.DataFrame(data=scores, columns=["threshold", "TN", "FP", "FN", "TP"])
    # Derive both rates from the counts
    scores["FPR"] = scores["FP"] / (scores["FP"] + scores["TN"])
    scores["TPR"] = scores["TP"] / (scores["TP"] + scores["FN"])
    return scores

scores = tpr_fpr_dt(y_val, y_pred_val)
scores

	threshold	TN	FP	FN	TP	FPR	TPR
0	0.0	0	7994	0	1048	1.000000	1.000000
1	0.1	5154	2840	311	737	0.355266	0.703244
2	0.2	7421	573	609	439	0.071679	0.418893
3	0.3	7689	305	699	349	0.038154	0.333015
4	0.4	7821	173	801	247	0.021641	0.235687
5	0.5	7901	93	865	183	0.011634	0.174618
6	0.6	7950	44	917	131	0.005504	0.125000
7	0.7	7971	23	957	91	0.002877	0.086832
8	0.8	7980	14	996	52	0.001751	0.049618
9	0.9	7992	2	1040	8	0.000250	0.007634
10	1.0	7994	0	1048	0	0.000000	0.000000

Both TPR and FPR start at 100%: at the threshold of 0.0 we predict ‘subscribe’ for everyone, so there are no negative predictions and both TN and FN are 0:

$TN = 0 \Rightarrow FPR = \frac{FP}{FP+TN} = \frac{FP}{FP} = 1$
$FN = 0 \Rightarrow TPR = \frac{TP}{TP+FN} = \frac{TP}{TP} = 1$
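
The two extremes of the threshold range can be checked on a handful of examples. This is a small sketch in plain Python; the labels, the scores, and the helper `confusion_counts` are hypothetical and are not the validation data used above.

```python
# Hypothetical labels and predicted probabilities, for illustration only
y_true = [1, 0, 1, 0, 0, 1]
y_score = [0.9, 0.2, 0.4, 0.6, 0.1, 0.7]


def confusion_counts(y_true, y_score, t):
    """Count TN, FP, FN, TP when predicting positive for score >= t."""
    tn = fp = fn = tp = 0
    for label, score in zip(y_true, y_score):
        predicted_positive = score >= t
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


# Threshold 0.0: every example is predicted positive, so TN = FN = 0
tn0, fp0, fn0, tp0 = confusion_counts(y_true, y_score, t=0.0)
tpr_at_zero = tp0 / (tp0 + fn0)
fpr_at_zero = fp0 / (fp0 + tn0)

# Threshold 1.0: no score reaches it, so TP = FP = 0
tn1, fp1, fn1, tp1 = confusion_counts(y_true, y_score, t=1.0)
tpr_at_one = tp1 / (tp1 + fn1)
fpr_at_one = fp1 / (fp1 + tn1)

print(tpr_at_zero, fpr_at_zero, tpr_at_one, fpr_at_one)
```

At threshold 0.0 both rates equal 1, and at threshold 1.0 both equal 0, which mirrors the first and last rows of the table.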

With FPR and TPR computed, let’s plot them:

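import matplotlib.pyplot as plt
import seaborn as sns

# Plot both rates against the threshold on the same axes
sns.lineplot(data=scores, x='threshold', y="TPR", label="TPR")
sns.lineplot(data=scores, x='threshold', y="FPR", label="FPR", ls='--')
plt.legend()
plt.title("TPR and FPR")
plt.ylabel(None);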

Ideally, FPR should drop very quickly as the threshold rises. A small FPR means the model rarely misclassifies negative examples as positive (few false positives).
TPR should go down slowly, ideally staying near 100% across thresholds: that means the model identifies most positive examples correctly.
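
To see what this ideal behaviour looks like in numbers, here is a rough sketch with made-up scores from a model that separates the classes perfectly. The data and the helper `tpr_fpr` are assumptions for illustration, not output of the model evaluated above.

```python
# Hypothetical scores: every negative scores below 0.4, every positive above 0.6
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_score = [0.05, 0.12, 0.22, 0.31, 0.68, 0.77, 0.88, 0.96]


def tpr_fpr(y_true, y_score, t):
    """Return (TPR, FPR) when predicting positive for score >= t."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    tpr = sum(s >= t for s in positives) / len(positives)
    fpr = sum(s >= t for s in negatives) / len(negatives)
    return tpr, fpr


thresholds = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0
curve = [(t, *tpr_fpr(y_true, y_score, t)) for t in thresholds]

for t, tpr, fpr in curve:
    print(f"t={t:.1f}  TPR={tpr:.2f}  FPR={fpr:.2f}")
```

For these scores FPR reaches 0 by threshold 0.4 while TPR stays at 1 through threshold 0.6, so any threshold in between yields no false positives and no false negatives. A real model will show a less clean gap between the two curves.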