In the previous chapters, we built a model that classifies customers as either subscribed or not subscribed. But how accurate are the model's predictions?
To answer this, we use a metric. A metric is a function that takes the predictions the model makes and compares them with the actual values. This comparison is quite useful because it lets us compare different models and select the one with the best metric value.
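As a minimal sketch of the idea, a metric can be as simple as a function that takes the actual values and the predictions and returns a single number (the toy values below are made up purely for illustration):

import numpy as np

def metric(y_true, y_pred):
    # Accuracy: the fraction of predictions that match the actual values
    return (np.asarray(y_true) == np.asarray(y_pred)).mean()

metric([True, False, True, True, False],
       [True, False, True, False, False])
# 0.8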
In the previous week, we saw an implementation of the accuracy metric, which measures the percentage of correct predictions the model makes. We already calculated the accuracy of our classification model in the previous chapter: it was 0.9, which means that our model is 90% accurate. Recall the Python implementation:
y_pred_val = model.predict_proba(X_val)[:, 1]  # soft predictions: probability of the positive class
subscribed = y_pred_val >= 0.5                 # hard predictions: True/False at the 0.5 threshold
(y_val == subscribed).mean()                   # fraction of correct predictions
# 0.9
We transformed the 'soft' predictions into 'hard' predictions using a threshold value, 0.5 in this case. But why did we choose 0.5 as the threshold and not some other number? That was an arbitrary choice. We can check other threshold values and choose the one with the best accuracy score.
The Scikit-Learn library offers a variety of metrics, including accuracy, which we can use to find the best threshold value. Building on the project from the 3rd week:
import numpy as np
from sklearn.metrics import accuracy_score

thresholds = np.linspace(0, 1, 11)  # 11 evenly spaced thresholds from 0.0 to 1.0

print(f"{'Threshold':^9}{'Accuracy':^12}")
for t in thresholds:
    subscribed = y_pred_val >= t             # hard predictions at threshold t
    acc = accuracy_score(y_val, subscribed)  # accuracy for this threshold
    print(f'{t:^9.2f} {acc:^10.4f}')
#Threshold Accuracy
# 0.00 0.1159
# 0.10 0.6515
# 0.20 0.8693
# 0.30 0.8890
# 0.40 0.8923
# 0.50 0.8940
# 0.60 0.8937
# 0.70 0.8916
# 0.80 0.8883
# 0.90 0.8848
# 1.00 0.8841
As we can see, the threshold of 0.5 gives us the best accuracy (0.8940).
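If we prefer not to pick the best row by eye, a short sketch (reusing y_val, y_pred_val, and thresholds from above) can select the threshold with the highest accuracy programmatically:

# Evaluate the accuracy at each threshold and keep the best one
scores = [accuracy_score(y_val, y_pred_val >= t) for t in thresholds]
best_threshold = thresholds[np.argmax(scores)]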
An accuracy of 90% seems like a good number, but is it the best value possible? And would a much simpler model score higher or lower than 90%?
A dummy model, or baseline model, is useful because it gives us a reference value for the metric. In our example, the dataset is imbalanced, so the dummy model can simply always predict the majority class; in other words, it will always output False. We can create such a dummy model quickly:
size_val = len(y_val)                  # number of validation examples
baseline = np.repeat(False, size_val)  # always predict the majority class (False)
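As a side note, Scikit-Learn also provides a DummyClassifier that builds the same majority-class baseline from the training data. A sketch, assuming the X_train and y_train from the previous chapters are still available:

from sklearn.dummy import DummyClassifier

# 'most_frequent' always predicts the majority class seen in y_train
dummy = DummyClassifier(strategy='most_frequent')
dummy.fit(X_train, y_train)
baseline = dummy.predict(X_val)  # all False for our imbalanced dataset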
Now we can check the accuracy of this baseline prediction: