Bank Marketing project

To introduce the concepts of classification, we'll use an example. Here, the objective is to predict whether a client will subscribe to a term deposit.

For this purpose, we will:

  1. Download the dataset and perform some initial preparation.
  2. Split the dataset into training, validation and test sets. This lets us follow a validation strategy: the model is evaluated on data it was not trained on, so we can check that its predictions are accurate.
  3. Examine feature importance to identify which features are most useful for predicting the target.
  4. Transform the data, if necessary.
  5. Train a logistic regression model.
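Step 2 can be sketched as a shuffled split of the rows into three disjoint parts. This is a minimal sketch, assuming a 60/20/20 split and a fixed random seed; split_train_val_test is a hypothetical helper, not part of any library. The idea is to fit on the training set, compare models on the validation set, and evaluate on the test set only once at the end.

```python
import numpy as np
import pandas as pd


def split_train_val_test(df, val_frac=0.2, test_frac=0.2, seed=1):
    """Shuffle df and split it into train/validation/test DataFrames.

    Hypothetical helper: the 60/20/20 default and the seed are illustrative choices.
    """
    n = len(df)
    n_val = int(n * val_frac)
    n_test = int(n * test_frac)
    n_train = n - n_val - n_test

    # Shuffle the row positions reproducibly, then cut them into three blocks
    idx = np.random.default_rng(seed).permutation(n)
    train_idx = idx[:n_train]
    val_idx = idx[n_train:n_train + n_val]
    test_idx = idx[n_train + n_val:]

    def take(positions):
        # Select rows by position and reset the index so each part starts at 0
        return df.iloc[positions].reset_index(drop=True)

    return take(train_idx), take(val_idx), take(test_idx)


# Usage on a small synthetic frame with 100 rows
df = pd.DataFrame({"x": range(100), "y": [0, 1] * 50})
train, val, test = split_train_val_test(df)
print(len(train), len(val), len(test))  # 60 20 20
```

scikit-learn's train_test_split, applied twice, achieves the same result; the manual version above just makes the three slices explicit.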

[Figure: the dataset]

Downloading the dataset

To download the dataset from a Jupyter notebook, we can run wget as a shell command (the ! prefix tells Jupyter to run the line in the shell):

!wget https://archive.ics.uci.edu/static/public/222/bank+marketing.zip
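The ! prefix only works inside Jupyter/IPython. Outside a notebook, the same download can be sketched with the standard library's urllib.request.urlretrieve. The download helper below is hypothetical, and unlike wget it skips the download when the destination file already exists, so re-running the cell does not fetch the archive again.

```python
import urllib.request
from pathlib import Path

# URL taken from the wget command above
URL = "https://archive.ics.uci.edu/static/public/222/bank+marketing.zip"


def download(url: str, dest: str) -> Path:
    """Save the resource at `url` to the local file `dest` (similar to wget)."""
    dest_path = Path(dest)
    if dest_path.exists():
        # Already downloaded: reuse the local copy
        return dest_path
    filename, _headers = urllib.request.urlretrieve(url, str(dest_path))
    return Path(filename)


# Usage (requires network access):
# download(URL, "bank+marketing.zip")
```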

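The downloaded archive contains two further archives, bank.zip and bank-additional.zip, so we unzip all three (-o overwrites existing files without prompting):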

!unzip -o bank+marketing.zip
!unzip -o bank-additional.zip
!unzip -o bank.zip
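The three unzip calls can also be reproduced in Python with the standard zipfile module. This is a minimal sketch; extract_nested is a hypothetical helper that extracts the outer archive and then any .zip files it contained into the same directory. ZipFile.extractall overwrites existing files, which matches the -o flag.

```python
import zipfile
from pathlib import Path


def extract_nested(zip_path, out_dir=".") -> list:
    """Extract zip_path into out_dir, then extract every .zip it contained.

    Returns the names of all extracted members (outer and inner).
    """
    out = Path(out_dir)
    with zipfile.ZipFile(zip_path) as outer:
        names = outer.namelist()
        outer.extractall(out)

    extracted = list(names)
    for name in names:
        if name.endswith(".zip"):
            # Inner archive (e.g. bank.zip): unpack it next to the outer files
            with zipfile.ZipFile(out / name) as inner:
                extracted.extend(inner.namelist())
                inner.extractall(out)
    return extracted


# Usage, after downloading the archive:
# extract_nested("bank+marketing.zip")
```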

Initial data preparation

First, we import the packages we will use to work with the dataset:

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import mplcyberpunk  # third-party theme (pip install mplcyberpunk); importing it registers the "cyberpunk" style

plt.style.use("cyberpunk")
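With the packages in place, a possible first preparation step is to load the data and encode the target as 0/1. This sketch assumes the extracted file is bank-full.csv, that it uses ; as the separator, and that the target column is y with values "yes"/"no"; load_bank is a hypothetical helper, so check these assumptions against the files you actually extracted.

```python
import io

import pandas as pd


def load_bank(source) -> pd.DataFrame:
    """Read a ';'-separated Bank Marketing CSV and apply light cleanup.

    Assumption: the target column is named "y" and contains "yes"/"no".
    """
    df = pd.read_csv(source, sep=";")
    # Lower-case the column names and replace '.' with '_'
    df.columns = df.columns.str.lower().str.replace(".", "_", regex=False)
    # Encode the target: 1 = subscribed to a term deposit, 0 = did not
    df["y"] = (df["y"] == "yes").astype(int)
    return df


# Usage on the real file (file name is an assumption):
# df = load_bank("bank-full.csv")

# Small in-memory sample in the same format
sample = io.StringIO('"age";"job";"y"\n30;"admin.";"no"\n41;"technician";"yes"\n')
df = load_bank(sample)
print(df)
```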