Linear regression

To predict the laptop’s price we’ll use linear regression, a supervised learning method for predicting a numerical target.

We already have a dataset that contains a matrix $X$ of features and a vector $y$ with the values we want to predict. Recall the general form that represents a supervised machine learning model:

$$ y \thickapprox g(X) $$

To make the next concepts easier to grasp, we take a single record, which we represent with $x_i$, and its target $y_i$, such that:

$$ y_i \thickapprox g(x_i) $$

Where $x_i = (x_{i1}, x_{i2}, x_{i3}, ..., x_{in})$ is a single record that has $n$ features.

There are many forms the function $g$ could take, but to do justice to the name, we’ll establish that $g$ is a linear relation between the features of $x_i$ and the target $y_i$. It has the following form:

$$ y_i = g(x_i) = g(x_{i1}, x_{i2}, x_{i3}, ..., x_{in}) = w_0 + x_{i1}w_1 + x_{i2}w_2 + x_{i3}w_3 + ... + x_{in}w_n $$

Where $w_0, w_1, w_2, ..., w_n$ are the variables that establish the relation between the features and the target. These variables are normally called the weights. In particular, $w_0$ is the bias term.

Let’s use a shorter notation:

$$ y_i = w_0 + \sum_{k=1}^{n} x_{ik}w_k $$
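As an illustration (with made-up numbers, not taken from the laptop dataset), consider a record with $n = 2$ features, $x_i = (2, 3)$, and weights $w = (1, 0.5, 2)$:

$$ y_i = w_0 + x_{i1}w_1 + x_{i2}w_2 = 1 + 2 \cdot 0.5 + 3 \cdot 2 = 8 $$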

Based on this relationship, we can implement the prediction step of linear regression, which in Python would look like this:

```python
def linear_regression(X, w):
    # w[0] is the bias term; w[1:] are the feature weights
    return w[0] + X.dot(w[1:])
```
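As a quick sanity check, here is a small usage sketch with made-up numbers (the feature matrix and weights below are illustrative, not from the laptop dataset):

```python
import numpy as np

X = np.array([[2.0, 3.0],
              [1.0, 5.0]])      # two records, two features each
w = np.array([1.0, 0.5, 2.0])   # bias w[0] plus one weight per feature

print(linear_regression(X, w))  # prints [ 8.  11.5]
```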

How do we select the correct weights for the $w$ vector?

Easy: we learn the weights from the data. In the previous block we had the following expression:

$$ Xw = y $$

The matrix $X$ may not be a square matrix, so we cannot simply invert it to solve for $w$. A way around this is to multiply both sides by $X^T$, which produces the square matrix $X^TX$:

$$ X^TXw = X^Ty $$
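A minimal sketch, assuming NumPy, of how $w$ could be obtained from this system when $X^TX$ is invertible (the helper name `train_linear_regression` is ours, introduced here only for illustration):

```python
import numpy as np

def train_linear_regression(X, y):
    # prepend a column of ones so that w[0], the bias term, is learned as well
    ones = np.ones((X.shape[0], 1))
    X = np.column_stack([ones, X])

    # solve the normal equation (X^T X) w = X^T y for w
    XTX = X.T.dot(X)
    w = np.linalg.solve(XTX, X.T.dot(y))
    return w
```

The returned vector contains the bias term followed by the feature weights, so it can be passed straight to the `linear_regression` function above.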