This tutorial is for those who use the linear regression model and want to understand the math behind it. Here I am going to explain how linear regression works mathematically and how to implement it from scratch in Python.
We will use a random example with one independent variable and one dependent variable. Let's create them:
import numpy as np
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
So we have created two vectors that are correlated, with some random noise added so the data looks like real-world data (nothing is perfectly linear). For a better understanding we will plot them with Matplotlib's scatter plot:
import matplotlib.pyplot as plt
%matplotlib inline
plt.scatter(X, y)
Now we can see that the data is roughly linear but noisy, and that's OK.
Next, we must understand how a linear regression prediction works:
The predicted value is calculated from the feature vector multiplied by the corresponding weights (theta), plus a bias term.
In our case we have only one feature, so we will have only the bias term and one weight, theta 1.
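Written out, the prediction is:
ŷ = θ0 + θ1·x1 + … + θn·xn
where θ0 is the bias term and θ1…θn are the feature weights. In our one-feature case this reduces to ŷ = θ0 + θ1·x1.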
Our goal is to minimize the cost function, so we must find the best theta parameters. Here we can use the Normal Equation:
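θ = (Xᵀ·X)⁻¹·Xᵀ·y
where X is the matrix of feature vectors (with an extra column of ones for the bias term), Xᵀ is its transpose, (Xᵀ·X)⁻¹ is a matrix inverse, and y is the vector of target values.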
Here the main part of this tutorial starts. We will go through this equation step by step and then implement it in Python.
In linear algebra, the transpose of a matrix is an operator that flips the matrix over its diagonal; that is, it switches the row and column indices, producing a new matrix denoted Aᵀ (A.T in NumPy). Let's implement it in Python with NumPy:
A = np.array([[10, 20, 30],
              [40, 50, 60]])
A
Output: array([[10, 20, 30],
       [40, 50, 60]])
We can create the transpose function:
def transpose(matrix):
    # allocate an array with swapped dimensions (int16 is enough for this small integer example)
    transposed = np.empty((matrix.shape[1], matrix.shape[0]), dtype="int16")
    # copy element (i, j) of the input to position (j, i) of the result
    for i in range(matrix.shape[0]):
        for j in range(matrix.shape[1]):
            transposed[j, i] = matrix[i, j]
    return transposed
At = transpose(A)
At
Output: array([[10, 40],
       [20, 50],
       [30, 60]], dtype=int16)
Also, we can simply use the .T attribute:
A.T
Output: array([[10, 40],
       [20, 50],
       [30, 60]])
It is easier and gives the same result, but the function above was provided so you can better understand what is happening.
Now let's talk about the matrix inverse:
First, we must calculate the determinant of our matrix. In linear algebra, the determinant is a scalar value that can be computed from the elements of a square matrix and encodes certain properties of the linear transformation described by the matrix. Here you can see how it is calculated:
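For a 2×2 matrix the formula is:
det([[a, b], [c, d]]) = a·d − b·c
For the 2×2 matrix M defined below, this gives 1·4 − 2·3 = −2.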
That was the first step of computing a matrix inverse. We can do it manually or use NumPy's method:
import numpy.linalg as LA
M = np.array([[1, 2],
              [3, 4]])
LA.det(M)
Output: -2.0000000000000004
The next step:
We go through the elements one by one, treating each as the center of a cross. We remove all elements that lie on that cross and keep only the rest, as shown in the picture. For a 2×2 matrix this is simple; for larger matrices you use the same cross, but you end up building a matrix of smaller matrices:
Then we replace those small matrices with their determinants (which can be negative). In our case we get this matrix of minors:
M1 = np.array([[4, 3],
               [2, 1]])
After this step, we must apply the alternating sign pattern: each element is multiplied by +1 or −1 depending on its position (the element in row i, column j gets the sign (−1)^(i+j)). This gives the matrix of cofactors:
M1 = np.array([[4, -3],
               [-2, 1]])
Now we must reflect all numbers along the main diagonal (in other words, transpose the matrix of cofactors):
M1 = np.array([[4, -2],
               [-3, 1]])
After this we just divide all elements by the determinant of the original matrix:
M_1 = M1/(LA.det(M))
M_1
Output: array([[-2. ,  1. ],
       [ 1.5, -0.5]])
As always we can easily do it with numpy’s method:
LA.inv(M)
Output: array([[-2. ,  1. ],
       [ 1.5, -0.5]])
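To tie the manual steps together, here is a minimal sketch of a from-scratch 2×2 inverse that follows exactly the steps above (minors, sign pattern, reflection along the diagonal, division by the determinant); the function name inverse_2x2 is just for illustration:
def inverse_2x2(matrix):
    a, b = matrix[0]
    c, d = matrix[1]
    det = a * d - b * c                          # determinant of the original matrix
    minors = np.array([[d, c],
                       [b, a]])                  # matrix of minors
    cofactors = minors * np.array([[1, -1],
                                   [-1, 1]])     # apply the alternating sign pattern
    adjugate = cofactors.T                       # reflect along the main diagonal
    return adjugate / det                        # divide by the determinant
Calling inverse_2x2(M) reproduces the result of LA.inv(M) above.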
The last step is to understand matrix multiplication:
It is easiest to understand from an example, shown below. After that we know everything we need and can compute theta.
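Taking two small matrices (the names B and C are used here just for illustration), the element in row i, column j of the product is the dot product of row i of the first matrix with column j of the second:
B = np.array([[1, 2],
              [3, 4]])
C = np.array([[5, 6],
              [7, 8]])
B.dot(C)
Output: array([[19, 22],
       [43, 50]])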
First, we must add a column of ones to the feature matrix; we need this for the bias term:
X_b = np.c_[np.ones((100, 1)), X] # add x0 = 1 to each instance
Now we can compute our theta:
theta = LA.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)
theta
Output: array([[3.94500004],
       [2.92708698]])
As we can see, our results are very close to the 4 and 3 we used at the beginning.
To make predictions on new data, we just multiply the new X (with the ones column added) by our theta:
X_new = np.array([[0], [2]])
X_new_b = np.c_[np.ones((2,1)), X_new]
y_pred = X_new_b.dot(theta)
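With the theta found above, this gives approximately 3.945 for x = 0 and 9.799 for x = 2, i.e. the two ends of the red prediction line plotted below.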
We can also plot it:
plt.plot(X_new, y_pred, "r-")
plt.plot(X,y,"b.")
plt.axis([0,2,0,15])
plt.title("Predictions")
plt.show()
As an extra example, we can see how it works with sklearn:
from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(X, y)
lin_reg.intercept_, lin_reg.coef_
Output: (array([3.94500004]), array([[2.92708698]]))
So it finds the same best coefficients.
Thank you for your attention!!!
Credit: BecomingHuman By: Taras Rumezhak