Here, we are interested in finding the relationship between a person’s height and his/her weight. Intuitively, we know that these two variables are positively correlated: taller people tend to be heavier than shorter people. For this example, height is our independent variable and weight is our dependent variable. To calculate the relationship between the two variables, we need some data. Assume that we have collected data from a pool of people, and that these data are tabulated and plotted as shown below:
By observing the scatter plot in Figure 1 (right), we can clearly see a linear relationship between the height and the weight of a person. It is also obvious that we can fit a straight line to describe the relationship between these two variables, which justifies choosing linear regression as the right tool for this problem. As you may already know, the formula for a straight line is y = mx + c,
where m is the slope and c is the y-intercept of the line. It is common to see these two parameters denoted as c = θ₀ and m = θ₁; however, in this example we will stick with m and c. At this point, we may ask: how can we find the best values for m and c? We are going to discuss that now.
Like most machine learning algorithms, we initialize our model by assigning random values to the parameters. In this example, we will start by setting all parameters in our model to 1 (m = 1, c = 1). Next, we examine how well our randomly initialized model is doing on our data. We measure the performance of our model during training using a loss function.
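As a rough illustration of this setup, here is a minimal Python sketch of the model y* = m·x + c with both parameters initialized to 1. The NumPy usage and the height values are my own assumptions for illustration, not the data from the table above:

```python
import numpy as np

# Initialize both parameters to 1, as described above
m, c = 1.0, 1.0

def predict(x, m, c):
    """Univariate linear model: y* = m * x + c."""
    return m * x + c

# Hypothetical heights in cm (not the data collected in this article)
heights = np.array([147.0, 160.0, 172.0, 178.0, 185.0])

print(predict(heights, m, c))  # [148. 161. 173. 179. 186.]
```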
The loss function (J) is employed to quantify the discrepancy between our predicted variable (y*) and the true target variable (y). The loss function used in linear regression is known as the Mean Squared Error (MSE) and is formulated as:

J = (1/n) Σ (y* − y)²

where n is the number of data points and the sum runs over all of them.
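Continuing from the sketch above, the MSE can be written in a couple of lines. The weight values here are again hypothetical, so the resulting loss will not match the numbers reported later in this article:

```python
def mse_loss(y_pred, y_true):
    """Mean Squared Error: average of the squared prediction errors."""
    return np.mean((y_pred - y_true) ** 2)

# Hypothetical true weights in kg, paired with the hypothetical heights above
weights = np.array([52.0, 60.0, 72.0, 80.5, 90.0])

loss = mse_loss(predict(heights, m, c), weights)
print(loss)  # very large with m = c = 1, since the predictions are far off
```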
Ideally, we would like a model that gives us the lowest possible loss value, or a loss equal to zero. This is because when the loss is zero, our model predicts a person’s weight that perfectly matches the data we have collected (y* = y). Thus, we can say that our model excels at describing the relationship between a person’s height and weight.
However, with m and c equal to 1, such that our model is y* = x + 1, the predicted weights are significantly different from those we have collected, leading to an incredibly high loss value. As shown in the table below, the average loss J is equal to 9135.12, which is very far from zero.
In order to optimize our linear regression model so that the predicted weights are as close as possible to the actual weights we have collected (in other words, so that the loss value is close to or equal to zero), we can employ gradient descent. This step is also known as the training step.
Gradient descent is a simple but incredibly useful optimization procedure, and it is the core idea that allows the learning process in machine learning algorithms to happen. As illustrated in Fig. 4, gradient descent usually involves two steps: (i) calculate the gradient of the loss w.r.t. the parameter you want to update (equations 1, 2, 3) and (ii) update the corresponding parameter using that gradient (equation 4).
The term α in equation 4 is known as the learning rate, and it decides the step size taken towards the optimal values of the parameters m and c. Intuitively, setting α to a small value leads to a slow convergence rate (i.e. a longer training time), while setting α to a large value can make the learning highly unstable. Therefore, it is critical to select an appropriate learning rate (just like selecting any other hyperparameter).
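The exact update equations are shown in the figure referenced above; one common way to write batch gradient descent for the MSE loss of y* = m·x + c is sketched below. The factor of 2 in the gradients, the learning rate, and the number of iterations are my assumptions for this sketch, and the data is still the hypothetical set from earlier:

```python
def gradient_descent(x, y, m, c, lr=1e-5, epochs=100_000):
    """Batch gradient descent on the MSE loss of y* = m * x + c."""
    n = len(x)
    for _ in range(epochs):
        error = predict(x, m, c) - y              # y* - y for every sample
        grad_m = (2.0 / n) * np.sum(error * x)    # dJ/dm
        grad_c = (2.0 / n) * np.sum(error)        # dJ/dc
        m -= lr * grad_m                          # parameter updates (step ii)
        c -= lr * grad_c
    return m, c

m, c = gradient_descent(heights, weights, m=1.0, c=1.0)
print(m, c, mse_loss(predict(heights, m, c), weights))
```

The learning rate is kept small here so the updates stay stable; note that without rescaling the heights, the intercept c converges very slowly compared to the slope m.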
When the training is done, we have our optimized linear regression model! This means that we can now estimate a person’s weight when we are provided with his/her height.
In our example, the final values of our parameters are m = 0.44 and c = 1.68. Say we have collected the height (x) and weight (y) of a new person (James): x = 178 cm, y = 80.5 kg. We can easily predict James’s weight: y* = 0.44 * 178 + 1.68 = 80 kg. Our model predicts that James weighs about 80 kg given that he is 178 cm tall, and in fact he weighs 80.5 kg. Therefore, our model is doing a fairly good job at predicting a person’s weight, given that the error it made is only 0.5 kg.
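This prediction is just the straight-line formula evaluated at James’s height, so it can be reproduced with one line of arithmetic:

```python
# y* = m * x + c with the final parameters reported above
print(0.44 * 178 + 1.68)  # 80.0 kg, versus the measured 80.5 kg
```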
For those who are interested in a coding exercise, please feel free to visit this notebook I have created on Google Colab. In this notebook, I have included very simple and easy-to-understand Python code to illustrate how to construct and train a univariate linear regression model from scratch.
Thank you for reading, and I hope this post is useful to you. Any comments or feedback are greatly appreciated.