Here, we are interested in finding the relationship between a person’s height and weight. Intuitively, we know that these two variables are positively correlated: taller people tend to be heavier than shorter people. For this example, height is our independent variable and weight is our dependent variable. To calculate the relationship between the two variables, we need some data. Assume that we have collected data from a pool of people, and that these data are tabulated and plotted as shown below:

By observing the scatter plot in Figure 1 (right), we can clearly see a linear relationship between the height and the weight of a person. It is also obvious that we can fit a straight line to describe the relationship between these two variables, which justifies choosing linear regression as the right tool for this problem. As you may already know, the formula for a straight line is

y = mx + c

where *m* is the slope and *c* is the y-intercept of the line. These two parameters are commonly denoted as *c* = θ₀ and *m* = θ₁; however, in this example, we will stick with *m* and *c*. At this point, we may ask: how can we find the best values for *m* and *c*? We are going to discuss that now.

Like most machine learning algorithms, we initialize our model by assigning random values to its parameters. In this example, we will start by setting all parameters in our model to 1 (m = 1, c = 1). Next, we examine how well our randomly initialized model is doing on our data. We measure the performance of our model during training using a loss function.
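As a quick sketch of this step, here is how initializing the parameters and producing predictions might look in Python. The heights and weights below are hypothetical stand-ins, since the original dataset is not shown:

```python
# Hypothetical data: heights in cm (x) and true weights in kg (y)
heights = [150.0, 160.0, 170.0, 180.0, 190.0]
weights = [55.0, 62.0, 70.0, 78.0, 85.0]

# Initialize the parameters as in the text: m = 1, c = 1
m, c = 1.0, 1.0

def predict(x, m, c):
    """Linear model: y* = m*x + c."""
    return m * x + c

# Predictions of our freshly initialized (and so far untrained) model
predictions = [predict(x, m, c) for x in heights]
print(predictions)  # far from the true weights above
```

With m = 1 and c = 1, the model simply returns each height plus one, which is nowhere near the true weights; the loss function below quantifies exactly how far off it is.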


**Loss function**

The loss function (J) measures the discrepancy between our predicted variable (y*) and the true target variable (y). The loss function used in linear regression is known as the Mean Squared Error (MSE) and is formulated as:

J = (1/N) Σᵢ (yᵢ* − yᵢ)²

where N is the number of data points.

Ideally, we would like a model that gives us the lowest possible loss value, ideally a loss of zero. This is because when the loss is zero, our model predicts each person’s weight in perfect agreement with the data we have collected (y* = y). In that case, we can say that our model excels at describing the relationship between a person’s height and weight.

However, with m and c equal to 1, so that our model is y* = x + 1, the predicted weights differ significantly from the weights we collected, leading to a very high loss value. As shown in the table below, the average loss J is equal to 9135.12, which is far from zero.
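A minimal MSE implementation in Python makes this concrete. The data below is hypothetical (the original dataset behind the 9135.12 figure is not shown, so this sketch will not reproduce that exact value), but it illustrates how badly an untrained model with m = 1 and c = 1 scores:

```python
# Hypothetical data: heights in cm (x) and true weights in kg (y)
heights = [150.0, 160.0, 170.0, 180.0, 190.0]
weights = [55.0, 62.0, 70.0, 78.0, 85.0]

def mse_loss(xs, ys, m, c):
    """Mean Squared Error: J = (1/N) * sum((y* - y)^2), with y* = m*x + c."""
    n = len(xs)
    return sum((m * x + c - y) ** 2 for x, y in zip(xs, ys)) / n

# With m = 1 and c = 1 every prediction overshoots the true weight
# by roughly 100 kg, so the squared errors (and the loss) are huge.
print(mse_loss(heights, weights, 1.0, 1.0))  # → 10212.6
```

Training the model amounts to adjusting m and c so that this value shrinks toward zero.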

Credit: BecomingHuman By: WeiQin Chuah