Linear Regression for Beginners: A Mathematical Introduction

Updated: Jul 17




Linear Regression is a supervised machine learning algorithm that predicts the value of a dependent variable from one or more independent variables (features) using a best fit line.

What are Dependent and Independent Variables?

Dependent variables are the variables whose values have to be predicted. A dependent variable is also known as the output. The predicted value of the dependent variable is based on the values of one or more independent variables.

Independent variables, on the other hand, are the variables whose values are given and do not depend on any other variable. They are also known as inputs or features. One or more independent variables can be used to determine the dependent variable.

For example, consider the ‘House Price Prediction’ problem, in which the price of a house has to be predicted from the given size of the house.

Let x be the size of the house and y be the price of the house. Here, x is the independent variable (a feature), whereas y is the dependent variable, since its value has to be predicted.

The following is the sample data provided:


Table 1


First, let us get an idea about the best fit line!


What is the Best Fit Line?


It is the straight line on the graph that lies as close as possible to all of the plotted points, i.e., the line for which the total distance (error) between the plotted points and the line is the smallest.

Like any straight line, the best fit line has the equation:


Y`=mx+c

Here, m is the slope of the line, and c is the y-intercept, i.e., the value of y at which the line cuts the y-axis.

Let us now plot the points from Table 1.


Fig 1



In figure 1 above, the plotted points are shown in blue. Let the red straight line represent the best fit line.

We also assume that the equation of the best fit line is given as follows:

Y`=20 x

This means that the value of m is taken as 20 and c is taken as 0.

Since linear regression is used for prediction, let us predict the price of the house when the given size of the house is 60.

Using the above equation of the line, we can easily see that when x is 60, then Y` will be 1200.
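The same prediction can be written as a tiny Python function. This is only a sketch: the function name `predict` is hypothetical, and it simply evaluates Y` = mx + c with the values assumed above (m = 20, c = 0).

```python
def predict(x: float, m: float, c: float = 0.0) -> float:
    """Return Y`, the value on the line Y` = m*x + c at the given x."""
    return m * x + c


# Line assumed in the text: m = 20, c = 0
price = predict(60, m=20, c=0)
print(price)  # 1200
```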


Cost Function


It is a function that measures how far the plotted points are from a given line, based on the distances of the points from that line. The best fit line is the one for which this cost function is minimum.

The cost function for a particular slope m is given by the following equation:


Here, m is the slope of the line, n is the number of points, Y`ᵢ is the predicted value on the best fit line and yᵢ is the given value of the dependent variable for the given xᵢ.
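The sketch below computes a cost of this kind in Python for a line through the origin (c = 0). It assumes the common mean-squared-error form J(m) = 1/(2n) Σ (Y`ᵢ − yᵢ)²; some write-ups drop the factor of 2, which scales the cost but does not change which m gives the minimum. The function name `cost` is hypothetical.

```python
from typing import Sequence


def cost(m: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Cost of the line Y` = m*x (c = 0), assuming J(m) = 1/(2n) * sum((Y`_i - y_i)^2)."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    total = 0.0
    for x_i, y_i in zip(xs, ys):
        y_pred = m * x_i              # Y`_i: the point on the line for x_i
        total += (y_pred - y_i) ** 2  # squared vertical distance to the data point
    return total / (2 * n)


# Hypothetical toy points lying exactly on y = x (not the data of Table 2)
print(cost(1, [1, 2, 3], [1, 2, 3]))  # 0.0
```

Squaring the distances keeps points above and below the line from cancelling each other out.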



Explanation of the cost function with an example

Let us consider a sample dataset:


Table 2

Now, if we plot the data points given in Table 2 and draw a line through them, we get the following graph:


Fig 2


Here, the straight line (shown in red) is given by the following equation:


Y` =mx+c

If we consider c=0, then the equation becomes Y`=mx

Case 1) When m=1:

Y`(1)=1×1 =1, Y`(2)=1×2 = 2, Y`(3)=1×3 = 3, Y`(4)=1×4 = 4


As we know from equation 3,


Hence, the cost function when m is 1 will be given as follows:


Case 2) When m=2:


Y`(1)= 2×1= 2, Y`(2)= 2×2= 4, Y`(3)= 2×3= 6, Y`(4)= 2×4= 8
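The predicted values for Case 1 and Case 2 can be reproduced with a short Python sketch, using the x values 1 to 4 from the two cases and c = 0:

```python
xs = [1, 2, 3, 4]  # x values used in Case 1 and Case 2

# With c = 0 the line is Y` = m*x; compute Y` for each x and each slope m
predictions = {m: [m * x for x in xs] for m in (1, 2)}

for m, preds in predictions.items():
    print(f"m={m}: {preds}")
# m=1: [1, 2, 3, 4]
# m=2: [2, 4, 6, 8]
```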

Hence, the cost function when m is 2 will be given as follows:


Similarly, we can calculate the cost function for other values of m. The best line is the one whose cost function is the minimum; in this case, that is m=1. Hence, the slope m is chosen as 1 for this sample data.
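Trying several slopes and keeping the one with the lowest cost can be sketched as a brute-force search. This assumes the same 1/(2n) mean-squared-error cost as before (not confirmed by the text), and the helper names `cost` and `best_slope` are hypothetical.

```python
from typing import Sequence


def cost(m: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Assumed cost J(m) = 1/(2n) * sum((m*x_i - y_i)^2) for the line Y` = m*x."""
    n = len(xs)
    return sum((m * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * n)


def best_slope(xs: Sequence[float], ys: Sequence[float],
               candidate_ms: Sequence[float]) -> float:
    """Evaluate the cost for every candidate slope and return the one with the lowest cost."""
    return min(candidate_ms, key=lambda m: cost(m, xs, ys))


# Hypothetical toy points on y = x (not the data of Table 2), candidate slopes 0, 0.5, ..., 3
candidates = [0.5 * k for k in range(7)]
print(best_slope([1, 2, 3, 4], [1, 2, 3, 4], candidates))  # 1.0
```

To repeat the comparison from the text, pass the x and y columns of Table 2 as `xs` and `ys`. This search only checks the slopes you list, so its answer is only as good as the candidate list.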

Now the question arises: for how many values of the slope m should we calculate the cost function? This is answered by a concept known as Gradient Descent, which will be covered in the next post. So stay tuned!
