In this article, we will discuss what a Linear regression model is. There are plenty of resources on this topic, but when it comes to AI and ML, you need to understand not only the code but also the math behind it.
This article first discusses the math behind the Linear regression model. Then we will build a simple application in JS (yes, not Python) that predicts house prices based on income.
For Math haters (if you love maths, feel free to skip this part):
WE NEED TO TALK. I felt the need to address this, especially when it comes to ML and AI. Yes, I agree that the math used in Machine learning is hard. The sad truth is that we cannot make it easy; we have to acknowledge the situation.
Do you know how console.log works internally? No, right? Why? Because we never needed to know. We used it in many projects to print data and, YAY!, it worked. We can take the same approach when learning maths for ML and AI, at least in the initial stages. As you keep working, you will understand it in depth.
If I say the equation of a straight line is y = mx + c, you can go deep down the rabbit hole with resources and books and try to learn it in depth. Or you can treat it like console.log: know what each variable stands for, what it does, and how it gives us the solution.
In my opinion, this is the easiest and fastest way to learn maths. It worked for me! Do give it a try.
That being said, if you want to learn maths properly, a good starting point is the Math is Fun website.
What is a Linear regression model?
To put it simply, Linear regression is a method for drawing a straight line through a set of plotted points so that the line stays as close as possible to as many of those points as possible.
In the two example graphs above, which one has the line that covers more of the points? Graph – 2, right? The process of getting from the line in Graph – 1 to the line in Graph – 2 is called Linear regression.
Why do we need this line anyway? To make predictions. How, you ask? Say I ask you where a new point might fall in Graph – 1. Following the line, we might say approximately (70, 90). That would be wrong, because it is nowhere near the other points.
In Graph – 2, however, following the line gives approximately (80, 95), which is close to the pattern the other points follow.
If we take the x-axis to be an individual's income and the y-axis to be the house price, we can say that for an income of “x” the house price will be some “y”. That is exactly what we are going to do.
The Fun part (Math):
In this section, we will explore two formulas in order to completely understand how Linear regression works.
Equation of a straight line:
This equation is used to draw a straight line through the given coordinates.
y = mx + c
y – y value
x – x value
m – slope (how steep the line is)
c – the value of y when x is 0 (the y-intercept)
To calculate m, we use the formula below.
$$m = \frac{\text{change in } y}{\text{change in } x} = \frac{y_2 - y_1}{x_2 - x_1}$$
“Change” simply means the difference between two points. For example, for the coordinates (1, 2) and (3, 4), the slope is (4 - 2)/(3 - 1) = 2/2 = 1.
$$c = y_1 - m x_1$$
With this formula, we take one coordinate (any point on the line works), say (1, 2), together with the slope m = 1. So c = 2 - (1)(1) = 1.
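Here is a minimal JavaScript sketch of the two formulas above, using the same example points (1, 2) and (3, 4). The helper names slopeFromTwoPoints and interceptFromPoint are hypothetical, chosen for illustration, and are not part of the article's code.

```javascript
// Slope of the line through two points: m = (y2 - y1) / (x2 - x1)
function slopeFromTwoPoints([x1, y1], [x2, y2]) {
  if (x2 === x1) {
    // A vertical line has no finite slope, so y = mx + c cannot describe it.
    throw new Error("x1 and x2 must differ");
  }
  return (y2 - y1) / (x2 - x1);
}

// Intercept from any one point on the line: c = y1 - m * x1
function interceptFromPoint([x1, y1], m) {
  return y1 - m * x1;
}

const p1 = [1, 2];
const p2 = [3, 4];
const m = slopeFromTwoPoints(p1, p2); // (4 - 2) / (3 - 1) = 1
const c = interceptFromPoint(p1, m);  // 2 - 1 * 1 = 1
console.log(`y = ${m}x + ${c}`);      // prints "y = 1x + 1"
```

Using the other point gives the same intercept, because 4 - 1 * 3 = 1.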
Once we know m and c, we apply y = mx + c to every given x value to get the points of the line.
A line built from just two points like this is similar to the line in Graph – 1 in the figure above: it passes through the two chosen points but ignores the rest. So we need to optimize it, right? We will do that in the next section.
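Least Square Method:
This method minimizes the error of the line from the previous section (Equation of a straight line). Here, the error is the sum of the squared vertical distances between each plotted point and the line. Minimizing it gives the best fit line, the one that stays closest to the points on the graph.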
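To make the “error” concrete, the sketch below computes the sum of squared errors (SSE) of a line y = mx + c over a set of points. The least squares method picks the m and c that make this number as small as possible. The sample points and the sumSquaredErrors name are hypothetical, chosen only for illustration.

```javascript
// Sum of squared errors (SSE): for every point, square the vertical gap
// between the actual y and the line's prediction m * x + c, then add them up.
function sumSquaredErrors(points, m, c) {
  return points.reduce((total, { x, y }) => {
    const residual = y - (m * x + c);
    return total + residual * residual;
  }, 0);
}

// Hypothetical sample points, not the article's data set.
const points = [
  { x: 1, y: 2 },
  { x: 2, y: 3 },
  { x: 3, y: 5 },
];

console.log(sumSquaredErrors(points, 1, 1));       // line y = x + 1: prints 1
console.log(sumSquaredErrors(points, 1.5, 1 / 3)); // least-squares line: about 0.167
```

For these three points, the line y = x + 1 passes exactly through the first two but misses the third, giving an SSE of 1. The least-squares line y = 1.5x + 1/3 misses all three points slightly but has a smaller total error.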
The equation stays the same,
y = mx + c
but the way we calculate the slope m and the y-intercept c changes.
$$m = \frac{\sum (x - \text{xMean})(y - \text{yMean})}{\sum (x - \text{xMean})^2}$$
Σ – this alien symbol means sum. For example, if we have an array a = [1, 2, 3, 4, 5, 6], Σa means the sum of all elements in ‘a’, which is 1 + 2 + 3 + 4 + 5 + 6 = 21.
xMean – the mean value of x. It’s just a fancy mathematical way to say the average of all the given values.
yMean – the mean value of y.
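In JavaScript, Σ and the mean map directly onto Array.prototype.reduce. This small sketch reproduces the sum example above. The sum and mean helper names are our own.

```javascript
// Σ (sum): add up every element of an array.
const sum = (values) => values.reduce((total, v) => total + v, 0);

// Mean (average): the sum divided by how many values there are.
const mean = (values) => sum(values) / values.length;

const a = [1, 2, 3, 4, 5, 6];
console.log(sum(a));  // 21
console.log(mean(a)); // 3.5
```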
$$c = \text{yMean} - m \cdot \text{xMean}$$
where m is the value that we obtained using the above formula.
Notice that, unlike the two-point approach, these formulas use all the given points at once to calculate m and c. We then apply y = mx + c to every x value.
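A small worked example shows the formulas in action. It uses three hypothetical points, (1, 2), (2, 3) and (3, 5), chosen only to keep the arithmetic simple:

$$\text{xMean} = \frac{1 + 2 + 3}{3} = 2, \qquad \text{yMean} = \frac{2 + 3 + 5}{3} = \frac{10}{3}$$

$$m = \frac{(1 - 2)\left(2 - \tfrac{10}{3}\right) + (2 - 2)\left(3 - \tfrac{10}{3}\right) + (3 - 2)\left(5 - \tfrac{10}{3}\right)}{(1 - 2)^2 + (2 - 2)^2 + (3 - 2)^2} = \frac{\tfrac{4}{3} + 0 + \tfrac{5}{3}}{2} = \frac{3}{2}$$

$$c = \text{yMean} - m \cdot \text{xMean} = \frac{10}{3} - \frac{3}{2} \cdot 2 = \frac{1}{3}$$

So the best fit line for these points is y = 1.5x + 1/3.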
That’s it for the math. Let’s dive into coding.
House price prediction:
In this section, we will create a simple application that predicts house prices based on a person’s income. We have the following data, which we can plot on a graph.
Note: The data doesn’t make much real-world sense, because you cannot buy a house for 1 with an income of 4. We chose these values for simplicity, but the application will work for any other values.
We are going to use only one HTML file. That’s it.
The head section only loads the Chart.js library from a CDN. We will use this library to display our data in a graph.
In the above code, we represent the table data in a JSON-like structure (xyValues). The x-axis represents the individual’s income and the y-axis represents the house price.
We also have a datasetFormatter function, which takes the data and a color and returns them in the format that Chart.js understands.
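The original code listing is not reproduced here, so the following is only a sketch of what xyValues and datasetFormatter might look like. The sample values are hypothetical and are not the article’s table. The third label parameter is an assumption. The dataset options used (label, data, backgroundColor, borderColor, showLine, fill) are standard Chart.js dataset properties.

```javascript
// Hypothetical sample data in the { x, y } shape that Chart.js scatter charts accept:
// x = income, y = house price. These are NOT the article's table values.
const xyValues = [
  { x: 1, y: 10 },
  { x: 2, y: 25 },
  { x: 3, y: 40 },
  { x: 4, y: 50 },
];

// Wraps an array of points and a color into a Chart.js dataset object.
// `label` is an assumed extra parameter so each series shows up in the legend.
function datasetFormatter(data, color, label = "data") {
  return {
    label,
    data,
    backgroundColor: color, // point fill color
    borderColor: color,     // line color when showLine is true
    showLine: true,         // connect the points with a line
    fill: false,            // do not shade the area under the line
  };
}

// chartDatasets starts with the original data; fitted lines are added later.
const chartDatasets = [datasetFormatter(xyValues, "red", "Original data")];
```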
Do check the Chart.js documentation for setup and code reference. For now, we only need to concentrate on the data that we pass to the Chart library.
First, we get the min and max of the x and y values. chartDatasets contains the array of datasets (we will add more sets of values to it).
The min and max values set the bounds of the graph displayed in the browser. We add 5 to the x-axis and 50 to the y-axis so the graph is larger and clearer.
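The bounds calculation can be sketched as below. The getBounds name is hypothetical. Padding only the maximum values (by 5 on x and 50 on y) is our reading of the text, not the article’s confirmed code.

```javascript
// Finds the min and max x and y across every dataset, then pads the maxima
// so the plotted points are not squeezed against the chart's edges.
// Padding only the maximum is an assumption about the original code.
function getBounds(chartDatasets, xPadding = 5, yPadding = 50) {
  const points = chartDatasets.flatMap((dataset) => dataset.data);
  const xs = points.map((p) => p.x);
  const ys = points.map((p) => p.y);
  return {
    xMin: Math.min(...xs),
    xMax: Math.max(...xs) + xPadding,
    yMin: Math.min(...ys),
    yMax: Math.max(...ys) + yPadding,
  };
}

const bounds = getBounds([
  { data: [{ x: 1, y: 10 }, { x: 4, y: 50 }] },
  { data: [{ x: 2, y: 5 }] },
]);
console.log(bounds); // { xMin: 1, xMax: 9, yMin: 5, yMax: 100 }
```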
The rest of the configuration is specific to Chart.js.
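As a rough sketch of the Chart.js-specific part, the function below builds a scatter chart configuration from the datasets and bounds. It assumes Chart.js v3 or later, where axes are configured under scales.x and scales.y (v2 used xAxes and yAxes arrays instead). The buildChartConfig name, the axis titles, and the canvas id are our own choices.

```javascript
// Builds a Chart.js (v3+) configuration object for a scatter chart whose
// axes are clamped to precomputed bounds.
function buildChartConfig(datasets, bounds) {
  return {
    type: "scatter",
    data: { datasets },
    options: {
      scales: {
        x: { min: bounds.xMin, max: bounds.xMax, title: { display: true, text: "Income" } },
        y: { min: bounds.yMin, max: bounds.yMax, title: { display: true, text: "House price" } },
      },
    },
  };
}

const config = buildChartConfig([], { xMin: 0, xMax: 10, yMin: 0, yMax: 100 });

// In the browser, with Chart.js loaded from the CDN and a <canvas id="myChart">
// (a hypothetical id), the chart would be drawn with:
//   new Chart(document.getElementById("myChart"), config);
```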
Each set of data is shown in a different color.
The result for the initial data is as follows.
To run the code just open the file in your favorite browser.
In the above code, the getY function represents the equation of a line (y = mx + c).
The getSlopeConstant function calculates the slope and the c (y-intercept) value using the formulas we discussed earlier.
We pass the first and last coordinates to calculate the slope, so the resulting line runs through those two points and spans the whole range of the data.
Next, we iterate over the existing values to calculate new y values from the obtained slope, the intercept, and the existing x values.
Then we call the generateChart function to redraw the graph. Now you will see two lines.
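Here is a minimal sketch of the steps above, assuming each point is an { x, y } object. The getSlopeConstant signature shown here is an assumption, because the article’s listing is not reproduced. The sample points are hypothetical.

```javascript
// Equation of a straight line: y = m * x + c
function getY(x, m, c) {
  return m * x + c;
}

// Slope and intercept of the line through the first and last points.
// The real getSlopeConstant's signature is not shown; this one is assumed.
function getSlopeConstant(points) {
  const first = points[0];
  const last = points[points.length - 1];
  const m = (last.y - first.y) / (last.x - first.x);
  const c = first.y - m * first.x;
  return { m, c };
}

// Hypothetical points, not the article's data.
const xyValues = [
  { x: 1, y: 2 },
  { x: 2, y: 3 },
  { x: 3, y: 5 },
];

const { m, c } = getSlopeConstant(xyValues);
// Keep each x and replace y with the value predicted by the line.
const simpleLine = xyValues.map(({ x }) => ({ x, y: getY(x, m, c) }));
console.log(simpleLine); // y values: 2, 3.5, 5
```

Because the slope comes from only the first and last points, the line (here y = 1.5x + 0.5) passes exactly through those two points and ignores everything in between. That is why it can miss most of the data.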
We now have a straight line on the graph, but as you can see, it doesn’t cover most of the points. So we will apply the Least Square Method to generate a better-fitting line.
In the above code, we first calculate the mean of the x and y values.
mSlopeNumerator contains the numerator of the slope (refer to the slope formula for the Least Square Method), which is Σ(x – xMean)(y – yMean).
mSlopeDenominator contains the denominator of the slope, which is Σ(x – xMean)².
leastSquareSlope contains the slope value.
leastSquareIntercept contains the y-intercept (c = yMean – m*xMean), which we round to 2 decimal places.
We again iterate over xyValues and compute new y values from the least-square slope and y-intercept.
Then we update chartDatasets and call the generateChart function to redraw the graph.
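The least squares steps described above can be sketched as follows. The variable names follow the article’s description (mSlopeNumerator, mSlopeDenominator, leastSquareSlope, leastSquareIntercept). The leastSquareFit wrapper, the use of Math.round for the two-decimal rounding, and the sample points are assumptions.

```javascript
// Least squares fit:
//   m = Σ(x - xMean)(y - yMean) / Σ(x - xMean)^2
//   c = yMean - m * xMean
function leastSquareFit(points) {
  const n = points.length;
  const xMean = points.reduce((s, p) => s + p.x, 0) / n;
  const yMean = points.reduce((s, p) => s + p.y, 0) / n;

  let mSlopeNumerator = 0;
  let mSlopeDenominator = 0;
  for (const { x, y } of points) {
    mSlopeNumerator += (x - xMean) * (y - yMean);
    mSlopeDenominator += (x - xMean) ** 2;
  }

  const leastSquareSlope = mSlopeNumerator / mSlopeDenominator;
  // Round the intercept to two decimal places, as the article describes.
  const leastSquareIntercept =
    Math.round((yMean - leastSquareSlope * xMean) * 100) / 100;
  return { m: leastSquareSlope, c: leastSquareIntercept };
}

// Hypothetical sample points, not the article's data.
const xyValues = [
  { x: 1, y: 2 },
  { x: 2, y: 3 },
  { x: 3, y: 5 },
];

const { m, c } = leastSquareFit(xyValues);
// New y values for the fitted (least squares) line.
const leastSquareLine = xyValues.map(({ x }) => ({ x, y: m * x + c }));
console.log(m, c); // 1.5 0.33
```

For these sample points, the intercept is 1/3 before rounding. The slope formula divides by zero if every x value is the same, so real code may want to guard against that case.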
Now the graph shows three lines.
In the above graph, we can see that the yellow line (Least Square Method) follows the original data much more closely than the line from the simple straight-line equation.
So far, we have created an application that displays the lines generated by the simple straight-line equation and by the Least Square Method. Now we can predict house prices for new values. For example, to find the house price for a person with an income of 4.5, we follow the yellow line and read off a value of about 62.
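Once the slope and intercept are known, predicting a price is a single evaluation of y = mx + c. In this sketch, predictPrice is a hypothetical helper. The m and c values are placeholders, not the values behind the article’s yellow line, so the sketch does not reproduce the figure of 62.

```javascript
// Predict a house price for a new income using a fitted line y = m * x + c.
function predictPrice(income, m, c) {
  return m * income + c;
}

// m and c would come from the least squares fit; these values are hypothetical.
const m = 1.5;
const c = 0.33;
console.log(predictPrice(4.5, m, c)); // about 7.08
```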
That’s it. Happy coding!!!
Do check out my article on creating a Recommender system.
Follow my Instagram page for tech news and updates _techimperialist 😎
Git Repository: https://github.com/kishork2120/linear-regression-least-square