**What is linear regression ?**

Linear regression is a machine learning technique which is used to find direct/linear relation between independent variable(s) and a dependent variable. For example how area of a land affects its price. Or imagine you are thinking to buy a laptop, so you might want to know how the price depends on the RAM and storage of a laptop.

It is also the simplest and the most basic kind of machine learning.

**Objective**

**NOTE: there is a mistake in the graph, 400 is repeated twice at the y axis.**

lets say the image above shows revenue generated(in thousands) by a restaurant in each month

if you are asked to predict what will be the revenue generated at the 70th month, what would you say ? well most probably you’ll follow the same trend and answer something between 500 and 600

or imagine if we draw a straight line through it, which *just fits* in all the observations. wouldn’t it be easy now to predict the revenue by just taking the month number and seeing its corresponding y value ? well it will, indeed. it will give the most accurate prediction. Thats what linear regression is, fitting a line between all the observations so that it could be used to make predictions. so basically our goal is to find the best *m *and *b* value for this line(if you are familiar with the equation ‘mx+b =y’). FROM NOW WE’LL CALL MONTH ‘INPUT’ AND REVENUE ‘OUTPUT’. ALSO we’ll call m weight(w) and b bias(b)

**So how do we find the best fit line ?**

its really simple, first we let our computer generate random value for both our variable m and b. giving us something like this

then we see what prediction it gives on all the inputs. probably almost all of them would not even be near the real revenue. so right now our goal is to **minimize the average squares of the differences between each original and predicted value**. well that was too much. okay lets simplify that. lets say the output(from the randomly generated line) for input no. 1 is 100, but the real value is 70. so what we do is substract both and square the resulting number, here we get 900. so its basically telling how far our prediction is from being right. this is what is often called **cost **of a model. then to know how it performs in our entire dataset , we’ll find the average of cost of all records . this specific cost function we are using is called **mean square error**

now imagine the MSE as a function of w, you’ll get a parabola !(try to figure out yourself why this happens). wouldn’t this graph tell you how changing the value of w would affect the cost ? now you find the derivative of the function with respect to w. if the derivative is negative, it means you should go right to decrease the cost function, and if its positive, it means going to left will decrease the cost function

you’ll do it by first multiplying the derivative with a very small number called LEARNING RATE. then substracting this number from the w. this new number is the new value of w. this way it’ll go left if derivative is positive and right if derivative is negative. and then you do this with b too. after doing one step for both w and b. repeat this step again several times(15-20 times)for both but with the function obtained because of changing w and b.

like this your model will change like

And finally to this.

Read More –