Hello everyone, my name is Kristo Schar, and today I'm going to teach you how to use scikit-learn's LinearRegression to do a multiple linear regression in Python. A linear regression uses ordinary least squares to predict a value: essentially, it uses scikit-learn's linear_model LinearRegression class to fit a linear model that minimizes the residual sum of squares. We're trying to fit a line on this data that is as close as possible to all those blue data points: we give an intercept, we give a slope, we fit that line, and we try to minimize the sum of these green lines.
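The idea above can be sketched with plain NumPy; the data and noise level here are made up for illustration, but the lstsq call really does find the intercept and slope that minimize the residual sum of squares.

```python
import numpy as np

# Toy data: roughly y = 2 + 0.5 * x plus noise (values are illustrative).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=30)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=30)

# Ordinary least squares: stack a column of ones (the intercept term)
# next to x, then solve for the pair that minimizes the residual
# sum of squares -- the "green lines", squared and summed.
A = np.column_stack([np.ones_like(x), x])
(intercept, slope), *_ = np.linalg.lstsq(A, y, rcond=None)

residuals = y - (intercept + slope * x)
rss = np.sum(residuals**2)
print(intercept, slope, rss)
```

With more features you would simply add more columns to A, which is exactly what a multiple linear regression does.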
Essentially, a multiple linear regression does something similar to what a simple linear regression does, but it models the relationship of one dependent variable on two or more independent variables. So essentially we're fitting one slope per independent variable, all sharing the same intercept, and again we try to minimize that residual sum of squares. Now, if we look back at our primer on simple
linear regression: we had a previous tutorial where we ran this, and we essentially loaded the data and selected just one of the features, BMI, to predict with. We split the data into training and testing sets, fitted the model, predicted y, and then computed the metrics, getting a set of metric results. Now we're going to see the difference between that and a multiple linear regression. With the multiple linear regression, we're going to keep all of the features, not just BMI, in order to predict.
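For reference, the dataset in question is scikit-learn's built-in diabetes dataset, the same one used in the earlier simple-regression tutorial; loading it with all ten features might look like this:

```python
from sklearn.datasets import load_diabetes

# The diabetes dataset: 442 samples, ten features (including "bmi"),
# and a target measuring disease progression one year after baseline.
X, y = load_diabetes(return_X_y=True, as_frame=True)

print(X.columns.tolist())  # feature names such as 'age', 'sex', 'bmi', ...
print(X.shape)
```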
So again, what we're going to do is split the data into training and testing sets so that we can evaluate the score later on. We do this with train_test_split; I'm not going to go into the details of how it works, but we get X_train and y_train to train our model, and we're going to use X_test to produce y_pred, and then compare y_pred with y_test.
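A minimal sketch of that split (the test_size and random_state values here are illustrative choices, not necessarily the ones used in the video):

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True, as_frame=True)

# Hold out 20% of the rows for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```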
To do this, we're going to train our model. From sklearn, the linear_model module is where you will find LinearRegression. So we split the data into training and testing sets, and then we train the model: we do lin_reg = LinearRegression(), which just instantiates the model, and we use this LinearRegression object to fit. Fitting means training, so we pass in X_train and y_train; we're training on the training data, not the original data. Then we do y_pred = lin_reg.predict(X_test), using X_test to predict y. Then we run this.
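Putting those steps together (variable names like lin_reg follow the video; the split parameters are again illustrative):

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

lin_reg = LinearRegression()      # instantiate the model
lin_reg.fit(X_train, y_train)     # "fit" means train -- on the training data only
y_pred = lin_reg.predict(X_test)  # use the held-out X_test to predict y

print(y_pred[:5])
```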
And with that, we have performed a multiple linear regression. Where it really matters what a multiple linear regression means is in the interpretation. So far, we've had all the same steps as earlier; now what we're going to do is show how each feature impacts the prediction. You create a DataFrame that holds all the linear regression coefficients: earlier we had just one coefficient, because we had just one feature; now we have multiple features, and we have multiple coefficients. We use the model's coef_ attribute, we pass X.columns as the index, and we name the single column "Coefficient". This DataFrame will help us make a plot: we're going to plot all this data as a bar plot.
We use the pandas plot function with kind="barh", we give it a title, an x label, and a y label for some context, and then we call plt.show(). Now, when we plot this data, what we see is the coefficient for each of the features, and which ones most impact the predicted progression of the disease. We can see that s5 and s1 have a big impact here. This is the difference from earlier: with a simple linear regression we had only one of these coefficients, and now we have all of these.
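A sketch of the coefficient DataFrame and bar plot; the matplotlib Agg backend and savefig call here are headless-friendly stand-ins for plt.show(), and the model is fitted on the full dataset just to read off the coefficients.

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend for this sketch

import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression

X, y = load_diabetes(return_X_y=True, as_frame=True)
lin_reg = LinearRegression().fit(X, y)

# One coefficient per feature -- the simple regression had just one.
coef_df = pd.DataFrame(lin_reg.coef_, index=X.columns, columns=["Coefficient"])

ax = coef_df.plot(
    kind="barh",
    title="Multiple linear regression coefficients",
    xlabel="Coefficient value",
    ylabel="Feature",
    legend=False,
)
ax.figure.savefig("coefficients.png")
print(coef_df.sort_values("Coefficient"))
```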
So we are going to compute the accuracy of this by using sklearn.metrics. Essentially, what we're trying to compute is the R² score, the mean absolute error, and the RMSE. We are not going into the details of how to calculate these or what they mean, but here's how: we import metrics, and then to calculate the R² score there is a convenient function in metrics; you just pass in y_test and the result of your prediction, and it will calculate the R² score for you. Again with metrics, the mean absolute error is mean_absolute_error, and again you pass it y_test and y_pred. And lastly, we compute the root mean squared error. If you don't know why I'm doing this, I have a tutorial on each of these that you can look at, but essentially we're computing the metrics.
How do you interpret this? Earlier, with the simple linear regression, these were our metrics, and now, with the multiple linear regression, these are our metrics. If we look at them closely, we can see that the mean absolute error is lower, the RMSE is lower, and the R² score is higher. What the mean absolute error says is the average absolute difference between what you predicted and what really happened, so the lower the better, and in this case the multiple linear regression was lower. Similarly, for the root mean squared error, you want it as low as possible, and with the multiple linear regression it was lower than with the simple one. And lastly, you want to look at the coefficient of determination: you want to know how much of the variation in the progression of the disease is explained by all your independent variables. The higher it is, the better fitted your model is. So in this case, the multiple linear regression is better on all three of our
metrics. So in this case, we absolutely wanted to use a multiple linear regression. We could have further improved those scores by doing some data preprocessing or hyperparameter tuning, but that was outside the scope of this course. So this was it for this tutorial; please help me out and subscribe to this channel, and see you in the next tutorial.