Hello everyone, my name is Jael. Today I'm going to teach you how to use GridSearchCV in scikit-learn. GridSearchCV is used to test different hyperparameter values and find the best one for your model. Whatever machine learning model you try to train, you often need to define the hyperparameters it will use. For example, in supervised learning, if you do a classification with K-nearest neighbors, you need to define the number of neighbors before you train on your data set, and depending on the number of neighbors you select, you will get a different accuracy score.
So how does this work? In a typical machine learning workflow you split your data into a training set and a testing set, and you compare your predictions on the test set with the actual results. With GridSearchCV you add an extra step: you take your training set and hold out a validation portion from it, which you use to find your best parameters. GridSearchCV splits the training data into k equal parts called folds, which is what we mean when we say k-fold cross-validation. Each fold takes a turn as the validation set while the remaining folds are used as training data. With five-fold cross-validation, for example, the training data is split into five folds: first you train on folds two through five, validate on the first fold, and record a metric such as the accuracy score; then you repeat with the second fold as the validation set and record another metric; and so on. Finally you average those metrics for each candidate value and keep whichever value gives the best result.
For example, with KNN we can lay out the number of neighbors on one axis and the folds on the other. Looking at the first fold, we can see the accuracy score is higher when you use three neighbors, and when we look at the mean accuracy across all the folds we confirm that this number of neighbors is indeed the best parameter to use in this case. Essentially, that's what GridSearchCV does for you.
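To make that loop concrete, here is a minimal sketch of doing the same thing by hand with cross_val_score; the candidate values and variable names are placeholders, not the exact code from the video.

    # a minimal sketch of the k-fold loop that GridSearchCV automates
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_breast_cancer(return_X_y=True)      # using the data set introduced below so the sketch runs
    for k in [3, 5, 7, 9]:                          # candidate n_neighbors values (placeholders)
        knn = KNeighborsClassifier(n_neighbors=k)
        scores = cross_val_score(knn, X, y, cv=5, scoring="accuracy")
        print(k, scores.mean())                     # keep the k with the best mean accuracy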
2:51
cancer data set so I've I'm loading this
2:54
data for you and uh the breast cancer
2:57
data set essentially what it has is a
3:00
number of features and all of the these
3:03
features can be used to predict the
3:05
Target and if we look down below here uh
3:09
we see that the target is whether a can
3:12
a breast cancer is malignant or is
3:16
benign so in our case we're trying to
3:19
predict malignant cancer or benign
3:25
cancer first step is we need to have our
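The loading code itself isn't spelled out in the transcript, but a minimal version using scikit-learn's built-in loader would look roughly like this:

    from sklearn.datasets import load_breast_cancer

    data = load_breast_cancer()
    print(data.feature_names)    # the measurements used to predict the target
    print(data.target_names)     # ['malignant' 'benign']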
The first step is that we need to have our data as arrays, because scikit-learn wants us to provide arrays to the model. I won't dive too much into train_test_split here, since I have a tutorial specifically on it, but essentially this is the step where we split the data into training and testing sets. I'm splitting the original data set and keeping 30% aside so I can test the accuracy later on.
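As a sketch of that split, assuming the arrays come from load_breast_cancer and using a random_state I've added for reproducibility (not shown in the video):

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)       # scikit-learn wants the data as arrays
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42)        # keep 30% aside as the test set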
Now let's dive into the real objective of this tutorial: how to use GridSearchCV. To use it, we import GridSearchCV from sklearn.model_selection. In this case we're going to do a classification with K-nearest neighbors; I also have a tutorial on KNN if you want to look into it. So we instantiate the KNeighborsClassifier, which is generally how we start. But this time, instead of setting n_neighbors equal to a certain number like we normally would, we don't know which value we want, so we build a parameter grid as a dictionary: the key is n_neighbors and the value is a list, 3, 5, 7, 9. In this case I'm asking GridSearchCV to try these four values and tell me which is the best. Then we create knn_cv with GridSearchCV, and that's how you use it: the first argument you provide is the KNN model we have, the second argument is the parameter grid we just defined, and we can set cv to five for five-fold cross-validation, like I showed you in the screenshot. We could increase that, but in this case we'll limit ourselves. The scoring we'll use is the accuracy score; there are plenty of other metrics we could use, for example the F1 score, but we're going to focus on accuracy for now.
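Put together, the setup described above looks roughly like this; the variable names knn, param_grid and knn_cv are my guesses at what the video uses:

    from sklearn.model_selection import GridSearchCV
    from sklearn.neighbors import KNeighborsClassifier

    knn = KNeighborsClassifier()                          # no n_neighbors fixed up front
    param_grid = {"n_neighbors": [3, 5, 7, 9]}            # the four candidate values to try
    knn_cv = GridSearchCV(knn, param_grid, cv=5, scoring="accuracy")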
So we can run this, and then instead of fitting knn directly, we fit knn_cv with the training data, X_train and y_train. We don't use the testing data, because we keep that for the later evaluation. Now we have trained the model; that's what the fit method does. From here we take the fitted knn_cv and look at its best_params_, and we also print knn_cv.best_score_, rounded to two decimal places. Here I can see that out of the four n_neighbors values I gave it, it selected n_neighbors equal to five, and the best cross-validated accuracy score is about 91% in this case.
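Continuing from the knn_cv, X_train and y_train defined in the sketches above, the fit-and-inspect step is roughly:

    knn_cv.fit(X_train, y_train)              # cross-validates every candidate on the training data only
    print(knn_cv.best_params_)                # e.g. {'n_neighbors': 5} in the video
    print(round(knn_cv.best_score_, 2))       # mean cross-validated accuracy of the winner, about 0.91 here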
If you want to evaluate that best model, what you do is get the best KNN model from knn_cv using the best_estimator_ attribute, which is essentially the model that performed the best; call it best_knn. That's the new model you want to use for prediction, so you pass it X_test to predict y with the best estimator, and then you call best_knn.score on the test data to evaluate it and print that test accuracy. That's how we use the best estimator and test its accuracy, and we end up with almost 96% accuracy.
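A sketch of that evaluation step, again continuing from the objects above:

    best_knn = knn_cv.best_estimator_                   # the model refit with the winning n_neighbors
    y_pred = best_knn.predict(X_test)                   # predictions from the best model
    print(round(best_knn.score(X_test, y_test), 2))     # test accuracy, about 0.96 in the video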
GridSearchCV is very cool, but what is not cool about it is that it doesn't scale very well. I'm not going into that code at all, but what I want to show you is that if I take this and increase the number of hyperparameters, you can see that with three folds and three hyperparameters it's still manageable: the number of fits, and therefore the time, stays fairly low. Remember what our code did: it essentially looped through each of the candidate parameter values and fitted a model for every one of them, and every fit costs compute, which can become quite slow and expensive. When you look at this chart, with around five candidate values per hyperparameter, you can see that with a 3-fold or 10-fold CV and six or even seven hyperparameters, you end up with close to a million fits just for the 10-fold case. That grows very quickly; it's not very scalable.
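I can't reproduce the chart here, but the arithmetic behind it is easy to sketch: the total number of fits is the product of the number of candidate values for each hyperparameter, times the number of folds. Assuming five candidate values per hyperparameter, as the chart seems to:

    VALUES_PER_PARAM = 5                                 # assumed candidate values per hyperparameter
    for n_params in range(1, 8):
        for folds in (3, 5, 10):
            fits = VALUES_PER_PARAM ** n_params * folds  # every combination is fit once per fold
            print(n_params, folds, fits)                 # 7 params with 10 folds -> 781,250 fits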
To work around this, we are going to use RandomizedSearchCV instead. This is just an introduction, but essentially what RandomizedSearchCV does is that instead of running every combination, it samples random hyperparameter values and runs only those. So if we take that parameter grid, we used to have four values, and now I'm providing 49 values, which is a lot more than just those four. Randomized search will essentially do the same thing as what we did before, only on a random subset of the candidates.
The process is very similar: you just use RandomizedSearchCV instead of GridSearchCV and provide a few parameters. Again you provide knn and the parameter grid like we just did, then we set n_iter to limit it to only ten iterations, cross-validation of five, scoring equal to accuracy again, and we provide a random_state for reproducibility.
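A sketch of that call, assuming the 49 candidate values are simply 1 through 49 and reusing the training arrays from earlier (the exact list and random_state aren't shown in the video):

    from sklearn.model_selection import RandomizedSearchCV
    from sklearn.neighbors import KNeighborsClassifier

    knn = KNeighborsClassifier()
    param_grid = {"n_neighbors": list(range(1, 50))}     # 49 candidate values instead of 4
    knn_rcv = RandomizedSearchCV(knn, param_grid, n_iter=10, cv=5,
                                 scoring="accuracy", random_state=42)
    knn_rcv.fit(X_train, y_train)                        # only 10 sampled candidates x 5 folds = 50 fits
    print(knn_rcv.best_params_, round(knn_rcv.best_score_, 2))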
So we run this and essentially repeat all the steps we just did, and we can see that it went quite fast: it didn't have to perform close to a million fits, just ten parameter settings in this case. The best parameter was 14 neighbors, which we didn't provide earlier; we gave four values and 14 wasn't one of them. In this case it gave me a best n_neighbors of 14, the cross-validated accuracy was 91%, and the test set accuracy was 96%.
So that's it, this is how GridSearchCV works in scikit-learn. Help me by subscribing to this channel or visit my blog at jc.com. See you!