0:00
Hi everyone, my name is Ja. In this Python tutorial I'm going to show you how to use scikit-learn's KNeighborsClassifier and how to perform K-nearest neighbors in scikit-learn. So what is KNN? It's an algorithm used in supervised learning for both classification and regression, and essentially you try to classify data into two or more categories based on the closest neighbors. So if you have a point to predict, you can say: if I look at the three nearest neighbors in that circle, I get this classification. If K equals three, then I classify it as green, or Class B; and if K equals six, since the circle is bigger and there are four items in Class A and only two in green, then I classify it as Class A. In a nutshell, that's how KNN works. If you want to learn more, make sure you look at my tutorials online. Otherwise, let's get started.
1:11
I have loaded this data for you. Essentially, what I'm doing is loading the breast cancer dataset: I have all these features, 30 of them, and what I'm trying to predict is whether a breast cancer is malignant or benign. Since scikit-learn requires us to use numerical data arrays, I'm going to assign the dataset to X and y, and then I'm going to get started with data preparation and splitting.
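Here is a minimal sketch of that loading step, assuming the built-in load_breast_cancer dataset from scikit-learn (the variable names are my own):

```python
from sklearn.datasets import load_breast_cancer

# Built-in breast cancer dataset: 30 numeric features,
# target is 0 (malignant) or 1 (benign)
data = load_breast_cancer()
X = data.data    # feature matrix as a NumPy array
y = data.target  # target labels as a NumPy array
```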
1:54
Whenever we run a machine learning model, we want to make sure that we have a training set and a test set so that we can look at accuracy, and that's essentially what this does. You can look at my tutorial on train test split if you want to understand it in detail, but essentially it splits my data into a 30% test group. So let's run this.
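Under those assumptions, the split would look something like this (the 30% test size comes from the video; the random_state is just my own choice for reproducibility):

```python
from sklearn.model_selection import train_test_split

# Hold out 30% of the rows as a test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
```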
2:25
Now let's dive into the meat of the subject, which is the actual K-nearest neighbors. In order to run a KNN algorithm with scikit-learn, we need the neighbors module: from the neighbors module you import KNeighborsClassifier. To train the model we generally use the naming convention knn, and we use KNeighborsClassifier. The first parameter we want to set is the number of neighbors, and that's the K value I was talking about; in this case we'll arbitrarily select eight, and I'll show you how to select that value later. Then we do knn.fit on X_train and y_train, and knn.predict on X_test. The fit method is the actual training of the model, which is why we put the training data in there, and the predict method is where we put in the test features, because we're trying to predict y_test. That's why we split out X_test and y_train: we're going to try to predict y_test. To compute the score, we compare y_test to the actual predictions, and that's how we compute accuracy: we do knn.score and pass X_test and y_test. If we look at that information, our prediction accuracy is currently 92.9%. So 92.9% of the time we are right, and that happens because this is a clean dataset, so we don't have too much preprocessing to do.
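As a sketch, assuming the variable names from the split above, that first model would look roughly like this:

```python
from sklearn.neighbors import KNeighborsClassifier

# K = 8 neighbors, an arbitrary starting value
knn = KNeighborsClassifier(n_neighbors=8)

# fit() trains the model on the training data
knn.fit(X_train, y_train)

# predict() returns the predicted classes for the test features
y_pred = knn.predict(X_test)

# score() compares predictions on X_test against y_test (accuracy)
print(knn.score(X_test, y_test))  # around 0.93 in the video
```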
4:41
But let's dig into this a bit more. What we want to know is how many neighbors we should actually use, so I'm going to make a plot that tells us that. I have the same split here, training on the same data, and I'm going to create variables to store the information. We start from the same starting point as before, but this time we loop through multiple values. We can say np.arange(1, 26): essentially, np.arange will create a range of values 1, 2, 3, 4, up to 25, and we're going to loop over each of those values. If I print each neighbor, you'll see that I'm actually looping through all the values from 1 to 25, and what we want to do is train a model using that value every time. So we use KNeighborsClassifier with n_neighbors equal to the current number of neighbors: we'll train it at one, train it at two, and go up to 25, in order to try to find the best value for n_neighbors. We train the model on X_train and y_train, and then with the trained model we record the training accuracy and the test accuracy. For each neighbor we assign the knn.score, so that's our accuracy score, on X_train and y_train, and then I do the same but with X_test and y_test.
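A runnable sketch of that loop, with train_accuracy and test_accuracy as my own names for the storage arrays:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Candidate values of K: 1, 2, ..., 25
neighbors = np.arange(1, 26)
train_accuracy = np.empty(len(neighbors))
test_accuracy = np.empty(len(neighbors))

# Train one KNN model per value of K and record both accuracies
for i, k in enumerate(neighbors):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    train_accuracy[i] = knn.score(X_train, y_train)
    test_accuracy[i] = knn.score(X_test, y_test)
```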
7:06
So I'm going to plot this. I have these values here, and what I want to do is plot this information. I'm going to set my title with matplotlib, hence why I've imported matplotlib. Then I plot the neighbors: I take all the neighbors as the x values and plot the training accuracy values against them, with the label "Train accuracy". I can do the same with the test accuracies and label them "Test accuracy". Then I add a legend, an x label, and call show. Let me show you what this does: it plots this information and tells us that somewhere around here, just before the test accuracy levels off, we have the best possible n_neighbors; with very few neighbors the model is overfitting the data, and far off to the right it becomes too coarse. So this is a plot that shows you the accuracy by number of neighbors.
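A sketch of that plot, reusing the arrays from the loop above (exact labels and title are my own wording):

```python
import matplotlib.pyplot as plt

# Accuracy as a function of the number of neighbors
plt.title('KNN: varying number of neighbors')
plt.plot(neighbors, train_accuracy, label='Train accuracy')
plt.plot(neighbors, test_accuracy, label='Test accuracy')
plt.legend()
plt.xlabel('Number of neighbors')
plt.show()
```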
8:29
We can also use GridSearchCV to find the right n_neighbors. What we're going to do is set up a param grid. I'm not going to go too deep into GridSearchCV, because this is a KNN tutorial, but let me show you: I'm going to use n_neighbors from np.arange(1, 50), and I'm going to build knn_cv with GridSearchCV from the KNN estimator, the param grid, and cv equal to five, that is, five-fold cross-validation. Let's not dive too deep into what that means, but we can now call knn_cv.fit on X_train and y_train, and what this will do is go through all the n_neighbors values and fit the model. Then we can print the best parameters, and we can print the best score as well. Essentially, by doing this we know what the best parameters are and what the accuracy score is for that best set of parameters. In this case it tells us that we should use six neighbors, with an accuracy of almost 95%. When I showed you the plot earlier we were around six neighbors, so we were about right by looking at the plot, but GridSearchCV is a much better way. If we look at our previous run we had about 92% accuracy, and now we are showing 95% accuracy, which is fantastic.
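A minimal sketch of that search, assuming the names param_grid and knn_cv:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Search K from 1 to 49 with 5-fold cross-validation
param_grid = {'n_neighbors': np.arange(1, 50)}
knn_cv = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
knn_cv.fit(X_train, y_train)

print(knn_cv.best_params_)  # e.g. {'n_neighbors': 6} in the video
print(knn_cv.best_score_)   # close to 0.95 in the video
```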
10:38
Now we can evaluate the model, and one way to do this is with a confusion matrix. If you don't know what a confusion matrix is, I have a tutorial on the topic here on my website, but essentially it shows you the true positives, true negatives, false negatives and false positives; in other words, how often you are right and how often you are wrong. I won't dive too deep into this, but you can look at that by computing cm with confusion_matrix. Again, this is a tutorial on KNN, not on confusion matrices, but it's helpful when you look at your KNN accuracy: you pass your labels, and then the classifier's classes. We're going to show this in colour with a display: I'm going to use ConfusionMatrixDisplay, because if I just run confusion_matrix what I get is a plain array, which you can interpret, but if you want to make it more visual you use ConfusionMatrixDisplay with display_labels equal to the KNN classes. Essentially you're just making a nicer plot of the same thing, and then you can set the title and plot the confusion matrix, and you can see your true labels. If you want to interpret this, it means that you have 102 instances where you correctly predicted benign and 57 instances where you correctly predicted malignant. Let's not dive in too much, but you can interpret this.
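A sketch of that evaluation, assuming knn_cv from the grid search above and y_pred as the name for the predictions:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# Predictions from the tuned model
y_pred = knn_cv.predict(X_test)

# Plain array: rows are true labels, columns are predicted labels
cm = confusion_matrix(y_test, y_pred)
print(cm)

# The same information as a colour plot
disp = ConfusionMatrixDisplay(confusion_matrix=cm,
                              display_labels=knn_cv.classes_)
disp.plot()
plt.title('KNN confusion matrix')
plt.show()
```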
12:55
The way to go further is by looking at the classification report, and that's where you get your actual metrics. You go: from sklearn.metrics import classification_report, and then you can print it with your y_test and y_pred, so you're comparing your predictions against your original test labels, and you get the report. Again, I have a tutorial here that explains what the classification report is, but essentially what you want to read off is your accuracy, which here is 93%, and that says how often your predictions are correct. The other information you can look at is the row for the malignant class: if we come back to our problem, we're trying to define how often we can predict a cancer as malignant, and what we look at here is precision and recall. What the precision says is: when you predicted cancer, how often was it really cancer? The answer is 94% of the time: when we predicted someone had cancer, it was really cancer. The recall is: of all the real cancers that existed, the true positives plus the false negatives, what percentage could we predict? That's 95%. In a case like cancer these numbers should be really, really high, and that's what we have in this case.
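As a short sketch, with the same assumed names:

```python
from sklearn.metrics import classification_report

# Precision, recall, f1-score per class, plus overall accuracy
print(classification_report(y_test, y_pred))
```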
15:06
Now let's dive into feature scaling. Here we have a specific case where the dataset is quite clean, but KNN is really sensitive to the scale of the data: if some features are on a very large scale and other features are on a tiny scale, that can really skew the calculations, because the distances won't mean the same thing. One way to scale the data is to shift each feature so that its mean is zero and its standard deviation is one, so essentially you have one standard deviation everywhere, and one way to do this is with StandardScaler. To do this you create the scaler, then you run fit_transform on X_train, and for X_test you run the scaler with just transform, because you don't want to fit on X_test: you don't want to train on the test data, which is why you only transform it. If we print the results, here we have the original mean and standard deviation, and once we have transformed the data, we have this mean and standard deviation.
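A sketch of that scaling step, with X_train_scaled and X_test_scaled as assumed names:

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

# fit_transform learns the mean/std from the training data and applies them;
# transform reuses those statistics on the test data without refitting
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```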
17:00
In order to do this within a pipeline with KNeighborsClassifier, we start with the same initial steps that we've done so far: we run the train test split, and then we create a steps variable. The steps variable should be a list, and each step should describe what you're trying to do: here we have a scaler step, where we run StandardScaler, and the second element of the list should be the KNN step, which is KNeighborsClassifier with n_neighbors equal to six, because we learned that value earlier. Then we train the model by running a pipeline: we instantiate pipeline equals Pipeline, and within the Pipeline we give the steps as tuples. Instead of using knn.fit, this time we use pipeline.fit and pass in X_train and y_train, then we predict with the fitted, scaled model on X_test again, and finally we can print the pipeline score. And that's how you train a model using a pipeline.
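A runnable sketch of that pipeline, with the step names and knn_scaled as assumed identifiers:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Each step is a (name, transformer-or-estimator) tuple
steps = [
    ('scaler', StandardScaler()),
    ('knn', KNeighborsClassifier(n_neighbors=6)),
]
pipeline = Pipeline(steps)

# fit() scales X_train and trains the KNN model in one call,
# so we pass the unscaled data and let the pipeline handle scaling
knn_scaled = pipeline.fit(X_train, y_train)

# predict() applies the same scaling to X_test before predicting
y_pred_scaled = knn_scaled.predict(X_test)

# Accuracy of the scaled pipeline on the test set
print(pipeline.score(X_test, y_test))
```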
18:44
So thank you very much, that was all for this KNeighborsClassifier with scikit-learn tutorial. Make sure to subscribe to my channel, visit my website, or follow me on social media. Thank you very much, and see you next time.