Hello everyone, my name is Jean-Christophe Chouinard, and today I'm going to teach you how to use a classification report from scikit-learn in Python.
A classification report is used to evaluate the accuracy of a classification model based on the values from the confusion matrix, so we are in the realm of supervised learning here. To compute the classification report, we will use the classification_report function from sklearn.metrics.
To showcase this, we are going to use the breast cancer dataset, which tries to predict whether a tumor is malignant or benign, and we are going to use a k-nearest neighbors (KNN) classifier to make the prediction. Since this is not a course on KNN or the breast cancer dataset, I'm going to refer you to the proper tutorial that I've made on KNN. Essentially, by running this I have my KNN algorithm that returns an accuracy of 92.9%.
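For reference, here is a minimal sketch of what that setup could look like. The loader, splitter, and classifier are standard scikit-learn pieces, but the test size, the random_state, and the default number of neighbors are assumptions; the KNN tutorial covers the actual setup.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the breast cancer dataset (two classes: malignant and benign)
X, y = load_breast_cancer(return_X_y=True)

# Hold out part of the data for testing (the 30% split is an assumption)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Fit a KNN classifier with default settings and predict on the test set
knn = KNeighborsClassifier()
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Overall accuracy; the run shown in the video reports 92.9%
print(knn.score(X_test, y_test))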
So this tells us how precise we were: 92.9% of the time we were right. But it doesn't tell us all the information that we could get, and to get more information we can run a classification report.
To run a classification report on this, we will import classification_report from sklearn.metrics, and then we will print the result of classification_report. The parameters that we pass are y_test and y_pred: essentially what was actually true in our dataset and what we managed to predict. That's what we pass to the classification report.
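In code, that call is just the following, with y_test and y_pred being the true labels and the KNN predictions from the step above:

from sklearn.metrics import classification_report

# Print precision, recall, F1 score and support for each class
print(classification_report(y_test, y_pred))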
We end up with output that looks like this. To understand this report, we need to understand the confusion matrix. I have a complete tutorial on the confusion matrix, but I'm going to give you a small overview here. The confusion matrix returns the counts that are used to make the calculations in the classification report: true positives, false positives, true negatives, and false negatives.
Essentially, it is a table where we have the actual values and the predicted values. The actual value is either negative (it's not a cancer) or positive (it is a cancer), and we have either predicted negative (not a cancer) or predicted positive (it is a cancer). Whenever we have negative and negative, that means our prediction and our actual value are aligned, and when we have predicted positive and positive, that means we predicted that a cancer was actually a cancer and the actual value was indeed a cancer. So in blue we have the real values that are aligned with our predictions, and in yellow we have the real values that are not aligned with our predictions. That's essentially what a confusion matrix is.
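Roughly, the table being described lays out like this, with rows for the actual values and columns for the predictions; the diagonal cells (TN and TP, shown in blue in the video) are where prediction and reality agree, and the off-diagonal cells (FP and FN, in yellow) are where they disagree:

                     Predicted negative     Predicted positive
Actual negative      true negative (TN)     false positive (FP)
Actual positive      false negative (FN)    true positive (TP)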
These counts are used to make the calculations of the precision, the recall, and the F1 score that we'll see in a minute. So let's show our classification report again, right here, and try to get a better understanding of that report. We have these columns: precision, recall, F1 score, and support.
The precision is, when we predict that something is true, how often is it really true; the higher that value is, the better. Recall is the percentage of true values that we actually managed to predict, and again, higher is better. Often precision and recall go in separate directions, and that's when we use the F1 score, which combines precision and recall; again, for the F1 score, higher is better. Support is the number of occurrences of each class in your test set, so malignant and benign, the zeros and ones. The rows that we're seeing are essentially the two classes that we're trying to predict, either malignant or benign (the zeros and ones).
The accuracy is how often we are predicting the right outcome; here, 93% of the time we're predicting the right outcome. The macro average is the average of each metric across all classes, so we're essentially adding the precision of class 0 and the precision of class 1 and dividing by two. The weighted average gives more weight to the larger class than to the smaller class, so we would take one precision times 107 and the other precision times 64, and then divide by the total support to get the weighted average.
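As a small worked sketch of those two averages (the numbers are illustrative placeholders; which class gets which precision and support is an assumption, not taken from the report):

# Illustrative per-class precisions and class supports
p0, p1 = 0.92, 0.94   # precision of class 0 and class 1 (placeholder values)
n0, n1 = 107, 64      # support of class 0 and class 1 (placeholder pairing)

macro_avg = (p0 + p1) / 2                        # simple mean across classes
weighted_avg = (p0 * n0 + p1 * n1) / (n0 + n1)   # mean weighted by class support
print(macro_avg, weighted_avg)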
I made a small visualization here just to showcase what this is. You will recognize the confusion matrix, and if we do the calculation with the false negatives and true positives we get the recall, while if we look at the false positives and true positives we get the precision. We also have other metrics that are not found in the classification report but that are valid, which are the specificity and the NPV (negative predictive value).
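For reference, those two extra metrics come from the same four counts; the standard definitions (not shown on screen) are specificity = TN / (TN + FP) and NPV = TN / (TN + FN).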
So how do you calculate these? As a recap, the accuracy is the number of true positives plus true negatives, that is, the number of values where our predictions and the actual values align, divided by all four values together; that's how we get the accuracy. If we want to look at the precision, then we take the true positives and divide by true positives plus false positives, so how many we got right compared to the number of predicted cancer cases. Recall is when you look at true positives and false negatives: we do true positives divided by true positives plus false negatives to get the recall. Then, to get the F1 score, we have a specific formula that you don't have to learn by heart: two times precision times recall, divided by precision plus recall.
So let's make an example with that confusion matrix. We start with from sklearn.metrics import confusion_matrix; again, this is not a course on the confusion matrix, but this will help us understand the classification report better. We do cm = confusion_matrix, and to make those calculations we compare y_test with our y_pred, and we get the confusion matrix. We then have an array that looks like the table we saw, and we can assign each of these values to true negative, false positive, false negative, and true positive. That way we're going to be able to use them to showcase how this works.
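A minimal sketch of that step; the tn, fp, fn, tp names mirror what is being described, and ravel() is one way to flatten scikit-learn's 2x2 array into exactly that order:

from sklearn.metrics import confusion_matrix

# Build the confusion matrix by comparing the true labels with the predictions
cm = confusion_matrix(y_test, y_pred)
print(cm)  # a 2x2 array laid out like the table above

# Assign each cell to its own variable: true negatives, false positives,
# false negatives and true positives
tn, fp, fn, tp = cm.ravel()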
Let's take those true negative and true positive values and compute the accuracy first. To get the accuracy, I have the formula here: true positives plus true negatives, divided by the total number of predictions. To calculate this, we do true positive plus true negative and divide by all four values, TN plus FP plus FN plus TP. Then we can calculate the precision; the precision, if I come back here, is the number of true positives divided by true positives plus false positives, so the number of true positives divided by all the positive results: we do TP divided by TP plus FP. Then we can do the recall, which, if we come back to this chart, is the true positives divided by all the actual positive values, so in this case true positives divided by true positives plus false negatives. And to calculate the F1 score, we had the formula, two times precision times recall divided by precision plus recall; you don't have to learn it by heart, but we can compute it from the precision and recall we just calculated, and that's the F1 score.
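Put together, those calculations look roughly like this, using the tn, fp, fn, tp values unpacked above:

# Accuracy: predictions that match the actual values, over all predictions
accuracy = (tp + tn) / (tp + tn + fp + fn)

# Precision: how many predicted positives were actually positive
precision = tp / (tp + fp)

# Recall: how many actual positives we managed to predict
recall = tp / (tp + fn)

# F1 score: harmonic mean of precision and recall
f1_score = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, f1_score)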
We can run all of this, and you will get these metrics. If we then run the classification report, you will find that the accuracy is the same as what the classification report returns, the precision is also the same, the recall is 95%, and the F1 score is 94%. So you can see how the calculations in the classification report come straight from the confusion matrix.
We can also figure out the support for each class. To showcase this: to know the number of observations that are negative, you take true negatives plus false positives, and you get the negative support; if you want the positive support, you take the true positives plus the false negatives, essentially including the ones that we didn't manage to predict as positive. And to get the support for all classes, we add true positives plus false positives plus false negatives plus true negatives. So if we look at this, we will see that the support column is essentially simply those values.
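And a quick sketch of those support calculations with the same four counts:

# Support: how many test samples actually belong to each class
support_negative = tn + fp           # actual negatives in the test set
support_positive = tp + fn           # actual positives in the test set
support_total = tn + fp + fn + tp    # all test samples
print(support_negative, support_positive, support_total)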
Okay, we made that very complex, so let's try to interpret the report. To really understand it, we need to put it into our context: we're trying to predict cancer, so there are maybe some metrics that are more important. In our case, it's possible that what we want is to make sure we are always able to predict a cancer when there is one, even if sometimes we predict cancer for someone who doesn't have it, because in that case the doctors can take the model's prediction and say, okay, it predicted cancer, let's do more tests. So this is the kind of information we can learn here.
The first thing we look at is that we are in the realm of medical diagnosis, so we need something with very high accuracy. In this case we get 93%, which is arguably good, though it could be better; you don't want to predict this at a 90% rate, you want high accuracy here. So 93% of the time, the model is right.
So should we look at precision or recall, and should we focus on class 0 or class 1? We actually want to make sure that we predict a cancer when there is one, so we will focus on class 1 in this case. We want a high precision, essentially correct positive predictions, and we also want a high recall, because we want to find all the actual positives.
Essentially, when we look at this: when we predict that someone has a cancer, how often is it really a cancer? In this case it's 94% of the time, so there are not many benign tumors that were predicted as malignant. When we look at recall, we want to capture all the actual positives: what percentage of all the real cancers did we manage to predict? In this case it's 95%.
If we look at the precision for all the classes, to make it clearer: we have benign as class 0, and what it says is that out of all the times the model predicted benign, it was right 92% of the time. So the model is more precise when it tries to predict cancer than when it tries to predict benign, which is fine in this case.
So that was it for the classification report. Help me out by following me and subscribing to my channel, and if you want to learn more, just go to my blog; there are a lot of resources there that can help you with classification reports. Thank you very much, and see you next time.