
# Assessing classifier performance

## Introduction

Three main types of data are used to evaluate a classifier:

• Actual class values
• Predicted class values
• Estimated probability of the prediction

In previous posts we used the first two types to construct a confusion matrix. That matrix compared the actual and predicted class values when a trained support vector machine classifier was applied to the test data.
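A confusion matrix is a cross-tabulation of actual against predicted classes. The minimal base-R sketch below uses eight made-up pass/fail labels (hypothetical, not the blog's data) and builds the matrix with table():

```r
# Hypothetical actual and predicted classes for eight students (illustrative only)
actual    <- factor(c("pass", "pass", "fail", "pass", "fail", "fail", "pass", "pass"),
                    levels = c("fail", "pass"))
predicted <- factor(c("pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass"),
                    levels = c("fail", "pass"))

# Rows are predicted classes, columns are actual classes
conf_mat <- table(predicted, actual)
print(conf_mat)
```

The diagonal holds the correct predictions. The off-diagonal cells hold the two kinds of error.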

## The Data

The data were downloaded from the UCI Machine Learning Repository, and the analysis is inspired by Cortez and Silva (2008). We use only the maths results data. We start by clearing the workspace, then set the working directory to the location of the student maths data file. One caveat: the data are separated by semicolons rather than commas, so be sure to specify sep = ";" in the read.table() function. Refer to the sessionInfo() output at the foot of this blog post to see which packages are installed and loaded for this post.
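The following minimal sketch illustrates the semicolon issue. It assumes a few made-up rows in the same format as the UCI file, and the column subset and values are hypothetical. With the real download you would pass the file path (for example "student-mat.csv") instead of text =:

```r
# A few semicolon-separated lines standing in for the UCI file (hypothetical values)
raw <- "school;sex;age;G1;G2;G3
GP;F;18;5;6;6
GP;F;17;5;5;6
GP;F;15;7;8;10"

# sep = ";" is essential: with the default whitespace separator each line would be read as one field
maths <- read.table(text = raw, sep = ";", header = TRUE)
str(maths)
```

Base R's read.csv2() also defaults to sep = ";", so it is an alternative here.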

Let’s have a look at our data using the convenient glimpse() function from the dplyr package. The numeric variables have similar ranges because we rescaled them with our custom normalise() function. We also convert G3 into a binary pass or fail grade called final. This is the class we wish to predict for future students: are they going to pass or fail that all-important end-of-year exam?
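The custom normalise() function is not shown here, so the sketch below assumes the common min-max rescaling to [0, 1]. It also assumes a pass mark of G3 >= 10 on the 0 to 20 grade scale. Both the helper and the toy data frame are hypothetical stand-ins:

```r
# Hypothetical min-max version of the custom normalise() helper: rescales to [0, 1]
normalise <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Toy data standing in for the maths results (illustrative values only)
students <- data.frame(age = c(15, 16, 18, 17), absences = c(0, 4, 10, 2),
                       G3 = c(6, 10, 15, 9))

# Assumed pass mark G3 >= 10; derive the class label before rescaling anything
students$final <- factor(ifelse(students$G3 >= 10, "pass", "fail"),
                         levels = c("fail", "pass"))

# Rescale the numeric predictors so they share a similar range
num_cols <- c("age", "absences")
students[num_cols] <- lapply(students[num_cols], normalise)

# Drop the raw grade so it cannot leak the answer into the model
students$G3 <- NULL
students
```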

In an earlier post we described in detail all the steps for building this decision tree classifier. Rather than repeating them here, we move straight on to evaluating the classifier’s performance. The model looked like this:

We evaluate the model by comparing the actual and predicted exam outcomes for the students in the test data.

A model accuracy of 93.4% is not bad, although three students proved us wrong and passed anyway! This seems like a useful model for identifying students who need extra intervention, and importantly it can be applied and interpreted by a human.
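Accuracy is the proportion of students whose predicted outcome matches their actual one. In other words, it is the sum of the confusion matrix diagonal divided by the matrix total. Here is a base-R sketch on ten hypothetical students, one of whom is predicted to fail but actually passes:

```r
# Hypothetical actual and predicted classes (illustrative, not the blog's test set)
actual    <- factor(c("pass", "pass", "fail", "pass", "fail",
                      "fail", "pass", "pass", "fail", "pass"), levels = c("fail", "pass"))
predicted <- factor(c("pass", "fail", "fail", "pass", "fail",
                      "fail", "pass", "pass", "fail", "pass"), levels = c("fail", "pass"))

conf_mat <- table(predicted, actual)

# Accuracy = correctly classified cases (the diagonal) / all cases
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
accuracy  # 9 of 10 students correctly classified: 0.9
```

The same number can be obtained more directly as mean(predicted == actual).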

To dig deeper, we can output the predicted probabilities from the C5.0 classifier by setting type = "prob" in predict(). We cbind() these probability columns alongside the test data and inspect the result. Here p is the predicted class, and the final two columns give the model’s estimated probabilities of pass and fail. Notice that when p is pass the probability of pass is near one, and when p is fail it is near zero.
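C5.0's predict() with type = "prob" returns a matrix with one column per class. The sketch below substitutes a small hypothetical probability matrix for that output. It derives the predicted class p as the most probable column, then cbind()s everything together. All names and values are illustrative assumptions:

```r
# Hypothetical stand-in for predict(model, test_data, type = "prob"):
# one row per student, one column per class level
probs <- matrix(c(0.03, 0.97,
                  0.95, 0.05,
                  0.40, 0.60,
                  0.10, 0.90),
                ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("fail", "pass")))

# The predicted class is the column with the highest probability
p <- factor(colnames(probs)[max.col(probs, ties.method = "first")],
            levels = c("fail", "pass"))

# Hypothetical actual outcomes for the same students
actual <- factor(c("pass", "fail", "fail", "pass"), levels = c("fail", "pass"))

# Bind actual class, predicted class and class probabilities side by side
results <- cbind(data.frame(actual, p), probs)
results
```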

Using the subset() function, we can pull out the cases where the predicted and actual values differ and see what is happening.
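Here is a minimal sketch of the subset() step. It assumes a hypothetical results data frame with columns actual, p, fail and pass like the one described above:

```r
# Hypothetical results table: actual class, predicted class p, class probabilities
results <- data.frame(
  actual = factor(c("pass", "fail", "fail", "pass"), levels = c("fail", "pass")),
  p      = factor(c("pass", "fail", "pass", "pass"), levels = c("fail", "pass")),
  fail   = c(0.03, 0.95, 0.40, 0.10),
  pass   = c(0.97, 0.05, 0.60, 0.90)
)

# Keep only the rows where the prediction disagrees with the actual outcome
misclassified <- subset(results, actual != p)
misclassified
```

In this toy example the single misclassified row has probabilities of 0.40 and 0.60. These are far less extreme than those of the correctly classified rows.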

Notice that for these misclassified students the probabilities are somewhat less extreme. Despite such mistakes, is the model still useful? That depends on the context of the problem. We started looking at this data as a way to decide which students should receive extra intervention to turn them from a fail into a pass. The CrossTable() function used earlier shows which kind of mistake the model makes, and this may make things more palatable. Rather than letting students who need help slip through without the intervention, we would be giving the intervention to some students who would pass anyway. This may be more or less acceptable depending on the context of the problem.

## Beyond accuracy

The confusionMatrix() function from the caret package reports further performance measures, including the sensitivity and specificity of the model, but we must specify which outcome is the “positive” one.
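Sensitivity is the proportion of actual positives the model catches, TP / (TP + FN). Specificity is the proportion of actual negatives it correctly clears, TN / (TN + FP). The base-R sketch below computes both by hand for hypothetical labels, and caret's confusionMatrix(data, reference, positive = ...) reports the same quantities. Here "fail" is assumed to be the positive class, because the students we want to flag are the ones likely to fail:

```r
# Hypothetical actual and predicted classes (illustrative only)
actual    <- factor(c("fail", "fail", "fail", "fail", "pass",
                      "pass", "pass", "pass", "pass", "pass"), levels = c("fail", "pass"))
predicted <- factor(c("fail", "fail", "fail", "pass", "fail",
                      "fail", "pass", "pass", "pass", "pass"), levels = c("fail", "pass"))

# positive: the class treated as a "hit" (assumed here to be "fail")
sens_spec <- function(predicted, actual, positive) {
  tp <- sum(predicted == positive & actual == positive)
  fn <- sum(predicted != positive & actual == positive)
  tn <- sum(predicted != positive & actual != positive)
  fp <- sum(predicted == positive & actual != positive)
  c(sensitivity = tp / (tp + fn),  # share of actual positives caught
    specificity = tn / (tn + fp))  # share of actual negatives correctly cleared
}

sens_spec(predicted, actual, positive = "fail")
```

Switching to positive = "pass" exchanges the two numbers in this two-class case. This is why the positive outcome has to be stated explicitly.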

We may prefer a situation in which a couple of students receive additional help they don’t need, with its associated costs, to one in which students miss out on passing a crucial exam. We can use this data, the model and the associated accuracy statistics to inform that decision.

## Visualising performance

To create visualisations with the ROCR package, two vectors of data are needed: the estimated probability of the positive class and the actual class values. These are combined using the prediction() function.
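In ROCR the usual steps are pred <- prediction(prob, actual), then performance(pred, measure = "tpr", x.measure = "fpr"), then plot(). The following base-R sketch on hypothetical probabilities shows roughly what those functions compute. It sweeps a threshold over the predicted probabilities and records the true and false positive rates at each step:

```r
# Hypothetical probabilities of the positive class and the actual labels
prob_pos <- c(0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20)
actual   <- c(1, 1, 0, 1, 1, 0, 0, 0)  # 1 = positive class, 0 = negative

# Sweep the threshold from high to low; classify prob >= threshold as positive
thresholds <- c(Inf, sort(unique(prob_pos), decreasing = TRUE))
tpr <- sapply(thresholds, function(th) sum(prob_pos >= th & actual == 1) / sum(actual == 1))
fpr <- sapply(thresholds, function(th) sum(prob_pos >= th & actual == 0) / sum(actual == 0))

plot(fpr, tpr, type = "l", xlab = "False positive rate", ylab = "True positive rate")
abline(0, 1, lty = 2)  # diagonal: a classifier guessing at random
```

The dashed diagonal marks random guessing. A curve that bows towards the top-left corner indicates better separation of the two classes.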

Qualitatively, our ROC curve appears to occupy the top-left corner of the diagram. This suggests that the model is closer to a perfect classifier than to one that guesses at random.
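One way to put a number on this qualitative impression is the area under the ROC curve (AUC), which ROCR returns via performance(pred, measure = "auc"). The AUC equals the probability that a randomly chosen positive case is scored higher than a randomly chosen negative one, with ties counting as half. Here is a sketch on a handful of hypothetical scores:

```r
# Hypothetical probabilities of the positive class and the actual labels
prob_pos <- c(0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20)
actual   <- c(1, 1, 0, 1, 1, 0, 0, 0)

# Compare every positive score with every negative score
pos   <- prob_pos[actual == 1]
neg   <- prob_pos[actual == 0]
pairs <- outer(pos, neg, "-")

# Fraction of pairs ranked correctly, counting ties as half
auc <- mean((pairs > 0) + 0.5 * (pairs == 0))
auc  # 14 of 16 pairs ranked correctly: 0.875
```

An AUC of 0.5 corresponds to random guessing and 1 to a perfect classifier.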

## Resampling methods

However, we still haven’t addressed how well the model performs on data it hasn’t seen (beyond the single test set). Resampling methods such as cross-validation and bootstrapping can give us a better understanding of the model’s accuracy. These will be discussed in a later post.

## References

• Cortez, P. and Silva, A. (2008). Using data mining to predict secondary school student performance.
• Lantz, B. (2013). Machine Learning with R. Packt Publishing Ltd.
• James, G., Witten, D., Hastie, T. and Tibshirani, R. (2014). An Introduction to Statistical Learning with Applications in R. Springer.
• Sing, T., Sander, O., Beerenwinkel, N. and Lengauer, T. (2005). ROCR: visualizing classifier performance in R. Bioinformatics, 21(20), 3940-3941.
• Kuhn, M., with contributions from Jed Wing, Steve Weston, Andre Williams, Chris Keefer, Allan Engelhardt, Tony Cooper, Zachary Mayer, Brenton Kenkel, the R Core Team, Michael Benesty, Reynald Lescarbeau, Andrew Ziem, Luca Scrucca, Yuan Tang and Can Candan (2016). caret: Classification and Regression Training. R package version 6.0-64. https://CRAN.R-project.org/package=caret