Sunday, November 03, 2019

AUC Calculations

Introduction

In the last post on this topic, the AUC metric of classifier performance was seen to have a dual nature. It can be viewed as a probability calculation or the area under a particular curve (AUC = area under curve). These are mathematically equivalent, and the basic principle behind both is easy to understand and useful to know.

Recall that we want to distinguish a "success" case Y1 from a "failure" case Y0, based on some information we have about each case, X1 and X0. Unless the scale is reversed, we compare the two predictor values and guess that Y1 is the case with the greater value of X. If there is a tie, we flip a coin.

The explanatory power of a predictor in these conditions derives from

(Pr[X1 > X0] - Pr[X1 < X0]) / 2.

That is, we take the probability that the predictor gives us the correct answer, subtract the probability that it gives us the wrong answer, and halve the difference. On the ROC graph, this quantity is the area between the ROC curve and the diagonal guessing line.
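To make the duality concrete, here is a small Python sketch (not from the original post; the predictor values are made up) that computes the AUC both as a pairwise-comparison probability and via the correct-minus-wrong formula:

```python
from itertools import product

# hypothetical predictor values for the "success" cases (Y1) and "failure" cases (Y0)
x1 = [3, 2, 2, 1]
x0 = [1, 1, 2, 3]

# compare every Y1 case against every Y0 case
pairs = list(product(x1, x0))
p_correct = sum(a > b for a, b in pairs) / len(pairs)   # predictor picks Y1
p_wrong   = sum(a < b for a, b in pairs) / len(pairs)   # predictor picks Y0
p_tie     = sum(a == b for a, b in pairs) / len(pairs)  # coin-flip cases

# AUC as a probability: win outright, or tie and win the coin flip
auc_direct = p_correct + 0.5 * p_tie

# the same number, recentered around the guessing rate of 1/2
auc_recentered = 0.5 + (p_correct - p_wrong) / 2
print(auc_direct, auc_recentered)  # both are 0.59375
```

The two expressions agree because the tie probability is one minus the other two, which is exactly the recentering trick worked out at the end of this post.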

Titanic Example

The distribution of passenger class is different for survivors and non-survivors. The rates are
Pclass     Y0      Y1
1        0.147   0.398
2        0.178   0.254
3        0.675   0.348

For example, for a randomly-selected survivor Y1, the probability of being in first class is about 40%, but for a non-survivor it's only 15%. So when a random selection is made from each of these columns to pick a Y1 and Y0, the combined probabilities are obtained by multiplying out each of the possibilities. See the R code at the end for how to do this. The result is a square matrix

                 Y0
           Class1  Class2  Class3
Y1 Class1   0.058   0.071   0.269
   Class2   0.037   0.045   0.172
   Class3   0.051   0.062   0.235

showing for example that the joint probability of the survivor Y1 being in first class and the non-survivor Y0 being in third class is 26.9%.
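The outer-product step can be sketched in Python as well (numpy assumed available; the rates are transcribed from the table above):

```python
import numpy as np

# conditional class distributions, transcribed from the rates table
y1 = np.array([0.398, 0.254, 0.348])  # Pr[class | survivor]
y0 = np.array([0.147, 0.178, 0.675])  # Pr[class | non-survivor]

# joint probability of each (survivor class, non-survivor class) combination,
# since the two draws are made independently
class_prob = np.outer(y1, y0)
print(class_prob.round(3))
```

Because each column sums to 1, the whole joint matrix sums to 1 as well.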

We will correctly classify the two cases (guessing which of the two is Y1) only when the passenger class is greater for Y1 (first class counting as the greatest), or when the two classes are the same and the coin flip goes our way. To reflect this probability of being correct for each entry in the matrix above, we weight the entries accordingly to obtain

         Class1  Class2  Class3
Class1    0.029   0.071   0.269
Class2    0.000   0.023   0.172
Class3    0.000   0.000   0.117

The entries above the diagonal haven't changed, and the ones on the diagonal (where the classes are equal) have been halved, since we have to flip a coin in those cases. Below the diagonal, the predictor is giving us the wrong answer: the passenger class of the survivor is lower than the non-survivor, so those cells are zeroed out--they don't add to the probability of being correct.

We can obtain the AUC by simply adding up the probabilities in the last matrix, to get .68. Here's the ROC graph.

ROC graph for passenger class predicting survival
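Continuing the Python sketch (numpy assumed), the weighting and the AUC sum look like this:

```python
import numpy as np

y1 = np.array([0.398, 0.254, 0.348])  # Pr[class | survivor]
y0 = np.array([0.147, 0.178, 0.675])  # Pr[class | non-survivor]
class_prob = np.outer(y1, y0)

# weight 1 above the diagonal (survivor's class outranks the non-survivor's),
# 1/2 on the diagonal (tie, so flip a coin), 0 below (predictor is wrong)
weights = np.triu(np.ones((3, 3)), k=1) + 0.5 * np.eye(3)

auc = (class_prob * weights).sum()
print(round(auc, 3))  # about 0.68, matching the post
```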

Improvement Over Guessing

The computation above gives us the whole AUC, which includes the cases in which we just guess correctly. Notice that the diagonal of the last matrix sums to only about 17%, far less than the 50% rate we can get by ignoring passenger class and just flipping a coin. 

To get the excess probability over guessing, which in our case is .68 - .50 = .18, we have to use the formula advertised at the top: for each combination of passenger classes, like 1 vs 2, find the probability that we get the right answer, subtract the probability that we get the wrong answer, and divide by two. That matrix, after dividing by two, looks like

         Class1  Class2  Class3
Class1    0.000   0.017   0.109
Class2   -0.017   0.000   0.055
Class3   -0.109  -0.055   0.000

We only care about the upper triangle here, above the diagonal, which is .017 + .109 + .055 = .180. (The entries below the diagonal just tell us what would happen if we had the predictor reversed: we'd get an AUC below .5 by .18.)
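The transpose trick produces the same .18 directly; in Python (numpy assumed):

```python
import numpy as np

y1 = np.array([0.398, 0.254, 0.348])  # Pr[class | survivor]
y0 = np.array([0.147, 0.178, 0.675])  # Pr[class | non-survivor]
class_prob = np.outer(y1, y0)

# correct-minus-wrong for each class combination, halved
prob_diff = (class_prob - class_prob.T) / 2

# the excess over guessing is the sum of the upper triangle
excess = prob_diff[np.triu_indices(3, k=1)].sum()
print(round(excess, 3))  # about 0.18
```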

The steps in the calculation of the excess probability are the same as finding the area of a triangular region, which is one way the AUC can be calculated. The comparison is shown in the areas in the image below.


[Edit] Why Divide by Two?

The divide-by-two in the formula may seem mysterious. Here's where it comes from. I'll demonstrate for the 2x2 case. The general result comes from the same idea.

If we go back to our column-normalized matrix 

       Y=0   Y=1
X=0     a     b
X=1     c     d

and multiply columns (outer product of the second column with the first), we get the probability of each possibility:

ab   bc
ad   cd

The predictive power comes from the ad case, where the values of X are in the correct direction to pick Y=1. As we saw before, 

AUC = ad + (1/2)(ab + cd)

where the second term is due to guessing when the predictor values are the same for both cases. The algebraic trick we need is the fact that the second matrix above sums to 1, so that we could also write 

AUC = 1 - bc - (1/2)(ab + cd)

Then we just add the two AUC formulas and divide by two to get

AUC = 1/2 + (1/2)(ad - bc)

The explanatory power beyond guessing is what we get after subtracting the 1/2. In the general case, we have to sum up all the pieces that look like ad - bc, which amounts to taking the probability matrix, subtracting its transpose, summing everything above the diagonal, and dividing by two.
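A quick numeric check of the 2x2 identity, with made-up column-normalized probabilities for a, b, c, d:

```python
# hypothetical column-normalized cell probabilities from the 2x2 table
a, b = 0.7, 0.4   # Pr[X=0 | Y=0], Pr[X=0 | Y=1]
c, d = 0.3, 0.6   # Pr[X=1 | Y=0], Pr[X=1 | Y=1]

# AUC directly: the ad case wins outright; ties (ab and cd) earn half credit
auc_direct = a * d + 0.5 * (a * b + c * d)

# AUC via the identity derived above
auc_identity = 0.5 + 0.5 * (a * d - b * c)
print(auc_direct, auc_identity)  # both equal 0.65 up to floating-point rounding
```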

This, in turn, is just an application of a kind of probabilistic "recentering around .5" you can do with any event A. If A' is the event's complement, then

Pr(A') = 1 - Pr(A), so 2 Pr(A) = Pr(A) + 1 - Pr(A'), or

Pr(A) = 1/2 + (1/2)(Pr(A) - Pr(A'))

The direct summing of probabilities in the frequency matrix is the left side of the equation. The summing of triangles in the geometric AUC calculation is the right side.

R code

library(tidyverse)
library(knitr) # optional -- pretty tables

# read in data 
pass <- read_csv("titanic.csv")

# calculate the passenger class frequencies conditional on survival status
rates <- pass %>% 
  select(Y = Survived, Pclass) %>% 
  group_by(Y, Pclass) %>% 
  summarize(N = n()) %>% 
  mutate(P = N / sum(N)) %>%
  select(-N) %>% 
  spread(Y, P, sep = "_")

# print out the matrix
rates %>% kable(digits = 3)

# create the matrix of probabilities for class combinations for Y0 and Y1
class_prob <- outer(rates$Y_1, rates$Y_0) 
rownames(class_prob) <- colnames(class_prob) <- c("Class1","Class2","Class3")

# print out the matrix
class_prob %>% kable(digits = 3)

# when the survivor's class outranks the non-survivor's, the weight is 1.
# When the classes match (X1 = X0), we have to guess, so the weight is .5.
# When the non-survivor's class outranks, we'll get the wrong answer, so its weight is zero
case_weights <- upper.tri(class_prob) + 0 
diag(case_weights) <- .5 # classes match, so guess

# print out the matrix
(class_prob*case_weights) %>% kable(digits = 3) 

# compute the AUC
sum(class_prob*case_weights)

# compute the explanatory power in excess of guessing
prob_diff <- (class_prob - t(class_prob)) / 2

# print it
prob_diff %>% kable(digits = 3)

sum(prob_diff[upper.tri(prob_diff)]) 
