Friday, May 21, 2010

Conditional Probabilities

The topic of conditional probabilities was on my wish list for the well-educated critical thinker.  I have a personal reason for this, actually.  When my daughter Epsilon was being mechanically assembled in the usual manner, it was an unusually stressful time.  Part of the reason is that we decided to do an exchange visit to Shanghai during that time, and Epsilon ended up being born only a month after we got back.  That probably wasn't the smartest thing to do (that's not exactly how the obstetrics doctor said it!), but it worked out okay.  Before we left, however, I got the worst phone call of my life.  "The lab tests show that your child has tested positive for genetic abnormality X," was the gist of the message from the nurse.  Here, X is a big, life-changing abnormality.  Not a snub nose.  This was on a Friday, so we had to wait all weekend to come in and actually see the results.  I can't describe the amount of anxiety that phone call provoked.  By Monday I had my questions ready.  What's the probability that genetic abnormality X exists if the test shows negative? I asked.  In other words, what's the probability of a false negative?  One in 430 (or thereabouts--I don't remember exactly), she told me.  Okay, what's the probability that the condition exists when the test shows positive?  About one in 420.  For this very small, probably statistically insignificant change in odds, we'd gone through a hellish few days and then further tests (to assure my wife).  Epsilon turned out fine, except for inheriting my sense of the absurd.  The whole episode would have been avoided if the nurse had had a lick of understanding of probabilities (or just basic humanity).  The mistake was similar to that of students who mean to compute (9+6)/3 on their calculators and happily write down 11 as the answer (which is 9 + 6/3).

All of this by way of introducing a nice puzzler on the topic.  The source for this is this blog post.  If I told you that I had two children and that the first was male, and asked what you thought the probability of the second being male was, we could write it in conditional probability notation like this:
Pr[2nd child is male | there are two children, and the 1st child is male]
The vertical bar is read as 'given that'.  In real life, this isn't so simple, since gender is determined by the chromosomal contributions of the male, and we can't necessarily assume it's 50-50 (some males have only male children, for example).  Also, slightly more males than females are born because mortality rates aren't the same.  But assume these complications away.  Assume it's 50%.

What if we made the question more general?  Find:
Pr[both children are male | there are two children, and at least one child is male]
It looks similar, but it's subtly and significantly different.  The probability is not 50% anymore.  Now it gets really strange.  Find:
Pr[both children are male | there are two children, and at least one child is male born on a Tuesday]
The probability is now different from the previous one, which seems to confound all common sense.  How can such "irrelevant" information as the day of week of birth have any impact on whether the other child is a male or not?  A full discussion of the problem and solution can be found on the post.  And there's an interesting discussion on reddit, where I found the problem.
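
If you'd rather check the three answers than take them on faith, here is a minimal sketch in Python that simply counts cases.  It is my own illustration, not the method from the linked post, and it leans on some assumptions: each child is male or female with probability 50%, each day of the week is equally likely, the two children are independent, and the "at least one" statements are read as a filter over all two-child families.  The names (FAMILIES, conditional, and so on) are made up for this example.

```python
from fractions import Fraction
from itertools import product

SEXES = ("M", "F")
DAYS = range(7)   # seven equally likely birth days (an assumption)
TUESDAY = 1       # which index stands for Tuesday is arbitrary

# Every (sex, day) combination for one child is assumed equally likely,
# and the two children are assumed independent: 14 * 14 = 196 families.
CHILDREN = list(product(SEXES, DAYS))
FAMILIES = list(product(CHILDREN, repeat=2))


def conditional(event, given):
    """Pr[event | given], by counting equally likely families."""
    matching = [f for f in FAMILIES if given(f)]
    hits = [f for f in matching if event(f)]
    return Fraction(len(hits), len(matching))


def both_male(family):
    return all(sex == "M" for sex, _ in family)


def first_male(family):
    return family[0][0] == "M"


def second_male(family):
    return family[1][0] == "M"


def at_least_one_male(family):
    return any(sex == "M" for sex, _ in family)


def at_least_one_tuesday_male(family):
    return any(sex == "M" and day == TUESDAY for sex, day in family)


p_first = conditional(second_male, first_male)
p_any = conditional(both_male, at_least_one_male)
p_tuesday = conditional(both_male, at_least_one_tuesday_male)

print(p_first, p_any, p_tuesday)  # 1/2 1/3 13/27
```

Under those assumptions the counts are 49 out of 98, 49 out of 147, and 13 out of 27.  The Tuesday condition keeps only 27 of the 196 families, and 13 of those have two boys, which is why the answer lands close to (but not at) one half.  If the information were obtained some other way, say by a parent picking one child at random to describe, the filter would be different and so would the answers.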