An Intuitive Explanation of Eliezer Yudkowsky's Intuitive Explanation of Bayes' Theorem
Example involving a particular mammography test that gives a positive or negative result for breast cancer. Since the patient either has or does not have breast cancer, there are 4 possibilities:
- patient has breast cancer, tests positive: true positive
- patient has breast cancer, tests negative: false negative
- patient does not have breast cancer, tests positive: false positive
- patient does not have breast cancer, tests negative: true negative.
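As a minimal sketch (the function and variable names are mine, not from the essay), the four outcomes can be enumerated directly:

```python
def classify(has_cancer: bool, tests_positive: bool) -> str:
    """Label one patient/test outcome with its confusion-matrix term."""
    if has_cancer and tests_positive:
        return "true positive"
    if has_cancer and not tests_positive:
        return "false negative"
    if not has_cancer and tests_positive:
        return "false positive"
    return "true negative"

# All four combinations of condition and test result:
for has_cancer in (True, False):
    for tests_positive in (True, False):
        print(has_cancer, tests_positive, "->", classify(has_cancer, tests_positive))
```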
Terms
prior probability: the probability of some condition being true (e.g., the probability that a patient has breast cancer)
conditional probability: the probability of some event, in the case that some other event is known (e.g., the probability of a particular mammography result in the case that the patient is known to have breast cancer)
priors: prior probability and conditional probabilities
revised probability / posterior probability: the updated probability of some hypothesis after taking into account that some condition holds (e.g., the probability that a patient has breast cancer given a particular mammography result).
If the conditional probabilities are the same, then the posterior probability equals the prior probability (see the numerical check after the notation below).
p(a)
: the probability of a
p(a|x)
: the probability of a, given that x is the case
p(~a)
: the probability of not a
p(a&x)
: the probability of both a and x.
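A quick numerical check of the claim above that equal conditional probabilities leave the posterior equal to the prior (a minimal sketch; it uses the Bayes' Theorem formula given later in this section, and the 1%/80%/9.6% figures are purely illustrative):

```python
def posterior(p_a: float, p_x_given_a: float, p_x_given_not_a: float) -> float:
    """p(a|x) = p(a&x) / p(x), with p(x) built from the two conditional probabilities."""
    p_x = p_x_given_a * p_a + p_x_given_not_a * (1 - p_a)
    return p_x_given_a * p_a / p_x

print(posterior(0.01, 0.8, 0.8))    # equal conditionals: posterior stays at the 1% prior
print(posterior(0.01, 0.8, 0.096))  # different conditionals: posterior shifts to ~7.8%
```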
Example involving blue or red eggs that may or may not contain pearls.
Case of Sally Clark: the pediatrician Roy Meadow testified about the odds of two children dying of SIDS vs. the odds of a parent murdering two of her children. His miscalculation got her convicted and destroyed her life.
The difference between the two conditional probabilities determines how much your estimate changes when you gain new information: if they are equal, the evidence shifts nothing, and the larger the gap, the further the evidence can push the posterior toward 0% or 100%.
Visualizing probabilities
Best ways to get people to think in Bayesian terms
- to-scale visualizations
- natural frequencies (x out of y, where x and y are natural numbers; see the sketch after this list)
- frequencies (x out of 100, where x is real)
- percentages (x percent, where x is real)
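For instance, here is the mammography example restated in natural frequencies (a sketch; the 80% hit rate and 9.6% false-positive rate come from the example later in this section, while the 1% prevalence and the population of 10,000 are assumed figures for illustration):

```python
# Natural-frequency restatement of the mammography example (assumed figures).
population = 10_000
with_cancer = round(population * 0.01)                        # 100 women have cancer
true_positives = round(with_cancer * 0.80)                    # 80 of them test positive
false_positives = round((population - with_cancer) * 0.096)   # 950 healthy women also test positive

print(f"{true_positives} out of {true_positives + false_positives} women "
      f"with a positive result actually have cancer")          # 80 out of 1030, about 1 in 13
```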
How the quantities relate
In statistics, the number of degrees of freedom is the number of values in the final calculation of a statistic that are free to vary.
e.g., the following sets of probabilities have the indicated numbers of degrees of freedom:
- 1 degree of freedom: p(a) and p(~a)
- 1 degree of freedom: p(x|a) and p(~x|a)
- 2 degrees of freedom: p(x|a) and p(x|~a)
- 2 degrees of freedom: p(a&x), p(a|x), and p(x)
  - Because p(a&x) = p(a|x) * p(x)
- 2 degrees of freedom: p(a&x), p(a&~x), and p(a)
  - Because p(a&x) + p(a&~x) = p(a)
- 3 degrees of freedom: p(a&x), p(a&~x), p(~a&x), and p(~a&~x)
  - Because the 4 comprise the universe of possibilities, any 3 determine the 4th.
Meanwhile, there are 16 facts that can be known from our 3 priors (a worked sketch follows the list):
p(x) = p(a&x) + p(~a&x)
p(~x) = p(a&~x) + p(~a&~x)
p(a|x) = p(a&x) / ( p(a&x) + p(~a&x) )
p(~a|x) = p(~a&x) / ( p(a&x) + p(~a&x) )
p(a|~x) = p(a&~x) / ( p(a&~x) + p(~a&~x) )
p(~a|~x) = p(~a&~x) / ( p(a&~x) + p(~a&~x) )
p(a&x) = p(a|x) * p(x)
p(a&~x) = p(a|~x) * p(~x)
p(~a&x) = p(~a|x) * p(x)
p(~a&~x) = p(~a|~x) * p(~x)
p(a) = p(a&x) + p(a&~x)
p(~a) = p(~a&x) + p(~a&~x)
p(x|a) = p(a&x) / ( p(a&x) + p(a&~x) )
p(~x|a) = p(a&~x) / ( p(a&x) + p(a&~x) )
p(x|~a) = p(~a&x) / ( p(~a&x) + p(~a&~x) )
p(~x|~a) = p(~a&~x) / ( p(~a&x) + p(~a&~x) )
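A minimal sketch of the whole table, starting from the 3 priors of the mammography example (the 80% and 9.6% conditional rates from the example below, plus an assumed 1% prior). Every other quantity follows, and the 4 joint probabilities indeed carry only 3 degrees of freedom since they sum to 1:

```python
# The 3 priors: p(a), p(x|a), p(x|~a)  (the 1% prior is an assumed figure)
p_a, p_x_a, p_x_na = 0.01, 0.80, 0.096

# The 4 joint probabilities (they sum to 1, so any 3 determine the 4th)
p_ax   = p_x_a * p_a               # p(a&x)
p_anx  = (1 - p_x_a) * p_a         # p(a&~x)
p_nax  = p_x_na * (1 - p_a)        # p(~a&x)
p_nanx = (1 - p_x_na) * (1 - p_a)  # p(~a&~x)

# Marginals and posteriors, as in the table above
p_x    = p_ax + p_nax              # p(x)
p_nx   = p_anx + p_nanx            # p(~x)
p_a_x  = p_ax / p_x                # p(a|x)  ~ 0.078
p_a_nx = p_anx / p_nx              # p(a|~x) ~ 0.002

print(p_ax + p_anx + p_nax + p_nanx, p_x, p_a_x, p_a_nx)
```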
Likelihood ratios
Conservation of Probability: Since p(a) + p(~a) = 1, and p(x) = ( p(x|a) * p(a) ) + ( p(x|~a) * p(~a) ), we know that rare but strong evidence from one of the conditional probabilities must be balanced by common but weak evidence from the other conditional probability.
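As a quick numerical check of this balance (using the same assumed mammography figures as above): a positive result is a rare but large update, a negative result is a common but small one, and averaged over the two possible results the posterior comes back to the prior.

```python
p_a, p_x_a, p_x_na = 0.01, 0.80, 0.096        # assumed illustrative figures

p_x    = p_x_a * p_a + p_x_na * (1 - p_a)      # p(x) by total probability, ~0.10
p_a_x  = p_x_a * p_a / p_x                     # p(a|x):  large update, happens ~10% of the time
p_a_nx = (1 - p_x_a) * p_a / (1 - p_x)         # p(a|~x): small update, happens ~90% of the time

# The expected posterior over the two possible results equals the prior
print(p_a_x * p_x + p_a_nx * (1 - p_x))        # 0.01
```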
Likelihood ratio: the likelihood of a true positive vs. the likelihood of a false positive, i.e., p(x|a) / p(x|~a).
In the context of the mammography example:
A mammography with a hit rate of 80% for patients with breast cancer and a false positive rate of 9.6% for healthy patients has the same likelihood ratio as a test with an 8% hit rate and a false positive rate of 0.96%. Although these two tests have the same likelihood ratio, the first test is more useful in every way – it detects disease more often, and a negative result is stronger evidence of health.
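A sketch comparing the two tests (the 1% prior prevalence is again an assumed figure): both give the same posterior after a positive result, because the likelihood ratios match, but a negative result from the first test is much stronger evidence of health.

```python
def posterior(prior: float, p_pos_sick: float, p_pos_healthy: float, positive: bool) -> float:
    """Posterior probability of disease given a positive or negative test result."""
    if positive:
        like_sick, like_healthy = p_pos_sick, p_pos_healthy
    else:
        like_sick, like_healthy = 1 - p_pos_sick, 1 - p_pos_healthy
    numerator = like_sick * prior
    return numerator / (numerator + like_healthy * (1 - prior))

# Test A: 80% hit rate, 9.6% false positives.  Test B: 8% hit rate, 0.96% false positives.
print(posterior(0.01, 0.80, 0.096, True),  posterior(0.01, 0.08, 0.0096, True))   # same ~7.8%
print(posterior(0.01, 0.80, 0.096, False), posterior(0.01, 0.08, 0.0096, False))  # ~0.22% vs ~0.93%
```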
Decibels of evidence
Independent pieces of evidence have a multiplicative effect on the odds, so it's useful to measure evidence in logarithmic terms.
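A minimal sketch of the decibel bookkeeping, where evidence strength is 10 * log10 of the likelihood ratio (the second, equally strong independent test is an assumption for illustration):

```python
import math

def decibels(likelihood_ratio: float) -> float:
    """Evidence strength in decibels: 10 * log10 of the likelihood ratio."""
    return 10 * math.log10(likelihood_ratio)

one_positive  = decibels(0.80 / 0.096)         # a positive mammography: ~ +9.2 dB toward cancer
two_positives = decibels((0.80 / 0.096) ** 2)  # two independent positives multiply the odds...
print(one_positive, two_positives)             # ...so their decibels simply add: ~ +18.4 dB
```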
Behold, Bayes' Theorem!
For hypothesis H that we want to investigate, and an observation E that is evidence about H, Bayes' Theorem tells us how to update the probability that H is true:
p(H|E) = ( p(E|H) * p(H) ) / ( ( p(E|H) * p(H) ) + ( p(E|~H) * p(~H) ) )
which is reducible to a more succinct (but less descriptive) form:
p(H|E) = ( p(E|H) * p(H) ) / p(E)
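Plugging in the mammography numbers as a worked check of the formula (80% hit rate, 9.6% false-positive rate, and an assumed 1% prior):

```python
def bayes(p_h: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """p(H|E) = p(E|H) * p(H) / ( p(E|H) * p(H) + p(E|~H) * p(~H) )"""
    return p_e_given_h * p_h / (p_e_given_h * p_h + p_e_given_not_h * (1 - p_h))

print(bayes(0.01, 0.80, 0.096))  # ~0.078: a positive mammography raises a 1% prior to about 7.8%
```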
Why is this so exciting?
- The Bayesian method defines the maximum amount of mileage you can get out of a given piece of evidence, in the same way that thermodynamics defines the maximum amount of work you can get out of a temperature differential.
- Science itself is a special case of Bayes' Theorem; experimental evidence is Bayesian evidence.