An Intuitive Explanation of Eliezer Yudkowsky's Intuitive Explanation of Bayes' Theorem
Example involving a particular mammography test that gives a positive or negative result for breast cancer. Since the patient either has or does not have breast cancer, there are 4 possibilities:
- patient has breast cancer, tests positive: true positive
- patient has breast cancer, tests negative: false negative
- patient does not have breast cancer, tests positive: false positive
- patient does not have breast cancer, tests negative: true negative.
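As a minimal sketch (the function and variable names are mine, not from the essay), the four outcomes can be enumerated directly:

```python
def classify(has_cancer: bool, tests_positive: bool) -> str:
    """Label one patient/test outcome with its confusion-matrix term."""
    if has_cancer and tests_positive:
        return "true positive"
    if has_cancer and not tests_positive:
        return "false negative"
    if not has_cancer and tests_positive:
        return "false positive"
    return "true negative"

# All four combinations of condition and test result:
for has_cancer in (True, False):
    for tests_positive in (True, False):
        print(has_cancer, tests_positive, "->", classify(has_cancer, tests_positive))
```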
Terms
prior probability: the probability of some condition being true (e.g., the probability that a patient has breast cancer)
conditional probability: the probability of some event, in the case that some other event is known (e.g., the probability of a particular mammography result in the case that the patient is known to have breast cancer)
priors: prior probability and conditional probabilities
revised probability / posterior probability: the updated probability of some hypothesis after taking into account that some condition holds (e.g., the probability that a patient has breast cancer given a particular mammography result).
If the conditional probabilities are the same, then the posterior probability equals the prior probability (see the numerical check after the notation below).
p(a)
: the probability of a
p(a|x)
: the probability of a, given that x is the case
p(~a)
: the probability of not a
p(a&x)
: the probability of both a and x.
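A quick numerical check of the claim above that equal conditional probabilities leave the posterior equal to the prior (a minimal sketch; it uses the Bayes' Theorem formula given later in this section, and the 1%/80%/9.6% figures are purely illustrative):

```python
def posterior(p_a: float, p_x_given_a: float, p_x_given_not_a: float) -> float:
    """p(a|x) = p(a&x) / p(x), with p(x) built from the two conditional probabilities."""
    p_x = p_x_given_a * p_a + p_x_given_not_a * (1 - p_a)
    return p_x_given_a * p_a / p_x

print(posterior(0.01, 0.8, 0.8))    # equal conditionals: posterior stays at the 1% prior
print(posterior(0.01, 0.8, 0.096))  # different conditionals: posterior shifts to ~7.8%
```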
Example involving blue or red eggs that may or may not contain pearls.
Case of Sally Clark: the pediatrician Roy Meadow testified about the odds of two children dying of SIDS vs. the odds of a parent murdering two of her children. His miscalculation got her convicted and destroyed her life.
The difference between the two conditional probabilities determines how much your estimate changes when you gain new information: if they are equal, the evidence shifts nothing, and the larger the gap, the further the evidence can push the posterior toward 0% or 100%.
Visualizing probabilities
Best ways to get people to think in Bayesian terms
- to-scale visualizations
- natural frequencies (x out of y, where x and y are natural numbers; see the sketch after this list)
- frequencies (x out of 100, where x is real)
- percentages (x percent, where x is real)
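For instance, here is the mammography example restated in natural frequencies (a sketch; the 80% hit rate and 9.6% false-positive rate come from the example later in this section, while the 1% prevalence and the population of 10,000 are assumed figures for illustration):

```python
# Natural-frequency restatement of the mammography example (assumed figures).
population = 10_000
with_cancer = round(population * 0.01)                        # 100 women have cancer
true_positives = round(with_cancer * 0.80)                    # 80 of them test positive
false_positives = round((population - with_cancer) * 0.096)   # 950 healthy women also test positive

print(f"{true_positives} out of {true_positives + false_positives} women "
      f"with a positive result actually have cancer")          # 80 out of 1030, about 1 in 13
```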
How the quantities relate
In statistics, the number of degrees of freedom is the number of values in the final calculation of a statistic that are free to vary.
e.g., the following sets of probabilities have the indicated numbers of degrees of freedom:
- 1 degree of freedom: p(a) and p(~a)
- 1 degree of freedom: p(x|a) and p(~x|a)
- 2 degrees of freedom: p(x|a) and p(x|~a)
- 2 degrees of freedom: p(a&x), p(a|x), and p(x)
  - Because p(a&x) = p(a|x) * p(x)
- 2 degrees of freedom: p(a&x), p(a&~x), and p(a)
  - Because p(a&x) + p(a&~x) = p(a)
- 3 degrees of freedom: p(a&x), p(a&~x), p(~a&x), and p(~a&~x)
  - Because the 4 comprise the universe of possibilities, any 3 determine the 4th.
Meanwhile, there are 16 facts that can be known from our 3 priors (a worked sketch follows the list):
p(x) = p(a&x) + p(~a&x)
p(~x) = p(a&~x) + p(~a&~x)
p(a|x) = p(a&x) / ( p(a&x) + p(~a&x) )
p(~a|x) = p(~a&x) / ( p(a&x) + p(~a&x) )
p(a|~x) = p(a&~x) / ( p(a&~x) + p(~a&~x) )
p(~a|~x) = p(~a&~x) / ( p(a&~x) + p(~a&~x) )
p(a&x) = p(a|x) * p(x)
p(a&~x) = p(a|~x) * p(~x)
p(~a&x) = p(~a|x) * p(x)
p(~a&~x) = p(~a|~x) * p(~x)
p(a) = p(a&x) + p(a&~x)
p(~a) = p(~a&x) + p(~a&~x)
p(x|a) = p(a&x) / ( p(a&x) + p(a&~x) )
p(~x|a) = p(a&~x) / ( p(a&x) + p(a&~x) )
p(x|~a) = p(~a&x) / ( p(~a&x) + p(~a&~x) )
p(~x|~a) = p(~a&~x) / ( p(~a&x) + p(~a&~x) )
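A minimal sketch of the whole table, starting from the 3 priors of the mammography example (the 80% and 9.6% conditional rates from the example below, plus an assumed 1% prior). Every other quantity follows, and the 4 joint probabilities indeed carry only 3 degrees of freedom since they sum to 1:

```python
# The 3 priors: p(a), p(x|a), p(x|~a)  (the 1% prior is an assumed figure)
p_a, p_x_a, p_x_na = 0.01, 0.80, 0.096

# The 4 joint probabilities (they sum to 1, so any 3 determine the 4th)
p_ax   = p_x_a * p_a               # p(a&x)
p_anx  = (1 - p_x_a) * p_a         # p(a&~x)
p_nax  = p_x_na * (1 - p_a)        # p(~a&x)
p_nanx = (1 - p_x_na) * (1 - p_a)  # p(~a&~x)

# Marginals and posteriors, as in the table above
p_x    = p_ax + p_nax              # p(x)
p_nx   = p_anx + p_nanx            # p(~x)
p_a_x  = p_ax / p_x                # p(a|x)  ~ 0.078
p_a_nx = p_anx / p_nx              # p(a|~x) ~ 0.002

print(p_ax + p_anx + p_nax + p_nanx, p_x, p_a_x, p_a_nx)
```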
Likelihood ratios
Conservation of Probability: Since p(a) + p(~a) = 1, and p(x) = ( p(x|a) * p(a) ) + ( p(x|~a) * p(~a) ), we know that rare but strong evidence from one of the conditional probabilities must be balanced by common but weak evidence from the other conditional probability.
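As a quick numerical check of this balance (using the same assumed mammography figures as above): a positive result is a rare but large update, a negative result is a common but small one, and averaged over the two possible results the posterior comes back to the prior.

```python
p_a, p_x_a, p_x_na = 0.01, 0.80, 0.096        # assumed illustrative figures

p_x    = p_x_a * p_a + p_x_na * (1 - p_a)      # p(x) by total probability, ~0.10
p_a_x  = p_x_a * p_a / p_x                     # p(a|x):  large update, happens ~10% of the time
p_a_nx = (1 - p_x_a) * p_a / (1 - p_x)         # p(a|~x): small update, happens ~90% of the time

# The expected posterior over the two possible results equals the prior
print(p_a_x * p_x + p_a_nx * (1 - p_x))        # 0.01
```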
Likelihood ratio: the likelihood of a true positive vs. the likelihood of a false positive, i.e., p(x|a) / p(x|~a).
In the context of the mammography example:
A mammography with a hit rate of 80% for patients with breast cancer and a false positive rate of 9.6% for healthy patients has the same likelihood ratio as a test with an 8% hit rate and a false positive rate of 0.96%. Although these two tests have the same likelihood ratio, the first test is more useful in every way – it detects disease more often, and a negative result is stronger evidence of health.
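A sketch comparing the two tests (the 1% prior prevalence is again an assumed figure): both give the same posterior after a positive result, because the likelihood ratios match, but a negative result from the first test is much stronger evidence of health.

```python
def posterior(prior: float, p_pos_sick: float, p_pos_healthy: float, positive: bool) -> float:
    """Posterior probability of disease given a positive or negative test result."""
    if positive:
        like_sick, like_healthy = p_pos_sick, p_pos_healthy
    else:
        like_sick, like_healthy = 1 - p_pos_sick, 1 - p_pos_healthy
    numerator = like_sick * prior
    return numerator / (numerator + like_healthy * (1 - prior))

# Test A: 80% hit rate, 9.6% false positives.  Test B: 8% hit rate, 0.96% false positives.
print(posterior(0.01, 0.80, 0.096, True),  posterior(0.01, 0.08, 0.0096, True))   # same ~7.8%
print(posterior(0.01, 0.80, 0.096, False), posterior(0.01, 0.08, 0.0096, False))  # ~0.22% vs ~0.93%
```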
Decibels of evidence
Independent pieces of evidence have a multiplicative effect on the odds, so it's useful to measure evidence in logarithmic terms.
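A minimal sketch of the decibel bookkeeping, where evidence strength is 10 * log10 of the likelihood ratio (the second, equally strong independent test is an assumption for illustration):

```python
import math

def decibels(likelihood_ratio: float) -> float:
    """Evidence strength in decibels: 10 * log10 of the likelihood ratio."""
    return 10 * math.log10(likelihood_ratio)

one_positive  = decibels(0.80 / 0.096)         # a positive mammography: ~ +9.2 dB toward cancer
two_positives = decibels((0.80 / 0.096) ** 2)  # two independent positives multiply the odds...
print(one_positive, two_positives)             # ...so their decibels simply add: ~ +18.4 dB
```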
Behold, Bayes' Theorem!
For hypothesis H that we want to investigate, and an observation E that is evidence about H, Bayes' Theorem tells us how to update the probability that H is true:
p(H|E) = ( p(E|H) * p(H) ) / ( ( p(E|H) * p(H) ) + ( p(E|~H) * p(~H) ) )
which is reducible to a more succinct (but less descriptive) form:
p(H|E) = ( p(E|H) * p(H) ) / p(E)
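Plugging in the mammography numbers as a worked check of the formula (80% hit rate, 9.6% false-positive rate, and an assumed 1% prior):

```python
def bayes(p_h: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """p(H|E) = p(E|H) * p(H) / ( p(E|H) * p(H) + p(E|~H) * p(~H) )"""
    return p_e_given_h * p_h / (p_e_given_h * p_h + p_e_given_not_h * (1 - p_h))

print(bayes(0.01, 0.80, 0.096))  # ~0.078: a positive mammography raises a 1% prior to about 7.8%
```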
Why is this so exciting?
- The Bayesian method defines the maximum amount of mileage you can get out of a given piece of evidence, in the same way that thermodynamics defines the maximum amount of work you can get out of a temperature differential.
- Science itself is a special case of Bayes' Theorem; experimental evidence is Bayesian evidence.