Bayes’ Theorem 1: The Idea

I was going to do one more music theory post, but it seemed way more effort than it was worth. I’ll definitely come back to this topic in the future. I really want to look at the crazy huge book The Topos of Music and try to distill out what the main idea is. So someday you can look forward to that.

We’ll move on to a topic that used to fascinate me a lot, and then I sort of forgot about it. I started reading Richard Carrier’s book on using Bayes’ theorem in the historical method, so it has come up again. I just started the book, so I might not talk about it in particular, but over the years I’ve come across some very fascinating applications of Bayes’ theorem to surprising situations.

What does it say? Well, the simplest form of it is just a formula for calculating a probability when you have some information (technically I’m referring to a conditional probability). Suppose A and B are two events such as “it is raining in Seattle” and “I am carrying an umbrella.” The negation, {-A}, would then be “it is not raining in Seattle.”

We will use the notation {P(A)} to denote the probability that {A} happens. We will use the notation {P(A|B)} to mean “the probability that {A} happens given that {B} has happened.” Now in a simple two-event situation like this, Bayes’ theorem says we can calculate the conditional probability as follows:

\displaystyle P(A|B)=\frac{P(B|A)P(A)}{P(B|A)P(A)+P(B|-A)P(-A)}

There are tons of equivalent ways to express this, but this is the one we’ll find most useful for now. Before reading the example below it is important to remember that this really is a “theorem” with a rigorous proof. We can have all sorts of philosophical debates about what it means to actually know the probability of an event happening with varying levels of certainty, but what cannot be debated is that if you accept that we somehow know the probabilities {P(B|A)}, {P(A)}, {P(B|-A)} and {P(-A)} (of course the last one is redundant, since {P(-A)=1-P(A)}) then we can know {P(A|B)} using the formula with the same level of certainty.
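To make the formula concrete, here is a minimal Python sketch of the two-event form. The function name `posterior` and the rain and umbrella probabilities are my own made-up assumptions for illustration, not measured values.

```python
def posterior(p_b_given_a, p_a, p_b_given_not_a):
    """Return P(A|B) using the two-event form of Bayes' theorem."""
    p_not_a = 1.0 - p_a  # P(-A) is redundant once P(A) is known
    numerator = p_b_given_a * p_a
    denominator = numerator + p_b_given_not_a * p_not_a
    return numerator / denominator


# Hypothetical numbers (assumptions, chosen only for illustration):
# A = "it is raining in Seattle", B = "I am carrying an umbrella"
p_rain = 0.3                 # P(A)
p_umbrella_given_rain = 0.9  # P(B|A)
p_umbrella_given_dry = 0.2   # P(B|-A)

p_rain_given_umbrella = posterior(p_umbrella_given_rain, p_rain, p_umbrella_given_dry)
print(round(p_rain_given_umbrella, 3))  # 0.27 / (0.27 + 0.14), about 0.659
```

With these invented inputs, seeing the umbrella raises the probability of rain from 30% to roughly 66%.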

I first saw this as an undergrad in an introduction to statistics and probability class. I then went on to tutor nursing majors in a similar class for several years, so maybe my prototypical application of this is skewed by my experience. Still, it gives a really good idea of why this theorem is useful in tons and tons of everyday situations.

Let’s say a new disease has just been discovered: Hilbert’s disease (I’m pretty sure this isn’t real). Doctors develop a highly accurate way to test for the disease. It turns out (through testing a huge sample of the population) that 99% of the people who actually have the disease test positive for it (in the language of conditional probability we could say “the probability that you test positive given that you actually have the disease is 99%”) and 99% of the people who don’t have the disease test negative. In other words, false positives and false negatives each occur only one percent of the time.

Now this is a newly discovered disease, so it turns out that very few people have it. Specifically only 1% of the population has it. There is also no known cause or early symptoms (I throw this in so that when I say “you” in the next sentence you are truly a random choice from the population). You decide to go get tested. Oops. You test positive. What is the Bayesian probability that you actually have the disease?

If you haven’t seen this before, you might be tempted to say that since the test is 99% accurate, there must be a 99% chance you have the disease. But this is your human intuition at work, and if there is one thing we know about the human brain, it is that it is notoriously bad at intuiting probabilities (just think of the infamous Monty Hall controversy).

Well, we can just plug all the numbers into Bayes’ theorem. If {A} is the event of testing positive for the disease and {B} is the event of actually having the disease, then we want to calculate {P(B|A)}, the probability that you have the disease given the information of testing positive. (Note that {A} and {B} have swapped roles compared to the general formula above.)

Bayes’ theorem says

\displaystyle P(B|A)=\frac{P(A|B)P(B)}{P(A|B)P(B)+P(A|-B)P(-B)}=\frac{(.99)(.01)}{(.99)(.01)+(.01)(.99)}=.5

What?! This says there is only a 50% chance that you have the disease even though the test is 99% accurate and you tested positive for it. If you find this surprising it is because you are ignoring a huge piece of information. Bayes’ theorem is accounting for the fact that we know that only one percent of the population actually has the disease. If you really are a random member of the population, then there is a huge chance you don’t have the disease. So if you test positive, there is a real chance that you are one of the false positives: in this example, false positives are exactly as common as true positives.
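One way to see where the 50% comes from is to count people instead of multiplying probabilities. The sketch below assumes a population of 10,000 (my choice, picked only so that every count is a whole number) and uses the rates from the example.

```python
from fractions import Fraction

population = 10_000                     # assumed size, so the counts come out whole
prevalence = Fraction(1, 100)           # 1% of people have the disease
sensitivity = Fraction(99, 100)         # P(positive | disease)
false_positive_rate = Fraction(1, 100)  # P(positive | no disease)

sick = population * prevalence                   # 100 people
healthy = population - sick                      # 9900 people
true_positives = sick * sensitivity              # sick people who test positive
false_positives = healthy * false_positive_rate  # healthy people who test positive

# Among everyone who tests positive, what share is actually sick?
p_sick_given_positive = true_positives / (true_positives + false_positives)
print(true_positives, false_positives, p_sick_given_positive)  # 99 99 1/2
```

The healthy group is 99 times larger than the sick group, so a 1% false positive rate produces just as many positive tests as the 99% detection rate does among the sick.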

This is pretty cool, right? It gives you a radically different perspective when you see statistics claiming that pregnancy tests are whatever percent accurate or drug tests are whatever percent accurate and so on. Anyway, that’s the gist of Bayes’ theorem. Next time we’ll see how Bayesian ideas can actually be applied to philosophy of mathematics and proof theory.


7 thoughts on “Bayes’ Theorem 1: The Idea”

  1. Fascinating stuff. Mathematics wasn’t my strong subject although I protect my vanity by claiming the subject is often badly taught. I’d be interested in any comments you could make on the curious idea of synchronicity. My common sense tells me to ignore the idea but there have been so many occasions in life where coincidence appears to be a less convincing explanation than the possibility that something is going on. In other words, I’m on a roll. Of course, so many people have lost their shirts in Vegas by clinging on to this idea. One crazy idea that occurred to me is that there might be variable densities in causality. Perhaps I should stick to music.

  2. I think this would have to be examined on a case-by-case basis. I won’t say that the idea is outright impossible, but I do know that humans are notoriously bad at keeping track of frequencies of things especially when it comes to probability.

    For example, there is the Baader-Meinhof bias and as you already pointed out the Gambler’s fallacy. There is the Regression Fallacy and many others as well.

    It is kind of depressing, and it really points out the need in the sciences to be very careful that observations are done in a non-biased way and things are recorded exactly as they happen and then a proper statistical analysis is run to determine if there is actually something going on.

    In fact, maybe I should do a post on this because there is a fascinating experiment in which a human makes up a “random” sequence of numbers and a machine produces an actual random sequence. Then people are asked to guess which one is random and which one was made up by a human. Humans usually guess wrong, and this is because we often forget that truly random things will often exhibit patterns, so when the human writes down a random list they try to avoid all patterns. This then looks “more random” than the computer generated list which didn’t avoid patterns.

  3. Thanks for that and if you do that post I’ll look forward to it. The Baader-Meinhof bias is interesting. I’ve experienced that many times.

    In music, I often use the serial development of rhythms to create a long sequence that is seemingly random but, at the same time, organically (for lack of a better word) linked.
