Some fallacies hurt you in arguments, and some hurt you in real life.
Read on to find out about how the Texas Sharpshooter Fallacy works and why even medical professionals have made this mistake to the detriment of the health of millions.
In a world of big data that constantly bombards us with fancy graphics, the statistical fallacy I think we are most likely to fall for is the Texas Sharpshooter Fallacy.
What makes this fallacy so dangerous is that it is propped up by solid, correct statistics which can be hard to argue against.
Texas Sharpshooter Fallacy Simply Explained
Here’s the idea.
A person goes into the yard and fires a rifle at the side of their barn at random. Say the person is even drunk, so the bullet holes have no underlying pattern to them.
This person then goes to the barn and draws a bull's-eye around the densest cluster of holes after the fact, making it look like they are a competent sharpshooter.
The fallacy is that if you look at a large enough amount of data with good enough visualization tools, you will probably start to find patterns that aren’t actually there by strategically drawing artificial boundaries.
Let’s make the example a bit more real.
Disease Example of Fallacy
Suppose you want to better understand the causes of Disease X, a newly discovered condition that occurs naturally in 10% of the population. You plot the cases in a nearby town of 10,000 people to see if you can find a pattern.
Here is the plot. I used a uniform distribution so we know any clumps have no underlying cause:
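A setup like this is easy to simulate. Here is a minimal sketch (the 1 km × 1 km town and the random seed are my own assumptions, not from any real data set): residents are placed uniformly at random, and disease status is assigned independently of location, so any visible clump is pure chance.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

N = 10_000          # town population
BASE_RATE = 0.10    # Disease X occurs naturally in 10% of people

# Place every resident uniformly at random in a hypothetical
# 1 km x 1 km town; disease status is independent of location.
x = rng.uniform(0, 1, N)
y = rng.uniform(0, 1, N)
sick = rng.random(N) < BASE_RATE

print(f"{sick.sum()} cases out of {N} residents")
```

Scatter-plotting `x[sick]` against `y[sick]` will show apparent clumps every time you change the seed, even though none of them mean anything.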
Your eye gets drawn to an oddly dense clump of cases of Disease X. You circle it and then run a statistical test to see if the number of cases is significant.
The properly run statistical test shows that the increased number of cases is significant at the 95% confidence level, so you conclude it isn't just a fluke. This is good enough to publish in a peer-reviewed journal.
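To see how convincing a cherry-picked cluster can look, here is one way such a test might go, using a binomial test with made-up numbers (a hypothetical circled neighborhood of 200 residents with 32 cases, where the 10% base rate predicts only about 20):

```python
from scipy.stats import binomtest

BASE_RATE = 0.10

# Hypothetical cluster you circled: 200 residents, 32 cases.
# The base rate predicts about 20 cases in a group this size.
result = binomtest(k=32, n=200, p=BASE_RATE)
print(f"p-value: {result.pvalue:.4f}")
```

The test itself is perfectly correct; the problem is that you chose which 200 residents to test only after your eye spotted the clump.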
So what do you do?
You start looking for causes. Of course, you’ll be able to find one.
Maybe that clump of houses has a power station nearby, or they drink from the same well water source, or whatever. When you’re looking for something in common, you’ll always be able to find something.
When this happens, you’ve committed the Texas Sharpshooter Fallacy.
Avoiding the Sharpshooter Fallacy
It might be okay to use this data exploration to look for a cause if you merely intend to turn it into a hypothesis to be tested. So, you hypothesize that it is radon in the water that caused the spike of cases in that cluster.
Now you must do real science: run a randomized controlled study that can actually test that hypothesis against the null.
Doing statistics on big data is risky business, because any clever person can construct correlations from a large enough data set. This has two problems:
- Those correlations may not actually be there.
- Even if they are, they’re almost surely not causally related.
Another way to see why this is a fallacy: at 95% confidence, 5 out of every 100 tests will falsely find a correlation where none exists.
This is just the definition of the p-value (it’s beyond the scope of this post, but you can learn more about it in my article on The Base Rate Fallacy).
So, if your data set is large enough to draw 100 different boundaries, then by random chance about 5 of those will show false correlations. And when your eye catches a cluster, that is your brain being good at finding patterns. It probably rejected 100 non-clusters to find that one.
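You can watch this happen in a quick simulation. The sketch below (group size and seed are arbitrary choices of mine) draws 100 purely random "boundaries" from the 10% base rate, with no real effect anywhere, and tests each at the 5% level:

```python
import numpy as np
from scipy.stats import binomtest

rng = np.random.default_rng(seed=1)

BASE_RATE = 0.10
GROUP_SIZE = 200
N_BOUNDARIES = 100

# Every group is pure noise, so every "significant" result
# counted here is a false positive by construction.
false_positives = 0
for _ in range(N_BOUNDARIES):
    cases = rng.binomial(GROUP_SIZE, BASE_RATE)
    if binomtest(cases, GROUP_SIZE, BASE_RATE).pvalue < 0.05:
        false_positives += 1

print(f"{false_positives} of {N_BOUNDARIES} boundaries look 'significant'")
```

The count will hover around a handful per run (the exact binomial test is slightly conservative, so it can come in a bit under 5), which is exactly the false-positive rate the fallacy exploits.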
Are You Being Poisoned?
This is scary in today's world, because lots of news articles do exactly this. They claim some crazy thing and use statistics people don't understand to "prove" its legitimacy (numbers can't lie, don't you know).

But really, it is just this fallacy at work. The media don't want to double-check it, because "Cancer rate five times higher near power station" is going to get a lot of hits and interest.
Actually, cancer is particularly susceptible to this type of fallacy. Dozens of cancer studies get publicity despite no actual correlation (let alone causation!).
These are documented in George Johnson's The Cancer Chronicles and in an older New Yorker article called "The Cancer-Cluster Myth." I highly recommend the book if these things interest you.
So, the next time you read about one of these public health outcries, you should pay careful attention in the article to see if this fallacy has been made.
For example, the vaccination causes autism myth also originated this way.
Probably the most egregious example is The China Study, a highly-praised vegan propaganda book. It takes the largest diet study ever done (367 variables) and pulls out the correlations that support the hypothesis “meat is poison.”
What the book doesn’t tell you is that the study found over 8,000 statistically significant correlations, many contradicting the ones presented in the book.
This is why large studies of observational epidemiology always have to be treated with caution. The larger the study, the more likely you will be able to find a way to support your hypothesis.
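The effect of piling up variables is easy to demonstrate. This sketch uses 50 independent noise variables rather than the study's 367 (purely to keep it fast; the variable count, sample size, and seed are my assumptions) and counts how many pairwise correlations come out "significant":

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(seed=2)

N_VARS = 50       # scaled-down stand-in for a many-variable study
N_SAMPLES = 100   # hypothetical number of observations per variable

# Pure noise: every variable is independent, so every correlation
# flagged below is a false positive by construction.
data = rng.standard_normal((N_SAMPLES, N_VARS))

significant = 0
pairs = 0
for i in range(N_VARS):
    for j in range(i + 1, N_VARS):
        pairs += 1
        _, p = pearsonr(data[:, i], data[:, j])
        if p < 0.05:
            significant += 1

print(f"{significant} of {pairs} pairs significant at p < 0.05")
```

With 1,225 pairs you should expect roughly 5% of them, about 60 "significant" correlations, out of data that contains no relationships at all. An author with a hypothesis to defend can simply report the ones that fit.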
If you don’t believe me, and you want to protect marriage in Maine, then make sure you eat less margarine this year: