# Confounding Variables and Apparent Bias

I was going to call this post something inflammatory like #CylonLivesMatter but decided against it. Today will be a thought experiment to clarify some confusion over whether a disparity that shows up in aggregate data reflects real bias or only apparent bias. I’ll unpack all that with a very simple example.

Let’s suppose we have a region, say a county, and we are trying to tell if car accidents disproportionately affect cylons due to bias. If you’re unfamiliar with the term, cylons come from Battlestar Galactica. They were the “bad guys,” but they had absolutely no distinguishing features. From looking at them, there was no way to tell if your best friend was one or not. I want to use them for the thought experiment so that we can be absolutely certain there is no bias based on appearance.

The county we get our data from has roughly two main areas: Location 1 and Location 2. Location 1 has 5 cylons and 95 humans. Location 2 has 20 cylons and 80 humans. This means the county is 12.5% cylon and 87.5% human.

Let’s assume that the people of Location 1 are not better drivers than anyone else; there is no behavioral reason for them to have fewer accidents. Let’s assume the difference is purely environmental, say the roads are wider and the speed limits lower or something. Location 1 only averages 1 car accident per month. Location 2, on the other hand, has poorly designed roads and bad visibility in areas, so it averages 10 car accidents per month.

At the end of the year, if there is absolutely no bias at all, we would expect to see 12 car accidents uniformly distributed among the population of Location 1 and 120 car accidents uniformly distributed among the population of Location 2. Rounding to whole accidents, this means Location 1 had 1 cylon and 11 humans in accidents (the exact expectation is 0.6 and 11.4), and Location 2 had 24 cylons and 96 humans in accidents.

We work for the county, and we take the full statistics: 25 cylon accidents and 107 human accidents. That means 19% of car accidents (25 out of 132) involve cylons, even though cylons are only 12.5% of the county’s population. As investigators into this matter, we might now conclude that, since cylons are overrepresented in car accidents relative to their baseline population, some bias or speciesism must be causing this.
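If you want to check the arithmetic, here is a minimal Python sketch of the no-bias calculation. The names (`population`, `expected_accidents`, and so on) are mine, and it assumes each accident involves exactly one resident of the location where it happens. It keeps the unrounded 0.6 cylon accidents in Location 1, so the cylon share comes out at about 18.6% rather than the rounded 19% above.

```python
# Populations and yearly accident counts taken from the example.
population = {
    "Location 1": {"cylon": 5, "human": 95},
    "Location 2": {"cylon": 20, "human": 80},
}
accidents_per_year = {"Location 1": 12, "Location 2": 120}


def expected_accidents(population, accidents_per_year):
    """Split each location's accidents in proportion to who lives there.

    This is the "no bias" assumption: within a location, every resident
    is equally likely to be the one in an accident.
    """
    expected = {"cylon": 0.0, "human": 0.0}
    for location, groups in population.items():
        residents = sum(groups.values())
        for group, count in groups.items():
            expected[group] += accidents_per_year[location] * count / residents
    return expected


def share(counts, group):
    """Fraction of the total that belongs to `group`."""
    return counts[group] / sum(counts.values())


# County-wide head count per group.
county_population = {
    group: sum(groups[group] for groups in population.values())
    for group in ("cylon", "human")
}

expected = expected_accidents(population, accidents_per_year)
population_share = share(county_population, "cylon")
accident_share = share(expected, "cylon")

print(f"cylon share of population: {population_share:.1%}")
print(f"cylon share of accidents:  {accident_share:.1%}")
```

The gap between the two printed shares appears even though the code never treats a cylon differently from a human inside either location.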

Now I think everyone knows where this is going. It is clear from the example that pooling the numbers from across the county, and then treating the disproportionately high number of cylon car accidents as evidence of some underlying, institutional problem, was the wrong thing to do. But this is the standard rhetoric of #blacklivesmatter. We hear that blacks make up roughly 13% of the population but are 25% of those killed by cops. Therefore, that basic disparity is indicative of racist motives by the cops, or at least is an institutional bias that needs to be fixed.

Recently, a more nuanced study has been making the news rounds that claims there isn’t a bias in who cops kill. How can this be? Well, what happened in our example case to cause the misleading information? A disproportionate number of cylons lived in environmental conditions that caused the car accidents. It wasn’t anyone’s fault. There wasn’t bias or speciesism at work. The lack of nuance in analyzing the statistics caused apparent bias that wasn’t there.
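To make the cylon example concrete, here is a rough sketch of one way to adjust for location. It is a simple indirect standardization on the toy numbers, not the model from the study, and the variable names are my own. It compares the observed 25 cylon accidents against two baselines: a naive one that only uses the county-wide cylon share, and an adjusted one that applies each location's per-resident accident rate to the cylons who live there.

```python
# Toy data from the cylon example; the names are illustrative only.
population = {
    "Location 1": {"cylon": 5, "human": 95},
    "Location 2": {"cylon": 20, "human": 80},
}
accidents = {"Location 1": 12, "Location 2": 120}
observed_cylon_accidents = 25  # the rounded county-wide total from the example

total_residents = sum(sum(groups.values()) for groups in population.values())
total_cylons = sum(groups["cylon"] for groups in population.values())
total_accidents = sum(accidents.values())

# Naive baseline: cylons should get their county-wide population share
# of all accidents, ignoring where they live.
naive_expected = total_accidents * total_cylons / total_residents

# Adjusted baseline: apply each location's accidents-per-resident rate
# to the number of cylons living in that location, then add them up.
adjusted_expected = sum(
    accidents[location] / sum(groups.values()) * groups["cylon"]
    for location, groups in population.items()
)

naive_ratio = observed_cylon_accidents / naive_expected
adjusted_ratio = observed_cylon_accidents / adjusted_expected

print(f"naive expected:    {naive_expected:.1f}  (observed/expected = {naive_ratio:.2f})")
print(f"adjusted expected: {adjusted_expected:.1f}  (observed/expected = {adjusted_ratio:.2f})")
```

Against the naive baseline, cylons look about 1.5 times as accident-prone as they should be. Once location is accounted for, the ratio is close to 1, and the small remainder is just the rounding of 0.6 accidents up to 1.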

The study by Fryer tries to supply that nuance. It builds a model that takes into account one uncontroversial environmental factor: we expect more accidental, unnecessary shootings by cops in more dangerous locations. In other words, we expect that, regardless of race, cops will more often shoot out of fear for their lives in locations where violent crime is more likely.

As with any study, there is always pushback. Mathbabe had a guest post pointing to some potential problems with sampling. I’m not trying to make any sort of statement with this post. I’ve talked about statistics a lot on the blog, and I merely wanted to show how such a study is possible with a less charged example. I know a lot of the initial reaction to the study was: But 13% vs 25%!!! Of course it’s racism!!! This idiot just has an agenda, and he’s manipulating data for political purposes!!!

Actually, when we only look at aggregate statistics across the entire country, we can accidentally pick up apparent bias where none exists, as in the example. The study just tries to tease these confounding factors out. Whether it did a good job is the subject of another post.