The Carter Catastrophe

I’ve been reading Manifold: Time by Stephen Baxter. The book is quite good so far, and it presents a fascinating probabilistic argument that humans will go extinct in the near future. It is sometimes called the Carter Catastrophe, because Brandon Carter first proposed it in 1983.

I’ll use Bayesian arguments, so you might want to review some of my previous posts on the topic if you’re feeling shaky. One thing we didn’t talk all that much about is the idea of model selection, which is one of the most common tasks a scientist faces. If you run an experiment, you get a bunch of data. Then you have to figure out the most likely explanation for what you see.

Let’s take a basic example. We have a giant tub of golf balls, and we can’t see inside the tub. There could be 1 ball or a million. We’re told the owner accidentally dropped a red ball in at some point. All the other balls are the standard white golf balls. We decide to run an experiment where we draw a ball out, one at a time, until we reach the red one.

First ball: white. Second ball: white. Third ball: red. We stop. We’ve now generated a data set from our experiment, and we want to use Bayesian methods to give the probability of there being three total balls or seven or a million. In probability terms, we need to calculate the probability that there are x balls in the tub given that we drew the red ball on the third draw. Any time we see this language, our first thought should be Bayes’ theorem.

Define A_i to be the model of there being exactly i balls in the tub. I’ll use “3” inside of P( ) to be the event of drawing the red ball on the third try. We have to make a finiteness assumption, and although this is one of the main critiques of the argument, we can examine what happens as we let the size of the bound grow. Suppose for now the tub can only hold 100 balls.

A priori, we have no idea how many balls are in there, so we’ll assume all “models” are equally likely. This means P(A_i)=1/100 for all i. By Bayes’ theorem we can calculate:

P(A_3|3) = \frac{P(3|A_3)P(A_3)}{\sum_{i=1}^{100}P(3|A_i)P(A_i)} = \frac{(1/3)(1/100)}{(1/100)\sum_{i=3}^{100}1/i} \approx 0.09

Note that P(3|A_i) = 1/i when i ≥ 3 and P(3|A_i) = 0 when i < 3 (with fewer than three balls, the red one must appear by the second draw), which is why the sum in the denominator effectively starts at i = 3.

So there’s around a 9% chance that there are only 3 balls in the tub. That bottom summation remains exactly the same when computing P(A_n | 3) for any n and equals about 3.69, and the (1/100) cancels out every time. So we can compute explicitly that for any n ≥ 3:

P(A_n|3)\approx \frac{1}{n}(0.27)

This is a decreasing function of n, and this shouldn’t be surprising at all. It says that as we guess there are more and more balls in the tub, the probability of that guess goes down. This makes sense, because it’s unreasonable to think we’d see the red one that early if there are actually 100 balls in the tub.
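If you want to check these numbers yourself, here’s a minimal sketch in Python (my own illustration; the names aren’t standard) that reproduces the posterior under the same assumptions: a uniform prior over at most 100 balls and the red ball appearing on the third draw.

```python
from fractions import Fraction

N = 100                      # assumed cap on the number of balls in the tub
prior = Fraction(1, N)       # uniform prior: P(A_i) = 1/100

def likelihood(i):
    # P(red on exactly the third draw | i balls): the red ball is equally
    # likely to sit in any of the i positions, so 1/i; impossible if i < 3.
    return Fraction(1, i) if i >= 3 else Fraction(0)

evidence = sum(likelihood(i) * prior for i in range(1, N + 1))

def posterior(n):
    return likelihood(n) * prior / evidence

print(float(posterior(3)))   # ~0.0904, the ~9% figure above
print(float(posterior(50)))  # ~0.0054, matching 0.27/50
```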

There’s lots of ways to play with this. What happens if our tub could hold millions but we still assume a uniform prior? It just takes all the probabilities down, but the general trend is the same: It becomes less and less reasonable to assume large amounts of total balls given that we found the red one so early.

You could also only care about this “earliness” idea and redo the computations where you ask how likely is A_n given that we found the red ball by the third try. This is actually the more typical way the problem is formulated in the Doomsday arguments. It’s more complicated, but the same idea pops out, and this should make intuitive sense.

Part of the reason these computations were somewhat involved is that we tried to get a distribution over all the possible ball counts. We could instead compare just two hypotheses and get a much cleaner answer (work it out as homework). What if we only had two choices: a “small” number of total balls (say 10) or a “large” number of total balls (say 10,000)? You’d find there is around a 99% chance that the “small” hypothesis is correct.
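Here’s a quick sketch of that two-hypothesis comparison under the same setup (equal priors, red ball found on the third draw); the exact number comes out around 99.9%, consistent with the rough 99% above.

```python
# "Small" tub of 10 balls vs. "large" tub of 10,000 balls, equal priors.
small, large = 10, 10_000
like_small, like_large = 1 / small, 1 / large   # P(red on draw 3 | model)

posterior_small = like_small / (like_small + like_large)
print(posterior_small)   # ~0.999: the "small" hypothesis dominates
```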

Here’s the leap. Now assume the fact that you exist right now is random. In other words, you popped out at a random point in the existence of humans. So the totality of humans to ever exist are the white balls and you are the red ball. The same type of argument above applies, and it says that the most likely thing is that you aren’t born at some super early point in human history. In fact, it’s unreasonable from a probabilistic standpoint to think that humans will continue much longer at all given your existence.

The “small” total population of humans is far, far more likely than the “large” total population, and the interesting thing is that this remains true even if you mess with the uniform prior. You could assume it is much more likely a priori for humans to continue to make improvements and colonize space and develop vaccines giving a higher prior for the species existing far into the future. But unfortunately the Bayesian argument will still pull so strongly in favor of humans ceasing to exist in the near future that one must conclude it is inevitable and will happen soon!

Anyway. I’m travelling this week, so I’m sorry if there are errors in those calculations. I was in a hurry and never double checked them. The crux of the argument should still make sense even if you don’t get my exact numbers. There are also a lot of interesting and convincing rebuttals, but I don’t have time to get into them now (including the fact that unlikely hypotheses turn out to be true all the time).

The Infinite Cycle of Gladwell’s David and Goliath

I recently finished reading Malcolm Gladwell’s David and Goliath: Underdogs, Misfits, and the Art of Battling Giants. The book is like most Gladwell books. It has a central thesis, and then interweaves studies and anecdotes to make the case. In this one, the thesis is fairly obvious: sometimes things we think of as disadvantages have hidden advantages and sometimes things we think of as advantages have hidden disadvantages.

The opening story makes the case from the Biblical story of David and Goliath. Read it for more details, but roughly he says that Goliath’s giant strength was a hidden disadvantage because it made him slow. David’s shepherding was a hidden advantage because it made him good with a sling. It looks like the underdog won that fight, but it was really Goliath who was at a disadvantage the whole time.

The main case I want to focus on is the chapter on education, since that is something I’ve talked a lot about here. The case he makes is both interesting and poses what I see as a big problem for the thesis. There is an infinite cycle of hidden advantages/disadvantages that makes it hard to tell if the apparent (dis)advantages are anything but a wash.

Gladwell tells the story of a girl who loves science. She does so well in school and is so motivated that she gets accepted to Brown University. Everyone thinks of an Ivy League education as being full of advantages. It’s hard to think of any way in which there would be a hidden disadvantage that wouldn’t be present in someplace like Small State College (sorry, I don’t remember what her actual “safety school” was).

It turns out that she ended up feeling like a completely inadequate failure despite doing reasonably well. The people around her were so amazing that she developed impostor syndrome and quit science. If she had gone to Small State College, she would have felt amazing, gotten a 4.0, and become the scientist she wanted to be.

It turns out we have quite a bit of data on this subject, and this is a general trend. Gladwell then goes on to make just about the most compelling case against affirmative action I’ve ever heard. He points out that letting a minority into a college that they otherwise wouldn’t have gotten into is not an advantage. It’s a disadvantage. Instead of excelling at a smaller school and getting the degree they want, they’ll end up demoralized and quit.

At this point, I want to reiterate that this has nothing to do with actual ability. It is entirely a perception thing. Gladwell is not claiming the student can’t handle the work or some nonsense. The student might even end up an A student. But even the A students at these top schools quit STEM majors because they perceive themselves to be not good enough.

Gladwell implies that this hidden disadvantage is bad enough that the girl at Brown should have gone to Small State College. But if we take Gladwell’s thesis to heart, there’s an obvious hidden advantage within the hidden disadvantage. The girl at Brown was learning valuable lessons by coping with (perceived) failure that she wouldn’t have learned at Small State College.

It seems kind of insane to shelter yourself like this. Becoming good at something always means failing along the way. If the girl at Brown had been a sheltered snowflake at Small State College, graduating with her 4.0 without ever being challenged, that seems like a hidden disadvantage within the hidden advantage of going to the “bad” school. The better plan is to go to the good school, feel like you suck at everything, and then have counselors help you get over the perceived inadequacy.

As a thought experiment, would you rather have a surgeon who was a B student at the top med school in the country, constantly understanding their limitations, constantly challenged to get better, or the A student at nowhere college who was never challenged and now has an inflated sense of how good they are? The answer is really easy.

This gets us to the main issue I have with the thesis of the book. If every advantage has a hidden disadvantage and vice versa, this creates an infinite cycle. We may as well throw up our hands and say the interactions of advantages and disadvantages are too complicated to ever tell whether anyone is at a true (dis)advantage. I don’t think this is a fatal flaw in Gladwell’s thesis, but I do wish it had been addressed.

On Switching to Colemak

There’s something many people will probably go their whole lives without ever knowing about: a ton of alternative keyboard layouts exist besides the default “QWERTY” (named for the letters along the top row of the keyboard). There is a whole subculture obsessed with them.

The two most common ones are Dvorak and Colemak. Last Saturday I started learning where the letters on Colemak are located. By the end of Sunday, I had them memorized. This meant I could type very slowly (3-5 wpm) with near perfect accuracy.

It didn’t take long to learn at all. Now, a few days later, I no longer have to think about where the letters are, but it will probably be another week or so before I get back to full speed.

Let’s talk about the burning question in everyone’s mind: why would anyone subject themselves to such an experience? I type a lot. For the past year or so I’ve experienced some mild pain in my wrists. I’ve never had it diagnosed to know if it is repetitive strain injury, but my guess is it’s a bad sign if you experience any pain, no matter how small.

I tried to alleviate some stress by tilting my keyboard and giving my wrists something to rest on:

[Photo: my keyboard tilted up on a book, with a wrist rest in front.]

[Yes, that’s Aristotle’s Poetics under the front of the keyboard.]

This helped a little, but the more I looked into it, the more I realized there was a fundamental issue with the keyboard layout itself that could be part of the problem. Most people probably assume such a strange layout must have had a purpose, and it did, but we’ve long since outgrown that purpose.

The history of this is long and somewhat interesting, but it basically boils down to making sure the hands alternate and common digraphs (two-letter combinations) are separated by large distances, so that a mechanical typewriter would be less likely to jam when someone typed quickly.

If one were to design a keyboard to minimize injury, one would put the most common letters on the home row, minimize long stretches, and make sure common digraphs use different but nearby fingers. This is almost exactly the philosophy of the Colemak layout.

The Colemak layout lets you type roughly 34 times as many words entirely on the home row as QWERTY does. It’s sort of insane that “j” is on the QWERTY home row while “e” and “i” are not. Colemak also distributes the workload more evenly: it favors the right hand only slightly, at about 6%, unlike QWERTY’s heavy favoring of the left hand at around 15%. You can go look up the stats if you want to know more. I won’t bore you by listing them here.

You will definitely lose a lot of work time while making the change due to slow typing, but the layout is measurably more efficient (far less finger travel, far more typing on the home row), so in the long run you should end up more than compensated for these short-term losses.

I’d like to end by reflecting on what a surreal experience this has been. I think I first started learning to type around the age of eight. I’m now thirty. That’s twenty-two years of constant ingraining of certain actions that had to be undone. Typing has to be subconscious to be effective. We don’t even think about letters or spelling when doing it. Most words are just patterns that roll off the fingers.

This is made explicitly obvious when I get going at a reasonable speed. I can type in Colemak without confusion letter-by-letter, but I still slip up when my speed hits that critical point where I think whole words at a time. At that point, a few words of nonsense happen before I slide back into correct words. It’s very strange, because I don’t even notice it until I look back and see that it happened.

I’ve never become fluent in another language, but I imagine a similar thing must happen when one is right on the edge of being able to think in the new language. You can speak fluently, but occasionally the subconscious brain takes over for a word, even if you know the word.

If you’re at all interested, I’d recommend checking into it. I already feel a huge difference in comfort level.

Confounding Variables and Apparent Bias

I was going to call this post something inflammatory like #CylonLivesMatter but decided against it. Today will be a thought experiment meant to clarify some confusion over whether apparent bias in aggregate data reflects real bias. I’ll unpack all that with a very simple example.

Let’s suppose we have a region, say a county, and we are trying to tell if car accidents disproportionately affect cylons due to bias. If you’re unfamiliar with this term, it comes from Battlestar Galactica. They were the “bad guys,” but they had absolutely no distinguishing features. From looking at them, there was no way to tell if your best friend was one or not. I want to use this for the thought experiment so that we can be absolutely certain there is no bias based on appearance.

The county we get our data from has roughly two main areas: Location 1 and Location 2. Location 1 has 5 cylons and 95 humans. Location 2 has 20 cylons and 80 humans. This means the county is 12.5% cylon and 87.5% human.

Let’s assume that there is no behavioral reason among the people of Location 1 to have safer driving habits. Let’s assume it is merely an environmental thing, say the roads are naturally larger and speed limits lower or something. They only average 1 car accident per month. Location 2, on the other hand, has poorly designed roads and bad visibility in areas, so they have 10 car accidents per month.

At the end of the year, if there is absolutely no bias at all, we would expect to see 12 car accidents uniformly distributed among the population of Location 1 and 120 car accidents uniformly distributed among the population of Location 2. This means Location 1 had roughly 1 cylon and 11 humans in accidents (rounding to whole accidents), and Location 2 had 24 cylons and 96 humans in accidents.

We work for the county, and we take the full statistics: 25 cylon accidents and 107 human accidents. That means about 19% of car accidents involve cylons, even though cylons are only 12.5% of the county’s population. As investigators into this matter, we might be tempted to conclude that since cylons appear in car accidents at a disproportionate rate relative to their baseline population, there must be some bias or speciesism causing this.
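Here’s a small sketch of that aggregation (the data layout is my own); the 25 and 107 above come from rounding each location to whole accidents before summing.

```python
# Per-location populations and yearly accident counts from the example.
locations = [
    {"cylons": 5,  "humans": 95, "accidents": 12},
    {"cylons": 20, "humans": 80, "accidents": 120},
]

cylon_acc = human_acc = 0.0
for loc in locations:
    total = loc["cylons"] + loc["humans"]
    # Accidents hit residents uniformly within each location (no bias at all).
    cylon_acc += loc["accidents"] * loc["cylons"] / total
    human_acc += loc["accidents"] * loc["humans"] / total

share = cylon_acc / (cylon_acc + human_acc)
print(round(cylon_acc, 1), round(human_acc, 1))   # 24.6 107.4
print(f"{share:.1%}")   # ~18.6% of accidents involve cylons vs. 12.5% of residents
```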

Now I think everyone knows where this is going. It is clear from the example that combining all the numbers from across the county, and then treating the disproportionately high number of cylon car accidents as evidence of some underlying, institutional problem, was the wrong thing to do. But this is the standard rhetoric of #blacklivesmatter. We hear that blacks make up roughly 13% of the population but are 25% of those killed by cops. Therefore, that basic disparity is indicative of racist motives by the cops, or at least is an institutional bias that needs to be fixed.

Recently, a more nuanced study has been making the news rounds that claims there isn’t a bias in who cops kill. How can this be? Well, what happened in our example case to cause the misleading information? A disproportionate number of cylons lived in environmental conditions that caused the car accidents. It wasn’t anyone’s fault. There wasn’t bias or speciesism at work. The lack of nuance in analyzing the statistics caused apparent bias that wasn’t there.

The study by Fryer does this. It builds a model that takes into account one uncontroversial environmental factor: we expect more accidental, unnecessary shootings by cops in more dangerous locations. In other words, we expect that, regardless of race, cops will shoot out of fear for their lives in locations where higher chances of violent crimes occur.

As with any study, there is always pushback. Mathbabe had a guest post pointing to some potential problems with sampling. I’m not trying to make any sort of statement with this post. I’ve talked about statistics a lot on the blog, and I merely wanted to show how such a study is possible with a less charged example. I know a lot of the initial reaction to the study was: But 13% vs 25%!!! Of course it’s racism!!! This idiot just has an agenda, and he’s manipulating data for political purposes!!!

Actually, when we only look at aggregate statistics across the entire country, we can accidentally pick up apparent bias where none exists, as in the example. The study just tries to tease these confounding factors out. Whether it did a good job is the subject of another post.

Draw Luck in Card Games, Part 2

A few weeks ago I talked about draw luck in card games. I thought I’d go a little further today with the actual math behind some core concepts in card games where you build your own deck. The same idea works for computing probabilities in poker, so you don’t need to get too hung up on the particulars here.

I’m going to use Magic: The Gathering (MTG) as an example. Here are the relevant axioms we will use:

1. Your deck will consist of 60 cards.
2. You start by drawing 7 cards.
3. Each turn you draw 1 card.
4. Each card has a “cost” to play it (called mana).
5. Optimal strategy is to play a cost 1 card on turn 1, a cost 2 card on turn 2, and so on. This is called “playing on curve.”

You don’t have to know anything about MTG now that you have these axioms (in fact, writing them this way allows you to convert everything to Hearthstone, or your card game of choice). Of course, every single one of those axioms can be affected by play, so this is a vast oversimplification. But it gives a good reference point if you’ve never seen anything like this type of analysis before. Let’s build up the theory little by little.

First, what is the probability of being able to play a 1-cost card on turn 1 if you put, say, 10 of these in your deck? We’ll simplify axiom 2 to get started. Suppose you only draw one card to start. Basically, by definition of probability, you have a 10/60, or 16.67% chance of drawing it. Now if you draw 2 cards, it already gets a little bit trickier. Exercise: Try to work it out to see why (hint: the first card could be 1-cost OR the second OR both).

Let’s reframe the question. What’s the probability of NOT being able to play a card turn 1 if you draw 2 cards? You would have to draw a non-1-cost AND another non-1-cost. The first card you pick up has a 50/60 chance of this happening. Now the deck only has 59 cards left, and 49 of those are non-1-cost. So the probability of not being able to play turn 1 is {\frac{50}{60}\cdot\frac{49}{59}}, or about a 69% chance.

To convert this back, we get that the probability of being able to play the 1-cost card on turn 1 (if we start with 2 cards) is {\displaystyle 1- \frac{50\cdot 49}{60\cdot 59}}, or about a 31% chance.

Axiom 2 says that in the actual game we start by drawing 7 cards. The pattern above continues in this way, so if we put {k} 1-cost cards in our deck, the probability of being able to play one of these on turn 1 is:

{\displaystyle 1 - \frac{(60-k)(59-k)\cdots (54-k)}{60\cdot 59\cdots 54} = 1 - \frac{{60-k \choose 7}}{{60 \choose 7}}}.

To calculate the probability of hitting a 2-cost card on turn 2, we just change the 7 to an 8, since we’ll be getting 8 cards by axiom 3. The {k} becomes however many 2-cost cards we have.
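Here’s a minimal sketch of that formula in code (the function name is my own). It reproduces the ~31% two-card figure from earlier, and you can get any turn by changing cards_seen.

```python
from math import comb

DECK_SIZE = 60

def on_curve_prob(k, cards_seen):
    """Chance that at least one of the k copies shows up among cards_seen cards."""
    return 1 - comb(DECK_SIZE - k, cards_seen) / comb(DECK_SIZE, cards_seen)

print(round(on_curve_prob(10, 2), 3))   # ~0.308: the simplified 2-card example
print(round(on_curve_prob(10, 7), 3))   # ~0.741: 10 one-drops, full 7-card opening hand
```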

Here’s a nice little question: Is it possible to make a deck where we have a greater than 50% chance of playing on curve every turn for the first 6 turns? We just compute the {k} above that makes each probability greater than {0.5}. This requires putting the following amount of cards in your deck:

6 1-cost
5 2-cost
5 3-cost
4 4-cost
4 5-cost
3 6-cost

Even assuming you put 24 lands in your deck, this still gives you tons of extra cards. Let’s push this a little further. Can you make a deck that has a better than 70% chance of playing on curve every turn? Yes!

9 1-cost
8 2-cost
7 3-cost
7 4-cost
6 5-cost
6 6-cost

Warning: This mana curve would never be used by any sort of competitive deck. This is a thought experiment with tons of simplifying assumptions. The curve for your deck is going to depend on a huge number of things. Most decks will probably value playing on curve in the 2, 3, 4 range way more than on other turns. If you have an aggressive deck, you might value the early game. If you play a control deck, you might value the later game.

Also, the longer the game goes, the fewer cards you probably need in the high-cost range to get those probabilities up, because there will be ways to hunt through your deck to increase the chance of finding them. Even more, all of these estimates are conservative, because MTG allows you to mulligan a bad starting hand. This means many worst-case scenarios get thrown out, giving you an even better chance of playing on curve.

This brings us back to the point being made in the previous post. Sometimes what feels like “bad luck” could be poor deck construction. This is an aspect you have full control over, and if you keep feeling like you aren’t able to play a card, you might want to work these probabilities out to make a conscious choice about how likely you are to draw certain cards at certain points of the game.

Once you know the probabilities, you can make more informed strategic decisions. This is exactly how professional poker is played.

Draw Luck in Card Games

Every year, around this time, I like to do a post on some aspect of game design in honor of the 7DRL Challenge. Let’s talk about something I hate: card games (though I sometimes become obsessed with, and love, well-made ones). For a game to be competitive, luck must be minimized or controlled in some way.

My family is obsessed with Canasta. I don’t get the appeal at all. This is a game that can take 1-2 hours to play and amounts to taking a random hand of cards and sorting them into like piles.

On various forums I’ve seen people say there is “strategy.” I’ll agree in a limited sense. There is almost always just one correct play, and if you’ve played a few times, that play will be immediately obvious to you. This means that everyone playing the game will play the right moves. That isn’t usually what is meant by “strategy.” And once everyone plays the right moves, the game is completely decided by the cards you draw.

This is pure tedium. Why would anyone want to sit down, flip a coin without looking at it, perform a sorting task over and over for an hour or more, stop, look at the result of the coin flip, and then declare that whoever won the coin flip won the “game”? That is almost exactly what a game of Canasta is. There are similar (but less obnoxious) bureaucratic jobs that people are paid to do, and those people hate their jobs.

Not to belabor the point, but imagine you are told to put a bunch of files into alphabetical order, and each time you finish, someone comes into the room and throws the files into the air. You then have to pick them up and sort them again. Why would you take on this task as a leisure activity?

I’ve asked my family this before, and the answer is always something like: it gives us something to do together or it is bonding time or similar answers. But if that’s the case, why not sit around a table and talk rather than putting this tedious distraction into it? If the point is to have fun playing a game, why not play a game that is actually fun?

This is an extreme example, but I’d say most card games actually fall into this pure coin-flip category. We get so distracted by playing the right moves, and by the fact that it is called a “game,” that we sometimes forget the winner is determined by nothing more than the purely random luck of the draw.

Even games like Pitch or Euchre or other trick-taking games, where the right plays take a bit more effort to find, are the same. It’s a difficult truth to swallow, but the depth of these games is shallow enough that after a few hours of playing you’ll be making the correct moves, without much thought, every single hand. And once every player makes the right plays, the outcome amounts to nothing but luck.

It’s actually really difficult to design a game with a standard deck of cards that gets around this problem. I’ve heard Bridge has depth (I know nothing of the game, but I take people’s word on this considering there is a professional scene). Poker has depth.

How does Poker get around draw luck? I’d say there are two answers. The first is that we don’t consider any individual hand a “game” of Poker. Obviously, the worst Poker player in the world could be dealt a straight flush and win the hand against the best Poker player in the world. Skill in Poker comes into play over the long run. One unit of Poker should be something like a whole tournament, where enough games are played to overcome the draw luck.

Now that we aren’t referring to a single hand, the ability to fold with minimal consequences also mitigates draw luck. This means that if you get unlucky with your initial cards, you can just choose to not play that hand. There are types of Poker that straight up let you replace bad cards (we’ll get to replacing in a moment). All of these things mitigate the luck enough that it makes sense to talk about skill.

Another card game with a professional scene is Magic: The Gathering (MTG). Tournament types vary quite a bit, but one way to mitigate draw luck is again to consider a whole tournament as a unit rather than an individual game. Or you could always play best of five or something.

But one of the most interesting aspects is the deck itself. Unlike traditional playing cards, you get to make the deck you play with. This means that over the course of many games, you can only blame yourself for bad drawing. Did you only draw lands on your first turn for five matches in a row? Then maybe you have too many land cards. That’s your fault. Did you draw no land many times in a row? Also, your own fault again. Composing a deck that takes all these probabilities into account is part of the skill of the game (usually called the “curve” of the deck).

Here’s an interesting question: is there a way to mitigate draw luck without having to play a ton of games? Most people want to play something short and not have to travel for a few days to play in a tournament to test their skill.

In real life, replacing cards is obnoxious to implement, but I think it is a fascinating and underutilized rule. The replacement idea allows you to tone down draw luck even at the level of a single game. If your card game exists online only, it is easy to do, and some recent games actually utilize this like Duelyst.

Why does it work? Well, if you have a bad draw, you can just replace one or all of your cards (depending on how the rule is worded). Not only does this create strategic depth through planning ahead for which cards will be useful, it almost completely eliminates the luck of the draw.

I really want to see someone design a card game with a standard deck of cards that makes this idea work. The one downside is that the only way I can see a “replace” feature working is if you shuffle after each replacement. This is pretty annoying, but I don’t see a way around it. You can’t just stick the card you replace into the middle of the deck and pretend like that placement is random. Everyone will know that it isn’t going to be drawn in the next few turns and can play around that.

Anyway. That’s just something I’ve been thinking about, since roguelikes have tons of randomness in them, and the randomness of card games has always bothered me.


Best Books I Read in 2015

I read over 60 books this year. Although I averaged less than one physical book a week, I also trained for a half-marathon and listened to a lot of these on tape while training. They were divided pretty evenly into three categories: genre fiction (sci-fi, fantasy, romance, mystery, etc), literary fiction, and nonfiction. This post is not to be confused with Best Books of 2015. Instead of doing a list, I’ll give each of the best books an award that indicates what made it stick out to me.

Best Overall: Hyperion by Dan Simmons.

This book cannot easily be described. It pulls together several sci-fi elements that made me skeptical at first. Anything that deals with time manipulation, particularly time moving backwards, usually makes me groan. This cleverly makes it work.

The mystery is brought up early, and the narration is done through a sequence of stories. Each story hints at different pieces, but are wildly different in tone, style, time frame, and reference point. Each story is excellent in its own right. Together they form a beautiful non-traditional narrative.

Simmons is not only a master at suspense and mystery, but proves he can create a timeless work of art that still feels fresh and original 25 years later.

Most Surprising: The Portrait of a Lady by Henry James.

What a truly ahead of its time book. I hate most of the traditional “marriage plot” novels like Pride and Prejudice, Wuthering Heights, and so on. Even though this looks like such a novel on the surface, it goes deep into issues that plague us still.

Some of the basic questions explored include but are not limited to: Is marriage a partnership of equals? What is the purpose of marriage? Do you lose some autonomy when you choose to get married? What does it mean to live a meaningful life? How should one balance work, a career, and leisure? Is one ever truly free in one’s actions? Is clothing an expression of the self? Does being a rebel subject you to being manipulated more strongly than someone that appears to go with society’s expectations? How does money affect relationships? How does one balance the life of the mind with the living of life?

The writing is also fantastic. It is dense and mature but not impenetrably so. The plot moves along through dialogue, and is not nearly as wordy and dull as many would have you believe (unless the above questions don’t interest you). I find Austen far more difficult to slog through than this.

Anyway, The Portrait of a Lady is an excellent examination of life’s toughest questions that seems even more relevant today than back when it was written.

Most Thought-Provoking: Outliers: The Story of Success by Malcolm Gladwell.

I had never read anything by Gladwell despite hearing his name come up all the time. This book will make you think hard about everything you thought you knew about how to be successful. The stories are interesting and provide counterintuitive examples. I have to wonder if this book is an outlier of Gladwell’s work, because I then picked up The Tipping Point and found every aspect of it subpar.

Best Characters: Cat’s Eye by Margaret Atwood.

I blogged a full review of it here.

Should Roguelikes be Winnable?

A topic that I’ve been thinking about recently has to do with balancing roguelikes. If you haven’t heard the term balance before, it basically refers to making a game fair through adjusting values: enemy health, enemy strength, items you find, your health, your strength, and so on.

For a normal RPG, you balance a game so that a skilled player can win and so nothing feels unfair. An example of something an RPG fan might find unfair is an “out of depth” enemy that instantly and unavoidably kills you (this happens in many roguelikes).

Many developers and players think this is bad game design because the player couldn’t do anything about it. Why bother getting good at a game if you will just lose to unpredictable circumstances? The game cheated you somehow, and many players quit various roguelikes before getting better for exactly this reason.

This post isn’t so much about actual balance as about two distinct philosophies on the winnability of a roguelike. This is a choice that must be thought about carefully when designing a roguelike, and it doesn’t even come up for other types of games.

The question: Should a skilled player be able to win?

Most modern game designers would laugh at this question. Their games are designed so that you don’t even need skill to win. Winning is the default position. Your hand will be held through the process. Checkpoints are made every step of the way so you can try something again if you mess it up.

This might be surprising to people not immersed in the genre, but many classic roguelikes have a steep enough skill hurdle that probably less than 10% of those who ever play them will get a win (maybe even as low as 1%). Sometimes it can take years of playing a game to get good enough to win. But the game is balanced such that a really skilled player can win almost every time.

Think about that for a second. This is quite a feat. Here’s an analogy which isn’t perfect: think about running a 5 minute mile. Almost no runner (even ones that train very, very hard) achieves this. But once they do, they can reproduce it many times. This is what makes roguelikes great. The focus is on player skill and progression not on character progression. You get a sense of real accomplishment.

After I wrote this post, I did a search for the topic and found it discussed at the Brogue forums. It seems there isn’t an easy way to even define “winnable.” I’ll give you my definition in a bit, but I want to dispel the obvious one as not being a good one.

We already have to distinguish between the game being winnable and the winnability of a given seed (the industry term for the randomly generated setup of a particular playthrough). This is only weird for roguelikes, because the game is different every time you play.

One might try to define a game as winnable if approximately 100% of the seeds can be won with “perfect play.” But using perfect play is problematic in a roguelike because of the randomness. Perfect play means you play in a way that perfectly maximizes your chance of winning.

It isn’t hard to think of situations in which sub-optimal play will randomly luck into a win and optimal play loses the seed (e.g. you need magic reflection, so you check Sokoban, but encounter an enemy with a wand of death that kills you, but the unskilled player doesn’t check Sokoban and goes on to win).

This is kind of funny, because now we have a problem with defining winnable even for a single seed. Should it mean that someone somewhere won the seed? This, too, seems problematic. I’ll try to explain why from the commentary at the Brogue forum discussion. One person claimed that at least 80% of Brogue seeds are winnable based on the fact that wins were recorded (not always by the same person) on around 80 of the last 100 weekend challenge competitions.

Let’s digress to make the problem with the above analysis clear. Suppose we make a game. Flip a coin. If it is heads you win and tails you lose. Under the perfect play definition, the game is not winnable. In other words, perfect play does not guarantee a win. Under the definition that some person somewhere was able to win, it is winnable.

Here’s where things get interesting. If we think about what percentage of seeds can be won, we’d better find that the answer is 50%, because that is the expected percentage of games a player who plays perfectly would win. But in the above Brogue analysis, the commenter takes a pool of players and asks whether any of them won. This should greatly inflate the win percentage, because it is like taking 5 coins, flipping them all at the same time, and seeing if any of them landed on a win.
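A tiny sketch of that inflation effect, assuming five independent players who each win a given seed half the time:

```python
p_single = 0.5   # a perfect player wins 50% of seeds in the coin-flip game
pool = 5         # number of players attempting the same seed

p_anyone_wins = 1 - (1 - p_single) ** pool
print(p_anyone_wins)   # 0.96875 -- "somebody won it" vastly overstates the 50% figure
```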

To get around this subtlety, I’ll call a game winnable if a single skilled player can get a win streak of say 10 or so. A good example of this is NetHack. The vast majority of people who play will never get a win ever. But Adeon has a win streak of 29, and many people have streaks of 10. This proves that it is a game that can be won basically every time (and many consider it so easy they self-impose crazy challenges and still win).

Other famous roguelikes that have this same philosophy are Tales of Maj’Eyal (on normal/adventure at least) or from the “roguelite” genre The Binding of Isaac (where people have 150+ win streaks).

At this point you’re probably thinking, what other philosophy could there be? No one could possibly want to play a game for which you work really hard for 1,000 hours learning to make all the best moves, and yet the design will still have you lose to random impossible scenarios. It wouldn’t be fun. It would be pure frustration.

But people do this all the time in other types of games. The best example I can think of is poker. It takes a huge number of hours of training to become good enough to make roughly the best plays. You can be the best in the world and still lose due to the inherent randomness. You can only see how good someone is through long-term averages.

One way to think of this philosophy is: losing is fun, winning is more fun, winning every time is too easy and boring. Traditional roguelikes are fun, because you get in seemingly impossible situations but with enough skill you can think your way out. You can have a lot of confidence that you will basically never be randomly put in an impossible situation. Losing is your own fault, and you can get better from it.

If you take this alternate philosophy, the fun comes from the fact that you don’t know if a given situation is impossible. Maybe you just weren’t good enough. Balancing so that there are impossible situations makes it so that the top of the skill curve can still feel challenged.

I think the biggest difficulty with balancing in this manner is that a highly skilled player may never reach a 10 streak, but they should probably still be able to win something like 6 or 7 of those 10 games. This would be a very difficult balance to achieve. It is much easier to make it winnable.

Roguelikes already have a very small market. Part of what keeps people interested is that when they lose, it is their own fault. They don’t feel cheated. A game that was upfront about containing a large number of impossible seeds would probably narrow the market even more. One way to mitigate the pain would be for the game to keep track of your monthly win percent. That way you can track your progress.

I haven’t heard of this before. I’d be curious if anyone knows of any roguelikes that fit this design philosophy. The two that come to mind are Sword of the Stars: The Pit and Brogue. Both feel like you can just not find the items necessary to get a run off the ground. But I’m not very good at either, so it could be player error. There are people with about 2500 hours of play in The Pit, so I’d be curious to see if they could get a 5 streak on Normal mode (most refuse to play that difficulty since they’ve won on much harder).

The 77 Cent Wage Gap Fallacy

I almost posted about this last month when “Equal Pay Day” happened. Instead, I sat back on the lookout for a good explanation of why the “fact” that “women only make 77 cents for every dollar a man makes” is meaningless. There were a ton of excellent takedowns pointing out all sorts of variables that weren’t controlled for. This is fine, but the reason the number is meaningless is so much more obvious.

Now, this blog talks about math and statistics a lot, so I felt somewhat obligated to point this out. Unfortunately, this topic is politically charged, and I’ve heard some very smart, well-intentioned people repeat this nonsense who should know better. This means bias is at work.

Let’s be clear before I start. I’m not saying there is no pay gap or no discrimination. This post is only about the most prominent figure that gets thrown around: 77 cents for every $1 and why it doesn’t mean what people want it to mean. This number is everywhere and still pops up in viral videos monthly (sometimes as “78” because they presume the gap has decreased?):

I include this video to be very clear that I am not misrepresenting the people who cite this number. They really propagate the idea that the number means a woman with the same experience and same job will tend to make 77% of what a man makes.

I did some digging and found the number comes from this outdated study. If you actually read it, you’ll find something shocking. This number refers to the median salary of a full-time, year round woman versus the median salary of a full-time, year round man. You read that right: median across everything!!

At this point, my guess is that all my readers immediately see the problem. In case someone stumbles on this who doesn’t, let’s do a little experiment where we control for everything so we know beyond all doubt that two groups of people have the exact same pay for the same work, but a median gap appears.

Company A is perfectly egalitarian. Every single employee gets $20 an hour, including the highest ranking people. This company also believes in uniforms, but gives the employees some freedom. They can choose blue or green. The company is a small start-up, so there are only 10 people: 8 choose blue and 2 choose green.

Company B likes the model of A, but can’t afford to pay as much. They pay every employee $15 an hour. In company B it turns out that 8 choose green and 2 choose blue.

It should be painfully obvious that there is no wage gap between blue and green uniformed people in any meaningful sense, because they are paid exactly the same as their coworkers with the same job. Pay is equal in the sense that everyone who argues for pay equality should want.

But, of course, the median blue-uniform worker makes $20/hour whereas the median green-uniform worker only makes $15/hour. There is a uniform wage gap!
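Here’s a quick check of those medians using the exact headcounts and wages from the example:

```python
from statistics import median

# Company A: everyone earns $20/hr (8 blue uniforms, 2 green).
# Company B: everyone earns $15/hr (2 blue uniforms, 8 green).
blue  = [20] * 8 + [15] * 2
green = [20] * 2 + [15] * 8

print(median(blue), median(green))   # 20.0 15.0 -- the apparent "uniform wage gap"
```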

Here’s some of the important factors to note from this example. It cannot be from discriminatory hiring practices, because the uniform was chosen after being hired. It cannot be that green uniform people are picking lower paying jobs, because they picked the uniform after picking the job. It cannot be from green uniforms wanting to give up their careers to go have a family, because we’ll assume for the example that all the workers are single.

I’ll reiterate, it can’t be from anything, because no pay gap exists in the example! But it gets worse. Now suppose that both companies are headed by a person who likes green and gives a $1/hour raise to all green employees. This means both companies have discriminatory practices which favor green uniforms, but the pay gap would tell us that green are discriminated against!

This point can’t be stated enough. It is possible (though obviously not true based on other, narrower studies) that every company in the U.S. pays women more for equal work, yet we could still see the so-called “77 cent gender wage gap” calculated from medians. If you don’t believe this, then you haven’t understood the example I gave. Can we please stop pretending this number is meaningful?

Someone who uses a median across jobs and companies to say there is a pay gap has committed a statistical fallacy or is intentionally misleading you for political purposes. My guess is we’ll be seeing this pop up more and more as we get closer to the next election, and it will be perpetuated by both sides. It is a hard statistic to debunk in a small sound bite without sounding like you advocate unequal pay. I’ll leave you with a clip from a few weeks ago (see how many errors you spot).

Lossless Compression by Example Part 2: Huffman Coding

Last time we looked at some reasons why lossy compression is considered bad for music, and we looked at one possible quick and dirty way to compress. This time we’ll introduce the concept of lossless compression.

I’ll first point out that even if this seems like a paradoxical notion, everyone already believes it can be done. We use it all the time when we compress files on our computers by zipping them. Of course, this results in a smaller file, but no one thinks when they unzip they will have lost information. This means that there must exist ways to do lossless compression.

Today’s example is a really simple and brilliant way of doing it. It will have nothing to do with music for now, but don’t think of this as merely a toy example. Huffman coding is actually used as a step in mp3 encoding, so it relates to what we’ve been discussing.

Here’s the general idea. Suppose you want to encode (into binary) text in the most naive way possible. You assign A to 0, B to 1, C to 10, D to 11, and so on, using each letter’s position in binary. When you get to Z you’ll have 11001, which takes 5 bits, so to keep the letters distinguishable you pad every letter out to 5 bits. “CAT” would be 00010 00000 10011.

To encode “CAT” we did something dumb. We only needed 3 letters, so if we had chosen ahead of time a better encoding method, maybe C = 00, A = 01, T = 10, then we could encode the text as 00 01 10. In other words, we compress our data without losing any information by a clever choice of encoding 00010 00000 10011 -> 000110.
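Here’s a toy check of that arithmetic (the two code tables are just the assignments above written out):

```python
fixed  = {"C": "00010", "A": "00000", "T": "10011"}   # letter's position, padded to 5 bits
custom = {"C": "00", "A": "01", "T": "10"}            # tailored to the 3 letters used

word = "CAT"
print("".join(fixed[ch] for ch in word))    # 000100000010011 (15 bits)
print("".join(custom[ch] for ch in word))   # 000110 (6 bits)
```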

I know your complaint already. Any sufficiently long text will contain every letter, so there is no way to do better than that original naive method. Well, you’re just not being clever enough!

Some letters will occur with more frequency than others. So if, for example, the letter “s” occurs with frequency 100 and then the next most frequent letter occurs 25 times, you will want to choose something like “01” for “s”. That way the smallest number of bits is used for the most frequent letters.

Ah, but the astute reader complains again. The reason we couldn’t do this before is because we won’t be able to tell the difference in a long string between two frequent letters: 10 01, and a single less-frequent letter: 1001. This was why we needed all 5 bits when we used the whole alphabet.

This is a uniqueness problem. The fix is to use a prefix-free code: once we’ve assigned “01” to a letter, no other codeword is allowed to begin with “01”. This way, when we encounter 01, we stop. We know that letter is “s” because no other letter’s code begins with “01”.

Of course, what ends up happening is that we have to go to much more than 5 bits for some letters, but the idea is that they will be used with such infrequency and the 2 and 3 bit letters used with such high frequency that it ends up saving way more space than if we stuck to 5.

Now you should be asking two questions: Is it provably smaller and is there some simple algorithm to figure out how to assign a letter to a bit sequence so that the uniqueness and smallness happens? Yes to both!

We won’t talk about proofs, since this is a series “by example.” But I think the algorithm to generate the symbol strings to encode is pretty neat.

Let’s generate the Huffman tree for “Titter Twitter Top” (just to get something with high frequency and several “repeat” frequencies).

First, make an ordered list of the letters and their frequencies: (T:7), (I:2), (E:2), (R:2), (W:1), (O:1), (P:1).

Now we will construct a binary tree with these as leaves. Start with the bottom 2 as leaves and connect them to a parent with a placeholder (*) and the sum of the frequencies. Then insert this new placeholder into the correct place on the list and remove the two you used:

Now repeat the process with the bottom two on the list (if a node is on the list already, use it in the tree):

Keep repeating this process until you’ve exhausted the list and you will get the full binary tree we will use:

Now to work out how to encode each letter, write a 0 on every left edge and a 1 on every right edge. Descend from the top to the letter you want and write the digits in order. This is the encoding. So T = 1, I = 000, R = 010, E = 011, W = 0011, O = 00101, and P = 00100. Test it out for yourself. You will find there is no ambiguity because each string of digits used for a letter never appears as a prefix of another letter.

Also, note that the letter occurring with the highest frequency gets a single bit, and the codes only get longer as the frequencies get smaller. The encoding for Titter Twitter Top with this Huffman code is 39 bits, whereas the naive encoding is 80. This compresses to about half the space and loses no information!
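Here’s a minimal Huffman-coding sketch in Python to verify the 39-bit total. It’s my own illustration rather than anything canonical; ties in the priority queue may be broken differently than in the hand-built tree above, so individual codewords can come out different, but the code lengths and the total are the same.

```python
import heapq
from collections import Counter

def huffman_code(text):
    freqs = Counter(text)
    # Heap entries: (frequency, tie-breaker, {letter: codeword-so-far}).
    heap = [(f, i, {ch: ""}) for i, (ch, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # two lowest-frequency nodes
        f2, _, right = heapq.heappop(heap)
        # Merge: prepend 0 for the left subtree, 1 for the right subtree.
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

text = "TITTERTWITTERTOP"          # "Titter Twitter Top" without the spaces
code = huffman_code(text)
encoded = "".join(code[ch] for ch in text)
print(code)
print(len(encoded), "bits vs", 5 * len(text), "bits naive")   # 39 bits vs 80 bits
```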

We won’t get into the tedious details of how computers actually store information to see that there are lots of subtleties we’ve ignored for executing this in practice (plus we have to store the conversion table as part of the data), but at least we’ve seen an example of lossless compression in theory. Also, there was nothing special about letters here. We could do this with basically any information (for example frequencies in a sound file).