A Mind for Madness

Musings on art, philosophy, mathematics, and physics

Lossless Compression by Example Part 2: Huffman Coding

Last time we looked at some reasons why lossy compression is considered bad for music, along with one possible quick and dirty way to compress. This time we’ll introduce the concept of lossless compression.

I’ll first point out that even if this seems like a paradoxical notion, everyone already believes it can be done. We use it all the time when we compress files on our computers by zipping them. Of course, this results in a smaller file, but no one thinks when they unzip they will have lost information. This means that there must exist ways to do lossless compression.

Today’s example is a really simple and brilliant way of doing it. It will have nothing to do with music for now, but don’t think of this as merely a toy example. Huffman coding is actually used as a step in mp3 encoding, so it relates to what we’ve been discussing.

Here’s the general idea. Suppose you want to encode text into binary in the most naive way possible. You number the letters: A is 0, B is 1, C is 10, D is 11, etc. When you get to Z you’ll have 11001, so you pad every letter out to the same length. This means you have to use 5 bits for every single letter. “CAT” would be 00010 00000 10011.

To encode “CAT” we did something dumb. We only needed 3 letters, so if we had chosen ahead of time a better encoding method, maybe C = 00, A = 01, T = 10, then we could encode the text as 00 01 10. In other words, we compress our data without losing any information by a clever choice of encoding 00010 00000 10011 -> 000110.
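To see the gap concretely, here is a quick sketch in Python (the letter-to-number assignments follow the scheme above; everything else is just illustration):

```python
# Naive scheme: each letter gets its alphabet position, padded to 5 bits.
naive = {ch: format(i, '05b') for i, ch in enumerate('ABCDEFGHIJKLMNOPQRSTUVWXYZ')}
# Tailored scheme for a text that only uses C, A, and T.
tailored = {'C': '00', 'A': '01', 'T': '10'}

text = 'CAT'
naive_bits = ''.join(naive[ch] for ch in text)
tailored_bits = ''.join(tailored[ch] for ch in text)
print(naive_bits)     # 000100000010011
print(tailored_bits)  # 000110
```

Fifteen bits down to six, and nothing about “CAT” was lost.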

I know your complaint already. Any sufficiently long text will contain every letter, so there is no way to do better than that original naive method. Well, you’re just not being clever enough!

Some letters will occur with more frequency than others. So if, for example, the letter “s” occurs with frequency 100 and then the next most frequent letter occurs 25 times, you will want to choose something like “01” for “s”. That way the smallest number of bits is used for the most frequent letters.

Ah, but the astute reader complains again. The reason we couldn’t do this before is because we won’t be able to tell the difference in a long string between two frequent letters: 10 01, and a single less-frequent letter: 1001. This was why we needed all 5 bits when we used the whole alphabet.

This is a uniqueness problem. What we do is never allow an assigned string to be the prefix of any other assigned string. This way, when we encounter 01, we stop. We know that is the letter “s” because no other letter’s code begins with 01.

Of course, what ends up happening is that we have to go to much more than 5 bits for some letters, but the idea is that they will be used with such infrequency and the 2 and 3 bit letters used with such high frequency that it ends up saving way more space than if we stuck to 5.

Now you should be asking two questions: is it provably smaller, and is there some simple algorithm to assign each letter a bit sequence so that the uniqueness and smallness happen? Yes to both!

We won’t talk about proofs, since this is a series “by example.” But I think the algorithm to generate the symbol strings to encode is pretty neat.

Let’s generate the Huffman tree for “Titter Twitter Top” (just to get something with high frequency and several “repeat” frequencies).

First, make an ordered list of the letters and their frequencies: (T:7), (I:2), (E:2), (R:2), (W:1), (O:1), (P:1).

Now we will construct a binary tree with these as leaves. Start with the bottom 2 as leaves and connect them to a parent with a placeholder (*) and the sum of the frequencies. Then insert this new placeholder into the correct place on the list and remove the two you used:

Now repeat the process with the bottom two on the list (if a node is on the list already, use it in the tree):

Keep repeating this process until you’ve exhausted the list and you will get the full binary tree we will use:

Now to work out how to encode each letter, write a 0 on every left edge and a 1 on every right edge. Descend from the top to the letter you want and write the digits in order. This is the encoding. So T = 1, I = 000, R = 010, E = 011, W = 0011, O = 00101, and P = 00100. Test it out for yourself. You will find there is no ambiguity because each string of digits used for a letter never appears as a prefix of another letter.

Also, note that the letter that occurs with the highest frequency gets a single bit, and the codes only get longer as the frequencies get smaller. The encoding for “Titter Twitter Top” with this Huffman code is 39 bits, whereas the naive encoding is 80. This compresses to half the space needed and loses no information!
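For the curious, the whole procedure fits in a few lines of Python. This is just a sketch (the heap-based merging is a standard way to implement the steps above; I’ve uppercased the text and dropped the spaces to match the frequency table, and tie-breaking may assign different codes than the hand-drawn tree, though the total length comes out the same):

```python
import heapq
from collections import Counter

def huffman_codes(text):
    # Each heap entry is (frequency, tiebreaker, tree); a tree is either
    # a single character or a pair of subtrees.
    freqs = Counter(text)
    heap = [(f, i, ch) for i, (ch, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)  # the two lowest-frequency nodes...
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, count, (t1, t2)))  # ...get a common parent
        count += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + '0')  # 0 on every left edge
            walk(tree[1], prefix + '1')  # 1 on every right edge
        else:
            codes[tree] = prefix or '0'
    walk(heap[0][2], '')
    return codes

text = 'TITTERTWITTERTOP'
codes = huffman_codes(text)
encoded = ''.join(codes[ch] for ch in text)
print(len(encoded))  # 39 bits, versus 16 * 5 = 80 for the naive encoding
```

Every optimal tree for these frequencies gives the same 39-bit total, matching the hand computation.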

We won’t get into the tedious details of how computers actually store information, so there are lots of subtleties we’ve ignored for executing this in practice (for one, we have to store the conversion table as part of the data). But at least we’ve seen an example of lossless compression in theory. Also, there was nothing special about letters here. We could do this with basically any information (for example, frequencies in a sound file).


Lossless Compression by Example Part 1: Lossy Methods

Since I’m into music, a growing trend often comes up: music is now sold both digitally and on vinyl. Sometimes I’ll hear people mistakenly call the vinyl trend “retro” or “trendy” or “hip” or whatever. But if you actually ask someone why they prefer records, they’ll probably tell you the sound quality is better.

I thought I’d do a series on lossless compression and try to keep everything to general concepts and examples. Let’s start with the terminology. First, media files can be large, and back in the day when computers didn’t have basically infinite space, compression was an important tool for reducing the size of a media file.

Compression is basically an algorithm that takes a file and makes it smaller. The most obvious method for doing this is lossy compression. This just means you lose information. The goal of such an algorithm is to only lose information that is “unimportant” and “won’t be noticed.”

A far more surprising method of compression is called lossless. At first it seems paradoxical. How can you make the file size smaller, but not lose any information? Isn’t the file size basically the information? We won’t get to this in this post. Teaser for next time!

Now let’s talk about why people don’t like lossy compressed audio files. There is one quick and dirty thing you can do to immediately lose information and reduce the size of an audio file: dynamic range (DR) compression.

Think of a soundwave. The amplitude basically determines how loud it is. You can literally compress the wave to have a smaller amplitude without changing any other musical qualities. But this is terrible! One of the most important parts of music is the DR. A moving, soaring climax will not have the same effect if the entire build up to it is the same loudness.
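To make that concrete, here is a crude sketch of what a dynamic range compressor might do (the threshold and ratio parameters are made up for illustration; real compressors also handle attack and release times, which I’m ignoring entirely):

```python
import numpy as np

def compress_dynamic_range(samples, threshold=0.5, ratio=4.0):
    # Samples whose amplitude exceeds `threshold` are scaled so that only
    # 1/ratio of the excess survives, squashing loud peaks toward the
    # threshold while quiet passages pass through untouched.
    samples = np.asarray(samples, dtype=float)
    excess = np.abs(samples) - threshold
    over = excess > 0
    out = samples.copy()
    out[over] = np.sign(samples[over]) * (threshold + excess[over] / ratio)
    return out

quiet, loud = 0.3, 1.0
print(compress_dynamic_range([quiet, loud]))  # the quiet sample passes through; the peak is squashed to 0.625
```

The result is exactly the complaint above: the loud climax and the quiet build-up end up much closer in volume than the artist intended.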

This is such a controversial compression technique that many people switch to vinyl purely for DR reasons. There is a whole, searchable online database of albums where you can find out the DR and whether it is considered good, acceptable, or bad. Go search for your favorite albums. It is kind of fun to find out how much has been squashed out even in lossless CD format vs vinyl! (e.g. System of a Down’s Toxicity is DR 11 [acceptable] on vinyl and DR 6 [truly bad] on lossless CD).

The other most common lossy compression technique for audio is a bit more involved, but it actually changes the music, so it is worth thinking about. Let’s actually make a rough algorithm for doing this (there currently exist much better and subtler forms of the following, but it amounts to the same thing).

This is a bit of a silly example, but I went to http://www.wavsource.com to get a raw wav file to work with. I grabbed one of the first ones, an audio sample from the movie 2001: A Space Odyssey. Here is the data visualization of the sound waves and the actual clip:



One thing we can do is the Fast Fourier Transform. This will take these sound waves and get rid of the time component. Normally you’ll want to make a “moving window,” so you keep track of some time. For example, we can see that from 0.5 sec to 1.5 sec is one “packet.” We should probably transform that first, then move to the next.

The FFT leaves us just with the frequencies that occur and how loud they are. I did this with Python’s scipy.fftpack:

import matplotlib.pyplot as plt
import scipy.fftpack as sfft
import numpy as np
from scipy.io import wavfile

fs, data = wavfile.read('daisy.wav')
# normalize the 8-bit samples to the range [-1, 1]
b = [(ele / 2**8.) * 2 - 1 for ele in data]
c = sfft.fft(b)
d = len(c) // 2  # for real input, only the first half of the spectrum is distinct

# zero out any frequency whose magnitude falls below the cutoff
compressed = []
for ele in c:
    if abs(ele) > 50:
        compressed.append(ele)
    else:
        compressed.append(0)

compressed = np.asarray(compressed)

e = sfft.ifft(compressed)


Ignore the scales; they were changed to make everything more visible and are not normalized. The crudest thing we could do is set a cutoff and just remove all frequencies that we assume will be inaudible anyway:


If we do this too much, we are going to destroy how natural the sound is. As I’ve explained before, all sounds occurring naturally have tons of subtle overtones. You often can’t explicitly hear these, so they will occur below the cutoff threshold. This will bring us towards a “pure” tone which will sound more synthetic or computer generated. This is probably why no one actually compresses this way. This example was just to give an idea of one way it could be done (to finish it off you can now just inverse FFT and write to wav).

A slightly better compression technique would be to take short time intervals and multiply the peak frequency by a bump function. This will shrink all the extraneous frequencies without completely removing the robustness of the sound. This is how some lossy compression is actually done. There are other more fun things with wavelets, but they would take several posts to describe, and the goal of this series is to get to lossless compression.
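As a rough sketch of that idea (with a Gaussian standing in for the bump function and a made-up width parameter), one short window’s spectrum might be processed like this:

```python
import numpy as np

def emphasize_peak(spectrum, width=5.0):
    # Multiply the spectrum by a Gaussian bump centered on the loudest
    # frequency bin, shrinking the other frequencies without zeroing them.
    spectrum = np.asarray(spectrum, dtype=complex)
    peak = np.argmax(np.abs(spectrum))
    bins = np.arange(len(spectrum))
    bump = np.exp(-((bins - peak) ** 2) / (2 * width ** 2))
    return spectrum * bump
```

Applied window by window (and followed by the inverse FFT), this keeps the dominant tone at full strength while the extraneous frequencies are shrunk rather than deleted, which is gentler on the overtones than a hard cutoff.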

I hope that helps to see what lossy compression is, and that it can cause some serious harm when done without care. With care, you will still lose enough sound quality that many music aficionados avoid mp3 and digital downloads completely in favor of vinyl.

Next time we’ll tackle the seemingly paradoxical concept of lossless compression.

Thoughts on ToME’s Adventure Mode

I’ve done several posts explaining why I think roguelikes are a great genre of game to play. Probably the most important feature of a roguelike, for me, is permadeath. See this post for reasons why.

If you aren’t up on roguelikes, there are only a handful of games that stand out as the “giants” most people have heard of. One of these is called ToME (aka ToME 4; aka Tales of Maj’Eyal). There are more interesting features in ToME than could fit in a single blog post. Someday I may come back and post about these.

I’ll fully admit that my views on permadeath have evolved a bit, possibly due to my age. I think the older someone gets, the more likely they are to view losing all progress in a game as too punishing to be worth it. You tend to grow out of liking the more extreme and hardcore elements of certain games.

Anyway, I stand by my original post. I’ll recall some key points. Permadeath is a great game mechanic, because it forces you to contemplate the consequences of your actions. It gives weight to the game. It makes you become better at it in order to win. You can’t just “save scum” until you get through a particularly difficult section.

Before you take this the wrong way, ToME is possibly the most well-balanced roguelike I’ve played. Every death feels like my own fault and not me getting screwed by the randomness. But when a game involves as much randomness as any of the great classic roguelikes, you are bound to get the occasional unavoidable death that is not your fault.

This becomes more and more pronounced as a game’s design is less thoroughly vetted for imbalances. Part of ToME’s excellent balance comes from people who have put in thousands of hours of play who can spot these things. The developer takes their opinions seriously which makes the game more fair.

ToME has three modes of play: roguelike, adventure, and explore. Roguelike has traditional permadeath: once you die, you must start the entire game over. Adventure gives you five lives; once those are gone, you start the game over. Explore is deathless.

The main point I’ve been contemplating is whether Adventure mode ruins the permadeath experience of a roguelike. This will be a highly controversial statement, but I think it keeps all the original greatness of the mechanic and eliminates the negative aspects.

If you only have five lives, then each one is still precious. You’ll play the early game as if you only have one life, because if you waste one early, you will probably restart anyway. This makes the beginning just as intense as if you only had one life.

Let’s put it this way. If you don’t play as if you only have one life, then you will probably quickly lose them all anyway and revert to roguelike mode. So nothing really changes. In the middle and late game, if you are really good and don’t lose any lives, then it didn’t matter anyway. If you’re like me, you’ll probably be back to one life by that point and get all the benefits of roguelike mode.

It seems to me that Adventure mode merely serves to alleviate the annoyance and waste of time that comes from getting killed in one hit by some out of depth enemy that randomly appeared due to no fault of your own. It keeps all the intensity and pressure of permadeath, but gives some much needed buffer for the extreme amount of randomness of roguelikes.

I’d be quite happy to see some other roguelikes incorporate this as an option, but I’d also be totally understanding if they saw it as a compromise on the quality of the play experience.

Texas Sharpshooter Fallacy

In the world of big data that constantly bombards us with fancy graphics, the statistical fallacy that I think we are most likely to fall for is called the Texas Sharpshooter Fallacy. What makes this fallacy so dangerous is that it is propped up by solid, correct statistics which can be hard to argue against.

Here’s the idea. A person goes into the yard and shoots their rifle at random at their barn. Maybe even say the person is drunk, so the holes have no underlying pattern to them. The person then goes to the barn and figures out a way to draw a bullseye after the fact that makes it look like they are a competent sharpshooter.

The fallacy is that if you look at a large enough amount of data with good enough visualization tools, you will probably start to find patterns that aren’t actually there by strategically drawing artificial boundaries. Let’s make the example a bit more real.

Suppose you want to better understand the causes of Disease X, a newly discovered disease that occurs naturally in 10% of the population. You plot the data of a nearby town of 10,000 to see if you can find a pattern.

Here is the plot (I used a uniform distribution so we know any clumps have no underlying cause):


Your eye gets drawn to an oddly dense clump of cases of Disease X. You circle it and then run a statistical test to see if the number of cases is significant. You’re shocked! Your properly run statistical test shows you the increased number of cases is significant and with 95% certainty you conclude it isn’t just a fluke.

So what do you do? You start looking for causes. Of course you’ll be able to find one. Maybe that clump of houses has a power station nearby, or they drink from the same well water source or whatever. When you are looking for something in common, you’ll be able to find it.

When this happens, you’ve committed the Texas Sharpshooter Fallacy. It might be okay to use this data exploration to look for a cause if you merely intend to turn it into a hypothesis to be tested. So you hypothesize that it is radon in the water that caused the spike of cases in that cluster.

Now do real science, where you run a randomized controlled study to actually test your null hypothesis. Doing statistics on big data is risky business, because any clever person can pull correlations out of a large enough data set that, first, may not actually be there and, second, are almost surely not causally related.

Another way to think about why this is a fallacy is that when you have 95% certainty, 5 out of 100 times you will falsely find correlation where none exists. So if your data set is large enough to draw 100 different boundaries, then by random chance 5 of those will have false correlations. When you allow your eye to catch the cluster, it is your brain being good at finding patterns. It probably rejected 100 non-clusters to find that one.
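You can see this in a quick simulation (the 100 boundaries and 95% level mirror the numbers above; everything else is illustrative):

```python
import random

random.seed(0)
trials = 10000
false_positives = 0
for _ in range(trials):
    # Under the null hypothesis each region's p-value is uniform on [0, 1],
    # so each of the 100 boundaries "succeeds" with probability 0.05.
    false_positives += sum(1 for _ in range(100) if random.random() < 0.05)
print(false_positives / trials)  # close to 5 spurious "clusters" per map
```

Even with no effect anywhere, a map with 100 drawable boundaries hands you about five “significant” clusters for free.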

This is scary in today’s world, because lots of news articles do exactly this. They claim some crazy thing, and they use statistics people don’t understand to “prove” its legitimacy (numbers can’t lie don’t you know). But really it is just this fallacy at work. The media don’t want to double check it because “Cancer rate five times higher near power station” is going to get a lot of hits and interest.

Actually, cancer is particularly susceptible to this type of fallacy: dozens of examples of such studies getting publicity despite no actual correlation (let alone causation!) are documented in George Johnson’s (excellent) The Cancer Chronicles and in an older New Yorker article called “The Cancer-Cluster Myth.”

So the next time you read about one of these public health outcries, pay careful attention to whether the article commits this fallacy. For example, the myth that vaccination causes autism also originated this way.

Probably the most egregious example is The China Study, a highly praised vegan propaganda book. It takes the largest diet study ever done (367 variables) and pulls out the correlations that support the hypothesis “meat is bad.”

What the book doesn’t tell you is that the study found over 8000 statistically significant correlations, many contradicting the ones presented in the book. This is why large studies of observational epidemiology always have to be treated with caution. The larger the study, the more likely you will be able to find a way to support your hypothesis.

If you don’t believe me, and you want to protect marriage in Maine, then make sure you eat less margarine this year:


Thoughts on Roth’s American Pastoral

The first time I read Philip Roth’s American Pastoral, I had nothing but criticism for it. I’ll try to set the stage for my first reading. It was my early undergraduate days about 10 years ago.

I had had a fairly sheltered childhood. I grew up in a highly apolitical house. At that point, I had not been of age to vote during a major election, and so the extent of my political knowledge was the ability to name the president.

Despite this, I read the book at the height of my reading career. No offense to the university I attended, but I breezed through (a perfect 4.0 finishing GPA) with almost no work. This meant I supplemented my studies by reading a lot.

By this I mean I sometimes read 2 novels a week. I read Infinite Jest and Gravity’s Rainbow during this time. I wanted to read every book anyone had ever recommended to me or had said was “unreadable” (is that a description or a challenge?).

So what were my complaints? Well, it read like realism, yet nothing struck me as realistic in the book. It seemed filled with hyperbole and extreme character overreaction. Here are a few of the things I remember saying, but there were probably more:

1. How could anyone be so upset over politics to do something so extreme?
2. How could someone’s perception of someone else be so skewed?
3. How could one event cause someone to change so radically and suddenly?
4. The pacing is too slow.
5. The second half is too bizarrely different from the first to create something coherent.

Anyway, I decided to reread it and was shocked to find how much 10 years can change your perspective. The book is a delicate portrait of how a tragedy wrecked a family’s life.

What I originally perceived as too slow of pacing turned out to be a striking dive into the psyche of a man torn by conflicting and paradoxical emotions. It tries to answer the question: How does one grapple with continuing to love someone after they have done something horrible? It is heartbreaking to witness.

What I originally thought of as radical and sudden change of a character turned out to be a perfectly natural reaction of changing values and priorities. It’s happened to me. It’s happened to people I know. It happens to everyone. With a catalyst of such magnitude as happens in the book, it doesn’t seem at all extreme to me anymore.

I’ve learned a lot about bias and the human mind since last reading the book. Now the inconceivable false perception of someone strikes a chord of truth in me.

In fact, none of my initial criticisms ring true anymore. The book presents all of these complicated human interactions and emotions in a unified, compelling story.

The thing I most love about Roth’s style (at least in the second Zuckerman trilogy) is ever present in American Pastoral. He has the ability to lead you down a somewhat illogical, yet fully natural series of thoughts to land on a beautifully constructed gem of a sentence to contemplate.

It is hard to describe or give an example, because to pull the quote out of context removes how striking it is to read in real time. I often found myself having to stop and contemplate how illuminating the paragraph was. I could always relate to a time when I had a similar thought process. I first noticed this style in The Human Stain which drove me to the other Roth novels.

Needless to say, I loved this book. At a time when our politics seem to be more divided and more extreme than ever, and outrage and violence surrounding it has become more public (the recent Ferguson protests come to mind), a book of such introspection on the topic has only grown in its importance among the rank of American literature of the past fifty years.

Does Bayesian Epistemology Suffer Foundational Problems?

I recently had a discussion about whether Bayesian epistemology suffers from the problem of induction, and I think some interesting things came from it. If these words make you uncomfortable, think of epistemology as the study of how we form beliefs and gain knowledge. Bayesian epistemology means we model it probabilistically using Bayesian methods. This old post of mine talks a bit about it but is long and unnecessary to read to get the gist of this post.

I always think of the problem of induction in terms of the classic swan analogy. Someone wants to claim that all swans are white. They go out and see swan after swan after swan, each confirming the claim. Is there any point at which the person can legitimately say they know that all swans are white?

Classically, the answer is no. The problem of induction is crippling to classic epistemologies, because we can never be justified in believing any universal claim (at least using empirical methods). One of the great things about probabilistic epistemologies (not just Bayesian) is that it circumvents this problem.

Classical epistemologies require you to have 100% certainty to attain knowledge. Since you can’t ever be sure you’ve encountered every instance of a universal, you can’t be certain there is no instance that violates the universal. Hence the problem of induction is an actual problem. But note it is only a problem if your definition of knowledge requires you to have absolute certainty of truth.

Probabilistic epistemologies lower the threshold. They merely ask that you have 95% (or 98%, etc) confidence (or that your claim sits in some credible region, etc) for the justification. By definition, knowledge is always tentative and subject to change in these theories of knowledge.

This is one of the main reasons to use a probabilistic epistemology. It is the whole point. They were invented to solve this problem, so I definitely do not believe that Bayesian epistemology suffers from the problem of induction.
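A toy calculation shows how this plays out for the swans (the uniform Beta(1, 1) prior is my own assumption for illustration, not anything canonical): by Laplace’s rule of succession, after observing n white swans and no black ones, the probability the next swan is white is (n + 1)/(n + 2).

```python
def prob_next_white(n_white, n_black=0):
    # Posterior after a Beta(1, 1) prior: Beta(1 + n_white, 1 + n_black).
    alpha, beta = 1 + n_white, 1 + n_black
    return alpha / (alpha + beta)  # posterior mean = P(next swan is white)

for n in (10, 100, 1000):
    print(n, prob_next_white(n))
```

The confidence crosses any fixed threshold like 95% once you’ve seen enough swans, but it never reaches 1, which is exactly the “tentative and subject to change” character of knowledge in these theories.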

But it turns out I had misunderstood. The point the other person tried to make was much more subtle. It had to do with the other half of the problem of induction (which I always forget about, because I usually consider it an axiom when doing epistemology).

This other problem is referred to as the principle of the uniformity of nature. One must presuppose that the laws of nature are consistent across time and space. Recall that a Bayesian has prior beliefs and then upon encountering new data they update their beliefs factoring in both the prior and new data.

This criticism has to do with the application of Bayes’ theorem, period. In order to consider the prior worth factoring in at all, you must believe it is … well, relevant! You’ve implicitly assumed at that step the uniformity of nature. If you don’t believe nature is consistent across time, then you should not factor prior beliefs into the formation of knowledge.

Now a Bayesian will often try to use Bayesian methods to justify the uniformity of nature. We start with a uniform prior so that we haven’t assumed anything about the past or its relevance to the future. Then we merely note that billions of people across thousands of years have only ever observed a uniformity of nature, and hence it is credible to believe the axiom is true.

Even though my gut buys that argument, it is a bit intellectually dishonest. You can never, ever justify an axiom by using a method that relies on that axiom. That is the quintessential begging the question fallacy.

I think the uniformity of nature issue can be dismissed on different grounds. If you want to dismiss an epistemology based on the uniformity of nature issue, then you have to be willing to dismiss every epistemology that allows you to come to knowledge.

It doesn’t matter what the method is. If you somehow come to knowledge, then one second later all of nature could have changed, and hence you no longer have that knowledge. Knowledge is impossible if you want to use that criticism. All this leaves you with is radical skepticism, which of course leads to self-contradiction (if you know you can’t know anything, then you know something –><– ).

This is why I think of the uniformity of nature as a necessary axiom for epistemology. Without some form of it, epistemology is impossible. So at least in terms of the problem of induction, I do not see foundational problems for Bayesian epistemology.


7DRL Takeaway Lessons

I promise this will be my last 7DRL post, but I thought I should record some thoughts I’ve had about it now that it is over.

This post could alternately be called “Why I Will Not Use MonoGame in the Foreseeable Future.”

First, the motto of MonoGame is “Write Once, Play Everywhere.” This sounded promising to me, because my first game had issues deploying almost anywhere. Of course, what they mean is that you can play anywhere supported by Windows.

There are ways to get builds onto other platforms so that Mac and Linux users can play, but I only had 7 days, and I’m not a trained computer scientist, so that wasn’t going to happen. This was a little frustrating, and also one Windows user already complained they couldn’t get it installed.

Second, MonoGame is effectively a re-implementation of XNA 4. Unfortunately, some of the pipelines (especially with an up-to-date Visual Studio) are broken and require hacks to work. This caused major problems with sound, so much so that I scratched all sound from the game.

I know that with enough time, I probably could have gotten this working (because lots of games made with it have sound), but I couldn’t waste the time in those 7 days to fight with it. This was frustrating, because one of the main reasons to use MonoGame was to make all of that streamlined and easy.

I also felt trapped into using Visual Studio as an IDE. This is certainly a fine IDE, but I’m most comfortable with Sublime Text 2. Switching editors wastes a lot of time, especially when you “downgrade.”

By this I mean VS doesn’t have half the great features of ST2 (including my most used feature: multiple cursors). In retrospect, I should have edited in ST2 and built in VS.

All this being said, MonoGame was a good enough framework to produce a full working game in 7 days, so I can’t complain too much. Also, if I were part of a larger team with a longer time frame, many of these issues would disappear. So I admit these complaints come specifically from the 7 day issue.

If I do this again, I will almost certainly try Cocos2d-x. It looks like this will fix all the complaints I had. First, Cocos2d-x is open source. This means I can actually see what the engine is doing if I need to! What a concept.

Second, it is C++, so it will deploy across platforms much easier. Lastly, it isn’t in some transition period like MonoGame where content management seems to use outdated, unsupported pipelines. I’m also more familiar with C++ than C# which would have solved a few headaches.

