## How Bad is External Fragmentation under a First-Fit Algorithm?

Sometimes I read about operating systems, because I want to know how my computer works. I’ve run Linux distributions as my primary OS for about 10 years, so having a familiarity with these concepts allows me to greatly customize and optimize my system. One topic that the general public is probably interested in is “What is fragmentation, and why am I supposed to defrag my computer sometimes?”

What I’m going to describe is not how memory management is actually done on any modern operating system, but it will give a feel for why fragmentation happens (and what this even is). I realized this really simple algorithm would be easy to program a simulation for, so we can see just how bad it can get.

The first-fit algorithm is exactly what it sounds like. You visualize memory as a bar that fills up. Each program has a size, and you look for the first hole big enough for it to fit. Here’s a picture I made to illustrate:

What’s going on? You download the blue program, and it goes into the first area with no problem. You download the red, and the first available hole is right next to the blue. Same for the green. Then you realize you don’t want the red anymore, so you delete it. This leaves a hole. When you download the purple program, your memory manager notices the hole and tries to fit it there. Whoops. The hole was too small, so it moves on to find the first hole that it does fit in (hence the name “first-fit algorithm”). Eventually you may delete the green as well, and something might go in the big hole left over from the red and green, but it probably won’t fill the whole thing.

Once these holes start appearing, they are hard to get rid of. This is external fragmentation. I wrote a simulation of this to see just how bad it can get. Although the program does nothing more than the simple thing I just showed, it is kind of long and confusing, so let’s take it one step at a time.

```python
from matplotlib import pyplot as pp
import numpy as np
import random

# Random program size: 1 to 10 units of memory
def randomSize():
    return random.randint(1, 10)

# Random lifetime: number of time steps until the program is deleted
def randomTime():
    return random.randint(1, 20)

class Program(object):
    def __init__(self):
        self.size = randomSize()
        self.position = -1  # -1 means "not placed in memory yet"
        self.time = randomTime()

    def getSize(self):
        return self.size
    def getPosition(self):
        return self.position
    def getTime(self):
        return self.time

    # One time step passes
    def decrement(self):
        self.time -= 1
```


I made two classes. The first one lets me pretend that I am downloading some program. Each program comes with a random size from 1 to 10, the position where it was placed into memory, and the amount of time until I delete it (a random number from 1 to 20). I’m still really confused about whether or not it is proper Python style to use getter methods rather than just directly accessing attributes…

```python
class Memory(object):
    def __init__(self, size):
        self.size = size
        self.contents = []  # programs in memory, sorted by position (see sort)

    def getContents(self):
        return self.contents
    def numPrograms(self):
        return len(self.contents)
    def getSize(self):
        return self.size

    # Does program p fit in the hole between the i-th and (i+1)-th programs?
    def checkHole(self, i, p):
        return p.getSize() <= self.contents[i+1].getPosition() - (self.contents[i].getPosition() + self.contents[i].getSize())

    # First fit: put p in the first hole big enough for it (drop it if none is)
    def addProgram(self, p):
        n = self.numPrograms()
        placed = False
        if n == 0:
            if p.getSize() <= self.getSize():
                p.position = 0
                placed = True
        else:
            # hole before the first program
            if self.contents[0].getPosition() >= p.getSize():
                p.position = 0
                placed = True
            # holes between consecutive programs (pairs 0-1, ..., (n-2)-(n-1))
            if not placed:
                for i in range(n-1):
                    if self.checkHole(i, p):
                        p.position = self.contents[i].getPosition() + self.contents[i].getSize()
                        placed = True
                        break
            # hole after the last program
            if not placed:
                end = self.contents[n-1].getPosition() + self.contents[n-1].getSize()
                if p.getSize() <= self.getSize() - end:
                    p.position = end
                    placed = True
        if placed:
            self.contents.append(p)  # appended at the end; sort() restores position order

    def sort(self):
        self.contents = sorted(self.getContents(), key=lambda p: p.getPosition())

    # Delete every program whose time has run out (build a new list rather
    # than removing items from the list while looping over it)
    def removeProgram(self):
        self.contents = [p for p in self.getContents() if p.getTime() > 0]
```


I’m sorry that this is probably the worst Python code you’ve ever seen. This class is just the stick of memory. It has a total size, which we will take to be 100 in our simulations, and a list of the programs that are currently in memory (the programs keep track of their positions). I made a helper function that checks whether or not a given program fits into the hole between the i-th and (i+1)-th programs.

Originally I thought this helper function would make the code much simpler, but it is almost entirely useless, which is why this looks so bad. Then I made the function that adds a program to memory. This just looks for the first available hole to stick it into. Unfortunately, I didn’t see a slick uniform way to do it and cobbled together something by cases. I check the hole before the first program separately and the hole after the last program separately.

I made a sort method, because when I appended a new program to the list, it was just stuck at the end despite having a position that could be in the middle. This just orders them by position. Lastly, I have a method that removes any program whose time has run out.

```python
def simulate(memSize, numTimeSteps):
    m = Memory(memSize)
    for i in range(numTimeSteps):
        m.addProgram(Program())  # try to load one new random program
        for p in m.getContents():
            p.decrement()
        m.removeProgram()
        m.sort()
    # fraction of memory actually in use at the end
    used = sum(p.getSize() for p in m.getContents())
    print(float(used) / memSize)
    # 1 for each occupied cell, 0 for each cell in a hole
    memArray = np.zeros(memSize)
    for p in m.getContents():
        memArray[p.getPosition():p.getPosition() + p.getSize()] = 1
    x = np.arange(memSize)
    pp.vlines(x, 0, memArray)
    pp.ylim(0, 1)
    pp.show()

simulate(100, 100)
```


Finally, I just simulate for 100 time steps. At each time step I try to stick in one randomly generated program. I then decrease all the time-to-deletion counters, remove any programs whose counter has hit 0, sort the list, and repeat.

Here are some of the final results:

This one had 66% memory use, so external fragmentation led to about 34% waste.

This one was slightly better at 69% use.

This one was quite a bit worse at 43% use and hence 57% waste. The full code can be found here. It might be fun to play around with different sizes of memory and lengths of time to delete stuff. Fragmentation will obviously not be a huge problem if the amount of space you try to use is sufficiently less than your total memory.

On the other hand, as these simulations show, the answer to the title question is that it can be pretty bad. These simulations are actually consistent with the theory (if my textbook is correct): statistical analysis of first fit says that roughly a third of memory is expected to be wasted (I ran a lot more simulations than I showed you, and about 33% waste was what I usually saw).
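The textbook figure is usually stated as the “50-percent rule” (for example in Silberschatz, Galvin, and Gagne’s Operating System Concepts): statistical analysis of first fit shows that, given ${N}$ allocated blocks, about another ${0.5N}$ blocks’ worth of memory is lost to fragmentation. Measuring everything in units of an average block (a simplifying assumption), the one-third figure is just this ratio:

$$\frac{\text{wasted}}{\text{total}} \approx \frac{0.5N}{N + 0.5N} = \frac{0.5}{1.5} = \frac{1}{3} \approx 33\%$$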

There is no need to fear, though. Old operating systems did use some variant on this, but all modern operating systems use something called “paging” which completely avoids external fragmentation (there is something called internal fragmentation that I didn’t talk about). I hope you enjoyed this change of pace. I had fun doing it. It may look like it was a lot of work, but the program was done in one sitting. If anyone knows how to make an “animation” where you can watch the whole process unfold that might be cool. I googled a bit, and couldn’t find an obvious way to do it.

Update 5/17/14: I pushed a cleaned-up version of this code to GitHub. It is amazing what one night of sleeping on something will do. I got rid of most of the indices using itertools, and I realized that the proper helper function was just a method that returns the right endpoint of a program in memory. Then finding a hole size was just subtraction and could be reused everywhere.
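Here is a minimal sketch of that idea, assuming a stripped-down `Program` class with only a size and a position. It is not the code from the GitHub version (which isn’t reproduced here), and the names `end` and `first_fit` are hypothetical. The right endpoint turns every hole size into a subtraction, and `itertools.chain` adds sentinel endpoints so the holes before the first program and after the last one need no special cases:

```python
from itertools import chain

class Program:
    """Hypothetical stripped-down program: just a size and a position."""
    def __init__(self, size):
        self.size = size
        self.position = -1  # not placed yet

    def end(self):
        # right endpoint: the first cell after this program
        return self.position + self.size

def first_fit(programs, p, mem_size):
    """Put p in the first hole big enough for it and return True,
    or return False if no hole fits. `programs` must be sorted by position."""
    # Each hole runs from one program's end to the next program's start.
    # The sentinels 0 and mem_size cover the holes before the first program
    # and after the last one, so there are no special cases.
    ends = chain([0], (q.end() for q in programs))
    starts = chain((q.position for q in programs), [mem_size])
    for left, right in zip(ends, starts):
        if right - left >= p.size:  # hole size is just a subtraction
            p.position = left
            programs.append(p)
            programs.sort(key=lambda q: q.position)
            return True
    return False
```

For example, with `mem_size = 10`, placing programs of sizes 3, 4, and 2 puts them at positions 0, 3, and 7. Deleting the size-4 program leaves a hole of 4 at position 3, so a size-5 program no longer fits anywhere, while a size-4 program slots back in at position 3.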

Just for fun, I decided I would “prove” to you that this algorithm is only bad when your programs take up a sufficiently large share of memory (i.e. probably never on a modern computer!). Here’s such a simulation result where your programs are too small for the wasted space to ever matter:
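To experiment with this without plotting every run, the whole simulation can be condensed into one function that returns the final fraction of memory in use. This is a sketch, not the code from the post: the function name `first_fit_utilization` and its parameters are hypothetical, programs are stored as `[start, size, time_left]` lists instead of `Program` objects, and the step order (add one program, age everything, delete expired programs) follows the description above. Shrinking `max_size` relative to `mem_size` is the “programs too small to matter” case:

```python
import random

def first_fit_utilization(mem_size=100, steps=100, max_size=10, max_time=20, seed=None):
    """Run the first-fit simulation and return the fraction of memory in use."""
    rng = random.Random(seed)
    programs = []  # each entry is [start, size, time_left], sorted by start
    for step in range(steps):
        size = rng.randint(1, max_size)
        time_left = rng.randint(1, max_time)
        # First fit: scan the holes left to right; the sentinel entry at
        # mem_size makes the hole after the last program look like any other.
        prev_end = 0
        for start, length, _ in programs + [[mem_size, 0, 0]]:
            if start - prev_end >= size:
                programs.append([prev_end, size, time_left])
                programs.sort()
                break
            prev_end = start + length
        # Age every program, then delete the ones whose time has run out.
        for prog in programs:
            prog[2] -= 1
        programs = [prog for prog in programs if prog[2] > 0]
    return sum(prog[1] for prog in programs) / mem_size

if __name__ == "__main__":
    # Average over many random runs for a few maximum program sizes.
    for max_size in (1, 5, 10, 20):
        runs = [first_fit_utilization(max_size=max_size, seed=s) for s in range(200)]
        print(max_size, sum(runs) / len(runs))
```

Averaging over many seeds matters here, since a single run can land anywhere from the 40s to the high 60s, as the three plots above show.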

## The Myth of a Great Seminar

Sometimes I peruse the debates at Intelligence Squared to see if any catch my eye. There was one this time that seemed really interesting to me. It was a debate on whether or not MOOCs are reasonable replacements for actual in-class and campus college experiences. You can see the full thing here.

This was interesting to me, because I’ve actually gone through a few MOOCs from start to finish and found them to be extremely good experiences. I was curious whether any research on the relative effectiveness of the two would be mentioned. The debate was pretty disappointing in this regard. The main anti-MOOC argument was based around how wonderful small seminars are and that you can’t get this in a MOOC. That’s why I want to write a response about this mythical seminar.

Before talking about why I think such seminars don’t really exist in this Platonic, pristine state at any university, I want to first address the fact that the existence of seminars at all is pretty mythical. I decided to check the University of Washington’s Spring 2014 schedule. The senior-level literature classes ranged from 25 to 40 students, but most had about 30. Should I consider a 30-person class a “small seminar”? I get it. We’re a gigantic school, so I fully admit that small liberal arts colleges probably do have a lot of small seminars. But most students at most schools will graduate having taken few or no small seminars.

Even middle-level courses like Introduction to the Theory of Literature at Ivy League schools are gigantic. That class probably has 100 students or more in it, and those are the types of courses that are offered as MOOCs. I think the comparison is a bit disingenuous when you take some capstone seminar and compare it to an “intro” MOOC. The MOOC side of the debate also responded to this criticism and pointed out that some MOOCs offer small-group breakout sessions, which actually do simulate small seminars. So the point doesn’t even stand.

Now that that rant is over, let’s pretend like the comparison is fair. Here are some of the myths I heard and why I think they are mostly myth (I’ll grant that maybe a few seminars run according to plan):

Let’s suppose for the sake of argument that the teacher is practically invisible in this mythical seminar and the students are all enraptured in high level critical conversation about Dostoevsky or some such nonsense. This seems to be the ideal the seminar aspires to. This is going to sound extremely cynical, but just how interesting can this conversation actually be? The seminar is going to be made up of an incredibly homogeneous group. Everyone is going to be about 20, never having had to make a living. They are all educated at the same school, which means they have roughly the same cultural experience, read the same books, and developed the same theories about how to analyze books.

What’s so great about this perfect conversation in comparison with a MOOC? When you take the exact same course as a MOOC, you will probably have a math professor in India, a farmer in the American Midwest, a retired middle school teacher in Scotland, etc. The conversation about the same books is going to be infinitely more interesting and enlightening, because the perspectives will be so varied.

Now let’s back up a little from the perfect situation and get a little more realistic. We’ve all been to these seminar classes before. The free-flowing and enlightening conversation essentially never happens. You have some people who didn’t read the stuff. You have people who aren’t very good at articulating their thoughts on the spot. The whole thing usually turns into the professor calling on someone, a brief sentence or two is mumbled, and then the professor carries on along that point. The “conversation” is forced, and the student input is more like a prompt for the professor to riff on.

Depending on the day and material, the degree to which this is the case will vary, but I think the overall sentiment is what happens most days in most seminars. This is why I think a written discussion board in a MOOC is actually a far better venue for discussion than a conversation in a seminar.

First off, there are hundreds more topics and conversations going on in a discussion board than in class. This means that you can search around for conversations that you really want to participate in. Second, you have to write your thoughts down. This gives you time to figure out what you are going to say rather than awkwardly spewing out some muddled nonsense while everyone stares at you. It also gives you time to figure out what other people mean before responding to them.

It is amazing how many times you start typing a response and then, when you go back to what was actually said, realize you misunderstood at first. Which brings me to my next point. A discussion board records all of it. You can continually return to conversations as your understanding of a topic develops. The conversation doesn’t end at the end of the hour. Once you leave the physical setting of a seminar, it probably only takes a few hours to forget most of what most people said. The discussion board allows you to go back whenever you want to recall certain parts of certain conversations.

To summarize, I think most courses most people take are not seminars, so it is pointless to use them as a main argument against MOOCs. I also think that the MOOC setup is actually a better platform for enlightening discussion in almost every respect than an actual seminar. That being said, I think the anti-MOOC side has a point when they say that communication skills are developed in class discussion. Unfortunately, even small seminars tend not to have real “discussions,” so I don’t find that compelling (along with the fact that some MOOCs are incorporating small group live chat sessions now).

Don’t get me wrong. I don’t think all university education should be relegated to the online setting. I’m just saying that using some idealized small seminar as the main argument is a highly flawed way to go about it.

## Correlation Does not Imply Causation

I’ve never done this before in six years and well over 400 posts. I’m going to direct your attention to a webpage rather than write a post. As they say, “A picture is worth 1000 words,” so consider this a 1000 word post:

The full page is here.

This is exactly why it is so dangerous to conclude a causal relationship from statistically significant correlations. Even phenomena with direct, known causal relationships tend not to have 0.99 correlation. Peruse the rest of that webpage at your own risk. It is quite addictive (who knew that so many people died from getting tangled in their bed sheets every year?).
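One common way such numbers arise: any two quantities that both drift in the same direction over time will be highly correlated, whether or not one has anything to do with the other. The sketch below uses two made-up series (nothing here comes from the linked page); each is just a linear trend plus its own independent noise, so neither can cause the other:

```python
import numpy as np

rng = np.random.default_rng(0)
years = np.arange(20)

# Two invented series with no causal link: each just trends upward over time,
# plus independent random noise.
series_a = 3.0 * years + rng.normal(0, 1.0, size=years.size)
series_b = 100 + 0.5 * years + rng.normal(0, 0.5, size=years.size)

# Pearson correlation between the two series
r = np.corrcoef(series_a, series_b)[0, 1]
print(f"correlation: {r:.3f}")
```

The shared trend (time) does all the work; removing it, for example by correlating year-to-year changes instead of raw values, would make most of the correlation disappear.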

## Rorty’s Pragmatism

Today I’d like to talk about Richard Rorty. He was an American philosopher who became famous in the late ’70s and ’80s for advocating a new form of pragmatism. I thought this might be a timely topic, because we’ve been spending a lot of time on making sense of data. Modern society has become polarized on a bunch of issues which basically stem from more fundamental questions: what is knowledge and what is truth?

On the one side we have radical scientism. This side argues that in order to count something as knowledge, it must be falsifiable, formulated as a scientific hypothesis, and demonstrated with 95% certainty. There are of course much milder variants on this side. For example, one might stipulate that all questions that naturally have a scientific formulation must meet scientific standards before we consider the answers to be reliable information, but that science doesn’t have much to say about non-scientific questions.

The other side is radical skepticism or postmodernism (I know these are not at all the same thing). The radical skeptics claim that all knowledge is impossible, so we should be skeptical of all things that we hear (even if they were proven by a scientific study). I have a lot of sympathy for this side. Facebook alone makes me skeptical of basically anything anyone says, because I know that half of the interesting things I’m told probably come from a totally false Facebook post someone made. Everyone has bias and/or funding which skews results including supposedly objective scientific ones.

Postmodernism gives a bit more substance to this argument. It essentially says that we have no foundations anymore. Science can’t prove that science is getting at truth, so we shouldn’t treat it as a special class of knowledge. This “lack of foundations” argument ends up giving merit to a lot of dangerous ideas. Since the scientific method is no longer seen as the most reliable way to truth, maybe new age spirituality or alternative medicine actually works and is just as effective.

I’ll state my bias right up front. I tend to agree with the scientism viewpoint (although I’d probably call my stance “naturalism,” but let’s not get into that). Both sides make really good critiques of the other when done by a careful thinker. Science has assumptions that cannot be justified. It is merely building models. Maybe our model of gravity is totally wrong, but just happens to consistently give really accurate predictions when tested.

The scientific side has its own critiques of the other positions. Skepticism is not self-consistent, because it requires you to be skeptical of skepticism. And the lack of foundations in postmodernism does not mean that all things are equally likely to be true.

These differing foundations manifest in huge shouting matches: evolution vs. intelligent design, medicine vs. alternative medicine, atheism vs. theism, and on and on. The main reason I err on the side of science is that people seem to think that science provides the best answers until those answers disagree with their previously held beliefs. It is only then that the lack of foundations is pointed out or the bias of the researcher is brought up. See also this post, which shows why the scientific method is needed to surpass bias, and this post for an ethical reason to err on the side of science.

Anyway, we’ve passed 500 words already and I’m still just setting up why Rorty is such an important thinker. His views seem to just gain importance as data sets keep getting bigger and we get confused about who we should believe. Rorty basically comes up with a middle ground, which is sometimes called neopragmatism. He entered the scene at a time when each side seemed partly right and partly wrong. His position is that the postmodernists are right that there are no foundations, but this doesn’t matter because some systems are useful. Let’s unpack this a bit.

First off, if this interests you, then go read Philosophy and the Mirror of Nature. A quick blog post cannot do it justice. It is quite complex and subtle. One side says that they’ve built a fantastic pillar called science on the solid foundations of peer review, objectivity, etc. The other side says that all our institutions can be knocked down, because there are no solid foundations.

Rorty has a somewhat shocking response that both sides are wrong. There are no foundations (i.e. external objective standards), but this doesn’t mean the pillars are unstable. It just means that the rules of the game depend on which game we’re playing. When playing tennis, we must follow the rules of tennis. When doing science, we must play by the rules of science. There is no universal, correct rule set for all games. It is just dependent on the game. That’s okay. None are more “right” than another, because this concept doesn’t even make sense.

So what is truth? Rorty says that we can think about justification, but not about truth. How we justify beliefs is dependent on the system we are in. We know how to use the word true in each system, so we don’t have to define it. This is a very classic pragmatic response. When speaking of scientific truth, we have a collection of things we mean. When speaking of literary truth we have another. These truths are dependent on time and place (e.g. “It is a truth universally acknowledged, that a single man in possession of a good fortune must be in want of a wife.”)

So how is this different from the extreme relativism of postmodernism? Well, Rorty would say that usefulness has to be taken into account. There is no way to get at objective truth, but some systems are more useful for certain purposes than others. For example, at this point in time, science seems to be the most useful system for answering scientific questions. Your computer is working, polio has been nearly eradicated, we put people on the moon, etc., etc. As the internet meme goes, “Science. It works, bitches!” And so even though we don’t know if science is getting at truth (which reasonable scientists fully admit, by the way), it does consistently get at something useful. There may be other contexts in which scientific rigor is not the most useful system.

Rorty develops a theory that fully admits that the postmodernists are right when they say that we have no basis for foundations anymore. But he doesn’t descend into extreme relativism. He leaves room for some systems of thought to be more useful than others. They don’t have a monopoly on truth, because we don’t even know what that means. Relativism doesn’t even really make sense from Rorty’s viewpoint, because you can never leave your current context from which to make a relative judgement. And that’s why I think he’s so important. He points out that our shouting matches aren’t about content or truth. They are about coming at the same question from different systems.

## Fun with Decision Theory

I’ve done quite a few decision theory posts at this point, and I think I’m mostly done with it. So to conclude the section I thought I’d leave you with some fun thought experiments having to do with decision theory. You can use your new skills to try to analyze them.

The first thought experiment I want to present has been around since at least the late ’60s. It is generally referred to as Newcomb’s paradox. Here’s the setup. Suppose you encounter a strange being in the forest that can predict your decisions (it’s telepathic or something; just go with it for the purposes of the thought experiment).

They offer you a deal. They present Box A, which contains \$1,000, and Box B, which contains either \$0 or \$1,000,000. You are allowed to take Box B by itself or both Box A and Box B. The being predicts what choice you will make to determine the contents of Box B. If they predict that you will take only Box B, then they put the \$1,000,000 in it. If they predict that you will take both, then they put \$0 in. All of this is done ahead of time (because they also correctly predicted that you would walk through this random area of the forest). An important part of the setup is that the predictor puts the money in ahead of time, so the contents are not determined after you make a decision. The contents cannot change.

There are only four total possibilities, so if you use your decision theory skills, then it should be a pretty straightforward calculation to figure out how to maximize your profit. Strangely, this is often referred to as a paradox, because two equally valid-sounding arguments lead to different answers.

Here’s one analysis. Suppose this being thinks you will pick both boxes. If you actually pick only B, then you get nothing. If you actually pick both, then you get \$1,000. Thus picking both gets you a better result in that case. Suppose it thinks you will only pick B. Then if you actually pick both, you get \$1,001,000. If you only pick B, then you only get \$1,000,000. Thus picking both leads to a better result in that case as well. In fact, picking both gets you more money no matter what the prediction was. Thus picking both maximizes your profit.

The other analysis says that the first one ignored vital information. We can throw out two possibilities, because by the assumption of the thought experiment the prediction will never be wrong. Thus the only two possibilities are that you pick both, in which case you get \$1,000, or you pick only B, in which case you get \$1,000,000. Therefore picking only B maximizes your profit.
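To make the two arguments concrete, here is a small sketch that writes down the four outcomes and runs each analysis. The names are hypothetical, and the accuracy `q` is an extra assumption: the thought experiment fixes the prediction as always right (`q = 1`), and letting `q` vary only shows how the second analysis’s numbers move. It doesn’t resolve the paradox; it just makes both calculations explicit:

```python
# Payoffs in dollars, indexed by (your choice, the being's prediction)
PAYOFF = {
    ("B only", "B only"): 1_000_000,
    ("B only", "both"): 0,
    ("both", "B only"): 1_001_000,
    ("both", "both"): 1_000,
}
CHOICES = ("B only", "both")

def dominant_choice():
    """First analysis: hold the prediction fixed and compare your two choices.
    Returns "both" if taking both boxes pays strictly more for every prediction."""
    for prediction in CHOICES:
        if PAYOFF[("both", prediction)] <= PAYOFF[("B only", prediction)]:
            return None
    return "both"

def expected_payoff(choice, q):
    """Second analysis: the prediction matches your actual choice with
    probability q (the thought experiment itself assumes q = 1)."""
    other = "both" if choice == "B only" else "B only"
    return q * PAYOFF[(choice, choice)] + (1 - q) * PAYOFF[(choice, other)]

print("dominant choice:", dominant_choice())
for q in (1.0, 0.9, 0.5):
    print(q, expected_payoff("B only", q), expected_payoff("both", q))
```

With `q = 1` this reproduces the second analysis exactly (\$1,000,000 versus \$1,000), while `dominant_choice` reproduces the first.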

I won’t present any of the attempted resolutions of this, because I’ve given you some tools to think about it on your own for a while. I’ll just say that if you Google this, then you will find that tons of famous philosophers and mathematicians have attempted to resolve it. So answers are really easy to find if you get stuck or are curious to read more about it. If you aren’t sure where to start, I highly recommend what Eliezer Yudkowsky has written on it. I dare say he has probably thought about this more deeply than most people.

Another fun and related issue is the idea of acting randomly being the best decision. Suppose you are playing a game in which if you make moves at random, then you have a 1/2 probability of winning. If your opponent can guess what your next move will be, then you only have a 1/4 chance of winning. Games like these are pretty easy to construct, but telling you one isn’t as important as the fact that it has this feature.

In such a situation, if you run your decision theory algorithm and come up with a deterministic set of moves that maximizes your chance of winning, then you will most likely lose. This is because your opponent could figure out what moves you need to make to win and hence figure out which moves you are going to make. In such a situation, the only way to maximize your chance of winning is to ensure that you never make moves according to some rule that your opponent could figure out, i.e. picking a move at random maximizes your chance of winning.
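Matching pennies is a standard game with this feature, though its numbers differ from the ones above: you win a round when your coin differs from your opponent’s, so playing at random wins half the time, while a rule your opponent has figured out never wins. The sketch below, with hypothetical names, models an opponent who has learned the “alternate H and T” rule and simply plays your predicted move:

```python
import random

def alternating(history, rng):
    """A deterministic rule: H, T, H, T, ..."""
    return "H" if len(history) % 2 == 0 else "T"

def coin_flip(history, rng):
    """Ignore the history and pick uniformly at random."""
    return rng.choice("HT")

def win_rate(your_rule, opponent_model, rounds=10_000, seed=0):
    """Matching pennies: you win a round when your coin differs from the
    opponent's. The opponent predicts your move with `opponent_model`
    and plays that prediction."""
    rng = random.Random(seed)
    history, wins = [], 0
    for _ in range(rounds):
        guess = opponent_model(history, rng)
        move = your_rule(history, rng)
        if move != guess:
            wins += 1
        history.append(move)
    return wins / rounds

print("predictable player:", win_rate(alternating, alternating))
print("random player:", win_rate(coin_flip, alternating))
```

The predictable player loses every round, while the random player wins about half, and no opponent model can do better against a fair coin flip, since the flip is independent of anything the opponent can observe.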

In some sense, if you make your decision according to some random mechanism external to yourself, then you prevent the game from becoming a “Newcomb-like problem.” In fact, some people try to resolve the Newcomb problem with such randomness. Anyway, I thought it would be fun to end this series with something a little lighter.

## Markov Chain Monte Carlo Example

Let’s look at a problem called parameter estimation. As usual, we have a bunch of coin flips. What we’ve learned to do with Bayesian statistics is calculate some posterior distribution that tells me how likely the bias ${\theta}$ is. I ask myself, “Is it a credible hypothesis that the coin is fair (${\theta =1/2}$)?” I find out yes it is. I ask myself, “Is it a credible hypothesis that the coin is massively biased at ${\theta=4/5}$?” I find out yes it is. Uh oh.

Maybe in abstract math land this type of contradictory information is fine. I should be honest that both are credible guesses based on my data, and Bayesian statistics helps me to be very precise about my degrees of certainty and uncertainty. Unfortunately, in the real world I want to figure out which ${\theta}$ is “best” so that I can use it in my model for practical purposes. This is called parameter estimation, because I want to estimate what my parameter ${\theta}$ should be in my model.

We’re in luck for the coin example, because we only have one parameter living in one-dimensional space. This alone vastly simplifies the situation, but we have something far, far better. Our posterior distribution has a unique maximum, that max can be determined easily and exactly, and with a decent number of flips it sits very close to the mean of the distribution! This means that we can safely use that parameter as the “best.”

In the real world, we often have several parameters we are trying to estimate in a high-dimensional space, and the posterior is some non-convex crazy thing with lots of local minima and maxima that can’t be determined analytically. Let’s face it. Optimization is really hard even in relatively nice situations. The real world is usually not nice.

There often isn’t even an obvious notion of what you mean by the “best” set of parameters either. Think of a symmetrical bimodal distribution where both peaks have the same height. You don’t really have any good reason to pick one of the points that gives the max, and if you do something like take the mean, then you might end up at a minimum by accident. The method I’m going to describe doesn’t really help with this issue of “equally good choices,” but it does give a beautiful way to deal with high-dimensional parameter spaces and crazy posterior distributions.

The idea is extremely simple. You will pick some starting collection of parameters. Then you let those parameters randomly move in some direction. We will then use our model to test whether or not it is more or less likely to see the data that we observed under each of those parameter choices. With some probability depending on this likelihood we will move that parameter to that value. This is just a Markov chain process of our ${\theta}$ values moving through the possible parameter values, and hence this technique is called a Markov Chain Monte Carlo (MCMC) method (I used the indefinite article “a” because there are all sorts of variations on this out there).

It turns out that as long as we set this up in a reasonable way, the chain converges: after enough steps, the values it visits are distributed according to the posterior. Here’s something cool about this. Your parameters could live in some gigantic space in which it would be impossible to search exhaustively for a good parameter estimate. Usually there is some much, much smaller-dimensional subset of reasonably likely candidates. Once the walk reaches this smaller set, by the nature of the algorithm, it tends to stay close to it, and hence it explores the good region much faster than a blind search would. Here’s a picture showing how the random walks stay on a smaller set in a real example:

Let’s actually implement this in the silly case of the coin example, where we know what the answer should be. My next post might try to implement this for some sort of real data set, although that could be more time-consuming than I’m willing to commit to. To make this example more fun, I had the computer pick a random number in ${[0,1]}$ and then generate 100 coin flips with bias equal to that number without telling me the number! This way we are in a realistic situation of not knowing what the “correct” answer is ahead of time.
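Generating data like this takes only a few lines. A minimal sketch using Python’s `random` module; the function name `hidden_coin_data` is hypothetical, and the returned bias is the value you would keep hidden until the end:

```python
import random

def hidden_coin_data(n_flips=100, seed=None):
    """Pick a secret bias uniformly from [0, 1), then flip a coin with that bias.
    Returns (heads, tails, bias); only heads and tails should be looked at."""
    rng = random.Random(seed)
    bias = rng.random()  # probability of heads, kept secret
    heads = sum(rng.random() < bias for _ in range(n_flips))
    return heads, n_flips - heads, bias

heads, tails, secret_bias = hidden_coin_data()
print(heads, "heads,", tails, "tails")
```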

I got 85 heads and 15 tails. To make computations easier, let’s assume the prior is just uniform. This means the posterior is proportional to the likelihood ${p(D|\theta)=\theta^{85}\cdot (1-\theta)^{15}}$. I’ll start the random walk at ${\theta = 0.5}$. To decide how far to move, I pick a random number from a normal distribution with mean ${0}$ and standard deviation ${0.1}$. So if I pick ${0.05}$, then my candidate place to move to is ${0.55}$.
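For this particular posterior we can also work out the peak and the mean exactly, which gives something to check the random walk against. With a uniform prior, the normalized posterior is a ${\mathrm{Beta}(86, 16)}$ distribution. Setting the derivative of the log-posterior to zero gives the peak, and the standard Beta mean formula gives the mean:

$$\frac{d}{d\theta}\left[85\log\theta + 15\log(1-\theta)\right] = \frac{85}{\theta} - \frac{15}{1-\theta} = 0 \quad\Longrightarrow\quad \theta_{\max} = \frac{85}{100} = 0.85,$$

$$\mathbb{E}[\theta \mid D] = \frac{86}{86+16} = \frac{86}{102} \approx 0.843.$$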

I compute ${p(D|\theta_{new})/p(D|\theta_{old})}$ and move to the new spot with this probability. Note that if my new theta value is more likely than the old one, then I will always move to it, because the ratio is greater than ${1}$. The less likely my new theta value is, the less likely it is that I will move there. This is the Metropolis algorithm (the special case of Metropolis-Hastings where the proposal step is symmetric). Note how simple the implementation is. It is only a few lines of code:

```python
import random
import matplotlib.pyplot as plt

# Unnormalized posterior: uniform prior times the likelihood of 85 heads, 15 tails
def p(theta):
    return (theta**85) * ((1 - theta)**15)

# Random walk step size: normal with mean 0, standard deviation 0.1
def norm_dist():
    return random.normalvariate(0, 0.1)

# Propose one step of the random walk from theta; proposals outside (0, 1)
# are rejected by staying put (the posterior is zero there anyway)
def rand_walk(theta):
    x = norm_dist()
    if 0 < theta + x < 1:
        return theta + x
    else:
        return theta

# Simulate the random walk for 1000 time steps
walk = [0.5]
for i in range(1000):
    n = walk[-1]        # current position
    y = rand_walk(n)    # candidate position
    # Metropolis rule: accept with probability min(1, p(y)/p(n))
    if random.random() < p(y) / p(n):
        walk.append(y)
    else:
        walk.append(n)

# Plot the results (theta on the horizontal axis, time running upward)
plt.plot(walk, range(len(walk)))
plt.title('Random Walk Visualization')
plt.xlabel('Theta Value')
plt.ylabel('Time')
plt.show()
```


Note that the key insight MCMC gives us is that drawing values from the posterior is going to be “easy.” Even if we know the posterior only up to a normalizing constant and have no idea how to calculate anything from it explicitly, we can still perform this random walk. This is what it looks like:

The last step is to actually estimate the parameter. The point is that, in the long run, the walk spends time near each value of ${\theta}$ in proportion to its posterior probability, so we can just average the visited values to get ${\theta = 0.84}$. The average is now a finite sum instead of an integral. If we had done this analytically, we would have gotten a posterior mean of ${86/102 \approx 0.843}$ (the single most likely value is ${0.85}$). Since MCMC gives us an estimate that comes from randomness, this is really quite good!
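To check the estimate more carefully, here is a sketch that reruns the same kind of random walk for longer, throws away the first stretch of the walk (the “burn-in”, while it is still travelling from the starting point ${0.5}$ toward the high-probability region), and averages the rest. The 20,000 steps, the burn-in of 1,000 steps, the seed, and the function names are arbitrary choices for illustration, not part of the original run. It rejects proposals outside ${(0,1)}$, which has the same effect as staying put in the code above.

```python
import math
import random

HEADS, TAILS = 85, 15  # data from this example

def log_posterior(theta):
    # Log of the unnormalized posterior theta^85 * (1 - theta)^15; -inf outside (0, 1)
    if not 0 < theta < 1:
        return -math.inf
    return HEADS * math.log(theta) + TAILS * math.log(1 - theta)

def metropolis(n_steps, start=0.5, step_sd=0.1, seed=None):
    """Random-walk Metropolis sampler with normal steps; returns every visited theta."""
    rng = random.Random(seed)
    theta = start
    walk = [theta]
    for _ in range(n_steps):
        candidate = theta + rng.gauss(0, step_sd)
        log_ratio = log_posterior(candidate) - log_posterior(theta)
        # Move with probability min(1, p(candidate) / p(theta)); otherwise stay put
        if rng.random() < math.exp(min(0.0, log_ratio)):
            theta = candidate
        walk.append(theta)
    return walk

walk = metropolis(20_000, seed=1)
burn_in = 1_000
samples = walk[burn_in:]
estimate = sum(samples) / len(samples)  # finite-sum stand-in for the posterior-mean integral
exact_mean = 86 / 102                   # mean of the Beta(86, 16) posterior
print(f"MCMC estimate: {estimate:.3f}, exact posterior mean: {exact_mean:.3f}")
```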

## Video Games as a Solution to the One-Sided Problem of Art

In October I wrote a post in defense of gaming whose central argument is that anyone who considers experiencing art an important human experience should consider certain types of games a worthwhile use of time as well. Some games are basically interactive films, but some are much more interesting and original forms of interactive art. If you close yourself off from this world, then you close yourself off from deep artistic experiences that you can’t get elsewhere.

A few months ago I did two posts on David Foster Wallace, his philosophy of art, and how to get the most out of Infinite Jest.

One of DFW’s central concerns was the one-sided nature of art. The artist puts in hundreds of hours of work, and the viewer/reader/whatever passively experiences the work. He thought of the artist/viewer relationship as an honest relationship. If it is completely one-sided, then it is a defunct relationship and you won’t get much out of it for very long. To have a successful relationship, both sides have to put in reasonable amounts of work.

This is one way people justify postmodernist writing. You have a bunch of endnotes or footnotes or you pull the reader out of the reading experience in other ways by drawing attention to the fact that they are reading something. You write in stream of consciousness from points of view that change every couple of pages, so that the reader can’t immediately tell what is happening. Whatever the literary device, the idea is that the reader has to put in work.

The point is that the more work the reader puts in, the more they will get out of the experience. Just like in a relationship, the reader has to invest something if they want a meaningful experience. Of course, the relationship becomes lopsided in the other direction if the author just uses a random word generator and plops nonsense on the page for the reader to spend months trying to decipher. It needs to be a symbiotic relationship where neither side carries too much of the burden.

I’m going to go out on a limb and say that this is a real problem, and what writers, filmmakers, artists, etc. have come up with so far merely mitigates it. There hasn’t been a really good way to get the viewer to truly participate in the work of art and invest in it … until the fairly recent paradigm shift in thinking about games as art.

I’m definitely not the first to propose this, so I won’t make this a long post. Now that I’ve blogged around this topic a few times without actually addressing it, I thought I would just point out that games are one obvious solution to the problem. They provide an interactive experience where the “player” has to fully invest in the work.

In fact, if artists are scared of the idea that their art will be “played” and hence will not qualify as “serious” (two notions that are extraordinarily hard to define or separate), then they should check out some recent games like To the Moon. The gameplay is extremely minimal. The player experiences a moving story by progressing through the game: you move around to collect some items, and at the end of certain collecting segments you “solve a puzzle” (sometimes only 2 or 3 clicks of the mouse). Still, this level of interaction is vital to fully immersing you in the story as if you were really the main character. This kind of interaction is impossible with film or literature.