## The Functor of Points Revisited

Mike Hopkins is giving the Milliman Lectures this week at the University of Washington and the first talk involved this idea that I’m extremely familiar with, but am also surprised at how unfamiliar most mathematicians are with it. I’ve made almost this exact post several other times, but it bears repeating. As I basked in the amazingness of this idea during the talk, I couldn’t help but notice how annoyed some people seemed to be at the level of abstractness and generality this notion forces on you.

Every branch of math has some crowning achievements and insights into how to actually think about something so that it works. The idea I’ll present in this post is a truly remarkable insight into geometry and topology. It is incredibly simple (despite the daunting language) which is what makes it so fascinating. Here is the idea. Suppose you care about some type of spaces (metric, topological, manifolds, varieties, …).

Let ${X}$ be one of your spaces. In order to figure out what ${X}$ is you could probe it by other spaces. What does this mean? It just means you look at maps ${Y\rightarrow X}$. If ${X}$ is a topological space, then you can recover the points of ${X}$ by considering all the maps from a singleton (i.e. point) ${\{x\} \rightarrow X}$. If you want to understand more about the topology, then you probe by some other spaces. Simple.

Even analysts use this idea all the time. A distribution ${\phi}$ (on ${\mathbb{R}}$) is not a well-defined function, so you can’t just tell whether or not two distributions are the same by looking at values. Instead you probe it by test functions ${\int \phi f dx}$. If these probes give you the same thing for all test functions, then the distributions are the same. This is all we are doing with our spaces above, and this is all the Yoneda lemma is saying. It says that if the maps (test functions) to ${X}$ and the maps to ${Y}$ are the same, then ${X}$ and ${Y}$ are the same.
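To make the probing idea concrete, here is a quick numerical sketch in Python (my own illustration, pure standard library; `delta` and `narrow_gaussian` are names I made up). A distribution is treated as nothing but a rule that eats a test function and spits out a number, and the Dirac delta is compared against a very narrow Gaussian by probing both with the same test functions:

```python
import math

def delta(f):
    # The Dirac delta at 0: probing it with f just returns f(0).
    return f(0.0)

def narrow_gaussian(f, eps=1e-3, n=4001):
    # Probe a narrow Gaussian of width eps with f, i.e. approximate
    # the integral of phi_eps(x) * f(x) dx by a Riemann sum.
    span = 10 * eps
    dx = 2 * span / (n - 1)
    total = 0.0
    for i in range(n):
        x = -span + i * dx
        phi = math.exp(-x * x / (2 * eps * eps)) / (eps * math.sqrt(2 * math.pi))
        total += phi * f(x) * dx
    return total

# Probe both objects with a few test functions: the answers nearly agree,
# and agreement on ALL test functions is the only sense in which two
# distributions "are" the same.
for f in (math.cos, lambda x: math.exp(-x * x), lambda x: 1 + x + x * x):
    print(delta(f), narrow_gaussian(f))
```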

We can fancy up the language now. Considering maps to ${X}$ is a functor ${Hom(-,X): Spaces^{op} \rightarrow Set}$. Such a functor is called a presheaf on the category of Spaces (recall that for your particular situation this might be the category of smooth manifolds or metric spaces or algebraic varieties or …). Don’t be scared. This is literally the definition of presheaf, so if you were following up to now, then introducing this term requires no new definitions.

The Yoneda lemma is saying something very simple in this fancy language. It says that there is a (fully faithful) embedding of Spaces into Pre(Spaces), the category of presheaves on Spaces. If we now work with this new category of functors, we just enlarge what we consider to be a space and this is of fundamental importance for many reasons. If ${X}$ is one of our old spaces, then we can just naturally identify it with the presheaf ${Hom(-,X)}$. The reason Mike Hopkins is giving for why this is important is very different from the one I’ll give which just goes to show how incredibly useful this idea is.

In every single branch of math people care about some sort of classification problem. Classify all elliptic curves. What are the vector bundles on my manifold? If I fix a vector bundle, what are the connections on my vector bundle? What are the Borel measures on my metric space? The list goes on forever.

In general, classification is a hugely impossible task to grapple with. We know a ton of stuff about smooth manifolds, but how can we leverage that to make the seemingly unrelated problem of classifying vector bundles more manageable? Here our insight comes to the rescue, because there is a way to write down a functor that outputs vector bundles. There is subtlety in writing it down properly (and we should now land in Grpds instead of Set so that we can identify isomorphic ones), but once we do this we get a presheaf. In other words, we make a (generalized) space whose points are the objects we are classifying.

In many situations you then go on to prove that this moduli space of vector bundles is actually one of the original types of spaces (or not too far from one) we know a lot about. Now our impossible task of understanding what the vector bundles on my manifold are is reduced to the already studied problem of understanding the geometry of a manifold itself!

Here is my challenge to any analyst who knows about measures. Warning, this could be totally ridiculous and nonsense because it is based on reading Wikipedia for 5 minutes. Construct a presheaf of real-valued Radon measures on ${\mathbb{R}}$. Analyze this “space”. If it was done right, you should somehow recover that the space is the dual space to the convex space, ${C_c(\mathbb{R})}$, of compactly supported real-valued functions on ${\mathbb{R}}$. Congratulations, you’ve just started a new branch of math in which you classify measures on a space by analyzing the topology/geometry of the associated presheaf.

## Bayesian Statistics Worked Example Part 2

Last time I decided my post was too long, so I cut some stuff out, and now this post is fleshing those parts out into their own post. Recall our setup. We perform an experiment of flipping a coin. Our data set consists of ${a}$ heads and ${b}$ tails. We want to run a Bayesian analysis to figure out whether or not the coin is biased. Our bias is a number between ${0}$ and ${1}$ which just indicates the expected proportion of times it will land on heads.

We found our situation was modeled by the beta distribution: ${P(\theta |a,b)=\beta(a,b)}$. I reiterate here a word of warning. ALL other sources will call this ${B(a+1, b+1)}$. I’ve just shifted by 1 for ease of notation. We saw last time that if our prior belief is that the probability distribution is ${\beta(x,y)}$, then our posterior belief should be ${\beta(x+a, y+b)}$. This simple “update rule” falls out purely from Bayes’ Theorem.

The main thing I didn’t explain last time was what exactly I meant by the phrase “we can say with 95% confidence that the true bias of the coin lies between ${0.40}$ and ${0.60}$” or whatever the particular numbers are that we get from our data. What I had in mind for that phrase was something called the highest density interval (HDI). The 95% HDI just means that it is an interval for which the area under the distribution is ${0.95}$ (i.e. an interval spanning 95% of the distribution) such that every point in the interval has a higher probability than any point outside of the interval (I apologize for such highly unprofessional pictures):

(It doesn’t look like it, but that is supposed to be perfectly symmetrical.)

The first is the correct way to make the interval, because notice all points on the curve over the shaded region are higher up (i.e. more probable) than points on the curve not in the region. There are lots of 95% intervals that are not HDI’s. The second is such a non-example, because even though the area under the curve is 0.95, the big purple point is not in the interval but is higher up than some of the points off to the left which are included in the interval.

Lastly, we will say that a hypothesized bias ${\theta_0}$ is credible if some small neighborhood of that value lies completely inside our 95% HDI. That small threshold is sometimes called the “region of practical equivalence (ROPE)” and is just a value we must set. If we set it to be 0.02, then we would say that the coin being fair is a credible hypothesis if the whole interval from 0.48 to 0.52 is inside the 95% HDI.
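As a tiny sketch (my own, with a made-up helper name), the credibility check is literally just an interval containment test:

```python
def is_credible(theta0, hdi, rope=0.02):
    # theta0 is a credible hypothesis exactly when the whole ROPE
    # [theta0 - rope, theta0 + rope] sits inside the HDI.
    lo, hi = hdi
    return lo <= theta0 - rope and theta0 + rope <= hi

# The fair-coin hypothesis against a 95% HDI of (0.45, 0.75):
print(is_credible(0.5, (0.45, 0.75)))  # True: the whole ROPE (0.48, 0.52) fits
```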

A note ahead of time, calculating the HDI for the beta distribution is actually kind of a mess because of the nature of the function. There is no closed form solution, so usually you can just look these things up in a table or approximate it somehow. Both the mean ${\mu=\frac{a}{a+b}}$ and the standard deviation ${\left(\frac{\mu(1-\mu)}{a+b+1}\right)^{1/2}}$ do have closed forms. Thus I’m going to approximate for the sake of this post using the “two standard deviations” rule that says that two standard deviations on either side of the mean is roughly 95%. Caution, if the distribution is highly skewed, for example ${\beta(3,25)}$ or something, then this approximation will actually be way off.
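Here is a rough numerical sketch of both approaches in Python (standard library only; the function names and the greedy grid method are my own, not anything canonical). It brute-forces the HDI on a grid in this post’s shifted convention, where ${\beta(a,b)}$ has density proportional to ${\theta^a(1-\theta)^b}$, and compares it to the two-standard-deviations rule:

```python
import math

def beta_pdf_unnorm(theta, a, b):
    # The post's shifted convention: beta(a, b) has density ~ theta^a (1-theta)^b.
    return theta ** a * (1.0 - theta) ** b

def hdi(a, b, mass=0.95, n=50_000):
    # Grid approximation of the highest density interval: greedily keep
    # the most probable grid points until `mass` of the total probability
    # is covered, then report the span of the kept points.
    thetas = [(i + 0.5) / n for i in range(n)]
    dens = [beta_pdf_unnorm(t, a, b) for t in thetas]
    total = sum(dens)
    order = sorted(range(n), key=dens.__getitem__, reverse=True)
    covered, kept = 0.0, []
    for i in order:
        kept.append(i)
        covered += dens[i] / total
        if covered >= mass:
            break
    return thetas[min(kept)], thetas[max(kept)]

def two_sd_interval(a, b):
    # The rough "mean plus or minus two standard deviations" rule from the post.
    mu = a / (a + b)
    sd = math.sqrt(mu * (1 - mu) / (a + b + 1))
    return max(0.0, mu - 2 * sd), min(1.0, mu + 2 * sd)

print("grid 95% HDI for beta(5,3):", hdi(5, 3))
print("two-sd approximation:      ", two_sd_interval(5, 3))
```

Note the two answers need not match closely; that is exactly the caution about the approximation being off for skewed distributions.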

Let’s go back to the same examples from before and add in this new terminology to see how it works. Suppose we have absolutely no idea what the bias is and we make our prior belief ${\beta(0,0)}$ the flat line. This says that we believe ahead of time that all biases are equally likely. Now we observe ${3}$ heads and ${1}$ tails. Bayesian analysis tells us that our new distribution is ${\beta(3,1)}$. The 95% HDI in this case is approximately 0.49 to 0.84. Thus we can say with 95% certainty that the true bias is in this region. Note that it is NOT a credible hypothesis off of this data to guess that the coin is fair because 0.48 is not in the HDI. This example really illustrates how choosing different thresholds can matter, because if we picked a ROPE of 0.01 rather than 0.02, then that guess would be credible!

Let’s see what happens if we use just an ever so slightly more reasonable prior. We’ll use ${\beta(2,2)}$. This gives us a starting assumption that the coin is probably fair, but it is still very open to whatever the data suggests. In this case our ${3}$ heads and ${1}$ tails tells us our posterior distribution is ${\beta(5,3)}$. In this case the 95% HDI is 0.45 to 0.75. Using the same data we get a slightly narrower interval here, but more importantly we feel much more comfortable with the claim that the coin being fair is still a credible hypothesis.

This brings up a sort of “statistical uncertainty principle.” If we want a ton of certainty, then it forces our interval to get wider and wider. This makes intuitive sense, because if I want to give you a range that I’m 99.9999999% certain the true bias is in, then I better give you practically every possibility. If I want to pinpoint a precise spot for the bias, then I have to give up certainty (unless you’re in an extreme situation where the distribution is a really sharp spike or something). You’ll end up with something like: I can say with 1% certainty that the true bias is between 0.59999999 and 0.6000000001. We’ve locked onto a small range, but we’ve given up certainty. Note the similarity to the Heisenberg uncertainty principle which says the more precisely you know the momentum or position of a particle the less precisely you know the other.
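You can watch this tradeoff happen numerically. The sketch below is my own code and uses equal-tailed credible intervals on a grid rather than true HDIs (a simplification, but the width phenomenon is the same): it computes intervals for ${\beta(5,3)}$ at a few certainty levels, and the width grows as the demanded certainty grows.

```python
def credible_interval(a, b, mass, n=50_000):
    # Equal-tailed credible interval from a grid approximation of the
    # distribution beta(a, b) ~ theta^a (1-theta)^b (the post's convention).
    thetas = [(i + 0.5) / n for i in range(n)]
    dens = [t ** a * (1 - t) ** b for t in thetas]
    total = sum(dens)
    tail = (1 - mass) / 2
    cum, lo, hi = 0.0, 0.0, 1.0
    for t, d in zip(thetas, dens):
        prev = cum
        cum += d / total
        if prev < tail <= cum:
            lo = t          # lower quantile crossed here
        if prev < 1 - tail <= cum:
            hi = t          # upper quantile crossed here
    return lo, hi

# More certainty demanded => wider interval.
for mass in (0.50, 0.95, 0.999):
    lo, hi = credible_interval(5, 3, mass)
    print(f"{mass:.3f}-credible interval: ({lo:.2f}, {hi:.2f}), width {hi - lo:.2f}")
```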

Let’s wrap up by trying to pinpoint exactly where we needed to make choices for this statistical model. The most common objection to Bayesian models is that you can subjectively pick a prior to rig the model to get any answer you want. Hopefully this wrap-up will show that, in the abstract, the objection is essentially correct, but in real-life practice you cannot get away with it.

Step 1 was to write down the likelihood function ${P(\theta | a,b)=\beta(a,b)}$. This was derived directly from the type of data we were collecting and was not a choice. Step 2 was to determine our prior distribution. This was a choice, but a constrained one. In real life statistics you will probably have a lot of prior information that will go into this choice. Recall that the prior encodes both what we believe is likely to be true and how confident we are in that belief. Suppose you make a model to predict who will win an election based off of polling data. You have previous year’s data and that collected data has been tested, so you know how accurate it was! Thus forming your prior based on this information is a well-informed choice. Just because a choice is involved here doesn’t mean you can arbitrarily pick any prior you want to get any conclusion you want.

I can’t reiterate this enough. In our example, if you pick a prior of ${\beta(100,1)}$ with no reason to expect the coin is biased, then we have every right to reject your model as useless. Your prior must be informed and must be justified. If you can’t justify your prior, then you probably don’t have a good model. The choice of prior is a feature, not a bug. If a Bayesian model turns out to be much more accurate than all other models, then it probably came from the fact that prior knowledge was not being ignored. It is frustrating to see opponents of Bayesian statistics use the “arbitrariness of the prior” as a failure when it is exactly the opposite (see the picture at the end of this post for a humorous illustration.)

The last step is to set a ROPE to determine whether or not a particular hypothesis is credible. This merely rules out considering something right on the edge of the 95% HDI from being a credible guess. Admittedly, this step really is pretty arbitrary, but every statistical model has this problem. It isn’t unique to Bayesian statistics, and it isn’t typically a problem in real life. If something is so close to being outside of your HDI, then you’ll probably want more data. For example, if you are a scientist, then you re-run the experiment or you honestly admit that it seems possible to go either way.

## What is Bayesian Statistics: A basic worked example

I did a series on Bayes’ Theorem a while ago and it gave us some nice heuristics on how a rational person ought to update their beliefs as new evidence comes in. The term “Bayesian statistics” gets thrown around a lot these days, so I thought I’d do a whole post just working through a single example in excruciating detail to show what is meant by this. If you understand this example, then you basically understand what Bayesian statistics is.

Problem: We run an experiment of flipping a coin ${N}$ times and record a ${1}$ every time it comes up heads and a ${0}$ every time it comes up tails. This gives us a data set. Using this data set and Bayes’ theorem, we want to figure out whether or not the coin is biased and how confident we are in that assertion.

Let’s get some technical stuff out of the way. This is the least important part to fully understand for this post, but is kind of necessary. Define ${\theta}$ to be the bias towards heads. This just means that if ${\theta=0.5}$, then the coin has no bias and is perfectly fair. If ${\theta=1}$, then the coin will never land on tails. If ${\theta = 0.75}$, then if we flip the coin a huge number of times we will see close to ${3}$ out of every ${4}$ flips land on heads. For notation we’ll let ${y}$ be the trait of whether or not it lands on heads or tails (so it is ${0}$ or ${1}$).

We can encode this information mathematically by saying ${P(y=1|\theta)=\theta}$. In plain English: The probability that the coin lands on heads given that the bias towards heads is ${\theta}$ is ${\theta}$. Likewise, ${P(y=0|\theta)=1-\theta}$. Let’s just chain a bunch of these coin flips together now. Let ${a}$ be the event of seeing ${a}$ heads when flipping the coin ${N}$ times (I know, the double use of ${a}$ is horrifying there but the abuse makes notation easier later).

Since coin flips are independent we just multiply probabilities and hence ${P(a|\theta)=\theta^a(1-\theta)^{N-a}}$. Rather than lug around the total number ${N}$ and have that subtraction, normally people just let ${b}$ be the number of tails and write ${P(a,b |\theta)=\theta^a(1-\theta)^b}$. Let’s just do a quick sanity check to make sure this seems right. Note that if ${a,b\geq 1}$, then as the bias goes to zero the probability goes to zero. This is expected because we observed a heads (${a\geq 1}$), so it is highly unlikely to be totally biased towards tails. Likewise as ${\theta}$ gets near ${1}$ the probability goes to ${0}$, because we observed a tails.
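This sanity check is easy to see numerically; here is a minimal sketch (the helper name is my own):

```python
def likelihood(theta, a, b):
    # P(a, b | theta) = theta^a * (1 - theta)^b, for a heads and b tails.
    return theta ** a * (1 - theta) ** b

# With at least one head and one tail observed, the likelihood dies off
# at both extremes, exactly as the sanity check in the text predicts,
# and it peaks at the observed proportion a/(a+b).
print(likelihood(0.001, 3, 1))  # near zero: a totally tails-biased coin is implausible
print(likelihood(0.999, 3, 1))  # near zero: so is a totally heads-biased one
print(likelihood(0.75, 3, 1))   # the maximum, at a/(a+b) = 0.75
```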

The other special cases are when ${a=0}$ or ${b=0}$, and in these cases we just recover that the probability of getting ${a}$ heads in a row, if the probability of heads is ${\theta}$, is ${\theta^a}$. Of course, the mean of ${\beta (a,b)}$ is ${a/(a+b)}$, the proportion of the number of heads observed. Moving on, we haven’t quite thought of this in the correct way yet, because in our introductory problem we have a fixed data set that we want to analyze. So from now on we should think about ${a}$ and ${b}$ being fixed from the data we observed.

The idea now is that as ${\theta}$ varies through ${[0,1]}$ we have a distribution ${P(a,b|\theta)}$. What we want to do is multiply this by the constant that makes it integrate to ${1}$ so we can think of it as a probability distribution. In fact, it has a name called the beta distribution (caution: the usual form is shifted from what I’m writing), so we’ll just write ${\beta(a,b)}$ for this (the number we multiply by is the inverse of ${B(a,b)=\int_0^1 \theta^a(1-\theta)^b d\theta}$ called the (shifted) beta function).
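For whole numbers ${a}$ and ${b}$ this shifted beta function has the closed form ${B(a,b)=\frac{a!\,b!}{(a+b+1)!}}$ (a standard fact about the Beta function, restated in the shifted convention), which a crude numerical integration confirms; this sketch is my own, not anything from the post:

```python
import math

def shifted_beta_function(a, b, n=100_000):
    # Midpoint-rule integration of theta^a * (1 - theta)^b over [0, 1].
    total = 0.0
    for i in range(n):
        t = (i + 0.5) / n
        total += t ** a * (1 - t) ** b
    return total / n

a, b = 3, 1
numeric = shifted_beta_function(a, b)
exact = math.factorial(a) * math.factorial(b) / math.factorial(a + b + 1)
print(numeric, exact)  # both approximately 3! * 1! / 5! = 0.05
```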

This might seem unnecessarily complicated to start thinking of this as a probability distribution in ${\theta}$, but it is actually exactly what we are looking for. Consider the following three examples:

The red one says if we observe ${2}$ heads and ${8}$ tails, then the probability that the coin has a bias towards tails is greater. The mean happens at ${0.20}$, but because we don’t have a lot of data there is still a pretty high probability of the true bias lying elsewhere. The middle one says if we observe 5 heads and 5 tails, then the most probable thing is that the bias is ${0.5}$, but again there is still a lot of room for error. If we do a ton of trials to get enough data to be more confident in our guess, then we see something like:

Already at observing 50 heads and 50 tails we can say with 95% confidence that the true bias lies between 0.40 and 0.60. Alright, you might be objecting at this point that this is just usual statistics, where the heck is Bayes’ Theorem? You’d be right. Bayes’ Theorem comes in because we aren’t building our statistical model in a vacuum. We have prior beliefs about what the bias is.

Let’s just write down Bayes’ Theorem in this case. We want to know the probability of the bias ${\theta}$ being some number given our observations in our data. We use the “continuous form” of Bayes’ Theorem:

$\displaystyle P(\theta|a,b)=\frac{P(a,b|\theta)P(\theta)}{\int_0^1 P(a,b|\theta)P(\theta)d\theta}$

I’m trying to give you a feel for Bayesian statistics, so I won’t work out in detail the simplification of this. Just note that the “posterior probability” (the left hand side of the equation), i.e. the distribution we get after taking into account our data, is the likelihood times our prior beliefs divided by the evidence. Now if you use a uniform prior, the denominator is just the definition of ${B(a,b)}$, and if you work everything out it turns out to be another beta distribution!

If our prior belief is that the bias has distribution ${\beta(x,y)}$, then if our data has ${a}$ heads and ${b}$ tails we get ${P(\theta|a,b)=\beta(a+x, b+y)}$. The way we update our beliefs based on evidence in this model is incredibly simple. Now I want to sanity check that this makes sense again. Suppose we have absolutely no idea what the bias is and we make our prior belief ${\beta(0,0)}$ the flat line. This says that we believe ahead of time that all biases are equally likely.
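The update rule is really just exponents adding: posterior ${\propto}$ likelihood ${\times}$ prior, and ${\theta^a(1-\theta)^b\cdot\theta^x(1-\theta)^y=\theta^{a+x}(1-\theta)^{b+y}}$. A throwaway sketch of mine, in the post’s shifted convention, checks this pointwise:

```python
def beta_unnorm(theta, a, b):
    # The post's shifted convention: beta(a, b) ~ theta^a * (1 - theta)^b.
    return theta ** a * (1 - theta) ** b

x, y = 2, 2   # prior beta(x, y)
a, b = 3, 1   # observed data: a heads, b tails

# likelihood * prior agrees with beta(a+x, b+y) up to normalization,
# at every theta.
for theta in (0.1, 0.37, 0.5, 0.81):
    posterior = beta_unnorm(theta, a, b) * beta_unnorm(theta, x, y)
    assert abs(posterior - beta_unnorm(theta, a + x, b + y)) < 1e-12
print("posterior matches beta(a+x, b+y) pointwise")
```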

Now we observe ${3}$ heads and ${1}$ tails. Bayesian analysis tells us that our new (posterior probability) distribution is ${\beta(3,1)}$:

Yikes! We don’t have a lot of certainty, but it looks like the bias is heavily towards heads. Danger: This is because we used a terrible prior. This is the real world so it isn’t reasonable to think that a bias of ${0.99}$ is just as likely as ${0.45}$. Let’s see what happens if we use just an ever so slightly more modest prior. We’ll use ${\beta(2,2)}$. This puts our assumption on it being most likely close to ${0.5}$, but it is still very open to whatever the data suggests. In this case our ${3}$ heads and ${1}$ tails tells us our updated belief is ${\beta(5,3)}$:

Ah. Much better. We see a slight bias coming from the fact that we observed ${3}$ heads and ${1}$ tails and these can’t totally be ignored, but our prior belief tames how much we let this sway our new beliefs. This is what makes Bayesian statistics so great. If we have tons of prior evidence of a hypothesis, then observing a few outliers shouldn’t make us change our minds. On the other hand, the setup allows for us to change our minds even if we are 99% certain on something as long as sufficient evidence is given. This is the mantra: extraordinary claims require extraordinary evidence.

Not only would a ton of evidence be able to persuade us that the coin bias is ${0.90}$, but we should need a ton of evidence. This is one of the shortcomings of non-Bayesian analysis. It would be much easier to become convinced of such a bias if we didn’t have a lot of data and we accidentally sampled some outliers.

Anyway. Now you should have an idea of Bayesian statistics. In fact, if you understood this example, then most of the rest is just adding parameters and using other distributions, so you actually have a really good idea of what is meant by that term now.

## In Defense of Gaming

It’s been over a month, so I decided to do a post that I’ve had in the bag for a while but that I don’t think adds anything to the discussion. This is what happens when you are taking classes, teaching classes, writing things up, and applying for jobs I guess.

Are video games art? What a bizarre question. It has been debated through the years, but I’m not sure there is anyone out there that has seriously thought about the question and is willing to defend that they are not. The debate seems over and the conclusion is that video games are art.

The one notable opposition is Roger Ebert, but his position boils down to a “no true Scotsman fallacy.” It is such a classic example that it should probably just start being used to illustrate what the fallacy is. He says games cannot be art. Then when shown a game that he admits is art he says, “But that isn’t a real game.” That would be like arguing novels cannot be art by just declaring that any novel that could be considered art is not a real novel. It is a silly argument that doesn’t need to be taken seriously.

First, we should notice that there is a “type error” (as a programmer would say) in the original question. No one would think “Are books art?” is a properly phrased question. What does that mean? If you find one book that is not art, then is the answer no? Do you merely need to give one book that is art to answer yes? The answer isn’t well-defined because “book” encompasses a whole class of objects: some of which are art and some of which are not.

For our purposes we’ll say a medium (like video games) “is art” if an artist can consistently use the medium to produce something that can be broadly recognized as art. This brings us to the difficult question of how to determine if something can be broadly recognized as art. Some things that come to mind are aesthetics/beauty, the ability to make a human being feel something, the ability to make someone think deeply about important questions, originality, and on and on we could go. Any given work of art could be missing any or all of these qualities, but if something exhibits enough of these qualities, then we would probably have no problem calling it art.

In order to argue that games can be works of art, I’ll take two examples that are relatively recent from the “indie game” community. These are both games in a sense that even Ebert could not deny. I’ll stay away from controversial examples like Dear Esther or Proteus (which are undeniably works of art but more questionable about being games).

The first is Bastion. The art direction and world that has been constructed is a staggering work of beauty on its own. Remove everything about this game except just exploring this universe and I think you would find many people totally engrossed in the experience:

We already have check mark one down. But there’s more! The music is fantastic as well. But let’s get to what really sets this game apart as a work of art. The story is fantastic and is mostly told with great voice acting through a narrator. I won’t spoil the ending in its totality, but I’m about to give away a major plot point near the end.

Your good friend betrays you and comes close to destroying everything (literally the whole world) in the middle of the game. It hurts. Then near the end he is going to die and you have the choice to save him. The game branches and you can either keep your weapons and safely fight your way to the end of the game, or you can carry this traitor through a dangerous area possibly sacrificing your own life for him.

Books and movies can’t do this. You have to make this choice and it affects how the story progresses. It reveals to you what type of human you are. You have to live with the consequences of this choice. If you save him, then you slowly walk through an area where your enemies shoot you from afar and there is nothing you can do. When they realize what you’re doing they stop in awe and just solemnly let you pass. The visuals plus the music plus the dramatic climax of this moment brings many people to tears.

I know this because you can just search discussion boards on the game. Gaming discussion boards are notorious for being misogynistic and full of masculine one-upmanship. No one makes fun of the people who say it brought them to tears and usually there will be a bunch of other people admitting the same. If this sort of emotional connection isn’t art, then I don’t know what is. Not only that, but this type of connection can only really happen through games where you are wholly invested because you’ve made these decisions.

Maybe Bastion isn’t your thing, because it is a “gamer’s game” with a bit of a barrier to entry since it involves experience points, weapons, items, leveling up, and real-time fighting of monsters and bosses. That could be a bit much for the uninitiated. We’ll move on to a game that every person, regardless of gaming experience, can play and really see how elegantly simple an “art game” can be.

Thomas Was Alone is extremely simple. Thomas is a rectangle. You move him to a rectangular door. End of level. The game is in a genre called a “puzzle platformer.” As the levels progress you get different sized rectangles to move and moving and jumping in various orders will help you get to the end. This is the “puzzle” aspect, because you have to figure out the correct order to do things otherwise you’ll get stuck.

Why is this art? Well, why is writing a book about some animals on a farm art? Because it isn’t really about animals on a farm. The same is true here. The game is a huge metaphor. A deeply moving one at that. I consistently had to stop playing at parts because of how overwhelmed with the concept I became when I allowed myself to think about it.

Just like Bastion, this game is truly magnificent visually. The style is opposite. It has minimalism and simplicity as the guiding aesthetic virtue:

The music is perfect for the mood, and the narration which tells the story is beyond superb. You grow attached to these rectangles which have such nuanced personalities. What is the metaphor? Well, there are all these obstacles in your way, and you can’t get past them without working together. The whole idea is that there are seemingly impossible obstacles in life, but when humans cooperate and work together they can get past them.

The thing that makes the game so moving at parts is that your rectangle friends are so humanly flawed. They get upset at each other for such petty reasons. They have crushes on each other. They hate each other. But in the end they overcome those differences to work together and accomplish great things. If you haven’t experienced it, then this probably sounds totally absurd.

Again from discussion forums, I quote, “I just finished the game and a group of coloured quadrilaterals made me cry.” Or “Everything about this game makes me feel incredible. I feel as if I can achieve things I could never think of being. This is the best thing I could have experienced, and it’s worth everything…This game makes you love and cry over shapes.” When people have these reactions, that is without question the definition of art.

I think we’ve firmly established that games can be art. I thought I’d just bring up a few cultural tidbits right at the end here. Some famous art galleries across the world have started to recognize the importance of including works of art in their collections that happen to be games. MoMA (the Museum of Modern Art in NY) currently has 14 games in its collection. Paris had an exhibit that included Fez. The Smithsonian American Art Museum had one last year. There have been many others too.

I’ll try to wrap up now. If you’re the type of person that reads literary novels and goes to the symphony because you think experiencing art is an important and enriching experience, then you probably also write off video games as a mindless waste of time. This is partially warranted because so many of the most popular games today are mindless wastes of time (just like most popular music and movies are too).

I hope that after this maybe your mind has changed a little. If you are willing to make time in your schedule to read a book or go to an art gallery, then I’d argue that you should also be willing to make time in your schedule to experience great games. The medium has all the same artistic qualities as a great film, but has added value given by the interactivity you have with the medium.

## Composer Hidden Gem 1: Egon Wellesz

I said I wanted to do more “hidden gem” posts, and I’m totally stuck on coming up with a good math series to do (suggestions?). Egon Wellesz was an early twentieth century composer whose music seems to have been totally forgotten. He was also a musicologist, and I think that work survives as quite important (don’t quote me on that; I know nothing of the field).

It is a tragedy no one has heard of him. He was one of many students of the famous Arnold Schoenberg, who was essentially the father of twelve-tone composition. I’ve tried to look stuff up on Wellesz with little success. So bear with me as I expound my theory on why he isn’t better known based on absolutely no evidence or facts.

A lot (though not all) of Wellesz’s earlier works turn away from the atonality of “serious” music of the time. Look at Schoenberg’s students who are household names: Berg, Webern, Cage, …. These people all went on to make names for themselves by pushing the avant-garde envelope further through new compositional techniques involving twelve-tone methods or by moving past the use of 12 notes altogether.

Wellesz on the other hand decided to return to something that might be called neo-classicism. Works such as String Quartet No. 3 (op. 25) and Symphony No. 3 (op. 68) have a very strong sense of melody. These pieces are almost programmatic. The mood and melody are fantastically developed (he does still use the classical sonata form for the first movements, so what do you expect?), so much so that one could almost imagine these works as scores to a movie.

As one might imagine, the academic atmosphere of the time probably didn’t take these compositions very seriously (again, my totally made up theory which is mostly contradicted by the fame of people such as Ralph Vaughan Williams). My guess is that this is why he isn’t as famous as Schoenberg’s other students who continued along the post-tonal path.

I have to be very careful here to not give the impression that he is purely a neo-classicist though. A lot of his later works, such as the sixth and seventh symphonies, draw strongly on his atonal roots. Just by listening I can’t tell if it is true serialized twelve-tone composition, but it sounds like it. Somewhat surprisingly, these later symphonies still have that earlier beautiful development. Instead of melodic development it is motivic. Despite their atonal bent, these later symphonies still have a grandiose quality to them. It sounds as if Mahler wrote atonal symphonies. These are well worth checking out (I think I like the sixth the best).

Here’s what I like so much about his music. He walks a fine line, striking a great balance between popular, easily accessible “instant gratification” music and very academic music. His music doesn’t sound like Hindemith or Bartok, but I think he had a similar philosophy to theirs. He takes traditional compositional techniques such as strict counterpoint, but then expands them into the modern realm. He isn’t afraid to use all sorts of modern, interesting tonality and melody while employing these old techniques.

I think too often neo-classicists use these techniques exactly as they were intended and ignore all modern advances in music, while on the other hand serialists (or some other post-tonal school) totally ignore the fact that older techniques can be used effectively in a modern setting. Like I said, Wellesz isn’t the only one to do this, but he does it particularly well in my opinion.

He is also very good at tugging on those heartstrings. I think this is partly due to his great orchestration skills (I already compared him to Mahler), but it is also due to his melodic sensibility, which I think is an under-appreciated skill these days. I hear a lot of Vaughan Williams in his tonal slow movements (they were contemporaries, and Wellesz spent the latter part of his career in England, so it is hard to say who influenced whom here).

His melodic sensibility makes for an interesting effect in his less tonal works: some slow movements are still beautiful and emotional even though none of the traditional harmonic motion is being used to create those effects.

Anyway, hopefully that is enough to get you interested in this less well-known composer. Here’s a sample:

## Some Thoughts on Lethem’s The Ecstasy of Influence

I recently learned that some Barnes and Noble has an “essay” section. This will be my downfall. I was glancing through it and stumbled upon something that sounded fascinating. If you’ve been reading this blog for any significant amount of time, then you’ll know that influence is a topic that is endlessly fascinating to me.

I’ve talked about the importance of expanding your influences in Literature, Originality, Influence, and the Anxiety Thereof. Of course, I referenced Barth’s essay “The Literature of Exhaustion” in it and Bloom’s The Anxiety of Influence. I bring these two things up a lot, actually. Even this horrible post back in 2008 shows this topic has been kicking around for a while.

Anyway, so I was glancing through this essay section, and saw a book titled The Ecstasy of Influence. How could I not check it out? This book is a must read for anyone as obsessed with this topic as I am. There isn’t much new, but it feels good to read about a successful author grappling with these issues in a real life context. If these issues are boring to you, then stay far away from this book (unless you’re having trouble falling asleep or something).

A truly bizarre thing happened while reading the book. The new Harper’s came (I wrote my last post on the Nicholson Baker piece from it), and another major piece in the magazine was Franzen’s essay “A Different Kind of Father.” It is basically about dwelling on influences and trying to determine who his major influences (i.e. literary fathers) would be. The coincidence turned a little creepy when he started talking about Pynchon and literary coincidences that morph into conspiracy theories, just as I was thinking what a strange coincidence it was that this article appeared right when I was starting the Lethem book on the same topic.

Back to The Ecstasy of Influence (clearly a play on Bloom’s title The Anxiety of Influence). It is broken up into parts based on themes. Each part has a few chapters which are either short memoirs, essays, or even short stories with analysis. The fact that it isn’t all essay keeps the flow going nicely. I was really excited when in the Preface he had already mentioned John Ashbery, John Barth, David Foster Wallace, and Don DeLillo. These are all people I write about a lot. The book is practically my blog if you throw out the math. OK. Not really. But it was sort of feeling that way from the Preface.

The second chapter is all about postmodernism in SF (speculative fiction). I was delighted to find that Lethem makes almost exactly the same argument in one of the essays that I made in the first post I referenced above: roughly, that people writing SF should be familiar with modern-day trends like postmodernism so that they can incorporate those techniques into their works to create much more effective literature. Or maybe not. Your choice. But if you aren’t familiar with these techniques, then you can’t make the choice; you’re limited by what you know. He also does an interesting deconstruction of Philip K. Dick and his influences (warning: it is an essay from his youth and he is sort of embarrassed by it now).

The flashy highlight of the book so far was the title essay. It first appeared in Harper’s (is there a question in anyone’s mind why I subscribe to this magazine anymore?), and although it focuses much more on the plagiarism aspects of influence it is still incredibly well-done. You learn at the end that the entire essay is made up of quotes from other people that he tied together to make one coherent (original?) essay.

To wrap up, the book is great so far. It brings up all these difficult issues in all sorts of ways. Sometimes he uses fun anecdotes and other times serious essays. They are always very readable. The main issues addressed so far (if you haven’t caught on yet) have to do with the following:

What is meant by originality? To be taken seriously as an artist do your influences have to be (in)visible? If you copy someone else too much are you unoriginal? How possible is it to cut ties from all people before you? Is this even a desirable thing to try? Should you be embarrassed or flattered when people compare your work to someone you admire? Where is the line between imitation and plagiarism? And so on.

My favorite quote so far is an interesting definition of postmodernism (in literature). Lethem is talking about Eliot’s The Waste Land and how the excessive notes in it seem to define modernism in terms of its anxiety about contamination by influence. “Taken from this angle, what exactly is postmodernism, except modernism without the anxiety?” Of course, Lethem was just quoting someone else at that point …

## Thoughts on Nicholson Baker’s Case Against Algebra II

The debate over standards in high school math has been going on for a very long time, but things seemed to come to a pretty nasty head last year when the New York Times ran the article Is Algebra Necessary? Bloggers and educators were outraged on both sides and started throwing mud. In the most recent issue of Harper’s (Sept 2013), Nicholson Baker wrote an essay basically reiterating the arguments from the NYT’s piece and responding to some of the criticisms.

I’ve been trying to stay out of this, because I honestly have no idea what high school is designed to do. The real argument here doesn’t seem to be whether or not algebra is “useful in the real world,” but rather about whether or not we should force students to learn things in high school that they are not interested in. Is the purpose of high school to teach students the basics in a broad range of topics so that they have some fundamental skills that will allow them to choose a career from there? Is the purpose to allow students to learn topics that are of interest to them? Something else?

I don’t know, and it is impossible to participate in this debate without clearly defining first what you think the purpose of making students go to high school is (of course, the arguments are muddied by the fact that no one actually defines this first).

Here is Baker’s main argument in a nutshell (he is a fantastic writer, so you should read the full thing yourself if this interests you). Algebra (II) is unnecessary for most people, i.e. the 70% of the population that do not go into a STEM field. It causes excessive stress and failure for basically no reason. Why not just have some survey course in ninth grade where some great ideas of math throughout history are presented and then have all future math courses be electives?

I assume that, for consistency, this means the following: since English, foreign languages, history, and all the other subjects taught in high school are also not directly applicable to most people’s daily lives, you’d do ninth grade as a taste of each subject through survey courses, and then literally everything would be an elective afterwards.

Honestly, I agree with Baker that this would probably make high school a lot more enjoyable and useful for everyone. A lot more learning would take place as well. It just boils back down to what you think the purpose of high school should be, and since I don’t know, I can’t say whether or not this is what should be done.

Here are two thoughts I had that don’t seem to be raised in the main discussion.

1. How do you know whether or not taking algebra will be useful to you? Having core standards in some sense protects the high school student, who isn’t equipped to make this type of decision, from making a really bad one. I’ll just give an anecdote about my own experience as someone who really loved all forms of learning, went into math, and still made a really bad decision when given a choice of electives.

When I was going into my senior year of high school, I knew I wanted to be a composer. I knew this so confidently that despite being advised against it, I decided to not take physics since it was an elective. My reasoning was that I would never, ever need it for my future career as a composer. Let’s ignore the fact that I didn’t realize that understanding the physics of sound is an extremely important skill for a composer to have and so made a poor decision for that reason. Let’s assume that physics really was useless for my intended career.

After my first year of undergrad I switched to a math major. I really regretted not taking physics at that point, and I ended up loving physics in college so much that I minored in it. Here’s the point: almost no one in high school knows what they are going to do. So how in the world are they going to know whether algebra is necessary for their career? Even if they do know what they are going to do, they could still mistakenly think it is unnecessary.

My guess is that if we switch to a system where practically everything is an elective, then when people get to college and their interests change, they won’t have the basic skills to succeed. They’ll have to fill in the missing knowledge on their own, because math departments definitely cannot offer more remedial classes; we have so many students and classes as it is that we can barely find enough people to teach them all.

2. This seems like much ado about nothing. What I’m about to say might seem harsh, but algebra II is not that hard. You don’t have to be good at it. You don’t have to like it. But it isn’t a good sign if you can’t at the very least pass it. Baker himself points out that Cardano was able to do this stuff back in the 1500s, and since then we’ve come up with much easier and better ways to think about it. The abstraction level is just not that high; we’re not talking about quantum mechanics or something. Students in other cultures don’t seem to struggle in the same way, and I don’t think we’re inherently dumber than they are.

Depending on where you look, 30%–50% of students fail algebra II. Let’s say it is closer to 30%, because that statistic does not take into account the large number of lazy/rebellious/apathetic/whatever students who can easily handle the abstraction but just don’t put in any work and fail for that reason. I’d imagine the number of people who try really hard and still fail is pretty low (maybe 20% or less? I’m just making numbers up at this point, but probably way less if you only count people who never pass it).

Is it too insensitive and politically incorrect for me to say that someone who can’t handle this level of abstraction probably isn’t cut out for college in any subject? Is college for everyone? I can’t remember what the proper response is to this anymore. What if the number who never pass is around 5%? Is saying this 5% isn’t cut out for college still too much? Sure, give them a high school diploma if they can’t do it, but college may not be the best fit. It seems a good litmus test.

What major won’t require abstraction at least at the level of algebra II? STEM is out. English? Definitely out, unless you somehow avoid all literary theory. Business? Most business degrees require some form of calculus. Music? I hope you can somehow get out of your post-tonal theory classes. History? There has been a recent surge of Bayesian methods in historical research.

I guess the point is that if a high school diploma is meant to indicate some level of readiness for college, then algebra is probably a good indicator. This does not mean that you will ever use it; it just shows that you have some ability to do abstract things. I’m not saying it is the only way to test this, but it is probably a pretty good one.

Again, if a high school diploma isn’t meant to indicate readiness for college, then who cares what you do?

*Cringes and waits for backlash*