Year of Short Fiction Part 6: Cosmicomics

I’ve sort of been dreading this one, but it’s the only thing remaining on my short fiction list that I own. Three years ago I wrote up my interpretation of Italo Calvino’s If on a winter’s night a traveler. Calvino can be strange and highly symbolic, but that book’s meaning jumped out at me with little effort. He had constructed a condensed history of critical theory through the story.

I had a vague familiarity with Cosmicomics, so I knew it would be harder. The stories all feature or are told by a character named Qfwfq. Each story starts with a tidbit of science such as:

Situated in the external zone of the Milky Way, the Sun takes about two hundred million years to make a complete revolution of the galaxy.

The story that follows is usually related to this somehow. The collection as a whole can be read as a symbolic retelling of the history of the universe. Calvino has taken real science and created mythologies that actually fit the data.

But it’s more than that. The stories often have a moral to them or a symbolic quality. They aren’t just fictionalizations of the events of the early universe. They’re almost parables like classic mythology. He’s achieved something odd with these.

The collection came out in 1965, fairly early in Calvino’s career, and well before the highly experimental If on a winter’s night a traveler. Calvino believed realism to be dead, and these stories mark his foray into a new type of fiction. He held on to pieces of realism but incorporated highly fantastical elements.

That’s enough of an overview; let’s dig into my favorite story to see these elements at work. “All at One Point” is a story about the Big Bang. More specifically, it’s about the time when the universe existed in a single point.

The beginning of the story comically plays with the idea that “we were all there.” On a scientific level, this is obviously true. Every atom in the universe existed in the singular point “before” the Big Bang. This includes every atom in our bodies, so we were physically there.

Calvino cleverly takes this statement to its extreme form and personifies us as actually existing at one point. The narrator, Qfwfq, says, “…having somebody unpleasant like Mr Pber^t Pber^t underfoot all the time is the most irritating thing.”

The story spends quite a bit of time in a Flatland-type thought experiment. Through humorous interactions, Calvino teases apart a lot of odd ideas about what it actually would mean to collapse the universe to a single point. For example, one couldn’t count how many people were there, because that would require pulling apart, no matter how slightly.

One family, the Z’zu, got labelled “immigrants.” This, of course, makes no sense, because there is no such thing as outside or inside the point. There is no such thing as before or after the point. Time only started at the Big Bang. So the family couldn’t have come from somewhere else.

The humor in this surface-level reading already makes the story worth it, and I won’t spoil any of the other awkward moments these people share from all occupying the same point.

Then the story turns its attention to Mrs Ph(i)Nk_o. She is one of the Z’zu, the family everyone hated. But she’s different. She is pure happiness and joy, and no one can say anything bad about her.

In an act of epic generosity, despite what people say about her family, she says:

Oh, if I only had some room, how I’d like to make some tagliatelle for you boys!

That’s what causes the Big Bang. The universe is made and expands and the Sun and planets and everything. It all happened because of a true act of selflessness and love. The phrasing of the final paragraph is very moving. I won’t quote it here, because I think it must be read in context to be appreciated.

The theme, when condensed to a pithy phrase, is something like “love can make universes.” It sounds really cliche and cheesy, and I think this is one of the things that makes these stories so brilliant. In the moment of reading, they feel profound and fresh.

Calvino’s use of vivid space imagery takes you on a grand journey. These cliche themes are the same that one can find in all the great ancient stories. They only feel tired when done in modern stories. By creating his own mythology, Calvino is able to revisit these sorts of themes without embarrassment.

For the Year of Short Fiction, I want to return to the question: why short? In other words, does great short fiction have a genuine uniqueness to it, or is it essentially the same as a novel, just shorter?

I think here we can definitively say that this type of writing can only work in short stories. Even expanding one of these to a novella length would be too much. These stories each revolve around a conceit and a theme. The conceit would grow tiresome if done for too long. I cannot imagine a novella of jokes about everyone existing on top of each other. They would lose their impact.

What excites me about Cosmicomics is that it’s the first thing I’ve read this year about which I feel this way. I could imagine the novellas I’ve read, and even Cthulhu, working as full novels. They wouldn’t be as tightly written, but they’d still work. The very nature of the Cosmicomics stories is that they must be short. I’m glad to have finally found this.

I should stipulate, though, that one can read the entire collection as a novel: an autobiography of Qfwfq’s life and a fictionalization of the history of the universe. This is also an interesting and unique aspect, because almost every short story collection I can think of has separate, unrelated stories. The full collection should be read together to get the best experience.

Become a Patron!

I’ve come to a crossroads recently.

I write a blog post every week. It takes time. The last one was close to 2,000 words and required reading a book. For the past three years I’ve been writing full time, and blogging has become a burden that cuts into that work with no monetary reward.

This blog is now over nine years old, and I’ve done nothing to monetize it. I think this is mostly a good thing. I do not and will not run any sort of advertisements. Even upon the release of my first book, I only did a brief mention and then no promotion afterward (and as far as I can tell, this converted to literally 0 sales).

I want this to be about the blog content. I do not want it to turn into some secret ad campaign to sell my work. I can think of many authors who have done this, and I ended up unsubscribing from them.

This brings me to the point. Putting this much work into something is not really sustainable anymore without some sort of support, so I’ve started a Patreon page. As you’ll see, my initial goal is quite modest and will barely cover the expenses to run my blog and website. But without any support, I will slowly phase out regular writing here.

If this concept is new to you, Patreon is a site dedicated to supporting creative work. Patrons pledge money to support people creating content they like. It can be as little as $1 a month (or, as many podcasters say, “less than a coffee a month”), and in return you not only help keep the site running, you’ll receive bonus content as well.

Because of the scattered nature of my posts, I know a lot of you are probably hesitant to support, because you might not get content of interest for the month. Some of you like the math and tune out for the writing advice. Some of you like the critical analysis of philosophy and wish the articles on game mechanics didn’t exist.

For consistency, the vast majority of posts from now on will be something that would be tagged “literature.” Everything else will appear once a month or less and probably never two months in a row (i.e. six per year spread out evenly). This “literature” tag includes, but is not limited to, most posts on philosophy that touch on narrative or language somehow, editing rules, writing advice, book reviews, story structure analysis, examining pro’s prose, movie reviews, and so on.

Again, the core original vision for the blog included game and music and math posts, but these will be intentionally fewer now. If you check the past few years, I basically already did this anyway, but this way you know what you’re signing up for.

I think people are drawn to my literature analysis because I’m in a unique position. This month I’m about to submit my fifth romance novel under a pseudonym. This is the “commercial” work I do for money, and it’s going reasonably well. I’ve come to understand the ins and outs of genre fiction through this experience, and it has been a valuable part of learning the craft of writing for me.

My main work under my real name is much more literary. I’ve put out one novel of literary fiction. Next month I’ll put out my second “real” novel, which is firmly in the fantasy genre but hopefully doesn’t give up high-quality prose.

These two opposite experiences have given me an eye for what makes story work and what makes prose work. All over this blog I’ve shown that I love experimental writing, but I’ve also been one of the few people to unapologetically call out BS where I see it.

As you can imagine, writing several genre novels and a “real” novel every year makes it tough to justify this weekly blog for the fun of it.

If I haven’t convinced you that the quality here is worth supporting, I’ll give you one last tidbit. I get to see incoming links thanks to WordPress, so I know that more than one graduate seminar and MFA program has linked to various posts I’ve made on critical theory and difficult literature. Since I’m not in those classes, I can’t be sure of the purpose, but graduate programs tend to only suggest reading things that are worth reading. There just isn’t enough time for anything else.

I know, I know. Print is dead. You’d rather support people making podcasts or videos, but writing is the easiest way to get my ideas across. I listen to plenty of podcasts on writing, but none of them get to dig into things like prose style. The format isn’t conducive to it. One needs to see the text under analysis to really get the commentary on it.

Don’t panic. I won’t decrease blog production through the end of 2017, but I’m setting an initial goal of $100 per month. We’ll go from there, because even that might not be a sustainable level long-term. If it isn’t met, I’ll have to adjust accordingly. It’s just one of those unfortunate business decisions. Sometimes firing someone is the right move, even if they’re your friend.

I’ve set up a bunch of supporter rewards, and I think anyone interested in the blog will find them well worth it. I’m being far more generous than most Patreon pages making similar content. Check out the page for details. The rewards include video of me editing a current project with live commentary, so you can see me put into practice what I talk about; extra fiction I write for free; free copies of my novels; extra “Examining Pro’s Prose” articles; and more!

I hope you find the content here worth supporting (I’m bracing myself for the humiliation of getting $2 a month and knowing it’s from my parents). If you don’t feel you can support the blog, feel free to continue reading and commenting for free. The community here has always been excellent.

Mathematical Reason for Uncertainty in Quantum Mechanics

Today I’d like to give a fairly simple account of why Uncertainty Principles exist in quantum mechanics. I thought I already did this post, but I can’t find it now. I often see in movies and sci-fi books (not to mention real-life discussions) a misunderstanding about what uncertainty means. Recall the classic form that says that we cannot know the exact momentum and position of a particle simultaneously.

First, I like this phrasing a bit better than a common alternative: we cannot measure perfectly the momentum and position simultaneously. Although I suppose this is technically true, it has a different flavor. It makes it sound like we don’t have good enough measuring equipment. Maybe in a hundred years our tools will get better, and we will be able to make more precise measurements to do both at once. This is wrong and completely misunderstands the principle.

Even from a theoretical perspective, we cannot “know.” There are issues with that word as well. In some sense, the uncertainty principle should say that it makes no sense to ask for the momentum and position of a particle (although this again is misleading because we know the precise amount of uncertainty in attempting to do this).

It is like asking: Is blue hard or is blue soft? It doesn’t make sense to ask for the hardness property of a color. To drive the point home, it is even a mathematical impossibility, not just some physical one. You cannot ever write down an equation (a wavefunction for a particle) that has a precise momentum and position at the same time.

Here’s the formalism that lets this fall out easily. To each observable quantity (for example, momentum or position) there corresponds a Hermitian operator. If you haven’t seen this before, then don’t worry. The only fact we need is that “knowing” or “measuring” or “being in” a certain observable state corresponds to the wavefunction of the particle being an eigenfunction of this operator.

Suppose we have two operators {A} and {B} corresponding to observable quantities {a} and {b}, and suppose it makes sense to say that in the state {\Psi} we can simultaneously measure properties {a} and {b}. This means there are two numbers {\lambda_1} and {\lambda_2} such that {A\Psi = \lambda_1 \Psi} and {B\Psi = \lambda_2 \Psi}. That is the definition of being an eigenfunction.

This means that the commutator applied to {\Psi} has the property

{[A,B]\Psi = AB\Psi - BA\Psi = A\lambda_2 \Psi - B \lambda_1 \Psi = \lambda_2\lambda_1 \Psi - \lambda_1\lambda_2 \Psi = 0}.

Mathematically speaking, a particle that is in a state for which it makes sense to talk about having two definite observable quantities attached must be described by a wavefunction in the kernel of the commutator. Therefore, it never makes sense to ask for both if the commutator has no kernel. This is our proof. All we must do is compute the commutator of the momentum and position operator and see that it has no kernel (except for the 0 function which doesn’t correspond to a legitimate wavefunction).

You could check Wikipedia or something, but the position operator is given by {\widehat{x}f= xf} and the momentum operator by {\widehat{p}f=-i\hbar f_x}.

Thus,

\displaystyle \begin{array}{rcl} [\widehat{x}, \widehat{p}]f & = & -ix\hbar f_x + i\hbar \frac{d}{dx}(xf) \\ & = & -i\hbar (xf_x - f - xf_x) \\ & = & i\hbar f \end{array}

This shows that the commutator is a nonzero constant times the identity operator. It has no kernel, and therefore it makes no sense to ask for a definite position and momentum of a particle simultaneously. There isn’t even some crazy, abstract, purely theoretical construction that can have that property. This also shows that we can get all sorts of other uncertainty principles by checking the commutators of other pairs of operators.
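If you want to see this concretely without doing the calculus by hand, here is a small symbolic check (a sketch using sympy; my addition, not part of the original derivation). It applies {\widehat{x}\widehat{p}-\widehat{p}\widehat{x}} to a generic test function and watches {i\hbar f} fall out.

import sympy as sp

x, hbar = sp.symbols('x hbar', real=True)
f = sp.Function('f')(x)

def X(g):
    # Position operator: multiply by x
    return x * g

def P(g):
    # Momentum operator: -i*hbar times the derivative
    return -sp.I * hbar * sp.diff(g, x)

# The commutator [x, p] applied to the test function f
commutator = sp.simplify(X(P(f)) - P(X(f)))
print(commutator)  # I*hbar*f(x), i.e. i*hbar times the identity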

Statistical Oddities 5: Sequential Testing

Our next decision theory post is going to be on how to rephrase hypothesis testing in terms of Bayesian decision theory. We already saw in our last statistical oddities post that {p}-values can cause some problems if you are not careful. This oddity makes the situation even worse. We’ll show that if you use a classical null hypothesis significance test (NHST), even at {p=0.05}, and your experimental design is to check for significance after each new sample is added, then as the sample size increases you will falsely reject the hypothesis more and more often.

I’ll reiterate that this is more of an experimental design flaw than a statistical problem, so a careful statistician will not run into it. On the other hand, lots of scientists are not careful statisticians and do make these mistakes. These mistakes don’t exist in the Bayesian framework (advertisement for the next post). I also want to reiterate that the oddity is not that you sometimes falsely reject hypotheses (this is obviously going to happen, since we are dealing with a degree of randomness). The oddity is that as the sample size grows, your false rejection rate will tend to 100%! Usually people think that a higher sample size will protect them, but in this case it exacerbates the problem.

To avoid offending people, let’s assume you are a freshman in college and you go to your very first physics lab. Of course, the lab is to drop a ball. You measure how long it takes to fall from various heights. You want to determine whether or not the acceleration due to gravity is really 9.8 (in {m/s^2}). You took a statistics class in high school, so you recall that you can run a NHST at the {p=0.05} level and impress your professor with this knowledge. Unfortunately, you haven’t quite grasped experimental methodology, so you rerun your NHST after each trial of dropping the ball.

When you see {p<0.05} you get excited because you can safely reject the hypothesis! This happens and you turn in a lab write-up claiming that with greater than {95\%} certainty the true acceleration due to gravity is NOT {9.8}. Let's make the nicest assumptions possible and see that it was still likely for you to reach that conclusion. Assume {g=9.8} exactly. Also, assume that your measurements are pretty good and hence form a normal distribution with mean {9.8}. I wrote the following code to simulate exactly that:

import random
from scipy import stats
import pylab

# Generate a normally distributed measurement centered at the true value 9.8
def norm():
    return random.normalvariate(9.8, 1)

# Run the experiment; return 1 if it ever falsely rejects the hypothesis and 0 otherwise
def experiment(num_samples, p_val):
    x = []

    # One by one we append an observation to our list
    for i in range(num_samples):
        x.append(norm())

        # Run a t-test at p_val significance to see if we reject the hypothesis
        t, p = stats.ttest_1samp(x, 9.8)
        if p < p_val:
            return 1
    return 0

# Check the proportion of false rejections at various sample sizes
rej_proportion = []
for j in range(10):
    f_rej = 0
    for i in range(5000):
        f_rej += experiment(10 * j + 1, 0.05)
    rej_proportion.append(f_rej / 5000)

# Plot the results
axis = [10 * j + 1 for j in range(10)]
pylab.plot(axis, rej_proportion)
pylab.title('Proportion of Falsely Rejecting the Hypothesis')
pylab.xlabel('Sample Size')
pylab.ylabel('Proportion')
pylab.show()

What is this producing? On the first run of the experiment, what is the probability that you reject the null hypothesis? Basically {0}, because the test knows that this isn't enough data to make a firm conclusion. If you run the experiment 10 times, what is the probability that at some point you reject the null hypothesis? It has gone up a bit. On and on this goes up to 100 trials where you have nearly a 40% chance of rejecting the null hypothesis using this method. This should make you uncomfortable, because this is ideal data where the mean really is 9.8 exactly! This isn't coming from imprecise measurements or something.

The trend will actually continue, but because the test is rerun after every single new observation (the so-called {n+1} problem in programming), this was taking a while to run, so I cut it off. As you accumulate more and more samples, you will be more and more likely to reject the hypothesis:

[Figure: proportion of false rejections continuing to climb as the sample size grows]

Actually, if you think about this carefully it isn’t so surprising. The fault is that you recheck whether or not to reject after each sample. Recall that the {p}-value tells you how likely it would be to see results at least this extreme by random chance alone, assuming the null hypothesis is true. That probability is not {0}, so with enough looks at the data you will eventually see such a fluke. If you have a sample size of {100} and you recheck your NHST after each sample is added, then you give yourself 100 chances to see this randomness manifest rather than checking once with all {100} data points. As your sample size increases, you give yourself more and more chances to see the randomness, and hence as your sample size goes to infinity your probability of falsely rejecting the hypothesis tends to {1}.

We can modify the above code to track the p-value over a single 1000-sample experiment (the word “trial” in the plot title refers to dropping the ball once in the physics experiment). This shows that if you cut your experiment off almost anywhere and ran your NHST once, you would not reject the hypothesis. It is only because you kept rechecking the p-value until it dipped below 0.05 that a mistake was made:

[Figure: the p-value tracked over a single 1000-sample experiment]
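For reference, here is one way the earlier code might be modified to produce such a plot. This is my sketch (reusing norm, stats, and pylab from above), not the exact code used for the figure.

# Track the p-value after each new observation in a single 1000-sample experiment
p_vals = []
x = []
for i in range(1000):
    x.append(norm())
    if len(x) > 1:  # the t-test needs at least two observations
        t, p = stats.ttest_1samp(x, 9.8)
        p_vals.append(p)

pylab.plot(range(2, 1001), p_vals)
pylab.axhline(0.05, color='r', linestyle='--')  # the p = 0.05 threshold
pylab.title('p-value After Each Observation (Single Experiment)')
pylab.xlabel('Sample Size')
pylab.ylabel('p-value')
pylab.show()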

Gauss’ Law

Since my blog claims to talk about physics sometimes and I just finished teaching multivariable calculus, I thought I’d do a post on one form of Gauss’ law. As a teacher of the course, I found this to be an astonishingly beautiful “application” of the divergence theorem. It turned out to be a touch too difficult for my students (and I vaguely recall being extremely confused about this when I took the class myself).

First, I’ll remind you what some of this stuff is in case you haven’t thought about these concepts for a while. Let’s work in {\mathbb{R}^3} for simplicity. Consider some subset {U\subset \mathbb{R}^3}. Let {F: U\rightarrow \mathbb{R}^3} be a vector field. Mathematically this just assigns a vector to each point of {U}. For calculus we usually put some fairly restrictive conditions on {F}, such as all partial derivatives existing and being continuous.

The above situation is ubiquitous in classical physics. The vector field could be the gravitational field, or the electric field, or it could describe the velocity of a flowing fluid, or … One key quantity you might want to know about your field is the flux through a given surface {S}. This measures the net amount of the field flowing through the surface. If {S} is just a sphere, then it is easy to visualize the flux as the amount flowing out of the sphere minus the amount flowing in.

Let’s suppose {S} is a smooth surface bounding a solid volume {E} (e.g. the sphere bounding the solid ball). In this case we have a well-defined “outward normal” direction. Define {\mathbf{n}} to be the unit vector field in this direction at all points of {S}. Just by definition the flux of {F} through {S} must be “adding up” the values of {F\cdot \mathbf{n}} over {S}, because this dot product just tells us how much {F} is pointing in the outward direction.

Thus we define the flux (using Stewart’s notation) to be:

\displaystyle \iint_S F\cdot d\mathbf{S} := \iint_S F\cdot \mathbf{n} \,dS

Note the second integral is integrating a scalar valued function with respect to surface area “dS.” Now recall that the divergence theorem says that in our situation (given that {F} extends to a vector field on an open set containing {E}) we can calculate this rather tedious surface integral by converting it to a usual triple integral:

\displaystyle \iint_S F\cdot d\mathbf{S} = \iiint_E div(F) \,dV

If you’re advanced, then of course you could just work this out as a special case of Stokes’ theorem using the musical isomorphisms and so on. Let’s now return to our original problem. Suppose I have a charge {Q} inside some surface {S} and I want to compute the flux of the associated electric field through {S}.

From my given information this would seem absolutely impossible. If {S} can be anything, and {Q} can be located anywhere inside, then of course there are just way too many variables to come up with a reasonably succinct answer. Surprisingly, Gauss’ law tells us that no matter what {S} is and where {Q} is located, the answer is always the same, and it is just a quick application of the divergence theorem to prove it.

First, let’s translate everything so that {Q} is located at the origin. Since flux is translation invariant, this will not change our answer. We first need to know what the electric field is, and this is essentially a direct consequence of Coulomb’s law:

\displaystyle F(x,y,z)=\frac{kQ}{(x^2+y^2+z^2)^{3/2}}\langle x, y, z\rangle

If we care about higher dimensions, then we might want to note that the value only depends on the radial distance from the origin and write it in the more succinct way {\displaystyle F(r)=\frac{kQ}{|r|^3}r}, where {k} is just some constant that depends on the textbook/units you are working in. Let’s first compute the partial of the first coordinate with respect to {x} (ignoring the constant factor for now):

\displaystyle \frac{\partial}{\partial x}\left(\frac{x}{(x^2+y^2+z^2)^{3/2}}\right) = \frac{-2x^2+y^2+z^2}{(x^2+y^2+z^2)^{5/2}}

You get similar expressions for the other partial derivatives involved in the divergence, except the {-2} lands on {y^2} and {z^2} respectively. When you add all these together, the numerator is {-2x^2-2y^2-2z^2+2x^2+2y^2+2z^2=0}. Thus the divergence is {0} everywhere, and hence by the divergence theorem the flux must be {0} too, right? Wrong! And that’s where I lost most of my students.
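Incidentally, if you want to verify that divergence computation without grinding through the partial derivatives by hand, here is a quick symbolic check (a sketch using sympy; my addition, not part of the original argument). It confirms the divergence really is {0} away from the origin.

import sympy as sp

x, y, z, k, Q = sp.symbols('x y z k Q', positive=True)
r2 = x**2 + y**2 + z**2

# The inverse-square field F = kQ/r^3 * <x, y, z>
F = [k * Q * c / r2**sp.Rational(3, 2) for c in (x, y, z)]

# Divergence: sum of the partial derivatives of each component
div = sum(sp.diff(F[i], var) for i, var in enumerate((x, y, z)))
print(sp.simplify(div))  # 0 (valid away from the origin, where F is undefined)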

Recall that pesky hypothesis that {F} can be extended to a vector field on an open set containing {E}. Our {F} cannot be defined at the origin in any way that makes it continuous there. Thus we must do something different. Here’s the idea: we just change our region. Since the interior of {E} contains the origin, we can find a small sphere {S_\varepsilon} of radius {\varepsilon>0} centered at {(0,0,0)} whose interior is properly contained in {E}.

Let {\Omega} be the region between these two surfaces. Effectively this “cuts out” the bad point of {F} and now we are allowed to apply the divergence theorem to {\Omega} where our new boundary is {S} oriented outwards and {S_\varepsilon} oriented inward (negatively). We already calculated that {div F=0}, thus one side of the equation is {0}. This gives us

\displaystyle \iint_S F\cdot d\mathbf{S} = \iint_{S_\varepsilon} F\cdot d\mathbf{S}

This is odd, because it says that no matter how bizarre or gigantic {S} was we can just compute the flux through a small sphere and get the same answer. At this point we’ve converted the problem to something we can do because the unit normal is just {\mathbf{n}=\frac{1}{\sqrt{x^2+y^2+z^2}}\langle x, y, z\rangle}. Direct computation gives us

\displaystyle F\cdot \mathbf{n} = \frac{kQ (x^2+y^2+z^2)}{(x^2+y^2+z^2)^{2}}=\frac{kQ}{x^2+y^2+z^2}

On {S_\varepsilon} we have {x^2+y^2+z^2=\varepsilon^2}, so plugging this all in we get that the flux through {S} is

\displaystyle \iint_{S_\varepsilon} \frac{kQ}{\varepsilon^2} \,dS = \frac{kQ}{\varepsilon^2}Area(S_\varepsilon) = \frac{kQ}{\varepsilon^2}(4\pi\varepsilon^2) = 4\pi k Q.

That’s Gauss’ Law. It says that no matter the shape of {S} or the location of the charge inside {S}, you can always compute the flux of the electric field produced by {Q} through {S} as a constant multiple of the amount of charge! In fact, most books use {k=1/(4\pi \varepsilon_0)}, where {\varepsilon_0} is the “permittivity of free space,” which kills off practically all extraneous symbols: the flux is just {Q/\varepsilon_0}.
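As a final sanity check on that last integral, here is a quick symbolic computation in spherical coordinates (my sketch, again using sympy; not part of the original post):

import sympy as sp

theta, phi, eps, k, Q = sp.symbols('theta phi epsilon k Q', positive=True)

# On the sphere of radius eps: F.n = kQ/eps^2, and the area element is eps^2*sin(theta) dtheta dphi
integrand = (k * Q / eps**2) * eps**2 * sp.sin(theta)
flux = sp.integrate(integrand, (theta, 0, sp.pi), (phi, 0, 2 * sp.pi))
print(flux)  # 4*pi*k*Q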

Mathematical Music Theory 3: Combination Tones

Today we’ll cover one of the most dangerously overlooked consequences of what we’ve been talking about. I say it is overlooked because most people can probably get a degree in music composition without this ever once being mentioned. It should be obvious by the end of the post why it is dangerous for anyone writing music to be unaware of the phenomenon.

Suppose you play two notes at the same time. These are two sound waves, and (in the real world) it is impossible to keep them completely separate so that all you hear are the two sounds. The waves combine, and you get a new wave at some other pitch called the combination tone. For this reason they are sometimes called sum tones or, somewhat confusingly, difference tones (because the frequency of the resulting tone is the difference of the two frequencies).

As before, it is important to note that combination tones are a physical reality. I think that sometimes (when this is taught at all) people write them off as a psychological phenomenon. Maybe it is just your brain filling in something it thinks it should hear. In order to convince you this is not the case, take a look at this video:

He isn’t playing any of those low notes, yet they are the dominant sound. As you can see, with enough patience (and the knowledge I’ll give you below) you can work out how to play melodies entirely using the combination tones.

In terms of the overtone series there is a nice easy way to figure out what the combination tone will be. For example, take a major third by playing C and E at the same time. From the first post we see that the interval occurs as the fourth and fifth tones of the overtone series. All we do is subtract 5-4=1 and find that the combination tone is the fundamental of the series. Thus any two notes that occur sequentially in the overtone series will have a combination tone of the fundamental of the series.

If you take G and E, these are the third and fifth tones, and hence the combination tone is the (5 minus 3) second tone in the overtone series, and so on. It is quite easy. I suggest that anyone who wants to be a good composer take a simple two-voice line in whole notes, work out all the combination tones, and see if this alters what you thought you wanted.
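To make the arithmetic concrete, here is a tiny sketch (my addition) using the 64 Hz fundamental from the earlier posts:

fundamental = 64.0  # Hz, the low C from the first post
partials = {n: n * fundamental for n in range(1, 7)}

# Major third C-E: the 4th and 5th tones of the series
print(partials[5] - partials[4])  # 64.0 Hz -> the fundamental (tone 1)

# G and E: the 3rd and 5th tones of the series
print(partials[5] - partials[3])  # 128.0 Hz -> the 2nd tone of the series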

It is scary that people can be out there composing and entirely unaware of this phenomenon. Think about the danger. You think you are writing certain sounds, but other tones are coming into the writing totally unbeknownst to you. It is even worse than that. Because of the overtone series, you actually get second order, third order, etc combination tones and not just this first order phenomenon.

Here is an example of why this might be important. There are certain intervals that feel stable and others that have tension. Here is a good way to tell which is which. Take the interval of a fifth (a C with a G over it). The combination tone is the same as the bottom note, so the combination tone anchors you to that bottom note and everything feels stable. The interval of a fourth is just the inversion of that (G with a C over it … the same two notes!!), so the combination tone doubles the higher note and you just have this floating middle note which makes it feel less stable.

Composers such as Bach were intimately familiar with this phenomenon. Rather than have it do unexpected things to his compositions, he used it to his advantage. When he wrote two-part inventions there would only be two melodies on top of each other, but due to combination tones the music sounded much more fleshed out, as if many more parts were being played. In passages where he wanted forward motion he used unstable forms of intervals, and where he wanted resolution he used the stable forms.

This may seem like some tiny unimportant detail, but it really makes a big difference in how you voice your chords, and takes quite a bit of time and effort to internalize so that you can start to use it effectively.

Mathematical Music Theory 2: Derivation of the Western Scale System

Before proceeding, it is extremely important that you believe the overtone series is inherent to music in a natural way, because now I want to argue that the 12-note chromatic scale, so fundamental to Western art music, sounds like the “natural” way to divide up the infinitely many possible pitch choices precisely because the scale is derivable from nature.

I think most musicologists would probably say the scale that sounds natural to you is based on the culture you grew up in, and that there is no objective reason to favor one over another. Of course, just as the language that sounds natural to you depends on what you grew up with, the musical language that sounds natural will also depend on culture. But with only a few very rare exceptions (I actually don’t know of any), all scale systems that have had enough cultural significance to survive history can be derived in a way similar to what I’m going to do.

In Paul Hindemith’s book The Craft of Musical Composition, he spends almost a fourth of the book doing this derivation. Note that he wrote this in a time when most academic composers were extreme relativists and wanted to throw all Western conventions out the window (including the use of scales and well-defined notes). My guess is that he wanted to make an argument that our scale system was not some subjective arbitrary system, but is objectively superior to a choice of scale system that is not derivable from the overtone series (N.B. this is not the same as saying the Western scale choices are superior).

Now that that rant is out of the way and I’ve alienated all readers we’ll move on. Actually, there isn’t much to derive if you fully understand the overtone series from last post. Let’s go back to C being the fundamental, because we need to pick some arbitrary starting point from which to derive the rest of the notes.

From the fundamental to the first tone of the overtone series, we get exactly an octave, so it makes sense to talk about moving a note down an octave (i.e. this is an allowable interval in our scale derivation). So really we just take the overtone series and move notes down by an octave until they are in the range between the fundamental tone and the first overtone.

Recall we got C, C, G, C, E. So through the fifth partial we only get two new notes, which when moved down octaves give us C-E-G (this is a C major chord, and now we see how the major chord can be “derived from nature”). An interesting historical tidbit is that the Pythagoreans constructed the rest of the notes using only this much of the overtone series. Now that G and E are well-defined notes, you just start their overtone series and go as far as the fifth partial to get a few more notes. Then start the overtone series on those notes, and keep going until you have 12 notes and repeating the process only produces ones you already have.

This is actually what was done back then, and if we listened to music tuned in this way it would sound horribly out of tune to our ears. Around Bach’s time a switch from this older kind of tuning to the well-tempered system happened.

We’ll follow Hindemith’s construction. Instead of only adding in notes you get from the overtone series, you go backwards too. You take an allowable note and you consider fundamentals for which that note could occur in the overtone series and you only add in notes that occur in the right octave (between the two C’s). This just amounts to dividing the frequency of an existing tone by the number of overtones we’ve gone up.

That sounds confusing but here’s how it works. Take 64 Hz C. The second note in the series is 128 Hz C. Testing out C in both the first and second spots of the overtone series produces no new notes within the octave.

Thus we move to the third note 192 Hz G. We test out G being in the first, second, or third spot of the overtone series and see what new notes occur in the first, second, or third spot. We rule out G being the fundamental because all new notes would be outside the octave. If it is the second note of an overtone series, then the fundamental is G an octave lower (outside the allowable range) and the third note would be C already in the scale.

The last thing to try before moving on is testing C as the third note of an overtone series. The fundamental would be 256/3=85.33 Hz, which is our modern day F. A new note! Then you keep going. You test each new note as the first, second, third, or fourth overtone of some overtone series and see which notes land in the range 64-128 Hz (this just amounts to dividing by 1, 2, 3, or 4 respectively as mentioned). You keep doing this and you’ll get our modern 12 note chromatic scale.
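To make that last arithmetic step concrete, here is a tiny sketch (my addition) of the "test a note as the k-th tone of a series" check, using the frequencies from the example above:

def in_octave(freq, low=64.0, high=128.0):
    """Does this frequency land in the allowed octave between the two C's?"""
    return low <= freq <= high

# Test the 256 Hz C as the third tone of an overtone series:
candidate = 256.0 / 3
print(round(candidate, 2), in_octave(candidate))  # 85.33 True -> a new note, our modern-day F

# Test the 128 Hz C as the third tone: the candidate fundamental falls below the octave
print(round(128.0 / 3, 2), in_octave(128.0 / 3))  # 42.67 False -> no new note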

Mathematical Music Theory 1: The Overtone Series

Since I’m coming up dry on actual math, I thought I’d give a few lessons on music theory from a mathematical viewpoint. The overarching argument I’m going to try to make is that students of music (and of music composition in particular) need to know some math and physics.

I think for the most part you can go all the way through a bachelor’s degree in music and never learn about these things. This is truly a shame because as I’ll point out when it comes up, these ideas are not some abstract theoretical nonsense, but are extremely important to fully understand if you write (and sometimes play) music.

The first thing to get out of the way is something called the overtone series. Pretty much everyone learns this, so I’ll go through it quickly. Here’s how I like to think of it. If you sing or play a note, then there are a series of notes above and below it that are related to it.

If you take a wind instrument (for our purposes just think of it as a tube of metal), then depending on its length and size there is some “fundamental tone” that you can produce on it. Think of this as the lowest note you can play. For the sake of argument suppose its frequency is 64 Hz (this corresponds roughly to a low C).

It turns out the next note playable on the tube/instrument will be at 128 Hz, just by simple physics-of-waves considerations (recall that when you solve the eigenvalue problem for the wave equation you get a discrete set of eigenvalues, so only certain solutions are allowed, and they are in bijection with the natural numbers). This is a C one octave higher. The next note playable will be at 192 Hz, a G. The frequencies continue: 256, 320, 384, …

Now we could have started at any fundamental frequency, so it is the ratios that matter and not the starting number (again, just solve the wave equation if you don’t believe me). So let’s normalize and see if we’ve ever seen this pattern before. The frequencies are {1, 2, 3, 4, \ldots} times the fundamental, which means the corresponding wavelengths are {1}, {\frac{1}{2}}, {\frac{1}{3}}, {\frac{1}{4}}, and so on, times the fundamental wavelength.

These are exactly the terms of the well-known harmonic series! Of course this is no accident. The ancients knew about the overtone series, and that is how the series got its name. Like I mentioned, everything I’ve said so far is quite standard and well-known. If this was too sparse to follow, I suggest you glance at the numerous internet sources explaining it in more depth before moving on.
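Here is a tiny sketch (my addition) listing the first few tones of the series above a 64 Hz fundamental along with their wavelengths as fractions of the fundamental wavelength:

fundamental = 64.0  # Hz, the low C from above
for n in range(1, 7):
    frequency = n * fundamental       # 64, 128, 192, 256, 320, 384 Hz
    wavelength_ratio = 1.0 / n        # 1, 1/2, 1/3, ... of the fundamental wavelength
    print(n, frequency, round(wavelength_ratio, 3))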

One thing that is often glossed over, but will be crucial in later posts is that the overtone series is a physical reality that exists in any note that is produced in a natural way. If you sing a note, all the notes in its overtone series are sitting inside it. If you pluck a string, then the overtone series is in that sounding note. The only way you could get rid of the overtones is to produce a “pure” tone with the overtones stripped out using a computer.

This means that the overtone series is a physical part of the nature of how sound works. If you don’t believe this, then it would be well worth your time to listen to the following clip. He is only singing a single note the whole time, but he draws out the overtones in that tone:

Noether’s Theorem

I want to do one more thing in classical mechanics before moving on to classical field theory. It is called Noether’s theorem, and it tells us how to find conserved quantities of our system if we know that a certain group of symmetries acts on the system.

Recall our setup. We have some configuration space {Q}. We think of this as the smooth manifold of all possible positions our system can take. A point on {Q} corresponds to some configuration. In classical mechanics we also have a Lagrangian {L:TQ\rightarrow \mathbb{R}}. Minimizing the integral of the Lagrangian over all paths in the configuration space tells us (given some initial starting configuration) what path our system will take and hence how it changes over time.

Also, we called {\Gamma} the space of paths on {Q}. We now define a (one-parameter) symmetry of {L} to be a smooth map {F:\mathbb{R}\times \Gamma\rightarrow \Gamma}, usually denoted using “group action” notation as {s\cdot q=q_s} with some special properties. First, {q_0=q} (i.e. the identity element acts as the identity). Second, there is some {\ell: TQ\rightarrow \mathbb{R}} such that {\displaystyle \delta L=\frac{d\ell}{dt}} (for all paths).

Noether’s theorem tells us that given such a symmetry we will get that the quantity {p^i\delta q_i-\ell} is conserved. Conserved in this case means that given any admissible path {q}, the time derivative of the quantity along {q} is {0}. Or unravelling what that means, as the system evolves in time, the quantity is constant.

If we get away from the symbols for a little bit, then we’ll find that we probably already would have guessed this intuitively. If the symmetry of our Lagrangian is shifting the time {(s\cdot q)(t)=q(s+t)}, then this says that our system has the same physics at all points of time. This occurs in our standard example of {L=\frac{1}{2}m\dot{q}^2-V} on {\mathbb{R}^n}. Since {\ell=L}, Noether’s theorem tells us that the conserved quantity is {m\dot{q}^2-(\frac{1}{2}m\dot{q}^2-V)=\frac{1}{2}mv^2+V}. Thus the potential energy plus the kinetic energy is conserved. This is just the total energy! We find that whenever our Lagrangian is invariant under shifting time, we recover the Law of Conservation of Energy.
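Here is a small symbolic check of that claim (my sketch, using sympy and a harmonic oscillator potential {V=\frac{1}{2}kq^2} as a concrete example, which is my choice and not from the post): on solutions of the equation of motion, the time derivative of {\frac{1}{2}m\dot{q}^2+V} vanishes identically.

import sympy as sp

t = sp.symbols('t')
m, k = sp.symbols('m k', positive=True)
q = sp.Function('q')(t)
qdot = sp.diff(q, t)

V = sp.Rational(1, 2) * k * q**2           # a concrete potential (harmonic oscillator)
E = sp.Rational(1, 2) * m * qdot**2 + V    # the Noether quantity from time translation

# Impose the equation of motion m*q'' = -dV/dq = -k*q and check that dE/dt = 0
dEdt = sp.diff(E, t).subs(sp.diff(q, t, 2), -k * q / m)
print(sp.simplify(dEdt))  # 0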

For another type of symmetry, consider our free particle in {\mathbb{R}^n}. For any vector {v\in\mathbb{R}^n} we can shift spatially along {v}. Thus {q_s(t)=q(t)+sv}. Certainly our Lagrangian is invariant under any of these shifts. Our conserved quantity in this case is merely {p_i\delta q^i=m\dot{q}_iv_i=m\dot{q}\cdot v}, which is just the momentum in the direction {v}. Noether’s theorem tells us that if our Lagrangian doesn’t depend on shifts in the {v}-direction, then momentum in that direction is conserved. Moreover, this tells us that our free particle has all components of momentum conserved…and of course this is true! The equation of motion is just moving in a constant direction at a constant speed.

The same thing is true for our free particle when we consider rotational symmetries. We fix some {A\in \mathfrak{so}(n)} (an infinitesimal rotation), and our action is {q_s(t)=e^{As}q(t)}. Now that we have a feel for Noether’s theorem, it shouldn’t be surprising that this gives us conservation of angular momentum.

The symmetries we have above are known as physical symmetries. One could think of it as moving the frame of reference to a different place and then finding out we get all the same answers. These physical symmetries give non-zero conserved quantities and they don’t introduce ambiguities in the equation of motion given sufficient initial data.

There is another type of symmetry known as a gauge symmetry (we are now allowing our action to be {G\times \Gamma\rightarrow \Gamma} for some Lie group {G}). When you work out the conserved quantity you will get {0}. This is subtler, because the symmetry shouldn’t be thought of as altering the “physical situation” of the setup; it is more a symmetry of the mathematics of the situation. This actually does introduce ambiguities in the path our system will take, for the simple reason that if sufficient initial data give some solution path {q}, then all paths in the orbit of {q} (i.e. paths of the form {q_s} for some {s\in G}) are equally valid choices for the evolution of the system.

I’m not sure if there is a good example of a gauge symmetry for classical mechanics, but there certainly are for classical field theories, which is our next topic. Most people are probably familiar with the fact that the standard model has {SU(3)\times SU(2)\times U(1)} gauge symmetry.

Classical (Lagrangian) Mechanics

It turns out that because I work with Calabi-Yau varieties I often encounter various ideas and terms from physics. In particular, quantum field theory is something that comes up a lot. I took a lot of physics as an undergrad, and I’ve pieced together a tiny bit about what is meant by “quantum field theory.” In order to record this somewhere before I forget it, I’m going to blog some of it. This should be a very short series, because I don’t want to get hung up on it.

The main point is to try to express the idea of quantum field theory in a way a mathematician would understand it. Before we can do that I need to spend a post on classical mechanics. This post is going to present what is done over the course of a semester long undergrad class, so it will go fast. I’ll give you the take away up front. In a mathematically rigorous way one can prove that the “Lagrangian formalism” we’ll look at soon is exactly equivalent to Newton’s law {F=ma}.

Suppose we have some particle in space. If it is moving, that motion has something called kinetic energy. For simplicity, we’ll call this a function of time {K(t)=\frac{1}{2}mv(t)^2}. The formula isn’t important here. Usually you also have something called potential energy. For example, a ball on a table has the potential energy of falling to the ground. Technically you can figure out the potential the same way you’d find the potential of any vector field (this took me a while to connect as an undergrad).

Suppose your particle is moving in {\mathbb{R}^n}; then we can describe it by a function {q: \mathbb{R}\rightarrow\mathbb{R}^n}. There could be some ambient force (gravity, an electromagnetic field, etc.; it doesn’t matter). This is a vector field {F: \mathbb{R}^n\rightarrow \mathbb{R}^n}. The potential then is just a function {V: \mathbb{R}^n\rightarrow \mathbb{R}} such that {F=-\nabla V}. Up to the minus sign (a physics convention), this is the potential function we tell our calculus students about, so it shouldn’t be surprising. Of course, we must assume our force is conservative for a potential to exist, so we do that.

Now we define {L=K(t)-V(q(t))}. This is called the Lagrangian. We define the action over some path {q:[t_0, t_1]\rightarrow \mathbb{R}^n} to be the integral {S(q)=\int_{t_0}^{t_1}L(t)dt}. Now we get to the point. If we let our paths vary, then we get a bunch of real numbers by evaluating {S(q)}. From standard calculus we could find the minimum. This is the path of least action, and our particle will follow that path if and only if in our system Newton’s law {F=ma} holds.

We could go off and try to describe physically why one would think of this weird formalism. For example, integrating force over distance is the work needed to move the particle from point {a} to point {b}. We would expect that the particle will naturally follow the path that requires the least work. This has roughly the same flavor, but takes into account some extra stuff. Whatever the physical reason, it shouldn’t really matter to us, because it is exactly equivalent to the law we all know ought to be true.

In a classical mechanics class you’d probably now spend many weeks being handed various scenarios where you figure out the Lagrangian and then, given some initial starting point, figure out the path by taking the variation {\delta S(q)}, setting it to {0}, and solving. Note: for practical purposes this is a little tricky, because the so-called variation of the action involves differentiating with respect to paths. Since we aren’t computing these, we won’t go through it, but the idea is to parametrize your paths in some nice way (think about a smooth {1}-parameter homotopy connecting them for the picture).
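To make the {\delta S=0} step concrete, here is a minimal sketch (my addition, using sympy’s built-in Euler-Lagrange helper and a particle falling under constant gravity as the example):

import sympy as sp
from sympy.calculus.euler import euler_equations

t = sp.symbols('t')
m, g = sp.symbols('m g', positive=True)
x = sp.Function('x')(t)

# Lagrangian: kinetic minus potential energy for a particle in a uniform gravitational field
L = sp.Rational(1, 2) * m * sp.diff(x, t)**2 - m * g * x

# Setting the variation of the action to zero gives the Euler-Lagrange equation, i.e. F = ma
print(euler_equations(L, x, t))  # [Eq(-g*m - m*Derivative(x(t), (t, 2)), 0)], i.e. m*x'' = -m*g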

Now we must generalize a bit. Suppose we have some physical system (maybe a double pendulum, for sufficient complexity). There’s more than one particle, and there are constraints on how things can move in relation to each other. What we do now is consider the space of all configurations. Think of this as the moduli space of all positions the system could ever take. A point in this space {Q} is one particular configuration. Now a path {q:[t_0, t_1]\rightarrow Q} is just a description of how the system evolves over that time period. We assume this configuration space is a smooth manifold.

This means the velocity, which is the time derivative {\dot{q}(t)\in T_{q(t)}Q} is actually a tangent vector now (it was before, but we just made the canonical identification). Let’s pick a starting point and ending point {a, b\in Q}. Then we can formalize what we did last time as follows. Define {\Gamma=\{q:[t_0, t_1]\rightarrow Q : q(t_0)=a, \ q(t_1)=b\}} to be the path space (of smooth paths) with those endpoints. Let {L: TQ\rightarrow \mathbb{R}} be a smooth function called the Lagrangian of the system.

Now the action is {S:\Gamma\rightarrow \mathbb{R}} defined by {S(q)=\int_{t_0}^{t_1}L(q, \dot{q})dt}. The path that our system will take in the configuration space will be a minimum of {S}. Thus to find it we just solve {\delta S=0}.

In order to test whether you follow this, a really quick (if you get it, but painful if you don’t) and wonderful exercise is to figure out the equation of motion of a single free particle in {\mathbb{R}^3}. What does this mean? Well, there is no force in the system at all, so the potential is {0}, and hence {L=K(t)=\frac{1}{2}mv(t)\cdot v(t)}. We already know the answer. No force means no acceleration. Thus from basic calculus the answer is that velocity is a constant {v_0} and the path is {q(t)=a+v_0t} where {a} is the initial starting point. Try to get that using the Lagrangian!