# Mathematical Music Theory

As a musician and mathematician, I thought I’d give a few lessons on music theory from a mathematical viewpoint. The overarching argument I’m going to try to make is that students of music (and of music composition in particular) need to know some math and physics.

I think for the most part you can go all the way through a bachelor’s degree in music and never learn about these things. This is truly a shame because, as I’ll point out when it comes up, these ideas are not some abstract theoretical nonsense. They are extremely important to understand fully if you write (and sometimes play) music.

## The Overtone Series

The first thing to get out of the way is something called the overtone series. Pretty much everyone learns this, so I’ll go through it quickly. Here’s how I like to think of it. If you sing or play a note, then there is a series of notes above it that are related to it.

If you take a wind instrument (for our purposes just think of it as a tube of metal), then depending on the length and size there is some “fundamental tone” that you can produce on it. Think of this as the lowest note you can play. For the sake of argument suppose the frequency of this fundamental is 64 Hz (this corresponds to a low C).

It turns out the next note that is playable on the tube/instrument will be at 128 Hz, just by the simple physics of waves (recall that when you solve the eigenvalue problem for the wave equation you get a discrete set of eigenvalues, so the only allowed solutions are in bijection with the natural numbers).

This is a C one octave higher. The next note playable will be at 192 Hz, a G. The frequencies continue: 256, 320, 384, …
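To make the pattern concrete, here is a minimal sketch that lists the first six partials of the 64 Hz example and names the nearest pitch class for each. The helper names (`overtones`, `nearest_note`) and the choice to round to the nearest equal-tempered semitone are my own assumptions for illustration, not anything standard.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def overtones(fundamental_hz, count):
    """Return the first `count` partials: integer multiples of the fundamental."""
    return [fundamental_hz * n for n in range(1, count + 1)]

def nearest_note(freq_hz, c_hz=64.0):
    """Name the closest equal-tempered pitch class, treating c_hz as a C (an assumption)."""
    semitones = round(12 * math.log2(freq_hz / c_hz))
    return NOTE_NAMES[semitones % 12]

series = overtones(64, 6)
names = [nearest_note(f) for f in series]
print(list(zip(series, names)))
# [(64, 'C'), (128, 'C'), (192, 'G'), (256, 'C'), (320, 'E'), (384, 'G')]
```

Note that the rounding hides small tuning differences: the 320 Hz partial is a little flat of an equal-tempered E.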

Now we could have started at any fundamental frequency, so it is the ratios that matter and not the starting number (again, just solve the wave equation if you don’t believe me).

So, let’s normalize and see if we’ve ever seen this pattern before. Let’s call the fundamental frequency 1. The partials are then at frequencies $1, 2, 3, 4, \ldots$, and since wavelength is inversely proportional to frequency, the corresponding wavelengths are $1, \frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \ldots$

These are exactly the terms of the well-known harmonic series!

Of course this is no accident. The ancients knew about the overtone series and that’s why this series got that name.

Like I mentioned, everything I’ve said so far is quite standard and well-known. If this was too sparse to follow I suggest you glance at the numerous internet sources explaining this in more depth before moving on.

One thing that is often glossed over, but will be crucial in later posts is that the overtone series is a physical reality that exists in any note that is produced in a natural way.

If you sing a note, all the notes in its overtone series are sitting inside it. If you pluck a string, then the overtone series is in that sounding note. The only way you could get rid of the overtones is to produce a “pure” tone with the overtones stripped out using a computer.

This means that the overtone series is a physical part of the nature of how sound works. If you don’t believe this, then it would be well worth your time to listen to the following clip. He is only singing a single note the whole time, but he draws out the overtones in that tone:

## Derivation of the Western Scale System

It is extremely important that you believe that the overtone series is inherent to music in a natural way before proceeding. What I want to do now is argue that the 12 note chromatic scale, which is so fundamental to Western art music, sounds like the “natural” way to divide up the infinitely many possible pitch choices because the scale is derivable from nature.

I think most musicologists would probably say the scale that sounds natural to you is based on the culture you grew up in and there is no objective reason to favor one over another.

Of course, just like what language sounds natural to you depends on what you grew up with, what musical language sounds natural will also depend on culture. But except for a few very rare exceptions (I actually don’t know of any) all scale systems that have had enough cultural significance to survive history actually can be derived in a similar way to what I’m going to do.

In Paul Hindemith’s book The Craft of Musical Composition, he spends almost a fourth of the book doing this derivation. Note that he wrote this in a time when most academic composers were extreme relativists and wanted to throw all Western conventions out the window (including the use of scales and well-defined notes).

My guess is that he wanted to make an argument that our scale system was not some subjective arbitrary system, but is objectively superior to a choice of scale system that is not derivable from the overtone series (N.B. this is not the same as saying the Western scale choices are superior).

Now that that rant is out of the way and I’ve alienated all readers we’ll move on. Actually, there isn’t much to derive if you fully understand the overtone series from last post. Let’s go back to C being the fundamental, because we need to pick some arbitrary starting point from which to derive the rest of the notes.

From the fundamental to the first tone of the overtone series, we get exactly an octave, so it makes sense to talk about moving a note down an octave (i.e. this is an allowable interval in our scale derivation). So really we just take the overtone series and move notes down by an octave until they are in the range between the fundamental tone and the first overtone.

Recall we got C, C, G, C, E. So at the fifth partial we only get two new notes which when moved down octaves give us C-E-G (this is a C major chord and now we see how the major chord can be “derived from nature”).
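As a quick check of the "move down by octaves" step, here is a small sketch that folds the first five partials into the octave above the fundamental. The function name `fold_into_octave` is hypothetical, and I work with exact ratios relative to the fundamental rather than raw frequencies.

```python
from fractions import Fraction

def fold_into_octave(ratio):
    """Halve a frequency ratio (relative to the fundamental) until it lies in [1, 2)."""
    ratio = Fraction(ratio)
    while ratio >= 2:
        ratio /= 2
    return ratio

partials = [1, 2, 3, 4, 5]
folded = sorted({fold_into_octave(n) for n in partials})
frequencies = [float(64 * r) for r in folded]

print(folded)       # [Fraction(1, 1), Fraction(5, 4), Fraction(3, 2)]
print(frequencies)  # [64.0, 80.0, 96.0]
```

The three surviving ratios 1 : 5/4 : 3/2 are the familiar 4:5:6 of a justly tuned major triad.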

An interesting historical tidbit is that the Pythagoreans constructed the rest of the notes using only this many tones of the overtone series. Now that G and E are well-defined notes, you just start their overtone series and go as far as the fifth partial to get a few more notes. Then you start the overtone series on those notes, and so on, until you’ve gotten 12 notes and repeating the process just produces ones you already have.

This is actually what was done back then, and if we listened to music tuned in this way it would sound horribly out of tune to our ears. In Bach’s time a switch from this kind of tuning to the well-tempered system happened.

### Hindemith’s Construction

We’ll follow Hindemith’s construction. Instead of only adding in notes you get from the overtone series, you go backwards too. You take an allowable note and you consider fundamentals for which that note could occur in the overtone series and you only add in notes that occur in the right octave (between the two C’s).

This just amounts to dividing the frequency of an existing tone by the number of overtones we’ve gone up.

That sounds confusing but here’s how it works. Take 64 Hz C. The second note in the series is 128 Hz C. Testing out C in both the first and second spots of the overtone series produces no new notes within the octave.

Thus we move to the third note 192 Hz G. We test out G being in the first, second, or third spot of the overtone series and see what new notes occur in the first, second, or third spot. We rule out G being the fundamental because all new notes would be outside the octave.

If it is the second note of an overtone series, then the fundamental is the G an octave lower (192/2=96 Hz, which we already have from moving G down an octave). If it is the third note, the fundamental is 192/3=64 Hz, the C already in the scale.

The last thing to try before moving on is testing C as the third note of an overtone series. The fundamental would be 256/3=85.33 Hz, which is our modern day F. A new note!

Then you keep going. You test each new note as the first, second, third, or fourth overtone of some overtone series and see which notes land in the range 64-128 Hz (this just amounts to dividing by 1, 2, 3, or 4 respectively as mentioned). You keep doing this and you’ll get our modern 12 note chromatic scale.
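Here is a rough sketch of a single round of this divide-and-keep step, starting only from the partials of the 64 Hz C. It is a simplification under my own assumptions (the function name `divide_step`, using the first five partials, and dividing by 1 through 4 as described above), not a faithful transcription of Hindemith's full procedure.

```python
from fractions import Fraction

def divide_step(fundamental, num_partials, max_divisor):
    """Divide each partial by 1..max_divisor and keep results in [fundamental, 2*fundamental)."""
    kept = set()
    for n in range(1, num_partials + 1):
        for k in range(1, max_divisor + 1):
            candidate = Fraction(fundamental * n, k)
            if fundamental <= candidate < 2 * fundamental:
                kept.add(candidate)
    return sorted(kept)

notes = divide_step(64, 5, 4)
print([round(float(f), 2) for f in notes])
# [64.0, 80.0, 85.33, 96.0, 106.67]
```

With C = 64 Hz these come out as C, E, F, G, and A. Getting the remaining notes means feeding the new notes back in as starting points, which this sketch does not attempt.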

## Combination Tones

Next we’ll cover one of the most dangerously overlooked consequences of what we’ve been talking about.

I say it is overlooked because most people can probably get a degree in music composition without this ever once being mentioned. It should be obvious by the end of the post why it is dangerous for anyone writing music to be unaware of the phenomenon.

Suppose you play two notes at the same time.

These are two sound waves, and (in the real world) it is impossible to keep them completely separate so that all you hear are the two sounds. The waves combine. Hence you get a new wave which is some other tone called the combination tone.

For this reason they are sometimes called sum tones or, for the ones we’ll use below, difference tones (because the frequency of the tone is the difference of the two frequencies).

As before, it is important to note that combination tones are a physical reality. I think that sometimes (when this is taught at all) people write them off as a psychological phenomenon. Maybe it is just your brain filling in something it thinks it should hear. In order to convince you this is not the case, take a look at this video:

He isn’t playing any of those low notes, yet they are the dominant sound. As you can see, with enough patience (and the knowledge I’ll give you below) you can work out how to play melodies entirely using the combination tones.

In terms of the overtone series there is a nice easy way to figure out what the combination tone will be.

For example, take a major third by playing C and E at the same time. From the first post we see that the interval occurs as the fourth and fifth tones of the overtone series. All we do is subtract 5-4=1 and find that the combination tone is the fundamental of the series.

Thus, any two notes that occur sequentially in the overtone series will have a combination tone of the fundamental of the series.

If you take G and E, these are the third and fifth tones and hence the combination tone is the (5 minus 3) second tone in the overtone series, and so on. It is quite easy. I suggest that anyone who wants to be a good composer take a simple two voice line in whole notes, work out what all the combination tones are, and see if this alters what you thought you wanted.
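The subtraction rule is easy to check numerically. This is a minimal sketch using the 64 Hz fundamental from earlier; the function names are made up for illustration, and it only computes the first-order difference tone.

```python
FUNDAMENTAL_HZ = 64  # the low C used in the earlier example

def difference_partial(lower_partial, upper_partial):
    """Position in the overtone series of the difference tone of two partials."""
    return upper_partial - lower_partial

def difference_tone_hz(f1, f2):
    """First-order difference tone of two frequencies."""
    return abs(f2 - f1)

# C and E are the 4th and 5th partials; G and E are the 3rd and 5th.
c_e = (difference_partial(4, 5), difference_tone_hz(4 * FUNDAMENTAL_HZ, 5 * FUNDAMENTAL_HZ))
g_e = (difference_partial(3, 5), difference_tone_hz(3 * FUNDAMENTAL_HZ, 5 * FUNDAMENTAL_HZ))

print(c_e)  # (1, 64)  -> the fundamental C
print(g_e)  # (2, 128) -> the C an octave above the fundamental
```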

It is scary that people can be out there composing and entirely unaware of this phenomenon. Think about the danger.

You think you are writing certain sounds, but other tones are coming into the writing totally unbeknownst to you.

It is even worse than that. Because of the overtone series, you actually get second order, third order, etc. combination tones and not just this first order phenomenon.

Here is an example of why this might be important.

There are certain intervals that feel stable and others that have tension. Here is a good way to tell which is which. Take the interval of a fifth (a C with a G over it). The combination tone is the same as the bottom note, so the combination tone anchors you to that bottom note and everything feels stable.

The interval of a fourth is just the inversion of that (G with a C over it … the same two notes!!), so the combination tone doubles the higher note and you just have this floating middle note which makes it feel less stable.

Composers such as Bach were intimately familiar with this phenomenon.

Rather than have it do unexpected things to his compositions, he used it to his advantage. When he wrote two part inventions, there would only be two melodies on top of each other, but due to combination tones it sounded much more fleshed out: as if many more parts were being played.

He would know that in parts where he wanted forward motion he would use unstable forms of intervals and where he wanted resolution he would use the stable forms.

This may seem like some tiny unimportant detail, but it really makes a big difference in how you voice your chords, and takes quite a bit of time and effort to internalize so that you can start to use it effectively.

## Post-Tonal Music Theory

All right.

Let’s skip from first week of first year of music theory to something that probably won’t come up until a music theory elective in your third year (yes, we’re skipping two full years of theory here).

What I’m going to describe is usually encountered in a class on “Post-tonal theory.” This can be misleading because for the most part it is an extremely useful mathematical way of thinking about music theory that doesn’t particularly have to do with atonal music or 12-tone serialism.

As we’ve already pointed out, our Western 12-tone scale is essentially taking an octave and dividing it up into 12 parts. Since an octave (or 12 semitones up or down) gives the same note, we can mathematically think of things more clearly by just labeling a C with 0, a C# with 1, a D with 2 and so on up to labeling a B with 11. When we get back to C we “wrap around” and call it 0 again.

A great way to visualize this is to draw a 12-sided figure with all the side lengths the same (a regular dodecagon). Now if we take a C major chord: 0, 4, 7, then transposing it to a major chord 3 semitones up just amounts to adding 3 to every number, i.e. 3, 7, 10. In fact, given any set of notes, we have the operation of transposition $T_n (i_1, i_2, \ldots, i_k)=(i_1+n, i_2+n, \ldots, i_k +n) \ \text{mod} \ 12$ where mod 12 means we add by wrapping around and consider 12=0, 13=1, 14=2, etc. (because they are the same notes!!).
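In code, transposition is a one-liner. This is just a sketch; the function name `transpose` and the tuple representation of a chord are my own choices.

```python
def transpose(chord, n, modulus=12):
    """Apply T_n: add n to every pitch class, wrapping around mod 12."""
    return tuple((p + n) % modulus for p in chord)

c_major = (0, 4, 7)
up_three = transpose(c_major, 3)
up_seven = transpose(c_major, 7)

print(up_three)  # (3, 7, 10)
print(up_seven)  # (7, 11, 2)  -> 14 wraps around to 2
```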

We can also do something called inversion. This just amounts to exactly inverting every interval.

This amounts to negating every single number and then figuring out what this number is mod 12. So the inversion of the C major chord: [0, 4, 7] is [0, -4, -7]=[0, 8, 5] or if we really are considering “chords” then the order doesn’t matter so it is [0,5,8].

But this is just an F minor chord! We call this operation I for “inversion.” It can be visualized as a reflection of the dodecagon as follows (don’t make fun, I whipped this together using Google draw in a minute or so):

It is pretty clear that doing $T_n$ for all choices of n to [0,4,7] gives you all 12 major chords and if you do both I and $T_n$ then you’ll get all 12 minor chords too.
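If "pretty clear" isn't convincing, a brute-force check is short. This sketch represents chords as unordered sets; the function names are assumptions of mine.

```python
def transpose(chord, n, modulus=12):
    """Apply T_n to an unordered chord."""
    return frozenset((p + n) % modulus for p in chord)

def invert(chord, modulus=12):
    """Apply I: negate every pitch class mod 12."""
    return frozenset((-p) % modulus for p in chord)

c_major = frozenset({0, 4, 7})
majors = {transpose(c_major, n) for n in range(12)}
minors = {transpose(invert(c_major), n) for n in range(12)}

print(sorted(invert(c_major)))                         # [0, 5, 8]
print(len(majors), len(minors), len(majors | minors))  # 12 12 24
```

The union having 24 elements confirms that no major triad coincides with a minor triad as a set.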

The operations of transposition and inversion form something called a group. In fact, visualizing with a regular 12-gon immediately tells us that the T/I group is what mathematicians call $D_{12}$, the dihedral group of symmetries of the dodecagon. It has 24 elements.

We call an unordered collection of numbers from 0 to 11 a pitch class set, and we get that $D_{12}$ acts on the set of pitch class sets.

We just proved that the orbit of [0,4,7] under this action consists of exactly the collection of major and minor triads.

Note that none of the triads are sent to themselves, i.e. given a non-trivial symmetry/combination of transpositions and inversions we will always get a distinct new triad. Mathematicians might say this in a fancy way: the set of major and minor triads is a torsor under the T/I-action.

It turns out this is a “generic” phenomenon in the sense that choosing some random pitch class set you are likely (in that the probability is greater than 50%) to have chosen one that has this property.

We could say that it has the property of having no T/I-symmetry. Conversely, we could call a k-chord (read: an unordered chord with k notes in it) T/I-symmetric if there is some non-trivial combination of transpositions and inversions such that the chord is sent to itself.

Now even though these are more rare, it turns out that for any choice of k, there is always a k-chord with this property. These exist for rather silly reasons. For example, [0,1,2, … , k-1] is always an example of such a chord (exercise: why?).

For less trivial examples you could take the whole-tone scale [0,2,4,6,8,10]. If you do $T_2$ then you certainly get the whole tone scale back again. Inversion also fixes this 6-chord.

This tells us that there are only 2 distinct whole tone scales (if you want overkill then the subgroup generated by $T_2$ and I has 12 elements, so the Orbit-Stabilizer Theorem tells us the orbit has 24/12=2 elements).
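The orbit and stabilizer counts can be verified directly. In this sketch I encode a group element as a pair `(n, flip)`, meaning "invert first if `flip` is true, then transpose by n". That encoding and the function names are my own conventions.

```python
def apply(op, chord, modulus=12):
    """Apply op = (n, flip): invert first if flip is True, then transpose by n."""
    n, flip = op
    return frozenset(((-p if flip else p) + n) % modulus for p in chord)

# All 24 elements of the T/I group.
GROUP = [(n, flip) for n in range(12) for flip in (False, True)]

def stabilizer(chord):
    """Group elements that send the chord to itself."""
    return [op for op in GROUP if apply(op, chord) == frozenset(chord)]

def orbit(chord):
    """Distinct chords reachable from the given one."""
    return {apply(op, chord) for op in GROUP}

whole_tone = frozenset({0, 2, 4, 6, 8, 10})
c_major = frozenset({0, 4, 7})

print(len(stabilizer(whole_tone)), len(orbit(whole_tone)))  # 12 2
print(len(stabilizer(c_major)), len(orbit(c_major)))        # 1 24
```

In both cases the stabilizer size times the orbit size is 24, as the Orbit-Stabilizer Theorem requires.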

Here is an interesting question from pure music theory that to my knowledge is still open (although I suspect it is fairly easy to answer and if I spent time trying to figure out the answer in place of writing this post I’d have the answer).

None of this was specific to dividing up an octave into 12 notes. Suppose you invent a tonal system with n notes instead. Then you’d have an action of $D_n$ on the k-chords. Is there a simple closed form formula for the number of k-chords that are T/I-symmetric? More importantly, for a given n, which k gives the greatest number of k-chords with T/I-symmetry?
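I don't have the closed form, but for small n the question can at least be explored by brute force. Here is a sketch that counts T/I-symmetric k-chords in an n-note system; the function names are hypothetical, and the enumeration is exponential in n, so it is only good for experimenting.

```python
from itertools import combinations

def is_ti_symmetric(chord, n):
    """True if some non-trivial transposition or any inversion fixes the chord."""
    chord = frozenset(chord)
    for t in range(n):
        # Non-trivial transposition T_t.
        if t != 0 and frozenset((p + t) % n for p in chord) == chord:
            return True
        # Inversion followed by transposition: p -> t - p.
        if frozenset((t - p) % n for p in chord) == chord:
            return True
    return False

def count_symmetric(n, k):
    """Number of k-chords in an n-note system with T/I-symmetry."""
    return sum(1 for chord in combinations(range(n), k) if is_ti_symmetric(chord, n))

# Tabulate the counts for the usual 12-note system.
for k in range(1, 13):
    print(k, count_symmetric(12, k))
```

Running this for several values of n and looking for a pattern in the table would be one way to guess a formula before trying to prove it.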

I should point out that if you rule out the “silly examples” of T/I-symmetry given by a strictly chromatic scale, then there is actually utility in figuring this out. T/I-symmetry has played a great role in the history of composition.

For example, the augmented triad, the French augmented sixth chord, the diminished seventh, the famous chord from Stravinsky’s Petrushka, the hexatonic scale, the whole tone scale, and the octatonic scale are all examples. So I think this is more than just a novelty problem.

## The Stack of Pitch Class Sets

I thought I might formally work this part out and write it up to submit to a music theory journal, but no one would probably accept it anyway.

So, I’ll sketch the idea now. Back here I talked about stacks as a useful way to generalize what we mean by a “space.”

I know Mazzola wrote a whole book on using topos theory in music, but I’ve never dug into it very deeply. I fully admit this is probably just a special case of something from that book. But it’s always useful to work out special cases.

Recall that a pitch set (or chord) is just converting notes to numbers: 0 is C, 1 is C#, 2 is D, etc. A given collection of pitches can be expressed in a more useful notation when there isn’t a key we’re working in. For example, a C major chord is (047).

A pitch class set is then saying that there are collections of these we want to consider to be the same. For one, our choice of 0 is completely arbitrary. We could have set 0 is A, and we should get the same theory. This amounts to identifying all pitch sets that are the same after translation.

We also want to identify sets that are the same after inversion. In the previous post on this topic, I showed that if we label the vertices of a dodecagon, this amounts to a reflection symmetry.

The reflections together with the translations generate the dihedral group $D_{12}$, so we are secretly letting $D_{12}$ act on the set of all tuples of numbers 0 to 11, where each number only appears once and without loss of generality we can assume they are in increasing order.

Thus a pitch class set is just an equivalence class of a chord under this group action. It is not the direction I want this post to go, but given such a class, there is always a unique representative that is usually called the “prime form” (basically the most “compact” representative starting with 0).
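To make "pick one representative per class" concrete, here is a sketch that picks the lexicographically smallest sorted tuple in the T/I orbit. This is a stand-in of my own: I have not checked that it agrees with the textbook prime form in every case, so treat `canonical_form` as a hypothetical helper rather than the standard algorithm.

```python
def orbit(chord, modulus=12):
    """All images of a chord under T_n and T_n I, as sorted tuples."""
    images = set()
    for n in range(modulus):
        images.add(tuple(sorted((p + n) % modulus for p in chord)))
        images.add(tuple(sorted((n - p) % modulus for p in chord)))
    return images

def canonical_form(chord, modulus=12):
    """Lexicographically smallest sorted image in the T/I orbit (always starts with 0)."""
    return min(orbit(chord, modulus))

print(canonical_form((0, 4, 7)))  # (0, 3, 7)
print(canonical_form((0, 5, 8)))  # (0, 3, 7)
```

The C major and F minor chords land on the same representative, as they should, since they lie in the same class.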

Here’s where we get to the part I never really worked out.

The set of all “chords” should have some sort of useful topology on it. For example, (0123) should be related to (0124), because they are the same chord except for one note.

I don’t think doing something obvious like defining a distance based on the coordinates works. If you try to construct the lattice of open sets by hand based on your intuition, a definition might become more obvious.

Call this space of chords $X$.

Now we have a space with a group action on it. One might want to merely form the quotient space $X/D_{12}$.

This will be 24 to 1 at most points, but it will also forget which chords were fixed by elements of the group. Part of the “theory” in music theory is to remember that information.

This is why I propose making the quotient stack $[X/D_{12}]$. It seems like an overly complicated thing to do, but here’s what you gain.

You now have a “space” whose points are the pitch class sets. If that class contains 24 distinct chords, then the point is an “honest” point with no extra information.

The fiber of the quotient map contains the 24 chords, and you get to each of them by acting by the elements of $D_{12}$ (i.e. it is a torsor under $D_{12}$).

Now consider something like the pitch class set [0,2,4,6,8,10]. The fiber of the quotient map only contains 2 elements: (02468T) and (13579E). The stack will tag these points with $D_6$, which is the subgroup of symmetries which send this chord to itself.
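Ignoring the topology entirely, the underlying data of this picture (one point per class, tagged with how many symmetries fix it) can be tabulated by brute force. This is only a sketch of the bookkeeping, with hypothetical names, and it records the order of each stabilizer rather than the subgroup itself.

```python
from itertools import combinations
from collections import Counter

def images(chord, modulus=12):
    """All 24 images (with repeats) of a chord under T_n and T_n I."""
    result = []
    for n in range(modulus):
        result.append(frozenset((p + n) % modulus for p in chord))
        result.append(frozenset((n - p) % modulus for p in chord))
    return result

def stack_points(k, modulus=12):
    """Map each class's smallest sorted representative to its stabilizer order."""
    points = {}
    for chord in combinations(range(modulus), k):
        imgs = images(chord, modulus)
        representative = min(tuple(sorted(s)) for s in imgs)
        # The number of group elements fixing the chord is the stabilizer order.
        points[representative] = imgs.count(frozenset(chord))
    return points

trichords = stack_points(3)
print(len(trichords))                             # 12
print(sorted(Counter(trichords.values()).items()))  # [(1, 7), (2, 4), (6, 1)]
print(stack_points(6)[(0, 2, 4, 6, 8, 10)])       # 12
```

So among the 12 classes of 3-chords, 7 are "honest" points, 4 carry a stabilizer of order 2, and the augmented triad carries one of order 6.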

Now that I’ve drawn this, I can see that many of you will be skeptical about the simplicity.

Think of it this way:

The bottom thing is the space I’m describing. Each point in the space is tagged with the prime form representative together with the subgroup of symmetries that preserve the class. That’s pretty simple. Yet it remembers all of the complicated music theory of the top thing! If the topology was defined well, then studying this space may even lead to insights on how symmetries of classes are related to each other. Let me know if anyone has seen anything like this before.

## 6 thoughts on “Mathematical Music Theory”

1. This is important stuff since, amongst other things, the harmonic series governs how low in the orchestral range the various chordal functions can be taken, unless a specific effect is intended (i.e. ‘noise’). I’m surprised to hear that the subject still receives a low level of emphasis in education. The denominators in the fractions given form part of the harmonic number series, 1234567….. which is another way of looking at the ratios. Thanks for this, John Morton.

2. Shane M says:

I saw your post on Discover Magazines article and followed here. Nicely written and well thought presentation for somebody like myself for whom this is a fairly new topic. If you are inclined and write more on this it’d be interesting to see thoughts on not just scales, but how the dominant chord progressions in songs fit this type approach. (like the common 1-4-5 or 1 – 5 – 6 – 4 progressions so common in pop songs.).