Surviving Upper Division Math

It’s that time of the year. Classes are starting up. You’re nervous and excited to be taking some of your first “real” math classes called things like “Abstract Algebra” or “Real Analysis” or “Topology.”

It goes well for the first few weeks as the professor reviews some stuff and gets everyone on the same page. You do the homework and seem to be understanding.

Then, all of a sudden, you find yourself sitting there, watching an hour-long proof of a theorem you can’t even remember the statement of, using techniques you’ve never heard of.

You panic. Is this going to be on the test?

We’ve all been there.

I’ve been that teacher, I’m sad to say. In my head it’s perfectly clear that the students are not supposed to regurgitate any of this. The proof is merely there for rigor and exposure to some ideas. It’s also clear in my head which ideas are the key ones, though maybe I forgot to point that out carefully.

It’s a daunting situation for the best students in the class and a downright nightmare for the weaker ones.

Then it gets worse. Once your eyes glaze over that first time, it seems the class gets more and more abstract as the weeks go by, filled with more and more of these insanely long proofs and no examples to illuminate the ideas.

Here’s some advice for surviving these upper division math classes. I’m sure people told me this dozens of times, but I tended to ignore it. I only learned how effective it was when I got to grad school.

Disclaimer: Everyone is different. Do what works for you. This worked for me and may only end up frustrating someone with a different learning style.

Tip Summary: Examples, examples, examples!

I used to think examples were something given in a textbook to help me work the problems. They gave me a model of how to do things.

What I didn’t realize was that examples are how you’re going to remember everything: proofs, theorems, concepts, problems, and so on.

Every time you come to a major theorem, write out the converse, the inverse, switch some quantifiers, remove hypotheses, weaken hypotheses, strengthen conclusions, and whatever else you can think of to mess it up.

When you do this you’ll produce a bunch of propositions that are false! Now come up with examples to show they’re false (and get away from that textbook when you do this!). Maybe some rearrangement of the theorem turns out to be true, and so you can’t figure out a counterexample.

This is good, too! I cannot overstate how much you will drill into your memory by merely trying, unsuccessfully, to find a counterexample to a true statement. You’ll start to understand and see why it’s probably true, which will help you follow along with the proof.

As someone who has taught these classes, I assure you that a huge number of the problems students have on a test would be solved by doing this. Students try to memorize too much, and then when they get to a test, they start to question: was that a “for every” or a “there exists”? Does the theorem go this way or that?

You must make up your own examples, so when you have a question like that, the answer comes immediately. It’s so easy to forget the tiniest little hypothesis under pressure.

It’s astounding the number of times I’ve seen someone get to a point in a proof where it looks like everything is in place, but it’s not. Say you’re at a step where f: X\to Y is a continuous map of topological spaces, and X is connected. You realize you can finish the proof if Y is connected.

You “remember” this is a theorem from the book! You’re done!

Whoops. It turns out that f has to be surjective to make that true.

But now imagine, before the test, you read that theorem and you thought: what’s a counterexample if I remove the surjective hypothesis?

The example you came up with was so easy and took no time at all. It’s f: [0,1] \to \{0\} \cup \{1\} given by f(x) = 1. This example being in your head saves you from bombing that question.

If you just try to memorize the examples in the book or that the professor gives you, that’s just more memorization, and you could run into trouble. By going through the effort of making your own examples, you’ll have the confidence and understanding to do it again in a difficult situation.

A lesser talked about benefit is that having a bunch of examples that you understand gives you something concrete to think about when watching these proofs. So when the epsilons and deltas and neighborhoods of functions and uniform convergence and on and on start to make your eyes glaze over, you can picture the examples you’ve already constructed.

Instead of thinking in abstract generality, you can think: why does that step of the proof work or not work if f_n(x) = x^n?

Lastly, half the problems on undergraduate exams are going to be examples. So, if you already know them, you can spend all your time on the “harder” problems.

Other Tip: Partial credit is riskier than in lower division classes.

There’s this thing that a professor will never tell you, but it’s true: saying wrong things on a test is worse than saying nothing at all.

Let me give another disclaimer. Being wrong and confused is soooo important to the process of learning math. You have to be unafraid to try things out on homework and quizzes and tests and office hours and on your own.

Then you have to learn why you were wrong. When you’re wrong, make more examples!

Knowing a bunch of examples will make it almost impossible for you to say something wrong.

Here’s the thing. There comes a point every semester where the professor has to make a judgment call on how much you understand. If they know what they’re doing, they’ll wait until the final exam.

The student that spews out a bunch of stuff in the hopes of partial credit is likely to say something wrong. When we’re grading and see something wrong (like misremembering that theorem above), a red flag goes off: this student doesn’t understand that concept.

A student that writes nothing on a problem, or only a very small amount that is totally correct, will be seen as superior. This is because it’s okay to not be able to do a problem if you understand that you didn’t know how to do it. That’s a way to demonstrate your understanding. In other words: know what you don’t know.

Now, you shouldn’t be afraid to try, and this is why the first tip is so much more important than this other tip (and will often depend on the instructor/class).

And the best way to avoid using a “theorem” that’s “obviously wrong” is to test any theorem you quote against your arsenal of examples. As you practice this, it will become second-nature and make all of these classes far, far easier.


Mathematical Reason for Uncertainty in Quantum Mechanics

Today I’d like to give a fairly simple account of why Uncertainty Principles exist in quantum mechanics. I thought I already did this post, but I can’t find it now. I often see in movies and sci-fi books (not to mention real-life discussions) a misunderstanding about what uncertainty means. Recall the classic form that says that we cannot know the exact momentum and position of a particle simultaneously.

First, I like this phrasing a bit better than a common alternative: we cannot measure the momentum and position of a particle perfectly and simultaneously. Although I guess this alternative is technically true, it has a different flavor. It makes it sound like we just don’t have good enough measuring equipment. Maybe in a hundred years our tools will get better, and we will be able to make precise enough measurements to do both at once. This is wrong, and completely misunderstands the principle.

Even from a theoretical perspective, we cannot “know.” There are issues with that word as well. In some sense, the uncertainty principle should say that it makes no sense to ask for the momentum and position of a particle (although this again is misleading because we know the precise amount of uncertainty in attempting to do this).

It is like asking: Is blue hard or is blue soft? It doesn’t make sense to ask for the hardness property of a color. To drive the point home, it is even a mathematical impossibility, not just some physical one. You cannot ever write down an equation (a wavefunction for a particle) that has a precise momentum and position at the same time.

Here’s the formalism that lets this fall out easily. To each observable quantity (for example momentum or position) there corresponds a Hermitian operator. If you haven’t seen this before, then don’t worry. The only fact we need is that “knowing” or “measuring” or “being in” a certain observable state corresponds to the wavefunction of the particle being an eigenfunction of this operator.

Suppose we have two operators {A} and {B} corresponding to observable quantities {a} and {b}, and suppose it makes sense to say that a particle in state {\Psi} has definite values of both {a} and {b} simultaneously. This means there are two numbers {\lambda_1} and {\lambda_2} such that {A\Psi = \lambda_1 \Psi} and {B\Psi = \lambda_2 \Psi}. That is the definition of being an eigenfunction.

This means that the commutator applied to {\Psi} has the property

{[A,B]\Psi = AB\Psi - BA\Psi = A(\lambda_2 \Psi) - B(\lambda_1 \Psi) = \lambda_2\lambda_1 \Psi - \lambda_1\lambda_2 \Psi = 0}.

Mathematically speaking, a particle that is in a state for which it makes sense to talk about having two definite observable quantities attached must be described by a wavefunction in the kernel of the commutator. Therefore, it never makes sense to ask for both if the commutator has no kernel. This is our proof. All we must do is compute the commutator of the momentum and position operator and see that it has no kernel (except for the 0 function which doesn’t correspond to a legitimate wavefunction).

You could check wikipedia or something, but the position operator is given by {\widehat{x}f= xf} and the momentum is given by {\widehat{p}f=-i\hbar f_x}.

Thus,

\displaystyle \begin{array}{rcl} [\widehat{x}, \widehat{p}]f & = & -ix\hbar f_x + i\hbar \frac{d}{dx}(xf) \\ & = & -i\hbar (xf_x - f - xf_x) \\ & = & i\hbar f \end{array}

This shows that the commutator is a nonzero constant times the identity operator. It has no kernel (other than {0}), and therefore it makes no sense to ask for a definite position and momentum of a particle simultaneously. There isn’t even some crazy, abstract, purely theoretical construction that can have that property. This also shows that we can get all sorts of other uncertainty principles by checking the commutators of other pairs of operators.
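If you want to double-check this computation, it falls right out of a computer algebra system. Here’s a minimal sketch in Python using sympy (the lambdas below are just the two operators above written out; nothing here is specific to any quantum mechanics package):

```python
from sympy import Function, I, diff, simplify, symbols

x, hbar = symbols('x hbar', real=True)
f = Function('f')(x)  # an arbitrary test wavefunction

x_op = lambda g: x * g                   # position operator: multiply by x
p_op = lambda g: -I * hbar * diff(g, x)  # momentum operator: -i*hbar*(d/dx)

# the commutator [x, p] applied to f
commutator = x_op(p_op(f)) - p_op(x_op(f))
print(simplify(commutator))  # prints I*hbar*f(x), i.e. [x, p] = i*hbar times the identity
```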

How Hard is Adding Integers for a Computer?

In our modern world, we often use high level programming languages (Python, Ruby, etc) without much thought about what is happening. Even if we use a low level language like C, we still probably think of operations like {1+1} yielding {2} or {3-2} yielding {1} as extremely basic. We have no appreciation for how subtle and clever people had to be to first get those types of things to work.

I don’t want to go into detail about how those actually work at the machine level, because that would be a pretty boring post. I do want to do a thought experiment that should bring up some of the issues. Suppose you want to make a new programming language to see how one does such a thing. You think to yourself that it will at least be able to add and subtract integers. How hard could that be?

To play around a little you decide you will first make an integer take up 4 bits of memory. This means when you declare an integer {x = 1}, it gets put into a space of size {4} bits: {0001}. Each bit can be either a {0} or a {1}. Things seem to be going great, because you are comfortable with binary notation. You think that you’ll just take an integer and write its binary representation to memory.

Just for a quick refresher, for 4 bits this means that your integer type can only encode the numbers from {0} to {15}. Recall that you can go back to base {10} by taking each digit and using it as a coefficient on the appropriate power of {2}. Thus {0101} would be {0\cdot 2^3 + 1\cdot 2^2 + 0\cdot 2^1 + 1\cdot 2^0 = 5}.
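Here’s a minimal Python sketch of that conversion (the function name is just for illustration):

```python
# Read a string of bits as an unsigned integer: each digit is a coefficient
# on the appropriate power of 2.
def unsigned_value(bits):
    return sum(int(b) * 2**i for i, b in enumerate(reversed(bits)))

print(unsigned_value("0101"))  # 5
print(unsigned_value("1111"))  # 15, the largest 4-bit unsigned value
```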

Things are going well. You cleverly come up with an adding function merely by manipulating bits with allowable machine level operations coming directly from the operating system. Then you test {15 + 1}. Whoops. You overflow. This is the first problem, but it isn’t the interesting one I want to focus on. Even if you have a well defined integer type and a working addition function, this doesn’t mean that adding two integers will always result in an integer! There is an easy rule you think up to determine when it will happen, and you just throw an error message for now.
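For concreteness, here’s a toy Python sketch of what such an adding function might look like using only bitwise operations, together with the overflow check (this is illustrative, not what any particular machine actually does):

```python
# Add two 4-bit unsigned values using only AND, XOR, and shifts
# (a ripple-carry style loop), flagging overflow when the true sum
# needs a fifth bit.
def add4(a, b):
    while b:                   # repeat until there is no carry left
        carry = (a & b) << 1   # bits that carry into the next position
        a, b = a ^ b, carry    # XOR adds each position without the carries
    return a & 0b1111, a > 0b1111  # (result chopped to 4 bits, overflow flag)

print(add4(15, 1))  # (0, True): 15 + 1 does not fit in 4 bits
print(add4(7, 5))   # (12, False)
```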

Now you move on to subtraction. Oops. You then realize that you have no way of representing negative numbers with your integer type. If you haven’t seen this before, then you should really take a few moments to think about how you would do this. The “most obvious” solution takes some thought, and turns out to be terrible to use. The one that people actually use is quite clever.

The first thing you might try is to just reserve either the first or last bit as a way to indicate whether you are positive or negative. Maybe you’ll take {1xxx} to be negative and {0xxx} to be positive. For example, {0001} is {1} and {1001} is {-1}. First, notice that this cuts the number of positive integers you can represent in half, but there isn’t a way around this. Second, there is a positive and a negative “0” because {1000} is supposedly {-0}. This will almost certainly cause a bigger headache than it solves.
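A quick sketch of this naive scheme makes the two zeros visible (again just illustrative Python):

```python
# Naive "sign bit plus magnitude" reading of a 4-bit string.
def sign_magnitude(bits):
    sign = -1 if bits[0] == "1" else 1
    return sign * int(bits[1:], 2)

print(sign_magnitude("0001"))  # 1
print(sign_magnitude("1001"))  # -1
print(sign_magnitude("0000"))  # 0
print(sign_magnitude("1000"))  # also 0: the dreaded "negative zero"
```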

Lastly, that adding function you wrote is meaningless now. Fortunately, people came up with a much better solution. It is called two’s complement notation. We just weight the most significant bit with a negative. This means that {1010} would convert to {-2^3 + 0\cdot 2^2 +2^1 + 0\cdot 2^0 = -6}. This makes all the numbers that start with 1 negative like our earlier example, except there is only a single 0 now because {1000} is {-8} (our most negative integer we can represent).

Moreover {3-2 = 3 + (-2) = 0011 + 1110 = 0001 = 1} (if we chop off the overflow, yikes). So plain old addition works and gives us subtraction. Except, sometimes it doesn’t. For example, take {0111 + 0001 = 1000}. This says that {7+1= -8}. This is basically the same overflow error from before, because {8} is not an integer that can be represented by our 4 bit type. This just means we have to be careful about some edge cases. It is doable, and in fact, this is essentially what C does (but typically with 32 bit integers).
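Here’s a small Python sketch of both facts: decoding two’s complement by weighting the leading bit negatively, and checking that plain addition with the overflow chopped off really does act as subtraction, right up until it doesn’t:

```python
# Decode a 4-bit two's complement string: the leading bit is weighted by -2**3.
def twos_complement(bits):
    return -int(bits[0]) * 2**(len(bits) - 1) + int(bits[1:], 2)

# Plain 4-bit addition with any overflow chopped off.
def add4_wrap(a, b):
    return (a + b) & 0b1111

print(twos_complement("1110"))  # -2
print(twos_complement("1000"))  # -8, the most negative representable value

print(twos_complement(format(add4_wrap(0b0011, 0b1110), "04b")))  # 1, so 3 - 2 works
print(twos_complement(format(add4_wrap(0b0111, 0b0001), "04b")))  # -8, the 7 + 1 edge case
```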

Just to wrap up, it seems that to make this cobbled-together solution of merely representing and adding integers work, we want our language to have a fixed-width integer type (i.e. we know exactly how big an integer is, so we know where to place that leading 1 indicating a negative, and the type isn’t going to change on us).

Just consider if we tried to prevent overflow issues by making a “big integer” class that is {8} bits instead of {4}. We try to do {3-2} again, and upon overflow we switch the type to a big int, naively padding with zeros. We would then get {3-2= 0001\ 0001}, which is {17}. This means we have to be really careful when dealing with multiple types and recasting between them. It seems a minor miracle that in a language like Ruby you can throw around all sorts of different looking types (without declaring any of them) with plus signs between them and get the answer you would expect.
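Here’s a sketch of that recasting pitfall in Python, along with the sign extension that avoids it (the helper names are made up for illustration):

```python
# Widen a 4-bit value to 8 bits by naively padding with zeros: the leading 1
# that meant "negative" silently turns into an ordinary binary digit.
def widen_naively(bits4):
    return "0000" + bits4        # "1110" -> "00001110", which reads as 14, not -2

# The correct widening is sign extension: copy the sign bit into the new positions.
def sign_extend(bits4):
    return bits4[0] * 4 + bits4  # "1110" -> "11111110", still -2 in 8 bits

def twos_complement(bits):
    return -int(bits[0]) * 2**(len(bits) - 1) + int(bits[1:], 2)

three, minus_two = "0011", "1110"
print(int(widen_naively(three), 2) + int(widen_naively(minus_two), 2))                # 17, not 1
print(twos_complement(sign_extend(three)) + twos_complement(sign_extend(minus_two)))  # 1
```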

That brings us to the main point of this post. It is a really, really good thing we don’t have to worry about these technicalities when writing programs. The whole point of a good high level language is to not be aware of all the tedious machine level computations going on. But this also means that most people have no appreciation for just how complicated something as simple as adding two integers can be (of course, this is all standardized now, so you probably wouldn’t even worry about it if you were writing your own language from scratch).

An Application of p-adic Volume to Minimal Models

Today I’ll sketch a proof, due to Ito, that smooth birational minimal models have all of their Hodge numbers exactly the same. It uses the {p}-adic integration from last time plus one piece of heavy machinery.

First, the piece of heavy machinery: if {X, Y} are finite type schemes over the ring of integers {\mathcal{O}_K} of a number field whose generic fibers are smooth and proper, and if {|X(\mathcal{O}_K/\mathfrak{p})|=|Y(\mathcal{O}_K/\mathfrak{p})|} for all but finitely many prime ideals {\mathfrak{p}}, then the generic fibers {X_\eta} and {Y_\eta} have the same Hodge numbers.

If you’ve seen these types of hypotheses before, then there’s an obvious set of theorems that will probably be used to prove this (Chebotarev + Hodge-Tate decomposition + Weil conjectures). Let’s first restrict our attention to a single prime. Since we will be able to throw out bad primes, suppose we have {X, Y} smooth, proper varieties over {\mathbb{F}_q} of characteristic {p}.

Proposition: If {|X(\mathbb{F}_{q^r})|=|Y(\mathbb{F}_{q^r})|} for all {r}, then {X} and {Y} have the same {\ell}-adic Betti numbers.

This is a basic exercise in using the Weil conjectures. First, {X} and {Y} clearly have the same Zeta functions, because the Zeta function is defined entirely by the number of points over {\mathbb{F}_{q^r}}. But the Zeta function decomposes

\displaystyle Z(X,t)=\frac{P_1(t)\cdots P_{2n-1}(t)}{P_0(t)\cdots P_{2n}(t)}

where {P_i} is the characteristic polynomial of Frobenius acting on {H^i(X_{\overline{\mathbb{F}_q}}, \mathbb{Q}_\ell)}. The Weil conjectures tell us we can recover the {P_i(t)} if we know the Zeta function. But now

\displaystyle \dim H^i(X_{\overline{\mathbb{F}_q}}, \mathbb{Q}_\ell)=\deg P_i(t)=\dim H^i(Y_{\overline{\mathbb{F}_q}}, \mathbb{Q}_\ell)

and hence the Betti numbers are the same.

Now let’s go back and notice the magic of {\ell}-adic cohomology. Suppose {X} and {Y} are as before over the ring of integers of a number field. Our assumption that the number of points over finite fields is the same for all but finitely many primes implies that we can pick a prime of good reduction and get that the {\ell}-adic Betti numbers of the reductions are the same: {b_i(X_p)=b_i(Y_p)}.

One of the main purposes of {\ell}-adic cohomology is that it is “topological.” By smooth, proper base change we get that the {\ell}-adic Betti numbers of the geometric generic fibers are the same

\displaystyle b_i(X_{\overline{\eta}})=b_i(X_p)=b_i(Y_p)=b_i(Y_{\overline{\eta}}).

By the standard characteristic {0} comparison theorem we then get that the singular cohomology is the same when base changing to {\mathbb{C}}, i.e.

\displaystyle \dim H^i(X_\eta\otimes \mathbb{C}, \mathbb{Q})=\dim H^i(Y_\eta \otimes \mathbb{C}, \mathbb{Q}).

Now we use the Chebotarev density theorem. The Galois representations on each cohomology have the same traces of Frobenius for all but finitely many primes by assumption and hence the semisimplifications of these Galois representations are the same everywhere! Lastly, these Galois representations are coming from smooth, proper varieties and hence the representations are Hodge-Tate. You can now read the Hodge numbers off of the Hodge-Tate decomposition of the semisimplification and hence the two generic fibers have the same Hodge numbers.

Alright, in some sense that was the “uninteresting” part, because it just uses a bunch of machines and is a known fact (there’s also a lot of stuff to fill in to the above sketch to finish the argument). Here’s the application of {p}-adic integration.

Suppose {X} and {Y} are smooth birational minimal models over {\mathbb{C}} (for simplicity we’ll assume they are Calabi-Yau; Ito shows how to get around not necessarily having a non-vanishing top form). I’ll just sketch this part as well, since there are some subtleties with making sure you don’t mess up too much in the process. We can “spread out” our varieties to get the setup from the beginning. Namely, there are proper models over some {\mathcal{O}_K} (of course they aren’t smooth anymore) whose generic fibers base change to our original varieties.

By standard birational geometry arguments, there is some big open locus (the complement has codimension at least {2}) where these are isomorphic, and this descends to our models as well. Now we are almost there. We have an étale isomorphism {U\rightarrow V} over all but finitely many primes. If we choose nowhere vanishing top forms on the models, then the restrictions to the fibers are {p}-adic volume forms.

But our standard trick works again here. The isomorphism {U\rightarrow V} pulls back the volume form on {Y} to a volume form on {X} over all but finitely many primes, and hence the two forms differ by a function with {p}-adic absolute value {1} everywhere. Thus the two models have the same volume over all but finitely many primes, and as was pointed out last time they must then have the same number of {\mathbb{F}_{q^r}}-valued points over these primes, since we can read this off from knowing the volume.

The machinery says that we can now conclude the two smooth birational minimal models have the same Hodge numbers. I thought that was a pretty cool and unexpected application of this idea of {p}-adic volume. It is the only one I know of. I’d be interested if anyone knows of any other.

Volumes of p-adic Schemes

I came across this idea a long time ago, but I needed the result that uses it in its proof again, so I was curious about figuring out what in the world is going on. It turns out that you can make “{p}-adic measures” to integrate against on algebraic varieties. This is a pretty cool idea that I never would have guessed possible. I mean, maybe complex varieties or something, but over {p}-adic fields?

Let’s start with a pretty standard setup in {p}-adic geometry. Let {K/\mathbb{Q}_p} be a finite extension and {R} the ring of integers of {K}. Let {\mathbb{F}_q=R/\mathfrak{m}} be the residue field. If this scares you, then just take {K=\mathbb{Q}_p} and {R=\mathbb{Z}_p}.

Now let {X\rightarrow Spec(R)} be a smooth scheme of relative dimension {n}. The picture to have in mind here is some smooth {n}-dimensional variety over a finite field {X_0} as the closed fiber and a smooth characteristic {0} version of this variety, {X_\eta}, as the generic fiber. This scheme is just interpolating between the two.

Now suppose we have an {n}-form {\omega\in H^0(X, \Omega_{X/R}^n)}. We want to say what it means to integrate against this form. Let {|\cdot |_p} be the normalized {p}-adic absolute value on {K}. We want to consider the {p}-adic topology on the set of {R}-valued points {X(R)}. This can be a little weird if you haven’t done it before. It is a totally disconnected, compact space.

The idea for the definition is the exact naive way of converting the definition from a manifold to this setting. Consider some point {s\in X(R)}. Locally in the {p}-adic topology we can find a “disk” containing {s}. This means there is some open {U} about {s} together with a {p}-adic analytic isomorphism {U\rightarrow V\subset R^n} to some open.

In the usual way, we now have a choice of local coordinates {x=(x_i)}. This means we can write {\omega|_U=fdx_1\wedge\cdots \wedge dx_n} where {f} is a {p}-adic analytic function on {V}. Now we just define

\displaystyle \int_U \omega= \int_V |f(x)|_p dx_1 \cdots dx_n.

Now maybe it looks like we’ve converted this into another weird {p}-adic integration problem that we don’t know how to do, but the right-hand side makes sense because {R^n} is a compact topological group, so we can integrate with respect to the normalized Haar measure. Now we’re done, because modulo standard arguments that everything patches together we can define {\int_X \omega} in terms of these local patches (the reason we can patch without bump functions will be clear in a moment, but roughly on overlaps the forms differ by a unit, which has absolute value {1}).

This allows us to define a “volume form” for smooth {p}-adic schemes. We will call an {n}-form a volume form if it is nowhere vanishing (i.e. it trivializes {\Omega^n}). You might be scared that the volume you get by integrating isn’t well-defined. After all, on a real manifold you can just scale a non-vanishing {n}-form to get another one, but the integral will be scaled by that constant.

We’re in luck here, because if {\omega} and {\omega'} are both volume forms, then there is some non-vanishing function {f} such that {\omega=f\omega'}. Since {f} is never {0}, it is invertible, and hence is a unit. This means {|f(x)|_p=1}, and since we can only get other volume forms by scaling by a function with {p}-adic absolute value {1} everywhere, the volume is a well-defined notion under this definition! (A priori, there could be a bunch of “different” forms, though.)

It turns out to actually be a really useful notion as well. If we want to compute the volume of {X/R}, then there is a natural way to do it with our set-up. Consider the reduction mod {\mathfrak{m}} map {\phi: X(R)\rightarrow X(\mathbb{F}_q)}. The fiber over any point is a {p}-adic open set, and they partition {X(R)} into a disjoint union of {|X(\mathbb{F}_q)|} mutually isomorphic sets (recall the reduction map is surjective here by the relevant variant on Hensel’s lemma). Fix one point {x_0\in X(\mathbb{F}_q)}, and define {U:=\phi^{-1}(x_0)}. Then by the above analysis we get

\displaystyle Vol(X)=\int_X \omega=|X(\mathbb{F}_q)|\int_{U}\omega

All we have to do is compute this integral over one open now. By our smoothness hypothesis, we can find a regular system of parameters {x_1, \ldots, x_n\in \mathcal{O}_{X, x_0}}. This is a legitimate choice of coordinates because they define a {p}-adic analytic isomorphism with {\mathfrak{m}^n\subset R^n}.

Now we use the same silly trick as before. Suppose {\omega=fdx_1\wedge \cdots \wedge dx_n}; then since {\omega} is a volume form, {f} can’t vanish and hence {|f(x)|_p=1} on {U}. Since {\mathfrak{m}} has index {q} in {R}, it has Haar measure {1/q}, and thus

\displaystyle \int_{U}\omega=\int_{\mathfrak{m}^n}dx_1\cdots dx_n=\frac{1}{q^n}

This tells us that no matter what {X/R} is, if there is a volume form (which often there isn’t), then the volume

\displaystyle Vol(X)=\frac{|X(\mathbb{F}_q)|}{q^n}

is just the number of {\mathbb{F}_q}-rational points scaled by a factor depending only on the size of the residue field and the dimension of {X}. Next time we’ll talk about the one place I know of where this has been a really useful idea.
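To make the formula concrete, here’s a quick sanity check (not needed for anything later): take {X=\mathbb{A}^n_{\mathbb{Z}_p}} with the volume form {\omega=dx_1\wedge\cdots\wedge dx_n}. Then {X(\mathbb{Z}_p)=\mathbb{Z}_p^n}, the residue field is {\mathbb{F}_p}, and {|X(\mathbb{F}_p)|=p^n}, so the formula gives

\displaystyle Vol(\mathbb{A}^n_{\mathbb{Z}_p})=\frac{|X(\mathbb{F}_p)|}{p^n}=\frac{p^n}{p^n}=1,

which matches computing {\int_{\mathbb{Z}_p^n}|1|_p\, dx_1\cdots dx_n} directly as the normalized Haar measure of {\mathbb{Z}_p^n}.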

Newton Polygons of p-Divisible Groups

I really wanted to move on from this topic, because the theory gets much more interesting when we move to {p}-divisible groups over some larger rings than just algebraically closed fields. Unfortunately, while looking over how Demazure builds the theory in Lectures on {p}-divisible Groups, I realized that it would be a crime to bring you this far and not concretely show you the power of thinking in terms of Newton polygons.

As usual, let’s fix an algebraically closed field of positive characteristic to work over. I was vague last time about the anti-equivalence of categories between {p}-divisible groups and {F}-crystals mostly because I was just going off of memory. When I looked it up, I found out I was slightly wrong. Let’s compute some examples of some slopes.

Recall that {D(\mu_{p^\infty})\simeq W(k)} and {F=p\sigma}. In particular, {F(1)=p\cdot 1}, so in our {F}-crystal theory we get that the normalized {p}-adic valuation of the eigenvalue {p} of {F} is {1}. Recall that we called this the slope (it will become clear why in a moment).

Our other main example was {D(\mathbb{Q}_p/\mathbb{Z}_p)\simeq W(k)} with {F=\sigma}. In this case we have {1} is “the” eigenvalue which has {p}-adic valuation {0}. These slopes totally determine the {F}-crystal up to isomorphism, and the category of {F}-crystals (with slopes in the range {0} to {1}) is anti-equivalent to the category of {p}-divisible groups.

The Dieudonné-Manin decomposition says that we can always decompose {H=D(G)\otimes_W K} as a direct sum of vector spaces indexed by these slopes. For example, if I had a height three {p}-divisible group, {H} would be three dimensional. If it decomposed as {H_0\oplus H_1} where {H_0} was {2}-dimensional (there is a repeated {F}-eigenvalue of slope {0}), then {H_1} would be {1}-dimensional, and I could just read off that my {p}-divisible group must be isogenous to {G\simeq \mu_{p^\infty}\oplus (\mathbb{Q}_p/\mathbb{Z}_p)^2}.

In general, since we have a decomposition {H=H_0\oplus H' \oplus H_1} where {H'} is the part with slopes strictly in {(0,1)} we get a decomposition {G\simeq (\mu_{p^\infty})^{r_1}\oplus G' \oplus (\mathbb{Q}_p/\mathbb{Z}_p)^{r_0}} where {r_j} is the dimension of {H_j} and {G'} does not have any factors of those forms.

This is where the Newton polygon comes in. We can visually arrange this information as follows. Put the slopes of {F} in increasing order {\lambda_1, \ldots, \lambda_r}. Make a polygon in the first quadrant by plotting the points {P_0=(0,0)}, {P_1=(\dim H_{\lambda_1}, \lambda_1 \dim H_{\lambda_1})}, … , {\displaystyle P_j=\left(\sum_{l=1}^j\dim H_{\lambda_l}, \sum_{l=1}^j \lambda_l\dim H_{\lambda_l}\right)}.

This might look confusing, but all it says is that to get from {P_{j}} to {P_{j+1}} you make a line segment of slope {\lambda_{j+1}} and make the segment go to the right for {\dim H_{\lambda_{j+1}}}. This way you visually encode the slope with the actual slope of the segment, and the longer the segment is, the bigger the multiplicity of that eigenvalue.

But this way of encoding the information gives us something even better, because it turns out that all these {P_i} must have integer coordinates (a highly non-obvious fact proved in the book by Demazure listed above). This greatly restricts our possibilities for Dieudonné {F}-crystals. Consider the height {2} case. Here {H} is two dimensional, so we have {2} slopes (possibly the same). The highest you could ever end up is when both slopes take the maximal value {1}, in which case you just get the line segment from {(0,0)} to {(2,2)}. The lowest you could get is when the slopes are both {0}, in which case you get the line segment from {(0,0)} to {(2,0)}.

Every other possibility must be a polygon between these two with integer breakpoints and slopes in increasing order. Draw it (or, if you want to cheat, look at the list below). You will see that there are only two other possibilities: the one that goes {(0,0)} to {(1,0)} to {(2,1)}, which has a slope {0} segment and a slope {1} segment and corresponds to {\mu_{p^\infty}\oplus \mathbb{Q}_p/\mathbb{Z}_p}, and the one that goes {(0,0)} to {(2,1)}, which corresponds to a slope {1/2} with multiplicity {2}. The latter corresponds to {E[p^\infty]} for supersingular elliptic curves. That recovers our list from last time.
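Written out by their vertices (this is just the list above in vertex form), the four height {2} polygons are

\displaystyle \begin{array}{ll} (0,0)\rightarrow (2,0) & (\mathbb{Q}_p/\mathbb{Z}_p)^2 \\ (0,0)\rightarrow (1,0)\rightarrow (2,1) & \mu_{p^\infty}\oplus \mathbb{Q}_p/\mathbb{Z}_p \\ (0,0)\rightarrow (2,1) & E[p^\infty] \text{ for } E \text{ supersingular} \\ (0,0)\rightarrow (2,2) & (\mu_{p^\infty})^2 \end{array}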

We now just have a bit of a game to determine all height {3} {p}-divisible groups up to isogeny (and it turns out in this small height case that this determines them up to isomorphism). You can just draw all the possibilities for Newton polygons as in the height {2} case to see that the only {8} possibilities are {(\mu_{p^\infty})^3}, {(\mu_{p^\infty})^2\oplus \mathbb{Q}_p/\mathbb{Z}_p}, {\mu_{p^\infty}\oplus (\mathbb{Q}_p/\mathbb{Z}_p)^2}, {(\mathbb{Q}_p/\mathbb{Z}_p)^3}, the two mixed cases {\mu_{p^\infty}\oplus E[p^\infty]} and {E[p^\infty]\oplus \mathbb{Q}_p/\mathbb{Z}_p} for {E} supersingular (a slope {1/2} segment together with a slope {1} or slope {0} segment), and then two others: {G_{1/3}}, which corresponds to the thing with a triple eigenvalue of slope {1/3}, and {G_{2/3}}, which corresponds to the thing with a triple eigenvalue of slope {2/3}.
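For bookkeeping, here are the vertex sequences of those eight height {3} polygons (each segment’s slope and length are read off as before):

\displaystyle \begin{array}{ll} (0,0)\rightarrow (3,0) & (\mathbb{Q}_p/\mathbb{Z}_p)^3 \\ (0,0)\rightarrow (2,0)\rightarrow (3,1) & \mu_{p^\infty}\oplus (\mathbb{Q}_p/\mathbb{Z}_p)^2 \\ (0,0)\rightarrow (1,0)\rightarrow (3,1) & E[p^\infty]\oplus \mathbb{Q}_p/\mathbb{Z}_p \\ (0,0)\rightarrow (3,1) & G_{1/3} \\ (0,0)\rightarrow (1,0)\rightarrow (3,2) & (\mu_{p^\infty})^2\oplus \mathbb{Q}_p/\mathbb{Z}_p \\ (0,0)\rightarrow (2,1)\rightarrow (3,2) & \mu_{p^\infty}\oplus E[p^\infty] \\ (0,0)\rightarrow (3,2) & G_{2/3} \\ (0,0)\rightarrow (3,3) & (\mu_{p^\infty})^3 \end{array}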

To finish this post (and hopefully topic!) let’s bring this back to elliptic curves one more time. It turns out that {D(E[p^\infty])\simeq H^1_{crys}(E/W)}. Without reminding you of the technical mumbo-jumbo of crystalline cohomology, let’s think why this might be reasonable. We know {E[p^\infty]} is always height {2}, so {D(E[p^\infty])} is rank {2}. But if we consider that crystalline cohomology should be some sort of {p}-adic cohomology theory that “remembers topological information” (whatever that means), then we would guess that some topological {H^1} of a “torus” should be rank {2} as well.

Moreover, the crystalline cohomology comes with a natural Frobenius action. But if we believe there is some sort of Weil conjecture magic that also applies to crystalline cohomology (I mean, it is a Weil cohomology theory), then we would have to believe that the product of the eigenvalues of this Frobenius equals {p}. Recall in the “classical case” that the characteristic polynomial has the form {x^2-a_px+p}. So there are actually only two possibilities in this case, both slope {1/2} or one of slope {1} and the other of slope {0}. As we’ve noted, these are the two that occur.

In fact, this is a more general phenomenon. When thinking about {p}-divisible groups arising from algebraic varieties, because of these Weil conjecture type considerations, the Newton polygons must actually fit into much narrower regions and sometimes this totally forces the whole thing. For example, the enlarged formal Brauer group of an ordinary K3 surface has height {22}, but the whole Newton polygon is fully determined by having to fit into a certain region and knowing its connected component.

More Classification of p-Divisible Groups

Today we’ll look a little more closely at {A[p^\infty]} for abelian varieties and finish up a different sort of classification that I’ve found more useful than the one presented earlier as triples {(M,F,V)}. For safety we’ll assume {k} is algebraically closed of characteristic {p>0} for the remainder of this post.

First, let’s note that we can explicitly describe all {p}-divisible groups over {k} up to isomorphism (of any dimension!) up to height {2} now. This is basically because height puts a pretty tight constraint on dimension: {ht(G)=\dim(G)+\dim(G^D)}. If we want to make this convention, we’ll say {ht(G)=0} if and only if {G=0}, but I’m not sure it is useful anywhere.

For {ht(G)=1} we have two cases: if {\dim(G)=0}, then its dual must be the unique connected {p}-divisible group of height {1}, namely {\mu_{p^\infty}}, and hence {G=\mathbb{Q}_p/\mathbb{Z}_p}. The other case we just said was {\mu_{p^\infty}}.

For {ht(G)=2} we finally get something a little more interesting, but not too much more. From the height {1} case we know that we can make three such examples: {(\mu_{p^\infty})^{\oplus 2}}, {\mu_{p^\infty}\oplus \mathbb{Q}_p/\mathbb{Z}_p}, and {(\mathbb{Q}_p/\mathbb{Z}_p)^{\oplus 2}}. These are dimensions {2}, {1}, and {0} respectively. The first and last are dual to each other and the middle one is self-dual. Last time we said there was at least one more: {E[p^\infty]} for a supersingular elliptic curve. This was self-dual as well and the unique one-dimensional connected height {2} {p}-divisible group. Now just playing around with the connected-étale decomposition, duals, and numerical constraints we get that this is the full list!

If we could get a bit better feel for the weird supersingular {E[p^\infty]} case, then we would have a really good understanding of all {p}-divisible groups up through height {2} (at least over algebraically closed fields).

There is an invariant called the {a}-number for abelian varieties defined by {a(A)=\dim Hom(\alpha_p, A[p])}. This essentially counts the number of copies of {\alpha_p} sitting inside the truncated {p}-divisible group. Let’s consider the elliptic curve case again. If {E/k} is ordinary, then we know {E[p]} explicitly and hence can argue that {a(E)=0}. For the supersingular case we have that {E[p]} is actually a non-split extension of {\alpha_p} by itself, and we get that {a(E)=1}. This shows that the {a}-number is an invariant that is equivalent to knowing ordinary/supersingular.
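To spell out the ordinary case (a quick check using the standard Dieudonné modules: {D(\alpha_p)=k} with {F=V=0}, {D(\mu_p)=k} with {F=0} and {V} bijective, and {D(\mathbb{Z}/p)=k} with {F} bijective and {V=0}): for ordinary {E} we have {E[p]\simeq \mu_p\oplus \mathbb{Z}/p}, and any map of Dieudonné modules between {D(\alpha_p)} and either summand would have to intertwine a bijective operator with the zero operator, hence is {0}. So {Hom(\alpha_p, E[p])=0} and {a(E)=0}.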

This is a phenomenon that generalizes. For an abelian variety {A/k} we get that {A} is ordinary if and only if {a(A)=0}, in which case the {p}-divisible group is a bunch of copies of {E[p^\infty]} for an ordinary elliptic curve, i.e. {A[p^\infty]\simeq E[p^\infty]^g}. On the other hand, {A} is supersingular if and only if {A[p^\infty]} is isogenous to {E[p^\infty]^g} for {E/k} supersingular (these two facts are pretty easy if you use the {p}-rank as the definition of ordinary and supersingular, because it tells you the étale part and you mess around with duals and numerics again).

Now that we’ve beaten that dead horse beyond recognition, I’ll point out one more type of classification, which is the one that comes up most often for me. In general, there is no redundant information in the triple {(M, F, V)}, but for special classes of {p}-divisible groups (for example the ones I always work with, explained here) all you need to remember is the pair {(M, F)} to recover {G} up to isomorphism.

A pair {(M,F)} of a free, finite rank {W}-module equipped with a {\phi}-linear endomorphism {F} is sometimes called a Cartier module or {F}-crystal. Every Dieudonné module of a {p}-divisible group is an example of one of these. We could also consider {H=M\otimes_W K} where {K=Frac(W)} to get a finite dimensional vector space in characteristic {0} with a {\phi}-linear endomorphism preserving the {W}-lattice {M\subset H}.

Passing to this vector space we would expect to lose some information, and this is usually called the associated {F}-isocrystal. But doing this gives us a beautiful classification theorem, which was originally proved by Dieudonné and Manin. We have that {H} is naturally an {A}-module, where {A=K[T]} is the noncommutative polynomial ring with {T\cdot a=\phi(a)\cdot T}. The classification is to break up {H\simeq \oplus H_\alpha} into a slope decomposition.

These {\alpha} are just rational numbers corresponding to the slopes of the {F} operator. The eigenvalues {\lambda_1, \ldots, \lambda_n} of {F} are not necessarily well-defined, but if we pick the normalized valuation {ord(p)=1}, then the valuations of the eigenvalues are well-defined. Knowing the slopes and multiplicities completely determines {H} up to isomorphism, so we can completely capture the information of {H} in a simple Newton polygon. Note that when {H} is the {F}-isocrystal of some Dieudonné module, then the relation {FV=VF=p} forces all slopes to be between 0 and 1.

Unfortunately, knowing {H} up to isomorphism only determines {M} up to equivalence. This equivalence is easily seen to be the same as an injective map {M\rightarrow M'} whose cokernel is a torsion {W}-module (that way it becomes an isomorphism when tensoring with {K}). But then by the anti-equivalence of categories two {p}-divisible groups (in the special subcategory that allows us to drop the {V}) {G} and {G'} have equivalent Dieudonné modules if and only if there is a surjective map {G' \rightarrow G} whose kernel is finite, i.e. {G} and {G'} are isogenous as {p}-divisible groups.

Despite the annoying subtlety in fully determining {G} up to isomorphism, this is still really good. It says that just knowing the valuation of some eigenvalues of an operator on a finite dimensional characteristic {0} vector space allows us to recover {G} up to isogeny.