Surviving Upper Division Math

It’s that time of the year. Classes are starting up. You’re nervous and excited to be taking some of your first “real” math classes called things like “Abstract Algebra” or “Real Analysis” or “Topology.”

It goes well for the first few weeks as the professor reviews some stuff and gets everyone on the same page. You do the homework and seem to be understanding.

Then, all of a sudden, you find yourself sitting there, watching an hour-long proof of a theorem you can’t even remember the statement of, using techniques you’ve never heard of.

You panic. Is this going to be on the test?

We’ve all been there.

I’ve been that teacher, I’m sad to say, where it’s perfectly clear in my head that the students are not supposed to regurgitate any of this. The proof is merely there for rigor and exposure to some ideas. It’s clear in my head which ideas are the key ones, though I maybe forgot to point it out carefully.

It’s a daunting situation for the best students in the class and a downright nightmare for the weaker ones.

Then it gets worse. Once your eyes glaze over that first time, it seems the class gets more and more abstract as the weeks go by, filled with more and more of these insanely long proofs and no examples to illuminate the ideas.

Here’s some advice for surviving these upper division math classes. I’m sure people told me this dozens of times, but I tended to ignore it. I only learned how effective it was when I got to grad school.

Disclaimer: Everyone is different. Do what works for you. This worked for me and may only end up frustrating someone with a different learning style.

Tip Summary: Examples, examples, examples!

I used to think examples were something given in a textbook to help me work the problems. They gave me a model of how to do things.

What I didn’t realize was that examples are how you’re going to remember everything: proofs, theorems, concepts, problems, and so on.

Every time you come to a major theorem, write out the converse, inverse, switch some quantifiers, remove hypotheses, weaken hypotheses, strengthen conclusions, and whatever you can think of to mess it up.

When you do this you’ll produce a bunch of propositions that are false! Now come up with examples to show they’re false (and get away from that textbook when you do this!). Maybe some rearrangement of the theorem turns out to be true, and so you can’t figure out a counterexample.

This is good, too! I cannot overstate how much you will drill into your memory by merely trying unsuccessfully to find a counterexample to a true statement. You’ll start to understand and see why it’s probably true, which will help you follow along with the proof.

As someone who has taught these classes, I assure you that a huge number of the problems students have on a test would be solved by doing this. Students try to memorize too much, and then when they get to a test, they start to question: was that a “for every” or “there exists?” Does the theorem go this way or that?

You must make up your own examples, so when you have a question like that, the answer comes immediately. It’s so easy to forget the tiniest little hypothesis under pressure.

It’s astounding the number of times I’ve seen someone get to a point in a proof where it looks like everything is in place, but it’s not. Say you’re at a step where f: X\to Y is a continuous map of topological spaces, and X is connected. You realize you can finish the proof if Y is connected.

You “remember” this is a theorem from the book! You’re done!

Woops. It turns out that f has to be surjective to make that true.

But now imagine, before the test, you read that theorem and you thought: what’s a counterexample if I remove the surjective hypothesis?

The example you came up with was so easy and took no time at all. It’s f: [0,1] \to \{0\} \cup \{1\} given by f(x) = 1. This example being in your head saves you from bombing that question.

If you just try to memorize the examples in the book or that the professor gives you, that’s just more memorization, and you could run into trouble. By going through the effort of making your own examples, you’ll have the confidence and understanding to do it again in a difficult situation.

A less-talked-about benefit is that having a bunch of examples that you understand gives you something concrete to think about when watching these proofs. So when the epsilons and deltas and neighborhoods of functions and uniform convergence and on and on start to make your eyes glaze over, you can picture the examples you’ve already constructed.

Instead of thinking in abstract generality, you can think: why does that step of the proof work or not work if f_n(x) = x^n?
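To make that concrete, here is a minimal sketch of what poking at f_n(x) = x^n might look like. It assumes we restrict to [0,1), where the pointwise limit is 0; the helper name witness is made up for illustration. It shows the values dying out at a fixed point while, for every n, some point still has f_n equal to 1/2, which is exactly why the convergence is not uniform.

```python
# Sketch: f_n(x) = x**n on [0, 1) converges to 0 pointwise but not uniformly.

def f(n, x):
    return x ** n

def witness(n):
    """A point in [0, 1) where f_n equals 1/2 (hypothetical helper)."""
    return 0.5 ** (1.0 / n)

# Pointwise: at the fixed point x = 0.9 the values shrink toward 0.
pointwise_values = [f(n, 0.9) for n in (1, 10, 100, 1000)]

# Not uniform: for every n there is still a point where f_n is 1/2.
stubborn_values = [f(n, witness(n)) for n in (1, 10, 100, 1000)]

print(pointwise_values)
print(stubborn_values)
```

Any step of a proof that needs sup |f_n| to go to 0 has to fail on this example, and now you can see where.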

Lastly, half the problems on undergraduate exams are going to be examples. So, if you already know them, you can spend all your time on the “harder” problems.

Other Tip: Partial credit is riskier than in lower division classes.

There’s this thing that a professor will never tell you, but it’s true: saying wrong things on a test is worse than saying nothing at all.

Let me add another disclaimer. Being wrong and confused is soooo important to the process of learning math. You have to be unafraid to try things out on homework and quizzes and tests and office hours and on your own.

Then you have to learn why you were wrong. When you’re wrong, make more examples!

Knowing a bunch of examples will make it almost impossible for you to say something wrong.

Here’s the thing. There comes a point every semester where the professor has to make a judgment call on how much you understand. If they know what they’re doing, they’ll wait until the final exam.

The student that spews out a bunch of stuff in the hopes of partial credit is likely to say something wrong. When we’re grading and see something wrong (like misremembering that theorem above), a red flag goes off: this student doesn’t understand that concept.

A student that writes nothing on a problem or only a very small amount that is totally correct will be seen as superior. This is because it’s okay to not be able to do a problem if you understand you didn’t know how to do it. That’s a way to demonstrate your understanding. In other words: know what you don’t know.

Now, you shouldn’t be afraid to try, and this is why the first tip is so much more important than this other tip (and will often depend on the instructor/class).

And the best way to avoid using a “theorem” that’s “obviously wrong” is to test any theorem you quote against your arsenal of examples. As you practice this, it will become second-nature and make all of these classes far, far easier.


PDE’s and Frobenius Theorem

I’ve started many blog posts on algebra/algebraic geometry, but they won’t get finished and posted for a little while. I’ve been studying for a test I have to take in a few weeks in differential geometry-esque things. So I’ll do a few posts on things that I think are usually considered pretty easy and obvious to most people, but are just things I never sat down and figured out. Hopefully this set of posts will help others who are confused as I recently was.

My first topic is about the Frobenius Theorem. I’ve posted about it before. Here’s the general idea of it: If {M} is a smooth manifold and {D} is a smooth distribution on it, then {D} is involutive if and only if it is completely integrable (i.e. there are local flat charts for the distribution).

What does this have to do with being able to solve partial differential equations? I’ve always heard that it does, but other than the symbol {\displaystyle\frac{\partial}{\partial x}} appearing in the definition of a distribution or of the flat chart, I’ve never figured it out.

Let’s go through this with some examples. Are there any non-constant solutions {f\in C^\infty (\mathbb{R}^3)} to the system of equations: {\displaystyle \frac{\partial f}{\partial x}-y\frac{\partial f}{\partial z}=0} and {\displaystyle \frac{\partial f}{\partial y}+x\frac{\partial f}{\partial z}=0}?

Until a few days ago, I would have never thought we could use the Frobenius Theorem to do this. Suppose {f} were such a solution. Define the vector fields {\displaystyle X=\frac{\partial}{\partial x}-y\frac{\partial}{\partial z}} and {\displaystyle Y=\frac{\partial}{\partial y}+x\frac{\partial}{\partial z}} and define the distribution {D_p=\text{span} \{X_p, Y_p\}}.

Choose a regular value of {f}, say {C}, lying in the image of {f} (one exists by Sard’s Theorem, since {f} is non-constant). Then {f=C} is a 2-dimensional submanifold {M\subset \mathbb{R}^3}, and since {f} is a defining function, {T_pM=\ker(Df_p)}. But since {f} satisfies, by assumption, {X(f)=0} and {Y(f)=0}, the linearly independent vectors {X_p} and {Y_p} both lie in the 2-dimensional space {T_pM}, so we have {T_pM=\text{span} \{X_p, Y_p\}}. I.e. {M} is an integral manifold for the distribution {D}. Thus {D} must be involutive.

Just check now. {\displaystyle [X,Y]=2\frac{\partial}{\partial z}}. In particular, at the origin {\displaystyle X_0=\frac{\partial}{\partial x}} and {\displaystyle Y_0=\frac{\partial}{\partial y}}, so {[X,Y]_0} is not in their span, and hence {D} is not involutive. Thus no such {f} exists. This didn’t even use Frobenius.
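For anyone who wants to see where the bracket comes from, here is a sketch of the computation, applying both vector fields to an arbitrary smooth test function {g} (the name {g} and the subscript notation for partial derivatives are introduced just for this check):

```latex
\begin{align*}
X(Yg) &= \left(\frac{\partial}{\partial x}-y\frac{\partial}{\partial z}\right)\left(g_y + x g_z\right)
       = g_{xy} + g_z + x g_{xz} - y g_{yz} - xy\, g_{zz},\\
Y(Xg) &= \left(\frac{\partial}{\partial y}+x\frac{\partial}{\partial z}\right)\left(g_x - y g_z\right)
       = g_{xy} - g_z - y g_{yz} + x g_{xz} - xy\, g_{zz},\\
[X,Y]g &= X(Yg) - Y(Xg) = 2 g_z .
\end{align*}
```

All the second-order terms cancel, as they must for a bracket of vector fields, and only the {2g_z} survives.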

Now let’s spice up the language and difficulty. Is it possible to find a function {z=f(x,y)}, {C^\infty} in a neighborhood of {(0,0)}, such that {f(0,0)=0} and {\displaystyle df=(ye^{-(x+y)}-f)dx+(xe^{-(x+y)}-f)dy}? Alright, the {d} phrasing is just asking whether there is a local solution to the system {\displaystyle \frac{\partial f}{\partial x}=ye^{-(x+y)}-f} and {\displaystyle \frac{\partial f}{\partial y}=xe^{-(x+y)}-f}. Uh oh. The above method fails us now since it isn’t homogeneous.

Alright, so let’s extrapolate a little. We have a system of the form {\displaystyle \frac{\partial f}{\partial x}=\alpha(x,y,f)} and {\displaystyle \frac{\partial f}{\partial y}=\beta(x,y,f)}. The claim is that a necessary and sufficient condition to have a local solution to this system is {\displaystyle \frac{\partial \alpha}{\partial y}+\beta\frac{\partial \alpha}{\partial z}=\frac{\partial \beta}{\partial x}+\alpha \frac{\partial \beta}{\partial z}}, where {z} denotes the third argument of {\alpha} and {\beta}.
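The necessity half can be sketched in two lines: if {f} is a smooth solution, the mixed partials of {f} must agree, and the chain rule turns that into the stated condition. This is only a sketch of that one direction, with {z} standing for the third argument of {\alpha} and {\beta}:

```latex
\begin{align*}
\frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right)
  &= \frac{\partial}{\partial y}\,\alpha(x,y,f(x,y))
   = \frac{\partial \alpha}{\partial y} + \frac{\partial \alpha}{\partial z}\frac{\partial f}{\partial y}
   = \frac{\partial \alpha}{\partial y} + \beta\frac{\partial \alpha}{\partial z},\\
\frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right)
  &= \frac{\partial}{\partial x}\,\beta(x,y,f(x,y))
   = \frac{\partial \beta}{\partial x} + \frac{\partial \beta}{\partial z}\frac{\partial f}{\partial x}
   = \frac{\partial \beta}{\partial x} + \alpha\frac{\partial \beta}{\partial z}.
\end{align*}
```

Setting the two right-hand sides equal gives the condition along the graph of {f}.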

I won’t go through the details of the proof, but the main idea is not bad. Define the distribution spanned by {\displaystyle X=\frac{\partial}{\partial x}+\alpha\frac{\partial}{\partial z}} and {\displaystyle Y=\frac{\partial}{\partial y}+\beta\frac{\partial}{\partial z}}.

Then use that assumption to see that {[X,Y]=0} and hence the distribution is involutive and hence there is an integral manifold for the distribution by the Frobenius Theorem. If {g} is a local defining function to that integral manifold, then we can hit that with the Implicit Function Theorem and get that {z=f(x,y)} (the implicit function) is a local solution.
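Here is a sketch of why the assumption forces the bracket to vanish. The {\partial/\partial x} and {\partial/\partial y} coefficients of {X} and {Y} are constant, so only the {\partial/\partial z} component of the bracket can survive:

```latex
\begin{align*}
[X,Y] &= \bigl(X(\beta) - Y(\alpha)\bigr)\frac{\partial}{\partial z}\\
      &= \left(\frac{\partial \beta}{\partial x} + \alpha\frac{\partial \beta}{\partial z}
         - \frac{\partial \alpha}{\partial y} - \beta\frac{\partial \alpha}{\partial z}\right)\frac{\partial}{\partial z}.
\end{align*}
```

The coefficient is exactly the right side of the integrability condition minus the left side, so the bracket is zero precisely when the condition holds.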

If we go back to that original problem, we can easily check that the sufficient condition is met and hence that local solution exists.
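If you would rather not trust the hand computation, here is a rough numerical sanity check of that condition. It is only a sketch: it uses central finite differences at a few arbitrarily chosen sample points, and the function names (alpha, beta, partial, integrability_gap) are made up for this illustration. It is evidence, not a proof.

```python
import math

def alpha(x, y, z):
    return y * math.exp(-(x + y)) - z

def beta(x, y, z):
    return x * math.exp(-(x + y)) - z

def partial(func, index, point, step=1e-5):
    """Central finite-difference estimate of the partial derivative of func
    with respect to coordinate number index (0 = x, 1 = y, 2 = z)."""
    forward = list(point)
    backward = list(point)
    forward[index] += step
    backward[index] -= step
    return (func(*forward) - func(*backward)) / (2 * step)

def integrability_gap(point):
    """Left side minus right side of the integrability condition at point."""
    lhs = partial(alpha, 1, point) + beta(*point) * partial(alpha, 2, point)
    rhs = partial(beta, 0, point) + alpha(*point) * partial(beta, 2, point)
    return lhs - rhs

# Arbitrary sample points (x, y, z); any others would do.
sample_points = [(0.0, 0.0, 0.0), (0.3, -0.2, 0.5), (1.0, 2.0, -1.5)]
max_gap = max(abs(integrability_gap(p)) for p in sample_points)
print(max_gap)
```

The printed gap should be tiny, on the order of the finite-difference error.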

I had one other neat little problem, but it doesn’t really fit in here other than the fact that solutions to PDEs are involved.

Harmonic Growth as Related to Complex Analytic Growth

Let’s change gears a bit. This post will be on something I haven’t talked about in probably a year…that’s right, analysis. Since the last post was short, I’ll do another quick one. The past few days have had varying efforts to solve a problem of the form: if f is an analytic function and we know that |\text{Re}\, f(z)|\leq M|z|^k (for large |z| say), do we actually know something like |f(z)|\leq M|z|^k?

Let’s rephrase this a bit. Essentially we’re talking about growth. It would be sufficient to show something along the lines of: if u is harmonic, and grows at some rate, then v the harmonic conjugate also must grow at a related rate. But all of this growth talk is vague. What does this even mean?

One measure of growth would be |\nabla u|=\sqrt{\left(\frac{\partial u}{\partial x}\right)^2+\left(\frac{\partial u}{\partial y}\right)^2}. In fact, the gradient points in the direction of greatest change, so this is in some sense an upper bound on the growth. Another is f'(z). Does this help? Well, first off, if this is our notion of growth, then by the Cauchy-Riemann equations, we immediately get that the harmonic conjugate grows exactly the same: |\nabla u|=|\nabla v|. Let’s check how useful this is in recovering growth of f.

Since I haven’t talked about complex analysis much, note that the derivative operator for complex functions is \frac{1}{2}\left(\frac{\partial}{\partial x}-i\frac{\partial}{\partial y}\right).

Now f'(z)=\frac{1}{2}\left(\frac{\partial(u+iv)}{\partial x}-i\frac{\partial(u+iv)}{\partial y}\right)
= \frac{1}{2}\left(\frac{\partial u}{\partial x}+\frac{\partial v}{\partial y}+i\left(\frac{\partial v}{\partial x}-\frac{\partial u}{\partial y}\right)\right)
= \frac{\partial u}{\partial x}-i\frac{\partial u}{\partial y} by Cauchy-Riemann

Thus |f'(z)|=|\nabla u|.
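A quick numerical spot check of this identity, as a sketch: the polynomial f(z) = z^3 + 2z and the sample points are arbitrary choices of mine, and the gradient of u = Re f is estimated by central finite differences.

```python
import math

def f(z):
    return z ** 3 + 2 * z

def f_prime(z):
    return 3 * z ** 2 + 2

def u(x, y):
    """Real part of f at the point x + iy."""
    return f(complex(x, y)).real

def grad_u_norm(x, y, step=1e-5):
    """Finite-difference estimate of the length of the gradient of u."""
    ux = (u(x + step, y) - u(x - step, y)) / (2 * step)
    uy = (u(x, y + step) - u(x, y - step)) / (2 * step)
    return math.hypot(ux, uy)

sample_points = [(0.5, -1.0), (1.2, 0.7), (-2.0, 0.3)]
gaps = [abs(grad_u_norm(x, y) - abs(f_prime(complex(x, y))))
        for x, y in sample_points]
print(gaps)
```

Each gap should be at the level of the finite-difference error, consistent with |f'(z)|=|\nabla u|.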

Did this solve our original problem? Yes, since if we work out the partial derivatives we get that if |u|\leq M(x^2+y^2)^{k/2}, then |\nabla u(z)|\leq Mk|z|^{k-1}.

In particular, |f'(z)|\leq Mk|z|^{k-1}. So we wanted to show that f was a polynomial of degree at most k, and we can now use Cauchy estimates to get that.
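The Cauchy estimate step can be sketched as follows, under the assumptions that f is entire, that k is a positive integer, and that the bound on f' holds on every circle |z|=R with R large. Here b_n denotes the Taylor coefficients of f' at 0, a name introduced only for this sketch:

```latex
\begin{align*}
f'(z) &= \sum_{n=0}^\infty b_n z^n,\\
|b_n| &\leq \frac{\max_{|z|=R}|f'(z)|}{R^n}
       \leq \frac{Mk\,R^{k-1}}{R^n}
       = Mk\,R^{k-1-n} \longrightarrow 0
       \quad (R\to\infty,\ n > k-1).
\end{align*}
```

So b_n=0 for all n\geq k, which makes f' a polynomial of degree at most k-1 and f a polynomial of degree at most k.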

If any of what I just wrote is true, then there is some really obvious way of doing it that isn’t messy like this at all. I mean, the result is |f'(z)|=|\nabla u|. Is this for real? Am I horribly mistaken? I can’t find this in any book…

Zeros of Analytic Functions

A strange property of analytic functions is that the zeros are isolated. I don’t remember the proof I originally learned of this fact, but today I saw a really interesting topological way to do it. It makes sense now.

More precise formulation: If \Omega\subset\mathbb{C} is a connected open set, then \{z: f(z)=0\} consists of isolated points if f is analytic on \Omega. (Oops, I started writing this up and realized that I need to trivially throw out the case where f\equiv 0.)

Proof: Let U_1=\{a\in\Omega : \exists\delta>0, \ f(z)\neq 0 \text{ on } 0<|z-a|<\delta\} and let U_2=\{a\in\Omega : \exists\delta>0, \ f(z)\equiv 0 \text{ on } 0<|z-a|<\delta \}. Reformulating the setup, we see that a\in U_1 means: if f has a zero at a, it is isolated, since f is nonzero on a punctured disk (meaning the zero must be the punctured part). Also U_2 is just the region where f has non-isolated zeros.

It is straightforward to check that both U_1 and U_2 are open (just choose \delta’s sufficiently small to stay inside the declared sets). Also we have that U_1\cap U_2=\emptyset and I now claim \Omega=U_1\cup U_2.

This seems obvious, but should be pinned down in some sort of argument. Let z_0\in\Omega. We claim that there is a punctured disk about z_0 such that either f\equiv 0 on the disk or f\neq 0 anywhere on the disk. By analyticity, we have a power series convergent on some radius r>0 about z_0, i.e. f(z)=\sum_{n=0}^\infty a_n(z-z_0)^n on |z-z_0|<r.

Suppose a_k is the first nonzero coefficient (if f is not identically zero near z_0, this must exist). Then f(z)=\sum_{n=k}^\infty a_n(z-z_0)^n=(z-z_0)^k g(z), where g(z)=\sum_{n=0}^\infty a_{n+k}(z-z_0)^n. The series for g converges on |z-z_0|<r, so g is continuous there with g(z_0)=a_k, and we can choose 0<\delta<r small enough so that |g(z)-g(z_0)|=|g(z)-a_k|<\frac{|a_k|}{2} on |z-z_0|<\delta. This shows that g(z)\neq 0 there, else we’d have |a_k|<\frac{|a_k|}{2}, and hence f(z)=(z-z_0)^k g(z)\neq 0 on 0<|z-z_0|<\delta. So either there is a punctured disk on which f is non-zero, or f has no first non-zero coefficient, making it zero everywhere on that first disk |z-z_0|<r, proving the claim.

The properties U_1\cap U_2=\emptyset and \Omega=U_1\cup U_2 (along with both sets being open) combine to give that either U_1=\emptyset or U_2=\emptyset by the connectedness of \Omega. This simply means that all the zeros are isolated since we ruled out the alternative of being identically zero.

This goes to show how remarkably different analytic on \mathbb{C} is from continuous on \mathbb{R}, or even from infinitely differentiable on \mathbb{R}. Bump functions play a crucial role in many areas of analysis, and they are smooth functions with compact support, meaning that outside of a bounded set they are zero. An entire class of important functions violates this property that analytic functions are guaranteed to have.
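For concreteness, here is a minimal sketch of the standard bump function on the real line, \exp(-1/(1-x^2)) on (-1,1) and 0 elsewhere. The function name bump and the sample points are my own choices. It is smooth, not identically zero, and yet every point outside [-1,1] is a non-isolated zero.

```python
import math

def bump(x):
    """Standard smooth bump: positive on (-1, 1), exactly zero elsewhere."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))

# Strictly positive inside the support...
inside = [bump(x) for x in (-0.5, 0.0, 0.5)]
# ...and identically zero on whole intervals outside it.
outside = [bump(x) for x in (1.0, 1.5, 2.0, 10.0)]

print(inside)
print(outside)
```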

Banach Algebra Homomorphism

I’m in no mood to do something challenging after this last ditch effort to learn analysis before my prelim, so I’ll do something nice (functional analytic like I promised) that never ceases to amaze me.

Theorem: If \phi is a complex homomorphism on a Banach algebra A, then the norm of \phi, as a linear functional, is at most 1.

Recall that a Banach algebra is just a Banach space (complete normed linear space) with a multiplication that satisfies \|xy\|\leq\|x\|\|y\|, associativity, distributivity, and (\alpha x)y=x(\alpha y)=\alpha (xy) for any scalar \alpha.

Complex homomorphisms are just linear functionals that preserve multiplication \phi(\alpha x+\beta y)=\alpha\phi(x)+\beta\phi(y) and \phi(xy)=\phi(x)\phi(y).

Assume not, i.e. there exists x_0\in A such that |\phi(x_0)|>\|x_0\|. To simplify notation, let \displaystyle \lambda=\phi(x_0) and let \displaystyle x=\frac{x_0}{\lambda}. Then \displaystyle \|x\|=\frac{\|x_0\|}{|\lambda|}<1 and \displaystyle\phi(x)=\phi\left(\frac{x_0}{\lambda}\right)=1.

Now \|x^n\|\leq\|x\|^n so s_n=-x-x^2-\cdots-x^n \in A form a Cauchy sequence. Now A is a Banach space and hence complete, so there exists y\in A such that \|y-s_n\|\to 0. But now factor to see that x+s_n=xs_{n-1} and take the limit to get x+y=xy. Now take the homomorphism of both sides, and we have a contradiction \phi(x)+\phi(y)=\phi(x)\phi(y) (in particular 1+\phi(y)=\phi(y)).
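The algebra in that step is easy to sanity check in the simplest Banach algebra there is, the complex numbers with absolute value as the norm. This is only a sketch with an arbitrarily chosen x of norm 1/2; the names partial_sum, factor_gap, limit_gap and identity_gap are made up for the illustration.

```python
def partial_sum(x, n):
    """s_n = -x - x**2 - ... - x**n."""
    return -sum(x ** j for j in range(1, n + 1))

x = 0.3 + 0.4j  # |x| = 0.5 < 1, so the partial sums form a Cauchy sequence

s_prev = partial_sum(x, 59)
s_curr = partial_sum(x, 60)

# The factoring step: x + s_n = x * s_{n-1}.
factor_gap = abs((x + s_curr) - x * s_prev)

# The limit y of the s_n is the geometric series value -x / (1 - x).
y = -x / (1 - x)
limit_gap = abs(s_curr - y)

# The limiting identity: x + y = x * y.
identity_gap = abs((x + y) - x * y)

print(factor_gap, limit_gap, identity_gap)
```

Of course no contradiction shows up here: on the complex numbers the identity map is a complex homomorphism and it sends this x to x, not to 1. The contradiction only appears when some \phi(x)=1 is forced while \|x\|<1.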

So some reasons why this may not be all that shocking: we require these to be complex, and complex things tend to work out nicer than real. Also, these are pretty stringent conditions on what constitutes a Banach algebra and what constitutes a homomorphism. We should be able to get some nice structure with all the tools available. It isn’t like we got a lot. Really we’re just saying that these things are bounded and hence continuous, which isn’t all that surprising.

OK. I’ll stop downplaying it. It does surprise me.

Applying Covering Theorems

I’ve searched far and wide to not do one of the standard applications that are in all grad analysis texts (yes I’m referring to the Hardy-Littlewood maximal function being weakly bounded). We are getting into the parts of analysis that I despise (it will all be over in 4 days…I hope *crosses fingers*).

Claim: If f: [0,1]\to\mathbb{R} is an increasing function and we define \displaystyle g(x)=\limsup_{h\to 0}\frac{f(x+h)-f(x)}{h} (so almost a Dini derivative), then the outer measure m^*\{x: g(x)>1\}\leq f(1)-f(0). I know, it is rather similar to the maximal function theorem, but it is hard to find something that utilizes these tools that doesn’t have the same flavor.

Proof: Call S=\{x: g(x)>1\}. Then for any x\in S we can find an h>0 as small as we like so that f(x+h)-f(x)>h (if some h were smallest, then the limsup wouldn’t be >1). Now we just cover S with these intervals [x, x+h]. It is a Vitali covering, since we’ve already checked that the diameter of the intervals (the h’s) can be made arbitrarily small.

Let \epsilon>0. Now by our Theorem we can choose a finite disjoint collection of them, say \{[x_n, x_n+h_n]\}_1^N, such that m(\cup [x_n, x_n+h_n])>m^*(S)-\epsilon (just the definition of how outer measure relates to measure).

Now, since f is increasing and the intervals are disjoint: f(1)-f(0)\geq\sum_{n=1}^N(f(x_n+h_n)-f(x_n))

>\sum_{n=1}^N h_n

= m(\cup [x_n, x_n+h_n])

> m^*(S) - \epsilon.

Since \epsilon was arbitrary m^*(S)\leq f(1)-f(0).
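As a sanity check on the statement, here is a worked instance with a choice of f that is mine, not part of the claim: f(x)=x^2 on [0,1]. Since this f is differentiable, the limsup is just the derivative (ignoring what happens at the endpoint x=1):

```latex
\begin{align*}
f(x) &= x^2, \qquad g(x) = f'(x) = 2x,\\
S &= \{x : 2x > 1\} = \left(\tfrac{1}{2}, 1\right),\\
m^*(S) &= \tfrac{1}{2} \;\leq\; 1 = f(1) - f(0).
\end{align*}
```

So the bound holds with room to spare here.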

Some notes: Often times you don’t need the full Vitali Covering Theorem (it may have been overkill here even, but heck I wanted to use it). Also, the setup for these things is almost always in this standard form. If you see m(\{x: F(x)>blah\})\leq blahblah, then you are bound to have to use one of these Lemmas or Theorems.

I never really understood the huge fuss over maximal functions, but here is the definition: Given f\in L^1, we define \displaystyle Mf(x)=\sup_{0<r<\infty}\frac{1}{m(B(x,r))}\int_{B(x,r)}|f(y)|dm(y). We get the inequality that m\{x: Mf(x)>\lambda\}\leq \lambda^{-1}3^k\|f\|_1 where we are working in \mathbb{R}^k. So this basically says that M: L^1\to \text{weak} L^1. Also note that when you see things like 3^k or 5^k we probably won’t need the full theorem as those constants appear in the Lemmas. While we’re at it, we only get weak L^1, but if we are working in L^p for p>1, then M: L^p\to L^p (we use a much different technique from the Fourier Transform, though).
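To see why weak L^1 is the best one can hope for, here is a sketch of the simplest example I know, with k=1 and f the indicator function of [0,1] (this example is my addition, not something from the texts above). For x>1 the supremum is attained at r=x, the smallest radius that captures all of [0,1]:

```latex
\begin{align*}
f &= \chi_{[0,1]}, \qquad \|f\|_1 = 1,\\
Mf(x) &= \sup_{r>0}\frac{m\bigl([x-r,x+r]\cap[0,1]\bigr)}{2r} = \frac{1}{2x} \quad (x>1),
  \qquad Mf(x) = \frac{1}{2(1-x)} \quad (x<0),\\
\int_{\mathbb{R}} Mf\,dm &= \infty,\\
m\{x : Mf(x)>\lambda\} &= m\left(1-\tfrac{1}{2\lambda},\ \tfrac{1}{2\lambda}\right) = \frac{1}{\lambda}-1
  \;\leq\; \frac{3}{\lambda}\|f\|_1 \quad \left(0<\lambda<\tfrac{1}{2}\right).
\end{align*}
```

So Mf decays like 1/|x| and fails to be integrable, while the weak-type inequality still holds comfortably.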

Ack! Moving on next time. Maybe functional analysis type stuff?

Edit: Does anyone else have issues LaTeXing square brackets?! They seem to work when they are in the middle of stuff, but never parse when I just want something like [0,1].

Covering Theorem (we use past Lemmas)

A brief break occurred while I moved 2700 miles away. The important thing is I’m back, and we’re going to prove a big one today. First let’s define a Vitali covering. A set is Vitali covered by the collection of sets \mathcal{V} if for any \epsilon>0 and any x in the set, there exists a set V\in\mathcal{V} such that x\in V and diam(V)<\epsilon. So note that this is sort of stringent in that we always have to find one of the covering sets to be of arbitrarily small diameter at any point.

Vitali’s Covering Theorem (easy version): If \mathcal{I} is a sequence of intervals that Vitali covers an interval E\subset \mathbb{R}, then there is a countable disjoint subcollection of \mathcal{I} that covers E except for a set of Lebesgue measure 0. (Note I call this the easy version because it can be extended to finite dimensional space using balls and such, but this is easier to prove).

Proof: Assume the hypotheses, with the same notation. The claim is that we can find I_n\in\mathcal{I} disjoint such that m\left(E\setminus\cup I_n\right)=0. All interval types have the same measure, so WLOG assume the intervals in \mathcal{I} are closed. Define \mathcal{I}^* to be the collection of finite unions of disjoint intervals from \mathcal{I}.

Claim: If A\in\mathcal{I}^* and m(E\setminus A)>0 then there is a B\in\mathcal{I}^* such that A\cap B=\emptyset and m(E\setminus(A\cup B))<\frac{3}{4}m(E\setminus A).

Proof of claim: Since \mathcal{I} is a Vitali covering we can choose intervals of small enough diameter \{J_i\}_1^n\subset\mathcal{I} so that each J_i\subset E\setminus A (since it has positive measure there will be at least one of these). Since we don’t care about overlap right now, we can do this at enough points so that m(E\setminus(A\cup J_1\cup\cdots\cup J_n))<\frac{1}{12}m(E\setminus A). Now by the Vitali Covering Lemma of last time we can find a disjoint subset \{I_j\}_1^k\subset \{J_i\}_1^n so that m(\bigcup I_j)\geq \frac{1}{3}m(\bigcup J_i).

Then m(E\setminus (A\cup I_1\cup\cdots\cup I_k))

<\frac{2}{3}m(J_1\cup\cdots\cup J_n)+\frac{1}{12}m(E\setminus A)

\leq \frac{2}{3}m(E\setminus A)+\frac{1}{12}m(E\setminus A)

=\frac{3}{4}m(E\setminus A).

Thus B=I_1\cup\cdots\cup I_k\in\mathcal{I}^* works.
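The first inequality in that chain goes by quickly, so here is a sketch of the step that is being used. It only unpacks what is already in the argument: the J_i lie in E\setminus A, and the disjoint I_j pick up at least a third of the measure of their union.

```latex
\begin{align*}
E\setminus\Bigl(A\cup\bigcup_j I_j\Bigr)
  &\subset \Bigl(\bigcup_i J_i \setminus \bigcup_j I_j\Bigr)\ \cup\ \Bigl(E\setminus\Bigl(A\cup\bigcup_i J_i\Bigr)\Bigr),\\
m\Bigl(\bigcup_i J_i \setminus \bigcup_j I_j\Bigr)
  &= m\Bigl(\bigcup_i J_i\Bigr) - m\Bigl(\bigcup_j I_j\Bigr)
  \leq \Bigl(1-\tfrac{1}{3}\Bigr)\, m\Bigl(\bigcup_i J_i\Bigr)
  \leq \tfrac{2}{3}\, m(E\setminus A),\\
\tfrac{2}{3} + \tfrac{1}{12} &= \tfrac{8}{12} + \tfrac{1}{12} = \tfrac{3}{4}.
\end{align*}
```

The strict inequality in the final bound comes from the strict \frac{1}{12} estimate on the second piece.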

Now simply apply this inductively and use countable additivity of measure to get that m(E\setminus (A_1\cup\cdots \cup A_k))\leq \left(\frac{3}{4}\right)^km(E), i.e. \displaystyle m\left(E\setminus \bigcup_{k=1}^\infty A_k\right)=0. We are done.

The generalization is exactly the same, except where you use Vitali Covering Lemma, you replace 3 with 3^n (notice that this relies on n being finite). It is not true in infinite dimensional spaces. Also, you can reformulate the statement to use Hausdorff measure.