The Functor of Points Revisited

Mike Hopkins is giving the Milliman Lectures this week at the University of Washington and the first talk involved this idea that I’m extremely familiar with, but am also surprised at how unfamiliar most mathematicians are with it. I’ve made almost this exact post several other times, but it bears repeating. As I basked in the amazingness of this idea during the talk, I couldn’t help but notice how annoyed some people seemed to be at the level of abstractness and generality this notion forces on you.

Every branch of math has some crowning achievements and insights into how to actually think about something so that it works. The idea I’ll present in this post is a truly remarkable insight into geometry and topology. It is incredibly simple (despite the daunting language) which is what makes it so fascinating. Here is the idea. Suppose you care about some type of spaces (metric, topological, manifolds, varieties, …).

Let {X} be one of your spaces. In order to figure out what {X} is you could probe it by other spaces. What does this mean? It just means you look at maps {Y\rightarrow X}. If {X} is a topological space, then you can recover the points of {X} by considering all the maps from a singleton (i.e. point) {\{x\} \rightarrow X}. If you want to understand more about the topology, then you probe by some other spaces. Simple.
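Here is a tiny Python sketch of the "probe by points" idea (my own toy encoding, not from the lecture): for a finite set X, the maps from a one-point set into X correspond exactly to the points of X, so the probes recover the space.

```python
# Toy illustration: maps {*} -> X correspond to points of X.

def maps_from_singleton(X):
    """All functions {*} -> X, each encoded by the image of *."""
    return [{"*": x} for x in X]

X = {"a", "b", "c"}
probes = maps_from_singleton(X)

# Each probe picks out one point, and every point arises this way,
# so the set of probes recovers X.
recovered = {f["*"] for f in probes}
assert recovered == X
print(len(probes))  # 3 probes: one per point of X
```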

Even analysts use this idea all the time. A distribution {\phi} (on {\mathbb{R}}) is not a well-defined function, so you can’t just tell whether or not two distributions are the same by looking at values. Instead you probe it by test functions {\int \phi f dx}. If these probes give you the same thing for all test functions, then the distributions are the same. This is all we are doing with our spaces above, and this is all the Yoneda lemma is saying. It says that if the maps (test functions) to {X} and the maps to {Y} are the same, then {X} and {Y} are the same.

We can fancy up the language now. Considering maps to {X} is a functor {Hom(-,X): Spaces^{op} \rightarrow Set}. Such a functor is called a presheaf on the category of Spaces (recall that for your particular situation this might be the category of smooth manifolds or metric spaces or algebraic varieties or …). Don’t be scared. This is literally the definition of presheaf, so if you were following up to now, introducing this term requires no new definitions.

The Yoneda lemma is saying something very simple in this fancy language. It says that there is a (fully faithful) embedding of Spaces into Pre(Spaces), the category of presheaves on Spaces. If we now work with this new category of functors, we just enlarge what we consider to be a space and this is of fundamental importance for many reasons. If {X} is one of our old spaces, then we can just naturally identify it with the presheaf {Hom(-,X)}. The reason Mike Hopkins is giving for why this is important is very different from the one I’ll give which just goes to show how incredibly useful this idea is.

In every single branch of math people care about some sort of classification problem. Classify all elliptic curves. What are the vector bundles on my manifold? If I fix a vector bundle, what are the connections on my vector bundle? What are the Borel measures on my metric space? The list goes on forever.

In general, classification is a hugely difficult task to grapple with. We know a ton of stuff about smooth manifolds, but how can we leverage that to make the seemingly unrelated problem of classifying vector bundles more manageable? Here our insight comes to the rescue, because there is a way to write down a functor that outputs vector bundles. There is subtlety in writing it down properly (and we should now land in Grpds instead of Set so that we can identify isomorphic ones), but once we do this we get a presheaf. In other words, we make a (generalized) space whose points are the objects we are classifying.

In many situations you then go on to prove that this moduli space of vector bundles is actually one of the original types of spaces (or not too far from one) we know a lot about. Now our impossible task of understanding what the vector bundles on my manifold are is reduced to the already studied problem of understanding the geometry of a manifold itself!

Here is my challenge to any analyst who knows about measures. Warning, this could be totally ridiculous and nonsense because it is based on reading Wikipedia for 5 minutes. Construct a presheaf of real-valued Radon measures on {\mathbb{R}}. Analyze this “space”. If it was done right, you should somehow recover that the space is the dual of the space {C_c(\mathbb{R})} of compactly supported continuous real-valued functions on {\mathbb{R}}. Congratulations, you’ve just started a new branch of math in which you classify measures on a space by analyzing the topology/geometry of the associated presheaf.


Applying Covering Theorems

I’ve searched far and wide to not do one of the standard applications that are in all grad analysis texts (yes I’m referring to the Hardy-Littlewood maximal function being weakly bounded). We are getting into the parts of analysis that I despise (it will all be over in 4 days…I hope *crosses fingers*).

Claim: If f: [0,1]\to\mathbb{R} is an increasing function and we define \displaystyle g(x)=\limsup_{h\to 0}\frac{f(x+h)-f(x)}{h} (so almost a Dini derivative), then the outer measure m^*\{x: g(x)>1\}\leq f(1)-f(0). I know, it is rather similar to the maximal function theorem, but it is hard to find something that utilizes these tools that doesn’t have the same flavor.

Proof: Call S=\{x: g(x)>1\}. Then for any x\in S we can find an h>0 as small as we like so that f(x+h)-f(x)>h (since the limsup as h\to 0 is >1, such h exist below any threshold). Now we just cover S with these intervals [x, x+h]. It is a Vitali covering, since we’ve already checked that the diameters of the intervals (the h’s) can be made arbitrarily small.

Let \epsilon>0. Now by our Theorem we can choose a finite disjoint collection of them, say \{[x_n, x_n+h_n]\}_1^N, such that m(\cup [x_n, x_n+h_n])>m^*(S)-\epsilon (just the definition of how outer measure relates to measure).

Now: f(1)-f(0)\geq\sum_{n=1}^N(f(x_n+h_n)-f(x_n))

>\sum_{n=1}^N h_n

= m(\cup [x_n, x_n+h_n])

> m^*(S) - \epsilon.

Since \epsilon was arbitrary, m^*(S)\leq f(1)-f(0).
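Here is a quick numerical sanity check of the claim (my own example, not from any text): take the increasing function f(x)=x^2 on [0,1]. There the limsup is just the derivative g(x)=2x, so \{x: g(x)>1\}=(1/2,1] has outer measure 1/2, which the claim bounds by f(1)-f(0)=1.

```python
# Numerical check of m*{x : g(x) > 1} <= f(1) - f(0) for f(x) = x^2 on [0,1].

N = 100_000
h = 1.0 / N  # small increment for the difference quotient

def f(x):
    return x * x

# approximate g by a small-h difference quotient at grid points
count = sum(1 for i in range(N) if (f(i / N + h) - f(i / N)) / h > 1)
measure_estimate = count / N  # fraction of grid points where g > 1

assert measure_estimate <= f(1) - f(0)
print(round(measure_estimate, 2))  # close to 0.5, well under the bound 1
```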

Some notes: Oftentimes you don’t need the full Vitali Covering Theorem (it may even have been overkill here, but heck, I wanted to use it). Also, the setup for these things is almost always in this standard form. If you see m(\{x: F(x)>blah\})\leq blahblah, then you are bound to have to use one of these Lemmas or Theorems.

I never really understood the huge fuss over maximal functions, but here is the definition: Given f\in L^1, we define \displaystyle Mf(x)=\sup_{0<r<\infty}\frac{1}{m(B(x,r))}\int_{B(x,r)}|f(y)|dm(y). We get the inequality m\{x: Mf(x)>\lambda\}\leq \lambda^{-1}3^k\|f\|_1, where we are working in \mathbb{R}^k. So this basically says that M: L^1\to \text{weak } L^1. Also note that when you see things like 3^k or 5^k we probably won’t need the full theorem, as those constants appear in the Lemmas. While we’re at it, we only get weak L^1, but if we are working in L^p for p>1, then M: L^p\to L^p (we use a much different technique from the Fourier Transform, though).
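To make the weak (1,1) inequality feel concrete, here is a discrete sanity check in dimension k=1 (my own finite toy model, not from any text): approximate Mf on a grid for an indicator-like bump and verify m\{Mf>\lambda\}\leq 3\|f\|_1/\lambda.

```python
# Discrete check of the weak-type bound m{Mf > lam} <= 3 * ||f||_1 / lam in 1D.

N = 400
h = 1.0 / N  # grid spacing on [0,1]
f = [0.0] * N
for i in range(150, 250):  # an indicator-like bump of width 1/4
    f[i] = 1.0

# prefix sums so each interval average is O(1)
prefix = [0.0]
for v in f:
    prefix.append(prefix[-1] + abs(v))

def Mf(i):
    """sup over radii of the average of |f| on the centered interval around i."""
    best = 0.0
    for r in range(1, N):
        lo, hi = max(0, i - r), min(N - 1, i + r)
        avg = (prefix[hi + 1] - prefix[lo]) / (hi - lo + 1)
        best = max(best, avg)
    return best

l1_norm = sum(abs(v) for v in f) * h      # = 0.25
lam = 0.9
level_set_measure = sum(1 for i in range(N) if Mf(i) > lam) * h

assert level_set_measure <= 3 * l1_norm / lam
```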

Ack! Moving on next time. Maybe functional analysis type stuff?

Edit: Does anyone else have issues LaTeXing square brackets?! They seem to work when they are in the middle of stuff, but never parse when I just want something like [0,1].

Covering Theorem (we use past Lemmas)

A brief break occurred while I moved 2700 miles away. The important thing is I’m back, and we’re going to prove a big one today. First let’s define a Vitali covering. A set is Vitali covered by the collection of sets \mathcal{V} if for any \epsilon>0 and any x in the set, there exists a set V\in\mathcal{V} such that x\in V and diam(V)<\epsilon. So note that this is sort of stringent: at every point we must be able to find a covering set of arbitrarily small diameter containing it.

Vitali’s Covering Theorem (easy version): If \mathcal{I} is a sequence of intervals that Vitali covers an interval E\subset \mathbb{R}, then there is a countable disjoint subcollection of \mathcal{I} that covers E except for a set of Lebesgue measure 0. (Note I call this the easy version because it can be extended to finite dimensional space using balls and such, but this is easier to prove).

Proof: Assume the hypotheses, keeping the same notation. The claim is that we can find I_n\in\mathcal{I} disjoint such that m\left(E\setminus\cup I_n\right)=0. All interval types (open, closed, half-open) have the same measure, so WLOG assume the intervals in \mathcal{I} are closed. Define \mathcal{I}^* to be the collection of finite unions of disjoint intervals from \mathcal{I}.

Claim: If A\in\mathcal{I}^* and m(E\setminus A)>0 then there is a B\in\mathcal{I}^* such that A\cap B=\emptyset and m(E\setminus(A\cup B))<\frac{3}{4}m(E\setminus A).

Proof of claim: Since \mathcal{I} is a Vitali covering, we can choose finitely many intervals \{J_i\}_1^n\subset\mathcal{I} of small enough diameter so that each J_i\subset E\setminus A (since m(E\setminus A)>0, there is at least one such interval). Since we don’t care about overlap right now, we can do this at enough points so that m(E\setminus(A\cup J_1\cup\cdots\cup J_n))<\frac{1}{12}m(E\setminus A). Now by the Vitali Covering Lemma of last time we can find a disjoint subcollection \{I_j\}_1^k\subset \{J_i\}_1^n so that m(\bigcup I_j)\geq \frac{1}{3}m(\bigcup J_i).

Then m(E\setminus (A\cup I_1\cup\cdots\cup I_k)) \leq m(E\setminus(A\cup J_1\cup\cdots\cup J_n)) + m\left(\bigcup J_i\setminus\bigcup I_j\right)

<\frac{2}{3}m(J_1\cup\cdots\cup J_n)+\frac{1}{12}m(E\setminus A)

\leq \frac{2}{3}m(E\setminus A)+\frac{1}{12}m(E\setminus A)

=\frac{3}{4}m(E\setminus A).

Thus B=I_1\cup\cdots\cup I_k\in\mathcal{I}^* works.

Now simply apply this claim inductively to get m(E\setminus (A_1\cup\cdots \cup A_k))\leq \left(\frac{3}{4}\right)^km(E); letting k\to\infty, \displaystyle m\left(E\setminus \bigcup_{k=1}^\infty A_k\right)=0. We are done.

The generalization is exactly the same, except where you use the Vitali Covering Lemma you replace 3 with 3^n (notice that this relies on n being finite). It is not true in infinite-dimensional spaces. Also, you can reformulate the statement to use Hausdorff measure.

Covering Lemma 2

Today I’ll do probably the best known Vitali Covering Lemma. I’ll take the approach of Rudin.

Statement (finite version): If W is the union of a finite collection of balls B(x_i, r_i), i=1,\ldots,N, then there is a subcollection S\subset \{1,\ldots , N\} so that

a) the balls B(x_i, r_i) with i\in S are disjoint.

b) W\subset \bigcup_{i\in S} B(x_i, 3r_i), and

c) m(W)\leq 3^k\sum_{i\in S} m(B(x_i, r_i)). Hmm…I guess I should say W\subset\mathbb{R}^k from the looks of it.

Proof: Quite simple in this case. Just order the radii in decreasing order (finite, so we can list them all), r_1\geq r_2\geq \cdots \geq r_N. Now take a subsequence \{r_{i_k}\}: let i_1=1, then go down the line until you reach the first ball B_{i_2} that doesn’t intersect B_{i_1}. Now choose B_{i_3} as the next one that doesn’t intersect either of the ones before it. Continue this process to completion. (a) is done since we’ve picked a disjoint subset of the original (this is a trivial condition on its own, though, since we could ignore (b) and (c) and just choose a single element).

Now for (b), look at any of the skipped balls B(x', r'); we claim it is a subset of B(x_i, 3r_i) for some i that we picked. This is clear once we note the ordering. If we skipped B(x', r'), then it intersects some earlier chosen ball B(x, r) with r'\leq r. So any point y\in B(x', r') satisfies |y-x|\leq |y-x'|+|x'-x| < r' + (r+r') \leq 3r. So for any skipped ball we have B(x', r')\subset B(x, 3r), giving us (b).

For (c), we just use the standard property of Lebesgue measure that m(B(x, 3r))=3^km(B(x,r)). Sum over the set we created and we are done.
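The greedy selection in this proof is easy to run by hand. Here is a sketch in Python for k=1, where balls are intervals, on a small instance of my own invention (not from the text): sort by radius, pick each ball that misses all previously picked ones, then check every original ball lies inside a tripled pick.

```python
# Greedy selection from the finite covering lemma, in 1D.

balls = [(0.0, 1.0), (0.5, 0.4), (2.0, 0.8), (2.5, 0.3), (5.0, 0.2)]  # (center, radius)

def intersects(b1, b2):
    (x1, r1), (x2, r2) = b1, b2
    return abs(x1 - x2) < r1 + r2

chosen = []
for b in sorted(balls, key=lambda b: -b[1]):  # decreasing radii
    if all(not intersects(b, c) for c in chosen):
        chosen.append(b)

# (a) chosen balls are pairwise disjoint by construction.
# (b) every original ball lies inside some tripled chosen ball.
def contained_in_tripled(b):
    (x, r) = b
    return any(abs(x - xc) + r <= 3 * rc for (xc, rc) in chosen)

assert all(contained_in_tripled(b) for b in balls)
print(len(chosen))  # 3 disjoint balls chosen
```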

Infinite Case: Let \{B_j : j\in J\} be any collection of balls in \mathbb{R}^k such that \sup_j diam(B_j)<\infty. Then there exists a disjoint subcollection (J'\subset J) such that \bigcup_{j\in J}B(x_j, r_j) \subset \bigcup_{j\in J'}B(x_j, 5r_j).

Proof: Let R be the sup of the radii of the balls (which we’ve forced to be finite). Now we define subcollections: let Z_i be the subcollection of balls with radius in \left(\frac{R}{2^{i+1}}, \frac{R}{2^i} \right]. Take a maximal disjoint subcollection Z_0' of Z_0, then a maximal subcollection Z_1' of Z_1 whose balls are disjoint from each other and from those of Z_0', and so on (at stage i, the balls of Z_i' are disjoint from each other and from everything chosen at earlier stages). This collection now satisfies the requirements.

Next time I’ll do Vitali’s Covering Theorem. I’m debating whether to prove it or not. Applications of it might be more interesting.

Measure Decomposition Theorems

Well, I’ve been mostly posting comments around on other people’s blogs and not really getting around to my own. I’m giving up on NCG for now. It seems that the stuff I already know I’m reading, and I’m skipping the stuff that will take effort to sort through. This seems pointless, especially with Analysis prelims coming up. That’s why things may take a turn in that direction over the next couple of weeks. I do still have at least one major ethical issue I want to sort out, though.

So. What is the most confusing part of measure theory? To me it is the fact that there are tons of ways to decompose your measure. In fact, I usually can’t remember which one is named what, and when to use which one. This post is an attempt to sort out which one is which, and what to look out for when you want to use them.

Jordan Decomposition: Any real measure \mu on a \sigma-algebra can be expressed in terms of two positive measures, called the positive and negative variations (\mu^+ and \mu^-) by \mu=\mu^+-\mu^-. This allows us to examine the total variation more easily, since |\mu|=\mu^+ + \mu^-. Also, it is quite simple to prove the existence and uniqueness since we can write \mu^+=\frac{1}{2}(|\mu| + \mu) and \mu^-=\frac{1}{2}(|\mu|-\mu).

Jordan decomposition seems to be used when you can prove something for positive measures and need to extend it to all measures. Since J decomp gets you any measure in terms of positive measures, this eases the process. The other main use is when you invoke the uniqueness along with the next decomp theorem.

Hahn Decomposition: This is different from all the rest. It is not a decomposition of the measure, but of the measure space. It says: Let \mu be a real measure on a \sigma-algebra \mathfrak{M} in a set X. Then there exist sets A and B in \mathfrak{M} such that X=A\cup B, A\cap B=\emptyset and such that the positive and negative variations \mu^+, \mu^- satisfy \mu^+(E)=\mu(A\cap E) and \mu^-(E)=-\mu(B\cap E), for any E\in\mathfrak{M}.

Things to note. This is not unique! Also, you get as a quick corollary that since the positive and negative variations are concentrated on disjoint sets, they are mutually singular. The Hahn decomp is usually invoked in conjunction with the J decomp, as in, “Let ____ be the J decomposition and A, B be the respective Hahn decomp.” These two together get you that the J decomp is minimal. In other words, if \mu=\lambda_1 - \lambda_2, where \lambda_1 and \lambda_2 are positive measures, then \lambda_1\geq \mu^+ and \lambda_2\geq\mu^-.
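If the interplay between the Jordan and Hahn decompositions feels abstract, a finite toy model makes it transparent (my own example, not from any text): a real measure on a five-point set is just a vector of real weights, the Hahn sets are where the weights are nonnegative/negative, and the Jordan parts are the positive/negative parts of the weights.

```python
# Jordan + Hahn decomposition of a real measure on X = {0,...,4}.

mu = [2.0, -1.0, 0.5, -0.25, 3.0]

A = [i for i, w in enumerate(mu) if w >= 0]   # Hahn positive set
B = [i for i, w in enumerate(mu) if w < 0]    # Hahn negative set

mu_plus  = [max(w, 0.0) for w in mu]          # positive variation
mu_minus = [max(-w, 0.0) for w in mu]         # negative variation
total_variation = [p + n for p, n in zip(mu_plus, mu_minus)]

# mu = mu+ - mu-  and  |mu| = mu+ + mu-
assert all(abs(p - n - w) < 1e-12 for p, n, w in zip(mu_plus, mu_minus, mu))
assert all(abs(t - abs(w)) < 1e-12 for t, w in zip(total_variation, mu))

# the formulas mu+ = (|mu| + mu)/2 and mu- = (|mu| - mu)/2 from above
assert all(abs(p - (abs(w) + w) / 2) < 1e-12 for p, w in zip(mu_plus, mu))

# mu+ is concentrated on A, mu- on B: they are mutually singular
assert all(mu_plus[i] == 0.0 for i in B)
assert all(mu_minus[i] == 0.0 for i in A)
```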

Lebesgue Decomposition: Let \mu be a positive \sigma-finite measure and let \lambda be a complex measure on the same sigma algebra, then there is a unique pair of complex measures \lambda_a and \lambda_s such that \lambda=\lambda_a + \lambda_s and \lambda_a \ll \mu and \lambda_s \perp \mu. Also, if \lambda is positive and finite, then so are the two parts of the decomp. Caution: the measure \mu MUST be sigma finite. This theorem says that given any complex measure and any sigma finite measure, you can decompose the complex one into two unique parts that are absolutely continuous with respect to and mutually singular with the sigma finite measure respectively.

The major use of this is when you want to invoke the Radon-Nikodym theorem to get an integral representation of your measure. The Radon-Nikodym theorem only works if your measure is absolutely continuous with respect to the other. Luckily, with Lebesgue decomposition you can always apply R-N to at least a part of the measure.
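On a finite set both the Lebesgue decomposition and the Radon-Nikodym derivative are one-liners, which makes a nice sanity check (again a toy example of my own): the absolutely continuous part lives where \mu>0, the singular part where \mu=0, and the derivative is the pointwise ratio.

```python
# Lebesgue decomposition of lam relative to mu on X = {0,...,4},
# plus the Radon-Nikodym derivative of the absolutely continuous part.

mu  = [1.0, 2.0, 0.0, 0.5, 0.0]
lam = [3.0, 0.0, 4.0, 1.0, 2.0]

lam_ac   = [l if m > 0 else 0.0 for l, m in zip(lam, mu)]   # lam_a << mu
lam_sing = [l if m == 0 else 0.0 for l, m in zip(lam, mu)]  # lam_s perp mu

# lam = lam_a + lam_s
assert all(abs(a + s - l) < 1e-12 for a, s, l in zip(lam_ac, lam_sing, lam))

# d(lam_a) = f dmu, where f is the pointwise ratio (defined mu-a.e.)
f = [a / m if m > 0 else 0.0 for a, m in zip(lam_ac, mu)]
assert all(abs(f[i] * mu[i] - lam_ac[i]) < 1e-12 for i in range(len(mu)))
```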

Polar Decomposition: Let \mu be a complex measure on a sigma algebra. Then there is a measurable function h such that |h(x)|=1 for all x and such that d\mu=hd|\mu|. Note that the name “polar” is in reference to the polar form of writing a complex number as the product of its absolute value and a number of absolute value 1. I’m not entirely sure I’ve ever used this. I guess the main place that it seems useful is when working with the integral representation of the measure. If you need to manipulate the total variation, then this gives you how to put it into the integral representation.
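For a complex measure with finitely many atoms the polar decomposition is literally the polar form of each weight (a toy example of mine, mirroring the complex-number analogy above):

```python
# Polar decomposition of a complex measure on a 4-point set:
# h = mu / |mu| has modulus 1 and d(mu) = h d|mu|.

mu = [3 + 4j, -2.0 + 0j, 0 + 1j, 1 - 1j]
total_var = [abs(w) for w in mu]          # the total variation |mu|
h = [w / abs(w) for w in mu]              # defined |mu|-a.e.; no weight is 0 here

assert all(abs(abs(u) - 1.0) < 1e-12 for u in h)                      # |h| = 1
assert all(abs(u * t - w) < 1e-12 for u, t, w in zip(h, total_var, mu))  # mu = h|mu|
```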

Those seem to be the big ones. This is quite possibly the most useful math post I’ve made. I didn’t go into too much depth, but hopefully if someone is struggling with the differences between these, or trying to get a vague idea of when to use them, this post will help. I suppose I could have elaborated a little by proving the simple claims and showing counterexamples for the “cautions.” This would have given a feel for using them. Oh well.

Lebesgue Points

Just a quick detour. I’ve found a new reason to dislike analysis. I’m trying to learn Radon-Nikodym derivatives (i.e. an attempt to take derivatives in a general measure theory sense and maintain the Fundamental Theorem of Calculus for the Lebesgue integral), and Rudin uses the approach of Lebesgue Points. Since I’ve never learned this before, I’m not sure if the other methods are easier, but this is certainly proving to be rough. Apparently we are supposed to be familiar with random facts about Lebesgue points, even though this is the very first time the definition is given. So here are the random unproven statements about Lebesgue points that I’ve encountered, and my proofs to go along with them. I don’t think these are all what Rudin had in mind, since my proofs are probably more complicated than what one is expected to just think through.

Definition: Let f\in L^1(\mathbb{R}^k); then x is a Lebesgue point of f if \displaystyle \lim_{r\to 0}\frac{1}{m(B_r)}\int_{B_r}|f(y)-f(x)|dm(y)=0, where m is Lebesgue measure and B_r is the open ball centered at x of radius r. Yeah. Not the simplest definition to be assuming knowledge of.

Claim 1: If f is continuous at x, then x is a Lebesgue point of f (under the suitable conditions on f that will always be assumed in this post). Let f be continuous at x. Then let \varepsilon>0 and choose \delta>0 such that if |x-y|<\delta, then |f(x)-f(y)|<\varepsilon. Now whenever 0<r<\delta, we have \displaystyle \frac{1}{m(B_r)}\int_{B_r}|f(y)-f(x)|dm \leq \frac{1}{m(B_r)}\int_{B_r}\varepsilon\, dm =\frac{\varepsilon m(B_r)}{m(B_r)}=\varepsilon. i.e. The limit behaves as we would like and x is a Lebesgue point.

Claim 2: If x is a Lebesgue point of f, then \displaystyle f(x)=\lim_{r\to 0}\frac{1}{m(B_r)}\int_{B_r}fdm. Now I’m not sure if it is just me, but things were just moved around, so the fishy business I’m going to pull doesn’t seem necessary. Let x be a Lebesgue point of f. Then

\displaystyle 0=\lim_{r\to 0}\frac{1}{m(B_r)}\int_{B_r}|f(y)-f(x)|dm(y)
\displaystyle\geq \lim_{r\to 0}\big|\frac{1}{m(B_r)} \int_{B_r} f(y)dm -\frac{1}{m(B_r)}\int_{B_r} f(x)dm\big|
\displaystyle =\lim_{r\to 0}\big|\frac{1}{m(B_r)}\int_{B_r}fdm-f(x)\big|. Since this last quantity is nonnegative and bounded above by 0, it equals 0, so we can drop the absolute value: \displaystyle 0=\lim_{r\to 0}\frac{1}{m(B_r)}\int_{B_r}fdm - f(x), and rearranging gives the claim.
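Here is a quick numerical illustration of Claim 2 (my own example): for f=1_{[0,1/2]}, the shrinking-ball averages converge to f(x) at the continuity point x=0.25, but at the jump x=0.5 they converge to 1/2, and indeed the jump is not a Lebesgue point.

```python
# Shrinking-ball averages of f = 1_[0, 1/2] at a continuity point and at a jump.

def f(y):
    return 1.0 if 0.0 <= y <= 0.5 else 0.0

def ball_average(x, r, n=10_000):
    """Midpoint-rule approximation of (1/m(B_r)) * integral of f over B_r = (x-r, x+r)."""
    total = sum(f(x - r + (i + 0.5) * (2 * r / n)) for i in range(n))
    return total / n

# at the continuity point x = 0.25 the averages approach f(0.25) = 1
assert abs(ball_average(0.25, 0.001) - 1.0) < 1e-6

# at the jump x = 0.5 the averages approach 1/2, not f(0.5) = 1
assert abs(ball_average(0.5, 0.001) - 0.5) < 1e-3
```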

I think there was a third claim, but I can’t find it now. Also, these proofs may look rather trivial now, but when you are completely unfamiliar with the definition and properties, this is rather confusing to try to work out quickly to continue reading the proof. Hopefully this post will help future readers of Rudin when they come to this.

I guess since I’ve come this far I should probably post some bonus material just to see the point.

Interesting result 1: Almost every point of \mathbb{R}^k is a Lebesgue point of f (still assuming appropriate conditions on f).

The point is to get to the definition of the derivative, so if for all measurable sets E we have \mu(E)=\int_E fdm for some f, then f is called the Radon-Nikodym derivative, and notationally it is usually written d\mu=f dm (for the obvious reason that if you integrate both sides over E you get the first form). But that notation leads us nicely to a more familiar Leibniz-type notation: f=\frac{d\mu}{dm}. Now, skipping some other interesting results, some of the meat of the theory comes out in an FTC-type result

Interesting result 2: If f\in L^1(\mathbb{R}) and F(x)=\int_{-\infty}^x fdm, then F'(x)=f(x) at every Lebesgue point of f (and by IR 1 almost everywhere).
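A quick numerical check of this FTC-type result (my own example): with f(y)=\cos(y), every point is a Lebesgue point, and the difference quotient of F(x)=\int_0^x f should recover f(x).

```python
# Differentiating the integral of cos recovers cos, numerically.

import math

def F(x, n=20_000):
    """Midpoint rule for the integral of cos over [0, x]."""
    h = x / n
    return sum(math.cos((i + 0.5) * h) for i in range(n)) * h

x, h = 1.0, 1e-4
derivative_estimate = (F(x + h) - F(x - h)) / (2 * h)
assert abs(derivative_estimate - math.cos(x)) < 1e-5
```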

Lp Space Property

Something has been bothering me for a few days. Here goes. An L^p space is a space of functions consisting of \{f: \int_X |f|^pd\mu <\infty\}. So essentially, if we can integrate the p-th power of the function and get a finite answer, then the function is in the space. (The careful reader will note that this is dependent not just on p, but on the space and on the measure. Also, since we only care about the integral, functions that differ by a set of measure zero are considered the same function, i.e. equivalence class).

This is a normed space with norm given by \|f\|_p=\left(\int_X|f|^pd\mu\right)^{1/p}. Well, I could go on for awhile about this, but here is what was troubling me. If you define a function \phi(p)=\|f\|_p^p, or the p-th power of the p-norm, then this function is continuous on E=\{p : \phi(p)<\infty\} (the argument below handles the interior of E, which is the interesting part).

I couldn’t seem to do this in any classical straightforward sense. I tried, I really did, with the epsilons and deltas. Luckily, my inability led me to a rather sneaky method (though it assumes knowledge of a few other things that I won’t prove). Assume the following facts: if \ln\phi is convex then \phi is convex (since \exp is increasing and convex), every convex function is continuous on the interior of its domain (endpoints of E would need an extra limiting argument), and Holder’s Inequality. These facts should be in any standard advanced analysis or measure theory text.

Let \lambda\in (0,1) and p,q\in E, and examine (the inequality is Holder with exponents \frac{1}{1-\lambda} and \frac{1}{\lambda}):
\ln\phi ((1-\lambda)p+\lambda q)  =  \ln\left(\int_X|f|^{(1-\lambda)p}|f|^{\lambda q}d\mu\right)
\leq  \ln\left(\left(\int_X |f|^pd\mu\right)^{1-\lambda}\left(\int_X |f|^qd\mu\right)^\lambda\right)
=  \ln\left( (\phi(p))^{1-\lambda}(\phi(q))^\lambda\right)
=  (1-\lambda)\ln(\phi(p))+\lambda\ln(\phi(q))

So, \phi is convex and hence continuous (on the interior of E). I thought that although this relies on heavier machinery than a direct approach, it was much slicker.
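The log-convexity step is easy to spot-check numerically for a measure with finitely many atoms (my own toy example): there \phi(p)=\sum_i w_i|f_i|^p, and Holder gives \phi((1-\lambda)p+\lambda q)\leq \phi(p)^{1-\lambda}\phi(q)^{\lambda}.

```python
# Spot check of log-convexity of phi(p) = sum w_i |f_i|^p for an atomic measure.

w = [0.2, 0.5, 0.3]        # atom weights (the measure)
f = [0.5, 2.0, 3.0]        # values of |f| at the atoms

def phi(p):
    return sum(wi * fi ** p for wi, fi in zip(w, f))

p, q = 1.0, 4.0
for t in [0.1, 0.25, 0.5, 0.75, 0.9]:
    lhs = phi((1 - t) * p + t * q)
    rhs = phi(p) ** (1 - t) * phi(q) ** t
    assert lhs <= rhs + 1e-12  # Holder's inequality in action
```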