Gauss’ Law

Since my blog claims to talk about physics sometimes and I just finished teaching multivariable calculus, I thought I’d do a post on one form of Gauss’ law. As a teacher of the course, I found this to be an astonishingly beautiful “application” of the divergence theorem. It turned out to be a touch too difficult for my students (and I vaguely recall being extremely confused about this when I took the class myself).

First, I’ll remind you what some of this stuff is if you haven’t thought about these concepts for a while. Let’s work in {\mathbb{R}^3} for simplicity. Consider some subset {U\subset \mathbb{R}^3}. Let {F: U\rightarrow \mathbb{R}^3} be a vector field. Mathematically, this just assigns a vector to each point of {U}. For calculus we usually put some fairly restrictive conditions on {F}, such as requiring all partial derivatives to exist and be continuous.

The above situation is ubiquitous in classical physics. The vector field could be the gravitational field, the electric field, the velocity field of a flowing fluid, and so on. One key quantity you might want to know about your field is the flux through a given surface {S}. This measures the net flow of the field through the surface. If {S} is just a sphere, then it is easy to visualize the flux as the amount flowing out of the sphere minus the amount flowing in.

Let’s suppose {S} is a smooth surface bounding a solid volume {E} (e.g. the sphere bounding the solid ball). In this case we have a well-defined “outward normal” direction. Define {\mathbf{n}} to be the unit vector field in this direction at all points of {S}. Just by definition the flux of {F} through {S} must be “adding up” the values of {F\cdot \mathbf{n}} over {S}, because this dot product just tells us how much {F} is pointing in the outward direction.

Thus we define the flux (using Stewart’s notation) to be:

\displaystyle \iint_S F\cdot d\mathbf{S} := \iint_S F\cdot \mathbf{n} \,dS

Note that the second integral is integrating a scalar-valued function with respect to surface area “dS.” Now recall that the divergence theorem says that in our situation (given that {F} extends to a vector field on an open set containing {E}) we can calculate this rather tedious surface integral by converting it to a usual triple integral:

\displaystyle \iint_S F\cdot d\mathbf{S} = \iiint_E div(F) \,dV

If you’re advanced, then of course you could just work this out as a special case of Stokes’ theorem using the musical isomorphisms and so on. Let’s now return to our original problem. Suppose I have a charge {Q} inside some surface {S} and I want to compute the flux of the associated electric field through {S}.

From my given information this would seem absolutely impossible. If {S} can be anything, and {Q} can be located anywhere inside, then of course there are just way too many variables to come up with a reasonably succinct answer. Surprisingly, Gauss’ law tells us that no matter what {S} is and where {Q} is located, the answer is always the same, and it is just a quick application of the divergence theorem to prove it.

First, let’s translate everything so that {Q} is located at the origin. Since flux is translation invariant, this will not change our answer. We first need to know what the electric field is, and this is essentially a direct consequence of Coulomb’s law:

\displaystyle F(x,y,z)=\frac{kQ}{(x^2+y^2+z^2)^{3/2}}\langle x, y, z\rangle

If we care about higher dimensions, then we might want to note that the value only depends on the radial distance from the origin and write it in the more succinct way {\displaystyle F(r)=\frac{kQ}{|r|^3}r}, where {k} is just some constant that depends on the textbook/units you are working in. Let’s first compute the partial of the first coordinate with respect to {x} (ignoring the constant factor for now):

\displaystyle \frac{\partial}{\partial x}\left(\frac{x}{(x^2+y^2+z^2)^{3/2}}\right) = \frac{-2x^2+y^2+z^2}{(x^2+y^2+z^2)^{5/2}}

You get similar things for taking the other derivatives involved in the divergence except the minus sign moves to {-2y^2} and {-2z^2} respectively. When you add all these together you get in the numerator {-2x^2-2y^2-2z^2+2x^2+2y^2+2z^2=0}. Thus the divergence is {0} everywhere and hence by the divergence theorem the flux must be {0} too, right? Wrong! And that’s where I lost most of my students.
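If you’d rather not grind through the quotient rule by hand, this computation is easy to check symbolically. Here is a quick sketch using sympy (dropping the constant {kQ}, which doesn’t affect whether the divergence vanishes):

```python
import sympy as sp

x, y, z = sp.symbols('x y z', real=True, positive=True)
r2 = x**2 + y**2 + z**2

# The field r/|r|^3, with the constant kQ dropped
F = [x / r2**sp.Rational(3, 2),
     y / r2**sp.Rational(3, 2),
     z / r2**sp.Rational(3, 2)]

# The single partial computed above
px = sp.diff(F[0], x)
assert sp.simplify(px - (-2*x**2 + y**2 + z**2) / r2**sp.Rational(5, 2)) == 0

# The divergence vanishes (away from the origin)
div = sum(sp.diff(Fi, v) for Fi, v in zip(F, (x, y, z)))
assert sp.simplify(div) == 0
```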

Recall that pesky hypothesis that {F} can be extended to a vector field on an open neighborhood of {E}. Our {F} can’t be defined at the origin in any way that extends it continuously, so we must do something different. Here’s the idea: we just change our region. Since the origin is an interior point of {E}, we can find a small sphere {S_\varepsilon} of radius {\varepsilon>0} centered at {(0,0,0)} which, together with its interior, is contained in the interior of {E}.

Let {\Omega} be the region between these two surfaces. Effectively this “cuts out” the bad point of {F}, and now we are allowed to apply the divergence theorem to {\Omega}, whose boundary is {S} oriented outward together with {S_\varepsilon} oriented inward (negatively). We already calculated that {div F=0}, so the total flux through the boundary of {\Omega} is {0}, i.e. {\iint_S F\cdot d\mathbf{S} - \iint_{S_\varepsilon} F\cdot d\mathbf{S}=0}. This gives us

\displaystyle \iint_S F\cdot d\mathbf{S} = \iint_{S_\varepsilon} F\cdot d\mathbf{S}

This is odd, because it says that no matter how bizarre or gigantic {S} was we can just compute the flux through a small sphere and get the same answer. At this point we’ve converted the problem to something we can do because the unit normal is just {\mathbf{n}=\frac{1}{\sqrt{x^2+y^2+z^2}}\langle x, y, z\rangle}. Direct computation gives us

\displaystyle F\cdot \mathbf{n} = \frac{kQ (x^2+y^2+z^2)}{(x^2+y^2+z^2)^2}=\frac{kQ}{x^2+y^2+z^2}

Plugging this in and using the fact that {x^2+y^2+z^2=\varepsilon^2} on {S_\varepsilon}, we get that the flux through {S} is

\displaystyle \iint_{S_\varepsilon} \frac{kQ}{\varepsilon^2} \,dS = \frac{kQ}{\varepsilon^2}Area(S_\varepsilon) = \frac{kQ}{\varepsilon^2}\cdot 4\pi\varepsilon^2 = 4\pi k Q.
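It’s fun to see the shape-independence numerically. Below is a sketch (with {k=Q=1}) that integrates the flux of {F=r/|r|^3} over ellipsoids of various proportions using a midpoint rule; the parametrization and grid sizes are just illustrative choices:

```python
from math import sin, cos, pi

def flux_through_ellipsoid(a, b, c, nu=400, nv=400):
    """Midpoint-rule approximation of the flux of F = r/|r|^3 through the
    ellipsoid (a cos u sin v, b sin u sin v, c cos v), which encloses 0."""
    total = 0.0
    du, dv = 2 * pi / nu, pi / nv
    for i in range(nu):
        u = (i + 0.5) * du
        for j in range(nv):
            v = (j + 0.5) * dv
            # position and tangent vectors r_u, r_v
            r = (a*cos(u)*sin(v), b*sin(u)*sin(v), c*cos(v))
            ru = (-a*sin(u)*sin(v), b*cos(u)*sin(v), 0.0)
            rv = (a*cos(u)*cos(v), b*sin(u)*cos(v), -c*sin(v))
            # normal N = ru x rv, flipped if needed so it points outward
            N = (ru[1]*rv[2] - ru[2]*rv[1],
                 ru[2]*rv[0] - ru[0]*rv[2],
                 ru[0]*rv[1] - ru[1]*rv[0])
            if N[0]*r[0] + N[1]*r[1] + N[2]*r[2] < 0:
                N = (-N[0], -N[1], -N[2])
            d = (r[0]**2 + r[1]**2 + r[2]**2) ** 1.5
            total += (r[0]*N[0] + r[1]*N[1] + r[2]*N[2]) / d * du * dv
    return total

# Very different shapes, same answer: 4*pi
assert abs(flux_through_ellipsoid(1, 1, 1) - 4*pi) < 1e-2    # sphere
assert abs(flux_through_ellipsoid(3, 0.5, 2) - 4*pi) < 1e-2  # squashed ellipsoid
```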

That’s Gauss’ law. It says that no matter the shape of {S} or the location of the charge inside {S}, you can always compute the flux of the electric field produced by {Q} through {S} as a constant multiple of the amount of charge! In fact, most books use {k=1/(4\pi \varepsilon_0)}, where {\varepsilon_0} is the “permittivity of free space,” which kills off practically all extraneous symbols: the flux becomes just {Q/\varepsilon_0}.


Naturality of Flows

This is something I always forget exists and has a name, so I end up reproving it. Since this sequence of posts is a hodge-podge of things to help me take a differential geometry test, hopefully this will lodge the result in my brain and save me time if it comes up.

I’m not sure whether to call it a lemma or not, but the setup is this: you have a smooth map {F:M\rightarrow N}, a vector field {X} on {M}, and a vector field {Y} on {N} such that {X} and {Y} are {F}-related. Let {\theta} and {\eta} be the flows of {X} and {Y} respectively, and let {M_t} and {N_t} be their time-{t} flow domains (the sets of points whose integral curves are defined at least up to time {t}). Then the lemma says that for all {t} we have {F(M_t)\subset N_t} and {\eta_t\circ F=F\circ \theta_t} on {M_t}.

This is a “naturality” condition because all it really says is that the following diagram commutes:

{\begin{matrix} M_t & \stackrel{F}{\longrightarrow} & N_t \\ \theta_t \downarrow & & \downarrow \eta_t \\ M_{-t} & \stackrel{F}{\longrightarrow} & N_{-t} \end{matrix}}

Proof: Let {p\in M}, then {F\circ \theta^p: \mathbb{R}\rightarrow N} is a curve that satisfies the property \displaystyle {\frac{d}{dt}\Big|_{t=t_0}(F\circ \theta^p)(t)=DF_{\theta^p(t_0)}(\frac{d}{dt}\theta^p (t)\Big|_{t=t_0})=DF_{\theta^p(t_0)}(X_{\theta^p(t_0)})=Y_{F\circ \theta^p(t_0)}}. Since {F\circ \theta^p(0)=F(p)}, and integral curves are unique, we get that {F\circ\theta^p(t)=\eta^{F(p)}(t)} at least on the domain of {\theta^p}.

Thus if {p\in M_t} then {F(p)\in N_t}, or equivalently {F(M_t)\subset N_t}. But we just wrote that {F(\theta^p(t))=\eta^{F(p)}(t)} where defined, which is just a pointwise form of the equation {\eta_t\circ F=F\circ \theta_t}.

We get a nice corollary out of this. If our map {F:M\rightarrow N} is actually a diffeomorphism, then take {Y=F_*X} to be the pushforward, and we get that the flow of the pushforward is {\eta_t=F\circ \theta_t\circ F^{-1}} and the flow domains actually satisfy {N_t=F(M_t)}.
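Here is a numerical illustration of this corollary in the simplest possible setting, {M=N=\mathbb{R}}. The choices below ({F(x)=x^3+x}, {X=d/dx}, and all helper names) are hypothetical and purely for the sketch: we integrate the flow of the pushforward {F_*X} with RK4 and compare it with {F\circ\theta_t\circ F^{-1}}.

```python
# F(x) = x^3 + x is a diffeomorphism of R (F' > 0 everywhere), and
# X = d/dx has flow theta_t(x) = x + t.  The pushforward Y = F_* X has
# Y(y) = F'(F^{-1}(y)), and the lemma predicts eta_t = F o theta_t o F^{-1}.

def F(x):  return x**3 + x
def dF(x): return 3*x**2 + 1

def F_inv(y, lo=-10.0, hi=10.0):
    # bisection inverse; valid because F is strictly increasing
    for _ in range(60):
        mid = (lo + hi) / 2
        if F(mid) < y: lo = mid
        else: hi = mid
    return (lo + hi) / 2

def Y(y): return dF(F_inv(y))

def flow_Y(y0, t, steps=200):
    # classical RK4 on dy/dt = Y(y)
    h = t / steps
    y = y0
    for _ in range(steps):
        k1 = Y(y)
        k2 = Y(y + h*k1/2)
        k3 = Y(y + h*k2/2)
        k4 = Y(y + h*k3)
        y += h * (k1 + 2*k2 + 2*k3 + k4) / 6
    return y

x0, t = 0.5, 1.0
predicted = F(x0 + t)        # F(theta_t(x0))
computed = flow_Y(F(x0), t)  # eta_t(F(x0)) by numerical integration
assert abs(predicted - computed) < 1e-6
```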

In algebraic geometry we care a lot about families of things. In the differentiable world, the nicest case of this would be when you have a smooth submersion {F: M\rightarrow N}, where {M} is compact and both are connected. Then since all values are regular, {F^{-1}(n_0)} is a smooth embedded submanifold. If {N} were, say, {\mathbb{R}} (of course, {M} couldn’t be compact in this case), then we would have a nice 1-dimensional family of manifolds parametrized in a nice way.

It turns out to be quite easy to prove that in the above circumstance all fibers are diffeomorphic. In AG we often call this an “iso-trivial” family, although I’m not sure that is the best analogy. The proof basically comes down to the naturality of flows. Given any vector field {Y} on {N}, we can lift it to a vector field {X} on {M} that is {F}-related to it. I won’t do the details, but it can be done locally in a nice choice of coordinates {(x^1, \ldots, x^n)\mapsto (x^1, \ldots, x^{n-k})} and then patched together with a partition of unity.

Let {M_x} be the notation for {F^{-1}(x)}. Fix an {x\in N}, then by the above naturality lemma {\theta_t\Big|_{M_x} : M_x\rightarrow M_{\eta_t(x)}} is well-defined and hence a diffeomorphism since it has smooth inverse {\theta_{-t}}. Let {y\in N}. Then as long as there is a vector field on {N} which flows {x} to {y}, then we’ve shown that {M_x\simeq M_y}, so since {x}, {y} were arbitrary, all fibers are diffeomorphic. But there is such a vector field, since {N} is connected.

Handlebodies II

Let’s think back to our example to model our \lambda-handle (where \lambda corresponds to neither a max nor a min). Well, it was a “saddle point,” so it consisted of both a downward arc and an upward arc. If you got close enough, it would probably look like D^1\times D^1.

Well, generally this will fit with our scheme. An n-handle looked like D^n … or better yet D^n\times 0, and a 0-handle looked like 0\times D^n, so maybe it is the case that a \lambda-handle looks like D^\lambda\times D^{n-\lambda}. Let’s call D^\lambda\times 0 the core of the handle, and 0\times D^{n-\lambda} the co-core.

By doing the same trick of writing out what our function looks like in some small enough neighborhood of a critical point of index \lambda using the Morse lemma, we could actually prove this. But we’re more interested now in figuring out what happens to M_t as t crosses this critical value.

By that I mean, it is time to figure out what exactly it is to “attach a \lambda-handle” to the manifold.

Suppose as in the last post that c_i is a critical value of index \lambda. Then I propose that M_{c_i+\varepsilon} is diffeomorphic to M_{c_i-\varepsilon}\cup D^\lambda\times D^{m-\lambda} (sorry again, recall my manifold is actually m-dimensional with n critical values).

I wish I had a good way of making pictures to get some of the intuition behind this across. I’ll try in words. A 1-handle for a 3-manifold will be D^1\times D^2, i.e. a solid cylinder. So we can think of this as literally a handle: we will bend the cylinder and attach its two ends to the existing manifold. This illustration is quite useful in bringing up a concern we should have. Attaching in this manner is going to create “corners,” and we want a smooth manifold, so we need to make sure to smooth them out. But we won’t worry about that now, and we’ll just call the smoothed-out M_{c_i-\varepsilon}\cup D^\lambda\times D^{m-\lambda} by the name M'.

Let’s use our gradient-like vector field again. Let’s choose \varepsilon small enough so that we are in a coordinate chart centered at p_i such that f=-x_1^2-\cdots - x_\lambda^2 + x_{\lambda +1}^2+\cdots + x_m^2 is in standard Morse lemma form.

Let’s see what happens on the core D^\lambda\times 0. At the center, f takes the critical value c_i, and it decreases everywhere as we move away from the center (moving from 0, only the first \lambda coordinates change). This decreasing goes all the way to the boundary, where the value is c_i-\varepsilon. Thus the core is the upside-down bowl (of dimension \lambda). Likewise, on the co-core f starts at the critical value and increases (as in the right-side-up bowl) out to the boundary of an (m-\lambda)-disk at a value c_i+\delta (where 0<\delta<\varepsilon).
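As a tiny sanity check, here is the local Morse model with the (hypothetical) choices m = 3, \lambda = 1, c_i = 0, confirming that the value decreases along the core and increases along the co-core:

```python
# Local Morse model: f = -x1^2 + x2^2 + x3^2 (index lambda = 1, m = 3)
def f(x1, x2, x3):
    return -x1**2 + x2**2 + x3**2

# Along the core D^1 x 0 the value decreases away from the critical point...
core_vals = [f(t, 0, 0) for t in (0.0, 0.25, 0.5, 0.75, 1.0)]
assert all(a > b for a, b in zip(core_vals, core_vals[1:]))

# ...while along the co-core 0 x D^2 it increases (here along a ray in the
# (x2, x3)-plane).
cocore_vals = [f(0, 0.6*t, 0.8*t) for t in (0.0, 0.25, 0.5, 0.75, 1.0)]
assert all(a < b for a, b in zip(cocore_vals, cocore_vals[1:]))
```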

Let's carefully figure out the attaching procedure now. If we think of our 3-manifold for intuition, we want to attach D^\lambda\times D^{m-\lambda} to M_{c_i-\varepsilon} by pasting \partial D^\lambda\times D^{m-\lambda} along \partial M_{c_i-\varepsilon}.

So I haven't talked about attaching procedures in this blog, but basically we want a map \phi: \partial D^\lambda\times D^{m-\lambda}\to \partial M_{c_i-\varepsilon}, and then we form the quotient space of the disjoint union under the relation identifying p\in \partial D^\lambda\times D^{m-\lambda} with \phi (p). Sometimes this is called an adjunction space.

So really \phi is a smooth embedding of a thickened sphere S^{\lambda - 1}, since \partial D^\lambda=S^{\lambda-1}, and the thickening is (m-\lambda)-dimensional. Think about the "handle" in the 3-dimensional 1-handle case: we gave the two endpoints of the line segment (two points = S^0) a 2-dimensional thickening by a disk.

Now it is the same old trick to get the diffeomorphism. The gradient-like vector field, X, flows from \partial M' to \partial M_{c_i+\varepsilon}, so just multiply X by a smooth function that will make M' match M_{c_i+\varepsilon} after flowing for some time. This is our diffeomorphism and we are done.

Everywhere Normal Vector Field

The title is maybe a little misleading, but here we go.

Not so much a standard problem, but a neat little result. Let U\subset \mathbb{R}^3 be open, and let X be a nowhere-vanishing vector field on U. Then for each point p\in U there is a surface passing through p such that X is normal to the surface if and only if \langle X, curl(X)\rangle=0.

This is nice since it ties back to the Frobenius posts. If X=(X_1, X_2, X_3) in coordinates, then define \omega=X_1dx+X_2dy +X_3dz. Some texts use “musical notation,” which is amazingly effective once you get used to it. In which case, we would just say let \omega=X^\flat. Now \omega is a smooth 1-form, so it defines a 2-dimensional distribution on U in the standard way D_p=ker\omega\big|_p.

Now by definition, T_pU=D_p\oplus span(X_p), so our problem has been rephrased in Frobenius language: D is integrable iff \langle X, curl(X)\rangle=0, since an integral manifold for D through p is exactly a surface satisfying the normality condition.

Thus by the Frobenius theorem, the problem is reduced to showing D is involutive iff \langle X, curl(X)\rangle=0. But D is involutive iff, given any two smooth sections Y and Z of D, we have d\omega(Y, Z)=0. In fancy notation, d\omega=\beta(curl(X)); in layman’s terms, if curl(X)=(C_1, C_2, C_3), then d\omega=C_3dx\wedge dy-C_2dx\wedge dz+C_1dy\wedge dz (expanding the exterior derivative term by term confirms the signs).

So we now have D involutive iff d\omega(Y, Z)=det(curl(X)|Y|Z)=0. But this happens iff curl(X)\in span(Y, Z)=D, and since D_p=ker\,\omega_p is the orthogonal complement of X_p, this happens iff curl(X) has trivial projection onto X, which is precisely \langle X, curl(X) \rangle=0.
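The determinant identity in that last step is a one-liner to verify symbolically. The sketch below (with arbitrary symbols standing in for the components of curl(X), Y, and Z) expands d\omega(Y, Z) from the component formula and compares it against det(curl(X)|Y|Z):

```python
import sympy as sp

C1, C2, C3, Y1, Y2, Y3, Z1, Z2, Z3 = sp.symbols('C1 C2 C3 Y1 Y2 Y3 Z1 Z2 Z3')

# d(omega)(Y, Z) expanded from C3 dx^dy - C2 dx^dz + C1 dy^dz,
# using (dx^i ^ dx^j)(Y, Z) = Y_i Z_j - Y_j Z_i
lhs = C3*(Y1*Z2 - Y2*Z1) - C2*(Y1*Z3 - Y3*Z1) + C1*(Y2*Z3 - Y3*Z2)

# ...equals the determinant with rows curl(X), Y, Z
rhs = sp.Matrix([[C1, C2, C3], [Y1, Y2, Y3], [Z1, Z2, Z3]]).det()
assert sp.expand(lhs - rhs) == 0
```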


The Frobenius Theorem

We need to build up a lot of definitions now to properly state the Frobenius theorem. The main definition will be a distribution on a manifold. Essentially, the theory of distributions is a way to generalize the notion of a vector field and the flow of a vector field.

A distribution is a choice of a k-dimensional subspace D_p\subset T_pM at each point of the manifold. A smooth distribution is just a subbundle of the tangent bundle, so we have a nice notion of smoothness. In particular, just as we could check a local frame for smoothness of a vector field (i.e. a 1-dimensional distribution), we can check a distribution for smoothness by checking whether each point has a neighborhood on which there are smooth vector fields X_1, \ldots , X_k such that X_1\big|_q, \ldots , X_k\big|_q forms a basis for D_q at each point q of the open set.

The analogous thing to integral curves for vector fields will be what we call an integral manifold. If we think about the natural way to define this, we would see that all we want is an immersed submanifold N\subset M such that T_pN=D_p for all p\in N. Thus in the one-dimensional case, the immersed submanifold is just a curve on the manifold.

Unfortunately, it is not the case that integral manifolds exist for all distributions. Our goal is to figure out when they exist. This leads us to our next two definitions. A distribution is integrable if every point of the manifold is in some integral manifold for the distribution. A distribution is called involutive if for any pair of local sections of the distribution, the Lie bracket is also a local section. Note that a local section is really just a vector field where the vectors are chosen from the distribution rather than the whole tangent bundle.

Every integrable distribution is involutive. Indeed, if D\subset TM is integrable, then given any p\in M and local sections X, Y, there is some integral manifold about p, say N. Since X and Y are tangent to N, so is [X, Y], and hence [X, Y]_p\in T_pN=D_p, which is exactly the involutivity condition.

This gives us an easy way to see that there are non-integrable distributions (recall, this is not going to happen for 1-distributions, i.e. vector fields, since every point has an integral curve). We don’t even need some weird manifold. Just take \mathbb{R}^3, and let the distribution be the span of two vector fields whose Lie bracket is not in the span. Thus something like \displaystyle D=span\{X=\frac{\partial}{\partial x}+y\frac{\partial}{\partial z}, Y=\frac{\partial}{\partial y}\} will work, since [X, Y]_0=-\frac{\partial}{\partial z}\notin D_0.
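We can confirm the bracket computation by treating the fields as derivations and applying [X, Y] to the coordinate functions. A small sympy sketch (with Y written as Yf to avoid a name clash):

```python
import sympy as sp

x, y, z = sp.symbols('x y z')

# X = d/dx + y d/dz and Y = d/dy acting as derivations on functions
def X(f): return sp.diff(f, x) + y * sp.diff(f, z)
def Yf(f): return sp.diff(f, y)

def bracket(f):  # [X, Y]f = X(Yf) - Y(Xf)
    return sp.simplify(X(Yf(f)) - Yf(X(f)))

# Read off the components of [X, Y] from the coordinate functions:
# the result is 0*d/dx + 0*d/dy - 1*d/dz = -d/dz
assert (bracket(x), bracket(y), bracket(z)) == (0, 0, -1)
```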

I think we need only one more definition to be in a place to move on. A distribution is completely integrable if there exists a flat chart for the distribution in a neighborhood of every point. By this I mean that I can find a coordinate chart such that \displaystyle D=span\{\frac{\partial}{\partial x^1}, \ldots , \frac{\partial}{\partial x^k}\}. This is obviously the strongest condition.

So our definitions at this point satisfy completely integrable distributions are integrable, and integrable distributions are involutive. The utterly remarkable thing that the Frobenius Theorem says, is that all of these implications reverse, and so all of the definitions are actually equivalent! We’ll get there later, though.