Study Vector Bundle in a Relatively Harder Way - Definition


Direction is a thing worth considering. For example, take a look at this picture (by David Gunderman):


The positions of the red and black balls show that this triple of balls turns upside down every time it completes one round. This would not happen if the triple were on a normal band, which can be denoted by \(S^1 \times (0,1)\). What would happen if we tried to describe their velocity on the Möbius band, both locally and globally? There must be some significant difference from a normal band. If we set some movement pattern on the balls, for example letting them run horizontally or in zigzags, we should obtain different sets of vectors, and those vectors can span vector spaces as well.

A Formal Construction

Here and in the following posts, we will try to develop, purely formally, certain functorial constructions having to do with vector bundles. It may seem overly general, but we will offer some examples to make it concrete.

Let \(M\) be a manifold (of class \(C^p\), where \(p \geq 0\) and can be set to \(\infty\)) modeled on a Banach space \(\mathbf{E}\). Let \(E\) be another topological space and \(\pi: E \to M\) a surjective \(C^p\)-morphism. A vector bundle is a topological construction associated with \(M\) (base space), \(E\) (total space) and \(\pi\) (bundle projection) such that, roughly speaking, \(E\) is locally a product of \(M\) and \(\mathbf{E}\).

We use \(\mathbf{E}\) instead of \(\mathbb{R}^n\) to include the infinite-dimensional cases. We will try to distinguish finite-dimensional and infinite-dimensional Banach spaces here. There is a lot to do, since, for example, an infinite-dimensional Banach space has no countable Hamel basis, while a finite-dimensional one has a finite basis (this can be proved using the Baire category theorem).

Next we will show precisely how \(E\) locally becomes a product space. Let \(\mathfrak{U}=(U_i)_i\) be an open covering of \(M\), and for each \(i\), suppose that we are given a mapping \[ \tau_i:\pi^{-1}(U_i)\to U_i \times \mathbf{E} \] satisfying the following three conditions.

VB 1 \(\tau_i\) is a \(C^p\) diffeomorphism making the following diagram commutative, i.e. \(pr \circ \tau_i = \pi\) on \(\pi^{-1}(U_i)\):

where \(pr\) is the projection onto the first component: \((x,y) \mapsto x\). By restricting \(\tau_i\) to the fiber over a point \(x \in U_i\), we obtain an isomorphism on each fiber \(\pi^{-1}(x)\): \[ \tau_{ix}:\pi^{-1}(x) \xrightarrow{\simeq} \{x\} \times \mathbf{E} \]

VB 2 For each pair of open sets \(U_i\), \(U_j \in \mathfrak{U}\) and each \(x \in U_i \cap U_j\), the map \[ \tau_{jx} \circ \tau_{ix}^{-1}: \mathbf{E} \to \mathbf{E} \] is a toplinear isomorphism (that is, a linear homeomorphism, preserving the structure of \(\mathbf{E}\) as a topological vector space).

VB 3 For any two members \(U_i\), \(U_j \in \mathfrak{U}\), the following map is a \(C^p\)-morphism: \[ \begin{aligned} \varphi:U_i \cap U_j &\to L(\mathbf{E},\mathbf{E}) \\ x &\mapsto \left(\tau_j\circ \tau_i^{-1}\right)_x \end{aligned} \]

REMARKS. As with manifolds, we call the family of pairs \((U_i,\tau_i)_i\) a trivializing covering of \(\pi\), and the \((\tau_i)\) its trivializing maps. Precisely, for \(x \in U_i\), we say that \(U_i\) or \(\tau_i\) trivializes at \(x\).

Two trivializing coverings for \(\pi\) are said to be VB-equivalent if, taken together, they also satisfy the conditions of VB 2 and VB 3. It is immediate that VB-equivalence is an equivalence relation, and we leave the verification to the reader. It is this VB-equivalence class of trivializing coverings that determines a structure of vector bundle on \(\pi\). With respect to the Banach space \(\mathbf{E}\), we say that the vector bundle has fiber \(\mathbf{E}\), or is modeled on \(\mathbf{E}\).

Next we shall give some motivation for each condition. Each pair \((U_i,\tau_i)\) determines a local product of 'a part of the manifold' and the model space, on the latter of which we can describe directions with ease. This is what VB 1 tells us. But that is far from enough if we want our vectors to be well behaved; we want the total space \(E\) itself to meet our requirements. As for VB 2, it ensures that two different trivializing maps give the same Banach space structure (with equivalent norms): for each point \(x \in M\), the fiber \(\pi^{-1}(x) \subset E\) is carried to \(\mathbf{E}\) by \(\tau_{ix}\), and another trivializing map \(\tau_{jx}\) gives another copy of \(\mathbf{E}\); VB 2 says that these structures agree. In fact, VB 2 has an equivalent alternative:

VB 2' On each fiber \(\pi^{-1}(x)\) we are given a structure of Banach space as follows: for \(x \in U_i\), we have a toplinear isomorphism, which is in fact the trivializing map \[ \tau_{ix}:\pi^{-1}(x)=E_x \to \mathbf{E}. \] As stated, VB 2 implies VB 2'. Conversely, if VB 2' is satisfied, then for open sets \(U_i\), \(U_j \in \mathfrak{U}\) and \(x \in U_i \cap U_j\), the composition \(\tau_{jx} \circ \tau_{ix}^{-1}:\mathbf{E} \to \mathbf{E}\) is a toplinear isomorphism. Hence we may consider VB 2, or VB 2', as a refinement of VB 1.

In the finite-dimensional case, one can omit VB 3, since it is implied by VB 2, as we will prove below.

(Lemma) Let \(\mathbf{E}\) and \(\mathbf{F}\) be two finite dimensional Banach spaces. Let \(U\) be open in some Banach space. Let \[ f:U \times \mathbf{E} \to \mathbf{F} \] be a \(C^p\)-morphism such that for each \(x \in U\), the map \[ f_x: \mathbf{E} \to \mathbf{F} \] given by \(f_x(v)=f(x,v)\) is a linear map. Then the map of \(U\) into \(L(\mathbf{E},\mathbf{F})\) given by \(x \mapsto f_x\) is a \(C^p\)-morphism.

PROOF. Since \(L(\mathbf{E},\mathbf{F})=L(\mathbf{E},\mathbf{F_1}) \times L(\mathbf{E},\mathbf{F_2}) \times \cdots \times L(\mathbf{E},\mathbf{F_n})\) where \(\mathbf{F}=\mathbf{F_1} \times \cdots \times \mathbf{F_n}\), by induction on the dimensions of \(\mathbf{E}\) and \(\mathbf{F}\), it suffices to treat the case where \(\mathbf{E}\) and \(\mathbf{F}\) are toplinearly isomorphic to \(\mathbb{R}\). In that case, the function \(f(x,v)\) can be written \(g(x)v\) for some \(g:U \to \mathbb{R}\). Since \(f\) is a morphism in each argument separately, putting \(v=1\) shows that \(g\) is also a morphism. This settles the case where the dimensions of \(\mathbf{E}\) and \(\mathbf{F}\) are both equal to \(1\), and the proof is completed by induction. \(\blacksquare\)

To show that VB 3 is implied by VB 2, put \(\mathbf{E}=\mathbf{F}\) in the lemma. Note that \(\tau_j \circ \tau_i^{-1}\) maps \((U_i \cap U_j) \times \mathbf{E}\) to \(\mathbf{E}\) (after projecting onto the second factor), \(U_i \cap U_j\) is open, and for each \(x \in U_i \cap U_j\) the map \((\tau_j \circ \tau_i^{-1})_x=\tau_{jx} \circ \tau_{ix}^{-1}\) is toplinear, hence linear. The fact that \(\varphi\) is a morphism then follows from the lemma.


Trivial bundle

Let \(M\) be any \(n\)-dimensional smooth manifold you are familiar with; then \(pr:M \times \mathbb{R}^n \to M\) is a vector bundle. Here the total space is \(M \times \mathbb{R}^n\), the base is \(M\), and the bundle projection \(pr\) is in this case simply the projection onto the first factor. Intuitively, a point of the total space consists of a point \(x \in M\) together with an arbitrary direction in \(\mathbb{R}^n\), hence a vector.

We need to verify the three conditions carefully. Let \((U_i,\varphi_i)_i\) be any atlas of \(M\), and let \(\tau_i\) be the identity map on \(pr^{-1}(U_i)=U_i \times \mathbb{R}^n\) (which is naturally of class \(C^p\)). We claim that \((U_i,\tau_i)_i\) satisfies the three conditions, so that we get a vector bundle.

For VB 1 things are clear: since \(pr^{-1}(U_i)=U_i \times \mathbb{R}^n\), the diagram is commutative. Each fiber \(pr^{-1}(x)\) is essentially \(\{x\} \times \mathbb{R}^n\), and \(\tau_{jx} \circ \tau_{ix}^{-1}\) is the identity map from \(\{x\} \times \mathbb{R}^n\) to itself, under the same Euclidean topology; hence VB 2 is verified, and since everything is finite-dimensional, there is no need to verify VB 3.

Möbius band

First of all, imagine a circle embedded in a Möbius band. Now we give a formal definition. Via the quotient topology, \(S^1\) can be defined as \[ S^1=I/\sim_1, \]

where \(I\) is the unit interval and \(0 \sim_1 1\) (identifying the two ends). On the other hand, the infinite Möbius band can be defined by \[ B= (I \times \mathbb{R})/\sim_2 \] where \((0,v) \sim_2 (1,-v)\) for all \(v \in \mathbb{R}\) (not only identifying the two ends of \(I\) but also 'flipping' the vertical line). Then all we need is the natural projection onto the first component: \[ \pi:B \to S^1. \] The verification differs little from that of the trivial bundle. The quotient topology of Banach spaces works naturally in this case, though things might be troublesome if we restricted ourselves to \(\mathbb{R}^n\).
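The sign flip in the identification \((0,v) \sim_2 (1,-v)\) is exactly what makes this bundle nontrivial. Here is a tiny illustrative sketch (my own addition, not a formal verification): transporting a fiber vector once around the base circle crosses the seam once and negates it.

```python
# Toy model of the Mobius bundle B = (I x R)/~ with (0, v) ~ (1, -v):
# carrying a fiber vector around the base circle crosses the seam,
# and each crossing flips the sign of the vector.

def cross_seam(v):
    """Apply the identification (0, v) ~ (1, -v)."""
    return -v

def transport(v, loops):
    """Carry the fiber vector v around S^1 `loops` times."""
    for _ in range(loops):
        v = cross_seam(v)
    return v

print(transport(1.0, 1))  # -1.0: one loop reverses the vector
print(transport(1.0, 2))  # 1.0: two loops restore it
```

A continuous nowhere-zero section would have to connect \(v\) to \(-v\) without vanishing, which is impossible; on the trivial band \(S^1 \times (0,1)\) no such flip occurs.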

Tangent bundle of the sphere

The first example is rather special in many senses. By \(S^n\) we mean the set in \(\mathbb{R}^{n+1}\) with \[ S^n=\{(x_0,x_1,\dots,x_n):x_0^2+x_1^2+\cdots+x_n^2=1\} \] and the tangent bundle can be defined by \[ TS^n=\{(\mathbf{x},\mathbf{y}):\langle\mathbf{x},\mathbf{y}\rangle=0\} \subset S^{n} \times\mathbb{R}^{n+1}, \] where, of course, \(\mathbf{x} \in S^n\) and \(\mathbf{y} \in \mathbb{R}^{n+1}\). The vector bundle is given by \(pr:TS^n \to S^n\), where \(pr\) is the projection onto the first factor. This total space is of course much finer than the \(M \times \mathbb{R}^n\) of the first example. Each point \(x\) of the manifold is now associated with the tangent space \(T_x(M)\) at that point.
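As a quick sanity check of the definition of \(TS^n\), one can verify numerically that projecting an ambient vector orthogonally at a point \(x \in S^2\) lands in the fiber over \(x\). This is an illustrative sketch of my own, not part of the formal construction:

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(a):
    n = dot(a, a) ** 0.5
    return [x / n for x in a]

def project_tangent(x, v):
    """Orthogonal projection v - <x, v> x onto the tangent space at x."""
    c = dot(x, v)
    return [vi - c * xi for vi, xi in zip(v, x)]

random.seed(0)
x = normalize([random.gauss(0, 1) for _ in range(3)])  # a point of S^2
v = [random.gauss(0, 1) for _ in range(3)]             # arbitrary ambient vector
y = project_tangent(x, v)                              # tangent vector at x
print(abs(dot(x, y)) < 1e-12)  # True: the pair (x, y) lies in TS^2
```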

More generally, we can make the same definition in any Hilbert space \(H\), for example an \(L^2\) space: \[ TS=\{(x,y):\langle x , y \rangle=0\} \subset S \times H \] where \[ S=\{x:\langle x , x \rangle = 1\}. \] The projection is the natural one: \[ \begin{aligned} \pi: TS &\to S \\ (x,y) & \mapsto x \end{aligned} \] But we will not cover the verification in this post, since it requires the abstract definition of tangent vectors. This will be done in a following post.

Many things remain undiscovered

We want to study those 'vectors' associated with a manifold both globally and locally. For example, we may want to describe the tangent line of a curve at a point without relying heavily on elementary calculus. Also, we may want to describe the vector bundle of a manifold globally: for example, when do we have a trivial one? Can we classify manifolds using the behavior of their bundles? Can we make things a little more abstract, for example by considering the class of all bundle isomorphisms? How does one bundle transform into another? To do all this, we need a good number of definitions and propositions.

The Big Three Pt. 6 - Closed Graph Theorem with Applications

(Before everything: elementary background in topology and vector spaces, in particular Banach spaces, is assumed.)

A surprising result of Banach spaces

We can define several relations between two norms. Suppose we have a topological vector space \(X\) and two norms \(\lVert \cdot \rVert_1\) and \(\lVert \cdot \rVert_2\). One says \(\lVert \cdot \rVert_1\) is weaker than \(\lVert \cdot \rVert_2\) if there is \(K>0\) such that \(\lVert x \rVert_1 \leq K \lVert x \rVert_2\) for all \(x \in X\). Two norms are equivalent if each is weaker than the other (trivially this is an equivalence relation). The idea of stronger and weaker norms is related to that of 'finer' and 'coarser' topologies in the setting of topological spaces.
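In finite dimension all norms are equivalent; here is a small numerical illustration of mine of the two defining inequalities for the \(1\)-norm and the sup norm on \(\mathbb{R}^n\), with \(K=1\) in one direction and \(K=n\) in the other:

```python
import random

# On R^n the sup norm is weaker than the 1-norm with K = 1, and the
# 1-norm is weaker than the sup norm with K = n; hence the two norms
# are equivalent.  We spot-check both inequalities on random vectors.

def norm_1(x):
    return sum(abs(t) for t in x)

def norm_sup(x):
    return max(abs(t) for t in x)

random.seed(1)
n = 5
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(n)]
    assert norm_sup(x) <= norm_1(x) <= n * norm_sup(x)
print("both bounds hold")
```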

So what about limits? Unsurprisingly this can be verified with an elementary \(\epsilon\)-\(N\) argument. Suppose \(\lVert \cdot \rVert_2\) is weaker than \(\lVert \cdot \rVert_1\) and \(\lVert x_n - x \rVert_1 \to 0\) as \(n \to \infty\); then we immediately have \[ \lVert x_n - x \rVert_2 \leq K \lVert x_n-x \rVert_1 < K\varepsilon \] for all large enough \(n\). Hence \(\lVert x_n - x \rVert_2 \to 0\) as well. But what about the converse? We introduce a new relation between norms.

(Definition) Two norms \(\lVert \cdot \rVert_1\) and \(\lVert \cdot \rVert_2\) of a topological vector space are compatible if given that \(\lVert x_n - x \rVert_1 \to 0\) and \(\lVert x_n - y \rVert_2 \to 0\) as \(n \to \infty\), we have \(x=y\).

By the uniqueness of limit, we see if two norms are equivalent, then they are compatible. And surprisingly, with the help of the closed graph theorem we will discuss in this post, we have

(Theorem 1) If \(\lVert \cdot \rVert_1\) and \(\lVert \cdot \rVert_2\) are compatible, and both \((X,\lVert\cdot\rVert_1)\) and \((X,\lVert\cdot\rVert_2)\) are Banach, then \(\lVert\cdot\rVert_1\) and \(\lVert\cdot\rVert_2\) are equivalent.

This result looks natural but is not easy to prove, since one finds no obvious way to build a bridge between limits and a general inequality. Before proving it, we need to elaborate some terminology.


(Definition) For \(f:X \to Y\), the graph of \(f\) is defined by \[ G(f)=\{(x,f(x)) \in X \times Y:x \in X\}. \]

If both \(X\) and \(Y\) are topological spaces, and the topology of \(X \times Y\) is the usual one, that is, the smallest topology that contains all sets \(U \times V\) where \(U\) and \(V\) are open in \(X\) and \(Y\) respectively, and if \(f: X \to Y\) is continuous, it is natural to expect \(G(f)\) to be closed. For example, by taking \(f(x)=x\) and \(X=Y=\mathbb{R}\), one would expect the diagonal line of the plane to be closed.

(Definition) The topological space \((X,\tau)\) is an \(F\)-space if \(\tau\) is induced by a complete invariant metric \(d\). Here invariant means that \(d(x+z,y+z)=d(x,y)\) for all \(x,y,z \in X\).

A Banach space is easily verified to be an \(F\)-space by defining \(d(x,y)=\lVert x-y \rVert\).

(Open mapping theorem) See this post

By the definition of a closed set, we have a practical criterion for whether \(G(f)\) is closed.

(Proposition 1) \(G(f)\) is closed if and only if, for any sequence \((x_n)\) such that the limits \[ x=\lim_{n \to \infty}x_n \quad \text{ and }\quad y=\lim_{n \to \infty}f(x_n) \] exist, we have \(y=f(x)\).

In this case, we say \(f\) is closed. For continuous functions, things are trivial.

(Proposition 2) If \(X\) and \(Y\) are two topological spaces and \(Y\) is Hausdorff, and \(f:X \to Y\) is continuous, then \(G(f)\) is closed.

Proof. Let \(G^c\) be the complement of \(G(f)\) in \(X \times Y\). Fix \((x_0,y_0) \in G^c\); then \(y_0 \neq f(x_0)\). By the Hausdorff property of \(Y\), there exist open subsets \(U \subset Y\) and \(V \subset Y\) such that \(y_0 \in U\), \(f(x_0) \in V\) and \(U \cap V = \varnothing\). Since \(f\) is continuous, \(W=f^{-1}(V)\) is open in \(X\). We obtain an open neighborhood \(W \times U\) containing \((x_0,y_0)\) that has empty intersection with \(G(f)\). This is to say, every point of \(G^c\) has an open neighborhood contained in \(G^c\), hence is an interior point. Therefore \(G^c\) is open, which is to say that \(G(f)\) is closed. \(\square\)


REMARKS. For \(X \times Y=\mathbb{R} \times \mathbb{R}\), we have a simple visualization. For \(\varepsilon>0\), there exists some \(\delta\) such that \(|f(x)-f(x_0)|<\varepsilon\) whenever \(|x-x_0|<\delta\). For \(y_0 \neq f(x_0)\), pick \(\varepsilon\) such that \(0<\varepsilon<\frac{1}{2}|f(x_0)-y_0|\); we get two boxes, namely \[ B_1=\{(x,y):x_0-\delta<x<x_0+\delta,f(x_0)-\varepsilon<y<f(x_0)+\varepsilon\} \] and \[ B_2=\{(x,y):x_0-\delta<x<x_0+\delta,y_0-\varepsilon<y<y_0+\varepsilon\}. \] In this case, \(B_2\) does not intersect the graph of \(f\), hence \((x_0,y_0)\) is an interior point of \(G^c\).

The Hausdorff property of \(Y\) is not removable. To see this, since \(X\) is unrestricted, it suffices to look at \(X \times X\). Let \(f\) be the identity map (which is continuous); the graph \[ G(f)=\{(x,x):x \in X\} \] is the diagonal. Suppose \(X\) is not Hausdorff; then there exist distinct points \(x\) and \(y\) such that every neighborhood of \(x\) intersects every neighborhood of \(y\). Consider \((x,y) \in G^c\): every basic neighborhood \(U \times V\) of \((x,y)\) contains a point \((z,z)\) with \(z \in U \cap V\), so \((x,y)\) is not an interior point of \(G^c\). Hence \(G^c\) is not open, and the diagonal is not closed.
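The converse of Proposition 2 also fails: a closed graph does not imply continuity in general. A standard scalar example (added here for illustration, not from the original discussion) is \(f(x)=1/x\) for \(x \neq 0\) with \(f(0)=0\): whenever \(x_n \to 0\) with \(x_n \neq 0\), the values \(f(x_n)\) diverge, so the hypothesis of Proposition 1 (both limits exist) is never satisfied along such sequences, and the graph stays closed although \(f\) is discontinuous at \(0\).

```python
# f has a closed graph in R x R, yet it is not continuous at 0:
#   f(x) = 1/x for x != 0,  f(0) = 0.
# Along x_n = 2^(-n) -> 0, the values f(x_n) = 2^n blow up, so no
# sequence through 0 satisfies the hypothesis of Proposition 1.

def f(x):
    return 1.0 / x if x != 0 else 0.0

xs = [2.0 ** -n for n in range(1, 6)]
print([f(x) for x in xs])  # [2.0, 4.0, 8.0, 16.0, 32.0] -- diverges
print(f(0.0))              # 0.0, far from any limit of f(x_n)
```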

Also, as an immediate consequence, every affine algebraic variety in \(\mathbb{C}^n\) or \(\mathbb{R}^n\) is closed with respect to the Euclidean topology. Further, we have the Zariski topology \(\mathcal{Z}\), obtained by declaring that if \(V\) is an affine algebraic variety, then \(V^c \in \mathcal{Z}\). It is worth noting that \(\mathcal{Z}\) is not Hausdorff (on \(\mathbb{C}\), for instance, the nonempty Zariski-open sets are complements of finite sets, so any two of them intersect) and is in fact much coarser than the Euclidean topology, although an affine algebraic variety is closed in both the Zariski topology and the Euclidean topology.

The closed graph theorem

After proving this theorem, we will be able to prove the theorem on compatible norms stated at the beginning. We shall assume that both \(X\) and \(Y\) are \(F\)-spaces, since the norm plays no critical role here. This offers greater generality and should not be considered an abuse of abstraction.

(The Closed Graph Theorem) Suppose

  1. \(X\) and \(Y\) are \(F\)-spaces,

  2. \(f:X \to Y\) is linear,

  3. \(G(f)\) is closed in \(X \times Y\).

Then \(f\) is continuous.

In short, the closed graph theorem gives a sufficient condition for the continuity of \(f\) (keep in mind that linearity does not imply continuity). If \(f:X \to Y\) is continuous, then \(G(f)\) is closed; conversely, if \(G(f)\) is closed and \(f\) is linear between \(F\)-spaces, then \(f\) is continuous.
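Both completeness assumptions matter. A classical example (my own illustration, not from the post): differentiation \(Df=f'\) is linear and has closed graph from \((C^1[0,1],\lVert\cdot\rVert_\infty)\) to \((C[0,1],\lVert\cdot\rVert_\infty)\), yet it is unbounded. There is no contradiction, because \(C^1[0,1]\) with the sup norm is not complete, hence not an \(F\)-space. A numerical sketch of the unboundedness:

```python
import math

# ||sin(n pi x)||_sup stays at 1 while the derivative's sup norm grows
# like n*pi, so differentiation cannot be bounded in the sup norm.

def sup_norm(g, grid=1000):
    return max(abs(g(k / grid)) for k in range(grid + 1))

for n in (1, 10, 100):
    f = lambda x, n=n: math.sin(n * math.pi * x)
    df = lambda x, n=n: n * math.pi * math.cos(n * math.pi * x)
    print(n, sup_norm(f), sup_norm(df))  # ||f|| ~ 1, ||Df|| grows with n
```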

Proof. First of all we make \(X \times Y\) an \(F\)-space by assigning addition, scalar multiplication and a metric. Addition and scalar multiplication are defined componentwise, in the nature of things: \[ \alpha(x_1,y_1)+\beta(x_2,y_2)=(\alpha x_1+\beta x_2,\alpha y_1 + \beta y_2). \] The metric can be defined without extra effort: \[ d((x_1,y_1),(x_2,y_2))=d_X(x_1,x_2)+d_Y(y_1,y_2). \] It can then be verified that \(X \times Y\) is a topological vector space with a complete translation-invariant metric, hence an \(F\)-space. (The verification is routine and recommended as an exercise.)

Since \(f\) is linear, the graph \(G(f)\) is a subspace of \(X \times Y\). Next we quote an elementary result from point-set topology: a subset of a complete metric space is closed if and only if it is complete. By the translation-invariance of \(d\), we see that \(G(f)\) is an \(F\)-space as well. Let \(p_1: X \times Y \to X\) and \(p_2: X \times Y \to Y\) be the natural projections (for example, \(p_1(x,y)=x\)). Our proof proceeds by verifying the properties of \(p_1\) and \(p_2\) on \(G(f)\).

For simplicity one could define \(p_1\) on \(G(f)\) only, instead of on the whole space \(X \times Y\), but we make it a global projection on purpose, to emphasize the difference between global and local properties. One can also write \(p_1|_{G(f)}\) to avoid confusion.

Claim 1. \(p_1\) (with restriction on \(G(f)\)) defines an isomorphism between \(G(f)\) and \(X\).

For \(x \in X\), we have \(p_1(x,f(x)) = x\) (surjectivity). If \(p_1(x,f(x))=0\), then \(x=0\) and therefore \((x,f(x))=(0,0)\); hence the restriction of \(p_1\) to \(G(f)\) has trivial kernel (injectivity). Further, \(p_1\) is trivially linear.

Claim 2. \(p_1\) is continuous on \(G(f)\).

For every sequence \((x_n,f(x_n))\) in \(G(f)\) converging to a point \((x,y)\), we have \(y=f(x)\) since \(G(f)\) is closed, and therefore \(\lim_{n \to \infty}p_1(x_n,f(x_n)) =x=p_1(x,f(x))\). The continuity of \(p_1\) is proved.

Claim 3. \(p_1\) is a homeomorphism with restriction on \(G(f)\).

We already know that \(G(f)\) is an \(F\)-space, and so is \(X\). We have \(p_1(G(f))=X\), which is of the second category in itself (being a complete metric space, by the Baire category theorem), and \(p_1\) is continuous, linear and one-to-one on \(G(f)\). By the open mapping theorem, \(p_1\) is an open mapping on \(G(f)\), hence a homeomorphism.

Claim 4. \(p_2\) is continuous.

This follows in the same way as the proof of claim 2, but is even easier since there is no need to worry about \(f\).

Now everything is immediate once one realises that \(f=p_2 \circ (p_1|_{G(f)})^{-1}\), a composition of continuous maps, which implies that \(f\) is continuous. \(\square\)


Before we return to Theorem 1 from the beginning, we mention an application to Hilbert spaces.

Let \(T\) be a bounded operator on the Hilbert space \(L_2([0,1])\) such that whenever \(\phi \in L_2([0,1])\) is a continuous function, so is \(T\phi\). Then the restriction of \(T\) to \(C([0,1])\) is a bounded operator on \(C([0,1])\).

For details please check this.

Now we go back to the identification of norms in Theorem 1. Define \[ \begin{aligned} f:(X,\lVert\cdot\rVert_1) &\to (X,\lVert\cdot\rVert_2) \\ x &\mapsto x \end{aligned} \] i.e. the identity map between the two Banach spaces (hence \(F\)-spaces). Then \(f\) is linear. We need to prove that \(G(f)\) is closed. Suppose \[ \lim_{n \to \infty}\lVert x_n -x \rVert_1=0 \quad\text{and}\quad \lim_{n \to \infty}\lVert f(x_n)-y \rVert_2=\lim_{n \to \infty}\lVert x_n -y\rVert_2=0. \] Since the two norms are compatible, we get \(x=y=f(x)\); hence \(G(f)\) is closed by Proposition 1. By the closed graph theorem, \(f\) is continuous, hence bounded, and we have some \(K\) such that \[ \lVert x \rVert_2 =\lVert f(x) \rVert_2 \leq K \lVert x \rVert_1. \] By defining \[ \begin{aligned} g:(X,\lVert\cdot\rVert_2) &\to (X,\lVert\cdot\rVert_1) \\ x &\mapsto x \end{aligned} \] we see \(g\) is continuous as well, hence we have some \(K'\) such that \[ \lVert x \rVert_1 =\lVert g(x) \rVert_1 \leq K'\lVert x \rVert_2. \] Hence the two norms are weaker than each other, i.e. equivalent, which proves Theorem 1.

The series

Since there is no strong reason to write more posts on this topic, i.e. the three fundamental theorems of linear functional analysis, I think it's time to make a list of the series. It's been around half a year.


  • Walter Rudin, Functional Analysis
  • Peter Lax, Functional Analysis
  • Jesús Gil de Lamadrid, Some Simple Applications of the Closed Graph Theorem

Partition of Unity on Different Manifolds (Part 1. Introduction)

An application of partition of unity

Partition of unity builds a bridge between local properties and global properties. A nice example is Stokes' theorem on manifolds.

Suppose \(\omega\) is an \((n-1)\)-form with compact support on an oriented manifold \(M\) of dimension \(n\). If \(\partial{M}\) is given the induced orientation, then \[ \int_M d\omega=\int_{\partial{M}}\omega. \]

This theorem can be proved in two steps. First, by Fubini's theorem, one proves the identity on \(\mathbb{R}^n\) and \(\mathbb{H}^n\). Second, for the general case, let \((U_\alpha)\) be an oriented atlas for \(M\) and \((\rho_\alpha)\) a partition of unity subordinate to \((U_\alpha)\); one naturally writes \(\omega=\sum_{\alpha}\rho_\alpha\omega\). Since the identity \(\int_M d\omega=\int_{\partial M}\omega\) is linear in \(\omega\), it suffices to prove it for each \(\rho_\alpha\omega\). Note that the support of \(\rho_\alpha\omega\) is contained in the intersection of the supports of \(\rho_\alpha\) and \(\omega\), hence in a compact set.

On the other hand, since \(U_\alpha\) is diffeomorphic to either \(\mathbb{R}^n\) or \(\mathbb{H}^n\), it is immediate that \[ \int_M d(\rho_\alpha\omega)=\int_{U_\alpha}d(\rho_\alpha\omega)=\int_{\partial U_\alpha}\rho_\alpha\omega=\int_{\partial{M}}\rho_\alpha\omega, \] which furnishes the proof of the general case.

As is seen, to prove a global statement, we work locally. If you have trouble with these terminologies, never mind: we will go through them right now (in a more abstract way, however). If you are familiar with them, feel free to skip ahead.


Manifold (of finite or infinite dimension)

Throughout, we use bold letters like \(\mathbf{E}\), \(\mathbf{F}\) to denote Banach spaces. We will treat Euclidean spaces as a special case rather than as our setting. Indeed, since Banach spaces are not necessarily finite-dimensional, our approach can be troublesome, but the benefit is a better view of the abstraction.

Let \(X\) be a set. An atlas of class \(C^p\) (\(p \geq 0\)) on \(X\) is a collection of pairs \((U_i,\varphi_i)\) where \(i\) ranges through some indexing set, satisfying the following conditions:

AT 1. Each \(U_i\) is a subset of \(X\) and \(\bigcup_{i}U_i=X\).

AT 2. Each \(\varphi_i\) is a bijection of \(U_i\) onto an open subset \(\varphi_iU_i\) of some Banach space \(\mathbf{E}_i\), and for any \(i\) and \(j\), \(\varphi_i(U_i \cap U_j)\) is open in \(\mathbf{E}_i\).

AT 3. The map \[ \varphi_j\circ\varphi_i^{-1}:\varphi_i(U_i \cap U_j) \to \varphi_j(U_i \cap U_j) \] is a \(C^p\)-isomorphism for all \(i\) and \(j\).

One should be advised that isomorphism here does not come from group theory, but from category theory. Precisely speaking, it is the isomorphism in the category \(\mathfrak{O}\) whose objects are open subsets of Banach spaces and whose morphisms are the continuous maps of class \(C^p\).

Also, the family \((U_i)_i\) generates a topology \(\tau_X\) on \(X\) in which each \(U_i\) is open and each \(\varphi_i\) is a topological isomorphism onto its image. Note that we see no need to assume that \(X\) is Hausdorff unless we start with Hausdorff spaces. Lifting this restriction gives us more freedom (though sometimes also more difficulty, to some extent).

For condition AT 2, we did not require that the vector spaces be the same for all indices \(i\), nor even that they be toplinearly isomorphic. If they are all equal to the same space \(\mathbf{E}\), then we say that the atlas is an \(\mathbf{E}\)-atlas.

Suppose that we are given an open subset \(U\) of \(X\) and a topological isomorphism \(\varphi:U \to U'\) onto an open subset of some Banach space \(\mathbf{E}\). We shall say that \((U,\varphi)\) is compatible with the atlas \((U_i,\varphi_i)_i\) if each map \(\varphi_i\circ\varphi^{-1}\) (and its inverse) is a \(C^p\)-isomorphism. Two atlases are said to be compatible if each chart of one is compatible with the other atlas. It can be verified that this is an equivalence relation. An equivalence class of atlases of class \(C^p\) on \(X\) is said to define a structure of \(C^p\)-manifold on \(X\). If all the vector spaces \(\mathbf{E}_i\) in some atlas are toplinearly isomorphic, we may assume that they all equal some fixed \(\mathbf{E}\). In this case, we say \(X\) is an \(\mathbf{E}\)-manifold, or that \(X\) is modeled on \(\mathbf{E}\).

As we know, \(\mathbb{R}^n\) is a Banach space. If \(\mathbf{E}=\mathbb{R}^n\) for some fixed \(n\), then we say that the manifold is \(n\)-dimensional. Also we have the local coordinates. A chart \[ \varphi:U \to \mathbb{R}^n \] is given by \(n\) coordinate functions \(\varphi_1,\cdots,\varphi_n\). If \(P\) denotes a point of \(U\), these functions are often written \[ x_1(P),\cdots,x_n(P), \] or simply \(x_1,\cdots,x_n\).

Topological prerequisites

Let \(X\) be a topological space. A covering \(\mathfrak{U}\) of \(X\) is locally finite if every point \(x\) has a neighborhood \(U\) such that all but a finite number of members of \(\mathfrak{U}\) are disjoint from \(U\) (as you will see, this prevents nonsense summations). A refinement of a covering \(\mathfrak{U}\) is a covering \(\mathfrak{U}'\) such that for any \(U' \in \mathfrak{U}'\) there exists some \(U \in \mathfrak{U}\) with \(U' \subset U\). Writing \(\mathfrak{U} \leq \mathfrak{U}'\) in this case, we see that the set of open covers of a topological space forms a directed set.

A topological space is paracompact if it is Hausdorff and every open covering has a locally finite open refinement. Here follow some examples of paracompact spaces.

  1. Any compact Hausdorff space.
  2. Any CW complex.
  3. Any metric space (hence \(\mathbb{R}^n\)).
  4. Any Hausdorff Lindelöf space.
  5. Any Hausdorff \(\sigma\)-compact space.

These are not too difficult to prove, and one can easily find proofs on the Internet. Below are several key properties of paracompact spaces.

If \(X\) is paracompact, then \(X\) is normal. (Proof here)

Let \(X\) be a paracompact (hence normal) space and \(\mathfrak{U}=(U_i)\) a locally finite open cover; then there exists a locally finite open covering \(\mathfrak{V}=(V_i)\) such that \(\overline{V_i} \subset U_i\). (Proof here. Note that the axiom of choice is assumed.)

One can find proofs of the following propositions in Elements of Mathematics, General Topology, Chapters 1-4 by N. Bourbaki. It is interesting to compare them with the corresponding results for compact spaces.

Every closed subspace \(F\) of a paracompact space \(X\) is paracompact.

The product of a paracompact space and a compact space is paracompact.

Let \(X\) be a locally compact paracompact space. Then every open covering \(\mathfrak{R}\) of \(X\) has a locally finite open refinement \(\mathfrak{R}'\) formed of relatively compact sets. If \(X\) is \(\sigma\)-compact then \(\mathfrak{R}'\) can be taken to be countable.

Partition of unity

A partition of unity (of class \(C^p\)) on a manifold \(X\) consists of an open covering \((U_i)\) of \(X\) and a family of functions \[ \psi_i:X \to \mathbb{R} \] satisfying the following conditions:

PU 1. For all \(x \in X\) we have \(\psi_i(x) \geq 0\).

PU 2. The support of \(\psi_i\) is contained in \(U_i\).

PU 3. The covering is locally finite.

PU 4. For each point \(x \in X\) we have \[ \sum_{i}\psi_i(x)=1. \]

The sum in PU 4 makes sense because for a given point \(x\), there are only finitely many \(i\) such that \(\psi_i(x) >0\), according to PU 3.

A manifold \(X\) will be said to admit partitions of unity if it is paracompact and if, given a locally finite open covering \((U_i)\), there exists a partition of unity \((\psi_i)\) such that the support of \(\psi_i\) is contained in \(U_i\).
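A concrete example may help: on the manifold \(\mathbb{R}\), the 'hat' functions \(\psi_n(x)=\max(0,1-|x-n|)\), \(n \in \mathbb{Z}\), form a partition of unity of class \(C^0\) subordinate to the locally finite covering \(U_n=(n-1,n+1)\). The sketch below (my own illustration) checks PU 1-PU 4 numerically:

```python
# Hat functions psi_n supported in U_n = (n-1, n+1); at any x at most
# two of them are nonzero (PU 3), and they sum to 1 (PU 4).

def psi(n, x):
    return max(0.0, 1.0 - abs(x - n))  # PU 1: nonnegative; PU 2: support in U_n

def total(x, window=3):
    # Local finiteness makes the sum effectively finite: only n with
    # |x - n| < 1 contribute, so a small window around x suffices.
    m = round(x)
    return sum(psi(n, x) for n in range(m - window, m + window + 1))

for x in (-2.5, 0.0, 0.3, 4.75):
    print(total(x))  # each ~ 1.0 up to rounding (PU 4)
```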

Bump function

This function will be useful when dealing with the finite-dimensional case.

For every integer \(n\) and every real number \(\delta>0\) there exist maps \(\psi_n \in C^{\infty}(\mathbb{R}^n;\mathbb{R})\) which equal \(1\) on \(B(0,1)\) and vanish in \(\mathbb{R}^n\setminus B(0,1+\delta)\).

Proof. It suffices to prove this for \(\mathbb{R}\), since once we have \(\psi_1\), we may write \[ \psi_n(x_1,x_2,\cdots,x_n)=\psi_1(\sqrt{x_1^2+x_2^2+\cdots+x_n^2}). \] Consider the function \(\phi: \mathbb{R} \to \mathbb{R}\) defined by \[ \phi(t)= \begin{cases} \exp\left(\frac{1}{(t-a)(t-b)}\right)&\quad\text{if } a<t<b,\\ 0 &\quad \text{otherwise}. \end{cases} \] The reader may have seen it in an analysis course and should be able to check that \(\phi \in C^{\infty}(\mathbb{R};\mathbb{R})\). Integrating \(\phi\) from \(-\infty\) to \(x\) and dividing by \(\lVert \phi \rVert_1\) (as one does in probability theory), we obtain \[ \theta(x)=\frac{\int_{-\infty}^{x}\phi(t)dt}{\int_{-\infty}^{+\infty}\phi(t)dt}; \] it is immediate that \(\theta(x)=0\) for \(x \leq a\) and \(\theta(x)=1\) for \(x \geq b\). Taking \(a=1\) and \(b=(1+\delta)^2\), our job is done by letting \(\psi_1(x)=1-\theta(x^2)\). Since \(x^2=|x|^2\), we have \(\psi_n(x)=1-\theta(x_1^2+\cdots+x_n^2)\), so the square root above is never actually needed and smoothness is no issue at the origin. \(\square\)
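A numerical rendering of this construction, with \(\delta=1/2\), so \(a=1\) and \(b=(1+\delta)^2=2.25\) (the quadrature scheme is my own choice, not part of the proof):

```python
import math

a, b = 1.0, 2.25  # delta = 0.5, b = (1 + delta)^2

def phi(t):
    """The smooth bump exp(1/((t-a)(t-b))) on (a, b), zero elsewhere."""
    return math.exp(1.0 / ((t - a) * (t - b))) if a < t < b else 0.0

def integral(f, lo, hi, steps=20000):
    """Midpoint-rule quadrature."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))

TOTAL = integral(phi, a, b)

def theta(x):
    if x <= a:
        return 0.0
    return integral(phi, a, min(x, b)) / TOTAL

def psi1(x):
    return 1.0 - theta(x * x)

print(psi1(0.0), psi1(1.0))   # 1.0 on the closed unit ball
print(psi1(1.6))              # 0.0 outside B(0, 1 + delta)
print(0.0 < psi1(1.2) < 1.0)  # True in the transition annulus
```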

In the following blog posts, we will generalize this to Hilbert spaces.

Is partition of unity ALWAYS available?

Of course this is desirable, but we will give an example showing that sometimes we cannot find a satisfactory partition of unity.

Let \(D\) be a connected bounded open set in \(\ell^p\) where \(p\) is not an even integer. Assume \(f\) is a real-valued function, continuous on \(\overline{D}\) and \(n\)-times differentiable in \(D\) with \(n \geq p\). Then \(f(\overline{D}) \subset \overline{f(\partial D)}\).

(Corollary) Let \(f\) be an \(n\)-times differentiable function on \(\ell^p\) space, where \(n \geq p\), and \(p\) is not an even integer. If \(f\) has its support in a bounded set, then \(f\) is identically zero.

It follows that for \(n \geq p\), \(C^n\) partitions of unity do not exist whenever \(p\) is not an even integer. For example, \(\ell^1[0,1]\) does not have a \(C^2\) partition of unity. It is then our task to determine under what conditions the desired partition of unity is available.

Existence of partition of unity

Below are two theorems about the existence of partitions of unity. We will not prove them here but in a future blog post, since the proofs are rather long. The restrictions on \(X\) are acceptable: for example, \(\mathbb{R}^n\) is locally compact, and hence so is any manifold modeled on \(\mathbb{R}^n\).

Let \(X\) be a manifold which is locally compact Hausdorff and whose topology has a countable base. Then \(X\) admits partitions of unity.

Let \(X\) be a paracompact manifold of class \(C^p\) modeled on a separable Hilbert space \(E\). Then \(X\) admits partitions of unity (of class \(C^p\)).


  • N. Bourbaki, Elements of Mathematics
  • S. Lang, Fundamentals of Differential Geometry
  • M. Berger, Differential Geometry: Manifolds, Curves, and Surfaces
  • R. Bonic and J. Frampton, Differentiable Functions on Certain Banach Spaces



For the \(\Gamma\) function we have a classical limit (for a proof, see ProofWiki): \[ \lim_{x\to\infty}\frac{\Gamma(x+1)}{(x/e)^x\sqrt{2\pi{x}}}=1. \] With this formula we can immediately compute some limits that would otherwise be rather hard. Note that, written for natural numbers, the formula reads \[ \lim_{n \to\infty}\frac{n!}{(n/e)^n\sqrt{2\pi{n}}}=1, \] so we can evaluate the following limit at once: \[ \begin{aligned} \lim_{n \to\infty}\sqrt[n]{\frac{n!}{n^n}} &= \lim_{n \to\infty}\sqrt[n]{\frac{n!\cdot (n/e)^n\sqrt{2\pi{n}}}{n^n \cdot (n/e)^n\sqrt{2\pi n}}} \\ &= \lim_{n \to\infty} \sqrt[n]{\frac{(n/e)^n\sqrt{2\pi n}}{n^n}} \\ &=\frac{1}{e}. \end{aligned} \] But Stirling's formula gives much more than that. In this post we will see several classical estimates.
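Both limits are easy to check numerically; here is a quick sketch in Python (using logarithms via `math.lgamma` to avoid overflow):

```python
import math

def stirling_ratio(n):
    # n! / ((n/e)^n * sqrt(2*pi*n)), computed in log space via lgamma
    log_ratio = (math.lgamma(n + 1) - (n * math.log(n) - n)
                 - 0.5 * math.log(2 * math.pi * n))
    return math.exp(log_ratio)

def nth_root_of_fact_over_nn(n):
    # (n! / n^n)^(1/n), again in log space
    return math.exp((math.lgamma(n + 1) - n * math.log(n)) / n)

assert abs(stirling_ratio(10**6) - 1.0) < 1e-6
assert abs(nth_root_of_fact_over_nn(10**6) - 1 / math.e) < 1e-4
```

The deviations shrink like \(1/(12n)\) and \(\ln(2\pi n)/(2n)\) respectively, consistent with what the later estimates in this post predict.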


The conclusion of this section is \[ 1 < \frac{n!}{(n/e)^n\sqrt{2\pi n}}\leq\frac{e}{\sqrt{2\pi}}. \] If you evaluate the right-hand side on a calculator, you will find that \(\phi_n=\frac{n!}{(n/e)^n\sqrt{2\pi n}}\) always stays close to \(1\).
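Before deriving the bounds, we can verify them numerically for small \(n\) (direct evaluation is fine up to \(n=100\) or so before floats overflow):

```python
import math

def phi_n(n):
    # the ratio n! / ((n/e)^n * sqrt(2*pi*n))
    return math.factorial(n) / ((n / math.e) ** n * math.sqrt(2 * math.pi * n))

upper = math.e / math.sqrt(2 * math.pi)  # about 1.0844, attained at n = 1
for n in range(1, 101):
    assert 1.0 < phi_n(n) <= upper + 1e-12
assert abs(phi_n(1) - upper) < 1e-12  # equality holds exactly at n = 1
```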

For \(m=1,2,3,\dots\), define a "polyline function" below \(y=\ln(x)\): \[ f(x)=(m+1-x)\ln{m}+(x-m)\ln(m+1) \] for \(m \leq x \leq m+1\), and another "polyline function" above it: \[ g(x)=\frac{x}{m}-1+\ln{m} \] for \(m-1/2 \leq x < m+1/2\). If one plots \(f\), \(\ln{x}\) and \(g\), one sees that \(f\) and \(g\) approximate \(\ln{x}\) from below and above: for \(x \geq 1\) we have \[ f(x) \leq \ln{x} \leq g(x). \] Hence for the definite integrals, \[ \int_1^n f(x)dx \leq \int_1^n \ln{x}\,dx=n\ln{n}-n+1 \leq \int_1^n g(x)dx. \] But the relation between \(f\) and \(g\) is not that simple. Computing the integral of \(f\), we find \[ \begin{aligned} \int_1^n f(x)dx &=\sum_{k=1}^{n-1}\int_{k}^{k+1}f(x)dx \\ &=\sum_{k=1}^{n-1}\left((k+\frac{1}{2})(\ln(k+1)-\ln{k})+(k+1)\ln{k}-k\ln(k+1)\right) \\ &=\ln(n!)-\frac{1}{2}\ln{n}, \end{aligned} \] while for \(g\) we have \[ \begin{aligned} \int_1^n g(x)dx &= \left(\int_{1}^{\frac{3}{2}}+\sum_{k=2}^{n-1}\int_{k-\frac{1}{2}}^{k+\frac{1}{2}}+\int_{n-\frac{1}{2}}^n\right)g(x)dx \\ &= \frac{1}{8}+\sum_{k=2}^{n-1}\ln{k}+\frac{1}{2}\ln{n}-\frac{1}{8n} \\ &=\frac{1}{8}-\frac{1}{8n}+\ln(n!)-\frac{1}{2}\ln{n}. \end{aligned} \] This shows \[ \int_1^n f(x)dx > \int_1^n g(x)dx - \frac{1}{8}. \] Summarising the inequalities above, for \(n>1\), \[ \int_1^n g(x)dx -\frac{1}{8}<\int_1^n f(x)dx < \int_1^n g(x)dx. \] Subtracting \(n\ln{n}-n\) from each term, we get \[ -\frac{1}{8n}+\ln(n!)-(\frac{1}{2}+n)\ln{n}+n < \ln(n!)-(n+\frac{1}{2})\ln{n}+n<\frac{1}{8}-\frac{1}{8n}+\ln(n!)-(n+\frac{1}{2})\ln{n}+n. \] By Stirling's formula, \[ \ln(n!)-(n+\frac{1}{2})\ln{n} + n \to \ln\sqrt{2\pi}. \] The sequence \(x_n=-\frac{1}{8n}+\ln(n!)-(\frac{1}{2}+n)\ln{n}+n\) is monotonically increasing, and by the limit above it converges to \(\ln\sqrt{2\pi}\). On the left-hand side of the inequality we may therefore take the supremum \(\ln\sqrt{2\pi}\); on the right-hand side we take the infimum \(x_1+\frac{1}{8}=1\). This gives us \[ \ln\sqrt{2\pi}<\ln\frac{n!}{(n/e)^n\sqrt{n}}<1, \] and consequently \[ 1<\frac{n!}{(n/e)^n\sqrt{2\pi n}}\leq\frac{e}{\sqrt{2\pi}}, \] which holds for all \(n =1,2,3,\dots\) (with equality on the right at \(n=1\)).


For any \(c \in \mathbb{R}\) we have \[ \lim_{x \to \infty}\frac{\Gamma(x+c)}{x^c\Gamma(x)}=1. \] This can be read as follows: after shifting \(\Gamma(x)\) to the left by \(c\), for large \(x\) its value is close to \(x^c\Gamma(x)\). The proof of this identity is fairly simple, though the computation is somewhat tedious; all we need is Stirling's formula. \[ \begin{aligned} \lim_{x \to \infty}\frac{\Gamma(x+c)}{x^c\Gamma(x)} &= \lim_{x \to \infty}\frac{\left(\frac{x+c-1}{e}\right)^{x+c-1}\sqrt{2\pi(x+c-1)}}{x^c\left(\frac{x-1}{e}\right)^{x-1}\sqrt{2\pi(x-1)}} \\ &=\lim_{x \to \infty}\sqrt{\frac{x+c-1}{x-1}} \left(\frac{x+c-1}{ex}\right)^c\left(\frac{x+c-1}{x-1}\right)^{x-1} \end{aligned} \] The limits of these three factors are now easy to compute. Clearly \[ \lim_{x \to \infty}\sqrt\frac{x+c-1}{x-1}=1 \] and \[ \lim_{x \to \infty}\left(\frac{x+c-1}{ex}\right)^c=\frac{1}{e^c}. \] Finally, \[ \lim_{x \to \infty}\left(\frac{x+c-1}{x-1}\right)^{x-1}=\lim_{x \to \infty} \left(1+\frac{c}{x-1}\right)^{x-1}=e^c, \] so the original limit is \(1\). The computation is quite elegant. Note that if we replace \(x\) and \(c\) by a positive integer \(n\) and an integer \(k\), we also obtain \[ \lim_{n \to \infty}\frac{(n+k-1)!}{n^k(n-1)!}=1. \]
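The shift formula is also easy to test numerically for several values of \(c\), again working in log space:

```python
import math

def shift_ratio(x, c):
    # Gamma(x + c) / (x^c * Gamma(x)), via lgamma for numerical stability
    return math.exp(math.lgamma(x + c) - c * math.log(x) - math.lgamma(x))

for c in (-1.5, 0.5, 2.0, 7.0):
    assert abs(shift_ratio(1e7, c) - 1.0) < 1e-4
```

The deviation behaves like \(c(c-1)/(2x)\), so even \(c=7\) is well within tolerance at \(x=10^7\).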


Combining with Bernoulli's inequality, we have \[ \begin{aligned} \int_{-1}^1 (1-x^2)^ndx &\geq \int_{-1/\sqrt{n}}^{1/\sqrt{n}}(1-x^2)^ndx \\ &\geq \int_{-1/\sqrt{n}}^{1/\sqrt{n}} (1-nx^2)dx \\ &=\frac{4}{3\sqrt{n}}. \end{aligned} \] Next we will give a much finer estimate. In fact, \[ \lim_{n \to \infty}\sqrt{n}\int_{-1}^1 (1-x^2)^ndx=\sqrt{\pi}. \] By the definition of the \(B\) function, \[ B(x,y)=\int_0^1 t^{x-1}(1-t)^{y-1}dt=\frac{\Gamma(x)\Gamma(y)}{\Gamma(x+y)}. \] Substituting \(t=u^2\), we obtain \[ B(x,y)=2\int_0^1 u^{2x-1}(1-u^2)^{y-1}du. \] Plugging in \(x=\frac{1}{2}\) and \(y=n+1\) brings us very close to the desired result: \[ \begin{aligned} B(\frac{1}{2},n+1)&=2\int_0^1(1-u^2)^ndu \\ &=\int_{-1}^{1}(1-u^2)^ndu \\ &=\frac{\Gamma(\frac{1}{2})\Gamma(n+1)}{\Gamma(n+\frac{3}{2})}. \end{aligned} \] Note that the second expression for the \(B\) function also allows us to compute \(\Gamma(\frac{1}{2})\). In fact, \[ B(\frac{1}{2},\frac{1}{2})=2\int_0^1\frac{1}{\sqrt{1-u^2}}du=\pi, \] whence \(\Gamma(\frac{1}{2})=\sqrt{\pi}\). For \(B(\frac{1}{2},n+1)\) we can now use the shift formula above: \[ \lim_{n \to \infty}\frac{\Gamma(n+\frac{3}{2})}{\sqrt{n}\Gamma(n+1)}=1. \] Therefore \[ \lim_{n \to \infty}\sqrt{n}\int_{-1}^{1}(1-x^2)^ndx=\lim_{n \to \infty} \frac{\sqrt{n}\Gamma(\frac{1}{2})\Gamma(n+1)}{\Gamma(n+\frac{3}{2})}=\sqrt{\pi}. \]
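Both the Beta-function identity and the limit can be checked numerically; the midpoint-rule cross-check below at \(n=50\) confirms the exact identity \(\sqrt{n}\int_{-1}^1(1-x^2)^n dx = \sqrt{n}\,B(\frac12,n+1)\):

```python
import math

def scaled_beta(n):
    # sqrt(n) * B(1/2, n+1) = sqrt(n) * Gamma(1/2) Gamma(n+1) / Gamma(n+3/2)
    return math.exp(0.5 * math.log(n) + math.lgamma(0.5)
                    + math.lgamma(n + 1) - math.lgamma(n + 1.5))

def scaled_integral(n, steps=200000):
    # direct midpoint-rule evaluation of sqrt(n) * integral_{-1}^{1} (1-x^2)^n dx
    h = 2.0 / steps
    s = sum((1 - (-1 + (k + 0.5) * h) ** 2) ** n for k in range(steps))
    return math.sqrt(n) * h * s

assert abs(scaled_beta(10**6) - math.sqrt(math.pi)) < 1e-3   # the limit
assert abs(scaled_integral(50) - scaled_beta(50)) < 1e-4     # the identity
```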


Finally, we prove an identity that has nothing to do with Stirling's formula: \[ \Gamma\left(\frac{1}{n}\right)\Gamma\left(\frac{2}{n}\right)\cdots\Gamma\left(\frac{n-1}{n}\right)=\frac{(2\pi)^{\frac{n-1}{2}}}{\sqrt{n}}. \] By the fundamental theorem of algebra, we immediately have \[ 1+x+x^2+\cdots+x^{n-1}=\prod_{k=1}^{n-1}\left(x-e^{\frac{2k\pi i}{n}}\right). \] On the other hand, \[ 1+x+x^2+\cdots+x^{n-1}=\frac{x^{n}-1}{x-1}. \] Setting \(x=1\), we get \[ \begin{aligned} n=\prod_{k=1}^{n-1}\left(1-e^{\frac{2k\pi i}{n}}\right)&=\prod_{k=1}^{n-1}\left( e^{-\frac{k\pi i}{n}}-e^{\frac{k \pi i}{n}} \right)e^{\frac{k \pi i}{n}} \\ &=\prod_{k=1}^{n-1}-2i\sin\frac{k\pi}{n}e^{\frac{k\pi i}{n}} \\ &=\left\vert \prod_{k=1}^{n-1}-2i\sin\frac{k\pi}{n}e^{\frac{k\pi i}{n}}\right\vert \\ &=2^{n-1}\prod_{k=1}^{n-1}\sin\frac{k\pi}{n} \end{aligned} \] (taking the modulus is harmless since \(n\) is a positive real number), that is, \[ \prod_{k=1}^{n-1}\sin\frac{k\pi}{n}=\frac{n}{2^{n-1}}. \]
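The sine-product identity is pleasant to verify directly:

```python
import math

# prod_{k=1}^{n-1} sin(k*pi/n) == n / 2^(n-1), checked for small n
for n in range(2, 40):
    prod = math.prod(math.sin(k * math.pi / n) for k in range(1, n))
    assert abs(prod - n / 2 ** (n - 1)) < 1e-12
```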

By Euler's reflection formula, for \(1 \leq k \leq n-1\) we have \[ \Gamma\left(\frac{k}{n}\right)\Gamma\left(\frac{n-k}{n}\right)=\frac{\pi}{\sin\frac{k\pi}{n}}=\frac{\pi}{\sin\frac{(n-k)\pi}{n}}. \] If \(n\) is odd, then by the result above we obtain \[ \Gamma\left(\frac{1}{n}\right)\Gamma\left(\frac{2}{n}\right)\cdots\Gamma\left(\frac{n-1}{n}\right)=\prod_{k=1}^{\frac{n-1}{2}}\Gamma\left(\frac{n-k}{n}\right)\Gamma\left(\frac{k}{n}\right)=\frac{\pi^{(n-1)/2}}{\prod_{k=1}^{(n-1)/2}\sin(k\pi/n)}. \] Here we have only used half of the values of \(k\). To use the other half, we simply swap \(k\) and \(n-k\), obtaining \[ \left[\Gamma\left(\frac{1}{n}\right)\Gamma\left(\frac{2}{n}\right)\cdots\Gamma\left(\frac{n-1}{n}\right)\right]^2=\frac{\pi^{n-1}}{n/2^{n-1}}, \] which is the desired result. If \(n\) is even, one only needs to take out the factor \(\Gamma(\frac{1}{2})=\sqrt{\pi}\) coming from \(k=n/2\) and treat the two remaining halves in the same way.
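The final Gamma-product identity can likewise be checked for both odd and even \(n\):

```python
import math

# prod_{k=1}^{n-1} Gamma(k/n) == (2*pi)^((n-1)/2) / sqrt(n)
for n in range(2, 20):
    prod = math.prod(math.gamma(k / n) for k in range(1, n))
    expected = (2 * math.pi) ** ((n - 1) / 2) / math.sqrt(n)
    assert abs(prod - expected) / expected < 1e-12
```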



\[ \lim_{n \to \infty}\left(1+\frac{1}{n}\right)^{n^2}\frac{n!}{n^n\sqrt{n}} \]

If we substitute \(n!\) directly via Stirling's formula, the result of this limit becomes transparent. \[ \begin{aligned} \lim_{n \to \infty}\left(1+\frac{1}{n}\right)^{n^2}\frac{n!}{n^n\sqrt{n}} &=\lim_{n \to \infty}\left(1+\frac{1}{n}\right)^{n^2}\frac{n!}{n^n\sqrt{n}}\frac{(n/e)^n\sqrt{2\pi n}}{n!} \\ &=\lim_{n \to \infty}\left(1+\frac{1}{n}\right)^{n^2}\frac{\sqrt{2\pi}}{e^n} \end{aligned} \] So it remains to find the limit of \((1+\frac{1}{n})^{n^2}e^{-n}\). But beware: do not jump to the conclusion that this limit is \(1\). Using the Taylor expansion, we get \[ \begin{aligned} \lim_{n \to \infty}\left(1+\frac{1}{n}\right)^{n^2}e^{-n}&=\lim_{n \to \infty}\exp\left(n^2\ln\left(1+\frac{1}{n}\right)-n\right) \\ &=\lim_{n \to \infty} \exp\left(n^2\left(\frac{1}{n}-\frac{1}{2n^2}+o\left(\frac{1}{n^2}\right)\right)-n\right) \\ &=\frac{1}{\sqrt{e}}. \end{aligned} \] Hence the original limit equals \(\sqrt\frac{2\pi}{e}\).
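A numerical check confirms that the limit is \(\sqrt{2\pi/e}\approx 1.5197\) rather than \(\sqrt{2\pi}\) (we work in log space, using `math.log1p` to keep \(n^2\ln(1+1/n)\) accurate):

```python
import math

def log_term(n):
    # log of (1 + 1/n)^(n^2) * n! / (n^n * sqrt(n))
    return (n * n * math.log1p(1.0 / n) + math.lgamma(n + 1)
            - n * math.log(n) - 0.5 * math.log(n))

limit = math.sqrt(2 * math.pi / math.e)
assert abs(math.exp(log_term(10**5)) - limit) < 1e-3
```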

\[ \lim_{n\to\infty} \sqrt{n}\prod_{k=1}^{n}\frac{e^{1-\frac{1}{k}}}{\left(1+\frac{1}{k}\right)^k} \]

Note that multiplying the \(n\) numerators gives \(\exp(n-1-\frac{1}{2}-\cdots-\frac{1}{n})\). Since the harmonic series diverges, to obtain convergence we are naturally led to the Euler constant \(\gamma=\lim_{n\to\infty}\left(1+\frac{1}{2}+\cdots+\frac{1}{n}-\ln{n}\right)\). There seems to be no direct way to simplify the denominator: we know that \((1+1/k)^k\) tends to \(e\), but that does not seem to help here. So let us first expand and simplify the denominator: \[ \prod_{k=1}^{n}\left(1+\frac{1}{k}\right)^k=\frac{2^1\cdot3^2\cdot4^3\cdot5^4\cdots{(n+1)^n}}{1^1\cdot2^2\cdot3^3\cdot4^4\cdots{n^n}}=\frac{(n+1)^n}{n!}. \] The original limit can therefore be written as \[ \lim_{n \to \infty}\sqrt{n}\frac{n!e^{n-1-\frac{1}{2}-\cdots-\frac{1}{n}}}{(n+1)^n}. \] Now Stirling's formula applies directly: \[ \begin{aligned} \lim_{n \to \infty}\sqrt{n}\frac{n!e^{n-1-\frac{1}{2}-\cdots-\frac{1}{n}}}{(n+1)^n} &=\lim_{n\to\infty}\sqrt{n}\frac{n!e^{n-1-\frac{1}{2}-\cdots-\frac{1}{n}}}{(n+1)^n}\cdot\frac{(n/e)^n\sqrt{2\pi n}}{n!} \\ &=\sqrt{2\pi}\lim_{n\to\infty}\frac{n^{n+1}}{(n+1)^n}e^{-1-\frac{1}{2}-\cdots-\frac{1}{n}} \\ &=\sqrt{2\pi}\lim_{n\to\infty}\left(1+\frac{1}{n}\right)^{-n}\cdot e^{\ln{n}}\cdot e^{-1-\frac{1}{2}-\frac{1}{3}-\cdots-\frac{1}{n}} \end{aligned} \] Since \(\lim_{n\to\infty}\left(1+\frac{1}{n}\right)^{-n}=e^{-1}\) and \(\lim_{n\to\infty}e^{\ln{n}-1-\frac{1}{2}-\frac{1}{3}-\cdots-\frac{1}{n}}=e^{-\gamma}\), the original limit is \(\frac{\sqrt{2\pi}}{e^{1+\gamma}}\).
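Checking this one numerically: the answer is \(\sqrt{2\pi}\,e^{-(1+\gamma)} \approx 0.5177\) (the value of \(\gamma\) below is the standard decimal expansion):

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def log_product(n):
    # log of sqrt(n) * prod_{k=1}^{n} e^{1 - 1/k} / (1 + 1/k)^k
    s = 0.5 * math.log(n)
    for k in range(1, n + 1):
        s += (1.0 - 1.0 / k) - k * math.log1p(1.0 / k)
    return s

limit = math.sqrt(2 * math.pi) / math.exp(1 + EULER_GAMMA)
assert abs(math.exp(log_product(10**5)) - limit) < 1e-3
```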

A proof of the ordinary Gleason-Kahane-Żelazko theorem for complex functionals

The Theorem

(Gleason-Kahane-Żelazko) If \(\phi\) is a complex linear functional on a unitary Banach algebra \(A\) such that \(\phi(e)=1\) and \(\phi(x) \neq 0\) for every invertible \(x \in A\), then \[ \phi(xy)=\phi(x)\phi(y) \] for all \(x,y \in A\); namely, \(\phi\) is a complex homomorphism.

Notations and remarks

Suppose \(A\) is a complex unitary Banach algebra and \(\phi: A \to \mathbb{C}\) is a linear functional which is not identically \(0\) (for convenience). If \[ \phi(xy)=\phi(x)\phi(y) \] for all \(x \in A\) and \(y \in A\), then \(\phi\) is called a complex homomorphism on \(A\). Note that a unitary Banach algebra (with \(e\) as its multiplicative unit) is also a ring, and so is \(\mathbb{C}\); in this case we may therefore say that \(\phi\) is a ring homomorphism. For such \(\phi\), we have an immediate proposition:

Proposition 0 \(\phi(e)=1\) and \(\phi(x) \neq 0\) for every invertible \(x \in A\).

Proof. Since \(\phi(e)=\phi(ee)=\phi(e)\phi(e)\), we have \(\phi(e)=0\) or \(\phi(e)=1\). If \(\phi(e)=0\) however, for any \(y \in A\), we have \(\phi(y)=\phi(ye)=\phi(y)\phi(e)=0\), which is an excluded case. Hence \(\phi(e)=1\).

For invertible \(x \in A\), note that \(\phi(xx^{-1})=\phi(x)\phi(x^{-1})=\phi(e)=1\). This can't happen if \(\phi(x)=0\). \(\square\)

The theorem reveals that Proposition \(0\) actually characterizes the complex homomorphisms (ring-homomorphisms) among the linear functionals (group-homomorphisms).

This theorem was proved by Andrew M. Gleason in 1967 and, independently, by J.-P. Kahane and W. Żelazko in 1968. Both papers dealt mainly with commutative Banach algebras; the non-commutative version, stated in terms of complex homomorphisms, is due to W. Żelazko. In this post we will follow Żelazko's paper.

Unfortunately, an instructive proof of this theorem is not easy to find on the Internet, which may be the reason why I write this post and why you read it.


Following definitions of Banach algebra and some logic manipulation, we have several equivalences worth noting.

Subspace and ideal version

(Stated by Gleason) Let \(M\) be a linear subspace of codimension one in a commutative Banach algebra \(A\) having an identity. Suppose that no element of \(M\) is invertible. Then \(M\) is an ideal.

(Stated by Kahane and Żelazko) A subspace \(X \subset A\) of codimension \(1\) is a maximal ideal if and only if it consists of non-invertible elements.

Spectrum version

(Stated by Kahane and Żelazko) Let \(A\) be a commutative complex Banach algebra with unit element. Then a functional \(f \in A^\ast\) is a multiplicative linear functional if and only if \(f(x) \in \sigma(x)\) holds for all \(x \in A\).

Here \(\sigma(x)\) denotes the spectrum of \(x\).

The connection

Clearly a maximal ideal contains no invertible element (if it contained one, it would contain \(e\), hence be the whole ring), and it is the kernel of some complex homomorphism, hence a subspace of codimension \(1\). So it suffices to show the converse: a subspace of codimension \(1\) consisting of non-invertible elements is a maximal ideal. For such a subspace \(X \subset A\), since \(e \notin X\), we may define a linear functional \(\phi\) with kernel \(X\) and \(\phi(e)=1\). Then \(\phi(x) \in \sigma(x)\) for all \(x \in A\): indeed, \(x-\phi(x)e \in X\) is not invertible, which is exactly the statement that \(\phi(x)\) lies in the spectrum of \(x\). As we will show, such a \(\phi\) has to be a complex homomorphism, and \(X\), being its kernel, is a maximal ideal.

Tools to prove the theorem

Lemma 0 Suppose \(A\) is a unitary Banach algebra, \(x \in A\), \(\lVert x \rVert<1\), then \(e-x\) is invertible.

This lemma can be found in any functional analysis book introducing Banach algebra.

Lemma 1 Suppose \(f\) is an entire function of one complex variable, \(f(0)=1\), \(f'(0)=0\), and \[ 0<|f(\lambda)| \leq e^{|\lambda|} \] for all complex \(\lambda\), then \(f(\lambda)=1\) for all \(\lambda \in \mathbb{C}\).

Since \(f\) has no zero, there is an entire function \(g\) such that \(f=\exp(g)\); the conditions \(f(0)=1\) and \(f'(0)=0\) allow us to choose \(g\) with \(g(0)=g'(0)=0\), and \(0<|f(\lambda)| \leq e^{|\lambda|}\) gives \(\Re g(\lambda) \leq |\lambda|\). It can be shown that \(g=0\). Indeed, if we put \[ h_r(\lambda) = \frac{r^2g(\lambda)}{\lambda^2[2r-g(\lambda)]} \] then we see \(h_r\) is holomorphic in the open disk centred at \(0\) with radius \(2r\). Besides, \(|h_r(\lambda)| = |g(\lambda)|/|2r-g(\lambda)| \leq 1\) if \(|\lambda|=r\), since \(\Re g(\lambda) \leq r\) there. By the maximum modulus theorem, we have \[ |h_r(\lambda)| \leq 1 \] whenever \(|\lambda| \leq r\). Fix \(\lambda\) and let \(r \to \infty\); by the definition of \(h_r(\lambda)\), we must have \(g(\lambda)=0\).

Jordan homomorphism

A mapping \(\phi\) from one ring \(R\) to another ring \(R'\) is said to be a Jordan homomorphism from \(R\) to \(R'\) if \[ \phi(a+b)=\phi(a)+\phi(b) \] and \[ \phi(ab+ba)=\phi(a)\phi(b)+\phi(b)\phi(a). \] It's of course clear that every homomorphism is Jordan. Note if \(R'\) is not of characteristic \(2\), the second identity is equivalent to \[ \phi(a^2)=\phi(a)^2. \] To show the equivalence, one lets \(b=a\) in the first case and puts \(a+b\) in place of \(a\) in the second case.

Since in this case \(R=A\) and \(R'=\mathbb{C}\), the latter of which is commutative, we also write \[ \phi(ab+ba)=2\phi(a)\phi(b). \] As we will show, the \(\phi\) in the theorem is a Jordan homomorphism.

The proof

We will follow an unusual approach. By repeatedly 'downgrading' the goal, one will see this algebraic problem transformed, quite neatly, into a purely analytic one.

To begin with, let \(N\) be the kernel of \(\phi\).

Step 1 - It suffices to prove that \(\phi\) is a Jordan homomorphism

If \(\phi\) is a complex homomorphism, it is immediate that \(\phi\) is a Jordan homomorphism. Conversely, if \(\phi\) is Jordan, we have \[ \phi(xy+yx) =2\phi(x)\phi(y). \] If \(x\in N\), the right-hand side becomes \(0\), and therefore \[ xy+yx \in N \quad \text{if } x \in N, y \in A. \] Consider the identity (valid in any ring) \[ (xy-yx)^2+(xy+yx)^2=2[x(yxy)+(yxy)x]. \]

Therefore, for \(x \in N\), using the Jordan property and \(xy+yx \in N\), \[ \begin{aligned} \phi\big((xy-yx)^2+(xy+yx)^2\big)&=\phi\big((xy-yx)^2\big)+\phi\big((xy+yx)^2\big) \\ &=\phi(xy-yx)^2+\phi(xy+yx)^2 \\ &= \phi(xy-yx)^2, \end{aligned} \] while on the other hand \[ \phi\big((xy-yx)^2+(xy+yx)^2\big)=2\phi\big(x(yxy)+(yxy)x\big)=0, \] since \(x \in N\) and \(yxy \in A\) give \(x(yxy)+(yxy)x \in N\). Therefore \(\phi(xy-yx)=0\) and \[ xy-yx \in N \] if \(x \in N\) and \(y \in A\). Further we see \[ (xy-yx)+(xy+yx)=2xy \in N \quad \text {and}\quad (xy+yx)-(xy-yx) = 2yx \in N, \] which implies that \(N\) is an ideal. This may remind you of this classic diagram (we will not use it since it is additive though):

Ring Homomorphism

For \(x,y \in A\), we have \(x \in \phi(x)e+N\) and \(y \in \phi(y)e+N\). Since \(N\) is an ideal, \(xy \in \phi(x)\phi(y)e+N\), and therefore \[ \phi(xy)=\phi(x)\phi(y). \]

Step 2 - It suffices to prove that \(\phi(a^2)=0\) if \(\phi(a)=0\).

Again, if \(\phi\) is Jordan, we have \(\phi(x^2)=\phi(x)^2\) for all \(x \in A\). Conversely, suppose \(\phi(a^2)=0\) for all \(a \in N\). Every \(x \in A\) can be written as \[ x=\phi(x)e+a \] with \(a \in N\). Therefore \[ \phi(x^2)=\phi((\phi(x)e+a)^2)=\phi(x)^2+2\phi(x)\phi(a)+\phi(a^2)=\phi(x)^2, \] which shows that \(\phi\) is Jordan.

Step 3 - It suffices to show that the following function is constant

Fix \(a \in N\), assume \(\lVert a \rVert = 1\) without loss of generality, and define \[ f(\lambda)=\sum_{n=0}^{\infty}\frac{\phi(a^n)}{n!}\lambda^n \] for all complex \(\lambda\). If this function is constant (lemma 1), we immediately have \(f''(0)=\phi(a^2)=0\). This is purely a complex analysis problem however.

Step 4 - It suffices to describe the behaviour of an entire function

Note that in the definition of \(f\) we have \[ \lvert \phi(a^n) \rvert \leq \lVert \phi \rVert \lVert a^n \rVert \leq \lVert \phi \rVert \lVert a \rVert^n=\lVert \phi \rVert. \] So we need the norm of \(\phi\) to be finite, which ensures that \(f\) is entire. Suppose, for contradiction, that \(\lVert e-a \rVert < 1\) for some \(a \in N\); by lemma 0, \(a=e-(e-a)\) is then invertible, which is impossible. Hence \(\lVert e-a \rVert \geq 1\) for all \(a \in N\). Consequently, for nonzero \(\lambda \in \mathbb{C}\) and \(a \in N\), we have \(\lambda^{-1}a \in N\) and the following inequality: \[ \begin{aligned} \lVert \lambda e-a \rVert = |\lambda|\lVert e-\lambda^{-1}a \rVert &\geq|\lambda| \\ &= |\phi(\lambda e)-\phi(a)| \\ &= |\phi(\lambda e-a)| \end{aligned} \] Therefore \(\phi\) is continuous with norm at most \(1\). The continuity of \(\phi\) was not assumed at the beginning, but proved here.

For \(f\) we have some immediate facts. Since each coefficient in the series of \(f\) has modulus at most \(1\), \(f\) is entire, with \(f'(0)=\phi(a)=0\). Also, since \(\phi\) has norm at most \(1\) and \(\lVert a \rVert=1\), we have \[ |f(\lambda)|=\left|\sum_{n=0}^{\infty}\frac{\phi(a^n)}{n!}\lambda^n\right| \leq \sum_{n=0}^{\infty}\frac{|\lambda^n|}{n!}=e^{|\lambda|}. \] All we need in the end is to show that \(f(\lambda) \neq 0\) for all \(\lambda \in \mathbb{C}\).

The series \[ E(\lambda)=\exp(a\lambda)=\sum_{n=0}^{\infty}\frac{(\lambda a)^n}{n!} \] converges since \(\lVert a \rVert=1\). The continuity of \(\phi\) now shows \[ f(\lambda)=\phi(E(\lambda)). \] Note \[ E(-\lambda)E(\lambda)=\left(\sum_{n=0}^{\infty}\frac{(-\lambda a)^n}{n!}\right)\left(\sum_{n=0}^{\infty}\frac{(\lambda a)^n}{n!}\right)=e. \] Hence \(E(\lambda)\) is invertible for all \(\lambda \in \mathbb{C}\), and therefore \(f(\lambda)=\phi(E(\lambda)) \neq 0\). By lemma 1, \(f \equiv 1\). The proof is completed by reversing the steps. \(\square\)

References / Further reading

  • Walter Rudin, Real and Complex Analysis
  • Walter Rudin, Functional Analysis
  • Andrew M. Gleason, A Characterization of Maximal Ideals
  • J.-P. Kahane and W. Żelazko, A Characterization of Maximal Ideals in Commutative Banach Algebras
  • W. Żelazko, A Characterization of Multiplicative Linear Functionals in Complex Banach Algebras
  • I. N. Herstein, Jordan Homomorphisms

The Big Three Pt. 5 - The Hahn-Banach Theorem (Dominated Extension)

About this post

The Hahn-Banach theorem has been a central tool of functional analysis and therefore comes in a wide variety of forms, many of which have numerous uses in other fields of mathematics; it is not possible to cover all of them. In this post we are covering two 'abstract enough' results, which are sometimes called the dominated extension theorems. Both of them will be discussed for real vector spaces with no topology endowed, so they apply in particular to any topological vector space.

Another interesting thing is that we will be using the axiom of choice, or whichever equivalent you like, for example Zorn's lemma or the well-ordering principle. Before everything else, we need to examine a few more properties of vector spaces.

Vector space

It's obvious that every complex vector space is also a real vector space. Suppose \(X\) is a complex vector space, and we shall give the definition of real-linear and complex-linear functionals.

An additive functional \(\Lambda\) on \(X\) is called real-linear (complex-linear) if \(\Lambda(\alpha x)=\alpha\Lambda(x)\) for every \(x \in X\) and for every real (complex) scalar \(\alpha\).

For *-linear functionals, we have two important but easy theorems.

If \(u\) is the real part of a complex-linear functional \(f\) on \(X\), then \(u\) is real-linear and \[ f(x)=u(x)-iu(ix) \quad (x \in X). \]

Proof. Write \(f(x)=u(x)+iv(x)\); it suffices to express \(v(x)\) in terms of \(u\). Since \[ if(x)=iu(x)-v(x), \] we see \(\Im(f(x))=v(x)=-\Re(if(x))\). Therefore \[ f(x)=u(x)-i\Re(if(x))=u(x)-i\Re(f(ix)), \] and since \(\Re(f(ix))=u(ix)\), we get \[ f(x)=u(x)-iu(ix). \] To show that \(u\) is real-linear, note that \[ f(x+y)=u(x+y)+iv(x+y)=f(x)+f(y)=u(x)+u(y)+i(v(x)+v(y)). \] Therefore \(u(x+y)=u(x)+u(y)\). The same argument applies to multiplication by a real scalar \(\alpha\). \(\square\)

Conversely, we are able to generate a complex-linear functional by a real one.

If \(u\) is a real-linear functional, then \(f(x)=u(x)-iu(ix)\) is a complex-linear functional.

Proof. Direct computation. \(\square\)

Suppose now \(X\) is a complex topological vector space. Then a complex-linear functional on \(X\) is continuous if and only if its real part is continuous, and every continuous real-linear \(u: X \to \mathbb{R}\) is the real part of a unique continuous complex-linear functional \(f\).

Sublinear, seminorm

A sublinear functional is 'almost' linear and also 'almost' a norm. Explicitly, we call \(p: X \to \mathbb{R}\) a sublinear functional when it satisfies \[ \begin{aligned} p(x+y) &\leq p(x)+p(y) \\ p(tx) &= tp(x) \\ \end{aligned} \] for all \(x,y \in X\) and all \(t \geq 0\). As one can see, if \(X\) is normable, then \(p(x)=\lVert x \rVert\) is a sublinear functional. One should not confuse this with a semilinear functional, where no inequality is involved. Another thing worth noting is that \(p\) is not required to be nonnegative.

A seminorm on a vector space \(X\) is a real-valued function \(p\) on \(X\) such that \[ \begin{aligned} p(x+y) &\leq p(x)+p(y) \\ p(\alpha x)&=|\alpha|p(x) \end{aligned} \] for all \(x,y \in X\) and scalar \(\alpha\).

Obviously a seminorm is also a sublinear functional. For the connection between norms and seminorms, one should note that \(p\) is a norm if and only if, in addition, \(p(x) \neq 0\) whenever \(x \neq 0\).
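As a tiny illustration (my own example, not from the references): on \(\mathbb{R}^2\), the function \(p(x_1,x_2)=|x_1|\) is a seminorm but not a norm, since it vanishes at the nonzero vector \((0,1)\). A numerical check of the axioms:

```python
import itertools

def p(v):
    # p(x1, x2) = |x1|: subadditive and absolutely homogeneous on R^2
    return abs(v[0])

pts = [(-2.0, 1.5), (0.0, 3.0), (1.0, -4.0), (0.5, 0.5)]

# triangle inequality: p(x + y) <= p(x) + p(y)
for x, y in itertools.product(pts, pts):
    s = (x[0] + y[0], x[1] + y[1])
    assert p(s) <= p(x) + p(y) + 1e-12

# absolute homogeneity: p(a*x) == |a| * p(x)
for a in (-3.0, -1.0, 0.0, 2.5):
    for x in pts:
        ax = (a * x[0], a * x[1])
        assert abs(p(ax) - abs(a) * p(x)) < 1e-12

assert p((0.0, 1.0)) == 0.0  # vanishes on a nonzero vector: not a norm
```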

Dominated extension theorems

These are the results that will be covered in this post. Generally speaking, we are able to extend a functional defined on a subspace to the whole space, as long as it is dominated by a sublinear functional. This is reminiscent of the dominated convergence theorem, which states that if a convergent sequence of measurable functions is dominated by an integrable function, then the convergence also holds under the integral sign.

(Hahn-Banach) Suppose

  1. \(M\) is a subspace of a real vector space \(X\),
  2. \(f: M \to \mathbb{R}\) is linear and \(f(x) \leq p(x)\) on \(M\), where \(p\) is a sublinear functional on \(X\).

Then there exists a linear \(\Lambda: X \to \mathbb{R}\) such that \[ \Lambda(x)=f(x) \] for all \(x \in M\) and \[ -p(-x) \leq \Lambda(x) \leq p(x) \] for all \(x \in X\).

Step 1 - Extending the function by one dimension

With that being said, if \(f(x)\) is dominated by a sublinear functional, then we are able to extend this functional to the whole space with a relatively proper range.

Proof. If \(M=X\) we have nothing to do. So suppose now \(M\) is a nontrivial proper subspace of \(X\). Choose \(x_1 \in X-M\) and define \[ M_1=\{x+tx_1:x \in M,t \in \mathbb{R}\}. \] It's easy to verify that \(M_1\) is a linear subspace of \(X\) (warning again: no topology is endowed). Now we will use the properties of sublinear functionals.

Since \[ f(x)+f(y)=f(x+y) \leq p(x+y) \leq p(x-x_1)+p(x_1+y) \] for all \(x,y \in M\), we have \[ f(x)-p(x-x_1) \leq p(x_1+y) -f(y). \] Let \[ \alpha=\sup_{x}\{f(x)-p(x-x_1):x \in M\}. \] By definition, we naturally get \[ f(x)-\alpha \leq p(x-x_1) \] and \[ f(y)+\alpha \leq p(x_1+y). \] Define \(f_1\) on \(M_1\) by \[ f_1(x+tx_1)=f(x)+t\alpha. \] So when \(x +tx_1 \in M\), we have \(t=0\), and therefore \(f_1=f\).

To show that \(f_1 \leq p\) on \(M_1\), note that for \(t>0\) we have \[ f(x/t)-\alpha \leq p(x/t-x_1); \] multiplying both sides by \(t\) yields \[ f(x)-t\alpha=f_1(x-tx_1)\leq p(x-tx_1). \] Similarly, \[ f(y/t)+\alpha \leq p(y/t+x_1), \] and therefore \[ f(y)+t\alpha=f_1(y+tx_1) \leq p(y+tx_1). \] Hence \(f_1 \leq p\).
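To see step 1 in action, here is a small numerical sketch (the space, functional, and grid are illustrative choices of mine, not from the proof): take \(X=\mathbb{R}^2\), \(M=\{(t,0)\}\), \(f(t,0)=t\), \(p\) the Euclidean norm, and \(x_1=(0,1)\). We approximate \(\alpha\) by its supremum over a grid and then verify the domination \(f_1 \leq p\) on sample points of \(M_1\).

```python
import math

def f(t):
    # f on M = {(t, 0)}: f(t, 0) = t
    return t

def p(v):
    # the dominating sublinear functional: the Euclidean norm
    return math.hypot(v[0], v[1])

# alpha = sup_{x in M} f(x) - p(x - x1) with x1 = (0, 1), over a grid of t
alpha = max(f(t) - p((t, -1.0)) for t in [i * 10.0 for i in range(-10000, 10001)])
# here the supremum is 0, approached as t -> +infinity, so alpha is near 0
assert -1e-3 < alpha <= 0.0

def f1(v):
    # the one-step extension: f1(x + t*x1) = f(x) + t*alpha
    return v[0] + v[1] * alpha

# domination f1 <= p on sample points of M1 (up to grid error in alpha)
for a in (-3.0, -0.5, 0.0, 1.0, 7.0):
    for b in (-2.0, 0.0, 0.5, 4.0):
        assert f1((a, b)) <= p((a, b)) * (1 + 1e-6) + 1e-9
```

The small multiplicative slack accounts for \(\alpha\) being a grid approximation of the true supremum.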

Step 2 - An application of Zorn's lemma

Side note: Why Zorn's lemma

It seems that we can keep using step 1 to extend \(M\) to ever larger subspaces, but the process need not reach \(X\) after finitely many, or even countably many, steps. (If \(X\) were finite dimensional, this would be merely a linear algebra problem.) This matches exactly what William Timothy Gowers said in his blog post:

If you are building a mathematical object in stages and find that (i) you have not finished even after infinitely many stages, and (ii) there seems to be nothing to stop you continuing to build, then Zorn’s lemma may well be able to help you.

-- How to use Zorn's lemma

And we will show that, as W. T. Gowers said,

If the resulting partial order satisfies the chain condition and if a maximal element must be a structure of the kind one is trying to build, then the proof is complete.

To apply Zorn's lemma, we need to construct a partially ordered set. Let \(\mathscr{P}\) be the collection of all ordered pairs \((M',f')\) where \(M'\) is a subspace of \(X\) containing \(M\) and \(f'\) is a linear functional on \(M'\) that extends \(f\) and satisfies \(f' \leq p\) on \(M'\). For example we have \[ (M,f) , (M_1,f_1) \in \mathscr{P}. \] The partial order \(\leq\) is defined as follows. By \((M',f') \leq (M'',f'')\), we mean \(M' \subset M''\) and \(f' = f''\) on \(M'\). Obviously this is a partial order (you should be able to check this).

Suppose now \(\mathcal{F}\) is a chain (totally ordered subset) of \(\mathscr{P}\). We claim that \(\mathcal{F}\) has an upper bound (as required by Zorn's lemma). Let \[ M_0=\bigcup_{(M',f') \in \mathcal{F}}M' \] and \[ f_0(y)=f'(y) \] whenever \((M',f') \in \mathcal{F}\) and \(y \in M'\); since \(\mathcal{F}\) is totally ordered, \(f_0\) is well defined. It's easy to verify that \((M_0,f_0)\) is the upper bound we are looking for. Since the chain \(\mathcal{F}\) is arbitrary, by Zorn's lemma there exists a maximal element \((M^\ast,f^\ast)\) in \(\mathscr{P}\). If \(M^* \neq X\), then according to step 1 we could extend \(M^\ast\) further, contradicting the maximality of \(M^\ast\). So \(M^\ast=X\), and \(\Lambda\) is defined to be \(f^\ast\). Then \(\Lambda \leq p\) on \(X\), and by the linearity of \(\Lambda\) we also see \[ -p(-x) \leq -\Lambda(-x)=\Lambda(x). \] The theorem is proved. \(\square\)

How this proof is constructed

This is a classic application of Zorn's lemma (well-ordering principle, or Hausdorff maximality theorem). First, we showed that we are able to extend \(M\) and \(f\). But since we do not know the dimension or other properties of \(X\), it's not easy to control the extension which finally 'converges' to \((X,\Lambda)\). However, Zorn's lemma saved us from this random exploration: Whatever happens, the maximal element is there, and take it to finish the proof.

Generalisation onto the complex field

Since an inequality appears in the theorem above, we need to be more careful with the verification.

(Bohnenblust-Sobczyk-Soukhomlinoff) Suppose \(M\) is a subspace of a vector space \(X\), \(p\) is a seminorm on \(X\), and \(f\) is a linear functional on \(M\) such that \[ |f(x)| \leq p(x) \] for all \(x \in M\). Then \(f\) extends to a linear functional \(\Lambda\) on \(X\) satisfying \[ |\Lambda (x)| \leq p(x) \] for all \(x \in X\).

Proof. If the scalar field is \(\mathbb{R}\), then we are done, since \(p(-x)=p(x)\) in this case (can you see why?). So we assume the scalar field is \(\mathbb{C}\).

Put \(u = \Re f\). By the dominated extension theorem, there is a real-linear functional \(U\) on \(X\) such that \(U(x)=u(x)\) on \(M\) and \(U \leq p\) on \(X\). Define \[ \Lambda(x)=U(x)-iU(ix); \] then \(\Lambda\) is complex-linear and \(\Lambda(x)=f(x)\) on \(M\).

To show that \(|\Lambda(x)| \leq p(x)\), we may assume \(\Lambda(x) \neq 0\) and take \(\alpha=\frac{|\Lambda(x)|}{\Lambda(x)}\). Then \(|\alpha|=1\), and since \(\Lambda(\alpha x)=|\Lambda(x)|\) is real, we have \[ |\Lambda(x)|=\Lambda(\alpha{x})=U(\alpha{x})\leq p(\alpha x)=p(x), \] using \(p(\alpha{x})=|\alpha|p(x)=p(x)\). \(\square\)

Extending Hahn-Banach theorem under linear transform

To end this post, we state a beautiful and useful extension of the Hahn-Banach theorem, which is done by R. P. Agnew and A. P. Morse.

(Agnew-Morse) Let \(X\) denote a real vector space and \(\mathcal{A}\) be a collection of linear maps \(A_\alpha: X \to X\) that commute, or namely \[ A_\alpha A_\beta=A_\beta A_\alpha \] for all \(A_\alpha,A_\beta \in \mathcal{A}\). Let \(p\) be a sublinear functional such that \[ p(A_\alpha{x})=p(x) \] for all \(A_\alpha \in \mathcal{A}\). Let \(Y\) be a subspace of \(X\) on which a linear functional \(f\) is defined such that

  1. \(f(y) \leq p(y)\) for all \(y \in Y\).
  2. For each mapping \(A\) and \(y \in Y\), we have \(Ay \in Y\).
  3. Under the hypothesis of 2, we have \(f(Ay)=f(y)\).

Then \(f\) can be extended to a functional \(\Lambda\) on \(X\) so that \(-p(-x) \leq \Lambda(x) \leq p(x)\) for all \(x \in X\), and \[ \Lambda(A_\alpha{x})=\Lambda(x) \] for all \(A_\alpha \in \mathcal{A}\).

To prove this theorem, we need to construct a sublinear functional that dominates \(f\). For the whole proof, see Functional Analysis by Peter Lax.

The series

Since there is no strong reason to write more posts on this topic, i.e. the three fundamental theorems of linear functional analysis, I think it's time to make a list of the series. It's been around half a year.

References / Further Readings

  1. Walter Rudin, Functional Analysis.
  2. Peter Lax, Functional Analysis.
  3. William Timothy Gowers, How to use Zorn's lemma.

A long exact sequence of cohomology groups (zig-zag and diagram-chasing)

Exterior differentiation

(This section is intended to introduce the background. Feel free to skip if you already know exterior differentiation.)

There are several useful tools for vector calculus on \(\mathbb{R}^3,\) namely gradient, curl, and divergence. It is possible to treat the gradient of a differentiable function \(f\) on \(\mathbb{R}^3\) at a point \(x_0\) as the Fréchet derivative at \(x_0\). But it does not work for curl and divergence at all. Fortunately there is another abstraction that works for all of them. It comes from differential forms.

Let \(x_1,\cdots,x_n\) be the linear coordinates on \(\mathbb{R}^n\) as usual. We define an algebra \(\Omega^{\ast}\) over \(\mathbb{R}\) generated by \(dx_1,\cdots,dx_n\) with the following relations: \[ \begin{cases} dx_idx_i=0 \\ dx_idx_j = -dx_jdx_i \quad i \neq j \end{cases} \] This is a vector space as well, and it's easy to derive that it has a basis given by \[ 1,dx_i,dx_idx_j,dx_idx_jdx_k,\cdots,dx_1\dots dx_n \] where \(i<j<k\). The \(C^{\infty}\) differential forms on \(\mathbb{R}^n\) are defined to be the tensor product \[ \Omega^*(\mathbb{R}^n)=\{C^{\infty}\text{ functions on }\mathbb{R}^n\} \otimes_\mathbb{R}\Omega^*. \] As can be shown, for \(\omega \in \Omega^{\ast}(\mathbb{R}^n)\), we have a unique representation \[ \omega=\sum f_{i_1\cdots i_k}dx_{i_1}\dots dx_{i_k}, \] and in this case we also say \(\omega\) is a \(C^{\infty}\) \(k\)-form on \(\mathbb{R}^n\) (for simplicity we also write \(\omega=\sum f_Idx_I\)). The space of all \(k\)-forms will be denoted by \(\Omega^k(\mathbb{R}^n)\). And naturally \(\Omega^{\ast}(\mathbb{R}^n)\) is graded, since \[ \Omega^{*}(\mathbb{R}^n)=\bigoplus_{k=0}^{n}\Omega^k(\mathbb{R}^n). \]

The operator \(d\)

But if we have \(\omega \in \Omega^0(\mathbb{R}^n)\), we see \(\omega\) is merely a \(C^{\infty}\) function. As taught in a multivariable calculus course, for the differential of \(\omega\) we have \[ d\omega=\sum_{i}\partial\omega/\partial x_idx_i \] and it turns out that \(d\omega\in\Omega^{1}(\mathbb{R}^n)\). This suggests the following generalization of the differential operator \(d\): \[ \begin{aligned} d:\Omega^{k}(\mathbb{R}^n) &\to \Omega^{k+1}(\mathbb{R}^n) \\ \omega &\mapsto d\omega \end{aligned} \] where \(d\omega\) is defined as follows. The case \(k=0\) is defined as usual (just the one above). For \(k>0\) and \(\omega=\sum f_I dx_I,\) \(d\omega\) is defined 'inductively' by \[ d\omega=\sum df_I dx_I. \] This \(d\) is the so-called exterior differentiation, which serves as the ultimate abstract extension of gradient, curl, divergence, etc. If we restrict ourselves to \(\mathbb{R}^3\), we see that these vector calculus tools arise naturally.
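The inductive definition can be prototyped numerically. Below is a sketch of my own (not a standard implementation of forms, just an illustration): a form is a dict mapping increasing index tuples to coefficient functions, partial derivatives are central differences, and \(dx_i\,dx_I\) is reordered into increasing indices with the sign of the sorting permutation. We check the formula for \(df\) and, anticipating the next section, that \(d(d\omega)=0\).

```python
def inversion_sign(seq):
    # sign of the permutation that sorts seq (entries assumed distinct)
    sign = 1
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            if seq[i] > seq[j]:
                sign = -sign
    return sign

def partial_diff(f, i, x, h=1e-5):
    # central-difference approximation of df/dx_i at the point x
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    return (f(xp) - f(xm)) / (2 * h)

def d(form, dim=3):
    # exterior derivative: d(f dx_I) = sum_i (df/dx_i) dx_i dx_I
    out = {}
    for I, f in form.items():
        for i in range(dim):
            if i in I:
                continue  # dx_i dx_i = 0
            J = tuple(sorted((i,) + I))
            sgn = inversion_sign((i,) + I)

            def g(x, f=f, i=i, sgn=sgn):
                return sgn * partial_diff(f, i, x)

            if J in out:  # accumulate coefficients landing on the same dx_J
                def g(x, a=out[J], b=g):
                    return a(x) + b(x)
            out[J] = g
    return out

# 0-form f = x * y^2 * z on R^3
f0 = {(): lambda x: x[0] * x[1] ** 2 * x[2]}
df = d(f0)    # expect y^2 z dx + 2xyz dy + x y^2 dz
ddf = d(df)   # expect 0 identically, i.e. d^2 = 0

pt = [0.7, -1.3, 0.4]
assert abs(df[(0,)](pt) - pt[1] ** 2 * pt[2]) < 1e-8
assert all(abs(c(pt)) < 1e-4 for c in ddf.values())
```

The vanishing of `ddf` is exactly the cancellation of mixed partials against the antisymmetry of the \(dx_i\), which the next section discusses.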

For functions, \[ df=\frac{\partial f}{\partial x}dx+\frac{\partial f}{\partial y}dy+\frac{\partial f}{\partial z}dz. \] For \(1\)-forms, \[ d(f_1dx+f_2dy+f_3dz)=\left(\frac{\partial f_3}{\partial y}-\frac{\partial f_2}{\partial z}\right)dydz-\left(\frac{\partial f_1}{\partial z}-\frac{\partial f_3}{\partial x}\right)dxdz+\left(\frac{\partial f_2}{\partial x}-\frac{\partial f_1}{\partial y}\right)dxdy. \] For \(2\)-forms, \[ d(f_1dydz-f_2dxdz+f_3dxdy)=\left(\frac{\partial f_1}{\partial x}+\frac{\partial f_2}{\partial y}+ \frac{\partial f_3}{\partial z}\right)dxdydz. \] The calculation is tedious but a nice exercise for understanding the definitions of \(d\) and \(\Omega^{\ast}\).

Conservative field - on the kernel and image of \(d\)

By elementary computation we are also able to show that \(d^2\omega=0\) for all \(\omega \in \Omega^{\ast}(\mathbb{R}^n)\) (hint: \(\frac{\partial^2 f}{\partial x_i \partial x_j}=\frac{\partial^2 f}{\partial x_j \partial x_i}\) but \(dx_idx_j=-dx_jdx_i\)). Now consider a vector field \(\overrightarrow{v}=(v_1,v_2)\) in dimension \(2\). If \(C\) is an arbitrary simple closed smooth curve in \(\mathbb{R}^2\), then we expect \[ \oint_C\overrightarrow{v}\cdot d\overrightarrow{r}=\oint_C v_1dx+v_2dy \] to be \(0\). If this happens (note that \(C\) is arbitrary), we say \(\overrightarrow{v}\) is a conservative field (path independent).

So when is a field conservative? It happens when there is a function \(f\) such that \[ \nabla f=\overrightarrow{v}=(v_1,v_2)=(\partial{f}/\partial{x},\partial{f}/\partial{y}), \] which is equivalent to saying that \[ df=v_1dx+v_2dy. \] If we use \(C^{\ast}\) to denote the region enclosed by \(C\), then by Green's theorem we have \[ \begin{aligned} \oint_C v_1dx+v_2dy&=\iint_{C^*}\left(\frac{\partial{v_2}}{\partial{x}}-\frac{\partial{v_1}}{\partial{y}}\right)dxdy \\ &=\iint_{C^*}d(v_1dx+v_2dy) \\ &=\iint_{C^*}d^2f \\ &=0. \end{aligned} \] If you translate what you learned in multivariable calculus (path independence) into the language of differential forms, you will see that the set of all conservative fields is precisely the image of \(d_0:\Omega^0(\mathbb{R}^2) \to \Omega^1(\mathbb{R}^2)\); moreover, these fields lie in the kernel of the next map \(d_1:\Omega^1(\mathbb{R}^2) \to \Omega^2(\mathbb{R}^2)\). These \(d\)'s are naturally homomorphisms, so it is natural to discuss the factor group. But before that, we need some terminology.
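As a numerical sanity check (with a hypothetical potential \(f(x,y)=e^x\sin y\) of our choosing), the circulation of a gradient field around the unit circle vanishes, while the non-conservative field \((-y,x)\) picks up twice the enclosed area by Green's theorem:

```python
# Numerical check (not a proof): a gradient field has zero circulation
# around the unit circle, while (-y, x) picks up 2 * (enclosed area).
import numpy as np

t = np.linspace(0.0, 2.0 * np.pi, 20001)
xs, ys = np.cos(t), np.sin(t)  # the unit circle, parametrized by t

def circulation(v1, v2):
    # oint_C v1 dx + v2 dy, with dx = -sin(t) dt and dy = cos(t) dt
    integrand = v1(xs, ys) * (-np.sin(t)) + v2(xs, ys) * np.cos(t)
    return float(np.sum(0.5 * (integrand[:-1] + integrand[1:]) * np.diff(t)))

# conservative: v = grad f for the (arbitrarily chosen) f(x, y) = exp(x) sin(y)
grad_field = circulation(lambda x, y: np.exp(x) * np.sin(y),
                         lambda x, y: np.exp(x) * np.cos(y))
# not conservative: dv2/dx - dv1/dy = 2 for v = (-y, x)
rot_field = circulation(lambda x, y: -y, lambda x, y: x)

print(abs(grad_field) < 1e-8, abs(rot_field - 2 * np.pi) < 1e-8)  # True True
```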

de Rham complex and de Rham cohomology group

The complex \(\Omega^{\ast}(\mathbb{R}^n)\) together with \(d\) is called the de Rham complex on \(\mathbb{R}^n\). Now consider the sequence \[ \Omega^0(\mathbb{R}^n)\xrightarrow{d_0}\Omega^1(\mathbb{R}^n)\xrightarrow{d_1}\cdots\xrightarrow{d_{n-2}}\Omega^{n-1}(\mathbb{R}^n)\xrightarrow{d_{n-1}}\Omega^{n}(\mathbb{R^n}). \] We say \(\omega \in \Omega^k(\mathbb{R}^n)\) is closed if \(d_k\omega=0\), or equivalently, \(\omega \in \ker d_k\). Dually, we say \(\omega\) is exact if there exists some \(\mu \in \Omega^{k-1}(\mathbb{R}^n)\) such that \(d\mu=\omega\), that is, \(\omega \in \operatorname{im}d_{k-1}\). Of course all \(d_k\)'s can be written as \(d\) but the index makes it easier to understand. Instead of doing integration or differentiation, which is 'uninteresting', we are going to discuss the abstract structure of it.

The \(k\)-th de Rham cohomology in \(\mathbb{R}^n\) is defined to be the factor space \[ H_{DR}^{k}(\mathbb{R}^n)=\frac{\ker d_k}{\operatorname{im} d_{k-1}}. \] As an example, note that by the fundamental theorem of calculus, every \(1\)-form is exact, therefore \(H_{DR}^1(\mathbb{R})=0\).

Since the de Rham complex is a special case of a differential complex, and the extra structure of the de Rham complex plays no critical role in what follows, we will discuss the algebraic structure of differential complexes directly.

The long exact sequence of cohomology groups

We are going to show that a short exact sequence of complexes gives rise to a long exact sequence of cohomology groups. For convenience, let us first recall some basic definitions.

Exact sequence

A sequence of vector spaces (or groups) \[ \cdots \rightarrow G_{k-1} \xrightarrow{f_{k-1}} G_k \xrightarrow{f_k} G_{k+1} \xrightarrow{f_{k+1}}\cdots \] is said to be exact if the image of \(f_{k-1}\) is the kernel of \(f_k\) for all \(k\). Sometimes we need to discuss an extremely short one, \[ 0 \rightarrow A \xrightarrow{f} B \xrightarrow{g} C \rightarrow 0, \] called a short exact sequence. As one can check, exactness here means precisely that \(f\) is injective, \(g\) is surjective, and \(\operatorname{im}f=\ker g\).

Differential complex

A direct sum of vector spaces \(C=\bigoplus_{k \in \mathbb{Z}}C^k\) is called a differential complex if there are homomorphisms \[ \cdots \rightarrow C^{k-1} \xrightarrow{d_{k-1}} C^k \xrightarrow{d_k} C^{k+1} \xrightarrow{d_{k+1}}\cdots \] such that \(d_{k}d_{k-1}=0\). Sometimes we write \(d\) instead of \(d_{k}\), since this differential operator of \(C\) is universal; with this convention we may simply say that \(d^2=0\). The cohomology of \(C\) is the direct sum of vector spaces \(H(C)=\bigoplus_{k \in \mathbb{Z}}H^k(C)\), where \[ H^k(C)=\frac{\ker d_{k}}{\operatorname{im}d_{k-1}}. \] A map \(f: A \to B\), where \(A\) and \(B\) are differential complexes, is called a chain map if we have \(fd_A=d_Bf\).
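These definitions are concrete enough to compute with. In the toy example below (ours, not from the text), we take the cochain complex of a triangle, a combinatorial circle, with \(C^0=\mathbb{R}^3\) (vertices) and \(C^1=\mathbb{R}^3\) (edges), and read off \(\dim H^k=\dim\ker d_k-\operatorname{rank}d_{k-1}\) by linear algebra:

```python
# Cohomology of a finite differential complex via rank-nullity.
# Toy model: cochain complex of a triangle (a combinatorial circle),
# C^0 = R^3 (vertices a, b, c), C^1 = R^3 (edges ab, bc, ca).
import numpy as np

# coboundary d0: (d0 f)(u -> v) = f(v) - f(u)
d0 = np.array([[-1.0, 1.0, 0.0],   # edge ab
               [0.0, -1.0, 1.0],   # edge bc
               [1.0, 0.0, -1.0]])  # edge ca
d1 = np.zeros((1, 3))              # there are no 2-cells, so d1 = 0

assert np.allclose(d1 @ d0, 0)     # the defining relation d^2 = 0

rank_d0 = np.linalg.matrix_rank(d0)
rank_d1 = np.linalg.matrix_rank(d1)
dim_H0 = 3 - rank_d0               # dim ker d0 (the constant functions)
dim_H1 = (3 - rank_d1) - rank_d0   # dim ker d1 - rank d0

print(dim_H0, dim_H1)  # 1 1, the cohomology of a circle
```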

The sequence

Now consider a short exact sequence of differential complexes \[ 0 \rightarrow A \xrightarrow{f} B \xrightarrow{g} C \rightarrow 0 \] where both \(f\) and \(g\) are chain maps (this is important). Then there exists a long exact sequence \[ \cdots\rightarrow H^q(A) \xrightarrow{f^*} H^{q}(B) \xrightarrow{g^*} H^q(C)\xrightarrow{d^{*}}H^{q+1}(A) \xrightarrow{f^*}\cdots. \] Here, \(f^{\ast}\) and \(g^{\ast}\) are the naturally induced maps. For \(c \in C^q\), \(d^{\ast}[c]\) is defined to be the cohomology class \([a]\), where \(a \in A^{q+1}\) satisfies \(f(a)=db\) for some \(b \in B^q\) with \(g(b)=c\). The sequence can be described using the two-layer commutative diagram below.


The long exact sequence is the purple one (now you see why people call this the zig-zag lemma). This sequence is 'based on' the blue diagram, which can naturally be considered an expansion of the short exact sequence. The method used in the following proof is called diagram-chasing, whose importance has already been stressed by Professor James Munkres: master this. We will be using the properties of almost every homomorphism and group appearing in this commutative diagram to trace the elements.


First, we give a precise definition of \(d^{\ast}\). For a closed \(c \in C^q\), by the surjectivity of \(g\) (note that the sequence is exact), there exists some \(b \in B^q\) such that \(g(b)=c\). Since \(g(db)=d(g(b))=dc=0\), we see that \(db \in B^{q+1}\) lies in \(\ker g\). By the exactness of the sequence, \(db \in \operatorname{im}{f}\); that is, there exists some \(a \in A^{q+1}\) such that \(f(a)=db\). Further, \(a\) is closed, since \[ f(da)=d(f(a))=d^2b=0 \] and \(f\) has trivial kernel (which contains \(da\), forcing \(da=0\)).

\(d^{\ast}\) is therefore defined by \[ d^*[c]=[a], \] where \([\cdot]\) means "the cohomology class of".

But we must verify that \(d^{\ast}\) is a well-defined homomorphism. Let \(c_q\) and \(c_q'\) be two closed forms in \(C^q\). To show \(d^{\ast}\) is well-defined, suppose \([c_q]=[c_q']\) (i.e. they are cohomologous). Choose \(b_q\) and \(b_q'\) so that \(g(b_q)=c_q\) and \(g(b_q')=c_q'\). Accordingly, also pick \(a_{q+1}\) and \(a_{q+1}'\) such that \(f(a_{q+1})=db_q\) and \(f(a_{q+1}')=db_q'\). By the definition of \(d^{\ast}\), we need to show that \([a_{q+1}]=[a_{q+1}']\).

Recall the properties of the factor group: \([c_q]=[c_q']\) if and only if \(c_q-c_q' \in \operatorname{im}d\). Therefore we can pick some \(c_{q-1} \in C^{q-1}\) such that \(c_q-c_q'=dc_{q-1}\). Again, by the surjectivity of \(g\), there is some \(b_{q-1}\) such that \(g(b_{q-1})=c_{q-1}\).

Note that \[ \begin{aligned} g(b_q-b_q'-db_{q-1})&=c_q-c_{q}'-g(db_{q-1}) \\ &=dc_{q-1}-d(g(b_{q-1})) \\ &=dc_{q-1}-dc_{q-1} \\ &= 0. \end{aligned} \] Therefore \(b_q-b_q'-db_{q-1} \in \operatorname{im} f\). We are able to pick some \(a_q \in A^{q}\) such that \(f(a_q)=b_q-b_q'-db_{q-1}\). But now we have \[ \begin{aligned} f(da_q)=df(a_q)&=d(b_q-b_q'-db_{q-1}) \\ &=db_q-db_q'-d^2b_{q-1} \\ &=db_q-db_q' \\ &=f(a_{q+1}-a_{q+1}'). \end{aligned} \] Since \(f\) is injective, we have \(da_q=a_{q+1}-a_{q+1}'\), which implies that \(a_{q+1}-a_{q+1}' \in \operatorname{im}d\). Hence \([a_{q+1}]=[a_{q+1}']\).

To show that \(d^{\ast}\) is a homomorphism, note that \(g(b_q+b_q')=c_q+c_q'\) and \(f(a_{q+1}+a_{q+1}')=d(b_q+b_q')\). Thus we have \[ d^*[c_q+c_q']=[a_{q+1}+a_{q+1}']. \] The latter equals \([a_{q+1}]+[a_{q+1}']\) since the canonical map is a homomorphism. Therefore \[ d^*[c_q+c_q']=d^*[c_q]+d^*[c_q']. \] Hence the long sequence exists; it remains to prove exactness. First we prove exactness at \(H^q(B)\). Pick \([b] \in H^q(B)\). If there is some \(a \in A^q\) such that \(f(a)=b\), then \(g(f(a))=0\). Therefore \(g^{\ast}[b]=g^{\ast}[f(a)]=[g(f(a))]=[0]\); hence \(\operatorname{im}f^* \subset \ker g^*\).

Conversely, suppose \(g^{\ast}[b]=[0]\); we shall show that there exists some \([a] \in H^q(A)\) such that \(f^{\ast}[a]=[b]\). Note that \(g^{\ast}[b]=[0]\) means \(g(b) \in \operatorname{im}d\), where \(d\) is the differential operator of \(C\) (why?). Therefore there exists some \(c_{q-1} \in C^{q-1}\) such that \(g(b)=dc_{q-1}\). Pick some \(b_{q-1}\) such that \(g(b_{q-1})=c_{q-1}\). Then we have \[ g(b-db_{q-1})=g(b)-d(g(b_{q-1}))=g(b)-dc_{q-1}=0. \]

Therefore \(f(a)=b-db_{q-1}\) for some \(a \in A^q\). Note that \(a\) is closed, since \[ f(da)=df(a)=d(b-db_{q-1})=db-d^2b_{q-1}=db=0 \] and \(f\) is injective. Here \(db=0\) because \(b\) is closed (\([b] \in H^q(B)\)). Furthermore, \[ f^*[a]=[f(a)]=[b-db_{q-1}]=[b]-[0]=[b]. \] Therefore \(\ker g^{\ast} \subset \operatorname{im} f^*\) as desired.

Now we prove exactness at \(H^q(C)\). First fix the notation: pick \([c_q] \in H^q(C)\); there exists some \(b_q\) such that \(g(b_q)=c_q\); choose \(a_{q+1}\) such that \(f(a_{q+1})=db_q\). Then \(d^{\ast}[c_q]=[a_{q+1}]\) by definition.

If \([c_q] \in \operatorname{im}g^{\ast}\), we may write \([c_q]=[g(b_q)]=g^{\ast}[b_q]\). Since \(b_q\) is closed (\([b_q] \in H^q(B)\)), we have \(f(a_{q+1})=db_q=0\); therefore \(d^{\ast}[c_q]=[a_{q+1}]=[0]\), since \(f\) is injective. Hence \(\operatorname{im}g^{\ast} \subset \ker d^{\ast}\).

Conversely, suppose \(d^{\ast}[c_q]=[0]\). By the definition of \(H^{q+1}(A)\), there is some \(a_q \in A^q\) such that \(da_q = a_{q+1}\) (can you see why?). We claim that \(b_q-f(a_q)\) is closed and that \([c_q]=g^{\ast}[b_q-f(a_q)]\).

By direct computation, \[ d(b_q-f(a_q))=db_q-d(f(a_q))=db_q-f(d(a_q))=db_q-f(a_{q+1})=0. \] Meanwhile, since \(g(f(a_q))=0\) by exactness, \[ g^*[b_q-f(a_q)]=[g(b_q)]-[g(f(a_q))]=[c_q]. \] Therefore \(\ker d^{\ast} \subset \operatorname{im}g^{\ast}\).

Finally, we prove exactness at \(H^{q+1}(A)\). Pick \(\alpha \in H^{q+1}(A)\). If \(\alpha \in \operatorname{im}d^{\ast}\), then \(\alpha=[a_{q+1}]\), where \(f(a_{q+1})=db_q\) by definition. Then \[ f^*(\alpha)=[f(a_{q+1})]=[db_q]=[0]. \] Therefore \(\alpha \in \ker f^{\ast}\). Conversely, if we have \(f^{\ast}(\alpha)=[0]\), pick a representative of \(\alpha\), say \(\alpha=[a]\); then \([f(a)]=[0]\). But this implies that \(f(a) \in \operatorname{im}d\), where \(d\) denotes the differential operator of \(B\); that is, there exists some \(b_q \in B^q\) such that \(f(a)=db_{q}\). Now put \(c_q=g(b_q)\). Then \(c_q\) is closed, since \(dc_q=g(db_q)=g(f(a))=0\). By definition, \(\alpha=d^{\ast}[c_q]\). Therefore \(\ker f^{\ast} \subset \operatorname{im}d^{\ast}\).


As you may see, almost every property of the diagram has been used. The exactness at \(B^q\) ensures that \(g(f(a))=0\). The definition of \(H^q(A)\) ensures that we can simplify the meaning of \([0]\). We even use the injectivity of \(f\) and the surjectivity of \(g\).

This proof is also a demonstration of diagram-chasing technique. As you have seen, we keep running through the diagram to ensure that there is "someone waiting" at the destination.

This long exact sequence is useful. Here is an example.

Application: Mayer-Vietoris Sequence

By differential forms on an open set \(U \subset \mathbb{R}^n\), we mean \[ \Omega^*(U)=\{C^{\infty}\text{ functions on }U\}\otimes_\mathbb{R}\Omega^*. \] The de Rham cohomology of \(U\) then comes up in the nature of things.

We are able to compute the cohomology of the union of two open sets. Suppose \(M=U \cup V\) is a manifold, with \(U\) and \(V\) open, and let \(U \amalg V\) be the disjoint union of \(U\) and \(V\) (the coproduct in the category of sets). Let \(\partial_0\) and \(\partial_1\) be the inclusions of \(U \cap V\) into \(U\) and \(V\) respectively. We have a natural sequence of inclusions \[ M \leftarrow U\amalg V \overset{\partial_0}{\underset{\partial_1}{\leftleftarrows}} U \cap V. \] Since \(\Omega^{*}\) can also be treated as a contravariant functor from the category of Euclidean spaces with smooth maps to the category of commutative differential graded algebras and their homomorphisms, we have \[ \Omega^*(M) \rightarrow \Omega^*(U) \oplus \Omega^*(V) \overset{\partial^*_0}{\underset{\partial^*_1}{\rightrightarrows}} \Omega^*({U \cap V}). \] By taking the difference of the last two maps, we obtain \[ \begin{aligned} 0 \rightarrow \Omega^*(M) \rightarrow \Omega^*(U) \oplus \Omega^*(V) &\rightarrow \Omega^*(U \cap V) \rightarrow 0 \\ (\omega,\tau) &\mapsto \tau-\omega. \end{aligned} \] The sequence above is a short exact sequence. Therefore we may use the zig-zag lemma to obtain a long exact sequence (which is also called the Mayer-Vietoris sequence): \[ \cdots\to H^q(M) \to H^q(U) \oplus H^q(V) \to H^q(U \cap V) \xrightarrow{d^*} H^{q+1}(M) \to \cdots \]

An example

This sequence allows one to compute the cohomology of the union of two open sets. For example, consider \(H^{*}_{DR}(\mathbb{R}^2-P-Q)\), where \(P(x_p,y_p)\) and \(Q(x_q,y_q)\) are two distinct points in \(\mathbb{R}^2\). We may write \[ (\mathbb{R}^2-P)\cap(\mathbb{R}^2-Q)=\mathbb{R}^2-P-Q \] and \[ (\mathbb{R}^2-P)\cup(\mathbb{R}^2-Q)=\mathbb{R}^2. \] Therefore we may take \(M=\mathbb{R}^2\), \(U=\mathbb{R}^2-P\) and \(V=\mathbb{R}^2-Q\). For \(U\) and \(V\) we have another decomposition, \[ \mathbb{R}^2-P=(\mathbb{R}^2-P_x)\cup(\mathbb{R}^2-P_y), \] where \(P_x=\{(x,y_p):x \in \mathbb{R}\}\) is the horizontal line through \(P\) and \(P_y\) the vertical one. Then \[ (\mathbb{R}^2-P_x)\cap(\mathbb{R}^2-P_y) \] is a disjoint union of four open quadrant-like pieces, each homeomorphic to \(\mathbb{R}^2\). So things become clear after we compute \(H^{\ast}_{DR}(\mathbb{R}^2)\).

References / Further reading

  • Raoul Bott, Loring W. Tu, Differential Forms in Algebraic Topology
  • James R. Munkres, Elements of Algebraic Topology
  • Michael Spivak, Calculus on Manifolds
  • Serge Lang, Algebra

The Fourier transform of sinx/x and (sinx/x)^2 and more

In this post

We are going to evaluate the Fourier transform of \(\frac{\sin{x}}{x}\) and \(\left(\frac{\sin{x}}{x}\right)^2\), which turns out to be a comprehensive application of many elementary theorems in complex analysis. Make sure that, in the end, you can compute and understand all the identities in this post by yourself. You are also expected to be able to recall what all the words in italics mean.

To be clear, by Fourier transform we actually mean

\[ \hat{f}(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}f(x)e^{-itx}dx. \]

This is a matter of convenience. Indeed, the coefficient \(\frac{1}{\sqrt{2\pi}}\) is superfluous, but without it one has to write \(\frac{1}{2\pi}\) when computing the Fourier inverse. Instead of making things unbalanced, we write \(\frac{1}{\sqrt{2\pi}}\) in both directions and pretend it is not there.

We say a function \(f\) is in \(L^1\) if \(\int_{-\infty}^{+\infty}|f(x)|dx<+\infty\). As classic exercises in elementary calculus, \(\frac{\sin{x}}{x} \not\in L^1\) but \(\left(\frac{\sin{x}}{x}\right)^2 \in L^1\).
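One can gather numerical evidence for these two classic exercises (evidence, not proof): the partial integrals of \(|\sin x/x|\) keep growing, roughly like a logarithm, while those of \((\sin x/x)^2\) level off near \(\pi\):

```python
# Partial integrals over [-A, A]: |sin x / x| keeps growing (it is not in
# L^1), while (sin x / x)^2 settles near pi (it is in L^1).
import numpy as np

def partial_integral(f, A, n=400001):
    xs = np.linspace(-A, A, n)
    with np.errstate(invalid='ignore', divide='ignore'):
        ys = np.nan_to_num(f(xs), nan=1.0)  # both integrands equal 1 at x = 0
    return float(np.sum(0.5 * (ys[:-1] + ys[1:]) * np.diff(xs)))

for A in (10.0, 100.0, 1000.0):
    abs_part = partial_integral(lambda z: np.abs(np.sin(z) / z), A)
    sq_part = partial_integral(lambda z: (np.sin(z) / z) ** 2, A)
    print(A, round(abs_part, 3), round(sq_part, 3))
# abs_part grows without bound (like log A); sq_part approaches pi
```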

Problem 1

For real \(t\), find the following limit:

\[ \lim_{A \to \infty}\int_{-A}^{A}\frac{\sin{x}}{x}e^{itx}dx. \]

Since \(\frac{\sin{x}}{x}e^{itx}\not\in L^1\), we cannot evaluate its integral over \(\mathbb{R}\) directly, since it is not defined in the sense of the Lebesgue integral (the reader with no background in this can safely ignore the remark for now, but do keep in mind that being in \(L^1\) is a big matter). Instead, for given \(A>0\), the integral over \([-A,A]\) is defined, and we evaluate this limit to get what we want.

We will do this using contour integration. Since the complex function \(f(z)=\frac{\sin{z}}{z}e^{itz}\) is entire, by Cauchy's theorem its integral over \([-A,A]\) is equal to the one over the path \(\Gamma_A\) going from \(-A\) to \(-1\) along the real axis, from \(-1\) to \(1\) along the lower half of the unit circle, and from \(1\) to \(A\) along the real axis (why?). Since the path \(\Gamma_A\) avoids the origin, we are safe to use the identity

\[ 2i\sin{z}=e^{iz}-e^{-iz}. \]

Replacing \(\sin{z}\) with \(\frac{1}{2i}(e^{iz}-e^{-iz})\), we get

\[ I_A(t)=\int_{\Gamma_A}f(z)dz=\int_{\Gamma_A}\frac{1}{2iz}(e^{i(t+1)z}-e^{i(t-1)z})dz. \]

If we put \(\varphi_A(t)=\int_{\Gamma_A}\frac{1}{2iz}e^{itz}dz\), we see \(I_A(t)=\varphi_A(t+1)-\varphi_A(t-1)\). It is convenient to divide \(\varphi_A\) by \(\pi\) since we therefore get

\[ \frac{1}{\pi}\varphi_A(t)=\frac{1}{2\pi i}\int_{\Gamma_A}\frac{e^{itz}}{z}dz \]

and we are cool with the divisor \(2\pi i\).

Now, close the path \(\Gamma_A\) in two ways. First, by the semicircle from \(A\) to \(-Ai\) to \(-A\); second, by the semicircle from \(A\) to \(Ai\) to \(-A\). Either way we finish a closed contour of radius \(A\); denote the resulting closed paths by \(\Gamma_L\) (closed through the lower half plane) and \(\Gamma_U\) (closed through the upper half plane) respectively. In the first case the contour encloses no singularity, so by Cauchy's theorem the integral over \(\Gamma_L\) is \(0\), and therefore

\[ \frac{1}{\pi}\varphi_A(t)=\frac{1}{2\pi i}\int_{-\pi}^{0}\frac{\exp{(itAe^{i\theta})}}{Ae^{i\theta}}dAe^{i\theta}=\frac{1}{2\pi}\int_{-\pi}^{0}\exp{(itAe^{i\theta})}d\theta. \]

Notice that

\[ \begin{aligned} |\exp(itAe^{i\theta})|&=|\exp(itA(\cos\theta+i\sin\theta))| \\ &=|\exp(itA\cos\theta)|\cdot|\exp(-At\sin\theta)| \\ &=\exp(-At\sin\theta) \end{aligned} \]

hence if \(t\sin\theta>0\), we have \(|\exp(iAte^{i\theta})| \to 0\) as \(A \to \infty\). When \(-\pi < \theta <0\) however, we have \(\sin\theta<0\). Therefore we get

\[ \frac{1}{\pi}\varphi_{A}(t)=\frac{1}{2\pi}\int_{-\pi}^{0}\exp(itAe^{i\theta})d\theta \to 0\quad (A \to \infty,t<0). \]

(You should be able to prove the convergence above.) Also trivially

\[ \varphi_A(0)=\frac{1}{2}\int_{-\pi}^{0}1d\theta=\frac{\pi}{2}. \]

But what if \(t>0\)? Indeed, it would be difficult to obtain the limit using the integral over \([-\pi,0]\). But we have another path, namely the upper one.

Note that \(\frac{e^{itz}}{z}\) is a meromorphic function in \(\mathbb{C}\) with a pole at \(0\). For such a function we have

\[ \frac{e^{itz}}{z}=\frac{1}{z}\left(1+itz+\frac{(itz)^2}{2!}+\cdots\right)=\frac{1}{z}+it+\frac{(it)^2z}{2!}+\cdots. \]

This implies that the residue at \(0\) is \(1\). By the residue theorem,

\[ \begin{aligned} \frac{1}{2\pi{i}}\int_{\Gamma_U}\frac{e^{itz}}{z}dz&=\frac{1}{2\pi{i}}\int_{\Gamma_A}\frac{e^{itz}}{z}dz+\frac{1}{2\pi}\int_{0}^{\pi}\exp(itAe^{i\theta})d\theta \\ &=1\cdot\operatorname{Ind}_{\Gamma_U}(0)=1. \end{aligned} \]

Note that we have used the change-of-variable formula as we did for the lower arc. \(\operatorname{Ind}_{\Gamma_U}(0)\) denotes the winding number of \(\Gamma_U\) around \(0\), which is \(1\) of course. The identity above implies

\[ \frac{1}{\pi}\varphi_A(t)=1-\frac{1}{2\pi}\int_{0}^{\pi}\exp{(itAe^{i\theta})}d\theta. \]

Thus if \(t>0\), since \(\sin\theta>0\) when \(0<\theta<\pi\), we get

\[ \frac{1}{\pi}\varphi_A(t)\to 1 \quad(A \to \infty,t>0). \]

But as is already shown, \(I_A(t)=\varphi_A(t+1)-\varphi_A(t-1)\). To conclude,

\[ \lim_{A\to\infty}I_A(t)= \begin{cases} \pi\quad &|t|<1, \\ 0 \quad &|t|>1, \\ \frac{\pi}{2} \quad &|t|=1. \end{cases} \]
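The limit above can be double-checked numerically, away from the slowly converging boundary \(|t|=1\) (a sanity check only; the truncation \(A\) and grid size are arbitrary choices of ours):

```python
# Truncated integral of (sin x / x) e^{itx} over [-A, A]: for large A it
# should be close to pi when |t| < 1 and close to 0 when |t| > 1.
import numpy as np

def I_A(t, A=500.0, n=500001):
    xs = np.linspace(-A, A, n)
    with np.errstate(invalid='ignore', divide='ignore'):
        sinc = np.nan_to_num(np.sin(xs) / xs, nan=1.0)  # value 1 at x = 0
    # the imaginary part cancels by symmetry, so integrate the real part
    y = sinc * np.cos(t * xs)
    return float(np.sum(0.5 * (y[:-1] + y[1:]) * np.diff(xs)))

print(abs(I_A(0.5) - np.pi) < 0.05, abs(I_A(1.5)) < 0.05)  # True True
```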

What we can learn from this integral

Since \(\psi(x)=\frac{\sin{x}}{x}\) is even, dividing \(I_A\) by \(\sqrt{2\pi}\), we actually obtain the Fourier transform of it in the limit, by abuse of language. Therefore we also get

\[ \hat\psi(t)= \begin{cases} \sqrt{\frac{\pi}{2}}\quad & |t|<1, \\ 0 \quad & |t|>1, \\ \frac{1}{2}\sqrt{\frac{\pi}{2}} \quad & |t|=1. \end{cases} \]

Note that \(\hat\psi(t)\) is not continuous, let alone uniformly continuous. Therefore \(\psi(x) \notin L^1\): the reason is that if \(f \in L^1\), then \(\hat{f}\) is uniformly continuous (proof). Another interesting fact is that this also yields the value of the Dirichlet integral, since we have

\[ \begin{aligned} \int_{-\infty}^{\infty}\left(\frac{\sin{x}}{x}\right)dx&=\int_{-\infty}^{\infty}\left(\frac{\sin{x}}{x}\right)e^{0\cdot ix}dx \\ &=\sqrt{2\pi}\hat\psi(0) \\ &=\pi. \end{aligned} \]

We end this section by evaluating the inverse of \(\hat\psi(t)\). This requires a simple calculation.

\[ \begin{aligned} \sqrt{\frac{1}{2\pi}}\int_{-\infty}^{\infty}\hat\psi(t)e^{itx}dt &= \sqrt{\frac{1}{2\pi}}\int_{-1}^{1}\sqrt{\frac{\pi}{2}}e^{itx}dt \\ &=\frac{1}{2}\cdot\frac{1}{ix}(e^{ix}-e^{-ix}) \\ &=\frac{\sin{x}}{x}. \end{aligned} \]

Problem 2

For real \(t\), compute

\[ J=\int_{-\infty}^{\infty}\left(\frac{\sin{x}}{x}\right)^2e^{itx}dx. \]

Now since \(h(x)=\frac{\sin^2{x}}{x^2} \in L^1\), we are able to say with ease that the integral above is the Fourier transform of \(h(x)\) (multiplied by \(\sqrt{2\pi}\)). But still we will be using the limit form

\[ J(t)=\lim_{A \to \infty}J_A(t) \]


where \[ J_A(t)=\int_{-A}^{A}\left(\frac{\sin{x}}{x}\right)^2e^{itx}dx. \]

And we are still using contour integration as above (keep \(\Gamma_A\), \(\Gamma_U\) and \(\Gamma_L\) in mind!). Expanding the square, we get

\[ \left(\frac{\sin z}{z}\right)^2e^{itz}=\frac{e^{i(t+2)z}+e^{i(t-2)z}-2e^{itz}}{-4z^2}. \]

Therefore it suffices to discuss the function

\[ \mu_A(t)=\int_{\Gamma_A}\frac{e^{itz}}{2z^2}dz \]

since we have

\[ J_A(t)=\mu_A(t)-\frac{1}{2}\left(\mu_A(t+2)+\mu_A(t-2)\right). \]

Multiplying \(\mu_A(t)\) by \(\frac{1}{\pi i}\), we see

\[ \frac{1}{\pi i}\mu_A(t)=\frac{1}{2\pi i}\int_{\Gamma_A}\frac{e^{itz}}{z^2}dz. \]

Closing the contour through the lower half plane as before (the closed path \(\Gamma_L\) encloses no pole) gives

\[ \begin{aligned} \frac{1}{\pi i}\mu_A(t)&=\frac{1}{2\pi i}\int_{-\pi}^{0}\frac{\exp(itAe^{i\theta})}{A^2e^{2i\theta}}dAe^{i\theta} \\ &=\frac{1}{2\pi}\int_{-\pi}^{0}\frac{\exp(itAe^{i\theta})}{Ae^{i\theta}}d\theta. \end{aligned} \]

Since we still have

\[ \left|\frac{\exp(itAe^{i\theta})}{Ae^{i\theta}}\right|=\frac{1}{A}\exp(-At\sin\theta), \]

if \(t<0\) in this case, then \(\frac{1}{\pi i}\mu_A(t) \to 0\) as \(A \to \infty\). For \(t>0\), integrating along \(\Gamma_U\) (the residue of \(\frac{e^{itz}}{z^2}\) at the double pole \(0\) is \(it\)), we have

\[ \frac{1}{\pi i}\mu_A(t)=it-\frac{1}{2\pi}\int_{0}^{\pi}\frac{\exp(itAe^{i\theta})}{Ae^{i\theta}}d\theta \to it \quad (A \to \infty). \]

We could also evaluate \(\mu_A(0)\) by computing the integral directly, but we are not doing that here. To conclude,

\[ \lim_{A \to\infty}\mu_A(t)=\begin{cases} -\pi t \quad &t>0, \\ 0 \quad &t<0. \end{cases} \]

Therefore for \(J_A\) we have

\[ J(t)=\lim_{A \to\infty}J_A(t)=\begin{cases} 0 \quad &|t| \geq 2, \\ \pi(1+\frac{t}{2}) \quad &-2<t \leq 0, \\ \pi(1-\frac{t}{2}) \quad & 0<t <2. \end{cases} \]

Now you may ask: how did we find the values at \(0\), \(2\) and \(-2\), when \(\mu_A(0)\) was never evaluated? Since \(h \in L^1\), the transform \(\hat{h}(t)=\sqrt{\frac{1}{2\pi}}J(t)\) is uniformly continuous, thus continuous, and the values at these points follow from continuity.
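The triangle-shaped answer is also easy to sanity-check numerically (again only an illustration, with an arbitrary truncation \(A\) of our choosing):

```python
# Truncated integral of (sin x / x)^2 e^{itx}: expect the triangle
# J(t) = pi * max(1 - |t|/2, 0) as A grows.
import numpy as np

def J_A(t, A=500.0, n=500001):
    xs = np.linspace(-A, A, n)
    with np.errstate(invalid='ignore', divide='ignore'):
        sinc2 = np.nan_to_num((np.sin(xs) / xs) ** 2, nan=1.0)
    y = sinc2 * np.cos(t * xs)  # the imaginary part cancels by symmetry
    return float(np.sum(0.5 * (y[:-1] + y[1:]) * np.diff(xs)))

for t in (0.0, 1.0, 3.0):
    expected = np.pi * max(1.0 - abs(t) / 2.0, 0.0)
    assert abs(J_A(t) - expected) < 0.01
print("J_A matches pi * max(1 - |t|/2, 0) within 0.01")
```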

What we can learn from this integral

Again, we get the value of a classic improper integral by

\[ \int_{-\infty}^{\infty}\left(\frac{\sin{x}}{x}\right)^2dx = J(0)=\pi. \]

And this time it's not hard to find the Fourier inverse:

\[ \begin{aligned} \sqrt{\frac{1}{2\pi}}\int_{-\infty}^{\infty}\hat{h}(t)e^{itx}dt&=\frac{1}{2\pi}\int_{-\infty}^{\infty}J(t)e^{itx}dt \\ &=\frac{1}{2\pi}\int_{-2}^{2}\pi(1-\frac{1}{2}|t|)e^{itx}dt \\ &=\frac{e^{2ix}+e^{-2ix}-2}{-4x^2} \\ &=\frac{(e^{ix}-e^{-ix})^2}{-4x^2} \\ &=\left(\frac{\sin{x}}{x}\right)^2. \end{aligned} \]

The Riesz-Markov-Kakutani Representation Theorem

This post

Is intended to establish the existence of the Lebesgue measure, often denoted by \(m\), in a future post. In fact, the Lebesgue measure follows as a special case of the R-M-K representation theorem. You may not believe it, but the euclidean properties of \(\mathbb{R}^k\) play no role in the existence of \(m\). The only topological property that matters is the fact that \(\mathbb{R}^k\) is a locally compact Hausdorff space.

The theorem is named after F. Riesz, who introduced it for continuous functions on \([0,1]\) (with respect to the Riemann-Stieltjes integral). Years later, after the generalizations done by A. Markov and S. Kakutani, we are able to state it on a locally compact Hausdorff space.

You may find some of the properties over-generalized, but this is intended so that you can enjoy more along the way (some of the tools are related to differential geometry). Also, there are many topology and analysis tricks worth your attention.


Different kinds of topological spaces

Again, the euclidean topology plays no role in this proof, but we do need to specify the topology, for different reasons. This is similar to what we do in linear functional analysis. Throughout, let \(X\) be a topological space.

0.0 Definition. \(X\) is a Hausdorff space if the following is true: If \(p \in X\), \(q\in X\) but \(p \neq q\), then there are two disjoint open sets \(U\) and \(V\) such that \(p \in U\) and \(q \in V\).

0.1 Definition. \(X\) is locally compact if every point of \(X\) has a neighborhood whose closure is compact.

0.2 Remarks. A Hausdorff space is also called a \(T_2\) space (see the Kolmogorov classification) or a separated space. There is a classic example of a locally compact Hausdorff space: \(\mathbb{R}^n\). It is trivial to verify this, but this example is far from the whole story. In the future we will see that we can construct some ridiculous but mathematically valid measures.

0.3 Definition. A set \(E \subset X\) is called \(\sigma\)-compact if \(E\) is a countable union of compact sets. Note that every open subset of a euclidean space \(\mathbb{R}^n\) is \(\sigma\)-compact, since it can always be written as a countable union of closed balls (which are compact).
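As a concrete (hypothetical, one-dimensional) instance of 0.3, the open interval \(U=(0,1)\) is the countable union of the compact intervals \(K_n=[1/n,\,1-1/n]\) for \(n\geq 3\):

```python
# sigma-compactness of U = (0, 1): the compact pieces K_n = [1/n, 1 - 1/n]
# are nested and their union is all of U.
def K(n):
    return (1.0 / n, 1.0 - 1.0 / n)  # endpoints of the n-th compact piece

def piece_containing(x):
    """Smallest n >= 3 with x in K_n (exists for every x in (0, 1))."""
    n = 3
    while not (K(n)[0] <= x <= K(n)[1]):
        n += 1
    return n

for x in (0.001, 0.5, 0.999):
    print(x, "lies in K_", piece_containing(x))
```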

0.4 Definition. A covering of \(X\) is locally finite if every point has a neighborhood which intersects only finitely many elements of the covering. Of course, if the covering is already finite, it's also locally finite.

0.5 Definition. A refinement of a covering of \(X\) is a second covering, each element of which is contained in an element of the first covering.

0.6 Definition. \(X\) is paracompact if it is Hausdorff and every open covering has a locally finite open refinement. Obviously any compact Hausdorff space is paracompact.

0.7 Theorem. If \(X\) is a second countable, locally compact Hausdorff space, then \(X\) is paracompact. For a proof, see this [Theorem 2.6]. One uses this to prove that a differentiable manifold admits a partition of unity.

0.8 Theorem. If \(X\) is locally compact and \(\sigma\)-compact, then \(X=\bigcup_{i=1}^{\infty}K_i\), where each \(K_i\) is compact and \(K_i \subset\operatorname{int}K_{i+1}\).

Partition of unity

The basic technical tool in the theory of differential manifolds is the existence of a partition of unity. We will steal this tool for applications in analysis.

1.0 Definition. A partition of unity on \(X\) is a collection \((g_i)\) of continuous real valued functions on \(X\) such that

  1. \(g_i \geq 0\) for each \(i\).
  2. every \(x \in X\) has a neighborhood \(U\) such that \(U \cap \operatorname{supp}(g_i)=\varnothing\) for all but finitely many of \(g_i\).
  3. for each \(x \in X\), we have \(\sum_{i}g_i(x)=1\). (That's why you see the word 'unity'.)

One should be reminded that partitions of unity are frequently used in many other fields. For example, in differential geometry one uses them to construct a Riemannian structure on a smooth manifold; in the theory of generalized functions one uses them to pass between local and global properties as well.

1.1 Definition. A partition of unity \((g_i)\) on \(X\) is subordinate to an open cover of \(X\) if and only if for each \(g_i\) there is an element \(U\) of the cover such that \(\operatorname{supp}(g_i) \subset U\). We say \(X\) admits partitions of unity if and only if for every open cover of \(X\), there exists a partition of unity subordinate to the cover.

1.2 Theorem. A Hausdorff space admits a partition of unity if and only if it is paracompact (the 'only if' part is by considering the definition of partition of unity. For the 'if' part, see here). As a corollary, we have:

1.3 Corollary. Suppose \(V_1,\cdots,V_n\) are open subsets of a locally compact Hausdorff space \(X\), \(K\) is compact, and \[ K \subset \bigcup_{k=1}^{n}V_k. \] Then there exists a partition of unity \((h_i)\) subordinate to the cover \((V_i)\), with \(\operatorname{supp}(h_i) \subset V_i\) and \(\sum_{i=1}^{n}h_i(x)=1\) for all \(x \in K\).

Urysohn's lemma (for locally compact Hausdorff spaces)

2.0 Notation. The notation \[ K \prec f \] will mean that \(K\) is a compact subset of \(X\), that \(f \in C_c(X)\), that \(f(X) \subset [0,1]\), and that \(f(x)=1\) for all \(x \in K\). The notation \[ f \prec V \] will mean that \(V\) is open, that \(f \in C_c(X)\), that \(f(X) \subset [0,1]\) and that \(\operatorname{supp}(f) \subset V\). If both hold, we write \[ K \prec f \prec V. \] 2.1 Remarks. Clearly, with this notation, we are able to simplify the statement of being subordinate. We merely need to write \(g_i \prec U\) in 1.1 instead of \(\operatorname{supp}(g_i) \subset U\).

2.2 Urysohn's Lemma for locally compact Hausdorff space. Suppose \(X\) is locally compact and Hausdorff, \(V\) is open in \(X\) and \(K \subset V\) is a compact set. Then there exists an \(f \in C_c(X)\) such that \[ K \prec f \prec V. \] 2.3 Remarks. By \(f \in C_c(X)\) we shall mean \(f\) is a continuous function with a compact support. This relation also says that \(\chi_K \leq f \leq \chi_V\). For more details and the proof, visit this page. This lemma is generally for normal space, for a proof on that level, see arXiv:1910.10381. (Question: why we consider two disjoint closed subsets thereafter?)

The \(\varepsilon\)-definitions of \(\sup\) and \(\inf\)

We will be using the \(\varepsilon\)-definitions of \(\sup\) and \(\inf\), which make the proof easier in this case but would be troublesome if you had never seen them. So we put them down here.

Let \(S\) be a nonempty subset of the real numbers that is bounded below. A lower bound \(w\) of \(S\) is the infimum of \(S\) if and only if for every \(\varepsilon>0\) there exists an element \(x_\varepsilon \in S\) such that \(x_\varepsilon<w+\varepsilon\).

This definition of \(\inf\) is equivalent to the usual if-then definition:

Let \(S\) be a set that is bounded below. We say \(w=\inf S\) when \(w\) satisfies the following condition.

  1. \(w\) is a lower bound of \(S\).
  2. If \(t\) is also a lower bound of \(S\), then \(t \leq w\).

We have the analogous definition for \(\sup\).
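As a toy illustration of the \(\varepsilon\)-characterization (our own example), take \(S=\{1/n : n \geq 1\}\), whose infimum is \(0\): for every \(\varepsilon>0\) we can exhibit an element of \(S\) below \(0+\varepsilon\):

```python
# epsilon-characterization of inf for S = {1/n : n >= 1}: inf S = 0,
# since for every eps > 0 some element 1/n of S satisfies 1/n < 0 + eps.
import math

def witness_below(eps):
    n = math.floor(1.0 / eps) + 2  # +2 guards against floating point edges
    return 1.0 / n                 # an element of S below eps

for eps in (0.5, 1e-3, 1e-9):
    assert 0.0 < witness_below(eps) < 0.0 + eps
print("0 satisfies the epsilon-characterization of inf for {1/n}")
```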

The main theorem

Analysis is full of vector spaces and linear transformations. We already know that the Lebesgue integral induces a linear functional: for example, \(L^1([0,1])\) is a vector space, and we have a linear functional \[ f \mapsto \int_0^1 f(x)dx. \] But what about the reverse? Given a linear functional, is it guaranteed that we have a measure establishing the integral? The R-M-K theorem answers this question affirmatively. The functional to be discussed is positive, meaning that if \(f(X) \subset [0,\infty)\), then \(\Lambda{f} \in [0,\infty)\).

Let \(X\) be a locally compact Hausdorff space, and let \(\Lambda\) be a positive linear functional on \(C_c(X)\). Then there exists a \(\sigma\)-algebra \(\mathfrak{M}\) on \(X\) which contains all Borel sets in \(X\), and there exists a unique positive measure \(\mu\) on \(\mathfrak{M}\) which represents \(\Lambda\) in the sense that \[ \Lambda{f}=\int_X fd\mu \] for all \(f \in C_c(X)\).

For the measure \(\mu\) and the \(\sigma\)-algebra \(\mathfrak{M}\), we have four assertions:

  1. \(\mu(K)<\infty\) for every compact set \(K \subset X\).
  2. For every \(E \in \mathfrak{M}\), we have

\[ \mu(E)=\inf\{\mu(V):E \subset V, V\text{ open}\}. \]

  3. For every open set \(E\) and every \(E \in \mathfrak{M}\) with \(\mu(E)<\infty\), we have

\[ \mu(E)=\sup\{\mu(K):K \subset E, K\text{ compact}\}. \]

  4. If \(E \in \mathfrak{M}\), \(A \subset E\), and \(\mu(E)=0\), then \(A \in \mathfrak{M}\).

Remarks before proof. It would be great if we could establish the Lebesgue measure \(m\) simply by putting \(X=\mathbb{R}^n\), but we need some extra work to get that result naturally. If 2 is satisfied, we say \(\mu\) is outer regular; if 3 is satisfied, inner regular. If both hold, we say \(\mu\) is regular. The partition of unity and Urysohn's lemma will be heavily used in the proof of the main theorem, so make sure you have no problem with them. The theorem can also be extended to complex functionals, but that requires much non-trivial work.

Proving the theorem

The proof is rather long so we will split it into several steps. I will try my best to make every line clear enough.

Step 0 - Construction of \(\mu\) and \(\mathfrak{M}\)

For every open set \(V \subset X\), define \[ \mu(V)=\sup\{\Lambda{f}:f \prec V\}. \]

If \(V_1 \subset V_2\) and both are open, we claim that \(\mu(V_1) \leq \mu(V_2)\). Let \(f \prec V_1\); since \(\operatorname{supp}f \subset V_1 \subset V_2\), we see \(f \prec V_2\). Moreover, we are able to find some \(g \prec V_2\) such that \(g \geq f\). By taking another look at the proof of Urysohn's lemma for locally compact Hausdorff spaces, we see there is an open set \(G\) with compact closure such that \[ \operatorname{supp}(f) \subset G \subset \overline{G} \subset V_2. \] Applying Urysohn's lemma to the pair \((\overline{G},V_2)\), we get a function \(g \in C_c(X)\) such that \[ \overline{G} \prec g \prec V_2. \] Then \(g=1\) on \(\operatorname{supp}(f)\) while \(f \leq 1\) everywhere, so \(g \geq f\), and therefore \(\Lambda{g} \geq \Lambda{f}\) since \(\Lambda{g}-\Lambda{f}=\Lambda{(g-f)}\geq 0\). Thus every \(\Lambda{f}\) with \(f \prec V_1\) is bounded by some \(\Lambda{g}\) with \(g \prec V_2\); by taking suprema, we see \[ \mu(V_1) \leq \mu(V_2). \] This 'monotonic' property of \(\mu\) enables us to define \(\mu(E)\) for all \(E \subset X\) by \[ \mu(E)=\inf \{\mu(V):E \subset V, V\text{ open}\}. \] The definition above is consistent with the original one on open sets, precisely because of monotonicity. Sometimes people call this \(\mu\) the outer measure. We will discuss other kinds of sets thoroughly in the following steps. Warning: we are not saying that \(\mathfrak{M} = 2^X\). The crucial property of \(\mu\), namely countable additivity, will be proved only on a certain \(\sigma\)-algebra.

It follows from the definition of \(\mu\) that if \(E_1 \subset E_2\), then \(\mu(E_1) \leq \mu(E_2)\).

Let \(\mathfrak{M}_F\) be the class of all \(E \subset X\) which satisfy the following two conditions:

  1. \(\mu(E) <\infty\).

  2. 'Inner regular': \[ \mu(E)=\sup\{\mu(K):K \subset E, K\text{ compact}\}. \]

One may say here \(\mu\) is the 'inner measure'. Finally, let \(\mathfrak{M}\) be the class of all \(E \subset X\) such that for every compact \(K\), we have \(E \cap K \in \mathfrak{M}_F\). We shall show that \(\mathfrak{M}\) is the desired \(\sigma\)-algebra.

Remarks of Step 0. So far, we have only proved that \(\mu(E) \geq 0\) for all \(E {\color\red{\subset}}X\). What about countable additivity? It's clear that \(\mathfrak{M}_F\) and \(\mathfrak{M}\) are closely related, and we need to get a clearer view of the relation. Also, by restricting \(\mu\) to \(\mathfrak{M}_F\), we restrict ourselves to finite values. In fact, we will eventually show that \(\mathfrak{M}_F \subset \mathfrak{M}\).

Step 1 - The 'measure' of compact sets (outer)

If \(K\) is compact, then \(K \in \mathfrak{M}_F\), and \[ \mu(K)=\inf\{\Lambda{f}:K \prec f\}<\infty \]

Define \(V_\alpha=f^{-1}((\alpha,1])\) for \(K \prec f\) and \(0 < \alpha < 1\). Since \(f(x)=1\) for all \(x \in K\), we have \(K \subset V_{\alpha}\). Note that \(f \geq \alpha{g}\) whenever \(g \prec V_{\alpha}\), since \(\alpha g \leq \alpha < f\) on \(V_\alpha\) and \(g=0\) elsewhere. Therefore, by the definition of \(\mu\) for all \(E \subset X\), we have \[ \mu(K) \leq \mu(V_\alpha)=\sup\{\Lambda{g}:g \prec V_{\alpha}\} \leq \frac{1}{\alpha}\Lambda{f}. \] Since \(\mu(K)\) is a lower bound of \(\frac{1}{\alpha}\Lambda{f}\) with \(0<\alpha<1\), we see \[ \mu(K) \leq \inf_{\alpha \in (0,1)}\left\{\frac{1}{\alpha}\Lambda{f}\right\}=\Lambda{f}. \] Since \(f(X) \subset [0,1]\), \(\Lambda{f}\) is finite; namely \(\mu(K) <\infty\). Since \(K\) itself is compact, we see \(K \in \mathfrak{M}_F\).

To prove the identity, note that for any \(\varepsilon>0\) there exists some open \(V \supset K\) such that \(\mu(V)<\mu(K)+\varepsilon\). By Urysohn's lemma, there exists some \(h \in C_c(X)\) such that \(K \prec h \prec V\). Therefore, using the first part, \[ \mu(K) \leq \Lambda{h} \leq \mu(V) < \mu(K)+\varepsilon. \] Since \(\varepsilon\) is arbitrary, \(\mu(K)\) is the infimum of \(\Lambda{h}\) with \(K \prec h\).

Remarks of Step 1. We have just proved assertion 1 of the properties of \(\mu\). The key point of this proof is the inequality \[ \mu(V)<\mu(K)+\varepsilon, \] which is merely the \(\varepsilon\)-definition of \(\inf\): \(\mu(K)\) is the infimum of \(\mu(V)\) over open \(V \supset K\), so for any \(\varepsilon>0\) there exists some open \(V \supset K\) with \(\mu(V)<\mu(K)+\varepsilon\). Under such circumstances, this form of the definition is much easier to use. Next we will examine the relation between \(\mathfrak{M}_F\) and \(\tau_X\), namely the topology of \(X\).
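The other small step worth internalizing is \(\inf_{\alpha\in(0,1)}\frac{1}{\alpha}\Lambda{f}=\Lambda{f}\). A quick numeric sketch in Python, with an arbitrary sample value standing in for \(\Lambda{f}\):

```python
# For any fixed Lam_f >= 0, the values Lam_f / alpha with alpha in (0,1)
# all exceed Lam_f but approach it as alpha -> 1; hence the infimum is Lam_f.
Lam_f = 3.0  # an arbitrary sample value of Lambda(f)
alphas = [1 - 10 ** (-k) for k in range(1, 8)]
values = [Lam_f / a for a in alphas]

assert all(v > Lam_f for v in values)  # never below Lam_f
assert min(values) - Lam_f < 1e-5      # but arbitrarily close to it
```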

Step 2 - The 'measure' of open sets (inner)

\(\mathfrak{M}_F\) contains every open set \(V\) with \(\mu(V)<\infty\).

It suffices to show that for an open set \(V\), we have \[ \mu(V)=\sup\{\mu(K):K \subset V, K\text{ compact}\}. \] For \(0<\varepsilon<\mu(V)\), there exists an \(f \prec V\) such that \(\Lambda{f}>\mu(V)-\varepsilon\). If \(W\) is any open set which contains \(K= \operatorname{supp}(f)\), then \(f \prec W\), and therefore \(\Lambda{f} \leq \mu(W)\). Taking the infimum over such \(W\), by the definition of \(\mu(K)\), we see \[ \Lambda{f}\leq\mu(K). \] Therefore \[ \mu(V)-\varepsilon<\Lambda{f}\leq\mu(K)\leq\mu(V). \] This is exactly the \(\varepsilon\)-definition of \(\sup\). The identity is proved.

Remarks of Step 2. It's important to note that this identity is only guaranteed for open sets and for sets \(E \in \mathfrak{M}\) with \(\mu(E)<\infty\), the latter of which will be proved in the following steps. This is the flaw of this theorem. With these preparations, however, we are able to show the countable additivity of \(\mu\) on \(\mathfrak{M}_F\).

Step 3 - The subadditivity of \(\mu\) on \(2^X\)

If \(E_1,E_2,E_3,\cdots\) are arbitrary subsets of \(X\), then \[ \mu\left(\bigcup_{k=1}^{\infty}E_k\right) \leq \sum_{k=1}^{\infty}\mu(E_k). \]

First we show this holds for finitely many open sets; it is tantamount to showing that \[ \mu(V_1 \cup V_2)\leq \mu(V_1)+\mu(V_2) \] if \(V_1\) and \(V_2\) are open. Pick \(g \prec V_1 \cup V_2\); such \(g\) exists by Urysohn's lemma. By corollary 1.3, there is a partition of unity \((h_1,h_2)\) subordinate to \((V_1,V_2)\) in the sense of corollary 1.3. Notice that \(h_1g \prec V_1\) and \(h_2g \prec V_2\). Therefore, \[ \begin{aligned} \Lambda(g)&=\Lambda((h_1+h_2)g) \\ &=\Lambda(h_1g)+\Lambda(h_2g) \\ &\leq\mu(V_1)+\mu(V_2). \end{aligned} \] By taking the supremum over \(g\), we have \[ \mu(V_1 \cup V_2)\leq \mu(V_1)+\mu(V_2). \]
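The splitting \(\Lambda g=\Lambda(h_1g)+\Lambda(h_2g)\le\mu(V_1)+\mu(V_2)\) can be seen concretely. Below is a Python sketch with \(X=\mathbb{R}\) and \(\Lambda f=\int f\); the sets \(V_1\), \(V_2\) and the functions \(g\), \(h_1\), \(h_2\) are hypothetical choices, and for \(\Lambda=\int\) the quantity \(\mu(V_i)\) is just the length of \(V_i\):

```python
# V1 = (-1, 0.6), V2 = (-0.25, 1.5); g is a tent supported on [-0.5, 1].
def g(x):
    return max(0.0, 1 - abs((x - 0.25) / 0.75))

def h1(x):
    # h1 + h2 = 1 on supp(g); h1*g is supported in V1, h2*g in V2.
    return max(0.0, min(1.0, (0.5 - x) / 0.5))

def h2(x):
    return 1.0 - h1(x)

n, lo, hi = 100_000, -1.0, 1.5
step = (hi - lo) / n

def Lam(f):
    # Lam f = integral of f, via a midpoint Riemann sum.
    return sum(f(lo + (j + 0.5) * step) for j in range(n)) * step

split = Lam(lambda x: h1(x) * g(x)) + Lam(lambda x: h2(x) * g(x))
assert abs(Lam(g) - split) < 1e-9  # linearity: Lam g = Lam(h1 g) + Lam(h2 g)
mu_V1, mu_V2 = 1.6, 1.75           # mu(V_i) = length of V_i for this Lam
assert Lam(g) <= mu_V1 + mu_V2     # the subadditivity bound
```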

Now we return to arbitrary subsets of \(X\). If \(\mu(E_i)=\infty\) for some \(i\), there is nothing to prove, so we shall assume that \(\mu(E_i)<\infty\) for all \(i\). Fix \(\varepsilon>0\). By the definition of \(\mu(E_i)\), there are open sets \(V_i \supset E_i\) such that \[ \mu(V_i)<\mu(E_i)+\frac{\varepsilon}{2^i}. \] Put \(V=\bigcup_{i=1}^{\infty}V_i\), and choose \(f \prec V\). Since \(f \in C_c(X)\), a finite collection of the \(V_i\) covers the support of \(f\). Therefore, without loss of generality, we may say that \[ f \prec V_1 \cup V_2 \cup \cdots \cup V_n \] for some \(n\). We therefore obtain \[ \begin{aligned} \Lambda{f} &\leq \mu(V_1 \cup V_2 \cup \cdots \cup V_n) \\ &\leq \mu(V_1)+\mu(V_2)+\cdots+\mu(V_n) \\ &\leq \sum_{i=1}^{n}\left(\mu(E_i)+\frac{\varepsilon}{2^i}\right) \\ &\leq \sum_{i=1}^{\infty}\mu(E_i)+\varepsilon, \end{aligned} \] for all \(f \prec V\). Since \(\bigcup E_i \subset V\), we have \(\mu(\bigcup E_i) \leq \mu(V)\). Therefore \[ \mu\left(\bigcup_{i=1}^{\infty}E_i\right)\leq\mu(V)=\sup\{\Lambda{f}:f \prec V\}\leq\sum_{i=1}^{\infty}\mu(E_i)+\varepsilon. \] Since \(\varepsilon\) is arbitrary, the inequality is proved.

Remarks of Step 3. Again, we are using the \(\varepsilon\)-definition of \(\inf\). One may say this step shows the subadditivity of the outer measure. Also note the geometric series \(\sum_{k=1}^{\infty}\frac{\varepsilon}{2^k}=\varepsilon\).
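The geometric-series bookkeeping \(\sum_{k\ge1}\varepsilon/2^k=\varepsilon\) is easy to verify numerically:

```python
# Partial sums of eps/2 + eps/4 + ... approach eps from below.
eps = 0.25
partials = [sum(eps / 2**k for k in range(1, n + 1)) for n in (1, 10, 50)]
assert all(p < eps for p in partials)     # every partial sum stays below eps
assert abs(partials[-1] - eps) < 1e-12    # and the tail is negligible
```

This is why distributing an error budget of \(\varepsilon\) over countably many sets, with \(\varepsilon/2^i\) allotted to the \(i\)-th, costs only \(\varepsilon\) in total.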

Step 4 - Additivity of \(\mu\) on \(\mathfrak{M}_F\)

Suppose \(E=\bigcup_{i=1}^{\infty}E_i\), where \(E_1,E_2,\cdots\) are pairwise disjoint members of \(\mathfrak{M}_F\), then \[ \mu(E)=\sum_{i=1}^{\infty}\mu(E_i). \] If \(\mu(E)<\infty\), we also have \(E \in \mathfrak{M}_F\).

As a dual to Step 3, we first show this holds for finitely many compact sets. As proved in Step 1, compact sets are in \(\mathfrak{M}_F\). Suppose now \(K_1\) and \(K_2\) are disjoint compact sets. We want to show that \[ \mu(K_1 \cup K_2)=\mu(K_1)+\mu(K_2). \] Note that compact sets in a Hausdorff space are closed. Therefore we are able to apply Urysohn's lemma to the pair \((K_1,K_2^c)\). That is, there exists an \(f \in C_c(X)\) such that \[ K_1 \prec f \prec K_2^c. \] In other words, \(f(x)=1\) for all \(x \in K_1\) and \(f(x)=0\) for all \(x \in K_2\), since \(\operatorname{supp}(f) \cap K_2 = \varnothing\). Fix \(\varepsilon>0\). By Step 1, since \(K_1 \cup K_2\) is compact, there exists some \(g \in C_c(X)\) such that \[ K_1 \cup K_2 \prec g \quad \text{and} \quad \Lambda(g) < \mu(K_1 \cup K_2)+\varepsilon. \] Now things become tricky. We are able to write \(g\) as \[ g=fg+(1-f)g. \] But \(K_1 \prec fg\) and \(K_2 \prec (1-f)g\) by the properties of \(f\) and \(g\). Since \(\Lambda\) is linear, by Step 1 again we have \[ \mu(K_1)+\mu(K_2) \leq \Lambda(fg)+\Lambda((1-f)g)=\Lambda(g) < \mu(K_1 \cup K_2)+\varepsilon. \] Since \(\varepsilon\) is arbitrary, \[ \mu(K_1)+\mu(K_2) \leq \mu(K_1 \cup K_2). \] On the other hand, by Step 3, we have \[ \mu(K_1 \cup K_2) \leq \mu(K_1)+\mu(K_2). \] Therefore the two sides must be equal.

If \(\mu(E)=\infty\), then by Step 3 we have \(\sum_i\mu(E_i)=\infty\) as well, and the identity holds trivially; so we assume \(\mu(E)<\infty\). Fix \(\varepsilon>0\). Since \(E_i \in \mathfrak{M}_F\), there are compact sets \(K_i \subset E_i\) with \[ \mu(K_i) > \mu(E_i)-\frac{\varepsilon}{2^i}. \] Putting \(H_n=K_1 \cup K_2 \cup \cdots \cup K_n\), we see \(E \supset H_n\), and since the \(K_i\) are disjoint compact sets, the first part of this step gives \[ \mu(E) \geq \mu(H_n)=\sum_{i=1}^{n}\mu(K_i)>\sum_{i=1}^{n}\mu(E_i)-\varepsilon. \] This inequality holds for all \(n\) and \(\varepsilon\), therefore \[ \mu(E) \geq \sum_{i=1}^{\infty}\mu(E_i). \] Combined with Step 3, the identity holds.

Finally we shall show that \(E \in \mathfrak{M}_F\) if \(\mu(E) <\infty\). To make it more understandable, we will use elementary calculus notation. If we write \(\mu(E)=x\) and \(x_n=\sum_{i=1}^{n}\mu(E_i)\), we see \[ \lim_{n \to \infty}x_n=x. \] Therefore, for any \(\varepsilon>0\), there exists some \(N \in \mathbb{N}\) such that \[ x-x_N<\varepsilon, \] which is tantamount to \[ \mu(E)<\sum_{i=1}^{N}\mu(E_i)+\varepsilon. \] But by the construction of the compact set \(H_N\) above, we see \[ \mu(E)<{\color\red{\sum_{i=1}^{N}\mu(E_i)}}+\varepsilon<{\color\red {\mu(H_N)+\varepsilon}}+\varepsilon=\mu(H_N)+2\varepsilon. \] Hence \(E\) satisfies the inner-regularity requirement of \(\mathfrak{M}_F\), and is thus an element of it.

Remarks of Step 4. You should realize that we are heavily using the \(\varepsilon\)-definition of \(\sup\) and \(\inf\). As you may guess, \(\mathfrak{M}_F\) should be a subset of \(\mathfrak{M}\) though we don't know whether it is a \(\sigma\)-algebra or not. In other words, we hope that the countable additivity of \(\mu\) holds on a \(\sigma\)-algebra that is properly extended from \(\mathfrak{M}_F\). However it's still difficult to show that \(\mathfrak{M}\) is a \(\sigma\)-algebra. We need more properties of \(\mathfrak{M}_F\) to go on.

Step 5 - The 'continuity' of \(\mathfrak{M}_F\).

If \(E \in \mathfrak{M}_F\) and \(\varepsilon>0\), there is a compact \(K\) and an open \(V\) such that \(K \subset E \subset V\) and \(\mu(V-K)<\varepsilon\).

There are two ways to write \(\mu(E)\), namely \[ \mu(E)=\sup\{\mu(K):K \subset E\} \quad \text{and} \quad \mu(E)=\inf\{\mu(V):V\supset E\}, \] where \(K\) is compact and \(V\) is open. Therefore there exist some \(K\) and \(V\) such that \[ \mu(V)-\frac{\varepsilon}{2}<\mu(E)<\mu(K)+\frac{\varepsilon}{2}. \] Since \(V-K\) is open and \(\mu(V-K)<\infty\), we have \(V-K \in \mathfrak{M}_F\) by Step 2. By Step 4, we have \[ \mu(K)+\mu(V-K)=\mu(V) <\mu(K)+\varepsilon. \] Therefore \(\mu(V-K)<\varepsilon\), as desired.

Remarks of Step 5. You should be familiar with the \(\varepsilon\)-definitions of \(\sup\) and \(\inf\) now. Since \(V-K =V\cap K^c \subset V\), we have \(\mu(V-K)\leq\mu(V)<\mu(E)+\frac{\varepsilon}{2}<\infty\).

Step 6 - \(\mathfrak{M}_F\) is closed under certain operations

If \(A,B \in \mathfrak{M}_F\), then \(A-B,A\cup B\) and \(A \cap B\) are elements of \(\mathfrak{M}_F\).

This shows that \(\mathfrak{M}_F\) is closed under union, intersection and relative complement. In fact, we merely need to prove \(A-B \in \mathfrak{M}_F\), since \(A \cup B=(A-B) \cup B\) and \(A\cap B = A-(A-B)\).

By Step 5, for \(\varepsilon>0\), there are sets \(K_A\), \(K_B\), \(V_A\), \(V_B\) such that \(K_A \subset A \subset V_A\), \(K_B \subset B \subset V_B\), \(\mu(V_A-K_A)<\varepsilon\) and \(\mu(V_B-K_B)<\varepsilon\). For \(A-B\) we have \[ A-B \subset V_A-K_B \subset (V_A-K_A) \cup (K_A-V_B) \cup (V_B-K_B). \] With an application of Steps 3 and 5, we have \[ \mu(A-B) \leq \mu(V_A-K_A)+\mu(K_A-V_B)+\mu(V_B-K_B)< \varepsilon+\mu(K_A-V_B)+\varepsilon. \] Since \(K_A-V_B\) is a closed subset of \(K_A\), we see \(K_A-V_B\) is compact as well (a closed subset of a compact set is compact). But \(K_A-V_B \subset A-B\) and \(\mu(A-B) <\mu(K_A-V_B)+2\varepsilon\), so \(A-B\) meets the inner-regularity requirement of \(\mathfrak{M}_F\); the fact that \(\mu(A-B)<\infty\) is trivial since \(\mu(A-B)\leq\mu(A)\).

Since \(A-B\) and \(B\) are disjoint members of \(\mathfrak{M}_F\), we see by Step 4 that \[ \mu(A \cup B)=\mu(A-B)+\mu(B)<\infty. \] Thus \(A \cup B \in \mathfrak{M}_F\). Since \(A,A-B \in \mathfrak{M}_F\), we see \(A \cap B = A-(A-B) \in \mathfrak{M}_F\).

Remarks of Step 6. In this step, we demonstrated several ways to decompose a set, each of which leads to a huge simplification. Now we are able to show that \(\mathfrak{M}_F\) is a subset of \(\mathfrak{M}\).

Step 7 - \(\mathfrak{M}_F \subset \mathfrak{M}\)

There is a precise relation between \(\mathfrak{M}\) and \(\mathfrak{M}_F\) given by \[ \mathfrak{M}_F=\{E \in \mathfrak{M}:\mu(E)<\infty\} \subset \mathfrak{M}. \]

If \(E \in \mathfrak{M}_F\), we shall show that \(E \in \mathfrak{M}\). For any compact \(K\), we have \(K\in\mathfrak{M}_F\) (Step 1), so by Step 6, \(K \cap E \in \mathfrak{M}_F\); therefore \(E \in \mathfrak{M}\).

Conversely, if \(E \in \mathfrak{M}\) with \(\mu(E)<\infty\), we need to show that \(E \in \mathfrak{M}_F\). By definition of \(\mu\), for \(\varepsilon>0\), there is an open \(V \supset E\) such that \[ \mu(V)<\mu(E)+\varepsilon<\infty. \] Therefore \(V \in \mathfrak{M}_F\) by Step 2. By Step 5, there is a compact set \(K \subset V\) such that \(\mu(V-K)<\varepsilon\) (the open set containing \(V\) can be taken to be \(V\) itself). Since \(E \cap K \in \mathfrak{M}_F\), there exists a compact set \(H \subset E \cap K\) with \[ \mu(E \cap K)<\mu(H)+\varepsilon. \] Since \(E \subset (E \cap K) \cup (V-K)\), it follows from Step 3 that \[ \mu(E) \leq {\color\red{\mu(E\cap K)}}+\mu(V-K)<{\color\red{\mu(H)+\varepsilon}}+\varepsilon=\mu(H)+2\varepsilon. \] Since \(H \subset E\) is compact and \(\varepsilon\) is arbitrary, \(E \in \mathfrak{M}_F\).

Remarks of Step 7. Several tricks in the preceding steps are used here. Now we are pretty close to the fact that \((X,\mathfrak{M},\mu)\) is a measure space. Note that for \(E \in \mathfrak{M}-\mathfrak{M}_F\), we have \(\mu(E)=\infty\), but we have already proved the countable additivity for \(\mathfrak{M}_F\). Is it 'almost trivial' for \(\mathfrak{M}\)? Before that, we need to show that \(\mathfrak{M}\) is a \(\sigma\)-algebra. Note that assertion 3 of \(\mu\) has been proved.

Step 8 - \(\mathfrak{M}\) is a \(\sigma\)-algebra in \(X\) containing all Borel sets

We will validate the definition of \(\sigma\)-algebra one by one.

\(X \in \mathfrak{M}\).

For any compact \(K \subset X\), we have \(K \cap X=K\). But as proved in Step 1, \(K \in \mathfrak{M}_F\), therefore \(X \in \mathfrak{M}\).

If \(A \in \mathfrak{M}\), then \(A^c \in\mathfrak{M}\).

If \(A \in \mathfrak{M}\), then \(A \cap K \in \mathfrak{M}_F\). But \[ K-(A \cap K)=K \cap(A^c \cup K^c)=K\cap A^c \cup \varnothing=K \cap A^c. \] By Step 1 and Step 6, we see \(K \cap A^c \in \mathfrak{M}_F\), thus \(A^c \in \mathfrak{M}\).

If \(A_n \in \mathfrak{M}\) for all \(n \in \mathbb{N}\), then \(A=\bigcup_{n=1}^{\infty}A_n \in \mathfrak{M}\).

We define an auxiliary sequence of sets inductively. For \(n=1\), write \(B_1=A_1 \cap K\) where \(K\) is compact; then \(B_1 \in \mathfrak{M}_F\). For \(n \geq 2\), write \[ B_n=(A_n \cap K)-(B_1 \cup \cdots\cup B_{n-1}). \] Since \(A_n \cap K \in \mathfrak{M}_F\) and \(B_1,B_2,\cdots,B_{n-1} \in \mathfrak{M}_F\), by Step 6 we have \(B_n \in \mathfrak{M}_F\). Also, the sets \(B_n\) are pairwise disjoint.

A set-theoretic manipulation (by induction, \(B_1 \cup \cdots \cup B_n=(A_1\cup\cdots\cup A_n) \cap K\)) shows that \[ \begin{aligned} A \cap K&=K \cap\left(\bigcup_{n=1}^{\infty}A_n\right) \\ &=\bigcup_{n=1}^{\infty}(K \cap A_n) \\ &=\bigcup_{n=1}^{\infty}B_n. \end{aligned} \] Now we are able to evaluate \(\mu(A \cap K)\) by Step 4: \[ \mu(A \cap K)=\sum_{n=1}^{\infty}\mu(B_n) \leq \mu(K)<\infty. \] Therefore, by the second part of Step 4, \(A \cap K \in \mathfrak{M}_F\), which implies that \(A \in \mathfrak{M}\).

\(\mathfrak{M}\) contains all Borel sets.

Indeed, it suffices to prove that \(\mathfrak{M}\) contains all closed sets (or, by taking complements, all open sets). Let \(K\) be a compact set.

  1. If \(C\) is closed, then \(C \cap K\) is a closed subset of \(K\), hence compact, and therefore \(C \cap K \in \mathfrak{M}_F\) by Step 1. Thus \(C \in \mathfrak{M}\).
  2. If \(D\) is open, then \(D^c\) is closed, so \(D^c \in \mathfrak{M}\) by 1, and \(D \in \mathfrak{M}\) since \(\mathfrak{M}\) is closed under complements.

Therefore \(\mathfrak{M}\) is a \(\sigma\)-algebra containing all closed and open sets, hence all Borel sets.

Step 9 - \(\mu\) is a positive measure on \(\mathfrak{M}\)

Again, we will verify all properties of \(\mu\) one by one.

\(\mu(E) \geq 0\) for all \(E \in \mathfrak{M}\).

This follows immediately from the definition of \(\mu\), since \(\Lambda\) is positive and \(0 \leq f \leq 1\).

\(\mu\) is countably additive.

If \(A_1,A_2,\cdots\) form a disjoint countable collection of members of \(\mathfrak{M}\), we need to show that \[ \mu\left(\bigcup_{n=1}^{\infty}A_n\right)=\sum_{n=1}^{\infty}\mu(A_n). \] If \(A_n \in \mathfrak{M}_F\) for all \(n\), this is merely what we proved in Step 4. If \(A_j \in \mathfrak{M}-\mathfrak{M}_F\) for some \(j\), however, then by Step 7 we have \(\mu(A_j)=\infty\), so \(\sum_n\mu(A_n)=\infty\). For the left-hand side, notice that \(\bigcup_n A_n \supset A_j\), so \(\mu(\bigcup_n A_n) \geq \mu(A_j)=\infty\). The identity is now proved.

Step 10 - The completeness of \(\mu\)

So far assertions 1-3 have been proved, but the final assertion has not been proved explicitly. We do that now, since this property will be used when discussing the Lebesgue measure \(m\). In fact, it shows that \((X,\mathfrak{M},\mu)\) is a complete measure space.

If \(E \in \mathfrak{M}\), \(A \subset E\), and \(\mu(E)=0\), then \(A \in \mathfrak{M}\).

It suffices to show that \(A \in \mathfrak{M}_F\). By monotonicity, \(\mu(A)=0\) as well. If \(K \subset A\) is compact, then \(\mu(K)=\mu(A)=0\). Therefore \(0\) is the supremum of \(\mu(K)\) over compact \(K \subset A\), so \(A \in \mathfrak{M}_F \subset \mathfrak{M}\).

Step 11 - The functional and the measure

For every \(f \in C_c(X)\), \(\Lambda{f}=\int_X fd\mu\).

This is the main result of the theorem. It suffices to prove the inequality \[ \Lambda f \leq \int_X fd\mu \] for all \(f \in C_c(X)\). What about the other side? By the linearity of \(\Lambda\) and \(\int_X \cdot d\mu\), once the inequality above is proved, we also have \[ \Lambda(-f)=-\Lambda{f}\leq\int_{X}(-f)d\mu=-\int_Xfd\mu, \] so that \[ \Lambda{f} \geq \int_X fd\mu \] holds as well, and this establishes the equality.

Notice that since \(K=\operatorname{supp}(f)\) is compact, the range of \(f\) is compact; we may assume that \([a,b]\) contains the range of \(f\). For \(\varepsilon>0\), we are able to pick a partition \[ y_0 < a < y_1<\cdots<y_n=b \] such that \(y_i - y_{i-1}<\varepsilon\) for every \(i\). Put \[ E_i=\{x:y_{i-1}< f(x) \leq y_i\}\cap K. \] Since \(f\) is continuous, \(f\) is Borel measurable, and the sets \(E_i\) are pairwise disjoint Borel sets with \(\bigcup_i E_i=K\). Again, there are open sets \(V_i \supset E_i\) such that \[ \mu(V_i) < \mu(E_i)+\frac{\varepsilon}{n} \] for \(i=1,2,\cdots,n\), and such that \(f(x)<y_i + \varepsilon\) for all \(x \in V_i\). Notice that \((V_i)\) covers \(K\); therefore by the partition of unity, there is a sequence of functions \((h_i)\) such that \(h_i \prec V_i\) for all \(i\) and \(\sum_i h_i=1\) on \(K\). By Step 1 and the fact that \(K \prec \sum_i h_i\), we see \[ \mu(K) \leq \Lambda\left(\sum_i h_i\right)=\sum_i \Lambda{h_i}. \] By the way we picked \(V_i\), we see \(h_if \leq (y_i+\varepsilon)h_i\). Since \(f=\sum_i h_if\), we have the following inequality: \[ \begin{aligned} \Lambda{f} &= \sum_{i=1}^{n}\Lambda(h_if) \leq\sum_{i=1}^{n}(y_i+\varepsilon)\Lambda{h_i} \\ &= \sum_{i=1}^{n}\left(|a|-|a|+y_i+\varepsilon\right)\Lambda{h_i} \\ &=\sum_{i=1}^{n}(|a|+y_i+\varepsilon)\Lambda{h_i}-|a|\sum_{i=1}^{n}\Lambda{h_i}. \end{aligned} \] Since \(h_i \prec V_i\), we have \(\mu(E_i)+\frac{\varepsilon}{n}>\mu(V_i) \geq \Lambda{h_i}\), and we already have \(\sum_i \Lambda{h_i} \geq \mu(K)\); note also that \(|a|+y_i+\varepsilon \geq 0\). Putting these into the inequality above, we get \[ \begin{aligned} \Lambda{f} &\leq \sum_{i=1}^{n}(|a|+y_i+\varepsilon)\Lambda{h_i}-|a|\sum_{i=1}^{n}\Lambda{h_i} \\ &\leq \sum_{i=1}^{n}(|a|+y_i+\varepsilon){\color\red{(\mu(E_i)+\frac{\varepsilon}{n})}}-|a|\color\red{\mu(K)}. \end{aligned} \] Observe that \(\bigcup_i E_i=K\); by Step 9 we have \(\sum_{i}\mu(E_i)=\mu(K)\). 
A slight manipulation shows that \[ \begin{aligned} \sum_{i=1}^{n}(|a|+y_i+\varepsilon)\mu(E_i)-|a|\mu(K)&=|a|\sum_{i=1}^{n}\mu(E_i)-|a|\mu(K)+\sum_{i=1}^{n}(y_i+\varepsilon)\mu(E_i) \\ &=\sum_{i=1}^{n}(y_i-\varepsilon)\mu(E_i)+2\varepsilon\mu(K). \end{aligned} \] Therefore for \(\Lambda f\) we get \[ \begin{aligned} \Lambda{f} &\leq\sum_{i=1}^{n}(|a|+y_i+\varepsilon)(\mu(E_i)+\frac{\varepsilon}{n})-|a|\mu(K) \\ &=\sum_{i=1}^{n}(y_i-\varepsilon)\mu(E_i)+2\varepsilon\mu(K)+\frac{\varepsilon}{n}\sum_{i=1}^n(|a|+y_i+\varepsilon). \end{aligned} \] Now comes the trickiest part of the whole blog post. By definition of \(E_i\), we see \(f(x) > y_{i-1}>y_{i}-\varepsilon\) for \(x \in E_i\). Therefore we obtain a simple function \(s_n\) by \[ s_n=\sum_{i=1}^{n}(y_i-\varepsilon)\chi_{E_i}, \] which satisfies \(s_n \leq f\). Evaluating the Lebesgue integral of \(s_n\) with respect to \(\mu\), we see \[ \int_X s_nd\mu={\color\red{\sum_{i=1}^{n}(y_i-\varepsilon)\mu(E_i)}} \leq {\color\red{\int_X fd\mu}}. \] For \(2\varepsilon\mu(K)\), things are simple: since \(0\leq\mu(K)<\infty\), we have \(2\varepsilon\mu(K) \to 0\) as \(\varepsilon \to 0\). Now let's estimate the final part of the inequality. It's immediate that \(\frac{\varepsilon}{n}\sum_{i=1}^{n}(|a|+\varepsilon)=\varepsilon(|a|+\varepsilon)\). For \(y_i\), observe that \(y_i \leq b\) for all \(i\), therefore \(\frac{\varepsilon}{n}\sum_{i=1}^{n}y_i \leq \frac{\varepsilon}{n}nb=\varepsilon b\). Thus \[ {\color\green{\frac{\varepsilon}{n}\sum_{i=1}^{n}(|a|+y_i+\varepsilon)}} \color\black\leq {\color\green {\varepsilon(|a|+b+\varepsilon)}}\color\black{.} \] Notice that \(b+|a| \geq 0\) since \(b \geq a \geq -|a|\). 
Our estimation of \(\Lambda{f}\) is finally done: \[ \begin{aligned} \Lambda{f} &\leq{\color\red{\sum_{i=1}^{n}(y_i-\varepsilon)\mu(E_i)}}+2\varepsilon\mu(K)+{\color\green{\frac{\varepsilon}{n}\sum_{i=1}^n(|a|+y_i+\varepsilon)}} \\ &\leq{\color\red {\int_Xfd\mu}}+2\varepsilon\mu(K)+{\color\green{\varepsilon(|a|+b+\varepsilon)}} \\ &= \int_X fd\mu+\varepsilon(2\mu(K)+|a|+b+\varepsilon). \end{aligned} \] Since \(\varepsilon\) is arbitrary, we see \(\Lambda{f} \leq \int_X fd\mu\). The identity is proved.
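The approximation scheme of Step 11 can be sanity-checked numerically. The Python sketch below is a slight variant (we take \(y_0=a\), approximate \(\mu\) by a uniform grid standing in for Lebesgue measure on \(K=[0,1]\), and pick a hypothetical \(f\)); it builds \(s_n=\sum_i(y_i-\varepsilon)\chi_{E_i}\) and verifies both \(\int s_n\,d\mu\le\int f\,d\mu\) and that the gap is controlled by \(\varepsilon\):

```python
import math

def f(x):
    # a hypothetical f with range [a, b] = [0, 0.25] on K = [0, 1]
    return x * (1 - x)

N = 50_000
xs = [(i + 0.5) / N for i in range(N)]   # grid on K = [0, 1], cell size 1/N
fs = [f(x) for x in xs]

a, b, eps = 0.0, 0.25, 0.01
n = math.ceil((b - a) / eps)             # ensures y_i - y_{i-1} <= eps
ys = [a + i * (b - a) / n for i in range(n + 1)]

integral_f = sum(fs) / N                 # grid approximation of int f dmu
integral_s = 0.0
for i in range(1, n + 1):
    # mu(E_i) with E_i = {x : y_{i-1} < f(x) <= y_i}, measured on the grid
    mu_Ei = sum(1 for v in fs if ys[i - 1] < v <= ys[i]) / N
    integral_s += (ys[i] - eps) * mu_Ei  # contribution of (y_i - eps) chi_{E_i}

assert integral_s <= integral_f          # s_n <= f pointwise
assert integral_f - integral_s <= 2 * eps  # gap bounded by ~eps * mu(K)
```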

Step 12 - The uniqueness of \(\mu\)

If two measures \(\mu_1\) and \(\mu_2\) both satisfy assertions 1 to 4 and correspond to \(\Lambda\), then \(\mu_1=\mu_2\).

In fact, according to assertions 2 and 3, \(\mu\) is determined by its values on compact subsets of \(X\). It suffices to show that

If \(K\) is a compact subset of \(X\), then \(\mu_1(K)=\mu_2(K)\).

Fix \(K\) compact and \(\varepsilon>0\). By assertion 2 (outer regularity), there exists an open \(V \supset K\) such that \(\mu_2(V)<\mu_2(K)+\varepsilon\). By Urysohn's lemma, there exists some \(f\) such that \(K \prec f \prec V\). Hence \[ \mu_1(K)=\int_X\chi_Kd\mu_1 \leq\int_X fd\mu_1=\Lambda{f}=\int_X fd\mu_2 \\ \leq \int_X \chi_V d\mu_2=\mu_2(V)<\mu_2(K)+\varepsilon. \] Since \(\varepsilon\) is arbitrary, \(\mu_1(K) \leq \mu_2(K)\). If \(\mu_1\) and \(\mu_2\) are exchanged, we see \(\mu_2(K) \leq \mu_1(K)\). The uniqueness is proved.

The flaw

Can we simply put \(X=\mathbb{R}^k\) right now? The answer is no. Note that outer regularity holds for all sets in \(\mathfrak{M}\), while inner regularity is only guaranteed for open sets and members of \(\mathfrak{M}_F\). But we expect the outer and inner regularity to be 'symmetric'. There is an example showing that local compactness is far from enough to provide this 'symmetry'.

A weird example

Define \(X=\mathbb{R}_1 \times \mathbb{R}_2\), where \(\mathbb{R}_1\) is the real line equipped with the discrete metric \(d_1\), and \(\mathbb{R}_2\) is the real line equipped with the Euclidean metric \(d_2\). The metric on \(X\) is defined by \[ d_X((x_1,y_1),(x_2,y_2))=d_1(x_1,x_2)+d_2(y_1,y_2). \] The topology \(\tau_X\) induced by \(d_X\) is Hausdorff, and it is locally compact by considering the vertical segments. So what happens in this weird locally compact Hausdorff space?

If \(f \in C_c(X)\), let \(x_1,x_2,\cdots,x_n\) be those values of \(x\) for which \(f(x,y) \neq 0\) for at least one \(y\). Since \(f\) has compact support and the first coordinate is discrete, there are only finitely many such \(x_i\). We are able to define a positive linear functional by \[ \Lambda f=\sum_{i=1}^{n}\int_{-\infty}^{+\infty}f(x_i,y)dy=\int_X fd\mu, \] where \(\mu\) is the measure associated with \(\Lambda\) in the sense of the R-M-K theorem. Let \[ E=\mathbb{R}_1 \times \{0\}. \] By squeezing disjoint open vertical segments around the finitely many points of a compact \(K \subset E\), we see \(\mu(K)=0\) for every compact \(K \subset E\), yet \(\mu(E)=\infty\).

This is in violent contrast to what we expect. However, if \(X\) is required to be \(\sigma\)-compact (note that the space in this example is not), this kind of problem disappears neatly.
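The functional \(\Lambda\) in this example is easy to compute numerically. Below is a Python sketch with a hypothetical \(f\) supported on the two vertical lines \(x=0\) and \(x=1\) (continuity in \(x\) is automatic since \(\mathbb{R}_1\) is discrete), evaluating \(\Lambda f=\sum_i\int f(x_i,y)\,dy\) by a midpoint rule:

```python
def f(x, y):
    # supported on the lines x = 0 and x = 1, with a tent in y on each line
    if x in (0.0, 1.0) and abs(y) <= 1:
        return (1 + x) * (1 - abs(y))
    return 0.0

def Lam(f, lines, y_lo=-1.0, y_hi=1.0, n=100_000):
    # Lam f = sum over the finitely many lines of the integral in y.
    step = (y_hi - y_lo) / n
    return sum(
        sum(f(x, y_lo + (j + 0.5) * step) for j in range(n)) * step
        for x in lines
    )

# int (1 - |y|) dy over [-1, 1] equals 1, so Lam f = 1*1 + 2*1 = 3.
assert abs(Lam(f, [0.0, 1.0]) - 3.0) < 1e-6
```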

References / Further reading

  1. Walter Rudin, Real and Complex Analysis
  2. Serge Lang, Fundamentals of Differential Geometry
  3. Joel W. Robbin, Partition of Unity
  4. Brian Conrad, Paracompactness and local compactness
  5. Raoul Bott & Loring W. Tu, Differential Forms in Algebraic Topology

The Big Three Pt. 4 - The Open Mapping Theorem (F-Space)

The Open Mapping Theorem

We are finally going to prove the open mapping theorem for \(F\)-spaces. In this version, only a metric and completeness are required, so it contains the Banach space version naturally.

(Theorem 0) Suppose we have the following conditions:

  1. \(X\) is an \(F\)-space,
  2. \(Y\) is a topological vector space,
  3. \(\Lambda: X \to Y\) is continuous and linear, and
  4. \(\Lambda(X)\) is of the second category in \(Y\).

Then \(\Lambda\) is an open mapping.

Proof. Let \(B\) be a neighborhood of \(0\) in \(X\). Let \(d\) be an invariant metric on \(X\) that is compatible with the \(F\)-topology of \(X\). Define a sequence of balls by \[ B_n=\{x:d(x,0) < \frac{r}{2^n}\} \] where \(r\) is picked in such a way that \(B_0 \subset B\). To show that \(\Lambda\) is an open mapping, we need to prove that there exists some neighborhood \(W\) of \(0\) in \(Y\) such that \[ W \subset \Lambda(B). \] To do this however, we need an auxiliary set. In fact, we will show that there exists some \(W\) such that \[ W \subset \overline{\Lambda(B_1)} \subset \Lambda(B). \] We need to prove the inclusions one by one.

The first inclusion requires BCT. Since \(B_2 -B_2 \subset B_1\), and subtraction is continuous in \(Y\), we get \[ \overline{\Lambda(B_2)}-\overline{\Lambda(B_2)} \subset \overline{\Lambda(B_2)-\Lambda(B_2)} \subset \overline{\Lambda(B_1)}. \] Since \[ \Lambda(X)=\bigcup_{k=1}^{\infty}k\Lambda(B_2) \] and \(\Lambda(X)\) is of the second category in \(Y\), at least one \(k\Lambda(B_2)\) is of the second category in \(Y\). But scalar multiplication \(y\mapsto ky\) is a homeomorphism of \(Y\) onto \(Y\), so \(k\Lambda(B_2)\) is of the second category for all \(k\), in particular for \(k=1\). Therefore \(\overline{\Lambda(B_2)}\) has nonempty interior, which implies that there exists some open neighborhood \(W\) of \(0\) in \(Y\) such that \(W \subset \overline{\Lambda(B_1)}\). By shifting the index, it's easy to see this holds for all \(n\): for \(n \geq 1\), there exists some neighborhood \(W_n\) of \(0\) in \(Y\) such that \(W_n \subset \overline{\Lambda(B_n)}\).

The second inclusion requires the completeness of \(X\). Fix \(y_1 \in \overline{\Lambda(B_1)}\); we will show that \(y_1 \in \Lambda(B)\). We pick \(y_n\) inductively. Assume \(y_n\) has been chosen in \(\overline{\Lambda(B_n)}\). As stated before, there exists some neighborhood \(W_{n+1}\) of \(0\) in \(Y\) such that \(W_{n+1} \subset \overline{\Lambda(B_{n+1})}\). Hence \[ (y_n-W_{n+1}) \cap \Lambda(B_n) \neq \varnothing. \] Therefore there exists some \(x_n \in B_n\) such that \[ \Lambda x_n \in y_n - W_{n+1}. \] Put \(y_{n+1}=y_n-\Lambda x_n\); then \(y_{n+1} \in W_{n+1} \subset \overline{\Lambda(B_{n+1})}\). Therefore we are able to pick \(y_n\) inductively for all \(n \geq 1\).

Since \(d(x_n,0)<\frac{r}{2^n}\) for all \(n \geq 1\) and \(d\) is invariant, the partial sums \(z_n=\sum_{k=1}^{n}x_k\) form a Cauchy sequence, which converges to some \(z \in X\) since \(X\) is an \(F\)-space. Notice that \[ \begin{aligned} d(z,0)& \leq d(x_1,0)+d(x_2,0)+\cdots \\ & < \frac{r}{2}+\frac{r}{4}+\cdots \\ & = r, \end{aligned} \] so \(z \in B_0 \subset B\).

By the continuity of \(\Lambda\) (together with the fact that \(y_n \in \overline{\Lambda(B_n)}\) and the \(B_n\) shrink to \(\{0\}\)), we see \(\lim_{n \to \infty}y_n = 0\). Notice also that \[ \sum_{k=1}^{n} \Lambda x_k = \sum_{k=1}^{n}(y_k-y_{k+1})=y_1-y_{n+1} \to y_1 \quad (n \to \infty), \] while \(\sum_{k=1}^{n}\Lambda x_k=\Lambda z_n \to \Lambda z\) by continuity; hence \(y_1 = \Lambda z \in \Lambda(B)\).

The whole theorem is now proved, that is, \(\Lambda\) is an open mapping. \(\square\)


You may think the following relation comes from nowhere: \[ (y_n - W_{n+1}) \cap \Lambda(B_{n}) \neq \varnothing. \] But it doesn't. We need to review some point-set topology. Notice that \(y_n \in \overline{\Lambda(B_n)}\), so every neighborhood of \(y_n\) meets \(\Lambda(B_n)\), and \(y_n-W_{n+1}\) is an open neighborhood of \(y_n\). If \((y_n - W_{n+1}) \cap \Lambda(B_{n})\) were empty, then \(y_n\) could not lie in the closure of \(\Lambda(B_n)\).

The geometric series \[ \frac{\varepsilon}{2}+\frac{\varepsilon}{4}+\cdots+\frac{\varepsilon}{2^n}+\cdots=\varepsilon \] is widely used whenever a sum needs to be controlled. It is a good idea to keep this technique in mind.


The formal proofs will not be written down here, but they are quite easy to carry out.

(Corollary 0) \(\Lambda(X)=Y\).

This is an immediate consequence of the fact that \(\Lambda\) is open. Since \(X\) is open in itself, \(\Lambda(X)\) is an open subspace of \(Y\). But the only open subspace of a topological vector space \(Y\) is \(Y\) itself: an open subspace contains a neighborhood of \(0\), and such a neighborhood is absorbing.

(Corollary 1) \(Y\) is an \(F\)-space as well.

If you have already seen the commutative diagram for the quotient space (put \(N=\ker\Lambda\)), you know that the induced map \(f: X/N \to Y\) is open and continuous. Treating these spaces as additive groups, by corollary 0 and the first isomorphism theorem, we have \[ X/\ker\Lambda \simeq \Lambda(X)=Y. \] Therefore \(f\) is an isomorphism, hence one-to-one, and therefore a homeomorphism as well. In this post we showed that \(X/\ker{\Lambda}\) is an \(F\)-space, therefore \(Y\) has to be an \(F\)-space as well. (We are using the fact that \(\ker{\Lambda}\) is closed. Why? Because \(\ker\Lambda = \Lambda^{-1}(\{0\})\), \(\Lambda\) is continuous, and \(\{0\}\) is closed in \(Y\).)

(Corollary 2) If \(\Lambda\) is a continuous linear mapping of an \(F\)-space \(X\) onto an \(F\)-space \(Y\), then \(\Lambda\) is open.

This is a direct application of the Baire category theorem together with the open mapping theorem: being a complete metric space, \(Y\) is of the second category in itself.

(Corollary 3) If the linear map \(\Lambda\) in Corollary 2 is injective, then \(\Lambda^{-1}:Y \to X\) is continuous.

This comes directly from corollary 2: \(\Lambda\) is open, and the inverse of an open bijection is continuous.

(Corollary 4) If \(X\) and \(Y\) are Banach spaces, and if \(\Lambda: X \to Y\) is a continuous linear bijective map, then there exist positive real numbers \(a\) and \(b\) such that \[ a \lVert x \rVert \leq \lVert \Lambda{x} \rVert \leq b\lVert x \rVert \] for every \(x \in X\).

This comes from corollary 3 directly: both \(\Lambda\) and \(\Lambda^{-1}\) are bounded since they are continuous, so we may take \(b = \lVert \Lambda \rVert\) and \(a = 1/\lVert \Lambda^{-1} \rVert\).
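The corollary itself lives in arbitrary Banach spaces, but in the finite-dimensional case the constants are concrete: for an invertible matrix, \(b\) is the largest singular value (the operator norm) and \(a\) the smallest (the reciprocal of \(\lVert \Lambda^{-1} \rVert\)). A small sketch, with an arbitrarily chosen matrix standing in for \(\Lambda\):

```python
import numpy as np

# Finite-dimensional illustration of Corollary 4: for an invertible
# linear map L on R^2, a*||x|| <= ||Lx|| <= b*||x||, where
# b = largest singular value of L and a = smallest singular value.
L = np.array([[2.0, 1.0],
              [0.0, 3.0]])          # an invertible map, chosen for illustration
sigma = np.linalg.svd(L, compute_uv=False)
b, a = sigma[0], sigma[-1]          # largest and smallest singular values

rng = np.random.default_rng(0)
x = rng.standard_normal(2)
lower = a * np.linalg.norm(x)
upper = b * np.linalg.norm(x)
image = np.linalg.norm(L @ x)       # lower <= image <= upper
```

The two-sided bound says exactly that \(\Lambda\) is a linear homeomorphism: it neither collapses vectors (lower bound) nor stretches them unboundedly (upper bound).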

(Corollary 5) If \(\tau_1 \subset \tau_2\) are vector topologies on a vector space \(X\) and if both \((X,\tau_1)\) and \((X,\tau_2)\) are \(F\)-spaces, then \(\tau_1 = \tau_2\).

This is obtained by applying corollary 3 to the identity mapping \(\iota:(X,\tau_2) \to (X,\tau_1)\).

(Corollary 6) If \(\lVert \cdot \rVert_1\) and \(\lVert \cdot \rVert_2\) are two norms in a vector space \(X\) such that

  • \(\lVert\cdot\rVert_1 \leq K\lVert\cdot\rVert_2\) for some constant \(K > 0\),
  • \((X,\lVert\cdot\rVert_1)\) and \((X,\lVert\cdot\rVert_2)\) are Banach spaces,

Then \(\lVert\cdot\rVert_1\) and \(\lVert\cdot\rVert_2\) are equivalent.

This is merely a special case of corollary 5.
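To make "equivalent" concrete: on \(\mathbb{R}^3\) (where all norms are equivalent for elementary reasons, no open mapping theorem needed) the \(\ell^1\) and \(\ell^2\) norms satisfy \(\lVert x \rVert_2 \leq \lVert x \rVert_1 \leq \sqrt{3}\,\lVert x \rVert_2\), the upper constant coming from Cauchy-Schwarz. A quick numerical check of these two constants:

```python
import numpy as np

# Norm equivalence on R^3: the ratio ||x||_1 / ||x||_2 always lies
# in the interval [1, sqrt(3)]. Corollary 6 produces such constants
# abstractly, without exhibiting them, in the Banach-space setting.
rng = np.random.default_rng(1)
ratios = [
    np.linalg.norm(x, 1) / np.linalg.norm(x, 2)
    for x in rng.standard_normal((1000, 3))
]
low, high = min(ratios), max(ratios)  # both within [1, sqrt(3)]
```

The point of corollary 6 is that in infinite dimensions such constants need not exist in general; completeness of both norms is what forces them to.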

The series

Since there is no strong reason to write more posts on this topic, i.e. the three fundamental theorems of linear functional analysis, I think it's time to make a list of the series. It's been around half a year.