Direction is a significant thing. For example, take a look at this picture (by David Gunderman):

The positions of the red ball and the black ball show that this triple of balls turns upside down every time they finish one round. This wouldn't happen if the triple were on a normal band, which can be denoted by $S^1 \times (0,1)$. What would happen if we tried to describe their velocity on the Möbius band, both locally and globally? There must be some significant difference from a normal band. If we set some movement pattern for the balls, for example letting them run horizontally or in zigzags, hopefully we get different *sets* of vectors, and those vectors can span some vector spaces as well.

Here and in the following posts, we will try to develop, purely formally, certain functorial constructions having to do with vector bundles. It may be overly general at times, but we will offer some examples to make it concrete.

Let $M$ be a manifold (of class $C^p$, where $p \geq 0$ and can be set to $\infty$) modeled on a Banach space $\mathbf{E}$. Let $E$ be another topological space and $\pi: E \to M$ a surjective $C^p$-morphism. A **vector bundle** is a topological construction associated with $M$ (the base space), $E$ (the total space) and $\pi$ (the bundle projection) such that, roughly speaking, $E$ is locally a product of $M$ and $\mathbf{E}$.

We use $\mathbf{E}$ instead of $\mathbb{R}^n$ in order to include the infinite-dimensional cases. We will try to distinguish finite-dimensional from infinite-dimensional Banach spaces here. There is a lot to take care of, since, for example, an infinite-dimensional Banach space has no countable Hamel basis, while a finite-dimensional one has a finite basis (this can be proved using the Baire category theorem).

Next we will show precisely how $E$ locally becomes a product space. Let $\mathfrak{U}=(U_i)_i$ be an open covering of $M$, and for each $i$, suppose that we are *given* a mapping

$$\tau_i: \pi^{-1}(U_i) \to U_i \times \mathbf{E}$$

satisfying the following three conditions.

**VB 1** $\tau_i$ is a $C^p$ diffeomorphism compatible with the projections, that is,

$$pr \circ \tau_i = \pi \quad \text{on } \pi^{-1}(U_i),$$

where $pr: U_i \times \mathbf{E} \to U_i$ is the projection onto the first component: $(x,y) \mapsto x$. By restricting $\tau_i$ to one point of $U_i$, we obtain an isomorphism on each fiber $\pi^{-1}(x)$:

$$\tau_{ix}: \pi^{-1}(x) \to \mathbf{E}.$$
**VB 2** For each pair of open sets $U_i$, $U_j \in \mathfrak{U}$ and each $x \in U_i \cap U_j$, the map

$$\tau_{jx} \circ \tau_{ix}^{-1}: \mathbf{E} \to \mathbf{E}$$

is a toplinear isomorphism (that is, an isomorphism of $\mathbf{E}$ as a *topological* vector space).

**VB 3** For any two members $U_i$, $U_j \in \mathfrak{U}$, the following map is a $C^p$-morphism:

$$\varphi: U_i \cap U_j \to L(\mathbf{E},\mathbf{E}), \quad x \mapsto \tau_{jx}\circ\tau_{ix}^{-1}.$$
**REMARKS.** As with manifolds, we call the set of pairs $(U_i,\tau_i)_i$ a **trivializing covering** of $\pi$, and the $(\tau_i)$ its **trivializing maps**. Precisely, for $x \in U_i$, we say $U_i$ or $\tau_i$ trivializes at $x$.

Two trivializing *coverings* for $\pi$ are said to be **VB-equivalent** if, taken together, they still satisfy the conditions **VB 2** and **VB 3**. It is immediate that **VB-equivalence** is an equivalence relation, and we leave the verification to the reader. It is this VB-equivalence *class* of trivializing coverings that determines a structure of **vector bundle** on $\pi$. With respect to the Banach space $\mathbf{E}$, we say that the vector bundle has **fiber** $\mathbf{E}$, or is **modeled on** $\mathbf{E}$.

Next we shall give some motivation for each condition. Each pair $(U_i,\tau_i)$ determines a local product of 'a part of the manifold' and the model space, on which we can describe directions with ease. This is what **VB 1** tells us. But that is far from enough if we want our vectors to behave well: we want the total space $E$ to actually meet our requirements. As for **VB 2**, it ensures that two different trivializing maps give the same Banach space structure (with *equivalent* norms): for each point $x \in M$, the fiber $\pi^{-1}(x) \subset E$ receives a Banach space structure via $\tau_{ix}$, and transporting it via $\tau_{jx}$ for some other $j$ yields an equivalent one. In fact, **VB 2** has an equivalent alternative:

**VB 2’** On each fiber $\pi^{-1}(x)$ we are given a structure of Banach space as follows. For $x \in U_i$, we have a toplinear isomorphism which is in fact the trivializing map restricted to the fiber:

$$\tau_{ix}: \pi^{-1}(x) \to \mathbf{E}.$$
As stated, **VB 2** implies **VB 2’**. Conversely, if **VB 2’** is satisfied, then for open sets $U_i$, $U_j \in \mathfrak{U}$ and $x \in U_i \cap U_j$, the map $\tau_{jx} \circ \tau_{ix}^{-1}:\mathbf{E} \to \mathbf{E}$ is a toplinear isomorphism. Hence we can consider **VB 2** or **VB 2’** as a refinement of **VB 1**.

In the finite-dimensional case one can omit **VB 3**, since it is implied by **VB 2**, as we prove below.

**(Lemma)** Let $\mathbf{E}$ and $\mathbf{F}$ be two finite-dimensional Banach spaces. Let $U$ be open in some Banach space, and let

$$f: U \times \mathbf{E} \to \mathbf{F}$$

be a $C^p$-morphism such that for each $x \in U$, the map

$$f_x: \mathbf{E} \to \mathbf{F}$$

given by $f_x(v)=f(x,v)$ is a linear map. Then the map of $U$ into $L(\mathbf{E},\mathbf{F})$ given by $x \mapsto f_x$ is a $C^p$-morphism.

**PROOF.** Since $L(\mathbf{E},\mathbf{F})=L(\mathbf{E},\mathbf{F_1}) \times L(\mathbf{E},\mathbf{F_2}) \times \cdots \times L(\mathbf{E},\mathbf{F_n})$ where $\mathbf{F}=\mathbf{F_1} \times \cdots \times \mathbf{F_n}$, by induction on the dimensions of $\mathbf{F}$ and $\mathbf{E}$ it suffices to treat the case where $\mathbf{E}$ and $\mathbf{F}$ are toplinearly isomorphic to $\mathbb{R}$. In that case, the function $f(x,v)$ can be written $g(x)v$ for some $g:U \to \mathbb{R}$. Since $f$ is a morphism, it is a morphism in each argument $x$ and $v$ separately. Putting $v=1$ shows that $g$ is also a morphism, which settles the case when the dimensions of $\mathbf{E}$ and $\mathbf{F}$ are both equal to $1$, and the proof is completed by induction. $\blacksquare$

To show that **VB 3** is implied by **VB 2**, put $\mathbf{E}=\mathbf{F}$ in the lemma. Note that $\tau_j \circ \tau_i^{-1}$ maps $(U_i \cap U_j) \times \mathbf{E}$ to $\mathbf{E}$, that $U_i \cap U_j$ is open, and that for each $x \in U_i \cap U_j$ the map $(\tau_j \circ \tau_i^{-1})_x=\tau_{jx} \circ \tau_{ix}^{-1}$ is toplinear, hence linear. The fact that $\varphi: x \mapsto \tau_{jx}\circ\tau_{ix}^{-1}$ is a morphism then follows from the lemma.

Let $M$ be any $n$-dimensional smooth manifold; then $pr:M \times \mathbb{R}^n \to M$ is a vector bundle. Here the total space is $M \times \mathbb{R}^n$, the base is $M$, and the bundle projection $pr$ is in this case simply the projection onto the first factor. Intuitively, a point of the total space determines a point $x \in M$, and the other component can be any direction in $\mathbb{R}^n$, hence a *vector*.

We need to verify the three conditions carefully. Let $(U_i,\varphi_i)_i$ be any atlas of $M$, and let $\tau_i$ be the identity map on $U_i \times \mathbb{R}^n$ (which is trivially of class $C^p$). We claim that $(U_i,\tau_i)_i$ satisfies the three conditions, so that we get a vector bundle.

For **VB 1** things are clear: since $pr^{-1}(U_i)=U_i \times \mathbb{R}^n$, the diagram is commutative. Each fiber $pr^{-1}(x)$ is essentially $\{x\} \times \mathbb{R}^n$, and $\tau_{jx} \circ \tau_{ix}^{-1}$ is the identity map from $\{x\} \times \mathbb{R}^n$ to itself, with the same Euclidean topology; hence **VB 2** is verified, and we have no need to verify **VB 3**.

First of all, imagine a circle embedded in a Möbius band. Now we try to give a formal definition. Using the quotient topology, $S^1$ can be defined as

$$S^1 = I/\sim_1,$$

where $I$ is the unit interval and $0 \sim_1 1$ (identifying the two ends). On the other hand, the infinite Möbius band can be defined by

$$M = (I \times \mathbb{R})/\sim_2,$$

where $(0,v) \sim_2 (1,-v)$ for all $v \in \mathbb{R}$ (not only identifying the two ends of $I$ but also 'flipping' the vertical line). Then all we need is the natural projection onto the first component:

$$\pi: M \to S^1, \quad [(t,v)] \mapsto [t].$$
The verification differs little from that for the trivial bundle. The quotient topology on Banach spaces follows naturally in this case, but things might be troublesome if we restricted ourselves to $\mathbb{R}^n$.
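As a sanity check, the flip in the identification $(0,v) \sim_2 (1,-v)$ can be simulated numerically. The sketch below (the helper name `canon` is ours, not standard) reduces a point $(t,v)$ of $\mathbb{R} \times \mathbb{R}$ to a canonical representative in $[0,1) \times \mathbb{R}$, flipping the fiber coordinate once per crossing of the seam:

```python
import math

def canon(t, v):
    """Canonical representative of (t, v) on the infinite Mobius band:
    wrap t into [0, 1), flipping the fiber coordinate v once per
    crossing of the seam t = 0 ~ t = 1 (since (0, v) ~ (1, -v))."""
    k = math.floor(t)
    return (t - k, v if k % 2 == 0 else -v)

# Travelling once around the circle reverses the fiber direction...
print(canon(1.25, 2.0))   # (0.25, -2.0)
# ...and travelling around twice restores it.
print(canon(2.25, 2.0))   # (0.25, 2.0)
```

The transition function between the two standard trivializing charts of this bundle is exactly this sign change, which is why the bundle is nontrivial.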

The first example is rather special in many senses. By $S^n$ we mean the set of points in $\mathbb{R}^{n+1}$ with

$$S^n = \{\mathbf{x} \in \mathbb{R}^{n+1}: |\mathbf{x}| = 1\},$$

and the tangent bundle can be defined by

$$TS^n = \{(\mathbf{x},\mathbf{y}): \mathbf{x}\cdot\mathbf{y} = 0\},$$

where, of course, $\mathbf{x} \in S^n$ and $\mathbf{y} \in \mathbb{R}^{n+1}$. The vector bundle is given by $pr:TS^n \to S^n$ where $pr$ is the projection onto the first factor. This total space is of course much finer than the $M \times \mathbb{R}^n$ of the first example: each point $x$ of the manifold is now associated with a *tangent space* $T_x(M)$ at that point.
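Concretely, one lands in $TS^n$ by projecting an arbitrary vector $\mathbf{y} \in \mathbb{R}^{n+1}$ onto the tangent space at $\mathbf{x}$. A minimal numerical sketch (the helper `project_tangent` is our own, not from the text):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project_tangent(x, y):
    """Project y in R^{n+1} onto the tangent space T_x(S^n),
    i.e. subtract the component of y along the unit vector x."""
    c = dot(x, y)
    return [yi - c * xi for xi, yi in zip(x, y)]

x = [1.0, 0.0, 0.0]           # a point on S^2
y = [3.0, 4.0, 5.0]           # an arbitrary vector in R^3
t = project_tangent(x, y)
print(t)                       # [0.0, 4.0, 5.0]
print(abs(dot(x, t)) < 1e-12)  # True: (x, t) lies in TS^2
```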

More generally, we can define it in any Hilbert space $H$, for example, $L^2$ space:

where

The projection is natural:

But we will not carry out the verification in this post, since it requires the abstract definition of tangent vectors. This will be done in the following post.

We want to study those 'vectors' associated with a manifold both globally and locally. For example, we may want to describe the tangent line of a curve at a point without relying heavily on elementary calculus. Also, we may want to describe the vector bundle of a manifold globally: for example, when do we have a trivial one? Can we classify manifolds using the behavior of their bundles? Can we make it a little more abstract and, for example, consider isomorphism classes of bundles? How does one bundle *transform* into another? To do all this we need a great number of definitions and propositions.

We can define several relations between two norms. Suppose we have a vector space $X$ and two norms $\lVert \cdot \rVert_1$ and $\lVert \cdot \rVert_2$. One says $\lVert \cdot \rVert_1$ is *weaker* than $\lVert \cdot \rVert_2$ if there is $K>0$ such that $\lVert x \rVert_1 \leq K \lVert x \rVert_2$ for all $x \in X$. Two norms are *equivalent* if each is weaker than the other (trivially this is an equivalence relation). The idea of stronger and weaker norms is related to the idea of 'finer' and 'coarser' topologies in the setting of topological spaces.
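For a concrete instance on $\mathbb{R}^n$ (where, being finite-dimensional, all norms are in fact equivalent), we have $\lVert x\rVert_\infty \le \lVert x\rVert_1 \le n\lVert x\rVert_\infty$, so each of the two norms is weaker than the other. A quick numerical check of these constants:

```python
def norm1(x):
    return sum(abs(t) for t in x)

def norm_inf(x):
    return max(abs(t) for t in x)

# ||x||_inf <= ||x||_1 <= n * ||x||_inf on R^n,
# so each norm is weaker than the other (K = 1 and K = n).
samples = [[1.0, -2.0, 3.0], [0.0, 0.5, -0.25], [-7.0, 7.0, 7.0]]
for x in samples:
    assert norm_inf(x) <= norm1(x) <= len(x) * norm_inf(x)
print("equivalence constants verified on samples")
```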

So what about limits of convergent sequences? Unsurprisingly this can be verified with elementary $\epsilon$-$N$ arguments. Suppose $\lVert \cdot \rVert_1$ is weaker than $\lVert \cdot \rVert_2$ and $\lVert x_n - x \rVert_2 \to 0$ as $n \to \infty$; then we immediately have

$$\lVert x_n - x \rVert_1 \leq K \lVert x_n - x \rVert_2 < \varepsilon$$

for large enough $n$. Hence $\lVert x_n - x \rVert_1 \to 0$ as well. But what about the converse? We give a new relation between norms.

**(Definition)** Two norms $\lVert \cdot \rVert_1$ and $\lVert \cdot \rVert_2$ on a vector space are **compatible** if, given that $\lVert x_n - x \rVert_1 \to 0$ and $\lVert x_n - y \rVert_2 \to 0$ as $n \to \infty$, we have $x=y$.

By the uniqueness of limits, we see that if two norms are equivalent, then they are compatible. And surprisingly, with the help of the closed graph theorem discussed in this post, we have

**(Theorem 1)** If $\lVert \cdot \rVert_1$ and $\lVert \cdot \rVert_2$ are compatible, and both $(X,\lVert\cdot\rVert_1)$ and $(X,\lVert\cdot\rVert_2)$ are Banach, then $\lVert\cdot\rVert_1$ and $\lVert\cdot\rVert_2$ are equivalent.

This result looks natural but is not easy to prove, since one finds no direct way to build a bridge between limits and a general inequality. But before that, we need to introduce some terminology.

**(Definition)** For $f:X \to Y$, the **graph** of $f$ is defined by

$$G(f) = \{(x,f(x)): x \in X\} \subset X \times Y.$$
If both $X$ and $Y$ are topological spaces, and the topology of $X \times Y$ is the usual one, that is, the smallest topology that contains all sets $U \times V$ where $U$ and $V$ are open in $X$ and $Y$ respectively, and if $f: X \to Y$ is continuous, it is natural to expect $G(f)$ to be closed. For example, by taking $f(x)=x$ and $X=Y=\mathbb{R}$, one would expect the diagonal line of the plane to be closed.

**(Definition)** The topological vector space $(X,\tau)$ is an **$F$-space** if $\tau$ is induced by a complete invariant metric $d$. Here invariant means that $d(x+z,y+z)=d(x,y)$ for all $x,y,z \in X$.

A Banach space is easily verified to be an $F$-space by defining $d(x,y)=\lVert x-y \rVert$.

**(Open mapping theorem)** See this post.

By the definition of a closed set, we have a practical criterion for whether $G(f)$ is closed.

**(Proposition 1)** $G(f)$ is closed if and only if, for any sequence $(x_n)$ such that the limits

$$x = \lim_{n \to \infty} x_n, \quad y = \lim_{n \to \infty} f(x_n)$$

exist, we have $y=f(x)$.

In this case, we say $f$ is closed. For continuous functions, things are trivial.

**(Proposition 2)** If $X$ and $Y$ are two topological spaces, $Y$ is Hausdorff, and $f:X \to Y$ is continuous, then $G(f)$ is closed.

*Proof.* Let $G^c$ be the complement of $G(f)$ in $X \times Y$. Fix $(x_0,y_0) \in G^c$; then $y_0 \neq f(x_0)$. By the Hausdorff property of $Y$, there exist open subsets $U \subset Y$ and $V \subset Y$ such that $y_0 \in U$, $f(x_0) \in V$ and $U \cap V = \varnothing$. Since $f$ is continuous, $W=f^{-1}(V)$ is open in $X$. We have obtained an open neighborhood $W \times U$ of $(x_0,y_0)$ with empty intersection with $G(f)$. That is, every point of $G^c$ has an open neighborhood contained in $G^c$, hence is an interior point. Therefore $G^c$ is open, which is to say that $G(f)$ is closed. $\square$

**REMARKS.** For $X \times Y=\mathbb{R} \times \mathbb{R}$, we have a simple visualization. For $\varepsilon>0$, there exists some $\delta$ such that $|f(x)-f(x_0)|<\varepsilon$ whenever $|x-x_0|<\delta$. For $y_0 \neq f(x_0)$, pick $\varepsilon$ such that $0<\varepsilon<\frac{1}{2}|f(x_0)-y_0|$; we then have two boxes, namely

$$B_1=(x_0-\delta,x_0+\delta) \times (f(x_0)-\varepsilon,f(x_0)+\varepsilon)$$

and

$$B_2=(x_0-\delta,x_0+\delta) \times (y_0-\varepsilon,y_0+\varepsilon).$$
In this case, $B_2$ will not intersect the graph of $f$, hence $(x_0,y_0)$ is an interior point of $G^c$.

The Hausdorff property of $Y$ is not removable. To see this, since $X$ is unrestricted, it suffices to take a look at $X \times X$. Let $f$ be the identity map (which is continuous); then the graph

$$G(f) = \{(x,x): x \in X\}$$

is the diagonal. Suppose $X$ is not Hausdorff; we reach a contradiction. By definition, there exist distinct $x$ and $y$ such that *every* neighborhood of $x$ contains $y$. Pick $(x,y) \in G^c$; then every neighborhood of $(x,y) \in X \times X$ contains a point $(y,y) \in G(f)$, so $(x,y)$ is *not* an interior point of $G^c$, hence $G^c$ is not open.

Also, as an immediate consequence, every affine algebraic variety in $\mathbb{C}^n$ or $\mathbb{R}^n$ is closed with respect to the Euclidean topology. Further, we obtain the Zariski topology $\mathcal{Z}$ by declaring that if $V$ is an affine algebraic variety, then $V^c \in \mathcal{Z}$. It is worth noting that $\mathcal{Z}$ is *not* Hausdorff (any two nonempty Zariski-open subsets of $\mathbb{C}^n$ intersect) and is in fact much coarser than the Euclidean topology, although an affine algebraic variety is closed in both the Zariski topology and the Euclidean topology.

Once we have proved the closed graph theorem, we will be able to prove the theorem about compatible norms. We shall assume that both $X$ and $Y$ are $F$-spaces, since the norm plays no critical role here; this offers greater generality and should not be considered an abuse of abstraction.

**(The Closed Graph Theorem)** Suppose

(a) $X$ and $Y$ are $F$-spaces,

(b) $f:X \to Y$ is linear,

(c) $G(f)$ is closed in $X \times Y$.

Then $f$ is continuous.

In short, the closed graph theorem gives a sufficient condition to claim the continuity of $f$ (keep in mind, linearity does not imply continuity). If $f:X \to Y$ is continuous, then $G(f)$ is closed; if $G(f)$ is closed and $f$ is linear, then $f$ is continuous.

*Proof.* First of all we should make $X \times Y$ an $F$-space by assigning addition, scalar multiplication and a metric. Addition and scalar multiplication are defined componentwise in the nature of things:

$$(x_1,y_1)+(x_2,y_2)=(x_1+x_2,y_1+y_2), \quad \alpha(x,y)=(\alpha x,\alpha y).$$

The metric can be defined without extra effort:

$$d\big((x_1,y_1),(x_2,y_2)\big)=d_X(x_1,x_2)+d_Y(y_1,y_2).$$

It can then be verified that $X \times Y$ is a topological vector space with a complete translation-invariant metric, hence an $F$-space. (The verifications may be added in the future, but it is recommended to do them yourself.)

Since $f$ is linear, the graph $G(f)$ is a subspace of $X \times Y$. Next we quote an elementary result from point-set topology: a subset of a complete metric space is closed if and only if it is complete. By this and the translation-invariance of $d$, we see that $G(f)$ is an $F$-space as well. Let $p_1: X \times Y \to X$ and $p_2: X \times Y \to Y$ be the natural projections (for example, $p_1(x,y)=x$). Our proof is done by verifying the relevant properties of $p_1$ and $p_2$ on $G(f)$.

*For simplicity one could define $p_1$ on $G(f)$ instead of the whole space $X \times Y$, but we make it a global projection on purpose to emphasize the difference between global and local properties. One can also write $p_1|_{G(f)}$ to avoid confusion.*

**Claim 1.** $p_1$ (with restriction on $G(f)$) defines an isomorphism between $G(f)$ and $X$.

For $x \in X$, we have $p_1(x,f(x)) = x$ (surjectivity). If $p_1(x,f(x))=0$, then $x=0$ and therefore $(x,f(x))=(0,0)$; hence the restriction of $p_1$ to $G(f)$ has trivial kernel (injectivity). Further, $p_1$ is clearly linear.

**Claim 2.** $p_1$ is continuous on $G(f)$.

Convergence in $X \times Y$ under the metric $d$ implies convergence of each component, so if $(x_n,f(x_n)) \to (x,f(x))$ in $G(f)$, then $p_1(x_n,f(x_n)) = x_n \to x = p_1(x,f(x))$. The continuity of $p_1$ is proved.

**Claim 3.** $p_1$ is a homeomorphism with restriction on $G(f)$.

We already know that $G(f)$ is an $F$-space, and so is $X$. The restriction $p_1|_{G(f)}$ is a continuous linear bijection between the $F$-spaces $G(f)$ and $X$, so by the open mapping theorem it is an open mapping, hence a homeomorphism.

**Claim 4.** $p_2$ is continuous.

This follows in the same way as the proof of Claim 2, but is even easier since we need not worry about $f$.

Now everything is immediate once one realizes that $f=p_2 \circ (p_1|_{G(f)})^{-1}$, hence $f$ is continuous. $\square$

Before we go back to Theorem 1, we record an application to Hilbert spaces.

Let $T$ be a bounded operator on the Hilbert space $L^2([0,1])$ such that if $\phi \in L^2([0,1])$ is a continuous function, so is $T\phi$. Then the restriction of $T$ to $C([0,1])$ is a bounded operator on $C([0,1])$.

For details please check this.

Now we go for the identification of the norms. Define

$$f: (X,\lVert\cdot\rVert_1) \to (X,\lVert\cdot\rVert_2), \quad x \mapsto x,$$

i.e. the identity map between two Banach spaces (hence $F$-spaces). Then $f$ is linear. We need to prove that $G(f)$ is closed. So suppose the sequence $(x_n)$ satisfies

$$\lVert x_n - x \rVert_1 \to 0, \quad \lVert x_n - y \rVert_2 \to 0;$$

since the two norms are compatible, we have

$$x = y = f(x).$$

Hence $G(f)$ is closed. Therefore $f$ is continuous, hence bounded, and we have some $K$ such that

$$\lVert x \rVert_2 = \lVert f(x) \rVert_2 \leq K \lVert x \rVert_1 \quad \text{for all } x \in X.$$

By defining

$$g = f^{-1}: (X,\lVert\cdot\rVert_2) \to (X,\lVert\cdot\rVert_1), \quad x \mapsto x,$$

we see $g$ is continuous as well, hence we have some $K'$ such that

$$\lVert x \rVert_1 \leq K' \lVert x \rVert_2 \quad \text{for all } x \in X.$$

Hence the two norms are weaker than each other, i.e. equivalent. $\square$

Since there is no strong reason to write more posts on this topic, i.e. the three fundamental theorems of linear functional analysis, I think it’s time to make a list of the series. It’s been around half a year.

- The Big Three Pt. 1 - Baire Category Theorem Explained
- The Big Three Pt. 2 - The Banach-Steinhaus Theorem
- The Big Three Pt. 3 - The Open Mapping Theorem (Banach Space)
- The Big Three Pt. 4 - The Open Mapping Theorem (F-Space)
- The Big Three Pt. 5 - The Hahn-Banach Theorem (Dominated Extension)
- The Big Three Pt. 6 - Closed Graph Theorem with Applications

- Walter Rudin, *Functional Analysis*
- Peter Lax, *Functional Analysis*
- Jesús Gil de Lamadrid, *Some Simple Applications of the Closed Graph Theorem*

Partition of unity builds a bridge between local properties and global properties. A nice example is Stokes' theorem on manifolds.

Suppose $\omega$ is an $(n-1)$-form with compact support on an oriented manifold $M$ of dimension $n$, and $\partial{M}$ is given the induced orientation; then

$$\int_M d\omega=\int_{\partial M}\omega.$$
This theorem can be proved in two steps. First, by Fubini's theorem, one proves the identity on $\mathbb{R}^n$ and $\mathbb{H}^n$. Second, for the general case, let $(U_\alpha)$ be an oriented atlas for $M$ and $(\rho_\alpha)$ a partition of unity subordinate to $(U_\alpha)$; one naturally writes $\omega=\sum_{\alpha}\rho_\alpha\omega$. Since $\int_M d\omega=\int_{\partial M}\omega$ is linear with respect to $\omega$, it suffices to prove it for each $\rho_\alpha\omega$. Note that the support of $\rho_\alpha\omega$ is contained in the intersection of the supports of $\rho_\alpha$ and $\omega$, hence is compact.

On the other hand, $U_\alpha$ is diffeomorphic to either $\mathbb{R}^n$ or $\mathbb{H}^n$, so it is immediate that

$$\int_{U_\alpha}d(\rho_\alpha\omega)=\int_{\partial U_\alpha}\rho_\alpha\omega,$$

which furnishes the proof for the general case.

As we have seen, to prove a global statement, we work locally. If you have trouble with these terminologies, never mind: we will go through them right now (in a more abstract way, however). If you are familiar with them, feel free to skip ahead.

Throughout, we use bold letters like $\mathbf{E}$, $\mathbf{F}$ to denote Banach spaces. We will treat Euclidean spaces as a special case rather than restricting ourselves to them. Since Banach spaces are not necessarily finite-dimensional, our approach can be troublesome at times, but the benefit is a better view of the abstraction.

Let $X$ be a set. An **atlas of class $C^p$** ($p \geq 0$) on $X$ is a collection of pairs $(U_i,\varphi_i)$, where $i$ ranges through some indexing set, satisfying the following conditions:

**AT 1.** Each $U_i$ is a subset of $X$ and $\bigcup_{i}U_i=X$.

**AT 2.** Each $\varphi_i$ is a bijection of $U_i$ onto an open subset $\varphi_iU_i$ of some Banach space $\mathbf{E}_i$, and for any $i$ and $j$, $\varphi_i(U_i \cap U_j)$ is open in $\mathbf{E}_i$.

**AT 3.** The map

$$\varphi_j \circ \varphi_i^{-1}: \varphi_i(U_i \cap U_j) \to \varphi_j(U_i \cap U_j)$$

is a $C^p$-isomorphism for all $i$ and $j$.

One should be advised that 'isomorphism' here does not come from group theory but from category theory. Precisely speaking, it is an isomorphism in the category $\mathfrak{O}$ whose objects are open subsets of Banach spaces and whose morphisms are the continuous maps of class $C^p$.

Also, the sets $(U_i)_i$ generate a topology $\tau_X$ on $X$ under which the $\varphi_i$ are topological isomorphisms onto their images. Note that we see no need to assume that $X$ is Hausdorff unless we start with Hausdorff spaces. Lifting this restriction gives us more freedom (though sometimes also more difficulty, to some extent).

For condition **AT 2**, we did not require that the vector spaces be the same for all indexes $i$, or even that they be toplinearly isomorphic. If they are all equal to the same space $\mathbf{E}$, then we say that the atlas is an $\mathbf{E}$-atlas.

Suppose that we are given an open subset $U$ of $X$ and a topological isomorphism $\varphi:U \to U'$ onto an open subset of some Banach space $\mathbf{E}$. We shall say that $(U,\varphi)$ is **compatible** with the atlas $(U_i,\varphi_i)_i$ if each map $\varphi_i\circ\varphi^{-1}$ (defined on a suitable intersection) is a $C^p$-isomorphism. Two atlases are said to be **compatible** if each chart of one is compatible with the other atlas. It can be verified that this is an equivalence relation. *An equivalence class of atlases of class $C^p$ on $X$ is said to define a structure of $C^p$-manifold on $X$.* If all the vector spaces $\mathbf{E}_i$ in some atlas are toplinearly isomorphic, we can find some universal $\mathbf{E}$ equal to all of them. In this case, we say $X$ is an $\mathbf{E}$-manifold, or that $X$ is **modeled on** $\mathbf{E}$.

As we know, $\mathbb{R}^n$ is a Banach space. If $\mathbf{E}=\mathbb{R}^n$ for some fixed $n$, then we say that the manifold is $n$-dimensional, and we have **local coordinates**. A chart

$$\varphi: U \to \mathbb{R}^n$$

is given by $n$ coordinate functions $\varphi_1,\cdots,\varphi_n$. If $P$ denotes a point of $U$, these functions are often written

$$x_1(P),\cdots,x_n(P),$$

or simply $x_1,\cdots,x_n$.

Let $X$ be a topological space. A covering $\mathfrak{U}$ of $X$ is **locally finite** if every point $x$ has a neighborhood $U$ such that all but a finite number of members of $\mathfrak{U}$ do not intersect $U$ (as you will see, this prevents nonsensical summations). A **refinement** of a covering $\mathfrak{U}$ is a covering $\mathfrak{U}'$ such that for any $U' \in \mathfrak{U}'$ there exists some $U \in \mathfrak{U}$ with $U' \subset U$. If we write $\mathfrak{U} \leq \mathfrak{U}'$ in this case, we see that the set of open covers of a topological space forms a *directed set*.

A topological space is **paracompact** if it is Hausdorff, and every open covering has a locally finite open refinement. Here follows some examples of paracompact spaces.

- Any compact Hausdorff space.
- Any CW complex.
- Any metric space (hence $\mathbb{R}^n$).
- Any Hausdorff Lindelöf space.
- Any Hausdorff $\sigma$-compact space.

These are not too difficult to prove, and one can easily find proofs on the Internet. Below are several key properties of paracompact spaces.

If $X$ is paracompact, then $X$ is normal. (Proof here)

Let $X$ be a paracompact (hence normal) space and $\mathfrak{U}=(U_i)$ a locally finite open cover; then there exists a locally finite open covering $\mathfrak{V}=(V_i)$ such that $\overline{V_i} \subset U_i$. (Proof here. Note the axiom of choice is assumed.)

One can find proofs of the following propositions on *Elements of Mathematics, General Topology, Chapter 1-4* by N. Bourbaki. It’s interesting to compare them to the corresponding ones of compact spaces.

Every closed subspace $F$ of a paracompact space $X$ is paracompact.

The product of a paracompact space and a compact space is paracompact.

Let $X$ be a locally compact paracompact space. Then every open covering $\mathfrak{R}$ of $X$ has a locally finite open refinement $\mathfrak{R}’$ formed of relatively compact sets. If $X$ is $\sigma$-compact then $\mathfrak{R}’$ can be taken to be countable.

A **partition of unity** (of class $C^p$) on a manifold $X$ consists of an open covering $(U_i)$ of $X$ and a family of functions

$$\psi_i: X \to \mathbb{R}$$

satisfying the following conditions:

**PU 1.** For all $x \in X$ we have $\psi_i(x) \geq 0$.

**PU 2.** The support of $\psi_i$ is contained in $U_i$.

**PU 3.** The covering is locally finite.

**PU 4.** For each point $x \in X$ we have

$$\sum_{i}\psi_i(x)=1.$$

The sum in **PU 4** makes sense because, for a given point $x$, there are only finitely many $i$ such that $\psi_i(x) >0$, according to **PU 3**.

A manifold $X$ will be said to **admit partitions of unity** if it is paracompact and if, given a locally finite open covering $(U_i)$, there exists a partition of unity $(\psi_i)$ such that the support of $\psi_i$ is contained in $U_i$.
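To make **PU 1**–**PU 4** concrete, here is a $C^0$ partition of unity on $\mathbb{R}$ subordinate to the locally finite cover $U_k=(k-1,k+1)$, $k \in \mathbb{Z}$: the hat functions $\psi_k(x)=\max(0,\,1-|x-k|)$ are nonnegative, supported in $U_k$, and sum to $1$ (a standard example, not one taken from the text above):

```python
def psi(k, x):
    """Hat function centered at the integer k: nonnegative (PU 1),
    supported in U_k = (k-1, k+1) (PU 2)."""
    return max(0.0, 1.0 - abs(x - k))

# At any x, at most two hats are nonzero (the cover is locally
# finite, PU 3) and they sum to exactly 1 (PU 4).
for x in [0.0, 0.3, 1.7, -2.5, 4.999]:
    total = sum(psi(k, x) for k in range(-10, 11))
    assert abs(total - 1.0) < 1e-12
print("hat functions form a partition of unity")
```

A smooth ($C^\infty$) analogue can be built from the bump function constructed below, at the cost of normalizing by the (locally finite) sum of the bumps.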

The following function will be useful when dealing with the finite-dimensional case.

For every integer $n$ and every real number $\delta>0$ there exist maps $\psi_n \in C^{\infty}(\mathbb{R}^n;\mathbb{R})$ which equal $1$ on $\overline{B(0,1)}$ and vanish on $\mathbb{R}^n\setminus B(0,1+\delta)$.

*Proof.* It suffices to prove this for $\mathbb{R}$, since once we have the existence of $\psi_1$ we may write

$$\psi_n(\mathbf{x})=\psi_1(|\mathbf{x}|).$$

Consider the function $\phi: \mathbb{R} \to \mathbb{R}$ defined by

$$\phi(t)=\begin{cases}\exp\left(\dfrac{1}{(t-a)(t-b)}\right) & a<t<b, \\ 0 & \text{otherwise},\end{cases}$$

where $a<b$. The reader may have seen it in an analysis course and should be able to check that $\phi \in C^{\infty}(\mathbb{R};\mathbb{R})$. Integrating $\phi$ from $-\infty$ to $x$ and dividing by $\lVert \phi \rVert_1$ (as one does in probability theory), we obtain

$$\theta(x)=\frac{1}{\lVert\phi\rVert_1}\int_{-\infty}^x \phi(t)\,dt;$$

it is immediate that $\theta(x)=0$ for $x \leq a$ and $\theta(x)=1$ for $x \geq b$. Taking $a=1$ and $b=(1+\delta)^2$, our job is done by letting $\psi_1(x)=1-\theta(x^2)$. Considering that $x^2=|x|^2$, one sees that the identity relating $\psi_n$ and $\psi_1$ is in fact redundant. $\square$
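The construction in the proof can be carried out numerically. The sketch below builds $\phi$, $\theta$ (via a crude midpoint-rule integral) and $\psi_1$ with $a=1$, $b=(1+\delta)^2$, and checks the plateau and the vanishing region:

```python
import math

def make_psi1(delta=0.5, steps=4000):
    a, b = 1.0, (1.0 + delta) ** 2

    def phi(t):
        # smooth bump on (a, b); the exponent tends to -inf at both ends
        return math.exp(1.0 / ((t - a) * (t - b))) if a < t < b else 0.0

    h = (b - a) / steps
    total = sum(phi(a + (i + 0.5) * h) for i in range(steps)) * h  # ||phi||_1

    def theta(x):
        # integrate phi from a up to min(max(x, a), b), normalized
        upper = min(max(x, a), b)
        m = int(steps * (upper - a) / (b - a))
        return sum(phi(a + (i + 0.5) * h) for i in range(m)) * h / total

    return lambda x: 1.0 - theta(x * x)

psi1 = make_psi1(delta=0.5)
print(psi1(0.0), psi1(1.0))    # 1.0 1.0 : equals 1 on the closed unit ball
print(psi1(1.6), psi1(5.0))    # 0.0 0.0 : vanishes outside B(0, 1.5)
print(0.0 < psi1(1.2) < 1.0)   # True    : strictly between on the collar
```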

In the following blog posts, we will generalize this to Hilbert spaces.

Of course this is desirable. But we will give an example showing that sometimes we cannot find a satisfying partition of unity.

Let $D$ be a connected bounded open set in $\ell^p$ where $p$ is not an even integer. Assume $f$ is a real-valued function, continuous on $\overline{D}$ and $n$-times differentiable in $D$ with $n \geq p$. Then $f(\overline{D}) \subset \overline{f(\partial D)}$.

**(Corollary)** Let $f$ be an $n$-times differentiable function on an $\ell^p$ space, where $n \geq p$ and $p$ is not an even integer. If $f$ has its support in a bounded set, then $f$ is identically zero.

It follows that for $n \geq p$, $C^n$ partitions of unity do not exist whenever $p$ is not an even integer. For example, $\ell^1$ does not have a $C^2$ partition of unity. It is then our duty to find under what conditions the desired partition of unity is available.

Below are two theorems about the existence of partitions of unity. We will not prove them here but in a future blog post, since that would be rather long. The restrictions on $X$ are acceptable: for example, $\mathbb{R}^n$ is locally compact, and hence so is any manifold modeled on $\mathbb{R}^n$.

Let $X$ be a manifold which is locally compact Hausdorff and whose topology has a countable base. Then $X$ admits partitions of unity.

Let $X$ be a paracompact manifold of class $C^p$, modeled on a separable Hilbert space $E$. Then $X$ admits partitions of unity (of class $C^p$).

- N. Bourbaki, *Elements of Mathematics*
- S. Lang, *Fundamentals of Differential Geometry*
- M. Berger, *Differential Geometry: Manifolds, Curves, and Surfaces*
- R. Bonic and J. Frampton, *Differentiable Functions on Certain Banach Spaces*

For the $\Gamma$ function, we have a classic limit formula (for a proof, see ProofWiki).

Using this formula, we can immediately compute some limits that are otherwise hard to evaluate. Note that if we write this formula in terms of natural numbers, we have

so we can immediately compute this limit:

But Stirling's formula gives far more than this. In this post we will see several classic estimates.

The result we will see in this section is

If you evaluate the right-hand side on a calculator, you will find that $\phi_n=\frac{n!}{(n/e)^n\sqrt{2\pi n}}$ stays near $1$.
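The claim that $\phi_n = n!/((n/e)^n\sqrt{2\pi n})$ stays near $1$ is easy to check numerically; a quick sketch in log-space (the classical refinement is $\phi_n \approx 1 + \frac{1}{12n}$):

```python
import math

def stirling_ratio(n):
    """phi_n = n! / ((n/e)^n * sqrt(2*pi*n)), computed in log-space
    to avoid overflow for moderately large n."""
    log_phi = (math.lgamma(n + 1)           # log n!
               - n * (math.log(n) - 1)      # log (n/e)^n
               - 0.5 * math.log(2 * math.pi * n))
    return math.exp(log_phi)

for n in [1, 5, 10, 100, 1000]:
    r = stirling_ratio(n)
    assert 1.0 < r < 1.0 + 1.0 / (11 * n)   # consistent with ~ 1 + 1/(12n)
print(stirling_ratio(10))  # about 1.0084
```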

For $m=1,2,3,\dots$, define a 'polyline function' below $y=\ln(x)$:

where $m \leq x \leq m+1$. Above it, define another 'polyline function':

where $m-1/2 \leq x < m+1/2$. If we plot $f$, $\ln{x}$ and $g$, we find that $f$ and $g$ both approximate $\ln{x}$, and for $x \geq 1$ we have

so when computing the definite integrals we get

But the relation between $f$ and $g$ is not that simple. Computing the integral of $f$, we find

while for $g$ we have

This shows that

Summarizing the inequalities above, we obtain, for $n>1$:

Subtracting $\int_1^n \ln x\, dx$ from each term of the inequality, we get

By Stirling's formula we know that

and the sequence $x_n=-\frac{1}{8n}+\ln(n!)-(\frac{1}{2}+n)\ln{n}+n$ is monotonically increasing, so by the formula above it converges to $\ln\sqrt{2\pi}$. On the left-hand side of the inequality we take the supremum $\ln\sqrt{2\pi}$; on the right-hand side we take the infimum $x_1+\frac{1}{8}=1$. This gives us

$$\ln\sqrt{2\pi} \leq \ln(n!)-\left(n+\frac{1}{2}\right)\ln{n}+n \leq 1,$$

which in turn yields

$$\sqrt{2\pi}\; n^{n+1/2}e^{-n} \leq n! \leq e\; n^{n+1/2}e^{-n},$$

valid for all $n =1,2,3,\dots$.
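The derivation above pins $\ln(n!)-(n+\frac{1}{2})\ln n+n$ between $\ln\sqrt{2\pi}$ and $1$, i.e. $\sqrt{2\pi}\,n^{n+1/2}e^{-n} \le n! \le e\,n^{n+1/2}e^{-n}$. A numerical sanity check (using `math.lgamma` to stay in log-space):

```python
import math

# Check ln(sqrt(2*pi)) <= ln(n!) - (n + 1/2) ln n + n <= 1 for small n;
# the quantity decreases from 1 (at n = 1) toward ln(sqrt(2*pi)).
lower = math.log(math.sqrt(2 * math.pi))
for n in range(1, 50):
    y = math.lgamma(n + 1) - (n + 0.5) * math.log(n) + n
    assert lower - 1e-12 <= y <= 1.0 + 1e-12
print("Stirling bounds hold for n = 1 .. 49")
```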

For any $c \in \mathbb{R}$, we have

This can be read as saying that if we shift $\Gamma(x)$ to the left by $c$, then for $x$ large enough its value is close to $x^c\Gamma(x)$. The proof of this identity is fairly simple, if computationally tedious: one only needs Stirling's formula.

Now the limits of the three factors are easy to compute. Clearly we have

and

Finally,

so the original limit is $1$. The computation is quite elegant. Note that if we replace $x$ and $c$ by a positive integer $n$ and an integer $k$, we get

Combining this with Bernoulli's inequality, we have

Next we give a finer estimate. In fact,

By the definition of the function $B(x,y)$,

Letting $t=u^2$, we get

Substituting $x=\frac{1}{2}$ and $y=n+1$, we are already close to the desired result:

Note that, using the second expression for the $B$ function, we can compute $\Gamma(\frac{1}{2})$. In fact,

so $\Gamma(\frac{1}{2})=\sqrt{\pi}$. For $B(\frac{1}{2},n+1)$ we can apply the shift formula above:

so

Finally we prove an identity that has nothing to do with Stirling's formula:

By the fundamental theorem of classical algebra, we immediately have

Note that, on the other hand,

When $x=1$ we have

that is,

Taking Euler's reflection formula into account, for $1 \leq k \leq n-1$ we have

If $n$ is odd, then by the result above we obtain

Here we have used only half of the values of $k$. To use the other half, we simply swap $k$ and $n-k$, obtaining

which is the desired result. If $n$ is even, we only need to take out the term $1/2$ separately and compute the remaining two halves.

We now present two limits that look hard to compute.

If we replace $n!$ directly using Stirling's formula, the value of this limit becomes clear.

So it only remains to find the limit of $(1+\frac{1}{n})^{n^2}e^{-n}$. But do not jump to the conclusion that this limit is $1$. Using the Taylor expansion, we get

so the original limit is $\sqrt{\frac{2\pi}{e}}$.
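We can check numerically that $(1+\frac{1}{n})^{n^2}e^{-n}$ tends to $e^{-1/2}$, not $1$; working in log-space avoids overflow:

```python
import math

def a(n):
    """(1 + 1/n)^(n^2) * e^(-n), evaluated as exp(n^2 log(1+1/n) - n)."""
    return math.exp(n * n * math.log1p(1.0 / n) - n)

# n^2 log(1 + 1/n) = n - 1/2 + 1/(3n) - ..., so a(n) -> e^{-1/2}.
for n in [10, 100, 1000, 10000]:
    print(n, a(n))
print(math.exp(-0.5))  # the limit, about 0.6065
```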

Note that multiplying the numerators of the $n$ terms gives $\exp(n-1-\frac{1}{2}-\cdots-\frac{1}{n})$, and the harmonic series diverges; to obtain convergence, we are naturally led to the Euler constant $\gamma=\lim_{n\to\infty}\left(1+\frac{1}{2}+\cdots+\frac{1}{n}-\ln{n}\right)$. There seems to be no way to simplify the denominator directly: we know that $(1+1/k)^k$ tends to $e$, but that does not seem to help here. So let us first expand and simplify the denominator.

So the original limit can be written as

Now Stirling's formula can be applied directly.

Since $\lim_{n\to\infty}\left(1+\frac{1}{n}\right)^{-n}=e^{-1}$ and $\lim_{n\to\infty}e^{\ln{n}-1-\frac{1}{2}-\frac{1}{3}-\cdots-\frac{1}{n}}=e^{-\gamma}$, the original limit is $\frac{\sqrt{2\pi}}{e^{1+\gamma}}$.


**(Gleason-Kahane-Żelazko)** If $\phi$ is a complex linear functional on a unitary Banach algebra $A$ such that $\phi(e)=1$ and $\phi(x) \neq 0$ for every invertible $x \in A$, then

$$\phi(xy)=\phi(x)\phi(y) \quad \text{for all } x,y \in A.$$

Namely, $\phi$ is a complex homomorphism.

Suppose $A$ is a complex unitary Banach algebra and $\phi: A \to \mathbb{C}$ is a linear functional which is not identically $0$ (for convenience), and suppose

$$\phi(xy)=\phi(x)\phi(y)$$

for all $x \in A$ and $y \in A$; then $\phi$ is called a *complex homomorphism* on $A$. Note that a unitary Banach algebra (with $e$ as multiplicative unit) is also a ring, and so is $\mathbb{C}$, so in this case we may say $\phi$ is a ring homomorphism. For such $\phi$, we have an immediate proposition:

**Proposition 0.** $\phi(e)=1$ and $\phi(x) \neq 0$ for every invertible $x \in A$.

*Proof.* Since $\phi(e)=\phi(ee)=\phi(e)\phi(e)$, we have $\phi(e)=0$ or $\phi(e)=1$. If $\phi(e)=0$ however, for any $y \in A$, we have $\phi(y)=\phi(ye)=\phi(y)\phi(e)=0$, which is an excluded case. Hence $\phi(e)=1$.

For invertible $x \in A$, note that $\phi(xx^{-1})=\phi(x)\phi(x^{-1})=\phi(e)=1$. This can’t happen if $\phi(x)=0$. $\square$
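For a concrete example (my illustration, not from the original text): on the Banach algebra $C([0,1])$ of continuous complex functions with the sup norm, point evaluation is a complex homomorphism, and it clearly satisfies Proposition 0, since an invertible element of $C([0,1])$ is a function with no zeros. A quick check:

```python
import cmath

# Model elements of C([0,1]) as Python callables; phi is evaluation at 1/2.
phi = lambda f: f(0.5)

f = lambda t: cmath.exp(1j * t) + t       # two sample elements
g = lambda t: t * t - 2j * t + 1
fg = lambda t: f(t) * g(t)                # pointwise product
e = lambda t: 1.0                         # multiplicative unit

assert phi(e) == 1.0                            # phi(e) = 1
assert abs(phi(fg) - phi(f) * phi(g)) < 1e-12   # phi is multiplicative
```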

The theorem reveals that Proposition $0$ actually characterizes the complex homomorphisms (ring-homomorphisms) among the linear functionals (group-homomorphisms).

This theorem was proved by Andrew M. Gleason in 1967 and independently by J.-P. Kahane and W. Żelazko in 1968. Those proofs dealt mainly with commutative Banach algebras; the non-commutative version, which focuses on complex homomorphisms, is due to W. Żelazko. In this post we will follow Żelazko's approach.

Unfortunately, one cannot find an educational proof on the Internet with ease, which may be the reason why I write this post and why you read this.

Following definitions of Banach algebra and some logic manipulation, we have several equivalences worth noting.

(Stated by Gleason) Let $M$ be a linear subspace of codimension one in a commutative Banach algebra $A$ having an identity. Suppose no element of $M$ is invertible; then $M$ is an ideal.

(Stated by Kahane and Żelazko) A subspace $X \subset A$ of codimension $1$ is a maximal ideal if and only if it consists of non-invertible elements.

(Stated by Kahane and Żelazko) Let $A$ be a commutative complex Banach algebra with unit element. Then a functional $f \in A^\ast$ is a multiplicative linear functional if and only if $f(x) \in \sigma(x)$ holds for all $x \in A$.

Here $\sigma(x)$ denotes the spectrum of $x$.

Clearly any maximal ideal contains no invertible element (if it contained one, it would contain $e$, and hence be the whole ring). So it suffices to show that a subspace of codimension $1$ consisting of non-invertible elements is a maximal ideal. Also note that every maximal ideal is the kernel of some complex homomorphism. For such a subspace $X \subset A$, since $e \notin X$, we may define $\phi$ so that $\phi(e)=1$ and $\phi(x) \in \sigma(x)$ for all $x \in A$. Note that $\phi(e)=1$ holds if and only if $\phi(x) \in \sigma(x)$. As we will show, $\phi$ has to be a complex homomorphism.

We leave the elementary proofs to the reader since the proofs of these lemmas are off topic.

Lemma 0. Suppose $A$ is a unitary Banach algebra and $x \in A$ with $\lVert x \rVert<1$; then $e-x$ is invertible.
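Lemma 0 is the usual Neumann-series argument: $(e-x)^{-1}=\sum_{k\geq 0}x^k$ whenever $\lVert x\rVert<1$. A numerical illustration in the Banach algebra of $2\times 2$ matrices (the choice of algebra and of matrix is mine, for concreteness):

```python
import numpy as np

x = np.array([[0.2, 0.3],
              [0.1, 0.25]])
assert np.linalg.norm(x, 2) < 1          # operator norm below 1

e = np.eye(2)
# Partial sum of the Neumann series e + x + x^2 + ...
neumann = sum(np.linalg.matrix_power(x, k) for k in range(60))
assert np.allclose(neumann, np.linalg.inv(e - x))
```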

Lemma 1. Suppose $f$ is an entire function of one complex variable, $f(0)=1$, $f'(0)=0$, and $0<|f(\lambda)| \leq e^{|\lambda|}$ for all complex $\lambda$; then $f(\lambda)=1$ for all $\lambda \in \mathbb{C}$.

Note that there is an entire function $g$ such that $f=\exp(g)$. It can be shown that $g=0$.

A mapping $\phi$ from one ring $R$ to another ring $R’$ is said to be a **Jordan homomorphism** from $R$ to $R’$ if

and

It’s of course clear that every homomorphism is Jordan. Note if $R’$ is not of characteristic $2$, the second identity is equivalent to

*To show the equivalence, one lets $b=a$ in the first case and puts $a+b$ in place of $a$ in the second case.*

Since in this case $R=A$ and $R’=\mathbb{C}$, the latter of which is commutative, we also write

As we will show, the $\phi$ in the theorem is a Jordan homomorphism.

We will follow an unusual approach. By repeatedly 'downgrading' the goal, one will see this algebraic problem transformed into a pure analysis problem.

To begin with, let $N$ be the kernel of $\phi$.

If $\phi$ is a complex homomorphism, it is immediate that $\phi$ is a Jordan homomorphism. Conversely, if $\phi$ is Jordan, we have

If $x\in N$, the right hand becomes $0$, and therefore

Consider the identity

Therefore

Since $x \in N$ and $yxy \in A$, we see $x(yxy)+(yxy)x \in N$. Therefore $\phi(xy-yx)=0$ and

if $x \in N$ and $y \in A$. Further we see

which implies that $N$ is an ideal. This may remind you of the classic diagram below (we will not use it though, since it concerns the additive structure):

For $x,y \in A$, we have $x \in \phi(x)e+N$ and $y \in \phi(y)e+N$. As a result, $xy \in \phi(x)\phi(y)e+N$, and therefore

Again, if $\phi$ is Jordan, we have $\phi(x^2)=\phi(x)^2$ for all $x \in A$. Conversely, if $\phi(a^2)=0$ for all $a \in N$, we may write $x$ by

where $a \in N$ for all $x \in A$. Therefore

which also shows that $\phi$ is Jordan.

Fix $a \in N$, assume $\lVert a \rVert = 1$ without loss of generality, and define

for all complex $\lambda$. If this function is constant (by lemma 1), we immediately have $f''(0)=\phi(a^2)=0$. Showing that $f$ is constant, however, is purely a complex analysis problem.

Note in the definition of $f$, we have

So we need the norm of $\phi$ to be finite, which will ensure that $f$ is entire. By *reductio ad absurdum*, if $\lVert e-a \rVert < 1$ for some $a \in N$, then by lemma 0, $e-(e-a)=a$ is invertible, which is impossible since $\phi(a)=0$. Hence $\lVert e-a \rVert \geq 1$ for all $a \in N$. On the other hand, for $\lambda \in \mathbb{C}$, we have the following inequality:

Therefore $\phi$ is *continuous* with norm $1$. The continuity of $\phi$ is not assumed at the beginning.

For $f$ we have some immediate facts. Since each coefficient in the series of $f$ has finite norm, $f$ is entire with $f'(0)=\phi(a)=0$. Also, since $\phi$ has norm $1$, we also have

All we need in the end is to show that $f(\lambda) \neq 0$ for all $\lambda \in \mathbb{C}$.

The series

converges since $\lVert a \rVert=1$. The continuity of $\phi$ shows now

Note

Hence $E(\lambda)$ *is* invertible for all $\lambda \in \mathbb{C}$, hence $f(\lambda)=\phi(E(\lambda)) \neq 0$. By lemma 1, $f$ is constant with $f(\lambda)=1$. The proof is completed by reversing the steps. $\square$

- Walter Rudin, *Real and Complex Analysis*
- Walter Rudin, *Functional Analysis*
- Andrew M. Gleason, *A Characterization of Maximal Ideals*
- J.-P. Kahane and W. Żelazko, *A Characterization of Maximal Ideals in Commutative Banach Algebras*
- W. Żelazko, *A Characterization of Multiplicative Linear Functionals in Complex Banach Algebras*
- I. N. Herstein, *Jordan Homomorphisms*

The Hahn-Banach theorem is a central tool of functional analysis and has many variants, a number of which have found uses in other fields of mathematics; it is therefore not possible to cover all of them. In this post we cover two 'abstract enough' results, sometimes called the dominated extension theorems. Both will be discussed in real vector spaces with no topology endowed, so the results apply to any topological vector space.

Another interesting point: we will be using the axiom of choice, or whichever equivalent you like, for example Zorn's lemma or the well-ordering principle. Before everything else, we need to examine more properties of vector spaces.

It’s obvious that every complex vector space is also a real vector space. Suppose $X$ is a complex vector space, and we shall give the definition of real-linear and complex-linear functionals.

An additive functional $\Lambda$ on $X$ is called

*real-linear* (*complex-linear*) if $\Lambda(\alpha x)=\alpha\Lambda(x)$ for every $x \in X$ and every real (complex) scalar $\alpha$.

For *-linear functionals, we have two important but easy theorems.

If $u$ is the real part of a complex-linear functional $f$ on $X$, then $u$ is real-linear and

*Proof.* For complex $f(x)=u(x)+iv(x)$, it suffices to express $v(x)$ correctly. But

we see $\Im(f(x))=v(x)=-\Re(if(x))$. Therefore

and since $\Re(f(ix))=u(ix)$, we get

To show that $u(x)$ is real-linear, note that

Therefore $u(x)+u(y)=u(x+y)$. A similar argument applies to real scalars $\alpha$. $\square$

Conversely, we are able to generate a complex-linear functional by a real one.

If $u$ is a real-linear functional, then $f(x)=u(x)-iu(ix)$ is a complex-linear functional.

*Proof.* Direct computation. $\square$

Suppose now $X$ is a complex topological vector space. We see that a complex-linear functional on $X$ is continuous if and only if its real part is continuous, and every continuous real-linear $u: X \to \mathbb{R}$ is the real part of a unique complex-linear continuous functional $f$.

A sublinear functional is 'almost' linear and also 'almost' a norm. Explicitly, we say $p: X \to \mathbb{R}$ is a sublinear functional if it satisfies

for all $t \geq 0$. As one can see, if $X$ is normable, then $p(x)=\lVert x \rVert$ is a sublinear functional. One should not confuse this with a semilinear functional, where no inequality is involved. Another thing worth noting is that $p$ is not required to be nonnegative.

A seminorm on a vector space $X$ is a real-valued function $p$ on $X$ such that

for all $x,y \in X$ and scalar $\alpha$.

Obviously a seminorm is also a sublinear functional. For the connection between norms and seminorms, one shall note that *$p$ is a norm if and only if $p(x) \neq 0$ whenever $x \neq 0$.*
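For instance (my toy example), $p(x_1,x_2)=|x_1|$ on $\mathbb{R}^2$ is a seminorm, but it is not a norm because it vanishes on the nonzero vector $(0,1)$:

```python
import random

p = lambda x: abs(x[0])        # seminorm on R^2

for _ in range(1000):
    x = (random.uniform(-5, 5), random.uniform(-5, 5))
    y = (random.uniform(-5, 5), random.uniform(-5, 5))
    a = random.uniform(-5, 5)
    # subadditivity and absolute homogeneity
    assert p((x[0] + y[0], x[1] + y[1])) <= p(x) + p(y) + 1e-12
    assert abs(p((a * x[0], a * x[1])) - abs(a) * p(x)) < 1e-9

assert p((0.0, 1.0)) == 0.0    # a nonzero vector with p = 0, so p is not a norm
```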

The following are the results that will be covered in this post. Generally speaking, we are able to extend a functional defined on a subspace to the whole space as long as it is dominated by a sublinear functional. This is reminiscent of the dominated convergence theorem, which states that if a convergent sequence of measurable functions is dominated by an integrable function, then the convergence carries over to the integrals.

(Hahn-Banach)Suppose

- $M$ is a subspace of a real vector space $X$,
- $f: M \to \mathbb{R}$ is linear and $f(x) \leq p(x)$ on $M$ where $p$ is a sublinear functional on $X$
Then there exists a linear $\Lambda: X \to \mathbb{R}$ such that

for all $x \in M$ and

for all $x \in X$.

With that being said: if $f$ is dominated by a sublinear functional, then we are able to extend it to the whole space with a properly controlled range.

*Proof.* If $M=X$ we have nothing to do. So suppose now $M$ is a nontrivial proper subspace of $X$. Choose $x_1 \in X-M$ and define

It's easy to verify that $M_1$ satisfies all the axioms of a vector space (warning again: no topology is endowed). Now we will be using the properties of sublinear functionals.

Since

for all $x,y \in M$, we have

Let

By definition, we naturally get

and

Define $f_1$ on $M_1$ by

Note that when $x+tx_1 \in M$ we have $t=0$, and therefore $f_1=f$ on $M$.

To show that $f_1 \leq p$ on $M_1$, note that for $t>0$, we have

which implies

Similarly,

and therefore

Hence $f_1 \leq p$.

It seems that we could use step 1 to extend $M$ to a larger space again and again without stopping, yet the extension has to reach $X$ eventually. (If $X$ is a finite-dimensional space, this is merely a linear algebra problem.) This meets exactly what William Timothy Gowers said in his blog post:

If you are building a mathematical object in stages and find that (i) you have not finished even after infinitely many stages, and (ii) there seems to be nothing to stop you continuing to build, then Zorn’s lemma may well be able to help you.

— How to use Zorn’s lemma

And we will show that, as W. T. Gowers said,

If the resulting partial order satisfies the chain condition and if a maximal element must be a structure of the kind one is trying to build, then the proof is complete.

To apply Zorn’s lemma, we need to construct a partially ordered set. Let $\mathscr{P}$ be the collection of all ordered pairs $(M’,f’)$ where $M’$ is a subspace of $X$ containing $M$ and $f’$ is a linear functional on $M’$ that extends $f$ and satisfies $f’ \leq p$ on $M’$. For example we have

The partial order $\leq$ is defined as follows. By $(M’,f’) \leq (M’’,f’’)$, we mean $M’ \subset M’’$ and $f’ = f’’$ on $M’$. Obviously this is a partial order (you should be able to check this).

Suppose now $\mathcal{F}$ is a chain (totally ordered subset of $\mathscr{P}$). We claim that $\mathcal{F}$ has an upper bound (which is required by Zorn’s lemma). Let

and

whenever $(M’,f’) \in \mathcal{F}$ and $y \in M’$. It’s easy to verify that $(M_0,f_0)$ is the upper bound we are looking for. But $\mathcal{F}$ is arbitrary, therefore by Zorn’s lemma, there exists a maximal element $(M^\ast,f^\ast)$ in $\mathscr{P}$. If $M^* \neq X$, according to step 1, we are able to extend $M^\ast$, which contradicts the maximality of $M^\ast$. And $\Lambda$ is defined to be $f^\ast$. By the linearity of $\Lambda$, we see

The theorem is proved. $\square$

This is a classic application of Zorn’s lemma (well-ordering principle, or Hausdorff maximality theorem). First, we showed that we are able to extend $M$ and $f$. But since we do not know the dimension or other properties of $X$, it’s not easy to control the extension which finally ‘converges’ to $(X,\Lambda)$. However, Zorn’s lemma saved us from this random exploration: Whatever happens, the maximal element is there, and take it to finish the proof.
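As a finite-dimensional sanity check (my own toy example; here $X=\mathbb{R}^2$, so no Zorn's lemma is needed): take $p(x)=\lVert x\rVert_1$, $M=\{t(1,1): t \in \mathbb{R}\}$ and $f(t(1,1))=2t$, so that $f \leq p$ on $M$; then $\Lambda(x_1,x_2)=x_1+x_2$ is an extension of $f$ dominated by $p$ on all of $\mathbb{R}^2$:

```python
import random

p = lambda x: abs(x[0]) + abs(x[1])       # sublinear functional (a norm, in fact)
f = lambda t: 2 * t                       # on M = {t(1,1)}; f <= p since p(t,t) = 2|t|
L = lambda x: x[0] + x[1]                 # the extension Lambda

for _ in range(1000):
    t = random.uniform(-10, 10)
    assert L((t, t)) == f(t)              # Lambda restricted to M equals f
    x = (random.uniform(-10, 10), random.uniform(-10, 10))
    assert L(x) <= p(x) + 1e-12           # Lambda <= p on all of X
```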

Since an absolute value appears in the next theorem, rather than a one-sided inequality, we need a more careful validation.

(Bohnenblust-Sobczyk-Soukhomlinoff) Suppose $M$ is a subspace of a vector space $X$, $p$ is a seminorm on $X$, and $f$ is a linear functional on $M$ such that $|f(x)| \leq p(x)$ for all $x \in M$. Then $f$ extends to a linear functional $\Lambda$ on $X$ satisfying

for all $x \in X$.

*Proof.* If the scalar field is $\mathbb{R}$, then we are done, since $p(-x)=p(x)$ in this case (can you see why?). So we assume the scalar field is $\mathbb{C}$.

Put $u = \Re f$. By the dominated extension theorem, there is some real-linear functional $U$ such that $U=u$ on $M$ and $U \leq p$ on $X$. And here we have

where $\Lambda(x)=f(x)$ on $M$.

To show that $|\Lambda(x)| \leq p(x)$ for $x \neq 0$, by taking $\alpha=\frac{|\Lambda(x)|}{\Lambda(x)}$, we have

since $|\alpha|=1$ and $p(\alpha{x})=|\alpha|p(x)=p(x)$. $\square$

To end this post, we state a beautiful and useful extension of the Hahn-Banach theorem, which is done by R. P. Agnew and A. P. Morse.

(Agnew-Morse) Let $X$ denote a real vector space and $\mathcal{A}$ a collection of commuting linear maps $A_\alpha: X \to X$, namely $A_\alpha A_\beta=A_\beta A_\alpha$ for all $A_\alpha,A_\beta \in \mathcal{A}$. Let $p$ be a sublinear functional such that

for all $A_\alpha \in \mathcal{A}$. Let $Y$ be a subspace of $X$ on which a linear functional $f$ is defined such that

- $f(y) \leq p(y)$ for all $y \in Y$.
- For each mapping $A$ and $y \in Y$, we have $Ay \in Y$.
- Under the hypothesis of 2, we have $f(Ay)=f(y)$.
Then $f$ can be extended to $X$ by $\Lambda$ so that $-p(-x) \leq \Lambda(x) \leq p(x)$ for all $x \in X$, and

To prove this theorem, we need to construct a sublinear functional that dominates $f$. For the whole proof, see *Functional Analysis* by Peter Lax.

Since there is no strong reason to write more posts on this topic, namely the three fundamental theorems of linear functional analysis, I think it's time to make a list of the series. It's been around half a year.

- The Big Three Pt. 1 - Baire Category Theorem Explained
- The Big Three Pt. 2 - The Banach-Steinhaus Theorem
- The Big Three Pt. 3 - The Open Mapping Theorem (Banach Space)
- The Big Three Pt. 4 - The Open Mapping Theorem (F-Space)
- The Big Three Pt. 5 - The Hahn-Banach Theorem (Dominated Extension)
- The Big Three Pt. 6 - Closed Graph Theorem with Applications

- Walter Rudin, *Functional Analysis*
- Peter Lax, *Functional Analysis*
- William Timothy Gowers, *How to use Zorn's lemma*

*(This section is intended to introduce the background. Feel free to skip if you already know exterior differentiation.)*

There are several useful tools for vector calculus on $\mathbb{R}^3$, namely gradient, curl, and divergence. It is possible to treat the gradient of a differentiable function $f$ on $\mathbb{R}^3$ at a point $x_0$ as the Fréchet derivative at $x_0$, but this view does not work for curl and divergence at all. Fortunately there is another abstraction that works for all of them; it comes from differential forms.

Let $x_1,\cdots,x_n$ be the linear coordinates on $\mathbb{R}^n$ as usual. We define an *algebra* $\Omega^{\ast}$ over $\mathbb{R}$ generated by $dx_1,\cdots,dx_n$ with the following relations:

This is a vector space as well, and it’s easy to derive that it has a basis by

where $i<j<k$. The $C^{\infty}$ differential *forms* on $\mathbb{R}^n$ are defined to be the tensor product

As can be shown, for $\omega \in \Omega^{\ast}(\mathbb{R}^n)$, we have a unique representation by

and in this case we also say $\omega$ is a $C^{\infty}$ $k$-form on $\mathbb{R}^n$ (for simplicity we also write $\omega=\sum f_Idx_I$). The algebra of all $k$-forms will be denoted by $\Omega^k(\mathbb{R}^n)$. And naturally we have $\Omega^{\ast}(\mathbb{R}^n)$ to be graded since

But if we have $\omega \in \Omega^0(\mathbb{R}^n)$, we see $\omega$ is merely a $C^{\infty}$ function. As taught in multivariable calculus course, for the differential of $\omega$ we have

and it turns out that $d\omega\in\Omega^{1}(\mathbb{R}^n)$. This inspires us to obtain a generalization onto the differential operator $d$:

and $d\omega$ is defined as follows. The case when $k=0$ is defined as usual (just the one above). For $k>0$ and $\omega=\sum f_I dx_I,$ $d\omega$ is defined ‘inductively’ by

This $d$ is the so-called *exterior differentiation*, which serves as the ultimate abstraction of gradient, curl, divergence, etc. If we restrict ourselves to $\mathbb{R}^3$, we see these vector calculus tools come up in the nature of things.

**Functions**

**$1$-forms**

**$2$-forms**

The calculation is tedious but a nice exercise to understand the definition of $d$ and $\Omega^{\ast}$.

By elementary computation we are also able to show that $d^2\omega=0$ for all $\omega \in \Omega^{\ast}(\mathbb{R}^n)$ (*hint: $\frac{\partial^2 f}{\partial x_i \partial x_j}=\frac{\partial^2 f}{\partial x_j \partial x_i}$ but $dx_idx_j=-dx_jdx_i$*). Now we consider a vector field $\overrightarrow{v}=(v_1,v_2)$ of dimension $2$. If $C$ is an arbitrary simple closed smooth curve in $\mathbb{R}^2$, then we expect

to be $0$. If this happens (note that $C$ is arbitrary), we say $\overrightarrow{v}$ is a conservative field (path independent).
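On $\mathbb{R}^3$ the identity $d^2=0$ specializes to $\operatorname{curl}(\operatorname{grad} f)=0$ (starting from $0$-forms) and $\operatorname{div}(\operatorname{curl}\vec F)=0$ (starting from $1$-forms). A quick symbolic check with sympy (my illustration; the particular $f$ and $\vec F$ are arbitrary choices):

```python
from sympy import symbols, sin, exp, diff, simplify

x, y, z = symbols('x y z')

f = sin(x * y) + exp(z) * y              # a 0-form (smooth function)
F = (x * y * z, sin(y) + x, exp(x * z))  # components of a 1-form P dx + Q dy + R dz

grad = [diff(f, v) for v in (x, y, z)]
curl_grad = [diff(grad[2], y) - diff(grad[1], z),
             diff(grad[0], z) - diff(grad[2], x),
             diff(grad[1], x) - diff(grad[0], y)]
assert all(simplify(c) == 0 for c in curl_grad)          # d(df) = 0

P, Q, R = F
curl_F = [diff(R, y) - diff(Q, z),
          diff(P, z) - diff(R, x),
          diff(Q, x) - diff(P, y)]
div_curl = sum(diff(c, v) for c, v in zip(curl_F, (x, y, z)))
assert simplify(div_curl) == 0                           # d(d omega) = 0
```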

So when is $\overrightarrow{v}$ conservative? It happens when there is a function $f$ such that

This is equivalent to say that

If we use $C^{\ast}$ to denote the area enclosed by $C$, by Green’s theorem, we have

If you translate what you've learned in a multivariable calculus course (path independence) into the language of differential forms, you will see that the set of all conservative fields is precisely the *image* of $d_0:\Omega^0(\mathbb{R}^2) \to \Omega^1(\mathbb{R}^2)$. Also, they lie in the *kernel* of the next map $d_1:\Omega^1(\mathbb{R}^2) \to \Omega^2(\mathbb{R}^2)$. These $d$'s are naturally homomorphisms, so it's natural to discuss the *factor group*. But before that, we need some terminology.

The complex $\Omega^{\ast}(\mathbb{R}^n)$ together with $d$ is called the *de Rham complex* on $\mathbb{R}^n$. Now consider the sequence

We say $\omega \in \Omega^k(\mathbb{R}^n)$ is *closed* if $d_k\omega=0$, or equivalently, $\omega \in \ker d_k$. Dually, we say $\omega$ is *exact* if there exists some $\mu \in \Omega^{k-1}(\mathbb{R}^n)$ such that $d\mu=\omega$, that is, $\omega \in \operatorname{im}d_{k-1}$. Of course all the $d_k$'s can be written as $d$, but the indices make things easier to follow. Instead of doing integration or differentiation, which would be 'uninteresting', we are going to discuss the abstract structure.

The $k$-th *de Rham cohomology* in $\mathbb{R}^n$ is defined to be the factor space

As an example, note that by the fundamental theorem of calculus, every $1$-form is exact, therefore $H_{DR}^1(\mathbb{R})=0$.

Since the de Rham complex is a special case of a *differential complex*, and the other special features of the de Rham complex play no critical role hereafter, we are going to discuss the algebraic structure of differential complexes directly.

We are going to show that once a short exact sequence of complexes is given, there exists a long exact sequence of cohomology groups. For convenience, let us recall some basic definitions.

A sequence of vector spaces (or groups)

is said to be *exact* if the image of $f_{k-1}$ is the kernel of $f_k$ for all $k$. Sometimes we need to discuss an extremely short one:

As one can see, $f$ is injective and $g$ is surjective.

A direct sum of vector spaces $C=\oplus_{k \in \mathbb{Z}}C^k$ is called a *differential complex* if there are homomorphisms by

such that $d_{k-1}d_k=0$. Sometimes we write $d$ instead of $d_{k}$ since this *differential operator* of $C$ is universal. Therefore we may also say that $d^2=0$. The cohomology of $C$ is the direct sum of vector spaces $H(C)=\oplus_{k \in \mathbb{Z}}H^k(C) $ where

A map $f: A \to B$, where $A$ and $B$ are differential complexes, is called a *chain map* if $fd_A=d_Bf$.
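Since each $H^k(C)$ is just kernel modulo image, in finite dimensions it can be computed by ranks: $\dim H^k=\dim\ker d_k-\operatorname{rank} d_{k-1}$. As a toy illustration (my example, not from the original text), take the simplicial cochain complex of the boundary of a triangle, a combinatorial circle, whose $d_0$ is the standard incidence matrix:

```python
import numpy as np

# Complex 0 -> C^0 -> C^1 -> 0 for a triangle boundary:
# 3 vertices, 3 edges; (d0 f)(edge uv) = f(v) - f(u), and d1 = 0.
d0 = np.array([[-1,  1,  0],
               [ 0, -1,  1],
               [ 1,  0, -1]])

r0 = np.linalg.matrix_rank(d0)
dim_H0 = 3 - r0          # dim ker d0 (the constant functions)
dim_H1 = 3 - r0          # dim C^1 / im d0, since d1 = 0

assert (dim_H0, dim_H1) == (1, 1)   # matches the Betti numbers of a circle
```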

Now consider a short exact sequence of differential complexes

where both $f$ and $g$ are chain maps (this is important). Then there exists a long exact sequence by

Here, $f^{\ast}$ and $g^{\ast}$ are the naturally induced maps. For $c \in C^q$, $d^{\ast}[c]$ is defined to be the cohomology class $[a]$, where $a \in A^{q+1}$ satisfies $f(a)=db$ and $g(b)=c$. The sequence can be described using the two-layer commutative diagram below.

The long exact sequence is the purple one (now you see why people may call this the zig-zag lemma). This sequence is 'based on' the blue diagram, which can naturally be considered an expansion of the short exact sequence. The method used in the following proof is called diagram-chasing, whose importance has already been stressed by Professor James Munkres: *master* this. We will be *abusing* the properties of almost every homomorphism and group appearing in this commutative diagram to trace the elements.

First, we give a precise definition of $d^{\ast}$. For a closed $c \in C^q$, by the surjectivity of $g$ (note this sequence is exact), there exists some $b \in B^q$ such that $g(b)=c$. But $g(db)=d(g(b))=dc=0$, so for $db \in B^{q+1}$ we have $db \in \ker g$. By the exactness of the sequence, we see $db \in \operatorname{im}{f}$; that is, there exists some $a \in A^{q+1}$ such that $f(a)=db$. Further, $a$ is closed since

and we already know that $f$ has trivial kernel (which contains $da$); hence $da=0$.

$d^{\ast}$ is therefore defined by

where $[\cdot]$ means “the homology class of”.

It remains to check that $d^{\ast}$ is a well-defined homomorphism. Let $c_q$ and $c_q'$ be two closed forms in $C^q$. To show $d^{\ast}$ is well defined, we suppose $[c_q]=[c_q']$ (i.e. they are cohomologous). Choose $b_q$ and $b_q'$ so that $g(b_q)=c_q$ and $g(b_q')=c_q'$. Accordingly, we also pick $a_{q+1}$ and $a_{q+1}'$ such that $f(a_{q+1})=db_q$ and $f(a_{q+1}')=db_q'$. By the definition of $d^{\ast}$, we need to show that $[a_{q+1}]=[a_{q+1}']$.

Recall the properties of factor group. $[c_q]=[c_q’]$ if and only if $c_q-c_q’ \in \operatorname{im}d$. Therefore we can pick some $c_{q-1} \in C^{q-1}$ such that $c_q-c_q’=dc_{q-1}$. Again, by the surjectivity of $g$, there is some $b_{q-1}$ such that $g(b_{q-1})=c_{q-1}$.

Note that

Therefore $b_q-b_q’-db_{q-1} \in \operatorname{im} f$. We are able to pick some $a_q \in A^{q}$ such that $f(a_q)=b_q-b_q’-db_{q-1}$. But now we have

Since $f$ is injective, we have $da_q=a_{q+1}-a_{q+1}’$, which implies that $a_{q+1}-a_{q+1}’ \in \operatorname{im}d$. Hence $[a_{q+1}]=[a_{q+1}’]$.

To show that $d^{\ast}$ is a homomorphism, note that $g(b_q+b_q’)=c_q+c_q’$ and $f(a_{q+1}+a_{q+1}’)=d(b_q+b_q’)$. Thus we have

The latter equals $[a_{q+1}]+[a_{q+1}’]$ since the canonical map is a homomorphism. Therefore we have

Therefore the long sequence exists; it remains to prove exactness. First we prove exactness at $H^q(B)$. Pick $[b] \in H^q(B)$. If there is some $a \in A^q$ such that $f(a)=b$, then $g(f(a))=0$. Therefore $g^{\ast}[b]=g^{\ast}[f(a)]=[g(f(a))]=[0]$; hence $\operatorname{im}f^{\ast} \subset \ker g^{\ast}$.

Conversely, suppose now $g^{\ast}[b]=[0]$; we shall show that there exists some $[a] \in H^q(A)$ such that $f^{\ast}[a]=[b]$. Note that $g^{\ast}[b]=[0]$ means $g(b) \in \operatorname{im}d$, where $d$ is the differential operator of $C$ (why?). Therefore there exists some $c_{q-1} \in C^{q-1}$ such that $g(b)=dc_{q-1}$. Pick some $b_{q-1}$ such that $g(b_{q-1})=c_{q-1}$. Then we have

Therefore $f(a)=b-db_{q-1}$ for some $a \in A^q$. Note $a$ is closed since

and $f$ is injective. $db=0$ since we have

Furthermore,

Therefore $\ker g^{\ast} \subset \operatorname{im} f^{\ast}$ as desired.

Now we prove exactness at $H^q(C)$. (Notation:) pick $[c_q] \in H^q(C)$, there exists some $b_q$ such that $g(b_q)=c_q$; choose $a_{q+1}$ such that $f(a_{q+1})=db_q$. Then $d^{\ast}[c_q]=[a_{q+1}]$ by definition.

If $[c_q] \in \operatorname{im}g^{\ast}$, we see $[c_q]=[g(b_q)]=g^{\ast}[b_q]$. But $b_q$ is closed since $[b_q] \in H^q(B)$, we see $f(a_{q+1})=db_q=0$, therefore $d^{\ast}[c_q]=[a_{q+1}]=[0]$ since $f$ is injective. Therefore $\operatorname{im}g^{\ast} \subset \ker d^{\ast}$.

Conversely, suppose $d^{\ast}[c_q]=[0]$. By the definition of $H^{q+1}(A)$, there is some $a_q \in A^q$ such that $da_q = a_{q+1}$ (can you see why?). We claim that $b_q-f(a_q)$ is closed and that $[c_q]=g^{\ast}[b_q-f(a_q)]$.

By direct computation,

Meanwhile

Therefore $\ker d^{\ast} \subset \operatorname{im}g^{\ast}$. Note that $g(f(a_q))=0$ by exactness.

Finally, we prove exactness at $H^{q+1}(A)$. Pick $\alpha \in H^{q+1}(A)$. If $\alpha \in \operatorname{im}d^{\ast}$, then $\alpha=[a_{q+1}]$ where $f(a_{q+1})=db_q$ by definition. Then

Therefore $\alpha \in \ker f^{\ast}$. Conversely, if $f^{\ast}(\alpha)=[0]$, pick a representative of $\alpha$, say $\alpha=[a]$; then $[f(a)]=[0]$. But this implies that $f(a) \in \operatorname{im}d$, where $d$ denotes the differential operator of $B$; that is, there exists some $b_q \in B^q$ such that $db_{q}=f(a)$. Put $c_q=g(b_q)$. Then $c_q$ is closed since $dc_q=g(db_q)=g(f(a))=0$. By definition, $\alpha=d^{\ast}[c_q]$. Therefore $\ker f^{\ast} \subset \operatorname{im}d^{\ast}$.

As you may see, almost every property of the diagram has been used. The exactness at $B^q$ ensures that $g(f(a))=0$. The definition of $H^q(A)$ lets us simplify the meaning of $[0]$. We even used the injectivity of $f$ and the surjectivity of $g$.

This proof is also a demonstration of diagram-chasing technique. As you have seen, we keep running through the diagram to ensure that there is “someone waiting” at the destination.

This long exact sequence is useful. Here is an example.

By differential forms on an open set $U \subset \mathbb{R}^n$, we mean

And the de Rham cohomology of $U$ comes up in the nature of things.

We are able to compute the cohomology of the union of two open sets. Suppose $M=U \cup V$ is a manifold with $U$ and $V$ open, and $U \amalg V$ is the disjoint union of $U$ and $V$ (the coproduct in the category of sets). $\partial_0$ and $\partial_1$ are inclusions of $U \cap V$ in $U$ and $V$ respectively. We have a natural sequence of inclusions

Since $\Omega^{*}$ can also be treated as a contravariant functor from the category of Euclidean spaces with smooth maps to the category of commutative differential graded algebras and their homomorphisms, we have

By taking the difference of the last two maps, we have

The sequence above is a short exact sequence. Therefore we may use the zig-zag lemma to obtain a long exact sequence (also called the Mayer-Vietoris sequence):

This sequence allows one to compute the cohomology of the union of two open sets. For example, for $H^{*}_{DR}(\mathbb{R}^2-P-Q)$, where $P(x_p,y_p)$ and $Q(x_q,y_q)$ are two distinct points in $\mathbb{R}^2$, we may write

and

Therefore we may write $M=\mathbb{R}^2$, $U=\mathbb{R}^2-P$ and $V=\mathbb{R}^2-Q$. For $U$ and $V$, we have another decomposition by

where

But

is a disjoint union of four (homeomorphic) copies of $\mathbb{R}^2$. So things become clear after we compute $H^{\ast}_{DR}(\mathbb{R}^2)$.

- Raoul Bott and Loring W. Tu, *Differential Forms in Algebraic Topology*
- James R. Munkres, *Elements of Algebraic Topology*
- Michael Spivak, *Calculus on Manifolds*
- Serge Lang, *Algebra*

We are going to evaluate the Fourier transforms of $\frac{\sin{x}}{x}$ and $\left(\frac{\sin{x}}{x}\right)^2$, which turn out to be a comprehensive application of many elementary theorems about functions of a single complex variable. It is thus recommended to make sure that you can evaluate and understand all the identities in this post by yourself. Also, make sure that you can recall all the words in *italics*.

For real $t$, find the limit by

We will do this using contour integration. Since the complex function $f(z)=\frac{\sin{z}}{z}e^{itz}$ is *entire*, by *Cauchy's theorem* its integral over $[-A,A]$ is equal to the integral over the path $\Gamma_A$ going from $-A$ to $-1$ along the real axis, from $-1$ to $1$ along the lower half of the unit circle, and from $1$ to $A$ along the real axis (why?). Since the path $\Gamma_A$ avoids the origin, we may use the identity

Replacing $\sin{z}$ with $\frac{1}{2i}(e^{itz}-e^{-itz})$, we get

If we put $\varphi_A(t)=\int_{\Gamma_A}\frac{1}{2iz}e^{i(t+1)z}dz$, we see $I_A(t)=\varphi_A(t+1)-\varphi_A(t-1)$. It is convenient to divide $\varphi_A$ by $\pi$ since we therefore get

and we are cool with the divisor $2\pi i$.

Now, close the path $\Gamma_A$ in two ways: first, by the semicircle from $A$ through $-Ai$ to $-A$; second, by the semicircle from $A$ through $Ai$ to $-A$ (which, together with the first, completes a circle of radius $A$). For simplicity we denote the resulting lower and upper closed paths by $\Gamma_L$ and $\Gamma_U$ respectively. Since $\Gamma_L$ does not enclose the origin, the first case gives an integral with value $0$ by Cauchy's theorem; thus

Notice that

we see that if $t\sin\theta>0$, then $|\exp(iAte^{i\theta})| \to 0$ as $A \to \infty$. On the lower semicircle we have $-\pi < \theta <0$, hence $\sin\theta<0$; therefore for $t<0$ we get

(You should be able to prove the convergence above.) Also trivially

But what if $t>0$? Indeed, it would be difficult to obtain the limit using the integral over $[-\pi,0]$. But we have another path, namely the upper one.

Note that $\frac{e^{itz}}{z}$ is a *meromorphic function* in $\mathbb{C}$ with a pole at $0$. For such a function we have

which implies that the residue at $0$ is $1$. By the *residue theorem*,

Note that we have used the *change-of-variable* formula as we did for the lower path. $\operatorname{Ind}_{\Gamma_U}(0)$ means the *winding number* of $\Gamma_U$ around $0$, which is $1$ of course. The identity above implies

Thus if $t>0$, since $\sin\theta>0$ when $0<\theta<\pi$, we get

But as already shown, $I_A(t)=\varphi_A(t+1)-\varphi_A(t-1)$, thus to conclude,

Since $\psi(x)=\frac{\sin{x}}{x}$ is even, by multiplying $I_A$ by $\sqrt{\frac{1}{2\pi}}$ we actually obtain the *Fourier transform* of $\psi$, by abuse of language. Therefore we also get

Note that $\hat\psi(t)$ is not continuous, let alone uniformly continuous; 'therefore' $\psi \notin L^1$, since if $f \in L^1$ then $\hat{f}$ is *uniformly continuous*. Another interesting fact is that this also yields the value of the Dirichlet integral, since we have

We end this section by evaluating the inverse of $\hat\psi(t)$. This requires a simple calculation.
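Before that, the box shape of the limit (namely $I(t)=\pi$ for $|t|<1$ and $0$ for $|t|>1$) can also be checked numerically. This is a rough check only: the integral converges slowly, so the truncation $A$, the step size, the test points, and the loose tolerances below are all my choices:

```python
import numpy as np

def I(t, A=4000.0, dx=0.01):
    # midpoint-rule approximation of the integral of (sin x / x) cos(tx) over [-A, A]
    x = np.arange(-A, A, dx) + dx / 2          # midpoints; never hits x = 0
    return np.sum(np.sinc(x / np.pi) * np.cos(t * x)) * dx

assert abs(I(0.5) - np.pi) < 0.05   # inside (-1, 1): value pi
assert abs(I(2.0)) < 0.05           # outside [-1, 1]: value 0
```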

For real $t$, compute

Now since $h(x)=\frac{\sin^2{x}}{x^2} \in L^1$, we are able to say with ease that the integral above is the Fourier transform of $h(x)$. But still we will be using the limit form by

where

And we are still using the contour integration as above (we are still using $\Gamma_A$, $\Gamma_U$ and $\Gamma_L$). For this we get

Therefore it suffices to discuss the function

since we have

Dividing $\mu_A(z)$ by $\pi i$, we see

Integrating $\frac{e^{itz}}{z^2}$ over $\Gamma_L$, we see

Since we still have

if $t<0$ in this case, we see $\frac{1}{\pi i}\mu_A(z) \to 0$ as $A \to \infty$. For $t>0$, integrating over $\Gamma_U$, we have

We could also evaluate $\mu_A(0)$ by computing the integral directly, but we will not do that. To conclude, we have

Therefore for $J_A$ we have

Now you may ask: how did we find the values at $0$, $2$, or $-2$, when $\mu_A(0)$ was never evaluated? Since $h \in L^1$, we see that $\hat{h}(t)=\sqrt{\frac{1}{2\pi}}J(t)$ is uniformly continuous, thus continuous, and the values at these points follow from continuity.
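A numerical check similar to the one for $\frac{\sin x}{x}$ works here, and converges much faster since the integrand is $O(1/x^2)$; the truncation and step size are again my choices. The total mass should be $\pi$:

```python
import numpy as np

# midpoint-rule approximation of the integral of (sin x / x)^2 over [-500, 500]
x = np.arange(-500.0, 500.0, 0.01) + 0.005    # midpoints; never hits x = 0
integral = np.sum(np.sinc(x / np.pi) ** 2) * 0.01
assert abs(integral - np.pi) < 1e-2
```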

Again, we get the value of a classic improper integral by

And this time it’s not hard to find the Fourier inverse:

Thereafter you are able to evaluate the improper integral of $\left(\frac{\sin{x}}{x}\right)^n$. Using *Fubini's* or *Tonelli's* theorem would be almost infeasible, while the contour integration above forces you to deal with $n$ binomial coefficients, which might still be tedious. It's even possible to discuss the convergence of the sequence $(I_n)$ where

This post is intended to establish the existence of the Lebesgue measure in the future, which is often denoted by $m$. In fact, the Lebesgue measure follows as a special case of the R-M-K representation theorem. You may not believe it, but the euclidean properties of $\mathbb{R}^k$ play no role in the existence of $m$. The only topological property that matters is the fact that $\mathbb{R}^k$ is a locally compact Hausdorff space.

The theorem is named after F. Riesz, who introduced it for continuous functions on $[0,1]$ (with respect to the Riemann-Stieltjes integral). Years later, after the generalizations by A. Markov and S. Kakutani, we are able to state it on a locally compact Hausdorff space.

You may find some properties over-generalized, but this is intended to let you enjoy more along the way (there are some tools related to differential geometry). There are also many topology and analysis tricks worth your attention.

Again, euclidean topology plays no role in this proof. We need to specify the topology for different reasons. This is similar to what we do in linear functional analysis. Throughout, let $X$ be a topological space.

**0.0 Definition.** $X$ is a *Hausdorff space* if the following is true: If $p \in X$, $q\in X$ but $p \neq q$, then there are two **disjoint** open sets $U$ and $V$ such that $p \in U$ and $q \in V$.

**0.1 Definition.** $X$ is *locally compact* if every point of $X$ has a neighborhood whose closure is compact.

**0.2 Remarks.** A Hausdorff space is also called a $T_2$ space (see the Kolmogorov classification) or a separated space. There is a classic example of a locally compact Hausdorff space: $\mathbb{R}^n$. It is trivial to verify this. But this is far from being enough. In the future we will see that we can construct some ridiculous but mathematically valid measures.

**0.3 Definition.** A set $E \subset X$ is called *$\sigma$-compact* if $E$ is a countable union of compact sets. Note that every open subset of a euclidean space $\mathbb{R}^n$ is $\sigma$-compact, since it can always be written as a countable union of closed balls (which are compact).
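To make this concrete, here is a tiny sketch exhibiting $(0,1)=\bigcup_{n\geq 3}K_n$ with $K_n=[1/n,\,1-1/n]$; the helper `first_n_containing` is hypothetical:

```python
from fractions import Fraction

# (0,1) is σ-compact: it is the union of the compact sets K_n = [1/n, 1-1/n],
# n >= 3.  For any point x of (0,1) this helper finds one K_n containing x.
def first_n_containing(x):
    n = 3
    while not (Fraction(1, n) <= x <= 1 - Fraction(1, n)):
        n += 1
    return n

assert first_n_containing(Fraction(1, 2)) == 3    # 1/3 <= 1/2 <= 2/3
assert first_n_containing(Fraction(1, 100)) == 100
print("every sampled point of (0,1) lies in some K_n")
```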

**0.4 Definition.** A covering of $X$ is *locally finite* if every point has a neighborhood which intersects only finitely many elements of the covering. Of course, if the covering is already finite, it’s also locally finite.

**0.5 Definition.** A *refinement* of a covering of $X$ is a second covering, each element of which is contained in an element of the first covering.

**0.6 Definition.** $X$ is *paracompact* if it is Hausdorff, and every open covering has a locally finite open refinement. Obviously any compact Hausdorff space is paracompact.

**0.7 Theorem.** If $X$ is a second countable Hausdorff space and is locally compact, then $X$ is paracompact. For proof, see this [Theorem 2.6].

**0.8 Theorem.** If $X$ is locally compact and $\sigma$-compact, then $X=\bigcup_{i=1}^{\infty}K_i$ where for all $i \in \mathbb{N}$, $K_i$ is compact and $K_i \subset\operatorname{int}K_{i+1}$.

The basic technical tool in the theory of differentiable manifolds is the existence of a partition of unity. We will borrow this tool for our applications in analysis.

**1.0 Definition.** A **partition of unity** on $X$ is a collection $(g_i)$ of continuous real valued functions on $X$ such that

- $g_i \geq 0$ for each $i$.
- every $x \in X$ has a neighborhood $U$ such that $U \cap \operatorname{supp}(g_i)=\varnothing$ for all but finitely many of $g_i$.
- for each $x \in X$, we have $\sum_{i}g_i(x)=1$. (That’s why you see the word ‘unity’.)
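A concrete example on $\mathbb{R}$: the integer-centered 'tent' functions $g_n(x)=\max(0,\,1-|x-n|)$ form a partition of unity subordinate to the cover $U_n=(n-\frac{3}{2},\,n+\frac{3}{2})$. Each point meets at most two supports (local finiteness), and adjacent tents sum to $1$. A numeric sketch (helper names are mine):

```python
# Partition of unity on R: tent functions g_n(x) = max(0, 1-|x-n|), n ∈ Z.
# g_n >= 0, supp(g_n) = [n-1, n+1] ⊂ U_n = (n-3/2, n+3/2), and at every x
# only the (at most two) tents with |n-x| <= 1 are nonzero, summing to 1.
def g(n, x):
    return max(0.0, 1.0 - abs(x - n))

def partition_sum(x, width=5):
    n0 = round(x)  # only tents near x can be nonzero
    return sum(g(n, x) for n in range(n0 - width, n0 + width + 1))

for x in (-2.3, 0.0, 0.5, 17.9):
    assert abs(partition_sum(x) - 1.0) < 1e-12
print("the tents sum to 1 at every sampled point")
```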

**1.1 Definition.** A partition of unity $(g_i)$ on $X$ is *subordinate* to an open cover of $X$ if and only if for each $g_i$ there is an element $U$ of the cover such that $\operatorname{supp}(g_i) \subset U$. We say $X$ *admits* partitions of unity if and only if for every open cover of $X$, there exists a partition of unity subordinate to the cover.

**1.2 Theorem.** A Hausdorff space admits a partition of unity if and only if it is paracompact. (The 'only if' part follows from the definition of a partition of unity; for the 'if' part, see here.) As a corollary, we have:

**1.3 Corollary.** Suppose $V_1,\cdots,V_n$ are open subsets of a locally compact Hausdorff space $X$, $K$ is compact, and

Then there exists a partition of unity $(h_i)$ subordinate to the cover $(V_i)_{i=1}^{n}$ such that $\operatorname{supp}(h_i) \subset V_i$ and $\sum_{i=1}^{n}h_i(x)=1$ for all $x \in K$.

**2.0 Notation.** The notation
$$K \prec f$$
will mean that $K$ is a compact subset of $X$, that $f \in C_c(X)$, that $f(X) \subset [0,1]$, and that $f(x)=1$ for all $x \in K$. The notation
$$f \prec V$$
will mean that $V$ is open, that $f \in C_c(X)$, that $f(X) \subset [0,1]$ and that $\operatorname{supp}(f) \subset V$. If both hold, we write
$$K \prec f \prec V$$
**2.1 Remarks.** Clearly, with this notation, we are able to simplify the statement of being subordinate. We merely need to write $g_i \prec U$ in 1.1 instead of $\operatorname{supp}(g_i) \subset U$.

**2.2 Urysohn’s Lemma for locally compact Hausdorff space.** Suppose $X$ is locally compact and Hausdorff, $V$ is open in $X$ and $K \subset V$ is a compact set. Then there exists an $f \in C_c(X)$ such that
$$K \prec f \prec V$$
**2.3 Remarks.** By $f \in C_c(X)$ we mean that $f$ is a continuous function with compact support. This relation also says that $\chi_K \leq f \leq \chi_V$. For more details and the proof, visit this page. This lemma is stated in general for normal spaces; for a proof at that level, see arXiv:1910.10381. (Question: why do we consider two disjoint closed subsets there?)
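For a *metric* space there is a well-known explicit formula that realizes Urysohn's lemma; the general locally compact Hausdorff case needs the construction referenced above. A sketch on $\mathbb{R}$ with $K=[0,1]$ and $V=(-1,2)$ (helper names are mine):

```python
# Urysohn function on a metric space: f(x) = d(x, X\V) / (d(x, X\V) + d(x, K)).
# f = 1 on K (there d(x, K) = 0), f = 0 outside V, and f is continuous since
# K ⊂ V keeps the denominator positive; supp(f) ⊂ closure(V), compact here.
def d_interval(x, a, b):
    """Distance from x to the closed interval [a, b]."""
    return max(0.0, a - x, x - b)

def urysohn(x, K=(0.0, 1.0), V=(-1.0, 2.0)):
    dK = d_interval(x, *K)
    dVc = max(0.0, min(x - V[0], V[1] - x))  # distance from x to R \ V
    return dVc / (dVc + dK)

assert urysohn(0.5) == 1.0   # on K
assert urysohn(3.0) == 0.0   # outside V
print(urysohn(1.5))          # strictly between 0 and 1 on V \ K
```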

We will be using the $\varepsilon$-definitions of $\sup$ and $\inf$, which make the proof easier in this case but may be troublesome if you have not seen them. So we put them down here.

Let $S$ be a nonempty subset of the real numbers that is bounded below. A lower bound $w$ of $S$ is the infimum of $S$ if and only if for any $\varepsilon>0$, there exists an element $x_\varepsilon \in S$ such that $x_\varepsilon<w+\varepsilon$.

This definition of $\inf$ is equivalent to the usual one:

Let $S$ be a set that is bounded below. We say $w=\inf S$ when $w$ satisfies the following condition.

- $w$ is a lower bound of $S$.
- If $t$ is also a lower bound of $S$, then $t \leq w$.

We have the analogous definition for $\sup$.
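A concrete illustration of the $\varepsilon$-definition with $S=\{1/n : n \in \mathbb{N}\}$, where $\inf S=0$ (`witness_below` is a hypothetical helper):

```python
# ε-characterization of inf: with w = inf S = 0 and S = {1/n}, every ε > 0
# admits a witness x_ε ∈ S with x_ε < w + ε.
def witness_below(eps):
    n = 1
    while 1.0 / n >= eps:
        n += 1
    return 1.0 / n

for eps in (0.5, 0.01, 1e-6):
    x_eps = witness_below(eps)
    assert 0 <= x_eps < 0 + eps
print("a witness below w + ε exists for every sampled ε")
```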

Analysis is full of vector spaces and linear transformations. We already know that the Lebesgue integral induces a linear functional. That is, for example, $L^1([0,1])$ is a vector space, and we have a linear functional by

But what about the reverse? Given a linear functional, is it guaranteed that we have a measure that establishes the integral? The R-M-K theorem answers this question affirmatively. The functional to be discussed is *positive*, which means that if $f(X) \subset [0,\infty)$, then $\Lambda{f} \in [0,\infty)$.
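For instance, approximating $\Lambda f=\int_0^1 f\,dx$ by a midpoint Riemann sum, linearity and positivity can be checked numerically (a sketch, not the actual Lebesgue integral):

```python
# A positive linear functional on C([0,1]): Λf = ∫_0^1 f dx, approximated
# here by a midpoint Riemann sum.
def Lam(f, steps=100_000):
    h = 1.0 / steps
    return sum(f((k + 0.5) * h) for k in range(steps)) * h

f = lambda x: x * x
g = lambda x: 3.0 * x + 1.0
# linearity: Λ(2f + g) = 2Λf + Λg, up to floating-point rounding
assert abs(Lam(lambda x: 2 * f(x) + g(x)) - (2 * Lam(f) + Lam(g))) < 1e-9
# positivity: f >= 0 pointwise implies Λf >= 0
assert Lam(f) >= 0.0
print(Lam(f))  # ≈ 1/3
```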

Let $X$ be a locally compact Hausdorff space, and let $\Lambda$ be a positive linear functional on $C_c(X)$. Then there exists a $\sigma$-algebra $\mathfrak{M}$ on $X$ which contains all Borel sets in $X$, and there exists a unique positive measure $\mu$ on $\mathfrak{M}$ which represents $\Lambda$ in the sense that
$$\Lambda{f}=\int_X f\,d\mu$$
for all $f \in C_c(X)$.

For the measure $\mu$ and the $\sigma$-algebra $\mathfrak{M}$, we have four assertions:

- $\mu(K)<\infty$ for every compact set $K \subset X$.
- For every $E \in \mathfrak{M}$, we have

- For every open set $E$, and every $E \in \mathfrak{M}$ with $\mu(E)<\infty$, we have

- If $E \in \mathfrak{M}$, $A \subset E$, and $\mu(E)=0$, then $A \in \mathfrak{M}$.

**Remarks before proof.** It would be great if we could establish the Lebesgue measure $m$ by putting $X=\mathbb{R}^n$. But we need a little extra work to get this result naturally. If assertion 2 is satisfied, we say $\mu$ is *outer* regular; if assertion 3 is, *inner* regular. If both hold, we say $\mu$ is *regular*. The partition of unity and Urysohn's lemma will be heavily used in the proof of the main theorem, so make sure you have no problem with them.

The proof is rather long so we will split it into several steps. I will try my best to make every line clear enough.

For every open set $V \subset X$, define
$$\mu(V)=\sup\{\Lambda{f}:f\prec V\}$$
If $V_1 \subset V_2$ and both are open, we claim that $\mu(V_1) \leq \mu(V_2)$. For $f \prec V_1$, since $\operatorname{supp}f \subset V_1 \subset V_2$, we see $f \prec V_2$. But we are able to find some $g \prec V_2$ such that $g \geq f$, or more precisely, $\operatorname{supp}(g) \supset \operatorname{supp}(f)$. By taking another look at the proof of Urysohn’s lemma for locally compact Hausdorff space, we see there is an open set G with compact closure such that

By Urysohn’s lemma to the pair $(\overline{G},V_2)$, we see there exists a function $g \in C_c(X)$ such that

Therefore

Thus for any $f \prec V_1$ and $g \prec V_2$, we have $\Lambda{g} \geq \Lambda{f}$ (monotonic) since $\Lambda{g}-\Lambda{f}=\Lambda{(g-f)}\geq 0$. By taking the supremum over $f$ and $g$, we see

The ‘monotonic’ property of such $\mu$ enables us to *define* $\mu(E)$ for all $E \subset X$ by
$$\mu(E)=\inf\{\mu(V):E\subset V,\ V\ \text{open}\}$$
The definition above trivially agrees with the original one on open sets. Sometimes people say $\mu$ is an outer measure. We will discuss the other kinds of sets thoroughly in the following steps. Warning: we are not saying that $\mathfrak{M} = 2^X$. The crucial property of $\mu$, namely countable additivity, will be proved only on a certain $\sigma$-algebra.
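As a sanity check of Step 0, take $X=\mathbb{R}$ and $\Lambda f=\int f\,dx$: for an open interval $V=(a,b)$, the supremum over $f \prec V$ recovers the length $b-a$. It is approached by trapezoidal $f_n \prec V$ equal to $1$ on $[a+\frac{1}{n},\,b-\frac{1}{n}]$, with $\Lambda f_n=(b-a)-\frac{1}{n}$ (helper names are mine):

```python
# μ((a,b)) = sup{Λf : f ≺ (a,b)} = b - a for Λf = ∫ f dx on R.
def trapezoid(x, a, b, n):
    if x <= a or x >= b:
        return 0.0
    return min(1.0, n * (x - a), n * (b - x))  # ramps of slope n at both ends

def Lam(f, lo=-1.0, hi=3.0, steps=200_000):
    # midpoint Riemann sum over [lo, hi], which contains supp(f) below
    h = (hi - lo) / steps
    return sum(f(lo + (k + 0.5) * h) for k in range(steps)) * h

for n in (10, 100, 1000):
    val = Lam(lambda x: trapezoid(x, 0.0, 2.0, n))
    print(n, val)  # approaches b - a = 2 from below as n grows
```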

It follows from the definition of $\mu$ that if $E_1 \subset E_2$, then $\mu(E_1) \leq \mu(E_2)$.

Let $\mathfrak{M}_F$ be the class of all $E \subset X$ which satisfy the two following conditions:

$\mu(E) <\infty$.

‘Inner regular’:
$$\mu(E)=\sup\{\mu(K):K\subset E,\ K\ \text{compact}\}$$
One may say here $\mu$ is the ‘inner measure’. Finally, let $\mathfrak{M}$ be the class of all $E \subset X$ such that for every compact $K$, we have $E \cap K \in \mathfrak{M}_F$. We shall show that $\mathfrak{M}$ is the desired $\sigma$-algebra.

**Remarks of Step 0.** So far, we have only proved that $\mu(E) \geq 0$ for all $E \subset X$. What about countable additivity? It's clear that $\mathfrak{M}_F$ and $\mathfrak{M}$ have some strong relation. We need to get a clearer view of it. Also, by restricting $\mu$ to $\mathfrak{M}_F$, we restrict ourselves to finite values. In fact, we will finally show that $\mathfrak{M}_F \subset \mathfrak{M}$.

If $K$ is compact, then $K \in \mathfrak{M}_F$, and
$$\mu(K)=\inf\{\Lambda{f}:K\prec f\}$$
Define $V_\alpha=f^{-1}(\alpha,1]$ for $K \prec f$ and $0 < \alpha < 1$. Since $f(x)=1$ for all $x \in K$, we have $K \subset V_{\alpha}$. Therefore by definition of $\mu$ for all $E \subset X$, we have

Note that $f \geq \alpha{g}$ whenever $g \prec V_{\alpha}$, since $\alpha{g} \leq \alpha < f$ on $V_\alpha$ and $g=0$ elsewhere. Since $\mu(K)$ is a lower bound of $\frac{1}{\alpha}\Lambda{f}$ with $0<\alpha<1$, we see

Since $f(X) \subset [0,1]$, $\Lambda{f}$ is finite; namely $\mu(K) <\infty$. Since $K$ itself is compact, we see $K \in \mathfrak{M}_F$.

To prove the identity, note that for any $\varepsilon>0$ there exists some open $V \supset K$ such that $\mu(V)<\mu(K)+\varepsilon$. By Urysohn's lemma, there exists some $h \in C_c(X)$ such that $K \prec h \prec V$. Therefore

Therefore $\mu(K)$ is the infimum of $\Lambda{h}$ with $K \prec h$.

**Remarks of Step 1.** We have just proved assertion 1 of the property of $\mu$. The hardest part of this proof is the inequality

But this is merely the $\varepsilon$-definition of $\inf$. Note that $\mu(K)$ is the infimum of $\mu(V)$ over open $V \supset K$: for any $\varepsilon>0$, there exists some open $V \supset K$ such that $\mu(V)<\mu(K)+\varepsilon$. Under certain conditions, this definition is much easier to use. Now we will examine the relation between $\mathfrak{M}_F$ and $\tau_X$, namely the topology of $X$.

$\mathfrak{M}_F$ contains every open set $V$ with $\mu(V)<\infty$.

It suffices to show that for open set $V$, we have

For $0<\varepsilon<\mu(V)$, we see there exists an $f \prec V$ such that $\Lambda{f}>\mu(V)-\varepsilon$. If $W$ is any open set which contains $K= \operatorname{supp}(f)$, then $f \prec W$, and therefore $\Lambda{f} \leq \mu(W)$. Again by definition of $\mu(K)$, we see

Therefore

This is exactly the definition of $\sup$. The identity is proved.

**Remarks of Step 2.** It's important to note that this identity can only be guaranteed for open sets and for sets $E$ with $\mu(E)<\infty$, the latter of which will be proved in the following steps. This is the *flaw* of this theorem. With these preparations, however, we are able to show the countable subadditivity of $\mu$ on arbitrary subsets, and then its countable additivity on $\mathfrak{M}_F$.

If $E_1,E_2,E_3,\cdots$ are arbitrary subsets of $X$, then
$$\mu\left(\bigcup_{i=1}^{\infty}E_i\right)\leq\sum_{i=1}^{\infty}\mu(E_i)$$
First we show this holds for finitely many open sets. This is tantamount to showing that

if $V_1$ and $V_2$ are open. Pick $g \prec V_1 \cup V_2$. This is possible due to Urysohn’s lemma. By corollary 1.3, there is a partition of unity $(h_1,h_2)$ subordinate to $(V_1,V_2)$ in the sense of corollary 1.3. Therefore,

Notice that $h_1g \prec V_1$ and $h_2g \prec V_2$. By taking the supremum, we have

Now we come back to arbitrary subsets of $X$. If $\mu(E_i)=\infty$ for some $i$, then there is nothing to prove. Therefore we shall assume that $\mu(E_i)<\infty$ for all $i$. By definition of $\mu(E_i)$, we see there are open sets $V_i \supset E_i$ such that

Put $V=\bigcup_{i=1}^{\infty}V_i$, and choose $f \prec V$. Since $f \in C_c(X)$, there is a finite collection of the $V_i$ that covers the support of $f$. Therefore without loss of generality, we may say that

for some $n$. We therefore obtain

for all $f \prec V$. Since $\bigcup E_i \subset V$, we have $\mu(\bigcup E_i) \leq \mu(V)$. Therefore

Since $\varepsilon$ is arbitrary, the inequality is proved.

**Remarks of Step 3.** Again, we are using the $\varepsilon$-definition of $\inf$. One may say this step showed the subadditivity of the outer measure. Also note the geometric series $\sum_{k=1}^{\infty}\frac{\varepsilon}{2^k}=\varepsilon$.
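That geometric-series bookkeeping can be checked with exact rational arithmetic:

```python
from fractions import Fraction

# Σ_{k=1}^∞ ε/2^k = ε: the n-th partial sum is ε(1 - 2^{-n}), so the gap
# to ε is exactly ε/2^n.
eps = Fraction(3, 7)
partial = sum(eps / 2**k for k in range(1, 51))
assert eps - partial == eps / 2**50
print(float(partial), float(eps))
```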

Suppose $E=\bigcup_{i=1}^{\infty}E_i$, where $E_1,E_2,\cdots$ are pairwise disjoint members of $\mathfrak{M}_F$, then
$$\mu(E)=\sum_{i=1}^{\infty}\mu(E_i)$$
If $\mu(E)<\infty$, we also have $E \in \mathfrak{M}_F$.

As a dual to Step 3, we first show this holds for finitely many compact sets. As proved in Step 1, compact sets are in $\mathfrak{M}_F$. Suppose now $K_1$ and $K_2$ are disjoint compact sets. We want to show that

Note that compact sets in a Hausdorff space are closed. Therefore we are able to apply Urysohn's lemma to the pair $(K_1,K_2^c)$. That is, there exists an $f \in C_c(X)$ such that

In other words, $f(x)=1$ for all $x \in K_1$ and $f(x)=0$ for all $x \in K_2$, since $\operatorname{supp}(f) \cap K_2 = \varnothing$. By Step 1, since $K_1 \cup K_2$ is compact, there exists some $g \in C_c(X)$ such that

Now things become tricky. We are able to write $g$ by

But $K_1 \prec fg$ and $K_2 \prec (1-f)g$ by the properties of $f$ and $g$. Also since $\Lambda$ is linear, we have

Therefore we have

On the other hand, by Step 3, we have

Therefore they must be equal.

If $\mu(E)=\infty$, there is nothing to prove. So now we should assume that $\mu(E)<\infty$. Since $E_i \in \mathfrak{M}_F$, there are compact sets $K_i \subset E_i$ with

Putting $H_n=K_1 \cup K_2 \cup \cdots \cup K_n$, we see $E \supset H_n$ and

This inequality holds for all $n$ and $\varepsilon$, therefore

Therefore by Step 3, the identity holds.

Finally we shall show that $E \in \mathfrak{M}_F$ if $\mu(E) <\infty$. To make it more understandable, we will use elementary calculus notation. If we write $\mu(E)=x$ and $x_n=\sum_{i=1}^{n}\mu(E_i)$, we see

Therefore, for any $\varepsilon>0$, there exists some $N \in \mathbb{N}$ such that

This is tantamount to

But by definition of the *compact* set $H_N$ above, we see

Hence $E$ satisfies the requirements of $\mathfrak{M}_F$, thus an element of it.

**Remarks of Step 4.** You should realize that we are heavily using the $\varepsilon$-definition of $\sup$ and $\inf$. As you may guess, $\mathfrak{M}_F$ should be a subset of $\mathfrak{M}$ though we don’t know whether it is a $\sigma$-algebra or not. In other words, we hope that the countable additivity of $\mu$ holds on a $\sigma$-algebra that is *properly extended* from $\mathfrak{M}_F$. However it’s still difficult to show that $\mathfrak{M}$ is a $\sigma$-algebra. We need more properties of $\mathfrak{M}_F$ to go on.

If $E \in \mathfrak{M}_F$ and $\varepsilon>0$, there is a compact $K$ and an open $V$ such that $K \subset E \subset V$ and $\mu(V-K)<\varepsilon$.

There are two ways to write $\mu(E)$, namely

where $K$ is compact and $V$ is open. Therefore there exists some $K$ and $V$ such that

Since $V-K$ is open, and $\mu(V-K)<\infty$, we have $V-K \in \mathfrak{M}_F$. By Step 4, we have

Therefore $\mu(V-K)<\varepsilon$ as proved.

**Remarks of Step 5.** You should be familiar with the $\varepsilon$-definitions of $\sup$ and $\inf$ now. Since $V-K =V\cap K^c \subset V$, we have $\mu(V-K)\leq\mu(V)<\mu(E)+\frac{\varepsilon}{2}<\infty$.

If $A,B \in \mathfrak{M}_F$, then $A-B,A\cup B$ and $A \cap B$ are elements of $\mathfrak{M}_F$.

This shows that $\mathfrak{M}_F$ is closed under union, intersection and relative complement. In fact, we merely need to prove $A-B \in \mathfrak{M}_F$, since $A \cup B=(A-B) \cup B$ and $A\cap B = A-(A-B)$.

By Step 5, for $\varepsilon>0$, there are sets $K_A$, $K_B$, $V_A$, $V_B$ such that $K_A \subset A \subset V_A$, $K_B \subset B \subset V_B$, and for $A-B$ we have

With an application of Steps 3 and 5, we have

Since $K_A-V_B$ is a closed subset of $K_A$, we see $K_A-V_B$ is compact as well (a closed subset of a compact set is compact). But $K_A-V_B \subset A-B$, and $\mu(A-B) <\mu(K_A-V_B)+2\varepsilon$, so $A-B$ meets the requirements of $\mathfrak{M}_F$ (the fact that $\mu(A-B)<\infty$ is trivial, since $\mu(A-B)\leq\mu(A)$).

Since $A-B$ and $B$ are pairwise disjoint members of $\mathfrak{M}_F$, we see

Thus $A \cup B \in \mathfrak{M}_F$. Since $A,A-B \in \mathfrak{M}_F$, we see $A \cap B = A-(A-B) \in \mathfrak{M}_F$.

**Remarks of Step 6.** In this step, we demonstrated several ways to express a set, all of which end up with a huge simplification. Now we are able to show that $\mathfrak{M}_F$ is a subset of $\mathfrak{M}$.

There is a precise relation between $\mathfrak{M}$ and $\mathfrak{M}_F$ by
$$\mathfrak{M}_F=\{E\in\mathfrak{M}:\mu(E)<\infty\}$$
If $E \in \mathfrak{M}_F$, we shall show that $E \in \mathfrak{M}$. For compact $K\in\mathfrak{M}_F$ (Step 1), by Step 6, we see $K \cap E \in \mathfrak{M}_F$, therefore $E \in \mathfrak{M}$.

If $E \in \mathfrak{M}$ with $\mu(E)<\infty$ however, we need to show that $E \in \mathfrak{M}_F$. By definition of $\mu$, for $\varepsilon>0$, there is an open $V$ such that

Therefore $V \in \mathfrak{M}_F$. By Step 5, there is a compact set $K$ such that $\mu(V-K)<\varepsilon$ (the open set containing $V$ should be $V$ itself). Since $E \cap K \in \mathfrak{M}_F$, there exists a compact set $H \subset E \cap K$ with

Since $E \subset (E \cap K) \cup (V-K)$, it follows from Step 1 that

Therefore $E \in \mathfrak{M}_F$.

**Remarks of Step 7.** Several tricks in the preceding steps are used here. Now we are pretty close to the fact that $(X,\mathfrak{M},\mu)$ is a measure space. Note that for $E \in \mathfrak{M}-\mathfrak{M}_F$, we have $\mu(E)=\infty$, but we have already proved the countable additivity for $\mathfrak{M}_F$. Is it ‘almost trivial’ for $\mathfrak{M}$? Before that, we need to show that $\mathfrak{M}$ is a $\sigma$-algebra. Note that assertion 3 of $\mu$ has been proved.

We will validate the definition of $\sigma$-algebra one by one.

$X \in \mathfrak{M}$.

For any compact $K \subset X$, we have $K \cap X=K$. But as proved in Step 1, $K \in \mathfrak{M}_F$, therefore $X \in \mathfrak{M}$.

If $A \in \mathfrak{M}$, then $A^c \in\mathfrak{M}$.

If $A \in \mathfrak{M}$, then $A \cap K \in \mathfrak{M}_F$. But

By Step 1 and Step 6, we see $K \cap A^c \in \mathfrak{M}_F$, thus $A^c \in \mathfrak{M}$.

If $A_n \in \mathfrak{M}$ for all $n \in \mathbb{N}$, then $A=\bigcup_{n=1}^{\infty}A_n \in \mathfrak{M}$.

We assign an auxiliary sequence of sets inductively. For $n=1$, we write $B_1=A_1 \cap K$ where $K$ is compact. Then $B_1 \in \mathfrak{M}_F$. For $n \geq 2$, we write

Since $A_n \cap K \in \mathfrak{M}_F$ and $B_1,B_2,\cdots,B_{n-1} \in \mathfrak{M}_F$, by Step 6, $B_n \in \mathfrak{M}_F$. Also the sets $B_n$ are pairwise disjoint.

Another set-theoretic manipulation shows that

Now we are able to evaluate $\mu(A \cap K)$ by Step 4.

Therefore $A \cap K \in \mathfrak{M}_F$, which implies that $A \in \mathfrak{M}$.

$\mathfrak{M}$ contains all Borel sets.

Indeed, it suffices to prove that $\mathfrak{M}$ contains all open sets and/or all closed sets. We'll show two different paths. Let $K$ be a compact set.

- If $C$ is closed, then $C \cap K$ is a closed subset of the compact set $K$, hence compact, and therefore $C \cap K \in \mathfrak{M}_F$ by Step 1. Thus $C \in \mathfrak{M}$.
- If $D$ is open, then $D \cap K=K-(K-D)$, where $K-D=K \cap D^c$ is a closed subset of $K$, hence compact and in $\mathfrak{M}_F$ (Step 1). Since $K \in \mathfrak{M}_F$ as well, Step 6 gives $D \cap K \in \mathfrak{M}_F$, thus $D \in \mathfrak{M}$.

Therefore, by either path, $\mathfrak{M}$ contains all Borel sets.

Again, we will verify all properties of $\mu$ one by one.

$\mu(E) \geq 0$ for all $E \in \mathfrak{M}$.

This follows immediately from the definition of $\mu$, since $\Lambda$ is positive and $0 \leq f \leq 1$.

$\mu$ is countably additive.

If $A_1,A_2,\cdots$ form a disjoint countable collection of members of $\mathfrak{M}$, we need to show that

If $A_n \in \mathfrak{M}_F$ for all $n$, then this is merely what we have just proved in Step 4. If $A_j \in \mathfrak{M}-\mathfrak{M}_F$ however, we have $\mu(A_j)=\infty$. So $\sum_n\mu(A_n)=\infty$. For $\mu(\cup_n A_n)$, notice that $\cup_n A_n \supset A_j$, we have $\mu(\cup_n A_n) \geq \mu(A_j)=\infty$. The identity is now proved.

So far assertions 1-3 have been proved, but the final assertion has not been proved explicitly. We do that now since this property will be used when discussing the Lebesgue measure $m$. In fact, it will show that $(X,\mathfrak{M},\mu)$ is a complete measure space.

If $E \in \mathfrak{M}$, $A \subset E$, and $\mu(E)=0$, then $A \in \mathfrak{M}$.

It suffices to show that $A \in \mathfrak{M}_F$. By definition, $\mu(A)=0$ as well. If $K \subset A$, where $K$ is compact, then $\mu(K)=\mu(A)=0$. Therefore $0$ is the supremum of $\mu(K)$. It follows that $A \in \mathfrak{M}_F \subset \mathfrak{M}$.

For every $f \in C_c(X)$, $\Lambda{f}=\int_X fd\mu$.

This is the absolute main result of the theorem. It suffices to prove the inequality

for all $f \in C_c(X)$. What about the other side? By the linearity of $\Lambda$ and $\int_X \cdot d\mu$, once the inequality above is proved, we have

Therefore

holds as well, and this establishes the equality.

Notice that since $K=\operatorname{supp}(f)$ is compact, the range of $f$ has to be compact. Namely we may assume that $[a,b]$ contains the range of $f$. For $\varepsilon>0$, we are able to pick a partition $y_0<y_1<\cdots<y_n$ of $[a,b]$ such that $y_i - y_{i-1}<\varepsilon$ for each $i$ and

Put

Since $f$ is continuous, $f$ is Borel measurable. The sets $E_i$ are trivially pairwise disjoint Borel sets. Again, there are open sets $V_i \supset E_i$ such that

for $i=1,2,\cdots,n$, and such that $f(x)<y_i + \varepsilon$ for all $x \in V_i$. Notice that $(V_i)$ covers $K$; therefore by the partition of unity, there is a sequence of functions $(h_i)$ such that $h_i \prec V_i$ for all $i$ and $\sum_i h_i=1$ on $K$. By Step 1 and the fact that $f=\sum_i h_if$, we see

By the way we picked $V_i$, we see $h_if \leq (y_i+\varepsilon)h_i$. We have the following inequality:

Since $h_i \prec V_i$, we have $\mu(E_i)+\frac{\varepsilon}{n}>\mu(V_i) \geq \Lambda{h_i}$. And we already get $\sum_i \Lambda{h_i} \geq \mu(K)$. If we put them into the inequality above, we get

Observe that $\cup_i E_i=K$, by Step 9 we have $\sum_{i}\mu(E_i)=\mu(K)$. A slight manipulation shows that

Therefore for $\Lambda f$ we get

Now here comes the trickiest part of the whole blog post. By definition of $E_i$, we see $f(x) > y_{i-1}>y_{i}-\varepsilon$ for $x \in E_i$. Therefore we get a simple function $s_n$ by

If we evaluate the Lebesgue integral of $f$ with respect to $\mu$, we see

For $2\varepsilon\mu(K)$, things are simple since $0\leq\mu(K)<\infty$. Therefore $2\varepsilon\mu(K) \to 0$ as $\varepsilon \to 0$. Now let’s estimate the final part of the inequality. It’s trivial that $\frac{\varepsilon}{n}\sum_{i=1}^{n}(|a|+\varepsilon)=\varepsilon(\varepsilon+|a|)$. For $y_i$, observe that $y_i \leq b$ for all $i$, therefore $\frac{\varepsilon}{n}\sum_{i=1}^{n}y_i \leq \frac{\varepsilon}{n}nb=\varepsilon b$. Thus

Notice that $b+|a| \geq 0$ since $b \geq a \geq -|a|$. Our estimation of $\Lambda{f}$ is finally done:

Since $\varepsilon$ is arbitrary, we see $\Lambda{f} \leq \int_X fd\mu$. The identity is proved.

If there are two measures $\mu_1$ and $\mu_2$ that satisfy assertions 1 to 4 and correspond to $\Lambda$, then $\mu_1=\mu_2$.

In fact, according to assertion 2 and 3, $\mu$ is determined by the values on compact subsets of $X$. It suffices to show that

If $K$ is a compact subset of $X$, then $\mu_1(K)=\mu_2(K)$.

Fix $K$ compact and $\varepsilon>0$. By outer regularity (assertion 2), there exists an open $V \supset K$ such that $\mu_2(V)<\mu_2(K)+\varepsilon$. By Urysohn's lemma, there exists some $f$ such that $K \prec f \prec V$. Hence

Thus $\mu_1(K) \leq \mu_2(K)$. If $\mu_1$ and $\mu_2$ are exchanged, we see $\mu_2(K) \leq \mu_1(K)$. The uniqueness is proved.

Can we simply put $X=\mathbb{R}^k$ right now? The answer is no. Note that the outer regularity is for all sets but inner is only for open sets and members of $\mathfrak{M}_F$. But we expect the outer and inner regularity to be ‘symmetric’. There is an example showing that *locally compact* is far from being enough to offer the ‘symmetry’.

Define $X=\mathbb{R}_1 \times \mathbb{R}_2$, where $\mathbb{R}_1$ is the real line equipped with the discrete metric $d_1$, and $\mathbb{R}_2$ is the real line equipped with the euclidean metric $d_2$. The metric of $X$ is defined by

The topology $\tau_X$ induced by $d_X$ is naturally Hausdorff and locally compact by considering the vertical segments. So what would happen to this weird locally compact Hausdorff space?

If $f \in C_c(X)$, let $x_1,x_2,\cdots,x_n$ be those values of $x$ for which $f(x,y) \neq 0$ for at least one $y$. Since $f$ has compact support, it is ensured that there are only finitely many $x_i$’s. We are able to define a positive linear functional by

where $\mu$ is the measure associated with $\Lambda$ in the sense of R-M-K theorem. Let

By squeezing the disjoint vertical segments around $(x_i,0)$, we see $\mu(K)=0$ for all compact $K \subset E$ but $\mu(E)=\infty$.

This is in sharp contrast to what we expect. However, if $X$ is required to be $\sigma$-compact (note that the space in this example is not), this kind of problem disappears neatly.

- Walter Rudin, *Real and Complex Analysis*
- Serge Lang, *Fundamentals of Differential Geometry*
- Joel W. Robbin, *Partition of Unity*
- Brian Conrad, *Paracompactness and local compactness*
- Raoul Bott & Loring W. Tu, *Differential Forms in Algebraic Topology*

We are finally going to prove the open mapping theorem for $F$-spaces. In this version, only metric and completeness are required; therefore it contains the Banach space version naturally.

(Theorem 0) Suppose we have the following conditions:

- $X$ is an $F$-space,
- $Y$ is a topological vector space,
- $\Lambda: X \to Y$ is continuous and linear, and
- $\Lambda(X)$ is of the second category in $Y$.
Then $\Lambda$ is an open mapping.

*Proof.* Let $B$ be a neighborhood of $0$ in $X$. Let $d$ be an invariant metric on $X$ that is compatible with the $F$-topology of $X$. Define a sequence of balls by

where $r$ is picked in such a way that $B_0 \subset B$. To show that $\Lambda$ is an open mapping, we need to prove that there exists some neighborhood $W$ of $0$ in $Y$ such that

To do this however, we need an auxiliary set. In fact, we will show that there exists some $W$ such that

We need to prove the inclusions one by one.

The first inclusion requires BCT. Since $B_2 -B_2 \subset B_1$, and $Y$ is a topological vector space, we get

Since

according to BCT, at least one $k\Lambda(B_2)$ is of the second category in $Y$. But scalar multiplication $y\mapsto ky$ is a homeomorphism of $Y$ onto $Y$, we see $k\Lambda(B_2)$ is of the second category for all $k$, especially for $k=1$. Therefore $\overline{\Lambda(B_2)}$ has nonempty interior, which implies that there exists some open neighborhood $W$ of $0$ in $Y$ such that $W \subset \overline{\Lambda(B_1)}$. By replacing the index, it’s easy to see this holds for all $n$. That is, for $n \geq 1$, there exists some neighborhood $W_n$ of $0$ in $Y$ such that $W_n \subset \overline{\Lambda(B_n)}$.

The second inclusion requires the completeness of $X$. Fix $y_1 \in \overline{\Lambda(B_1)}$, we will show that $y_1 \in \Lambda(B)$. Pick $y_n$ inductively. Assume $y_n$ has been chosen in $\overline{\Lambda(B_n)}$. As stated before, there exists some neighborhood $W_{n+1}$ of $0$ in $Y$ such that $W_{n+1} \subset \overline{\Lambda(B_{n+1})}$. Hence

Therefore there exists some $x_n \in B_n$ such that

Put $y_{n+1}=y_n-\Lambda x_n$, we see $y_{n+1} \in W_{n+1} \subset \overline{\Lambda(B_{n+1})}$. Therefore we are able to pick $y_n$ naturally for all $n \geq 1$.

Since $d(x_n,0)<\frac{r}{2^n}$ for all $n \geq 0$, the partial sums $z_n=\sum_{k=1}^{n}x_k$ form a Cauchy sequence, which converges to some $z \in X$ since $X$ is an $F$-space. Notice we also have

we have $z \in B_0 \subset B$.

By the continuity of $\Lambda$, we see $\lim_{n \to \infty}y_n = 0$. Notice we also have

we see $y_1 = \Lambda z \in \Lambda(B)$.

The whole theorem is now proved, that is, $\Lambda$ is an open mapping. $\square$

You may think the following relation comes from nowhere:

But it's not. We need to review some point-set topology definitions. Notice that $y_n$ is a limit point of $\Lambda(B_n)$, and $y_n-W_{n+1}$ is an open neighborhood of $y_n$. If $(y_n - W_{n+1}) \cap \Lambda(B_{n})$ were empty, then $y_n$ could not be a limit point.

The geometric series by

is widely used when sums are taken into account. It is a good idea to keep this technique in mind.

The formal proofs will not be put down here, but they are quite easy to carry out.

(Corollary 0) $\Lambda(X)=Y$.

This is an immediate consequence of the fact that $\Lambda$ is open: since $X$ is open in itself, $\Lambda(X)$ is an open subspace of $Y$. But the only open subspace of $Y$ is $Y$ itself.

(Corollary 1) $Y$ is an $F$-space as well.

If you have already seen the commutative diagram for the quotient space (put $N=\ker\Lambda$), you know that the induced map $f$ is open and continuous. Treating these topological vector spaces as groups, by Corollary 0 and the first isomorphism theorem, we have

Therefore $f$ is an isomorphism, hence one-to-one; therefore $f$ is a homeomorphism as well. In this post we showed that $X/\ker{\Lambda}$ is an $F$-space, therefore $Y$ has to be an $F$-space as well. (We are using the fact that $\ker{\Lambda}$ is a closed set. But why is it closed?)

(Corollary 2) If $\Lambda$ is a continuous linear mapping of an $F$-space $X$ onto an $F$-space $Y$, then $\Lambda$ is open.

This is a direct application of BCT and open mapping theorem. Notice that $Y$ is now of the second category.

(Corollary 3) If the linear map $\Lambda$ in Corollary 2 is injective, then $\Lambda^{-1}:Y \to X$ is continuous.

This comes from corollary 2 directly since $\Lambda$ is open.

(Corollary 4) If $X$ and $Y$ are Banach spaces, and if $\Lambda: X \to Y$ is a continuous linear bijective map, then there exist positive real numbers $a$ and $b$ such that $a\lVert x \rVert \leq \lVert \Lambda{x} \rVert \leq b\lVert x \rVert$ for every $x \in X$.

This comes from corollary 3 directly since both $\Lambda$ and $\Lambda^{-1}$ are bounded as they are continuous.
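In finite dimensions this is easy to see concretely: for an invertible $\Lambda$ one may take $b=\lVert\Lambda\rVert$ and $a=1/\lVert\Lambda^{-1}\rVert$. A sketch with the diagonal map $\Lambda(x,y)=(2x,\,y/2)$ on $\mathbb{R}^2$, so $a=\frac{1}{2}$ and $b=2$ (helper names are mine):

```python
import math
import random

# Corollary 4 for the diagonal map Λ(x, y) = (2x, y/2): here ||Λ|| = 2 and
# ||Λ^{-1}|| = 2, so a = 1/2 and b = 2 work in a·||x|| <= ||Λx|| <= b·||x||.
def norm(v):
    return math.hypot(v[0], v[1])

def Lam(v):
    return (2.0 * v[0], 0.5 * v[1])

random.seed(0)
for _ in range(1000):
    v = (random.uniform(-1, 1), random.uniform(-1, 1))
    assert 0.5 * norm(v) <= norm(Lam(v)) <= 2.0 * norm(v)
print("a·||x|| <= ||Λx|| <= b·||x|| holds on all samples")
```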

(Corollary 5) If $\tau_1 \subset \tau_2$ are vector topologies on a vector space $X$ and if both $(X,\tau_1)$ and $(X,\tau_2)$ are $F$-spaces, then $\tau_1 = \tau_2$.

This is obtained by applying corollary 3 to the identity mapping $\iota:(X,\tau_2) \to (X,\tau_1)$.

(Corollary 6) If $\lVert \cdot \rVert_1$ and $\lVert \cdot \rVert_2$ are two norms on a vector space $X$ such that

- $\lVert\cdot\rVert_1 \leq K\lVert\cdot\rVert_2$ for some $K>0$, and
- both $(X,\lVert\cdot\rVert_1)$ and $(X,\lVert\cdot\rVert_2)$ are Banach spaces,

then $\lVert\cdot\rVert_1$ and $\lVert\cdot\rVert_2$ are equivalent.

This is merely a more restrictive version of corollary 5.
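In the finite-dimensional setting the conclusion of corollary 6 always holds: all norms on $\mathbb{R}^n$ are equivalent. Here is a small numerical sanity check of my own (a sketch, not a proof): on $\mathbb{R}^3$ the norms $\lVert\cdot\rVert_1$ and $\lVert\cdot\rVert_2$ satisfy $\lVert x\rVert_2 \leq \lVert x\rVert_1 \leq \sqrt{3}\,\lVert x\rVert_2$, so the pair $(a,b)=(1,\sqrt{3})$ works.

```python
import math
import random

def norm1(x):
    return sum(abs(t) for t in x)

def norm2(x):
    return math.sqrt(sum(t * t for t in x))

# Sample random vectors and record the ratio ||x||_1 / ||x||_2.
random.seed(0)
ratios = []
for _ in range(10000):
    x = [random.uniform(-1, 1) for _ in range(3)]
    if norm2(x) > 0:
        ratios.append(norm1(x) / norm2(x))

# On R^3 the exact equivalence constants are a = 1 and b = sqrt(3).
assert 1.0 - 1e-12 <= min(ratios) and max(ratios) <= math.sqrt(3) + 1e-9
```

The sampling only illustrates the bounds; the actual proof uses compactness of the unit sphere (or, as in this series, the Baire category theorem).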

Since there is no strong reason to write more posts on this topic, i.e. the three fundamental theorems of linear functional analysis, I think it’s time to make a list of the series. It’s been around half a year.

- The Big Three Pt. 1 - Baire Category Theorem Explained
- The Big Three Pt. 2 - The Banach-Steinhaus Theorem
- The Big Three Pt. 3 - The Open Mapping Theorem (Banach Space)
- The Big Three Pt. 4 - The Open Mapping Theorem (F-Space)
- The Big Three Pt. 5 - The Hahn-Banach Theorem (Dominated Extension)
- The Big Three Pt. 6 - Closed Graph Theorem with Applications

We are going to show the completeness of $X/N$, where $X$ is a TVS and $N$ a closed subspace. Along the way, a bunch of useful analysis tricks will be demonstrated (which is why you may find this blog post a little tedious). More importantly, the theorem proved here will be used in the future.

To make it clear, we should give a formal definition of $F$-space.

A topological vector space $X$ is an $F$-space if its topology $\tau$ is induced by a complete invariant metric $d$.

A metric $d$ on a vector space $X$ is called invariant if for all $x,y,z \in X$, we have

$$d(x+z,y+z)=d(x,y).$$
By complete we mean every Cauchy sequence of $(X,d)$ converges.

The metric can be inherited by the quotient space naturally (we will use this fact later); that is,

If $X$ is an $F$-space and $N$ is a closed subspace of $X$, then $X/N$ is still an $F$-space.

Suppose $d$ is a complete invariant metric compatible with $\tau_X$. The metric on $X/N$ is defined by

$$\rho(\pi(x),\pi(y))=\inf_{z \in N}d(x-y,z).$$
*Proof.* First, if $\pi(x)=\pi(y)$, that is, $x-y \in N$, we see that $\rho(\pi(x),\pi(y))=0$, since taking $z=x-y \in N$ gives $d(x-y,z)=0$.

If $\pi(x) \neq \pi(y)$ however, we shall show that $\rho(\pi(x),\pi(y))>0$. In this case, we have $x-y \notin N$. Since $N$ is closed, $N^c$ is open, and $x-y$ is an interior point of $X-N$. Therefore there exists an open ball $B_r(x-y)$ centered at $x-y$ with radius $r>0$ such that $B_r(x-y) \cap N = \varnothing$. Notice that $d(x-y,z) \geq r$ for all $z \in N$, since otherwise $z \in B_r(x-y)$. By putting

we see that $d(x-y,z) \geq r_0$ for all $z \in N$, and indeed $r_0=\inf_{z \in N}d(x-y,z)>0$ (the verification can be done by contradiction). In general, $\inf_{z \in N} d(x-y,z)=0$ if and only if $x-y \in \overline{N}$.
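To make the infimum $\inf_{z \in N}d(x-y,z)$ concrete, here is a toy numeric sketch of my own (not part of the proof): take $X=\mathbb{R}^2$ with the Euclidean metric, which is invariant, and $N$ the $x$-axis. Then $\rho(\pi(x),\pi(y))$ is simply the vertical distance between the two horizontal cosets.

```python
import math

def d(u, v):  # Euclidean metric on R^2, which is translation invariant
    return math.hypot(u[0] - v[0], u[1] - v[1])

def rho(x, y):
    """Approximate inf over z in N (the x-axis) of d(x - y, z) on a fine grid."""
    w = (x[0] - y[0], x[1] - y[1])
    zs = [(t / 100.0, 0.0) for t in range(-1000, 1001)]
    return min(d(w, z) for z in zs)

x, y = (2.0, 3.0), (5.0, 1.0)
# The cosets are the horizontal lines y = 3 and y = 1; the exact distance is 2.
assert abs(rho(x, y) - 2.0) < 1e-9
```

The grid search stands in for the infimum; in this example the exact minimizer $z=(-3,0)$ lies on the grid, so the result is exact.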

Next, we shall show that $\rho(\pi(x),\pi(y))=\rho(\pi(y),\pi(x))$; it suffices to assume that $\pi(x) \neq \pi(y)$. Since $d$ is translation invariant, we get

Therefore the $\inf$ of the left-hand side equals that of the right-hand side, and the identity is proved.

Finally, we need to verify the triangle inequality. Let $r,s,t \in X$. For any $\varepsilon>0$, there exist some $z_\varepsilon$ and $z'_\varepsilon$ such that

Since $d$ is invariant, we see

*(I owe @LeechLattice for the inequality above.)*

Therefore

*(Warning: this does not imply that $\rho(\pi(r),\pi(s))+\rho(\pi(s),\pi(t))=\inf_z d(r-t,z)$, since we don't know whether the left-hand side is the greatest lower bound.)*

If $\rho(\pi(r),\pi(s))+\rho(\pi(s),\pi(t))<\rho(\pi(r),\pi(t))$ however, let

then there exists some $z''_\varepsilon=z_\varepsilon+z'_\varepsilon$ such that

which is a contradiction since $\rho(\pi(r),\pi(t)) \leq d(r-t,z)$ for all $z \in N$.

*(We are using the $\varepsilon$ definition of $\inf$. See here.)*

Since $\pi$ is surjective, for every $u \in X/N$ there exists some $a \in X$ such that $\pi(a)=u$. Therefore

If $\pi(x)=\pi(x')$ and $\pi(y)=\pi(y')$, we have to show that $\rho(\pi(x),\pi(y))=\rho(\pi(x'),\pi(y'))$. In fact,

since $\rho(\pi(x),\pi(x'))=0$ as $\pi(x)=\pi(x')$. Meanwhile

therefore $\rho(\pi(x),\pi(y))=\rho(\pi(x'),\pi(y'))$.

To prove this, we need to show that a set $E \subset X/N$ is open with respect to $\tau_N$ if and only if $E$ is a union of open balls. In fact, we shall show a more general version:

If $\mathscr{B}$ is a local base for $\tau$, then the collection $\mathscr{B}_N$, which contains all sets $\pi(V)$ where $V \in \mathscr{B}$, forms a local base for $\tau_N$.

*Proof.* We already know that $\pi$ is continuous, linear, and open. Therefore $\pi(V)$ is open for all $V \in \mathscr{B}$. For any open set $E \subset X/N$ containing $\pi(0)$, we see that $\pi^{-1}(E)$ is open, and we have

and therefore

Now consider the local base $\mathscr{B}$ containing all open balls around $0 \in X$. Since

we see $\rho$ determines $\mathscr{B}_N$. But we have already proved that $\rho$ is invariant; hence $\mathscr{B}_N$ determines $\tau_N$.

Once this is proved, we are able to claim that if $X$ is an $F$-space, then $X/N$ is still an $F$-space, since its topology is induced by a complete invariant metric $\rho$.

*Proof.* Suppose $(x_n)$ is a Cauchy sequence in $X/N$, relative to $\rho$. There is a subsequence $(x_{n_k})$ with $\rho(x_{n_k},x_{n_{k+1}})<2^{-k}$. Since $\pi$ is surjective, we are able to pick some $z_k \in X$ such that $\pi(z_k) = x_{n_k}$ and such that

(The existence can be verified by contradiction still.) By the inequality above, we see $(z_k)$ is Cauchy (can you see why?). Since $X$ is complete, $z_k \to z$ for some $z \in X$. By the **continuity** of $\pi$, we also see $x_{n_k} \to \pi(z)$ as $k \to \infty$. Therefore $(x_{n_k})$ converges. Hence $(x_n)$ converges since it has a convergent subsequence. $\rho$ is complete.

This fact will be used to prove some corollaries of the open mapping theorem. For instance, for any continuous linear map $\Lambda:X \to Y$, the kernel $\ker(\Lambda)$ is closed; therefore if $X$ is an $F$-space, then $X/\ker(\Lambda)$ is an $F$-space as well. We will show in the future that $X/\ker(\Lambda)$ and $\Lambda(X)$ are homeomorphic if $\Lambda(X)$ is of the second category.

There are more properties that can be inherited by $X/N$ from $X$: for example normability, metrizability, and local convexity. In particular, if $X$ is Banach, then $X/N$ is Banach as well. To see this, it suffices to define the quotient norm by

$$\lVert \pi(x) \rVert = \inf_{z \in N}\lVert x-z \rVert.$$
]]>Before going into it, we give several motivations for defining the Riemann-Stieltjes integral, which can be considered a generalization of the Riemann integral, the one everyone learns in their calculus class.

When talking about $\int_a^b fdg$, one may simply think of $\int_a^b fg'dx$. But is it even necessary that $g$ be differentiable? What would happen if $g$ is merely continuous, or even discontinuous? Further, given that $g$ is differentiable, can we prove that

$$\int_a^b f\,dg=\int_a^b fg'\,dx$$

in a general way (without assuming that $f$ is differentiable)?

Another motivation comes from probability theory. Oftentimes one needs to consider the discrete case ($\sum$) and the continuous case ($\int$) separately. One may say that an integral is a limit of sums, but it would be awkward to write $\int$ as $\lim\sum$ every time. However, if we have a way to write a sum, for example the expected value of a discrete random variable, as an integral, things become easier. Of course, we don't want to write such a sum as another sum by adding up the integrals over several disjoint segments. That would be even more awkward.

If you have learned measure theory, you will know that the Lebesgue integral does not completely subsume the Riemann integral. For example, $\int_{0}^{\infty}\frac{\sin{x}}{x}dx$ exists as an improper Riemann integral but not as a Lebesgue integral. So we cannot treat the Lebesgue integral as a full generalization of the Riemann integral. In this blog post, however, we present a direct generalization of the Riemann integral.

We are trying our best to avoid using $\sup$, $\inf$, and differentiation theory. But the $\varepsilon$-$\delta$ language is heavily used here, so make sure that you are comfortable with it.

By a partition $P$ of $[a,b]$ we mean a finite sequence of numbers $(x_k)$ such that

$$a=x_0<x_1<\cdots<x_n=b,$$

and we define its size by

$$\sigma(P)=\max_{0 \leq k \leq n-1}(x_{k+1}-x_k).$$
Let $f$, $g$ be bounded real functions on $[a,b]$ (again, no continuity or differentiability required). Given a partition $P$ and numbers $c_k$ with $x_k \leq c_k \leq x_{k+1}$, we define the Riemann-Stieltjes sum (RS-sum) by

$$S(P,f,g)=\sum_{k=0}^{n-1}f(c_k)\bigl(g(x_{k+1})-g(x_k)\bigr).$$
We say that the **limit**

$$\lim_{\sigma(P) \to 0}S(P,f,g)$$

exists if there exists some $L \in \mathbb{R}$ such that, given $\varepsilon>0$, there exists $\delta>0$ such that whenever $\sigma(P)<\delta$ (for any admissible choice of the $c_k$), we have

$$|S(P,f,g)-L|<\varepsilon.$$
In this case, we say $f$ is RS(g)-integrable, and the limit is denoted by

This is the so-called **Riemann-Stieltjes** integral. When $g(x)=x$, we naturally recover the **Riemann integral**.
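The definition unwinds into a few lines of code. Below is a minimal sketch (the helper `rs_sum` and the midpoint choice of the $c_k$ are my own, not from the text): with $g(x)=x$ the RS-sum is a plain Riemann sum, so for $f(x)=x^2$ on $[0,1]$ it should approach $1/3$ as $\sigma(P) \to 0$.

```python
def rs_sum(f, g, xs, cs):
    """Riemann-Stieltjes sum: sum of f(c_k) * (g(x_{k+1}) - g(x_k))."""
    return sum(f(c) * (g(b) - g(a)) for a, b, c in zip(xs, xs[1:], cs))

n = 10000
xs = [k / n for k in range(n + 1)]              # uniform partition of [0, 1]
cs = [(a + b) / 2 for a, b in zip(xs, xs[1:])]  # midpoints as the c_k

# With g(x) = x this is an ordinary Riemann sum of f(x) = x^2.
approx = rs_sum(lambda x: x * x, lambda x: x, xs, cs)
assert abs(approx - 1 / 3) < 1e-6
```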

This integration method can be generalized to Banach spaces. Let $f$, $g$ be bounded maps of $[a,b]$ into Banach spaces $E$, $F$ respectively. Assume we have a product $E \times F \to G$, denoted by $(u,v) \mapsto uv$, with $\lVert uv \rVert \leq \lVert u \rVert \lVert v \rVert$. Then by replacing the absolute value with the norm, we still get the Riemann-Stieltjes integral, although in this case we have

and $G$ need not be $\mathbb{R}$. This is different from the Bochner integral, since no measure theory is involved here.

First, we shall show that RS(g)-integrable functions form a vector space. To do this, it suffices to show that

and

are linear. This follows directly from the definition of RS-sum. Let’s see the result.

Suppose we have

Then we have the following identities for any scalar $\alpha$.

- $\int_a^b \alpha fdg=\alpha I$.
- $\int_a^b (f+h)dg=I+J$.
- $\int_a^bfd(g+u)=I+K$.
- $\int_a^b fd(\alpha g)=\alpha I$.

*Proof.* We shall prove identity 2 as an example. The other three identities follow in the same way.

Notice that the existence of the limit of an RS-sum depends only on the size of $P$. For $\varepsilon>0$, there exist some $\delta_1,\delta_2>0$ such that

when $\sigma(P)<\delta_1$ and $\sigma(P)<\delta_2$ respectively. By picking $\delta=\min(\delta_1,\delta_2)$, we see that for $\sigma(P)<\delta$, we have

$f \in RS(g)$ if and only if $g \in RS(f)$. In this case, we also have integration by parts:

$$\int_a^b f\,dg+\int_a^b g\,df=f(b)g(b)-f(a)g(a).$$
You may not believe it, but differentiation does not play any role here, as promised at the beginning.

*Proof.* Using Abel's summation by parts, we have

By writing

we have

where

Consider the partition $Q$ by

we take $x_0,x_1,\cdots,x_{n-1},x_n$ as intermediate points, and

Since $0 < \sigma(Q) \leq 2\sigma(P) \leq 4\sigma(Q)$, when $\sigma(P) \to 0$ we also have $\sigma(Q) \to 0$, and vice versa. Suppose now that $\int_a^b gdf$ exists; we have

And integration by parts follows.

Suppose $\int_a^bfdg$ exists, then

The proposition is proved. $\square$
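Integration by parts can be checked numerically with RS-sums. This is a sketch under my own choices (right endpoints as the $c_k$, smooth $f,g$): for $f(x)=x^2$ and $g(x)=x^3$ on $[0,1]$, the two RS integrals should sum to $f(1)g(1)-f(0)g(0)=1$.

```python
def rs_sum(f, g, xs, cs):
    return sum(f(c) * (g(b) - g(a)) for a, b, c in zip(xs, xs[1:], cs))

f = lambda x: x * x
g = lambda x: x ** 3

n = 20000
xs = [k / n for k in range(n + 1)]
cs = xs[1:]  # right endpoints as the intermediate points c_k

# Integration by parts: int f dg + int g df = f(b)g(b) - f(a)g(a) = 1 on [0, 1].
lhs = rs_sum(f, g, xs, cs) + rs_sum(g, f, xs, cs)
assert abs(lhs - 1.0) < 1e-3
```

Here $\int_0^1 f\,dg = \int_0^1 3x^4\,dx = 3/5$ and $\int_0^1 g\,df = \int_0^1 2x^4\,dx = 2/5$, so the two pieces can be checked separately as well.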

As said before, we want to represent both the continuous case and the discrete case using integrals. In measure theory, we have the Lebesgue measure and the counting measure. But in some cases this can be done using the Riemann-Stieltjes integral as well: the ordinary Riemann integral and finite or infinite series are both special cases of the Riemann-Stieltjes integral.

To do this, we need the unit step function defined by

$$I(x)=\begin{cases}0, & x \leq 0, \\ 1, & x>0.\end{cases}$$
If $a<s<b$, $f$ is bounded on $[a,b]$ and continuous at $s$, then by putting $g(x)=I(x-s)$, we have

$$\int_a^b f\,dg=f(s).$$
*Proof.* A simple verification shows that $\int_a^b fdg=\int_s^b fdg$ (by unwinding the RS-sum, one sees immediately that $g(x_k)=0$ for all $x_k\leq s$, so the part of the partition before $s$ contributes nothing to the value of the integral). Now consider the partition $P$ given by

We see

As $x_1 \to s$, we have $c_0 \to s$, and since $f$ is continuous at $s$, we have $f(c_0) \to f(s)$ as desired. $\square$
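Numerically, the RS-sum against $g(x)=I(x-s)$ collapses to the single term $f(c_k)$ coming from the interval containing $s$, which tends to $f(s)$. A small sketch (the helper functions and the concrete choices $f=\cos$, $s=0.5$ are my own):

```python
import math

def I(x):
    """Unit step: 0 for x <= 0, 1 for x > 0."""
    return 1.0 if x > 0 else 0.0

def rs_sum(f, g, xs, cs):
    return sum(f(c) * (g(b) - g(a)) for a, b, c in zip(xs, xs[1:], cs))

s = 0.5
f = math.cos
g = lambda x: I(x - s)

n = 10001  # odd, so s = 0.5 is not a partition point
xs = [k / n for k in range(n + 1)]
cs = [(a + b) / 2 for a, b in zip(xs, xs[1:])]

# Only the interval containing s contributes, giving f(c_k) with c_k -> s.
approx = rs_sum(f, g, xs, cs)
assert abs(approx - math.cos(s)) < 1e-3
```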

By the linearity of the RS integral, it's easy to generalize this to finite linear combinations. Namely, for $g(x)=\sum_{k=1}^{n}c_kI(x-s_k)$, we have

$$\int_a^b f\,dg=\sum_{k=1}^{n}c_kf(s_k).$$
But now we are discussing the infinite case.

Suppose $c_n \geq 0$ for all $n$, $\sum_n c_n$ converges, $(s_n)$ is a sequence of distinct points in $(a,b)$, and

$$g(x)=\sum_{n=1}^{\infty}c_nI(x-s_n).$$
Let $f$ be continuous on $[a,b]$. Then

$$\int_a^b f\,dg=\sum_{n=1}^{\infty}c_nf(s_n).$$
*Proof.* First, it's easy to see that $g(x)$ converges for every $x$ and is monotonic, with $g(a)=0$ and $g(b)=\sum_n c_n$. Given $\varepsilon>0$, there exists some $N$ such that

Put

we have

By putting $M=\sup|f(x)|$, we see

The inequality holds since $g_2(b)-g_2(a)<\varepsilon$. Since $M$ is finite, letting $N \to \infty$ gives the desired result.
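Truncating the series gives a finite numerical check of this result. This is a sketch with hypothetical data of my own: $c_n=2^{-n}$ and ten distinct jump points $s_n$ in $(0,1)$.

```python
import math

def I(x):
    return 1.0 if x > 0 else 0.0

def rs_sum(f, g, xs, cs):
    return sum(f(c) * (g(b) - g(a)) for a, b, c in zip(xs, xs[1:], cs))

f = math.sin
c = [2.0 ** (-k) for k in range(1, 11)]        # c_n = 2^{-n}: nonnegative, summable
s = [(2 * k - 1) / 21 for k in range(1, 11)]   # ten distinct points in (0, 1)
g = lambda x: sum(cn * I(x - sn) for cn, sn in zip(c, s))

n = 40001
xs = [k / n for k in range(n + 1)]
cs = [(a + b) / 2 for a, b in zip(xs, xs[1:])]

# The RS integral should agree with the weighted sum of the values f(s_n).
expected = sum(cn * math.sin(sn) for cn, sn in zip(c, s))
assert abs(rs_sum(f, g, xs, cs) - expected) < 1e-3
```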

Finally, we discuss some differentiation. The following theorem shows the connection between the RS integral and the Riemann integral.

Let $f$ be continuous, and suppose that $g$ is differentiable on $[a,b]$ with $g'$ Riemann integrable. Then $f \in RS(g)$ and

$$\int_a^b f\,dg=\int_a^b f(x)g'(x)\,dx.$$
*Proof.* By the mean value theorem, for each $k$ we have

$$g(x_{k+1})-g(x_k)=g'(t_k)(x_{k+1}-x_k)$$

for some $t_k \in (x_k,x_{k+1})$.
The RS-sum can be written as

Since $g’$ is Riemann integrable, we have

given that $|S(P,g’,x)-\int_a^b g’dx|<\varepsilon$. Therefore

where $M=\sup|f(x)|<\infty$ ($f$ is continuous on a compact interval, hence bounded). Also notice that $fg'$ is Riemann integrable, since $f$ is continuous and $g'$ is Riemann integrable. Therefore

Therefore,

which proves the theorem. $\square$

To sum up, given $\varepsilon>0$, there exists some $\delta>0$ such that if $\sigma(P)<\delta$, we have

and

After some estimation, we get
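As a numeric cross-check of the theorem $\int_a^b f\,dg=\int_a^b fg'\,dx$ (a sketch; the helper `rs_sum` and the midpoint choice of the $c_k$ are my own assumptions), take $f(x)=\cos x$ and $g(x)=\sin x$ on $[0,1]$, so that both sides equal $\int_0^1 \cos^2 x\,dx = \frac12+\frac{\sin 2}{4}$:

```python
import math

def rs_sum(f, g, xs, cs):
    return sum(f(c) * (g(b) - g(a)) for a, b, c in zip(xs, xs[1:], cs))

f, g, gp = math.cos, math.sin, math.cos  # g(x) = sin(x), so g'(x) = cos(x)

n = 20000
xs = [k / n for k in range(n + 1)]
cs = [(a + b) / 2 for a, b in zip(xs, xs[1:])]

lhs = rs_sum(f, g, xs, cs)                                 # int_0^1 cos x d(sin x)
rhs = rs_sum(lambda x: f(x) * gp(x), lambda x: x, xs, cs)  # int_0^1 cos^2 x dx
exact = 0.5 + math.sin(2.0) / 4.0
assert abs(lhs - exact) < 1e-6 and abs(rhs - exact) < 1e-6
```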

]]>We restrict ourselves to $\mathbb{R}$ endowed with the usual topology. Recall that a function is continuous if and only if for any open set $U \subset \mathbb{R}$, the preimage

$$f^{-1}(U)=\{x:f(x) \in U\}$$

is open. One can rewrite this statement in the $\varepsilon$-$\delta$ language. To say that a function $f: \mathbb{R} \to \mathbb{R}$ is continuous at $x$, we mean that for any $\varepsilon>0$, there exists some $\delta>0$ such that for $t \in (x-\delta,x+\delta)$, we have

$$|f(t)-f(x)|<\varepsilon.$$
$f$ is continuous on $\mathbb{R}$ if and only if $f$ is continuous at every point of $\mathbb{R}$.

If $(x-\delta,x+\delta)$ is replaced with $(x-\delta,x)$ or $(x,x+\delta)$, we get left continuity and right continuity respectively, one of which (right continuity) plays an important role in probability theory.

But the problem is that sometimes continuity is too strong a restriction, while the 'direction' attached to left/right continuity is unnecessary as well. For example, the function

is neither left nor right continuous (globally), but it is still a perfectly legitimate function. Left/right continuity is not an ideal weakening of continuity. We need something different.

Let $f$ be a real (or extended-real) function on $\mathbb{R}$. The semicontinuity of $f$ is defined as follows.

If

$$\{x:f(x)>\alpha\}$$

is open for all real $\alpha$, we say $f$ is **lower semicontinuous**.

If

$$\{x:f(x)<\alpha\}$$

is open for all real $\alpha$, we say $f$ is **upper semicontinuous**.

Is it possible to rewrite these definitions à la $\varepsilon$-$\delta$? The answer is yes, if we restrict ourselves to metric spaces.

$f: \mathbb{R} \to \mathbb{R}$ is upper semicontinuous at $x$ if for every $\varepsilon>0$, there exists some $\delta>0$ such that for $t \in (x-\delta,x+\delta)$, we have

$$f(t)<f(x)+\varepsilon.$$
$f: \mathbb{R} \to \mathbb{R}$ is lower semicontinuous at $x$ if for every $\varepsilon>0$, there exists some $\delta>0$ such that for $t \in (x-\delta,x+\delta)$, we have

$$f(t)>f(x)-\varepsilon.$$
Of course, $f$ is upper/lower semicontinuous on $\mathbb{R}$ if and only if it is so at every point of $\mathbb{R}$. One finds no difference between the definitions in the two styles.
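These pointwise criteria are easy to test numerically. The sketch below (the helper `usc_at` is my own: a grid search over a few candidate $\delta$, so only an illustration, not a proof) checks that the floor function is upper semicontinuous, including at the integers where it fails to be continuous:

```python
import math

def usc_at(f, x, eps=1e-6, deltas=(0.1, 0.01, 0.001)):
    """Numerically check upper semicontinuity at x: some delta should give
    f(t) < f(x) + eps for all sampled t in (x - delta, x + delta)."""
    for d in deltas:
        ts = [x - d + 2 * d * k / 1000 for k in range(1, 1000)]
        if all(f(t) < f(x) + eps for t in ts):
            return True
    return False

# floor is upper semicontinuous everywhere...
assert usc_at(math.floor, 1.0)
assert usc_at(math.floor, 0.3)
# ...but not lower semicontinuous at the integers: just below 1, floor drops to 0.
assert math.floor(1.0 - 1e-9) == 0
```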

Here is another way to see it. For the continuity of $f$, we consider *arbitrary* open subsets $V$ of $\mathbb{R}$ and require $f^{-1}(V)$ to be open. For the lower/upper semicontinuity of $f$, however, the open sets are restricted to those of the form $(\alpha,+\infty]$ and $[-\infty,\alpha)$. Since all open sets of $\mathbb{R}$ can be generated by unions and intersections of sets of the form $[-\infty,\alpha)$ and $(\beta,+\infty]$, we immediately get

$f$ is continuous if and only if $f$ is both upper semicontinuous and lower semicontinuous.

*Proof.* If $f$ is continuous, then for any $\alpha \in \mathbb{R}$, we see $[-\infty,\alpha)$ is open, and therefore

has to be open. The upper semicontinuity is proved. The lower semicontinuity of $f$ is proved in the same manner.

If $f$ is both upper and lower semicontinuous, we see

is open. Since every open subset of $\mathbb{R}$ can be written as a countable union of segments of the above types, we see for any open subset $V$ of $\mathbb{R}$, $f^{-1}(V)$ is open. (If you have trouble with this part, it is recommended to review the definition of topology.) $\square$

There are two important examples.

- If $E \subset \mathbb{R}$ is open, then $\chi_E$ is lower semicontinuous.
- If $F \subset \mathbb{R}$ is closed, then $\chi_F$ is upper semicontinuous.

We will prove the first one. The second one follows in the same manner of course. For $\alpha<0$, the set $A=\chi_E^{-1}((\alpha,+\infty])$ is equal to $\mathbb{R}$, which is open. For $\alpha \geq 1$, since $\chi_E \leq 1$, we see $A=\varnothing$. For $0 \leq \alpha < 1$ however, the set of $x$ where $\chi_E>\alpha$ has to be $E$, which is still open.

When checking the semicontinuity of a function, we sweep $\alpha$ from bottom to top or from top to bottom. Recall that the characteristic function $\chi_E$ is defined by

$$\chi_E(x)=\begin{cases}1, & x \in E, \\ 0, & x \notin E.\end{cases}$$
If $f_1$ and $f_2$ are upper/lower semicontinuous, then so is $f_1+f_2$.

*Proof.* We are going to prove this using two different tools. Suppose first that both $f_1$ and $f_2$ are upper semicontinuous. For $\varepsilon>0$, there exist some $\delta_1>0$ and $\delta_2>0$ such that

If we pick $\delta=\min(\delta_1,\delta_2)$, then we see that for all $t \in (x-\delta,x+\delta)$, we have

The upper semicontinuity of $f_1+f_2$ is proved by considering all $x \in \mathbb{R}$.

Now suppose both $f_1$ and $f_2$ are lower semicontinuous. We have the identity

$$\{x:(f_1+f_2)(x)>\alpha\}=\bigcup_{\beta \in \mathbb{Q}}\Bigl(\{x:f_1(x)>\beta\} \cap \{x:f_2(x)>\alpha-\beta\}\Bigr).$$
The set on the right side is always open. Hence $f_1+f_2$ is lower semicontinuous. $\square$

However, when infinitely many semicontinuous functions are involved, things are different.

Let $\{f_n\}$ be a sequence of nonnegative functions on $\mathbb{R}$, then

- If each $f_n$ is lower semicontinuous, then so is $\sum_{1}^{\infty}f_n$.
- If each $f_n$ is upper semicontinuous, then $\sum_{1}^{\infty}f_n$ is not necessarily upper semicontinuous.

*Proof.* To prove this we still use the properties of open sets. Put $g_n=\sum_{k=1}^{n}f_k$. Now suppose all the $f_k$ are lower semicontinuous. Since $g_n$ is a finite sum of lower semicontinuous functions, each $g_n$ is lower semicontinuous. Let $f=\sum_{n}f_n$. As the $f_k$ are non-negative, we see that $f(x)>\alpha$ if and only if there exists some $n_0$ such that $g_{n_0}(x)>\alpha$. Therefore

$$\{x:f(x)>\alpha\}=\bigcup_{n \geq 1}\{x:g_n(x)>\alpha\},$$

and the set on the right-hand side is open.

For the upper semicontinuity, it suffices to give a counterexample, but before that, we shall explain the motivation.

As said, the characteristic function of a closed set is upper semicontinuous. Suppose $\{E_n\}$ is a sequence of almost disjoint closed sets; then $E=\cup_{n\geq 1}E_n$ is not necessarily closed, so $\chi_E=\sum_n\chi_{E_n}$ (a.e.) is not necessarily upper semicontinuous. Now we give a concrete example. Put $f_0=\chi_{[1,+\infty]}$ and $f_n=\chi_{E_n}$ for $n \geq 1$, where

For $x > 0$, we have $f=\sum_nf_n \geq 1$. Meanwhile, $f^{-1}([-\infty,1))=[-\infty,0]$, which is not open. $\square$

Notice that $f$ can be defined on any topological space here.

There is one fact we already know about continuous functions.

If $X$ is compact and $f: X \to \mathbb{R}$ is continuous, then there exist some $a,b \in X$ such that $f(a)=\min f(X)$ and $f(b)=\max f(X)$.

In fact, $f(X)$ is still compact. For semicontinuous functions, things are different but still reasonable. For upper semicontinuous functions, we have the following fact.

If $X$ is compact and $f: X \to (-\infty,+\infty)$ is upper semicontinuous, then there exists some $a \in X$ such that $f(a)=\max f(X)$.

Notice that $X$ is not assumed to satisfy any other topological property. It may or may not be Hausdorff or Lindelöf; we are not asking for restrictions like these. The only property we use is that every open cover of $X$ has a finite subcover. Of course, one can replace $X$ with any compact subset of $\mathbb{R}$, for example $[a,b]$.

*Proof.* Put $\alpha=\sup f(X)$, and define

$$E_n=\{x \in X:f(x)<\alpha-\tfrac{1}{n}\}.$$

Each $E_n$ is open, since $f$ is upper semicontinuous. If $f$ attains no maximum, then for any $x \in X$ there exists some $n \geq 1$ such that $f(x)<\alpha-\frac{1}{n}$; that is, $x \in E_n$ for some $n$. Therefore $\bigcup_{n \geq 1}E_n$ covers $X$. But this cover has no finite subcover: the $E_n$ are increasing, and since $\alpha=\sup f(X)$, for every $n$ there is some $x \in X$ with $f(x)>\alpha-\frac{1}{n}$, so $E_n \neq X$. This contradicts the compactness of $X$. $\square$

This is a comprehensive application of several properties of semicontinuity.

(Vitali–Carathéodory theorem) Suppose $f \in L^1(\mathbb{R})$, where $f$ is a real-valued function. For $\varepsilon>0$, there exist functions $u$ and $v$ on $\mathbb{R}$ such that $u \leq f \leq v$, $u$ is upper semicontinuous and bounded above, $v$ is lower semicontinuous and bounded below, and

$$\int_{\mathbb{R}}(v-u)\,dm<\varepsilon.$$
It suffices to prove this theorem for $f \geq 0$ (and $f$ not identically equal to $0$, since that case is trivial). Since $f$ is the pointwise limit of an increasing sequence of simple functions $s_n$, we are able to write $f$ as

By putting $t_1=s_1$ and $t_n=s_n-s_{n-1}$ for $n \geq 2$, we get $f=\sum_n t_n$. We are able to write $f$ as

$$f=\sum_{k=1}^{\infty}c_k\chi_{E_k},$$
where $E_k$ is measurable for all $k$. Also we have

and the series on the right-hand side converges (since $f \in L^1$). By the regularity properties of the Lebesgue measure, there exist a compact set $F_k$ and an open set $V_k$ such that $F_k \subset E_k \subset V_k$ and $c_km(V_k-F_k)<\frac{\varepsilon}{2^{k+1}}$. Put

(now you can see that $v$ is lower semicontinuous and $u$ is upper semicontinuous). Here $N$ is chosen in such a way that

Since $V_k \supset E_k$, we have $\chi_{V_k} \geq \chi_{E_k}$. Therefore $v \geq f$. Similarly, $f \geq u$. Now we need to check the desired integral inequality. A simple recombination shows that

If we integrate the function above, we get

This proves the case $f \geq 0$. In the general case, write $f=f^{+}-f^{-}$. Attach semicontinuous functions to $f^{+}$ and $f^{-}$ respectively, with $u_1 \leq f^{+} \leq v_1$ and $u_2 \leq f^{-} \leq v_2$, and put $u=u_1-v_2$, $v=v_1-u_2$. Then $u$ is upper semicontinuous and $v$ is lower semicontinuous, and $u \leq f \leq v$ with the desired property, since

and the theorem follows. $\square$

Indeed, the only special property of the measure used here is the existence of the sets $F_k$ and $V_k$, that is, inner and outer regularity. The domain $\mathbb{R}$ can be replaced with $\mathbb{R}^k$ for $1 \leq k < \infty$, and $m$ with the corresponding measure $m_k$. Much more generally, the domain can be replaced by any locally compact Hausdorff space $X$, and the measure by any measure associated with the Riesz-Markov-Kakutani representation theorem on $C_c(X)$.

Can the roles of $u$ and $v$ be exchanged, so that $u$ is lower semicontinuous and $v$ upper semicontinuous? The answer is no. Consider the fat Cantor set $K$, which has Lebesgue measure $\frac{1}{2}$. We shall show that $\chi_K$ cannot be approximated from below by a lower semicontinuous function.

If $v$ is a lower semicontinuous function such that $v \leq \chi_K$, then $v \leq 0$.

*Proof.* Consider the set $V=v^{-1}((0,1])=v^{-1}((0,+\infty))$. Since $v \leq \chi_K$, we have $V \subset K$. We will show that $V$ has to be empty.

Pick $t \in V$. Since $V$ is open, there exists some neighborhood $U$ of $t$ with $U \subset V \subset K$. But $K$ has empty interior, so $U$ cannot be a nonempty open set, a contradiction. Therefore $V = \varnothing$; that is, $v(x) \leq 0$ for all $x$. $\square$

Suppose $u$ is any upper semicontinuous function such that $u \geq f$. For $\varepsilon=\frac{1}{2}$, we have

This example shows that there exist integrable functions that cannot be approximated in the reversed sense of the Vitali–Carathéodory theorem.

]]>Fix $p$ with $1 \leq p \leq \infty$. It’s easy to see that $L^p(\mu)$ is a topological vector space. But it is not a metric space if we define

The reason is that if $d(f,g)=0$, we can only conclude that $f=g$ a.e., not that they are literally equal. With that being said, this function $d$ is a pseudometric, which is unnatural. However, the relation $\sim$ defined by $f \sim g \Leftrightarrow d(f,g)=0$ is an equivalence relation. This inspires us to take the quotient set into consideration.

For a vector space $V$, every subspace is a normal subgroup: a vector space is an abelian group under addition, so any subgroup, in particular any subspace, is automatically normal. There is no reason to prevent ourselves from considering the quotient group and looking for some interesting properties.

Let $N$ be a subspace of a vector space $X$. For every $x \in X$, let $\pi(x)$ be the coset of $N$ that contains $x$, that is,

$$\pi(x)=x+N=\{x+n:n \in N\}.$$
Trivially, $\pi(x)=\pi(y)$ if and only if $x-y \in N$ (in particular, $\pi$ is well-defined since $N$ is a vector space). This map is linear once we define addition and scalar multiplication by

These cosets are the elements of a vector space $X/N$, which reads: the quotient space of $X$ modulo $N$. The map $\pi$ is called the canonical map.

First we treat $\mathbb{R}^2$ as a vector space and take the subspace $\mathbb{R}$, graphically represented by the $x$-axis, as the subspace (we will write it as $X$). For the vector $v=(2,3)$, represented by $AB$, the coset $v+X$ has something special. Pick any $u \in X$, for example $AE$, $AC$, or $AG$. We see that $v+u$ always has the same $y$ value. The reason is simple: $v+u=(2+x,3)$, where the $y$ value remains fixed however $u$ may vary.

With that being said, the set $v+X$, which is not a vector space, can be represented by $\overrightarrow{AD}$. This procedure generalizes to $\mathbb{R}^n$ with $\mathbb{R}^m$ as a subspace with ease.

We now consider a fancier example. Consider all rational Cauchy sequences, that is,

where $a_k\in\mathbb{Q}$ for all $k$. In analysis class we learned two facts.

- Any Cauchy sequence is bounded.
- If $(a_n)$ converges, then $(a_n)$ is Cauchy.

However, the converse of 2 does not hold in $\mathbb{Q}$. For example, if we put $a_k=(1+\frac{1}{k})^k$, the limit ought to be $e$, but $e \notin \mathbb{Q}$.

If we define the addition and multiplication term by term, namely

and

where $\alpha \in \mathbb{Q}$, we get a vector space (the verification is easy). The zero vector is defined by

This vector space is denoted by $\overline{\mathbb{Q}}$. The subspace containing all sequences converging to $0$ will be denoted by $\overline{\mathbb{O}}$. Again, $(a_n)+\overline{\mathbb{O}}=(b_n)+\overline{\mathbb{O}}$ if and only if $(a_n-b_n) \in \overline{\mathbb{O}}$. Using the language of equivalence relations, we also say that $(a_n)$ and $(b_n)$ are equivalent if $(a_n-b_n) \in \overline{\mathbb{O}}$. For example, the two following sequences are equivalent:

Actually we will get $\mathbb{R} \simeq \overline{\mathbb{Q}}/\overline{\mathbb{O}}$ in the end. But to make sure that this quotient space is exactly the real line we meet in analysis class, a lot of verification needs to be done.
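To illustrate the equivalence relation numerically (a sketch of my own; it uses floating point rather than true rationals), take $a_k=(1+\frac1k)^k$ and the partial sums $b_k=\sum_{j=0}^k \frac{1}{j!}$. Both are rational Cauchy sequences converging to $e$, so their difference tends to $0$: they represent the same coset in $\overline{\mathbb{Q}}/\overline{\mathbb{O}}$, that is, the same real number.

```python
def a(k):
    """The sequence (1 + 1/k)^k; each term is rational."""
    return (1.0 + 1.0 / k) ** k

def b(k):
    """Partial sums of sum_j 1/j!, computed incrementally; each term is rational."""
    total, term = 0.0, 1.0
    for j in range(k + 1):
        total += term
        term /= (j + 1)
    return total

# Both sequences converge to e, so a_k - b_k -> 0: (a_n) - (b_n) lies in O-bar.
diffs = [abs(a(k) - b(k)) for k in (10, 100, 1000, 10000)]
assert all(later < earlier for earlier, later in zip(diffs, diffs[1:]))
assert diffs[-1] < 1e-3
```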

We shall give more definitions for calculation. The multiplication of two Cauchy sequences is defined term by term à la the addition. For $\overline{\mathbb{Q}}/\overline{\mathbb{O}}$ we have

and

As for inequality, a partial order has to be defined. We say $(a_n) > (0)$ if there exists some $N>0$ such that $a_n>0$ for all $n \geq N$. By $(a_n) > (b_n)$ we mean $(a_n-b_n)>(0)$ of course. For cosets, we say $(a_n)+\overline{\mathbb{O}}>\overline{\mathbb{O}}$ if $(x_n) > (0)$ for some $(x_n) \in (a_n)+\overline{\mathbb{O}}$. This is well defined. That is, if $(x_n)>(0)$, then $(y_n)>(0)$ for all $(y_n) \in (a_n)+\overline{\mathbb{O}}$.

With these operations defined, it can be verified that $\overline{\mathbb{Q}}/\overline{\mathbb{O}}$ has the desired properties, for example the least-upper-bound property. But this goes too far from the topic, so we are not proving it here. If you are interested, you may visit here for more details.

Finally, we are trying to make $L^p$ a Banach space. Fix $p$ with $1 \leq p < \infty$. There is a seminorm defined for all Lebesgue measurable functions on $[0,1]$ by

$$p(f)=\left(\int_0^1 |f(x)|^p\,dx\right)^{1/p}.$$
$L^p$ is the vector space of all functions $f$ with $p(f)<\infty$. But it is not a normed space under $p$, since $p(f)=0$ only implies that $f=0$ almost everywhere. However, the set $N$ of all functions equal to $0$ almost everywhere is also a vector space. Now consider the quotient space and define

$$\tilde{p}(\pi(f))=p(f),$$
where $\pi$ is the canonical map of $L^p$ into $L^p/N$. We shall prove that $\tilde{p}$ is well-defined here. If $\pi(f)=\pi(g)$, we have $f-g \in N$, therefore

which forces $p(f)=p(g)$. Therefore in this case we also have $\tilde{p}(\pi(f))=\tilde{p}(\pi(g))$. This ensures that $\tilde{p}$ is a *norm*, and $L^p/N$ a Banach space. Some topological facts are required to prove this; we are going to cover a few of them.

We know that if $X$ is a topological vector space with a topology $\tau$, then addition and scalar multiplication are continuous. Suppose now $N$ is a closed subspace of $X$. Define $\tau_N$ by

$$\tau_N=\{E \subset X/N:\pi^{-1}(E) \in \tau\}.$$
We expect $\tau_N$ to be properly defined, and fortunately it is. Some interesting techniques will be used in the following section.

There will be two steps to get this done.

$\tau_N$ is a topology.

It is trivial that $\varnothing$ and $X/N$ are elements of $\tau_N$. Other properties are immediate as well since we have

and

That is, if we have $A,B\in \tau_N$, then $A \cap B \in \tau_N$ since $\pi^{-1}(A \cap B)=\pi^{-1}(A) \cap \pi^{-1}(B) \in \tau$.

Similarly, if $A_\alpha \in \tau_N$ for all $\alpha$, we have $\cup A_\alpha \in \tau_N$. Also, by definition of $\tau_N$, $\pi$ is continuous.

$\tau_N$ is a vector topology.

First, we show that a point in $X/N$, which can be written as $\pi(x)$, is closed. Notice that $N$ is assumed to be closed, and

$$\pi^{-1}(\pi(x))=x+N,$$

which therefore has to be closed.

In fact, $F \subset X/N$ is $\tau_N$-closed if and only if $\pi^{-1}(F)$ is $\tau$-closed. To prove this, one needs to notice that $\pi^{-1}(F^c)=(\pi^{-1}(F))^{c}$.

Suppose $V$ is open; then

$$\pi^{-1}(\pi(V))=V+N=\bigcup_{n \in N}(V+n)$$

is open. By the definition of $\tau_N$, we have $\pi(V) \in \tau_N$. Therefore $\pi$ is an open mapping.

If now $W$ is a neighborhood of $0$ in $X/N$, there exists a neighborhood $V$ of $0$ in $X$ such that

$$V+V \subset \pi^{-1}(W).$$
Hence $\pi(V)+\pi(V) \subset W$. Since $\pi$ is open, $\pi(V)$ is a neighborhood of $0$ in $X/N$, this shows that the addition is continuous.

The continuity of scalar multiplication will be shown directly (so could the addition, but the proof above is intended to offer a special technique). We already know that the scalar multiplication on $X$, namely

$$\varphi: \Phi \times X \to X, \quad (\alpha,x) \mapsto \alpha x,$$

is continuous, where $\Phi$ is the scalar field (usually $\mathbb{R}$ or $\mathbb{C}$). Now the scalar multiplication on $X/N$ is given by

$$\psi: \Phi \times X/N \to X/N, \quad (\alpha,x+N) \mapsto \alpha x+N.$$
We see that $\psi(\alpha,x+N)=\pi(\varphi(\alpha,x))$. But the composition of two continuous functions is continuous; therefore $\psi$ is continuous.

We are going to talk about a classic commutative diagram that you have already seen in algebra class.

There are some assumptions.

- $X$ and $Y$ are topological vector spaces.
- $\Lambda$ is linear.
- $\pi$ is the canonical map.
- $N$ is a closed subspace of $X$ and $N \subset \ker\Lambda$.

Algebraically, there exists a unique map $f: X/N \to Y$ given by $x+N \mapsto \Lambda(x)$. Namely, the diagram above commutes. But now we are interested in some analytic facts.

$f$ is linear.

This is obvious. Since $\pi$ is **surjective**, for $u,v \in X/N$, we are able to find some $x,y \in X$ such that $\pi(x)=u$ and $\pi(y)=v$. Therefore we have

and

$\Lambda$ is open if and only if $f$ is open.

If $f$ is open, then for any open set $U \subset X$, we have

$$\Lambda(U)=f(\pi(U)),$$

which is an open set, since $\pi(U)$ is open ($\pi$ is an open mapping) and $f$ is open.

If $f$ is not open, then there exists some open set $V \subset X/N$ such that $f(V)$ is not open. However, since $\pi$ is continuous, $\pi^{-1}(V)$ is open. In this case we have

$$\Lambda(\pi^{-1}(V))=f(V),$$

which is not open; $\Lambda$ is therefore not open. This shows that if $\Lambda$ is open, then $f$ is open.

$\Lambda$ is continuous if and only if $f$ is continuous.

If $f$ is continuous, then for any open set $W \subset Y$, the set $\pi^{-1}(f^{-1}(W))=\Lambda^{-1}(W)$ is open. Therefore $\Lambda$ is continuous.

Conversely, if $\Lambda$ is continuous, then for any open set $W \subset Y$, the set $\Lambda^{-1}(W)$ is open. Therefore $f^{-1}(W)=\pi(\Lambda^{-1}(W))$ is open, since $\pi$ is open.

Twenty-five years old already, no longer growing taller, and my family has decided to bury me. There is really nothing left to resist: endure a little more of the half-coherent eulogy, the messy sobbing and murmured condolences, bear the cold inside the coffin a while longer, and it will be over. Whether this is a delusion I cannot tell, but I suppose I am, after all, only a supporting character in this death.

Whether this is a nightmare does not matter; whether the smell of cadaverine comes from my own body does not matter either. There is no need for doubt. This is simply my third endurance, endured willingly. Why struggle? I am no longer the sickly child of eighteen years ago who could still move about. I am plainly a dead man trying to keep up appearances. A few days ago, perhaps, one could still smell the fragrance of violets.

Ashes to ashes, dust to dust. I may feel a certain sorrow; my spiritual existence took a severe blow long ago, and perhaps this smell of decay is one reason, or perhaps it is the plain fact of being buried. Let me be buried like this, then; that smell, at least, is real. I have no strength left, and my nervous system, if it still exists at all, no longer obeys me. I have no strength to struggle.

I should not feel pleased with myself, but then it really is nothing much: a second dream about death. It could just as well be a death in earnest, with me still the protagonist. Eighteen years, or perhaps only eighteen days, my coffin grew together with my body, until time and the corpse-smell inside blended into a translucent, soft jelly with my shell fixed within it. Perhaps one day it will melt.

From *Eyes of a Blue Dog*: "The Third Resignation"

La tercera resignación

Pic by https://www.deviantart.com/insaneattraction/art/La-tercera-resignacion-57045720.

An open map is a function between two topological spaces that maps open sets to open sets. Precisely speaking, a function $f: X \to Y$ is open if for any open set $U \subset X$, the image $f(U)$ is open in $Y$. Likewise, a closed map is a function mapping closed sets to closed sets.

You may think 'open map' and 'closed map' are alternative names for continuous functions, but they are not: the definition of an open/closed mapping is quite different from continuity. Here are some simple examples.

- $f(x)=\sin{x}$ defined on $\mathbb{R}$ is not open, though it is continuous. This can be verified by considering $(0,2\pi)$, since $f((0,2\pi))=[-1,1]$.
- The projection $\pi: \mathbb{R}^2 \to \mathbb{R}$ defined by $(x,y) \mapsto x$ is open. Indeed, it maps an open ball onto an open interval on the $x$-axis.
- The inclusion map $\varphi: \mathbb{R} \to \mathbb{R}^2$ given by $x \mapsto (x,0)$, however, is not open. An open interval on the plane is *locally closed*, but neither open nor closed.
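The first two examples can be spot-checked numerically. The following sketch (plain Python; the sample points are my own choices, not part of the original examples) confirms that the image of $(0,2\pi)$ under $\sin$ contains its boundary points $\pm 1$, so it cannot be open, while the projection never attains $|x|=1$ on the open disc.

```python
import math

# First example: f(x) = sin x on the open interval (0, 2*pi).
# The extreme values 1 and -1 are attained at the interior points
# pi/2 and 3*pi/2, so the image [-1, 1] contains its boundary
# points and cannot be an open set.
assert 0 < math.pi / 2 < 2 * math.pi
assert abs(math.sin(math.pi / 2) - 1.0) < 1e-12
assert abs(math.sin(3 * math.pi / 2) + 1.0) < 1e-12

# Second example: the projection (x, y) -> x maps the open unit
# disc onto the open interval (-1, 1): every point of the disc has
# |x| < 1, while values arbitrarily close to 1 are still attained.
samples = [(0.999, 0.0), (-0.5, 0.7), (0.0, 0.0)]
assert all(x * x + y * y < 1 and abs(x) < 1 for x, y in samples)
```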

Under what condition is a continuous linear function between two topological vector spaces an open mapping? We give the answer in this post: the open mapping theorem provides a sufficient condition for a continuous linear function to be open.

Let $X,Y$ be Banach spaces and $T: X \to Y$ a surjective bounded linear map. Then $T$ is an open mapping.

The open balls in $X$ and $Y$ are defined respectively by

All we need to do is show that there exists some $r>0$ such that

Since every open set in $X$ or $Y$ can be expressed as a union of open balls, and a ball in $X$ centered at $x \in X$ with radius $r$ can be written as $x+B_r^X$, it then becomes obvious that $T$ maps open sets to open sets.

First we have

The surjectivity of $T$ ensures that

Since $Y$ is Banach, hence a complete metric space, by the Baire category theorem there must be some $n_0 \in \mathbb{N}$ such that $\overline{T(B_{n_0}^{X})}$ has nonempty interior; otherwise $T(B_n^{X})$ would be nowhere dense for all $n \in \mathbb{N}$, making $Y$ of the first category, a contradiction.

Since $x \mapsto nx$ is a homeomorphism of $X$ onto $X$, we see that in fact $T(B_n^X)$ fails to be nowhere dense for every $n \in \mathbb{N}$. Therefore, there exist some $y_0 \in \overline{T(B_1^{X})}$ and some $\varepsilon>0$ such that

The open set on the left-hand side is a neighborhood of $y_0$, which lies in the interior of $\overline{T(B_1^X)}$.

On the other hand, we claim

We shall prove it as follows. Pick any $y \in \overline{T(B_1^X)}$; we shall show that $y-y_0 \in \overline{T(B_2^X)}$. For $y_0$, there exists a sequence $(y_n)$ with $\lVert y_n \rVert <1$ for all $n$ such that $Ty_n \to y_0$. Likewise, we can find a sequence $(x_n)$ with $\lVert x_n \rVert <1$ for all $n$ such that $Tx_n \to y$. Notice that we also have

since

we see that $T(x_n-y_n) \in T(B_2^X)$ for all $n$; it follows that

Combining all these relations, we get

Since $T$ is linear, we see

By induction we get

for all $n \geq 1$.

We shall show however

For any $u \in B_{\varepsilon/4}^Y$, we have $u \in \overline{T(B_{1/2}^X)}$. There exists some $x_1 \in B_{1/2}^{X}$ such that

This implies that $u-Tx_1 \in B_{\varepsilon/8}^Y$. In the same fashion, we are able to pick $x_n$ in such a way that

where $\lVert x_n \rVert<2^{-n}$. Now let $z_n=\sum_{k=1}^{n}x_k$, we shall show that $(z_n)$ is Cauchy. For $m<n$, we have

Since $X$ is Banach, there exists some $z \in X$ such that $z_n \to z$. Further we have

therefore $z \in B_1^X$. Since $T$ is bounded, hence continuous, we get $T(z)=u$. To summarize: for every $u \in B_{\varepsilon/4}^Y$ there is some $z \in B_{1}^X$ such that $T(z)=u$, which implies $T(B_1^X) \supset B_{\varepsilon/4}^Y$.

Let $U \subset X$ be open, we want to show that $T(U)$ is also open. Take $y \in T(U)$, then $y=T(x)$ with $x \in U$. Since $U$ is open, there exists some $\varepsilon>0$ such that $B_{\varepsilon}^{X}+x \subset U$. By the linearity of $T$, we obtain $B_{r\varepsilon}^Y \subset T(B_{\varepsilon}^X)$ for some small $r$. Using the linearity of $T$ again, we obtain

which shows that $T(U)$ is open, therefore $T$ is an open mapping.

One has to notice that the completeness of $X$ and $Y$ has been used more than once. For example, the existence of $z$ depends on the fact that Cauchy sequences converge in $X$. Also, the surjectivity of $T$ cannot be omitted; can you see why?

There are some different ways to state this theorem.

- To every $y$ with $\lVert y \rVert < \delta$, there corresponds an $x$ with $\lVert x \rVert<1$ such that $T(x)=y$.
- Let $U$ and $V$ be the open unit balls of the Banach spaces $X$ and $Y$. To every surjective bounded linear map, there corresponds a $\delta>0$ such that

You may also realize that we have used a lot of basic definitions of topology. For example, we checked the openness of $T(U)$ by using neighborhood. The set $\overline{T(B_1^X)}$ should also remind you of limit point.

The difference between an open mapping and a continuous mapping can be seen through the topologies of the two topological vector spaces. Suppose $f: X \to Y$. Then $f$ is open if for any $U \in \tau_X$ we have $f(U) \in \tau_Y$, where $\tau_X$ and $\tau_Y$ are the topologies of $X$ and $Y$, respectively. This has nothing to do with continuity: by continuity we mean that for any $V \in \tau_Y$ we have $f^{-1}(V) \in \tau_X$.

Fortunately, this theorem can be generalized to $F$-spaces, which will be demonstrated in the next post of the series. A space $X$ is an $F$-space if its topology $\tau$ is induced by a complete invariant metric $d$. Again, completeness plays a critical role.

- The Big Three Pt. 1 - Baire Category Theorem Explained
- The Big Three Pt. 2 - The Banach-Steinhaus Theorem
- The Big Three Pt. 3 - The Open Mapping Theorem (Banach Space)
- The Big Three Pt. 4 - The Open Mapping Theorem (F-Space)
- The Big Three Pt. 5 - The Hahn-Banach Theorem (Dominated Extension)
- The Big Three Pt. 6 - Closed Graph Theorem with Applications

Before we go into group theory, let's recall how Cauchy sequences are defined in analysis.

A sequence $(x_n)_{n=1}^{\infty}$ of real/complex numbers is called a Cauchy sequence if, for every $\varepsilon>0$, there is a positive integer $N$ such that for all $m,n>N$, we have

That is, the terms of the sequence eventually become arbitrarily close to each other. Notice that only the distance is involved, so the definition of a Cauchy sequence in a metric space comes up naturally.

Given a metric space $(X,d)$, a sequence $(x_n)_{n=1}^{\infty}$ is Cauchy if for every real number $\varepsilon>0$ there is a positive integer $N$ such that, for all $m,n>N$, the distance satisfies

By considering the topology induced by metric, we see that $x_n$ lies in a neighborhood of $x_m$ with radius $\varepsilon$. But a topology can be constructed by neighborhood, hence the Cauchy sequence for topological vector space follows.

For a topological vector space $X$, pick a local base $\mathcal{B}$, then $(x_n)_{n=1}^{\infty}$ is a Cauchy sequence if for each member $U \in \mathcal{B}$, there exists some number $N$ such that for $m,n>N$, we have

But in a general topological space this does not work. Consider the two topological spaces

with the usual topology. We have $X \simeq Y$, since we have the map

as a homeomorphism. Consider the Cauchy sequence $(\frac{1}{n+1})_{n=1}^{\infty}$: we see that $(h(\frac{1}{n+1}))_{n=1}^{\infty}=(n+1)_{n=1}^{\infty}$ is not Cauchy. This counterexample shows that being a Cauchy sequence is not preserved by homeomorphisms.

Similarly, one can define Cauchy sequences in a topological group (using multiplication by inverses in place of subtraction).

A sequence $(x_n)_{n=1}^{\infty}$ in a topological group $G$ is a Cauchy sequence if for every open neighborhood $U$ of the identity of $G$, there exists some number $N$ such that whenever $m,n>N$, we have

A metric space $(X,d)$ in which every Cauchy sequence converges is called complete.

Spaces like $\mathbb{R}$ and $\mathbb{C}$ are complete with the Euclidean metric. But consider the sequence in $\mathbb{Q}$ given by

we have $a_n\in\mathbb{Q}$ for all $n$, but the sequence does not converge in $\mathbb{Q}$. Indeed, in $\mathbb{R}$ we can naturally write $a_n \to e$, but $e \notin \mathbb{Q}$, as we all know.

There are several ways to construct $\mathbb{R}$ from $\mathbb{Q}$. One of the most famous is Dedekind cuts, where, however, Cauchy sequences make no explicit appearance. Another method uses Cauchy sequences explicitly, and we will follow that route algebraically.

Suppose we are given a group $G$ together with a sequence of normal subgroups $(H_n)_{n=1}^{\infty}$ with $H_n \supset H_{n+1}$ for all $n$, all of which have finite index. We are going to complete this group.

A sequence $(x_n)_{n=1}^{\infty}$ in $G$ will be called a **Cauchy sequence** if, given $H_k$, there exists some $N>0$ such that for $m,n>N$ we have

Indeed, this looks very similar to what we saw for topological groups, but we do not want to equip the group with a topology here. Nor does this definition stray far from the original definition of a Cauchy sequence in $\mathbb{R}$: if you treat $H_k$ as something 'small', it says that $x_m$ and $x_n$ are close enough (with $x_nx_m^{-1}$ playing the role of their difference).

A sequence $(x_n)_{n=1}^{\infty}$ in $G$ will be called a **null sequence** if, given $k$, there exists some $N>0$ such that for all $n>N$ we have

or you may write $x_ne^{-1} \in H_k$. It can be considered as being *arbitrarily close to the identity $e$*.

The Cauchy sequences (of $G$) form a group under the termwise product.

*Proof.* Let $C$ be the set of Cauchy sequences, we shall show that $C$ forms a group. For $(x_1,x_2,\cdots),(y_1,y_2,\cdots)\in C$, the product is defined by

The associativity follows naturally from the associativity of $G$. To show that $(x_1y_1,x_2y_2,\cdots)$ is still a Cauchy sequence, notice that for given $k$ and large enough $m$ and $n$ we have

But $(x_ny_n)(x_my_m)^{-1}=x_ny_ny_m^{-1}x_m^{-1}$. To show that this is an element of $H_k$, notice that

Since $y_ny_m^{-1}\in H_k$ and $H_k$ is normal, we have $x_ny_ny_m^{-1}x_n^{-1} \in H_k$. Since also $x_nx_m^{-1} \in H_k$, the element $(x_ny_n)(x_my_m)^{-1}$ is a product of two elements of $H_k$ and therefore lies in $H_k$.

Obviously, if we define $e_C=(e_G,e_G,\cdots)$, where $e_G$ is the identity of $G$, $e_C$ becomes the identity of $C$, since

Finally the inverse. We need to show that

is still an element of $C$. This is trivial since if we have

then

as $H_k$ is a group.

The null sequences (of $G$) form a group; moreover, they form a normal subgroup of $C$, the group of Cauchy sequences.

Let $N$ be the set of null sequences of $G$. Again the identity is $(e_G,e_G,\cdots)$, and there is no need to repeat the verification; associativity still follows from that of $G$. To show that $N$ is closed under the termwise product, namely that $(x_n),(y_n) \in N$ implies $(x_ny_n)\in N$, one only needs to notice that, for large $n$, we already have

Therefore $x_ny_n \in H_k$ since $x_n$ and $y_n$ are two elements of $H_k$.

To show that $(x_n^{-1})$, which should be treated as the inverse of $(x_n)$, is still in $N$, notice that if $x_n \in H_k$, then $x_n^{-1} \in H_k$.

Next we shall show that $N$ is a subgroup of $C$, which amounts to showing that every null sequence is Cauchy. Given $H_p \supset H_q$, for $(x_n)\in{N}$ there are large enough $m$ and $n$ such that

therefore

as desired. Finally, pick $(p_n) \in N$ and $(q_n) \in C$, we shall show that $(q_n)(p_n)(q_n)^{-1} \in N$. That is, the sequence $(q_np_nq_n^{-1})$ is a null sequence. Given $H_k$, we have some big $n$ such that

therefore

since $H_k$ is normal. Our statement is proved.

The factor group $C/N$ is called the **completion** of $G$ (with respect to $(H_n)$).

As we know, the elements of $C/N$ are cosets, and a coset can be considered an element of the completion of $G$. Recall a basic fact about factor groups: for $x,y \in C$, we have $xN=yN$ if and only if $x^{-1}y \in N$. That is, two Cauchy sequences are equivalent if their 'difference' is a null sequence.

Informally, consider the additive group $\mathbb{Q}$ and the two Cauchy sequences

They are equivalent since

is a null sequence. That is why people say $0.99999\ldots = 1$ (in analysis, the difference converges to $0$; in algebra, we say the two sequences are equivalent). As another example, $\ln{2}$ can be represented by the equivalence class of

We built our completion out of Cauchy sequences: every element of the completion is a Cauchy sequence up to a 'difference of nothing', and the gaps disappear.
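A small illustration in code (a sketch using Python's `fractions`; the cut-off indices are arbitrary choices of mine): the termwise difference of the two sequences for $0.999\ldots$ and $1$ is $10^{-n}$, a null sequence, while the alternating harmonic partial sums give a rational Cauchy-type representative for $\ln 2$.

```python
from fractions import Fraction

# Two rational sequences: x_n = 0.99...9 (n nines) and y_n = 1.
x = [Fraction(10**n - 1, 10**n) for n in range(1, 8)]
y = [Fraction(1) for _ in range(1, 8)]

# Their termwise 'difference' is 10^{-n}, a null sequence.
diffs = [b - a for a, b in zip(x, y)]
assert diffs == [Fraction(1, 10**n) for n in range(1, 8)]

# Partial sums of the alternating harmonic series: rational and
# Cauchy, but the limit ln 2 is irrational, so the sequence has
# no limit inside Q itself.
s = Fraction(0)
partial = []
for k in range(1, 50):
    s += Fraction((-1)**(k + 1), k)
    partial.append(s)
# alternating series bound: |S_49 - ln 2| < 1/50
assert abs(float(partial[-1]) - 0.6931471805599453) < 0.02
```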

Again, the sequence of normal subgroups does not have to be indexed by $\mathbb{N}$; it can be indexed by any directed partially ordered set. Removing this restriction on the index set allows a great variety of applications.

However, can we finish the whole completion of $\mathbb{Q}$ this way? The answer is no: the multiplication has not been dealt with. To finish the job, field theory has to be taken into consideration.

A real-valued function $f(t)$ of a real variable, defined on some neighborhood of $0$, is said to be $o(t)$ if

And its derivative at some point $a$ is defined by

We also have this equivalent equation:

Now suppose $f:U \subset \mathbb{R}^n \to \mathbb{R}^m$, where $U$ is an open set. The function $f$ is differentiable at $x_0 \in U$ if it satisfies the following conditions.

All partial derivatives of $f$, i.e. $\frac{\partial f_i}{\partial x_j}$ for $i=1,\cdots,m$ and $j = 1,\cdots,n$, exist at $x_0$ (which ensures that the Jacobian matrix exists and is well-defined).

The Jacobian matrix $J(x_0)\in\mathbb{R}^{m\times n}$ satisfies

In fact the Jacobian matrix is the derivative of $f$ at $x_0$, although it is a matrix instead of a number; in the general case, even a number should be treated as a matrix. In the following definition of the Fréchet derivative, you will see that we should treat the derivative as a *linear operator*.

Let $f:U\to\mathbf{F}$ be a function, where $U$ is an open subset of $\mathbf{E}$. We say $f$ is Fréchet differentiable at $x \in U$ if there is a **bounded linear operator** $\lambda:\mathbf{E} \to \mathbf{F}$ such that

We say that $\lambda$ is the **derivative** of $f$ at $x$, denoted by $Df(x)$ or $f'(x)$. Notice that $\lambda \in L(\mathbf{E},\mathbf{F})$. If $f$ is differentiable at every point of $U$, then $f'$ is a map

The definition above doesn’t go too far from real functions defined on the real axis. Now we are assuming that both $\mathbf{E}$ and $\mathbf{F}$ are merely topological vector spaces, and still we can get the definition of Fréchet derivative (generalized).

Let $\varphi$ be a mapping of a neighborhood of $0$ of $\mathbf{E}$ into $\mathbf{F}$. We say that $\varphi$ is **tangent to** $0$ if given a neighborhood $W$ of $0$ in $\mathbf{F}$, there exists a neighborhood $V$ of $0$ in $\mathbf{E}$ such that

for some function $o(t)$ as above. For example, if both $\mathbf{E}$ and $\mathbf{F}$ are normed (not necessarily Banach), then we recover the usual condition

where $\lim_{\lVert x \rVert \to 0}\psi(x)=0$.

Still we assume that $\mathbf{E}$ and $\mathbf{F}$ are topological vector spaces. Let $f:U \to \mathbf{F}$ be a continuous map. We say that $f$ is differentiable at a point $x \in U$ if there exists some $\lambda \in L(\mathbf{E},\mathbf{F})$ such that for small $y$ we have

where $\varphi$ is tangent to $0$. Notice that $\lambda$ is uniquely determined.

You must be familiar with some properties of the derivative; here we are re-establishing them in Banach spaces.

If $f: U \to V$ is differentiable at $x_0$, and $g:V \to W$ is differentiable at $f(x_0)$, then $g \circ f$ is differentiable at $x_0$, and

*Proof.* We are proving this in topological vector spaces. By definition, there exist linear operators $\lambda$ and $\mu$ such that

where $\varphi$ and $\psi$ are tangent to $0$. Further, we get

To evaluate $g(f(x_0+y))$, notice that

It’s clear that $\mu\circ\varphi(y)+\psi(\lambda{y}+\varphi(y))$ is tangent to $0$, and $\mu\circ\lambda$ is the linear map we are looking for. That is,

From now on, we are dealing with Banach spaces. Let $U$ be an open subset of $\mathbf{E}$, and let $f:U \to \mathbf{F}$ be differentiable at each point of $U$. If $f'$ is continuous, then we say that $f$ is **of class** $C^1$. Maps of class $C^p$ for $p \geq 1$ are defined inductively: the $p$-th derivative $D^pf$ is defined as $D(D^{p-1}f)$ and is itself a map of $U$ into $L(\mathbf{E},L(\mathbf{E},\cdots,L(\mathbf{E},\mathbf{F})\cdots))$, which is isomorphic to $L^p(\mathbf{E},\mathbf{F})$. A map $f$ is said to be **of class** $C^p$ if its $k$-th derivative $D^kf$ exists for $1 \leq k \leq p$ and is continuous. With the help of the chain rule, and the fact that the composition of two continuous functions is continuous, we get:

Let $U,V$ be open subsets of some Banach spaces. If $f:U \to V$ and $g: V \to \mathbf{F}$ are of class $C^p$, then so is $g \circ f$.

We in fact get a category whose objects are open subsets $U$ of Banach spaces and whose morphisms are maps of class $C^p$ from one such open set into another. To verify this, one only has to realize that the composition of two maps of class $C^p$ is still of class $C^p$ (as stated above).

We say that $f$ is of class $C^\infty$ if $f$ is of class $C^p$ for all integers $p \geq 1$. Meanwhile $C^0$ maps are the continuous maps.

We are now going to evaluate the Fréchet derivative of a nonlinear functional, i.e. the derivative of a map from an infinite-dimensional space into $\mathbb{R}$ (instead of from $\mathbb{R}$ to $\mathbb{R}$).

Consider the functional by

where the norm is defined by

For $u\in C[0,1]$, we are going to find a linear operator $\lambda$ such that

where $\varphi(\eta)$ is tangent to $0$.

*Solution.* By evaluating $\Gamma(u+\eta)$, we get

To prove that $\int_{0}^{1}\eta^2\sin{x}dx$ is the $\varphi(\eta)$ desired, notice that

Therefore we have

as desired. The Fréchet derivative of $\Gamma$ at $u$ is given by

It may be hard to believe, but the derivative here is neither a number nor a matrix: it is a linear operator. Conversely, one can of course treat a matrix or a number as a linear operator effortlessly.
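To make the example concrete, suppose (consistently with the remainder $\varphi(\eta)=\int_0^1\eta^2\sin x\,dx$ computed above) that $\Gamma(u)=\int_0^1 u(x)^2\sin x\,dx$, so the candidate derivative is $D\Gamma(u)\eta=\int_0^1 2u\eta\sin x\,dx$. The numerical sketch below (midpoint rule; the helper names and the choices $u(x)=x$, $\eta\equiv 1$ are my own) checks that the remainder shrinks like $t^2$.

```python
import math

def integrate(f, a=0.0, b=1.0, n=20000):
    # simple midpoint rule on [a, b]
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

def Gamma(u):
    # assumed functional: Gamma(u) = \int_0^1 u(x)^2 sin(x) dx
    return integrate(lambda x: u(x)**2 * math.sin(x))

def DGamma(u, eta):
    # candidate Frechet derivative at u, applied to eta
    return integrate(lambda x: 2 * u(x) * eta(x) * math.sin(x))

def u(x):
    return x

def eta(x):
    return 1.0

for t in (1e-1, 1e-2, 1e-3):
    remainder = (Gamma(lambda x: u(x) + t * eta(x))
                 - Gamma(u) - t * DGamma(u, eta))
    # here the remainder is exactly t^2 * \int_0^1 sin(x) dx, i.e. O(t^2)
    assert abs(remainder - t**2 * (1 - math.cos(1.0))) < 1e-6
```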

This post serves as an introduction to profinite groups without touching anything beyond elementary group theory (no rings, fields, Galois theory, topological groups, etc.), though this means we cannot go very far into the applications.

We begin with an easy-to-understand motivation by introducing $\mathbb{Z}_p$. Consider the binary expansion of an integer

where $a_k=0,1$. For example we may have

You must be familiar with binary expansion if you write code. As a topology exercise, show that the set of all such $a$ is uncountable. In the octal number system you may also write

This notation is pretty useful on some real-life occasions, but not here. We are looking for connections between number systems and **prime** numbers (you will see why later), but number systems with bases like $8,10,16$ definitely won't work.

Fix a prime number $p$. A $p$-adic integer $\alpha$ is defined by a sequence of integers $x_k$, which we write as

satisfying

For example, we write $88$ as a $2$-adic number by

As you may realize, $x_k$ can be written by

where $a_i\in\{0,1,\cdots,p-1\}$ for $i \leq k-1$; these $a_i$ are called the $p$-adic digits.
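As a sketch of these definitions (the function names are my own), the following Python computes the $p$-adic digits and the defining truncations $x_k$ for the example $88$ with $p=2$, and checks the compatibility condition $x_{k+1} \equiv x_k \pmod{p^k}$.

```python
def p_adic_digits(n, p, k):
    # first k p-adic digits a_0, ..., a_{k-1} of a nonnegative integer n
    digits = []
    for _ in range(k):
        digits.append(n % p)
        n //= p
    return digits

def truncations(n, p, k):
    # x_j = n mod p^j for j = 1, ..., k (the defining sequence)
    return [n % p**j for j in range(1, k + 1)]

digits = p_adic_digits(88, 2, 8)
assert digits == [0, 0, 0, 1, 1, 0, 1, 0]  # 88 = 2^3 + 2^4 + 2^6

xs = truncations(88, 2, 8)
# compatibility: x_{k+1} is congruent to x_k modulo p^k
assert all((xs[j + 1] - xs[j]) % 2**(j + 1) == 0 for j in range(len(xs) - 1))
```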

In fact, if we define the addition componentwise, i.e.

then we get a group. Further, if we define the multiplication componentwise, we get a ring. The group of all $p$-adic integers is denoted by $\mathbb{Z}_p$. But this post won't touch anything beyond group theory.

As you may wonder, this does not seem to work for 'negative' integers. For example, if we have

how do we get $-\alpha$? In fact we have

which suggests the limit value of $x_k$ associated to $-\alpha$ as $k\to\infty$ is

It doesn't converge in the usual sense. But if it did, we would have

But this is valid under the present circumstances. We can check it using $p$-adic digits. In fact, the $p$-adic digits of $1$ are

if we add $p-1$ to each component, we get

(there are infinitely many $p-1$'s!).
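A quick sanity check of this (a sketch; the choice $p=2$ and the cut-off are mine): the truncations of the all-$(p-1)$ digit string are $x_k = p^k - 1$, and adding $1$ gives $0$ modulo $p^k$ at every level, which is exactly what $-1$ should do.

```python
# The p-adic integer with every digit equal to p-1 represents -1:
# its truncations are x_k = (p-1)(1 + p + ... + p^{k-1}) = p^k - 1,
# and adding 1 gives p^k, i.e. 0 modulo p^k at every level.
p = 2
for k in range(1, 12):
    x_k = sum((p - 1) * p**i for i in range(k))
    assert x_k == p**k - 1
    assert (x_k + 1) % p**k == 0  # so the sequence behaves like -1
```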

With all this being said, you can treat $\alpha=(x_1,x_2,\cdots)$ as a **limit**:

which makes everything natural. We are not digging into $\mathbb{Z}_p$ further. But keep two words in mind: limit and group.

The definition of $\mathbb{Z}_p$ by $x_{k+1} \equiv x_k\mod{p^k}$ might remind you of $\mathbb{Z}/p^k\mathbb{Z}$. Let’s give a review of $\mathbb{Z}/p^k\mathbb{Z}$.

For integers $x,y$, we have

if $x \in p^k\mathbb{Z}$. Further we have

if $(x-y)\in p^k\mathbb{Z}$; we also write $x \equiv y \mod p^k$. So there are infinitely many congruences $x_{k+1} \equiv x_k \mod p^k$; shall we associate infinitely many groups $\mathbb{Z}/p^k\mathbb{Z}$ with them? If that works, we may treat $\mathbb{Z}_p$ as the 'limit' of the $\mathbb{Z}/p^k\mathbb{Z}$. But we need a proper **operation** to do that.

Let $G_n=\mathbb{Z}/p^{n+1}\mathbb{Z}$ for each $n \geq 0$. Let

be the canonical homomorphism. Notice that $f_n$ is surjective. Now consider a $p$-adic integer

we have

Therefore we get an expression of $\mathbb{Z}_p$:

We will write $\mathbb{Z}_p=\varprojlim\mathbb{Z}/p^n\mathbb{Z}$, since it is an example of an inverse limit. It is *inverse* since $f_n$ goes 'back', associating each $x_n$ to $x_{n-1}$. Since $f_n$ is **surjective**, we can always lift $x_{n-1}$ to $G_{n}$ via $f_{n}$. We have treated one group as a limit of a sequence of groups. We don't want to limit ourselves to number theory: in the following section we offer a much more generalized definition, in which even the numbers are generalized.

We are now going to give a generalized definition of profinite groups. Notice that in the example of $\mathbb{Z}_p$ the sequence is indexed by $\mathbb{N}$. That is easy to understand, but this index set prevents profinite groups from being applied more widely. Of course, the index set $\mathbb{N}$ is not excluded.

A set $I$ is **directed partially ordered** if it’s associated with a partial order $\geq$ such that for any two elements $i,j \in I,$ there exists a $k \in I$ such that $k \geq i$ and $k \geq j$.

$\mathbb{Z}$ with the natural order is of course directed partially ordered. However, we can define another partial order by divisibility: if we define $n \geq m$ when $m \mid n$, then we have $\operatorname{lcm}(m,n) \geq m,n$.

As another example, consider the family $\mathcal{F}$ of all subgroups of a group $G$, with the partial order defined by inclusion, i.e. for $M,N \in \mathcal{F}$ we have $M \geq N$ if $M \supset N$. In this case $\langle M \cup N\rangle \geq M,N$, where $\langle M \cup N\rangle$ is the subgroup generated by $M$ and $N$.

A **projective system** is a collection of groups $G_i$ ($i \in I$), together with group homomorphisms $f^{j}_i: G_j \to G_i$ for $i,j\in {I}$ with $j \geq i$ such that

- $f_{i}^{i}=\operatorname{id}_{G_i}$ for every $i \in I$.
- $f_{i}^{j}\circ f_{j}^{k}=f_{i}^{k}$ for $k \geq j \geq i$.

Given any such projective system with a directed partially ordered index set, we have the **inverse limit** (or projective limit) defined by

It is easy to see that $\mathbb{Z}_p$ fits this definition, with $I = \mathbb{N}$. It can also be verified that the inverse limit forms a group (indeed a topological group, but we are not discussing that here).
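The two conditions above can be checked concretely for the system $G_n=\mathbb{Z}/p^{n+1}\mathbb{Z}$ with the reduction maps. The sketch below (the names and the choice $p=3$ are mine) also exhibits a compatible 'thread', i.e. an element of the inverse limit.

```python
# Projective system G_n = Z/p^{n+1}Z with transition maps
# f_i^j : G_j -> G_i given by reduction mod p^{i+1}, for j >= i.
p = 3

def f(i, j, x):
    # transition map f_i^j, defined only for j >= i
    assert j >= i
    return x % p**(i + 1)

# condition 1: f_i^i is the identity on G_i
for i in range(5):
    assert all(f(i, i, x) == x for x in range(p**(i + 1)))

# condition 2: f_i^j composed with f_j^k equals f_i^k, for k >= j >= i
for i in range(4):
    for j in range(i, 4):
        for k in range(j, 4):
            assert all(f(i, j, f(j, k, x)) == f(i, k, x)
                       for x in range(p**(k + 1)))

# an element of the inverse limit: compatible truncations of an integer
n = 100
thread = [n % p**(m + 1) for m in range(6)]
assert all(f(i, i + 1, thread[i + 1]) == thread[i] for i in range(5))
```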

A group is **profinite** if it is a **pro**jective limit of **finite** groups (up to isomorphism).

For any $g \in \mathbb{N}_{+}$, it is interesting to consider the following projective limit:

It can be verified that we have

That is, the base-$8$ number system is 'useless' here, since it is isomorphic to the base-$2$ one. That is why we focused on primes first. We will now give another 'generalization' of the $p$-adic numbers.

Suppose we have a sequence of normal subgroups $(H_n)$ of $G$ such that $H_n \supset H_{n+1}$ for all $n$. It doesn’t matter whether $G$ is finite. Let

be the canonical homomorphisms. Then the inverse limit follows:

We also have a natural homomorphism

by sending $x$ to the sequence $(x_n)$, where $x_n$ is the image of $x$ in $G/H_n$. Notice that we don't have to use $\mathbb{N}$ as the index set; this inverse limit can also be indexed by the set of all the $H_n$.

You may think of this as algebraists borrowing something from analysts and rebuilding it with the magic of algebra. There are many other applications I want to show you in the future (still not beyond elementary group theory). If you have learned functional analysis, you may know that the space of $p$-integrable functions for $1 \leq p < \infty$ fails to be a Banach space because of the functions equal to $0$ a.e.; but the quotient $L^p/N$ is a Banach space, where $N$ contains all functions equal to $0$ a.e. Both $L^p$ and $N$ are groups, and we 'completed' $L^p$ by passing to a factor space, which is still a group. In fact, in algebra we also have **Cauchy sequences** and the **completion** of a group, which are again associated with inverse limits.

- Luis Ribes, *Introduction to Profinite Groups*
- Hendrik Lenstra, *Profinite Groups*
- Serge Lang, *Algebra*, Revised Third Edition

We will discuss a general method for solving the following two kinds of equations:

where the $a_i$ are constants. In the previous post we saw that, assuming we can solve the first equation, the solution of the second can be obtained via Cramer's rule, by solving an ordinary system of linear equations and then integrating. But there we simply assumed we had that 'superpower', with no actual procedure for it. This post will grant us the superpower. The point is not only to solve the equations quickly and accurately; more importantly, we will see plain yet ingenious connections between classical theories.

The method of this post is based on polynomials; I assume you already know some elementary manipulations of polynomials from your calculus course. We will use the fundamental theorem of algebra, which states:

Every nonzero polynomial of degree $n$ in one variable with complex coefficients has exactly $n$ complex roots (counting multiplicity).

The most suitable equation to begin with is

Indeed, anything simpler would just be an ordinary indefinite integral. Let us recall the structure of the solutions of this equation. First, $y=0$ is obviously a solution. On the other hand, in the first post we derived the general formula

Therefore the solution of this equation is

where $C$ is an arbitrary constant.

We can also try a simple second-order equation:

Clearly $y=0$ is still a particular solution. What we hope is to solve an equation of the form $y'+p(x)y=0$ twice, since we already know how to solve equations of that form.

Notice that we can rewrite the equation as

Setting $u=y'-y$, we obtain

We know how to solve this equation; in fact we already have

So we then have

from which we can in turn solve

For first-order equations this is just the method we have already learned; and for second-order equations, as you may have sensed, there seems to be a pattern. Notice that to solve $y''-2y'+y=0$ we solved $y'-y$ twice. It is then natural to guess that some equation can likewise be solved by solving $y'-2y$ twice. If we regard differentiating twice as a 'square', we have the equation

Can we write down an equation that requires solving $y'-2y$ twice? We first write down an equation in $\lambda$, and then associate a differential equation with it; that is,

and for the corresponding equation we indeed have

so again we solve $y'-2y=0$ twice. Similarly, some equation may require solving $y'-3y$ $n$ times, and so on. We can also consider 'mixed' cases, say a second-order equation which requires solving $y'-ay$ once and then $y'-by$ once. If the two steps are swapped, will the result differ? We can analyze this without even solving anything; in fact we have

We now give a general method for solving homogeneous equations with constant coefficients. We already know that differentiation is a linear operation: differentiating a differentiable function yields a new function. Write $y'$ as $Dy$, where $D$ stands for this linear operator, and for higher-order derivatives write $y^{(n)}=D^ny$. Not differentiating at all, i.e. $D^0y$, may be written $Iy$, or the $I$ may simply be omitted.

Then if we already have

that is,

we obtain a polynomial

so that the original equation can be rewritten as

What does this have to do with the examples above? Note that if $a_1,\cdots,a_n$ are complex numbers, then $P(D)$ can always be written in the form

where the $\lambda_i$ may or may not be equal to one another. Let us look again at the example $y''-2y'+y=0$. With the help of $P(D)$, the equation can be written as

Viewed from this angle, the solution method is: set $u=(D-1)y$, solve $(D-1)u=0$ to find $u$, then solve $(D-1)y=u$ to obtain $y$.

So we already have the general method; it is, in fact, a recursive one.

For the equation

we only need to set $\varphi_1=(D-\lambda_2)\cdots(D-\lambda_n)y$ and solve $(D-\lambda_1)\varphi_1=0$ for $\varphi_1$; then proceed in the same way, setting $\varphi_2=(D-\lambda_3)\cdots(D-\lambda_n)y$ and solving for $\varphi_2$, and so on; finally set $\varphi_n=y$, and what we solve for at the last step is the final answer. At this point you have acquired the 'superpower' needed in the previous post (note: this $\varphi_n$ already contains $n$ constants).

The above handles the homogeneous linear equation. What general methods are there for the non-homogeneous linear equation? There are three concrete approaches.

- If a particular solution is easy to spot, say some $\mu(x)$ with $P(D)\mu(x)=f(x)$, then the solution of the non-homogeneous equation is $\mu(x)+\varphi_n(x)$.
- Solve $P(D)y=f(x)$ directly, by the same method as for the homogeneous equation; just note that when solving for $\varphi_1$ we now have $(D-\lambda_1)\varphi_1=f(x)$. Recursing down gives the same solution as approach 1.
- Use the method of the previous post. Note that the final $\varphi_n(x)$ contains $n$ constants, i.e. it can be written as $\varphi_n(x)=\sum_{k=1}^{n}C_ku_k(x)$, where the $u_k(x)$ are exactly the fundamental system of solutions we were after.

**Step 1: Factor $P(D)$**

This is easy; in fact we have

**Step 2: Solve recursively**

Set

so that we have

Solving this gives

Next set

so that

which gives

Finally, solve

to obtain

In fact it is not hard to see that $y=-1$ is a particular solution of this equation, while solving $y'''-3y''+3y'-y=0$ gives $y=C_1x^2e^x+C_2xe^x+C_3e^x$; the result agrees with that of the method above.
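Assuming the worked equation here is $y'''-3y''+3y'-y=1$ (consistent with the particular solution $y=-1$ noted above), the sketch below checks the claimed general solution numerically; the constants and sample points are arbitrary choices of mine.

```python
import math

# Check: y = C1 x^2 e^x + C2 x e^x + C3 e^x - 1 solves
# y''' - 3y'' + 3y' - y = 1 for any constants C1, C2, C3.
C1, C2, C3 = 2.0, -1.0, 0.5

def y(x):
    return (C1 * x**2 + C2 * x + C3) * math.exp(x) - 1.0

# exact derivatives of q(x) e^x - 1 with q(x) = C1 x^2 + C2 x + C3,
# using (q e^x)' = (q + q') e^x repeatedly
def yp(x):    # y'
    return (C1 * x**2 + (C2 + 2*C1) * x + C3 + C2) * math.exp(x)

def ypp(x):   # y''
    return (C1 * x**2 + (C2 + 4*C1) * x + C3 + 2*C2 + 2*C1) * math.exp(x)

def yppp(x):  # y'''
    return (C1 * x**2 + (C2 + 6*C1) * x + C3 + 3*C2 + 6*C1) * math.exp(x)

for x in (-1.0, 0.0, 0.7, 2.3):
    residual = yppp(x) - 3*ypp(x) + 3*yp(x) - y(x)
    assert abs(residual - 1.0) < 1e-9  # the right-hand side f(x) = 1
```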

Notice that we can also write this equation as

Now that we have a mechanical solution method, can we summarize what the solutions look like before actually solving? Certainly. We will try to discuss all the basic cases of $P(D)$. The most elementary computations are omitted here, but they are nothing more than basic first-order equations.

For such an equation, what we ultimately do is solve a first-order equation of the form $y'-\lambda y$, $n$ times over. To handle it, define

then we only need to solve $(D-\lambda)\varphi_{k+1}=\varphi_k$ $n$ times. A simple computation gives

We naturally hope to get something involving $e^{\lambda_1 x},\cdots,e^{\lambda_n x}$; is that what actually happens? We can just carry out the computation.

First we have

Then solving

yields

which, after rearranging, gives

Continuing the computation, we obtain

This is naturally the combination of cases 1 and 2. In case 1 we saw that if adjacent $\lambda_i$ are equal, then $\varphi_{i+1}=x\varphi_i+C_{i+1}e^{\lambda_{i}x}$; if adjacent $\lambda_i$ are distinct, then $\varphi_{i+1}=\varphi_i+C_{i+1}e^{\lambda_{i+1}x}$. Combining these two conclusions, after a simple computation we can write down the general form of the solution:

Solving equations now becomes much easier. For example, for the equation

we get

so the solution is

As another example, to solve

notice that

so the solution is

In this whole post we have done only one thing: simplify a higher-order differential equation, turning the problem of solving one higher-order equation into solving several first-order equations. Since simplifying by hand inspection is quite impractical, we regarded the differentiation operation as an abstract 'number' and worked with a corresponding polynomial, turning the solution process into two steps: factor the polynomial, then solve recursively. In handling this polynomial we were indirectly simplifying the original equation.

But does the polynomial method always apply to equations with non-constant coefficients? Not necessarily. For example, the equation $y''=xy$ has no simple solution, and we cannot expect a simple method to produce the functions we hope for.

The advantage of this method is that it is plain and mechanical: one only needs to solve a number of first-order equations. Its drawback is that it does not really bring out the concept of 'linearity', and the connection with linear algebra is hard to see. The next post will give a matrix-based method.
