Partition of Unity on Different Manifolds (Part 1. Introduction)

An application of partition of unity

Partition of unity builds a bridge between local properties and global properties. A nice example is Stokes’ theorem on manifolds.

Suppose $\omega$ is an $(n-1)$-form with compact support on an oriented manifold $M$ of dimension $n$, and $\partial{M}$ is given the induced orientation. Then

$$\int_M d\omega=\int_{\partial M}\omega.$$

This theorem can be proved in two steps. First, by Fubini’s theorem, one proves the identity on $\mathbb{R}^n$ and $\mathbb{H}^n$. Second, for the general case, let $(U_\alpha)$ be an oriented atlas for $M$ and $(\rho_\alpha)$ a partition of unity subordinate to $(U_\alpha)$, and write $\omega=\sum_{\alpha}\rho_\alpha\omega$ (a finite sum, because $\omega$ has compact support and the partition is locally finite). Since both sides of $\int_M d\omega=\int_{\partial M}\omega$ are linear in $\omega$, it suffices to prove the identity for each $\rho_\alpha\omega$. Note that the support of $\rho_\alpha\omega$ is contained in the intersection of the supports of $\rho_\alpha$ and $\omega$; it is a closed subset of the compact support of $\omega$, hence compact.

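On the other hand, since $U_\alpha$ is diffeomorphic to either $\mathbb{R}^n$ or $\mathbb{H}^n$ and $\rho_\alpha\omega$ has compact support in $U_\alpha$, it is immediate that

$$\int_M d(\rho_\alpha\omega)=\int_{U_\alpha}d(\rho_\alpha\omega)=\int_{\partial U_\alpha}\rho_\alpha\omega=\int_{\partial M}\rho_\alpha\omega.$$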

This furnishes the proof for the general case.

As we have seen, to prove a global statement we work locally. If you have trouble with this terminology, never mind: we will go through it right now (in a more abstract way, however). If you are already familiar with it, feel free to skip ahead.

Prerequisites

Manifold (of finite or infinite dimension)

Throughout, we use bold letters like $\mathbf{E}$, $\mathbf{F}$ to denote Banach spaces. We treat Euclidean spaces as a special case rather than as a restriction. Since Banach spaces are not necessarily finite-dimensional, this approach can be troublesome, but the benefit is a better view of the abstraction.

Let $X$ be a set. An atlas of class $C^p$ ($p \geq 0$) on $X$ is a collection of pairs $(U_i,\varphi_i)$ where $i$ ranges through some indexing set, satisfying the following conditions:

AT 1. Each $U_i$ is a subset of $X$ and $\bigcup_{i}U_i=X$.

AT 2. Each $\varphi_i$ is a bijection of $U_i$ onto an open subset $\varphi_iU_i$ of some Banach space $\mathbf{E}_i$, and for any $i$ and $j$, $\varphi_i(U_i \cap U_j)$ is open in $\mathbf{E}_i$.

AT 3. The map

$$\varphi_j\circ\varphi_i^{-1}:\varphi_i(U_i \cap U_j) \to \varphi_j(U_i \cap U_j)$$

is a $C^p$-isomorphism for all $i$ and $j$.

One should be advised that isomorphism here does not come from group theory, but from category theory. Precisely speaking, it is an isomorphism in the category $\mathfrak{O}$ whose objects are the open subsets of Banach spaces and whose morphisms are the maps of class $C^p$.

Also, one can give $X$ a topology in a unique way such that each $U_i$ is open and each $\varphi_i$ is a topological isomorphism. We see no need to assume that $X$ is Hausdorff unless we start with Hausdorff spaces. Lifting this restriction gives us more freedom (though sometimes also more difficulty).

For condition AT 2, we did not require that the vector spaces be the same for all indexes $i$, or even that they be toplinearly isomorphic. If they are all equal to the same space $\mathbf{E}$, then we say that the atlas is an $\mathbf{E}$-atlas.

Suppose that we are given an open subset $U$ of $X$ and a topological isomorphism $\varphi:U \to U'$ onto an open subset of some Banach space $\mathbf{E}$. We shall say that $(U,\varphi)$ is compatible with the atlas $(U_i,\varphi_i)_i$ if each map $\varphi_i\circ\varphi^{-1}$ (defined on a suitable intersection, as in AT 3) is a $C^p$-isomorphism. Two atlases are said to be compatible if each chart of one is compatible with the other atlas. It can be verified that this is an equivalence relation. An equivalence class of atlases of class $C^p$ on $X$ is said to define a structure of $C^p$-manifold on $X$. If all the vector spaces $\mathbf{E}_i$ in some atlas are toplinearly isomorphic, we can always find an equivalent atlas for which they are all equal, say to the space $\mathbf{E}$. In this case, we say that $X$ is an $\mathbf{E}$-manifold or that $X$ is modeled on $\mathbf{E}$.

As we know, $\mathbb{R}^n$ is a Banach space. If $\mathbf{E}=\mathbb{R}^n$ for some fixed $n$, then we say that the manifold is $n$-dimensional. In this case we also have local coordinates. A chart

$$\varphi:U \to \mathbb{R}^n$$

is given by $n$ coordinate functions $\varphi_1,\cdots,\varphi_n$. If $P$ denotes a point of $U$, these functions are often written

$$x_1(P),\cdots,x_n(P),$$

or simply $x_1,\cdots,x_n$.

Topological prerequisites

Let $X$ be a topological space. A covering $\mathfrak{U}$ of $X$ is locally finite if every point $x$ has a neighborhood $W$ that meets only finitely many members of $\mathfrak{U}$ (as you will see, this prevents some nonsensical summations). A refinement of a covering $\mathfrak{U}$ is a covering $\mathfrak{U}'$ such that for any $U' \in \mathfrak{U}'$, there exists some $U \in \mathfrak{U}$ such that $U' \subset U$. If we write $\mathfrak{U} \leq \mathfrak{U}'$ in this case, we see that the set of open covers of a topological space forms a directed set.

A topological space is paracompact if it is Hausdorff and every open covering has a locally finite open refinement. Here are some examples of paracompact spaces.

  1. Any compact Hausdorff space.
  2. Any CW complex.
  3. Any metric space (hence $\mathbb{R}^n$).
  4. Any regular Hausdorff Lindelöf space.
  5. Any locally compact Hausdorff $\sigma$-compact space.

These are not too difficult to prove, and one can easily find proofs on the Internet. Below are several key properties of paracompact spaces.

If $X$ is paracompact, then $X$ is normal. (Proof here)

Let $X$ be a paracompact (hence normal) space and $\mathfrak{U}=(U_i)$ a locally finite open cover. Then there exists a locally finite open covering $\mathfrak{V}=(V_i)$ such that $\overline{V_i} \subset U_i$. (Proof here. Note that the axiom of choice is assumed.)

One can find proofs of the following propositions in Elements of Mathematics, General Topology, Chapters 1-4 by N. Bourbaki. It is interesting to compare them with the corresponding statements for compact spaces.

Every closed subspace $F$ of a paracompact space $X$ is paracompact.

The product of a paracompact space and a compact space is paracompact.

Let $X$ be a locally compact paracompact space. Then every open covering $\mathfrak{R}$ of $X$ has a locally finite open refinement $\mathfrak{R}’$ formed of relatively compact sets. If $X$ is $\sigma$-compact then $\mathfrak{R}’$ can be taken to be countable.

Partition of unity

A partition of unity (of class $C^p$) on a manifold $X$ consists of an open covering $(U_i)$ of $X$ and a family of functions

$$\psi_i:X \to \mathbb{R}$$

satisfying the following conditions:

PU 1. For all $x \in X$ we have $\psi_i(x) \geq 0$.

PU 2. The support of $\psi_i$ is contained in $U_i$.

PU 3. The covering is locally finite.

PU 4. For each point $x \in X$ we have

$$\sum_i\psi_i(x)=1.$$

The sum in PU 4 makes sense because, for a given point $x$, there are only finitely many $i$ such that $\psi_i(x) >0$, according to PU 2 and PU 3.

A manifold $X$ will be said to admit partitions of unity if it is paracompact, and if, given a locally finite open covering $(U_i)$, there exists a partition of unity $(\psi_i)$ such that the support of $\psi_i$ is contained in $U_i$.
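To make PU 1 to PU 4 concrete, here is a minimal sketch on $X=\mathbb{R}$ with the simplest possible choice, which is only of class $C^0$. The covering $U_i=(i-\frac{3}{2},i+\frac{3}{2})$ for $i \in \mathbb{Z}$ and the hat functions `psi` are assumptions made for this illustration, not objects taken from the references.

```python
import math

def psi(i: int, x: float) -> float:
    """Hat function centred at the integer i, supported on [i - 1, i + 1].

    The support lies inside U_i = (i - 3/2, i + 3/2), as PU 2 requires.
    """
    return max(0.0, 1.0 - abs(x - i))

def partition_sum(x: float) -> float:
    """Sum of psi_i(x) over all i; only indices near x can contribute (PU 3)."""
    lo, hi = math.floor(x) - 1, math.ceil(x) + 1
    return sum(psi(i, x) for i in range(lo, hi + 1))

for x in (-2.75, 0.0, 0.3, 10.5):
    print(x, partition_sum(x))
```

Each $x$ is hit by at most two of the hat functions, so the sum in PU 4 is a finite sum, as the local finiteness condition predicts.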

Bump function

This function will be useful when dealing with the finite-dimensional case.

For every integer $n \geq 1$ and every real number $\delta>0$ there exists a map $\psi_n \in C^{\infty}(\mathbb{R}^n;\mathbb{R})$ which equals $1$ on $B(0,1)$ and vanishes on $\mathbb{R}^n\setminus B(0,1+\delta)$.

Proof. It suffices to prove it for $\mathbb{R}$, since once we have proved the existence of $\psi_1$, we may write

$$\psi_n(x)=\psi_1(|x|).$$

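Consider the function $\phi: \mathbb{R} \to \mathbb{R}$ defined, for fixed real numbers $a<b$, by

$$\phi(t)=\begin{cases}\exp\left(-\dfrac{1}{(t-a)(b-t)}\right) & a<t<b, \\ 0 & \text{otherwise}.\end{cases}$$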

The reader may have seen it in some analysis course and should be able to check that $\phi \in C^{\infty}(\mathbb{R};\mathbb{R})$. Integrate $\phi$ from $-\infty$ to $x$ and divide by $\lVert \phi \rVert_1$ (you may have done this in probability theory) to obtain

$$\theta(x)=\frac{\int_{-\infty}^{x}\phi(t)dt}{\int_{-\infty}^{\infty}\phi(t)dt};$$

it is immediate that $\theta(x)=0$ for $x \leq a$ and $\theta(x)=1$ for $x \geq b$. Taking $a=1$ and $b=(1+\delta)^2$, our job is done by letting $\psi_1(x)=1-\theta(x^2)$. Since $x^2=|x|^2$, the same formula with $|x|^2$ in place of $x^2$ defines $\psi_n$ on $\mathbb{R}^n$ directly, so the reduction from $\psi_n$ to $\psi_1$ is in fact redundant. $\square$
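The construction in this proof can be sketched numerically. The code below is an illustration only: it assumes the standard bump $\exp(-1/((t-a)(b-t)))$ on $(a,b)$ for $\phi$ and replaces the exact integral defining $\theta$ by a midpoint rule, so the names `phi`, `theta`, `psi` and the parameter `steps` are choices of this sketch.

```python
import math

def phi(t: float, a: float, b: float) -> float:
    """Smooth function that is positive on (a, b) and zero elsewhere."""
    if a < t < b:
        return math.exp(-1.0 / ((t - a) * (b - t)))
    return 0.0

def theta(x: float, a: float, b: float, steps: int = 2000) -> float:
    """Integral of phi from -inf to x divided by its total integral (midpoint rule)."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    h = (b - a) / steps
    total = 0.0
    partial = 0.0
    for k in range(steps):
        mid = a + (k + 0.5) * h
        val = phi(mid, a, b) * h
        total += val
        if mid <= x:
            partial += val
    return partial / total

def psi(x, delta: float) -> float:
    """Bump on R^n: equals 1 where |x| <= 1 and 0 where |x| >= 1 + delta."""
    r2 = sum(c * c for c in x)  # |x|^2
    return 1.0 - theta(r2, 1.0, (1.0 + delta) ** 2)

print(psi((0.3, 0.4), 0.5), psi((1.2, 0.0), 0.5), psi((2.0, 0.0), 0.5))
```

Working with $|x|^2$ rather than $|x|$ avoids the non-smooth point of the norm at the origin, which is the reason for the choice $b=(1+\delta)^2$.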

In the following blog posts, we will generalize this to Hilbert spaces.

Is partition of unity ALWAYS available?

Of course this is desirable. But we will give an example showing that sometimes no satisfactory partition of unity exists.

Let $D$ be a connected bounded open set in $\ell^p$ where $p$ is not an even integer. Assume $f$ is a real-valued function, continuous on $\overline{D}$ and $n$-times differentiable in $D$ with $n \geq p$. Then $f(\overline{D}) \subset \overline{f(\partial D)}$.

(Corollary) Let $f$ be an $n$-times differentiable function on $\ell^p$ space, where $n \geq p$, and $p$ is not an even integer. If $f$ has its support in a bounded set, then $f$ is identically zero.

It follows that for $n \geq p$, $C^n$ partitions of unity do not exist on $\ell^p$ whenever $p$ is not an even integer. For example, $\ell^1[0,1]$ does not have a $C^2$ partition of unity. It is then our duty to find out under what conditions the desired partition of unity is available.

Existence of partition of unity

Below are two theorems about the existence of partitions of unity. We do not prove them here but in a future blog post, since the proofs are rather long. The restrictions on $X$ are acceptable. For example, $\mathbb{R}^n$ is locally compact, and hence so is every manifold modeled on $\mathbb{R}^n$.

Let $X$ be a manifold which is locally compact Hausdorff and whose topology has a countable base. Then $X$ admits partitions of unity.

Let $X$ be a paracompact manifold of class $C^p$, modeled on a separable Hilbert space $\mathbf{E}$. Then $X$ admits partitions of unity (of class $C^p$).

References

  • N. Bourbaki, Elements of Mathematics
  • S. Lang, Fundamentals of Differential Geometry
  • M. Berger, Differential Geometry: Manifolds, Curves, and Surfaces
  • R. Bonic and J. Frampton, Differentiable Functions on Certain Banach Spaces

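Several classical estimates from Stirling’s formula

Stirling’s formula

For the $\Gamma$ function we have a classical limit (see ProofWiki for a proof):

$$\lim_{x\to\infty}\frac{\Gamma(x+1)}{(x/e)^x\sqrt{2\pi x}}=1.$$

With this formula we can immediately compute some limits that are otherwise hard to evaluate. Written for natural numbers, the formula reads

$$\lim_{n\to\infty}\frac{n!}{(n/e)^n\sqrt{2\pi n}}=1,$$

so we can immediately compute the following limit:

But Stirling’s formula offers more than that. In this post we will see several classical estimates.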

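Bounds for the original sequence

The conclusion of this section is

If you evaluate the number on the right with a calculator, you will find that $\phi_n=\frac{n!}{(n/e)^n\sqrt{2\pi n}}$ always stays close to $1$.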
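Instead of a calculator, a few lines of Python show how close $\phi_n$ stays to $1$. This is only a numerical sketch: the function name `phi` is a choice made here, and the value is computed through `math.lgamma` so that large $n$ does not overflow.

```python
import math

def phi(n: int) -> float:
    """Return n! / ((n/e)^n * sqrt(2*pi*n)), computed through logarithms."""
    log_phi = (math.lgamma(n + 1)            # ln(n!)
               - n * math.log(n) + n         # -ln((n/e)^n)
               - 0.5 * math.log(2 * math.pi * n))
    return math.exp(log_phi)

for n in (1, 2, 5, 10, 100, 1000):
    print(n, phi(n))
```

For $n=1$ the value is $e/\sqrt{2\pi}$, and the printed values then decrease towards $1$.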

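For $m=1,2,3,\dots$, define a piecewise linear function lying below $y=\ln(x)$:

$$f(x)=(m+1-x)\ln m+(x-m)\ln(m+1),$$

where $m \leq x \leq m+1$. Define another piecewise linear function lying above it:

$$g(x)=\frac{x}{m}-1+\ln m,$$

where $m-1/2 \leq x < m+1/2$. If you plot $f$, $\ln{x}$ and $g$, you will see that $f$ (made of chords) and $g$ (made of tangent lines) both approximate $\ln{x}$. Moreover, for $x \geq 1$ we have

$$f(x) \leq \ln x \leq g(x).$$

So for the definite integrals we get

$$\int_1^n f(x)dx \leq \int_1^n \ln x dx \leq \int_1^n g(x)dx.$$

But the relation between $f$ and $g$ is not that simple. Computing the integral of $f$, we find

$$\int_1^n f(x)dx=\ln(n!)-\frac{1}{2}\ln n,$$

while for $g$ we have

$$\int_1^n g(x)dx=\ln(n!)-\frac{1}{2}\ln n+\frac{1}{8}-\frac{1}{8n}.$$

This shows that

Summarising the inequalities above, we obtain, for $n>1$:

Subtracting $\int_1^n \ln x dx$ from every term of the inequality, we further get

By Stirling’s formula we know that

$$\lim_{n\to\infty}\left[\ln(n!)-\left(\frac{1}{2}+n\right)\ln n+n\right]=\ln\sqrt{2\pi},$$

and the sequence $x_n=-\frac{1}{8n}+\ln(n!)-(\frac{1}{2}+n)\ln{n}+n$ is monotonically increasing and, by the limit above, converges to $\ln\sqrt{2\pi}$. On the left-hand side of the inequality we take the supremum $\ln\sqrt{2\pi}$. On the right-hand side we take the infimum $x_1+\frac{1}{8}=1$. This gives us

which in turn leads to

This holds for all $n =1,2,3,\dots$.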

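Shifting the $\Gamma$ function

For every $c \in \mathbb{R}$ we have

$$\lim_{x\to\infty}\frac{\Gamma(x+c)}{x^c\Gamma(x)}=1.$$

This can be read as follows: after shifting $\Gamma(x)$ to the left by $c$, its value is close to $x^c\Gamma(x)$ when $x$ is large enough. The proof is fairly simple, although the computation is tedious: we only need Stirling’s formula.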
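As a quick numerical sanity check of the limit $\Gamma(x+c)/(x^c\Gamma(x)) \to 1$, the following sketch evaluates the ratio for a few values of $c$. The helper name `shift_ratio` and the sample values are assumptions of this illustration; logarithms of $\Gamma$ are used so that large $x$ does not overflow.

```python
import math

def shift_ratio(x: float, c: float) -> float:
    """Gamma(x + c) / (x**c * Gamma(x)) for x > 0 and x + c > 0, via lgamma."""
    return math.exp(math.lgamma(x + c) - c * math.log(x) - math.lgamma(x))

for c in (-2.5, 0.5, 3.0):
    print(c, [shift_ratio(x, c) for x in (10.0, 1e3, 1e6)])
```

For $c=1$ the ratio is exactly $1$ for every $x$, since $\Gamma(x+1)=x\Gamma(x)$; for other $c$ it only approaches $1$.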

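Now the limits of these three factors are easy to compute. Clearly we have

as well as

Finally,

Hence the original limit is $1$. The computation itself is quite elegant. Note that if we replace $x$ and $c$ by a positive integer $n$ and an integer $k$, we also have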

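Estimating a definite integral

Combining with Bernoulli’s inequality we have

Next we give a finer estimate. In fact,

By the definition of the function $B(x,y)$,

$$B(x,y)=\int_0^1 t^{x-1}(1-t)^{y-1}dt.$$

Letting $t=u^2$, we obtain

$$B(x,y)=2\int_0^1 u^{2x-1}(1-u^2)^{y-1}du.$$

Substituting $x=\frac{1}{2}$ and $y=n+1$ brings us close to the desired result:

$$B\left(\frac{1}{2},n+1\right)=2\int_0^1(1-u^2)^n du.$$

Note that the second expression of the $B$ function lets us compute $\Gamma(\frac{1}{2})$. In fact,

$$\Gamma\left(\frac{1}{2}\right)^2=B\left(\frac{1}{2},\frac{1}{2}\right)=2\int_0^1\frac{du}{\sqrt{1-u^2}}=\pi.$$

Hence $\Gamma(\frac{1}{2})=\sqrt{\pi}$. For $B(\frac{1}{2},n+1)$, we can now use the shift formula above:

Hence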
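The formulas of this section can be checked numerically. The sketch below compares $B(\frac{1}{2},n+1)=\Gamma(\frac{1}{2})\Gamma(n+1)/\Gamma(n+\frac{3}{2})$ with a midpoint-rule value of $2\int_0^1(1-u^2)^n du$, and then illustrates the consequence of the shift formula that $\sqrt{n}\,B(\frac{1}{2},n+1)$ approaches $\sqrt{\pi}$. The function names and the number of quadrature steps are choices made for this illustration.

```python
import math

def beta(x: float, y: float) -> float:
    """B(x, y) = Gamma(x) * Gamma(y) / Gamma(x + y), evaluated through lgamma."""
    return math.exp(math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y))

def integral(n: int, steps: int = 20000) -> float:
    """Midpoint rule for 2 * int_0^1 (1 - u^2)^n du."""
    h = 1.0 / steps
    return 2.0 * h * sum((1.0 - ((k + 0.5) * h) ** 2) ** n for k in range(steps))

print(integral(5), beta(0.5, 6))
for n in (10, 1000, 10**6):
    print(n, math.sqrt(n) * beta(0.5, n + 1), math.sqrt(math.pi))
```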

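Extra content

Finally we prove an identity that has nothing to do with Stirling’s formula:

$$\prod_{k=1}^{n-1}\Gamma\left(\frac{k}{n}\right)=\frac{(2\pi)^{\frac{n-1}{2}}}{\sqrt{n}}.$$

By the classical fundamental theorem of algebra, we immediately have

$$x^n-1=\prod_{k=0}^{n-1}\left(x-e^{\frac{2k\pi i}{n}}\right).$$

On the other hand, note that

$$x^n-1=(x-1)(x^{n-1}+x^{n-2}+\cdots+x+1), \quad\text{so}\quad \prod_{k=1}^{n-1}\left(x-e^{\frac{2k\pi i}{n}}\right)=x^{n-1}+\cdots+x+1.$$

When $x=1$, we have

$$\prod_{k=1}^{n-1}\left(1-e^{\frac{2k\pi i}{n}}\right)=n,$$

that is (taking absolute values and using $|1-e^{i\theta}|=2\sin\frac{\theta}{2}$ for $0 \leq \theta \leq 2\pi$),

$$\prod_{k=1}^{n-1}\sin\frac{k\pi}{n}=\frac{n}{2^{n-1}}.$$

Taking Euler’s reflection formula into account, for $1 \leq k \leq n-1$ we have

$$\Gamma\left(\frac{k}{n}\right)\Gamma\left(1-\frac{k}{n}\right)=\frac{\pi}{\sin\frac{k\pi}{n}}.$$

If $n$ is odd, the results above give

Here we have used only half of the values of $k$. To use the other half, we just swap $k$ and $n-k$, which yields

as desired. If $n$ is even, it suffices to take the term $1/2$ out separately and compute the two halves.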
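The two product formulas involved here, $\prod_{k=1}^{n-1}\sin\frac{k\pi}{n}=\frac{n}{2^{n-1}}$ and the classical $\prod_{k=1}^{n-1}\Gamma(\frac{k}{n})=\frac{(2\pi)^{(n-1)/2}}{\sqrt{n}}$, are easy to test numerically. The sketch below does so for small $n$; the function names are hypothetical and `math.prod` requires Python 3.8 or later.

```python
import math

def sine_product(n: int) -> float:
    """Product of sin(k*pi/n) for k = 1, ..., n-1."""
    return math.prod(math.sin(k * math.pi / n) for k in range(1, n))

def gamma_product(n: int) -> float:
    """Product of Gamma(k/n) for k = 1, ..., n-1."""
    return math.prod(math.gamma(k / n) for k in range(1, n))

def closed_form(n: int) -> float:
    """(2*pi)^((n-1)/2) / sqrt(n)."""
    return (2 * math.pi) ** ((n - 1) / 2) / math.sqrt(n)

for n in range(2, 8):
    print(n, sine_product(n), n / 2 ** (n - 1), gamma_product(n), closed_form(n))
```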


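Update, 2020.11.9

We give two limits that look very hard to compute.

If we replace $n!$ directly by Stirling’s formula, the result of this limit is obvious.

So it only remains to find the limit of $(1+\frac{1}{n})^{n^2}e^{-n}$. But do not take it for granted that this limit is $1$. Using the Taylor expansion, we get

$$\left(1+\frac{1}{n}\right)^{n^2}e^{-n}=\exp\left(n^2\ln\left(1+\frac{1}{n}\right)-n\right)=\exp\left(-\frac{1}{2}+O\left(\frac{1}{n}\right)\right)\to e^{-\frac{1}{2}}.$$

Hence the original limit is $\sqrt\frac{2\pi}{e}$.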
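The warning about $(1+\frac{1}{n})^{n^2}e^{-n}$ can be illustrated numerically: the values approach $e^{-1/2}\approx 0.6065$, not $1$. In this sketch the name `term` is a choice of the illustration, and `math.log1p` is used so that $\ln(1+\frac{1}{n})$ stays accurate for large $n$.

```python
import math

def term(n: int) -> float:
    """(1 + 1/n)^(n^2) * e^(-n), computed as exp(n^2 * log1p(1/n) - n)."""
    return math.exp(n * n * math.log1p(1.0 / n) - n)

for n in (10, 100, 10**4, 10**6):
    print(n, term(n), math.exp(-0.5))
```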

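Note that multiplying the numerators of the $n$ terms gives $\exp(n-1-\frac{1}{2}-\cdots-\frac{1}{n})$, and the harmonic series diverges. To obtain convergence, it is natural to think of Euler’s constant $\gamma=\lim_{n\to\infty}\left(1+\frac{1}{2}+\cdots+\frac{1}{n}-\ln{n}\right)$. There also seems to be no way to simplify the denominator directly: we know that the limit of $(1+1/k)^k$ is $e$, but that does not seem to help here. So we may as well expand and simplify the denominator first.

Hence the original limit can be written as

Now Stirling’s formula can be applied directly.

Since $\lim_{n\to\infty}\left(1+\frac{1}{n}\right)^{-n}=e^{-1}$ and $\lim_{n\to\infty}e^{\ln{n}-1-\frac{1}{2}-\frac{1}{3}-\cdots-\frac{1}{n}}=e^{-\gamma}$, the original limit is $\frac{\sqrt{2\pi}}{e^{1+\gamma}}$.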

A proof of the ordinary Gleason-Kahane-Żelazko theorem for complex functionals

The Theorem

(Gleason-Kahane-Żelazko) If $\phi$ is a complex linear functional on a unitary Banach algebra $A$, such that $\phi(e)=1$ and $\phi(x) \neq 0$ for every invertible $x \in A$, then

$$\phi(xy)=\phi(x)\phi(y) \quad (x \in A, y \in A).$$

Namely, $\phi$ is a complex homomorphism.

Notations and remarks

Suppose $A$ is a complex unitary Banach algebra and $\phi: A \to \mathbb{C}$ is a linear functional which is not identically $0$ (for convenience). If

$$\phi(xy)=\phi(x)\phi(y)$$

for all $x \in A$ and $y \in A$, then $\phi$ is called a complex homomorphism on $A$. Note that a unitary Banach algebra (with $e$ as multiplicative unit) is also a ring, and so is $\mathbb{C}$, so we may say that in this case $\phi$ is a ring-homomorphism. For such $\phi$, we have an immediate proposition:

Proposition 0 $\phi(e)=1$ and $\phi(x) \neq 0$ for every invertible $x \in A$.

Proof. Since $\phi(e)=\phi(ee)=\phi(e)\phi(e)$, we have $\phi(e)=0$ or $\phi(e)=1$. If $\phi(e)=0$ however, for any $y \in A$, we have $\phi(y)=\phi(ye)=\phi(y)\phi(e)=0$, which is an excluded case. Hence $\phi(e)=1$.

For invertible $x \in A$, note that $\phi(xx^{-1})=\phi(x)\phi(x^{-1})=\phi(e)=1$. This can’t happen if $\phi(x)=0$. $\square$

The theorem reveals that Proposition $0$ actually characterizes the complex homomorphisms (ring-homomorphisms) among the linear functionals (group-homomorphisms).

This theorem was proved by Andrew M. Gleason in 1967 and later independently by J.-P. Kahane and W. Żelazko in 1968. Both of them worked mainly on commutative Banach algebras, and the non-commutative version, which focused on complex homomorphism, was by W. Żelazko. In this post we will follow the third one.

Unfortunately, one cannot find an educational proof on the Internet with ease, which may be the reason why I write this post and why you read this.

Equivalences

Following definitions of Banach algebra and some logic manipulation, we have several equivalences worth noting.

Subspace and ideal version

(Stated by Gleason) Let $M$ be a linear subspace of codimension one in a commutative Banach algebra $A$ having an identity. Suppose no element of $M$ is invertible, then $M$ is an ideal.

(Stated by Kahane and Żelazko) A subspace $X \subset A$ of codimension $1$ is a maximal ideal if and only if it consists of non-invertible elements.

Spectrum version

(Stated by Kahane and Żelazko) Let $A$ be a commutative complex Banach algebra with unit element. Then a functional $f \in A^\ast$ is a multiplicative linear functional if and only if $f(x) \in \sigma(x)$ holds for all $x \in A$.

Here $\sigma(x)$ denotes the spectrum of $x$.

The connection

Clearly a maximal ideal contains no invertible element (otherwise it would contain $e$ and hence be the whole ring), and since, in the commutative case, every maximal ideal is the kernel of some complex homomorphism, it has codimension $1$. So it remains to show that a subspace $X \subset A$ of codimension $1$ consisting of non-invertible elements is an ideal. For such a subspace, since $e \notin X$, there is a unique linear functional $\phi$ with kernel $X$ and $\phi(e)=1$. Then $\phi(x) \in \sigma(x)$ for all $x \in A$, because $x-\phi(x)e \in X$ is not invertible. Conversely, if $\phi(x) \in \sigma(x)$ for all $x \in A$, then $\phi(e)=1$ and $\phi(x) \neq 0$ for every invertible $x$. As we will show, $\phi$ has to be a complex homomorphism.

Tools to prove the theorem

Lemma 0 Suppose $A$ is a unitary Banach algebra, $x \in A$, $\lVert x \rVert<1$, then $e-x$ is invertible.

This lemma can be found in any functional analysis book introducing Banach algebra.

Lemma 1 Suppose $f$ is an entire function of one complex variable, $f(0)=1$, $f'(0)=0$, and

$$0<|f(\lambda)| \leq e^{|\lambda|}$$

for all complex $\lambda$, then $f(\lambda)=1$ for all $\lambda \in \mathbb{C}$.

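Proof. Since $f$ has no zero, there is an entire function $g$ such that $f=\exp(g)$, $g(0)=g'(0)=0$ and $\Re g(\lambda) \leq |\lambda|$. It can be shown that $g=0$. Indeed, for $r>0$ the last inequality gives $|g(\lambda)| \leq |2r-g(\lambda)|$ whenever $|\lambda| \leq r$. If we put

$$h_r(\lambda)=\frac{r^2g(\lambda)}{\lambda^2(2r-g(\lambda))},$$

then we see $h_r$ is holomorphic in the open disk centred at $0$ with radius $2r$. Besides, $|h_r(\lambda)| \leq 1$ if $|\lambda|=r$. By the maximum modulus theorem, we have

$$|h_r(\lambda)| \leq 1$$

whenever $|\lambda| \leq r$. Fix $\lambda$ and let $r \to \infty$; by the definition of $h_r(\lambda)$, we must have $g(\lambda)=0$. $\square$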

Jordan homomorphism

A map $\phi$ from one algebra $R$ to another algebra $R'$ is said to be a Jordan homomorphism from $R$ to $R'$ if

$$\phi(a+b)=\phi(a)+\phi(b)$$

and

$$\phi(ab+ba)=\phi(a)\phi(b)+\phi(b)\phi(a).$$

It is clear that every algebra homomorphism is Jordan. Note that if $R'$ is not of characteristic $2$, the second identity is equivalent to

$$\phi(a^2)=\phi(a)^2.$$

To show the equivalence, one lets $b=a$ in the first case and puts $a+b$ in place of $a$ in the second case.

Since in this case $R=A$ and $R'=\mathbb{C}$, the latter of which is commutative, we also write

$$\phi(ab+ba)=2\phi(a)\phi(b).$$

As we will show, the $\phi$ in the theorem is a Jordan homomorphism.

Proof

We will follow an unusual approach. By repeatedly ‘downgrading’ the goal, one will see this algebraic problem transformed neatly into a pure analysis problem.

To begin with, let $N$ be the kernel of $\phi$.

Step 1 - It suffices to prove that $\phi$ is a Jordan homomorphism

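If $\phi$ is a complex homomorphism, it is immediate that $\phi$ is a Jordan homomorphism. Conversely, if $\phi$ is Jordan, we have

$$\phi(xy+yx)=2\phi(x)\phi(y).$$

If $x\in N$, the right-hand side becomes $0$, and therefore

$$xy+yx \in N \quad \text{if } x \in N, y \in A.$$

Consider the identity

$$(xy-yx)^2+(xy+yx)^2=2[x(yxy)+(yxy)x].$$

Therefore, using $\phi(a^2)=\phi(a)^2$,

$$\phi(xy-yx)^2+\phi(xy+yx)^2=2\phi(x(yxy)+(yxy)x).$$

Since $x \in N$ and $yxy \in A$, we see $x(yxy)+(yxy)x \in N$, and we already know that $xy+yx \in N$. Therefore $\phi(xy-yx)=0$ and

$$xy-yx \in N$$

if $x \in N$ and $y \in A$. Further we see

$$xy=\frac{1}{2}\left[(xy+yx)+(xy-yx)\right] \in N, \quad yx=\frac{1}{2}\left[(xy+yx)-(xy-yx)\right] \in N,$$

which implies that $N$ is an ideal. This may remind you of this classic diagram (we will not use it since it is additive though):

For $x,y \in A$, we have $x \in \phi(x)e+N$ and $y \in \phi(y)e+N$. As a result, $xy \in \phi(x)\phi(y)e+N$, and therefore

$$\phi(xy)=\phi(x)\phi(y).$$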
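Step 1 relies on a purely algebraic identity valid in any associative algebra, $(xy-yx)^2+(xy+yx)^2=2[x(yxy)+(yxy)x]$. It can be sanity-checked in a non-commutative setting with integer matrices, where the arithmetic is exact. The helper names below (`matmul`, `add`, `identity_holds`, `random_matrix`) are hypothetical and only serve this sketch.

```python
import random

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(a, b, sign=1):
    """Entrywise a + sign * b."""
    return [[p + sign * q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def identity_holds(x, y) -> bool:
    """Check (xy - yx)^2 + (xy + yx)^2 == 2 * (x(yxy) + (yxy)x)."""
    xy, yx = matmul(x, y), matmul(y, x)
    minus, plus = add(xy, yx, -1), add(xy, yx)
    lhs = add(matmul(minus, minus), matmul(plus, plus))
    yxy = matmul(y, xy)
    half = add(matmul(x, yxy), matmul(yxy, x))
    return lhs == add(half, half)

def random_matrix(n: int = 3):
    return [[random.randint(-9, 9) for _ in range(n)] for _ in range(n)]

random.seed(0)
print(all(identity_holds(random_matrix(), random_matrix()) for _ in range(50)))
```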

Step 2 - It suffices to prove that $\phi(a^2)=0$ if $\phi(a)=0$.

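Again, if $\phi$ is Jordan, we have $\phi(x^2)=\phi(x)^2$ for all $x \in A$, so $\phi(a^2)=0$ whenever $\phi(a)=0$. Conversely, if $\phi(a^2)=0$ for all $a \in N$, we may write every $x \in A$ as

$$x=\phi(x)e+a$$

where $a \in N$. Therefore

$$x^2=\phi(x)^2e+2\phi(x)a+a^2, \quad \phi(x^2)=\phi(x)^2,$$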

which also shows that $\phi$ is Jordan.

Step 3 - It suffices to show that the following function is constant

Fix $a \in N$, assume $\lVert a \rVert = 1$ without loss of generality, and define

$$f(\lambda)=\sum_{n=0}^{\infty}\frac{\phi(a^n)}{n!}\lambda^n$$

for all complex $\lambda$. If this function is constant (lemma 1), we immediately have $f’’(0)=\phi(a^2)=0$. This is purely a complex analysis problem however.

Step 4 - It suffices to describe the behaviour of an entire function

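Note in the definition of $f$, we have

$$|\phi(a^n)| \leq \lVert\phi\rVert\lVert a^n \rVert \leq \lVert\phi\rVert\lVert a \rVert^n=\lVert\phi\rVert.$$

So we need the norm of $\phi$ to be finite, which ensures that $f$ is entire. Arguing by contradiction, if $\lVert e-a \rVert < 1$ for some $a \in N$, then by lemma 0, $e-(e-a)=a$ is invertible, which is impossible since $\phi(a)=0$. Hence $\lVert e-a \rVert \geq 1$ for all $a \in N$. On the other hand, for $\lambda \in \mathbb{C}\setminus\{0\}$ and $a \in N$, we have the following inequality:

$$|\phi(\lambda e-a)|=|\lambda| \leq |\lambda|\lVert e-\lambda^{-1}a \rVert=\lVert \lambda e-a \rVert.$$

Since every element of $A$ has the form $\lambda e-a$ with $a \in N$, $\phi$ is continuous with norm at most $1$; in fact the norm is exactly $1$, since $\phi(e)=1$ and $\lVert e \rVert=1$. The continuity of $\phi$ is not assumed at the beginning but proved here.

For $f$ we have some immediate facts. Since the coefficients $\phi(a^n)$ are bounded, $f$ is entire with $f(0)=\phi(e)=1$ and $f'(0)=\phi(a)=0$. Also, since $\phi$ has norm $1$, we also have

$$|f(\lambda)| \leq e^{|\lambda|}.$$

All we need in the end is to show that $f(\lambda) \neq 0$ for all $\lambda \in \mathbb{C}$.

The series

$$E(\lambda)=\sum_{n=0}^{\infty}\frac{\lambda^na^n}{n!}$$

converges since $\lVert a \rVert=1$. The continuity of $\phi$ shows now

$$f(\lambda)=\phi(E(\lambda)).$$

Note

$$E(-\lambda)E(\lambda)=E(\lambda)E(-\lambda)=e.$$

Hence $E(\lambda)$ is invertible for all $\lambda \in \mathbb{C}$, hence $f(\lambda)=\phi(E(\lambda)) \neq 0$. By lemma 1, $f(\lambda)=1$ is constant. The proof is completed by reversing the steps. $\square$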

References / Further reading

  • Walter Rudin, Real and Complex Analysis
  • Walter Rudin, Functional Analysis
  • Andrew M. Gleason, A Characterization of Maximal Ideals
  • J.-P. Kahane and W. Żelazko, A Characterization of Maximal Ideals in Commutative Banach Algebras
  • W. Żelazko A Characterization of Multiplicative linear functionals in Complex Banach Algebras
  • I. N. Herstein, Jordan Homomorphisms

The Big Three Pt. 5 - The Hahn-Banach Theorem (Dominated Extension)

About this post

The Hahn-Banach theorem has been a central tool of functional analysis and therefore comes in a wide variety of versions, many of which have numerous uses in other fields of mathematics. It is therefore not possible to cover all of them. In this post we cover two ‘abstract enough’ results, which are sometimes called the dominated extension theorems. Both of them are discussed in vector spaces on which no topology is given. This allows us to apply them in any topological vector space.

Another interesting thing is that we will be using the axiom of choice, or any equivalent statement you may like, for example Zorn’s lemma or the well-ordering principle. Before everything, we need to examine more properties of vector spaces.

Vector space

It’s obvious that every complex vector space is also a real vector space. Suppose $X$ is a complex vector space, and we shall give the definition of real-linear and complex-linear functionals.

An additive functional $\Lambda$ on $X$ is called real-linear (complex-linear) if $\Lambda(\alpha x)=\alpha\Lambda(x)$ for every $x \in X$ and for every real (complex) scalar $\alpha$.

For *-linear functionals, we have two important but easy theorems.

If $u$ is the real part of a complex-linear functional $f$ on $X$, then $u$ is real-linear and

$$f(x)=u(x)-iu(ix) \quad (x \in X).$$

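Proof. For complex $f(x)=u(x)+iv(x)$, it suffices to express $v(x)$ correctly. But

$$if(x)=iu(x)-v(x),$$

so we see $\Im(f(x))=v(x)=-\Re(if(x))$. Therefore

$$f(x)=u(x)-i\Re(if(x))=u(x)-i\Re(f(ix)),$$

but $\Re(f(ix))=u(ix)$, so we get

$$f(x)=u(x)-iu(ix).$$

To show that $u$ is real-linear, note that

$$u(x+y)+iv(x+y)=f(x+y)=f(x)+f(y)=u(x)+u(y)+i(v(x)+v(y)).$$

Therefore $u(x)+u(y)=u(x+y)$. A similar argument applies to a real scalar $\alpha$. $\square$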

Conversely, we are able to generate a complex-linear functional by a real one.

If $u$ is a real-linear functional, then $f(x)=u(x)-iu(ix)$ is a complex-linear functional

Proof. Direct computation. $\square$
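The ‘direct computation’ can be illustrated numerically on $X=\mathbb{C}^2$. In the sketch below the functional `f` and its coefficients are arbitrary choices made for the example; `rebuilt` recovers $f$ from its real part $u$ through $u(x)-iu(ix)$.

```python
def f(z):
    """A sample complex-linear functional on C^2 (coefficients chosen arbitrarily)."""
    return (2 - 1j) * z[0] + (0.5 + 3j) * z[1]

def u(z):
    """Real part of f, a real-linear functional."""
    return f(z).real

def rebuilt(z):
    """u(z) - i * u(i z), which should reproduce f(z)."""
    iz = tuple(1j * c for c in z)
    return u(z) - 1j * u(iz)

z = (1.5 - 2j, -0.25 + 4j)
print(f(z), rebuilt(z))
```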

Suppose now $X$ is a complex topological vector space. Then a complex-linear functional on $X$ is continuous if and only if its real part is continuous, and every continuous real-linear $u: X \to \mathbb{R}$ is the real part of a unique complex-linear continuous functional $f$.

Sublinear, seminorm

A sublinear functional is ‘almost’ linear but also ‘almost’ a norm. Explicitly, we say $p: X \to \mathbb{R}$ is a sublinear functional when it satisfies

$$p(x+y) \leq p(x)+p(y), \quad p(tx)=tp(x)$$

for all $x,y \in X$ and all $t \geq 0$. As one can see, if $X$ is normable, then $p(x)=\lVert x \rVert$ is a sublinear functional. This should not be confused with a semilinear functional, where no inequality is involved. Another thing worth noting is that $p$ is not required to be nonnegative.


A seminorm on a vector space $X$ is a real-valued function $p$ on $X$ such that

$$p(x+y) \leq p(x)+p(y), \quad p(\alpha x)=|\alpha|p(x)$$

for all $x,y \in X$ and scalar $\alpha$.

Obviously a seminorm is also a sublinear functional. For the connection between norm and seminorm, one shall note that $p$ is a norm if and only if it satisfies $p(x) \neq 0$ if $x \neq 0$.

Dominated extension theorems

These are the results covered in this post. Generally speaking, we are able to extend a functional defined on a subspace to the whole space as long as it is dominated by a sublinear functional. This is reminiscent of the dominated convergence theorem, which states that if a convergent sequence of measurable functions is dominated by an integrable function, then the limit can be taken under the integral sign.

(Hahn-Banach) Suppose

  1. $M$ is a subspace of a real vector space $X$,
  2. $f: M \to \mathbb{R}$ is linear and $f(x) \leq p(x)$ on $M$ where $p$ is a sublinear functional on $X$

Then there exists a linear $\Lambda: X \to \mathbb{R}$ such that

$$\Lambda x=f(x)$$

for all $x \in M$ and

$$-p(-x) \leq \Lambda x \leq p(x)$$

for all $x \in X$.

Step 1 - Extending the function by one dimension

In other words, if $f$ is dominated by a sublinear functional, then we are able to extend it to the whole space while keeping its values within a controlled range.

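Proof. If $M=X$ we have nothing to do. So suppose now $M$ is a proper subspace of $X$. Choose $x_1 \in X-M$ and define

$$M_1=\{x+tx_1: x \in M, t \in \mathbb{R}\}.$$

It’s easy to verify that $M_1$ satisfies all axioms of a vector space (warning again: no topology is endowed). Now we will be using the properties of sublinear functionals.

Since

$$f(x)+f(y)=f(x+y) \leq p(x+y) \leq p(x-x_1)+p(x_1+y)$$

for all $x,y \in M$, we have

$$f(x)-p(x-x_1) \leq p(y+x_1)-f(y).$$

Let

$$\alpha=\sup_{x \in M}\{f(x)-p(x-x_1)\}.$$

By definition, we naturally get

$$f(x)-\alpha \leq p(x-x_1)$$

and

$$f(y)+\alpha \leq p(y+x_1).$$

Define $f_1$ on $M_1$ by

$$f_1(x+tx_1)=f(x)+t\alpha.$$

So when $x +tx_1 \in M$, we have $t=0$, and therefore $f_1=f$ on $M$.

To show that $f_1 \leq p$ on $M_1$, note that for $t>0$, we have

$$f\left(\frac{x}{t}\right)+\alpha \leq p\left(\frac{x}{t}+x_1\right),$$

which, after multiplying by $t$, implies

$$f_1(x+tx_1)=f(x)+t\alpha \leq p(x+tx_1).$$

Similarly, for $t<0$,

$$f\left(\frac{x}{-t}\right)-\alpha \leq p\left(\frac{x}{-t}-x_1\right),$$

and therefore, after multiplying by $-t$,

$$f_1(x+tx_1)=f(x)+t\alpha \leq p(x+tx_1).$$

Hence $f_1 \leq p$.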
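Here is a toy instance of the one-dimensional extension, under assumptions chosen for the example: $X=\mathbb{R}^2$, $M=\{(x,0)\}$, $f(x,0)=x$, $p$ the $\ell^1$ norm and $x_1=(0,1)$. The supremum and infimum are taken over a finite grid, which in general only approximates them but happens to be exact here; any $\alpha$ between the two bounds gives an extension dominated by $p$.

```python
def p(v):
    """Sublinear functional on R^2: the l^1 norm."""
    return abs(v[0]) + abs(v[1])

def f(x):
    """Linear functional on M = {(x, 0)}, written in the coordinate x; f <= p on M."""
    return x

x1 = (0.0, 1.0)                          # a vector outside M
grid = [k / 2 for k in range(-20, 21)]   # sample points of M (and of t)

# alpha may be any number between these two bounds.
alpha_low = max(f(x) - p((x - x1[0], -x1[1])) for x in grid)
alpha_high = min(p((y + x1[0], x1[1])) - f(y) for y in grid)

def dominated(alpha: float) -> bool:
    """Check f_1(x + t*x1) = f(x) + t*alpha <= p(x + t*x1) on the grid."""
    return all(f(x) + t * alpha <= p((x + t * x1[0], t * x1[1]))
               for x in grid for t in grid)

print(alpha_low, alpha_high, dominated(0.3), dominated(1.5))
```

In this example the admissible values form the interval $[-1,1]$, which shows that the extension is far from unique.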

Step 2 - An application of Zorn’s lemma

Side note: Why Zorn’s lemma

It seems that we could go on applying step 1 forever, extending $M$ to a larger and larger space, without ever being sure to reach $X$. (If $X$ is a finite-dimensional space, then this is merely a linear algebra problem.) This is exactly the situation William Timothy Gowers described in his blog post:

If you are building a mathematical object in stages and find that (i) you have not finished even after infinitely many stages, and (ii) there seems to be nothing to stop you continuing to build, then Zorn’s lemma may well be able to help you.

— How to use Zorn’s lemma

And we will show that, as W. T. Gowers said,

If the resulting partial order satisfies the chain condition and if a maximal element must be a structure of the kind one is trying to build, then the proof is complete.


To apply Zorn’s lemma, we need to construct a partially ordered set. Let $\mathscr{P}$ be the collection of all ordered pairs $(M',f')$ where $M'$ is a subspace of $X$ containing $M$ and $f'$ is a linear functional on $M'$ that extends $f$ and satisfies $f' \leq p$ on $M'$. For example we have

$$(M,f) \in \mathscr{P}, \quad (M_1,f_1) \in \mathscr{P}.$$

The partial order $\leq$ is defined as follows. By $(M’,f’) \leq (M’’,f’’)$, we mean $M’ \subset M’’$ and $f’ = f’’$ on $M’$. Obviously this is a partial order (you should be able to check this).

Suppose now $\mathcal{F}$ is a chain (a totally ordered subset of $\mathscr{P}$). We claim that $\mathcal{F}$ has an upper bound (which is required by Zorn’s lemma). Let

$$M_0=\bigcup_{(M',f') \in \mathcal{F}}M'$$

and

$$f_0(y)=f'(y)$$

whenever $(M',f') \in \mathcal{F}$ and $y \in M'$. It’s easy to verify that $(M_0,f_0)$ is the upper bound we are looking for: $M_0$ is a subspace and $f_0$ is well defined because $\mathcal{F}$ is totally ordered. Since $\mathcal{F}$ is arbitrary, by Zorn’s lemma there exists a maximal element $(M^\ast,f^\ast)$ in $\mathscr{P}$. If $M^\ast \neq X$, then according to step 1 we are able to extend $(M^\ast,f^\ast)$, which contradicts its maximality. So $M^\ast=X$, and $\Lambda$ is defined to be $f^\ast$. By the linearity of $\Lambda$, we see

$$-p(-x) \leq -\Lambda(-x)=\Lambda x.$$

The theorem is proved. $\square$

How this proof is constructed

This is a classic application of Zorn’s lemma (well-ordering principle, or Hausdorff maximality theorem). First, we showed that we are able to extend $M$ and $f$. But since we do not know the dimension or other properties of $X$, it’s not easy to control the extension which finally ‘converges’ to $(X,\Lambda)$. However, Zorn’s lemma saved us from this random exploration: Whatever happens, the maximal element is there, and take it to finish the proof.

Generalisation onto the complex field

Since an inequality appears in the theorem above, the complex case needs more careful treatment.

(Bohnenblust-Sobczyk-Soukhomlinoff) Suppose $M$ is a subspace of a vector space $X$, $p$ is a seminorm on $X$, and $f$ is a linear functional on $M$ such that

$$|f(x)| \leq p(x)$$

for all $x \in M$. Then $f$ extends to a linear functional $\Lambda$ on $X$ satisfying

$$|\Lambda(x)| \leq p(x)$$

for all $x \in X$.

Proof. If the scalar field is $\mathbb{R}$, then we are done, since $p(-x)=p(x)$ in this case (can you see why?). So we assume the scalar field is $\mathbb{C}$.

Put $u = \Re f$. By the dominated extension theorem, there is some real-linear functional $U$ on $X$ such that $U=u$ on $M$ and $U \leq p$ on $X$. And here we have

$$\Lambda(x)=U(x)-iU(ix),$$

where $\Lambda(x)=f(x)$ on $M$.

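To show that $|\Lambda(x)| \leq p(x)$ when $\Lambda(x) \neq 0$ (the other case being trivial), take $\alpha=\frac{|\Lambda(x)|}{\Lambda(x)}$; then

$$|\Lambda(x)|=\alpha\Lambda(x)=\Lambda(\alpha x)=U(\alpha x) \leq p(\alpha x)=p(x),$$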

since $|\alpha|=1$ and $p(\alpha{x})=|\alpha|p(x)=p(x)$. $\square$

Extending Hahn-Banach theorem under linear transform

To end this post, we state a beautiful and useful extension of the Hahn-Banach theorem, which is done by R. P. Agnew and A. P. Morse.

(Agnew-Morse) Let $X$ denote a real vector space and $\mathcal{A}$ be a collection of linear maps $A_\alpha: X \to X$ that commute, namely

$$A_\alpha A_\beta=A_\beta A_\alpha$$

for all $A_\alpha,A_\beta \in \mathcal{A}$. Let $p$ be a sublinear functional such that

for all $A_\alpha \in \mathcal{A}$. Let $Y$ be a subspace of $X$ on which a linear functional $f$ is defined such that

  1. $f(y) \leq p(y)$ for all $y \in Y$.
  2. For each mapping $A$ and $y \in Y$, we have $Ay \in Y$.
  3. Under the hypothesis of 2, we have $f(Ay)=f(y)$.

Then $f$ can be extended to $X$ by $\Lambda$ so that $-p(-x) \leq \Lambda(x) \leq p(x)$ for all $x \in X$, and

$$\Lambda(A_\alpha x)=\Lambda(x).$$

To prove this theorem, we need to construct a sublinear functional that dominates $f$. For the whole proof, see Functional Analysis by Peter Lax.

The series

Since there is no strong reason to write more posts on this topic, i.e. the three fundamental theorems of linear functional analysis, I think it’s time to make a list of the series. It’s been around half a year.

References / Further Readings

  1. Walter Rudin, Functional Analysis.
  2. Peter Lax, Functional Analysis.
  3. William Timothy Gowers, How to use Zorn’s lemma.

A long exact sequence of cohomology groups (zig-zag and diagram-chasing)

Exterior differentiation

(This section is intended to introduce the background. Feel free to skip if you already know exterior differentiation.)

There are several useful tools for vector calculus on $\mathbb{R}^3,$ namely gradient, curl, and divergence. It is possible to treat the gradient of a differentiable function $f$ on $\mathbb{R}^3$ at a point $x_0$ as the Fréchet derivative at $x_0$. But it does not work for curl and divergence at all. Fortunately there is another abstraction that works for all of them. It comes from differential forms.

Let $x_1,\cdots,x_n$ be the linear coordinates on $\mathbb{R}^n$ as usual. We define an algebra $\Omega^{\ast}$ over $\mathbb{R}$ generated by $dx_1,\cdots,dx_n$ with the following relations:

$$\begin{cases}(dx_i)^2=0, \\ dx_idx_j=-dx_jdx_i, & i \neq j.\end{cases}$$

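This is a vector space as well, and it’s easy to see that it has the basis

$$1,\quad dx_i,\quad dx_idx_j,\quad dx_idx_jdx_k,\quad \cdots,\quad dx_1\cdots dx_n,$$

where $i<j<k$. The $C^{\infty}$ differential forms on $\mathbb{R}^n$ are defined to be the tensor product

$$\Omega^{\ast}(\mathbb{R}^n)=\{C^{\infty}\text{ functions on }\mathbb{R}^n\}\otimes_{\mathbb{R}}\Omega^{\ast}.$$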

As can be shown, for $\omega \in \Omega^{\ast}(\mathbb{R}^n)$, we have a unique representation by

$$\omega=\sum f_{i_1\cdots i_k}dx_{i_1}\cdots dx_{i_k},$$

and in this case (with $k$ fixed) we also say $\omega$ is a $C^{\infty}$ $k$-form on $\mathbb{R}^n$ (for simplicity we also write $\omega=\sum f_Idx_I$). The vector space of all $k$-forms will be denoted by $\Omega^k(\mathbb{R}^n)$. And naturally $\Omega^{\ast}(\mathbb{R}^n)$ is graded, since

$$\Omega^{\ast}(\mathbb{R}^n)=\bigoplus_{k=0}^{n}\Omega^k(\mathbb{R}^n).$$

The operator $d$

But if we have $\omega \in \Omega^0(\mathbb{R}^n)$, we see $\omega$ is merely a $C^{\infty}$ function. As taught in a multivariable calculus course, for the differential of $\omega$ we have

$$d\omega=\sum_{i=1}^{n}\frac{\partial\omega}{\partial x_i}dx_i,$$

and it turns out that $d\omega\in\Omega^{1}(\mathbb{R}^n)$. This inspires us to generalize the differential operator $d$:

$$d:\Omega^k(\mathbb{R}^n) \to \Omega^{k+1}(\mathbb{R}^n),$$

and $d\omega$ is defined as follows. The case $k=0$ is defined as usual (just the one above). For $k>0$ and $\omega=\sum f_I dx_I,$ $d\omega$ is defined ‘inductively’ by

$$d\omega=\sum df_Idx_I.$$

This $d$ is the so-called exterior differentiation, which serves as the ultimate abstract extension of gradient, curl, divergence, etc. If we restrict ourselves to $\mathbb{R}^3$, we see these vector calculus tools come up naturally.

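Functions

$$df=\frac{\partial f}{\partial x}dx+\frac{\partial f}{\partial y}dy+\frac{\partial f}{\partial z}dz.$$

$1$-forms

$$d(f_1dx+f_2dy+f_3dz)=\left(\frac{\partial f_3}{\partial y}-\frac{\partial f_2}{\partial z}\right)dydz-\left(\frac{\partial f_1}{\partial z}-\frac{\partial f_3}{\partial x}\right)dxdz+\left(\frac{\partial f_2}{\partial x}-\frac{\partial f_1}{\partial y}\right)dxdy.$$

$2$-forms

$$d(f_1dydz-f_2dxdz+f_3dxdy)=\left(\frac{\partial f_1}{\partial x}+\frac{\partial f_2}{\partial y}+\frac{\partial f_3}{\partial z}\right)dxdydz.$$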

The calculation is tedious but a nice exercise to understand the definition of $d$ and $\Omega^{\ast}$.

Conservative field - on the kernel and image of $d$

By elementary computation we are also able to show that $d^2\omega=0$ for all $\omega \in \Omega^{\ast}(\mathbb{R}^n)$ (Hint: $\frac{\partial^2 f}{\partial x_i \partial x_j}=\frac{\partial^2 f}{\partial x_j \partial x_i}$ but $dx_idx_j=-dx_jdx_i$). Now we consider a vector field $\overrightarrow{v}=(v_1,v_2)$ of dimension $2$. If $C$ is an arbitrary simple closed smooth curve in $\mathbb{R}^2$, then we expect

$$\oint_C v_1dx+v_2dy$$

to be $0$. If this happens (note the arbitrariness of $C$), we say $\overrightarrow{v}$ is a conservative field (path independent).

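So when is a field conservative? It happens when there is a function $f$ such that

$$\overrightarrow{v}=\nabla f=\left(\frac{\partial f}{\partial x},\frac{\partial f}{\partial y}\right).$$

This is equivalent to saying that

$$\frac{\partial v_2}{\partial x}=\frac{\partial v_1}{\partial y}.$$

If we use $C^{\ast}$ to denote the region enclosed by $C$, by Green’s theorem, we have

$$\oint_C v_1dx+v_2dy=\iint_{C^{\ast}}\left(\frac{\partial v_2}{\partial x}-\frac{\partial v_1}{\partial y}\right)dxdy.$$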

If you translate what you’ve learned in a multivariable calculus course (path independence) into the language of differential forms, you will see that the set of all conservative fields is precisely the image of $d_0:\Omega^0(\mathbb{R}^2) \to \Omega^1(\mathbb{R}^2)$. Also, they are in the kernel of the next map $d_1:\Omega^1(\mathbb{R}^2) \to \Omega^2(\mathbb{R}^2)$. These $d$’s are naturally homomorphisms, so it’s natural to discuss the factor group. But before that, we need some terminology.
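The difference between a conservative and a non-conservative field can be seen numerically by integrating around the unit circle. The two sample fields below are assumptions of this sketch: $\nabla f$ for $f(x,y)=x^2y$, which lies in the image of $d_0$, and $(-y,x)$, for which $\frac{\partial v_2}{\partial x}-\frac{\partial v_1}{\partial y}=2$.

```python
import math

def circulation(v, steps: int = 2000) -> float:
    """Midpoint-rule approximation of the line integral of v along the unit circle."""
    h = 2 * math.pi / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        x, y = math.cos(t), math.sin(t)
        v1, v2 = v(x, y)
        # dx = -sin(t) dt and dy = cos(t) dt on the unit circle
        total += (v1 * -math.sin(t) + v2 * math.cos(t)) * h
    return total

def gradient_field(x, y):
    """v = grad f for f(x, y) = x^2 * y."""
    return 2 * x * y, x * x

def rotation_field(x, y):
    """v = (-y, x), which is not a gradient."""
    return -y, x

print(circulation(gradient_field), circulation(rotation_field), 2 * math.pi)
```

The second value agrees with Green’s theorem: the integrand $2$ integrated over the unit disk gives $2\pi$.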

de Rham complex and de Rham cohomology group

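The complex $\Omega^{\ast}(\mathbb{R}^n)$ together with $d$ is called the de Rham complex on $\mathbb{R}^n$. Now consider the sequence

$$0\to\Omega^0(\mathbb{R}^n)\xrightarrow{d_0}\Omega^1(\mathbb{R}^n)\xrightarrow{d_1}\cdots\xrightarrow{d_{n-1}}\Omega^n(\mathbb{R}^n)\to 0.$$
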
We say $\omega \in \Omega^k(\mathbb{R}^n)$ is closed if $d_k\omega=0$, or equivalently, $\omega \in \ker d_k$. Dually, we say $\omega$ is exact if there exists some $\mu \in \Omega^{k-1}(\mathbb{R}^n)$ such that $d\mu=\omega$, that is, $\omega \in \operatorname{im}d_{k-1}$. Of course all $d_k$’s can be written as $d$ but the index makes it easier to understand. Instead of doing integration or differentiation, which is ‘uninteresting’, we are going to discuss the abstract structure of it.

The $k$-th de Rham cohomology of $\mathbb{R}^n$ is defined to be the factor space

$$H_{DR}^k(\mathbb{R}^n)=\frac{\ker d_k}{\operatorname{im}d_{k-1}}.$$

As an example, note that by the fundamental theorem of calculus, every $1$-form on $\mathbb{R}$ is exact, therefore $H_{DR}^1(\mathbb{R})=0$.

Since the de Rham complex is a special case of a differential complex, and the other features of the de Rham complex play no critical role hereafter, we are going to discuss the algebraic structure of differential complexes directly.

The long exact sequence of cohomology groups

We are going to show that a short exact sequence of differential complexes gives rise to a long exact sequence of cohomology groups. For convenience, let us recall some basic definitions.

Exact sequence

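A sequence of vector spaces (or groups)

$$\cdots\to V_{k-1}\xrightarrow{f_{k-1}}V_k\xrightarrow{f_k}V_{k+1}\to\cdots$$

is said to be exact if the image of $f_{k-1}$ is the kernel of $f_k$ for all $k$. Sometimes we need to discuss an extremely short one:

$$0\to A\xrightarrow{f}B\xrightarrow{g}C\to 0.$$

As one can see, $f$ is injective and $g$ is surjective.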

Differential complex

A direct sum of vector spaces $C=\oplus_{k \in \mathbb{Z}}C^k$ is called a differential complex if there are homomorphisms

$$\cdots\to C^{k-1}\xrightarrow{d_{k-1}}C^k\xrightarrow{d_k}C^{k+1}\to\cdots$$

such that $d_{k}d_{k-1}=0$. Sometimes we write $d$ instead of $d_{k}$ since this differential operator of $C$ is universal. Therefore we may also say that $d^2=0$. The cohomology of $C$ is the direct sum of vector spaces $H(C)=\oplus_{k \in \mathbb{Z}}H^k(C)$ where

$$H^k(C)=\frac{\ker d_k}{\operatorname{im}d_{k-1}}.$$

A map $f: A \to B$ where $A$ and $B$ are differential complexes, is called a chain map if we have $fd_A=d_Bf$.

The sequence

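Now consider a short exact sequence of differential complexes

$$0\to A\xrightarrow{f}B\xrightarrow{g}C\to 0,$$

where both $f$ and $g$ are chain maps (this is important). Then there exists a long exact sequence

$$\cdots\to H^q(A)\xrightarrow{f^{\ast}}H^q(B)\xrightarrow{g^{\ast}}H^q(C)\xrightarrow{d^{\ast}}H^{q+1}(A)\to\cdots$$
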
Here, $f^{\ast}$ and $g^{\ast}$ are the naturally induced maps. For $c \in C^q$, $d^{\ast}[c]$ is defined to be the cohomology class $[a]$ where $a \in A^{q+1}$, and that $f(a)=db$, and that $g(b)=c$. The sequence can be described using the two-layer commutative diagram below.

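[Figure: two-layer commutative diagram. The blue layer expands the short exact sequence; the purple path is the long exact sequence.]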

The long exact sequence is actually the purple one (you see why people may call this the zig-zag lemma). This sequence is ‘based on’ the blue diagram, which can be considered naturally as an expansion of the short exact sequence. The method that will be used in the following proof is called diagram-chasing, whose importance has already been described by Professor James Munkres: master this. We will be abusing the properties of almost every homomorphism and group appearing in this commutative diagram to trace the elements.

Proof

First, we give a precise definition of $d^{\ast}$. For a closed $c \in C^q$, by the surjectivity of $g$ (note this sequence is exact), there exists some $b \in B^q$ such that $g(b)=c$. Since $g(db)=d(g(b))=dc=0$, we see that $db \in B^{q+1}$ lies in $\ker g$. By the exactness of the sequence, we see $db \in \operatorname{im}{f}$, that is, there exists some $a \in A^{q+1}$ such that $f(a)=db$. Further, $a$ is closed since

$$f(da)=d(f(a))=d(db)=d^2b=0$$

and we already know that $f$ has trivial kernel (which contains $da$).

$d^{\ast}$ is therefore defined by

$$d^{\ast}[c]=[a],$$

where $[\cdot]$ means “the cohomology class of”.

But it is expected that $d^{\ast}$ is a well-defined homomorphism. Let $c_q$ and $c_q'$ be two closed elements of $C^q$. To show $d^{\ast}$ is well-defined, we suppose $[c_q]=[c_q']$ (i.e. they are cohomologous). Choose $b_q$ and $b_q'$ so that $g(b_q)=c_q$ and $g(b_q')=c_q'$. Accordingly, we also pick $a_{q+1}$ and $a_{q+1}'$ such that $f(a_{q+1})=db_q$ and $f(a_{q+1}')=db_q'$. By definition of $d^{\ast}$, we need to show that $[a_{q+1}]=[a_{q+1}']$.

Recall the properties of factor groups. $[c_q]=[c_q']$ if and only if $c_q-c_q' \in \operatorname{im}d$. Therefore we can pick some $c_{q-1} \in C^{q-1}$ such that $c_q-c_q'=dc_{q-1}$. Again, by the surjectivity of $g$, there is some $b_{q-1}$ such that $g(b_{q-1})=c_{q-1}$.

Note that

$$g(b_q-b_q'-db_{q-1})=c_q-c_q'-d(g(b_{q-1}))=c_q-c_q'-dc_{q-1}=0.$$

Therefore $b_q-b_q'-db_{q-1} \in \ker g=\operatorname{im} f$. We are able to pick some $a_q \in A^{q}$ such that $f(a_q)=b_q-b_q'-db_{q-1}$. But now we have

$$f(da_q)=d(f(a_q))=db_q-db_q'-d^2b_{q-1}=f(a_{q+1})-f(a_{q+1}')=f(a_{q+1}-a_{q+1}').$$

Since $f$ is injective, we have $da_q=a_{q+1}-a_{q+1}'$, which implies that $a_{q+1}-a_{q+1}' \in \operatorname{im}d$. Hence $[a_{q+1}]=[a_{q+1}']$.

To show that $d^{\ast}$ is a homomorphism, note that $g(b_q+b_q')=c_q+c_q'$ and $f(a_{q+1}+a_{q+1}')=d(b_q+b_q')$. Thus we have

$$d^{\ast}[c_q+c_q']=[a_{q+1}+a_{q+1}'].$$

The latter equals $[a_{q+1}]+[a_{q+1}']$ since the canonical map is a homomorphism. Therefore we have

$$d^{\ast}([c_q]+[c_q'])=d^{\ast}[c_q]+d^{\ast}[c_q'].$$

Therefore the long sequence exists. It remains to prove exactness. Firstly we prove exactness at $H^q(B)$. Pick $[b] \in H^q(B)$. If $[b]=f^{\ast}[a]$ for some closed $a \in A^q$, then $g^{\ast}[b]=g^{\ast}f^{\ast}[a]=[g(f(a))]=[0]$ because $g(f(a))=0$ by exactness; hence $\operatorname{im}f^{\ast} \subset \ker g^{\ast}$.

Conversely, suppose now $g^{\ast}[b]=[0]$; we shall show that there exists some $[a] \in H^q(A)$ such that $f^{\ast}[a]=[b]$. Note $g(b) \in \operatorname{im}d$ where $d$ is the differential operator of $C$ (why?). Therefore there exists some $c_{q-1} \in C^{q-1}$ such that $g(b)=dc_{q-1}$. Pick some $b_{q-1}$ such that $g(b_{q-1})=c_{q-1}$. Then we have

$$g(b-db_{q-1})=g(b)-d(g(b_{q-1}))=g(b)-dc_{q-1}=0.$$

Therefore $f(a)=b-db_{q-1}$ for some $a \in A^q$. Note $a$ is closed since

$$f(da)=d(f(a))=d(b-db_{q-1})=db-d^2b_{q-1}=0$$

and $f$ is injective. Here $db=0$ since $b$ represents a class in $H^q(B)$, that is, $b \in \ker d$.

Furthermore,

$$f^{\ast}[a]=[f(a)]=[b-db_{q-1}]=[b].$$

Therefore $\ker g^{\ast} \subset \operatorname{im} f^{\ast}$ as desired.

Now we prove exactness at $H^q(C)$. (Notation:) pick $[c_q] \in H^q(C)$, there exists some $b_q$ such that $g(b_q)=c_q$; choose $a_{q+1}$ such that $f(a_{q+1})=db_q$. Then $d^{\ast}[c_q]=[a_{q+1}]$ by definition.

If $[c_q] \in \operatorname{im}g^{\ast}$, we see $[c_q]=[g(b_q)]=g^{\ast}[b_q]$. But $b_q$ is closed since $[b_q] \in H^q(B)$, we see $f(a_{q+1})=db_q=0$, therefore $d^{\ast}[c_q]=[a_{q+1}]=[0]$ since $f$ is injective. Therefore $\operatorname{im}g^{\ast} \subset \ker d^{\ast}$.

Conversely, suppose $d^{\ast}[c_q]=[0]$. By definition of $H^{q+1}(A)$, there is some $a_q \in A^q$ such that $da_q = a_{q+1}$ (can you see why?). We claim that $b_q-f(a_q)$ is closed and we have $[c_q]=g^{\ast}[b_q-f(a_q)]$.

By direct computation,

$$d(b_q-f(a_q))=db_q-f(da_q)=db_q-f(a_{q+1})=0.$$

Meanwhile, since $g(f(a_q))=0$ by exactness,

$$g^{\ast}[b_q-f(a_q)]=[g(b_q)-g(f(a_q))]=[g(b_q)]=[c_q].$$

Therefore $\ker d^{\ast} \subset \operatorname{im}g^{\ast}$.

Finally, we prove exactness at $H^{q+1}(A)$. Pick $\alpha \in H^{q+1}(A)$. If $\alpha \in \operatorname{im}d^{\ast}$, then $\alpha=[a_{q+1}]$ where $f(a_{q+1})=db_q$ by definition. Then

$$f^{\ast}(\alpha)=[f(a_{q+1})]=[db_q]=[0].$$

Therefore $\alpha \in \ker f^{\ast}$. Conversely, if we have $f^{\ast}(\alpha)=[0]$, pick a representative of $\alpha$, namely write $\alpha=[a]$; then $[f(a)]=[0]$. But this implies that $f(a) \in \operatorname{im}d$ where $d$ denotes the differential operator of $B$. That is, writing $b_{q+1}=f(a) \in B^{q+1}$, there exists some $b_q \in B^q$ such that $db_{q}=b_{q+1}$. Suppose now $c_q=g(b_q)$. $c_q$ is closed since $dc_q=g(db_q)=g(b_{q+1})=g(f(a))=0$. By definition, $\alpha=d^{\ast}[c_q]$. Therefore $\ker f^{\ast} \subset \operatorname{im}d^{\ast}$.

Remarks

As you may see, almost every property of the diagram has been used. The exactness at $B^q$ ensures that $g(f(a))=0$. The definition of $H^q(A)$ ensures that we can simplify the meaning of $[0]$. We even use the injectivity of $f$ and the surjectivity of $g$.

This proof is also a demonstration of diagram-chasing technique. As you have seen, we keep running through the diagram to ensure that there is “someone waiting” at the destination.

This long exact sequence is useful. Here is an example.

Application: Mayer-Vietoris Sequence

By differential forms on an open set $U \subset \mathbb{R}^n$, we mean

And the de Rham cohomology of $U$ arises naturally.

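We are able to compute the cohomology of the union of two open sets. Suppose $M=U \cup V$ is a manifold with $U$ and $V$ open, and $U \amalg V$ is the disjoint union of $U$ and $V$ (the coproduct in the category of sets). $\partial_0$ and $\partial_1$ are inclusions of $U \cap V$ in $U$ and $V$ respectively. We have a natural sequence of inclusions

$$M\leftarrow U\amalg V\underset{\partial_1}{\overset{\partial_0}{\leftleftarrows}}U\cap V.$$

Since $\Omega^{*}$ can also be treated as a contravariant functor from the category of Euclidean spaces with smooth maps to the category of commutative differential graded algebras and their homomorphisms, we have

$$\Omega^{*}(M)\to\Omega^{*}(U)\oplus\Omega^{*}(V)\underset{\partial_1^{*}}{\overset{\partial_0^{*}}{\rightrightarrows}}\Omega^{*}(U\cap V).$$

By taking the difference of the last two maps, we have

$$0\to\Omega^{*}(M)\to\Omega^{*}(U)\oplus\Omega^{*}(V)\to\Omega^{*}(U\cap V)\to 0,\qquad(\omega,\tau)\mapsto\tau|_{U\cap V}-\omega|_{U\cap V}.$$

The sequence above is a short exact sequence. Therefore we may use the zig-zag lemma to find a long exact sequence (which is also called the Mayer-Vietoris sequence)

$$\cdots\to H^q(M)\to H^q(U)\oplus H^q(V)\to H^q(U\cap V)\xrightarrow{d^{*}}H^{q+1}(M)\to\cdots$$
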
An example

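This sequence allows one to compute the cohomology of the union of two open sets. For example, for $H^{*}_{DR}(\mathbb{R}^2-P-Q)$, where $P(x_p,y_p)$ and $Q(x_q,y_q)$ are two distinct points in $\mathbb{R}^2$, we may write

$$\mathbb{R}^2-P-Q=(\mathbb{R}^2-P)\cap(\mathbb{R}^2-Q)$$

and

$$\mathbb{R}^2=(\mathbb{R}^2-P)\cup(\mathbb{R}^2-Q).$$
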
Therefore we may write $M=\mathbb{R}^2$, $U=\mathbb{R}^2-P$ and $V=\mathbb{R}^2-Q$. For $U$ and $V$, we have another decomposition by

where

But

is a four-time (homeomorphic) copy of $\mathbb{R}^2$. So things become clear after we compute $H^{\ast}_{DR}(\mathbb{R}^2)$.

References / Further reading

  • Raoul Bott, Loring W. Tu, Differential Forms in Algebraic Topology
  • Munkres J. R., Elements of Algebraic Topology
  • Michael Spivak, Calculus on Manifolds
  • Serge Lang, Algebra

The Big Three Pt. 4 - The Open Mapping Theorem (F-Space)

The Open Mapping Theorem

We are finally going to prove the open mapping theorem for $F$-spaces. In this version, only a metric and completeness are required. Therefore it contains the Banach space version naturally.

(Theorem 0) Suppose we have the following conditions:

  1. $X$ is an $F$-space,
  2. $Y$ is a topological vector space,
  3. $\Lambda: X \to Y$ is continuous and linear, and
  4. $\Lambda(X)$ is of the second category in $Y$.

Then $\Lambda$ is an open mapping.

Proof. Let $B$ be a neighborhood of $0$ in $X$. Let $d$ be an invariant metric on $X$ that is compatible with the $F$-topology of $X$. Define a sequence of balls by

$$B_n=\left\{x\in X:d(x,0)<\frac{r}{2^n}\right\},\quad n=0,1,2,\cdots,$$

where $r$ is picked in such a way that $B_0 \subset B$. To show that $\Lambda$ is an open mapping, we need to prove that there exists some neighborhood $W$ of $0$ in $Y$ such that

$$W\subset\Lambda(B).$$

To do this however, we need an auxiliary set. In fact, we will show that there exists some $W$ such that

$$W\subset\overline{\Lambda(B_1)}\subset\Lambda(B).$$

We need to prove the inclusions one by one.


The first inclusion is a category argument in the spirit of BCT. Since $B_2 -B_2 \subset B_1$, and $Y$ is a topological vector space, we get

$$\overline{\Lambda(B_2)}-\overline{\Lambda(B_2)}\subset\overline{\Lambda(B_2)-\Lambda(B_2)}\subset\overline{\Lambda(B_1)}.$$

Since

$$\Lambda(X)=\bigcup_{k=1}^{\infty}k\Lambda(B_2)$$

and $\Lambda(X)$ is of the second category in $Y$, at least one $k\Lambda(B_2)$ is of the second category in $Y$. But scalar multiplication $y\mapsto ky$ is a homeomorphism of $Y$ onto $Y$, so $k\Lambda(B_2)$ is of the second category for all $k$, in particular for $k=1$. Therefore $\overline{\Lambda(B_2)}$ has nonempty interior, which, by the inclusion displayed above, implies that there exists some open neighborhood $W$ of $0$ in $Y$ such that $W \subset \overline{\Lambda(B_1)}$. By replacing the index, it’s easy to see this holds for all $n$. That is, for $n \geq 1$, there exists some neighborhood $W_n$ of $0$ in $Y$ such that $W_n \subset \overline{\Lambda(B_n)}$.


The second inclusion requires the completeness of $X$. Fix $y_1 \in \overline{\Lambda(B_1)}$; we will show that $y_1 \in \Lambda(B)$. Pick $y_n$ inductively. Assume $y_n$ has been chosen in $\overline{\Lambda(B_n)}$. As stated before, there exists some neighborhood $W_{n+1}$ of $0$ in $Y$ such that $W_{n+1} \subset \overline{\Lambda(B_{n+1})}$. Hence

$$(y_n-W_{n+1})\cap\Lambda(B_n)\neq\varnothing.$$

Therefore there exists some $x_n \in B_n$ such that

$$\Lambda x_n\in y_n-W_{n+1}.$$

Put $y_{n+1}=y_n-\Lambda x_n$; we see $y_{n+1} \in W_{n+1} \subset \overline{\Lambda(B_{n+1})}$. Therefore we are able to pick $y_n$ naturally for all $n \geq 1$.

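Since $d(x_n,0)<\frac{r}{2^n}$ for all $n \geq 1$, the sums $z_n=\sum_{k=1}^{n}x_k$ form a Cauchy sequence and hence converge to some $z \in X$ since $X$ is an $F$-space. Notice we also have

$$d(z,0)\leq\sum_{n=1}^{\infty}d(x_n,0)<\sum_{n=1}^{\infty}\frac{r}{2^n}=r,$$

so we have $z \in B_0 \subset B$.

By the continuity of $\Lambda$, we see $\lim_{n \to \infty}y_n = 0$. Notice we also have

$$\Lambda z_n=\sum_{k=1}^{n}\Lambda x_k=\sum_{k=1}^{n}(y_k-y_{k+1})=y_1-y_{n+1}\to y_1\quad(n\to\infty),$$

so we see $y_1 = \Lambda z \in \Lambda(B)$.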

The whole theorem is now proved, that is, $\Lambda$ is an open mapping. $\square$

Remarks

You may think the following relation comes from nowhere:

$$(y_n-W_{n+1})\cap\Lambda(B_n)\neq\varnothing.$$

But it’s not. We need to review some point-set topology definitions. Notice that $y_n$ belongs to the closure of $\Lambda(B_n)$, and $y_n-W_{n+1}$ is an open neighborhood of $y_n$. If $(y_n - W_{n+1}) \cap \Lambda(B_{n})$ were empty, then $y_n$ could not belong to that closure.

The geometric series

$$\sum_{n=1}^{\infty}\frac{\varepsilon}{2^n}=\varepsilon$$

is widely used when sums need to be controlled. It is a good idea to keep this technique in mind.

Corollaries

The formal proofs will not be written down here, but they are quite easy to carry out.

(Corollary 0) $\Lambda(X)=Y$.

This is an immediate consequence of the fact that $\Lambda$ is open. Since $X$ is open in itself, $\Lambda(X)$ is an open subspace of $Y$. But the only open subspace of $Y$ is $Y$ itself.

(Corollary 1) $Y$ is an $F$-space as well.

If you have already seen the commutative diagram for the quotient space (put $N=\ker\Lambda$), you know that the induced map $f$ is open and continuous. By treating topological vector spaces as groups, by corollary 0 and the first isomorphism theorem, we have

$$X/\ker\Lambda\simeq\Lambda(X)=Y.$$

Therefore $f$ is an isomorphism; hence one-to-one. Therefore $f$ is a homeomorphism as well. In this post we showed that $X/\ker{\Lambda}$ is an $F$-space, therefore $Y$ has to be an $F$-space as well. (We are using the fact that $\ker{\Lambda}$ is a closed set. But why closed?)

(Corollary 2) If $\Lambda$ is a continuous linear mapping of an $F$-space $X$ onto an $F$-space $Y$, then $\Lambda$ is open.

This is a direct application of BCT and open mapping theorem. Notice that $Y$ is now of the second category.

(Corollary 3) If the linear map $\Lambda$ in Corollary 2 is injective, then $\Lambda^{-1}:Y \to X$ is continuous.

This comes from corollary 2 directly since $\Lambda$ is open.

(Corollary 4) If $X$ and $Y$ are Banach spaces, and if $\Lambda: X \to Y$ is a continuous linear bijective map, then there exist positive real numbers $a$ and $b$ such that

$$a\lVert x\rVert\leq\lVert\Lambda x\rVert\leq b\lVert x\rVert$$

for every $x \in X$.

This comes from corollary 3 directly since both $\Lambda$ and $\Lambda^{-1}$ are bounded as they are continuous.

(Corollary 5) If $\tau_1 \subset \tau_2$ are vector topologies on a vector space $X$ and if both $(X,\tau_1)$ and $(X,\tau_2)$ are $F$-spaces, then $\tau_1 = \tau_2$.

This is obtained by applying corollary 3 to the identity mapping $\iota:(X,\tau_2) \to (X,\tau_1)$.

(Corollary 6) If $\lVert \cdot \rVert_1$ and $\lVert \cdot \rVert_2$ are two norms in a vector space $X$ such that

  • $\lVert\cdot\rVert_1 \leq K\lVert\cdot\rVert_2$.
  • $(X,\lVert\cdot\rVert_1)$ and $(X,\lVert\cdot\rVert_2)$ are Banach spaces.

Then $\lVert\cdot\rVert_1$ and $\lVert\cdot\rVert_2$ are equivalent.

This is merely a more restrictive version of corollary 5.

The series

Since there is no strong reason to write more posts on this topic, i.e. the three fundamental theorems of linear functional analysis, I think it’s time to make a list of the series. It’s been around half a year.

The completeness of the quotient space (topological vector space)

The Goal

We are going to show the completeness of $X/N$ where $X$ is a TVS and $N$ a closed subspace. Alongside, a bunch of useful analysis tricks will be demonstrated (and that’s why you may find this blog post a little tedious.). But what’s more important, the theorem proved here will be used in the future.

The main process

To make it clear, we should give a formal definition of $F$-space.

A topological vector space $X$ is an $F$-space if its topology $\tau$ is induced by a complete invariant metric $d$.

A metric $d$ on a vector space $X$ will be called invariant if for all $x,y,z \in X$, we have

$$d(x+z,y+z)=d(x,y).$$

By complete we mean every Cauchy sequence of $(X,d)$ converges.

Defining the quotient metric $\rho$

The metric can be inherited by the quotient space naturally (we will use this fact later), that is

If $X$ is an $F$-space and $N$ is a closed subspace of $X$, then $X/N$ is still an $F$-space.

Suppose $d$ is a complete invariant metric compatible with $\tau_X$. The metric on $X/N$ is defined by

$$\rho(\pi(x),\pi(y))=\inf_{z\in N}d(x-y,z).$$

$\rho$ is a metric

Proof. First, if $\pi(x)=\pi(y)$, that is, $x-y \in N$, we see

$$\rho(\pi(x),\pi(y))=\inf_{z\in N}d(x-y,z)=d(x-y,x-y)=0.$$

If $\pi(x) \neq \pi(y)$ however, we shall show that $\rho(\pi(x),\pi(y))>0$. In this case, we have $x-y \notin N$. Since $N$ is closed, $N^c$ is open, and $x-y$ is an interior point of $X-N$. Therefore there exists an open ball $B_r(x-y)$ centered at $x-y$ with radius $r>0$ such that $B_r(x-y) \cap N = \varnothing$. Notice we have $d(x-y,z)\geq r$ for all $z \in N$ since otherwise $z \in B_r(x-y)$. By putting

$$r_0=\sup\{r>0:B_r(x-y)\cap N=\varnothing\},$$

we see $d(x-y,z) \geq r_0$ for all $z \in N$ and indeed $r_0=\inf_{z \in N}d(x-y,z)>0$ (the verification can be done by contradiction). In general, $\inf_z d(x-y,z)=0$ if and only if $x-y \in \overline{N}$.

Next, we shall show that $\rho(\pi(x),\pi(y))=\rho(\pi(y),\pi(x))$, and it suffices to assume that $\pi(x) \neq \pi(y)$. Since $d$ is translation invariant, we get

$$d(x-y,z)=d(-z,y-x)=d(y-x,-z).$$

Since $-z$ ranges over $N$ as $z$ does, the $\inf$ of the left-hand side is equal to that of the right-hand side. The identity is proved.

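Finally, we need to verify the triangle inequality. Let $r,s,t \in X$. For any $\varepsilon>0$, there exist some $z_\varepsilon, z_\varepsilon' \in N$ such that

$$d(r-s,z_\varepsilon)<\rho(\pi(r),\pi(s))+\varepsilon,\quad d(s-t,z_\varepsilon')<\rho(\pi(s),\pi(t))+\varepsilon.$$

Since $d$ is invariant, we see

$$d(r-t,z_\varepsilon+z_\varepsilon')\leq d(r-t,r-s+z_\varepsilon')+d(r-s+z_\varepsilon',z_\varepsilon+z_\varepsilon')=d(s-t,z_\varepsilon')+d(r-s,z_\varepsilon).$$

(I owe @LeechLattice for the inequality above.)

Therefore

$$d(r-t,z_\varepsilon+z_\varepsilon')<\rho(\pi(r),\pi(s))+\rho(\pi(s),\pi(t))+2\varepsilon.$$

(Warning: This does not imply that $\rho(\pi(r),\pi(s))+\rho(\pi(s),\pi(t))=\inf_z d(r-t,z)$ since we don’t know whether it is the lower bound or not.)

If $\rho(\pi(r),\pi(s))+\rho(\pi(s),\pi(t))<\rho(\pi(r),\pi(t))$ however, let

$$\varepsilon=\frac{1}{2}\left(\rho(\pi(r),\pi(t))-\rho(\pi(r),\pi(s))-\rho(\pi(s),\pi(t))\right)>0;$$

then there exists some $z''_\varepsilon=z_\varepsilon+z'_\varepsilon \in N$ such that

$$d(r-t,z''_\varepsilon)<\rho(\pi(r),\pi(t)),$$

which is a contradiction since $\rho(\pi(r),\pi(t)) \leq d(r-t,z)$ for all $z \in N$.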

(We are using the $\varepsilon$ definition of $\inf$. See here.)
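
For a concrete feel of the quotient metric, here is a small numerical sketch under strong simplifying assumptions: $X=\mathbb{R}^2$ with the Euclidean metric, $N$ the $x$-axis, and the infimum over $N$ approximated by a minimum over a finite grid `ts`. In this model one expects $\rho(\pi(x),\pi(y))=|x_2-y_2|$. The names `d`, `rho` and `ts` and the sample points are hypothetical.

```python
import math

def d(p, q):
    """Euclidean metric on R^2 (complete and translation invariant)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def rho(x, y, ts):
    """Approximate inf over z = (t, 0) in N of d(x - y, z), using the grid ts."""
    diff = (x[0] - y[0], x[1] - y[1])
    return min(d(diff, (t, 0.0)) for t in ts)

# Grid on the x-axis standing in for N (an assumption: a true inf needs all of N)
ts = [k / 1000 for k in range(-10000, 10001)]

x, y, w = (1.0, 2.0), (-3.0, 0.5), (0.25, -1.0)
x_shifted = (x[0] + 5.0, x[1])   # x + (5, 0): same coset as x

rho_xy = rho(x, y, ts)                   # |2.0 - 0.5| = 1.5
rho_same_coset = rho(x, x_shifted, ts)   # 0, since x - x_shifted lies in N
triangle_ok = rho(x, w, ts) <= rho(x, y, ts) + rho(y, w, ts) + 1e-12
```

The value does not change when `x` is replaced by another representative of its coset, which is what well-definedness of $\rho$ requires.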

$\rho$ is translate invariant

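Since $\pi$ is surjective, we see if $u \in X/N$, there exists some $a \in X$ such that $\pi(a)=u$. Therefore

$$\rho(\pi(x)+u,\pi(y)+u)=\rho(\pi(x+a),\pi(y+a))=\inf_{z\in N}d(x+a-(y+a),z)=\rho(\pi(x),\pi(y)).$$
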
$\rho$ is well-defined

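If $\pi(x)=\pi(x')$ and $\pi(y)=\pi(y')$, we have to show that $\rho(\pi(x),\pi(y))=\rho(\pi(x'),\pi(y'))$. In fact,

$$\rho(\pi(x),\pi(y))\leq\rho(\pi(x),\pi(x'))+\rho(\pi(x'),\pi(y'))+\rho(\pi(y'),\pi(y))=\rho(\pi(x'),\pi(y'))$$

since $\rho(\pi(x),\pi(x'))=0$ as $\pi(x)=\pi(x')$, and likewise $\rho(\pi(y'),\pi(y))=0$. Meanwhile

$$\rho(\pi(x'),\pi(y'))\leq\rho(\pi(x'),\pi(x))+\rho(\pi(x),\pi(y))+\rho(\pi(y),\pi(y'))=\rho(\pi(x),\pi(y)),$$

therefore $\rho(\pi(x),\pi(y))=\rho(\pi(x'),\pi(y'))$.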

$\rho$ is compatible with $\tau_N$

To prove this, we need to show that a set $E \subset X/N$ is open with respect to $\tau_N$ if and only if $E$ is a union of open balls. But we will show a generalized version:

If $\mathscr{B}$ is a local base for $\tau$, then the collection $\mathscr{B}_N$, which contains all sets $\pi(V)$ where $V \in \mathscr{B}$, forms a local base for $\tau_N$.

Proof. We already know that $\pi$ is continuous, linear and open. Therefore $\pi(V)$ is open for all $V \in \mathscr{B}$. For any open set $E \subset X/N$ containing $\pi(0)$, we see $\pi^{-1}(E)$ is an open set containing $0$, so there is some $V \in \mathscr{B}$ with

$$V\subset\pi^{-1}(E)$$

and therefore

$$\pi(V)\subset E.$$

Now consider the local base $\mathscr{B}$ containing all open balls around $0 \in X$. Since

$$\pi(\{x\in X:d(x,0)<r\})=\{u\in X/N:\rho(u,\pi(0))<r\},$$

we see $\rho$ determines $\mathscr{B}_N$. But we have already proved that $\rho$ is invariant; hence $\mathscr{B}_N$ determines $\tau_N$.

If $d$ is complete, then $\rho$ is complete.

Once this is proved, we are able to claim that, if $X$ is an $F$-space, then $X/N$ is still an $F$-space, since its topology is induced by a complete invariant metric $\rho$.

Proof. Suppose $(x_n)$ is a Cauchy sequence in $X/N$, relative to $\rho$. There is a subsequence $(x_{n_k})$ with $\rho(x_{n_k},x_{n_{k+1}})<2^{-k}$. Since $\pi$ is surjective, we are able to pick some $z_k \in X$ such that $\pi(z_k) = x_{n_k}$ and such that

$$d(z_k,z_{k+1})<2^{-k}.$$

(The existence can be verified by contradiction still.) By the inequality above, we see $(z_k)$ is Cauchy (can you see why?). Since $X$ is complete, $z_k \to z$ for some $z \in X$. By the continuity of $\pi$, we also see $x_{n_k} \to \pi(z)$ as $k \to \infty$. Therefore $(x_{n_k})$ converges. Hence $(x_n)$ converges since it has a convergent subsequence. $\rho$ is complete.

Remarks

This fact will be used to prove some corollaries in the open mapping theorem. For instance, for any continuous linear map $\Lambda:X \to Y$, we see $\ker(\Lambda)$ is closed, therefore if $X$ is an $F$-space, then $X/\ker(\Lambda)$ is an $F$-space as well. We will show in the future that $X/\ker(\Lambda)$ and $\Lambda(X)$ are homeomorphic if $\Lambda(X)$ is of the second category.

There are more properties that can be inherited by $X/N$ from $X$. For example, normability, metrizability, local convexity. In particular, if $X$ is Banach, then $X/N$ is Banach as well. To do this, it suffices to define the quotient norm by

$$\lVert\pi(x)\rVert=\inf\{\lVert x-z\rVert:z\in N\}.$$

Introducing Riemann-Stieltjes Integral

Motivation

The Riemann-Stieltjes integral is a generalisation of the Riemann integral, the one every college student studies in their calculus class, and is a little bit more difficult to understand. Nevertheless it has advantages of its own, as we will show below. Before seeing the definition and properties of this integral, we first raise some questions that motivate our study.

When talking about $\int_a^b fdg$, one may simply think about $\int_a^b fg'dx$. But is it even necessary that $g$ is differentiable? What would happen if $g$ is simply continuous, or even not continuous? Further, given that $g$ is differentiable, can we prove that

$$\int_a^bf\,dg=\int_a^bfg'\,dx$$

in a general way (without assuming $f$ is differentiable)? Although integration can be connected to differentiation, it should not be mandatory to lock ourselves into $C^1$ functions, $C^2$ functions or smooth functions all the time.

Another motivation comes from probability theory. Oftentimes one needs to consider the discrete case ($\sum$) and the continuous case ($\int$) separately. One may say that an integral is the limit of a summation, but it would be weird to write $\int$ as $\lim\sum$ every time. However, if we have a way to write a sum, for example the expected value of a discrete variable ($E(X)$), as an integral, things would be easier. Of course, we don’t want to write such a sum as another sum by adding up the integral on several disjoint segments. That would be weirder.

If you have learned measure theory, you will know that the Lebesgue integral does not perfectly cover the (improper) Riemann integral. For example, $\int_{0}^{\infty}\frac{\sin{x}}{x}dx$ converges as an improper Riemann integral but is not integrable in the sense of Lebesgue. We cannot treat the Lebesgue integral as a perfect generalization of the Riemann integral. In this blog post however, we will be studying a faithful generalization of the Riemann integral, adding the name of Stieltjes.

We are trying our best to prevent ourselves from using $\sup$, $\inf$, and differentiation theory. But $\varepsilon-\delta$ language is heavily used here, so make sure that you are good at it.

Riemann-Stieltjes Integral

By a partition $P$ on $[a,b]$ we mean a sequence of numbers $(x_n)$ such that

$$a=x_0\leq x_1\leq\cdots\leq x_n=b,$$

and we associate its size by

$$\sigma(P)=\max_{0\leq k\leq n-1}(x_{k+1}-x_k).$$

Let $f$, $g$ be bounded real functions on $[a,b]$ (again, no continuity or differentiability required). Given a partition $P$ and numbers $c_k$ with $x_k \leq c_k \leq x_{k+1}$, we define the Riemann-Stieltjes sum (RS-sum) by

$$S(P,f,g)=\sum_{k=0}^{n-1}f(c_k)(g(x_{k+1})-g(x_k)).$$

We say that the limit

$$\lim_{\sigma(P)\to 0}S(P,f,g)$$

exists if there exists some $L \in \mathbb{R}$ such that given $\varepsilon>0$, there exists $\delta>0$ such that whenever $\sigma(P)<\delta$, we have

$$|S(P,f,g)-L|<\varepsilon.$$

In this case, we say $f$ is $RS(g)$-integrable, and the limit is denoted by

$$\int_a^bf\,dg.$$


This is the so-called Riemann-Stieltjes integral. When $g(x)=x$, we get Riemann integral naturally.
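
The definition translates almost literally into code. The following is a minimal sketch, not a robust integrator: the names `rs_sum`, `uniform_partition` and `midpoints` are hypothetical helpers, and the tags $c_k$ are chosen as midpoints purely for convenience (any $c_k$ with $x_k \leq c_k \leq x_{k+1}$ is allowed by the definition).

```python
def rs_sum(f, g, xs, cs):
    """RS-sum: sum over k of f(c_k) * (g(x_{k+1}) - g(x_k))."""
    return sum(f(c) * (g(x1) - g(x0)) for x0, x1, c in zip(xs, xs[1:], cs))

def uniform_partition(a, b, n):
    """Partition a = x_0 <= x_1 <= ... <= x_n = b with equal spacing."""
    return [a + (b - a) * k / n for k in range(n + 1)]

def midpoints(xs):
    """One admissible choice of tags c_k in [x_k, x_{k+1}]."""
    return [(x0 + x1) / 2 for x0, x1 in zip(xs, xs[1:])]

xs = uniform_partition(0.0, 1.0, 2000)
cs = midpoints(xs)

# g(x) = x recovers an ordinary Riemann sum: integral of x^2 on [0, 1] is 1/3
riemann = rs_sum(lambda x: x * x, lambda x: x, xs, cs)

# Integrator g(x) = x^2: integral of x d(x^2) on [0, 1] is 2/3
stieltjes = rs_sum(lambda x: x, lambda x: x * x, xs, cs)
```

Refining the partition (larger `n`) makes $\sigma(P)$ smaller, and both sums settle near the stated limits.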

Remarks: Further generalization still available

This integral can be generalized to Banach spaces. Let $f$, $g$ be bounded maps of $[a,b]$ into Banach spaces $E$, $F$ respectively. Assume we have a product $E \times F \to G$ denoted by $(u,v) \mapsto uv$ with $\lVert uv \rVert \leq \lVert u \rVert \lVert v \rVert$. Then by replacing the absolute value by the norm, we still get the Riemann-Stieltjes integral, although in this case we have

$$\int_a^bf\,dg\in G,$$

and $G$ need not be $\mathbb{R}$. This is different from the Bochner integral, since no measure theory is involved here.

Linearity with respect to $f$ and $g$

First, we shall show that $RS(g)$-integrable functions form a vector space. To do this, it suffices to show that

$$f\mapsto S(P,f,g)$$

and

$$g\mapsto S(P,f,g)$$

are linear. This follows directly from the definition of the RS-sum. Let’s see the result.

Suppose we have

$$\int_a^bf\,dg=I,\quad\int_a^bh\,dg=J,\quad\int_a^bf\,du=K.$$

Then we have the following identities for $\alpha \in \mathbb{R}$.

  1. $\int_a^b \alpha fdg=\alpha I$.
  2. $\int_a^b (f+h)dg=I+J$.
  3. $\int_a^bfd(g+u)=I+K$.
  4. $\int_a^b fd(\alpha g)=\alpha I$.

Proof. We shall show 2 as an example. The other three identities follow in the same way.

Notice that the existence of the limit of the RS-sum depends only on the size of $P$. For $\varepsilon>0$, there exist some $\delta_1,\delta_2>0$ such that

$$|S(P,f,g)-I|<\frac{\varepsilon}{2},\quad|S(P,h,g)-J|<\frac{\varepsilon}{2}$$

when $\sigma(P)<\delta_1$ and $\sigma(P)<\delta_2$ respectively. By picking $\delta=\min(\delta_1,\delta_2)$, we see for $\sigma(P)<\delta$, we have

$$|S(P,f+h,g)-(I+J)|\leq|S(P,f,g)-I|+|S(P,h,g)-J|<\varepsilon.$$

Integration by parts but no differentiation

$f \in RS(g)$ if and only if $g \in RS(f)$. In this case, we also have integration by parts:

$$\int_a^bf\,dg+\int_a^bg\,df=f(b)g(b)-f(a)g(a).$$

You may not believe it, but differentiation does not play any role here, as promised at the beginning.

Proof. Using the summation by parts (by Abel), we have

By writing

we have

where

Consider the partition $Q$ by

we have $x_0,x_1,\cdots,x_{n-1},x_k$ to be intermediate points, and

Since $0 < \sigma(Q) \leq 2\sigma(P) \leq 4\sigma(Q)$, when $\sigma(P) \to 0$, we also have $\sigma(Q) \to 0$ and vice versa. Suppose now $\int_a^b gdf$ exists, we have

And integration by parts follows.

Suppose $\int_a^bfdg$ exists, then

This proves the proposition. $\square$
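
To see that differentiability really plays no role, here is a hedged numerical check of integration by parts with an integrator that has a corner. The helper `rs_sum`, the choice $f(x)=x^2$, $g(x)=|x-\tfrac12|$ on $[0,1]$, and the midpoint tags are all illustrative assumptions.

```python
def rs_sum(f, g, xs, cs):
    """RS-sum: sum over k of f(c_k) * (g(x_{k+1}) - g(x_k))."""
    return sum(f(c) * (g(x1) - g(x0)) for x0, x1, c in zip(xs, xs[1:], cs))

a, b, n = 0.0, 1.0, 2000
xs = [a + (b - a) * k / n for k in range(n + 1)]
cs = [(x0 + x1) / 2 for x0, x1 in zip(xs, xs[1:])]

f = lambda x: x * x
g = lambda x: abs(x - 0.5)   # continuous, but not differentiable at 0.5

# Approximates (integral of f dg) + (integral of g df)
lhs = rs_sum(f, g, xs, cs) + rs_sum(g, f, xs, cs)
# Boundary term f(b)g(b) - f(a)g(a)
rhs = f(b) * g(b) - f(a) * g(a)
```

Both integrals can be computed by hand here (each equals $\tfrac14$), so the two sides agree at $\tfrac12$.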

The flexibility of Riemann-Stieltjes integral

As said before, we want to represent both continuous and discrete case using integral. For measure theory, we have Lebesgue measure and counting measure. But in some cases, this can be done using Riemann-Stieltjes integral as well. Ordinary Riemann integral and finite or infinite series are all special cases of Riemann-Stieltjes integral.

From integral to series (discrete case)

To do this, we need the unit step function

$$I(x)=\begin{cases}0,&x\leq 0,\\1,&x>0.\end{cases}$$

If $a<s<b$, $f$ is bounded on $[a,b]$ and continuous at $s$, by putting $g(x)=I(x-s)$, we have

$$\int_a^bf\,dg=f(s).$$

Proof. A simple verification shows that $\int_a^b fdg=\int_s^b fdg$ (by unwinding the RS-sum, one sees immediately that $g(x_k)=0$ for all $x_k\leq s$, therefore the part of the partition before $s$ makes no contribution to the value of the integral). Now consider the partition $P$ by

$$s=x_0<x_1<\cdots<x_n=b.$$

We see

$$S(P,f,g)=\sum_{k=0}^{n-1}f(c_k)(g(x_{k+1})-g(x_k))=f(c_0),$$

since $g(x_0)=0$ and $g(x_k)=1$ for $k \geq 1$. As $x_1 \to s$, we have $c_0 \to s$; since $f$ is continuous at $s$, we have $f(c_0) \to f(s)$ as desired. $\square$
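
The effect of the step integrator can be observed numerically. In this sketch the step function is taken to be $0$ for $x \leq 0$ and $1$ for $x>0$ (matching the remark that $g(x_k)=0$ for $x_k \leq s$); the names `rs_sum` and `step`, the point $s=0.3$ and the integrand $\cos$ are assumptions chosen for illustration.

```python
import math

def rs_sum(f, g, xs, cs):
    """RS-sum: sum over k of f(c_k) * (g(x_{k+1}) - g(x_k))."""
    return sum(f(c) * (g(x1) - g(x0)) for x0, x1, c in zip(xs, xs[1:], cs))

def step(x):
    """Unit step: 0 for x <= 0, 1 for x > 0."""
    return 1.0 if x > 0 else 0.0

s = 0.3
g = lambda x: step(x - s)

n = 1000
xs = [k / n for k in range(n + 1)]                  # partition of [0, 1]
cs = [(x0 + x1) / 2 for x0, x1 in zip(xs, xs[1:])]  # midpoint tags

# Only the subinterval where g jumps contributes, so this is f(c) for a tag c near s
approx = rs_sum(math.cos, g, xs, cs)
```

As the partition is refined, the single contributing tag approaches $s$, so `approx` approaches $\cos(0.3)$ by continuity.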

By the linearity of the RS integral, it is easy to generalize this to the case of a finite linear combination. Namely, for $g(x)=\sum_{k=1}^{n}c_kI(x-s_k)$, we have

$$\int_a^bf\,dg=\sum_{k=1}^{n}c_kf(s_k).$$

But now we are discussing the infinite case.

Suppose $c_n \geq 0$ for all $n \ge 0$ and $\sum_{n \ge 0} c_n$ converges, $(s_n)$ is a sequence of distinct points in $(a,b)$, and

$$g(x)=\sum_{n=0}^{\infty}c_nI(x-s_n).$$

Let $f$ be continuous on $[a,b]$. Then

$$\int_a^bf\,dg=\sum_{n=0}^{\infty}c_nf(s_n).$$

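Proof. First it’s easy to see that $g(x)$ converges for every $x$, and is monotonic with $g(a)=0$, $g(b)=\sum_n c_n$. For given $\varepsilon>0$, there exists some $N$ such that

$$\sum_{n=N+1}^{\infty}c_n<\varepsilon.$$

Putting

$$g_1(x)=\sum_{n=0}^{N}c_nI(x-s_n),\quad g_2(x)=\sum_{n=N+1}^{\infty}c_nI(x-s_n),$$

we have

$$\int_a^bf\,dg_1=\sum_{n=0}^{N}c_nf(s_n).$$

By putting $M=\sup|f(x)|$, we see

$$\left|\int_a^bf\,dg-\sum_{n=0}^{N}c_nf(s_n)\right|=\left|\int_a^bf\,dg_2\right|\leq M\varepsilon.$$

The inequality holds since $|g_2(b)-g_2(a)|<\varepsilon$. Since $M$ is finite, when $N \to \infty$, we have the desired result. $\square$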
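
The series representation can also be checked numerically, at least for a truncated series. The sketch below assumes $c_n=2^{-(n+1)}$, $s_n=\frac{1}{n+2}$ and $f(x)=x^2$ on $[0,1]$, keeps only the first 30 terms, and compares an RS-sum against $\sum c_nf(s_n)$. All of these choices, and the helper names, are hypothetical.

```python
def rs_sum(f, g, xs, cs):
    """RS-sum: sum over k of f(c_k) * (g(x_{k+1}) - g(x_k))."""
    return sum(f(c) * (g(x1) - g(x0)) for x0, x1, c in zip(xs, xs[1:], cs))

def step(x):
    """Unit step: 0 for x <= 0, 1 for x > 0."""
    return 1.0 if x > 0 else 0.0

N = 30                                           # truncation level (assumption)
weights = [2.0 ** -(n + 1) for n in range(N)]    # c_n >= 0 with convergent sum
points = [1.0 / (n + 2) for n in range(N)]       # distinct points s_n in (0, 1)

def g(x):
    """Truncated integrator: sum of c_n * I(x - s_n)."""
    return sum(c * step(x - s) for c, s in zip(weights, points))

f = lambda x: x * x

n = 20000
xs = [k / n for k in range(n + 1)]
cs = [(x0 + x1) / 2 for x0, x1 in zip(xs, xs[1:])]

integral = rs_sum(f, g, xs, cs)                            # RS-sum for the integral of f dg
series = sum(c * f(s) for c, s in zip(weights, points))    # sum of c_n * f(s_n)
```

Read probabilistically, if the weights summed to $1$ then `g` would play the role of a distribution function and `series` would be the expected value of $f(X)$ for a discrete variable $X$, which is the use case raised in the motivation.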

Transformed into ordinary Riemann integral (continuous case)

Finally we will discuss differentiation. The following theorem shows the connection between RS integral and Riemann integral.

Let $f$ be continuous and suppose that $g$ is real differentiable on $[a,b]$ while $g'$ is Riemann integrable as well, then $f \in RS(g)$ and

$$\int_a^bf\,dg=\int_a^bf(x)g'(x)\,dx.$$

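Proof. By the mean value theorem, for each $k$, we have

$$g(x_{k+1})-g(x_k)=g'(t_k)(x_{k+1}-x_k)\quad\text{for some }t_k\in(x_k,x_{k+1}).$$

The RS-sum can be written as

$$S(P,f,g)=\sum_{k=0}^{n-1}f(c_k)g'(t_k)(x_{k+1}-x_k).$$
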
Since $g'$ is Riemann integrable, we have

given that $|S(P,g',x)-\int_a^b g'dx|<\varepsilon$. Therefore

where $M=\sup|f(x)|<\infty$ ($f$ is assumed to be bounded). Also notice that $fg'$ is integrable since $f$ is continuous. Therefore

Therefore,

which proves the theorem. $\square$
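
A quick numerical comparison illustrates the theorem. The sketch assumes $f=\cos$ and $g(x)=x^3$ on $[0,1]$, so that $g'(x)=3x^2$, and compares an RS-sum for $\int f\,dg$ with a Riemann sum for $\int fg'\,dx$. The closed form $3(2\cos 1-\sin 1)$ comes from integrating $3x^2\cos x$ by parts. Helper names are hypothetical.

```python
import math

def rs_sum(f, g, xs, cs):
    """RS-sum: sum over k of f(c_k) * (g(x_{k+1}) - g(x_k))."""
    return sum(f(c) * (g(x1) - g(x0)) for x0, x1, c in zip(xs, xs[1:], cs))

n = 2000
xs = [k / n for k in range(n + 1)]
cs = [(x0 + x1) / 2 for x0, x1 in zip(xs, xs[1:])]

f = math.cos
g = lambda x: x ** 3
dg = lambda x: 3 * x ** 2    # derivative of g

stieltjes = rs_sum(f, g, xs, cs)                                  # approximates the integral of f dg
riemann = rs_sum(lambda x: f(x) * dg(x), lambda x: x, xs, cs)     # approximates the integral of f g' dx
exact = 3 * (2 * math.cos(1) - math.sin(1))                       # closed form of the integral of 3 x^2 cos x on [0, 1]
```

The two sums differ only through the mean value theorem step in the proof, which is why they agree to several digits even at moderate `n`.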

To sum up, given $\varepsilon>0$, there exists some $\delta>0$ such that if $\sigma(P)<\delta$, we have

and

After some estimation, we get