Partition of Unity on Different Manifolds (Part 1. Introduction)

An application of partition of unity

A partition of unity builds a bridge between local properties and global properties. A nice example is Stokes' theorem on manifolds.

Suppose \(\omega\) is an \((n-1)\)-form with compact support on an oriented manifold \(M\) of dimension \(n\), and \(\partial{M}\) is given the induced orientation. Then \[ \int_M d\omega=\int_{\partial{M}}\omega. \]

This theorem can be proved in two steps. First, by Fubini's theorem, one proves the identity on \(\mathbb{R}^n\) and \(\mathbb{H}^n\). Second, for the general case, let \((U_\alpha)\) be an oriented atlas for \(M\) and \((\rho_\alpha)\) a partition of unity subordinate to \((U_\alpha)\); one naturally writes \(\omega=\sum_{\alpha}\rho_\alpha\omega\), which is a finite sum because \(\omega\) has compact support and the supports of the \(\rho_\alpha\) are locally finite. Since both sides of \(\int_M d\omega=\int_{\partial M}\omega\) are linear in \(\omega\), it suffices to prove the identity for each \(\rho_\alpha\omega\). Note that the support of \(\rho_\alpha\omega\) is contained in the intersection of the supports of \(\rho_\alpha\) and \(\omega\), hence is a closed subset of a compact set and therefore compact.

On the other hand, since \(U_\alpha\) is diffeomorphic to either \(\mathbb{R}^n\) or \(\mathbb{H}^n\), it is immediate that \[ \int_M d(\rho_\alpha\omega)=\int_{U_\alpha}d(\rho_\alpha\omega)=\int_{\partial U_\alpha}\rho_\alpha\omega=\int_{\partial{M}}\rho_\alpha\omega, \] which furnishes the proof for the general case.

As we have seen, to prove a global statement, we work locally. If you have trouble with the terminology, never mind: we will go through it right now (in a more abstract way, however). If you are already familiar with it, feel free to skip ahead.

Prerequisites

Manifold (of finite or infinite dimension)

Throughout, we use bold letters like \(\mathbf{E}\), \(\mathbf{F}\) to denote Banach spaces. We treat Euclidean spaces as a special case rather than as a restriction. Since Banach spaces are not necessarily finite-dimensional, this approach can be more troublesome, but the benefit is a clearer view of the abstraction.

Let \(X\) be a set. An atlas of class \(C^p\) (\(p \geq 0\)) on \(X\) is a collection of pairs \((U_i,\varphi_i)\) where \(i\) ranges through some indexing set, satisfying the following conditions:

AT 1. Each \(U_i\) is a subset of \(X\) and \(\bigcup_{i}U_i=X\).

AT 2. Each \(\varphi_i\) is a bijection of \(U_i\) onto an open subset \(\varphi_iU_i\) of some Banach space \(\mathbf{E}_i\) and for any \(i\) and \(j\), \(\varphi_i(U_i \cap U_j)\) is open in \(\mathbf{E}_i\).

AT 3. The map \[ \varphi_j\circ\varphi_i^{-1}:\varphi_i(U_i \cap U_j) \to \varphi_j(U_i \cap U_j) \] is a \(C^p\)-isomorphism for all \(i\) and \(j\).

One should be advised that isomorphism here does not come from group theory but from category theory. Precisely speaking, it is an isomorphism in the category \(\mathfrak{O}\) whose objects are the open subsets of Banach spaces and whose morphisms are the maps of class \(C^p\).

Also, one can verify that there is a unique topology on \(X\) such that each \(U_i\) is open and each \(\varphi_i\) is a topological isomorphism. We see no need to assume that \(X\) is Hausdorff unless we start with Hausdorff spaces. Lifting this restriction gives us more freedom (though sometimes more difficulty as well).

For condition AT 2, we did not require that the vector spaces be the same for all indexes \(i\), or even that they be toplinearly isomorphic. If they are all equal to the same space \(\mathbf{E}\), then we say that the atlas is an \(\mathbf{E}\)-atlas.

Suppose that we are given an open subset \(U\) of \(X\) and a topological isomorphism \(\varphi:U \to U'\) onto an open subset of some Banach space \(\mathbf{E}\). We shall say that \((U,\varphi)\) is compatible with the atlas \((U_i,\varphi_i)_i\) if each map \(\varphi_i\circ\varphi^{-1}\) (defined on \(\varphi(U \cap U_i)\)) is a \(C^p\)-isomorphism. Two atlases are said to be compatible if each chart of one is compatible with the other atlas. It can be verified that this is an equivalence relation. An equivalence class of atlases of class \(C^p\) on \(X\) is said to define a structure of \(C^p\)-manifold on \(X\). If all the vector spaces \(\mathbf{E}_i\) in some atlas are toplinearly isomorphic, then we can find an equivalent atlas for which they are all equal to some fixed \(\mathbf{E}\). In this case, we say \(X\) is an \(\mathbf{E}\)-manifold or that \(X\) is modeled on \(\mathbf{E}\).

As we know, \(\mathbb{R}^n\) is a Banach space. If \(\mathbf{E}=\mathbb{R}^n\) for some fixed \(n\), then we say that the manifold is \(n\)-dimensional. Also we have the local coordinates. A chart \[ \varphi:U \to \mathbb{R}^n \] is given by \(n\) coordinate functions \(\varphi_1,\cdots,\varphi_n\). If \(P\) denotes a point of \(U\), these functions are often written \[ x_1(P),\cdots,x_n(P), \] or simply \(x_1,\cdots,x_n\).

Topological prerequisites

Let \(X\) be a topological space. A covering \(\mathfrak{U}\) of \(X\) is locally finite if every point \(x\) has a neighborhood \(U\) such that only finitely many members of \(\mathfrak{U}\) intersect \(U\) (as you will see, this prevents some nonsense summation). A refinement of a covering \(\mathfrak{U}\) is a covering \(\mathfrak{U}'\) such that for any \(U' \in \mathfrak{U}'\), there exists some \(U \in \mathfrak{U}\) such that \(U' \subset U\). If we write \(\mathfrak{U} \leq \mathfrak{U}'\) in this case, we see that the set of open covers of a topological space forms a directed set.

A topological space is paracompact if it is Hausdorff, and every open covering has a locally finite open refinement. Here are some examples of paracompact spaces.

  1. Any compact Hausdorff space.
  2. Any CW complex.
  3. Any metric space (hence \(\mathbb{R}^n\)).
  4. Any regular Hausdorff Lindelöf space.
  5. Any locally compact Hausdorff \(\sigma\)-compact space.

These are not too difficult to prove, and one can easily find proofs on the Internet. Below are several key properties of paracompact spaces.

If \(X\) is paracompact, then \(X\) is normal. (Proof here)

Let \(X\) be a paracompact (hence normal) space and \(\mathfrak{U}=(U_i)\) a locally finite open cover. Then there exists a locally finite open covering \(\mathfrak{V}=(V_i)\) such that \(\overline{V_i} \subset U_i\). (Proof here. Note that the axiom of choice is assumed.)

One can find proofs of the following propositions in Elements of Mathematics, General Topology, Chapters 1-4 by N. Bourbaki. It's interesting to compare them to the corresponding ones for compact spaces.

Every closed subspace \(F\) of a paracompact space \(X\) is paracompact.

The product of a paracompact space and a compact space is paracompact.

Let \(X\) be a locally compact paracompact space. Then every open covering \(\mathfrak{R}\) of \(X\) has a locally finite open refinement \(\mathfrak{R}'\) formed of relatively compact sets. If \(X\) is \(\sigma\)-compact then \(\mathfrak{R}'\) can be taken to be countable.

Partition of unity

A partition of unity (of class \(C^p\)) on a manifold \(X\) consists of an open covering \((U_i)\) of \(X\) and a family of functions \[ \psi_i:X \to \mathbb{R} \] satisfying the following conditions:

PU 1. For all \(x \in X\) we have \(\psi_i(x) \geq 0\).

PU 2. The support of \(\psi_i\) is contained in \(U_i\).

PU 3. The covering is locally finite.

PU 4. For each point \(x \in X\) we have \[ \sum_{i}\psi_i(x)=1 \]

The sum in PU 4 makes sense because for any given point \(x\), there are only finitely many \(i\) such that \(\psi_i(x) >0\), according to PU 2 and PU 3.
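To make PU 1 to PU 4 concrete, here is a minimal numerical sketch on \(X=\mathbb{R}\). The cover \(U_k=(k-1,k+1)\), \(k \in \mathbb{Z}\), the radius `R = 0.75` and the function names are assumptions chosen for illustration; they are not taken from the references.

```python
import math

R = 0.75  # assumed support radius; any value in (0.5, 1) works for this cover


def bump(t: float) -> float:
    """Smooth nonnegative function whose support is [-R, R]."""
    if abs(t) >= R:
        return 0.0
    return math.exp(-1.0 / (1.0 - (t / R) ** 2))


def psi(k: int, x: float) -> float:
    """k-th function of a partition of unity subordinate to U_k = (k-1, k+1)."""
    # Local finiteness (PU 3): only integers j with |x - j| < R can contribute,
    # so a window of four integers around x is enough.
    j0 = math.floor(x)
    total = sum(bump(x - j) for j in range(j0 - 1, j0 + 3))
    return bump(x - k) / total


def partition_sum(x: float) -> float:
    """The (finite) sum in PU 4 evaluated at x."""
    j0 = math.floor(x)
    return sum(psi(k, x) for k in range(j0 - 1, j0 + 3))
```

The denominator never vanishes because every real number lies within \(1/2 < R\) of some integer, and the support of `psi(k, .)` is \([k-R,k+R] \subset U_k\), which is what PU 2 asks for.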

A manifold \(X\) will be said to admit partition of unity if it is paracompact, and if, given a locally finite open covering \((U_i)\), there exists a partition of unity \((\psi_i)\) such that the support of \(\psi_i\) is contained in \(U_i\).

Bump function

This function will be useful when dealing with the finite-dimensional case.

For every integer \(n \geq 1\) and every real number \(\delta>0\) there exists a map \(\psi_n \in C^{\infty}(\mathbb{R}^n;\mathbb{R})\) which equals \(1\) on \(B(0,1)\) and vanishes on \(\mathbb{R}^n\setminus B(0,1+\delta)\).

Proof. It suffices to prove it for \(\mathbb{R}\), since once we have proved the existence of \(\psi_1\), we may write \[ \psi_n(x_1,x_2,\cdots,x_n)=\psi_1(\sqrt{x_1^2+x_2^2+\cdots+x_n^2}). \] Consider the function \(\phi: \mathbb{R} \to \mathbb{R}\) defined by \[ \phi(t)= \begin{cases} \exp\left(\frac{1}{(t-a)(t-b)}\right)&\quad\text{if } a<t<b,\\ 0 &\quad \text{otherwise}. \end{cases} \] The reader may have seen it in some analysis course and should be able to check that \(\phi \in C^{\infty}(\mathbb{R};\mathbb{R})\). Integrating \(\phi\) from \(-\infty\) to \(x\) and dividing by \(\lVert \phi \rVert_1\) (you may have done this in probability theory), we obtain \[ \theta(x)=\frac{\int_{-\infty}^{x}\phi(t)dt}{\int_{-\infty}^{+\infty}\phi(t)dt}; \] it is immediate that \(\theta(x)=0\) for \(x \leq a\) and \(\theta(x)=1\) for \(x \geq b\). Taking \(a=1\) and \(b=(1+\delta)^2\), our job is done by letting \(\psi_1(x)=1-\theta(x^2)\). Since \(\psi_1\) depends on \(x\) only through \(x^2=|x|^2\), we may equally write \(\psi_n(x)=1-\theta(|x|^2)\) directly, which makes the smoothness of \(\psi_n\) at the origin evident; the detour through \(\psi_1\) is then redundant. \(\square\)
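The construction in the proof can be sketched numerically. The following is only an illustration under stated assumptions: the integral defining \(\theta\) is approximated by a composite midpoint rule, and the names `integrate`, `make_psi` and the step count are hypothetical choices, not part of the proof.

```python
import math


def phi(t: float, a: float, b: float) -> float:
    """exp(1/((t-a)(t-b))) on (a, b), zero elsewhere; smooth on the real line."""
    if t <= a or t >= b:
        return 0.0
    return math.exp(1.0 / ((t - a) * (t - b)))


def integrate(func, lo: float, hi: float, steps: int = 2000) -> float:
    """Composite midpoint rule on [lo, hi] (a numerical approximation)."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (i + 0.5) * h) for i in range(steps))


def make_psi(delta: float):
    """Return psi_n(x) = 1 - theta(|x|^2) with a = 1 and b = (1 + delta)^2."""
    a, b = 1.0, (1.0 + delta) ** 2
    norm = integrate(lambda t: phi(t, a, b), a, b)  # approximates ||phi||_1

    def theta(s: float) -> float:
        if s <= a:
            return 0.0
        if s >= b:
            return 1.0
        return integrate(lambda t: phi(t, a, b), a, s) / norm

    def psi_n(point) -> float:
        r2 = sum(c * c for c in point)  # |x|^2, so no square root is needed
        return 1.0 - theta(r2)

    return psi_n
```

For instance, with `psi = make_psi(0.5)`, the value `psi((0.3, 0.4))` is exactly `1.0` (the point lies in the unit ball) and `psi((2.0, 0.0))` is exactly `0.0` (the point lies outside \(B(0,1.5)\)); in between the values are only approximate.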

In later blog posts, we will generalize this to Hilbert spaces.

Is partition of unity ALWAYS available?

Of course this would be desirable. But we will give an example showing that sometimes no satisfactory partition of unity exists.

Let \(D\) be a connected bounded open set in \(\ell^p\) where \(p\) is not an even integer. Assume \(f\) is a real-valued function, continuous on \(\overline{D}\) and \(n\)-times differentiable in \(D\) with \(n \geq p\). Then \(f(\overline{D}) \subset \overline{f(\partial D)}\).

(Corollary) Let \(f\) be an \(n\)-times differentiable function on \(\ell^p\) space, where \(n \geq p\), and \(p\) is not an even integer. If \(f\) has its support in a bounded set, then \(f\) is identically zero.

It follows that for \(n \geq p\), \(C^n\) partitions of unity do not exist whenever \(p\) is not an even integer. For example, \(\ell^1[0,1]\) does not have a \(C^2\) partition of unity. It is then our duty to find out under what conditions the desired partitions of unity are available.

Existence of partition of unity

Below are two theorems about the existence of partitions of unity. We are not proving them here but in a future blog post, since the proofs are rather long. The restrictions on \(X\) are acceptable. For example, \(\mathbb{R}^n\) is locally compact, and hence so is any manifold modeled on \(\mathbb{R}^n\).

Let \(X\) be a manifold which is locally compact Hausdorff and whose topology has a countable base. Then \(X\) admits partitions of unity.

Let \(X\) be a paracompact manifold of class \(C^p\), modeled on a separable Hilbert space \(\mathbf{E}\). Then \(X\) admits partitions of unity (of class \(C^p\)).

References

  • N. Bourbaki, Elements of Mathematics
  • S. Lang, Fundamentals of Differential Geometry
  • M. Berger, Differential Geometry: Manifolds, Curves, and Surfaces
  • R. Bonic and J. Frampton, Differentiable Functions on Certain Banach Spaces

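Several classical estimates related to Stirling's formula

Stirling's formula

For the \(\Gamma\) function we have a classical limit (see ProofWiki for a proof): \[ \lim_{x\to\infty}\frac{\Gamma(x+1)}{(x/e)^x\sqrt{2\pi{x}}}=1. \] With this formula we can immediately compute some limits that are otherwise hard to evaluate. Written for natural numbers, the formula reads \[ \lim_{n \to\infty}\frac{n!}{(n/e)^n\sqrt{2\pi{n}}}=1, \] so we can immediately compute the following limit: \[ \begin{aligned} \lim_{n \to\infty}\sqrt[n]{\frac{n!}{n^n}} &= \lim_{n \to\infty}\sqrt[n]{\frac{n!}{(n/e)^n\sqrt{2\pi{n}}}\cdot\frac{(n/e)^n\sqrt{2\pi n}}{n^n}} \\ &= \lim_{n \to\infty} \sqrt[n]{\frac{(n/e)^n\sqrt{2\pi n}}{n^n}} \\ &=\lim_{n \to\infty}\frac{(2\pi n)^{\frac{1}{2n}}}{e}=\frac{1}{e}. \end{aligned} \] (The first factor under the root tends to \(1\), so its \(n\)-th root tends to \(1\) as well.) But Stirling's formula offers more than this. In this post we will see several classical estimates.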

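The range of the sequence

In this section we will see that \[ 1 < \frac{n!}{(n/e)^n\sqrt{2\pi n}}\leq\frac{e}{\sqrt{2\pi}}. \] Evaluating the right-hand side on a calculator (it is about \(1.0844\)) shows that \(\phi_n=\frac{n!}{(n/e)^n\sqrt{2\pi n}}\) always stays close to \(1\).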

For \(m=1,2,3,\dots\), define a piecewise linear function below \(y=\ln(x)\): \[ f(x)=(m+1-x)\ln{m}+(x-m)\ln(m+1) \] for \(m \leq x \leq m+1\), and another piecewise linear function above it: \[ g(x)=\frac{x}{m}-1+\ln{m} \] for \(m-1/2 \leq x < m+1/2\). Plotting \(f\), \(\ln{x}\) and \(g\), one finds that \(f\) and \(g\) both approximate \(\ln{x}\): \(f\) is built from chords and \(g\) from tangent lines of the concave function \(\ln{x}\). Hence for \(x \geq 1\) we have \[ f \leq \ln{x} \leq g, \] and integrating gives \[ \int_1^n f(x)dx \leq \int_1^n \ln{x}dx=n\ln{n}-n+1 \leq \int_1^n g(x)dx. \] But the relation between \(f\) and \(g\) is not quite that simple. Computing the integral of \(f\), we find \[ \begin{aligned} \int_1^n f(x)dx &=\sum_{k=1}^{n-1}\int_{k}^{k+1}f(x)dx \\ &=\sum_{k=1}^{n-1}\left((k+\frac{1}{2})(\ln(k+1)-\ln{k})+(k+1)\ln{k}-k\ln(k+1)\right) \\ &=\ln(n!)-\frac{1}{2}\ln{n}, \end{aligned} \] while for \(g\) we have \[ \begin{aligned} \int_1^n g(x)dx &= \left(\int_{1}^{\frac{3}{2}}+\sum_{k=2}^{n-1}\int_{k-\frac{1}{2}}^{k+\frac{1}{2}}+\int_{n-\frac{1}{2}}^n\right)g(x)dx \\ &= \frac{1}{8}+\sum_{k=2}^{n-1}\ln{k}+\frac{1}{2}\ln{n}-\frac{1}{8n} \\ &=\frac{1}{8}-\frac{1}{8n}+\ln(n!)-\frac{1}{2}\ln{n}. \end{aligned} \] This shows that \[ \int_1^n f(x)dx = \int_1^n g(x)dx - \frac{1}{8}+\frac{1}{8n} > \int_1^n g(x)dx - \frac{1}{8}. \] Summing up, for \(n>1\) we obtain \[ \int_1^n g(x)dx -\frac{1}{8}<\int_1^n f(x)dx < \int_1^n g(x)dx. \] Now write \[ u_n=\ln(n!)-(n+\frac{1}{2})\ln{n}+n=\ln\frac{n!}{(n/e)^n\sqrt{n}}. \] Substituting the values of the two integrals into \(\int_1^n f(x)dx \leq n\ln{n}-n+1 \leq \int_1^n g(x)dx\) and rearranging, we get \[ \frac{7}{8}+\frac{1}{8n} \leq u_n \leq 1. \] By Stirling's formula we know that \[ u_n=\ln(n!)-(n+\frac{1}{2})\ln{n} + n \to \ln\sqrt{2\pi}. \] Moreover, \[ u_n-u_{n+1}=(n+\frac{1}{2})(\ln(n+1)-\ln{n})-1=\int_n^{n+1}(\ln{x}-f(x))dx>0, \] so the sequence \((u_n)\) is strictly decreasing. Hence it stays strictly above its limit \(\ln\sqrt{2\pi}\), and it is bounded above by \(u_1=1\). This gives \[ \ln\sqrt{2\pi}<\ln\frac{n!}{(n/e)^n\sqrt{n}}\leq 1, \] which in turn yields \[ 1<\frac{n!}{(n/e)^n\sqrt{2\pi n}}\leq\frac{e}{\sqrt{2\pi}} \] for all \(n =1,2,3,\dots\).
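As a sanity check (not a proof), the two-sided bound on \(n!/((n/e)^n\sqrt{2\pi n})\) can be tested numerically. This is a minimal sketch; the function name `stirling_ratio` is a hypothetical choice, and `math.lgamma` is used so that \(n!\) never has to be formed explicitly.

```python
import math


def stirling_ratio(n: int) -> float:
    """n! / ((n/e)^n * sqrt(2*pi*n)), computed through logarithms."""
    log_factorial = math.lgamma(n + 1)  # ln(n!)
    log_stirling = n * math.log(n) - n + 0.5 * math.log(2 * math.pi * n)
    return math.exp(log_factorial - log_stirling)


# Claimed upper bound e / sqrt(2*pi), attained at n = 1.
UPPER = math.e / math.sqrt(2 * math.pi)
```

Evaluating `stirling_ratio(n)` for increasing `n` shows the values decreasing from `UPPER` towards \(1\), in line with the monotonicity used in the argument.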

Shifting the \(\Gamma\) function

For every \(c \in \mathbb{R}\) we have \[ \lim_{x \to \infty}\frac{\Gamma(x+c)}{x^c\Gamma(x)}=1. \] One can read this as follows: after shifting \(\Gamma(x)\) to the left by \(c\), its value is close to \(x^c\Gamma(x)\) once \(x\) is large enough. The proof is fairly simple, although the computation is somewhat tedious; we only need Stirling's formula. \[ \begin{aligned} \lim_{x \to \infty}\frac{\Gamma(x+c)}{x^c\Gamma(x)} &= \lim_{x \to \infty}\frac{\left(\frac{x+c-1}{e}\right)^{x+c-1}\sqrt{2\pi(x+c-1)}}{x^c\left(\frac{x-1}{e}\right)^{x-1}\sqrt{2\pi(x-1)}} \\ &=\lim_{x \to \infty}\sqrt{\frac{x+c-1}{x-1}} \left(\frac{x+c-1}{ex}\right)^c\left(\frac{x+c-1}{x-1}\right)^{x-1} \end{aligned} \] The limits of these three factors are now easy to compute. Clearly we have \[ \lim_{x \to \infty}\sqrt\frac{x+c-1}{x-1}=1 \] and \[ \lim_{x \to \infty}\left(\frac{x+c-1}{ex}\right)^c=\frac{1}{e^c}. \] Finally, \[ \lim_{x \to \infty}\left(\frac{x+c-1}{x-1}\right)^{x-1}=\lim_{x \to \infty} \left(1+\frac{c}{x-1}\right)^{x-1}=e^c. \] Hence the original limit is \(1\). The computation is quite neat as well. Note that if we replace \(x\) and \(c\) by a positive integer \(n\) and an integer \(k\), we also get \[ \lim_{n \to \infty}\frac{(n+k-1)!}{n^k(n-1)!}=1. \]
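The shift formula is easy to probe numerically. The sketch below is only an illustration; `shift_ratio` is a hypothetical name, and the computation goes through `math.lgamma` to avoid overflow for large \(x\).

```python
import math


def shift_ratio(x: float, c: float) -> float:
    """Gamma(x + c) / (x**c * Gamma(x)) for x > 0 and x + c > 0, via logarithms."""
    return math.exp(math.lgamma(x + c) - c * math.log(x) - math.lgamma(x))
```

For example, `shift_ratio(1e6, 0.5)` and `shift_ratio(1e6, -2.5)` are both within about \(10^{-5}\) of \(1\), and with integer arguments `shift_ratio(n, k)` reproduces \((n+k-1)!/(n^k(n-1)!)\).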

Estimating a definite integral

Combining with Bernoulli's inequality, we have \[ \begin{aligned} \int_{-1}^1 (1-x^2)^ndx &\geq \int_{-1/\sqrt{n}}^{1/\sqrt{n}}(1-x^2)^ndx \\ &\geq \int_{-1/\sqrt{n}}^{1/\sqrt{n}} (1-nx^2)dx \\ &=\frac{4}{3\sqrt{n}}. \end{aligned} \] Next we give a finer estimate. In fact, \[ \lim_{n \to \infty}\sqrt{n}\int_{-1}^1 (1-x^2)^ndx=\sqrt{\pi}. \] By the definition of the function \(B(x,y)\), \[ B(x,y)=\int_0^1 t^{x-1}(1-t)^{y-1}dt=\frac{\Gamma(x)\Gamma(y)}{\Gamma(x+y)}. \] Letting \(t=u^2\), we obtain \[ B(x,y)=2\int_0^1 u^{2x-1}(1-u^2)^{y-1}du. \] Substituting \(x=\frac{1}{2}\) and \(y=n+1\) brings us close to the desired result: \[ \begin{aligned} B(\frac{1}{2},n+1)&=2\int_0^1(1-u^2)^ndu \\ &=\int_{-1}^{1}(1-u^2)^ndu \\ &=\frac{\Gamma(\frac{1}{2})\Gamma(n+1)}{\Gamma(n+\frac{3}{2})}. \end{aligned} \] Note that the second expression for \(B\) lets us compute \(\Gamma(\frac{1}{2})\). In fact, \[ B(\frac{1}{2},\frac{1}{2})=2\int_0^1\frac{1}{\sqrt{1-u^2}}du=\pi, \] and hence \(\Gamma(\frac{1}{2})=\sqrt{\pi}\). For \(B(\frac{1}{2},n+1)\) we can now use the shift formula above: \[ \lim_{n \to \infty}\frac{\Gamma(n+\frac{3}{2})}{\sqrt{n}\Gamma(n+1)}=1. \] Therefore \[ \lim_{n \to \infty}\sqrt{n}\int_{-1}^{1}(1-x^2)^ndx=\lim_{n \to \infty} \frac{\sqrt{n}\Gamma(\frac{1}{2})\Gamma(n+1)}{\Gamma(n+\frac{3}{2})}=\sqrt{\pi}. \]
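Both the crude bound \(4/(3\sqrt{n})\) and the limit \(\sqrt{\pi}\) can be checked numerically through the closed form \(B(\frac{1}{2},n+1)\). This is a minimal sketch with a hypothetical function name; it evaluates the Gamma quotient with `math.lgamma` rather than integrating.

```python
import math


def scaled_integral(n: int) -> float:
    """sqrt(n) * integral of (1 - x^2)^n over [-1, 1], using B(1/2, n+1)."""
    log_beta = math.lgamma(0.5) + math.lgamma(n + 1) - math.lgamma(n + 1.5)
    return math.sqrt(n) * math.exp(log_beta)
```

At \(n=1\) the value is \(4/3\), so the Bernoulli bound is attained there, and `scaled_integral(10**6)` agrees with \(\sqrt{\pi}\) to about five decimal places.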

Extra

Finally we prove an identity that has nothing to do with Stirling's formula: \[ \Gamma\left(\frac{1}{n}\right)\Gamma\left(\frac{2}{n}\right)\cdots\Gamma\left(\frac{n-1}{n}\right)=\frac{(2\pi)^{\frac{n-1}{2}}}{\sqrt{n}}. \] By the fundamental theorem of algebra, we immediately have \[ 1+x+x^2+\cdots+x^{n-1}=\prod_{k=1}^{n-1}\left(x-e^{\frac{2k\pi i}{n}}\right), \] because on the other hand \[ 1+x+x^2+\cdots+x^{n-1}=\frac{x^{n}-1}{x-1}, \] so the roots are exactly the \(n\)-th roots of unity other than \(1\). Setting \(x=1\), we get \[ \begin{aligned} n=\prod_{k=1}^{n-1}\left(1-e^{\frac{2k\pi i}{n}}\right)&=\prod_{k=1}^{n-1}\left( e^{-\frac{k\pi i}{n}}-e^{\frac{k \pi i}{n}} \right)e^{\frac{k \pi i}{n}} \\ &=\prod_{k=1}^{n-1}\left(-2i\sin\frac{k\pi}{n}\right)e^{\frac{k\pi i}{n}} \\ &=\left\vert \prod_{k=1}^{n-1}\left(-2i\sin\frac{k\pi}{n}\right)e^{\frac{k\pi i}{n}}\right\vert \\ &=2^{n-1}\prod_{k=1}^{n-1}\sin\frac{k\pi}{n}, \end{aligned} \] that is, \[ \prod_{k=1}^{n-1}\sin\frac{k\pi}{n}=\frac{n}{2^{n-1}}. \]

By Euler's reflection formula, for \(1 \leq k \leq n-1\) we have \[ \Gamma\left(\frac{k}{n}\right)\Gamma\left(\frac{n-k}{n}\right)=\frac{\pi}{\sin\frac{k\pi}{n}}=\frac{\pi}{\sin\frac{(n-k)\pi}{n}}. \] If \(n\) is odd, then pairing the factors gives \[ \Gamma\left(\frac{1}{n}\right)\Gamma\left(\frac{2}{n}\right)\cdots\Gamma\left(\frac{n-1}{n}\right)=\prod_{k=1}^{\frac{n-1}{2}}\Gamma\left(\frac{n-k}{n}\right)\Gamma\left(\frac{k}{n}\right)=\frac{\pi^{(n-1)/2}}{\prod_{k=1}^{(n-1)/2}\sin(k\pi/n)}. \] Here we have only used half of the values of \(k\). To use the other half, we simply swap the roles of \(k\) and \(n-k\), and multiplying the two expressions we obtain \[ \left[\Gamma\left(\frac{1}{n}\right)\Gamma\left(\frac{2}{n}\right)\cdots\Gamma\left(\frac{n-1}{n}\right)\right]^2=\frac{\pi^{n-1}}{n/2^{n-1}}, \] which is the desired identity after taking square roots. If \(n\) is even, one only needs to single out the term \(\Gamma(1/2)\) and compute in two parts.
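Both the sine product and the Gamma product identity can be verified numerically for small \(n\). The sketch below is just a check with hypothetical function names; it relies on `math.gamma` and `math.prod` from the standard library (Python 3.8 or later).

```python
import math


def gamma_product(n: int) -> float:
    """Gamma(1/n) * Gamma(2/n) * ... * Gamma((n-1)/n)."""
    return math.prod(math.gamma(k / n) for k in range(1, n))


def gamma_closed_form(n: int) -> float:
    """(2*pi)^((n-1)/2) / sqrt(n)."""
    return (2 * math.pi) ** ((n - 1) / 2) / math.sqrt(n)


def sine_product(n: int) -> float:
    """sin(pi/n) * sin(2*pi/n) * ... * sin((n-1)*pi/n), expected to be n / 2^(n-1)."""
    return math.prod(math.sin(k * math.pi / n) for k in range(1, n))
```

For \(n=2\) this reduces to \(\Gamma(\frac{1}{2})=\sqrt{\pi}\), and the check works the same way for odd and even \(n\).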


Update (2020.11.9)

We give two limits that look hard to compute.

\[ \lim_{n \to \infty}\left(1+\frac{1}{n}\right)^{n^2}\frac{n!}{n^n\sqrt{n}} \]

If we directly replace \(n!\) using Stirling's formula, the result is immediate. \[ \begin{aligned} \lim_{n \to \infty}\left(1+\frac{1}{n}\right)^{n^2}\frac{n!}{n^n\sqrt{n}} &=\lim_{n \to \infty}\left(1+\frac{1}{n}\right)^{n^2}\frac{n!}{n^n\sqrt{n}}\frac{(n/e)^n\sqrt{2\pi n}}{n!} \\ &=\lim_{n \to \infty}\left(1+\frac{1}{n}\right)^{n^2}\frac{\sqrt{2\pi}}{e^n} \end{aligned} \] So it suffices to find the limit of \((1+\frac{1}{n})^{n^2}e^{-n}\). But do not take it for granted that this limit is \(1\). Using the Taylor expansion, we get \[ \begin{aligned} \lim_{n \to \infty}\left(1+\frac{1}{n}\right)^{n^2}e^{-n}&=\lim_{n \to \infty}\exp\left(n^2\ln\left(1+\frac{1}{n}\right)-n\right) \\ &=\lim_{n \to \infty} \exp\left(n^2\left(\frac{1}{n}-\frac{1}{2n^2}+o\left(\frac{1}{n^2}\right)\right)-n\right) \\ &=\frac{1}{\sqrt{e}}. \end{aligned} \] Hence the original limit is \(\sqrt\frac{2\pi}{e}\).

\[ \lim_{n\to\infty} \sqrt{n}\prod_{k=1}^{n}\frac{e^{1-\frac{1}{k}}}{\left(1+\frac{1}{k}\right)^k} \]

Multiplying the numerators of the \(n\) factors gives \(\exp(n-1-\frac{1}{2}-\cdots-\frac{1}{n})\). Since the harmonic series diverges, to obtain convergence we naturally think of Euler's constant \(\gamma=\lim_{n\to\infty}\left(1+\frac{1}{2}+\cdots+\frac{1}{n}-\ln{n}\right)\). There seems to be no direct way to simplify the denominator: we know that \((1+1/k)^k\) tends to \(e\), but that does not seem to help here. So let us expand and simplify the denominator first. \[ \prod_{k=1}^{n}\left(1+\frac{1}{k}\right)^k=\frac{2^1\cdot3^2\cdot4^3\cdot5^4\cdots{(n+1)^n}}{1^1\cdot2^2\cdot3^3\cdot4^4\cdots{n^n}}=\frac{(n+1)^n}{n!} \] So the original limit can be written as \[ \lim_{n \to \infty}\sqrt{n}\frac{n!e^{n-1-\frac{1}{2}-\cdots-\frac{1}{n}}}{(n+1)^n}. \] Now Stirling's formula applies directly. \[ \begin{aligned} \lim_{n \to \infty}\sqrt{n}\frac{n!e^{n-1-\frac{1}{2}-\cdots-\frac{1}{n}}}{(n+1)^n} &=\lim_{n\to\infty}\sqrt{n}\frac{n!e^{n-1-\frac{1}{2}-\cdots-\frac{1}{n}}}{(n+1)^n}\cdot\frac{(n/e)^n\sqrt{2\pi n}}{n!} \\ &=\sqrt{2\pi}\lim_{n\to\infty}\frac{n^{n+1}}{(n+1)^n}e^{-1-\frac{1}{2}-\cdots-\frac{1}{n}} \\ &=\sqrt{2\pi}\lim_{n\to\infty}\left(1+\frac{1}{n}\right)^{-n}\cdot e^{\ln{n}}\cdot e^{-1-\frac{1}{2}-\frac{1}{3}-\cdots-\frac{1}{n}} \end{aligned} \] Since \(\lim_{n\to\infty}\left(1+\frac{1}{n}\right)^{-n}=e^{-1}\) and \(\lim_{n\to\infty}e^{\ln{n}-1-\frac{1}{2}-\frac{1}{3}-\cdots-\frac{1}{n}}=e^{-\gamma}\), the original limit is \(\frac{\sqrt{2\pi}}{e^{1+\gamma}}\).
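The two limits of this update can be checked numerically. The following is a rough sketch under stated assumptions: the function names are hypothetical, everything is computed in logarithms to avoid overflow, and the numerical value of Euler's constant \(\gamma\) is hard-coded because the standard `math` module does not provide it.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant, hard-coded


def first_term(n: int) -> float:
    """(1 + 1/n)^(n^2) * n! / (n^n * sqrt(n))."""
    log_term = (n * n * math.log1p(1.0 / n) + math.lgamma(n + 1)
                - n * math.log(n) - 0.5 * math.log(n))
    return math.exp(log_term)


def second_term(n: int) -> float:
    """sqrt(n) * prod_{k=1}^{n} e^(1 - 1/k) / (1 + 1/k)^k."""
    log_term = 0.5 * math.log(n)
    for k in range(1, n + 1):
        log_term += 1.0 - 1.0 / k - k * math.log1p(1.0 / k)
    return math.exp(log_term)


FIRST_LIMIT = math.sqrt(2 * math.pi / math.e)
SECOND_LIMIT = math.sqrt(2 * math.pi) / math.exp(1 + EULER_GAMMA)
```

Both sequences converge at rate \(O(1/n)\), so a value such as \(n=10^4\) already agrees with the claimed limits to several decimal places.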

A proof of the ordinary Gleason-Kahane-Żelazko theorem for complex functionals

The Theorem

(Gleason-Kahane-Żelazko) If \(\phi\) is a complex linear functional on a unitary Banach algebra \(A\), such that \(\phi(e)=1\) and \(\phi(x) \neq 0\) for every invertible \(x \in A\), then \[ \phi(xy)=\phi(x)\phi(y) \] for all \(x,y \in A\). Namely, \(\phi\) is a complex homomorphism.

Notations and remarks

Suppose \(A\) is a complex unitary Banach algebra and \(\phi: A \to \mathbb{C}\) is a linear functional which is not identically \(0\) (for convenience). If \[ \phi(xy)=\phi(x)\phi(y) \] for all \(x \in A\) and \(y \in A\), then \(\phi\) is called a complex homomorphism on \(A\). Note that a unitary Banach algebra (with \(e\) as multiplicative unit) is also a ring, and so is \(\mathbb{C}\), so we may say in this case that \(\phi\) is a ring-homomorphism. For such \(\phi\), we have an immediate proposition:

Proposition 0 \(\phi(e)=1\) and \(\phi(x) \neq 0\) for every invertible \(x \in A\).

Proof. Since \(\phi(e)=\phi(ee)=\phi(e)\phi(e)\), we have \(\phi(e)=0\) or \(\phi(e)=1\). If \(\phi(e)=0\) however, for any \(y \in A\), we have \(\phi(y)=\phi(ye)=\phi(y)\phi(e)=0\), which is an excluded case. Hence \(\phi(e)=1\).

For invertible \(x \in A\), note that \(\phi(xx^{-1})=\phi(x)\phi(x^{-1})=\phi(e)=1\). This can't happen if \(\phi(x)=0\). \(\square\)

The theorem reveals that Proposition \(0\) actually characterizes the complex homomorphisms (ring-homomorphisms) among the linear functionals (group-homomorphisms).

This theorem was proved by Andrew M. Gleason in 1967 and later independently by J.-P. Kahane and W. Żelazko in 1968. Both papers worked mainly on commutative Banach algebras; the non-commutative version, which focuses on complex homomorphisms, is due to W. Żelazko. In this post we will follow the third one.

Unfortunately, one cannot find an educational proof on the Internet with ease, which may be the reason why I write this post and why you read this.

Equivalences

From the definitions of Banach algebras and some logical manipulation, we obtain several equivalent statements worth noting.

Subspace and ideal version

(Stated by Gleason) Let \(M\) be a linear subspace of codimension one in a commutative Banach algebra \(A\) having an identity. Suppose no element of \(M\) is invertible, then \(M\) is an ideal.

(Stated by Kahane and Żelazko) A subspace \(X \subset A\) of codimension \(1\) is a maximal ideal if and only if it consists of non-invertible elements.

Spectrum version

(Stated by Kahane and Żelazko) Let \(A\) be a commutative complex Banach algebra with unit element. Then a functional \(f \in A^\ast\) is a multiplicative linear functional if and only if \(f(x) \in \sigma(x)\) holds for all \(x \in A\).

Here \(\sigma(x)\) denotes the spectrum of \(x\).

The connection

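Clearly a maximal ideal contains no invertible element (otherwise it would contain \(e\) and hence be the ring itself). Also note that, in the commutative case, every maximal ideal is the kernel of some complex homomorphism and therefore has codimension \(1\). So the content of the theorem is the converse: a subspace \(X \subset A\) of codimension \(1\) consisting of non-invertible elements is an ideal. For such a subspace \(X\), since \(e \notin X\), there is a unique linear functional \(\phi\) with kernel \(X\) and \(\phi(e)=1\). Note that the conditions \(\phi(e)=1\) and \(\phi(x) \neq 0\) for invertible \(x\) hold if and only if \(\phi(x) \in \sigma(x)\) for all \(x \in A\): indeed, \(\phi(\phi(x)e-x)=0\), so \(\phi(x)e-x\) is not invertible; conversely, \(\sigma(e)=\{1\}\) and \(0 \notin \sigma(x)\) when \(x\) is invertible. As we will show, \(\phi\) has to be a complex homomorphism.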

Tools to prove the theorem

Lemma 0 Suppose \(A\) is a unitary Banach algebra, \(x \in A\), \(\lVert x \rVert<1\), then \(e-x\) is invertible.

This lemma can be found in any functional analysis book introducing Banach algebra.

Lemma 1 Suppose \(f\) is an entire function of one complex variable, \(f(0)=1\), \(f'(0)=0\), and \[ 0<|f(\lambda)| \leq e^{|\lambda|} \] for all complex \(\lambda\), then \(f(\lambda)=1\) for all \(\lambda \in \mathbb{C}\).

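Note that, since \(f\) has no zeros, there is an entire function \(g\) such that \(f=\exp(g)\), and we may take \(g(0)=0\); then \(g'(0)=f'(0)=0\), and \(\Re g(\lambda) \leq |\lambda|\) because \(|f|=e^{\Re g}\). It can be shown that \(g=0\). Indeed, for \(|\lambda| \leq r\) we have \(\Re g(\lambda) \leq r\), hence \(|g(\lambda)| \leq |2r-g(\lambda)|\). If we put \[ h_r(\lambda) = \frac{r^2g(\lambda)}{\lambda^2[2r-g(\lambda)]} \] then we see \(h_r\) is holomorphic in the open disk centred at \(0\) with radius \(2r\) (the singularity at \(0\) is removable since \(g(0)=g'(0)=0\), and \(2r-g\) has no zero in this disk). Besides, \(|h_r(\lambda)| \leq 1\) if \(|\lambda|=r\). By the maximum modulus theorem, we have \[ |h_r(\lambda)| \leq 1 \] whenever \(|\lambda| \leq r\). Fix \(\lambda\) and let \(r \to \infty\); since \(|h_r(\lambda)|\) would grow like \(r|g(\lambda)|/(2|\lambda|^2)\) if \(g(\lambda) \neq 0\), we must have \(g(\lambda)=0\).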

Jordan homomorphism

A mapping \(\phi\) from one ring \(R\) to another ring \(R'\) is said to be a Jordan homomorphism from \(R\) to \(R'\) if \[ \phi(a+b)=\phi(a)+\phi(b) \] and \[ \phi(ab+ba)=\phi(a)\phi(b)+\phi(b)\phi(a). \] It's of course clear that every homomorphism is Jordan. Note that if \(R'\) is not of characteristic \(2\), the second identity is equivalent to \[ \phi(a^2)=\phi(a)^2. \] To show the equivalence, one lets \(b=a\) in the first identity, and puts \(a+b\) in place of \(a\) in the second.

Since in this case \(R=A\) and \(R'=\mathbb{C}\), the latter of which is commutative, we also write \[ \phi(ab+ba)=2\phi(a)\phi(b). \] As we will show, the \(\phi\) in the theorem is a Jordan homomorphism.

The proof

We will follow an unusual approach. By repeatedly 'downgrading' the goal, one will see this algebraic problem neatly transformed into a pure analysis problem.

To begin with, let \(N\) be the kernel of \(\phi\).

Step 1 - It suffices to prove that \(\phi\) is a Jordan homomorphism

If \(\phi\) is a complex homomorphism, it is immediate that \(\phi\) is a Jordan homomorphism. Conversely, if \(\phi\) is Jordan, we have \[ \phi(xy+yx) =2\phi(x)\phi(y). \] If \(x\in N\), the right hand becomes \(0\), and therefore \[ xy+yx \in N \quad \text{if } x \in N, y \in A. \] Consider the identity \[ (xy-yx)^2+(xy+yx)^2=2[x(yxy)+(yxy)x] \]

Therefore \[ \begin{aligned} \phi((xy-yx)^2+(xy+yx)^2)&=\phi((xy-yx)^2)+\phi((xy+yx)^2) \\ &=\phi(xy-yx)^2+\phi(xy+yx)^2 \\ &= \phi(xy-yx)^2 \\ &=2\phi[x(yxy)+(yxy)x] \\ &=0 \end{aligned} \] Since \(x \in N\) and \(yxy \in A\), we see \(x(yxy)+(yxy)x \in N\). Therefore \(\phi(xy-yx)=0\) and \[ xy-yx \in N \] if \(x \in N\) and \(y \in A\). Further we see \[ xy-yx+xy+yx=2xy \in N \quad \text {and}\quad xy+yx-xy+yx = 2yx \in N, \] which implies that \(N\) is an ideal. This may remind you of this classic diagram (we will not use it since it is additive though):

[Figure: ring homomorphism diagram]

For \(x,y \in A\), we have \(x \in \phi(x)e+N\) and \(y \in \phi(y)e+N\). As a result, \(xy \in \phi(x)\phi(y)e+N\), and therefore \[ \phi(xy)=\phi(x)\phi(y)+0. \]
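The identity \((xy-yx)^2+(xy+yx)^2=2[x(yxy)+(yxy)x]\) used in Step 1 holds in any ring, commutative or not, which is easy to test on a non-commutative example. The sketch below uses integer \(2 \times 2\) matrices; the helper names and the sample matrices are hypothetical choices for illustration.

```python
def matmul(p, q):
    """Product of two square matrices given as nested lists."""
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def add(p, q):
    return [[a + b for a, b in zip(row_p, row_q)] for row_p, row_q in zip(p, q)]


def sub(p, q):
    return [[a - b for a, b in zip(row_p, row_q)] for row_p, row_q in zip(p, q)]


def identity_sides(x, y):
    """Return both sides of (xy - yx)^2 + (xy + yx)^2 = 2[x(yxy) + (yxy)x]."""
    xy, yx = matmul(x, y), matmul(y, x)
    commutator, anticommutator = sub(xy, yx), add(xy, yx)
    left = add(matmul(commutator, commutator),
               matmul(anticommutator, anticommutator))
    yxy = matmul(y, matmul(x, y))
    inner = add(matmul(x, yxy), matmul(yxy, x))
    right = add(inner, inner)  # multiplication by 2
    return left, right
```

With, say, `x = [[1, 2], [3, 4]]` and `y = [[0, 1], [5, -2]]`, which do not commute, the two sides returned by `identity_sides(x, y)` coincide exactly since all arithmetic is done in integers.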

Step 2 - It suffices to prove that \(\phi(a^2)=0\) if \(\phi(a)=0\).

Again, if \(\phi\) is Jordan, we have \(\phi(x^2)=\phi(x)^2\) for all \(x \in A\), so \(\phi(a^2)=0\) whenever \(\phi(a)=0\). Conversely, if \(\phi(a^2)=0\) for all \(a \in N\), we may write each \(x \in A\) as \[ x=\phi(x)e+a \] where \(a \in N\). Therefore \[ \begin{aligned} \phi(x^2)&=\phi((\phi(x)e+a)^2)=\phi(x)^2+2\phi(x)\phi(a)+\phi(a^2)=\phi(x)^2, \end{aligned} \] which shows that \(\phi\) is Jordan.

Step 3 - It suffices to show that the following function is constant

Fix \(a \in N\), assume \(\lVert a \rVert = 1\) without loss of generality, and define \[ f(\lambda)=\sum_{n=0}^{\infty}\frac{\phi(a^n)}{n!}\lambda^n \] for all complex \(\lambda\). If this function is constant (which is what lemma 1 will give), we immediately have \(f''(0)=\phi(a^2)=0\). This is purely a complex analysis problem, however.

Step 4 - It suffices to describe the behaviour of an entire function

Note in the definition of \(f\), we have \[ \lvert \phi(a^n) \rvert \leq \lVert \phi \rVert \lVert a^n \rVert \leq \lVert \phi \rVert \lVert a \rVert^n=\lVert \phi \rVert. \] So we want the norm of \(\phi\) to be finite, which ensures that \(f\) is entire. Suppose, for contradiction, that \(\lVert e-a \rVert < 1\) for some \(a \in N\); then by lemma 0, \(e-(e-a)=a\) is invertible, which is impossible since \(\phi(a)=0\). Hence \(\lVert e-a \rVert \geq 1\) for all \(a \in N\). Consequently, for nonzero \(\lambda \in \mathbb{C}\) and \(a \in N\) (so that \(\lambda^{-1}a \in N\)), we have the following inequality: \[ \begin{aligned} \lVert \lambda e-a \rVert = |\lambda|\lVert e-\lambda^{-1}a \rVert &\geq|\lambda| \\ &= |\phi(\lambda e)-\phi(a)| \\ &= |\phi(\lambda e-a)|. \end{aligned} \] Since every element of \(A\) has the form \(\lambda e-a\) with \(a \in N\), and the case \(\lambda=0\) is trivial, \(\phi\) is continuous with norm at most \(1\). The continuity of \(\phi\) is not assumed at the beginning but proved here.

For \(f\) we have some immediate facts. Since \(|\phi(a^n)| \leq 1\) for every \(n\), \(f\) is entire with \(f(0)=\phi(e)=1\) and \(f'(0)=\phi(a)=0\). Also, since \(\phi\) has norm at most \(1\), we have \[ |f(\lambda)|=\left|\sum_{n=0}^{\infty}\frac{\phi(a^n)}{n!}\lambda^n\right| \leq \sum_{n=0}^{\infty}\frac{|\lambda^n|}{n!}=e^{|\lambda|}. \] All we need in the end is to show that \(f(\lambda) \neq 0\) for all \(\lambda \in \mathbb{C}\).

The series \[ E(\lambda)=\exp(a\lambda)=\sum_{n=0}^{\infty}\frac{(\lambda a)^n}{n!} \] converges absolutely for every \(\lambda\) since \(\lVert a \rVert=1\). The continuity of \(\phi\) now shows \[ f(\lambda)=\phi(E(\lambda)). \] Note \[ E(-\lambda)E(\lambda)=\left(\sum_{n=0}^{\infty}\frac{(-\lambda a)^n}{n!}\right)\left(\sum_{n=0}^{\infty}\frac{(\lambda a)^n}{n!}\right)=e. \] Hence \(E(\lambda)\) is invertible for all \(\lambda \in \mathbb{C}\), and so \(f(\lambda)=\phi(E(\lambda)) \neq 0\). By lemma 1, \(f(\lambda)=1\) is constant. The proof is completed by reversing the steps. \(\square\)

References / Further reading

  • Walter Rudin, Real and Complex Analysis
  • Walter Rudin, Functional Analysis
  • Andrew M. Gleason, A Characterization of Maximal Ideals
  • J.-P. Kahane and W. Żelazko, A Characterization of Maximal Ideals in Commutative Banach Algebras
  • W. Żelazko, A Characterization of Multiplicative Linear Functionals in Complex Banach Algebras
  • I. N. Herstein, Jordan Homomorphisms

The Big Three Pt. 5 - The Hahn-Banach Theorem (Dominated Extension)

About this post

The Hahn-Banach theorem is a central tool of functional analysis and therefore comes in a wide variety of versions, many of which have numerous uses in other fields of mathematics. It is therefore not possible to cover all of them. In this post we cover two 'abstract enough' results, which are sometimes called the dominated extension theorems. Both will be discussed for real vector spaces with no topology endowed. This allows us to apply them to any topological vector space.

Another interesting thing is that we will be using the axiom of choice, or whichever equivalent you may like, for example Zorn's lemma or the well-ordering principle. Before everything, we need to examine more properties of vector spaces.

Vector space

It's obvious that every complex vector space is also a real vector space. Suppose \(X\) is a complex vector space, and we shall give the definition of real-linear and complex-linear functionals.

An additive functional \(\Lambda\) on \(X\) is called real-linear (complex-linear) if \(\Lambda(\alpha x)=\alpha\Lambda(x)\) for every \(x \in X\) and for every real (complex) scalar \(\alpha\).

For *-linear functionals, we have two important but easy theorems.

If \(u\) is the real part of a complex-linear functional \(f\) on \(X\), then \(u\) is real-linear and \[ f(x)=u(x)-iu(ix) \quad (x \in X). \]

Proof. For complex \(f(x)=u(x)+iv(x)\), it suffices to express \(v(x)\) correctly. Since \[ if(x)=iu(x)-v(x), \] we see \(\Im(f(x))=v(x)=-\Re(if(x))\). Therefore \[ f(x)=u(x)-i\Re(if(x))=u(x)-i\Re(f(ix)), \] and since \(\Re(f(ix))=u(ix)\), we get \[ f(x)=u(x)-iu(ix). \] To show that \(u(x)\) is real-linear, note that \[ f(x+y)=u(x+y)+iv(x+y)=f(x)+f(y)=u(x)+u(y)+i(v(x)+v(y)). \] Therefore \(u(x)+u(y)=u(x+y)\). A similar argument applies to a real scalar \(\alpha\). \(\square\)

Conversely, we are able to generate a complex-linear functional by a real one.

If \(u\) is a real-linear functional, then \(f(x)=u(x)-iu(ix)\) is a complex-linear functional.

Proof. Direct computation. \(\square\)
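The formula \(f(x)=u(x)-iu(ix)\) can be illustrated on \(X=\mathbb{C}^2\). In this small sketch the functional `f` and its coefficients are hypothetical choices; the point is only that `f` is recovered from its real part `u` alone.

```python
def f(z):
    """A sample complex-linear functional on C^2 (coefficients chosen arbitrarily)."""
    return (1 + 2j) * z[0] + (3 - 1j) * z[1]


def u(z):
    """Real part of f; this is a real-linear functional."""
    return f(z).real


def rebuilt(z):
    """Recover f from u via f(x) = u(x) - i*u(ix)."""
    iz = tuple(1j * c for c in z)
    return complex(u(z), -u(iz))
```

For any `z`, `rebuilt(z)` agrees with `f(z)` up to rounding, and `rebuilt(iz)` agrees with `1j * rebuilt(z)`, which is the complex-linearity claimed in the second statement.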

Suppose now \(X\) is a complex topological vector space. Then a complex-linear functional on \(X\) is continuous if and only if its real part is continuous, and every continuous real-linear \(u: X \to \mathbb{R}\) is the real part of a unique continuous complex-linear functional \(f\).

Sublinear, seminorm

A sublinear functional is 'almost' linear but also 'almost' a norm. Explicitly, we say \(p: X \to \mathbb{R}\) is a sublinear functional when it satisfies \[ \begin{aligned} p(x+y) &\leq p(x)+p(y) \\ p(tx) &= tp(x) \\ \end{aligned} \] for all \(x,y \in X\) and \(t \geq 0\). As one can see, if \(X\) is normable, then \(p(x)=\lVert x \rVert\) is a sublinear functional. One should not confuse this with a semilinear functional, where no inequality is involved. Another thing worth noting is that \(p\) is not required to be nonnegative.


A seminorm on a vector space \(X\) is a real-valued function \(p\) on \(X\) such that \[ \begin{aligned} p(x+y) &\leq p(x)+p(y) \\ p(\alpha x)&=|\alpha|p(x) \end{aligned} \] for all \(x,y \in X\) and scalar \(\alpha\).

Obviously a seminorm is also a sublinear functional. For the connection between norm and seminorm, one should note that a seminorm \(p\) is a norm if and only if \(p(x) \neq 0\) whenever \(x \neq 0\).

Dominated extension theorems

These are the results covered in this post. Generally speaking, we are able to extend a linear functional defined on a subspace to the whole space as long as it is dominated by a sublinear functional. The name is reminiscent of the dominated convergence theorem, which states that if a pointwise convergent sequence of measurable functions is dominated by an integrable function, then the limit can be interchanged with the integral.

(Hahn-Banach) Suppose

  1. \(M\) is a subspace of a real vector space \(X\),
  2. \(f: M \to \mathbb{R}\) is linear and \(f(x) \leq p(x)\) on \(M\) where \(p\) is a sublinear functional on \(X\)

Then there exists a linear \(\Lambda: X \to \mathbb{R}\) such that \[ \Lambda(x)=f(x) \] for all \(x \in M\) and \[ -p(-x) \leq \Lambda(x) \leq p(x) \] for all \(x \in X\).

Step 1 - Extending the function by one dimension

In other words, if \(f\) is dominated by a sublinear functional on \(M\), then we are able to extend it to the whole space while keeping control of its range.

Proof. If \(M=X\) we have nothing to do. So suppose now that \(M\) is a proper subspace of \(X\). Choose \(x_1 \in X-M\) and define \[ M_1=\{x+tx_1:x \in M,t \in \mathbb{R}\}. \] It's easy to verify that \(M_1\) is a vector space (warning again: no topology is endowed). Now we will be using the properties of sublinear functionals.

Since \[ f(x)+f(y)=f(x+y) \leq p(x+y) \leq p(x-x_1)+p(x_1+y) \] for all \(x,y \in M\), we have \[ f(x)-p(x-x_1) \leq p(x_1+y) -f(y). \] Let \[ \alpha=\sup\{f(x)-p(x-x_1):x \in M\}, \] which is finite because the left-hand side above is bounded by \(p(x_1+y)-f(y)\) for any fixed \(y\). By definition, we get \[ f(x)-\alpha \leq p(x-x_1) \] and \[ f(y)+\alpha \leq p(x_1+y) \] for all \(x,y \in M\). Define \(f_1\) on \(M_1\) by \[ f_1(x+tx_1)=f(x)+t\alpha. \] This is well defined since \(x_1 \notin M\) makes the representation \(x+tx_1\) unique. When \(x +tx_1 \in M\), we have \(t=0\), and therefore \(f_1=f\) on \(M\).

To show that \(f_1 \leq p\) on \(M_1\), note that for \(t>0\), we have \[ f(x/t)-\alpha \leq p(x/t-x_1), \] which, after multiplying by \(t\), implies \[ f(x)-t\alpha=f_1(x-tx_1)\leq p(x-tx_1). \] Similarly, \[ f(y/t)+\alpha \leq p(y/t+x_1), \] and therefore \[ f(y)+t\alpha=f_1(y+tx_1) \leq p(y+tx_1). \] Hence \(f_1 \leq p\).
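The one-dimensional extension step can be tried out on a toy case. The sketch below assumes \(X=\mathbb{R}^2\), \(M\) the horizontal axis, \(f(s,0)=s\), \(p\) the \(\ell^1\) norm and \(x_1=(0,1)\). All of these are illustrative choices not taken from the text, and the supremum over \(M\) is replaced by a maximum over a finite grid.

```python
# Toy instance of Step 1 (every concrete choice here is an assumption).
# X = R^2, M = {(s, 0)}, f(s, 0) = s, p = l^1 norm, x_1 = (0, 1).

def p(v):
    return abs(v[0]) + abs(v[1])

def f(s):
    # f on M, identifying (s, 0) with s
    return s

# Dyadic grid, so that the sums below are exact in floating point.
grid = [k / 8 for k in range(-80, 81)]

# alpha = sup { f(x) - p(x - x_1) : x in M }, approximated on the grid.
alpha = max(f(s) - p((s, -1.0)) for s in grid)
# The matching upper limit inf { p(x_1 + y) - f(y) : y in M }.
upper = min(p((s, 1.0)) - f(s) for s in grid)

def f1(v):
    # f_1(x + t x_1) = f(x) + t * alpha with x = (v[0], 0) and t = v[1]
    return f(v[0]) + v[1] * alpha

# Check f_1 <= p on a grid of points of M_1 = R^2.
dominated = all(f1((s, t)) <= p((s, t)) for s in grid for t in grid)
```

In this toy case `alpha` and `upper` differ, which illustrates that the extension is in general not unique: the inequality \(f(x)-p(x-x_1) \leq p(x_1+y)-f(y)\) leaves a whole interval of admissible values.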

Step 2 - An application of Zorn's lemma

Side note: Why Zorn's lemma

It seems that we can never stop using step 1 to extend \(M\) to a larger space, but we have to extend. (If \(X\) is a finite dimensional space, then this is merely a linear algebra problem.) This meets exactly what William Timothy Gowers said in his blog post:

If you are building a mathematical object in stages and find that (i) you have not finished even after infinitely many stages, and (ii) there seems to be nothing to stop you continuing to build, then Zorn’s lemma may well be able to help you.

-- How to use Zorn's lemma

And we will show that, as W. T. Gowers said,

If the resulting partial order satisfies the chain condition and if a maximal element must be a structure of the kind one is trying to build, then the proof is complete.


To apply Zorn's lemma, we need to construct a partially ordered set. Let \(\mathscr{P}\) be the collection of all ordered pairs \((M',f')\) where \(M'\) is a subspace of \(X\) containing \(M\) and \(f'\) is a linear functional on \(M'\) that extends \(f\) and satisfies \(f' \leq p\) on \(M'\). For example we have \[ (M,f) , (M_1,f_1) \in \mathscr{P}. \] The partial order \(\leq\) is defined as follows. By \((M',f') \leq (M'',f'')\), we mean \(M' \subset M''\) and \(f' = f''\) on \(M'\). Obviously this is a partial order (you should be able to check this).

Suppose now \(\mathcal{F}\) is a chain (a totally ordered subset of \(\mathscr{P}\)). We claim that \(\mathcal{F}\) has an upper bound (which is required by Zorn's lemma). Let \[ M_0=\bigcup_{(M',f') \in \mathcal{F}}M' \] and \[ f_0(y)=f'(y) \] whenever \((M',f') \in \mathcal{F}\) and \(y \in M'\). Since \(\mathcal{F}\) is totally ordered, \(M_0\) is a subspace and \(f_0\) is well defined, and it's easy to verify that \((M_0,f_0)\) is the upper bound we are looking for. But \(\mathcal{F}\) is arbitrary, therefore by Zorn's lemma, there exists a maximal element \((M^\ast,f^\ast)\) in \(\mathscr{P}\). If \(M^\ast \neq X\), according to step 1, we are able to extend \((M^\ast,f^\ast)\), which contradicts its maximality. Hence \(M^\ast=X\), and \(\Lambda\) is defined to be \(f^\ast\). By the linearity of \(\Lambda\) and \(\Lambda \leq p\), we see \[ -p(-x) \leq -\Lambda(-x)=\Lambda{x}. \] The theorem is proved. \(\square\)

How this proof is constructed

This is a classic application of Zorn's lemma (equivalently, of the well-ordering principle or the Hausdorff maximality theorem). First, we showed that we are able to extend \(M\) and \(f\). But since we do not know the dimension or other properties of \(X\), it's not easy to control the extension which finally 'converges' to \((X,\Lambda)\). However, Zorn's lemma saves us from this random exploration: whatever happens, the maximal element is there, and we take it to finish the proof.

Generalisation onto the complex field

Since an inequality appears in the theorem above, the complex case needs more careful validation.

(Bohnenblust-Sobczyk-Soukhomlinoff) Suppose \(M\) is a subspace of a vector space \(X\), \(p\) is a seminorm on \(X\), and \(f\) is a linear functional on \(M\) such that \[ |f(x)| \leq p(x) \] for all \(x \in M\). Then \(f\) extends to a linear functional \(\Lambda\) on \(X\) satisfying \[ |\Lambda (x)| \leq p(x) \] for all \(x \in X\).

Proof. If the scalar field is \(\mathbb{R}\), then we are done, since \(p(-x)=p(x)\) in this case (can you see why?). So we assume the scalar field is \(\mathbb{C}\).

Put \(u = \Re f\). Since \(u \leq |f| \leq p\) on \(M\), the dominated extension theorem gives a real-linear functional \(U\) on \(X\) such that \(U=u\) on \(M\) and \(U \leq p\) on \(X\). Define \[ \Lambda(x)=U(x)-iU(ix); \] then \(\Lambda\) is complex-linear and \(\Lambda(x)=f(x)\) on \(M\).

To show that \(|\Lambda(x)| \leq p(x)\), we may assume \(\Lambda(x) \neq 0\). Taking \(\alpha=\frac{|\Lambda(x)|}{\Lambda(x)}\), the number \(\Lambda(\alpha{x})=\alpha\Lambda(x)=|\Lambda(x)|\) is real, hence equal to its real part \(U(\alpha{x})\), and we have \[ |\Lambda(x)|=\Lambda(\alpha{x})=U(\alpha{x})\leq p(\alpha x)=p(x) \] since \(|\alpha|=1\) and \(p(\alpha{x})=|\alpha|p(x)=p(x)\). \(\square\)

Extending Hahn-Banach theorem under linear transform

To end this post, we state a beautiful and useful extension of the Hahn-Banach theorem, which is done by R. P. Agnew and A. P. Morse.

(Agnew-Morse) Let \(X\) denote a real vector space and \(\mathcal{A}\) be a collection of linear maps \(A_\alpha: X \to X\) that commute, or namely \[ A_\alpha A_\beta=A_\beta A_\alpha \] for all \(A_\alpha,A_\beta \in \mathcal{A}\). Let \(p\) be a sublinear functional such that \[ p(A_\alpha{x})=p(x) \] for all \(A_\alpha \in \mathcal{A}\). Let \(Y\) be a subspace of \(X\) on which a linear functional \(f\) is defined such that

  1. \(f(y) \leq p(y)\) for all \(y \in Y\).
  2. For each mapping \(A\) and \(y \in Y\), we have \(Ay \in Y\).
  3. Under the hypothesis of 2, we have \(f(Ay)=f(y)\).

Then \(f\) can be extended to \(X\) by \(\Lambda\) so that \(-p(-x) \leq \Lambda(x) \leq p(x)\) for all \(x \in X\), and \[ \Lambda(A_\alpha{x})=\Lambda{x}. \]

To prove this theorem, we need to construct a sublinear functional that dominates \(f\). For the whole proof, see Functional Analysis by Peter Lax.

The series

Since there is no strong reason to write more posts on this topic, i.e. the three fundamental theorems of linear functional analysis, I think it's time to make a list of the series. It's been around half a year.

References / Further Readings

  1. Walter Rudin, Functional Analysis.
  2. Peter Lax, Functional Analysis.
  3. William Timothy Gowers, How to use Zorn's lemma.

A long exact sequence of cohomology groups (zig-zag and diagram-chasing)

Exterior differentiation

(This section is intended to introduce the background. Feel free to skip if you already know exterior differentiation.)

There are several useful tools for vector calculus on \(\mathbb{R}^3,\) namely gradient, curl, and divergence. It is possible to treat the gradient of a differentiable function \(f\) on \(\mathbb{R}^3\) at a point \(x_0\) as the Fréchet derivative at \(x_0\). But it does not work for curl and divergence at all. Fortunately there is another abstraction that works for all of them. It comes from differential forms.

Let \(x_1,\cdots,x_n\) be the linear coordinates on \(\mathbb{R}^n\) as usual. We define an algebra \(\Omega^{\ast}\) over \(\mathbb{R}\) generated by \(dx_1,\cdots,dx_n\) with the following relations: \[ \begin{cases} dx_idx_i=0 \\ dx_idx_j = -dx_jdx_i \quad i \neq j \end{cases} \] This is a vector space as well, and it's easy to derive that it has a basis \[ 1,dx_i,dx_idx_j,dx_idx_jdx_k,\cdots,dx_1\dots dx_n \] where \(i<j<k\). The \(C^{\infty}\) differential forms on \(\mathbb{R}^n\) are defined to be the tensor product \[ \Omega^*(\mathbb{R}^n)=\{C^{\infty}\text{ functions on }\mathbb{R}^n\} \otimes_\mathbb{R}\Omega^*. \] As can be shown, every \(\omega \in \Omega^{\ast}(\mathbb{R}^n)\) has a unique representation \[ \omega=\sum f_{i_1\cdots i_k}dx_{i_1}\dots dx_{i_k}, \] with \(i_1<\cdots<i_k\); when only terms of a fixed length \(k\) occur, we say \(\omega\) is a \(C^{\infty}\) \(k\)-form on \(\mathbb{R}^n\) (for simplicity we also write \(\omega=\sum f_Idx_I\)). The vector space of all \(k\)-forms will be denoted by \(\Omega^k(\mathbb{R}^n)\). Naturally \(\Omega^{\ast}(\mathbb{R}^n)\) is graded since \[ \Omega^{*}(\mathbb{R}^n)=\bigoplus_{k=0}^{n}\Omega^k(\mathbb{R}^n). \]
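The two relations are enough to multiply any two basis monomials. As a minimal sketch (the function names `wedge` and `basis` are made up for this illustration), one can encode \(dx_{i_1}\dots dx_{i_k}\) as an increasing tuple of indices and recover the sign from the number of swaps needed to sort a product.

```python
from itertools import combinations

def wedge(I, J):
    """Return (sign, K) with dx_I dx_J = sign * dx_K and K increasing.

    The sign is 0 when an index repeats, because dx_i dx_i = 0.
    """
    if set(I) & set(J):
        return 0, ()
    seq = list(I) + list(J)
    # Each inversion costs one swap dx_i dx_j = -dx_j dx_i.
    inversions = sum(
        1
        for a in range(len(seq))
        for b in range(a + 1, len(seq))
        if seq[a] > seq[b]
    )
    return (-1) ** inversions, tuple(sorted(seq))

def basis(n):
    # All increasing index tuples, including the empty one for the element 1.
    return [c for k in range(n + 1) for c in combinations(range(1, n + 1), k)]

sign_21 = wedge((2,), (1,))        # dx_2 dx_1 = -dx_1 dx_2
sign_11 = wedge((1,), (1,))        # dx_1 dx_1 = 0
sign_213 = wedge((2,), (1, 3))     # dx_2 dx_1 dx_3 = -dx_1 dx_2 dx_3
dimension = len(basis(3))          # number of basis monomials for n = 3
```

Counting the tuples shows that \(\Omega^{\ast}\) has dimension \(2^n\) as a real vector space, which is \(8\) for \(n=3\).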

The operator \(d\)

But if we have \(\omega \in \Omega^0(\mathbb{R}^n)\), we see \(\omega\) is merely a \(C^{\infty}\) function. As taught in a multivariable calculus course, for the differential of \(\omega\) we have \[ d\omega=\sum_{i}\frac{\partial\omega}{\partial x_i}dx_i \] and it turns out that \(d\omega\in\Omega^{1}(\mathbb{R}^n)\). This inspires us to generalize the differential operator \(d\): \[ \begin{aligned} d:\Omega^{k}(\mathbb{R}^n) &\to \Omega^{k+1}(\mathbb{R}^n) \\ \omega &\mapsto d\omega \end{aligned} \] where \(d\omega\) is defined as follows. The case \(k=0\) is defined as usual (just the one above). For \(k>0\) and \(\omega=\sum f_I dx_I,\) \(d\omega\) is defined 'inductively' by \[ d\omega=\sum df_I dx_I. \] This \(d\) is the so-called exterior differentiation, which serves as the ultimate abstract extension of gradient, curl, divergence, etc. If we restrict ourselves to \(\mathbb{R}^3\), we see these vector calculus tools come up in the nature of things.

Functions \[ df=\frac{\partial f}{\partial x}dx+\frac{\partial f}{\partial y}dy+\frac{\partial f}{\partial z}dz. \] \(1\)-forms \[ d(f_1dx+f_2dy+f_3dz)=\left(\frac{\partial f_3}{\partial y}-\frac{\partial f_2}{\partial z}\right)dydz-\left(\frac{\partial f_1}{\partial z}-\frac{\partial f_3}{\partial x}\right)dxdz+\left(\frac{\partial f_2}{\partial x}-\frac{\partial f_1}{\partial y}\right)dxdy. \] \(2\)-forms \[ d(f_1dydz-f_2dxdz+f_3dxdy)=\left(\frac{\partial f_1}{\partial x}+\frac{\partial f_2}{\partial y}+ \frac{\partial f_3}{\partial z}\right)dxdydz. \] The calculation is tedious but a nice exercise to understand the definition of \(d\) and \(\Omega^{\ast}\).
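The three formulas identify \(d\) on functions, \(1\)-forms and \(2\)-forms with gradient, curl and divergence respectively, so applying \(d\) twice corresponds to curl of a gradient or divergence of a curl. The sketch below checks numerically that both vanish, using central finite differences. The test functions `f` and `F`, the step `H` and the sample point are arbitrary illustrative choices.

```python
import math

H = 1e-3  # finite-difference step (an arbitrary choice)

def partial(g, i, p):
    """Central difference of the scalar function g in coordinate i at p."""
    a, b = list(p), list(p)
    a[i] += H
    b[i] -= H
    return (g(a) - g(b)) / (2 * H)

def grad(f):
    # coefficients of df in front of dx, dy, dz
    return [lambda p, i=i: partial(f, i, p) for i in range(3)]

def curl(F):
    # coefficients of d(F1 dx + F2 dy + F3 dz) in front of dydz, -dxdz, dxdy
    return [
        lambda p: partial(F[2], 1, p) - partial(F[1], 2, p),
        lambda p: partial(F[0], 2, p) - partial(F[2], 0, p),
        lambda p: partial(F[1], 0, p) - partial(F[0], 1, p),
    ]

def div(F):
    # coefficient of d(F1 dydz - F2 dxdz + F3 dxdy) in front of dxdydz
    return lambda p: sum(partial(F[i], i, p) for i in range(3))

f = lambda p: math.sin(p[0] * p[1]) + p[2] ** 2 * p[0]
F = [
    lambda p: p[1] * p[2],
    lambda p: math.cos(p[0]) * p[2],
    lambda p: p[0] * p[1] ** 2,
]

point = [0.3, -0.7, 1.1]
curl_grad = [component(point) for component in curl(grad(f))]
div_curl = div(curl(F))(point)
```

Central differences in different coordinates commute, so the discrete analogue of \(d^2=0\) holds up to rounding error here, mirroring the symmetry of mixed partial derivatives in the continuous case.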

Conservative field - on the kernel and image of \(d\)

By elementary computation we are also able to show that \(d^2\omega=0\) for all \(\omega \in \Omega^{\ast}(\mathbb{R}^n)\) (Hint: \(\frac{\partial^2 f}{\partial x_i \partial x_j}=\frac{\partial^2 f}{\partial x_j \partial x_i}\) but \(dx_idx_j=-dx_jdx_i\)). Now we consider a vector field \(\overrightarrow{v}=(v_1,v_2)\) on \(\mathbb{R}^2\). If \(C\) is an arbitrary simple closed smooth curve in \(\mathbb{R}^2\), then we may ask for \[ \oint_C\overrightarrow{v}\cdot d\overrightarrow{r}=\oint_C v_1dx+v_2dy \] to be \(0\). If this happens for every such \(C\), we say \(\overrightarrow{v}\) is a conservative field (path independent).

So when is a field conservative? It happens when there is a function \(f\) such that \[ \nabla f=\overrightarrow{v}=(v_1,v_2)=(\partial{f}/\partial{x},\partial{f}/\partial{y}). \] This is equivalent to saying that \[ df=v_1dx+v_2dy. \] If we use \(C^{\ast}\) to denote the region enclosed by \(C\), by Green's theorem, we have \[ \begin{aligned} \oint_C v_1dx+v_2dy&=\iint_{C^*}\left(\frac{\partial{v_2}}{\partial{x}}-\frac{\partial{v_1}}{\partial{y}}\right)dxdy \\ &=\iint_{C^*}d(v_1dx+v_2dy) \\ &=\iint_{C^*}d^2f \\ &=0 \end{aligned} \] If you translate what you've learned in a multivariable calculus course (path independence) into the language of differential forms, you will see that the set of all conservative fields is precisely the image of \(d_0:\Omega^0(\mathbb{R}^2) \to \Omega^1(\mathbb{R}^2)\). Also, they are in the kernel of the next map \(d_1:\Omega^1(\mathbb{R}^2) \to \Omega^2(\mathbb{R}^2)\). These \(d\)'s are homomorphisms, so it's natural to discuss the factor group. But before that, we need some terminology.
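A quick numerical experiment makes the contrast concrete. The sketch below integrates \(v_1dx+v_2dy\) around the unit circle for two fields chosen purely for illustration: the gradient of \(x^2y+\sin x\), and the rotation field \((-y,x)\), which is not a gradient. The midpoint rule and the number of nodes are arbitrary choices.

```python
import math

def circulation(v, n=2000):
    """Approximate the integral of v1 dx + v2 dy along the unit circle,
    traversed counterclockwise, with the midpoint rule in the angle."""
    total = 0.0
    for k in range(n):
        theta = 2 * math.pi * (k + 0.5) / n
        x, y = math.cos(theta), math.sin(theta)
        dx, dy = -math.sin(theta), math.cos(theta)  # derivative of (cos, sin)
        v1, v2 = v(x, y)
        total += (v1 * dx + v2 * dy) * (2 * math.pi / n)
    return total

# gradient of f(x, y) = x^2 y + sin x
grad_field = lambda x, y: (2 * x * y + math.cos(x), x * x)
# a field with dv2/dx - dv1/dy = 2, hence not a gradient
rotation = lambda x, y: (-y, x)

c_grad = circulation(grad_field)
c_rot = circulation(rotation)
```

The first value vanishes up to rounding, while the second agrees with what Green's theorem predicts, namely twice the area of the unit disc.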

de Rham complex and de Rham cohomology group

The complex \(\Omega^{\ast}(\mathbb{R}^n)\) together with \(d\) is called the de Rham complex on \(\mathbb{R}^n\). Now consider the sequence \[ \Omega^0(\mathbb{R}^n)\xrightarrow{d_0}\Omega^1(\mathbb{R}^n)\xrightarrow{d_1}\cdots\xrightarrow{d_{n-2}}\Omega^{n-1}(\mathbb{R}^n)\xrightarrow{d_{n-1}}\Omega^{n}(\mathbb{R}^n). \] We say \(\omega \in \Omega^k(\mathbb{R}^n)\) is closed if \(d_k\omega=0\), or equivalently, \(\omega \in \ker d_k\). Dually, we say \(\omega\) is exact if there exists some \(\mu \in \Omega^{k-1}(\mathbb{R}^n)\) such that \(d\mu=\omega\), that is, \(\omega \in \operatorname{im}d_{k-1}\). Of course all \(d_k\)'s can be written as \(d\) but the index makes it easier to understand. Instead of doing integration or differentiation, which is 'uninteresting', we are going to discuss the abstract structure of it.

The \(k\)-th de Rham cohomology in \(\mathbb{R}^n\) is defined to be the factor space \[ H_{DR}^{k}(\mathbb{R}^n)=\frac{\ker d_k}{\operatorname{im} d_{k-1}}. \] As an example, note that by the fundamental theorem of calculus, every \(1\)-form on \(\mathbb{R}\) is exact, therefore \(H_{DR}^1(\mathbb{R})=0\).

Since the de Rham complex is a special case of a differential complex, and the other features of the de Rham complex play no critical role hereafter, we are going to discuss the algebraic structure of differential complexes directly.

The long exact sequence of cohomology groups

We are going to show that a short exact sequence of differential complexes gives rise to a long exact sequence of cohomology groups. For convenience, let's recall some basic definitions.

Exact sequence

A sequence of vector spaces (or groups) \[ \cdots \rightarrow G_{k-1} \xrightarrow{f_{k-1}} G_k \xrightarrow{f_k} G_{k+1} \xrightarrow{f_{k+1}}\cdots \] is said to be exact if the image of \(f_{k-1}\) is the kernel of \(f_k\) for all \(k\). Sometimes we need to discuss an extremely short one, namely \[ 0 \rightarrow A \xrightarrow{f} B \xrightarrow{g} C \rightarrow 0. \] As one can see, exactness of this short sequence forces \(f\) to be injective and \(g\) to be surjective.

Differential complex

A direct sum of vector spaces \(C=\oplus_{k \in \mathbb{Z}}C^k\) is called a differential complex if there are homomorphisms \[ \cdots \rightarrow C^{k-1} \xrightarrow{d_{k-1}} C^k \xrightarrow{d_k} C^{k+1} \xrightarrow{d_{k+1}}\cdots \] such that \(d_kd_{k-1}=0\) for all \(k\). Sometimes we write \(d\) instead of \(d_{k}\) since this differential operator of \(C\) is universal. Therefore we may also say that \(d^2=0\). The cohomology of \(C\) is the direct sum of vector spaces \(H(C)=\oplus_{k \in \mathbb{Z}}H^k(C)\) where \[ H^k(C)=\frac{\ker d_{k}}{\operatorname{im}d_{k-1}}. \] A map \(f: A \to B\), where \(A\) and \(B\) are differential complexes, is called a chain map if we have \(fd_A=d_Bf\).
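For a finite-dimensional complex the definition reduces to linear algebra: \(\dim H^k=\dim\ker d_k-\operatorname{rank}d_{k-1}\). Here is a minimal sketch on a toy complex that is not from the text: \(C^0=C^1=\mathbb{R}^3\), all other \(C^k=0\), with \(d_0\) sending a function on the three vertices of a triangle to its differences along the three edges. The helper `rank` is a plain Gaussian elimination over the rationals.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix, given as a list of rows, by Gaussian elimination
    with exact rational arithmetic."""
    m = [[Fraction(v) for v in r] for r in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

# Hypothetical toy complex 0 -> C^0 -> C^1 -> 0 (d_1 = 0, so d^2 = 0 trivially).
# Row j of d0 is the j-th edge difference: v2 - v1, v3 - v2, v1 - v3.
d0 = [
    [-1, 1, 0],
    [0, -1, 1],
    [1, 0, -1],
]

dim_C0, dim_C1 = 3, 3
rk = rank(d0)
h0 = dim_C0 - rk   # dim ker d_0, since im d_{-1} = 0
h1 = dim_C1 - rk   # dim ker d_1 - dim im d_0, with ker d_1 = C^1
```

Both cohomology spaces come out one-dimensional: \(H^0\) consists of the constant functions, and \(H^1\) records the one edge combination that is not a difference of vertex values.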

The sequence

Now consider a short exact sequence of differential complexes \[ 0 \rightarrow A \xrightarrow{f} B \xrightarrow{g} C \rightarrow 0 \] where both \(f\) and \(g\) are chain maps (this is important). Then there exists a long exact sequence by \[ \cdots\rightarrow H^q(A) \xrightarrow{f^*} H^{q}(B) \xrightarrow{g^*} H^q(C)\xrightarrow{d^{*}}H^{q+1}(A) \xrightarrow{f^*}\cdots. \] Here, \(f^{\ast}\) and \(g^{\ast}\) are the naturally induced maps. For \(c \in C^q\), \(d^{\ast}[c]\) is defined to be the cohomology class \([a]\) where \(a \in A^{q+1}\), and that \(f(a)=db\), and that \(g(b)=c\). The sequence can be described using the two-layer commutative diagram below.

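[Figure: the two-layer commutative diagram, with the short exact sequence expanded degree by degree (blue) and the long exact sequence zig-zagging through it (purple).]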

The long exact sequence is actually the purple one (you see why people may call this the zig-zag lemma). This sequence is 'based on' the blue diagram, which can be considered naturally as an expansion of the short exact sequence. The method that will be used in the following proof is called diagram-chasing, whose importance has already been described by Professor James Munkres: master this. We will be abusing the properties of almost every homomorphism and group appearing in this commutative diagram to trace the elements.

Proof

First, we give a precise definition of \(d^{\ast}\). For a closed \(c \in C^q\), by the surjectivity of \(g\) (note this sequence is exact), there exists some \(b \in B^q\) such that \(g(b)=c\). Since \(g(db)=d(g(b))=dc=0\), we see that \(db \in B^{q+1}\) lies in \(\ker g\). By the exactness of the sequence, \(db \in \operatorname{im}{f}\), that is, there exists some \(a \in A^{q+1}\) such that \(f(a)=db\). Further, \(a\) is closed since \[ f(da)=d(f(a))=d^2b=0 \] and \(f\) has trivial kernel, which forces \(da=0\).

\(d^{\ast}\) is therefore defined by \[ d^*[c]=[a], \] where \([\cdot]\) means "the cohomology class of".

But it is expected that \(d^{\ast}\) is a well-defined homomorphism. Let \(c_q\) and \(c_q'\) be two closed elements of \(C^q\). To show \(d^{\ast}\) is well-defined, we suppose \([c_q]=[c_q']\) (i.e. they are cohomologous). Choose \(b_q\) and \(b_q'\) so that \(g(b_q)=c_q\) and \(g(b_q')=c_q'\). Accordingly, we also pick \(a_{q+1}\) and \(a_{q+1}'\) such that \(f(a_{q+1})=db_q\) and \(f(a_{q+1}')=db_q'\). By definition of \(d^{\ast}\), we need to show that \([a_{q+1}]=[a_{q+1}']\).

Recall the properties of factor group. \([c_q]=[c_q']\) if and only if \(c_q-c_q' \in \operatorname{im}d\). Therefore we can pick some \(c_{q-1} \in C^{q-1}\) such that \(c_q-c_q'=dc_{q-1}\). Again, by the surjectivity of \(g\), there is some \(b_{q-1}\) such that \(g(b_{q-1})=c_{q-1}\).

Note that \[ \begin{aligned} g(b_q-b_q'-db_{q-1})&=c_q-c_{q}'-g(db_{q-1}) \\ &=dc_{q-1}-d(g(b_{q-1})) \\ &=dc_{q-1}-dc_{q-1} \\ &= 0. \end{aligned} \] Therefore \(b_q-b_q'-db_{q-1} \in \operatorname{im} f\). We are able to pick some \(a_q \in A^{q}\) such that \(f(a_q)=b_q-b_q'-db_{q-1}\). But now we have \[ \begin{aligned} f(da_q)=df(a_q)&=d(b_q-b_q'-db_{q-1}) \\ &=db_q-db_q'-d^2b_{q-1} \\ &=db_q-db_q' \\ &=f(a_{q+1}-a_{q+1}'). \end{aligned} \] Since \(f\) is injective, we have \(da_q=a_{q+1}-a_{q+1}'\), which implies that \(a_{q+1}-a_{q+1}' \in \operatorname{im}d\). Hence \([a_{q+1}]=[a_{q+1}']\).

To show that \(d^{\ast}\) is a homomorphism, note that \(g(b_q+b_q')=c_q+c_q'\) and \(f(a_{q+1}+a_{q+1}')=d(b_q+b_q')\). Thus we have \[ d^*[c_q+c_q']=[a_{q+1}+a_{q+1}']. \] The latter equals \([a_{q+1}]+[a_{q+1}']\) since the canonical map is a homomorphism. Therefore we have \[ d^*[c_q+c_q']=d^*[c_q]+d^*[c_q']. \] Therefore the long sequence exists. It remains to prove exactness. First we prove exactness at \(H^q(B)\). Pick \([b] \in H^q(B)\). If \([b]=f^{\ast}[a]\) for some closed \(a \in A^q\), then \(g(f(a))=0\) by the exactness of the short sequence. Therefore \(g^{\ast}[b]=g^{\ast}[f(a)]=[g(f(a))]=[0]\); hence \(\operatorname{im}f^{\ast} \subset \ker g^{\ast}\).

Conversely, suppose now \(g^{\ast}[b]=[0]\). We shall show that there exists some \([a] \in H^q(A)\) such that \(f^{\ast}[a]=[b]\). Note \(g(b) \in \operatorname{im}d\) where \(d\) is the differential operator of \(C\) (why?). Therefore there exists some \(c_{q-1} \in C^{q-1}\) such that \(g(b)=dc_{q-1}\). Pick some \(b_{q-1}\) such that \(g(b_{q-1})=c_{q-1}\). Then we have \[ g(b-db_{q-1})=g(b)-d(g(b_{q-1}))=g(b)-dc_{q-1}=0. \]

Therefore \(f(a)=b-db_{q-1}\) for some \(a \in A^q\). Note \(a\) is closed since \[ f(da)=df(a)=d(b-db_{q-1})=db-d^2b_{q-1}=db=0 \] and \(f\) is injective. Here \(db=0\) because \(b\) represents a class in \(H^q(B)\) and is therefore closed. Furthermore, \[ f^*[a]=[f(a)]=[b-db_{q-1}]=[b]-[0]=[b]. \] Therefore \(\ker g^{\ast} \subset \operatorname{im} f^{\ast}\) as desired.

Now we prove exactness at \(H^q(C)\). (Notation:) pick \([c_q] \in H^q(C)\), there exists some \(b_q\) such that \(g(b_q)=c_q\); choose \(a_{q+1}\) such that \(f(a_{q+1})=db_q\). Then \(d^{\ast}[c_q]=[a_{q+1}]\) by definition.

If \([c_q] \in \operatorname{im}g^{\ast}\), we may choose the representative \(c_q=g(b_q)\) with \(b_q\) closed, so that \([c_q]=[g(b_q)]=g^{\ast}[b_q]\). Then \(f(a_{q+1})=db_q=0\), and \(a_{q+1}=0\) since \(f\) is injective; therefore \(d^{\ast}[c_q]=[a_{q+1}]=[0]\). Hence \(\operatorname{im}g^{\ast} \subset \ker d^{\ast}\).

Conversely, suppose \(d^{\ast}[c_q]=[0]\). By definition of \(H^{q+1}(A)\), there is some \(a_q \in A^q\) such that \(da_q = a_{q+1}\) (can you see why?). We claim that \(b_q-f(a_q)\) is closed and we have \([c_q]=g^{\ast}[b_q-f(a_q)]\).

By direct computation, \[ d(b_q-f(a_q))=db_q-d(f(a_q))=db_q-f(d(a_q))=db_q-f(a_{q+1})=0. \] Meanwhile \[ g^*[b_q-f(a_q)]=[g(b_q)]-[g(f(a_q))]=[c_q]. \] Therefore \(\ker d^{\ast} \subset \operatorname{im}g^{\ast}\). Note that \(g(f(a_q))=0\) by exactness.

Finally, we prove exactness at \(H^{q+1}(A)\). Pick \(\alpha \in H^{q+1}(A)\). If \(\alpha \in \operatorname{im}d^{\ast}\), then \(\alpha=[a_{q+1}]\) where \(f(a_{q+1})=db_q\) by definition. Then \[ f^*(\alpha)=[f(a_{q+1})]=[db_q]=[0]. \] Therefore \(\alpha \in \ker f^{\ast}\). Conversely, if we have \(f^{\ast}(\alpha)=[0]\), pick a representative of \(\alpha\), namely write \(\alpha=[a]\); then \([f(a)]=[0]\). This implies that \(f(a) \in \operatorname{im}d\) where \(d\) denotes the differential operator of \(B\), so there exists some \(b_q \in B^q\) such that \(db_{q}=f(a)\). Put \(c_q=g(b_q)\). Then \(c_q\) is closed since \(dc_q=g(db_q)=g(f(a))=0\). By definition, \(\alpha=d^{\ast}[c_q]\). Therefore \(\ker f^{\ast} \subset \operatorname{im}d^{\ast}\).

Remarks

As you may see, almost every property of the diagram has been used. The exactness at \(B^q\) ensures that \(g(f(a))=0\). The definition of \(H^q(A)\) ensures that we can simplify the meaning of \([0]\). We even use the injectivity of \(f\) and the surjectivity of \(g\).

This proof is also a demonstration of diagram-chasing technique. As you have seen, we keep running through the diagram to ensure that there is "someone waiting" at the destination.

This long exact sequence is useful. Here is an example.

Application: Mayer-Vietoris Sequence

By differential forms on an open set \(U \subset \mathbb{R}^n\), we mean \[ \Omega^*(U)=\{C^{\infty}\text{ functions on }U\}\otimes_\mathbb{R}\Omega^*. \] And the de Rham cohomology of \(U\) comes up in the nature of things.

We are able to compute the cohomology of the union of two open sets. Suppose \(M=U \cup V\) is a manifold with \(U\) and \(V\) open, and \(U \amalg V\) is the disjoint union of \(U\) and \(V\) (the coproduct in the category of sets). \(\partial_0\) and \(\partial_1\) are the inclusions of \(U \cap V\) in \(U\) and \(V\) respectively. We have a natural sequence of inclusions \[ M \leftarrow U\amalg V \underset{\partial_1}{\overset{\partial_0}{\leftleftarrows}} U \cap V. \] Since \(\Omega^{*}\) can also be treated as a contravariant functor from the category of Euclidean spaces with smooth maps to the category of commutative differential graded algebras and their homomorphisms, we have \[ \Omega^*(M) \rightarrow \Omega^*(U) \oplus \Omega^*(V) \underset{\partial^*_1}{\overset{\partial^*_0}{\rightrightarrows}}\Omega^*({U \cap V}). \] By taking the difference of the last two maps, we have \[ \begin{aligned} 0 \rightarrow \Omega^*(M) \rightarrow \Omega^*(U) \oplus \Omega^*(V) &\rightarrow \Omega^*(U \cap V) \rightarrow 0 \\ (\omega,\tau) &\mapsto \tau-\omega \end{aligned} \] The sequence above is a short exact sequence. Therefore we may use the zig-zag lemma to find a long exact sequence (which is also called the Mayer-Vietoris sequence) by \[ \cdots\to H^q(M) \to H^q(U) \oplus H^q(V) \to H^q(U \cap V) \xrightarrow{d^*} H^{q+1}(M) \to \cdots \]

An example

This sequence allows one to compute the cohomology of the union of two open sets. For example, for \(H^{*}_{DR}(\mathbb{R}^2-P-Q)\), where \(P(x_p,y_p)\) and \(Q(x_q,y_q)\) are two distinct points in \(\mathbb{R}^2\), we may write \[ (\mathbb{R}^2-P)\cap(\mathbb{R}^2-Q)=\mathbb{R}^2-P-Q \] and \[ (\mathbb{R}^2-P)\cup(\mathbb{R}^2-Q)=\mathbb{R}^2. \] Therefore we may write \(M=\mathbb{R}^2\), \(U=\mathbb{R}^2-P\) and \(V=\mathbb{R}^2-Q\). For \(U\) and \(V\), we have another decomposition \[ \mathbb{R}^2-P=(\mathbb{R}^2-P_x)\cup(\mathbb{R}^2-P_y) \] where \[ P_x=\{(x,y_p):x \in \mathbb{R}\} \] and \(P_y\) is the corresponding vertical line \(\{(x_p,y):y \in \mathbb{R}\}\). But \[ (\mathbb{R}^2-P_x)\cap(\mathbb{R}^2-P_y) \] is a disjoint union of four open quadrants, each homeomorphic to \(\mathbb{R}^2\). So things become clear after we compute \(H^{\ast}_{DR}(\mathbb{R}^2)\).
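Once the cohomology of the simpler pieces is known, the bookkeeping is a dimension count: in an exact sequence of finite-dimensional vector spaces that starts and ends with \(0\), the alternating sum of the dimensions vanishes. The sketch below applies this to \(M=\mathbb{R}^2\), \(U=\mathbb{R}^2-P\), \(V=\mathbb{R}^2-Q\). It takes as assumed inputs, not derived in the text, the standard dimensions \((1,0,0)\) for \(H^{\ast}_{DR}(\mathbb{R}^2)\) and \((1,1,0)\) for the plane minus one point, together with the connectedness of \(\mathbb{R}^2-P-Q\). The helper name `missing_dimension` is made up for this illustration.

```python
def missing_dimension(dims):
    """dims lists the dimensions along an exact sequence
    0 -> V_1 -> ... -> V_n -> 0, with exactly one entry equal to None.
    Exactness forces the alternating sum of dimensions to be zero."""
    k = dims.index(None)
    known = sum((-1) ** i * d for i, d in enumerate(dims) if d is not None)
    # (-1)^k * x + known = 0
    return -known * (-1) ** k

# Mayer-Vietoris sequence, truncated at H^2(M) = 0:
# 0 -> H^0(M) -> H^0(U)+H^0(V) -> H^0(U cap V)
#   -> H^1(M) -> H^1(U)+H^1(V) -> H^1(U cap V) -> H^2(M) = 0
dims = [
    1,     # H^0(R^2)                  (assumed known)
    2,     # H^0(U) + H^0(V)           (assumed known)
    1,     # H^0(U cap V), connected   (assumed known)
    0,     # H^1(R^2)                  (assumed known)
    2,     # H^1(U) + H^1(V)           (assumed known)
    None,  # H^1(R^2 - P - Q), the unknown
]
h1_twice_punctured = missing_dimension(dims)
```

Under these assumptions the count gives a two-dimensional \(H^1_{DR}(\mathbb{R}^2-P-Q)\), one dimension for each removed point.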

References / Further reading

  • Raoul Bott, Loring W. Tu, Differential Forms in Algebraic Topology
  • Munkres J. R., Elements of Algebraic Topology
  • Michael Spivak, Calculus on Manifolds
  • Serge Lang, Algebra

The Fourier transform of sinx/x and (sinx/x)^2 and more

In this post

We are going to evaluate the Fourier transform of \(\frac{\sin{x}}{x}\) and \(\left(\frac{\sin{x}}{x}\right)^2\). It turns out to be a comprehensive application of many elementary theorems on functions of a single complex variable. Thus it is recommended to make sure that you can evaluate and understand all the identities in this post by yourself. Also, make sure that you can recall what all the words in italics mean.

To be clear, by Fourier transform we actually mean \[ \hat{f}(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}f(x)e^{-itx}dx. \] But we omit \(\frac{1}{\sqrt{2\pi}}\) and use \(e^{itx}\) in place of \(e^{-itx}\) because it is easier to compute; since the functions considered here are even, this does not change the final result apart from the constant factor.

Problem 1

For real \(t\), find the limit by \[ \lim_{A \to \infty}\int_{-A}^{A}\frac{\sin{x}}{x}e^{itx}dx. \]

Since \(\frac{\sin{x}}{x}e^{itx}\not\in L^1\), we cannot evaluate the integral of it over \(\mathbb{R}\) directly since it's not defined. Instead, for given \(A>0\), the integral of it over \([-A,A]\) is defined, and we evaluate this limit to get what we want.

We will do this using contour integration. Since the complex function \(f(z)=\frac{\sin{z}}{z}e^{itz}\) is entire, by Cauchy's theorem, its integral over \([-A,A]\) is equal to the one over the path \(\Gamma_A\) going from \(-A\) to \(-1\) along the real axis, from \(-1\) to \(1\) along the lower half of the unit circle, and from \(1\) to \(A\) along the real axis (why?). Since the path \(\Gamma_A\) avoids the origin, we may split the integrand using the identity \[ 2i\sin{z}=e^{iz}-e^{-iz}. \] Replacing \(\sin{z}\) with \(\frac{1}{2i}(e^{iz}-e^{-iz})\), we get \[ I_A(t)=\int_{\Gamma_A}f(z)dz=\int_{\Gamma_A}\frac{1}{2iz}(e^{i(t+1)z}-e^{i(t-1)z})dz. \] If we put \(\varphi_A(t)=\int_{\Gamma_A}\frac{1}{2iz}e^{itz}dz\), we see \(I_A(t)=\varphi_A(t+1)-\varphi_A(t-1)\). It is convenient to divide \(\varphi_A\) by \(\pi\) since we then get \[ \frac{1}{\pi}\varphi_A(t)=\frac{1}{2\pi i}\int_{\Gamma_A}\frac{e^{itz}}{z}dz \] and the factor \(\frac{1}{2\pi i}\) is exactly the one appearing in the residue theorem.

Now, close the path \(\Gamma_A\) in two ways. First, by the semicircle from \(A\) to \(-Ai\) to \(-A\); second, by the semicircle from \(A\) to \(Ai\) to \(-A\), which together with \(\Gamma_A\) winds once around the origin. For simplicity we denote the two closed paths by \(\Gamma_U\) and \(\Gamma_L\) respectively. Since \(\frac{e^{itz}}{z}\) is holomorphic inside \(\Gamma_U\), Cauchy's theorem gives an integral with value \(0\) in the first case; reversing the orientation of the semicircle, we get \[ \frac{1}{\pi}\varphi_A(t)=\frac{1}{2\pi i}\int_{-\pi}^{0}\frac{\exp{(itAe^{i\theta})}}{Ae^{i\theta}}d(Ae^{i\theta})=\frac{1}{2\pi}\int_{-\pi}^{0}\exp{(itAe^{i\theta})}d\theta. \] Notice that \[ \begin{aligned} |\exp(itAe^{i\theta})|&=|\exp(itA(\cos\theta+i\sin\theta))| \\ &=|\exp(itA\cos\theta)|\cdot|\exp(-At\sin\theta)| \\ &=\exp(-At\sin\theta) \end{aligned} \]

hence if \(t\sin\theta>0\), we have \(|\exp(iAte^{i\theta})| \to 0\) as \(A \to \infty\). When \(-\pi < \theta <0\) however, we have \(\sin\theta<0\). Therefore we get \[ \frac{1}{\pi}\varphi_{A}(t)=\frac{1}{2\pi}\int_{-\pi}^{0}\exp(itAe^{i\theta})d\theta \to 0\quad (A \to \infty,t<0). \] (You should be able to prove the convergence above.) Also trivially \[ \varphi_A(0)=\frac{1}{2}\int_{-\pi}^{0}1d\theta=\frac{\pi}{2}. \] But what if \(t>0\)? Indeed, it would be difficult to obtain the limit using the integral over \([-\pi,0]\). But we have another path, namely the upper one.

Note that \(\frac{e^{itz}}{z}\) is a meromorphic function in \(\mathbb{C}\) with a pole at \(0\). Its Laurent expansion is \[ \frac{e^{itz}}{z}=\frac{1}{z}\left(1+itz+\frac{(itz)^2}{2!}+\cdots\right)=\frac{1}{z}+it+\frac{(it)^2z}{2!}+\cdots, \] which implies that the residue at \(0\) is \(1\). By the residue theorem, \[ \begin{aligned} \frac{1}{2\pi{i}}\int_{\Gamma_L}\frac{e^{itz}}{z}dz&=\frac{1}{2\pi{i}}\int_{\Gamma_A}\frac{e^{itz}}{z}dz+\frac{1}{2\pi}\int_{0}^{\pi}\exp(itAe^{i\theta})d\theta \\ &=1\cdot\operatorname{Ind}_{\Gamma_L}(0)=1. \end{aligned} \] Note that we have used the change-of-variable formula as we did for \(\Gamma_U\). \(\operatorname{Ind}_{\Gamma_L}(0)\) denotes the winding number of \(\Gamma_L\) around \(0\), which is \(1\) of course. The identity above implies \[ \frac{1}{\pi}\varphi_A(t)=1-\frac{1}{2\pi}\int_{0}^{\pi}\exp{(itAe^{i\theta})}d\theta. \] Thus if \(t>0\), since \(\sin\theta>0\) when \(0<\theta<\pi\), we get \[ \frac{1}{\pi}\varphi_A(t)\to 1 \quad(A \to \infty,t>0). \] But as is already shown, \(I_A(t)=\varphi_A(t+1)-\varphi_A(t-1)\), where \(\varphi_A\) tends to \(\pi\) for \(t>0\), to \(0\) for \(t<0\), and \(\varphi_A(0)=\frac{\pi}{2}\). To conclude, \[ \lim_{A\to\infty}I_A(t)= \begin{cases} \pi\quad &|t|<1, \\ 0 \quad &|t|>1 ,\\ \frac{\pi}{2} \quad &|t|=1. \end{cases} \]
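The limit can be sanity-checked numerically. The sketch below approximates \(I_A(t)\) for \(A=1000\) with a composite Simpson rule; the cutoff, the number of subintervals and the helper names are arbitrary choices. It compares the result with what \(I_A(t)=\varphi_A(t+1)-\varphi_A(t-1)\) predicts from the limits \(\pi\) and \(0\) of \(\varphi_A\) and the value \(\varphi_A(0)=\frac{\pi}{2}\). Since the integral converges only conditionally, the agreement is of order \(1/A\), not better.

```python
import math

def simpson(g, a, b, n):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    step = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * step)
    return total * step / 3

def I_A(t, A=1000.0, n=200000):
    """Approximate the integral of (sin x / x) e^{itx} over [-A, A].
    The imaginary part vanishes by symmetry, so only cos(tx) is kept."""
    def g(x):
        sinc = 1.0 if x == 0.0 else math.sin(x) / x
        return sinc * math.cos(t * x)
    return simpson(g, -A, A, n)

inside = I_A(0.5)     # expected to be close to pi
boundary = I_A(1.0)   # expected to be close to pi / 2
outside = I_A(2.0)    # expected to be close to 0
```

Increasing `A` shrinks the discrepancy slowly, which is the numerical face of the fact that the integrand is not in \(L^1\).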

What we can learn from this integral

Since \(\psi(x)=\frac{\sin{x}}{x}\) is even, multiplying the limit of \(I_A\) by \(\sqrt{\frac{1}{2\pi}}\), we actually obtain its Fourier transform, by abuse of language. Therefore we also get \[ \hat\psi(t)= \begin{cases} \sqrt{\frac{\pi}{2}}\quad & |t|<1, \\ 0 \quad & |t|>1, \\ \frac{1}{2}\sqrt{\frac{\pi}{2}} & |t|=1. \end{cases} \] Note that \(\hat\psi(t)\) is not continuous, let alone uniformly continuous. Therefore \(\psi \notin L^1\): if \(f \in L^1\), then \(\hat{f}\) is uniformly continuous. Another interesting fact is that this also implies the value of the Dirichlet integral since we have \[ \begin{aligned} \int_{-\infty}^{\infty}\left(\frac{\sin{x}}{x}\right)dx&=\int_{-\infty}^{\infty}\left(\frac{\sin{x}}{x}\right)e^{0\cdot ix}dx \\ &=\sqrt{2\pi}\hat\psi(0) \\ &=\pi. \end{aligned} \] We end this section by evaluating the inverse of \(\hat\psi(t)\). This requires a simple calculation. \[ \begin{aligned} \sqrt{\frac{1}{2\pi}}\int_{-\infty}^{\infty}\hat\psi(t)e^{itx}dt &= \sqrt{\frac{1}{2\pi}}\int_{-1}^{1}\sqrt{\frac{\pi}{2}}e^{itx}dt \\ &=\frac{1}{2}\cdot\frac{1}{ix}(e^{ix}-e^{-ix}) \\ &=\frac{\sin{x}}{x}. \end{aligned} \]

Problem 2

For real \(t\), compute \[ J=\int_{-\infty}^{\infty}\left(\frac{\sin{x}}{x}\right)^2e^{itx}dx. \]

Now since \(h(x)=\frac{\sin^2{x}}{x^2} \in L^1\), we are able to say with ease that the integral above is the Fourier transform of \(h(x)\) (multiplied by \(\sqrt{2\pi}\)). But still we will be using the limit form \[ J(t)=\lim_{A \to \infty}J_A(t) \] where \[ J_A(t)=\int_{-A}^{A}\left(\frac{\sin{x}}{x}\right)^2e^{itx}dx. \] And we are still using the contour integration as above (keep \(\Gamma_A\), \(\Gamma_U\) and \(\Gamma_L\) in mind!). For this we get \[ \left(\frac{\sin z}{z}\right)^2e^{itz}=\frac{e^{i(t+2)z}+e^{i(t-2)z}-2e^{itz}}{-4z^2}. \] Therefore it suffices to discuss the function \[ \mu_A(t)=\int_{\Gamma_A}\frac{e^{itz}}{2z^2}dz \] since we have \[ J_A(t)=\mu_A(t)-\frac{1}{2}(\mu_A(t+2)+\mu_A(t-2)). \] Dividing \(\mu_A(t)\) by \(\pi i\), we see \[ \frac{1}{\pi i}\mu_A(t)=\frac{1}{2\pi i}\int_{\Gamma_A}\frac{e^{itz}}{z^2}dz. \] An integration of \(\frac{e^{itz}}{z^2}\) over \(\Gamma_U\) (closed through the lower half-plane, so that the origin is not enclosed) gives \[ \begin{aligned} \frac{1}{\pi i}\mu_A(t)&=\frac{1}{2\pi i}\int_{-\pi}^{0}\frac{\exp(itAe^{i\theta})}{A^2e^{2i\theta}}d(Ae^{i\theta}) \\ &=\frac{1}{2\pi}\int_{-\pi}^{0}\frac{\exp(itAe^{i\theta})}{Ae^{i\theta}}d\theta. \end{aligned} \] Since we still have \[ \left|\frac{\exp(itAe^{i\theta})}{Ae^{i\theta}}\right|=\frac{1}{A}\exp(-At\sin\theta), \] if \(t<0\) in this case, \(\frac{1}{\pi i}\mu_A(t) \to 0\) as \(A \to \infty\). For \(t>0\), integrating along \(\Gamma_L\) (closed through the upper half-plane), where the residue of \(\frac{e^{itz}}{z^2}\) at \(0\) is \(it\), we have \[ \frac{1}{\pi i}\mu_A(t)=it-\frac{1}{2\pi}\int_{0}^{\pi}\frac{\exp(itAe^{i\theta})}{Ae^{i\theta}}d\theta \to it \quad (A \to \infty). \] We can also evaluate \(\mu_A(0)\) by computing the integral but we are not doing that. To conclude, we have \[ \lim_{A \to\infty}\mu_A(t)=\begin{cases} -\pi t \quad &t>0, \\ 0 \quad &t<0. \end{cases} \] Therefore for \(J_A\) we have \[ J(t)=\lim_{A \to\infty}J_A(t)=\begin{cases} 0 \quad &|t| \geq 2, \\ \pi(1+\frac{t}{2}) \quad &-2<t \leq 0, \\ \pi(1-\frac{t}{2}) \quad & 0<t <2.
\end{cases} \] Now you may ask: how did we find the values at \(0\), \(2\) or \(-2\), given that \(\mu_A(0)\) is not evaluated? Since \(h \in L^1\), \(\hat{h}(t)=\sqrt{\frac{1}{2\pi}}J(t)\) is uniformly continuous, thus continuous, and the values at these points follow from continuity.

What we can learn from this integral

Again, we get the value of a classic improper integral by \[ \int_{-\infty}^{\infty}\left(\frac{\sin{x}}{x}\right)^2dx = J(0)=\pi. \] And this time it's not hard to find the Fourier inverse: \[ \begin{aligned} \sqrt{\frac{1}{2\pi}}\int_{-\infty}^{\infty}\hat{h}(t)e^{itx}dt&=\frac{1}{2\pi}\int_{-\infty}^{\infty}J(t)e^{itx}dt \\ &=\frac{1}{2\pi}\int_{-2}^{2}\pi(1-\frac{1}{2}|t|)e^{itx}dt \\ &=\frac{e^{2ix}+e^{-2ix}-2}{-4x^2} \\ &=\frac{(e^{ix}-e^{-ix})^2}{-4x^2} \\ &=\left(\frac{\sin{x}}{x}\right)^2. \end{aligned} \]
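As a sanity check on the inversion formula above, here is a minimal numerical sketch in Python. It is not part of the derivation: the helper names `simpson`, `inverse_transform` and `sinc_squared` are made up for this illustration, and the integral is only approximated with a composite Simpson rule. By symmetry the imaginary part of the integrand cancels, so the sketch integrates \((1-\frac{t}{2})\cos(tx)\) over \([0,2]\).

```python
import math


def triangle(t):
    """J(t)/pi from the text: 1 - |t|/2 on [-2, 2], zero elsewhere."""
    return max(0.0, 1.0 - abs(t) / 2.0)


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0


def inverse_transform(x):
    """Approximate (1/2pi) * int_{-2}^{2} J(t) e^{itx} dt.

    The integrand's real part is even and its imaginary part is odd,
    so the integral reduces to int_0^2 (1 - t/2) cos(tx) dt.
    """
    return simpson(lambda t: triangle(t) * math.cos(t * x), 0.0, 2.0)


def sinc_squared(x):
    """(sin x / x)^2, extended by continuity to x = 0."""
    return 1.0 if x == 0 else (math.sin(x) / x) ** 2
```

For a few sample points, `inverse_transform(x)` and `sinc_squared(x)` should agree up to the quadrature error, which is what the closed-form computation predicts.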

Thereafter you are able to evaluate the improper integral of \(\left(\frac{\sin{x}}{x}\right)^n\) in the same way. Using Fubini's or Tonelli's theorem is not a good idea. But using the contour integral as above will force you to deal with the binomial coefficients in the expansion of \((e^{iz}-e^{-iz})^n\), which might still be tedious. It's even possible to discuss the convergence of the sequence \((I_n)\) where \[ I_n(t)=\lim_{A \to \infty}\int_{-A}^{A}\left(\frac{\sin{x}}{x}\right)^ne^{itx}dx. \]

The Riesz-Markov-Kakutani Representation Theorem

This post

This post is intended to lay the groundwork for establishing the existence of the Lebesgue measure, often denoted by \(m\), in a later post. In fact, the Lebesgue measure follows as a special case of the R-M-K representation theorem. You may not believe it, but the Euclidean properties of \(\mathbb{R}^k\) play no role in the existence of \(m\). The only topological property that matters is the fact that \(\mathbb{R}^k\) is a locally compact Hausdorff space.

The theorem is named after F. Riesz, who introduced it for continuous functions on \([0,1]\) (with respect to the Riemann-Stieltjes integral). Years later, after the generalization done by A. Markov and S. Kakutani, we are able to view it in a locally compact Hausdorff space.

You may find some of the properties here more general than strictly necessary, but this is intended to let you enjoy more along the way (some of the tools are related to differential geometry). There are also many topology and analysis tricks worth your attention.

Tools

Different kinds of topological spaces

Again, euclidean topology plays no role in this proof. We need to specify the topology for different reasons. This is similar to what we do in linear functional analysis. Throughout, let \(X\) be a topological space.

0.0 Definition. \(X\) is a Hausdorff space if the following is true: If \(p \in X\), \(q\in X\) but \(p \neq q\), then there are two disjoint open sets \(U\) and \(V\) such that \(p \in U\) and \(q \in V\).

0.1 Definition. \(X\) is locally compact if every point of \(X\) has a neighborhood whose closure is compact.

0.2 Remarks. A Hausdorff space is also called a \(T_2\) space (see Kolmogorov classification) or a separated space. There is a classic example of a locally compact Hausdorff space: \(\mathbb{R}^n\). It is trivial to verify this. But this is far from being enough. In the future we will see that we can construct some ridiculous but mathematically valid measures.

0.3 Definition. A set \(E \subset X\) is called \(\sigma\)-compact if \(E\) is a countable union of compact sets. Note that every open subset of a Euclidean space \(\mathbb{R}^n\) is \(\sigma\)-compact since it can always be written as a countable union of closed balls (which are compact).

0.4 Definition. A covering of \(X\) is locally finite if every point has a neighborhood which intersects only finitely many elements of the covering. Of course, if the covering is already finite, it's also locally finite.

0.5 Definition. A refinement of a covering of \(X\) is a second covering, each element of which is contained in an element of the first covering.

0.6 Definition. \(X\) is paracompact if it is Hausdorff, and every open covering has a locally finite open refinement. Obviously any compact space is paracompact.

0.7 Theorem. If \(X\) is a second countable Hausdorff space and is locally compact, then \(X\) is paracompact. For proof, see this [Theorem 2.6].

0.8 Theorem. If \(X\) is locally compact and \(\sigma\)-compact, then \(X=\bigcup_{i=1}^{\infty}K_i\) where for all \(i \in \mathbb{N}\), \(K_i\) is compact and \(K_i \subset\operatorname{int}K_{i+1}\).

Partition of unity

The basic technical tool in the theory of differential manifolds is the existence of a partition of unity. We will steal this tool for the application of analysis theory.

1.0 Definition. A partition of unity on \(X\) is a collection \((g_i)\) of continuous real valued functions on \(X\) such that

  1. \(g_i \geq 0\) for each \(i\).
  2. every \(x \in X\) has a neighborhood \(U\) such that \(U \cap \operatorname{supp}(g_i)=\varnothing\) for all but finitely many of \(g_i\).
  3. for each \(x \in X\), we have \(\sum_{i}g_i(x)=1\). (That's why you see the word 'unity'.)
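To make Definition 1.0 concrete, here is a small Python sketch on \(X=\mathbb{R}\). The tent functions \(g_i(x)=\max(0,1-|x-i|)\), \(i \in \mathbb{Z}\), are an example chosen for this illustration (they are not taken from the text), and the helper names are hypothetical. Each \(g_i\) is continuous and nonnegative with support \([i-1,i+1]\), every point has a neighborhood meeting only finitely many supports, and the values sum to \(1\) at every point.

```python
import math


def tent(i, x):
    """g_i(x) = max(0, 1 - |x - i|): continuous, >= 0, supported on [i-1, i+1]."""
    return max(0.0, 1.0 - abs(x - i))


def active_indices(x):
    """Indices i for which g_i(x) can be nonzero (at most two of them)."""
    lo = math.floor(x)
    return [lo, lo + 1]


def total(x):
    """Sum of all g_i at x; only the active indices contribute."""
    return sum(tent(i, x) for i in active_indices(x))
```

For example, at \(x=0.3\) only \(g_0\) and \(g_1\) are nonzero, with values \(0.7\) and \(0.3\).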

1.1 Definition. A partition of unity \((g_i)\) on \(X\) is subordinate to an open cover of \(X\) if and only if for each \(g_i\) there is an element \(U\) of the cover such that \(\operatorname{supp}(g_i) \subset U\). We say \(X\) admits partitions of unity if and only if for every open cover of \(X\), there exists a partition of unity subordinate to the cover.

1.2 Theorem. A Hausdorff space admits partitions of unity if and only if it is paracompact (the 'only if' part follows by considering the definition of partition of unity. For the 'if' part, see here). As a corollary, we have:

1.3 Corollary. Suppose \(V_1,\cdots,V_n\) are open subsets of a locally compact Hausdorff space \(X\), \(K\) is compact, and \[ K \subset \bigcup_{k=1}^{n}V_k. \] Then there exists a partition of unity \((h_i)\) that is subordinate to the cover \((V_k)\) such that \(\operatorname{supp}(h_i) \subset V_i\) for each \(i\) and \(\sum_{i=1}^{n}h_i(x)=1\) for all \(x \in K\).

Urysohn's lemma (for locally compact Hausdorff spaces)

2.0 Notation. The notation \[ K \prec f \] will mean that \(K\) is a compact subset of \(X\), that \(f \in C_c(X)\), that \(f(X) \subset [0,1]\), and that \(f(x)=1\) for all \(x \in K\). The notation \[ f \prec V \] will mean that \(V\) is open, that \(f \in C_c(X)\), that \(f(X) \subset [0,1]\) and that \(\operatorname{supp}(f) \subset V\). If both hold, we write \[ K \prec f \prec V. \]

2.1 Remarks. Clearly, with this notation, we are able to simplify the statement of being subordinate. We merely need to write \(g_i \prec U\) in 1.1 instead of \(\operatorname{supp}(g_i) \subset U\).

2.2 Urysohn's Lemma for locally compact Hausdorff spaces. Suppose \(X\) is locally compact and Hausdorff, \(V\) is open in \(X\) and \(K \subset V\) is a compact set. Then there exists an \(f \in C_c(X)\) such that \[ K \prec f \prec V. \]

2.3 Remarks. By \(f \in C_c(X)\) we shall mean \(f\) is a continuous function with compact support. This relation also says that \(\chi_K \leq f \leq \chi_V\). For more details and the proof, visit this page. This lemma is usually stated for normal spaces; for a proof on that level, see arXiv:1910.10381. (Question: why do we consider two disjoint closed subsets there?)
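On the real line, a function as in Urysohn's lemma can be written down by hand. The following Python sketch assumes \(X=\mathbb{R}\), \(K=[a,b]\) and \(V=(a-\delta,b+\delta)\); this choice and the names `dist_to_interval` and `urysohn` are illustrative assumptions, not part of the lemma or its proof.

```python
def dist_to_interval(x, a, b):
    """Distance from x to the compact interval K = [a, b]."""
    if x < a:
        return a - x
    if x > b:
        return x - b
    return 0.0


def urysohn(x, a, b, delta):
    """A function f with K prec f prec V for K = [a, b], V = (a - delta, b + delta).

    f equals 1 on K and vanishes once x is at distance >= delta/2 from K,
    so supp(f) = [a - delta/2, b + delta/2] is a compact subset of V.
    """
    return max(0.0, 1.0 - 2.0 * dist_to_interval(x, a, b) / delta)
```

The function is continuous because the distance to \(K\) is continuous, and it takes values in \([0,1]\) by construction.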

The \(\varepsilon\)-definitions of \(\sup\) and \(\inf\)

We will be using the \(\varepsilon\)-definitions of \(\sup\) and \(\inf\), which make the proof easier in this case but can be troublesome if you have not seen them before. So we put them down here.

Let \(S\) be a nonempty subset of the real numbers that is bounded below. A lower bound \(w\) of \(S\) is the infimum of \(S\) if and only if for any \(\varepsilon>0\), there exists an element \(x_\varepsilon \in S\) such that \(x_\varepsilon<w+\varepsilon\).

This definition of \(\inf\) is equivalent to the following if-then definition:

Let \(S\) be a set that is bounded below. We say \(w=\inf S\) when \(w\) satisfies the following condition.

  1. \(w\) is a lower bound of \(S\).
  2. If \(t\) is also a lower bound of \(S\), then \(t \leq w\).

We have the analogous definition for \(\sup\).
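The \(\varepsilon\)-definition can be tried on a toy set. The Python sketch below uses \(S=\{1/n : n=1,2,\cdots\}\), whose infimum is \(w=0\); the set and the function name `witness` are assumptions made for this illustration only.

```python
import math


def witness(eps):
    """Return an element x_eps of S = {1/n : n >= 1} with x_eps < 0 + eps.

    Any integer n > 1/eps works; the extra +1 guards against
    floating-point rounding in 1/eps.
    """
    n = math.floor(1.0 / eps) + 2
    return 1.0 / n
```

Since \(0\) is a lower bound of \(S\) and a witness exists for every \(\varepsilon>0\), the \(\varepsilon\)-definition confirms \(\inf S=0\).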

The main theorem

Analysis is full of vector spaces and linear transformations. We already know that the Lebesgue integral induces a linear functional. That is, for example, \(L^1([0,1])\) is a vector space, and we have a linear functional by \[ f \mapsto \int_0^1 f(x)dx. \] But what about the reverse? Given a linear functional, is it guaranteed that we have a measure to establish the integral? The R-M-K theorem answers this question affirmatively. The functional to be discussed is positive, which means that if \(\Lambda\) is positive and \(f(X) \subset [0,\infty)\), then \(\Lambda{f} \in [0,\infty)\).

Let \(X\) be a locally compact Hausdorff space, and let \(\Lambda\) be a positive linear functional on \(C_c(X)\). Then there exists a \(\sigma\)-algebra \(\mathfrak{M}\) on \(X\) which contains all Borel sets in \(X\), and there exists a unique positive measure \(\mu\) on \(\mathfrak{M}\) which represents \(\Lambda\) in the sense that \[ \Lambda{f}=\int_X fd\mu \] for all \(f \in C_c(X)\).

For the measure \(\mu\) and the \(\sigma\)-algebra \(\mathfrak{M}\), we have four assertions:

  1. \(\mu(K)<\infty\) for every compact set \(K \subset X\).
  2. For every \(E \in \mathfrak{M}\), we have

\[ \mu(E)=\inf\{\mu(V):E \subset V, V\text{ open}\}. \]

  3. For every open set \(E\) and every \(E \in \mathfrak{M}\) with \(\mu(E)<\infty\), we have

\[ \mu(E)=\sup\{\mu(K):K \subset E, K\text{ compact}\}. \]

  4. If \(E \in \mathfrak{M}\), \(A \subset E\), and \(\mu(E)=0\), then \(A \in \mathfrak{M}\).

Remarks before proof. It would be great if we could establish the Lebesgue measure \(m\) simply by putting \(X=\mathbb{R}^n\). But a little extra work is needed to get this result naturally. If assertion 2 is satisfied, we say \(\mu\) is outer regular; if assertion 3 is satisfied, we say \(\mu\) is inner regular. If both hold, we say \(\mu\) is regular. The partition of unity and Urysohn's lemma will be heavily used in the proof of the main theorem, so make sure you have no problem with them.

Proving the theorem

The proof is rather long so we will split it into several steps. I will try my best to make every line clear enough.

Step 0 - Construction of \(\mu\) and \(\mathfrak{M}\)

For every open set \(V \subset X\), define \[ \mu(V)=\sup\{\Lambda{f}:f \prec V\}. \]

If \(V_1 \subset V_2\) and both are open, we claim that \(\mu(V_1) \leq \mu(V_2)\). For \(f \prec V_1\), since \(\operatorname{supp}f \subset V_1 \subset V_2\), we see \(f \prec V_2\), so \(\Lambda{f} \leq \mu(V_2)\) already follows from the definition. We are even able to find some \(g \prec V_2\) such that \(g \geq f\). By taking another look at the proof of Urysohn's lemma for locally compact Hausdorff spaces, we see there is an open set \(G\) with compact closure such that \[ \operatorname{supp}(f) \subset G \subset \overline{G} \subset V_2. \] Applying Urysohn's lemma to the pair \((\overline{G},V_2)\), we see there exists a function \(g \in C_c(X)\) such that \[ \overline{G} \prec g \prec V_2. \] Therefore \(g=1\) on \(\operatorname{supp}(f)\) and \[ \operatorname{supp}(f) \subset \overline{G} \subset \operatorname{supp}(g). \] Since \(0 \leq f \leq 1\), this gives \(g \geq f\), and we have \(\Lambda{g} \geq \Lambda{f}\) (monotonic) since \(\Lambda{g}-\Lambda{f}=\Lambda{(g-f)}\geq 0\). Thus \(\Lambda{f} \leq \Lambda{g} \leq \mu(V_2)\) for every \(f \prec V_1\). By taking the supremum over \(f\), we see \[ \mu(V_1) \leq \mu(V_2). \] The 'monotonic' property of such \(\mu\) enables us to define \(\mu(E)\) for all \(E \subset X\) by \[ \mu(E)=\inf \{\mu(V):E \subset V, V\text{ open}\}. \] The definition above trivially agrees with the first one on open sets. Sometimes people say \(\mu\) is the outer measure. We will discuss other kinds of sets thoroughly in the following steps. Warning: we are not saying that \(\mathfrak{M} = 2^X\). The crucial property of \(\mu\), namely countable additivity, will be proved only on a certain \(\sigma\)-algebra.

It follows from the definition of \(\mu\) that if \(E_1 \subset E_2\), then \(\mu(E_1) \leq \mu(E_2)\).
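To get a feeling for the definition \(\mu(V)=\sup\{\Lambda{f}:f \prec V\}\), consider a toy case that is assumed here purely for illustration: \(X=\mathbb{R}\), \(\Lambda{f}=\int f(x)dx\) (the Riemann integral) and \(V=(0,1)\). The Python sketch below builds plateau functions \(f_\delta \prec V\) (a hypothetical family, with hypothetical helper names) and approximates \(\Lambda{f_\delta}\) by the midpoint rule. The exact value is \(1-3\delta\), so the values creep up towards \(1\), the length of the interval.

```python
def plateau(x, delta):
    """A function f_delta prec (0, 1), for 0 < delta < 1/4.

    It equals 1 on [2*delta, 1 - 2*delta], is linear on the two ramps,
    and is zero outside [delta, 1 - delta].
    """
    ramp_up = (x - delta) / delta
    ramp_down = (1.0 - delta - x) / delta
    return max(0.0, min(1.0, ramp_up, ramp_down))


def riemann_functional(f, n=100_000):
    """Lambda(f) = int_0^1 f(x) dx, approximated by the midpoint rule."""
    h = 1.0 / n
    return h * sum(f((k + 0.5) * h) for k in range(n))


def lower_bounds(deltas):
    """Values Lambda(f_delta); each one is a lower bound for mu((0, 1))."""
    return [riemann_functional(lambda x, d=d: plateau(x, d)) for d in deltas]
```

Every \(f \prec (0,1)\) satisfies \(f \leq \chi_{(0,1)}\), so no admissible function can give more than \(1\); in this toy case the supremum is the length of the interval.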

Let \(\mathfrak{M}_F\) be the class of all \(E \subset X\) which satisfy the two following conditions:

  1. \(\mu(E) <\infty\).

  2. 'Inner regular': \[ \mu(E)=\sup\{\mu(K):K \subset E, K\text{ compact}\}. \]

One may say here \(\mu\) is the 'inner measure'. Finally, let \(\mathfrak{M}\) be the class of all \(E \subset X\) such that for every compact \(K\), we have \(E \cap K \in \mathfrak{M}_F\). We shall show that \(\mathfrak{M}\) is the desired \(\sigma\)-algebra.


Remarks of Step 0. So far, we have only proved that \(\mu(E) \geq 0\) for all \(E {\color{red}{\subset}}X\). What about the countable additivity? It's clear that \(\mathfrak{M}_F\) and \(\mathfrak{M}\) have some strong relation. We need to get a clearer view of it. Also, if we restrict \(\mu\) to \(\mathfrak{M}_F\), we restrict ourselves to finite numbers. In fact, we will finally show that \(\mathfrak{M}_F \subset \mathfrak{M}\).

Step 1 - The 'measure' of compact sets (outer)

If \(K\) is compact, then \(K \in \mathfrak{M}_F\), and \[ \mu(K)=\inf\{\Lambda{f}:K \prec f\}<\infty \]

Define \(V_\alpha=f^{-1}((\alpha,1])\) for \(K \prec f\) and \(0 < \alpha < 1\). Then \(V_\alpha\) is open, and since \(f(x)=1\) for all \(x \in K\), we have \(K \subset V_{\alpha}\). Therefore by the definition of \(\mu\) for arbitrary \(E \subset X\), we have \[ \mu(K) \leq \mu(V_\alpha)=\sup\{\Lambda{g}:g \prec V_{\alpha}\} \leq \frac{1}{\alpha}\Lambda{f}. \] Note that \(f \geq \alpha{g}\) whenever \(g \prec V_{\alpha}\): on \(V_\alpha\) we have \(\alpha{g} \leq \alpha < f\), and outside \(V_\alpha\) we have \(g=0 \leq f\). Since \(\mu(K)\) is a lower bound of \(\frac{1}{\alpha}\Lambda{f}\) for \(0<\alpha<1\), we see \[ \mu(K) \leq \inf_{\alpha \in (0,1)}\frac{1}{\alpha}\Lambda{f}=\Lambda{f}. \] Since \(\Lambda\) is a real-valued functional on \(C_c(X)\), \(\Lambda{f}\) is finite. Namely \(\mu(K) <\infty\). Since \(K\) is itself a compact subset of \(K\), the supremum in the definition of \(\mathfrak{M}_F\) is attained at \(K\), and we see \(K \in \mathfrak{M}_F\).

To prove the identity, fix \(\varepsilon>0\). By the definition of \(\mu(K)\), there exists some open \(V \supset K\) such that \(\mu(V)<\mu(K)+\varepsilon\). By Urysohn's lemma, there exists some \(h \in C_c(X)\) such that \(K \prec h \prec V\). Therefore \[ \Lambda{h} \leq \mu(V) < \mu(K)+\varepsilon. \] Together with \(\mu(K) \leq \Lambda{f}\) for every \(K \prec f\), this shows that \(\mu(K)\) is the infimum of \(\Lambda{h}\) with \(K \prec h\).


Remarks of Step 1. We have just proved assertion 1 of the theorem. The hardest part of this proof is the inequality \[ \mu(V)<\mu(K)+\varepsilon. \] But this is merely the \(\varepsilon\)-definition of \(\inf\). Note that \(\mu(K)\) is the infimum of \(\mu(V)\) over open \(V \supset K\). For any \(\varepsilon>0\), there exists some open \(V\) with what property? Under certain conditions, this definition is much easier to use. Now we will examine the relation between \(\mathfrak{M}_F\) and \(\tau_X\), namely the topology of \(X\).

Step 2 - The 'measure' of open sets (inner)

\(\mathfrak{M}_F\) contains every open set \(V\) with \(\mu(V)<\infty\).

It suffices to show that for an open set \(V\), we have \[ \mu(V)=\sup\{\mu(K):K \subset V, K\text{ compact}\}. \] For \(0<\varepsilon<\mu(V)\), we see there exists an \(f \prec V\) such that \(\Lambda{f}>\mu(V)-\varepsilon\). If \(W\) is any open set which contains \(K= \operatorname{supp}(f)\), then \(f \prec W\), and therefore \(\Lambda{f} \leq \mu(W)\). Again by the definition of \(\mu(K)\) as an infimum over such \(W\), we see \[ \Lambda{f}\leq\mu(K). \] Therefore, since \(K \subset V\), \[ \mu(V)-\varepsilon<\Lambda{f}\leq\mu(K)\leq\mu(V). \] This is exactly the \(\varepsilon\)-definition of \(\sup\). The identity is proved.


Remarks of Step 2. It's important to note that this identity is only guaranteed for open sets and for sets \(E \in \mathfrak{M}\) with \(\mu(E)<\infty\), the latter of which will be proved in the following steps. This is the flaw of this theorem. With these preparations however, we are able to show the countable additivity of \(\mu\) on \(\mathfrak{M}_F\).

Step 3 - The subadditivity of \(\mu\) on \(2^X\)

If \(E_1,E_2,E_3,\cdots\) are arbitrary subsets of \(X\), then \[ \mu\left(\bigcup_{k=1}^{\infty}E_k\right) \leq \sum_{k=1}^{\infty}\mu(E_k) \]

First we show this holds for finitely many open sets. This is tantamount to showing that \[ \mu(V_1 \cup V_2)\leq \mu(V_1)+\mu(V_2) \] if \(V_1\) and \(V_2\) are open. Pick \(g \prec V_1 \cup V_2\). This is possible due to Urysohn's lemma. By Corollary 1.3, applied to the compact set \(\operatorname{supp}(g)\), there is a partition of unity \((h_1,h_2)\) subordinate to \((V_1,V_2)\) in the sense of Corollary 1.3, so that \(h_1+h_2=1\) on \(\operatorname{supp}(g)\). Therefore, \[ \begin{aligned} \Lambda(g)&=\Lambda((h_1+h_2)g) \\ &=\Lambda(h_1g)+\Lambda(h_2g) \\ &\leq\mu(V_1)+\mu(V_2). \end{aligned} \] Notice that \(h_1g \prec V_1\) and \(h_2g \prec V_2\). By taking the supremum over \(g\), we have \[ \mu(V_1 \cup V_2)\leq \mu(V_1)+\mu(V_2). \]

Now we go back to arbitrary subsets of \(X\). If \(\mu(E_i)=\infty\) for some \(i\), then there is nothing to prove. Therefore we shall assume that \(\mu(E_i)<\infty\) for all \(i\). Fix \(\varepsilon>0\). By definition of \(\mu(E_i)\), we see there are open sets \(V_i \supset E_i\) such that \[ \mu(V_i)<\mu(E_i)+\frac{\varepsilon}{2^i}. \] Put \(V=\bigcup_{i=1}^{\infty}V_i\), and choose \(f \prec V\). Since \(f \in C_c(X)\), there is a finite collection of the \(V_i\) that covers the support of \(f\). Therefore without loss of generality, we may say that \[ f \prec V_1 \cup V_2 \cup \cdots \cup V_n \] for some \(n\). We therefore obtain \[ \begin{aligned} \Lambda{f} &\leq \mu(V_1 \cup V_2 \cup \cdots \cup V_n) \\ &\leq \mu(V_1)+\mu(V_2)+\cdots+\mu(V_n) \\ &\leq \sum_{i=1}^{n}\left(\mu(E_i)+\frac{\varepsilon}{2^i}\right) \\ &\leq \sum_{i=1}^{\infty}\mu(E_i)+\varepsilon, \end{aligned} \] for all \(f \prec V\). Since \(\bigcup E_i \subset V\), we have \(\mu(\bigcup E_i) \leq \mu(V)\). Therefore \[ \mu\left(\bigcup_{i=1}^{\infty}E_i\right)\leq\mu(V)=\sup\{\Lambda{f}:f \prec V\}\leq\sum_{i=1}^{\infty}\mu(E_i)+\varepsilon. \] Since \(\varepsilon\) is arbitrary, the inequality is proved.


Remarks of Step 3. Again, we are using the \(\varepsilon\)-definition of \(\inf\). One may say this step showed the subadditivity of the outer measure. Also note the geometric series \(\sum_{k=1}^{\infty}\frac{\varepsilon}{2^k}=\varepsilon\).

Step 4 - Additivity of \(\mu\) on \(\mathfrak{M}_F\)

Suppose \(E=\bigcup_{i=1}^{\infty}E_i\), where \(E_1,E_2,\cdots\) are pairwise disjoint members of \(\mathfrak{M}_F\), then \[ \mu(E)=\sum_{i=1}^{\infty}\mu(E_i). \] If \(\mu(E)<\infty\), we also have \(E \in \mathfrak{M}_F\).

As a dual to Step 3, we first show this holds for finitely many compact sets. As proved in Step 1, compact sets are in \(\mathfrak{M}_F\). Suppose now \(K_1\) and \(K_2\) are disjoint compact sets. We want to show that \[ \mu(K_1 \cup K_2)=\mu(K_1)+\mu(K_2). \] Note that compact sets in a Hausdorff space are closed. Therefore we are able to apply Urysohn's lemma to the pair \((K_1,K_2^c)\). That said, there exists an \(f \in C_c(X)\) such that \[ K_1 \prec f \prec K_2^c. \] In other words, \(f(x)=1\) for all \(x \in K_1\) and \(f(x)=0\) for all \(x \in K_2\), since \(\operatorname{supp}(f) \cap K_2 = \varnothing\). By Step 1, since \(K_1 \cup K_2\) is compact, for any \(\varepsilon>0\) there exists some \(g \in C_c(X)\) such that \[ K_1 \cup K_2 \prec g \quad \text{and} \quad \Lambda(g) < \mu(K_1 \cup K_2)+\varepsilon. \] Now things become tricky. We are able to write \(g\) as \[ g=fg+(1-f)g. \] But \(K_1 \prec fg\) and \(K_2 \prec (1-f)g\) by the properties of \(f\) and \(g\). By Step 1 and the linearity of \(\Lambda\), we have \[ \mu(K_1)+\mu(K_2) \leq \Lambda(fg)+\Lambda((1-f)g)=\Lambda(g) < \mu(K_1 \cup K_2)+\varepsilon. \] Since \(\varepsilon\) is arbitrary, we have \[ \mu(K_1)+\mu(K_2) \leq \mu(K_1 \cup K_2). \] On the other hand, by Step 3, we have \[ \mu(K_1 \cup K_2) \leq \mu(K_1)+\mu(K_2). \] Therefore they must be equal.

If \(\mu(E)=\infty\), there is nothing to prove. So now we should assume that \(\mu(E)<\infty\). Fix \(\varepsilon>0\). Since \(E_i \in \mathfrak{M}_F\), there are compact sets \(K_i \subset E_i\) with \[ \mu(K_i) > \mu(E_i)-\frac{\varepsilon}{2^i}. \] Putting \(H_n=K_1 \cup K_2 \cup \cdots \cup K_n\), we see \(E \supset H_n\) and, since the \(K_i\) are pairwise disjoint compact sets, \[ \mu(E) \geq \mu(H_n)=\sum_{i=1}^{n}\mu(K_i)>\sum_{i=1}^{n}\mu(E_i)-\varepsilon. \] This inequality holds for all \(n\) and \(\varepsilon\), therefore \[ \mu(E) \geq \sum_{i=1}^{\infty}\mu(E_i). \] Therefore by Step 3, the identity holds.

Finally we shall show that \(E \in \mathfrak{M}_F\) if \(\mu(E) <\infty\). To make it more understandable, we will use elementary calculus notation. If we write \(\mu(E)=x\) and \(x_n=\sum_{i=1}^{n}\mu(E_i)\), we see \[ \lim_{n \to \infty}x_n=x. \] Therefore, for any \(\varepsilon>0\), there exists some \(N \in \mathbb{N}\) such that \[ x-x_N<\varepsilon. \] This is tantamount to \[ \mu(E)<\sum_{i=1}^{N}\mu(E_i)+\varepsilon. \] But by definition of the compact set \(H_N\) above, we see \[ \mu(E)<{\color{red}{\sum_{i=1}^{N}\mu(E_i)}}+\varepsilon<{\color{red}{\mu(H_N)+\varepsilon}}+\varepsilon=\mu(H_N)+2\varepsilon. \] Hence \(E\) satisfies the requirements of \(\mathfrak{M}_F\), and is thus an element of it.


Remarks of Step 4. You should realize that we are heavily using the \(\varepsilon\)-definition of \(\sup\) and \(\inf\). As you may guess, \(\mathfrak{M}_F\) should be a subset of \(\mathfrak{M}\) though we don't know whether it is a \(\sigma\)-algebra or not. In other words, we hope that the countable additivity of \(\mu\) holds on a \(\sigma\)-algebra that is properly extended from \(\mathfrak{M}_F\). However it's still difficult to show that \(\mathfrak{M}\) is a \(\sigma\)-algebra. We need more properties of \(\mathfrak{M}_F\) to go on.

Step 5 - The 'continuity' of \(\mathfrak{M}_F\).

If \(E \in \mathfrak{M}_F\) and \(\varepsilon>0\), there is a compact \(K\) and an open \(V\) such that \(K \subset E \subset V\) and \(\mu(V-K)<\varepsilon\).

There are two ways to write \(\mu(E)\), namely \[ \mu(E)=\sup\{\mu(K):K \subset E\} \quad \text{and} \quad \mu(E)=\inf\{\mu(V):V\supset E\} \] where \(K\) is compact and \(V\) is open. Therefore there exist some \(K\) and \(V\) such that \[ \mu(V)-\frac{\varepsilon}{2}<\mu(E)<\mu(K)+\frac{\varepsilon}{2}. \] Since \(V-K\) is open, and \(\mu(V-K)<\infty\), we have \(V-K \in \mathfrak{M}_F\) by Step 2. By Step 4, we have \[ \mu(K)+\mu(V-K)=\mu(V) <\mu(K)+\varepsilon. \] Therefore \(\mu(V-K)<\varepsilon\) as desired.


Remarks of Step 5. You should be familiar with the \(\varepsilon\)-definitions of \(\sup\) and \(\inf\) now. Since \(V-K =V\cap K^c \subset V\), we have \(\mu(V-K)\leq\mu(V)<\mu(E)+\frac{\varepsilon}{2}<\infty\).

Step 6 - \(\mathfrak{M}_F\) is closed under certain operations

If \(A,B \in \mathfrak{M}_F\), then \(A-B,A\cup B\) and \(A \cap B\) are elements of \(\mathfrak{M}_F\).

This shows that \(\mathfrak{M}_F\) is closed under union, intersection and relative complement. In fact, we merely need to prove \(A-B \in \mathfrak{M}_F\), since \(A \cup B=(A-B) \cup B\) and \(A\cap B = A-(A-B)\).

By Step 5, for \(\varepsilon>0\), there are compact sets \(K_A\), \(K_B\) and open sets \(V_A\), \(V_B\) such that \(K_A \subset A \subset V_A\), \(K_B \subset B \subset V_B\), \(\mu(V_A-K_A)<\varepsilon\) and \(\mu(V_B-K_B)<\varepsilon\). For \(A-B\) we have \[ A-B \subset V_A-K_B \subset (V_A-K_A) \cup (K_A-V_B) \cup (V_B-K_B). \] With an application of Step 3 and 5, we have \[ \mu(A-B) \leq \mu(V_A-K_A)+\mu(K_A-V_B)+\mu(V_B-K_B)< \varepsilon+\mu(K_A-V_B)+\varepsilon. \] Since \(K_A-V_B\) is a closed subset of \(K_A\), we see \(K_A-V_B\) is compact as well (a closed subset of a compact set is compact). But \(K_A-V_B \subset A-B\) and \(\mu(A-B) <\mu(K_A-V_B)+2\varepsilon\), so \(A-B\) meets the requirements of \(\mathfrak{M}_F\) (the fact that \(\mu(A-B)<\infty\) is trivial since \(\mu(A-B)\leq\mu(A)\)).

Since \(A-B\) and \(B\) are pairwise disjoint members of \(\mathfrak{M}_F\), we see \[ \mu(A \cup B)=\mu(A-B)+\mu(B)<\infty. \] Thus \(A \cup B \in \mathfrak{M}_F\). Since \(A,A-B \in \mathfrak{M}_F\), we see \(A \cap B = A-(A-B) \in \mathfrak{M}_F\).


Remarks of Step 6. In this step, we demonstrated several ways to express a set, all of which end up with a huge simplification. Now we are able to show that \(\mathfrak{M}_F\) is a subset of \(\mathfrak{M}\).

Step 7 - \(\mathfrak{M}_F \subset \mathfrak{M}\)

There is a precise relation between \(\mathfrak{M}\) and \(\mathfrak{M}_F\) by \[ \mathfrak{M}_F=\{E \in \mathfrak{M}:\mu(E)<\infty\} \subset \mathfrak{M}. \]

If \(E \in \mathfrak{M}_F\), we shall show that \(E \in \mathfrak{M}\). For compact \(K\in\mathfrak{M}_F\) (Step 1), by Step 6, we see \(K \cap E \in \mathfrak{M}_F\), therefore \(E \in \mathfrak{M}\).

If \(E \in \mathfrak{M}\) with \(\mu(E)<\infty\) however, we need to show that \(E \in \mathfrak{M}_F\). By definition of \(\mu\), for \(\varepsilon>0\), there is an open \(V \supset E\) such that \[ \mu(V)<\mu(E)+\varepsilon<\infty. \] Therefore \(V \in \mathfrak{M}_F\) by Step 2. By Step 5, there is a compact set \(K \subset V\) such that \(\mu(V-K)<\varepsilon\) (the open set containing \(V\) should be \(V\) itself). Since \(E \cap K \in \mathfrak{M}_F\), there exists a compact set \(H \subset E \cap K\) with \[ \mu(E \cap K)<\mu(H)+\varepsilon. \] Since \(E \subset (E \cap K) \cup (V-K)\), it follows from Step 3 that \[ \mu(E) \leq {\color{red}{\mu(E\cap K)}}+\mu(V-K)<{\color{red}{\mu(H)+\varepsilon}}+\varepsilon=\mu(H)+2\varepsilon. \] Therefore \(E \in \mathfrak{M}_F\).


Remarks of Step 7. Several tricks in the preceding steps are used here. Now we are pretty close to the fact that \((X,\mathfrak{M},\mu)\) is a measure space. Note that for \(E \in \mathfrak{M}-\mathfrak{M}_F\), we have \(\mu(E)=\infty\), but we have already proved the countable additivity for \(\mathfrak{M}_F\). Is it 'almost trivial' for \(\mathfrak{M}\)? Before that, we need to show that \(\mathfrak{M}\) is a \(\sigma\)-algebra. Note that assertion 3 of \(\mu\) has been proved.

Step 8 - \(\mathfrak{M}\) is a \(\sigma\)-algebra in \(X\) containing all Borel sets

We will validate the definition of \(\sigma\)-algebra one by one.

\(X \in \mathfrak{M}\).

For any compact \(K \subset X\), we have \(K \cap X=K\). But as proved in Step 1, \(K \in \mathfrak{M}_F\), therefore \(X \in \mathfrak{M}\).

If \(A \in \mathfrak{M}\), then \(A^c \in\mathfrak{M}\).

If \(A \in \mathfrak{M}\) and \(K\) is compact, then \(A \cap K \in \mathfrak{M}_F\). But \[ K-(A \cap K)=K \cap(A^c \cup K^c)=(K\cap A^c) \cup \varnothing=K \cap A^c. \] By Step 1 and Step 6, we see \(K \cap A^c \in \mathfrak{M}_F\), thus \(A^c \in \mathfrak{M}\).

If \(A_n \in \mathfrak{M}\) for all \(n \in \mathbb{N}\), then \(A=\bigcup_{n=1}^{\infty}A_n \in \mathfrak{M}\).

We assign an auxiliary sequence of sets inductively. Fix a compact set \(K\). For \(n=1\), we write \(B_1=A_1 \cap K\). Then \(B_1 \in \mathfrak{M}_F\). For \(n \geq 2\), we write \[ B_n=(A_n \cap K)-(B_1 \cup \cdots\cup B_{n-1}). \] Since \(A_n \cap K \in \mathfrak{M}_F\) and \(B_1,B_2,\cdots,B_{n-1} \in \mathfrak{M}_F\), by Step 6, \(B_n \in \mathfrak{M}_F\). Also the sets \(B_n\) are pairwise disjoint.

Another set-theoretic manipulation shows that \[ \begin{aligned} A \cap K&=K \cap\left(\bigcup_{n=1}^{\infty}A_n\right) \\ &=\bigcup_{n=1}^{\infty}(K \cap A_n) \\ &=\bigcup_{n=1}^{\infty}\left(B_n \cup(B_1 \cup \cdots\cup B_{n-1})\right) \\ &=\bigcup_{n=1}^{\infty}B_n. \end{aligned} \] Now we are able to evaluate \(\mu(A \cap K)\) by Step 4: \[ \mu(A \cap K)=\sum_{n=1}^{\infty}\mu(B_n), \] and \(\mu(A \cap K) \leq \mu(K)<\infty\) since \(A \cap K \subset K\). Therefore, again by Step 4, \(A \cap K \in \mathfrak{M}_F\), which implies that \(A \in \mathfrak{M}\).

\(\mathfrak{M}\) contains all Borel sets.

Indeed, it suffices to prove that \(\mathfrak{M}\) contains all open sets and/or closed sets. We'll show two different paths. Let \(K\) be a compact set.

  1. If \(C\) is closed, then \(C \cap K\) is compact, therefore \(C \cap K \in \mathfrak{M}_F\) (by Step 1), and \(C\) is an element of \(\mathfrak{M}\).
  2. If \(D\) is open, then \(K-D=K \cap D^c\) is compact, and \(D \cap K=K-(K-D)\). Therefore \(D \cap K \in \mathfrak{M}_F\) (by Step 1 and Step 6), which shows that \(D\) is an element of \(\mathfrak{M}\).

Therefore by 1 or 2, \(\mathfrak{M}\) contains all Borel sets.

Step 9 - \(\mu\) is a positive measure on \(\mathfrak{M}\)

Again, we will verify all properties of \(\mu\) one by one.

\(\mu(E) \geq 0\) for all \(E \in \mathfrak{M}\).

This follows immediately from the definition of \(\mu\), since \(\Lambda\) is positive and \(0 \leq f \leq 1\).

\(\mu\) is countably additive.

If \(A_1,A_2,\cdots\) form a disjoint countable collection of members of \(\mathfrak{M}\), we need to show that \[ \mu\left(\bigcup_{n=1}^{\infty}A_n\right)=\sum_{n=1}^{\infty}\mu(A_n). \] If \(A_n \in \mathfrak{M}_F\) for all \(n\), then this is merely what we have just proved in Step 4. If \(A_j \in \mathfrak{M}-\mathfrak{M}_F\) for some \(j\) however, we have \(\mu(A_j)=\infty\) by Step 7. So \(\sum_n\mu(A_n)=\infty\). For \(\mu(\cup_n A_n)\), notice that since \(\cup_n A_n \supset A_j\), we have \(\mu(\cup_n A_n) \geq \mu(A_j)=\infty\). The identity is now proved.

Step 10 - The completeness of \(\mu\)

So far assertions 1-3 have been proved. But the final assertion has not been proved explicitly. We do that now since this property will be used when discussing the Lebesgue measure \(m\). In fact, this will show that \((X,\mathfrak{M},\mu)\) is a complete measure space.

If \(E \in \mathfrak{M}\), \(A \subset E\), and \(\mu(E)=0\), then \(A \in \mathfrak{M}\).

It suffices to show that \(A \in \mathfrak{M}_F\). By definition, \(\mu(A)=0\) as well. If \(K \subset A\), where \(K\) is compact, then \(\mu(K)=\mu(A)=0\). Therefore \(0\) is the supremum of \(\mu(K)\). It follows that \(A \in \mathfrak{M}_F \subset \mathfrak{M}\).

Step 11 - The functional and the measure

For every \(f \in C_c(X)\), \(\Lambda{f}=\int_X fd\mu\).

This is the absolute main result of the theorem. It suffices to prove the inequality \[ \Lambda f \leq \int_X fd\mu \] for all real \(f \in C_c(X)\). What about the other side? By the linearity of \(\Lambda\) and \(\int_X \cdot d\mu\), once the inequality above is proved, we have \[ \Lambda(-f)=-\Lambda{f}\leq\int_{X}-fd\mu=-\int_Xfd\mu. \] Therefore \[ \Lambda{f} \geq \int_X fd\mu \] holds as well, and this establishes the equality.

Notice that since \(K=\operatorname{supp}(f)\) is compact, we see the range of \(f\) has to be compact. Namely we may assume that \([a,b]\) contains the range of \(f\). For \(\varepsilon>0\), we are able to pick a partition around \([a,b]\) such that \(y_i - y_{i-1}<\varepsilon\) for \(i=1,\cdots,n\) and \[ y_0 < a < y_1<\cdots<y_n=b. \] Put \[ E_i=\{x:y_{i-1}< f(x) \leq y_i\}\cap K. \] Since \(f\) is continuous, \(f\) is Borel measurable. The sets \(E_i\) are trivially pairwise disjoint Borel sets. Again, there are open sets \(V_i \supset E_i\) such that \[ \mu(V_i) < \mu(E_i)+\frac{\varepsilon}{n} \] for \(i=1,2,\cdots,n\), and such that \(f(x)<y_i + \varepsilon\) for all \(x \in V_i\). Notice that \((V_i)\) covers \(K\), therefore by the partition of unity, there are functions \(h_1,\cdots,h_n\) such that \(h_i \prec V_i\) for all \(i\) and \(\sum_i h_i=1\) on \(K\). Hence \(f=\sum_i h_if\). By Step 1 and the fact that \(K \prec \sum_i h_i\), we see \[ \mu(K) \leq \Lambda(\sum_i h_i)=\sum_i \Lambda{h_i}. \] By the way we picked \(V_i\), we see \(h_if \leq (y_i+\varepsilon)h_i\). We have the following inequality: \[ \begin{aligned} \Lambda{f} &= \sum_{i=1}^{n}\Lambda(h_if) \leq\sum_{i=1}^{n}(y_i+\varepsilon)\Lambda{h_i} \\ &= \sum_{i=1}^{n}\left(|a|-|a|+y_i+\varepsilon\right)\Lambda{h_i} \\ &=\sum_{i=1}^{n}(|a|+y_i+\varepsilon)\Lambda{h_i}-|a|\sum_{i=1}^{n}\Lambda{h_i}. \end{aligned} \] Since \(h_i \prec V_i\), we have \(\mu(E_i)+\frac{\varepsilon}{n}>\mu(V_i) \geq \Lambda{h_i}\). Note also that \(|a|+y_i+\varepsilon>0\), since \(y_i>a \geq -|a|\) for \(i \geq 1\). And we already get \(\sum_i \Lambda{h_i} \geq \mu(K)\). If we put them into the inequality above, we get \[ \begin{aligned} \Lambda{f} &\leq \sum_{i=1}^{n}(|a|+y_i+\varepsilon)\Lambda{h_i}-|a|\sum_{i=1}^{n}\Lambda{h_i} \\ &\leq \sum_{i=1}^{n}(|a|+y_i+\varepsilon){\color{red}{(\mu(E_i)+\frac{\varepsilon}{n})}}-|a|{\color{red}{\mu(K)}}. \end{aligned} \] Observe that \(\cup_i E_i=K\), so by Step 9 we have \(\sum_{i}\mu(E_i)=\mu(K)\).
A slight manipulation shows that \[ \begin{aligned} \sum_{i=1}^{n}(|a|+y_i+\varepsilon)\mu(E_i)-|a|\mu(K)&=|a|\sum_{i=1}^{n}\mu(E_i)-|a|\mu(K)+\sum_{i=1}^{n}(y_i+\varepsilon)\mu(E_i) \\ &=\sum_{i=1}^{n}(y_i-\varepsilon)\mu(E_i)+2\varepsilon\mu(K). \end{aligned} \] Therefore for \(\Lambda f\) we get \[ \begin{aligned} \Lambda{f} &\leq\sum_{i=1}^{n}(|a|+y_i+\varepsilon)(\mu(E_i)+\frac{\varepsilon}{n})-|a|\mu(K) \\ &=\sum_{i=1}^{n}(y_i-\varepsilon)\mu(E_i)+2\varepsilon\mu(K)+\frac{\varepsilon}{n}\sum_{i=1}^n(|a|+y_i+\varepsilon). \end{aligned} \] Now here comes the trickiest part of the whole blog post. By definition of \(E_i\), we see \(f(x) > y_{i-1}>y_{i}-\varepsilon\) for \(x \in E_i\). Therefore the simple function \[ s_n=\sum_{i=1}^{n}(y_i-\varepsilon)\chi_{E_i} \] satisfies \(s_n \leq f\) on \(K\), and both functions vanish outside \(K\). If we evaluate the Lebesgue integral of \(s_n\) with respect to \(\mu\), we see \[ \int_X s_nd\mu={\color{red}{\sum_{i=1}^{n}(y_i-\varepsilon)\mu(E_i)}} \leq {\color{red}{\int_X fd\mu}}. \] For \(2\varepsilon\mu(K)\), things are simple since \(0\leq\mu(K)<\infty\). Therefore \(2\varepsilon\mu(K) \to 0\) as \(\varepsilon \to 0\). Now let's estimate the final part of the inequality. It's trivial that \(\frac{\varepsilon}{n}\sum_{i=1}^{n}(|a|+\varepsilon)=\varepsilon(\varepsilon+|a|)\). For \(y_i\), observe that \(y_i \leq b\) for all \(i\), therefore \(\frac{\varepsilon}{n}\sum_{i=1}^{n}y_i \leq \frac{\varepsilon}{n}nb=\varepsilon b\). Thus \[ {\color{green}{\frac{\varepsilon}{n}\sum_{i=1}^{n}(|a|+y_i+\varepsilon)}} \leq {\color{green}{\varepsilon(|a|+b+\varepsilon)}}. \] Notice that \(b+|a| \geq 0\) since \(b \geq a \geq -|a|\).
Our estimation of \(\Lambda{f}\) is finally done: \[ \begin{aligned} \Lambda{f} &\leq{\color\red{\sum_{i=1}^{n}(y_i-\varepsilon)\mu(E_i)}}+2\varepsilon\mu(K)+{\color\green{\frac{\varepsilon}{n}\sum_{i=1}^n(|a|+y_i+\varepsilon)}} \\ &\leq{\color\red {\int_Xfd\mu}}+2\varepsilon\mu(K)+{\color\green{\varepsilon(|a|+b+\varepsilon)}} \\ &= \int_X fd\mu+\varepsilon(2\mu(K)+|a|+b+\varepsilon). \end{aligned} \] Since \(\varepsilon\) is arbitrary, we see \(\Lambda{f} \leq \int_X fd\mu\). The identity is proved.

Step 12 - The uniqueness of \(\mu\)

If there are two measures \(\mu_1\) and \(\mu_2\) that satisfy assertions 1 to 4 and correspond to \(\Lambda\), then \(\mu_1=\mu_2\).

In fact, according to assertion 2 and 3, \(\mu\) is determined by the values on compact subsets of \(X\). It suffices to show that

If \(K\) is a compact subset of \(X\), then \(\mu_1(K)=\mu_2(K)\).

Fix \(K\) compact and \(\varepsilon>0\). By Step 1, there exists an open \(V \supset K\) such that \(\mu_2(V)<\mu_2(K)+\varepsilon\). By Urysohn's lemma, there exists some \(f\) such that \(K \prec f \prec V\). Hence \[ \begin{aligned} \mu_1(K)=\int_X\chi_Kd\mu_1 &\leq\int_X fd\mu_1=\Lambda{f}=\int_X fd\mu_2 \\ &\leq \int_X \chi_V d\mu_2=\mu_2(V)<\mu_2(K)+\varepsilon. \end{aligned} \] Since \(\varepsilon\) is arbitrary, \(\mu_1(K) \leq \mu_2(K)\). If \(\mu_1\) and \(\mu_2\) are exchanged, we see \(\mu_2(K) \leq \mu_1(K)\). The uniqueness is proved.

The flaw

Can we simply put \(X=\mathbb{R}^k\) right now? The answer is no. Note that the outer regularity is for all sets but inner is only for open sets and members of \(\mathfrak{M}_F\). But we expect the outer and inner regularity to be 'symmetric'. There is an example showing that locally compact is far from being enough to offer the 'symmetry'.

A weird example

Define \(X=\mathbb{R}_1 \times \mathbb{R}_2\), where \(\mathbb{R}_1\) is the real line equipped with the discrete metric \(d_1\), and \(\mathbb{R}_2\) is the real line equipped with the Euclidean metric \(d_2\). The metric of \(X\) is defined by \[ d_X((x_1,y_1),(x_2,y_2))=d_1(x_1,x_2)+d_2(y_1,y_2). \] The topology \(\tau_X\) induced by \(d_X\) is naturally Hausdorff, and it is locally compact, as one sees by considering the vertical segments. So what would happen to this weird locally compact Hausdorff space?

If \(f \in C_c(X)\), let \(x_1,x_2,\cdots,x_n\) be those values of \(x\) for which \(f(x,y) \neq 0\) for at least one \(y\). Since \(f\) has compact support, it is ensured that there are only finitely many \(x_i\)'s. We are able to define a positive linear functional by \[ \Lambda f=\sum_{i=1}^{n}\int_{-\infty}^{+\infty}f(x_i,y)dy=\int_X fd\mu, \] where \(\mu\) is the measure associated with \(\Lambda\) in the sense of R-M-K theorem. Let \[ E=\mathbb{R}_1 \times \{0\}. \] By squeezing the disjoint vertical segments around \((x_i,0)\), we see \(\mu(K)=0\) for all compact \(K \subset E\) but \(\mu(E)=\infty\).

This is in violent contrast to what we expect. However, if \(X\) is required to be \(\sigma\)-compact (note that the space in this example is not), this kind of problem disappears neatly.
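To make the example a little more concrete, here is a minimal Python sketch of the metric on \(X\), reading the second term as the Euclidean distance between the second coordinates. The function names `d_discrete` and `d_product` are made up for illustration. It shows that two distinct points of \(E=\mathbb{R}_1\times\{0\}\) are always at distance exactly \(1\), however close their first coordinates are as real numbers, so \(E\) is discrete as a subspace and its compact subsets are finite.

```python
def d_discrete(x1, x2):
    """Discrete metric on the first factor R_1."""
    return 0.0 if x1 == x2 else 1.0


def d_product(p, q):
    """Metric on X = R_1 x R_2: discrete part plus Euclidean part."""
    (x1, y1), (x2, y2) = p, q
    return d_discrete(x1, x2) + abs(y1 - y2)


# Two points on the same vertical line: only the Euclidean part counts.
same_line = d_product((0.3, 0.0), (0.3, 0.25))

# Two distinct points of E = R_1 x {0}: the distance is 1 regardless of
# how close the first coordinates are in the usual sense.
on_E = d_product((0.3, 0.0), (0.3000001, 0.0))

print(same_line, on_E)
```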

References / Further reading

  1. Walter Rudin, Real and Complex Analysis
  2. Serge Lang, Fundamentals of Differential Geometry
  3. Joel W. Robbin, Partition of Unity
  4. Brian Conrad, Paracompactness and local compactness
  5. Raoul Bott & Loring W. Tu, Differential Forms in Algebraic Topology

The Big Three Pt. 4 - The Open Mapping Theorem (F-Space)

The Open Mapping Theorem

We are finally going to prove the open mapping theorem in \(F\)-space. In this version, only metric and completeness are required. Therefore it contains the Banach space version naturally.

(Theorem 0) Suppose we have the following conditions:

  1. \(X\) is a \(F\)-space,
  2. \(Y\) is a topological vector space,
  3. \(\Lambda: X \to Y\) is continuous and linear, and
  4. \(\Lambda(X)\) is of the second category in \(Y\).

Then \(\Lambda\) is an open mapping.

Proof. Let \(B\) be a neighborhood of \(0\) in \(X\). Let \(d\) be an invariant metric on \(X\) that is compatible with the \(F\)-topology of \(X\). Define a sequence of balls by \[ B_n=\{x:d(x,0) < \frac{r}{2^n}\} \] where \(r\) is picked in such a way that \(B_0 \subset B\). To show that \(\Lambda\) is an open mapping, we need to prove that there exists some neighborhood \(W\) of \(0\) in \(Y\) such that \[ W \subset \Lambda(B). \] To do this however, we need an auxiliary set. In fact, we will show that there exists some \(W\) such that \[ W \subset \overline{\Lambda(B_1)} \subset \Lambda(B). \] We need to prove the inclusions one by one.


The first inclusion requires BCT. Since \(B_2 -B_2 \subset B_1\), and \(Y\) is a topological vector space (so subtraction is continuous), we get \[ \overline{\Lambda(B_2)}-\overline{\Lambda(B_2)} \subset \overline{\Lambda(B_2)-\Lambda(B_2)} \subset \overline{\Lambda(B_1)} \] Since \[ \Lambda(X)=\bigcup_{k=1}^{\infty}k\Lambda(B_2), \] according to BCT, at least one \(k\Lambda(B_2)\) is of the second category in \(Y\). Since scalar multiplication \(y\mapsto ky\) is a homeomorphism of \(Y\) onto \(Y\), we see \(k\Lambda(B_2)\) is of the second category for all \(k\), especially for \(k=1\). Therefore \(\overline{\Lambda(B_2)}\) has nonempty interior, so \(\overline{\Lambda(B_2)}-\overline{\Lambda(B_2)}\) contains an open neighborhood \(W\) of \(0\) in \(Y\), and by the inclusion above \(W \subset \overline{\Lambda(B_1)}\). By replacing the index, it's easy to see this holds for all \(n\). That is, for \(n \geq 1\), there exists some neighborhood \(W_n\) of \(0\) in \(Y\) such that \(W_n \subset \overline{\Lambda(B_n)}\).


The second inclusion requires the completeness of \(X\). Fix \(y_1 \in \overline{\Lambda(B_1)}\), we will show that \(y_1 \in \Lambda(B)\). Pick \(y_n\) inductively. Assume \(y_n\) has been chosen in \(\overline{\Lambda(B_n)}\). As stated before, there exists some neighborhood \(W_{n+1}\) of \(0\) in \(Y\) such that \(W_{n+1} \subset \overline{\Lambda(B_{n+1})}\). Hence \[ (y_n-W_{n+1}) \cap \Lambda(B_n) \neq \varnothing \] Therefore there exists some \(x_n \in B_n\) such that \[ \Lambda x_n \in y_n - W_{n+1}. \] Putting \(y_{n+1}=y_n-\Lambda x_n\), we see \(y_{n+1} \in W_{n+1} \subset \overline{\Lambda(B_{n+1})}\). Therefore we are able to pick \(y_n\) naturally for all \(n \geq 1\).

Since \(d(x_n,0)<\frac{r}{2^n}\) for all \(n \geq 1\), the partial sums \(z_n=\sum_{k=1}^{n}x_k\) form a Cauchy sequence, which converges to some \(z \in X\) since \(X\) is an \(F\)-space (hence complete). Notice we also have \[ \begin{aligned} d(z,0)& \leq d(x_1,0)+d(x_2,0)+\cdots \\ & < \frac{r}{2}+\frac{r}{4}+\cdots \\ & = r \end{aligned} \] so \(z \in B_0 \subset B\).

By the continuity of \(\Lambda\), we see \(\lim_{n \to \infty}y_n = 0\). Notice we also have \[ \sum_{k=1}^{n} \Lambda x_k = \sum_{k=1}^{n}(y_k-y_{k+1})=y_1-y_{n+1} \to y_1 \quad (n \to \infty), \] we see \(y_1 = \Lambda z \in \Lambda(B)\).

The whole theorem is now proved, that is, \(\Lambda\) is an open mapping. \(\square\)

Remarks

You may think the following relation comes from nowhere: \[ (y_n - W_{n+1}) \cap \Lambda(B_{n}) \neq \varnothing. \] But it does not. We need to review some point-set topology. Notice that \(y_n\) lies in the closure of \(\Lambda(B_n)\), and \(y_n-W_{n+1}\) is an open neighborhood of \(y_n\). If \((y_n - W_{n+1}) \cap \Lambda(B_{n})\) were empty, then \(y_n\) could not lie in the closure.

The geometric series \[ \frac{\varepsilon}{2}+\frac{\varepsilon}{4}+\cdots+\frac{\varepsilon}{2^n}+\cdots=\varepsilon \] is widely used whenever an infinite sum has to be kept under control. It is a good idea to keep this technique in mind.
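A quick numerical sanity check of this trick, as a minimal sketch (the helper `partial_sum` and the value of `eps` are chosen only for illustration): every partial sum stays below \(\varepsilon\), and the gap after \(n\) terms is \(\varepsilon/2^n\).

```python
def partial_sum(eps, n):
    """Return eps/2 + eps/4 + ... + eps/2**n."""
    return sum(eps / 2 ** k for k in range(1, n + 1))


eps = 0.1
for n in (1, 5, 20):
    s = partial_sum(eps, n)
    # The remaining gap eps - s is eps / 2**n, up to rounding.
    print(n, s, eps - s)
```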

Corollaries

The formal proofs will not be written down here, but they are quite easy to carry out.

(Corollary 0) \(\Lambda(X)=Y\).

This is an immediate consequence of the fact that \(\Lambda\) is open. Since \(X\) is open, \(\Lambda(X)\) is an open subspace of \(Y\). But the only open subspace of \(Y\) is \(Y\) itself.

(Corollary 1) \(Y\) is a \(F\)-space as well.

If you have already seen the commutative diagram by quotient space (put \(N=\ker\Lambda\)), you know that the induced map \(f\) is open and continuous. By treating topological vector spaces as groups, by corollary 0 and the first isomorphism theorem, we have \[ X/\ker\Lambda \simeq \Lambda(X)=Y. \] Therefore \(f\) is an isomorphism; hence one-to-one. Therefore \(f\) is a homeomorphism as well. In this post we showed that \(X/\ker{\Lambda}\) is an \(F\)-space, therefore \(Y\) has to be an \(F\)-space as well. (We are using the fact that \(\ker{\Lambda}\) is a closed set. But why closed?)

(Corollary 2) If \(\Lambda\) is a continuous linear mapping of an \(F\)-space \(X\) onto a \(F\)-space \(Y\), then \(\Lambda\) is open.

This is a direct application of BCT and open mapping theorem. Notice that \(Y\) is now of the second category.

(Corollary 3) If the linear map \(\Lambda\) in Corollary 2 is injective, then \(\Lambda^{-1}:Y \to X\) is continuous.

This comes from corollary 2 directly since \(\Lambda\) is open.

(Corollary 4) If \(X\) and \(Y\) are Banach spaces, and if \(\Lambda: X \to Y\) is a continuous linear bijective map, then there exist positive real numbers \(a\) and \(b\) such that \[ a \lVert x \rVert \leq \lVert \Lambda{x} \rVert \leq b\lVert x \rVert \] for every \(x \in X\).

This comes from corollary 3 directly since both \(\Lambda\) and \(\Lambda^{-1}\) are bounded as they are continuous.
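In finite dimensions the two constants can be made explicit. The sketch below is an illustration only: the matrix `A` is a hypothetical example of an invertible map on the Euclidean plane, and the constants are taken to be its smallest and largest singular values, computed from the eigenvalues of \(A^TA\) with the quadratic formula. The inequality of Corollary 4 is then checked on random vectors.

```python
import math
import random

# A hypothetical invertible linear map on R^2, stored row by row.
A = ((2.0, 1.0), (0.0, 1.0))


def apply(A, x):
    """Matrix-vector product for a 2x2 matrix."""
    return (A[0][0] * x[0] + A[0][1] * x[1],
            A[1][0] * x[0] + A[1][1] * x[1])


def norm(x):
    """Euclidean norm on R^2."""
    return math.hypot(x[0], x[1])


# Entries of the symmetric matrix A^T A = [[p, q], [q, r]].
p = A[0][0] ** 2 + A[1][0] ** 2
q = A[0][0] * A[0][1] + A[1][0] * A[1][1]
r = A[0][1] ** 2 + A[1][1] ** 2

# Its eigenvalues are the squared singular values of A.
tr, det = p + r, p * r - q * q
disc = math.sqrt(tr * tr - 4.0 * det)
b = math.sqrt((tr + disc) / 2.0)  # largest singular value, equals ||A||
a = math.sqrt((tr - disc) / 2.0)  # smallest singular value, equals 1/||A^{-1}||

random.seed(0)
samples = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(1000)]
ok = all(a * norm(x) - 1e-12 <= norm(apply(A, x)) <= b * norm(x) + 1e-12
         for x in samples)
print(a, b, ok)
```

In infinite dimensions no such formula is available in general, which is why the corollary has to go through the open mapping theorem.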

(Corollary 5) If \(\tau_1 \subset \tau_2\) are vector topologies on a vector space \(X\) and if both \((X,\tau_1)\) and \((X,\tau_2)\) are \(F\)-spaces, then \(\tau_1 = \tau_2\).

This is obtained by applying corollary 3 to the identity mapping \(\iota:(X,\tau_2) \to (X,\tau_1)\).

(Corollary 6) If \(\lVert \cdot \rVert_1\) and \(\lVert \cdot \rVert_2\) are two norms in a vector space \(X\) such that

  • \(\lVert\cdot\rVert_1 \leq K\lVert\cdot\rVert_2\).
  • \((X,\lVert\cdot\rVert_1)\) and \((X,\lVert\cdot\rVert_2)\) are Banach

Then \(\lVert\cdot\rVert_1\) and \(\lVert\cdot\rVert_2\) are equivalent.

This is merely a more restrictive version of corollary 5.

The series

Since there is no strong reason to write more posts on this topic, i.e. the three fundamental theorems of linear functional analysis, I think it's time to make a list of the series. It's been around half a year.

The completeness of the quotient space (topological vector space)

The Goal

We are going to show the completeness of \(X/N\) where \(X\) is a TVS and \(N\) a closed subspace. Alongside, a bunch of useful analysis tricks will be demonstrated (and that's why you may find this blog post a little tedious.). But what's more important, the theorem proved here will be used in the future.

The main process

To make it clear, we should give a formal definition of \(F\)-space.

A topological vector space \(X\) is an \(F\)-space if its topology \(\tau\) is induced by a complete invariant metric \(d\).

A metric \(d\) on a vector space \(X\) will be called invariant if for all \(x,y,z \in X\), we have \[ d(x+z,y+z)=d(x,y). \] By complete we mean every Cauchy sequence of \((X,d)\) converges.
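As a small illustration of the definition, the sketch below uses the metric \(d(x,y)=\frac{|x-y|}{1+|x-y|}\) on \(\mathbb{R}\). This metric is not taken from the post; it is a standard example of an invariant metric that does not come from a norm. The code checks translation invariance and the triangle inequality on a few sample points, and shows that \(d(2x,0) \neq 2d(x,0)\).

```python
import itertools


def d(x, y):
    """Bounded invariant metric on the real line."""
    t = abs(x - y)
    return t / (1.0 + t)


points = [-3.0, -0.5, 0.0, 1.5, 4.0]

# Translation invariance: d(x + z, y + z) == d(x, y).
invariant = all(abs(d(x + z, y + z) - d(x, y)) < 1e-12
                for x, y, z in itertools.product(points, repeat=3))

# Triangle inequality: d(x, z) <= d(x, y) + d(y, z).
triangle = all(d(x, z) <= d(x, y) + d(y, z) + 1e-12
               for x, y, z in itertools.product(points, repeat=3))

print(invariant, triangle, d(2.0, 0.0), 2 * d(1.0, 0.0))
```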

Defining the quotient metric \(\rho\)

The metric can be inherited by the quotient space naturally (we will use this fact later), that is

If \(X\) is an \(F\)-space and \(N\) is a closed subspace of \(X\), then \(X/N\) is still an \(F\)-space.

Suppose \(d\) is a complete invariant metric compatible with \(\tau_X\). The metric on \(X/N\) is defined by \[ \boxed{\rho(\pi(x),\pi(y))=\inf_{z \in N}d(x-y,z)} \]

\(\rho\) is a metric

Proof. First, if \(\pi(x)=\pi(y)\), that is, \(x-y \in N\), we see \[ \rho(\pi(x),\pi(y))=\inf_{z \in N}d(x-y,z)=d(x-y,x-y)=0. \] If \(\pi(x) \neq \pi(y)\) however, we shall show that \(\rho(\pi(x),\pi(y))>0\). In this case, we have \(x-y \notin N\). Since \(N\) is closed, \(N^c\) is open, and \(x-y\) is an interior point of \(X-N\). Therefore there exists an open ball \(B_r(x-y)\) centered at \(x-y\) with radius \(r>0\) such that \(B_r(x-y) \cap N = \varnothing\). Notice we have \(d(x-y,z) \geq r\) for all \(z \in N\) since otherwise \(z \in B_r(x-y)\). By putting \[ r_0=\sup\{r:B_r(x-y) \cap N = \varnothing\}, \] we see \(d(x-y,z) \geq r_0\) for all \(z \in N\) and indeed \(r_0=\inf_{z \in N}d(x-y,z)>0\) (the verification can be done by contradiction). In general, \(\inf_z d(x-y,z)=0\) if and only if \(x-y \in \overline{N}\).

Next, we shall show that \(\rho(\pi(x),\pi(y))=\rho(\pi(y),\pi(x))\), and it suffices to assume that \(\pi(x) \neq \pi(y)\). Since \(d\) is translation invariant, we get \[ \begin{aligned} d(x-y,z)&=d(x-y-z,0) \\ &=d(0,y-x+z) \\ &=d(-z,y-x) \\ &=d(y-x,-z). \end{aligned} \] Since \(N\) is a subspace, \(-z\) runs over \(N\) as \(z\) does. Therefore the \(\inf\) of the left hand side is equal to that of the right hand side. The identity is proved.

Finally, we need to verify the triangle inequality. Let \(r,s,t \in X\). For any \(\varepsilon>0\), there exist some \(z_\varepsilon\) and \(z_\varepsilon'\) such that \[ d(r-s,z_\varepsilon)<\rho(\pi(r),\pi(s))+\frac{\varepsilon}{2},\quad d(s-t,z'_\varepsilon)<\rho(\pi(s),\pi(t))+\frac{\varepsilon}{2}. \] Since \(d\) is invariant, we see \[ \begin{aligned} d(r-t,z_\varepsilon+z'_\varepsilon)&=d((r-s)+(s-t)-(z_\varepsilon+z'_\varepsilon),0) \\ &=d([(r-s)-z_\varepsilon]+[(s-t)-z'_\varepsilon],0) \\ &=d(r-s-z_\varepsilon,t-s+z'_\varepsilon) \\ &\leq d(r-s-z_\varepsilon,0)+d(t-s+z'_\varepsilon,0) \\ &=d(r-s,z_\varepsilon)+d(s-t,z'_\varepsilon) \end{aligned} \] (I owe [@LeechLattice](https://onp4.com/@leechlattice) for the inequality above.)

Therefore \[ \begin{aligned} d(r-t,z_\varepsilon+z'_\varepsilon)&\leq d(r-s,z_\varepsilon)+d(s-t,z'_\varepsilon) \\ &<\rho(\pi(r),\pi(s))+\rho(\pi(s),\pi(t))+\varepsilon. \end{aligned} \] (Warning: This does not imply that \(\rho(\pi(r),\pi(s))+\rho(\pi(s),\pi(t))=\inf_z d(r-t,z)\) since we don't know whether it is the lower bound or not.)

If \(\rho(\pi(r),\pi(s))+\rho(\pi(s),\pi(t))<\rho(\pi(r),\pi(t))\) however, let \[ 0<\varepsilon<\rho(\pi(r),\pi(t))-(\rho(\pi(r),\pi(s))+\rho(\pi(s),\pi(t))) \] then there exists some \(z''_\varepsilon=z_\varepsilon+z'_\varepsilon\) such that \[ d(r-t,z''_\varepsilon)<\rho(\pi(r),\pi(t)) \] which is a contradiction since \(\rho(\pi(r),\pi(t)) \leq d(r-t,z)\) for all \(z \in N\).

(We are using the \(\varepsilon\) definition of \(\inf\). See here.)

\(\rho\) is translate invariant

Since \(\pi\) is surjective, we see if \(u \in X/N\), there exists some \(a \in X\) such that \(\pi(a)=u\). Therefore \[ \begin{aligned} \rho(\pi(x)+u,\pi(y)+u) &=\rho(\pi(x)+\pi(a),\pi(y)+\pi(a)) \\ &=\rho(\pi(x+a),\pi(y+a)) \\ &=\inf_{z \in N}d(x+a-y-a,z) \\ &=\rho(\pi(x),\pi(y)). \end{aligned} \]

\(\rho\) is well-defined

If \(\pi(x)=\pi(x')\) and \(\pi(y)=\pi(y')\), we have to show that \(\rho(\pi(x),\pi(y))=\rho(\pi(x'),\pi(y'))\). In fact, \[ \begin{aligned} \rho(\pi(x),\pi(y)) &\leq \rho(\pi(x),\pi(x'))+\rho(\pi(x'),\pi(y'))+\rho(\pi(y'),\pi(y)) \\ &=\rho(\pi(x'),\pi(y')) \end{aligned} \] since \(\rho(\pi(x),\pi(x'))=0\) as \(\pi(x)=\pi(x')\). Meanwhile \[ \begin{aligned} \rho(\pi(x'),\pi(y')) &\leq \rho(\pi(x'),\pi(x)) + \rho(\pi(x),\pi(y)) + \rho(\pi(y),\pi(y')) \\ &= \rho(\pi(x),\pi(y)). \end{aligned} \] therefore \(\rho(\pi(x),\pi(y))=\rho(\pi(x'),\pi(y'))\).

\(\rho\) is compatible with \(\tau_N\)

To prove this, we need to show that a set \(E \subset X/N\) is open with respect to \(\tau_N\) if and only if \(E\) is a union of open balls. But we shall prove a more general statement:

If \(\mathscr{B}\) is a local base for \(\tau\), then the collection \(\mathscr{B}_N\), which contains all sets \(\pi(V)\) where \(V \in \mathscr{B}\), forms a local base for \(\tau_N\).

Proof. We already know that \(\pi\) is continuous, linear and open. Therefore \(\pi(V)\) is open for all \(V \in \mathscr{B}\). For any open set \(E \subset X/N\) containing \(\pi(0)\), we see \(\pi^{-1}(E)\) is open, and we have \[ \pi^{-1}(E)=\bigcup_{V\in\mathscr{B}}V \] and therefore \[ E=\bigcup_{V \in \mathscr{B}}\pi(V). \]


Now consider the local base \(\mathscr{B}\) containing all open balls around \(0 \in X\). Since \[ \pi(\{x:d(x,0)<r\})=\{u:\rho(u,\pi(0))<r\} \] we see \(\rho\) determines \(\mathscr{B}_N\). But we have already proved that \(\rho\) is invariant; hence \(\mathscr{B}_N\) determines \(\tau_N\).

If \(d\) is complete, then \(\rho\) is complete.

Once this is proved, we are able to claim that, if \(X\) is a \(F\)-space, then \(X/N\) is still a \(F\)-space, since its topology is induced by a complete invariant metric \(\rho\).

Proof. Suppose \((x_n)\) is a Cauchy sequence in \(X/N\), relative to \(\rho\). There is a subsequence \((x_{n_k})\) with \(\rho(x_{n_k},x_{n_{k+1}})<2^{-k}\). Since \(\pi\) is surjective, we are able to pick some \(z_k \in X\) such that \(\pi(z_k) = x_{n_k}\) and such that \[ d(z_{k},z_{k+1})<2^{-k}. \] (The existence can be verified by contradiction still.) By the inequality above, we see \((z_k)\) is Cauchy (can you see why?). Since \(X\) is complete, \(z_k \to z\) for some \(z \in X\). By the continuity of \(\pi\), we also see \(x_{n_k} \to \pi(z)\) as \(k \to \infty\). Therefore \((x_{n_k})\) converges. Hence \((x_n)\) converges since it has a convergent subsequence. \(\rho\) is complete.

Remarks

This fact will be used to prove some corollaries in the open mapping theorem. For instance, for any continuous linear map \(\Lambda:X \to Y\), we see \(\ker(\Lambda)\) is closed, therefore if \(X\) is a \(F\)-space, then \(X/\ker(\Lambda)\) is a \(F\)-space as well. We will show in the future that \(X/\ker(\Lambda)\) and \(\Lambda(X)\) are homeomorphic if \(\Lambda(X)\) is of the second category.

There are more properties that can be inherited by \(X/N\) from \(X\). For example, normability, metrizability, local convexity. In particular, if \(X\) is Banach, then \(X/N\) is Banach as well. To do this, it suffices to define the quotient norm by \[ \lVert \pi(x) \rVert = \inf\{\lVert x-z \rVert:z \in N\}. \]
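For a concrete feeling of the quotient norm, here is a minimal sketch under simplifying assumptions: \(X=\mathbb{R}^2\) with the Euclidean norm and \(N=\operatorname{span}\{(1,1)\}\), both chosen only for illustration. In this case the infimum is the distance from \(x\) to the line \(N\), which has the closed form \(|x_1-x_2|/\sqrt{2}\); the code compares it with a brute-force search over \(z=t(1,1)\).

```python
import math


def quotient_norm(x):
    """Closed form of inf{||x - z|| : z in N} for N = span{(1, 1)}."""
    return abs(x[0] - x[1]) / math.sqrt(2.0)


def brute_force(x, lo=-10.0, hi=10.0, steps=40001):
    """Approximate the same infimum by scanning z = t * (1, 1) on a grid."""
    best = float("inf")
    for i in range(steps):
        t = lo + (hi - lo) * i / (steps - 1)
        best = min(best, math.hypot(x[0] - t, x[1] - t))
    return best


x = (3.0, 1.0)
print(quotient_norm(x), brute_force(x))

# The value depends only on the coset pi(x): adding an element of N
# does not change it, and elements of N themselves have norm 0.
print(quotient_norm((3.0 + 7.0, 1.0 + 7.0)), quotient_norm((5.0, 5.0)))
```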

Introducing Riemann-Stieltjes Integral

Motivation

Before going into it, we give several motivations for defining the Riemann-Stieltjes integral, which can be considered a generalization of the Riemann integral, the one everyone learns in their Calculus class.

When talking about \(\int_a^b fdg\), one may simply think about \(\int_a^b fg'dx\). But is it even necessary that \(g\) is differentiable? What would happen if \(g\) is simply continuous, or even not continuous? Further, given that \(g\) is differentiable, can we prove that \[ \int_a^b f(x)dg(x)=\int_a^bf(x)g'(x)dx \] in a general way(without assuming \(f\) is differentiable)?

Another motivation comes from probability theory. Oftentimes one needs to consider the discrete case (\(\sum\)) and the continuous case (\(\int\)) separately. One may say that the integral is the limit of summation, but it would be weird to write \(\int\) as \(\lim\sum\) every time. However, if we have a way to write a sum, for example the expected value of a discrete variable, as an integral, things would be easier. Of course, we don't want to write such a sum as another sum by adding up the integral on several disjoint segments. That would be weirder.

If you have learned measure theory, you will know that the Lebesgue integral does not perfectly cover the Riemann integral. For example, \(\int_{0}^{\infty}\frac{\sin{x}}{x}dx\) exists as an improper Riemann integral, but \(\frac{\sin x}{x}\) is not Lebesgue integrable on \((0,\infty)\). So we cannot treat the Lebesgue integral as a generalization of the (improper) Riemann integral. In this blog post however, we are showing a direct generalization of the Riemann integral.

We are trying our best to prevent ourselves from using \(\sup\), \(\inf\), and differentiation theory. But \(\varepsilon-\delta\) language is heavily used here, so make sure that you are good at it.

Riemann-Stieltjes Integral

By a partition \(P\) on \([a,b]\) we mean a finite sequence of numbers \((x_k)\) such that \[ a=x_0 \leq x_1 \leq \cdots \leq x_n=b \] and we define its size by \[ \sigma(P)=\max_{k}(x_{k+1}-x_k). \] Let \(f\), \(g\) be bounded real functions on \([a,b]\) (again, no continuity or differentiability required). Given a partition \(P\) and numbers \(c_k\) with \(x_k \leq c_k \leq x_{k+1}\), we define the Riemann-Stieltjes sum (RS-sum) by \[ S(P,f,g)=\sum_{k=0}^{n-1}f(c_k)[g(x_{k+1})-g(x_k)]. \] We say that the limit \[ \lim_{\sigma(P) \to 0}S(P,f,g) \] exists if there exists some \(L \in \mathbb{R}\) such that given \(\varepsilon>0\), there exists \(\delta>0\) such that whenever \(\sigma(P)<\delta\), we have \[ |S(P,f,g)-L|<\varepsilon \] for every choice of the intermediate points \(c_k\). In this case, we say \(f\) is RS(g)-integrable, and the limit is denoted by \[ \int_a^bfdg. \] This is the so-called Riemann-Stieltjes integral. When \(g(x)=x\), we get the Riemann integral naturally.
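The RS-sum is easy to compute directly. The following is a minimal sketch, not a definitive implementation: the helper names `rs_sum`, `uniform_partition` and `midpoints` are invented here, and a uniform partition with midpoint tags is only one of many admissible choices. With \(g(x)=x\) and \(f(x)=x^2\) on \([0,1]\) it recovers the ordinary Riemann integral \(1/3\) approximately.

```python
def rs_sum(f, g, xs, cs):
    """Riemann-Stieltjes sum S(P, f, g).

    xs: partition points x_0 <= ... <= x_n
    cs: intermediate points with xs[k] <= cs[k] <= xs[k + 1]
    """
    return sum(f(c) * (g(xs[k + 1]) - g(xs[k])) for k, c in enumerate(cs))


def uniform_partition(a, b, n):
    """Partition of [a, b] into n intervals of equal length."""
    return [a + (b - a) * k / n for k in range(n + 1)]


def midpoints(xs):
    """One admissible choice of intermediate points."""
    return [(xs[k] + xs[k + 1]) / 2 for k in range(len(xs) - 1)]


xs = uniform_partition(0.0, 1.0, 1000)
approx = rs_sum(lambda x: x * x, lambda x: x, xs, midpoints(xs))
print(approx)  # close to 1/3
```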

Remarks: Further generalization still available

This integral can be generalized to Banach spaces. Let \(f\), \(g\) be bounded maps of \([a,b]\) into Banach spaces \(E\), \(F\) respectively. Assume we have a product \(E \times F \to G\) denoted by \((u,v) \mapsto uv\) with \(\lVert uv \rVert \leq \lVert u \rVert \lVert v \rVert\). Then by replacing the absolute value by the norm, we still get the Riemann-Stieltjes integral, although in this case we have \[ \int_a^b fdg \in G \] and \(G\) need not be \(\mathbb{R}\). This is different from the Bochner integral, since no measure theory is involved here.

Linearity with respect to \(f\) and \(g\)

First, we shall show that RS(g)-integrable functions form a vector space. To do this, it suffices to show that \[ f \mapsto S(P,f,g) \] and \[ g \mapsto S(P,f,g) \] are linear. This follows directly from the definition of RS-sum. Let's see the result.

Suppose we have \[ \int_a^b fdg=I, \quad \int_a^b hdg=J, \quad \int_a^b fdu=K. \] Then we have the following identities for \(\alpha \in \mathbb{R}\).

  1. \(\int_a^b \alpha fdg=\alpha I\).
  2. \(\int_a^b (f+h)dg=I+J\).
  3. \(\int_a^bfd(g+u)=I+K\).
  4. \(\int_a^b fd(\alpha g)=\alpha I\).

Proof. We shall show 2 as an example. The other three identities follow in the same way.

Notice that the existence of the limit of RS-sum depends only on the size of \(P\). For \(\varepsilon>0\), there exists some \(\delta_1,\delta_2>0\) such that \[ |S(P,f,g)-I|<\frac{\varepsilon}{2},\quad |S(P,h,g)-J| < \frac{\varepsilon}{2} \] when \(\sigma(P)<\delta_1\) and \(\sigma(P)<\delta_2\) respectively. By picking \(\delta=\min(\delta_1,\delta_2)\), we see for \(\sigma(P)<\delta\), we have \[ \begin{aligned} |S(P,f+h,g)-(I+J)|&=|(S(P,f,g)-I)+(S(P,h,g)-J)| \\ &\leq |S(P,f,g)-I| + |S(P,h,g)-J| \\ &< \frac{\varepsilon}{2}+\frac{\varepsilon}{2}=\varepsilon. \end{aligned} \]

Integration by parts but no differentiation

\(f \in RS(g)\) if and only if \(g \in RS(f)\). In this case, we also have integration by parts: \[ \int_a^b fdg + \int_a^b gdf=f(b)g(b)-f(a)g(a) \]

You may not believe it, but differentiation does not play any role here, as promised at the beginning.

Proof. Using summation by parts (due to Abel), we have \[ \begin{aligned} S(P,f,g)&=\sum_{k=0}^{n-1}f(c_k)[g(x_{k+1})-g(x_k)] \\ &=-\sum_{k=1}^{n-1}g(x_k)[f(c_k)-f(c_{k-1})]+f(c_{n-1})g(b)-f(c_0)g(a). \end{aligned} \] By writing \[ S(P,f,g)=S(P,f,g)+f(a)g(a)-f(a)g(a)+f(b)g(b)-f(b)g(b) \] we have \[ S(P,f,g)=f(b)g(b)-f(a)g(a)-S(Q,g,f) \] where \[ S(Q,g,f)=\sum_{k=1}^{n-1}g(x_k)[f(c_k)-f(c_{k-1})]+[f(b)-f(c_{n-1})]g(b)+[f(c_0)-f(a)]g(a). \] Consider the partition \(Q\) given by \[ y_k=\begin{cases} a &\quad k=0, \\ c_{k-1}&\quad 1 \leq k \leq n, \\ b &\quad k=n+1. \end{cases} \] Taking \(x_0,x_1,\cdots,x_n\) as intermediate points (indeed \(y_k \leq x_k \leq y_{k+1}\)), we have \[ S(Q,g,f)=\sum_{k=0}^{n}g(x_k)[f(y_{k+1})-f(y_k)]. \] Since \(0 < \sigma(Q) \leq 2\sigma(P) \leq 4\sigma(Q)\), when \(\sigma(P) \to 0\), we also have \(\sigma(Q) \to 0\) and vice versa. Suppose now that \(\int_a^b gdf\) exists; then \[ \lim_{\sigma(P) \to 0}S(P,f,g)=f(b)g(b)-f(a)g(a)-\int_a^bgdf=\int_a^bfdg. \] And integration by parts follows.

Suppose \(\int_a^bfdg\) exists, then \[ \lim_{\sigma(Q) \to 0}S(Q,g,f)=f(b)g(b)-f(a)g(a)-\int_a^b fdg=\int_a^b gdf. \] The proposition is proved. \(\square\)
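The key point of the proof is that the relation between the two sums is purely algebraic: it holds exactly for every partition, before any limit is taken. The sketch below checks this numerically. The functions \(f=\exp\), \(g=\sin\), the interval \([0,2]\) and the helper `rs_sum` are illustrative choices only; the partition \(Q\) is built from \(a\), the intermediate points of \(P\), and \(b\), with the points of \(P\) serving as its intermediate points.

```python
import math


def rs_sum(f, g, xs, cs):
    """Riemann-Stieltjes sum S(P, f, g) with partition xs and tags cs."""
    return sum(f(c) * (g(xs[k + 1]) - g(xs[k])) for k, c in enumerate(cs))


a, b, n = 0.0, 2.0, 50
f, g = math.exp, math.sin

# Partition P of [a, b] with midpoints as intermediate points.
xs = [a + (b - a) * k / n for k in range(n + 1)]
cs = [(xs[k] + xs[k + 1]) / 2 for k in range(n)]

# Partition Q: a, c_0, ..., c_{n-1}, b (n + 2 points); its n + 1
# intermediate points are x_0, ..., x_n.
ys = [a] + cs + [b]

lhs = rs_sum(f, g, xs, cs) + rs_sum(g, f, ys, xs)
rhs = f(b) * g(b) - f(a) * g(a)
print(lhs, rhs)  # equal up to rounding, even for this coarse partition
```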

The flexibility of Riemann-Stieltjes integral

As said before, we want to represent both continuous and discrete case using integral. For measure theory, we have Lebesgue measure and counting measure. But in some cases, this can be done using Riemann-Stieltjes integral as well. Ordinary Riemann integral and finite or infinite series are both special cases of Riemann-Stieltjes integral.

From integral to series (discrete case)

To do this, we need the unit step function by \[ I(x)=\begin{cases} 0 \quad x \leq 0, \\ 1 \quad x > 0 .\end{cases} \]

If \(a<s<b\), \(f\) is bounded on \([a,b]\) and continuous at \(s\), by putting \(g(x)=I(x-s)\), we have \[ \int_a^b fdg=f(s) \]

Proof. A simple verification shows that \(\int_a^b fdg=\int_s^b fdg\) (by unwinding the RS-sum, one sees immediately that \(g(x_k)=0\) for all \(x_k\leq s\), therefore the part of the partition before \(s\) contributes nothing to the value of the integral). Now consider the partition \(P\) by \[ s=x_0<x_1<\cdots<x_n=b. \] We see \[ S(P,f,g)=\sum_{k=0}^{n-1}f(c_k)(g(x_{k+1})-g(x_k))=f(c_0)(g(x_1)-g(x_0))=f(c_0). \] As \(x_1 \to s\), we have \(c_0 \to s\), and since \(f\) is continuous at \(s\), we have \(f(c_0) \to f(s)\) as desired. \(\square\)
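A numerical illustration of this proposition, as a hedged sketch: the test function \(f(x)=x^2+1\), the point \(s=0.5\) and the helper names are assumptions made for the example. Only the interval of the partition that contains the jump contributes to the RS-sum, so the sum is just \(f\) evaluated at one intermediate point near \(s\).

```python
def step(x):
    """Unit step function I(x): 0 for x <= 0, 1 for x > 0."""
    return 1.0 if x > 0 else 0.0


def rs_sum(f, g, xs, cs):
    """Riemann-Stieltjes sum S(P, f, g) with partition xs and tags cs."""
    return sum(f(c) * (g(xs[k + 1]) - g(xs[k])) for k, c in enumerate(cs))


s = 0.5


def g(x):
    return step(x - s)


def f(x):
    return x * x + 1.0


n = 1000
xs = [k / n for k in range(n + 1)]                 # partition of [0, 1]
cs = [(xs[k] + xs[k + 1]) / 2 for k in range(n)]   # midpoint tags

approx = rs_sum(f, g, xs, cs)
print(approx, f(s))  # the two values differ by about the mesh size
```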

By the linearity of the RS integral, it's easy to generalize this to the case of a finite linear combination. Namely, for \(g(x)=\sum_{k=1}^{n}c_kI(x-s_k)\), we have \[ \int_a^b fdg=\sum_{k=1}^{n}c_kf(s_k). \] But now we are discussing the infinite case.

Suppose \(c_n \geq 0\) for all \(n\) and \(\sum_n c_n\) converges, \((s_n)\) is a sequence of distinct points in \((a,b)\), and \[ g(x)=\sum_nc_nI(x-s_n). \] Let \(f\) be continuous on \([a,b]\). Then \[ \int_a^b fdg=\sum_{n}c_nf(s_n) \]

Proof. First it's easy to see that \(g(x)\) converges for every \(x\), and is monotonic with \(g(a)=0\), \(g(b)=\sum_n c_n\). For given \(\varepsilon>0\), there exists some \(N\) such that \[ \sum_{N+1}^{\infty}c_n<\varepsilon. \] Put \[ g_1(x)=\sum_{n=1}^{N}c_nI(x-s_n),\quad g_2(x)=\sum_{N+1}^{\infty}c_nI(x-s_n)=g(x)-g_1(x) \] we have \[ \int_a^b fdg_1=\sum_{n=1}^{N}c_nf(s_n). \] By putting \(M=\sup|f(x)|\), we see \[ \left\vert\int_a^b fdg_2 \right\vert=\left\vert\int_a^b fdg-\int_a^bfdg_1 \right\vert=\left\vert\int_a^b fdg-\sum_{n=1}^{N}c_nf(s_n)\right\vert \leq M\varepsilon \] The inequality holds since \(g_2(b)-g_2(a)<\varepsilon\). Since \(M\) is finite, when \(N \to \infty\), we have the desired result.
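The statement can be tried out numerically. The sketch below is only an illustration under explicit assumptions: \(c_n=2^{-n}\), \(s_n=\frac{1}{n+1}\), \(f(x)=x\) on \([0,1]\), and the series defining \(g\) is truncated after \(N=30\) terms (this plays the role of \(g_1\) in the proof; the neglected tail is smaller than \(2^{-30}\)).

```python
def step(x):
    """Unit step function I(x): 0 for x <= 0, 1 for x > 0."""
    return 1.0 if x > 0 else 0.0


def rs_sum(f, g, xs, cs):
    """Riemann-Stieltjes sum S(P, f, g) with partition xs and tags cs."""
    return sum(f(c) * (g(xs[k + 1]) - g(xs[k])) for k, c in enumerate(cs))


N = 30                                                # truncation level
coeffs = [2.0 ** (-n) for n in range(1, N + 1)]       # c_n
points = [1.0 / (n + 1) for n in range(1, N + 1)]     # s_n, distinct, in (0, 1)


def g(x):
    """Truncated jump function: sum of c_n * I(x - s_n) for n <= N."""
    return sum(c * step(x - s) for c, s in zip(coeffs, points))


def f(x):
    return x


m = 5000
xs = [k / m for k in range(m + 1)]
cs = [(xs[k] + xs[k + 1]) / 2 for k in range(m)]

approx = rs_sum(f, g, xs, cs)
series = sum(c * f(s) for c, s in zip(coeffs, points))
print(approx, series)
```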

Transformed into ordinary Riemann integral (continuous case)

Finally we are discussing some differentiation. The following theorem shows the connection between RS integral and Riemann integral.

Let \(f\) be continuous and suppose that \(g\) is real differentiable on \([a,b]\) while \(g'\) is Riemann integrable as well, then \(f \in RS(g)\) and \[ \int_a^b fdg=\int_a^b fg'dx \]

Proof. By the mean value theorem, for each \(k\), there is some \(\zeta_k\) between \(x_k\) and \(x_{k+1}\) such that \[ g(x_{k+1})-g(x_k)=g'(\zeta_k)(x_{k+1}-x_k). \] The RS-sum can be written as \[ S(P,f,g)=\sum_{k=0}^{n-1}f(c_k)[g(x_{k+1})-g(x_k)]=\sum_{k=0}^{n-1}f(c_k)g'(\zeta_k)(x_{k+1}-x_k). \] Since \(g'\) is Riemann integrable, we have \[ \sum_{k=0}^{n-1}|g'(c_k)-g'(\zeta_k)|(x_{k+1}-x_k) <\varepsilon \] given that \(|S(P,g',x)-\int_a^b g'dx|<\varepsilon\). Therefore \[ \left\vert\sum_{k=0}^{n-1}f(c_k)g'(\zeta_k)(x_{k+1}-x_k)-\sum_{k=0}^{n-1}f(c_k)g'(c_k)(x_{k+1}-x_k)\right\vert\leq M\varepsilon \] where \(M=\sup|f(x)|<\infty\) (\(f\) is bounded, being continuous on \([a,b]\)). Also notice that \(fg'\) is integrable since \(f\) is continuous. Therefore \[ \begin{aligned} \left\vert S(P,f,g)-\int_a^bfg'dx \right\vert&=\left\vert S(P,f,g)-S(P,fg',x)+S(P,fg',x)-\int_a^bfg'dx \right\vert \\ &\leq \left\vert S(P,f,g)-S(P,fg',x) \right\vert+\left\vert S(P,fg',x)-\int_a^bfg'dx \right\vert \\ &< (M+1)\varepsilon. \end{aligned} \] Therefore, \[ \int_a^bfdg=\int_a^b fg'dx, \] which proves the theorem. \(\square\)

To sum up, given \(\varepsilon>0\), there exists some \(\delta>0\) such that if \(\sigma(P)<\delta\), we have \[ \left|S(P,g',x)-\int_a^b g'dx\right|<\varepsilon/(M+1) \] and \[ \left\vert S(P,fg',x)-\int_a^bfg'dx \right\vert<\varepsilon/(M+1). \]

After some estimation, we get \[ \left|S(P,f,g)-\int_{a}^{b}fg'dx \right|<(M+1)\frac{\varepsilon}{M+1}=\varepsilon. \]
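As a final sanity check of the theorem, here is a small numerical sketch with illustrative choices that are not part of the post: \(f=\cos\) and \(g(x)=x^2\) on \([0,1]\). The RS-sum of \(f\) against \(g\) and the ordinary Riemann sum of \(fg'\) are both compared with the value \(\int_0^1 2x\cos x\,dx=2(\sin 1+\cos 1-1)\), obtained by integrating by parts.

```python
import math


def rs_sum(f, g, xs, cs):
    """Riemann-Stieltjes sum S(P, f, g) with partition xs and tags cs."""
    return sum(f(c) * (g(xs[k + 1]) - g(xs[k])) for k, c in enumerate(cs))


def g(x):
    return x * x


def g_prime(x):
    return 2.0 * x


n = 2000
xs = [k / n for k in range(n + 1)]                 # partition of [0, 1]
cs = [(xs[k] + xs[k + 1]) / 2 for k in range(n)]   # midpoint tags

# Left side: integral of f dg, approximated by an RS-sum.
stieltjes = rs_sum(math.cos, g, xs, cs)

# Right side: integral of f * g' dx, i.e. an RS-sum against the identity.
riemann = rs_sum(lambda x: math.cos(x) * g_prime(x), lambda x: x, xs, cs)

exact = 2.0 * (math.sin(1.0) + math.cos(1.0) - 1.0)
print(stieltjes, riemann, exact)
```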