A proof of the ordinary Gleason-Kahane-Żelazko theorem for complex functionals

The Theorem

(Gleason-Kahane-Żelazko) If \(\phi\) is a complex linear functional on a unitary Banach algebra \(A\) such that \(\phi(e)=1\) and \(\phi(x) \neq 0\) for every invertible \(x \in A\), then \[ \phi(xy)=\phi(x)\phi(y) \] for all \(x, y \in A\). Namely, \(\phi\) is a complex homomorphism.

Notations and remarks

Suppose \(A\) is a complex unitary Banach algebra and \(\phi: A \to \mathbb{C}\) is a linear functional which is not identically \(0\) (for convenience). If \[ \phi(xy)=\phi(x)\phi(y) \] for all \(x \in A\) and \(y \in A\), then \(\phi\) is called a complex homomorphism on \(A\). Note that a unitary Banach algebra (with \(e\) as multiplicative unit) is also a ring, and so is \(\mathbb{C}\); in this case we may say that \(\phi\) is a ring-homomorphism. For such \(\phi\), we have an instant proposition:

Proposition 0 \(\phi(e)=1\) and \(\phi(x) \neq 0\) for every invertible \(x \in A\).

Proof. Since \(\phi(e)=\phi(ee)=\phi(e)\phi(e)\), we have \(\phi(e)=0\) or \(\phi(e)=1\). If \(\phi(e)=0\), however, then for any \(y \in A\) we have \(\phi(y)=\phi(ye)=\phi(y)\phi(e)=0\), which is the excluded case. Hence \(\phi(e)=1\).

For invertible \(x \in A\), note that \(\phi(xx^{-1})=\phi(x)\phi(x^{-1})=\phi(e)=1\). This can't happen if \(\phi(x)=0\). \(\square\)

The theorem reveals that Proposition \(0\) actually characterizes the complex homomorphisms (ring-homomorphisms) among the linear functionals (group-homomorphisms).

This theorem was proved by Andrew M. Gleason in 1967 and later, independently, by J.-P. Kahane and W. Żelazko in 1968. Both papers worked mainly with commutative Banach algebras; the non-commutative version, which is phrased in terms of complex homomorphisms, is due to W. Żelazko. In this post we follow this third paper.

Unfortunately, an educational proof is not easy to find on the Internet, which is part of why I am writing this post, and perhaps why you are reading it.


From the definitions of a Banach algebra and some logical manipulation, we obtain several equivalent formulations worth noting.

Subspace and ideal version

(Stated by Gleason) Let \(M\) be a linear subspace of codimension one in a commutative Banach algebra \(A\) having an identity. Suppose no element of \(M\) is invertible, then \(M\) is an ideal.

(Stated by Kahane and Żelazko) A subspace \(X \subset A\) of codimension \(1\) is a maximal ideal if and only if it consists of non-invertible elements.

Spectrum version

(Stated by Kahane and Żelazko) Let \(A\) be a commutative complex Banach algebra with unit element. Then a functional \(f \in A^\ast\) is a multiplicative linear functional if and only if \(f(x) \in \sigma(x)\) holds for all \(x \in A\).

Here \(\sigma(x)\) denotes the spectrum of \(x\).

The connection

Clearly a maximal ideal contains no invertible element (otherwise it would contain \(e\) and hence be the whole ring). Also note that every maximal ideal is the kernel of some complex homomorphism, so it has codimension 1. It therefore suffices to show that a subspace of codimension 1 consisting of non-invertible elements is a maximal ideal. For such a subspace \(X \subset A\), since \(e \notin X\), we may define a linear functional \(\phi\) with kernel \(X\) and \(\phi(e)=1\). Then \(x-\phi(x)e \in X\) is not invertible, so \(\phi(x) \in \sigma(x)\) for all \(x \in A\). Note that \(\phi(x) \in \sigma(x)\) for all \(x \in A\) holds if and only if \(\phi(e)=1\) and \(\phi(x) \neq 0\) for every invertible \(x\), which are exactly the hypotheses of the theorem. As we will show, \(\phi\) has to be a complex homomorphism, so \(X=\ker\phi\) is a maximal ideal.

Tools to prove the theorem

Lemma 0 Suppose \(A\) is a unitary Banach algebra, \(x \in A\), \(\lVert x \rVert<1\), then \(e-x\) is invertible.
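Lemma 0 is usually proved with the Neumann series \((e-x)^{-1}=\sum_{n \geq 0}x^n\), which converges because \(\lVert x^n \rVert \leq \lVert x \rVert^n\). As a minimal numerical sketch, assuming the algebra is that of real \(4\times 4\) matrices with the operator 2-norm (a unitary Banach algebra with \(\lVert e \rVert=1\)), the partial sums below invert \(e-x\). The function name `neumann_inverse` and the random sample matrix are illustrative choices, not part of the original argument.

```python
import numpy as np

def neumann_inverse(x, tol=1e-14, max_terms=10_000):
    """Approximate (e - x)^{-1} by the partial sums e + x + x^2 + ... (assumes ||x|| < 1)."""
    identity = np.eye(x.shape[0])
    total = identity.copy()   # partial sum, starts at e
    power = identity.copy()   # current power x^k
    for _ in range(max_terms):
        power = power @ x
        total += power
        # ||x^k|| <= ||x||^k, so the remaining tail is geometrically small
        if np.linalg.norm(power, 2) < tol:
            break
    return total

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 4))
x *= 0.9 / np.linalg.norm(x, 2)   # rescale so that the operator norm is 0.9 < 1
inv = neumann_inverse(x)
assert np.allclose((np.eye(4) - x) @ inv, np.eye(4))   # inv really inverts e - x
```

The same estimate also gives \(\lVert (e-x)^{-1} \rVert \leq \frac{1}{1-\lVert x \rVert}\), here at most \(10\).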

This lemma can be found in any functional analysis book introducing Banach algebra.

Lemma 1 Suppose \(f\) is an entire function of one complex variable, \(f(0)=1\), \(f'(0)=0\), and \[ 0<|f(\lambda)| \leq e^{|\lambda|} \] for all complex \(\lambda\), then \(f(\lambda)=1\) for all \(\lambda \in \mathbb{C}\).

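Since \(f\) has no zeros, there is an entire function \(g\) such that \(f=\exp(g)\), and since \(f(0)=1\) we may choose \(g\) with \(g(0)=0\). Then \(f'(0)=g'(0)f(0)\) gives \(g'(0)=0\), and \(\Re g(\lambda)=\log|f(\lambda)| \leq |\lambda|\). It can be shown that \(g=0\). Indeed, if we put \[ h_r(\lambda) = \frac{r^2g(\lambda)}{\lambda^2[2r-g(\lambda)]}, \] then \(h_r\) is holomorphic in the open disk centred at \(0\) with radius \(2r\): the zero of \(g\) at \(0\) has order at least \(2\), and \(2r-g(\lambda) \neq 0\) there because \(\Re g(\lambda) \leq |\lambda| < 2r\). Besides, \(|h_r(\lambda)| \leq 1\) if \(|\lambda|=r\). By the maximum modulus theorem, we have \[ |h_r(\lambda)| \leq 1 \] whenever \(|\lambda| \leq r\). Fix \(\lambda\) and let \(r \to \infty\); by the definition of \(h_r(\lambda)\), we must have \(g(\lambda)=0\). Hence \(f=\exp(0)=1\).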
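The two estimates used for \(h_r\) can be spelled out as follows. Write \(f=\exp(g)\) with \(g(0)=0\), so that \(\Re g(\lambda)=\log|f(\lambda)| \leq |\lambda|\). Then

\[ |2r-g(\lambda)|^2-|g(\lambda)|^2=4r^2-4r\,\Re g(\lambda) \geq 4r\bigl(r-|\lambda|\bigr), \]

so \(|g(\lambda)| \leq |2r-g(\lambda)|\) on \(|\lambda|=r\), which gives \(|h_r(\lambda)|=\frac{r^2}{|\lambda|^2}\cdot\frac{|g(\lambda)|}{|2r-g(\lambda)|} \leq 1\) there. For a fixed \(\lambda\) and \(r \geq |\lambda|\), the bound \(|h_r(\lambda)| \leq 1\) then reads

\[ |g(\lambda)| \leq \frac{|\lambda|^2\,|2r-g(\lambda)|}{r^2} \leq \frac{|\lambda|^2\bigl(2r+|g(\lambda)|\bigr)}{r^2} \to 0 \quad (r \to \infty). \]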

Jordan homomorphism

A mapping \(\phi\) from one ring \(R\) to another ring \(R'\) is said to be a Jordan homomorphism from \(R\) to \(R'\) if \[ \phi(a+b)=\phi(a)+\phi(b) \] and \[ \phi(ab+ba)=\phi(a)\phi(b)+\phi(b)\phi(a). \] It is of course clear that every homomorphism is Jordan. Note that if \(R'\) is not of characteristic \(2\), the second identity is equivalent to \[ \phi(a^2)=\phi(a)^2. \] To show the equivalence, one lets \(b=a\) in the first direction and puts \(a+b\) in place of \(a\) in the second.
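For the reader who wants the equivalence spelled out, the direction from \(\phi(a^2)=\phi(a)^2\) to the Jordan identity is the expansion

\[ \phi(a^2)+\phi(ab+ba)+\phi(b^2)=\phi\bigl((a+b)^2\bigr)=\bigl(\phi(a)+\phi(b)\bigr)^2=\phi(a)^2+\phi(a)\phi(b)+\phi(b)\phi(a)+\phi(b)^2, \]

and cancelling \(\phi(a^2)=\phi(a)^2\) and \(\phi(b^2)=\phi(b)^2\) leaves \(\phi(ab+ba)=\phi(a)\phi(b)+\phi(b)\phi(a)\). In the other direction, \(b=a\) gives \(2\phi(a^2)=\phi(2a^2)=2\phi(a)^2\), and cancelling the factor \(2\) is where the assumption on the characteristic of \(R'\) is used.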

Since in this case \(R=A\) and \(R'=\mathbb{C}\), the latter of which is commutative, we also write \[ \phi(ab+ba)=2\phi(a)\phi(b). \] As we will show, the \(\phi\) in the theorem is a Jordan homomorphism.

The proof

We will follow an unusual approach. By repeatedly 'downgrading' the goal, we will see this algebraic problem transformed neatly into a purely analytic one.

To begin with, let \(N\) be the kernel of \(\phi\).

Step 1 - It suffices to prove that \(\phi\) is a Jordan homomorphism

If \(\phi\) is a complex homomorphism, it is immediate that \(\phi\) is a Jordan homomorphism. Conversely, if \(\phi\) is Jordan, we have \[ \phi(xy+yx) =2\phi(x)\phi(y). \] If \(x\in N\), the right-hand side becomes \(0\), and therefore \[ xy+yx \in N \quad \text{if } x \in N, y \in A. \] Consider the identity \[ (xy-yx)^2+(xy+yx)^2=2[x(yxy)+(yxy)x]. \]

Since \(\phi\) is Jordan, \(\phi(z^2)=\phi(z)^2\) for every \(z \in A\). Moreover, since \(x \in N\) and \(yxy \in A\), the previous step gives \(x(yxy)+(yxy)x \in N\). Therefore \[ \begin{aligned} \phi(xy-yx)^2 &= \phi(xy-yx)^2+\phi(xy+yx)^2 \\ &=\phi((xy-yx)^2)+\phi((xy+yx)^2) \\ &=\phi((xy-yx)^2+(xy+yx)^2) \\ &=2\phi[x(yxy)+(yxy)x] \\ &=0, \end{aligned} \] where the first equality uses \(xy+yx \in N\). Therefore \(\phi(xy-yx)=0\) and \[ xy-yx \in N \] if \(x \in N\) and \(y \in A\). Further we see \[ xy-yx+xy+yx=2xy \in N \quad \text {and}\quad xy+yx-xy+yx = 2yx \in N, \] which implies that \(N\) is an ideal. This may remind you of the following classic diagram (we will not use it, though, since it is additive):

[Figure: Ring Homomorphism]

For \(x,y \in A\), we have \(x \in \phi(x)e+N\) and \(y \in \phi(y)e+N\). Since \(N\) is an ideal, it follows that \(xy \in \phi(x)\phi(y)e+N\), and therefore \[ \phi(xy)=\phi(x)\phi(y)+0=\phi(x)\phi(y). \]
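The algebraic identity \((xy-yx)^2+(xy+yx)^2=2[x(yxy)+(yxy)x]\) used in Step 1 holds in any ring: expanding both squares, the cross terms \(xyyx\) and \(yxxy\) cancel. A quick numerical sketch, using random real \(5\times 5\) matrices as an assumed stand-in for a non-commutative algebra, confirms it:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal((5, 5))
y = rng.standard_normal((5, 5))

commutator = x @ y - y @ x        # xy - yx
anticommutator = x @ y + y @ x    # xy + yx
yxy = y @ x @ y

lhs = commutator @ commutator + anticommutator @ anticommutator
rhs = 2 * (x @ yxy + yxy @ x)
assert np.allclose(lhs, rhs)            # the identity holds up to rounding
assert not np.allclose(x @ y, y @ x)    # even though x and y do not commute
```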

Step 2 - It suffices to prove that \(\phi(a^2)=0\) if \(\phi(a)=0\).

Again, if \(\phi\) is Jordan, we have \(\phi(x^2)=\phi(x)^2\) for all \(x \in A\), and in particular \(\phi(a^2)=0\) for \(a \in N\). Conversely, suppose \(\phi(a^2)=0\) for all \(a \in N\). Every \(x \in A\) can be written as \[ x=\phi(x)e+a \] with \(a \in N\). Therefore \[ \begin{aligned} \phi(x^2)&=\phi((\phi(x)e+a)^2)=\phi(x)^2+2\phi(x)\phi(a)+\phi(a^2)=\phi(x)^2, \end{aligned} \] which shows that \(\phi\) is Jordan.

Step 3 - It suffices to show that the following function is constant

Fix \(a \in N\). If \(a=0\) there is nothing to prove; otherwise we may replace \(a\) by \(a/\lVert a \rVert \in N\), so assume \(\lVert a \rVert = 1\) without loss of generality, and define \[ f(\lambda)=\sum_{n=0}^{\infty}\frac{\phi(a^n)}{n!}\lambda^n \] for all complex \(\lambda\). If this function is constant (Lemma 1), we immediately have \(f''(0)=\phi(a^2)=0\). From here on, the problem is purely one of complex analysis.

Step 4 - It suffices to describe the behaviour of an entire function

Note in the definition of \(f\) that \[ \lvert \phi(a^n) \rvert \leq \lVert \phi \rVert \lVert a^n \rVert \leq \lVert \phi \rVert \lVert a \rVert^n=\lVert \phi \rVert. \] So we need the norm of \(\phi\) to be finite, which ensures that \(f\) is entire. Suppose, for contradiction, that \(\lVert e-a \rVert < 1\) for some \(a \in N\). Then by Lemma 0, \(e-(e-a)=a\) is invertible, which is impossible since \(\phi(a)=0\). Hence \(\lVert e-a \rVert \geq 1\) for all \(a \in N\). Now every \(x \in A\) can be written as \(x=\lambda e-a\) with \(\lambda=\phi(x)\) and \(a=\phi(x)e-x \in N\). If \(\lambda \neq 0\), then \(\lambda^{-1}a \in N\), and we have the following inequality: \[ \begin{aligned} \lVert \lambda e-a \rVert = |\lambda|\lVert e-\lambda^{-1}a \rVert &\geq|\lambda| \\ &= |\phi(\lambda e)-\phi(a)| \\ &= |\phi(\lambda e-a)|. \end{aligned} \] If \(\lambda=0\), then \(\phi(x)=0\) and \(|\phi(x)| \leq \lVert x \rVert\) holds trivially. Therefore \(\phi\) is continuous with norm at most \(1\) (in fact exactly \(1\), since \(\phi(e)=1\) and \(\lVert e \rVert=1\)). The continuity of \(\phi\) is not assumed at the beginning but proved here.

For \(f\) we have some immediate facts. Since the coefficients satisfy \(|\phi(a^n)| \leq 1\), the series converges for every \(\lambda\), so \(f\) is entire with \(f(0)=\phi(e)=1\) and \(f'(0)=\phi(a)=0\). Also, since \(\phi\) has norm \(1\), we have \[ |f(\lambda)|=\left|\sum_{n=0}^{\infty}\frac{\phi(a^n)}{n!}\lambda^n\right| \leq \sum_{n=0}^{\infty}\frac{|\lambda|^n}{n!}=e^{|\lambda|}. \] All we need in the end is to show that \(f(\lambda) \neq 0\) for all \(\lambda \in \mathbb{C}\).

The series \[ E(\lambda)=\exp(\lambda a)=\sum_{n=0}^{\infty}\frac{(\lambda a)^n}{n!} \] converges absolutely in \(A\) since \(\lVert (\lambda a)^n \rVert \leq |\lambda|^n\). The continuity of \(\phi\) now shows that \[ f(\lambda)=\phi(E(\lambda)). \] Since \(\lambda a\) and \(-\lambda a\) commute, multiplying the two absolutely convergent series term by term gives \[ E(-\lambda)E(\lambda)=\left(\sum_{n=0}^{\infty}\frac{(-\lambda a)^n}{n!}\right)\left(\sum_{n=0}^{\infty}\frac{(\lambda a)^n}{n!}\right)=e. \] Hence \(E(\lambda)\) is invertible for all \(\lambda \in \mathbb{C}\), and so \(f(\lambda)=\phi(E(\lambda)) \neq 0\). By Lemma 1, \(f(\lambda)=1\) is constant. The proof is completed by reversing the steps. \(\square\)
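As a sanity check of this last step, here is a small sketch in the algebra of complex \(2\times 2\) upper-triangular matrices with the operator norm. The functional \(\phi(x)=x_{11}\) (`phi` below) is a hypothetical example chosen because it is already known to be multiplicative on upper-triangular matrices; \(E(\lambda)\) is approximated by a truncated series in the made-up helper `exp_series`. The code checks \(E(-\lambda)E(\lambda)=e\) and \(f(\lambda)=\phi(E(\lambda))=1\) for an \(a\) with \(\phi(a)=0\) and \(\lVert a \rVert=1\). It illustrates the conclusion, and of course does not replace the proof.

```python
import numpy as np

def exp_series(a, lam, terms=60):
    """Partial sum of E(lam) = sum_n (lam * a)^n / n! with `terms` terms."""
    result = np.zeros(a.shape, dtype=complex)
    term = np.eye(a.shape[0], dtype=complex)   # (lam * a)^0 / 0! = e
    for n in range(terms):
        result += term
        term = term @ (lam * a) / (n + 1)      # becomes (lam * a)^(n+1) / (n+1)!
    return result

def phi(x):
    # hypothetical example: on upper-triangular matrices, x -> x[0, 0] is multiplicative
    return x[0, 0]

# a lies in the kernel N of phi (a[0, 0] = 0) and is rescaled to operator norm 1
a = np.array([[0.0, 2.0], [0.0, -1.0]])
a = a / np.linalg.norm(a, 2)

for lam in (0.5, 1 + 2j, -3.0):
    E = exp_series(a, lam)
    assert np.allclose(exp_series(a, -lam) @ E, np.eye(2))   # E(-lam) E(lam) = e
    assert abs(phi(E) - 1) < 1e-12                           # f(lam) = phi(E(lam)) = 1
```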

References / Further reading

  • Walter Rudin, Real and Complex Analysis
  • Walter Rudin, Functional Analysis
  • Andrew M. Gleason, A Characterization of Maximal Ideals
  • J.-P. Kahane and W. Żelazko, A Characterization of Maximal Ideals in Commutative Banach Algebras
  • W. Żelazko, A Characterization of Multiplicative Linear Functionals in Complex Banach Algebras
  • I. N. Herstein, Jordan Homomorphisms

The Big Three Pt. 5 - The Hahn-Banach Theorem (Dominated Extension)

About this post

The Hahn-Banach theorem has been a central tool in functional analysis and therefore comes in a wide variety of versions, many of which have numerous uses in other fields of mathematics. It is therefore not possible to cover all of them. In this post we cover two 'abstract enough' results, which are sometimes called the dominated extension theorems. Both are discussed in vector spaces on which no topology is imposed, which allows us to apply them to any topological vector space.

Another interesting thing is that we will be using the axiom of choice, or whichever equivalent form you like, for example Zorn's lemma or the well-ordering principle. Before anything else, we need to examine more properties of vector spaces.

Vector space

It's obvious that every complex vector space is also a real vector space. Suppose \(X\) is a complex vector space; we now give the definitions of real-linear and complex-linear functionals.

An additive functional \(\Lambda\) on \(X\) is called real-linear (complex-linear) if \(\Lambda(\alpha x)=\alpha\Lambda(x)\) for every \(x \in X\) and for every real (complex) scalar \(\alpha\).

For real-linear and complex-linear functionals, we have two important but easy theorems.

If \(u\) is the real part of a complex-linear functional \(f\) on \(X\), then \(u\) is real-linear and \[ f(x)=u(x)-iu(ix) \quad (x \in X). \]

Proof. Write \(f(x)=u(x)+iv(x)\); it suffices to express \(v\) in terms of \(u\). Since \[ if(x)=iu(x)-v(x), \] we see \(\Im(f(x))=v(x)=-\Re(if(x))\). Therefore \[ f(x)=u(x)-i\Re(if(x))=u(x)-i\Re(f(ix)), \] and since \(\Re(f(ix))=u(ix)\), we get \[ f(x)=u(x)-iu(ix). \] To show that \(u\) is real-linear, note that \[ u(x+y)+iv(x+y)=f(x+y)=f(x)+f(y)=u(x)+u(y)+i(v(x)+v(y)). \] Comparing real parts, \(u(x+y)=u(x)+u(y)\). The same argument applied to \(f(\alpha x)=\alpha f(x)\) handles real scalars \(\alpha\). \(\square\)

Conversely, we are able to generate a complex-linear functional by a real one.

If \(u\) is a real-linear functional on \(X\), then \(f(x)=u(x)-iu(ix)\) is a complex-linear functional on \(X\).

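Proof. Direct computation: \(f\) is clearly additive and real-linear, so it suffices to check that \(f(ix)=if(x)\). Indeed, \(f(ix)=u(ix)-iu(-x)=u(ix)+iu(x)=i\bigl(u(x)-iu(ix)\bigr)=if(x)\). \(\square\)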
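Both propositions can be checked numerically on \(X=\mathbb{C}^3\). In this sketch the weights `w`, `c`, `d` are random, hypothetical choices: `g` is a complex-linear functional, and `u` is an arbitrary real-linear functional on \(\mathbb{C}^3\) regarded as \(\mathbb{R}^6\), not assumed to be the real part of anything. The helper `from_real_part` implements \(v \mapsto v(x)-iv(ix)\).

```python
import numpy as np

rng = np.random.default_rng(2)
w = rng.standard_normal(3) + 1j * rng.standard_normal(3)
c, d = rng.standard_normal(3), rng.standard_normal(3)

def g(x):
    # a complex-linear functional on C^3: x -> sum_k w_k x_k
    return np.dot(w, x)

def u(x):
    # a real-linear functional on C^3 viewed as R^6
    return float(c @ x.real + d @ x.imag)

def from_real_part(v, x):
    # the formula of both propositions: v(x) - i v(ix)
    return v(x) - 1j * v(1j * x)

def f(x):
    # the functional built from u by the second proposition
    return from_real_part(u, x)

x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
beta = 2.0 - 3.0j

# first proposition: g is recovered from its real part
assert np.isclose(from_real_part(lambda z: g(z).real, x), g(x))
# second proposition: f is complex-linear and its real part is u
assert np.isclose(f(beta * x), beta * f(x))
assert np.isclose(f(x).real, u(x))
```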

Suppose now that \(X\) is a complex topological vector space. Then a complex-linear functional on \(X\) is continuous if and only if its real part is continuous, and every continuous real-linear \(u: X \to \mathbb{R}\) is the real part of a unique continuous complex-linear functional \(f\).

Sublinear, seminorm

A sublinear functional is 'almost' linear but also 'almost' a norm. Explicitly, we say \(p: X \to \mathbb{R}\) is a sublinear functional when it satisfies \[ \begin{aligned} p(x+y) &\leq p(x)+p(y) \\ p(tx) &= tp(x) \end{aligned} \] for all \(x,y \in X\) and \(t \geq 0\). As one can see, if \(X\) is a normed space, then \(p(x)=\lVert x \rVert\) is a sublinear functional. A sublinear functional should not be confused with a semilinear functional, where no inequality is involved. Another thing worth noting is that \(p\) is not required to be nonnegative.

A seminorm on a vector space \(X\) is a real-valued function \(p\) on \(X\) such that \[ \begin{aligned} p(x+y) &\leq p(x)+p(y) \\ p(\alpha x)&=|\alpha|p(x) \end{aligned} \] for all \(x,y \in X\) and scalar \(\alpha\).

Obviously a seminorm is also a sublinear functional. As for the connection between norms and seminorms, note that a seminorm \(p\) is a norm if and only if \(p(x) \neq 0\) whenever \(x \neq 0\).
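To illustrate that a sublinear functional need not be a seminorm, nor even nonnegative, here is a quick numerical sketch with the hypothetical example \(p(x)=\max(x_1,x_2)\) on \(\mathbb{R}^2\). Random sampling is only a sanity check, not a proof: the two properties follow directly from \(\max(x_1+y_1,x_2+y_2) \leq \max(x_1,x_2)+\max(y_1,y_2)\) and \(\max(tx_1,tx_2)=t\max(x_1,x_2)\) for \(t \geq 0\).

```python
import numpy as np

def p(x):
    # hypothetical example on R^2: p(x) = max(x_1, x_2)
    return max(x[0], x[1])

rng = np.random.default_rng(3)
pairs = rng.standard_normal((1000, 2, 2))   # random pairs (x, y) in R^2
scales = rng.uniform(0.0, 5.0, 1000)        # random t >= 0

subadditive = all(p(x + y) <= p(x) + p(y) + 1e-12 for x, y in pairs)
homogeneous = all(abs(p(t * x) - t * p(x)) < 1e-12 for t, (x, _) in zip(scales, pairs))
print(subadditive, homogeneous)        # True True: p is sublinear on these samples
print(p(np.array([-1.0, -1.0])))       # -1.0: p takes negative values, so it is not a seminorm
```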

Dominated extension theorems

Are the results covered in this post. Generally speaking, we are able to extend a linear functional defined on a subspace to the whole space as long as it is dominated by a sublinear functional. This is reminiscent of the dominated convergence theorem, which states that if a convergent sequence of measurable functions is dominated by an integrable function, then the limit can be taken under the integral sign.

(Hahn-Banach) Suppose

  1. \(M\) is a subspace of a real vector space \(X\),
  2. \(f: M \to \mathbb{R}\) is linear and \(f(x) \leq p(x)\) on \(M\) where \(p\) is a sublinear functional on \(X\)

Then there exists a linear \(\Lambda: X \to \mathbb{R}\) such that \[ \Lambda(x)=f(x) \] for all \(x \in M\) and \[ -p(-x) \leq \Lambda(x) \leq p(x) \] for all \(x \in X\).

Step 1 - Extending the function by one dimension

In other words, if \(f\) is dominated by a sublinear functional \(p\), then we are able to extend \(f\) to the whole space while keeping it between \(-p(-x)\) and \(p(x)\).

Proof. If \(M=X\) we have nothing to do. So suppose now that \(M\) is a proper subspace of \(X\). Choose \(x_1 \in X-M\) and define \[ M_1=\{x+tx_1:x \in M,t \in \mathbb{R}\}. \] It is easy to verify that \(M_1\) is a subspace of \(X\) (warning again: no topology is involved). Now we will be using the properties of sublinear functionals.

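Since \[ f(x)+f(y)=f(x+y) \leq p(x+y) \leq p(x-x_1)+p(x_1+y) \] for all \(x,y \in M\), we have \[ f(x)-p(x-x_1) \leq p(x_1+y) -f(y). \] Let \[ \alpha=\sup\{f(x)-p(x-x_1):x \in M\}, \] which is finite since, by the inequality above, it is bounded above by \(p(x_1+y)-f(y)\) for any fixed \(y \in M\). By definition, we naturally get \[ f(x)-\alpha \leq p(x-x_1) \] and, taking the supremum over \(x\) in the inequality above, \[ f(y)+\alpha \leq p(x_1+y) \] for all \(x,y \in M\). Define \(f_1\) on \(M_1\) by \[ f_1(x+tx_1)=f(x)+t\alpha, \] which is well defined because \(x_1 \notin M\) makes the representation \(x+tx_1\) unique. When \(x+tx_1 \in M\), we have \(t=0\), and therefore \(f_1=f\) on \(M\).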

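To show that \(f_1 \leq p\) on \(M_1\), note that for \(t>0\), replacing \(x\) by \(x/t\) gives \[ f(x/t)-\alpha \leq p(x/t-x_1), \] and multiplying by \(t\) (using \(p(tz)=tp(z)\)) implies \[ f(x)-t\alpha=f_1(x-tx_1)\leq p(x-tx_1). \] Similarly, \[ f(y/t)+\alpha \leq p(y/t+x_1), \] and therefore \[ f(y)+t\alpha=f_1(y+tx_1) \leq p(y+tx_1). \] Together with the case \(t=0\), which is the hypothesis \(f \leq p\) on \(M\), this gives \(f_1 \leq p\) on \(M_1\).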
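Step 1 can be tried out numerically in the smallest interesting case. As a sketch under hypothetical choices: \(X=\mathbb{R}^2\), \(p\) the \(\ell^1\) norm, \(M=\operatorname{span}\{(1,0)\}\) with \(f(s,0)=s\), and \(x_1=(0,1)\). The supremum defining \(\alpha\) is approximated on a grid. Here the exact value is \(\alpha=-1\), while any \(\alpha\) between the supremum and \(\inf_y[p(x_1+y)-f(y)]=1\) would also work, so the extension is in general not unique.

```python
import numpy as np

def p(v):
    # sublinear functional on R^2 (hypothetical choice: the l1 norm)
    return abs(v[0]) + abs(v[1])

def f(s):
    # f on M = span{(1, 0)}: f(s * (1, 0)) = s, dominated since s <= |s| = p((s, 0))
    return s

x1 = np.array([0.0, 1.0])                # a point outside M
grid = np.linspace(-50.0, 50.0, 10_001)  # sample points s * (1, 0) of M

# alpha = sup over x in M of f(x) - p(x - x1), approximated on the grid
alpha = max(f(s) - p(np.array([s, 0.0]) - x1) for s in grid)
# each y in M gives the upper bound p(x1 + y) - f(y) for alpha
upper = min(p(x1 + np.array([s, 0.0])) - f(s) for s in grid)

def f1(s, t):
    # the extension to M_1 = R^2: f_1(s * (1, 0) + t * x1) = f(s) + t * alpha
    return f(s) + t * alpha

rng = np.random.default_rng(4)
points = rng.uniform(-10.0, 10.0, (1000, 2))
assert alpha <= upper
assert all(f1(s, t) <= p(np.array([s, t])) + 1e-9 for s, t in points)   # f_1 <= p
print(alpha, upper)   # approximately -1.0 and 1.0 (up to rounding)
```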

Step 2 - An application of Zorn's lemma

Side note: Why Zorn's lemma

It seems that we could keep applying step 1 forever, extending \(M\) to ever larger subspaces without any guarantee of reaching \(X\). (If \(X\) is finite dimensional, this is merely a linear algebra problem.) This is exactly the situation William Timothy Gowers describes in his blog post:

If you are building a mathematical object in stages and find that (i) you have not finished even after infinitely many stages, and (ii) there seems to be nothing to stop you continuing to build, then Zorn’s lemma may well be able to help you.

-- How to use Zorn's lemma

And we will show that, as W. T. Gowers said,

If the resulting partial order satisfies the chain condition and if a maximal element must be a structure of the kind one is trying to build, then the proof is complete.

To apply Zorn's lemma, we need to construct a partially ordered set. Let \(\mathscr{P}\) be the collection of all ordered pairs \((M',f')\) where \(M'\) is a subspace of \(X\) containing \(M\) and \(f'\) is a linear functional on \(M'\) that extends \(f\) and satisfies \(f' \leq p\) on \(M'\). For example, \[ (M,f) , (M_1,f_1) \in \mathscr{P}. \] The partial order \(\leq\) is defined as follows: by \((M',f') \leq (M'',f'')\), we mean \(M' \subset M''\) and \(f' = f''\) on \(M'\). Obviously this is a partial order (you should be able to check this).

Suppose now \(\mathcal{F}\) is a chain (a totally ordered subset of \(\mathscr{P}\)). We claim that \(\mathcal{F}\) has an upper bound (which is required by Zorn's lemma). Let \[ M_0=\bigcup_{(M',f') \in \mathcal{F}}M' \] and \[ f_0(y)=f'(y) \] whenever \((M',f') \in \mathcal{F}\) and \(y \in M'\). Because \(\mathcal{F}\) is totally ordered, \(M_0\) is a subspace and \(f_0\) is well defined, and it is easy to verify that \((M_0,f_0)\) is the upper bound we are looking for. Since \(\mathcal{F}\) is arbitrary, by Zorn's lemma there exists a maximal element \((M^\ast,f^\ast)\) in \(\mathscr{P}\). If \(M^\ast \neq X\), then by step 1 we are able to extend \(M^\ast\), which contradicts the maximality of \((M^\ast,f^\ast)\). Hence \(M^\ast=X\), and \(\Lambda\) is defined to be \(f^\ast\). Finally, \(\Lambda(-x) \leq p(-x)\) and the linearity of \(\Lambda\) give \[ -p(-x) \leq -\Lambda(-x)=\Lambda(x). \] The theorem is proved. \(\square\)

How this proof is constructed

This is a classic application of Zorn's lemma (or the well-ordering principle, or the Hausdorff maximality theorem). First, we showed that we are able to extend \(M\) and \(f\). But since we do not know the dimension or other properties of \(X\), it's not easy to control the extension so that it finally 'converges' to \((X,\Lambda)\). However, Zorn's lemma saves us from this random exploration: whatever happens, the maximal element is there, and we take it to finish the proof.

Generalisation onto the complex field

Since the theorem above involves inequalities, which make no sense for complex values, the complex case requires more care.

(Bohnenblust-Sobczyk-Soukhomlinoff) Suppose \(M\) is a subspace of a vector space \(X\), \(p\) is a seminorm on \(X\), and \(f\) is a linear functional on \(M\) such that \[ |f(x)| \leq p(x) \] for all \(x \in M\). Then \(f\) extends to a linear functional \(\Lambda\) on \(X\) satisfying \[ |\Lambda (x)| \leq p(x) \] for all \(x \in X\).

Proof. If the scalar field is \(\mathbb{R}\), then we are done, since \(p(-x)=p(x)\) in this case (can you see why?). So we assume the scalar field is \(\mathbb{C}\).

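Put \(u = \Re f\). By the dominated extension theorem (applied to \(X\) and \(M\) as real vector spaces), there is a real-linear functional \(U\) on \(X\) such that \(U=u\) on \(M\) and \(U \leq p\) on \(X\). Now define \[ \Lambda(x)=U(x)-iU(ix). \] By the propositions above, \(\Lambda\) is complex-linear, and \(\Lambda(x)=u(x)-iu(ix)=f(x)\) for \(x \in M\).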

To show that \(|\Lambda(x)| \leq p(x)\), we may assume \(\Lambda(x) \neq 0\). Taking \(\alpha=\frac{|\Lambda(x)|}{\Lambda(x)}\), we have \(|\alpha|=1\) and \(\Lambda(\alpha x)=\alpha\Lambda(x)=|\Lambda(x)|\), which is real and therefore equal to its real part \(U(\alpha x)\). Hence \[ |\Lambda(x)|=U(\alpha{x})\leq p(\alpha x)=p(x), \] since \(p(\alpha{x})=|\alpha|p(x)=p(x)\). \(\square\)

Extending Hahn-Banach theorem under linear transform

To end this post, we state a beautiful and useful extension of the Hahn-Banach theorem, which is done by R. P. Agnew and A. P. Morse.

(Agnew-Morse) Let \(X\) denote a real vector space and \(\mathcal{A}\) be a collection of linear maps \(A_\alpha: X \to X\) that commute, namely \[ A_\alpha A_\beta=A_\beta A_\alpha \] for all \(A_\alpha,A_\beta \in \mathcal{A}\). Let \(p\) be a sublinear functional such that \[ p(A_\alpha{x})=p(x) \] for all \(A_\alpha \in \mathcal{A}\) and \(x \in X\). Let \(Y\) be a subspace of \(X\) on which a linear functional \(f\) is defined such that

  1. \(f(y) \leq p(y)\) for all \(y \in Y\).
  2. For each \(A \in \mathcal{A}\) and \(y \in Y\), we have \(Ay \in Y\).
  3. For each \(A \in \mathcal{A}\) and \(y \in Y\), we have \(f(Ay)=f(y)\).

Then \(f\) can be extended to a linear functional \(\Lambda\) on \(X\) such that \(-p(-x) \leq \Lambda(x) \leq p(x)\) for all \(x \in X\), and \[ \Lambda(A_\alpha{x})=\Lambda(x) \] for all \(A_\alpha \in \mathcal{A}\) and \(x \in X\).

To prove this theorem, we need to construct a sublinear functional that dominates \(f\). For the whole proof, see Functional Analysis by Peter Lax.

The series

Since there is no strong reason to write more posts on this topic, i.e. the three fundamental theorems of linear functional analysis, I think it's time to make a list of the series. It's been around half a year.

References / Further Readings

  1. Walter Rudin, Functional Analysis.
  2. Peter Lax, Functional Analysis.
  3. William Timothy Gowers, How to use Zorn's lemma.

The Fourier transform of sinx/x and (sinx/x)^2 and more

In this post

We are going to evaluate the Fourier transform of \(\frac{\sin{x}}{x}\) and \(\left(\frac{\sin{x}}{x}\right)^2\). And it turns out to be a comprehensive application of many elementary theorems in complex analysis. It is a good thing to make sure that you can compute and understand all the identities in this post by yourself in the end. Also, you are expected to be able to recall what all words in italics mean.

To be clear, by Fourier transform we actually mean

\[ \hat{f}(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}f(x)e^{-itx}dx. \]

This is a matter of convenience. Indeed, the coefficient \(\frac{1}{\sqrt{2\pi}}\) is superfluous, but without it, when computing the Fourier inverse, one has to write \(\frac{1}{2\pi}\). Instead of making it unbalanced, we write \(\frac{1}{\sqrt{2\pi}}\) all the time and pretend it is not here.

We say a function \(f\) is in \(L^1\) if \(\int_{-\infty}^{+\infty}|f(x)|dx<+\infty\). As classic exercises in elementary calculus, \(\frac{\sin{x}}{x} \not\in L^1\) but \(\left(\frac{\sin{x}}{x}\right)^2 \in L^1\).

Problem 1

For real \(t\), find the following limit:

\[ \lim_{A \to \infty}\int_{-A}^{A}\frac{\sin{x}}{x}e^{itx}dx. \]

Since \(\frac{\sin{x}}{x}e^{itx}\not\in L^1\), we cannot evaluate the integral of it over \(\mathbb{R}\) directly since it's not defined in the sense of Lebesgue integral (the reader can safely ignore this if he or she has no background in it at this moment, but do keep in mind that being in \(L^1\) is a big matter). Instead, for given \(A>0\), the integral of it over \([-A,A]\) is defined, and we evaluate this limit to get what we want.

We will do this using contour integration. Since the complex function \(f(z)=\frac{\sin{z}}{z}e^{itz}\) is entire (its singularity at \(0\) is removable), by Cauchy's theorem its integral over \([-A,A]\) is equal to the one over the path \(\Gamma_A\) going from \(-A\) to \(-1\) along the real axis, from \(-1\) to \(1\) along the lower half of the unit circle, and from \(1\) to \(A\) along the real axis (why?). Since the path \(\Gamma_A\) avoids the origin, we can safely use the identity

\[ 2i\sin{z}=e^{iz}-e^{-iz}. \]

Replacing \(\sin{z}\) with \(\frac{1}{2i}(e^{iz}-e^{-iz})\), we get

\[ I_A(t)=\int_{\Gamma_A}f(z)dz=\int_{\Gamma_A}\frac{1}{2iz}(e^{i(t+1)z}-e^{i(t-1)z})dz. \]

If we put \(\varphi_A(t)=\int_{\Gamma_A}\frac{1}{2iz}e^{itz}dz\), we see \(I_A(t)=\varphi_A(t+1)-\varphi_A(t-1)\). It is convenient to divide \(\varphi_A\) by \(\pi\), since then we get

\[ \frac{1}{\pi}\varphi_A(t)=\frac{1}{2\pi i}\int_{\Gamma_A}\frac{e^{itz}}{z}dz \]

and we are cool with the divisor \(2\pi i\).

Now, close the path \(\Gamma_A\) in two ways: first, by the semicircle from \(A\) to \(-Ai\) to \(-A\); second, by the semicircle from \(A\) to \(Ai\) to \(-A\). (Together the two semicircles form a circle of radius \(A\).) We denote the two resulting closed paths by \(\Gamma_L\) and \(\Gamma_U\), respectively. The closed path \(\Gamma_L\) does not enclose the origin, so by Cauchy's theorem the integral over it is \(0\); parametrizing the lower semicircle by \(z=Ae^{i\theta}\), we thus get

\[ \frac{1}{\pi}\varphi_A(t)=\frac{1}{2\pi i}\int_{-\pi}^{0}\frac{\exp{(itAe^{i\theta})}}{Ae^{i\theta}}\,d(Ae^{i\theta})=\frac{1}{2\pi}\int_{-\pi}^{0}\exp{(itAe^{i\theta})}\,d\theta. \]

Notice that

\[ \begin{aligned} |\exp(itAe^{i\theta})|&=|\exp(itA(\cos\theta+i\sin\theta))| \\ &=|\exp(itA\cos\theta)|\cdot|\exp(-At\sin\theta)| \\ &=\exp(-At\sin\theta) \end{aligned} \]

hence if \(t\sin\theta>0\), we have \(|\exp(iAte^{i\theta})| \to 0\) as \(A \to \infty\). When \(-\pi < \theta <0\), we have \(\sin\theta<0\), so \(t\sin\theta>0\) whenever \(t<0\). Therefore we get

\[ \frac{1}{\pi}\varphi_{A}(t)=\frac{1}{2\pi}\int_{-\pi}^{0}\exp(itAe^{i\theta})d\theta \to 0\quad (A \to \infty,t<0). \]

(You should be able to prove the convergence above.) Also trivially

\[ \varphi_A(0)=\frac{1}{2}\int_{-\pi}^{0}1d\theta=\frac{\pi}{2}. \]

But what if \(t>0\)? Indeed, it would be difficult to obtain the limit using the integral over \([-\pi,0]\). But we have another path, namely the upper one.

Note that \(\frac{e^{itz}}{z}\) is a meromorphic function in \(\mathbb{C}\) with a pole at \(0\). For such a function we have

\[ \frac{e^{itz}}{z}=\frac{1}{z}\left(1+itz+\frac{(itz)^2}{2!}+\cdots\right)=\frac{1}{z}+it+\frac{(it)^2z}{2!}+\cdots. \]

which implies that the residue at \(0\) is \(1\). By the residue theorem,

\[ \begin{aligned} \frac{1}{2\pi{i}}\int_{\Gamma_U}\frac{e^{itz}}{z}dz&=\frac{1}{2\pi{i}}\int_{\Gamma_A}\frac{e^{itz}}{z}dz+\frac{1}{2\pi}\int_{0}^{\pi}\exp(itAe^{i\theta})d\theta \\ &=1\cdot\operatorname{Ind}_{\Gamma_U}(0)=1. \end{aligned} \]

Note that we have used the same change of variables as we did for the lower semicircle. \(\operatorname{Ind}_{\Gamma_U}(0)\) denotes the winding number of \(\Gamma_U\) around \(0\), which is of course \(1\). The identity above implies

\[ \frac{1}{\pi}\varphi_A(t)=1-\frac{1}{2\pi}\int_{0}^{\pi}\exp{(itAe^{i\theta})}d\theta. \]

Thus if \(t>0\), since \(\sin\theta>0\) when \(0<\theta<\pi\), we get

\[ \frac{1}{\pi}\varphi_A(t)\to 1 \quad(A \to \infty,t>0). \]

But as is already shown, \(I_A(t)=\varphi_A(t+1)-\varphi_A(t-1)\). To conclude,

\[ \lim_{A\to\infty}I_A(t)= \begin{cases} \pi\quad &|t|<1, \\ 0 \quad &|t|>1 ,\\ \frac{\pi}{2} \quad &|t|=1. \end{cases} \]
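The limit can be checked numerically by evaluating \(I_A(t)\) for one large \(A\). This is only a sketch: the cut-off \(A=1000\), the grid size and the helper names `sinc` and `truncated_integral` are arbitrary choices made here, and since \(\frac{\sin x}{x} \notin L^1\) the truncation error decays only like \(1/A\). At \(|t|=1\), the formula \(I_A(t)=\varphi_A(t+1)-\varphi_A(t-1)\) together with \(\varphi_A(0)=\frac{\pi}{2}\) gives the limit \(\pi-\frac{\pi}{2}=\frac{\pi}{2}\).

```python
import numpy as np

def sinc(x):
    # sin(x)/x with the removable singularity at 0 filled in; np.sinc(u) = sin(pi u)/(pi u)
    return np.sinc(x / np.pi)

def truncated_integral(g, t, A=1000.0, n=400_001):
    """Trapezoidal approximation of the integral of g(x) e^{itx} over [-A, A]."""
    x = np.linspace(-A, A, n)
    y = g(x) * np.exp(1j * t * x)
    dx = x[1] - x[0]
    return dx * (y.sum() - 0.5 * (y[0] + y[-1]))

for t in (0.5, 1.0, 1.5):
    # expected to be close to pi, pi/2 and 0 respectively
    print(t, truncated_integral(sinc, t))
```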

What we can learn from this integral

Since \(\psi(x)=\frac{\sin{x}}{x}\) is even, the sign in the exponent does not matter, and multiplying \(\lim_{A\to\infty}I_A(t)\) by \(\frac{1}{\sqrt{2\pi}}\) gives, by abuse of language, the Fourier transform of \(\psi\). Therefore we also get

\[ \hat\psi(t)= \begin{cases} \sqrt{\frac{\pi}{2}}\quad & |t|<1, \\ 0 \quad & |t|>1, \\ \frac{1}{2}\sqrt{\frac{\pi}{2}} & |t|=1. \end{cases} \]

Note that \(\hat\psi(t)\) is not continuous, let alone uniformly continuous. Therefore \(\psi(x) \notin L^1\), because if \(f \in L^1\), then \(\hat{f}\) is uniformly continuous. Another interesting fact is that this also gives the value of the Dirichlet integral, since we have

\[ \begin{aligned} \int_{-\infty}^{\infty}\left(\frac{\sin{x}}{x}\right)dx&=\int_{-\infty}^{\infty}\left(\frac{\sin{x}}{x}\right)e^{0\cdot ix}dx \\ &=\sqrt{2\pi}\hat\psi(0) \\ &=\pi. \end{aligned} \]

We end this section by evaluating the inverse of \(\hat\psi(t)\). This requires a simple calculation.

\[ \begin{aligned} \sqrt{\frac{1}{2\pi}}\int_{-\infty}^{\infty}\hat\psi(t)e^{itx}dt &= \sqrt{\frac{1}{2\pi}}\int_{-1}^{1}\sqrt{\frac{\pi}{2}}e^{itx}dt \\ &=\frac{1}{2}\cdot\frac{1}{ix}(e^{ix}-e^{-ix}) \\ &=\frac{\sin{x}}{x}. \end{aligned} \]

Problem 2

For real \(t\), compute

\[ J=\int_{-\infty}^{\infty}\left(\frac{\sin{x}}{x}\right)^2e^{itx}dx. \]

Now since \(h(x)=\frac{\sin^2{x}}{x^2} \in L^1\), we are able to say with ease that the integral above is the Fourier transform of \(h(x)\) (multiplied by \(\sqrt{2\pi}\)). But still we will be using the limit form

\[ J(t)=\lim_{A \to \infty}J_A(t), \]

where

\[ J_A(t)=\int_{-A}^{A}\left(\frac{\sin{x}}{x}\right)^2e^{itx}dx. \]

And we are still using the contour integration as above (keep \(\Gamma_A\), \(\Gamma_U\) and \(\Gamma_L\) in mind!). For this we get

\[ \left(\frac{\sin z}{z}\right)^2e^{itz}=\frac{e^{i(t+2)z}+e^{i(t-2)z}-2e^{itz}}{-4z^2}. \]

Therefore it suffices to discuss the function

\[ \mu_A(t)=\int_{\Gamma_A}\frac{e^{itz}}{2z^2}dz \]

since we have

\[ J_A(t)=\mu_A(t)-\frac{1}{2}(\mu_A(t+2)+\mu_A(t-2)). \]

Dividing \(\mu_A(t)\) by \(\pi i\), we see

\[ \frac{1}{\pi i}\mu_A(t)=\frac{1}{2\pi i}\int_{\Gamma_A}\frac{e^{itz}}{z^2}dz. \]

An integration of \(\frac{e^{itz}}{z^2}\) over \(\Gamma_L\) gives

\[ \begin{aligned} \frac{1}{\pi i}\mu_A(t)&=\frac{1}{2\pi i}\int_{-\pi}^{0}\frac{\exp(itAe^{i\theta})}{A^2e^{2i\theta}}\,d(Ae^{i\theta}) \\ &=\frac{1}{2\pi}\int_{-\pi}^{0}\frac{\exp(itAe^{i\theta})}{Ae^{i\theta}}d\theta. \end{aligned} \]

Since we still have

\[ \left|\frac{\exp(itAe^{i\theta})}{Ae^{i\theta}}\right|=\frac{1}{A}\exp(-At\sin\theta), \]

if \(t<0\) in this case, \(\frac{1}{\pi i}\mu_A(t) \to 0\) as \(A \to \infty\). For \(t>0\), integrating along \(\Gamma_U\) and using the fact that the residue of \(\frac{e^{itz}}{z^2}\) at \(0\) is \(it\), we have

\[ \frac{1}{\pi i}\mu_A(t)=it-\frac{1}{2\pi}\int_{0}^{\pi}\frac{\exp(itAe^{i\theta})}{Ae^{i\theta}}d\theta \to it \quad (A \to \infty) \]

We can also evaluate \(\mu_A(0)\) by computing the integral but we are not doing that. To conclude,

\[ \lim_{A \to\infty}\mu_A(t)=\begin{cases} 0 \quad &t<0, \\ -\pi t \quad &t>0. \end{cases} \]

Therefore for \(J_A\) we have

\[ J(t)=\lim_{A \to\infty}J_A(t)=\begin{cases} 0 \quad &|t| \geq 2, \\ \pi(1+\frac{t}{2}) \quad &-2<t \leq 0, \\ \pi(1-\frac{t}{2}) \quad & 0<t <2. \end{cases} \]

Now you may ask: how did we find the values at \(0\), \(2\) and \(-2\), given that \(\mu_A(0)\) was not evaluated? Since \(h \in L^1\), \(\hat{h}(t)=\sqrt{\frac{1}{2\pi}}J(t)\) is uniformly continuous, thus continuous, and the values at these points follow from continuity.
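The closed form of \(J\), including the values at \(t=0,\pm 2\) obtained from continuity, can be compared with a direct numerical evaluation of \(J_A(t)\). This is a sketch under arbitrary choices (cut-off \(A=1000\), trapezoidal rule; the helper names are made up here). Since \(0 \leq h(x) \leq 1/x^2\), the truncation error is at most \(2/A\).

```python
import numpy as np

def sinc(x):
    # sin(x)/x with the removable singularity at 0 filled in; np.sinc(u) = sin(pi u)/(pi u)
    return np.sinc(x / np.pi)

def truncated_integral(g, t, A=1000.0, n=400_001):
    """Trapezoidal approximation of the integral of g(x) e^{itx} over [-A, A]."""
    x = np.linspace(-A, A, n)
    y = g(x) * np.exp(1j * t * x)
    dx = x[1] - x[0]
    return dx * (y.sum() - 0.5 * (y[0] + y[-1]))

def J_closed_form(t):
    # pi * (1 - |t|/2) for |t| < 2 and 0 otherwise, including the endpoint values
    return np.pi * max(0.0, 1.0 - abs(t) / 2)

for t in (-2.0, -1.0, 0.0, 0.5, 2.0, 3.0):
    approx = truncated_integral(lambda x: sinc(x) ** 2, t)
    assert abs(approx - J_closed_form(t)) < 1e-2, t
```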

What we can learn from this integral

Again, we get the value of a classic improper integral by

\[ \int_{-\infty}^{\infty}\left(\frac{\sin{x}}{x}\right)^2dx = J(0)=\pi. \]

And this time it's not hard to find the Fourier inverse:

\[ \begin{aligned} \sqrt{\frac{1}{2\pi}}\int_{-\infty}^{\infty}\hat{h}(t)e^{itx}dt&=\frac{1}{2\pi}\int_{-\infty}^{\infty}J(t)e^{itx}dt \\ &=\frac{1}{2\pi}\int_{-2}^{2}\pi(1-\frac{1}{2}|t|)e^{itx}dt \\ &=\frac{e^{2ix}+e^{-2ix}-2}{-4x^2} \\ &=\frac{(e^{ix}-e^{-ix})^2}{-4x^2} \\ &=\left(\frac{\sin{x}}{x}\right)^2. \end{aligned} \]

The Riesz-Markov-Kakutani Representation Theorem

This post

Is intended to prepare for establishing the existence of the Lebesgue measure, often denoted by \(m\), in a future post. In fact, the Lebesgue measure follows as a special case of the R-M-K representation theorem. You may not believe it, but the euclidean properties of \(\mathbb{R}^k\) play no role in the existence of \(m\). The only topological property that matters is the fact that \(\mathbb{R}^k\) is a locally compact Hausdorff space.

The theorem is named after F. Riesz, who introduced it for continuous functions on \([0,1]\) (with respect to the Riemann-Stieltjes integral). Years later, after the generalizations by A. Markov and S. Kakutani, it could be stated on locally compact Hausdorff spaces.

Some of the properties below may look over-generalized, but they are included so that you can enjoy more along the way (some of the tools are related to differential geometry). There are also many topology and analysis tricks worth your attention.


Different kinds of topological spaces

Again, euclidean topology plays no role in this proof. We need to specify the topology for different reasons. This is similar to what we do in linear functional analysis. Throughout, let \(X\) be a topological space.

0.0 Definition. \(X\) is a Hausdorff space if the following is true: If \(p \in X\), \(q\in X\) but \(p \neq q\), then there are two disjoint open sets \(U\) and \(V\) such that \(p \in U\) and \(q \in V\).

0.1 Definition. \(X\) is locally compact if every point of \(X\) has a neighborhood whose closure is compact.

0.2 Remarks. A Hausdorff space is also called a \(T_2\) space (see Kolmogorov classification) or a separated space. There is a classic example of a locally compact Hausdorff space: \(\mathbb{R}^n\). It is trivial to verify this. But this is far from enough: in the future we will see that we can construct some ridiculous but mathematically valid measures.

0.3 Definition. A set \(E \subset X\) is called \(\sigma\)-compact if \(E\) is a countable union of compact sets. Note that every open subset of a euclidean space \(\mathbb{R}^n\) is \(\sigma\)-compact, since it is a countable union of closed balls (which are compact).

0.4 Definition. A covering of \(X\) is locally finite if every point has a neighborhood which intersects only finitely many elements of the covering. Of course, if the covering is already finite, it's also locally finite.

0.5 Definition. A refinement of a covering of \(X\) is a second covering, each element of which is contained in an element of the first covering.

0.6 Definition. \(X\) is paracompact if it is Hausdorff, and every open covering has a locally finite open refinement. Obviously any compact space is paracompact.

0.7 Theorem. If \(X\) is a second countable Hausdorff space and is locally compact, then \(X\) is paracompact. For a proof, see this [Theorem 2.6]. One uses this to prove that a differentiable manifold admits a partition of unity.

0.8 Theorem. If \(X\) is locally compact and \(\sigma\)-compact, then \(X=\bigcup_{i=1}^{\infty}K_i\), where each \(K_i\) is compact and \(K_i \subset\operatorname{int}K_{i+1}\) for all \(i \in \mathbb{N}\).
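As a concrete instance of 0.8, take the open interval \((0,1) \subset \mathbb{R}\), which is locally compact and \(\sigma\)-compact. The sketch below assumes the exhaustion \(K_i=[\frac{1}{i+2},1-\frac{1}{i+2}]\) (our own choice; the helper names are hypothetical) and checks with exact rational arithmetic that \(K_i \subset \operatorname{int}K_{i+1}\), and that every point of \((0,1)\) eventually lands in some \(K_i\).

```python
from fractions import Fraction


def exhaustion(i):
    """Hypothetical exhaustion of (0, 1): K_i = [1/(i+2), 1 - 1/(i+2)], a compact interval."""
    t = Fraction(1, i + 2)
    return t, 1 - t


def nested_in_interior(i):
    """K_i ⊂ int K_{i+1}: in R the interior of [c, d] is (c, d), so compare endpoints strictly."""
    a, b = exhaustion(i)
    c, d = exhaustion(i + 1)
    return c < a and b < d


def first_index_containing(x):
    """Smallest i >= 1 with x in K_i; terminates for every x in (0, 1)."""
    if not 0 < x < 1:
        raise ValueError("x must lie in the open interval (0, 1)")
    i = 1
    while True:
        lo, hi = exhaustion(i)
        if lo <= x <= hi:
            return i
        i += 1
```

For example, `first_index_containing(Fraction(1, 10))` returns `8`, since \(\frac{1}{i+2}\leq\frac{1}{10}\) first holds at \(i=8\).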

Partition of unity

The basic technical tool in the theory of differentiable manifolds is the existence of a partition of unity. We will borrow this tool for use in analysis.

1.0 Definition. A partition of unity on \(X\) is a collection \((g_i)\) of continuous real valued functions on \(X\) such that

  1. \(g_i \geq 0\) for each \(i\).
  2. every \(x \in X\) has a neighborhood \(U\) such that \(U \cap \operatorname{supp}(g_i)=\varnothing\) for all but finitely many \(i\).
  3. for each \(x \in X\), we have \(\sum_{i}g_i(x)=1\). (That's why you see the word 'unity'.)
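A standard example on \(X=\mathbb{R}\) (our illustration, not taken from the source) is the family of "tent" functions \(g_i(x)=\max(0,1-|x-i|)\), \(i \in \mathbb{Z}\). Each \(g_i\) is continuous and nonnegative, the neighborhood \((x-\frac12,x+\frac12)\) meets at most three supports, and the tents sum to \(1\) everywhere. The sketch below checks these three conditions numerically; the function names are hypothetical.

```python
import math


def tent(i, x):
    """g_i(x) = max(0, 1 - |x - i|): continuous, >= 0, with supp(g_i) = [i - 1, i + 1]."""
    return max(0.0, 1.0 - abs(x - i))


def active_indices(x):
    """Indices i whose support [i - 1, i + 1] meets the neighbourhood U = (x - 1/2, x + 1/2)."""
    n = math.floor(x)
    # any such i satisfies x - 3/2 < i < x + 3/2, so scanning n - 2 .. n + 2 is enough
    return [i for i in range(n - 2, n + 3) if i - 1 < x + 0.5 and x - 0.5 < i + 1]


def total(x):
    """Sum of all g_i(x); only the (at most three) active indices can be nonzero at x."""
    return sum(tent(i, x) for i in active_indices(x))
```

At any \(x\), only \(g_{\lfloor x\rfloor}\) and \(g_{\lfloor x\rfloor+1}\) can be nonzero, with values \(1-(x-\lfloor x\rfloor)\) and \(x-\lfloor x\rfloor\), which is why the sum is exactly \(1\) up to rounding.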

Partitions of unity are frequently used in many other fields as well. For example, in differential geometry one uses them to put a Riemannian structure on a smooth manifold, and in generalized function theory one uses them to connect local properties with global ones.

1.1 Definition. A partition of unity \((g_i)\) on \(X\) is subordinate to an open cover of \(X\) if and only if for each \(g_i\) there is an element \(U\) of the cover such that \(\operatorname{supp}(g_i) \subset U\). We say \(X\) admits partitions of unity if and only if for every open cover of \(X\), there exists a partition of unity subordinate to the cover.

1.2 Theorem. A Hausdorff space admits partitions of unity if and only if it is paracompact (the 'only if' part follows by considering the definition of partition of unity; for the 'if' part, see here). As a corollary, we have:

1.3 Corollary. Suppose \(V_1,\cdots,V_n\) are open subsets of a locally compact Hausdorff space \(X\), \(K\) is compact, and \[ K \subset \bigcup_{k=1}^{n}V_k. \] Then there exists a partition of unity \((h_i)\) subordinate to the cover \((V_k)\) such that \(\operatorname{supp}(h_i) \subset V_i\) for each \(i\) and \(\sum_{i=1}^{n}h_i(x)=1\) for all \(x \in K\).

Urysohn's lemma (for locally compact Hausdorff spaces)

2.0 Notation. The notation \[ K \prec f \] will mean that \(K\) is a compact subset of \(X\), that \(f \in C_c(X)\), that \(f(X) \subset [0,1]\), and that \(f(x)=1\) for all \(x \in K\). The notation \[ f \prec V \] will mean that \(V\) is open, that \(f \in C_c(X)\), that \(f(X) \subset [0,1]\) and that \(\operatorname{supp}(f) \subset V\). If both hold, we write \[ K \prec f \prec V. \]

2.1 Remarks. Clearly, with this notation, we are able to simplify the statement of being subordinate. We merely need to write \(g_i \prec U\) in 1.1 instead of \(\operatorname{supp}(g_i) \subset U\).

2.2 Urysohn's Lemma for locally compact Hausdorff spaces. Suppose \(X\) is locally compact and Hausdorff, \(V\) is open in \(X\) and \(K \subset V\) is a compact set. Then there exists an \(f \in C_c(X)\) such that \[ K \prec f \prec V. \]

2.3 Remarks. By \(f \in C_c(X)\) we mean that \(f\) is a continuous function with compact support. The relation also says that \(\chi_K \leq f \leq \chi_V\). For more details and the proof, visit this page. This lemma is generally stated for normal spaces; for a proof at that level, see arXiv:1910.10381. (Question: why do we consider two disjoint closed subsets in that setting?)
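On \(X=\mathbb{R}\) the lemma has an explicit witness. Assuming \(K=[a,b]\) and \(V=(c,d)\) with \(c<a\leq b<d\) (a special case chosen for illustration), the piecewise-linear function below equals \(1\) on \(K\), takes values in \([0,1]\), and vanishes outside \([a-\delta,b+\delta] \subset V\). The name `urysohn_interval` is hypothetical.

```python
def urysohn_interval(a, b, c, d):
    """Return (f, support) with [a, b] ≺ f ≺ (c, d) on the real line.

    Assumes c < a <= b < d. f is piecewise linear: 1 on [a, b] and 0 outside
    [a - delta, b + delta], where delta is half the smaller gap between K and the ends of V.
    """
    if not (c < a <= b < d):
        raise ValueError("need c < a <= b < d")
    delta = min(a - c, d - b) / 2

    def f(x):
        # distance from x to K = [a, b], rescaled so that f drops from 1 to 0 over delta
        dist = max(a - x, 0.0, x - b)
        return max(0.0, 1.0 - dist / delta)

    return f, (a - delta, b + delta)
```

For \(K=[0,1]\) and \(V=(-1,3)\) we get \(\delta=\frac12\) and \(\operatorname{supp}(f)=[-\frac12,\frac32]\), a compact subset of \(V\), so indeed \(\chi_K \leq f \leq \chi_V\).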

The \(\varepsilon\)-definitions of \(\sup\) and \(\inf\)

We will be using the \(\varepsilon\)-definitions of \(\sup\) and \(\inf\), which make the proof easier here but can be troublesome if you are not familiar with them. So we record them here.

Let \(S\) be a nonempty subset of the real numbers that is bounded below. A lower bound \(w\) of \(S\) is the infimum of \(S\) if and only if for every \(\varepsilon>0\), there exists an element \(x_\varepsilon \in S\) such that \(x_\varepsilon<w+\varepsilon\).

This definition of \(\inf\) is equivalent to the following if-then definition.

Let \(S\) be a nonempty set that is bounded below. We say \(w=\inf S\) when \(w\) satisfies the following conditions.

  1. \(w\) is a lower bound of \(S\).
  2. If \(t\) is also a lower bound of \(S\), then \(t \leq w\).

We have the analogous definition for \(\sup\).
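To see the \(\varepsilon\)-definition in action, consider the set \(S=\{1+\frac1n:n\geq 1\}\), which has \(\inf S=1\) (a toy example of ours). For each \(\varepsilon>0\), the hypothetical helper below produces the element \(x_\varepsilon=1+\frac1n\) with \(n>\frac1\varepsilon\), so that \(x_\varepsilon<1+\varepsilon\). Exact fractions avoid rounding issues.

```python
import math
from fractions import Fraction


def witness(eps):
    """For S = {1 + 1/n : n >= 1}, whose infimum is w = 1, return x in S with x < w + eps."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    n = math.floor(1 / Fraction(eps)) + 1  # n > 1/eps, hence 1/n < eps
    return 1 + Fraction(1, n)
```

For instance, `witness(Fraction(1, 10))` returns `Fraction(12, 11)`, i.e. \(n=11\).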

The main theorem

Analysis is full of vector spaces and linear transformations. We already know that the Lebesgue integral induces a linear functional. For example, \(L^1([0,1])\) is a vector space, and \[ f \mapsto \int_0^1 f(x)dx \] is a linear functional on it. But what about the converse? Given a linear functional, is there always a measure that realizes it as an integral? The R-M-K theorem answers this question affirmatively for positive linear functionals on \(C_c(X)\). A linear functional \(\Lambda\) is positive if \(\Lambda{f} \in [0,\infty)\) whenever \(f(X) \subset [0,\infty)\).

Let \(X\) be a locally compact Hausdorff space, and let \(\Lambda\) be a positive linear functional on \(C_c(X)\). Then there exists a \(\sigma\)-algebra \(\mathfrak{M}\) on \(X\) which contains all Borel sets in \(X\), and there exists a unique positive measure \(\mu\) on \(\mathfrak{M}\) which represents \(\Lambda\) in the sense that \[ \Lambda{f}=\int_X fd\mu \] for all \(f \in C_c(X)\).
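A toy case makes the statement concrete. Assume (our example, not the post's) that \(X=\mathbb{R}\), and fix finitely many points \(p_j\) with weights \(w_j\geq 0\). Then \(\Lambda f=\sum_j w_jf(p_j)\) is a positive linear functional on \(C_c(\mathbb{R})\), and one can check that, on Borel sets, it is represented by the point-mass measure \(\mu=\sum_j w_j\delta_{p_j}\). The helper names below are hypothetical.

```python
def make_functional(points, weights):
    """Λf = Σ_j w_j f(p_j); positive and linear when every weight w_j >= 0."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be nonnegative for Λ to be positive")

    def lam(f):
        return sum(w * f(p) for p, w in zip(points, weights))

    return lam


def point_mass_measure(points, weights, indicator):
    """μ(E) = Σ_{p_j ∈ E} w_j for μ = Σ_j w_j δ_{p_j}; the set E is given by its indicator."""
    return sum(w for p, w in zip(points, weights) if indicator(p))
```

With points \(0,\frac12,2\) and weights \(1,2,\frac12\), the tent \(f(x)=\max(0,1-|x|)\) gives \(\Lambda f=1\cdot1+2\cdot\frac12+\frac12\cdot0=2\), which is exactly \(\int fd\mu\) for the point-mass measure.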

For the measure \(\mu\) and the \(\sigma\)-algebra \(\mathfrak{M}\), we have four assertions:

  1. \(\mu(K)<\infty\) for every compact set \(K \subset X\).
  2. For every \(E \in \mathfrak{M}\), we have \[ \mu(E)=\inf\{\mu(V):E \subset V, V\text{ open}\}. \]
  3. The identity \[ \mu(E)=\sup\{\mu(K):K \subset E, K\text{ compact}\} \] holds for every open set \(E\), and for every \(E \in \mathfrak{M}\) with \(\mu(E)<\infty\).
  4. If \(E \in \mathfrak{M}\), \(A \subset E\), and \(\mu(E)=0\), then \(A \in \mathfrak{M}\).

Remarks before proof. It would be great if we could establish the Lebesgue measure \(m\) simply by putting \(X=\mathbb{R}^n\), but we need a little extra work to get this result naturally. If assertion 2 is satisfied, we say \(\mu\) is outer regular; if assertion 3 is satisfied, we say \(\mu\) is inner regular. If both hold, we say \(\mu\) is regular. The partition of unity and Urysohn's lemma will be heavily used in the proof of the main theorem, so make sure you have no problem with them. The theorem can also be extended to the complex case, but that requires much non-trivial work.

Proving the theorem

The proof is rather long so we will split it into several steps. I will try my best to make every line clear enough.

Step 0 - Construction of \(\mu\) and \(\mathfrak{M}\)

For every open set \(V \subset X\), define \[ \mu(V)=\sup\{\Lambda{f}:f \prec V\}. \]

If \(V_1 \subset V_2\) and both are open, we claim that \(\mu(V_1) \leq \mu(V_2)\). For \(f \prec V_1\), since \(\operatorname{supp}f \subset V_1 \subset V_2\), we see \(f \prec V_2\), hence \(\Lambda f \leq \mu(V_2)\); taking the supremum over all \(f \prec V_1\) already gives \(\mu(V_1) \leq \mu(V_2)\). It is instructive, however, to see that every \(f \prec V_1\) is dominated by some \(g \prec V_2\). By taking another look at the proof of Urysohn's lemma for locally compact Hausdorff spaces, we see there is an open set \(G\) with compact closure such that \[ \operatorname{supp}(f) \subset G \subset \overline{G} \subset V_2. \] Applying Urysohn's lemma to the pair \((\overline{G},V_2)\), we obtain a function \(g \in C_c(X)\) such that \[ \overline{G} \prec g \prec V_2. \] Since \(g=1\) on \(\overline{G} \supset \operatorname{supp}(f)\), \(0 \leq f \leq 1\) and \(g \geq 0\), we have \(g \geq f\), and therefore \(\Lambda{g} \geq \Lambda{f}\) (monotonicity) since \(\Lambda{g}-\Lambda{f}=\Lambda{(g-f)}\geq 0\). Thus \(\Lambda f \leq \Lambda g \leq \mu(V_2)\), and taking the supremum over \(f \prec V_1\) gives once more \[ \mu(V_1) \leq \mu(V_2). \]

This 'monotonic' property of \(\mu\) enables us to define \(\mu(E)\) for all \(E \subset X\) by \[ \mu(E)=\inf \{\mu(V):E \subset V, V\text{ open}\}. \] For open sets this agrees with the previous definition, again by monotonicity. Sometimes people say \(\mu\) is the outer measure. We will discuss other kinds of sets thoroughly in the following steps. Warning: we are not saying that \(\mathfrak{M} = 2^X\). The crucial property of \(\mu\), namely countable additivity, will be proved only on a certain \(\sigma\)-algebra.

It follows from the definition of \(\mu\) that if \(E_1 \subset E_2\), then \(\mu(E_1) \leq \mu(E_2)\).

Let \(\mathfrak{M}_F\) be the class of all \(E \subset X\) which satisfy the following two conditions:

  1. \(\mu(E) <\infty\).

  2. 'Inner regular': \[ \mu(E)=\sup\{\mu(K):K \subset E, K\text{ compact}\}. \]

One may say here \(\mu\) is the 'inner measure'. Finally, let \(\mathfrak{M}\) be the class of all \(E \subset X\) such that for every compact \(K\), we have \(E \cap K \in \mathfrak{M}_F\). We shall show that \(\mathfrak{M}\) is the desired \(\sigma\)-algebra.

Remarks of Step 0. So far, we only know that \(\mu(E) \geq 0\) for all \(E {\color\red{\subset}}X\). What about countable additivity? It is clear that \(\mathfrak{M}_F\) and \(\mathfrak{M}\) are closely related, and we need a clearer view of that relation. Also, if we restrict \(\mu\) to \(\mathfrak{M}_F\), we restrict ourselves to finite values. In fact, we will eventually show that \(\mathfrak{M}_F \subset \mathfrak{M}\).

Step 1 - The 'measure' of compact sets (outer)

If \(K\) is compact, then \(K \in \mathfrak{M}_F\), and \[ \mu(K)=\inf\{\Lambda{f}:K \prec f\}<\infty \]

Take any \(f\) with \(K \prec f\), and for \(0 < \alpha < 1\) define \(V_\alpha=f^{-1}((\alpha,1])\), which is open since \(f\) is continuous. Since \(f(x)=1\) for all \(x \in K\), we have \(K \subset V_{\alpha}\). If \(g \prec V_{\alpha}\), then \(\alpha{g} \leq \alpha < f\) on \(V_\alpha\) and \(g=0 \leq f\) outside \(V_\alpha\), so \(f \geq \alpha{g}\) and hence \(\Lambda{g} \leq \frac{1}{\alpha}\Lambda{f}\). Therefore, by the definition of \(\mu(E)\) for arbitrary \(E \subset X\), we have \[ \mu(K) \leq \mu(V_\alpha)=\sup\{\Lambda{g}:g \prec V_{\alpha}\} \leq \frac{1}{\alpha}\Lambda{f}. \] Since \(\mu(K)\) is a lower bound of \(\frac{1}{\alpha}\Lambda{f}\) for every \(0<\alpha<1\), we see \[ \mu(K) \leq \inf_{\alpha \in (0,1)}\frac{1}{\alpha}\Lambda{f}=\Lambda{f}. \] Since \(\Lambda{f}\) is a real number, \(\mu(K) <\infty\). Moreover, \(K\) is itself a compact subset of \(K\), so the supremum in the inner regularity condition is attained at \(K\); hence \(K \in \mathfrak{M}_F\).

To prove the identity, fix \(\varepsilon>0\). By the definition of \(\mu(K)\), there exists an open \(V \supset K\) such that \(\mu(V)<\mu(K)+\varepsilon\). By Urysohn's lemma, there exists some \(h \in C_c(X)\) such that \(K \prec h \prec V\). Therefore \[ \Lambda{h} \leq \mu(V) < \mu(K)+\varepsilon. \] Together with \(\mu(K) \leq \Lambda{f}\) for every \(f\) with \(K \prec f\), this shows that \(\mu(K)\) is the infimum of \(\Lambda{h}\) over all \(h\) with \(K \prec h\).
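Steps 0 and 1 can be watched at work in the classical case. Assume, for illustration only, that \(X=\mathbb{R}\) and \(\Lambda f=\int f(x)dx\). For \(K=[a,b]\), trapezoids with \(K \prec f\) give \(\Lambda f=(b-a)+\delta\), which decreases to \(\mu(K)=b-a\); for \(V=(c,d)\), trapezoids with \(f \prec V\) give \(\Lambda f=(d-c)-3\delta\), which increases to \(\mu(V)=d-c\). The sketch integrates piecewise-linear functions exactly from their knots; all helper names are hypothetical.

```python
def integrate_pl(knots):
    """Exact integral of a piecewise-linear function given by its (x, y) knots (zero outside)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(knots, knots[1:]))


def lam_above_K(a, b, delta):
    """Λf for f = 1 on K = [a, b], ramping down to 0 at a - delta and b + delta (so K ≺ f)."""
    return integrate_pl([(a - delta, 0.0), (a, 1.0), (b, 1.0), (b + delta, 0.0)])


def lam_inside_V(c, d, delta):
    """Λf for f ≺ V = (c, d): 0 outside [c + δ, d - δ] and 1 on [c + 2δ, d - 2δ]."""
    if 4 * delta > d - c:
        raise ValueError("need 4 * delta <= d - c")
    return integrate_pl([(c + delta, 0.0), (c + 2 * delta, 1.0),
                         (d - 2 * delta, 1.0), (d - delta, 0.0)])
```

For instance, `lam_above_K(0.0, 1.0, 0.1)` is \(1.1\) and `lam_inside_V(0.0, 1.0, 0.1)` is \(0.7\); letting \(\delta \to 0\) recovers \(\mu([0,1])=\mu((0,1))=1\).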

Remarks of Step 1. We have just proved assertion 1 about \(\mu\). The hardest part of this proof is the inequality \[ \mu(V)<\mu(K)+\varepsilon. \] But this is merely the \(\varepsilon\)-definition of \(\inf\): since \(\mu(K)\) is the infimum of \(\mu(V)\) over open \(V \supset K\), for any \(\varepsilon>0\) there exists some open \(V \supset K\) for which \(\mu(V)<\mu(K)+\varepsilon\). In situations like this, the \(\varepsilon\)-definition is much easier to use. Now we will examine the relation between \(\mathfrak{M}_F\) and \(\tau_X\), namely the topology of \(X\).

Step 2 - The 'measure' of open sets (inner)

\(\mathfrak{M}_F\) contains every open set \(V\) with \(\mu(V)<\infty\).

It suffices to show that for every open set \(V\), we have \[ \mu(V)=\sup\{\mu(K):K \subset V, K\text{ compact}\}. \] Let \(\alpha\) be any real number with \(\alpha<\mu(V)\) (for instance \(\alpha=\mu(V)-\varepsilon\) when \(\mu(V)<\infty\)). By the definition of \(\mu(V)\), there exists an \(f \prec V\) such that \(\Lambda{f}>\alpha\). Put \(K=\operatorname{supp}(f)\), a compact subset of \(V\). If \(W\) is any open set which contains \(K\), then \(f \prec W\), and therefore \(\Lambda{f} \leq \mu(W)\). Taking the infimum over all such \(W\), the definition of \(\mu(K)\) gives \[ \Lambda{f}\leq\mu(K). \] Therefore \[ \alpha<\Lambda{f}\leq\mu(K)\leq\mu(V). \] This is exactly the \(\varepsilon\)-definition of \(\sup\), so the identity is proved.

Remarks of Step 2. It is important to note that this identity is only guaranteed for open sets and for sets \(E \in \mathfrak{M}\) with \(\mu(E)<\infty\); the latter will be proved in the following steps. This is the flaw of this theorem. With these preparations, however, we are able to show the countable additivity of \(\mu\) on \(\mathfrak{M}_F\).

Step 3 - The subadditivity of \(\mu\) on \(2^X\)

If \(E_1,E_2,E_3,\cdots\) are arbitrary subsets of \(X\), then \[ \mu\left(\bigcup_{k=1}^{\infty}E_k\right) \leq \sum_{k=1}^{\infty}\mu(E_k) \]

First we show this holds for finitely many open sets. By induction, it suffices to show that \[ \mu(V_1 \cup V_2)\leq \mu(V_1)+\mu(V_2) \] if \(V_1\) and \(V_2\) are open. Pick any \(g \prec V_1 \cup V_2\). Applying corollary 1.3 to the compact set \(\operatorname{supp}(g) \subset V_1 \cup V_2\), we obtain functions \(h_1,h_2\) with \(h_i \prec V_i\) and \(h_1+h_2=1\) on \(\operatorname{supp}(g)\). Hence \(g=(h_1+h_2)g\), and moreover \(h_1g \prec V_1\) and \(h_2g \prec V_2\). Therefore, \[ \begin{aligned} \Lambda(g)&=\Lambda((h_1+h_2)g) \\ &=\Lambda(h_1g)+\Lambda(h_2g) \\ &\leq\mu(V_1)+\mu(V_2). \end{aligned} \] Taking the supremum over all \(g \prec V_1 \cup V_2\), we have \[ \mu(V_1 \cup V_2)\leq \mu(V_1)+\mu(V_2). \]

Now we return to arbitrary subsets of \(X\). If \(\mu(E_i)=\infty\) for some \(i\), then there is nothing to prove, so we assume that \(\mu(E_i)<\infty\) for all \(i\). Fix \(\varepsilon>0\). By the definition of \(\mu(E_i)\), there are open sets \(V_i \supset E_i\) such that \[ \mu(V_i)<\mu(E_i)+\frac{\varepsilon}{2^i}. \] Put \(V=\bigcup_{i=1}^{\infty}V_i\), and choose any \(f \prec V\). Since \(\operatorname{supp}(f)\) is compact, finitely many of the \(V_i\) cover it. Taking \(n\) to be the largest index among them, we have \[ f \prec V_1 \cup V_2 \cup \cdots \cup V_n. \] We therefore obtain \[ \begin{aligned} \Lambda{f} &\leq \mu(V_1 \cup V_2 \cup \cdots \cup V_n) \\ &\leq \mu(V_1)+\mu(V_2)+\cdots+\mu(V_n) \\ &\leq \sum_{i=1}^{n}\left(\mu(E_i)+\frac{\varepsilon}{2^i}\right) \\ &\leq \sum_{i=1}^{\infty}\mu(E_i)+\varepsilon, \end{aligned} \] for all \(f \prec V\). Since \(\bigcup E_i \subset V\), we have \(\mu(\bigcup E_i) \leq \mu(V)\). Therefore \[ \mu\left(\bigcup_{i=1}^{\infty}E_i\right)\leq\mu(V)=\sup\{\Lambda{f}:f \prec V\}\leq\sum_{i=1}^{\infty}\mu(E_i)+\varepsilon. \] Since \(\varepsilon\) is arbitrary, the inequality is proved.

Remarks of Step 3. Again, we are using the \(\varepsilon\)-definition of \(\inf\). One may say this step shows the subadditivity of the outer measure. Also note the geometric series \(\sum_{k=1}^{\infty}\frac{\varepsilon}{2^k}=\varepsilon\).
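The \(\varepsilon/2^i\) bookkeeping can be checked with exact rational arithmetic. The hypothetical helper below computes the total slack \(\sum_{i=1}^{n}\frac{\varepsilon}{2^i}\) spent when enlarging \(E_1,\cdots,E_n\) to \(V_1,\cdots,V_n\); it equals \(\varepsilon(1-2^{-n})\), so it stays below \(\varepsilon\) however many sets are involved.

```python
from fractions import Fraction


def slack(eps, n):
    """Total slack Σ_{i=1}^{n} eps / 2**i used when enlarging E_1, ..., E_n to V_1, ..., V_n."""
    return sum(eps / 2**i for i in range(1, n + 1))
```

For example, `slack(Fraction(1, 100), 3)` equals \(\frac{1}{100}\cdot\frac{7}{8}=\frac{7}{800}\).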

Step 4 - Additivity of \(\mu\) on \(\mathfrak{M}_F\)

Suppose \(E=\bigcup_{i=1}^{\infty}E_i\), where \(E_1,E_2,\cdots\) are pairwise disjoint members of \(\mathfrak{M}_F\), then \[ \mu(E)=\sum_{i=1}^{\infty}\mu(E_i). \] If \(\mu(E)<\infty\), we also have \(E \in \mathfrak{M}_F\).

As a counterpart to Step 3, we first show this holds for two disjoint compact sets (and hence, by induction, for finitely many). As proved in Step 1, compact sets are in \(\mathfrak{M}_F\). Suppose now \(K_1\) and \(K_2\) are disjoint compact sets. We want to show that \[ \mu(K_1 \cup K_2)=\mu(K_1)+\mu(K_2). \] Note that compact sets in a Hausdorff space are closed, so \(K_2^c\) is an open set containing \(K_1\). Therefore we are able to apply Urysohn's lemma to the pair \((K_1,K_2^c)\): there exists an \(f \in C_c(X)\) such that \[ K_1 \prec f \prec K_2^c. \] In other words, \(f(x)=1\) for all \(x \in K_1\) and \(f(x)=0\) for all \(x \in K_2\), since \(\operatorname{supp}(f) \cap K_2 = \varnothing\). Fix \(\varepsilon>0\). By Step 1, since \(K_1 \cup K_2\) is compact, there exists some \(g \in C_c(X)\) such that \[ K_1 \cup K_2 \prec g \quad \text{and} \quad \Lambda(g) < \mu(K_1 \cup K_2)+\varepsilon. \] Now things become tricky. We are able to write \(g\) as \[ g=fg+(1-f)g. \] But \(K_1 \prec fg\) and \(K_2 \prec (1-f)g\) by the properties of \(f\) and \(g\). By Step 1 and the linearity of \(\Lambda\), we have \[ \mu(K_1)+\mu(K_2) \leq \Lambda(fg)+\Lambda((1-f)g)=\Lambda(g) < \mu(K_1 \cup K_2)+\varepsilon. \] Since \(\varepsilon\) is arbitrary, we have \[ \mu(K_1)+\mu(K_2) \leq \mu(K_1 \cup K_2). \] On the other hand, by Step 3, we have \[ \mu(K_1 \cup K_2) \leq \mu(K_1)+\mu(K_2). \] Therefore they must be equal.
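The splitting \(g=fg+(1-f)g\) can be checked on the real line. The sketch assumes, for illustration only, \(K_1=[0,1]\) and \(K_2=[3,4]\), takes \(f\) equal to \(1\) on \(K_1\) with support \([-\frac12,\frac32]\) (which misses \(K_2\)), and \(g\) equal to \(1\) on \([0,4] \supset K_1 \cup K_2\). The helper `ramp_indicator` is hypothetical.

```python
def ramp_indicator(a, b, delta):
    """Piecewise-linear h with h = 1 on [a, b] and h = 0 outside [a - delta, b + delta]."""
    def h(x):
        dist = max(a - x, 0.0, x - b)
        return max(0.0, 1.0 - dist / delta)
    return h


f = ramp_indicator(0.0, 1.0, 0.5)  # K1 = [0, 1] ≺ f, and supp f = [-0.5, 1.5] misses K2 = [3, 4]
g = ramp_indicator(0.0, 4.0, 0.5)  # K1 ∪ K2 ≺ g


def part1(x):
    """f g: equals 1 on K1, since f = g = 1 there."""
    return f(x) * g(x)


def part2(x):
    """(1 - f) g: equals 1 on K2, since f = 0 and g = 1 there."""
    return (1.0 - f(x)) * g(x)
```

Both pieces take values in \([0,1]\) and add up to \(g\), which is exactly what lets Step 1 bound \(\mu(K_1)\) and \(\mu(K_2)\) separately.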

If \(\mu(E)=\infty\), there is nothing to prove, since Step 3 gives \(\sum_i\mu(E_i) \geq \mu(E)=\infty\). So we assume that \(\mu(E)<\infty\). Fix \(\varepsilon>0\). Since \(E_i \in \mathfrak{M}_F\), there are compact sets \(K_i \subset E_i\) with \[ \mu(K_i) > \mu(E_i)-\frac{\varepsilon}{2^i}. \] Putting \(H_n=K_1 \cup K_2 \cup \cdots \cup K_n\), a union of pairwise disjoint compact sets, we see \(E \supset H_n\) and \[ \mu(E) \geq \mu(H_n)=\sum_{i=1}^{n}\mu(K_i)>\sum_{i=1}^{n}\mu(E_i)-\varepsilon. \] This inequality holds for all \(n\) and \(\varepsilon\), therefore \[ \mu(E) \geq \sum_{i=1}^{\infty}\mu(E_i). \] Combined with Step 3, the identity holds.

Finally we show that \(E \in \mathfrak{M}_F\) if \(\mu(E) <\infty\). To make it more understandable, we use elementary calculus notation. If we write \(\mu(E)=x\) and \(x_n=\sum_{i=1}^{n}\mu(E_i)\), the identity just proved says \[ \lim_{n \to \infty}x_n=x. \] Therefore, for any \(\varepsilon>0\), there exists some \(N \in \mathbb{N}\) such that \[ x-x_N<\varepsilon. \] This is tantamount to \[ \mu(E)<\sum_{i=1}^{N}\mu(E_i)+\varepsilon. \] But by the choice of the compact sets \(K_i\) (made with this \(\varepsilon\)), we have \(\sum_{i=1}^{N}\mu(E_i)<\mu(H_N)+\varepsilon\), so \[ \mu(E)<{\color\red{\sum_{i=1}^{N}\mu(E_i)}}+\varepsilon<{\color\red {\mu(H_N)+\varepsilon}}+\varepsilon=\mu(H_N)+2\varepsilon. \] Since \(H_N \subset E\) is compact and \(\varepsilon\) is arbitrary, \(E\) satisfies the inner regularity requirement; together with \(\mu(E)<\infty\), this gives \(E \in \mathfrak{M}_F\).

Remarks of Step 4. You should realize that we are heavily using the \(\varepsilon\)-definitions of \(\sup\) and \(\inf\). As you may guess, \(\mathfrak{M}_F\) should be a subset of \(\mathfrak{M}\), though we don't yet know whether \(\mathfrak{M}\) is a \(\sigma\)-algebra. In other words, we hope that the countable additivity of \(\mu\) holds on a \(\sigma\)-algebra that properly extends \(\mathfrak{M}_F\). However, it's still difficult to show that \(\mathfrak{M}\) is a \(\sigma\)-algebra, and we need more properties of \(\mathfrak{M}_F\) to go on.

Step 5 - The 'continuity' of \(\mathfrak{M}_F\)

If \(E \in \mathfrak{M}_F\) and \(\varepsilon>0\), there is a compact \(K\) and an open \(V\) such that \(K \subset E \subset V\) and \(\mu(V-K)<\varepsilon\).

There are two ways to write \(\mu(E)\), namely \[ \mu(E)=\sup\{\mu(K):K \subset E\} \quad \text{and} \quad \mu(E)=\inf\{\mu(V):V\supset E\} \] where \(K\) is compact and \(V\) is open. Therefore there exist some \(K\) and \(V\) such that \[ \mu(V)-\frac{\varepsilon}{2}<\mu(E)<\mu(K)+\frac{\varepsilon}{2}. \] Since \(V-K=V \cap K^c\) is open (compact sets are closed in a Hausdorff space) and \(\mu(V-K)<\infty\), Step 2 gives \(V-K \in \mathfrak{M}_F\). Since \(K\) and \(V-K\) are disjoint members of \(\mathfrak{M}_F\), Step 4 gives \[ \mu(K)+\mu(V-K)=\mu(V) <\mu(K)+\varepsilon. \] Therefore \(\mu(V-K)<\varepsilon\), as claimed.

Remarks of Step 5. You should be familiar with the \(\varepsilon\)-definitions of \(\sup\) and \(\inf\) now. Since \(V-K =V\cap K^c \subset V\), we have \(\mu(V-K)\leq\mu(V)<\mu(E)+\frac{\varepsilon}{2}<\infty\).

Step 6 - \(\mathfrak{M}_F\) is closed under certain operations

If \(A,B \in \mathfrak{M}_F\), then \(A-B,A\cup B\) and \(A \cap B\) are elements of \(\mathfrak{M}_F\).

In other words, \(\mathfrak{M}_F\) is closed under union, intersection and relative complement. In fact, we merely need to prove \(A-B \in \mathfrak{M}_F\), since \(A \cup B=(A-B) \cup B\) is a disjoint union (handled by Step 4) and \(A\cap B = A-(A-B)\).

By Step 5, for \(\varepsilon>0\), there are compact sets \(K_A\), \(K_B\) and open sets \(V_A\), \(V_B\) such that \(K_A \subset A \subset V_A\), \(K_B \subset B \subset V_B\), \(\mu(V_A-K_A)<\varepsilon\) and \(\mu(V_B-K_B)<\varepsilon\). For \(A-B\) we have \[ A-B \subset V_A-K_B \subset (V_A-K_A) \cup (K_A-V_B) \cup (V_B-K_B). \] Applying Steps 3 and 5, we have \[ \mu(A-B) \leq \mu(V_A-K_A)+\mu(K_A-V_B)+\mu(V_B-K_B)< \varepsilon+\mu(K_A-V_B)+\varepsilon. \] Since \(K_A-V_B=K_A \cap V_B^c\) is a closed subset of the compact set \(K_A\), it is compact as well (a closed subset of a compact set is compact). But \(K_A-V_B \subset A-B\) and \(\mu(A-B) <\mu(K_A-V_B)+2\varepsilon\) with \(\varepsilon\) arbitrary, so \(A-B\) meets the inner regularity requirement of \(\mathfrak{M}_F\). (The fact that \(\mu(A-B)<\infty\) is trivial since \(\mu(A-B)\leq\mu(A)<\infty\).)

Since \(A-B\) and \(B\) are disjoint members of \(\mathfrak{M}_F\), Step 4 gives \[ \mu(A \cup B)=\mu(A-B)+\mu(B)<\infty. \] Thus \(A \cup B \in \mathfrak{M}_F\). Since \(A,A-B \in \mathfrak{M}_F\), we see \(A \cap B = A-(A-B) \in \mathfrak{M}_F\).
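The set algebra in Step 6 is purely combinatorial, so it can be sanity-checked on finite sets. In the sketch below, finite random subsets stand in for the sets of Step 6 (an assumption made only for testing): we draw nested triples \(K_A \subset A \subset V_A\) and \(K_B \subset B \subset V_B\) and check the two decompositions and the covering of \(A-B\). No topology or measure is involved, so this is a check of the identities, not a proof of Step 6; the helper names are hypothetical.

```python
import random


def random_chain(universe, rng):
    """Return three nested random subsets K ⊂ E ⊂ V of `universe`."""
    V = {x for x in universe if rng.random() < 0.7}
    E = {x for x in V if rng.random() < 0.7}
    K = {x for x in E if rng.random() < 0.7}
    return K, E, V


def step6_identities_hold(A, B, KA, VA, KB, VB):
    """Check the set identities and the inclusion chain used in Step 6."""
    union_ok = (A | B) == ((A - B) | B) and not ((A - B) & B)
    inter_ok = (A & B) == (A - (A - B))
    cover_ok = (A - B) <= (VA - KB) <= ((VA - KA) | (KA - VB) | (VB - KB))
    return union_ok and inter_ok and cover_ok


def run_trials(trials, seed):
    """Test the identities on many random configurations over a 30-element universe."""
    rng = random.Random(seed)
    universe = range(30)
    for _ in range(trials):
        KA, A, VA = random_chain(universe, rng)
        KB, B, VB = random_chain(universe, rng)
        if not step6_identities_hold(A, B, KA, VA, KB, VB):
            return False
    return True
```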

Remarks of Step 6. In this step, we demonstrated several ways to express a set, all of which end up with a huge simplification. Now we are able to show that \(\mathfrak{M}_F\) is a subset of \(\mathfrak{M}\).

Step 7 - \(\mathfrak{M}_F \subset \mathfrak{M}\)

There is a precise relation between \(\mathfrak{M}\) and \(\mathfrak{M}_F\) given by \[ \mathfrak{M}_F=\{E \in \mathfrak{M}:\mu(E)<\infty\} \subset \mathfrak{M}. \]

Let \(E \in \mathfrak{M}_F\); we show that \(E \in \mathfrak{M}\). For every compact \(K\), we have \(K\in\mathfrak{M}_F\) by Step 1, so \(K \cap E \in \mathfrak{M}_F\) by Step 6. Therefore \(E \in \mathfrak{M}\).

Conversely, let \(E \in \mathfrak{M}\) with \(\mu(E)<\infty\); we need to show that \(E \in \mathfrak{M}_F\). By the definition of \(\mu\), for \(\varepsilon>0\), there is an open \(V \supset E\) such that \[ \mu(V)<\mu(E)+\varepsilon<\infty. \] Therefore \(V \in \mathfrak{M}_F\) by Step 2. By Step 5 applied to \(V\), there are a compact \(K \subset V\) and an open \(W \supset V\) with \(\mu(W-K)<\varepsilon\); since \(V-K \subset W-K\), we get \(\mu(V-K)<\varepsilon\). Since \(E \in \mathfrak{M}\), we have \(E \cap K \in \mathfrak{M}_F\), so there exists a compact set \(H \subset E \cap K\) with \[ \mu(E \cap K)<\mu(H)+\varepsilon. \] Since \(E \subset (E \cap K) \cup (V-K)\), it follows from Step 3 that \[ \mu(E) \leq {\color\red{\mu(E\cap K)}}+\mu(V-K)<{\color\red{\mu(H)+\varepsilon}}+\varepsilon=\mu(H)+2\varepsilon. \] Since \(H \subset E\) is compact and \(\varepsilon\) is arbitrary, \(E\) is inner regular, and therefore \(E \in \mathfrak{M}_F\).

Remarks of Step 7. Several tricks in the preceding steps are used here. Now we are pretty close to the fact that \((X,\mathfrak{M},\mu)\) is a measure space. Note that for \(E \in \mathfrak{M}-\mathfrak{M}_F\), we have \(\mu(E)=\infty\), but we have already proved the countable additivity for \(\mathfrak{M}_F\). Is it 'almost trivial' for \(\mathfrak{M}\)? Before that, we need to show that \(\mathfrak{M}\) is a \(\sigma\)-algebra. Note that assertion 3 of \(\mu\) has been proved.

Step 8 - \(\mathfrak{M}\) is a \(\sigma\)-algebra in \(X\) containing all Borel sets

We will verify the axioms of a \(\sigma\)-algebra one by one.

\(X \in \mathfrak{M}\).

For any compact \(K \subset X\), we have \(K \cap X=K\). But as proved in Step 1, \(K \in \mathfrak{M}_F\), therefore \(X \in \mathfrak{M}\).

If \(A \in \mathfrak{M}\), then \(A^c \in\mathfrak{M}\).

Let \(A \in \mathfrak{M}\) and let \(K\) be compact. Then \(A \cap K \in \mathfrak{M}_F\), and \[ K-(A \cap K)=K \cap(A^c \cup K^c)=(K\cap A^c) \cup \varnothing=K \cap A^c. \] By Step 1 and Step 6, we see \(K \cap A^c \in \mathfrak{M}_F\). Since \(K\) is arbitrary, \(A^c \in \mathfrak{M}\).

If \(A_n \in \mathfrak{M}\) for all \(n \in \mathbb{N}\), then \(A=\bigcup_{n=1}^{\infty}A_n \in \mathfrak{M}\).

Fix a compact set \(K\). We define an auxiliary sequence of sets inductively. For \(n=1\), put \(B_1=A_1 \cap K\). Then \(B_1 \in \mathfrak{M}_F\). For \(n \geq 2\), put \[ B_n=(A_n \cap K)-(B_1 \cup \cdots\cup B_{n-1}). \] Since \(A_n \cap K \in \mathfrak{M}_F\) and \(B_1,B_2,\cdots,B_{n-1} \in \mathfrak{M}_F\), Step 6 gives \(B_n \in \mathfrak{M}_F\). Moreover, the sets \(B_n\) are pairwise disjoint.

Another set-theoretic manipulation shows that \[ \begin{aligned} A \cap K&=K \cap\left(\bigcup_{n=1}^{\infty}A_n\right) \\ &=\bigcup_{n=1}^{\infty}(K \cap A_n) \\ &=\bigcup_{n=1}^{\infty}B_n, \end{aligned} \] where the last equality holds because, by induction, \(\bigcup_{n=1}^{N}(K \cap A_n)=\bigcup_{n=1}^{N}B_n\) for every \(N\). Now we are able to evaluate \(\mu(A \cap K)\) by Step 4: \[ \mu(A \cap K)=\sum_{n=1}^{\infty}\mu(B_n) \leq \mu(K)<\infty, \] where the inequality follows from \(A \cap K \subset K\). Hence, by Step 4, \(A \cap K \in \mathfrak{M}_F\), and since \(K\) is arbitrary, \(A \in \mathfrak{M}\).
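The disjointification \(B_1=S_1\), \(B_n=S_n-(B_1\cup\cdots\cup B_{n-1})\), with \(S_n=A_n\cap K\), is a general set-theoretic device. A minimal sketch on finite Python sets (the name `disjointify` is hypothetical) shows that it produces pairwise disjoint sets with the same union as the \(S_n\).

```python
def disjointify(sets):
    """B_1 = S_1, B_n = S_n - (B_1 ∪ ... ∪ B_{n-1}); return the list of the B_n."""
    result, seen = [], set()
    for s in sets:
        b = set(s) - seen  # remove everything already covered by earlier B_k
        result.append(b)
        seen |= b
    return result
```

For example, with \(S_1=\{0,\cdots,9\}\), \(S_2=\{5,\cdots,14\}\), \(S_3=\{1,2,18\}\), the output is \(\{0,\cdots,9\}\), \(\{10,\cdots,14\}\), \(\{18\}\).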

\(\mathfrak{M}\) contains all Borel sets.

Indeed, since \(\mathfrak{M}\) is a \(\sigma\)-algebra, it suffices to prove that \(\mathfrak{M}\) contains all closed sets, or equivalently all open sets. We show two different paths. Let \(K\) be a compact set.

  1. If \(C\) is closed, then \(C \cap K\) is compact, therefore \(C \cap K \in \mathfrak{M}_F\) by Step 1. Hence \(C \in \mathfrak{M}\).
  2. If \(D\) is open, then \(K-D=K \cap D^c\) is compact, so \(D \cap K=K-(K-D) \in \mathfrak{M}_F\) by Steps 1 and 6. Hence \(D \in \mathfrak{M}\).

Therefore, by 1 or 2, \(\mathfrak{M}\) contains all Borel sets, since the Borel sets form the smallest \(\sigma\)-algebra containing all open sets.

Step 9 - \(\mu\) is a positive measure on \(\mathfrak{M}\)

Again, we will verify all properties of \(\mu\) one by one.

\(\mu(E) \geq 0\) for all \(E \in \mathfrak{M}\).

This follows immediately from the definition of \(\mu\), since \(\Lambda\) is positive and \(0 \leq f \leq 1\).

\(\mu\) is countably additive.

If \(A_1,A_2,\cdots\) form a disjoint countable collection of members of \(\mathfrak{M}\), we need to show that \[ \mu\left(\bigcup_{n=1}^{\infty}A_n\right)=\sum_{n=1}^{\infty}\mu(A_n). \] If \(\mu(A_n)<\infty\) for all \(n\), then \(A_n \in \mathfrak{M}_F\) by Step 7, and the identity is merely what we proved in Step 4. If \(\mu(A_j)=\infty\) for some \(j\), then \(\sum_n\mu(A_n)=\infty\). On the other hand, \(\bigcup_n A_n \supset A_j\), so \(\mu(\bigcup_n A_n) \geq \mu(A_j)=\infty\). The identity is now proved.

Step 10 - The completeness of \(\mu\)

So far assertions 1-3 have been proved, but the final assertion has not been proved explicitly. We do that now, since this property will be used when discussing the Lebesgue measure \(m\). It shows that \((X,\mathfrak{M},\mu)\) is a complete measure space.

If \(E \in \mathfrak{M}\), \(A \subset E\), and \(\mu(E)=0\), then \(A \in \mathfrak{M}\).

It suffices to show that \(A \in \mathfrak{M}_F\). By monotonicity, \(\mu(A) \leq \mu(E)=0\), so \(\mu(A)=0<\infty\). If \(K \subset A\), where \(K\) is compact, then \(0 \leq \mu(K) \leq \mu(A)=0\). Therefore \(0=\mu(A)\) is the supremum of \(\mu(K)\) over compact \(K \subset A\). It follows that \(A \in \mathfrak{M}_F \subset \mathfrak{M}\).

Step 11 - The functional and the measure

For every \(f \in C_c(X)\), \(\Lambda{f}=\int_X fd\mu\).

This is the main result of the theorem. It suffices to prove the inequality \[ \Lambda f \leq \int_X fd\mu \] for all \(f \in C_c(X)\). What about the other direction? By the linearity of \(\Lambda\) and of \(\int_X \cdot d\mu\), once the inequality above is proved, applying it to \(-f\) gives \[ \Lambda(-f)=-\Lambda{f}\leq\int_{X}-fd\mu=-\int_Xfd\mu. \] Therefore \[ \Lambda{f} \geq \int_X fd\mu \] holds as well, and this establishes the equality.

Let \(K=\operatorname{supp}(f)\). Since \(K\) is compact and \(f\) is continuous, the range of \(f\) is compact (it is contained in \(f(K) \cup \{0\}\)), so we may pick \([a,b]\) containing the range of \(f\). For \(\varepsilon>0\), pick points \(y_0,\cdots,y_n\) such that \(y_i - y_{i-1}<\varepsilon\) for all \(i\) and \[ y_0 < a < y_1<\cdots<y_n=b. \] Put \[ E_i=\{x:y_{i-1}< f(x) \leq y_i\}\cap K. \] Since \(f\) is continuous, \(f\) is Borel measurable, so the sets \(E_i\) are pairwise disjoint Borel sets. By assertion 2, there are open sets \(V_i \supset E_i\) such that \[ \mu(V_i) < \mu(E_i)+\frac{\varepsilon}{n} \] for \(i=1,2,\cdots,n\), and such that \(f(x)<y_i + \varepsilon\) for all \(x \in V_i\) (intersect with the open set \(\{x:f(x)<y_i+\varepsilon\}\), which contains \(E_i\)). Notice that \((V_i)\) covers \(K\); therefore, by corollary 1.3, there are functions \(h_1,\cdots,h_n\) such that \(h_i \prec V_i\) for all \(i\) and \(\sum h_i=1\) on \(K\). Hence \(f=\sum_i h_if\), and Step 1 (applied to \(K \prec \sum_i h_i\)) gives \[ \mu(K) \leq \Lambda(\sum_i h_i)=\sum_i \Lambda{h_i}. \] By the way we picked \(V_i\), we see \(h_if \leq (y_i+\varepsilon)h_i\). We have the following inequality: \[ \begin{aligned} \Lambda{f} &= \sum_{i=1}^{n}\Lambda(h_if) \leq\sum_{i=1}^{n}(y_i+\varepsilon)\Lambda{h_i} \\ &= \sum_{i=1}^{n}\left(|a|-|a|+y_i+\varepsilon\right)\Lambda{h_i} \\ &=\sum_{i=1}^{n}(|a|+y_i+\varepsilon)\Lambda{h_i}-|a|\sum_{i=1}^{n}\Lambda{h_i}. \end{aligned} \] Since \(h_i \prec V_i\), we have \(\mu(E_i)+\frac{\varepsilon}{n}>\mu(V_i) \geq \Lambda{h_i}\). And we already know \(\sum_i \Lambda{h_i} \geq \mu(K)\). Since the coefficients \(|a|+y_i+\varepsilon\) are positive (as \(y_i > a \geq -|a|\) for \(i \geq 1\)), putting these into the inequality above gives \[ \begin{aligned} \Lambda{f} &\leq \sum_{i=1}^{n}(|a|+y_i+\varepsilon)\Lambda{h_i}-|a|\sum_{i=1}^{n}\Lambda{h_i} \\ &\leq \sum_{i=1}^{n}(|a|+y_i+\varepsilon){\color\red{(\mu(E_i)+\frac{\varepsilon}{n})}}-|a|\color\red{\mu(K)}. \end{aligned} \] Observe that \(\bigcup_i E_i=K\) (since \(f(K) \subset [a,b] \subset (y_0,y_n]\)), so by Step 9 we have \(\sum_{i}\mu(E_i)=\mu(K)\).
A slight manipulation shows that \[ \begin{aligned} \sum_{i=1}^{n}(|a|+y_i+\varepsilon)\mu(E_i)-|a|\mu(K)&=|a|\sum_{i=1}^{n}\mu(E_i)-|a|\mu(K)+\sum_{i=1}^{n}(y_i+\varepsilon)\mu(E_i) \\ &=\sum_{i=1}^{n}(y_i-\varepsilon)\mu(E_i)+2\varepsilon\mu(K). \end{aligned} \] Therefore for \(\Lambda f\) we get \[ \begin{aligned} \Lambda{f} &\leq\sum_{i=1}^{n}(|a|+y_i+\varepsilon)(\mu(E_i)+\frac{\varepsilon}{n})-|a|\mu(K) \\ &=\sum_{i=1}^{n}(y_i-\varepsilon)\mu(E_i)+2\varepsilon\mu(K)+\frac{\varepsilon}{n}\sum_{i=1}^n(|a|+y_i+\varepsilon). \end{aligned} \] Now here comes the trickiest part of the whole blog post. By definition of \(E_i\), we see \(f(x) > y_{i-1}>y_{i}-\varepsilon\) for \(x \in E_i\). Therefore we get a simple function \(s_n\) by \[ s_n=\sum_{i=1}^{n}(y_i-\varepsilon)\chi_{E_i}, \] which satisfies \(s_n \leq f\) on \(X\) (on \(E_i\) we have \(s_n=y_i-\varepsilon<f\), and outside \(K\) both vanish). Integrating with respect to \(\mu\), we see \[ \int_X s_nd\mu={\color\red{\sum_{i=1}^{n}(y_i-\varepsilon)\mu(E_i)}} \leq {\color\red{\int_X fd\mu}}. \] For \(2\varepsilon\mu(K)\), things are simple since \(0\leq\mu(K)<\infty\), so \(2\varepsilon\mu(K) \to 0\) as \(\varepsilon \to 0\). Now let's estimate the final part of the inequality. It's trivial that \(\frac{\varepsilon}{n}\sum_{i=1}^{n}(|a|+\varepsilon)=\varepsilon(\varepsilon+|a|)\). For \(y_i\), observe that \(y_i \leq b\) for all \(i\), therefore \(\frac{\varepsilon}{n}\sum_{i=1}^{n}y_i \leq \frac{\varepsilon}{n}nb=\varepsilon b\). Thus \[ {\color\green{\frac{\varepsilon}{n}\sum_{i=1}^{n}(|a|+y_i+\varepsilon)}} \color\black\leq {\color\green {\varepsilon(|a|+b+\varepsilon)}}\color\black{.} \] Notice that \(b+|a| \geq 0\) since \(b \geq a \geq -|a|\).
Our estimation of \(\Lambda{f}\) is finally done: \[ \begin{aligned} \Lambda{f} &\leq{\color\red{\sum_{i=1}^{n}(y_i-\varepsilon)\mu(E_i)}}+2\varepsilon\mu(K)+{\color\green{\frac{\varepsilon}{n}\sum_{i=1}^n(|a|+y_i+\varepsilon)}} \\ &\leq{\color\red {\int_Xfd\mu}}+2\varepsilon\mu(K)+{\color\green{\varepsilon(|a|+b+\varepsilon)}} \\ &= \int_X fd\mu+\varepsilon(2\mu(K)+|a|+b+\varepsilon). \end{aligned} \] Since \(\varepsilon\) is arbitrary, we see \(\Lambda{f} \leq \int_X fd\mu\). The identity is proved.
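Step 11 partitions the range of \(f\) rather than its domain. The sketch below replays that bookkeeping for a toy functional (our assumption, not the post's): \(\Lambda f=\sum_j w_jf(p_j)\), whose measure is a finite sum of point masses, so every quantity is a finite sum. Here we simply bin every atom. It builds \(y_0<a<y_1<\cdots<y_n=b\) with mesh below \(\varepsilon\), computes \(\mu(E_i)\), and returns the lower bound \(\sum_i(y_i-\varepsilon)\mu(E_i)\), the exact \(\int fd\mu\), and the upper bound \(\sum_i y_i\mu(E_i)\). The gap between the bounds is \(\varepsilon\) times the total mass. The helper names are hypothetical.

```python
import math


def range_partition(a, b, eps):
    """Points y_0 < a < y_1 < ... < y_n = b with equal spacing h < eps/2 (assumes a < b)."""
    if not a < b:
        raise ValueError("need a < b")
    n = math.ceil((b - a) / (eps / 2)) + 1
    h = (b - a) / (n - 0.5)  # then y_0 = a - h/2 and y_1 = a + h/2
    return [b - (n - i) * h for i in range(n + 1)]


def lebesgue_bounds(f, points, weights, eps):
    """(lower, exact, upper) for the integral of f against Σ_j w_j δ_{p_j}.

    lower = Σ (y_i - eps) μ(E_i) mirrors ∫ s_n dμ from Step 11, and
    upper = Σ y_i μ(E_i), where E_i = {y_{i-1} < f <= y_i}.
    """
    values = [f(p) for p in points]
    ys = range_partition(min(values), max(values), eps)
    lower = upper = 0.0
    for y_prev, y in zip(ys, ys[1:]):
        m = sum(w for v, w in zip(values, weights) if y_prev < v <= y)  # μ(E_i)
        lower += (y - eps) * m
        upper += y * m
    exact = sum(w * v for v, w in zip(values, weights))
    return lower, exact, upper
```

For instance, with \(f(x)=x\max(0,1-|x|)\), atoms at \(\frac{k}{10}\) for \(-20\leq k\leq 20\), and weights \(1+(k \bmod 3)\) (total mass \(81\)), the three values satisfy lower \(\leq\) exact \(\leq\) upper, and upper minus lower is \(81\varepsilon\) up to rounding.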

Step 12 - The uniqueness of \(\mu\)

If there are two measures \(\mu_1\) and \(\mu_2\) that satisfy assertions 1 to 4 and correspond to \(\Lambda\), then \(\mu_1=\mu_2\).

In fact, according to assertions 2 and 3, such a measure is determined by its values on compact subsets of \(X\). It suffices to show that

If \(K\) is a compact subset of \(X\), then \(\mu_1(K)=\mu_2(K)\).

Fix \(K\) compact and \(\varepsilon>0\). By assertion 2 for \(\mu_2\), there exists an open \(V \supset K\) such that \(\mu_2(V)<\mu_2(K)+\varepsilon\). By Urysohn's lemma, there exists some \(f\) such that \(K \prec f \prec V\). Hence \[ \begin{aligned} \mu_1(K)&=\int_X\chi_Kd\mu_1 \leq\int_X fd\mu_1=\Lambda{f}=\int_X fd\mu_2 \\ &\leq \int_X \chi_Vd\mu_2=\mu_2(V)<\mu_2(K)+\varepsilon. \end{aligned} \] Since \(\varepsilon\) is arbitrary, \(\mu_1(K) \leq \mu_2(K)\). If \(\mu_1\) and \(\mu_2\) are exchanged, we see \(\mu_2(K) \leq \mu_1(K)\). The uniqueness is proved.

The flaw

Can we simply put \(X=\mathbb{R}^k\) right now? The answer is no. Note that outer regularity holds for all sets, but inner regularity is only guaranteed for open sets and members of \(\mathfrak{M}_F\). We would expect outer and inner regularity to be 'symmetric', and there is an example showing that local compactness is far from enough to provide this 'symmetry'.

A weird example

Define \(X=\mathbb{R}_1 \times \mathbb{R}_2\), where \(\mathbb{R}_1\) is the real line equipped with the discrete metric \(d_1\), and \(\mathbb{R}_2\) is the real line equipped with the euclidean metric \(d_2\). The metric of \(X\) is defined by \[ d_X((x_1,y_1),(x_2,y_2))=d_1(x_1,x_2)+d_2(y_1,y_2). \] The topology \(\tau_X\) induced by \(d_X\) is Hausdorff (being metric) and locally compact: every point \((x,y)\) has the neighborhood \(\{x\}\times(y-1,y+1)\), whose closure is a compact vertical segment. So what happens on this weird locally compact Hausdorff space?

If \(f \in C_c(X)\), let \(x_1,x_2,\cdots,x_n\) be those values of \(x\) for which \(f(x,y) \neq 0\) for at least one \(y\). Since \(f\) has compact support, there are only finitely many such \(x_i\). We are thus able to define a positive linear functional by \[ \Lambda f=\sum_{i=1}^{n}\int_{-\infty}^{+\infty}f(x_i,y)dy=\int_X fd\mu, \] where \(\mu\) is the measure associated with \(\Lambda\) in the sense of the R-M-K theorem. Let \[ E=\mathbb{R}_1 \times \{0\}. \] Every compact \(K \subset E\) is finite, since \(E\) carries the discrete topology, and by squeezing the disjoint vertical segments \(\{x\}\times(-\delta,\delta)\) (each of measure \(2\delta\)) around its points, we see \(\mu(K)=0\) for all compact \(K \subset E\). On the other hand, every open \(V \supset E\) contains such a segment around each of the uncountably many points of \(E\); infinitely many of these segments have length at least some fixed \(\frac{2}{n}\), so \(\mu(V)=\infty\), and by outer regularity \(\mu(E)=\infty\).

This is in stark contrast to what we expect. However, if \(X\) is required to be \(\sigma\)-compact (the space in this example is not), problems of this kind disappear neatly.

References / Further reading

  1. Walter Rudin, Real and Complex Analysis
  2. Serge Lang, Fundamentals of Differential Geometry
  3. Joel W. Robbin, Partition of Unity
  4. Brian Conrad, Paracompactness and local compactness
  5. Raoul Bott & Loring W. Tu, Differential Forms in Algebraic Topology

The Big Three Pt. 4 - The Open Mapping Theorem (F-Space)

The Open Mapping Theorem

We are finally going to prove the open mapping theorem for \(F\)-spaces. In this version, only an invariant metric and completeness are required, so the Banach space version is contained in it as a special case.

(Theorem 0) Suppose we have the following conditions:

  1. \(X\) is an \(F\)-space,
  2. \(Y\) is a topological vector space,
  3. \(\Lambda: X \to Y\) is continuous and linear, and
  4. \(\Lambda(X)\) is of the second category in \(Y\).

Then \(\Lambda\) is an open mapping.

Proof. Let \(B\) be a neighborhood of \(0\) in \(X\). Let \(d\) be an invariant metric on \(X\) that is compatible with the \(F\)-topology of \(X\). Define a sequence of balls by \[ B_n=\{x:d(x,0) < \frac{r}{2^n}\} \] where \(r\) is picked in such a way that \(B_0 \subset B\). Since translations are homeomorphisms and \(\Lambda\) is linear, to show that \(\Lambda\) is an open mapping it suffices to prove that there exists some neighborhood \(W\) of \(0\) in \(Y\) such that \[ W \subset \Lambda(B). \] To do this, we need an auxiliary set. In fact, we will show that there exists some \(W\) such that \[ W \subset \overline{\Lambda(B_1)} \subset \Lambda(B). \] We prove the two inclusions one at a time.

The first inclusion requires BCT. Since \(B_2 -B_2 \subset B_1\) and \(Y\) is a topological vector space, we get \[ \overline{\Lambda(B_2)}-\overline{\Lambda(B_2)} \subset \overline{\Lambda(B_2)-\Lambda(B_2)} \subset \overline{\Lambda(B_1)}. \] Since \(x/k \to 0\) for every \(x \in X\), each \(x\) lies in \(kB_2\) for \(k\) large enough, so \[ \Lambda(X)=\bigcup_{k=1}^{\infty}k\Lambda(B_2). \] As \(\Lambda(X)\) is of the second category in \(Y\), at least one \(k\Lambda(B_2)\) is of the second category in \(Y\) (otherwise \(\Lambda(X)\) would be a countable union of sets of the first category). Since scalar multiplication \(y\mapsto ky\) is a homeomorphism of \(Y\) onto \(Y\), \(k\Lambda(B_2)\) is of the second category for every \(k\), in particular for \(k=1\). Therefore \(\overline{\Lambda(B_2)}\) has nonempty interior. If \(V\) is a nonempty open subset of \(\overline{\Lambda(B_2)}\), then \(W=V-V\) is an open neighborhood of \(0\) in \(Y\), and \(W \subset \overline{\Lambda(B_2)}-\overline{\Lambda(B_2)} \subset \overline{\Lambda(B_1)}\). Replacing the indices, the same argument shows that for every \(n \geq 1\) there exists some neighborhood \(W_n\) of \(0\) in \(Y\) such that \(W_n \subset \overline{\Lambda(B_n)}\).

The second inclusion requires the completeness of \(X\). Fix \(y_1 \in \overline{\Lambda(B_1)}\); we will show that \(y_1 \in \Lambda(B)\). We pick \(y_n\) inductively. Assume \(y_n\) has been chosen in \(\overline{\Lambda(B_n)}\). As stated before, there exists some neighborhood \(W_{n+1}\) of \(0\) in \(Y\) such that \(W_{n+1} \subset \overline{\Lambda(B_{n+1})}\). Hence \[ (y_n-W_{n+1}) \cap \Lambda(B_n) \neq \varnothing, \] so there exists some \(x_n \in B_n\) such that \[ \Lambda x_n \in y_n - W_{n+1}. \] Put \(y_{n+1}=y_n-\Lambda x_n\); then \(y_{n+1} \in W_{n+1} \subset \overline{\Lambda(B_{n+1})}\). Therefore we are able to pick \(y_n\) and \(x_n\) for all \(n \geq 1\).

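Put \(z_n=\sum_{k=1}^{n}x_k\). Since \(d(x_n,0)<\frac{r}{2^n}\) for all \(n \geq 1\), the invariance of \(d\) gives, for \(m>n\), \[ d(z_m,z_n)=d\Big(\sum_{k=n+1}^{m}x_k,0\Big) \leq \sum_{k=n+1}^{m}d(x_k,0)<\frac{r}{2^n}, \] so \((z_n)\) is a Cauchy sequence and converges to some \(z \in X\), since \(X\) is an \(F\)-space. Notice we also have \[ \begin{aligned} d(z,0)& \leq d(x_1,0)+d(x_2,0)+\cdots \\ & < \frac{r}{2}+\frac{r}{4}+\cdots \\ & = r, \end{aligned} \] so \(z \in B_0 \subset B\).

By the continuity of \(\Lambda\), we see \(\lim_{n \to \infty}y_n = 0\): given a neighborhood \(U\) of \(0\) in \(Y\), pick a neighborhood \(U'\) of \(0\) with \(\overline{U'} \subset U\); since \(\Lambda^{-1}(U')\) is a neighborhood of \(0\) in \(X\), we have \(\Lambda(B_n) \subset U'\) for all large \(n\), hence \(y_n \in \overline{\Lambda(B_n)} \subset \overline{U'} \subset U\). Notice we also have \[ \sum_{k=1}^{n} \Lambda x_k = \sum_{k=1}^{n}(y_k-y_{k+1})=y_1-y_{n+1} \to y_1 \quad (n \to \infty). \] Since the left-hand side is \(\Lambda z_n\), which tends to \(\Lambda z\) by continuity, we see \(y_1 = \Lambda z \in \Lambda(B)\).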

The whole theorem is now proved, that is, \(\Lambda\) is an open mapping. \(\square\)


You may think the following relation comes from nowhere: \[ (y_n - W_{n+1}) \cap \Lambda(B_{n}) \neq \varnothing. \] But it does not. Recall some point-set topology: \(y_n\) lies in the closure of \(\Lambda(B_n)\), and \(y_n-W_{n+1}\) is a neighborhood of \(y_n\). Every neighborhood of a point in the closure of a set meets that set, so \((y_n - W_{n+1}) \cap \Lambda(B_{n})\) cannot be empty.

The geometric series \[ \frac{\varepsilon}{2}+\frac{\varepsilon}{4}+\cdots+\frac{\varepsilon}{2^n}+\cdots=\varepsilon \] is widely used for estimating sums. It is a good idea to keep this technique in mind.
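For a quick sanity check of this series (an illustration only, using Python's standard `fractions` module), the partial sums can be computed exactly: after \(n\) terms the sum is \(\varepsilon(1-2^{-n})\), so the tail is exactly \(\varepsilon/2^n\). This is the same kind of bound that controls \(d(z,0)\) in the proof above. The helper `partial_sum` is a hypothetical name chosen for this sketch.

```python
from fractions import Fraction

def partial_sum(eps, n):
    """Exact value of eps/2 + eps/4 + ... + eps/2**n."""
    return sum((eps / 2**k for k in range(1, n + 1)), Fraction(0))

eps = Fraction(1, 10)  # any positive rational will do
for n in (1, 2, 5, 10):
    s = partial_sum(eps, n)
    print(n, s, eps - s)  # the tail eps - s equals eps / 2**n
```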


Formal proofs of the following corollaries will not be written out here, but they are quite easy.

(Corollary 0) \(\Lambda(X)=Y\).

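This is an immediate consequence of the fact that \(\Lambda\) is open. Since \(X\) is open in itself, \(\Lambda(X)\) is an open subspace of \(Y\). But the only open subspace of \(Y\) is \(Y\) itself: an open subspace \(M\) contains a neighborhood \(W\) of \(0\), and for every \(y \in Y\) we have \(ty \in W \subset M\) for some small enough \(t>0\), hence \(y \in M\).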

(Corollary 1) \(Y\) is a \(F\)-space as well.

If you have already seen the commutative diagram for the quotient space (put \(N=\ker\Lambda\)), you know that the induced map \(f\) is open and continuous. Treating topological vector spaces as groups, Corollary 0 and the first isomorphism theorem give \[ X/\ker\Lambda \simeq \Lambda(X)=Y. \] Therefore \(f\) is an isomorphism, hence one-to-one, and since it is open and continuous, \(f\) is a homeomorphism as well. We showed in the post on the completeness of the quotient space that \(X/\ker{\Lambda}\) is an \(F\)-space, therefore \(Y\) has to be an \(F\)-space as well. (We are using the fact that \(\ker{\Lambda}\) is a closed set. But why closed?)

(Corollary 2) If \(\Lambda\) is a continuous linear mapping of an \(F\)-space \(X\) onto an \(F\)-space \(Y\), then \(\Lambda\) is open.

This is a direct application of BCT and the open mapping theorem: by BCT, the complete metric space \(Y=\Lambda(X)\) is of the second category.

(Corollary 3) If the linear map \(\Lambda\) in Corollary 2 is injective, then \(\Lambda^{-1}:Y \to X\) is continuous.

This comes from Corollary 2 directly: since \(\Lambda\) is open, \((\Lambda^{-1})^{-1}(U)=\Lambda(U)\) is open for every open \(U \subset X\).

(Corollary 4) If \(X\) and \(Y\) are Banach spaces, and if \(\Lambda: X \to Y\) is a continuous linear bijective map, then there exist positive real numbers \(a\) and \(b\) such that \[ a \lVert x \rVert \leq \lVert \Lambda{x} \rVert \leq b\lVert x \rVert \] for every \(x \in X\).

This comes from Corollary 3 directly, since both \(\Lambda\) and \(\Lambda^{-1}\) are continuous and hence bounded; one can take \(b=\lVert\Lambda\rVert\) and \(a=1/\lVert\Lambda^{-1}\rVert\).
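To see Corollary 4 in the simplest setting, here is a sketch that assumes \(X=Y=\mathbb{R}^2\) with the Euclidean norm and a hypothetical matrix `Lam` (neither is from the post; in finite dimensions every linear bijection is automatically bicontinuous). Then the best constants are the singular values of \(\Lambda\): \(b=\lVert\Lambda\rVert=\sigma_{\max}\) and \(a=1/\lVert\Lambda^{-1}\rVert=\sigma_{\min}\). The code computes them from the eigenvalues of \(\Lambda^T\Lambda\) and spot-checks the inequality on random vectors.

```python
import math
import random

def singular_values_2x2(m):
    """Singular values (largest first) of a 2x2 matrix m = [[m11, m12], [m21, m22]]."""
    (a11, a12), (a21, a22) = m
    # entries of the symmetric matrix m^T m
    p = a11 * a11 + a21 * a21
    q = a11 * a12 + a21 * a22
    r = a12 * a12 + a22 * a22
    mean = (p + r) / 2
    rad = math.hypot((p - r) / 2, q)  # half the gap between the two eigenvalues
    return math.sqrt(mean + rad), math.sqrt(max(mean - rad, 0.0))

def apply(m, x):
    """Matrix-vector product for a 2x2 matrix."""
    return (m[0][0] * x[0] + m[0][1] * x[1], m[1][0] * x[0] + m[1][1] * x[1])

Lam = [[2.0, 1.0], [0.5, 3.0]]   # det = 5.5, so Lambda is a bijection of R^2
b, a = singular_values_2x2(Lam)  # b = ||Lambda||, a = 1/||Lambda^{-1}||

random.seed(1)
for _ in range(10_000):
    x = (random.uniform(-1, 1), random.uniform(-1, 1))
    nx, nLx = math.hypot(*x), math.hypot(*apply(Lam, x))
    # a*||x|| <= ||Lambda x|| <= b*||x||, allowing for rounding
    assert a * nx <= nLx * (1 + 1e-12) and nLx <= b * nx * (1 + 1e-12)
print(f"a = {a:.4f}, b = {b:.4f}")
```

For a \(2\times 2\) matrix one also has \(ab=|\det\Lambda|\), which gives a quick consistency check on the two constants.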

(Corollary 5) If \(\tau_1 \subset \tau_2\) are vector topologies on a vector space \(X\) and if both \((X,\tau_1)\) and \((X,\tau_2)\) are \(F\)-spaces, then \(\tau_1 = \tau_2\).

This is obtained by applying Corollary 3 to the identity mapping \(\iota:(X,\tau_2) \to (X,\tau_1)\), which is continuous because \(\tau_1 \subset \tau_2\).

(Corollary 6) If \(\lVert \cdot \rVert_1\) and \(\lVert \cdot \rVert_2\) are two norms on a vector space \(X\) such that

  • \(\lVert\cdot\rVert_1 \leq K\lVert\cdot\rVert_2\) for some constant \(K>0\), and
  • \((X,\lVert\cdot\rVert_1)\) and \((X,\lVert\cdot\rVert_2)\) are both Banach spaces,

then \(\lVert\cdot\rVert_1\) and \(\lVert\cdot\rVert_2\) are equivalent.

This is merely a special case of Corollary 5: the inequality says that the topology induced by \(\lVert\cdot\rVert_1\) is contained in the one induced by \(\lVert\cdot\rVert_2\).

The series

Since there is no strong reason to write more posts on this topic, i.e. the three fundamental theorems of linear functional analysis, I think it's time to make a list of the series. It's been around half a year.

The completeness of the quotient space (topological vector space)

The Goal

We are going to show the completeness of \(X/N\), where \(X\) is an \(F\)-space (a TVS whose topology is induced by a complete invariant metric) and \(N\) is a closed subspace. Along the way, a bunch of useful analysis tricks will be demonstrated (which is why you may find this blog post a little tedious). More importantly, the theorem proved here will be used in the future.

The main process

To be precise, we first give a formal definition of an \(F\)-space.

A topological vector space \(X\) is an \(F\)-space if its topology \(\tau\) is induced by a complete invariant metric \(d\).

A metric \(d\) on a vector space \(X\) will be called invariant if for all \(x,y,z \in X\), we have \[ d(x+z,y+z)=d(x,y). \] By complete we mean every Cauchy sequence of \((X,d)\) converges.
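As a hypothetical illustration (not taken from the post), \(d(x,y)=\frac{|x-y|}{1+|x-y|}\) is a complete invariant metric on \(\mathbb{R}\) that induces the usual topology, yet \(x \mapsto d(x,0)\) is not a norm, since \(d(2x,0) \neq 2d(x,0)\) in general. The sketch below spot-checks invariance and the triangle inequality on random samples; it is a sanity check, not a proof.

```python
import random

def d(x, y):
    """Bounded translation-invariant metric on the real line."""
    t = abs(x - y)
    return t / (1 + t)

random.seed(0)
for _ in range(10_000):
    x, y, z = (random.uniform(-100, 100) for _ in range(3))
    # invariance: d(x+z, y+z) == d(x, y) up to rounding
    assert abs(d(x + z, y + z) - d(x, y)) < 1e-9
    # triangle inequality
    assert d(x, z) <= d(x, y) + d(y, z) + 1e-12

# homogeneity fails: d(2, 0) != 2 d(1, 0), so x -> d(x, 0) is not a norm
print(d(2.0, 0.0), 2 * d(1.0, 0.0))
```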

Defining the quotient metric \(\rho\)

The metric is inherited by the quotient space in a natural way (we will use this fact later), that is

If \(X\) is an \(F\)-space and \(N\) is a closed subspace of \(X\), then \(X/N\) is still an \(F\)-space.

Suppose \(d\) is a complete invariant metric compatible with \(\tau_X\). The metric on \(X/N\) is defined by \[ \boxed{\rho(\pi(x),\pi(y))=\inf_{z \in N}d(x-y,z)} \]

\(\rho\) is a metric

Proof. First, if \(\pi(x)=\pi(y)\), that is, \(x-y \in N\), then taking \(z=x-y\) we see \[ \rho(\pi(x),\pi(y))=\inf_{z \in N}d(x-y,z)=d(x-y,x-y)=0. \] If \(\pi(x) \neq \pi(y)\), however, we shall show that \(\rho(\pi(x),\pi(y))>0\). In this case, we have \(x-y \notin N\). Since \(N\) is closed, \(X-N\) is open, so there exists an open ball \(B_r(x-y)\) centered at \(x-y\) with radius \(r>0\) such that \(B_r(x-y) \cap N = \varnothing\). Hence \(d(x-y,z) \geq r\) for all \(z \in N\), since otherwise \(z \in B_r(x-y) \cap N\). Therefore \(\rho(\pi(x),\pi(y))=\inf_{z \in N}d(x-y,z) \geq r>0\). In general, \(\inf_z d(x-y,z)=0\) if and only if \(x-y \in \overline{N}\).

Next, we shall show that \(\rho(\pi(x),\pi(y))=\rho(\pi(y),\pi(x))\), and it suffices to assume that \(\pi(x) \neq \pi(y)\). Since \(d\) is translation invariant, we get \[ \begin{aligned} d(x-y,z)&=d(x-y-z,0) \\ &=d(0,y-x+z) \\ &=d(-z,y-x) \\ &=d(y-x,-z). \end{aligned} \] As \(z\) ranges over \(N\), so does \(-z\); therefore the infimum of the left-hand side over \(z \in N\) equals that of the right-hand side, which is \(\rho(\pi(y),\pi(x))\). The identity is proved.

Finally, we need to verify the triangle inequality. Let \(r,s,t \in X\). For any \(\varepsilon>0\), there exist some \(z_\varepsilon, z'_\varepsilon \in N\) such that \[ d(r-s,z_\varepsilon)<\rho(\pi(r),\pi(s))+\frac{\varepsilon}{2},\quad d(s-t,z'_\varepsilon)<\rho(\pi(s),\pi(t))+\frac{\varepsilon}{2}. \] Since \(d\) is invariant, we see \[ \begin{aligned} d(r-t,z_\varepsilon+z'_\varepsilon)&=d((r-s)+(s-t)-(z_\varepsilon+z'_\varepsilon),0) \\ &=d([(r-s)-z_\varepsilon]+[(s-t)-z'_\varepsilon],0) \\ &=d(r-s-z_\varepsilon,t-s+z'_\varepsilon) \\ &\leq d(r-s-z_\varepsilon,0)+d(t-s+z'_\varepsilon,0) \\ &=d(r-s,z_\varepsilon)+d(s-t,z'_\varepsilon). \end{aligned} \] (I owe the inequality above to [@LeechLattice](https://onp4.com/@leechlattice).)

Therefore \[ \begin{aligned} d(r-t,z_\varepsilon+z'_\varepsilon)&\leq d(r-s,z_\varepsilon)+d(s-t,z'_\varepsilon) \\ &<\rho(\pi(r),\pi(s))+\rho(\pi(s),\pi(t))+\varepsilon. \end{aligned} \] (Warning: This does not imply that \(\rho(\pi(r),\pi(s))+\rho(\pi(s),\pi(t))=\inf_z d(r-t,z)\) since we don't know whether it is the lower bound or not.)

If \(\rho(\pi(r),\pi(s))+\rho(\pi(s),\pi(t))<\rho(\pi(r),\pi(t))\) however, let \[ 0<\varepsilon<\rho(\pi(r),\pi(t))-(\rho(\pi(r),\pi(s))+\rho(\pi(s),\pi(t))) \] then there exists some \(z''_\varepsilon=z_\varepsilon+z'_\varepsilon\) such that \[ d(r-t,z''_\varepsilon)<\rho(\pi(r),\pi(t)) \] which is a contradiction since \(\rho(\pi(r),\pi(t)) \leq d(r-t,z)\) for all \(z \in N\).

(We are using the \(\varepsilon\) characterization of \(\inf\).)

\(\rho\) is translate invariant

Since \(\pi\) is surjective, every \(u \in X/N\) can be written as \(u=\pi(a)\) for some \(a \in X\). Therefore \[ \begin{aligned} \rho(\pi(x)+u,\pi(y)+u) &=\rho(\pi(x)+\pi(a),\pi(y)+\pi(a)) \\ &=\rho(\pi(x+a),\pi(y+a)) \\ &=\inf_{z \in N}d(x+a-y-a,z) \\ &=\rho(\pi(x),\pi(y)). \end{aligned} \]

\(\rho\) is well-defined

If \(\pi(x)=\pi(x')\) and \(\pi(y)=\pi(y')\), we have to show that \(\rho(\pi(x),\pi(y))=\rho(\pi(x'),\pi(y'))\). In fact, \[ \begin{aligned} \rho(\pi(x),\pi(y)) &\leq \rho(\pi(x),\pi(x'))+\rho(\pi(x'),\pi(y'))+\rho(\pi(y'),\pi(y)) \\ &=\rho(\pi(x'),\pi(y')) \end{aligned} \] since \(\rho(\pi(x),\pi(x'))=0\) as \(\pi(x)=\pi(x')\), and likewise \(\rho(\pi(y'),\pi(y))=0\). Meanwhile \[ \begin{aligned} \rho(\pi(x'),\pi(y')) &\leq \rho(\pi(x'),\pi(x)) + \rho(\pi(x),\pi(y)) + \rho(\pi(y),\pi(y')) \\ &= \rho(\pi(x),\pi(y)). \end{aligned} \] Therefore \(\rho(\pi(x),\pi(y))=\rho(\pi(x'),\pi(y'))\).

\(\rho\) is compatible with \(\tau_N\)

To prove this, we need to show that a set \(E \subset X/N\) is open with respect to \(\tau_N\) if and only if \(E\) is a union of open \(\rho\)-balls. We first show a more general statement:

If \(\mathscr{B}\) is a local base for \(\tau\), then the collection \(\mathscr{B}_N\), which contains all sets \(\pi(V)\) where \(V \in \mathscr{B}\), forms a local base for \(\tau_N\).

Proof. We already know that \(\pi\) is continuous, linear and open. Therefore \(\pi(V)\) is open for all \(V \in \mathscr{B}\). Let \(E\) be an open set in \(X/N\) containing \(\pi(0)\). Then \(\pi^{-1}(E)\) is an open set containing \(0\), so there is some \(V \in \mathscr{B}\) with \(V \subset \pi^{-1}(E)\), and therefore \[ \pi(V) \subset E. \] Hence \(\mathscr{B}_N\) is a local base for \(\tau_N\).

Now consider the local base \(\mathscr{B}\) consisting of all open balls around \(0 \in X\). We have \[ \pi(\{x:d(x,0)<r\})=\{u:\rho(u,\pi(0))<r\}. \] Indeed, if \(d(x,0)<r\), then \(\rho(\pi(x),\pi(0)) \leq d(x,0)<r\) (take \(z=0\)); conversely, if \(\rho(\pi(x),\pi(0))<r\), there is some \(z \in N\) with \(d(x-z,0)=d(x,z)<r\), and \(\pi(x-z)=\pi(x)\). So the \(\rho\)-balls around \(\pi(0)\) form the local base \(\mathscr{B}_N\). Since \(\rho\) is invariant and \(\tau_N\) is translation invariant as well, \(\rho\) induces \(\tau_N\).

If \(d\) is complete, then \(\rho\) is complete.

Once this is proved, we are able to claim that, if \(X\) is an \(F\)-space, then \(X/N\) is still an \(F\)-space, since its topology is induced by the complete invariant metric \(\rho\).

Proof. Suppose \((x_n)\) is a Cauchy sequence in \(X/N\), relative to \(\rho\). There is a subsequence \((x_{n_k})\) with \(\rho(x_{n_k},x_{n_{k+1}})<2^{-k}\). We pick \(z_k \in X\) inductively such that \(\pi(z_k) = x_{n_k}\) and \[ d(z_{k},z_{k+1})<2^{-k}. \] Indeed, since \(\pi\) is surjective, let \(z_1\) be any element with \(\pi(z_1)=x_{n_1}\). If \(z_k\) has been chosen, pick any \(w\) with \(\pi(w)=x_{n_{k+1}}\). Since \(\inf_{u \in N}d(z_k-w,u)=\rho(x_{n_k},x_{n_{k+1}})<2^{-k}\), there is some \(u \in N\) with \(d(z_k-w,u)<2^{-k}\), and \(z_{k+1}=w+u\) satisfies \(\pi(z_{k+1})=x_{n_{k+1}}\) and \(d(z_k,z_{k+1})=d(z_k-w,u)<2^{-k}\). By the inequality above, \((z_k)\) is Cauchy: for \(m>k\), \(d(z_k,z_m)<2^{-k}+\cdots+2^{-(m-1)}<2^{1-k}\). Since \(X\) is complete, \(z_k \to z\) for some \(z \in X\). By the continuity of \(\pi\), we also see \(x_{n_k} \to \pi(z)\) as \(k \to \infty\). Hence \((x_n)\) converges, since it is a Cauchy sequence with a convergent subsequence. Therefore \(\rho\) is complete.


This fact will be used to prove some corollaries of the open mapping theorem. For instance, for any continuous linear map \(\Lambda:X \to Y\), \(\ker(\Lambda)=\Lambda^{-1}(\{0\})\) is closed, therefore if \(X\) is an \(F\)-space, then \(X/\ker(\Lambda)\) is an \(F\)-space as well. We will show in the future that \(X/\ker(\Lambda)\) and \(\Lambda(X)\) are homeomorphic if \(\Lambda(X)\) is of the second category.

There are more properties that \(X/N\) inherits from \(X\), for example normability, metrizability and local convexity. In particular, if \(X\) is a Banach space, then so is \(X/N\). To see this, it suffices to define the quotient norm by \[ \lVert \pi(x) \rVert = \inf\{\lVert x-z \rVert:z \in N\}. \]
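A minimal sketch of the quotient norm, under the assumption (for illustration only) that \(X=\mathbb{R}^n\) with the Euclidean norm and \(N\) is the span of finitely many given vectors. In that case the infimum is attained at the orthogonal projection of \(x\) onto \(N\), so \(\lVert\pi(x)\rVert\) is the length of the component of \(x\) orthogonal to \(N\). The helper names `orthonormal_basis` and `quotient_norm` are hypothetical.

```python
import math

def orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt: return an orthonormal basis of span(vectors)."""
    basis = []
    for v in vectors:
        w = [float(c) for c in v]
        for e in basis:
            c = sum(wi * ei for wi, ei in zip(w, e))
            w = [wi - c * ei for wi, ei in zip(w, e)]
        length = math.sqrt(sum(wi * wi for wi in w))
        if length > tol:  # skip vectors that already lie in the span
            basis.append([wi / length for wi in w])
    return basis

def quotient_norm(x, spanning_set):
    """||pi(x)|| = inf_{z in N} ||x - z||, with N = span(spanning_set), Euclidean norm."""
    r = [float(c) for c in x]
    for e in orthonormal_basis(spanning_set):
        c = sum(ri * ei for ri, ei in zip(r, e))
        r = [ri - c * ei for ri, ei in zip(r, e)]
    # x - r is now the orthogonal projection of x onto N, the nearest point of N to x
    return math.sqrt(sum(ri * ri for ri in r))

x_axis = [(1, 0)]
print(quotient_norm((2, 3), x_axis))  # coset of (2, 3) modulo the x-axis -> 3.0
print(quotient_norm((7, 3), x_axis))  # another representative of the same coset -> 3.0
```

The two calls evaluate the same coset through two different representatives and agree, as well-definedness of the quotient norm requires.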

Basic Facts of Semicontinuous Functions


We restrict ourselves to \(\mathbb{R}\) endowed with the usual topology. Recall that a function \(f\) is continuous if and only if, for any open set \(U \subset \mathbb{R}\), the set \[ \{x:f(x) \in U\}=f^{-1}(U) \]

is open. One can rewrite this statement in the \(\varepsilon\)-\(\delta\) language. To say that a function \(f: \mathbb{R} \to \mathbb{R}\) is continuous at \(x\), we mean that for any \(\varepsilon>0\) there exists some \(\delta>0\) such that for \(t \in (x-\delta,x+\delta)\), we have \[ |f(x)-f(t)|<\varepsilon. \] \(f\) is continuous on \(\mathbb{R}\) if and only if \(f\) is continuous at every point of \(\mathbb{R}\).

If \((x-\delta,x+\delta)\) is replaced with \((x-\delta,x)\) or \((x,x+\delta)\), we get left continuous and right continuous, one of which plays an important role in probability theory.

The problem is that continuity is sometimes too strong a restriction, while the 'direction' built into left/right continuity is unnecessary as well. For example, the function \[ f(x)=\chi_{(0,1)}(x) \] is neither left continuous nor right continuous on all of \(\mathbb{R}\) (right continuity fails at \(0\) and left continuity fails at \(1\)), yet it is a perfectly reasonable function. Left/right continuity is not a perfectly weakened version of continuity. We need something different.

Definition of semicontinuous

Let \(f\) be a real (or extended-real) function on \(\mathbb{R}\). The semicontinuity of \(f\) is defined as follows.

If \[ \{x:f(x)>\alpha\} \] is open for all real \(\alpha\), we say \(f\) is lower semicontinuous.

If \[ \{x:f(x)<\alpha\} \] is open for all real \(\alpha\), we say \(f\) is upper semicontinuous.

Is it possible to rewrite these definitions à la \(\varepsilon\)-\(\delta\)? The answer is yes if we restrict ourselves to metric spaces.

\(f: \mathbb{R} \to \mathbb{R}\) is upper semicontinuous at \(x\) if, for every \(\varepsilon>0\), there exists some \(\delta>0\) such that for \(t \in (x-\delta,x+\delta)\), we have \[ f(t)<f(x)+\varepsilon \]

\(f: \mathbb{R} \to \mathbb{R}\) is lower semicontinuous at \(x\) if, for every \(\varepsilon>0\), there exists some \(\delta>0\) such that for \(t \in (x-\delta,x+\delta)\), we have \[ f(t)>f(x)-\varepsilon \]

Of course, \(f\) is upper/lower semicontinuous on \(\mathbb{R}\) if and only if it is so at every point of \(\mathbb{R}\). For real-valued \(f\), the two styles of definition are equivalent.

Relation with continuous functions

Here is another way to see it. For the continuity of \(f\), we look at arbitrary open subsets \(V\) of \(\mathbb{R}\), and \(f^{-1}(V)\) is expected to be open. For the lower/upper semicontinuity of \(f\), however, the open sets are restricted to those of the form \((\alpha,+\infty]\) and \([-\infty,\alpha)\). Since every open set of \(\mathbb{R}\) can be generated from sets like \([-\infty,\alpha)\) and \((\beta,+\infty]\) by unions and finite intersections, we immediately get

\(f\) is continuous if and only if \(f\) is both upper semicontinuous and lower semicontinuous.

Proof. If \(f\) is continuous, then for any \(\alpha \in \mathbb{R}\), we see \([-\infty,\alpha)\) is open, and therefore \[ f^{-1}([-\infty,\alpha)) \] has to be open. The upper semicontinuity is proved. The lower semicontinuity of \(f\) is proved in the same manner.

If \(f\) is both upper and lower semicontinuous, we see \[ f^{-1}((\alpha,\beta))=f^{-1}([-\infty,\beta)) \cap f^{-1}((\alpha,+\infty]) \] is open. Since every open subset of \(\mathbb{R}\) can be written as a countable union of segments of the above types, we see for any open subset \(V\) of \(\mathbb{R}\), \(f^{-1}(V)\) is open. (If you have trouble with this part, it is recommended to review the definition of topology.) \(\square\)


There are two important examples.

  1. If \(E \subset \mathbb{R}\) is open, then \(\chi_E\) is lower semicontinuous.
  2. If \(F \subset \mathbb{R}\) is closed, then \(\chi_F\) is upper semicontinuous.

We will prove the first one; the second one follows in the same manner. For \(\alpha<0\), the set \(A=\chi_E^{-1}((\alpha,+\infty])\) is equal to \(\mathbb{R}\), which is open. For \(\alpha \geq 1\), since \(\chi_E \leq 1\), we see \(A=\varnothing\). For \(0 \leq \alpha < 1\), the set of \(x\) where \(\chi_E(x)>\alpha\) is exactly \(E\), which is open.

When checking the semicontinuity of a function like this, we let \(\alpha\) run through the values of the function, from bottom to top or from top to bottom, as we did above. Here the function \(\chi_E\) is defined by \[ \chi_E(x)=\begin{cases} 1 & x \in E \\ 0 & x \notin E \end{cases}. \]

Addition of semicontinuous functions

If \(f_1\) and \(f_2\) are upper/lower semicontinuous, then so is \(f_1+f_2\).

Proof. We are going to prove the two cases using different tools. Suppose first that both \(f_1\) and \(f_2\) are upper semicontinuous. For \(x \in \mathbb{R}\) and \(\varepsilon>0\), there exist some \(\delta_1>0\) and \(\delta_2>0\) such that \[ \begin{aligned} f_1(t) &< f_1(x)+\varepsilon/2 \quad t \in (x-\delta_1,x+\delta_1), \\ f_2(t) &< f_2(x) + \varepsilon/2 \quad t \in (x-\delta_2,x+\delta_2). \end{aligned} \] If we pick \(\delta=\min(\delta_1,\delta_2)\), then for all \(t \in (x-\delta,x+\delta)\) we have \[ f_1(t)+f_2(t)<f_1(x)+f_2(x)+\varepsilon. \] The upper semicontinuity of \(f_1+f_2\) follows since \(x \in \mathbb{R}\) is arbitrary.

Now suppose both \(f_1\) and \(f_2\) are lower semicontinuous. We have the identity \[ \{x:f_1(x)+f_2(x)>\alpha\}=\bigcup_{\beta\in\mathbb{R}}\big(\{x:f_1(x)>\beta\}\cap\{x:f_2(x)>\alpha-\beta\}\big). \] Each set in the union on the right is open, so the union is open. Hence \(f_1+f_2\) is lower semicontinuous. \(\square\)

However, when there are infinitely many semicontinuous functions, things are different.

Let \(\{f_n\}\) be a sequence of nonnegative functions on \(\mathbb{R}\), then

  • If each \(f_n\) is lower semicontinuous, then so is \(\sum_{1}^{\infty}f_n\).
  • If each \(f_n\) is upper semicontinuous, then \(\sum_{1}^{\infty}f_n\) is not necessarily upper semicontinuous.

Proof. To prove this we again use properties of open sets. Put \(g_n=\sum_{1}^{n}f_k\). Now suppose all \(f_k\) are lower semicontinuous. Since \(g_n\) is a finite sum of lower semicontinuous functions, each \(g_n\) is lower semicontinuous. Let \(f=\sum_{n}f_n\). As the \(f_k\) are nonnegative, \(f(x)>\alpha\) if and only if \(g_n(x)>\alpha\) for some \(n\). Therefore \[ \{x:f(x)>\alpha\}=\bigcup_{n=1}^{\infty}\{x:g_n(x)>\alpha\}, \] which is open as a union of open sets.

For the upper semicontinuity, it suffices to give a counterexample, but before that, we shall give the motivation.

As mentioned, the characteristic function of a closed set is upper semicontinuous. If \(\{E_n\}\) is a sequence of almost disjoint closed sets, then \(E=\bigcup_{n\geq 1}E_n\) is not necessarily closed, so \(\chi_E=\sum\chi_{E_n}\) (a.e.) is not necessarily upper semicontinuous. Now we give a concrete example. Put \(f_0=\chi_{[1,+\infty)}\) and \(f_n=\chi_{E_n}\) for \(n \geq 1\), where \[ E_n=\{x:\frac{1}{1+n} \leq x \leq \frac{1}{n}\}. \] For \(x > 0\), we have \(f(x)=\sum_nf_n(x) \geq 1\), while \(f(x)=0\) for \(x \leq 0\). Hence \(f^{-1}([-\infty,1))=(-\infty,0]\), which is not open. \(\square\)

Notice that the open-set definition of semicontinuity makes sense for functions on any topological space, not only on \(\mathbb{R}\).
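The counterexample above can be evaluated exactly at rational points with Python's `fractions` module. The function `f` below is a hypothetical helper written for this illustration; it simply counts which of the sets \([1,+\infty)\) and \(E_n\) contain \(x\). It shows that \(f(0)=0\) while \(f(1/k) \geq 1\) for every \(k\), so with \(\varepsilon=\frac{1}{2}\) no \(\delta\) works in the \(\varepsilon\)-\(\delta\) definition of upper semicontinuity at \(0\).

```python
from fractions import Fraction

def f(x):
    """f = chi_[1,oo) + sum_{n>=1} chi_{E_n}, E_n = [1/(n+1), 1/n], at a rational point x."""
    x = Fraction(x)
    if x <= 0:
        return 0
    total = 1 if x >= 1 else 0  # contribution of f_0
    # x <= 1/n forces n <= 1/x, so this range covers every candidate n
    for n in range(1, int(1 / x) + 2):
        if Fraction(1, n + 1) <= x <= Fraction(1, n):
            total += 1
    return total

# f(0) = 0, but points 1/k arbitrarily close to 0 have f(1/k) >= 1,
# so f(t) < f(0) + 1/2 fails in every interval around 0.
print(f(0), [f(Fraction(1, k)) for k in (10, 100, 1000)])
```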

Maximum and minimum

There is one fact we already know about continuous functions.

If \(X\) is compact, \(f: X \to \mathbb{R}\) is continuous, then there exists some \(a,b \in X\) such that \(f(a)=\min f(X)\), \(f(b)=\max f(X)\).

Indeed, \(f(X)\) is a compact subset of \(\mathbb{R}\). But for semicontinuous functions, things are different, though still reasonable. For upper semicontinuous functions, we have the following fact.

If \(X\) is compact and \(f: X \to (-\infty,+\infty)\) is upper semicontinuous, then there exists some \(a \in X\) such that \(f(a)=\max f(X)\).

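Notice that \(X\) is not assumed to have any other topological property. It may be Hausdorff or Lindelöf, but we are not asking for restrictions like these. The only property we will use is that every open cover of \(X\) has a finite subcover. Of course, one can take \(X\) to be any compact subset of \(\mathbb{R}\), for example \([a,b]\).

Proof. First, \(f\) is bounded above: the sets \(\{x:f(x)<n\}\), \(n \geq 1\), are open by upper semicontinuity, increase with \(n\), and cover \(X\), so one of them already covers \(X\). Put \(\alpha=\sup f(X)<\infty\), and define \[ E_n=\{x:f(x)<\alpha-\frac{1}{n}\}, \] which is open since \(f\) is upper semicontinuous. If \(f\) attains no maximum, then for any \(x \in X\) we have \(f(x)<\alpha\), so \(f(x)<\alpha-\frac{1}{n}\) for some \(n \geq 1\); that is, \(x \in E_n\) for some \(n\). Therefore \(\{E_n\}\) is an open cover of \(X\). Since \(E_1 \subset E_2 \subset \cdots\), a finite subcover would give \(X=E_N\) for some \(N\), i.e. \(f<\alpha-\frac{1}{N}\) on \(X\), contradicting \(\alpha=\sup f(X)\). So this cover has no finite subcover, a contradiction since \(X\) is compact. \(\square\)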

Approximating integrable functions

This is a comprehensive application of several properties of semicontinuity.

(Vitali–Carathéodory theorem) Suppose \(f \in L^1(\mathbb{R})\), where \(f\) is a real-valued function. For \(\varepsilon>0\), there exist some functions \(u\) and \(v\) on \(\mathbb{R}\) such that \(u \leq f \leq v\), \(u\) is an upper semicontinuous function bounded above, and \(v\) is lower semicontinuous bounded below, and \[ \boxed{\int_{\mathbb{R}}(v-u)dm<\varepsilon} \]

It suffices to prove this theorem for \(f \geq 0\) (we may also assume \(f\) is not identically equal to \(0\), since that case is trivial). Since \(f\) is the pointwise limit of an increasing sequence of nonnegative simple functions \(s_n\), we can write \(f\) as \[ f=s_1+\sum_{n=2}^{\infty}(s_n-s_{n-1}). \] By putting \(t_1=s_1\), \(t_n=s_n-s_{n-1}\) for \(n \geq 2\), we get \(f=\sum_n t_n\). Each \(t_n\) is a nonnegative simple function, hence a finite sum of terms \(c\chi_E\) with \(c>0\) and \(E\) measurable; listing all these terms, we can write \(f\) as \[ f=\sum_{k=1}^{\infty}c_k\chi_{E_k} \] where \(c_k>0\) and \(E_k\) is measurable for all \(k\). By the monotone convergence theorem, \[ \int_{\mathbb{R}} f\, dm = \sum_{k=1}^{\infty}c_km(E_k), \] and the series on the right-hand side converges (since \(f \in L^1\)); in particular \(m(E_k)<\infty\) for every \(k\). By the regularity of Lebesgue measure, there exist a compact set \(F_k\) and an open set \(V_k\) such that \(F_k \subset E_k \subset V_k\) and \(c_km(V_k-F_k)<\frac{\varepsilon}{2^{k+1}}\). Put \[ v=\sum_{k=1}^{\infty}c_k\chi_{V_k},\quad u=\sum_{k=1}^{N}c_k\chi_{F_k}. \] Then \(v\) is lower semicontinuous, being a sum of nonnegative lower semicontinuous functions, and \(u\) is upper semicontinuous, being a finite sum of upper semicontinuous functions; moreover \(v \geq 0\) and \(u \leq \sum_{k=1}^{N}c_k\). Here \(N\) is chosen in such a way that \[ \sum_{k=N+1}^{\infty}c_km(E_k)<\frac{\varepsilon}{2}. \] Since \(V_k \supset E_k\), we have \(\chi_{V_k} \geq \chi_{E_k}\). Therefore \(v \geq f\). Similarly, \(f \geq u\). Now we check the desired integral inequality. A simple recombination shows that \[ \begin{aligned} v-u&=\sum_{k=1}^{\infty}c_k\chi_{V_k}-\sum_{k=1}^{N}c_k\chi_{F_k} \\ &\leq \sum_{k=1}^{\infty}c_k\chi_{V_k}-\sum_{k=1}^{N}c_k\chi_{F_k}+\sum_{k=N+1}^{\infty}c_k(\chi_{E_k}-\chi_{F_k}) \\ &=\sum_{k=1}^{\infty}c_k(\chi_{V_k}-\chi_{F_k})+\sum_{k=N+1}^{\infty}c_k\chi_{E_k}. \end{aligned} \] Integrating the function above, we get \[ \begin{aligned} \int_{\mathbb{R}}(v-u)dm &\leq \sum_{k=1}^{\infty}c_km(V_k-F_k)+\sum_{k=N+1}^{\infty}c_km(E_k) \\ &< \sum_{k=1}^{\infty}\frac{\varepsilon}{2^{k+1}}+\frac{\varepsilon}{2} \\ &=\varepsilon. \end{aligned} \] This proves the case \(f \geq 0\). In the general case, we write \(f=f^{+}-f^{-}\).
Attach semicontinuous functions to \(f^{+}\) and \(f^{-}\) as above, with \(\frac{\varepsilon}{2}\) in place of \(\varepsilon\): \(u_1 \leq f^{+} \leq v_1\) and \(u_2 \leq f^{-} \leq v_2\). Put \(u=u_1-v_2\), \(v=v_1-u_2\). Then \(u\) is upper semicontinuous and bounded above, \(v\) is lower semicontinuous and bounded below, and \(u \leq f \leq v\) with the desired property since \[ \int_\mathbb{R}(v-u)dm=\int_\mathbb{R}(v_1-u_1)dm+\int_\mathbb{R}(v_2-u_2)dm<\varepsilon, \] and the theorem follows. \(\square\)


Indeed, the only property of the measure used here is the existence of \(F_k\) and \(V_k\). The domain \(\mathbb{R}\) can be replaced with \(\mathbb{R}^k\) for \(1 \leq k < \infty\), and \(m\) with the corresponding \(m_k\). Much more generally, the domain can be replaced by any locally compact Hausdorff space \(X\), and the measure by any measure associated with a positive linear functional on \(C_c(X)\) by the Riesz-Markov-Kakutani representation theorem.

Is the reverse approximation always possible?

The answer is no. Consider a fat Cantor set \(K\), a closed subset of \([0,1]\) with empty interior and Lebesgue measure \(\frac{1}{2}\). We shall show that \(\chi_K\) cannot be approximated from below by a lower semicontinuous function.

If \(v\) is a lower semicontinuous function such that \(v \leq \chi_K\), then \(v \leq 0\).

Proof. Consider the set \(V=v^{-1}((0,+\infty])\), which is open since \(v\) is lower semicontinuous. Since \(v \leq \chi_K\), we have \(V \subset K\). We will show that \(V\) has to be empty.

Suppose \(t \in V\). Since \(V\) is open, there exists some open neighbourhood \(U\) of \(t\) such that \(U \subset V \subset K\). This is impossible, since \(K\) has empty interior. Therefore \(V = \varnothing\), that is, \(v(x) \leq 0\) for all \(x\). \(\square\)

Now suppose \(u\) is an upper semicontinuous function with \(u \geq \chi_K\), and \(v\) is a lower semicontinuous function with \(v \leq \chi_K\). Since \(v \leq 0\), we have \[ \int_{\mathbb{R}}(u-v)dm \geq \int_\mathbb{R}(\chi_K-v)dm \geq m(K)=\frac{1}{2}, \] so the difference can never be made smaller than \(\varepsilon=\frac{1}{2}\). This example shows that there exist integrable functions that cannot be approximated in the reverse sense of the Vitali–Carathéodory theorem.
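The post does not fix a particular fat Cantor set. Assuming the common Smith-Volterra-Cantor construction (at stage \(n\), remove an open middle interval of length \(4^{-n}\) from each of the \(2^{n-1}\) remaining intervals), the sketch below tracks the remaining closed intervals exactly. After \(n\) stages the total length is \(\frac{1}{2}+2^{-(n+1)}\), which decreases to \(\frac{1}{2}\), while the longest remaining interval shrinks to \(0\); the latter is why \(K\) contains no open interval. The helper `svc_intervals` is a hypothetical name.

```python
from fractions import Fraction

def svc_intervals(steps):
    """Closed intervals left after `steps` stages of the Smith-Volterra-Cantor construction."""
    intervals = [(Fraction(0), Fraction(1))]
    for n in range(1, steps + 1):
        gap = Fraction(1, 4 ** n)  # length of the open middle interval removed at stage n
        refined = []
        for a, b in intervals:
            mid = (a + b) / 2
            refined.append((a, mid - gap / 2))
            refined.append((mid + gap / 2, b))
        intervals = refined
    return intervals

for steps in range(1, 8):
    parts = svc_intervals(steps)
    measure = sum(b - a for a, b in parts)  # equals 1/2 + 1/2**(steps + 1)
    longest = max(b - a for a, b in parts)  # tends to 0
    print(steps, measure, float(longest))
```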

An Introduction to Quotient Space

I'm assuming the reader has some abstract algebra and functional analysis background. You may have learned this already in your linear algebra class, but we are making our way to functional analysis problems.


The trouble with \(L^p\) spaces

Fix \(p\) with \(1 \leq p \leq \infty\). It's easy to see that \(L^p(\mu)\) is a topological vector space. But it is not a metric space if we define \[ d(f,g)=\lVert f-g \rVert_p. \] The reason is that \(d(f,g)=0\) only gives \(f=g\) a.e., not \(f=g\) everywhere. With that being said, \(d\) is actually a pseudometric, which is unnatural. However, the relation \(\sim\) defined by \(f \sim g \iff d(f,g)=0\) is an equivalence relation. This inspires us to take the quotient set into consideration.

Vector spaces are groups anyway

A vector space \(V\) is in particular an abelian group under addition, so every subspace of \(V\) is automatically a normal subgroup. There is no reason to prevent ourselves from considering the quotient group and looking for some interesting properties.


Let \(N\) be a subspace of a vector space \(X\). For every \(x \in X\), let \(\pi(x)\) be the coset of \(N\) that contains \(x\), that is \[ \pi(x)=x+N. \] Trivially, \(\pi(x)=\pi(y)\) if and only if \(x-y \in N\). Since \(N\) is a subspace, addition and scalar multiplication of cosets are well defined by \[ \pi(x)+\pi(y)=\pi(x+y), \quad \alpha\pi(x)=\pi(\alpha{x}), \] and with these operations \(\pi\) is linear. These cosets are the elements of a vector space \(X/N\), called the quotient space of \(X\) modulo \(N\). The map \(\pi\) is called the canonical map.



First, we treat \(\mathbb{R}^2\) as a vector space, and the \(x\)-axis, a copy of \(\mathbb{R}\), as a subspace (we will write it as \(X\)). For a vector \(v=(2,3)\), which is represented by \(AB\), we see the coset \(v+X\) has something special. Pick any \(u \in X\), for example, \(AE\), \(AC\), or \(AG\). We see \(v+u\) has the same \(y\) value: writing \(u=(x,0)\), we have \(v+u=(2+x,3)\), so the \(y\) value stays \(3\) however \(u\) varies.

With that being said, the set \(v+X\), which is not a vector space, can be represented by \(\overrightarrow{AD}\). This procedure generalizes with ease to \(\mathbb{R}^n\) with \(\mathbb{R}^m\) as a subspace.

We now consider a fancy example. Consider all rational Cauchy sequences, that is \[ (a_n)=(a_1,a_2,\cdots) \] where \(a_k\in\mathbb{Q}\) for all \(k\). In analysis class, we learned two facts.

  1. Any Cauchy sequence is bounded.
  2. If \((a_n)\) converges, then \((a_n)\) is Cauchy.

However, the converse of 2 does not hold in \(\mathbb{Q}\). For example, if we put \(a_k=(1+\frac{1}{k})^k\), then \((a_k)\) is a Cauchy sequence of rationals whose limit in \(\mathbb{R}\) is \(e\), but \(e \notin \mathbb{Q}\).
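The example \(a_k=(1+\frac{1}{k})^k\) can be explored with exact rational arithmetic. The sketch below (an illustration, not a proof that the limit is irrational) checks that the first sixty terms are rational, strictly increasing and below \(3\), and prints how close the last one is to \(e\). The helper `term` is a hypothetical name.

```python
import math
from fractions import Fraction

def term(k):
    """k-th term (1 + 1/k)**k, computed exactly as a rational number."""
    return (1 + Fraction(1, k)) ** k

terms = [term(k) for k in range(1, 61)]
# every term is rational, the sequence increases, and it stays below 3
assert all(isinstance(t, Fraction) for t in terms)
assert all(s < t for s, t in zip(terms, terms[1:]))
assert terms[-1] < 3
# the terms approach e, which is not rational
print(float(terms[-1]), math.e - float(terms[-1]))
```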

If we define the addition and multiplication term by term, namely \[ (a_n)+(b_n)=(a_1+b_1,a_2+b_2,\cdots) \] and \[ (\alpha a_n)=(\alpha a_1,\alpha a_2,\cdots) \] where \(\alpha \in \mathbb{Q}\), we get a vector space (the verification is easy). The zero vector is defined by \[ (0)=(0,0,\cdots). \] This vector space is denoted by \(\overline{\mathbb{Q}}\). The subspace containing all sequences converges to \(0\) will be denoted by \(\overline{\mathbb{O}}\). Again, \((a_n)+\overline{\mathbb{O}}=(b_n)+\overline{\mathbb{O}}\) if and only if \((a_n-b_n) \in \overline{\mathbb{O}}\). Using the language of equivalence relation, we also say \((a_n)\) and \((b_n)\) are equivalent if \((a_n-b_n) \in \overline{\mathbb{O}}\). For example, the two following sequences are equivalent: \[ (1,1,1,\cdots,1,\cdots)\quad\quad (0.9,0.99,0.999,\cdots). \] Actually, we will get \(\mathbb{R} \simeq \overline{\mathbb{Q}}/\overline{\mathbb{O}}\) in the end. But to make sure that this quotient space is exactly the one we meet in our analysis class, there are a lot of verifications should be done.

We need more definitions for calculation. The multiplication of two Cauchy sequences is defined term by term, just like the addition. For \(\overline{\mathbb{Q}}/\overline{\mathbb{O}}\) we have \[ ((a_n)+\overline{\mathbb{O}})+((b_n)+\overline{\mathbb{O}})=(a_n+b_n) + \overline{\mathbb{O}} \] and \[ ((a_n)+\overline{\mathbb{O}})((b_n)+\overline{\mathbb{O}})=(a_nb_n)+\overline{\mathbb{O}}. \] As for inequality, an order has to be defined. We say \((a_n) > (0)\) if there exist a rational \(\varepsilon>0\) and some \(N>0\) such that \(a_n>\varepsilon\) for all \(n \geq N\). (Requiring only \(a_n>0\) would not do: \((1/n)\) would then be positive although it lies in \(\overline{\mathbb{O}}\).) By \((a_n) > (b_n)\) we mean \((a_n-b_n)>(0)\), of course. For cosets, we say \((a_n)+\overline{\mathbb{O}}>\overline{\mathbb{O}}\) if \((x_n) > (0)\) for some \((x_n) \in (a_n)+\overline{\mathbb{O}}\). This is well defined. That is, if \((x_n)>(0)\), then \((y_n)>(0)\) for all \((y_n) \in (a_n)+\overline{\mathbb{O}}\): if \(x_n>\varepsilon\) for \(n \geq N\) and \(y_n-x_n \to 0\), then \(y_n>\varepsilon/2\) for all sufficiently large \(n\).
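The term-by-term operations can be sketched by modeling a rational sequence as a Python function \(n \mapsto a_n\) returning a `Fraction`. This is only an illustration under that modeling assumption: `positive_witness` is a hypothetical helper that checks the positivity condition \(a_n>\varepsilon\) on a finite range of indices, which can suggest, but never prove, positivity of the whole tail.

```python
from fractions import Fraction

# A rational sequence is modeled as a function n -> Fraction, for n >= 1.
def add(a, b):
    return lambda n: a(n) + b(n)

def mul(a, b):
    return lambda n: a(n) * b(n)

def neg(a):
    return lambda n: -a(n)

def positive_witness(a, eps, start, stop):
    """Finite sanity check of 'a_n > eps for all n >= N' on start <= n < stop.
    Passing it for finitely many n proves nothing about the whole tail."""
    return all(a(n) > eps for n in range(start, stop))

def ones(n):
    return Fraction(1)

def nines(n):
    return 1 - Fraction(1, 10 ** n)      # 0.9, 0.99, 0.999, ...

def harmonic(n):
    return Fraction(1, n)                # converges to 0, so it is NOT positive

diff = add(ones, neg(nines))             # 0.1, 0.01, 0.001, ... (tends to 0)
print([diff(n) for n in range(1, 5)])
print(positive_witness(nines, Fraction(1, 2), 1, 50))
print(positive_witness(harmonic, Fraction(1, 100), 1, 200))
```

The last check fails because \(1/n\) eventually drops below any fixed \(\varepsilon\), which is exactly why the definition of \((a_n)>(0)\) needs a margin \(\varepsilon\).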

With these operations defined, it can be verified that \(\overline{\mathbb{Q}}/\overline{\mathbb{O}}\) has the desired properties, for example the least-upper-bound property. But this goes too far from the topic, so we are not proving it here.

Finally, we try to make \(L^p\) a Banach space. Fix \(p\) with \(1 \leq p < \infty\). There is a seminorm defined for all Lebesgue measurable functions on \([0,1]\) by \[ p(f)=\lVert f \rVert_p=\left\{\int_{0}^{1}|f(t)|^pdt\right\}^{1/p} \] (the same letter \(p\) is used for the exponent and for the seminorm). \(L^p\) is the vector space of all such functions \(f\) with \(p(f)<\infty\). But \(p\) is not a norm on it, since \(p(f)=0\) only implies \(f=0\) almost everywhere. However, the set \(N\) of all functions that equal \(0\) almost everywhere is also a vector space. Now consider the quotient space \(L^p/N\) with \[ \tilde{p}(\pi(f))=p(f), \] where \(\pi\) is the canonical map of \(L^p\) onto \(L^p/N\). We shall prove that \(\tilde{p}\) is well-defined. If \(\pi(f)=\pi(g)\), we have \(f-g \in N\), therefore, by the triangle inequality for \(p\), \[ 0=p(f-g)\geq |p(f)-p(g)|, \] which forces \(p(f)=p(g)\). Therefore we also have \(\tilde{p}(\pi(f))=\tilde{p}(\pi(g))\). Moreover, \(\tilde{p}(\pi(f))=0\) means \(f \in N\), i.e. \(\pi(f)=0\), so \(\tilde{p}\) is a norm on \(L^p/N\). That \(L^p/N\) is also complete, hence a Banach space, takes more work (this is the Riesz-Fischer theorem). Some topological facts about quotient spaces are involved, and we are going to cover a few of them.
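The well-definedness argument rests on the reverse triangle inequality \(|p(f)-p(g)| \leq p(f-g)\). A small numerical sketch, assuming we replace the integral by a midpoint rule on \(n\) sample points: the resulting quantity is a weighted \(\ell^p\) norm of the sample vector, so Minkowski's inequality holds for it exactly (up to rounding). Sampling cannot see changes on a null set, so this does not model \(N\); the functions \(f(t)=\sin 3t\) and \(g(t)=t^2-\frac12\) are arbitrary choices.

```python
import math

def lp_seminorm(values, p):
    """Midpoint-rule approximation of ||f||_p on [0, 1] from samples
    f((i + 1/2)/n), i = 0..n-1; a weighted l^p norm of the samples."""
    n = len(values)
    return (sum(abs(v) ** p for v in values) / n) ** (1 / p)

n = 1000
grid = [(i + 0.5) / n for i in range(n)]
f = [math.sin(3 * t) for t in grid]
g = [t ** 2 - 0.5 for t in grid]
f_minus_g = [a - b for a, b in zip(f, g)]

for p in (1, 1.5, 2, 4):
    lhs = abs(lp_seminorm(f, p) - lp_seminorm(g, p))
    rhs = lp_seminorm(f_minus_g, p)
    print(p, lhs, rhs)   # reverse triangle inequality: lhs <= rhs
```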

Topology of quotient space


We know that if \(X\) is a topological vector space with topology \(\tau\), then addition and scalar multiplication are continuous. Suppose now \(N\) is a closed subspace of \(X\), and let \(\pi: X \to X/N\) be the canonical map. Define \(\tau_N\) by \[ \tau_N=\{E \subset X/N:\pi^{-1}(E)\in \tau\}. \] We expect \(\tau_N\) to be a vector topology on \(X/N\), and fortunately it is. Some interesting techniques will be used in the following section.

\(\tau_N\) is a vector topology

There will be two steps to get this done.

\(\tau_N\) is a topology.

It is trivial that \(\varnothing\) and \(X/N\) are elements of \(\tau_N\), since \(\pi^{-1}(\varnothing)=\varnothing\) and \(\pi^{-1}(X/N)=X\). The other properties are immediate as well, since we have \[ \pi^{-1}(A \cap B) = \pi^{-1}(A) \cap \pi^{-1}(B) \] and \[ \pi^{-1}(\cup A_\alpha)=\cup\pi^{-1}( A_{\alpha}). \] That is, if we have \(A,B\in \tau_N\), then \(A \cap B \in \tau_N\) since \(\pi^{-1}(A \cap B)=\pi^{-1}(A) \cap \pi^{-1}(B) \in \tau\).

Similarly, if \(A_\alpha \in \tau_N\) for all \(\alpha\), we have \(\cup A_\alpha \in \tau_N\). Also, by definition of \(\tau_N\), \(\pi\) is continuous.

\(\tau_N\) is a vector topology.

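First, we show that every point of \(X/N\), which can be written as \(\pi(x)\), is closed. Since \(N\) is assumed to be closed and translation by \(x\) is a homeomorphism of \(X\), the set \[ \pi^{-1}(\pi(x))=x+N \] is closed; by the following observation, \(\{\pi(x)\}\) is therefore \(\tau_N\)-closed.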

In fact, \(F \subset X/N\) is \(\tau_N\)-closed if and only if \(\pi^{-1}(F)\) is \(\tau\)-closed. To prove this, one needs to notice that \(\pi^{-1}(F^c)=(\pi^{-1}(F))^{c}\).

Suppose \(V \subset X\) is open. Then \[ \pi^{-1}(\pi(V))=N+V=\bigcup_{n \in N}(n+V) \] is open, being a union of translates of \(V\). By definition of \(\tau_N\), we have \(\pi(V) \in \tau_N\). Therefore \(\pi\) is an open mapping.

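If now \(W\) is a neighbourhood of \(0\) in \(X/N\), then \(\pi^{-1}(W)\) is a neighbourhood of \(0\) in \(X\), and by the continuity of addition in \(X\) there exists a neighbourhood \(V\) of \(0\) in \(X\) such that \[ V + V \subset \pi^{-1}(W). \] Hence \(\pi(V)+\pi(V) \subset W\). Since \(\pi\) is open, \(\pi(V)\) is a neighbourhood of \(0\) in \(X/N\). This shows that addition is continuous at \((0,0)\), hence everywhere, because \(\tau_N\) is translation invariant (indeed \(\pi^{-1}(\pi(a)+E)=a+\pi^{-1}(E)\)).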

The continuity of scalar multiplication will be shown in a more direct way (addition could be handled the same way, but the proof above is intended to show a special technique). We already know that the scalar multiplication on \(X\), \[ \begin{aligned} \varphi:\Phi \times X &\to X \\ (\alpha,x) &\mapsto \alpha{x}, \end{aligned} \] is continuous, where \(\Phi\) is the scalar field (usually \(\mathbb{R}\) or \(\mathbb{C}\)). The scalar multiplication on \(X/N\) is \[ \begin{aligned} \psi: \Phi \times X/N &\to X/N \\ (\alpha,x+N) &\mapsto \alpha{x}+N. \end{aligned} \] We see \(\psi(\alpha,\pi(x))=\pi(\varphi(\alpha,x))\), that is, \(\psi \circ (\mathrm{id} \times \pi)=\pi\circ\varphi\), and the right-hand side is continuous as a composition of two continuous maps. Since \(\mathrm{id}\times\pi:\Phi\times X \to \Phi\times X/N\) is a continuous open surjection, for every open \(W \subset X/N\) the set \[ \psi^{-1}(W)=(\mathrm{id}\times\pi)\left((\pi\circ\varphi)^{-1}(W)\right) \] is open. Therefore \(\psi\) is continuous.

A commutative diagram by quotient space

We are going to talk about a classic commutative diagram that you have already seen in algebra class.


We make the following assumptions.

  1. \(X\) and \(Y\) are topological vector spaces.
  2. \(\Lambda\) is linear.
  3. \(\pi\) is the canonical map.
  4. \(N\) is a closed subspace of \(X\) and \(N \subset \ker\Lambda\).

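Algebraically, since \(N \subset \ker\Lambda\), there exists a unique map \(f: X/N \to Y\) given by \(x+N \mapsto \Lambda(x)\): if \(x+N=x'+N\), then \(x-x' \in N\), so \(\Lambda(x)=\Lambda(x')\). Namely, \(\Lambda=f\circ\pi\), so the diagram formed by \(X \xrightarrow{\pi} X/N \xrightarrow{f} Y\) and \(\Lambda: X \to Y\) is commutative. But now we are interested in some analytic facts.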

\(f\) is linear.

This is obvious. Since \(\pi\) is surjective, for \(u,v \in X/N\), we are able to find some \(x,y \in X\) such that \(\pi(x)=u\) and \(\pi(y)=v\). Therefore we have \[ \begin{aligned} f(u+v)=f(\pi(x)+\pi(y))&=f(\pi(x+y)) \\ &=\Lambda(x+y) \\ &=\Lambda(x)+\Lambda(y) \\ &= f(\pi(x))+f(\pi(y)) \\ &=f(u)+f(v) \end{aligned} \] and \[ \begin{aligned} f(\alpha{u})=f(\alpha\pi(x))&=f(\pi(\alpha{x})) \\ &= \Lambda(\alpha{x}) \\ &= \alpha\Lambda(x) \\ &= \alpha{f(\pi(x))} \\ &= \alpha{f(u)}. \end{aligned} \]

\(\Lambda\) is open if and only if \(f\) is open.

If \(f\) is open, then for any open set \(U \subset X\), the set \[ \Lambda(U)=f(\pi(U)) \] is open, since \(\pi(U)\) is open (\(\pi\) is an open map) and \(f\) is open.

Conversely, suppose \(\Lambda\) is open, and let \(V \subset X/N\) be open. Since \(\pi\) is continuous, \(\pi^{-1}(V)\) is open in \(X\). Since \(\pi\) is surjective, \(\pi(\pi^{-1}(V))=V\), hence \[ f(V)=f(\pi(\pi^{-1}(V)))=\Lambda(\pi^{-1}(V)), \] which is open because \(\Lambda\) is open. This shows that if \(\Lambda\) is open, then \(f\) is open.

\(\Lambda\) is continuous if and only if \(f\) is continuous.

If \(f\) is continuous, then for any open set \(W \subset Y\), the set \(\pi^{-1}(f^{-1}(W))=\Lambda^{-1}(W)\) is open. Therefore \(\Lambda\) is continuous.

Conversely, if \(\Lambda\) is continuous, then for any open set \(W \subset Y\), the set \(\Lambda^{-1}(W)=\pi^{-1}(f^{-1}(W))\) is open. Since \(\pi\) is surjective and open, \(f^{-1}(W)=\pi(\Lambda^{-1}(W))\) is open.

The Big Three Pt. 3 - The Open Mapping Theorem (Banach Space)

What is an open mapping

An open map is a function between two topological spaces that maps open sets to open sets. Precisely speaking, a function \(f: X \to Y\) is open if for any open set \(U \subset X\), \(f(U)\) is open in \(Y\). Likewise, a closed map is a function mapping closed sets to closed sets.

You may think open/closed map is an alternative name for continuous function. But it's not. The definition of open/closed mapping is totally different from continuity. Here are some simple examples.

  1. \(f(x)=\sin{x}\) defined on \(\mathbb{R}\) is not open, though it's continuous. It can be verified by considering \((0,2\pi)\), since we have \(f((0,2\pi))=[-1,1]\).
  2. The projection \(\pi: \mathbb{R}^2 \to \mathbb{R}\) defined by \((x,y) \mapsto x\) is open. Indeed, it maps an open ball onto an open interval on \(x\) axis.
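  3. The inclusion map \(\varphi: \mathbb{R} \to \mathbb{R}^2\) by \(x \mapsto (x,0)\), however, is not open. The image of an open interval \((a,b)\) is the segment \((a,b)\times\{0\}\), which is not open in the plane, since every disc around one of its points leaves the \(x\)-axis. (This segment is locally closed, but neither open nor closed.)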
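For the first example, a quick numerical sketch (sampling only, so it illustrates rather than proves the claim): sampling \(\sin\) on the open interval \((0,2\pi)\) shows the image reaching both \(-1\) and \(1\), which are attained at the interior points \(\pi/2\) and \(3\pi/2\). So the image of this open set is the closed interval \([-1,1]\).

```python
import math

# Sample the open interval (0, 2*pi) densely; the endpoints are excluded.
n = 100_000
xs = [2 * math.pi * (i + 0.5) / n for i in range(n)]
ys = [math.sin(x) for x in xs]
print(min(ys), max(ys))   # both close to -1 and 1

# The extreme values are attained at interior points of (0, 2*pi):
print(math.sin(math.pi / 2), math.sin(3 * math.pi / 2))
```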

Under what condition is a continuous linear map between two topological vector spaces an open mapping? We'll give a partial answer in this blog post: the open mapping theorem provides a sufficient condition for a continuous linear map to be open.

Open Mapping Theorem

Let \(X,Y\) be Banach spaces and \(T: X \to Y\) a surjective bounded linear map. Then \(T\) is an open mapping.

The open balls in \(X\) and \(Y\) are defined respectively by \[ B_r^X=\{x \in X:\lVert x \rVert<r\}\quad\text{and}\quad B_r^Y=\{y \in Y:\lVert y \rVert<r\} \] All we need to do is show that there exists some \(r>0\) such that \[ B_r^Y \subset T(B_1^X), \] since every open set in \(X\) or \(Y\) can be expressed as a union of open balls. A ball in \(X\) centered at \(x \in X\) with radius \(s\) can be written as \(x+B_s^X\), and by linearity \(T(x+B_s^X)=Tx+sT(B_1^X) \supset Tx+B_{rs}^Y\). After that, it becomes clear that \(T\) maps open sets to open sets.

First we have \[ X=\bigcup_{n=1}^{\infty}B_n^{X}. \] The surjectivity of \(T\) ensures that \[ Y=\bigcup_{n=1}^{\infty}T(B_n^X). \] Since \(Y\) is Banach, in particular a complete metric space, by the Baire category theorem there must be some \(n_0 \in \mathbb{N}\) such that \(\overline{T(B_{n_0}^{X})}\) has nonempty interior. Otherwise every \(T(B_n^{X})\) would be nowhere dense, and \(Y\) would be of the first category, a contradiction.

Since \(T(B_n^X)=nT(B_1^X)\) and \(y \mapsto ny\) is a homeomorphism of \(Y\) onto \(Y\), in fact no \(T(B_n^X)\) is nowhere dense, in particular \(T(B_1^X)\) is not. Therefore, there exist some \(y_0 \in \overline{T(B_1^{X})}\) and some \(\varepsilon>0\) such that \[ y_0+B_\varepsilon^Y \subset \overline{T(B_1^X)}, \] that is, \(y_0\) is an interior point of \(\overline{T(B_1^X)}\).

On the other hand, we claim \[ \overline{T(B_1^X)} - y_0 \subset \overline{T(B_2^X)}. \] Pick any \(y \in \overline{T(B_1^X)}\); we shall show that \(y-y_0 \in \overline{T(B_2^X)}\). For \(y_0\), there exists a sequence \((x_n')\) with \(\lVert x_n' \rVert <1\) for all \(n\) such that \(Tx_n' \to y_0\). Likewise, there is a sequence \((x_n)\) with \(\lVert x_n \rVert <1\) for all \(n\) such that \(Tx_n \to y\). Then \[ y-y_0=\lim_{n \to \infty}T(x_n-x_n'), \] and since \[ \lVert x_n -x_n' \rVert \leq \lVert x_n \rVert+\lVert x_n' \rVert <2, \] we see \(T(x_n-x_n') \in T(B_2^X)\) for all \(n\). It follows that \[ y-y_0 \in \overline{T(B_2^X)}. \] Combining this with the previous inclusion, we get \[ B_\varepsilon^Y=(y_0+B_\varepsilon^Y)-y_0 \subset \overline{T(B_1^X)}-y_0 \subset \overline{T(B_2^X)}. \] Since \(T\) is linear and scaling is a homeomorphism, we see \[ 2B_{\varepsilon/2}^{Y} \subset \overline{T(2B_1^X)}=2\overline{T(B_1^X)}, \] that is, \(B_{\varepsilon/2}^Y \subset \overline{T(B_1^X)}\). Scaling by \(2^{1-n}\) then gives \[ B_{\varepsilon/2^n}^Y \subset \overline{T(B_{1/2^{n-1}}^X)} \] for all \(n \geq 1\).

We now show that, in fact, \[ B_{\varepsilon/4}^Y \subset T(B_1^X). \] For any \(u \in B_{\varepsilon/4}^Y\), the case \(n=2\) above gives \(u \in \overline{T(B_{1/2}^X)}\). Hence there exists some \(x_1 \in B_{1/2}^{X}\) such that \[ \lVert u-Tx_1 \rVert < \frac{\varepsilon}{8}. \] This implies that \(u-Tx_1 \in B_{\varepsilon/8}^Y \subset \overline{T(B_{1/4}^X)}\). In the same fashion, we are able to pick \(x_n\) inductively in such a way that \[ \lVert u-Tx_1-Tx_2-\cdots-Tx_n \rVert < \frac{\varepsilon}{2^{n+2}} \] where \(\lVert x_n \rVert<2^{-n}\). Now let \(z_n=\sum_{k=1}^{n}x_k\); we shall show that \((z_n)\) is Cauchy. For \(m<n\), we have \[ \lVert z_n - z_m \rVert =\left\Vert\sum_{k=m+1}^nx_k \right\Vert \leq \sum_{k=m+1}^{n}\lVert x_k\rVert < \sum_{k=m+1}^{\infty}\frac{1}{2^{k}}=\frac{1}{2^{m}}. \] Since \(X\) is Banach, there exists some \(z \in X\) such that \(z_n \to z\). Further we have \[ \lVert z\rVert = \lim_{n \to \infty}\lVert z_n \rVert \leq \sum_{k=1}^{\infty}\lVert x_k \rVert < \lVert x_1 \rVert+\sum_{k=2}^{\infty}\frac{1}{2^{k}} = \lVert x_1 \rVert+\frac{1}{2} < 1, \] therefore \(z \in B_1^X\). Since \(T\) is bounded, hence continuous, we get \(Tz=\lim_{n \to \infty}Tz_n=u\). To summarize, for every \(u \in B_{\varepsilon/4}^Y\) there is some \(z \in B_{1}^X\) such that \(T(z)=u\), which implies \(T(B_1^X) \supset B_{\varepsilon/4}^Y\).
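The construction of \(z\) is a successive-approximation scheme: each correction \(x_n\) shrinks the residual \(u-Tz_n\), and the corrections sum to a convergent geometric series. Below is a finite-dimensional sketch of the same telescoping idea, not the theorem itself. It assumes a hypothetical "approximate inverse" matrix `A` with \(\lVert r - TAr\rVert_\infty \leq \frac12\lVert r\rVert_\infty\), which plays the role of "pick \(x_n\) with \(Tx_n\) close to the residual"; here `A` is the inverse of the diagonal of `T`, for which the contraction factor happens to be \(1/3\).

```python
def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def sup_norm(v):
    return max(abs(c) for c in v)

def successive_approximation(T, A, u, steps=60):
    """Build z = x_1 + x_2 + ... with T z = u, in the spirit of the proof.

    Assumes sup_norm(r - T(A r)) <= sup_norm(r) / 2 for every r, so each
    residual u - T z_n is at most half the previous one."""
    z = [0.0] * len(u)
    residual = list(u)
    history = []
    for _ in range(steps):
        x = matvec(A, residual)                               # correction x_n
        z = [zi + xi for zi, xi in zip(z, x)]                 # partial sum z_n
        Tx = matvec(T, x)
        residual = [ri - ti for ri, ti in zip(residual, Tx)]  # u - T z_n
        history.append(sup_norm(residual))
    return z, history

T = [[2.0, 0.5], [0.3, 1.5]]
A = [[1 / 2.0, 0.0], [0.0, 1 / 1.5]]   # inverse of the diagonal of T
u = [1.0, -2.0]
z, history = successive_approximation(T, A, u)
print(z, history[:5])
```

In the Banach-space proof no such `A` is available; completeness of \(X\) is what guarantees that the partial sums \(z_n\) converge.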

Let \(U \subset X\) be open; we want to show that \(T(U)\) is also open. Take \(y \in T(U)\), so \(y=T(x)\) with \(x \in U\). Since \(U\) is open, there exists some \(\rho>0\) such that \(x+B_{\rho}^{X} \subset U\). Put \(\delta=\varepsilon/4\), so that \(B_{\delta}^Y \subset T(B_1^X)\). By the linearity of \(T\), we obtain \(B_{\delta\rho}^Y \subset T(B_{\rho}^X)\). Using the linearity of \(T\) again, we obtain \[ y+B_{\delta\rho}^Y \subset T(x+B_{\rho}^X) \subset T(U), \] which shows that \(T(U)\) is open; therefore \(T\) is an open mapping.


One has to notice that completeness has been used more than once: the completeness of \(Y\) is needed for the Baire category theorem, and the existence of \(z\) depends on the fact that Cauchy sequences converge in \(X\). Also, the surjectivity of \(T\) cannot be omitted; can you see why?

There are several other ways to state this theorem.

  • There is some \(\delta>0\) such that to every \(y\) with \(\lVert y \rVert < \delta\) there corresponds an \(x\) with \(\lVert x \rVert<1\) such that \(T(x)=y\).
  • Let \(U\) and \(V\) be the open unit balls of the Banach spaces \(X\) and \(Y\). To every surjective bounded linear map \(T: X \to Y\) there corresponds a \(\delta>0\) such that \[ T(U) \supset \delta{V}. \]

You may also notice that we have used a lot of basic definitions of topology. For example, we checked the openness of \(T(U)\) by using neighborhoods. The set \(\overline{T(B_1^X)}\) should also remind you of limit points.

The difference between open mappings and continuous mappings can be viewed via the topologies of the two spaces. Suppose \(f: X \to Y\), and let \(\tau_X\) and \(\tau_Y\) be the topologies of \(X\) and \(Y\), respectively. If \(f(U) \in \tau_Y\) for every \(U \in \tau_X\), then \(f\) is open. But this has nothing to do with continuity: by continuity we mean that \(f^{-1}(V) \in \tau_X\) for every \(V \in \tau_Y\).

Fortunately, this theorem can be generalized to \(F\)-spaces, which will be demonstrated in the following blog post of the series. A topological vector space \(X\) is an \(F\)-space if its topology \(\tau\) is induced by a complete invariant metric \(d\). Again, completeness plays a critical role.

The series

Since there is no strong reason to write more posts on this topic, i.e. the three fundamental theorems of linear functional analysis, I think it's time to make a list of the series. It's been around half a year.

A brief introduction to Fréchet derivative

The Fréchet derivative is a generalisation of the ordinary derivative. Generally we are talking about Banach spaces, of which \(\mathbb{R}\) is a special case. Indeed, the spaces discussed are not even required to be finite-dimensional.


A real-valued function \(f(t)\) of a real variable, defined on some neighborhood of \(0\), is said to be \(o(t)\) if \[ \lim_{t \to 0} \frac{f(t)}{t}=0. \] The derivative of a real function \(f\) at some point \(a\) is defined by \[ f'(a)=\lim_{h \to 0}\frac{f(a+h)-f(a)}{h}. \] We also have the equivalent equation \[ f(a+h)=f(a)+f'(a)h+o(h). \] Now suppose \(f:U \subset \mathbb{R}^n \to \mathbb{R}^m\) where \(U\) is an open set. The function \(f\) is differentiable at \(x_0 \in U\) if it satisfies the following conditions.

  1. All partial derivatives of \(f\), i.e. \(\frac{\partial f_i}{\partial x_j}\) for \(i=1,\cdots,m\) and \(j = 1,\cdots,n\), exist at \(x_0\). (This ensures that the Jacobian matrix exists and is well-defined.)

  2. The Jacobian matrix \(J(x_0)\in\mathbb{R}^{m\times n}\) satisfies \[ \lim_{|h| \to 0}\frac{|f(x_0+h)-f(x_0)-J(x_0)h|}{|h|}=0. \] In fact the Jacobian matrix is the derivative of \(f\) at \(x_0\), although it's a matrix in lieu of a number. But in the general case we should treat a number as a (\(1\times 1\)) matrix. In the following definition of the Fréchet derivative, you will see that the derivative should be treated as a linear operator.
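Condition 2 can be checked numerically. A minimal sketch, assuming the sample map \(f(x,y)=(x^2y,\ \sin x+y)\) (a hypothetical example, not from the text) with its hand-computed Jacobian: the ratio \(|f(x_0+h)-f(x_0)-J(x_0)h|/|h|\) shrinks roughly linearly as \(h \to 0\), consistent with the limit being \(0\).

```python
import math

def f(x, y):
    # Sample map f: R^2 -> R^2, chosen only for illustration
    return (x * x * y, math.sin(x) + y)

def jacobian(x, y):
    # Hand-computed partial derivatives of f: rows are f_1, f_2
    return ((2 * x * y, x * x),
            (math.cos(x), 1.0))

def remainder_ratio(x0, h):
    """|f(x0 + h) - f(x0) - J(x0) h| / |h| with Euclidean norms."""
    fx = f(*x0)
    fxh = f(x0[0] + h[0], x0[1] + h[1])
    J = jacobian(*x0)
    Jh = [J[i][0] * h[0] + J[i][1] * h[1] for i in range(2)]
    r = [fxh[i] - fx[i] - Jh[i] for i in range(2)]
    return math.hypot(*r) / math.hypot(*h)

x0 = (1.0, 2.0)
ratios = [remainder_ratio(x0, (t, t)) for t in (1e-1, 1e-2, 1e-3, 1e-4)]
print(ratios)  # shrinks roughly like a constant times t
```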


Let \(\mathbf{E}\) and \(\mathbf{F}\) be Banach spaces, and let \(f:U\to\mathbf{F}\) be a function where \(U\) is an open subset of \(\mathbf{E}\). We say \(f\) is Fréchet differentiable at \(x \in U\) if there is a bounded linear operator \(\lambda:\mathbf{E} \to \mathbf{F}\) such that \[ \lim_{\lVert y \rVert \to 0}\frac{\lVert f(x+y)-f(x)-\lambda y \rVert}{\lVert y \rVert}=0. \] We say that \(\lambda\) is the derivative of \(f\) at \(x\), which will be denoted by \(Df(x)\) or \(f'(x)\). Notice that \(\lambda \in L(\mathbf{E},\mathbf{F})\), the space of bounded linear operators from \(\mathbf{E}\) to \(\mathbf{F}\). If \(f\) is differentiable at every point of \(U\), then \(f'\) is a map \[ f':U \to L(\mathbf{E},\mathbf{F}). \]

The definition above doesn't go too far from the one for real functions on the real axis. Now we assume that both \(\mathbf{E}\) and \(\mathbf{F}\) are merely topological vector spaces; we can still define the (generalized) Fréchet derivative.

Let \(\varphi\) be a mapping of a neighborhood of \(0\) of \(\mathbf{E}\) into \(\mathbf{F}\). We say that \(\varphi\) is tangent to \(0\) if given a neighborhood \(W\) of \(0\) in \(\mathbf{F}\), there exists a neighborhood \(V\) of \(0\) in \(\mathbf{E}\) such that \[ \varphi(tV) \subset o(t)W \] for some function \(o(t)\). For example, if both \(\mathbf{E}\) and \(\mathbf{F}\) are normed (not necessarily Banach), then we get the usual condition \[ \lVert \varphi(x) \rVert \leq \lVert x \rVert \psi(x) \] where \(\lim_{\lVert x \rVert \to 0}\psi(x)=0\).

Still we assume that \(\mathbf{E}\) and \(\mathbf{F}\) are topological vector spaces. Let \(f:U \to \mathbf{F}\) be a continuous map. We say that \(f\) is differentiable at a point \(x \in U\) if there exists some \(\lambda \in L(\mathbf{E},\mathbf{F})\) such that for small \(y\) we have \[ f(x+y)=f(x)+\lambda{y}+\varphi(y) \] where \(\varphi\) is tangent to \(0\). Notice that \(\lambda\) is uniquely determined.


You must be familiar with some properties of derivatives, but we are redoing them in this more general setting.

Chain rule

If \(f: U \to V\) is differentiable at \(x_0\), and \(g:V \to W\) is differentiable at \(f(x_0)\), then \(g \circ f\) is differentiable at \(x_0\), and \[ (g \circ f)'(x_0)=g'(f(x_0)) \circ f'(x_0) \]

Proof. We are proving this in topological vector spaces. By definition, we already have continuous linear operators \(\lambda\) and \(\mu\) such that \[ \begin{aligned} f(x_0+y)&=f(x_0)+\lambda{y}+\varphi(y), \\ g(f(x_0)+h)&=g(f(x_0))+\mu{h}+\psi(h), \end{aligned} \] where \(\varphi\) and \(\psi\) are tangent to \(0\). Further, \[ f'(x_0)=\lambda, \qquad g'(f(x_0))=\mu. \] To evaluate \(g(f(x_0+y))\), notice that \[ \begin{aligned} g(f(x_0+y))&=g[f(x_0)+(\lambda{y}+\varphi(y))] \\ &=g(f(x_0))+\mu(\lambda{y}+\varphi(y))+\psi(\lambda{y}+\varphi(y)) \\ &=g(f(x_0))+\mu\circ\lambda{y}+\mu\circ\varphi(y)+\psi(\lambda{y}+\varphi(y)). \end{aligned} \] It's clear that \(\mu\circ\varphi(y)+\psi(\lambda{y}+\varphi(y))\) is tangent to \(0\), and \(\mu\circ\lambda\) is the linear map we are looking for. That is, \[ (g \circ f)'(x_0)=g'(f(x_0)) \circ f'(x_0). \]
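In finite dimensions the chain rule says that the Jacobian of \(g\circ f\) is the matrix product \(J_g(f(x_0))\,J_f(x_0)\). A minimal sketch checking this numerically, assuming two hypothetical sample maps \(f(x,y)=(x^2y,\ \sin x+y)\) and \(g(u,v)=(uv,\ u+v^2)\); the left side is estimated by central differences.

```python
import math

def f(x, y):
    # Sample inner map f: R^2 -> R^2 (illustrative choice)
    return (x * x * y, math.sin(x) + y)

def g(u, v):
    # Sample outer map g: R^2 -> R^2 (illustrative choice)
    return (u * v, u + v * v)

def jac_f(x, y):
    return [[2 * x * y, x * x], [math.cos(x), 1.0]]

def jac_g(u, v):
    return [[v, u], [1.0, 2 * v]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def numerical_jacobian(F, p, h=1e-6):
    # Central differences: column j approximates the partial derivative along e_j
    cols = []
    for j in range(2):
        plus = list(p)
        minus = list(p)
        plus[j] += h
        minus[j] -= h
        Fp, Fm = F(*plus), F(*minus)
        cols.append([(Fp[i] - Fm[i]) / (2 * h) for i in range(2)])
    return [[cols[j][i] for j in range(2)] for i in range(2)]

x0 = (1.0, 2.0)
chain = matmul(jac_g(*f(*x0)), jac_f(*x0))                  # g'(f(x0)) o f'(x0)
direct = numerical_jacobian(lambda x, y: g(*f(x, y)), x0)   # (g o f)'(x0), numerically
print(chain)
print(direct)
```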

Derivative of higher orders

From now on, we are dealing with Banach spaces. Let \(U\) be an open subset of \(\mathbf{E}\), and \(f:U \to \mathbf{F}\) be differentiable at each point of \(U\). If \(f'\) is continuous, then we say that \(f\) is of class \(C^1\). Maps of class \(C^p\), where \(p \geq 1\), are defined inductively. The \(p\)-th derivative \(D^pf\) is defined as \(D(D^{p-1}f)\) and is itself a map of \(U\) into \(L(\mathbf{E},L(\mathbf{E},\cdots,L(\mathbf{E},\mathbf{F})\cdots))\) (with \(p\) copies of \(\mathbf{E}\)), which is isomorphic to \(L^p(\mathbf{E},\mathbf{F})\), the space of continuous \(p\)-multilinear maps from \(\mathbf{E}^p\) to \(\mathbf{F}\). A map \(f\) is said to be of class \(C^p\) if its \(k\)-th derivative \(D^kf\) exists for \(1 \leq k \leq p\) and is continuous. With the help of the chain rule, and the fact that the composition of two continuous functions is continuous, we get

Let \(U,V\) be open subsets of some Banach spaces. If \(f:U \to V\) and \(g: V \to \mathbf{F}\) are of class \(C^p\), then so is \(g \circ f\).

Open subsets of Banach spaces as a category

We in fact get a category whose objects are open subsets of Banach spaces and whose morphisms are maps of class \(C^p\) from one such open set into another. To verify this, one only has to realize that identity maps are of class \(C^p\) and that the composition of two maps of class \(C^p\) is still of class \(C^p\) (as stated above).

We say that \(f\) is of class \(C^\infty\) if \(f\) is of class \(C^p\) for all integers \(p \geq 1\). Meanwhile \(C^0\) maps are the continuous maps.

An example

We are going to evaluate the Fréchet derivative of a nonlinear functional. It is the derivative of a functional mapping an infinite dimensional space into \(\mathbb{R}\) (instead of \(\mathbb{R}\) to \(\mathbb{R}\)).

Consider the functional \[ \begin{aligned} \Gamma:C^0[0,1] &\to \mathbb{R} \\ u &\mapsto \int_{0}^{1}u^2(x)\sin\pi{x}dx, \end{aligned} \] where the norm is defined by \[ \lVert u \rVert = \sup_{x \in [0,1]}|u(x)|. \]

For \(u\in C[0,1]\), we are going to find a bounded linear operator \(\lambda\) such that \[ \Gamma(u+\eta)=\Gamma(u)+\lambda{\eta}+\varphi(\eta), \] where \(\varphi(\eta)\) is tangent to \(0\).

Solution. By evaluating \(\Gamma(u+\eta)\), we get \[ \begin{aligned} \Gamma(u+\eta)&=\int_{0}^{1}(u+\eta)^2\sin\pi{x}dx \\ &= \Gamma(u)+2\int_{0}^{1}u\eta\sin\pi{x}dx+\int_{0}^{1}\eta^2\sin\pi{x}dx. \end{aligned} \] To prove that \(\int_{0}^{1}\eta^2\sin\pi{x}dx\) is the \(\varphi(\eta)\) desired, notice that \(\sin\pi{x} \geq 0\) on \([0,1]\), so \[ 0 \leq \int_{0}^{1}\eta^2\sin\pi{x}dx \leq \lVert\eta\rVert^2\int_{0}^{1}\sin\pi{x}dx=\frac{2}{\pi}\lVert \eta \rVert^2. \] Therefore we have \[ 0\leq\lim_{\lVert \eta \rVert \to 0}\frac{\int_{0}^{1}\eta^2\sin\pi{x}dx}{\lVert \eta \rVert} \leq \lim_{\lVert\eta\rVert\to0}\frac{2}{\pi}\lVert\eta\rVert=0 \] as desired. The map \(\eta \mapsto 2\int_{0}^{1}u\eta\sin\pi{x}dx\) is linear, and bounded since its absolute value is at most \(\frac{4}{\pi}\lVert u \rVert\lVert \eta \rVert\). Hence the Fréchet derivative of \(\Gamma\) at \(u\) is \[ \begin{aligned} \Gamma'(u):C[0,1] &\to \mathbb{R} \\ \eta &\mapsto 2\int_{0}^{1}u\eta\sin\pi{x}dx. \end{aligned} \] It may be hard to believe, but the derivative is neither a number nor a matrix: it is a linear operator. Conversely, a real or complex number, or a matrix, can be treated as a linear operator in the nature of things.
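A numerical sketch of this example, assuming a composite midpoint rule for the integrals and the hypothetical choices \(u(x)=x\) and \(\eta(x)=\cos 3x\) (so \(\lVert\eta\rVert=1\), attained at \(x=0\)). For the perturbation \(t\eta\), the remainder \(\Gamma(u+t\eta)-\Gamma(u)-\Gamma'(u)(t\eta)\) equals \(t^2\int_0^1\eta^2\sin\pi x\,dx\), so divided by \(\lVert t\eta\rVert\) it tends to \(0\), and it stays below the bound \(\frac{2}{\pi}\lVert t\eta\rVert^2\).

```python
import math

N = 4000  # number of midpoint-rule nodes on [0, 1] (an arbitrary choice)
XS = [(i + 0.5) / N for i in range(N)]

def integrate(values):
    # Composite midpoint rule on [0, 1]
    return sum(values) / N

def Gamma(u):
    return integrate([u(x) ** 2 * math.sin(math.pi * x) for x in XS])

def dGamma(u, eta):
    # Candidate Frechet derivative: eta -> 2 * int_0^1 u * eta * sin(pi x) dx
    return 2 * integrate([u(x) * eta(x) * math.sin(math.pi * x) for x in XS])

def u(x):
    return x                  # hypothetical base point

def eta(x):
    return math.cos(3 * x)    # hypothetical direction; sup norm is 1

eta_sup = 1.0
results = []
for t in (1e-1, 1e-2, 1e-3):
    def t_eta(x, t=t):
        return t * eta(x)
    remainder = Gamma(lambda x: u(x) + t_eta(x)) - Gamma(u) - dGamma(u, t_eta)
    results.append((t, remainder))
    print(t, remainder / (t * eta_sup))   # tends to 0 together with t
```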