Basic Facts of Semicontinuous Functions


We restrict ourselves to \(\mathbb{R}\) endowed with the usual topology. Recall that a function \(f\) is continuous if and only if for every open set \(U \subset \mathbb{R}\), the set \[ \{x:f(x) \in U\}=f^{-1}(U) \]

is open. One can rewrite this statement in \(\varepsilon-\delta\) language. To say that a function \(f: \mathbb{R} \to \mathbb{R}\) is continuous at \(x\), we mean that for any \(\varepsilon>0\), there exists some \(\delta>0\) such that for all \(t \in (x-\delta,x+\delta)\), we have \[ |f(x)-f(t)|<\varepsilon. \] \(f\) is continuous on \(\mathbb{R}\) if and only if \(f\) is continuous at every point of \(\mathbb{R}\).

If \((x-\delta,x+\delta)\) is replaced with \((x-\delta,x)\) or \((x,x+\delta)\), we get left continuity and right continuity respectively, one of which plays an important role in probability theory.

The problem is that continuity is sometimes too strong a restriction, while the 'direction' attached to left/right continuity is unnecessary as well. For example, the function \[ f(x)=\chi_{(0,1)}(x) \] is neither left nor right continuous (globally), yet it is far from pathological. Left/right continuity is not the right way to weaken continuity. We need something different.

Definition of semicontinuous

Let \(f\) be a real (or extended-real) function on \(\mathbb{R}\). The semicontinuity of \(f\) is defined as follows.

If \[ \{x:f(x)>\alpha\} \] is open for all real \(\alpha\), we say \(f\) is lower semicontinuous.

If \[ \{x:f(x)<\alpha\} \] is open for all real \(\alpha\), we say \(f\) is upper semicontinuous.

Is it possible to rewrite these definitions à la \(\varepsilon-\delta\)? The answer is yes if we restrict ourselves to metric spaces.

\(f: \mathbb{R} \to \mathbb{R}\) is upper semicontinuous at \(x\) if, for every \(\varepsilon>0\), there exists some \(\delta>0\) such that for \(t \in (x-\delta,x+\delta)\), we have \[ f(t)<f(x)+\varepsilon \]

\(f: \mathbb{R} \to \mathbb{R}\) is lower semicontinuous at \(x\) if, for every \(\varepsilon>0\), there exists some \(\delta>0\) such that for \(t \in (x-\delta,x+\delta)\), we have \[ f(t)>f(x)-\varepsilon \]

Of course, \(f\) is upper/lower semicontinuous on \(\mathbb{R}\) if and only if it is so at every point of \(\mathbb{R}\). On \(\mathbb{R}\), the two styles of definition agree.
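
As a quick illustration of the definitions (this example is ours and is not used later), consider the floor function \(x \mapsto \lfloor x \rfloor\). For an integer \(n\) and a real \(\alpha\), we have \(n<\alpha\) if and only if \(n<\lceil\alpha\rceil\), so \[ \{x:\lfloor x \rfloor<\alpha\}=\{x:x<\lceil\alpha\rceil\}=(-\infty,\lceil\alpha\rceil), \] which is open for every real \(\alpha\). Hence the floor function is upper semicontinuous. It is not lower semicontinuous, since \[ \{x:\lfloor x \rfloor>0\}=[1,+\infty) \] is not open.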

Relation with continuous functions

Here is another way to see it. For the continuity of \(f\), we consider arbitrary open subsets \(V\) of \(\mathbb{R}\), and \(f^{-1}(V)\) is expected to be open. For the lower/upper semicontinuity of \(f\), however, the open sets are restricted to those of the form \((\alpha,+\infty]\) and \([-\infty,\alpha)\) respectively. Since every open set of \(\mathbb{R}\) can be generated from sets like \([-\infty,\alpha)\) and \((\beta,+\infty]\) by finite intersections and arbitrary unions, we immediately get

\(f\) is continuous if and only if \(f\) is both upper semicontinuous and lower semicontinuous.

Proof. If \(f\) is continuous, then for any \(\alpha \in \mathbb{R}\), we see \([-\infty,\alpha)\) is open, and therefore \[ f^{-1}([-\infty,\alpha)) \] has to be open. The upper semicontinuity is proved. The lower semicontinuity of \(f\) is proved in the same manner.

If \(f\) is both upper and lower semicontinuous, we see \[ f^{-1}((\alpha,\beta))=f^{-1}([-\infty,\beta)) \cap f^{-1}((\alpha,+\infty]) \] is open. Since every open subset of \(\mathbb{R}\) can be written as a countable union of segments of the above types, we see for any open subset \(V\) of \(\mathbb{R}\), \(f^{-1}(V)\) is open. (If you have trouble with this part, it is recommended to review the definition of topology.) \(\square\)


There are two important examples.

  1. If \(E \subset \mathbb{R}\) is open, then \(\chi_E\) is lower semicontinuous.
  2. If \(F \subset \mathbb{R}\) is closed, then \(\chi_F\) is upper semicontinuous.

We will prove the first one. The second one follows in the same manner of course. For \(\alpha<0\), the set \(A=\chi_E^{-1}((\alpha,+\infty])\) is equal to \(\mathbb{R}\), which is open. For \(\alpha \geq 1\), since \(\chi_E \leq 1\), we see \(A=\varnothing\). For \(0 \leq \alpha < 1\) however, the set of \(x\) where \(\chi_E>\alpha\) has to be \(E\), which is still open.

When checking the semicontinuity of a function, we check from bottom to top or top to bottom. Recall that the function \(\chi_E\) is defined by \[ \chi_E(x)=\begin{cases} 1, & x \in E, \\ 0, & x \notin E. \end{cases} \]

Addition of semicontinuous functions

If \(f_1\) and \(f_2\) are upper/lower semicontinuous, then so is \(f_1+f_2\).

Proof. We are going to prove the two cases using different tools. Suppose first that both \(f_1\) and \(f_2\) are upper semicontinuous, and fix \(x \in \mathbb{R}\). For \(\varepsilon>0\), there exist some \(\delta_1>0\) and \(\delta_2>0\) such that \[ \begin{aligned} f_1(t) &< f_1(x)+\varepsilon/2, \quad t \in (x-\delta_1,x+\delta_1), \\ f_2(t) &< f_2(x) + \varepsilon/2, \quad t \in (x-\delta_2,x+\delta_2). \end{aligned} \] If we pick \(\delta=\min(\delta_1,\delta_2)\), then for all \(t \in (x-\delta,x+\delta)\), we have \[ f_1(t)+f_2(t)<f_1(x)+f_2(x)+\varepsilon. \] The upper semicontinuity of \(f_1+f_2\) is proved by considering all \(x \in \mathbb{R}\).

Now suppose both \(f_1\) and \(f_2\) are lower semicontinuous. We have the identity \[ \{x:f_1+f_2>\alpha\}=\bigcup_{\beta\in\mathbb{R}}\{x:f_1>\beta\}\cap\{x:f_2>\alpha-\beta\}. \] The set on the right side is a union of open sets, hence open. Therefore \(f_1+f_2\) is lower semicontinuous. \(\square\)
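
To see why the identity holds, here is a short verification, assuming for simplicity that \(f_1\) and \(f_2\) are real-valued at the point \(x\) in question. If \(f_1(x)+f_2(x)>\alpha\), then \(\alpha-f_2(x)<f_1(x)\), so we may pick a real \(\beta\) with \[ \alpha-f_2(x)<\beta<f_1(x), \] which gives \(f_1(x)>\beta\) and \(f_2(x)>\alpha-\beta\). Conversely, if \(f_1(x)>\beta\) and \(f_2(x)>\alpha-\beta\) for some real \(\beta\), then adding the two inequalities gives \[ f_1(x)+f_2(x)>\beta+(\alpha-\beta)=\alpha. \]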

However, when there are infinitely many semicontinuous functions, things are different.

Let \(\{f_n\}\) be a sequence of nonnegative functions on \(\mathbb{R}\), then

  • If each \(f_n\) is lower semicontinuous, then so is \(\sum_{1}^{\infty}f_n\).
  • If each \(f_n\) is upper semicontinuous, then \(\sum_{1}^{\infty}f_n\) is not necessarily upper semicontinuous.

Proof. We again use the properties of open sets. Put \(g_n=\sum_{k=1}^{n}f_k\) and suppose all \(f_k\) are lower semicontinuous. Since \(g_n\) is a finite sum of lower semicontinuous functions, each \(g_n\) is lower semicontinuous. Let \(f=\sum_{n}f_n\). As the \(f_k\) are nonnegative, \(g_n\) increases to \(f\), so \(f(x)>\alpha\) if and only if there exists some \(n\) such that \(g_{n}(x)>\alpha\). Therefore \[ \{x:f(x)>\alpha\}=\bigcup_{n \geq 1}\{x:g_n(x)>\alpha\}. \] The right-hand side is a union of open sets, hence open.

For the upper semicontinuity, it suffices to give a counterexample, but before that, we shall give the motivation.

As said, the characteristic function of a closed set is upper semicontinuous. Suppose \(\{E_n\}\) is a sequence of almost disjoint closed sets. Then \(E=\cup_{n\geq 1}E_n\) is not necessarily closed, therefore \(\chi_E=\sum\chi_{E_n}\) (a.e.) is not necessarily upper semicontinuous. Now we give a concrete example. Put \(f_0=\chi_{[1,+\infty)}\) and \(f_n=\chi_{E_n}\) for \(n \geq 1\) where \[ E_n=\left\{x:\frac{1}{1+n} \leq x \leq \frac{1}{n}\right\}. \] For \(x > 0\), we have \(f(x)=\sum_{n \geq 0}f_n(x) \geq 1\), while \(f(x)=0\) for \(x \leq 0\). Hence \(f^{-1}([-\infty,1))=(-\infty,0]\), which is not open. \(\square\)
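
The counterexample can also be checked numerically. The following is a minimal sketch, assuming rational inputs; the helper name `f` and the use of `fractions.Fraction` for exact arithmetic are our own choices and not part of the argument above.

```python
import math
from fractions import Fraction


def f(x: Fraction) -> int:
    """Evaluate f = chi_[1, inf) + sum_{n >= 1} chi_[1/(n+1), 1/n] at a rational x."""
    if x <= 0:
        return 0  # no term of the sum is supported on (-inf, 0]
    total = 1 if x >= 1 else 0  # contribution of f_0 = chi_[1, inf)
    inv = 1 / x
    # x lies in E_n exactly when n <= 1/x <= n + 1, i.e. 1/x - 1 <= n <= 1/x
    lowest = max(1, math.ceil(inv - 1))
    highest = math.floor(inv)
    total += max(0, highest - lowest + 1)
    return total


# f vanishes at 0 but is at least 1 at points arbitrarily close to 0,
# so {x : f(x) < 1} contains 0 without containing a neighborhood of 0.
samples = [Fraction(1, 10**k) for k in range(1, 7)]
print(f(Fraction(0)), [f(x) for x in samples])
```

The values at the sample points stay at least \(1\) while \(f(0)=0\), which is exactly the failure of upper semicontinuity at \(0\).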

Notice that \(f\) can be defined on any topological space here.

Maximum and minimum

There is one fact we already know about continuous functions.

If \(X\) is compact, \(f: X \to \mathbb{R}\) is continuous, then there exists some \(a,b \in X\) such that \(f(a)=\min f(X)\), \(f(b)=\max f(X)\).

In fact, \(f(X)\) is compact still. But for semicontinuous functions, things will be different but reasonable. For upper semicontinuous functions, we have the following fact.

If \(X\) is compact and \(f: X \to (-\infty,+\infty)\) is upper semicontinuous, then there exists some \(a \in X\) such that \(f(a)=\max f(X)\).

Notice that \(X\) is not assumed to hold any other topological property. It can be Hausdorff or Lindelöf, but we are not asking for restrictions like this. The only property we will be using is that every open cover of \(X\) has a finite subcover. Of course, one can replace \(X\) with any compact subset of \(\mathbb{R}\), for example, \([a,b]\).

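Proof. Put \(\alpha=\sup f(X)\). We first show that \(\alpha<+\infty\). The sets \(\{x:f(x)<n\}\), \(n \geq 1\), are open since \(f\) is upper semicontinuous, they increase with \(n\), and they cover \(X\) since \(f\) is real-valued. By compactness, \(X=\{x:f(x)<N\}\) for some \(N\), so \(\alpha \leq N\). Now define \[ E_n=\left\{x:f(x)<\alpha-\frac{1}{n}\right\}, \] which is open by upper semicontinuity. If \(f\) attains no maximum, then \(f(x)<\alpha\) for every \(x \in X\), hence there exists some \(n \geq 1\) such that \(f(x)<\alpha-\frac{1}{n}\). That is, \(x \in E_n\) for some \(n\). Therefore \(\{E_n\}_{n \geq 1}\) is an open cover of \(X\). But \(E_1 \subset E_2 \subset \cdots\), so a finite subcover would give \(X=E_N\) for some \(N\), that is, \(f<\alpha-\frac{1}{N}\) on \(X\), contradicting \(\alpha=\sup f(X)\). Hence this cover has no finite subcover, a contradiction since \(X\) is compact. \(\square\)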
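
The statement is one-sided: an upper semicontinuous function on a compact set need not attain a minimum. A small example (ours, for illustration only): on \(X=[0,1]\) define \[ u(x)=\begin{cases} 1, & x=0, \\ x, & 0<x \leq 1. \end{cases} \] The set \(\{x \in X:u(x)<\alpha\}\) is \(\varnothing\) for \(\alpha \leq 0\), equals \((0,\alpha)\) for \(0<\alpha \leq 1\), and equals \(X\) for \(\alpha>1\). All of these are open in \(X\), so \(u\) is upper semicontinuous. The maximum \(1\) is attained at \(x=0\) and \(x=1\), but \(\inf u(X)=0\) is not attained.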

Approximating integrable functions

This is a comprehensive application of several properties of semicontinuity.

(Vitali–Carathéodory theorem) Suppose \(f \in L^1(\mathbb{R})\), where \(f\) is a real-valued function. For \(\varepsilon>0\), there exist some functions \(u\) and \(v\) on \(\mathbb{R}\) such that \(u \leq f \leq v\), \(u\) is an upper semicontinuous function bounded above, and \(v\) is lower semicontinuous bounded below, and \[ \boxed{\int_{\mathbb{R}}(v-u)dm<\varepsilon} \]

It suffices to prove this theorem for \(f \geq 0\) (and \(f\) not identically equal to \(0\), since that case is trivial). Since \(f\) is the pointwise limit of an increasing sequence of nonnegative simple functions \(s_n\), we can write \(f\) as \[ f=s_1+\sum_{n=2}^{\infty}(s_n-s_{n-1}). \] By putting \(t_1=s_1\), \(t_n=s_n-s_{n-1}\) for \(n \geq 2\), we get \(f=\sum_n t_n\). Each \(t_n\) is a nonnegative simple function, that is, a finite linear combination of characteristic functions with positive coefficients, so we can write \(f\) as \[ f=\sum_{k=1}^{\infty}c_k\chi_{E_k} \] where \(c_k>0\) and \(E_k\) is measurable for all \(k\). Also, we have \[ \int_{\mathbb{R}} f dm = \sum_{k=1}^{\infty}c_km(E_k), \] and the series on the right-hand side converges (since \(f \in L^1\)).

By the regularity of Lebesgue measure, there exist a compact set \(F_k\) and an open set \(V_k\) such that \(F_k \subset E_k \subset V_k\) and \(c_km(V_k-F_k)<\frac{\varepsilon}{2^{k+1}}\). Put \[ v=\sum_{k=1}^{\infty}c_k\chi_{V_k},\quad u=\sum_{k=1}^{N}c_k\chi_{F_k}, \] where \(N\) is chosen in such a way that \[ \sum_{k=N+1}^{\infty}c_km(E_k)<\frac{\varepsilon}{2}. \] Then \(v\) is lower semicontinuous, being a sum of nonnegative lower semicontinuous functions, and \(u\) is upper semicontinuous, being a finite sum of upper semicontinuous functions. Since \(V_k \supset E_k\), we have \(\chi_{V_k} \geq \chi_{E_k}\). Therefore \(v \geq f\). Similarly, \(f \geq u\).

Now we need to check the desired integral inequality. A simple recombination shows that \[ \begin{aligned} v-u&=\sum_{k=1}^{\infty}c_k\chi_{V_k}-\sum_{k=1}^{N}c_k\chi_{F_k} \\ &\leq \sum_{k=1}^{\infty}c_k\chi_{V_k}-\sum_{k=1}^{N}c_k\chi_{F_k}+\sum_{k=N+1}^{\infty}c_k(\chi_{E_k}-\chi_{F_k}) \\ &=\sum_{k=1}^{\infty}c_k(\chi_{V_k}-\chi_{F_k})+\sum_{k=N+1}^{\infty}c_k\chi_{E_k}. \end{aligned} \] If we integrate the function above, we get \[ \begin{aligned} \int_{\mathbb{R}}(v-u)dm &\leq \sum_{k=1}^{\infty}c_km(V_k-F_k)+\sum_{k=N+1}^{\infty}c_km(E_k) \\ &< \sum_{k=1}^{\infty}\frac{\varepsilon}{2^{k+1}}+\frac{\varepsilon}{2} \\ &=\varepsilon. \end{aligned} \] This proves the case when \(f \geq 0\).

In the general case, we write \(f=f^{+}-f^{-}\). Attach the semicontinuous functions to \(f^{+}\) and \(f^{-}\) respectively by \(u_1 \leq f^{+} \leq v_1\) and \(u_2 \leq f^{-} \leq v_2\). Put \(u=u_1-v_2\), \(v=v_1-u_2\). Since \(-v_2\) is upper semicontinuous and \(-u_2\) is lower semicontinuous, \(u\) is upper semicontinuous and \(v\) is lower semicontinuous. Also, \(u \leq f \leq v\) with the desired property since \[ \int_\mathbb{R}(v-u)dm=\int_\mathbb{R}(v_1-u_1)dm+\int_\mathbb{R}(v_2-u_2)dm<2\varepsilon, \] and since \(\varepsilon\) is arbitrary, the theorem follows. \(\square\)
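
A concrete instance may help (this example is ours). Take \(f=\chi_{\mathbb{Q}\cap[0,1]}\), which is in \(L^1(\mathbb{R})\) since it vanishes almost everywhere. Enumerate the rationals in \([0,1]\) as \(q_1,q_2,\cdots\) and put \[ V=\bigcup_{k=1}^{\infty}\left(q_k-\frac{\varepsilon}{2^{k+2}},q_k+\frac{\varepsilon}{2^{k+2}}\right),\quad u=0,\quad v=\chi_V. \] Since \(V\) is open, \(v\) is lower semicontinuous, and clearly \(u \leq f \leq v\). Finally, \[ \int_{\mathbb{R}}(v-u)dm=m(V) \leq \sum_{k=1}^{\infty}\frac{\varepsilon}{2^{k+1}}=\frac{\varepsilon}{2}<\varepsilon. \]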


Indeed, the only property about measure used is the existence of \(F_k\) and \(V_k\). The domain \(\mathbb{R}\) here can be replaced with \(\mathbb{R}^k\) for \(1 \leq k < \infty\), and \(m\) be replaced with the respective \(m_k\). Much more generally, the domain can be replaced by any locally compact Hausdorff space \(X\) and the measure by any measure associated with the Riesz-Markov-Kakutani representation theorem on \(C_c(X)\).

Is the reverse approximation always possible?

The answer is no. Consider a fat Cantor set \(K\) with Lebesgue measure \(\frac{1}{2}\) (for example, the Smith–Volterra–Cantor set), which is closed and has empty interior. We shall show that \(\chi_K\) cannot be approximated from below by a lower semicontinuous function.

If \(v\) is a lower semicontinuous function such that \(v \leq \chi_K\), then \(v \leq 0\).

Proof. Consider the set \(V=v^{-1}((0,1])=v^{-1}((0,+\infty))\). Since \(v \leq \chi_K\), we have \(V \subset K\). We will show that \(V\) has to be empty.

Suppose there is some \(t \in V\). Since \(V\) is open, there exists some open interval \(U\) containing \(t\) such that \(U \subset V \subset K\). But \(K\) has an empty interior, so it contains no nonempty open interval, a contradiction. Therefore \(V = \varnothing\). That is, \(v(x) \leq 0\) for all \(x\). \(\square\)

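Now write \(f=\chi_K\), and suppose \(u\) is an upper semicontinuous function such that \(u \geq f\) while \(v\) is a lower semicontinuous function such that \(v \leq f\). Since \(v \leq 0\), we have \[ \int_{\mathbb{R}}(u-v)dm \geq \int_\mathbb{R}(f-v)dm \geq \int_\mathbb{R}f dm = \frac{1}{2}. \] So the approximation fails for every \(\varepsilon \leq \frac{1}{2}\). This example shows that some integrable functions cannot be approximated in the reverse sense of the Vitali–Carathéodory theorem, that is, by lower semicontinuous functions from below and upper semicontinuous functions from above.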

An Introduction to Quotient Space

I'm assuming the reader has some abstract algebra and functional analysis background. You may have learned this already in your linear algebra class, but we are making our way to functional analysis problems.


The trouble with \(L^p\) spaces

Fix \(p\) with \(1 \leq p \leq \infty\). It's easy to see that \(L^p(\mu)\) is a topological vector space. But it is not a metric space if we define \[ d(f,g)=\lVert f-g \rVert_p. \] The reason is, if \(d(f,g)=0\), we can only get \(f=g\) a.e., but they need not be equal everywhere. With that being said, this function \(d\) is actually a pseudometric. This is unnatural. However, the relation \(\sim\) defined by \(f \sim g \Leftrightarrow d(f,g)=0\) is an equivalence relation. This inspires us to take the quotient set into consideration.

Vector spaces are groups anyway

A vector space \(V\) is an abelian group under addition, therefore every subspace of \(V\) is automatically a normal subgroup. There is no reason to prevent ourselves from considering the quotient group and looking for some interesting properties.


Let \(N\) be a subspace of a vector space \(X\). For every \(x \in X\), let \(\pi(x)\) be the coset of \(N\) that contains \(x\), that is \[ \pi(x)=x+N. \] Trivially, \(\pi(x)=\pi(y)\) if and only if \(x-y \in N\). The cosets can be added and multiplied by scalars via \[ \pi(x)+\pi(y)=\pi(x+y) \quad \alpha\pi(x)=\pi(\alpha{x}), \] and these operations are well-defined since \(N\) is a subspace. With them, the cosets are the elements of a vector space \(X/N\), which reads, the quotient space of \(X\) modulo \(N\), and \(\pi\) is a linear function. The map \(\pi\) is called the canonical map as we all know.



First, we treat \(\mathbb{R}^2\) as a vector space and \(\mathbb{R}\), graphically represented by the \(x\)-axis, as a subspace (we will write it as \(X\)). For a vector \(v=(2,3)\), which is represented by \(AB\), the coset \(v+X\) has something special. Pick any \(u \in X\), for example, \(AE\), \(AC\), or \(AG\). We see that \(v+u\) always has the same \(y\) value. The reason is simple: writing \(u=(x,0)\), we have \(v+u=(2+x,3)\), where the \(y\) value remains fixed however \(u\) may vary.

With that being said, the set \(v+X\), which is not a vector space, can be represented by \(\overrightarrow{AD}\). This procedure can be generalized to \(\mathbb{R}^n\) with \(\mathbb{R}^m\) as a subspace with ease.
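
The picture can be mimicked in a few lines of code. The sketch below is only an illustration under our own conventions: vectors of \(\mathbb{R}^2\) are modelled as integer pairs, and the hypothetical helper `pi` labels the coset \(v+X\) by its common \(y\) value.

```python
def add(v, u):
    """Componentwise addition in R^2."""
    return (v[0] + u[0], v[1] + u[1])


def scale(a, v):
    """Scalar multiplication in R^2."""
    return (a * v[0], a * v[1])


def pi(v):
    """Label the coset v + X (X = the x-axis) by its common y value."""
    return v[1]


v = (2, 3)
# Adding any vector of the x-axis does not change the coset.
shifts = [(-5, 0), (1, 0), (40, 0)]
same_coset = all(pi(add(v, u)) == pi(v) for u in shifts)

# pi respects addition and scalar multiplication, i.e. it is linear.
w = (7, -1)
additive = pi(add(v, w)) == pi(v) + pi(w)
homogeneous = pi(scale(4, v)) == 4 * pi(v)
print(same_coset, additive, homogeneous)
```

Labelling a coset by a single number reflects the fact that \(\mathbb{R}^2/X\) is one-dimensional.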

We now consider a fancy example. Consider all rational Cauchy sequences, that is \[ (a_n)=(a_1,a_2,\cdots) \] where \(a_k\in\mathbb{Q}\) for all \(k\). In analysis class, we learned two facts.

  1. Any Cauchy sequence is bounded.
  2. If \((a_n)\) converges, then \((a_n)\) is Cauchy.

However, the converse of 2 does not hold in \(\mathbb{Q}\). For example, if we put \(a_k=(1+\frac{1}{k})^k\), the sequence is Cauchy and its limit should be \(e\), but \(e \notin \mathbb{Q}\).

If we define the addition and scalar multiplication term by term, namely \[ (a_n)+(b_n)=(a_1+b_1,a_2+b_2,\cdots) \] and \[ \alpha(a_n)=(\alpha a_1,\alpha a_2,\cdots) \] where \(\alpha \in \mathbb{Q}\), we get a vector space over \(\mathbb{Q}\) (the verification is easy). The zero vector is defined by \[ (0)=(0,0,\cdots). \] This vector space is denoted by \(\overline{\mathbb{Q}}\). The subspace consisting of all sequences converging to \(0\) will be denoted by \(\overline{\mathbb{O}}\). Again, \((a_n)+\overline{\mathbb{O}}=(b_n)+\overline{\mathbb{O}}\) if and only if \((a_n-b_n) \in \overline{\mathbb{O}}\). Using the language of equivalence relation, we also say \((a_n)\) and \((b_n)\) are equivalent if \((a_n-b_n) \in \overline{\mathbb{O}}\). For example, the two following sequences are equivalent: \[ (1,1,1,\cdots,1,\cdots)\quad\quad (0.9,0.99,0.999,\cdots). \] Actually, we will get \(\mathbb{R} \simeq \overline{\mathbb{Q}}/\overline{\mathbb{O}}\) in the end. But to make sure that this quotient space is exactly the one we meet in our analysis class, there are a lot of verifications that should be done.
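
To spell out the example: the \(n\)-th terms of the two sequences are \(1\) and \(1-10^{-n}\), so their difference is \[ 1-(1-10^{-n})=10^{-n}, \] and the difference sequence \((0.1,0.01,0.001,\cdots)\) converges to \(0\), hence lies in \(\overline{\mathbb{O}}\). In the quotient, this is the familiar fact that \(0.999\cdots=1\).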

We shall give more definitions for calculation. The multiplication of two Cauchy sequences is defined term by term à la the addition. For \(\overline{\mathbb{Q}}/\overline{\mathbb{O}}\) we have \[ ((a_n)+\overline{\mathbb{O}})+((b_n)+\overline{\mathbb{O}})=(a_n+b_n) + \overline{\mathbb{O}} \] and \[ ((a_n)+\overline{\mathbb{O}})((b_n)+\overline{\mathbb{O}})=(a_nb_n)+\overline{\mathbb{O}}. \] As for inequality, an order has to be defined. We say \((a_n) > (0)\) if there exist some rational \(r>0\) and some \(N>0\) such that \(a_n>r\) for all \(n \geq N\). By \((a_n) > (b_n)\) we mean \((a_n-b_n)>(0)\) of course. For cosets, we say \((a_n)+\overline{\mathbb{O}}>\overline{\mathbb{O}}\) if \((x_n) > (0)\) for some \((x_n) \in (a_n)+\overline{\mathbb{O}}\). This is well defined. That is, if \((x_n)>(0)\) for one such \((x_n)\), then \((y_n)>(0)\) for all \((y_n) \in (a_n)+\overline{\mathbb{O}}\), because \((y_n-x_n)\) converges to \(0\).

With these operations being defined, it can be verified that \(\overline{\mathbb{Q}}/\overline{\mathbb{O}}\) has the desired properties, for example, the least-upper-bound property. But this goes too far from the topic, so we are not proving it here. If you are interested, you may visit here for more details.

Finally, we are trying to make \(L^p\) a Banach space. Fix \(p\) with \(1 \leq p < \infty\). There is a seminorm defined for all Lebesgue measurable functions on \([0,1]\) by \[ p(f)=\lVert f \rVert_p=\left\{\int_{0}^{1}|f(t)|^pdt\right\}^{1/p} \] \(L^p\) is the vector space of all functions \(f\) with \(p(f)<\infty\). But it's not a normed space under \(p\), since \(p(f)=0\) only implies \(f=0\) almost everywhere. However, the set \(N\) of all functions that equal \(0\) almost everywhere is also a vector space. Now consider the quotient space \(L^p/N\) and define \[ \tilde{p}(\pi(f))=p(f), \] where \(\pi\) is the canonical map of \(L^p\) into \(L^p/N\). We shall prove that \(\tilde{p}\) is well-defined here. If \(\pi(f)=\pi(g)\), we have \(f-g \in N\), therefore \[ 0=p(f-g)\geq |p(f)-p(g)|, \] which forces \(p(f)=p(g)\). Therefore in this case we also have \(\tilde{p}(\pi(f))=\tilde{p}(\pi(g))\). Moreover, \(\tilde{p}(\pi(f))=0\) means \(f \in N\), that is, \(\pi(f)=0\). This ensures that \(\tilde{p}\) is a norm, and \(L^p/N\) is a Banach space under it. There are some topological facts required to prove this, and we are going to cover a few of them.
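
The inequality \(p(f-g) \geq |p(f)-p(g)|\) used above follows from the triangle inequality of the seminorm. Indeed, \[ p(f)=p((f-g)+g) \leq p(f-g)+p(g),\quad p(g)=p((g-f)+f) \leq p(f-g)+p(f), \] where we used \(p(g-f)=p(f-g)\). Rearranging both gives \(p(f)-p(g) \leq p(f-g)\) and \(p(g)-p(f) \leq p(f-g)\).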

Topology of quotient space


We know that if \(X\) is a topological vector space with a topology \(\tau\), then the addition and scalar multiplication are continuous. Suppose now \(N\) is a closed subspace of \(X\). Define \(\tau_N\) by \[ \tau_N=\{E \subset X/N:\pi^{-1}(E)\in \tau\}. \] We expect \(\tau_N\) to be a well-defined vector topology on \(X/N\). And fortunately, it is. Some interesting techniques will be used in the following section.

\(\tau_N\) is a vector topology

There will be two steps to get this done.

\(\tau_N\) is a topology.

It is trivial that \(\varnothing\) and \(X/N\) are elements of \(\tau_N\). Other properties are immediate as well since we have \[ \pi^{-1}(A \cap B) = \pi^{-1}(A) \cap \pi^{-1}(B) \] and \[ \pi^{-1}(\cup A_\alpha)=\cup\pi^{-1}( A_{\alpha}). \] That said, if we have \(A,B\in \tau_N\), then \(A \cap B \in \tau_N\) since \(\pi^{-1}(A \cap B)=\pi^{-1}(A) \cap \pi^{-1}(B) \in \tau\).

Similarly, if \(A_\alpha \in \tau_N\) for all \(\alpha\), we have \(\cup A_\alpha \in \tau_N\). Also, by definition of \(\tau_N\), \(\pi\) is continuous.

\(\tau_N\) is a vector topology.

First, we show that a point in \(X/N\), which can be written as \(\pi(x)\), is closed. Notice that \(N\) is assumed to be closed, and \[ \pi^{-1}(\pi(x))=x+N \] therefore has to be closed.

In fact, \(F \subset X/N\) is \(\tau_N\)-closed if and only if \(\pi^{-1}(F)\) is \(\tau\)-closed. To prove this, one needs to notice that \(\pi^{-1}(F^c)=(\pi^{-1}(F))^{c}\).

Suppose \(V\) is open, then \[ \pi^{-1}(\pi(V))=N+V \] is open. By definition of \(\tau_N\), we have \(\pi(V) \in \tau_N\). Therefore \(\pi\) is an open mapping.
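
The two claims in this step can be spelled out as follows. First, \[ \pi^{-1}(\pi(V))=\{x \in X: x-v \in N \text{ for some } v \in V\}=N+V. \] Second, \[ N+V=\bigcup_{n \in N}(n+V), \] and each translation \(x \mapsto n+x\) is a homeomorphism of \(X\) onto itself, so each \(n+V\) is open and so is the union.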

If now \(W\) is a neighbourhood of \(0\) in \(X/N\), there exists a neighbourhood \(V\) of \(0\) in \(X\) such that \[ V + V \subset \pi^{-1}(W). \] Hence \(\pi(V)+\pi(V) \subset W\). Since \(\pi\) is open, \(\pi(V)\) is a neighbourhood of \(0\) in \(X/N\), this shows that the addition is continuous.

The continuity of scalar multiplication will be shown in a direct way (so can the addition, but the proof above is intended to offer some special technique). We already know that the scalar multiplication on \(X\) given by \[ \begin{aligned} \varphi:\Phi \times X &\to X \\ (\alpha,x) &\mapsto \alpha{x} \end{aligned} \] is continuous, where \(\Phi\) is the scalar field (usually \(\mathbb{R}\) or \(\mathbb{C}\)). Now the scalar multiplication on \(X/N\) is given by \[ \begin{aligned} \psi: \Phi \times X/N &\to X/N \\ (\alpha,x+N) &\mapsto \alpha{x}+N. \end{aligned} \] We see \(\psi(\alpha,x+N)=\pi(\varphi(\alpha,x))\), so \(\psi \circ (\mathrm{id} \times \pi)=\pi \circ \varphi\) is continuous as the composition of two continuous functions. Since \(\mathrm{id} \times \pi\) is an open surjection, for any open \(W \subset X/N\) the set \(\psi^{-1}(W)=(\mathrm{id} \times \pi)\left((\pi \circ \varphi)^{-1}(W)\right)\) is open. Therefore \(\psi\) is continuous.

A commutative diagram by quotient space

We are going to talk about a classic commutative diagram that you have already seen in algebra class.


There are some assumptions.

  1. \(X\) and \(Y\) are topological vector spaces.
  2. \(\Lambda\) is linear.
  3. \(\pi\) is the canonical map.
  4. \(N\) is a closed subspace of \(X\) and \(N \subset \ker\Lambda\).

Algebraically, there exists a unique map \(f: X/N \to Y\) by \(x+N \mapsto \Lambda(x)\). Namely, the diagram above is commutative. But now we are interested in some analysis facts.
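
For completeness, here is why \(f\) is well-defined and unique, using only assumption 4. If \(x+N=y+N\), then \(x-y \in N \subset \ker\Lambda\), hence \[ \Lambda(x)-\Lambda(y)=\Lambda(x-y)=0. \] If \(g: X/N \to Y\) is another map with \(g \circ \pi=\Lambda\), then \(g(\pi(x))=\Lambda(x)=f(\pi(x))\) for all \(x \in X\), and \(g=f\) since \(\pi\) is surjective.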

\(f\) is linear.

This is obvious. Since \(\pi\) is surjective, for \(u,v \in X/N\), we are able to find some \(x,y \in X\) such that \(\pi(x)=u\) and \(\pi(y)=v\). Therefore we have \[ \begin{aligned} f(u+v)=f(\pi(x)+\pi(y))&=f(\pi(x+y)) \\ &=\Lambda(x+y) \\ &=\Lambda(x)+\Lambda(y) \\ &= f(\pi(x))+f(\pi(y)) \\ &=f(u)+f(v) \end{aligned} \] and \[ \begin{aligned} f(\alpha{u})=f(\alpha\pi(x))&=f(\pi(\alpha{x})) \\ &= \Lambda(\alpha{x}) \\ &= \alpha\Lambda(x) \\ &= \alpha{f(\pi(x))} \\ &= \alpha{f(u)}. \end{aligned} \]

\(\Lambda\) is open if and only if \(f\) is open.

If \(f\) is open, then for any open set \(U \subset X\), we have \[ \Lambda(U)=f(\pi(U)) \] to be an open set since \(\pi\) is open, and \(\pi(U)\) is an open set.

Conversely, suppose \(f\) is not open. Then there exists some open set \(V \subset X/N\) such that \(f(V)\) is not open. Since \(\pi\) is continuous, \(\pi^{-1}(V)\) is open, and since \(\pi\) is surjective, \(\pi(\pi^{-1}(V))=V\). In this case, \[ \Lambda(\pi^{-1}(V))=f(\pi(\pi^{-1}(V)))=f(V) \] is not open. \(\Lambda\) is therefore not open. This shows that if \(\Lambda\) is open, then \(f\) is open.

\(\Lambda\) is continuous if and only if \(f\) is continuous.

If \(f\) is continuous, for any open set \(W \subset Y\), we have \(\pi^{-1}(f^{-1}(W))=\Lambda^{-1}(W)\) to be open. Therefore \(\Lambda\) is continuous.

Conversely, if \(\Lambda\) is continuous, for any open set \(W \subset Y\), we have \(\Lambda^{-1}(W)\) to be open. Therefore \(f^{-1}(W)=\pi(\Lambda^{-1}(W))\) has to be open since \(\pi\) is open.

The Big Three Pt. 3 - The Open Mapping Theorem (Banach Space)

What is open mapping

An open map is a function between two topological spaces that maps open sets to open sets. Precisely speaking, a function \(f: X \to Y\) is open if for any open set \(U \subset X\), \(f(U)\) is open in \(Y\). Likewise, a closed map is a function mapping closed sets to closed sets.

You may think open/closed map is an alternative name of continuous function. But it's not. The definition of open/closed mapping is totally different from continuity. Here are some simple examples.

  1. \(f(x)=\sin{x}\) defined on \(\mathbb{R}\) is not open, though it's continuous. It can be verified by considering \((0,2\pi)\), since we have \(f((0,2\pi))=[-1,1]\).
  2. The projection \(\pi: \mathbb{R}^2 \to \mathbb{R}\) defined by \((x,y) \mapsto x\) is open. Indeed, it maps an open ball onto an open interval on \(x\) axis.
  3. The inclusion map \(\varphi: \mathbb{R} \to \mathbb{R}^2\) by \(x \mapsto (x,0)\) however, is not open. The image of an open interval is a segment on the \(x\)-axis, which is locally closed in the plane but neither open nor closed.
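
One more example in the same spirit (ours, not needed later): the function \(g(x)=x^2\) on \(\mathbb{R}\) is continuous but not open, since \[ g((-1,1))=[0,1), \] which is not an open subset of \(\mathbb{R}\).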

Under what condition will a continuous linear function between two TVS be an open mapping? We'll give the answer in this blog post. The open mapping theorem gives a sufficient condition for a continuous linear function to be open.

Open Mapping Theorem

Let \(X,Y\) be Banach spaces and \(T: X \to Y\) a surjective bounded linear map. Then \(T\) is an open mapping.

The open balls in \(X\) and \(Y\) are defined respectively by \[ B_r^X=\{x \in X:\lVert x \rVert<r\}\quad\text{and}\quad B_r^Y=\{y \in Y:\lVert y \rVert<r\}. \] All we need to do is show that there exists some \(r>0\) such that \[ B_r^Y \subset T(B_1^X), \] since every open set in \(X\) or \(Y\) can be expressed as a union of open balls, and a ball in \(X\) centered at \(x \in X\) with radius \(r\) can be expressed as \(x+B_r^X\). After that, linearity makes it clear that \(T\) maps open sets to open sets.

First we have \[ X=\bigcup_{n=1}^{\infty}B_n^{X}. \] The surjectivity of \(T\) ensures that \[ Y=\bigcup_{n=1}^{\infty}T(B_n^X). \] Since \(Y\) is Banach, or simply a complete metric space, by Baire category theorem, there must be some \(n_0 \in \mathbb{N}\) such that \(\overline{T(B_{n_0}^{X})}\) has nonempty interior. If not, which means \(T(B_n^{X})\) is nowhere dense for all \(n \in \mathbb{N}\), we have \(Y\) is of the first category. A contradiction.

Since \(T\) is linear and \(y \mapsto ny\) is a homeomorphism of \(Y\) onto \(Y\), we have \(\overline{T(B_n^X)}=n\overline{T(B_1^X)}\), so in fact \(\overline{T(B_n^X)}\) has nonempty interior for every \(n \in \mathbb{N}\). In particular, there exist some \(y_0 \in \overline{T(B_1^{X})}\) and some \(\varepsilon>0\) such that \[ y_0+B_\varepsilon^Y \subset \overline{T(B_1^X)}. \] The open set on the left-hand side is a neighborhood of \(y_0\), so \(y_0\) lies in the interior of \(\overline{T(B_1^X)}\).

On the other hand, we claim \[ \overline{T(B_1^X)} - y_0 \subset \overline{T(B_2^X)}. \] We shall prove it as follows. Pick any \(y \in \overline{T(B_1^X)}\); we shall show that \(y-y_0 \in \overline{T(B_2^X)}\). For \(y_0\), there exists a sequence \((y_n)\) in \(X\) with \(\lVert y_n \rVert <1\) for all \(n\) such that \(Ty_n \to y_0\). Also, we are able to find a sequence \((x_n)\) in \(X\) with \(\lVert x_n \rVert <1\) for all \(n\) such that \(Tx_n \to y\). Notice that we also have \[ y-y_0=\lim_{n \to \infty}T(x_n-y_n). \] Since \[ \lVert x_n -y_n \rVert \leq \lVert x_n \rVert+\lVert y_n \rVert <2, \] we see \(T(x_n-y_n) \in T(B_2^X)\) for all \(n\), and it follows that \[ y-y_0 \in \overline{T(B_2^X)}. \] Combining this with \(y_0+B_\varepsilon^Y \subset \overline{T(B_1^X)}\), we get \[ B_\varepsilon^Y \subset \overline{T(B_1^X)}-y_0 \subset \overline{T(B_2^X)}. \] Since \(T\) is linear, we see \[ 2B_{\varepsilon/2}^{Y} \subset \overline{T(2B_1^X)}=2\overline{T(B_1^X)}. \] Rescaling in the same way, we get \[ B_{\varepsilon/2^n}^Y \subset \overline{T(B_{1/2^{n-1}}^X)} \] for all \(n \geq 1\).

We shall show, however, that \[ B_{\varepsilon/4}^Y \subset T(B_1^X). \] For any \(u \in B_{\varepsilon/4}^Y\), we have \(u \in \overline{T(B_{1/2}^X)}\). There exists some \(x_1 \in B_{1/2}^{X}\) such that \[ \lVert u-Tx_1 \rVert < \frac{\varepsilon}{8}. \] This implies that \(u-Tx_1 \in B_{\varepsilon/8}^Y \subset \overline{T(B_{1/4}^X)}\). In the same fashion, we are able to pick \(x_n\) in such a way that \[ \lVert u-Tx_1-Tx_2-\cdots-Tx_n \rVert < \frac{\varepsilon}{2^{n+2}} \] where \(\lVert x_n \rVert<2^{-n}\). Now let \(z_n=\sum_{k=1}^{n}x_k\); we shall show that \((z_n)\) is Cauchy. For \(m<n\), we have \[ \lVert z_n - z_m \rVert =\left\Vert\sum_{k=m+1}^nx_k \right\Vert \leq \sum_{k=m+1}^{n}\lVert x_k\rVert < \sum_{k=m+1}^{n}\frac{1}{2^{k}} < \frac{1}{2^{m}}. \] Since \(X\) is Banach, there exists some \(z \in X\) such that \(z_n \to z\). Further we have \[ \lVert z\rVert = \lim_{n \to \infty}\lVert z_n \rVert \leq \sum_{k=1}^{\infty}\lVert x_k \rVert < 1, \] therefore \(z \in B_1^X\). Since \(T\) is bounded, therefore continuous, we get \(T(z)=\lim_{n \to \infty}T(z_n)=u\). To summarize, for \(u \in B_{\varepsilon/4}^Y\), we have some \(z \in B_{1}^X\) such that \(T(z)=u\), which implies \(T(B_1^X) \supset B_{\varepsilon/4}^Y\).
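
The last step, \(T(z)=u\), deserves to be spelled out. By linearity, \(Tz_n=Tx_1+\cdots+Tx_n\), so the choice of the \(x_n\) gives \[ \lVert u-Tz_n \rVert<\frac{\varepsilon}{2^{n+2}} \to 0, \] that is, \(Tz_n \to u\). On the other hand, \(z_n \to z\) and \(T\) is continuous, so \(Tz_n \to Tz\). By the uniqueness of limits in \(Y\), we conclude \(Tz=u\).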

Let \(U \subset X\) be open; we want to show that \(T(U)\) is also open. Take \(y \in T(U)\), then \(y=T(x)\) with \(x \in U\). Since \(U\) is open, there exists some \(\varepsilon>0\) such that \(B_{\varepsilon}^{X}+x \subset U\). By the previous step there is some \(r>0\) with \(B_r^Y \subset T(B_1^X)\), so by the linearity of \(T\) we obtain \(B_{r\varepsilon}^Y \subset T(B_{\varepsilon}^X)\). Using the linearity of \(T\) again, we obtain \[ B_{r\varepsilon}^Y + y \subset T(B_{\varepsilon}^X+x) \subset T(U) \] which shows that \(T(U)\) is open, therefore \(T\) is an open mapping.


One has to notice that the completeness of \(X\) and \(Y\) has been used more than once. For example, the existence of \(z\) depends on the fact that every Cauchy sequence converges in \(X\). Also, the surjectivity of \(T\) cannot be omitted, can you see why?

There are some different ways to state this theorem.

  • There exists some \(\delta>0\) such that to every \(y\) with \(\lVert y \rVert < \delta\), there corresponds an \(x\) with \(\lVert x \rVert<1\) such that \(T(x)=y\).
  • Let \(U\) and \(V\) be the open unit balls of the Banach spaces \(X\) and \(Y\). To every surjective bounded linear map, there corresponds a \(\delta>0\) such that

\[ T(U) \supset \delta{V}. \]

You may also realize that we have used a lot of basic definitions of topology. For example, we checked the openness of \(T(U)\) by using neighborhood. The set \(\overline{T(B_1^X)}\) should also remind you of limit point.

The difference between open mappings and continuous mappings can be viewed via the topologies of the two spaces. Suppose \(f: X \to Y\), where \(\tau_X\) and \(\tau_Y\) are the topologies of \(X\) and \(Y\), respectively. By openness we mean that for any \(U \in \tau_X\), we have \(f(U) \in \tau_Y\). This has nothing to do with continuity, by which we mean that for any \(V \in \tau_Y\), we have \(f^{-1}(V) \in \tau_X\).

Fortunately, this theorem can be generalized to \(F\)-spaces, which will be demonstrated in the following blog post of the series. A space \(X\) is an \(F\)-space if its topology \(\tau\) is induced by a complete invariant metric \(d\). Still, completeness plays a critical role.

The series

Since there is no strong reason to write more posts on this topic, i.e. the three fundamental theorems of linear functional analysis, I think it's time to make a list of the series. It's been around half a year.

The Big Three Pt. 2 - The Banach-Steinhaus Theorem

About this blog post

People call the Banach-Steinhaus theorem the first of the big three, which sit at the foundation of linear functional analysis. None of them can go without the Baire category theorem.

This blog post offers the Banach-Steinhaus theorem on different abstract levels. Recall that we have \[ \text{TVS} \supset \text{Metrizable TVS} \supset \text{F-space} \supset \text{Fréchet space}\supset\text{Banach space} \supset \text{Hilbert space} \] First, there will be a simple version for Banach spaces, which may be more frequently used, and you will realize why it's referred to as the uniform boundedness principle. After that, there will be a much more generalized version for TVS. Typically, the metrization of the space will not be considered.

Also, it will be a good chance to get a better view of spaces of the first and second category in the sense of Baire.


For metric spaces, equicontinuity is defined as follows. Let \((X,d_X)\) and \((Y,d_Y)\) be two metric spaces.

Let \(\Lambda\) be a collection of functions from \(X\) to \(Y\). We have three different levels of equicontinuity.

  1. Equicontinuous at a point. For \(x_0 \in X\), if for every \(\varepsilon>0\), there exists a \(\delta>0\) such that \(d_Y(Lx_0,Lx)<\varepsilon\) for all \(L \in \Lambda\) and all \(x\) with \(d_X(x_0,x)<\delta\) (that is, the same \(\delta\) works for every \(L\) on the ball centered at \(x_0\) with radius \(\delta\)).
  2. Pointwise equicontinuous. \(\Lambda\) is equicontinuous at each point of \(X\).
  3. Uniformly equicontinuous. For every \(\varepsilon>0\), there exists a \(\delta>0\) such that \(d_Y(Lx,Ly)<\varepsilon\) for all \(L \in \Lambda\) and \(x,y \in X\) such that \(d_X(x,y) < \delta\).

Indeed, if \(\Lambda\) contains only one element, namely \(L\), then these notions reduce to the continuity and uniform continuity of \(L\).

But for the Banach-Steinhaus theorem, we need a few more restrictions. In fact, \(X\) and \(Y\) should be Banach spaces, and \(\Lambda\) should contain linear functions only. In this setting, for \(L \in \Lambda\), the following three conditions are equivalent.

  1. \(L\) is bounded.
  2. \(L\) is continuous.
  3. \(L\) is continuous at one point of \(X\).

For topological vector spaces, where only topology and linear structure are taken into consideration, things get different. Since no metrization is considered, we have to state it in the language of topology.

Suppose \(X\) and \(Y\) are TVS and \(\Lambda\) is a collection of linear functions from \(X\) to \(Y\). \(\Lambda\) is equicontinuous if for every neighborhood \(N\) of \(0\) in \(Y\), there corresponds a neighborhood \(V\) of \(0\) in \(X\) such that \(L(V) \subset N\) for all \(L \in \Lambda\).

Indeed, for TVS, the three conditions above are equivalent for \(L \in \Lambda\) as well. With that being said, an equicontinuous collection has the boundedness property in a uniform manner. That's why the Banach-Steinhaus theorem is often referred to as the uniform boundedness principle.

The Banach-Steinhaus theorem, a sufficient condition for being equicontinuous

Banach space version

Suppose \(X\) is a Banach space, \(Y\) is a normed linear space, and \(\Lambda\) is a collection of bounded linear transformations of \(X\) into \(Y\). Then we have the following equivalent statements:

  1. (The Resonance Theorem) If \(\sup\limits_{L \in \Lambda}\left\Vert{L}\right\Vert=\infty\), then there exists some \(x \in X\) such that \(\sup\limits_{L \in \Lambda}\left\Vert{Lx}\right\Vert=\infty\). (In fact, these \(x\) form a dense \(G_\delta\).)
  2. (The Uniform Boundedness Principle) If \(\sup\limits_{L \in {\Lambda}}\left\Vert{Lx}\right\Vert<\infty\) for all \(x \in X\), then we have \(\lVert L \rVert \leq M\) for all \(L \in {\Lambda}\) and some \(M<\infty\).
  3. (A summary of 1 and 2) Either there exists an \(M<\infty\) such that \(\lVert L \rVert \leq M\) for all \(L \in \Lambda\), or \(\sup\limits_{L \in \Lambda}\lVert Lx \rVert = \infty\) for all \(x\) belonging to some dense \(G_\delta\) in \(X\).


Though it would be easier if we finish the TVS version proof, it's still a good idea to leave the formal proof without the help of TVS here. The equicontinuity of \(\Lambda\) will be shown in the next section.

An elementary proof of the Resonance theorem

First, we offer an elementary proof in which the hardest part is the Cauchy sequence.

(Lemma) For any \(x \in X\) and \(r >0\), we have \[ \sup_{y\in B(x,r)}\lVert Ly \rVert \geq \lVert L \rVert r \] where \(B(x,r)=\{y \in X:\lVert x-y \rVert < r\}\).

(Proof of the lemma)

For \(t \in X\) we have a simple relation \[ \begin{aligned} \max(\lVert{L(x+t)}\rVert,\lVert{L(x-t)}\rVert)&=\frac{1}{2}(\lVert{L(x+t)}\rVert+\lVert{L(x-t)}\rVert)+\frac{1}{2}\left\vert\lVert{L(x+t)}\rVert-\lVert{L(x-t)}\rVert\right\vert \\ &\geq \frac{1}{2}(\lVert{L(x+t)}\rVert+\lVert{L(x-t)}\rVert) \\ &\geq \frac{1}{2}\lVert{L(2t)}\rVert=\lVert Lt \rVert \end{aligned} \] If we have \(t \in B(0,r)\), then \(x+t,x-t\in{B(x,r)}\). And the desired inequality follows by taking the supremum over \(t \in B(0,r)\). (If you find trouble understanding this, take a look at the definition of \(\lVert L \rVert\).)
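The lemma can be sanity-checked numerically. The following is only a sketch under assumptions that are not in the text: \(X=Y=\mathbb{R}^3\) with the sup norm (so that \(\lVert L \rVert\) is the largest absolute row sum of the matrix), random matrices, and a hypothetical helper `lemma_gap`. It picks \(t\) of norm slightly below \(r\) that almost attains \(\lVert L \rVert\) and checks \(\max(\lVert L(x+t)\rVert,\lVert L(x-t)\rVert) \geq \lVert Lt \rVert\).

```python
import random

def sup_norm(v):
    """Sup norm on R^n."""
    return max(abs(c) for c in v)

def apply(L, v):
    """Matrix-vector product; L is stored as a list of rows."""
    return [sum(a * c for a, c in zip(row, v)) for row in L]

def op_norm(L):
    """Operator norm induced by the sup norm: largest absolute row sum."""
    return max(sum(abs(a) for a in row) for row in L)

def lemma_gap(L, x, r, shrink=0.999):
    """max(|L(x+t)|, |L(x-t)|) - shrink * r * |L| for a well-chosen t.

    t is a sign vector scaled to norm shrink * r < r, so x + t and x - t
    both lie in the open ball B(x, r). The lemma predicts a gap >= 0.
    """
    row = max(L, key=lambda rw: sum(abs(a) for a in rw))
    t = [shrink * r * (1.0 if a >= 0 else -1.0) for a in row]
    plus = sup_norm(apply(L, [xi + ti for xi, ti in zip(x, t)]))
    minus = sup_norm(apply(L, [xi - ti for xi, ti in zip(x, t)]))
    return max(plus, minus) - shrink * r * op_norm(L)

random.seed(0)
gaps = []
for _ in range(200):
    L = [[random.uniform(-5, 5) for _ in range(3)] for _ in range(3)]
    x = [random.uniform(-10, 10) for _ in range(3)]
    r = random.uniform(0.1, 3.0)
    gaps.append(lemma_gap(L, x, r))
worst_gap = min(gaps)
print(worst_gap >= -1e-9)
```

Letting `shrink` tend to \(1\) recovers the supremum statement of the lemma; the small tolerance only absorbs floating-point rounding.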

Suppose now \(\sup\limits_{L \in \Lambda}\left\Vert{L}\right\Vert=\infty\). Pick a sequence of linear transformations in \(\Lambda\), say \((L_n)_{n=1}^{\infty}\), such that \(\lVert L_n \rVert \geq 4^n\). Pick \(x_0 \in X\), and for \(n \geq 1\), we pick \(x_n\) inductively.

Set \(r_n=3^{-n}\). With \(x_{n-1}\) being picked, \(x_n \in B(x_{n-1},r_n)\) is picked in such a way that \[ \lVert L_n x_n \rVert \geq \frac{2}{3}\lVert L_n \rVert r_n \] (It's easy to validate this inequality by reaching a contradiction: if it failed for every point of \(B(x_{n-1},r_n)\), the lemma would be violated.) Also, it's easy to check that \((x_n)_{n=1}^{\infty}\) is Cauchy. Since \(X\) is complete, \((x_n)\) converges to some \(x \in X\). Further we have \[ \begin{aligned} \lVert x-x_n \rVert &\leq \sum_{k=n}^{\infty}\lVert x_k - x_{k+1}\rVert \\ &\leq \sum_{k=n}^{\infty}3^{-(k+1)} \\ &=\frac{1}{2\cdot 3^n} \end{aligned} \] Therefore we have \[ \begin{aligned} \lVert L_n x \rVert &=\lVert L_n[x_n-(x_n-x)] \rVert \\ &\geq \lVert L_nx_n \rVert - \lVert L_n(x_n-x) \rVert \\ &\geq \frac{2}{3}\lVert{L_n}\rVert{3}^{-n}-\lVert{L_n}\rVert\lVert{x_n-x}\rVert\\ &\geq \frac{1}{6}\lVert{L_n}\rVert{3}^{-n} \\ & \geq \frac{1}{6}\left(\frac{4}{3}\right)^n \to\infty \end{aligned} \]

A topology-based proof

The previous proof is easy to understand but it's not easy to see the topological properties of the set formed by such \(x\). Thus we are offering a topology-based proof which enables us to get a topology view.

Put \[ \varphi(x)=\sup_{L \in \Lambda}\lVert Lx \rVert \] and let \[ V_n=\{x:\varphi(x)>n\}. \] We claim that each \(V_n\) is open. Indeed, it is enough to show that each \(x \mapsto \lVert Lx \rVert\) is continuous, and for that it suffices to show that \(\lVert\cdot\rVert\) defined on \(Y\) is continuous. This follows immediately from the triangle inequality since for \(x,y \in Y\) we have \[ \lVert x \rVert \leq \lVert x-y \rVert + \lVert y \rVert \] which implies \[ \lVert x \rVert - \lVert y \rVert \leq \lVert x-y \rVert. \] By interchanging \(x\) and \(y\), we get \[ |\lVert x \rVert - \lVert y \rVert | \leq \lVert x-y \rVert. \] Thus \(x \mapsto \lVert Lx \rVert\) is continuous since it's a composition of \(\lVert\cdot\rVert\) and \(L\). Hence \(\varphi\), being a supremum of continuous functions, is lower semicontinuous, which forces \(V_n\) to be open.

If every \(V_n\) is dense in \(X\) (consider \(\sup\lVert L \rVert=\infty\)), then by BCT, \(B=\bigcap_{n=1}^{\infty} V_n\) is dense in \(X\). Since each \(V_n\) is open, \(B\) is a dense \(G_\delta\). Again by the definition of \(B\), we have \(\varphi(x)=\infty\) for all \(x \in B\).

If one of these sets, namely \(V_N\), fails to be dense in \(X\), then there exist an \(x_0 \in X - V_N\) and an \(r>0\) such that for \(x \in B(0,r)\) we have \(x_0+x \notin V_N\), which is equivalent to \[ \varphi(x+x_0) \leq N. \] Considering the definition of \(\varphi\), we also have \[ \lVert L(x+x_0) \rVert \leq N \] for all \(L \in \Lambda\). Since \(x=(x+x_0)-x_0\), we also have \[ \lVert Lx \rVert \leq \lVert L(x+x_0) \rVert+\lVert Lx_0 \rVert \leq 2N. \] Dividing both sides by \(r\), we get \[ \lVert L\frac{x}{r}\rVert \leq \frac{2N}{r}. \] Since \(x/r\) ranges over the open unit ball as \(x\) ranges over \(B(0,r)\), we get \(\lVert L \rVert \leq M=\frac{2N}{r}\), as was to be shown. Again, this follows from the definition of \(\lVert L \rVert\).

Topological vector space version

Suppose \(X\) and \(Y\) are topological vector spaces, \(\Lambda\) is a collection of continuous linear mapping from \(X\) into \(Y\), and \(B\) is the set of all \(x \in X\) whose orbits \[ \Lambda(x)=\{Lx:L\in\Lambda\} \] are bounded in \(Y\). For this \(B\), we have:

  • If \(B\) is of the second category, then \(\Lambda\) is equicontinuous.

A proof using properties of TVS

Pick balanced neighborhoods \(W\) and \(U\) of the origin in \(Y\) such that \(\overline{U} + \overline{U} \subset W\). The balanced neighborhood exists since every neighborhood of \(0\) contains a balanced one.

Put \[ E=\bigcap_{L \in \Lambda}L^{-1}(\overline{U}). \] If \(x \in B\), then \(\Lambda(x)\) is bounded, which means that for \(U\) there exists some \(n\) such that \(\Lambda(x) \subset nU\) (be aware that no metric is introduced; this is the definition of boundedness in a topological vector space). Therefore we have \(x \in nE\). Consequently, \[ B\subset \bigcup_{n=1}^{\infty}nE. \] If no \(nE\) is of the second category, then \(B\) is of the first category. Therefore, there exists at least one \(n\) such that \(nE\) is of the second category. Since \(x \mapsto nx\) is a homeomorphism of \(X\) onto \(X\), \(E\) is of the second category as well. But \(E\) is closed since each \(L\) is continuous. Therefore \(E\) has an interior point \(x\). In this case, \(x-E\) contains a neighborhood \(V\) of \(0\) in \(X\), and \[ L(V) \subset Lx-L(E) \subset \overline{U} - \overline{U} \subset W. \] This proves that \(\Lambda\) is equicontinuous.

Equicontinuity and uniform boundedness

We'll show that \(B=X\). But before that, we need another lemma, which states the connection between equicontinuity and uniform boundedness

(Lemma) Suppose \(X\) and \(Y\) are TVS, \(\Gamma\) is an equicontinuous collection of linear mappings from \(X\) to \(Y\), and \(E\) is a bounded subset of \(X\). Then \(Y\) has a bounded subset \(F\) such that \(T(E) \subset F\) for every \(T \in \Gamma\).

(Proof of the lemma) We'll show that the set \[ F=\bigcup_{T \in \Gamma}T(E) \] is bounded. Let \(W\) be an arbitrary neighborhood of the origin in \(Y\). By the definition of equicontinuity, there is a neighborhood \(V\) of the origin in \(X\) such that \(T(V) \subset W\) for all \(T \in \Gamma\). Since \(E\) is bounded, we have \(E \subset tV\) for all sufficiently large \(t\). For these \(t\), by linearity, we have \[ T(E) \subset T(tV)=tT(V) \subset tW. \] Therefore \(F \subset tW\), and \(F\) is bounded.

Thus \(\Lambda\) is uniformly bounded. Picking \(E=\{x\}\) in the lemma, we also see \(\Lambda(x)\) is bounded in \(Y\) for every \(x\). Thus \(B=X\).

A special case when \(X\) is an \(F\)-space or Banach space

\(X\) is an \(F\)-space if its topology \(\tau\) is induced by a complete invariant metric \(d\). By BCT, \(X\) is of the second category. If we already have \(B=X\), in which case \(B\) is of the second category, then by the Banach-Steinhaus theorem, \(\Lambda\) is equicontinuous. Formally speaking, we have:

If \(\Lambda\) is a collection of continuous linear mappings from an \(F\)-space \(X\) into a topological vector space \(Y\), and if the sets \[ \Lambda(x)=\{Lx:L\in\Lambda\} \] are bounded in \(Y\) for every \(x \in X\), then \(\Lambda\) is equicontinuous.

Notice that all Banach spaces are \(F\)-spaces. Therefore we can restate the Uniform Boundedness Principle in Banach space with equicontinuity.

Suppose \(X\) is a Banach space, \(Y\) is a normed linear space, and \(\Lambda\) is a collection of bounded linear transformations of \(X\) into \(Y\). Then we have:

  • (The Uniform Boundedness Principle) If \(\sup\limits_{L \in {\Lambda}}\left\Vert{Lx}\right\Vert<\infty\) for all \(x \in X\), then we have \(\|L\| \le M\) for all \(L \in {\Lambda}\) and some \(M<\infty\). Further, \(\Lambda\) is equicontinuous.


Surprisingly enough, the Banach-Steinhaus theorem can be used to do Fourier analysis. An important example follows.

There is a periodic continuous function \(f\) on \([0,1]\) such that the Fourier series \[ \sum_{n\in\mathbb{Z}}\hat{f}(n)e^{2\pi inx} \] of \(f\) diverges at \(0\). \(\hat{f}(n)\) is defined by \[ \hat{f}(n)=\int_{0}^{1}e^{-2\pi inx}f(x)dx \]

Notice that \(f \mapsto \hat{f}\) is linear, and the divergence of the series at \(0\) can be considered by \[ \sum_{n\in\mathbb{Z}}\hat{f}(n)e^{2\pi in\cdot0}=\sum_{n\in\mathbb{Z}}\hat{f}(n) \] To invoke the Banach-Steinhaus theorem, the family of linear functionals is defined by \[ \lambda_N(f)=\sum_{|n| \leq N}\hat{f}(n) \] It can be proved that \[ \lVert \lambda_N \rVert=\int_0^1\left\vert\sum_{|n| \leq N}e^{-2\pi inx}\right\vert dx \] which goes to infinity as \(N \to \infty\). The existence of such \(f\) that \[ \sup_{N}|\lambda_N(f)|=+\infty \] follows from the resonance theorem. Further, we also know that these \(f\) form a dense \(G_\delta\) subset of the vector space generated by all periodic continuous functions on \([0,1]\).
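The growth of \(\lVert \lambda_N \rVert\) can be observed numerically. The sketch below is an illustration, not a proof: it assumes the closed form \(\sum_{|n| \leq N}e^{-2\pi inx}=\frac{\sin((2N+1)\pi x)}{\sin(\pi x)}\) for the Dirichlet kernel and approximates the integral with a midpoint rule; the function name `lebesgue_constant` and the sample count are choices made here.

```python
import math

def lebesgue_constant(N, samples=200_000):
    """Midpoint-rule approximation of the integral over [0, 1] of |D_N(x)|,
    where D_N(x) = sin((2N+1)*pi*x) / sin(pi*x) is the Dirichlet kernel."""
    total = 0.0
    for k in range(samples):
        x = (k + 0.5) / samples  # midpoints never hit x = 0 or x = 1
        total += abs(math.sin((2 * N + 1) * math.pi * x) / math.sin(math.pi * x))
    return total / samples

values = {N: lebesgue_constant(N) for N in (1, 10, 100)}
for N, v in values.items():
    print(N, round(v, 4))
```

For \(N=1\) the integral can be computed by hand as \(\frac{1}{3}+\frac{2\sqrt{3}}{\pi}\), which gives a quick check on the quadrature; the printed values increase with \(N\), in line with the claim that \(\lVert \lambda_N \rVert \to \infty\).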


References / Further readings

  1. arXiv:1005.1585v2
  2. W. Rudin, Real and Complex Analysis
  3. W. Rudin, Functional Analysis
  4. Applications to Fourier series

The Big Three Pt. 1 - Baire Category Theorem Explained

About the 'Big Three'

There are three theorems about Banach spaces that occur frequently in the crux of functional analysis, which are called the 'big three':

  1. The Hahn-Banach Theorem
  2. The Banach-Steinhaus Theorem
  3. The Open Mapping Theorem

The incoming series of blog posts is intended to offer a self-study-friendly explanation with richer details. Some basic background in analysis and topology is required.

First and second category

The term 'category' is due to Baire, who developed the category theorem afterwards. Let \(X\) be a topological space. A set \(E \subset X\) is said to be nowhere dense if \(\overline{E}\) has empty interior, i.e. \(\text{int}(\overline{E})= \varnothing\).

There are some easy examples of nowhere dense sets. For example, suppose \(X=\mathbb{R}\), equipped with the usual topology. Then \(\mathbb{N}\) is nowhere dense in \(\mathbb{R}\) while \(\mathbb{Q}\) is not. It's trivial since \(\overline{\mathbb{N}}=\mathbb{N}\), which has empty interior. Meanwhile \(\overline{\mathbb{Q}}=\mathbb{R}\). But \(\mathbb{R}\) is open, whose interior is itself. The category is defined using nowhere dense set. In fact,

  • A set \(S\) is of the first category if \(S\) is a countable union of nowhere dense sets.
  • A set \(T\) is of the second category if \(T\) is not of the first category.

Baire category theorem (BCT)

In this blog post, we consider two cases: BCT in complete metric space and in locally compact Hausdorff space. These two cases have nontrivial intersection but they are not equal. There are some complete metric spaces that are not locally compact Hausdorff.

Some classic topological spaces, for example \(\mathbb{R}^n\), are both complete metric spaces and locally compact Hausdorff spaces. If a locally compact Hausdorff space happens to be a topological vector space, then this space has finite dimension. Also, a topological vector space has to be Hausdorff.

By a Baire space we mean a topological space \(X\) such that the intersection of every countable collection of dense open subsets of \(X\) is also dense in \(X\).

The Baire category theorem states that

(BCT 1) Every complete metric space is a Baire space.

(BCT 2) Every locally compact Hausdorff space is a Baire space.

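By taking the complement of the definition, we can see that every nonempty Baire space is not of the first category.

Suppose we have a sequence of sets \(\{X_n\}\) where \(X_n\) is open and dense in \(X\) for all \(n>0\); then \(X_0=\cap_n X_n\) is also dense in \(X\). Notice then \(X_0^{c} = \cup_n X_n^c\), where each \(X_n^c\) is closed with empty interior, hence nowhere dense. So \(X_0^c\) is a countable union of nowhere dense sets, i.e. of the first category. If \(X\) itself were of the first category, say \(X=\cup_n E_n\) with each \(E_n\) nowhere dense, then taking \(X_n=(\overline{E}_n)^c\) would make \(X_0\) empty, which contradicts the denseness of \(X_0\).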

Proving BCT 1 and BCT 2 via Choquet game

Let \(X\) be the given complete metric space or locally compact Hausdorff space, and \(\{V_n\}\) a countable collection of dense open subsets of \(X\). Pick an arbitrary nonempty open subset of \(X\), namely \(A_0\) (this is possible due to the topology defined on \(X\)). To prove that \(\cap_n V_n\) is dense, we have to show that \(A_0 \cap \left(\cap_n V_n\right) \neq \varnothing\). This follows from the definition of denseness. Recall that

A subset \(A\) of \(X\) is dense if and only if \(A \cap U \neq \varnothing\) for all nonempty open subsets \(U\) of \(X\).

We pick a sequence of nonempty open sets \(\{A_n\}\) inductively. With \(A_{n-1}\) being picked, and since \(V_n\) is open and dense in \(X\), the intersection \(V_n \cap A_{n-1}\) is nonempty and open. \(A_n\) can be chosen such that \[ \overline{A}_n \subset V_n \cap A_{n-1} \] For BCT 1, \(A_n\) can be chosen to be open balls with radius \(< \frac{1}{n}\); for BCT 2, \(A_n\) can be chosen such that the closure is compact. Define \[ K = \bigcap_{n=1}^{\infty}\overline{A}_n \] Now, if \(X\) is a locally compact Hausdorff space, then due to the compactness, \(K\) is not empty (the \(\overline{A}_n\) are nested nonempty compact sets), therefore we have \[ \begin{cases} K \subset A_0 \\ K \subset V_n \quad(n \in \mathbb{N}) \end{cases} \] which shows that \(A_0 \cap \left(\cap_n V_n\right) \neq \varnothing\). BCT 2 is proved.

For BCT 1, we cannot follow this since it's not ensured that \(X\) has the Heine-Borel property, for example when \(X\) is an infinite-dimensional Hilbert space (this is also a reason why BCT 1 and BCT 2 are not equivalent). The only tool remaining is the Cauchy sequence. But how and where?

For any \(\varepsilon > 0\), we have some \(N\) such that \(\frac{1}{N} < \varepsilon\). For all \(m>n>N\), we have \(A_m \subset A_n\subset A_N\), therefore the centers of \(\{A_n\}\) form a Cauchy sequence, converging to some point of \(K\), which implies that \(K \neq \varnothing\). BCT 1 follows.

Applications of BCT

BCT will be used directly in the big three. It can be considered as the origin of them. But there are many other applications in different branches of mathematics. The applications shown below are in the same pattern: if it does not hold, then we have a Baire space of the first category, which is not possible.

\(\mathbb{R}\) is uncountable

Suppose \(\mathbb{R}\) is countable, then we have \[ \mathbb{R}=\bigcup_{n=1}^{\infty}\{x_n\} \] where \(x_n\) is a real number. But \(\{x_n\}\) is nowhere dense, therefore \(\mathbb{R}\) is of the first category. A contradiction.

Suppose that \(f\) is an entire function, and that every power series expansion \[ f(z)=\sum_{n=0}^{\infty}c_n(z-a)^n \] has at least one coefficient equal to \(0\). Then \(f\) is a polynomial (there exists an \(N\) such that \(c_n=0\) for all \(n>N\)).

You can find the proof here. We are using the fact that \(\mathbb{C}\) is complete.

An infinite dimensional Banach space \(B\) has no countable basis

Assume that \(B\) has a countable basis \(\{x_1,x_2,\cdots\}\) and define \[ B_n=\text{span}\{x_1,x_2,\cdots,x_n\} \] It can be easily shown that \(B_n\) is nowhere dense. But then \(B=\cup_n B_n\) is of the first category. A contradiction since \(B\) is a complete metric space.


More properties of zeros of an entire function

What's going on again

In this post we discussed the topological properties of the zero points of an entire nonzero function, or roughly, what those points look like. The set of zero points has no limit point and is at most countable (countable or finite). So if it's finite, then we can find them out one by one. For example, the function \(f(z)=z\) has just one zero point. But what if it's countably infinite? How fast does the number of zeros grow?

Another question. Suppose we have an entire function \(f\), and the zeros of \(f\), namely \(z_1,z_2,\cdots\), are ordered increasingly by moduli: \[ |z_1| \leq |z_2| \leq \cdots \leq |z_n| \leq \cdots \] Is it possible to get a fine enough estimate of \(|z_n|\)? Interestingly enough, we can get there with the help of Jensen's formula.

Jensen's formula

Suppose \(\Omega=D(0;R)\), \(f \in H(\Omega)\), \(f(0) \neq 0\), \(0<r<R\), and \(z_1,z_2,\cdots,z_{n(r)}\) are the zeros of \(f\) in \(\overline{D}(0;r)\), listed according to their multiplicities, then \[ |f(0)|\prod_{n=1}^{n(r)}\frac{r}{|z_n|}=\exp\left[\frac{1}{2\pi}\int_{-\pi}^{\pi}\log|f(re^{i\theta})|d\theta\right] \]

There is no need to worry about the assumption \(f(0) \neq 0\). Take another look at this proof. Every zero point \(a\) has a unique positive integer \(m\) such that \(f(z)=(z-a)^mg(z)\) and \(g \in H(\Omega)\) but \(g(a) \neq 0\). The number \(m\) is called the order of the zero at \(a\). Therefore if we have \(f(0)=0\) we can simply consider another function, namely \(\frac{f}{z^m}\) where \(m\) is the order of the zero at \(0\).

We are not proving this identity at this point. But it can be done by considering the following function \[ g(z)=f(z)\prod_{n=1}^{m}\frac{r^2-\overline{z}_nz}{r(z_n-z)}\prod_{n=m+1}^{n(r)}\frac{z_n}{z_n-z} \] where \(m\) is found by ordering \(z_j\) in such a way that \(z_1,\cdots,z_m \in D(0;r)\) and \(|z_{m+1}|=\cdots=|z_{n(r)}|=r\). One can prove this identity by considering \(|g(0)|\) as well as \(\log|g(re^{i\theta})|\).
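Jensen's formula is easy to test numerically on a polynomial. The sketch below is only an illustration under assumptions made here: \(f(z)=\prod_a(1-z/a)\) so that \(f(0)=1\), three arbitrarily chosen zeros, none of them on the circle \(|z|=r\), and an equally spaced sum on the circle for the integral. The helper name `jensen_sides` is hypothetical.

```python
import cmath
import math

def jensen_sides(zeros, r, samples=4096):
    """Return (lhs, rhs) of Jensen's formula for f(z) = prod(1 - z/a), a in zeros.

    f(0) = 1, so lhs is the product of r/|a| over the zeros with |a| < r.
    Assumes no zero has modulus exactly r, so log|f| is smooth on the circle.
    """
    def f(z):
        out = 1
        for a in zeros:
            out *= 1 - z / a
        return out

    lhs = 1.0
    for a in zeros:
        if abs(a) < r:
            lhs *= r / abs(a)

    # Mean of log|f| over the circle |z| = r, using equally spaced angles.
    mean_log = sum(
        math.log(abs(f(r * cmath.exp(2j * math.pi * k / samples))))
        for k in range(samples)
    ) / samples
    return lhs, math.exp(mean_log)

lhs, rhs = jensen_sides([0.5, 1.5j, -0.8 + 0.3j], r=1.0)
print(lhs, rhs)
```

The zero \(1.5i\) lies outside the unit disc and contributes nothing to either side, while the two zeros inside each contribute a factor \(\frac{r}{|z_n|}\).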

Several applications

The number of zeros of \(f\) in \(\overline{D}(0;r)\)

For simplicity we shall assume \(f(0)=1\), which involves no loss of generality. Let \[ M(r)=\sup_{\theta}|f(re^{i\theta})|\quad 0<r<\infty \] and \(n(r)\) be the number of zeros of \(f\) in \(\overline{D}(0;r)\). By the definition of \(M\), we have \[ \log|f(2re^{i\theta})| \leq \log M(2r) \] If we insert Jensen's formula into this inequality and order \(|z_n|\) by increasing moduli, we get \[ \log M(2r) \geq \frac{1}{2\pi}\int_{-\pi}^{\pi}\log|f(2re^{i\theta})|d\theta=\sum_{n=1}^{n(2r)}\log\frac{2r}{|z_n|}\geq\sum_{n=1}^{n(r)}\log\frac{2r}{|z_n|}\geq n(r)\log2 \] which implies \[ n(r)\leq\log_2M(2r) \] So \(n(r)\) is controlled by \(M(2r)\). The second and third inequalities look tricky, and they require more explanation.

First we should notice that, since the zeros are ordered by increasing moduli, \(|z_n| \leq 2r\) for every \(n \leq n(2r)\). Hence we have \(\log\frac{2r}{|z_n|} \geq \log1=0\) for all these \(z_n\), and dropping the terms with \(n(r) < n \leq n(2r)\) gives the second inequality. For the third one, since \(|z_n| \leq r\) for \(n \leq n(r)\), we simply have \[ \sum_{n=1}^{n(r)}\log\frac{2r}{|z_n|}=\sum_{n=1}^{n(r)}(\log2+\log\frac{r}{|z_n|}) \geq n(r)\log2. \] So this is it, the rapidity with which \(n(r)\) can grow is dominated by \(M(r)\). Namely, the number of zeros of \(f\) in the closed disc with radius \(r\) is controlled by the maximum modulus of \(f\) on a circle with bigger radius.
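As a concrete check of \(n(r)\leq\log_2M(2r)\), one can take \(f(z)=\cos z\), an example chosen here rather than taken from the text. It satisfies \(f(0)=1\), its zeros are \(\pm(j+\frac{1}{2})\pi\) for \(j=0,1,2,\cdots\), and \(M(r)=\cosh r\) because \(|\cos(x+iy)|^2=\cos^2x+\sinh^2y\). A minimal sketch:

```python
import math

def n_zeros_cos(r):
    """Number of zeros of cos in the closed disc of radius r.
    The zeros are +-(j + 1/2) * pi for j = 0, 1, 2, ..."""
    count = 0
    j = 0
    while (j + 0.5) * math.pi <= r:
        count += 2  # the pair +(j + 1/2) * pi and -(j + 1/2) * pi
        j += 1
    return count

def bound(r):
    """log_2 M(2r), with M(r) = cosh(r) the maximum of |cos| on |z| = r."""
    return math.log2(math.cosh(2 * r))

checks = [(r, n_zeros_cos(r), bound(r)) for r in (1.0, 2.0, 5.0, 20.0, 100.0)]
for r, n, b in checks:
    print(r, n, round(b, 2))
```

Here \(n(r)\) grows like \(\frac{2r}{\pi}\) while the bound grows like \(\frac{2r}{\log 2}\), so the estimate holds with room to spare.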

Examples based on different \(M(r)\)

Let's begin with a simple example. Let \(f(z)=1\), we have \(M(r)=1\) for all \(r\), but also we have \(n(r)=0\), in which sense this estimation does nothing. Indeed, as long as \(M(r)\) is bounded by a constant, which implies \(f(z)\) is bounded, then by Liouville's theorem, \(f(z)\) is constant and this estimation is not available.

But if \(M(r)\) grows properly, things become interesting. For example, if we have \[ M(r) \leq \exp(Ar^k) \] where \(A\) and \(k\) are given positive numbers, then \(\log M(2r) \leq A(2r)^k\) and we have a good enough estimate by \[ n(r) \leq \frac{A(2r)^k}{\log2} \] This estimate becomes interesting if we consider the logarithm of \(n(r)\) and \(r\), that is \[ \begin{aligned} \limsup_{r\to\infty}\frac{\log{n(r)}}{\log{r}} &\leq \lim_{r\to\infty} \frac{\log(A(2r)^k)-\log\log{2}}{\log{r}} \\ & =k \end{aligned} \] If we have \(f(z)=1-\exp(z^k)\) where \(k\) is a positive integer, we have \(n(r) \sim \frac{kr^k}{\pi}\), also \[ \lim_{r\to\infty}\frac{\log{n(r)}}{\log r}=k \]
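The count for \(f(z)=1-\exp(z^k)\) can be made explicit. The zeros solve \(z^k=2\pi im\) with \(m \in \mathbb{Z}\): \(m=0\) gives a zero of order \(k\) at the origin, and each \(m \neq 0\) gives \(k\) simple zeros of modulus \((2\pi|m|)^{1/k}\). The sketch below counts them and compares with \(\frac{kr^k}{\pi}\); the values \(k=2\) and \(r=1000\) are arbitrary choices made here.

```python
import math

def n_zeros(r, k):
    """Zeros of 1 - exp(z^k) in the closed disc of radius r, with multiplicity.

    m = 0 contributes a zero of order k at the origin; every m with
    0 < 2*pi*|m| <= r^k contributes k simple zeros.
    """
    m_max = math.floor(r ** k / (2 * math.pi))
    return k * (1 + 2 * m_max)

k, r = 2, 1000.0
n = n_zeros(r, k)
ratio_to_asymptotic = n / (k * r ** k / math.pi)
growth_exponent = math.log(n) / math.log(r)
print(n, ratio_to_asymptotic, growth_exponent)
```

The ratio to \(\frac{kr^k}{\pi}\) is already very close to \(1\), while \(\frac{\log n(r)}{\log r}\) approaches \(k\) only slowly, since the constant \(\frac{k}{\pi}\) contributes a term of size \(\frac{1}{\log r}\).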

Lower bound of \(|z_{n(r)}|\)

We'll see here how to evaluate the lower bound of \(|z_{n(r)}|\) using Jensen's formula, provided that \(M(r)\), or simply the upper bound of \(|f(z)|\), is properly described. Without loss of generality we shall assume that \(f(0)=1\). Also, we assume that the zero points of \(f(z)\) are ordered by increasing moduli.

First we still consider \[ M(r) \leq \exp(Ar^k) \] and see what will happen.

By Jensen's formula, we have \[ \prod_{n=1}^{n(r)}\frac{r}{|z_n|}=\exp\left[\frac{1}{2\pi}\int_{-\pi}^{\pi}\log|f(re^{i\theta})|d\theta\right] \leq \exp(Ar^k) \] This gives \[ \prod_{n=1}^{n(r)}|z_n| \geq r^{n(r)}\exp(-Ar^k) \] By the arrangement of \(\{z_n\}\), \(|z_{n(r)}|\) is the largest factor of the product, so it is at least the geometric mean: \[ |z_{n(r)}| \geq \sqrt[n(r)]{\prod_{n=1}^{n(r)}|z_n|}\geq r\exp\left(-\frac{Ar^{k}}{n(r)}\right) \]

Another example is when we have \[ |f(z)| \leq \exp(A|\Im{z}|) \] where \(\Im{z}\) means the imaginary part of \(z\).

We shall notice that in this case, \[ \begin{aligned} \frac{1}{2\pi}\int_{-\pi}^{\pi}\log|f(re^{i\theta})|d\theta &\leq \frac{1}{2\pi}\int_{-\pi}^{\pi}A|r\sin\theta|d\theta=\frac{2Ar}{\pi} \end{aligned} \] Following Jensen's formula and taking the \(n(r)\)-th root as before, we therefore have \[ |z_{n(r)}| \geq r\exp\left(-\frac{2Ar}{\pi n(r)}\right) \]

The Lebesgue-Radon-Nikodym theorem and how von Neumann proved it

An introduction

If one wants to learn the fundamental theorem of Calculus in the sense of Lebesgue integral, properties of measures have to be taken into account. In elementary calculus, one may consider something like \[ df(x)=f'(x)dx \] where \(f\) is differentiable, say, everywhere on an interval. Now we restrict \(f\) to be a differentiable and nondecreasing real function defined on \(I=[a,b]\). There we got a one-to-one function defined by \[ g(x)=x+f(x) \]

For measurable sets \(E\in\mathfrak{M}\), it can be seen that if \(m(E)=0\), we have \(m(g(E))=0\). Moreover, \(g(E) \in \mathfrak{M}\), and \(g\) is one-to-one. Therefore we can define a measure like \[ \mu(E)=m(g(E)) \] If we have a relation \[ \mu(E)=\int_{E}hdm \] (in fact, this is the Radon-Nikodym theorem we will prove later), the fundamental theorem of calculus for \(f\) becomes somewhat clear since if \(E=[a,x]\), we get \(g(E)=[a+f(a),x+f(x)]\), thus \[ \begin{aligned} \mu(E)=m(g(E))&=g(x)-g(a)\\ &=f(x)-f(a)+\int_a^xdt \\ &=\int_a^xh(t)dt \end{aligned} \] which trivially implies \[ f(x)-f(a)=\int_a^x[h(t)-1]dt \] so the function \(h\) looks like \(g'=f'+1\).
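To see the heuristic at work, take the concrete choice \(f(x)=x^2\) on \([0,1]\), which is assumed here purely for illustration. Then \(g(x)=x+x^2\) and the candidate density is \(h=g'=1+2x\). The sketch below compares \(\mu([a,x])=g(x)-g(a)\) with a midpoint-rule approximation of \(\int_a^xh(t)dt\).

```python
def f(x):
    return x * x  # differentiable and nondecreasing on [0, 1]

def g(x):
    return x + f(x)

def h(t):
    return 1 + 2 * t  # g'(t) = f'(t) + 1

def mu_interval(a, x):
    """mu([a, x]) = m(g([a, x])) = g(x) - g(a), since g is increasing and continuous."""
    return g(x) - g(a)

def integral(func, a, x, steps=10_000):
    """Midpoint-rule approximation of the integral of func over [a, x]."""
    width = (x - a) / steps
    return sum(func(a + (i + 0.5) * width) for i in range(steps)) * width

a, x = 0.0, 0.75
print(mu_interval(a, x), integral(h, a, x))
print(f(x) - f(a), integral(lambda t: h(t) - 1, a, x))
```

Both pairs agree up to rounding, which matches \(\mu(E)=\int_Ehdm\) and \(f(x)-f(a)=\int_a^x[h(t)-1]dt\) in this simple case.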

We are not proving the fundamental theorem here. But this gives rise to a question. Is it possible to find a function such that \[ \mu(E)=\int_{E}hdm \] one may write as \[ d\mu=hdm \] or, more generally, a measure \(\mu\) with respect to another measure \(\lambda\)? Does this \(\mu\) exist with respect to \(\lambda\)? Does this \(h\) exist? Lot of questions. Luckily the Lebesgue decomposition and Radon-Nikodym theorem make it possible.


Let \(\mu\) be a positive measure on a \(\sigma\)-algebra \(\mathfrak{M}\), let \(\lambda\) be any arbitrary measure (positive or complex) defined on \(\mathfrak{M}\).

We write \[ \lambda \ll \mu \] if \(\lambda(E)=0\) for every \(E\in\mathfrak{M}\) for which \(\mu(E)=0\). (You may write \(\mu \ll m\) in the previous section.) We say \(\lambda\) is absolutely continuous with respect to \(\mu\).

Another relation between measures worth consideration is being mutually singular. If we have \(\lambda(E)=\lambda(A \cap E)\) for every \(E \in \mathfrak{M}\), we say \(\lambda\) is concentrated on \(A\).

If we now have two measures \(\mu_1\) and \(\mu_2\), two disjoint sets \(A\) and \(B\) such that \(\mu_1\) is concentrated on \(A\), \(\mu_2\) is concentrated on \(B\), we say \(\mu_1\) and \(\mu_2\) are mutually singular, and write \[ \mu_1 \perp \mu_2 \]

The Theorem of Lebesgue-Radon-Nikodym

Let \(\mu\) be a positive \(\sigma\)-finite measure on \(\mathfrak{M}\), and \(\lambda\) a complex measure on \(\mathfrak{M}\).

  • There exists a unique pair of complex measures \(\lambda_{ac}\) and \(\lambda_{s}\) on \(\mathfrak{M}\) such that

\[ \lambda = \lambda_{ac}+\lambda_s \quad \lambda_{ac}\ll\mu\quad \lambda_s \perp \mu \]

  • There is a unique \(h \in L^1(\mu)\) such that

\[ \lambda_{ac}(E)=\int_{E}hd\mu \]

for every \(E \in \mathfrak{M}\).

The unique pair \((\lambda_{ac},\lambda_s)\) is called the Lebesgue decomposition; the existence of \(h\) is called the Radon-Nikodym theorem, and \(h\) is called the Radon-Nikodym derivative. One also writes \(d\lambda_{ac}=hd\mu\) or \(\frac{d\lambda_{ac}}{d\mu}=h\) in this situation.

These are two separate theorems, but von Neumann gave the idea to prove these two at one stroke.

If we already have \(\lambda \ll \mu\), then \(\lambda_s=0\) and the Radon-Nikodym derivative shows up naturally.

Also, one cannot ignore the fact that \(m\) the Lebesgue measure is \(\sigma\)-finite.

Proof explained

Step 1 - Construct a bounded functional

We are going to employ Hilbert space technique in this proof. Precisely speaking, we are going to construct a bounded linear functional to find another function, namely \(g\), which is the epicentre of this proof.

The boundedness of \(\lambda\) is clear since it's complex, but \(\mu\) is only assumed to be \(\sigma\)-finite. Therefore we need some adjustment onto \(\mu\).

1.1 Replacing \(\mu\) with a finite measure

If \(\mu\) is a positive \(\sigma\)-finite measure on a \(\sigma\)-algebra \(\mathfrak{M}\) in a set \(X\), then there is a function \(w\) such that \(w \in L^1(\mu)\) and \(0<w(x)<1\) for every \(x \in X\).

The \(\sigma\)-finiteness of \(\mu\) denotes that, there exist some sets \(E_n\) such that \[ X=\bigcup_{n=1}^{\infty}E_n \] and that \(\mu(E_n)<\infty\) for all \(n\).

Define \[ w_n(x)= \begin{aligned} \begin{cases} \frac{1}{2^n(1+\mu(E_n))}\quad &x \in E_n \\ 0 \quad &x\notin E_n \end{cases} \end{aligned} \] (you can also say that \(w_n=\frac{1}{2^n(1+\mu(E_n))}\chi_{E_n}\)); then \[ w = \sum_{n=1}^{\infty}w_n \] satisfies \(0<w(x)<1\) for all \(x\). With \(w\), we are able to define a new measure, namely \[ \tilde{\mu}(E)=\int_{E}wd\mu. \] The fact that \(\tilde{\mu}\) is a measure can be validated by considering \(\int_{E}wd\mu=\int_{X}\chi_{E}wd\mu\). It's more important that \(\tilde{\mu}\) is bounded and \(\tilde{\mu}(E)=0\) if and only if \(\mu(E)=0\). The second one comes from the strict positivity of \(w\). For the first one, notice that, by the monotone convergence theorem, \[ \begin{aligned} \tilde{\mu}(X) &= \sum_{n=1}^{\infty}\int_{X}w_nd\mu \\ &= \sum_{n=1}^{\infty}\frac{\mu(E_n)}{2^n(1+\mu(E_n))} \\ &\leq \sum_{n=1}^{\infty}\frac{1}{2^n}=1 \end{aligned} \]

1.2 A bounded linear functional associated with \(\lambda\)

Since \(\lambda\) is complex, without loss of generality, we are able to assume that \(\lambda\) is a positive bounded measure on \(\mathfrak{M}\). By 1.1, we are able to obtain a positive bounded measure by \[ \varphi=\lambda+\tilde{\mu} \] Following the construction of the Lebesgue integral, we have \[ \int_{X}fd\varphi=\int_{X}fd\lambda+\int_{X}fwd\mu \] for all nonnegative measurable functions \(f\). Also, since \(\lambda \leq \varphi\), we have \[ \left\vert \int_{X}fd\lambda \right\vert \leq \int_{X}|f|d\lambda \leq \int_{X}|f|d\varphi \leq \sqrt{\varphi(X)}\left\Vert f \right\Vert_2 \] for \(f \in L^2(\varphi)\) by the Schwarz inequality.

Since \(\varphi(X)<\infty\), we have \[ \Lambda{f}=\int_{X}fd\lambda \] to be a bounded linear functional on \(L^2(\varphi)\).

Step 2 - Find the associated function with respect to \(\lambda\)

Since \(L^2(\varphi)\) is a Hilbert space, and every bounded linear functional on a Hilbert space \(H\) is given by an inner product with an element in \(H\), there exists a function \(g \in L^2(\varphi)\) such that \[ \Lambda{f}=\int_{X}fd\lambda=\int_{X}fgd\varphi=(f,g). \] The properties of \(L^2\) spaces show that \(g\) is determined almost everywhere with respect to \(\varphi\).

For \(E \in \mathfrak{M}\), we got \[ 0 \leq (\chi_{E},g)=\int_{E}gd\varphi=\int_{E}d\lambda=\lambda(E)\leq\varphi(E) \] which implies \(0 \leq g \leq 1\) for almost every \(x\) with respect to \(\varphi\). Therefore we are able to assume that \(0 \leq g \leq 1\) without ruining the identity. The proof is in the bag once we define \(A\) to be the set where \(0 \leq g < 1\) and \(B\) the set where \(g=1\).

Step 3 - Generate \(\lambda_{ac}\) and \(\lambda_{s}\) and the Radon-Nikodym derivative at one stroke

We claim that \(E \mapsto \lambda(A \cap E)\) and \(E \mapsto \lambda(B \cap E)\) form the decomposition we are looking for, \(\lambda_{ac}\) and \(\lambda_s\), respectively. Namely, \(\lambda_{ac}(E)=\lambda(A \cap E)\), \(\lambda_s(E)=\lambda(B \cap E)\).

Proving \(\lambda_s \perp \mu\)

If we combine \(\Lambda{f}=(f,g)\) and \(\varphi=\lambda+\tilde{\mu}\), we have \[ \int_{X}fd\lambda=\int_{X}fgd\lambda+\int_{X}fgwd\mu, \] that is, \[ \int_{X}(1-g)fd\lambda=\int_{X}fgwd\mu. \] Putting \(f=\chi_{B}\) and recalling that \(g=1\) on \(B\), the left side vanishes and we have \[ \int_{B}wd\mu=0. \] Since \(w\) is strictly positive, we see that \(\mu(B)=0\). Notice that \(A \cap B = \varnothing\) and \(A \cup B=X\). For \(E \in \mathfrak{M}\), we write \(E=E_A \cup E_B\), where \(E_A=E \cap A\) and \(E_B=E \cap B\). Therefore \[ \mu(E)=\mu(E_A)+\mu(E_B)=\mu(E \cap A)+\mu(E \cap B)=\mu(E \cap A). \] Therefore \(\mu\) is concentrated on \(A\).

For \(\lambda_s\), observe that \[ \lambda_s(E)=\lambda(E \cap B)=\lambda((E \cap B) \cap B)=\lambda_s(E \cap B). \] Hence \(\lambda_s\) is concentrated on \(B\). Since \(\mu\) is concentrated on \(A\) and \(A \cap B=\varnothing\), this shows that \(\lambda_s \perp \mu\).

Proving \(\lambda_{ac} \ll \mu\) by the Radon-Nikodym derivative

The relation \(\lambda_{ac} \ll \mu\) will be shown by the existence of the Radon-Nikodym derivative.

If we replace \(f\) by \[ (1+g+\cdots+g^n)\chi_E, \] where \(E \in \mathfrak{M}\), we have \[ \int_X(1-g)fd\lambda=\int_E(1-g^{n+1})d\lambda=\int_Eg(1+g+\cdots+g^n)wd\mu. \] Since \(g=1\) on \(B\) and \(0 \leq g<1\) on \(A\), we notice that \[ \begin{aligned} \int_{E}(1-g^{n+1})d\lambda &=\int\limits_{E \cap A}(1-g^{n+1})d\lambda + \int\limits_{E \cap B}(1-g^{n+1})d\lambda \\ &=\int\limits_{E \cap A}(1-g^{n+1})d\lambda \\ &\to\lambda(E \cap A) = \lambda_{ac}(E)\quad(n\to\infty) \end{aligned} \] where the limit follows from the monotone convergence theorem, because \(1-g^{n+1}\) increases to \(1\) on \(A\). Define \(h_n=g(1+g+g^2+\cdots+g^n)w\). On \(A\), \(h_n\) increases monotonically to \(gw/(1-g)\); since \(\mu(B)=0\), the values on \(B\) do not affect integrals with respect to \(\mu\), so we may take the limit function to be \[ h= \begin{cases} \frac{gw}{1-g} \quad &x\in{A}\\ 0 \quad &x\in{B} \end{cases} \] By the monotone convergence theorem, we get \[ \lim_{n\to\infty}\int_{E}h_nd\mu = \int_{E}hd\mu=\lambda_{ac}(E) \] for every \(E\in\mathfrak{M}\).
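
To see where the two integrands come from, the algebra behind the substitution can be spelled out. Nothing beyond the finite geometric sum is assumed here: \[ (1-g)(1+g+\cdots+g^n)=1-g^{n+1}, \qquad g(1+g+\cdots+g^n)=\sum_{k=1}^{n+1}g^k \nearrow \frac{g}{1-g}\quad(0 \leq g<1). \] For instance, at a point where \(g=\frac{1}{2}\), the partial sums \(\frac12+\frac14+\cdots+\frac{1}{2^{n+1}}\) increase to \(1=\frac{1/2}{1-1/2}\).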

The measurable function \(h\) is the desired Radon-Nikodym derivative once we show that \(h \in L^1(\mu)\). Replacing \(E\) with \(X\), we see that \[ \int_{X}|h|d\mu=\int_{X}hd\mu=\lambda_{ac}(X)\leq\lambda(X)<\infty. \] Clearly, if \(\mu(E)=0\), we have \[ \lambda_{ac}(E)=\int_{E}hd\mu=0 \] which shows that \[ \lambda_{ac}\ll\mu \] as desired.
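
To see the construction in action, here is a small worked example. It is only a sketch under assumptions that are not part of the proof above: \(X=[0,1]\), \(\mu=m\) is Lebesgue measure, \(\lambda=m+\delta_0\) where \(\delta_0\) is the point mass at \(0\), and the weight is taken to be the constant \(w\equiv\frac{1}{2}\) (assumed to be an admissible choice in 1.1). Then \[ \varphi=\lambda+\tilde{\mu}=\tfrac{3}{2}m+\delta_0, \qquad g(x)= \begin{cases} \frac{2}{3} \quad &x\in(0,1]\\ 1 \quad &x=0 \end{cases} \] and one can check directly that \(\int_X fgd\varphi=\int_X fdm+f(0)=\int_X fd\lambda\). Hence \(A=(0,1]\), \(B=\{0\}\), and \[ h=\frac{gw}{1-g}=\frac{\frac23\cdot\frac12}{\frac13}=1 \text{ on } A, \qquad \lambda_{ac}(E)=\int_E hdm=m(E), \qquad \lambda_s(E)=\lambda(E\cap\{0\})=\delta_0(E). \] So in this toy case the recipe recovers the expected splitting of \(\lambda\) into its Lebesgue part and its point mass.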

Step 4 - Generalization onto complex measures

So far we have proved this theorem for positive bounded measures. For a real bounded measure, we can apply the preceding case to its positive and negative parts. For a complex measure, we write \[ \lambda=\lambda_1+i\lambda_2 \] where \(\lambda_1\) and \(\lambda_2\) are real, and apply the real case to \(\lambda_1\) and \(\lambda_2\) separately.

Step 5 - Uniqueness of the decomposition

If we have two Lebesgue decompositions of the same measure, namely \((\lambda_{ac},\lambda_s)\) and \((\lambda'_{ac},\lambda'_s)\), we shall show that \[ \lambda_{ac}-\lambda_{ac}'=\lambda_s'-\lambda_s=0. \] By the definition of the decomposition we get \[ \lambda_{ac}-\lambda'_{ac}=\lambda'_s-\lambda_s \] with \(\lambda_{ac}-\lambda_{ac}' \ll \mu\) and \(\lambda_{s}'-\lambda_{s}\perp\mu\). Since the two sides are equal, this implies that \(\lambda'_{s}-\lambda_{s} \ll \mu\) as well.

Since \(\lambda'_s-\lambda_s\perp\mu\), there exists a set \(N\) with \(\mu(N)=0\) on which \(\lambda'_s-\lambda_s\) is concentrated. On the other hand, the absolute continuity shows that \(\lambda'_s(E)-\lambda_s(E)=0\) for all measurable \(E \subset N\), hence \(\lambda_s'-\lambda_s\) is also concentrated on \(X-N\). Therefore we get \((\lambda'_s-\lambda_s)\perp(\lambda'_s-\lambda_s)\), which forces \(\lambda'_s-\lambda_s=0\). The uniqueness is proved.

(Following the same process one can also show that \(\lambda_{ac}\perp\lambda_s\).)

Topological properties of the zeros of a holomorphic function

What's going on

If for every \(z_0 \in \Omega\) where \(\Omega\) is a plane open set, the limit \[ \lim_{z \to z_0}\frac{f(z)-f(z_0)}{z-z_0} \] exists, we say that \(f\) is holomorphic (a.k.a. analytic) in \(\Omega\). If \(f\) is holomorphic in the whole plane, it's called entire. The class of all holomorphic functions (denoted by \(H(\Omega)\)) has many interesting properties. For example it does form a ring.

But what happens if we talk about the points where \(f\) is equal to \(0\)? Is it possible to find an entire function \(g\) such that \(g(z)=0\) if and only if \(z\) is on the unit circle? The topological property we will discuss in this post answers this question negatively.


Suppose \(\Omega\) is a region and \(f \in H(\Omega)\). Then the set \[ Z(f)=\{z_0\in\Omega:f(z_0)=0\} \] is an at most countable set without limit point in \(\Omega\), as long as \(f\) is not identically equal to \(0\) on \(\Omega\).

Trivially, if \(f(\Omega)=\{0\}\), we have \(Z(f)=\Omega\). The unit circle is uncountable, and every point of it is a limit point. Hence if an entire function is equal to \(0\) on the unit circle, then it is equal to \(0\) on the whole plane.

Note: the connectedness of \(\Omega\) is important. For example, for two disjoint nonempty open sets \(\Omega_0\) and \(\Omega_1\), define \(f(z)=0\) on \(\Omega_0\) and \(f(z)=1\) on \(\Omega_1\). Then \(f\) is holomorphic on \(\Omega_0 \cup \Omega_1\) and not identically \(0\), yet \(Z(f)=\Omega_0\) is uncountable and every point of it is a limit point, so the conclusion fails.
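
A concrete example may help with the phrase "without limit point in \(\Omega\)". The function and region below are chosen purely for illustration. Take \(\Omega=\mathbb{C}-\{0\}\), which is a region, and \(f(z)=\sin(1/z)\in H(\Omega)\). Then \[ Z(f)=\left\{\frac{1}{k\pi}:k\in\mathbb{Z},\ k\neq 0\right\}. \] This set is countable, and its points accumulate only at \(0\), which does not belong to \(\Omega\). So \(Z(f)\) has no limit point in \(\Omega\), in agreement with the statement, even though it does have a limit point in the plane.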

A simple application (Feat. Baire Category Theorem)

Before establishing the proof, let's see what we can do using this result.

Suppose that \(f\) is an entire function, and that for every \(a \in \mathbb{C}\), at least one coefficient of the power series \[ f(z)=\sum c_n(z-a)^n \] is \(0\). Then \(f\) is a polynomial.

Clearly we have \(n!c_n=f^{(n)}(a)\), thus for every \(a \in \mathbb{C}\), we can find a nonnegative integer \(n_0\) such that \(f^{(n_0)}(a)=0\). Thus we establish the identity: \[ \bigcup_{n=0}^{\infty} Z(f^{(n)})=\mathbb{C} \] Notice that each \(f^{(n)}\) is entire. So \(Z(f^{(n)})\) is either an at most countable set without limit point, or simply equal to \(\mathbb{C}\). If there exists a number \(N\) such that \(Z(f^{(N)})=\mathbb{C}\), then naturally \(Z(f^{(n)})=\mathbb{C}\) holds for all \(n \geq N\). In that case the power series of \(f\) has only finitely many nonzero coefficients, so \(f\) is a polynomial.

So the question is: does such an \(N\) always exist? Suppose not. Then each \(Z(f^{(n)})\) is an at most countable set without limit point; it is closed by the continuity of \(f^{(n)}\) and has empty interior, hence it is nowhere dense. But according to the Baire Category Theorem, the complete metric space \(\mathbb{C}\) cannot be a countable union of nowhere dense sets (that is, \(\mathbb{C}\) is not of the first category). This forces the existence of \(N\).
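
As a sanity check of the statement, here are two examples picked for illustration. For \(f(z)=e^z\) we have \[ e^z=\sum_{n=0}^{\infty}\frac{e^a}{n!}(z-a)^n, \] so no coefficient vanishes at any center \(a\); the hypothesis fails, as it must, since \(e^z\) is not a polynomial. For the polynomial \(f(z)=z^2\) we have \[ z^2=a^2+2a(z-a)+(z-a)^2, \] so \(c_n=0\) for all \(n \geq 3\) at every center \(a\), and here one can take \(N=3\).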


The proof will be finished using some basic topology techniques.

Let \(A\) be the set of all limit points of \(Z(f)\) in \(\Omega\). The continuity of \(f\) shows that \(A \subset Z(f)\). We'll show that if \(A \neq \varnothing\), then \(Z(f)=\Omega\).

First we claim that if \(a \in A\), then \(a \in \bigcap_{n \geq 0}Z(f^{(n)})\). That is, \(f^{(k)}(a) = 0\) for all \(k \geq 0\). Suppose this fails, then there is a smallest positive integer \(m\) such that \(c_m \neq 0\) for the power series on the disc \(D(a;r)\): \[ f(z)=\sum_{n=1}^{\infty}c_n(z-a)^{n}. \]


In this case, define

\[ g(z)=\begin{cases} (z-a)^{-m}f(z)\quad&(z\in\Omega-\{a\}) \\ c_m\quad&(z=a) \end{cases} \]

It's clear that \(g \in H(D(a;r))\) since we have \[ g(z)=\sum_{n=0}^{\infty}c_{m+n}(z-a)^{n}\quad(z\in D(a;r)) \]

Since \(a\) is a limit point of \(Z(f)\), there is a sequence \(z_k \to a\) with \(z_k \neq a\) and \(f(z_k)=0\), hence \(g(z_k)=0\). The continuity of \(g\) then shows that \(g(a)=0\), while \(g(a)=c_m \neq 0\). A contradiction.
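
For a concrete picture of this factorization, consider \(f(z)=1-\cos z\) at \(a=0\), chosen only as an illustration. Its power series is \[ 1-\cos z=\frac{z^2}{2}-\frac{z^4}{24}+\cdots, \] so the smallest index with a nonzero coefficient is \(m=2\), and \[ g(z)=z^{-2}(1-\cos z)=\frac{1}{2}-\frac{z^2}{24}+\cdots,\qquad g(0)=\frac{1}{2}\neq 0. \] By continuity \(g\) has no zero in some disc around \(0\), so \(f(z)=z^2g(z)\) vanishes in that disc only at \(z=0\). This is exactly why a zero with some nonvanishing derivative cannot be a limit point of \(Z(f)\).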

Next fix a point \(b \in \Omega\). Since \(\Omega\) is a region, we can choose a curve (continuous mapping) \(\gamma\) defined on \([0,1]\) with values in \(\Omega\) such that \(\gamma(0)=a\) and \(\gamma(1)=b\). Let

\[ \Gamma=\{t\in[0,1]:\gamma(t)\in\bigcap_{n \geq 0}Z(f^{(n)})\} \] By hypothesis, \(0 \in \Gamma\). We shall prove that \(1 \in \Gamma\). Let \[ s = \sup\Gamma \] There exists a sequence \(\{t_n\}\subset\Gamma\) such that \(t_n \to s\). The continuity of \(f^{(k)}\) and \(\gamma\) shows that \[ f^{(k)}(\gamma(s))=0 \]

Hence \(s \in \Gamma\). Choose a disc \(D(\gamma(s);\delta)\subset\Omega\). On this disc, \(f\) is represented by its power series centered at \(\gamma(s)\), but all coefficients are \(0\). It follows that \(f(z)=0\) for all \(z \in D(\gamma(s);\delta)\), and further \(f^{(k)}(z)=0\) for all \(z \in D(\gamma(s);\delta)\) and all \(k \geq 0\). By the continuity of \(\gamma\), there exists \(\varepsilon>0\) such that \(\gamma\big((s-\varepsilon,s+\varepsilon)\cap[0,1]\big)\subset D(\gamma(s);\delta)\), which implies that \((s-\varepsilon, s+\varepsilon)\cap[0,1]\subset\Gamma\). Since \(s=\sup\Gamma\), this is only possible when \(s=1\), therefore \(1 \in \Gamma\), that is, \(f^{(k)}(b)=0\) for all \(k \geq 0\).

So far we have shown that if \(Z(f)\) has a limit point in \(\Omega\), then \(\Omega = \bigcap_{n \geq 0}Z(f^{(n)})\), which forces \(Z(f)=\Omega\). Equivalently, if \(f\) is not identically \(0\), then \(Z(f)\) has no limit point in \(\Omega\), which is what we wanted to prove.

When \(Z(f)\) contains no limit point, all points of \(Z(f)\) are isolated points; hence in each compact subset of \(\Omega\), there are at most finitely many points in \(Z(f)\). Since \(\Omega\) is \(\sigma\)-compact, \(Z(f)\) is at most countable. \(Z(f)\) is also called a discrete set in this situation.



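I suppose quite a few high school students, and even some middle school students, have heard of L'Hôpital's rule and know how to apply it in simple cases. Simply put, when dealing with a function of the form \(\frac{0}{0}\), differentiating the numerator and the denominator may simplify the computation a great deal. But why does L'Hôpital's rule work? Can it be justified in rigorous mathematical language? That is what this post sets out to answer.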



Suppose \(f\) and \(g\) are real differentiable functions on \((a,b)\), and \(g'(x) \ne 0\) for all \(x\) in \((a,b)\). Suppose

\[ \frac{f'(x)}{g'(x)} \to A \quad (x \to a). \]

If

\[ f(x) \to 0, g(x) \to 0 \quad (x \to a) \]

or if

\[ g(x) \to \pm \infty \quad (x \to a) \]

then

\[ \frac{f(x)}{g(x)} \to A \quad (x \to a). \]

Naturally, \(x \to a\) can be replaced by \(x \to b\). Here divergence to infinity is also regarded as a limit, that is, \(-\infty \le A \le \infty\) and \(-\infty \le a < b \le +\infty\).
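
A short worked example of the second case, where only \(g(x) \to +\infty\) is needed. The functions are chosen for illustration. On \((0,1)\) take \(f(x)=\ln x\) and \(g(x)=1/x\), so that \(g'(x)=-1/x^2\neq 0\) and \(g(x)\to+\infty\) as \(x \to 0\). Then \[ \frac{f'(x)}{g'(x)}=\frac{1/x}{-1/x^2}=-x \to 0 \quad (x \to 0), \] and the theorem gives \[ x\ln x=\frac{\ln x}{1/x} \to 0 \quad (x \to 0). \] Note that nothing is assumed about the behavior of \(f\) itself in this case.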



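L'Hôpital's rule first appeared in 1696, in L'Hôpital's book Analyse des Infiniment Petits pour l'Intelligence des Lignes Courbes. The book is, of course, best known for "L'Hôpital's rule". The proof there goes like this: \[ \frac{f(a+dx)}{g(a+dx)}=\frac{f(a)+f'(a)dx}{g(a)+g'(a)dx}=\frac{f'(a)dx}{g'(a)dx}=\frac{f'(a)}{g'(a)} \]

This proof is easy to follow: expand by linear approximation, then use \(f(a)=g(a)=0\) to get the result. But this approach is certainly not adequate. The meaning of \(dx\) is very vague here, and the argument cannot conveniently express the case \(x\to\infty\). For the history, see the book The Historical Development of the Calculus.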




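By the definition of the derivative,

\[ f'(x) = \lim_{h\to 0} \frac{f(x+h)-f(x)}{h}, \]

so we can write

\[ \frac{f(x+h)-f(x)}{h} = f'(x) + r(h) \]

where \(r(h) \to 0\) as \(h \to 0\). In other words,

\[ f(x+h)=f(x)+f'(x)h+r(h)h \]

Likewise, for \(g\) there is some \(s(h)\) with \(s(h) \to 0\) as \(h \to 0\) such that

\[ g(x+h)=g(x)+g'(x)h+s(h)h \]

Therefore, if \(f\) and \(g\) are differentiable at \(a\) with \(f(a)=g(a)=0\) and \(g'(a) \neq 0\), then

\[ \frac{f(a+h)}{g(a+h)}=\frac{f(a)+f'(a)h+r(h)h}{g(a)+g'(a)h+s(h)h}=\frac{f'(a)h+r(h)h}{g'(a)h+s(h)h}=\frac{f'(a)+r(h)}{g'(a)+s(h)} \]

which tends to \(f'(a)/g'(a)\) as \(h \to 0\).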
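
It may help to see where this linear-approximation argument runs out of steam. The example below is an illustration, not part of the original argument. Take \(f(x)=1-\cos x\), \(g(x)=x^2\) and \(a=0\), so that \(f(0)=g(0)=0\) but also \(f'(0)=g'(0)=0\). With \(r(h)=\frac{1-\cos h}{h}\) and \(s(h)=h\) the last quotient becomes \[ \frac{f'(0)+r(h)}{g'(0)+s(h)}=\frac{r(h)}{s(h)}=\frac{1-\cos h}{h^2}, \] which is again of the form \(\frac{0}{0}\), so the expansion tells us nothing new. The general rule still applies on \((0,1)\), since \(g'(x)=2x\neq 0\) there and \[ \frac{f'(x)}{g'(x)}=\frac{\sin x}{2x}\to\frac{1}{2}\quad(x \to 0). \]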



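In this proof, we use Cauchy's mean value theorem (GMVT) to give a complete proof covering all cases, which involves some techniques for handling inequalities. The proof comes from W. Rudin's Principles of Mathematical Analysis, and I will add some extra explanations along the way. As for the case \(g(x) \to \pm\infty\), we only discuss \(+\infty\) here.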

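Case 1: \(-\infty\leq{A}<+\infty\)

Pick a real number \(q>A\), and then pick \(\varepsilon>0\) such that \(A+\varepsilon<q\) (when \(A=-\infty\), read \(A+\varepsilon\) as a fixed real number smaller than \(q\)). Since \(\frac{f'(x)}{g'(x)}\to{A}\), by the definition of the limit there must be a real number \(\delta\in(0,b-a)\) such that \(-\varepsilon<\frac{f'(x)}{g'(x)}-A<\varepsilon\) for all \(a<x<a+\delta\). In particular,

\[ \frac{f'(x)}{g'(x)}<A+\varepsilon. \]

By Cauchy's mean value theorem, for any \(a<x<y<a+\delta\) there exists \(t \in (x,y)\) such that inequality (A) holds:

\[ \frac{f(x)-f(y)}{g(x)-g(y)}=\frac{f'(t)}{g'(t)}<A+\varepsilon \tag{A} \]

The last inequality holds because \(t\in(x,y)\subset(a,a+\delta)\), and the previous inequality holds on \((a,a+\delta)\). Next, we discuss the cases separately according to the behavior of \(g(x)\) as \(x \to a\), and we will find that the results are similar.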

Case 1.1: \(g(x)\to 0\) and \(f(x) \to 0\)

Letting \(x \to a\) in inequality (A), we find that inequality (B) holds (the strict inequality becomes a weak one in the limit):

\[ \frac{0-f(y)}{0-g(y)}=\frac{f(y)}{g(y)} \leq A+\varepsilon<q \tag{B} \]

More formally, for any \(q>A\) and \(0<\varepsilon<q-A\), there exists \(\delta>0\) such that every \(a<y<a+\delta\) satisfies inequality (B):

\[ \frac{f(y)}{g(y)} \le A+\varepsilon<q. \]


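Case 1.2: \(g(x)\to+\infty\)

Fix \(y\) in inequality (A). Since \(g(x) \to +\infty\), there must be some \(c_1 \in (a,y)\) such that \(g(x)>g(y)\) and \(g(x)>0\) hold simultaneously for all \(a<x<c_1\). Multiplying both sides of inequality (A) by the positive number \([g(x)-g(y)]/g(x)\), we get

\[ \frac{f(x)-f(y)}{g(x)-g(y)}\frac{g(x)-g(y)}{g(x)}< (A+\varepsilon)\frac{g(x)-g(y)}{g(x)}. \]

Rearranging gives inequality (C):

\[ \frac{f(x)}{g(x)}<(A+\varepsilon)\left(1-\frac{g(y)}{g(x)}\right) +\frac{f(y)}{g(x)} \quad (a<x<c_1). \tag{C} \]

As \(x \to a\) with \(y\) fixed, both \(g(y)/g(x)\) and \(f(y)/g(x)\) tend to \(0\), so the right side of (C) tends to \(A+\varepsilon<q\). Hence, by the definition of \(g(x) \to +\infty\), there exists \(c_2 \in (a,c_1)\) such that inequality (D) holds:

\[ \frac{f(x)}{g(x)} < q \quad (a<x<c_2). \tag{D} \]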
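
To make inequality (C) less abstract, here it is for one concrete pair of functions, chosen only for illustration: \(f(x)=\ln x\), \(g(x)=1/x\) on \((0,1)\), with \(a=0\) and \(A=0\). For \(0<x<y\) we have \(g(y)/g(x)=x/y\) and \(f(y)/g(x)=x\ln y\), so (C) reads \[ x\ln x<\varepsilon\left(1-\frac{x}{y}\right)+x\ln y. \] With \(y\) fixed, the right side tends to \(\varepsilon\) as \(x \to 0\), so \(x\ln x\) eventually stays below any prescribed \(q>\varepsilon\). This is the upper half of the squeeze; the lower half comes from the symmetric argument.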


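Inequalities (B) and (D) both only show that there exists \(c\in(a,b)\) such that \(\frac{f(x)}{g(x)}<q\) for all \(x\in(a,c)\). The precise relation between \(\frac{f(x)}{g(x)}\) and \(A\) is not yet known. However, if \(A=-\infty\), then we have already proved that \(\frac{f(x)}{g(x)} \to -\infty\).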

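Case 2: \(-\infty<{A}\leq+\infty\)

This case is completely analogous to Case 1. By the same argument, for any \(p\) with \(p<A\), there is always some \(c'\in(a,b)\) such that \(p<\frac{f(x)}{g(x)}\) for all \(x\in(a,c')\). If \(A=+\infty\), then we have already proved that \(\frac{f(x)}{g(x)} \to +\infty\).

Setting aside \(A=\pm\infty\) and combining Cases 1 and 2, we find that for any \(p,q\) satisfying \(p<A<q\), if we take \(c_0=\min\{c,c'\}\), then for any \(x \in (a,c_0)\) we must have

\[ p<\frac{f(x)}{g(x)}<q. \]

This is exactly equivalent to \(\lim_{x \to a}\frac{f(x)}{g(x)}=A\). \(\square\)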




\[ f'(t)/g'(t)<A+\varepsilon. \]

This would mean that \(f'(t)/g'(t) \to A\) fails, contradicting the original assumption.