This blog post is intended to deliver a quick explanation of the
algebra of Borel measures on \(\mathbb{R}^n\). It will be broken into
pieces. The complex Borel measures on \(\mathbb{R}^n\), denoted \(M(\mathbb{R}^n)\), clearly form a vector
space over \(\mathbb{C}\). The main
goal of this post is to show that this is a Banach space and also a
Banach algebra.

In fact, the \(\mathbb{R}^n\) case
can be generalised to any locally compact abelian group (see any
book on abstract harmonic analysis), because what really matters
here is being locally compact and abelian. But for the moment we stick
to Euclidean spaces. Note that since \(\mathbb{R}^n\) is \(\sigma\)-compact, every complex Borel measure on it is
regular.

To read this post you need to be familiar with some basic properties
of Banach algebras, complex Borel measures, and, most importantly,
Fubini's theorem.

In this post, we study the concept of generalised functions (a.k.a. distributions), and see how to make sense of the derivative whether or not the function is differentiable.

Throughout we consider the Hilbert space \(L^2=L^2(\mathbb{R})\), the space of all
complex-valued functions with real variable such that \(f \in L^2\) if and only if \[
\lVert f \rVert_2^2=\int_{-\infty}^{\infty}|f(t)|^2dm(t)<\infty
\] where \(m\) denotes the
ordinary Lebesgue measure (in fact it's legitimate to consider Riemann
integral in this context).

For each \(t \geq 0\), we assign a
bounded linear operator \(Q(t)\) such
that \[
(Q(t)f)(s)=f(s+t).
\] This is indeed bounded since we have \(\lVert Q(t)f \rVert_2 = \lVert f \rVert_2\)
as the Lebesgue measure is translation-invariant. This is a left
translation operator with a single step \(t\).
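
As a quick check of the two facts used here, the substitution \(u=s+t\) spells out the isometry, and composing two translations shows that the family \((Q(t))_{t \geq 0}\) is closed under composition (the second identity is not stated above and is added only as an illustration): \[
\begin{aligned}
\lVert Q(t)f \rVert_2^2 &= \int_{-\infty}^{\infty}|f(s+t)|^2dm(s) = \int_{-\infty}^{\infty}|f(u)|^2dm(u) = \lVert f \rVert_2^2, \\
(Q(t_1)Q(t_2)f)(s) &= (Q(t_2)f)(s+t_1) = f(s+t_1+t_2) = (Q(t_1+t_2)f)(s).
\end{aligned}
\]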

Guided by research in function theory, operator theorists gave an
analogue of quasi-analytic classes. Let \(A\) be an operator in a Banach space \(X\). \(A\)
is not necessarily bounded, hence the domain \(D(A)\) is not necessarily the whole
space. We say \(x \in X\) is a \(C^\infty\) vector if \(x \in \bigcap_{n \geq 1}D(A^n)\). This is
quite intuitive if we consider the differential operator. A \(C^\infty\) vector \(x\) is
analytic if the series \[
\sum_{n=0}^{\infty}\lVert{A^n x}\rVert\frac{t^n}{n!}
\] has a positive radius of convergence. Finally, we say a \(C^\infty\) vector \(x\) is quasi-analytic for \(A\) provided that \[
\sum_{n=1}^{\infty}\left(\frac{1}{\lVert A^n x \rVert}\right)^{1/n} =
\infty
\] or equivalently its nondecreasing majorant. Interestingly, if
\(A\) is symmetric, then \(\lVert{A^nx}\rVert\) is log convex.

Based on the density of quasi-analytic vectors, we have an
interesting result.

(Theorem) Let \(A\)
be a symmetric operator in a Hilbert space \(\mathscr{H}\). If the set of quasi-analytic
vectors spans a dense subset, then \(A\) is essentially self-adjoint.

Suppose \(1 < p < \infty\) and
\(f \in L^p((0,\infty))\) (with respect
to Lebesgue measure of course) is a nonnegative function, take \[
F(x) = \frac{1}{x}\int_0^x f(t)dt \quad 0 < x <\infty,
\] we have Hardy's inequality \(\def\lrVert[#1]{\lVert #1 \rVert}\)\[
\lrVert[F]_p \leq q\lrVert[f]_p
\] where \(\frac{1}{p}+\frac{1}{q}=1\) of course.
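
To get a feeling for the inequality, here is a worked instance with a test function chosen purely for illustration (it is not used in the proofs below): \(f=\chi_{(0,1)}\), the indicator of the unit interval, so that \(\lVert f \rVert_p=1\). Then \[
F(x)=\begin{cases} 1, & 0<x \leq 1, \\ \frac{1}{x}, & x>1, \end{cases} \qquad \int_0^\infty F(x)^pdx = \int_0^1 dx+\int_1^\infty x^{-p}dx = 1+\frac{1}{p-1}=\frac{p}{p-1}=q.
\] Hence \(\lVert F \rVert_p=q^{1/p}\), which is indeed smaller than \(q\lVert f \rVert_p=q\) because \(q>1\).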

There are several ways to prove it. I think there are several good
reasons to write them down thoroughly, since that may be why you found
this page. Maybe you are burnt out because it was left as an exercise.
You are assumed to have enough knowledge of Lebesgue measure and
integration.

Minkowski's integral inequality

Let \(S_1,S_2 \subset \mathbb{R}\)
be two measurable sets and \(1 \leq p < \infty\). Suppose \(F:S_1 \times
S_2 \to \mathbb{R}\) is measurable, then \[
\left[\int_{S_2} \left\vert\int_{S_1}F(x,y)dx
\right\vert^pdy\right]^{\frac{1}{p}} \leq \int_{S_1} \left[\int_{S_2}
|F(x,y)|^p dy\right]^{\frac{1}{p}}dx.
\] A proof can be found here
by turning to Example A9. You may need to replace all measures with
Lebesgue measure \(m\).

Now let's get into it. The integrand here is \(G(x,t)=\frac{f(t)}{x}\);
after the substitution \(t=ux\) it becomes \((u,x) \mapsto f(ux)\) on \((0,1) \times (0,\infty)\).
If we put this function inside the inequality, we see \[
\begin{aligned}
\lrVert[F]_p &= \left[\int_0^\infty \left\vert \int_0^x
\frac{f(t)}{x}dt \right\vert^p dx\right]^{\frac{1}{p}} \\
&= \left[\int_0^\infty \left\vert \int_0^1 f(ux)du
\right\vert^p dx\right]^{\frac{1}{p}} \\
&\leq \int_0^1 \left[\int_0^\infty
|f(ux)|^pdx\right]^{\frac{1}{p}}du \\
&= \int_0^1 \left[\int_0^\infty
|f(ux)|^pudx\right]^{\frac{1}{p}}u^{-\frac{1}{p}}du \\
&= \lrVert[f]_p \int_0^1 u^{-\frac{1}{p}}du \\
&=q\lrVert[f]_p.
\end{aligned}
\] Note we have used change-of-variable twice and the inequality
once.
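
For the record, the two changes of variable and the last integral can be written out as follows (the letter \(y\) is introduced here only as a dummy variable): \[
\begin{aligned}
\int_0^x \frac{f(t)}{x}dt &= \int_0^1 f(ux)du \qquad (t=ux,\ dt=x\,du), \\
\int_0^\infty |f(ux)|^pu\,dx &= \int_0^\infty |f(y)|^pdy = \lVert f \rVert_p^p \qquad (y=ux,\ dy=u\,dx), \\
\int_0^1 u^{-\frac{1}{p}}du &= \frac{1}{1-\frac{1}{p}} = \frac{p}{p-1} = q.
\end{aligned}
\]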

A constructive approach

I have no idea how people came up with this solution. Take \(xF(x)=\int_0^x f(t)t^{u}t^{-u}dt\) where
\(0<u<1-\frac{1}{p}\). Hölder's
inequality gives us \[
\begin{aligned}
xF(x) &= \int_0^x f(t)t^ut^{-u}dt \\
&\leq \left[\int_0^x
t^{-uq}dt\right]^{\frac{1}{q}}\left[\int_0^xf(t)^pt^{up}dt\right]^{\frac{1}{p}}
\\
&=\left(\frac{1}{1-uq}x^{1-uq}\right)^{\frac{1}{q}}\left[\int_0^xf(t)^pt^{up}dt\right]^{\frac{1}{p}}
\end{aligned}
\] Hence \[
\begin{aligned}
F(x)^p & \leq
\frac{1}{x^p}\left\{\left(\frac{1}{1-uq}x^{1-uq}\right)^{\frac{1}{q}}\left[\int_0^xf(t)^pt^{up}dt\right]^{\frac{1}{p}}\right\}^{p}
\\
&=
\left(\frac{1}{1-uq}\right)^{\frac{p}{q}}x^{\frac{p}{q}(1-uq)-p}\int_0^x
f(t)^pt^{up}dt \\
&= \left(\frac{1}{1-uq}\right)^{p-1}x^{-up-1}\int_0^x f(t)^pt^{up}dt
\end{aligned}
\]

Note we have used the fact that \(\frac{1}{p}+\frac{1}{q}=1 \implies p+q=pq\)
and \(\frac{p}{q}=p-1\). Fubini's
theorem gives us the final answer: \[
\begin{aligned}
\int_0^\infty F(x)^pdx &\leq
\int_0^\infty\left[\left(\frac{1}{1-uq}\right)^{p-1}x^{-up-1}\int_0^x
f(t)^pt^{up}dt\right]dx \\
&=\left(\frac{1}{1-uq}\right)^{p-1}\int_0^\infty dx\int_0^x
f(t)^pt^{up}x^{-up-1}dt \\
&=\left(\frac{1}{1-uq}\right)^{p-1}\int_0^\infty dt\int_t^\infty
f(t)^pt^{up}x^{-up-1}dx \\
&=\left(\frac{1}{1-uq}\right)^{p-1}\frac{1}{up}\int_0^\infty
f(t)^pdt.
\end{aligned}
\] It remains to find the minimum of \(\varphi(u) =
\left(\frac{1}{1-uq}\right)^{p-1}\frac{1}{up}\). This is an
elementary calculus problem. By taking its derivative, we see when \(u=\frac{1}{pq}<1-\frac{1}{p}\) it
attains its minimum \(\left(\frac{p}{p-1}\right)^p=q^p\). Hence
we get \[
\int_0^\infty F(x)^pdx \leq q^p\int_0^\infty f(t)^pdt,
\] which is exactly what we want. Note that the constant \(q\) cannot be replaced with a smaller one.
So far we have only proved the case when \(f \geq
0\). For the general case, one simply needs to take the absolute
value.
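
The calculus step above can be carried out through the logarithm of \(\varphi\), which is one convenient route among several: \[
\begin{aligned}
\log\varphi(u) &= -(p-1)\log(1-uq)-\log u - \log p, \\
\frac{\varphi'(u)}{\varphi(u)} &= \frac{(p-1)q}{1-uq}-\frac{1}{u} = \frac{pqu-1}{u(1-uq)}, \\
\varphi\left(\frac{1}{pq}\right) &= \left(\frac{1}{1-\frac{1}{p}}\right)^{p-1}\cdot\frac{1}{\frac{1}{q}} = q^{p-1}\cdot q = q^p.
\end{aligned}
\] The derivative is negative for \(0<u<\frac{1}{pq}\) and positive for \(\frac{1}{pq}<u<\frac{1}{q}\), so the critical point is indeed the minimum.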

Integration by parts

This approach makes use of properties of \(L^p\) space. Still we assume that \(f \geq 0\) but we also assume \(f \in C_c((0,\infty))\), that is, \(f\) is continuous and has compact support.
Hence \(F\) is differentiable in this
situation. Integration by parts gives \[
\int_0^\infty F^p(x)dx=xF(x)^p\vert_0^\infty- \int_0^\infty xd(F^p) =
-p\int_0^\infty xF^{p-1}(x)F'(x)dx.
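\] Note that since \(f\) has compact
support, there is some \([a,b]\) with \(0 < a < b < \infty\) such
that \(f(x) >0\) only if \(a \leq x \leq b\). Then \(F(x)=0\) for \(x<a\) and \(F(x)=\frac{1}{x}\int_a^b f(t)dt\) for \(x>b\), so \(xF(x)^p\) is a constant multiple of \(x^{1-p}\) for large \(x\) and
hence \(xF(x)^p\vert_0^\infty=0\) because \(p>1\). Next
it is natural to take a look at \(F'(x)\). Note we have \[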
F'(x) = \frac{f(x)}{x}-\frac{\int_0^x f(t)dt}{x^2},
\] hence \(xF'(x)=f(x)-F(x)\). A substitution
gives us \[
\int_0^\infty F^p(x)dx = -p\int_0^\infty F^{p-1}(x)[f(x)-F(x)]dx,
\] which is equivalent to saying \[
\int_0^\infty F^p(x)dx = \frac{p}{p-1}\int_0^\infty F^{p-1}(x)f(x)dx.
\] Hölder's inequality gives us \[
\begin{aligned}
\int_0^\infty F^{p-1}(x)f(x)dx &\leq \left[\int_0^\infty
F^{(p-1)q}(x)dx\right]^{\frac{1}{q}}\left[\int_0^\infty
f(x)^pdx\right]^{\frac{1}{p}} \\
&=\left[\int_0^\infty
F^{p}(x)dx\right]^{\frac{1}{q}}\left[\int_0^\infty
f(x)^pdx\right]^{\frac{1}{p}}.
\end{aligned}
\] Together with the identity above we get \[
\int_0^\infty F^p(x)dx \leq q\left[\int_0^\infty
F^{p}(x)dx\right]^{\frac{1}{q}}\left[\int_0^\infty
f(x)^pdx\right]^{\frac{1}{p}}
\] which is exactly what we want since \(1-\frac{1}{q}=\frac{1}{p}\) and all we need
to do is divide both sides by \(\left[\int_0^\infty
F^pdx\right]^{1/q}\) (finite here, and the case where it vanishes is trivial). So what's next? Note \(C_c((0,\infty))\) is dense in \(L^p((0,\infty))\). For any \(f \in L^p((0,\infty))\), we can take a
sequence of functions \(f_n \in
C_c((0,\infty))\) such that \(f_n \to
f\) with respect to the \(L^p\)-norm. Taking \(F=\frac{1}{x}\int_0^x f(t)dt\) and \(F_n = \frac{1}{x}\int_0^x f_n(t)dt\), we
need to show that \(F_n \to F\)
pointwise, so that we can use Fatou's lemma. For \(\varepsilon>0\), there exists some \(N\) such that \(\lrVert[f_n-f]_p < \varepsilon\) whenever \(n \geq N\). Thus
\[
\begin{aligned}
|F_n(x)-F(x)| &= \frac{1}{x}\left\vert \int_0^x f_n(t)dt - \int_0^x
f(t)dt \right\vert \\
&\leq \frac{1}{x} \int_0^x |f_n(t)-f(t)|dt \\
&\leq \frac{1}{x}
\left[\int_0^x|f_n(t)-f(t)|^pdt\right]^{\frac{1}{p}}\left[\int_0^x
1^qdt\right]^{\frac{1}{q}} \\
&=\frac{1}{x^{1/p}}\left[\int_0^x|f_n(t)-f(t)|^pdt\right]^{\frac{1}{p}}
\\
&\leq \frac{1}{x^{1/p}}\lrVert[f_n-f]_p
<\frac{\varepsilon}{x^{1/p}}.
\end{aligned}
\] Hence \(F_n \to F\)
pointwise, which also implies that \(|F_n|^p
\to |F|^p\) pointwise. For \(|F_n|\) we have \[
\begin{aligned}
\int_0^\infty |F_n(x)|^pdx &= \int_0^\infty
\left\vert\frac{1}{x}\int_0^x f_n(t)dt\right\vert^p dx \\
&\leq \int_0^\infty \left[\frac{1}{x}\int_0^x
|f_n(t)|dt\right]^{p}dx \\
&\leq q^p\int_0^\infty |f_n(t)|^pdt
\end{aligned}
\] Note that the last inequality follows since we have already proved
it for \(f \geq 0\). By Fatou's lemma,
we have \[
\begin{aligned}
\int_0^\infty |F(x)|^pdx &= \int_0^\infty \lim_{n \to
\infty}|F_n(x)|^pdx \\
&\leq \liminf_{n \to \infty} \int_0^\infty |F_n(x)|^pdx \\
&\leq \lim_{n \to \infty}q^p\int_0^\infty |f_n(x)|^pdx \\
&=q^p\int_0^\infty |f(x)|^pdx.
\end{aligned}
\]

Throughout, let \((X,\mathfrak{M},\mu)\) be a measure space
where \(\mu\) is positive.

The question

If \(f\) is of \(L^p(\mu)\), which means \(\lVert f \rVert_p=\left(\int_X |f|^p
d\mu\right)^{1/p}<\infty\), or equivalently \(\int_X |f|^p d\mu<\infty\), then we may
say \(|f|^p\) is of \(L^1(\mu)\). In other words, we have a
function \[
\begin{aligned}
\lambda: L^p(\mu) &\to L^1(\mu) \\
f &\mapsto |f|^p.
\end{aligned}
\] This function does not have to be one to one due to the absolute
value. But we hope this function is nice enough; at the very
least, we hope it is continuous.

Here, \(f \sim g\) means that \(f-g\) equals \(0\) almost everywhere with respect to \(\mu\). It can be easily verified that this
is an equivalence relation.

Continuity

We still use the \(\varepsilon-\delta\) argument but it's in a
metric space. Suppose \((X,d_1)\) and
\((Y,d_2)\) are two metric spaces and
\(f:X \to Y\) is a function. We say
\(f\) is continuous at \(x_0 \in X\) if, for any \(\varepsilon>0\), there exists some \(\delta>0\) such that \(d_2(f(x_0),f(x))<\varepsilon\) whenever
\(d_1(x_0,x)<\delta\). Further, we
say \(f\) is continuous on \(X\) if \(f\) is continuous at every point \(x \in X\).

Metrics

For \(1\leq p<\infty\), we
already have a metric by \[
d(f,g)=\lVert f-g \rVert_p
\] given that \(d(f,g)=0\) if
and only if \(f \sim g\). This is
complete and makes \(L^p\) a Banach
space. But for \(0<p<1\) (yes we
are going to cover that), things are very different, and there is
one reason: Minkowski's inequality is reversed! In fact, for nonnegative \(f\) and \(g\) we have \[
\lVert f+g \rVert_p \geq \lVert f \rVert_p + \lVert g \rVert_p
\] when \(0<p<1\). The \(L^p\) space has many weird properties when
\(0<p<1\). Precisely,

For \(0<p<1\), \(L^p(\mu)\) is locally convex if and only if
\(\mu\) assumes finitely many values.
(Proof.)

On the other hand, if for example \(X=[0,1]\) and \(\mu=m\) is the Lebesgue measure, then \(L^p(\mu)\) has no open convex
subset other than \(\varnothing\) and
\(L^p(\mu)\) itself. However,

A topological vector space \(X\) is
normable if and only if its origin has a convex bounded neighbourhood.
(See Kolmogorov's normability criterion.)

Therefore \(L^p(m)\) is not
normable, hence not Banach.

We have gone too far. We need a metric that is fine enough.

Metric of \(L^p\) when \(0<p<1\)

Define \[
\Delta(f)=\int_X |f|^p d\mu
\] for \(f \in L^p(\mu)\). We
will show that we have a metric by \[
d(f,g)=\Delta(f-g).
\] Fix \(y\geq 0\), consider the
function \[
f(x)=(x+y)^p-x^p.
\] We have \(f(0)=y^p\) and
\[
f'(x)=p(x+y)^{p-1}-px^{p-1} \leq px^{p-1}-px^{p-1}=0
\] when \(x > 0\) and hence
\(f(x)\) is nonincreasing on \([0,\infty)\), which implies that \[
(x+y)^p \leq x^p+y^p.
\] Hence for any \(f\), \(g \in L^p\), we have \[
\Delta(f+g)=\int_X |f+g|^p d\mu \leq \int_X |f|^p d\mu + \int_X |g|^p
d\mu=\Delta(f)+\Delta(g).
\] This inequality ensures that \[
d(f,g)=\Delta(f-g)
\] is a metric. It's immediate that \(d(f,g)=d(g,f) \geq 0\) for all \(f\), \(g \in
L^p(\mu)\). For the triangle inequality, note that \[
d(f,h)+d(g,h)=\Delta(f-h)+\Delta(h-g) \geq
\Delta((f-h)+(h-g))=\Delta(f-g)=d(f,g).
\] This is translate-invariant as well since \[
d(f+h,g+h)=\Delta(f+h-g-h)=\Delta(f-g)=d(f,g)
\] The completeness can be verified in the same way as the case
when \(p \geq 1\). In fact, this metric
makes \(L^p\) a locally bounded
F-space.
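
A small example, with functions chosen here only for illustration, shows both phenomena at once. Take \(X=[0,1]\) with Lebesgue measure \(m\), \(f=\chi_{[0,1/2]}\) and \(g=\chi_{(1/2,1]}\), and let \(0<p<1\). Then \[
\begin{aligned}
\lVert f \rVert_p + \lVert g \rVert_p &= 2\cdot\left(\frac{1}{2}\right)^{1/p} = 2^{1-\frac{1}{p}} < 1 = \lVert f+g \rVert_p, \\
\Delta(f)+\Delta(g) &= \frac{1}{2}+\frac{1}{2} = 1 = \Delta(f+g).
\end{aligned}
\] So \(\lVert \cdot \rVert_p\) violates the triangle inequality, while \(\Delta\) satisfies it (here with equality).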

The continuity of \(\lambda\)

The metric of \(L^1\) is defined by
\[
d_1(f,g)=\lVert f-g \rVert_1=\int_X |f-g|d\mu.
\] We need to find a relation between \(d_p(f,g)\) and \(d_1(\lambda(f),\lambda(g))\), where \(d_p\) is the metric of the corresponding
\(L^p\) space.

\(0<p<1\)

As we have proved, \[
(x+y)^p \leq x^p+y^p.
\] Without loss of generality we assume \(x \geq y\) and therefore \[
x^p=(x-y+y)^p \leq (x-y)^p+y^p.
\] Hence \[
x^p-y^p \leq (x-y)^p.
\] By interchanging \(x\) and
\(y\), we get \[
|x^p-y^p| \leq |x-y|^p.
\] Replacing \(x\) and \(y\) with \(|f|\) and \(|g|\) where \(f\), \(g \in
L^p\), we get \[
\int_{X}\lvert |f|^p-|g|^p \rvert d\mu \leq \int_X \lvert |f|-|g| \rvert^p d\mu \leq \int_X |f-g|^p d\mu.
\] But \[
\begin{aligned}
d_1(\lambda(f),\lambda(g))&=\int_{X}\lvert |f|^p-|g|^p \rvert d\mu, \\
d_p(f,g)&=\Delta(f-g)=\int_X |f-g|^p d\mu,
\end{aligned}
\] and we therefore have \[
d_1(\lambda(f),\lambda(g)) \leq d_p(f,g).
\] Hence \(\lambda\) is
continuous (and in fact, Lipschitz continuous and uniformly continuous)
when \(0<p<1\).

\(1 \leq p < \infty\)

It's natural to think about Minkowski's inequality and Hölder's
inequality in this case since they are the key tools for \(L^p\) estimates. You
need to think about how to create the conditions to use
them and get a good result. In this section we first need to prove that, for \(x,y \geq 0\), \[
|x^p-y^p| \leq p|x-y|(x^{p-1}+y^{p-1}).
\] This inequality is surprisingly easy to prove however. We will
use nothing but the mean value theorem. Without loss of generality we
assume that \(x > y \geq 0\) and
define \(f(t)=t^p\). Then \[
\frac{f(x)-f(y)}{x-y}=f'(\zeta)=p\zeta^{p-1}
\] where \(y < \zeta <
x\). But since \(p-1 \geq 0\),
we see \(\zeta^{p-1} \leq x^{p-1}
\leq x^{p-1}+y^{p-1}\). Therefore \[
f(x)-f(y)=x^p-y^p=p(x-y)\zeta^{p-1} \leq p(x-y)(x^{p-1}+y^{p-1}).
\] For \(x=y\) both sides vanish, so the inequality
holds as well.

Therefore \[
\begin{aligned}
d_1(\lambda(f),\lambda(g)) &= \int_X \left||f|^p-|g|^p\right|d\mu \\
&\leq
\int_Xp\left||f|-|g|\right|(|f|^{p-1}+|g|^{p-1})d\mu
\end{aligned}
\] If \(p=1\), then \(d_1(\lambda(f),\lambda(g))=\int_X \left||f|-|g|\right|d\mu \leq \lVert f-g \rVert_1\) directly, so assume \(p>1\) and let \(q\) be the conjugate exponent. By Hölder's inequality, we have \[
\begin{aligned}
\int_X ||f|-|g||(|f|^{p-1}+|g|^{p-1})d\mu & \leq \left[\int_X
\left||f|-|g|\right|^pd\mu\right]^{1/p}\left[\int_X\left(|f|^{p-1}+|g|^{p-1}\right)^qd\mu\right]^{1/q}
\\
&\leq \left[\int_X
\left|f-g\right|^pd\mu\right]^{1/p}\left[\int_X\left(|f|^{p-1}+|g|^{p-1}\right)^qd\mu\right]^{1/q}
\\
&=\lVert f-g \rVert_p
\left[\int_X\left(|f|^{p-1}+|g|^{p-1}\right)^qd\mu\right]^{1/q}.
\end{aligned}
\] By Minkowski's inequality, we have \[
\left[\int_X\left(|f|^{p-1}+|g|^{p-1}\right)^qd\mu\right]^{1/q} \leq
\left[\int_X|f|^{(p-1)q}d\mu\right]^{1/q}+\left[\int_X
|g|^{(p-1)q}d\mu\right]^{1/q}
\] Now things are clear. Since \(1/p+1/q=1\), or equivalently \(1/q=(p-1)/p\), we have \((p-1)q=p\). Suppose \(\lVert f \rVert_p\), \(\lVert g \rVert_p \leq R\); then \[
\left[\int_X|f|^{(p-1)q}d\mu\right]^{1/q}+\left[\int_X
|g|^{(p-1)q}d\mu\right]^{1/q} = \lVert f \rVert_p^{p-1}+\lVert g
\rVert_p^{p-1} \leq 2R^{p-1}.
\] Summing the inequalities above, we get \[
\begin{aligned}
d_1(\lambda(f),\lambda(g)) \leq 2pR^{p-1}\lVert f-g \rVert_p
=2pR^{p-1}d_p(f,g)
\end{aligned}
\] hence \(\lambda\) is
continuous.
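
As an illustration, the case \(p=2\) can be done by hand without the mean value theorem, using \(|x^2-y^2|=|x-y|(x+y)\) and the Cauchy-Schwarz inequality (this is a special case worked out for illustration, with a slightly better constant than the general bound): \[
\begin{aligned}
d_1(\lambda(f),\lambda(g)) &= \int_X \left||f|-|g|\right|(|f|+|g|)d\mu \\
&\leq \lVert f-g \rVert_2 \, \lVert |f|+|g| \rVert_2 \\
&\leq \left(\lVert f \rVert_2+\lVert g \rVert_2\right)\lVert f-g \rVert_2 \leq 2R\,d_2(f,g).
\end{aligned}
\]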

Conclusion and further

We have proved that \(\lambda\) is
continuous, and when \(0<p<1\),
we have seen that \(\lambda\) is
Lipschitz continuous. It's natural to think about its differentiability
afterwards, but the absolute value function is not even differentiable
so we may have no chance. But this is still a fine enough result. For
example we have no restriction to \((X,\mathfrak{M},\mu)\) other than the
positivity of \(\mu\). Therefore we may
take \(\mathbb{R}^n\) as the Lebesgue
measure space here, or we can take something else.

It's also interesting how we use elementary Calculus to solve some
much more abstract problems.

(Before everything: elementary background in topology and vector
spaces, in particular Banach spaces, is assumed.)

A surprising result of Banach spaces

We can define several relations between two norms. Suppose we have a
topological vector space \(X\) and two
norms \(\lVert \cdot \rVert_1\) and
\(\lVert \cdot \rVert_2\). One says
\(\lVert \cdot \rVert_1\) is
weaker than \(\lVert \cdot
\rVert_2\) if there is \(K>0\) such that \(\lVert x \rVert_1 \leq K \lVert x
\rVert_2\) for all \(x \in X\).
Two norms are equivalent if each is weaker than the other
(trivially this is an equivalence relation). The idea of stronger and
weaker norms is related to the idea of the "finer" and "coarser"
topologies in the setting of topological spaces.
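
A standard example, not taken from this post: on \(C([0,1])\), compare the integral norm \(\lVert f \rVert_{L^1}=\int_0^1|f(t)|dt\) with the supremum norm \(\lVert f \rVert_\infty\). Then \[
\lVert f \rVert_{L^1} \leq \lVert f \rVert_\infty, \qquad f_n(t)=t^n: \quad \lVert f_n \rVert_{L^1}=\frac{1}{n+1} \to 0, \quad \lVert f_n \rVert_\infty = 1,
\] so the integral norm is weaker than the supremum norm but the two are not equivalent.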

So what about their limits? Unsurprisingly this can be verified with
an elementary \(\varepsilon-N\) argument.
Suppose \(\lVert \cdot \rVert_1\) is weaker than \(\lVert \cdot \rVert_2\) and \(\lVert x_n - x \rVert_2 \to
0\) as \(n \to \infty\). For every \(\varepsilon>0\) we
immediately have

\[
\lVert x_n - x \rVert_1 \leq K \lVert x_n-x \rVert_2 < K\varepsilon
\]

for all large enough \(n\). Hence
\(\lVert x_n - x \rVert_1 \to 0\) as
well. But what about the converse? We introduce another
relation between norms.

(Definition) Two norms \(\lVert \cdot \rVert_1\) and \(\lVert \cdot \rVert_2\) of a topological
vector space are compatible if given that \(\lVert x_n - x \rVert_1 \to 0\) and \(\lVert x_n - y \rVert_2 \to 0\) as \(n \to \infty\), we have \(x=y\).

By the uniqueness of limit, we see if two norms are equivalent, then
they are compatible. And surprisingly, with the help of the closed graph
theorem we will discuss in this post, we have

(Theorem 1) If \(\lVert
\cdot \rVert_1\) and \(\lVert \cdot
\rVert_2\) are compatible, and both \((X,\lVert\cdot\rVert_1)\) and \((X,\lVert\cdot\rVert_2)\) are Banach, then
\(\lVert\cdot\rVert_1\) and \(\lVert\cdot\rVert_2\) are equivalent.

This result looks natural but is not obviously easy to prove, since one
finds no direct way to build a bridge between limits and a general
inequality. But before that, we need to explain some
terminology.

Preliminaries

(Definition) For \(f:X \to
Y\), the graph of \(f\)
is defined by

\[
G(f)=\{(x,f(x)) \in X \times Y:x \in X\}.
\]

If both \(X\) and \(Y\) are topological spaces, and the
topology of \(X \times Y\) is the usual
one, that is, the smallest topology that contains all sets \(U \times V\) where \(U\) and \(V\) are open in \(X\) and \(Y\) respectively, and if \(f: X \to Y\) is continuous, it is natural
to expect \(G(f)\) to be closed. For
example, by taking \(f(x)=x\) and \(X=Y=\mathbb{R}\), one would expect the
diagonal line of the plane to be closed.

(Definition) A topological vector space \((X,\tau)\) is an \(F\)-space if \(\tau\) is induced by a complete invariant
metric \(d\). Here invariant means that
\(d(x+z,y+z)=d(x,y)\) for all \(x,y,z \in X\).

A Banach space is easily verified to be an \(F\)-space by defining \(d(x,y)=\lVert x-y \rVert\).

When \(G(f)\) is closed, we say \(f\) is
closed. For continuous functions, things are trivial.

(Proposition 2) If \(X\) and \(Y\) are two topological spaces and \(Y\) is Hausdorff, and \(f:X \to Y\) is continuous, then \(G(f)\) is closed.

Proof. Let \(G^c\) be the
complement of \(G(f)\) with respect to
\(X \times Y\). Fix \((x_0,y_0) \in G^c\); then \(y_0 \neq f(x_0)\). By the Hausdorff
property of \(Y\), there exist
open subsets \(U \subset Y\) and \(V \subset Y\) such that \(y_0 \in U\), \(f(x_0) \in V\) and \(U \cap V = \varnothing\). Since \(f\) is continuous, \(W=f^{-1}(V)\) is open in \(X\) and contains \(x_0\). We have obtained an open neighborhood \(W \times U\) of \((x_0,y_0)\) which has empty intersection
with \(G(f)\): if \(x \in W\) then \(f(x) \in V\), so \(f(x) \notin U\). This is to say, every
point of \(G^c\) has an open
neighborhood contained in \(G^c\),
hence is an interior point. Therefore \(G^c\) is open, which is to say that \(G(f)\) is closed. \(\square\)

REMARKS. For \(X \times
Y=\mathbb{R} \times \mathbb{R}\), we have a simple visualization.
For \(\varepsilon>0\), there exists
some \(\delta\) such that \(|f(x)-f(x_0)|<\varepsilon\) whenever
\(|x-x_0|<\delta\). For \(y_0 \neq f(x_0)\), pick \(\varepsilon\) such that \(0<\varepsilon<\frac{1}{2}|f(x_0)-y_0|\),
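we have two boxes (\(CDEF\) and \(GHJI\) on the picture), namely \[
B_1=(x_0-\delta,x_0+\delta)\times(f(x_0)-\varepsilon,f(x_0)+\varepsilon), \quad B_2=(x_0-\delta,x_0+\delta)\times(y_0-\varepsilon,y_0+\varepsilon).
\]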

In this case, \(B_2\) will not
intersect the graph of \(f\), hence
\((x_0,y_0)\) is an interior point of
\(G^c\).

The Hausdorff property of \(Y\) is
not removable. To see this, since \(X\)
has no restriction, it suffices to take a look at \(X \times X\). Let \(f\) be the identity map (which is
continuous), we see the graph

\[
G(f)=\{(x,x):x \in X\}
\]

is the diagonal. Suppose \(X\) is
not Hausdorff; we show that the diagonal is not closed. By definition, there exist
distinct \(x\) and \(y\) such that every neighborhood of \(x\) intersects every neighborhood of \(y\). Pick \((x,y)
\in G^c\); every basic neighborhood \(U \times V\) of \((x,y) \in X \times X\) contains a point \((z,z)\) with \(z \in U \cap V\), so \((x,y) \in G^c\) is not an interior
point of \(G^c\), hence \(G^c\) is not open.

Also, as an immediate consequence, every affine algebraic variety in
\(\mathbb{C}^n\) and \(\mathbb{R}^n\) is closed with respect to
Euclidean topology. Further, we have the Zariski topology \(\mathcal{Z}\) by claiming that, if \(V\) is an affine algebraic variety, then
\(V^c \in \mathcal{Z}\). It's worth
noting that \(\mathcal{Z}\) is
not Hausdorff (example?) and in fact much coarser than the
Euclidean topology although an affine algebraic variety is both closed
in the Zariski topology and the Euclidean topology.

The closed graph theorem

After we have proved this theorem, we are able to prove the theorem
about compatible norms. We shall assume that both \(X\) and \(Y\) are \(F\)-spaces, since the norm plays no
critical role here. This offers a greater variety but shall not be
considered as an abuse of abstraction.

(The Closed Graph Theorem) Suppose

\(X\) and \(Y\) are \(F\)-spaces,

\(f:X \to Y\) is
linear,

\(G(f)\) is closed in \(X \times Y\).

Then \(f\) is continuous.

In short, the closed graph theorem gives a sufficient condition to
claim the continuity of \(f\) (keep in
mind, linearity does not imply continuity). If \(f:X \to Y\) is continuous, then \(G(f)\) is closed; if \(G(f)\) is closed and \(f\) is linear, then \(f\) is continuous.

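Proof. First of all we should make \(X \times Y\) an \(F\)-space by assigning addition, scalar
multiplication and metric. Addition and scalar multiplication are
defined componentwise in the nature of things, and the metric is the sum of the metrics \(d_X\) and \(d_Y\) of the two factors (one standard choice):

\[
\begin{aligned}
(x_1,y_1)+(x_2,y_2)&=(x_1+x_2,y_1+y_2), \\
\alpha(x,y)&=(\alpha x,\alpha y), \\
d((x_1,y_1),(x_2,y_2))&=d_X(x_1,x_2)+d_Y(y_1,y_2).
\end{aligned}
\]

Then it can be verified that \(X \times
Y\) is a topological vector space with a complete translation-invariant metric.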
(Potentially the verifications will be added in the future but it's
recommended to do it yourself.)

Since \(f\) is linear, the graph
\(G(f)\) is a subspace of \(X \times Y\). Next we quote an elementary
result in point-set topology: a
subset of a complete metric space is closed if and only if it's
complete. Together with the translation-invariance of \(d\), we see \(G(f)\) is an \(F\)-space as well. Let \(p_1: X \times Y \to X\) and \(p_2: X \times Y \to Y\) be the natural
projections respectively (for example, \(p_1(x,y)=x\)). Our proof is done by
verifying the properties of \(p_1\) and
\(p_2\) on \(G(f)\).

For simplicity one can simply define \(p_1\) on \(G(f)\) instead of the whole space \(X \times Y\), but we make it a global
projection on purpose to emphasize the difference between global
properties and local properties. One can also write \(p_1|_{G(f)}\) to dodge confusion.

Claim 1. \(p_1\)
(with restriction on \(G(f)\)) defines
an isomorphism between \(G(f)\) and
\(X\).

For \(x \in X\), we see \(p_1(x,f(x)) = x\) (surjectivity). If \(p_1(x,f(x))=0\), we see \(x=0\) and therefore \((x,f(x))=(0,0)\), hence the restriction of
\(p_1\) on \(G\) has trivial kernel (injectivity).
Further, it's trivial that \(p_1\) is
linear.

Claim 2. \(p_1\) is
continuous on \(G(f)\).

Let \(((x_n,f(x_n)))\) be a sequence in \(G(f)\) converging
to \((x,y) \in X \times Y\). Since \(G(f)\) is closed, we
have \((x,y) \in G(f)\), that is, \(y=f(x)\).
Convergence in the product gives \(x_n \to x\), and therefore
\(\lim_{n \to \infty}p_1(x_n,f(x_n))
=x\). Meanwhile \(p_1(x,f(x))=x\). The continuity of \(p_1\) is proved.

Claim 3. \(p_1\) is
a homeomorphism with restriction on \(G(f)\).

We already know that \(G(f)\) is an
\(F\)-space, so is \(X\). For \(p_1\) we have \(p_1(G(f))=X\), which is of the second category
(since it's an \(F\)-space and \(p_1\) is onto), and \(p_1\) is continuous and linear on \(G(f)\). By the open mapping theorem, \(p_1\) is an open mapping on \(G(f)\), hence a homeomorphism,
being bijective as well.

Claim 4. \(p_2\) is
continuous.

This follows the same way as the proof of claim 2 but much easier
since there is no need to care about \(f\).

Now things are immediate once one realises that \(f=p_2 \circ p_1|_{G(f)}^{-1}\), which
implies that \(f\) is continuous. \(\square\)

Applications

Before we go for theorem 1 at the beginning, we drop an application
on Hilbert spaces.

Let \(T\) be a bounded operator on
the Hilbert space \(L_2([0,1])\) so
that if \(\phi \in L_2([0,1])\) is a
continuous function so is \(T\phi\).
Then the restriction of \(T\) to \(C([0,1])\) is a bounded operator of \(C([0,1])\).

We now turn to Theorem 1. Consider the map

\[
\begin{aligned}
f:(X,\lVert\cdot\rVert_1) &\to (X,\lVert\cdot\rVert_2) \\
x &\mapsto x
\end{aligned}
\]

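i.e. the identity map between two Banach spaces (hence \(F\)-spaces). Then \(f\) is linear. We need to prove that \(G(f)\) is closed. Suppose a
sequence \((x_n,f(x_n))=(x_n,x_n)\) in \(G(f)\) converges to \((x,y)\), that is, \(\lVert x_n-x \rVert_1 \to 0\) and \(\lVert x_n-y \rVert_2 \to 0\). Since the two norms are compatible, \(x=y\), so \((x,y)=(x,f(x)) \in G(f)\).

Hence \(G(f)\) is closed. Therefore
\(f\) is continuous, hence bounded, so we
have some \(K\) such that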

\[
\lVert x \rVert_2 =\lVert f(x) \rVert_2 \leq K \lVert x \rVert_1.
\]

By defining

\[
\begin{aligned}
g:(X,\lVert\cdot\rVert_2) &\to (X,\lVert\cdot\rVert_1) \\
x &\mapsto x
\end{aligned}
\]

we see \(g\) is continuous as well,
hence we have some \(K'\) such
that

\[
\lVert x \rVert_1 =\lVert g(x) \rVert_1 \leq K'\lVert x \rVert_2.
\]

Hence each of the two norms is weaker than the other, that is, they are equivalent.

The series

Since there is no strong reason to write more posts on this topic,
i.e. the three fundamental theorems of linear functional analysis, I
think it's time to make a list of the series. It's been around half a
year.

(Gleason-Kahane-Żelazko) If \(\phi\) is a complex linear functional on a
unitary Banach algebra \(A\), such that
\(\phi(e)=1\) and \(\phi(x) \neq 0\) for every invertible \(x \in A\), then \[
\phi(xy)=\phi(x)\phi(y)
\] Namely, \(\phi\) is a complex
homomorphism.

Notations and remarks

Suppose \(A\) is a complex unitary
Banach algebra and \(\phi: A \to
\mathbb{C}\) is a linear functional which is not identically
\(0\) (for convenience), and if \[
\phi(xy)=\phi(x)\phi(y)
\] for all \(x \in A\) and \(y \in A\), then \(\phi\) is called a complex
homomorphism on \(A\). Note that a
unitary Banach algebra (with \(e\) as
multiplicative unit) is also a ring, and so is \(\mathbb{C}\), so we may say in this case \(\phi\) is a ring-homomorphism. For such
\(\phi\), we have an instant
proposition:

Proposition 0. \(\phi(e)=1\) and \(\phi(x) \neq 0\) for every invertible \(x \in A\).

Proof. Since \(\phi(e)=\phi(ee)=\phi(e)\phi(e)\), we have
\(\phi(e)=0\) or \(\phi(e)=1\). If \(\phi(e)=0\) however, for any \(y \in A\), we have \(\phi(y)=\phi(ye)=\phi(y)\phi(e)=0\), which
is an excluded case. Hence \(\phi(e)=1\).

For invertible \(x \in A\), note
that \(\phi(xx^{-1})=\phi(x)\phi(x^{-1})=\phi(e)=1\).
This can't happen if \(\phi(x)=0\).
\(\square\)

The theorem reveals that Proposition \(0\) actually characterizes the complex
homomorphisms (ring-homomorphisms) among the linear functionals
(group-homomorphisms).

This theorem was proved by Andrew M. Gleason in 1967 and later
independently by J.-P. Kahane and W. Żelazko in 1968. Both of them
worked mainly on commutative Banach algebras, and the non-commutative
version, which focused on complex homomorphism, was by W. Żelazko. In
this post we will follow the third one.

Unfortunately, one cannot find an educational proof on the Internet
with ease, which may be the reason why I write this post and why you
read this.

Equivalences

Following definitions of Banach algebra and some logic manipulation,
we have several equivalences worth noting.

Subspace and ideal version

(Stated by Gleason) Let \(M\) be a linear subspace of codimension one
in a commutative Banach algebra \(A\)
having an identity. Suppose no element of \(M\) is invertible, then \(M\) is an ideal.

(Stated by Kahane and Żelazko) A subspace \(X \subset A\) of codimension \(1\) is a maximal ideal if and only if it
consists of non-invertible elements.

Spectrum version

(Stated by Kahane and Żelazko) Let \(A\) be a commutative complex Banach algebra
with unit element. Then a functional \(f \in
A^\ast\) is a multiplicative linear functional if and only if
\(f(x) \in \sigma(x)\) holds for all \(x \in A\).

Here \(\sigma(x)\) denotes the
spectrum of \(x\).

The connection

Clearly a maximal ideal contains no invertible element (otherwise
it contains \(e\), and then it's the ring
itself). So it suffices to show that a subspace of codimension 1 which
consists of non-invertible elements is a maximal ideal. Also note that every maximal ideal
is the kernel of some complex homomorphism. For such a subspace \(X \subset A\), since \(e \notin X\), we may define a linear functional \(\phi\) with kernel \(X\) so that \(\phi(e)=1\), and then \(\phi(x) \in \sigma(x)\) for all \(x \in A\). Note that \(\phi(e)=1\) and \(\phi(x) \neq 0\) for every invertible \(x\) hold if and only if \(\phi(x) \in \sigma(x)\) for all \(x \in A\). As we will show,
\(\phi\) has to be a complex
homomorphism.
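
The equivalence between the hypotheses of the theorem and the spectral condition can be sketched in two lines (this is a routine check, spelled out here for convenience). If \(\phi(e)=1\) and \(\phi\) does not vanish on invertible elements, then \[
\phi(\phi(x)e-x)=\phi(x)\phi(e)-\phi(x)=0,
\] so \(\phi(x)e-x\) is not invertible, i.e. \(\phi(x) \in \sigma(x)\). Conversely, if \(\phi(x) \in \sigma(x)\) for all \(x \in A\), then \[
\phi(e) \in \sigma(e)=\{1\} \quad \text{and} \quad 0 \notin \sigma(x) \implies \phi(x) \neq 0 \text{ for invertible } x.
\]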

Tools to prove the theorem

Lemma 0 Suppose \(A\) is a unitary Banach algebra, \(x \in A\), \(\lVert x \rVert<1\), then \(e-x\) is invertible.

This lemma can be found in any functional analysis book introducing
Banach algebra.
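
A sketch of the usual argument, via the geometric (Neumann) series, which converges absolutely because \(\lVert x^n \rVert \leq \lVert x \rVert^n\) and \(A\) is complete: \[
s=\sum_{n=0}^{\infty}x^n, \qquad (e-x)\sum_{n=0}^{N}x^n=\sum_{n=0}^{N}x^n(e-x)=e-x^{N+1} \to e \quad (N \to \infty),
\] hence \((e-x)s=s(e-x)=e\) and \((e-x)^{-1}=s\).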

Lemma 1 Suppose \(f\) is an entire function of one complex
variable, \(f(0)=1\), \(f'(0)=0\), and \[
0<|f(\lambda)| \leq e^{|\lambda|}
\] for all complex \(\lambda\),
then \(f(\lambda)=1\) for all \(\lambda \in \mathbb{C}\).

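Since \(f\) has no zero, there is an entire function \(g\) such that \(f=\exp(g)\), and we may choose it so that \(g(0)=g'(0)=0\) because \(f(0)=1\) and \(f'(0)=0\). The bound on \(|f|\) gives \(\mathrm{Re}\,g(\lambda) \leq |\lambda|\). It can be shown that \(g=0\). Indeed, if we put \[
h_r(\lambda) = \frac{r^2g(\lambda)}{\lambda^2[2r-g(\lambda)]}
\] then we see \(h_r\) is
holomorphic in the open disk centred at \(0\) with radius \(2r\), since \(\mathrm{Re}\,g<2r\) there and \(g\) has a zero of order at least \(2\) at \(0\). Besides, \(|h_r(\lambda)| \leq 1\) if \(|\lambda|=r\), because \(\mathrm{Re}\,g(\lambda) \leq r\) implies \(|g(\lambda)| \leq |2r-g(\lambda)|\). By the maximum modulus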
theorem, we have \[
|h_r(\lambda)| \leq 1
\] whenever \(|\lambda| \leq
r\). Fix \(\lambda\) and let
\(r \to \infty\), by definition of
\(h_r(\lambda)\), we must have \(g(\lambda)=0\).

Jordan homomorphism

A mapping \(\phi\) from one ring
\(R\) to another ring \(R'\) is said to be a Jordan
homomorphism from \(R\) to
\(R'\) if \[
\phi(a+b)=\phi(a)+\phi(b)
\] and \[
\phi(ab+ba)=\phi(a)\phi(b)+\phi(b)\phi(a).
\] It's of course clear that every homomorphism is Jordan. Note
if \(R'\) is not of characteristic
\(2\), the second identity is
equivalent to \[
\phi(a^2)=\phi(a)^2.
\] To show the equivalence, one lets \(b=a\) in the first case and puts \(a+b\) in place of \(a\) in the second case.

Since in this case \(R=A\) and \(R'=\mathbb{C}\), the latter of which is
commutative, we also write \[
\phi(ab+ba)=2\phi(a)\phi(b).
\] As we will show, the \(\phi\)
in the theorem is a Jordan homomorphism.
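
The equivalence of the two forms of the Jordan identity can be written out as follows, as a sketch for an additive map \(\phi\): \[
\begin{aligned}
b=a:&\quad \phi(2a^2)=2\phi(a)^2, \text{ i.e. } 2[\phi(a^2)-\phi(a)^2]=0, \\
a \to a+b:&\quad \phi(a^2)+\phi(ab+ba)+\phi(b^2)=\phi((a+b)^2)=\phi(a)^2+\phi(a)\phi(b)+\phi(b)\phi(a)+\phi(b)^2.
\end{aligned}
\] In the first line the factor \(2\) can be cancelled when \(R'=\mathbb{C}\), the case of interest here; in the second, subtracting \(\phi(a^2)=\phi(a)^2\) and \(\phi(b^2)=\phi(b)^2\) leaves \(\phi(ab+ba)=\phi(a)\phi(b)+\phi(b)\phi(a)\).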

The proof

We will follow an unusual approach. By repeatedly 'downgrading' the goal,
one will see this algebraic problem transformed neatly into a pure analysis
problem.

To begin with, let \(N\) be the
kernel of \(\phi\).

Step 1 - It suffices to prove that \(\phi\) is a Jordan homomorphism

If \(\phi\) is a complex
homomorphism, it is immediate that \(\phi\) is a Jordan homomorphism.
Conversely, if \(\phi\) is Jordan, we
have \[
\phi(xy+yx) =2\phi(x)\phi(y).
\] If \(x\in N\), the right-hand side
becomes \(0\), and therefore \[
xy+yx \in N \quad \text{if } x \in N, y \in A.
\] Consider the identity \[
(xy-yx)^2+(xy+yx)^2=2[x(yxy)+(yxy)x]
\]

Therefore \[
\begin{aligned}
\phi((xy-yx)^2+(xy+yx)^2)&=\phi((xy-yx)^2)+\phi((xy+yx)^2) \\
&=\phi(xy-yx)^2+\phi(xy+yx)^2 \\
&= \phi(xy-yx)^2 \\
&=2\phi[x(yxy)+(yxy)x] \\
&=0
\end{aligned}
\] Since \(x \in N\) and \(yxy \in A\), we see \(x(yxy)+(yxy)x \in N\). Therefore \(\phi(xy-yx)=0\) and \[
xy-yx \in N
\] if \(x \in N\) and \(y \in A\). Further we see \[
xy-yx+xy+yx=2xy \in N \quad \text {and}\quad xy+yx-xy+yx = 2yx \in N,
\] which implies that \(N\) is
an ideal. This may remind you of this classic diagram (we will not use
it since it is additive though):

For \(x,y \in A\), we have \(x \in \phi(x)e+N\) and \(y \in \phi(y)e+N\). As a result, \(xy \in \phi(x)\phi(y)e+N\), and therefore
\[
\phi(xy)=\phi(x)\phi(y)+0.
\]
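
The last step can be checked by hand. As a sketch, write \(x=\phi(x)e+a\) and \(y=\phi(y)e+b\), where \(a=x-\phi(x)e\) and \(b=y-\phi(y)e\) are names introduced here for the components in \(N\). Then \[
xy = \phi(x)\phi(y)e+\phi(x)b+\phi(y)a+ab,
\] and the last three terms lie in \(N\) because \(N\) is a subspace and an ideal, so applying \(\phi\) leaves only \(\phi(x)\phi(y)\phi(e)=\phi(x)\phi(y)\).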

Step 2 - It suffices to prove that \(\phi(a^2)=0\) if \(\phi(a)=0\).

Again, if \(\phi\) is Jordan, we
have \(\phi(x^2)=\phi(x)^2\) for all
\(x \in A\). Conversely, if \(\phi(a^2)=0\) for all \(a \in N\), then every \(x \in A\) can be written as \[
x=\phi(x)e+a
\] with \(a=x-\phi(x)e \in N\). Therefore \[
\begin{aligned}
\phi(x^2)&=\phi((\phi(x)e+a)^2)=\phi(x)^2+2\phi(x)\phi(a)+\phi(a^2)=\phi(x)^2,
\end{aligned}
\] which shows that \(\phi\) is Jordan.

Step 3 - It suffices to show that the following function is constant

Fix \(a \in N\), assume \(\lVert a \rVert = 1\) without loss of
generality, and define \[
f(\lambda)=\sum_{n=0}^{\infty}\frac{\phi(a^n)}{n!}\lambda^n
\] for all complex \(\lambda\).
If this function is constant (which is what lemma 1 will give), then comparing coefficients yields \(f''(0)=\phi(a^2)=0\). Showing that \(f\) is constant is
now purely a complex analysis problem.
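
For the record, the derivatives of \(f\) at the origin can be read off from the series, assuming it converges for all \(\lambda\) (this is checked in the next step): \[
f^{(k)}(0)=k!\cdot\frac{\phi(a^k)}{k!}=\phi(a^k), \qquad \text{so } f(0)=\phi(e)=1,\quad f'(0)=\phi(a)=0,\quad f''(0)=\phi(a^2).
\] In particular, a constant \(f\) forces \(\phi(a^2)=f''(0)=0\).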

Step 4 - It suffices to describe the behaviour of an entire function

Note in the definition of \(f\), we
have \[
\lvert \phi(a^n) \rvert \leq \lVert \phi \rVert \lVert a^n \rVert \leq
\lVert \phi \rVert \lVert a \rVert^n=\lVert \phi \rVert.
\] So once we know that the norm of \(\phi\) is finite, \(f\) is entire. Suppose, for contradiction, that \(\lVert e-a \rVert <
1\) for some \(a \in N\). By lemma 0,
\(e-(e-a)=a\) is invertible,
which is impossible, since \(\phi(a)=0\) would then lie outside \(\sigma(a)\). Hence \(\lVert e-a \rVert
\geq 1\) for all \(a \in N\). On
the other hand, for \(\lambda \in
\mathbb{C}\) with \(\lambda \neq 0\), we have the following inequality: \[
\begin{aligned}
\lVert \lambda e-a \rVert = |\lambda|\lVert e-\lambda^{-1}a \rVert
&\geq|\lambda| \\
&= |\phi(\lambda e)-\phi(a)| \\
&= |\phi(\lambda e-a)|
\end{aligned}
\] (the case \(\lambda=0\) is trivial). Since every \(x \in A\) has the form \(\lambda e-a\) with \(\lambda=\phi(x)\) and \(a \in N\), \(\phi\) is
continuous with norm at most \(1\). The continuity of \(\phi\) is not assumed at the beginning but
proved here.
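
To see how the inequality gives the bound on \(\phi\), here is a short sketch. For an arbitrary \(x \in A\), put \(\lambda=\phi(x)\) and \(a=\lambda e-x\) (notation chosen here for illustration). Then \(\phi(a)=\lambda\phi(e)-\phi(x)=0\), so \(a \in N\), and if \(\lambda \neq 0\), \[
\lVert x \rVert=\lVert \lambda e-a \rVert = |\lambda|\,\lVert e-\lambda^{-1}a \rVert \geq |\lambda| = |\phi(x)|,
\] using \(\lambda^{-1}a \in N\) and \(\lVert e-b \rVert \geq 1\) for all \(b \in N\). When \(\phi(x)=0\) the bound \(|\phi(x)| \leq \lVert x \rVert\) is trivial.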

For \(f\) we have some immediate
facts. Since \(|\phi(a^n)| \leq \lVert \phi \rVert\) for every \(n\), the series converges for every \(\lambda\), so \(f\) is entire with \(f(0)=\phi(e)=1\) and \(f'(0)=\phi(a)=0\). Also, since \(\lVert \phi \rVert \leq 1\) and \(\lVert a \rVert=1\), we have \[
|f(\lambda)|=\left|\sum_{n=0}^{\infty}\frac{\phi(a^n)}{n!}\lambda^n\right|
\leq \sum_{n=0}^{\infty}\frac{|\lambda^n|}{n!}=e^{|\lambda|}.
\] All we need in the end is to show that \(f(\lambda) \neq 0\) for all \(\lambda \in \mathbb{C}\).

The series \[
E(\lambda)=\exp(a\lambda)=\sum_{n=0}^{\infty}\frac{(\lambda a)^n}{n!}
\] converges since \(\lVert a
\rVert=1\). The continuity of \(\phi\) shows now \[
f(\lambda)=\phi(E(\lambda)).
\] Note \[
E(-\lambda)E(\lambda)=\left(\sum_{n=0}^{\infty}\frac{(-\lambda
a)^n}{n!}\right)\left(\sum_{n=0}^{\infty}\frac{(\lambda
a)^n}{n!}\right)=e.
\] Hence \(E(\lambda)\) is invertible for all \(\lambda \in
\mathbb{C}\), so \(0 \notin \sigma(E(\lambda))\) and therefore \(f(\lambda)=\phi(E(\lambda)) \neq 0\). By
lemma 1, \(f(\lambda)=1\) for all \(\lambda\), that is, \(f\) is constant.
The proof is completed by reversing the steps. \(\square\)
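
The identity \(E(-\lambda)E(\lambda)=e\) can be verified with the Cauchy product. The following is a sketch that relies on the absolute convergence of both series and on the fact that powers of \(a\) commute with each other: \[
\begin{aligned}
E(-\lambda)E(\lambda) &= \sum_{n=0}^{\infty}\left(\sum_{k=0}^{n}\frac{(-1)^k}{k!\,(n-k)!}\right)(\lambda a)^n \\
&= \sum_{n=0}^{\infty}\frac{(1-1)^n}{n!}(\lambda a)^n = e,
\end{aligned}
\] since only the \(n=0\) term survives.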

References / Further reading

Walter Rudin, Real and Complex Analysis

Walter Rudin, Functional Analysis

Andrew M. Gleason, A Characterization of Maximal
Ideals

J.-P. Kahane and W. Żelazko, A Characterization of Maximal
Ideals in Commutative Banach Algebras

W. Żelazko, A Characterization of Multiplicative Linear
Functionals in Complex Banach Algebras

The Hahn-Banach theorem has been a central tool in functional
analysis and therefore comes in a wide variety of versions, many of which have
numerous uses in other fields of mathematics. It's not
possible to cover all of them. In this post we cover two
'abstract enough' results, which are sometimes called the dominated
extension theorems. The first will be stated for a real vector space and the second
for a real or complex one, and no topology is assumed on either. This allows us to apply them in any topological
vector space.

Another interesting thing is that we will be using the axiom of choice, or
whichever equivalent statement you prefer, for example Zorn's lemma or the
well-ordering principle. Before anything else, we need to examine more
properties of vector spaces.

Vector space

It's obvious that every complex vector space is also a real vector
space. Suppose \(X\) is a complex
vector space, and we shall give the definition of real-linear and
complex-linear functionals.

An additive functional \(\Lambda\)
on \(X\) is called real-linear
(complex-linear) if \(\Lambda(\alpha
x)=\alpha\Lambda(x)\) for every \(x \in
X\) and for every real (complex) scalar \(\alpha\).

For *-linear functionals, we have two important but easy
theorems.

If \(u\) is the real part of a
complex-linear functional \(f\) on
\(X\), then \(u\) is real-linear and \[
f(x)=u(x)-iu(ix) \quad (x \in X).
\]

Proof. Write \(f(x)=u(x)+iv(x)\) with \(u\) and \(v\) real-valued; it suffices to express
\(v(x)\) in terms of \(u\). Since \[
if(x)=iu(x)-v(x),
\] we see \(\Im(f(x))=v(x)=-\Re(if(x))\). Therefore
\[
f(x)=u(x)-i\Re(if(x))=u(x)-i\Re(f(ix)),
\] and since \(\Re(f(ix))=u(ix)\), we
get \[
f(x)=u(x)-iu(ix).
\] To show that \(u\) is
real-linear, note that \[
f(x+y)=u(x+y)+iv(x+y)=f(x)+f(y)=u(x)+u(y)+i(v(x)+v(y)).
\] Comparing real parts gives \(u(x)+u(y)=u(x+y)\). A similar argument
applies to real scalars \(\alpha\).
\(\square\)

Conversely, we are able to generate a complex-linear functional by a
real one.

If \(u\) is a real-linear
functional, then \(f(x)=u(x)-iu(ix)\)
is a complex-linear functional.

Proof. Direct computation. \(\square\)
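
The computation is short enough to sketch. Additivity of \(f\) and homogeneity for real scalars are inherited from \(u\), so the only point to check is multiplication by \(i\): \[
\begin{aligned}
f(ix) &= u(ix)-iu(i\cdot ix) = u(ix)-iu(-x) \\
&= u(ix)+iu(x) = i\,[u(x)-iu(ix)] = if(x).
\end{aligned}
\] Combining this with real homogeneity gives \(f((\alpha+i\beta)x)=\alpha f(x)+\beta f(ix)=(\alpha+i\beta)f(x)\) for real \(\alpha,\beta\).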

Suppose now \(X\) is a complex
topological vector space. Then a complex-linear functional on \(X\) is continuous if and only if its real
part is continuous, and every continuous real-linear \(u: X \to \mathbb{R}\) is the real part of a
unique complex-linear continuous functional \(f\).

Sublinear, seminorm

A sublinear functional is 'almost' linear but also 'almost' a norm.
Explicitly, we call \(p: X \to
\mathbb{R}\) a sublinear functional when it satisfies \[
\begin{aligned}
p(x+y) &\leq p(x)+p(y) \\
p(tx) &= tp(x) \\
\end{aligned}
\] for all \(x,y \in X\) and all \(t \geq 0\). As one
can see, if \(X\) is normable, then
\(p(x)=\lVert x \rVert\) is a sublinear
functional. This should not be confused with a semilinear functional, where
no inequality is involved. Another thing worth noting is that \(p\) is not required to be
nonnegative.
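
As an illustration (an example chosen here, not taken from the discussion above), consider \(X=\mathbb{R}^n\) and \(p(x)=\max_{1 \leq i \leq n}x_i\). Then \[
\begin{aligned}
p(x+y) &= \max_i(x_i+y_i) \leq \max_i x_i+\max_i y_i = p(x)+p(y), \\
p(tx) &= \max_i(tx_i) = t\max_i x_i = tp(x) \quad (t \geq 0),
\end{aligned}
\] so \(p\) is subadditive and positively homogeneous, yet \(p(-1,\dots,-1)=-1<0\). It is not a norm, and \(p(-x) \neq p(x)\) in general.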

A seminorm on a vector space \(X\)
is a real-valued function \(p\) on
\(X\) such that \[
\begin{aligned}
p(x+y) &\leq p(x)+p(y) \\
p(\alpha x)&=|\alpha|p(x)
\end{aligned}
\] for all \(x,y \in X\) and
scalar \(\alpha\).

Obviously a seminorm is also a sublinear functional. For the
connection between norm and seminorm, one shall note that \(p\) is a norm if and only if it satisfies
\(p(x) \neq 0\) if \(x \neq 0\).

Dominated extension theorems

These are the results that will be covered in this post. Generally speaking, we
are able to extend a functional defined on a subspace to the whole space
as long as it's dominated by a sublinear functional. This is reminiscent of
the dominated convergence theorem, which states that if a convergent
sequence of measurable functions is dominated by an integrable function, then
the limit may be taken under the integral sign.

(Hahn-Banach) Suppose

\(M\) is a subspace of a real
vector space \(X\),

\(f: M \to \mathbb{R}\) is linear
and \(f(x) \leq p(x)\) on \(M\) where \(p\) is a sublinear functional on \(X\)

Then there exists a linear \(\Lambda: X \to
\mathbb{R}\) such that \[
\Lambda(x)=f(x)
\] for all \(x \in M\) and \[
-p(-x) \leq \Lambda(x) \leq p(x)
\] for all \(x \in X\).

Step 1 - Extending the function by one dimension

In other words, if \(f\) is
dominated by a sublinear functional \(p\) on \(M\), then we are able to extend this
functional to the whole space while keeping it between \(-p(-x)\) and \(p(x)\).

Proof. If \(M=X\) we have
nothing to do. So suppose now \(M\) is
a nontrivial proper subspace of \(X\).
Choose \(x_1 \in X-M\) and define \[
M_1=\{x+tx_1:x \in M,t \in \mathbb{R}\}.
\] It's easy to verify that \(M_1\) satisfies all axioms of vector space
(warning again: no topology is endowed). Now we will be using the
properties of sublinear functionals.

Since \[
f(x)+f(y)=f(x+y) \leq p(x+y) \leq p(x-x_1)+p(x_1+y)
\] for all \(x,y \in M\), we
have \[
f(x)-p(x-x_1) \leq p(x_1+y) -f(y).
\] Let \[
\alpha=\sup_{x}\{f(x)-p(x-x_1):x \in M\}.
\] The previous inequality shows that \(\alpha\) is finite and that \(\alpha \leq p(x_1+y)-f(y)\) for every \(y \in M\). Hence we get \[
f(x)-\alpha \leq p(x-x_1)
\] and \[
f(y)+\alpha \leq p(x_1+y).
\] Define \(f_1\) on \(M_1\) by \[
f_1(x+tx_1)=f(x)+t\alpha.
\] So when \(x +tx_1 \in M\), we
have \(t=0\), and therefore \(f_1=f\).

To show that \(f_1 \leq p\) on \(M_1\), note that for \(t>0\), we have \[
f(x/t)-\alpha \leq p(x/t-x_1),
\] which implies \[
f(x)-t\alpha=f_1(x-tx_1)\leq p(x-tx_1).
\] Similarly, \[
f(y/t)+\alpha \leq p(y/t+x_1),
\] and therefore \[
f(y)+t\alpha=f_1(y+tx_1) \leq p(y+tx_1).
\] Hence \(f_1 \leq p\).
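
The passage from the inequalities for \(x/t\) to those for \(x\) uses only positive homogeneity. As a sketch for \(t>0\), multiplying \(f(x/t)-\alpha \leq p(x/t-x_1)\) by \(t\) gives \[
\begin{aligned}
f_1(x-tx_1) = f(x)-t\alpha &= t\,[f(x/t)-\alpha] \\
&\leq t\,p(x/t-x_1) = p(x-tx_1),
\end{aligned}
\] and the same computation with \(f(y/t)+\alpha \leq p(y/t+x_1)\) handles \(y+tx_1\). Together with \(f_1=f \leq p\) for \(t=0\), this covers every element \(x+sx_1\) of \(M_1\), since \(s\) is negative, zero, or positive.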

Step 2 - An application of Zorn's lemma

Side note: Why Zorn's lemma

Step 1 can be applied again and again to extend \(M\) to a larger space, and the process may never terminate, yet we still have to
reach all of \(X\). (If \(X\) is a finite
dimensional space, then this is merely a linear algebra problem.) This
is exactly the situation William Timothy Gowers described in his blog post:

If you are building a mathematical object in stages and find that (i)
you have not finished even after infinitely many stages, and (ii) there
seems to be nothing to stop you continuing to build, then Zorn’s lemma
may well be able to help you.

-- How to use Zorn's lemma

And we will show that, as W. T. Gowers said,

If the resulting partial order satisfies the chain condition and if a
maximal element must be a structure of the kind one is trying to build,
then the proof is complete.

To apply Zorn's lemma, we need to construct a partially ordered set.
Let \(\mathscr{P}\) be the collection
of all ordered pairs \((M',f')\) where \(M'\) is a subspace of \(X\) containing \(M\) and \(f'\) is a linear functional on \(M'\) that extends \(f\) and satisfies \(f' \leq p\) on \(M'\). For example we have \[
(M,f) , (M_1,f_1) \in \mathscr{P}.
\] The partial order \(\leq\) is
defined as follows. By \((M',f') \leq
(M'',f'')\), we mean \(M' \subset M''\) and \(f' = f''\) on \(M'\). Obviously this is a partial order
(you should be able to check this).

Suppose now \(\mathcal{F}\) is a
chain (totally ordered subset of \(\mathscr{P}\)). We claim that \(\mathcal{F}\) has an upper bound (which is
required by Zorn's lemma). Let \[
M_0=\bigcup_{(M',f') \in \mathcal{F}}M'
\] and \[
f_0(y)=f'(y)
\] whenever \((M',f') \in
\mathcal{F}\) and \(y \in
M'\). It's easy to verify that \((M_0,f_0)\) is the upper bound we are
looking for: \(M_0\) is a subspace and \(f_0\) is well defined because \(\mathcal{F}\) is totally ordered. But \(\mathcal{F}\) is
arbitrary, therefore by Zorn's lemma, there exists a maximal element
\((M^\ast,f^\ast)\) in \(\mathscr{P}\). If \(M^\ast \neq X\), according to step 1, we are
able to extend \((M^\ast,f^\ast)\), which
contradicts its maximality. Hence \(M^\ast=X\), and we define \(\Lambda\) to be \(f^\ast\). Since \(\Lambda \leq p\) and \(\Lambda\) is linear, we see \[
-p(-x) \leq -\Lambda(-x)=\Lambda(x).
\] The theorem is proved. \(\square\)
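
The claim that \((M_0,f_0)\) is an upper bound hides one check, sketched here in the notation above. If \(y \in M' \cap M''\) with \((M',f'),(M'',f'') \in \mathcal{F}\), then since \(\mathcal{F}\) is a chain we may assume \((M',f') \leq (M'',f'')\), and \[
f''(y)=f'(y) \quad \text{because } f''=f' \text{ on } M',
\] so \(f_0\) does not depend on the pair used to define it. The same comparison shows that any \(x,y \in M_0\) lie in a common \(M''\), hence \(M_0\) is a subspace, \(f_0\) is linear, and \(f_0 \leq p\) on \(M_0\).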

How this proof is constructed

This is a classic application of Zorn's lemma (equivalently, the well-ordering
principle or the Hausdorff maximality theorem). First, we showed that we
are able to extend \(M\) and \(f\) by one dimension. But since we do not know the dimension
or other properties of \(X\), it's not
easy to control a sequence of extensions so that it finally 'converges' to \((X,\Lambda)\). Zorn's lemma saves
us from this blind exploration: whatever happens, a maximal element
exists, and we take it to finish the proof.

Generalisation to the complex field

Since an inequality appears in the theorem above, and \(\mathbb{C}\) is not ordered, the complex case needs more
careful treatment.

(Bohnenblust-Sobczyk-Soukhomlinoff) Suppose \(M\) is a subspace of a vector space \(X\), \(p\)
is a seminorm on \(X\), and \(f\) is a linear functional on \(M\) such that \[
|f(x)| \leq p(x)
\] for all \(x \in M\). Then
\(f\) extends to a linear functional
\(\Lambda\) on \(X\) satisfying \[
|\Lambda (x)| \leq p(x)
\] for all \(x \in X\).

Proof. If the scalar field is \(\mathbb{R}\), then we are done, since \(p(-x)=p(x)\) in this case (can you see
why?). So we assume the scalar field is \(\mathbb{C}\).

Put \(u = \Re f\), so that \(u \leq |f| \leq p\) on \(M\). By the dominated
extension theorem, there is some real-linear functional \(U\) such that \(U(x)=u(x)\) on \(M\) and \(U \leq
p\) on \(X\). Define
\[
\Lambda(x)=U(x)-iU(ix).
\] Then \(\Lambda\) is complex-linear and \(\Lambda(x)=f(x)\) on
\(M\).

To show that \(|\Lambda(x)| \leq
p(x)\), we may assume \(\Lambda(x) \neq 0\). By
taking \(\alpha=\frac{|\Lambda(x)|}{\Lambda(x)}\),
we have \[
|\Lambda(x)|=\Lambda(\alpha{x})=U(\alpha{x})\leq p(\alpha x)=p(x)
\] since \(\Lambda(\alpha x)\) is real, \(|\alpha|=1\), and
\(p(\alpha{x})=|\alpha|p(x)=p(x)\).
\(\square\)
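
For comparison, here is a sketch of the real case, assuming only the first theorem and the symmetry \(p(-x)=p(x)\). From \(f \leq |f| \leq p\) on \(M\), the first theorem provides a linear extension \(\Lambda\) with \[
-p(x)=-p(-x) \leq \Lambda(x) \leq p(x) \quad (x \in X),
\] which is exactly \(|\Lambda(x)| \leq p(x)\).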

Extending Hahn-Banach theorem under linear transform

To end this post, we state a beautiful and useful extension of the
Hahn-Banach theorem, which is due to R. P. Agnew and A. P. Morse.

(Agnew-Morse) Let \(X\) denote a real vector space and \(\mathcal{A}\) be a collection of linear
maps \(A_\alpha: X \to X\) that
commute, or namely \[
A_\alpha A_\beta=A_\beta A_\alpha
\] for all \(A_\alpha,A_\beta \in
\mathcal{A}\). Let \(p\) be a
sublinear functional such that \[
p(A_\alpha{x})=p(x)
\] for all \(A_\alpha \in
\mathcal{A}\). Let \(Y\) be a
subspace of \(X\) on which a linear
functional \(f\) is defined such
that

1. \(f(y) \leq p(y)\) for all \(y \in Y\).

2. For each mapping \(A\) and \(y \in Y\), we have \(Ay \in Y\).

3. Under the hypothesis of 2, we have \(f(Ay)=f(y)\).

Then \(f\) can be extended to \(X\) by \(\Lambda\) so that \(-p(-x) \leq \Lambda(x) \leq p(x)\) for all
\(x \in X\), and \[
\Lambda(A_\alpha{x})=\Lambda{x}.
\]

To prove this theorem, we need to construct a sublinear functional
that dominates \(f\). For the whole
proof, see Functional Analysis by Peter Lax.

The series

Since there is no strong reason to write more posts on this topic,
i.e. the three fundamental theorems of linear functional analysis, I
think it's time to make a list of the series. It's been around half a
year.