# The Banach Algebra of Borel Measures on Euclidean Space

This blog post is intended to deliver a quick explanation of the algebra of Borel measures on $\mathbb{R}^n$. It will be broken into pieces. The set $M(\mathbb{R}^n)$ of all complex Borel measures on $\mathbb{R}^n$ forms a vector space over $\mathbb{C}$. The main goal of this post is to show that it is a Banach space and also a Banach algebra.

In fact, the $\mathbb{R}^n$ case can be generalised to any locally compact abelian group (see any book on abstract harmonic analysis), because what really matters here is being locally compact and abelian. But for now we stick to Euclidean spaces. Note that since every open set in $\mathbb{R}^n$ is $\sigma$-compact, every complex Borel measure on $\mathbb{R}^n$ is regular.

To read this post you need to be familiar with some basic properties of Banach algebras, complex Borel measures, and, most importantly, Fubini's theorem.

# Several ways to prove Hardy's inequality

Suppose $1 < p < \infty$ and $f \in L^p((0,\infty))$ (with respect to Lebesgue measure) is a nonnegative function. Define $F(x) = \frac{1}{x}\int_0^x f(t)dt$ for $0 < x <\infty$. Then Hardy's inequality $\def\lrVert[#1]{\lVert #1 \rVert}$ states that $\lrVert[F]_p \leq q\lrVert[f]_p$, where $\frac{1}{p}+\frac{1}{q}=1$.

There are several ways to prove it, and I think it is worth writing them down thoroughly, since that may be why you found this page. Maybe you are burnt out because it was left as an exercise. You are assumed to have enough knowledge of Lebesgue measure and integration.

## Minkowski's integral inequality

Let $S_1,S_2 \subset \mathbb{R}$ be two measurable sets and let $1 \leq p < \infty$. Suppose $F:S_1 \times S_2 \to \mathbb{R}$ is measurable. Then $\left[\int_{S_2} \left\vert\int_{S_1}F(x,y)dx \right\vert^pdy\right]^{\frac{1}{p}} \leq \int_{S_1} \left[\int_{S_2} |F(x,y)|^p dy\right]^{\frac{1}{p}}dx.$ A proof can be found here (see Example A9). You may need to replace all measures with Lebesgue measure $m$.

Now let's get into it. The measurable function to start from is $G(x,t)=\frac{f(t)}{x}$. After the substitution $t=ux$, Minkowski's integral inequality gives \begin{aligned} \lrVert[F]_p &= \left[\int_0^\infty \left\vert \int_0^x \frac{f(t)}{x}dt \right\vert^p dx\right]^{\frac{1}{p}} \\ &= \left[\int_0^\infty \left\vert \int_0^1 f(ux)du \right\vert^p dx\right]^{\frac{1}{p}} \\ &\leq \int_0^1 \left[\int_0^\infty |f(ux)|^pdx\right]^{\frac{1}{p}}du \\ &= \int_0^1 \left[\int_0^\infty |f(ux)|^pudx\right]^{\frac{1}{p}}u^{-\frac{1}{p}}du \\ &= \lrVert[f]_p \int_0^1 u^{-\frac{1}{p}}du \\ &=q\lrVert[f]_p. \end{aligned} Note that we have used a change of variable twice and the inequality once.
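As a sanity check (not part of the proof), the inequality can be tested numerically for one concrete function. The sketch below assumes $f(t)=e^{-t}$ and $p=2$, truncates the integrals to $(0,200)$, and uses a plain midpoint rule; the function name `hardy_check` and all parameters are illustrative choices, not anything from the argument above.

```python
import math

def hardy_check(p=2.0, upper=200.0, steps=200_000):
    """Approximate ||F||_p and q*||f||_p for f(t) = exp(-t) on (0, upper)."""
    q = p / (p - 1.0)
    h = upper / steps
    int_F = 0.0  # accumulates the integral of F(x)^p
    int_f = 0.0  # accumulates the integral of f(x)^p
    for k in range(steps):
        x = (k + 0.5) * h                 # midpoint of the k-th cell
        F = (1.0 - math.exp(-x)) / x      # closed form of (1/x) * int_0^x e^{-t} dt
        int_F += F ** p * h
        int_f += math.exp(-p * x) * h
    return int_F ** (1.0 / p), q * int_f ** (1.0 / p)

lhs, rhs = hardy_check()
print(f"||F||_p ~ {lhs:.4f}   q*||f||_p ~ {rhs:.4f}")
```

Truncating the domain can only make the left-hand side smaller, so this is an illustration of the inequality for one sample function rather than evidence about the sharpness of the constant.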

## A constructive approach

I have no idea how people came up with this solution. Take $xF(x)=\int_0^x f(t)t^{u}t^{-u}dt$ where $0<u<1-\frac{1}{p}$. Hölder's inequality gives us \begin{aligned} xF(x) &= \int_0^x f(t)t^ut^{-u}dt \\ &\leq \left[\int_0^x t^{-uq}dt\right]^{\frac{1}{q}}\left[\int_0^xf(t)^pt^{up}dt\right]^{\frac{1}{p}} \\ &=\left(\frac{1}{1-uq}x^{1-uq}\right)^{\frac{1}{q}}\left[\int_0^xf(t)^pt^{up}dt\right]^{\frac{1}{p}} \end{aligned} Hence \begin{aligned} F(x)^p & \leq \frac{1}{x^p}\left\{\left(\frac{1}{1-uq}x^{1-uq}\right)^{\frac{1}{q}}\left[\int_0^xf(t)^pt^{up}dt\right]^{\frac{1}{p}}\right\}^{p} \\ &= \left(\frac{1}{1-uq}\right)^{\frac{p}{q}}x^{\frac{p}{q}(1-uq)-p}\int_0^x f(t)^pt^{up}dt \\ &= \left(\frac{1}{1-uq}\right)^{p-1}x^{-up-1}\int_0^x f(t)^pt^{up}dt \end{aligned}

Note we have used the fact that $\frac{1}{p}+\frac{1}{q}=1 \implies p+q=pq$ and $\frac{p}{q}=p-1$. Fubini's theorem gives us the final answer: \begin{aligned} \int_0^\infty F(x)^pdx &\leq \int_0^\infty\left[\left(\frac{1}{1-uq}\right)^{p-1}x^{-up-1}\int_0^x f(t)^pt^{up}dt\right]dx \\ &=\left(\frac{1}{1-uq}\right)^{p-1}\int_0^\infty dx\int_0^x f(t)^pt^{up}x^{-up-1}dt \\ &=\left(\frac{1}{1-uq}\right)^{p-1}\int_0^\infty dt\int_t^\infty f(t)^pt^{up}x^{-up-1}dx \\ &=\left(\frac{1}{1-uq}\right)^{p-1}\frac{1}{up}\int_0^\infty f(t)^pdt. \end{aligned} It remains to find the minimum of $\varphi(u) = \left(\frac{1}{1-uq}\right)^{p-1}\frac{1}{up}$ on $0<u<\frac{1}{q}$. This is an elementary calculus problem. By taking its derivative, we see that it attains its minimum $\left(\frac{p}{p-1}\right)^p=q^p$ at $u=\frac{1}{pq}<1-\frac{1}{p}$. Hence we get $\int_0^\infty F(x)^pdx \leq q^p\int_0^\infty f(t)^pdt,$ which is exactly what we want. Note the constant $q$ cannot be replaced with a smaller one. So far we have only proved the case $f \geq 0$. For the general case, apply this result to $|f|$, since $|F(x)| \leq \frac{1}{x}\int_0^x |f(t)|dt$.
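The minimisation of $\varphi$ is easy to double-check numerically. The following is a minimal sketch that assumes $p=3$ and uses a brute-force grid search over $0<u<\frac{1}{q}$; the helper names `phi` and `grid_minimum` are made up for this illustration.

```python
def phi(u, p):
    """phi(u) = (1 / (1 - u*q))^(p-1) / (u*p), with q the conjugate exponent of p."""
    q = p / (p - 1.0)
    return (1.0 / (1.0 - u * q)) ** (p - 1.0) / (u * p)

def grid_minimum(p, samples=100_000):
    """Search the open interval (0, 1/q) on a uniform grid for the smallest phi."""
    q = p / (p - 1.0)
    best_u, best_val = None, float("inf")
    for k in range(1, samples):
        u = (k / samples) / q      # grid point strictly inside (0, 1/q)
        val = phi(u, p)
        if val < best_val:
            best_u, best_val = u, val
    return best_u, best_val

p = 3.0
q = p / (p - 1.0)
u_star, phi_min = grid_minimum(p)
print(u_star, 1.0 / (p * q))   # grid minimiser vs. the claimed u = 1/(pq)
print(phi_min, q ** p)         # grid minimum vs. the claimed value q^p
```

The two printed pairs should agree up to the grid resolution, which matches the calculus computation: setting the derivative of $\log\varphi$ to zero gives $upq=1$.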

## Integration by parts

This approach makes use of properties of $L^p$ space. Still we assume that $f \geq 0$ but we also assume $f \in C_c((0,\infty))$, that is, $f$ is continuous and has compact support. Hence $F$ is differentiable in this situation. Integration by parts gives $\int_0^\infty F^p(x)dx=xF(x)^p\vert_0^\infty- \int_0^\infty x\,d(F^p) = -p\int_0^\infty xF^{p-1}(x)F'(x)dx.$ Note that since $f$ has compact support, there is some $[a,b]$ with $0 < a \leq b < \infty$ such that $f(x) >0$ only if $a \leq x \leq b$. Thus $F(x)=0$ for $x<a$ and $F(x)=\frac{1}{x}\int_a^b f(t)dt$ for $x>b$, so $xF(x)^p \to 0$ as $x \to \infty$ because $p>1$, and hence $xF(x)^p\vert_0^\infty=0$. Next it is natural to take a look at $F'(x)$. Note we have $F'(x) = \frac{f(x)}{x}-\frac{\int_0^x f(t)dt}{x^2},$ hence $xF'(x)=f(x)-F(x)$. A substitution gives us $\int_0^\infty F^p(x)dx = -p\int_0^\infty F^{p-1}(x)[f(x)-F(x)]dx,$ which is equivalent to saying $\int_0^\infty F^p(x)dx = \frac{p}{p-1}\int_0^\infty F^{p-1}(x)f(x)dx.$ Hölder's inequality gives us \begin{aligned} \int_0^\infty F^{p-1}(x)f(x)dx &\leq \left[\int_0^\infty F^{(p-1)q}(x)dx\right]^{\frac{1}{q}}\left[\int_0^\infty f(x)^pdx\right]^{\frac{1}{p}} \\ &=\left[\int_0^\infty F^{p}(x)dx\right]^{\frac{1}{q}}\left[\int_0^\infty f(x)^pdx\right]^{\frac{1}{p}}. \end{aligned} Together with the identity above we get $\int_0^\infty F^p(x)dx \leq q\left[\int_0^\infty F^{p}(x)dx\right]^{\frac{1}{q}}\left[\int_0^\infty f(x)^pdx\right]^{\frac{1}{p}},$ which is exactly what we want since $1-\frac{1}{q}=\frac{1}{p}$: all we need to do is divide both sides by $\left[\int_0^\infty F^pdx\right]^{1/q}$ (if this quantity is $0$ there is nothing to prove).

So what's next? Note $C_c((0,\infty))$ is dense in $L^p((0,\infty))$. For any $f \in L^p((0,\infty))$, we can take a sequence of functions $f_n \in C_c((0,\infty))$ such that $f_n \to f$ with respect to the $L^p$-norm. Taking $F=\frac{1}{x}\int_0^x f(t)dt$ and $F_n = \frac{1}{x}\int_0^x f_n(t)dt$, we need to show that $F_n \to F$ pointwise, so that we can use Fatou's lemma. For $\varepsilon>0$, there exists some $N$ such that $\lrVert[f_n-f]_p < \varepsilon$ for all $n \geq N$. Thus for such $n$ \begin{aligned} |F_n(x)-F(x)| &= \frac{1}{x}\left\vert \int_0^x f_n(t)dt - \int_0^x f(t)dt \right\vert \\ &\leq \frac{1}{x} \int_0^x |f_n(t)-f(t)|dt \\ &\leq \frac{1}{x} \left[\int_0^x|f_n(t)-f(t)|^pdt\right]^{\frac{1}{p}}\left[\int_0^x 1^qdt\right]^{\frac{1}{q}} \\ &=\frac{1}{x^{1/p}}\left[\int_0^x|f_n(t)-f(t)|^pdt\right]^{\frac{1}{p}} \\ &\leq \frac{1}{x^{1/p}}\lrVert[f_n-f]_p <\frac{\varepsilon}{x^{1/p}}. \end{aligned} Hence $F_n \to F$ pointwise, which also implies that $|F_n|^p \to |F|^p$ pointwise.

For $|F_n|$ we have \begin{aligned} \int_0^\infty |F_n(x)|^pdx &= \int_0^\infty \left\vert\frac{1}{x}\int_0^x f_n(t)dt\right\vert^p dx \\ &\leq \int_0^\infty \left[\frac{1}{x}\int_0^x |f_n(t)|dt\right]^{p}dx \\ &\leq q^p\int_0^\infty |f_n(t)|^pdt, \end{aligned} where the last inequality follows since $|f_n| \in C_c((0,\infty))$ is nonnegative and we have already proved the inequality for such functions. By Fatou's lemma, we have \begin{aligned} \int_0^\infty |F(x)|^pdx &= \int_0^\infty \lim_{n \to \infty}|F_n(x)|^pdx \\ &\leq \liminf_{n \to \infty} \int_0^\infty |F_n(x)|^pdx \\ &\leq \liminf_{n \to \infty}q^p\int_0^\infty |f_n(x)|^pdx \\ &=q^p\int_0^\infty |f(x)|^pdx. \end{aligned}

# A Continuous Function Sending L^p Functions to L^1

Throughout, let $(X,\mathfrak{M},\mu)$ be a measure space where $\mu$ is positive.

## The question

If $f$ is in $L^p(\mu)$, which means $\lVert f \rVert_p=\left(\int_X |f|^p d\mu\right)^{1/p}<\infty$, or equivalently $\int_X |f|^p d\mu<\infty$, then $|f|^p$ is in $L^1(\mu)$. In other words, we have a function \begin{aligned} \lambda: L^p(\mu) &\to L^1(\mu) \\ f &\mapsto |f|^p. \end{aligned} This function need not be one-to-one because of the absolute value. But we hope it is reasonably well behaved; at the very least, we hope it is continuous.

Here, elements of $L^p(\mu)$ are identified up to the relation $f \sim g$, which means that $f-g$ equals $0$ almost everywhere with respect to $\mu$. It can be easily verified that this is an equivalence relation.

## Continuity

We still use the $\varepsilon-\delta$ argument but it's in a metric space. Suppose $(X,d_1)$ and $(Y,d_2)$ are two metric spaces and $f:X \to Y$ is a function. We say $f$ is continuous at $x_0 \in X$ if, for any $\varepsilon>0$, there exists some $\delta>0$ such that $d_2(f(x_0),f(x))<\varepsilon$ whenever $d_1(x_0,x)<\delta$. Further, we say $f$ is continuous on $X$ if $f$ is continuous at every point $x \in X$.

## Metrics

For $1\leq p<\infty$, we already have a metric given by $d(f,g)=\lVert f-g \rVert_p$, noting that $d(f,g)=0$ if and only if $f \sim g$. This metric is complete and makes $L^p$ a Banach space. But for $0<p<1$ (yes we are going to cover that), things are very different, and there is one reason: Minkowski's inequality is reversed! In fact, we have $\lVert f+g \rVert_p \geq \lVert f \rVert_p + \lVert g \rVert_p$ for nonnegative $f$, $g$ when $0<p<1$. The space $L^p$ has many strange properties when $0<p<1$. Precisely,

For $0<p<1$, $L^p(\mu)$ is locally convex if and only if $\mu$ assumes finitely many values. (Proof.)

On the other hand, if for example $X=[0,1]$ and $\mu=m$ is the Lebesgue measure, then $L^p(\mu)$ has no open convex subset other than $\varnothing$ and $L^p(\mu)$ itself. However,

A topological vector space $X$ is normable if and only if its origin has a convex bounded neighbourhood. (See Kolmogorov's normability criterion.)

Therefore $L^p(m)$ is not normable, hence not Banach.

We have gone too far. We need a metric that is fine enough.

### Metric of $L^p$ when $0<p<1$

Define $\Delta(f)=\int_X |f|^p d\mu$ for $f \in L^p(\mu)$. We will show that $d(f,g)=\Delta(f-g)$ is a metric. Fix $y\geq 0$ and consider the function $\varphi(x)=(x+y)^p-x^p.$ We have $\varphi(0)=y^p$ and, since $p-1<0$, $\varphi'(x)=p(x+y)^{p-1}-px^{p-1} \leq px^{p-1}-px^{p-1}=0$ when $x > 0$. Hence $\varphi$ is nonincreasing on $[0,\infty)$, which implies that $(x+y)^p \leq x^p+y^p.$ Hence for any $f$, $g \in L^p$, we have $\Delta(f+g)=\int_X |f+g|^p d\mu \leq \int_X (|f|+|g|)^p d\mu \leq \int_X |f|^p d\mu + \int_X |g|^p d\mu=\Delta(f)+\Delta(g).$ This inequality ensures that $d(f,g)=\Delta(f-g)$ is a metric. It's immediate that $d(f,g)=d(g,f) \geq 0$ for all $f$, $g \in L^p(\mu)$, and that $d(f,g)=0$ if and only if $f \sim g$. For the triangle inequality, note that $d(f,h)+d(h,g)=\Delta(f-h)+\Delta(h-g) \geq \Delta((f-h)+(h-g))=\Delta(f-g)=d(f,g).$ This metric is translation-invariant as well, since $d(f+h,g+h)=\Delta(f+h-g-h)=\Delta(f-g)=d(f,g).$ Completeness can be verified in the same way as in the case $p \geq 1$. In fact, this metric makes $L^p$ a locally bounded F-space.
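To see the two phenomena side by side, here is a small numerical sketch. It assumes the simplest possible measure space, counting measure on a four-point set, so that functions are just lists of numbers and integrals are sums; the names `delta` and `dist` are illustrative stand-ins for $\Delta$ and $d$.

```python
import random

def delta(f, p):
    """Delta(f): the integral of |f|^p against counting measure, i.e. a finite sum."""
    return sum(abs(v) ** p for v in f)

def dist(f, g, p):
    """d(f, g) = Delta(f - g)."""
    return delta([a - b for a, b in zip(f, g)], p)

random.seed(0)
p = 0.5

# The triangle inequality for d should never fail (up to rounding).
violations = 0
for _ in range(10_000):
    f, g, h = ([random.uniform(-5, 5) for _ in range(4)] for _ in range(3))
    if dist(f, g, p) > dist(f, h, p) + dist(h, g, p) + 1e-12:
        violations += 1

# The quasi-norm Delta(.)^(1/p), by contrast, fails it on indicator functions.
e1, e2 = [1.0, 0.0], [0.0, 1.0]
norm_sum = delta([1.0, 1.0], p) ** (1 / p)                        # ||e1 + e2||_p
norm_parts = delta(e1, p) ** (1 / p) + delta(e2, p) ** (1 / p)    # ||e1||_p + ||e2||_p
print(violations, norm_sum, norm_parts)
```

Random sampling proves nothing, of course; it only illustrates why $\Delta$ without the $\frac{1}{p}$-th root is the right object to build a metric from when $0<p<1$.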

## The continuity of $\lambda$

The metric of $L^1$ is defined by $d_1(f,g)=\lVert f-g \rVert_1=\int_X |f-g|d\mu.$ We need to find a relation between $d_p(f,g)$ and $d_1(\lambda(f),\lambda(g))$, where $d_p$ is the metric of the corresponding $L^p$ space.

### $0<p<1$

As we have proved, $(x+y)^p \leq x^p+y^p$ for $x,y \geq 0$. Without loss of generality we assume $x \geq y \geq 0$ and therefore $x^p=(x-y+y)^p \leq (x-y)^p+y^p.$ Hence $x^p-y^p \leq (x-y)^p.$ By interchanging $x$ and $y$, we get $|x^p-y^p| \leq |x-y|^p.$ Replacing $x$ and $y$ with $|f|$ and $|g|$ where $f$, $g \in L^p$, and using $\lvert |f|-|g| \rvert \leq |f-g|$, we get $\int_{X}\lvert |f|^p-|g|^p \rvert d\mu \leq \int_X |f-g|^p d\mu.$ But $d_1(\lambda(f),\lambda(g))=\int_{X}\lvert |f|^p-|g|^p \rvert d\mu$ and $d_p(f,g)=\Delta(f-g)=\int_X |f-g|^p d\mu,$ and we therefore have $d_1(\lambda(f),\lambda(g)) \leq d_p(f,g).$ Hence $\lambda$ is continuous (in fact, Lipschitz continuous, hence uniformly continuous) when $0<p<1$.
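The Lipschitz estimate can be probed numerically as well. The sketch below again assumes counting measure on a finite set (five points here), so both distances become finite sums; `lam_distance` and `d_p` are hypothetical helper names for $d_1(\lambda(f),\lambda(g))$ and $d_p(f,g)$.

```python
import random

def lam_distance(f, g, p):
    """d_1(lambda(f), lambda(g)) = sum of | |f|^p - |g|^p | over the points."""
    return sum(abs(abs(a) ** p - abs(b) ** p) for a, b in zip(f, g))

def d_p(f, g, p):
    """d_p(f, g) = Delta(f - g) = sum of |f - g|^p, for 0 < p < 1."""
    return sum(abs(a - b) ** p for a, b in zip(f, g))

random.seed(1)
worst_ratio = 0.0
for _ in range(10_000):
    p = random.uniform(0.05, 0.95)
    f = [random.uniform(-3, 3) for _ in range(5)]
    g = [random.uniform(-3, 3) for _ in range(5)]
    worst_ratio = max(worst_ratio, lam_distance(f, g, p) / d_p(f, g, p))

print(worst_ratio)  # the estimate above says this ratio cannot exceed 1
```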

### $1 \leq p < \infty$

It's natural to think about Minkowski's inequality and Hölder's inequality in this case, since they are the main tools for estimating $L^p$ norms. The work lies in setting up the conditions under which they apply. In this section we first need to prove that $|x^p-y^p| \leq p|x-y|(x^{p-1}+y^{p-1})$ for $x,y \geq 0$. This inequality is surprisingly easy to prove, however. We will use nothing but the mean value theorem. Without loss of generality we assume that $x > y \geq 0$ and define $f(t)=t^p$. Then $\frac{f(x)-f(y)}{x-y}=f'(\zeta)=p\zeta^{p-1}$ where $y < \zeta < x$. But since $p-1 \geq 0$, we see $\zeta^{p-1} \leq x^{p-1} \leq x^{p-1}+y^{p-1}$. Therefore $f(x)-f(y)=x^p-y^p=p(x-y)\zeta^{p-1} \leq p(x-y)(x^{p-1}+y^{p-1}).$ For $x=y$ the equality holds.

Therefore \begin{aligned} d_1(\lambda(f),\lambda(g)) &= \int_X \left||f|^p-|g|^p\right|d\mu \\ &\leq \int_Xp\left||f|-|g|\right|(|f|^{p-1}+|g|^{p-1})d\mu \end{aligned} For $p=1$ we have directly $d_1(\lambda(f),\lambda(g)) = \int_X \left||f|-|g|\right|d\mu \leq \lVert f-g \rVert_1$, so assume $p>1$ and let $q$ be the conjugate exponent. By Hölder's inequality, we have \begin{aligned} \int_X ||f|-|g||(|f|^{p-1}+|g|^{p-1})d\mu & \leq \left[\int_X \left||f|-|g|\right|^pd\mu\right]^{1/p}\left[\int_X\left(|f|^{p-1}+|g|^{p-1}\right)^qd\mu\right]^{1/q} \\ &\leq \left[\int_X \left|f-g\right|^pd\mu\right]^{1/p}\left[\int_X\left(|f|^{p-1}+|g|^{p-1}\right)^qd\mu\right]^{1/q} \\ &=\lVert f-g \rVert_p \left[\int_X\left(|f|^{p-1}+|g|^{p-1}\right)^qd\mu\right]^{1/q}. \end{aligned} By Minkowski's inequality, we have $\left[\int_X\left(|f|^{p-1}+|g|^{p-1}\right)^qd\mu\right]^{1/q} \leq \left[\int_X|f|^{(p-1)q}d\mu\right]^{1/q}+\left[\int_X |g|^{(p-1)q}d\mu\right]^{1/q}.$ Now things are clear. Since $1/p+1/q=1$, or equivalently $1/q=(p-1)/p$, we have $(p-1)q=p$. Suppose $\lVert f \rVert_p$, $\lVert g \rVert_p \leq R$. Then $\left[\int_X|f|^{(p-1)q}d\mu\right]^{1/q}+\left[\int_X |g|^{(p-1)q}d\mu\right]^{1/q} = \lVert f \rVert_p^{p-1}+\lVert g \rVert_p^{p-1} \leq 2R^{p-1}.$ Combining the inequalities above, we get \begin{aligned} d_1(\lambda(f),\lambda(g)) \leq 2pR^{p-1}\lVert f-g \rVert_p =2pR^{p-1}d_p(f,g). \end{aligned} To conclude, fix $f$ and take $R=\lVert f \rVert_p+1$; every $g$ with $d_p(f,g)<1$ then satisfies $\lVert g \rVert_p \leq R$, so $d_1(\lambda(f),\lambda(g)) \to 0$ as $d_p(f,g) \to 0$. Hence $\lambda$ is continuous.
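The local Lipschitz bound $2pR^{p-1}d_p(f,g)$ can be tested in the same spirit. This is a rough sketch that assumes counting measure on a five-point set and takes $R$ to be the larger of the two norms; the helper names are invented for the illustration.

```python
import random

def p_norm(f, p):
    """||f||_p for counting measure on a finite set, p >= 1."""
    return sum(abs(v) ** p for v in f) ** (1.0 / p)

def lam_distance(f, g, p):
    """d_1(lambda(f), lambda(g)) = sum of | |f|^p - |g|^p |."""
    return sum(abs(abs(a) ** p - abs(b) ** p) for a, b in zip(f, g))

random.seed(2)
min_slack = float("inf")   # smallest observed value of (bound - actual distance)
for _ in range(10_000):
    p = random.uniform(1.0, 4.0)
    f = [random.uniform(-3, 3) for _ in range(5)]
    g = [random.uniform(-3, 3) for _ in range(5)]
    R = max(p_norm(f, p), p_norm(g, p))
    diff = [a - b for a, b in zip(f, g)]
    bound = 2 * p * R ** (p - 1) * p_norm(diff, p)
    min_slack = min(min_slack, bound - lam_distance(f, g, p))

print(min_slack)  # should be nonnegative if the estimate holds on every sample
```

Unlike the case $0<p<1$, the constant here depends on $R$, which is why the argument yields continuity but not a global Lipschitz bound.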

## Conclusion and further

We have proved that $\lambda$ is continuous, and when $0<p<1$, we have seen that $\lambda$ is Lipschitz continuous. It's natural to think about its differentiability afterwards, but the absolute value function is not even differentiable at $0$, so we may have no chance. But this is still a fine enough result. For example, we have no restriction on $(X,\mathfrak{M},\mu)$ other than the positivity of $\mu$. Therefore we may take $\mathbb{R}^n$ with the Lebesgue measure here, or we can take something else.

It's also interesting how we use elementary Calculus to solve some much more abstract problems.

# The Riesz-Markov-Kakutani Representation Theorem

## This post

This post is intended to lay the groundwork for establishing the existence of the Lebesgue measure, often denoted by $m$, in a future post. In fact, the Lebesgue measure follows as a special case of the R-M-K representation theorem. You may not believe it, but the Euclidean properties of $\mathbb{R}^k$ play no role in the existence of $m$. The only topological property that matters is the fact that $\mathbb{R}^k$ is a locally compact Hausdorff space.

The theorem is named after F. Riesz, who introduced it for continuous functions on $[0,1]$ (with respect to the Riemann-Stieltjes integral). Years later, after the generalization done by A. Markov and S. Kakutani, we are able to view it on a locally compact Hausdorff space.

You may find that some statements are more general than strictly needed, but this is intended to let you get more out of the post (some of the tools are related to differential geometry). There are also many topology and analysis tricks worth your attention.

## Tools

### Different kinds of topological spaces

Again, euclidean topology plays no role in this proof. We need to specify the topology for different reasons. This is similar to what we do in linear functional analysis. Throughout, let $X$ be a topological space.

0.0 Definition. $X$ is a Hausdorff space if the following is true: If $p \in X$, $q\in X$ but $p \neq q$, then there are two disjoint open sets $U$ and $V$ such that $p \in U$ and $q \in V$.

0.1 Definition. $X$ is locally compact if every point of $X$ has a neighborhood whose closure is compact.

0.2 Remarks. A Hausdorff space is also called a $T_2$ space (see Kolmogorov classification) or a separated space. There is a classic example of locally compact Hausdorff space: $\mathbb{R}^n$. It is trivial to verify this. But this is far from being enough. In the future we will see, we can construct some ridiculous but mathematically valid measures.

0.3 Definition. A set $E \subset X$ is called $\sigma$-compact if $E$ is a countable union of compact sets. Note that every open subset of a Euclidean space $\mathbb{R}^n$ is $\sigma$-compact, since it can always be written as a countable union of closed balls (which are compact).

0.4 Definition. A covering of $X$ is locally finite if every point has a neighborhood which intersects only finitely many elements of the covering. Of course, if the covering is already finite, it's also locally finite.

0.5 Definition. A refinement of a covering of $X$ is a second covering, each element of which is contained in an element of the first covering.

0.6 Definition. $X$ is paracompact if it is Hausdorff, and every open covering has a locally finite open refinement. Obviously any compact space is paracompact.

0.7 Theorem. If $X$ is a second countable Hausdorff space and is locally compact, then $X$ is paracompact. For proof, see this [Theorem 2.6]. One uses this to prove that a differentiable manifold admits a partition of unity.

0.8 Theorem. If $X$ is locally compact and $\sigma$-compact, then $X=\bigcup_{i=1}^{\infty}K_i$ where for all $i \in \mathbb{N}$, $K_i$ is compact and $K_i \subset\operatorname{int}K_{i+1}$.

### Partition of unity

The basic technical tool in the theory of differentiable manifolds is the existence of a partition of unity. We will borrow this tool for use in analysis.

1.0 Definition. A partition of unity on $X$ is a collection $(g_i)$ of continuous real valued functions on $X$ such that

1. $g_i \geq 0$ for each $i$.
2. every $x \in X$ has a neighborhood $U$ such that $U \cap \operatorname{supp}(g_i)=\varnothing$ for all but finitely many of $g_i$.
3. for each $x \in X$, we have $\sum_{i}g_i(x)=1$. (That's why you see the word 'unity'.)
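A concrete instance may help. The sketch below assumes $X=\mathbb{R}$ and uses the 'tent' functions $g_i(x)=\max(0,1-|x-i|)$, one for each integer $i$; these are a standard example rather than anything specific to this post, and the code truncates to $|i| \leq 50$, so the three conditions are only checked for $|x| \leq 49$.

```python
def tent(i, x):
    """g_i(x) = max(0, 1 - |x - i|): continuous, nonnegative, supported on [i-1, i+1]."""
    return max(0.0, 1.0 - abs(x - i))

def partition_sum(x, indices=range(-50, 51)):
    """Sum of all tents at x (condition 3 says this should equal 1)."""
    return sum(tent(i, x) for i in indices)

def active_indices(x, indices=range(-50, 51)):
    """Indices i with g_i(x) > 0 (condition 2 says only finitely many matter near x)."""
    return [i for i in indices if tent(i, x) > 0.0]

for x in (-3.75, 0.0, 0.3, 12.5):
    print(x, partition_sum(x), active_indices(x))
```

At a non-integer point exactly two tents are nonzero, and at an integer exactly one, so local finiteness is visible directly.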

Partitions of unity are frequently used in many other fields. For example, in differential geometry, one uses them to construct a Riemannian structure on a smooth manifold. In generalised function theory, one uses them to connect local properties with global properties as well.

1.1 Definition. A partition of unity $(g_i)$ on $X$ is subordinate to an open cover of $X$ if and only if for each $g_i$ there is an element $U$ of the cover such that $\operatorname{supp}(g_i) \subset U$. We say $X$ admits partitions of unity if and only if for every open cover of $X$, there exists a partition of unity subordinate to the cover.

1.2 Theorem. A Hausdorff space admits a partition of unity if and only if it is paracompact (the 'only if' part is by considering the definition of partition of unity. For the 'if' part, see here). As a corollary, we have:

1.3 Corollary. Suppose $V_1,\cdots,V_n$ are open subsets of a locally compact Hausdorff space $X$, $K$ is compact, and $K \subset \bigcup_{k=1}^{n}V_k.$ Then there exists a partition of unity $(h_i)$ subordinate to the cover $(V_k)_{k=1}^{n}$ such that $\operatorname{supp}(h_i) \subset V_i$ and $\sum_{i=1}^{n}h_i(x)=1$ for all $x \in K$.

### Urysohn's lemma (for locally compact Hausdorff spaces)

2.0 Notation. The notation $K \prec f$ will mean that $K$ is a compact subset of $X$, that $f \in C_c(X)$, that $f(X) \subset [0,1]$, and that $f(x)=1$ for all $x \in K$. The notation $f \prec V$ will mean that $V$ is open, that $f \in C_c(X)$, that $f(X) \subset [0,1]$ and that $\operatorname{supp}(f) \subset V$. If both hold, we write $K \prec f \prec V.$

2.1 Remarks. Clearly, with this notation, we are able to simplify the statement of being subordinate. We merely need to write $g_i \prec U$ in 1.1 instead of $\operatorname{supp}(g_i) \subset U$.

2.2 Urysohn's Lemma for locally compact Hausdorff spaces. Suppose $X$ is locally compact and Hausdorff, $V$ is open in $X$ and $K \subset V$ is a compact set. Then there exists an $f \in C_c(X)$ such that $K \prec f \prec V.$

2.3 Remarks. By $f \in C_c(X)$ we mean that $f$ is a continuous function with compact support. This relation also says that $\chi_K \leq f \leq \chi_V$. For more details and the proof, visit this page. The lemma is usually stated for normal spaces; for a proof at that level, see arXiv:1910.10381. (Question: why do we consider two disjoint closed subsets there?)
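On the real line the function promised by the lemma can be written down by hand. The following sketch assumes $X=\mathbb{R}$, $K=[a,b]$ and $V=(c,d)$ with $c<a\leq b<d$, and builds a piecewise-linear $f$; the function name `urysohn` and the choice of ramp endpoints are illustrative only.

```python
def urysohn(x, a, b, c, d):
    """A piecewise-linear f with [a, b] ≺ f ≺ (c, d), assuming c < a <= b < d.

    The ramps start at the midpoints lo, hi so that supp(f) = [lo, hi]
    is a compact set lying strictly inside the open interval (c, d).
    """
    lo = (a + c) / 2.0
    hi = (b + d) / 2.0
    if x <= lo or x >= hi:
        return 0.0
    if x < a:
        return (x - lo) / (a - lo)    # rising ramp on (lo, a)
    if x <= b:
        return 1.0                    # f = 1 on K = [a, b]
    return (hi - x) / (hi - b)        # falling ramp on (b, hi)

# K = [0, 1], V = (-1, 2)
for x in (-0.75, -0.25, 0.5, 1.25, 1.75):
    print(x, urysohn(x, 0.0, 1.0, -1.0, 2.0))
```

The general lemma replaces the explicit ramps by a more delicate construction, since an arbitrary locally compact Hausdorff space has no linear structure to interpolate with.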

### The $\varepsilon$-definitions of $\sup$ and $\inf$

We will be using the $\varepsilon$-definitions of $\sup$ and $\inf$, which make the proof easier in this case. If you haven't seen them before, the argument would be hard to follow, so we state them here.

Let $S$ be a nonempty subset of the real numbers that is bounded below. A lower bound $w$ of $S$ is the infimum of $S$ if and only if for any $\varepsilon>0$, there exists an element $x_\varepsilon \in S$ such that $x_\varepsilon<w+\varepsilon$.

This definition of $\inf$ is equivalent to the following if-then definition:

Let $S$ be a set that is bounded below. We say $w=\inf S$ when $w$ satisfies the following conditions.

1. $w$ is a lower bound of $S$.
2. If $t$ is also a lower bound of $S$, then $t \leq w$.

We have the analogous definition for $\sup$.
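As a toy illustration of the $\varepsilon$-definition, take $S=\{1/n : n=1,2,\dots\}$, whose infimum is $w=0$. The sketch below is only an example chosen for this purpose: for a given $\varepsilon$ it searches for a witness $x_\varepsilon \in S$ with $x_\varepsilon < w+\varepsilon$.

```python
def witness(eps):
    """Return an element x of S = {1/n : n >= 1} with x < 0 + eps."""
    n = 1
    while 1.0 / n >= eps:   # keep going until 1/n drops strictly below eps
        n += 1
    return 1.0 / n

for eps in (1.0, 0.3, 0.01):
    print(eps, witness(eps))
```

The same witness shows that no $t>0$ is a lower bound of $S$ (take $\varepsilon=t$), which is exactly condition 2 of the if-then definition.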

## The main theorem

Analysis is full of vector spaces and linear transformations. We already know that the Lebesgue integral induces a linear functional. That is, for example, $L^1([0,1])$ is a vector space, and we have a linear functional by $f \mapsto \int_0^1 f(x)dx.$ But what about the reverse? Given a linear functional, is it guaranteed that we have a measure to establish the integral? The R-M-K theorem answers this question affirmatively. The functional to be discussed is positive, which means that if $\Lambda$ is positive and $f(X) \subset [0,\infty)$, then $\Lambda{f} \in [0,\infty)$.

Let $X$ be a locally compact Hausdorff space, and let $\Lambda$ be a positive linear functional on $C_c(X)$. Then there exists a $\sigma$-algebra $\mathfrak{M}$ on $X$ which contains all Borel sets in $X$, and there exists a unique positive measure $\mu$ on $\mathfrak{M}$ which represents $\Lambda$ in the sense that $\Lambda{f}=\int_X fd\mu$ for all $f \in C_c(X)$.

For the measure $\mu$ and the $\sigma$-algebra $\mathfrak{M}$, we have four assertions:

1. $\mu(K)<\infty$ for every compact set $K \subset X$.
2. For every $E \in \mathfrak{M}$, we have $\mu(E)=\inf\{\mu(V):E \subset V, V\text{ open}\}.$
3. For every open set $E$, and for every $E \in \mathfrak{M}$ with $\mu(E)<\infty$, we have $\mu(E)=\sup\{\mu(K):K \subset E, K\text{ compact}\}.$
4. If $E \in \mathfrak{M}$, $A \subset E$, and $\mu(E)=0$, then $A \in \mathfrak{M}$.

Remarks before proof. It would be great if we could establish the Lebesgue measure $m$ by putting $X=\mathbb{R}^n$, but we need a little extra work to get this result naturally. If 2 is satisfied, we say $\mu$ is outer regular, and if 3 is satisfied, inner regular. If both hold, we say $\mu$ is regular. The partition of unity and Urysohn's lemma will be heavily used in the proof of the main theorem, so make sure you are comfortable with them. The theorem can also be extended to the complex setting, but that requires much non-trivial work.

### Proving the theorem

The proof is rather long so we will split it into several steps. I will try my best to make every line clear enough.

#### Step 0 - Construction of $\mu$ and $\mathfrak{M}$

For every open set $V \subset X$, define $\mu(V)=\sup\{\Lambda{f}:f \prec V\}.$

If $V_1 \subset V_2$ and both are open, we claim that $\mu(V_1) \leq \mu(V_2)$. For $f \prec V_1$, since $\operatorname{supp}f \subset V_1 \subset V_2$, we see $f \prec V_2$. Hence $\{\Lambda{f}:f \prec V_1\} \subset \{\Lambda{f}:f \prec V_2\}$, and by taking the supremum we see $\mu(V_1) \leq \mu(V_2).$ We also record that $\Lambda$ itself is monotonic: if $f \leq g$ in $C_c(X)$, then $\Lambda{g}-\Lambda{f}=\Lambda{(g-f)}\geq 0$. For instance, given $f \prec V_1$, Urysohn's lemma applied to the pair $(\operatorname{supp}(f),V_2)$ yields a $g \in C_c(X)$ with $\operatorname{supp}(f) \prec g \prec V_2$; since $g=1 \geq f$ on $\operatorname{supp}(f)$, we have $g \geq f$ and therefore $\Lambda{g} \geq \Lambda{f}$. The 'monotonic' property of such $\mu$ enables us to define $\mu(E)$ for all $E \subset X$ by $\mu(E)=\inf \{\mu(V):E \subset V, V\text{ open}\}.$ For open sets this definition trivially agrees with the previous one. Defined this way on all subsets, $\mu$ is what people sometimes call an outer measure. We will discuss other kinds of sets thoroughly in the following steps. Warning: we are not saying that $\mathfrak{M} = 2^X$. The crucial property of $\mu$, namely countable additivity, will be proved only on a certain $\sigma$-algebra.

It follows from the definition of $\mu$ that if $E_1 \subset E_2$, then $\mu(E_1) \leq \mu(E_2)$.

Let $\mathfrak{M}_F$ be the class of all $E \subset X$ which satisfy the two following conditions:

1. $\mu(E) <\infty$.

2. 'Inner regular': $\mu(E)=\sup\{\mu(K):K \subset E, K\text{ compact}\}.$

One may say here $\mu$ is the 'inner measure'. Finally, let $\mathfrak{M}$ be the class of all $E \subset X$ such that for every compact $K$, we have $E \cap K \in \mathfrak{M}_F$. We shall show that $\mathfrak{M}$ is the desired $\sigma$-algebra.

Remarks of Step 0. So far, we have only proved that $\mu(E) \geq 0$ for all $E \subset X$. What about the countable additivity? It's clear that $\mathfrak{M}_F$ and $\mathfrak{M}$ have some strong relation. We need to get a clearer view of it. Also, if we restrict $\mu$ to $\mathfrak{M}_F$, we restrict ourselves to finite numbers. In fact, we will finally show that $\mathfrak{M}_F \subset \mathfrak{M}$.

#### Step 1 - The 'measure' of compact sets (outer)

If $K$ is compact, then $K \in \mathfrak{M}_F$, and $\mu(K)=\inf\{\Lambda{f}:K \prec f\}<\infty$

Define $V_\alpha=f^{-1}((\alpha,1])$ for $K \prec f$ and $0 < \alpha < 1$ (such an $f$ exists by Urysohn's lemma with $V=X$). Since $f$ is continuous, $V_\alpha$ is open, and since $f(x)=1$ for all $x \in K$, we have $K \subset V_{\alpha}$. Therefore by definition of $\mu$ for all $E \subset X$, we have $\mu(K) \leq \mu(V_\alpha)=\sup\{\Lambda{g}:g \prec V_{\alpha}\} \leq \frac{1}{\alpha}\Lambda{f}.$ Here we used that $f \geq \alpha{g}$ whenever $g \prec V_{\alpha}$, since $\alpha{g} \leq \alpha < f$ on $V_\alpha$ and $g=0$ elsewhere. Since $\mu(K)$ is a lower bound of $\frac{1}{\alpha}\Lambda{f}$ with $0<\alpha<1$, we see $\mu(K) \leq \inf_{\alpha \in (0,1)}\{\frac{1}{\alpha}\Lambda{f}\}=\Lambda{f}.$ Since $\Lambda{f}$ is a real number, we get $\mu(K) <\infty$. Since $K$ itself is compact, the inner regularity condition holds trivially, and we see $K \in \mathfrak{M}_F$.

To prove the identity, let $\varepsilon>0$. By definition of $\mu(K)$, there exists some open $V \supset K$ such that $\mu(V)<\mu(K)+\varepsilon$. By Urysohn's lemma, there exists some $h \in C_c(X)$ such that $K \prec h \prec V$. Therefore $\Lambda{h} \leq \mu(V) < \mu(K)+\varepsilon.$ Together with $\mu(K) \leq \Lambda{h}$ from above, this shows that $\mu(K)$ is the infimum of $\Lambda{h}$ with $K \prec h$.
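A quick numerical picture of this identity, under strong simplifying assumptions: take $X=\mathbb{R}$, $K=[0,1]$, and let $\Lambda$ be the ordinary integral, approximated here by a midpoint rule. The trapezoids $h_\delta$ below satisfy $K \prec h_\delta$, and $\Lambda h_\delta = 1+\delta$ decreases to $1$, the length of $K$. The names `trapezoid` and `Lambda` are stand-ins invented for this sketch.

```python
def trapezoid(x, delta):
    """h_delta: equal to 1 on [0, 1], with linear ramps of width delta on each side."""
    if 0.0 <= x <= 1.0:
        return 1.0
    if -delta < x < 0.0:
        return (x + delta) / delta
    if 1.0 < x < 1.0 + delta:
        return (1.0 + delta - x) / delta
    return 0.0

def Lambda(func, lo=-2.0, hi=3.0, steps=100_000):
    """Midpoint-rule stand-in for the positive linear functional f -> integral of f."""
    h = (hi - lo) / steps
    return sum(func(lo + (k + 0.5) * h) for k in range(steps)) * h

deltas = (1.0, 0.1, 0.01)
values = [Lambda(lambda x, d=d: trapezoid(x, d)) for d in deltas]
for d, v in zip(deltas, values):
    print(d, v)   # each value should be close to 1 + delta
```

Every $\Lambda h_\delta$ stays above $1$ while coming arbitrarily close to it, which is the $\varepsilon$-definition of $\inf$ in action.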

Remarks of Step 1. We have just proved assertion 1 of the property of $\mu$. The hardest part of this proof is the inequality $\mu(V)<\mu(K)+\varepsilon.$ But this is merely the $\varepsilon$-definition of $\inf$. Note that $\mu(K)$ is the infimum of $\mu(V)$ with $V \supset K$. For any $\varepsilon>0$, there exists some open $V$ for what? Under certain conditions, this definition is much easier to use. Now we will examine the relation between $\mathfrak{M}_F$ and $\tau_X$, namely the topology of $X$.

#### Step 2 - The 'measure' of open sets (inner)

$\mathfrak{M}_F$ contains every open set $V$ with $\mu(V)<\infty$.

It suffices to show that for an open set $V$, we have $\mu(V)=\sup\{\mu(K):K \subset V, K\text{ compact}\}.$ For $0<\varepsilon<\mu(V)$, we see there exists an $f \prec V$ such that $\Lambda{f}>\mu(V)-\varepsilon$. If $W$ is any open set which contains $K= \operatorname{supp}(f)$, then $f \prec W$, and therefore $\Lambda{f} \leq \mu(W)$. Taking the infimum over all such $W$, the definition of $\mu(K)$ gives $\Lambda{f}\leq\mu(K).$ Therefore $\mu(V)-\varepsilon<\Lambda{f}\leq\mu(K)\leq\mu(V).$ This is exactly the $\varepsilon$-definition of $\sup$. The identity is proved.

Remarks of Step 2. It's important to note that this identity is only guaranteed for open sets and for sets $E \in \mathfrak{M}$ with $\mu(E)<\infty$, the latter of which will be proved in the following steps. This is a limitation of the theorem. With these preparations however, we are able to show the countable additivity of $\mu$ on $\mathfrak{M}_F$.

#### Step 3 - The subadditivity of $\mu$ on $2^X$

If $E_1,E_2,E_3,\cdots$ are arbitrary subsets of $X$, then $\mu\left(\bigcup_{k=1}^{\infty}E_k\right) \leq \sum_{k=1}^{\infty}\mu(E_k)$

First we show this holds for finitely many open sets. By induction, this is tantamount to showing that $\mu(V_1 \cup V_2)\leq \mu(V_1)+\mu(V_2)$ if $V_1$ and $V_2$ are open. Pick $g \prec V_1 \cup V_2$. This is possible due to Urysohn's lemma. By corollary 1.3 applied to the compact set $\operatorname{supp}(g)$, there is a partition of unity $(h_1,h_2)$ subordinate to $(V_1,V_2)$ with $h_1+h_2=1$ on $\operatorname{supp}(g)$. Therefore, \begin{aligned} \Lambda(g)&=\Lambda((h_1+h_2)g) \\ &=\Lambda(h_1g)+\Lambda(h_2g) \\ &\leq\mu(V_1)+\mu(V_2). \end{aligned} The last inequality holds because $h_1g \prec V_1$ and $h_2g \prec V_2$. By taking the supremum over all such $g$, we have $\mu(V_1 \cup V_2)\leq \mu(V_1)+\mu(V_2).$

Now we go back to arbitrary subsets of $X$. If $\mu(E_i)=\infty$ for some $i$, then there is nothing to prove. Therefore we shall assume that $\mu(E_i)<\infty$ for all $i$. Let $\varepsilon>0$. By definition of $\mu(E_i)$, we see there are open sets $V_i \supset E_i$ such that $\mu(V_i)<\mu(E_i)+\frac{\varepsilon}{2^i}.$ Put $V=\bigcup_{i=1}^{\infty}V_i$, and choose $f \prec V$. Since $f \in C_c(X)$, there is a finite collection of $V_i$ that covers the support of $f$. Therefore without loss of generality, we may say that $f \prec V_1 \cup V_2 \cup \cdots \cup V_n$ for some $n$. We therefore obtain \begin{aligned} \Lambda{f} &\leq \mu(V_1 \cup V_2 \cup \cdots \cup V_n) \\ &\leq \mu(V_1)+\mu(V_2)+\cdots+\mu(V_n) \\ &\leq \sum_{i=1}^{n}\left(\mu(E_i)+\frac{\varepsilon}{2^i}\right) \\ &\leq \sum_{i=1}^{\infty}\mu(E_i)+\varepsilon, \end{aligned} for all $f \prec V$. Since $\bigcup E_i \subset V$, we have $\mu(\bigcup E_i) \leq \mu(V)$. Therefore $\mu(\bigcup_{i=1}^{\infty}E_i)\leq\mu(V)=\sup\{\Lambda{f}:f \prec V\}\leq\sum_{i=1}^{\infty}\mu(E_i)+\varepsilon.$ Since $\varepsilon$ is arbitrary, the inequality is proved.

Remarks of Step 3. Again, we are using the $\varepsilon$-definition of $\inf$. One may say this step showed the subadditivity of the outer measure. Also note the geometric series $\sum_{k=1}^{\infty}\frac{\varepsilon}{2^k}=\varepsilon$.

#### Step 4 - Additivity of $\mu$ on $\mathfrak{M}_F$

Suppose $E=\bigcup_{i=1}^{\infty}E_i$, where $E_1,E_2,\cdots$ are pairwise disjoint members of $\mathfrak{M}_F$, then $\mu(E)=\sum_{i=1}^{\infty}\mu(E_i).$ If $\mu(E)<\infty$, we also have $E \in \mathfrak{M}_F$.

As a dual to Step 3, we first show this holds for finitely many compact sets. As proved in Step 1, compact sets are in $\mathfrak{M}_F$. Suppose now $K_1$ and $K_2$ are disjoint compact sets. We want to show that $\mu(K_1 \cup K_2)=\mu(K_1)+\mu(K_2).$ Note that compact sets in a Hausdorff space are closed, so $K_2^c$ is open. Therefore we are able to apply Urysohn's lemma to the pair $(K_1,K_2^c)$. That is, there exists an $f \in C_c(X)$ such that $K_1 \prec f \prec K_2^c.$ In other words, $f(x)=1$ for all $x \in K_1$ and $f(x)=0$ for all $x \in K_2$, since $\operatorname{supp}(f) \cap K_2 = \varnothing$. By Step 1, since $K_1 \cup K_2$ is compact, for any $\varepsilon>0$ there exists some $g \in C_c(X)$ such that $K_1 \cup K_2 \prec g \quad \text{and} \quad \Lambda(g) < \mu(K_1 \cup K_2)+\varepsilon.$ Now things become tricky. We are able to write $g$ as $g=fg+(1-f)g.$ But $K_1 \prec fg$ and $K_2 \prec (1-f)g$ by the properties of $f$ and $g$. Also since $\Lambda$ is linear, Step 1 gives $\mu(K_1)+\mu(K_2) \leq \Lambda(fg)+\Lambda((1-f)g)=\Lambda(g) < \mu(K_1 \cup K_2)+\varepsilon.$ Since $\varepsilon$ is arbitrary, we have $\mu(K_1)+\mu(K_2) \leq \mu(K_1 \cup K_2).$ On the other hand, by Step 3, we have $\mu(K_1 \cup K_2) \leq \mu(K_1)+\mu(K_2).$ Therefore they must be equal.

If $\mu(E)=\infty$, there is nothing to prove, since Step 3 gives $\sum_{i}\mu(E_i) \geq \mu(E)=\infty$. So now we should assume that $\mu(E)<\infty$. Since $E_i \in \mathfrak{M}_F$, for every $\varepsilon>0$ there are compact sets $K_i \subset E_i$ with $\mu(K_i) > \mu(E_i)-\frac{\varepsilon}{2^i}.$ Putting $H_n=K_1 \cup K_2 \cup \cdots \cup K_n$, we see $E \supset H_n$ and, by the additivity on disjoint compact sets just proved (and induction), $\mu(E) \geq \mu(H_n)=\sum_{i=1}^{n}\mu(K_i)>\sum_{i=1}^{n}\mu(E_i)-\varepsilon.$ This inequality holds for all $n$ and $\varepsilon$, therefore $\mu(E) \geq \sum_{i=1}^{\infty}\mu(E_i).$ Combined with Step 3, the identity holds.

Finally we shall show that $E \in \mathfrak{M}_F$ if $\mu(E) <\infty$. To make it more understandable, we will use elementary calculus notation. If we write $\mu(E)=x$ and $x_n=\sum_{i=1}^{n}\mu(E_i)$, we see $\lim_{n \to \infty}x_n=x.$ Therefore, for any $\varepsilon>0$, there exists some $N \in \mathbb{N}$ such that $x-x_N<\varepsilon.$ This is tantamount to $\mu(E)<\sum_{i=1}^{N}\mu(E_i)+\varepsilon.$ But by definition of the compact set $H_N$ above, we see $\mu(E)<{\color{red}{\sum_{i=1}^{N}\mu(E_i)}}+\varepsilon<{\color{red}{\mu(H_N)+\varepsilon}}+\varepsilon=\mu(H_N)+2\varepsilon.$ Hence $E$ satisfies the requirements of $\mathfrak{M}_F$, and is thus an element of it.

Remarks of Step 4. You should realize that we are heavily using the $\varepsilon$-definition of $\sup$ and $\inf$. As you may guess, $\mathfrak{M}_F$ should be a subset of $\mathfrak{M}$ though we don't know whether it is a $\sigma$-algebra or not. In other words, we hope that the countable additivity of $\mu$ holds on a $\sigma$-algebra that is properly extended from $\mathfrak{M}_F$. However it's still difficult to show that $\mathfrak{M}$ is a $\sigma$-algebra. We need more properties of $\mathfrak{M}_F$ to go on.

#### Step 5 - The 'continuity' of $\mathfrak{M}_F$.

If $E \in \mathfrak{M}_F$ and $\varepsilon>0$, there is a compact $K$ and an open $V$ such that $K \subset E \subset V$ and $\mu(V-K)<\varepsilon$.

Since $E \in \mathfrak{M}_F$, there are two ways to write $\mu(E)$, namely $\mu(E)=\sup\{\mu(K):K \subset E\} \quad \text{and} \quad \mu(E)=\inf\{\mu(V):V\supset E\}$ where $K$ is compact and $V$ is open. Therefore there exist a compact $K \subset E$ and an open $V \supset E$ such that $\mu(V)-\frac{\varepsilon}{2}<\mu(E)<\mu(K)+\frac{\varepsilon}{2}.$ Since $V-K$ is open, and $\mu(V-K)<\infty$, we have $V-K \in \mathfrak{M}_F$. Since $K$ and $V-K$ are disjoint members of $\mathfrak{M}_F$, Step 4 gives $\mu(K)+\mu(V-K)=\mu(V) <\mu(K)+\varepsilon.$ Therefore $\mu(V-K)<\varepsilon$ as claimed.

Remarks of Step 5. You should be familiar with the $\varepsilon$-definitions of $\sup$ and $\inf$ now. Since $V-K =V\cap K^c \subset V$, we have $\mu(V-K)\leq\mu(V)<\mu(E)+\frac{\varepsilon}{2}<\infty$.

#### Step 6 - $\mathfrak{M}_F$ is closed under certain operations

If $A,B \in \mathfrak{M}_F$, then $A-B,A\cup B$ and $A \cap B$ are elements of $\mathfrak{M}_F$.

This shows that $\mathfrak{M}_F$ is closed under union, intersection and relative complement. In fact, we merely need to prove $A-B \in \mathfrak{M}_F$, since $A \cup B=(A-B) \cup B$ and $A\cap B = A-(A-B)$.

By Step 5, for $\varepsilon>0$, there are compact sets $K_A$, $K_B$ and open sets $V_A$, $V_B$ such that $K_A \subset A \subset V_A$, $K_B \subset B \subset V_B$, $\mu(V_A-K_A)<\varepsilon$ and $\mu(V_B-K_B)<\varepsilon$. For $A-B$ we have $A-B \subset V_A-K_B \subset (V_A-K_A) \cup (K_A-V_B) \cup (V_B-K_B).$ With an application of Step 3 and 5, we have $\mu(A-B) \leq \mu(V_A-K_A)+\mu(K_A-V_B)+\mu(V_B-K_B)< \varepsilon+\mu(K_A-V_B)+\varepsilon.$ Since $K_A-V_B$ is a closed subset of $K_A$, we see $K_A-V_B$ is compact as well (a closed subset of a compact set is compact). Since $K_A-V_B \subset A-B$ and $\mu(A-B) <\mu(K_A-V_B)+2\varepsilon$, we see that $A-B$ meets the requirement of $\mathfrak{M}_F$ (the fact that $\mu(A-B)<\infty$ is trivial since $\mu(A-B)\leq\mu(A)$).
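The chain of inclusions used here can be verified by a short case analysis. This is only a sketch of one way to check it, using nothing beyond $K_A \subset A \subset V_A$ and $K_B \subset B \subset V_B$.

First, $A \subset V_A$ and $K_B \subset B$ give $A-B \subset V_A-K_B.$ Next take $x \in V_A-K_B$, that is, $x \in V_A$ and $x \notin K_B$. If $x \notin K_A$, then $x \in V_A-K_A.$ If $x \in K_A$ and $x \notin V_B$, then $x \in K_A-V_B.$ If $x \in K_A$ and $x \in V_B$, then $x \in V_B-K_B$ because $x \notin K_B$. In every case $x$ lies in $(V_A-K_A) \cup (K_A-V_B) \cup (V_B-K_B).$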

Since $A-B$ and $B$ are pairwise disjoint members of $\mathfrak{M}_F$, we see $\mu(A \cup B)=\mu(A-B)+\mu(B)<\infty.$ Thus $A \cup B \in \mathfrak{M}_F$. Since $A,A-B \in \mathfrak{M}_F$, we see $A \cap B = A-(A-B) \in \mathfrak{M}_F$.

Remarks of Step 6. In this step, we demonstrated several ways to express a set, all of which end up with a huge simplification. Now we are able to show that $\mathfrak{M}_F$ is a subset of $\mathfrak{M}$.

#### Step 7 - $\mathfrak{M}_F \subset \mathfrak{M}$

There is a precise relation between $\mathfrak{M}$ and $\mathfrak{M}_F$ given by $\mathfrak{M}_F=\{E \in \mathfrak{M}:\mu(E)<\infty\} \subset \mathfrak{M}.$

If $E \in \mathfrak{M}_F$, we shall show that $E \in \mathfrak{M}$. For compact $K\in\mathfrak{M}_F$ (Step 1), by Step 6, we see $K \cap E \in \mathfrak{M}_F$, therefore $E \in \mathfrak{M}$.

If $E \in \mathfrak{M}$ with $\mu(E)<\infty$ however, we need to show that $E \in \mathfrak{M}_F$. By definition of $\mu$, for $\varepsilon>0$, there is an open $V \supset E$ such that $\mu(V)<\mu(E)+\varepsilon<\infty.$ Therefore $V \in \mathfrak{M}_F$. By Step 5, there is a compact set $K \subset V$ such that $\mu(V-K)<\varepsilon$ (in Step 5, the open set containing $V$ can be taken to be $V$ itself). Since $E \cap K \in \mathfrak{M}_F$, there exists a compact set $H \subset E \cap K$ with $\mu(E \cap K)<\mu(H)+\varepsilon.$ Since $E \subset (E \cap K) \cup (V-K)$, it follows from the subadditivity in Step 3 that $\mu(E) \leq {\color{red}{\mu(E\cap K)}}+\mu(V-K)<{\color{red}{\mu(H)+\varepsilon}}+\varepsilon=\mu(H)+2\varepsilon.$ Therefore $E \in \mathfrak{M}_F$.

Remarks of Step 7. Several tricks in the preceding steps are used here. Now we are pretty close to the fact that $(X,\mathfrak{M},\mu)$ is a measure space. Note that for $E \in \mathfrak{M}-\mathfrak{M}_F$, we have $\mu(E)=\infty$, but we have already proved the countable additivity for $\mathfrak{M}_F$. Is it 'almost trivial' for $\mathfrak{M}$? Before that, we need to show that $\mathfrak{M}$ is a $\sigma$-algebra. Note that assertion 3 of $\mu$ has been proved.

#### Step 8 - $\mathfrak{M}$ is a $\sigma$-algebra in $X$ containing all Borel sets

We will validate the definition of $\sigma$-algebra one by one.

$X \in \mathfrak{M}$.

For any compact $K \subset X$, we have $K \cap X=K$. But as proved in Step 1, $K \in \mathfrak{M}_F$, therefore $X \in \mathfrak{M}$.

If $A \in \mathfrak{M}$, then $A^c \in\mathfrak{M}$.

If $A \in \mathfrak{M}$, then $A \cap K \in \mathfrak{M}_F$ for every compact $K$. But $K-(A \cap K)=K \cap(A^c \cup K^c)=(K\cap A^c) \cup (K \cap K^c)=K \cap A^c.$ By Step 1 and Step 6, we see $K \cap A^c \in \mathfrak{M}_F$, thus $A^c \in \mathfrak{M}$.

If $A_n \in \mathfrak{M}$ for all $n \in \mathbb{N}$, then $A=\bigcup_{n=1}^{\infty}A_n \in \mathfrak{M}$.

We define an auxiliary sequence of sets inductively. For $n=1$, we write $B_1=A_1 \cap K$ where $K$ is compact. Then $B_1 \in \mathfrak{M}_F$. For $n \geq 2$, we write $B_n=(A_n \cap K)-(B_1 \cup \cdots\cup B_{n-1}).$ Since $A_n \cap K \in \mathfrak{M}_F$ and $B_1,B_2,\cdots,B_{n-1} \in \mathfrak{M}_F$, by Step 6, $B_n \in \mathfrak{M}_F$. Also, the sets $B_n$ are pairwise disjoint.

Another set-theoretic manipulation shows that \begin{aligned} A \cap K&=K \cap\left(\bigcup_{n=1}^{\infty}A_n\right) \\ &=\bigcup_{n=1}^{\infty}(K \cap A_n) \\ &=\bigcup_{n=1}^{\infty}\left[B_n \cup(B_1 \cup \cdots\cup B_{n-1})\right] \\ &=\bigcup_{n=1}^{\infty}B_n. \end{aligned} Now we are able to evaluate $\mu(A \cap K)$ by Step 4, and to bound it using $A \cap K \subset K$: \begin{aligned} \mu(A \cap K)&=\sum_{n=1}^{\infty}\mu(B_n) \\ &\leq \mu(K) <\infty. \end{aligned} Therefore $A \cap K \in \mathfrak{M}_F$, which implies that $A \in \mathfrak{M}$.
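To make the disjointification concrete, here is a small illustrative case. The specific sets, and the choice of Lebesgue measure for $\mu$, are assumptions made only for this example and are not part of the proof.

Take $X=\mathbb{R}$, $K=[0,10]$, $A_1=[0,4]$, $A_2=[2,6]$, $A_3=[5,12]$ and $A_n=\varnothing$ for $n \geq 4$. Then $B_1=[0,4], \quad B_2=[2,6]-[0,4]=(4,6], \quad B_3=[5,10]-[0,6]=(6,10],$ and $B_n=\varnothing$ for $n \geq 4$. The sets $B_n$ are pairwise disjoint and $B_1 \cup B_2 \cup B_3=[0,10]=A \cap K.$ With Lebesgue measure, $\mu(B_1)+\mu(B_2)+\mu(B_3)=4+2+4=10=\mu(A \cap K),$ whereas the overlapping pieces $A_n \cap K$ have total length $4+4+5=13$, which overcounts.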

$\mathfrak{M}$ contains all Borel sets.

Indeed, it suffices to prove that $\mathfrak{M}$ contains all open sets and/or closed sets. We'll show two different paths. Let $K$ be a compact set.

1. If $C$ is closed, then $C \cap K$ is compact, therefore $C \cap K \in \mathfrak{M}_F$ (by Step 1), and hence $C \in \mathfrak{M}$.
2. If $D$ is open, then $K-D$ is a closed subset of $K$, hence compact, and $D \cap K=K-(K-D)$. Therefore $D \cap K \in \mathfrak{M}_F$ by Step 1 and Step 6, which shows that $D \in \mathfrak{M}$.

Therefore by 1 or 2, $\mathfrak{M}$ contains all Borel sets.

#### Step 9 - $\mu$ is a positive measure on $\mathfrak{M}$

Again, we will verify all properties of $\mu$ one by one.

$\mu(E) \geq 0$ for all $E \in \mathfrak{M}$.

This follows immediately from the definition of $\mu$, since $\Lambda$ is positive and $0 \leq f \leq 1$.

$\mu$ is countably additive.

If $A_1,A_2,\cdots$ form a disjoint countable collection of members of $\mathfrak{M}$, we need to show that $\mu\left(\bigcup_{n=1}^{\infty}A_n\right)=\sum_{n=1}^{\infty}\mu(A_n).$ If $A_n \in \mathfrak{M}_F$ for all $n$, then this is merely what we have just proved in Step 4. If $A_j \in \mathfrak{M}-\mathfrak{M}_F$ for some $j$, however, then $\mu(A_j)=\infty$ by Step 7, so $\sum_n\mu(A_n)=\infty$. As for $\mu(\cup_n A_n)$, since $\cup_n A_n \supset A_j$, we have $\mu(\cup_n A_n) \geq \mu(A_j)=\infty$. The identity is now proved.

#### Step 10 - The completeness of $\mu$

So far assertions 1 to 3 have been proved, but the final assertion has not been proved explicitly. We prove it now since this property will be used when discussing the Lebesgue measure $m$. In fact, this will show that $(X,\mathfrak{M},\mu)$ is a complete measure space.

If $E \in \mathfrak{M}$, $A \subset E$, and $\mu(E)=0$, then $A \in \mathfrak{M}$.

It suffices to show that $A \in \mathfrak{M}_F$. By monotonicity, $\mu(A)=0$ as well. If $K \subset A$, where $K$ is compact, then $\mu(K)=\mu(A)=0$. Therefore $0$ is the supremum of $\mu(K)$ over compact $K \subset A$, and it equals $\mu(A)$. It follows that $A \in \mathfrak{M}_F \subset \mathfrak{M}$.

#### Step 11 - The functional and the measure

For every $f \in C_c(X)$, $\Lambda{f}=\int_X fd\mu$.

This is the absolute main result of the theorem. It suffices to prove the inequality $\Lambda f \leq \int_X fd\mu$ for all real $f \in C_c(X)$ (the complex case then follows by applying the real case to the real and imaginary parts). What about the other side? By the linearity of $\Lambda$ and $\int_X \cdot d\mu$, once the inequality above is proved, we have $\Lambda(-f)=-\Lambda{f}\leq\int_{X}-fd\mu=-\int_Xfd\mu.$ Therefore $\Lambda{f} \geq \int_X fd\mu$ holds as well, and this establishes the equality.

Notice that since $K=\operatorname{supp}(f)$ is compact, the range of $f$ is compact as well. Namely we may assume that $[a,b]$ contains the range of $f$. For $\varepsilon>0$, we are able to pick a partition around $[a,b]$ such that $y_i - y_{i-1}<\varepsilon$ for $i=1,2,\cdots,n$ and $y_0 < a < y_1<\cdots<y_n=b.$ Put $E_i=\{x:y_{i-1}< f(x) \leq y_i\}\cap K.$ Since $f$ is continuous, $f$ is Borel measurable. The sets $E_i$ are trivially pairwise disjoint Borel sets. Again, there are open sets $V_i \supset E_i$ such that $\mu(V_i) < \mu(E_i)+\frac{\varepsilon}{n}$ for $i=1,2,\cdots,n$, and such that $f(x)<y_i + \varepsilon$ for all $x \in V_i$. Notice that $(V_i)$ covers $K$, therefore by the partition of unity, there are functions $h_1,\cdots,h_n$ such that $h_i \prec V_i$ for all $i$ and $\sum h_i=1$ on $K$. Hence $f=\sum_i h_if$, and since $\sum_i h_i=1$ on $K$, Step 1 gives $\mu(K) \leq \Lambda(\sum_i h_i)=\sum_i \Lambda{h_i}.$ By the way we picked $V_i$, we see $h_if \leq (y_i+\varepsilon)h_i$. We have the following inequality: \begin{aligned} \Lambda{f} &= \sum_{i=1}^{n}\Lambda(h_if) \leq\sum_{i=1}^{n}(y_i+\varepsilon)\Lambda{h_i} \\ &= \sum_{i=1}^{n}\left(|a|-|a|+y_i+\varepsilon\right)\Lambda{h_i} \\ &=\sum_{i=1}^{n}(|a|+y_i+\varepsilon)\Lambda{h_i}-|a|\sum_{i=1}^{n}\Lambda{h_i}. \end{aligned} Since $h_i \prec V_i$, we have $\mu(E_i)+\frac{\varepsilon}{n}>\mu(V_i) \geq \Lambda{h_i}$. And we already have $\sum_i \Lambda{h_i} \geq \mu(K)$. Since $|a|+y_i+\varepsilon>0$ for every $i$ (this is why $|a|$ was added and subtracted), we can put them into the inequality above and get \begin{aligned} \Lambda{f} &\leq \sum_{i=1}^{n}(|a|+y_i+\varepsilon)\Lambda{h_i}-|a|\sum_{i=1}^{n}\Lambda{h_i} \\ &\leq \sum_{i=1}^{n}(|a|+y_i+\varepsilon){\color{red}{(\mu(E_i)+\frac{\varepsilon}{n})}}-|a|{\color{red}{\mu(K)}}. \end{aligned} Observe that $\cup_i E_i=K$, so by Step 9 we have $\sum_{i}\mu(E_i)=\mu(K)$. 
A slight manipulation shows that \begin{aligned} \sum_{i=1}^{n}(|a|+y_i+\varepsilon)\mu(E_i)-|a|\mu(K)&=|a|\sum_{i=1}^{n}\mu(E_i)-|a|\mu(K)+\sum_{i=1}^{n}(y_i+\varepsilon)\mu(E_i) \\ &=\sum_{i=1}^{n}(y_i-\varepsilon)\mu(E_i)+2\varepsilon\mu(K). \end{aligned} Therefore for $\Lambda f$ we get \begin{aligned} \Lambda{f} &\leq\sum_{i=1}^{n}(|a|+y_i+\varepsilon)(\mu(E_i)+\frac{\varepsilon}{n})-|a|\mu(K) \\ &=\sum_{i=1}^{n}(y_i-\varepsilon)\mu(E_i)+2\varepsilon\mu(K)+\frac{\varepsilon}{n}\sum_{i=1}^n(|a|+y_i+\varepsilon). \end{aligned} Now here comes the trickiest part of the whole blog post. By definition of $E_i$, we see $f(x) > y_{i-1}>y_{i}-\varepsilon$ for $x \in E_i$. Therefore we get a simple function $s_n \leq f$ by $s_n=\sum_{i=1}^{n}(y_i-\varepsilon)\chi_{E_i}.$ If we compare the Lebesgue integrals of $s_n$ and $f$ with respect to $\mu$, we see $\int_X s_nd\mu={\color{red}{\sum_{i=1}^{n}(y_i-\varepsilon)\mu(E_i)}} \leq {\color{red}{\int_X fd\mu}}.$ For $2\varepsilon\mu(K)$, things are simple since $0\leq\mu(K)<\infty$. Therefore $2\varepsilon\mu(K) \to 0$ as $\varepsilon \to 0$. Now let's estimate the final part of the inequality. It's trivial that $\frac{\varepsilon}{n}\sum_{i=1}^{n}(|a|+\varepsilon)=\varepsilon(\varepsilon+|a|)$. For $y_i$, observe that $y_i \leq b$ for all $i$, therefore $\frac{\varepsilon}{n}\sum_{i=1}^{n}y_i \leq \frac{\varepsilon}{n}nb=\varepsilon b$. Thus ${\color{green}{\frac{\varepsilon}{n}\sum_{i=1}^{n}(|a|+y_i+\varepsilon)}} \leq {\color{green}{\varepsilon(|a|+b+\varepsilon)}}.$ Notice that $b+|a| \geq 0$ since $b \geq a \geq -|a|$. Our estimation of $\Lambda{f}$ is finally done: \begin{aligned} \Lambda{f} &\leq{\color{red}{\sum_{i=1}^{n}(y_i-\varepsilon)\mu(E_i)}}+2\varepsilon\mu(K)+{\color{green}{\frac{\varepsilon}{n}\sum_{i=1}^n(|a|+y_i+\varepsilon)}} \\ &\leq{\color{red}{\int_Xfd\mu}}+2\varepsilon\mu(K)+{\color{green}{\varepsilon(|a|+b+\varepsilon)}} \\ &= \int_X fd\mu+\varepsilon(2\mu(K)+|a|+b+\varepsilon). \end{aligned} Since $\varepsilon$ is arbitrary, we see $\Lambda{f} \leq \int_X fd\mu$. The identity is proved.

#### Step 12 - The uniqueness of $\mu$

If there are two measures $\mu_1$ and $\mu_2$ that satisfy assertions 1 to 4 and correspond to $\Lambda$, then $\mu_1=\mu_2$.

In fact, according to assertion 2 and 3, $\mu$ is determined by the values on compact subsets of $X$. It suffices to show that

If $K$ is a compact subset of $X$, then $\mu_1(K)=\mu_2(K)$.

Fix $K$ compact and $\varepsilon>0$. By the outer regularity of $\mu_2$, there exists an open $V \supset K$ such that $\mu_2(V)<\mu_2(K)+\varepsilon$. By Urysohn's lemma, there exists some $f$ such that $K \prec f \prec V$. Hence $\mu_1(K)=\int_X\chi_Kd\mu_1 \leq\int_X fd\mu_1=\Lambda{f}=\int_X fd\mu_2 \leq \int_X \chi_V d\mu_2=\mu_2(V)<\mu_2(K)+\varepsilon.$ Since $\varepsilon$ is arbitrary, $\mu_1(K) \leq \mu_2(K)$. If $\mu_1$ and $\mu_2$ are exchanged, we see $\mu_2(K) \leq \mu_1(K)$. The uniqueness is proved.

## The flaw

Can we simply put $X=\mathbb{R}^k$ right now? The answer is no. Note that outer regularity holds for all sets, but inner regularity holds only for open sets and members of $\mathfrak{M}_F$. We would expect outer and inner regularity to be 'symmetric'. There is an example showing that local compactness is far from being enough to offer this 'symmetry'.

### A weird example

Define $X=\mathbb{R}_1 \times \mathbb{R}_2$, where $\mathbb{R}_1$ is the real line equipped with the discrete metric $d_1$, and $\mathbb{R}_2$ is the real line equipped with the Euclidean metric $d_2$. The metric of $X$ is defined by $d_X((x_1,y_1),(x_2,y_2))=d_1(x_1,x_2)+d_2(y_1,y_2).$ The topology $\tau_X$ induced by $d_X$ is naturally Hausdorff, and it is locally compact, as one sees by considering the vertical segments. So what would happen to this weird locally compact Hausdorff space?

If $f \in C_c(X)$, let $x_1,x_2,\cdots,x_n$ be those values of $x$ for which $f(x,y) \neq 0$ for at least one $y$. Since $f$ has compact support, it is ensured that there are only finitely many $x_i$'s. We are able to define a positive linear functional by $\Lambda f=\sum_{i=1}^{n}\int_{-\infty}^{+\infty}f(x_i,y)dy=\int_X fd\mu,$ where $\mu$ is the measure associated with $\Lambda$ in the sense of R-M-K theorem. Let $E=\mathbb{R}_1 \times \{0\}.$ By squeezing the disjoint vertical segments around $(x_i,0)$, we see $\mu(K)=0$ for all compact $K \subset E$ but $\mu(E)=\infty$.
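The 'squeezing' argument can be spelled out as follows. This is only a sketch of one possible justification; the auxiliary symbols $m$, $\delta$, $\delta_x$, $k$ and $N$ are introduced here for illustration.

For $\mu(K)=0$: distinct points of $E$ are at distance at least $1$ from each other, so a compact $K \subset E$ is finite, say $K=\{(x_1,0),\cdots,(x_m,0)\}$. For $\delta>0$ the set $V_\delta=\bigcup_{j=1}^{m}\{x_j\}\times(-\delta,\delta)$ is open and contains $K$, and every $f \prec V_\delta$ satisfies $\Lambda{f} \leq \sum_{j=1}^{m}\int_{-\delta}^{\delta}dy=2m\delta.$ Hence $\mu(K) \leq \mu(V_\delta) \leq 2m\delta$, and letting $\delta \to 0$ gives $\mu(K)=0$.

For $\mu(E)=\infty$: let $V \supset E$ be open. For each $x \in \mathbb{R}$ there is some $\delta_x>0$ with $\{x\}\times(-\delta_x,\delta_x) \subset V$. Since $\mathbb{R}$ is uncountable, there is an integer $k$ such that $\delta_x>\frac{1}{k}$ for infinitely many $x$. Given $N$, pick $N$ such points and, by Urysohn's lemma, some $f \prec V$ with $f(x_j,y)=1$ for $|y| \leq \frac{1}{2k}$ and $j=1,\cdots,N$. Then $\mu(V) \geq \Lambda{f} \geq \sum_{j=1}^{N}\int_{-1/(2k)}^{1/(2k)}dy=\frac{N}{k}.$ Since $N$ is arbitrary, $\mu(V)=\infty$ for every open $V \supset E$, so $\mu(E)=\infty$.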

This is in violent contrast to what we expect. However, if $X$ is required to be $\sigma$-compact (note that the space in this example is not), this kind of problem disappears neatly.

1. Walter Rudin, Real and Complex Analysis
2. Serge Lang, Fundamentals of Differential Geometry
3. Joel W. Robbin, Partition of Unity
4. Brian Conrad, Paracompactness and local compactness
5. Raoul Bott & Loring W. Tu, Differential Forms in Algebraic Topology

# The Lebesgue-Radon-Nikodym theorem and how von Neumann proved it

## An introduction

If one wants to learn the fundamental theorem of calculus in the sense of the Lebesgue integral, properties of measures have to be taken into account. In elementary calculus, one may consider something like $df(x)=f'(x)dx$ where $f$ is differentiable, say, everywhere on an interval. Now we restrict $f$ to be a differentiable and nondecreasing real function defined on $I=[a,b]$. Then we get a one-to-one function defined by $g(x)=x+f(x).$

For measurable sets $E\in\mathfrak{M}$, it can be seen that if $m(E)=0$, we have $m(g(E))=0$. Moreover, $g(E) \in \mathfrak{M}$, and $g$ is one-to-one. Therefore we can define a measure by $\mu(E)=m(g(E)).$ If we have a relation $\mu(E)=\int_{E}hdm$ (in fact, this is the Radon-Nikodym theorem we will prove later), the fundamental theorem of calculus for $f$ becomes somewhat clear: if $E=[a,x]$, then $g(E)=[a+f(a),x+f(x)]$, thus \begin{aligned} \mu(E)=m(g(E))&=g(x)-g(a)\\ &=f(x)-f(a)+\int_a^xdt \\ &=\int_a^xh(t)dt \end{aligned} which trivially implies $f(x)-f(a)=\int_a^x[h(t)-1]dt.$ So the function $h$ appears to be $g'=f'+1$.
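As a sanity check, the heuristic can be tested on one concrete function. The choice $f(x)=x^2$ on $[0,1]$ is an assumption made purely for illustration; it is differentiable and nondecreasing there, as required.

Here $g(x)=x+x^2$, and for $E=[0,x]$ we get $\mu(E)=m(g(E))=m([0,x+x^2])=x+x^2.$ The candidate density is $h(t)=g'(t)=1+2t$, and indeed $\int_0^x h(t)dt=\int_0^x(1+2t)dt=x+x^2=\mu(E).$ Subtracting $\int_0^x dt=x$ recovers $f(x)-f(0)=\int_0^x[h(t)-1]dt=\int_0^x 2tdt=x^2.$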

We are not proving the fundamental theorem here. But this gives rise to a question. Is it possible to find a function $h$ such that $\mu(E)=\int_{E}hdm$ (which one may write as $d\mu=hdm$)? More generally, can a measure $\lambda$ be described in this way with respect to another measure $\mu$? Does such an $h$ exist? Lots of questions. Luckily the Lebesgue decomposition and the Radon-Nikodym theorem answer them.

### Notations

Let $\mu$ be a positive measure on a $\sigma$-algebra $\mathfrak{M}$, let $\lambda$ be any arbitrary measure (positive or complex) defined on $\mathfrak{M}$.

We write $\lambda \ll \mu$ if $\lambda(E)=0$ for every $E\in\mathfrak{M}$ for which $\mu(E)=0$. (You may write $\mu \ll m$ in the previous section.) We say $\lambda$ is absolutely continuous with respect to $\mu$.

Another relation between measures worth consideration is being mutually singular. If we have $\lambda(E)=\lambda(A \cap E)$ for every $E \in \mathfrak{M}$, we say $\lambda$ is concentrated on $A$.

If we now have two measures $\mu_1$ and $\mu_2$, and two disjoint sets $A$ and $B$ such that $\mu_1$ is concentrated on $A$ and $\mu_2$ is concentrated on $B$, we say $\mu_1$ and $\mu_2$ are mutually singular, and write $\mu_1 \perp \mu_2.$
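Two standard examples may help fix these notions. They are illustrations only; the Dirac measure $\delta_0$ and the density $e^{-t^2}$ are choices made here, not objects used elsewhere in this post.

On $\mathbb{R}$, let $\delta_0(E)=1$ if $0 \in E$ and $\delta_0(E)=0$ otherwise. Then $\delta_0(E)=\delta_0(E \cap \{0\})$, so $\delta_0$ is concentrated on $\{0\}$. Since $m(\{0\})=0$, we also have $m(E)=m(E \cap (\mathbb{R}-\{0\}))$, so $m$ is concentrated on $\mathbb{R}-\{0\}$. Hence $\delta_0 \perp m.$ Note that $\delta_0 \ll m$ fails, because $m(\{0\})=0$ while $\delta_0(\{0\})=1$.

On the other hand, if $\lambda(E)=\int_E e^{-t^2}dm(t),$ then $m(E)=0$ implies $\lambda(E)=0$, so $\lambda \ll m$.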

Let $\mu$ be a positive $\sigma$-finite measure on $\mathfrak{M}$, and $\lambda$ a complex measure on $\mathfrak{M}$.

• There exists a unique pair of complex measures $\lambda_{ac}$ and $\lambda_{s}$ on $\mathfrak{M}$ such that

$\lambda = \lambda_{ac}+\lambda_s \quad \lambda_{ac}\ll\mu\quad \lambda_s \perp \mu$

• There is a unique $h \in L^1(\mu)$ such that

$\lambda_{ac}(E)=\int_{E}hd\mu$

for every $E \in \mathfrak{M}$.

The unique pair $(\lambda_{ac},\lambda_s)$ is called the Lebesgue decomposition; the existence of $h$ is called the Radon-Nikodym theorem, and $h$ is called the Radon-Nikodym derivative. One also writes $d\lambda_{ac}=hd\mu$ or $\frac{d\lambda_{ac}}{d\mu}=h$ in this situation.
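A finite example shows what the two statements say in practice. The space and the measures below are assumptions chosen only to make every quantity computable by hand.

Let $X=\{1,2,3\}$ with $\mathfrak{M}$ the collection of all subsets, $\mu=\delta_1+\delta_2$ and $\lambda=2\delta_1+5\delta_3$, where $\delta_x$ is the unit point mass at $x$. Then $\lambda_{ac}=2\delta_1, \quad \lambda_s=5\delta_3.$ Indeed, $\mu(E)=0$ only when $E \subset \{3\}$, and then $\lambda_{ac}(E)=0$, so $\lambda_{ac} \ll \mu$; moreover $\lambda_s$ is concentrated on $\{3\}$ while $\mu$ is concentrated on $\{1,2\}$, so $\lambda_s \perp \mu$. The Radon-Nikodym derivative is $h(1)=2, \quad h(2)=0,$ with $h(3)$ arbitrary since $\mu(\{3\})=0$; one checks $\int_E hd\mu=\sum_{x \in E \cap \{1,2\}}h(x)=\lambda_{ac}(E).$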

These are two separate theorems, but von Neumann gave the idea to prove these two at one stroke.

If we already have $\lambda \ll \mu$, then $\lambda_s=0$ and the Radon-Nikodym derivative shows up in the nature of things.

Also, one cannot ignore the fact that $m$ the Lebesgue measure is $\sigma$-finite.

## Proof explained

### Step 1 - Construct a bounded functional

We are going to employ Hilbert space technique in this proof. Precisely speaking, we are going to construct a bounded linear functional to find another function, namely $g$, which is the epicentre of this proof.

The boundedness of $\lambda$ is clear since it's complex, but $\mu$ is only assumed to be $\sigma$-finite. Therefore we need some adjustment onto $\mu$.

#### 1.1 Replacing $\mu$ with a finite measure

If $\mu$ is a positive $\sigma$-finite measure on a $\sigma$-algebra $\mathfrak{M}$ in a set $X$, then there is a function $w$ such that $w \in L^1(\mu)$ and $0<w(x)<1$ for every $x \in X$.

The $\sigma$-finiteness of $\mu$ means that there exist some sets $E_n \in \mathfrak{M}$ such that $X=\bigcup_{n=1}^{\infty}E_n$ and that $\mu(E_n)<\infty$ for all $n$.

Define \begin{aligned} w_n(x)= \begin{cases} \frac{1}{2^n(1+\mu(E_n))}\quad &x \in E_n \\ 0 \quad &x\notin E_n \end{cases} \end{aligned} (you can also say that $w_n=\frac{1}{2^n(1+\mu(E_n))}\chi_{E_n}$). Then $w = \sum_{n=1}^{\infty}w_n$ satisfies $0<w<1$ for all $x$. With $w$, we are able to define a new measure, namely $\tilde{\mu}(E)=\int_{E}wd\mu.$ The fact that $\tilde{\mu}$ is a measure can be validated by considering $\int_{E}wd\mu=\int_{X}\chi_{E}wd\mu$. It's more important that $\tilde{\mu}$ is bounded and that $\tilde{\mu}(E)=0$ if and only if $\mu(E)=0$. The second one comes from the strict positivity of $w$. For the first one, notice that, by the monotone convergence theorem, \begin{aligned} \tilde{\mu}(X) &= \int_X wd\mu = \sum_{n=1}^{\infty}\int_X w_nd\mu \\ &= \sum_{n=1}^{\infty}\frac{\mu(E_n)}{2^n(1+\mu(E_n))} \\ &\leq \sum_{n=1}^{\infty}\frac{1}{2^n}=1. \end{aligned} In particular, $w \in L^1(\mu)$.
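For a concrete picture, consider the Lebesgue measure $m$ on $\mathbb{R}$. The exhausting sets $E_n=[-n,n]$ are an assumption made for this illustration; any other choice of sets of finite measure covering $\mathbb{R}$ would work equally well.

Here $m(E_n)=2n$, so $w_n=\frac{1}{2^n(1+2n)}\chi_{[-n,n]}$ and, since $x \in E_n$ exactly when $n \geq |x|$, $w(x)=\sum_{n \geq |x|}\frac{1}{2^n(1+2n)}.$ Every $x$ lies in some $E_n$, so $w(x)>0$, and $w(x) \leq w(0)<\frac{1}{3}\sum_{n=1}^{\infty}\frac{1}{2^n}=\frac{1}{3}<1.$ Finally $\int_{\mathbb{R}}wdm=\sum_{n=1}^{\infty}\frac{2n}{2^n(1+2n)}<\sum_{n=1}^{\infty}\frac{1}{2^n}=1,$ so $w \in L^1(m)$ even though $m(\mathbb{R})=\infty$.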

#### 1.2 A bounded linear functional associated with $\lambda$

Since $\lambda$ is complex, it is bounded. We first assume in addition that $\lambda$ is a positive bounded measure on $\mathfrak{M}$; the general case is treated later. By 1.1, we are able to obtain a positive bounded measure by $\varphi=\lambda+\tilde{\mu}.$ Following the construction of the Lebesgue integral (simple functions first, then monotone limits), we have $\int_{X}fd\varphi=\int_{X}fd\lambda+\int_{X}fwd\mu$ for all nonnegative measurable functions $f$. Also, since $\lambda \leq \varphi$, we have $\left\vert \int_{X}fd\lambda \right\vert \leq \int_{X}|f|d\lambda \leq \int_{X}|f|d\varphi \leq \sqrt{\varphi(X)}\left\Vert f \right\Vert_2$ for $f \in L^2(\varphi)$ by the Schwarz inequality.

Since $\varphi(X)<\infty$, the map $\Lambda{f}=\int_{X}fd\lambda$ is a bounded linear functional on $L^2(\varphi)$.
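The last step of the estimate is the Schwarz inequality applied to the pair $|f|$ and the constant function $1$. Written out as a short sketch, with $\left\Vert \cdot \right\Vert_2$ denoting the norm of $L^2(\varphi)$:

$\int_X |f|\cdot 1d\varphi \leq \left(\int_X |f|^2d\varphi\right)^{\frac{1}{2}}\left(\int_X 1^2d\varphi\right)^{\frac{1}{2}}=\left\Vert f \right\Vert_2\sqrt{\varphi(X)}.$

In particular the operator norm of $\Lambda$ is at most $\sqrt{\varphi(X)}$, and this is exactly where the finiteness of $\varphi$ (hence the replacement of $\mu$ by $\tilde{\mu}$) is needed.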

### Step 2 - Find the associated function with respect to $\lambda$

Recall that $L^2(\varphi)$ is a Hilbert space, and every bounded linear functional on a Hilbert space $H$ is given by an inner product with an element in $H$ (this is where the completeness of $L^2(\varphi)$ is used). That is, there exists a function $g \in L^2(\varphi)$ such that $\Lambda{f}=\int_{X}fd\lambda=\int_{X}fgd\varphi=(f,g).$ The properties of the $L^2$ space show that $g$ is determined almost everywhere with respect to $\varphi$.

For $E \in \mathfrak{M}$, we got $0 \leq (\chi_{E},g)=\int_{E}gd\varphi=\int_{E}d\lambda=\lambda(E)\leq\varphi(E)$ which implies $0 \leq g \leq 1$ for almost every $x$ with respect to $\varphi$. Therefore we are able to assume that $0 \leq g \leq 1$ without ruining the identity. The proof is in the bag once we define $A$ to be the set where $0 \leq g < 1$ and $B$ the set where $g=1$.
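The passage from the inequality on sets to the pointwise bound $0 \leq g \leq 1$ is an averaging argument. Here is a sketch, assuming $g$ is real-valued (which can be arranged since $\int_E gd\varphi=\lambda(E)$ is real for every $E$); the sets $E_k$ and $F_k$ are introduced only for this argument.

If $\varphi(E)>0$, dividing by $\varphi(E)$ gives $0 \leq \frac{1}{\varphi(E)}\int_E gd\varphi \leq 1.$ Put $E_k=\{x:g(x) \geq 1+\frac{1}{k}\}$. If $\varphi(E_k)>0$, the average of $g$ over $E_k$ would be at least $1+\frac{1}{k}>1$, a contradiction; hence $\varphi(E_k)=0$ and $\varphi(\{g>1\}) \leq \sum_{k}\varphi(E_k)=0.$ The same argument with $F_k=\{x:g(x) \leq -\frac{1}{k}\}$ shows $\varphi(\{g<0\})=0$.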

### Step 3 - Generate $\lambda_{ac}$ and $\lambda_{s}$ and the Radon-Nikodym derivative at one stroke

We claim that $\lambda(A \cap E)$ and $\lambda(B \cap E)$ form the decomposition we are looking for, $\lambda_{ac}$ and $\lambda_s$, respectively. Namely, $\lambda_{ac}(E)=\lambda(A \cap E)$ and $\lambda_s(E)=\lambda(B \cap E)$ for $E \in \mathfrak{M}$.

#### Proving $\lambda_s \perp \mu$

If we combine $\Lambda{f}=(f,g)$ and $\varphi=\lambda+\tilde{\mu}$ together, we have $\int_{X}(1-g)fd\lambda=\int_{X}fgwd\mu.$ Putting $f=\chi_{B}$ and recalling that $g=1$ on $B$, we have $\int_{B}wd\mu=0.$ Since $w$ is strictly positive, we see that $\mu(B)=0$. Notice that $A \cap B = \varnothing$ and $A \cup B=X$. For $E \in \mathfrak{M}$, we write $E=E_A \cup E_B$, where $E_A \subset A$ and $E_B \subset B$. Therefore $\mu(E)=\mu(E_A)+\mu(E_B)=\mu(E \cap A)+\mu(E \cap B)=\mu(E \cap A).$ Therefore $\mu$ is concentrated on $A$.
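The identity at the start of this argument comes from expanding $d\varphi$. As a brief sketch, for bounded measurable $f$ (so that all integrals are finite):

$\int_X fd\lambda=\int_X fgd\varphi=\int_X fgd\lambda+\int_X fgwd\mu,$ and moving the $\lambda$-integral to the left gives $\int_X (1-g)fd\lambda=\int_X fgwd\mu.$ With $f=\chi_B$ the left side is $\int_B (1-g)d\lambda=0$ because $g=1$ on $B$, while the right side is $\int_B gwd\mu=\int_B wd\mu$.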

For $\lambda_s$, observe that $\lambda_s(E)=\lambda(E \cap B)=\lambda((E \cap B) \cap B)=\lambda_s(E \cap B).$ Hence $\lambda_s$ is concentrated on $B$. This observation shows that $\lambda_s \perp \mu$.

#### Proving $\lambda_{ac} \ll \mu$ by the Radon-Nikodym derivative

The relation $\lambda_{ac} \ll \mu$ will be shown through the existence of the Radon-Nikodym derivative.

If we replace $f$ by $(1+g+\cdots+g^n)\chi_E,$ where $E \in \mathfrak{M}$, we have $\int_X(1-g)fd\lambda=\int_E(1-g^{n+1})d\lambda=\int_Eg(1+g+\cdots+g^n)wd\mu.$ Notice that, since $g=1$ on $B$ and $0 \leq g<1$ on $A$, \begin{aligned} \int_{E}(1-g^{n+1})d\lambda &=\int\limits_{E \cap A}(1-g^{n+1})d\lambda + \int\limits_{E \cap B}(1-g^{n+1})d\lambda \\ &=\int\limits_{E \cap A}(1-g^{n+1})d\lambda \\ &\to\lambda(E \cap A) = \lambda_{ac}(E)\quad(n\to\infty). \end{aligned} Define $h_n=g(1+g+g^2+\cdots+g^n)w$. On $A$, $h_n$ converges monotonically to $h$, where \begin{aligned} h= \begin{cases} \frac{gw}{1-g} \quad &x\in{A}\\ 0 \quad &x\in{B} \end{cases} \end{aligned} (the values on $B$ do not matter since $\mu(B)=0$). By the monotone convergence theorem, we get $\lim_{n\to\infty}\int_{E}h_nd\mu = \int_{E}hd\mu=\lambda_{ac}(E)$ for every $E\in\mathfrak{M}$.

The measurable function $h$ is the desired Radon-Nikodym derivative once we show that $h \in L^1(\mu)$. Replacing $E$ with $X$, we see that $\int_{X}|h|d\mu=\int_{X}hd\mu=\lambda_{ac}(X)\leq\lambda(X)<\infty.$ Clearly, if $\mu(E)=0$, we have $\lambda_{ac}(E)=\int_{E}hd\mu=0$ which shows that $\lambda_{ac}\ll\mu$ as desired.
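It can be instructive to run the whole construction on a finite space. Everything below is an illustrative assumption: the space $X=\{1,2,3\}$ with all subsets measurable, the measures $\mu=\delta_1+\delta_2$ and $\lambda=2\delta_1+5\delta_3$ (with $\delta_x$ the unit point mass at $x$), and the constant weight $w=\frac{1}{2}$, which is admissible because $\mu$ is already finite.

Then $\tilde{\mu}=\frac{1}{2}\mu$ and $\varphi=\lambda+\tilde{\mu}$ has point masses $\varphi(\{1\})=\frac{5}{2}, \quad \varphi(\{2\})=\frac{1}{2}, \quad \varphi(\{3\})=5.$ The function $g$ with $\int_X fd\lambda=\int_X fgd\varphi$ is the ratio of point masses: $g(1)=\frac{2}{5/2}=\frac{4}{5}, \quad g(2)=0, \quad g(3)=1.$ So $A=\{1,2\}$ and $B=\{3\}$, which gives $\lambda_{ac}=2\delta_1$ and $\lambda_s=5\delta_3$. On $A$ the formula $h=\frac{gw}{1-g}$ yields $h(1)=\frac{(4/5)(1/2)}{1/5}=2, \quad h(2)=0,$ and indeed $\int_E hd\mu=2\delta_1(E)=\lambda_{ac}(E)$ for every $E$.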

### Step 4 - Generalization onto complex measures

So far we have proved this theorem for positive bounded measures. For a real bounded measure, we can apply the preceding case to its positive and negative parts. For a complex measure, we write $\lambda=\lambda_1+i\lambda_2$ where $\lambda_1$ and $\lambda_2$ are real, and apply the real case to each of them.

### Step 5 - Uniqueness of the decomposition

If we have two Lebesgue decompositions of the same measure, namely $(\lambda_{ac},\lambda_s)$ and $(\lambda'_{ac},\lambda'_s)$, we shall show that $\lambda_{ac}-\lambda_{ac}'=\lambda_s'-\lambda_s=0.$ By the definition of the decomposition we get $\lambda_{ac}-\lambda'_{ac}=\lambda'_s-\lambda_s$ with $\lambda_{ac}-\lambda_{ac}' \ll \mu$ and $\lambda_{s}'-\lambda_{s}\perp\mu$. This implies that $\lambda'_{s}-\lambda_{s} \ll \mu$ as well.

Since $\lambda'_s-\lambda_s\perp\mu$, there exists a set $A$ with $\mu(A)=0$ on which $\lambda'_s-\lambda_s$ is concentrated; the absolute continuity shows that $\lambda'_s(E)-\lambda_s(E)=0$ for all measurable $E \subset A$. Hence $\lambda_s'-\lambda_s$ is also concentrated on $X-A$. Therefore we get $(\lambda'_s-\lambda_s)\perp(\lambda'_s-\lambda_s)$, which forces $\lambda'_s-\lambda_s=0$. The uniqueness is proved.

(Following the same process one can also show that $\lambda_{ac}\perp\lambda_s$.)