Desvl's blog

Posted 2020-04-18 · Updated 2023-07-08

Starting from Scientific Notation

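When dealing with very large or very small numbers, scientific notation is certainly convenient: we neither have to write out many $0$s nor count how many there are. For example, the mass of Jupiter is about $1.9\times 10^{27}$ kg and the mass of an electron is about $9.1 \times 10^{-31}$ kg, and both are expressed with ease. Adding or multiplying these two numbers is not very hard either (at least compared with writing both numbers out in full).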

For a number written in scientific notation

$x=a \times 10^b$

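a lot of information is exposed naturally. $a$ tells us the concrete value of the number (perhaps only an estimate, but an easy one to make!), and $b$ tells us its order of magnitude. Computing a pH value by taking the common logarithm $\lg$ of a concentration relies on exactly this. Although the base $e$ is more widely used in mathematics, it is inconvenient for actual computation, whereas $\lg$ of a number like $10^b$ is trivial to compute.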
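The split into $a$ and $b$ can be made concrete. Below is a minimal Python sketch, assuming ordinary floating-point input; `to_scientific` is a hypothetical helper name, not something from the post.

```python
import math

def to_scientific(x: float) -> tuple[float, int]:
    """Split a nonzero x into (a, b) with x = a * 10**b and 1 <= |a| < 10."""
    if x == 0:
        raise ValueError("0 has no order of magnitude")
    b = math.floor(math.log10(abs(x)))  # order of magnitude
    a = x / 10.0 ** b                   # significand
    # Guard against rounding in log10 pushing |a| just outside [1, 10).
    if abs(a) >= 10:
        a, b = a / 10, b + 1
    elif abs(a) < 1:
        a, b = a * 10, b - 1
    return a, b

jupiter = to_scientific(1.9e27)     # approximately (1.9, 27)
electron = to_scientific(9.1e-31)   # approximately (9.1, -31)
# Multiplying: multiply the significands, add the exponents.
print(jupiter[0] * electron[0], jupiter[1] + electron[1])
```

Multiplication then reduces to one small product and one integer addition, which is the convenience described in the opening paragraph.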

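One might ask: these tools are handy, but they are mathematics from the age of Archimedes, more than 2000 years ago; what do they have to do with calculus, which came after the 17th century? In this post I will try to argue that infinitesimals, infinitely large quantities, and estimates of order are the scientific notation of calculus. Moreover, phrases like "approaching infinitely closely" and "approximating" look imprecise, but they are really an intuitive rendering of a rigorous statement, and the rigorous statement behind them has no problem at all.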

The "Scientific Notation" of Calculus

Getting to the Heart of the Definition of a Limit

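Words like "approaching infinitely closely" and "approximating" are vivid, but if the rigorous statement still seems incomprehensible, then something is still missing in one's understanding.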

The definition of $\lim_{x \to a}f(x) = A$ is: for every $\varepsilon>0$ there exists $\delta>0$ such that $|f(x)-A|<\varepsilon$ whenever $0<|x-a|<\delta$.

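Calculus teachers try many ways to make this idea concrete, for example drawing the graph of the function and zooming in so that one can see the conclusion hold. But a picture may only show the outcome. The key lies in the logical words. The definition of a limit is really a plain logical proposition, and the most complicated ingredient in it is merely an inequality: for every $\varepsilon>0$ there is at least one $\delta>0$ such that whenever an inequality involving $\delta$ holds, an inequality involving $\varepsilon$ must hold. If this condition is met, the function has the limit $A$. However complicated the problem, what we ultimately care about is whether certain inequalities hold. You can imagine writing a boolean function that, given any positive $\varepsilon$, always returns true (meaning such a $\delta$ exists). In fact, the later questions of which functions are differentiable, Riemann integrable, Lebesgue integrable, or measurable are all attempts to reduce matters to such a plain decision problem.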
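The "boolean function" picture can be sketched in Python. This is only a numerical illustration under stated assumptions, not part of the argument above: `delta_works`, `find_delta` and `limit_test` are hypothetical helpers that test a candidate $\delta$ on finitely many sample points, so they can refute a $\delta$ but can never prove that the limit exists.

```python
import math
from typing import Callable, Optional

def delta_works(f: Callable[[float], float], a: float, A: float,
                eps: float, delta: float, samples: int = 10_000) -> bool:
    """Check |f(x) - A| < eps at sample points with 0 < |x - a| < delta."""
    for k in range(1, samples + 1):
        t = delta * k / (samples + 1)      # 0 < t < delta
        for x in (a - t, a + t):           # both sides of a
            if not abs(f(x) - A) < eps:
                return False
    return True

def find_delta(f, a, A, eps, tries: int = 60) -> Optional[float]:
    """Halve delta, starting from 1, until the sampled test passes."""
    delta = 1.0
    for _ in range(tries):
        if delta_works(f, a, A, eps, delta):
            return delta
        delta /= 2
    return None

def limit_test(eps: float) -> bool:
    """The 'bool function' for lim_{x->0} sin(x)/x = 1: true if a delta is found."""
    return find_delta(lambda x: math.sin(x) / x, 0.0, 1.0, eps) is not None

print(limit_test(1e-3), limit_test(1e-6))
```

Changing the claimed limit $A$ to a wrong value makes every candidate $\delta$ fail, which is how the same logical statement separates true and false limit claims.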

The Notations $o$, $O$, $\sim$

$\sim$: infinitesimals (infinities) of the same order and equivalent infinitesimals (infinities)

Many people may have met this limit already in high school:

$\lim_{x\to 0}\frac{\sin{x}}{x}=1$

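In this situation we can write $\sin{x}\sim x\ (x\to 0)$. We may say that near $x=0$, $\sin{x}$ is very close to $y=x$, almost coinciding with it. But we may equally say: for every $\varepsilon>0$ there exists $\delta>0$ such that $\left|\frac{\sin{x}}{x}-1\right|<\varepsilon$ whenever $0<|x|<\delta$. Proving this may take some computational tricks, but the logical statement itself is plain.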
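For this particular limit a $\delta$ can even be written down explicitly. A sketch, assuming the standard inequalities $x-\frac{x^3}{6}\le\sin x\le x$ for $x\ge0$ (which follow from Taylor's theorem):

```latex
% Assumption: x - x^3/6 <= sin x <= x for x >= 0.
\begin{aligned}
&x>0:\quad 1-\frac{x^2}{6}\le\frac{\sin x}{x}\le 1
  \quad\Longrightarrow\quad \left|\frac{\sin x}{x}-1\right|\le\frac{x^2}{6},\\
&\text{the same bound holds for } x<0 \text{ because } \tfrac{\sin x}{x} \text{ is even},\\
&\delta=\sqrt{6\varepsilon}:\quad 0<|x|<\delta \;\Longrightarrow\;
  \left|\frac{\sin x}{x}-1\right|\le\frac{x^2}{6}<\frac{\delta^2}{6}=\varepsilon .
\end{aligned}
```

So the "computational trick" here amounts to one inequality, and the logical shape of the definition is untouched.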

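Precisely, for two functions $f(x)$ and $g(x)$, if
$\lim_{x \to a}\frac{f(x)}{g(x)}=1$
we write $f(x)\sim g(x)\ (x \to a)$. Here $-\infty\leq a \leq +\infty$. If moreover $|f(x)|\to\infty$ or $f(x) \to 0$, we are dealing with infinitely large or infinitely small quantities, and we call $f$ an infinity (infinitesimal) equivalent to $g$.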

The word "equivalent" here is actually quite apt. A simple computation verifies that the relation $\sim$ has the following three properties:

Reflexivity: $f \sim f$
Symmetry: $f \sim g$ if and only if $g \sim f$
Transitivity: if $f \sim g$ and $g \sim h$, then $f \sim h$

A relation with these three properties is called an "equivalence relation". Equality of real numbers is, of course, also an equivalence relation.

Requiring the ratio to tend to exactly $1$ may be too strict; more generally we may have infinitesimals of the same order:

$\lim_{x\to a}\frac{f(x)}{g(x)}=A\neq 0$

That is fine too: just divide both sides by $A$, that is, $f(x)\sim Ag(x)\ (x \to a)$. For example, $1-\cos{x}\sim\frac{1}{2}x^2\ (x\to0)$.
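A quick numerical look at $1-\cos x\sim\frac12x^2$, as a sketch only (a few sample points, not a proof). One assumption is worth stating: computing $1-\cos x$ directly cancels catastrophically in floating point, so the sketch uses the half-angle identity $1-\cos x=2\sin^2\frac{x}{2}$. The helper name is hypothetical.

```python
import math

def ratio_one_minus_cos(x: float) -> float:
    """(1 - cos x) / (x**2 / 2), with 1 - cos x computed as 2 sin^2(x/2)."""
    # For tiny x, cos x rounds to exactly 1.0, so 1 - cos(x) would give 0.0;
    # the half-angle form keeps full relative precision.
    one_minus_cos = 2.0 * math.sin(x / 2.0) ** 2
    return one_minus_cos / (x * x / 2.0)

for x in (1e-1, 1e-3, 1e-6, 1e-9):
    print(x, ratio_one_minus_cos(x))
```

The ratios approach $1$, in line with $A=\frac12$ in the same-order definition.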

Why introduce this concept?

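This is where the "scientific notation" from the beginning of the post shows up. Evaluating the actual function may be tedious (for example, when trigonometric or inverse trigonometric functions are involved), but one kind of function is always easy to compute: polynomials, or, going one step further back, power functions. We can even evaluate them numerically by hand. If we can reduce a complicated function to a power function, things become much simpler. This is like simplifying $9800000000000.00001 \times 82012$ to $9.8 \times 8.2 \times 10^{12+4}$: the error is acceptable. In other words, we try to use power functions to express the "order of magnitude" of a function, where "magnitude" refers to how fast the function approaches its limit at a point. Of course, we need not always reduce to power functions; we can also go the other way and turn the problem into a different limit that is easier to compute. Here is a classic example.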

Let $\alpha \neq 0$. Find the limit
$\lim_{x \to 0}\frac{(1+x)^\alpha-1}{x}$

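Let $y=(1+x)^\alpha-1$. A simple computation gives $\ln(1+y)=\alpha\ln(1+x)$. Note that $y\to0$ as $x\to0$ and $y\neq0$ when $x\neq0$, so $\ln(1+y)\sim y$ as well. Since we also know $\ln(1+x)\sim x\ (x\to0)$, putting everything together we have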

$\begin{aligned} \lim_{x \to 0}\frac{y(x)}{x}&=\lim_{x \to 0}\frac{y(x)}{x}\frac{\ln(1+y)}{\ln(1+y)} \\ &=\lim_{x \to 0}\frac{\ln(1+y)}{x}\frac{y}{\ln(1+y)}\\ &=\lim_{x \to 0}\frac{\alpha\ln(1+x)}{x}\lim_{x \to 0}\frac{y}{\ln(1+y)} \\ &=\alpha \cdot 1\cdot 1 \end{aligned}$

Here we used a basic property of limits:

If $\lim_{x \to a}f(x)=A$ and $\lim_{x \to a}g(x)=B$, then $\lim_{x \to a}f(x)g(x)=AB$.

Note that this result can be proved entirely in the $\varepsilon$-$\delta$ language.

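Solving this problem relied on a very simple fact: near $x=0$, $\ln(1+x)$ and $x$ have the same "order of magnitude". We then rewrote the original limit in terms of such ratios and combined them repeatedly. In fact, from $\ln(1+x)\sim x$ we can directly get $y \sim \ln(1+y) \sim \alpha\ln(1+x)\sim\alpha x$ (each $\sim$ here is justified; recall also that, as noted above, $\sim$ is transitive), and hence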

$\begin{aligned} \lim_{x \to 0}\frac{y(x)}{x} &= \lim_{x \to 0}\frac{y(x)}{x}\frac{\alpha x}{\alpha x} \\ &=\alpha\lim_{x \to 0}\frac{y(x)}{\alpha x}\\ &= \alpha \cdot 1 \end{aligned}$
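The same substitution suggests a numerically stable way to evaluate the quotient. A minimal sketch, assuming Python's `math.log1p` and `math.expm1` (which compute $\ln(1+x)$ and $e^t-1$ accurately for small arguments); `binomial_quotient` is a hypothetical name. The printed ratios should approach $\alpha$, consistent with the limit above, though sampling proves nothing.

```python
import math

def binomial_quotient(x: float, alpha: float) -> float:
    """((1 + x)**alpha - 1) / x for x > -1, x != 0, evaluated stably.

    Mirrors the substitution in the text:
    (1 + x)**alpha - 1 = exp(alpha * ln(1 + x)) - 1.
    """
    return math.expm1(alpha * math.log1p(x)) / x

for alpha in (0.5, -2.0, 3.7):
    print(alpha, [binomial_quotient(x, alpha) for x in (1e-2, 1e-5, 1e-10)])
```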

The order of an infinitesimal

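Since we are trying to "simplify" the behavior of a function near a point to that of a power function, we can use the exponent of a power function to define the order of an infinitesimal. That is,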

If $\lim_{x \to a}f(x)=0$ and there is a constant $\alpha>0$ such that
$\lim_{x \to a}\frac{|f(x)|}{|x-a|^\alpha}=A \neq 0$,
i.e. $|f(x)|\sim A|x-a|^\alpha$, then $f(x)$ is called an infinitesimal of order $\alpha$ as $x \to a$.

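Here $\alpha$ plays the role of the order of magnitude in scientific notation: it does not give us exact information, but in many settings the information it gives is enough, and it is exactly what we want.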

But not every function has an order. For example, $x\sin\frac{1}{x}$ has no order at $x=0$. Just look at

$\frac{x\sin\frac{1}{x}}{x^\alpha}$

For $0<\alpha<1$ this ratio tends to $0$; for $\alpha \geq 1$ it has no limit. So no $\alpha$ yields a nonzero limit, and the order does not exist.

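In real life, getting an order of magnitude wrong can cause a lot of trouble: $10^{26}$ and $10^{27}$ are very different. When computing limits, misjudging an order can lead straight to a wrong answer. The equivalence $\sim$ tells us something about ratios, but nothing about sums and differences (just as scientific notation tells us nothing about addition and subtraction). If we ignore this, we will get the following limit wrong: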

$\begin{aligned} \lim_{x \to 0}\frac{\sin{x}-\tan{x}}{x^3}&=\lim_{x \to 0}\left\{\left(\frac{\sin{x}}{x}\right)\left(\frac{1}{\cos{x}}\right)\left(\frac{\cos{x}-1}{x^2}\right)\right\}\\ &= -\frac{1}{2} \end{aligned}$

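This computation is of course fine. But without keeping the point above in mind, one can arrive at several different answers. Someone may reason that $\sin{x}\sim x \sim \tan{x}$, hence $\sin{x} \sim \tan{x}$, replace $\tan x$ by $\sin x$, and obtain the following wrong result: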

$\lim_{x \to 0}\frac{\sin{x}-\tan{x}}{x^3} = \lim_{x \to 0}\frac{\sin{x}-\sin{x}}{x^3}=0$

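This would suggest that $\sin{x}-\tan{x}$ has order greater than $3$, but in fact its order is exactly $3$. Other odd results are possible as well, for example by replacing only one of the two terms with $x$: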

$\begin{aligned} \lim_{x \to 0}\frac{x - \tan{x}}{x^3}&= -\frac{1}{3} \\ \lim_{x \to 0}\frac{\sin{x} - x}{x^3} &= -\frac{1}{6} \end{aligned}$

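(Both limits above are correct, but neither is the limit we wanted!) $\sin{x}$ is an infinitesimal of order $1$, and so is $\tan{x}$; their sum is of order $1$, but their difference is of order $3$. When computing limits, we must check whether the order changes when infinitesimals are added or subtracted. In the same way, we cannot conclude that the sum or difference of the masses of two planets, each of order $10^{20}$ kg, is also of order $10^{20}$ kg.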
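The claims about orders can be checked numerically. A heuristic sketch, assuming $|f(x)|\approx A|x-a|^\alpha$ already holds at the two sample points $h_1=10^{-2}$ and $h_2=10^{-3}$ (an assumption, not a theorem): the slope of $\log|f|$ against $\log|x-a|$ then estimates $\alpha$. `estimate_order` is a hypothetical helper.

```python
import math

def estimate_order(f, a=0.0, h1=1e-2, h2=1e-3):
    """Estimate alpha in |f(x)| ~ A |x - a|**alpha from two points right of a.

    If |f(a + h)| is close to A h**alpha, then
    log(|f(a + h1)| / |f(a + h2)|) / log(h1 / h2) is close to alpha.
    This is a numerical heuristic only; it proves nothing about limits.
    """
    return math.log(abs(f(a + h1)) / abs(f(a + h2))) / math.log(h1 / h2)

cases = {
    "sin x": math.sin,
    "tan x": math.tan,
    "sin x + tan x": lambda t: math.sin(t) + math.tan(t),
    "sin x - tan x": lambda t: math.sin(t) - math.tan(t),
}
for name, f in cases.items():
    print(f"{name:>14}: estimated order {estimate_order(f):.4f}")

# Ratios against x**3 near 0; only the first is the limit asked for.
x = 1e-3
print((math.sin(x) - math.tan(x)) / x**3)   # close to -1/2
print((x - math.tan(x)) / x**3)             # close to -1/3
print((math.sin(x) - x) / x**3)             # close to -1/6
```

The estimates come out near $1$, $1$, $1$ and $3$, matching the orders stated above for $\sin x$, $\tan x$, their sum and their difference.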

What do the notations $o$ and $O$ express?

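Let's start with an example. When talking about aircraft, rockets and other flying vehicles, we often bring in the speed of sound or even the speed of light, because units suited to everyday speeds are no longer adequate. One example is the Mach number, the ratio of a speed to the speed of sound. Sometimes we care whether an aircraft flies faster than sound, or even how many times the speed of sound it reaches. But the speed of a bicycle is not comparable with the speed of sound at all.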

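Here $o$ and $O$ likewise serve as a "reference". In the previous section we discussed the case where $\frac{f(x)}{g(x)}$ converges to a nonzero constant, but that need not happen.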

If $\lim_{x \to a}\frac{f(x)}{g(x)}=0$, we write $f(x)=o(g(x))\ (x \to a)$. If in addition $f(x) \to 0$ and $g(x) \to 0$ as $x \to a$, we say that $f$ is an infinitesimal of higher order than $g$.

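For instance, for $\alpha > 1$ we have $x^\alpha = o(x)\ (x \to 0^+)$, since the ratio $x^{\alpha-1}$ tends to $0$. Taking $g$ as the reference, $f$ tends to $0$ faster. "$=o(\cdot)$" can also be viewed as a relation, but it is only transitive and not an equivalence relation; the other two properties both fail. If $f(x)=o(g(x))$ and $g(x)=o(h(x))$, then $f(x)=o(h(x))$; but we never have $f(x)=o(f(x))$ (that ratio is $1$), let alone $g(x)=o(f(x))$ when $f(x)=o(g(x))$ (that ratio tends to $\infty$). All of this follows from simple limit computations.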

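If $f(x)=o(1)\ (x \to a)$, then $\lim_{x \to a}f(x)=0$. Writing it with $o$ looks a bit awkward here, but using $1$ as the denominator is perfectly fine! We are comparing $f$ with a constant function, so the statement records only whether $f$ tends to $0$ or not.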

$f(x)=O(g(x))\ (x \to a)$ means that there exist a deleted neighborhood of $a$ and a nonnegative constant $M$ such that $\left\vert\frac{f(x)}{g(x)}\right\vert \leq M$ throughout that neighborhood.

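A "deleted neighborhood" is not a complicated concept. When dealing with limits, $f(x)$ and $g(x)$ may have no limit at $a$ and may not even be defined there, so we look at a small interval with $a$ removed, namely $(a-\delta,a) \cup (a,a+\delta)$. This is the crudest estimate, but a very convenient one: the limit of the ratio may be hard to compute, but knowing that the ratio is bounded already saves a lot of trouble.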

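In Fourier analysis one often studies functions of "moderate decrease", which requires $f(x)=O\left(\frac{1}{1+|x|^\alpha}\right)$ as $|x|\to\infty$ for some $\alpha > 1$. We can also revisit the function without an order from above and find that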

$x\sin\frac{1}{x}=O(x),\quad x\sin{\frac{1}{x}}=o(|x|^\alpha)\ (0<\alpha<1),\qquad x\to0$

Summary of the "scientific notation"

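Scientific notation in everyday life abstracts the information in a long number and records it in a form that is easier to compute with and understand. Likewise, the symbols $o$, $O$, $\sim$ abstract facts about the ratios and limits that given functions satisfy, then use those facts to represent a function and compute the original limit indirectly. It is like keeping a few decimal places, or comparing a speed with the speed of sound or light: the problem gets simplified. And all of these simplifications rest on the definition of a limit, that is, on whether those logical propositions hold.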

Posted 2020-03-29 · Updated 2023-07-08 · Analysis / Integration Theory / Measure Theory

The Lebesgue-Radon-Nikodym theorem and how von Neumann proved it

An introduction

If one wants to understand the fundamental theorem of calculus for the Lebesgue integral, properties of measures have to be taken into account. In elementary calculus, one may consider something like

$df(x)=f'(x)dx$

where $f$ is differentiable, say, everywhere on an interval. Now let $f$ be a differentiable, nondecreasing real function on $I=[a,b]$. Then we get a strictly increasing, hence one-to-one, function defined by

$g(x)=x+f(x)$

For a measurable set $E\in\mathfrak{M}$, it can be shown that if $m(E)=0$, then $m(g(E))=0$. Moreover, $g(E) \in \mathfrak{M}$, and $g$ is one-to-one. Therefore we can define a measure by

$\mu(E)=m(g(E))$

If we have a relation

$\mu(E)=\int_{E}hdm$

(in fact, this is the Radon-Nikodym theorem we will prove later), the fundamental theorem of calculus for $f$ becomes fairly transparent: if $E=[a,x]$, then $g(E)=[a+f(a),x+f(x)]$, and thus

$\begin{aligned} \mu(E)=m(g(E))&=g(x)-g(a)\\ &=f(x)-f(a)+\int_a^xdt \\ &=\int_a^xh(t)dt \end{aligned}$

which trivially implies

$f(x)-f(a)=\int_a^x[h(t)-1]dt$

so $h$ looks like it should be $g'=f'+1$.

We are not proving the fundamental theorem here. But this raises a question: is it possible to find a function $h$ such that

$\mu(E)=\int_{E}hdm$

which one may write as

$d\mu=hdm$

or, more generally, can a measure $\mu$ be written this way with respect to another measure $\lambda$, that is, $d\mu=h\,d\lambda$? When does such an $h$ exist? Lots of questions. Luckily, the Lebesgue decomposition and the Radon-Nikodym theorem answer them.

Notations

Let $\mu$ be a positive measure on a $\sigma$-algebra $\mathfrak{M}$, and let $\lambda$ be an arbitrary measure (positive or complex) on $\mathfrak{M}$.

We write

$\lambda \ll \mu$

if $\lambda(E)=0$ for every $E\in\mathfrak{M}$ with $\mu(E)=0$. (In the previous section we had $\mu \ll m$.) We say $\lambda$ is absolutely continuous with respect to $\mu$.

Another relation between measures worth considering is mutual singularity. If $\lambda(E)=\lambda(A \cap E)$ for every $E \in \mathfrak{M}$, we say $\lambda$ is concentrated on $A$.

If we have two measures $\mu_1$ and $\mu_2$ and two disjoint sets $A$ and $B$ such that $\mu_1$ is concentrated on $A$ and $\mu_2$ is concentrated on $B$, we say $\mu_1$ and $\mu_2$ are mutually singular, and write

$\mu_1 \perp \mu_2$

The Theorem of Lebesgue-Radon-Nikodym

Let $\mu$ be a positive $\sigma$-finite measure on $\mathfrak{M}$, and $\lambda$ a complex measure on $\mathfrak{M}$.

There exists a unique pair of complex measures $\lambda_{ac}$ and $\lambda_{s}$ on $\mathfrak{M}$ such that

$\lambda = \lambda_{ac}+\lambda_s \quad \lambda_{ac}\ll\mu\quad \lambda_s \perp \mu$

There is a unique $h \in L^1(\mu)$ such that

$\lambda_{ac}(E)=\int_{E}hd\mu$
for every $E \in \mathfrak{M}$.

The unique pair $(\lambda_{ac},\lambda_s)$ is called the Lebesgue decomposition; the existence of $h$ is called the Radon-Nikodym theorem, and $h$ is called the Radon-Nikodym derivative. One also writes $d\lambda_{ac}=hd\mu$ or $\frac{d\lambda_{ac}}{d\mu}=h$ in this situation.

These are two separate theorems, but von Neumann found a way to prove both at one stroke.

If we already have $\lambda \ll \mu$, then $\lambda_s=0$, and $\lambda=\lambda_{ac}$ itself has a Radon-Nikodym derivative with respect to $\mu$.

Also note that the Lebesgue measure $m$ is $\sigma$-finite, so the theorem applies with $\mu=m$.

Proof explained

Step 1 - Construct a bounded functional

We are going to employ Hilbert space techniques in this proof. Precisely speaking, we construct a bounded linear functional in order to find a function, namely $g$, which is the epicentre of this proof.

The boundedness of $\lambda$ is clear since it is a complex measure (complex measures have finite total variation), but $\mu$ is only assumed to be $\sigma$-finite. Therefore we first need to adjust $\mu$.

1.1 Replacing $\mu$ with a finite measure

If $\mu$ is a positive $\sigma$-finite measure on a $\sigma$-algebra $\mathfrak{M}$ in a set $X$, then there is a function $w$ such that $w \in L^1(\mu)$ and $0<w(x)<1$ for every $x \in X$.

The $\sigma$-finiteness of $\mu$ means that there exist sets $E_n$ such that

$X=\bigcup_{n=1}^{\infty}E_n$

and $\mu(E_n)<\infty$ for all $n$. Replacing $E_n$ by $E_n-(E_1\cup\cdots\cup E_{n-1})$, we may assume that the $E_n$ are pairwise disjoint.

Define

$w_n(x)= \begin{aligned} \begin{cases} \frac{1}{2^n(1+\mu(E_n))}\quad &x \in E_n \\ 0 \quad &x\notin E_n \end{cases} \end{aligned}$

(that is, $w_n=\frac{1}{2^n(1+\mu(E_n))}\chi_{E_n}$). Then

$w=\sum_{n=1}^{\infty}w_n$

satisfies $0<w(x)<1$ for all $x$: each $x$ lies in exactly one $E_n$, so $w(x)=\frac{1}{2^n(1+\mu(E_n))}\in\left(0,\frac{1}{2}\right]$. With $w$, we are able to define a new measure, namely

$\tilde{\mu}(E)=\int_{E}wd\mu.$

That $\tilde{\mu}$ is a measure can be verified by writing $\int_{E}wd\mu=\int_{X}\chi_{E}wd\mu$ and using the monotone convergence theorem for countable additivity. More important is that $\tilde{\mu}$ is finite and that $\tilde{\mu}(E)=0$ if and only if $\mu(E)=0$. The second fact comes from the strict positivity of $w$. For the first, notice that, by the monotone convergence theorem,

$\begin{aligned} \tilde{\mu}(X) &= \sum_{n=1}^{\infty}\int_{X}w_n\,d\mu \\ &= \sum_{n=1}^{\infty}\frac{\mu(E_n)}{2^n(1+\mu(E_n))} \\ &\leq \sum_{n=1}^{\infty}\frac{1}{2^n}=1. \end{aligned}$

In particular, $w\in L^1(\mu)$.
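To see the construction on a concrete space, here is a small sketch with exact fractions. The example is hypothetical, not from the post: $X=\{1,2,3,\dots\}$ with $\mu(\{n\})=n$ and $E_n=\{n\}$, so $\mu(X)=\infty$ while $\mu$ is $\sigma$-finite. Only finitely many terms can be summed, so the sketch checks partial sums of $\tilde\mu(X)$.

```python
from fractions import Fraction

# Hypothetical example: X = {1, 2, 3, ...}, mu({n}) = n, E_n = {n} (disjoint).

def mu_point(n: int) -> Fraction:
    return Fraction(n)

def w(n: int) -> Fraction:
    """w = sum of w_n; on the singleton E_n = {n} only w_n is nonzero."""
    return Fraction(1, 2**n) / (1 + mu_point(n))

def mu_tilde_partial(N: int) -> Fraction:
    """tilde-mu({1, ..., N}) = sum over n <= N of w(n) * mu({n})."""
    return sum((w(n) * mu_point(n) for n in range(1, N + 1)), Fraction(0))

print([str(w(n)) for n in range(1, 5)])
print(float(mu_tilde_partial(10)), float(mu_tilde_partial(50)))
```

Although $\mu(\{1,\dots,N\})=N(N+1)/2$ grows without bound, the partial sums of $\tilde\mu$ stay below $1$, as the estimate above predicts.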

1.2 A bounded linear functional associated with $\lambda$

A complex measure is built from finitely many finite positive measures, so (as explained at the end of the proof) we may assume without loss of generality that $\lambda$ is a positive bounded measure on $\mathfrak{M}$. With $\tilde{\mu}$ from 1.1, we obtain a positive bounded measure

$\varphi=\lambda+\tilde{\mu}$

Following the construction of the Lebesgue integral (first for characteristic functions, then for simple functions, then via monotone limits), we have

$\int_{X}fd\varphi=\int_{X}fd\lambda+\int_{X}fwd\mu$

for every nonnegative measurable function $f$. Also, since $\lambda \leq \varphi$, we have

$\left\vert \int_{X}fd\lambda \right\vert \leq \int_{X}|f|d\lambda \leq \int_{X}|f|d\varphi \leq \sqrt{\varphi(X)}\left\Vert f \right\Vert_2$

for $f \in L^2(\varphi)$, by the Schwarz inequality.

Since $\varphi(X)<\infty$, the map

$\Lambda{f}=\int_{X}fd\lambda$

is a bounded linear functional on $L^2(\varphi)$ (it is well defined on equivalence classes, because $\varphi$-null sets are $\lambda$-null).

Step 2 - Find the associated function with respect to $\lambda$

Since $L^2(\varphi)$ is a Hilbert space, the Riesz representation theorem for Hilbert spaces says that every bounded linear functional on it is given by the inner product with some element of the space. Hence there exists a function $g\in L^2(\varphi)$ such that

$\Lambda{f}=\int_{X}fd\lambda=\int_{X}fgd\varphi=(f,g).$

By the properties of $L^2$ spaces, $g$ is uniquely determined $\varphi$-almost everywhere.

For $E \in \mathfrak{M}$, we get

$0 \leq (\chi_{E},g)=\int_{E}gd\varphi=\int_{E}d\lambda=\lambda(E)\leq\varphi(E)$

So whenever $\varphi(E)>0$, the average $\frac{1}{\varphi(E)}\int_{E}g\,d\varphi$ lies in $[0,1]$. Since this holds for every such $E$, we get $0 \leq g \leq 1$ for $\varphi$-almost every $x$. Therefore we may redefine $g$ on a $\varphi$-null set so that $0 \leq g(x) \leq 1$ for every $x$, without affecting the identity. The proof is in the bag once we define $A$ to be the set where $0 \leq g < 1$ and $B$ the set where $g=1$.
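On a finite set, Steps 1 and 2 become plain arithmetic, which makes von Neumann's trick easy to inspect. The sketch below uses a hypothetical example (the point masses are made up): integrals are finite sums, $L^2(\varphi)$ is $\mathbb{R}^6$ with the weighted inner product $\sum f(x)g(x)\varphi(\{x\})$, and the Riesz representer turns out to be $g=\lambda(\{x\})/\varphi(\{x\})$.

```python
from fractions import Fraction as F

# Hypothetical finite example: X = {0, ..., 5}, measures given by point masses.
lam = [F(1), F(2), F(0), F(3), F(1, 2), F(0)]   # lambda({x}), positive and finite
mu = [F(2), F(0), F(1), F(1), F(0), F(0)]       # mu({x}); zero at x = 1, 4, 5
w = [F(1, 2 ** (x + 2)) for x in range(6)]      # any weights with 0 < w < 1

# phi = lambda + tilde-mu, where tilde-mu({x}) = w(x) * mu({x}).
phi = [l + wx * m for l, wx, m in zip(lam, w, mu)]

# Riesz representer of f -> sum f * lambda in L^2(phi): g = lambda / phi where
# phi > 0. A point with phi({x}) = 0 is phi-null, so g's value there is irrelevant.
g = [l / p if p > 0 else F(0) for l, p in zip(lam, phi)]

def Lambda(f):
    """The functional f -> integral of f with respect to lambda."""
    return sum((fx * l for fx, l in zip(f, lam)), F(0))

def inner(f, h):
    """The L^2(phi) inner product (real case)."""
    return sum((fx * hx * p for fx, hx, p in zip(f, h, phi)), F(0))

A = [x for x in range(6) if g[x] < 1]    # where 0 <= g < 1
B = [x for x in range(6) if g[x] == 1]   # where g = 1
print("g =", [str(gx) for gx in g])
print("A =", A, " B =", B)
```

Here $B$ consists exactly of the points where $\lambda$ has mass but $\mu$ has none, which is where the singular part will live.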

Step 3 - Generate $\lambda_{ac}$ and $\lambda_{s}$ and the Radon-Nikodym derivative at one stroke

We claim that $\lambda_{ac}(E)=\lambda(A \cap E)$ and $\lambda_s(E)=\lambda(B \cap E)$, for $E\in\mathfrak{M}$, form the decomposition we are looking for.

Proving $\lambda_s \perp \mu$

Combining $\Lambda{f}=(f,g)$ with $\varphi=\lambda+\tilde{\mu}$, we have

$\int_{X}(1-g)fd\lambda=\int_{X}fgwd\mu.$

Putting $f=\chi_{B}$ (recall that $g=1$ on $B$, so the left-hand side vanishes), we have

$\int_{B}wd\mu=0.$

Since $w$ is strictly positive, we see that $\mu(B)=0$. Notice that $A \cap B = \varnothing$ and $A \cup B=X$. For $E \in \mathfrak{M}$, write $E=(E\cap A) \cup (E\cap B)$. Therefore

$\mu(E)=\mu(E_A)+\mu(E_B)=\mu(E \cap A)+\mu(E \cap B)=\mu(E \cap A).$

Therefore $\mu$ is concentrated on $A$.

For $\lambda_s$, observe that

$\lambda_s(E)=\lambda(E \cap B)=\lambda((E \cap B) \cap B)=\lambda_s(E \cap B).$

Hence $\lambda_s$ is concentrated on $B$. Since $A$ and $B$ are disjoint, this shows that $\lambda_s \perp \mu$.

Proving $\lambda_{ac} \ll \mu$ by the Radon-Nikodym derivative

The relation $\lambda_{ac} \ll \mu$ will follow from the existence of the Radon-Nikodym derivative.

If we replace $f$ by

$(1+g+\cdots+g^n)\chi_E,$

where $E \in \mathfrak{M}$, we have

$\int_X(1-g)fd\lambda=\int_E(1-g^{n+1})d\lambda=\int_Eg(1+g+\cdots+g^n)wd\mu.$

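Since $g=1$ on $B$, the integral over $E\cap B$ vanishes; since $0\leq g<1$ on $A$, $1-g^{n+1}$ increases to $1$ there, and the monotone convergence theorem gives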

$\begin{aligned} \int_{E}(1-g^{n+1})d\lambda &=\int\limits_{E \cap A}(1-g^{n+1})d\lambda + \int\limits_{E \cap B}(1-g^{n+1})d\lambda \\ &=\int\limits_{E \cap A}(1-g^{n+1})d\lambda \\ &\to\lambda(E \cap A) = \lambda_{ac}(E)\quad(n\to\infty) \end{aligned}$

Define $h_n=g(1+g+g^2+\cdots+g^n)w$. On $A$, $h_n$ increases monotonically to $\frac{gw}{1-g}$. Since $\mu(B)=0$, the values on $B$ do not affect any $\mu$-integral, so we may set

$h= \begin{aligned} \begin{cases} \frac{gw}{1-g} \quad &x\in{A}\\ 0 \quad &x\in{B} \end{cases} \end{aligned}$

By the monotone convergence theorem (applied on $A$, since $\mu(B)=0$), we get

$\lim_{n\to\infty}\int_{E}h_nd\mu = \int_{E}hd\mu=\lambda_{ac}(E).$

for every $E\in\mathfrak{M}$.

The measurable function $h$ is the desired Radon-Nikodym derivative once we show that $h \in L^1(\mu)$. Replacing $E$ with $X$, we see that

$\int_{X}|h|d\mu=\int_{X}hd\mu=\lambda_{ac}(X)\leq\lambda(X)<\infty.$

Clearly, if $\mu(E)=0$, we have

$\lambda_{ac}(E)=\int_{E}hd\mu=0$

which shows that

$\lambda_{ac}\ll\mu$

as desired.
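Continuing the same hypothetical finite example, Step 3 can be checked exhaustively: with $h=\frac{gw}{1-g}$ on $A$ and $h=0$ on $B$, the identity $\lambda_{ac}(E)=\int_E h\,d\mu$ can be tested for all $2^6$ subsets $E$, together with $\lambda=\lambda_{ac}+\lambda_s$, $\mu(B)=0$ and the monotonicity of $h_n$. On a finite set $h$ reduces to $\lambda(\{x\})/\mu(\{x\})$ wherever $\mu(\{x\})>0$, as one would expect of a derivative.

```python
from fractions import Fraction as F
from itertools import combinations

# Same hypothetical finite example, repeated so this block runs on its own.
lam = [F(1), F(2), F(0), F(3), F(1, 2), F(0)]
mu = [F(2), F(0), F(1), F(1), F(0), F(0)]
w = [F(1, 2 ** (x + 2)) for x in range(6)]
phi = [l + wx * m for l, wx, m in zip(lam, w, mu)]
g = [l / p if p > 0 else F(0) for l, p in zip(lam, phi)]
X = range(6)
A = [x for x in X if g[x] < 1]
B = [x for x in X if g[x] == 1]

# h = g w / (1 - g) on A, and h = 0 on the mu-null set B.
h = [g[x] * w[x] / (1 - g[x]) if x in A else F(0) for x in X]

def h_n(n, x):
    """Partial sum g (1 + g + ... + g^n) w from the proof."""
    return g[x] * sum(g[x] ** k for k in range(n + 1)) * w[x]

def lam_ac(E):
    return sum((lam[x] for x in E if x in A), F(0))

def lam_s(E):
    return sum((lam[x] for x in E if x in B), F(0))

def integral_h(E):
    """Integral of h over E with respect to mu."""
    return sum((h[x] * mu[x] for x in E), F(0))

subsets = [E for r in range(7) for E in combinations(X, r)]
print("h =", [str(hx) for hx in h])
print(all(lam_ac(E) == integral_h(E) for E in subsets))
```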

Step 4 - Generalization to complex measures

So far we have proved the existence part for positive bounded measures. For a real measure, apply the preceding case to its positive and negative variations and subtract. A complex measure can be written as

$\lambda=\lambda_1+i\lambda_2$

where $\lambda_1$ and $\lambda_2$ are real measures; applying the real case to each of them and combining the results gives the decomposition and the Radon-Nikodym derivative for $\lambda$.

Step 5 - Uniqueness of the decomposition

If we have two Lebesgue decompositions of the same measure, namely $(\lambda_{ac},\lambda_s)$ and $(\lambda'_{ac},\lambda'_s)$, we shall show that

$\lambda_{ac}-\lambda_{ac}'=\lambda_s'-\lambda_s=0$

Since $\lambda_{ac}+\lambda_s=\lambda=\lambda'_{ac}+\lambda'_s$, we get

$\lambda_{ac}-\lambda'_{ac}=\lambda'_s-\lambda_s$

with $\lambda_{ac}-\lambda'_{ac} \ll \mu$ and $\lambda'_{s}-\lambda_{s}\perp\mu$. This implies that $\lambda'_{s}-\lambda_{s} \ll \mu$ as well.

Write $\nu=\lambda'_s-\lambda_s$. Since $\nu\perp\mu$, there is a set $A$ with $\mu(A)=0$ on which $\nu$ is concentrated. For every $E \in \mathfrak{M}$ we have $\mu(E\cap A)=0$, so absolute continuity gives $\nu(E)=\nu(E\cap A)=0$. Hence $\nu=0$, and then $\lambda_{ac}-\lambda'_{ac}=\nu=0$ as well. The uniqueness is proved.

(Following the same process one can also show that $\lambda_{ac}\perp\lambda_s$.)

Posted 2020-03-23 · Updated 2023-07-08

Counterexamples of Fubini's theorem

Hypotheses in Fubini’s theorem cannot be dispensed with

In an earlier post we proved Fubini's theorem for Lebesgue measure, which makes multivariable integrals easier to evaluate. But the two classic counterexamples in this post should keep you from applying Fubini's theorem without checking its hypotheses.

Counterexamples

So we said $f(x,y)$ has to be integrable. What if it is not? First consider this function.

$f(x,y)=\frac{y^2-x^2}{(x^2+y^2)^2}=\frac{\partial^2}{\partial x \partial y}\arctan(\frac{y}{x})$

This function is not Lebesgue integrable on $[0,1] \times [0,1]$: in polar coordinates we have

$\begin{aligned} \iint\limits_{D} |f(x,y)|dxdy & = \int_\varepsilon^1 \int_0^{\frac{\pi}{2}}\frac{r^2|\cos{2\theta}|}{r^4}rd\theta{dr} \\ &=\int_\varepsilon^1\frac{1}{r}dr \\ &\to \infty \quad (\varepsilon \to 0) \end{aligned}$

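where $D=\{(x,y):\varepsilon^2 \leq x^2+y^2 \leq 1, x \geq 0, y \geq 0\}\subset[0,1]\times[0,1]$, and we used $|y^2-x^2|=r^2|\cos 2\theta|$ and $\int_0^{\pi/2}|\cos 2\theta|\,d\theta=1$. The conclusion of Fubini's theorem then fails, since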

$\begin{aligned} \int_0^1 dx \int_0^1 f(x,y)dy &= \int_0^1 \frac{-1}{1+x^2}dx \\ &= -\frac{\pi}{4} \end{aligned}$

meanwhile

$\begin{aligned} \int_0^1 dy \int_0^1 f(x,y)dx &= \int_0^1\frac{1}{1+y^2}dy \\ &=\frac{\pi}{4} \end{aligned}$

The two iterated integrals differ, so the identity breaks down. This function is too 'large' for Fubini's theorem to apply, and so is the next one.
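The two iterated integrals can be double-checked numerically. A sketch, assuming the antiderivatives $f=-\frac{\partial}{\partial y}\frac{y}{x^2+y^2}=\frac{\partial}{\partial x}\frac{x}{x^2+y^2}$ for the inner integrals (spot-checked by central differences rather than derived symbolically) and composite Simpson's rule for the outer ones; all helper names are hypothetical.

```python
import math

def f(x, y):
    return (y * y - x * x) / (x * x + y * y) ** 2

# Antiderivatives for the inner integrals (valid away from the origin):
#   f = -d/dy [y / (x^2 + y^2)], so int_0^1 f(x, y) dy = -1 / (1 + x^2) for x > 0
#   f =  d/dx [x / (x^2 + y^2)], so int_0^1 f(x, y) dx =  1 / (1 + y^2) for y > 0
def F_y(x, y):
    return -y / (x * x + y * y)

def F_x(x, y):
    return x / (x * x + y * y)

def central_diff(func, t, h=1e-6):
    return (func(t + h) - func(t - h)) / (2 * h)

def simpson(func, a, b, n=1000):
    """Composite Simpson's rule with an even number n of subintervals."""
    step = (b - a) / n
    total = func(a) + func(b)
    total += 4 * sum(func(a + k * step) for k in range(1, n, 2))
    total += 2 * sum(func(a + k * step) for k in range(2, n, 2))
    return total * step / 3

# Spot-check both antiderivatives at an interior point.
x0, y0 = 0.3, 0.7
print(central_diff(lambda t: F_y(x0, t), y0), f(x0, y0))
print(central_diff(lambda t: F_x(t, y0), x0), f(x0, y0))

dx_outer = simpson(lambda x: -1 / (1 + x * x), 0.0, 1.0)  # int dx int f dy
dy_outer = simpson(lambda y: 1 / (1 + y * y), 0.0, 1.0)   # int dy int f dx
print(dx_outer, -math.pi / 4)
print(dy_outer, math.pi / 4)
```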

The next function is built from a series. First take an increasing sequence in $[0,1]$ with

$0=\delta_1<\delta_2<\delta_3<\cdots,\quad \delta_n \to 1.$

For each $n$, let $g_n$ be a continuous real function with support in $(\delta_n,\delta_{n+1})$ such that $\int_0^1 g_n\,dx=1$.

Define $f(x,y)$ on $[0,1] \times [0,1]$ such that

$f(x,y)=\sum_{n=1}^{\infty}[g_n(x)-g_{n+1}(x)]g_n(y)$

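The series converges, since at each point $(x,y)$ at most one term of the sum is different from $0$: the supports of the $g_n$ are disjoint, so $g_n(y)\neq0$ for at most one $n$.

An easy computation shows that (for instance, on the square $(\delta_n,\delta_{n+1})^2$ we have $f(x,y)=g_n(x)g_n(y)$, so each such square contributes at least $\left(\int_0^1|g_n|\right)^2\geq1$)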

$\int_0^1\int_0^1 |f(x,y)|dxdy = \infty$

and

$\int_0^1 dx \int_0^1 f(x,y)dy = 1 \neq 0 = \int_0^1 dy \int_0^1 f(x,y)dx$
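One way to see the last computation is to expand $f(x,y)=\sum_{m,n}c_{m,n}\,g_m(x)g_n(y)$ with $c_{n,n}=1$, $c_{n+1,n}=-1$ and all other coefficients $0$. Since $\int_0^1g_n=1$, integrating in $y$ first leaves the row sums of $(c_{m,n})$, and integrating in $x$ first leaves the column sums, so the two iterated integrals become $\sum_m\sum_n c_{m,n}$ and $\sum_n\sum_m c_{m,n}$. A small sketch of that bookkeeping (the helper names are hypothetical):

```python
# Coefficients of f(x, y) = sum_{m,n} c(m, n) g_m(x) g_n(y), read off from
# [g_n(x) - g_{n+1}(x)] g_n(y): c(n, n) = 1, c(n + 1, n) = -1, else 0.
def c(m: int, n: int) -> int:
    if m == n:
        return 1
    if m == n + 1:
        return -1
    return 0

def row_sum(m: int) -> int:
    """Coefficient of g_m(x) after integrating in y (only n = m - 1, m matter)."""
    return sum(c(m, n) for n in range(max(1, m - 1), m + 1))

def col_sum(n: int) -> int:
    """Coefficient of g_n(y) after integrating in x (only m = n, n + 1 matter)."""
    return sum(c(m, n) for m in range(n, n + 2))

N = 1000  # row_sum and col_sum vanish beyond the first row, so truncation is exact here
outer_dx = sum(row_sum(m) for m in range(1, N + 1))  # int dx int f dy
outer_dy = sum(col_sum(n) for n in range(1, N + 1))  # int dy int f dx
print(outer_dx, outer_dy)
```

Only the first row has a nonzero sum, which is where the value $1$ comes from; every column sums to $0$.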

Posted 2020-03-22 · Updated 2023-07-08

Fubini's theorem in Euclidean space (Understanding 'almost everywhere')

General Idea

In elementary calculus, integrals of continuous functions of several variables are often calculated by iterating one-dimensional integrals. For Lebesgue integration on $\mathbb{R}^d$, however, measurability raises a number of issues. What we are looking for is the equation

$\int_{\mathbb{R}^m}\left(\int_{\mathbb{R}^n}f(x,y)dy\right)dx=\int_{\mathbb{R}^n}\left(\int_{\mathbb{R}^m}f(x,y)dx\right)dy=\int_{\mathbb{R}^d}fdm$

where $d=m+n$ and $m,n$ are positive integers. If this equation holds for $f$, integration becomes relatively easy, since the iteration can be taken in either order. In fact, this equation can be generalized to abstract measure spaces, but that is beyond what this post can cover.

Notations

For $d=m+n$, we write

$\mathbb{R}^d=\mathbb{R}^m\times\mathbb{R}^n$

A point in $\mathbb{R}^d$ therefore takes the form $(x,y)$, where $x\in\mathbb{R}^m$ and $y\in\mathbb{R}^n$. If $f$ is defined on $\mathbb{R}^d$, the slices of $f$ are

$\begin{aligned} f^y(x)&=f(x,y)\quad y\in\mathbb{R}^n \\ f_x(y)&=f(x,y)\quad x\in\mathbb{R}^m \end{aligned}$

For $E \subset \mathbb{R}^m\times\mathbb{R}^n$, we define its slices by

$\begin{aligned} E^y&=\{x\in\mathbb{R}^m:(x,y)\in{E}\} \\ E_x&=\{y\in\mathbb{R}^n:(x,y)\in{E}\} \end{aligned}$

But why ‘almost everywhere’?

Unfortunately, even if $f$ is measurable on $\mathbb{R}^d$, the slice $f^y$ need not be measurable for every $y$. Take a non-measurable set $A\subset\mathbb{R}$ (for example, a Vitali set) on the $x$-axis and let $E=A\times\{0\}$. Then $E$ has Lebesgue measure $0$ in $\mathbb{R} \times \mathbb{R}$ (it lies on a line), so $E$ and $\chi_E$ are measurable. But $E^y=A$ is not measurable for $y=0$. Nevertheless, 'almost everywhere' saves us here: the bad slice occurs only at $y=0$, a null set in $\mathbb{R}$.

Fubini’s Theorem

Suppose $f(x,y)$ is integrable on $\mathbb{R}^m \times \mathbb{R}^n$. Then:

1. For almost every $y \in \mathbb{R}^n$, the slice $f^y$ is integrable on $\mathbb{R}^m$.

2. The function $y\mapsto\int_{\mathbb{R}^m}f^y(x)dx$, defined for almost every $y$, is integrable on $\mathbb{R}^n$.

3. The following equation holds ($m$ denotes the Lebesgue measure on $\mathbb{R}^d$):
$\int_{\mathbb{R}^n}\left(\int_{\mathbb{R}^m}f(x,y)dx\right)dy=\int_{\mathbb{R}^d}fdm$
The symmetric conclusions hold with the roles of $x$ and $y$ exchanged; together they give the equation from the introduction.

General and more rigorous version

The general version of Fubini's theorem is developed in abstract product spaces and will not be proved here, but it is worth a peek. Feel free to jump to the next section if you are not interested.

Let $(X,\mathscr{S},\mu)$ and $(Y,\mathscr{T},\lambda)$ be $\sigma$-finite measure spaces, and let $f$ be an $(\mathscr{S} \times \mathscr{T})$-measurable function defined on $X \times Y$.

1. If $f$ is a nonnegative real function, and if
$\varphi(x)=\int_Y f_xd\lambda \quad \psi(y) = \int_X f^yd\mu\qquad (x\in X, y \in Y)$
then $\varphi$ is $\mathscr{S}$-measurable, $\psi$ is $\mathscr{T}$-measurable, and
$\int_Xd\mu(x)\int_Yf(x,y)d\lambda(y)=\int_{Y}d\lambda(y)\int_{X}f(x,y)d\mu(x)$

2. If $f$ is complex and if
$\varphi^\ast(x)=\int_Y|f|_xd\lambda\quad\text{and}\quad\int_{X}\varphi^\ast d\mu<\infty$
then $f \in L^1(\mu\times\lambda)$.

3. If $f \in L^1(\mu \times \lambda)$, then $f_x \in L^1(\lambda)$ for almost all $x \in X$ and $f^y \in L^1(\mu)$ for almost all $y \in Y$; the functions $\varphi$ and $\psi$ defined as in item 1 (now defined almost everywhere) are in $L^1(\mu)$ and $L^1(\lambda)$ respectively, and the equation in item 1 holds.

If we replace $X$, $Y$ with $\mathbb{R}^m$ and $\mathbb{R}^n$, $\mathscr{S}$ and $\mathscr{T}$ with the respective Lebesgue $\sigma$-algebras, and $\mu$ and $\lambda$ with Lebesgue measure, we obtain the Euclidean version, after accounting for the fact that Lebesgue measure on $\mathbb{R}^d$ is the completion of the product measure. Notice that '$f$ is integrable' means $\int|f|\,dm < \infty$.

Before the proof

The proof is relatively long. Instead of proving directly that an integrable $f$ satisfies the three conclusions, we let $\mathcal{F}$ be the family of integrable functions on $\mathbb{R}^d$ that satisfy all three, and show that $\mathcal{F}$ contains every integrable function. As in the general version above, integrability is an explicit hypothesis.

First, we show that $\mathcal{F}$ is not empty; otherwise we might be discussing something that does not exist. Second, since every integrable function can be approximated by simple functions, which are linear combinations of characteristic functions, we study linear combinations and limits in $\mathcal{F}$. Finally, we show that every integrable $f$ belongs to $\mathcal{F}$. The power of 'almost everywhere' will show up along the way.

Complete proof of Fubini’s Theorem (With explanation)

Step 1 - $\mathcal{F}$ is not empty

It’s somewhat absurd to discuss the property of $\mathcal{F}$ without proving that it’s not empty. But that can be done easily.

Suppose $E$ is a bounded open cube in $\mathbb{R}^d$ such that $E = Q_1 \times Q_2$, where $Q_1$ and $Q_2$ are open cubes in $\mathbb{R}^m$ and $\mathbb{R}^n$. Then $\chi_E \in \mathcal{F}$.

For each $y$, the slice $\chi_E^y=\chi_{Q_2}(y)\chi_{Q_1}$ is measurable and integrable on $\mathbb{R}^m$, and

$\begin{aligned} g(y)=\int_{\mathbb{R}^m}\chi_E(x,y)dx=\begin{cases}\text{vol}(Q_1)\quad &y \in Q_2 \\ 0 \quad &y\notin Q_2\end{cases} \end{aligned}$

This shows that $g(y)=\text{vol}(Q_1)\chi_{Q_2}(y)$, which is measurable and integrable as well. Further,

$\int_{\mathbb{R}^{n}}g(y)dy=\text{vol}(Q_1)\text{vol}(Q_2)$

Since we initially have $\int_{\mathbb{R}^d}\chi_Edm=\text{vol}(E)=\text{vol}(Q_1)\text{vol}(Q_2)$, we see that $\chi_E$ satisfies these three properties, hence $\chi_E \in \mathcal{F}$.

Step 2 - $\mathcal{F}$ is closed under finite linear combination

So far we have only handled open cubes in $\mathbb{R}^d$, which are far from the whole Lebesgue $\sigma$-algebra. To get further we will have to handle $G_\delta$ sets, which requires knowing how limits behave in $\mathcal{F}$. We also want simple functions, which are linear combinations of characteristic functions.

Any finite linear combination of functions in $\mathcal{F}$ also belongs to $\mathcal{F}$.

Let

$f_1,f_2,\cdots,f_K\in\mathcal{F}$

By the definition of $\mathcal{F}$, for each $1 \leq k \leq K$ there is a set $A_k \subset \mathbb{R}^n$ of measure $0$ such that $f_k^y$ is integrable on $\mathbb{R}^m$ whenever $y \notin A_k$. Let $A=\bigcup_k A_k$. For $y\in\mathbb{R}^n-A$, all the $f_k^y$ are integrable, and so is any finite linear combination of them; the other two conclusions also pass to linear combinations by the linearity of the Lebesgue integral. Since $A$ has measure zero as well, every finite linear combination of the $f_k$ belongs to $\mathcal{F}$.

Step 3 - Monotone convergence in $\mathcal{F}$

Limits and convergence come into play. One may think of a complete metric space, where Cauchy sequences converge. In this step we show that $\mathcal{F}$ is closed under monotone limits, provided the limit is integrable.

Suppose $f_k$ is a sequence of measurable functions in $\mathcal{F}$ so that $f_{k} \leq f_{k+1}$ or $f_k \geq f_{k+1}$ holds for all $k$, and $f_k \to f$ where $f$ is integrable on $\mathbb{R}^d$, then $f \in \mathcal{F}$.

Without loss of generality, we may assume that

$0 \leq f_1 \leq f_2 \leq \cdots$

The other cases reduce to this one by replacing $f_k$ with $-f_k$ and/or $f_k-f_1$ (Step 2 keeps these in $\mathcal{F}$). An application of the monotone convergence theorem yields

$\lim\limits_{k \to \infty}\int_{\mathbb{R}^d}f_kdm = \int_{\mathbb{R}^d}fdm$

Also, as in Step 2, we can find sets $A_k$ of measure $0$ in $\mathbb{R}^n$ such that $f_k^y$ is integrable whenever $y\notin A_k$. For $A=\bigcup_{k=1}^{\infty}A_k$, we also have $m(A)=0$ in $\mathbb{R}^n$, and for $y \in \mathbb{R}^n - A$, $f_k^y$ is integrable on $\mathbb{R}^m$ for all $k$. Thus, by the monotone convergence theorem,

$g_k(y)=\int_{\mathbb{R}^m}f_k^ydx \to g(y)=\int_{\mathbb{R}^m}f^ydx\quad(k\to\infty)$

Clearly $g_k \leq g_{k+1}$ for all $k$, and by assumption $g_k$ is integrable. Using the monotone convergence theorem again, we see that

$\int_{\mathbb{R}^n}g_kdy\to\int_{\mathbb{R}^n}gdy \quad (k\to\infty)$

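Since $f_k\in\mathcal{F}$, we have $\int_{\mathbb{R}^n}g_kdy=\int_{\mathbb{R}^d}f_kdm$ for every $k$; letting $k\to\infty$ and combining the two limits above, we see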

$\int_{\mathbb{R}^n}gdy =\int_{\mathbb{R}^d}fdm$

We’ll show that $f \in \mathcal{F}$ by checking its properties one by one.

Since $f$ is integrable, we see that $\int_{\mathbb{R}^n}g = \int_{\mathbb{R}^d}f<\infty$. Thus $g$ is integrable.
Since $g$ is integrable, $g(y)<\infty$ for almost every $y$; consequently $f^y$ is integrable for almost every $y$.
By the definition of $g$, we have
$\int_{\mathbb{R}^n}\left(\int_{\mathbb{R}^m}f(x,y)dx\right)dy=\int_{\mathbb{R}^d}fdm$

Thus $f \in \mathcal{F}$ as proved.

Step 4 - Characteristic functions of measurable sets

4.1 - Final destination

We are pretty close to simple functions now. To get rid of infinity, we are going to prove this:

If $E$ is any measurable subset in $\mathbb{R}^d$ with $m(E)<\infty$, then $\chi_E\in\mathcal{F}$.

Once this is done, simple functions, which approximate any integrable function, can be handled with ease. Fortunately, the structure of Lebesgue measurable sets lets us break a measurable set into pieces. Recall the fact that

$E \subset \mathbb{R}^d$ is Lebesgue measurable if and only if there are sets $A$ and $B\subset\mathbb{R}^d$ such that $A \subset E \subset B$, $A$ is an $F_{\sigma}$, $B$ is a $G_{\delta}$, and $m(B-A)=0$.

Since $B-E \subset B-A$, we also have $m(B-E)=0$. Also, since $E \cup (B-E)=B$, $E \cap (B-E) = \varnothing$, we have

$\chi_B=\chi_E+\chi_{B-E}$

which is equivalent to $\chi_{E}=\chi_{B}-\chi_{B-E}$. The right-hand side of this equation is a finite linear combination of functions (Step 2 comes into play), and $m(B)=m(E)+m(B-E)=m(E)<\infty$. If we prove that $\chi_{B},\chi_{B-E} \in \mathcal{F}$, then we are done.

So we are going to prove that $\chi_{E}\in\mathcal{F}$ when $E$ is a $G_{\delta}$ set of finite measure, or when $E$ has measure $0$. These two key cases cover all Lebesgue measurable sets of finite measure.

4.2 - Finite measure $G_{\delta}$ sets

In Step 1 we proved $\chi_{E} \in \mathcal{F}$ if $E$ is a bounded open cube. Now we extend this to $G_\delta$ sets, that is, countable intersections of open sets. We will also use the fact that every open set in $\mathbb{R}^d$ is a countable union of almost disjoint closed cubes (this follows from a dyadic-grid argument). You will see how Step 2 and Step 3 play a role in this section.

4.2.1 - Characteristic function of closed cubes

If $Q$ is a closed cube in $\mathbb{R}^d$, then $\chi_{Q} \in \mathcal{F}$.

Since $Q = \text{int}(Q) \cup \partial{Q}$, where $\text{int}(Q)$ denotes its interior and $\partial{Q}$ denotes its boundary, we have

$\chi_{Q}=\chi_{\text{int}(Q)}+\chi_{\partial{Q}}$

As proved in Step 1, $\chi_{\text{int}(Q)} \in \mathcal{F}$. So it remains to prove that $\chi_{\partial{Q}}\in\mathcal{F}$; the conclusion then follows from Step 2.

Since $m(\partial{Q})=0$, we have $\int_{\mathbb{R}^d}\chi_{\partial{Q}}dm=0$. Also, writing $Q=Q_1\times Q_2$, the slice $(\partial{Q})^y$ equals $\partial Q_1$ for $y$ in the interior of $Q_2$ and is empty for $y\notin Q_2$, so it has measure $0$ in $\mathbb{R}^m$ except for $y\in\partial Q_2$, a null set. Therefore $g(y)=\int_{\mathbb{R}^m}\chi_{\partial{Q}}(x,y)dx=0$ for almost every $y$. Consequently $\int_{\mathbb{R}^n}gdy=0$, and $\chi_{\partial{Q}} \in \mathcal{F}$.

4.2.2 - Finitely many almost disjoint closed cubes

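Suppose $E = \bigcup_{k=1}^{K}Q_k$, where each $Q_k$ is a closed cube and $\text{int}(Q_i)\cap\text{int}(Q_j)=\varnothing$ for $i \neq j$. Then $\chi_{E} \in \mathcal{F}$.

Almost disjoint cubes may share boundary points, so $\chi_E$ and $\sum_{k=1}^{K}\chi_{Q_k}$ can differ there. Instead, notice that

$\chi_{E} = \sum_{k=1}^{K}\chi_{\text{int}(Q_k)}+\chi_{Z},\qquad Z=E-\bigcup_{k=1}^{K}\text{int}(Q_k)\subset\bigcup_{k=1}^{K}\partial{Q_k}.$

Each $\chi_{\text{int}(Q_k)}$ is in $\mathcal{F}$ by Step 1. The set $Z$ is closed and has measure $0$, and its slices $Z^y\subset\bigcup_k(\partial{Q_k})^y$ have measure $0$ for almost every $y$, so the argument of 4.2.1 gives $\chi_{Z}\in\mathcal{F}$. Hence $\chi_{E} \in \mathcal{F}$ according to Step 2.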

4.2.3 - Arbitrary open sets with finite measure

Every open set $E$ in $\mathbb{R}^d$ is a countable union of almost disjoint closed cubes:

$E = \bigcup_{k=1}^\infty Q_k$

If we take $E_{K}=\bigcup_{k=1}^{K}Q_k$, then $f_K=\chi_{E_K}\in\mathcal{F}$ by 4.2.2. We follow Step 3 to show that $\chi_{E} \in \mathcal{F}$ if $m(E)<\infty$.

Since the Lebesgue $\sigma$-algebra contains all Borel sets and $E$ is open, $E$ is measurable. If $m(E)<\infty$, then $\chi_{E}$ is integrable. Also $f_{K+1} \geq f_{K}$ for all $K$ and $f_{K} \to \chi_{E}$; hence $(f_{K})$ is a sequence as described in Step 3, and $\chi_{E} \in \mathcal{F}$.
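The monotone approximation in 4.2.3 can be made visible for a concrete open set. A sketch under stated assumptions: $E$ is the open unit disk in $\mathbb{R}^2$ (a hypothetical choice), and instead of the enumeration $Q_1,Q_2,\dots$ used in the text, the sketch takes, for each $k$, the union of all closed dyadic squares of side $2^{-k}$ lying inside $E$. These unions also increase with $k$ and exhaust $E$, so their areas should increase towards $m(E)=\pi$, as monotone convergence predicts.

```python
import math

def dyadic_inner_area(k: int) -> float:
    """Total area of closed dyadic squares of side 2**-k inside the open unit disk.

    A closed square lies in the (convex) open disk iff all four corners do,
    because x^2 + y^2 attains its maximum over a square at a corner.
    """
    side = 2.0 ** -k
    steps = 2 ** k
    count = 0
    for i in range(-steps, steps):
        for j in range(-steps, steps):
            corners = [((i + di) * side, (j + dj) * side)
                       for di in (0, 1) for dj in (0, 1)]
            if all(x * x + y * y < 1.0 for x, y in corners):
                count += 1
    return count * side * side

areas = [dyadic_inner_area(k) for k in range(1, 8)]
print(areas, math.pi)
```

Each square of generation $k$ splits into four squares of generation $k+1$ that are still inside $E$, which is why the areas never decrease.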

4.2.4 - Arbitrary $G_\delta$ sets

If $E$ is a $G_\delta$ set of finite measure, then $\chi_{E} \in \mathcal{F}$.

By the definition of $G_\delta$ sets, we have

$E=\bigcap_{k=1}^{\infty}R_k$

where the $R_k$ are open sets. Since $m(E)<\infty$ and $m$ is regular, there is an open set $S_0 \supset E$ such that $m(S_0)<\infty$. Let

$S_k = S_0 \cap \left(\bigcap_{j=1}^{k}R_j\right)$

Then we have

$E= \bigcap_{k=1}^{\infty}S_k$

Each $S_k$ is open with $m(S_k)\leq m(S_0)<\infty$, so $\chi_{S_k}\in\mathcal{F}$ by 4.2.3. Since $S_0 \supset S_1 \supset \cdots$, the functions $f_k=\chi_{S_k}$ decrease to the integrable limit $f=\chi_E$. Following Step 3, we see that $\chi_E \in \mathcal{F}$.

4.3 - Sets with measure $0$

If $m(E)=0$, then $\chi_{E} \in \mathcal{F}$

If $E$ is a $G_{\delta}$ set, then we are done by following 4.2. If not, it comes to the issue of $m$’s being a complete measure.

Again, by the regularity of $m$, we may choose a set $G$ of $G_\delta$ such that $E \subset G$ and that $m(G)=0$. As proved, $\chi_{G} \in \mathcal{F}$. Therefore

$\int_{\mathbb{R}^m}\chi_{G}dx=0\quad\text{for a.e. }y$

Thus, the slice $G^y$ has measure $0$ a.e. for $y$, since $E^y \subset G^y$, we have $E^y$ has measure $0$ a.e. for $y$. Therefore the fact that $\chi_{E}\in\mathcal{F}$ can be verified by simple calculation.

Step 5 - All integrable functions

If $f$ is integrable, then $f \in \mathcal{F}$.

Like the construction of Lebesgue integral, $f$ has the decomposition that $f= f^+-f^-$. Thus it suffice to prove this for nonnegative $f$ (by Step 1).

There exists a sequence of integrable and nonnegative simple functions $s_k$ that monotonically converges to $f$. Since each integrable $s_k$ is a finite combination of sets with finite measure, by Step 2 and 4, $s_k \in \mathcal{F}$. By Step 3, clearly we have $f \in \mathcal{F}$.

Comments

Fubini’s theorem shows us that we might be able to evaluate multidimensional integrals in the sense of measure theory with ease (at least ‘almost everywhere’). However there are some counterexamples showing that Fubini’s theorem will fall, which will be discussed later.

This proof is a good example of how to play with the elements of Lebesgue integral. Let’s take a rewind. We want to obtain all integrable functions in $\mathcal{F}$, which however can’t be done directly. So we are looking for simple functions, which are generated by characteristic functions. And luckily we obtained a wide enough range of characteristic functions. With linear combinations and limits, we finally achieved the goal to describe all integrable functions. The properties of ‘almost everywhere’ played a critical role.

Posted 2020-03-14Updated 2023-07-08Analysis / Complex Analysis

Topological properties of the zeros of a holomorphic function

What’s going on

If for every $z_0 \in \Omega$ where $\Omega$ is a plane open set, the limit

$\lim_{z \to z_0}\frac{f(z)-f(z_0)}{z-z_0}$

exists, we say that $f$ is holomorphic (a.k.a. analytic) in $\Omega$. If $f$ is holomorphic in the whole plane, it’s called entire. The class of all holomorphic functions (denoted by $H(\Omega)$) has many interesting properties. For example it does form a ring.

But what happens if we talk about the points where $f$ is equal to $0$? Is it possible to find an entire function $g$ such that $g(z)=0$ if and only if $z$ is on the unit circle? The topological property we will discuss in this post answers this question negatively.

Zeros

Suppose $\Omega$ is a region, the set
$Z(f)=\{z_0\in\Omega:f(z_0)=0\}$
is a at most countable set without limit point, as long as $f$ is not identically equal to $0$ on $\Omega$.

Trivially, if $f(\Omega)=\{0\}$, we have $Z(f)=\Omega$. The set of unit circle is not at most countable and every point is a limit point. Hence if an entire function is equal to $0$ on the unit circle, then the function equals to $0$ on the whole plane.

Note: the connectivity of $\Omega$ is important. For example, for two disjoint open sets $\Omega_0$ and $\Omega_1$, define $f(z)=0$ on $\Omega_0$ and $f(z)=1$ on $\Omega_1$, then everything fails.

A simple application (Feat. Baire Category Theorem)

Before establishing the proof, let’s see what we can do using this result.

Suppose that $f$ is an entire function, and that in every power series
$f(z)=\sum c_n(z-a)^n$
has at leat one coefficient is $0$, then $f$ is a polynomial.

Clearly we have $n!c_n=f^{(n)}(a)$, thus for every $a \in \mathbb{C}$, we can find a postivie integer $n_0$ such that $f^{(n_0)}(a)=0$. Thus we establish the identity:

$\bigcup_{n=0}^{\infty} Z(f^{(n)})=\mathbb{C}$

Notice the fact that $f^{(n)}$ is entire. So $Z(f^{n})$ is either an at most countable set without limit point, or simply equal to $\mathbb{C}$. If there exists a number $N$ such that $Z(f^{N})=\mathbb{C}$, then naturally $Z(f^{n})=\mathbb{C}$ holds for all $n \geq N$. Whilst we see that $f$’s power series has finitely many nonzero coefficients, thus polynomial.

So the question is, is this $N$ always exist? Being an at most countable set without limit points , $Z(f^{(n)})$ has empty interior (nowhere dense). But according to Baire Category Theorem, $\mathbb{C}$ could not be a countable union of nowhere dense sets (of the first category if you say so). This forces the existence of $N$.

Proof

The proof will be finished using some basic topology techniques.

Let $A$ be the set of all limit points of $Z(f)$ in $\Omega$. The continuity of $f$ shows that $A \subset Z(f)$. We’ll show that if $A \neq \varnothing$, then $Z(f)=\Omega$.

First we claim that if $a \in A$, then $a \in \bigcap_{n \geq 0}Z(f^{(n)})$. That is, $f^{(k)}(a) = 0$ for all $k \geq 0$. Suppose this fails, then there is a smallest positive integer $m$ such that $c_m \neq 0$ for the power series on the disc $D(a;r)$:

$f(z)=\sum_{n=1}^{\infty}c_n(z-a)^{n}.$

Define

$\begin{aligned} g(z)=\begin{cases} (z-a)^{-m}f(z)\quad&(z\in\Omega-\{a\}) \\\ c_m\quad&(z=a) \end{cases} \end{aligned}$

It’s clear that $g \in H(D(a;r))$ since we have

$g(z)=\sum_{n=1}^{\infty}c_{m+n}(z-a)^{n}\quad(z\in D(a;r))$

But the continuity shows that $g(a)=0$ while $c_m \neq 0$. A contradiction.

Next fix a point $b \in \Omega$. Choose a curve (continuous mapping) defined $\gamma$ on $[0,1]$ such that $\gamma(0)=a$ and $\gamma(1)=b$. Let

$\Gamma=\{t\in[0,1]:\gamma(t)\in\bigcap_{n \geq 0}Z(f^{(n)})\}$

By hypothesis, $0 \in \Gamma$. We shall prove that $1 \in \Gamma$. Let

$s = \sup\Gamma$

There exists a sequence $\{t_n\}\subset\Gamma$ such that $t_n \to s$. The continuity of $f^{(k)}$ and $\gamma$ shows that

$f^{(k)}(\gamma(s))=0$

Hence $s \in \Gamma$. Choose a disc $D(\gamma(s);\delta)\subset\Omega$. On this disc, $f$ is represented by its power series but all coefficients are $0$. It follows that $f(z)=0$ for all $z \in D(\gamma(s);\delta)$. Further, $f^{(k)}(z)=0$ for all $z \subset D(\gamma(s);\delta)$ for all $k \geq 0$. Therefore by the continuity of $\gamma$, there exists $\varepsilon>0$ such that $\gamma(s-\varepsilon,s+\varepsilon)\subset D(\gamma(s);\delta)$, which implies that $(s-\varepsilon, s+\varepsilon)\cap[0,1]\subset\Gamma$. Since $s=\sup\Gamma$, we have $s=1$, therefore $1 \in \Gamma$.

So far we showed that $\Omega = \bigcap_{n \geq 0}Z(f^{(n)})$, which forces $Z(f)=\Omega$. This happens when $Z(f)$ contains limit points, which is equivalent to what we shall prove.

When $Z(f)$ contains no limit point, all points of $Z(f)$ are isolated points; hence in each compact subset of $\Omega$, there are at most finitely many points in $Z(f)$. Since $\Omega$ is $\sigma$-compact, $Z(f)$ is at most countable. $Z(f)$ is also called a discrete set in this situation.

Posted 2020-02-08Updated 2023-07-08

准素分解与准素标准型

特征向量的推广

Matlab(以及很多支持矩阵运算的编程语言)里，知道了一个方阵的特征值，怎么计算特征向量? 有一个比较形象的解决方案:

null(A-c.*I)

其中$A$表示矩阵，$c$表示一个特征值，$I$表示单位矩阵。(这篇博客里讨论的线性空间都是有限维线性空间$V$，讨论域$\mathbb{F}$。) 这个办法可能有点笨，但它解释了一个简单的概念。一个矩阵指定特征值的特征向量，正好是一个特殊线性变换的核。只需要注意下面的等式:

$\text{ker}(\mathscr{A}-c\mathscr{I})=\{\alpha|(\mathscr{A}-c\mathscr{I})\alpha=0\{$

但是有一点很有意思: $\lambda-c$又是一个多项式。这就引导我们通过线性变换多项式的核这一概念推广特征子空间和特征向量。

对多项式形式$f\in\mathbb{F}[\lambda]$，记

$\text{ker}(f)=\{\alpha\in{V}|f(\mathscr{A})\alpha=0\{$

为属于$f$的广义特征子空间，其中的向量称为广义特征向量。结合线性空间和多项式的若干性质，不难得到如下几个基本结论。

性质1: 设$f，g\in\mathbb{F}[\lambda]$，$m=m(\lambda)$是$\mathscr{A}$的极小多项式，则有如下结论:

$\text{ker}(1)=0$，$\text{ker}(m)=V$

若$g|f$，那么$\text{ker}(g)\subset\text{ker}(f)$

$\text{ker}(f)\cap\text{ker}(g)=\text{ker}(\text{gcd}(f,g))$

$\text{ker}(f)+\text{ker}(g)=\text{ker}(\text{lcm}(f,g))$

若$\text{gcd}(f,g)=1$，那么$\text{ker}(fg)=\text{ker}(f)\oplus\text{ker}(g)$

应该注意到，多项式的加法、乘法、最小公倍数、最大公因子等与线性空间子空间的基本性质在这里是互相融洽的。

空间准素分解——基于极小多项式的广义特征子空间分解

再说一下最常见的特征值分解。这个分解主要是通过解特征多项式的根，找到一组特征向量构成的基，得到对角形。Cayley-Hamilton定理告诉我们，特征多项式是一个零化多项式。也就是说，设$\mathscr{A}:V\to{V}$，$f$是$\mathscr{A}$的特征多项式，那么对于$\alpha\in{V}$，有$f(\mathscr{A})\alpha=0$。

满足$g(\mathscr{A})\alpha=0(\alpha\in{V})$的多项式$g(\lambda)$称为零化多项式，全体零化多项式中次数最低的首一多项式(最高次数项系数为$1$)称为极小多项式，记为$m(\lambda)$。不难验证，对于任一零化多项式$g(\lambda)$，都有$m(\lambda)|g(\lambda)$，也就是说，$g(\lambda)=m(\lambda)n(\lambda)$，其中$n(\lambda)$是一个多项式(次数可能为$0$)。而$m(\lambda)$的存在性与唯一性也可以很容易得到证明.

准素分解就是通过对极小多项式进行分解得到的，和循环分解类似，是一个很稳定的分解.

接下来我们回顾一下多项式的一个基本性质:

域$\mathbb{F}$上任一非常数多项式$f\in\mathbb{F}[X]$均可以唯一分解成不可约多项式乘积，即
$f=p_1p_2\cdots{}p_m$
其中$p_i$是不可约多项式，且是唯一的(不考虑常数和次序)。对于$i\neq{j}$，一定有$\text{gcd}(p_i,p_j)=1$.

那么我们就可以唯一分解$m$了。不妨设$m(\lambda)=\prod_{i=1}^{s}p_i(\lambda)^{r_i}$。又设$W_i=\text{ker}(p_i^{r_i})$，那么根据性质1，$V$就可以唯一分解(不考虑次序)成

$V=W_1\oplus W_2\oplus\cdots\oplus W_s.$

而且不难验证，$W_i$是$\mathscr{A}$的不变子空间。此即准素分解。总而言之，将一个线性变换的极小多项式分解成互质因子的乘积，将整个线性空间分解成这些多项式因子的核的直和，即为准素分解.

矩阵的准素标准形

现已将$V$分解成几个子空间的直和。如果分别在这几个子空间选取一组基，合成$V$的基，就能得到准素标准形。具体地说，

存在$\mathbb{F}$上的可逆矩阵$P$使得$A=P\text{diag}(A_1，\cdots，A_s)P^{-1}$.

其中$A_i$就是$\mathscr{A}$限制在$W_i$上的线性变换的矩阵。这里的基的选择是任意的，但是得到的准对角矩阵又是怎样的性质? 接下来从极小多项式和特征多项式入手。

准素分解与极小多项式

我们肯定希望这样的分解比较”干净”。根据极小多项式分解子空间，那么子空间的性质也应该和极小多项式的分解相融洽。也就是说，

$\mathscr{A}_i=\mathscr{A}\vert_{W_i}$的极小多项式为$p_i(\lambda)^{r_i}$.

下面给出证明。

因为$W_i=\text{ker}(p_i^{r_i})$，对$\alpha\in{W_i}$，一定有$p_i(\mathscr{A}_i)^{r_i}=0$，也就是说，$p_i(\lambda)^{r_i}$一定是$\mathscr{A}_i$的零化多项式。

如果$r_i=1$，考虑到$p_i$不可约，那么已经有$p_i(\lambda)^{r_i}$是$\mathscr{A}_i$的极小多项式。对于$r_i>1$，只需要证明$p_i^{r_i-1}$不是零化多项式，也就是说，证明$\text{ker}(p_i^{r_i})\neq\text{ker}(p_i^{r_i-1})$

设$q=m/p_i^{r_i}$。那么$q$和$p_i^{r_i-1}$互质。假设$W_i=\text{ker}(p_i^{r_i-1})$，那么一定有 \begin{equation} \begin{aligned} \text{ker}(p_i^{r_i-1}q)&=W_i\oplus\text{ker}(q) \\
&=W_i\oplus(W_1\oplus\cdots\oplus W_{i-1}\oplus W_{i+1}\oplus\cdots\oplus W_s )\\
&=V \end{aligned} \end{equation}

但是注意到$\text{deg}(p_i^{r_i-1}q)<\text{deg}(m)$，$\text{ker}(m)=V$，这和$m$的唯一性矛盾。命题得证。

准素分解和特征多项式

在这篇博客里我们指出，一个线性变换的特征多项式和极小多项式有着相同的不可约因子。也就是说，对于本文中提到的极小多项式，一定有正整数$d_1，d_2，\cdots，d_s$使得$f(\lambda)=\prod_{i=1}^{s}p_i(\lambda)^{d_i}$，其中$d_i\geq r_i$。有意思的是，特征多项式的分解也和准素分解的子空间相洽.

$\mathscr{A}_i$的特征多项式是$p_i(\lambda)^{d_i}$

我们讨论准对角矩阵$B=\text{diag}(A_1，\cdots，A_s)$的特征多项式，因为这和$A$的特征多项式是相等的(为什么?).

直接计算特征多项式，有

$|\lambda{I}-B|=|\lambda{I_1}-A_1|\cdots|\lambda{I_s}-A_s|$

(注意，这个等式可以看成一个Laplace展开的递归运用。)

如果设$|\lambda{I_1}-A_1|=f_i(\lambda)$，那么就将$f(\lambda)$分解成了$\textbf{s}$个互质因子

$f(\lambda)=f_1(\lambda)f_2(\lambda)\cdots{}f_s(\lambda)$

其中$f_i(\lambda)$是$A_i$的特征多项式。而$f(\lambda)=\prod_{i=1}^{s}p_i(\lambda)^{d_i}$是唯一分解，不难得到$f_i(\lambda)=p_i(\lambda)^{d_i}$，此即所求结论。

准素分解的实例，以及如何判断矩阵是否可以对角化

设矩阵

$A=\begin{bmatrix} 2&-1&2&2 \\ 1&0&-2&0 \\ 1&0&1&1 \\ -1&1&-2&-1 \end{bmatrix}$

对应线性变换$\mathscr{A}:x\mapsto{Ax}$。现在$\mathbb{R}^4$上进行准素分解.

$A$的极小多项式为$m(\lambda)=(\lambda-1)^2(\lambda^2+1)$。那么$\mathbb{R}^4$应该分解为 $\mathbb{R}^4=\text{ker}[(\lambda-1)^2]\oplus\text{ker}(\lambda^2+1)$

在这两个核中分别选取一组基，即得准素分解。这个时候如果用matlab代码解第一个子空间，就可以是

null((A-I)^2)

在$W_1$中取基$x_1=(0，0，0，1)^T$和$x_2=(2，0，1，0)^T$，在$W_2$中取基$(1，0，0，-1)^T$和$(1，1，0，-1)^T$，得到矩阵

$P=\begin{bmatrix} 0&2&1&1 \\ 0&0&0&1 \\ 0&1&0&0 \\ 1&0&-1&-1 \end{bmatrix}$

不难得到准素标准形

$B=P^{-1}AP=\begin{bmatrix} -1&-4&0&0 \\ 1&3&0&0 \\ 0&0&-1&-2 \\ 0&0&1&1 \end{bmatrix}$

准素分解的若干性质都可以在这个准对角形中得到验证。

然而，矩阵$A$是不能被对角化的(无论是在$\mathbb{R}$中还是$\mathbb{C}$中)。然而，上篇博客中探讨的矩阵

$C=\begin{bmatrix} 0&-1&2&0 \\ -1&0&-2&0 \\ 0&0&-5&0 \\ 1&1&-2&1 \end{bmatrix}$

是一定可以对角化的(不论在$\mathbb{R}$中还是在$\mathbb{C}$中)。这时为什么? 有没有什么一般化的规则?

注意$m_c(\lambda)=(\lambda-1)(\lambda+1)(\lambda+5)$，极小多项式的根互异。而$A$的极小多项式的根不是互异的(在$\mathbb{C}$中)。又想，如果一个矩阵的特征多项式的根互异，那么就一定有$n$个特征根，而属于不同特征根的特征向量又是线性无关的，那么就一定能选取一组特征向量构成的基，就一定能对角化。但是这个时候极小多项式一定和特征多项式相等。实际上

域$\mathbb{F}$上的矩阵$A$可以对角化的充分必要条件是，$m(\lambda)$可以写成这种形式

$m(\lambda)=(\lambda-\lambda_1)(\lambda-\lambda_2)\cdots(\lambda-\lambda_s)$ 其中$\lambda_1，\cdots，\lambda_s\in\mathbb{F}$，且互异.

如果有$P^{-1}AP=\text{diag}(\lambda_1{I_1}，\cdots，\lambda_s{I_s})$，那么通过简单的矩阵运算，可以发现

$(A-\lambda_1I)(A-\lambda_2I)\cdots(A-\lambda_sI)=0$

因此有

$m|(\lambda-\lambda_1)\cdots(\lambda-\lambda_s)$

同时，考虑到$\lambda_i$是$A$的特征值，因此是$m$的根，所以有

$(\lambda-\lambda_1)\cdots(\lambda-\lambda_s)|m$

此即$m=(\lambda-\lambda_1)\cdots(\lambda-\lambda_s)$。

另一方面，如果$m(\lambda)=(\lambda-\lambda_1)(\lambda-\lambda_2)\cdots(\lambda-\lambda_s)$，那么对$A$进行准素分解能得到

$P^{-1}AP=\text{diag}(A_1，\cdots，A_s)$

接下来，考虑到$A_i$的极小多项式是$\lambda-\lambda_i$，那么就有

$A_i-\lambda_i{I_i}=0$

所以$A$可以被对角化。

Posted 2020-02-03Updated 2023-07-08

线性空间的循环分解

说在前面

这篇博客中仅仅引入了循环子空间的概念，遗留了很多问题没得到解决。这些问题会在这篇博客中得到解决。一个矩阵的有理标准形虽然不能得到与特征值、特征向量有关的内容，但是在分解期间只涉及到了加法和乘法，不会出现扩充数域的现象。可以说，有理标准形能得到一类相似矩阵更朴素的信息.

循环分解

设$V$是$\mathbb{F}$上的$n$维线性空间，$\mathscr{A}$是$V$的线性变换，则 $V=\bigoplus_{i=1}^r\mathbb{F}\big[\mathscr{A}\big]\alpha_i$ 其中，$\mathbb{F} [\mathscr{A}] \alpha_i\neq\{0\}$为$\alpha_i\in{V}$生成的循环子空间.

注意，这里的直和告诉我们，$\alpha_1,\alpha_2,\cdots,\alpha_r$线性无关。

这里我们会用到这篇博客里提到的导子。其实，这一节也可以看成导子性质的补充。我们会用到下面一个基本事实：设$U \subset W \subset V$，则有

$P_{\text{cond }W,\mathscr{A}}(\lambda)|P_{\text{cond }U,\mathscr{A}}(\lambda).$

这是因为，对于任意的$\alpha \in V$，我们有$P_{\text{cond }U,\mathscr{A}}(\lambda)\alpha \in U \subset W$，因此$P_{\text{cond }U,\mathscr{A}}(\lambda) \in I(W,\mathscr{A})$.

从此还能得到的一个结论是$I(U,\mathscr{A}) \subset I(W,\mathscr{A})$. 在学习了交换代数，特别是Krull维度相关的内容，就会对这里有不一样的理解。但是这已经远远超出这篇博客的范围。

循环分解的证明

如果$V$本身是一个相对于$\mathscr{A}$的循环空间，那么证明结束。现在讨论另一种情况.

讨论导子时我们了解到，存在$\alpha_1\in{V}$使得

$P_{\text{min }\mathscr{A}}(\lambda)=P_{\text{min }\mathscr{A},\alpha_1}(\lambda)$

设$W_1=\mathbb{F}[\mathscr{A}]\alpha_1$，并为方便起见设$W_0=0$. 又可以选取$\beta_2 \in V \setminus W_1$使得

$R_2(\lambda)=P_{\text{cond }W_1,\mathscr{A}}(\lambda)=P_{\text{cond }W_1,\mathscr{A},\beta_2}(\lambda)$

我们自然希望，$R_2(\lambda)$的“地位”和$P_{\text{min }\mathscr{A},\alpha_1}(\lambda)$ 类似。对此，我们断言，可以调整$\beta_2$至某$\alpha_2$，使得$R_2(\lambda)=P_{\text{min }\mathscr{A},\alpha_2}(\lambda)$. 证明此断言后，我们可以用类似的办法继续寻找后面的多项式，最终得到结论。

首先，$R_2(\mathscr{A})\beta_2=Q_1(\mathscr{A})\alpha_1$. 我们首先感兴趣的是$Q_1$和$R_2$的关系。对此，设

$Q_1(\lambda)=R_2(\lambda)h_1(\lambda)+r_1(\lambda),\quad \deg r_1 < \deg R_2.$

则立刻得到$R_2(\mathscr{A})\beta_2=R_2(\mathscr{A})h_1(\mathscr{A})\alpha_1+r_1(\mathscr{A})\alpha_1$. 令

$\alpha_2=\beta_2-h_1(\mathscr{A})\alpha_1 \in \beta+W_1.$

则

$R_2(\mathscr{A})\alpha_2=R_2(\mathscr{A})h_1(\mathscr{A})\alpha_1+r_1(\mathscr{A})\alpha_1-R_2(\mathscr{A})h_1(\mathscr{A})\alpha_1=r_1(\mathscr{A})\alpha_1 \in W_1.$

这也说明了$R_2(\lambda)=P_{\text{cond }W_1,\mathscr{A},\beta_2}(\lambda)=P_{\text{ cond }W_1,\mathscr{A},\alpha_2 }(\lambda)$. 这是由导子的定义决定的

由于$W_0 \subset W_1$, 我们又有

$P_{\text{cond }W_1,\mathscr{A},\alpha_2}(\lambda)|P_{\text{cond }W_0,\mathscr{A},\alpha_2}(\lambda).$

注意到$P_{\text{cond }W_0,\mathscr{A},\alpha_2}(\lambda)=P_{\text{min }\mathscr{A},\alpha_2}(\lambda)$. 从而我们可以整理成

$R_2(\lambda)|P_{\text{min }\mathscr{A},\alpha_2}(\lambda).$

对于上式我们可以写成

$P_{\text{min }\mathscr{A},\alpha_2}(\lambda)=g(\lambda)R_2(\lambda).$

从此可以得到

$g(\mathscr{A})P_{\text{cond }W_0,\mathscr{A},\alpha_2}(\mathscr{A})\alpha_2=g(\mathscr{A})r_1(\mathscr{A})\alpha_1=0.$

若$r_1 \ne 0$, 则$P_{\text{min }\mathscr{A},\alpha_1}(\lambda)|[g(\lambda)r_1(\lambda)]$. 从而

$\begin{aligned} \deg g(\lambda)r_1(\lambda) &\ge \deg P_{\text{min }\mathscr{A},\alpha_1}(\lambda) \\ &= \deg P_{\text{min }\mathscr{A}}(\lambda) \\ &\ge \deg P_{\text{min }\mathscr{A},\alpha_2}(\lambda) \\ &= \deg g(\lambda) R_2(\lambda). \end{aligned}$

因此$\deg r_1(\lambda) \ge \deg R_2(\lambda)$，这与$r_1(\lambda)$的定义相矛盾，故我们一定有$r_1(\lambda)=0$. 因此$R_2(\mathscr{A})\alpha_2=0$，从而$P_{\text{min }\mathscr{A},\alpha_2}(\lambda)|R_2(\lambda)$，这说明$R_2(\lambda)=P_{\text{min }\mathscr{A},\alpha_2}(\lambda)$. 我们的断言得到了证明.

令$W_2=\mathbb{F}[\mathscr{A}]\alpha_1+\mathbb{F}[\mathscr{A}]\alpha_2$。现在证明子空间的和为直和。

对于$\alpha\in W_1\cap\mathbb{F}[\mathscr{A}]\alpha_2$，可以设 $\alpha=P(\mathscr{A})\alpha_2$ 而

$P_{\text{cond }W_1,\mathscr{A},\alpha_2}(\lambda)|P(\lambda)$

故$P(\lambda)$是一个零化多项式，因此$\alpha=0$。这说明，子空间的和为直和。因此得到了子空间的直和

$W_2=\mathbb{F}[\mathscr{A}]\alpha_1\oplus\mathbb{F}[\mathscr{A}]\alpha_2$

如果$W_2\neq{V}$，那么又取$\beta_3$使得

$R_3(\lambda)=P_{\text{cond }W_2,\mathscr{A}}(\lambda)=P_{\text{cond }W_2,\mathscr{A},\beta_3}(\lambda)$

按照上面的办法，我们可以将$\beta_3$调整为我们想要的$\alpha_3$，这里需要注意的是，我们需要写

$\begin{aligned} R_3(\mathscr{A})\beta_3 &=Q_1(\mathscr{A})\alpha_1+Q_2(\mathscr{A})\alpha_2 \\ &=(R_2(\mathscr{A})h_1(\mathscr{A})\alpha_1+R_3(\mathscr{A})h_2(\mathscr{A})\alpha_2)+r_1(\mathscr{A})\alpha_1+r_2(\mathscr{A})\alpha_2 \end{aligned}$

从而按照同样的办法证明$r_1=r_2=0$.

调整至$\alpha_1$后，我们得到子空间

$W_3=\mathbb{F}[\mathscr{A}]\alpha_1\oplus\mathbb{F}[\mathscr{A}]\alpha_2\oplus\mathbb{F}[\mathscr{A}]\alpha_3.$

如此继续下去，如果$W_k\neq{V}$，则可继续选择$\alpha_{k+1}$等，直到得到

$V=\mathbb{F}[\mathscr{A}]\alpha_1\oplus\mathbb{F}[\mathscr{A}]\alpha_2\oplus\cdots\oplus\mathbb{F}[\mathscr{A}]\alpha_r$

此即循环分解。导子的性质保证这个向量的选择可以进行下去。$V$是有限维的，所以这个过程总会结束。

不变因子与极小多项式

这一节我们来探讨一下上一节中得到的各个若干多项式。设$\alpha_k$对应的多项式为$m_k(\lambda)$。那么，根据导子的唯一性，这些多项式都是唯一的(被$\mathscr{A}$唯一决定)，而且已经得到了$m_k|m_{k-1}$这一关系。也就是说，对于指定的有限维度线性空间$V$和线性变换$\mathscr{A}$，(在顺序上)唯一决定了一组多项式$m_1,m_2,\cdots,m_r$。这组多项式称为$\mathscr{A}$的不变因子。

上面的证明中已经解释，$m_k$是$\alpha_k$的最小零化子。那么对于整个循环子空间$\mathbb{F}[\mathscr{A}]\alpha_k$又是怎样? 对于$\alpha\in\mathbb{F}[\mathscr{A}]\alpha_k$，可以设

$\alpha = P(\mathscr{A})\alpha_k$

那么

$m_k(\mathscr{A})\alpha=m_k(\mathscr{A})P(\mathscr{A})\alpha_k=0$

也就是说，$m_k$也是$\mathscr{A}$限制在$\mathbb{F}[\mathscr{A}]\alpha_k$的极小多项式(次数不能再低，否则就不再是$\alpha_k$的零化子).

有理标准形与特征多项式

对域$\mathbb{F}$上任一方阵$A$，存在域$\mathbb{F}$上的可逆方阵$P$使得 $B=P^{-1}AP=\text{diag}(C(m_1),\cdots,C(m_r))$ 其中，$C(m_k)$是$m_k$的友阵，$m_k$是由$A$决定的线性变换$x\mapsto{Ax}$决定的线性变换的不变因子.

方阵$B$即为方阵$A$的有理标准形.

在这篇博客里我们指出，对于循环子空间$W=\mathbb{F}[\mathscr{A}]\alpha$，$\mathscr{A}$限制在$W$上的矩阵$A_W=C(m)$。这时我们选取了一组$W$上的循环基。现在，我们已经把一个线性空间分解成若干个循环子空间的直和，如果我们选择每个子空间的一组循环基，即得到有理标准形.

设$k_i=\text{deg }m_i$，则在对于基

$\alpha_1,\mathscr{A}\alpha_1,\cdots,\mathscr{A}^{k_1-1}\alpha_1,\cdots,\alpha_r,\mathscr{A}\alpha_r,\cdots,\mathscr{A}^{k_r-1}\alpha_r$

而言，$\mathscr{A}$的方阵表示即为

$A=\text{diag}(C(m_1),\cdots,C(m_r))$

这里的$P$即为由这组基构成的矩阵。计算特征多项式不难发现有

$f=m_1m_2\cdots m_r$

一些补充

根据$f$和不变因子的关系，不难得到三条结论:

$V$是循环空间当且仅当$f=m$。也就是说，极小多项式和特征多项式相等。
$m|f$且$m$和$f$有相同的不可约因子(尽管次数可能不同)。也就是说，对于$f$可以唯一分解成不可约因子 $f=p_1^{d_1}\cdots p_s^{d_s}$ 那么一定有正整数$r_1,\cdots,r_s$使得
$m=p_1^{r_1}\cdots p_s^{r_s}$
其中$0<r_i\leq d_i$。
循环分解也成了Cayley-Hamilton的证明。因为$m|f$，所以特征多项式$f$一定是零化多项式。

循环分解的实例

求矩阵
$A=\begin{pmatrix} 0&-1&2&0 \\ -1&0&-2&0 \\ 0&0&-5&0 \\ 1&1&-2&1 \end{pmatrix}$
的有理标准形，对$\mathbb{R}^4$进行循环分解

这是线性变换 $\mathscr{A}:x\mapsto{Ax}$在$\varepsilon_1,\varepsilon_2,\varepsilon_3,\varepsilon_4$下的方阵表示.

可以算出$f$的特征多项式和极小多项式为

$f(\lambda)=(\lambda-1)^2(\lambda+1)(\lambda+5) \\ m(\lambda)=(\lambda-1)(\lambda+1)(\lambda+5)$

这时其实已经完成了分解。因为不变因子已经被确定，也就是说，

$m_1=\lambda^3+5\lambda^2-\lambda-5 \\ m_2=\lambda-1$

那么有理标准形就有

$B=\left(\begin{array}{c c c|c} 0&0&\textbf{5}&0 \\ 1&0&\textbf{1}&0 \\ 0&1&\textbf{-5}&0 \\ \hline 0&0&0&\textbf{1} \end{array}\right)$

接下来我们要求可逆矩阵$P$。首先讨论子空间$\mathbb{R}[\mathscr{A}]\alpha_2$。对于$m_2=\lambda-1$，只需要取$A$属于特征值$1$的特征向量即可。不妨直接取$\alpha_2=(0,0,0,1)^T$.

对于另一个循环空间的生成元$\alpha_1$，只需要保证$\alpha_1,\mathscr{A}\alpha_1,\mathscr{A}^2\alpha_1,\alpha_2$线性无关(秩为$4$)，从而构成循环基，不妨取$\alpha_1=(1,1,1,1)^T$。此时$A\alpha_1=(1,-3,-5,1)^T$，$A^2\alpha_1=(-7,9,25,9)^T$。此时就得到矩阵

$P=\begin{pmatrix} 1&1&-7&0 \\ 1&-3&9&0 \\ 1&-5&25&0 \\ 1&1&9&1 \end{pmatrix}$

且

$\mathbb{R}^4=\mathbb{R}[\mathscr{A}]{}(1,1,1,1)^T\oplus\mathbb{R}[\mathscr{A}] {}(0,0,0,1)^T$

最后解释一下选择$\alpha_1$和$\alpha_2$的合理性(虽然完全可以验证$P^{-1}AP=B$)。已经知道,$m_k$一定是$\alpha_k$的最小零化子。对于$\alpha_2$，考虑 $ (A-I)\alpha_2=0 $ 即可。这刚好要求$\alpha_2$是$\mathscr{A}$关于特征值$1$下的特征向量。对于$\alpha_1$，因为$m=\lambda^3+5\lambda^2-\lambda-5$是最小零化子，所以$\alpha_1$，$\mathscr{A}\alpha_1$，$\mathscr{A}^2\alpha_1$必须线性无关，否则就说明此时$m_1$不是$\alpha_1$的最小零化子，这与$\mathbb{R}[\mathscr{A}]\alpha_1$的性质矛盾.

Posted 2020-01-28Updated 2023-07-08

线性变换不变子空间的导子及其性质

问题的引入

在研究循环子空间的时候，我们是从线性变换出发，对一个指定的向量进行反复作用，这恰好和多项式吻合。这是从线性变换和唯一指定向量的角度出发的。但是有的时候不能从向量出发，因为选取一个合适的向量不总是可行的，我们也不一定需要研究全体多项式。可能更需要研究一个特定的多项式。这就要求我们在另一个角度刻画不变子空间。

循环空间是某个线性变换的最小不变子空间。那么可不可以研究某个子空间、某个指定线性变换的保证线性空间不变性的多项式？

设$W\subset{V}$为$V$的子空间，如果对任意$\alpha\in{W}$都有$\mathscr{A}\alpha\in{W}$，那么$W$就是$\mathscr{A}$的不变子空间。设多项式$g(\lambda)=\lambda$，那么就有$g(\mathscr{A})\alpha\in{W}$。如果$g(\lambda)$更复杂一些会怎样？有没有什么特殊的例子和性质？对于所有$\alpha\in{V}$又是怎样？这就是这篇博客要关注的问题。线性空间的循环分解也要用到这一个工具。

导子多项式(Conductor)

设$V$是定义在数域$\mathbb{F}$上的有限维线性空间，定义线性变换$\mathscr{A}:V\to{V}$，设$W$为$\mathscr{A}$的线性子空间。取$\textbf{v}\in{V}$。那么将$\textbf{v}$映入$W$的导子多项式$P_{\text{cond }W,\mathscr{A},\textbf{v}}(\lambda)$是指满足$P(\mathscr{A})\textbf{v}\in{W}$的次数最小的首项系数为$1$的多项式(首一多项式，monic polynomial); 多项式$P_{\text{cond }W,\mathscr{A}}(\lambda)$是指将全体$\textbf{v}\in{V}$都有$P(\mathscr{A})\in{W}$的次数最小的首项系数为$1$的多项式。

分别记所有满足$P(\mathscr{A})\textbf{v}\in{W}$的全体多项式为$I(W,\mathscr{A},\textbf{v})$（对于指定的$\mathbf{v}$）和$I(W,\mathscr{A})$（对于所有向量），称为引导多项式。这里$I$的意思是Ideal。因为这是$\mathbb{F}[\lambda]$的一个理想。

这可以看作最小多项式的推广。实际上，对于指定向量和全体向量空间的多项式，只需要将定义中的$W$换成$\{\textbf{0}\}$即可，记为$P_{\text{min }\mathscr{A},\textbf{v}}$和$P_{\text{min }\mathscr{A}}$。

另外可以考虑$V=\mathbb{R}^3$中的一个例子。设$W=\mathbb{R}^2$，定义矩阵

$A=\begin{pmatrix} 1&0&0 \\ 0&1&0 \\ 0&0&0 \end{pmatrix}$

和线性变换$\mathscr{A}:x\mapsto{Ax}$。那么显然$W$是$\mathscr{A}$的不变子空间，且$P(\lambda)=\lambda=P_{\text{cond }W,\mathscr{A}}(\lambda)$。它将$V$中全体向量都映入$W$。自然它也是$(0,0,1)^T$的引导多项式。

如果$W\neq{V}$，对于$\textbf{v}\in{V-W}$，到底需要怎样的”代价”才能利用$\mathscr{A}$将$\textbf{v}$“导入”$W$？这”代价”就是导子。

两种导子多项式的存在性、唯一性

好在这两种多项式是稳定存在的，这也为我们探讨后面的性质做了保证。

首先是$P_{\text{cond }W,\mathscr{A},\textbf{v}}(\lambda)$的存在性。和循环子空间的零化子类似，通过讨论维数和线性无关性解决。假设导子不存在，设$\textbf{v}\in{V-W}$。则有$I(W,\mathscr{A},\textbf{v})=\varnothing$。设$V$的维度为$n$，那么向量$\textbf{v},\mathscr{A}\textbf{v},\cdots,\mathscr{A}^n\textbf{v}$必定线性相关。也就是说有 $\sum_{k=0}^{n}c_k\mathscr{A}^k\textbf{v}=\textbf{0}\in{W}$ 其中$c_k\in\mathbb{F}$不全为$0$。那么$P(\lambda)=\sum_{k=0}^{n}c_k\lambda^k$就是一个非$0$多项式使得$P(\mathscr{A})\textbf{v}\in{W}$的多项式。这和假设矛盾。存在性得到证明。

对线性空间的导子而言，考虑$V$的一组基$\{\textbf{e}_i\}(i=1,\cdots,n)$。那么只需要考虑

$P(\lambda)=\prod_{i=1}^{n}P_{\text{cond }W,\mathscr{A},\textbf{e}_i}(\lambda)$

不妨验证一下，对任意$\textbf{v}\in{V}$，都有$P(\mathscr{A})\textbf{v}\in{W}$。因此$I(W,\mathscr{A})\neq\varnothing$。

两种导子的唯一性讨论是类似的。设有最高次数相同的首一多项式$P(\lambda),Q(\lambda)\in{I(W,\mathscr{A})}$，那么$(P-Q)(\mathscr{A})\textbf{v}\in{W}$。而$P-Q$的次数更低，除以第一项次数又变成了首一多项式。因此唯一性得到了保证。$I(W,\mathscr{A},\textbf{v})$也可以类似进行讨论。

导子的性质

这一节中$V,W,\mathscr{A},\textbf{v}$和上一节相同。

定理0: 对任意$Q(\lambda)\in{I(W,\mathscr{A})}$，都有 $P_{\text{cond }W,\mathscr{A}}(\lambda)| Q(\lambda)。$ 类似结果在$I(W,\mathscr{A},\textbf{v})$中也成立。

这说明，导子是对应引导多项式的最小公因式。也可以用环论的语言理解。注意$\mathbb{F}[\lambda]$为PID，所有理想都存在唯一生成元。这个定理就是在说，$P_{\text{cond }W,\mathscr{A}}$为生成元

这个结论看上去是显然的，证明也是很简单的。假设 $P_{\text{cond }W,\mathscr{A}}(\lambda) \nmid Q(\lambda)。$

那么有非零多项式$R(\lambda)$使得

$Q(\lambda)=S(\lambda)P_{\text{cond }W,\mathscr{A}}(\lambda)+R(\lambda)$

其中$R(\lambda)\in{I(W,\mathscr{A})}$，且最高次数小于$P_{\text{cond }W,\mathscr{A}}(\lambda)$。然而$P_{\text{cond }W,\mathscr{A}}(\lambda)$的次数是最低的。这得到了一个矛盾。$I(W,\mathscr{A},\textbf{v})$和$P_{\text{cond }W,\mathscr{A},\textbf{v}}(\lambda)$也用相同的办法进行证明。

定理1: 存在向量$\textbf{v}\in{V}$使得 $P_{\text{cond }W,\mathscr{A}}(\lambda)=P_{\text{cond }W,\mathscr{A},\textbf{v}}(\lambda)$

这个结论将两种导子充分地联系起来。显然有$P_{\text{cond }W,\mathscr{A}}(\lambda)\in{I(W,\mathscr{A},\textbf{v})}$，但是反过来又是怎样？这就需要证明这个结论。这个结论看似复杂，但是从多项式的性质出发，就不复杂了。这个定理的证明可以划分成证明这两个引理:

（引理1）设两多项式满足$\text{gcd}(P(\lambda),Q(\lambda))=1$，对于向量$\textbf{u},\textbf{v}\in{V}$，有$P_{\text{cond }W,\mathscr{A},\textbf{u}}(\lambda)=P(\lambda)$和 $P_{\text{cond }W,\mathscr{A},\textbf{v}}(\lambda)=Q(\lambda)$，那么
$P_{\text{cond }W,\mathscr{A},\textbf{u+v}}(\lambda)=P(\lambda)Q(\lambda)$

这可以看成导子多项式关于生成向量的加法和多项式乘法的统一。还可以注意到，$P$和$Q$的互素也对应了$\textbf{u}$和$\textbf{v}$的线性无关性。

首先有

$P(\mathscr{A})Q(\mathscr{A})(\textbf{u+v})=Q(\mathscr{A})P(\mathscr{A})\textbf{u}+P(\mathscr{A})Q(\mathscr{A})\textbf{v}\in{W}$

根据定理0，一定有 $P_{\text{cond }W,\mathscr{A},\textbf{u+v}}(\lambda)|P(\lambda)Q(\lambda)$ 此时就有$P_{\text{cond }W,\mathscr{A},\textbf{u+v}}(\lambda)R(\lambda)=P(\lambda)Q(\lambda)$。现在需要证明，$R(\lambda)=1$。

不失一般性，不妨设$S(\lambda)=\frac{P(\lambda)}{R(\lambda)}$。那么经过简单的运算就有$P_{\text{cond }W,\mathscr{A},\textbf{u+v}}(\lambda)=S(\lambda)Q(\lambda)$。现在假设$\text{deg}R>0$，那么$\text{deg}S<\text{deg}P$。那么不难得到$Q(\mathscr{A})S(\mathscr{A})\textbf{u}\notin{W}$和$S(\mathscr{A})Q(\mathscr{A})\textbf{u}\in{W}$同时成立。但是

$P_{\text{cond }W,\mathscr{A},\textbf{u+v}}(\mathscr{A})(\textbf{u}+\textbf{v})=S(\mathscr{A})Q(\mathscr{A})\textbf{v}+Q(\mathscr{A})S(\mathscr{A})\textbf{u}\in{W}$

这得到了矛盾。因此$R=1$。引理1得证。

（引理2）设$P(\lambda)$是一个在$\mathbb{F}$上不可约的多项式，设有正整数$m$使得$(P(\lambda))^m|P_{\text{cond }W,\mathscr{A}(\lambda)}$，那么存在$\textbf{v}\in{V}$使得$P_{\text{cond },W,\mathscr{A},\textbf{v}}(\lambda)=(P(\lambda))^m$

由条件可以设

$P_{\text{cond }W,\mathscr{A}}(\lambda)=(P(\lambda))^mQ(\lambda)$

注意到一定存在$\textbf{v}_0$使得

$(P(\mathscr{A}))^{m-1}Q(\mathscr{A})\textbf{v}_0\notin{W}$

否则，有$(P(\lambda))^{m-1}Q(\lambda)\in{I(W,\mathscr{A})}$，而次数低于$P_{\text{cond }W,\mathscr{A}}(\lambda)$。已知$P_{\text{cond }W,\mathscr{A}}(\lambda)$是$I(W,\mathscr{A})$里次数最低的，所以这得到了矛盾.

另一方面，$(P(\mathscr{A}))^mQ(\mathscr{A})\textbf{v}_0=P_{\text{cond }W,\mathscr{A}}(\mathscr{A})\in{W}$。令$\textbf{u}=Q(\mathscr{A})\textbf{v}_0$，那么就有 $(P(\mathscr{A}))^m\textbf{u}\in{W}$

已经验证，这里的$m$不能再低，同时也不难验证$P^m$为首一多项式。所以有

$P_{\text{cond },W,\mathscr{A},\textbf{u}}(\lambda)=(P(\lambda))^m$

引理2得证

最后来考虑定理1。首先，域$\mathbb{F}$上任意非常数多项式$f$都可以唯一分解为不可约因子，即

$f=\prod_{i=1}^{m}p_i$

其中$p_i$是不可约的，且这种乘积是唯一的(不考虑常数项和次序).

那么$P_{\text{cond },W,\mathscr{A}}(\lambda)$就可以进行分解(如果导子是常数，结论是显然的)，合并相同因子之后有

$P_{\text{cond },W,\mathscr{A}}(\lambda)=(P_1(\lambda))^{r_1}(P_2(\lambda))^{r_2}\cdots(P_n(\lambda))^{r_n}$

那么根据引理2就有$\textbf{v}_1,\textbf{v}_2,\cdots,\textbf{v}_n\in{V}$使得

$P_{\text{cond }W,\mathscr{A}}(\lambda)=\prod_{i=1}^{n}P_{\text{cond }W,\mathscr{A},\textbf{v}_i}(\lambda)$

又根据引理1，令$\textbf{v}_1+\textbf{v}_2+\cdots+\textbf{v}_n=\textbf{v}$，就能得到

$P_{\text{cond }W,\mathscr{A}}(\lambda)=P_{\text{cond }W,\mathscr{A},\textbf{v}}(\lambda)$

得证。

Posted 2020-01-22Updated 2023-07-08

线性空间的循环子空间与简单应用

矩阵的有理标准形

矩阵对角化分解虽然会得到最简单的形式，但是条件是很苛刻的。它要求被分解的$n$阶矩阵有$n$个线性不相关的特征向量，具体地说，一个矩阵可对角化当且仅当对应特征值的几何重数等于对应的代数重数。还有可能需要扩张数域，例如实矩阵解出复根，考虑复特征向量。所以说这种分解不稳定。而且它并不能直接反映一个矩阵的所有性质。幸运的是，除了对角化分解之外还有一些不同的矩阵分解办法。虽然形式上不如对角矩阵一样简洁，但是却能体现不同的性质——例如对特征多项式的分解。而且，这些分解办法对任何矩阵都是可行的。

首先要介绍的是有理标准形(rational form)。这里的“有理”和$\mathbb{Q}$没有关系。所谓“rational”是因为，不同于其他几种分解方式，这种分解不需要扩张数域。具体来说，对域$\mathbb{F}$上任意方阵$A$，存在$\mathbb{F}$上的可逆矩阵$P$使得 $B=P^{-1}AP=\text{diag}(C(m_1),\cdots,C(m_r))$

其中，$C(m_i)$是$m_i\in\mathbb{F}(\lambda)$的友阵(companion matrix)，是由$\mathbb{F}$上的多项式$m_i$唯一决定的。这些多项式也是由$A$唯一决定的，叫做矩阵的不变因子。

这些多项式也有特殊的性质(这也是这种分解的意义所在)。首先，$m_1$是$A$的最小多项式，$A$的特征多项式可以表示为$f(\lambda)=m_1\cdots{m_r}$。在顺序上，应该有$m_i|m_{i-1}(i=2,3,\cdots，r)$。也就是说次数大的在前面小的在后面。

一个多项式的友阵是指这样的一个方阵：

$C(c_0+c_1t+\cdots+c_{n-1}t^{n-1}+t^n)=\begin{pmatrix} 0&0&\cdots&0&-c_0 \\ 1&0&\cdots&0&-c_1 \\ 0&1&\cdots&0&-c_2 \\ \vdots&\vdots&\ddots&\vdots&\vdots \\ 0&0&\cdots&1&-c_{n-1} \end{pmatrix}$

不难发现，一个多项式的方阵是被这一个方阵唯一决定的。讨论这个矩阵的特征多项式可以发现特征多项式$f$即为此多项式。

这篇博客里会先介绍一下这种分解的背景和一些相对粗略简单的分解实例。矩阵有理标准形的完整论述会放到接下来的博客中。

循环空间

需要指出，这样一个标准型涉及到线性空间的循环分解，而循环分解又涉及到循环空间这一概念。“循环”这一概念体现在被同一个线性变换反复作用。循环空间指的是对某一线性变换，某一向量的最小不变子空间。

设$V$是数域$\mathbb{F}$上的线性空间，对于固定的线性变换$\mathscr{A}$，和一个指定的向量$\alpha \in V$，应该有 $\mathscr{A}\alpha, \mathscr{A}^2\alpha, \cdots\in{V}$

考虑到线性空间对向量的加法和数乘的封闭性，$\mathbb{F}$上的全体多项式就恰好固定了这些向量。也就是说，对于最小不变子空间$W$，应该包含$\mathbb{F}$上全体多项式$\mathbb{F}[\lambda]$的像 $\mathbb{F}[{\lambda}]{\alpha}=\mathbb{F}[\mathscr{A}]\alpha=\{g(\mathscr{A})\alpha :g(\lambda)\in\mathbb{F}[\lambda]\}$

另一方面，$\mathbb{F}[\lambda]\alpha$已经是不变子空间，因此有$W=\mathbb{F}[\lambda]\alpha$。$W$称为$\alpha$生成$\mathscr{A}$的循环子空间，如果有$W=V$，那么$V$称为循环空间，$\alpha$称为$V$的循环向量。

零化子

在接触循环分解之前，先需要考察一下循环空间的“大小”。一些线性空间的基本性质要在这里搞明白。这样一个线性空间的维数是多少? 怎样能找到这个线性空间的零向量? 线性空间的基怎么找到? 既然是不变子空间，那么限制线性变换的方阵表示又是怎样? 这些问题都需要通过分析零化子来解决。

如果对多项式$g(\lambda)$有 $g(\lambda)\alpha=g(\mathscr{A})\alpha=0$ 那么$g(\lambda)$称为$\alpha$的零化子(aka 零化多项式)。

$\alpha$的次数最低的首一零化多项式称为最小零化子。显然，零化子是最小零化子的倍。注意这里的零化子是相对于$\alpha$，这和线性变换的零化子是两回事。接下来会通过分析线性相关性找出最小零化子。

如何找到最小零化子

选取循环向量$\alpha\in{W}$，再考察向量$\mathscr{A}\alpha$是否和$\alpha$线性相关，如果否，也将$\mathscr{A}\alpha$选取，再按照同样的办法选取$\mathscr{A}^2\alpha,\cdots,\mathscr{A}^{k-1}\alpha$，而$\alpha,\mathscr{A}\alpha,\mathscr{A}^2\alpha,\cdots,\mathscr{A}^{k-1}\alpha,\mathscr{A}^k\alpha$线性相关。根据线性空间的性质，$\mathscr{A}^k\alpha$可以被前面$(k-1)$个向量线性表示出来。也就是说有$c_0,\cdots,c_{k-1}\in\mathbb{F}$使得 $\mathscr{A}^{k}\alpha+c_{k-1}\mathscr{A}^{k-1}\alpha+\cdots+c_1\mathscr{A}\alpha+c_0\alpha=0$

如果令$m_{\alpha}(\lambda)=\lambda^k+c_{k-1}\lambda^{k-1}+\cdots+c_0$，那么就有$m_\alpha(\mathscr{A})\alpha=0$。也就是说，$m_\alpha$是$\alpha$的零化子。接下来需要验证这是次数最低的首一多项式。

设$g(\lambda)$为$\alpha$的零化多项式。那么$g(\lambda)$可以写成 $g(\lambda)=m_{\alpha}(\lambda)q(\lambda)+r(\lambda)$

其中，$r=0$或$\text{deg}r<k$。$r=0$时，显然$g(\lambda)$是$m_{\alpha}(\lambda)$的倍。而$\text{deg}r<\text{deg}g$时不然。因此只需要证明第二种情况是不存在的即可。注意到 $0=g(\mathscr{A})\alpha=m_{\alpha}(\mathscr{A})q(\mathscr{A})\alpha+r(\mathscr{A})\alpha=r(\mathscr{A})\alpha$

如果有$r\neq{0}$，则$\text{deg}r<k$，这和$\alpha,\mathscr{A}\alpha,\cdots,\mathscr{A}^{k-1}\alpha$线性无关矛盾。因此，$m_{\alpha}(\lambda)$即为所求最小零化子。

循环空间的维度、基

在求最小零化子的时候，我们找到了$k$个线性无关的向量。然后第$k+1$个可以用这$k$个向量表示。这里的$k$应该和这个循环空间有着特殊的关系。注意到这$k$个向量能表示$\mathscr{A}^{k}\alpha$。那么能不能表示$W$中全部向量? 如果是这样的话，这$k$个向量就是这个循环空间的基，$k$就是维度(这可以联想到抽象代数中的循环空间的阶)。

已知 $\mathscr{A}^{k}\alpha=-(c_{k-1}\mathscr{A}^{k-1}\alpha+\cdots+c_1\mathscr{A}\alpha+c_0\alpha)$

两边分别用$\mathscr{A}$作用，那么左边是$\mathscr{A}^{k+1}\alpha$，右边是$\alpha,\mathscr{A}\alpha,\cdots,\mathscr{A}^{k-1}$的线性组合(注意，等式右边$\mathscr{A}$的次数分别为$k，k-1,\cdots，1$，而$\mathcal{A}^{k}$已经可以被线性表示)。递推下去可知，对于所有$n>=0$，$A^{n}$都可以被$\alpha,\mathscr{A}\alpha,\mathscr{A}^{k-1}\alpha$线性表示。也就是说，$\alpha,\mathscr{A}\alpha,\mathscr{A}^{k-1}\alpha$是$W$的一组基，$W$的维度为$k$，也就是$\alpha$的最小零化子的最高次。

此外不难验证，$\mathscr{A}$在$W$上的限制对应的矩阵$A_W$即为$C(m_\alpha)$。

循环空间的实例

不妨讨论一下特征向量。设非零向量$\alpha$是$\mathscr{A}$的特征向量。那么由$\alpha$生成的循环子空间的维度只能是$1$。

首先$\alpha$和自己线性不相关，而$\mathscr{A}\alpha=\lambda\alpha$，故与$\alpha$已经线性相关。此时也得到极小零化子为$\mathscr{A}\alpha-\lambda\alpha$。而循环子空间也已经被确定，是$\alpha$所在的”直线”$\mathbb{F}\alpha$。假设$\alpha=(1,0)^T$，$V=\mathbb{R}^{(2)}$，那么这个循环空间就是$y$轴。

为了体现循环空间对数域的”稳定性”，现在讨论线性空间$V=\mathbb{Q}^3$上的线性变换。也就是说，我们会尝试对有理数三阶方阵进行分解。

设有矩阵 $A=\begin{pmatrix} 1&0&0 \\ 0&1&1 \\ 1&1&0 \end{pmatrix}$

和定义在$V$上的线性变换$\mathscr{A}(x)=Ax$。设$V$的自然基为$\varepsilon_1,\varepsilon_2,\varepsilon_3$。现在讨论一下由$\varepsilon_1$生成的循环空间$W_1$。显然$W_1$的维数不超过$3$。循环对$\varepsilon_1$作用$\mathscr{A}$，能得到

$\mathscr{A}\varepsilon_1=(1,0,1)^T,\mathscr{A}^2\varepsilon_1=(1,1,1),\mathscr{A}^3\varepsilon_1=(1,2,2)$

可以验证，$\varepsilon_1,\mathscr{A}\varepsilon_1,\mathscr{A}^2\varepsilon_1$线性不相关，又有$\mathscr{A}^3\varepsilon_1=-\varepsilon_1+2\mathscr{A}^2\varepsilon_1$可知，最小零化子为 $m_1(\lambda)=\lambda^3-2\lambda^2+1$ 注意到系数全是整数。从而得到在$\varepsilon_1,\mathscr{A}\varepsilon_1,\mathscr{A}^2\varepsilon_1$下$\mathscr{A}$这个线性变换的方阵表示为$m_1(\lambda)$的友阵

$B=\begin{pmatrix} 0&0&-1 \\ 1&0&0 \\ 0&1&2 \end{pmatrix}$

另一方面，自然基到$\varepsilon_1,\mathscr{A}\varepsilon_1,\mathscr{A}^2\varepsilon_1$这组基的过渡方阵又有

$P=\begin{pmatrix} 1&1&1 \\ 0&0&1 \\ 0&1&1 \end{pmatrix}$

不难验证，$B=P^{-1}AP$。整个运算推导过程没有脱离$\mathbb{Q}$。这是理所当然的。整个过程中只有域$\mathbb{Q}$上的加法和乘法运算。但是特征值分解不一定能这么”稳定”。矩阵$A$本身就是一个很有意思的反例：特征值中出现了无理数。

实际上，矩阵$B$就是矩阵$A$的有理标准形，而且特征多项式$f=m_1$。显然$m_1$被$A$唯一决定。这是一个比较特殊的例子(不妨试试用$\varepsilon_2$进行一下循环分解)。接下来的博客会讨论一下更普遍的结论和性质。

Posted 2020-01-17Updated 2023-07-08Legacy content

抽象Lebesgue积分的构建

从一个”难题”入手

设$\{f_n\}$是一个定义在$[0，1]$上的连续函数列，且$0\leq f_n\leq 1$. $n\to\infty$时，对任意$x\in[0，1]$有$f_n(x)\to{0}$. 求证 $\lim_{n\to\infty}\int_{0}^{1}f_n(x)dx=0$

在Riemann积分下这个命题的证明确实很头疼。虽然说有$\lim\limits_{n\to\infty}f_n(x)=0$，但是这里并不是一致收敛，所以不能直接将极限号和积分号交换。但是也没有别的信息，只能从连续性入手。

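When dealing with sequences of functions, the Riemann integral usually forces us to check for uniform convergence, which is often cumbersome. At the end of the 19th century, many mathematicians argued that the Riemann integral taught in calculus courses (which everyone has to learn) should be replaced by a more general, more flexible integral that handles limits more conveniently. Many mathematicians of that period made attempts, and Lebesgue's approach can be regarded as the culmination. Roughly speaking, the Riemann integral is approximated by the sum $\sum_{i=1}^{n}f(t_i)\Delta{x_i}$

that is, it is a limit of areas: $f(t_i)$ is the height of a rectangle and $\Delta{x_i}$ is its width. There are also the upper and lower Darboux sums; one studies their difference and makes the approximation rigorous with $\varepsilon$-$\delta$ arguments, which is what integrability means. When we integrate a given function, the function itself is fixed, but there is still room to play with $\Delta{x_i}$. Since $\Delta{x_i}=x_i-x_{i-1}$ is the length of the interval $[x_{i-1},x_i]$, and an interval is a set, can we attack integration by studying the "size" of sets?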
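For a monotone function the Darboux sums are easy to write down, which makes "the difference of the upper and lower sums" concrete. A Python sketch, assuming an increasing $f$ and a uniform partition (the function name is hypothetical); exact `Fraction` arithmetic shows that for $f(x)=x^2$ on $[0,1]$ the gap is exactly $\frac{f(1)-f(0)}{n}=\frac{1}{n}$:

```python
from fractions import Fraction


def darboux_sums_increasing(f, a, b, n):
    """Lower and upper Darboux sums of an increasing f on a uniform partition of [a, b].

    For increasing f, the inf on [x_{i-1}, x_i] is f(x_{i-1}) and the sup is f(x_i).
    """
    a, b = Fraction(a), Fraction(b)
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    lower = sum(f(xs[i - 1]) * (xs[i] - xs[i - 1]) for i in range(1, n + 1))
    upper = sum(f(xs[i]) * (xs[i] - xs[i - 1]) for i in range(1, n + 1))
    return lower, upper


def square(x):
    return x * x


for n in (10, 100, 1000):
    lo, hi = darboux_sums_increasing(square, 0, 1, n)
    print(n, float(lo), float(hi), hi - lo)   # hi - lo is exactly 1/n
```

Both sums squeeze the value $\frac13$ as the partition is refined, which is the Riemann integrability of $x^2$ in miniature.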

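The sets considered in this post are arbitrary (this is what "abstract" in the title refers to). We may take the classical Euclidean space, the event space of probability theory, or something else. All of these are unified by the Lebesgue integral, and in many cases the Riemann integral can also be computed via Lebesgue measure $m$ (roughly speaking, $m(E)$ is the "volume" of $E$). Moreover, the problem at the beginning becomes easy with the Lebesgue integral.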

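$\pi$-systems, $\lambda$-systems, and $\sigma$-algebras

The domain of a set function

To assign a "size" to a set, that is, to associate a value with it, we need a function that maps sets to real or complex numbers. For example, defining $m([a,b])=b-a$ gives the length of the set $[a,b]$, and $b-a$ is a real number. The values of such a function can lie in a subset of $\mathbb{R}$ or of $\mathbb{R}^2$, but what should its domain be? First of all, it should be a set whose elements are sets, for example the power set $\mathcal{P}(A)$ of a set $A$. But must it be the power set? Could it be smaller? Does it guarantee that certain operations make sense? These are the questions to settle here. Step by step, we will sketch the conditions this "domain" must satisfy. It is the "main battlefield" of the Lebesgue integral.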

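A family $\mathcal{P}$ of subsets of a set $X$ is called a $\pi$-system if it satisfies: whenever $A\in\mathcal{P}$ and $B\in\mathcal{P}$, we have $A\cap{B}\in\mathcal{P}$.

A $\pi$-system is closed under finite intersections. One of the simplest $\pi$-systems is the family of all closed intervals in $\mathbb{R}$ (remember to include $\varnothing$, and count a single point $[a,a]$ as a closed interval). The intersection of two closed intervals is always a closed interval or $\varnothing$, and the intersection of $\varnothing$ with a closed interval is $\varnothing$, so this family is a $\pi$-system. In general, however, a $\pi$-system need not be closed under infinitely many operations, nor under unions: $[0,1]\cup[2,3]$ is not an interval.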
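The closed-interval example can be sketched in a few lines of Python, representing $[a,b]$ as a pair `(a, b)` with $a\leq b$ and $\varnothing$ as `None` (a representation chosen here purely for illustration): intersecting two such objects always yields another one, while the union of two disjoint closed intervals is not an interval.

```python
from itertools import product


def intersect(I, J):
    """Intersection of two closed intervals given as (a, b) pairs; None is the empty set."""
    if I is None or J is None:
        return None
    lo, hi = max(I[0], J[0]), min(I[1], J[1])
    return (lo, hi) if lo <= hi else None   # [a, a] = {a} still counts as a closed interval


def union_is_interval(I, J):
    """For nonempty closed intervals, I ∪ J is an interval iff I and J intersect."""
    return intersect(I, J) is not None


family = [None, (0, 1), (1, 2), (0.5, 3), (2, 3), (4, 5)]
results = [intersect(I, J) for I, J in product(family, repeat=2)]
# Every pairwise intersection is again None or a valid closed interval (a <= b)
print(all(R is None or R[0] <= R[1] for R in results))   # True
print(intersect((0, 1), (1, 2)))          # (1, 1): the degenerate interval {1}
print(union_is_interval((0, 1), (2, 3)))  # False: [0,1] ∪ [2,3] is not an interval
```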

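The collection of events in probability theory is also a $\pi$-system: the intersection of two events is again an event, which is certainly reasonable. But being a $\pi$-system alone is not enough. What about the negation (complement) of an event? And what about infinitely many events (which may involve convergence questions in probability)? Defining the integral on a $\pi$-system will not work either, since we cannot restrict ourselves to closed intervals. So we introduce another kind of system.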

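A family $\mathcal{L}$ of subsets of a set $X$ is called a $\lambda$-system if it satisfies the following conditions:

1. $X\in\mathcal{L}$.

2. If $A, B\in\mathcal{L}$ and $B\subset{A}$, then $A-B\in\mathcal{L}$.

3. If $A_n\in\mathcal{L}$ and $A_n\subset{A_{n+1}}$ for all $n$, then $\bigcup_{n=1}^{\infty}A_n\in\mathcal{L}$.

The collection of events is also a $\lambda$-system, which gives it some natural properties that a $\pi$-system does not guarantee: it contains the sure event, and it is closed under differences of nested events and under limits of increasing sequences of events.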

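$\sigma$-algebras: combining the two systems

As we have seen, each kind of system has its strengths and weaknesses, and each captures only part of the properties we want. Combining the two gives exactly the leanest well-behaved "domain", namely the $\sigma$-algebra. If a family $\mathfrak{M}$ of subsets of a set $X$ is both a $\pi$-system and a $\lambda$-system, then $\mathfrak{M}$ is called a $\sigma$-algebra on $X$.

Return to the probability example. First, the empty set and the sure event must be included, and the $\lambda$-system conditions guarantee this: by conditions 1 and 2, $\varnothing=X-X\in\mathcal{L}$. Fixing $A=X$ in condition 2 further shows that the complement of every set in $\mathcal{L}$ also lies in $\mathcal{L}$.

Finally we need countable unions (which are tied to addition). By De Morgan's laws, this also takes care of countable intersections. A $\pi$-system only handles finitely many intersections and a $\lambda$-system only handles monotone sequences of sets, so each on its own is quite limited. Together, however, they yield arbitrary countable unions. The argument is a nice set-manipulation trick, which goes as follows.

Let $A_n\in\mathfrak{M}$ for $n=1,2,\cdots$; we already know that $A_n^c\in\mathfrak{M}$. Since $\mathfrak{M}$ is closed under finite intersections and complements, $B_n=\bigcup_{i=1}^{n}A_i=\left(\bigcap_{i=1}^{n}A_i^c\right)^c\in\mathfrak{M}$. Moreover $B_{n}\subset B_{n+1}$, so by condition 3, $\bigcup_{n=1}^{\infty}B_n\in\mathfrak{M}$. Since every $A_n\subset B_n$ and every $B_n\subset\bigcup_{i=1}^{\infty}A_i$, we have $\bigcup_{n=1}^{\infty}B_n=\bigcup_{n=1}^{\infty}A_n$, hence $\bigcup_{n=1}^{\infty}A_n\in\mathfrak{M}$.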

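In summary, a $\sigma$-algebra $\mathfrak{M}$ on $X$ has the following three properties:

1. $X\in\mathfrak{M}$.

2. If $A\in\mathfrak{M}$, then $A^c\in\mathfrak{M}$ (here $A^c=X-A$).

3. If $A_n\in\mathfrak{M}$ for $n=1, 2, \cdots$, then $\bigcup_{n=1}^{\infty}A_n\in\mathfrak{M}$.

In this case $X$ is called a measurable space, and the elements of $\mathfrak{M}$ are called measurable sets.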
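On a finite set all of these conditions can be checked by brute force, which gives a quick way to test the claim that a $\sigma$-algebra is exactly a family that is both a $\pi$-system and a $\lambda$-system. A Python sketch, assuming a finite $X$ (so countable unions and monotone limits reduce to finite ones); the function names are hypothetical. The family `L` below is a $\lambda$-system but not a $\pi$-system, and accordingly it is not closed under unions:

```python
def fs(*xs):
    """Shorthand for a frozenset (hashable, so it can live inside a set)."""
    return frozenset(xs)


def is_pi_system(F):
    return all((A & B) in F for A in F for B in F)


def is_lambda_system(F, X):
    # On a finite X an increasing sequence in F is eventually constant,
    # so condition 3 holds automatically; only conditions 1 and 2 are checked.
    return X in F and all((A - B) in F for A in F for B in F if B <= A)


def is_sigma_algebra(F, X):
    # The three properties: contains X, closed under complement, closed under union
    # (on a finite X, pairwise unions are enough).
    return (X in F
            and all((X - A) in F for A in F)
            and all((A | B) in F for A in F for B in F))


X = fs(1, 2, 3, 4)
L = {fs(), X, fs(1, 2), fs(3, 4), fs(1, 3), fs(2, 4)}   # lambda-system, not pi-system
M = {fs(), fs(1, 2), fs(3, 4), X}                       # a sigma-algebra

for name, F in (("L", L), ("M", M)):
    print(name, is_pi_system(F), is_lambda_system(F, X), is_sigma_algebra(F, X))
# L False True False   ({1,2} ∩ {1,3} = {1} is missing)
# M True True True
```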

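Remarks and supplements

It is not hard to prove that, conversely, a family with these three properties is both a $\pi$-system and a $\lambda$-system. So it enjoys the properties of both systems, and differences of sets, as well as finite intersections and unions, pose no problem.
A $\sigma$-algebra can have many elements, such as $\mathcal{P}(X)$, or as few as two, such as $\{\varnothing,X\}$. In fact, any family of subsets of $X$ generates a smallest $\sigma$-algebra containing it. In particular, when $X$ is a topological space, the $\sigma$-algebra $\mathcal{B}$ generated by all open subsets of $X$ plays a special role, since it works well with continuous functions (in the general sense). The elements of $\mathcal{B}$ are called Borel sets.
The $\pi$-$\lambda$ theorem (relating the two systems): let $\mathcal{P}$ be a $\pi$-system and $\mathcal{L}$ a $\lambda$-system with $\mathcal{P}\subset\mathcal{L}$, and let $\sigma(\mathcal{P})$ denote the smallest $\sigma$-algebra containing $\mathcal{P}$. Then $\sigma(\mathcal{P})\subset\mathcal{L}$.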
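The smallest $\sigma$-algebra generated by a family can be computed directly when $X$ is finite: keep adding complements and unions until nothing new appears. A minimal Python sketch under that finiteness assumption (the function name is hypothetical):

```python
def generate_sigma_algebra(X, C):
    """Smallest sigma-algebra on a finite set X containing the family C.

    Repeatedly closes the family under complement and pairwise union until it
    stops growing; on a finite X this suffices (countable unions are finite).
    """
    X = frozenset(X)
    F = {frozenset(), X} | {frozenset(c) for c in C}
    while True:
        new = {X - A for A in F} | {A | B for A in F for B in F}
        if new <= F:
            return F
        F |= new


sigma = generate_sigma_algebra({1, 2, 3, 4}, [{1}, {1, 2}])
print(len(sigma))                          # 8
print(sorted(sorted(A) for A in sigma))
```

Here the generated $\sigma$-algebra is built from the "atoms" $\{1\}$, $\{2\}$, $\{3,4\}$: the points $3$ and $4$ are never separated, because no set in the generating family distinguishes them.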

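Measurable functions

For a bounded function on a closed interval, Riemann integrability implies that the function is continuous almost everywhere. Examples of Riemann integrable functions include bounded monotone functions and bounded functions with only countably many (in particular, finitely many) discontinuities. In the Lebesgue theory discussed here, however, we do not ask whether a function is continuous (although continuous and measurable functions are closely related, that is not the focus of this post).

Let $f:X\to{Y}$ be a function, where $\mathfrak{M}$ is a $\sigma$-algebra on $X$ and $Y$ is a topological space. If $f^{-1}(V)\in\mathfrak{M}$ for every open set $V\subset{Y}$, where $f^{-1}(V)=\{x\in{X}:f(x)\in{V}\}$, then $f$ is called measurable.

If you are not familiar with open sets, think of them as a generalization of open intervals: sets that contain none of their boundary points, such as open intervals or regions of the plane that exclude their boundary. The complement of an open set is a closed set. Open sets are the basic building blocks of a topology, and the definition of measurability ensures that such functions are not "pathological". The idea is easy to see: we went to great lengths to specify the $\sigma$-algebra so that integration would be convenient, so it would be unreasonable if some open interval in the codomain had no reasonable preimage in $X$ belonging to the $\sigma$-algebra. As for closed sets: a set is closed if and only if its complement is open, and $f^{-1}(Y-V)=X-f^{-1}(V)$. Since a $\sigma$-algebra is closed under complements and countable unions, preimages of closed sets are measurable as well, which makes the definition look even more natural.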

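For real-valued functions there is a very useful criterion:

If $f$ takes values in $\mathbb{R}$ and $\{x\in{X}:f(x)>\alpha\}\in\mathfrak{M}$ for every $\alpha\in\mathbb{R}$, then $f$ is measurable.

This is the most basic requirement. Using the properties of a $\sigma$-algebra, it is not hard to handle every open interval: for example, $\{f<\beta\}=\bigcup_{n=1}^{\infty}\{f>\beta-\frac{1}{n}\}^c$ and $\{\alpha<f<\beta\}=\{f>\alpha\}\cap\{f<\beta\}$, and every open subset of $\mathbb{R}$ is a countable union of open intervals. For a complex function, write $f=u+iv$; if $u$ and $v$ are both measurable, then $f$ is measurable.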
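The criterion $\{f>\alpha\}\in\mathfrak{M}$ is easy to test on a finite measurable space, because $\{f>\alpha\}$ only changes when $\alpha$ crosses one of the finitely many values of $f$. A Python sketch under that finiteness assumption; `is_measurable` is a hypothetical helper, and functions are given as dictionaries:

```python
def is_measurable(f, M):
    """f: dict mapping each point of a finite X to a real value; M: a sigma-algebra on X
    given as a set of frozensets. Tests {x : f(x) > alpha} in M for all real alpha."""
    values = sorted(set(f.values()))
    # alpha below the minimum gives X; alpha equal to each value covers all other cases
    alphas = [values[0] - 1] + values
    return all(frozenset(x for x in f if f[x] > a) in M for a in alphas)


X = frozenset({1, 2, 3, 4})
M = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), X}

f = {1: 0.0, 2: 0.0, 3: 5.0, 4: 5.0}   # constant on {1,2} and on {3,4}
g = {1: 0.0, 2: 1.0, 3: 1.0, 4: 1.0}   # {g > 0} = {2,3,4} is not in M
print(is_measurable(f, M), is_measurable(g, M))   # True False
```

With this coarse $\sigma$-algebra, measurability simply means "constant on each block", which is a finite shadow of the general definition.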

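For continuous functions: if $\mathfrak{M}$ contains all Borel sets, then every continuous function is measurable, because for continuous $f$ the preimage $f^{-1}(V)$ of an open set $V$ is open (think about it in $\varepsilon$-$\delta$ language).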

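Characteristic functions and simple functions

If $E$ is a measurable set, define the function

$\chi_E(x)=\begin{cases}1\quad{x\in{E}}\\ 0\quad{x\notin{E}}\end{cases}$

Then $\chi_E$ is a measurable function: for any open $V$, the preimage $\chi_E^{-1}(V)$ is one of $\varnothing$, $E$, $E^c$, $X$, depending on whether $V$ contains $0$ and $1$, and all four lie in $\mathfrak{M}$. (If the range $\{0,1\}$ is viewed as a discrete space, each of its points is an open set.) $\chi_E$ is called the characteristic function of $E$.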

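A simple function is a function whose range consists of finitely many points, a kind of "step function", although here the steps need not be monotone. The Lebesgue integral is approximated by integrals of such step functions. By taking the preimage of each value, a simple function can be written in terms of characteristic functions. That is, if the simple function $s$ takes the values $\alpha_1,\cdots,\alpha_n$ and $A_i=\{x:s(x)=\alpha_i\}$, then it is easy to see that

$s=\sum_{i=1}^{n}\alpha_i\chi_{A_i}$

It is also easy to see that if every $A_i$ is measurable, then $s$ is a measurable function.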
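The passage from $s$ to $\sum\alpha_i\chi_{A_i}$ is mechanical: group points by their value. A Python sketch, assuming a finite domain so the level sets can be listed explicitly; `canonical_form` and `chi` are hypothetical helper names, and the step values are made up for illustration:

```python
from collections import defaultdict


def canonical_form(s, X):
    """Return [(alpha_i, A_i)] with A_i = {x in X : s(x) = alpha_i}, so s = sum alpha_i chi_{A_i}."""
    level_sets = defaultdict(set)
    for x in X:
        level_sets[s(x)].add(x)
    return [(alpha, frozenset(A)) for alpha, A in level_sets.items()]


def chi(E):
    """Characteristic function of the set E."""
    return lambda x: 1 if x in E else 0


X = range(10)
steps = [3, 3, -1, 0, 3, -1, 0, 0, 2, 2]   # hypothetical values; the "steps" are not monotone


def s(x):
    return steps[x]


pieces = canonical_form(s, X)


def rebuilt(x):
    return sum(alpha * chi(A)(x) for alpha, A in pieces)


print(sorted(alpha for alpha, _ in pieces))   # [-1, 0, 2, 3]
print(all(rebuilt(x) == s(x) for x in X))     # True
```

The sets $A_i$ produced this way are pairwise disjoint and cover $X$, which is what makes the representation unique.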

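Every measurable function can be approximated by simple functions. More precisely,

Let $f:X\to[0,+\infty]$ be measurable. Then there exist measurable simple functions $s_n$ on $X$ such that

$0\leq s_1\leq s_2\leq\cdots\leq f$;

$s_n(x)\to f(x)$ as $n\to\infty$ for every $x\in{X}$.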
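The statement does not say how the $s_n$ are built; one standard choice is the dyadic construction $s_n(x)=\min\left(n,\ 2^{-n}\lfloor 2^n f(x)\rfloor\right)$, which takes finitely many values and increases to $f$. A Python sketch of that choice, evaluated pointwise (the name `dyadic_approx` and the test function $f(x)=x^2$ are assumptions for illustration):

```python
import math


def dyadic_approx(f, n):
    """Return s_n with s_n(x) = floor(2^n f(x)) / 2^n, capped at n (standard dyadic choice).

    s_n takes only the finitely many values k / 2^n with 0 <= k <= n * 2^n.
    """
    def s_n(x):
        y = f(x)
        if y >= n:          # also handles y = +inf
            return float(n)
        return math.floor(y * 2 ** n) / 2 ** n
    return s_n


def f(x):
    return x * x


for x in (0.0, 0.3, 1.5, 2.9):
    approx = [dyadic_approx(f, n)(x) for n in range(1, 21)]
    increasing = all(a <= b for a, b in zip(approx, approx[1:]))
    print(x, f(x), approx[0], approx[-1], increasing and approx[-1] <= f(x))
```

Once $n$ exceeds $f(x)$, the error at $x$ is below $2^{-n}$, which gives the pointwise convergence in the second condition.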

If $f$ takes both positive and negative values, consider $f^{+}=\max(f,0)$ and $f^{-}=-\min(f,0)$ separately: approximate each part, then recombine them via $f=f^{+}-f^{-}$.

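Measures and measure spaces

Having dealt with the integrands, we return to the notion of the "size" of a set. In fact, the probability of an event in probability theory is a measure, except that its values lie in $[0,1]$, whereas a general measure takes values in $[0,+\infty]$. A probability is a map from sets to $[0,1]$, and you are surely familiar with one more property: if $A\cap{B}=\varnothing$, then $P(A\cup{B})=P(A)+P(B)$. This is in fact a special case of the defining property of a measure. Precisely,

A positive measure is a function $\mu$ defined on a $\sigma$-algebra $\mathfrak{M}$, with values in $[0,+\infty]$, that is countably additive: for every sequence $\{A_i\}$ of pairwise disjoint sets in $\mathfrak{M}$, $\mu\left(\bigcup_{i=1}^{\infty}A_i\right)=\sum_{i=1}^{\infty}\mu(A_i)$.

We also assume that $\mu(A)<+\infty$ for at least one $A\in\mathfrak{M}$.

The measure closest to the Riemann integral is Lebesgue measure $m$. Roughly speaking, $m([a,b])=b-a$; it represents the "volume" of point sets in Euclidean space. For a discrete set, letting $\mu(E)$ be the number of elements of $E$ also gives a measure. Whether a given set is Lebesgue measurable, however, is a rather subtle question, which will be explained later.

A measure space consists of a measurable space together with a positive measure defined on its $\sigma$-algebra. A complex measure is a complex-valued, countably additive function with the same kind of domain as a positive measure.

It is easy to see that $\mu(\varnothing)=0$ (take $A_1=A$ with $\mu(A)<+\infty$ and $A_i=\varnothing$ for $i\geq 2$), and that countable additivity implies additivity for finitely many disjoint sets (for $n$ sets, take the $(n+1)$-th and all later sets to be $\varnothing$).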

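Constructing the Lebesgue integral

We have finally arrived at the Lebesgue integral. Before going on, let us review what we have done. First, since integration operates on subsets of a set (think of subsets of $\mathbb{R}$), we determined the conditions that the family of subsets must satisfy: it must be a $\sigma$-algebra. To measure the "size" of a set, we defined the notion of a measure, a very natural abstraction of "interval length". The integral is now extended to all measurable real and complex functions in the following three steps.

Simple functions

Consider a nonnegative measurable simple function $s=\sum_{i=1}^{n}\alpha_{i}\chi_{A_i}$ (other cases are treated separately). Measurability of $s$ means that every $A_i\in\mathfrak{M}$, so $\mu(A_i)$ is defined; otherwise the computation could not even be carried out. This is where the measurability of functions earns its keep.

Back to the start of the post: computing an area requires the function values ($\alpha_i$) and the interval lengths. Here the abstract "interval length" becomes $\mu(A_i)$. If we integrate over $E\in\mathfrak{M}$, then $A_i\cap{E}\in\mathfrak{M}$ (because $\mathfrak{M}$ is a $\pi$-system!), so we can simply sum: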

$\int_{E}sd\mu = \sum_{i=1}^{n}\alpha_{i}\mu(A_i\cap{E})$

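If $\mu$ is actual interval length, this is just a sum of areas; if $\mu$ is a probability measure, this computes an expectation (a random variable is a measurable function). Here is another amusing example:

Let $X$ be the set of all your courses, let $A_i$ be the set of courses with grade point $\alpha_i$, and let $\mu(A_i)$ be the total credits of the courses in $A_i$. Then this Lebesgue integral divided by the total number of credits is your GPA.

We also need to define $0\cdot\infty=0$. It may feel awkward, but the case does arise: sometimes $\alpha_i=0$ (forget about the GPA here!) while $\mu(A_i\cap{E})=\infty$. The convention is necessary; for instance, the integral of $f(x)=0$ over $\mathbb{R}$ should be $0$ and nothing else.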
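The formula $\int_E s\,d\mu=\sum\alpha_i\mu(A_i\cap E)$, together with the $0\cdot\infty=0$ convention, translates almost literally into code. A Python sketch for a finite $X$, where the measure is a hypothetical function `mu` on frozensets; the course names, credits and grade points are made up purely to illustrate the GPA example:

```python
import math


def integrate_simple(pieces, mu, E):
    """Integral over E of s = sum alpha_i chi_{A_i}, i.e. sum alpha_i * mu(A_i ∩ E).

    pieces: list of (alpha_i, A_i) with alpha_i >= 0; mu: function on frozensets.
    """
    total = 0.0
    for alpha, A in pieces:
        if alpha == 0:        # convention 0 * inf = 0: such terms contribute nothing
            continue
        total += alpha * mu(A & E)
    return total


# Hypothetical transcript: credits play the role of the measure, grade points the values
credits = {"Calculus": 5, "Linear Algebra": 4, "Physics": 3, "Writing": 2}
grade_points = {"Calculus": 4.0, "Linear Algebra": 3.7, "Physics": 3.3, "Writing": 4.0}


def mu(A):
    """mu(A) = total credits of the courses in A (a weighted counting measure)."""
    return sum(credits[c] for c in A)


X = frozenset(credits)
level_sets = {}
for course, g in grade_points.items():
    level_sets.setdefault(g, set()).add(course)
pieces = [(g, frozenset(A)) for g, A in level_sets.items()]

integral = integrate_simple(pieces, mu, X)
print(integral, integral / mu(X))   # about 52.7 and a GPA of about 3.764

# The zero function against an infinite measure gives 0, not nan (0.0 * inf is nan in Python)
print(integrate_simple([(0.0, X)], lambda A: math.inf, X))   # 0.0
```

The explicit skip for $\alpha_i=0$ is needed because floating-point arithmetic would otherwise turn $0\cdot\infty$ into `nan`.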

All nonnegative measurable functions

If $f:X\to[0,+\infty]$ is measurable, then for $E\in\mathfrak{M}$ define

$\int_{E}fd\mu=\sup\int_{E}sd\mu$

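where the supremum is taken over all measurable simple functions $s$ with $0\leq{s}\leq{f}$. We already know that measurable functions can be approximated by simple functions, so this can be viewed as a process of approximation by simple functions.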
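For $f(x)=x$ on $[0,1]$ with Lebesgue measure $m$, the dyadic simple functions below $f$ can be integrated by hand, and their integrals climb toward $\int_{[0,1]}x\,dm=\frac12$. A Python sketch with exact fractions, assuming the dyadic level sets $A_k=[k/2^n,(k+1)/2^n)$ with $m(A_k)=2^{-n}$ (the single point $x=1$ has measure $0$ and is ignored); the function name is hypothetical:

```python
from fractions import Fraction


def lower_simple_integral(n):
    """Integral of s_n = sum_k (k / 2^n) chi_{A_k} over [0, 1), A_k = [k/2^n, (k+1)/2^n).

    Each A_k has Lebesgue measure 1/2^n, so the integral is sum_k (k/2^n) * (1/2^n).
    """
    h = Fraction(1, 2 ** n)
    return sum(k * h * h for k in range(2 ** n))


for n in range(1, 8):
    value = lower_simple_integral(n)
    print(n, value, Fraction(1, 2) - value)   # the gap is exactly 1 / 2^(n+1)
```

Summing the arithmetic series gives $\frac12-\frac{1}{2^{n+1}}$, so these particular simple functions already push the supremum up to $\frac12$, the familiar Riemann value.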

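Complex functions

So far we have only treated nonnegative real functions. For the other two cases, negative values make the supremum awkward, and for complex values it is even less suitable, since complex numbers cannot be ordered. Fortunately, both cases can be handled together. Write $f=u+iv$ ($v$ may be identically $0$), where $f$ is measurable with $\int_E|f|\,d\mu<\infty$ so that no $\infty-\infty$ arises, and set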

$\int_{E}fd\mu=\int_{E}u^+d\mu-\int_{E}u^-d\mu+i\left(\int_{E}v^+d\mu-\int_{E}v^-d\mu\right)$
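On a finite set with point masses, every function is simple, so each nonnegative integral in the definition reduces to $\sum_x g(x)\mu(\{x\})$. Under that assumption (the weights and values below are made up), here is a Python sketch of the four-part definition, compared with integrating $f$ directly:

```python
def integrate_nonneg(g, weights):
    """Integral of g >= 0 w.r.t. the finite discrete measure mu({x}) = weights[x]."""
    return sum(g(x) * w for x, w in weights.items())


def positive_part(h):
    return lambda x: max(h(x), 0.0)


def negative_part(h):
    return lambda x: -min(h(x), 0.0)


def integrate_complex(f, weights):
    """Integral of complex f as (int u+ - int u-) + i (int v+ - int v-), with f = u + iv."""
    def u(x):
        return f(x).real

    def v(x):
        return f(x).imag

    re = integrate_nonneg(positive_part(u), weights) - integrate_nonneg(negative_part(u), weights)
    im = integrate_nonneg(positive_part(v), weights) - integrate_nonneg(negative_part(v), weights)
    return complex(re, im)


weights = {0: 0.5, 1: 1.0, 2: 2.0}      # hypothetical point masses mu({x})
values = [1 - 2j, -3 + 0.5j, 2 + 1j]     # hypothetical values of f


def f(x):
    return values[x]


print(integrate_complex(f, weights))                # (1.5+1.5j)
print(sum(f(x) * w for x, w in weights.items()))    # same value, computed directly
```

Splitting into four nonnegative parts is what lets the supremum-based definition, which only makes sense for nonnegative functions, cover real and complex integrands.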

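In short, replacing the areas of rectangles by products of abstract set measures and function values, and then generalizing, yields the Lebesgue integral. The precise relationship between the Riemann and Lebesgue integrals will be discussed in detail later. The Lebesgue integral does not necessarily make computations easier, but it opens up many more possibilities in abstract arguments. We will also see later how convenient the Lebesgue integral is for convergence problems, and the opening problem will then become easy.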