Guided by research in function theory, operator theorists introduced an analogue of quasi-analytic classes. Let $A$ be an operator in a Banach space $X$. $A$ is not necessarily bounded, hence the domain $D(A)$ is not necessarily the whole space. We say $x \in X$ is a $C^\infty$ vector if $x \in \bigcap_{n \geq 1}D(A^n)$. This is quite intuitive if we think of $A$ as a differential operator. A $C^\infty$ vector $x$ is analytic if the series

$$\sum_{n=0}^{\infty}\frac{\lVert A^nx \rVert}{n!}t^n$$

has a positive radius of convergence. Finally, we say $x$ is quasi-analytic for $A$ provided that

$$\sum_{n=1}^{\infty}\lVert A^nx \rVert^{-1/n}=\infty,$$

or equivalently that the same holds with $\lVert A^nx \rVert$ replaced by its nondecreasing majorant. Interestingly, if $A$ is symmetric, then the sequence $\lVert{A^nx}\rVert$ is log convex.

Based on the density of quasi-analytic vectors, we have an interesting result.

(Theorem) Let $A$ be a symmetric operator in a Hilbert space $\mathscr{H}$. If the set of quasi-analytic vectors spans a dense subspace, then $A$ is essentially self-adjoint.

This theorem can be considered as a corollary to the fundamental theorem of quasi-analytic classes, obtained by applying suitable Banach space techniques instead.

For a positive sequence $\{a_n\}$, we see it is the moment sequence of a positive measure $\mu$, i.e. $a_n = \int_\mathbb{R}t^n d\mu(t)$, if and only if it is positive definite (proof). But uniqueness is not guaranteed. Here we have a sufficient condition for it, using the concept of quasi-analytic vectors. This is an old theorem (1922), but we prove it with operator theory that appeared decades later.

(Carleman’s condition) Suppose $\{a_n\}$ is the moment sequence of a positive measure $\mu$ on $\mathbb{R}$. Then $\mu$ is uniquely determined provided that $\sum_{n=1}^{\infty} a_{2n}^{-1/2n}=\infty$.

**Proof.** Consider the Hilbert space

$$\mathscr{H}=L^2(\mathbb{R},\mu)$$

and the operator

$$(Af)(t)=tf(t),\quad D(A)=\{f \in \mathscr{H}:tf \in \mathscr{H}\}.$$

It is clear that $A$ is self-adjoint. We shall work on the constant function $u(t) \equiv 1 \in \mathscr{H}$. Since $A^nu = t^n$, we see $u \in C^\infty$, for otherwise $a_{2n}$ would not be defined. On the other hand, we have

$$\lVert A^nu \rVert^2=\int_{\mathbb{R}}t^{2n}d\mu(t)=a_{2n}.$$

Hence $a_{2n}^{-1/2n}=\lVert A^n u \rVert^{-1/n}$, and as a result we see $\sum a_{2n}^{-1/2n}= \sum \lVert A^n u \rVert^{-1/n} = \infty$, so $u$ is quasi-analytic. In general, $t^n = A^n u$ is quasi-analytic for all $n \geq 0$.
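To get a feeling for Carleman’s condition, here is a minimal numerical sketch. It assumes the standard Gaussian measure as a test case (my choice, not part of the proof), whose even moments are $a_{2n}=(2n-1)!!$. The function name `carleman_partial_sums` is hypothetical. Growing partial sums only suggest divergence; they do not prove it.

```python
import math

def carleman_partial_sums(n_max):
    """Partial sums of sum_{n>=1} a_{2n}^(-1/(2n)) for a_{2n} = (2n-1)!!."""
    log_a = 0.0   # log a_{2n}, updated incrementally to avoid overflow
    total = 0.0
    sums = []
    for n in range(1, n_max + 1):
        log_a += math.log(2 * n - 1)          # (2n-1)!! = (2n-3)!! * (2n-1)
        total += math.exp(-log_a / (2 * n))   # the n-th term a_{2n}^(-1/(2n))
        sums.append(total)
    return sums

sums = carleman_partial_sums(10_000)
for n in (10, 100, 1_000, 10_000):
    print(n, round(sums[n - 1], 3))
```

Since $(2n-1)!! \leq (2n)^n$, each term is at least $(2n)^{-1/2}$, so the partial sums grow at least like $\sqrt{n}$.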

Consider the space of polynomials $\mathcal{P}[t]$ with closure $\mathscr{H}_1$. It follows from the theorem above that $A_1 = A|_{\mathcal{P}[t]}$ is essentially self-adjoint in $\mathscr{H}_1$. Hence $\mathscr{H}_1$ is invariant under the one-parameter group $e^{iAs}$. Pick $y \in \mathcal{P}[t]^{\perp}$; since $e^{iAs}u \in \mathscr{H}_1$, we see

$$\int_{\mathbb{R}}e^{ist}\overline{y(t)}\,d\mu(t)=\langle e^{iAs}u,y \rangle=0\quad\text{for all } s \in \mathbb{R},$$

which implies that $y = 0$ a.e. [$\mu$]. It follows that $\mathscr{H}_1 = \mathscr{H}$, or equivalently $\mathcal{P}[t]$ is dense in $\mathscr{H}$.

Suppose now we have another generating measure $\nu$ of $\{a_n\}$. With respect to $\nu$, $\mathcal{P}[t]$ is still a dense subspace. But the norm on $\mathcal{P}[t]$ is fixed by $\{a_n\}$, hence we obtain an isometry between $\mathcal{P}[t]_\mu$ and $\mathcal{P}[t]_\nu$, which extends to an isometry between $L^2(\mathbb{R},\mu)$ and $L^2(\mathbb{R},\nu)$. This forces $\mu$ and $\nu$ to be equal. $\blacksquare$

There are a lot of nice properties of analytic functions, whose class is denoted by $C^\omega$. Formally we have the following definition:

If $f \in C^\omega$ and $x_0 \in \mathbb{R}$, one can write

$$f(x)=\sum_{n=0}^{\infty}a_n(x-x_0)^n$$

for all $x$ in a neighbourhood of $x_0$. Obviously $f \in C^\infty$ (and hence $C^\omega \subset C^\infty$), and alternatively the Taylor series converges to $f$ near any $x_0 \in \mathbb{R}$:

$$f(x)=\sum_{n=0}^{\infty}\frac{D^nf(x_0)}{n!}(x-x_0)^n.$$

One interesting thing is that every $f \in C^\omega$ is uniquely determined by the sequence $D^0f(x_0), Df(x_0),D^2f(x_0),\cdots$.

Unfortunately, this property is not generally true on $C^\infty$. For example, we can consider the bump function $\varphi$ (a simple example can be found on wikipedia). In brief, $\varphi=0$ for all $x \in (-\infty,-1] \cup [1,+\infty)$ but $\varphi>0$ on $(-1,1)$. And more importantly, $\varphi \in C^\infty$. However, if we take $f = \varphi$ and $g = 2\varphi$, then $f \neq g$, but $D^nf(-2)=D^ng(-2)=0$ for all $n \geq 0$. We get a sequence of derivatives of different orders, but this sequence does not determine a unique $C^\infty$ function.
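The counterexample can be made concrete with a small sketch. It assumes the common choice $\varphi(x)=\exp(-1/(1-x^2))$ on $(-1,1)$, which is only one of many bump functions. Both $f$ and $g$ vanish identically near $-2$, so all their derivatives there are zero, yet they differ at $0$.

```python
import math

def bump(x):
    """A standard bump function: exp(-1/(1-x^2)) on (-1,1), zero elsewhere."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))

f = bump
g = lambda x: 2.0 * bump(x)

# f and g agree on a whole neighbourhood of -2 ...
near_minus_two = [-2.0 + k * 0.01 for k in range(-50, 51)]
print(all(f(x) == g(x) == 0.0 for x in near_minus_two))
# ... but they are different functions
print(f(0.0), g(0.0))
```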

The term “uniquely determined” can also be described in an alternative way: if $f \in C^\omega$ and $D^kf(x_0)=0$ for all $k \geq 0$, then $f=0$ everywhere.

So a question comes up naturally: which functions are determined by their derivatives of all orders at a point? Does $C^\omega$ contain all we can get? If not, how can we describe them?

The class of analytic functions is our source of motivation, so it makes sense to dig into its properties to find more. For an analytic function it is natural to consider the restriction of a holomorphic function on the complex plane. Let $\Omega$ be the set of all $z=x+iy$ such that $|y| < \delta$ and suppose $f \in H(\Omega)$ and $|f(z)|<\beta$ for all $z \in \Omega$. By Cauchy’s estimate, we get

$$|D^nf(x)| \leq \beta\delta^{-n}n!\quad(x \in \mathbb{R},\ n=0,1,2,\cdots).$$

Also the restriction of $f$ to $\mathbb{R}$ is real-analytic. Here comes the interesting part: $\beta$ and $\frac{1}{\delta}$ are determined only by $f$ and have nothing to do with $n$, while $n!$ is a fixed sequence that controls the growth of the derivatives of $f$.

This motivates us to define a special class of functions, which is called the class $C\{M_n\}$.

Let $\{M_n\}$ be a sequence of positive numbers. We let $C\{M_n\}$ denote the class of all $f \in C^\infty$ such that

$$\lVert D^nf \rVert_\infty \leq \beta_fB_f^nM_n\quad(n=0,1,2,\cdots),$$

where $\lVert \cdot \rVert_\infty$ is the supremum norm on $\mathbb{R}$, and $\beta_f,B_f$ are constants determined only by $f$ and not by $n$.

In order to equip $C\{M_n\}$ with a satisfactory algebraic structure, which can simplify our work, we need some restrictions.

Indeed, $B_f$ plays a much more important role, since we have

$$\limsup_{n \to \infty}\left(\frac{\lVert D^nf \rVert_\infty}{M_n}\right)^{1/n} \leq \limsup_{n \to \infty}\beta_f^{1/n}B_f=B_f,$$

while $\beta_f$ is reduced to $1$ in this limit. However, if we eliminate $\beta_f$ at the beginning, i.e. put $\beta_f = 1$ for all $f \in C\{M_n\}$, then when $n=0$, we have

$$\lVert f \rVert_\infty \leq M_0,$$

which prevents $C\{M_n\}$ from being a vector space. For example, if $\lVert f \rVert_\infty = M_0$, then $\lVert 2f \rVert_\infty = 2M_0 > M_0$, hence $2f \not\in C\{M_n\}$. However, if we allow $\beta_f$, say $\lVert f \rVert_\infty \leq \beta_f M_0$, then after addition or scalar multiplication we simply get a different constant for the new function, which makes sure that $C\{M_n\}$ is closed under addition and scalar multiplication, i.e. is a vector space. Without such a constant, our class contains way too few functions.

Further, we have some restrictions on the sequence $\{M_n\}$:

- $M_0=1$.
- $M_n^2 \leq M_{n-1}M_{n+1}$ ($\{\log M_n\}$ is a convex sequence).

As we will see soon, this makes $C\{M_n\}$ an algebra over $\mathbb{R}$, where multiplication is defined pointwise.

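*Proof.* If $f,g \in C\{M_n\}$, then we need to show that $fg \in C\{M_n\}$. We have the product rule for differentiation:

$$D^n(fg)=\sum_{j=0}^{n}\binom{n}{j}(D^jf)(D^{n-j}g).$$

Since $f,g \in C\{M_n\}$, we have

$$\lVert D^n(fg) \rVert_\infty \leq \beta_f\beta_g\sum_{j=0}^{n}\binom{n}{j}B_f^jB_g^{n-j}M_jM_{n-j}.$$

Of course we want to eliminate $M_jM_{n-j}$ to obtain a binomial expansion. To do this we need the convexity of the sequence $\{\log M_n\}$. Note $M_n^2 \leq M_{n-1}M_{n+1}$ implies

$$\log M_n-\log M_{n-1} \leq \log M_{n+1}-\log M_n.$$

As a result, the line segment connecting $(n-1,\log M_{n-1})$ and $(n,\log M_n)$ gets steeper and steeper as $n$ grows. By connecting these points, we actually get a convex function, but we will be more rigorous. For $0 < j < n$, since $M_0=1$ and the ratios $M_k/M_{k-1}$ are nondecreasing, we have

$$M_j=\prod_{k=1}^{j}\frac{M_k}{M_{k-1}} \leq \prod_{k=1}^{j}\frac{M_{n-j+k}}{M_{n-j+k-1}}=\frac{M_n}{M_{n-j}}.$$

Hence $M_n \geq M_jM_{n-j}$ for $0<j<n$. It also holds when $j=0$ or $j=n$, hence we get

$$\lVert D^n(fg) \rVert_\infty \leq \beta_f\beta_g\left(\sum_{j=0}^{n}\binom{n}{j}B_f^jB_g^{n-j}\right)M_n=\beta_f\beta_g(B_f+B_g)^nM_n.$$

Hence $fg \in C\{M_n\}$. The reason why $C\{M_n\}$ is a vector space has been stated already. $\square$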
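The key inequality $M_jM_{n-j} \leq M_n$ is easy to test on examples. The sketch below uses hypothetical helper names and checks it for $M_n=n!$, which satisfies both restrictions, and for a short sequence that is not log convex, where the inequality fails.

```python
import math

def is_log_convex(M):
    """Check M_n^2 <= M_{n-1} * M_{n+1} for all interior indices."""
    return all(M[n] ** 2 <= M[n - 1] * M[n + 1] for n in range(1, len(M) - 1))

def submultiplicative_ok(M):
    """Check M_j * M_{n-j} <= M_n for all 0 <= j <= n."""
    return all(M[j] * M[n - j] <= M[n] for n in range(len(M)) for j in range(n + 1))

factorials = [math.factorial(n) for n in range(30)]   # M_0 = 1, log convex
bad = [1, 2, 3]                                       # M_1^2 = 4 > M_0 * M_2 = 3

print(is_log_convex(factorials), submultiplicative_ok(factorials))
print(is_log_convex(bad), submultiplicative_ok(bad))
```

Integer arithmetic keeps the comparison exact, so no rounding issues arise here.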

These restrictions do not hurt the generality. In fact whenever we are given a positive sequence $\{M_n\}$, we have another sequence $\{M'_n\}$ satisfying the two restrictions such that $C\{M_n\}=C\{M'_n\}$.

A class $C\{M_n\}$ is said to be quasi-analytic if, for $f \in C\{M_n\}$ and $x_0 \in \mathbb{R}$, the condition

$$(D^nf)(x_0)=0$$

for all $n \geq 0$ implies that $f(x) = 0$ for all $x \in \mathbb{R}$.

The reason we check whether $f$ is equal to $0$ everywhere, instead of checking whether it is ‘uniquely determined’ by its sequence of derivatives, is that this condition is much simpler to work with. If two functions in the class share the same sequence of derivatives at a point, then all derivatives of their difference vanish at that point, so the difference is identically $0$ when the class is quasi-analytic.

We have seen that $C\{n!\}$ contains every function which is the restriction of a bounded holomorphic function in the strip defined by $|\Im(z)|<\delta$. Conversely, we show that any function in $C\{n!\}$ defined on the real axis can be extended to a holomorphic function with the same property. As a result, $C\{n!\}$ is a quasi-analytic class (all of its members are bounded functions in $C^\omega$). If we only consider functions defined on a closed and bounded interval $[a,b]$, then $C\{n!\}$ is exactly $C^\omega$.

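Suppose $f \in C\{n!\}$. First of all we have

$$\lVert D^nf \rVert_\infty \leq \beta B^nn!$$

for $n \in \mathbb{N}$. By Taylor’s formula, for real $a$ and $x$,

$$f(x)=\sum_{j=0}^{n-1}\frac{(D^jf)(a)}{j!}(x-a)^j+\frac{1}{(n-1)!}\int_a^x(x-t)^{n-1}(D^nf)(t)dt.$$

The remainder is therefore dominated by

$$n\beta B^n\left|\int_a^x(x-t)^{n-1}dt\right|=\beta|B(x-a)|^n.$$

If $|B(x-a)|<1$, then $\lim_{n \to \infty}|B(x-a)|^n = 0$, and we can safely write the expansion

$$f(x)=\sum_{n=0}^{\infty}\frac{(D^nf)(a)}{n!}(x-a)^n.$$

Pick $0<\delta<\frac{1}{B}$. We can replace $x$ in the expansion above with $z$ such that $|z-a|<\delta$. This defines a holomorphic function $F_a$ on $D(a,\delta)$ (the open disk centred at $a$ with radius $\delta$). If $x \in D(a,\delta)$ is real, then $F_a(x)=f(x)$. Therefore $F_a$ is an analytic continuation of $f$; all the $F_a$ together form a holomorphic extension $F$ of $f$ in the strip $|\Im(z)|<\delta$. As a result, for $z = a+iy$ with $|y|<\delta$, we have

$$|F(z)|=|F_a(z)| \leq \sum_{n=0}^{\infty}\frac{|(D^nf)(a)|}{n!}\delta^n \leq \beta\sum_{n=0}^{\infty}(B\delta)^n=\frac{\beta}{1-B\delta}.$$

Hence $F$ is bounded in the strip.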

In general, if $M_n \to \infty$ fast enough as $n \to \infty$ (in particular much faster than $n!$), then $C\{M_n\}$ is not quasi-analytic. Whether $C\{M_n\}$ is a quasi-analytic class can be decided by several equivalent conditions, which are given by the Denjoy-Carleman theorem. Here I collect all the conditions that I have found:

(Denjoy-Carleman theorem) The following conditions are equivalent:

- $C\{M_n\}$ is not quasi-analytic.
- $\int_0^\infty \log Q(x)\frac{dx}{1+x^2}<\infty$, where $Q(x)=\sum_{n=0}^{\infty}\frac{x^n}{M_n}$.
- $\int_0^\infty \log q(x) \frac{dx}{1+x^2}<\infty$, where $q(x) = \sup \frac{x^n}{M_n}$.
- $\sum_{n=1}^{\infty}\left(\frac{1}{M_n}\right)^{1/n}<\infty$.
- $\sum_{n=1}^{\infty}\frac{M_{n-1}}{M_n}<\infty$.
- $C\{M_n\}$ contains a nontrivial function with compact support.
- $\sum_{n=1}^{\infty}\frac{1}{\lambda_n}<\infty$ where $\lambda_n = \inf_{k \geq n}M_k^{\frac{1}{k}}$.
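Two of the series conditions can be explored numerically. The sketch below compares $M_n=n!$ with $M_n=(n!)^2$; both test sequences are my own choice and satisfy the two restrictions. The names are hypothetical, and finite partial sums can only hint at convergence or divergence.

```python
import math

def partial_sums(log_M, n_max):
    """Partial sums of sum (1/M_n)^(1/n) and sum M_{n-1}/M_n, given n -> log M_n."""
    root_sum = 0.0
    ratio_sum = 0.0
    for n in range(1, n_max + 1):
        root_sum += math.exp(-log_M(n) / n)
        ratio_sum += math.exp(log_M(n - 1) - log_M(n))
    return root_sum, ratio_sum

log_factorial = lambda n: math.lgamma(n + 1)              # M_n = n!
log_factorial_sq = lambda n: 2.0 * math.lgamma(n + 1)     # M_n = (n!)^2

for n_max in (10**2, 10**3, 10**4):
    print(n_max, partial_sums(log_factorial, n_max), partial_sums(log_factorial_sq, n_max))
```

For $M_n=n!$ the second series is the harmonic series, consistent with $C\{n!\}$ being quasi-analytic, while for $M_n=(n!)^2$ it is $\sum 1/n^2$, so that class is not quasi-analytic.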

You may find condition 7 strange. In fact, in this condition $\{M_n\}$ is not required to satisfy the two restrictions. This is what Denjoy and Carleman found initially. Later, mathematicians found that for a sequence $\{M_n\}$ we can obtain its convex minorant $\{M_n'\}$ such that

- $M_n \geq M_n'$ for all $n$.
- $\{\log M_n'\}$ is convex.
- There is a sequence $0=n_0<n_1<\cdots$ such that $M_{n_i} = M'_{n_i}$ for all $i$ and $\log M'_k$ is linear for $n_i \leq k \leq n_{i+1}$.

And as you may guess, the convex minorant $\{M_n'\}$ is what we are using today.

The proof of the Denjoy-Carleman theorem will come out in my next blog post. There is quite a lot of work to do to finish the proof, and it cannot be done within hours. We will be using a good deal of complex analysis. Also, I will try to cover some extra properties of quasi-analytic classes as well as why the convex minorant is sufficient.

This post is still in progress; it is neither finished nor properly polished. New content will be added over the coming days, until this line is deleted. What I’m planning to add at this moment:

- Transpose is not just about changing indices of its components.
- Norm and topology in vector spaces
- Representing groups using matrices

Since the background of the reader varies a lot, I will try to organise contents depending on topic and required background. For the following section, you are assumed to be familiar with basic abstract algebra terminologies, for example, group, ring, fields.

When learning linear algebra, we were always thinking about real or complex vectors and matrices. This makes sense because $\mathbb{R}$ and $\mathbb{C}$ are the number **fields** closest to our real life. But we should not have the stereotype that linear algebra is all about real and complex spaces, or properties of $\mathbb{R}^n$ and $\mathbb{C}^n$. There has never been such a restriction. In fact, $\mathbb{R}$ and $\mathbb{C}$ can be replaced with any field $\mathbb{F}$, and there are vast differences depending on the properties of $\mathbb{F}$.

There are already some differences between linear algebra over $\mathbb{R}$ and over $\mathbb{C}$. Since $\mathbb{C}$ is algebraically closed, that is, every polynomial of degree $n \geq 1$ has $n$ roots counted with multiplicity, dealing with eigenvalues is much ‘safer’. Besides, for example, we can diagonalise the matrix

$$\begin{pmatrix}0 & -1 \\ 1 & 0\end{pmatrix}$$

in $\mathbb{C}$ but not in $\mathbb{R}$.

When $\mathbb{F}$ above is finite, there are a lot more interesting things. It’s not just saying, $\mathbb{F}$ is a field, and is finite. For example, if $\mathbb{F}=\mathbb{R}$, we have

There shouldn’t be any problem. However, on the other hand, if $\mathbb{F}=\mathbb{Z}_5$, we have

In applications, for example in applied algebra, one meets finite fields quite often. What if we want to solve linear equations over a finite field? That’s when linear algebra over finite fields comes in. Realise this before it’s too late! By the way, if we work over rings in lieu of fields, we find ourselves in module theory.
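As an illustration of linear algebra over a finite field, here is a minimal sketch of Gauss-Jordan elimination over $\mathbb{Z}_p$ for a prime $p$. The function name `solve_mod_p` is hypothetical, and the modular inverse via `pow(a, -1, p)` assumes Python 3.8 or later.

```python
def solve_mod_p(A, b, p):
    """Solve A x = b over Z_p (p prime) by Gauss-Jordan elimination.

    Returns the unique solution as a list, or None if A is singular mod p.
    """
    n = len(A)
    # Augmented matrix [A | b] with entries reduced mod p.
    M = [[a % p for a in row] + [bi % p] for row, bi in zip(A, b)]
    for col in range(n):
        # Find a row with a non-zero (hence invertible) pivot in this column.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        inv = pow(M[col][col], -1, p)          # inverse of the pivot in Z_p
        M[col] = [(v * inv) % p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [(vr - factor * vc) % p for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

print(solve_mod_p([[1, 2], [3, 4]], [1, 0], 5))
# This matrix has determinant -5: invertible over R, singular over Z_5.
print(solve_mod_p([[1, 2], [3, 1]], [1, 0], 5))
```

The second call shows how the answer depends on the field: the same integer matrix is invertible over $\mathbb{R}$ but not over $\mathbb{Z}_5$.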

The set of all invertible $n \times n$ matrices forms a multiplicative group (and you should have no problem verifying this). The notation won’t go further than $GL(n)$, $GL(n,\mathbb{F})$, $GL_n(\mathbb{F})$ or simply $GL_n$. The set of all orthogonal matrices, which is also a multiplicative group and written as $O(n)$, is obviously a subgroup of $GL(n)$ since for all $A \in O(n)$, we have $\det{A} = \pm 1 \neq 0$. $O(n)$ contains $SO(n)$ as a subgroup, whose elements have determinant $1$. One should not confuse $SO(n)$ with $SL(n)$, which is the group of all matrices of determinant $1$. In fact $SO(n)$ is a proper subset of $SL(n)$ for $n \geq 2$, and $SL(n) \cap O(n) = SO(n)$. In general we have

$$SO(n) \subset O(n) \subset GL(n),\quad SO(n) \subset SL(n) \subset GL(n).$$

Now we consider a more detailed structure relating $GL(n)$ and $O(n)$. I met the following problem in a differential topology book, where it was about fibres and structure groups. But for now it’s simply a linear algebra problem. The crux is finding the ‘square root’ of a positive definite matrix.

There is a direct product decomposition

$$GL(n,\mathbb{R}) \cong O(n) \times \{\text{positive definite symmetric matrices}\}.$$

This decomposition is pretty intuitive. For example if a matrix $A \in GL(n,\mathbb{R})$ has determinant $a$, we may be looking for a positive definite matrix of determinant $|a|$, and another matrix of determinant $\frac{a}{|a|}$, which is expected to be orthogonal as well. We can consider $O(n)$ as a rotation of basis (change the direction), and the positive definite symmetric matrix as scaling (change the size). A similar result holds if we change the order of multiplication. It is worth mentioning that by direct product we mean it’s up to the order of eigenvalues.

**Proof.** For any invertible matrix $A$, we see $A^TA$ is positive definite and symmetric. Therefore there exists some $P \in O(n)$ such that

$$P^T(A^TA)P=\operatorname{diag}(\lambda_1,\lambda_2,\cdots,\lambda_n).$$

We assume that $\lambda_1\leq \lambda_2 \leq \cdots \leq \lambda_n$ to preserve uniqueness. Note $\lambda_k>0$ for all $1 \leq k \leq n$ since $A^TA$ is positive definite. We write $\Lambda=\operatorname{diag}(\sqrt{\lambda_1},\sqrt{\lambda_2},\cdots,\sqrt{\lambda_n})$ which gives

$$A^TA=P\Lambda^2P^T.$$

Define the square root $B=\sqrt{A^TA}$ by

$$B=P\Lambda P^T.$$

Then $B^2=P\Lambda P^T P \Lambda P^T = P\Lambda^2P^T = A^TA$. Note $B$ is also a positive definite symmetric matrix and is unique for given $A$. Let $v_1,v_2,\cdots,v_n$ be orthonormal eigenvectors of $B$ with respect to $\sqrt{\lambda_1}, \sqrt{\lambda_2}, \cdots, \sqrt{\lambda_n}$ (for example the columns of $P$). We first take a look at the following basis:

$$e_i=\frac{1}{\sqrt{\lambda_i}}Av_i,\quad i=1,2,\cdots,n.$$

Note

$$\langle e_i,e_j \rangle=\frac{1}{\sqrt{\lambda_i\lambda_j}}\langle Av_i,Av_j \rangle=\frac{1}{\sqrt{\lambda_i\lambda_j}}\langle A^TAv_i,v_j \rangle=\sqrt{\frac{\lambda_i}{\lambda_j}}\langle v_i,v_j \rangle.$$

So the value above is $1$ if $i = j$ and $0$ if $i \neq j$. Hence $\{e_1,e_2,\cdots,e_n\}$ is orthonormal, and in particular a basis.

Then we take

$$U=\sum_{i=1}^{n}e_iv_i^T.$$

We see

$$U^TU=\sum_{i,j=1}^{n}v_i(e_i^Te_j)v_j^T=\sum_{i=1}^{n}v_iv_i^T=I$$

since both $\{e_1,e_2,\cdots,e_n\}$ and $\{v_1,v_2,\cdots,v_n\}$ are orthonormal. On the other hand, we need to prove that $A=UB$. First of all,

$$Uv_k=\sum_{i=1}^{n}e_i\langle v_i,v_k \rangle=e_k.$$

(Note we used the fact that $\{v_k\}$ are orthonormal.) This yields

$$UBv_k=\sqrt{\lambda_k}Uv_k=\sqrt{\lambda_k}e_k=Av_k.$$

Therefore $A=UB$ holds on a basis, and therefore holds on $\mathbb{R}^n$. This gives the desired conclusion. For any invertible $n \times n$ matrix $A$ we have a unique decomposition

$$A=UB,$$

where $U \in O(n)$ and $B$ is a positive definite symmetric matrix. $\square$
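The decomposition $A=UB$ can be checked numerically in the $2 \times 2$ case. The sketch below takes $B=\sqrt{A^TA}$, the choice for which $A=UB$, and does not follow the eigenvector construction of the proof: it swaps in the closed form $\sqrt{M}=(M+sI)/t$ with $s=\sqrt{\det M}$ and $t=\sqrt{\operatorname{tr}M+2s}$, which holds for $2 \times 2$ positive definite symmetric $M$ by the Cayley-Hamilton theorem. All function names are hypothetical.

```python
import math

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def transpose(X):
    return [[X[j][i] for j in range(2)] for i in range(2)]

def sqrt_spd_2x2(M):
    """Square root of a symmetric positive definite 2x2 matrix (closed form)."""
    s = math.sqrt(M[0][0] * M[1][1] - M[0][1] * M[1][0])   # sqrt(det M)
    t = math.sqrt(M[0][0] + M[1][1] + 2.0 * s)             # sqrt(tr M + 2 s)
    return [[(M[0][0] + s) / t, M[0][1] / t],
            [M[1][0] / t, (M[1][1] + s) / t]]

def inverse_2x2(X):
    det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    return [[X[1][1] / det, -X[0][1] / det],
            [-X[1][0] / det, X[0][0] / det]]

def polar_2x2(A):
    """Return (U, B) with A = U B, U orthogonal and B = sqrt(A^T A)."""
    B = sqrt_spd_2x2(matmul(transpose(A), A))
    U = matmul(A, inverse_2x2(B))
    return U, B

A = [[2.0, 1.0], [0.0, 1.0]]
U, B = polar_2x2(A)
print("U =", U)
print("B =", B)
print("U^T U =", matmul(transpose(U), U))
print("U B =", matmul(U, B))
```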

A basis of a vector space does not come from nowhere. The statement that every vector space has a basis is derived from the axiom of choice and the fact that all non-zero elements of a field are invertible. I have written an article proving this already, see here (this is relatively advanced). On the other hand, since elements of a ring are not necessarily invertible, modules over a ring do not have a basis in general.

It is also worth mentioning that a vector space is not necessarily of finite dimension. An infinite dimensional vector space is not some fancy thing. It’s quite simple: no basis of it is finite. A basis can be countable or uncountable. And there is a pretty straightforward example: the set of all continuous functions $f:\mathbb{R} \to \mathbb{R}$.

One of the most important concepts developed in 20th century is, when studying a set, one can study functions defined on it. For example, let’s consider $[0,1]$ and $(0,1)$. If we consider the set of all continuous functions on $[0,1]$, which is written as $C([0,1])$, we see everything is fine. It’s fine to define norm on it, to define distance on it, and the norm and distance are complete. However, things are messy on $C((0,1))$. Defining a norm on it results in abnormal behaviour. If you are interested you can check here.

Now let’s consider the unit circle $S^1$ on the plane. The real continuous functions defined on $S^1$ can be considered as periodic functions defined on $\mathbb{R}$. So we may have a lot to do with it. If we are interested in the torus (the picture below is from wikipedia),

which is homeomorphic to $S^1 \times S^1$, how can we study the functions on it? We may consider $C(S^1) \times C(S^1)$, but as we will show later, there are some problems about that. Anyways, it makes sense to define ‘product’ from two vector spaces, which can ‘expand’ it.

Let’s review direct sum and direct product first. For the direct product of $A$ and $B$, we ask for an algebraic structure on the Cartesian product $A \times B$. For example, $(a,b)+(a',b')=(a+a',b+b')$. That is, the operation is defined componentwise. This works fine for groups since for each group there is only one binary operation. But at this point we don’t care about scalar multiplication.

There are two types of direct sum, inner and outer. For a vector space $V$ over a field $\mathbb{F}$, we consider two (or even more) subspaces $W$ and $W'$. We have a ‘bigger’ subspace generated by adding $W$ and $W'$ together, namely $W+W'$, which contains all elements of the form $w+w'$ where $w \in W$ and $w' \in W'$. The representation is not guaranteed to be unique. That is, for $z=w+w'$, we may have $w_1 \in W$ and $w_1' \in W'$ such that $z=w_1+w_1'$ but $w \neq w_1$. This would be weird. Fortunately, the representation is unique if and only if $W \cap W'$ is trivial. In this case we say the sum of $W$ and $W'$ is direct, and write $W \bigoplus W'$. This is the inner direct sum.

Can we represent the direct sum using an ordered pair? Of course we can. Elements in $W \bigoplus W'$ can be written in the form $(w,w') \in W \times W'$, and the addition is defined componentwise. That is, $(w,w')+(w_1,w_1')=(w+w_1,w'+w_1')$ (which is in fact $(w+w')+(w_1+w_1')=(w+w_1)+(w'+w_1')$). It seems that we don’t go further than the direct product. However we need to consider the scalar product. For $\alpha \in \mathbb{F}$, we have $\alpha(w,w') = (\alpha{w},\alpha{w'})$, because $\alpha(w+w')=\alpha{w}+\alpha{w'}$. We call this the **inner** direct sum because $W$ and $W'$ are *inside* $V$. One may ask, since $w+w'=w'+w$, why is the pair ordered? In $w+w'$ the first term is an element of $W$ and the second an element of $W'$, but in $w'+w$ this is not the case.

Outer direct sum is different. To define this one considers two *arbitrary* vector spaces $W$ and $V$ over $\mathbb{F}$. It is not guaranteed that $W$ and $V$ are both subspaces of a bigger vector space. For example it’s legit to take $W$ to be $\mathbb{R}$ over itself and $V$ to be all real functions. $W \bigoplus V$ is defined to be the set of all ordered pairs $(w,v)$ with $w \in W$ and $v \in V$. The addition is defined componentwise, and scalar multiplication is defined to be $\alpha(w,v)=(\alpha{w},\alpha{v})$. One may also write $w+v$ if context is clear.

When the number of vector spaces is finite, we don’t distinguish between direct product and direct sum. When the index is infinite, for example when we consider $\prod_{i=1}^{\infty}X_i$ and $\bigoplus_{i=1}^{\infty}X_i$, things are different. To be precise, in the language of category theory, direct product is the *product*, and direct sum is the *coproduct*.

We are not touching the definition yet; first of all let’s imagine what we would want from a multiplication. Let $W$ and $V$ be two vector spaces over $\mathbb{F}$ and let us write $\cdot$ for the multiplication for the time being. The distributive law should hold, that is, we have $w \cdot v + w' \cdot v = (w+w') \cdot v$ and $w \cdot v + w \cdot v' = w \cdot (v+v')$. On the other hand, scalar multiplication should act on a single component, that is, $\alpha(w \cdot v)=(\alpha w) \cdot v = w \cdot (\alpha v)$.

It seems illegal to use $\cdot$ so let’s use ordered pairs. Under these laws, we have

$$(w,v)+(w',v)=(w+w',v),\quad (w,v)+(w,v')=(w,v+v'),\quad \alpha(w,v)=(\alpha w,v)=(w,\alpha v).$$

It makes sense to call it ‘bilinear’. Fixing one component, we have a linear transform. However, direct sum and direct product do not work here at all. If they did, we would have $(w,v)+(w',v)=(w+w',v+v)=(w+w',2v)$ instead of $(w+w',v)$. This gives rise to the tensor product: we need a legitimate multiplication of a vector by a vector.

We have got the spirit of the tensor product. A direct product is not OK. There has to be a bilinear operation no matter what. For two vector spaces $V$ and $W$, we write the tensor product as $V \bigotimes W$; for $v \in V$ and $w \in W$, we denote their tensor product by $v \otimes w$, which can be considered as the image or value of a bilinear function $\varphi(\cdot,\cdot):V \times W \to V \bigotimes W$. There are many bilinear maps with domain $V \times W$. We ask the tensor product to be the essential one.

The **tensor product** $V \bigotimes W$ of $V$ and $W$ is the vector space having the following properties.

There exists the canonical bilinear map $\varphi(\cdot,\cdot):V \times W \to V \bigotimes W$, and we write $\varphi(v,w) = v \otimes w \in V \bigotimes W$.

For any bilinear map $h(\cdot,\cdot):V \times W \to U$, there exists a unique linear map

$$\lambda:V \bigotimes W \to U$$

such that $\lambda(\varphi(v,w)) = h(v,w)$ for all $(v,w) \in V \times W$. This is called the **universal property** of $V \bigotimes W$.
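For coordinate spaces there is a concrete model to play with. The sketch below assumes the usual identification of $v \otimes w$, for $v \in \mathbb{F}^m$ and $w \in \mathbb{F}^n$, with the $m \times n$ matrix $(v_iw_j)$. It checks bilinearity and shows a bilinear form $h$ factoring through a linear map `lam`. The coefficient matrix `C` and all function names are hypothetical.

```python
def outer(v, w):
    """Model v (x) w as the matrix with entries v_i * w_j."""
    return [[vi * wj for wj in w] for vi in v]

def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def scale(a, X):
    return [[a * x for x in row] for row in X]

def scale_vec(a, v):
    return [a * x for x in v]

v, v2, w = [1, 2], [0, 3], [4, 5, 6]
v_sum = [a + b for a, b in zip(v, v2)]

# Additivity in the first slot: (v + v2) (x) w = v (x) w + v2 (x) w.
additive = outer(v_sum, w) == add(outer(v, w), outer(v2, w))
# A scalar can be moved between the slots.
homogeneous = outer(scale_vec(3, v), w) == outer(v, scale_vec(3, w)) == scale(3, outer(v, w))

C = [[1, 0, 2], [0, 1, 1]]   # coefficients of a sample bilinear form h

def h(v, w):
    """Bilinear form h(v, w) = v^T C w."""
    return sum(v[i] * C[i][j] * w[j] for i in range(2) for j in range(3))

def lam(X):
    """Linear map on matrices with lam(v (x) w) = h(v, w)."""
    return sum(C[i][j] * X[i][j] for i in range(2) for j in range(3))

factors = lam(outer(v, w)) == h(v, w)
print(additive, homogeneous, factors)
```

Note that `lam` is linear in the matrix entries, while `h` is only bilinear in the pair $(v,w)$; this is the point of the universal property.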

It can be easily verified that, if $V$ and $W$ have two tensor products, then they are isomorphic (hint: use the universal property). Since all tensor products of $V$ and $W$ are isomorphic, we only need to pick the obvious one (as long as it exists). But we don’t have too much space for it. For further study I recommend the following documents:

- Definition and properties of tensor products. This one involves a considerable amount of explicit calculation and is of elementary approach.
- Tensor products and bases. This one proves the existence in an abstract way.
- Tensor Product as a Universal Object (Category Theory & Module Theory). One of my recent blog posts. The topics here are relatively advanced, and I don’t think it’s a good idea to use the language of category theory at this early point.

Let $\mathbb{F}$ be any field (it can be replaced with a commutative ring if you want to), and $E,F$ be two modules over $\mathbb{F}$. We will have a glance at the definition of the dual space and, more importantly, we will see what a transpose is. In general we study the bilinear form

$$f:E \times F \to \mathbb{F}.$$

Sometimes for simplicity we also write $f(x,y)=\langle x,y \rangle$. The set of all bilinear forms of $E \times F$ into $\mathbb{F}$ will be denoted by $L^2(E,F;\mathbb{F})$ and you may have seen it earlier.

We define the **kernel** of $f$ on the left to be $F^\perp$ and on the right to be $E^\perp$. Recall that for $S \subset E$, $S^\perp$ consists of all $y$ such that $f(x,y)=0$ whenever $x \in S$; similarly, for $T \subset F$, $T^\perp$ consists of all $x$ such that $f(x,y)=0$ whenever $y \in T$. Respectively, we say $f$ is **non-degenerate** on the left/right if the kernel on the left/right is trivial.

One of the simplest examples is the case when $E=\mathbb{F}^m$ and $F=\mathbb{F}^n$. We take an $m \times n$ matrix $A$ over $\mathbb{F}$. Define $f(x,y) = x^T A y$. This is a classic bilinear form. Whether it is non-degenerate on the left or on the right depends on the linear independence of the row vectors and column vectors. $\def\opn{\operatorname}$

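The bilinear form $f$ gives rise to a homomorphism of $E$ to a ‘space of essential arrows’:

$$\varphi_f:E \to \operatorname{Hom}_\mathbb{F}(F,\mathbb{F})$$

given by

$$\varphi_f(x)(y)=f(x,y)=\langle x,y \rangle.$$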

$\opn{Hom}_\mathbb{F}(F,\mathbb{F})$ contains all linear maps of $F$ into $\mathbb{F}$. One can imagine $\opn{Hom}_\mathbb{F}(F,\mathbb{F})$ to be a set of ‘arrows’ from $F$ to $\mathbb{F}$.

Now let’s see what we can do in analysis and topology.

Let’s consider all complex polynomials of degree $\leq 5$. This is a complex vector space and is in fact isomorphic to $\mathbb{C}^6$ since we have a linear bijection mapping $a_0+a_1z+a_2z^2+a_3z^3+a_4z^4+a_5z^5$ to $(a_0,a_1,a_2,a_3,a_4,a_5)^T$. Therefore we can simply use matrices and vectors. We represent differentiation via matrices. This is straightforward work. We pick the natural basis $\{1,z,z^2,z^3,z^4,z^5\}$ to begin with and write the differentiation as $\mathscr{D}$. Since $\def\ms{\mathscr}$

$$\ms{D}(z^k)=kz^{k-1}\quad(k=0,1,\cdots,5),$$

we get a matrix corresponding to $\ms{D}$ by

$$D=\begin{pmatrix}0&1&0&0&0&0\\0&0&2&0&0&0\\0&0&0&3&0&0\\0&0&0&0&4&0\\0&0&0&0&0&5\\0&0&0&0&0&0\end{pmatrix}.$$

Next we try to obtain the Jordan normal form of $D$. Since the minimal polynomial of $D$ is merely $m(\lambda)=\lambda^6$, we cannot diagonalise it. After some computation we get

$$D=S\begin{bmatrix}0&1&0&0&0&0\\0&0&1&0&0&0\\0&0&0&1&0&0\\0&0&0&0&1&0\\0&0&0&0&0&1\\0&0&0&0&0&0\end{bmatrix}S^{-1},\quad S=\operatorname{diag}\left(1,1,\frac{1}{2},\frac{1}{6},\frac{1}{24},\frac{1}{120}\right),$$

where the matrix $J$ in the square bracket is our Jordan normal form. This makes sense since if we consider the basis $\{1,z,\frac{1}{2}z^2,\frac{1}{6}z^3,\frac{1}{24}z^4,\frac{1}{120}z^5\}$, we see under this basis,

$$\ms{D}(1)=0,\quad \ms{D}\left(\frac{z^k}{k!}\right)=\frac{z^{k-1}}{(k-1)!}\quad(k=1,2,\cdots,5),$$

which coincides with $J$.

We already know $\ms{D}^6=0$ but we can also get this by considering $D^6=SJ^6S^{-1}=0$ since $J^6=0$. Further, the form of $S$ should make you realise that we have a hidden $e$, that is

$$e^z=\sum_{n=0}^{\infty}\frac{z^n}{n!}=1+z+\frac{z^2}{2}+\frac{z^3}{6}+\frac{z^4}{24}+\frac{z^5}{120}+\cdots,$$

and the basis is in fact the first $6$ terms of the expansion of $\exp{z}$.
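The matrix identities above can be verified with exact rational arithmetic. This is a small sketch with $S=\operatorname{diag}(1/0!,\cdots,1/5!)$, the change of basis to $\{z^k/k!\}$; the variable names are my own.

```python
from fractions import Fraction
from math import factorial

N = 6  # polynomials of degree <= 5

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(N)) for j in range(N)] for i in range(N)]

# D in the basis {1, z, ..., z^5}: column j holds the coordinates of D(z^j) = j z^(j-1).
D = [[Fraction(j) if j == i + 1 else Fraction(0) for j in range(N)] for i in range(N)]
# J: a single nilpotent Jordan block with ones on the superdiagonal.
J = [[Fraction(1) if j == i + 1 else Fraction(0) for j in range(N)] for i in range(N)]
# S: column k is z^k / k! written in the monomial basis.
S = [[Fraction(1, factorial(i)) if i == j else Fraction(0) for j in range(N)] for i in range(N)]
S_inv = [[Fraction(factorial(i)) if i == j else Fraction(0) for j in range(N)] for i in range(N)]

def power(X, k):
    """X^k for k >= 1 by repeated multiplication."""
    result = X
    for _ in range(k - 1):
        result = matmul(result, X)
    return result

print(matmul(matmul(S, J), S_inv) == D)          # D = S J S^{-1}
print(all(v == 0 for row in power(D, 6) for v in row))   # D^6 = 0
print(power(D, 5)[0][5])                         # D^5 sends z^5 to 5! = 120
```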

If this cannot fascinate you I don’t know what can!

Next we consider an example on infinite dimensional vector spaces. Consider $E=C_c^\infty(\mathbb{R})$, the infinite dimensional vector space of $C^\infty$ functions on $\mathbb{R}$ with compact support, namely, for $f \in C_c^\infty(\mathbb{R})$, we have $f \in C^\infty$ and there exists some $0<K<\infty$ such that $f(x)=0$ outside $[-K,K]$. Next consider the bilinear form $E \times E \to \mathbb{R}$ defined by the following inner product:

$$\langle f,g \rangle=\int_{-\infty}^{\infty}f(x)g(x)dx.$$

Note the differential operator $\ms{D}:E \to E$ is a linear map of $E$ into $E$, so let’s find its transpose $\ms{D}^T$. That is, we need to find the unique linear map $\ms{D}^T:E \to E$ such that

$$\langle \ms{D}f,g \rangle=\langle f,\ms{D}^Tg \rangle\quad\text{for all } f,g \in E.$$

This is a simple application of integration by parts:

$$\langle \ms{D}f,g \rangle=\int_{-\infty}^{\infty}f'(x)g(x)dx=\Big[f(x)g(x)\Big]_{-\infty}^{\infty}-\int_{-\infty}^{\infty}f(x)g'(x)dx=\langle f,-\ms{D}g \rangle,$$

where the boundary term vanishes because $f$ and $g$ have compact support.

Hence the **transpose** of differentiation $\ms{D}$ is $-\ms{D}$. So we can say it’s skew-symmetric. But the matrix of $\ms{D}$ on the space of polynomials of degree $\leq n$ is not skew-symmetric.

(Perron’s theorem) Let $A$ be an $n \times n$ matrix having all components $a_{ij}>0$. Then it must have a positive eigenvalue $\lambda_0$, and a unique corresponding positive eigenvector, i.e., $x=(x_1,x_2,\cdots,x_n)^T$ such that $x_i>0$ for all $i = 1,2,\cdots,n$.

In fact, the positive eigenvalue is the spectral radius of $A$, which is often written as $\rho(A)$. I recommend reading the following documents:

- A short proof of Perron’s theorem. This mentioned more algebraic properties of $\rho(A)$.
- The Perron-Frobenius Theorem. This paper mentioned some real life application (modelling growth of a population) and has some exercises to work on.
- Proof of the Frobenius-Perron Theorem. This paper is more elementary-focused.

But here we are using Brouwer’s fixed point theorem (you may find an elementary proof on Project Euclid). In the following proof, we write $D_n$ to denote the $n$-disk and $\Delta^n$ to denote the $n$-simplex. That is,

$$D_n=\{x \in \mathbb{R}^n:\lVert x \rVert \leq 1\},\quad \Delta^n=\left\{x \in \mathbb{R}^{n+1}:x_i \geq 0,\ \sum_{i=1}^{n+1}x_i=1\right\}.$$

Note $D_n$ is homeomorphic to $\Delta^n$. Further we have a lemma:

(Lemma) If $f:X \to X$ is a continuous function and $X$ is homeomorphic to $D_n$, then $f$ has a fixed point as well.

**Proof of the lemma.** Let $\varphi$ be the homeomorphism from $X$ to $D_n$. Then $\varphi \circ f \circ \varphi^{-1}:D_n \to D_n$ has a fixed point according to Brouwer’s fixed point theorem. Suppose we have

$$\varphi(f(\varphi^{-1}(y)))=y.$$

Then

$$f(\varphi^{-1}(y))=\varphi^{-1}(y),$$

and hence $\varphi^{-1}(y) \in X$ is our fixed point. $\square$

Now we are ready to prove Perron’s theorem using Brouwer’s fixed point theorem.

**Proof of Perron’s theorem.** Define $\sigma(x)=\sum_{i=1}^{n}x_i$ where $x = (x_1,x_2,\cdots,x_n)^T$. Since $\sigma$ is linear, it is continuous (this is not generally true for infinite dimensional spaces, but it’s safe now, and you can see this question on Mathematics Stack Exchange for a proof). Similarly $A$ is continuous as well. Also, by definition, a vector $x$ with non-negative components lies in $\Delta^{n-1}$ if and only if $\sigma(x)=1$. Define a function $g:\Delta^{n-1} \to \Delta^{n-1}$ by

$$g(x)=\frac{Ax}{\sigma(Ax)}.$$

We will show that this function is well-defined. Since $x \in \Delta^{n-1}$, not all components of $x$ are equal to $0$, since otherwise we get $x_1+x_2+\cdots+x_n=0$, contradicting the assumption that $x \in \Delta^{n-1}$. Note we can write down $Ax$ explicitly (this is an elementary linear algebra thing):

$$Ax=\left(\sum_{j=1}^{n}a_{1j}x_j,\sum_{j=1}^{n}a_{2j}x_j,\cdots,\sum_{j=1}^{n}a_{nj}x_j\right)^T.$$

Since $A$ has all components greater than $0$ and the components of $x$ are non-negative and not all zero, we see all components of $Ax$ are greater than $0$ as well. Hence $\sigma(Ax)>0$. On the other hand, $g(x) \in \Delta^{n-1}$ since its components are positive and $\sigma(g(x))=\frac{\sigma(Ax)}{\sigma(Ax)}=1$. Since $A$, $\sigma$ and $t \mapsto \frac{1}{t}$ (for $t>0$) are continuous, $g$, being a composition of continuous functions, is continuous.

Since $\Delta^{n-1}$ is homeomorphic to $D_{n-1}$, $g$ has a fixed point according to the lemma. Hence there exists some $y \in \Delta^{n-1}$ such that

$$g(y)=\frac{Ay}{\sigma(Ay)}=y,\quad\text{that is,}\quad Ay=\sigma(Ay)y.$$

As we have already proved, $\lambda_0=\sigma(Ay)$ is positive. On the other hand, all components of $y=\frac{1}{\lambda_0}Ay$ are positive since all components of $Ay$ are positive. The proof is completed. $\square$
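The proof via Brouwer’s theorem is not constructive, but the map $g(x)=\frac{Ax}{\sigma(Ax)}$ from the proof can still be explored numerically. The sketch below simply iterates $g$ starting from the barycentre of the simplex. That the iteration converges is a separate fact (this is the power method, which I am swapping in here) and is not part of the argument above. The function names are hypothetical.

```python
def sigma(x):
    return sum(x)

def apply(A, x):
    """Matrix-vector product A x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def g(A, x):
    """The map from the proof: g(x) = A x / sigma(A x), sending the simplex to itself."""
    Ax = apply(A, x)
    s = sigma(Ax)
    return [v / s for v in Ax]

def perron_by_iteration(A, steps=200):
    n = len(A)
    y = [1.0 / n] * n               # barycentre of the simplex
    for _ in range(steps):
        y = g(A, y)
    return sigma(apply(A, y)), y    # (lambda_0, approximate fixed point y)

A = [[1.0, 2.0], [3.0, 4.0]]
lam0, y = perron_by_iteration(A)
print(lam0, y)
print(apply(A, y), [lam0 * v for v in y])   # A y and lambda_0 y should agree
```

For this $A$ the characteristic polynomial is $\lambda^2-5\lambda-2$, so the positive eigenvalue is $\frac{5+\sqrt{33}}{2}$.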

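You are assumed to be familiar with multivariable calculus when reading this subsection since we are discussing it right now. But in general this section is much beyond elementary linear algebra. First of all we are presenting the *ultimate* abstract extension of the usual gradient, curl, and divergence. We simply consider $C^\infty$ functions $f:\mathbb{R}^3 \to \mathbb{R}$ and $(f_1,f_2,f_3):\mathbb{R}^3 \to \mathbb{R}^3$. When working on gradient, we consider something like $\def\pf[#1]{\frac{\partial f}{\partial #1}}$

$$\pf[x]dx+\pf[y]dy+\pf[z]dz.$$

When working on curl, we consider

$$\left(\frac{\partial f_3}{\partial y}-\frac{\partial f_2}{\partial z}\right)dydz-\left(\frac{\partial f_1}{\partial z}-\frac{\partial f_3}{\partial x}\right)dxdz+\left(\frac{\partial f_2}{\partial x}-\frac{\partial f_1}{\partial y}\right)dxdy.$$

Finally for divergence we consider

$$\left(\frac{\partial f_1}{\partial x}+\frac{\partial f_2}{\partial y}+\frac{\partial f_3}{\partial z}\right)dxdydz.$$

They are connected by Green’s theorem, Gauss’s theorem and Stokes’ theorem. But are they abruptly connected for no reason but numerical equality? Fortunately, no. Let’s see why.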

First of all for convenience we write $(x_1,x_2,x_3)$ instead of $(x,y,z)$. Define $dx_idx_j=-dx_jdx_i$ for all $i,j = 1,2,3$. Note this implies that $dx_idx_i=0$. For $d$ we have the definition as follows:

- If $f$ is a $C^\infty$ function, then $df = \sum_{i=1}^{3}\pf[x_i]dx_i$.
- If $\omega$ is of the *form* $\sum f_{i_1 \cdots i_q}dx_{i_1}\dots dx_{i_q}$, then $d\omega=\sum df_{i_1 \cdots i_q}dx_{i_1}\dots dx_{i_q}$.

Then gradient, curl and divergence follow naturally. You can verify that the second one is actually equal to $d(f_1dx+f_2dy+f_3dz)$ and the third one is equal to $d(f_1dydz-f_2dxdz+f_3dxdy)$. We call $d$ the exterior differentiation.

Linear algebra is not just for $\mathbb{R}^3$, and neither is exterior differentiation. Let $\Omega^\ast$ be the algebra over $\mathbb{R}$ (for algebra over a field, see this), generated by $dx_1,\dots,dx_n$ with an **anti-commutative** multiplication $dx_idx_j=-dx_jdx_i$ for all $i,j=1,2,\cdots,n$. As a vector space over $\mathbb{R}$, $\Omega^\ast$ is of dimension $2^n$ with a basis

$$1,\quad dx_i,\quad dx_idx_j,\quad dx_idx_jdx_k,\quad \cdots,\quad dx_1dx_2\cdots dx_n,$$

where $i<j<k$. Let $C^\infty$ itself be the vector space of $C^\infty$ functions on $\mathbb{R}^n$, and we define the $C^\infty$ differential *forms* on $\mathbb{R}^n$ by

$$\Omega^\ast(\mathbb{R}^n)=C^\infty \otimes_{\mathbb{R}} \Omega^\ast.$$

For simplicity we omit the tensor product symbol $\otimes$. As a result, for any $\omega \in \Omega^\ast(\mathbb{R}^n)$, we have $\omega$ to be a simple $C^\infty$ function (why don’t we call it a $0$-form?) or we have $\omega = \sum f_{i_1\cdots i_q}dx_{i_1}\dots dx_{i_q}$, and we call it a $q$-form since each term is a product of $q$ of the $dx_j$. Also we can define $\Omega^q(\mathbb{R}^n)$ to be the vector space of $q$-forms. Consider the differential operator $d$ defined by

- If $f$ is a $C^\infty$ function, then $df = \sum_{i=1}^{n}\pf[x_i]dx_i$.
- If $\omega$ is of the *form* $\sum f_{i_1 \cdots i_q}dx_{i_1}\dots dx_{i_q}$, then $d\omega=\sum df_{i_1 \cdots i_q}dx_{i_1}\dots dx_{i_q}$.

This is what we call the exterior differentiation. It’s the ultimate abstract extension of gradient, curl and divergence. Your calculus teacher may have warned you, that you cannot deal with $dx$ independently. So is it safe to work like this? Yes, there is nothing to worry about. We are doing abstraction algebraically.

Many more concepts can be understood in a linear algebra way. For example we also have

In fact Green’s theorem, Gauss’ theorem and Stokes’ theorem have an ultimate abstract extension as well, which is called the general Stokes’ theorem:

If $\omega$ is an $(n-1)$-form with compact support on an oriented manifold $M$ of dimension $n$ and if $\partial M$ is given the induced orientation, then

$$\int_M d\omega=\int_{\partial M}\omega.$$

We are not diving into this theorem, but we will conclude this subsection with a glimpse of integration. Recall that the Riemann integral of a differentiable function $f:\mathbb{R}^n \to \mathbb{R}$ can be written as

$$\int_{\mathbb{R}^n}f\,|dx_1\dots dx_n|=\lim_{\Delta x_i \to 0}\sum f\,\Delta x_1\cdots\Delta x_n.$$

Here we put absolute value signs around $dx_1 \dots dx_n$ to emphasise the distinction between the Riemann integral of a function and the integral of a differential form, since order only matters in the latter case. For the latter case, if $\pi$ is a permutation of $1,2,\cdots,n$, or we simply say $\pi \in S_n$, then

$$\int_{\mathbb{R}^n}f\,dx_{\pi(1)}\cdots dx_{\pi(n)}=(\operatorname{sgn}\pi)\int_{\mathbb{R}^n}f\,|dx_1\dots dx_n|.$$

This definition is natural and obvious. Since $\operatorname{sgn} \pi$ is equal to the determinant of the matrix representing $\pi$ (see here), it’s natural to consider the determinant. Consider the function

Then $J(\Pi)=\operatorname{sgn}\pi$. This is quite similar to what we expect from Jacobian determinant in general, which describes change-of-variable essentially. Let $x_1,x_2,\cdots,x_n$ be a basis of $\mathbb{R}^n$ and $T:\mathbb{R}^n \to \mathbb{R}^n$ be a diffeomorphism. We have a new basis $y_1,y_2,\cdots,y_n$ given by

where $\pi_i:(a_1,a_2,\cdots,a_n) \mapsto a_i$ is the $i$th projection. Namely

written in column vectors. We now show that

$$dy_1dy_2\cdots dy_n=J(T)dx_1dx_2\cdots dx_n.$$

First we recall that $J(T)$ is the determinant of $(\partial T_i / \partial x_j)$, and the determinant of a matrix $(a_{ij})$ is defined by

$$\det(a_{ij})=\sum_{\sigma}\epsilon(\sigma)a_{1\sigma(1)}a_{2\sigma(2)}\cdots a_{n\sigma(n)},$$

where $\epsilon(\sigma)$ is actually $\operatorname{sgn}\sigma$ and $\sigma$ ranges through all permutations of $1,2,\cdots,n$. We need something to coincide. First of all, we compute $dy_i$. Note

Hence

We get, as a result,

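After cancelling out so many zeros, we get $J(T)$. You don’t have to expand the identity. Pick a component $\frac{\partial T_1}{\partial x_{j_1}}dx_{j_1}$ from $dy_1$. Then when we pick another component from $dy_2$ to get it multiplied with the first one, say $\frac{\partial T_2}{\partial x_{j_2}}dx_{j_2}$, we must have $j_1 \neq j_2$ since if not, then $dx_{j_1}dx_{j_2}=0$, and we cancel that. The rule remains the same (but even stricter) when we pick components from $dy_3$, $dy_4$, and so on until $dy_n$. In the end, $j_1,j_2,\cdots,j_n$ are pairwise unequal. This corresponds exactly to a permutation of $1,2,\cdots,n$. Hence we get

$$dy_1dy_2\cdots dy_n=\sum_{\sigma}\frac{\partial T_1}{\partial x_{\sigma(1)}}\frac{\partial T_2}{\partial x_{\sigma(2)}}\cdots\frac{\partial T_n}{\partial x_{\sigma(n)}}dx_{\sigma(1)}dx_{\sigma(2)}\cdots dx_{\sigma(n)}.$$

On the other hand, $dx_{\sigma(1)}dx_{\sigma(2)}\cdots dx_{\sigma(n)}=\epsilon(\sigma)dx_1dx_2\cdots dx_n$, and if we put this inside the expansion of $dy_1dy_2\cdots dy_n$, we get

$$dy_1dy_2\cdots dy_n=\sum_{\sigma}\epsilon(\sigma)\frac{\partial T_1}{\partial x_{\sigma(1)}}\frac{\partial T_2}{\partial x_{\sigma(2)}}\cdots\frac{\partial T_n}{\partial x_{\sigma(n)}}dx_1dx_2\cdots dx_n=J(T)dx_1dx_2\cdots dx_n.$$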

We answered a calculus question in an algebraic way (and more than that if you review more related concepts in calculus).
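The bookkeeping in this argument can be sketched in a few lines: choose one term $\frac{\partial T_i}{\partial x_{\sigma(i)}}dx_{\sigma(i)}$ from each $dy_i$, keep only the choices where $\sigma$ is a permutation, and weight each by $\epsilon(\sigma)$. The function names are illustrative, and the Jacobian matrix is assumed to be given as a square list of numbers evaluated at a single point.

```python
from itertools import permutations


def sign(perm):
    """Sign of a permutation given as a tuple of 0-based indices."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


def wedge_coefficient(jacobian):
    """Coefficient of dx_1 ... dx_n in dy_1 ... dy_n.

    Only index choices that form a permutation survive, since any
    repeated dx_j makes the product vanish.
    """
    n = len(jacobian)
    total = 0
    for sigma in permutations(range(n)):
        term = sign(sigma)
        for i in range(n):
            term *= jacobian[i][sigma[i]]
        total += term
    return total


print(wedge_coefficient([[1, 2], [3, 4]]))
print(wedge_coefficient([[0, 1], [1, 0]]))  # a transposition
```

The sum runs over all $n!$ permutations, so this is only meant to mirror the derivation, not to compute determinants efficiently.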

There are several ways to define a Dedekind domain since there are several equivalent characterisations of it. We will start from the one based on the ring of fractions. As a friendly reminder, $\mb{Z}$ or any principal ideal domain is already a Dedekind domain. In fact a Dedekind domain may be viewed as a generalization of a principal ideal domain.

Let $\mfk{o}$ be an integral domain (a.k.a. entire ring), and $K$ be its quotient field. A **Dedekind domain** is an integral domain $\mfk{o}$ such that the fractional ideals form a group under multiplication. Let’s have a breakdown. By a **fractional ideal** $\mfk{a}$ we mean a nontrivial additive subgroup of $K$ such that

- $\mfk{o}\mfk{a}=\mfk{a}$,
- there exists some nonzero element $c \in \mfk{o}$ such that $c\mfk{a} \subset \mfk{o}$.

What does the group look like? As you may guess, the unit element is $\mfk{o}$. For a fractional ideal $\mfk{a}$, we have the inverse to be another fractional ideal $\mfk{b}$ such that $\mfk{ab}=\mfk{ba}=\mfk{o}$. Note we regard $\mfk{o}$ as a subring of $K$. For $a \in \mfk{o}$, we treat it as $a/1 \in K$. This makes sense because the map $i:a \mapsto a/1$ is injective. For the existence of $c$, you may consider it as a restriction that the ‘denominator’ is *bounded*. Alternatively, we say that fractional ideal of $K$ is a finitely generated $\mfk{o}$-submodule of $K$. But in this post it is not assumed that you have learned module theory.

Let’s take $\mb{Z}$ as an example. The quotient field of $\mb{Z}$ is $\mb{Q}$. Fix a prime $p$. We have a fractional ideal $P$ whose elements are of the type $\frac{np}{2}$ with $n \in \mb{Z}$. Then indeed we have $\mb{Z}P=P$. On the other hand, taking $2 \in \mb{Z}$, we have $2P \subset \mb{Z}$. For its inverse we can take the fractional ideal $Q$ whose elements are of the type $\frac{2n}{p}$. As proved in algebraic number theory, the ring of algebraic integers in a number field is a Dedekind domain.
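For $\mb{Z}$ this can be checked by hand or with a short sketch. It relies on a fact not spelled out above: every fractional ideal of $\mb{Z}$ has the form $r\mb{Z}$ for a positive rational $r$, so an ideal can be stored as its generator and ideal multiplication becomes multiplication of generators. The function names are illustrative.

```python
from fractions import Fraction


def product(r, s):
    """Generator of (r Z)(s Z), which is (r s) Z."""
    return abs(r * s)


def inverse(r):
    """Generator of the inverse fractional ideal of r Z."""
    return 1 / abs(r)


def denominator_bound(r):
    """A nonzero integer c with c * (r Z) contained in Z."""
    return r.denominator


p = 7                   # any prime works here
P = Fraction(p, 2)      # P = (p/2) Z
Q = inverse(P)          # Q = (2/p) Z

print(Q)
print(product(P, Q))             # generator of PQ
print(denominator_bound(P) * P)  # an integer, so 2P lies inside Z
```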

Before we go on we need to clarify the definition of ideal multiplication. Let $\mfk{a}$ and $\mfk{b}$ be two ideals, we define $\mfk{ab}$ to be the set of all sums

$$x_1y_1+x_2y_2+\cdots+x_ny_n$$

where $x_i \in \mfk{a}$ and $y_i \in \mfk{b}$. Here the number $n$ is finite but not fixed. Alternatively we can say $\mfk{ab}$ contains all finite sums of products of elements of $\mfk{a}$ and $\mfk{b}$.

(Proposition 1)A Dedekind domain $\mfk{o}$ is Noetherian.

By Noetherian ring we mean that every ideal of the ring is finitely generated. Precisely, we will prove that for every ideal $\mfk{a} \subset \mfk{o}$ there are $a_1,a_2,\cdots,a_n \in \mfk{a}$ such that, for every $r \in \mfk{a}$, we have an expression

$$r=c_1a_1+c_2a_2+\cdots+c_na_n$$

with $c_1,c_2,\cdots,c_n \in \mfk{o}$.

Also note that any ideal $\mfk{a} \subset \mfk{o}$ can be viewed as a fractional ideal.

**Proof.** Let $\mfk{a}$ be an ideal of $\mfk{o}$ and let $K$ be the quotient field of $\mfk{o}$. Since $\mfk{oa}=\mfk{a}$, we may also view $\mfk{a}$ as a fractional ideal. Since $\mfk{o}$ is a Dedekind domain, the fractional ideals of $\mfk{o}$ form a group, so there is a fractional ideal $\mfk{b}$ such that $\mfk{ab}=\mfk{ba}=\mfk{o}$. Since $1 \in \mfk{o}$, there exist some $a_1,a_2,\cdots, a_n \in \mfk{a}$ and $b_1,b_2,\cdots,b_n \in \mfk{b}$ such that $\sum_{i = 1 }^{n}a_ib_i=1$. For any $r \in \mfk{a}$, we have an expression

$$r=(rb_1)a_1+(rb_2)a_2+\cdots+(rb_n)a_n,$$

where each $rb_i \in \mfk{ab}=\mfk{o}$.

On the other hand, any element of the form $c_1a_1+c_2a_2+\cdots+c_na_n$, by definition, is an element of $\mfk{a}$. $\blacksquare$

From now on, the inverse of a fractional ideal $\mfk{a}$ will be written as $\mfk{a}^{-1}$.

(Proposition 2)For ideals $\mfk{a},\mfk{b} \subset \mfk{o}$, $\mfk{b}\subset\mfk{a}$ if and only if there exists some $\mfk{c}$ such that $\mfk{ac}=\mfk{b}$ (or we simply say $\mfk{a}|\mfk{b}$)

**Proof.** If $\mfk{b}=\mfk{ac}$, simply note that $\mfk{ac} \subset \mfk{a} \cap \mfk{c} \subset \mfk{a}$. For the converse, suppose that $\mfk{a} \supset \mfk{b}$, then $\mfk{c}=\mfk{a}^{-1}\mfk{b}$ is an ideal of $\mfk{o}$ since $\mfk{c}=\mfk{a}^{-1}\mfk{b} \subset \mfk{a}^{-1}\mfk{a}=\mfk{o}$, hence we may write $\mfk{b}=\mfk{a}\mfk{c}$. $\blacksquare$

(Proposition 3)If $\mfk{a}$ is an ideal of $\mfk{o}$, then there are prime ideals $\mfk{p}_1,\mfk{p}_2,\cdots,\mfk{p}_n$ such that

$$\mfk{a}=\mfk{p}_1\mfk{p}_2\cdots\mfk{p}_n.$$

**Proof.** For this problem we use a classical technique: contradiction on maximality. Suppose this is not true, and let $\mfk{A}$ be the set of ideals of $\mfk{o}$ that cannot be written as a product of prime ideals. By assumption $\mfk{A}$ is nonempty. Since, as we have proved, $\mfk{o}$ is Noetherian, we can pick a maximal element $\mfk{a}$ of $\mfk{A}$ with respect to inclusion. If $\mfk{a}$ is a maximal ideal, then since all maximal ideals are prime, $\mfk{a}$ itself is prime as well, which is impossible for an element of $\mfk{A}$. Otherwise $\mfk{a}$ is properly contained in a maximal ideal $\mfk{m}$, and we write $\mfk{a}=\mfk{m}\mfk{m}^{-1}\mfk{a}$. We have $\mfk{m}^{-1}\mfk{a} \supsetneq \mfk{a}$ since if not, we have $\mfk{a}=\mfk{ma}$, which implies $\mfk{m}=\mfk{o}$. But by the maximality of $\mfk{a}$ in $\mfk{A}$, $\mfk{m}^{-1}\mfk{a}\not\in\mfk{A}$, hence it can be written as a product of prime ideals. Since $\mfk{m}$ is prime as well, we get a prime factorization for $\mfk{a}$, contradicting the definition of $\mfk{A}$.

Next we show uniqueness up to permutation. If

$$\mfk{p}_1\mfk{p}_2\cdots\mfk{p}_k=\mfk{q}_1\mfk{q}_2\cdots\mfk{q}_j,$$

since $\mfk{p}_1\mfk{p}_2\cdots\mfk{p}_k\subset\mfk{p}_1$ and $\mfk{p}_1$ is prime, we may assume that $\mfk{q}_1 \subset \mfk{p}_1$. By the property of fractional ideal we have $\mfk{q}_1=\mfk{p}_1\mfk{r}_1$ for some fractional ideal $\mfk{r}_1$. However we also have $\mfk{q}_1 \subset \mfk{r}_1$. Since $\mfk{q}_1$ is prime, we either have $\mfk{q}_1 \supset \mfk{p}_1$ or $\mfk{q}_1 \supset \mfk{r}_1$. In the former case we get $\mfk{p}_1=\mfk{q}_1$, and we finish the proof by continuing inductively. In the latter case we have $\mfk{r}_1=\mfk{q}_1=\mfk{p}_1\mfk{q}_1$, which shows that $\mfk{p}_1=\mfk{o}$, which is impossible. $\blacksquare$
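In the special case $\mfk{o}=\mb{Z}$ every nonzero ideal is $n\mb{Z}$ and the nonzero prime ideals are $p\mb{Z}$ for primes $p$, so Proposition 3 becomes the fundamental theorem of arithmetic. A minimal sketch by trial division (the function name is illustrative):

```python
def prime_ideal_factors(n):
    """Primes p_1 <= ... <= p_k with nZ = (p_1 Z)...(p_k Z), for n >= 2."""
    if n < 2:
        raise ValueError("expected n >= 2, i.e. a nonzero proper ideal nZ")
    factors = []
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)  # whatever is left is itself prime
    return factors


print(prime_ideal_factors(60))
print(prime_ideal_factors(97))
```

Returning the factors in increasing order is one way of fixing a representative of the factorisation, which is otherwise only unique up to permutation.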

(Proposition 4)Every nontrivial prime ideal $\mfk{p}$ is maximal.

**Proof.** Let $\mfk{m}$ be a maximal ideal containing $\mfk{p}$. By proposition 2 we have some $\mfk{c}$ such that $\mfk{p}=\mfk{mc}$. If $\mfk{m} \neq \mfk{p}$, then $\mfk{c} \neq \mfk{o}$, and we may write $\mfk{c}=\mfk{p}_1\cdots\mfk{p}_n$, hence $\mfk{p}=\mfk{m}\mfk{p}_1\cdots\mfk{p}_n$, which is a prime factorisation, contradicting the fact that $\mfk{p}$ has a unique prime factorisation, which is $\mfk{p}$ itself. Hence any maximal ideal containing $\mfk{p}$ is $\mfk{p}$ itself. $\blacksquare$

(Proposition 5)Suppose the Dedekind domain $\mfk{o}$ only contains one prime (and maximal) ideal $\mfk{p}$, let $t \in \mfk{p}$ and $t \not\in \mfk{p}^2$, then $\mfk{p}$ is generated by $t$.

**Proof.** Let $\mfk{t}$ be the ideal generated by $t$. By proposition 3 we have a factorisation

$$\mfk{t}=\mfk{p}^n$$

for some $n$ since $\mfk{o}$ contains only one prime ideal. According to proposition 2, if $n \geq 3$, we write $\mfk{p}^n=\mfk{p}^2\mfk{p}^{n-2}$, and we see $\mfk{p}^2 \supset \mfk{p}^n$. But this is impossible since if so we have $t \in \mfk{p}^n \subset \mfk{p}^2$, contradicting our assumption. Hence $0<n<3$. But if $n=2$ we have $t \in \mfk{p}^2$, which is also not possible. So $\mfk{t}=\mfk{p}$ provided that such $t$ exists.

For the existence of $t$, note if not, then for all $t \in \mfk{p}$ we have $t \in \mfk{p}^2$, hence $\mfk{p} \subset \mfk{p}^2$. On the other hand we already have $\mfk{p}^2 = \mfk{p}\mfk{p}$, which implies that $\mfk{p}^2 \subset \mfk{p}$ (proposition 2), hence $\mfk{p}^2=\mfk{p}$, contradicting proposition 3. Hence such $t$ exists and our proof is finished. $\blacksquare$

In fact there is another equivalent definition of Dedekind domain:

A domain $\mfk{o}$ is Dedekind if and only if

- $\mfk{o}$ is Noetherian.
- $\mfk{o}$ is integrally closed.
- $\mfk{o}$ has Krull dimension $\leq 1$ (i.e. every non-zero prime ideal is maximal).

This is equivalent to saying that the fractional ideals form a group, and it is frequently used by mathematicians as well. But we need some more advanced techniques to establish the equivalence. Presumably there will be a post about this in the future.

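For $1<p<\infty$, $f \in L^p((0,\infty))$ and $F(x)=\frac{1}{x}\int_0^x f(t)dt$, we have Hardy’s inequality $\def\lrVert[#1]{\lVert #1 \rVert}$

$$\lrVert[F]_p \leq q\lrVert[f]_p$$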

where $\frac{1}{p}+\frac{1}{q}=1$ of course.

There are several ways to prove it. I think there are several good reasons to write them down thoroughly since that may be why you find this page. Maybe you are burnt out since it’s *left as exercise*. You are assumed to have enough knowledge of Lebesgue measure and integration.

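Let $S_1,S_2 \subset \mathbb{R}$ be two measurable sets, and suppose $F:S_1 \times S_2 \to \mathbb{R}$ is measurable, then

$$\left[\int_{S_2}\left|\int_{S_1}F(x,y)dx\right|^pdy\right]^{\frac{1}{p}} \leq \int_{S_1}\left[\int_{S_2}|F(x,y)|^pdy\right]^{\frac{1}{p}}dx.$$

A proof can be found here (see Example A9). You may need to replace all measures with Lebesgue measure $m$.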

Now let’s get into it. For a measurable function in this place we should have $G(x,t)=\frac{f(t)}{x}$. If we put this function inside this inequality, we see

Note we have used change-of-variable twice and the inequality once.

I have no idea how people came up with this solution. Take $xF(x)=\int_0^x f(t)t^{u}t^{-u}dt$ where $0<u<1-\frac{1}{p}$. Hölder’s inequality gives us

Hence

Note we have used the fact that $\frac{1}{p}+\frac{1}{q}=1 \implies p+q=pq$ and $\frac{p}{q}=p-1$. Fubini’s theorem gives us the final answer:

It remains to find the minimum of $\varphi(u) = \left(\frac{1}{1-uq}\right)^{p-1}\frac{1}{up}$. This is an elementary calculus problem. By taking its derivative, we see that it attains its minimum $\left(\frac{p}{p-1}\right)^p=q^p$ at $u=\frac{1}{pq}<1-\frac{1}{p}$. Hence we get

$$\lrVert[F]_p \leq q\lrVert[f]_p,$$

which is exactly what we want. Note the constant $q$ cannot be replaced with a smaller one. So far we have only proved the case $f \geq 0$. For the general case, one simply needs to take the absolute value.
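The calculus step can be sanity-checked numerically. The sketch below evaluates $\varphi$ on a grid inside $(0,1-\frac{1}{p})$ and compares the smallest value with $q^p$ and its location with $\frac{1}{pq}$. The grid search, the function names and the sample values of $p$ are choices made for this illustration only.

```python
def phi(u, p):
    """phi(u) = (1 / (1 - u q))^(p - 1) / (u p), where 1/p + 1/q = 1."""
    q = p / (p - 1)
    return (1.0 / (1.0 - u * q)) ** (p - 1) / (u * p)


def grid_minimum(p, steps=100_000):
    """Approximate argmin and min of phi on (0, 1 - 1/p) by a grid search."""
    upper = 1.0 - 1.0 / p
    best_u, best_val = None, float("inf")
    for k in range(1, steps):
        u = upper * k / steps
        val = phi(u, p)
        if val < best_val:
            best_u, best_val = u, val
    return best_u, best_val


for p in (2.0, 3.0, 1.5):
    q = p / (p - 1)
    u_star, val = grid_minimum(p)
    # Columns: p, grid argmin, 1/(pq), grid minimum, q^p
    print(p, u_star, 1 / (p * q), val, q ** p)
```

A grid search says nothing about sharpness of the constant; it only confirms that the value $q^p$ is what the optimisation over $u$ produces.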

This approach makes use of properties of $L^p$ space. Still we assume that $f \geq 0$ but we also assume $f \in C_c((0,\infty))$, that is, $f$ is continuous and has compact support. Hence $F$ is differentiable in this situation. Integration by parts gives

Note since $f$ has compact support, there is some $[a,b]$ such that $f >0$ only if $0 < a \leq x \leq b < \infty$ and hence $xF(x)^p\vert_0^\infty=0$. Next it is natural to take a look at $F'(x)$. Note we have

$$F'(x)=-\frac{1}{x^2}\int_0^x f(t)dt+\frac{f(x)}{x}=\frac{f(x)-F(x)}{x},$$

hence $xF'(x)=f(x)-F(x)$. A substitution gives us

which is equivalent to say

Hölder’s inequality gives us

Together with the identity above we get

which is exactly what we want since $1-\frac{1}{q}=\frac{1}{p}$ and all we need to do is divide both sides by $\left[\int_0^\infty F^pdx\right]^{1/q}$. So what’s next? Note $C_c((0,\infty))$ is dense in $L^p((0,\infty))$. For any $f \in L^p((0,\infty))$, we can take a sequence of functions $f_n \in C_c((0,\infty))$ such that $f_n \to f$ with respect to $L^p$-norm. Taking $F=\frac{1}{x}\int_0^x f(t)dt$ and $F_n = \frac{1}{x}\int_0^x f_n(t)dt$, we need to show that $F_n \to F$ pointwise, so that we can use Fatou’s lemma. For $\varepsilon>0$, there exists some $m$ such that $\lrVert[f_n-f]_p < \varepsilon$ for all $n>m$. Thus

Hence $F_n \to F$ pointwise, which also implies that $|F_n|^p \to |F|^p$ pointwise. For $|F_n|$ we have

note the third inequality follows since we have already proved it for $f \geq 0$. By Fatou’s lemma, we have

It is quite common to see direct sums or direct products of groups, modules, vector spaces. Indeed, for modules over a ring $R$, direct products are also **products** in the category of $R$-modules. On the other hand, the direct sum is a **coproduct** in the category of $R$-modules.

But what about tensor products? It is some different kind of *product* but how? Is it related to direct product? How do we write a tensor product down? We need to solve this question but it is not a good idea to dig into numeric works.

From now on, let $R$ be a commutative ring, and let $M_1,\cdots,M_n$ be $R$-modules. Mainly we work on $M_1$ and $M_2$, i.e. $M_1 \times M_2$ and $M_1 \otimes M_2$. For the $n$-multilinear case, simply replace $M_1\times M_2$ with $M_1 \times M_2 \times \cdots \times M_n$ and $M_1 \otimes M_2$ with $M_1 \otimes \cdots \otimes M_n$. The only difference is the change of symbols.

The bilinear maps on $M_1 \times M_2$ determine a category, say $BL(M_1 \times M_2)$ or we simply write $BL$. For an object $(f,E)$ in this category we have $f: M_1 \times M_2 \to E$ as a bilinear map and $E$ as an $R$-module of course. For two objects $(f,E)$ and $(g,F)$, we define a morphism between them to be a linear map $h:E \to F$ making the following diagram commutative, that is, $g = h \circ f$: $\def\mor{\operatorname{Mor}}$

This indeed makes $BL$ a category. If we denote the morphisms from $(f,E)$ to $(g,F)$ by $\mor(f,g)$ (for simplicity we omit $E$ and $F$ since they are already determined by $f$ and $g$) we see the composition

$$\mor(f,g) \times \mor(g,k) \to \mor(f,k)$$

satisfies all axioms for a category:

**CAT 1** Two sets $\mor(f,g)$ and $\mor(f',g')$ are disjoint unless $f=f'$ and $g=g'$, in which case they are equal. If $g \neq g'$ but $f = f'$ for example, for any $h \in \mor(f,g)$, we have $g = h \circ f = h \circ f' \neq g'$, hence $h \notin \mor(f',g')$. Other cases can be verified in the same fashion.

**CAT 2** The existence of identity morphism. For any $(f,E) \in BL$, we simply take the identity map $i:E \to E$. For $h \in \mor(f,g)$, we see $g = h \circ f = h \circ i \circ f$. For $h' \in \mor(g,f)$, we see $f = h' \circ g = i \circ h' \circ g$.

**CAT 3** The law of composition is associative when defined.

There we have a category. But what about the tensor product? It is defined to be the *initial* (or *universally repelling*) object in this category. Let’s denote this object by $(\varphi,M_1 \otimes M_2)$.

For any $(f,E) \in BL$, we have a unique morphism (which is a module homomorphism as well) $h:(\varphi,M_1 \otimes M_2) \to (f,E)$. For $x \in M_1$ and $y \in M_2$, we write $\varphi(x,y)=x \otimes y$. We call the existence of $h$ the **universal property** of $(\varphi,M_1 \otimes M_2)$.

The tensor product is unique up to isomorphism. That is, if both $(f,E)$ and $(g,F)$ are tensor products, then $E \simeq F$ in the sense of module isomorphism. Indeed, let $h \in \mor(f,g)$ and $h' \in \mor(g,f)$ be the unique morphisms respectively, we see $g = h \circ f$, $f = h' \circ g$, and therefore

$$g = h \circ h' \circ g, \quad f = h' \circ h \circ f.$$

Hence, by uniqueness, $h \circ h'$ is the identity of $(g,F)$ and $h' \circ h$ is the identity of $(f,E)$. This gives $E \simeq F$.

What do we get so far? For any module that is connected to $M_1 \times M_2$ by a bilinear map, the tensor product $M_1 \otimes M_2$ of $M_1$ and $M_2$ can always be connected to that module by a unique module homomorphism. What if there is more than one tensor product? Never mind. All tensor products are isomorphic.

But wait, does this definition make sense? Does this product even exist? How can we study the tensor product of two modules if we cannot even write it down? So far we are only working on arrows, and we don’t know what is happening inside a module. It is not a good idea to waste our time on ‘nonsenses’. We can look into it in a natural way. Indeed, if we can find a module satisfying the property we want, then we are done, since this can represent the tensor product under any circumstances. Again, all tensor products of $M_1$ and $M_2$ are isomorphic.

Let $M$ be the free module generated by the set of all tuples $(x_1,x_2)$ where $x_1 \in M_1$ and $x_2 \in M_2$, and $N$ be the submodule generated by tuples of the following types:

$$\begin{aligned}
&(x_1+x_1',x_2)-(x_1,x_2)-(x_1',x_2), \\
&(x_1,x_2+x_2')-(x_1,x_2)-(x_1,x_2'), \\
&(ax_1,x_2)-a(x_1,x_2), \\
&(x_1,ax_2)-a(x_1,x_2),
\end{aligned}$$

where $a \in R$. First we have an inclusion map $\alpha: M_1 \times M_2 \to M$ and the canonical map $\pi:M \to M/N$. We claim that $(\pi \circ \alpha, M/N)$ is exactly what we want. But before that, we need to explain why we define such an $N$.

The reason is quite simple: We want to make sure that $\varphi=\pi \circ \alpha$ is bilinear. For example, we have $\varphi(x_1+x_1',x_2)=\varphi(x_1,x_2)+\varphi(x_1',x_2)$ due to our construction of $N$ (other relations follow in the same manner). This can be verified group-theoretically. Note

but

Hence we get the identity we want. For this reason we can write

Sometimes to avoid confusion people may also write $x_1 \otimes_R x_2$ if both $M_1$ and $M_2$ are $R$-modules. But before that we have to verify that this is indeed the tensor product. To verify this, all we need is the universal property of free modules.

By the universal property of $M$, for any $(f,E) \in BL$, we have an induced map $f_\ast$ making the diagram inside commutative. However, for elements in $N$, we see $f_\ast$ takes value $0$, since $f$ is bilinear. We finish our work by taking $h[(x,y)+N] = f_\ast(x,y)$. This is the map induced by $f_\ast$, following the property of factor module.

For coprime integers $m,n>1$, we have $\def\mb{\mathbb}$

$$\mb{Z}/m\mb{Z} \otimes \mb{Z}/n\mb{Z}=O,$$

where $O$ means that the module only contains $0$ and $\mb{Z}/m\mb{Z}$ is considered as a module over $\mb{Z}$ for $m>1$. This suggests that, the tensor product of two modules is not necessarily ‘bigger’ than its components. Let’s see why this is trivial.

Note that for $x \in \mb{Z}/m\mb{Z}$ and $y \in \mb{Z}/n\mb{Z}$, we have

$$m(x \otimes y)=(mx) \otimes y=0,\quad n(x \otimes y)=x \otimes (ny)=0,$$

since, for example, $mx = 0$ for $x \in \mb{Z}/m\mb{Z}$ and $\varphi(0,y)=0$. If you have trouble understanding why $\varphi(0,y)=0$, just note that the submodule $N$ in our construction contains elements generated by $(0x,y)-0(x,y)$ already.

By Bézout’s identity, there are $a$ and $b$ such that $am+bn=1$, and therefore for any $x \otimes y$ we see

$$x \otimes y=(am+bn)(x \otimes y)=am(x \otimes y)+bn(x \otimes y)=0.$$

Hence the tensor product is trivial. This example gives us a lot of inspiration. For example, what if $m$ and $n$ are not necessarily coprime, say $\gcd(m,n)=d$? By Bézout’s identity, with $am+bn=d$, we still have

$$d(x \otimes y)=(am+bn)(x \otimes y)=am(x \otimes y)+bn(x \otimes y)=0.$$

This inspires us to study the connection between $\mb{Z}/m\mb{Z} \otimes \mb{Z}/n\mb{Z}$ and $\mb{Z}/d\mb{Z}$. By the **universal property**, for the bilinear map $f:\mb{Z}/m\mb{Z} \times \mb{Z}/n\mb{Z} \to \mb{Z}/d\mb{Z}$ defined by

$$f(a+m\mb{Z},b+n\mb{Z})=ab+d\mb{Z}$$

(there should be no difficulty to verify that $f$ is well-defined), there exists a unique morphism $h:\mb{Z}/m\mb{Z} \otimes \mb{Z}/n\mb{Z} \to \mb{Z}/d\mb{Z}$ such that

$$h((a+m\mb{Z}) \otimes (b+n\mb{Z}))=ab+d\mb{Z}.$$

Next we show that it has a natural inverse defined by

$$g(a+d\mb{Z})=(a+m\mb{Z}) \otimes (1+n\mb{Z}).$$

Taking $a' = a+kd$, we show that $g(a+d\mb{Z})=g(a'+d\mb{Z})$, that is, we need to show that

$$(a+m\mb{Z}) \otimes (1+n\mb{Z})=(a'+m\mb{Z}) \otimes (1+n\mb{Z}).$$

By Bézout’s identity, there exists some $r,s$ such that $rm+sn=d$. Hence $a’ = a + ksn+krm$, which gives

since

So $g$ is well-defined. Next we show that this is the inverse. Firstly

Secondly,

Hence $g = h^{-1}$ and we can say

If $m,n$ are coprime, then $\gcd(m,n)=1$, hence $\mb{Z}/m\mb{Z} \otimes \mb{Z}/n\mb{Z} \simeq \mb{Z}/\mb{Z}$ is trivial. More interestingly, $\mb{Z}/m\mb{Z}\otimes \mb{Z}/m\mb{Z}=\mb{Z}/m\mb{Z}$. But this elegant identity raised other questions. First of all, $\gcd(m,n)=\gcd(n,m)$, which implies

Further, for $m,n,r >1$, we have $\gcd(\gcd(m,n),r)=\gcd(m,\gcd(n,r))=\gcd(m,n,r)$, which gives

hence

Hence for modules of the form $\mb{Z}/m\mb{Z}$, we see the tensor product operation is associative and commutative up to isomorphism. Does this hold for all modules? The universal property answers this question affirmatively. From now on we will keep using the universal property. Make sure that you have got the point already.
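Two ingredients of the computation for $\mb{Z}/m\mb{Z} \otimes \mb{Z}/n\mb{Z}$ are easy to test by machine: Bézout’s identity $rm+sn=d$, and the fact that $(a+m\mb{Z},b+n\mb{Z}) \mapsto ab+d\mb{Z}$ does not depend on the chosen representatives. The following is a sketch with illustrative function names; it checks these two facts only and does not construct the tensor product itself.

```python
from math import gcd


def bezout(m, n):
    """Extended Euclid: return (d, r, s) with r*m + s*n == d == gcd(m, n)."""
    old_r, r = m, n
    old_a, a = 1, 0
    old_b, b = 0, 1
    while r != 0:
        quotient = old_r // r
        old_r, r = r, old_r - quotient * r
        old_a, a = a, old_a - quotient * a
        old_b, b = b, old_b - quotient * b
    return old_r, old_a, old_b


def f(a, b, d):
    """The candidate bilinear map (a + mZ, b + nZ) -> ab + dZ."""
    return (a * b) % d


def f_is_well_defined(m, n):
    """Changing representatives by multiples of m or n must not change f."""
    d = gcd(m, n)
    return all(
        f(a, b, d) == f(a + m, b, d) == f(a, b + n, d)
        for a in range(m)
        for b in range(n)
    )


m, n = 12, 18
d, r, s = bezout(m, n)
print(d, r, s, r * m + s * n)
print(f_is_well_defined(m, n))
```

Well-definedness holds because $d$ divides both $m$ and $n$, which is the reason $\mb{Z}/d\mb{Z}$ is the natural target.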

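Let $M_1,M_2,M_3$ be $R$-modules, then there exists a unique isomorphism

$$(M_1 \otimes M_2) \otimes M_3 \xrightarrow{\simeq} M_1 \otimes (M_2 \otimes M_3),\quad (x \otimes y) \otimes z \mapsto x \otimes (y \otimes z)$$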

for $x \in M_1$, $y \in M_2$, $z \in M_3$.

*Proof.* Consider the map

where $x \in M_1$. Since $(\cdot\otimes\cdot)$ is bilinear, we see $\lambda_x$ is bilinear for all $x \in M_1$. Hence by the universal property there exists a unique map of the tensor product:

Next we have the map

which is bilinear as well. Again by the universal property we have a unique map

This is indeed the isomorphism we want. The reverse is obtained by reversing the process. For the bilinear map

we get a unique map

Then from the bilinear map

we get the unique map, which is actually the reverse of $\overline{\mu}_x$:

Hence the two tensor products are isomorphic. $\square$

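Let $M_1$ and $M_2$ be $R$-modules, then there exists a unique isomorphism

$$M_1 \otimes M_2 \xrightarrow{\simeq} M_2 \otimes M_1,\quad x_1 \otimes x_2 \mapsto x_2 \otimes x_1$$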

where $x_1 \in M_1$ and $x_2 \in M_2$.

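*Proof.* The map

$$\lambda:M_1 \times M_2 \to M_2 \otimes M_1,\quad (x,y) \mapsto y \otimes x$$

is bilinear and gives us a unique map

$$\overline{\lambda}:M_1 \otimes M_2 \to M_2 \otimes M_1$$

given by $x \otimes y \mapsto y \otimes x$. Symmetrically, the map $\lambda':M_2 \times M_1 \to M_1 \otimes M_2$ gives us a unique map

$$\overline{\lambda'}:M_2 \otimes M_1 \to M_1 \otimes M_2$$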

which is the inverse of $\overline{\lambda}$. $\square$

Therefore, we may view the collection of all $R$-modules, considered up to isomorphism, as a commutative semigroup with the binary operation $\otimes$.

Consider the commutative diagram:

where $f_i:M_i \to M_i'$ are module homomorphisms. What do we want here? On the left hand, we see $f_1 \times f_2$ sends $(x_1,x_2)$ to $(f_1(x_1),f_2(x_2))$, which is quite natural. The question is, is there a natural map sending $x_1 \otimes x_2$ to $f_1(x_1) \otimes f_2(x_2)$? This is what we want from the right hand. We know $T(f_1 \times f_2)$ exists, since we have a bilinear map $\mu = \varphi' \circ (f_1\times f_2)$. So for $(x_1,x_2) \in M_1 \times M_2$, we have $T(f_1 \times f_2)(x_1 \otimes x_2) = \varphi' \circ (f_1 \times f_2)(x_1,x_2) = f_1(x_1) \otimes f_2(x_2)$, which is what we want.

But $T$ in this diagram has more interesting properties. First of all, if $M_1 = M_1'$ and $M_2 = M_2'$, and both $f_1$ and $f_2$ are identity maps, then we see $T(f_1 \times f_2)$ is the identity as well. Next, consider the following chain

We can make it a double chain:

It is obvious that $(g_1 \circ f_1 \times g_2 \circ f_2)=(g_1 \times g_2) \circ (f_1 \times f_2)$, which also gives

$$T(g_1 \circ f_1 \times g_2 \circ f_2)=T(g_1 \times g_2) \circ T(f_1 \times f_2).$$

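Hence we can say $T$ is functorial. Sometimes for simplicity we also write $T(f_1,f_2)$ or simply $f_1 \otimes f_2$, as it sends $x_1 \otimes x_2$ to $f_1(x_1) \otimes f_2(x_2)$. Indeed it can be viewed as a map

$$\operatorname{Hom}_R(M_1,M_1') \times \operatorname{Hom}_R(M_2,M_2') \to \operatorname{Hom}_R(M_1 \otimes M_2,M_1' \otimes M_2'),\quad (f_1,f_2) \mapsto f_1 \otimes f_2.$$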

First we recall some background. Suppose $A$ is a ring with multiplicative identity $1_A$. A **left module** of $A$ is an additive abelian group $(M,+)$, together with an operation $A \times M \to M$ such that

$$(a+b)x=ax+bx,\quad a(x+y)=ax+ay,\quad a(bx)=(ab)x,\quad 1_Ax=x$$

for $x,y \in M$ and $a,b \in A$. As a corollary, we see $(0_A+0_A)x=0_Ax=0_Ax+0_Ax$, which shows $0_Ax=0_M$ for all $x \in M$. On the other hand, $a(x-x)=0_M$ which implies $a(-x)=-(ax)$. We can also define right $A$-modules but we are not discussing them here.

Let $S$ be a subset of $M$. We say $S$ is a **basis** of $M$ if $S$ generates $M$ and $S$ is linearly independent. That is, for all $m \in M$, we can pick $s_1,\cdots,s_n \in S$ and $a_1,\cdots,a_n \in A$ such that

$$m=a_1s_1+a_2s_2+\cdots+a_ns_n,$$

and, for any distinct $s_1,\cdots,s_n \in S$, we have

$$a_1s_1+a_2s_2+\cdots+a_ns_n=0_M \implies a_1=a_2=\cdots=a_n=0_A.$$

Note this also shows that $0_M\notin S$ (what happens if $0_M \in S$?). We say $M$ is **free** if it has a basis. The case when $M$ or $A$ is trivial is excluded.

If $A$ is a field, then $M$ is called a **vector space**, which is no different from the one we learn in linear algebra and functional analysis. Mathematicians in functional analysis may be interested in the cardinality of a basis of a vector space, for example, when a vector space is of finite dimension, or when the basis is countable. But the basis does not come from nowhere. In fact we can prove that vector spaces have bases, but modules are not so lucky. $\def\mb{\mathbb}$

First of all let’s consider the cyclic group $\mb{Z}/n\mb{Z}$ for $n \geq 2$. If we define

$$m \cdot (k+n\mb{Z})=mk+n\mb{Z},\quad m \in \mb{Z},$$

which is actually $m$ copies of an element, then we get a module, which will be denoted by $M$. For any $x=k+n\mb{Z} \in M$, we see $nx=nk+n\mb{Z}=0_M$. Therefore for **any** subset $S \subset M$, if $x_1,\cdots,x_k \in S$, we have

$$nx_1+nx_2+\cdots+nx_k=0_M,$$

which gives the fact that $M$ has no basis. In fact this can be generalized further. If $A$ is a ring but not a field, let $I$ be a nontrivial proper ideal, then $A/I$ is a module that has no basis.
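The obstruction in $\mb{Z}/n\mb{Z}$ is easy to see computationally: every element is killed by some nonzero integer, so not even a one-element subset is linearly independent. A small sketch (the helper names are made up for this illustration):

```python
from math import gcd


def smallest_annihilator(k, n):
    """Smallest positive integer a with a * (k + nZ) = 0 in Z/nZ."""
    return n // gcd(k, n)


def independent_elements(n):
    """Elements x of Z/nZ not killed by their candidate annihilator.

    The list is always empty, because (n / gcd(k, n)) * k is a multiple of n.
    """
    return [k for k in range(n) if (smallest_annihilator(k, n) * k) % n != 0]


for n in (2, 6, 12):
    print(n, independent_elements(n))

print(smallest_annihilator(4, 12))  # 3 * 4 = 12 is 0 in Z/12Z
```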

Following $\mb{Z}/n\mb{Z}$ we also have another example on finite order. Indeed, **any finite abelian group is not free as a module over $\mb{Z}$.** More generally,

Let $G$ be an abelian group, and $G_{tor}$ be its torsion subgroup. If $G_{tor}$ is non-trivial, then $G$ cannot be a free module over $\mb{Z}$.

Next we shall take a look at infinite rings. Let $F[X]$ be the polynomial ring over a field $F$ and $F'[X]$ be the subring of polynomials whose coefficient of $X$ is $0$. Then $F[X]$ is an $F'[X]$-module. However it is not free.

Suppose we have a basis $S$ of $F[X]$, then we claim that $|S|>1$. If $|S|=1$, say $S=\{P\}$, then $P$ cannot generate $F[X]$: if $P$ is constant then we cannot generate a polynomial containing $X$ with power $1$; if $P$ is not constant, then the constant polynomials cannot be generated. Hence $S$ contains at least two polynomials, say $P_1 \neq 0$ and $P_2 \neq 0$. However, note $-X^2P_1 \in F'[X]$ and $X^2P_2 \in F'[X]$, which gives

$$(X^2P_2)P_1+(-X^2P_1)P_2=0.$$

Hence $S$ cannot be a basis.
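The dependence relation can be checked on a concrete pair. In the sketch below polynomials are coefficient lists `[c0, c1, c2, ...]`, integer coefficients stand in for a field such as $\mb{Q}$, and the sample polynomials `P1`, `P2` are arbitrary choices made for this illustration.

```python
def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists [c0, c1, c2, ...]."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_add(p, q):
    """Add two coefficient lists, padding the shorter one with zeros."""
    size = max(len(p), len(q))
    p = p + [0] * (size - len(p))
    q = q + [0] * (size - len(q))
    return [a + b for a, b in zip(p, q)]


def in_subring(p):
    """Membership in F'[X]: the coefficient of X must vanish."""
    return len(p) < 2 or p[1] == 0


X2 = [0, 0, 1]
P1 = [1, 1]          # 1 + X
P2 = [2, 0, 0, 1]    # 2 + X^3

c1 = poly_mul(X2, P2)                 # coefficient placed in front of P1
c2 = [-a for a in poly_mul(X2, P1)]   # coefficient placed in front of P2
relation = poly_add(poly_mul(c1, P1), poly_mul(c2, P2))

print(in_subring(c1), in_subring(c2))
print(all(c == 0 for c in relation))
```

Both coefficients are nonzero elements of $F'[X]$, yet the combination vanishes, so $P_1$ and $P_2$ cannot both belong to a linearly independent set.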

I hope those examples have convinced you that basis is not a universal thing. We are going to prove that every vector space has a basis. More precisely,

Let $V$ be a nontrivial vector space over a field $K$. Let $\Gamma$ be a set of generators of $V$ over $K$ and $S \subset \Gamma$ be a subset which is linearly independent, then there exists a basis $B$ of $V$ such that $S \subset B \subset \Gamma$.

Note we can always find such $\Gamma$ and $S$. For the extreme condition, we can pick $\Gamma=V$ and $S$ be a set containing any single non-zero element of $V$. Note this also gives that we can generate a basis by expanding any linearly independent set. The proof relies on a fact that every non-zero element in a field is invertible, and also, Zorn’s lemma. $\def\mfk{\mathfrak}$

*Proof.* Define

$$\mfk{T}=\{T \subset \Gamma: S \subset T \text{ and } T \text{ is linearly independent}\}.$$

Then $\mfk{T}$ is not empty since it contains $S$. If $T_1 \subset T_2 \subset \cdots$ is a totally ordered chain in $\mfk{T}$, then $T=\bigcup_{i=1}^{\infty}T_i$ is again linearly independent and contains $S$. To show that $T$ is linearly independent, note that if $x_1,x_2,\cdots,x_n \in T$, we can find some $k_1,\cdots,k_n$ such that $x_i \in T_{k_i}$ for $i=1,2,\cdots,n$. If we pick $k = \max(k_1,\cdots,k_n)$, then

$$x_1,x_2,\cdots,x_n \in T_k.$$

But we already know that $T_k$ is linearly independent, so $a_1x_1+\cdots+a_nx_n=0_V$ implies $a_1=\cdots=a_n=0_K$.

By Zorn’s lemma, $\mfk{T}$ has a maximal element $B$, which is linearly independent since it is an element of $\mfk{T}$. Next we show that $B$ generates $V$. Suppose not, then we can pick some $x \in \Gamma$ that is not generated by $B$. Define $B'=B \cup \{x\}$, we see $B'$ is linearly independent as well, because if we pick $y_1,y_2,\cdots,y_n \in B$, and if

$$bx+a_1y_1+a_2y_2+\cdots+a_ny_n=0_V,$$

then if $b \neq 0$ we have

$$x=-b^{-1}(a_1y_1+a_2y_2+\cdots+a_ny_n),$$

contradicting the assumption that $x$ is not generated by $B$. Hence $b=0_K$, and then $a_1=\cdots=a_n=0_K$ since $B$ is linearly independent. However, we have now proved that $B'$ is a linearly independent set strictly containing $B$ and contained in $\Gamma$, contradicting the maximality of $B$ in $\mfk{T}$. Hence $B$ generates $V$. $\square$
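When $\Gamma$ is finite, Zorn’s lemma is not needed and the same idea becomes a greedy loop: go through $\Gamma$ and add every vector that is not generated by what has been collected so far. The sketch below works over $\mb{Q}$ with exact fractions; the function names and the sample data are assumptions made for this illustration.

```python
from fractions import Fraction


def rank(vectors):
    """Rank of a list of vectors over Q, by Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    rk = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(rk + 1, len(rows)):
            factor = rows[i][col] / rows[rk][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[rk])]
        rk += 1
    return rk


def extend_to_basis(S, Gamma):
    """Grow the independent set S inside the generating set Gamma."""
    B = list(S)
    for x in Gamma:
        if rank(B + [x]) > len(B):   # x is not generated by B
            B.append(x)
    return B


S = [(1, 1, 0)]
Gamma = [(1, 1, 0), (2, 2, 0), (0, 1, 0), (0, 0, 1), (1, 0, 0)]
print(extend_to_basis(S, Gamma))
```

The result contains $S$, is contained in $\Gamma$, and has as many elements as the dimension, which is the finite shadow of $S \subset B \subset \Gamma$.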

In fact the construction of $\mathbb{Q}$ from $\mathbb{Z}$ is already an example. For any $a \in \mathbb{Q}$, we have some $m,n \in \mathbb{Z}$ with $n \neq 0$ such that $a = \frac{m}{n}$. As a matter of notation we may also say an ordered pair $(m,n)$ determines $a$. Two ordered pairs $(m,n)$ and $(m',n')$ are *equivalent* if and only if

$$mn'=m'n.$$

But we are only using the ring structure of $\mathbb{Z}$. So it is natural to think whether it is possible to generalize this process to all rings. But we are also using the fact that $\mathbb{Z}$ is an entire ring (or alternatively integral domain, they mean the same thing). However there is a way to generalize it. $\def\mfk{\mathfrak}$

(Definition 1)A **multiplicatively closed subset** $S \subset A$ is a set such that $1 \in S$ and if $x,y \in S$, then $xy \in S$.

For example, for $\mathbb{Z}$ we have a multiplicatively closed subset

We can also insert $0$ here but it may produce some bad result. If $S$ is also an ideal then we must have $S=A$ so this is not very interesting. However the complement is interesting.

(Proposition 1)Suppose $A$ is a commutative ring such that $1 \neq 0$. Let $S$ be a multiplicatively closed set that does not contain $0$. Let $\mfk{p}$ be a maximal element among the ideals contained in $A \setminus S$, then $\mfk{p}$ is prime.

*Proof.* Recall that $\mfk{p}$ is prime if for any $x,y \in A$ such that $xy \in \mfk{p}$, we have $x \in \mfk{p}$ or $y \in \mfk{p}$. But now we fix $x,y \in \mfk{p}^c$. Note we have a strictly bigger ideal $\mfk{q}_1=\mfk{p}+Ax$. Since $\mfk{p}$ is maximal in the ideals contained in $A \setminus S$, we see

$$\mfk{q}_1 \cap S \neq \varnothing.$$

Therefore there exist some $a \in A$ and $p \in \mfk{p}$ such that

$$p+ax \in S.$$

Also, $\mfk{q}_2=\mfk{p}+Ay$ has nontrivial intersection with $S$ (due to the maximality of $\mfk{p}$), so there exist some $a' \in A$ and $p' \in \mfk{p}$ such that

$$p'+a'y \in S.$$

Since $S$ is closed under multiplication, we have

$$(p+ax)(p'+a'y)=pp'+p'ax+pa'y+aa'xy \in S.$$

But since $\mfk{p}$ is an ideal, we see $pp'+p'ax+pa'y \in \mfk{p}$. Therefore we must have $xy \notin \mfk{p}$ since if not, $(p+ax)(p'+a'y) \in \mfk{p}$, which gives $\mfk{p} \cap S \neq \varnothing$, and this is impossible. $\square$

As a corollary, for an ideal $\mfk{p} \subset A$, if $A \setminus \mfk{p}$ is multiplicatively closed, then $\mfk{p}$ is prime. Conversely, if we are given a prime ideal $\mfk{p}$, then we also get a multiplicatively closed subset.

(Proposition 2)If $\mfk{p}$ is a prime ideal of $A$, then $S = A \setminus \mfk{p}$ is multiplicatively closed.

*Proof.* First $1 \in S$ since $\mfk{p} \neq A$. On the other hand, if $x,y \in S$ we see $xy \in S$ since $\mfk{p}$ is prime. $\square$

We define an equivalence relation on $A \times S$ as follows:

$$(a,s) \sim (b,t) \iff (at-bs)u=0 \text{ for some } u \in S.$$

(Proposition 3)$\sim$ is an equivalence relation.

*Proof.* Since $(as-as)1=0$ while $1 \in S$, we see $(a,s) \sim (a,s)$. For symmetry, note that

$$(at-bs)u=0 \iff (bs-at)u=0.$$

Finally, to show that it is transitive, suppose $(a,s) \sim (b,t)$ and $(b,t) \sim (c,u)$. There exist $v,w \in S$ such that

$$(at-bs)v=0,\quad (bu-ct)w=0.$$

This gives $bsv=atv$ and $buw = ctw$, which implies

$$autvw=bsuvw=cstvw,\quad\text{that is,}\quad (au-cs)tvw=0.$$

But $tvw \in S$ since $t,v,w \in S$ and $S$ is multiplicatively closed. Hence

$$(a,s) \sim (c,u).$$

$\square$
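The role of the extra factor $u$ is easiest to see in a ring with zero divisors. The following is a minimal sketch, not part of the original argument: it assumes the example $A=\mathbb{Z}/6\mathbb{Z}$ and $S=\{1,2,4\}$ (both chosen here for illustration) and counts the equivalence classes of $A \times S$ by brute force.

```python
from itertools import product

N = 6          # assumed example: A = Z/6Z, which has zero divisors
S = [1, 2, 4]  # multiplicatively closed mod 6: 2*2=4, 2*4=2, 4*4=4

def equivalent(p, q):
    """(a, s) ~ (b, t) iff (a*t - b*s)*u == 0 in A for some u in S."""
    (a, s), (b, t) = p, q
    return any(((a * t - b * s) * u) % N == 0 for u in S)

def naive(p, q):
    """The same test without the factor u (only adequate for entire rings)."""
    (a, s), (b, t) = p, q
    return (a * t - b * s) % N == 0

def equivalence_classes():
    """Partition A x S; comparing with one representative per class
    suffices because ~ is an equivalence relation (Proposition 3)."""
    classes = []
    for pair in product(range(N), S):
        for cls in classes:
            if equivalent(pair, cls[0]):
                cls.append(pair)
                break
        else:
            classes.append([pair])
    return classes

classes = equivalence_classes()
print(len(classes))                                       # number of elements of S^{-1}A
print(equivalent((3, 1), (0, 1)), naive((3, 1), (0, 1)))  # 3/1 = 0/1 only thanks to u = 2
```

In this example the $18$ pairs collapse to $3$ classes, matching the fact that inverting $2$ in $\mathbb{Z}/6\mathbb{Z}$ kills $3$ and leaves a copy of $\mathbb{Z}/3\mathbb{Z}$.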

Let $a/s$ denote the equivalence class of $(a,s)$. Let $S^{-1}A$ denote the set of equivalence classes (it is not a good idea to write $A/S$ as it may be confused with the notation for a quotient), and we put a ring structure on $S^{-1}A$ as follows:

$$\frac{a}{s}+\frac{b}{t}=\frac{at+bs}{st},\qquad \frac{a}{s}\cdot\frac{b}{t}=\frac{ab}{st}.$$

These are the same formulas as in elementary arithmetic. But first of all we need to show that $S^{-1}A$ indeed forms a ring.

(Proposition 4)The addition and multiplication are well defined. Further, $S^{-1}A$ is a commutative ring with identity.

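*Proof.* Suppose $(a,s) \sim (a',s')$ and $(b,t) \sim (b',t')$. We need to show that

$$\frac{a}{s}+\frac{b}{t}=\frac{a'}{s'}+\frac{b'}{t'}$$

or

$$\frac{at+bs}{st}=\frac{a't'+b's'}{s't'}.$$

There exist $u,v \in S$ such that

$$(as'-a's)u=0,\quad (bt'-b't)v=0.$$

If we multiply the first equation by $vtt'$ and the second equation by $uss'$ and add them, we see

$$[(at+bs)s't'-(a't'+b's')st]uv=0,$$

which is exactly what we want, since $uv \in S$.

On the other hand, we need to show that

$$\frac{a}{s}\cdot\frac{b}{t}=\frac{a'}{s'}\cdot\frac{b'}{t'}.$$

That is,

$$\frac{ab}{st}=\frac{a'b'}{s't'}.$$

Again, we have

$$as'u=a'su,\quad bt'v=b'tv.$$

Hence

$$(abs't'-a'b'st)uv=0.$$

Since $uv \in S$, we are done.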

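Next we show that $S^{-1}A$ has a ring structure. If $0 \in S$, then $S^{-1}A$ contains exactly one element $0/1$ since in this case, all pairs are equivalent:

$$(at-bs)0=0.$$

We therefore only discuss the case when $0 \notin S$. First $0/1$ is the zero element with respect to addition since

$$\frac{0}{1}+\frac{a}{s}=\frac{0\cdot s+a\cdot 1}{1\cdot s}=\frac{a}{s}.$$

On the other hand, we have the inverse $-a/s$:

$$\frac{a}{s}+\frac{-a}{s}=\frac{as-as}{s^2}=\frac{0}{s^2}=\frac{0}{1}.$$

$1/1$ is the unit with respect to multiplication:

$$\frac{a}{s}\cdot\frac{1}{1}=\frac{a}{s}.$$

Multiplication is associative since

$$\left(\frac{a}{s}\cdot\frac{b}{t}\right)\frac{c}{u}=\frac{abc}{stu}=\frac{a}{s}\left(\frac{b}{t}\cdot\frac{c}{u}\right).$$

Multiplication is commutative since

$$\frac{a}{s}\cdot\frac{b}{t}=\frac{ab}{st}=\frac{ba}{ts}=\frac{b}{t}\cdot\frac{a}{s}.$$

Finally distributivity:

$$\frac{a}{s}\left(\frac{b}{t}+\frac{c}{u}\right)=\frac{a(bu+ct)}{stu}=\frac{(abu+act)s}{(stu)s}=\frac{ab\cdot su+ac\cdot st}{st\cdot su}=\frac{ab}{st}+\frac{ac}{su}.$$

Note $ab/cb=a/c$ since $(abc-abc)1=0$. $\square$ $\def\mb{\mathbb}$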

First we consider the case when $A$ is entire. If $0 \in S$, then $S^{-1}A$ is trivial, which is not so interesting. However, provided that $0 \notin S$, we get some well-behaved result:

(Proposition 5) Let $A$ be an entire ring, and let $S$ be a multiplicatively closed subset of $A$ that does not contain $0$. Then the natural map

$$\varphi_S:A \to S^{-1}A,\quad x \mapsto \frac{x}{1}$$

is injective. Therefore it can be considered as a natural inclusion. Further, every element of $\varphi_S(S)$ is invertible.

*Proof.* Indeed, if $x/1=0/1$, then there exists $s \in S$ such that $xs=0$. Since $A$ is entire and $s \neq 0$, we see $x=0$, hence $\varphi_S$ is injective. For $s \in S$, we see $\varphi_S(s)=s/1$, and $(1/s)\varphi_S(s)=(1/s)(s/1)=s/s=1/1$. $\square$

Note since $A$ is entire we can also conclude that $S^{-1}A$ is entire. As a word of warning, for a general commutative ring $A$ the ring homomorphism $\varphi_S$ is *not* injective in general; for example, when $0 \in S$, this map is the zero map.

If we go further, making $S$ contain all non-zero elements, we have:

(Proposition 6) If $A$ is entire and $S$ contains all non-zero elements of $A$, then $S^{-1}A$ is a field, called the **quotient field** or the **field of fractions**.

*Proof.* First we need to show that $S^{-1}A$ is entire. Suppose $(a/s)(b/t)=ab/st =0/1$ but $a/s \neq 0/1$. Then there exists some $u \in S$ such that

$$(ab\cdot 1-0\cdot st)u=abu=0.$$

Since $A$ is entire while $a \neq 0$ and $u \neq 0$, $b$ has to be $0$, which implies $b/t=0/1$. Second, if $a/s \neq 0/1$, we see $a \neq 0$ and therefore $a \in S$, hence we've found the inverse $(a/s)^{-1}=s/a$. $\square$

In this case we can identify $A$ as a subset of $S^{-1}A$ and write $a/s=s^{-1}a$.

Let $A$ be a commutative ring, and let $S$ be the set of invertible elements of $A$. If $u \in S$, then there exists some $v \in S$ such that $uv=1$. We see $1 \in S$ and if $a,b \in S$, we have $ab \in S$ since $ab$ has an inverse as well. This set is frequently denoted by $A^\ast$, and is called the group of **invertible** elements of $A$. For example, $\mb{Z}^\ast$ consists of $-1$ and $1$. If $A$ is a field, then $A^\ast$ is the multiplicative group of non-zero elements of $A$. For example $\mb{Q}^\ast$ is the set of all non-zero rational numbers. For $A^\ast$ we have

If $A$ is a field, then $(A^\ast)^{-1}A \simeq A$.

*Proof.* Define

$$\varphi_S:A \to (A^\ast)^{-1}A,\quad a \mapsto \frac{a}{1}.$$

Then as we have already shown, $\varphi_S$ is injective. Secondly we show that $\varphi_S$ is surjective. For any $a/s \in (A^\ast)^{-1}A$, we see $as^{-1}/1 = a/s$. Therefore $\varphi_S(as^{-1})=a/s$. $\square$

Now let's see a concrete example. If $A$ is entire, then the polynomial ring $A[X]$ is entire. If $K = S^{-1}A$ is the quotient field of $A$, we can denote the quotient field of $A[X]$ as $K(X)$. Elements in $K(X)$ are naturally called **rational functions**, and can be written as $f(X)/g(X)$ where $f,g \in A[X]$ and $g \neq 0$. For $b \in K$, we say a rational function $f/g$ is **defined** at $b$ if $g(b) \neq 0$. Naturally this process can be generalized to polynomials of $n$ variables.

We say a commutative ring $A$ is local if it has a unique maximal ideal. Let $\mfk{p}$ be a prime ideal of $A$, and $S = A \setminus \mfk{p}$, then $A_{\mfk{p}}=S^{-1}A$ is called the **local ring of $A$ at $\mfk{p}$**. Alternatively, we say the process of passing from $A$ to $A_\mfk{p}$ is *localization* at $\mfk{p}$. You will see it makes sense to call it localization:

(Proposition 7) $A_\mfk{p}$ is local. Precisely, the unique maximal ideal is

$$I=\left\{\frac{a}{s}:a \in \mfk{p},\ s \in S\right\}.$$

Note $I$ is indeed equal to $\mfk{p}A_\mfk{p}$.

*Proof.* First we show that $I$ is an ideal. For $b/t \in A_\mfk{p}$ and $a/s \in I$, we see

$$\frac{b}{t}\cdot\frac{a}{s}=\frac{ba}{ts} \in I$$

since $a \in \mfk{p}$ implies $ba \in \mfk{p}$. Next we show that $I$ is maximal, which is equivalent to showing that $A_\mfk{p}/I$ is a field. For $b/t \notin I$, we have $b \in S$, hence it is legitimate to write $t/b$. This gives

$$\left(\frac{b}{t}+I\right)\left(\frac{t}{b}+I\right)=\frac{1}{1}+I.$$

Hence we have found the inverse.

Finally we show that $I$ is the unique maximal ideal. Let $J$ be another maximal ideal. Suppose $J \neq I$, then we can pick $m/n \in J \setminus I$. This gives $m \in S$ since if not $m \in \mfk{p}$ and then $m/n \in I$. But for $n/m \in A_\mfk{p}$ we have

$$\frac{n}{m}\cdot\frac{m}{n}=\frac{1}{1} \in J.$$

This forces $J$ to be $A_\mfk{p}$ itself, contradicting the assumption that $J$ is a maximal ideal. Hence $I$ is unique. $\square$

Let $p$ be a prime number, and we take $A=\mb{Z}$ and $\mfk{p}=p\mb{Z}$. We now try to determine what $A_\mfk{p}$ and $\mfk{p}A_\mfk{p}$ look like. First $S = A \setminus \mfk{p}$ is the set of all integers prime to $p$. Therefore $A_\mfk{p}$ can be considered as the ring of all rational numbers $m/n$ where $n$ is prime to $p$, and $\mfk{p}A_\mfk{p}$ can be considered as the set of all rational numbers $kp/n$ where $k \in \mb{Z}$ and $n$ is prime to $p$.
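This description of $\mb{Z}_{(p)}$ can be tested directly on rational numbers. The sketch below uses Python's standard `fractions.Fraction`; the helper names `in_localization`, `in_maximal_ideal` and `is_unit` are made up for this illustration, and $p$ is assumed to be prime.

```python
from fractions import Fraction

def in_localization(q: Fraction, p: int) -> bool:
    """q lies in Z_(p) iff its reduced denominator is prime to p (p prime)."""
    return q.denominator % p != 0

def in_maximal_ideal(q: Fraction, p: int) -> bool:
    """q lies in pZ_(p) iff it lies in Z_(p) and p divides its reduced numerator."""
    return in_localization(q, p) and q.numerator % p == 0

def is_unit(q: Fraction, p: int) -> bool:
    """q is a unit of Z_(p) iff both q and 1/q lie in Z_(p)."""
    return q != 0 and in_localization(q, p) and in_localization(1 / q, p)

p = 5
samples = [Fraction(n, d) for n in range(-10, 11) for d in range(1, 11)]
local = [q for q in samples if in_localization(q, p)]

# Locality in action: an element of Z_(p) is a unit exactly when
# it does NOT belong to the maximal ideal pZ_(p).
print(all(is_unit(q, p) != in_maximal_ideal(q, p) for q in local))
```

The final check is the computational shadow of Proposition 7: everything outside the maximal ideal is invertible.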

$\mb{Z}$ is the simplest example of a ring and $p\mb{Z}$ is the simplest example of a prime ideal. And $A_\mfk{p}$ in this case shows what localization does: $A$ is 'expanded' with respect to $\mfk{p}$. Every member of $A_\mfk{p}$ is related to $\mfk{p}$, and the maximal ideal is determined by $\mfk{p}$.

Let $k$ be an infinite field. Let $A=k[x_1,\cdots,x_n]$ where $x_i$ are independent indeterminates, $\mfk{p}$ a prime ideal in $A$. Then $A_\mfk{p}$ is the ring of all rational functions $f/g$ where $g \notin \mfk{p}$. We have already defined rational functions. But we can go further and demonstrate the prototype of the local rings which arise in algebraic geometry. Let $V$ be the variety defined by $\mfk{p}$, that is,

$$V=\{x \in k^n:f(x)=0 \text{ for all } f \in \mfk{p}\}.$$

Then what about $A_\mfk{p}$? For $f/g \in A_\mfk{p}$ we have $g \notin \mfk{p}$, therefore $g(x)$ is not equal to $0$ almost everywhere on $V$. That is, $A_\mfk{p}$ can be identified with the ring of all rational functions on $k^n$ which are defined at *almost all* points of $V$. We call this the local ring of $k^n$ **along the variety** $V$.

Let $A$ be a ring and $S^{-1}A$ a ring of fractions, then we shall see that $\varphi_S:A \to S^{-1}A$ has a universal property.

(Proposition 8)Let $g:A \to B$ be a ring homomorphism such that $g(s)$ is invertible in $B$ for all $s \in S$, then there exists a unique homomorphism $h:S^{-1}A \to B$ such that $g = h \circ \varphi_S$.

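*Proof.* For $a/s \in S^{-1}A$, define $h(a/s)=g(a)g(s)^{-1}$. It looks immediate but we shall show that this is what we are looking for and is unique.

Firstly we need to show that it is well defined. Suppose $a/s=a'/s'$, then there exists some $u \in S$ such that

$$(as'-a's)u=0.$$

Applying $g$ on both sides yields

$$(g(a)g(s')-g(a')g(s))g(u)=0.$$

Since $g(s)$ is invertible for all $s \in S$, we therefore get

$$g(a)g(s)^{-1}=g(a')g(s')^{-1}.$$

It is a homomorphism since

$$h\left(\frac{a}{s}\right)+h\left(\frac{a'}{s'}\right)=g(a)g(s)^{-1}+g(a')g(s')^{-1}$$

and

$$h\left(\frac{a}{s}+\frac{a'}{s'}\right)=h\left(\frac{as'+a's}{ss'}\right)=g(as'+a's)g(ss')^{-1};$$

they are equal since

$$g(as'+a's)g(ss')^{-1}=(g(a)g(s')+g(a')g(s))g(s)^{-1}g(s')^{-1}=g(a)g(s)^{-1}+g(a')g(s')^{-1}.$$

Multiplicativity follows in the same way from $h(aa'/ss')=g(a)g(a')g(s)^{-1}g(s')^{-1}$.

Next we show that $g=h \circ \varphi_S$. For $a \in A$, we have

$$h(\varphi_S(a))=h\left(\frac{a}{1}\right)=g(a)g(1)^{-1}=g(a).$$

Finally we show that $h$ is unique. Let $h'$ be a homomorphism satisfying the condition, then for $a \in A$ we have

$$h'\left(\frac{a}{1}\right)=h'(\varphi_S(a))=g(a).$$

For $s \in S$, we also have

$$h'\left(\frac{1}{s}\right)=h'\left(\left(\frac{s}{1}\right)^{-1}\right)=h'\left(\frac{s}{1}\right)^{-1}=g(s)^{-1}.$$

Since $a/s = (a/1)(1/s)$ for all $a/s \in S^{-1}A$, we get

$$h'\left(\frac{a}{s}\right)=h'\left(\frac{a}{1}\right)h'\left(\frac{1}{s}\right)=g(a)g(s)^{-1}=h\left(\frac{a}{s}\right).$$

That is, $h'$ (or $h$) is totally determined by $g$. $\square$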
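As a concrete check of Proposition 8, here is a small sketch under assumptions chosen for illustration only: $A=\mb{Z}$, $S=\mb{Z}\setminus 7\mb{Z}$, $B=\mb{Z}/7\mb{Z}$ and $g$ the reduction mod $7$. The function names are hypothetical, and `pow(x, -1, m)` for modular inverses needs Python 3.8 or later.

```python
from fractions import Fraction

P = 7  # assumed example: A = Z, S = Z \ 7Z, B = Z/7Z

def g(a: int) -> int:
    """The ring homomorphism g: Z -> Z/7Z; g(s) is invertible for every s in S."""
    return a % P

def h_pair(a: int, s: int) -> int:
    """h(a/s) = g(a) * g(s)^{-1}, computed from one representative (a, s)."""
    if s % P == 0:
        raise ValueError("the denominator must lie in S")
    return g(a) * pow(g(s), -1, P) % P

def h(q: Fraction) -> int:
    """h on an element of S^{-1}A, using the reduced representative."""
    return h_pair(q.numerator, q.denominator)

# Well defined: different representatives of 3/4 give the same value.
print(h_pair(3, 4), h_pair(6, 8), h_pair(-9, -12))

# g = h o phi_S, and h respects addition and multiplication.
x, y = Fraction(3, 4), Fraction(5, 2)
print(all(h_pair(a, 1) == g(a) for a in range(-20, 21)))
print(h(x + y) == (h(x) + h(y)) % P, h(x * y) == (h(x) * h(y)) % P)
```

Only finitely many cases are tested here, so this illustrates the statement rather than proving it.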

Let's restate it in the language of category theory (you can skip it if you are not familiar with it yet). Let $\mfk{C}$ be the category whose objects are ring homomorphisms

$$f:A \to B$$

such that $f(s)$ is invertible for all $s \in S$. Then by the computation in the proof of proposition 5, $\varphi_S$ is an object of $\mfk{C}$. For two objects $f:A \to B$ and $f':A \to B'$, a morphism $g \in \operatorname{Mor}(f,f')$ is a homomorphism

$$g:B \to B'$$

such that $f'=g \circ f$. So here comes the question: what is the position of $\varphi_S$?

Let $\mfk{A}$ be a category. An object $P$ of $\mfk{A}$ is called **universally attracting** if there exists a unique morphism of each object of $\mfk{A}$ into $P$, and is called **universally repelling** if for every object of $\mfk{A}$ there exists a unique morphism of $P$ into this object. Therefore we have the answer for $\mfk{C}$.

(Proposition 9)$\varphi_S$ is a universally repelling object in $\mfk{C}$.

An ideal $\mfk{o} \subset A$ is said to be **principal** if there exists some $a \in A$ such that $Aa = \mfk{o}$. For example for $\mb{Z}$, the ideal

$$\{2k:k \in \mb{Z}\}$$

is principal and we may write $2\mb{Z}$. If every ideal of a **commutative** ring $A$ is principal, we say $A$ is principal. Further we say $A$ is a **PID** if $A$ is also an integral domain (entire). When it comes to rings of fractions, we also have the following proposition:

(Proposition 10)Let $A$ be a principal ring and $S$ a multiplicatively closed subset with $0 \notin S$, then $S^{-1}A$ is principal as well.

*Proof.* Let $I \subset S^{-1}A$ be an ideal. If $a \in S$ where $a/s \in I$, then we are done since then $(s/a)(a/s) = 1/1 \in I$, which implies $I=S^{-1}A$ itself, hence we shall assume $a \notin S$ for all $a/s \in I$. But for $a/s \in I$ we also have $(a/s)(s/1)=a/1 \in I$. Therefore $J=\varphi_S^{-1}(I)$ is not empty. $J$ is an ideal of $A$ since for $a \in A$ and $b \in J$, we have $\varphi_S(ab) =ab/1=(a/1)(b/1) \in I$ which implies $ab \in J$. But since $A$ is principal, there exists some $a$ such that $Aa = J$. We shall discuss the relation between $S^{-1}A(a/1)$ and $I$. For any $(c/u)(a/1)=ca/u \in S^{-1}A(a/1)$, clearly we have $ca/u \in I$, hence $S^{-1}A(a/1)\subset I$. On the other hand, for $c/u \in I$, we see $c/1=(c/u)(u/1) \in I$, hence $c \in J$, and there exists some $b \in A$ such that $c = ba$, which gives $c/u=ba/u=(b/u)(a/1) \in I$. Hence $I \subset S^{-1}A(a/1)$, and we have finally proved that $I = S^{-1}A(a/1)$. $\square$

As an immediate corollary, if $A_\mfk{p}$ is the localization of $A$ at $\mfk{p}$, and if $A$ is principal, then $A_\mfk{p}$ is principal as well. Next we go through another kind of ring. A ring is called **factorial** (or a **unique factorization ring** or **UFD**) if it is entire and if every non-zero element has a unique factorization into irreducible elements. An element $a \neq 0$ is called **irreducible** if it is not a unit and whenever $a=bc$, then either $b$ or $c$ is a unit. For every non-zero element $a$ of a factorial ring, we have

$$a=u\prod_{i=1}^{n}p_i$$

where $u$ is a unit (invertible) and each $p_i$ is irreducible.

In fact, every PID is a UFD (proof here). Irreducible elements in a factorial ring are called **prime elements** or simply **primes** (take $\mathbb{Z}$ and prime numbers as an example). Indeed, if $A$ is a factorial ring and $p$ a prime element, then $Ap$ is a prime ideal. But we are more interested in the ring of fractions of a factorial ring.

(Proposition 11)Let $A$ be a factorial ring and $S$ a multiplicatively closed subset with $0 \notin S$, then $S^{-1}A$ is factorial.

*Proof.* Pick $a/s \in S^{-1}A$. Since $A$ is factorial, we have $a=up_1 \cdots p_k$ where $p_i$ are primes and $u$ is a unit. But we do not yet know what the irreducible elements of $S^{-1}A$ are. Naturally our first candidates are the $p_i/1$, and there is no need to restrict ourselves to the $p_i$: we should work on all primes of $A$. Suppose $p$ is a prime of $A$. If $p \in S$, then $p/1$ is a unit of $S^{-1}A$, not prime. More generally, if $Ap \cap S \neq \varnothing$, then $rp \in S$ for some $r \in A$. But then

$$\frac{p}{1}\cdot\frac{r}{rp}=\frac{rp}{rp}=\frac{1}{1},$$

again $p/1$ is a unit, not prime. Finally if $Ap \cap S = \varnothing$, then $p/1$ is prime in $S^{-1}A$. For any

$$\frac{a}{s}\cdot\frac{b}{t}=\frac{ab}{st}=\frac{p}{1},$$

we see $ab=stp \not\in S$. But this also gives $ab \in Ap$ which is a prime ideal, hence we can assume $a \in Ap$ and write $a=rp$ for some $r \in A$. With this expansion we get

$$\frac{rp}{s}\cdot\frac{b}{t}=\frac{p}{1},\quad\text{hence}\quad \frac{r}{s}\cdot\frac{b}{t}=\frac{1}{1}.$$

Hence $b/t$ is a unit, $p/1$ is a prime.

Conversely, suppose $a/s$ is irreducible in $S^{-1}A$. Since $A$ is factorial, we may write $a=u\prod_{i}p_i$. $a$ cannot be an element of $S$ since $a/s$ is not a unit. We write

$$\frac{a}{s}=\frac{1}{s}\cdot\frac{u}{1}\prod_{i=1}^{n}\frac{p_i}{1}.$$

We see there is some $v \in A$ such that $uv=1$ and accordingly $(u/1)(v/1)=uv/1=1/1$, hence $u/1$ is a unit. We claim that there exists a unique $p_k$ such that $1 \leq k \leq n$ and $Ap_k \cap S = \varnothing$. If no such $p_k$ exists, then all $p_j/1$ are units, and so is $a/s$, a contradiction. If both $p_{k}$ and $p_{k'}$ satisfy the requirement and $p_k \neq p_{k'}$, then we can write $a/s$ as

$$\frac{a}{s}=\left\{\frac{u}{s}\prod_{i \neq k'}\frac{p_i}{1}\right\}\frac{p_{k'}}{1}.$$

Neither the one in curly brackets nor $p_{k'}/1$ is a unit, contradicting the fact that $a/s$ is irreducible. Next we show that $a/s=p_k/1$ up to a unit. For simplicity we write

$$b=u\prod_{i \neq k}p_i.$$

Note $a/s = bp_k/s = (b/s)(p_k/1)$. Since $a/s$ is irreducible and $p_k/1$ is not a unit, we conclude that $b/s$ is a unit. We are done with the study of irreducible elements of $S^{-1}A$: each is of the form $p/1$ (up to a unit) where $p$ is prime in $A$ and $Ap \cap S = \varnothing$.

Now we are close to the fact that $S^{-1}A$ is also factorial. For any $a/s \in S^{-1}A$, we have an expansion

$$\frac{a}{s}=\frac{u}{s}\prod_{i=1}^{n}\frac{p_i}{1}.$$

Let $p'_1,p'_2,\cdots,p'_j$ be those whose generated prime ideal has nonempty intersection with $S$, then $p'_1/1, p'_2/1,\cdots,p'_j/1$ are units of $S^{-1}A$. Let $q_1,q_2,\cdots,q_k$ be the other $p_i$'s, then $q_1/1,q_2/1,\cdots,q_k/1$ are irreducible in $S^{-1}A$. This gives

$$\frac{a}{s}=\left(\frac{u}{s}\prod_{i=1}^{j}\frac{p'_i}{1}\right)\prod_{i=1}^{k}\frac{q_i}{1},$$

where the factor in parentheses is a unit. Hence $S^{-1}A$ is factorial as well. $\square$

We finish the whole post by a comprehensive proposition:

(Proposition 12)Let $A$ be a factorial ring and $p$ a prime element, $\mfk{p}=Ap$. The localization of $A$ at $\mfk{p}$ is principal.

*Proof.* Here $S = A \setminus \mfk{p}$. For $a/s \in S^{-1}A$, we see $p$ does not divide $s$ since if $s = rp$ for some $r \in A$, then $s \in \mfk{p}$, contradicting the fact that $S = A \setminus \mfk{p}$. Since $A$ is factorial, for $a \neq 0$ we may write $a = cp^n$ for some $n \geq 0$ where $p$ does not divide $c$ either (which gives $c \in S$). Hence $a/s = (c/s)(p^n/1)$. Note $(c/s)(s/c)=1/1$ and therefore $c/s$ is a unit. Every non-zero $a/s \in S^{-1}A$ may thus be written as

$$\frac{a}{s}=u\frac{p^n}{1}$$

where $u$ is a unit of $S^{-1}A$.

Let $I$ be any non-zero ideal in $S^{-1}A$ (the zero ideal is clearly principal), and

$$m=\min\left\{n \geq 0:u\frac{p^n}{1} \in I \text{ for some unit } u\right\}.$$

Let's discuss the relation between $S^{-1}A(p^m/1)$ and $I$. First we see $S^{-1}A(p^m/1)=S^{-1}A(up^m/1)$ since if $v$ is the inverse of $u$, we get

$$v\cdot u\frac{p^m}{1}=\frac{p^m}{1}.$$

Any non-zero element of $S^{-1}A(up^m/1)$ is of the form

$$v\frac{p^k}{1}\cdot u\frac{p^m}{1}=vu\frac{p^{m+k}}{1},$$

where $v$ now denotes an arbitrary unit and $k \geq 0$. Since $up^m/1 \in I$, we see $vup^{m+k}/1 \in I$ as well, hence $S^{-1}A(up^m/1) \subset I$. On the other hand, by the minimality of $m$, any non-zero element of $I$ is of the form $wup^{m+n}/1=w(p^n/1)u(p^m/1)$ where $w$ is a unit and $n \geq 0$. This shows that $wup^{m+n}/1 \in S^{-1}A(up^m/1)$. Hence $S^{-1}A(p^m/1)=S^{-1}A(up^m/1)=I$ as we wanted. $\square$

Let $A$ be an abelian group. Let $(e_i)_{i \in I}$ be a family of elements of $A$. We say that this family is a **basis** for $A$ if the family is not empty, and if every element of $A$ has a unique expression as a **linear expression**

$$x=\sum_{i \in I}x_ie_i$$

where $x_i \in \mathbb{Z}$ and almost all $x_i$ are equal to $0$. This means that the sum is actually finite. An abelian group is said to be **free** if it has a basis. Alternatively, we may write $A$ as a direct sum by

$$A=\bigoplus_{i \in I}\mathbb{Z}e_i.$$

Let $S$ be a set. Say we want to get a group out of this for some reason, so how? It is not a good idea to endow $S$ with a binary operation beforehand since after all $S$ is merely a set. We shall **generate** a group out of $S$ in the **freest** way.

Let $\mathbb{Z}\langle S \rangle$ be the set of all **maps** $\varphi:S \to \mathbb{Z}$ such that, for only a **finite** number of $x \in S$, we have $\varphi(x) \neq 0$. For simplicity, we denote by $k \cdot x$ the map $\varphi_0 \in \mathbb{Z}\langle S \rangle$ such that $\varphi_0(x)=k$ but $\varphi_0(y) = 0$ if $y \neq x$. For any $\varphi$, we claim that $\varphi$ has a unique expression

$$\varphi=k_1 \cdot x_1+k_2 \cdot x_2+\cdots+k_n \cdot x_n.$$

One can consider these integers $k_i$ as the order of $x_i$, or simply the number of times that $x_i$ appears (which may be negative). For $\varphi\in\mathbb{Z}\langle S \rangle$, let $I=\{x_1,x_2,\cdots,x_n\}$ be the set of elements of $S$ such that $\varphi(x_i) \neq 0$. If we denote $k_i=\varphi(x_i)$, we can show that $\psi=k_1 \cdot x_1 + k_2 \cdot x_2 + \cdots + k_n \cdot x_n$ is equal to $\varphi$. For $x_i \in I$, we have $\psi(x_i)=k_i \neq 0$ by definition of the '$\cdot$'; if $y \notin I$ however, we then have $\psi(y)=0$. This coincides with $\varphi$. $\blacksquare$

By definition the zero map $\mathcal{O}=0 \cdot x \in \mathbb{Z}\langle S \rangle$ and therefore we may write any $\varphi$ as

$$\varphi=\sum_{x \in S}k_x \cdot x$$

where $k_x \in \mathbb{Z}$ and can be zero. Suppose now we have two expressions, for example

$$\varphi=\sum_{x \in S}k_x \cdot x=\sum_{x \in S}k_x' \cdot x.$$

Then

$$\sum_{x \in S}(k_x-k_x') \cdot x=\mathcal{O}.$$

Suppose $k_y - k_y' \neq 0$ for some $y \in S$, then

$$\left(\sum_{x \in S}(k_x-k_x') \cdot x\right)(y)=k_y-k_y' \neq 0,$$

which is a contradiction. Therefore the expression is unique. $\blacksquare$

This $\mathbb{Z}\langle S \rangle$ is what we are looking for. It is an additive group (which can be proved immediately) and, what is more important, every element can be expressed as a ‘sum’ associated with finite number of elements of $S$. We shall write $F_{ab}(S)=\mathbb{Z}\langle S \rangle$, and call it the **free abelian group generated by $S$**. For elements in $S$, we say they are **free generators** of $F_{ab}(S)$. If $S$ is a finite set, we say $F_{ab}(S)$ is **finitely generated**.
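The construction of $\mathbb{Z}\langle S \rangle$ translates almost literally into code. The following sketch is only an illustration: it assumes that a finitely supported map $\varphi:S \to \mathbb{Z}$ is stored as a Python dictionary holding its non-zero values, and the function names are invented for this example.

```python
def normalize(phi):
    """Drop zero coefficients so that each map has exactly one dictionary form."""
    return {x: k for x, k in phi.items() if k != 0}

def term(k, x):
    """The element k·x: value k at x and 0 elsewhere."""
    return normalize({x: k})

def add(phi, psi):
    """Pointwise sum of two finitely supported maps S -> Z."""
    out = dict(phi)
    for x, k in psi.items():
        out[x] = out.get(x, 0) + k
    return normalize(out)

def neg(phi):
    """Additive inverse."""
    return {x: -k for x, k in phi.items()}

ZERO = {}  # the zero map O

phi = add(term(2, "a"), term(-3, "b"))  # 2·a + (-3)·b
print(phi)
print(add(phi, neg(phi)) == ZERO)       # phi + (-phi) = O
print(add(phi, term(-2, "a")))          # the coefficient of a cancels
```

Because zero coefficients are always dropped, two dictionaries are equal exactly when they represent the same map, which mirrors the uniqueness of the expression proved above.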

An abelian group is **free** if and only if it is isomorphic to a free abelian group $F_{ab}(S)$ for some set $S$.

**Proof.** First we shall show that $F_{ab}(S)$ is free. For $x \in S$, we denote $\varphi = 1 \cdot x$ by $[x]$. Then for any $k \in \mathbb{Z}$, we have $k[x]=k \cdot x$ and $k[x]+k'[y] = k\cdot x + k' \cdot y$. By definition of $F_{ab}(S)$, any element $\varphi \in F_{ab}(S)$ has a unique expression

$$\varphi=\sum_{x \in S}k_x[x].$$

Therefore $F_{ab}(S)$ is free since we have found the basis $([x])_{x \in S}$.

Conversely, if $A$ is free with basis $(e_i)_{i \in I}$, then sending $\sum_i x_ie_i$ to $\sum_i x_i \cdot i$ gives an isomorphism $A \cong F_{ab}(I)$, because both sides have unique finite expressions with integer coefficients. Our statement is therefore proved. $\blacksquare$

(Proposition 1) If $A$ is an abelian group, then there is a free abelian group $F$ which has a subgroup $H$ such that $A \cong F/H$.

**Proof.** Let $S$ be any set containing $A$. Then we get a surjective map $\gamma: S \to A$ and a free abelian group $F_{ab}(S)$. We also get a unique homomorphism $\gamma_\ast:F_{ab}(S) \to A$ by

$$\gamma_\ast\left(\sum_{x \in S}k_x \cdot x\right)=\sum_{x \in S}k_x\gamma(x),$$

which is also surjective. By the first isomorphism theorem, if we set $H=\ker(\gamma_\ast)$ and $F_{ab}(S)=F$, then

$$A \cong F/H.$$

$\blacksquare$

(Proposition 2)If $A$ is finitely generated, then $F$ can also be chosen to be finitely generated.

**Proof.** Let $S$ be the generator of $A$, and $S’$ is a set containing $S$. Note if $S$ is finite, which means $A$ is finitely generated, then $S’$ can also be finite by inserting one or any finite number more of elements. We have a map from $S$ and $S’$ into $F_{ab}(S)$ and $F_{ab}(S’)$ respectively by $f_S(x)=1 \cdot x$ and $f_{S’}(x’)=1 \cdot x’$. Define $g=f_{S’} \circ \lambda:S’ \to F_{ab}(S)$ we get another homomorphism by

This defines a unique homomorphism such that $g_\ast \circ f_{S’} = g$. As one can also verify, this map is also surjective. Therefore by the first isomorphism theorem we have

$\blacksquare$

It’s worth mentioning separately that we have implicitly proved two statements with commutative diagrams:

(Proposition 3 | Universal property)If $g:S \to B$ is a mapping of $S$ into some abelian group $B$, then we can define a unique group-homomorphism making the following diagram commutative:

(Proposition 4) If $\lambda:S \to S'$ is a mapping of sets, there is a unique homomorphism $\overline{\lambda}:F_{ab}(S) \to F_{ab}(S')$ making the following diagram commutative:

(In the proof of Proposition 2 we exchanged $S$ and $S'$.)

(The Grothendieck group) Let $M$ be a commutative monoid written additively. We shall prove that there exists a commutative group $K(M)$ with a monoid homomorphism

$$\gamma:M \to K(M)$$

satisfying the following universal property: If $f:M \to A$ is a homomorphism from $M$ into an abelian group $A$, then there exists a unique homomorphism $f_\gamma:K(M) \to A$ such that $f=f_\gamma\circ\gamma$. This can be represented by a commutative diagram:

**Proof.** There is a commutative diagram that describes what we are doing.

Let $F_{ab}(M)$ be the free abelian group generated by $M$. For $x \in M$, we denote $1 \cdot x \in F_{ab}(M)$ by $[x]$. Let $B$ be the group generated by all elements of the type

$$[x+y]-[x]-[y]$$

where $x,y \in M$. This can be considered as a subgroup of $F_{ab}(M)$. We let $K(M)=F_{ab}(M)/B$. Let $i:x \mapsto [x]$ and let $\pi$ be the canonical map

$$\pi:F_{ab}(M) \to F_{ab}(M)/B.$$

We are done by defining $\gamma= \pi \circ i$. Then we shall verify that $\gamma$ is our desired homomorphism satisfying the universal property. For $x,y \in M$, we have $\gamma(x+y)=\pi([x+y])$ and $\gamma(x)+\gamma(y) = \pi([x])+\pi([y])=\pi([x]+[y])$. However we have

$$[x+y]-[x]-[y] \in B,$$

which implies that

$$\pi([x+y])=\pi([x]+[y]).$$

Hence $\gamma$ is a monoid-homomorphism. Finally the universal property. By proposition 3, we have a unique homomorphism $f_\ast$ such that $f_\ast \circ i = f$. Note if $y \in B$, then $f_\ast(y) =0$. Therefore $B \subset \ker{f_\ast}$, and we are done if we define $f_\gamma(x+B)=f_\ast (x)$. $\blacksquare$

Why such a $B$? Note in general $[x+y]$ is not necessarily equal to $[x]+[y]$ in $F_{ab}(M)$, but we want them to agree. So instead we create a new **equivalence relation**, by factoring out the subgroup generated by the elements $[x+y]-[x]-[y]$. Therefore in $K(M)$ we see $[x+y]+B = [x]+[y]+B$, which finally makes $\gamma$ a homomorphism. We use the same strategy to generate the **tensor product** of two modules later. But at that time we have more than one relation to take care of.

If for all $x,y,z \in M$, $x+y=x+z$ implies $y=z$, then we say $M$ is a cancellative monoid, or the cancellation law holds in $M$. Note for the proof above we didn’t use any property of cancellation. However we still have an interesting property for cancellation law.

(Theorem)The cancellation law holds in $M$ if and only if $\gamma$ is injective.

**Proof.** This proof involves another approach to the Grothendieck group. We consider pairs $(x,y) \in M \times M$. Define

$$(x,y) \sim (x',y') \iff x+y'+\ell=x'+y+\ell \text{ for some } \ell \in M.$$

Then we get an equivalence relation (try to prove it yourself!). We define the addition component-wise, that is, $(x,y)+(x',y')=(x+x',y+y')$, then the equivalence classes of pairs form a group $A$, where the zero element is $[(0,0)]$ and the inverse of $[(x,y)]$ is $[(y,x)]$. We have a monoid-homomorphism

$$f:M \to A,\quad x \mapsto [(x,0)].$$

If the cancellation law holds in $M$, then

$$f(x)=f(y) \implies x+\ell=y+\ell \text{ for some } \ell \in M \implies x=y.$$

Hence $f$ is injective. By the universal property of the Grothendieck group, we get a unique homomorphism $f_\gamma$ such that $f_\gamma \circ \gamma = f$. If $x \neq y$ in $M$, then $f_\gamma(\gamma(x))=f(x) \neq f(y)=f_\gamma(\gamma(y))$ since $f$ is injective. This implies $\gamma(x) \neq \gamma(y)$. Hence $\gamma$ is injective.

Conversely, suppose $\gamma$ is injective and $x+\ell = y+\ell$ in $M$. Applying $\gamma$ gives $\gamma(x)+\gamma(\ell)=\gamma(y)+\gamma(\ell)$ in the group $K(M)$, where we may cancel $\gamma(\ell)$ to get $\gamma(x)=\gamma(y)$. Since $\gamma$ is injective, $x=y$, so the cancellation law holds in $M$. $\blacksquare$
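For a finite monoid, the pair construction in this proof can be carried out by brute force. The sketch below is illustrative only: the function names are invented here, and the two test monoids, $(\{0,1\},\max)$ and $\mathbb{Z}/3\mathbb{Z}$ under addition, are assumptions chosen to contrast a non-cancellative monoid with a cancellative one.

```python
from itertools import product

def grothendieck_classes(elements, op):
    """Classes of pairs under (x,y) ~ (x',y') iff x+y'+l == x'+y+l for some l in M."""
    def related(p, q):
        (x, y), (x2, y2) = p, q
        return any(op(op(x, y2), l) == op(op(x2, y), l) for l in elements)

    classes = []
    for pair in product(elements, repeat=2):
        for cls in classes:
            if related(pair, cls[0]):  # one representative suffices for an equivalence relation
                cls.append(pair)
                break
        else:
            classes.append([pair])
    return classes

def gamma_is_injective(elements, op, zero):
    """Check whether x -> [(x, 0)] sends distinct elements to distinct classes."""
    classes = grothendieck_classes(elements, op)
    def index(x):
        return next(i for i, cls in enumerate(classes) if (x, zero) in cls)
    images = [index(x) for x in elements]
    return len(set(images)) == len(images)

# ({0, 1}, max): 1 + 1 = 1 + 0, so cancellation fails.
print(len(grothendieck_classes([0, 1], max)), gamma_is_injective([0, 1], max, 0))

# Z/3Z under addition: already a group, hence cancellative.
add3 = lambda x, y: (x + y) % 3
print(len(grothendieck_classes([0, 1, 2], add3)), gamma_is_injective([0, 1, 2], add3, 0))
```

In the first monoid all pairs collapse into a single class, so the group of classes is trivial and the map $x \mapsto [(x,0)]$ cannot be injective, in line with the theorem.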

Our first example is $\mathbb{N}$. Elements of $F_{ab}(\mathbb{N})$ are of the form

$$\sum_{j=1}^{m}k_j \cdot n_j.$$

Elements of $B$ are generated by

$$1 \cdot (m+n)-1 \cdot m-1 \cdot n,$$

which we wish to represent $0$. Indeed, $K(\mathbb{N}) \simeq \mathbb{Z}$ via the homomorphism

$$f:K(\mathbb{N}) \to \mathbb{Z},\quad \sum_{j=1}^{m}k_j \cdot n_j+B \mapsto \sum_{j=1}^{m}k_jn_j.$$

For $r \in \mathbb{Z}$ with $r \geq 0$, we see $f(1 \cdot r+B)=r$, and for $r<0$ we have $f((-1) \cdot (-r)+B)=r$, so $f$ is surjective. On the other hand, if $\sum_{j=1}^{m}k_j \cdot n_j \not\in B$, then its image under $f$ is not $0$, so $f$ is injective.

In the first example we 'granted' the natural numbers 'subtraction'. Next we grant division to a multiplicative monoid.

Consider $M=\mathbb{Z} \setminus \{0\}$. Now for $F_{ab}(M)$ we write elements in the form

$$\prod_{j=1}^{m}n_j^{k_j},$$

which denotes that $\varphi(n_j)=k_j$ and has no other differences. Then elements of $B$ are generated by

$$(mn)^{1}m^{-1}n^{-1},$$

which we wish to represent $1$. Then we see $K(M) \simeq \mathbb{Q} \setminus \{0\}$ if we take the isomorphism

$$f:K(M) \to \mathbb{Q} \setminus \{0\},\quad \left(\prod_{j=1}^{m}n_j^{k_j}\right)B \mapsto \prod_{j=1}^{m}n_j^{k_j},$$

where the product on the right is computed in $\mathbb{Q}$.

Of course this is not the end of the Grothendieck group. But for further examples we may need a lot of topology background. For example, we have the topological $K$-theory group of a topological space to be the Grothendieck group of isomorphism classes of topological vector bundles. But I think it is not a good idea to post these examples at this time.

We begin our study with some elementary Calculus. Take the function $f(x)=x^2+\frac{e^x}{x^2+1}$ as our example. It should not be a problem to find its tangent line at the point $A(0,1)$: by calculating its derivative, we have $l:x-y+1=0$ as the tangent line.

$l$ is not a vector space since it does not pass through the origin, in general. But $l-\overrightarrow{OA}$ is a vector space. In general, suppose $P(x,y)$ is a point on the curve determined by $f$, i.e. $y=f(x)$, then we obtain a vector space $l_p-\overrightarrow{OP} \simeq \mathbb{R}$. But the action of moving the tangent line to the origin is superfluous so naturally we consider the tangent line at $P$ as a vector space **determined** by $P$. In this case, the induced vector space (tangent line) is always of dimension $1$.

Now we move to two-variable functions. We have a function $a(x,y)=x^2+y^2-x-y+xy$ as our example. Some elementary Calculus work gives us the tangent plane of $z=a(x,y)$ at $A(1,1,1)$, which can be identified by $S:2x+2y-z=3\simeq\mathbb{R}^2$. Again, this can be considered as a vector space **determined** by $A$, or roughly speaking it is one if we take $A$ as the origin. Further we have a basis $(\overrightarrow{AB},\overrightarrow{AC})$. Other vectors on $S$, for example $\overrightarrow{AD}$, can be written as a linear combination of $\overrightarrow{AB}$ and $\overrightarrow{AC}$. In other words, $S$ is "spanned" by $(\overrightarrow{AB},\overrightarrow{AC})$.
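Both tangent computations can be sanity-checked numerically. The sketch below is not part of the original derivation: it approximates the derivatives by central differences (the step size `h` is an arbitrary choice) and compares them with the slope of $l$ and the coefficients of $S$.

```python
import math

def f(x):
    """The one-variable example f(x) = x^2 + e^x / (x^2 + 1)."""
    return x**2 + math.exp(x) / (x**2 + 1)

def a(x, y):
    """The two-variable example a(x, y) = x^2 + y^2 - x - y + xy."""
    return x**2 + y**2 - x - y + x * y

def d(fn, t, h=1e-6):
    """Central-difference approximation of fn'(t)."""
    return (fn(t + h) - fn(t - h)) / (2 * h)

# Tangent line at (0, f(0)): y = f(0) + f'(0) x, expected to be y = 1 + x.
slope = d(f, 0.0)
print(f(0.0), round(slope, 6))

# Tangent plane at (1, 1, a(1, 1)): z = a + a_x (x - 1) + a_y (y - 1).
ax = d(lambda x: a(x, 1.0), 1.0)
ay = d(lambda y: a(1.0, y), 1.0)
constant = ax * 1.0 + ay * 1.0 - a(1.0, 1.0)  # right-hand side of a_x x + a_y y - z = const
print(round(ax, 6), round(ay, 6), round(constant, 6))
```

With $f(0)=1$ and slope $1$ the line is $x-y+1=0$, and with both partial derivatives equal to $2$ the plane is $2x+2y-z=3$, as claimed.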

Tangent lines and tangent planes play an important role in differentiation. But sometimes we do not have a chance to use them with ease, for example $S^1:x^2+y^2=1$ cannot be represented by a single-variable function. However the implicit function theorem, which you have already learned in Calculus, gives us a chance to find a suitable function locally. Here in this post we will try to generalize this concept, trying to find the tangent **space** at some point of a manifold. (The two examples above have already determined two manifolds and two tangent spaces.)

We will introduce the abstract definition of a tangent vector at the beginning. You may think it is way too abstract but actually it is not. Surprisingly, the following definition can simplify our work in the future. But before we go, make sure that you have learned about the Fréchet derivative (along with some functional analysis knowledge).

Let $M$ be a manifold of class $C^p$ with $p \geq 1$ and let $x$ be a point of $M$. Let $(U,\varphi)$ be a chart at $x$ and $v$ be an element of the vector space $\mathbf{E}$ where $\varphi(U)$ lies (for example, if $M$ is a $d$-dimensional manifold, then $v \in \mathbb{R}^d$). Next we consider the triple $(U,\varphi,v)$. Suppose $(U,\varphi,v)$ and $(V,\psi,w)$ are two such triples. We say these two triples are **equivalent** if the following identity holds:

$${\color{green}{[}}{\color{red}{(\psi\circ\varphi^{-1})'}}{\color{purple}{(\varphi(x))}}{\color{green}{]}}(v)=w.$$

This identity looks messy so we need to explain how to read it. First we consider the function in red: the derivative of $\psi\circ\varphi^{-1}$. The derivative of $\psi\circ\varphi^{-1}$ at the point $\varphi(x)$ (in purple) is a linear transform, and the transform is enclosed in green brackets. Finally, this linear transform maps $v$ to $w$. In short we read: the derivative of $\psi\circ\varphi^{-1}$ at $\varphi(x)$ maps $v$ to $w$. You may recall that you have met something like $\psi\circ\varphi^{-1}$ in the definition of a manifold. It may seem unlikely that these 'triples' should be associated to tangent vectors. But before we explain it, we need to make sure that we have indeed defined an equivalence relation.

(Theorem 1) The relation

$$(U,\varphi,v) \sim (V,\psi,w) \iff [(\psi\circ\varphi^{-1})'(\varphi(x))](v)=w$$

is an equivalence relation.

*Proof.* This will not go further than elementary Calculus, in fact, the chain rule:

(Chain rule) If $f:U \to V$ is differentiable at $x_0 \in U$, and if $g: V \to W$ is differentiable at $f(x_0)$, then $g \circ f$ is differentiable at $x_0$, and

$$(g \circ f)'(x_0)=g'(f(x_0)) \circ f'(x_0).$$

- $(U,\varphi,v)\sim(U,\varphi,v)$.

Since $\varphi\circ\varphi^{-1}=\operatorname{id}$, whose derivative is still the identity everywhere, we have

$$[(\varphi\circ\varphi^{-1})'(\varphi(x))](v)=\operatorname{id}(v)=v.$$

- If $(U,\varphi,v) \sim (V,\psi,w)$, then $(V,\psi,w)\sim(U,\varphi,v)$.

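So now we have

$$[(\psi\circ\varphi^{-1})'(\varphi(x))](v)=w.$$

To prove that $[(\varphi\circ\psi^{-1})'(\psi(x))](w)=v$, we need to apply the chain rule.

Note first

$$\varphi\circ\psi^{-1}=(\psi\circ\varphi^{-1})^{-1}$$

while

$$\psi(x)=(\psi\circ\varphi^{-1})(\varphi(x)).$$

But also by the chain rule, if $f$ is a diffeomorphism, we have

$$(f^{-1}\circ f)'(x)=(f^{-1})'(f(x))\circ f'(x)=\operatorname{id}$$

or equivalently

$$(f^{-1})'(f(x))=[f'(x)]^{-1}.$$

Therefore, taking $f=\psi\circ\varphi^{-1}$ at the point $\varphi(x)$,

$$(\varphi\circ\psi^{-1})'(\psi(x))=[(\psi\circ\varphi^{-1})'(\varphi(x))]^{-1},$$

which implies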

- If $(U,\varphi,v)\sim(V,\psi,w)$ and $(V,\psi,w)\sim(W,\lambda,z)$, then $(U,\varphi,v)\sim(W,\lambda,z)$.

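We are given identities

$$[(\psi\circ\varphi^{-1})'(\varphi(x))](v)=w$$

and

$$[(\lambda\circ\psi^{-1})'(\psi(x))](w)=z.$$

By canceling $w$, we get

$$[(\lambda\circ\psi^{-1})'(\psi(x))]\circ[(\psi\circ\varphi^{-1})'(\varphi(x))](v)=z.$$

On the other hand, by the chain rule,

$$(\lambda\circ\varphi^{-1})'(\varphi(x))=((\lambda\circ\psi^{-1})\circ(\psi\circ\varphi^{-1}))'(\varphi(x))=(\lambda\circ\psi^{-1})'(\psi(x))\circ(\psi\circ\varphi^{-1})'(\varphi(x)),$$

which is what we needed. $\square$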
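The symmetry part of the proof can be seen numerically on the circle $S^1$. The following sketch rests on assumptions made only for this illustration: $\varphi$ is the angle chart, $\psi$ is the stereographic projection from $(0,1)$, the base point has angle $0.3$, and derivatives are approximated by central differences.

```python
import math

def transition(theta):
    """psi∘phi^{-1}: angle -> stereographic coordinate x / (1 - y) of (cos θ, sin θ)."""
    return math.cos(theta) / (1.0 - math.sin(theta))

def transition_inv(t):
    """phi∘psi^{-1}: stereographic coordinate -> angle in (-pi, pi]."""
    x = 2 * t / (t * t + 1)
    y = (t * t - 1) / (t * t + 1)
    return math.atan2(y, x)

def d(fn, t, h=1e-6):
    """Central-difference approximation of fn'(t)."""
    return (fn(t + h) - fn(t - h)) / (2 * h)

theta0 = 0.3                    # phi(x) for the chosen point x on the circle
v = 2.0                         # a vector in the angle chart
w = d(transition, theta0) * v   # (U, phi, v) ~ (V, psi, w)

# Symmetry: the derivative of phi∘psi^{-1} at psi(x) must send w back to v.
v_back = d(transition_inv, transition(theta0)) * w
print(v, w, v_back)
```

In one dimension the derivative is just a number, so the inverse linear map is its reciprocal; the round trip returning `v` (up to discretization error) is exactly the identity $(f^{-1})'(f(x))=[f'(x)]^{-1}$ used in the proof.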

An **equivalence class** of such triples $(U,\varphi,v)$ is called a **tangent vector** of $M$ at $x$. The set of such tangent vectors is called the **tangent space** to $M$ at $x$, which is denoted by $T_x(M)$. But it seems that we have gone too far. Is the triple even a 'vector'? To get a clear view let's see Euclidean submanifolds first.

Suppose $M$ is a submanifold of $\mathbb{R}^n$. We say $z$ is a **tangent vector** of $M$ at the point $x$ if there exists a curve $\alpha$ of class $C^1$, defined on $\mathbb{R}$, and an interval $I$ containing $t_0$ with $\alpha(I) \subset M$, such that $\alpha(t_0)=x$ and $\alpha'(t_0)=z$. (For convenience we often take $t_0=0$.)

This definition is immediate if we check some examples. For the curve $M: x^2+\frac{e^x}{x^2+1}-y=0$, we can show that $(1,1)^T$ is a tangent vector of $M$ at $(0,1)$, which is identical to our first example. Taking

$$\alpha(t)=\left(t,\ t^2+\frac{e^t}{t^2+1}\right),$$

we get $\alpha(0)=(0,1)$ and

$$\alpha'(t)=\left(1,\ 2t+\frac{e^t(t-1)^2}{(t^2+1)^2}\right)^T.$$

Therefore $\alpha'(0)=(1,1)^T$. $\square$

Let $\mathbf{E}$ and $\mathbf{F}$ be two Banach spaces and $U$ an open subset of $\mathbf{E}$. A $C^p$ map $f: U \to \mathbf{F}$ is called an **immersion** at $x$ if $f'(x)$ is injective.

For example, if we take $\mathbf{E}=\mathbf{F}=\mathbb{R}=U$ and $f(x)=x^2$, then $f$ is an immersion at every point of $\mathbb{R}$ except $0$, since $f'(0)=0$ is not injective. This may lead you to Sard's theorem.

(Theorem 2) Let $M$ be a subset of $\mathbb{R}^n$, then $M$ is a $d$-dimensional $C^p$ submanifold of $\mathbb{R}^n$ if and only if for every $x \in M$ there exists an open neighborhood $U \subset \mathbb{R}^n$ of $x$, an open neighborhood $\Omega \subset \mathbb{R}^d$ of $0$ and a $C^p$ map $g: \Omega \to \mathbb{R}^n$ such that $g$ is an immersion at $0$ with $g(0)=x$, and $g$ is a homeomorphism between $\Omega$ and $M \cap U$ with the topology induced from $\mathbb{R}^n$.

This follows from the definition of manifold and should not be difficult to prove. But it is not what this blog post should cover. For a proof you can check *Differential Geometry: Manifolds, Curves, and Surfaces* by Marcel Berger and Bernard Gostiaux. The proof is located in section 2.1.

A coordinate system on a $d$-dimensional $C^p$ submanifold $M$ of $\mathbb{R}^n$ is a pair $(\Omega,g)$ consisting of an open set $\Omega \subset \mathbb{R}^d$ and a $C^p$ function $g:\Omega \to \mathbb{R}^n$ such that $g(\Omega)$ is open in $M$ and $g$ induces a homeomorphism between $\Omega$ and $g(\Omega)$.

For convenience, we say $(\Omega,g)$ is centered at $x$ if $g(0)=x$ and $g$ is an immersion at $0$. By Theorem 2 it is always possible to find such a coordinate system centered at a given point $x \in M$. The following theorem gives us an easier approach to tangent vectors.

(Theorem 3) Let $\mathbf{E}$ and $\mathbf{F}$ be two finite-dimensional vector spaces, $U \subset \mathbf{E}$ an open set, $f:U \to \mathbf{F}$ a $C^1$ map, $M$ a submanifold of $\mathbf{E}$ contained in $U$ and $W$ a submanifold of $\mathbf{F}$ such that $f(M) \subset W$. Take $x \in M$ and set $y=f(x)$. If $z$ is a tangent vector to $M$ at $x$, then the image $f'(x)(z)$ is a tangent vector to $W$ at $y=f(x)$.

*Proof.* Since $z$ is a tangent vector, there exists a curve $\alpha: J \to M$ such that $\alpha(0)=x$ and $\alpha'(0)=z$, where $J$ is an open interval containing $0$. The function $\beta = f \circ \alpha: J \to W$ is also a $C^1$ curve, satisfying $\beta(0)=f(\alpha(0))=f(x)$ and, by the chain rule,

$$
\beta'(0)=f'(\alpha(0))(\alpha'(0))=f'(x)(z),
$$

so $\beta$ is our desired curve. $\square$
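Theorem 3 can be sanity-checked numerically. The sketch below is only an illustration under assumptions of our own: $M=W$ is the unit circle, $f$ is the squaring map of $\mathbb{C}\cong\mathbb{R}^2$ (which maps the circle into itself), and derivatives of curves are approximated by central differences. All function names are hypothetical.

```python
import math

def alpha(t):
    """A C^1 curve on the unit circle M with alpha(0) = (1, 0)."""
    return (math.cos(t), math.sin(t))

def f(p):
    """Complex squaring (x, y) -> (x^2 - y^2, 2xy); it maps the unit circle into itself."""
    x, y = p
    return (x * x - y * y, 2 * x * y)

def jacobian_f(p):
    """The Jacobian matrix f'(p), written out by hand."""
    x, y = p
    return ((2 * x, -2 * y), (2 * y, 2 * x))

def apply(matrix, v):
    """Matrix-vector product."""
    return tuple(sum(m * c for m, c in zip(row, v)) for row in matrix)

def derivative(curve, t0, h=1e-6):
    """Central finite-difference approximation of curve'(t0)."""
    a, b = curve(t0 + h), curve(t0 - h)
    return tuple((p - q) / (2 * h) for p, q in zip(a, b))

x = alpha(0.0)                      # base point x = (1, 0)
z = derivative(alpha, 0.0)          # tangent vector z = alpha'(0), close to (0, 1)
beta = lambda t: f(alpha(t))        # the pushed-forward curve beta = f o alpha
lhs = derivative(beta, 0.0)         # beta'(0)
rhs = apply(jacobian_f(x), z)       # f'(x)(z)
```

Here `lhs` and `rhs` agree up to discretization error, and `lhs` is orthogonal to $f(x)$, as a tangent vector of the circle at $f(x)$ should be.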

We shall show that the equivalence relation makes sense. Suppose $M$ is a $d$-dimensional submanifold of $\mathbb{R}^n$, $x \in M$ and $z$ is a tangent vector to $M$ at $x$. Let $(\Omega,g)$ be a coordinate system centered at $x$. Since $g \in C^p(\Omega;\mathbb{R}^n)$, we see $g'(0)$ is an $n \times d$ matrix, and injectivity ensures that $\operatorname{rank}(g'(0))=d$.

Every open set $\Omega \subset \mathbb{R}^d$ is a $d$-dimensional submanifold of $\mathbb{R}^d$ (of class $C^p$). Suppose now $v \in \mathbb{R}^d$ is a tangent vector to $\Omega$ at $0$ (determined by a curve $\alpha$). Then by Theorem 3, $g \circ \alpha$ determines a tangent vector to $M$ at $x$, namely $z_x=g'(0)(v)$. Suppose $(\Lambda,h)$ is another coordinate system centered at $x$. If we want to obtain $z_x$ as well, we must have

$$
h'(0)(w)=g'(0)(v),
$$

which is equivalent to

$$
w=(h^{-1}\circ g)'(0)(v)
$$

for some $w \in \mathbb{R}^d$, which is the tangent vector to $\Lambda$ at $0 \in \Lambda$. *(The inverse makes sense since we implicitly restricted ourselves to $\mathbb{R}^d$.)*

However, we also have two charts given by $(U,\varphi)=(g(\Omega),g^{-1})$ and $(V,\psi) = (h(\Lambda),h^{-1})$, which gives

$$
w=(\psi\circ\varphi^{-1})'(\varphi(x))(v),
$$

and this is just our equivalence relation (don’t forget that $g(0)=x$ hence $g^{-1}(x)=\varphi(x)=0$!). There we have our reason for the equivalence relation: if $(U,\varphi,v) \sim (V,\psi,w)$, then $(U,\varphi,v)$ and $(V,\psi,w)$ determine the same tangent vector, but we do not have to evaluate it manually. In general, all elements of an equivalence class represent a single vector, so the vector is (algebraically) an equivalence class. This still holds for Banach manifolds since topological properties of Euclidean spaces do not play a role. The generalized proof can be carried out with little difficulty.

The tangent vectors at $x \in M$ span a vector space (which is based at $x$). We do hope so, because otherwise our definition of tangent vector would be incomplete and could not even handle a trivial example (such as the one we mentioned at the beginning). We shall show, satisfyingly, that the set of tangent vectors to $M$ at $x$ (which we write $T_xM$) forms a vector space that is toplinearly isomorphic to $\mathbf{E}$, on which $M$ is modeled.

(Theorem 4) $T_xM \simeq \mathbf{E}$. In other words, $T_xM$ can be given the structure of a topological vector space by means of a chart.

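*Proof.* Let $(U,\varphi)$ be a chart at $x$. For $v \in \mathbf{E}$, we see $(\varphi^{-1})'(\varphi(x))(v)$ is a tangent vector at $x$, represented by $(U,\varphi,v)$. On the other hand, pick $\mathbf{w} \in T_xM$, which can be represented by $(V,\psi,w)$. Then

$$
v=(\varphi\circ\psi^{-1})'(\psi(x))(w)
$$

makes $(U,\varphi,v) \sim (V,\psi,w)$ uniquely, and therefore we get some $v \in \mathbf{E}$. To conclude, the map

$$
T_xM\to\mathbf{E},\qquad [(U,\varphi,v)]\mapsto v
$$

is a bijection, through which the topological vector space structure of $\mathbf{E}$ is transported to $T_xM$. This proves our theorem. Note that the resulting structure does not depend on the choice of chart, since $(\psi\circ\varphi^{-1})'(\varphi(x))$ is a toplinear isomorphism. $\square$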

For many reasons it is not a good idea to identify $T_xM$ as $\mathbf{E}$ without mentioning the point $x$. For example we shouldn’t identify the tangent line of a curve as $x$-axis. Instead, it would be better to identify or visualize $T_xM$ as $(x,\mathbf{E})$, that is, a linear space with origin at $x$.

Now we treat *all* tangent spaces as a vector bundle. Let $M$ be a manifold of class $C^p$ with $p \geq 1$, and define the tangent bundle as the disjoint union

$$
T(M)=\bigsqcup_{x\in M}T_x(M).
$$

This is a vector bundle if we define the projection by

$$
\pi:T(M)\to M,\qquad \pi(z)=x\ \text{ for } z\in T_x(M),
$$

and we will verify it soon. First let’s see an example. Below is a visualization of the tangent bundle of $\frac{x^2}{4}+\frac{y^2}{3}=1$, denoted by red lines:

Also we can see $\pi$ maps points on the blue line to a point on the curve, which is $B$.

To show that the tangent bundle of a manifold is a vector bundle, we need to verify that it satisfies the three conditions we mentioned in the previous post. Let $(U,\varphi)$ be a chart of $M$ such that $\varphi(U)$ is open in $\mathbf{E}$; then tangent vectors can be represented by $(U,\varphi,v)$. We get a bijection

$$
\tau_U:\pi^{-1}(U)=T(U)\to U\times\mathbf{E}
$$

by definition of tangent vectors as equivalence classes. Let $z_x$ be a tangent vector to $U$ at $x$; then there exists some $v \in \mathbf{E}$ such that $(U,\varphi,v)$ represents $z_x$. On the other hand, for any $v \in \mathbf{E}$ and $x \in U$, $(U,\varphi,v)$ represents some tangent vector at $x$. Explicitly,

$$
\tau_U(z_x)=(x,v).
$$

Further, the following diagram is commutative (which establishes **VB 1**):

$$
\begin{array}{ccc}
\pi^{-1}(U) & \xrightarrow{\ \tau_U\ } & U\times\mathbf{E} \\
 & {\scriptstyle\pi}\searrow & \downarrow{\scriptstyle pr} \\
 & & U
\end{array}
$$

For **VB 2** and **VB 3** we need to check different charts. Let $(U_i,\varphi_i)$, $(U_j,\varphi_j)$ be two charts. Define $\varphi_{ji}=\varphi_j \circ \varphi_i^{-1}$ on $\varphi_i(U_i \cap U_j)$, and write $\tau_{U_i}=\tau_i$ and $\tau_{U_j}=\tau_j$ respectively. Then we get a transition mapping

$$
\tau_{ji}=\tau_j\circ\tau_i^{-1}:\tau_i(\pi^{-1}(U_i\cap U_j))\to\tau_j(\pi^{-1}(U_i\cap U_j)).
$$

One can verify that

$$
\tau_{ji}(x,v)=(\varphi_{ji}(x),D\varphi_{ji}(x)\cdot v)
$$

for $x \in U_i \cap U_j$ (identified with its coordinate $\varphi_i(x)$) and $v \in \mathbf{E}$. Since $D\varphi_{ji} \in C^{p-1}$ and $D\varphi_{ji}(x)$ is a toplinear isomorphism, we see

$$
x\mapsto(\tau_j\circ\tau_i^{-1})_x=D\varphi_{ji}(x)
$$

is a morphism, which goes for **VB 3**. It remains to verify **VB 2**. To do this we need a fact from Banach space theory:

If $f:U \to L(\mathbf{E},\mathbf{F})$ is a $C^k$-morphism, then the map of $U \times \mathbf{E}$ into $\mathbf{F}$ given by

$$
(x,v)\mapsto f(x)v
$$

is a $C^k$-morphism.

Here, we have $f(x)=\tau_{ji}(x,\cdot)$ and to conclude, $\tau_{ji}$ is a $C^{p-1}$-morphism. It is also an isomorphism since it has an inverse $\tau_{ij}$. Following the definition of manifold, we can conclude that $T(M)$ has a unique **manifold structure** such that the $\tau_i$ are morphisms (there will be a formal proof in the next post about the total space of any vector bundle). By **VB 1**, we also have $\pi=pr \circ \tau_i$ on $\pi^{-1}(U_i)$, which makes $\pi$ a morphism as well. On each fiber $\pi^{-1}(x)$, we can freely transport the topological vector space structure of $\mathbf{E}$ by means of $\tau_{ix}$, for any $i$ such that $x$ lies in $U_i$. Since $f(x)$ is a toplinear isomorphism, the result is independent of the choice of $U_i$. **VB 2** is therefore established.
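To make the transition rule $(t,v)\mapsto(\varphi_{ji}(t),D\varphi_{ji}(t)\cdot v)$ concrete, here is a small numerical sketch on the unit circle. The two stereographic charts, the test curve and every function name are our own assumptions for illustration; the first argument `t` is the chart coordinate $\varphi_i(x)$, and derivatives are approximated by central differences.

```python
import math

def phi_i(p):
    """Stereographic chart from the north pole (0, 1), defined where y != 1."""
    x, y = p
    return x / (1.0 - y)

def phi_j(p):
    """Stereographic chart from the south pole (0, -1), defined where y != -1."""
    x, y = p
    return x / (1.0 + y)

def phi_ji(t):
    """Coordinate change phi_j o phi_i^{-1}; on the circle it is t -> 1/t."""
    return 1.0 / t

def d_phi_ji(t, v):
    """Derivative of phi_ji at t applied to v; it is linear in v."""
    return -v / (t * t)

def tau_ji(t, v):
    """Transition map of the tangent bundle in chart coordinates."""
    return (phi_ji(t), d_phi_ji(t, v))

def gamma(s, theta0=0.3):
    """A curve on the circle through the point at angle theta0."""
    return (math.cos(theta0 + s), math.sin(theta0 + s))

def derivative(g, s0, h=1e-6):
    """Central finite-difference approximation of g'(s0) for scalar g."""
    return (g(s0 + h) - g(s0 - h)) / (2 * h)

t = phi_i(gamma(0.0))                             # coordinate of the base point in chart i
v = derivative(lambda s: phi_i(gamma(s)), 0.0)    # the tangent vector seen in chart i
w = derivative(lambda s: phi_j(gamma(s)), 0.0)    # the same tangent vector seen in chart j
```

Up to discretization error, `tau_ji(t, v)` equals `(phi_j(gamma(0.0)), w)`: both triples represent the same tangent vector, and the fiber part is a linear isomorphism of $\mathbb{R}$.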

Using some fancier word, we can also say that $T:M \to T(M)$ is a **functor** from the category of $C^p$-manifolds to the category of vector bundles of class $C^{p-1}$.

If $f$ is of $L^p(\mu)$, which means $\lVert f \rVert_p=\left(\int_X |f|^p d\mu\right)^{1/p}<\infty$, or equivalently $\int_X |f|^p d\mu<\infty$, then we may say $|f|^p$ is of $L^1(\mu)$. In other words, we have a functional

$$
\lambda:L^p(\mu)\to L^1(\mu),\qquad f\mapsto|f|^p.
$$

This functional does not have to be one to one due to absolute value. But we hope this functional to be ‘fine’ enough, at the very least, we hope it is continuous.

Here, $f \sim g$ means that $f-g$ equals to $0$ almost everywhere with respect to $\mu$.

We still use $\varepsilon-\delta$ argument but it’s in a metric space. Suppose $(X,d_1)$ and $(Y,d_2)$ are two metric spaces and $f:X \to Y$ is a function. We say $f$ is continuous at $x_0 \in X$ if for any $\varepsilon>0$, there exists some $\delta>0$ such that $d_2(f(x_0),f(x))<\varepsilon$ whenever $d_1(x_0,x)<\delta$. Further, we say $f$ is continuous on $X$ if $f$ is continuous at every point $x \in X$.

For $1\leq p<\infty$, we already have a metric by

$$
d(f,g)=\lVert f-g\rVert_p,
$$

given that $d(f,g)=0$ if and only if $f \sim g$. This is complete and makes $L^p$ a Banach space. But for $0<p<1$ (yes, we are going to cover that), things are quite different, and there is one reason: the Minkowski inequality is reversed! In fact we have

$$
\big\lVert |f|+|g| \big\rVert_p\geq\lVert f\rVert_p+\lVert g\rVert_p
$$

for $0<p<1$. In fact, $L^p$ spaces have many weird properties when $0<p<1$. Precisely,

For $0<p<1$, $L^p(\mu)$ is locally convex if and only if $\mu$ assumes finitely many values. (Proof.)

On the other hand, if for example $X=[0,1]$ and $\mu=m$ is the Lebesgue measure, then $L^p(\mu)$ has *no* open convex subset other than $\varnothing$ and $L^p(\mu)$ itself. However,

A topological vector space $X$ is normable if and only if its origin has a convex bounded neighborhood. (See Kolmogorov’s normability criterion.)

Therefore $L^p(m)$ is not normable, hence not Banach.

We have gone too far. We need a metric that is fine enough.

*In this subsection we always have $0<p<1$.*

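Define

$$
\Delta(f)=\int_X|f|^p\,d\mu
$$

for $f \in L^p(\mu)$. We will show that we have a metric by

$$
d(f,g)=\Delta(f-g).
$$

Fix $y\geq 0$, consider the function

$$
f(x)=(x+y)^p-x^p.
$$

We have $f(0)=y^p$ and

$$
f'(x)=p\left[(x+y)^{p-1}-x^{p-1}\right]\leq 0
$$

when $x > 0$, and hence $f(x)$ is nonincreasing on $[0,\infty)$, which implies that

$$
(x+y)^p\leq x^p+y^p.
$$

Hence for any $f$, $g \in L^p$, we have

$$
\Delta(f+g)=\int_X|f+g|^p\,d\mu\leq\int_X\left(|f|+|g|\right)^p d\mu\leq\int_X|f|^p\,d\mu+\int_X|g|^p\,d\mu=\Delta(f)+\Delta(g).
$$

This inequality ensures that

$$
d(f,g)=\Delta(f-g)
$$

is a metric. It’s immediate that $d(f,g)=d(g,f) \geq 0$ for all $f$, $g \in L^p(\mu)$. For the triangle inequality, note that

$$
d(f,g)=\Delta\big((f-h)+(h-g)\big)\leq\Delta(f-h)+\Delta(h-g)=d(f,h)+d(h,g).
$$

This is translate-invariant as well since

$$
d(f+h,g+h)=\Delta(f+h-g-h)=\Delta(f-g)=d(f,g).
$$

The completeness can be verified in the same way as the case when $p \geq 1$. In fact, this metric makes $L^p$ a locally bounded F-space.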
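The construction for $0<p<1$ is easy to probe numerically. The following sketch assumes a finite measure space (a handful of points with positive weights), so that functions are plain lists and the integral of $|f-g|^p$ is a weighted sum; the names `delta`, `dist`, `quasi_norm` and `metric_axioms_hold` are ours, not from any library.

```python
import random

def delta(f, p, weights):
    """Integral of |f|^p against a discrete measure with the given point weights."""
    return sum(w * abs(a) ** p for a, w in zip(f, weights))

def dist(f, g, p, weights):
    """The candidate metric for 0 < p < 1: the integral of |f - g|^p."""
    return delta([a - b for a, b in zip(f, g)], p, weights)

def quasi_norm(f, p, weights):
    """(integral of |f|^p)^(1/p); for 0 < p < 1 this is NOT a norm."""
    return delta(f, p, weights) ** (1.0 / p)

def metric_axioms_hold(p, trials=200, n=6, seed=0, tol=1e-9):
    """Randomly test the scalar inequality, the triangle inequality and translation invariance."""
    rng = random.Random(seed)
    for _ in range(trials):
        weights = [rng.uniform(0.1, 2.0) for _ in range(n)]
        f = [rng.uniform(-5.0, 5.0) for _ in range(n)]
        g = [rng.uniform(-5.0, 5.0) for _ in range(n)]
        h = [rng.uniform(-5.0, 5.0) for _ in range(n)]
        x, y = abs(f[0]), abs(g[0])
        if (x + y) ** p > x ** p + y ** p + tol:
            return False
        if dist(f, g, p, weights) > dist(f, h, p, weights) + dist(h, g, p, weights) + tol:
            return False
        shifted = dist([a + c for a, c in zip(f, h)], [b + c for b, c in zip(g, h)], p, weights)
        if abs(shifted - dist(f, g, p, weights)) > 1e-6:
            return False
    return True
```

For instance, with two points of weight $1$ and $p=1/2$, `quasi_norm([1.0, 1.0], 0.5, [1.0, 1.0])` is $4$ while the two indicator functions each have quasi-norm $1$, which illustrates why one works with the integral of $|f-g|^p$ itself rather than its $1/p$-th power.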

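The metric of $L^1$ is defined by

$$
d_1(f,g)=\int_X|f-g|\,d\mu.
$$

We need to find a relation between $d_p(f,g)$ and $d_1(\lambda(f),\lambda(g))$, where $d_p$ is the metric of the corresponding $L^p$ space.

As we have proved,

$$
(x+y)^p\leq x^p+y^p\qquad(x,y\geq 0).
$$

Without loss of generality we assume $x \geq y$ and therefore

$$
x^p=(x-y+y)^p\leq(x-y)^p+y^p.
$$

Hence

$$
x^p-y^p\leq(x-y)^p.
$$

By interchanging $x$ and $y$, we get

$$
|x^p-y^p|\leq|x-y|^p.
$$

Replacing $x$ and $y$ with $|f|$ and $|g|$ where $f$, $g \in L^p$, we get

$$
\int_X\big||f|^p-|g|^p\big|\,d\mu\leq\int_X\big||f|-|g|\big|^p\,d\mu.
$$

But

$$
\big||f|-|g|\big|\leq|f-g|
$$

and we therefore have

$$
d_1(\lambda(f),\lambda(g))\leq d_p(f,g).
$$

Hence $\lambda$ is continuous (and in fact, Lipschitz continuous and uniformly continuous) when $0<p<1$.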
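The Lipschitz estimate with constant $1$ for $0<p<1$ can be checked on a discrete measure space. This is a sketch under the same kind of assumption as before (finitely many weighted points, functions as lists); `lam`, `d1`, `dp` and `worst_ratio` are hypothetical helper names.

```python
import random

def lam(f, p):
    """The map lambda: f -> |f|^p, computed pointwise."""
    return [abs(a) ** p for a in f]

def d1(u, v, weights):
    """Metric of L^1 on a discrete measure space."""
    return sum(w * abs(a - b) for a, b, w in zip(u, v, weights))

def dp(f, g, p, weights):
    """Metric of L^p for 0 < p < 1: the integral of |f - g|^p."""
    return sum(w * abs(a - b) ** p for a, b, w in zip(f, g, weights))

def worst_ratio(p, trials=500, n=8, seed=0):
    """Largest observed value of d1(lam f, lam g) / dp(f, g) over random samples."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        weights = [rng.uniform(0.1, 2.0) for _ in range(n)]
        f = [rng.uniform(-5.0, 5.0) for _ in range(n)]
        g = [rng.uniform(-5.0, 5.0) for _ in range(n)]
        denom = dp(f, g, p, weights)
        if denom > 0.0:
            worst = max(worst, d1(lam(f, p), lam(g, p), weights) / denom)
    return worst
```

The observed ratio never exceeds $1$, and taking $g=0$ gives a ratio of exactly $1$, so the Lipschitz constant cannot be improved.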

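It’s natural to think about Minkowski’s inequality and Hölder’s inequality in this case since they are critical inequality enablers. You need to think about some examples of how to create the condition to use them and get a fine result. In this section, where $p \geq 1$, we need to prove that

$$
|x^p-y^p|\leq p|x-y|(x^{p-1}+y^{p-1})\qquad(x,y\geq 0).
$$

This inequality is surprisingly easy to prove however. We will use nothing but the mean value theorem. Without loss of generality we assume that $x > y \geq 0$ and define $f(t)=t^p$. Then

$$
x^p-y^p=f(x)-f(y)=f'(\zeta)(x-y)=p\zeta^{p-1}(x-y)
$$

where $y < \zeta < x$. But since $p-1 \geq 0$, we see $\zeta^{p-1} \leq x^{p-1} \leq x^{p-1}+y^{p-1}$. Therefore

$$
x^p-y^p\leq p(x-y)(x^{p-1}+y^{p-1}).
$$

For $x=y$ the equality holds.

Therefore, replacing $x$ and $y$ with $|f|$ and $|g|$ and using $\big||f|-|g|\big|\leq|f-g|$,

$$
d_1(\lambda(f),\lambda(g))=\int_X\big||f|^p-|g|^p\big|\,d\mu\leq p\int_X|f-g|\left(|f|^{p-1}+|g|^{p-1}\right)d\mu.
$$

(For $p=1$ this already reads $d_1(\lambda(f),\lambda(g))\leq 2\lVert f-g\rVert_1$, so we may assume $p>1$ below.) By *Hölder’s inequality*, we have

$$
\int_X|f-g|\left(|f|^{p-1}+|g|^{p-1}\right)d\mu\leq\lVert f-g\rVert_p\,\big\lVert|f|^{p-1}+|g|^{p-1}\big\rVert_q.
$$

By *Minkowski’s inequality*, we have

$$
\big\lVert|f|^{p-1}+|g|^{p-1}\big\rVert_q\leq\big\lVert|f|^{p-1}\big\rVert_q+\big\lVert|g|^{p-1}\big\rVert_q.
$$

Now things are clear. Since $1/p+1/q=1$, or equivalently $1/q=(p-1)/p$, suppose $\lVert f \rVert_p$, $\lVert g \rVert_p \leq R$, then $(p-1)q=p$ and therefore

$$
\big\lVert|f|^{p-1}\big\rVert_q=\left(\int_X|f|^{(p-1)q}\,d\mu\right)^{1/q}=\lVert f\rVert_p^{p-1}\leq R^{p-1},
$$

and likewise for $g$. Summing the inequalities above, we get

$$
d_1(\lambda(f),\lambda(g))\leq 2pR^{p-1}\lVert f-g\rVert_p,
$$

hence $\lambda$ is continuous.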
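For $p\geq 1$ the argument yields a local Lipschitz bound of the form $d_1(\lambda(f),\lambda(g))\leq 2pR^{p-1}\lVert f-g\rVert_p$ whenever $\lVert f\rVert_p,\lVert g\rVert_p\leq R$. The sketch below tests this bound, together with the scalar inequality $|x^p-y^p|\leq p|x-y|(x^{p-1}+y^{p-1})$, on a discrete measure space; the setting and all function names are assumptions made for the illustration.

```python
import random

def p_norm(f, p, weights):
    """The L^p norm on a discrete measure space with the given point weights."""
    return sum(w * abs(a) ** p for a, w in zip(f, weights)) ** (1.0 / p)

def d1_of_lambda(f, g, p, weights):
    """The L^1 distance between |f|^p and |g|^p."""
    return sum(w * abs(abs(a) ** p - abs(b) ** p) for a, b, w in zip(f, g, weights))

def bound(f, g, p, weights):
    """The right-hand side 2 p R^(p-1) ||f - g||_p with R = max(||f||_p, ||g||_p)."""
    R = max(p_norm(f, p, weights), p_norm(g, p, weights))
    diff = [a - b for a, b in zip(f, g)]
    return 2.0 * p * R ** (p - 1.0) * p_norm(diff, p, weights)

def scalar_inequality_holds(x, y, p, tol=1e-9):
    """Check |x^p - y^p| <= p |x - y| (x^(p-1) + y^(p-1)) for x, y >= 0."""
    return abs(x ** p - y ** p) <= p * abs(x - y) * (x ** (p - 1.0) + y ** (p - 1.0)) + tol

def bound_holds(p, trials=500, n=8, seed=0, tol=1e-9):
    """Randomly test the local Lipschitz bound and the scalar inequality."""
    rng = random.Random(seed)
    for _ in range(trials):
        weights = [rng.uniform(0.1, 2.0) for _ in range(n)]
        f = [rng.uniform(-5.0, 5.0) for _ in range(n)]
        g = [rng.uniform(-5.0, 5.0) for _ in range(n)]
        if not scalar_inequality_holds(abs(f[0]), abs(g[0]), p):
            return False
        if d1_of_lambda(f, g, p, weights) > bound(f, g, p, weights) + tol:
            return False
    return True
```

Unlike the case $0<p<1$, the constant here depends on $R$, so the estimate is Lipschitz only on bounded subsets of $L^p$.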

We have proved that $\lambda$ is continuous, and when $0<p<1$, we have seen that $\lambda$ is Lipschitz continuous. It’s natural to think about its differentiability afterwards, but the absolute value function is not even differentiable so we may have no chance. But this is still a fine enough result. For example we have no restriction to $(X,\mathfrak{M},\mu)$ other than the positivity of $\mu$. Therefore we may take $\mathbb{R}^n$ as the Lebesgue measure space here, or we can take something else.

It’s also interesting how we use elementary Calculus to solve some much more abstract problems.

Direction is a considerable thing. For example take a look at this picture (by David Gunderman):

The position of the red ball and black ball shows that this triple of balls turns upside down every time they finish one round. This wouldn’t happen if this triple were on a normal band, which can be denoted by $S^1 \times (0,1)$. What would happen if we try to describe their velocity on the Möbius band, both locally and globally? There must be some significant difference from a normal band. If we set some movement pattern on the balls, for example letting them run horizontally or in a zig-zag, hopefully we get different *sets* of vectors. Those vectors can span some vector spaces as well.

Here and in the forthcoming posts, we will try to develop purely formally certain functorial constructions having to do with vector bundles. It may be overly generalized, but we will offer some examples to make it concrete.

Let $M$ be a manifold (of class $C^p$, where $p \geq 0$ and can be set to $\infty$) modeled on a Banach space $\mathbf{E}$. Let $E$ be another topological space and $\pi: E \to M$ a surjective $C^p$-morphism. A **vector bundle** is a topological construction associated with $M$ (base space), $E$ (total space) and $\pi$ (bundle projection) such that, roughly speaking, $E$ is locally a product of $M$ and $\mathbf{E}$.

We use $\mathbf{E}$ instead of $\mathbb{R}^n$ to include the infinite-dimensional cases. We will try to distinguish finite-dimensional and infinite-dimensional Banach spaces here. There are a lot of things to do, since, for example, infinite-dimensional Banach spaces have no countable Hamel basis (this can be proved by using the Baire category theorem), while the finite-dimensional ones have finite ones.

Next we will show precisely how $E$ locally becomes a product space. Let $\mathfrak{U}=(U_i)_i$ be an open covering of $M$, and for each $i$, suppose that we are *given* a mapping

$$
\tau_i:\pi^{-1}(U_i)\to U_i\times\mathbf{E}
$$

satisfying the following three conditions.

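**VB 1** $\tau_i$ is a $C^p$ diffeomorphism making the following diagram commutative:

$$
\begin{array}{ccc}
\pi^{-1}(U_i) & \xrightarrow{\ \tau_i\ } & U_i\times\mathbf{E} \\
 & {\scriptstyle\pi}\searrow & \downarrow{\scriptstyle pr} \\
 & & U_i
\end{array}
$$

where $pr$ is the projection onto the first component: $(x,y) \mapsto x$. By restricting $\tau_i$ to the fiber over one point $x$ of $U_i$, we obtain an isomorphism on each fiber $\pi^{-1}(x)$:

$$
\tau_{ix}:\pi^{-1}(x)\to\{x\}\times\mathbf{E}\cong\mathbf{E}.
$$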

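**VB 2** For each pair of open sets $U_i$, $U_j \in \mathfrak{U}$ and each $x \in U_i \cap U_j$, we have the map

$$
\tau_{jx}\circ\tau_{ix}^{-1}:\mathbf{E}\to\mathbf{E}
$$

to be a toplinear isomorphism (that is, it preserves the structure of $\mathbf{E}$ as a *topological* vector space).

**VB 3** For any two members $U_i$, $U_j \in \mathfrak{U}$, we have the following function to be a $C^p$-morphism:

$$
\varphi:U_i\cap U_j\to L(\mathbf{E},\mathbf{E}),\qquad x\mapsto\tau_{jx}\circ\tau_{ix}^{-1}.
$$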

**REMARKS.** As with manifolds, we call the set of 2-tuples $(U_i,\tau_i)_i$ a **trivializing covering** of $\pi$, and the $(\tau_i)$ its **trivializing maps**. Precisely, for $x \in U_i$, we say $U_i$ or $\tau_i$ trivializes at $x$.

Two trivializing *coverings* for $\pi$ are said to be **VB-equivalent** if taken together they also satisfy the conditions **VB 2** and **VB 3**. It’s immediate that **VB-equivalence** is an equivalence relation and we leave the verification to the reader. It is this VB-equivalence *class* of trivializing coverings that determines a structure of **vector bundle** on $\pi$. With respect to the Banach space $\mathbf{E}$, we say that the vector bundle has **fiber** $\mathbf{E}$, or is **modeled on** $\mathbf{E}$.

Next we shall give some motivation for each condition. Each pair $(U_i,\tau_i)$ determines a local product of ‘a part of the manifold’ and the model space, on the latter of which we can deploy directions with ease. This is what **VB 1** tells us. But that’s far from enough if we want our vectors to be fine enough. We do want the total space $E$ to actually meet our requirements. As for **VB 2**, it ensures that two different trivializing maps give the same Banach space structure on a fiber (with *equivalent* norms). Precisely, for each point $x \in U_i \cap U_j$, the fiber $\pi^{-1}(x)$ (the domain of $\tau_{ix}$) receives a Banach space structure via $\tau_{ix}$ and another one via $\tau_{jx}$, and **VB 2** says that the two agree up to a toplinear isomorphism. Note that $\pi^{-1}(x) \subset E$, the total space. In fact, **VB 2** has an equivalent alternative:

**VB 2’** On each fiber $\pi^{-1}(x)$ we are given a structure of Banach space as follows. For $x \in U_i$, we have a toplinear isomorphism which is in fact the trivializing map:

$$
\tau_{ix}:\pi^{-1}(x)\to\mathbf{E}.
$$

As stated, **VB 2** implies **VB 2’**. Conversely, if **VB 2’** is satisfied, then for open sets $U_i$, $U_j \in \mathfrak{U}$, and $x \in U_i \cap U_j$, we have $\tau_{jx} \circ \tau_{ix}^{-1}:\mathbf{E} \to \mathbf{E}$ to be a toplinear isomorphism. Hence, we can consider **VB 2** or **VB 2’** as a refinement of **VB 1**.

In finite dimensional case, one can omit **VB 3** since it can be implied by **VB 2**, and we will prove it below.

(Lemma) Let $\mathbf{E}$ and $\mathbf{F}$ be two finite dimensional Banach spaces. Let $U$ be open in some Banach space. Let

$$
f:U\times\mathbf{E}\to\mathbf{F}
$$

be a $C^p$-morphism such that for each $x \in U$, the map

$$
f_x:\mathbf{E}\to\mathbf{F}
$$

given by $f_x(v)=f(x,v)$ is a linear map. Then the map of $U$ into $L(\mathbf{E},\mathbf{F})$ given by $x \mapsto f_x$ is a $C^p$-morphism.

**PROOF.** Since $L(\mathbf{E},\mathbf{F})=L(\mathbf{E},\mathbf{F_1}) \times L(\mathbf{E},\mathbf{F_2}) \times \cdots \times L(\mathbf{E},\mathbf{F_n})$ where $\mathbf{F}=\mathbf{F_1} \times \cdots \times \mathbf{F_n}$, by induction on the dimensions of $\mathbf{F}$ and $\mathbf{E}$, it suffices to assume that $\mathbf{E}$ and $\mathbf{F}$ are toplinearly isomorphic to $\mathbb{R}$. But in that case, the function $f(x,v)$ can be written $g(x)v$ for some $g:U \to \mathbb{R}$. Since $f$ is a morphism, it is also a morphism as a function of each argument $x$, $v$ separately. Putting $v=1$ shows that $g$ is a morphism, which settles the case when both $\mathbf{E}$ and $\mathbf{F}$ have dimension $1$, and the proof is completed by induction. $\blacksquare$

To show that **VB 3** is implied by **VB 2**, put $\mathbf{E}=\mathbf{F}$ as in the lemma. Note that $\tau_j \circ \tau_i^{-1}$ maps $(U_i \cap U_j) \times \mathbf{E}$ to $\mathbf{E}$, that $U_i \cap U_j$ is open, and that for each $x \in U_i \cap U_j$, the map $(\tau_j \circ \tau_i^{-1})_x=\tau_{jx} \circ \tau_{ix}^{-1}$ is toplinear, hence linear. Then the fact that the map $x \mapsto \tau_{jx} \circ \tau_{ix}^{-1}$ of **VB 3** is a morphism follows from the lemma.

Let $M$ be any $n$-dimensional smooth manifold that you are familiar with, then $pr:M \times \mathbb{R}^n \to M$ is actually a vector bundle. Here the total space is $M \times \mathbb{R}^n$ and the base is $M$ and $pr$ is the bundle projection but in this case it is simply a projection. Intuitively, on a total space, we can determine a point $x \in M$, and another component can be any direction in $\mathbb{R}^n$, hence a *vector*.

We need to verify three conditions carefully. Let $(U_i,\varphi_i)_i$ be any atlas of $M$, and let $\tau_i$ be the identity map on $pr^{-1}(U_i)=U_i \times \mathbb{R}^n$ (which is naturally of class $C^p$). We claim that $(U_i,\tau_i)_i$ satisfies the three conditions, thus we get a vector bundle.

For **VB 1** things are clear: since $pr^{-1}(U_i)=U_i \times \mathbb{R}^n$, the diagram is commutative. Each fiber $pr^{-1}(x)$ is essentially $\{x\} \times \mathbb{R}^n$, and $\tau_{jx} \circ \tau_{ix}^{-1}$ is the identity map of $\{x\} \times \mathbb{R}^n$, under the same Euclidean topology, hence **VB 2** is verified. Since the fiber is finite-dimensional, we have no need to verify **VB 3**.

First of all, imagine you have embedded a circle into a Möbius band. Now we try to give some formal definition. As with quotient topology, $S^1$ can be defined as

$$
S^1=I/\sim_1,
$$

where $I$ is the unit interval and $0 \sim_1 1$ (identifying two ends). On the other hand, the infinite Möbius band can be defined by

$$
B=(I\times\mathbb{R})/\sim_2,
$$

where $(0,v) \sim_2 (1,-v)$ for all $v \in \mathbb{R}$ (not only identifying the two ends of $I$ but also ‘flipping’ the vertical line). Then all we need is a natural projection onto the first component:

$$
\pi:B\to S^1,\qquad[(t,v)]\mapsto[t].
$$

The verification differs little from that of the trivial bundle. The quotient topology of Banach spaces follows naturally in this case, but things might be troublesome if we restrict ourselves to $\mathbb{R}^n$.
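As a rough illustration of how the Möbius band differs from the trivial bundle, the sketch below writes down one possible trivializing covering with two open sets and computes the fiberwise transition map. Everything here is an assumption made for the example: points of the band are stored as canonical pairs $(t,v)$ with $0\leq t<1$, the sets are $U_1=\{0<t<1\}$ and $U_2=\{t\neq 1/2\}$, and the function names are hypothetical.

```python
def canon(t, v):
    """Canonical representative in [0, 1) x R of the class of (t, v); recall (1, v) ~ (0, -v)."""
    if t == 1.0:
        return (0.0, -v)
    return (t, v)

def tau1(point):
    """Trivialization over U_1 = {0 < t < 1}: read the fiber coordinate directly."""
    t, v = point
    assert 0.0 < t < 1.0
    return (t, v)

def tau2(point):
    """Trivialization over U_2 = {t != 1/2}: flip the sign on 1/2 < t < 1,
    so that the fiber coordinate stays continuous across the seam t = 0 ~ 1."""
    t, v = point
    assert t != 0.5
    return (t, v if t < 0.5 else -v)

def tau1_inverse(t, v):
    """Inverse of tau1 on U_1 x R."""
    return (t, v)

def transition(t, v):
    """The fiberwise map tau_2 o tau_1^{-1} at the base point t, applied to v."""
    return tau2(tau1_inverse(t, v))[1]
```

On the overlap of the two sets the transition is the identity for $0<t<1/2$ and $v\mapsto -v$ for $1/2<t<1$: a toplinear isomorphism of $\mathbb{R}$ in both cases, locally constant in $t$, but not the same on the two components of the overlap.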

The first example is relatively rare in many senses. By $S^n$ we mean the subset of $\mathbb{R}^{n+1}$ given by

$$
S^n=\{\mathbf{x}\in\mathbb{R}^{n+1}:\lVert\mathbf{x}\rVert=1\},
$$

and the tangent bundle can be defined by

$$
TS^n=\{(\mathbf{x},\mathbf{y}):\langle\mathbf{x},\mathbf{y}\rangle=0\}\subset S^n\times\mathbb{R}^{n+1},
$$

where, of course, $\mathbf{x} \in S^n$ and $\mathbf{y} \in \mathbb{R}^{n+1}$. The vector bundle is given by $pr:TS^n \to S^n$ where $pr$ is the projection onto the first factor. This total space is of course much finer than $M \times \mathbb{R}^n$ in the first example. Each point $\mathbf{x}$ of the manifold is now associated with the *tangent space* $T_{\mathbf{x}}(S^n)$ at this point.
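The description of the tangent bundle of the sphere as pairs $(\mathbf{x},\mathbf{y})$ with $\langle\mathbf{x},\mathbf{y}\rangle=0$ can be checked against the curve definition of tangent vectors. The sketch below does this on $S^2$ with a finite-difference velocity; the chosen point, direction and helper names are assumptions for illustration only.

```python
import math

def dot(a, b):
    """Euclidean inner product."""
    return sum(p * q for p, q in zip(a, b))

def normalize(p):
    """Project a nonzero vector radially onto the unit sphere."""
    n = math.sqrt(dot(p, p))
    return tuple(c / n for c in p)

def curve_through(x, u):
    """The curve t -> (x + t u) / |x + t u| on the sphere, passing through x at t = 0."""
    return lambda t: normalize(tuple(a + t * b for a, b in zip(x, u)))

def velocity(curve, h=1e-6):
    """Central finite-difference approximation of curve'(0)."""
    a, b = curve(h), curve(-h)
    return tuple((p - q) / (2 * h) for p, q in zip(a, b))

x = normalize((1.0, 2.0, 2.0))      # a point of S^2, namely (1/3, 2/3, 2/3)
u = (2.0, -1.0, 0.0)                # a direction orthogonal to x
y = velocity(curve_through(x, u))   # the tangent vector determined by the curve
```

The computed velocity `y` is orthogonal to `x`, so $(\mathbf{x},\mathbf{y})$ lies in the set defining the tangent bundle, and in this example `y` agrees with `u` up to discretization error.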

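More generally, we can define it in any Hilbert space $H$, for example, an $L^2$ space:

$$
TS=\{(x,y):\langle x,y\rangle=0\}\subset S\times H,
$$

where

$$
S=\{x\in H:\lVert x\rVert=1\}.
$$

The projection is natural:

$$
\pi:TS\to S,\qquad(x,y)\mapsto x.
$$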

But we will not cover the verification in this post since it is required to introduce the abstract definition of tangent vectors. This will be done in the following post.

We want to study those ‘vectors’ associated to some manifold both globally and locally. For example we may want to describe the tangent line of some curves at some point without heavily using elementary calculus stuff. Also, we may want to describe the vector bundle of a manifold globally, for example, when will we have a trivial one? Can we classify the manifold using the behavior of the bundle? Can we make it a little more abstract, for example, consider the isomorphism classes of bundles? How does one bundle *transform* into another? But to do this we need a large number of definitions and propositions.

We can define several relations between two norms. Suppose we have a vector space $X$ and two norms $\lVert \cdot \rVert_1$ and $\lVert \cdot \rVert_2$ on it. One says $\lVert \cdot \rVert_1$ is *weaker* than $\lVert \cdot \rVert_2$ if there is $K>0$ such that $\lVert x \rVert_1 \leq K \lVert x \rVert_2$ for all $x \in X$. Two norms are *equivalent* if each is weaker than the other (trivially this is an equivalence relation). The idea of stronger and weaker norms is related to the idea of the “finer” and “coarser” topologies in the setting of topological spaces.
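A familiar finite-dimensional instance: on $\mathbb{R}^n$ the sup norm is weaker than the $\ell^1$ norm with constant $1$, and the $\ell^1$ norm is weaker than the sup norm with constant $n$, so the two are equivalent. The sketch below tests these two inequalities on random vectors; the function names are ours.

```python
import random

def norm_l1(x):
    """The l^1 norm: sum of absolute values of the coordinates."""
    return sum(abs(c) for c in x)

def norm_sup(x):
    """The sup norm: largest absolute value of a coordinate."""
    return max(abs(c) for c in x)

def equivalence_constants_hold(n, trials=1000, seed=0, tol=1e-12):
    """Check norm_sup(x) <= norm_l1(x) <= n * norm_sup(x) on random vectors of R^n."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-10.0, 10.0) for _ in range(n)]
        if norm_sup(x) > norm_l1(x) + tol:
            return False
        if norm_l1(x) > n * norm_sup(x) + tol:
            return False
    return True
```

The constant $n$ is attained by the vector $(1,\dots,1)$, which hints at why such an argument cannot carry over verbatim to infinite-dimensional spaces.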

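So what about convergence with respect to the two norms? Unsurprisingly this can be verified with elementary $\varepsilon-N$ arguments. Suppose $\lVert \cdot \rVert_2$ is weaker than $\lVert \cdot \rVert_1$, say $\lVert x \rVert_2 \leq K\lVert x \rVert_1$, and suppose $\lVert x_n - x \rVert_1 \to 0$ as $n \to \infty$. Then for any $\varepsilon>0$ we immediately have

$$
\lVert x_n-x\rVert_2\leq K\lVert x_n-x\rVert_1<K\varepsilon
$$

for all large enough $n$. Hence $\lVert x_n - x \rVert_2 \to 0$ as well. But what about the converse? We give a new definition of a relation between norms.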

(Definition) Two norms $\lVert \cdot \rVert_1$ and $\lVert \cdot \rVert_2$ on a vector space are **compatible** if, given that $\lVert x_n - x \rVert_1 \to 0$ and $\lVert x_n - y \rVert_2 \to 0$ as $n \to \infty$, we have $x=y$.

By the uniqueness of limit, we see if two norms are equivalent, then they are compatible. And surprisingly, with the help of the closed graph theorem we will discuss in this post, we have

(Theorem 1)If $\lVert \cdot \rVert_1$ and $\lVert \cdot \rVert_2$ are compatible, and both $(X,\lVert\cdot\rVert_1)$ and $(X,\lVert\cdot\rVert_2)$ are Banach, then $\lVert\cdot\rVert_1$ and $\lVert\cdot\rVert_2$ are equivalent.

This result looks natural but is not obviously easy to prove, since one finds no way to build a bridge between the limit and a general inequality. But before that, we need to elaborate some terminology.

(Definition) For $f:X \to Y$, the **graph** of $f$ is defined by

$$
G(f)=\{(x,f(x)):x\in X\}\subset X\times Y.
$$

If both $X$ and $Y$ are topological spaces, and the topology of $X \times Y$ is the usual one, that is, the smallest topology that contains all sets $U \times V$ where $U$ and $V$ are open in $X$ and $Y$ respectively, and if $f: X \to Y$ is continuous, it is natural to expect $G(f)$ to be closed. For example, by taking $f(x)=x$ and $X=Y=\mathbb{R}$, one would expect the diagonal line of the plane to be closed.

(Definition) The topological vector space $(X,\tau)$ is an $F$-space if $\tau$ is induced by a complete invariant metric $d$. Here invariant means that $d(x+z,y+z)=d(x,y)$ for all $x,y,z \in X$.

A Banach space is easily verified to be an $F$-space by defining $d(x,y)=\lVert x-y \rVert$.

(Open mapping theorem)See this post

By definition of closed set, we have a practical criterion on whether $G(f)$ is closed.

(Proposition 1) $G(f)$ is closed if and only if, for any sequence $(x_n)$ such that the limits

$$
x=\lim_{n\to\infty}x_n\quad\text{and}\quad y=\lim_{n\to\infty}f(x_n)
$$

exist, we have $y=f(x)$.

In this case, we say $f$ is closed. For continuous functions, things are trivial.

(Proposition 2) If $X$ and $Y$ are two topological spaces and $Y$ is Hausdorff, and $f:X \to Y$ is continuous, then $G(f)$ is closed.

*Proof.* Let $G^c$ be the complement of $G(f)$ with respect to $X \times Y$. Fix $(x_0,y_0) \in G^c$, we see $y_0 \neq f(x_0)$. By the Hausdorff property of $Y$, there exist open subsets $U \subset Y$ and $V \subset Y$ such that $y_0 \in U$ and $f(x_0) \in V$ and $U \cap V = \varnothing$. Since $f$ is continuous, we see $W=f^{-1}(V)$ is open in $X$. We obtain an open neighborhood $W \times U$ of $(x_0,y_0)$ which has empty intersection with $G(f)$: if $x \in W$ then $f(x) \in V$, so $f(x) \notin U$. This is to say, every point of $G^c$ has an open neighborhood contained in $G^c$, hence is an interior point. Therefore $G^c$ is open, which is to say that $G(f)$ is closed. $\square$

**REMARKS.** For $X \times Y=\mathbb{R} \times \mathbb{R}$, we have a simple visualization. For $\varepsilon>0$, there exists some $\delta$ such that $|f(x)-f(x_0)|<\varepsilon$ whenever $|x-x_0|<\delta$. For $y_0 \neq f(x_0)$, pick $\varepsilon$ such that $0<\varepsilon<\frac{1}{2}|f(x_0)-y_0|$, we have two boxes ($CDEF$ and $GHJI$ on the picture), namely

$$
B_1=\{(x,y):|x-x_0|<\delta,\ |y-f(x_0)|<\varepsilon\}
$$

and

$$
B_2=\{(x,y):|x-x_0|<\delta,\ |y-y_0|<\varepsilon\}.
$$

In this case, the part of the graph over $|x-x_0|<\delta$ stays inside $B_1$, which is disjoint from $B_2$, so $B_2$ will not intersect the graph of $f$, hence $(x_0,y_0)$ is an interior point of $G^c$.

The Hausdorff property of $Y$ is not removable. To see this, since $X$ has no restriction, it suffices to take a look at $X \times X$. Let $f$ be the identity map (which is continuous), we see the graph

$$
G(f)=\{(x,x):x\in X\}
$$

is the diagonal. Suppose $X$ is not Hausdorff. By definition, there exist distinct $x$ and $y$ such that every neighborhood of $x$ meets every neighborhood of $y$. Then $(x,y) \in G^c$, but every basic neighborhood $U \times V$ of $(x,y)$ in $X \times X$ contains a point $(z,z)$ with $z \in U \cap V$, so $(x,y)$ is *not* an interior point of $G^c$. Hence $G^c$ is not open, that is, $G(f)$ is not closed.
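The smallest non-Hausdorff example makes this argument tangible. The sketch below, with hypothetical helper names, models the two-point Sierpiński space as a finite list of open sets and tests by brute force whether the complement of the diagonal is open in the product topology; the discrete topology on the same two points is included for comparison.

```python
from itertools import product

X = (0, 1)
SIERPINSKI = [frozenset(), frozenset({1}), frozenset({0, 1})]
DISCRETE = [frozenset(), frozenset({0}), frozenset({1}), frozenset({0, 1})]

def is_hausdorff(points, opens):
    """Every pair of distinct points must have disjoint open neighborhoods."""
    for x, y in product(points, repeat=2):
        if x == y:
            continue
        if not any(x in U and y in V and not (U & V) for U in opens for V in opens):
            return False
    return True

def is_open_in_product(subset, opens):
    """A set is open in the product topology iff each of its points lies in a box U x V inside it."""
    return all(
        any(x in U and y in V and set(product(U, V)) <= subset for U in opens for V in opens)
        for (x, y) in subset
    )

def diagonal_is_closed(points, opens):
    """The graph of the identity map is closed iff the complement of the diagonal is open."""
    diagonal = {(x, x) for x in points}
    complement = set(product(points, repeat=2)) - diagonal
    return is_open_in_product(complement, opens)
```

For the Sierpiński topology both `is_hausdorff` and `diagonal_is_closed` return `False`, while for the discrete topology both return `True`.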

Also, as an immediate consequence, every affine algebraic variety in $\mathbb{C}^n$ and $\mathbb{R}^n$ is closed with respect to Euclidean topology. Further, we have the Zariski topology $\mathcal{Z}$ by claiming that, if $V$ is an affine algebraic variety, then $V^c \in \mathcal{Z}$. It’s worth noting that $\mathcal{Z}$ is *not* Hausdorff (example?) and in fact much coarser than the Euclidean topology although an affine algebraic variety is both closed in the Zariski topology and the Euclidean topology.

After we have proved this theorem, we are able to prove the theorem about compatible norms. We shall assume that both $X$ and $Y$ are $F$-spaces, since the norm plays no critical role here. This offers a greater variety but shall not be considered as an abuse of abstraction.

(The Closed Graph Theorem) Suppose

(a) $X$ and $Y$ are $F$-spaces,

(b) $f:X \to Y$ is linear,

(c) $G(f)$ is closed in $X \times Y$.

Then $f$ is continuous.

In short, the closed graph theorem gives a sufficient condition to claim the continuity of $f$ (keep in mind, linearity does not imply continuity). If $f:X \to Y$ is continuous, then $G(f)$ is closed; if $G(f)$ is closed and $f$ is linear, then $f$ is continuous.
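That linearity alone does not give continuity can be seen with the differentiation operator on polynomials over $[0,1]$ with the sup norm. Note that this space is not complete, so it is not an $F$-space and the closed graph theorem does not apply to it; the sketch below (sampling-based sup norm, coefficient lists, hypothetical names) only shows that a linear map may be unbounded.

```python
def sup_norm(coeffs, samples=1001):
    """Approximate the sup norm on [0, 1] by sampling; coeffs[k] multiplies x**k."""
    return max(
        abs(sum(c * (i / (samples - 1)) ** k for k, c in enumerate(coeffs)))
        for i in range(samples)
    )

def differentiate(coeffs):
    """Coefficient list of the derivative polynomial; this operation is linear."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def monomial(n):
    """Coefficient list of x**n."""
    return [0.0] * n + [1.0]

# Ratio ||D(x^n)|| / ||x^n|| for a few degrees n; a bounded operator would keep this bounded.
ratios = [
    sup_norm(differentiate(monomial(n))) / sup_norm(monomial(n))
    for n in (1, 5, 25, 125)
]
```

The ratios grow like $n$, so no constant $K$ can satisfy $\lVert Dp\rVert\leq K\lVert p\rVert$ for all polynomials $p$.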

*Proof.* First of all we should make $X \times Y$ an $F$-space by assigning addition, scalar multiplication and a metric. Addition and scalar multiplication are defined componentwise in the nature of things:

$$
\alpha(x_1,y_1)+\beta(x_2,y_2)=(\alpha x_1+\beta x_2,\alpha y_1+\beta y_2).
$$

The metric can be defined without extra effort:

$$
d((x_1,y_1),(x_2,y_2))=d_X(x_1,x_2)+d_Y(y_1,y_2),
$$

where $d_X$ and $d_Y$ are the invariant metrics of $X$ and $Y$. Then it can be verified that $X \times Y$ is an $F$-space: $d$ is a complete, translation-invariant metric inducing the product topology. (Potentially the verifications will be added in the future but it’s recommended to do it yourself.)

Since $f$ is linear, the graph $G(f)$ is a subspace of $X \times Y$. Next we quote an elementary result in point-set topology: a subset of a complete metric space is closed if and only if it’s complete. Since $G(f)$ is closed and $d$ is translation-invariant, we see $G(f)$ is an $F$-space as well. Let $p_1: X \times Y \to X$ and $p_2: X \times Y \to Y$ be the natural projections respectively (for example, $p_1(x,y)=x$). Our proof is done by verifying the properties of $p_1$ and $p_2$ on $G(f)$.

*For simplicity one can simply define $p_1$ on $G(f)$ instead of the whole space $X \times Y$, but we make it a global projection on purpose to emphasize the difference between global properties and local properties. One can also write $p_1|_{G(f)}$ to dodge confusion.*

**Claim 1.** $p_1$ (with restriction on $G(f)$) defines an isomorphism between $G(f)$ and $X$.

For $x \in X$, we see $p_1(x,f(x)) = x$ (surjectivity). If $p_1(x,f(x))=0$, we see $x=0$ and therefore $(x,f(x))=(0,0)$, hence the restriction of $p_1$ on $G$ has trivial kernel (injectivity). Further, it’s trivial that $p_1$ is linear.

**Claim 2.** $p_1$ is continuous on $G(f)$.

Let $(x_n,f(x_n))$ be a sequence in $G(f)$ converging to $(x,f(x)) \in G(f)$. Since convergence in $X \times Y$ implies convergence in each coordinate, we have $\lim_{n \to \infty}x_n=x$, and therefore $\lim_{n \to \infty}p_1(x_n,f(x_n)) =x$. Meanwhile $p_1(x,f(x))=x$. The continuity of $p_1$ is proved.

**Claim 3.** $p_1$ is a homeomorphism with restriction on $G(f)$.

We already know that $G(f)$ is an $F$-space, and so is $X$. The image $p_1(G(f))=X$ is of the second category in $X$ (since $X$ is an $F$-space, hence a complete metric space, and the Baire category theorem applies), and $p_1$ is continuous and linear on $G(f)$. By the open mapping theorem, $p_1$ is an open mapping on $G(f)$. Being one-to-one as well, it is a homeomorphism.

**Claim 4.** $p_2$ is continuous.

This follows the same way as the proof of claim 2 but much easier since we have no need to care about $f$.

Now things are immediate once one realizes that $f=p_2 \circ p_1|_{G(f)}^{-1}$, and hence $f$ is continuous. $\square$

Before we go for theorem 1 at the beginning, we drop an application on Hilbert spaces.

Let $T$ be a bounded operator on the Hilbert space $L_2([0,1])$ so that if $\phi \in L_2([0,1])$ is a continuous function so is $T\phi$. Then the restriction of $T$ to $C([0,1])$ is a bounded operator of $C([0,1])$.

For details please check this.

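Now we go for the identification of norms. Define

$$
f:(X,\lVert\cdot\rVert_1)\to(X,\lVert\cdot\rVert_2),\qquad x\mapsto x,
$$

i.e. the identity map between two Banach spaces (hence $F$-spaces). Then $f$ is linear. We need to prove that $G(f)$ is closed. For a sequence $(x_n)$ with

$$
\lVert x_n-x\rVert_1\to 0\quad\text{and}\quad\lVert f(x_n)-y\rVert_2=\lVert x_n-y\rVert_2\to 0,
$$

we have, by compatibility,

$$
y=x=f(x).
$$

Hence $G(f)$ is closed. Therefore $f$ is continuous, hence bounded, and we have some $K$ such that

$$
\lVert x\rVert_2=\lVert f(x)\rVert_2\leq K\lVert x\rVert_1.
$$

By defining

$$
g:(X,\lVert\cdot\rVert_2)\to(X,\lVert\cdot\rVert_1),\qquad x\mapsto x,
$$

we see $g$ is continuous as well, hence we have some $K'$ such that

$$
\lVert x\rVert_1=\lVert g(x)\rVert_1\leq K'\lVert x\rVert_2.
$$

Hence the two norms are weaker than each other, that is, they are equivalent.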

Since there is no strong reason to write more posts on this topic, i.e. the three fundamental theorems of linear functional analysis, I think it’s time to make a list of the series. It’s been around half a year.

- The Big Three Pt. 1 - Baire Category Theorem Explained
- The Big Three Pt. 2 - The Banach-Steinhaus Theorem
- The Big Three Pt. 3 - The Open Mapping Theorem (Banach Space)
- The Big Three Pt. 4 - The Open Mapping Theorem (F-Space)
- The Big Three Pt. 5 - The Hahn-Banach Theorem (Dominated Extension)
- The Big Three Pt. 6 - Closed Graph Theorem with Applications

- Walter Rudin, *Functional Analysis*
- Peter Lax, *Functional Analysis*
- Jesús Gil de Lamadrid, *Some Simple Applications of the Closed Graph Theorem*

Partition of unity builds a bridge between local properties and global properties. A nice example is the Stokes’ theorem on manifolds.

Suppose $\omega$ is a $(n-1)$-form with compact support on a oriented manifold $M$ of dimension $n$ and if $\partial{M}$ is given the induced orientation, then

This theorem can be proved in two steps. First, by Fubini’s theorem, one proves the identity on $\mathbb{R}^n$ and $\mathbb{H}^n$. Second, for the general case, let $(U_\alpha)$ be an oriented atlas for $M$ and $(\rho_\alpha)$ a partition of unity to $(U_\alpha)$, one naturally writes $\omega=\sum_{\alpha}\rho_\alpha\omega$. Since $\int_M d\omega=\int_{\partial M}\omega$ is linear with respect to $\omega$, it suffices to prove it only for $\rho_\alpha\omega$. Note that the support of $\rho_\alpha\omega$ is contained in the intersection of supports of $\rho_\alpha$ and $\omega$, hence a compact set.

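On the other hand, since $U_\alpha$ is diffeomorphic to either $\mathbb{R}^n$ or $\mathbb{H}^n$, it is immediate from the first step that

$$\int_M d(\rho_\alpha\omega)=\int_{U_\alpha}d(\rho_\alpha\omega)=\int_{\partial M}\rho_\alpha\omega,$$

which furnishes the proof for the general case.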

As we have seen, to prove a global statement we work locally. If you have trouble with this terminology, never mind; we will go through it right now (in a more abstract way, however). If you are already familiar with it, feel free to skip ahead.

Throughout, we use bold letters like $\mathbf{E}$, $\mathbf{F}$ to denote Banach spaces. We will treat Euclidean spaces as a special case rather than as a restriction. Indeed, since Banach spaces are not necessarily of finite dimension, this approach can be troublesome. But the benefit is a better view of the abstraction.

Let $X$ be a set. An **atlas of class $C^p$** ($p \geq 0$) on $X$ is a collection of pairs $(U_i,\varphi_i)$ where $i$ ranges through some indexing set, satisfying the following conditions:

**AT 1.** Each $U_i$ is a subset of $X$ and $\bigcup_{i}U_i=X$.

**AT 2.** Each $\varphi_i$ is a bijection of $U_i$ onto an open subset $\varphi_iU_i$ of some Banach space $\mathbf{E}_i$ and for any $i$ and $j$, $\varphi_i(U_i \cap U_j)$ is open in $\mathbf{E}_i$.

**AT 3.** The map

$$\varphi_j\varphi_i^{-1}:\varphi_i(U_i \cap U_j) \to \varphi_j(U_i \cap U_j)$$

is a $C^p$-isomorphism for all $i$ and $j$.

One should be advised that isomorphism here does not come from group theory, but from category theory. Precisely speaking, it’s the isomorphism in the category $\mathfrak{O}$ whose objects are the open subsets of Banach spaces and whose morphisms are the maps of class $C^p$.

Also, one can give $X$ a topology in a unique way such that each $U_i$ is open and each $\varphi_i$ is a topological isomorphism. We see no need to assume that $X$ is Hausdorff unless we start with Hausdorff spaces. Lifting this restriction gives us more freedom (though sometimes more difficulty as well).

For condition **AT 2**, we did not require that the vector spaces be the same for all indexes $i$, or even that they be toplinearly isomorphic. If they are all equal to the same space $\mathbf{E}$, then we say that the atlas is an $\mathbf{E}$-atlas.

Suppose that we are given an open subset $U$ of $X$ and a topological isomorphism $\varphi:U \to U'$ onto an open subset of some Banach space $\mathbf{E}$. We shall say that $(U,\varphi)$ is **compatible** with the atlas $(U_i,\varphi_i)_i$ if each map $\varphi_i\circ\varphi^{-1}$ (defined on a suitable intersection, as in **AT 3**) is a $C^p$-isomorphism. Two atlases are said to be **compatible** if each chart of one is compatible with the other atlas. It can be verified that this is an equivalence relation. *An equivalence class of atlases of class $C^p$ on $X$ is said to define a structure of $C^p$-manifold on $X$.* If all the vector spaces $\mathbf{E}_i$ in some atlas are toplinearly isomorphic, we can find some universal $\mathbf{E}$ that is equal to all of them. In this case, we say $X$ is an $\mathbf{E}$-manifold or that $X$ is modeled on $\mathbf{E}$.

As we know, $\mathbb{R}^n$ is a Banach space. If $\mathbf{E}=\mathbb{R}^n$ for some fixed $n$, then we say that the manifold is $n$-dimensional. Also we have the **local coordinates**. A chart

$$\varphi:U \to \mathbb{R}^n$$

is given by $n$ coordinate functions $\varphi_1,\cdots,\varphi_n$. If $P$ denotes a point of $U$, these functions are often written

$$x_1(P),\cdots,x_n(P),$$

or simply $x_1,\cdots,x_n$.

Let $X$ be a topological space. A covering $\mathfrak{U}$ of $X$ is **locally finite** if every point $x$ has a neighborhood $U$ such that all but finitely many members of $\mathfrak{U}$ are disjoint from $U$ (as you will see, this prevents some nonsensical summation). A **refinement** of a covering $\mathfrak{U}$ is a covering $\mathfrak{U}'$ such that for any $U' \in \mathfrak{U}'$, there exists some $U \in \mathfrak{U}$ such that $U' \subset U$. If we write $\mathfrak{U} \leq \mathfrak{U}'$ in this case, we see that the set of open covers on a topological space forms a *directed set*.

A topological space is **paracompact** if it is Hausdorff, and every open covering has a locally finite open refinement. Here are some examples of paracompact spaces.

- Any compact Hausdorff space.
- Any CW complex.
- Any metric space (hence $\mathbb{R}^n$).
- Any regular Hausdorff Lindelöf space.
- Any locally compact Hausdorff $\sigma$-compact space.

These are not too difficult to prove, and one can easily find proofs on the Internet. Below are several key properties of paracompact spaces.

If $X$ is paracompact, then $X$ is normal. (Proof here)

Let $X$ be a paracompact (hence normal) space and $\mathfrak{U}=(U_i)$ a locally finite open cover, then there exists a locally finite open covering $\mathfrak{V}=(V_i)$ such that $\overline{V_i} \subset U_i$. (Proof here. Note the axiom of choice is assumed.)

One can find proofs of the following propositions in *Elements of Mathematics, General Topology, Chapter 1-4* by N. Bourbaki. It’s interesting to compare them to the corresponding ones for compact spaces.

Every closed subspace $F$ of a paracompact space $X$ is paracompact.

The product of a paracompact space and a compact space is paracompact.

Let $X$ be a locally compact paracompact space. Then every open covering $\mathfrak{R}$ of $X$ has a locally finite open refinement $\mathfrak{R}’$ formed of relatively compact sets. If $X$ is $\sigma$-compact then $\mathfrak{R}’$ can be taken to be countable.

A **partition of unity** (of class $C^p$) on a manifold $X$ consists of an open covering $(U_i)$ of $X$ and a family of functions

$$\psi_i:X \to \mathbb{R}$$

satisfying the following conditions:

**PU 1.** For all $x \in X$ we have $\psi_i(x) \geq 0$.

**PU 2.** The support of $\psi_i$ is contained in $U_i$.

**PU 3.** The covering is locally finite.

**PU 4.** For each point $x \in X$ we have

$$\sum_i \psi_i(x)=1.$$

The sum in PU 4 makes sense because for a given point $x$, there are only finitely many $i$ such that $\psi_i(x) >0$, according to PU 3.
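As a small sanity check of PU 1 to PU 4, here is a minimal Python sketch of a partition of unity of class $C^0$ on $\mathbb{R}$. The hat functions `psi`, the open sets $U_i=(i-2,i+2)$ and the helper names are illustrative assumptions, not taken from the references.

```python
def psi(i, x):
    """Hat function centered at the integer i, supported on [i - 1, i + 1]."""
    return max(0.0, 1.0 - abs(x - i))


def in_cover(i, x):
    """Membership test for the (hypothetical) open set U_i = (i - 2, i + 2)."""
    return i - 2 < x < i + 2


def total(x, indices):
    """Sum of psi_i(x); only finitely many terms are nonzero near any x."""
    return sum(psi(i, x) for i in indices)


indices = range(-10, 11)
samples = [-3.25, -0.5, 0.0, 0.3, 2.75]
for x in samples:
    active = [i for i in indices if psi(i, x) > 0]
    # At most two hat functions are positive at any point, and they sum to 1.
    print(x, active, total(x, indices))
```

Each sample point meets at most two supports, which is the local finiteness that makes the sum in PU 4 a finite sum.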

A manifold $X$ will be said to **admit partitions of unity** if it is paracompact, and if, given a locally finite open covering $(U_i)$, there exists a partition of unity $(\psi_i)$ such that the support of $\psi_i$ is contained in $U_i$.

This function will be useful when dealing with finite dimensional case.

For every integer $n \geq 1$ and every real number $\delta>0$ there exist maps $\psi_n \in C^{\infty}(\mathbb{R}^n;\mathbb{R})$ which equal $1$ on $B(0,1)$ and vanish on $\mathbb{R}^n\setminus B(0,1+\delta)$.

*Proof.* It suffices to prove it for $\mathbb{R}$, since once we have proved the existence of $\psi_1$, we may write

$$\psi_n(x)=\psi_1(\lVert x \rVert).$$

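Consider the function $\phi: \mathbb{R} \to \mathbb{R}$ defined, for fixed real numbers $a<b$, by

$$\phi(t)=\begin{cases}\exp\left(\dfrac{1}{(t-a)(t-b)}\right), & a<t<b,\\ 0, & \text{otherwise}.\end{cases}$$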

The reader may have seen it in some analysis course and should be able to check that $\phi \in C^{\infty}(\mathbb{R};\mathbb{R})$. Integrating $\phi$ from $-\infty$ to $x$ and dividing by $\lVert \phi \rVert_1$ (you may have done this in probability theory), we obtain

$$\theta(x)=\frac{\int_{-\infty}^{x}\phi(t)\,dt}{\int_{-\infty}^{\infty}\phi(t)\,dt};$$

it is immediate that $\theta(x)=0$ for $x \leq a$ and $\theta(x)=1$ for $x \geq b$. By taking $a=1$ and $b=(1+\delta)^2$, our job is done by letting $\psi_1(x)=1-\theta(x^2)$. Since $x^2=|x|^2$, the same formula with $\lVert x \rVert^2$ in place of $x^2$ defines $\psi_n$ directly, so the identity relating $\psi_n$ and $\psi_1$ is in fact redundant. $\square$
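A numerical sketch of this construction may help. Only the recipe $\psi_1(x)=1-\theta(x^2)$ with $a=1$ and $b=(1+\delta)^2$ comes from the proof; the concrete formula used for `phi` (a standard smooth bump supported on $[a,b]$) and the midpoint-rule integration are assumptions made for illustration.

```python
import math


def phi(t, a, b):
    """Smooth bump: positive on (a, b), identically zero elsewhere (assumed formula)."""
    if a < t < b:
        return math.exp(1.0 / ((t - a) * (t - b)))
    return 0.0


def integral(lo, hi, a, b, steps=2000):
    """Midpoint-rule approximation of the integral of phi over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(phi(lo + (k + 0.5) * h, a, b) for k in range(steps))


def theta(x, a, b):
    """Normalized antiderivative of phi: 0 for x <= a and 1 for x >= b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return integral(a, x, a, b) / integral(a, b, a, b)


def psi1(x, delta):
    """Equals 1 on [-1, 1] and 0 outside (-(1 + delta), 1 + delta)."""
    return 1.0 - theta(x * x, 1.0, (1.0 + delta) ** 2)


for x in (0.0, 1.0, 1.1, 1.25, 1.4, 1.5, 2.0):
    print(x, psi1(x, 0.5))
```

The printed values decrease from $1$ to $0$ as $|x|$ moves from $1$ to $1+\delta$, which is the transition region of the cutoff.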

In the following blog posts, we will generalize this to Hilbert spaces.

Of course a partition of unity is desirable. But we will give an example showing that sometimes a satisfactory partition of unity cannot be found.

Let $D$ be a connected bounded open set in $\ell^p$ where $p$ is not an even integer. Assume $f$ is a real-valued function, continuous on $\overline{D}$ and $n$-times differentiable in $D$ with $n \geq p$. Then $f(\overline{D}) \subset \overline{f(\partial D)}$.

(Corollary)Let $f$ be an $n$-times differentiable function on $\ell^p$ space, where $n \geq p$, and $p$ is not an even integer. If $f$ has its support in a bounded set, then $f$ is identically zero.

It follows that for $n \geq p$, $C^n$ partitions of unity do not exist on $\ell^p$ whenever $p$ is not an even integer. For example, $\ell^1$ does not have a $C^2$ partition of unity. It is then our duty to find out under what conditions the desired partition of unity is available.

Below are two theorems about the existence of partitions of unity. We are not proving them here but in a future blog post, since the proofs are rather long. The restrictions on $X$ are acceptable. For example, $\mathbb{R}^n$ is locally compact, and hence so is any manifold modeled on $\mathbb{R}^n$.

Let $X$ be a manifold which is locally compact Hausdorff and whose topology has a countable base. Then $X$ admits partitions of unity.

Let $X$ be a paracompact manifold of class $C^p$, modeled on a separable Hilbert space $E$. Then $X$ admits partitions of unity (of class $C^p$).

- N. Bourbaki, *Elements of Mathematics*
- S. Lang, *Fundamentals of Differential Geometry*
- M. Berger, *Differential Geometry: Manifolds, Curves, and Surfaces*
- R. Bonic and J. Frampton, *Differentiable Functions on Certain Banach Spaces*

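For the $\Gamma$ function, we have a classical limit formula (see ProofWiki for a proof):

$$\lim_{x\to\infty}\frac{\Gamma(x+1)}{(x/e)^x\sqrt{2\pi x}}=1.$$

With this formula we can immediately evaluate some limits that are otherwise hard to compute. Note that, written for natural numbers, the formula reads

$$\lim_{n\to\infty}\frac{n!}{(n/e)^n\sqrt{2\pi n}}=1,$$

so we can immediately evaluate this limit:

But Stirling’s formula offers more than that. In this post we will see several classical estimates.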

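The conclusion we will reach in this section is

If you evaluate the right-hand side on a calculator, you will find that $\phi_n=\frac{n!}{(n/e)^n\sqrt{2\pi n}}$ always stays close to $1$.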

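For $m=1,2,3,\dots$, define a “piecewise linear function” lying below $y=\ln(x)$:

$$f(x)=(m+1-x)\ln m+(x-m)\ln(m+1),$$

where $m \leq x \leq m+1$. Above it, define another “piecewise linear function”:

$$g(x)=\frac{x}{m}-1+\ln m,$$

where $m-1/2 \leq x < m+1/2$. If you plot $f$, $\ln{x}$ and $g$, you will see that $f$ and $g$ both approximate $\ln{x}$. Moreover, for $x \geq 1$ we have

$$f(x) \leq \ln x \leq g(x),$$

so for the definite integrals we get

$$\int_1^n f(x)\,dx \leq \int_1^n \ln x\,dx \leq \int_1^n g(x)\,dx.$$

But the relation between $f$ and $g$ is not that simple. Computing the integral of $f$, we find

$$\int_1^n f(x)\,dx=\ln(n!)-\frac{1}{2}\ln n,$$

while for $g$ we have

$$\int_1^n g(x)\,dx=\ln(n!)-\frac{1}{2}\ln n+\frac{1}{8}-\frac{1}{8n}.$$

This shows that

$$\int_1^n f(x)\,dx=\int_1^n g(x)\,dx-\frac{1}{8}+\frac{1}{8n}.$$

Summarizing the inequalities above, we obtain, for $n>1$:

Subtracting $\int_1^n \ln x dx$ from every term of the inequality, we further have

By Stirling’s formula we know that

$$\lim_{n\to\infty}\left(\ln(n!)-\left(n+\frac{1}{2}\right)\ln n+n\right)=\ln\sqrt{2\pi},$$

and the sequence $x_n=-\frac{1}{8n}+\ln(n!)-(\frac{1}{2}+n)\ln{n}+n$ is monotonically increasing, so by the formula above it converges to $\ln\sqrt{2\pi}$. On the left side of the inequality we take the supremum $\ln\sqrt{2\pi}$. On the right side we take the infimum $x_1+\frac{1}{8}=1$. This gives us

which in turn leads to

This holds for all $n =1,2,3,\dots$.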
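The claim that $\phi_n$ stays close to $1$ is easy to check numerically. The following is a minimal Python sketch; the function name `phi` and the use of `math.lgamma` (to avoid overflowing factorials) are implementation choices of this example, not part of the derivation.

```python
import math


def phi(n):
    """phi_n = n! / ((n/e)^n * sqrt(2*pi*n)), computed through logarithms."""
    log_phi = (
        math.lgamma(n + 1)                    # ln(n!)
        - n * (math.log(n) - 1.0)             # ln((n/e)^n)
        - 0.5 * math.log(2.0 * math.pi * n)   # ln(sqrt(2*pi*n))
    )
    return math.exp(log_phi)


for n in (1, 2, 5, 10, 100, 1000):
    print(n, phi(n))
```

The first value is $\phi_1=e/\sqrt{2\pi}$, and the later values decrease towards $1$.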

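For any $c \in \mathbb{R}$, we have

$$\lim_{x\to\infty}\frac{\Gamma(x+c)}{x^c\Gamma(x)}=1.$$

This can be read as follows: after shifting $\Gamma(x)$ to the left by $c$, its value is close to $x^c\Gamma(x)$ once $x$ is large enough. The proof of this identity is also fairly simple, although the computation is tedious; it only requires Stirling’s formula.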
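Before going through the computation, one can watch the ratio $\Gamma(x+c)/(x^c\Gamma(x))$ approach $1$ numerically. This is only a sketch: the helper name `shift_ratio` is hypothetical, and the ratio is evaluated through `math.lgamma` so that large arguments do not overflow.

```python
import math


def shift_ratio(x, c):
    """Gamma(x + c) / (x**c * Gamma(x)) for x > 0 and x + c > 0."""
    return math.exp(math.lgamma(x + c) - c * math.log(x) - math.lgamma(x))


for c in (-0.5, 1.0, 2.5):
    for x in (10.0, 1000.0, 1000000.0):
        print(c, x, shift_ratio(x, c))
```

For $c=1$ the ratio is identically $1$ because $\Gamma(x+1)=x\Gamma(x)$; for other $c$ it only tends to $1$ as $x$ grows.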

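Now the limits of these three factors are easy to compute. Clearly we have

and

Finally,

Hence the original limit is $1$. The computation is also quite neat. Note that if we replace $x$ and $c$ by a positive integer $n$ and an integer $k$, we also get

Combining this with Bernoulli’s inequality, we have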

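Next we give a finer estimate. In fact,

By the definition of the function $B(x,y)$,

$$B(x,y)=\int_0^1 t^{x-1}(1-t)^{y-1}\,dt.$$

Setting $t=u^2$, we obtain

$$B(x,y)=2\int_0^1 u^{2x-1}(1-u^2)^{y-1}\,du.$$

Substituting $x=\frac{1}{2}$ and $y=n+1$ brings us close to the desired result:

$$B\left(\frac{1}{2},n+1\right)=2\int_0^1(1-u^2)^n\,du.$$

Note that, using the second expression for the $B$ function, we can compute $\Gamma(\frac{1}{2})$. In fact,

hence $\Gamma(\frac{1}{2})=\sqrt{\pi}$. For $B(\frac{1}{2},n+1)$, we can now use the shift formula above:

hence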

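Finally, we prove an identity that has nothing to do with Stirling’s formula:

By the classical fundamental theorem of algebra, we immediately have

On the other hand, note that

When $x=1$, we have

that is,

By Euler’s reflection formula, for $1 \leq k \leq n-1$ we have

$$\Gamma\left(\frac{k}{n}\right)\Gamma\left(1-\frac{k}{n}\right)=\frac{\pi}{\sin\frac{k\pi}{n}}.$$

If $n$ is odd, then by the result above we get

Here we have used only half of the indices $k$. To use the other half, we only need to interchange $k$ and $n-k$, which gives

as desired. If $n$ is even, we only need to single out the term with $1/2$ and compute the two remaining halves separately.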

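We now give two limits that look hard to compute.

If we replace $n!$ directly using Stirling’s formula, the result of this limit becomes obvious.

So we only need to find the limit of $(1+\frac{1}{n})^{n^2}e^{-n}$. But do not take it for granted that this limit is $1$. Using the Taylor expansion, we get

$$\left(1+\frac{1}{n}\right)^{n^2}e^{-n}=\exp\left(n^2\ln\left(1+\frac{1}{n}\right)-n\right)=\exp\left(-\frac{1}{2}+O\left(\frac{1}{n}\right)\right)\to e^{-1/2}.$$

Therefore the original limit is $\sqrt\frac{2\pi}{e}$.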
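The warning that $(1+\frac{1}{n})^{n^2}e^{-n}$ does not tend to $1$ can be confirmed numerically. In this sketch the helper `term` is a hypothetical name, and the expression is evaluated as $\exp(n^2\ln(1+1/n)-n)$ with `math.log1p` to avoid overflow and loss of precision.

```python
import math


def term(n):
    """(1 + 1/n)**(n*n) * exp(-n), evaluated as exp(n^2 * log(1 + 1/n) - n)."""
    return math.exp(n * n * math.log1p(1.0 / n) - n)


for n in (10, 100, 1000, 10000):
    # The values approach exp(-1/2), roughly 0.6065, rather than 1.
    print(n, term(n), math.exp(-0.5))
```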

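Note that multiplying the numerators of the $n$ terms gives $\exp(n-1-\frac{1}{2}-\cdots-\frac{1}{n})$, and the harmonic series diverges; to obtain convergence we are naturally led to Euler’s constant $\gamma=\lim_{n\to\infty}\left(1+\frac{1}{2}+\cdots+\frac{1}{n}-\ln{n}\right)$. There also seems to be no way to simplify the denominator directly: we know that $(1+1/k)^k$ tends to $e$, but that does not seem to help here. So let us first expand and simplify the denominator.

So the original limit can be written as

Now we can apply Stirling’s formula directly.

Since $\lim_{n\to\infty}\left(1+\frac{1}{n}\right)^{-n}=e^{-1}$ and $\lim_{n\to\infty}e^{\ln{n}-1-\frac{1}{2}-\frac{1}{3}-\cdots-\frac{1}{n}}=e^{-\gamma}$, we find that the original limit is $\frac{\sqrt{2\pi}}{e^{1+\gamma}}$.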

]]>

(Gleason-Kahane-Żelazko) If $\phi$ is a complex linear functional on a unitary Banach algebra $A$, such that $\phi(e)=1$ and $\phi(x) \neq 0$ for every invertible $x \in A$, then

$$\phi(xy)=\phi(x)\phi(y) \quad (x,y \in A).$$

Namely, $\phi$ is a complex homomorphism.

Suppose $A$ is a complex unitary Banach algebra and $\phi: A \to \mathbb{C}$ is a linear functional which is not identically $0$ (for convenience). If

$$\phi(xy)=\phi(x)\phi(y)$$

for all $x \in A$ and $y \in A$, then $\phi$ is called a *complex homomorphism* on $A$. Note that a unitary Banach algebra (with $e$ as multiplicative unit) is also a ring, and so is $\mathbb{C}$; we may therefore say that in this case $\phi$ is a ring-homomorphism. For such $\phi$, we have an instant proposition:

**Proposition 0.** $\phi(e)=1$ and $\phi(x) \neq 0$ for every invertible $x \in A$.

*Proof.* Since $\phi(e)=\phi(ee)=\phi(e)\phi(e)$, we have $\phi(e)=0$ or $\phi(e)=1$. If $\phi(e)=0$ however, for any $y \in A$, we have $\phi(y)=\phi(ye)=\phi(y)\phi(e)=0$, which is an excluded case. Hence $\phi(e)=1$.

For invertible $x \in A$, note that $\phi(xx^{-1})=\phi(x)\phi(x^{-1})=\phi(e)=1$. This can’t happen if $\phi(x)=0$. $\square$

The theorem reveals that Proposition $0$ actually characterizes the complex homomorphisms (ring-homomorphisms) among the linear functionals (group-homomorphisms).
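To see the dichotomy concretely, here is a minimal Python sketch on the commutative Banach algebra $\mathbb{C}^2$ with coordinatewise multiplication. The choice of algebra and all helper names are illustrative assumptions, not part of the post. Evaluation at a coordinate is a complex homomorphism, while the averaging functional satisfies $\phi(e)=1$ but vanishes at an invertible element, so it cannot be multiplicative.

```python
def mul(x, y):
    """Coordinatewise product in the algebra C^2."""
    return (x[0] * y[0], x[1] * y[1])


def is_invertible(x):
    """An element of C^2 is invertible iff no coordinate vanishes."""
    return x[0] != 0 and x[1] != 0


def evaluate_first(x):
    """Evaluation at the first coordinate: a complex homomorphism."""
    return x[0]


def average(x):
    """Linear functional with average(e) = 1 that is not multiplicative."""
    return (x[0] + x[1]) / 2


e = (1 + 0j, 1 + 0j)
x = (1 + 0j, -1 + 0j)   # invertible, yet the average vanishes here
y = (2 + 1j, 3 - 2j)

print(evaluate_first(mul(x, y)) == evaluate_first(x) * evaluate_first(y))
print(is_invertible(x), average(x))
print(average(mul(y, y)), average(y) ** 2)
```

The last line prints two different numbers, so `average` fails $\phi(y^2)=\phi(y)^2$, in agreement with the theorem.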

This theorem was proved by Andrew M. Gleason in 1967 and later independently by J.-P. Kahane and W. Żelazko in 1968. Both of them worked mainly on commutative Banach algebras, and the non-commutative version, which focused on complex homomorphism, was by W. Żelazko. In this post we will follow the third one.

Unfortunately, one cannot find an educational proof on the Internet with ease, which may be the reason why I write this post and why you read this.

Following definitions of Banach algebra and some logic manipulation, we have several equivalences worth noting.

(Stated by Gleason)Let $M$ be a linear subspace of codimension one in a commutative Banach algebra $A$ having an identity. Suppose no element of $M$ is invertible, then $M$ is an ideal.

(Stated by Kahane and Żelazko)A subspace $X \subset A$ of codimension $1$ is a maximal ideal if and only if it consists of non-invertible elements.

(Stated by Kahane and Żelazko) Let $A$ be a commutative complex Banach algebra with unit element. Then a functional $f \in A^\ast$ is a multiplicative linear functional if and only if $f(x) \in \sigma(x)$ holds for all $x \in A$.

Here $\sigma(x)$ denotes the spectrum of $x$.

Clearly any maximal ideal contains no invertible element (if it did, it would contain $e$ and hence be the ring itself). Also note that every maximal ideal is the kernel of some complex homomorphism, so it has codimension $1$. So it suffices to show that a subspace $X \subset A$ of codimension $1$ consisting of non-invertible elements is a maximal ideal. For such a subspace, since $e \notin X$, we may define a linear functional $\phi$ with kernel $X$ and $\phi(e)=1$; then $\phi(x) \neq 0$ for every invertible $x$. Note that these two conditions hold if and only if $\phi(x) \in \sigma(x)$ for all $x \in A$. As we will show, $\phi$ has to be a complex homomorphism.

We leave the elementary proofs to the reader since the proofs of these lemmas are off topic.

**Lemma 0.** Suppose $A$ is a unitary Banach algebra, $x \in A$, $\lVert x \rVert<1$, then $e-x$ is invertible.

**Lemma 1.** Suppose $f$ is an entire function of one complex variable, $f(0)=1$, $f'(0)=0$, and

$$0<|f(\lambda)| \leq e^{|\lambda|}$$

for all complex $\lambda$, then $f(\lambda)=1$ for all $\lambda \in \mathbb{C}$.

Note that there is an entire function $g$ such that $f=\exp(g)$. It can be shown that $g=0$.

A mapping $\phi$ from one ring $R$ to another ring $R'$ is said to be a **Jordan homomorphism** from $R$ to $R'$ if

$$\phi(a+b)=\phi(a)+\phi(b)$$

and

$$\phi(ab+ba)=\phi(a)\phi(b)+\phi(b)\phi(a).$$

It’s of course clear that every homomorphism is Jordan. Note if $R'$ is not of characteristic $2$, the second identity is equivalent to

$$\phi(a^2)=\phi(a)^2.$$

*To show the equivalence, one lets $b=a$ in the first case and puts $a+b$ in place of $a$ in the second case.*

Since in this case $R=A$ and $R'=\mathbb{C}$, the latter of which is commutative, we also write

$$\phi(ab+ba)=2\phi(a)\phi(b).$$

As we will show, the $\phi$ in the theorem is a Jordan homomorphism.

We will follow an unusual approach. By repeatedly ‘downgrading’ the goal, one will see this algebraic problem neatly transformed into a pure analysis problem.

To begin with, let $N$ be the kernel of $\phi$.

If $\phi$ is a complex homomorphism, it is immediate that $\phi$ is a Jordan homomorphism. Conversely, if $\phi$ is Jordan, we have

$$\phi(xy+yx)=2\phi(x)\phi(y).$$

If $x\in N$, the right-hand side becomes $0$, and therefore

$$xy+yx \in N \quad \text{if } x \in N,\ y \in A.$$

Consider the identity

$$(xy-yx)^2+(xy+yx)^2=2[x(yxy)+(yxy)x].$$
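The identity $(xy-yx)^2+(xy+yx)^2=2[x(yxy)+(yxy)x]$ holds in any associative algebra, commutative or not. As a quick check, the following Python sketch tests it on random integer matrices; using $3\times 3$ matrices as the non-commutative test algebra, and all helper names, are assumptions of this example.

```python
import random


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def add(a, b, sign=1):
    """Entrywise a + sign * b."""
    return [[p + sign * q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(c, a):
    """Scalar multiple c * a."""
    return [[c * p for p in row] for row in a]


def identity_holds(x, y):
    """Check (xy - yx)^2 + (xy + yx)^2 == 2 [x(yxy) + (yxy)x] exactly."""
    xy, yx = matmul(x, y), matmul(y, x)
    minus, plus = add(xy, yx, -1), add(xy, yx)
    lhs = add(matmul(minus, minus), matmul(plus, plus))
    yxy = matmul(yx, y)
    rhs = scale(2, add(matmul(x, yxy), matmul(yxy, x)))
    return lhs == rhs


rng = random.Random(0)


def random_matrix(n=3):
    """Random n x n integer matrix, so that all arithmetic is exact."""
    return [[rng.randint(-5, 5) for _ in range(n)] for _ in range(n)]


print(all(identity_holds(random_matrix(), random_matrix()) for _ in range(20)))
```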

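Therefore

$$\phi(xy-yx)^2+\phi(xy+yx)^2=2\phi(x(yxy)+(yxy)x).$$

Since $x \in N$ and $yxy \in A$, we see $x(yxy)+(yxy)x \in N$. Therefore $\phi(xy-yx)=0$ and

$$xy-yx \in N$$

if $x \in N$ and $y \in A$. Further we see

$$xy=\frac{1}{2}\left[(xy+yx)+(xy-yx)\right] \in N, \quad yx=\frac{1}{2}\left[(xy+yx)-(xy-yx)\right] \in N,$$

which implies that $N$ is an ideal. This may remind you of this classic diagram (we will not use it since it is additive though):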

For $x,y \in A$, we have $x \in \phi(x)e+N$ and $y \in \phi(y)e+N$. As a result, $xy \in \phi(x)\phi(y)e+N$, and therefore

$$\phi(xy)=\phi(x)\phi(y).$$

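Again, if $\phi$ is Jordan, we have $\phi(x^2)=\phi(x)^2$ for all $x \in A$, and in particular $\phi(a^2)=0$ for all $a \in N$. Conversely, if $\phi(a^2)=0$ for all $a \in N$, we may write each $x \in A$ as

$$x=\phi(x)e+a$$

where $a \in N$. Therefore

$$x^2=\phi(x)^2e+2\phi(x)a+a^2, \quad \phi(x^2)=\phi(x)^2,$$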

which also shows that $\phi$ is Jordan.

Fix $a \in N$, assume $\lVert a \rVert = 1$ without loss of generality, and define

$$f(\lambda)=\sum_{n=0}^{\infty}\frac{\phi(a^n)}{n!}\lambda^n$$

for all complex $\lambda$. If this function is constant (lemma 1), we immediately have $f’’(0)=\phi(a^2)=0$. This is purely a complex analysis problem however.

Note in the definition of $f$, we have

$$|\phi(a^n)| \leq \lVert \phi \rVert \lVert a^n \rVert \leq \lVert \phi \rVert \lVert a \rVert^n=\lVert \phi \rVert.$$

So we expect the norm of $\phi$ to be finite, which ensures that $f$ is entire. By *reductio ad absurdum*, if $\lVert e-a \rVert < 1$ for some $a \in N$, then by lemma 0, $e-(e-a)=a$ is invertible, which is impossible. Hence $\lVert e-a \rVert \geq 1$ for all $a \in N$. On the other hand, for $\lambda \in \mathbb{C}$, we have the following inequality:

$$\lVert \lambda e-a \rVert \geq |\lambda|=|\phi(\lambda e-a)|.$$

Therefore $\phi$ is *continuous* with norm $1$. The continuity of $\phi$ is not assumed at the beginning.

For $f$ we have some immediate facts. Since $|\phi(a^n)| \leq 1$ for every $n$, $f$ is entire with $f'(0)=\phi(a)=0$. Also, since $\phi$ has norm $1$, we also have

$$|f(\lambda)| \leq \sum_{n=0}^{\infty}\frac{|\phi(a^n)|}{n!}|\lambda|^n \leq e^{|\lambda|}.$$

All we need in the end is to show that $f(\lambda) \neq 0$ for all $\lambda \in \mathbb{C}$.

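The series

$$E(\lambda)=\sum_{n=0}^{\infty}\frac{\lambda^n}{n!}a^n$$

converges since $\lVert a \rVert=1$. The continuity of $\phi$ shows now

$$f(\lambda)=\phi(E(\lambda)).$$

Note

$$E(-\lambda)E(\lambda)=E(\lambda)E(-\lambda)=e.$$

Hence $E(\lambda)$ *is* invertible for all $\lambda \in \mathbb{C}$, hence $f(\lambda)=\phi(E(\lambda)) \neq 0$. By lemma 1, $f(\lambda)=1$ is constant. The proof is completed by reversing the steps. $\square$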

- Walter Rudin, *Real and Complex Analysis*
- Walter Rudin, *Functional Analysis*
- Andrew M. Gleason, *A Characterization of Maximal Ideals*
- J.-P. Kahane and W. Żelazko, *A Characterization of Maximal Ideals in Commutative Banach Algebras*
- W. Żelazko, *A Characterization of Multiplicative linear functionals in Complex Banach Algebras*
- I. N. Herstein, *Jordan Homomorphisms*

The Hahn-Banach theorem has been a central tool for functional analysis and therefore comes in a wide variety of versions, many of which have numerous uses in other fields of mathematics. Therefore it’s not possible to cover all of them. In this post we are covering two ‘abstract enough’ results, which are sometimes called the dominated extension theorem. Both of them will be discussed in real vector spaces on which no topology is endowed. This allows us to apply them to any topological vector space.

Another interesting thing is that we will be using the axiom of choice, or any equivalent statement you may like, for example Zorn’s lemma or the well-ordering principle. Before everything, we need to examine more properties of vector spaces.

It’s obvious that every complex vector space is also a real vector space. Suppose $X$ is a complex vector space, and we shall give the definition of real-linear and complex-linear functionals.

An additive functional $\Lambda$ on $X$ is called **real-linear** (**complex-linear**) if $\Lambda(\alpha x)=\alpha\Lambda(x)$ for every $x \in X$ and for every real (complex) scalar $\alpha$.

For *-linear functionals, we have two important but easy theorems.

If $u$ is the real part of a complex-linear functional $f$ on $X$, then $u$ is real-linear and

$$f(x)=u(x)-iu(ix) \quad (x \in X).$$

*Proof.* For complex $f(x)=u(x)+iv(x)$, it suffices to express $v(x)$ correctly. Since

$$if(x)=iu(x)-v(x),$$

we see $\Im(f(x))=v(x)=-\Re(if(x))$. Therefore

$$f(x)=u(x)-i\Re(if(x))=u(x)-i\Re(f(ix)),$$

and since $\Re(f(ix))=u(ix)$, we get

$$f(x)=u(x)-iu(ix).$$

To show that $u(x)$ is real-linear, note that

$$f(x+y)=u(x+y)+iv(x+y)=f(x)+f(y)=u(x)+u(y)+i(v(x)+v(y)).$$

Therefore $u(x)+u(y)=u(x+y)$. Similar process can be applied to real scalar $\alpha$. $\square$

Conversely, we are able to generate a complex-linear functional by a real one.

If $u$ is a real-linear functional, then $f(x)=u(x)-iu(ix)$ is a complex-linear functional

*Proof.* Direct computation. $\square$
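The direct computation can also be checked numerically. Below is a minimal Python sketch on $X=\mathbb{C}^2$: the particular real-linear functional `u`, the test vector and the scalar are arbitrary choices for illustration, and the helper names are hypothetical.

```python
def u(z):
    """A sample real-linear functional on C^2 (coefficients chosen arbitrarily)."""
    return 2.0 * z[0].real - 3.0 * z[0].imag + 0.5 * z[1].real + 4.0 * z[1].imag


def scale(alpha, z):
    """Scalar multiplication alpha * z in C^2."""
    return tuple(alpha * w for w in z)


def f(z):
    """The functional f(x) = u(x) - i u(ix) built from u."""
    return u(z) - 1j * u(scale(1j, z))


z = (1.5 - 2.0j, -0.25 + 3.0j)
alpha = 0.75 + 1.25j

# f is homogeneous for complex scalars, and its real part is u.
print(f(scale(alpha, z)), alpha * f(z))
print(f(z).real, u(z))
```

Note that `u` itself is not complex-linear: `u(scale(1j, z))` generally differs from `1j * u(z)`, which is not even real.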

Suppose now $X$ is a complex topological vector space. By the two theorems above, a complex-linear functional on $X$ is continuous if and only if its real part is continuous, and every continuous real-linear $u: X \to \mathbb{R}$ is the real part of a unique complex-linear continuous functional $f$.

A sublinear functional is ‘almost’ linear but also ‘almost’ a norm. Explicitly, we say $p: X \to \mathbb{R}$ is a sublinear functional when it satisfies

$$p(x+y) \leq p(x)+p(y), \quad p(tx)=tp(x)$$

for all $x,y \in X$ and all $t \geq 0$. As one can see, if $X$ is normable, then $p(x)=\lVert x \rVert$ is a sublinear functional. This should not be confused with a semilinear functional, where no inequality is involved. Another thing worth noting is that $p$ is not required to be nonnegative.

A seminorm on a vector space $X$ is a real-valued function $p$ on $X$ such that

$$p(x+y) \leq p(x)+p(y), \quad p(\alpha x)=|\alpha|p(x)$$

for all $x,y \in X$ and scalar $\alpha$.

Obviously a seminorm is also a sublinear functional. For the connection between norm and seminorm, one shall note that *$p$ is a norm if and only if it satisfies $p(x) \neq 0$ if $x \neq 0$.*

Dominated extension theorems are the results that will be covered in this post. Generally speaking, we are able to extend a functional defined on a subspace to the whole space as long as it’s dominated by a sublinear functional. This is similar in spirit to the dominated convergence theorem, which states that if a convergent sequence of measurable functions is dominated by an integrable function, then the limit may be taken under the integral sign.

(Hahn-Banach) Suppose

- $M$ is a subspace of a real vector space $X$,
- $f: M \to \mathbb{R}$ is linear and $f(x) \leq p(x)$ on $M$ where $p$ is a sublinear functional on $X$.

Then there exists a linear $\Lambda: X \to \mathbb{R}$ such that

$$\Lambda(x)=f(x)$$

for all $x \in M$ and

$$-p(-x) \leq \Lambda(x) \leq p(x)$$

for all $x \in X$.

In other words, if $f$ is dominated by a sublinear functional, then we are able to extend it to the whole space while keeping its values within a controlled range.

*Proof.* If $M=X$ we have nothing to do. So suppose now $M$ is a proper subspace of $X$. Choose $x_1 \in X-M$ and define

$$M_1=\{x+tx_1:x \in M,t \in \mathbb{R}\}.$$

It’s easy to verify that $M_1$ satisfies all axioms of vector space (warning again: no topology is endowed). Now we will be using the properties of sublinear functionals.

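Since

$$f(x)+f(y)=f(x+y) \leq p(x+y) \leq p(x-x_1)+p(x_1+y)$$

for all $x,y \in M$, we have

$$f(x)-p(x-x_1) \leq p(y+x_1)-f(y).$$

Let

$$\alpha=\sup_{x \in M}\{f(x)-p(x-x_1)\}.$$

By definition, we naturally get

$$f(x)-\alpha \leq p(x-x_1)$$

and

$$f(y)+\alpha \leq p(y+x_1).$$

Define $f_1$ on $M_1$ by

$$f_1(x+tx_1)=f(x)+t\alpha.$$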

So when $x +tx_1 \in M$, we have $t=0$, and therefore $f_1=f$.

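To show that $f_1 \leq p$ on $M_1$, note that for $t>0$, we have

$$f\left(\frac{x}{t}\right)-\alpha \leq p\left(\frac{x}{t}-x_1\right),$$

which implies

$$f_1(x-tx_1)=f(x)-t\alpha \leq p(x-tx_1).$$

Similarly,

$$f\left(\frac{y}{t}\right)+\alpha \leq p\left(\frac{y}{t}+x_1\right),$$

and therefore

$$f_1(y+tx_1)=f(y)+t\alpha \leq p(y+tx_1).$$

Hence $f_1 \leq p$.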
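The one-step extension can be traced in a toy case. In the sketch below, everything concrete is an assumption chosen for illustration: $X=\mathbb{R}^2$, $M=\{(s,0)\}$, $f(s,0)=s$, $p$ the $\ell^1$ norm and $x_1=(0,1)$. The supremum defining $\alpha$ is taken over an integer grid, which happens to give the exact value in this example.

```python
def p(v):
    """Sublinear functional on R^2: here the l^1 norm (an illustrative choice)."""
    return abs(v[0]) + abs(v[1])


def f(s):
    """Linear functional on M = {(s, 0)}, dominated by p on M."""
    return s


x1 = (0, 1)               # a vector outside M
grid = range(-10, 11)     # integer samples, so all arithmetic is exact

# alpha = sup over x in M of f(x) - p(x - x1), taken over the grid
alpha = max(f(s) - p((s - x1[0], 0 - x1[1])) for s in grid)
# the competing bound: inf over y in M of p(y + x1) - f(y)
upper = min(p((s + x1[0], 0 + x1[1])) - f(s) for s in grid)


def f1(s, t):
    """Extension to M_1 = M + R x1, defined by f1(x + t x1) = f(x) + t alpha."""
    return f(s) + t * alpha


dominated = all(f1(s, t) <= p((s, t)) for s in grid for t in grid)
print(alpha, upper, dominated)
```

Any value between `alpha` and `upper` would give a dominated extension; the proof simply picks the supremum.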

It seems that we can never stop using step 1 to extend $M$ to a larger space, but we have to extend. (If $X$ is a finite dimensional space, then this is merely a linear algebra problem.) This meets exactly what William Timothy Gowers said in his blog post:

If you are building a mathematical object in stages and find that (i) you have not finished even after infinitely many stages, and (ii) there seems to be nothing to stop you continuing to build, then Zorn’s lemma may well be able to help you.

— How to use Zorn’s lemma

And we will show that, as W. T. Gowers said,

If the resulting partial order satisfies the chain condition and if a maximal element must be a structure of the kind one is trying to build, then the proof is complete.

To apply Zorn’s lemma, we need to construct a partially ordered set. Let $\mathscr{P}$ be the collection of all ordered pairs $(M',f')$ where $M'$ is a subspace of $X$ containing $M$ and $f'$ is a linear functional on $M'$ that extends $f$ and satisfies $f' \leq p$ on $M'$. For example we have

$$(M,f) \in \mathscr{P}, \quad (M_1,f_1) \in \mathscr{P}.$$

The partial order $\leq$ is defined as follows. By $(M’,f’) \leq (M’’,f’’)$, we mean $M’ \subset M’’$ and $f’ = f’’$ on $M’$. Obviously this is a partial order (you should be able to check this).

Suppose now $\mathcal{F}$ is a chain (a totally ordered subset of $\mathscr{P}$). We claim that $\mathcal{F}$ has an upper bound (which is required by Zorn’s lemma). Let

$$M_0=\bigcup_{(M',f') \in \mathcal{F}}M'$$

and

$$f_0(y)=f'(y)$$

whenever $(M',f') \in \mathcal{F}$ and $y \in M'$. It’s easy to verify that $(M_0,f_0)$ is the upper bound we are looking for. But $\mathcal{F}$ is arbitrary, therefore by Zorn’s lemma, there exists a maximal element $(M^\ast,f^\ast)$ in $\mathscr{P}$. If $M^\ast \neq X$, according to step 1, we are able to extend $M^\ast$, which contradicts the maximality of $M^\ast$. Hence $M^\ast=X$, and $\Lambda$ is defined to be $f^\ast$. By the linearity of $\Lambda$, we see

$$-p(-x) \leq -\Lambda(-x)=\Lambda(x).$$

The theorem is proved. $\square$

This is a classic application of Zorn’s lemma (or the well-ordering principle, or the Hausdorff maximality theorem). First, we showed that we are able to extend $M$ and $f$. But since we do not know the dimension or other properties of $X$, it’s not easy to control the extension which finally ‘converges’ to $(X,\Lambda)$. However, Zorn’s lemma saved us from this random exploration: whatever happens, the maximal element is there, and we take it to finish the proof.

Since an inequality appears in the theorem above, we need a more careful validation when the scalars are complex.

(Bohnenblust-Sobczyk-Soukhomlinoff) Suppose $M$ is a subspace of a vector space $X$, $p$ is a seminorm on $X$, and $f$ is a linear functional on $M$ such that

$$|f(x)| \leq p(x)$$

for all $x \in M$. Then $f$ extends to a linear functional $\Lambda$ on $X$ satisfying

$$|\Lambda(x)| \leq p(x)$$

for all $x \in X$.

*Proof.* If the scalar field is $\mathbb{R}$, then we are done, since $p(-x)=p(x)$ in this case (can you see why?). So we assume the scalar field is $\mathbb{C}$.

Put $u = \Re f$. By the dominated extension theorem, there is some real-linear functional $U$ such that $U=u$ on $M$ and $U \leq p$ on $X$. And here we have

$$\Lambda(x)=U(x)-iU(ix),$$

where $\Lambda(x)=f(x)$ on $M$.

To show that $|\Lambda(x)| \leq p(x)$ when $\Lambda(x) \neq 0$, by taking $\alpha=\frac{|\Lambda(x)|}{\Lambda(x)}$, we have

$$|\Lambda(x)|=\alpha\Lambda(x)=\Lambda(\alpha x)=U(\alpha x) \leq p(\alpha x)=p(x),$$

since $|\alpha|=1$ and $p(\alpha{x})=|\alpha|p(x)=p(x)$. $\square$

To end this post, we state a beautiful and useful extension of the Hahn-Banach theorem, which is done by R. P. Agnew and A. P. Morse.

(Agnew-Morse) Let $X$ denote a real vector space and $\mathcal{A}$ be a collection of linear maps $A_\alpha: X \to X$ that commute, namely

$$A_\alpha A_\beta=A_\beta A_\alpha$$

for all $A_\alpha,A_\beta \in \mathcal{A}$. Let $p$ be a sublinear functional such that

$$p(A_\alpha x)=p(x) \quad (x \in X)$$

for all $A_\alpha \in \mathcal{A}$. Let $Y$ be a subspace of $X$ on which a linear functional $f$ is defined such that

1. $f(y) \leq p(y)$ for all $y \in Y$.
2. For each mapping $A \in \mathcal{A}$ and $y \in Y$, we have $Ay \in Y$.
3. Under the hypothesis of 2, we have $f(Ay)=f(y)$.

Then $f$ can be extended to $X$ by $\Lambda$ so that $-p(-x) \leq \Lambda(x) \leq p(x)$ for all $x \in X$, and

$$\Lambda(A_\alpha x)=\Lambda(x).$$

To prove this theorem, we need to construct a sublinear functional that dominates $f$. For the whole proof, see *Functional Analysis* by Peter Lax.

- Walter Rudin, *Functional Analysis*
- Peter Lax, *Functional Analysis*
- William Timothy Gowers, *How to use Zorn’s lemma*

*(This section is intended to introduce the background. Feel free to skip if you already know exterior differentiation.)*

There are several useful tools for vector calculus on $\mathbb{R}^3,$ namely gradient, curl, and divergence. It is possible to treat the gradient of a differentiable function $f$ on $\mathbb{R}^3$ at a point $x_0$ as the Fréchet derivative at $x_0$. But it does not work for curl and divergence at all. Fortunately there is another abstraction that works for all of them. It comes from differential forms.

Let $x_1,\cdots,x_n$ be the linear coordinates on $\mathbb{R}^n$ as usual. We define an *algebra* $\Omega^{\ast}$ over $\mathbb{R}$ generated by $dx_1,\cdots,dx_n$ with the following relations:

$$(dx_i)^2=0, \quad dx_idx_j=-dx_jdx_i \quad (i \neq j).$$

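This is a vector space as well, and it’s easy to derive that it has a basis given by

$$1,\quad dx_i,\quad dx_idx_j,\quad dx_idx_jdx_k,\quad \cdots,\quad dx_1\cdots dx_n$$

where $i<j<k$. The $C^{\infty}$ differential *forms* on $\mathbb{R}^n$ are defined to be the tensor product

$$\Omega^{\ast}(\mathbb{R}^n)=\{C^{\infty}\text{ functions on }\mathbb{R}^n\} \otimes_{\mathbb{R}} \Omega^{\ast}.$$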

As can be shown, for $\omega \in \Omega^{\ast}(\mathbb{R}^n)$, we have a unique representation by

$$\omega=\sum f_{i_1\cdots i_k}dx_{i_1}\cdots dx_{i_k},$$

and in this case we also say $\omega$ is a $C^{\infty}$ $k$-form on $\mathbb{R}^n$ (for simplicity we also write $\omega=\sum f_Idx_I$). The vector space of all $k$-forms will be denoted by $\Omega^k(\mathbb{R}^n)$. And naturally $\Omega^{\ast}(\mathbb{R}^n)$ is graded since

$$\Omega^{\ast}(\mathbb{R}^n)=\bigoplus_{k=0}^{n}\Omega^k(\mathbb{R}^n).$$
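The defining relations $(dx_i)^2=0$ and $dx_idx_j=-dx_jdx_i$ are easy to play with on a computer. The following Python sketch multiplies basis monomials and lists the basis for a given $n$; representing a monomial $dx_I$ by a tuple of indices and the helper names `wedge` and `basis` are assumptions of this example.

```python
from itertools import combinations


def wedge(I, J):
    """Product of basis monomials dx_I dx_J.

    I and J are tuples of indices. Returns (sign, K) with K increasing,
    or (0, ()) when an index repeats, since (dx_i)^2 = 0.
    """
    idx = list(I) + list(J)
    if len(set(idx)) < len(idx):
        return 0, ()
    sign = 1
    # Bubble sort; each adjacent swap uses dx_i dx_j = -dx_j dx_i.
    for a in range(len(idx)):
        for b in range(len(idx) - 1 - a):
            if idx[b] > idx[b + 1]:
                idx[b], idx[b + 1] = idx[b + 1], idx[b]
                sign = -sign
    return sign, tuple(idx)


def basis(n):
    """All increasing index tuples, i.e. the basis 1, dx_i, dx_i dx_j, ..."""
    return [I for k in range(n + 1) for I in combinations(range(1, n + 1), k)]


print(wedge((1,), (2,)), wedge((2,), (1,)), wedge((1,), (1,)))
print(len(basis(3)))
```

Counting the tuples returned by `basis(n)` gives $2^n$, the dimension of $\Omega^{\ast}$ as a real vector space.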

But if we have $\omega \in \Omega^0(\mathbb{R}^n)$, we see $\omega$ is merely a $C^{\infty}$ function. As taught in a multivariable calculus course, for the differential of $\omega$ we have

$$d\omega=\sum_{i}\frac{\partial \omega}{\partial x_i}dx_i,$$

and it turns out that $d\omega\in\Omega^{1}(\mathbb{R}^n)$. This inspires us to generalize the differential operator $d$:

$$d:\Omega^{k}(\mathbb{R}^n) \to \Omega^{k+1}(\mathbb{R}^n),$$

and $d\omega$ is defined as follows. The case when $k=0$ is defined as usual (just the one above). For $k>0$ and $\omega=\sum f_I dx_I,$ $d\omega$ is defined ‘inductively’ by

$$d\omega=\sum df_I\,dx_I.$$

This $d$ is the so-called *exterior differentiation*, which serves as the ultimate abstract extension of gradient, curl, divergence, etc. If we restrict ourselves to $\mathbb{R}^3$, we see these vector calculus tools come up naturally.

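**Functions**

$$df=\frac{\partial f}{\partial x}dx+\frac{\partial f}{\partial y}dy+\frac{\partial f}{\partial z}dz.$$

**$1$-forms**

$$d(f_1dx+f_2dy+f_3dz)=\left(\frac{\partial f_3}{\partial y}-\frac{\partial f_2}{\partial z}\right)dydz-\left(\frac{\partial f_1}{\partial z}-\frac{\partial f_3}{\partial x}\right)dxdz+\left(\frac{\partial f_2}{\partial x}-\frac{\partial f_1}{\partial y}\right)dxdy.$$

**$2$-forms**

$$d(f_1dydz-f_2dxdz+f_3dxdy)=\left(\frac{\partial f_1}{\partial x}+\frac{\partial f_2}{\partial y}+\frac{\partial f_3}{\partial z}\right)dxdydz.$$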

The calculation is tedious but a nice exercise to understand the definition of $d$ and $\Omega^{\ast}$.
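For readers who prefer to let a machine do the tedious part, here is a minimal Python sketch of $d$ on forms with polynomial coefficients. The data layout is an assumption of this example: a polynomial is a dictionary from exponent tuples to integer coefficients, a form is a dictionary from increasing index tuples $I$ (0-based) to the polynomial $f_I$, and the function names are hypothetical.

```python
def pdiff(poly, i):
    """Partial derivative of a polynomial with respect to the i-th variable."""
    out = {}
    for exps, c in poly.items():
        if exps[i] > 0:
            e = list(exps)
            e[i] -= 1
            key = tuple(e)
            out[key] = out.get(key, 0) + c * exps[i]
    return {k: v for k, v in out.items() if v != 0}


def padd(p, q, sign=1):
    """Polynomial p + sign * q, with zero coefficients dropped."""
    out = dict(p)
    for k, v in q.items():
        out[k] = out.get(k, 0) + sign * v
    return {k: v for k, v in out.items() if v != 0}


def d(form, n):
    """Exterior derivative: d(sum f_I dx_I) = sum_i (df_I/dx_i) dx_i dx_I."""
    out = {}
    for I, f in form.items():
        for i in range(n):
            if i in I:
                continue  # dx_i dx_i = 0
            df = pdiff(f, i)
            if not df:
                continue
            # Moving dx_i past each smaller index of I costs one sign change.
            pos = sum(1 for j in I if j < i)
            sign = -1 if pos % 2 else 1
            J = tuple(sorted(I + (i,)))
            out[J] = padd(out.get(J, {}), df, sign)
    return {J: f for J, f in out.items() if f}


# omega = y dx on R^3, with variables (x, y, z) indexed 0, 1, 2.
omega = {(0,): {(0, 1, 0): 1}}
print(d(omega, 3))            # -dx dy, matching the formula for 1-forms

# A function f = x^2 y z; applying d twice gives the zero form.
func = {(): {(2, 1, 1): 1}}
print(d(d(func, 3), 3))
```

The second printout is the empty dictionary, an instance of $d^2=0$.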

By elementary computation we are also able to show that $d^2\omega=0$ for all $\omega \in \Omega^{\ast}(\mathbb{R}^n)$ (*Hint: $\frac{\partial^2 f}{\partial x_i \partial x_j}=\frac{\partial^2 f}{\partial x_j \partial x_i}$ but $dx_idx_j=-dx_jdx_i$)*. Now we consider a vector field $\overrightarrow{v}=(v_1,v_2)$ on $\mathbb{R}^2$. If $C$ is an arbitrary simple closed smooth curve in $\mathbb{R}^2$, then we expect

$$\oint_C v_1\,dx+v_2\,dy$$

to be $0$. If this happens (note the arbitrariness of $C$), we say $\overrightarrow{v}$ is a conservative field (path independent).

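So when is a field conservative? It happens when there is a function $f$ such that

$$\overrightarrow{v}=\nabla f=\left(\frac{\partial f}{\partial x},\frac{\partial f}{\partial y}\right).$$

This is equivalent to saying that

$$df=v_1\,dx+v_2\,dy.$$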

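If we use $C^{\ast}$ to denote the region enclosed by $C$, by Green’s theorem, we have

$$\oint_C v_1\,dx+v_2\,dy=\iint_{C^{\ast}}\left(\frac{\partial v_2}{\partial x}-\frac{\partial v_1}{\partial y}\right)dx\,dy=\iint_{C^{\ast}}d(v_1\,dx+v_2\,dy)=\iint_{C^{\ast}}d^2f=0.$$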

If you translate what you’ve learned in a multivariable calculus course (path independence) into the language of differential forms, you will see that the set of all conservative fields is precisely the *image* of $d_0:\Omega^0(\mathbb{R}^2) \to \Omega^1(\mathbb{R}^2)$. Also, they are in the *kernel* of the next $d_1:\Omega^1(\mathbb{R}^2) \to \Omega^2(\mathbb{R}^2)$. These $d$’s are naturally homomorphisms, so it’s natural to discuss the *factor group*. But before that, we need some terminology.

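The complex $\Omega^{\ast}(\mathbb{R}^n)$ together with $d$ is called the *de Rham complex* on $\mathbb{R}^n$. Now consider the sequence

$$0 \to \Omega^0(\mathbb{R}^n) \xrightarrow{d_0} \Omega^1(\mathbb{R}^n) \xrightarrow{d_1} \cdots \xrightarrow{d_{n-1}} \Omega^n(\mathbb{R}^n) \to 0.$$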

We say $\omega \in \Omega^k(\mathbb{R}^n)$ is *closed* if $d_k\omega=0$, or equivalently, $\omega \in \ker d_k$. Dually, we say $\omega$ is *exact* if there exists some $\mu \in \Omega^{k-1}(\mathbb{R}^n)$ such that $d\mu=\omega$, that is, $\omega \in \operatorname{im}d_{k-1}$. Of course all $d_k$’s can be written as $d$ but the index makes it easier to understand. Instead of doing integration or differentiation, which is ‘uninteresting’, we are going to discuss the abstract structure of it.

The $k$-th *de Rham cohomology* in $\mathbb{R}^n$ is defined to be the factor space

$$H_{DR}^k(\mathbb{R}^n)=\ker d_k/\operatorname{im}d_{k-1}.$$

As an example, note that by the fundamental theorem of calculus, every $1$-form on $\mathbb{R}$ is exact, therefore $H_{DR}^1(\mathbb{R})=0$.

Since the de Rham complex is a special case of a *differential complex*, and the other features of the de Rham complex play no critical role hereafter, we are going to discuss the algebraic structure of differential complexes directly.

We are going to show that there exists a long exact sequence of cohomology groups once a short exact sequence is given. For convenience, let’s recall here some basic definitions.

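A sequence of vector spaces (or groups)

$$\cdots \to V_{k-1} \xrightarrow{f_{k-1}} V_k \xrightarrow{f_k} V_{k+1} \to \cdots$$

is said to be *exact* if the image of $f_{k-1}$ is the kernel of $f_k$ for all $k$. Sometimes we need to discuss an extremely short one:

$$0 \to A \xrightarrow{f} B \xrightarrow{g} C \to 0.$$

As one can see, $f$ is injective and $g$ is surjective.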

A direct sum of vector spaces $C=\oplus_{k \in \mathbb{Z}}C^k$ is called a *differential complex* if there are homomorphisms

$$\cdots \to C^{k-1} \xrightarrow{d_{k-1}} C^k \xrightarrow{d_k} C^{k+1} \to \cdots$$

such that $d_kd_{k-1}=0$. Sometimes we write $d$ instead of $d_{k}$ since this *differential operator* of $C$ is universal. Therefore we may also say that $d^2=0$. The cohomology of $C$ is the direct sum of vector spaces $H(C)=\oplus_{k \in \mathbb{Z}}H^k(C)$ where

$$H^k(C)=\ker d_k/\operatorname{im}d_{k-1}.$$
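For a finite-dimensional complex, the dimension of each $H^k(C)$ follows from ranks alone, since $\dim\ker d_k=\dim C^k-\operatorname{rank}d_k$. The Python sketch below computes these dimensions with exact rational arithmetic. The sample complex (two nonzero terms with a $3\times 3$ differential of rank $2$) and the helper names are assumptions chosen for illustration.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def cohomology_dims(dims, ds):
    """dims[k] = dim C^k; ds[k] = matrix of d_k : C^k -> C^{k+1}.

    Returns the list of dim H^k = dim ker d_k - rank d_{k-1}.
    """
    ranks = [rank(m) for m in ds]
    out = []
    for k, n in enumerate(dims):
        rank_out = ranks[k] if k < len(ranks) else 0   # rank of d_k
        rank_in = ranks[k - 1] if k >= 1 else 0        # rank of d_{k-1}
        out.append(n - rank_out - rank_in)
    return out


# Sample complex 0 -> R^3 -> R^3 -> 0 whose single differential has rank 2.
d0 = [[-1, 1, 0],
      [0, -1, 1],
      [1, 0, -1]]
print(cohomology_dims([3, 3], [d0]))
```

Here both cohomology spaces are one-dimensional: the kernel of `d0` is spanned by $(1,1,1)$, and its image misses one dimension of the target.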

A map $f: A \to B$ where $A$ and $B$ are differential complexes, is called a *chain map* if we have $fd_A=d_Bf$.

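Now consider a short exact sequence of differential complexes

$$
0 \to A \xrightarrow{f} B \xrightarrow{g} C \to 0
$$

where both $f$ and $g$ are chain maps (this is important). Then there exists a long exact sequence

$$
\cdots \to H^q(A) \xrightarrow{f^{\ast}} H^q(B) \xrightarrow{g^{\ast}} H^q(C) \xrightarrow{d^{\ast}} H^{q+1}(A) \xrightarrow{f^{\ast}} H^{q+1}(B) \to \cdots
$$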

Here, $f^{\ast}$ and $g^{\ast}$ are the naturally induced maps. For $c \in C^q$, $d^{\ast}[c]$ is defined to be the cohomology class $[a]$ where $a \in A^{q+1}$, and that $f(a)=db$, and that $g(b)=c$. The sequence can be described using the two-layer commutative diagram below.

The long exact sequence is actually the purple one (you can see why people call this the zig-zag lemma). This sequence is ‘based on’ the blue diagram, which can be considered naturally as an expansion of the short exact sequence. The method that will be used in the following proof is called diagram-chasing, whose importance has already been described by Professor James Munkres: *master* this. We will be *abusing* the properties of almost every homomorphism and group appearing in this commutative diagram to trace the elements.

First, we give a precise definition of $d^{\ast}$. For a closed $c \in C^q$, by the surjectivity of $g$ (note this sequence is exact), there exists some $b \in B^q$ such that $g(b)=c$. Since $g(db)=d(g(b))=dc=0$, we have $db \in \ker g$ for $db \in B^{q+1}$. By the exactness of the sequence, we see $db \in \operatorname{im}{f}$, that is, there exists some $a \in A^{q+1}$ such that $f(a)=db$. Further, $a$ is closed since

$$
f(da)=d(f(a))=d(db)=d^2b=0
$$

and we already know that $f$ has trivial kernel (which contains $da$).

$d^{\ast}$ is therefore defined by

$$
d^{\ast}[c]=[a]
$$

where $[\cdot]$ means “the cohomology class of”.

But it is expected that $d^{\ast}$ is a well-defined homomorphism. Let $c_q$ and $c_q'$ be two closed elements of $C^q$. To show $d^{\ast}$ is well-defined, we suppose $[c_q]=[c_q']$ (i.e. they are cohomologous). Choose $b_q$ and $b_q'$ so that $g(b_q)=c_q$ and $g(b_q')=c_q'$. Accordingly, we also pick $a_{q+1}$ and $a_{q+1}'$ such that $f(a_{q+1})=db_q$ and $f(a_{q+1}')=db_q'$. By definition of $d^{\ast}$, we need to show that $[a_{q+1}]=[a_{q+1}']$.

Recall the properties of factor groups. $[c_q]=[c_q']$ if and only if $c_q-c_q' \in \operatorname{im}d$. Therefore we can pick some $c_{q-1} \in C^{q-1}$ such that $c_q-c_q'=dc_{q-1}$. Again, by the surjectivity of $g$, there is some $b_{q-1}$ such that $g(b_{q-1})=c_{q-1}$.

Note that

$$
g(b_q-b_q'-db_{q-1})=c_q-c_q'-d(g(b_{q-1}))=c_q-c_q'-dc_{q-1}=0.
$$

Therefore $b_q-b_q'-db_{q-1} \in \ker g = \operatorname{im} f$. We are able to pick some $a_q \in A^{q}$ such that $f(a_q)=b_q-b_q'-db_{q-1}$. But now we have

$$
f(da_q)=d(f(a_q))=d(b_q-b_q'-db_{q-1})=db_q-db_q'=f(a_{q+1})-f(a_{q+1}')=f(a_{q+1}-a_{q+1}').
$$

Since $f$ is injective, we have $da_q=a_{q+1}-a_{q+1}'$, which implies that $a_{q+1}-a_{q+1}' \in \operatorname{im}d$. Hence $[a_{q+1}]=[a_{q+1}']$.

To show that $d^{\ast}$ is a homomorphism, note that $g(b_q+b_q')=c_q+c_q'$ and $f(a_{q+1}+a_{q+1}')=d(b_q+b_q')$. Thus we have

$$
d^{\ast}[c_q+c_q']=[a_{q+1}+a_{q+1}'].
$$

The latter equals $[a_{q+1}]+[a_{q+1}']$ since the canonical map is a homomorphism. Therefore we have

$$
d^{\ast}[c_q+c_q']=d^{\ast}[c_q]+d^{\ast}[c_q'].
$$

Therefore the long sequence exists. It remains to prove exactness. Firstly we need to prove exactness at $H^q(B)$. Pick $[b] \in H^q(B)$. If $[b]=f^{\ast}[a]$ for some closed $a \in A^q$, then $g(f(a))=0$ by exactness of the short sequence. Therefore $g^{\ast}[b]=g^{\ast}[f(a)]=[g(f(a))]=[0]$; hence $\operatorname{im}f^{\ast} \subset \ker g^{\ast}$.

Conversely, suppose now $g^{\ast}[b]=[0]$, we shall show that there exists some $[a] \in H^q(A)$ such that $f^{\ast}[a]=[b]$. Note that $g^{\ast}[b]=[0]$ means $g(b) \in \operatorname{im}d$ where $d$ is the differential operator of $C$ (why?). Therefore there exists some $c_{q-1} \in C^{q-1}$ such that $g(b)=dc_{q-1}$. Pick some $b_{q-1}$ such that $g(b_{q-1})=c_{q-1}$. Then we have

$$
g(b-db_{q-1})=g(b)-d(g(b_{q-1}))=g(b)-dc_{q-1}=0.
$$

Therefore $f(a)=b-db_{q-1}$ for some $a \in A^q$. Note $a$ is closed since

$$
f(da)=d(f(a))=d(b-db_{q-1})=db-d^2b_{q-1}=db=0
$$

and $f$ is injective. $db=0$ since we have

$$
[b] \in H^q(B)=\frac{\ker d}{\operatorname{im}d}.
$$

Furthermore,

$$
f^{\ast}[a]=[f(a)]=[b-db_{q-1}]=[b].
$$

Therefore $\ker g^{\ast} \subset \operatorname{im} f^{\ast}$ as desired.

Now we prove exactness at $H^q(C)$. (Notation:) pick $[c_q] \in H^q(C)$, there exists some $b_q$ such that $g(b_q)=c_q$; choose $a_{q+1}$ such that $f(a_{q+1})=db_q$. Then $d^{\ast}[c_q]=[a_{q+1}]$ by definition.

If $[c_q] \in \operatorname{im}g^{\ast}$, we see $[c_q]=[g(b_q)]=g^{\ast}[b_q]$. But $b_q$ is closed since $[b_q] \in H^q(B)$, we see $f(a_{q+1})=db_q=0$, therefore $d^{\ast}[c_q]=[a_{q+1}]=[0]$ since $f$ is injective. Therefore $\operatorname{im}g^{\ast} \subset \ker d^{\ast}$.

Conversely, suppose $d^{\ast}[c_q]=[0]$. By definition of $H^{q+1}(A)$, there is some $a_q \in A^q$ such that $da_q = a_{q+1}$ (can you see why?). We claim that $b_q-f(a_q)$ is closed and we have $[c_q]=g^{\ast}[b_q-f(a_q)]$.

By direct computation,

$$
d(b_q-f(a_q))=db_q-f(da_q)=db_q-f(a_{q+1})=0.
$$

Meanwhile

$$
g^{\ast}[b_q-f(a_q)]=[g(b_q)-g(f(a_q))]=[g(b_q)]=[c_q].
$$

Therefore $\ker d^{\ast} \subset \operatorname{im}g^{\ast}$. Note that $g(f(a_q))=0$ by exactness.

Finally, we prove exactness at $H^{q+1}(A)$. Pick $\alpha \in H^{q+1}(A)$. If $\alpha \in \operatorname{im}d^{\ast}$, then $\alpha=[a_{q+1}]$ where $f(a_{q+1})=db_q$ by definition. Then

$$
f^{\ast}(\alpha)=[f(a_{q+1})]=[db_q]=[0].
$$

Therefore $\alpha \in \ker f^{\ast}$. Conversely, if we have $f^{\ast}(\alpha)=[0]$, pick a representative element of $\alpha$, namely write $\alpha=[a]$; then $[f(a)]=[0]$. But this implies that $f(a) \in \operatorname{im}d$ where $d$ denotes the differential operator of $B$. Writing $b_{q+1}=f(a) \in B^{q+1}$, there exists some $b_q \in B^q$ such that $db_{q}=b_{q+1}$. Suppose now $c_q=g(b_q)$. $c_q$ is closed since $dc_q=g(db_q)=g(b_{q+1})=g(f(a))=0$. By definition, $\alpha=d^{\ast}[c_q]$. Therefore $\ker f^{\ast} \subset \operatorname{im}d^{\ast}$.

As you may see, almost every property of the diagram has been used. The exactness at $B^q$ ensures that $g(f(a))=0$. The definition of $H^q(A)$ ensures that we can simplify the meaning of $[0]$. We even use the injectivity of $f$ and the surjectivity of $g$.

This proof is also a demonstration of diagram-chasing technique. As you have seen, we keep running through the diagram to ensure that there is “someone waiting” at the destination.

This long exact sequence is useful. Here is an example.

By differential forms on an open set $U \subset \mathbb{R}^n$, we mean

And the de Rham cohomology of $U$ comes up in the nature of things.

We are able to compute the cohomology of the union of two open sets. Suppose $M=U \cup V$ is a manifold with $U$ and $V$ open, and $U \amalg V$ is the disjoint union of $U$ and $V$ (the coproduct in the category of sets). $\partial_0$ and $\partial_1$ are inclusions of $U \cap V$ in $U$ and $V$ respectively. We have a natural sequence of inclusions

Since $\Omega^{*}$ can also be treated as a contravariant functor from the category of Euclidean spaces with smooth maps to the category of commutative differential graded algebras and their homomorphisms, we have

By taking the difference of the last two maps, we have

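The sequence above is a short exact sequence. Therefore we may use the zig-zag lemma to find a long exact sequence (which is also called the Mayer-Vietoris sequence):

$$
\cdots \to H^q(M) \to H^q(U)\oplus H^q(V) \to H^q(U \cap V) \xrightarrow{d^{\ast}} H^{q+1}(M) \to \cdots
$$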

This sequence allows one to compute the cohomology of the union of two open sets. For example, for $H^{*}_{DR}(\mathbb{R}^2-P-Q)$, where $P(x_p,y_p)$ and $Q(x_q,y_q)$ are two distinct points in $\mathbb{R}^2$, we may write

and

Therefore we may write $M=\mathbb{R}^2$, $U=\mathbb{R}^2-P$ and $V=\mathbb{R}^2-Q$. For $U$ and $V$, we have another decomposition by

where

But

is a four-time (homeomorphic) copy of $\mathbb{R}^2$. So things become clear after we compute $H^{\ast}_{DR}(\mathbb{R}^2)$.
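A long exact sequence of finite-dimensional vector spaces that starts and ends with $0$ has vanishing alternating sum of dimensions, which is often all one needs to read off an unknown cohomology group from a Mayer-Vietoris sequence. The sketch below illustrates this bookkeeping. The helper name is hypothetical, and the circle example (two overlapping arcs $U$, $V$ whose intersection has two components) is a standard textbook case assumed here, not one computed in this post.

```python
def missing_dimension(dims):
    """Solve for the single unknown (None) dimension in an exact sequence
    0 -> V_0 -> V_1 -> ... -> V_n -> 0 of finite-dimensional vector spaces,
    using sum_i (-1)^i dim V_i = 0.
    """
    idx = dims.index(None)
    known = sum((-1) ** i * d for i, d in enumerate(dims) if d is not None)
    # (-1)^idx * x + known = 0, and (-1)^idx is its own inverse.
    return -known * (-1) ** idx


# Assumed example: the circle covered by two arcs U and V.
# 0 -> H^0(S^1) -> H^0(U) + H^0(V) -> H^0(U ∩ V) -> H^1(S^1) -> 0
# with dimensions 1, 2, 2 and H^1(S^1) unknown.
h1_circle = missing_dimension([1, 2, 2, None])
```

The alternating-sum trick only determines dimensions; identifying generators still requires chasing the maps as in the proof above.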

- Raoul Bott, Loring W. Tu, *Differential Forms in Algebraic Topology*
- James R. Munkres, *Elements of Algebraic Topology*
- Michael Spivak, *Calculus on Manifolds*
- Serge Lang, *Algebra*

We are going to evaluate the Fourier transform of $\frac{\sin{x}}{x}$ and $\left(\frac{\sin{x}}{x}\right)^2$. It turns out to be a comprehensive application of many elementary theorems on functions of a single complex variable. Thus it is recommended to make sure that you can evaluate and understand all the identities in this post by yourself. Also, make sure that you can recall what all the words in *italics* mean.

For real $t$, find the limit

$$
\lim_{A \to \infty}\int_{-A}^{A}\frac{\sin{x}}{x}e^{itx}dx.
$$

Since $\frac{\sin{x}}{x}e^{itx}\not\in L^1$, we cannot evaluate the integral of it over $\mathbb{R}$ directly since it’s not defined. Instead, for given $A>0$, the integral of it over $[-A,A]$ is defined, and we evaluate this limit to get what we want.

We will do this using contour integration. Since the complex function $f(z)=\frac{\sin{z}}{z}e^{itz}$ is *entire*, by *Cauchy’s theorem*, its integral over $[-A,A]$ is equal to the one over the path $\Gamma_A$ by going from $-A$ to $-1$ along the real axis, from $-1$ to $1$ along the lower half of the unit circle, and from $1$ to $A$ along the real axis (why?). Since the path $\Gamma_A$ avoids the origin, we may use the identity

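Replacing $\sin{z}$ with $\frac{1}{2i}(e^{iz}-e^{-iz})$, we get

$$
I_A(t)=\int_{\Gamma_A}\frac{\sin{z}}{z}e^{itz}dz=\int_{\Gamma_A}\frac{1}{2iz}\left(e^{i(t+1)z}-e^{i(t-1)z}\right)dz.
$$

If we put $\varphi_A(t)=\int_{\Gamma_A}\frac{1}{2iz}e^{itz}dz$, we see $I_A(t)=\varphi_A(t+1)-\varphi_A(t-1)$. It is convenient to divide $\varphi_A$ by $\pi$ since we therefore get

$$
\frac{1}{\pi}\varphi_A(t)=\frac{1}{2\pi i}\int_{\Gamma_A}\frac{e^{itz}}{z}dz
$$

and we are cool with the divisor $2\pi i$.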

Now, finish the path $\Gamma_A$ in two ways. First, by the semicircle from $A$ to $-Ai$ to $-A$; second, by the semicircle from $A$ to $Ai$ to $-A$, which together with the first one makes up a circle of radius $A$. For simplicity we denote the two closed paths by $\Gamma_U$ and $\Gamma_L$ respectively. The first closed path does not wind around the origin, so by Cauchy’s theorem the integral over it is $0$, thus

$$
\frac{1}{\pi}\varphi_A(t)=\frac{1}{2\pi}\int_{-\pi}^{0}\exp(iAte^{i\theta})d\theta.
$$

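Notice that

$$
\left|\exp(iAte^{i\theta})\right|=\exp(-At\sin\theta),
$$

so if $t\sin\theta>0$, we have $|\exp(iAte^{i\theta})| \to 0$ as $A \to \infty$. When $-\pi < \theta <0$, we have $\sin\theta<0$, so this applies when $t<0$. Therefore we get

$$
\lim_{A \to \infty}\frac{1}{\pi}\varphi_A(t)=0 \quad (t<0).
$$

(You should be able to prove the convergence above.) Also trivially

$$
\frac{1}{\pi}\varphi_A(0)=\frac{1}{2\pi}\int_{-\pi}^{0}d\theta=\frac{1}{2}.
$$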

But what if $t>0$? Indeed, it would be difficult to obtain the limit using the integral over $[-\pi,0]$. But we have another path, namely the upper one.

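Note that $\frac{e^{itz}}{z}$ is a *meromorphic function* in $\mathbb{C}$ with a pole at $0$. For such a function we have

$$
\frac{e^{itz}}{z}=\frac{1}{z}\sum_{n=0}^{\infty}\frac{(itz)^n}{n!}=\frac{1}{z}+it+\frac{(it)^2}{2!}z+\cdots
$$

which implies that the residue at $0$ is $1$. By the *residue theorem*,

$$
\frac{1}{2\pi i}\int_{\Gamma_L}\frac{e^{itz}}{z}dz=\frac{1}{\pi}\varphi_A(t)+\frac{1}{2\pi}\int_{0}^{\pi}\exp(iAte^{i\theta})d\theta=1\cdot\operatorname{Ind}_{\Gamma_L}(0)=1.
$$

Note that we have used the *change-of-variable* formula as we did for the other semicircle. $\operatorname{Ind}_{\Gamma_L}(0)$ denotes the *winding number* of $\Gamma_L$ around $0$, which is $1$ of course. The identity above implies

$$
\frac{1}{\pi}\varphi_A(t)=1-\frac{1}{2\pi}\int_{0}^{\pi}\exp(iAte^{i\theta})d\theta.
$$

Thus if $t>0$, since $\sin\theta>0$ when $0<\theta<\pi$, we get

$$
\lim_{A \to \infty}\frac{1}{\pi}\varphi_A(t)=1 \quad (t>0).
$$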

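But as already shown, $I_A(t)=\varphi_A(t+1)-\varphi_A(t-1)$. To conclude,

$$
\lim_{A \to \infty}I_A(t)=
\begin{cases}
\pi & |t|<1, \\
\frac{\pi}{2} & |t|=1, \\
0 & |t|>1.
\end{cases}
$$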
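The limit of $I_A(t)$ can be sanity-checked numerically: it should be close to $\pi$ for $|t|<1$, to $\frac{\pi}{2}$ for $|t|=1$, and to $0$ for $|t|>1$. The following is a rough sketch, not part of the contour argument. It assumes a fixed cutoff $A=200$ and composite Simpson quadrature (both arbitrary choices), and uses the fact that the imaginary part of the integrand is odd, so only the cosine part contributes.

```python
import math


def truncated_transform(t, A=200.0, n=40000):
    """Simpson approximation of I_A(t) = int_{-A}^{A} (sin x / x) e^{itx} dx.

    The integrand's real part is even and its imaginary part is odd, so
    I_A(t) = 2 * int_0^A (sin x / x) cos(t x) dx. n must be even.
    """
    def integrand(x):
        sinc = 1.0 if x == 0.0 else math.sin(x) / x
        return sinc * math.cos(t * x)

    h = A / n
    total = integrand(0.0) + integrand(A)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return 2.0 * total * h / 3.0


# Sample points inside, on the boundary of, and outside the interval (-1, 1).
samples = {t: truncated_transform(t) for t in (0.0, 0.5, 1.0, 2.0, -3.0)}
```

Since the convergence in $A$ is only of order $1/A$, the computed values agree with the limits to about two decimal places, not more.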

Since $\psi(x)=\frac{\sin{x}}{x}$ is even, by dividing the limit of $I_A$ by $\sqrt{2\pi}$, we actually obtain the *Fourier transform* of it by abuse of language. Therefore we also get

$$
\hat\psi(t)=
\begin{cases}
\sqrt{\frac{\pi}{2}} & |t|<1, \\
\frac{1}{2}\sqrt{\frac{\pi}{2}} & |t|=1, \\
0 & |t|>1.
\end{cases}
$$

Note that $\hat\psi(t)$ is not continuous, let alone uniformly continuous. ‘Therefore’, $\psi(x) \notin L^1$. The reason is, if $f \in L^1$, then $\hat{f}$ is *uniformly continuous* (proof). Another interesting fact is that this also gives the value of the Dirichlet integral, since we have

$$
\int_{-\infty}^{\infty}\frac{\sin{x}}{x}dx=\lim_{A \to \infty}I_A(0)=\pi.
$$

We end this section by evaluating the inverse of $\hat\psi(t)$. This requires a simple calculation.

For real $t$, compute

$$
J(t)=\int_{-\infty}^{\infty}\left(\frac{\sin{x}}{x}\right)^2e^{itx}dx.
$$

Now since $h(x)=\frac{\sin^2{x}}{x^2} \in L^1$, we are able to say with ease that the integral above is the Fourier transform of $h(x)$. But still we will be using the limit form by

where

And we are still using the contour integration as above (keep $\Gamma_A$, $\Gamma_U$ and $\Gamma_L$ in mind!). For this we get

Therefore it suffices to discuss the function

since we have

Multiplying $\mu_A(t)$ by $\frac{1}{\pi i}$, we see

An integration of $\frac{e^{itz}}{z^2}$ over $\Gamma_L$ gives

Since we still have

if $t<0$ in this case, $\frac{1}{\pi i}\mu_A(t) \to 0$ as $A \to \infty$. For $t>0$, integrating along $\Gamma_U$, we have

We can also evaluate $\mu_A(0)$ by computing the integral but we are not doing that. To conclude, we have

Therefore for $J_A$ we have

Now you may ask, how did you find the value at $0$, $2$ or $-2$? $\mu_A(0)$ is not evaluated. But since $h \in L^1$, $\hat{h}(t)=\sqrt{\frac{1}{2\pi}}J(t)$ is uniformly continuous, thus continuous, and the values at these points follow from continuity.
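As a numerical cross-check of the continuity argument, one can compare a truncated version of $J(t)$ with the classical closed form $\pi\left(1-\frac{|t|}{2}\right)$ for $|t| \leq 2$ and $0$ otherwise. That closed form is quoted here as a known fact about the transform of $\left(\frac{\sin x}{x}\right)^2$, not taken from the derivation above. The cutoff $A=400$ and the Simpson rule are arbitrary choices for this sketch.

```python
import math


def truncated_J(t, A=400.0, n=80000):
    """Simpson approximation of int_{-A}^{A} (sin x / x)^2 e^{itx} dx.

    By symmetry this equals 2 * int_0^A (sin x / x)^2 cos(t x) dx. n must be even.
    """
    def integrand(x):
        sinc = 1.0 if x == 0.0 else math.sin(x) / x
        return sinc * sinc * math.cos(t * x)

    h = A / n
    total = integrand(0.0) + integrand(A)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return 2.0 * total * h / 3.0


def triangle(t):
    """Assumed closed form: pi * (1 - |t|/2) on [-2, 2], zero elsewhere."""
    return math.pi * max(0.0, 1.0 - abs(t) / 2.0)


# Largest deviation over a few sample points, including the corners t = 0 and t = 2.
errors = [abs(truncated_J(t) - triangle(t)) for t in (0.0, 1.0, 2.0, 3.0)]
```

Because the integrand is dominated by $1/x^2$, the truncation error is at most $2/A$, so the agreement here is noticeably better than for $\frac{\sin x}{x}$.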

Again, we get the value of a classic improper integral by

And this time it’s not hard to find the Fourier inverse:

Thereafter you are able to evaluate the improper integral of $\left(\frac{\sin{x}}{x}\right)^n$. Using *Fubini’s* or *Tonelli’s* theorem is not a good idea. But using the contour integral as such will force you to deal with $n$ binomial coefficients, which might still be tedious. It’s even possible to discuss the convergence of the sequence $(I_n)$ where

This post is intended to establish the existence of the Lebesgue measure in the future, which is often denoted by $m$. In fact, the Lebesgue measure follows as a special case of the R-M-K representation theorem. You may not believe it, but the Euclidean properties of $\mathbb{R}^k$ play no role in the existence of $m$. The only topological property that matters is the fact that $\mathbb{R}^k$ is a locally compact Hausdorff space.

The theorem is named after F. Riesz, who introduced it for continuous functions on $[0,1]$ (with respect to the Riemann-Stieltjes integral). Years later, after the generalization done by A. Markov and S. Kakutani, we are able to view it in a locally compact Hausdorff space.

You may find that some properties are stated in more generality than strictly needed, but this is intended to let you get more out of the material along the way (there are some tools related to differential geometry). There are also many topology and analysis tricks worth your attention.

Again, euclidean topology plays no role in this proof. We need to specify the topology for different reasons. This is similar to what we do in linear functional analysis. Throughout, let $X$ be a topological space.

**0.0 Definition.** $X$ is a *Hausdorff space* if the following is true: If $p \in X$, $q\in X$ but $p \neq q$, then there are two **disjoint** open sets $U$ and $V$ such that $p \in U$ and $q \in V$.

**0.1 Definition.** $X$ is *locally compact* if every point of $X$ has a neighborhood whose closure is compact.

**0.2 Remarks.** A Hausdorff space is also called a $T_2$ space (see Kolmogorov classification) or a separated space. There is a classic example of locally compact Hausdorff space: $\mathbb{R}^n$. It is trivial to verify this. But this is far from being enough. In the future we will see, we can construct some ridiculous but mathematically valid measures.

**0.3 Definition.** A set $E \subset X$ is called *$\sigma$-compact* if $E$ is a countable union of compact sets. Note that every open subset in a Euclidean space $\mathbb{R}^n$ is $\sigma$-compact since it can always be written as a countable union of closed balls (which are compact).

**0.4 Definition.** A covering of $X$ is *locally finite* if every point has a neighborhood which intersects only finitely many elements of the covering. Of course, if the covering is already finite, it’s also locally finite.

**0.5 Definition.** A *refinement* of a covering of $X$ is a second covering, each element of which is contained in an element of the first covering.

**0.6 Definition.** $X$ is *paracompact* if it is Hausdorff, and every open covering has a locally finite open refinement. Obviously any compact space is paracompact.

**0.7 Theorem.** If $X$ is a second countable Hausdorff space and is locally compact, then $X$ is paracompact. For proof, see this [Theorem 2.6].

**0.8 Theorem.** If $X$ is locally compact and $\sigma$-compact, then $X=\bigcup_{i=1}^{\infty}K_i$ where for all $i \in \mathbb{N}$, $K_i$ is compact and $K_i \subset\operatorname{int}K_{i+1}$.

The basic technical tool in the theory of differential manifolds is the existence of a partition of unity. We will steal this tool for the application of analysis theory.

**1.0 Definition.** A **partition of unity** on $X$ is a collection $(g_i)$ of continuous real valued functions on $X$ such that

- $g_i \geq 0$ for each $i$.
- every $x \in X$ has a neighborhood $U$ such that $U \cap \operatorname{supp}(g_i)=\varnothing$ for all but finitely many of $g_i$.
- for each $x \in X$, we have $\sum_{i}g_i(x)=1$. (That’s why you see the word ‘unity’.)
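To make the three conditions concrete, here is a minimal sketch on $X=\mathbb{R}$ using 'tent' functions $g_i(x)=\max(0,1-|x-i|)$ for $i \in \mathbb{Z}$. This family is a hypothetical example chosen for illustration. Each $g_i$ is nonnegative, at most two of them are nonzero near any point, and they sum to $1$ everywhere.

```python
import math


def tent(i, x):
    """g_i(x) = max(0, 1 - |x - i|): continuous, nonnegative, supported on [i-1, i+1]."""
    return max(0.0, 1.0 - abs(x - i))


def active_indices(x):
    """Local finiteness: only g_floor(x) and g_(floor(x)+1) can be nonzero at x."""
    base = math.floor(x)
    return [base, base + 1]


def total(x):
    """Sum of all g_i at x; only the active indices contribute."""
    return sum(tent(i, x) for i in active_indices(x))


sums = [total(x) for x in (-1.25, 0.0, 0.5, 2.3, 7.0)]
```

Since $\operatorname{supp}(g_i)=[i-1,i+1] \subset (i-\frac{3}{2},i+\frac{3}{2})$, this family sits inside the open intervals $(i-\frac{3}{2},i+\frac{3}{2})$, which cover $\mathbb{R}$.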

**1.1 Definition.** A partition of unity $(g_i)$ on $X$ is *subordinate* to an open cover of $X$ if and only if for each $g_i$ there is an element $U$ of the cover such that $\operatorname{supp}(g_i) \subset U$. We say $X$ *admits* partitions of unity if and only if for every open cover of $X$, there exists a partition of unity subordinate to the cover.

**1.2 Theorem.** A Hausdorff space admits a partition of unity if and only if it is paracompact (the ‘only if’ part is by considering the definition of partition of unity. For the ‘if’ part, see here). As a corollary, we have:

**1.3 Corollary.** Suppose $V_1,\cdots,V_n$ are open subsets of a locally compact Hausdorff space $X$, $K$ is compact, and

$$
K \subset V_1 \cup V_2 \cup \cdots \cup V_n.
$$

Then there exists a partition of unity $(h_i)$ that is subordinate to the cover $(V_i)$ such that $\operatorname{supp}(h_i) \subset V_i$ and $\sum_{i=1}^{n}h_i(x)=1$ for all $x \in K$.

**2.0 Notation.** The notation

$$
K \prec f
$$

will mean that $K$ is a compact subset of $X$, that $f \in C_c(X)$, that $f(X) \subset [0,1]$, and that $f(x)=1$ for all $x \in K$. The notation

$$
f \prec V
$$

will mean that $V$ is open, that $f \in C_c(X)$, that $f(X) \subset [0,1]$ and that $\operatorname{supp}(f) \subset V$. If both hold, we write

$$
K \prec f \prec V.
$$

**2.1 Remarks.** Clearly, with this notation, we are able to simplify the statement of being subordinate. We merely need to write $g_i \prec U$ in 1.1 instead of $\operatorname{supp}(g_i) \subset U$.

**2.2 Urysohn’s Lemma for locally compact Hausdorff space.** Suppose $X$ is locally compact and Hausdorff, $V$ is open in $X$ and $K \subset V$ is a compact set. Then there exists an $f \in C_c(X)$ such that

$$
K \prec f \prec V.
$$

**2.3 Remarks.** By $f \in C_c(X)$ we shall mean $f$ is a continuous function with a compact support. This relation also says that $\chi_K \leq f \leq \chi_V$. For more details and the proof, visit this page. This lemma is generally for normal space, for a proof on that level, see arXiv:1910.10381. (Question: why we consider two disjoint closed subsets thereafter?)
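On the real line, a function with $K \prec f \prec V$ can be written down by hand. The sketch below takes $K=[0,1]$ and $V=(-1,2)$ (hypothetical choices for illustration) and uses a piecewise linear $f$ whose support is $[-\frac{1}{2},\frac{3}{2}] \subset V$.

```python
def f(x):
    """Piecewise linear bump: equals 1 on K = [0, 1], vanishes outside
    [-0.5, 1.5], and is linear in between, so chi_K <= f <= chi_V for V = (-1, 2).
    """
    return max(0.0, min(1.0, 2.0 * (x + 0.5), 2.0 * (1.5 - x)))


# Sample values on K, in the transition zones, and outside the support.
values = {x: f(x) for x in (-0.75, -0.5, -0.25, 0.0, 0.5, 1.0, 1.25, 1.5, 1.9)}
```

In a general locally compact Hausdorff space no such explicit formula is available, which is exactly why the lemma is needed.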

We will be using the $\varepsilon$-definitions of $\sup$ and $\inf$, which make the proof easier in this case, but can be troublesome if you have not seen them before. So we put them down here.

Let $S$ be a nonempty subset of the real numbers that is bounded below. A lower bound $w$ of $S$ is the infimum of $S$ if and only if for any $\varepsilon>0$, there exists an element $x_\varepsilon \in S$ such that $x_\varepsilon<w+\varepsilon$.

This definition of $\inf$ is equivalent to the if-then definition by

Let $S$ be a set that is bounded below. We say $w=\inf S$ when $w$ satisfies the following condition.

- $w$ is a lower bound of $S$.
- If $t$ is also a lower bound of $S$, then $t \leq w$.

We have the analogous definition for $\sup$.
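A tiny worked instance of the $\varepsilon$-definition: take $S=\{1/n : n \geq 1\}$, whose infimum is $w=0$. For each $\varepsilon>0$ we must exhibit some $x_\varepsilon \in S$ with $x_\varepsilon<w+\varepsilon$. The set $S$ and the helper below are hypothetical illustrations, using exact rational arithmetic to avoid rounding issues.

```python
from fractions import Fraction
import math


def witness(eps):
    """For S = {1/n : n >= 1} and w = 0, return an element x of S with x < w + eps.

    Choosing n = floor(1/eps) + 1 guarantees n > 1/eps, hence 1/n < eps.
    """
    eps = Fraction(eps)
    n = math.floor(1 / eps) + 1
    return Fraction(1, n)


checks = [(eps, witness(eps)) for eps in (Fraction(1, 10), Fraction(3, 7), Fraction(5, 2))]
```

No element of $S$ is below $0$, so $0$ is a lower bound; the witnesses show that no larger number is, which is the second half of the if-then definition.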

Analysis is full of vector spaces and linear transformations. We already know that the Lebesgue integral induces a linear functional. That is, for example, $L^1([0,1])$ is a vector space, and we have a linear functional

$$
f \mapsto \int_{[0,1]}f\,dm.
$$

But what about the reverse? Given a linear functional, is it guaranteed that we have a measure to establish the integral? The R-M-K theorem answers this question affirmatively. The functional to be discussed is *positive*, which means that if $\Lambda$ is positive and $f(X) \subset [0,\infty)$, then $\Lambda{f} \in [0,\infty)$.

Let $X$ be a locally compact Hausdorff space, and let $\Lambda$ be a positive linear functional on $C_c(X)$. Then there exists a $\sigma$-algebra $\mathfrak{M}$ on $X$ which contains all Borel sets in $X$, and there exists a unique positive measure $\mu$ on $\mathfrak{M}$ which represents $\Lambda$ in the sense that

$$
\Lambda{f}=\int_X f d\mu
$$

for all $f \in C_c(X)$.

For the measure $\mu$ and the $\sigma$-algebra $\mathfrak{M}$, we have four assertions:

- $\mu(K)<\infty$ for every compact set $K \subset X$.
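- For every $E \in \mathfrak{M}$, we have

$$
\mu(E)=\inf\{\mu(V):E \subset V, V\text{ open}\}.
$$

- For every open set $E$, and for every $E \in \mathfrak{M}$ with $\mu(E)<\infty$, we have

$$
\mu(E)=\sup\{\mu(K):K \subset E, K\text{ compact}\}.
$$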

- If $E \in \mathfrak{M}$, $A \subset E$, and $\mu(E)=0$, then $A \in \mathfrak{M}$.

**Remarks before proof.** It would be great if we can establish the Lebesgue measure $m$ by putting $X=\mathbb{R}^n$. But we need a little more extra work to get this result naturally. If 2 is satisfied, we say $\mu$ is *outer* regular, and *inner* regular for 3. If both hold, we say $\mu$ is *regular*. The partition of unity and Urysohn’s lemma will be heavily used in the proof of the main theorem, so make sure you have no problem with it.

The proof is rather long so we will split it into several steps. I will try my best to make every line clear enough.

For every open set $V \subset X$, define

$$
\mu(V)=\sup\{\Lambda{f}:f \prec V\}.
$$

If $V_1 \subset V_2$ and both are open, we claim that $\mu(V_1) \leq \mu(V_2)$. For $f \prec V_1$, since $\operatorname{supp}f \subset V_1 \subset V_2$, we see $f \prec V_2$. But we are able to find some $g \prec V_2$ such that $g \geq f$, or more precisely, $\operatorname{supp}(g) \supset \operatorname{supp}(f)$. By taking another look at the proof of Urysohn’s lemma for locally compact Hausdorff space, we see there is an open set G with compact closure such that

By Urysohn’s lemma to the pair $(\overline{G},V_2)$, we see there exists a function $g \in C_c(X)$ such that

Therefore

Thus for any $f \prec V_1$ and such a $g \prec V_2$, we have $\Lambda{g} \geq \Lambda{f}$ (monotonic) since $\Lambda{g}-\Lambda{f}=\Lambda{(g-f)}\geq 0$. By taking the supremum over $f$ and $g$, we see

$$
\mu(V_1) \leq \mu(V_2).
$$

The ‘monotonic’ property of such $\mu$ enables us to *define* $\mu(E)$ for all $E \subset X$ by

$$
\mu(E)=\inf\{\mu(V):E \subset V, V\text{ open}\}.
$$

For open sets, this definition trivially agrees with the previous one. Sometimes people say $\mu$ is the outer measure. We will discuss other kinds of sets thoroughly in the following steps. Warning: we are not saying that $\mathfrak{M} = 2^X$. The crucial property of $\mu$, namely countable additivity, will be proved only on a certain $\sigma$-algebra.

It follows from the definition of $\mu$ that if $E_1 \subset E_2$, then $\mu(E_1) \leq \mu(E_2)$.

Let $\mathfrak{M}_F$ be the class of all $E \subset X$ which satisfy the two following conditions:

$\mu(E) <\infty$.

‘Inner regular’:

$$
\mu(E)=\sup\{\mu(K):K \subset E, K\text{ compact}\}.
$$

One may say here $\mu$ is the ‘inner measure’. Finally, let $\mathfrak{M}$ be the class of all $E \subset X$ such that for every compact $K$, we have $E \cap K \in \mathfrak{M}_F$. We shall show that $\mathfrak{M}$ is the desired $\sigma$-algebra.

**Remarks of Step 0.** So far, we have only proved that $\mu(E) \geq 0$ for all $E \subset X$. What about the countable additivity? It’s clear that $\mathfrak{M}_F$ and $\mathfrak{M}$ have some strong relation. We need to get a clearer view of it. Also, if we restrict $\mu$ to $\mathfrak{M}_F$, we restrict ourselves to finite numbers. In fact, we will finally show that $\mathfrak{M}_F \subset \mathfrak{M}$.

If $K$ is compact, then $K \in \mathfrak{M}_F$, and

$$
\mu(K)=\inf\{\Lambda{f}:K \prec f\}.
$$

Define $V_\alpha=f^{-1}((\alpha,1])$ for $K \prec f$ and $0 < \alpha < 1$. Since $f(x)=1$ for all $x \in K$, we have $K \subset V_{\alpha}$. Therefore by definition of $\mu$ for all $E \subset X$, we have

$$
\mu(K) \leq \mu(V_\alpha)=\sup\{\Lambda{g}:g \prec V_\alpha\} \leq \frac{1}{\alpha}\Lambda{f}.
$$

Note that $f \geq \alpha{g}$ whenever $g \prec V_{\alpha}$, since $\alpha{g} \leq \alpha < f$ on $V_\alpha$ and $g$ vanishes elsewhere. Since $\mu(K)$ is a lower bound of $\frac{1}{\alpha}\Lambda{f}$ with $0<\alpha<1$, letting $\alpha \to 1$ we see

$$
\mu(K) \leq \Lambda{f}.
$$

Since $f(X) \subset [0,1]$, we have $\Lambda{f}$ to be finite. Namely $\mu(K) <\infty$. Since $K$ itself is compact, we see $K \in \mathfrak{M}_F$.

To prove the identity, note that for any $\varepsilon>0$ there exists some open $V \supset K$ such that $\mu(V)<\mu(K)+\varepsilon$. By Urysohn’s lemma, there exists some $h \in C_c(X)$ such that $K \prec h \prec V$. Therefore

$$
\Lambda{h} \leq \mu(V)<\mu(K)+\varepsilon.
$$

Therefore $\mu(K)$ is the infimum of $\Lambda{h}$ with $K \prec h$.

**Remarks of Step 1.** We have just proved assertion 1 of the property of $\mu$. The hardest part of this proof is the inequality

But this is merely the $\varepsilon$-definition of $\inf$. Note that $\mu(K)$ is the infimum of $\mu(V)$ with $V \supset K$. For any $\varepsilon>0$, there exists some open $V$ for what? Under certain conditions, this definition is much easier to use. Now we will examine the relation between $\mathfrak{M}_F$ and $\tau_X$, namely the topology of $X$.

$\mathfrak{M}_F$ contains every open set $V$ with $\mu(V)<\infty$.

It suffices to show that for an open set $V$, we have

$$
\mu(V)=\sup\{\mu(K):K \subset V, K\text{ compact}\}.
$$

For $0<\varepsilon<\mu(V)$, we see there exists an $f \prec V$ such that $\Lambda{f}>\mu(V)-\varepsilon$. If $W$ is any open set which contains $K= \operatorname{supp}(f)$, then $f \prec W$, and therefore $\Lambda{f} \leq \mu(W)$. Again by definition of $\mu(K)$, we see

$$
\Lambda{f} \leq \mu(K).
$$

Therefore

$$
\mu(V)-\varepsilon<\Lambda{f} \leq \mu(K) \leq \mu(V).
$$

This is exactly the definition of $\sup$. The identity is proved.

**Remarks of Step 2.** It’s important to note that this identity can only be satisfied by open sets and sets $E$ with $\mu(E)<\infty$, the latter of which will be proved in the following steps. This is the *flaw* of this theorem. With these preparations however, we are able to show the countable additivity of $\mu$ on $\mathfrak{M}_F$.

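If $E_1,E_2,E_3,\cdots$ are arbitrary subsets of $X$, then

$$
\mu\left(\bigcup_{i=1}^{\infty}E_i\right) \leq \sum_{i=1}^{\infty}\mu(E_i).
$$

First we show this holds for finitely many open sets. This is tantamount to showing that

$$
\mu(V_1 \cup V_2) \leq \mu(V_1)+\mu(V_2)
$$

if $V_1$ and $V_2$ are open. Pick $g \prec V_1 \cup V_2$. This is possible due to Urysohn’s lemma. By corollary 1.3, applied to the compact set $\operatorname{supp}(g)$, there is a partition of unity $(h_1,h_2)$ subordinate to $(V_1,V_2)$ in the sense of corollary 1.3. Therefore,

$$
\Lambda{g}=\Lambda(h_1g)+\Lambda(h_2g) \leq \mu(V_1)+\mu(V_2).
$$

Notice that $h_1g \prec V_1$ and $h_2g \prec V_2$. By taking the supremum over $g$, we have

$$
\mu(V_1 \cup V_2) \leq \mu(V_1)+\mu(V_2).
$$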

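Now we go back to arbitrary subsets of $X$. If $\mu(E_i)=\infty$ for some $i$, then there is nothing to prove. Therefore we shall assume that $\mu(E_i)<\infty$ for all $i$. By definition of $\mu(E_i)$, we see there are open sets $V_i \supset E_i$ such that

$$
\mu(V_i)<\mu(E_i)+\frac{\varepsilon}{2^i}.
$$

Put $V=\bigcup_{i=1}^{\infty}V_i$, and choose $f \prec V$. Since $f \in C_c(X)$, there is a finite collection of $V_i$ that covers the support of $f$. Therefore without loss of generality, we may say that

$$
f \prec V_1 \cup V_2 \cup \cdots \cup V_n
$$

for some $n$. We therefore obtain

$$
\Lambda{f} \leq \mu(V_1 \cup \cdots \cup V_n) \leq \sum_{i=1}^{n}\mu(V_i) \leq \sum_{i=1}^{\infty}\mu(E_i)+\varepsilon
$$

for all $f \prec V$. Since $\bigcup E_i \subset V$, we have $\mu(\bigcup E_i) \leq \mu(V)$. Therefore

$$
\mu\left(\bigcup_{i=1}^{\infty}E_i\right) \leq \sum_{i=1}^{\infty}\mu(E_i)+\varepsilon.
$$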

Since $\varepsilon$ is arbitrary, the inequality is proved.

**Remarks of Step 3.** Again, we are using the $\varepsilon$-definition of $\inf$. One may say this step showed the subadditivity of the outer measure. Also note the geometric series $\sum_{k=1}^{\infty}\frac{\varepsilon}{2^k}=\varepsilon$.

Suppose $E=\bigcup_{i=1}^{\infty}E_i$, where $E_1,E_2,\cdots$ are pairwise disjoint members of $\mathfrak{M}_F$, then

$$
\mu(E)=\sum_{i=1}^{\infty}\mu(E_i).
$$

If $\mu(E)<\infty$, we also have $E \in \mathfrak{M}_F$.

As a dual to Step 3, we firstly show this holds for finitely many compact sets. As proved in Step 1, compact sets are in $\mathfrak{M}_F$. Suppose now $K_1$ and $K_2$ are disjoint compact sets. We want to show that

$$
\mu(K_1 \cup K_2)=\mu(K_1)+\mu(K_2).
$$

Note that compact sets in a Hausdorff space are closed. Therefore we are able to apply Urysohn’s lemma to the pair $(K_1,K_2^c)$. That said, there exists an $f \in C_c(X)$ such that

$$
K_1 \prec f \prec K_2^c.
$$

In other words, $f(x)=1$ for all $x \in K_1$ and $f(x)=0$ for all $x \in K_2$, since $\operatorname{supp}(f) \cap K_2 = \varnothing$. By Step 1, since $K_1 \cup K_2$ is compact, for any $\varepsilon>0$ there exists some $g \in C_c(X)$ such that

$$
K_1 \cup K_2 \prec g, \quad \Lambda{g}<\mu(K_1 \cup K_2)+\varepsilon.
$$

Now things become tricky. We are able to write $g$ as

$$
g=fg+(1-f)g.
$$

But $K_1 \prec fg$ and $K_2 \prec (1-f)g$ by the properties of $f$ and $g$. Also since $\Lambda$ is linear, we have

$$
\mu(K_1)+\mu(K_2) \leq \Lambda(fg)+\Lambda((1-f)g)=\Lambda{g}<\mu(K_1 \cup K_2)+\varepsilon.
$$

Therefore we have

$$
\mu(K_1)+\mu(K_2) \leq \mu(K_1 \cup K_2).
$$

On the other hand, by Step 3, we have

$$
\mu(K_1 \cup K_2) \leq \mu(K_1)+\mu(K_2).
$$

Therefore they must be equal.

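If $\mu(E)=\infty$, there is nothing to prove. So now we should assume that $\mu(E)<\infty$. Since $E_i \in \mathfrak{M}_F$, there are compact sets $K_i \subset E_i$ with

$$
\mu(K_i)>\mu(E_i)-\frac{\varepsilon}{2^i}.
$$

Putting $H_n=K_1 \cup K_2 \cup \cdots \cup K_n$, we see $E \supset H_n$ and

$$
\mu(E) \geq \mu(H_n)=\sum_{i=1}^{n}\mu(K_i)>\sum_{i=1}^{n}\mu(E_i)-\varepsilon.
$$

This inequality holds for all $n$ and $\varepsilon$, therefore

$$
\mu(E) \geq \sum_{i=1}^{\infty}\mu(E_i).
$$

Therefore by Step 3, the identity holds.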

Finally we shall show that $E \in \mathfrak{M}_F$ if $\mu(E) <\infty$. To make it more understandable, we will use elementary calculus notation. If we write $\mu(E)=x$ and $x_n=\sum_{i=1}^{n}\mu(E_i)$, we see

$$
\lim_{n \to \infty}x_n=x.
$$

Therefore, for any $\varepsilon>0$, there exists some $N \in \mathbb{N}$ such that

$$
x-x_N<\varepsilon.
$$

This is tantamount to

$$
\mu(E)<\sum_{i=1}^{N}\mu(E_i)+\varepsilon.
$$

But by definition of the *compact* set $H_N$ above, we see

$$
\mu(E)<\mu(H_N)+2\varepsilon.
$$

Hence $E$ satisfies the requirements of $\mathfrak{M}_F$, and is thus an element of it.

**Remarks of Step 4.** You should realize that we are heavily using the $\varepsilon$-definition of $\sup$ and $\inf$. As you may guess, $\mathfrak{M}_F$ should be a subset of $\mathfrak{M}$ though we don’t know whether it is a $\sigma$-algebra or not. In other words, we hope that the countable additivity of $\mu$ holds on a $\sigma$-algebra that is *properly extended* from $\mathfrak{M}_F$. However it’s still difficult to show that $\mathfrak{M}$ is a $\sigma$-algebra. We need more properties of $\mathfrak{M}_F$ to go on.

If $E \in \mathfrak{M}_F$ and $\varepsilon>0$, there is a compact $K$ and an open $V$ such that $K \subset E \subset V$ and $\mu(V-K)<\varepsilon$.

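There are two ways to write $\mu(E)$, namely

$$
\mu(E)=\sup\{\mu(K):K \subset E\}=\inf\{\mu(V):V \supset E\}
$$

where $K$ is compact and $V$ is open. Therefore there exist some $K$ and $V$ such that

$$
\mu(V)-\frac{\varepsilon}{2}<\mu(E)<\mu(K)+\frac{\varepsilon}{2}.
$$

Since $V-K$ is open, and $\mu(V-K)<\infty$, we have $V-K \in \mathfrak{M}_F$. By Step 4, we have

$$
\mu(K)+\mu(V-K)=\mu(V)<\mu(K)+\varepsilon.
$$

Therefore $\mu(V-K)<\varepsilon$ as desired.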

**Remarks of Step 5.** You should be familiar with the $\varepsilon$-definitions of $\sup$ and $\inf$ now. Since $V-K =V\cap K^c \subset V$, we have $\mu(V-K)\leq\mu(V)<\mu(E)+\frac{\varepsilon}{2}<\infty$.

If $A,B \in \mathfrak{M}_F$, then $A-B,A\cup B$ and $A \cap B$ are elements of $\mathfrak{M}_F$.

This shows that $\mathfrak{M}_F$ is closed under union, intersection and relative complement. In fact, we merely need to prove $A-B \in \mathfrak{M}_F$, since $A \cup B=(A-B) \cup B$ and $A\cap B = A-(A-B)$.

By Step 5, for $\varepsilon>0$, there are sets $K_A$, $K_B$, $V_A$, $V_B$ such that $K_A \subset A \subset V_A$, $K_B \subset B \subset V_B$, $\mu(V_A-K_A)<\varepsilon$, $\mu(V_B-K_B)<\varepsilon$, and for $A-B$ we have

$$
A-B \subset V_A-K_B \subset (V_A-K_A) \cup (K_A-V_B) \cup (V_B-K_B).
$$

With an application of Step 3 and 5, we have

$$
\mu(A-B) \leq \varepsilon+\mu(K_A-V_B)+\varepsilon.
$$

Since $K_A-V_B$ is a closed subset of $K_A$, we see $K_A-V_B$ is compact as well (a closed subset of a compact set is compact). But $K_A-V_B \subset A-B$ and $\mu(A-B) <\mu(K_A-V_B)+2\varepsilon$, so $A-B$ meets the requirements of $\mathfrak{M}_F$ (the fact that $\mu(A-B)<\infty$ is trivial since $\mu(A-B) \leq \mu(A)$).

Since $A-B$ and $B$ are pairwise disjoint members of $\mathfrak{M}_F$, we see

$$
\mu(A \cup B)=\mu(A-B)+\mu(B)<\infty.
$$

Thus $A \cup B \in \mathfrak{M}_F$. Since $A,A-B \in \mathfrak{M}_F$, we see $A \cap B = A-(A-B) \in \mathfrak{M}_F$.

**Remarks of Step 6.** In this step, we demonstrated several ways to express a set, all of which end up with a huge simplification. Now we are able to show that $\mathfrak{M}_F$ is a subset of $\mathfrak{M}$.

There is a precise relation between $\mathfrak{M}$ and $\mathfrak{M}_F$:

$$
\mathfrak{M}_F=\{E \in \mathfrak{M}:\mu(E)<\infty\}.
$$

If $E \in \mathfrak{M}_F$, we shall show that $E \in \mathfrak{M}$. For compact $K\in\mathfrak{M}_F$ (Step 1), by Step 6, we see $K \cap E \in \mathfrak{M}_F$, therefore $E \in \mathfrak{M}$.

If $E \in \mathfrak{M}$ with $\mu(E)<\infty$ however, we need to show that $E \in \mathfrak{M}_F$. By definition of $\mu$, for $\varepsilon>0$, there is an open $V$ such that

$$
E \subset V, \quad \mu(V)<\mu(E)+\varepsilon<\infty.
$$

Therefore $V \in \mathfrak{M}_F$. By Step 5, there is a compact set $K \subset V$ such that $\mu(V-K)<\varepsilon$ (the open set containing $V$ should be $V$ itself). Since $E \cap K \in \mathfrak{M}_F$, there exists a compact set $H \subset E \cap K$ with

$$
\mu(E \cap K)<\mu(H)+\varepsilon.
$$

Since $E \subset (E \cap K) \cup (V-K)$, it follows from Step 3 that

$$
\mu(E) \leq \mu(E \cap K)+\mu(V-K)<\mu(H)+2\varepsilon.
$$

Therefore $E \in \mathfrak{M}_F$.

**Remarks of Step 7.** Several tricks in the preceding steps are used here. Now we are pretty close to the fact that $(X,\mathfrak{M},\mu)$ is a measure space. Note that for $E \in \mathfrak{M}-\mathfrak{M}_F$, we have $\mu(E)=\infty$, but we have already proved the countable additivity for $\mathfrak{M}_F$. Is it ‘almost trivial’ for $\mathfrak{M}$? Before that, we need to show that $\mathfrak{M}$ is a $\sigma$-algebra. Note that assertion 3 of $\mu$ has been proved.

We will validate the definition of $\sigma$-algebra one by one.

$X \in \mathfrak{M}$.

For any compact $K \subset X$, we have $K \cap X=K$. But as proved in Step 1, $K \in \mathfrak{M}_F$, therefore $X \in \mathfrak{M}$.

If $A \in \mathfrak{M}$, then $A^c \in\mathfrak{M}$.

If $A \in \mathfrak{M}$, then $A \cap K \in \mathfrak{M}_F$ for every compact $K$. But

$$
K \cap A^c=K-(A \cap K).
$$

By Step 1 and Step 6, we see $K \cap A^c \in \mathfrak{M}_F$, thus $A^c \in \mathfrak{M}$.

If $A_n \in \mathfrak{M}$ for all $n \in \mathbb{N}$, then $A=\bigcup_{n=1}^{\infty}A_n \in \mathfrak{M}$.

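We define an auxiliary sequence of sets inductively. Fix a compact set $K$. For $n=1$, we write $B_1=A_1 \cap K$. Then $B_1 \in \mathfrak{M}_F$. For $n \geq 2$, we write

$$
B_n=(A_n \cap K)-(B_1 \cup B_2 \cup \cdots \cup B_{n-1}).
$$

Since $A_n \cap K \in \mathfrak{M}_F$ and $B_1,B_2,\cdots,B_{n-1} \in \mathfrak{M}_F$, by Step 6, $B_n \in \mathfrak{M}_F$. Also, the sets $B_n$ are pairwise disjoint.

Another set-theoretic manipulation shows that

$$
A \cap K=\bigcup_{n=1}^{\infty}B_n.
$$

Now we are able to evaluate $\mu(A \cap K)$ by Step 4:

$$
\mu(A \cap K)=\sum_{n=1}^{\infty}\mu(B_n), \qquad \mu(A \cap K) \leq \mu(K)<\infty.
$$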

Therefore $A \cap K \in \mathfrak{M}_F$, which implies that $A \in \mathfrak{M}$.
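The set-theoretic manipulation in this step can be checked directly. The following is a sketch, assuming the auxiliary sets are $B_1=A_1 \cap K$ and $B_n=(A_n \cap K)-(B_1 \cup \cdots \cup B_{n-1})$ for $n \geq 2$, as in Rudin’s proof.

$$
\begin{aligned}
B_1 \cup \cdots \cup B_n &= (B_1 \cup \cdots \cup B_{n-1}) \cup (A_n \cap K) = (A_1 \cup \cdots \cup A_n) \cap K && \text{(induction on } n\text{)},\\
B_m \cap B_n &= \varnothing \quad (m<n) && \text{(} B_n \text{ is obtained by removing } B_m\text{)},\\
\bigcup_{n=1}^{\infty}B_n &= \bigcup_{n=1}^{\infty}(A_n \cap K) = A \cap K.
\end{aligned}
$$

The first line is what makes the third one work: taking the union over all $n$ on both sides gives the decomposition of $A \cap K$ into disjoint members of $\mathfrak{M}_F$.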

$\mathfrak{M}$ contains all Borel sets.

Indeed, it suffices to prove that $\mathfrak{M}$ contains all open sets and/or closed sets. We’ll show two different paths. Let $K$ be a compact set.

- If $C$ is closed, then $C \cap K$ is compact, therefore $C \cap K \in \mathfrak{M}_F$ (by Step 2), and hence $C \in \mathfrak{M}$.
- If $D$ is open, then $D \cap K=K-(K \cap D^c)$, where $K \cap D^c$ is compact. By Step 2 and Step 6, $D \cap K \in \mathfrak{M}_F$, and hence $D \in \mathfrak{M}$.

Either path shows that $\mathfrak{M}$ contains all Borel sets, since $\mathfrak{M}$ is a $\sigma$-algebra.

Again, we will verify all properties of $\mu$ one by one.

$\mu(E) \geq 0$ for all $E \in \mathfrak{M}$.

This follows immediately from the definition of $\mu$, since $\Lambda$ is positive and $0 \leq f \leq 1$.

$\mu$ is countably additive.

If $A_1,A_2,\cdots$ form a disjoint countable collection of members of $\mathfrak{M}$, we need to show that

$$
\mu\left(\bigcup_{n=1}^{\infty}A_n\right)=\sum_{n=1}^{\infty}\mu(A_n).
$$

If $A_n \in \mathfrak{M}_F$ for all $n$, then this is merely what we have just proved in Step 4. If, however, $A_j \in \mathfrak{M}-\mathfrak{M}_F$ for some $j$, then $\mu(A_j)=\infty$, so $\sum_n\mu(A_n)=\infty$. On the other hand, since $\cup_n A_n \supset A_j$, we have $\mu(\cup_n A_n) \geq \mu(A_j)=\infty$. The identity is now proved.

So far assertions 1-3 have been proved, but the final assertion has not been proved explicitly. We prove it now since this property will be used when discussing the Lebesgue measure $m$. In fact, it shows that $(X,\mathfrak{M},\mu)$ is a complete measure space.

If $E \in \mathfrak{M}$, $A \subset E$, and $\mu(E)=0$, then $A \in \mathfrak{M}$.

It suffices to show that $A \in \mathfrak{M}_F$. By monotonicity, $0 \leq \mu(A) \leq \mu(E)=0$, so $\mu(A)=0$ as well. If $K \subset A$, where $K$ is compact, then $0 \leq \mu(K) \leq \mu(A)=0$. Therefore $0$ is the supremum of $\mu(K)$ over compact $K \subset A$. It follows that $A \in \mathfrak{M}_F \subset \mathfrak{M}$.

For every $f \in C_c(X)$, $\Lambda{f}=\int_X fd\mu$.

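This is the absolute main result of the theorem. It suffices to prove the inequality

$$
\Lambda{f} \leq \int_X fd\mu
$$

for all real-valued $f \in C_c(X)$. What about the other side? By the linearity of $\Lambda$ and $\int_X \cdot d\mu$, once the inequality above is proved, we have

$$
-\Lambda{f}=\Lambda(-f) \leq \int_X (-f)d\mu=-\int_X fd\mu.
$$

Therefore

$$
\Lambda{f} \geq \int_X fd\mu
$$

holds as well, and this establishes the equality.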

Notice that since $K=\operatorname{supp}(f)$ is compact, the range of $f$ is compact. Hence we may pick an interval $[a,b]$ that contains the range of $f$. For $\varepsilon>0$, we are able to pick a partition around $[a,b]$ such that $y_i - y_{i-1}<\varepsilon$ for $i=1,2,\cdots,n$ and

$$
y_0<a<y_1<\cdots<y_n=b.
$$

Put

$$
E_i=\{x:y_{i-1}<f(x) \leq y_i\} \cap K \quad (i=1,2,\cdots,n).
$$

Since $f$ is continuous, $f$ is Borel measurable. The sets $E_i$ are pairwise disjoint Borel sets. Again, there are open sets $V_i \supset E_i$ such that

$$
\mu(V_i)<\mu(E_i)+\frac{\varepsilon}{n}
$$

for $i=1,2,\cdots,n$, and such that $f(x)<y_i + \varepsilon$ for all $x \in V_i$. Notice that $(V_i)$ covers $K$, therefore by the partition of unity, there are functions $h_1,\cdots,h_n$ such that $h_i \prec V_i$ for all $i$ and $\sum_i h_i=1$ on $K$. By Step 1 and the fact that $f=\sum_i h_if$, we see

$$
\mu(K) \leq \Lambda\left(\sum_{i=1}^{n}h_i\right)=\sum_{i=1}^{n}\Lambda{h_i} \quad \text{and} \quad \Lambda{f}=\sum_{i=1}^{n}\Lambda(h_if).
$$

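By the way we picked $V_i$, we see $h_if \leq (y_i+\varepsilon)h_i$. We have the following inequality, in which $|a|$ is added and subtracted so that every coefficient $|a|+y_i+\varepsilon$ is positive:

$$
\Lambda{f}=\sum_{i=1}^{n}\Lambda(h_if) \leq \sum_{i=1}^{n}(y_i+\varepsilon)\Lambda{h_i}=\sum_{i=1}^{n}(|a|+y_i+\varepsilon)\Lambda{h_i}-|a|\sum_{i=1}^{n}\Lambda{h_i}.
$$

Since $h_i \prec V_i$, we have $\mu(E_i)+\frac{\varepsilon}{n}>\mu(V_i) \geq \Lambda{h_i}$. And we already have $\sum_i \Lambda{h_i} \geq \mu(K)$. If we put them into the inequality above, we get

$$
\Lambda{f} \leq \sum_{i=1}^{n}(|a|+y_i+\varepsilon)\left[\mu(E_i)+\frac{\varepsilon}{n}\right]-|a|\mu(K).
$$

Observe that $\cup_i E_i=K$, so by Step 9 we have $\sum_{i}\mu(E_i)=\mu(K)$. A slight manipulation shows that

$$
\sum_{i=1}^{n}(|a|+y_i+\varepsilon)\mu(E_i)-|a|\mu(K)=\sum_{i=1}^{n}(y_i+\varepsilon)\mu(E_i)=\sum_{i=1}^{n}(y_i-\varepsilon)\mu(E_i)+2\varepsilon\mu(K).
$$

Therefore for $\Lambda f$ we get

$$
\Lambda{f} \leq \sum_{i=1}^{n}(y_i-\varepsilon)\mu(E_i)+2\varepsilon\mu(K)+\frac{\varepsilon}{n}\sum_{i=1}^{n}(|a|+y_i+\varepsilon).
$$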

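Now here comes the trickiest part of the whole blog post. By definition of $E_i$, we see $f(x) > y_{i-1}>y_{i}-\varepsilon$ for $x \in E_i$. Therefore we get a simple function $s_n \leq f$ by

$$
s_n=\sum_{i=1}^{n}(y_i-\varepsilon)\chi_{E_i}.
$$

If we evaluate the Lebesgue integral of $f$ with respect to $\mu$, we see

$$
\int_X fd\mu \geq \int_X s_nd\mu=\sum_{i=1}^{n}(y_i-\varepsilon)\mu(E_i),
$$

and hence

$$
\Lambda{f} \leq \int_X fd\mu+2\varepsilon\mu(K)+\frac{\varepsilon}{n}\sum_{i=1}^{n}(|a|+y_i+\varepsilon).
$$

For $2\varepsilon\mu(K)$, things are simple since $0\leq\mu(K)<\infty$. Therefore $2\varepsilon\mu(K) \to 0$ as $\varepsilon \to 0$. Now let’s estimate the final part of the inequality. It’s trivial that $\frac{\varepsilon}{n}\sum_{i=1}^{n}(|a|+\varepsilon)=\varepsilon(\varepsilon+|a|)$. For $y_i$, observe that $y_i \leq b$ for all $i$, therefore $\frac{\varepsilon}{n}\sum_{i=1}^{n}y_i \leq \frac{\varepsilon}{n}nb=\varepsilon b$. Thus

$$
\frac{\varepsilon}{n}\sum_{i=1}^{n}(|a|+y_i+\varepsilon) \leq \varepsilon(b+|a|+\varepsilon).
$$

Notice that $b+|a| \geq 0$ since $b \geq a \geq -|a|$. Our estimation of $\Lambda{f}$ is finally done:

$$
\Lambda{f} \leq \int_X fd\mu+\varepsilon\left(2\mu(K)+b+|a|+\varepsilon\right).
$$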

Since $\varepsilon$ is arbitrary, we see $\Lambda{f} \leq \int_X fd\mu$. The identity is proved.

If there are two measures $\mu_1$ and $\mu_2$ that satisfy assertions 1 to 4 and correspond to $\Lambda$, then $\mu_1=\mu_2$.

In fact, according to assertions 2 and 3, $\mu$ is determined by its values on compact subsets of $X$. It suffices to show that

If $K$ is a compact subset of $X$, then $\mu_1(K)=\mu_2(K)$.

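Fix $K$ compact and $\varepsilon>0$. By Step 1, there exists an open $V \supset K$ such that $\mu_2(V)<\mu_2(K)+\varepsilon$. By Urysohn’s lemma, there exists some $f$ such that $K \prec f \prec V$. Hence

$$
\mu_1(K)=\int_X \chi_Kd\mu_1 \leq \int_X fd\mu_1=\Lambda{f}=\int_X fd\mu_2 \leq \int_X \chi_Vd\mu_2=\mu_2(V)<\mu_2(K)+\varepsilon.
$$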

Thus $\mu_1(K) \leq \mu_2(K)$. If $\mu_1$ and $\mu_2$ are exchanged, we see $\mu_2(K) \leq \mu_1(K)$. The uniqueness is proved.
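The reduction to compact sets used at the start of this argument can be made explicit. As a sketch, assume the two regularity assertions read as in Rudin’s statement of the theorem: inner regularity on open sets and outer regularity on every member of $\mathfrak{M}$. Then for $j=1,2$,

$$
\begin{aligned}
\mu_j(V)&=\sup\{\mu_j(K):K \subset V,\ K \text{ compact}\} && (V \text{ open}),\\
\mu_j(E)&=\inf\{\mu_j(V):E \subset V,\ V \text{ open}\} && (E \in \mathfrak{M}).
\end{aligned}
$$

So once $\mu_1(K)=\mu_2(K)$ for every compact $K$, the first line gives $\mu_1(V)=\mu_2(V)$ for every open $V$, and the second line then gives $\mu_1(E)=\mu_2(E)$ for every $E \in \mathfrak{M}$.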

Can we simply put $X=\mathbb{R}^k$ right now? The answer is no. Note that outer regularity holds for all sets, but inner regularity is only guaranteed for open sets and members of $\mathfrak{M}_F$. But we expect the outer and inner regularity to be ‘symmetric’. There is an example showing that being *locally compact* is far from enough to offer the ‘symmetry’.

Define $X=\mathbb{R}_1 \times \mathbb{R}_2$, where $\mathbb{R}_1$ is the real line equipped with the discrete metric $d_1$, and $\mathbb{R}_2$ is the real line equipped with the Euclidean metric $d_2$. The metric of $X$ is defined by

The topology $\tau_X$ induced by $d_X$ is naturally Hausdorff and locally compact by considering the vertical segments. So what would happen to this weird locally compact Hausdorff space?
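For concreteness, one metric that fits this description is the one used in an exercise of Chapter 2 of Rudin’s *Real and Complex Analysis*. That the post intends exactly this formula is an assumption; any metric inducing the product topology of $\mathbb{R}_1 \times \mathbb{R}_2$ leads to the same conclusions.

$$
d_X\big((x_1,y_1),(x_2,y_2)\big)=d_1(x_1,x_2)+d_2(y_1,y_2)=
\begin{cases}
|y_1-y_2|, & x_1=x_2,\\
1+|y_1-y_2|, & x_1 \neq x_2.
\end{cases}
$$

With this choice, the open ball of radius $r \leq 1$ around $(x,y)$ is the vertical segment $\{x\} \times (y-r,y+r)$, and its closure $\{x\} \times [y-r,y+r]$ is compact, which is where local compactness comes from.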

If $f \in C_c(X)$, let $x_1,x_2,\cdots,x_n$ be those values of $x$ for which $f(x,y) \neq 0$ for at least one $y$. Since $f$ has compact support, there are only finitely many such $x_i$. We are able to define a positive linear functional by

$$
\Lambda{f}=\sum_{j=1}^{n}\int_{-\infty}^{\infty}f(x_j,y)dy=\int_X fd\mu,
$$

where $\mu$ is the measure associated with $\Lambda$ in the sense of the R-M-K theorem. Let

$$
E=\{(x,0):x \in \mathbb{R}\}.
$$

By squeezing the disjoint vertical segments around $(x_i,0)$, we see $\mu(K)=0$ for all compact $K \subset E$ but $\mu(E)=\infty$.
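The squeezing argument can be sketched as follows, assuming $E$ is the horizontal axis $\{(x,0):x \in \mathbb{R}\}$ and $\Lambda{f}=\sum_j \int f(x_j,y)dy$, as in Rudin’s version of this example. Write $I_x(\delta)=\{x\} \times (-\delta,\delta)$ for the open vertical segment around $(x,0)$; this notation is introduced here only for illustration. Since $I_x(\delta)$ is open, $\mu(I_x(\delta))=\sup\{\Lambda{f}:f \prec I_x(\delta)\}=2\delta$.

A compact $K \subset E$ is finite, because $E$ is discrete in the subspace topology. Writing $K=\{(x_1,0),\cdots,(x_m,0)\}$, we get

$$
\mu(K) \leq \sum_{j=1}^{m}\mu(I_{x_j}(\delta))=2m\delta \to 0 \quad (\delta \to 0).
$$

On the other hand, if $V \supset E$ is open, then for each $x$ there is some $\delta_x>0$ with $I_x(\delta_x) \subset V$. Since $\mathbb{R}$ is uncountable, there is an integer $N$ such that $\delta_x>1/N$ for infinitely many $x$. Picking $m$ of them, say $x_1,\cdots,x_m$, the segments $I_{x_j}(1/N)$ are disjoint and contained in $V$, so

$$
\mu(V) \geq \sum_{j=1}^{m}\mu(I_{x_j}(1/N))=\frac{2m}{N} \quad \text{for every } m.
$$

Hence $\mu(V)=\infty$ for every open $V \supset E$, and outer regularity gives $\mu(E)=\infty$.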

This is in violent contrast to what we expect. However, if $X$ is required to be $\sigma$-compact (note that the space in this example is not), problems of this kind disappear neatly.

- Walter Rudin, *Real and Complex Analysis*
- Serge Lang, *Fundamentals of Differential Geometry*
- Joel W. Robbin, *Partition of Unity*
- Brian Conrad, *Paracompactness and local compactness*
- Raoul Bott & Loring W. Tu, *Differential Forms in Algebraic Topology*