Representation theory is important in various branches of mathematics and physics. When studying representations of finite groups, one works with a good deal of algebra and combinatorics. When differentiation (more precisely, smoothness) joins the party, we get Lie groups, involving calculus, linear algebra, geometry and much more. In particular, the theories around \(SU(2)\) and \(SO(3)\) are of great importance. On one hand, they are among the simplest non-elementary, higher-dimensional Lie groups. On the other hand, they describe rotations of \(\mathbb{C}^2\) and \(\mathbb{R}^3\) respectively, which is "physically realistic". I believe students in physics have even more to say.

In this post we develop a way to study the irreducible representations of these two Lie groups, in a mathematician's way. I try my best to keep everything down-to-earth, so that everything can be "reduced" to 19th-century (pre-modern) mathematics.

Nevertheless, the reader is assumed to be familiar with the elementary language of representation theory (which, as you know, involves a lot of abuse of language); I think this is not a problem, because otherwise you would not be reading this post. You will also need to recall eigenvalue theory from linear algebra, as well as Fourier series. We need the fact that the trigonometric system is complete: in other words, trigonometric polynomials are dense in the space of continuous periodic functions. \(\def\sym{\operatorname{Sym}}\)

We will first study \(SU(2)\); a first classification of irreducible representations of \(SO(3)\) then follows at once. This is because we have an isomorphism \[SU(2)/\{-I,I\} \cong SO(3).\] That is, \(SU(2)\) is a "double cover" of \(SO(3)\). To see this, notice that \(SU(2) \cong S^3\) and \(SO(3) \cong \mathbb{R}P^3\) (the former as Lie groups, the latter as smooth manifolds), while \(\mathbb{R}P^3 \cong S^3/\{-1,1\}\) can be taken as a definition.

Of course, by representation we always mean a finite-dimensional unitary representation.

At first it seems we have nowhere to start. Instead of trying to find all irreducible representations at once, we will work with some immediately available representations, and it turns out that they are all we are looking for.

Let \(V_0\) be the trivial representation on \(\mathbb{C}\) and \(V_1\) the standard representation on \(\mathbb{C}^2\), given by ordinary matrix multiplication. Both are irreducible. We want to extend this family to \(V_n\) for \(n \ge 2\). It is natural to try to generate representations of higher dimension from \(V_1\). Several constructions are available.

- Direct sum: \(\bigoplus_{i=1}^{n}V_1\). The dimension is \(2n\), but unfortunately the representation is determined by its components, so essentially there is nothing new.
- Tensor product: \(\bigotimes_{i=1}^{n}V_1\). The dimension is \(2^n\), which is way too big.
- Wedge product: \(\bigwedge^{n}V_1\). It stops at \(n=2\) (higher wedge powers vanish), and we have to deal with \(u \wedge v = - v \wedge u\), which can be annoying.
- Symmetric product: \(\sym^{n}V_1\). The dimension is \(n+1\) and the family does not stop. Besides, it can be understood as the space of homogeneous polynomials of degree \(n\) in two variables. This is a fantastic choice. Moreover \(\sym^0 V_1=V_0\) and \(\sym^1 V_1=V_1\), so nothing is abruptly excluded.

Put \(V_n=\sym^nV_1\), which can be understood as the space of homogeneous polynomials of degree \(n\) in the variables \(z_1\) and \(z_2\). Hence \(V_n\) has a canonical basis \[P_k=z_1^k z_2^{n-k}, \quad 0 \le k \le n,\] which we will make use of later.

For each \(g \in SU(2)\), we have a left action \[\begin{aligned}\rho:SU(2) &\to \operatorname{Aut}(V_n), \\ g &\mapsto (P(z) \mapsto P(zg)).\end{aligned}\] In other words, \(\rho(g)P(z)=P(zg)\), where \(z=(z_1,z_2)\) and \(zg\) is matrix multiplication. Each \(g \in SU(2)\) has matrix representation \[g=\begin{pmatrix}\alpha & \beta \\-\overline{\beta} & \overline{\alpha}\end{pmatrix}, \quad |\alpha|^2+|\beta|^2=1.\] Then \[zg=(\alpha z_1-\overline{\beta}z_2,\beta z_1+\overline{\alpha}z_2).\] When there is no confusion, we write \(gP(z)=P(zg)\), viewing \(g\) itself as an automorphism of \(V_n\). One could also replace \(SU(2)\) with \(GL(2,\mathbb{C})\), but we are not studying that bigger group here.

Since \(z \mapsto zg\) is linear and non-degenerate, it sends homogeneous polynomials of degree \(n\) to homogeneous polynomials of degree \(n\); hence \(gP(z) \in V_n\). In other words, each \(V_n\) is \(SU(2)\)-invariant. **We now have a well-defined representation.** Note \(V_0=\mathbb{C}\) carries the trivial representation, and \(V_1=\mathbb{C}^2\) recovers the standard one. Again, nothing is abruptly excluded. Even more satisfyingly, the \(V_n\) are all irreducible.
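The action can be made completely concrete. Below is a small Python sketch (the helper names `rho_matrix` and `matmul` are mine, not from any library) that expands \((\alpha z_1-\overline{\beta}z_2)^k(\beta z_1+\overline{\alpha}z_2)^{n-k}\) with the binomial theorem to obtain the matrix of \(\rho(g)\) in the basis \(\{P_k\}\):

```python
import cmath
import math

def rho_matrix(n, alpha, beta):
    """Matrix of rho(g) on V_n in the basis P_k = z1^k z2^(n-k), where
    g = [[alpha, beta], [-conj(beta), conj(alpha)]] acts by (rho(g)P)(z) = P(zg).
    Column k holds the coefficients of rho(g)P_k."""
    M = [[0j] * (n + 1) for _ in range(n + 1)]
    for k in range(n + 1):
        # rho(g)P_k = (alpha z1 - conj(beta) z2)^k (beta z1 + conj(alpha) z2)^(n-k):
        # expand both factors with the binomial theorem.
        for p in range(k + 1):
            for q in range(n - k + 1):
                coeff = (math.comb(k, p) * alpha ** p * (-beta.conjugate()) ** (k - p)
                         * math.comb(n - k, q) * beta ** q * alpha.conjugate() ** (n - k - q))
                M[p + q][k] += coeff  # contribution to the P_{p+q} coordinate
    return M

def matmul(A, B):
    m = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(m)] for i in range(m)]

# sanity check: for the diagonal g_a (beta = 0), rho(g_a)P_k = a^(2k-n) P_k
a = cmath.exp(0.7j)
D = rho_matrix(3, a, 0j)
assert all(abs(D[k][k] - a ** (2 * k - 3)) < 1e-12 for k in range(4))
```

One can also check numerically that \(\rho(g)\rho(h)=\rho(gh)\), i.e. that this really is a homomorphism.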

Proposition 1. The representations \(V_n\) are irreducible.

*Proof.* By (the converse of) Schur's lemma for unitary representations, it suffices to show that every \(SU(2)\)-equivariant endomorphism \(A\) of \(V_n\) is a scalar multiple of the identity, i.e. \(A=\lambda I\). By definition, for each \(g \in SU(2)\) we have \(A\rho(g)P=\rho(g)AP\) for all \(P \in V_n\). For simplicity we write \(Ag=gA\), realising \(g\) as a linear transformation of \(V_n\) rather than an element of \(SU(2)\).

The group \(SU(2)\) itself can be complicated, but \(U(1) \cong S^1\) is much easier to understand, and it can be embedded into \(SU(2)\) in (at least) two ways. We show that these two embeddings are enough to expose the irreducibility of \(V_n\).

First of all we embed \(S^1\) into \(SU(2)\) by \[a \mapsto \begin{pmatrix}a & 0 \\ 0 & a^{-1} \end{pmatrix}.\] Call the matrix on the right hand side \(g_a\). Then \[g_a P_k=(az_1)^{k}(a^{-1}z_2)^{n-k}=a^{2k-n}z_1^kz_2^{n-k}=a^{2k-n}P_k\] for all \(k\). This is to say, \(P_k\) is an **eigenvector** corresponding to the **eigenvalue** \(a^{2k-n}\). As \(g_aA=Ag_a\), information on eigenvalues and eigenvectors can help a lot, so we dig into it first.

Since the \(P_k\) are linearly independent, in this basis we have the matrix representation \[\rho(g_a) = \operatorname{diag}(a^{-n},a^{-n+2},\dots,a^{n-2},a^n),\] but we do not yet know how the eigenspaces are spanned, because we may have \(a^j=a^k\) for \(j \ne k\). However, \(a\) can always be chosen so that \(a^{-n},a^{-n+2},\dots,a^n\) are pairwise distinct (for example, pick \(a\) to be a primitive \(m\)-th root of unity with \(m > 2n\)). For such an \(a\), \(g_a\) has \(n+1\) distinct eigenvalues, and therefore the \(a^{2k-n}\)-eigenspace is spanned by \(P_k\) alone.
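A quick numerical check of this choice (the variable names are mine): for a primitive \(m\)-th root of unity with \(m>2n\), the equality \(a^{2j-n}=a^{2k-n}\) would force \(m\) to divide \(2(j-k)\), which is impossible for \(0<|j-k| \le n\).

```python
import cmath

n = 4
m = 2 * n + 1                      # any m > 2n works
a = cmath.exp(2j * cmath.pi / m)   # primitive m-th root of unity
eigs = [a ** (2 * k - n) for k in range(n + 1)]
distinct = all(abs(eigs[i] - eigs[j]) > 1e-9
               for i in range(n + 1) for j in range(i + 1, n + 1))
```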

On the other hand, by the definition of \(A\), we have \[Ag_a P_k = g_aAP_k = a^{2k-n}AP_k.\] Hence \(AP_k\) lies in the \(a^{2k-n}\)-eigenspace. Therefore \(AP_k=c_kP_k\) for some \(c_k \in \mathbb{C}\); in other words, \(P_k\) is an eigenvector of \(A\). We obtain another matrix representation in the basis \(\{P_k\}\): \[A = \operatorname{diag}(c_0,c_1,\dots,c_n).\] We want this to be a scalar matrix. The result follows from another embedding of \(U(1)\) into \(SU(2)\): writing \(a=e^{it}\) with \(t \in [0,2\pi)\), we obtain the rotation matrix \[g_t = \begin{pmatrix}\cos{t} & -\sin{t} \\\sin{t} & \cos{t}\end{pmatrix} \in SU(2).\] Still we have \(Ag_t=g_tA\). As we can see, \[\begin{aligned}Ag_tP_n &= A(z_1\cos{t}+z_2\sin{t})^n \\ &= A\sum_{k=0}^{n}{n \choose k}(\cos{t})^k(\sin{t})^{n-k} P_k \\ &= \sum_{k=0}^{n}{n \choose k}(\cos{t})^k(\sin{t})^{n-k} c_kP_k,\end{aligned}\] by our observation on the eigenvalues. Next, we use the eigenvalue \(c_n\) to compute the same quantity the other way: \[Ag_t P_n = g_t AP_n = g_t c_nP_n = c_n \sum_{k=0}^{n}{n \choose k}(\cos{t})^k(\sin{t})^{n-k} P_k,\] where the last equality is just the expansion of \(g_tP_n\) again. Since \(\{P_k\}\) is a basis, the coefficients of a given vector are unique; choosing \(t\) with \(\cos{t}\sin{t} \ne 0\) (say \(t=\pi/4\)), so that every binomial factor above is nonzero, and comparing coefficients of \(P_k\), we must have \(c_k=c_n\) for all \(0 \le k \le n\). Hence \(A=c_n I\). \(\square\)

So far we have used diagonalisation in representations of \(SU(2)\), but the diagonalisation of \(SU(2)\) itself has not been touched yet. Neither have we made use of character functions. So now we invite them to the party.

Let's recall diagonalisation in \(SU(2)\). Pick \(g \in SU(2)\); first of all it is diagonalisable (being unitary, hence normal). Let \(\lambda_1\) and \(\lambda_2\) be its two eigenvalues; then \(\det g=\lambda_1\lambda_2=1\). Therefore we have \[g \sim \begin{pmatrix}\lambda & 0 \\ 0 & \lambda^{-1} \end{pmatrix} \sim \begin{pmatrix}\lambda^{-1} & 0 \\ 0 & \lambda \end{pmatrix}\] where \(\lambda\) is one of the eigenvalues of \(g\). Since the diagonalised matrix is still in \(SU(2)\), we have \(|\lambda|=1\), i.e. \(\lambda \in S^1\). We therefore write \(g \sim e(t) \sim e(-t)\) where \[e(t) = \begin{pmatrix} \exp(it) & 0 \\ 0 & \exp(-it) \end{pmatrix}.\] We see that \(e(s) \sim e(t)\) if and only if \(s \equiv \pm t \pmod{2\pi}\); by the periodicity of \(\exp\), the map \(t \mapsto e(t)\) is \(2\pi\)-periodic. If \(f:SU(2) \to \mathbb{C}\) is a class function, then \(f \circ e:\mathbb{R} \to \mathbb{C}\) is an even \(2\pi\)-periodic function. Conversely, given an even \(2\pi\)-periodic function \(h:\mathbb{R} \to \mathbb{C}\), we can recover it as a class function, as follows.

Define \(\Lambda:SU(2) \to S^1\) sending \(g \in SU(2)\) to the eigenvalue of \(g\) with non-negative imaginary part (one could equally pick the non-positive one, because \(h\) is even). Then \(E:SU(2) \to [0,\pi]\) given by \(g \mapsto \frac{1}{i}\log\Lambda(g)\) is a well-defined function sending \(g\) into \(\mathbb{R}\), and \(h \circ E:SU(2) \to \mathbb{C}\) is a class function. Besides, we have \(E(e(t)) \equiv \pm t \pmod{2\pi}\), and \(e(E(g))\) is a diagonalisation of \(g\). Therefore \(h \circ E \circ e(t)=h(t)\) and \(f \circ e \circ E(g)=f(g)\), as expected.

With the help of \(e\) and \(E\), we have the correspondence \[\{\text{Class functions }SU(2) \to \mathbb{C}\} \longleftrightarrow \{\text{even }2\pi\text{-periodic functions }\mathbb{R} \to \mathbb{C}\}.\] Recall that the space on the right hand side admits the countable family \[1,\cos{t},\cos{2t},\dots,\] which spans a dense subspace with respect to the uniform norm. This is the completeness of the trigonometric system; since we only deal with even functions, the \(\sin{nt}\) are excluded. For a reference to the completeness, one can check 4.25 of *Real and Complex Analysis* by W. Rudin.
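As a tiny illustration of this cosine expansion (the example function and helper names are mine): the even \(2\pi\)-periodic function \(h(t)=|\sin t|\) has the classical expansion \(|\sin t| = \frac{2}{\pi} - \frac{4}{\pi}\sum_{k\ge1}\frac{\cos 2kt}{4k^2-1}\), and its coefficients can be recovered numerically.

```python
import math

def cos_coeff(h, n, M=20000):
    """Cosine coefficients of an even 2*pi-periodic function h (midpoint rule):
    a_0 = (1/(2*pi)) * int_{-pi}^{pi} h(t) dt,
    a_n = (1/pi) * int_{-pi}^{pi} h(t) cos(nt) dt for n >= 1."""
    w = 2 * math.pi / M
    s = 0.0
    for i in range(M):
        t = -math.pi + (i + 0.5) * w
        s += h(t) * math.cos(n * t)
    return s * w / (2 * math.pi if n == 0 else math.pi)

def h(t):
    # an even 2*pi-periodic function with a known cosine expansion
    return abs(math.sin(t))
```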

For class functions, we certainly want to know about characters. Let \(\chi_n\) be the character of \(V_n\); then \[\chi_n(e(t))=\operatorname{tr}(\rho(e(t)))=\operatorname{tr}(\operatorname{diag}(\exp(it)^{-n},\dots,\exp(it)^n))=\sum_{k=0}^{n}e^{i(n-2k)t}.\] When \(t \in \pi\mathbb{Z}\), this simply gives \(\chi_n(e(t)) = \pm(n+1) \in \mathbb{Z}\). Otherwise, as a classic exercise in calculus (sum the geometric series), we have \[\kappa_n(t)=\chi_n(e(t))=\frac{\sin(n+1)t}{\sin{t}}.\] We have \(\kappa_0(t)=1\), and for \(n >0\), \[\kappa_n(t)=\frac{\cos{nt}\sin{t}+\sin{nt}\cos{t}}{\sin{t}}=\cos{nt}+\kappa_{n-1}(t)\cos{t}.\] In particular \(\kappa_1(t)=2\cos{t}\). By induction, using the product-to-sum formula \(\cos{kt}\cos{t}=\frac{1}{2}\bigl(\cos(k+1)t+\cos(k-1)t\bigr)\), every \(\kappa_n(t)\) is a linear combination of \(1,\cos{t},\dots,\cos{nt}\) in which \(\cos{nt}\) appears with nonzero coefficient. Therefore \(\{\kappa_n(t)\}_{n \ge 0}\) spans the same space as \(\{\cos{nt}\}_{n \ge 0}\), which is dense in the space of even \(2\pi\)-periodic continuous functions. Note also that the \(\kappa_n(t)\) are linearly independent, since each introduces the new term \(\cos{nt}\).
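Both the closed form and the recurrence are easy to verify numerically; here is a quick Python check (the function names are mine):

```python
import cmath
import math

def chi(n, t):
    """Character of V_n at e(t): the trace, a sum of e^{i(n-2k)t}."""
    return sum(cmath.exp(1j * (n - 2 * k) * t) for k in range(n + 1)).real

def kappa(n, t):
    """Closed form sin((n+1)t)/sin(t), valid for t not in pi*Z."""
    return math.sin((n + 1) * t) / math.sin(t)
```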

The argument above shows that the \(\chi_n\) span a dense subspace of the space of class functions. In other words, the \(\chi_n\) form a Fourier basis for class functions. As we all know, Fourier series are powerful. Let's see how powerful they are in the calculus on the Lie group \(SU(2)\) itself.

Proposition 2. For a continuous class function \(f:SU(2) \to \mathbb{C}\), we have \[\int_{SU(2)}f(x)dx = \frac{2}{\pi}\int_0^\pi f \circ e(t)\sin^2{t}\,dt.\]

*Proof.* On one hand, since the \(V_n\) are irreducible, by the fixed-point formula for representations, \[\int_{SU(2)}\chi_n(x)dx = \dim V_n^{SU(2)} = \begin{cases} 1 & n=0, \\ 0 & n>0. \end{cases}\] Here, for a group \(G\) and a representation \(V\), \(V^G\) denotes the fixed-point subspace, i.e. the space of vectors fixed by the action of \(G\) on \(V\). Since \(V_n\) is irreducible, this subspace is \(0\) unless the representation itself is trivial. Now we check that the right hand side produces the same values.

On the right hand side we are dealing with even \(2\pi\)-periodic continuous functions, so we can exploit the denseness of the span of the \(\kappa_n(t)\). Note that \(\int_{-\pi}^{\pi}\kappa_2(t)dt=2\pi\) (recall \(\kappa_2(t)=1+2\cos{2t}\)), so the naive average \(\frac{1}{2\pi}\int_{-\pi}^{\pi}f\circ e(t)dt\) does not vanish for all \(n>0\). However, if we insert the factor \(\sin^2{t}\), the integrand becomes of the form \(\sin{mt}\sin{nt}\), and we are familiar with this orthogonality. More precisely, \[\frac{1}{\pi}\int_{-\pi}^{\pi}\kappa_n(t)\sin^2{t}\,dt = \frac{1}{\pi}\int_{-\pi}^{\pi}\sin(n+1)t\sin{t}\,dt = \begin{cases} 1 & n=0, \\ 0 & n>0, \end{cases}\] and by evenness this equals \(\frac{2}{\pi}\int_0^\pi \kappa_n(t)\sin^2{t}\,dt\). Since the functional \(h \mapsto \frac{2}{\pi}\int_{0}^{\pi}h(t)\sin^2{t}\,dt\) is continuous in the uniform topology and the \(\kappa_n\) span a dense subspace, the result follows. \(\square\)
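The resulting formula can be tested numerically. The sketch below (midpoint quadrature; the helper names are mine) checks the orthonormality of characters \(\langle\chi_m,\chi_n\rangle=\delta_{mn}\), i.e. \(\frac{2}{\pi}\int_0^\pi \kappa_m(t)\kappa_n(t)\sin^2{t}\,dt=\delta_{mn}\):

```python
import math

def kappa(n, t):
    return math.sin((n + 1) * t) / math.sin(t)

def haar_inner(m, n, N=20000):
    """(2/pi) * int_0^pi kappa_m(t) kappa_n(t) sin^2(t) dt, midpoint rule."""
    h = math.pi / N
    s = 0.0
    for i in range(N):
        t = (i + 0.5) * h
        s += kappa(m, t) * kappa(n, t) * math.sin(t) ** 2
    return (2 / math.pi) * s * h
```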

Finally, surprisingly and satisfyingly enough, this denseness has actually ruled out all other possibilities for irreducible representations. In other words, our search among symmetric products is already complete. We can see this through Parseval's identity. This is the heart of this blog post.

Proposition 3. Every irreducible representation of \(SU(2)\) is isomorphic to one of the \(V_n\).

*Proof.* Suppose there were an irreducible representation whose character \(\chi\) differs from all of the \(\chi_n\). Then the orthonormality of irreducible characters shows that \(\langle \chi,\chi_n \rangle = 0\) for all \(n \ge 0\), while \(\langle \chi,\chi \rangle=1\). Now let's see why this is absurd.

Since \(\{\chi_n\}_{n \ge 0}\) spans a dense subspace of the space of class functions, we have an \(L^2\)-expansion \[\chi = \sum_{n = 0 }^{\infty} a_n \chi_n.\] Therefore \[\langle \chi,\chi_n \rangle = \int_{SU(2)}\overline\chi(x)\chi_n(x)dx = a_n=0,\quad n \ge 0,\] while by Parseval's identity \[\langle \chi,\chi \rangle = \sum_{n=0}^{\infty}|a_n|^2=1.\] But a sum of zeros cannot equal \(1\). \(\square\)

Now we head to \(SO(3)\). In fact the result follows immediately from the surjection \[\pi:SU(2) \to SO(3)\] with \(\ker\pi=\{-I,I\}\). Let \(W\) be a representation of \(SO(3)\), i.e. we have a map \[\rho:SO(3) \to GL(W).\] Then \[\pi^\ast\rho:SU(2) \to GL(W)\] given by \(g \mapsto \rho(\pi(g))\) is a pulled-back representation, and we write \(\pi^\ast W\). If \(W\) is irreducible, then \(\pi^\ast W\) is also irreducible. Note in particular that \(\pi^\ast\rho(-I)=\operatorname{id}_W\), because \(-I \in \ker\pi\).
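Though not needed in detail below, the surjection \(\pi\) can be written down concretely: \(g\) acts by conjugation \(v \mapsto gvg^\dagger\) on the \(3\)-dimensional real space of traceless Hermitian \(2\times2\) matrices, and this action is a rotation. A Python sketch of this standard construction (Pauli basis; all names are mine), which also confirms \(\pi(-g)=\pi(g)\):

```python
# Pauli matrices: a basis of the traceless Hermitian 2x2 matrices
PAULI = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]

def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def dagger(A):
    return [[A[j][i].conjugate() for j in range(2)] for i in range(2)]

def covering(g):
    """3x3 matrix of v -> g v g^dagger in the Pauli basis: the double-cover
    map SU(2) -> SO(3)."""
    R = [[0.0] * 3 for _ in range(3)]
    for k in range(3):
        M = mul(mul(g, PAULI[k]), dagger(g))
        for j in range(3):
            # extract the sigma_j coordinate via the pairing (1/2) tr(sigma_j M)
            t = sum(PAULI[j][i][l] * M[l][i] for i in range(2) for l in range(2))
            R[j][k] = (0.5 * t).real
    return R
```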

On the other hand, if \(\vartheta:SU(2) \to GL(V)\) is an irreducible representation where \(\vartheta(-I)=\operatorname{id}_V\), then we have an associated representation \[\pi_\ast\vartheta:SO(3) \cong SU(2)/\{I,-I\} \to GL(V)\] given by \(g\ker\pi \mapsto \vartheta(g)\). Let's denote it by \(\pi_\ast V\). Again, if \(V\) is irreducible, then \(\pi_\ast V\) is irreducible.

Therefore we have realised a correspondence \[\{\text{Irreducible representations of $SO(3)$}\} \\\updownarrow \\\{\text{Irreducible representations of $SU(2)$ where $-I$ acts as the identity}\}\] So it remains to determine those of \(SU(2)\). Let \(\rho_n:SU(2) \to GL(V_n)\) be the irreducible representation constructed earlier; then \[\rho_n(-I)P(z)=P(z(-I))=P(-z)=(-1)^nP(z)\] because \(P \in \mathbb{C}[z_1,z_2]\) is homogeneous of degree \(n\). Therefore \(-I\) acts as the identity if and only if \(n\) is even. We obtain

Proposition 4. Every irreducible representation of \(SO(3)\) is of the form \[W_n = \pi_\ast V_{2n},\] where \(V_{2n}\) is as in Proposition 1.

This is, of course, only a first classification. A classification as explicit as what we have done for \(SU(2)\) deserves another post. As a quick overview, here is the result.

Let \(P_{\ell}\) be the complex vector space of homogeneous polynomials of degree \(\ell\) in three variables, which can immediately be considered as functions on \(\mathbb{R}^3\), just as in our setting for \(SU(2)\). Then, in fact, \[W_\ell=\mathfrak{H}_\ell = \{f \in P_\ell:\Delta f=0\}.\] That is, \(W_\ell\) can be understood as the space of harmonic homogeneous polynomials on \(\mathbb{R}^3\), which are uniquely determined by their restrictions to the unit sphere \(S^2\).

- Theodor Bröcker and Tammo tom Dieck, *Representations of Compact Lie Groups*.
- Walter Rudin, *Real and Complex Analysis*, 3rd edition.

Let's admit it: trying to compute the integral straightforwardly is somewhat unrealistic, so we need to go through an alternative route. Here is how. For convenience (of writing MathJax code) let's write \(\varphi(t)=\hat{f}_c(t)\).

First of all, \(\hat{f}_c(t)\) is always well-defined: since \[\int_{-\infty}^{+\infty}|f_c(x)e^{-ixt}|dx=\int_{-\infty}^{+\infty}|f_c(x)|dx<\infty,\] we can compute it without worrying about convergence.

An integration by parts gives \[\begin{aligned}\varphi(t)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty}\exp(-cx^2)e^{-itx}dx &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty}\exp(-cx^2)\frac{1}{-it}de^{-itx} \\&=\frac{i}{t\sqrt{2\pi}}[\exp(-cx^2)e^{-itx}]|_{-\infty}^{+\infty} \\&\quad -\frac{i}{t\sqrt{2\pi}}\int_{-\infty}^{+\infty}e^{-itx}d\exp(-cx^2) \\&=\frac{-2c}{t\sqrt{2\pi}}\int_{-\infty}^{+\infty}-xi\exp(-cx^2)e^{-itx}dx,\end{aligned}\] where the boundary term vanishes because \(\exp(-cx^2) \to 0\) as \(x \to \pm\infty\). On the other hand, differentiating under the integral sign (justified since \(xf_c \in L^1\)) gives \[\varphi'(t)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty}-ixf_c(x)e^{-itx}dx.\] Combining both, we obtain a differential equation \[t\varphi(t)+2c\varphi'(t)=0.\] Separating variables, \[\int2c\frac{d\varphi}{\varphi}=-\int t\,dt,\] and we solve it to obtain \[2c\log\varphi=-\frac{1}{2}t^2+C,\] or alternatively, \[\varphi(t)=C\exp\left(-\frac{1}{4c}t^2\right).\] Now put the initial value back in: by the Gaussian integral, \[\varphi(0)=\frac{1}{\sqrt{2\pi}}\sqrt{\frac{\pi}{c}}=\frac{1}{\sqrt{2c}}.\] Therefore \[\varphi(t)=\frac{1}{\sqrt{2c}}\exp\left(-\frac{1}{4c}t^2\right)\] is exactly what we want.
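As a numerical sanity check (simple midpoint quadrature on a truncated interval; the function names are mine), one can compare the integral defining \(\hat{f}_c\) with the closed form just obtained:

```python
import cmath
import math

def fourier_gaussian(c, t, L=8.0, N=20000):
    """(1/sqrt(2*pi)) * integral of exp(-c x^2) e^{-itx} dx,
    truncated to [-L, L] and computed with the midpoint rule."""
    h = 2 * L / N
    s = 0j
    for i in range(N):
        x = -L + (i + 0.5) * h
        s += cmath.exp(-c * x * x - 1j * t * x)
    return s * h / math.sqrt(2 * math.pi)

def closed_form(c, t):
    return math.exp(-t * t / (4 * c)) / math.sqrt(2 * c)
```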

Before showing another method, let us first ask a question: can we have \(\hat{f}_c=f_c\)? Solving an equation in the variable \(c\) answers this affirmatively: \[\hat{f}_c=f_c \iff \begin{cases}\frac{1}{\sqrt{2c}}=1 \\ -\frac{1}{4c}=-c \end{cases} \iff c = \frac{1}{2}.\] In other words, \(f_\frac{1}{2}\) is a fixed point of the Fourier transform, and within this class of functions it is the only one.

As a classic property of the Fourier transform, for \(f,g \in L^1\), we have \[\widehat{f \ast g}(t)=\hat{f}(t)\hat{g}(t)\] where \[f \ast g(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty}f(x-y)g(y)dy.\] By the way, \(f \in L^1\) means \(\int_{-\infty}^{\infty}|f(x)|dx<\infty\). One can verify that \(f \ast g \in L^1\) here as well.

With this result, we can compute \(f_a \ast f_b\) easily. Note \[\widehat{f_a \ast f_b}(t)=\hat{f_a}(t)\hat{f_b}(t)=\frac{1}{2\sqrt{ab}}\exp\left[-\left(\frac{1}{4a}+\frac{1}{4b}\right)t^2\right].\] Now let's see if we can have \(f_a \ast f_b = \gamma f_c\) for some \(\gamma\) and \(c\). We should have \[\frac{1}{4c}=\frac{1}{4a}+\frac{1}{4b} \implies c=\frac{1}{\frac{1}{a}+\frac{1}{b}}=\frac{ab}{a+b}.\] We also have \[\gamma \frac{1}{\sqrt{2c}}=\frac{1}{2\sqrt{ab}} \implies \gamma = \sqrt\frac{c}{2ab}=\sqrt{\frac{1}{2(a+b)}}.\] Therefore, since the Fourier transform is injective on \(L^1\), \[f_a \ast f_b = \sqrt{\frac{1}{2(a+b)}}f_c\] where \(c\) is given above. We did not even compute the convolution integral explicitly.
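Again, this can be checked numerically against the direct convolution integral (midpoint rule on a truncated interval; the names are mine):

```python
import math

def f(c, x):
    return math.exp(-c * x * x)

def conv(a, b, x, L=8.0, N=20000):
    """(f_a * f_b)(x) = (1/sqrt(2*pi)) * integral f_a(x - y) f_b(y) dy,
    truncated to [-L, L], midpoint rule."""
    h = 2 * L / N
    s = 0.0
    for i in range(N):
        y = -L + (i + 0.5) * h
        s += f(a, x - y) * f(b, y)
    return s * h / math.sqrt(2 * math.pi)

def claimed(a, b, x):
    # the formula derived above: gamma * f_c with c = ab/(a+b)
    c = a * b / (a + b)
    return math.sqrt(1 / (2 * (a + b))) * f(c, x)
```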

This post is intended to supply a detailed proof of the Riemann mapping theorem.

Riemann mapping theorem. Every simply connected region \(\Omega \subsetneq \mathbb{C}\) is conformally equivalent to the open unit disc \(U\).

Fortunately the proof can be found in many textbooks on complex analysis, but it is fairly technical and can be painful to read. This post can be considered a painkiller: here you will see the proof filled in with many details. Still, the writer encourages the reader to reproduce the proof with their own pen and paper, and hopes that this post increases the accessibility of the theorem and its proof.

However, there is a barrier: we need to assume some background in complex analysis, although it is fairly basic. The minimal prerequisite is being able to answer the following questions.

- Contour integration and Cauchy's formula.
- Almost uniform convergence. Let \(\Omega \subset \mathbb{C}\) be open, suppose that \(f_j \in H(\Omega)\) for all \(j=1,2,\dots\), and \(f_j \to f\) uniformly on every compact subset \(K \subset \Omega\). Is \(f \in H(\Omega)\)? What is the uniform limit of \(f'_j\)? Informally, we call the phenomenon that a sequence of functions converges uniformly on every compact subset *almost uniform* convergence. This has nothing to do with *almost everywhere* in integration theory; in fact, this post does not require background in Lebesgue integration theory.
- The open mapping theorem (complex analysis version).
- The maximum modulus principle and some variants.
- Rouché's theorem, or even more, the calculus of residues.

Besides these prerequisites, we still need some preparation beforehand.

Definition 1. Let \(X\) be a *connected* topological space. A closed curve in \(X\) is a continuous map \(\gamma:[0,1] \to X\) such that \(\gamma(0)=\gamma(1)\); it is *null-homotopic* if it is homotopic to a constant map \(\gamma_0:[0,1] \to \{x\}\) with \(x \in X\). We say \(X\) is *simply connected* if every closed curve in \(X\) is null-homotopic.

Intuitively, if \(X\) is simply connected, then \(X\) contains no "hole". For example, the unit disc \(U\) is simply connected, whereas \(U \setminus \{0\}\) is not. On the other hand, \(U \setminus [0,1)\) is still simply connected. Another satisfying result is that every convex open set is simply connected: the null-homotopy is given by convex combinations.

There are a lot of good properties of simply connected regions, summarised below.

Proposition 1. For a region \(\Omega\) (an open and connected subset of \(\mathbb{R}^2\)), the following nine conditions are equivalent: each one implies the other eight.

- \(\Omega\) is homeomorphic to the open unit disc \(U\).
- \(\Omega\) is simply connected.
- \(\operatorname{Ind}_\gamma(\alpha)=0\) for every closed path \(\gamma\) in \(\Omega\) and every \(\alpha \in S^2 \setminus \Omega\), where \(S^2\) is the Riemann sphere.
- \(S^2 \setminus \Omega\) is connected.
- Every \(f \in H(\Omega)\) can be approximated by polynomials, almost uniformly.
- For every \(f \in H(\Omega)\) and every closed path \(\gamma\) in \(\Omega\),
\[\int_\gamma f(z)\mathrm{d}z=0.\]

- Every \(f \in H(\Omega)\) has an antiderivative; that is, there exists an \(F \in H(\Omega)\) such that \(F'=f\).
- If \(f \in H(\Omega)\) and \(1/f \in H(\Omega)\), then there exists a \(g \in H(\Omega)\) such that \(f=\exp{g}\).
- For such \(f\), there also exists a \(\varphi \in H(\Omega)\) such that \(f=\varphi^2\).

Conditions 5~9 pretty much say that calculus works fine here and, to some extent, we need not worry about nightmare counterexamples. Most of the implications \(n \implies n+1\) are not that difficult, but some deserve a mention. That 4 implies 5 is a consequence of Runge's theorem. In the implication from 7 to 8, one needs to use the fact that \(\Omega\) is connected. Once we have \(f=\exp{g}\), we can put \(\varphi=\exp\frac{g}{2}\), from which \(f=\varphi^2\) follows. That 9 implies 1 is partly a consequence of the Riemann mapping theorem. Indeed, if \(\Omega\) is the whole plane then the homeomorphism is easy: \(z \mapsto \frac{z}{1+|z|}\) is a homeomorphism of \(\Omega\) onto \(U\). But we need the Riemann mapping theorem for the remaining case, when \(\Omega\) is a proper subset.

If you know the definition of a sheaf, you will realise that \(\Omega \mapsto H(\Omega)\) is indeed a sheaf on \(\mathbb{C}\). For each open subset \(\Omega \subset \mathbb{C}\), \(H(\Omega)\) is a ring, more precisely a \(\mathbb{C}\)-algebra. The exponential map \(\exp:g \mapsto e^g\) is a sheaf morphism onto the sheaf of nowhere-vanishing holomorphic functions, and we now see that it is surjective on sections over \(\Omega\) if and only if \(\Omega\) is simply connected. I hope this helps you figure out an exercise in algebraic geometry. You know, that celebrated book by Robin Hartshorne.

Since we have not yet proved the Riemann mapping theorem, we cannot use the equivalence above. However, we can use condition 9 right away. This gives rise to Koebe's square root trick.

Equicontinuity is quite an important concept. You may have seen it in differential equations, harmonic functions, or simply in sequences of functions. We will use it to describe families of functions in which almost uniform convergence can be well established.

Definition 2. Let \(\mathscr{F}\) be a family of functions \((X,d) \to \mathbb{C}\), where \((X,d)\) is a metric space.

- We say that \(\mathscr{F}\) is *equicontinuous* if, to every \(\varepsilon>0\), there corresponds a \(\delta>0\) such that whenever \(d(x,y)<\delta\), we have \(|f(x)-f(y)|<\varepsilon\) for all \(f \in \mathscr{F}\). In particular, by definition, all functions in \(\mathscr{F}\) are uniformly continuous.
- We say that \(\mathscr{F}\) is *pointwise bounded* if, to every \(x \in X\), there corresponds some \(0 \le M(x) < \infty\) such that \(|f(x)| \le M(x)\) for every \(f \in \mathscr{F}\).
- We say that \(\mathscr{F}\) is *uniformly bounded on each compact subset* if, to each compact \(K \subset X\), there corresponds a number \(M(K)\) such that \(|f(z)| \le M(K)\) for all \(f \in \mathscr{F}\) and \(z \in K\).

These concepts describe continuity and boundedness "for a whole family" of functions. In our proof of the Riemann mapping theorem we do not construct the map explicitly; instead, we use these concepts to obtain one (as a limit) that exists. In this post we simply put \(X=\Omega \subset \mathbb{C}\), a simply connected region, with \(d\) the natural metric.

A famous result on equicontinuity is the Arzelà-Ascoli theorem, which says that pointwise boundedness together with equicontinuity yields almost uniformly convergent subsequences.

Theorem 1 (Arzelà-Ascoli). Let \(\mathscr{F}\) be a pointwise bounded and equicontinuous family of complex functions on a separable metric space \(X\) (i.e., \(X\) contains a countable dense subset). Then every sequence \(\{f_n\}\) in \(\mathscr{F}\) has a subsequence that converges uniformly on every compact subset of \(X\).


Certainly it is fine to take \(X\) to be a subset of \(\mathbb{R}\), \(\mathbb{C}\) or their products, which is why the theorem is so useful in real and complex analysis. We will need this almost uniform convergence to establish our conformal map. To specialise it to complex analysis, we introduce the concept of a normal family.

Definition 3. Suppose \(\mathscr{F} \subset H(\Omega)\) for some region \(\Omega \subset \mathbb{C}\). We call \(\mathscr{F}\) a *normal family* if every sequence of members of \(\mathscr{F}\) contains a subsequence which converges uniformly on every compact subset of \(\Omega\). The limit function is not required to be in \(\mathscr{F}\).

We now apply Arzelà-Ascoli to complex analysis.

Theorem 2 (Montel). Suppose \(\mathscr{F} \subset H(\Omega)\) is uniformly bounded on each compact subset of \(\Omega\); then \(\mathscr{F}\) is a normal family.

*Proof.* We need to show that \(\mathscr{F}\) is "almost" equicontinuous; since uniform boundedness on compact subsets clearly implies pointwise boundedness, we can then apply Arzelà-Ascoli.

Let \(\{K_n\}\) be a sequence of compact sets such that (1) \(\bigcup_n K_n = \Omega\) and (2) \(K_n \subset K^\circ_{n+1} \subset K_{n+1}\), where \(K^\circ_{n+1}\) is the interior of \(K_{n+1}\). Then there exists a positive number \(\delta_n\) such that for **every** \(z \in K_n\), \[D(z,2\delta_n) \subset K_{n+1}^\circ \subset K_{n+1},\] where \(D(a,r)\) is the open disc centred at \(a\) with radius \(r\). Indeed, the function \(z \mapsto \operatorname{dist}(z,\mathbb{C} \setminus K_{n+1}^\circ)\) is continuous and strictly positive on the compact set \(K_n\) (every \(z \in K_n\) lies in the open set \(K_{n+1}^\circ\)), hence attains a positive minimum; take \(2\delta_n\) smaller than this minimum.

For such \(\delta_n\), pick \(z',z'' \in K_n\) with \(|z'-z''| < \delta_n\). Let \(\gamma\) be the positively oriented circle with centre \(z'\) and radius \(2\delta_n\), i.e. the boundary of \(D(z',2\delta_n)\). Recall that the Cauchy formula says \[f(z')=\frac{1}{2\pi{i}}\int_\gamma \frac{f(\zeta)}{\zeta-z'}\mathrm{d}\zeta.\] By this formula, we have \[\begin{aligned}f(z')-f(z'')&=\frac{1}{2\pi{i}}\int_\gamma f(\zeta)\left(\frac{1}{\zeta-z'}-\frac{1}{\zeta-z''} \right)\mathrm{d}\zeta \\ &=\frac{z'-z''}{2\pi{i}}\int_\gamma \frac{f(\zeta)}{(\zeta-z')(\zeta-z'')}\mathrm{d}\zeta.\end{aligned}\] Now we make use of our choice of \(z'\), \(z''\) and \(\gamma\). For \(\zeta \in \gamma^\ast\) (the range of \(\gamma\)) we have \(|\zeta-z'|=2\delta_n\), and since \(|z'-z''|<\delta_n\), the triangle inequality \(2\delta_n=|\zeta-z'| \le |\zeta-z''|+|z''-z'|\) gives \(|\zeta-z''| \ge 2\delta_n-|z''-z'|>\delta_n\). Moreover, since \(D(z',2\delta_n) \subset K^\circ_{n+1}\), we have \(\overline{D}(z',2\delta_n) \subset K_{n+1}\), so \(|f(\zeta)| \le M(K_{n+1})\) on \(\gamma^\ast\); this is where the hypothesis of uniform boundedness enters. The modulus of the integrand \(\frac{f(\zeta)}{(\zeta-z')(\zeta-z'')}\) is therefore bounded by \(\frac{M(K_{n+1})}{2\delta_n\cdot\delta_n}\), and the length of \(\gamma\) is \(4\pi\delta_n\). Consequently \[|f(z')-f(z'')| \le \frac{|z'-z''|}{2\pi}\cdot\frac{M(K_{n+1})}{2\delta_n^2}\cdot 4\pi\delta_n = \frac{M(K_{n+1})}{\delta_n}|z'-z''|.\]

What does this inequality imply? Given \(\varepsilon>0\), if we pick \(\delta=\min\{\delta_n,\frac{\delta_n\varepsilon}{M(K_{n+1})}\}\), then \(|f(z')-f(z'')|<\varepsilon\) for every \(f \in \mathscr{F}\) whenever \(|z'-z''|<\delta\). That is, for each \(K_n\), the **restrictions** of the members of \(\mathscr{F}\) to \(K_n\) form an equicontinuous family.

Now consider a sequence \(\{f_j\}\) in \(\mathscr{F}\). Applying the Arzelà-Ascoli theorem to the restrictions to \(K_1\) gives an infinite subset \(S_1 \subset \mathbb{N}\) such that \(\{f_j\}_{j \in S_1}\) converges uniformly on \(K_1\); applying it again, to the restrictions to \(K_{n+1}\) of the subsequence indexed by \(S_n\), gives infinite subsets \(S_1 \supset S_2 \supset \cdots\) such that \(\{f_j\}_{j \in S_n}\) converges uniformly on \(K_n\). Pick an increasing sequence \(s_1 < s_2 < \cdots\) with \(s_j \in S_j\) (the diagonal trick); then \(\{f_{s_j}\}\) is eventually a subsequence of each \(\{f_j\}_{j \in S_n}\), so it converges uniformly on every \(K_n\), and therefore on every compact subset \(K \subset \Omega\): such a \(K\) is covered by the interiors \(K_n^\circ\), hence contained in some \(K_n\). The statement is now proved. \(\square\)

**Remarks.** We have no idea what the limit is, and the same will happen in our proof of the Riemann mapping theorem.

The sequence \(\{K_n\}\) can be constructed explicitly, however. In fact, for every open set \(\Omega\) in the plane there is a sequence \(\{K_n\}\) of compact sets such that

- \(\bigcup_n K_n=\Omega\).
- \(K_n \subset K_{n+1}^\circ\).
- For every compact \(K \subset \Omega\), there is some \(n\) such that \(K \subset K_n\).
- Every component of \(S^2 \setminus K_n\) contains a component of \(S^2 \setminus \Omega\).

The sets are constructed as follows, and can be verified to satisfy the properties above. For each \(n\), define \[V_n = D(\infty,n) \cup \bigcup_{a \not\in \Omega}D(a,1/n),\] where \(D(\infty,n)=\{z:|z|>n\}\cup\{\infty\}\) is a disc on the Riemann sphere. Then \(K_n=S^2 \setminus V_n\) is what we want.

The Schwarz lemma is another important tool for our proof of the Riemann mapping theorem; we need it to establish important inequalities, and this lemma as well as its variants shows the rigidity of holomorphic maps. We make use of the maximum modulus theorem. For simplicity, let \(H^\infty\) be the Banach space of bounded holomorphic functions on the open unit disc \(U\), equipped with the supremum norm \(\| \cdot \|_\infty\).

Theorem 3 (Schwarz lemma).Suppose \(f:U \to \mathbb{C}\) is a holomorphic map in \(H^\infty\) such that \(f(0)=0\) and \(\|f\|_\infty \le 1\), then \[\begin{aligned}|f(z)| &\le |z| \quad (z \in U), \\|f'(0)| &\le 1;\end{aligned}\] on the other hand, if \(|f(z)|=|z|\) holds for some \(z \in U \setminus \{0\}\), or if \(|f'(0)|=1\) holds, then \(f(z)=\lambda{z}\) for some complex constant \(\lambda\) such that \(|\lambda|=1\).

*Proof.* Since \(f(0)=0\), \(f(z)/z\) has a removable singularity at \(z=0\). Hence there exists \(g \in H(U)\) such that \(f(z)=zg(z)\). Fix \(0<r<1\). For any \(z \in U\) such that \(|z|<r\), the maximum modulus theorem gives \[|g(z)| \le \max_\theta\frac{|f(re^{i\theta})|}{|re^{i\theta}|} \le \frac{1}{r}.\] Letting \(r \to 1\), we see \(|g(z)| \le 1\) for all \(z \in U\). Therefore \(|f(z)| \le |z|\) follows, and \(|f'(0)|=|g(0)| \le 1\) as well. On the other hand, if \(|f(z)|=|z|\) for some \(z \ne 0\), or \(|f'(0)|=1\), then \(|g|\) attains the value \(1\) inside \(U\), and the maximum modulus theorem forces \(g\) to be a constant, say \(\lambda\), from which it follows that \(|\lambda|=1\) and \(f(z)=\lambda{z}\). \(\square\)

There are many variants of the Schwarz lemma, and we will be using the Schwarz-Pick lemma.

Definition 4.For any \(\alpha \in U\), define \[\varphi_\alpha(z) = \frac{z-\alpha}{1-\overline\alpha z}.\]

This family is a subfamily of the Möbius transformations, but we are not paying very much attention to the bigger family right now. We need the fact that such \(\varphi_\alpha\) is always a one-to-one mapping which carries \(S^1\) (the unit circle) onto \(S^1\), \(U\) onto \(U\), and \(\alpha\) to \(0\). This requires another application of the maximum modulus theorem. A direct computation shows that \[\varphi'_\alpha(0)=1-|\alpha|^2, \quad \varphi'_\alpha(\alpha)=\frac{1}{1-|\alpha|^2}.\]
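These properties are easy to check numerically. The following Python sketch (our own, not part of the original text) verifies that \(\varphi_\alpha\) maps the unit circle to itself, sends \(\alpha\) to \(0\), and has the stated derivatives at \(0\) and at \(\alpha\), using central differences:

```python
import cmath

def phi(alpha: complex, z: complex) -> complex:
    """The Moebius map phi_alpha(z) = (z - alpha) / (1 - conj(alpha) z)."""
    return (z - alpha) / (1 - alpha.conjugate() * z)

alpha = 0.3 + 0.4j

# phi_alpha carries the unit circle to the unit circle, and alpha to 0.
for k in range(8):
    zeta = cmath.exp(2j * cmath.pi * k / 8)
    assert abs(abs(phi(alpha, zeta)) - 1) < 1e-12
assert abs(phi(alpha, alpha)) < 1e-12

# Numerical derivatives agree with phi'_alpha(0) = 1 - |alpha|^2 and
# phi'_alpha(alpha) = 1 / (1 - |alpha|^2).
h = 1e-6
d0 = (phi(alpha, h) - phi(alpha, -h)) / (2 * h)
da = (phi(alpha, alpha + h) - phi(alpha, alpha - h)) / (2 * h)
assert abs(d0 - (1 - abs(alpha) ** 2)) < 1e-6
assert abs(da - 1 / (1 - abs(alpha) ** 2)) < 1e-6
```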

Theorem 4 (Schwarz-Pick lemma).Suppose \(\alpha,\beta \in U\), \(f \in H^\infty\) and \(\| f\|_\infty \le 1\), \(f(\alpha)=\beta\). Then \[|f'(\alpha)| \le \frac{1-|\beta|^2}{1-|\alpha|^2}.\]

*Proof.* Consider \[g=\varphi_\beta \circ f \circ \varphi_{-\alpha}.\] We see \(g \in H^\infty\) and \(\|g\|_\infty \le 1\). What's more important, \(g(0)=\varphi_\beta \circ f(\alpha)=\varphi_\beta(\beta)=0\). By the Schwarz lemma, \(|g'(0)| \le 1\). On the other hand, we see \[g'(0)=\varphi_\beta'(\beta)f'(\alpha)\varphi_{-\alpha}'(0)\] and therefore \[|f'(\alpha)| \le \frac{1-|\beta|^2}{1-|\alpha|^2}.\] In particular, equality holds if and only if \(g(z)=\lambda{z}\) for some constant \(\lambda\). If this is the case, then \[\varphi_\beta \circ f \circ \varphi_{-\alpha}(z)=\lambda{z} \implies f(z)=\varphi_{-\beta}(\lambda\varphi_\alpha(z)).\] The story can go on but we halt here and continue our story of the Riemann mapping theorem.

Each \(z \ne 0\) determines a *direction* from the origin, which can be described by \[A[z]=\frac{z}{|z|}.\] Let \(f:\Omega \to \mathbb{C}\) be a map. We say \(f\) *preserves angles* at \(z_0 \in \Omega\) if \[\lim_{r \to 0}e^{-i\theta}A[f(z_0+re^{i\theta})-f(z_0)]\] exists and is independent of \(\theta\).

Conformal mappings preserve angles in a reasonable way. A function \(f\) is **conformal** if it is holomorphic and \(f'(z) \ne 0\) everywhere. We have a theorem describing this, but it is pretty elementary so we are not including the proof in this post.

Theorem 5.Let \(f\) map a region \(\Omega\) into the plane. If \(f'(z_0)\) exists at some \(z_0 \in \Omega\) and \(f'(z_0) \ne 0\), then \(f\) preserves angles at \(z_0\). Conversely, if the differential \(Df\) exists and is different from \(0\) at \(z_0\), and if \(f\) preserves angles at \(z_0\), then \(f'(z_0)\) exists and is different from \(0\).

There is no confusion about \(f'(z_0)\). By differential \(Df\) we mean a linear map \(L:\mathbb{R}^2 \to \mathbb{R}^2\) such that, writing \(z_0=(x_0,y_0)\), we have \[f(x_0+x,y_0+y)=f(x_0,y_0)+L(x,y)+{\sqrt{x^2+y^2}}\eta(x,y)\] where \(\eta(x,y) \to 0\) as \(x \to 0\) and \(y \to 0\). To prove this, one can assume that \(z_0=f(z_0)=0\). When the differential exists, one writes \[f(z)=\alpha{z}+\beta\overline{z}+|z|\eta(z).\] We say that two regions \(\Omega_1\) and \(\Omega_2\) are **conformally equivalent** if there is a conformal one-to-one mapping of \(\Omega_1\) onto \(\Omega_2\). The Riemann mapping theorem states that

Theorem 6 (Riemann mapping theorem).Every proper simply connected region \(\Omega\) in the plane is conformally equivalent to the open unit disc \(U\).

As a famous example, the upper plane \(\mathbb{H}\) is conformally equivalent to \(U\) by the Cayley transform.

As one may expect, this theorem asserts that the study of a simply connected region \(\Omega\) can be reduced to \(U\) to some extent. But a conformal equivalence is not just about homeomorphism. If \(\varphi:\Omega_1 \to \Omega_2\) is a conformal one-to-one mapping, then \(\varphi^{-1}:\Omega_2 \to \Omega_1\) is also a conformal mapping. In the language of algebra, such a mapping \(\varphi\) **induces** a ring isomorphism \[\begin{aligned}\varphi^\ast:H(\Omega_2) &\to H(\Omega_1) \\ f &\mapsto f \circ \varphi\end{aligned}\] Therefore, the ring \(H(\Omega_2)\) is algebraically the same as \(H(\Omega_1)\). The Riemann mapping theorem also states that, if \(\Omega\) is a simply connected region, then \(H(\Omega) \cong H(U)\). From this we can exploit much more information on top of homeomorphism. One can also extend the story to \(S^2\), the Riemann sphere, but that's another story.

The proof is fairly technical, but it is a good chance to test our skill in complex analysis. The bread and butter of this proof is the following set: \[\Sigma = \{\psi \in H(\Omega):\psi(\Omega) \subset U,\ \psi\text{ is one-to-one}\}.\] Our goal is to prove that there is some \(\psi \in \Sigma\) such that \(\psi(\Omega)=U\). Note, once non-emptiness is proved, since \(|\psi|<1\) uniformly over \(\Sigma\), the criterion proved above shows that \(\Sigma\) is a **normal family**.

Pick \(w_0 \in \mathbb{C} \setminus \Omega\). Then \(g(z)=z-w_0 \in H(\Omega)\) and, what is more important, \(\frac{1}{g} \in H(\Omega)\). By 9 of proposition 1, there exists \(\varphi \in H(\Omega)\) such that \(\varphi^2(z)=g(z)\), i.e., informally, \(\varphi(z)=\sqrt{z-w_0}\) in \(\Omega\). If \(\varphi(z_1)=\varphi(z_2)\), then \(\varphi(z_1)^2=\varphi(z_2)^2=z_1-w_0=z_2-w_0\) and then \(z_1=z_2\). Therefore \(\varphi\) is one-to-one. On the other hand, if \(\varphi(z_1)=-\varphi(z_2)\), we still have \(\varphi^2(z_1)=\varphi^2(z_2)\), hence \(z_1=z_2\) and \(\varphi(z_1)=-\varphi(z_1)=0\), which is impossible because \(g\) has no zero. This shows that \(\varphi(\Omega)\) and \(-\varphi(\Omega)\) are disjoint. This is Koebe's square root trick.

Since \(\varphi\) is an open mapping, there is an open disc \(D(a,r) \subset \varphi(\Omega)\), where \(a \in \varphi(\Omega)\), \(a \ne 0\) and \(0<r<|a|\). By the disjointness above, \(-D(a,r)=D(-a,r)\) misses \(\varphi(\Omega)\), i.e. \(D(-a,r) \cap \varphi(\Omega) = \varnothing\). For this reason, we can put \[\psi(z) = \frac{r}{a+\varphi(z)}.\] It follows that \[|\psi(z)| = \frac{r}{|\varphi(z)-(-a)|}< \frac{r}{r}=1\] and therefore \(\psi(\Omega) \subset U\). Since \(\varphi\) is one-to-one, \(\psi\) is one-to-one as well, and we deduce that \(\psi \in \Sigma\); in particular, this set is not empty.

**Remark.** You may have trouble believing that \(D(-a,r) \cap \varphi(\Omega)=\varnothing\). But pick any \(w \in D(-a,r) \cap \varphi(\Omega)\); we have some \(z' \in \Omega\) such that \(\varphi(z')=w\). We also have \(|-a-w|<r\), but this implies \(|a-(-w)|=|a+w|=|-a-w|<r\), and therefore \(-w \in D(a,r) \subset \varphi(\Omega)\). There exists some \(z'' \in \Omega\) such that \(\varphi(z'')=-w\). By the square root trick above, \(w=-w\), i.e. \(w=0\). It follows that \(|a|=|-a-w|<r\), and this is a contradiction.

Since \(D(-a,r) \cap \varphi(\Omega)=\varnothing\), we have \(|\varphi(z)-(-a)|>r\) for all \(z \in \Omega\) and therefore \(|\psi(z)|<1\) is not a problem either.

If \(\psi \in \Sigma\) and \(\psi(\Omega) \subsetneqq U\), and \(z_0 \in \Omega\), then there exists a \(\psi_1 \in \Sigma\) such that \(|\psi_1'(z_0)|>|\psi'(z_0)|\).

This step shows that we can "enlarge" the range in some way.

For convenience we use the Möbius transformation \[\varphi_\alpha(z) = \frac{z-\alpha}{1-\overline{\alpha}z}.\] Pick \(\alpha \in U \setminus \psi(\Omega)\). Then \(\varphi_\alpha \circ \psi \in \Sigma\) and \(\varphi_\alpha \circ \psi\) has no zero in \(\Omega\). Hence there is some \(g \in H(\Omega)\) such that \[g^2=\varphi_\alpha \circ \psi.\] Since \(\varphi_\alpha \circ \psi\) is one-to-one, another application of Koebe's square root trick shows that \(g\) is one-to-one. Therefore we have \(g \in \Sigma\) as well. If \(\psi_1=\varphi_\beta \circ g\) where \(\beta=g(z_0)\), we have \(\psi_1 \in \Sigma\) (one-to-one). In particular, \(\psi_1(z_0)=0\).

By putting \(s(z)=z^2\), we have \[\begin{aligned}\psi(z)&=\varphi_{-\alpha} \circ g^2(z) \\ &= \varphi_{-\alpha} \circ s \circ g(z) \\ &= \varphi_{-\alpha} \circ s \circ \varphi_{-\beta} \circ \psi_1(z).\end{aligned}\] If we put \(F(z)=\varphi_{-\alpha} \circ s \circ \varphi_{-\beta}(z)\), then the chain rule shows that \[\psi'(z_0) = F'(0)\psi_1'(z_0).\] (Note we used the fact that \(\psi_1(z_0)=0\).) If we can prove that \(0<|F'(0)|<1\), then this step is complete. Note \(F\) satisfies the conditions of the Schwarz-Pick lemma, and therefore \[|F'(0)| \le \frac{1-|F(0)|^2}{1-0^2} \le 1.\] Equality cannot hold in the first inequality: that would force \(F\) to be of the form \(\varphi_{-\sigma}(\lambda\varphi_{\eta}(z))\) with \(|\lambda|=1\), which is one-to-one, while \(F\) is not (the squaring map \(s\) is two-to-one). On the other hand we have \[\begin{aligned}F(0) &= \varphi_{-\alpha}(g(z_0)^2) \\ &= \varphi_{-\alpha}(\varphi_\alpha\circ \psi(z_0)) \\ &= \psi(z_0) \in U\end{aligned}\] Also \(F'(0) \ne 0\): the chain rule expansion of \(F'(0)\) contains the factor \(s'(\varphi_{-\beta}(0))=2\beta\), and \(\beta=g(z_0) \ne 0\) because \(g^2(z_0)=\varphi_\alpha(\psi(z_0)) \ne 0\) (recall \(\alpha \notin \psi(\Omega)\)). Therefore \(0<|F'(0)|<1\) and this step is complete.

We take the contrapositive of step 2:

Fix \(z_0 \in \Omega\). If \(h \in \Sigma\) is an element such that \(|h'(z_0)| \ge |\psi'(z_0)|\) for all \(\psi \in \Sigma\), then \(h(\Omega)=U\).

The proof is complete once we have found such a function! To do this, we use the fact that \(\Sigma\) is a normal family. Put \[\eta = \sup\{|\psi'(z_0)|:\psi \in \Sigma\}.\] By definition of \(\eta\), there is a sequence \(\{\psi_n\}\) in \(\Sigma\) such that \(|\psi_n'(z_0)| \to \eta\). By normality of \(\Sigma\), we pick a subsequence \(\varphi_k=\psi_{n_k}\) that converges uniformly on compact subsets of \(\Omega\); call the limit \(h \in H(\Omega)\). Since uniform convergence on compact sets carries over to derivatives, \(|h'(z_0)|=\eta\). Since \(\Sigma \ne \varnothing\) and every member of \(\Sigma\) is one-to-one (hence has non-vanishing derivative), we have \(\eta > 0\), so \(h\) cannot be a constant. Since \(\varphi_k(\Omega) \subset U\), we must have \(h(\Omega) \subset \overline{U}\). But \(h\), being non-constant, is an open mapping, so in fact \(h(\Omega) \subset U\).

It remains to show that \(h\) is one-to-one. Fix distinct \(z_1, z_2 \in \Omega\). Put \(\alpha=h(z_1)\) and \(\alpha_n=\varphi_n(z_1)\), then \(\alpha_n \to \alpha\). Let \(\overline{D}\) be a closed disc in \(\Omega\) centred at \(z_2\) with interior denoted by \(D\) such that

- \(z_1 \not\in \overline{D}\).
- \(h-\alpha\) has no zero point on the boundary of \(\overline{D}\).

We see \(\varphi_n -\alpha_n\) converges to \(h-\alpha\), uniformly on \(\overline{D}\). Each \(\varphi_n-\alpha_n\) has no zero in \(D\), since \(\varphi_n\) is one-to-one and already takes the value \(\alpha_n\) at \(z_1 \notin \overline{D}\). By Rouché's theorem, \(h-\alpha\) has no zero in \(D\) either, and in particular \(h(z_2)-\alpha = h(z_2)-h(z_1) \ne 0\). This completes the proof. \(\square\)

**Remark.** First of all, such a \(\overline{D}\) exists. This is because the zeros of \(h-\alpha\) have no limit point in \(\Omega\), i.e., they are discrete (when choosing \(\overline{D}\), we do not yet know how many there are).

Our choice of \(\overline{D}\) enables us to use Rouché's theorem (in case this step went by too fast). Since \(h-\alpha\) has no zero on the boundary, we have \(\zeta=\inf_{z \in \partial D}|h(z)-\alpha|>0\). When \(n\) is big enough, on \(\partial D\) we have \[|(h-\alpha)-(\varphi_n-\alpha_n)|<\zeta\le|h-\alpha|.\] The second inequality holds by the definition of \(\zeta\) as an infimum, and Rouché's theorem applies naturally. \(\square\)

This proof is a reproduction of W. Rudin's *Real and Complex Analysis*. For a comprehensive further reading, I highly recommend Tao's blog post.

In the previous post we were convinced that the Galois group of a separable irreducible polynomial \(f\) can be realised as a subgroup of the symmetric group, whose elements permute the roots of \(f\). We worked on cubic polynomials over a field of characteristic not equal to \(2\) or \(3\), and this certainly works for \(\mathbb{Q}\). In this post we go one step further.

Let \(f \in \mathbb{Q}[X]\) be an irreducible polynomial of prime degree \(p\). Since it is also separable (see lemma 9.12.1 in the Stacks project), we can safely work with its Galois group \(G\). One immediately asks about the position of \(G\) relative to \(\mathfrak{S}_p\). Indeed we have \(G \subset \mathfrak{S}_p\). The question is, when does equality hold? It is not likely to have an immediate answer, but we have some interesting sufficient conditions, which will be discussed in this post.

We present some handy results in finite group theory that will be used in the main result. One may skip this section until needed. I will collapse the proof in case one wants to treat it as an exercise.

Lemma 1.Let \(p\) be a prime number. The symmetric group \(\mathfrak{S}_p\) is generated by \([12 \cdots p]\) and an arbitrary transposition \([rs]\).

- It is generated by cycles. This is a really, really routine verification and sometimes assumed as a fact.
- It is generated by transpositions, i.e., \(2\)-cycles. It suffices to show that a cycle is a product of transpositions. Indeed, for any cycle \([i_1\dots i_k]\) in \(\mathfrak{S}_n\), we have \([i_1\cdots i_k]=[i_1i_2][i_2i_3]\cdots[i_{k-1}i_k]\). This proves our statement.
- It is generated by transpositions of the form \([1k]\). It suffices to show that every transposition is generated as such. For any transposition \([rs]\), we have \([rs]=[1r][1s][1r]\).
- It is generated by adjacent transpositions, i.e. generators of the form \([k-1 ,k]\). This follows from the following identity:

\[[1k]=[12][23]\cdots[k-1,k][k-2,k-1]\cdots [23][12]\]

- It is generated by two elements: \(\sigma=[12]\) and \(\tau=[12\cdots n]\). This follows from the following identity:

\[\tau^{k-2}\sigma\tau^{-(k-2)}=[\tau^{k-2}(1)\tau^{k-2}(2)]=[k-1,k].\]

Now, back to the case when \(n=p\) is prime. Put \(\sigma=[rs]\) and \(\tau=[12\cdots p]\). If \(s-r=1\), then we are done by 5 after several conjugations. Therefore we may assume that \(d=s-r>1\). From now on an integer may denote an element of either \(\mathbb{Z}\) or \(\mathbf{F}_p=\mathbb{Z}/p\mathbb{Z}\), depending on the context. Recall that \(\mathbf{F}_p\) is a field. Pick the integer \(w\) such that \(dw=1\) in \(\mathbf{F}_p\). By conjugating \(\sigma\) with powers of \(\tau\), we see that \(\tau\) and \(\sigma\) generate \[[1,1+d],[1+d,1+2d],\dots,[1+(w-1)d,1+wd].\] Conjugating repeatedly (note \([b,c][a,b][b,c]=[a,c]\)), these produce \([1,1+wd]=[12]\), and we are back to 5. \(\square\)

We have many good reasons to study the Galois group of *something*. It would be great if the group can be written down explicitly. In this section we show that the group can be revealed by the number of nonreal roots.
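The lemma can be checked by brute force for small \(p\). Below is a Python sketch (our own illustration; permutations are tuples on \(\{0,\dots,n-1\}\), so \([24]\) becomes the swap of positions \(1\) and \(3\)) verifying that \([24]\) and \([12345]\) generate all \(120\) elements of \(\mathfrak{S}_5\):

```python
def compose(p, q):
    """(p . q)(i) = p(q(i)); permutations represented as tuples."""
    return tuple(p[q[i]] for i in range(len(p)))

n = 5
tau = tuple((i + 1) % n for i in range(n))   # the p-cycle [12...5], 0-indexed
sigma = (0, 3, 2, 1, 4)                      # the transposition [24], 0-indexed

# Close {sigma, tau} under composition; in a finite group the semigroup
# generated by a set coincides with the subgroup it generates.
group = {tau, sigma}
frontier = [tau, sigma]
while frontier:
    p = frontier.pop()
    for g in (tau, sigma):
        for q in (compose(p, g), compose(g, p)):
            if q not in group:
                group.add(q)
                frontier.append(q)

print(len(group))   # 120 = 5!, so the two elements generate S_5
```

Here \(d=s-r=2\) is invertible mod \(5\), exactly as the lemma requires.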

Proposition 1.Let \(f(X) \in \mathbb{Q}[X]\) be an irreducible polynomial of prime degree \(p\). If \(f\) has precisely two nonreal roots, then the Galois group \(G\) of \(f\) over \(\mathbb{Q}\) is \(\mathfrak{S}_p\).

*Proof.* Let \(L\) be the splitting field of \(f\). It suffices to show that \(G\) contains a transposition and a \(p\)-cycle, which (after re-ordering the roots, as below) may be written \([12\cdots p]\); then Lemma 1 applies. Note \(p=[\mathbb{Q}(\alpha):\mathbb{Q}]\) divides \(|G|=[L:\mathbb{Q}]\) for any root \(\alpha\), so by Sylow's theorem, \(G\) has a subgroup \(H\) of order \(p\), which can only be cyclic. Say \(H=\langle \sigma \rangle\). Suppose \(\sigma\) has cycle type \((k_1,\dots,k_r)\). Then the order of \(\sigma\), which equals \(p\), is the least common multiple of \(k_1,\dots,k_r\), where \(k_1+\dots+k_r=p\). This can only happen when \(r=1\) and \(k_1=p\). Therefore \(\sigma\) is a \(p\)-cycle.

In fact, \(\sigma\) can be taken to be \([12\dots p]\). Suppose an order of the roots of \(f\) is given, with respect to which \(\sigma=[i_1 i_2 \dots i_p]\). If we re-order the roots, by declaring the \(k\)th root to be the original \(i_k\)th root, then we can write \(\sigma=[12\dots p]\). (This re-ordering is, in fact, a conjugation.)

It remains to prove that \(G\) contains a transposition. Let \(\alpha\) and \(\beta\) be the two nonreal roots of \(f\). Since \(\overline{\alpha}\) is also a root of \(f\) (because the coefficients of \(f\) are real; if \(\sum_{n=0}^{p}a_n\alpha^n=0\), then \(\sum_{n=0}^{p}a_n\overline{\alpha}^n=\sum_{n=0}^{p}\overline{a_n\alpha^n}=\overline{0}=0\)), we see \(\beta=\overline{\alpha}\). Therefore complex conjugation restricts to \(L\) as an element of \(G\) of order \(2\) which fixes the \(p-2\) real roots and swaps \(\alpha\) and \(\beta\): a transposition. This proves our assertion. \(\square\)

For example, consider the polynomial \[f(X)=X^5-4X+2.\] With calculus one can show that it has exactly three real roots, hence it has two nonreal roots. Eisenstein's criterion shows that \(f\) is irreducible. Therefore we are allowed to use proposition 1. The Galois group of \(f\) is \(\mathfrak{S}_5\).
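The root count can be double-checked numerically. A quick Python sketch (our own, not from the original text) counts sign changes of \(f\) on a grid; since \(|x| \ge 2\) forces \(|x^5| \ge 16|x| > 4|x|+2\), every real root lies in \([-2,2]\):

```python
def f(x: float) -> float:
    return x**5 - 4*x + 2

# Count sign changes of f on a fine grid covering [-2, 2]; the roots of f
# are irrational (Eisenstein), so no grid point is an exact root.
N = 10_000
xs = [-2 + 4 * k / N for k in range(N + 1)]
sign_changes = sum(1 for u, v in zip(xs, xs[1:]) if f(u) * f(v) < 0)
print(sign_changes)   # 3 real roots, hence 5 - 3 = 2 nonreal roots
```

The derivative \(f'(x)=5x^4-4\) has only two real zeros, so a quintic like this has at most three real roots; the grid count confirms all three are present.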

This also works fine when \(p=2\) or \(3\). The case \(p=2\) is nothing but working with a quadratic polynomial. When \(f(X)\) is irreducible of degree \(3\) with two nonreal roots, let the roots be \(a+bi,a-bi,c\) where \(b \ne 0\) and \(c\) is real. We see \[\sqrt\Delta=2bi(a-c-bi)(c-a-bi)=-2bi[(c-a)^2+b^2] \not \in \mathbb{Q},\] since it is purely imaginary and nonzero. Therefore the Galois group is \(\mathfrak{S}_3\).

It would be way too ambitious to restrict ourselves to a single pair of nonreal roots. Also, it seems we have ignored the alternating group \(\mathfrak{A}_p\) for no reason. Oz Ben-Shimol gave us a nice way to work around this (see arXiv:0709.2868). The whole paper is not easy, but the result is pretty beautiful and generalises what we said above for \(p \ge 5\).

Proposition 2.Let \(f \in \mathbb{Q}[X]\) be an irreducible polynomial of prime degree \(p \ge 5\). Suppose that \(f\) has \(k>0\) pairs of nonreal roots. If \(p \ge 4k+1\), then the Galois group \(G\) is isomorphic to \(\mathfrak{A}_p\) or \(\mathfrak{S}_p\). If \(k\) is odd then \(G \cong \mathfrak{S}_p\).

The proof is done by showing that \(\mathfrak{A}_p \subset G \subset \mathfrak{S}_p\). As the index of \(\mathfrak{A}_p\) in \(\mathfrak{S}_p\) is \(2\), \(G\) can only be one of the two. The solvability of \(G\) also plays a role there.

Indeed, what we have proved in "the simplest case" is nothing but \(k=1\): when \(p \ge 5\) we clearly have \(p \ge 1+4 \times 1\). This refines the result of A. Bialostocki and T. Shaska (see arXiv:math/0601397), where the inequality used to be \[p \ge k(k\log k+2\log k+3).\] When \(k\) is big enough, we have \(k(k\log{k}+2\log{k}+3) \ge 4k+1\); Oz Ben-Shimol's result is a refinement because it says \(p\) does not need to be that big. He also offered a refined algorithm to compute the Galois group, part of which is shown below. Also, computing \(4k+1\) is much easier than computing \(k^2\log{k}\) plus something.

(Input of the algorithm: an irreducible polynomial \(f(X)\) over \(\mathbb{Q}\) with prime degree \(p \ge 5\); the remaining steps are omitted here.)

Here, \(\Delta(f)\) is the discriminant of \(f\). We have seen that whether \(\Delta\) is a perfect square matters a lot. The discussion of `ReductionMethod` can be found in Oz Ben-Shimol's paper.

Let \(k\) be an arbitrary field and suppose \(f(X) \in k[X]\) is separable, i.e., \(f\) has no multiple roots in an algebraic closure, and of degree \(\ge 1\). Let \[f(X)=(X-x_1)\cdots(X-x_n)\] be its factorisation in a splitting field \(L\). Put \(G=G(L/k)\). We say that \(G\) is the Galois group of \(f\) over \(k\). Let \(x_i\) be a root of \(f\) and pick any \(\sigma \in G\). By definition of the Galois group, we see \(\sigma(x_i)\) is still a root of \(f\) (consider the map \(\tilde\sigma:L[X] \to L[X]\) induced by \(\sigma\) naturally; it is the identity when restricted to \(k[X]\)). This is to say, elements of \(G\) permute the roots of \(f\).

For example, consider \(L=\mathbb{C}\), \(k=\mathbb{R}\), \(f(X)=X^2+1\). The Galois group \(G\) contains two elements and is generated by complex conjugation \(\sigma:a+bi \mapsto a-bi\). A root of \(f\) is \(i\), and \(\sigma(i)=-i\) is another root.

Based on this fact, we can consider \(G\) as a subgroup of \(\mathfrak{S}_n\), where \(n\) is the degree of \(f\). The structure of \(\mathfrak{S}_n\) can be extremely complicated, but for now we assume that it is well-known. The question is, which subgroup of \(\mathfrak{S}_n\) is \(G\)? Let's take a look at the case \(n=3\).

To begin with, we note that we can assume that the quadratic term is \(0\). Let \(f(X)=X^3+aX^2+bX+c\) be a polynomial; then \[\begin{aligned}f\left(X-\frac{a}{3}\right) &= \left( X-\frac{a}{3}\right)^3+a\left( X-\frac{a}{3}\right)^2+b\left( X-\frac{a}{3}\right)+c \\&= X^3-aX^2+\frac{a^2}{3}X-\frac{a^3}{27} + aX^2-\frac{2a^2}{3}X+\frac{a^3}{9}+\cdots\end{aligned}\] and as a result \(aX^2\) is cancelled. A translation does not change any property of a polynomial except the values of its roots. Therefore we can reduce our study to polynomials in the depressed form \[f(X)=X^3+aX+b.\] In fact, for all \(g(X)=X^n+a_{n-1}X^{n-1}+\dots+a_0\), we can cancel out \(a_{n-1}X^{n-1}\) by the substitution \(Y=X-\frac{a_{n-1}}{n}\).
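A quick Python sketch (our own; exact arithmetic via `fractions`) confirming the substitution, using the standard expansion \(f(X-\frac{a}{3})=X^3+pX+q\) with \(p=b-\frac{a^2}{3}\) and \(q=\frac{2a^3}{27}-\frac{ab}{3}+c\):

```python
from fractions import Fraction as Fr

def depress(a, b, c):
    """Coefficients (p, q) with f(X - a/3) = X^3 + pX + q,
    where f(X) = X^3 + aX^2 + bX + c."""
    p = b - a * a / 3
    q = 2 * a**3 / 27 - a * b / 3 + c
    return p, q

a, b, c = Fr(6), Fr(-1), Fr(4)
p, q = depress(a, b, c)

def f(x): return x**3 + a*x**2 + b*x + c
def g(x): return x**3 + p*x + q

# The substitution X -> X - a/3 kills the quadratic term exactly:
assert all(f(x - a / 3) == g(x) for x in map(Fr, range(-5, 6)))
print(p, q)
```

Two cubics of degree \(3\) agreeing at eleven points are equal, so the pointwise check above really proves the identity for this \((a,b,c)\).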

Now back to our main story. First of all we study irreducibility. If \(f\) is irreducible, then clearly it has no root in \(K\). On the other hand, if \(f\) has no root in \(K\), does that mean \(f\) is irreducible over \(K\)? This does not hold in general for all polynomials. For example, the polynomial \(g(X)=(X^2+1)^2\) is not irreducible yet has no root in \(\mathbb{R}\) or \(\mathbb{Q}\). But fortunately, \(3\) is a beautiful number and we can proceed. Were \(f\) reducible, there would be a factorisation \[f(X)=p_1(X)p_2(X)\] with each \(p_i(X)\) a proper factor of \(f(X)\). Since \(\deg f = 3\), at least one of the \(p_i(X)\) has degree \(1\), giving a root of \(f\) in \(K\). A contradiction. We therefore have the following result:

Proposition 1.Let \(f(X)\) be a cubic polynomial in \(K[X]\) where \(\operatorname{char}K \ne 2,3\). Then \(f\) is irreducible over \(K\) if and only if \(f\) has no root in \(K\).

Notation being as above, we assume that \(f\) is irreducible. Let \(L\) be the splitting field of \(f\). We claim that \(f\) is separable. Before proving the claim, one should notice that the characteristic matters a lot. For example, \(X^3-2\) is irreducible over \(\mathbb{Q}\), but \(X^3-2=(X+1)^3\) in \(\mathbf{F}_3[X]\), giving a triple root.

\(f\) is separable if and only if \(\gcd(f,f')=1\). The derivative of \(f\), computed from the depressed form, is \[f'(X)=3X^2+a.\] It does not degenerate to the constant \(a\) because the characteristic of \(K\) is not \(3\). We will show carefully that \(f(X)\) is separable by working on these two polynomials.

The first question is the values of \(a\) and \(b\). If one of them is \(0\), things may be easier or harder. Note first we must have \(b \ne 0\), because otherwise \(f(X)=X(X^2+a)\) is not irreducible. If \(a=0\), then \(f(X)=X^3+b\) and \(f'(X)=3X^2 \ne 0\) because \(\operatorname{char}K \ne 3\). It follows that \(\gcd(f,f')=1\): a nontrivial common divisor would be divisible by \(X\), but \(X\) does not divide \(X^3+b\) since \(b \ne 0\).

Now there only remains the most general case: \(a \ne 0\) and \(b \ne 0\). This is where the Euclidean algorithm kicks in. Recall that for any three polynomials \(p,q,r\) in \(K[X]\), we have \[\gcd(p,q)=\gcd(q,p)=\gcd(q,p+rq).\] This is how the Euclidean algorithm works. Note we can write \[f(X)=\frac{1}{3}Xf'(X)+\underbrace{\frac{2}{3}aX+b}_{r_0(X)}.\] It follows that \(\gcd(f,f')=\gcd(f',r_0)\). We next work on \(f'\) and \(r_0\): \[f'(X)=\frac{9}{2a}X\left(\frac{2}{3}aX+b\right)+\underbrace{\left(-\frac{9b}{2a}X+a\right)}_{r_1(X)}.\] Finally we work on \(r_0\) and \(r_1\): eliminating \(X\) leaves the constant \[r_0(X)+\frac{4a^2}{27b}r_1(X)=b+\frac{4a^3}{27b}=\frac{4a^3+27b^2}{27b}.\] If this constant were \(0\), then \(r_0\) and \(r_1\) would share the root \(-\frac{3b}{2a} \in K\), which would then be a common root of \(f\) and \(f'\), and in particular a root of \(f\) in \(K\), contradicting irreducibility. Hence \(\gcd(f,f')=1\) and \(f\) is separable. Note the fact that the characteristic of \(K\) is not \(2\) or \(3\) is used frequently here; otherwise many of these equations would make no sense.
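The Euclidean algorithm above can be replayed in exact arithmetic. The following Python sketch (our own; polynomials are coefficient lists over \(\mathbb{Q}\), lowest degree first) computes \(\gcd(f,f')\) for the later example \(f(X)=X^3-X-1\) and finds a nonzero constant, confirming separability:

```python
from fractions import Fraction as Fr

def poly_mod(f, g):
    """Remainder of f modulo g; coefficient lists, lowest degree first."""
    f = f[:]
    while True:
        while f and f[-1] == 0:   # strip zero leading terms
            f.pop()
        if len(f) < len(g):
            break
        coef = f[-1] / g[-1]
        shift = len(f) - len(g)
        for i, gi in enumerate(g):
            f[shift + i] -= coef * gi
        f.pop()                   # the leading term is now zero
    return f or [Fr(0)]

def poly_gcd(f, g):
    while any(g):                 # Euclid: gcd(f, g) = gcd(g, f mod g)
        f, g = g, poly_mod(f, g)
    return f

# f(X) = X^3 + aX + b with a = b = -1, i.e. X^3 - X - 1.
a, b = Fr(-1), Fr(-1)
f  = [b, a, Fr(0), Fr(1)]
df = [a, Fr(0), Fr(3)]            # f'(X) = 3X^2 + a
g = poly_gcd(f, df)
print(len(g) == 1 and g[0] != 0)  # True: the gcd is a nonzero constant
```

The intermediate remainders produced here are exactly \(r_0\) and \(r_1\) from the text, up to scalar multiples.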

Where are we now? We wanted to ensure that \(f\) is separable so that working with the Galois group of \(f\) is not that troublesome. And \(f\) is. We now return to the study of the Galois group \(G=G(L/K)\), where \(L\) is the splitting field of \(f\). Let \(\alpha_1\), \(\alpha_2\), \(\alpha_3\) be the roots of \(f\) and pick one of them as \(\alpha\). We see \([K(\alpha):K]=3\).

Since \(G\) permutes three elements, \(G\) has to be a subgroup of \(\mathfrak{S}_3\). Meanwhile \(|G|=[L:K] \ge [K(\alpha):K]=3\), which implies that \(|G|=3\) or \(6\). In the first case, \(G=\mathfrak{A}_3\), the alternating group. In the second case, \(G=\mathfrak{S}_3\), and \(K(\alpha)\) is not normal over \(K\), because there is an irreducible polynomial in \(K[X]\) (namely \(f\)) which has a root in \(K(\alpha)\) but does not split into linear factors in \(K(\alpha)\). This is the definition of a normal extension.

The question now is, when is \(G\) equal to \(\mathfrak{S}_3\) and when to \(\mathfrak{A}_3\)? We get a good chance to review finite group theory. This is answered by the signs of the elements of \(G\). To be precise, \(G=\mathfrak{S}_3\) if and only if \(G\) contains an odd permutation; if not, then \(G=\mathfrak{A}_3\). To work with this, we recall how the sign function works. Put \[\delta=(\alpha_1-\alpha_2)(\alpha_2-\alpha_3)(\alpha_3-\alpha_1).\] For any \(\sigma \in G\), we have \(\sigma(\delta)=\varepsilon(\sigma)\delta\), where \(\varepsilon(\sigma)\) is the sign of \(\sigma\). If we put \(\Delta=\delta^2\), which is the discriminant, we see \(\sigma(\Delta)=\Delta\). Therefore \(\Delta \in L^G=K\). But since \(\sigma(\delta)=\pm\delta\), the sign is not guaranteed, and \(\delta\) is not guaranteed to be in \(K\). This is where we crack the problem.

If \(\delta \in K\), or more precisely, \(\sqrt\Delta \in K\), then \(\sigma(\delta)=\delta\) and it follows that \(\varepsilon(\sigma)=1\) for all \(\sigma \in G\). This can only happen if \(G=\mathfrak{A}_3\).

If \(\sqrt\Delta \not\in K\), then \(\delta\) is not fixed by \(G\). There is some \(\sigma \in G\) such that \(\sigma(\delta)=-\delta\), which is to say that \(\varepsilon(\sigma)=-1\). This can only happen when \(G=\mathfrak{S}_3\).

We have the following conclusion.

Proposition 2.Notation being as above, assume that \(f\) is irreducible. Then the Galois group of \(f\) is \(\mathfrak{S}_3\) if and only if \(\sqrt\Delta \not\in K\); it is \(\mathfrak{A}_3\) if and only if \(\sqrt\Delta \in K\).

A dirty calculation shows that \(\Delta=-4a^3-27b^2\). One can show this using Vieta's formulas. You shouldn't find this strange: in the quadratic case we have \(\Delta=b^2-4ac\), and we did care whether \(\Delta>0\), which amounts to whether \(\sqrt\Delta \in \mathbb{R}\).
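One can sanity-check the formula with a cubic whose roots are known. Below is a Python sketch (our own; the roots \(1,2,-3\) sum to \(0\), so the corresponding cubic is already depressed):

```python
from fractions import Fraction as Fr

# Roots summing to 0 give a depressed cubic X^3 + aX + b via Vieta.
r1, r2, r3 = Fr(1), Fr(2), Fr(-3)
assert r1 + r2 + r3 == 0
a = r1*r2 + r1*r3 + r2*r3          # = -7, so f(X) = X^3 - 7X + 6
b = -r1*r2*r3                      # = 6

delta = (r1 - r2) * (r2 - r3) * (r3 - r1)
assert delta**2 == -4*a**3 - 27*b**2
print(delta**2)                    # Delta = delta^2 = 400
```

Of course this \(f\) is reducible; the check only concerns the discriminant identity, which holds for any depressed cubic.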

Let's conclude this post with a handy but nontrivial example. Consider \[f(X)=X^3-X-1.\] The discriminant is \(-4 \cdot(-1)^3-27 \cdot (-1)^2=-23\). Over \(K=\mathbb{Q}(\sqrt{-23})\) we have \(\sqrt\Delta=\sqrt{-23} \in K\), and therefore the Galois group over this field is \(\mathfrak{A}_3\). However, over the base field \(\mathbb{Q}\) itself, \(\sqrt{-23} \not\in \mathbb{Q}\) and the Galois group is \(\mathfrak{S}_3\).

The method is presented by Artin: we will be actively using Sylow theory. Recall that for a finite group \(G\), if \(p\) is a prime dividing \(|G|\), then \(G\) has a \(p\)-Sylow subgroup; we do not care about the *other* \(p\)-Sylow subgroups here. One also needs to recall that a \(p\)-group \(H\) is always solvable. If \(|H|>1\), then \(H\) admits a nontrivial centre. If \(|H|=p^n\), then there is a sequence of subgroups \[\{e\}=H_0 \subset H_1 \subset \cdots \subset H_n=H\] where each \(H_{i}\) is normal in \(H\) and \(H_{i+1}/H_i\) is cyclic of order \(p\). This is to say, \(|H_i|=p^i\).

On the other hand, we also make use of analysis (which was Gauss's idea). For every real \(a>0\), there is a square root \(\sqrt{a}>0\); in other words, the equation \(X^2-a=0\) has a positive root. On the other hand, every polynomial \(f(X) \in \mathbb{R}[X]\) of odd degree has a root in \(\mathbb{R}\). This is to say, such an \(f(X)\) is *not* irreducible over \(\mathbb{R}\) unless \(\deg f=1\).

Next we take a look at \(\mathbb{C}=\mathbb{R}(i)\), where \(i\) is the imaginary unit, or, algebraically speaking, a root of \(g(X)=X^2+1\). Note, every \(z \in \mathbb{C}\) has a square root. If we write \(z=a+bi\) with \(b \ne 0\), then \[c=\sqrt{\frac{|z|+a}{2}}, \quad d = \frac{b}{|b|}\sqrt\frac{|z|-a}{2}\] gives rise to \((c+di)^2=a+bi\) (the case \(b=0\) is immediate). It follows that every polynomial \(f(X) \in \mathbb{C}[X]\) of degree \(2\) has a root (if this is not obvious, complete the square), hence is *not* irreducible. With this being said, \(\mathbb{C}\) has no extension of degree \(2\): if \([E:\mathbb{C}]=2\), then \(E=\mathbb{C}[X]/(p(X))\) with \(p(X)\) irreducible of degree \(2\), which is absurd already.
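The square-root formula can be verified directly. Here is a Python sketch (our own, not from the original text) testing \((c+di)^2=a+bi\) on a few samples with \(b \ne 0\):

```python
import math

def complex_sqrt(a: float, b: float) -> complex:
    """The square root of z = a + bi given in the text (requires b != 0)."""
    r = math.hypot(a, b)                           # |z|
    c = math.sqrt((r + a) / 2)
    d = math.copysign(1.0, b) * math.sqrt((r - a) / 2)   # b/|b| = sign(b)
    return complex(c, d)

for z in (3 + 4j, -5 + 12j, 1 - 1j, -2 - 7j):
    w = complex_sqrt(z.real, z.imag)
    assert abs(w * w - z) < 1e-12     # (c + di)^2 recovers z
```

For instance \(3+4i\) yields \(2+i\), and \(-5+12i\) yields \(2+3i\).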

We also need a part of the following lemma on field extension. In brief, finite separable extension induces a *minimal* Galois extension.

Lemma.Let \(E/F\) be a finite separable extension. Then \(E\) is contained in an extension \(K\) such that \(K/F\) is Galois. It is minimal in the sense that, in a fixed algebraic closure \(K^\mathrm{a}\) of \(K\), any other Galois extension \(L\) of \(F\) containing \(E\) must contain \(K\) as well. We have the following tower: \[F \subset E \subset K \subset L \subset K^\mathrm{a}.\]

*Proof.* First of all, we can find a finite Galois extension of \(F\) containing \(E\): for example, the composite of the splitting fields of the minimal polynomials of a basis of \(E\) as an \(F\)-vector space. The intersection of all Galois extensions of \(F\) containing \(E\) (inside the fixed algebraic closure) is exactly what we want. \(\square\)

Theorem (Fundamental theorem of algebra).The complex field \(\mathbb{C}\) is algebraically closed.

The following proof focuses on algebra and tries its best to avoid analysis. If you are a fan of analysis, you can dive into complex analysis and use the maximum modulus theorem to study a polynomial. Or, you can study the behaviour of \(\frac{1}{f(z)}\) where \(f\) is a polynomial: if \(f\) had no root, then \(\frac{1}{f}\) would be a bounded entire function, hence a constant by Liouville's theorem.

*Proof.* Let us first make it a problem of Galois theory. Since \(\mathbb{R} \supset \mathbb{Q}\), it is of characteristic \(0\) (hence perfect), and every finite extension of it is separable. In particular, \(\mathbb{C}/\mathbb{R}\) is finite and separable. Let \(L/\mathbb{C}\) be a finite extension. Then \(L/\mathbb{R}\) is still a finite separable extension, since both the class of finite extensions and the class of separable extensions are distinguished.

Applying the lemma above, we can find a finite Galois extension \(K/\mathbb{R}\) containing \(L\). We need to prove that \(K=\mathbb{C}\).

Put \(G=G(K/\mathbb{R})\). We want to show that \(|G|=[K:\mathbb{R}]=[K:\mathbb{C}][\mathbb{C}:\mathbb{R}]=2\), from which our result follows immediately. Let \(H \subset G\) be a \(2\)-Sylow subgroup of \(G\), so that \(|H|=2^n\) and \(|G|=2^nm\) with \(m\) odd. Now we use the Galois correspondence. Put \(F=K^H\). We see \(K/F\) is Galois and \([K:F]=2^n\). It follows that \([F:\mathbb{R}]=m\). We claim that \(m=1\).

Indeed, \(F/\mathbb{R}\) is finite and separable (characteristic \(0\)), hence we may apply the primitive element theorem to obtain \(F=\mathbb{R}(\alpha)\), where \(\alpha\) is the root of an irreducible polynomial in \(\mathbb{R}[X]\) of degree \(m\). But \(m\) is odd, and an irreducible real polynomial of odd degree has degree \(1\); hence \(m=1\).

Therefore \(G=H\) is a \(2\)-group. Since a Galois extension remains Galois under lifting, we see \(K/\mathbb{C}\) is Galois. Let \(G_1=G(K/\mathbb{C}) \subset G\) be its Galois group. We next claim that \(G_1\) is trivial. If not, then, being a \(2\)-group, it has a subgroup \(G_2\) of index \(2\). Put \(F'=K^{G_2}\); then \([F':\mathbb{C}]=(G_1:G_2)=2\). However, \(\mathbb{C}\) has no extension of degree \(2\): such an extension would be generated by a square root of some complex number, and every complex number already has a square root in \(\mathbb{C}\). This contradiction implies that \(G_1\) is trivial and therefore \(K=\mathbb{C}\). \(\square\)

Why do we have to prove that \(K=\mathbb{C}\)? If you didn't get it, let me remind you that a Galois extension is, by definition, an **algebraic** extension which is normal and separable.

Let \(G\) be a finite group and \(R\) a commutative ring. The *group algebra* of \(G\) over \(R\) is denoted by \(R[G]\); it is first of all an algebra over \(R\), with a basis \((e_s)_{s \in G}\) indexed by the elements of \(G\). The product on \(R[G]\) is determined by

\[e_s e_t = e_{st},\quad \forall s,t \in G.\]

With this being said, given \(u=\sum_{s \in G}a_se_s\) and \(v=\sum_{t \in G}b_te_t\), we have

\[uv = \sum_{s \in G}\sum_{t \in G}a_sb_te_{st}.\]

For example, take \(G=C_3=\{1,x,x^2\}\), the cyclic group of three elements. If \(u=a_1e_1+a_xe_x\) and \(v=b_xe_x+b_{x^2}e_{x^2}\), then

\[uv = a_xb_{x^2}e_1+a_1b_xe_x+(a_1b_{x^2}+a_xb_x)e_{x^2}.\]
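The convolution product above is easy to implement. As a small illustration (a sketch of mine, not from the text), the following Python snippet encodes an element of \(R[C_3]\) as a dictionary mapping exponents of \(x\) to coefficients and reproduces the computation of \(uv\):

```python
from itertools import product

def multiply(u, v, n=3):
    """Convolution product in R[C_n]: e_s * e_t = e_{s+t mod n}."""
    w = {}
    for (s, a), (t, b) in product(u.items(), v.items()):
        k = (s + t) % n
        w[k] = w.get(k, 0) + a * b
    return {k: c for k, c in w.items() if c != 0}

# u = a_1 e_1 + a_x e_x and v = b_x e_x + b_{x^2} e_{x^2} with sample coefficients
a1, ax, bx, bx2 = 2, 3, 5, 7
uv = multiply({0: a1, 1: ax}, {1: bx, 2: bx2})
# matches a_x b_{x^2} e_1 + a_1 b_x e_x + (a_1 b_{x^2} + a_x b_x) e_{x^2}
assert uv == {0: ax * bx2, 1: a1 * bx, 2: a1 * bx2 + ax * bx}
```

The same dictionary encoding works for any finite group once the multiplication table is supplied.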

As one will notice, the structure of this algebra is determined by both \(G\) and \(R\), although at this moment we don't know in what way. If we take \(R=\mathbb{C}\), then everything is very *simple*, and a lot of elementary linear algebra can be recovered here. That is part of the mission of this blog post. Before we dive in, we need to look into group algebras in a general setting first. It is not often that group algebras and representation theory are treated together, but let's try it. While the majority of this post is (non-commutative) ring theory and module theory, we encourage the reader to use representation theory as a source of examples. Standalone examples may drive us too far, and we may not have enough space for them.

First of all, we list some very obvious facts that do not even need proof.

\(R[G]\) is a free \(R\)-module of rank \(|G|\).

\(R[G]\) is itself a ring. Since \(R\) is commutative, \(R[G]\) is commutative if and only if \(G\) is abelian.

However, as one may overlook,

Proposition 1. If \(|G|>1\), then \(R[G]\) is **not** a division ring.

*Proof.* Pick \(g \in G\) that is not the identity. Then \(e_1-e_g\) is a zero-divisor, because if we take \(m=|G|\), then \[(e_1-e_g)(e_1+e_g+\cdots+e_{g^{m-1}})=e_1-e_{g^m}=e_1-e_1=0.\]

But in a division ring, there is no zero-divisor. \(\square\)
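The telescoping identity in the proof can be checked numerically; in this minimal sketch (mine), elements of \(\mathbb{Q}[C_3]\) are again dictionaries from exponents to coefficients:

```python
def multiply(u, v, n=3):
    """Convolution product in Q[C_n], dropping zero coefficients."""
    w = {}
    for s, a in u.items():
        for t, b in v.items():
            k = (s + t) % n
            w[k] = w.get(k, 0) + a * b
    return {k: c for k, c in w.items() if c != 0}

u = {0: 1, 1: -1}       # e_1 - e_g with g = x
s = {0: 1, 1: 1, 2: 1}  # e_1 + e_g + e_{g^2}
assert multiply(u, s) == {}   # the product telescopes to zero: e_1 - e_g is a zero-divisor
```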

As a ring, we certainly can consider modules over \(R[G]\), which brings us the following section.

Let \(R\) be a ring (not assumed to be commutative here). An \(R\)-module \(E\) is called **simple** if it is nonzero and has no submodules other than \(0\) and \(E\) itself. This may remind you of irreducible (simple) representations of a group; we will see the connection later. Following the definition, we immediately have a special version of Schur's lemma:

Proposition 2 (Schur's Lemma). Let \(E,F\) be two simple \(R\)-modules. Every nonzero homomorphism \(f:E \to F\) is an isomorphism.

*Proof.* Note \(\ker{f}\) and \(f(E)\) are submodules of \(E\) and \(F\) respectively. Since \(f\) is nonzero and \(E,F\) are simple, we have \(\ker{f}=0\) and \(f(E)=F\), which is to say that \(f\) is an isomorphism. \(\square\)

Corollary 1. If \(E\) is a simple \(R\)-module, then \(\operatorname{End}_R(E)\) is a division ring.

*Proof.* If \(f:E \to E\) is nonzero, then according to Schur's lemma, it has an inverse. \(\square\)

This definitely reminds you of irreducible representations. But just as not every representation is irreducible, not every module is simple. Recall Maschke's theorem in representation theory: *every representation of a finite group over \(\mathbb{C}\) of positive dimension is completely reducible.* For modules, we have a similar statement.

Definition-Proposition 3. Let \(E\) be an \(R\)-module. Then the following three conditions are equivalent:

SS 1. \(E\) is a sum of simple \(R\)-modules.

SS 2. \(E\) is a direct sum of simple \(R\)-modules.

SS 3. For every submodule \(E'\) of \(E\), there is another submodule \(F\) such that \(E = E' \oplus F\), i.e. every submodule is a direct summand.

If \(E\) satisfies the three conditions above, then \(E\) is called **semisimple**. A ring \(R\) is semisimple if it is a semisimple left module over itself.

*Proof.* Assume **SS 1**, say we have \(E=\sum_{i \in I}E_i\). Let \(J\) be a maximal subset of \(I\) such that \(E_0=\sum_{j \in J}E_j\) is a direct sum (such a \(J\) exists by Zorn's lemma). Pick any \(i \in I\). Then \(E_i \cap E_0\) is a submodule of \(E_i\), which is either \(0\) or \(E_i\). If \(E_i \cap E_0 = E_i\), then \(E_i \subset E_0\). If the intersection is \(0\), however, the sum \(E_0 + E_i\) is direct, so \(J \cup\{i\} \supsetneq J\) is a strictly larger subset of \(I\) yielding a direct sum, contradicting the maximality of \(J\). Hence \(E_i \subset E_0\) holds for all \(i \in I\), i.e. \(E_0 = E\), which proves **SS 2**.

Next we assume **SS 2**, so \(E = \bigoplus_{i \in I}E_i\) with each \(E_i\) simple. Pick any submodule \(E' \subset E\). Let \(J\) be a maximal subset of \(I\) such that the sum \(E_0=E'+\sum_{j \in J}E_j\) is direct. In the same manner we see \(E_i \subset E_0\) for all \(i \in I\), hence \(E_0=E\) and \(E = E' \oplus \bigoplus_{j \in J}E_j\), which proves **SS 3**.

Finally we assume **SS 3**. Let \(E_0=\sum_{i \in I}E_i\) be the sum of all simple submodules of \(E\). Then there is a submodule \(F\) of \(E\) such that \(E=E_0 \oplus F\). Assume \(F \ne 0\); then \(F\) has a simple submodule, which contradicts the definition of \(E_0\), since every simple submodule of \(E\) lies in \(E_0\) while \(E_0 \cap F = 0\). Hence \(F=0\) and \(E_0=E\). The reason why a nonzero \(F\) must have a simple submodule is contained in the following lemma. \(\square\)

Lemma 4. Let \(E\) be an \(R\)-module satisfying **SS 3**. Then every nonzero submodule \(F\) of \(E\) has a simple submodule.

*Proof.* It suffices to show that every nonzero cyclic submodule has a simple submodule. Indeed, for any \(F \ne 0\), we may pick a nonzero \(v \in F\); then \(Rv \subset F\).

Let \(L\) be the kernel of the morphism

\[\begin{aligned}R &\to Rv \\a &\mapsto av.\end{aligned}\]

Then \(L\) is a left ideal, which is contained in a maximal left ideal \(M\) of \(R\). It follows that \(Mv\) is a maximal submodule of \(Rv\), because \(M/L\) is a maximal submodule of \(R/L\) and we have the isomorphism

\[R/L \cong Rv.\]

By **SS 3**, we can find a submodule \(M'\) such that

\[E = Mv \oplus M'\]

which gives

\[Rv = E \cap Rv = (Mv \cap Rv) \oplus (M' \cap Rv)=Mv \oplus (M' \cap Rv).\]

We claim that \(M' \cap Rv\) is simple. Pick any proper submodule \(E' \subset M' \cap Rv\). Then \(Mv \oplus E'\) is a submodule of \(Rv\) containing \(Mv\) but not equal to \(Rv\), so it has to be \(Mv\) by the maximality of \(Mv\), i.e. \(E'=0\). Hence \(M' \cap Rv \cong Rv/Mv\) is a simple submodule contained in \(F\). \(\square\)

Proposition 5. Let \(E\) be a semisimple \(R\)-module. Then every nonzero submodule and every quotient module of \(E\) is semisimple.

*Proof.* Write \(E=\bigoplus_{i \in I}E_i\) with each \(E_i\) simple, and pick a nonzero submodule \(F\) of \(E\). Let \(J\) be a maximal subset of \(I\) such that

\[F + \bigoplus_{j \in J}E_j\]

is direct. As in the proof above, the direct sum is actually \(E\). Therefore \(F \cong E/\bigoplus_{j \in J}E_j \cong \bigoplus_{k \in K}E_k\) where \(K = I \setminus J\), hence \(F\) is semisimple. For quotients, write \(E=F \oplus F'\) using **SS 3**; since \(E/F \cong F'\) and \(F'\) is a submodule of \(E\), every quotient module of \(E\) is semisimple as well. \(\square\)

Corollary 6. \(R\) is a semisimple ring if and only if every \(R\)-module is semisimple.

*Proof.* By the universal property of free modules, every \(R\)-module is a quotient of a free \(R\)-module, and a free \(R\)-module is a direct sum of copies of \(R\). Hence if \(R\) is semisimple, every \(R\)-module is a quotient of a semisimple module and is therefore semisimple (proposition 5). Conversely, if every \(R\)-module is semisimple, then \(R\) is semisimple because it is a left module over itself. \(\square\)

Let \(R\) be a ring. We say it is a finite dimensional algebra if it is also a finite dimensional vector space over some field \(K\), with compatible operations. We study the Jacobson radical \(J(R)=\bigcap\{\text{maximal left ideals of }R\}\) in this subsection, which will be used in the next section.

We summarise what we want to prove in the following proposition.

Proposition 7 (Jacobson Radical). Let \(R\) be a ring (not necessarily commutative) and \(J(R)\) be the Jacobson radical of \(R\), then

1. \(J(R)\) is a two-sided ideal containing every nilpotent left ideal of \(R\).

2. For every simple \(R\)-module \(E\) we have \(J(R)E=0\). More precisely, \(J(R)=\{a \in R:aE=0\text{ for every simple }R\text{-module }E\}\).

3. Suppose \(R\) is a finite dimensional algebra (or more generally, \(R\) is Artinian). Then \(R/J(R)\) is semisimple, and if \(I\) is a two-sided ideal such that \(R/I\) is semisimple, then \(J(R) \subset I\). It follows that \(R\) is semisimple if and only if \(J(R)=0\).

4. Under the same assumption, \(J(R)\) is nilpotent.

*Proof.* We first prove 2. Pick any \(a \in R\) that annihilates every simple \(R\)-module. For any maximal left ideal \(M\), \(R/M\) is simple, therefore \(a(R/M)=0\), which implies that \(a \in M\). Therefore \(a \in J(R)\).

Conversely, suppose \(J(R)E \ne 0\) for some simple \(E\). Since \(J(R)E\) is a submodule of \(E\) and \(E\) is simple, we have \(J(R)E=E\). More precisely, there exists some nonzero \(x \in E\) with \(J(R)x \ne 0\), and since \(J(R)x\) is a submodule, \(J(R)x=E\). Therefore there exists \(a \in J(R)\) such that \(ax=x\), i.e. \(a-1\) lies in the annihilator \(\operatorname{Ann}(x)\), which is a proper left ideal (as \(1 \notin \operatorname{Ann}(x)\)) and hence contained in a maximal left ideal \(M\). But we also have \(J(R) \subset M\). Therefore \(a \in M\) and \(a-1 \in M\), which implies that \(1 \in M\), and this is absurd. Hence 2 is proved.

Next we prove 1. By definition \(J(R)\) is a left ideal. Now pick any \(a \in J(R)\) and \(b \in R\). For every simple \(E\) we have \(abE \subset aE = 0\) by 2, hence \(ab \in J(R)\) and \(J(R)\) is a two-sided ideal. Now let \(N\) be a nilpotent left ideal, say \(N^n=0\), and let \(E\) be simple. If \(NE \ne 0\), then \(NE=E\) since \(NE\) is a submodule of \(E\), and therefore \(0=N^nE=N^{n-1}(NE)=\cdots=E\), a contradiction. Hence \(NE=0\), and \(N \subset J(R)\) by 2. Therefore 1 is proved as well.

To prove 3, we first note that \(R\) is Artinian: every strictly descending chain of left ideals \(J_1 \supsetneq J_2 \supsetneq \cdots\) must stop, because the \(K\)-dimensions strictly decrease. It follows that \(J(R)\) is the intersection of finitely many maximal left ideals, for the descending chain

\[M_1 \supset M_1 \cap M_2 \supset M_1 \cap M_2 \cap M_3 \supset \cdots\supset J(R)\]

must become stationary. Therefore we can write \(J(R)=\bigcap_{i=1}^{n}M_i\) for some maximal left ideals \(M_i\) of \(R\). Now consider the map

\[\begin{aligned}\phi:R/J(R) &\to R/M_1 \oplus R/M_2 \oplus \cdots \oplus R/M_n \\ x+J(R) &\mapsto (x+M_1,x+M_2,\dots,x+M_n).\end{aligned}\]

Since \(J(R)=\bigcap_{i=1}^{n}M_i\), the map \(\phi\) is well defined and injective. Each \(R/M_i\) is a simple module, so \(R/J(R)\) embeds into a semisimple module and is therefore itself semisimple as an \(R\)-module (proposition 5). Since \(J(R)\) acts as zero on it, \(R/J(R)\) is also semisimple as a module over the ring \(R/J(R)\). We are done.

Now suppose \(I\) is a two-sided ideal such that \(R/I\) is semisimple. By definition we can write

\[R/I=\bigoplus_{j \in J}L_j\]

for some simple \(L_j\). Pick any \(a \in J(R)\), we have \(aL_j=0\) for all \(j\), therefore \(a(R/I)=0\), which implies that \(a \in I\), i.e. \(J(R) \subset I\). (In fact, according to the structure theorem of semisimple ring, \(J\) is finite.)

If \(J(R)=0\), then \(R/J(R)=R\) is semisimple. Conversely, if \(R\) is semisimple, then \(I=0\) is a two-sided ideal such that \(R/I\) is semisimple, hence \(J(R) \subset I=0\).

To prove 4, put \(N=J(R)\) and consider the descending chain \(N \supset N^2 \supset N^3 \supset \cdots\). Since \(R\) is Artinian, the chain becomes stationary; let \(N^\infty\) be the ideal at which it stops shrinking, so that \(NN^\infty=N^\infty\). By Nakayama's lemma (applicable since \(N^\infty\) is finitely generated, being a finite dimensional \(K\)-vector space), \(N^\infty=0\). \(\square\)
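As a concrete illustration of 1 and 4 (an example of mine, not from the text): in the ring \(R\) of upper triangular \(2\times 2\) matrices over a field, the strictly upper triangular matrices form a two-sided ideal \(N\) with \(N^2=0\), hence \(N \subset J(R)\). A quick numerical check:

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_upper():
    """A random upper triangular 2x2 matrix, an element of R."""
    A = rng.integers(-5, 6, size=(2, 2)).astype(float)
    A[1, 0] = 0.0
    return A

def rand_strict():
    """A random strictly upper triangular matrix, an element of N."""
    return np.array([[0.0, float(rng.integers(-5, 6))], [0.0, 0.0]])

for _ in range(100):
    A, N, M = rand_upper(), rand_strict(), rand_strict()
    assert np.allclose(N @ M, 0)        # the ideal is nilpotent: N * N = 0
    # A N and N A are again strictly upper triangular: N is two-sided in R
    assert (A @ N)[1, 0] == 0 and (A @ N)[0, 0] == 0
    assert (N @ A)[1, 0] == 0 and (N @ A)[1, 1] == 0
```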

Let \(R\) be a commutative ring and \(G\) a finite group. Let \(E\) be an \(R\)-module. We can study the representation

\[\rho: G \to \operatorname{Aut}_{R}E\]

and we can also study the ring homomorphism

\[\lambda:R[G] \to \operatorname{End}_{R}E.\]

We show that they are the same thing. Given \(\lambda\), then for any \(g \in G\), \(\lambda(e_g)\) is an automorphism because \(\lambda(e_g)\lambda(e_{g^{-1}})=\lambda(e_1)=1\). Therefore \(\lambda\) gives rise to representation \(\rho:g \mapsto \lambda(e_g)\).

Conversely, for a representation \(\rho\) and any \(g \in G\), \(\rho(g)\) is automatically an endomorphism, and therefore we have a map

\[\begin{aligned}\lambda:R[G] &\to \operatorname{End}_{R}E \\\sum_{g \in G}a_ge_g &\mapsto \sum_{g \in G}a_g\rho(g).\end{aligned}\]

Therefore, the study of group representations can be transferred into the study of group algebras. For simplicity we call such a module \(E\) together with a representation \(\rho\) a \(G\)-module, which you may have known. *Note such a \(G\)-module can also be considered as a module over \(R[G]\) in the usual sense; conversely, an \(R[G]\)-module is a \(G\)-module.* When the context is clear, we write \(gx\) in place of \(\rho(g)x\).
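To make the correspondence concrete, here is a small numerical sketch (assumptions mine: the one-dimensional representation of \(C_3\) sending the generator \(x\) to \(\omega=e^{2\pi i/3}\)). Extending \(\rho\) linearly gives \(\lambda\), which is multiplicative with respect to the convolution product:

```python
import cmath

omega = cmath.exp(2j * cmath.pi / 3)   # rho(x) = omega, since omega**3 = 1

def lam(u):
    """lambda(sum_k a_k e_{x^k}) = sum_k a_k * rho(x)**k."""
    return sum(a * omega ** k for k, a in u.items())

def multiply(u, v, n=3):
    """Convolution product in C[C_n]."""
    w = {}
    for s, a in u.items():
        for t, b in v.items():
            w[(s + t) % n] = w.get((s + t) % n, 0) + a * b
    return w

u = {0: 1.0, 1: 2.0}
v = {1: 3.0, 2: -1.0}
# lambda is a ring homomorphism: lambda(uv) = lambda(u) lambda(v)
assert abs(lam(multiply(u, v)) - lam(u) * lam(v)) < 1e-12
```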

We generalise Maschke's theorem to an arbitrary field \(K\).

Theorem 8 (Maschke). Let \(G\) be a finite group of order \(n\), and let \(K\) be a field. Then \(K[G]\) is semisimple if and only if the characteristic of \(K\) does not divide \(n\) (the characteristic may also be \(0\)).

In introductory representation theory, we study the case when \(K=\mathbb{R}\) or \(\mathbb{C}\), whose characteristic is definitely \(0\).

*Proof.* Let \(E\) be a \(G\)-module, and let \(F\) be a \(G\)-submodule. We show that \(F\) is a direct summand of \(E\), i.e., there exists some \(E' \subset E\) such that \(E = E' \oplus F\). It is natural to think about the projection \(\pi:E \to F\) where \(\pi(x)=x\) for all \(x \in F\). It is seemingly clear that \(E=\ker\pi \oplus F\) is what we want, but we can't do this: we only know that \(\pi\) is a \(K\)-linear map, but we have no idea if it is a \(K[G]\)-linear map. To work around this problem, we modify the projection into a \(K[G]\)-linear map.

To do this, we *average* \(\pi\) over conjugation. To be precise, we consider the map

\[\varphi:x \mapsto \frac{1}{n}\sum_{g \in G}g^{-1} \circ\pi\circ g(x)\]

This map is \(K[G]\)-linear: for \(h \in G\) we have \(\varphi(hx)=\frac{1}{n}\sum_{g \in G}g^{-1}\pi(ghx)=h\cdot\frac{1}{n}\sum_{g' \in G}g'^{-1}\pi(g'x)=h\varphi(x)\), where \(g'=gh\). We therefore can write \(E=\ker\varphi \oplus F\), because \(\varphi\) is a left inverse of the inclusion \(i:F \to E\). Indeed, for any \(x \in F\), we have

\[\varphi(x)=\frac{1}{n}\sum_{g \in G}g^{-1} \circ g(x)=\frac{1}{n}\sum_{g \in G}x=x.\]

Note, since \(F\) is a \(G\)-submodule, we have \(g(x) \in F\) and therefore \(\pi \circ g(x)=g(x)\). The hypothesis \(\operatorname{char}K \nmid n\) is also used here: if the characteristic divided \(n\), then \(n \cdot 1=0\) in \(K\) and \(\frac{1}{n}\) would not be defined in the first place.

Next we suppose that \(p=\operatorname{char} K\) divides \(n\). Consider the element

\[s=\sum_{g \in G}e_g.\]

Note \(e_gs=s\) for all \(g \in G\), since left multiplication by \(e_g\) only permutes the summands. Therefore \(s^2=(\sum_{g \in G}e_g)s=ns=0\) because \(p \mid n\). Hence \(Ks\) is a nonzero nilpotent two-sided ideal (note \(s\) is central), so \(J(K[G]) \ne 0\), from which it follows that \(K[G]\) is not semisimple according to proposition 7. \(\square\)
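The averaging trick can be watched in action. In this sketch (assumptions mine: \(G=S_3\) permutes the coordinates of \(\mathbb{R}^3\) and \(F\) is the span of \((1,1,1)\)), a non-equivariant projection \(\pi\) onto \(F\) becomes, after averaging, a \(G\)-equivariant one:

```python
import itertools
import numpy as np

n = 3
group = list(itertools.permutations(range(n)))   # S_3, acting by permuting coordinates

def act(g, v):
    """The action: coordinate i of v is sent to slot g[i]."""
    w = np.empty(n)
    for i, gi in enumerate(g):
        w[gi] = v[i]
    return w

def inv(g):
    h = [0] * n
    for i, gi in enumerate(g):
        h[gi] = i
    return tuple(h)

ones = np.ones(n)

def pi(v):
    """A K-linear projection onto F = span{(1,1,1)}; it is NOT equivariant."""
    return v[0] * ones

def phi(v):
    """The averaged projection (1/|G|) sum_g g^{-1} pi(g v); now K[G]-linear."""
    return sum(act(inv(g), pi(act(g, v))) for g in group) / len(group)

v = np.array([1.0, 2.0, 6.0])
assert np.allclose(phi(ones), ones)              # phi restricts to the identity on F
for g in group:                                  # phi commutes with the group action
    assert np.allclose(phi(act(g, v)), act(g, phi(v)))
```

Here the averaged map turns out to be the projection onto the mean, and \(\ker\varphi\) is the invariant complement \(\{v : v_1+v_2+v_3=0\}\).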

In other words, if \(E\) is a finite dimensional representation of a group \(G\) over \(K\), and the characteristic of \(K\) does not divide \(|G|\), then \(E\) is completely reducible. Recall that we also have a matrix decomposition of a matrix representation; but this is not very easy to generalise. To work with it we need a clearer look at semisimple rings.

It would be great if, given a matrix representation, we could decompose it into a block diagonal matrix, with each block a subrepresentation. But it would not be an easy job: we need to know whether the field is algebraically closed, its characteristic, et cetera. Perhaps we would need some Galois theory, but that would take us too far from this post. Anyway, we need to see through the structure to know how to work with it.

In this section we study the structure of a semisimple ring \(R\) in a more detailed way. We say a ring is **simple** if it is semisimple and all of its simple left ideals are isomorphic. A left ideal is called simple if it is simple as a left \(R\)-module.

Theorem 9 (Structure theorem of semisimple rings). Let \(R\) be a semisimple ring. Then the set of isomorphism classes of simple left ideals of \(R\) is finite, say represented by \(L_1,L_2,\dots,L_s\). If \(R_i = \sum_{L \cong L_i}L\) (the sum of all left ideals isomorphic to \(L_i\)), then \(R_i\) is a two-sided ideal and a simple ring. One can write \(R\) as a product\[R=\prod_{i=1}^{s}R_i.\]

Besides, \(R\) admits a Peirce decomposition with respect to these \(R_i\): there are elements \(e_i \in R_i\) such that \[1=e_1+\cdots+e_s.\] The \(e_i\) are idempotent (\(e_i^2=e_i\)) and orthogonal (\(e_ie_j=0\) if \(i \ne j\)). The element \(e_i\) is the multiplicative identity of the ring \(R_i\), and \(R_i=e_iR=Re_i\).

*Proof.* To begin with we first study the behaviour of simple left ideals.

Lemma 10. Let \(L\) be a simple left ideal of \(R\) and \(E\) a simple \(R\)-module. Then \(LE = 0\) unless \(L \cong E\).

*Proof of the lemma.* Since \(LE\) is a submodule of \(E\) and \(E\) is simple, \(LE=0\) or \(LE=E\). If \(LE=E\), then there exists some \(y \in E\) such that \(Ly=E\) (again by the simplicity of \(E\), since \(Ly\) is a submodule). Therefore the map \[a \mapsto ay\]

is surjective. It is injective because the kernel is a submodule of \(L\) and it has to be trivial. \(\blacksquare\)

According to this lemma, \(R_i R_j=0\) whenever \(i \ne j\). This will be frequently used. For the time being we can write \(R=\sum_{i \in I}R_i\) although we don't know whether \(I\) is finite. Firstly we show that \(R_i\) is also a right ideal (since it is a sum of left ideals, it is by default a left ideal):

\[R_i \subset R_i R = R_i R_i \subset R_i \implies R_iR=R_i.\]

Therefore \(R_i\) is also a right ideal for all \(i\). But before we proceed we need to explain the relation above. Since \(R\) contains the unit, we must have \(R_i \subset R_i R\). We have \(R_iR=R_iR_i\) because \(R_iR_j=0\) for all \(i \ne j\) and \(R\) is a sum of all \(R_j\) over \(j \in I\). Therefore other terms are eliminated. Finally, we have \(R_iR_i \subset R_i\) simply because \(R_i\) is a left ideal.

Also note that \(R_i \cap \sum_{j \ne i}R_j=0\): any simple submodule of \(R_i\) is isomorphic to \(L_i\), while any simple submodule of \(\sum_{j \ne i}R_j\) is isomorphic to some \(L_j\) with \(j \ne i\), so the intersection has no simple submodule and must vanish. Therefore we can write \(R=\bigoplus_{i \in I}R_i\) for the time being.

Now consider \(1=\sum_{i \in I}e_i\) with \(e_i \in R_i\). This sum is finite (by definition of direct sum, all but finitely many terms vanish). Let \(J \subset I\) be the finite subset such that \(e_j \ne 0\) for all \(j \in J\). It follows that \(R_i=0\) for all \(i \in I \setminus J\), because \(R_i = 1 \cdot R_i = \sum_{j \in J}e_jR_i \subset \sum_{j \in J}R_jR_i = 0\). We can therefore write \(R=\bigoplus_{i=1}^{n}R_i\); all other direct summands are trivial. Since each \(R_i\) represents an isomorphism class of simple left ideals, the set of such classes is finite.

Now we study the relation of \(e_i\), \(R_i\) and \(R\). For any \(a_i \in R_i\), we have

\[a_i=a_i(e_1+\cdots+e_n)=a_ie_i=(e_1+\cdots+e_n)a_i=e_ia_i.\]

Therefore \(e_i\) is the unit in \(R_i\) (it follows automatically that \(e_i^2=e_i\)). For any \(a \in R\), we put \(a_i=ae_i\), then there is a unique decomposition

\[a=a_1+\cdots+a_n.\]

This shows \(R_i=Re_i=e_iR\). We also have \(e_ie_j=0\) if \(i \ne j\), since \(e_ie_j \in R_iR_j=0\). As \(R_iR_j=0\) for \(i \ne j\), we can safely write \(R=\prod_{i=1}^{n}R_i\). Each \(R_i\) is simple because (1) it is semisimple (\(R_i=\sum_{L \cong L_i}L\) and each such \(L\) is also a simple \(R_i\)-module) and (2) all simple left ideals of \(R_i\) are isomorphic. To show (2), assume that \(L \subset R_i\) is a simple left ideal of \(R_i\) not isomorphic to \(L_i\). Since \(L = R_iL = RR_iL = RL\), \(L\) is also a simple left ideal of \(R\), which contradicts the definition of \(R_i\). \(\square\)

Let's extract more information from this theorem. First of all, the decomposition of the unit is finite in every \(R_i\) as well, hence each \(R_i\) is a finite direct sum of simple left ideals. To be precise,

Theorem 11. Every simple ring \(R\) admits a decomposition as a finite direct sum of simple left ideals\[R = \bigoplus_{i=1}^{n}R_i.\]

*Proof.* Since \(R\) is semisimple, it is a sum of simple left ideals, the collection of which can be chosen to be direct. Say we have \(R=\bigoplus_{i \in I}R_i\).

Consider \(1 \in R\):

\[1=\sum_{i \in I}x_i\]

where \(x_i \in R_i\). This sum is finite, say we have \(1=\sum_{i=1}^{n}x_i\) and \(x_i \ne 0\). Then

\[R=R \cdot 1 = \sum_{i=1}^{n}Rx_i \subset \bigoplus_{i=1}^{n}R_i \subset R.\]

This proves our assertion. \(\square\)

Combining theorems 9 and 11, we see

Corollary 12. Every semisimple ring \(R\) admits a decomposition\[R=n_1L_1 \oplus \cdots \oplus n_rL_r\]

where \(n_iL_i\) denotes the direct sum of \(n_i\) copies of the simple left ideal \(L_i\). This direct sum is unique in the following sense: \(L_1,\dots,L_r\) are unique up to isomorphism, and the pairs \((n_i,L_i)\) are unique up to a permutation.

This must remind you of the isotypic decomposition of a representation into irreducible representations. They are the same thing: there one uses the semisimplicity of \(\mathbb{C}[G]\), while here we are talking about the semisimplicity of an arbitrary ring.

We include here an elementary ring theory result that really doesn't need a proof here.

Proposition 13. Let \(R_1, R_2,\cdots, R_n\) be rings with units. The direct product\[R=R_1 \times \cdots \times R_n\]

has the following property: every ideal (left, right or two-sided) of \(R_i\) is an ideal of \(R\); every minimal ideal of \(R_i\) is a minimal ideal of \(R\); and every minimal ideal of \(R\) is a minimal ideal of some \(R_i\).

The proof is quite similar to how we prove that \(R_i\) is simple in our proof of theorem 9. This actually shows that

Corollary 14. If \(R_1,\cdots,R_n\) are semisimple rings, then so is\[R=R_1 \times \cdots \times R_n.\]

We want to work with matrices, i.e., we want to work with linear equations. This becomes possible because of Wedderburn-Artin ring theory. We don't know what can happen yet, so we can only try to generalise things very carefully.

When talking about matrices, we can talk about endomorphisms as well. So our first step is to find a bridge to endomorphisms. We now need to consider \(R\) as a left module over itself.

The most immediate one is multiplication. For \(a \in R\), we may consider the multiplication induced by \(a\):

\[\lambda_a:x \mapsto ax.\]

It may look natural, but unfortunately it is not necessarily an endomorphism of the left \(R\)-module \(R\). The reason is simple: \(\lambda_a(yx)=ayx\) while \(y\lambda_a(x)=yax\), and these differ in general. However, we can define

\[\rho_a:x \mapsto xa.\]

Now \(\rho_a(yx)=y\rho_a(x)\) holds naturally. We can show that every endomorphism is defined in this way. Consider the map \(\rho:a \mapsto (x \mapsto xa)\). We have

\(\rho\) is an anti-homomorphism. Indeed, \(\rho(ab)=\rho(b)\rho(a)\) for all \(a,b \in R\), and \(\rho(a+b)=\rho(a)+\rho(b)\).

\(\rho\) is surjective (as a function, not a homomorphism). For any \(\psi \in \operatorname{End}_R(R)\), we have \(\psi(x)=\psi(x \cdot 1)=x\psi(1)\) for all \(x\). Therefore \(\rho(\psi(1))=\psi\).

\(\rho\) is injective. If \(\rho(a)(x)=xa=0\) for all \(x \in R\), then in particular \(\rho(a)(1)=a=0\).
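The anti-homomorphism property and the left \(R\)-linearity of \(\rho_a\) are easy to sanity-check with \(2\times 2\) integer matrices (a quick sketch of mine):

```python
import numpy as np

rng = np.random.default_rng(1)
A, B, X, Y = (rng.integers(-3, 4, size=(2, 2)) for _ in range(4))

rho = lambda a: (lambda x: x @ a)   # right multiplication rho_a : x -> xa

# rho_a is left R-linear: rho_a(yx) = y rho_a(x)
assert np.array_equal(rho(A)(Y @ X), Y @ rho(A)(X))

# rho is an anti-homomorphism: rho(ab) = rho(b) o rho(a)
assert np.array_equal(rho(A @ B)(X), rho(B)(rho(A)(X)))
```

Both checks reduce to the associativity of matrix multiplication, which is exactly why right multiplication commutes with the left module structure.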

We could call \(\rho\) an *anti-isomorphism*, but that causes headaches. Instead, consider the opposite ring \(R^{op}\), where addition is the same as in \(R\) and the multiplication \(\ast\) is given by

\[a \ast b = ba\]

then we have

Proposition 14. Let \(R\) be a ring. There is a natural isomorphism \(R^{op} \cong \operatorname{End}_R(R)\) given by \(a \mapsto (x \mapsto xa)\).

Note \((R^{op})^{op}=R\) so we may be able to take the opposite to decompose \(\operatorname{End}_R(R)\) and take the opposite again.

Now write \(R=\bigoplus_{i=1}^{r}n_iL_i\) as in corollary 12. We therefore have

\[R^{op} \cong \bigoplus_{i=1}^{r}\operatorname{End}_R(n_iL_i).\]

However, by Schur's lemma, \(D_i=\operatorname{End}_R(L_i)\) is a division ring (we don't necessarily have a field here). Therefore

\[\operatorname{End}_R(n_iL_i) \cong \operatorname{Mat}_{n_i}(D_i).\]

For each \(f \in \operatorname{End}_R(n_kL_k)\), we have a corresponding matrix \((p_ift_j)\):

\[L_k \xrightarrow{t_j}L_k \oplus \cdots \oplus L_k \xrightarrow{f} L_k \oplus \cdots\oplus L_k \xrightarrow{p_i}L_k\]

where \(t_j\) is the inclusion and \(p_i\) is projection. This is to say, the isomorphism is given by

\[f \mapsto (p_ift_j)\]

The verification is a matter of linear algebra and techniques frequently used in this post.

Therefore we have

\[R^{op}\cong \bigoplus_{i=1}^{r}\operatorname{Mat}_{n_i}(D_i).\]

Taking the opposite again we have

\[R=(R^{op})^{op} \cong \bigoplus_{i=1}^{r}\operatorname{Mat}_{n_i}(D_i^{op}).\]

The isomorphism \(\operatorname{Mat}_n(D)^{op} \cong \operatorname{Mat}_n(D^{op})\) is given by the transpose of a matrix. Since the opposite ring of a division ring is still a division ring, we therefore have a decomposition

\[R\cong\bigoplus_{i=1}^{r}\operatorname{Mat}_{n_i}(D_i)\]

where each \(D_i\) is a division ring (we relabel \(D_i^{op}\) as \(D_i\)).

Conversely, rings of the form above are semisimple. This is easy: for \(R=\operatorname{Mat}_n(D)\), the only proper two-sided ideal is \(0\) (see the lemma below), hence \(J(R)\), being a proper two-sided ideal, is trivial, and \(R=R/J(R)\) is semisimple by proposition 7. The general case then follows from corollary 14.

Lemma. Let \(R\) be a ring. All two-sided ideals of \(\operatorname{Mat}_n(R)\) are of the form \(\operatorname{Mat}_n(I)\), where \(I\) is a two-sided ideal of \(R\).

*Proof.* If \(I\) is a two-sided ideal of \(R\), then clearly \(\operatorname{Mat}_n(I)\) is a two-sided ideal of \(\operatorname{Mat}_n(R)\). Conversely, suppose \(J \subset \operatorname{Mat}_n(R)\) is a two-sided ideal; we show that \(J=\operatorname{Mat}_n(I)\) for some two-sided ideal \(I \subset R\). To be precise, put

\[I=\{a \in R:\text{$a$ is the $(1,1)$-th element of $A$ for some $A \in J$}\}.\]

Then \(I\) is a two-sided ideal of \(R\). Let \(E_{ij}\) be the matrix whose \((i,j)\)-th entry is \(1\) and whose other entries are \(0\). For any matrix \(A=(a_{ij})\), we have

\[E_{ij}AE_{k\ell}=a_{jk}E_{i\ell}.\]

Therefore if \(A \in J\), then in particular,

\[E_{1j}AE_{k1}=a_{jk}E_{11} \in J \implies a_{jk} \in I\]

for all \(j,k\). Therefore \(J \subset \operatorname{Mat}_n(I)\). Conversely, for any \(a \in I\), we can find \(A=(a_{ij}) \in J\) such that \(a=a_{11}\). Now \(aE_{i\ell}=E_{i1}AE_{1\ell} \in J\). Since a matrix \(B=(b_{i\ell}) \in \operatorname{Mat}_n(I)\) can be written as \(\sum_{i,\ell}b_{i\ell}E_{i\ell}\) with \(b_{i\ell} \in I\), this proves that \(\operatorname{Mat}_n(I) \subset J\). \(\square\)
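The identity \(E_{ij}AE_{k\ell}=a_{jk}E_{i\ell}\) that drives the proof can be verified mechanically, e.g. for \(3\times 3\) integer matrices:

```python
import numpy as np

n = 3

def E(i, j):
    """The matrix unit E_ij: 1 at position (i, j), 0 elsewhere."""
    M = np.zeros((n, n), dtype=int)
    M[i, j] = 1
    return M

rng = np.random.default_rng(2)
A = rng.integers(-9, 10, size=(n, n))

for i in range(n):
    for j in range(n):
        for k in range(n):
            for l in range(n):
                assert np.array_equal(E(i, j) @ A @ E(k, l), A[j, k] * E(i, l))
```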

It follows that a matrix algebra over a division ring or a field is semisimple. But let's head back to where we were.

The direct sum (or product, because it is finite) of matrix algebras over division rings

\[\operatorname{Mat}_{n_1}(D_1) \oplus \cdots \oplus \operatorname{Mat}_{n_r}(D_r)\]

is therefore semisimple by corollary 14.

To conclude, we have the Wedderburn-Artin theorem.

Theorem 15 (Wedderburn-Artin). \(R\) is a semisimple ring if and only if it can be written as a direct sum (or product, because they are the same when finite) of matrix algebras over some division rings\[R \cong \operatorname{Mat}_{n_1}(D_1) \oplus \cdots \oplus \operatorname{Mat}_{n_r}(D_r).\]

Since the opposite of a division ring is a division ring, we also have

Corollary 16. A ring \(R\) is semisimple if and only if \(R^{op}\) is.

Now back to representation theory. In general this can be extremely hard: we have no idea what the division rings are. However, when the base field is algebraically closed, there is no problem. Note some authors also use *skew field* in place of division ring.

Proposition 17. Let \(K\) be an algebraically closed field and \(D\) a finite dimensional division algebra over \(K\). Then \(D \cong K\).

*Proof.* Pick \(a \in D\) that is not \(0\). The map \(\lambda_a:x \mapsto ax\) is a \(K\)-linear map on the finite dimensional \(K\)-vector space \(D\). Since \(K\) is algebraically closed, \(\lambda_a\) has at least one eigenvalue, say \(\lambda \in K\). It follows that

\[(\lambda{e}-a)x=0\]

for some nonzero \(x\), where \(e\) is the unit of \(D\). Since \(D\) is a division ring, it has no zero divisors, so \(a=\lambda{e}\). The map \(a \mapsto \lambda\) is then an isomorphism, and therefore \(D \cong K\). \(\square\)

If you have studied Banach algebras, you will realise that this is nothing but the Gelfand-Mazur theorem (see any book on functional analysis that discusses Banach algebras, for example *Functional Analysis* by W. Rudin). In the infinite dimensional case we have to consider the topology of the field and the algebra.

Therefore we can now state Maschke's theorem in the finest way possible:

Theorem 18 (Maschke). Let \(G\) be a finite group, and let \(K\) be an algebraically closed field whose characteristic does not divide the order of \(G\). Then\[K[G]\cong\operatorname{Mat}_{n_1}(K) \oplus \cdots \oplus \operatorname{Mat}_{n_r}(K).\]

The \(n_i\) are uniquely determined, and comparing dimensions over \(K\) gives \(n_1^2+\cdots+n_r^2=|G|\).
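For abelian \(G\) every \(n_i\) equals \(1\), and the decomposition is classical Fourier analysis. Here is a hedged numerical illustration (mine, for \(G=C_4\) over \(\mathbb{C}\)): the regular representation is generated by the cyclic shift matrix, which the DFT matrix diagonalizes, exhibiting \(\mathbb{C}[C_4]\cong\mathbb{C}\oplus\mathbb{C}\oplus\mathbb{C}\oplus\mathbb{C}\) and \(1^2+1^2+1^2+1^2=|G|\):

```python
import numpy as np

n = 4
# Regular representation of C_n: the generator acts by the cyclic shift matrix S.
S = np.roll(np.eye(n), 1, axis=0)

# Unitary DFT matrix; its columns are simultaneous eigenvectors of all powers of S.
F = np.array([[np.exp(-2j * np.pi * i * j / n) for j in range(n)]
              for i in range(n)]) / np.sqrt(n)
Finv = F.conj().T

for k in range(n):
    D = Finv @ np.linalg.matrix_power(S, k) @ F
    # every group element acts diagonally, i.e. through n one-dimensional blocks
    assert np.allclose(D, np.diag(np.diag(D)))
```

For non-abelian groups some blocks have \(n_i>1\), and the change of basis is built from matrix coefficients of the irreducible representations instead of the DFT.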

*Algebra*, Revised Third Edition, Serge Lang.

*Abstract Algebra*, Pierre Antoine Grillet.

*Linear Representations of Finite Groups*, Jean-Pierre Serre.

We now study the ring

\[R'=\mathbb{C}[\cos{x},\sin{x}]\]

in a different style.

Again, if we consider the map

\[\begin{aligned}\Phi:\mathbb{C}[X,Y] &\to \mathbb{C}[\cos{x},\sin{x}] \\ f(X,Y) &\mapsto f(\cos{x},\sin{x})\end{aligned}\]

we will see that \(\ker\Phi=(X^2+Y^2-1)\) and therefore

\[\mathbb{C}[\cos{x},\sin{x}] \cong \mathbb{C}[X,Y]/(X^2+Y^2-1).\]

Following the same step as in the previous post, we can show that \(R'=\mathbb{C}[\cos{x},\sin{x}]\) is Dedekind. However, the map

\[\begin{aligned}\Psi:\mathbb{C}[U,V] &\to \mathbb{C}[X,Y]/(X^2+Y^2-1) \\ g(U,V) &\mapsto \overline{g(X+iY,X-iY)}\end{aligned}\]

shows that

(Proposition 1)\[\mathbb{C}[X,Y]/(X^2+Y^2-1) \cong \mathbb{C}[U,V]/(UV-1) \cong \mathbb{C}[T,T^{-1}] \cong \mathbb{C}[T]_T.\]

The localisation of a UFD is a UFD, hence we see that \(\mathbb{C}[\cos{x},\sin{x}]\) is a UFD. There are other ways to see it. For example, we can directly put \(\mathbb{C}[\cos{x},\sin{x}]=\mathbb{C}[e^{ix},e^{-ix}]\), which is even quicker. As another way, since \(\cos{x}=\frac{e^{ix}+e^{-ix}}{2}\) and \(\sin{x} = \frac{e^{ix}-e^{-ix}}{2i}\), every trigonometric polynomial can be written in the form

\[f(\cos{x},\sin{x}) = e^{-inx}P(e^{ix})\]

where \(P(X) \in \mathbb{C}[X]\) and \(n \ge 0\). Conversely, all elements of the form \(e^{-inx}P(e^{ix})\) lie in \(\mathbb{C}[\cos{x},\sin{x}]\), and therefore we have an isomorphism

\[\begin{aligned}\Lambda: \mathbb{C}[T]_{T} &\to \mathbb{C}[\cos{x},\sin{x}], \\ T &\mapsto \cos{x}+i\sin{x}.\end{aligned}\]

Note it follows that \(T^{-1}\) maps to \(\cos{x}-i\sin{x}\).
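As a quick numerical sanity check of this normal form (example mine): \(\cos^2{x} = e^{-2ix}P(e^{ix})\) with \(P(X)=\frac{1}{4}(X^4+2X^2+1)\), i.e. \(n=2\):

```python
import cmath

def P(z):
    """P(X) = (X^4 + 2X^2 + 1) / 4 = ((X^2 + 1) / 2)^2."""
    return (z ** 4 + 2 * z ** 2 + 1) / 4

for k in range(8):
    x = 0.3 + 0.7 * k
    z = cmath.exp(1j * x)
    assert abs(cmath.cos(x) ** 2 - cmath.exp(-2j * x) * P(z)) < 1e-12
```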

Now we return to the identity

\[\sin^2{x}=(1-\cos{x})(1+\cos{x}).\]

In \(\mathbb{R}[\cos{x},\sin{x}]\), the elements \(\sin{x}\), \(1-\cos{x}\), \(1+\cos{x}\) are all irreducible; more precisely, all elements of the form \(a+b\sin{x}+c\cos{x}\) with \((b,c) \ne (0,0)\) are irreducible. The identity above therefore exhibits two essentially different factorisations, so \(\mathbb{R}[\cos{x},\sin{x}]\) is *not* a UFD. In fact, we can also deduce that \(R\) is not a UFD from \(Cl(R) \cong \mathbb{Z}/2\mathbb{Z}\), i.e., the ideal class group is nontrivial (corollary 3.22).

However, since \(R'\) is a UFD, \(\sin^2{x}=(1-\cos{x})(1+\cos{x})\) tells us *nothing*. We need to figure out why and what is going on. To work with it we consider the form \(R'=\mathbb{C}[T,T^{-1}]\). What are irreducible elements in this ring? We will make use of the fact that \(\mathbb{C}\) is algebraically closed (why not!). Since \(T\) and \(T^{-1}\) are units in this ring, we can use them to modify the degree of an element. More precisely, as an application of the fundamental theorem of classical algebra,

\(P(T)=\sum_{j=m}^{n}a_jT^{j}\) with \(m,n \in \mathbb{Z}\) and \(a_m,a_n \ne 0\) (you should be reminded of Laurent series!) is irreducible if and only if \(Q(T)=T^{-m}P(T)\) is irreducible, since \(T^{-m}\) is a unit. But \(Q(T) \in \mathbb{C}[T]\) has nonzero constant term, and such a \(Q\) is irreducible in \(\mathbb{C}[T,T^{-1}]\) if and only if it is of degree \(1\), which is equivalent to saying that \(n-m=1\) in \(P(T)\).

Therefore the irreducible elements are of the form \(aT^m+bT^{m+1}\) where \(a,b \ne 0\). Factoring out the unit \(bT^m\), we obtain a finer result:

(Proposition 2) Irreducible elements of \(R'\) are, up to a unit, of the form\[\cos{x}+i\sin{x}+a, \quad a \in \mathbb{C}^\ast.\]

With this being said, \(\sin{x}\), \(1-\cos{x}\) and \(1+\cos{x}\) are all *not* irreducible in \(R'\). For example, for \(\sin{x}\) we actually have

\[\begin{aligned}\sin{x}&=\frac{1}{2i}(e^{ix}-e^{-ix})\\ & = \frac{1}{2ie^{ix}}(e^{2ix}-1) \\ & = \frac{1}{2ie^{ix}}(e^{ix}+1)(e^{ix}-1) \\ & = \frac{1}{2ie^{ix}}(\cos{x}+i\sin{x}+1)(\cos{x}+i\sin{x}-1)\end{aligned}\]
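As a sanity check, this factorisation can be verified with a computer algebra system; a minimal sympy sketch (sympy is my own choice of tool, not something the post relies on):

```python
import sympy as sp

x = sp.symbols('x', real=True)

# Claimed factorisation of sin(x) in R' = C[cos x, sin x]:
# sin(x) = (cos x + i sin x + 1)(cos x + i sin x - 1) / (2i e^{ix})
lhs = sp.sin(x)
rhs = (sp.cos(x) + sp.I*sp.sin(x) + 1) * (sp.cos(x) + sp.I*sp.sin(x) - 1) \
      / (2*sp.I*sp.exp(sp.I*x))

# Rewrite everything in exponentials and check the difference simplifies to 0
diff = sp.simplify((lhs - rhs).rewrite(sp.exp))
print(diff)
```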

We can find some obvious facts about these two rings. For example, \(R\) is a free \(\mathbb{R}[\cos{x}]\)-algebra with basis \(\{1,\sin{x}\}\) (note every even power \(\sin^{2k}{x}\) can be rewritten in terms of \(\cos{x}\) via the relation \(\sin^2{x}=1-\cos^2{x}\)). Likewise \(R'\) is a free \(\mathbb{C}[\cos{x}]\)-algebra with basis \(\{1,\sin{x}\}\). We can also write \(R'\) as \(R \oplus iR\) or \(R[i]\); that is, \(R'\) is a free \(R\)-algebra with basis \(\{1,i\}\). These observations are quite elementary and do not touch the structure of the rings very much. Now we do touch it, by studying the quotient fields of \(R\) and \(R'\) respectively.

Treating \(R\) as a free \(\mathbb{R}[\cos{x}]\)-algebra, we can write any polynomial \(f(\cos{x},\sin{x})\) as

\[f(\cos{x},\sin{x})=P(\cos{x})+Q(\cos{x})\sin{x}\]

where \(P,Q \in \mathbb{R}[X]\). For simplicity we write \(f=P+Q\sin{x}\). Suppose we now have \(f=P_1+Q_1\sin{x}\) and \(g=P_2+Q_2\sin{x}\) with \(g \ne 0\), then

\[\begin{aligned}\frac{f}{g} &= \frac{P_1+Q_1\sin{x}}{P_2+Q_2\sin{x}} \\ &= \frac{(P_1+Q_1\sin{x})(P_2-Q_2\sin{x})}{(P_2+Q_2\sin{x})(P_2-Q_2\sin{x})} \\ &=\frac{P_1P_2-Q_1Q_2(1-\cos^2{x})+(P_2Q_1-P_1Q_2)\sin{x}}{P_2^2+Q_2^2(1-\cos^{2}{x})}\end{aligned}\]

Therefore every element of \(K(R)\) can be written in the form \(U(\cos{x})+V(\cos{x})\sin{x}\) where \(U,V \in \mathbb{R}(\cos{x})\), the field of rational functions in \(\cos{x}\) over \(\mathbb{R}\). Since \(\sin^2{x} \in \mathbb{R}(\cos{x})\), we obtain:

(Proposition 3) The quotient field of \(R\) is\[K(R)=\mathbb{R}(\cos{x})[\sin{x}].\]

Likewise,

\[K(R')=\mathbb{C}(\cos{x})[\sin{x}]\]

can be proved in exactly the same way.

Since \(R\) is Dedekind, it is integrally closed in \(K(R)\). But what about its relation with \(K(R')\)? For this we have an elegant result:

(Proposition 4)\(R'\) is the integral closure of \(R\) in \(K(R')\).

*Proof.* Let \(C\) be the integral closure of \(R\) in \(K(R')\). Note \(K(R')=K(R)[i]\). For any \(f+ig \in C\), the conjugate \(f-ig\) is also integral over \(R\), hence so are \(f=\frac{(f+ig)+(f-ig)}{2}\) and \(g\). Since \(f,g \in K(R)\) and \(R\) is integrally closed, we get \(f,g \in R\) and hence \(f+ig \in R'\). Therefore \(C \subset R'\). Conversely, any \(f+ig \in R'\) is in \(C\) because \(f,g \in R \subset C\) and \(i\), being integral over \(R\), also lies in \(C\). Therefore \(R' \subset C\). \(\square\)

*We are using the notation that Hartshorne used in his book Algebraic Geometry.*

Put \(f(X,Y)=X^2+Y^2-1\), then \(Y=Z(f)\) is an irreducible affine curve in the affine space \(\mathbb{A}^2_{\mathbb{C}}\). This curve is non-singular everywhere because the matrix

\[\begin{pmatrix}\partial f/\partial X \\\partial f/\partial Y\end{pmatrix} = \begin{pmatrix}2X \\2Y\end{pmatrix}\]

has rank \(1\) at every point of \(Y\) (its only zero is the origin, which does not lie on \(Y\)). The coordinate ring \(A(Y)\) is exactly \(R'\).

Let \(P\) be a point on \(Y\), which, by Hilbert's Nullstellensatz, corresponds to a unique maximal ideal \(\mathfrak{m}_P \subset A(Y)\cong R'\). Since \(R'\) is a PID, and by proposition 2, \(\mathfrak{m}_P=(\cos{x}+i\sin{x}+a)\) where \(a \ne 0\). Hence \(P\) corresponds to a nonzero complex number \(a\).

(Proposition 5) Every point \(P\) on the curve \(Z(X^2+Y^2-1)\) corresponds to a unique nonzero complex number \(a \in \mathbb{C}^\ast\).

Since \(Y\) is nonsingular, it also follows that \(\dim_{\mathbb{C}}\mathfrak{m}/\mathfrak{m}^2=\dim R'=1\) for every maximal ideal \(\mathfrak{m}\) of \(R'\). This is to say, the tangent space is always of dimension \(1\) as a \(\mathbb{C}\)-vector space, or \(2\) as an \(\mathbb{R}\)-vector space. Besides, if we localise at \(\mathfrak{m}_P\), we see \(\mathcal{O}_{P,Y} \cong R'_{\mathfrak{m}_P}\) is always a regular local ring.

- *Introduction to Commutative Algebra*, M. F. Atiyah & I. G. MacDonald.
- *Algebraic Geometry*, Robin Hartshorne.
- *Commutative Ring Theory and Applications*, edited by Marco Fontana, Salah-Eddine Kabbaj and Sylvia Wiegand.

\[\chi:M \to K^\ast.\] By trivial character we mean a character such that \(\chi(M)=\{1\}\). We are particularly interested in the linear independence of characters. Functions \(f_i:M \to K\) are called **linearly independent over \(K\)** if whenever \[a_1f_1+\cdots+a_nf_n=0\] with all \(a_i \in K\), we have \(a_i=0\) for all \(i\). \(\def\Tr\operatorname{Tr}\)

In Fourier analysis we are always interested in functions like \(f(x)=e^{-inx}\) or \(g(x)= e^{-ixt}\), corresponding to Fourier series (integration on \(\mathbb{R}/2\pi\mathbb{Z}\)) and the Fourier transform respectively. Later mathematicians realised that everything can be set on a locally compact abelian (LCA) group. For this reason we need to generalise these functions, and the bounded ones coincide with our definition of characters.

Let \(G\) be an LCA group; then \(\gamma:G \to \mathbb{C}\) is called a *character* if \(|\gamma(x)|=1\) for all \(x \in G\) and \[\gamma(x+y)=\gamma(x)\gamma(y).\] Note since \(G\) is automatically a monoid, this coincides with our earlier definition of character. The set of *continuous* characters forms a group \(\Gamma\), which is called the *dual group* of \(G\).

If \(G=\mathbb{R}\), solving the functional equation \(\gamma(x+y)=\gamma(x)\gamma(y)\) for continuous \(\gamma\) (in whatever way one likes) we obtain \(\gamma(x)=e^{Ax}\) for some \(A \in \mathbb{C}\). But \(|e^{Ax}| \equiv 1\) (or merely being bounded) forces \(A\) to be purely imaginary, say \(A=it\), and then \(\gamma(x)=e^{itx}\). Hence each element of the dual group of \(\mathbb{R}\) can be thought of as (the speed of) a rotation on the unit circle.

With this we have our generalised version of the Fourier transform. Let \(G\) be an LCA group, \(f \in L^1(G)\), then the **Fourier transform** is given by \[\hat{f}(\gamma) = \int_G f(x)\gamma(-x)dx, \quad \gamma \in \Gamma.\] One can verify that \(\hat{f}\) is exactly the Gelfand transform of \(f\), the steps of which are sketched below. On the one hand, one can verify that \(f \mapsto \hat{f}(\gamma)\) is indeed a Banach algebra homomorphism \(L^1(G) \to \mathbb{C}\) for each \(\gamma \in \Gamma\); this is a plain application of Fubini's theorem. On the other hand, let \(h:L^1(G) \to \mathbb{C}\) be any non-trivial Banach algebra homomorphism. One can show that \(\| h \| =1\), hence \(h\) is a bounded linear functional. By the Riesz representation theorem, there is some \(\phi \in L^\infty(G)\) with \(\| \phi\|_\infty = 1\) such that \[h(f) = \int_G f(x)\phi(x)dx.\] We may indeed assume that \(\phi\) is continuous. With \(h\) being an algebra homomorphism, we can see \[\phi(x+y)=\phi(x)\phi(y).\] We know that \(|\phi(x)| \le 1\), but \(\phi(-x)=\phi(x)^{-1}\) forces \(|\phi(x)|=1\). The proof is done after some routine verification of uniqueness.

Indeed, with this identification, we can also identify \(\Gamma\) as the maximal ideal space of \(L^1(G)\), which results in the following interesting characterisation.

If \(G\) is discrete, then \(\Gamma\) is compact; if \(G\) is compact, then \(\Gamma\) is discrete.

*Proof.* If \(G\) is discrete, then \(L^1(G)\) has a unit. The maximal ideal space, which can be identified with \(\Gamma\), is a compact Hausdorff space.

If \(G\) is compact, then its Haar measure can be normalised so that \(m(G)=1\). We prove that the singleton containing the unit alone is an open set. Let \(\gamma \in \Gamma\) be a character \(\ne 1\); then there exists some \(x_0\) such that \(\gamma(x_0) \ne 1\). As a result, \[\int_G \gamma(x)dx = \gamma(x_0)\int_G \gamma(x-x_0)dx = \gamma(x_0)\int_G \gamma(x)dx,\] and hence \(\int_G\gamma(x)dx=0\). If \(\gamma=1\) then \(\int_G \gamma(x)dx=1\).

Besides, the compactness of \(G\) implies the constant function \(f \equiv 1\) is in \(L^1(G)\). As a result, \(\hat{f}(1)=1\) but \(\hat{f}(\gamma)=0\) whenever \(\gamma \ne 1\). Since \(\hat{f}\) is continuous, \(\{\gamma:\hat{f}(\gamma) \ne 0\}=\{1\}\) is open. \(\square\)
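The vanishing \(\int_G \gamma\,dx=0\) for \(\gamma \ne 1\) is easiest to see numerically on a finite group, where the normalised Haar integral is just an average over the group. A quick numpy check on \(G=\mathbb{Z}/n\) (my own illustration, not from the post):

```python
import numpy as np

n = 12
k = np.arange(n)
# The characters of Z/n are gamma_m(x) = exp(2*pi*i*m*x/n), m = 0, ..., n-1;
# the normalised Haar integral on a finite group is the average over the group.
means = [np.exp(2j * np.pi * m * k / n).mean() for m in range(n)]

# The trivial character (m = 0) integrates to 1, every other one to 0
print(abs(means[0]), max(abs(z) for z in means[1:]))
```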

If characters of \(M\) are linearly independent, then they are pairwise distinct, but what about the converse? Dedekind answered this question affirmatively, but his approach was rather complicated: it required determinants. Artin, however, found a neat way to do it:

Theorem (Dedekind-Artin) Let \(M\) be a monoid and \(K\) a field. Let \(\chi_1,\dots,\chi_n\) be distinct characters of \(M\) in \(K\). Then they are linearly independent over \(K\).

*Proof.* Suppose this is false. Let \(N\) be the smallest integer such that \[a_1\chi_1+a_2\chi_2+\cdots+a_N\chi_N = 0\] for distinct \(\chi_i\) with not all \(a_i\) zero (by minimality, in fact all \(a_i \ne 0\)). Since \(\chi_1 \ne \chi_2\), there is some \(z \in M\) such that \(\chi_1(z) \ne \chi_2(z)\). Yet still we have \[a_1\chi_1(zx)+\cdots+a_N\chi_N(zx)=0.\] Since the \(\chi_i\) are characters, for all \(x \in M\) we have \[a_1\chi_1(z)\chi_1(x)+\cdots+a_N\chi_N(z)\chi_N(x)=0.\] We now have a linear system \[\begin{pmatrix}a_1 & a_2 & \cdots & a_N \\a_1\chi_1(z) & a_2\chi_2(z) & \cdots & a_N\chi_N(z)\end{pmatrix}\begin{pmatrix}\chi_1 \\\chi_2 \\\vdots \\\chi_N\end{pmatrix} = \begin{pmatrix}0 \\ 0\end{pmatrix}\] If we perform Gaussian elimination once (divide the second row by \(\chi_1(z)\) and subtract the first), we see \[\begin{pmatrix}a_1 & a_2 & \cdots & a_N \\0 & \left(\frac{\chi_2(z)}{\chi_1(z)}-1\right)a_2 & \cdots & \left(\frac{\chi_N(z)}{\chi_1(z)}-1\right)a_N\end{pmatrix}\begin{pmatrix}\chi_1 \\\chi_2 \\\vdots \\\chi_N\end{pmatrix} = \begin{pmatrix}0 \\ 0\end{pmatrix}\] But this is to say \[\left(\frac{\chi_2(z)}{\chi_1(z)}-1\right)a_2\chi_2 + \cdots + \left(\frac{\chi_N(z)}{\chi_1(z)}-1\right)a_N\chi_N=0.\] Note by assumption \(\frac{\chi_2(z)}{\chi_1(z)}-1 \ne 0\), so this is a nontrivial linear relation among the \(N-1\) distinct characters \(\chi_2,\dots,\chi_N\), contradicting the minimality of \(N\). \(\square\)

As an application, we consider an \(n\)-variable equation:

Let \(\alpha_1,\cdots,\alpha_n\) be distinct non-zero elements of a field \(K\). If \(a_1,\cdots,a_n\) are elements of \(K\) such that for all integers \(v \ge 0\) we have \[a_1\alpha_1^v + \cdots + a_n\alpha_n^v = 0\] then \(a_i=0\) for all \(i\).

*Proof.* Apply the theorem to the \(n\) distinct characters \(\chi_i(v)=\alpha_i^v\) of \(\mathbb{Z}_{\ge 0}\) into \(K^\ast\). \(\square\)
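This corollary is a Vandermonde argument in disguise: the equations for \(v=0,\dots,n-1\) already have an invertible Vandermonde coefficient matrix. A quick numerical illustration (numpy, with my own choice of the \(\alpha_i\)):

```python
import numpy as np

alphas = np.array([2.0, -1.0, 0.5, 3.0])   # distinct, non-zero
n = len(alphas)

# M[v, i] = alphas[i] ** v for v = 0..n-1: a (transposed) Vandermonde matrix
M = np.vander(alphas, n, increasing=True).T

# det(M) = +- prod_{i<j} (alphas[j] - alphas[i]) != 0 for distinct alphas,
# so M @ a = 0 forces a = 0
det = np.linalg.det(M)
print(abs(det) > 1e-9)  # True
```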

The linear independence of characters gives us a good chance of studying the relation of the field extension and the Galois group.

Hilbert's Theorem 90 (Modern Version) Let \(K/k\) be a finite Galois extension with Galois group \(G\); then \(H^1(G,K^\ast)=1\) and \(H^1(G,K)=0\). This is to say, the first cohomology group is trivial, both multiplicatively and additively.

It may look confusing but the classic version is about cyclic extensions (\(K/k\) is cyclic if it is Galois and the Galois group is cyclic).

Hilbert's Theorem 90 (Classic Version, Multiplicative Form) Let \(K/k\) be cyclic of degree \(n\) with Galois group \(G\) generated by \(\sigma\). Then \[\frac{\ker N}{\{\alpha/\sigma(\alpha):\alpha \in K^\ast\}} \cong 1,\] where \(N(\beta)\) is the norm of \(\beta \in K\) over \(k\).

This corresponds to the statement that \(H^1(G,K^\ast)=1\). On the other hand,

Hilbert's Theorem 90 (Classic Version, Additive Form) Let \(K/k\) be cyclic of degree \(n\) with Galois group \(G\) generated by \(\sigma\). Then \[\frac{\ker \Tr}{(1-\sigma){K}} \cong 0,\] where \((1-\sigma)K\) consists of all elements of the form \(\alpha-\sigma(\alpha)\) with \(\alpha \in K\), and \(\Tr(\beta)\) is the trace of \(\beta \in K\) over \(k\).

This corresponds to, of course, the statement that \(H^1(G,K)=0\). Note this indeed asserts an exact sequence \[0 \to k \to K \xrightarrow{1-\sigma} K \xrightarrow{\Tr} k \to 0.\] Before we prove it we recall what group cohomology is. Let \(G\) be a group. We consider the category **\(G\)-mod** of left \(G\)-modules. The set of morphisms between two objects \(A\) and \(B\), for which we write \(\operatorname{Hom}_G(A,B)\), consists of all \(G\)-module homomorphisms from \(A\) to \(B\). The *cohomology groups of \(G\) with coefficients in \(A\)* are the right derived functors of \(\operatorname{Hom}_G(\mathbb{Z},-)\): \[H^\ast (G,A) \cong \operatorname{Ext}^\ast_{\mathbb{Z}[G]}(\mathbb{Z},A).\] It follows that \(H^0(G,A) \cong \operatorname{Hom}_G(\mathbb{Z},A)=A^G=\{a \in A: ga=a \text{ for all } g \in G\}\), the submodule of invariants. In particular, if \(G\) is trivial, then \(\operatorname{Hom}_G(\mathbb{Z},-)\) is exact and therefore \(H^\ast(G,A)=0\) whenever \(\ast \ne 0\). We will see what happens when \(G\) is the Galois group of a Galois extension. If the modern version is beyond your reach, you can refer to the classic version. As a side note, the modern version can also be proved using Shapiro's lemma.
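For a cyclic group the derived functors can be computed from an explicit periodic resolution, which is exactly how the modern version specialises to the two classic forms; a sketch of this textbook computation (see e.g. Weibel):

```latex
% G = <sigma> cyclic of order n, N = 1 + sigma + ... + sigma^{n-1} in Z[G].
% The trivial module Z admits the periodic free resolution
\[
\cdots \longrightarrow \mathbb{Z}[G] \xrightarrow{\ N\ } \mathbb{Z}[G]
\xrightarrow{\ 1-\sigma\ } \mathbb{Z}[G] \xrightarrow{\ \varepsilon\ } \mathbb{Z}
\longrightarrow 0.
\]
% Applying Hom_G(-, A) and taking cohomology gives, for q >= 1,
\[
H^{2q-1}(G,A) \cong \frac{\ker(N\colon A \to A)}{(1-\sigma)A},
\qquad
H^{2q}(G,A) \cong \frac{A^G}{NA}.
\]
% With A = K^*, N is the field norm and 1 - sigma is alpha -> alpha/sigma(alpha);
% with A = K, they are the trace and alpha -> alpha - sigma(alpha). Hence
% H^1(G, K^*) = ker N / {alpha/sigma(alpha)} and H^1(G, K) = ker Tr / (1-sigma)K.
```

This makes the correspondence between the modern and classic statements a mechanical translation rather than a coincidence.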

*Proof of the modern version.* For the multiplicative form, let \(\tau \mapsto \alpha_\tau\) be a \(1\)-cocycle, i.e. \(\alpha_{\sigma\tau}=\alpha_\sigma\cdot\sigma(\alpha_\tau)\). By the linear independence of characters there exists \(\theta \in K\) such that \[\gamma = \sum_{\tau \in G}\alpha_\tau \tau(\theta) \ne 0.\] Then \[\sigma\gamma = \sum_{\tau \in G}\sigma(\alpha_\tau)\sigma\tau(\theta) = \alpha_\sigma^{-1}\sum_{\tau \in G}\alpha_{\sigma\tau}\,\sigma\tau(\theta)=\alpha_\sigma^{-1}\gamma,\] which is to say \(\alpha_\sigma = \gamma/\sigma\gamma\). Replacing \(\gamma\) with \(\gamma^{-1}\) gives what we want: cocycle coincides with coboundary. So much for the multiplicative form.

For the additive form, take \(\theta \in K \setminus \ker\Tr\). Given a \(1\)-cocycle \(\alpha\) in the additive group \(K\), we put \[\beta = \frac{1}{\Tr(\theta)}\sum_{\tau \in G}\alpha_\tau \tau(\theta).\] Since the cocycle satisfies \(\alpha_{\sigma\tau}=\alpha_\sigma+\sigma\alpha_\tau\), we get \[\sigma\beta = \frac{1}{\Tr(\theta)}\sum_{\tau \in G}(\alpha_{\sigma\tau}-\alpha_\sigma)\sigma\tau(\theta) = \beta -\alpha_\sigma,\] which gives \(\alpha_\sigma = \beta-\sigma\beta\). Replacing \(\beta\) with \(-\beta\) gives what we want. \(\square\)

*Additive form.* Pick any element of the form \(\beta-\sigma\beta\); we see \(\Tr(\beta-\sigma\beta)=\sum_{\tau \in G}\tau\beta-\sum_{\tau \in G}\tau\sigma\beta=0\), since \(\tau\sigma\) runs over \(G\) as \(\tau\) does. Hence \((1-\sigma)K \subset \ker\Tr\).

Conversely, assume \(\Tr(\alpha)=0\). By Artin's lemma, the trace function is not trivial, hence there exists some \(\theta \in K\) such that \(\Tr(\theta)\ne 0\), then we take \[\beta = \frac{1}{\Tr(\theta)}[\alpha\theta^\sigma+(\alpha+\sigma\alpha)\theta^{\sigma^2}+\cdots+(\alpha+\sigma\alpha+\cdots+\sigma^{n-2}\alpha)\theta^{\sigma^{n-1}}]\] where for convenience we write \(\sigma\theta=\theta^\sigma\). Therefore \[\beta-\sigma\beta = \frac{1}{\Tr(\theta)}\alpha(\theta+\theta^{\sigma}+\theta^{\sigma^2}+\cdots+\theta^{\sigma^{n-1}})=\alpha\] because other terms are cancelled. \(\square\)

*Multiplicative form.* This can be done in a quite similar setting. For any \(\alpha=\beta/\sigma\beta\), we have \[N(\alpha)=N(\beta)/N(\sigma\beta)=\left(\prod_{\tau \in G}\tau\beta\right)\Big/ \left( \prod_{\tau \in G}\tau\sigma\beta\right)=1.\] Conversely, assume \(N(\alpha)=1\). By Artin's lemma, the following map is not identically zero: \[\Lambda=\operatorname{id}+\alpha\sigma+\alpha^{1+\sigma}\sigma^2+\cdots+\alpha^{1+\sigma+\cdots+\sigma^{n-2}}\sigma^{n-1},\] where for convenience we write \(\sigma\theta=\theta^\sigma\) and \(\sigma\alpha=\alpha^\sigma\). Pick \(\theta\) with \(\beta=\Lambda(\theta) \ne 0\). It follows that \[\begin{aligned}\alpha\beta^\sigma &= \alpha(\theta+\alpha\theta^\sigma+\cdots+\alpha^{1+\sigma+\cdots+\sigma^{n-2}}\theta^{\sigma^{n-1}})^\sigma \\&= \alpha(\theta^\sigma+\alpha^\sigma\theta^{\sigma^2}+\cdots+\underbrace{\alpha^{\sigma+\sigma^2+\cdots+\sigma^{n-1}}\theta^{\sigma^n}}_{=\alpha^{-1}\theta}) \\&= \alpha\theta^\sigma+\alpha^{1+\sigma}\theta^{\sigma^2}+\cdots+\alpha^{1+\sigma+\cdots+\sigma^{n-2}}\theta^{\sigma^{n-1}}+\theta \\&=\beta,\end{aligned}\] i.e. \(\alpha=\beta/\beta^\sigma\), which is exactly what we want. \(\square\)

Consider the extension \(\mathbb{Q}(i)/\mathbb{Q}\). The Galois group \(G=\{1,\tau\}\) is cyclic, generated by the complex conjugation \(\tau\). Now pick any \(a+bi\) with \(N(a+bi)=a^2+b^2=1\), where \(a,b \in \mathbb{Q}\); by Hilbert's Theorem 90 there is some \(r=s+ti \in \mathbb{Q}(i)\) such that \[a+bi = \frac{s+ti}{s-ti}=\frac{s^2-t^2+2sti}{s^2+t^2}= \frac{s^2-t^2}{s^2+t^2}+\frac{2st}{s^2+t^2}i.\] If we put \((x,y,z)=(s^2-t^2,2st,s^2+t^2)\), we actually get a Pythagorean triple (if \(s,t\) are fractions, we can multiply them by a common denominator so that they become integers). Conversely, to any Pythagorean triple \((x,y,z)\) we assign \(\frac{x}{z}+\frac{y}{z}i \in \mathbb{Q}(i)\), an element of norm \(1\). Through this we have found all solutions to \(x^2+y^2=z^2\), i.e.

TheoremIntegers \(x,y,z\) satisfy the Diophantine equation \(x^2+y^2=z^2\) if and only if \((x,y,z)\) is proportional to \((m^2-n^2,2mn,m^2+n^2)\) for some integers \(m,n\).
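The parametrisation is easy to test by machine; a small pure-Python sketch (the function name `triple` is mine):

```python
def triple(m, n):
    """Pythagorean triple attached to the pair (m, n), as in the theorem."""
    return (m*m - n*n, 2*m*n, m*m + n*n)

# Every pair gives a solution of x^2 + y^2 = z^2 ...
for m in range(2, 8):
    for n in range(1, m):
        x, y, z = triple(m, n)
        assert x*x + y*y == z*z

# ... and (x/z) + (y/z)i is an element of norm 1 in Q(i), as in the proof
x, y, z = triple(2, 1)
print((x, y, z))  # (3, 4, 5)
```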

This can be generalised to all Diophantine equations of the form \(x^2+Axy+By^2=Cz^2\) for some nonzero constant \(C\) and constants \(A,B\) such that the discriminant \(A^2-4B\) is square-free. You can find some discussion here.

The additive form is a good friend of "characteristic \(p\)" things. The Artin-Schreier theorem is a good example of \(p\)-to-the-\(p\).

Theorem (Artin-Schreier) Let \(k\) be a field of characteristic \(p\) and \(K/k\) a cyclic extension of degree \(p\). Then \(K=k(\alpha)\) for some \(\alpha \in K\), and \(\alpha\) is a zero of an equation \(X^p-X-a=0\) for some \(a \in k\).

*Proof.* Note the Galois group \(G\) of \(K/k\) is cyclic and \(\Tr(-1)=p\cdot(-1)=0\), so we are able to use the additive form. Letting \(\sigma\) be a generator of \(G\), there exists some \(\alpha \in K\) such that \[\sigma\alpha = \alpha+1.\] Hence \(\sigma(\sigma(\alpha))=\sigma(\alpha+1)=\alpha+1+1\), and by induction we get \[\sigma^i(\alpha) = \alpha+i, \quad i=1,2,\cdots,p,\] so \(\alpha\) has \(p\) conjugates. Therefore \([k(\alpha):k] \ge p\). But in the meantime \[[K:k]=[K:k(\alpha)][k(\alpha):k]=p,\] so we can only have \([K:k(\alpha)]=1\), which is to say \(K=k(\alpha)\). Moreover, \[\sigma(\alpha^p-\alpha)=(\alpha+1)^p-(\alpha+1)=\alpha^p+1^p-\alpha-1 = \alpha^p-\alpha.\] Hence \(\alpha^p - \alpha\) lies in the fixed field of \(\sigma\), which happens to be \(k\). Putting \(a=\alpha^p-\alpha\), our proof is done. \(\square\)

For the case when the characteristic is \(0\), please see here. There is a converse, which deserves a standalone blog post. It says that the polynomial \(f(X)=X^p-X-a\) either has one root in \(k\), in which case all its roots are in \(k\), or it is irreducible, in which case if \(\alpha\) is a root then \(k(\alpha)\) is cyclic of degree \(p\) over \(k\). But I don't know if many people are fans of "characteristic \(p\)" things.
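The dichotomy in the converse can be previewed over the prime field \(\mathbb{F}_p\) itself: since \(\beta^p=\beta\) for all \(\beta \in \mathbb{F}_p\), the polynomial \(X^p-X-a\) has every element of \(\mathbb{F}_p\) as a root when \(a=0\), and no root there when \(a \ne 0\). A quick check in pure Python (my own illustration):

```python
p = 5
roots = {a: [x for x in range(p) if (pow(x, p, p) - x - a) % p == 0]
         for a in range(p)}

# a = 0: X^p - X = prod_{c in F_p} (X - c), so every element is a root;
# a != 0: no root in F_p (and X^p - X - a is in fact irreducible over F_p)
print(roots[0], roots[1])  # [0, 1, 2, 3, 4] []
```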

- Serge Lang, *Algebra, Revised Third Edition*.
- Charles A. Weibel, *An Introduction to Homological Algebra*.
- Noam D. Elkies, *Pythagorean triples and Hilbert's Theorem 90*. (https://abel.math.harvard.edu/~elkies/Misc/hilbert.pdf)
- Jose Capco, *The Two Artin-Schreier Theorems*. (https://www3.risc.jku.at/publications/download/risc_5477/the_two_artin_schreier_theorems__jcapco.pdf)
- Walter Rudin, *Fourier Analysis on Groups*.

In fact, the \(\mathbb{R}^n\) case can be generalised to any locally compact abelian group (see any abstract harmonic analysis book); this is because what really matters here is being locally compact and abelian. But at this moment we stick to Euclidean spaces. Note that since \(\mathbb{R}^n\) is \(\sigma\)-compact, all the Borel measures we consider are regular.

To read this post you need to be familiar with some basic properties of Banach algebras, complex Borel measures, and most importantly, Fubini's theorem.

Let \(M(\mathbb{R}^n)\) denote the space of complex Borel measures on \(\mathbb{R}^n\). The norm on \(M(\mathbb{R}^n)\) is the *total variation*: \[\lVert \mu \rVert = |\mu|(\mathbb{R}^n) = \sup \sum_{i=1}^{\infty}|\mu(E_i)|,\] the supremum being taken over all measurable partitions \((E_i)\) of \(\mathbb{R}^n\). The supremum on the right-hand side is finite because \(\mu\) is assumed to be a complex measure. This norm makes \(M(\mathbb{R}^n)\) a normed space, but we are interested in proving this space is Banach.

Note each measure in \(M(\mathbb{R}^n)\) gives rise to a bounded linear functional \[\begin{aligned}\Phi_\mu:C_0(\mathbb{R}^n) &\to \mathbb{C} \\ f &\mapsto \int_{\mathbb{R}^n}fd\mu.\end{aligned}\] Note we have \(\vert \Phi_\mu(f)\vert = |\int f d\mu| \le \int |f|d|\mu| <\infty\). Indeed the norm of \(\Phi_\mu\) is \(\lVert \mu \rVert\).

Conversely, every bounded linear functional \(\Phi\) gives rise to a regular Borel measure \(\mu\) such that \(\Phi(f)=\int fd\mu\) and \(\lVert \Phi \rVert = \lVert \mu \rVert\), which is ensured by Riesz representation theorem. This is to say \[C_0(\mathbb{R}^n)^\ast \cong M(\mathbb{R}^n)\] in the sense of vector space isomorphism and homeomorphism (in fact, isometry). But it is well known that the dual space of a normed vector space is a Banach space, hence \(M(\mathbb{R}^n)\) is Banach as is expected.

A vector space \(V\) over a field \(\mathbb{F}\) is called an algebra if there is an \(\mathbb{F}\)-bilinear form \[B:V \times V \to V.\] It is a Banach algebra if \(V\) itself is Banach, the bilinear form is associative, i.e. \(B(x,B(y,z)) = B(B(x,y),z)\), and \[\lVert B(x,y) \rVert \le \lVert x \rVert \lVert y \rVert.\] We will show that \(M(\mathbb{R}^n)\) is a Banach algebra by taking \(B(\lambda,\mu)=\lambda \ast \mu\).

The convolution of measures is defined in the style of convolution of functions, in a natural sense. For any Borel set \(E \subset \mathbb{R}^n\), we can consider the set restricted by addition: \[E_2 = \{(x,y):x+y \in E\} \subset \mathbb{R}^{2n}.\] Then we define the convolution of \(\mu,\lambda \in M(\mathbb{R}^{n})\) by the product measure: \[(\mu \ast \lambda)(E) = (\mu \times \lambda)(E_2).\] It looks natural, but we need many routine verifications.

First, we need to show that \(E_2\) is Borel. In fact, we have \[\chi_{E_2}(x,y) = \chi_E(x+y).\] Since \(E\) is Borel, we see \(\chi_E\) is Borel. Meanwhile \(\varphi(x,y)=x+y\) is continuous hence Borel. Therefore \(\chi_{E_2}\) is Borel as well. It follows that \(E_2\) is a Borel set.

Next, is \(\mu \ast \lambda\) an element of \(M(\mathbb{R}^n)\)? For any Borel set \(E\), the value of \((\mu \ast \lambda)(E)\) is defined in \(\mathbb{C}\), so we only need to verify that the definition of a measure is satisfied. It shall be shown that \[(\mu \ast \lambda)\left(\bigcup_{k=1}^{\infty}E^k\right)=\sum_{k=1}^{\infty}(\mu \ast \lambda)(E^k)\] where the \(E^k\) are pairwise disjoint. Since the "measure" of \(E\) is connected to \(E_2\), we first show that if \(E\) and \(F\) are disjoint, then so are \(E_2\) and \(F_2\). Indeed, if \((x,y) \in E_2 \cap F_2\), then \(x+y \in E \cap F\), which is impossible since \(E \cap F = \varnothing\). Hence pairwise disjointness is preserved. Putting \(E= \bigcup_{k=1}^{\infty}E^k\), we also need to show that \(E_2 = \bigcup_{k=1}^{\infty}E_2^k\). If \(x+y \in E\), then it lies in one of the \(E^k\); hence \((x,y) \in E_2 \implies (x,y) \in E_2^k\) for some \(k\). It follows that \(E_2 \subset \bigcup_{k=1}^{\infty}E_2^k\). Conversely, for \((x,y) \in \bigcup_{k=1}^{\infty}E_2^k\), we must have some \(k\) such that \(x+y \in E^k \subset E\), hence \((x,y) \in E_2\), which is to say that \(\bigcup_{k=1}^{\infty}E_2^k \subset E_2\). Therefore \[(\mu \ast \lambda)(E) = (\mu \times \lambda)(E_2) = (\mu \times \lambda)\left( \bigcup_{k=1}^{\infty}E_2^k\right) = \sum_{k=1}^{\infty}(\mu \times \lambda)(E_2^k) = \sum_{k=1}^{\infty}(\mu \ast \lambda)(E^k)\] as is desired.

For any \(f \in C_0(\mathbb{R}^n)\), we have a linear functional \[\Phi:f \mapsto \iint f(x+y)d\mu(x)d\lambda(y) = \int fd(\mu \ast \lambda)\] By Riesz representation theorem, there exists a unique measure \(\nu\) such that \(\Phi(f)=\int fd\nu\), it follows that \(\nu = \mu \ast \lambda\) is uniquely determined. However, we have \[\iint f(x+y)d\mu(x)d\lambda(y) = \iint f(x+y)d\lambda(x)d\mu(y)=\int fd(\lambda \ast \mu)\] It follows that \(\lambda \ast \mu = \nu = \mu \ast \lambda\). This convolution is commutative. Note for complex measures we always have \(|\mu|(\mathbb{R}^n)<\infty\) so Fubini's theorem is always valid.

Next we show that \(\ast\) is associative. This can be carried out by Riesz's theorem. Put \(\nu_1 = \lambda \ast (\mu \ast \gamma)\) and \(\nu_2 = (\lambda \ast \mu) \ast \gamma\). It follows that \[\begin{aligned} \int fd\nu_1 &= \iint f(x+y)d\lambda(x)d(\mu \ast \gamma)(y) \\ &= \iiint f(x+y+z)d\lambda(x)d\mu(y)d\gamma(z) \\ &= \iiint f(x+y+z)d\gamma(z)d\lambda(x)d\mu(y) \\ &= \iint f(x+y)d\gamma(x)d(\lambda \ast \mu)(y) \\ &= \int fd(\gamma \ast (\lambda \ast \mu)) \\ &= \int fd\nu_2,\end{aligned}\] using commutativity in the last step. Hence \(\nu_1 = \nu_2\), which delivers the associativity of the convolution. To show that \(\ast\) makes \(M(\mathbb{R}^n)\) an algebra, we also need the distributive law. This follows from the definition of the product measure, because \[\mu \ast (\lambda_1 + \lambda_2)(E) = \int (\lambda_1 + \lambda_2)(E_{2}^{x})d\mu(x) = \int \lambda_1(E_2^x)d\mu(x) + \int \lambda_2(E_2^x)d\mu(x),\] where \(E_2^x=\{y:x+y \in E\}\) is the \(x\)-section of \(E_2\); this is to say \(\mu \ast \lambda_1 + \mu \ast \lambda_2 = \mu \ast (\lambda_1 + \lambda_2)\). Therefore \(M(\mathbb{R}^n)\) is a complex algebra. It remains to show that the norm is submultiplicative. Let \(E^1, E^2, \cdots\) be a partition of \(\mathbb{R}^n\); we see \[\begin{aligned}\sum_{k=1}^{\infty}|\mu \ast \lambda(E^k)| &= \sum_{k=1}^{\infty}|(\mu \times \lambda)(E^k_2)| \\&= \sum_{k=1}^{\infty} \left|\iint \chi_{E_2^k}d\mu d\lambda\right| \\ &\leq \sum_{k=1}^{\infty} \iint \chi_{E_2^k}d|\mu|d|\lambda| \\&\leq |\mu|(\mathbb{R}^n) \cdot |\lambda|(\mathbb{R}^n) \\&= \lVert \mu \rVert \cdot \lVert \lambda \rVert.\end{aligned}\] Hence \(\lVert \mu \ast \lambda\rVert \le \lVert \mu \rVert \lVert \lambda \rVert\).

To conclude, \(M(\mathbb{R}^n)\) is a commutative Banach algebra. Even better, this algebra has a unit, which is customarily called the **Dirac measure**. Let \(\delta\) be the measure determined by the evaluation functional \(\Lambda:f \mapsto f(0)\). It follows that \[\begin{aligned}\int f d(\delta \ast \mu) &= \iint f(x+y)d\delta(x)d\mu(y) \\ &= \int f(y)d\mu(y). \end{aligned}\] Hence \(\delta \ast \mu = \mu\) for all \(\mu \in M(\mathbb{R}^n)\). Besides, \(\delta\) has norm \(1\) because it attains value \(1\) at every Borel set containing the origin and value \(0\) at every other Borel set.

A measure \(\mu\) is said to be **discrete** if there is a countable set \(E\) such that \(\mu(A)=\mu(A \cap E)\) for all measurable sets \(A\) (in general we say \(\mu\) is *concentrated* on \(E\)). \(\mu\) is said to be **continuous** if \(\mu(A)=0\) whenever \(A\) contains only a single point. We write \(\mu \ll \lambda\), and say \(\mu\) is **absolutely continuous** with respect to \(\lambda\), if \(\lambda(A)=0 \implies \mu(A)=0\).

We now play some games between continuous and discrete measures. First, we study the subspace of discrete measures \(M_d(\mathbb{R}^n)\). For sum, things are quite straightforward. Suppose \(\mu\) is concentrated on \(A\) and \(\lambda\) is concentrated on \(B\), then \[\begin{aligned}\mu(E) + \lambda(E) &= \mu(E \cap A) + \lambda(E \cap B) \\ &= \mu(E \cap (A \cap (A \cup B))) + \lambda(E \cap (B \cap (A \cup B))) \\ &= \mu(E \cap (A \cup B))+ \lambda(E \cap (A \cup B)).\end{aligned}\] Hence \(\mu+\lambda\) is concentrated on \(A \cup B\).

For convolution, things are a little trickier. Suppose \(\mu = \sum_{i=1}^{\infty}a_i\delta_{x_i}\), \(\lambda=\sum_{i=1}^{\infty}b_i\delta_{y_i}\), where the \(x_i\) and \(y_i\) are distinct points, \(\delta_x\) is the Dirac measure concentrated on \(\{x\}\) (hence \(\delta=\delta_0\)), i.e. \(\mu\) is concentrated on \(A=\{x_i\}_{i=1}^{\infty}\) and \(\lambda\) is concentrated on \(\{y_i\}_{i=1}^{\infty}\), we see \[\begin{aligned}(\mu \ast \lambda)(E) &= \iint \chi_E(x+y)d\mu(x)d\lambda(y) \\ &= \int \sum_{i=1}^{\infty}a_i\chi_E(x_i+y)d\lambda(y) \\ &= \sum_{j=1}^{\infty}\sum_{i=1}^{\infty}a_ib_j\chi_E(x_i+y_j) \\ &= \sum_{j=1}^{\infty}\sum_{i=1}^{\infty}a_ib_j\chi_{E \cap (A+B)}(x_i+y_j) \\ &= (\mu \ast \lambda)(E \cap (A+B)).\end{aligned}\] Therefore \(M_d(\mathbb{R}^n)\) forms a subalgebra of \(M(\mathbb{R}^n)\).
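For finitely supported discrete measures the formula above can be implemented directly: represent \(\mu=\sum a_i\delta_{x_i}\) as a dictionary mapping \(x_i\) to \(a_i\). A minimal sketch (all names are mine):

```python
from collections import defaultdict

def convolve(mu, lam):
    """Convolution of two finitely supported discrete measures."""
    out = defaultdict(complex)
    for x, a in mu.items():
        for y, b in lam.items():
            out[x + y] += a * b      # the mass a_i * b_j sits at x_i + y_j
    return dict(out)

delta = {0: 1}                       # Dirac measure at the origin
mu = {1: 0.5, 2: 0.5}
lam = {0: 1, 1: -2}

print(convolve(delta, mu) == mu)     # True: delta is the unit of the algebra
nu = convolve(mu, lam)
# total mass is multiplicative: nu(R) = mu(R) * lam(R)
print(sum(nu.values()), sum(mu.values()) * sum(lam.values()))
```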

Next, we focus on the subspace of continuous measures \(M_c(\mathbb{R}^n)\). To begin with, we first consider the following identity: \[\begin{aligned}(\mu \ast \lambda)(E) &= \iint \chi_E(x+y)d\mu(x)d\lambda(y) \\ &= \iint \chi_{E-y}(x)d\mu(x)d\lambda(y) \\ &= \int \mu(E-y)d\lambda(y).\end{aligned}\] Suppose \(\mu\) is continuous and \(E\) is a singleton, then \(E-y\) is still a singleton and hence \(\mu(E-y)=0\) for all \(y\), hence \((\mu \ast \lambda)(E)=0\), i.e. \(\mu \ast \lambda\) is still continuous. Therefore the subspace of continuous measures actually forms an ideal.

Next, suppose \(\mu \ll m\) and \(m(E)=0\). We see \[(\mu \ast \lambda)(E) = \int \mu(E-y)d\lambda(y) = 0\] because \(m(E)=0\) implies \(m(E-y)=0\) for all \(y\). Hence the subspace of absolutely continuous measures \(M_{ac}(\mathbb{R}^n)\) also forms an ideal.

Finally, we consider the Radon-Nikodym derivatives of absolutely continuous measures (the derivative exists by the Radon-Nikodym theorem and is unique almost everywhere). If \[\mu(E) = \int_E fdm, \quad \lambda(E) = \int_E gdm,\] then \(\mu \ast \lambda\) coincides with \(f \ast g\) in the following sense: \[\begin{aligned}(\mu \ast \lambda)(E) &= \int_{\mathbb{R}^n} \mu(E-t)d\lambda(t) \\ &= \int_{\mathbb{R}^n}\left(\int_{E}f(x-t)dm(x) \right)g(t)dm(t) \\ &= \int_E\int_{\mathbb{R}^n} f(x-t)g(t)dm(t)dm(x) \\ &= \int_E (f \ast g)dm.\end{aligned}\] In other words, we have \(d(\mu \ast \lambda) = (f \ast g)dm\). Through this, we have established an algebra isomorphism \(M_{ac}(\mathbb{R}^n) \cong L^1(\mathbb{R}^n,m)\).
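This correspondence can be sanity-checked numerically: discretise two densities on a grid, convolve, and compare total masses, using \((\mu \ast \lambda)(\mathbb{R}^n)=\mu(\mathbb{R}^n)\lambda(\mathbb{R}^n)\). A numpy sketch (grid and densities are my own choices):

```python
import numpy as np

dx = 0.01
x = np.arange(-10, 10, dx)
f = np.exp(-x**2)            # density of mu (with respect to Lebesgue measure m)
g = np.exp(-np.abs(x))       # density of lambda

h = np.convolve(f, g) * dx   # discretisation of the convolution f * g

# total mass of mu * lambda equals the product of the total masses
print(h.sum() * dx, (f.sum() * dx) * (g.sum() * dx))
```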

\(L^1(\mathbb{R}^n,m)\) could have been a unital Banach algebra, but the unit is missing. However, one can embed it into \(M(\mathbb{R}^n)\): it sits inside the subalgebra \(M_{L^1}(\mathbb{R}^n)\) of all complex Borel measures \(\mu\) of the form \[d\mu = fdm + c\,d\delta, \quad f \in L^1(\mathbb{R}^n,m),\ c \in \mathbb{C}.\] Conversely, by the Lebesgue decomposition theorem, every \(\mu \in M(\mathbb{R}^n)\) has a unique decomposition \[\mu = \mu_a + \mu_s\] where \(\mu_a \ll m\) and \(\mu_s \perp m\). With this being said we have a direct sum \[M(\mathbb{R}^n) = L^1(\mathbb{R}^n,m) \oplus M_s(\mathbb{R}^n)\] where \(M_s(\mathbb{R}^n)\) is the subspace of complex measures singular with respect to \(m\). Informally speaking, the Gelfand transform on \(L^1(\mathbb{R}^n,m)\) can be identified with the Fourier transform. Hence to study the Gelfand transform on \(M(\mathbb{R}^n)\) it suffices to work on \(M_s(\mathbb{R}^n)\). This shows the relation between \(L^1\) and \(C_0\).

Let \(G\) be the group of invertible elements of \(M=M(\mathbb{R})\), and \(G_1\) the component of \(G\) that contains \(\delta\). Then \(G_1\) is an open normal subgroup of \(G\); since \(M\) is commutative, \(G_1=\exp(M)\), and \(G/G_1\) contains no nontrivial element of finite order. We will show that \(G/G_1\) is actually uncountable. Pick \(\alpha \in \mathbb{R}\) and assume \(\delta_\alpha \in G_1\); then \(\delta_\alpha = \exp(\mu_\alpha)\) for some \(\mu_\alpha \in M\). Performing the Fourier transform on both sides gives \[\int e^{-ixt}d\delta_\alpha(x) = e^{-i\alpha t} = \int e^{-ixt}d\exp(\mu_\alpha)(x)=e^{\hat{\mu}_\alpha(t)}.\] Hence \[-i\alpha t = \hat{\mu}_\alpha(t)+2k\pi{i}\] for some integer \(k\). Since \(\mu_\alpha\) is bounded, so is \(\hat{\mu}_\alpha\). Hence \(\alpha=0\). This is to say \(\delta_\alpha \in G_1 \implies \alpha=0\). Next, consider any coset \(\lambda{G_1} \in G/G_1\). If \(\lambda=\delta_\alpha\) for some real \(\alpha\), then \(\delta_\alpha\) is the only Dirac measure in \(\lambda G_1\). If not, then \(\lambda G_1\) contains no Dirac measure. Hence we have obtained an injective (but not surjective) map \[\begin{aligned}\Lambda:\mathbb{R} &\to G/G_1, \\ \alpha &\mapsto \delta_\alpha G_1.\end{aligned}\] This is to say, \(G/G_1\) is uncountable.

To begin with we consider a calculus problem that you may have seen in your exam:

Let \(f\) be a *continuous* function on \([0,\infty)\) such that \(\lim_{x \to \infty} f(x)=l\), and let \(a,b>0\). Prove that \[\int_0^\infty \frac{f(ax)-f(bx)}{x}\mathrm{d}x = (f(0)-l)\ln\frac{b}{a}.\]

And we solve this problem as follows. Put \(g(x)=f(x)-l\), so that \(\lim_{x \to \infty}g(x)=0\). Consider the two-variable function \(F(x,y)=-g'(xy)\) on the region \(D=\{(x,y):x \ge 0, a \le y \le b\}\); we have this result: \[\begin{aligned}\iint_D F(x,y)\mathrm{d}x\mathrm{d}y &= \int_0^\infty\mathrm{d}x\int_a^b -g'(xy)\mathrm{d}y \\ &= \int_0^\infty \frac{g(ax)-g(bx)}{x}\mathrm{d}x \\ &= \int_a^b \mathrm{d}y \int_0^\infty -g'(xy)\mathrm{d}x \\ &= \int_a^b \frac{g(0)}{y}\mathrm{d}y \\ &=g(0)\ln\frac{b}{a}. \\\end{aligned}\] Substituting \(g(x)=f(x)-l\) gives exactly what we want, doesn't it? **Well, the more analysis you learn, the more absurd you will realise this "proof" is.** If you write this in an exam you will get \(0\) marks no matter what. There are two major mistakes:

- Can we change the order of integration? We have no idea. It is certainly not something we can do with ease, and there are counterexamples.
- Is this function *even* differentiable? We also have no idea. It is *almost certain* that \(f\) is not (the probability that \(f\) is differentiable is \(0\)); see this post to learn why if you have some background in functional analysis.

For a good proof, please turn to math.stackexchange. This is not easy at all.
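Even though the interchange of integrals above is illegitimate as a proof, the identity itself can at least be sanity-checked numerically. Here is a minimal sketch (my own, not part of the problem), taking \(f(x)=e^{-x}\) so that \(f(0)=1\), \(l=0\), and the right-hand side becomes \(\ln(b/a)\); the Simpson-rule helper and the truncation of the integral to \([10^{-8}, 60]\) are ad hoc choices (the integrand is bounded near \(0\) and decays exponentially, so the truncation error is negligible):

```python
import math

def integrand(x, a, b):
    # (f(ax) - f(bx)) / x with f(x) = exp(-x), so f(0) = 1 and l = 0
    return (math.exp(-a * x) - math.exp(-b * x)) / x

def simpson(g, lo, hi, n):
    # composite Simpson's rule with n (even) subintervals
    h = (hi - lo) / n
    s = g(lo) + g(hi)
    for i in range(1, n):
        s += g(lo + i * h) * (4 if i % 2 else 2)
    return s * h / 3

a, b = 2.0, 5.0
numeric = simpson(lambda x: integrand(x, a, b), 1e-8, 60.0, 200_000)
exact = (1 - 0) * math.log(b / a)   # (f(0) - l) * ln(b/a)
print(numeric, exact)               # both ≈ ln(5/2) ≈ 0.9163
```

Of course, a numerical coincidence for one choice of \(f\) proves nothing; it only shows the formula is not obviously wrong.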

The problem is, it is really *unfair* that in some circumstances we have to axe out all the properties of differentiation. If you are studying differential equations and a non-differentiable function pops up, you have nowhere to go. Sometimes you have *no idea* whether a function is differentiable at all.

This is why this post is written. We introduce the concept of (Schwartz) **distributions** (a.k.a. **generalised functions**), in which differentiation is significantly extended, giving us a **derivative** in a generalised sense. Roughly speaking, once distributions are introduced, differentiation can be done with absolute ease.

In fact, physicists had been using distributions long before mathematicians established the formal theory. For example, the \(\delta\) *function* introduced by Dirac, which you may have met when studying the Fourier transform: \[\delta(x) = \begin{cases}\infty &\quad x=0, \\ 0&\quad \text{otherwise} .\end{cases}\] And it is required that \[\int_{-\infty}^{\infty}\delta(x)\mathrm{d}x=1.\] But this does not make any sense in calculus. Von Neumann, in his book on quantum mechanics, warned against the theory built on this function, and dismissed it as a "fiction". Not so pleasant. He tried, with a lot of effort, to demonstrate that quantum mechanics could live without such a "fiction". As you can imagine, this function may have created some bad blood between von Neumann and Dirac.

Laurent Schwartz, however, managed to be a peacemaker. He developed the theory of distributions (which is exactly what we are talking about in this post), and the "fiction" became an easy "fact". Years later, he became the 1950 Fields Medalist (one of the most prestigious awards in mathematics) at the age of 35, with the citation

Developed the theory of distributions, a new notion of generalized function motivated by the Dirac delta-function of theoretical physics. (Source)

As you will see later, thanks to Schwartz, the twisted \(\delta\) function is well-defined and is really plain and elegant. So von Neumann didn't need to be angry after all.

By *concept* I mean that I will try to include the basic ideas (without many proofs, though they can be delivered), so that a serious study of the subject becomes simpler (it can be really tough!). Do not expect to be able to solve problems on distributions merely by reading this post.

There will be two parts. Part one focuses on motivation and what is going on. I will try to make it readable to anyone who has finished calculus or, more ideally, undergraduate analysis and linear algebra, though rigour is not always guaranteed. It would be better if you know some differential equation theory, but that's not a must. If you already have the background to read part 2, then part 1 is much easier for you and therefore serves as a good source of intuition and motivation.

If you are still struggling with differentiation in single-variable calculus, then there is no need to struggle with generalised differentiation at such an early point; it does not help. The linear algebra requirements are vector spaces, subspaces and linear maps; you should know that integration and differentiation are linear maps. This is a graduate course topic, and it is not realistic to assume the reader has no idea about calculus and linear algebra.

The second part will be much more advanced, and you are expected to have some background in topological vector spaces (functional analysis). Both parts cannot be considered as a lecture note but they may help you find where you are when you study this concept seriously.

Throughout, we consider real-valued functions on \(\mathbb{R}\). These theories can be generalised to complex-valued functions on \(\mathbb{R}^n\), where partial derivatives take part, but we are not doing that here. At the end of the day, that extra work would not be a big deal.

In calculus, a lot of the functions we study are smooth (for example, \(y=\sin{x}\)), and we write \(C^\infty\) for the space of *infinitely differentiable* functions. This is a vector space, and in this vector space differentiation can be done *with absolute ease*: for given \(f \in C^\infty\), we have \(f',f'',\cdots,f^{(k)}\) well defined for all \(k = 1,2,\cdots\). But in vector spaces like \(C^2\), \(C^1\), or even \(C\), differentiation can only be done with caution: we may only have \(f''\) and no \(f^{(3)}\), or even \(f'\) may not exist. We don't *feel like* this kind of caution. Hence we introduce the concept of **distributions**, also known as **generalised functions**. We want a space where we can still do differentiation with absolute ease. We may need to *modify* our definition of differentiation so that it works on every continuous function (but it shall not lose its meaning within \(C^\infty\)). Bearing these in mind, we have several settings or expectations for distributions:

- Every continuous function should be (considered as) a distribution. (So we can take derivatives of all continuous functions without too many worries, unlike in the calculus problem at the beginning.)
- The "modified differentiation" should make sure that the "modified derivative" of a distribution is still a distribution. In other words, distributions are "infinitely differentiable" (which makes differential equation theory much easier). In the language of algebra, the "modified derivative" should be an endomorphism.
- The usual formal rules of calculus should hold. For example, in the new sense we should still have \((fg)'=f'g+g'f\). (Our modified differentiation should not go too far.)
- Convergence properties should also be available. (Validating this requires more theory, so it can only be mentioned in part 2.)

Let's write our desired distribution as \(\mathscr{D}'\), and all continuous functions \(C\). All \(C,C^\infty,\mathscr{D}'\) are considered as real vector spaces and we should have \[C^\infty \subset C \subset \mathscr{D}'\] in the sense of subspaces.

Here is a breakdown of these concepts. You will see terminologies and definitions later.

- A smooth, continuous or, more generally, locally integrable function gives rise to a bounded linear functional. The converse is not guaranteed to be true, but we *pretend* it is, so all *bounded linear functionals* give rise to distributions, a.k.a. generalised functions (the name is apt precisely because we *pretend* the converse is true). Whenever you are asked what a generalised function is, you can say: it is a linear map, and sometimes it can be determined by an ordinary function.
- For these distributions or generalised functions, we modify the derivative with respect to integration by parts. The modified derivative cannot be written down explicitly, but we don't care, because integration by parts doesn't give us many problems. Whenever you are asked how the derivative of a non-differentiable function is given, you can say: it is given by pretending that nothing goes wrong in integration by parts.

We now try to understand what we really want from distributions. We start our study through integration, **because differentiation does not work**. Given \(f \in C \subset \mathscr{D}'\), we first need to make sure \(\int f\phi\) is well-defined for *some* \(\phi\in C^\infty\), because we want to do integration by parts, which involves **some differentiation**, and we may make use of it.

If \(f\) is not even a continuous function, we still need to consider *some* \(\phi\) in the same manner, or our extension would be abrupt.

Let's talk about these \(\phi\) a little bit, with respect to integration by parts. Consider the bump function \[\phi(x) = \begin{cases} \exp(\frac{1}{(x-a)(x-b)}) & \quad a < x < b, \\ 0 &\quad \text{ otherwise. }\end{cases}\] On \((a,b)\) we have \(\phi(x)>0\). At the boundary points \(a\) and \(b\) we have \(\phi(x)=0\), but that shouldn't be a problem, because they are merely the two endpoints. Points outside \([a,b]\) make no contribution to the value of this function. For an obvious reason we call \([a,b]\) the *closure* of \((a,b)\). In general, given a real-valued function \(f\), we call the closure of the set of points where \(f(x) \ne 0\) the **support** of \(f\). As you can tell, the support of \(\phi\) is \([a,b]\).
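The bump function is easy to play with on a computer. Below is a small sketch (my own illustration, with the arbitrary choice \(a=-1\), \(b=1\)): positive inside the support, exactly zero outside, and vanishing extremely fast near the endpoints, which is what makes the boundary terms in integration by parts disappear:

```python
import math

def bump(x, a=-1.0, b=1.0):
    # the bump function above: exp(1/((x-a)(x-b))) on (a,b), zero elsewhere;
    # note (x-a)(x-b) < 0 on (a,b), so the exponent is negative there
    if a < x < b:
        return math.exp(1.0 / ((x - a) * (x - b)))
    return 0.0

print(bump(0.0))             # exp(-1) ≈ 0.3679, the value at the midpoint
print(bump(0.999))           # still positive, but astronomically small
print(bump(1.0), bump(2.0))  # 0.0 0.0 — no contribution outside [a, b]
```

The smoothness at \(x=a\) and \(x=b\) (all derivatives tend to \(0\)) is the classical fact that makes this \(\phi\) a legitimate \(C^\infty\) function.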

If \(\phi\) has unbounded support (the support of a function \(f\) is the closure of the set of points \(x\) where \(f(x) \ne 0\)), then we may need to discuss limits at infinity, and we don't want improper integrals at all. Hence the support of \(\phi\) is always assumed to be a **closed and bounded** subset of \(\mathbb{R}\); it is closed because it is defined to be a closure. Such closed and bounded sets are called *compact* sets. If you are not familiar with topology, it is OK at this moment to think of a compact set as a bounded closed interval \([a,b]\).

The test function space \(\mathscr{D}\) is defined to be the set of all \(C^\infty\) functions with compact support. This is indeed a vector space, and the verification is a good exercise in both linear algebra and calculus. What about \(\mathscr{D}'\)? Here we demonstrate how things are extended.

For each \(f \in C\) (which contains \(C^\infty\)), we have a functional (a functional is a linear map from a vector space to its base field, here \(\mathbb{R}\) - nothing special, just a different name that mathematicians have used for decades!) \[\begin{aligned} \Lambda_f: \mathscr{D} &\to \mathbb{R}, \\ \phi &\mapsto \int f\phi.\end{aligned}\] This functional is **bounded** for all \(\phi \in \mathscr{D}\), because if \(\phi\) has support \(K\), then \[|\Lambda_f(\phi)|=\left|\int_K f\phi\right| \le \left(\int_K |f| \right)\sup_{x \in K}|\phi|.\] A continuous function on a compact set is always bounded (proof), hence the integral on the right-hand side is always finite. If it were infinite, a lot of problems would follow.

In general, a **bounded linear functional** \(\Lambda:\mathscr{D} \to \mathbb{R}\) is called a *distribution*, and these form exactly \(\mathscr{D}'\). Since every continuous function \(f\) gives rise to a unique bounded functional \(\Lambda_f\), we consider \(C\) as a subspace of \(\mathscr{D}'\). The converse is not generally true, but we *pretend* it is (we pretend every such functional arises from a function anyway), which makes our study easier; hence the name *generalised function* is well-deserved.

The differential operator \(D\) on \(C^\infty\) should be extended into \(\mathscr{D}'\) naturally. There are many ways to extend a linear map. For example, the identity map \(i:\mathbb{R} \to \mathbb{R}\) has at least two ways to be extended to \(\mathbb{R}^2\):

- \(I:\mathbb{R}^2 \to \mathbb{R}^2\) by \((x,y) \mapsto (x,y)\).
- \(\pi:\mathbb{R}^2 \to \mathbb{R}\) by \((x,y) \mapsto x\).

The restrictions of these two maps to \(\mathbb{R}\) (viewed as the \(x\)-axis) both agree with \(i\).

But if we extended \(D\) in several ways at once, things would be messy. Originally the derivative is defined as a limit, but for a non-differentiable function we cannot do that. We need the extension that makes the most sense: the one validated by **integration by parts**. It may seem like we are developing advanced concepts, but still we need to make use of elementary ones.

For \(f(x)=\sin{x}\) and \(\phi \in \mathscr{D}\), we have \[\Lambda_{f'}(\phi)=\int f'\phi = \int \phi\cos{x} = \underbrace{\phi\sin{x}|_{-\infty}^{\infty}}_{\text{zero}} -\int \phi'\sin x=-\Lambda_f(\phi')\] The derivative of \(f\) is passed to the derivative of \(\phi\). Again we are using integration by parts. If \(f\) is not assumed to be differentiable, we *pretend* it is, skip the body and jump to the result immediately. For example, \(f(x)=|x|\) is not differentiable, but we do it anyway: \[\int |x|'\phi = -\int |x|\phi'.\] In general for \(f \in C^\infty\), we have (this can be verified by some computation) \[\Lambda_{D^k f}(\phi)=\int D^k f \phi = (-1)^k \int fD^k\phi = (-1)^k \Lambda _f(D^k\phi).\] Differentiation of distributions (on top of \(C^\infty\) functions) should take the same **shape**, hence we define the \(k\)-th **distributional derivative** of a distribution \(\Lambda\) by \[D^k\Lambda: \phi \mapsto (-1)^{k}\Lambda(D^k\phi).\] Since all \(\phi\) are assumed to be \(C^\infty\), there is no problem with this formula, and this differentiation is defined for all \(\Lambda\). We do not care about the limit of difference quotients of a continuous but non-differentiable function; what matters here is the differentiation of test functions.
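The \(|x|\) example can be checked numerically: the candidate derivative of \(|x|\) is \(\operatorname{sign}(x)\), and the claim is that \(-\int |x|\phi' = \int \operatorname{sign}(x)\phi\) for every test function \(\phi\). Here is a sketch (my own, with an arbitrary test function supported on \([-1,2]\), chosen asymmetric so neither side vanishes; the Simpson helper is ad hoc):

```python
import math

A, B = -1.0, 2.0  # support of the test function

def phi(x):
    # smooth test function supported on [A, B]
    return math.exp(1.0 / ((x - A) * (x - B))) if A < x < B else 0.0

def phi_prime(x):
    # exact derivative of phi, by the chain rule
    if not (A < x < B):
        return 0.0
    u = (x - A) * (x - B)
    return phi(x) * (-(2.0 * x - (A + B)) / (u * u))

def simpson(g, lo, hi, n=20_000):
    # composite Simpson's rule with n (even) subintervals
    h = (hi - lo) / n
    s = g(lo) + g(hi)
    for i in range(1, n):
        s += g(lo + i * h) * (4 if i % 2 else 2)
    return s * h / 3

# distributional derivative of f(x) = |x| applied to phi: -∫ |x| phi'(x) dx
# (split the integral at 0, where |x| has its kink)
lhs = -(simpson(lambda x: abs(x) * phi_prime(x), A, 0.0)
        + simpson(lambda x: abs(x) * phi_prime(x), 0.0, B))
# the sign function integrated against phi
rhs = (simpson(lambda x: -phi(x), A, 0.0)
       + simpson(lambda x: phi(x), 0.0, B))
print(lhs, rhs)  # the two numbers agree: D|x| = sign(x) as distributions
```

One test function is of course not a proof, but it illustrates how the formula "pretends nothing goes wrong in integration by parts".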

Try to recall what you have learnt about integration by parts. We have \[\int uv' = \int (uv)' - \int u'v\] because \[(uv)' = u'v+uv'.\] Therefore, if our generalisation of differentiation (though we do not yet know how to do it) respects integration by parts, then we can still work with the product rule of differentiation, hence the usual formal rules of calculus will not go too far away. If our extension conflicted with integration by parts, the ordinary meaning of differentiation would be damaged.

Let's sum up what has happened. We have obtained inclusions \[C^\infty \subset C \subset \mathscr{D}'.\] Every distribution is infinitely differentiable because functions in \(\mathscr{D}\) are. If \(f \in C^\infty\), then the \(k\)-th derivative can be understood both in the sense of ordinary differentiation and in the sense of distributions, because it is given by \[\phi \mapsto (-1)^k\int f \phi^{(k)} = \int f^{(k)}\phi\quad \forall \phi \in \mathscr{D}. \] Moreover, this determines \(f^{(k)}\): if \(h\) is a continuous function such that \(\int h\phi = \int f^{(k)}\phi\) for all \(\phi \in \mathscr{D}\), then \(h=f^{(k)}\).

If \(f\) is merely continuous, still we can write the \(k\)-th derivative as \[\phi \mapsto (-1)^{k} \int f \phi^{(k)} \quad \forall \phi \in \mathscr{D}.\]

At this point, whether \(f\) is differentiable or not is not of our concern. Since \(\phi\) is smooth, the formula above is well-defined. In general we don't even care whether \(f\) is continuous or even integrable, as long as it gives rise to a **bounded** linear functional, which can be guaranteed by being *locally integrable*. A function is locally integrable if \(\int_K |f|<\infty\) for all compact \(K \subset \mathbb{R}\). In particular, \(K\) can be taken to be any bounded closed interval. **As long as \(f\) is locally integrable (for example, differentiable, continuous, or simply bounded), we can assign derivative in the new sense (integration by parts).**

We want something like \((fg)'=f'g+fg'\). To avoid confusion we use \(D\) to denote the derivative of a distribution and \(f'\) to denote the derivative in the ordinary sense. This is hard in general, but for the product of a \(C^\infty\) function and a distribution it is not. Suppose \(\Lambda \in \mathscr{D}'\) and \(f \in C^\infty\). We define their product by \[(f\Lambda)(\phi) = \Lambda(f\phi).\] This is again a distribution, and its derivative follows in a natural way: \[\begin{aligned} D(f\Lambda)(\phi) &=-(f\Lambda)(\phi') \\ &= -\Lambda(f\phi') \\\end{aligned}\] Meanwhile \[\begin{aligned}(f'\Lambda+fD\Lambda)(\phi) &= \Lambda(f'\phi)+D\Lambda(f\phi) \\ &= \Lambda(f'\phi)-\Lambda(f'\phi+f\phi') \\ &=-\Lambda(f\phi').\end{aligned}\] Hence \(D(f\Lambda)=f'\Lambda+fD\Lambda\): things still work in this respect.

We haven't verified convergence yet, but that requires much more knowledge of functional analysis, so we do that in part 2 rather than here. Fortunately, things will go in an intuitive way.

Consider the linear functional on \(\mathscr{D}\) given by \[\delta(\phi)=\phi(0).\] This is bounded, and it is in fact our rigorous definition of the Dirac \(\delta\) function (von Neumann can relax now!). It does have the *required property*: if we realise this functional as integration (informally) via \[\delta(\phi)=\int \delta\phi=\phi(0) \quad \forall \phi \in \mathscr{D},\] then \(\delta\) can indeed be considered as a *function* whose support is the origin, and whose integral over \(\mathbb{R}\) is \(1\).

The *derivative* of \(\delta\) is well-presented as well. By the definition above, \(\delta'(\phi)=-\delta(\phi')\), hence we have \[\delta'(\phi)=-\phi'(0).\]
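Since a distribution is just a functional, \(\delta\) and its distributional derivative fit into a few lines of code. The sketch below is my own illustration: a distribution is represented as a plain function taking a test function, and \(D\) implements \((D\Lambda)(\phi)=-\Lambda(\phi')\), with \(\phi'\) approximated by a central difference:

```python
import math

def delta(phi):
    # the Dirac distribution: evaluation at the origin
    return phi(0.0)

def D(L, h=1e-6):
    # distributional derivative: (D L)(phi) = -L(phi'),
    # with phi' approximated by a central difference of width h
    def DL(phi):
        return -L(lambda x: (phi(x + h) - phi(x - h)) / (2.0 * h))
    return DL

print(delta(math.cos))     # 1.0, since cos(0) = 1
print(D(delta)(math.sin))  # ≈ -1.0, i.e. -sin'(0), matching the formula above
```

Note the test functions used here (\(\sin\), \(\cos\)) are not compactly supported; that is harmless for \(\delta\), which only ever looks at a neighbourhood of \(0\).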

So much for part 1. If you don't have much background in functional analysis, part 2 is not recommended, as you would have no idea what is going on. It is not feasible to make part 2 readable to a wider audience.

Here we provide some basic facts about test functions and distributions, assuming the reader has some background in functional analysis. No proofs are delivered, because if they were this post could be as long as I want. I hope that by organising the facts here I can help you realise what is going on before you drown yourself in the details of a proof. It is recommended to look at the table of contents on the right-hand side first if you are on a PC.

In brief, test functions are smooth functions with compact support. By the **support** of a function \(f\) we mean the *closure* of the set \(\{x:f(x) \ne 0\}\). Let \(K\) be a compact set in \(\mathbb{R}\); then \(\mathscr{D}_K\) denotes the subspace of \(C^\infty\) consisting of functions whose support lies in \(K\). Since a closed subset of a compact set is itself compact, all functions in \(\mathscr{D}_K\) have compact support.

The test function space is defined by \[\mathscr{D} := \bigcup_{K \text{ compact}}\mathscr{D}_K.\] And the distribution space \(\mathscr{D}'\) is defined to be the dual space of \(\mathscr{D}\), i.e. the space of *continuous* linear functionals on \(\mathscr{D}\). But if we don't know the topology of \(\mathscr{D}\), we cannot proceed. *Here is our first attempt to establish the topology, via norms.*

Consider, for \(\phi \in \mathscr{D}\) and \(N=0,1,2,\cdots\), the norms \[\| \phi \|_N = \sup_{x \in \mathbb{R};\, n \le N}|D^n\phi(x)|.\] These induce a local base \[V_N = \left\{ \phi \in \mathscr{D}:\|\phi\|_N \le \frac{1}{N} \right\} \quad (N=1,2,3,\cdots).\]

And we get a locally convex metrisable topology on \(\mathscr{D}\).

If this topology made \(\mathscr{D}\) a Banach space, it would be fantastic - a lot of Banach space techniques could be used. However, \(\mathscr{D}\) is not complete under this topology. One simply needs to consider the sequence \[\psi_m(x)=\phi(x-1)+\frac{1}{2}\phi(x-2)+\cdots+\frac{1}{m}\phi(x-m)\] where \(\phi \in \mathscr{D}_{[0,1]}\) and \(\phi>0\) on \((0,1)\). This sequence is Cauchy, but the limit does not have bounded support, hence it does not lie in \(\mathscr{D}\).

This time we perform an *enhancement* of the previous topology, which makes \(\mathscr{D}\) a locally convex topological vector space that is complete and has the Heine-Borel property (a set is compact if and only if it is closed and bounded). We still need the topology defined in our first attempt. The construction is broken into three steps:

- For each compact set \(K\), let \(\tau_K\) denote the topology on \(\mathscr{D}_K\) induced by the norms of attempt 1.
- Let \(\beta\) be the collection of all convex balanced set \(W \subset \mathscr{D}\) such that \(\mathscr{D}_K \cap W \in \tau_K\) for all compact \(K\). (A set \(W\) is balanced if \(\alpha{W} \subset W\) for all \(|\alpha| \le 1\).)
- The new topology \(\tau\) is defined to be the collection of all unions of sets of the form \(\phi + W\) with \(\phi \in \mathscr{D}\) and \(W \in \beta\).

This is the topology we want, and one can indeed verify that \(\tau\) is a topology, with local base \(\beta\). This topology has the following properties:

- \(\tau\) makes \(\mathscr{D}\) a locally convex topological vector space.
- \(\mathscr{D}\) has the Heine-Borel property.
- In \(\mathscr{D}\), every Cauchy sequence converges.

Locally, **the topology of \(\mathscr{D}_K\) is the same as \(\tau_K\)**. Hence we can still use the properties of these norms when we want. In fact, \(\tau_K\) makes \(\mathscr{D}_K\) a Fréchet space, i.e. a complete metrisable locally convex space.

We cannot discuss continuity without a topology. But even with one, continuity has to be treated carefully. For example the space \(L^p([0,1])\) with \(0<p<1\) is weird: its dual space is trivial, due to its topology - the only open convex sets are the empty set and the space itself. Fortunately we have the following, which is quite intuitive.

Suppose \(\Lambda\) is a linear mapping of \(\mathscr{D}\) into a locally convex space \(Y\) (which can be \(\mathbb{R}\), \(\mathbb{C}\) or \(\mathscr{D}\) itself). Then the following are equivalent:

- \(\Lambda\) is continuous. (We care about the behaviour of \(\mathscr{D}'\))
- \(\Lambda\) is bounded. (You must have learnt the equivalence of 1 and 2 already)
- \(\phi_i \to 0\) in \(\mathscr{D}\) implies \(\Lambda\phi_i \to 0\) in \(Y\).
- The restriction of \(\Lambda\) to every \(\mathscr{D}_K\) is continuous.

In particular, it follows that the differential operator \(D^n\) is continuous for all \(n\). We also have some knowledge of the behaviour of \(\mathscr{D}'\) now:

If \(\Lambda\) is a linear functional on \(\mathscr{D}\), then the following are equivalent:

- \(\Lambda \in \mathscr{D}'\).
- To every compact set \(K\) there corresponds a nonnegative integer \(N\) and a constant \(C<\infty\) such that the inequality \[|\Lambda\phi| \le C \|\phi\|_N\] holds for every \(\phi \in \mathscr{D}_K\).

Consider the Dirac distribution at \(x\) given by \[\delta_x(\phi)=\phi(x)\quad \phi \in \mathscr{D}.\] This is indeed a distribution. The case \(x=0\) gives us the Dirac function in physics. Since \[\mathscr{D}_K = \bigcap_{x \in K^c}\ker\delta_x,\] \(\mathscr{D}_K\) is a **closed subspace** of \(\mathscr{D}\). Each \(\mathscr{D}_K\) is also nowhere dense, and there is a countable collection of compact sets \(K_i \subset \mathbb{R}\) (for example \(K_i=[-i,i]\)) such that \(\mathscr{D} = \bigcup_i \mathscr{D}_{K_i}\), so \(\mathscr{D}\) is of the first category in itself. But \(\mathscr{D}\) is complete, and by Baire's category theorem a complete metric space is of the second category in itself; hence \(\mathscr{D}\) is not metrisable. This is a flaw of the topology of \(\mathscr{D}\), though not that troublesome.

We have shown that every \(C^\infty\) function can be considered as a distribution. In general, for a function \(f\) one only needs to require that \(f\) is **locally integrable**, i.e. for every compact set \(K\) we have \[\int_K |f|<\infty.\] If we define \(\Lambda_f:\phi \mapsto \int f\phi\), we see \[|\Lambda_f(\phi)|\le \left( \int_K |f| \right)\sup|\phi|, \quad \phi \in \mathscr{D}_K.\]

In particular, at the very least, all \(L^1\) functions can be considered as distributions.

On the other hand, if \(\mu\) is a positive measure on \(\mathbb{R}\) with \(\mu(K)<\infty\) for all compact \(K\), then \[\Lambda_\mu:\phi \mapsto \int \phi \,d\mu\] also defines a distribution.

We know that the fundamental theorem of calculus for \(L^1\) functions only holds when the function \(f\) is *absolutely continuous*. The Cantor function \(f\) is differentiable almost everywhere on \([0,1]\), but \[\int_0^1 f'(x)\mathrm{d}x = 0, \quad f(1)-f(0)=1.\] This restriction still makes sense here. Pick \(f\) to be a left-continuous function of bounded variation. Then it can be shown that \[D\Lambda_f = \Lambda_\mu\] where \(\mu([a,b))=f(b)-f(a)\). Hence \(D\Lambda_f=\Lambda_{Df}\) if and only if \(f\) is *absolutely continuous*.
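The Cantor function is concrete enough to compute. The sketch below (my own, using the standard ternary-digit construction) shows the paradox numerically: \(f(1)-f(0)=1\), yet the function is locally constant on every middle-third gap, so the pointwise derivative there is exactly \(0\):

```python
def cantor(x, depth=48):
    # Cantor (devil's staircase) function via ternary digits:
    # halve digits 0/2 into binary bits 0/1 until the first digit 1,
    # which contributes the final 1/2^k
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    value, scale = 0.0, 0.5
    for _ in range(depth):
        x *= 3.0
        digit = min(int(x), 2)  # guard against float rounding
        x -= digit
        if digit == 1:
            return value + scale
        value += scale * (digit // 2)
        scale *= 0.5
    return value

# f(1) - f(0) = 1, yet on the gap (1/3, 2/3) the function is constant,
# so the difference quotient there is exactly zero:
slope = (cantor(0.5 + 1e-9) - cantor(0.5 - 1e-9)) / 2e-9
print(cantor(0.0), cantor(1.0), slope)  # 0.0 1.0 0.0
```

Since \(f'=0\) almost everywhere, \(\int_0^1 f' = 0 \ne f(1)-f(0)\); in the distributional picture \(D\Lambda_f\) is instead the Cantor measure, which is not given by any function.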

We consider the weak*-topology of \(\mathscr{D}'\): \[\Lambda_i \to \Lambda \iff \lim_{i \to \infty}\Lambda_i\phi = \Lambda\phi \quad \forall \phi \in \mathscr{D}.\] Fortunately this limit operation commutes with the differential operator in a natural way, which may remind you of uniform convergence. In fact, \[\Lambda_i \to \Lambda \implies \Lambda \in \mathscr{D}' \text{ and }D^k\Lambda_i \to D^k\Lambda \quad \forall k=1,2,\cdots.\] To prove this one needs the Banach-Steinhaus theorem. This settles the last of our four requirements for distributions.

Convolution plays an important role in Fourier analysis, and here is how to invite distribution to the party.

Normally for two \(L^1\) functions \(f,g\) we define \[(f \ast g)(x)=\int_\mathbb{R}f(y)g(x-y)\mathrm{d}y.\] We can create more symbols to make life easier:

- \(\tau_xu(y)=u(y-x)\).
- \(\check{u}(y)=u(-y)\).

It follows that \(\tau_x\check{u}(y)=\check{u}(y-x)=u(x-y)\). Hence \[(f \ast g)(x) = \int_\mathbb{R} f(y)(\tau_x\check{g})(y)\mathrm{d}y.\] It shows that \(g \to (f \ast g)(x)\) is actually a linear functional of \(\Lambda_f\), \(\tau_x\) and \(g \mapsto \check{g}\). But \(\Lambda_f\) itself can be a distribution, hence we define convolution for a distribution and a smooth function by \[L \ast \phi(x) = L(\tau_x\check{\phi}), \quad L \in \mathscr{D}', \phi \in \mathscr{D}.\] Convolution can be characterised in a natural way. In fact, for any \(T:\mathscr{D} \to C^\infty\), if \[\tau_x T = T\tau_x,\] then there is a unique \(L \in \mathscr{D}'\) such that \[T\phi = L\ast \phi.\] As you can imagine, this setting creates a lot of potentials for Fourier transform.
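The definition \(L \ast \phi(x) = L(\tau_x\check{\phi})\) is short enough to transcribe directly. The sketch below (my own illustration) shows the classical fact that \(\delta\) is the identity of convolution: \((\delta \ast \phi)(x) = (\tau_x\check{\phi})(0) = \phi(x)\):

```python
import math

def delta(phi):
    # the Dirac distribution: L(phi) = phi(0)
    return phi(0.0)

def convolve(L, phi):
    # (L * phi)(x) = L(tau_x check(phi)), where (tau_x check(phi))(y) = phi(x - y)
    return lambda x: L(lambda y: phi(x - y))

# delta acts as the identity: (delta * phi)(x) = phi(x - 0) = phi(x)
phi = math.exp  # a smooth stand-in (compact support is not needed for delta to act)
g = convolve(delta, phi)
print(g(1.0) == phi(1.0))  # True
```

The same `convolve` works verbatim for any distribution represented as a functional, which is exactly the point of the characterisation above.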

- Walter Rudin, *Functional Analysis*, Second Edition. (Part II of the book)
- Peter Lax, *Functional Analysis*. (Appendix B)
- Stanford Encyclopedia of Philosophy Archive (Fall 2018 Edition), Quantum Theory: von Neumann vs. Dirac.

Let us say you are a programmer who has been working in big companies for a decade. How does it feel when you want to help someone who is studying programming from scratch? You may find it makes no sense that he or she cannot understand that, by copying several lines of code from the book, they have successfully made a programme printing "Hello, world!" on the screen. You know what I am talking about - the curse of knowledge.

When one has successfully learnt a certain skill, they may immediately lose the sense of why other people cannot understand and study it. What is the holdup? It becomes increasingly difficult to teach beginners. Blunt simplification does not do the trick all the time.

This is one of the reasons why becoming a good teacher is so hard. Academia superstars may be super awful in teaching, while teaching superstars may have already ceased focusing on academia.

I am not writing this post to be a guru and give some steps on how to lift the curse. In fact I think I am suffering from this as well.

For example, Tien-Yien Li was a famous lifter of the curse of knowledge. When he gave talks, he always tried to start from simple examples (this is adorable of course). When instructing his students, he would ask them to treat him as a fool, as if he knew nothing. He was indeed a good mathematician and a good maths teacher, but I do wonder how practical this is. Can his students do calculus in front of him while assuming he has no idea what calculus is? I have no idea.

Though I am only guessing, I think 'fool' is somewhat of an exaggeration. His students worked in fields similar to his, hence it would not have been hard for him to follow them at all. Of course the way he instructed his students is adorable as well.

A reader once emailed me, suggesting that I should write my posts more simply at certain points. But I declined the suggestion in the end. Am I doing some Serge Lang thing? I have no idea.

In his 1983 book *Fundamentals of Diophantine Geometry*, he included L. J. Mordell's review of Lang's own book *Diophantine Geometry*, which ended with

In conclusion, the reader will need no convincing that Lang, as has already been said, is a very learned mathematician, thoroughly familiar with every aspect of the topics he deals with, and their developments. His interesting and valuable historical notes give further evidence of this. Lang assumes that his readers are as knowledgeable as he is, and can grapple with the subject with the same ease that he does. Even if they could, Lang's style is not such as to make matters easy for them. Lang in writing is not a follower of Gauss, whose motto was "pauca sed matura." Further thought and care about his book, before publication, would have been well worth while. Those who can understand the book will be indebted to him for having brought together in one volume the important results contained in it. How much greater thanks would he have earned if the book had been written in such a way that more of it could have been more easily comprehended by a larger class of readers! It is to be hoped that someone will undertake the task of writing such a book.

And he also included his response:

All my books are meant to be understood by readers having the prerequisites for the level at which the books are written. These prerequisites vary from book to book, depending on the subject matter, my mood, and other aesthetic feelings which I have at the moment of writing.

When I write a standard text in Algebra, I attempt something very different from writing a book which for the first time gives a systematic point of view on the relations of Diophantine equations and the advanced contexts of algebraic geometry. The purpose of the latter is to jazz things up as much as possible. The purpose of the former is to educate someone in the first steps which might eventually culminate in his knowing the jazz too, if his tastes allow him that path. And if his tastes don't, then my blessings to him also. This is known as aesthetic tolerance. But just as a composer of music (be it Bach or the Beatles), I have to take my responsibility as to what I consider to be beautiful, and write my books accordingly, not just with the intent of pleasing one segment of the population. Let pleasure then fall where it may.

With best regards, Serge Lang.

*Refer to this reddit post for a discussion.*

I can say with absolute certainty that my posts are much more detailed than Serge Lang's, and Lang never tried to lift the curse. But my posts cannot be readable to everyone. Say, my posts on functional analysis are not prepared for middle school students, unless they are ridiculously exceptional and have already studied all the prerequisites (linear algebra, real analysis, integration theory, topology). Though I shall never make my posts as terse as Lang's books, it is never my duty to make my posts readable for everyone. So to some extent I fail as well.

If I tried to, over-simplification would have to be admitted, and that is against my rules. I do not like over-simplification, so I try to make sure everything makes sense. But one will not understand without certain prerequisites. I may remove some obstacles and show the clues, but that is as far as it goes. I can only lift the curse with respect to a certain group of people.

It seems I did not give a thorough discussion. But I do hope my inbox brings me good chances for discussion instead of chances to spark unnecessary disputes. I did not try to close myself off, and good evidence of that is that many of my posts can be found on the first page of a Google search.

Throughout we consider the polynomial ring \[R=\mathbb{R}[\cos{x},\sin{x}].\] This ring has a lot of non-trivial properties which give us a good chance to study commutative ring theory.

First of all, it is immediate that \[R \cong \mathbb{R}[X,Y]/(X^2+Y^2-1)\] where the map is given by \(X \mapsto \cos x\) and \(Y \mapsto \sin x\). Besides, in \(R\) we have \[\sin^2x=(1-\cos{x})(1+\cos{x})=\sin{x}\cdot\sin{x},\] two essentially different factorisations, which suggests studying whether \(R\) is a UFD. In fact, it is not, because its ideal class group is \(\mathbb{Z}/2\mathbb{Z}\), and a Dedekind domain is a UFD if and only if its ideal class group is trivial (corollary 3.22).

This blog post is inspired by an exercise in Serge Lang's *Algebra*. But when writing it, I ran into some paywalls, and it would be absurd of me to direct a random reader to a paywall. So it is very likely that I will include as many proofs as possible (whenever there is an absurd paywall; chances are I will rework them for readability). But I can't remove the assumption that the reader has finished Atiyah-MacDonald, or an equivalent, at the very least. I will add more topics in the future, but that is not an easy job.

By Hilbert's basis theorem, \(\mathbb{R}[\cos{x},\sin{x}]\) is Noetherian because it is a finitely generated \(\mathbb{R}\)-algebra. Now we are interested in its normality. Since \(\mathbb{R}[X,Y]/(X^2+Y^2-1) \cong \mathbb{R}[X][Y]/(Y^2-(1-X^2))\), where \(2\) is a unit and \(1-X^2\) is square-free but not a unit, we can apply the following lemma to show that \(R\) is a normal Noetherian ring (integrally closed in its field of fractions). For the definition and properties of normal rings, please refer to the Stacks Project.

(Lemma 1) Let \(A\) be a factorial ring with field of fractions \(K\) in which \(2\) is a unit, and let \(a \in A\) be a square-free element (i.e., if \(p\) is a prime element of \(A\), then \(a \not\in p^2A\)) which is not a unit. Then \(A[T]/(T^2-a)\) is normal.

Let \(t\) be the image of \(T\) in \(A[T]/(T^2-a)\), and let \(L\) denote the field of fractions of this ring. Then it is clear that \(A[t] \cong A[T]/(T^2-a)\) and we can write \(L=K(t)\). Every element of \(K(t)\) has degree at most \(1\) in \(t\), which is to say every element of \(L\) can be written uniquely as a sum \(r+st\) where \(r,s \in K\). To prove integral closedness, we need to find the minimal polynomial of \(r+st\).

Next we show that \(A[t]\) is integrally closed. Note \[ \begin{aligned} \left[(r+st)-r\right]^2=(st)^2 &= s^2t^2\\ &= as^2 \end{aligned}\] since \(t^2=a\). Hence \(f(X)=(X-r)^2-as^2\) sends \(r+st\) to \(0\). If \(s \ne 0\), no polynomial of degree \(1\) over \(K\) can do this: it would have to be a multiple of \(g(X)=X-(r+st)\), whose coefficient \(-(r+st)\) does not lie in \(K\). Hence \(f(X)\) is the minimal polynomial of \(r+st\) over \(K\). Since \(A\) is integrally closed in \(K\), the element \(r+st\) is integral over \(A\) (equivalently, over \(A[t]\)) if and only if the coefficients of its minimal polynomial lie in \(A\), that is, \(-2r \in A\) and \(r^2-as^2 \in A\). We need to show this implies \(r+st \in A[t]\). Since we may consider \(A\) as a subring of \(A[t]\), it suffices to show that \(r,s \in A\), provided that \(-2r \in A\) and \(r^2-as^2 \in A\) (when \(s \ne 0\)).

Since \(2\) is a unit in \(A\), \(-2r \in A\) clearly implies \(r \in A\), and then \(as^2 = r^2-(r^2-as^2) \in A\). It remains to prove that \(s \in A\). For \(s \in K\), write \(s=s_1/s_2\) with \(s_1,s_2 \in A\) relatively prime. We shall show that \(s_2\) is always a unit, which implies \(s \in A\). Write \(as^2=h\); then \(as_1^2=hs_2^2\). Assume \(s_2\) is not a unit; then some prime \(p\) divides \(s_2\), as \(A\) is a factorial ring. Hence \(as_1^2 = hs_2^2 \in p^2A\). Since \(s_1\) and \(s_2\) are relatively prime, \(p\) and \(p^2\) do not divide \(s_1\), hence \(a \in p^2A\), a contradiction (we have assumed \(a\) to be square-free; the assumption that \(a\) is not a unit is also used here to reach the contradiction). Hence \(s_2\) is a unit and \(s \in A\). The proof is complete. \(\square\)

Of course, I shan't be this lazy. It is clear that in the factorial ring \(A=\mathbb{R}[X]\), \(2\) is a unit. By square-free we mean: if \(p \in A\) is prime, then \(a \not\in p^2A\). For example, in \(\mathbb{Z}\), \(12\) is not square-free because \(12=2^2 \times 3 \in 2^2\mathbb{Z}\), while \(14\) is square-free because \(14=2 \times 7\) and no square appears. And for \(1-X^2\) things are clear because \(1-X^2=(1-X)(1+X)\) - there is no square. We require \(2\) to be a unit because otherwise the argument becomes much more difficult. We shall return to normality after we study the irreducible elements.

To conclude we have got a satisfying result:

(Proposition 1)\(R\) is a normal Noetherian ring.

With the help of Fourier series or elementary trigonometric identities, every polynomial in \(R=\mathbb{R}[\cos{x},\sin{x}]\) can be written in the form \[P(x) = a_0+\sum_{k=1}^{n}(a_k\cos{kx}+b_k\sin{kx})\] where \(a_0,a_k,b_k \in \mathbb{R}\). Define the degree \(\delta(P)\) to be the largest of the integers \(r,s\) with \(a_r,b_s \ne 0\). Then a direct computation shows that \(\delta(PQ)=\delta(P)+\delta(Q)\). This is not the case when the scalars are complex: for example, \((\cos{x}+i\sin{x})(\cos{x}-i\sin{x})=1\).
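The additivity of \(\delta\) can be tested by passing to the Laurent polynomial in \(z=e^{ix}\). The helper below is my own illustration, not from the text:

```python
import sympy as sp

x, z = sp.symbols('x z')

def trig_degree(P):
    # Substitute cos x -> (z + 1/z)/2, sin x -> (z - 1/z)/(2i); delta(P) is
    # the largest |k| such that z**k has a nonzero coefficient.
    L = sp.cancel(P.subs({sp.cos(x): (z + 1/z)/2,
                          sp.sin(x): (z - 1/z)/(2*sp.I)}))
    num, den = sp.fraction(L)
    m = sp.degree(den, z)          # den is a monomial in z (up to a constant)
    exps = [mon[0] - m for mon in sp.Poly(num, z).monoms()]
    return max(abs(e) for e in exps)

P = 1 + sp.cos(x)        # delta = 1
Q = sp.sin(x)            # delta = 1
assert trig_degree(P * Q) == 2   # delta is additive over the reals
# over C the additivity fails, matching the example in the text:
assert trig_degree((sp.cos(x) + sp.I*sp.sin(x)) * (sp.cos(x) - sp.I*sp.sin(x))) == 0
```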

If \(\delta(P)=0\), then \(P(x)=a_0\) is zero or a unit. If \(\delta(P)=1\) and \(P=P_1P_2\), then \(\delta(P_1)+\delta(P_2)=1\), so one of the factors has to be a unit; hence \(P\) is irreducible. If \(\delta(P)=2\), then \(P\) is reducible because we can solve the equations arising from the expansion of the product \[(a+b\sin{x}+c\cos{x})(a'+b'\sin{x}+c'\cos{x}).\] By induction, all polynomials of degree \(\ge 2\) are reducible. Hence the irreducible elements are exactly those of the form \[a+b\sin{x}+c\cos{x} \quad (b,c) \ne (0,0).\] But since \(R\) is not a UFD, we cannot work on the ideal \((a+b\sin{x}+c\cos{x})\) directly. We need to dive into abstraction for a long time.

We now proceed to another satisfying result.

(Proposition 2)\(R\) is a Dedekind domain.

*Proof.* Throughout, we work with the form \(R \cong \mathbb{R}[X,Y]/(X^2+Y^2-1)\). Since \(\mathbb{R}[X,Y]\) is of Krull dimension \(2\) (see Atiyah-MacDonald, exercise 11.7, where a solution is almost given) and \(X^2+Y^2-1\) is irreducible, \((X^2+Y^2-1)\) is a prime ideal, and all prime ideals \(P \subset \mathbb{R}[X,Y]\) strictly containing \((X^2+Y^2-1)\) are maximal. Next, let the canonical map \(\pi:\mathbb{R}[X,Y] \to \mathbb{R}[X,Y]/(X^2+Y^2-1)\) be given. By proposition 1.1 of Atiyah-MacDonald, \(\pi(P)\) is a maximal ideal of \(\mathbb{R}[X,Y]/(X^2+Y^2-1)\) whenever \(P \supsetneq (X^2+Y^2-1)\) is prime. If a nonzero ideal \(Q \subset \mathbb{R}[X,Y]/(X^2+Y^2-1)\) is prime, then \(\pi^{-1}(Q)=Q^c\) is also prime and contains \((X^2+Y^2-1)\) strictly, which implies that \(Q\) is maximal. Hence \(R\) is of Krull dimension \(1\). By proposition 1, \(R\) is integrally closed, hence it is Dedekind. \(\square\)

Let \(A\) be an integral domain and \(P\) the set of all prime ideals of height \(1\), i.e. the nonzero prime ideals containing no other nonzero prime ideal. Then \(A\) is a Krull domain if

- (KD1) \(A_{\mathfrak{p}}\) is a discrete valuation ring for all \(\mathfrak{p} \in P\).
- (KD2) \(A\) is the intersection of these discrete valuation rings (all considered as subrings of the field of fractions of \(A\)).
- (KD3) Any nonzero element of \(A\) is contained in only a finite number of height \(1\) prime ideals.

To proceed our study of \(R\), we need a lemma:

(Lemma 2)If \(A\) is a Dedekind domain, then \(A\) is also a Krull domain.

*Proof.* (KD1) is immediate: the localisation of a Dedekind domain at a nonzero prime ideal is a discrete valuation ring. Next we prove (KD3). Pick any nonzero \(a \in A\). If \(a\) is a unit, then it is contained in no prime ideal. If not, consider the ideal \((a)=aA\). We have a unique factorisation into a product of prime ideals: \[ (a)= \mathfrak{p}_1^{r_1}\cdots\mathfrak{p}_n^{r_n} \subset \bigcap_{j=1}^{n}\mathfrak{p}_j.\] Any prime ideal containing \(a\) contains this product, hence contains (and, having height \(1\), equals) one of the \(\mathfrak{p}_j\). Hence (KD3) is proved.

For (KD2), note first \(A \subset \bigcap_{\mathfrak{p}}A_{\mathfrak{p}}\) because the natural map \(A \to A_{\mathfrak{p}}\) is injective for all \(\mathfrak{p}\). Hence it suffices to prove the reverse inclusion. Elements of \(A_{\mathfrak{p}}\) are of the form \(a/s\), and we want those lying in every \(A_{\mathfrak{p}}\) to come from \(A\). It therefore suffices to prove that \(b/1 \in (a/1)A_{\mathfrak{p}}\) for all primes \(\mathfrak{p}\) implies \(b \in aA\), for all nonzero \(a,b \in A\). Put \[ (a)=\mathfrak{p}_1^{r_1}\cdots\mathfrak{p}_n^{r_n};\] we see \(\mathfrak{q}_j = \mathfrak{p}_j^{r_j}\) is \(\mathfrak{p}_j\)-primary and we obtain a primary decomposition. In particular, \[ b \in \bigcap_{j=1}^{n}\left(aA_{\mathfrak{p}_j} \cap A \right) = \bigcap_{j=1}^{n}\mathfrak{q}_j = aA\] because each \(\mathfrak{p}_j\) has height \(1\). \(\square\)

Which is to say that

(Proposition 3)\(R\) is a Krull domain.

We know that since \(R\) is Dedekind, its nonzero fractional ideals form an abelian group. This gives rise to the ideal class group. By a result of Samuel, we have a shockingly simple fact:

(Proposition 4)The ideal class group \(Cl(R) \cong \mathbb{Z}/2\mathbb{Z}\).

This can be considered a corollary of the following statement:

(Samuel)Let \(F\) be a non-degenerate quadratic form in \(k[X_1,X_2,X_3]\). Let \(A_F=k[X_1,X_2,X_3]/(F)\). Then \(Cl(A_F)=\mathbb{Z}/2\mathbb{Z}\) if and only if there is a nontrivial solution to \(F(X_1,X_2,X_3)=0\) in \(k\).

One can find this result in Samuel's *Lectures on Unique Factorization Domains*, in the study of plane conics.

With these being said, by theorem 8 of Zaks' paper, one sees that \(R\) is an HFD (half-factorial domain). To be precise, if \(x_1,x_2,\cdots,x_n\) and \(y_1,y_2,\cdots,y_m\) are irreducible elements of \(R\) with \(x_1x_2\cdots x_n=y_1y_2\cdots y_m\), then \(m=n\). I may reproduce the proof here one day, but it would be much harder than writing everything you have seen here. This ring \(R\) also shows that an HFD is not necessarily a UFD.

Since \(Cl(R) \cong \mathbb{Z}/2\mathbb{Z}\), for any maximal ideal \(M \subset R\), either \(M\) is principal or \(M^2\) is principal. If \(M\) and \(M'\) are two non-principal maximal ideals, then \(MM'\) is principal. Conversely, for any irreducible \(z \in R\), either \((z)\) is maximal or \((z)=MM'\) for some maximal ideals \(M\) and \(M'\) (which may coincide). We have seen that the irreducible elements are of the form \[z = a+b\sin{x}+c\cos{x},\quad (b,c) \ne (0,0).\] So we are now interested in these \(a,b,c\). We will do a high school trick first. If we put \[\begin{cases}k = \frac{a}{\sqrt{b^2+c^2}} \\b' = \frac{b}{\sqrt{b^2+c^2}} \\c' = \frac{c}{\sqrt{b^2+c^2}}\end{cases}\] then \(z= \sqrt{b^2+c^2}(\sin(x+\alpha)+k)\) where \(b'=\cos\alpha\) and \(c' = \sin\alpha\). Since \(\sqrt{b^2+c^2}\) is a unit in \(R\), it suffices to study elements of the form \(\sin(x+\alpha)+k\).
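The normalisation trick can be confirmed symbolically. A small SymPy check, with sample coefficients of my own choosing:

```python
import sympy as sp

x = sp.symbols('x')
a, b, c = 3, 1, 1                   # arbitrary sample with (b, c) != (0, 0)
r = sp.sqrt(b**2 + c**2)
alpha = sp.atan2(c, b)              # so that cos(alpha) = b/r, sin(alpha) = c/r
z = a + b*sp.sin(x) + c*sp.cos(x)
w = r * (sp.sin(x + alpha) + a/r)   # sqrt(b^2 + c^2) * (sin(x + alpha) + k)
assert sp.simplify(sp.expand_trig(z - w)) == 0
```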

Define a shift morphism \(h:R \to R\) by \[h(\cos{x})=\cos(x+\alpha), \quad h(\sin{x}) = \sin(x+\alpha), \quad h(t) = t \text{ for } t \in \mathbb{R}.\] This map is clearly an isomorphism. More importantly, since \[h(\sin{x}+k)=\sin(x+\alpha)+k,\] the primary decompositions of \((\sin(x+\alpha)+k)\) and \((\sin{x}+k)\) are of the same form. We are interested in the ring \(R/(\sin{x}+k)\), where it is natural to study the behaviour of \(\cos{x}\). For this reason we consider the substitution morphism \[\begin{aligned}g:\mathbb{R}[X] & \to R \\ X & \mapsto \cos{x}.\end{aligned}\] We first compute the inverse image \(g^{-1}[(\sin{x}+k)]\). It is natural to eliminate \(\sin x\) in favour of \(\cos x\). Note \((\sin x + k)( \sin x - k) = \sin^2x -k^2 = 1- \cos^2x-k^2\). Pick any \(P(X) \in (1-k^2-X^2)\), say \(P(X)=(1-k^2-X^2)Q(X)\); then \[g(P(X))=P(\cos{x}) = (1-\cos^2x-k^2)Q(\cos{x})=(\sin{x}+k)(\sin{x}-k)Q(\cos{x}).\] Hence \((1-k^2-X^2)\subset g^{-1}[(\sin{x}+k)]\). For the converse, note that if \(P \in g^{-1}[(\sin x + k)]\) is nonzero, then \(\deg P > 1\), because a trigonometric polynomial of the form \(a+b\cos x\) can never be divisible by \(\sin x + k\). By the Euclidean algorithm, we find \(Q(X)\), \(R(X)\) such that \[P(X)=Q(X)(1-k^2-X^2) + R(X)\] with \(\deg R \le 1.\) But when \(P \in g^{-1}[(\sin x + k)]\), we must have \(R(X)=0\), according to our study of the degree earlier. Hence \(P(X) \in (1-k^2-X^2)\), which is to say \[g^{-1}[(\sin x + k)]= (1-k^2-X^2).\] This induces an isomorphism \[\mathbb{R}[X]/(1-k^2-X^2) \cong R/(k+\sin x).\] And it is much easier to study the ideal \((1-k^2-X^2)\). To be precise,

- \(k^2=1 \iff (1-k^2-X^2)=(X)^2 \iff (k+\sin x)=M^2\) for some maximal ideal \(M\), because \((X)\) is a maximal ideal.
- \(k^2<1 \iff (1-k^2-X^2)\) is a product of two distinct maximal ideals \(\iff (k+\sin x)\) is a product of two distinct maximal ideals \(M\) and \(M'\).
- \(k^2>1 \iff (1-k^2-X^2)\) is maximal \(\iff\) \((k+\sin x)\) is maximal.
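The inverse-image computation and the trichotomy in \(k\) can both be checked with SymPy; the sample values of \(k\) and \(P\) below are my own:

```python
import sympy as sp

x, X = sp.symbols('x X')
k = sp.Rational(1, 2)      # sample value of k
# identity behind the inverse image: (sin x + k)(sin x - k) = 1 - cos^2 x - k^2
assert sp.simplify((sp.sin(x) + k)*(sp.sin(x) - k) - (1 - sp.cos(x)**2 - k**2)) == 0
# Euclidean division of a sample P(X) by 1 - k^2 - X^2 leaves deg R <= 1
P = X**3 + 2*X + 1
Q, rem = sp.div(P, 1 - k**2 - X**2, X)
assert sp.degree(rem, X) <= 1
assert sp.expand(Q*(1 - k**2 - X**2) + rem - P) == 0

# trichotomy: the real roots of 1 - k^2 - X^2 (with multiplicity) mirror the cases
def real_roots(k):
    return sp.real_roots(sp.Poly(1 - k**2 - X**2, X))

assert len(set(real_roots(1))) == 1                  # k^2 = 1: double root, M^2
assert len(set(real_roots(sp.Rational(1, 2)))) == 2  # k^2 < 1: two distinct maximal ideals
assert len(real_roots(2)) == 0                       # k^2 > 1: irreducible, (k + sin x) maximal
```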

Therefore the maximal ideals of \(R\) are determined by \(k\), or more precisely by the relation between \(a^2\) and \(b^2+c^2\). Moreover, let \(M\) be a maximal ideal. We have:

- If \(M\) is principal, then there exists \(\alpha\) and \(k\) such that

\[M = (\sin(x+\alpha) + k)\]

and \(R/M \cong \mathbb{C}\).

- If \(M\) is not principal, then there exists \(\alpha \in \mathbb{R}\) such that

\[M = (\sin(x+\alpha)+1,\cos(x+\alpha)), \quad M^2 = (\sin(x+\alpha)+1).\]

and \(R/M \cong \mathbb{R}\).

- Robert M. Fossum, *The Divisor Class Group of a Krull Domain*.
- M. F. Atiyah & I. G. MacDonald, *Introduction to Commutative Algebra*.
- Marco Fontana, Salah-Eddine Kabbaj, Sylvia Wiegand, *Commutative Ring Theory and Applications*.
- Hideyuki Matsumura, *Commutative Ring Theory*.
- P. Samuel, *Lectures on Unique Factorization Domains*.
- A. Zaks, *Half Factorial Domains*.

Consider a sequence of real or complex numbers \(\{s_n\}\). If \(s_n \to s\), then \[\pi_n = \frac{s_1+\cdots+s_n}{n} \to s.\]

Here, \(\pi_n\) is called the Cesàro sum of \(\{s_n\}\). The proof is rather simple. Given \(\varepsilon>0\), there exists some \(N>0\) such that \(|s_n-s|<\varepsilon\) for all \(n > N\). Therefore we can write \[\begin{aligned} |\pi_n - s| &= \left|\frac{s_1+s_2+\cdots+s_N}{n}+\frac{s_{N+1}+\cdots+s_n}{n}-s\right| \\ &= \left|\frac{(s_1-s)+(s_2-s)+\cdots+(s_N-s)}{n}+\frac{(s_{N+1}-s)+\cdots+(s_n-s)}{n}\right| \\ &\leq \left| \frac{s_1+\cdots+s_N-Ns}{n} \right| + \frac{n-N}{n}\varepsilon \\ &\leq \left| \frac{s_1+\cdots+s_N-Ns}{n} \right| + \varepsilon.\end{aligned}\] For fixed \(N\), we can pick \(n\) big enough such that \[\left| \frac{s_1+\cdots+s_N-Ns}{n} \right|<\varepsilon,\] so that \(|\pi_n-s|<2\varepsilon\) for all such \(n\). Since \(\varepsilon\) is arbitrary, \(\pi_n\) converges to \(s\). But the converse is not true in general. For example, if we put \(s_n=(-1)^n\), then it diverges but \(\pi_n \to 0\). If \(\pi_n\) converges, we say \(\{s_n\}\) is Cesàro summable.
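A quick numerical illustration of both directions (purely my own demo):

```python
# Cesaro means pi_n = (s_1 + ... + s_n)/n for a sample sequence.
def cesaro_means(seq):
    total, means = 0.0, []
    for i, s in enumerate(seq, start=1):
        total += s
        means.append(total / i)
    return means

# s_n = (-1)^n diverges, yet its Cesaro means tend to 0:
means = cesaro_means([(-1)**n for n in range(1, 10001)])
assert abs(means[-1]) < 1e-3
# a convergent sequence keeps its limit: s_n = 1 + 1/n -> 1
means2 = cesaro_means([1 + 1/n for n in range(1, 10001)])
assert abs(means2[-1] - 1) < 1e-2
```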

If we treat \(\pi_n\) as an integration with respect to the counting measure, things become interesting. Why don't we investigate the operator defined to be \[C(f)(x)= \frac{1}{x}\int_0^xf(t)dt.\] In this blog post we investigate this operator in Hilbert space \(L^2(0,\infty)\).

Put \(L^2=L^2(0,\infty)\) relative to the Lebesgue measure, and define the Cesàro operator \(C\) as follows: \[\begin{aligned}(Cf)(s) = \frac{1}{s}\int_0^sf(t)dt.\end{aligned}\]

From the example above, we shouldn't expect \(C\) to be too normal or well-behaved, as convergence does not go as expected. But fortunately it is at the very least continuous: due to Hardy's inequality, we have \(\lVert C \rVert = 2\). I have organised several proofs of this. But \(C\) is not compact.
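Hardy's inequality gives \(\lVert C\rVert \le 2\), and sharpness can be seen on a concrete family. The closed-form norms below are hand-computed for \(f_a(x)=x^{a-1}\chi_{(0,1]}\), \(a>\tfrac12\) (my own test family, not from the text):

```python
import math

# For f_a(x) = x**(a-1) on (0,1], a > 1/2:
#   ||f_a||^2   = 1/(2a-1)
#   ||C f_a||^2 = (1/a**2) * (1/(2a-1) + 1)
# so ||C f_a|| / ||f_a|| = sqrt(2/a), which tends to 2 as a -> 1/2+.
def ratio(a):
    norm_f2 = 1 / (2*a - 1)
    norm_Cf2 = (1 / a**2) * (1 / (2*a - 1) + 1)
    return math.sqrt(norm_Cf2 / norm_f2)

assert abs(ratio(0.5001) - 2.0) < 1e-3  # close to the operator norm
assert ratio(1.0) <= 2.0                # consistent with Hardy's inequality
```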

Here is the proof that \(C\) is not compact. Consider a family of functions \(\{\varphi_A\}_{A>0}\) where \[\varphi_A = \sqrt{A}\chi_{(0,1/A]}.\] (I owe Oliver Diaz for this family of functions.) It's not hard to show that \(\lVert \varphi_A \rVert = 1\). If we apply \(C\) to it we see \[(C\varphi_A)(x) = \frac{1}{x}\int_0^x\sqrt{A}\chi_{(0,1/A]}(t)dt = \sqrt{A}\left(\chi_{(0,1/A]}(x)+\frac{1}{Ax}\chi_{(1/A,+\infty)}(x)\right).\] Hence \(\lVert C\varphi_A \rVert = \sqrt{2}\), independently of \(A\). Meanwhile for \(B>A\), we have \[\begin{aligned}C(\varphi_B-\varphi_A)(x) &=\left(\sqrt{B}-\sqrt{A} \right)\chi_{(0,1/B]}(x)+\left(\frac{1}{\sqrt{B}x}-\sqrt{A}\right)\chi_{(1/B,1/A]}(x) \\ &+\left(\frac{1}{\sqrt{B}} - \frac{1}{\sqrt{A}} \right)\frac{1}{x}\chi_{(1/A,+\infty)}(x).\end{aligned}\] It follows that \[|C(\varphi_B-\varphi_A)|(x) \geq \left(\frac{1}{\sqrt{A}}-\frac{1}{\sqrt{B}} \right)\frac{1}{x}\chi_{(1/A,\infty)}(x).\] If we compute the norm of the right hand side we get \[\|C(\varphi_B-\varphi_A)\| \geq \left|1-\sqrt{\frac{A}{B}} \right|.\] As a result, if we pick \(f_n=\varphi_{2^n}\), then for any \(m>n\) we get \[\|C(f_m-f_n)\| \geq \left|1 - \sqrt{2^{n-m}} \right| \geq 1-\frac{1}{\sqrt{2}}.\] Therefore, we have found a sequence \((f_n)\) in the unit ball such that \((Cf_n)\) has no convergent subsequence.
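As a numerical sanity check, \(\lVert C\varphi_A\rVert^2\) can be approximated by a Riemann sum and comes out near \(2\), so \(\lVert C\varphi_A\rVert=\sqrt2\) independently of \(A\). The choice of \(A\) and the truncation point are arbitrary:

```python
import math

# Midpoint-rule approximation of ||C phi_A||^2 for A = 4, truncating at T:
# (C phi_A)(x) = sqrt(A) on (0, 1/A] and 1/(sqrt(A) x) afterwards.
A, T, n = 4.0, 1000.0, 200_000
h = T / n
total = 0.0
for i in range(n):
    x = (i + 0.5) * h
    v = math.sqrt(A) if x <= 1/A else 1.0 / (math.sqrt(A) * x)
    total += v * v * h
assert abs(total - 2.0) < 1e-2
```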

Also we can find its adjoint operator: \[\begin{aligned}\langle Cf,g \rangle &= \int_0^\infty \left(\frac{1}{s}\int_0^sf(t)dt \right)\overline{g}(s)ds \\ &= \int_0^\infty\left(\int_t^\infty \frac{1}{s}f(t)\overline{g}(s)ds\right)dt \\ &= \int_0^\infty f(t) \left(\int_t^{\infty}\frac{1}{s}\overline{g}(s)ds\right)dt.\end{aligned}\] Hence the adjoint is given by \[(C^\ast g)(t) = \int_t^{\infty}\frac{1}{s}g(s)ds.\] \(C^\ast\) is not compact either. Further, another application of Fubini's theorem shows that \[CC^\ast = C + C^\ast=C^\ast C \implies (I-C)(I-C^\ast)=I=(I-C^\ast)(I-C).\] Hence \(I-C\) is unitary (its adjoint \(I-C^\ast\) is its inverse), and \(C\) is normal.

In this section we study the spectra of \(C\) and \(C^\ast\), which will be derived from properties of the bilateral shift, an operator originating in the \(\ell^2\) space. For convenience we write \(\mathbb{N}=\mathbb{Z}_{\geq 0}\). This section can also help you understand the connection between \(L^2(0,1)\) and \(L^2(0,\infty)\).

An operator \(U\) on a Hilbert space \(H\) is called a *simple unilateral shift* if \(H\) has an orthonormal basis \(\{e_n\}\) such that \(U(e_n)=e_{n+1}\) for all \(n \in \mathbb{N}\). This is nothing but the right-shift operator with respect to the basis. Besides, we call \(U\) a *unilateral shift of multiplicity \(m\)* if \(U\) is a direct sum of \(m\) simple unilateral shifts (note: \(m\) can be any cardinal number, finite or infinite).

If we consider the difference between \(\mathbb{N}\) and \(\mathbb{Z}\), we have the definition of *bilateral shift*. An operator \(W\) on \(K\) is called a *simple bilateral shift* if \(K\) has an orthonormal basis \(\{e_n\}_{n \in \mathbb{Z}}\) such that \(We_{n}=e_{n+1}\) for all \(n \in \mathbb{Z}\). Besides, if we consider the subspace \(H\) spanned by \(\{e_n\}_{n \geq 0}\), we see \(W|_H\) is simply a unilateral shift. Before we begin, we investigate some elementary properties of uni/bilateral shifts.

(Proposition 1)A simple unilateral shift \(U\) is an isometry.

*Proof.* Note \((Ue_m,Ue_n)=(e_{m+1},e_{n+1})=\delta_{m+1,n+1}=\delta_{mn}=(e_m,e_n)\). \(\square\)

(Proposition 2)A simple bilateral shift \(W\) is unitary, hence is also an isometry.

*Proof.* Note \((We_m,e_n)=(e_{m+1},e_n)=\delta_{m+1,n}=\delta_{m,n-1}=(e_m,W^{-1}e_n)\), from which it follows that \(W^\ast=W^{-1}\). \(\square\)

Now let the Hilbert space \(K\) and its subspace \(H\) (spanned by \(\{e_n\}_{n \geq 0}\), invariant under \(W\)) be given. Consider the operator given by \(Re_n=e_{-(n+1)}\), extended linearly. It follows that \(R\) is a unitary involution and \[Re_0=W^{-1}e_0, \quad RH = H^{\perp}, \quad R \circ W = W^{-1} \circ R.\]

With these tools, we are ready for the most important theorems.

(Theorem) \(W=I-C^\ast\) is a simple bilateral shift on \(K=L^2\).

**Step 1 - Obtaining missing subspace, operator and basis**

Here we put \(H=L^2(0,1)\), which can be canonically embedded into \(L^2(0,\infty)\) in the obvious way (consider all \(L^2\) functions vanishing outside \((0,1)\)). It is natural to put this, as there are many similarities between \(L^2(0,1)\) and \(L^2(0,\infty)\).

Explicitly, \[(Wf)(x) = f(x) - \int_x^\infty \frac{1}{t}f(t)dt, \quad x \in (0,\infty).\] Also we claim the basis to be generated by \(e_0= \chi_{(0,1)}\). First of all we show that \((W^ne_0)_{n \geq 0}\) is orthonormal. Note, as we have proved, \(W^\ast W = (I-C)(I-C^\ast)=I\). Without loss of generality assume \(m \geq n\); therefore \[(e_m,e_n)=(W^me_0,W^ne_0)=((W^\ast)^nW^me_0,e_0)=((W^\ast W)^nW^{m-n}e_0,e_0)=(W^{m-n}e_0,e_0).\] If \(m=n\), then \((e_m,e_n)=(e_0,e_0)=1\). Hence it reduces to proving that \((W^ke_0,e_0)=0\) for all \(k>0\). First of all we have \[(We_0,e_0)=(e_0,e_0)-(C^\ast e_0,e_0)=1-(C^\ast e_0,e_0)\] meanwhile \[\begin{aligned} (C^\ast e_0,e_0) &= \int_0^1 \left(\int_x^1 \frac{1}{t}dt \right)dx \\ &= \int_0^1(-\ln{x})dx \\ &= (-x\ln{x}+x)|_0^1 = 1.\end{aligned}\] Hence \(We_0 \perp e_0\). Suppose now we have \((W^{k-1}e_0,e_0)=0\); then \[\begin{aligned} (W^ke_0,e_0)&=(WW^{k-1}e_0,e_0) \\ &=((I-C^\ast)W^{k-1}e_0,e_0) \\ &= (W^{k-1}e_0,e_0)-(C^\ast W^{k-1}e_0,e_0) \\ &= -(W^{k-1}e_0,C e_0) \\ &= -\int_0^1W^{k-1}e_0(x)\frac{1}{x}\left(\int_0^xdt\right)dx \\ &= -\int_0^1 W^{k-1}e_0(x)\frac{1}{x} \cdot x dx \\ &= -(W^{k-1}e_0,e_0) \\ &= 0. \end{aligned}\] Note \(W^ke_0 \in L^2(0,\infty)\) always vanishes when \(x \geq 1\): when we are taking inner products, \([1,\infty)\) is automatically excluded. With these being said, \((W^ne_0)_{n \geq 0}\) forms an orthonormal set. By the Hausdorff maximality theorem, it is contained in a maximal orthonormal set. Since \(H=L^2(0,1)\) is separable (it admits a countable orthonormal basis), one can check that \((W^ke_0)_{k \geq 0}\) forms a basis of \(H\). From now on we write \(\{e_n\}\).

To find the involution \(R\), note first that \(W=I-C^\ast\) is already unitary (if it were not, it could not be a bilateral shift and there would be nothing to prove), whose inverse and adjoint is \(W^\ast=I-C\) as we have proved earlier. Hence we have \[Re_0=e_{-1}=(I-C)e_0=\chi_{(0,1)}-\frac{1}{x}\int_0^x\chi_{(0,1)}(t)dt = -\frac{1}{x}\chi_{[1,\infty)}.\] But we have no idea what \(R\) is exactly; we need to find it manually (or we have to guess). First of all it shall be guaranteed that \(RH=H^\perp\). Since \(H\) contains all \(L^2\) functions vanishing on \([1,\infty)\), functions in \(RH\) should vanish on \((0,1)\). It is natural to put \(R(f)(x)=g(x)f\left( \frac{1}{x}\right)\) for the time being, where \(g\) is to be determined by \(e_{-1}\). Since \(e_0\left(\frac{1}{x}\right)=\chi_{[1,\infty)}(x)\) almost everywhere, we shall put \(g(x)=-\frac{1}{x}\). It is then clear that \(Re_0=W^{-1}e_0\) and \(RH=H^\perp\). For the third condition, we need to show that \[W \circ R \circ W = R.\] Note \[\begin{aligned}W \circ R \circ W(f) &= W \circ R \left(f(x)-\int_x^\infty\frac{1}{t}f(t)dt\right) \\ &= W \left(-\frac{1}{x}f\left(\frac{1}{x}\right)+\frac{1}{x}\int_{1/x}^{\infty}\frac{1}{t}f(t)dt \right) \\ &= -\frac{1}{x}f\left(\frac{1}{x}\right)+\underbrace{\frac{1}{x}\int_{1/x}^{\infty}\frac{1}{t}f(t)dt + \int_x^\infty \frac{1}{t^2}f\left(\frac{1}{t}\right)dt - \int_x^\infty \frac{1}{t^2}\int_{1/t}^{\infty}\frac{1}{u}f(u)\,du\,dt}_{=0 \text{ by Fubini's theorem, similar to proving }CC^\ast=C+C^\ast.} \\ &= R(f).\end{aligned}\]

**Step 2 - With these, \(W\) in step 1 has to be a simple bilateral shift**

This part is independent of the spaces chosen. To finish the proof, we need a lemma:

Suppose \(K\) is a Hilbert space, \(H\) is a subspace and \(e_0 \in H\). Let \(W\) be a unitary operator such that \(W^ne_0 \in H\) for all \(n \geq 0\) and \(\{e_n=W^ne_0\}_{n \geq 0}\) forms an orthonormal basis of \(H\), and let \(R\) be a unitary involution on \(K\) such that \[Re_0 = W^{-1}e_0, \quad RH=H^\perp, \quad R \circ W = W^{-1} \circ R;\] then \(W\) is a simple bilateral shift.

Indeed, the objects mentioned in step 1 fit in this lemma. To begin with, we write \(e_n=W^ne_0\) for all \(n \in \mathbb{Z}\). Then \(\{e_n\}\) is an orthonormal set, because for arbitrary \(m,n \in \mathbb{Z}\) there is a \(j \in \mathbb{Z}\) such that \(m+j,n+j \geq 0\), and therefore \[(e_m,e_n)=(W^je_m,W^je_n)=(W^{m+j}e_0,W^{n+j}e_0)=(e_{m+j},e_{n+j})=\delta_{m+j,n+j}=\delta_{m,n}.\] Since \((e_0,e_1,\cdots)\) spans \(H\) and \(RH=H^{\perp}\), we see \((Re_0,Re_1,\cdots)\) spans \(H^{\perp}\). But \[Re_n=RW^ne_0=W^{-n}Re_0=W^{-n-1}e_0=e_{-n-1},\] hence \(\{e_{-1},e_{-2},\cdots\}\) spans \(H^\perp\). By definition, \(W\) is indeed a simple bilateral shift, and our proof is done. \(\square\)

- Walter Rudin, *Functional Analysis*.
- Arlen Brown, P. R. Halmos, A. L. Shields, *Cesàro operators*.

Throughout we consider the Hilbert space \(L^2=L^2(\mathbb{R})\), the space of all complex-valued functions of a real variable such that \(f \in L^2\) if and only if \[\lVert f \rVert_2^2=\int_{-\infty}^{\infty}|f(t)|^2dm(t)<\infty,\] where \(m\) denotes the ordinary Lebesgue measure (in fact, it is legitimate to consider the Riemann integral in this context).

For each \(t \geq 0\), we assign a bounded linear operator \(Q(t)\) such that \[(Q(t)f)(s)=f(s+t).\] This is indeed bounded since \(\lVert Q(t)f \rVert_2 = \lVert f \rVert_2\), as the Lebesgue measure is translation-invariant. This is a left translation operator with a single step \(t\).

The inner product in \(L^2\) is defined by \[(f,g)=\int_{-\infty}^{\infty}f(s)\overline{g(s)}dm(s), \quad f,g\in L^2.\] If we apply \(Q(t)\) to \(f\), we see \[\begin{aligned} (Q(t)f,g) &= \int_{-\infty}^{\infty}f(s+t)\overline{g(s)}dm(s) \\ &= \int_{-\infty}^{\infty}f(u)\overline{g(u-t)}dm(u) \quad (u=s+t) \\ &= (f,Q(t)^{\ast}g)\end{aligned}\] where \(Q(t)^\ast\) is the adjoint of \(Q(t)\), which happens to be a right translation operator with a single step \(t\): \((Q(t)^\ast g)(s)=g(s-t)\). Clearly we have \(Q(t)Q(t)^\ast=Q(t)^\ast Q(t)=I\), which indicates that \(Q(t)\) is unitary. Also we can check it in a more manual way: \[(Q(t)f,Q(t)g) = \int_{-\infty}^{\infty}f(s+t)\overline{g(s+t)}dm(s) = \int_{-\infty}^{\infty}f(s+t)\overline{g(s+t)}dm(s+t)=(f,g).\] By operator theory, since \(Q(t)\) is unitary and bounded, the spectrum of \(Q(t)\) lies in the unit circle \(S^1\).
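A discretised check that the adjoint acts by right translation; the Gaussian test functions and grid are my own choice:

```python
import math

# <Q(t)f, g> should equal <f, g(. - t)>; test with real Gaussians on a grid.
f = lambda s: math.exp(-s*s)
g = lambda s: math.exp(-(s - 1.0)**2 / 2)
t, h = 0.7, 0.001
grid = [-10.0 + i*h for i in range(20001)]
lhs = sum(f(s + t) * g(s) for s in grid) * h       # <Q(t)f, g>
rhs = sum(f(s) * g(s - t) for s in grid) * h       # <f, Q(t)* g>
assert abs(lhs - rhs) < 1e-6
```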

Note \(Q(0)=I\) and \[Q(t+u)f(s)=f(s+t+u)=f[(s+t)+u]=Q(u)f(s+t)=Q(t)Q(u)f(s)\] for all \(f \in L^2\), which is to say that \(Q(t+u)=Q(t)Q(u)\). Therefore we say \(\{Q(t)\}\) is a *semigroup*. But what's more important is that it satisfies strong continuity near the origin: \[\lim_{t \to 0}\lVert Q(t)f - f \rVert_2 = 0.\] This is not too hard to verify. It suffices to prove that \[\lim_{t \to 0}\int_{-\infty}^{\infty} |f(s+t)-f(s)|^2dm(s) =0.\] Note \(C_c(\mathbb{R})\) (continuous function with compact support) is dense in \(L^2\), and for \(f \in C_c(\mathbb{R})\), it follows immediately from properties of continuous functions. Next pick \(f \in L^2\). Then for \(\varepsilon>0\) there exists some \(f_1 \in C_c(\mathbb{R})\) such that \(\lVert f-f_1 \rVert_2 < \frac{\varepsilon}{4}\) and \(\lVert f_1(s+t)-f_1(s)\rVert_2<\frac{\varepsilon}{2}\) for \(t\) small enough. If we put \(f_2=f-f_1\) we get \[\begin{aligned} \lVert f(s+t)-f(s) \rVert_2 &\leq \lVert f_1(s+t)-f_1(s) \rVert_2+\lVert f_2(s+t)-f_2(s) \rVert \\ &< \frac{\varepsilon}{2}+2\lVert f_2(s)\rVert < \varepsilon.\end{aligned}\] The limit follows as \(\varepsilon \to 0\).

Recall that the infinitesimal generator of \(Q(t)\) is defined to be \[A=\lim_{t \to 0}\frac{1}{t}[Q(t)-I],\] which is inspired by \(\frac{d}{dt}e^{tA}\big|_{t=0}=A\) (thanks to von Neumann). Note if \(f \in L^2\) is differentiable, then \[Af(s) = \lim_{t \to 0} \frac{f(s+t)-f(s)}{t} = f'(s).\] That the infinitesimal generator of \(Q(t)\) is the differentiation operator is quite intuitive, but we need to clarify it in \(L^2\), which is much larger. So what is the domain \(D(A)\)? We don't know yet, but we can guess. When talking about differentiation in an \(L^p\) space, it makes sense to extend differentiability to absolute continuity. Also we need to make sure that \(Af \in L^2\); hence we put \[D=\{f\in L^2:f \text{ absolutely continuous, }f' \in L^2\}.\] For every \(f \in D(A)\) and any fixed \(t\) we already have \[\frac{d}{dt}Q(t)f(s)=f'(s+t)=Af(s+t),\] hence \(Af=f'\) for every \(f \in D(A)\) and it follows that \(D(A) \subset D\). In fact, \(A\) is the restriction of the differential operator to \(D(A)\). Conversely, by the Hille-Yosida theorem, we see \(1 \in \rho(A)\), and one can also show that \(1 \in \rho(\frac{d}{dx})\). Therefore \[(I-\frac{d}{dx})D(A)=(I-A)D(A)=L^2.\] But we also have \[D=(I-\frac{d}{dx})^{-1}L^2.\] Thus \[D = \left(I-\frac{d}{dx}\right)^{-1}\left(I-\frac{d}{dx}\right)D(A)=D(A).\] The fact that \((I-\frac{d}{dx})D=L^2\) can be realised via the equation \(f-f'=g\), where the existence of a solution can be proved using the Fourier transform. Note \(\widehat{f'}(y)=iy\hat{f}(y)\); with some knowledge of distributions, the result can also be given by \[D(A)=\left\{f\in L^2:\int_{-\infty}^{\infty}|y\hat{f}(y)|^2dy<\infty\right\}.\]

By the Hille-Yosida theorem, the half plane \(\{z:\Re z>0\}\) is contained in \(\rho(A)\). But we can give a more precise result.

Pick any \(f \in D(A)\). It is directly verified that \[(A-\lambda{I})f = f'-\lambda{f}.\] Put \(g=(A-\lambda{I})f\) then \[\hat{g}(y)=iy\hat{f}(y)-\lambda{\hat{f}(y)}.\] Therefore \[\hat{f}(y) = \frac{\hat{g}(y)}{iy-\lambda} \in L^2.\] Conversely, suppose \(h(y)=\frac{\hat{g}(y)}{iy-\lambda} \in L^2\), then \(\hat{g}(y)=iyh(y)-\lambda{h}(y)\). If we take its Fourier inverse, we see \(g \in R(A-\lambda{I})\).

If \(g \in L^2\), then clearly \(\hat{g} \in L^2\). It remains to discuss \(\hat{g}(y)/(iy-\lambda)\). Note \(iy\) lies on the imaginary axis, hence if \(\lambda\) is not purely imaginary, then \(\hat{g}(y)/(iy-\lambda) \in L^2\). If \(\lambda\) is purely imaginary, however, then we may have \(\hat{g}(y)/(iy-\lambda)\not\in L^2\): for example, take \(\hat{g}=\chi_{[s-1,s+1]}\) where \(\lambda = is\). Hence if \(\lambda\) is purely imaginary, \(R(A-{\lambda}I)\) is a proper subspace of \(L^2\). Therefore we conclude: \[\sigma(A)= \{z \in \mathbb{C}:\Re z = 0\}.\] *This is an exercise in W. Rudin's Functional Analysis. You can find related theorems in Chapter 13.*

Guided by research in function theory, operator theorists introduced the analogue of quasi-analytic classes. Let \(A\) be an operator in a Banach space \(X\). \(A\) is not necessarily bounded, hence the domain \(D(A)\) is not necessarily the whole space. We say \(x \in X\) is a \(C^\infty\) vector if \(x \in \bigcap_{n \geq 1}D(A^n)\). This is quite intuitive if we consider the differential operator. A vector \(x\) is analytic if the series \[\sum_{n=0}^{\infty}\lVert{A^n x}\rVert\frac{t^n}{n!}\] has a positive radius of convergence. Finally, we say \(x\) is quasi-analytic for \(A\) provided that \[\sum_{n=0}^{\infty}\left(\frac{1}{\lVert A^n x \rVert}\right)^{1/n} = \infty,\] or equivalently the same condition with \(\lVert A^n x \rVert^{1/n}\) replaced by its nondecreasing majorant. Interestingly, if \(A\) is symmetric, then \(\lVert{A^nx}\rVert\) is log-convex.

Based on the density of quasi-analytic vectors, we have an interesting result.

(Theorem)Let \(A\) be a symmetric operator in a Hilbert space \(\mathscr{H}\). If the set of quasi-analytic vectors spans a dense subset, then \(A\) is essentially self-adjoint.

This theorem can be considered a corollary of the fundamental theorem of quasi-analytic classes, obtained by applying suitable Banach space techniques in place of the classical function-theoretic ones.

For a positive sequence \(\{a_n\}\), it is the moment sequence of a positive measure \(\mu\), i.e. \(a_n = \int_\mathbb{R}t^n d\mu(t)\), if and only if it is positive definite. But uniqueness is not guaranteed. Here we have a sufficient condition for uniqueness, using the concept of quasi-analytic vectors. This is an old theorem (1922), but we prove it using operator theory that appeared decades later.

(Carleman's condition)Suppose \(\{a_n\}\) is the moment sequence of a positive measure \(\mu\) on \(\mathbb{R}\), then \(\mu\) is uniquely determined provided that \(\sum a_{2n}^{-1/2n}=\infty\).

**Proof.** Consider the Hilbert space \[\mathscr{H}= L^2(\mathbb{R},\mu)\] and the operator \[ A:f(t) \mapsto tf(t).\] It is clear that \(A\) is self-adjoint. We shall work on the constant function \(u(t) \equiv 1 \in \mathscr{H}\). Since \(A^nu = t^n\), we see \(u \in C^\infty\) (otherwise \(a_n\) would not be defined). On the other hand, since \(A\) is symmetric, we have \[ (A^n u, u) = a_n \implies (A^{2n} u,u) = (A^n u, A^n u) = \lVert A^n u \rVert^2 = a_{2n}.\] But \(a_{2n}^{-1/2n}=\lVert A^n u \rVert^{-1/n}\), and as a result \(\sum a_{2n}^{-1/2n}= \sum \lVert A^n u \rVert^{-1/n} = \infty\), hence \(u\) is quasi-analytic. In general, \(t^n = A^n u\) is quasi-analytic for all \(n \geq 0\). Consider the space of polynomials \(\mathcal{P}[t]\) with closure \(\mathscr{H}_1\). It follows from the theorem above that \(A_1 = A|_{\mathcal{P}[t]}\) is essentially self-adjoint in \(\mathscr{H}_1\). Hence \(\mathscr{H}_1\) is invariant under the one-parameter group \(e^{iAs}\). Pick \(y \in \mathcal{P}[t]^{\perp}\); we see \[(y,e^{iAs}u) = \int_\mathbb{R}e^{-ist}y(t)d\mu(t) = 0,\] which implies that \(y = 0\) a.e. [\(\mu\)]. It follows that \(\mathscr{H}_1 = \mathscr{H}\), or equivalently, \(\mathcal{P}[t]\) is dense in \(\mathscr{H}\). Suppose now we have another generating measure \(\nu\) of \(\{a_n\}\). With respect to \(\nu\), \(\mathcal{P}[t]\) is still dense. But the norm on \(\mathcal{P}[t]\) is fixed by \(\{a_n\}\), hence we obtain an isometry between \(\mathcal{P}[t]_\mu\) and \(\mathcal{P}[t]_\nu\), which extends to an isometry between \(L^2(\mathbb{R},\mu)\) and \(L^2(\mathbb{R},\nu)\), forcing \(\mu\) and \(\nu\) to be equal. \(\blacksquare\)
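For a concrete instance, the standard Gaussian has even moments \(a_{2n}=(2n-1)!!\), and its Carleman series diverges, so the Gaussian is determined by its moments. A log-based numerical check (my own illustration, working with logarithms to avoid overflow):

```python
import math

# a_{2n} = (2n-1)!! for the standard Gaussian. The terms a_{2n}**(-1/(2n))
# decay only like n**(-1/2), so the Carleman series diverges.
log_a = 0.0        # running value of log((2n-1)!!)
partial = 0.0
for n in range(1, 200):
    log_a += math.log(2*n - 1)
    partial += math.exp(-log_a / (2*n))   # a_{2n}**(-1/(2n))
assert partial > 10    # partial sums keep growing: no summability in sight
```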

There are a lot of nice properties of analytic functions, whose class is denoted by \(C^\omega\). Formally we have the following definition:

If \(f \in C^\omega\) and \(x_0 \in \mathbb{R}\), one can write \[f(x) = a_0+a_1(x-x_0)+a_2(x-x_0)^2+\cdots\] for all \(x\) in some neighbourhood of \(x_0\).

Obviously \(f \in C^\infty\) (and hence \(C^\omega \subset C^\infty\)); equivalently, the Taylor series converges to \(f\) around every \(x_0 \in \mathbb{R}\): \[T(x) = \sum_{n=0}^{\infty}\frac{D^nf(x_0)}{n!}(x-x_0)^n.\] One interesting consequence: every \(f \in C^\omega\) is uniquely determined by the sequence \(D^0f(x_0), Df(x_0),D^2f(x_0),\cdots\).

Unfortunately, this property fails in general on \(C^\infty\). For example, consider a bump function \(\varphi\) (a simple example can be found on Wikipedia). In brief, \(\varphi=0\) for all \(x \in (-\infty,-1] \cup [1,+\infty)\) but \(\varphi>0\) on \((-1,1)\); more importantly, \(\varphi \in C^\infty\). Now if we take \(f = \varphi\) and \(g = 2\varphi\), then \(f \neq g\), yet \(D^nf(-2)=D^ng(-2)=0\) for all \(n \geq 0\). We get a sequence of derivatives of all orders, but this sequence does not determine a unique \(C^\infty\) function.

The term "uniquely determined" can also be phrased in an alternative way: if \(f \in C^\omega\) and \(D^kf(x_0)=0\) for all \(k \geq 0\), then \(f=0\) everywhere.

So a question comes up naturally: which functions are determined by their derivatives of all orders? Does \(C^\omega\) contain all of them? If not, how can we describe them?

The class of analytic functions is our source of motivation, so it makes sense to dig into its properties to find more. It is natural to view an analytic function as the restriction of a holomorphic function to the real line. Let \(\Omega\) be the set of all \(z=x+iy\) such that \(|y| < \delta\), and suppose \(f \in H(\Omega)\) with \(|f(z)|<\beta\) for all \(z \in \Omega\). By Cauchy's estimate, we get \[|D^n f(x)| \leq \beta \delta^{-n}n!\quad n \in \mathbb{N},\,x\in \mathbb{R}.\] Also the restriction of \(f\) to \(\mathbb{R}\) is real-analytic. Here comes the interesting part: \(\beta\) and \(\frac{1}{\delta}\) are determined only by \(f\) and have nothing to do with \(n\), while \(n!\) is a special sequence that dominates the derivatives of \(f\).

This motivates us to define a special class of functions, which is called the class \(C\{M_n\}\).

Let \(\{M_n\}\) be a sequence of positive numbers. We let \(C\{M_n\}\) denote the class of all \(f \in C^\infty\) such that \[\lVert D^nf\rVert_\infty \leq \beta_f B^n_f M_n,\] where \(\lVert \cdot \rVert_\infty\) is the supremum norm on \(\mathbb{R}\), and \(\beta_f,B_f\) are constants depending only on \(f\), not on \(n\).

In order to equip \(C\{M_n\}\) with some satisfying algebraic structures, which can simplify our work, we need some restrictions.

Indeed, \(B_f\) plays a much more important role, since we have \[\limsup_{n \to \infty}\left(\frac{\lVert D^n f\rVert_\infty}{M_n}\right)^{1/n} \leq B_f,\] while \(\beta_f\) disappears in this limit. However, if we eliminate \(\beta_f\) at the beginning, i.e. put \(\beta_f = 1\) for all \(f \in C\{M_n\}\), then for \(n=0\) we have \[\lVert f \rVert_\infty \leq M_0,\] which prevents \(C\{M_n\}\) from being a vector space. For example, if \(\lVert f \rVert_\infty = M_0\), then \(\lVert 2f \rVert_\infty = 2M_0 > M_0\), hence \(2f \not\in C\{M_n\}\). If instead we keep the constant, say \(\lVert f \rVert_\infty \leq \beta_f M_0\), then under addition and scalar multiplication we simply obtain a different constant for each function, which ensures that \(C\{M_n\}\) is closed under addition and scalar multiplication, i.e. is a vector space. Without such a constant, the class would contain far too few functions.

Further, we have some restriction on the sequence \(\{M_n\}\):

- \(M_0=1\).
- \(M_n^2 \leq M_{n-1}M_{n+1}\) (\(\{\log M_n\}\) is a convex sequence).

As we will see soon, this makes \(C\{M_n\}\) an algebra over \(\mathbb{R}\), where multiplication is defined pointwise.

*Proof.* If \(f,g \in C\{M_n\}\), then we need to show that \(fg \in C\{M_n\}\). We have the product rule for differentiation: \[D^n(fg) = \sum_{j=0}^{n}{n \choose j}(D^jf)(D^{n-j}g).\] Since \(f,g \in C\{M_n\}\), we have \[|D^n(fg)| \leq \sum_{j=0}^{n}{n \choose j}\beta_fB_f^jM_j\beta_gB_g^{n-j}M_{n-j} = \beta_f\beta_g\sum_{j=0}^{n}{n \choose j}B_f^jB_g^{n-j}M_jM_{n-j}.\] Of course we want to replace \(M_jM_{n-j}\) with \(M_n\) to obtain a binomial expansion. To do this we need the convexity of the sequence \(\{\log M_n\}\). Note \(M_n^2 \leq M_{n-1}M_{n+1}\) implies \[\log M_n - \log M_{n-1} \leq \log M_{n+1} - \log M_n.\] As a result, the line segments connecting \((n-1,\log M_{n-1})\) and \((n,\log M_n)\) get steeper as \(n\) grows. Connecting these points actually gives a convex function, but let us be more rigorous. For \(0 < j < n\), shifting the window of increments down repeatedly, we have \[\begin{aligned}\log M_n - \log M_j &= \sum_{k=j+1}^{n}\left(\log M_k - \log M_{k-1}\right) \\&\geq \sum_{k = j}^{n-1}\left(\log M_{k} - \log M_{k-1}\right) \\&\geq \sum_{k=1}^{n-j}(\log M_k - \log M_{k-1}) \quad\text{(note $\log M_0=0$)} \\&= \log M_{n-j}.\end{aligned}\] Hence \(M_n \geq M_jM_{n-j}\) for \(0<j<n\). It also holds when \(j=0\) or \(j=n\), hence we get \[|D^n(fg)| \leq \beta_f\beta_g\sum_{j=0}^{n}{n \choose j}B_f^jB_g^{n-j}M_jM_{n-j} \leq \beta_f\beta_g\sum_{j=0}^{n}{n \choose j}B_f^jB_g^{n-j}M_n = \beta_f\beta_g(B_f+B_g)^nM_n.\] Hence \(fg \in C\{M_n\}\). The reason why \(C\{M_n\}\) is a vector space has been stated already. \(\square\)
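The key inequality \(M_jM_{n-j}\le M_n\) can be sanity-checked on a concrete log-convex sequence, say \(M_n = n!\). A small Python sketch:

```python
import math

# A log-convex sequence with M_0 = 1: M_n = n!
M = [math.factorial(n) for n in range(21)]

# Log-convexity: M_n^2 <= M_{n-1} * M_{n+1}
for n in range(1, 20):
    assert M[n] ** 2 <= M[n - 1] * M[n + 1]

# The inequality derived in the proof: M_j * M_{n-j} <= M_n for 0 <= j <= n
for n in range(21):
    for j in range(n + 1):
        assert M[j] * M[n - j] <= M[n]
```

For \(M_n = n!\) the inequality is just \(\binom{n}{j}\ge 1\) in disguise.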

This restriction does not hurt generality: whenever we are given a positive sequence \(\{M_n\}\), there is another sequence \(\{M'_n\}\) satisfying the two restrictions such that \(C\{M_n\}=C\{M'_n\}\).

A class \(C\{M_n\}\) is said to be quasi-analytic if the conditions \[f \in C\{M_n\},\quad (D^nf)(0)=0 \quad\text{for all } n \in \mathbb{N}\] imply that \(f(x) = 0\) for all \(x \in \mathbb{R}\).

The reason we check whether \(f\) vanishes everywhere, instead of whether it is 'uniquely determined' by a sequence of derivatives of all orders, is that this is much simpler to work with: if such a sequence determined two functions, their difference would vanish everywhere.

We have seen that \(C\{n!\}\) contains every function that is the restriction of a bounded holomorphic function on a strip \(|\Im(z)|<\delta\). Conversely, we now show that any function in \(C\{n!\}\) defined on the real axis can be extended to a holomorphic function with the same property. As a result, \(C\{n!\}\) is a quasi-analytic class (which contains all bounded functions of \(C^\omega\)). If we only consider functions defined on a closed and bounded interval \([a,b]\), then \(C\{n!\}\) is exactly \(C^\omega\).

Suppose \(f \in C\{n!\}\). First of all we have \[\lVert D^nf \rVert_\infty \leq \beta B^nn!\] for \(n \in \mathbb{N}\). By Taylor's formula, \[f(x) = \sum_{j=0}^{n-1}\frac{D^jf(a)}{j!}(x-a)^j+\frac{1}{(n-1)!}\int_a^x(x-t)^{n-1}D^nf(t)dt.\] The remainder is therefore dominated by \[\frac{n!}{(n-1)!}\beta B^n\left\vert\int_a^x(x-t)^{n-1}dt\right\vert = \beta|B(x-a)|^n.\] If \(|B(x-a)|<1\), then \(\lim_{n \to \infty}|B(x-a)|^n = 0\), and we can safely write the expansion \[f(x) = \sum_{n=0}^{\infty}\frac{D^nf(a)}{n!}(x-a)^n.\] Pick \(0<\delta<\frac{1}{B}\); we can replace \(x\) in the expansion above with \(z\) such that \(|z-a|<\delta\). This defines a holomorphic function \(F_a\) on \(D(a,\delta)\) (the open disk centred at \(a\) with radius \(\delta\)). If \(x \in D(a,\delta)\) is real, then \(F_a(x)=f(x)\). Therefore \(F_a\) is an analytic continuation of \(f\); all the \(F_a\) together form a holomorphic extension \(F\) of \(f\) to the strip \(|\Im(z)|<\delta\). As a result, for \(z = a+iy\) with \(|y|<\delta\), we have \[|F(z)|=|F_a(z)| = \left\vert\sum_{n=0}^{\infty}\frac{D^nf(a)}{n!}(iy)^n\right\vert \leq \beta \sum_{n=0}^{\infty}(B\delta)^n = \frac{\beta}{1-B\delta}.\] Hence \(F\) is bounded in this region.

In general, \(C\{M_n\}\) is quasi-analytic as long as \(M_n\) does not grow too fast as \(n \to \infty\) (not much faster than \(n!\)). There are several equivalent statements on whether \(C\{M_n\}\) is a quasi-analytic class, which are given by the Denjoy-Carleman theorem. Here I collect all the conditions that I have found:

(Denjoy-Carleman theorem) The following conditions are equivalent:

- \(C\{M_n\}\) is not quasi-analytic.
- \(\int_0^\infty \log Q(x)\frac{dx}{1+x^2}<\infty\), where \(Q(x)=\sum_{n=0}^{\infty}\frac{x^n}{M_n}\).
- \(\int_0^\infty \log q(x) \frac{dx}{1+x^2}<\infty\), where \(q(x) = \sup_{n \geq 0} \frac{x^n}{M_n}\).
- \(\sum_{n=1}^{\infty}\left(\frac{1}{M_n}\right)^{1/n}<\infty\).
- \(\sum_{n=1}^{\infty}\frac{M_{n-1}}{M_n}<\infty\).
- \(C\{M_n\}\) contains a nontrivial function with compact support.
- \(\sum_{n=1}^{\infty}\frac{1}{\lambda_n}<\infty\) where \(\lambda_n = \inf_{k \geq n}M_k^{\frac{1}{k}}\).
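Conditions 4 and 5 can be compared numerically for the two sequences discussed in this post: for \(M_n = n!\) the terms behave like \(e/n\), so the sum diverges and the class is quasi-analytic, while for \(M_n = (n!)^2\) the terms behave like \((e/n)^2\), the sum converges, and the class is not quasi-analytic. A small Python sketch:

```python
import math

# Condition 4 terms (1/M_n)^(1/n), computed via logarithms to avoid overflow.
s_fact, s_fact_sq = 0.0, 0.0
for n in range(1, 2001):
    log_fact = math.lgamma(n + 1)             # log(n!)
    s_fact += math.exp(-log_fact / n)         # M_n = n!: terms ~ e/n
    s_fact_sq += math.exp(-2 * log_fact / n)  # M_n = (n!)^2: terms ~ (e/n)^2

assert s_fact > 10       # partial sums keep growing (divergent series)
assert s_fact_sq < 4     # partial sums stay bounded (convergent series)

# Condition 5 shows the same dichotomy directly:
# M_{n-1}/M_n = 1/n (harmonic, divergent) for n!, and 1/n^2 (convergent) for (n!)^2.
```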

You may find condition 7 strange. In fact, in this condition \(\{M_n\}\) is not required to satisfy the two restrictions. This is the form Denjoy and Carleman found initially. Later, mathematicians found that for a sequence \(\{M_n\}\) we can obtain its convex minorant \(\{M_n'\}\) such that

- \(M_n \geq M_n'\) for all \(n\).
- \(\{\log M_n'\}\) is convex.
- There is a sequence \(0=n_0<n_1<\cdots\) such that \(M_{n_i} = M'_{n_i}\) and \(\log M'_k\) is linear for \(n_i \leq k \leq n_{i+1}\).

And as you may guess, the convex minorant \(\{M_n'\}\) is what we are using today.

The proof of the Denjoy-Carleman theorem will appear in my next blog post. There is quite a lot of work to do to finish the proof, and it cannot be done within hours. We will be using a good deal of complex analysis. I will also try to cover some extra properties of quasi-analytic classes, as well as why the convex minorant suffices.

This post is still in progress; it is neither finished nor polished properly. Over the coming days there will be new content, until this line is deleted. What I'm planning to add at this moment:

- Transpose is not just about changing indices of its components.
- Norm and topology in vector spaces
- Representing groups using matrices

Since the background of the reader varies a lot, I will try to organise the contents depending on topic and required background. For the following section, you are assumed to be familiar with basic abstract algebra terminology, for example groups, rings and fields.

When learning linear algebra, we are always thinking about real or complex vectors and matrices. This makes sense because \(\mathbb{R}\) and \(\mathbb{C}\) are the number **fields** closest to real life. But we should not have the stereotype that linear algebra is only about real and complex spaces, or properties of \(\mathbb{R}^n\) and \(\mathbb{C}^n\). There has never been such a restriction. In fact, \(\mathbb{R}\) and \(\mathbb{C}\) can be replaced with any field \(\mathbb{F}\), and there are vast differences depending on the properties of \(\mathbb{F}\).

There are already some differences between linear algebra over \(\mathbb{R}\) and over \(\mathbb{C}\). Since \(\mathbb{C}\) is algebraically closed, that is, every polynomial of degree \(n \geq 1\) has \(n\) roots (with multiplicity), dealing with eigenvalues is much 'safer'. For example, we can diagonalise the matrix \[A = \begin{pmatrix}-1& -1 \\ 2 & 1 \end{pmatrix}\] over \(\mathbb{C}\) but not over \(\mathbb{R}\).
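This matrix has characteristic polynomial \(\lambda^2+1\), so its eigenvalues are \(\pm i\). A quick numerical check (a numpy sketch):

```python
import numpy as np

A = np.array([[-1.0, -1.0],
              [2.0, 1.0]])

# characteristic polynomial is x^2 + 1: the eigenvalues are +-i, not real
w, V = np.linalg.eig(A)
assert np.allclose(sorted(w, key=lambda z: z.imag), [-1j, 1j])

# over C the eigenvector matrix V diagonalises A
assert np.allclose(np.linalg.inv(V) @ A @ V, np.diag(w))
```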

When \(\mathbb{F}\) above is finite, there are a lot more interesting things; it is not just saying "\(\mathbb{F}\) is a field, and it is finite". For example, if \(\mathbb{F}=\mathbb{R}\), we have \[\begin{pmatrix}1&0&2 \\2&3&1 \\1&4&0\end{pmatrix}^{-1}=\begin{pmatrix}-\frac{2}{3} & -\frac{4}{3} & -1 \\\frac{1}{6}&-\frac{1}{3}&\frac{1}{2} \\\frac{5}{6}&-\frac{2}{3}&\frac{1}{2}\end{pmatrix}.\] There shouldn't be any problem. On the other hand, if \(\mathbb{F}=\mathbb{Z}_5\), we have \[\begin{pmatrix}1&0&2 \\2&3&1 \\1&4&0\end{pmatrix}^{-1}=\begin{pmatrix}1&3&4 \\1&3&3\\0&1&3\end{pmatrix}.\] In applied algebra it is quite common to meet finite fields. What if we want to solve a linear equation over a finite field? That's when linear algebra over finite fields comes in. By the way, if we work over rings in lieu of fields, we find ourselves in module theory.
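The \(\mathbb{Z}_5\) inverse above can be reproduced by ordinary Gauss-Jordan elimination with all arithmetic done mod \(p\). A minimal Python sketch (the helper `inv_mod_p` is my own name, not a library routine; \(p\) is assumed prime so every nonzero entry is invertible):

```python
# Gauss-Jordan inversion over Z_p (p prime)
def inv_mod_p(A, p):
    n = len(A)
    # augmented matrix [A | I], all arithmetic mod p
    M = [[a % p for a in row] + [int(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # any nonzero entry mod p is a valid pivot
        piv = next(r for r in range(col, n) if M[r][col] % p != 0)
        M[col], M[piv] = M[piv], M[col]
        inv = pow(M[col][col], -1, p)        # modular inverse (Python 3.8+)
        M[col] = [x * inv % p for x in M[col]]
        for r in range(n):
            if r != col and M[r][col]:
                f = M[r][col]
                M[r] = [(x - f * y) % p for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]

A = [[1, 0, 2], [2, 3, 1], [1, 4, 0]]
B = inv_mod_p(A, 5)
assert B == [[1, 3, 4], [1, 3, 3], [0, 1, 3]]   # the inverse displayed above
```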

The set of all invertible \(n \times n\) matrices forms a multiplicative group (you should have no problem verifying this), denoted \(GL(n)\), \(GL(n,\mathbb{F})\), \(GL_n(\mathbb{F})\) or simply \(GL_n\). The set of all orthogonal matrices, which is also a multiplicative group and written \(O(n)\), is obviously a subgroup of \(GL(n)\), since for all \(A \in O(n)\) we have \(\det{A} = \pm 1 \neq 0\). \(O(n)\) contains \(SO(n)\) as a subgroup, whose elements have determinant \(1\). One should not mix up \(SO(n)\) with \(SL(n)\), the group of all matrices of determinant \(1\). In fact \(SO(n)\) is a proper subset of \(SL(n)\), and \(SL(n) \cap O(n) = SO(n)\). In general we have \[SO(n) \subset SL(n) \subset GL(n), \\SO(n) \subset O(n) \subset GL(n).\] Now we consider a more detailed structure between \(GL(n)\) and \(O(n)\). I met the following problem in a differential topology book, where it was about fibres and structure groups; but for now it is simply a linear algebra problem. The crux is finding the 'square root' of a positive definite matrix.

There is a direct product decomposition \[GL(n,\mathbb{R})=O(n) \times \{\text{positive definite symmetric matrices}\}.\]

This decomposition is pretty intuitive. For example, if a matrix \(A \in GL(n,\mathbb{R})\) has determinant \(a\), we may look for a positive definite matrix of determinant \(|a|\), and another matrix of determinant \(\frac{a}{|a|}\), which is expected to be orthogonal. We can consider \(O(n)\) as rotating the basis (changing directions), and the positive definite symmetric matrix as scaling (changing sizes). A similar result holds if we change the order of multiplication. It is worth mentioning that the decomposition is unique once we fix conventions such as the order of eigenvalues.

**Proof.** For any invertible matrix \(A\), we see \(A^TA\) is positive definite and symmetric. Therefore there exists some \(P \in O(n)\) such that \[P^T A^TAP = \operatorname{diag}(\lambda_1,\lambda_2,\cdots,\lambda_n).\] We assume that \(\lambda_1\leq \lambda_2 \leq \cdots \leq \lambda_n\) to fix the order. Note \(\lambda_k>0\) for all \(1 \leq k \leq n\) since \(A^TA\) is positive definite. We write \(\Lambda=\operatorname{diag}(\sqrt\lambda_1,\sqrt\lambda_2,\cdots,\sqrt\lambda_n)\), which gives \[A^TA = P\Lambda^2P^T.\] Define the square root \(B=\sqrt{A^TA}\) by \[B = P\Lambda P^T.\] Then \(B^2=P\Lambda P^T P \Lambda P^T = A^TA\). Note \(B\) is also a positive definite symmetric matrix and is unique for the given \(A\). Let \(v_1,v_2,\cdots,v_n\) be orthonormal eigenvectors of \(B\) with respect to \(\sqrt\lambda_1, \sqrt\lambda_2, \cdots, \sqrt\lambda_n\) (the columns of \(P\)). We first take a look at the following vectors: \[e_1=\frac{1}{\sqrt{\lambda_1}}Av_1,\,e_2=\frac{1}{\sqrt{\lambda_2}}Av_2,\cdots,e_n=\frac{1}{\sqrt{\lambda_n}}Av_n.\] Note \[\left(\frac{1}{\sqrt{\lambda_i}}Av_i\right)^{T}\left(\frac{1}{\sqrt{\lambda_j}}Av_j\right)=\frac{1}{\sqrt{\lambda_i\lambda_j}}v_i^TA^TAv_j=\frac{1}{\sqrt{\lambda_i\lambda_j}}v_i^TB^2v_j=\frac{\sqrt{\lambda_j}}{\sqrt{\lambda_i}}v_i^Tv_j.\] So the value above is \(1\) if \(i = j\) and \(0\) if \(i \neq j\). Since \(A\) is invertible, \(\{e_1,e_2,\cdots,e_n\}\) is a basis, and we have just shown that it is orthonormal.

Then we take \[U = (e_1,e_2,\cdots,e_n)\begin{pmatrix}v_1^T \\v_2^T \\\vdots \\v_n^T\end{pmatrix}.\] We see \[UU^T = (e_1,e_2,\cdots,e_n)\begin{pmatrix}v_1^T \\v_2^T \\\vdots \\v_n^T\end{pmatrix}(v_1,v_2,\cdots,v_n)\begin{pmatrix}e_1^T \\e_2^T \\\vdots \\e_n^T\end{pmatrix} = I = U^TU\] since both \(\{e_1,e_2,\cdots,e_n\}\) and \(\{v_1,v_2,\cdots,v_n\}\) are orthonormal. On the other hand, we need to prove that \(A=UB\). First of all, \[Uv_k =(e_1,e_2,\cdots,e_n)\begin{pmatrix}v_1^T \\v_2^T \\\vdots \\v_n^T\end{pmatrix} v_k = (e_1,e_2,\cdots,e_n)\begin{pmatrix}0\\\vdots \\v_k^Tv_k \\\vdots\end{pmatrix} = e_k.\] (Note we used the fact that the \(v_k\) are orthonormal.) This yields \[UBv_k = U \sqrt\lambda_kv_k = \sqrt\lambda_ke_k=\frac{\sqrt\lambda_k}{\sqrt\lambda_k}Av_k=Av_k.\]

Therefore \(A=UB\) holds on a basis, and hence on all of \(\mathbb{R}^n\). This gives the desired conclusion: for any invertible \(n \times n\) matrix \(A\) we have a unique decomposition \[A = UB\] where \(U \in O(n)\) and \(B\) is a positive definite symmetric matrix. \(\square\)
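A numerical sketch of the decomposition (using numpy; note that \(A=UB\) with \(U\) orthogonal forces \(B^2 = A^TA\), so \(B=\sqrt{A^TA}\) can be built from the spectral decomposition):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))             # generically invertible

# B = sqrt(A^T A) via the spectral decomposition of A^T A
w, P = np.linalg.eigh(A.T @ A)          # eigenvalues are all positive
B = P @ np.diag(np.sqrt(w)) @ P.T
U = A @ np.linalg.inv(B)

assert np.allclose(U.T @ U, np.eye(4))              # U is orthogonal
assert np.allclose(U @ B, A)                        # A = UB
assert np.allclose(B, B.T)                          # B is symmetric
assert np.all(np.linalg.eigvalsh(B) > 0)            # B is positive definite
```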

A basis of a vector space does not come from nowhere. The statement that every vector space has a basis is derived from the axiom of choice and the fact that all non-zero elements of a field are invertible. I have written an article proving this already, see here (this is relatively advanced). On the other hand, since elements of a ring are not necessarily invertible, modules over a ring do not have bases in general.

It is also worth mentioning that a vector space is not necessarily of finite dimension. An infinite dimensional vector space is not some fancy thing; it's quite simple: a basis of it is not finite. It can be countable or uncountable. And there is a pretty straightforward example: the set of all continuous functions \(f:\mathbb{R} \to \mathbb{R}\).

One of the most important ideas developed in the 20th century is that, when studying a set, one can study functions defined on it. For example, let's consider \([0,1]\) and \((0,1)\). For the set of all continuous functions on \([0,1]\), written \(C([0,1])\), everything is fine: it's fine to define a norm on it, to define a distance on it, and the norm and distance are complete. However, things are messy on \(C((0,1))\): defining a norm on it results in abnormal behaviour. If you are interested you can check here.

Now let's consider the unit circle \(S^1\) in the plane. Real continuous functions defined on \(S^1\) can be considered as periodic functions on \(\mathbb{R}\), so we have a lot to work with. But if we are interested in the torus,

which is homeomorphic to \(S^1 \times S^1\), how can we study the functions on it? We may consider \(C(S^1) \times C(S^1)\), but as we will see later, there are some problems with that. Anyway, it makes sense to define a 'product' of two vector spaces, which 'expands' them.

Let's review direct sums and direct products first. For the direct product of \(A\) and \(B\), we ask for an algebraic structure on the Cartesian product \(A \times B\); for example, \((a,b)+(a',b')=(a+a',b+b')\). That is, the operation is defined componentwise. This works fine for groups, since each group has only one binary operation; at this point we are not yet caring about scalar multiplication.

There are two types of direct sum, inner and outer. For a vector space \(V\) over a field \(\mathbb{F}\), we consider two (or even more) subspaces \(W\) and \(W'\). We have a 'bigger' subspace generated by adding \(W\) and \(W'\) together, namely \(W+W'\), which contains all elements of the form \(w+w'\) where \(w \in W\) and \(w' \in W'\). The representation is not guaranteed to be unique. That is, for \(z=w+w'\), we may have \(w_1 \in W\) and \(w_1' \in W'\) such that \(z=w_1+w_1'\) but \(w \neq w_1\). This would be weird. Fortunately, the representation is unique if and only if \(W \cap W'\) is trivial. In this case we say the sum of \(W\) and \(W'\) is direct, and write \(W \bigoplus W'\). This is the inner direct sum.

Can we represent the direct sum using an ordered pair? Of course we can. Elements in \(W \bigoplus W'\) can be written in the form \((w,w') \in W \times W'\), and the addition is defined componentwise. That is, \((w,w')+(w_1,w_1')=(w+w_1,w'+w_1')\) (which is in fact \((w+w')+(w_1+w_1')=(w+w_1)+(w'+w_1')\)). It seems we haven't gone further than the direct product, but we also need to consider the scalar product. For \(\alpha \in \mathbb{F}\), we have \(\alpha(w,w') = (\alpha{w},\alpha{w'})\), because \(\alpha(w+w')=\alpha{w}+\alpha{w'}\). We call this the **inner** direct sum because \(W\) and \(W'\) are *inside* \(V\). One may ask: since \(w+w'=w'+w\), why is the pair ordered? Because in the pair the first entry is an element of \(W\) and the second an element of \(W'\); swapping them would change the meaning.

The outer direct sum is different. To define it, one considers two *arbitrary* vector spaces \(W\) and \(V\) over \(\mathbb{F}\). It is not guaranteed that \(W\) and \(V\) are both subspaces of a bigger vector space. For example, it's legit to take \(W\) to be \(\mathbb{R}\) over itself and \(V\) to be all real functions. \(W \bigoplus V\) is defined to be the set of all ordered pairs \((w,v)\) with \(w \in W\) and \(v \in V\). The addition is defined componentwise, and scalar multiplication is defined by \(\alpha(w,v)=(\alpha{w},\alpha{v})\). One may also write \(w+v\) if the context is clear.

When the number of vector spaces is finite, we don't distinguish between direct product and direct sum. When the index set is infinite, for example when we consider \(\prod_{i=1}^{\infty}X_i\) and \(\bigoplus_{i=1}^{\infty}X_i\), things are different. To be precise, in the language of category theory, the direct product is the *product*, and the direct sum is the *coproduct*.

We are not touching the definition yet; first let's imagine what we want from a multiplication. Let \(W\) and \(V\) be two vector spaces over \(\mathbb{F}\), and use \(\cdot\) for the multiplication for the time being. The distributive law should hold, that is, \(w \cdot v + w' \cdot v = (w+w') \cdot v\) and \(w \cdot v + w \cdot v' = w \cdot (v+v')\). On the other hand, scalar multiplication should act on a single component, that is, \(\alpha(w \cdot v)=(\alpha w) \cdot v = w \cdot (\alpha v)\).

It seems premature to use \(\cdot\), so let's use ordered pairs. Under these laws, we have \[(w+w',v)=(w,v)+(w',v), \quad (w,v+v')=(w,v)+(w,v'), \\\alpha(w,v)=(\alpha w,v) = (w,\alpha v).\] It makes sense to call this 'bilinear': fixing one component, we have a linear map. However, the direct sum and direct product do not work here at all. If they did, we would have \((w,v)+(w',v)=(w+w',2v)\) instead of \((w+w',v)\). This gives rise to the tensor product: we need a legitimate multiplication of a vector by a vector.

We have got the spirit of the tensor product: a direct product is not OK; there has to be a genuinely bilinear operation. For two vector spaces \(V\) and \(W\), we write the tensor product as \(V \bigotimes W\); for \(v \in V\) and \(w \in W\), we denote their tensor product by \(v \otimes w\), which can be considered as the image of a bilinear map \(\varphi(\cdot,\cdot):V \times W \to V \bigotimes W\). There are many bilinear maps with domain \(V \times W\); we ask the tensor product to be the universal one.

The **tensor product** \(V \bigotimes W\) of \(V\) and \(W\) is the vector space having the following properties.

- There exists a canonical bilinear map \(\varphi(\cdot,\cdot):V \times W \to V \otimes W\), and we write \(\varphi(v,w) = v \otimes w \in V \bigotimes W\).
- For any bilinear map \(h(\cdot,\cdot):V \times W \to U\), there exists a unique linear map \[\lambda:V \otimes W \to U\] such that \(\lambda(\varphi(v,w)) = h(v,w)\) for all \((v,w) \in V \times W\). This is called the **universal property** of \(V \bigotimes W\).

It is easily verified that any two tensor products of \(V\) and \(W\) are isomorphic (hint: use the universal property). So we only need to pick the obvious one (as long as it exists), but we don't have space for the construction here. For further study I recommend the following documents:

- Definition and properties of tensor products. This one involves a considerable amount of explicit calculation and is of elementary approach.
- Tensor products and bases. This one proves the existence in an abstract way.
- Tensor Product as a Universal Object (Category Theory & Module Theory). One of my recent blog posts. The topics here are relatively advanced, and I don't think it's a good idea to use the language of category theory at this early point.

Let \(\mathbb{F}\) be any field (it can be replaced with a commutative ring if you want to), and \(E,F\) two modules over \(\mathbb{F}\). We will have a glance at the definition of the dual space and, more importantly, see what a transpose really is. In general we study a bilinear form \[f:E \times F \to \mathbb{F}.\]

Sometimes for simplicity we also write \(f(x,y)=\langle x,y \rangle\). The set of all bilinear forms of \(E \times F\) into \(\mathbb{F}\) will be denoted by \(L^2(E,F;\mathbb{F})\) and you may have seen it earlier.

We define the **kernel** of \(f\) on the left to be \(F^\perp\) and on the right to be \(E^\perp\). Recall that for \(S \subset E\), \(S^\perp\) consists of all \(y\) such that \(f(x,y)=0\) whenever \(x \in S\); similarly, for \(T \subset F\), \(T^\perp\) consists of all \(x\) such that \(f(x,y)=0\) whenever \(y \in T\). Respectively, we say \(f\) is **non-degenerate** on the left/right if the kernel on the left/right is trivial.

One of the simplest examples is the case \(E=\mathbb{F}^m\) and \(F=\mathbb{F}^n\). Take an \(m \times n\) matrix \(A\) over \(\mathbb{F}\) and define \(f(x,y) = x^T A y\). This is a classic bilinear form. Whether it is non-degenerate on the left or on the right depends on the linear independence of the row and column vectors of \(A\). \(\def\opn{\operatorname}\)

The bilinear form \(f\) gives rise to a homomorphism of \(E\) to a 'space of essential arrows': \[\varphi_f:E \to \opn{Hom}_\mathbb{F}(F,\mathbb{F})\] given by \[\varphi_f(x)(y) = f(x,y)=\langle x, y \rangle.\] \(\opn{Hom}_\mathbb{F}(F,\mathbb{F})\) contains all linear maps of \(F\) into \(\mathbb{F}\). One can imagine \(\opn{Hom}_\mathbb{F}(F,\mathbb{F})\) to be a set of 'arrows' from \(F\) to \(\mathbb{F}\).

Now let's see what we can do in analysis and topology.

Let's consider all complex polynomials of degree \(\leq 5\). This is a complex vector space, in fact isomorphic to \(\mathbb{C}^6\), since we have a bijection mapping \(a_0+a_1z+a_2z^2+a_3z^3+a_4z^4+a_5z^5\) to \((a_0,a_1,a_2,a_3,a_4,a_5)^T\). Therefore we can simply use matrices and vectors. Let us represent differentiation by a matrix. This is straightforward: pick the natural basis \(\{1,z,z^2,z^3,z^4,z^5\}\) to begin with and write differentiation as \(\mathscr{D}\). Since \(\def\ms{\mathscr}\) \[\begin{aligned}\ms{D}(1)&=0 &\quad\ms{D}(z)&=1 \\\ms{D}(z^2)&=2z &\quad\ms{D}(z^3)&=3z^2 \\\ms{D}(z^4)&=4z^3 &\quad\ms{D}(z^5)&=5z^4\end{aligned}\]

We get a matrix corresponding to \(\ms{D}\) by \[D=\begin{pmatrix}0&1&0&0&0&0 \\0&0&2&0&0&0 \\0&0&0&3&0&0 \\0&0&0&0&4&0 \\0&0&0&0&0&5 \\0&0&0&0&0&0\end{pmatrix}\] Next we try to obtain the Jordan normal form of \(D\). Since the minimal polynomial of \(D\) is merely \(m(\lambda)=\lambda^6\), we cannot diagonalise it. After some computation we get \[D =SJS^{-1}= \begin{pmatrix}1&0&0&0&0&0 \\0&1&0&0&0&0 \\0&0&\frac{1}{2}&0&0&0 \\0&0&0&\frac{1}{6}&0&0 \\0&0&0&0&\frac{1}{24}&0 \\0&0&0&0&0&\frac{1}{120}\end{pmatrix}\begin{bmatrix}0&1&0&0&0&0 \\0&0&1&0&0&0 \\0&0&0&1&0&0 \\0&0&0&0&1&0 \\0&0&0&0&0&1 \\0&0&0&0&0&0\end{bmatrix}\begin{pmatrix}1&0&0&0&0&0 \\0&1&0&0&0&0 \\0&0&2&0&0&0 \\0&0&0&6&0&0 \\0&0&0&0&24&0 \\0&0&0&0&0&120\end{pmatrix}\] where the matrix \(J\) in the square bracket is our Jordan normal form. This makes sense since if we consider the basis \(\{1,z,\frac{1}{2}z^2,\frac{1}{6}z^3,\frac{1}{24}z^4,\frac{1}{120}z^5\}\), we see under this basis, \[\begin{aligned}\ms{D}(1) &= 0 &\quad \ms{D}(z) &=1 \\\ms{D}(\frac{1}{2}z^2)&= z &\quad \ms{D}(\frac{1}{6}z^3) &= \frac{1}{2}z^2 \\\ms{D}(\frac{1}{24}z^4)&=\frac{1}{6}z^3 &\quad \ms{D}(\frac{1}{120}z^5)&=\frac{1}{24}z^4\end{aligned}\] which coincides with \(J\).
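The factorisation \(D = SJS^{-1}\) is easy to check numerically (a numpy sketch; the matrices are exactly those displayed above):

```python
import numpy as np

# Differentiation matrix on the basis {1, z, ..., z^5}
D = np.diag([1.0, 2.0, 3.0, 4.0, 5.0], k=1)
# Single nilpotent Jordan block, and the rescaling S = diag(1/0!, 1/1!, ..., 1/5!)
J = np.diag(np.ones(5), k=1)
S = np.diag([1.0, 1.0, 1/2, 1/6, 1/24, 1/120])

assert np.allclose(S @ J @ np.linalg.inv(S), D)        # D = S J S^{-1}
assert np.allclose(np.linalg.matrix_power(D, 6), 0)    # D^6 = 0 (nilpotent)
```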

We already know \(\ms{D}^6=0\), but we can also get this from \(D^6=SJ^6S^{-1}=0\), since \(J^6=0\). Further, the entries of \(S\) should make you realise that there is a hidden \(e\): \[e = {\color{red}{1+1+\frac{1}{2}+\frac{1}{6}+\frac{1}{24}+\frac{1}{120}}}+\frac{1}{720}+\cdots,\] and the basis is in fact the first \(6\) terms of the expansion of \(\exp{z}\).

If this cannot fascinate you, I don't know what can!

Next we consider an example on infinite dimensional vector spaces. Consider \(E=C_c^\infty(\mathbb{R})\), the infinite dimensional vector space of \(C^\infty\) functions on \(\mathbb{R}\) with compact support; namely, for \(f \in C_c^\infty(\mathbb{R})\), we have \(f \in C^\infty\) and there exists some \(0<K<\infty\) such that \(f(x)=0\) outside \([-K,K]\). Next consider the bilinear form \(E \times E \to \mathbb{R}\) defined by the inner product \[\langle f,g \rangle =\int_{-\infty}^{\infty}f(x)g(x)dx.\] The differential operator \(\ms{D}:E \to E\) is a linear map of \(E\) into \(E\), so let's find its transpose \(\ms{D}^T\). That is, we need to find the unique linear map \(\ms{D}^T:E \to E\) such that \[\langle \ms{D}f,g\rangle = \langle f,\ms{D}^Tg \rangle.\] This is a simple application of integration by parts: \[\begin{aligned}\langle \ms{D}f,g \rangle &= \int_{-\infty}^{\infty}g(x)df(x) \\ &= f(x)g(x)|_{-\infty}^{\infty} - \int_{-\infty}^{\infty}f(x)dg(x) \\ &=\int_{-\infty}^{\infty}f(x)(-\ms{D})g(x)dx \\ &=\langle f,(-\ms{D})g\rangle\end{aligned}\] Hence the **transpose** of differentiation \(\ms{D}\) is \(-\ms{D}\), so we can say \(\ms{D}\) is skew-symmetric, while its matrix on the polynomial space above is not.
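The identity \(\langle \ms{D}f,g\rangle = \langle f,-\ms{D}g\rangle\) can be tested numerically on genuine bump functions (a numpy sketch; the grid size and the particular bumps are arbitrary choices of mine):

```python
import numpy as np

# A C-infinity bump function supported on [-1, 1]
def phi(x):
    out = np.zeros_like(x)
    inside = np.abs(x) < 1
    out[inside] = np.exp(-1.0 / (1.0 - x[inside] ** 2))
    return out

x = np.linspace(-2.0, 2.0, 400001)
dx = x[1] - x[0]
f = phi(x)
g = x * phi(x - 0.3)          # another smooth, compactly supported function

df = np.gradient(f, x)        # numerical derivatives
dg = np.gradient(g, x)

lhs = np.sum(df * g) * dx     # <Df, g>
rhs = -np.sum(f * dg) * dx    # <f, -Dg>
assert abs(lhs - rhs) < 1e-5  # the boundary term vanishes: compact support
```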

(Perron's theorem) Let \(A\) be an \(n \times n\) matrix with all entries \(a_{ij}>0\). Then it has a positive eigenvalue \(\lambda_0\) with a corresponding positive eigenvector, i.e. \(x=(x_1,x_2,\cdots,x_n)^T\) such that \(x_i>0\) for all \(i = 1,2,\cdots,n\), unique up to scaling.

In fact, the positive eigenvalue is the spectral radius of \(A\), which is often written as \(\rho(A)\). I recommend reading the following documents:

- A short proof of Perron's theorem. This one mentions more algebraic properties of \(\rho(A)\).
- The Perron-Frobenius Theorem. This paper mentions a real-life application (modelling the growth of a population) and has some exercises to work on.
- Proof of the Frobenius-Perron Theorem. This paper is more elementary-focused.

But here we are using Brouwer's fixed point theorem (you may find an elementary proof on Project Euclid). In the following proof, we write \(D_n\) for the \(n\)-disk and \(\Delta^n\) for the \(n\)-simplex. That is, \[D_n = \{x \in \mathbb{R}^{n}:\lVert x \rVert \leq 1\}, \quad \Delta^n = \left\{(x_1,x_2,\cdots,x_n,x_{n+1}) \in \mathbb{R}^{n+1}:\sum_{i=1}^{n+1}x_i=1,\ x_i \geq 0\right\}.\] Note \(D_n\) is homeomorphic to \(\Delta^n\). Further we have a lemma:

(Lemma) If \(f:X \to X\) is a continuous function and \(X\) is homeomorphic to \(D_n\), then \(f\) has a fixed point as well.

**Proof of the lemma.** Let \(\varphi\) be a homeomorphism from \(X\) to \(D_n\). Then \(\varphi \circ f \circ \varphi^{-1}:D_n \to D_n\) has a fixed point by Brouwer's fixed point theorem, say \[\varphi \circ f \circ \varphi^{-1}(y)=y.\] Then \[f \circ \varphi^{-1}(y)=\varphi^{-1}(y),\] and hence \(\varphi^{-1}(y) \in X\) is our fixed point. \(\square\)

Now we are ready to prove Perron's theorem using Brouwer's fixed point theorem.

**Proof of Perron's theorem.** Define \(\sigma(x)=\sum_{i=1}^{n}x_i\) where \(x = (x_1,x_2,\cdots,x_n)^T\); since it is linear, it is continuous (this is not generally true for infinite dimensional spaces, but we are safe here; you can see this question on mathstackexchange for a proof). Similarly \(A\) is continuous as well. Also, by definition, \(x \in \Delta^{n-1}\) if and only if \(\sigma(x)=1\) and \(x_i \geq 0\). Define a function \(g:\Delta^{n-1} \to \Delta^{n-1}\) by \[g(x)=\frac{Ax}{\sigma(Ax)}.\] We will show that this function is well-defined. Since \(x \in \Delta^{n-1}\), not all components of \(x\) are equal to \(0\); if they were, we would get \(x_1+x_2+\cdots+x_n=0\), contradicting \(\sigma(x)=1\). Note we can write down \(Ax\) explicitly (this is elementary linear algebra): \[Ax = \left(\sum_{j=1}^{n}a_{1j}x_j,\sum_{j=1}^{n}a_{2j}x_j,\cdots,\sum_{j=1}^{n}a_{nj}x_j\right)^T.\] Since all entries of \(A\) are greater than \(0\), all components of \(Ax\) are greater than \(0\) as well; hence \(\sigma(Ax)>0\). On the other hand, \(g(x) \in \Delta^{n-1}\) since \(\sigma(g(x))=\frac{\sigma(Ax)}{\sigma(Ax)}=1\). Being a composition of continuous functions (\(x \mapsto Ax\), \(\sigma\), and \(t \mapsto \frac{1}{t}\) on \(t>0\)), \(g\) is continuous.

Now, since \(\Delta^{n-1}\) is homeomorphic to \(D_{n-1}\), \(g\) has a fixed point according to the lemma. Hence there exists some \(y \in \Delta^{n-1}\) such that \[g(y)=\frac{Ay}{\sigma(Ay)}=y \implies Ay = \sigma(Ay)y.\] As we have already proved, \(\lambda_0=\sigma(Ay)\) is positive. Moreover, all components of \(y\) are positive since all components of \(Ay\) are positive. The proof is completed. \(\square\)
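The map \(g\) from the proof can also be iterated numerically. For a strictly positive matrix this is just normalized power iteration, and it happens to converge to the fixed point whose existence Brouwer's theorem guarantees. A small sketch, where the matrix \(A\) is an arbitrary positive example of mine:

```python
# Iterating g(x) = Ax / sigma(Ax) on the simplex: for a strictly positive
# matrix this is normalized power iteration, which converges here to the
# fixed point y promised by Brouwer's theorem (Brouwer only gives existence;
# convergence is a bonus of positivity). A is an arbitrary positive example.
A = [[2.0, 1.0, 1.0],
     [1.0, 3.0, 1.0],
     [1.0, 1.0, 4.0]]
n = len(A)

def sigma(x):                        # sum of the components
    return sum(x)

def g(x):                            # g(x) = Ax / sigma(Ax)
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    s = sigma(Ax)
    return [v / s for v in Ax]

x = [1.0 / n] * n                    # start at the barycenter of the simplex
for _ in range(200):
    x = g(x)

Ay = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
lam = sigma(Ay)                      # the positive eigenvalue sigma(Ay)
assert all(abs(Ay[i] - lam * x[i]) < 1e-9 for i in range(n))
assert all(xi > 0 for xi in x)       # the eigenvector is strictly positive
```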

You are assumed to be familiar with multivariable calculus when reading this subsection, since we are discussing it right now. In general this section goes well beyond elementary linear algebra. First of all we are presenting the *ultimate* abstract extension of the usual gradient, curl, and divergence. We simply consider the \(C^\infty\) functions \(\mathbb{R}^3 \to \mathbb{R}^3\). When working on the gradient, we consider something like \(\def\pf[#1]{\frac{\partial f}{\partial #1}}\) \[df = \frac{\partial f}{\partial x}dx + \pf[y]dy+\pf[z]dz.\] When working on the curl, we consider \[\left(\frac{\partial f_3}{\partial y}-\frac{\partial f_2}{\partial z}\right)dydz - \left(\frac{\partial f_1}{\partial z}-\frac{\partial f_3}{\partial x}\right)dxdz + \left(\frac{\partial f_2}{\partial x}-\frac{\partial f_1}{\partial y}\right)dxdy.\] Finally, for the divergence we consider \[\left(\frac{\partial f_1}{\partial x}+\frac{\partial f_2}{\partial y}+\frac{\partial f_3}{\partial z}\right)dxdydz.\] They are connected by Green's theorem, Gauss's theorem, and Stokes' theorem. But are they connected only by accident, through numerical equality? Fortunately, no. Let's see why.

First of all for convenience we write \((x_1,x_2,x_3)\) instead of \((x,y,z)\). Define \(dx_idx_j=-dx_jdx_i\) for all \(i,j = 1,2,3\). Note this implies that \(dx_idx_i=0\). For \(d\) we have the definition as follows:

- If \(f\) is a \(C^\infty\) function, then \(df = \sum_{i=1}^{3}\pf[x_i]dx_i\).
- If \(\omega\) is of the form \(\sum f_{i_1 \cdots i_q}dx_{i_1}\dots dx_{i_q}\), then \(d\omega=\sum df_{i_1 \cdots i_q}dx_{i_1}\dots dx_{i_q}\).

Then gradient, curl and divergence follow in the nature of things. You can verify that the curl expression above is actually equal to \(d(f_1dx+f_2dy+f_3dz)\) and the divergence expression is equal to \(d(f_1dydz-f_2dxdz+f_3dxdy)\). We call \(d\) the exterior differentiation.

Linear algebra is not just for the space \(\mathbb{R}^3\), and neither is exterior differentiation. Let \(\Omega^\ast\) be the algebra over \(\mathbb{R}\) (for algebras over a field, see this), generated by \(dx_1,\dots,dx_n\) with the **anti-commutative** multiplication \(dx_idx_j=-dx_jdx_i\) for all \(i,j=1,2,\cdots,n\). As a vector space over \(\mathbb{R}\), \(\Omega^\ast\) is of dimension \(2^n\), with basis \[1,\;dx_i,\;dx_idx_j,\;dx_idx_jdx_k,\;\dots,\;dx_1\dots dx_n\] where \(i<j<k\). Let \(C^\infty\) be the vector space of \(C^\infty\) functions on \(\mathbb{R}^n\), and define the \(C^\infty\) differential *forms* on \(\mathbb{R}^n\) by \[\Omega^\ast(\mathbb{R}^n) = C^\infty \bigotimes_\mathbb{R} \Omega^\ast.\] For simplicity we omit the tensor product symbol \(\otimes\). As a result, any \(\omega \in \Omega^\ast(\mathbb{R}^n)\) is either a simple \(C^\infty\) function (which we call a \(0\)-form) or a sum \(\omega = \sum f_{i_1\cdots i_q}dx_{i_1}\dots dx_{i_q}\), which we call a \(q\)-form since each term is a product of \(q\) of the \(dx_j\). We define \(\Omega^q(\mathbb{R}^n)\) to be the vector space of \(q\)-forms. Consider the differential \(d\) defined by \[d:\Omega^q(\mathbb{R}^n) \to \Omega^{q+1}(\mathbb{R}^n)\]

- If \(f\) is a \(C^\infty\) function, then \(df = \sum_{i=1}^{n}\pf[x_i]dx_i\).
- If \(\omega\) is of the form \(\sum f_{i_1 \cdots i_q}dx_{i_1}\dots dx_{i_q}\), then \(d\omega=\sum df_{i_1 \cdots i_q}dx_{i_1}\dots dx_{i_q}\).

This is what we call the exterior differentiation. It's the ultimate abstract extension of gradient, curl and divergence. Your calculus teacher may have warned you that you cannot manipulate \(dx\) on its own. So is it safe to work like this? Yes, there is nothing to worry about: we are doing the abstraction algebraically.
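The anti-commutative rule is easy to model on a computer, which may reassure you that the formalism is consistent. The toy implementation below is entirely my own sketch (none of the names come from the post): a basis monomial \(dx_{i_1}\cdots dx_{i_q}\) is stored as a tuple of indices, and multiplication sorts the indices while tracking the sign.

```python
from itertools import combinations

# A toy model of the rule dx_i dx_j = -dx_j dx_i (entirely my own sketch):
# a basis monomial dx_{i_1}...dx_{i_q} is stored as a tuple of indices, and
# multiplying two monomials sorts the indices, flipping the sign once per
# adjacent swap; a repeated index makes the product vanish.
def wedge(mono_a, mono_b):
    """Multiply basis monomials; return (sign, sorted indices), or (0, ()) if zero."""
    idx = list(mono_a + mono_b)
    if len(set(idx)) < len(idx):              # dx_i dx_i = 0
        return (0, ())
    sign = 1
    for i in range(len(idx)):                 # bubble sort, counting swaps
        for j in range(len(idx) - 1 - i):
            if idx[j] > idx[j + 1]:
                idx[j], idx[j + 1] = idx[j + 1], idx[j]
                sign = -sign
    return (sign, tuple(idx))

assert wedge((1,), (2,)) == (1, (1, 2))       # dx1 dx2
assert wedge((2,), (1,)) == (-1, (1, 2))      # dx2 dx1 = -dx1 dx2
assert wedge((1,), (1,)) == (0, ())           # dx1 dx1 = 0
assert wedge((3,), (1, 2)) == (1, (1, 2, 3))  # dx3 dx1dx2 = dx1dx2dx3

# The monomials with strictly increasing indices form a basis of Omega^*,
# so for n = 3 the dimension is 2^3 = 8.
n = 3
assert sum(1 for q in range(n + 1) for _ in combinations(range(1, n + 1), q)) == 2 ** n
```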

Many concepts can be understood in a linear-algebra way. For example we also have \[\Omega^\ast(\mathbb{R}^n) = \bigoplus_{q=0}^{n}\Omega^q(\mathbb{R}^n).\] In fact Green's theorem, Gauss' theorem and Stokes' theorem have an ultimate abstract extension as well, which is called the general Stokes' theorem:

If \(\omega\) is an \((n-1)\)-form with compact support on an oriented manifold \(M\) of dimension \(n\) and if \(\partial M\) is given the induced orientation, then \[\int_M d\omega = \int_{\partial M}\omega.\]

We are not diving into this theorem, but we will conclude this subsection with a glimpse of integration. Recall that the Riemann integral of an integrable function \(f:\mathbb{R}^n \to \mathbb{R}\) can be written as \[\int_{\mathbb{R}^n}f|dx_1\dots dx_n| = \lim_{\Delta x_i \to 0}\sum f\,\Delta x_1 \dots \Delta x_n.\] Here we put absolute-value bars around \(dx_1 \dots dx_n\) to emphasise the distinction between the Riemann integral of a function and the integral of a differential form, since order matters only in the latter case. For the latter, if \(\pi\) is a permutation of \(1,2,\cdots,n\), or simply \(\pi \in S_n\), then \[\int_{\mathbb{R}^n} f dx_{\pi(1)}\dots dx_{\pi(n)} = (\operatorname{sgn}\pi) \int_{\mathbb{R}^n} f |dx_{\pi(1)}\dots dx_{\pi(n)}|=(\operatorname{sgn}\pi)\int_{\mathbb{R}^n}f|dx_1\dots dx_n|.\] This definition is natural. Since \(\operatorname{sgn} \pi\) is equal to the determinant of the matrix representing \(\pi\) (see here), it's natural to consider the determinant. Consider the function \[\begin{aligned}\Pi: \mathbb{R}^n &\to \mathbb{R}^n \\(x_1,x_2,\cdots,x_n)&\mapsto (x_{\pi(1)},x_{\pi(2)},\cdots,x_{\pi(n)})\end{aligned}\] Then \(J(\Pi)=\operatorname{sgn}\pi\). This is quite similar to what we expect from the Jacobian determinant in general, which is essentially a description of change of variables. Let \(x_1,x_2,\cdots,x_n\) be the coordinates of \(\mathbb{R}^n\) and \(T:\mathbb{R}^n \to \mathbb{R}^n\) be a diffeomorphism. We get new coordinates \(y_1,y_2,\cdots,y_n\) given by \[y_i = \pi_i (T(x_1,x_2,\cdots,x_n))\] where \(\pi_i:(a_1,a_2,\cdots,a_n) \mapsto a_i\) is the \(i\)th projection. Namely \[(y_1,y_2,\cdots,y_n)^T=T(x_1,x_2,\cdots,x_n)=(T_1,T_2,\cdots,T_n)^T,\] written in column vectors.
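The claim \(J(\Pi)=\operatorname{sgn}\pi\) can be spot-checked: the Jacobian matrix of \(\Pi\) is the permutation matrix of \(\pi\), and its determinant, via the Leibniz formula, is exactly \(\operatorname{sgn}\pi\). A small sketch with helper names of my own:

```python
from itertools import permutations

# Spot-checking J(Pi) = sgn(pi): the Jacobian matrix of Pi is the permutation
# matrix of pi, and its determinant equals sgn(pi). Helper names are mine.
def sgn(pi):
    """Sign of a permutation given as a tuple of 0-based images."""
    s = 1
    for i in range(len(pi)):
        for j in range(i + 1, len(pi)):
            if pi[i] > pi[j]:
                s = -s
    return s

def det(M):
    """Leibniz formula: sum over sigma of sgn(sigma) * prod_i M[i][sigma(i)]."""
    total = 0
    for sigma in permutations(range(len(M))):
        term = sgn(sigma)
        for i, row in enumerate(M):
            term *= row[sigma[i]]
        total += term
    return total

for pi in permutations(range(3)):
    # the matrix of Pi: row i has a 1 in column pi(i), since Pi(x)_i = x_{pi(i)}
    P = [[1 if pi[i] == j else 0 for j in range(3)] for i in range(3)]
    assert det(P) == sgn(pi)
```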
We now show that \[dy_1\dots dy_n = J(T) dx_1\cdots dx_n.\] First recall that \(J(T)\) is the determinant of \((\partial T_i / \partial x_j)\), and that the determinant of a matrix \((a_{ij})\) is defined by \[\sum_{\sigma}\epsilon(\sigma)a_{1,\sigma(1)}a_{2,\sigma(2)}\cdots a_{n,\sigma(n)},\] where \(\epsilon(\sigma)\) is \(\operatorname{sgn}\sigma\) and \(\sigma\) ranges through all permutations of \(1,2,\cdots,n\). We want the two expressions to coincide. First of all, we compute \(dy_i\). Note \[\frac{\partial y_i}{\partial x_j} = \frac{\partial T_i}{\partial x_j}.\] Hence \[dy_i = \sum_{j=1}^{n}\frac{\partial T_i}{\partial x_j}dx_j.\] We get, as a result, \[dy_1dy_2\cdots dy_n = \prod_{i=1}^{n}\left(\sum_{j=1}^{n}\frac{\partial T_i}{\partial x_j}dx_j\right).\] After cancelling many zero terms, we get \(J(T)\); there is no need to expand the product blindly. Pick a component \(\frac{\partial T_1}{\partial x_{j_1}}dx_{j_1}\) from \(dy_1\). When we pick another component from \(dy_2\), say \(\frac{\partial T_2}{\partial x_{j_2}}dx_{j_2}\), to multiply with the first one, we must have \(j_1 \neq j_2\), since otherwise \(dx_{j_1}dx_{j_2}=0\) and the term cancels. The rule remains the same (and gets stricter) as we pick components from \(dy_3\), \(dy_4\), and so on up to \(dy_n\). In the end, \(j_1,j_2,\cdots,j_n\) are pairwise distinct, which corresponds exactly to a permutation of \(1,2,\cdots,n\).
Hence we get \[dy_1dy_2\cdots dy_n = \sum_\sigma \left(\prod_{i=1}^{n} \frac{\partial T_i}{\partial x_{\sigma (i)}}dx_{\sigma(i)}\right) = \sum_{\sigma}\frac{\partial T_1}{\partial x_{\sigma(1)}}\frac{\partial T_2}{\partial x_{\sigma(2)}}\cdots \frac{\partial T_n}{\partial x_{\sigma(n)}}dx_{\sigma(1)}dx_{\sigma(2)}\cdots dx_{\sigma(n)}.\] On the other hand, \(dx_{\sigma(1)}dx_{\sigma(2)}\cdots dx_{\sigma(n)}=\epsilon(\sigma)dx_1dx_2\cdots dx_n\), and if we put this inside the expansion of \(dy_1dy_2\cdots dy_n\), we get \[\begin{aligned}dy_1dy_2\cdots dy_n &= \sum_{\sigma}\epsilon(\sigma)\frac{\partial T_1}{\partial x_{\sigma(1)}}\frac{\partial T_2}{\partial x_{\sigma(2)}}\cdots \frac{\partial T_n}{\partial x_{\sigma(n)}}dx_1dx_2\cdots dx_n \\&=\left(\sum_{\sigma}\epsilon(\sigma)\frac{\partial T_1}{\partial x_{\sigma(1)}}\frac{\partial T_2}{\partial x_{\sigma(2)}}\cdots \frac{\partial T_n}{\partial x_{\sigma(n)}}\right)dx_1dx_2\cdots dx_n \\&=J(T)dx_1dx_2\cdots dx_n.\end{aligned}\] We have answered a calculus question in an algebraic way (and more than that, if you review more related concepts in calculus).
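The cancellation argument can be replayed numerically: fix numbers \(a_{ij}\) standing for \(\partial T_i/\partial x_j\) at a point, expand the product term by term, drop the terms with a repeated \(dx_j\), and reorder the rest with signs. The surviving coefficient of \(dx_1\cdots dx_n\) then agrees with a determinant computed independently. The matrix below is an arbitrary example of mine:

```python
from itertools import product
from math import prod

# Replaying the cancellation: fix numbers A[i][j] standing for dT_i/dx_j at a
# point (an arbitrary example matrix of mine), expand dy_1 dy_2 dy_3 term by
# term, drop terms with a repeated dx_j, reorder the rest with a sign, and
# compare the coefficient of dx_1 dx_2 dx_3 with an independently computed
# determinant (cofactor expansion).
A = [[2, 1, 0],
     [1, 3, 1],
     [0, 1, 4]]
n = len(A)

def reorder_sign(js):
    """Sign picked up when sorting dx_{j_1}...dx_{j_n} into dx_1...dx_n."""
    s, js = 1, list(js)
    for i in range(n):
        for k in range(n - 1 - i):
            if js[k] > js[k + 1]:
                js[k], js[k + 1] = js[k + 1], js[k]
                s = -s
    return s

coeff = 0
for js in product(range(n), repeat=n):   # choose one dx_{j_i} from each dy_i
    if len(set(js)) < n:                 # a repeated dx_j kills the term
        continue
    coeff += reorder_sign(js) * prod(A[i][js[i]] for i in range(n))

def det(M):                              # cofactor expansion along the first row
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

assert coeff == det(A)                   # dy_1 dy_2 dy_3 = J(T) dx_1 dx_2 dx_3
```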

There are several ways to define a Dedekind domain, since there are several equivalent characterisations of it. We will start from the one based on fractional ideals. As a friendly reminder, \(\mb{Z}\), or any principal ideal domain, is already a Dedekind domain. In fact a Dedekind domain may be viewed as a generalization of a principal ideal domain.

Let \(\mfk{o}\) be an integral domain (a.k.a. entire ring) and \(K\) its quotient field. A **Dedekind domain** is an integral domain \(\mfk{o}\) whose fractional ideals form a group under multiplication. Let's have a breakdown. By a **fractional ideal** \(\mfk{a}\) we mean a nontrivial additive subgroup of \(K\) such that

- \(\mfk{o}\mfk{a}=\mfk{a}\),
- there exists some nonzero element \(c \in \mfk{o}\) such that \(c\mfk{a} \subset \mfk{o}\).

What does the group look like? As you may guess, the unit element is \(\mfk{o}\). For a fractional ideal \(\mfk{a}\), the inverse is another fractional ideal \(\mfk{b}\) such that \(\mfk{ab}=\mfk{ba}=\mfk{o}\). Note we regard \(\mfk{o}\) as a subring of \(K\): for \(a \in \mfk{o}\), we treat it as \(a/1 \in K\), which makes sense because the map \(i:a \mapsto a/1\) is injective. As for the existence of \(c\), you may consider it a restriction that the 'denominator' is *bounded*. Alternatively, we can say that a fractional ideal of \(K\) is a nonzero finitely generated \(\mfk{o}\)-submodule of \(K\). But in this post it is not assumed that you have learned module theory.

Let's take \(\mb{Z}\) as an example. The quotient field of \(\mb{Z}\) is \(\mb{Q}\). Fix a prime \(p\), and let \(P\) be the fractional ideal whose elements are of the form \(\frac{np}{2}\) with \(n \in \mb{Z}\). Then indeed we have \(\mb{Z}P=P\). On the other hand, taking \(2 \in \mb{Z}\), we have \(2P \subset \mb{Z}\). For its inverse we can take the fractional ideal \(Q\) whose elements are of the form \(\frac{2n}{p}\). As proved in algebraic number theory, the ring of algebraic integers in a number field is a Dedekind domain.
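In \(\mb{Z}\) every fractional ideal is of the form \(c\mb{Z}\) for a positive rational \(c\), so multiplying fractional ideals just multiplies generators. A quick sketch of the example above, with the prime fixed to \(p=5\) (my own illustration, using exact rational arithmetic):

```python
from fractions import Fraction

# In Z every fractional ideal is c*Z with c a positive rational, so multiplying
# fractional ideals multiplies generators. This mirrors the example above with
# the prime fixed to p = 5 (my own illustration).
p = 5
P = Fraction(p, 2)        # generator of P = {np/2 : n in Z}
Q = Fraction(2, p)        # generator of Q = {2n/p : n in Z}

assert P * Q == 1                     # PQ = Z, so Q is the inverse of P
assert (2 * P).denominator == 1       # c = 2 clears denominators: 2P inside Z
```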

Before we go on we need to clarify the definition of ideal multiplication. Let \(\mfk{a}\) and \(\mfk{b}\) be two ideals; we define \(\mfk{ab}\) to be the set of all sums \[x_1y_1+\cdots+x_ny_n\] where \(x_i \in \mfk{a}\) and \(y_i \in \mfk{b}\). Here the number \(n\) is finite but not fixed. Alternatively, we can say that \(\mfk{ab}\) consists of all finite sums of products of elements of \(\mfk{a}\) and \(\mfk{b}\).

(Proposition 1)A Dedekind domain \(\mfk{o}\) is Noetherian.

By a Noetherian ring we mean a ring in which every ideal is finitely generated. Precisely, we will prove that for every ideal \(\mfk{a} \subset \mfk{o}\) there are \(a_1,a_2,\cdots,a_n \in \mfk{a}\) such that, for every \(r \in \mfk{a}\), we have an expression \[r = c_1a_1 + c_2a_2 + \cdots + c_na_n \qquad c_1,c_2,\cdots,c_n \in \mfk{o}.\] Also note that any ideal \(\mfk{a} \subset \mfk{o}\) can be viewed as a fractional ideal.

**Proof.** Let \(K\) be the quotient field of \(\mfk{o}\). Since \(\mfk{oa}=\mfk{a}\), we may view the ideal \(\mfk{a}\) as a fractional ideal. Since \(\mfk{o}\) is a Dedekind domain and its fractional ideals form a group, there is a fractional ideal \(\mfk{b}\) such that \(\mfk{ab}=\mfk{ba}=\mfk{o}\). Since \(1 \in \mfk{o}\), there exist some \(a_1,a_2,\cdots, a_n \in \mfk{a}\) and \(b_1,b_2,\cdots,b_n \in \mfk{b}\) such that \(\sum_{i = 1 }^{n}a_ib_i=1\). For any \(r \in \mfk{a}\), we then have the expression \[r = rb_1a_1+rb_2a_2+\cdots+rb_na_n,\] where each \(rb_i \in \mfk{ab}=\mfk{o}\). On the other hand, any element of the form \(c_1a_1+c_2a_2+\cdots+c_na_n\) with \(c_i \in \mfk{o}\) is, by definition, an element of \(\mfk{a}\). \(\blacksquare\)

From now on, the inverse of a fractional ideal \(\mfk{a}\) will be written as \(\mfk{a}^{-1}\).

(Proposition 2)For ideals \(\mfk{a},\mfk{b} \subset \mfk{o}\), we have \(\mfk{b}\subset\mfk{a}\) if and only if there exists some ideal \(\mfk{c}\) such that \(\mfk{ac}=\mfk{b}\) (we then simply say \(\mfk{a}|\mfk{b}\)).

**Proof.** If \(\mfk{b}=\mfk{ac}\), simply note that \(\mfk{ac} \subset \mfk{a} \cap \mfk{c} \subset \mfk{a}\). For the converse, suppose that \(\mfk{a} \supset \mfk{b}\); then \(\mfk{c}=\mfk{a}^{-1}\mfk{b}\) is an ideal of \(\mfk{o}\) since \(\mfk{c}=\mfk{a}^{-1}\mfk{b} \subset \mfk{a}^{-1}\mfk{a}=\mfk{o}\), and hence we may write \(\mfk{b}=\mfk{a}\mfk{c}\). \(\blacksquare\)

(Proposition 3)If \(\mfk{a}\) is a nonzero proper ideal of \(\mfk{o}\), then there are prime ideals \(\mfk{p}_1,\mfk{p}_2,\cdots,\mfk{p}_n\), unique up to order, such that \[\mfk{a}=\mfk{p}_1\mfk{p}_2\cdots\mfk{p}_n.\]

**Proof.** For existence we use a classical technique: contradiction via maximality. Suppose the statement is not true, and let \(\mfk{A}\) be the set of ideals of \(\mfk{o}\) that cannot be written as a product of prime ideals. By assumption \(\mfk{A}\) is nonempty. Since, as we have proved, \(\mfk{o}\) is Noetherian, we can pick a maximal element \(\mfk{a}\) of \(\mfk{A}\) with respect to inclusion. If \(\mfk{a}\) is a maximal ideal, then since all maximal ideals are prime, \(\mfk{a}\) itself is prime, a contradiction. If \(\mfk{a}\) is properly contained in a maximal ideal \(\mfk{m}\), then we write \(\mfk{a}=\mfk{m}\mfk{m}^{-1}\mfk{a}\). We have \(\mfk{m}^{-1}\mfk{a} \supsetneq \mfk{a}\), since otherwise \(\mfk{a}=\mfk{ma}\), which implies \(\mfk{m}=\mfk{o}\). By the maximality of \(\mfk{a}\) in \(\mfk{A}\), we have \(\mfk{m}^{-1}\mfk{a}\not\in\mfk{A}\), hence \(\mfk{m}^{-1}\mfk{a}\) can be written as a product of prime ideals. Since \(\mfk{m}\) is prime as well, we obtain a prime factorization of \(\mfk{a}\), contradicting the definition of \(\mfk{A}\).

Next we show uniqueness up to permutation. If \[\mfk{p}_1\mfk{p}_2\cdots\mfk{p}_k=\mfk{q}_1\mfk{q}_2\cdots\mfk{q}_j,\] since \(\mfk{p}_1\mfk{p}_2\cdots\mfk{p}_k\subset\mfk{p}_1\) and \(\mfk{p}_1\) is prime, we may assume that \(\mfk{q}_1 \subset \mfk{p}_1\). By the property of fractional ideal we have \(\mfk{q}_1=\mfk{p}_1\mfk{r}_1\) for some fractional ideal \(\mfk{r}_1\). However we also have \(\mfk{q}_1 \subset \mfk{r}_1\). Since \(\mfk{q}_1\) is prime, we either have \(\mfk{q}_1 \supset \mfk{p}_1\) or \(\mfk{q}_1 \supset \mfk{r}_1\). In the former case we get \(\mfk{p}_1=\mfk{q}_1\), and we finish the proof by continuing inductively. In the latter case we have \(\mfk{r}_1=\mfk{q}_1=\mfk{p}_1\mfk{q}_1\), which shows that \(\mfk{p}_1=\mfk{o}\), which is impossible. \(\blacksquare\)
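In \(\mb{Z}\), Proposition 3 specialises to ordinary prime factorisation: the ideal \(n\mb{Z}\) is the product of the prime ideals \(p\mb{Z}\), with multiplicity. A quick trial-division sketch (the helper is my own):

```python
# In Z, Proposition 3 is ordinary prime factorisation: nZ is the product of
# the prime ideals pZ with multiplicity. Trial-division sketch (helper mine).
def prime_ideal_factors(n):
    """Primes p, with repetition, such that (n) = product of the ideals (p)."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

# (360) = (2)(2)(2)(3)(3)(5), unique up to order
assert prime_ideal_factors(360) == [2, 2, 2, 3, 3, 5]
```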

(Proposition 4)Every nontrivial prime ideal \(\mfk{p}\) is maximal.

**Proof.** Let \(\mfk{m}\) be a maximal ideal containing \(\mfk{p}\). By Proposition 2 we have some ideal \(\mfk{c}\) such that \(\mfk{p}=\mfk{mc}\). If \(\mfk{m} \neq \mfk{p}\), then \(\mfk{c} \neq \mfk{o}\), and we may write \(\mfk{c}=\mfk{p}_1\cdots\mfk{p}_n\); hence \(\mfk{p}=\mfk{m}\mfk{p}_1\cdots\mfk{p}_n\) is a prime factorisation, contradicting the fact that \(\mfk{p}\) has a unique prime factorisation, namely \(\mfk{p}\) itself. Hence any maximal ideal containing \(\mfk{p}\) is \(\mfk{p}\) itself. \(\blacksquare\)

(Proposition 5)Suppose the Dedekind domain \(\mfk{o}\) contains only one nonzero prime (hence maximal) ideal \(\mfk{p}\). Let \(t \in \mfk{p}\) with \(t \not\in \mfk{p}^2\); then \(\mfk{p}\) is generated by \(t\).

**Proof.** Let \(\mfk{t}\) be the ideal generated by \(t\). By Proposition 3 we have a factorisation \[\mfk{t}=\mfk{p}^n\] for some \(n\), since \(\mfk{o}\) contains only one prime ideal. If \(n \geq 3\), write \(\mfk{p}^n=\mfk{p}^2\mfk{p}^{n-2}\); by Proposition 2 we see \(\mfk{p}^2 \supset \mfk{p}^n\). But this is impossible, since then \(t \in \mfk{p}^n \subset \mfk{p}^2\), contradicting our assumption. Hence \(0<n<3\). If \(n=2\) we have \(t \in \mfk{p}^2\), which is also impossible. So \(n=1\) and \(\mfk{t}=\mfk{p}\), provided that such \(t\) exists.

For the existence of \(t\), note if not, then for all \(t \in \mfk{p}\) we have \(t \in \mfk{p}^2\), hence \(\mfk{p} \subset \mfk{p}^2\). On the other hand we already have \(\mfk{p}^2 = \mfk{p}\mfk{p}\), which implies that \(\mfk{p}^2 \subset \mfk{p}\) (proposition 2), hence \(\mfk{p}^2=\mfk{p}\), contradicting proposition 3. Hence such \(t\) exists and our proof is finished. \(\blacksquare\)

In fact there is another equivalent definition of a Dedekind domain:

A domain \(\mfk{o}\) is Dedekind if and only if

- \(\mfk{o}\) is Noetherian.
- \(\mfk{o}\) is integrally closed.
- \(\mfk{o}\) has Krull dimension \(1\) (i.e. every non-zero prime ideal is maximal).

This is equivalent to saying that the fractional ideals form a group, and it is frequently used by mathematicians as well. But we need some more advanced techniques to establish the equivalence. Presumably there will be a post about this in the future.

There are several ways to prove Hardy's inequality, and I think there are good reasons to write them down thoroughly, since that may be why you found this page. Perhaps you are burnt out because it was *left as an exercise*. You are assumed to have enough knowledge of Lebesgue measure and integration.

Let \(S_1,S_2 \subset \mathbb{R}\) be two measurable sets, and suppose \(F:S_1 \times S_2 \to \mathbb{R}\) is measurable; then \[\left[\int_{S_2} \left\vert\int_{S_1}F(x,y)dx \right\vert^pdy\right]^{\frac{1}{p}} \leq \int_{S_1} \left[\int_{S_2} |F(x,y)|^p dy\right]^{\frac{1}{p}}dx.\] A proof can be found here, in Example A9. You may need to replace all measures with the Lebesgue measure \(m\).

Now let's get into it. Here the measurable function to use is \(G(x,t)=\frac{f(t)}{x}\) in place of \(F(x,y)\) above. Putting it into the inequality, we see \[\begin{aligned} \lrVert[F]_p &= \left[\int_0^\infty \left\vert \int_0^x \frac{f(t)}{x}dt \right\vert^p dx\right]^{\frac{1}{p}} \\ &= \left[\int_0^\infty \left\vert \int_0^1 f(ux)du \right\vert^p dx\right]^{\frac{1}{p}} \\ &\leq \int_0^1 \left[\int_0^\infty |f(ux)|^pdx\right]^{\frac{1}{p}}du \\ &= \int_0^1 \left[\int_0^\infty |f(ux)|^pudx\right]^{\frac{1}{p}}u^{-\frac{1}{p}}du \\ &= \lrVert[f]_p \int_0^1 u^{-\frac{1}{p}}du \\ &=q\lrVert[f]_p.\end{aligned}\] Note we have used change of variables twice and the inequality once.
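As a sanity check, the bound \(\lrVert[F]_p \leq q\lrVert[f]_p\) can be tested numerically for a concrete \(f\). Below is a rough Riemann-sum check, entirely my own sketch (the sample \(f(t)=e^{-t}\), the truncation point and the step size are arbitrary choices), with \(p=q=2\); for this \(f\) one can compute \(\int_0^\infty F^2 dx = 2\ln 2\), comfortably below \(q^p\int_0^\infty f^2 dt = 2\).

```python
from math import exp, log

# Riemann-sum sanity check of ||F||_p <= q ||f||_p for the sample f(t) = e^{-t}
# and p = 2 (so q = 2). The truncation point T and step h are arbitrary; this
# is my own sketch, not part of the proof.
p, q = 2.0, 2.0

def f(t):
    return exp(-t)

def F(x):                      # F(x) = (1/x) * int_0^x f(t) dt = (1 - e^{-x}) / x
    return (1.0 - exp(-x)) / x

h, T = 1e-3, 200.0             # midpoint Riemann sums on (0, T]
N = int(T / h)
lhs = h * sum(F(h * (k + 0.5)) ** p for k in range(N))           # int F^p
rhs = q ** p * h * sum(f(h * (k + 0.5)) ** p for k in range(N))  # q^p int f^p

assert lhs <= rhs
# for this f, int_0^infty F^2 dx = 2 ln 2 (about 1.386), while rhs is about 2
assert abs(lhs - 2.0 * log(2.0)) < 0.05
assert abs(rhs - 2.0) < 0.01
```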

I have no idea how people came up with this solution. Write \(xF(x)=\int_0^x f(t)t^{u}t^{-u}dt\), where \(0<u<1-\frac{1}{p}\) is a parameter to be chosen later. Hölder's inequality gives us \[\begin{aligned}xF(x) &= \int_0^x f(t)t^ut^{-u}dt \\ &\leq \left[\int_0^x t^{-uq}dt\right]^{\frac{1}{q}}\left[\int_0^xf(t)^pt^{up}dt\right]^{\frac{1}{p}} \\ &=\left(\frac{1}{1-uq}x^{1-uq}\right)^{\frac{1}{q}}\left[\int_0^xf(t)^pt^{up}dt\right]^{\frac{1}{p}}.\end{aligned}\] Hence \[\begin{aligned}F(x)^p & \leq \frac{1}{x^p}\left\{\left(\frac{1}{1-uq}x^{1-uq}\right)^{\frac{1}{q}}\left[\int_0^xf(t)^pt^{up}dt\right]^{\frac{1}{p}}\right\}^{p} \\&= \left(\frac{1}{1-uq}\right)^{\frac{p}{q}}x^{\frac{p}{q}(1-uq)-p}\int_0^x f(t)^pt^{up}dt \\&= \left(\frac{1}{1-uq}\right)^{p-1}x^{-up-1}\int_0^x f(t)^pt^{up}dt.\end{aligned}\]

Note we have used the fact that \(\frac{1}{p}+\frac{1}{q}=1 \implies p+q=pq\) and \(\frac{p}{q}=p-1\). Fubini's theorem gives us the final answer: \[\begin{aligned}\int_0^\infty F(x)^pdx &\leq \int_0^\infty\left[\left(\frac{1}{1-uq}\right)^{p-1}x^{-up-1}\int_0^x f(t)^pt^{up}dt\right]dx \\&=\left(\frac{1}{1-uq}\right)^{p-1}\int_0^\infty dx\int_0^x f(t)^pt^{up}x^{-up-1}dt \\&=\left(\frac{1}{1-uq}\right)^{p-1}\int_0^\infty dt\int_t^\infty f(t)^pt^{up}x^{-up-1}dx \\&=\left(\frac{1}{1-uq}\right)^{p-1}\frac{1}{up}\int_0^\infty f(t)^pdt.\end{aligned}\] It remains to find the minimum of \(\varphi(u) = \left(\frac{1}{1-uq}\right)^{p-1}\frac{1}{up}\). This is an elementary calculus problem. By taking its derivative, we see that at \(u=\frac{1}{pq}<1-\frac{1}{p}\) it attains its minimum \(\left(\frac{p}{p-1}\right)^p=q^p\). Hence we get \[\int_0^\infty F(x)^pdx \leq q^p\int_0^\infty f(t)^pdt,\] which is exactly what we want. Note the constant \(q\) cannot be replaced with a smaller one. We have only proved the case \(f \geq 0\); for the general case, one simply takes absolute values.
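The calculus step at the end can be double-checked on a grid: for a sample value of \(p\), the function \(\varphi(u)\) on \((0,1-\frac{1}{p})\) is minimised at \(u=\frac{1}{pq}\) with minimum \(q^p\). A quick check of my own with \(p=3\):

```python
# Grid check (my own) of the last calculus step: with p = 3 and q = p/(p-1),
# phi(u) = (1/(1 - uq))^(p-1) / (up) on (0, 1 - 1/p) attains its minimum q^p
# at u = 1/(pq).
p = 3.0
q = p / (p - 1.0)                      # the conjugate exponent, 1/p + 1/q = 1

def phi(u):
    return (1.0 / (1.0 - u * q)) ** (p - 1.0) / (u * p)

u_star = 1.0 / (p * q)                 # the claimed minimiser
assert abs(phi(u_star) - q ** p) < 1e-9

grid = [k / 10000.0 for k in range(1, int(10000 * (1.0 - 1.0 / p)))]
assert all(phi(u) >= phi(u_star) - 1e-9 for u in grid)
```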

This approach makes use of properties of \(L^p\) spaces. Still we assume that \(f \geq 0\), but we also assume \(f \in C_c((0,\infty))\), that is, \(f\) is continuous and has compact support. Hence \(F\) is differentiable in this situation. Integration by parts gives \[\int_0^\infty F^p(x)dx=xF(x)^p\vert_0^\infty- \int_0^\infty xd(F^p) = -p\int_0^\infty xF^{p-1}(x)F'(x)dx.\] Note since \(f\) has compact support, there is some \([a,b]\) such that \(f >0\) only if \(0 < a \leq x \leq b < \infty\), and hence \(xF(x)^p\vert_0^\infty=0\). Next it is natural to take a look at \(F'(x)\). We have \[F'(x) = \frac{f(x)}{x}-\frac{\int_0^x f(t)dt}{x^2},\] hence \(xF'(x)=f(x)-F(x)\). A substitution gives us \[\int_0^\infty F^p(x)dx = -p\int_0^\infty F^{p-1}(x)[f(x)-F(x)]dx,\] which is equivalent to saying \[\int_0^\infty F^p(x)dx = \frac{p}{p-1}\int_0^\infty F^{p-1}(x)f(x)dx.\] Hölder's inequality gives us \[\begin{aligned}\int_0^\infty F^{p-1}(x)f(x)dx &\leq \left[\int_0^\infty F^{(p-1)q}(x)dx\right]^{\frac{1}{q}}\left[\int_0^\infty f(x)^pdx\right]^{\frac{1}{p}} \\&=\left[\int_0^\infty F^{p}(x)dx\right]^{\frac{1}{q}}\left[\int_0^\infty f(x)^pdx\right]^{\frac{1}{p}}.\end{aligned}\] Together with the identity above we get \[\int_0^\infty F^p(x)dx \leq q\left[\int_0^\infty F^{p}(x)dx\right]^{\frac{1}{q}}\left[\int_0^\infty f(x)^pdx\right]^{\frac{1}{p}},\] which is exactly what we want, since \(1-\frac{1}{q}=\frac{1}{p}\) and all we need to do is divide both sides by \(\left[\int_0^\infty F^pdx\right]^{1/q}\). So what's next? Note \(C_c((0,\infty))\) is dense in \(L^p((0,\infty))\). For any \(f \in L^p((0,\infty))\), we can take a sequence of functions \(f_n \in C_c((0,\infty))\) such that \(f_n \to f\) with respect to the \(L^p\)-norm. Taking \(F=\frac{1}{x}\int_0^x f(t)dt\) and \(F_n = \frac{1}{x}\int_0^x f_n(t)dt\), we will show that \(F_n \to F\) pointwise, so that we can use Fatou's lemma. Given \(\varepsilon>0\), there exists some \(N\) such that \(\lrVert[f_n-f]_p < \varepsilon\) for all \(n \geq N\).
Thus, for \(n \geq N\), \[\begin{aligned}|F_n(x)-F(x)| &= \frac{1}{x}\left\vert \int_0^x f_n(t)dt - \int_0^x f(t)dt \right\vert \\ &\leq \frac{1}{x} \int_0^x |f_n(t)-f(t)|dt \\ &\leq \frac{1}{x} \left[\int_0^x|f_n(t)-f(t)|^pdt\right]^{\frac{1}{p}}\left[\int_0^x 1^qdt\right]^{\frac{1}{q}} \\ &=\frac{1}{x^{1/p}}\left[\int_0^x|f_n(t)-f(t)|^pdt\right]^{\frac{1}{p}} \\ &\leq \frac{1}{x^{1/p}}\lrVert[f_n-f]_p <\frac{\varepsilon}{x^{1/p}}.\end{aligned}\] Hence \(F_n \to F\) pointwise, which also implies that \(|F_n|^p \to |F|^p\) pointwise. For \(|F_n|\) we have \[\begin{aligned}\int_0^\infty |F_n(x)|^pdx &= \int_0^\infty \left\vert\frac{1}{x}\int_0^x f_n(t)dt\right\vert^p dx \\&\leq \int_0^\infty \left[\frac{1}{x}\int_0^x |f_n(t)|dt\right]^{p}dx \\&\leq q^p\int_0^\infty |f_n(t)|^pdt,\end{aligned}\] where the last inequality follows since we have already proved the result for \(f \geq 0\). By Fatou's lemma, we have \[\begin{aligned}\int_0^\infty |F(x)|^pdx &= \int_0^\infty \lim_{n \to \infty}|F_n(x)|^pdx \\&\leq \liminf_{n \to \infty} \int_0^\infty |F_n(x)|^pdx \\&\leq \liminf_{n \to \infty}q^p\int_0^\infty |f_n(x)|^pdx \\&=q^p\int_0^\infty |f(x)|^pdx.\end{aligned}\]
