The Segre embedding allows us to define the product of projective varieties in a reasonable way, and we will discuss it right now. To begin with we consider the product of projective spaces \(\mathbb{P}^m \times \mathbb{P}^n\).

Definition 1. The **Segre embedding** is defined as follows:\[\begin{aligned}\iota:\mathbb{P}^m \times \mathbb{P}^n &\to \mathbb{P}^N \\([X_0:\cdots:X_m],[Y_0:\cdots:Y_n]) &\mapsto[X_0Y_0:X_0Y_1:\cdots:X_mY_n]\end{aligned}\]

Clearly, \(N=(m+1)(n+1)-1=mn+m+n\). The image on the right hand side has the coordinates \(X_iY_j\) ordered lexicographically.
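Concretely, the coordinates of \(\iota\) are just all pairwise products in lexicographic order; a minimal sketch in Python (the function name is ours):

```python
def segre(x, y):
    # homogeneous coordinates of iota([x], [y]): the products X_i * Y_j
    # in lexicographic order, a point of P^{(m+1)(n+1)-1}
    return [xi * yj for xi in x for yj in y]
```

For example, `segre([1, 2], [3, 4, 5])` produces the six homogeneous coordinates of a point of \(\mathbb{P}^5\), and rescaling either input rescales the whole output, i.e. the same projective point.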

First of all we make sure that this function is well-defined, otherwise our work will be useless.

Proposition 1. The Segre embedding is a well-defined injective map.

*Proof.* Assume \(X_i'=\lambda X_i\) and \(Y_j'=\mu Y_j\) for some \(\lambda,\mu \ne 0\), then

\[\begin{aligned}\iota([X_0':\dots:X_m'],[Y_0':\dots:Y_n'])&=[X_0'Y_0':\dots:X_m'Y_n']\\ &=[\lambda\mu X_0Y_0:\dots:\lambda\mu X_mY_n] \\ &=\lambda\mu[X_0Y_0:\dots:X_mY_n] \\ &=[X_0Y_0:\dots:X_mY_n] \\\end{aligned}\]

Next suppose that \(\iota([X],[Y])=\iota([X'],[Y'])\). Pick \(i,j\) such that \(X_iY_j \ne 0\); then \(X_i'Y_j' \ne 0\) as well, and the coordinates \([X_0Y_j:\cdots:X_mY_j]=[X_0'Y_j':\cdots:X_m'Y_j']\) recover \([X_0:\cdots:X_m]=[X_0':\cdots:X_m']\). The second factor is recovered in the same way, so \(\iota\) is injective. \(\square\)

Next we study the image further using linear algebra.

We can write elements in \(\mathbb{P}^N\) as \((m+1)\times(n+1)\) matrices

\[\begin{bmatrix}Z_{00} & \dots & Z_{0n} \\\vdots & \ddots & \vdots \\Z_{m0} & \cdots & Z_{mn}\end{bmatrix}.\]

Therefore the image of \(\iota\) is given by \(Z_{ij}=X_iY_j\). Through an elementary observation, we see the matrix

\[\begin{bmatrix}X_0Y_0 & \cdots & X_0Y_n \\\vdots & \ddots & \vdots \\X_mY_0 & \cdots & X_mY_n\end{bmatrix}\]

has rank \(1\). The question is, is the converse true? For this reason we study the set

\[Z=Z(\{Z_{ij}Z_{kl}-Z_{kj}Z_{il}:0 \le i,k \le m,\ 0 \le j,l \le n\}).\]

Note the \(Z_{ij}Z_{kl}-Z_{kj}Z_{il}\) are the determinants of the \(2\times2\) submatrices of the matrix \([Z_{ij}]\). This \(Z\) contains the image of \(\iota\). Conversely, writing \(U_k=\{X_k\ne0\}\subset\mathbb{P}^m\), \(U_l'=\{Y_l\ne0\}\subset\mathbb{P}^n\) and \(V_{kl}=\{Z_{kl}\ne 0\}\), we have a map
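As a quick sanity check, one can verify numerically that every \(2\times2\) minor vanishes on matrices of the form \(Z_{ij}=X_iY_j\), while a rank-two matrix has a nonzero minor (an illustrative sketch; function names are ours):

```python
def outer(x, y):
    # the rank-one matrix Z_ij = X_i * Y_j
    return [[xi * yj for yj in y] for xi in x]

def minors_2x2(Z):
    # determinants of all 2x2 submatrices of Z
    m, n = len(Z), len(Z[0])
    return [Z[i][j] * Z[k][l] - Z[k][j] * Z[i][l]
            for i in range(m) for k in range(i + 1, m)
            for j in range(n) for l in range(j + 1, n)]
```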

\[V_{kl} \cap Z\to U_k \times U_l', \quad [Z_{ij}] \mapsto([Z_{0l}:\dots:Z_{ml}],[Z_{k0}:\dots:Z_{kn}])\]

This is indeed the inverse map of \(\iota\) restricted to \(U_k \times U_l'\).

Therefore, the image of the Segre embedding is a projective variety. As a classic example, the image of \(\mathbb{P}^1\times\mathbb{P}^1\) in \(\mathbb{P}^3\) is the quadric surface \(Z(Z_{00}Z_{11}-Z_{01}Z_{10})\).

In this section we offer a way to understand the Segre embedding in the context of number fields. To begin with, we need some definitions.

Height is computed by absolute values on a field, so we first normalise all absolute values on \(\mathbb{Q}\).

The ordinary absolute value is written \(|\cdot|_\infty\), and we put

\[M_\mathbb{Q}=\{|\cdot|_p:p\text{ is prime or }p=\infty\}.\]

Likewise we define \(M_K\), where throughout \(K\) will always be a number field. \(M_K\) consists of the extensions of the absolute values in \(M_\mathbb{Q}\), normalised by

\[|x|_v=|N_{K_v/\mathbb{Q}_p}(x)|_p^{1/[K:\mathbb{Q}]},\quad \forall x\in K,\ v \mid p.\]

In particular, \(M_K\) satisfies the product formula:

\[\prod_{v \in M_K}|x|_v=1, \quad\text{equivalently}\quad \sum_{v \in M_K}\log|x|_v=0\]

for all \(x \in K^\times\). This identity is what allows heights to work well with projective spaces, as we will see later.
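Over \(\mathbb{Q}\) the product formula can be checked by hand: \(\log|x|_\infty\) cancels against \(-v_p(x)\log p\) summed over the primes \(p\) dividing the numerator and denominator. A sketch in Python (helper names are ours):

```python
from fractions import Fraction
from math import log, isclose

def prime_divisors(n):
    # the set of primes dividing a nonzero integer n
    n, p, out = abs(n), 2, set()
    while p * p <= n:
        if n % p == 0:
            out.add(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        out.add(n)
    return out

def vp(x, p):
    # p-adic valuation of a nonzero rational x
    v, n, d = 0, x.numerator, x.denominator
    while n % p == 0:
        n //= p
        v += 1
    while d % p == 0:
        d //= p
        v -= 1
    return v

def sum_log_places(x):
    # sum of log|x|_v over all places of Q; log|x|_p = -v_p(x) log p
    x = Fraction(x)
    total = log(abs(float(x)))  # the infinite place
    for p in prime_divisors(x.numerator) | prime_divisors(x.denominator):
        total -= vp(x, p) * log(p)
    return total
```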

Definition 2. The (absolute logarithmic) **height** of \(x\in\mathbb{P}^n_{\overline{\mathbb{Q}}}\), with homogeneous coordinates \((x_0,\dots,x_n)\), \(x_j \in K\), is defined by \[h(x)=\sum_{v \in M_K}\max_j \log |x_j|_v.\]

Actually, the height function measures the "algebraic complexity" of \(x\), and it is well-defined in several senses.
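For instance, for a point of \(\mathbb{P}^1_{\mathbb{Q}}\) with coprime integer coordinates \((a,b)\), every finite place contributes \(0\) and \(h([a:b])=\log\max(|a|,|b|)\); in general the height is invariant under rescaling the coordinates. A sketch computing \(h\) over \(\mathbb{Q}\) straight from the definition (all names are ours):

```python
from fractions import Fraction
from math import log, isclose

def prime_divisors(n):
    # the set of primes dividing a nonzero integer n
    n, p, out = abs(n), 2, set()
    while p * p <= n:
        if n % p == 0:
            out.add(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        out.add(n)
    return out

def vp(x, p):
    # p-adic valuation of a nonzero rational x
    v, n, d = 0, x.numerator, x.denominator
    while n % p == 0:
        n //= p
        v += 1
    while d % p == 0:
        d //= p
        v -= 1
    return v

def height(coords):
    # absolute logarithmic height of a point of P^n(Q):
    # sum over places v of max_j log |x_j|_v (zero coordinates never attain the max)
    xs = [Fraction(c) for c in coords if c != 0]
    h = log(max(abs(float(x)) for x in xs))  # infinite place
    ps = set()
    for x in xs:
        ps |= prime_divisors(x.numerator) | prime_divisors(x.denominator)
    for p in ps:
        # max_j log|x_j|_p = -(min_j v_p(x_j)) * log p
        h -= min(vp(x, p) for x in xs) * log(p)
    return h
```

Note that `height([2, 4])` and `height([1, 2])` agree, illustrating the invariance under scaling.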

Proposition 2. The height \(h(x)\) is independent of the choice of \(K\).

*Sketch of the proof.* Let \(L/K\) be a finite extension containing the coordinates of \(x\). For every \(v \in M_K\) we have

\[\sum_{w|v}[L_w:K_v]=[L:K],\]

which implies that

\[\sum_{w|v}\log |x|_w=\log |x|_v.\]

Therefore

\[\sum_{w \in M_L}\max_j\log|x_j|_w=\sum_{v \in M_K}\sum_{w|v}\max_j\log|x_j|_w\]

gives what we want. \(\square\)

Proposition 3. \(h(x)\) is well-defined on \(\mathbb{P}^n_{\overline{\mathbb{Q}}}\).

*Proof.* It remains to show that \(h(x)\) does not depend on the choice of homogeneous coordinates. For \(\lambda \in K^\times\),

\[\begin{aligned}h(\lambda{x})&=\sum_{v \in M_K}\max_j\log|\lambda x_j|_v \\ &=\sum_{v \in M_K}\left(\log|\lambda|_v+\max_j\log|x_j|_v \right) \\ &=\sum_{v \in M_K}\log|\lambda|_v+\sum_{v \in M_K}\max_j\log|x_j|_v \\ &=\sum_{v \in M_K}\max_j\log|x_j|_v \\ &=h(x).\end{aligned}\]

Note \(\sum_{v \in M_K}\log|\lambda|_v=0\) because of the product formula. \(\square\)

To highlight the ability of the height to measure algebraic complexity, let's mention the following theorem of Kronecker.

Theorem 1 (Kronecker). The height of \(\zeta\in\overline{\mathbb{Q}}^\times\) is \(0\) if and only if \(\zeta\) is a root of unity.

One direction is straightforward. To prove the converse, one may need some combinatorics, symmetric functions and Dirichlet's pigeonhole principle; see, for example, the references at the end of this post.

Now let's invite the Segre embedding into the party:

\[\begin{aligned}\iota:\mathbb{P}^n_{\overline{\mathbb{Q}}} \times\mathbb{P}^m_{\overline{\mathbb{Q}}} &\to\mathbb{P}^N_{\overline{\mathbb{Q}}} \\ (x,y) &\mapsto x \otimes y:=(x_iy_j).\end{aligned}\]

Using the fact that \(\max_{i,j}|x_iy_j|_v=\left(\max_i|x_i|_v\right)\left(\max_j|y_j|_v\right)\) for every place \(v\), one obtains

\[h(x \otimes y) = h(x) + h(y).\]
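For coprime integer coordinates only the infinite place contributes, so the additivity can be observed directly (a small sketch; `naive_height` is our name):

```python
from math import gcd, log, isclose
from functools import reduce

def naive_height(coords):
    # for integer coordinates with gcd 1, only the infinite place
    # contributes: h = log max |x_j|
    assert reduce(gcd, coords) == 1
    return log(max(abs(c) for c in coords))

def segre(x, y):
    # coordinates of x tensor y
    return [xi * yj for xi in x for yj in y]
```

For instance, \(h([1:2] \otimes [1:3]) = h([1:3:2:6]) = \log 6 = \log 2 + \log 3\).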

The Segre embedding finds immediate use once we introduce the height of a polynomial.

Definition 3. For \(f(t_1,\dots,t_n) \in K[t_1,\dots,t_n]\), we write \[f(t_1,\dots,t_n)=\sum_{\mathbf{j}}a_{\mathbf{j}}\mathbf{t^j}.\]

Then the **height** of \(f\) is defined to be \[h(f)=\sum_{v \in M_K}\log |f|_v\]

where

\[|f|_v=\max_{\mathbf{j}}|a_{\mathbf{j}}|_v.\]

Likewise, it measures the complexity of \(f\).


Proposition 4. Let \(f(t_1,\dots,t_n)\) and \(g(s_1,\dots,s_m)\) be polynomials in different sets of variables, then \[h(fg)=h(f)+h(g).\]

We want to apply calculus to fields, but tools are needed. For the ordinary calculus, on \(\mathbb{R}\) or \(\mathbb{C}\), the most important role is played by limits:

\[\lim_{x \to a}f(x)=A.\]

However we cannot transplant the absolute value into other fields directly. Indeed, if the field \(k\) is an extension of \(\mathbb{Q}\), then we may define an absolute value on \(k\) to be the restriction of the absolute value of \(\mathbb{C}\) after choosing an embedding, but this depends on the choice and does not cover all fields of interest, so we proceed axiomatically.

Definition 1. An **absolute value** on a field \(K\) is a real-valued function \(|\cdot|:K \to \mathbb{R}_+\) such that

- For all \(x \in K\), we have \(|x| \ge 0\), and \(|x|=0\) if and only if \(x=0\).
- \(|xy|=|x||y|\).
- There exists \(c>0\) such that \(|x+y| \le c\max\{|x|,|y|\}\).

Before we dive into some technical details of the inequality, let's see some trivial and non-trivial examples.

On any field \(K\), we can define \(|x|=1\) for all \(x \ne 0\). This is the most trivial absolute value and it carries little to no information. But whether or not the absolute value is trivial, we always have \(|1|=1\) because \(|1x|=|1||x|=|x|\). If \(K=\mathbb{Q}\), we can define \(|m/n|\) to be the ordinary absolute value \(\sqrt{\left(\frac{m}{n}\right)^2}\), which we are all familiar with. It is customary to write it \(|\cdot|_\infty\). However, for \(K=\mathbb{Q}\) and \(m/n \in K\), we can also write

\[\frac{m}{n}=p^a\frac{m'}{n'},\]

where \(m'\) and \(n'\) are integers coprime to \(p\). Under this presentation we can put

\[\left|\frac{m}{n}\right|_p=p^{-a}.\]

In this way we obtain an absolute value \(|\cdot|_p\), called the \(p\)-adic absolute value.
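The \(p\)-adic absolute value is easy to compute exactly; a sketch with Python rationals (the function name is ours):

```python
from fractions import Fraction

def abs_p(x, p):
    # |m/n|_p = p^{-a} where m/n = p^a * m'/n' with m', n' coprime to p
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    a, n, d = 0, x.numerator, x.denominator
    while n % p == 0:
        n //= p
        a += 1
    while d % p == 0:
        d //= p
        a -= 1
    return Fraction(1, p) ** a
```

Notice that a highly \(p\)-divisible integer is \(p\)-adically small, while a denominator divisible by \(p\) makes a number \(p\)-adically large.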

- Let \(K=\mathbf{F}_q\) be a finite field, then the only absolute value on \(K\) is trivial. To see this, notice that \(K^\times\) is a cyclic group. Pick any \(x \in K^\times\); we have \(|x|^{q-1}=|x^{q-1}|=|1|=1\), and since \(|x|\) is a non-negative real number, \(|x|=1\).

It seems we have ignored the triangle inequality for no reason, but actually we didn't. To see this, we give a refinement of the triangle inequality first.

Proposition 1. Let \(|\cdot|:K \to \mathbb{R}\) be an absolute value with \(|x+y|\le c\max\{|x|,|y|\}\), then the following two statements are equivalent:

1. \(c \le 2\).
2. For all \(a,b \in K\), we have \(|a+b|\le |a|+|b|\). This is the **triangle inequality**.

*Proof.* That 2 implies 1 is obvious, since \(|a+b| \le |a|+|b| \le 2\max\{|a|,|b|\}\).

Conversely, assume \(c \le 2\). We first claim that for all \(a_1,\dots,a_n \in K\),

\[\left|\sum_{k=1}^{n}a_k \right| \le 2n\max_k |a_k|.\]

For \(n=2^m\), induction on \(m\) gives

\[\begin{aligned}\left|\sum_{k=1}^{2^m}a_k\right| &\le 2\max\left\{\left|\sum_{k=1}^{2^{m-1}}a_k\right|,\left|\sum_{k=2^{m-1}+1}^{2^{m}}a_k\right|\right\}\\ &\le 2\max\left\{2^{m-1}\max_{1\le k \le 2^{m-1}}|a_k|,\ 2^{m-1}\max_{2^{m-1}<k\le 2^m}|a_k|\right\} \\ &= 2^m\max_{1 \le k \le 2^m}|a_k|.\end{aligned}\]

For general \(n\), pick \(m\) with \(2^{m-1}< n \le 2^m\) and pad the sum with zeros, so that

\[\left|\sum_{k=1}^{n}a_k\right| \le 2^m\max_k|a_k| \le 2n \max_k|a_k|.\]

Let \(\tilde{n}\) be the image of \(n\) in \(K\), that is, \(\tilde{n}=1+\cdots+1\) (\(n\) times). Taking all \(a_k=1\) in the claim, we get \(|\tilde{n}| \le 2n\). We therefore can write

\[\begin{aligned}|a+b|^n &= |(a+b)^n| \\ &=\left|\sum_{k=0}^{n}{n \choose k}a^k b^{n-k} \right| \\ &\le 2(n+1)\max_{0 \le k \le n}\left|\widetilde{n\choose k}\right||a|^k|b|^{n-k} \\ &\le 4(n+1)\sum_{k=0}^{n}{n \choose k}|a|^k|b|^{n-k} \\ &=4(n+1) (|a|+|b|)^n.\end{aligned}\]

It follows that

\[|a+b| \le \sqrt[n]{4(n+1)}(|a|+|b|), \quad \forall n \in \mathbb{N}.\]

Since \(\lim_{n \to \infty}\sqrt[n]{4(n+1)}=1\), we are done. \(\square\)

- Let \(K=k((X))\) be the field of formal Laurent series over a field \(k\). Every nonzero \(f \in K\) can be written as \[f(X)=\sum_{k=n}^{\infty}a_kX^k\] where \(a_n \ne 0\). We can define an absolute value on \(K\) by \(|f|=c^{-n}\) for any fixed \(c>1\), together with \(|0|=0\).

Notice that an absolute value induces a translation-invariant metric in an obvious way:

\[d(x,y)=|x-y|.\]

A topology comes up in the nature of things. We cannot apply theorems in functional analysis directly because we do not have a real or complex vector space, but we can try to import the important concepts. When studying the open mapping theorem, we care about equivalent norms or metrics, i.e. whether they induce the same topology. Here we will do the same.

Definition 2. Two absolute values \(|\cdot|_1\) and \(|\cdot|_2\) are **equivalent** if they induce the same topology (this is clearly an equivalence relation). An equivalence class of absolute values is called a **place**.

Clearly, the topology is discrete if and only if the absolute value is trivial. Therefore a trivial absolute value is not equivalent to any non-trivial one. But let's see two non-trivial absolute values that are not equivalent. On \(\mathbb{Q}\), consider \(|\cdot|_\infty\) and \(|\cdot|_2\). With respect to \(|\cdot|_\infty\) we have \(1/n \to 0\), whereas

\[ \limsup_{n \to \infty}\left|\frac{1}{n}\right|_2=1\]

if we take odd numbers into account, since \(|1/n|_2=1\) whenever \(n\) is odd. Hence the two absolute values do not induce the same topology.

We have an important characterisation of equivalent absolutevalues.

Proposition 2. Let \(|\cdot|_1\) and \(|\cdot|_2\) be two non-trivial absolute values, then the following statements are equivalent.

1. \(|\cdot|_1\) and \(|\cdot|_2\) are equivalent.
2. \(|x|_1<1\) implies that \(|x|_2<1\).
3. There exists \(\lambda>0\) such that \(|\cdot|_1=|\cdot|_2^\lambda\).

*Proof.* Assume that \(|\cdot|_1\) and \(|\cdot|_2\) are equivalent. If \(|x|_1<1\), then \(x^n \to 0\) in the common topology, which forces \(|x|_2<1\). Thus 1 implies 2.

Next assume that \(|x|_1<1\) always implies that \(|x|_2<1\); applying this to \(x^{-1}\), we see \(|x|_1>1\) also implies \(|x|_2>1\). Fix \(x_0\) with \(a:=|x_0|_1>1\) and set \(b:=|x_0|_2>1\), \(\lambda:=\log_a b>0\). Given \(x \ne 0\), write \(|x|_1=|x_0|_1^\alpha\); comparing \(|x^m/x_0^{m'}|_1\) with \(1\) for rationals \(m'/m\) approaching \(\alpha\) and using the hypothesis, one finds \(|x|_2=|x_0|_2^\alpha\). Therefore

\[|x|_2=\left(|x_0|_1^{\log_ab}\right)^\alpha=|x_0|_1^{\alpha\lambda}=|x|_1^\lambda,\]

that is, \(|\cdot|_1=|\cdot|_2^{1/\lambda}\), proving 3.

3 implying 1 is immediate: if \(|\cdot|_1=|\cdot|_2^\lambda\) with \(\lambda>0\), the open balls of the two absolute values coincide up to rescaling the radii, so they induce the same topology. Note also that the constants are compatible: if \(|x+y|_2 \le c_2\max\{|x|_2,|y|_2\}\), then \(|x+y|_1 \le c_2^\lambda\max\{|x|_1,|y|_1\}\). \(\square\)

Proposition 3. Each absolute value is equivalent to one that satisfies the triangle inequality.

Bearing this in mind, we can study the case when the constant \(c\) can be taken to be \(1\).

Proposition 4. Let \(|\cdot|\) be an absolute value on \(K\). Then the following statements are equivalent:

1. \(|\cdot|\) satisfies the **ultrametric inequality**: \(|x+y|\le\max\{|x|,|y|\}\).
2. \(|\tilde{n}|\le 1\) for all \(n \in \mathbb{N}\).

*Proof.* Suppose that \(|x+y| \le\max\{|x|,|y|\}\). Then

\[|\widetilde{n+1}|=|\tilde{n}+1|\le\max\{|\tilde{n}|,|1|\}\le 1.\]

Conversely, suppose that \(|\tilde{n}| \le1\) for all \(n\). Replace theabsolute value with one satisfying triangle inequality if necessary. Itfollows that

\[\begin{aligned}|a+b|^n &\le \sum_{k=0}^{n}\left|{n \choose k}a^k b^{n-k}\right| \\&\le \sum_{k=0}^{n}\left|\widetilde{n \choosek}\right||a|^k|b|^{n-k} \\&\le (n+1)\max\{|a|^n,|b|^n\} \\&=(n+1)\max\{|a|,|b|\}^n.\end{aligned}\]

Therefore \(|a+b| \le \sqrt[n]{n+1}\max\{|a|,|b|\}\). The result follows from the fact that \(\sqrt[n]{n+1} \to 1\) as \(n \to \infty\). \(\square\)

Definition 3. An absolute value is called **non-Archimedean**, or **ultrametric**, if the condition in proposition 4 is satisfied. Otherwise it is called **Archimedean** or **ordinary**.

For example, trivial absolute values are ultrametric, but we are not interested in that. What is interesting is that every \(|\cdot|_p\) on \(\mathbb{Q}\) is ultrametric, while \(|\cdot|_\infty\) is Archimedean.

There is a second classification - Ostrowski's theorem, which states that every nontrivial place on \(\mathbb{Q}\) is represented by some \(|\cdot|_p\) with \(p\) prime or \(p=\infty\). See:

- Theorem 4.2 of this note for the ordinary theorem of Ostrowski on \(\mathbb{Q}\).
- This expository paper for the theorem of Ostrowski on number fields.
- This expository paper for the theorem of Ostrowski on function fields.

When we have a field extension, we will also want to relate the places of the two fields. Before that, we need completeness.

Definition 4. A field \(K\) is **complete** with respect to \(|\cdot|\) if \(K\) is a complete metric space with respect to the metric \(d(x,y)=|x-y|\).

To employ a similar device, we will define completion in a similar style. Let \(\mathscr{P}_F\) be the set of all places of a field \(F\). For an extension \(L/K\), each place \(v\) on \(L\) restricts to a place \(u\) on \(K\), which gives a map

\[\begin{aligned} r:\mathscr{P}_L &\to \mathscr{P}_K \\ v &\mapsto u\end{aligned}\]

from the places of \(L\) to placesof \(K\).

Definition 5. Let \(L/K\) be a field extension and \(u \in \mathscr{P}_K\). If \(v \in r^{-1}(u)\), we write \(v|u\) and say \(v\) divides \(u\) or \(v\) lies over \(u\).

Definition 6. A completion of \(K\) with respect to a place \(v\) is an extension field \(K_v\) with a place \(w\) such that

- \(w|v\).
- The topology of \(K_v\) induced by \(w\) is complete.
- \(K\) is a dense subset of \(K_v\).

The extension exists and is unique up to isomorphism (to see this, modify the proof of the completion of a metric space, or of \(\mathbb{Q}\) into \(\mathbb{R}\)).

For \(|\cdot|_p\) on \(\mathbb{Q}\), the completion is the field \(\mathbb{Q}_p\) of \(p\)-adic numbers.

As a striking example, in \(\mathbb{Q}_2\) we have

\[\sum_{k=0}^{\infty}2^k=-1\]

because

\[\lim_{n \to \infty}\left|\sum_{k=0}^{n}2^k+1\right|_2=\lim_{n \to\infty}|2^{n+1}-1+1|_2=\lim_{n \to \infty} 2^{-(n+1)}=0.\]
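One can watch this convergence happen numerically: the partial sums are \(2^{n+1}-1\), whose 2-adic distance to \(-1\) is \(2^{-(n+1)}\). A throwaway sketch:

```python
from fractions import Fraction

def abs2(n):
    # 2-adic absolute value of a nonzero integer
    v = 0
    while n % 2 == 0:
        n //= 2
        v += 1
    return Fraction(1, 2) ** v

# partial sums 1 + 2 + ... + 2^n equal 2^{n+1} - 1, so their 2-adic
# distance to -1 is |partial sum + 1|_2 = 2^{-(n+1)} -> 0
```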

There is nothing sloppy or misleading here, unlike that Numberphile video on the "identity" \(1+2+3+\cdots=-\frac{1}{12}\): the series above genuinely converges in \(\mathbb{Q}_2\).

To conclude this post and prepare for future posts, we show that absolute values work well with norms on a vector space (not to be confused with norms in Galois theory).

Definition 7. Let \(K\) be a field with absolute value \(|\cdot|\) and \(E\) be a vector space over \(K\). A norm on \(E\) compatible with \(|\cdot|\) is a function \(\|\cdot\|:E \to \mathbb{R}\) that satisfies

- \(\|\xi\|\ge 0\) for all \(\xi \in E\), and \(\|\xi\|=0\) if and only if \(\xi=0\).
- \(\|x\xi\|=|x|\|\xi\|\) for all \(x \in K\) and \(\xi \in E\).
- \(\|\xi_1+\xi_2\| \le \|\xi_1\|+\|\xi_2\|\) for all \(\xi_1,\xi_2 \in E\).

Two norms \(\|\cdot\|_1\) and \(\|\cdot\|_2\) are **equivalent** if there exist numbers \(C_1,C_2>0\) such that

\[C_1\|\xi\|_1 \le \|\xi\|_2 \le C_2 \|\xi\|_1.\]

This is an equivalence relation and we have already seen it in elementary linear algebra and functional analysis. It is equivalent to the fact that \(\|\cdot\|_1\) and \(\|\cdot\|_2\) induce the same topology on \(E\). Now fix a basis \(\xi_1,\dots,\xi_n\) of \(E\), so that every \(\xi \in E\) can be written as

\[\xi=x_1\xi_1+\cdots+x_n\xi_n,\quad x_1,\dots,x_n \in K.\]

We can define norms like the sup norm \(\|\xi\|_0=\max_i|x_i|\).

Proposition 5. Let \(K\) be a complete field under a non-trivial absolute value \(|\cdot|\), and let \(E\) be a finite-dimensional vector space over \(K\). Then any two norms on \(E\) that are compatible with \(|\cdot|\) are equivalent.

*Proof.* It suffices to show that any two compatible norms induce the same topology. The key point, proved by induction on \(n=\dim E\), is that if a sequence

\[\xi^{(\nu)} = x_1^{(\nu)}\xi_1+\cdots+x_n^{(\nu)}\xi_n\]

is a Cauchy sequence (with respect to a norm) in \(E\), then each coordinate sequence \((x_i^{(\nu)})\) is a Cauchy sequence in \(K\).

Suppose this is false for, say, the first coordinate; one then reduces to a sequence with \(\xi^{(\nu)} \to 0\) while \(|x_1^{(\nu)}|\) stays bounded away from \(0\), and writes

\[\frac{\xi^{(\nu)}}{x_1^{(\nu)}}-\xi_1=\frac{x_2^{(\nu)}}{x_1^{(\nu)}}\xi_2+\dots+\frac{x_n^{(\nu)}}{x_1^{(\nu)}}\xi_n.\]

Taking the limit, we see \(\xi_1\) is a linear combination of \(\xi_2,\dots,\xi_n\) (here the completeness of \(K\) and the induction hypothesis are used), contradicting the linear independence of the basis. \(\square\)

We will need this proposition to work with finite fieldextensions.

- Enrico Bombieri and Walter Gubler, *Heights in Diophantine Geometry*.
- Serge Lang, *Algebra, Revised Third Edition*.
- Dinakar Ramakrishnan and Robert J. Valenza, *Fourier Analysis on Number Fields*.

In our previous post on the irreducible representations of \(SU(2)\) and \(SO(3)\), we obtained a first classification of the irreducible \(SO(3)\)-modules \(W_n\).

The result is satisfying, but the modules \(W_n\) were not realised explicitly; this post fills that gap.

This post should be relatively easy to read. Other than the basic language of representation theory (of Lie groups), only multivariable calculus is needed.

Like in the previous post, we first determine a good playground and then show that this is all we need. The playground here is

\[P_\ell=\mathbb{C} \otimes_\mathbb{R}\operatorname{Sym}^\ell\mathbb{R}^3.\]

The reason for taking the symmetric product of \(\mathbb{R}^3\) is that it can be identified with the space of homogeneous polynomials of degree \(\ell\) in three variables.

Recall that

\[\dim \operatorname{Sym}^\ell \mathbb{R}^3 = {\ell+3-1 \choose\ell}={\ell + 2 \choose \ell}={\ell+2 \choose2}=\frac{(\ell+2)(\ell+1)}{2}.\]

Therefore \(\dim P_\ell=\frac{(\ell+2)(\ell+1)}{2}\), as a complex vector space.

We will extract what we want from the spaces of this form.
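The dimension formula above can be double-checked by enumerating monomials directly (a throwaway sketch):

```python
from math import comb

def dim_sym3(l):
    # count monomials x1^a * x2^b * x3^c with a + b + c = l
    # (c = l - a - b is determined by a and b)
    return sum(1 for a in range(l + 1) for b in range(l + 1 - a))
```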

The action of \(SO(3)\) on \(P_\ell\) is given by

\[(Af)(x)=f(xA).\]

Here, \(x=(x_1,x_2,x_3) \in \mathbb{R}^3\) is treated as a row vector, and \(xA\) is matrix multiplication.

To study this representation, we need to find some equivariant morphisms. The most important one is the Laplacian:

\[\Delta:f \mapsto \left(\frac{\partial^2}{\partialx_1^2}+\frac{\partial^2}{\partial x_2^2}+\frac{\partial^2}{\partialx_3^2}\right)f.\]

In other words, \(\Delta\) is the trace of the Hessian matrix of \(f\). Trace is used in representation theory to define characters, so there is a chance of finding a good connection with the representations of \(SO(3)\).

We shall also not forget the **kernel** of the Laplacian, whose elements are called **harmonic polynomials** of degree \(\ell\) in this context:

\[\mathfrak{H}_\ell = \{f \in P_\ell:\Delta{f}=0\}.\]

Since functions in \(P_\ell\) are homogeneous, the value of \(f\) at a point \(x \ne 0\) is determined by the value at \(\frac{x}{\|x\|} \in S^2\), the unit sphere. Therefore we also call them **spherical harmonics** of degree \(\ell\). And we certainly need to know the nullity of the Laplacian on \(P_\ell\).

Lemma 1. The dimension of \(\mathfrak{H}_\ell\) is \(2\ell+1\).

*Proof.* First of all we perform a Taylor expansion of \(f \in P_\ell\) with respect to \(x_1\):

\[f(x_1,x_2,x_3)=\sum_{k=0}^{\ell}\frac{f_k(x_2,x_3)}{k!}x_1^k.\]

Here, \(f_k(x_2,x_3)\) is homogeneous of degree \(\ell-k\) in \(x_2,x_3\). Therefore we only need to study one term of the right hand side.

\[\begin{aligned}\Delta \frac{f_k(x_2,x_3)}{k!}x_1^k &=\frac{f_k(x_2,x_3)}{k!}k(k-1)x_1^{k-2}+\frac{x_1^k}{k!}\left(\frac{\partial^2f_k}{\partial x_2^2}+\frac{\partial^2 f_k}{\partial x_3^2}\right) \\&=\frac{f_k(x_2,x_3)}{(k-2)!}x_1^{k-2}+\frac{x_1^k}{k!}\left(\frac{\partial^2f_k}{\partial x_2^2}+\frac{\partial^2 f_k}{\partial x_3^2}\right)\end{aligned}\]

Now we can put them together naturally:

\[\Delta f =\sum_{k=0}^{\ell-2}\frac{f_{k+2}}{k!}x_1^{k}+\sum_{k=0}^{\ell}\frac{x_1^k}{k!}\left(\frac{\partial^2f_k}{\partial x_2^2}+\frac{\partial^2 f_k}{\partial x_3^2}\right)\]

Let's explore the last term a little bit more. If \(k>\ell-2\), then \(f_k\) has degree less than \(2\) and its second derivatives vanish, so the second sum also only runs up to \(\ell-2\):

\[\Delta f =\sum_{k=0}^{\ell-2}\frac{x_1^k}{k!}\left[f_{k+2}+\left(\frac{\partial^2f_k}{\partial x_2^2}+\frac{\partial^2 f_k}{\partial x_3^2}\right)\right]\]

Therefore \(\Delta{f}=0\) if and only if

\[f_{k+2}+\left(\frac{\partial^2 f_k}{\partial x_2^2}+\frac{\partial^2f_k}{\partial x_3^2}\right)=0.\]

Therefore, once \(f_0\) and \(f_1\) are chosen, all the \(f_k\) are determined recursively. Hence

\[\dim \mathfrak{H}_\ell=\dim P_\ell^2+\dim P_{\ell-1}^2\]

where \(P_k^2\) is the space of homogeneous polynomials of degree \(k\) in two variables, which has dimension \(k+1\):

\[\dim P_\ell^2 = {\ell+2-1 \choose \ell}=\ell+1, \quad \dim P_{\ell-1}^2= \ell.\]

Hence

\[\dim \mathfrak{H}_\ell=2\ell+1.\]

\(\square\)
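The count \(2\ell+1\) can also be confirmed by brute force: write \(\Delta:P_\ell \to P_{\ell-2}\) as a matrix in the monomial basis and compute the dimension of its kernel. A sketch with exact rational arithmetic (all names are ours):

```python
from fractions import Fraction

def laplacian_kernel_dim(l):
    # monomial basis x1^a x2^b x3^c, a + b + c = l, of P_l
    mons = [(a, b, l - a - b) for a in range(l + 1) for b in range(l + 1 - a)]
    tgt = [(a, b, l - 2 - a - b) for a in range(l - 1) for b in range(l - 1 - a)]
    tidx = {m: i for i, m in enumerate(tgt)}
    # matrix of Delta : P_l -> P_{l-2} in these bases;
    # d^2/dx^2 sends exponent e to e*(e-1) with exponent e-2
    M = [[Fraction(0)] * len(mons) for _ in tgt]
    for j, mon in enumerate(mons):
        for axis in range(3):
            e = mon[axis]
            if e >= 2:
                out = list(mon)
                out[axis] -= 2
                M[tidx[tuple(out)]][j] += e * (e - 1)
    # kernel dimension = dim P_l - rank, by Gaussian elimination
    rank = 0
    for col in range(len(mons)):
        piv = next((r for r in range(rank, len(M)) if M[r][col] != 0), None)
        if piv is None:
            continue
        M[rank], M[piv] = M[piv], M[rank]
        for r in range(len(M)):
            if r != rank and M[r][col] != 0:
                f = M[r][col] / M[rank][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[rank])]
        rank += 1
    return len(mons) - rank
```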

Recall that \(\dim W_n=2n+1\). This should not be a coincidence, and we shall dive into it right now. To do this we immediately establish the connection between \(\Delta\) and the action of \(SO(3)\).

Lemma 2. The action of the Laplacian on \(C^\infty(\mathbb{R}^3;\mathbb{C})\) (which contains \(P_\ell\) for all \(\ell \ge 0\)) commutes with the action of \(SO(3)\), i.e. \(\Delta\) is \(SO(3)\)-equivariant.

*Proof.* Really routine verification. \(\square\)

As a result, we have a very important result:

Theorem 1. The space \(\mathfrak{H}_\ell\) is an \(SO(3)\)-invariant subspace of \(P_\ell\).

We start with a direct observation of matrices in \(SO(3)\).

Lemma 3. Every element in \(SO(3)\) is conjugate to \(R(t)\) for some \(t\), where \[R(t)= \begin{pmatrix}1 & 0 & 0 \\0&\cos{t}&-\sin{t} \\0&\sin{t}&\cos{t} \end{pmatrix}.\]

*Proof.* Pick any \(A \in SO(3)\). First of all we show that \(1\) is an eigenvalue of \(A\), i.e. \(\det(I-A)=0\). Since the dimension \(3\) is odd and \(\det A=1\),

\[\begin{aligned}\det (I-A)&=\det(AA^T-A) \\ &=\det(A(A^T-I)) \\ &=\det(A)\det(A^T-I) \\ &=\det(A-I) \\ &=-\det(I-A)\end{aligned}\]

we therefore have \(\det(I-A)=0\). Hence we can pick \(v_1 \in \ker(I-A)\) with norm \(1\). Pick \(v_2,v_3\) completing \(v_1\) to a positively oriented orthonormal basis of \(\mathbb{R}^3\), and let \(V\) be the orthogonal matrix with columns \(v_1,v_2,v_3\). Then

\[V^{-1}AV=R=\begin{pmatrix}1 & 0 & 0 \\ 0&a&b \\ 0&c&d \end{pmatrix} \in SO(3).\]

In particular, \(R \in SO(3)\) alsoimplies

\[\begin{cases}a^2+b^2=1 \\c^2+d^2=1 \\a^2+c^2=1 \\b^2+d^2=1 \\ad-bc=1\end{cases}\]

Solving this system, we must have \(a=d\) and \(b=-c\) with \(a^2+b^2=1\); hence \(R=R(t)\) for some \(t\). \(\square\)

Since characters are invariant under conjugation, the study of the characters of \(SO(3)\) is reduced to \(T\), the subgroup consisting of the matrices \(R(t)\). But direct computation is a nightmare, so we try our best to do it elegantly. To do this, we return to the irreducible representations of \(SU(2)\) and recall the covering map, under which

\[e(t)=\begin{pmatrix}\exp(it) & 0 \\0 & \exp(-it)\end{pmatrix} \mapsto\begin{pmatrix}1 & 0 & 0 \\0 & \cos{2t} & -\sin{2t} \\0 & \sin{2t} & \cos{2t}\end{pmatrix}=R(2t).\]

One can refer to the previous post for the characters of the \(V_n\); in particular,

\[\chi_{V_{2n}}(e(t/2))=\sum_{k=0}^{2n}\exp\left(i(2n-2k)\frac{t}{2}\right)=\sum_{k=0}^{2n}\exp(i(n-k)t).\]

Now we are ready for the irreducible representations of \(SO(3)\).

Since we basically have \(\dim\mathfrak{H}_\ell=\dim W_\ell\), it is natural to believe that\(\mathfrak{H}_\ell \cong W_\ell\), inthe sense of \(SO(3)\)-modules, and thefollowing theorem answers this question affirmatively.

Theorem 2. The space \(\mathfrak{H}_\ell\) is isomorphic to \(W_\ell\). In other words, irreducible \(SO(3)\)-modules are determined by spherical harmonics.

*Proof.* We will use the fact that every representation of a compact Lie group is completely reducible.

Therefore we have

\[\mathfrak{H}_\ell= \bigoplus_{\nu}W_{n_\nu}\]

where each \(W_{n_\nu}\) is an irreducible representation of \(SO(3)\). Comparing dimensions,

\[2\ell+1 = \sum_{\nu}(2n_\nu+1).\]

To prove that the decomposition consists of a single summand \(W_\ell\), we compute the character: by the computation for \(SU(2)\) above,

\[\chi_{\mathfrak{H}_\ell}(R(t))=\sum_{n_\nu}\sum_{k=-n_\nu}^{n_\nu}\exp(ikt).\]

In other words, the character is a linear combination of the functions \(\exp(ikt)\) with \(|k| \le \max_\nu n_\nu\). It therefore suffices to show that \(\exp(-i\ell t)\) appears.

To do this, we can consider the vector \(f_\ell=(x_2+ix_3)^\ell \in \mathfrak{H}_\ell\) (a direct computation shows it is harmonic). It is an eigenvector of every \(R(t)\):

\[\begin{aligned}(R(t)f_\ell)(x)&=\left(x_2\cos{t}+x_3\sin{t}+i(-x_2\sin{t}+x_3\cos{t})\right)^\ell\\ &=(e^{-it}x_2+ie^{-it}x_3)^\ell \\ &=e^{-i\ell t}(x_2+ix_3)^\ell \\ &=e^{-i\ell t}f_\ell(x).\end{aligned}\]

\(\square\)

We computed this eigenvalue because it shows that \(\exp(-i\ell t)\) appears in \(\chi_{\mathfrak{H}_\ell}(R(t))\), hence \(|-\ell|=\ell \le \max n_\nu\). Since \(\{n_\nu\}\) is finite, the maximum is attained, and together with the dimension count \(2\ell+1=\sum_\nu(2n_\nu+1)\) this forces a single summand with \(n_\nu=\ell\).

The representation theory of \(U(2)\) can be deduced algebraically, for one only needs to notice that \(U(2) = (S^1 \times SU(2))/H\), where \(H=\{(1,I),(-1,-I)\}\). One will also need an odd-even argument just like what we did for \(SO(3)\). Likewise, since \(O(3)=SO(3) \times \mathbb{Z}/2\mathbb{Z}\), we can deduce the irreducible representations of \(O(3)\) in the same fashion.

Representation theory is important in various branches of mathematics and physics. When studying representations of finite groups, we have quite some algebra and combinatorics. When differentiation (more precisely, smoothness) joins the party, we have Lie groups, involving calculus, linear algebra, geometry and much more. Especially, theories around \(SU(2)\) and \(SO(3)\) are of great importance in physics.

In this post we develop a way to study irreducible representations of these two Lie groups, in a mathematician's way. I try my best to make sure that everything is down-to-earth, and everything can be "reduced" to 19th-century (pre-modern) mathematics.

Nevertheless, the reader is assumed to be familiar with the elementary language of representation theory (and, you know, there is a lot of abuse of language), which I think is not a problem because otherwise you wouldn't be reading this post. You need to recall eigenvalue theory in linear algebra, as well as Fourier series. We need the fact that the trigonometric system is complete; in other words, trigonometric polynomials are dense in the space of continuous periodic functions.

We will first study \(SU(2)\) and give a first classification of its irreducible representations; the case of \(SO(3)\) will then follow from

\[SU(2)/\{-I,I\} \cong SO(3).\]

This is to say, \(SU(2)\) is a"double cover" of \(SO(3)\). To seethis, notice that \(SU(2) \cong S^3\)and \(SO(3) \cong \mathbb{R}P^3\) asLie groups, meanwhile \(\mathbb{R}P^3 \congS^3/\{-1,1\}\) can be considered as the definition.

Of course, by representation we mean finite dimensional and unitaryrepresentations.

Indeed it seems we have nowhere to start. Instead of trying to findall of them, we will try to work on seemingly immediate representationsand it turns out that they are all we are looking for.

Let \(V_0\) be the trivial representation on \(\mathbb{C}\) and \(V_1\) be the standard representation on \(\mathbb{C}^2\), which is given by ordinary matrix multiplication. These representations are irreducible. We want to extend this family, and there are several candidates:

- Direct sum: \(\bigoplus_{i=1}^{n}V_1\). The dimension is \(2n\) and unfortunately, the representation is determined by each component, so essentially there is no "new thing".
- Tensor product: \(\bigotimes_{i=1}^{n}V_1\). The dimension is \(2^n\), which is way too big.
- Wedge product: \(\bigwedge^{n}V_1\). It stops at \(n=2\) and we have to deal with \(u \wedge v = - v \wedge u\). This can be annoying.
- Symmetric product: \(\operatorname{Sym}^{n}V_1\). The dimension is \(n+1\) and it doesn't stop. Besides, it can be understood as homogeneous polynomials of degree \(n\) in two variables. This is a fantastic choice. Moreover we have \(\operatorname{Sym}^0V_1=V_0\), so nothing is abruptly excluded.

Put \(V_n=\operatorname{Sym}^nV_1\), which can be understood as the space of homogeneous polynomials of degree \(n\) in two variables \(z_1,z_2\), with basis

\[P_k=z_1^k z_2^{n-k}, \quad 0 \le k \le n.\]

And we will make use of it later.

For each \(g \in SU(2)\), we have aleft action

\[\begin{aligned}\rho:SU(2) &\to \operatorname{Aut}(V_n), \\ g &\mapsto (P(z) \mapsto P(zg)).\end{aligned}\]

In other words, write

\[g=\begin{pmatrix}\alpha & \beta \\-\overline{\beta} & \overline{\alpha}\end{pmatrix}, \quad |\alpha|^2+|\beta|^2=1.\]

Then

\[zg=(\alpha z_1-\overline{\beta}z_2,\beta z_1+\overline{\alpha}z_2)\]

When there is no confusion, we will write \(gP\) in place of \(\rho(g)P\).

Since \(z \mapsto zg\) is homogeneous of degree \(1\) (it is linear) and non-degenerate, \(P(zg)\) is again a homogeneous polynomial of degree \(n\). **We now have a well-defined representation.**

Proposition 1. The representations \(V_n\) are irreducible.

*Proof.* By Schur's lemma, we need to show that each \(SU(2)\)-equivariant endomorphism \(A\) of \(V_n\) is a scalar.

The group \(SU(2)\) can be complicated, but \(U(1) \cong S^1\) is simple and can be considered as a subgroup of \(SU(2)\).

First of all we embed \(S^1\) into\(SU(2)\) by

\[a \mapsto \begin{pmatrix}a & 0 \\ 0 & a^{-1} \end{pmatrix}.\]

Call the matrix on the right hand side \(g_a\). Then

\[g_a P_k=(az_1)^{k}(a^{-1}z_2)^{n-k}=a^{2k-n}z_1^kz_2^{n-k}=a^{2k-n}P_k\]

for all \(k\). This is to say, each \(P_k\) is an **eigenvector** of \(g_a\) corresponding to the **eigenvalue** \(a^{2k-n}\).

Since the \(P_k\) are linearly independent, under this basis we have a matrix representation

\[\rho(g_a) = \operatorname{diag}(a^{-n},a^{-n+2},\dots,a^{n-2},a^n).\]

but we don't know how the eigenspaces are spanned, because we may have \(a^j=a^k\) for \(j \ne k\), for example when \(a\) is a root of unity.

On the other hand, by the definition of equivariance, \(A\) commutes with every \(g_a\):

\[g_aAP_k = Ag_a P_k = A(a^{2k-n}P_k) = a^{2k-n}AP_k.\]

Hence \(AP_k\) lies in the eigenspace of \(g_a\) for the eigenvalue \(a^{2k-n}\). Choosing \(a\) not a root of unity, the numbers \(a^{2k-n}\) are pairwise distinct, so each eigenspace is the line spanned by \(P_k\), and under this basis

\[A = \begin{pmatrix}c_0 & & \\ & \ddots & \\ & & c_n\end{pmatrix}.\]

We want this matrix to be a scalar matrix. The result follows fromanother embedding of \(U(1)\) into\(SU(2)\). Note

\[g_t = \begin{pmatrix}\cos{t} & -\sin{t} \\\sin{t} & \cos{t}\end{pmatrix} \in SU(2).\]

Still we have \(Ag_t=g_tA\). As we can see,

\[\begin{aligned}Ag_tP_n &= A(z_1\cos{t}+z_2\sin{t})^n \\ &= A\sum_{k=0}^{n}{n \choose k}(z_1\cos{t})^k(z_2\sin{t})^{n-k} \\ &= A\sum_{k=0}^{n}{n \choosek}(\cos{t})^k(\sin{t})^{n-k}z_1^k z_2^{n-k} \\ &= A\sum_{k=0}^{n}{n \choose k}(\cos{t})^k(\sin{t})^{n-k}P_k \\ &= \sum_{k=0}^{n}{n \choose k}(\cos{t})^k(\sin{t})^{n-k}AP_k \\ &= \sum_{k=0}^{n}{n \choose k}(\cos{t})^k(\sin{t})^{n-k}c_kP_k.\end{aligned}\]

This follows from our observation on eigenvalues. Next, we immediately use the eigenvalue \(c_n\) to obtain

\[g_t AP_n = g_t c_nP_n = c_n \sum_{k=0}^{n}{n \choosek}(\cos{t})^k(\sin{t})^{n-k} P_k.\]

Comparing the coefficients of \(P_k\) in the two expressions, for \(t\) with \(\cos t \sin t \ne 0\), we obtain \(c_k=c_n\) for all \(k\). Hence \(A\) is a scalar matrix, and \(V_n\) is irreducible. \(\square\)

So far we have used diagonalisation inside representations of \(SU(2)\); let's now recall diagonalisation of elements of \(SU(2)\) itself. Every \(g \in SU(2)\) satisfies

\[g \sim \begin{pmatrix}\lambda & 0 \\ 0 & \lambda^{-1}\end{pmatrix} \sim \begin{pmatrix}\lambda^{-1} & 0 \\ 0 &\lambda \end{pmatrix}\]

where \(\lambda\) is one of the eigenvalues of \(g\). Since the diagonalised matrix is still in \(SU(2)\), \(\lambda\) has modulus \(1\), and every \(g\) is conjugate to a matrix of the form

\[e(t) = \begin{pmatrix} \exp(it) & 0 \\ 0 & \exp(-it)\end{pmatrix}.\]

We see \(e(s) \sim e(t)\) if and only if \(s \equiv \pm t \pmod{2\pi}\). By periodicity of the \(\exp\) function, we also see \(e(t)\) is in particular \(2\pi\)-periodic.

Define \(\Lambda:SU(2) \to S^1\) sending \(g \in SU(2)\) to the eigenvalue of \(g\) with non-negative imaginary part (one can also pick the non-positive one, because the two eigenvalues are complex conjugates of each other).

With the help of this \(e(t)\) and \(\Lambda\), we obtain a correspondence

\[\{\text{Class functions }SU(2) \to \mathbb{C}\} \longleftrightarrow\{\text{even }2\pi\text{-periodic functions }\mathbb{R} \to \mathbb{C}\}\]

Recall that the space on the right hand side has a countable uniform basis

\[1,\cos{t},\cos{2t},\dots.\]

In other words, \(\{\cos{nt}\}_{n \ge 0}\) spans a dense subspace. This is the completeness of the trigonometric system; since we only deal with even functions, the cosines suffice. See for example *Real and Complex Analysis* by W. Rudin.

For class functions, we certainly want to know about characters. Let \(\chi_n\) be the character of \(V_n\); then

\[\chi_n(e(t))=\operatorname{tr}(\rho(e(t)))=\operatorname{tr}(\operatorname{diag}(\exp(it)^{-n},\dots,\exp(it)^n))=\sum_{k=0}^{n}e^{i(n-2k)t}.\]

When \(t \in \pi\mathbb{Z}\), \(\chi_n(e(t)) \in \mathbb{Z}\). Otherwise, as a classic exercise in calculus, we have

\[\kappa_n(t)=\chi_n(e(t))=\frac{\sin(n+1)t}{\sin{t}}.\]
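The closed form is easy to verify numerically against the defining sum (a throwaway check):

```python
from cmath import exp, isclose
from math import sin

def chi(n, t):
    # character of V_n at e(t): sum of exp(i(n-2k)t) for k = 0..n
    return sum(exp(1j * (n - 2 * k) * t) for k in range(n + 1))
```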

We have \(\kappa_0(t)=1\). For \(n \ge 1\),

\[\kappa_n(t)=\frac{\cos{nt}\sin{t}+\sin{nt}\cos{t}}{\sin{t}}=\cos{nt}+\kappa_{n-1}(t)\cos{t}.\]

We see \(\kappa_1(t)=2\cos{t}\). By induction, every \(\kappa_n(t)\) is a polynomial in the variables \(\cos t,\cos 2t,\dots,\cos nt\); conversely, \(2\cos nt=\kappa_n-\kappa_{n-2}\) for \(n \ge 2\), so each \(\cos nt\) is a linear combination of \(\kappa_0,\dots,\kappa_n\).

The argument above shows that the \(\kappa_n\) and the \(\cos nt\) span the same subspace; in particular, the characters \(\chi_n\) span a dense subspace of the space of continuous class functions on \(SU(2)\).

Proposition 2. For a continuous class function \(f:SU(2) \to \mathbb{C}\), we have \[\int_{SU(2)}f(x)dx = \frac{2}{\pi}\int_0^\pi f \circ e(t)\sin^2{t}\,dt.\]

*Proof.* On one hand, integrating a character over the group computes the dimension of the subspace of invariants, so

\[\int_{SU(2)}\chi_n(x)dx = \dim V_n^{SU(2)} = \begin{cases} 1 & n=0,\\ 0 & n>0. \end{cases}\]

Here, for a group \(G\) and a representation \(V\), \(V^G\) denotes the subspace of vectors fixed by every element of \(G\).

On the right hand side, since \(\kappa_n(t)\sin^2 t=\sin((n+1)t)\sin t\) is even, direct computation gives

\[\frac{2}{\pi}\int_{0}^{\pi}\kappa_n(t)\sin^2{t}\,dt =\frac{1}{\pi}\int_{-\pi}^{\pi}\sin((n+1)t)\sin{t}\,dt = \begin{cases} 1& n=0, \\ 0 & n>0. \end{cases}\]

Since the functional \(h \mapsto \frac{2}{\pi}\int_0^\pi h(t)\sin^2{t}\,dt\) is continuous in the uniform topology and the \(\kappa_n\) span a dense subspace, the result is now obtained. \(\square\)
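Since \(\kappa_m(t)\kappa_n(t)\sin^2 t=\sin((m+1)t)\sin((n+1)t)\), the orthonormality of the characters can be confirmed numerically; note that the measure \(\frac{2}{\pi}\sin^2{t}\,dt\) on \([0,\pi]\) has total mass \(1\). A quadrature sketch (function names are ours):

```python
from math import sin, pi

def kappa(n, t):
    # character of V_n at e(t), for t not a multiple of pi
    return sin((n + 1) * t) / sin(t)

def inner(m, n, steps=50000):
    # (2/pi) * integral over [0, pi] of kappa_m kappa_n sin^2(t) dt,
    # computed by the midpoint rule
    h = pi / steps
    total = 0.0
    for j in range(steps):
        t = (j + 0.5) * h
        total += kappa(m, t) * kappa(n, t) * sin(t) ** 2
    return (2 / pi) * h * total
```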

Finally, surprisingly and satisfyingly enough, the denseness has actually ruled out all other possibilities of irreducible representations. In other words, our search in symmetric products is exhaustive. We can see this through Parseval's identity. This is the heart of this blog post.

Proposition 3. Every irreducible representation of \(SU(2)\) is isomorphic to one of the \(V_n\).

*Proof.* Suppose we have an irreducible representation whose character \(\chi\) is different from all of the \(\chi_n\). Then orthonormality shows that \(\langle\chi,\chi_n \rangle = 0\) for all \(n\ge 0\) and \(\langle \chi,\chi\rangle=1\). Now let's see why this is absurd.

Since \(\{\chi_n\}_{n \ge 0}\) spansa dense subspace in the space of class functions, we actually have

\[\chi = \sum_{n = 0 }^{\infty} a_n \chi_n.\]

Therefore

\[\langle \chi,\chi_n \rangle = \int_{SU(2)}\overline\chi(x)\chi_n(x)dx =a_n=0,\quad n \ge 0\]

and

\[\langle \chi,\chi \rangle = \sum_{n=0}^{\infty}\langlea_n\chi_n,a_n\chi_n \rangle = \sum_{n=0}^{\infty}|a_n|^2=1.\]

It is impossible to have the sum of the \(|a_n|^2\) equal to \(1\) while every \(a_n\) is \(0\). \(\square\)

Now we head to \(SO(3)\). In factthe result follows immediately from the surjection

\[\pi:SU(2) \to SO(3).\]

We have \(\ker\pi=\{-I,I\}\). Let \(W\) be a representation of \(SO(3)\), say

\[\rho:SO(3) \to GL(W).\]

Then

\[\pi^\ast\rho:SU(2) \to GL(W)\]

by \(g \mapsto \rho(\pi(g))\) is an induced representation, and we write \(\pi^\ast W\) for it.

On the other hand, if \(\vartheta:SU(2) \to GL(V)\) is an irreducible representation on which \(-I\) acts as the identity, then we obtain a well-defined map

\[\pi_\ast\vartheta:SO(3) \cong SU(2)/\{I,-I\} \to GL(V)\]

given by \(g\ker\pi \mapsto \vartheta(g)\). Let's denote it by \(\pi_\ast V\).

Therefore we have realised a correspondence

\[\{\text{Irreducible representations of $SO(3)$}\} \\\updownarrow \\\{\text{Irreducible representations of $SU(2)$ where $-I$ acts asidentity.}\}\]

So it remains to determine those \(V_n\) on which \(-I\) acts as the identity. We compute

\[\rho_n(-I)P(z)=P(z(-I))=P(-z)=(-1)^nP(z)\]

because \(P \in \mathbb{C}[z_1,z_2]\) is homogeneous of degree \(n\). Hence \(-I\) acts as the identity on \(V_n\) exactly when \(n\) is even.

Proposition 4. Every irreducible representation of \(SO(3)\) is of the form \[W_n = \pi_\ast V_{2n}\]

where \(V_{2n}\) is as described above.

This is, of course, just a first classification. But to introduce a classification as explicit as what we have done for \(SU(2)\), we need spherical harmonics.

Let \(P_{\ell}\) be the complex vector space of homogeneous polynomials in three variables of degree \(\ell\), which can immediately be considered as functions on \(\mathbb{R}^3\). This setting makes sense immediately, just as what we have done for \(SU(2)\). Then, in fact,

\[W_\ell=\mathfrak{H}_\ell = \{f \in P_\ell:\Delta f=0\}.\]

This is to say, \(W_\ell\) can be understood as the space of harmonic homogeneous polynomials of degree \(\ell\) in three variables.

- Theodor Bröcker and Tammo tom Dieck, *Representations of Compact Lie Groups*.
- Walter Rudin, *Real and Complex Analysis, 3rd Edition*.

Let's admit it: trying to compute the integral directly is somewhat unrealistic, so we go through an alternative way. For convenience (of writing MathJax code) let's write \(\varphi(t)=\hat{f}_c(t)\).

First of all, \(\hat{f}_c(t)\) is always well-defined, because \[\int_{-\infty}^{+\infty}|f_c(x)e^{-ixt}|dx=\int_{-\infty}^{+\infty}|f_c(x)|dx<\infty,\] so we can compute it without any worry.

The key point is that \(\varphi\) satisfies a simple differential equation. An integration by parts gives \[\begin{aligned}\varphi(t)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty}\exp(-cx^2)e^{-itx}dx &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty}\exp(-cx^2)\frac{1}{-it}de^{-itx} \\&=\frac{i}{t\sqrt{2\pi}}[\exp(-cx^2)e^{-itx}]|_{-\infty}^{+\infty} \\&\quad -\frac{i}{t\sqrt{2\pi}}\int_{-\infty}^{+\infty}e^{-itx}d\exp(-cx^2) \\&=\frac{-2c}{t\sqrt{2\pi}}\int_{-\infty}^{+\infty}-xi\exp(-cx^2)e^{-itx}dx\end{aligned}\] On the other hand, differentiation under the integral sign gives \[\varphi'(t)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty}-ixf_c(x)e^{-itx}dx.\] (The well-definedness of this integral can be verified easily.) Combining both, we obtain a differential equation \[t\varphi(t)+2c\varphi'(t)=0.\] Separating variables, this corresponds to the integral equation \[\int2c\frac{d\varphi}{\varphi}=-\int tdt,\] and we solve it to obtain \[2c\log\varphi=-\frac{1}{2}t^2+C\] or alternatively, \[\varphi(t)=C\exp(-\frac{1}{4c}t^2).\] Now put the initial value back in; by the Gaussian integral, \[\varphi(0)=\frac{1}{\sqrt{2\pi}}\sqrt{\frac{\pi}{c}}=\frac{1}{\sqrt{2c}}.\] Therefore \[\varphi(t)=\frac{1}{\sqrt{2c}}\exp\left(-\frac{1}{4c}t^2\right)\] is exactly what we want.
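As a sanity check (not needed for the argument), the closed form can be confirmed by brute-force numerical integration. The helper name, truncation radius and grid size below are illustrative choices:

```python
import math

# Brute-force check: with f_c(x) = exp(-c x^2), the transform
# (1/sqrt(2 pi)) * integral of f_c(x) e^{-ixt} dx should equal
# exp(-t^2/(4c)) / sqrt(2c). The imaginary part of the integrand is odd
# in x, so a cosine integral on a truncated interval suffices.

def fourier_gaussian(c: float, t: float, R: float = 12.0, n: int = 50_000) -> float:
    h = 2 * R / n                      # midpoint rule on [-R, R]
    s = 0.0
    for k in range(n):
        x = -R + (k + 0.5) * h
        s += math.exp(-c * x * x) * math.cos(t * x)
    return s * h / math.sqrt(2 * math.pi)

for c in (0.5, 1.0, 2.0):
    for t in (0.0, 0.7, 1.5):
        exact = math.exp(-t * t / (4 * c)) / math.sqrt(2 * c)
        assert abs(fourier_gaussian(c, t) - exact) < 1e-5
```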

Before showing another method, we first have a question: can we have \(\hat{f}_c=f_c\)? Solving an equation in the variable \(c\) answers this question affirmatively: \[\hat{f}_c=f_c \iff \begin{cases}\frac{1}{\sqrt{2c}}=1 \\ -\frac{1}{4c}=-c \end{cases} \iff c = \frac{1}{2}.\] In other words, \(f_{\frac{1}{2}}\) is a fixed point of the Fourier transform, and within this class of functions it is the only one.

As a classic property of the Fourier transform, for \(f,g \in L^1\), we have \[\widehat{f \ast g}(t)=\hat{f}(t)\hat{g}(t)\] where \[f \ast g(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty}f(x-y)g(y)dy.\] By the way, \(f \in L^1\) means \(\int_{-\infty}^{\infty}|f(x)|dx<\infty\). One can verify that \(f \ast g \in L^1\) here as well.

With this result, we can compute \(f_a \ast f_b\) easily. Note \[\widehat{f_a \ast f_b}(t)=\hat{f_a}(t)\hat{f_b}(t)=\frac{1}{2\sqrt{ab}}\exp\left[-\left(\frac{1}{4a}+\frac{1}{4b}\right)t^2\right].\] Now let's see if we can have \(f_a \ast f_b = \gamma f_c\) for some \(\gamma\) and \(c\). We should have \[\frac{1}{4c}=\frac{1}{4a}+\frac{1}{4b} \implies c=\frac{1}{\frac{1}{a}+\frac{1}{b}}=\frac{ab}{a+b}.\] We also have \[\gamma \frac{1}{\sqrt{2c}}=\frac{1}{2\sqrt{ab}} \implies \gamma = \sqrt\frac{c}{2ab}=\sqrt{\frac{1}{2(a+b)}}.\] Therefore we have \[f_a \ast f_b = \sqrt{\frac{1}{2(a+b)}}f_c\] where \(c\) is given above. We didn't even compute the integral explicitly.
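The convolution identity can likewise be confirmed by direct numerical integration; the truncation and grid parameters below are illustrative:

```python
import math

# Numerical check of f_a * f_b = sqrt(1/(2(a+b))) f_c with c = ab/(a+b),
# using (f*g)(x) = (1/sqrt(2 pi)) * integral of f(x-y) g(y) dy, truncated to [-R, R].

def f(c: float, x: float) -> float:
    return math.exp(-c * x * x)

def convolve(a: float, b: float, x: float, R: float = 12.0, n: int = 50_000) -> float:
    h = 2 * R / n
    s = 0.0
    for k in range(n):                 # midpoint rule in the y variable
        y = -R + (k + 0.5) * h
        s += f(a, x - y) * f(b, y)
    return s * h / math.sqrt(2 * math.pi)

a, b = 1.0, 2.0
c = a * b / (a + b)
gamma = math.sqrt(1 / (2 * (a + b)))
for x in (0.0, 0.5, 1.3):
    assert abs(convolve(a, b, x) - gamma * f(c, x)) < 1e-5
```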

This post is intended to supply a detailed proof of the Riemann mapping theorem.

Riemann mapping theorem. Every simply connected region \(\Omega \subsetneq \mathbb{C}\) is conformally equivalent to the open unit disc \(U\).

Fortunately the proof can be found in many textbooks of complex analysis, but it is fairly technical and can be painful to read. This post can be considered as a painkiller: here you will see the proof filled in with many details. However, the writer still encourages the reader to reproduce the proof with their own pen and paper, and hopes that this post can increase the accessibility of this theorem and its proof.

However, there is a barrier: we need to assume some background in complex analysis, although only the very basics. The minimal prerequisite is being able to answer the following questions.

- Contour integration, Cauchy's formula.
- Almost uniform convergence. Let \(\Omega \subset \mathbb{C}\) be open and suppose that \(f_j \in H(\Omega)\) for all \(j=1,2,\dots\), and \(f_j \to f\) uniformly on every compact subset \(K \subset \Omega\). Does \(f \in H(\Omega)\)? What is the uniform limit of \(f'_j\)? Informally, we call the phenomenon that a sequence of functions converges uniformly on every compact subset *almost uniform* convergence. This has nothing to do with *almost everywhere* in integration theory; in fact, this post does not require background in Lebesgue integration theory.
- Open mapping theorem (complex analysis version).
- Maximum modulus principle and some variants.
- Rouché's theorem, or even more, the calculus of residues.

Besides these prerequisites, we still need some preparation beforehand.

Definition 1. Let \(X\) be a connected topological space. Let \(\gamma:[0,1] \to X\) be a closed curve, i.e., a continuous map such that \(\gamma(0)=\gamma(1)\); we say \(\gamma\) is null-homotopic if it is homotopic to a constant map \(\gamma_0:[0,1] \to \{x\}\) with \(x \in X\). We say \(X\) is simply connected if every closed curve is null-homotopic.

Intuitively, if \(X\) is simply connected, then \(X\) contains no "hole". For example, the unit disc \(U\) is simply connected. However, \(U \setminus \{0\}\) is not. On the other hand, \(U \setminus [0,1)\) is still simply connected. Another satisfying result is that every convex and connected open set is simply connected; this follows by taking convex combinations.

There are a lot of good properties of simply connected regions, which will be summarised below.

Proposition 1. For a region \(\Omega\) (an open and connected subset of \(\mathbb{R}^2\)), the following conditions are equivalent; each one implies the other eight.

- \(\Omega\) is homeomorphic to the open unit disc \(U\).
- \(\Omega\) is simply connected.
- \(\operatorname{Ind}_\gamma(\alpha)=0\) for every closed path \(\gamma\) in \(\Omega\) and \(\alpha \in S^2 \setminus \Omega\), where \(S^2\) is the Riemann sphere.
- \(S^2 \setminus \Omega\) is connected.
- Every \(f \in H(\Omega)\) can be approximated by polynomials, almost uniformly.
- For every \(f \in H(\Omega)\) and every closed path \(\gamma\) in \(\Omega\),
\[\int_\gamma f(z)\mathrm{d}z=0.\]

- Every \(f \in H(\Omega)\) has an anti-derivative. That is, there exists an \(F \in H(\Omega)\) such that \(F'=f\).
- If \(f \in H(\Omega)\) and \(1/f \in H(\Omega)\), then there exists a \(g \in H(\Omega)\) such that \(f=\exp{g}\).
- For such \(f\), there also exists a \(\varphi \in H(\Omega)\) such that \(f=\varphi^2\).

5~9 are pretty much saying that calculus works fine here and we are not worrying about nightmare counterexamples, to some extent. Most of the implications \(n \implies n+1\) are not that difficult, but some deserve a mention. 4 implying 5 is a consequence of Runge's theorem. In the implication of 7 to 8, one needs to use the fact that \(\Omega\) is connected. When we have \(f=\exp{g}\), then we can put \(\varphi=\exp\frac{g}{2}\), from which we obtain \(f=\varphi^2\). 9 implying 1 is partly a consequence of the Riemann mapping theorem. Indeed, if \(\Omega\) is the plane then the homeomorphism is easy: \(z \mapsto \frac{z}{1+|z|}\) is a homeomorphism of \(\Omega\) onto \(U\). But we need the Riemann mapping theorem to give the remaining part, when \(\Omega\) is a proper subset.

If you know the definition of a sheaf, you will realise that \((\mathbb{C},H(\cdot))\) is indeed a sheaf. For each open subset \(\Omega \subset \mathbb{C}\), \(H(\Omega)\) is a ring, or even more precisely, a \(\mathbb{C}\)-algebra. The exponential map \(\exp:g \mapsto e^g\), viewed as a map into the nowhere-zero holomorphic functions, is a sheaf morphism. However, we now see that it is surjective on sections over \(\Omega\) if and only if \(\Omega\) is simply connected. I hope this can help you figure out an exercise in algebraic geometry. You know, that celebrated book by Robin Hartshorne.

Since we haven't proved the Riemann mapping theorem, we cannot use the equivalence above yet. However, we can use 9 right away. This gives rise to Koebe's square root trick.

Equicontinuity is quite an important concept. You may have seen it in differential equations, harmonic functions, or maybe just sequences of functions. We will use it to describe families of functions for which almost uniform convergence can be well established.

Definition 2. Let \(\mathscr{F}\) be a family of functions \((X,d) \to \mathbb{C}\), where \((X,d)\) is a metric space.

- We say that \(\mathscr{F}\) is *equicontinuous* if, to every \(\varepsilon>0\), there corresponds a \(\delta>0\) such that whenever \(d(x,y)<\delta\), we have \(|f(x)-f(y)|<\varepsilon\) for all \(f \in \mathscr{F}\). In particular, by definition, all functions in \(\mathscr{F}\) are uniformly continuous.
- We say that \(\mathscr{F}\) is *pointwise bounded* if, to every \(x \in X\), there corresponds some \(0 \le M(x) < \infty\) such that \(|f(x)| \le M(x)\) for every \(f \in \mathscr{F}\).
- We say that \(\mathscr{F}\) is *uniformly bounded on each compact subset* if, to each compact \(K \subset X\), there corresponds a number \(M(K)\) such that \(|f(z)| \le M(K)\) for all \(f \in \mathscr{F}\) and \(z \in K\).

These concepts describe continuity and boundedness of a whole family at once. In our proof of the Riemann mapping theorem, we do not construct the map explicitly; instead, we will use the concepts above to obtain one as a limit. In this post we simply put \(X=\Omega \subset \mathbb{C}\), a simply connected region, with \(d\) the natural metric.

A famous result on equicontinuity is Arzelà-Ascoli, which says that pointwise boundedness and equicontinuity yield subsequences that converge almost uniformly.

Theorem 1 (Arzelà-Ascoli). Let \(\mathscr{F}\) be a family of complex functions on a metric space \(X\), which is pointwise bounded and equicontinuous, and suppose \(X\) is separable, i.e., it contains a countable dense set. Then every sequence \(\{f_n\}\) in \(\mathscr{F}\) has a subsequence that converges uniformly on every compact subset of \(X\).

Here is a self-contained proof.

Certainly it is OK to let \(X\) be a subset of \(\mathbb{R}\), \(\mathbb{C}\) or their product, which is why the theorem is so useful in real and complex analysis. We will need this almost uniform convergence to establish our conformal map. To specify its application in complex analysis, we introduce the concept of a normal family.

Definition 3. Suppose \(\mathscr{F} \subset H(\Omega)\), for some region \(\Omega \subset \mathbb{C}\). We call \(\mathscr{F}\) a *normal family* if every sequence of members of \(\mathscr{F}\) contains a subsequence which converges uniformly on every compact subset of \(\Omega\). The limit function is not required to be in \(\mathscr{F}\).

We now apply Arzelà-Ascoli to complex analysis.

Theorem 2 (Montel). Suppose \(\mathscr{F} \subset H(\Omega)\) is uniformly bounded on each compact subset of \(\Omega\); then \(\mathscr{F}\) is a normal family.

*Proof.* We need to show that the restriction of \(\mathscr{F}\) to each compact set is equicontinuous; since uniform boundedness clearly implies pointwise boundedness, we can then apply Arzelà-Ascoli.

Let \(\{K_n\}\) be a sequence of compact sets such that (1) \(\bigcup_n K_n = \Omega\) and (2) \(K_n \subset K^\circ_{n+1} \subset K_{n+1}\), where \(K^\circ_{n+1}\) is the interior of \(K_{n+1}\). Then there exists a positive number \(\delta_n\) such that for **every** \(z \in K_n\), \[D(z,2\delta_n) \subset K_{n+1}^\circ \subset K_{n+1},\] where \(D(a,r)\) is the disc centred at \(a\) with radius \(r\). If such \(\delta_n\) did not exist, then for each \(k\) there would be a point \(z_k \in K_{n}\) with \(D(z_k,1/k) \not\subset K_{n+1}^\circ\); by compactness, a subsequence of \(\{z_k\}\) converges to some \(z \in K_n\) whose every neighbourhood meets the complement of \(K_{n+1}^\circ\). But this is impossible because \(z\) lies in the open set \(K^\circ_{n+1}\) by definition.

For such \(\delta_n\), we pick \(z',z'' \in K_n\) such that \(|z'-z''| < \delta_n\). Let \(\gamma\) be the positively oriented circle with centre at \(z'\) and radius \(2\delta_n\), i.e. the boundary of \(D(z',2\delta_n)\). Recall that the Cauchy formula says \[f(z')=\frac{1}{2\pi{i}}\int_\gamma \frac{f(\zeta)}{\zeta-z'}\mathrm{d}\zeta.\] We will make use of this. By the formula above, we have \[\begin{aligned}f(z')-f(z'')&=\frac{1}{2\pi{i}}\int_\gamma f(\zeta)\left(\frac{1}{\zeta-z'}-\frac{1}{\zeta-z''} \right)\mathrm{d}\zeta \\ &=\frac{z'-z''}{2\pi{i}}\int_\gamma \frac{f(\zeta)}{(\zeta-z')(\zeta-z'')}\mathrm{d}\zeta.\end{aligned}\] Now we make use of our choice of \(z'\), \(z''\) and \(\gamma\). By definition, for \(\zeta \in \gamma^\ast\) (the range of \(\gamma\)), we have \(|\zeta-z'|=2\delta_n\). Since \(|z'-z''|<\delta_n\), we have \(2\delta_n=|\zeta-z'|=|\zeta-z''+z''-z'|\le |\zeta-z''|+|z''-z'|\), and therefore \(|\zeta-z''| \ge 2\delta_n-|z''-z'|>\delta_n\). Bearing this in mind, we see \[\begin{aligned}|f(z')-f(z'')| &\le \frac{|z'-z''|}{2\pi}\int_\gamma \frac{|f(\zeta)|}{|\zeta-z'||\zeta-z''|}|\mathrm{d}\zeta| \\ &< \frac{|z'-z''|}{2\pi}\cdot\frac{M(K_{n+1})}{2\delta_n\delta_n}\cdot 4\pi\delta_n \\ &= \frac{M(K_{n+1})}{\delta_n}|z'-z''|.\end{aligned}\] This may look confusing so we explain it a little more. Since \(D(z',2\delta_n) \subset K^\circ_{n+1}\), we must have \(\overline{D}(z',2\delta_n) \subset K_{n+1}\); therefore whenever \(\zeta \in \gamma^\ast=\partial D(z',2\delta_n)\), we have \(|f(\zeta)| \le M(K_{n+1})\). This is where we use the hypothesis of uniform boundedness. Besides, we have \(|(\zeta-z')(\zeta-z'')|>2\delta_n\delta_n\), so the norm of the integrand \(\frac{f(\zeta)}{(\zeta-z')(\zeta-z'')}\) is bounded by \(\frac{M(K_{n+1})}{2\delta_n^2}\).
The integral over \(\gamma\) is therefore bounded by \(\frac{M(K_{n+1})}{2\delta_n^2}\) times the length of \(\gamma\), which is \(4\pi\delta_n\), and the result follows.

What does this inequality imply? For \(\varepsilon>0\), if we pick \(\delta=\min\{\delta_n,\frac{\delta_n\varepsilon}{M(K_{n+1})}\}\), then \(|f(z')-f(z'')|<\varepsilon\) for every \(f \in \mathscr{F}\) whenever \(|z'-z''|<\delta\). That is, for each \(K_n\), the **restrictions** of the members of \(\mathscr{F}\) to \(K_n\) form an equicontinuous family.

Now consider a sequence \(\{f_j\}\) in \(\mathscr{F}\). For each \(n\), we apply the Arzelà-Ascoli theorem to the restriction of \(\mathscr{F}\) to \(K_n\), and it gives us an infinite subset \(S_n \subset \mathbb{N}\) such that \(f_j\) converges uniformly on \(K_n\) as \(j \to \infty\) with \(j \in S_n\). We can arrange \(S_{n+1} \subset S_n\): apply Arzelà-Ascoli on \(K_{n+1}\) to the subsequence indexed by \(S_n\) rather than to the whole sequence. Pick a new sequence \(s_1<s_2<\cdots\) where \(s_j \in S_j\); then \(\{f_{s_j}\}\) converges uniformly on every \(K_n\) and therefore on every compact subset \(K\) of \(\Omega\). The statement is now proved. \(\square\)

**Remarks.** We have no idea what the limit is, and the same happens in our proof of the Riemann mapping theorem as well.

The sequence \(\{K_n\}\) can be constructed explicitly, however. In fact, for every open set \(\Omega\) in the plane there is a sequence \(\{K_n\}\) of compact sets such that

- \(\bigcup_n K_n=\Omega\).
- \(K_n \subset K_{n+1}^\circ\).
- For every compact \(K \subset \Omega\), there is some \(n\) such that \(K \subset K_n\).
- Every component of \(S^2 \setminus K_n\) contains a component of \(S^2 \setminus \Omega\).

The sequence is constructed as follows and can be verified to satisfy what we want above. For each \(n\), define \[V_n = D(\infty,n) \cup \bigcup_{a \not\in \Omega}D(a,1/n).\] Then \(K_n=S^2 \setminus V_n\) is what we want.
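For a concrete instance, take \(\Omega=U\): then the distance from \(z\) to the complement is \(1-|z|\), and membership in \(K_n\) is easy to test numerically. A minimal sketch (the helper `in_K` is only illustrative):

```python
# Exhaustion for Omega = the open unit disc U. Here
# K_n = S^2 \ V_n = { z : |z| <= n and dist(z, C \ Omega) >= 1/n },
# and for Omega = U the distance to the complement is 1 - |z|.

def in_K(z: complex, n: int) -> bool:
    return abs(z) <= n and 1 - abs(z) >= 1 / n

samples = [0j, 0.5 + 0j, 0.7j, 0.9 + 0j, 0.999j]

# (1) K_n is contained in K_{n+1}: membership is monotone in n.
for z in samples:
    for n in range(1, 50):
        if in_K(z, n):
            assert in_K(z, n + 1)

# (2) The union of the K_n is Omega: every sample point of U
#     eventually lands in some K_n.
for z in samples:
    assert any(in_K(z, n) for n in range(1, 2000))
```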

The Schwarz lemma is another important tool for our proof of the Riemann mapping theorem. We need it to establish important inequalities, and the lemma as well as its variants shows the rigidity of holomorphic maps. We make use of the maximum modulus theorem. For simplicity, let \(H^\infty\) be the Banach space of bounded holomorphic functions on \(U\), equipped with the supremum norm \(\| \cdot \|_\infty\).

Theorem 3 (Schwarz lemma). Suppose \(f:U \to \mathbb{C}\) is a holomorphic map in \(H^\infty\) such that \(f(0)=0\) and \(\|f\|_\infty \le 1\). Then \[\begin{aligned}|f(z)| &\le |z| \quad (z \in U), \\|f'(0)| &\le 1;\end{aligned}\] on the other hand, if \(|f(z)|=|z|\) holds for some \(z \in U \setminus \{0\}\), or if \(|f'(0)|=1\) holds, then \(f(z)=\lambda{z}\) for some complex constant \(\lambda\) such that \(|\lambda|=1\).

*Proof.* Since \(f(0)=0\), \(f(z)/z\) has a removable singularity at \(z=0\). Hence there exists \(g \in H(U)\) such that \(f(z)=zg(z)\). Fix \(0<r<1\). For any \(z \in U\) such that \(|z|<r\), the maximum modulus principle gives \[|g(z)| \le \max_\theta\frac{|f(re^{i\theta})|}{|re^{i\theta}|} \le \frac{1}{r}.\] Letting \(r \to 1\), we see \(|g(z)| \le 1\) for all \(z \in U\), and therefore \(|f(z)| \le |z|\) follows. On the other hand, if \(|g(z)|=1\) at some point of \(U\), the maximum modulus principle forces \(g\) to be a constant, say \(\lambda\), from which it follows that \(|\lambda|=1\) and \(f(z)=\lambda{z}\). \(\square\)
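A quick numerical illustration, using a sample map of my own choosing that satisfies the hypotheses (it is not anything canonical):

```python
import cmath

# Sanity check of the Schwarz lemma on the sample map f(z) = (z^2 + z)/2,
# which is holomorphic on U with f(0) = 0 and |f| <= 1 there: we verify
# |f(z)| <= |z| on a grid of points in the unit disc.

def f(z: complex) -> complex:
    return (z * z + z) / 2

for r in (0.1, 0.5, 0.9, 0.99):
    for k in range(16):
        z = r * cmath.exp(2j * cmath.pi * k / 16)
        assert abs(f(z)) <= abs(z) + 1e-12
```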

There are many variants of the Schwarz lemma, and we will be using Schwarz-Pick.

Definition 4. For any \(\alpha \in U\), define \[\varphi_\alpha(z) = \frac{z-\alpha}{1-\overline\alpha z}.\]

This family is a subfamily of the Möbius transformations, but we are not paying very much attention to that right now. We need the fact that such a \(\varphi_\alpha\) is always a one-to-one mapping which carries \(S^1\) (the unit circle) onto \(S^1\), \(U\) onto \(U\), and \(\alpha\) to \(0\). This requires another application of the maximum modulus theorem. A direct computation shows that \[\varphi'_\alpha(0)=1-|\alpha|^2, \quad \varphi'_\alpha(\alpha)=\frac{1}{1-|\alpha|^2}.\]
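These properties can be checked numerically in a few lines (the helper names and sample value of \(\alpha\) are illustrative):

```python
import cmath

# Check of phi_alpha(z) = (z - alpha)/(1 - conj(alpha) z): it sends alpha
# to 0, maps the unit circle into itself, and its derivatives at 0 and at
# alpha are 1 - |alpha|^2 and 1/(1 - |alpha|^2) respectively.

def phi(alpha: complex, z: complex) -> complex:
    return (z - alpha) / (1 - alpha.conjugate() * z)

def dphi(alpha: complex, z: complex, h: float = 1e-6) -> complex:
    # central difference quotient, good enough for a sanity check
    return (phi(alpha, z + h) - phi(alpha, z - h)) / (2 * h)

alpha = 0.4 + 0.3j
a2 = abs(alpha) ** 2

assert abs(phi(alpha, alpha)) < 1e-12                      # phi_alpha(alpha) = 0
for k in range(8):                                         # |phi_alpha| = 1 on S^1
    z = cmath.exp(2j * cmath.pi * k / 8)
    assert abs(abs(phi(alpha, z)) - 1) < 1e-12
assert abs(dphi(alpha, 0) - (1 - a2)) < 1e-5               # phi'_alpha(0)
assert abs(dphi(alpha, alpha) - 1 / (1 - a2)) < 1e-5       # phi'_alpha(alpha)
```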

Theorem 4 (Schwarz-Pick lemma). Suppose \(\alpha,\beta \in U\), \(f \in H^\infty\) and \(\| f\|_\infty \le 1\), \(f(\alpha)=\beta\). Then \[|f'(\alpha)| \le \frac{1-|\beta|^2}{1-|\alpha|^2}.\]

*Proof.* Consider \[g=\varphi_\beta \circ f \circ \varphi_{-\alpha}.\] We see \(g \in H^\infty\) and \(\|g\|_\infty \le 1\). What's more important, \(g(0)=\varphi_\beta \circ f(\alpha)=\varphi_\beta(\beta)=0\). By the Schwarz lemma, \(|g'(0)| \le 1\). On the other hand, we see \[g'(0)=\varphi_\beta'(\beta)f'(\alpha)\varphi_{-\alpha}'(0)\] and therefore, since \(\varphi_\beta'(\beta)=\frac{1}{1-|\beta|^2}\) and \(\varphi_{-\alpha}'(0)=1-|\alpha|^2\), \[|f'(\alpha)| \le \frac{1-|\beta|^2}{1-|\alpha|^2}.\] In particular, equality holds if and only if \(g(z)=\lambda{z}\) for some constant \(\lambda\). If this is the case, then \[\varphi_\beta \circ f \circ \varphi_{-\alpha}(z)=\lambda{z} \implies f(z)=\varphi_{-\beta}(\lambda\varphi_\alpha(z)).\] The story could go on, but we halt here and return to the Riemann mapping theorem.

Each \(z \ne 0\) determines a *direction* from the origin, which can be described by \[A[z]=\frac{z}{|z|}.\] Let \(f:\Omega \to \mathbb{C}\) be a map. We say \(f\) *preserves angles* at \(z_0 \in \Omega\) if \[\lim_{r \to 0}e^{-i\theta}A[f(z_0+re^{i\theta})-f(z_0)]\] exists and is independent of \(\theta\).

Conformal mappings preserve angles in a reasonable way. A function \(f\) is **conformal** if it is holomorphic and \(f'(z) \ne 0\) everywhere. We have a theorem describing this, but it is pretty elementary so we are not including the proof in this post.

Theorem 5. Let \(f\) map a region \(\Omega\) into the plane. If \(f'(z_0)\) exists at some \(z_0 \in \Omega\) and \(f'(z_0) \ne 0\), then \(f\) preserves angles at \(z_0\). Conversely, if the differential \(Df\) exists and is different from \(0\) at \(z_0\), and if \(f\) preserves angles at \(z_0\), then \(f'(z_0)\) exists and is different from \(0\).

There is no confusion about \(f'(z_0)\). By differential \(Df\) we mean a linear map \(L:\mathbb{R}^2 \to \mathbb{R}^2\) such that, writing \(z_0=(x_0,y_0)\), we have \[f(x_0+x,y_0+y)=f(x_0,y_0)+L(x,y)+{\sqrt{x^2+y^2}}\eta(x,y)\] where \(\eta(x,y) \to 0\) as \(x \to 0\) and \(y \to 0\). To prove this, one can assume that \(z_0=f(z_0)=0\). When the differential exists, one writes \[f(z)=\alpha{z}+\beta\overline{z}+|z|\eta(z).\] We say that two regions \(\Omega_1\) and \(\Omega_2\) are **conformally equivalent** if there is a conformal one-to-one mapping of \(\Omega_1\) onto \(\Omega_2\). The Riemann mapping theorem states that

Theorem 6 (Riemann mapping theorem). Every proper simply connected region \(\Omega\) in the plane is conformally equivalent to the open unit disc \(U\).

As a famous example, the upper half-plane \(\mathbb{H}\) is conformally equivalent to \(U\) via the Cayley transform.

As one may expect, this theorem asserts that the study of a simply connected region \(\Omega\) can be reduced to \(U\) to some extent. But a conformal equivalence is not just about homeomorphism. If \(\varphi:\Omega_1 \to \Omega_2\) is a conformal one-to-one mapping, then \(\varphi^{-1}:\Omega_2 \to \Omega_1\) is also a conformal mapping. In the language of algebra, such a mapping \(\varphi\) **induces** a ring isomorphism \[\begin{aligned}\varphi^\ast:H(\Omega_2) &\to H(\Omega_1) \\ f &\mapsto f \circ \varphi\end{aligned}\] Therefore, the ring \(H(\Omega_2)\) is algebraically the same as \(H(\Omega_1)\). The Riemann mapping theorem also states that, if \(\Omega\) is a simply connected region, then \(H(\Omega) \cong H(U)\). From this we can exploit much more information on top of homeomorphism. One can also extend the story to \(S^2\), the Riemann sphere, but that's another story.

The proof is fairly technical, but it is a good chance to attest to our skill in complex analysis. The bread and butter of this proof is the following set: \[\Sigma = \{\psi \in H(\Omega):\psi(\Omega) \subset U,\ \psi\text{ is one-to-one}\}.\] Our goal is to prove that there is some \(\psi \in \Sigma\) such that \(\psi(\Omega)=U\). Note that once the non-emptiness is proved, since \(|\psi|<1\) uniformly, we see by Montel's theorem that \(\Sigma\) is a **normal family**.

Pick \(w_0 \in \mathbb{C} \setminus \Omega\). Then \(g(z)=z-w_0 \in H(\Omega)\) and, what is more important, \(\frac{1}{g} \in H(\Omega)\). By 9 of proposition 1, there exists \(\varphi \in H(\Omega)\) such that \(\varphi^2(z)=g(z)\), i.e., informally, \(\varphi(z)=\sqrt{z-w_0}\) in \(\Omega\). If \(\varphi(z_1)=\varphi(z_2)\), then \(\varphi(z_1)^2=\varphi(z_2)^2\), that is \(z_1-w_0=z_2-w_0\), and then \(z_1=z_2\). Therefore \(\varphi\) is one-to-one. On the other hand, if \(\varphi(z_1)=-\varphi(z_2)\), we still have \(\varphi^2(z_1)=\varphi^2(z_2)\) and hence \(z_1=z_2\); but then \(\varphi(z_1)=-\varphi(z_1)=0\), which is impossible since \(\varphi^2=g\) has no zero. So \(\varphi\) never takes a pair of opposite values: if \(a \in \varphi(\Omega)\) then \(-a \not\in \varphi(\Omega)\). This is Koebe's square root trick.

Since \(\varphi\) is an open mapping, there is an open disc \(D(a,r) \subset \varphi(\Omega)\), where \(a \in \varphi(\Omega)\), \(a \ne 0\) and \(0<r<|a|\). But by the arguments above we have \(-a \not\in \varphi(\Omega)\), and in fact \(D(-a,r) \cap \varphi(\Omega) = \varnothing\). For this reason, we can put \[\psi(z) = \frac{r}{a+\varphi(z)}.\] It follows that \[|\psi(z)| = \frac{r}{|\varphi(z)-(-a)|}< \frac{r}{r}=1\] and therefore \(\psi(\Omega) \subset U\). Since \(\varphi\) is one-to-one, \(\psi\) is one-to-one as well, and we deduce that \(\psi \in \Sigma\): this set is not empty.

**Remark.** You may have trouble believing that \(D(-a,r) \cap \varphi(\Omega)=\varnothing\). But pick any \(w \in D(-a,r) \cap \varphi(\Omega)\); we have some \(z' \in \Omega\) such that \(\varphi(z')=w\). We also have \(|-a-w|<r\), which implies \(|a-(-w)|=|a+w|=|-a-w|<r\), and therefore \(-w \in D(a,r) \subset \varphi(\Omega)\). There exists some \(z'' \in \Omega\) such that \(\varphi(z'')=-w\). By the square root trick, \(z'=z''\), hence \(-w=w=0\). It follows that \(|a|<r\), and this is a contradiction.

Since \(D(-a,r) \cap \varphi(\Omega)=\varnothing\) and \(\varphi(\Omega)\) is open, we have \(|\varphi(z)-(-a)|>r\) for all \(z \in \Omega\), and therefore \(|\psi(z)|<1\) is not a problem either.

If \(\psi \in \Sigma\) and \(\psi(\Omega) \subsetneqq U\), and \(z_0 \in \Omega\), then there exists a \(\psi_1 \in \Sigma\) such that \(|\psi_1'(z_0)|>|\psi'(z_0)|\).

This step shows that we can "enlarge" the range in some way.

For convenience we use the Möbius transformation \[\varphi_\alpha(z) = \frac{z-\alpha}{1-\overline{\alpha}z}.\] Pick \(\alpha \in U \setminus \psi(\Omega)\). Then \(\varphi_\alpha \circ \psi \in \Sigma\) and \(\varphi_\alpha \circ \psi\) has no zero in \(\Omega\). Hence there is some \(g \in H(\Omega)\) such that \[g^2=\varphi_\alpha \circ \psi.\] Since \(\varphi_\alpha \circ \psi\) is one-to-one, another application of Koebe's square root trick shows that \(g\) is one-to-one. Therefore we have \(g \in \Sigma\) as well. If \(\psi_1=\varphi_\beta \circ g\) where \(\beta=g(z_0)\), we have \(\psi_1 \in \Sigma\) (one-to-one). In particular, \(\psi_1(z_0)=0\).

By putting \(s(z)=z^2\), we have \[\begin{aligned}\psi(z)&=\varphi_{-\alpha} \circ g^2(z) \\ &= \varphi_{-\alpha} \circ s \circ g(z) \\ &= \varphi_{-\alpha} \circ s \circ \varphi_{-\beta} \circ \psi_1(z).\end{aligned}\] If we put \(F(z)=\varphi_{-\alpha} \circ s \circ \varphi_{-\beta}(z)\), then the chain rule shows that \[\psi'(z_0) = F'(0)\psi_1'(z_0).\] (Note we used the fact that \(\psi_1(z_0)=0\).) If we can prove that \(0<|F'(0)|<1\) then this step is complete. Note \(F\) satisfies the condition in the Schwarz-Pick lemma and therefore \[|F'(0)| \le \frac{1-|F(0)|^2}{1-0^2} \le 1.\] The first equality cannot hold because \(F\), containing the non-injective factor \(s\), is not of the form \(\varphi_{-\sigma}(\lambda\varphi_{\eta}(z))\) with \(|\lambda|=1\). On the other hand we have \[\begin{aligned}F(0) &= \varphi_{-\alpha}(g(z_0)^2) \\ &= \varphi_{-\alpha}(\varphi_\alpha\circ \psi(z_0)) \\ &= \psi(z_0) \in U.\end{aligned}\] Also \(F'(0) \ne 0\): the chain rule gives \(F'(0)=\varphi_{-\alpha}'(\beta^2)\cdot 2\beta\cdot\varphi_{-\beta}'(0)\), and \(\beta=g(z_0)\ne 0\), for otherwise \(\varphi_\alpha(\psi(z_0))=g(z_0)^2=0\) would force \(\psi(z_0)=\alpha \not\in \psi(\Omega)\). Therefore \(0<|F'(0)|<1\) and this step is complete.
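As a numerical sanity check of this step, for sample values of \(\alpha,\beta \in U\) (chosen here purely for illustration) the map \(F=\varphi_{-\alpha}\circ s\circ\varphi_{-\beta}\) indeed satisfies \(0<|F'(0)|<1\):

```python
# Sanity check: F = phi_{-alpha} o s o phi_{-beta} with s(z) = z^2
# should satisfy 0 < |F'(0)| < 1 for sample alpha, beta in U.

def phi(alpha: complex, z: complex) -> complex:
    return (z - alpha) / (1 - alpha.conjugate() * z)

def F(z: complex, alpha: complex, beta: complex) -> complex:
    return phi(-alpha, phi(-beta, z) ** 2)

def dF0(alpha: complex, beta: complex, h: float = 1e-6) -> complex:
    # central difference quotient at 0
    return (F(h, alpha, beta) - F(-h, alpha, beta)) / (2 * h)

for alpha, beta in [(0.5 + 0j, 0.3 + 0j), (0.2 + 0.4j, -0.1 + 0.5j)]:
    assert 0 < abs(dF0(alpha, beta)) < 1
```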

We take the contrapositive of step 2:

Fix \(z_0 \in \Omega\). If \(h \in \Sigma\) is an element such that \(|h'(z_0)| \ge |\psi'(z_0)|\) for all \(\psi \in \Sigma\), then \(h(\Omega)=U\).

The proof is complete once we have found such a function! To do this, we use the fact that \(\Sigma\) is a normal family. Put \[\eta = \sup\{|\psi'(z_0)|:\psi \in \Sigma\}.\] By definition of \(\eta\), there is a sequence \(\{\psi_n\}\) in \(\Sigma\) such that \(|\psi_n'(z_0)| \to \eta\). By normality of \(\Sigma\), we pick a subsequence \(\{\varphi_n\}\) that converges uniformly on compact subsets of \(\Omega\); let \(h \in H(\Omega)\) be the uniform limit. It follows that \(|h'(z_0)|=\eta\). Since \(\Sigma \ne \varnothing\) and members of \(\Sigma\) are one-to-one (so that \(\psi'(z_0) \ne 0\)), we have \(\eta \ne 0\) and \(h\) cannot be a constant. Since \(\varphi_n(\Omega) \subset U\), we must have \(h(\Omega) \subset \overline{U}\). But since \(h\) is non-constant and hence an open mapping, \(h(\Omega)\) is open, and we are reduced to \(h(\Omega) \subset U\).

It remains to show that \(h\) is one-to-one. Fix distinct \(z_1, z_2 \in \Omega\). Put \(\alpha=h(z_1)\) and \(\alpha_n=\varphi_n(z_1)\), then \(\alpha_n \to \alpha\). Let \(\overline{D}\) be a closed disc in \(\Omega\) centred at \(z_2\) with interior denoted by \(D\) such that

- \(z_1 \not\in \overline{D}\).
- \(h-\alpha\) has no zero point on the boundary of \(\overline{D}\).

We see \(\varphi_n -\alpha_n\) converges to \(h-\alpha\), uniformly on \(\overline{D}\). They have no zero in \(D\) since they are one-to-one and have a zero at \(z_1\). By Rouché's theorem, \(h-\alpha\) has no zero in \(D\) either, and in particular \(h(z_2)-\alpha = h(z_2)-h(z_1) \ne 0\). This completes the proof. \(\square\)

**Remark.** First of all, such a \(\overline{D}\) is accessible. This is because the zero points of \(h-\alpha\) have no limit point in \(\Omega\), i.e., they are discrete (when defining \(\overline{D}\), we don't know how many there are yet).

Our choice of \(\overline{D}\) enables us to use Rouché's theorem (in case you didn't catch where). Since \(h-\alpha\) has no zero on the boundary, and the boundary is compact, we have \(\zeta=\inf_{z \in \partial D}|h(z)-\alpha|>0\). When \(n\) is big enough, we have on \(\partial D\) \[|(h-\alpha)-(\varphi_n-\alpha_n)|<\zeta\le|h-\alpha|.\] The second inequality holds by the definition of \(\zeta\) as an infimum. Rouché's theorem applies here naturally as well. \(\square\)

This proof is a reproduction of W. Rudin's *Real and Complex Analysis*. For a comprehensive further reading, I highly recommend Tao's blog post.

In the previous post we were convinced that the Galois group of a separable irreducible polynomial \(f\) can be realised as a subgroup of the symmetric group, whose elements permute the roots of \(f\). We worked on cubic polynomials over a field with characteristic not equal to \(2\) and \(3\), and this definitely works with \(\mathbb{Q}\). In this post we go one step further.

Let \(f \in \mathbb{Q}[X]\) be an irreducible polynomial of prime degree \(p\). Since it is also separable (see Lemma 9.12.1 in the Stacks project), we can safely work on its Galois group \(G\). One immediately asks where \(G\) sits relative to \(\mathfrak{S}_p\). Indeed we have \(G \subset \mathfrak{S}_p\). The question is, when does the equality hold? This is not likely to have an immediate answer; however, we have some interesting sufficient conditions, which will be discussed in this post.

We present some handy results in finite group theory that will be used in the main result. One may skip this section until needed. I will collapse the proof in case one wants to treat it as an exercise.

Lemma 1. Let \(p\) be a prime number. The symmetric group \(\mathfrak{S}_p\) is generated by \([12 \cdots p]\) and an arbitrary transposition \([rs]\).

- It is generated by cycles. This is a really, really routine verification and sometimes assumed as a fact.
- It is generated by transpositions, i.e., \(2\)-cycles. It suffices to show that a cycle is a product of transpositions. Indeed, for any cycle \([i_1\dots i_k]\) in \(\mathfrak{S}_n\), we have \([i_1\cdots i_k]=[i_1i_2][i_2i_3]\cdots[i_{k-1}i_k]\). This proves our statement.
- It is generated by transpositions of the form \([1k]\). It suffices to show that every transposition is generated as such. For any transposition \([rs]\), we have \([rs]=[1r][1s][1r]\).
- It is generated by adjacent transpositions, i.e. the generators can be of the form \([k-1 ,k]\). This follows from the following identity:

\[[1k]=[12][23]\cdots[k-1,k][k-2,k-1]\cdots [23][12]\]

- It is generated by two elements: \(\sigma=[12]\) and \(\tau=[12\cdots n]\). This follows from the following identity:

\[\tau^{k-2}\sigma\tau^{-(k-2)}=[\tau^{k-2}(1)\tau^{k-2}(2)]=[k-1,k].\]
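Stepping aside for a moment: the lemma's full claim can also be brute-force checked for a small prime. A minimal sketch for \(p=5\) with the non-adjacent transposition \([24]\) (the helper names are illustrative):

```python
# Brute-force check of the lemma for p = 5: the cycle tau = [12345] together
# with the non-adjacent transposition sigma = [24] generates all of S_5,
# i.e. 5! = 120 permutations. Permutations of {0,...,4} are stored as tuples.

def compose(f, g):
    # (f o g)(i) = f(g(i))
    return tuple(f[g[i]] for i in range(len(f)))

tau = (1, 2, 3, 4, 0)        # the 5-cycle [12345], written on indices 0..4
sigma = (0, 3, 2, 1, 4)      # the transposition [24]: it swaps indices 1 and 3

identity = tuple(range(5))
generated = {identity}
frontier = [identity]
while frontier:              # closure under right multiplication by the generators
    p = frontier.pop()
    for g in (tau, sigma):
        q = compose(p, g)
        if q not in generated:
            generated.add(q)
            frontier.append(q)

assert len(generated) == 120
```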

Now, back to the case when \(n=p\) is prime. Put \(\sigma=[rs]\) and \(\tau=[12\cdots p]\). If \(s-r=1\) then the claim already follows from 5 by several conjugations. Therefore we may assume that \(d=s-r>1\). From now on, an integer may denote an element of either \(\mathbb{Z}\) or \(\mathbf{F}_p=\mathbb{Z}/p\mathbb{Z}\), depending on the context; recall that \(\mathbf{F}_p\) is a field. Pick the integer \(w\) such that \(dw=1\) in \(\mathbf{F}_p\). By conjugation by powers of \(\tau\), we see \(\tau\) and \(\sigma\) generate \[[1,1+d],[1+d,1+2d],\dots,[1+(w-1)d,1+wd].\] Conjugating repeatedly, \([1+d,1+2d][1,1+d][1+d,1+2d]=[1,1+2d]\), then conjugating this by \([1+2d,1+3d]\) gives \([1,1+3d]\), and so on until we reach \([1,1+wd]=[12]\). Therefore we are back to 5. \(\square\)

We have many good reasons to study the Galois group of *something*. It would be great if the group can be written down explicitly. In this section we show that the group can be revealed by the number of nonreal roots.

Proposition 1. Let \(f(X) \in \mathbb{Q}[X]\) be an irreducible polynomial of prime degree \(p\). If \(f\) has precisely two nonreal roots, then the Galois group \(G\) of \(f\) over \(\mathbb{Q}\) is \(\mathfrak{S}_p\).

*Proof.* Let \(L\) be the splitting field of \(f\). By Lemma 1, it suffices to show that \(G\) contains a transposition and the \(p\)-cycle \([12\cdots p]\). Since \(p=[\mathbb{Q}(\alpha):\mathbb{Q}]\) divides \([L:\mathbb{Q}]=|G|\) for any root \(\alpha\), Sylow's theorem gives a subgroup \(H\) of order \(p\), which can only be cyclic. Say \(H=\langle \sigma \rangle\). Suppose \(\sigma\) is of cycle type \((k_1,\dots,k_r)\). Then the order of \(\sigma\), which equals \(p\), is the least common multiple of \(k_1,\dots,k_r\), where \(k_1+\dots+k_r=p\). This can only happen when \(r=1\) and \(k_1=p\). Therefore \(\sigma\) is a \(p\)-cycle.

In fact, \(\sigma\) can be taken to be \([12\dots p]\). Suppose an ordering of the roots of \(f\) is given, with respect to which \(\sigma=[i_1 i_2 \dots i_p]\). If we re-order these roots, taking the \(k\)th root to be the original \(i_k\)th root, then we can write \(\sigma=[12\dots p]\). (This re-ordering is, in fact, a conjugation.)

It remains to prove that \(G\) contains a transposition. Let \(\alpha\) and \(\beta\) be the two nonreal roots of \(f\). Since \(\overline{\alpha}\) is also a root of \(f\) (because the coefficients of \(f\) are real: if \(\sum_{n=0}^{p}a_n\alpha^n=0\), then \(\sum_{n=0}^{p}a_n\overline{\alpha}^n=\sum_{n=0}^{p}\overline{a_n\alpha^n}=\overline{0}=0\)), we see \(\beta=\overline{\alpha}\). Therefore complex conjugation restricts to an automorphism of \(L\) of order \(2\); it swaps \(\alpha\) and \(\beta\) and fixes the real roots, hence is a transposition in \(G\). This proves our assertion. \(\square\)

For example, consider the polynomial \[f(X)=X^5-4X+2.\] With calculus one can show that it has exactly three real roots, hence it has two nonreal roots. Eisenstein's criterion shows that \(f\) is irreducible. Therefore we are allowed to use proposition 1: the Galois group of \(f\) is \(\mathfrak{S}_5\).
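Both hypotheses of the proposition can be double-checked with sympy:

```python
# Verify the hypotheses of Proposition 1 for f(X) = X^5 - 4X + 2:
# f is irreducible over Q and has exactly three real roots,
# hence exactly two nonreal roots.
from sympy import Poly, real_roots, symbols

X = symbols('X')
f = Poly(X**5 - 4*X + 2, X)

assert f.is_irreducible          # as Eisenstein's criterion at p = 2 predicts
assert len(real_roots(f)) == 3   # three real roots, so two nonreal ones
```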

This also works fine when \(p=2\) or \(3\). The case \(p=2\) is nothing but working with a quadratic polynomial. When \(f(X)\) is irreducible of degree \(3\) with two nonreal roots, write the roots as \(a+bi,a-bi,c\), where \(b \ne 0\) and \(c\) is irrational (an irreducible cubic has no rational root). We see \[\sqrt\Delta=2bi(c-a-bi)(c-a+bi)=2bi[(c-a)^2+b^2] \not \in \mathbb{Q},\] since the right-hand side is purely imaginary and nonzero. Therefore the Galois group is \(\mathfrak{S}_3\).

It is way too ambitious to restrict ourselves to one single pair of roots. Also, it seems we have ignored the alternating group \(\mathfrak{A}_p\) for no reason. Oz Ben-Shimol gave a nice way to work around this (see arXiv:0709.2868). The whole paper is not easy, but the result is pretty beautiful and generalises what we said above for \(p \ge 5\).

Proposition 2.Let \(f \in \mathbb{Q}[X]\) be an irreducible polynomial of prime degree \(p \ge 5\). Suppose that \(f\) has \(k>0\) pairs of nonreal roots. If \(p \ge 4k+1\), then the Galois group \(G\) is isomorphic to \(\mathfrak{A}_p\) or \(\mathfrak{S}_p\). If \(k\) is odd then \(G \cong \mathfrak{S}_p\).

The proof is done by showing that \(\mathfrak{A}_p \subset G \subset \mathfrak{S}_p\). As the index of \(\mathfrak{A}_p\) in \(\mathfrak{S}_p\) is \(2\), \(G\) can only be one of the two. The solvability of \(G\) also plays a role there.

Indeed, what we proved in "the simplest case" is nothing but the case \(k=1\): when \(p \ge 5\) we clearly have \(p \ge 4 \times 1+1\). This refines the result of A. Bialostocki and T. Shaska (see arXiv:math/0601397), where the inequality used to be \[p \ge k(k\log k+2\log k+3).\] When \(k\) is big enough, we have \(k(k\log{k}+2\log{k}+3) \ge 4k+1\), so Ben-Shimol's result is a refinement: \(p\) need not be that big. He also offered a refined algorithm to compute the Galois group, which we present below. Besides, computing \(4k+1\) is much easier than computing \(k^2\log{k}\) plus something.
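A quick numerical comparison of the two lower bounds (a sketch; we assume the logarithm is natural, and the names `old_bound`/`new_bound` are ours):

```python
# Compare the older bound k(k*log k + 2*log k + 3) of Bialostocki-Shaska
# with Ben-Shimol's 4k + 1, as lower bounds on the prime p.
from math import log

def old_bound(k):
    return k * (k * log(k) + 2 * log(k) + 3)

def new_bound(k):
    return 4 * k + 1

# For k = 1 the old bound (3) happens to be smaller, but from k = 2 on
# the old bound exceeds 4k + 1, so the new condition admits more primes p.
assert old_bound(1) < new_bound(1)
assert all(old_bound(k) > new_bound(k) for k in range(2, 50))
```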

Input: an irreducible polynomial \(f(X)\) over \(\mathbb{Q}\) with prime degree \(p \ge 5\). (The remaining steps of the algorithm listing are omitted.)

Here, \(\Delta(f)\) is the discriminant of \(f\). We have seen that whether \(\Delta\) is a perfect square matters a lot. The discussion of `ReductionMethod` can be found in Oz Ben-Shimol's paper.

Let \(k\) be an arbitrary field and suppose \(f(X) \in k[X]\) is separable, i.e., \(f\) has no multiple roots in an algebraic closure, and of degree \(\ge 1\). Let \[f(X)=(X-x_1)\cdots(X-x_n)\] be its factorisation in a splitting field \(L\). Put \(G=G(L/k)\). We say that \(G\) is the Galois group of \(f\) over \(k\). Let \(x_i\) be a root of \(f\) and pick any \(\sigma \in G\). By the definition of the Galois group, we see \(\sigma(x_i)\) is still a root of \(f\) (consider the map \(\tilde\sigma:L[X] \to L[X]\) induced by \(\sigma\) naturally; it is the identity when restricted to \(k[X]\)). This is to say, elements of \(G\) permute the roots of \(f\).

For example, consider \(L=\mathbb{C}\), \(k=\mathbb{R}\), \(f(X)=X^2+1\). The Galois group \(G\) contains two elements and is generated by complex conjugation \(\sigma:a+bi \mapsto a-bi\). A root of \(f\) is \(i\), and \(\sigma(i)=-i\) is another root.

Based on this fact, we can consider \(G\) as a subgroup of \(\mathfrak{S}_n\), where \(n\) is the degree of \(f\). The structure of \(\mathfrak{S}_n\) can be extremely complicated, but for now we assume it is well understood. The question is: which subgroup of \(\mathfrak{S}_n\) is \(G\)? Let's take a look at the case \(n=3\).

To begin with we note that we can assume that the quadratic term is \(0\). Let \(f(X)=X^3+aX^2+bX+c\) be a polynomial, then \[\begin{aligned}f\left(X-\frac{a}{3}\right) &= \left( X-\frac{a}{3}\right)^3+a\left( X-\frac{a}{3}\right)^2+b\left( X-\frac{a}{3}\right)+c \\&= X^3-aX^2+\frac{a^2}{3}X-\frac{a^3}{27} + aX^2-\frac{2a^2}{3}X+\frac{a^3}{9}+\cdots\end{aligned}\] and as a result \(aX^2\) is cancelled. A translation does not change any property of a polynomial except the values of its roots. Therefore we can reduce our study to polynomials in the depressed form \[f(X)=X^3+aX+b.\] In fact, for any \(g(X)=X^n+a_{n-1}X^{n-1}+\dots+a_0\), we can cancel out \(a_{n-1}X^{n-1}\) by the substitution \(X \mapsto X-\frac{a_{n-1}}{n}\).
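One can let sympy confirm that the substitution kills the quadratic term:

```python
# Check symbolically that substituting X -> X - a/3 removes the quadratic
# term of X^3 + aX^2 + bX + c.
from sympy import expand, symbols

X, a, b, c = symbols('X a b c')
f = X**3 + a*X**2 + b*X + c
g = expand(f.subs(X, X - a/3))

assert g.coeff(X, 2) == 0                            # quadratic term is gone
assert (g.coeff(X, 1) - (b - a**2/3)).expand() == 0  # new linear coefficient
```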

Now back to our main story. First of all we study irreducibility. If \(f\) is irreducible, then clearly it has no root in \(K\). On the other hand, if \(f\) has no root in \(K\), does that mean \(f\) is irreducible over \(K\)? This does not hold for general polynomials. For example, the polynomial \(g(X)=(X^2+1)^2\) is not irreducible yet it has no root in \(\mathbb{R}\) or \(\mathbb{Q}\). But fortunately, \(3\) is a beautiful number and we can proceed. Were \(f\) reducible, there would be a factorisation \[f(X)=p_1(X)p_2(X)\] with each \(p_i(X)\) being a proper factor of \(f(X)\). Since \(\deg f = 3\), at least one of the \(p_i(X)\) has degree \(1\), hence \(f\) has a root in \(K\). A contradiction. We therefore have a result as follows:

Proposition 1.Let \(f(X)\) be a cubic polynomial in \(K[X]\), where \(\operatorname{char}K=0\) or \(\operatorname{char}K \ge 5\). Then \(f\) is irreducible over \(K\) if and only if \(f\) has no root in \(K\).

Notation being as above, we assume that \(f\) is irreducible. Let \(L\) be the splitting field of \(f\). We claim that \(f\) is separable. Before proving the claim, one should notice that the characteristic matters a lot. For example, \(X^3-2\) is irreducible over \(\mathbb{Q}\), but in \(\mathbf{F}_3[X]\) we have \(X^3-2=(X+1)^3\), a triple root.

\(f\) is separable if and only if \(\gcd(f,f')=1\). The derivative of \(f\), now that \(f\) is in depressed form, is given by \[f'(X)=3X^2+a.\] It does not reduce to the constant \(a\), because the characteristic of \(K\) is not \(3\). We will show carefully that \(f(X)\) is separable by working on these two polynomials.

The first question is about the values of \(a\) and \(b\); if one of them is \(0\), things may be easier or harder. Note first that we must have \(b \ne 0\): otherwise \(f(X)=X(X^2+a)\), which is not irreducible. If \(a=0\), then \(f(X)=X^3+b\) and \(f'(X)=3X^2 \ne 0\) because \(\operatorname{char}K \ne 3\). It follows that \(\gcd(f,f')=1\), because \(X \nmid X^3+b\) (as \(b \ne 0\)).

Now only the most general case remains: \(a \ne 0\) and \(b \ne 0\). This is where the Euclidean algorithm kicks in. Recall that for any three polynomials \(p,q,r\) in \(K[X]\), we have \[\gcd(p,q)=\gcd(q,p)=\gcd(q,p+rq).\] This is how the Euclidean algorithm works. Note we can write \[f(X)=\frac{1}{3}Xf'(X)+\underbrace{\frac{2}{3}aX+b}_{r_0(X)}.\] It follows that \(\gcd(f,f')=\gcd(f',r_0)\). We next work on \(f'\) and \(r_0\): \[f'(X)=\frac{9}{2a}X\left(\frac{2}{3}aX+b\right)+\underbrace{\left(-\frac{9b}{2a}X+a\right)}_{r_1(X)}.\] Dividing \(r_1\) into \(r_0\) leaves a nonzero constant remainder: the two linear polynomials are proportional only if \(4a^3+27b^2=0\), in which case \(\gcd(f,f')\) would be a nonconstant proper factor of \(f\), contradicting irreducibility. Hence \(\gcd(r_0,r_1)=1\). Whichever the case is, we have \(\gcd(f,f')=1\) and therefore \(f\) is separable. Note that the fact that the characteristic of \(K\) is neither \(2\) nor \(3\) is frequently used here; otherwise many of these equations would make no sense.
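These Euclidean steps can be replayed with sympy on the concrete cubic \(X^3-X-1\) that appears later in this post (a sketch):

```python
# Run the Euclidean algorithm on f = X^3 - X - 1 and its derivative,
# confirming gcd(f, f') = 1, i.e. f is separable.
from sympy import Poly, gcd, rem, symbols

X = symbols('X')
f = Poly(X**3 - X - 1, X, domain='QQ')
df = f.diff(X)                       # f'(X) = 3X^2 - 1

r0 = rem(f, df)                      # first Euclidean remainder, degree 1
r1 = rem(df, r0)                     # second remainder, a nonzero constant
assert r0.degree() == 1
assert r1.degree() == 0 and not r1.is_zero
assert gcd(f, df).degree() == 0      # the gcd is a (nonzero) constant
```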

Where are we now? We wanted to ensure that \(f\) is separable, so that working with the Galois group of \(f\) is not that troublesome. And \(f\) is. We now return to the study of the Galois group \(G=G(L/K)\), where \(L\) is the splitting field of \(f\). Let \(\alpha_1\), \(\alpha_2\), \(\alpha_3\) be the roots of \(f\) and pick one of them as \(\alpha\). We see \([K(\alpha):K]=3\).

Since \(G\) permutes three elements, \(G\) has to be a subgroup of \(\mathfrak{S}_3\). Moreover \(|G|=[L:K] \ge [K(\alpha):K]=3\), which implies that \(|G|=3\) or \(6\). In the first case, \(G=\mathfrak{A}_3\), the alternating group. In the second case, \(G=\mathfrak{S}_3\), and \(K(\alpha)\) is not normal over \(K\): the irreducible polynomial \(f(X) \in K[X]\) has a root in \(K(\alpha)\) but does not split into linear factors in \(K(\alpha)\), violating the definition of a normal extension.

The question now is: when is \(G\) equal to \(\mathfrak{S}_3\), and when to \(\mathfrak{A}_3\)? We get a good chance to review finite group theory. The answer lies in the signs of elements of \(G\). To be precise, \(G=\mathfrak{S}_3\) if and only if \(G\) contains an odd permutation; if not, then \(G=\mathfrak{A}_3\). To work with this, recall how the sign function works. Put \[\delta=(\alpha_1-\alpha_2)(\alpha_2-\alpha_3)(\alpha_3-\alpha_1).\] For any \(\sigma \in G\), we have \(\sigma(\delta)=\varepsilon(\sigma)\delta\), where \(\varepsilon(\sigma)\) is the sign of \(\sigma\). If we put \(\Delta=\delta^2\), which is the discriminant, we see \(\sigma(\Delta)=\Delta\). Therefore \(\Delta \in L^G=K\). But since \(\sigma(\delta)=\pm\delta\), the sign is not guaranteed, and \(\delta\) itself need not lie in \(K\). This is where we crack the problem.

If \(\delta \in K\), or more precisely, \(\sqrt\Delta \in K\), then \(\sigma(\delta)=\delta\) and it follows that \(\varepsilon(\sigma)=1\) for all \(\sigma \in G\). This can only happen if \(G=\mathfrak{A}_3\).

If \(\sqrt\Delta \not\in K\), then \(\delta\) is not fixed by \(G\). There is some \(\sigma \in G\) such that \(\sigma(\delta)=-\delta\), which is to say that \(\varepsilon(\sigma)=-1\). This can only happen when \(G=\mathfrak{S}_3\).

We have the following conclusion.

Proposition 2.Notation being as above, assume that \(f\) is irreducible. Then the Galois group of \(f\) is \(\mathfrak{S}_3\) if and only if \(\sqrt\Delta \not\in K\), and it is \(\mathfrak{A}_3\) if and only if \(\sqrt\Delta \in K\).

A dirty calculation shows that \(\Delta=-4a^3-27b^2\); one can verify this using Vieta's formulas. This should not feel strange: in the quadratic case we have \(\Delta=b^2-4ac\), and we did care whether \(\Delta>0\), which amounts to asking whether \(\sqrt\Delta \in \mathbb{R}\).

Let's conclude this post with a handy but nontrivial example. Consider \[f(X)=X^3-X-1.\] The discriminant is \(-4 \cdot(-1)^3-27 \cdot (-1)^2=-23\), whose square root lies in \(\mathbb{Q}(\sqrt{-23})\); therefore the Galois group over that field is \(\mathfrak{A}_3\). However, over a smaller base field such as \(\mathbb{Q}\), where \(\sqrt{-23}\) is missing, the Galois group is \(\mathfrak{S}_3\).
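Both the general discriminant formula and this example can be checked with sympy:

```python
# Confirm the "dirty calculation" Delta = -4a^3 - 27b^2 for X^3 + aX + b,
# and evaluate it for the example f(X) = X^3 - X - 1.
from sympy import discriminant, expand, symbols

X, a, b = symbols('X a b')
Delta = discriminant(X**3 + a*X + b, X)

assert expand(Delta - (-4*a**3 - 27*b**2)) == 0   # the general formula
assert Delta.subs({a: -1, b: -1}) == -23          # the example above
```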

The method is presented by Artin; we will be actively using Sylow theory. Recall that for a finite group \(G\), if \(p\) is a prime dividing \(|G|\), then \(G\) has a \(p\)-Sylow subgroup. We do not care about the *other* \(p\)-Sylow subgroups here. One should also recall that a \(p\)-group \(H\) is always solvable. If \(|H|>1\), then \(H\) admits a nontrivial centre. If \(|H|=p^n\), then there is a sequence of subgroups \[\{e\}=H_0 \subset H_1 \subset \cdots \subset H_n=H\] where \(H_{i}\) is normal in \(H\) for all \(i=0,\dots,n\) and \(H_{i+1}/H_i\) is cyclic of order \(p\). This is to say, \(|H_i|=p^i\).

On the other hand, we also make use of analysis (which was Gauss's idea). For every \(a>0\), there is a square root \(\sqrt{a}>0\); in other words, we have a positive root of the equation \(X^2-a=0\). Also, every polynomial \(f(X) \in \mathbb{R}[X]\) of odd degree has a root in \(\mathbb{R}\), by the intermediate value theorem. This is to say, such an \(f(X)\) is *not* irreducible over \(\mathbb{R}\) unless \(\deg f=1\).

Next we take a look at \(\mathbb{C}=\mathbb{R}(i)\), where \(i\) is the imaginary unit or, algebraically speaking, a root of \(g(X)=X^2+1\). Note that every \(z \in \mathbb{C}\) has a square root. If we write \(z=a+bi\) with \(b \ne 0\), then \[c=\sqrt{\frac{|z|+a}{2}}, \quad d = \frac{b}{|b|}\sqrt\frac{|z|-a}{2}\] gives rise to \((c+di)^2=a+bi\). It follows that every polynomial \(f(X) \in \mathbb{C}[X]\) of degree \(2\) has a root (if this is not very obvious, complete the square), hence is *not* irreducible. With this being said, \(\mathbb{C}\) has no extension of degree \(2\): if \([E:\mathbb{C}]=2\), then \(E=\mathbb{C}[X]/(p(X))\) with \(p(X)\) irreducible of degree \(2\), which is absurd already.
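A quick numerical sanity check of the square-root formula (for \(b \ne 0\); the function name `complex_sqrt` is ours):

```python
# The explicit square-root formula: for z = a + bi with b != 0,
# c = sqrt((|z|+a)/2) and d = sign(b)*sqrt((|z|-a)/2) satisfy (c+di)^2 = z.
import cmath
import math

def complex_sqrt(z: complex) -> complex:
    a, b = z.real, z.imag
    r = abs(z)
    c = math.sqrt((r + a) / 2)
    d = math.copysign(math.sqrt((r - a) / 2), b)
    return complex(c, d)

w = complex_sqrt(3 + 4j)
assert cmath.isclose(w * w, 3 + 4j)   # indeed (2+1j)^2 = 3+4j
```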

We also need part of the following lemma on field extensions. In brief, a finite separable extension is contained in a *minimal* Galois extension.

Lemma.Let \(E/F\) be a finite separable extension. Then \(E\) is contained in an extension \(K\) such that \(K/F\) is Galois. It is minimal in the sense that, in a fixed algebraic closure \(K^\mathrm{a}\) of \(K\), any other Galois extension \(L\) of \(F\) containing \(E\) must contain \(K\) as well. We have the following tower: \[F \subset E \subset K \subset L \subset K^\mathrm{a}.\]

*Proof.* First of all, we can find a finite Galois extension of \(F\) containing \(E\): for example, the composite of the splitting fields of the minimal polynomials of a basis of \(E\) as an \(F\)-vector space. The intersection of all Galois extensions of \(F\) containing \(E\) (inside \(K^\mathrm{a}\)) is exactly what we want. \(\square\)

Theorem (Fundamental theorem of algebra).The complex field \(\mathbb{C}\) is algebraically closed.

The following proof focuses on algebra and tries its best to avoid analysis. If you are a fan of analysis, you can dive into complex analysis and use the maximum modulus theorem to study a polynomial. Or, you can study the behaviour of \(\frac{1}{f(z)}\) where \(f\) is a polynomial: if \(f\) had no root, then \(\frac{1}{f}\) would be a bounded entire function, hence constant by Liouville's theorem.

*Proof.* Let's first make it a problem of Galois theory. Since \(\mathbb{R} \supset \mathbb{Q}\), it is of characteristic \(0\) (hence perfect) and every finite extension of it is separable. In particular, \(\mathbb{C}/\mathbb{R}\) is finite and separable. Let \(L/\mathbb{C}\) be a finite extension. Then \(L/\mathbb{R}\) is still a finite separable extension, since both the class of finite extensions and the class of separable extensions are distinguished.

Applying the lemma above, we can find a finite Galois extension \(K/\mathbb{R}\) containing \(L\). We need to prove that \(K=\mathbb{C}\).

Put \(G=G(K/\mathbb{R})\). We want to show that \(|G|=2\); then \([K:\mathbb{R}]=2=[\mathbb{C}:\mathbb{R}]\) together with \(K \supset \mathbb{C}\) forces \(K=\mathbb{C}\), and our result follows immediately. To do this, we first show that \(|G|\) is a power of \(2\). Let \(H \subset G\) be a \(2\)-Sylow subgroup of \(G\), say \(|H|=2^n\) and \(|G|=2^nm\) with \(m\) odd. Now we use the Galois correspondence. Put \(F=K^H\). We see \(K/F\) is Galois and \([K:F]=2^n\). It follows that \([F:\mathbb{R}]=m\). We claim that \(m=1\).

Indeed, \(F/\mathbb{R}\) is separable, so we may apply the primitive element theorem to obtain \(F=\mathbb{R}(\alpha)\). Then \(\alpha\) is a root of an irreducible polynomial in \(\mathbb{R}[X]\) of degree \(m\). But \(m\) is odd, and every polynomial of odd degree over \(\mathbb{R}\) has a real root, so we must have \(m=1\).

Therefore \(G=H\) is a \(2\)-group. Since a Galois extension remains Galois under lifting, \(K/\mathbb{C}\) is Galois. Let \(G_1=G(K/\mathbb{C}) \subset G\) be its Galois group. We next claim that \(G_1\) is trivial. If not then, being a \(2\)-group, it has a subgroup \(G_2\) of index \(2\). Put \(F'=K^{G_2}\); then \(G(F'/\mathbb{C}) \cong G_1/G_2 \cong \mathbb{Z}/2\mathbb{Z}\), so \([F':\mathbb{C}]=2\). However, as mentioned above, \(\mathbb{C}\) has no extension of degree \(2\). This contradiction implies that \(G_1\) is trivial and therefore \(K=\mathbb{C}\). \(\square\)

Why did we have to prove that \(K=\mathbb{C}\)? If you didn't get it, let me remind you that a Galois extension is, by definition, an **algebraic** extension which is normal and separable.

Let \(G\) be a finite group and \(R\) a commutative ring. The *group algebra* of \(G\) over \(R\) is denoted by \(R[G]\); it is first of all an algebra over \(R\), with basis \(e_s\) for \(s \in G\). The product on \(R[G]\) is given by

\[e_s e_t = e_{st},\quad \forall s,t \in G.\]

With this being said, given \(u=\sum_{s \in G}a_se_s\) and \(v=\sum_{t \in G}b_te_t\), we have

\[uv = \sum_{s \in G}\sum_{t \in G}a_sb_te_{st}.\]

For example, take \(G=C_3=\{1,x,x^2\}\), the cyclic group of three elements. If \(u=a_1e_1+a_xe_x\) and \(v=b_xe_x+b_{x^2}e_{x^2}\), then

\[uv = a_xb_{x^2}e_1+a_1b_xe_x+(a_1b_{x^2}+a_xb_x)e_{x^2}.\]
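This multiplication is easy to model in code. Here is a minimal dict-based sketch of \(R[C_3]\) (the helper `mult` is ours), reproducing the product above:

```python
# A minimal model of R[C_3]: keys are exponents of x (mod 3), values are
# coefficients; multiplication implements e_s e_t = e_{st}.
def mult(u, v, n=3):
    w = {k: 0 for k in range(n)}
    for s, us in u.items():
        for t, vt in v.items():
            w[(s + t) % n] += us * vt
    return w

# u = a_1 e_1 + a_x e_x and v = b_x e_x + b_{x^2} e_{x^2}, sample coefficients
a1, ax, bx, bx2 = 2, 3, 5, 7
uv = mult({0: a1, 1: ax}, {1: bx, 2: bx2})

# matches a_x b_{x^2} e_1 + a_1 b_x e_x + (a_1 b_{x^2} + a_x b_x) e_{x^2}
assert uv == {0: ax * bx2, 1: a1 * bx, 2: a1 * bx2 + ax * bx}
```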

As one will notice, the structure of this algebra is determined by both \(G\) and \(R\), although we don't know yet what can happen. If we take \(R=\mathbb{C}\), then everything is very *simple*, and a lot of elementary linear algebra can be recovered here. That is part of the mission of this blog post. Before we dive in, we need to look at group algebras in a general setting first. It is not often that group algebras and representation theory are treated together, but let's try it. While the majority of this post is (non-commutative) ring theory and module theory, we encourage the reader to use representation theory as a source of examples. Standalone examples may drive us too far, and we may not have enough space for them.

First of all, we list some very obvious facts that do not even need proof.

- \(R[G]\) is a free \(R\)-module of rank \(|G|\).

- \(R[G]\) is itself a ring. The commutativity of \(R[G]\) is determined by that of \(G\).

However, one fact is easy to overlook:

Proposition 1.If \(|G|>1\), then \(R[G]\) is nota division ring.

*Proof.* Pick \(g \in G\) that is not the identity and let \(m\) be the order of \(g\). Then \(e_1-e_g\) is a zero-divisor, because \[(e_1-e_g)(e_1+e_g+\cdots+e_{g^{m-1}})=e_1-e_{g^m}=e_1-e_1=0,\] while both factors are nonzero.

But in a division ring, there is no zero-divisor. \(\square\)
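The telescoping in the proof can be checked numerically in a small model of \(R[C_m]\) (the helper `mult` is ours):

```python
# Telescoping check of the zero-divisor identity in R[C_m]:
# (e_1 - e_g)(e_1 + e_g + ... + e_{g^{m-1}}) = e_1 - e_{g^m} = 0.
def mult(u, v, m):
    w = {k: 0 for k in range(m)}
    for s, us in u.items():
        for t, vt in v.items():
            w[(s + t) % m] += us * vt
    return w

m = 6
u = {0: 1, 1: -1}                 # e_1 - e_g, nonzero
v = {k: 1 for k in range(m)}      # e_1 + e_g + ... + e_{g^{m-1}}, nonzero

assert mult(u, v, m) == {k: 0 for k in range(m)}   # their product vanishes
```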

As a ring, we certainly can consider modules over \(R[G]\), which brings us the following section.

Let \(R\) be a ring (not assumed to be commutative here). A nonzero \(R\)-module \(E\) is called **simple** if it has no nontrivial submodule. This may remind you of irreducible or simple representations of a group. We will see the connection later. Following the definition, we immediately have a special version of Schur's lemma:

Proposition 2 (Schur's Lemma).Let \(E,F\) be two simple \(R\)-modules. Every nontrivial homomorphism \(f:E \to F\) is an isomorphism.

*Proof.* Note \(\ker{f}\) and \(f(E)\) are submodules of \(E\) and \(F\) respectively. Since \(f\) is nontrivial and \(E,F\) are simple, we have \(\ker{f}=0\) and \(f(E)=F\), which is to say that \(f\) is an isomorphism. \(\square\)

Corollary 1.If \(E\) is a simple \(R\)-module, then \(\operatorname{End}_R(E)\) is a division ring.

*Proof.* If \(f:E \to E\) is nontrivial, then according to Schur's lemma, it has an inverse. \(\square\)

This definitely reminds you of irreducible representations. But not every representation is irreducible, and likewise not every module is simple. Recall Maschke's theorem in representation theory: *every representation of a finite group over \(\mathbb{C}\) having positive dimension is completely reducible.* For modules, we have a similar statement.

Definition-Proposition 3.Let \(E\) be an \(R\)-module. Then the following three conditions are equivalent:

SS 1.\(E\) is a sum of simple \(R\)-modules.

SS 2.\(E\) is a direct sum of simple \(R\)-modules.

SS 3.For every submodule \(E'\) of \(E\), there is another submodule \(F\) such that \(E = E' \oplus F\), i.e. every submodule is a direct summand.

If \(E\) satisfies the three conditions above, then \(E\) is called **semisimple**. A ring \(R\) is semisimple if it is a semisimple module over itself.

*Proof.* Assume **SS 1**, say we have \(E=\sum_{i \in I}E_i\) with each \(E_i\) simple. Let \(J\) be a maximal subset of \(I\) such that \(E_0=\sum_{j \in J}E_j\) is a direct sum (such a \(J\) exists by Zorn's lemma). Pick any \(i \in I\). Then \(E_i \cap E_0\) is a submodule of \(E_i\), which can either be \(0\) or \(E_i\). If \(E_i \cap E_0 = E_i\) then \(E_i \subset E_0\). If the intersection were \(0\), however, the sum \(E_0+E_i\) would be direct, so \(J \cup\{i\} \supsetneq J\) would be a larger subset of \(I\) yielding a direct sum, contradicting the maximality of \(J\). Hence \(E_i \subset E_0\) holds for all \(i \in I\), i.e. \(E_0 = E\), which proves **SS 2**.

Next we assume **SS 2**, so \(E = \bigoplus_{i \in I}E_i\). Pick any submodule \(E' \subset E\). Let \(J\) be a maximal subset of \(I\) such that the sum \(E_0=E'+\sum_{j \in J}E_j\) is direct. In the same manner we see \(E_i \subset E_0\) for all \(i \in I\), so \(E_0=E\) and \(F=\bigoplus_{j \in J}E_j\) is the desired complement of \(E'\). This proves **SS 3**.

Finally we assume **SS 3**. Let \(E_0=\sum_{i \in I}E_i\) be the sum of all simple submodules of \(E\). Then there is a submodule \(F\) of \(E\) such that \(E=E_0 \oplus F\). Assume \(F \ne 0\); then \(F\) has a simple submodule, which contradicts the definition of \(E_0\). Hence \(F=0\) and \(E_0=E\). The reason why a nontrivial \(F\) must have a simple submodule is contained in the following lemma. \(\square\)

Lemma 4.Let \(E\) be an \(R\)-module satisfying **SS 3**; then every nontrivial submodule \(F\) has a simple submodule.

*Proof.* It suffices to show that every nonzero cyclic submodule \(Rv\) has a simple submodule. Indeed, for any \(F \ne 0\), we pick a nonzero \(v \in F\); then \(Rv \subset F\).

Let \(L\) be the kernel of the morphism

\[\begin{aligned}R &\to Rv \\a &\mapsto av.\end{aligned}\]

Then \(L\) is a left ideal, which is proper since \(v \ne 0\), hence contained in a maximal left ideal \(M\) of \(R\). It follows that \(Mv\) is a maximal submodule of \(Rv\), because \(M/L\) is a maximal submodule of \(R/L\) and we have the following isomorphism

\[R/L \cong Rv.\]

By **SS 3**, we can find a submodule \(M'\) such that

\[E = Mv \oplus M'\]

which gives

\[Rv = E \cap Rv = (Mv \cap Rv) \oplus (M' \cap Rv)=Mv \oplus (M' \cap Rv).\]

We claim that \(M' \cap Rv\) is simple, which finishes the proof. Pick any proper submodule \(E' \subset M' \cap Rv\); then \(Mv \oplus E'\) is a submodule of \(Rv\) containing \(Mv\), which has to be \(Mv\) by the maximality of \(Mv\) (it cannot be \(Rv\), since \(E' \ne M' \cap Rv\)), forcing \(E'=0\). Hence \(M' \cap Rv\) is a simple submodule of \(Rv\). This proves our statement. \(\square\)

Proposition 5.Let \(E\) be a semisimple \(R\)-module, then every nontrivial submodule and quotient module of \(E\) is semisimple.

*Proof.* Write \(E=\bigoplus_{i \in I}E_i\) with each \(E_i\) simple (**SS 2**) and pick a nontrivial submodule \(F\) of \(E\). Let \(J\) be a maximal subset of \(I\) such that

\[F + \bigoplus_{j \in J}E_j\]

is direct. Then the direct sum is actually \(E\). Therefore \(F \cong E\big/\bigoplus_{j \in J}E_j \cong \bigoplus_{k \in K}E_k\) where \(K = I \setminus J\), so \(F\) is semisimple. For quotients, write \(E=F \oplus F'\) by **SS 3**; then \(E/F \cong F'\), which is a submodule of \(E\) and hence semisimple. \(\square\)

Corollary 6.\(R\) is a semisimple ring if and only if every \(R\)-module is semisimple.

*Proof.* By the universal property of free modules, every \(R\)-module is a factor module of a free \(R\)-module, while a free \(R\)-module is a direct sum of some copies of \(R\). Hence if \(R\) is semisimple then every \(R\)-module is semisimple. Conversely, if every \(R\)-module is semisimple, then \(R\) is semisimple because it is a left module over itself. \(\square\)

Let \(R\) be a ring. We say it is a finite dimensional algebra if it is also a finite dimensional vector space over some field \(K\) (with \(K\)-bilinear multiplication). In this subsection we study the Jacobson radical \(J(R)=\bigcap\{\text{maximal left ideals of }R\}\), which will be used in the next section.

We summarise what we want to prove in the following proposition.

Proposition 7 (Jacobson Radical).Let \(R\) be a ring (not necessarily commutative) and \(J(R)\) be the Jacobson radical of \(R\), then

1. \(J(R)\) is a two-sided ideal containing every nilpotent left ideal of \(R\).

2. For every simple \(R\)-module \(E\) we have \(J(R)E=0\). More precisely, \(J(R)=\{a \in R:aE=0\text{ for every simple }R\text{-module }E\}\).

3. Suppose \(R\) is a finite dimensional algebra (or more generally, \(R\) is Artinian); then \(R/J(R)\) is semisimple, and if \(I\) is a two-sided ideal such that \(R/I\) is semisimple, then \(J(R) \subset I\). It follows that \(R\) is semisimple if and only if \(J(R)\) is trivial.

4. Assumptions being as above, \(J(R)\) is nilpotent.

*Proof.* We first prove 2. Pick any \(a \in R\) such that \(a\) annihilates every simple \(R\)-module. For any maximal left ideal \(M\), \(R/M\) is simple. Therefore \(a(R/M)=0\), which implies that \(a \in M\). Therefore \(a \in J(R)\).

Conversely, suppose \(J(R)E \ne 0\) for some simple \(E\). Since \(J(R)E\) is a submodule of \(E\) and \(E\) is simple, we have \(J(R)E=E\); more precisely, there exists some nonzero \(x \in E\) such that \(J(R)x=E\). Therefore there exists \(a \in J(R)\) such that \(ax=x\), so \(a-1\) lies in the annihilator \(\operatorname{Ann}(x)\), which is a proper left ideal and hence contained in a maximal left ideal \(M\). But we also have \(J(R) \subset M\). Therefore \(a \in M\) and \(a-1 \in M\), which implies that \(1 \in M\), and this is absurd. Hence 2 is proved.

Next we prove 1. By definition \(J(R)\) is a left ideal. Now pick any \(a \in J(R)\) and \(b \in R\). For every simple \(E\) we have \(abE \subset aE=0\) by 2, so \(ab \in J(R)\) and \(J(R)\) is a two-sided ideal. Now let \(I\) be a nilpotent left ideal, say \(I^n=0\), and let \(E\) be simple. Since \(I\) is a left ideal, \(IE\) is a submodule of \(E\), hence \(IE=0\) or \(IE=E\). The latter would give \(E=I^nE=0\), a contradiction. Hence \(IE=0\) for every simple \(E\), and \(I \subset J(R)\) by 2. Therefore 1 is proved as well.

To prove 3, we first note that \(R\) is Artinian: every strictly descending chain of left ideals \(J_1 \supsetneq J_2 \supsetneq \cdots\) must stop, as the dimension of \(R\) is finite. It follows that \(J(R)\) is the intersection of finitely many maximal left ideals, for the descending chain

\[M_1 \supset M_1 \cap M_2 \supset M_1 \cap M_2 \cap M_3 \supset \cdots\supset J(R)\]

must be finite. Therefore we can write \(J(R)=\bigcap_{i=1}^{n}M_i\) for some maximal left ideals \(M_i\) of \(R\). Now consider the map

\[\begin{aligned}\phi:R/J(R) &\to R/M_1 \oplus R/M_2 \oplus \cdots \oplus R/M_n \\ x+J(R) &\mapsto (x+M_1,x+M_2,\dots,x+M_n).\end{aligned}\]

Since \(J(R)=\bigcap_{i=1}^{n}M_i\), the map \(\phi\) is well-defined and injective; this is the module-theoretic form of the Chinese Remainder Theorem. Each \(R/M_i\) is simple, so \(R/J(R)\) is isomorphic to a submodule of a semisimple module and is hence semisimple by Proposition 5. We are done.

Now suppose \(I\) is a two-sided ideal such that \(R/I\) is semisimple. By definition we can write

\[R/I=\bigoplus_{j \in J}L_j\]

for some simple modules \(L_j\). Pick any \(a \in J(R)\); we have \(aL_j=0\) for all \(j\), therefore \(a(R/I)=0\), which implies that \(a \in I\), i.e. \(J(R) \subset I\). (In fact, according to the structure theorem of semisimple rings, \(J\) is finite.)

If \(J(R)=0\), then \(R/J(R)=R\) is semisimple. Conversely, if \(R\) is semisimple, then \(I=0\) is a two-sided ideal such that \(R/I\) is semisimple, hence \(J(R) \subset 0\), i.e. \(J(R)\) is trivial.

To prove 4, put \(N=J(R)\) and work on the descending chain \(N \supset N^2 \supset N^3 \supset \cdots\). Since \(R\) is Artinian, the chain stabilises at some ideal \(N^\infty\). Then \(NN^\infty=N^\infty\), and by Nakayama's lemma \(N^\infty=0\). \(\square\)

Let \(R\) be a commutative ring and \(G\) a finite group. Let \(E\) be an \(R\)-module. We can study the representation

\[\rho: G \to \operatorname{Aut}_{R}E\]

and we can also study the ring homomorphism

\[\lambda:R[G] \to \operatorname{End}_{R}E.\]

We show that they are the same thing. Given \(\lambda\), for any \(g \in G\), \(\lambda(e_g)\) is an automorphism because \(\lambda(e_g)\lambda(e_{g^{-1}})=\lambda(e_1)=1\). Therefore \(\lambda\) gives rise to a representation \(\rho:g \mapsto \lambda(e_g)\).

Conversely, for a representation \(\rho\) and any \(g \in G\), \(\rho(g)\) is automatically an endomorphism, and therefore we have a map

\[\begin{aligned}\lambda:R[G] &\to \operatorname{End}_{R}E \\\sum_{g \in G}a_ge_g &\mapsto \sum_{g \in G}a_g\rho(g).\end{aligned}\]

Therefore, the study of group representations can be transferred to the study of group algebras. For simplicity we call such a module \(E\) together with a representation \(\rho\) a \(G\)-module, a term you may have seen. *Note such a \(G\)-module can also be considered as a module over \(R[G]\) in the usual sense; conversely, an \(R[G]\)-module is a \(G\)-module.* When the context is clear, we write \(gx\) in place of \(\rho(g)x\).

We generalise Maschke's theorem to an arbitrary field \(K\).

Theorem 8 (Maschke).Let \(G\) be a finite group of order \(n\). Let \(K\) be a field, then \(K[G]\) is semisimple if and only if the characteristic of \(K\) does not divide \(n\) (it can also be \(0\)).

In introductory representation theory, we study the case when \(K=\mathbb{R}\) or \(\mathbb{C}\), whose characteristic is definitely \(0\).

*Proof.* Let \(E\) be a \(G\)-module, and let \(F\) be a \(G\)-submodule. We show that \(F\) is a direct summand of \(E\), i.e., there exists some \(E' \subset E\) such that \(E = E' \oplus F\). It is natural to think about the projection \(\pi:E \to F\) where \(\pi(x)=x\) for all \(x \in F\). It is seemingly clear that \(E=\ker\pi \oplus F\) is what we want, but we can't do this: we only know that \(\pi\) is a \(K\)-linear map, but we have no idea if it is a \(K[G]\)-linear map. To work around this problem, we modify the projection into a \(K[G]\)-linear map.

To do this, we *average* \(\pi\) over conjugation. To be precise, we consider the map

\[\varphi:x \mapsto \frac{1}{n}\sum_{g \in G}g^{-1} \circ\pi\circ g(x)\]

This map is \(K[G]\)-linear, and \(\varphi(E) \subset F\) since \(\pi\) takes values in \(F\) and \(F\) is stable under each \(g^{-1}\). We can therefore write \(E=\ker\varphi \oplus F\), because \(\varphi\) is a left inverse of the inclusion \(i:F \to E\). Indeed, for any \(x \in F\), we have

\[\varphi(x)=\frac{1}{n}\sum_{g \in G}g^{-1} \circ g(x)=\frac{1}{n}\sum_{g \in G}x=x.\]

Note, since \(F\) is a \(G\)-submodule, we have \(g(x) \in F\) and therefore \(\pi \circ g(x)=g(x)\). Also, the hypothesis \(\operatorname{char}K \nmid n\) is used here: if the characteristic divided \(n\), then \(\sum_{g \in G}x=nx=0\); moreover, \(n \cdot 1=0\) in \(K\), so \(\frac{1}{n}\) would not even be defined.
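A toy check of the averaging trick (a sketch, with \(G=C_2\) acting on \(\mathbb{Q}^2\) by swapping coordinates and \(F\) the diagonal submodule; the names `g`, `pi`, `phi` are ours):

```python
# Averaging a plain linear projection into a K[G]-linear one, for G = C_2
# acting on Q^2 by swapping coordinates, with F = {(t, t)} the diagonal.
from fractions import Fraction

def g(v):            # the nontrivial element of C_2
    return (v[1], v[0])

def pi(v):           # a K-linear projection onto F that is NOT G-linear
    return (v[0], v[0])

def phi(v):          # the average (1/|G|) * sum over G of g^{-1} . pi . g
    a = pi(v)        # term for the identity element
    b = g(pi(g(v)))  # term for the swap (which is its own inverse)
    half = Fraction(1, 2)
    return (half * (a[0] + b[0]), half * (a[1] + b[1]))

v = (Fraction(3), Fraction(7))
assert pi(g(v)) != g(pi(v))      # pi fails to commute with the G-action
assert phi(g(v)) == g(phi(v))    # the averaged map commutes with it
assert phi((Fraction(5), Fraction(5))) == (Fraction(5), Fraction(5))  # id on F
```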

Next we suppose that \(p=\operatorname{char} K\) divides \(n\). Consider the element

\[s=\sum_{g \in G}e_g.\]

Note \(e_gs=s\) for all \(g \in G\), so \(s\) is central and \(s^2=(\sum_{g \in G}e_g)s=ns=0\) because \(p \mid n\). Therefore \(K[G]s\) is a nonzero nilpotent ideal, i.e. \(J(K[G]) \ne 0\), from which it follows that \(K[G]\) is not semisimple according to proposition 7. \(\square\)
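A tiny computational illustration of this nilpotency, in a dict-based model of \(\mathbf{F}_p[C_m]\) (the helper `mult_mod` is ours):

```python
# Verify that s = e_1 + e_g is a nonzero nilpotent element of F_2[C_2]:
# keys are exponents of g (mod m), values are coefficients (mod p).
def mult_mod(u, v, m, p):
    """Multiply two elements of F_p[C_m] via e_{g^i} e_{g^j} = e_{g^{i+j}}."""
    w = {k: 0 for k in range(m)}
    for i, ui in u.items():
        for j, vj in v.items():
            w[(i + j) % m] = (w[(i + j) % m] + ui * vj) % p
    return w

p = m = 2
s = {0: 1, 1: 1}      # s = sum of all e_g, a nonzero element

# s^2 = |G| * s = 0, since char K = p divides |G| = m
assert mult_mod(s, s, m, p) == {0: 0, 1: 0}
```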

In other words, if \(E\) is a finite dimensional representation of a group \(G\) over \(K\), and the characteristic of \(K\) does not divide \(|G|\), then \(E\) is completely reducible. Recall that we also have the block decomposition of a matrix representation; this is not very easy to generalise. To work with it we need a closer look at semisimple rings.

It would be great if, given a matrix representation, we could decompose it into a block diagonal matrix, with each block a subrepresentation. But it would not be an easy job: we need to know whether the field is algebraically closed, what its characteristic is, et cetera. Perhaps we would need some Galois theory, but that would take us too far from this post. In any case, we need to see through the structure to know how to work with it.

In this section we study the structure of \(R\) in a more detailed way. We say a ring is **simple** if it is semisimple and all of its simple left ideals are isomorphic. A left ideal is called simple if it is a simple left \(R\)-module.

Theorem 9 (Structure theorem of semisimple rings).Let \(R\) be a semisimple ring. Then the set of isomorphism classes of simple left ideals of \(R\) is finite; say it is represented by \(L_1,L_2,\dots,L_s\). If \(R_i = \sum_{L \cong L_i}L\) (the sum of all left ideals isomorphic to \(L_i\)), then \(R_i\) is a two-sided ideal, and is a simple ring. One can write \(R\) as a product \[R=\prod_{i=1}^{s}R_i.\]

Besides, \(R\) admits a Peirce decomposition with respect to these \(R_i\): there are elements \(e_i \in R_i\) such that \[1=e_1+\cdots+e_s.\] The \(e_i\) are idempotent (\(e_i^2=e_i\)) and orthogonal (\(e_ie_j=0\) if \(i \ne j\)). Each \(e_i\) is the multiplicative identity of the ring \(R_i\), and \(R_i=e_iR=Re_i\).

*Proof.* To begin with we first study the behaviour of simple left ideals.

Lemma 10.Let \(L\) be a simple left ideal of \(R\) and \(E\) be a simple \(R\)-module, then \(LE = 0\) unless \(L \cong E\).

*Proof of the lemma.* Since \(E\) is simple, \(LE=0\) or \(E\). If \(LE=E\), then there exists some \(y \in E\) such that \(Ly=E\) (again by the simplicity of \(E\)). Therefore the map \[a \mapsto ay\]

is surjective. It is injective because the kernel is a submodule of \(L\) and it has to be trivial. \(\blacksquare\)

According to this lemma, \(R_i R_j=0\) whenever \(i \ne j\). This will be used frequently. For the time being we write \(R=\sum_{i \in I}R_i\), although we don't know yet whether \(I\) is finite. First we show that \(R_i\) is also a right ideal (being a sum of left ideals, it is a left ideal by default):

\[R_i \subset R_i R = R_i R_i \subset R_i \implies R_iR=R_i.\]

Therefore \(R_i\) is a right ideal for all \(i\). Let us explain the chain above. Since \(R\) contains the unit, we have \(R_i \subset R_i R\). Next, \(R_iR=R_iR_i\) because \(R_iR_j=0\) for all \(j \ne i\) and \(R\) is the sum of the \(R_j\) over \(j \in I\), so all other terms vanish. Finally, \(R_iR_i \subset R_i\) simply because \(R_i\) is a left ideal.

Also note that \(R_i \cap R_j=0\) for all \(i \ne j\): the intersection is a left ideal, and a simple submodule of it would be isomorphic to both \(L_i\) and \(L_j\), which is impossible. Therefore we may write \(R=\bigoplus_{i \in I}R_i\) for the time being.

Now write \(1=\sum_{i \in I}e_i\) with \(e_i \in R_i\). This sum is finite (by the definition of direct sum, almost all components vanish). Let \(J \subset I\) be the finite subset such that \(e_j \ne 0\) for all \(j \in J\). It follows that \(R_i=0\) for all \(i \in I \setminus J\), because \(R_i = 1 \cdot R_i = \sum_{j \in J}e_jR_i = 0\). We can therefore write \(R=\bigoplus_{i=1}^{n}R_i\); all other direct summands are trivial. Since each \(R_i\) represents an isomorphism class of simple left ideals, the set of such classes is finite.

Now we study the relation of \(e_i\), \(R_i\) and \(R\). For any \(a_i \in R_i\), we have

\[a_i=a_i(e_1+\cdots+e_n)=a_ie_i=(e_1+\cdots+e_n)a_i=e_ia_i.\]

Therefore \(e_i\) is the unit in \(R_i\) (it follows automatically that \(e_i^2=e_i\)). For any \(a \in R\), we put \(a_i=ae_i\), then there is a unique decomposition

\[a=a_1+\cdots+a_n.\]

This shows \(R_i=Re_i=e_iR\). We also have \(e_ie_j=0\) if \(i \ne j\). Since \(R_iR_j=0\) for \(i \ne j\), we can safely write \(R=\prod_{i=1}^{n}R_i\). Each \(R_i\) is simple because (1) it is semisimple (\(R_i=\sum_{L \cong L_i}L\), and each such \(L\) is also a simple \(R_i\)-module) and (2) all simple left ideals of \(R_i\) are isomorphic. To see (2), assume \(L \subset R_i\) is a simple left ideal not isomorphic to \(L_i\). Since \(L = R_iL = RR_iL = RL\), \(L\) is also a simple left ideal of \(R\), contradicting the definition of \(R_i\). \(\square\)
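The idempotent relations in theorem 9 can be checked on a concrete instance. The following sketch is my own illustration, taking \(R=\operatorname{Mat}_2(\mathbb{Q}) \times \operatorname{Mat}_1(\mathbb{Q})\) embedded as block-diagonal \(3\times 3\) matrices:

```python
import numpy as np

# e_1, e_2 project onto the two block factors of R = Mat_2(Q) x Mat_1(Q).
e1 = np.diag([1, 1, 0])
e2 = np.diag([0, 0, 1])

assert (e1 @ e1 == e1).all() and (e2 @ e2 == e2).all()  # idempotent
assert (e1 @ e2 == 0).all() and (e2 @ e1 == 0).all()    # orthogonal
assert (e1 + e2 == np.eye(3)).all()                     # 1 = e_1 + e_2
```

Each \(e_i\) is central here, and multiplication by \(e_i\) is the projection onto the \(i\)-th factor, matching \(R_i=e_iR=Re_i\).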

Let's extract more information from this theorem. First of all, the decomposition of \(1\) is also finite in every \(R_i\); hence each \(R_i\) is itself a finite direct sum of simple left ideals. To be precise,

Theorem 11.Every simple ring \(R\) decomposes as a finite direct sum of simple left ideals\[R = \bigoplus_{i=1}^{n}R_i.\]

*Proof.* Since \(R\) is in particular semisimple, it is a sum of simple left ideals, and the sum can be chosen to be direct. Say we have \(R=\bigoplus_{i \in I}R_i\).

Consider \(1 \in R\):

\[1=\sum_{i \in I}x_i\]

where \(x_i \in R_i\). This sum is finite, say we have \(1=\sum_{i=1}^{n}x_i\) and \(x_i \ne 0\). Then

\[R=R \cdot 1 = \sum_{i=1}^{n}Rx_i=\bigoplus_{i=1}^{n}R_i,\]

because each \(Rx_i\) is a nonzero submodule of the simple module \(R_i\), hence equals \(R_i\).

This proves our assertion. \(\square\)

Combining theorems 9 and 11, we see

Corollary 12.Every semisimple ring \(R\) admits a decomposition\[R=n_1L_1 \oplus \cdots \oplus n_rL_r\]

where \(n_iL_i\) denotes the direct sum of \(n_i\) copies of the simple left ideal \(L_i\). This decomposition is unique in the following sense: the \(L_1,\dots,L_r\) are unique up to isomorphism, and the pairs \((n_i,L_i)\) are unique up to permutation.

This should remind you of the isotypic decomposition of a representation into irreducible representations. They are the same thing: there we used the semisimplicity of \(\mathbb{C}[G]\), and here we are talking about the semisimplicity of an arbitrary ring.

We include an elementary ring theory result that really doesn't need a proof here.

Proposition 13.Let \(R_1, R_2,\cdots, R_n\) be rings with units. The direct product\[R=R_1 \times \cdots \times R_n\]

has the following property: every ideal (left, right or two-sided) of \(R_i\) is an ideal of \(R\); every minimal ideal of \(R_i\) is a minimal ideal of \(R\); and every minimal ideal of \(R\) is a minimal ideal of some \(R_i\).

The proof is quite similar to how we prove that \(R_i\) is simple in our proof of theorem 9. This actually shows that

Corollary 14.If \(R_1,\cdots,R_n\) are semisimple rings, then so is\[R=R_1 \times \cdots \times R_n.\]

We want to work with matrices, i.e., with linear equations. This becomes possible thanks to Wedderburn-Artin theory. We don't know yet what can happen, so we can only generalise things very carefully.

When talking about matrices, we can talk about endomorphisms as well. So our first step is to find a bridge to endomorphisms. We now need to consider \(R\) as a left module over itself.

The most immediate one is multiplication. For \(a \in R\), we may consider the multiplication induced by \(a\):

\[\lambda_a:x \mapsto ax.\]

It may look natural, but unfortunately it is not necessarily an endomorphism of left \(R\)-modules: we have \(\lambda_a(yx)\ne y\lambda_a(x)\) in general. However, we can define

\[\rho_a:x \mapsto xa.\]

Now \(\rho_a(yx)=y\rho_a(x)\) holds naturally. We can show that every endomorphism is defined in this way. Consider the map \(\rho:a \mapsto (x \mapsto xa)\). We have

\(\rho\) is an anti-homomorphism: \(\rho(ab)=\rho(b)\rho(a)\) for all \(a,b \in R\), and \(\rho(a+b)=\rho(a)+\rho(b)\).

\(\rho\) is surjective. For any \(\psi \in \operatorname{End}_R(R)\), we have \(\psi(x)=\psi(x \cdot 1)=x\psi(1)\); therefore \(\rho(\psi(1))=\psi\).

\(\rho\) is injective. If \(\rho(a)(x)=xa=0\) for all \(x \in R\), then in particular \(\rho(a)(1)=a=0\).
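The anti-homomorphism property can be seen concretely in a matrix ring. The following quick numerical check is my own illustration: right multiplications compose in reversed order, which is exactly why \(\rho\) reverses products.

```python
import numpy as np

# rho_a(x) = x a; then rho_{ab}(x) = x(ab) = (xa)b = rho_b(rho_a(x)),
# i.e. rho(ab) = rho(b) o rho(a): an anti-homomorphism.
rng = np.random.default_rng(1)
a, b, x = rng.random((2, 2)), rng.random((2, 2)), rng.random((2, 2))
assert np.allclose(x @ (a @ b), (x @ a) @ b)
```

Associativity of the ring product is precisely what makes \(\rho_{ab}=\rho_b \circ \rho_a\) hold.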

We could call \(\rho\) an *anti-isomorphism*, but that causes headaches. Instead, consider the opposite ring \(R^{op}\), where addition is the same as in \(R\) and multiplication \(\ast\) is given by

\[a \ast b = ba\]

then we have

Proposition 14.Let \(R\) be a ring. There is a natural isomorphism \(R^{op} \cong \operatorname{End}_R(R)\) given by \(a \mapsto (x \mapsto xa)\).

Note \((R^{op})^{op}=R\) so we may be able to take the opposite to decompose \(\operatorname{End}_R(R)\) and take the opposite again.

Now write \(R=\bigoplus_{i=1}^{r}n_iL_i\) as in corollary 12. We therefore have

\[R^{op} \cong \bigoplus_{i=1}^{r}\operatorname{End}_R(n_iL_i).\]

However, by Schur's lemma, \(D_i=\operatorname{End}_R(L_i)\) is a division ring (we don't necessarily have a field here). Therefore

\[\operatorname{End}_R(n_iL_i) \cong \operatorname{Mat}_{n_i}(D_i).\]

For each \(f \in \operatorname{End}_R(n_kL_k)\), we have a corresponding matrix \((p_ift_j)\):

\[L_k \xrightarrow{t_j}L_k \oplus \cdots \oplus L_k \xrightarrow{f} L_k \oplus \cdots\oplus L_k \xrightarrow{p_i}L_k\]

where \(t_j\) is the inclusion of the \(j\)-th summand and \(p_i\) is the projection onto the \(i\)-th. This is to say, the isomorphism is given by

\[f \mapsto (p_ift_j).\]

The verification is a matter of linear algebra and techniques frequently used in this post.

Therefore we have

\[R^{op}\cong \bigoplus_{i=1}^{r}\operatorname{Mat}_{n_i}(D_i).\]

Taking the opposite again we have

\[R=(R^{op})^{op} \cong \bigoplus_{i=1}^{r}\operatorname{Mat}_{n_i}(D_i^{op}).\]

The isomorphism \(\operatorname{Mat}_n(D)^{op} \cong \operatorname{Mat}_n(D^{op})\) is given by the transpose of a matrix. Since the opposite ring of a division ring is still a division ring, we therefore have a decomposition

\[R\cong\bigoplus_{i=1}^{r}\operatorname{Mat}_{n_i}(D_i),\]

where each \(D_i\) is a division ring (we rename \(D_i^{op}\) as \(D_i\)).
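The transpose trick above can be checked numerically; this little verification is my own addition. Transposition reverses matrix products, which is exactly multiplication in the opposite ring:

```python
import numpy as np

# (AB)^T = B^T A^T: transpose is an anti-isomorphism of Mat_n,
# hence an isomorphism Mat_n(D)^{op} -> Mat_n(D^{op}) for commutative entries.
rng = np.random.default_rng(0)
A, B = rng.random((3, 3)), rng.random((3, 3))
assert np.allclose((A @ B).T, B.T @ A.T)
```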

Conversely, rings of the form above are semisimple. This is easy: for \(R=\operatorname{Mat}_n(D)\), the only proper two-sided ideal is \(0\), hence \(J(R)\) is also trivial, and \(R=R/J(R)\) is semisimple. See the lemma below.

Lemma.Let \(R\) be a ring. All two-sided ideals of \(\operatorname{Mat}_n(R)\) are of the form \(\operatorname{Mat}_n(I)\) where \(I\) is a two-sided ideal of \(R\).

*Proof.* If \(I\) is a two-sided ideal of \(R\), then clearly \(\operatorname{Mat}_n(I)\) is a two-sided ideal of \(\operatorname{Mat}_n(R)\). Conversely, suppose \(J \subset \operatorname{Mat}_n(R)\) is a two-sided ideal; we show that \(J=\operatorname{Mat}_n(I)\) for some two-sided ideal \(I \subset R\). To be precise, put

\[I=\{a \in R:\text{$a$ is the $(1,1)$-th element of $A$ for some $A \in J$}\}.\]

Then \(I\) is a two-sided ideal. Let \(E_{ij}\) be the matrix whose \((i,j)\)-th entry is \(1\) and whose other entries are \(0\). For any matrix \(A=(a_{ij})\), we have

\[E_{ij}AE_{k\ell}=a_{jk}E_{i\ell}.\]

Therefore if \(A \in J\), then in particular,

\[E_{1j}AE_{k1}=a_{jk}E_{11} \in J \implies a_{jk} \in I\]

for all \(j,k\). Therefore \(J \subset \operatorname{Mat}_n(I)\). Conversely, for any \(a \in I\), we can find \(A=(a_{ij}) \in J\) such that \(a=a_{11}\). Now \(aE_{i\ell}=E_{i1}AE_{1\ell} \in J\). Note that a matrix \(B=(b_{i\ell}) \in \operatorname{Mat}_n(I)\) can be written in the form \(\sum_{i,\ell}b_{i\ell}E_{i\ell}\) with \(b_{i\ell} \in I\), hence is a sum of elements of \(J\). This proves that \(\operatorname{Mat}_n(I) \subset J\). \(\square\)
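The identity \(E_{ij}AE_{k\ell}=a_{jk}E_{i\ell}\) that drives this proof is easy to test numerically; the sketch below (my own check) does so for one choice of indices:

```python
import numpy as np

def E(i, j, n=3):
    # the matrix unit with a single 1 in position (i, j)
    m = np.zeros((n, n))
    m[i, j] = 1
    return m

A = np.arange(9.0).reshape(3, 3)  # entry a_jk sits at A[j, k]
i, j, k, l = 0, 1, 2, 0
assert np.allclose(E(i, j) @ A @ E(k, l), A[j, k] * E(i, l))
```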

It follows that a matrix algebra over a division ring or a field is semisimple. But let's head back to where we were.

The direct sum (or product, because it is finite) of matrix algebras over division rings

\[\operatorname{Mat}_{n_1}(D_1) \oplus \cdots \oplus \operatorname{Mat}_{n_r}(D_r)\]

is therefore semisimple as well, by corollary 14.

To conclude, we have the Wedderburn-Artin theorem.

Theorem 15 (Wedderburn-Artin).\(R\) is a semisimple ring if and only if it can be written as a direct sum (or product because they are the same when finite) of matrix algebras over some division rings\[R \cong \operatorname{Mat}_{n_1}(D_1) \oplus \cdots \oplus \operatorname{Mat}_{n_r}(D_r).\]

Since the opposite of a division ring is a division ring, we also have

Corollary 16.A ring \(R\) is semisimple if and only if \(R^{op}\) is.

Now back to representation theory. It can still be extremely hard: we have no idea what the division rings are. However, when the base field is algebraically closed, there is no problem. Note that some authors use *skew field* in place of division ring.

Proposition 17.Let \(K\) be an algebraically closed field and \(D\) a finite-dimensional division algebra over \(K\). Then \(D \cong K\).

*Proof.* Pick \(a \in D\) that is not \(0\). The map \(\lambda_a:x \mapsto ax\) is a \(K\)-linear map on the finite-dimensional space \(D\). Since \(K\) is algebraically closed, \(\lambda_a\) has at least one eigenvalue, say \(\lambda\). It follows that

\[(\lambda{e}-a)x=0\]

for some nonzero \(x\), where \(e\) is the unit of \(D\). Since \(D\) is a division ring, \(\lambda e - a\) cannot be invertible, so \(a=\lambda{e}\). We have thus established an isomorphism \(a \mapsto \lambda\), and therefore \(D \cong K\). \(\square\)

If you have studied Banach algebras, you will realise that this is nothing but the Gelfand-Mazur theorem (see any book on functional analysis that discusses Banach algebras, for example *Functional Analysis* by W. Rudin). In the infinite-dimensional case we have to consider the topology of the field and the algebra.

Therefore we can now state Maschke's theorem in the finest way possible:

Theorem 18 (Maschke).Let \(G\) be a finite group, and \(K\) be an algebraically closed field whose characteristic does not divide the order of \(G\), then\[K[G]=\operatorname{Mat}_{n_1}(K) \oplus \cdots \oplus \operatorname{Mat}_{n_r}(K).\]

Those \(n_i\) are uniquely determined. In particular, \(n_1^2+\cdots+n_r^2=|G|\).
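As an illustration I am adding here: it is a standard fact that \(S_3\) has irreducible representations of dimensions \(1,1,2\) (trivial, sign, and standard), so \(\mathbb{C}[S_3]\cong\operatorname{Mat}_1(\mathbb{C})\oplus\operatorname{Mat}_1(\mathbb{C})\oplus\operatorname{Mat}_2(\mathbb{C})\), and the squares sum to the group order:

```python
# Dimension count for C[S_3]: n_1^2 + ... + n_r^2 = |G|.
dims = [1, 1, 2]   # dimensions of the irreducible representations of S_3
order = 6          # |S_3|
assert sum(n * n for n in dims) == order
```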

- Serge Lang, *Algebra, Revised Third Edition*.
- Pierre Antoine Grillet, *Abstract Algebra*.
- Jean-Pierre Serre, *Linear Representations of Finite Groups*.

In this post we study the ring

\[R'=\mathbb{C}[\cos{x},\sin{x}]\]

in a different style.

Again, if we consider the map

\[\begin{aligned}\Phi:\mathbb{C}[X,Y] &\to \mathbb{C}[\cos{x},\sin{x}] \\ f(X,Y) &\mapsto f(\cos{x},\sin{x})\end{aligned}\]

we will see that \(\ker\Phi=(X^2+Y^2-1)\) and therefore

\[\mathbb{C}[\cos{x},\sin{x}] \cong \mathbb{C}[X,Y]/(X^2+Y^2-1).\]

Following the same step as in the previous post, we can show that \(R'=\mathbb{C}[\cos{x},\sin{x}]\) is Dedekind. However, the map

\[\begin{aligned}\Psi:\mathbb{C}[U,V] &\to \mathbb{C}[X,Y]/(X^2+Y^2-1) \\ g(U,V) &\mapsto \overline{g(X+iY,X-iY)}\end{aligned}\]

shows that

(Proposition 1)\[\mathbb{C}[X,Y]/(X^2+Y^2-1) \cong \mathbb{C}[U,V]/(UV-1) \cong \mathbb{C}[T,T^{-1}] \cong \mathbb{C}[T]_T.\]
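The substitution behind this chain of isomorphisms can be verified with sympy (a quick check of my own): under \(U=X+iY\), \(V=X-iY\), the relation \(UV-1\) becomes \(X^2+Y^2-1\).

```python
from sympy import symbols, I, expand

# With U = X + iY and V = X - iY, UV - 1 = X^2 + Y^2 - 1.
X, Y = symbols('X Y')
U, V = X + I*Y, X - I*Y
assert expand(U*V - 1) == expand(X**2 + Y**2 - 1)
```

This is why \(\Psi\) identifies the two quotient rings.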

The localisation of a UFD is a UFD, hence we see \(\mathbb{C}[\sin{x},\cos{x}]\) is a UFD. There are other ways to do it. For example, we can directly put \(\mathbb{C}[\sin{x},\cos{x}]=\mathbb{C}[e^{ix},e^{-ix}]\). And this is even quicker. As another way, since \(\cos{x}=\frac{e^{ix}+e^{-ix}}{2}\) and \(\sin{x} = \frac{e^{ix}-e^{-ix}}{2i}\), all trigonometric polynomials can be decomposed into the following form

\[f(\cos{x},\sin{x}) = e^{-inx}P(e^{ix})\]

where \(P(X) \in \mathbb{C}[X]\) and \(n \ge 0\). Conversely, all elements of the form \(e^{-inx}P(e^{ix})\) are in \(\mathbb{C}[\cos{x},\sin{x}]\), and therefore we have an isomorphism

\[\begin{aligned}\Lambda: \mathbb{C}[T]_{T} &\to \mathbb{C}[\cos{x},\sin{x}], \\ T &\mapsto \cos{x}+i\sin{x}.\end{aligned}\]

Note it follows that \(T^{-1}\) maps to \(\cos{x}-i\sin{x}\).

Now we return to the identity

\[\sin^2{x}=(1-\cos{x})(1+\cos{x}).\]

In \(\mathbb{R}[\cos{x},\sin{x}]\), the elements \(\sin{x}\), \(1-\cos{x}\), \(1+\cos{x}\) are all irreducible; more precisely, elements of the form \(a+b\sin{x}+c\cos{x}\) with \((b,c) \ne (0,0)\) are irreducible. So this identity gives two genuinely different factorisations, and \(\mathbb{R}[\cos{x},\sin{x}]\) is *not* a UFD. In fact, we can also deduce that \(R\) is not a UFD from the fact that \(Cl(R) \cong \mathbb{Z}/2\mathbb{Z}\), i.e., the ideal class group is nontrivial (corollary 3.22).

However, since \(R'\) is a UFD, the identity \(\sin^2{x}=(1-\cos{x})(1+\cos{x})\) tells us *nothing*. We need to figure out what is going on. To work with it we consider the form \(R'=\mathbb{C}[T,T^{-1}]\). What are the irreducible elements of this ring? We will make use of the fact that \(\mathbb{C}\) is algebraically closed (why not!). Since \(T\) and \(T^{-1}\) are units in this ring, we can use them to modify the degree of an element. More precisely, as an application of the fundamental theorem of algebra,

\(P(T)=\sum_{j=m}^{n}a_jT^{j}\) (you should be reminded of Laurent series!) with \(m,n \in \mathbb{Z}\) is irreducible if and only if \(Q(T)=T^{-m}P(T)\) is irreducible. However, \(Q(T) \in \mathbb{C}[T]\) is irreducible if and only if \(Q\) is of degree \(1\), which is equivalent to saying that \(n-m=1\) in \(P(T)\).

Therefore irreducible elements are of the form \(aT^m+bT^{m+1}\) where \(a,b \ne 0\). Dividing by the unit \(bT^m\), we obtain a finer result:

(Proposition 2)Irreducible elements of \(R'\) are, up to units, of the form\[\cos{x}+i\sin{x}+a, \quad a \in \mathbb{C}^\ast.\]

With this said, none of \(\sin{x}\), \(1-\cos{x}\) and \(1+\cos{x}\) is irreducible in \(R'\). For example, for \(\sin{x}\) we actually have

\[\begin{aligned}\sin{x}&=\frac{1}{2i}(e^{ix}-e^{-ix})\\ & = \frac{1}{2ie^{ix}}(e^{2ix}-1) \\ & = \frac{1}{2ie^{ix}}(e^{ix}+1)(e^{ix}-1) \\ & = \frac{1}{2ie^{ix}}(\cos{x}+i\sin{x}+1)(\cos{x}+i\sin{x}-1)\end{aligned}\]
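This factorisation is easy to verify symbolically; the following check is my own addition:

```python
from sympy import symbols, I, exp, sin, simplify

# sin x = (1/(2i e^{ix})) (e^{ix} + 1)(e^{ix} - 1); compare exponential forms.
x = symbols('x')
rhs = (exp(I*x) + 1) * (exp(I*x) - 1) / (2*I*exp(I*x))
assert simplify(rhs - sin(x).rewrite(exp)) == 0
```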

We can record some obvious facts about these two rings. For example, \(R\) is a free \(\mathbb{R}[\cos{x}]\)-module with basis \(\{1,\sin{x}\}\) (note that even powers \(\sin^{2k}{x}\) can be rewritten in terms of \(\cos{x}\) via \(\sin^2{x}=1-\cos^2{x}\)). Likewise \(R'\) is a free \(\mathbb{C}[\cos{x}]\)-module with basis \(\{1,\sin{x}\}\). We can also write \(R'\) as \(R \oplus iR\) or \(R[i]\); that is, \(R'\) is a free \(R\)-module with basis \(\{1,i\}\). These facts are elementary and don't touch the polynomial structure much. Now we do touch it, by studying the quotient fields of \(R\) and \(R'\) respectively.

Treating \(R\) as a free \(\mathbb{R}[\cos{x}]\)-algebra, we can write any polynomial \(f(\cos{x},\sin{x})\) as

\[f(\cos{x},\sin{x})=P(\cos{x})+Q(\cos{x})\sin{x}\]

where \(P,Q \in \mathbb{R}[X]\). For simplicity we write \(f=P+Q\sin{x}\). Suppose we now have \(f=P_1+Q_1\sin{x}\) and \(g=P_2+Q_2\sin{x}\) with \(g \ne 0\), then

\[\begin{aligned}\frac{f}{g} &= \frac{P_1+Q_1\sin{x}}{P_2+Q_2\sin{x}} \\ &= \frac{(P_1+Q_1\sin{x})(P_2-Q_2\sin{x})}{(P_2+Q_2\sin{x})(P_2-Q_2\sin{x})} \\ &=\frac{P_1P_2-Q_1Q_2(1-\cos^2{x})+(P_2Q_1-P_1Q_2)\sin{x}}{P_2^2-Q_2^2(1-\cos^{2}{x})}\end{aligned}\]

Therefore every element of \(K(R)\) can be written in the form \(U(\cos{x})+V(\cos{x})\sin{x}\), where \(U,V \in \mathbb{R}(\cos{x})\), the field of rational functions in \(\cos{x}\) over \(\mathbb{R}\). Since \(\sin^2{x} \in \mathbb{R}(\cos{x})\), we obtain:

(Proposition 3)The quotient field of \(R\) is\[K(R)=\mathbb{R}(\cos{x})[\sin{x}].\]
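The rationalisation above can be double-checked symbolically, reducing \(\sin^2 x\) to \(1-\cos^2 x\); in this sketch (my addition) \(s,c\) stand for \(\sin x,\cos x\):

```python
from sympy import symbols, expand

# Rationalise (P1 + Q1 s)/(P2 + Q2 s) by the conjugate, with s^2 = 1 - c^2.
c, s, P1, Q1, P2, Q2 = symbols('c s P1 Q1 P2 Q2')
num = expand((P1 + Q1*s) * (P2 - Q2*s)).subs(s**2, 1 - c**2)
den = expand((P2 + Q2*s) * (P2 - Q2*s)).subs(s**2, 1 - c**2)
assert expand(num - (P1*P2 - Q1*Q2*(1 - c**2) + (P2*Q1 - P1*Q2)*s)) == 0
assert expand(den - (P2**2 - Q2**2*(1 - c**2))) == 0
```

The denominator is sine-free, which is exactly why \(K(R)=\mathbb{R}(\cos{x})[\sin{x}]\).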

Likewise,

\[K(R')=\mathbb{C}(\cos{x})[\sin{x}]\]

can be proved in exactly the same way.

Since \(R\) is Dedekind, it is integrally closed in \(K(R)\). But what about its relation with \(K(R')\)? For this we have an elegant result:

(Proposition 4)\(R'\) is the integral closure of \(R\) in \(K(R')\).

*Proof.* Let \(C\) be the integral closure of \(R\) in \(K(R')\). Note \(K(R')=K(R)[i]\). For any \(f+ig \in C\) with \(f,g \in K(R)\), the conjugate \(f-ig\) is also integral over \(R\), hence so are \(f=\frac{1}{2}\big((f+ig)+(f-ig)\big)\) and \(g\). Since \(R\) is integrally closed in \(K(R)\), we get \(f,g \in R\) and therefore \(f+ig \in R'\), i.e. \(C \subset R'\). Conversely, any \(f+ig \in R'\) is in \(C\) because \(f,g \in R \subset C\) and \(i\) is integral over \(R\) (it satisfies \(X^2+1=0\)). Therefore \(R' \subset C\). \(\square\)

*We are using the notation that Hartshorne used in his book Algebraic Geometry.*

Put \(f(X,Y)=X^2+Y^2-1\); then \(Y=Z(f)\) is an irreducible affine curve in the affine plane \(\mathbb{A}^2_{\mathbb{C}}\). This curve is non-singular everywhere because the matrix

\[\begin{pmatrix}\partial f/\partial X \\\partial f/\partial Y\end{pmatrix} = \begin{pmatrix}2X \\2Y\end{pmatrix}\]

has rank \(1\) at every point of the curve (the only common zero of \(2X\) and \(2Y\) is the origin, which does not lie on \(Y\)). The coordinate ring \(A(Y)\) is exactly \(R'\).
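The nonsingularity claim amounts to the gradient of \(f\) vanishing only off the curve, which sympy confirms (my own check):

```python
from sympy import symbols, diff, solve

# The gradient of f = X^2 + Y^2 - 1 vanishes only at (0, 0),
# and (0, 0) does not lie on the curve f = 0.
X, Y = symbols('X Y')
f = X**2 + Y**2 - 1
crit = solve([diff(f, X), diff(f, Y)], [X, Y], dict=True)
assert crit == [{X: 0, Y: 0}]
assert f.subs({X: 0, Y: 0}) != 0
```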

Let \(P\) be a point on \(Y\), which, by Hilbert's Nullstellensatz, corresponds to a unique maximal ideal \(\mathfrak{m}_P \subset A(Y)\cong R'\). Since \(R'\) is a PID, by proposition 2 we have \(\mathfrak{m}_P=(\cos{x}+i\sin{x}+a)\) for some \(a \ne 0\). Hence \(P\) corresponds to a nonzero complex number \(a\).

(Proposition 5)Every point \(P\) on the curve \(Z(X^2+Y^2-1)\) corresponds to a unique nonzero complex number \(a \in \mathbb{C}^\ast\).

Since \(Y\) is nonsingular, it also follows that \(\dim_{\mathbb{C}}\mathfrak{m}/\mathfrak{m}^2=\dim R'=1\) for every maximal ideal \(\mathfrak{m}\) of \(R'\). This is to say, the tangent space is always of dimension \(1\) as a \(\mathbb{C}\)-vector space, or \(2\) as an \(\mathbb{R}\)-vector space. Besides, localising at \(\mathfrak{m}_P\), we see \(\mathcal{O}_{P,Y} \cong R'_{\mathfrak{m}_P}\) is always a regular local ring.

- M. F. Atiyah & I. G. MacDonald, *Introduction to Commutative Algebra*.
- Robin Hartshorne, *Algebraic Geometry*.
- Marco Fontana, Salah-Eddine Kabbaj and Sylvia Wiegand (eds.), *Commutative Ring Theory and Applications*.

By a **character** of a monoid \(M\) in a field \(K\) we mean a monoid homomorphism \[\chi:M \to K^\ast.\] By the trivial character we mean the character with \(\chi(M)=\{1\}\). We are particularly interested in the linear independence of characters. Functions \(f_i:M \to K\) are called **linearly independent over \(K\)** if whenever \[a_1f_1+\cdots+a_nf_n=0\] with all \(a_i \in K\), we have \(a_i=0\) for all \(i\). \(\def\Tr\operatorname{Tr}\)

In Fourier analysis we are always interested in functions like \(f(x)=e^{-inx}\) or \(g(x)= e^{-ixt}\), corresponding to Fourier series (integration on \(\mathbb{R}/2\pi\mathbb{Z}\)) and the Fourier transform. Later, mathematicians realised that everything can be set up on a locally compact abelian (LCA) group. For this reason we need to generalise these functions, and the bounded ones coincide with our definition of characters.

Let \(G\) be an LCA group. Then \(\gamma:G \to \mathbb{C}\) is called a *character* if \(|\gamma(x)|=1\) for all \(x \in G\) and \[\gamma(x+y)=\gamma(x)\gamma(y).\] Since \(G\) is in particular a monoid, this coincides with our ordinary definition of character. The set of *continuous* characters forms a group \(\Gamma\), called the *dual group* of \(G\).

If \(G=\mathbb{R}\), solving the functional equation \(\gamma(x+y)=\gamma(x)\gamma(y)\) for continuous \(\gamma\) yields \(\gamma(x)=e^{Ax}\) for some \(A \in \mathbb{C}\). But \(|e^{Ax}| \equiv 1\) (or merely boundedness) forces \(A\) to be purely imaginary, say \(A=it\), and then \(\gamma(x)=e^{itx}\). Hence each element of the dual group of \(\mathbb{R}\) is determined by (the speed of) a rotation on the unit circle.

With this we have our generalised version of the Fourier transform. Let \(G\) be an LCA group and \(f \in L^1(G)\); then the **Fourier transform** is given by \[\hat{f}(\gamma) = \int_G f(x)\gamma(-x)dx, \quad \gamma \in \Gamma.\] One can verify that \(\hat{f}\) is exactly the Gelfand transform of \(f\); the steps are sketched below. On one hand, one can verify that \(f \mapsto \hat{f}(\gamma)\) is indeed a Banach algebra homomorphism \(L^1(G) \to \mathbb{C}\) for every \(\gamma \in \Gamma\); this is a plain application of Fubini's theorem. On the other hand, let \(h:L^1(G) \to \mathbb{C}\) be any non-trivial Banach algebra homomorphism. One can check that \(\| h \| =1\), so \(h\) is a bounded linear functional. By the Riesz representation theorem (the duality \((L^1(G))^\ast \cong L^\infty(G)\)), there is some \(\phi \in L^\infty(G)\) with \(\| \phi\|_\infty = 1\) such that \[h(f) = \int_G f(x)\phi(x)dx.\] We may assume that \(\phi\) is continuous. Since \(h\) is an algebra homomorphism, one can see \[\phi(x+y)=\phi(x)\phi(y).\] We know \(|\phi(x)| \le 1\), but \(\phi(-x)=\phi(x)^{-1}\) forces \(|\phi(x)|=1\). The proof is done after some routine verification of uniqueness.

Indeed, with this identification, we can also identify \(\Gamma\) as the maximal ideal space of \(L^1(G)\), which results in the following interesting characterisation.

If \(G\) is discrete, then \(\Gamma\) is compact; if \(G\) is compact, then \(\Gamma\) is discrete.

*Proof.* If \(G\) is discrete, then \(L^1(G)\) has a unit. The maximal ideal space, which can be identified with \(\Gamma\), is then a compact Hausdorff space.

If \(G\) is compact, then its Haar measure can be normalised so that \(m(G)=1\). We prove that the singleton containing the trivial character alone is an open set. Let \(\gamma \in \Gamma\) be a character \(\ne 1\); then there exists some \(x_0\) such that \(\gamma(x_0) \ne 1\). As a result, \[\int_G \gamma(x)dx = \gamma(x_0)\int_G \gamma(x-x_0)dx = \gamma(x_0)\int_G \gamma(x)dx\] and hence \(\int_G\gamma(x)dx=0\). If \(\gamma=1\), then \(\int_G \gamma(x)dx=1\).

Besides, the compactness of \(G\) implies that the constant function \(f \equiv 1\) is in \(L^1(G)\). As a result, \(\hat{f}(1)=1\) but \(\hat{f}(\gamma)=0\) whenever \(\gamma \ne 1\). Since \(\hat{f}\) is continuous, \(\{\gamma:\hat{f}(\gamma) \ne 0\}=\{1\}\) is open. \(\square\)
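The vanishing integral in the proof can be illustrated on a finite (hence compact) group such as \(\mathbb{Z}/n\mathbb{Z}\) with normalised counting measure; this numerical example is my own addition:

```python
import numpy as np

# On Z/nZ, the character k -> e^{2 pi i k/n} is nontrivial for n > 1:
# its "integral" (mean over the group) is 0, while the trivial character gives 1.
n = 5
gamma = np.exp(2j * np.pi * np.arange(n) / n)
assert abs(gamma.mean()) < 1e-12                # nontrivial character integrates to 0
assert abs(np.ones(n).mean() - 1) < 1e-12       # trivial character integrates to 1
```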

If the characters of \(G\) are linearly independent, then they are pairwise distinct; but what about the converse? Dedekind answered this question affirmatively, but his approach was rather complicated: it relied on determinants. Artin later found a neat way to do it:

Theorem (Dedekind-Artin)Let \(M\) be a monoid and \(K\) a field. Let \(\chi_1,\dots,\chi_n\) be distinct characters of \(M\) in \(K\). Then they are linearly independent over \(K\).

*Proof.* Suppose this is false. Let \(N\) be the smallest integer such that \[a_1\chi_1+a_2\chi_2+\cdots+a_N\chi_N = 0\] for distinct \(\chi_i\) with not all \(a_i\) zero; by minimality all \(a_i \ne 0\), and clearly \(N \ge 2\). Since \(\chi_1 \ne \chi_2\), there is some \(z \in M\) such that \(\chi_1(z) \ne \chi_2(z)\). Still we have \[a_1\chi_1(zx)+\cdots+a_N\chi_N(zx)=0.\] Since the \(\chi_i\) are characters, for all \(x \in M\) we have \[a_1\chi_1(z)\chi_1(x)+\cdots+a_N\chi_N(z)\chi_N(x)=0.\] We now have a linear system \[\begin{pmatrix}a_1 & a_2 & \cdots & a_N \\a_1\chi_1(z) & a_2\chi_2(z) & \cdots & a_N\chi_N(z)\end{pmatrix}\begin{pmatrix}\chi_1 \\\chi_2 \\\vdots \\\chi_N\end{pmatrix} = \begin{pmatrix}0 \\ 0\end{pmatrix}\] Performing one step of Gaussian elimination (subtract \(\chi_1(z)\) times the first row from the second, then divide by \(\chi_1(z) \ne 0\)), we see \[\begin{pmatrix}a_1 & a_2 & \cdots & a_N \\0 & \left(\frac{\chi_2(z)}{\chi_1(z)}-1\right)a_2 & \cdots & \left(\frac{\chi_N(z)}{\chi_1(z)}-1\right)a_N\end{pmatrix}\begin{pmatrix}\chi_1 \\\chi_2 \\\vdots \\\chi_N\end{pmatrix} = \begin{pmatrix}0 \\ 0\end{pmatrix}\] But this is to say \[\left(\frac{\chi_2(z)}{\chi_1(z)}-1\right)a_2\chi_2 + \cdots + \left(\frac{\chi_N(z)}{\chi_1(z)}-1\right)a_N\chi_N=0.\] Since \(\frac{\chi_2(z)}{\chi_1(z)}-1 \ne 0\) and \(a_2 \ne 0\), this is a nontrivial linear relation among the \(N-1\) distinct characters \(\chi_2,\dots,\chi_N\), contradicting the minimality of \(N\). \(\square\)

As an application, we consider an \(n\)-variable equation:

Let \(\alpha_1,\cdots,\alpha_n\) be distinct non-zero elements of a field \(K\). If \(a_1,\cdots,a_n\) are elements of \(K\) such that for all integers \(v \ge 0\) we have \[a_1\alpha_1^v + \cdots + a_n\alpha_n^v = 0\] then \(a_i=0\) for all \(i\).

*Proof.* Consider the \(n\) distinct characters \(\chi_i(v)=\alpha_i^v\) of \(\mathbb{Z}_{\ge 0}\) into \(K^\ast\) and apply the Dedekind-Artin theorem. \(\square\)
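For \(v=0,\dots,n-1\) these relations form a Vandermonde system, which is invertible for distinct \(\alpha_i\); here is a quick numerical check of my own:

```python
import numpy as np

# Rows are powers v = 0, 1, 2: the system a_1 a1^v + ... + a_n an^v = 0
# is Vandermonde, invertible for distinct alphas, so only a = 0 solves it.
alphas = np.array([1.0, 2.0, 3.0])
V = np.vander(alphas, increasing=True).T
assert np.linalg.matrix_rank(V) == 3
a = np.linalg.solve(V, np.zeros(3))
assert np.allclose(a, 0)
```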

The linear independence of characters gives us a good chance of studying the relation of the field extension and the Galois group.

Hilbert's Theorem 90 (Modern Version)Let \(K/k\) be a Galois extension with Galois group \(G\), then \(H^1(G,K^\ast)=1\) and \(H^1(G,K)=0\). This is to say, the first cohomology group is trivial for both addition and multiplication.

It may look confusing, but the classic version is about cyclic extensions (\(K/k\) is cyclic if it is Galois and its Galois group is cyclic).

Hilbert's Theorem 90 (Classic Version, Multiplicative Form)Let \(K/k\) be cyclic of degree \(n\) with Galois group \(G\) generated by \(\sigma\). Then \[\frac{\ker N}{(1/\sigma)K^\ast} \cong 1,\] where \((1/\sigma)K^\ast\) consists of all elements of the form \(\alpha/\sigma(\alpha)\) with \(\alpha \in K^\ast\), and \(N(\beta)\) is the norm of \(\beta \in K\) over \(k\).

This corresponds to the statement that \(H^1(G,K^\ast)=1\). On the other hand,

Hilbert's Theorem 90 (Classic Version, Additive Form)Let \(K/k\) be cyclic of degree \(n\) with Galois group \(G\) generated by \(\sigma\). Then \[\frac{\ker \Tr}{(1-\sigma){K}} \cong 0,\] where \((1-\sigma)K\) consists of all elements of the form \(\alpha-\sigma(\alpha)\) with \(\alpha \in K\), and \(\Tr(\beta)\) is the trace of \(\beta \in K\) over \(k\).

This corresponds to, of course, the statement that \(H^1(G,K)=0\). Note this indeed asserts an exact sequence \[0 \to k \to K \xrightarrow{1-\sigma} K \xrightarrow{\Tr} k \to 0.\] Before we prove it, we recall what group cohomology is. Let \(G\) be a group. We consider the category **\(G\)-mod** of left \(G\)-modules. The set of morphisms between two objects \(A\) and \(B\), for which we write \(\operatorname{Hom}_G(A,B)\), consists of all \(G\)-module homomorphisms from \(A\) to \(B\). The *cohomology groups of \(G\) with coefficients in \(A\)* are the right derived functors of \(\operatorname{Hom}_G(\mathbb{Z},-)\): \[H^\ast (G,A) \cong \operatorname{Ext}^\ast_{\mathbb{Z}[G]}(\mathbb{Z},A).\] It follows that \(H^0(G,A) \cong \operatorname{Hom}_G(\mathbb{Z},A)=A^G=\{a \in A:ga=a \text{ for all }g \in G, a \in A\}\). In particular, if \(G\) is trivial, then \(\operatorname{Hom}_G(\mathbb{Z},-)\) is exact and therefore \(H^\ast(G,A)=0\) whenever \(\ast \ne 0\). We will see what happens when \(G\) is the Galois group of a Galois extension. If the modern version is beyond your reach, you can refer to the classic version. As a side note, the modern version can also be proved using Shapiro's lemma.

*Proof of the modern version.* We first deal with the multiplicative part. Let \((\alpha_\tau)_{\tau \in G}\) be a \(1\)-cocycle of \(G\) in \(K^\ast\), so that \(\alpha_{\sigma\tau}=\alpha_\sigma\,\sigma(\alpha_\tau)\). The elements of \(G\) are distinct characters of \(K^\ast\) in \(K\), so by the Dedekind-Artin theorem \(\sum_{\tau \in G}\alpha_\tau\tau\) is not the zero function: there exists \(\theta \in K\) such that \[\gamma = \sum_{\tau \in G}\alpha_\tau\,\tau(\theta) \ne 0.\] For any \(\tau \in G\) we then compute \[\tau(\gamma)=\sum_{\rho \in G}\tau(\alpha_\rho)\,\tau\rho(\theta)=\alpha_\tau^{-1}\sum_{\rho \in G}\alpha_{\tau\rho}\,\tau\rho(\theta)=\alpha_\tau^{-1}\gamma,\] which is to say \(\alpha_\tau = \gamma/\tau\gamma\). Replacing \(\gamma\) with \(\gamma^{-1}\) gives what we want: the cocycle coincides with a coboundary. So much for the multiplicative form.

For the additive form, take \(\theta \in K \setminus \ker \Tr\). Given a \(1\)-cocycle \(\alpha\) in the additive group \(K\), we put \[\beta = \frac{1}{\Tr(\theta)}\sum_{\tau \in G}\alpha_\tau \tau(\theta).\] Since the cocycle satisfies \(\alpha_{\sigma\tau}=\alpha_\sigma+\sigma\alpha_\tau\), we get \[\sigma\beta = \frac{1}{\Tr(\theta)}\sum_{\tau \in G}(\alpha_{\sigma\tau}-\alpha_\sigma)\sigma\tau(\theta) = \beta -\alpha_\sigma,\] which gives \(\alpha_\sigma = \beta-\sigma\beta\). Replacing \(\beta\) with \(-\beta\) gives what we want. \(\square\)

*Additive form.* Pick any element of the form \(\beta-\sigma\beta\); we see \(\Tr(\beta-\sigma\beta)=\sum_{\tau \in G}\tau\beta-\sum_{\tau \in G}\tau\beta=0\).

Conversely, assume \(\Tr(\alpha)=0\). By Artin's lemma, the trace function is not trivial, hence there exists some \(\theta \in K\) such that \(\Tr(\theta)\ne 0\), then we take \[\beta = \frac{1}{\Tr(\theta)}[\alpha\theta^\sigma+(\alpha+\sigma\alpha)\theta^{\sigma^2}+\cdots+(\alpha+\sigma\alpha+\cdots+\sigma^{n-2}\alpha)\theta^{\sigma^{n-1}}]\] where for convenience we write \(\sigma\theta=\theta^\sigma\). Therefore \[\beta-\sigma\beta = \frac{1}{\Tr(\theta)}\alpha(\theta+\theta^{\sigma}+\theta^{\sigma^2}+\cdots+\theta^{\sigma^{n-1}})=\alpha\] because other terms are cancelled. \(\square\)

*Multiplicative form.* This can be done in a quite similar setting. For any \(\alpha=\beta/\sigma\beta\), we have \[N(\alpha)=N(\beta)/N(\sigma\beta)=\left(\prod_{\tau \in G}\tau\beta\right)\bigg/ \left( \prod_{\tau \in G}\tau\sigma\beta\right)=1.\] Conversely, assume \(N(\alpha)=1\). By Artin's lemma, the following function is not identically zero: \[\Lambda=\operatorname{id}+\alpha\sigma+\alpha^{1+\sigma}\sigma^2+\cdots+\alpha^{1+\sigma+\cdots+\sigma^{n-2}}\sigma^{n-1}.\] Pick \(\theta\) with \(\beta=\Lambda(\theta) \ne 0\). It follows that \[\begin{aligned}\alpha\beta^\sigma &= \alpha(\theta+\alpha\theta^\sigma+\cdots+\alpha^{1+\sigma+\cdots+\sigma^{n-2}}\theta^{\sigma^{n-1}})^\sigma \\&= \alpha(\theta^\sigma+\alpha^\sigma\theta^{\sigma^2}+\cdots+\underbrace{\alpha^{\sigma+\sigma^2+\cdots+\sigma^{n-1}}\theta^{\sigma^n}}_{=\alpha^{-1}\theta}) \\&= \alpha\theta^\sigma+\alpha^{1+\sigma}\theta^{\sigma^2}+\cdots+\alpha^{1+\sigma+\cdots+\sigma^{n-2}}\theta^{\sigma^{n-1}}+\theta \\&=\beta,\end{aligned}\] hence \(\alpha=\beta/\sigma\beta\), which is exactly what we want. \(\square\)

Consider the extension \(\mathbb{Q}(i)/\mathbb{Q}\). The Galois group \(G=\{1,\tau\}\) is cyclic, generated by the complex conjugation \(\tau\). Now pick any \(a+bi\) with \(N(a+bi)=a^2+b^2=1\), where \(a,b \in \mathbb{Q}\); by Hilbert 90 there is some \(r=s+ti \in \mathbb{Q}(i)\) such that \[a+bi = \frac{s+ti}{s-ti}=\frac{s^2-t^2+2sti}{s^2+t^2}= \frac{s^2-t^2}{s^2+t^2}+\frac{2st}{s^2+t^2}i.\] If we put \((x,y,z)=(s^2-t^2,2st,s^2+t^2)\), we actually get a Pythagorean triple (if \(s,t\) are fractions, we can multiply both by a common denominator to make them integers). Conversely, to every Pythagorean triple \((x,y,z)\) we assign \(\frac{x}{z}+\frac{y}{z}i \in \mathbb{Q}(i)\), an element of norm \(1\). Through this we have found all solutions to \(x^2+y^2=z^2\), i.e.

TheoremIntegers \(x,y,z\) satisfy the Diophantine equation \(x^2+y^2=z^2\) if and only if \((x,y,z)\) is proportional to \((m^2-n^2,2mn,m^2+n^2)\) for some integers \(m,n\).
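A small script (my own illustration) generates triples from this parametrisation:

```python
# Pythagorean triples from the norm-1 parametrisation (m^2-n^2, 2mn, m^2+n^2).
def triple(m, n):
    return (m*m - n*n, 2*m*n, m*m + n*n)

for m, n in [(2, 1), (3, 2), (4, 1)]:
    x, y, z = triple(m, n)
    assert x*x + y*y == z*z

print(triple(2, 1))  # (3, 4, 5)
```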

This can be generalised to all Diophantine equations of the form \(x^2+Axy+By^2=Cz^2\) for some nonzero constant \(C\) and constants \(A,B\) such that the discriminant \(A^2-4B\) is square-free.

The additive form is a good friend of "characteristic \(p\)" things. The Artin-Schreier theorem is a good example of \(p\)-to-the-\(p\).

**Theorem (Artin-Schreier).** Let \(k\) be a field of characteristic \(p\) and \(K/k\) a cyclic extension of degree \(p\). Then there exists \(\alpha \in K\) such that \(K=k(\alpha)\) and \(\alpha\) is a zero of an equation \(X^p-X-a=0\) for some \(a \in k\).

*Proof.* Since the Galois group \(G\) of \(K/k\) is cyclic and \(\operatorname{Tr}(-1)=p\cdot(-1)=0\), we are able to use the additive form. Let \(\sigma\) be a generator of \(G\); then there exists some \(\alpha \in K\) such that \[\sigma\alpha = \alpha+1.\] Hence \(\sigma(\sigma(\alpha))=\sigma(\alpha+1)=\alpha+1+1\), and by induction we get \[\sigma^i(\alpha) = \alpha+i, \quad i=1,2,\cdots,p,\] so \(\alpha\) has \(p\) distinct conjugates. Therefore \([k(\alpha):k] \ge p\). But since \[[K:k]=[K:k(\alpha)][k(\alpha):k]=p,\] we can only have \([K:k(\alpha)]=1\), which is to say \(K=k(\alpha)\). In the meantime, \[\sigma(\alpha^p-\alpha)=(\alpha+1)^p-(\alpha+1)=\alpha^p+1^p-\alpha-1 = \alpha^p-\alpha.\] Hence \(\alpha^p - \alpha\) lies in the fixed field of \(\sigma\), which happens to be \(k\). Putting \(a=\alpha^p-\alpha\), our proof is done. \(\square\)

For the case when the characteristic is \(0\) please see here. There is a converse, which deserves a standalone blog post. It says that the polynomial \(f(X)=X^p-X-a\) either has one root in \(k\), in which case all its roots are in \(k\); or it is irreducible, in which case if \(\alpha\) is a root then \(k(\alpha)\) is cyclic of degree \(p\) over \(k\). But I don't know if many people are fans of "characteristic \(p\)" things.
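One half of that dichotomy is easy to see over \(k=\mathbb{F}_p\): by Fermat's little theorem \(x^p=x\) for every \(x \in \mathbb{F}_p\), so \(x^p-x-a=-a \ne 0\) whenever \(a \ne 0\), and \(X^p-X-a\) has no root in \(\mathbb{F}_p\) (hence, by the quoted converse, it is irreducible). A brute-force check (my own pure-Python sketch):

```python
# For several primes p and every nonzero a in F_p, verify that
# X^p - X - a has no root in F_p.  By Fermat, x^p = x (mod p),
# so the value is always -a, which is nonzero.
for p in (2, 3, 5, 7, 11):
    for a in range(1, p):
        roots = [x for x in range(p) if (pow(x, p, p) - x - a) % p == 0]
        assert roots == []
```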

- Serge Lang, *Algebra, Revised Third Edition*.
- Charles A. Weibel, *An Introduction to Homological Algebra*.
- Noam D. Elkies, *Pythagorean triples and Hilbert's Theorem 90*. (https://abel.math.harvard.edu/~elkies/Misc/hilbert.pdf)
- Jose Capco, *The Two Artin-Schreier Theorems*. (https://www3.risc.jku.at/publications/download/risc_5477/the_two_artin_schreier_theorems__jcapco.pdf)
- Walter Rudin, *Fourier Analysis on Groups*.

In fact, the \(\mathbb{R}^n\) case can be generalised to any locally compact abelian group (see any abstract harmonic analysis book), because what really matters here is being locally compact and abelian. But at this moment we stick to Euclidean spaces. Note that since \(\mathbb{R}^n\) is \(\sigma\)-compact, all complex Borel measures on it are regular.

To read this post you need to be familiar with some basic properties of Banach algebras, complex Borel measures, and, most importantly, Fubini's theorem.

The norm on \(M(\mathbb{R}^n)\) is the *total variation*: \[\lVert \mu \rVert = |\mu|(\mathbb{R}^n) = \sup \sum_{i=1}^{\infty}|\mu(E_i)|,\] the supremum being taken over all partitions \((E_i)\) of \(\mathbb{R}^n\). The supremum on the right-hand side is finite because \(\mu\) is assumed to be complex. This makes \(M(\mathbb{R}^n)\) a normed space, but we are interested in proving that this space is Banach.

Note each measure in \(M(\mathbb{R}^n)\) gives rise to a bounded linear functional \[\begin{aligned}\Phi_\mu:C_0(\mathbb{R}^n) &\to \mathbb{C} \\ f &\mapsto \int_{\mathbb{R}^n}fd\mu.\end{aligned}\] Note we have \(\vert \Phi_\mu(f)\vert = |\int f d\mu| \le \int |f|d|\mu| <\infty\). Indeed the norm of \(\Phi_\mu\) is \(\lVert \mu \rVert\).

Conversely, every bounded linear functional \(\Phi\) gives rise to a regular Borel measure \(\mu\) such that \(\Phi(f)=\int fd\mu\) and \(\lVert \Phi \rVert = \lVert \mu \rVert\), which is ensured by the Riesz representation theorem. This is to say \[C_0(\mathbb{R}^n)^\ast \cong M(\mathbb{R}^n)\] in the sense of vector space isomorphism and homeomorphism (in fact, isometry). But it is well known that the dual space of a normed vector space is a Banach space, hence \(M(\mathbb{R}^n)\) is Banach, as expected.

A vector space \(V\) over a field \(\mathbb{F}\) is called an algebra if there is an \(\mathbb{F}\)-bilinear form \[B:V \times V \to V.\] It is a Banach algebra if \(V\) itself is Banach, the bilinear form is associative, i.e. \(B(x,B(y,z)) = B(B(x,y),z)\), and \[\lVert B(x,y) \rVert \le \lVert x \rVert \lVert y \rVert.\] We show that \(M(\mathbb{R}^n)\) is a Banach algebra by taking \(B(\lambda,\mu)=\lambda \ast \mu\).

The convolution of measures is defined in the style of convolution of functions, in a natural sense. For any Borel set \(E \subset \mathbb{R}^n\), we can consider the set restricted by addition: \[E_2 = \{(x,y):x+y \in E\} \subset \mathbb{R}^{2n}.\] Then we define the convolution of \(\mu,\lambda \in M(\mathbb{R}^{n})\) by the product measure \[(\mu \ast \lambda)(E) = (\mu \times \lambda)(E_2).\] It looks natural but we need many routine verifications.

First, we need to show that \(E_2\) is Borel. In fact, we have \[\chi_{E_2}(x,y) = \chi_E(x+y).\] Since \(E\) is Borel, we see \(\chi_E\) is Borel. Meanwhile \(\varphi(x,y)=x+y\) is continuous hence Borel. Therefore \(\chi_{E_2}\) is Borel as well. It follows that \(E_2\) is a Borel set.

Next, is \(\mu \ast \lambda\) an element of \(M(\mathbb{R}^n)\)? For any Borel set \(E\), the value of \((\mu \ast \lambda)(E)\) is defined in \(\mathbb{C}\), so we only need to verify countable additivity. It shall be shown that \[(\mu \ast \lambda)\left(\bigcup_{k=1}^{\infty}E^k\right)=\sum_{k=1}^{\infty}(\mu \ast \lambda)(E^k)\] where the \(E^k\) are pairwise disjoint. Since the "measure" of \(E\) is connected to \(E_2\), we first show that if \(E\) and \(F\) are disjoint, then so are \(E_2\) and \(F_2\). Indeed, if \((x,y) \in E_2 \cap F_2\), then \(x+y \in E \cap F\), contradicting the disjointness. Hence pairwise disjointness is preserved. Putting \(E= \bigcup_{k=1}^{\infty}E^k\), we also need to show that \(E_2 = \bigcup_{k=1}^{\infty}E_2^k\). If \((x,y) \in E_2\), then \(x+y\) lies in one of the \(E^k\), hence \((x,y) \in E_2^k\) for some \(k\). It follows that \(E_2 \subset \bigcup_{k=1}^{\infty}E_2^k\). Conversely, for \((x,y) \in \bigcup_{k=1}^{\infty}E_2^k\), we must have some \(k\) such that \(x+y \in E^k \subset E\), hence \((x,y) \in E_2\), which is to say that \(\bigcup_{k=1}^{\infty}E_2^k \subset E_2\). Therefore \[(\mu \ast \lambda)(E) = (\mu \times \lambda)(E_2) = (\mu \times \lambda)\left( \bigcup_{k=1}^{\infty}E_2^k\right) = \sum_{k=1}^{\infty}(\mu \times \lambda)(E_2^k) = \sum_{k=1}^{\infty}(\mu \ast \lambda)(E^k)\] as desired.

For any \(f \in C_0(\mathbb{R}^n)\), we have a linear functional \[\Phi:f \mapsto \iint f(x+y)d\mu(x)d\lambda(y) = \int fd(\mu \ast \lambda).\] By the Riesz representation theorem, there exists a unique measure \(\nu\) such that \(\Phi(f)=\int fd\nu\); it follows that \(\nu = \mu \ast \lambda\) is uniquely determined. However, we have \[\iint f(x+y)d\mu(x)d\lambda(y) = \iint f(x+y)d\lambda(x)d\mu(y)=\int fd(\lambda \ast \mu).\] It follows that \(\lambda \ast \mu = \nu = \mu \ast \lambda\): this convolution is commutative. Note for complex measures we always have \(|\mu|(\mathbb{R}^n)<\infty\), so Fubini's theorem is always valid.

Next we show that \(\ast\) is associative. It can be carried out by Riesz's theorem. Put \(\nu_1 = \lambda \ast (\mu \ast \gamma)\) and \(\nu_2 = (\lambda \ast \mu) \ast \gamma\). It follows that \[\begin{aligned} \int fd\nu_1 &= \iint f(x+y)d\lambda(x)d(\mu \ast \gamma)(y) \\ &= \iiint f(x+y+z)d\lambda(x)d\mu(y)d\gamma(z) \\ &= \iiint f(x+y+z)d\gamma(z)d\lambda(x)d\mu(y) \\ &= \iint f(x+y)d\gamma(x)d(\lambda \ast \mu)(y) \\ &= \int fd(\gamma \ast (\lambda \ast \mu)) \\ &= \int fd\nu_2.\end{aligned}\] Hence \(\nu_1 = \nu_2\), which delivers the associativity of the convolution. To show that \(\ast\) makes \(M(\mathbb{R}^n)\) an algebra, we need to show the distributive law. This follows from the definition of the product measure because \[\mu \ast (\lambda_1 + \lambda_2)(E) = \int (\lambda_1 + \lambda_2)(E_{2}^{x})d\mu(x) = \int \lambda_1(E_2^x)d\mu(x) + \int \lambda_2(E_2^x)d\mu(x)\] (where \(E_2^x = \{y:x+y \in E\}\) is the \(x\)-section of \(E_2\)), which is to say \(\mu \ast \lambda_1 + \mu \ast \lambda_2 = \mu \ast (\lambda_1 + \lambda_2)\). Therefore \(M(\mathbb{R}^n)\) is a complex algebra. It remains to establish the norm inequality. Let \(E^1, E^2, \cdots\) be a partition of \(\mathbb{R}^n\); we see \[\begin{aligned}\sum_{k=1}^{\infty}|(\mu \ast \lambda)(E^k)| &= \sum_{k=1}^{\infty}|(\mu \times \lambda)(E^k_2)| \\&= \sum_{k=1}^{\infty} \left|\iint \chi_{E_2^k}d\mu d\lambda\right| \\ &\leq \sum_{k=1}^{\infty} \iint \chi_{E_2^k}d|\mu|d|\lambda| \\&\leq |\mu|(\mathbb{R}^n) \cdot |\lambda|(\mathbb{R}^n) \\&= \lVert \mu \rVert \cdot \lVert \lambda \rVert.\end{aligned}\] Hence \(\lVert \mu \ast \lambda\rVert \le \lVert \mu \rVert \lVert \lambda \rVert\).

To conclude, \(M(\mathbb{R}^n)\) is a commutative Banach algebra. Even better, this algebra has a unit, which is customarily called the **Dirac measure**. Let \(\delta\) be the measure determined by the evaluation functional \(\Lambda:f \mapsto f(0)\). It follows that \[\begin{aligned}\int f d(\delta \ast \mu) &= \iint f(x+y)d\delta(x)d\mu(y) \\ &= \int f(y)d\mu(y). \end{aligned}\] Hence \(\delta \ast \mu = \mu\) for all \(\mu \in M(\mathbb{R}^n)\). Besides, \(\delta\) has norm \(1\) because it attains the value \(1\) at any Borel set \(E \subset \mathbb{R}^n\) containing the origin and the value \(0\) at any other Borel set.

A measure \(\mu\) is said to be *discrete* if there is a countable set \(E\) such that \(\mu(A)=\mu(A \cap E)\) for all measurable sets \(A\) (in general we say \(\mu\) is *concentrated* on \(E\)). \(\mu\) is said to be *continuous* if \(\mu(A)=0\) whenever \(A\) contains only a single point. We write \(\mu \ll \lambda\), and say \(\mu\) is *absolutely continuous* with respect to \(\lambda\), if \(\lambda(A)=0 \implies \mu(A)=0\).

We now play some games between continuous and discrete measures. First, we study the subspace of discrete measures \(M_d(\mathbb{R}^n)\). For sums, things are quite straightforward. Suppose \(\mu\) is concentrated on \(A\) and \(\lambda\) is concentrated on \(B\); then \[\begin{aligned}\mu(E) + \lambda(E) &= \mu(E \cap A) + \lambda(E \cap B) \\ &= \mu(E \cap (A \cap (A \cup B))) + \lambda(E \cap (B \cap (A \cup B))) \\ &= \mu(E \cap (A \cup B))+ \lambda(E \cap (A \cup B)).\end{aligned}\] Hence \(\mu+\lambda\) is concentrated on \(A \cup B\).

For convolution, things are a little trickier. Suppose \(\mu = \sum_{i=1}^{\infty}a_i\delta_{x_i}\) and \(\lambda=\sum_{i=1}^{\infty}b_i\delta_{y_i}\), where the \(x_i\) and \(y_i\) are distinct points and \(\delta_x\) is the Dirac measure concentrated on \(\{x\}\) (hence \(\delta=\delta_0\)); i.e. \(\mu\) is concentrated on \(A=\{x_i\}_{i=1}^{\infty}\) and \(\lambda\) is concentrated on \(B=\{y_i\}_{i=1}^{\infty}\). We see \[\begin{aligned}(\mu \ast \lambda)(E) &= \iint \chi_E(x+y)d\mu(x)d\lambda(y) \\ &= \int \sum_{i=1}^{\infty}a_i\chi_E(x_i+y)d\lambda(y) \\ &= \sum_{j=1}^{\infty}\sum_{i=1}^{\infty}a_ib_j\chi_E(x_i+y_j) \\ &= \sum_{j=1}^{\infty}\sum_{i=1}^{\infty}a_ib_j\chi_{E \cap (A+B)}(x_i+y_j) \\ &= (\mu \ast \lambda)(E \cap (A+B)).\end{aligned}\] Therefore \(M_d(\mathbb{R}^n)\) forms a subalgebra of \(M(\mathbb{R}^n)\).
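The computation above can be mirrored with finitely supported measures. The sketch below (pure Python; representing a discrete measure as a dict from points to weights is my own choice) computes \((\mu \ast \lambda)(\{s\}) = \sum_{x_i+y_j=s} a_ib_j\) and checks that \(\delta=\delta_0\) is the unit and that the convolution is commutative.

```python
# Convolution of finitely supported (discrete) measures on R:
# (mu * lam)({s}) is the sum of mu({x}) * lam({y}) over all x + y = s.
def convolve(mu, lam):
    out = {}
    for x, a in mu.items():
        for y, b in lam.items():
            out[x + y] = out.get(x + y, 0) + a * b
    return out

delta = {0: 1}                       # Dirac measure at the origin
mu = {1: 0.5, 2: 0.5}
lam = {-1: 2.0, 3: 1.0}

assert convolve(delta, mu) == mu                 # delta is the unit
assert convolve(mu, lam) == convolve(lam, mu)    # commutativity
```

Note that the support of the result is contained in the sumset \(A+B\), exactly as the derivation shows.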

Next, we focus on the subspace of continuous measures \(M_c(\mathbb{R}^n)\). To begin with, we first consider the following identity: \[\begin{aligned}(\mu \ast \lambda)(E) &= \iint \chi_E(x+y)d\mu(x)d\lambda(y) \\ &= \iint \chi_{E-y}(x)d\mu(x)d\lambda(y) \\ &= \int \mu(E-y)d\lambda(y).\end{aligned}\] Suppose \(\mu\) is continuous and \(E\) is a singleton, then \(E-y\) is still a singleton and hence \(\mu(E-y)=0\) for all \(y\), hence \((\mu \ast \lambda)(E)=0\), i.e. \(\mu \ast \lambda\) is still continuous. Therefore the subspace of continuous measures actually forms an ideal.

Next, suppose \(\mu \ll m\) (where \(m\) is the Lebesgue measure) and \(m(E)=0\). We see \[(\mu \ast \lambda)(E) = \int \mu(E-y)d\lambda(y) = 0\] because \(m(E)=0\) implies \(m(E-y)=0\) for all \(y\). Hence the subspace of absolutely continuous measures \(M_{ac}(\mathbb{R}^n)\) also forms an ideal.

Finally, we consider the Radon-Nikodym derivatives (which exist and are unique almost everywhere) of absolutely continuous measures. If \[\mu(E) = \int_E fdm, \quad \lambda(E) = \int_E gdm,\] then \(\mu \ast \lambda\) coincides with \(f \ast g\) in the following sense: \[\begin{aligned}(\mu \ast \lambda)(E) &= \int_{\mathbb{R^n}} \mu(E-t)d\lambda(t) \\ &= \int_{\mathbb{R^n}}\left(\int_{E}f(x-t)dm(x) \right)g(t)dm(t) \\ &= \int_{E}\int_{\mathbb{R}^n} f(x-t)g(t)dm(t)dm(x) \\ &= \int_E (f \ast g)dm.\end{aligned}\] In other words, we have \(d(\mu \ast \lambda) = (f \ast g)dm\). Through this, we have established an algebra isomorphism \(M_{ac}(\mathbb{R}^n) \cong L^1(\mathbb{R}^n,m)\).
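As a discretised illustration of \(d(\mu \ast \lambda) = (f \ast g)dm\) (a pure-Python sketch; the grid, the two uniform densities and all names are my own choices), a Riemann-sum convolution of two probability densities again has total mass \(1\), matching \((\mu \ast \lambda)(\mathbb{R}^n)=\mu(\mathbb{R}^n)\lambda(\mathbb{R}^n)\) for positive measures.

```python
# Discretised check that the mass of f*g is the product of the masses.
h = 0.05                  # grid step
n = 80                    # grid covers [0, 4)
f = [1.0 if i * h < 1 else 0.0 for i in range(n)]   # density of mu: uniform on [0, 1)
g = [0.5 if i * h < 2 else 0.0 for i in range(n)]   # density of lam: uniform on [0, 2)

# Riemann-sum convolution: (f*g)(k h) ~ sum_j f(j h) g((k - j) h) h
fg = [h * sum(f[j] * g[k - j]
              for j in range(max(0, k - n + 1), min(k, n - 1) + 1))
      for k in range(2 * n - 1)]

mass = lambda dens: h * sum(dens)
assert abs(mass(f) - 1.0) < 1e-9
assert abs(mass(g) - 1.0) < 1e-9
assert abs(mass(fg) - mass(f) * mass(g)) < 1e-9
```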

\(L^1(\mathbb{R}^n,m)\) could have been a Banach algebra, but the unit is missing. However, one can embed it into \(M(\mathbb{R}^n)\) as a subspace of the subalgebra \(M_{L^1}(\mathbb{R}^n)\), which contains all complex Borel measures \(\mu\) satisfying \[d\mu = fdm + c\, d\delta, \quad c \in \mathbb{C}.\] Moreover, by the Lebesgue decomposition theorem, every \(\mu \in M(\mathbb{R}^n)\) has a unique decomposition \[\mu = \mu_a + \mu_s\] where \(\mu_a \ll m\) and \(\mu_s \perp m\). With this being said, we have a direct sum \[M(\mathbb{R}^n) = L^1(\mathbb{R}^n,m) \oplus M_s(\mathbb{R}^n)\] where \(M_s(\mathbb{R}^n)\) is the subspace of complex measures singular with respect to \(m\). Informally speaking, the Gelfand transform on \(L^1(\mathbb{R}^n,m)\) can be identified with the Fourier transform. Hence to study the Gelfand transform on \(M(\mathbb{R}^n)\) it remains to work on \(M_s(\mathbb{R}^n)\). This shows the relation between \(L^1\) and \(C_0\).

Let \(G\) be the group of invertible elements of \(M=M(\mathbb{R})\), and \(G_1\) the component of \(G\) that contains \(\delta\). \(G_1\) is an open normal subgroup of \(G\). Since \(M\) is commutative, \(G_1=\exp(M)\), and \(G/G_1\) contains no nontrivial element of finite order. We will show that \(G/G_1\) is actually uncountable. Pick \(\alpha \in \mathbb{R}\) and assume \(\delta_\alpha \in G_1\); then \(\delta_\alpha = \exp(\mu_\alpha)\) for some \(\mu_\alpha \in M\). Taking the Fourier transform on both sides gives \[\int e^{-ixt}d\delta_\alpha(x) = e^{-i\alpha t} = \int e^{-ixt}d\exp(\mu_\alpha)(x)=e^{\hat{\mu}_\alpha(t)}.\] Hence \[-i\alpha t = \hat{\mu}_\alpha(t)+2k\pi i\] for some integer \(k\). Since \(\mu_\alpha\) is bounded, so is \(\hat{\mu}_\alpha(t)\), while the left-hand side is unbounded in \(t\) unless \(\alpha=0\). This is to say \(\delta_\alpha \in G_1 \implies \alpha=0\). Therefore the map \[\begin{aligned}\Lambda:\mathbb{R} &\to G/G_1, \\ \alpha &\mapsto \delta_\alpha G_1\end{aligned}\] is injective: if \(\delta_\alpha G_1 = \delta_\beta G_1\), then \(\delta_{\alpha-\beta} \in G_1\), hence \(\alpha=\beta\). (It is not surjective: a coset \(\lambda G_1\) contains at most one Dirac measure, and may contain none.) This is to say, \(G/G_1\) is uncountable.

To begin with we consider a calculus problem that you may have seen in your exam:

Let \(f\) be a continuous function on \([0,\infty)\) such that \(\lim_{x \to \infty} f(x)=l\). Prove that, for \(b>a>0\), \[\int_0^\infty \frac{f(ax)-f(bx)}{x}\mathrm{d}x = (f(0)-l)\ln\frac{b}{a}.\]

And we solve this problem as follows. Put \(g(x)=f(x)-l\); then \(\lim_{x \to \infty}g(x)=0\). Consider the two-variable function \(F(x,y)=-g'(xy)\) and the region \(D=\{(x,y):x \ge 0, a \le y \le b\}\). We have this result: \[\begin{aligned}\iint_D F(x,y)\mathrm{d}x\mathrm{d}y &= \int_0^\infty\mathrm{d}x\int_a^b -g'(xy)\mathrm{d}y \\ &= \int_0^\infty \frac{g(ax)-g(bx)}{x}\mathrm{d}x \\ &= \int_a^b \mathrm{d}y \int_0^\infty -g'(xy)\mathrm{d}x \\ &= \int_a^b \frac{g(0)}{y}\mathrm{d}y \\ &=g(0)\ln\frac{b}{a} \\\end{aligned}\] Substituting \(g(x)=f(x)-l\) gives exactly what we want, doesn't it? **Well, the more analysis you learn, the more absurd this proof will look.** If you write this in an exam you will get \(0\) marks no matter what. There are two major mistakes:

- Can we change the order of integration? We have no idea. It is certain that we cannot change the order with ease, and there are counterexamples.
- Is this function *even* differentiable? We also have no idea. It is *almost certain* that \(f\) is not (the probability that \(f\) is differentiable is \(0\)); see this post to learn why if you have some background in functional analysis.

For a good proof, please turn to math.stackexchange. This is not easy at all.
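Although a rigorous proof is delicate, the identity itself is easy to test numerically for a concrete \(f\). Here is a quick sketch (pure Python; the choice \(f(x)=e^{-x}\), so \(f(0)=1\) and \(l=0\), and the midpoint grid are my own): the integral \(\int_0^\infty \frac{e^{-ax}-e^{-bx}}{x}\mathrm{d}x\) should come out to \(\ln\frac{b}{a}\).

```python
# Numerical sanity check of the identity with f(x) = exp(-x), f(0) = 1, l = 0.
import math

def integrand(x, a, b):
    return (math.exp(-a * x) - math.exp(-b * x)) / x

a, b = 1.0, 2.0
h = 1e-3
# midpoint Riemann sum on [0, 40]; the integrand extends continuously to 0
# (value b - a there) and is negligibly small beyond 40
total = sum(integrand((k + 0.5) * h, a, b) * h for k in range(40000))
assert abs(total - math.log(b / a)) < 1e-3
```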

The problem is, it is really *unfair* that in some circumstances we have to give up all the properties of differentiation. If you are studying differential equations and a non-differentiable function pops up, you have nowhere to go. Sometimes, chances are that you even have *no idea* whether a function is differentiable.

So this post is written. We introduce the concept of (Schwartz) **distributions** (a.k.a. **generalised functions**), where differentiation is significantly extended, to obtain the **derivative** in a generalised sense. Roughly speaking, once distributions are introduced, differentiation can be done with absolute ease.

In fact, physicists had been using distributions long before mathematicians established the formal theory. For example the \(\delta\) *function* introduced by Dirac that you may have met in Fourier transform: \[\delta(x) = \begin{cases}\infty &\quad x=0, \\ 0&\quad \text{others} .\end{cases}\] And it is required that \[\int_{-\infty}^{\infty}\delta(x)\mathrm{d}x=1.\] But this does not make any sense in calculus. Von Neumann, in his book on quantum mechanics, warned against the theory using this function and dismissed it as a "fiction". Not so pleasant. He tried with a lot of effort to demonstrate that quantum mechanics could live without such a "fiction". As you can imagine, this function may have created some bad blood between von Neumann and Dirac.

Laurent Schwartz, however, managed to be a peacemaker. He developed the theory of distributions (which is exactly what we are talking about in this post), and the "fiction" became an easy "fact". Years later, he became the 1950 Fields Medalist (one of the most prestigious awards in mathematics) at the age of 35, with the citation:

Developed the theory of distributions, a new notion of generalized function motivated by the Dirac delta-function of theoretical physics. (Source)

As you can see later, thanks to Schwartz, the twisted \(\delta\) function is well-defined and is really plain and elegant. So von Neumann had no need to be angry any more.

By *concept* I mean that I will try to include basic ideas (without many proofs, though they can be delivered), so that serious study of the subject can be simpler (it can be really tough!). Do not expect to be able to solve problems on distributions merely by reading this post.

There will be two parts. Part one focuses on motivation and what is going on. I will try to make it readable to people who have finished calculus or, more ideally, undergraduate analysis and linear algebra, though rigour is not always guaranteed. It would be better if you know some theory of differential equations, but that is not a must. If you already have the background to read part 2, then part 1 is much easier for you and therefore serves as a good source of intuition and motivation.

If you are still struggling with differentiation in single-variable calculus, there is no need to struggle with generalised differentiation at this point; it does not help. The requirements from linear algebra are vector spaces, subspaces and linear maps. You should know that integration and differentiation are linear maps. This is a graduate course topic; it is not realistic to assume the reader has no idea about calculus and linear algebra.

The second part will be much more advanced, and you are expected to have some background in topological vector spaces (functional analysis). Neither part can be considered a lecture note, but they may help you find where you are when you study this concept seriously.

Throughout, we consider real-valued functions on \(\mathbb{R}\). These theories can be generalised to complex-valued functions on \(\mathbb{R}^n\), where partial derivatives take part, but we are not doing that here. At the end of the day, that generalisation is not a big deal.

In calculus, a lot of functions we study are smooth (for example, \(y=\sin{x}\)), and we write \(C^\infty\) for the space of such functions, as they are *infinitely differentiable*. This is a vector space, and in this vector space differentiation can be done *with absolute ease*: for a given \(f \in C^\infty\), we have \(f',f'',\cdots,f^{(k)}\) well-defined for all \(k = 1,2,\cdots\). But in vector spaces like \(C^2\), \(C^1\), or even \(C\), differentiation can only be done with caution: we may only have \(f''\) and no \(f^{(3)}\), or even \(f'\) may not exist. We don't *feel like* this kind of caution. Hence we introduce the concept of **distributions**, also known as **generalised functions**. We want a space where we can still do differentiation with absolute ease. We may need to *modify* our definition of differentiation such that it works on all continuous functions (but it shall not lose its meaning within \(C^\infty\)). Bearing these in mind, we have several settings or expectations for distributions:

- Every continuous function should be (considered as) a distribution. (So we can take derivatives of all continuous functions without too much worry, unlike the calculus problem at the beginning.)
- The "modified differentiation" should make sure that the "modified derivative" of a distribution is still a distribution. In other words, distributions are "infinitely differentiable" (which makes differential equation theory much easier). In the language of algebra, the "modified derivative" should be an endomorphism.
- The usual formal rules of calculus should hold. For example, in the new sense we should still have \((fg)'=f'g+g'f\). (Our modified differentiation should not go too far.)
- Convergence properties should also be available. (Validating this requires more theories so this can only be mentioned in part 2.)

Let's write our desired space of distributions as \(\mathscr{D}'\), and the space of all continuous functions as \(C\). All of \(C,C^\infty,\mathscr{D}'\) are considered as real vector spaces, and we should have \[C^\infty \subset C \subset \mathscr{D}'\] in the sense of subspaces.

Here is a breakdown of these concepts. You will see terminologies and definitions later.

- A smooth, continuous or, more generally, locally integrable function gives rise to a bounded linear functional. The converse is not guaranteed to be true, but we *pretend* it to be true, so all *bounded linear functionals* give rise to distributions, a.k.a. generalised functions (this name is apt because we *pretend* the converse to be true). Whenever you are asked what a generalised function is, you can say: it is a linear map, and sometimes it can be determined by an ordinary function.
- For these distributions or generalised functions, we modify the derivative with respect to integration by parts. The modified derivative cannot be put down explicitly, but we don't care, because integration by parts doesn't give us many problems. Whenever you are asked how the derivative of a non-differentiable function is given, you can say: it is given by pretending nothing goes wrong in integration by parts.

We now try to understand what we really want from distributions. We start our study through integration, **because differentiation does not work**. Given \(f \in C \subset \mathscr{D}'\), we first need to make sure \(\int f\phi\) is well-defined for *some* \(\phi\in C^\infty\), because we want to do integration by parts, which involves **some differentiation**, and we may make use of it.

If \(f\) is not even a continuous function, we still need to consider *some* \(\phi\) in the same manner, or our extension would be abrupt.

Let's talk about these \(\phi\) a little bit, with respect to integration by parts. Consider the bump function \[\phi(x) = \begin{cases} \exp(\frac{1}{(x-a)(x-b)}) & \quad a < x < b, \\ 0 &\quad \text{ otherwise. }\end{cases}\] On \((a,b)\), we have \(\phi(x)>0\). On the boundary points \(a\) and \(b\) we have \(\phi(x)=0\), but that shouldn't be a problem, because they are only the endpoints. Points outside \([a,b]\) have no contribution to the value of this function. For some obvious reason we call \([a,b]\) the *closure* of \((a,b)\). In general, given a real-valued function \(f\), we call the closure of the set of points where \(f(x) \ne 0\) the **support** of \(f\). As you can tell, the support of \(\phi\) is \([a,b]\).

If \(\phi\) has unbounded support (the support of a function \(f\) is the closure of the set of points \(x\) where \(f(x) \ne 0\)), then we may need to discuss limits at infinity. But we don't want improper integrals at all. Hence the support of \(\phi\) is always assumed to be a **closed and bounded** subset of \(\mathbb{R}\); it is closed because it is defined to be a closure. These closed and bounded sets are called *compact* sets. If you are not familiar with topology, it is OK at this moment to consider compact sets as bounded closed intervals \([a,b]\).

The test function space \(\mathscr{D}\) is defined to be the set of all \(C^\infty\) functions with compact support. This is indeed a vector space, and the verification is a good exercise in both linear algebra and calculus. What about \(\mathscr{D}'\)? Here we demonstrate how things are extended.

For each \(f \in C\) (which contains \(C^\infty\)), we have a functional (a functional is a linear map from a vector space to its base field, here \(\mathbb{R}\); nothing special, just a different name that has been used by mathematicians for decades!) \[\begin{aligned} \Lambda_f: \mathscr{D} &\to \mathbb{R}, \\ \phi &\mapsto \int f\phi.\end{aligned}\] This functional is **bounded** for all \(\phi \in \mathscr{D}\) because if \(\phi\) has support \(K\), then \[|\Lambda_f(\phi)|=\left|\int_K f\phi\right| \le \left(\int_K |f| \right)\sup_{x \in K}|\phi|.\] A continuous function on a compact set is always bounded, hence the integral on the right-hand side is always finite. If it were infinite, a lot of problems would arise.

In general, a **bounded linear functional** \(\Lambda:\mathscr{D} \to \mathbb{R}\) is called a *distribution*, and these form exactly \(\mathscr{D}'\). Since every continuous function \(f\) gives rise to a unique bounded functional \(\Lambda_f\), we consider \(C\) as a subspace of \(\mathscr{D}'\). The converse is not generally true, but we *pretend* it to be true (we pretend the functional gives rise to a function anyway), which makes our study easier; hence the name *generalised function* is well-deserved.

The differential operator \(D\) on \(C^\infty\) should be extended into \(\mathscr{D}'\) naturally. There are many ways to extend a linear map. For example the identity map \(i:\mathbb{R} \to \mathbb{R}\) has at least two ways to be extended to \(\mathbb{R}^2\):

- \(I:\mathbb{R}^2 \to \mathbb{R}^2\) by \((x,y) \mapsto (x,y)\).
- \(\pi:\mathbb{R}^2 \to \mathbb{R}\) by \((x,y) \mapsto x\).

The restriction of these two maps on \(\mathbb{R}\) is the same as \(i\).

But if we extended \(D\) in several ways, things would be messy. Originally the derivative is defined in the sense of limits, but for a non-differentiable function we cannot do that. We need the extension that makes the most sense: it is obtained by validating **integration by parts**. It seems like we are developing some advanced concepts, but still we need to make use of elementary ones.

For \(f(x)=\sin{x}\) and \(\phi \in \mathscr{D}\), we have \[\Lambda_{f'}(\phi)=\int f'\phi = \int \phi\cos{x} = \underbrace{\phi\sin{x}|_{-\infty}^{\infty}}_{\text{zero}} -\int \phi'\sin x=-\Lambda_f(\phi').\] The derivative of \(f\) is passed to the derivative of \(\phi\). Again we are using integration by parts. If \(f\) is not assumed to be differentiable, we *pretend* it is, skip the body and jump to the result immediately. For example, \(f(x)=|x|\) is not differentiable, but we do that anyway: \[\int |x|'\phi = -\int |x|\phi'.\] In general for \(f \in C^\infty\), we have (this can be verified by some computation) \[\Lambda_{D^k f}(\phi)=\int D^k f \phi = (-1)^k \int fD^k\phi = (-1)^k \Lambda _f(D^k\phi).\] Differentiation for distributions (on top of \(C^\infty\) functions) should take the same **shape**, hence we define the \(k\)-th **distributional derivative** of a distribution \(\Lambda\) by \[D^k\Lambda: \phi \mapsto (-1)^{k}\Lambda(D^k\phi).\] Since all \(\phi\) are assumed to be \(C^\infty\), there is no problem with this formula, and this differentiation is defined for all \(\Lambda\). We don't care about the limit of difference quotients of a continuous but non-differentiable function. What matters here is the differentiation of test functions.
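As a numerical sanity check of the formula \(\int |x|'\phi = -\int |x|\phi'\) (a pure-Python sketch; the bump function, the shift \(0.3\) and the grid are my own choices), the integration-by-parts recipe identifies the generalised derivative of \(|x|\) with the sign function on each test function: \(-\int |x|\psi' = \int \operatorname{sign}(x)\psi\), which holds by ordinary integration by parts on each half-line.

```python
import math

def phi(x):
    # standard bump function supported on [-1, 1]
    return math.exp(1.0 / (x * x - 1.0)) if abs(x) < 1.0 else 0.0

def psi(x):
    # shifted, non-symmetric test function; support is [-0.7, 1.3]
    return phi(x - 0.3)

def dpsi(x, eps=1e-6):
    # central-difference derivative of the test function
    return (psi(x + eps) - psi(x - eps)) / (2 * eps)

h = 1e-3
xs = [k * h for k in range(-1500, 1501)]
lhs = -sum(abs(x) * dpsi(x) for x in xs) * h                 # -integral |x| psi'(x) dx
rhs = sum(math.copysign(1.0, x) * psi(x) for x in xs) * h    # integral sign(x) psi(x) dx
assert abs(lhs - rhs) < 1e-2
```

A symmetric test function would make both sides vanish trivially, which is why \(\psi\) is shifted off the origin.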

Try to recall what you have learnt about integration by parts. We have \[\int uv' = \int (uv)' - \int u'v\] because \[(uv)' = u'v+uv'.\] Therefore, if our generalisation of differentiation (though we do not know how to do yet) pays respect to integration by parts, then we can still work on product rule of differentiation, hence the usual formal rules of calculus would not go too far. If our extension conflicts with integration by parts, then the ordinary meaning of differentiation is damaged.

Let's sum up what has happened. We have obtained an inclusion \[C^\infty \subset C \subset \mathscr{D}'.\] Every distribution is infinitely differentiable because functions in \(\mathscr{D}\) are. If \(f \in C^\infty\), then the \(k\)-th derivative can be understood in both the sense of ordinary differentiation and the sense of distributions because it is given by \[\phi \mapsto (-1)^k\int f \phi^{(k)} = \int f^{(k)}\phi\quad \forall \phi \in \mathscr{D}. \] This is independent of the choice of \(\phi\). If \(h\) is a function such that \(\int h\phi = \int f^{(k)}\phi\) for all \(\phi \in \mathscr{D}\), then \(h=f^{(k)}\).

If \(f\) is merely continuous, still we can write the \(k\)-th derivative as \[\phi \mapsto (-1)^{k} \int f \phi^{(k)} \quad \forall \phi \in \mathscr{D}.\]

At this point, whether \(f\) is differentiable or not is not our concern. Since \(\phi\) is smooth, the formula above is well-defined. In general we don't even care whether \(f\) is continuous or even integrable, as long as it gives rise to a **bounded** linear functional, which can be guaranteed by being *locally integrable*. A function is locally integrable if \(\int_K |f|<\infty\) for all compact \(K \subset \mathbb{R}\). In particular, \(K\) can be taken to be any bounded closed interval. **As long as \(f\) is locally integrable (for example, differentiable, continuous, or simply bounded and measurable), we can assign a derivative in the new sense (via integration by parts).**

We want something like \((fg)'=f'g+fg'\). To avoid confusion we use \(D\) to denote the derivative of a distribution and \(f'\) to denote the derivative in the ordinary sense. In full generality this is pretty hard, but for the product of a \(C^\infty\) function and a distribution it is not that hard. Suppose \(\Lambda \in \mathscr{D}'\) and \(f \in C^\infty\). We define their 'product' by \[(f\Lambda)(\phi) = \Lambda(f\phi).\] This gives another distribution, and its derivative follows in a natural way: \[\begin{aligned} D(f\Lambda)(\phi) &=-(f\Lambda)(\phi') \\ &= -\Lambda(f\phi') \\\end{aligned}\] Meanwhile \[\begin{aligned}(f'\Lambda+fD\Lambda)(\phi) &= \Lambda(f'\phi)+D\Lambda(f\phi) \\ &= \Lambda(f'\phi)-\Lambda(f'\phi+f\phi') \\ &=-\Lambda(f\phi').\end{aligned}\] Hence \(D(f\Lambda)=f'\Lambda+fD\Lambda\): the product rule still works in this aspect.

We haven't verified convergence yet, but that requires much more knowledge of functional analysis, so we leave it to part 2. Fortunately, things go in an intuitive way.

Consider the linear functional on \(\mathscr{D}\) given by \[\delta(\phi)=\phi(0).\] This is bounded and is in fact our rigorous definition of the Dirac \(\delta\) function (von Neumann can relax then!). It does have the *required property*: if we realise this functional as integration (informally) as \[\delta(\phi)=\int \delta\phi=\phi(0) \quad \forall \phi \in \mathscr{D},\] then \(\delta\) can indeed be considered as a *function* whose support is the origin, and the integral over \(\mathbb{R}\) is \(1\).

The *derivative* of \(\delta\) is well-behaved as well. By definition \(D\delta(\phi)=-\delta(\phi')\), hence \[D\delta(\phi)=-\phi'(0).\]

So much for part 1. If you do not have much background in functional analysis, part 2 is not recommended, as you would have no idea what is going on. It is not feasible to make part 2 readable without the prerequisites.

Here we collect basic facts about test functions and distributions, assuming the reader has some background in functional analysis. No proofs are given, since otherwise this post could grow as long as I want. I hope that by organising the facts here I can help you see what is going on before you drown in the details of a proof. It is recommended to look at the table of contents on the right-hand side first if you are on a PC.

In brief, test functions are smooth functions with compact support. By the **support** of a function we mean the *closure* of the set \(\{x:f(x) \ne 0\}\). Let \(K\) be a compact set in \(\mathbb{R}\); then \(\mathscr{D}_K\) denotes the subspace of \(C^\infty\) consisting of functions whose support lies in \(K\). Since a closed subset of a compact set is itself compact, all functions in \(\mathscr{D}_K\) have compact support.

Test function space is defined by \[\mathscr{D} := \bigcup_{K \text{ compact}}\mathscr{D}_K.\] And the distribution space \(\mathscr{D}'\) is defined to be the dual space of \(\mathscr{D}\), i.e. the space of *continuous* linear functionals of \(\mathscr{D}\). But if we don't know the topology of \(\mathscr{D}\), we cannot proceed. *Here is how we attempt to establish the norm.*

Consider the norms, for \(N=0,1,2,\cdots\), given by \[\| \phi \|_N = \sup_{x \in \mathbb{R};\ n \le N}|D^n\phi|.\] These induce a local base \[V_N = \left\{ \phi \in \mathscr{D}_K:\|\phi\|_N \le \frac{1}{N} \right\} \quad (N=1,2,3,\cdots).\]

And we get a locally convex metrisable topology on \(\mathscr{D}\).

If this topology made \(\mathscr{D}\) a Banach space, it would be fantastic, since a lot of Banach space techniques could be used. However, this topology is too *small* to be complete. One simply needs to consider the sequence \[\psi_m(x)=\phi(x-1)+\frac{1}{2}\phi(x-2)+\cdots+\frac{1}{m}\phi(x-m)\] where \(\phi \in \mathscr{D}_{[0,1]}\) and \(\phi>0\) on \((0,1)\). This sequence is Cauchy, but the limit has unbounded support and hence does not lie in \(\mathscr{D}\).
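A numerical illustration of this failure (a sketch with a bump function and grid of my own choosing): the differences of the \(\psi_m\) become uniformly small, while the support keeps escaping to the right.

```python
import math

def bump(x):
    # a smooth bump supported on (0, 1)
    return math.exp(-1.0 / (x * (1.0 - x))) if 0.0 < x < 1.0 else 0.0

def psi(m, x):
    return sum(bump(x - j) / j for j in range(1, m + 1))

grid = [t / 1000.0 for t in range(1, 1000)]
bump_max = max(bump(t) for t in grid)

# psi_20 - psi_10 lives on (11, 21), where its terms have disjoint supports,
# so its sup norm is bump_max / 11: the tail differences shrink like 1/n
diff_sup = max(abs(psi(20, j + g) - psi(10, j + g)) for j in range(11, 21) for g in grid)

# yet the support of psi_m reaches out to m + 1, escaping every compact set
support_witness = psi(20, 20.5)
```

The sequence is Cauchy in the sup norms, but no test function (compact support!) can be its limit.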

This time we *enhance* the previous topology, making \(\mathscr{D}\) a locally convex topological vector space which is complete and has the Heine-Borel property (closed and bounded sets are compact and vice versa). We still need the topology defined in our first attempt. The construction has three steps:

- For each compact set \(K\), let \(\tau_K\) denote the subspace topology of \(\mathscr{D}\) defined in attempt 1.
- Let \(\beta\) be the collection of all convex balanced set \(W \subset \mathscr{D}\) such that \(\mathscr{D}_K \cap W \in \tau_K\) for all compact \(K\). (A set \(W\) is balanced if \(\alpha{W} \subset W\) for all \(|\alpha| \le 1\).)
- The new topology \(\tau\) is defined to be the collection of all unions of sets of the form \(\phi + W\) with \(\phi \in \mathscr{D}\) and \(W \in \beta\).

This is the topology we want, and one can indeed verify that \(\tau\) is a topology, with local base \(\beta\). This topology has the following properties:

- \(\tau\) makes \(\mathscr{D}\) a locally convex topological vector space.
- \(\mathscr{D}\) has the Heine-Borel property.
- In \(\mathscr{D}\), every Cauchy sequence converges.

Locally, **the topology of \(\mathscr{D}_K\) is the same as \(\tau_K\)**. Hence we can still use the properties of these norms when we want. In fact, \(\tau_K\) makes \(\mathscr{D}_K\) a Fréchet space, i.e. a complete, metrisable, locally convex topological vector space.

We cannot discuss continuity without a topology, and even then continuity has to be treated carefully. For example, the space \(L^p([0,1])\) with \(0<p<1\) is weird: its dual space is trivial, because in its topology the only open convex sets are the empty set and the space itself. Fortunately we have the following quite intuitive result.

Suppose \(\Lambda\) is a linear mapping of \(\mathscr{D}\) into a locally convex space \(Y\) (which can be \(\mathbb{R}\), \(\mathbb{C}\) or \(\mathscr{D}\) itself). Then the following are equivalent:

- \(\Lambda\) is continuous. (We care about the behaviour of \(\mathscr{D}'\))
- \(\Lambda\) is bounded. (You must have learnt the equivalence of 1 and 2 already)
- \(\phi_i \to 0\) in \(\mathscr{D}\) implies \(\Lambda\phi_i \to 0\) in \(Y\).
- The restriction of \(\Lambda\) to every \(\mathscr{D}_K\) is continuous.

In particular, it follows that the differential operator \(D^n\) is continuous for all \(n\). We also have some knowledge of the behaviour of \(\mathscr{D}'\) now:

If \(\Lambda\) is a linear functional on \(\mathscr{D}\), then the following are equivalent:

- \(\Lambda \in \mathscr{D}'\).
- To every compact set \(K\) there correspond a nonnegative integer \(N\) and a constant \(C<\infty\) such that the inequality \[|\Lambda\phi| \le C \|\phi\|_N\] holds for every \(\phi \in \mathscr{D}_K\).

Consider the Dirac distribution at \(x\) given by \[\delta_x(\phi)=\phi(x)\quad \phi \in \mathscr{D}.\] This is indeed a distribution; the case \(x=0\) gives the Dirac function in physics. Since \[\mathscr{D}_K = \bigcap_{x \in K^c}\ker\delta_x,\] \(\mathscr{D}_K\) is a **closed subspace** of \(\mathscr{D}\). Each \(\mathscr{D}_K\) is also nowhere dense, and there is a countable collection of compact sets \(K_i \subset \mathbb{R}\) (for example \(K_i=[-i,i]\)) such that \(\mathscr{D} = \bigcup_i \mathscr{D}_{K_i}\), so \(\mathscr{D}\) is of the first category in itself. Since \(\mathscr{D}\) is complete, Baire's category theorem implies that \(\mathscr{D}\) is not metrisable. This is a flaw of the topology of \(\mathscr{D}\), though not a troublesome one.

We have shown that every \(C^\infty\) function can be considered as a distribution. In general, for a function \(f\) one only needs to require that \(f\) is **locally integrable**, i.e. for every compact set \(K\) we have \[\int_K |f|<\infty.\] If we define \(\Lambda_f:\phi \mapsto \int f\phi\), we see \[|\Lambda_f(\phi)|\le \left( \int_K |f| \right)\sup|\phi|, \quad \phi \in \mathscr{D}_K.\]

In particular, at the very least, all \(L^1\) functions can be considered as distributions.

On the other hand, if \(\mu\) is a positive measure on \(\mathbb{R}\) with \(\mu(K)<\infty\) for all compact \(K\), then \[\Lambda_\mu:\phi \mapsto \int \phi \,\mathrm{d}\mu\] also defines a distribution.

We know that the fundamental theorem of calculus for \(L^1\) functions only holds when \(f\) is *absolutely continuous*. The Cantor function \(f\) is differentiable almost everywhere on \([0,1]\), yet \[\int_0^1 f'(x)\,\mathrm{d}x = 0, \quad f(1)-f(0)=1.\] The same restriction appears here. Pick \(f\) to be a left-continuous function of bounded variation. Then it can be shown that \[D\Lambda_f = \Lambda_\mu\] where \(\mu([a,b))=f(b)-f(a)\). Hence \(D\Lambda_f=\Lambda_{f'}\) if and only if \(f\) is *absolutely continuous*.
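The identity \(D\Lambda_f=\Lambda_\mu\) can be tested numerically for the Cantor function itself. The sketch below (depths, test function and tolerances are my own choices) compares \(-\int f\phi'\) with \(\int\phi\,\mathrm{d}\mu\), where \(\mu\) is the Cantor measure, approximated by its self-similar point masses.

```python
import math

def cantor(x, depth=40):
    # the Cantor function, evaluated via its self-similar structure
    c, scale = 0.0, 0.5
    for _ in range(depth):
        if x < 1/3:
            x *= 3
        elif x <= 2/3:
            return c + scale
        else:
            c, x = c + scale, 3 * x - 2
        scale /= 2
    return c

def phi(x):
    # smooth test function supported in (0.25, 0.75)
    u = 4.0 * (x - 0.5)
    return math.exp(-1.0 / (1.0 - u * u)) if abs(u) < 1 else 0.0

def dphi(x, h=1e-6):
    return (phi(x + h) - phi(x - h)) / (2 * h)

# (D Lambda_f)(phi) = -∫ f phi', by a midpoint sum on (0, 1)
n = 50_000
lhs = -sum(cantor((i + 0.5) / n) * dphi((i + 0.5) / n) for i in range(n)) / n

# Lambda_mu(phi) = ∫ phi d(mu): the Cantor measure as 2^14 equal point masses
pts = [0.0]
for _ in range(14):
    pts = [p / 3.0 for p in pts] + [(p + 2.0) / 3.0 for p in pts]
rhs = sum(phi(p) for p in pts) / len(pts)
```

The two values agree, while \(\int_0^1 f'\phi = 0\) would not: the distributional derivative sees the Cantor measure that the pointwise derivative misses.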

We consider the weak*-topology of \(\mathscr{D}'\), where \[\Lambda_i \to \Lambda \iff \lim_{i \to \infty}\Lambda_i\phi = \Lambda\phi \quad \forall \phi \in \mathscr{D}.\] Fortunately, this limit commutes with the differential operator in a natural way, which may remind you of uniform convergence. In fact, \[\Lambda_i \to \Lambda \implies \Lambda \in \mathscr{D}' \text{ and }D^k\Lambda_i \to D^k\Lambda \quad \forall k=1,2,\cdots.\] To prove this one needs the Banach-Steinhaus theorem. This concludes our four requirements of distributions.

Convolution plays an important role in Fourier analysis, and here is how to invite distribution to the party.

Normally for two \(L^1\) functions \(f,g\) we define \[(f \ast g)(x)=\int_\mathbb{R}f(y)g(x-y)\mathrm{d}y.\] We can create more symbols to make life easier:

- \(\tau_xu(y)=u(y-x)\).
- \(\check{u}(y)=u(-y)\).

It follows that \(\tau_x\check{u}(y)=\check{u}(y-x)=u(x-y)\). Hence \[(f \ast g)(x) = \int_\mathbb{R} f(y)(\tau_x\check{g})(y)\mathrm{d}y = \Lambda_f(\tau_x\check{g}).\] In other words, \((f \ast g)(x)\) is obtained by applying the functional \(\Lambda_f\) to \(\tau_x\check{g}\). But \(\Lambda_f\) can be replaced by any distribution, hence we define the convolution of a distribution and a smooth function by \[(L \ast \phi)(x) = L(\tau_x\check{\phi}), \quad L \in \mathscr{D}', \phi \in \mathscr{D}.\] Convolution can be characterised in a natural way: for any continuous linear \(T:\mathscr{D} \to C^\infty\), if \[\tau_x T = T\tau_x \quad \forall x,\] then there is a unique \(L \in \mathscr{D}'\) such that \[T\phi = L\ast \phi.\] As you can imagine, this setting creates a lot of potential for the Fourier transform.
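The definition \((L\ast\phi)(x)=L(\tau_x\check{\phi})\) is easy to exercise for \(L=\delta\), where it should give \(\delta\ast\phi=\phi\). A minimal sketch (all names mine):

```python
import math

def delta(phi):
    return phi(0.0)

def check(u):
    # ǔ(y) = u(-y)
    return lambda y: u(-y)

def tau(x, u):
    # (τ_x u)(y) = u(y - x)
    return lambda y: u(y - x)

def conv(L, phi):
    # (L * φ)(x) = L(τ_x φ̌)
    return lambda x: L(tau(x, check(phi)))

phi = lambda t: math.exp(-t * t)
# (δ*φ)(x) = (τ_x φ̌)(0) = φ̌(-x) = φ(x), so δ acts as the identity
vals = [(conv(delta, phi)(x), phi(x)) for x in [-1.0, 0.0, 0.5, 2.0]]
```

This is the distributional version of the familiar fact that \(\delta\) is the unit for convolution.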

- Walter Rudin, *Functional Analysis*, Second Edition. (Part II of the book)
- Peter Lax, *Functional Analysis*. (Appendix B)
- Stanford Encyclopedia of Philosophy Archive (Fall 2018 Edition), Quantum Theory: von Neumann vs. Dirac.

Let us say you are a programmer who has worked at big companies for a decade. How does it feel when you want to help someone who is studying programming from scratch? You may find it baffling that he or she cannot understand how, by copying several lines of code from the book, they have successfully made a programme that prints "Hello, world!" on the screen. You know what I am talking about: the curse of knowledge.

When one has successfully learnt a certain skill, one may immediately lose the sense of why other people cannot understand it. What is the holdup? It becomes increasingly difficult to teach beginners, and blunt simplification does not always do the trick.

This is one of the reasons why becoming a good teacher is so hard. Academic superstars may be awful at teaching, while teaching superstars may have long ceased focusing on academia.

I am not writing this post as a guru offering steps to lift the curse. In fact I think I suffer from it as well.

For example, Tien-Yien Li was a famous lifter of the curse of knowledge. When he gave talks, he always tried to start from simple examples (which is adorable, of course). When instructing his students, he would ask them to treat him as a fool, as if he knew nothing. He was indeed a good mathematician and a good maths teacher, but I do wonder how practical this is. Can his students do calculus in front of him while assuming he has no idea what calculus is? I have no idea.

Though I am only guessing, I think 'fool' is somewhat of an exaggeration. His students worked in fields similar to his, so it would not have been hard for him to follow them at all. Of course, the way he instructed his students is adorable as well.

A reader once emailed me, suggesting that I should write my posts more simply at certain points. But I declined the suggestion in the end. Am I doing a Serge Lang thing? I have no idea.

In his 1983 book *Fundamentals of Diophantine Geometry*, Lang included L. J. Mordell's review of his earlier book *Diophantine Geometry*, which ended with:

In conclusion, the reader will need no convincing that Lang, as has already been said, is a very learned mathematician, thoroughly familiar with every aspect of the topics he deals with, and their developments. His interesting and valuable historical notes give further evidence of this. Lang assumes that his readers are as knowledgeable as he is, and can grapple with the subject with the same ease that he does. Even if they could, Lang's style is not such as to make matters easy for them. Lang in writing is not a follower of Gauss, whose motto was "pauca sed matura." Further thought and care about his book, before publication, would have been well worth while. Those who can understand the book will be indebted to him for having brought together in one volume the important results contained in it. How much greater thanks would he have earned if the book had been written in such a way that more of it could have been more easily comprehended by a larger class of readers! It is to be hoped that someone will undertake the task of writing such a book.

He also included his response:

All my books are meant to be understood by readers having the prerequisites for the level at which the books are written.

These prerequisites vary from book to book, depending on the subject matter, my mood, and other aesthetic feelings which I have at the moment of writing. When I write a standard text in Algebra, I attempt something very different from writing a book which for the first time gives a systematic point of view on the relations of Diophantine equations and the advanced contexts of algebraic geometry. The purpose of the latter is to jazz things up as much as possible. The purpose of the former is to educate someone in the first steps which might eventually culminate in his knowing the jazz too, if his tastes allow him that path. And if his tastes don't, then my blessings to him also. This is known as aesthetic tolerance. But just as a composer of music (be it Bach or the Beatles), I have to take my responsibility as to what I consider to be beautiful, and write my books accordingly, not just with the intent of pleasing one segment of the population. Let pleasure then fall where it may. With best regards, Serge Lang.

*Refer to this reddit post for a discussion.*

I can say with absolute certainty that my posts are much more detailed than Serge Lang's, and Lang never tried to lift the curse. But my posts cannot be readable to everyone either. My posts on functional analysis, say, are not prepared for middle school students, unless they are ridiculously exceptional and have already studied all the prerequisites (linear algebra, real analysis, integration theory, topology). Though I shall never make my posts as terse as Lang's books, it is not my duty to make them readable for everyone. So to some extent I fail as well.

If I tried to, I would have to accept over-simplification, and that is against my rule. I do not like over-simplification, so I try to make sure everything makes sense. But one will not understand without certain prerequisites. I may remove some obstacles and point out the clues, but that is as far as it goes. I can only lift the curse with respect to a certain group of people.

It seems I have not given a thorough discussion. But I do hope my inbox brings me good chances for discussion rather than chances to spark unnecessary conflict. I have not tried to close myself off, and good evidence is that many of my posts can be found on the first page of a Google search.

Throughout we consider the polynomial ring \[R=\mathbb{R}[\cos{x},\sin{x}].\] This ring has a lot of non-trivial properties which give us a good chance to study commutative ring theory.

First of all, it is immediate that \[R \cong \mathbb{R}[X,Y]/(X^2+Y^2-1)\] where the map is given by \(X \mapsto \cos x\) and \(Y \mapsto \sin x\). Besides, in \(R\) we have \[\sin{x}\cdot\sin{x}=\sin^2x=(1-\cos{x})(1+\cos{x}),\] which suggests asking whether \(R\) is a UFD. In fact it is not, because the ideal class group is \(\mathbb{Z}/2\mathbb{Z}\), and a Dedekind domain is a UFD if and only if its ideal class group is trivial (corollary 3.22).
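A pointwise sanity check of the identity behind these two factorisations (a trivial sketch; the sample points are arbitrary):

```python
import math

# sin(x)·sin(x) and (1 - cos(x))(1 + cos(x)) define the same element of R,
# so they agree at every point
max_err = max(abs(math.sin(x) ** 2 - (1 - math.cos(x)) * (1 + math.cos(x)))
              for x in [0.3, 1.2, 2.5, 4.0])
```

Of course the interesting content is not the identity itself but that, as shown later, the two factorisations are genuinely inequivalent in \(R\).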

This blog post is inspired by an exercise in Serge Lang's *Algebra*. While writing it, I ran into some paywalls, and it would be absurd of me to direct a random reader to them. So it is very likely that I will include as many proofs as possible (when there is an absurd paywall; chances are I will rework them for readability). But I cannot remove the assumption that the reader has finished the whole of Atiyah-MacDonald, or an equivalent, at the very least. I will add more topics in the future, but that is not an easy job.

By Hilbert's basis theorem, \(\mathbb{R}[\cos{x},\sin{x}]\) is Noetherian because it is a finitely generated \(\mathbb{R}\)-algebra. Next we are interested in its normality. Since \(\mathbb{R}[X,Y]/(X^2+Y^2-1) \cong \mathbb{R}[X][Y]/(Y^2-(1-X^2))\), \(2\) is a unit, and \(1-X^2\) is square-free but not a unit, we can apply the following lemma to show that \(R\) is a normal Noetherian ring (integrally closed in its field of fractions). For the definition and properties of normal rings, refer to the Stacks Project.

(Lemma 1) Let \(A\) be a factorial ring with field of fractions \(K\) in which \(2\) is a unit, and let \(a \in A\) be a square-free element (i.e., if \(p\) is a prime element of \(A\), then \(a \not\in p^2A\)) which is not a unit. Then \(A[T]/(T^2-a)\) is normal.

*Proof.* Let \(L=K[T]/(T^2-a)\), and let \(t\) be the image of \(T\) in \(A[T]/(T^2-a)\) and in \(L\). Then it is clear that \(A[t] \cong A[T]/(T^2-a)\) and we can write \(L=K(t)\). Every element of \(L\) can be written uniquely as a sum \(r+st\) with \(r,s \in K\). To prove that \(A[t]\) is integrally closed in \(L\), we need the minimal polynomial of \(r+st\).

Next we show when an element of \(L\) is integral over \(A[t]\). Since \(t^2=a\), \[\left[(r+st)-r\right]^2=(st)^2 = s^2t^2 = as^2.\] Hence \(f(X)=(X-r)^2-as^2\) sends \(r+st\) to \(0\). When \(s \ne 0\), no polynomial of degree \(1\) over \(K\) can have \(r+st\) as a root, since \(st \notin K\); hence \(f(X)\) is the minimal polynomial of \(r+st\) over \(K\). Now suppose \(r+st\) is integral over \(A[t]\). Since \(A[t]\) is integral over \(A\), the element \(r+st\) is integral over \(A\); and since the factorial ring \(A\) is integrally closed, the coefficients of the minimal polynomial lie in \(A\), that is, \(-2r \in A\) and \(r^2-as^2 \in A\). We need to show this implies \(r+st \in A[t]\); since \(A\) can be considered a subring of \(A[t]\), it suffices to show that \(r,s \in A\) when \(s \ne 0\).

Since \(2\) is a unit in \(A\), \(-2r \in A\) clearly implies \(r \in A\), and then \(r^2-as^2 \in A\) gives \(as^2 \in A\). It remains to prove that \(s \in A\). For \(s \in K\), write \(s=s_1/s_2\) with \(s_1,s_2 \in A\) relatively prime. We shall show that \(s_2\) is always a unit, which implies \(s \in A\). Write \(as^2=h \in A\); then \(as_1^2=hs_2^2\). Assume \(s_2\) is not a unit; then some prime \(p\) divides \(s_2\), as \(A\) is a factorial ring. Hence \(as_1^2 = hs_2^2 \in p^2A\). Since \(s_1\) and \(s_2\) are relatively prime, \(p\) and \(p^2\) do not divide \(s_1\), hence \(a \in p^2A\), a contradiction (we have assumed \(a\) to be square-free; the assumption that \(a\) is not a unit is also used to reach the contradiction). Hence \(s_2\) is a unit and \(s \in A\). The proof is complete. \(\square\)

Of course, I shan't be this lazy; let us check the hypotheses. In the factorial ring \(A=\mathbb{R}[X]\), \(2\) is clearly a unit. By square-free we mean: if \(p \in A\) is prime, then \(a \not\in p^2A\). For example, in \(\mathbb{Z}\), \(12\) is not square-free because \(12=2^2 \times 3 \in 2^2\mathbb{Z}\), while \(14\) is square-free because \(14=2 \times 7\) and no square appears. And for \(1-X^2\) things are clear because \(1-X^2=(1-X)(1+X)\): there is no square. We require \(2\) to be a unit because otherwise the argument becomes much more difficult. We shall return to normality after we study the irreducible elements.

To conclude we have got a satisfying result:

(Proposition 1)\(R\) is a normal Noetherian ring.

With the help of elementary trigonometric identities (or the Fourier transform), every polynomial in \(R=\mathbb{R}[\cos{x},\sin{x}]\) can be written in the form \[P(x) = a_0+\sum_{k=1}^{n}(a_k\cos{kx}+b_k\sin{kx})\] where \(a_0,a_k,b_k \in \mathbb{R}\). Define the degree \(\delta(P)\) to be the largest \(k\) such that \(a_k \ne 0\) or \(b_k \ne 0\). A direct computation shows that \(\delta(PQ)=\delta(P)+\delta(Q)\). This fails when the scalars are complex: for example, \((\cos{x}+i\sin{x})(\cos{x}-i\sin{x})=1\).
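The degree \(\delta\) can be read off numerically from Fourier coefficients. The sketch below (a DFT-based estimate with sample counts and cutoffs of my own choosing, not anything from the text) checks \(\delta(PQ)=\delta(P)+\delta(Q)\) on one example:

```python
import math, cmath

def trig_degree(P, max_deg=8, samples=256):
    # estimate δ(P): the largest frequency k with a nonzero Fourier coefficient
    deg = 0
    for k in range(max_deg + 1):
        c = sum(P(2 * math.pi * j / samples) * cmath.exp(-1j * 2 * math.pi * k * j / samples)
                for j in range(samples)) / samples
        if abs(c) > 1e-9:
            deg = k
    return deg

P = lambda x: 1 + 2 * math.sin(x)             # δ(P) = 1
Q = lambda x: math.cos(2 * x) - math.sin(x)   # δ(Q) = 2
dP, dQ, dPQ = trig_degree(P), trig_degree(Q), trig_degree(lambda x: P(x) * Q(x))
```

For real trigonometric polynomials it suffices to scan nonnegative frequencies, since the negative coefficients are complex conjugates of the positive ones.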

If \(\delta(P)=0\), then \(P(x)=a_0\) is zero or a unit. If \(\delta(P)=1\) and \(P=P_1P_2\), then \(\delta(P_1)+\delta(P_2)=1\), so one of the factors has degree \(0\) and is a unit; hence \(P\) is irreducible. If \(\delta(P)=2\), then \(P\) is reducible, because we can solve the equations arising from expanding the product \[(a+b\sin{x}+c\cos{x})(a'+b'\sin{x}+c'\cos{x}).\] By induction, every polynomial of degree \(\ge 2\) is reducible. Hence the irreducible elements are exactly those of the form \[a+b\sin{x}+c\cos{x} \quad (b,c) \ne (0,0).\] But since \(R\) is not a UFD, we cannot work with the ideal \((a+b\sin{x}+c\cos{x})\) directly. We need to dive into abstraction for a while.

We now proceed to another satisfying result.

(Proposition 2)\(R\) is a Dedekind domain.

*Proof.* Throughout, we work with the presentation \(R \cong \mathbb{R}[X,Y]/(X^2+Y^2-1)\). Since \(\mathbb{R}[X,Y]\) has Krull dimension \(2\) (see Atiyah-MacDonald exercise 11.7, where a solution is almost given) and \(X^2+Y^2-1\) is irreducible, \((X^2+Y^2-1)\) is a prime ideal, and every prime ideal \(P \subset \mathbb{R}[X,Y]\) strictly containing \((X^2+Y^2-1)\) is maximal. Next, let the canonical map \(\pi:\mathbb{R}[X,Y] \to \mathbb{R}[X,Y]/(X^2+Y^2-1)\) be given. By proposition 1.1 of Atiyah-MacDonald, \(\pi(P)\) is a maximal ideal of \(\mathbb{R}[X,Y]/(X^2+Y^2-1)\) whenever \(P \supsetneq (X^2+Y^2-1)\) is prime. Conversely, if a nontrivial ideal \(Q \subset \mathbb{R}[X,Y]/(X^2+Y^2-1)\) is prime, then \(\pi^{-1}(Q)=Q^c\) is also prime and contains \((X^2+Y^2-1)\) strictly, which implies that \(Q\) is maximal. Hence \(R\) has Krull dimension \(1\). By proposition 1, \(R\) is integrally closed, hence it is Dedekind. \(\square\)

Let \(A\) be an integral domain and \(P\) the set of all prime ideals of height \(1\), i.e. the nonzero prime ideals containing no nonzero prime ideal other than themselves. Then \(A\) is a Krull domain if:

- (KD1) \(A_{\mathfrak{p}}\) is a discrete valuation ring for all \(\mathfrak{p} \in P\).
- (KD2) \(A\) is the intersection of these discrete valuation rings (all considered as subrings of the field of fractions of \(A\)).
- (KD3) Any nonzero element of \(A\) is contained in only a finite number of height \(1\) prime ideals.

To proceed our study of \(R\), we need a lemma:

(Lemma 2)If \(A\) is a Dedekind domain, then \(A\) is also a Krull domain.

*Proof.* (KD1) is a standard property of Dedekind domains: the localisation at any nonzero prime ideal is a discrete valuation ring. Next we prove (KD3). Pick any nonzero \(a \in A\). If \(a\) is a unit, it is contained in no prime ideal. If not, consider the ideal \((a)=aA\). We have a unique factorisation as a product of prime ideals: \[ (a)= \mathfrak{p}_1^{r_1}\cdots\mathfrak{p}_n^{r_n} \subset \bigcap_{j=1}^{n}\mathfrak{p}_j,\] and any prime ideal containing \(a\) must contain, hence (having height \(1\)) equal, one of the \(\mathfrak{p}_j\). Hence (KD3) is proved.

For (KD2), note first \(A \subset \bigcap_{\mathfrak{p}}A_{\mathfrak{p}}\) because the natural map \(A \to A_{\mathfrak{p}}\) is injective for all \(\mathfrak{p}\). Hence it suffices to prove the reverse inclusion. Elements of \(A_{\mathfrak{p}}\) are of the form \(a/s\), so it suffices to prove that, for all nonzero \(a,b \in A\), if \(b \in aA_{\mathfrak{p}}\) for every prime \(\mathfrak{p}\) of height \(1\), then \(b \in aA\). Put \[ (a)=\mathfrak{p}_1^{r_1}\cdots\mathfrak{p}_n^{r_n};\] then \(\mathfrak{q}_i = \mathfrak{p}_i^{r_i}\) is \(\mathfrak{p}_i\)-primary and we obtain a primary decomposition of \((a)\). In particular, \[ b \in \bigcap_{i=1}^{n}\left(aA_{\mathfrak{p}_i} \cap A \right) = \bigcap_{i=1}^{n}\mathfrak{q}_i = aA\] because each \(\mathfrak{p}_i\) has height \(1\). \(\square\)

Which is to say that

(Proposition 3)\(R\) is a Krull domain.

We know that since \(R\) is Dedekind, its fractional ideals form an abelian group. This gives rise to the ideal class group. By a result of Samuel, we have a shockingly simple fact:

(Proposition 4)The ideal class group \(Cl(R) \cong \mathbb{Z}/2\mathbb{Z}\).

This can be considered a corollary of the following statement:

(Samuel)Let \(F\) be a non-degenerate quadratic form in \(k[X_1,X_2,X_3]\). Let \(A_F=k[X_1,X_2,X_3]/(F)\). Then \(Cl(A_F)=\mathbb{Z}/2\mathbb{Z}\) if and only if there is a nontrivial solution to \(F(X_1,X_2,X_3)=0\) in \(k\).

One can find this result via this link, and refer to **study of plane conics**.

With these being said, by theorem 8 of Zaks' paper, one sees that \(R\) is a half-factorial domain (HFD). To be precise, if irreducible elements \(x_1,x_2,\cdots,x_n\) and \(y_1,y_2,\cdots,y_m\) satisfy \(x_1x_2\cdots x_n=y_1y_2\cdots y_m\), then \(m=n\). I may reproduce the proof here one day, but it would be much more difficult than everything you have seen here. This ring \(R\) also shows that an HFD is not necessarily a UFD.

Since \(Cl(R) \cong \mathbb{Z}/2\mathbb{Z}\), for any maximal ideal \(M \subset R\), either \(M\) is principal or \(M^2\) is principal. If \(M\) and \(M'\) are two non-principal maximal ideals, then \(MM'\) is principal. Conversely, for any irreducible \(z \in R\), either \((z)\) is maximal or \((z)=MM'\) for maximal ideals \(M\) and \(M'\), which may coincide. We have seen that the irreducible elements have the form \[z = a+b\sin{x}+c\cos{x},\quad (b,c) \ne (0,0).\] So we are now interested in these \(a,b,c\). We will do a high school trick first. If we put \[\begin{cases}k = \frac{a}{\sqrt{b^2+c^2}} \\b' = \frac{b}{\sqrt{b^2+c^2}} \\c' = \frac{c}{\sqrt{b^2+c^2}}\end{cases}\] then \(z= \sqrt{b^2+c^2}(\sin(x+\alpha)+k)\) where \(b'=\cos\alpha\) and \(c' = \sin\alpha\). Since \(\sqrt{b^2+c^2} \in \mathbb{R}\) is a unit, it suffices to study elements of the form \(\sin(x+\alpha)+k\).
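The high school trick can be verified numerically; in the sketch below \(\alpha=\operatorname{atan2}(c,b)\), so that \(\cos\alpha=b'\) and \(\sin\alpha=c'\) (the concrete values of \(a,b,c\) and the sample points are arbitrary):

```python
import math

a, b, c = 2.0, 3.0, 4.0
r = math.hypot(b, c)          # sqrt(b^2 + c^2)
alpha = math.atan2(c, b)      # cos(alpha) = b/r, sin(alpha) = c/r
k = a / r

# a + b sin(x) + c cos(x) = r (sin(x + alpha) + k) pointwise
err = max(abs((a + b * math.sin(x) + c * math.cos(x)) - r * (math.sin(x + alpha) + k))
          for x in [0.0, 0.7, 2.1, 5.0])
```

This is just the angle-addition formula \(b\sin x + c\cos x = r\sin(x+\alpha)\) in disguise.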

Define a shift morphism \(h:R \to R\) by \[h(\cos{x})=\cos(x+\alpha), \quad h(\sin{x}) = \sin(x+\alpha), \quad h(t) = t.\] This map is clearly an isomorphism. More importantly, since \[h(\sin{x}+k)=\sin(x+\alpha)+k,\] the primary decompositions of \((\sin(x+\alpha)+k)\) and \((\sin{x}+k)\) have the same form. We are interested in the ring \(R/(\sin{x}+k)\), where it is natural to study the behaviour of \(\cos{x}\). For this reason we consider the substitution morphism \[\begin{aligned}g:\mathbb{R}[X] & \to R \\ X & \mapsto \cos{x}.\end{aligned}\]

We first compute the inverse image \(g^{-1}[(\sin{x}+k)]\). It is natural to try to convert \(\sin x\) into \(\cos x\). Since \((\sin x + k)( \sin x - k) = \sin^2x -k^2 = 1- \cos^2x-k^2\), for any \(P(X) \in (1-k^2-X^2)\), say \(P(X)=(1-k^2-X^2)Q(X)\), we have \[g(P(X))=P(\cos{x}) = (1-\cos^2x-k^2)Q(\cos{x})=(\sin{x}+k)(\sin{x}-k)Q(\cos{x}).\] Hence \((1-k^2-X^2)\subset g^{-1}[(\sin{x}+k)]\).

For the converse, note that if \(P \in g^{-1}[(\sin x + k)]\) is nonzero, then \(\deg P > 1\), because a trigonometric polynomial of the form \(a+b\cos x\) can never be divisible by \(\sin x + k\). By the Euclidean algorithm, we find \(Q(X)\), \(R(X)\) such that \[P(X)=Q(X)(1-k^2-X^2) + R(X)\] with \(\deg R \le 1\). But since \(P \in g^{-1}[(\sin x + k)]\), we must have \(R(X)=0\), by our study of the degree above. Hence \(P(X) \in (1-k^2-X^2)\), which is to say \[g^{-1}[(\sin x + k)]= (1-k^2-X^2).\] This induces an isomorphism \[\mathbb{R}[X]/(1-k^2-X^2) \cong R/(k+\sin x).\] And it is much easier to study the ideal \((1-k^2-X^2)\). To be precise,

- \(k^2=1 \iff (1-k^2-X^2)=(X)^2 \iff (k+\sin x)=M^2\) for some maximal ideal \(M\), because \((X)\) is a maximal ideal.
- \(k^2<1 \iff (1-k^2-X^2)\) is a product of two distinct maximal ideals \(\iff (k+\sin x)\) is a product of two distinct maximal ideals \(M\) and \(M'\).
- \(k^2>1 \iff (1-k^2-X^2)\) is maximal \(\iff\) \((k+\sin x)\) is maximal.
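The three cases are decided by the sign of \(1-k^2\), i.e. by whether \(X^2 = 1-k^2\) has two, one, or no real solutions. A toy restatement (the labels are informal paraphrases of mine):

```python
def classify(k):
    # how (1 - k^2 - X^2) sits inside R[X]; note 1 - k^2 - X^2 = -(X^2 - d)
    d = 1.0 - k * k
    if d > 0:
        return "product of two distinct maximal ideals"   # real roots ±sqrt(d)
    if d == 0:
        return "square of a maximal ideal"                # double root at 0
    return "maximal"                                      # irreducible, quotient is C
```

For instance `classify(0.5)`, `classify(1.0)` and `classify(2.0)` land in the three cases listed above, in order.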

Therefore the maximal ideals of \(R\) are determined by \(k\), or more precisely by the relation between \(a^2\) and \(b^2+c^2\). Moreover, let \(M\) be a maximal ideal; then

- If \(M\) is principal, then there exist \(\alpha\) and \(k\) such that

\[M = (\sin(x+\alpha) + k)\]

and \(R/M \cong \mathbb{C}\).

- If \(M\) is not principal, then there exists \(\alpha \in \mathbb{R}\) such that

\[M = (\sin(x+\alpha)+1,\cos(x+\alpha)), \quad M^2 = (\sin(x+\alpha)+1).\]

and \(R/M \cong \mathbb{R}\).

- Robert M. Fossum, *The Divisor Class Group of a Krull Domain*.
- M. F. Atiyah & I. G. MacDonald, *Introduction to Commutative Algebra*.
- Marco Fontana, Salah-Eddine Kabbaj, Sylvia Wiegand, *Commutative Ring Theory and Applications*.
- Hideyuki Matsumura, *Commutative Ring Theory*.
- P. Samuel, *Lectures on Unique Factorization Domains*.
- A. Zaks, *Half Factorial Domains*.

Consider a sequence of real or complex numbers \(\{s_n\}\). If \(s_n \to s\), then \[\pi_n = \frac{s_1+\cdots+s_n}{n} \to s.\]

Here, \(\pi_n\) is called the Cesàro sum of \(\{s_n\}\). The proof is rather simple. Given \(\varepsilon>0\), there exists some \(N>0\) such that \(|s_n-s|<\varepsilon\) for all \(n > N\). Therefore we can write \[\begin{aligned} |\pi_n - s| &= \left|\frac{s_1+s_2+\cdots+s_N}{n}+\frac{s_{N+1}+\cdots+s_n}{n}-s\right| \\ &= \left|\frac{(s_1-s)+(s_2-s)+\cdots+(s_N-s)}{n}+\frac{(s_{N+1}-s)+\cdots+(s_n-s)}{n}\right| \\ &\leq \left| \frac{s_1+\cdots+s_N-Ns}{n} \right| + \frac{n-N}{n}\varepsilon \\ &\leq \left| \frac{s_1+\cdots+s_N-Ns}{n} \right| + \varepsilon.\end{aligned}\] For fixed \(N\), we can pick \(n\) big enough such that \[\left| \frac{s_1+\cdots+s_N-Ns}{n} \right|<\varepsilon,\] so \(|\pi_n-s|<2\varepsilon\) for all large \(n\). Hence \(\pi_n\) converges to \(s\). But the converse is not true in general. For example, if we put \(s_n=(-1)^n\), then \(\{s_n\}\) diverges but \(\pi_n \to 0\). If \(\pi_n\) converges, we say \(\{s_n\}\) is Cesàro summable.
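The example is easy to replay in code (a minimal sketch; the cutoff of 10000 terms is an arbitrary choice of mine):

```python
def cesaro_means(seq):
    # running arithmetic means: pi_n = (s_1 + ... + s_n) / n
    means, total = [], 0.0
    for n, s in enumerate(seq, start=1):
        total += s
        means.append(total / n)
    return means

# s_n = (-1)^n diverges, but its Cesàro means tend to 0
pi = cesaro_means([(-1) ** n for n in range(1, 10001)])
```

The means oscillate between \(0\) and \(-1/n\), so they converge to \(0\) even though the sequence itself has no limit.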

If we treat \(\pi_n\) as an integral with respect to the counting measure, things become interesting. Why not investigate the operator defined by \[C(f)(x)= \frac{1}{x}\int_0^xf(t)\,dt\,?\] In this blog post we investigate this operator on the Hilbert space \(L^2(0,\infty)\).

Put \(L^2=L^2(0,\infty)\) relative to the Lebesgue measure, and define the Cesàro operator \(C\) as follows: \[(Cf)(s) = \frac{1}{s}\int_0^sf(t)\,dt.\]

From the example above, we shouldn't expect \(C\) to be too well-behaved, as convergence does not go as one might expect. But fortunately it is at the very least continuous: by Hardy's inequality, we have \(\lVert C \rVert = 2\). I have organised several proofs of this. But \(C\) is not compact.
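Hardy's inequality \(\|Cf\|\le 2\|f\|\) can be probed numerically on a crude discretisation (a sketch only: random step functions on a truncated interval will not approach the sharp constant \(2\); they merely stay below it):

```python
import random

random.seed(0)
n, L = 10_000, 10.0          # grid for a step function on (0, L)
dx = L / n
f = [random.uniform(-1.0, 1.0) for _ in range(n)]

# (Cf)(x) = (1/x) * integral of f over (0, x), evaluated at cell midpoints
cf, run = [], 0.0
for i, v in enumerate(f):
    x = (i + 0.5) * dx
    cf.append((run + 0.5 * v * dx) / x)
    run += v * dx

norm = lambda g: (sum(t * t for t in g) * dx) ** 0.5
ratio = norm(cf) / norm(f)   # Hardy: this should stay below 2
```

Truncating at \(L\) only discards a positive tail of \(\|Cf\|\), so the inequality is respected by the discretised check.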

Here is the proof. Consider the family of functions \(\{\varphi_A\}_{A>0}\) where \[\varphi_A = \sqrt{A}\chi_{(0,1/A]}.\] (I owe Oliver Diaz for this family of functions.) It's not hard to show that \(\lVert \varphi_A \rVert = 1\). If we apply \(C\) to it we see \[(C\varphi_A)(x) = \frac{1}{x}\int_0^x\sqrt{A}\chi_{(0,1/A]}(t)\,dt = \sqrt{A}\left(\chi_{(0,1/A]}(x)+\frac{1}{Ax}\chi_{(1/A,+\infty)}(x)\right).\] Hence \(\lVert C\varphi_A \rVert = \sqrt{2}\) for every \(A\). Meanwhile for \(B>A\), we have \[\begin{aligned}C(\varphi_B-\varphi_A)(x) &=\left(\sqrt{B}-\sqrt{A} \right)\chi_{(0,1/B]}(x)+\left(\frac{1}{\sqrt{B}x}-\sqrt{A}\right)\chi_{(1/B,1/A]}(x) \\ &+\left(\frac{1}{\sqrt{B}} - \frac{1}{\sqrt{A}} \right)\frac{1}{x}\chi_{(1/A,+\infty)}(x).\end{aligned}\] It follows that \[|C(\varphi_B-\varphi_A)|(x) \geq \left(\frac{1}{\sqrt{A}}-\frac{1}{\sqrt{B}} \right)\frac{1}{x}\chi_{(1/A,\infty)}(x).\] If we compute the norm of the right-hand side we get \[\|C(\varphi_B-\varphi_A)\| \geq \left|1-\sqrt{\frac{A}{B}} \right|.\] As a result, if we pick \(f_n=\varphi_{2^n}\), then for any \(m>n\) we get \[\|C(f_m-f_n)\| \geq \left|1 - \sqrt{2^{n-m}} \right| \geq 1-\frac{1}{\sqrt{2}}.\] Therefore, we have found a sequence \((f_n)\) in the unit ball such that \((Cf_n)\) has no convergent subsequence.

Also we can find its adjoint operator: \[\begin{aligned}\langle Cf,g \rangle &= \int_0^\infty \left(\frac{1}{s}\int_0^sf(t)dt \right)\overline{g}(s)ds \\ &= \int_0^\infty\left(\int_t^\infty \frac{1}{s}f(t)\overline{g}(s)ds\right)dt \\ &= \int_0^\infty f(t) \left(\int_t^{\infty}\frac{1}{s}\overline{g}(s)ds\right)dt.\end{aligned}\] Hence the adjoint is given by \[(C^\ast g)(t) = \int_t^{\infty}\frac{1}{s}g(s)\,ds.\] \(C^\ast\) is not compact either. Further, another application of Fubini's theorem shows that \[CC^\ast = C + C^\ast=C^\ast C \implies (I-C)(I-C^\ast)=I=(I-C^\ast)(I-C).\] Hence \(I-C\) is unitary, with inverse \(I-C^\ast\), and \(C\) is normal.
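The duality \(\langle Cf,g\rangle=\langle f,C^\ast g\rangle\) can be checked numerically; for \(f(t)=e^{-t}\) and \(g(t)=te^{-t}\) both pairings evaluate to \(1/2\). A sketch (the grid and truncation are my own choices):

```python
import math

n, L = 20_000, 50.0
dx = L / n
xs = [(i + 0.5) * dx for i in range(n)]
f = [math.exp(-x) for x in xs]
g = [x * math.exp(-x) for x in xs]

# (Cf)(x) = (1/x) * integral of f over (0, x)
pre, run = [], 0.0
for x, v in zip(xs, f):
    pre.append((run + 0.5 * v * dx) / x)
    run += v * dx

# (C*g)(t) = integral of g(s)/s over (t, infinity), truncated at L
suf, tail = [0.0] * n, 0.0
for i in range(n - 1, -1, -1):
    tail += g[i] / xs[i] * dx
    suf[i] = tail - 0.5 * g[i] / xs[i] * dx

inner = lambda u, v: sum(a * b for a, b in zip(u, v)) * dx
ip_left, ip_right = inner(pre, g), inner(f, suf)   # <Cf, g> vs <f, C*g>
```

Both exponential weights decay fast enough that the truncation at \(L=50\) is harmless.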

In this section we study the spectrum of \(C\) and \(C^\ast\), which will be derived from properties of bilateral shift, which comes from \(\ell^2\) space. For convenience we write \(\mathbb{N}=\mathbb{Z}_{\geq 0}\). This section can also help you understand the connection between \(L^2(0,1)\) and \(L^2(0,\infty)\).

An operator \(U\) on a Hilbert space \(H\) is called a *simple unilateral shift* if \(H\) has an orthonormal basis \(\{e_n\}\) such that \(U(e_n)=e_{n+1}\) for all \(n \in \mathbb{N}\). This is nothing but the right-shift operator expressed in terms of a basis. Besides, we call \(U\) a *unilateral shift of multiplicity \(m\)* if \(U\) is a direct sum of \(m\) simple unilateral shifts (note: \(m\) can be any cardinal number, finite or infinite).

If we consider \(\mathbb{Z}\) in place of \(\mathbb{N}\), we obtain the definition of the *bilateral shift*. An operator \(W\) on \(K\) is called a *simple bilateral shift* if \(K\) has an orthonormal basis \(\{e_n\}_{n \in \mathbb{Z}}\) such that \(We_{n}=e_{n+1}\) for all \(n \in \mathbb{Z}\). Besides, if we consider the subspace \(H\) spanned by \(\{e_n\}_{n \geq 0}\), we see \(W|_H\) is simply a unilateral shift. Before we begin, we investigate some elementary properties of uni/bilateral shifts.

(Proposition 1)A simple unilateral shift \(U\) is an isometry.

*Proof.* Note \((Ue_m,Ue_n)=(e_{m+1},e_{n+1})=\delta_{m+1,n+1}=\delta_{mn}=(e_m,e_n)\). \(\square\)

(Proposition 2)A simple bilateral shift \(W\) is unitary, hence is also an isometry.

*Proof.* Note \((We_m,e_n)=(e_{m+1},e_n)=\delta_{m+1,n}=\delta_{m,n-1}=(e_m,W^{-1}e_n)\), from which it follows that \(W^\ast=W^{-1}\). \(\square\)

Now let the Hilbert space \(K\) and its subspace \(H\) (invariant under \(W\)) be given. Consider the operator given by \(Re_n=e_{-(n+1)}\). It follows that \(R\) is a unitary involution and \[Re_0=W^{-1}e_0, \quad RH = H^{\perp}, \quad R \circ W = W^{-1} \circ R.\]
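Since \(W\) and \(R\) act on the basis by the index maps \(n \mapsto n+1\) and \(n \mapsto -(n+1)\), the displayed relations reduce to integer identities, which a few lines of code can confirm:

```python
# e_n is represented by the integer n
W = lambda n: n + 1            # W e_n = e_{n+1}
Winv = lambda n: n - 1         # W^{-1} e_n = e_{n-1}
R = lambda n: -(n + 1)         # R e_n = e_{-(n+1)}

ns = range(-50, 50)
print(all(R(R(n)) == n for n in ns))           # R is an involution
print(all(R(W(n)) == Winv(R(n)) for n in ns))  # R W = W^{-1} R
print(all(R(n) < 0 for n in range(50)))        # R maps H into H^perp
print(R(0) == Winv(0))                         # R e_0 = W^{-1} e_0
```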

With these tools, we are ready for the most important theorems.

(Theorem)\(W=I-C^\ast\) is a simple bilateral shift on \(K=L^2(0,\infty)\).

**Step 1 - Obtaining missing subspace, operator and basis**

Here we put \(H=L^2(0,1)\), which can be canonically embedded into \(L^2(0,\infty)\) in the obvious way (consider all \(L^2\) functions vanishing outside \((0,1)\)). This choice is natural, as there are many similarities between \(L^2(0,1)\) and \(L^2(0,\infty)\).

Explicitly, \[(Wf)(x) = f(x) - \int_x^\infty \frac{1}{t}f(t)dt, \quad x \in (0,\infty).\] Also we claim that the basis is generated by \(e_0= \chi_{(0,1)}\). First of all we show that \((W^ne_0)_{n \geq 0}\) is orthonormal. Note, as we have proved, \(W^\ast W = (I-C)(I-C^\ast)=I\). Without loss of generality we assume that \(m \geq n\), and therefore \[(e_m,e_n)=(W^me_0,W^ne_0)=((W^\ast)^nW^me_0,e_0)=((W^\ast W)^nW^{m-n}e_0,e_0)=(W^{m-n}e_0,e_0).\] If \(m=n\), then \((e_m,e_n)=(e_0,e_0)=1\). Hence it reduces to proving that \((W^ke_0,e_0)=0\) for all \(k>0\). First of all we have \[(We_0,e_0)=(e_0,e_0)-(C^\ast e_0,e_0)=1-(C^\ast e_0,e_0),\] meanwhile \[\begin{aligned} (C^\ast e_0,e_0) &= \int_0^1 \left(\int_x^1 \frac{1}{t}dt \right)dx \\ &= \int_0^1(-\ln{x})dx \\ &= (-x\ln{x}+x)|_0^1 = 1.\end{aligned}\] Hence \(We_0 \perp e_0\). Suppose now that \((W^{k-1}e_0,e_0)=0\); then \[\begin{aligned} (W^ke_0,e_0)&=(WW^{k-1}e_0,e_0) \\ &=((I-C^\ast)W^{k-1}e_0,e_0) \\ &= (W^{k-1}e_0,e_0)-(C^\ast W^{k-1}e_0,e_0) \\ &= -(W^{k-1}e_0,C e_0) \\ &= -\int_0^1W^{k-1}e_0(x)\frac{1}{x}\left(\int_0^xdt\right)dx \\ &= -\int_0^1 W^{k-1}e_0(x)\frac{1}{x} \cdot x dx \\ &= -(W^{k-1}e_0,e_0) \\ &= 0. \end{aligned}\] Note \(W^ke_0 \in L^2(0,\infty)\) always vanishes when \(x \geq 1\) (if \(f\) vanishes on \([1,\infty)\), so does \(C^\ast f\)): when we take the inner product, \([1,\infty)\) is automatically excluded. With this said, \((W^ne_0)_{n \geq 0}\) forms an orthonormal set. By the Hausdorff maximality theorem, it is contained in a maximal orthonormal set. But since \(H=L^2(0,1)\) is separable (it admits a countable basis), \((W^ke_0)_{k \geq 0}\) forms a basis of \(H\). From now on we write \(e_n = W^ne_0\).
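The orthonormality of \((W^ne_0)\) can also be observed numerically. One convenient device (my own, not part of the proof) is the substitution \(x=e^{-u}\): then \(dx=e^{-u}du\), and for functions supported in \((0,1)\) the operator \((C^\ast f)(x)=\int_x^1 f(t)t^{-1}dt\) becomes the plain antiderivative \(\int_0^u f(v)dv\). The Gram matrix of \(e_0,\dots,e_5\) then comes out as the identity:

```python
import numpy as np

# Work in u = -log x, so (0,1) becomes (0, inf) with weight e^{-u},
# and C* becomes the running integral from 0 to u.
u = np.linspace(0.0, 60.0, 600_001)
du = u[1] - u[0]
weight = np.exp(-u)                        # dx = e^{-u} du

def inner(f, g):
    return np.sum(f * g * weight) * du

def antiderivative(f):
    """int_0^u f(v) dv via a cumulative trapezoidal rule."""
    return np.concatenate([[0.0], np.cumsum(0.5 * (f[1:] + f[:-1]))]) * du

e = [np.ones_like(u)]                      # e_0 = chi_(0,1), i.e. 1 in u
for _ in range(5):
    e.append(e[-1] - antiderivative(e[-1]))   # e_{k+1} = (I - C*) e_k = W e_k

gram = np.array([[inner(a, b) for b in e] for a in e])
print(np.round(gram, 4))                   # ~ the 6x6 identity matrix
```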

To find the involution \(R\), note first that \(W = I-C^\ast\) is already unitary (if it were not, it could not be a bilateral shift and there would be nothing to prove), whose inverse and adjoint is \(W^\ast=I-C\) as we have proved earlier. Hence we have \[Re_0=e_{-1}=(I-C)e_0=\chi_{(0,1)}(x)-\frac{1}{x}\int_0^x\chi_{(0,1)}(t)dt = -\frac{1}{x}\chi_{[1,\infty)}(x).\] But we have no idea yet what \(R\) is exactly; we need to find it manually (or guess). First of all, it must be guaranteed that \(RH=H^\perp\). Since \(H\) consists of all \(L^2\) functions vanishing on \([1,\infty)\), functions in \(RH\) should vanish on \((0,1)\). It is natural to try \(R(f)(x)=g(x)f\left( \frac{1}{x}\right)\) for the time being, with \(g\) determined by \(e_{-1}\). Since \(e_0\left(\frac{1}{x}\right)=\chi_{[1,\infty)}(x)\) almost everywhere, we shall put \(g(x)=-\frac{1}{x}\). It is then clear that \(Re_0=W^{-1}e_0\) and \(RH=H^\perp\). For the third condition, we need to show that \[W \circ R \circ W = R.\] Note \[\begin{aligned}W \circ R \circ W(f) &= W \circ R \left(f(x)-\int_x^\infty\frac{1}{t}f(t)dt\right) \\ &= W \left(-\frac{1}{x}f\left(\frac{1}{x}\right)+\frac{1}{x}\int_{1/x}^{\infty}\frac{1}{t}f(t)dt \right) \\ &= -\frac{1}{x}f\left(\frac{1}{x}\right)+\underbrace{\frac{1}{x}\int_{1/x}^{\infty}\frac{1}{t}f(t)dt + \int_x^\infty \frac{1}{t^2}f\left(\frac{1}{t}\right)dt - \int_x^\infty \frac{1}{t^2}\int_{1/t}^{\infty}\frac{1}{u}f(u)du\,dt}_{=0 \text{ by Fubini's theorem, similar to proving }CC^\ast=C+C^\ast.} \\ &= R(f).\end{aligned}\]

**Step 2 - With these, \(W\) in step 1 has to be a simple bilateral shift**

This is independent of the spaces chosen. To finish the proof, we need a lemma:

Suppose \(K\) is a Hilbert space, \(H\) is a subspace and \(e_0 \in H\). \(W\) is a unitary operator such that \(W^ne_0 \in H\) for all \(n \geq 0\) and \(\{e_n=W^ne_0\}_{n \geq 0}\) forms an orthonormal basis of \(H\). \(R\) is a unitary involution on \(K\) such that \[Re_0 = W^{-1}e_0, \quad RH=H^\perp, \quad R \circ W = W^{-1} \circ R;\] then \(W\) is a simple bilateral shift.

Indeed, the objects mentioned in step 1 fit this lemma. To begin with, we write \(e_n=W^ne_0\) for all \(n \in \mathbb{Z}\). Then \(\{e_n\}\) is an orthonormal set, because for arbitrary \(m,n \in \mathbb{Z}\) there is a \(j \in \mathbb{Z}\) such that \(m+j,n+j \geq 0\), and therefore \[(e_m,e_n)=(W^je_m,W^je_n)=(W^{m+j}e_0,W^{n+j}e_0)=(e_{m+j},e_{n+j})=\delta_{m+j,n+j}=\delta_{m,n}.\] Since \((e_0,e_1,\cdots)\) spans \(H\) and \(RH=H^{\perp}\), we see \((Re_0,Re_1,\cdots)\) spans \(H^{\perp}\). But \[Re_n=RW^ne_0=W^{-n}Re_0=W^{-n-1}e_0=e_{-n-1},\] hence \(\{e_{-1},e_{-2},\cdots\}\) spans \(H^\perp\). Since \(We_n=e_{n+1}\) for all \(n \in \mathbb{Z}\), \(W\) is indeed a simple bilateral shift, and our proof is done. \(\square\)

- Walter Rudin, *Functional Analysis*.
- Arlen Brown, P. R. Halmos, A. L. Shields, *Cesàro operators*.

Throughout we consider the Hilbert space \(L^2=L^2(\mathbb{R})\), the space of all complex-valued functions of a real variable such that \(f \in L^2\) if and only if \[\lVert f \rVert_2^2=\int_{-\infty}^{\infty}|f(t)|^2dm(t)<\infty\] where \(m\) denotes the ordinary Lebesgue measure (in fact it's legitimate to consider the Riemann integral in this context).

For each \(t \geq 0\), we assign a bounded linear operator \(Q(t)\) such that \[(Q(t)f)(s)=f(s+t).\] This is indeed bounded since \(\lVert Q(t)f \rVert_2 = \lVert f \rVert_2\), as the Lebesgue measure is translation-invariant. \(Q(t)\) is the left translation by step \(t\).

The inner product in \(L^2\) is defined by \[(f,g)=\int_{-\infty}^{\infty}f(s)\overline{g(s)}dm(s), \quad f,g\in L^2.\] If we apply \(Q(t)\) on \(f\), we see \[\begin{aligned} (Q(t)f,g) &= \int_{-\infty}^{\infty}f(s+t)\overline{g(s)}dm(s) \\ &= \int_{-\infty}^{\infty}f(u)\overline{g(u-t)}dm(u) \quad (u=s+t) \\ &= (f,Q(t)^{\ast}g)\end{aligned}\] where \(Q(t)^\ast\) is the adjoint of \(Q(t)\), which happens to be the *right* translation by step \(t\): \((Q(t)^\ast g)(s)=g(s-t)\). Clearly we have \(Q(t)Q(t)^\ast=Q(t)^\ast Q(t)=I\), which indicates that \(Q(t)\) is unitary. Also we can check in a more manual way: \[(Q(t)f,Q(t)g) = \int_{-\infty}^{\infty}f(s+t)\overline{g(s+t)}dm(s) = \int_{-\infty}^{\infty}f(s+t)\overline{g(s+t)}dm(s+t)=(f,g).\] By operator theory, since \(Q(t)\) is unitary and bounded, the spectrum of \(Q(t)\) lies in the unit circle \(S^1\).
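On a discrete grid, translation by an exact multiple of the grid spacing is a pure index shift, so the adjoint relation and the isometry can be confirmed without interpolation error (a numerical sketch; the grid and test functions are arbitrary choices of mine):

```python
import numpy as np

s = np.linspace(-30.0, 30.0, 600_001)
ds = s[1] - s[0]

def inner(f, g):
    return np.sum(f * np.conj(g)) * ds

def Q(f, k):
    """(Q(t)f)(s) = f(s + t) with t = k*ds, k >= 0: shift left by k samples."""
    out = np.zeros_like(f)
    out[:-k] = f[k:]
    return out

def Qstar(f, k):
    """(Q(t)* f)(s) = f(s - t): shift right by k samples."""
    out = np.zeros_like(f)
    out[k:] = f[:-k]
    return out

f = np.exp(-((s - 1.0) ** 2)) * np.exp(1j * s)   # a complex L^2 function
g = np.exp(-((s + 2.0) ** 2)) + 0j

k = 7000                                          # t = 0.7
adj_gap = abs(inner(Q(f, k), g) - inner(f, Qstar(g, k)))
iso_gap = abs(inner(Q(f, k), Q(f, k)) - inner(f, f))
print(adj_gap, iso_gap)                           # both ~ 0
```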

Note \(Q(0)=I\) and \[Q(t+u)f(s)=f(s+t+u)=f[(s+t)+u]=Q(u)f(s+t)=Q(t)Q(u)f(s)\] for all \(f \in L^2\), which is to say that \(Q(t+u)=Q(t)Q(u)\). Therefore we say \(\{Q(t)\}\) is a *semigroup*. But what's more important is that it satisfies strong continuity near the origin: \[\lim_{t \to 0}\lVert Q(t)f - f \rVert_2 = 0.\] This is not too hard to verify. It suffices to prove that \[\lim_{t \to 0}\int_{-\infty}^{\infty} |f(s+t)-f(s)|^2dm(s) =0.\] Note \(C_c(\mathbb{R})\) (continuous functions with compact support) is dense in \(L^2\), and for \(f \in C_c(\mathbb{R})\) the limit follows immediately from properties of continuous functions. Next pick \(f \in L^2\). Then for \(\varepsilon>0\) there exists some \(f_1 \in C_c(\mathbb{R})\) such that \(\lVert f-f_1 \rVert_2 < \frac{\varepsilon}{4}\) and \(\lVert f_1(s+t)-f_1(s)\rVert_2<\frac{\varepsilon}{2}\) for \(t\) small enough. If we put \(f_2=f-f_1\) we get \[\begin{aligned} \lVert f(s+t)-f(s) \rVert_2 &\leq \lVert f_1(s+t)-f_1(s) \rVert_2+\lVert f_2(s+t)-f_2(s) \rVert_2 \\ &< \frac{\varepsilon}{2}+2\lVert f_2\rVert_2 < \varepsilon.\end{aligned}\] The limit follows as \(\varepsilon \to 0\).

Recall that the infinitesimal generator of \(Q(t)\) is defined to be \[A=\lim_{t \to 0}\frac{1}{t}[Q(t)-I],\] which is inspired by \(\frac{d}{dt}e^{tA}\big|_{t=0}=A\) (thanks to von Neumann). Note if \(f \in L^2\) is differentiable, then \[Af(s) = \lim_{t \to 0} \frac{f(s+t)-f(s)}{t} = f'(s).\] That the infinitesimal generator of \(Q(t)\) is the differentiation operator is quite intuitive. But we need to clarify it in \(L^2\), which is much larger. So what is the domain \(D(A)\)? We don't know yet, but we can guess. When talking about differentiation in \(L^p\) spaces, it makes sense to work with absolute continuity. Also we need to make sure that \(Af \in L^2\), hence we put \[D=\{f\in L^2:f \text{ absolutely continuous, }f' \in L^2\}.\] For every \(f \in D(A)\) and any fixed \(t\) we already have \[\frac{d}{dt}Q(t)f(s)=f'(s+t)=Af(s+t),\] hence \(Af=f'\) for every \(f \in D(A)\), and it follows that \(D(A) \subset D\). In fact, \(A\) is the restriction of the differential operator to \(D(A)\). Conversely, by the Hille-Yosida theorem, we see \(1 \in \rho(A)\), and one can also show that \(1 \in \rho(\frac{d}{dx})\). Therefore \[(I-\frac{d}{dx})D(A)=(I-A)D(A)=L^2.\] But we also have \[D=(I-\frac{d}{dx})^{-1}L^2.\] Thus \[D = \left(I-\frac{d}{dx}\right)^{-1}\left(I-\frac{d}{dx}\right)D(A)=D(A).\] The fact that \((I-\frac{d}{dx})D=L^2\) can be realised through the equation \(f-f'=g\), where the existence of a solution can be proved using the Fourier transform. Since \(\widehat{f'}(y)=iy\hat{f}(y)\), with some knowledge of distributions the result can also be given by \[D(A)=\left\{f\in L^2:\int_{-\infty}^{\infty}|y\hat{f}(y)|^2dy<\infty\right\}.\]
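For a concrete smooth function one can watch the difference quotient \(\frac{1}{t}[Q(t)f-f]\) converge to \(f'\) in \(L^2\). A small numerical sketch with a Gaussian (all grid constants are arbitrary choices; the shift is by whole grid steps, so it is exact):

```python
import numpy as np

s = np.linspace(-20.0, 20.0, 400_001)
ds = s[1] - s[0]
f = np.exp(-s ** 2)
fprime = -2.0 * s * np.exp(-s ** 2)

def l2(h):
    return np.sqrt(np.sum(np.abs(h) ** 2) * ds)

errs = []
for k in (1000, 100, 10):            # t = k*ds -> 0
    t = k * ds
    Qf = np.zeros_like(f)
    Qf[:-k] = f[k:]                  # (Q(t)f)(s) = f(s + t)
    errs.append(l2((Qf - f) / t - fprime))
print(errs)                          # shrinks roughly linearly in t
```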

By the Hille-Yosida theorem, the half plane \(\{z:\Re z>0\}\) is contained in \(\rho(A)\). But we can give a more precise description.

Pick any \(f \in D(A)\). It is directly verified that \[(A-\lambda{I})f = f'-\lambda{f}.\] Put \(g=(A-\lambda{I})f\); then \[\hat{g}(y)=iy\hat{f}(y)-\lambda{\hat{f}(y)}.\] Therefore \[\hat{f}(y) = \frac{\hat{g}(y)}{iy-\lambda} \in L^2.\] Conversely, suppose \(h(y)=\frac{\hat{g}(y)}{iy-\lambda} \in L^2\); then \(\hat{g}(y)=iyh(y)-\lambda{h}(y)\). Taking the inverse Fourier transform, we see \(g \in R(A-\lambda{I})\).

If \(g \in L^2\), then clearly \(\hat{g} \in L^2\). It remains to discuss \(\hat{g}(y)/(iy-\lambda)\). Note \(iy\) lies on the imaginary axis; hence if \(\lambda\) is not purely imaginary, then \(\hat{g}(y)/(iy-\lambda) \in L^2\). If \(\lambda\) is purely imaginary, however, then we may have \(\hat{g}(y)/(iy-\lambda)\not\in L^2\). For example, we can take \(\hat{g}=\chi_{[s-1,s+1]}\) where \(\lambda = is\). Hence if \(\lambda\) is purely imaginary, \(R(A-{\lambda}I)\) is a proper subspace of \(L^2\). Therefore we conclude: \[\sigma(A)= \{z \in \mathbb{C}:\Re z = 0\}.\] *This is an exercise in W. Rudin's Functional Analysis. You can find related theorems in Chapter 13.*
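The failure at purely imaginary \(\lambda\) is easy to see numerically: with \(\lambda=is_0\) and \(\hat{g}=\chi_{[s_0-1,s_0+1]}\), the squared integrand is \(1/(y-s_0)^2\) on the support, and the integral over \([s_0+\varepsilon,s_0+1]\) equals \(1/\varepsilon-1\), which blows up as \(\varepsilon \to 0\):

```python
import numpy as np

def tail_integral(eps):
    """int_{eps}^{1} dt / t^2 with t = y - s0, trapezoid on a geometric grid."""
    t = np.geomspace(eps, 1.0, 200_001)
    vals = t ** -2.0
    return np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(t))

vals = [tail_integral(e) for e in (1e-1, 1e-3, 1e-5)]
print(vals)    # ~ [9, 999, 99999]: hat{g}/(iy - lambda) is not in L^2
```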

Guided by research in function theory, operator theorists introduced an analogue of quasi-analytic classes. Let \(A\) be an operator in a Banach space \(X\); \(A\) is not necessarily bounded, hence the domain \(D(A)\) need not be the whole space. We say \(x \in X\) is a \(C^\infty\) vector if \(x \in \bigcap_{n \geq 1}D(A^n)\). This is quite intuitive if we consider the differential operator. A vector \(x\) is *analytic* if the series \[\sum_{n=0}^{\infty}\lVert{A^n x}\rVert\frac{t^n}{n!}\] has a positive radius of convergence. Finally, we say \(x\) is *quasi-analytic* for \(A\) provided that \[\sum_{n=1}^{\infty}\left(\frac{1}{\lVert A^n x \rVert}\right)^{1/n} = \infty\] (equivalently, the same condition with \(\lVert A^nx\rVert\) replaced by its nondecreasing majorant). Interestingly, if \(A\) is symmetric, then \(\lVert{A^nx}\rVert\) is log-convex, since \(\lVert A^nx\rVert^2=(A^{n-1}x,A^{n+1}x) \leq \lVert A^{n-1}x\rVert\,\lVert A^{n+1}x\rVert\) by the Cauchy-Schwarz inequality.
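A random symmetric matrix illustrates the log-convexity of \(n \mapsto \lVert A^nx\rVert\) (a toy finite-dimensional sketch; the seed and dimension are arbitrary):

```python
import numpy as np

# log-convexity: ||A^n x||^2 <= ||A^{n-1} x|| * ||A^{n+1} x||, a consequence
# of ||A^n x||^2 = (A^{n-1} x, A^{n+1} x) and the Cauchy-Schwarz inequality.
rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
A = (M + M.T) / 2                 # a symmetric operator
x = rng.standard_normal(6)

norms = [float(np.linalg.norm(np.linalg.matrix_power(A, n) @ x)) for n in range(8)]
log_convex = all(norms[n] ** 2 <= norms[n - 1] * norms[n + 1] * (1 + 1e-12)
                 for n in range(1, 7))
print(log_convex)
```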

Based on the density of quasi-analytic vectors, we have an interesting result.

(Theorem)Let \(A\) be a symmetric operator in a Hilbert space \(\mathscr{H}\). If the set of quasi-analytic vectors spans a dense subset, then \(A\) is essentially self-adjoint.

This theorem can be considered a corollary of the fundamental theorem of quasi-analytic classes, obtained by applying suitable Banach space techniques in place of the classical function-theoretic arguments.

For a positive sequence \(\{a_n\}\), it is the moment sequence of a positive measure \(\mu\), i.e. \(a_n = \int_\mathbb{R}t^n d\mu(t)\), if and only if it is positive definite. But uniqueness is not guaranteed. Here we have a sufficient condition for uniqueness, using the concept of quasi-analytic vectors. This is an old theorem (1922), but we will prove it using operator theory, which appeared decades later.

(Carleman's condition)Suppose \(\{a_n\}\) is the moment sequence of a positive measure \(\mu\) on \(\mathbb{R}\), then \(\mu\) is uniquely determined provided that \(\sum a_{2n}^{-1/2n}=\infty\).

**Proof.** Consider the Hilbert space \[\mathscr{H}= L^2(\mathbb{R},\mu)\] and the operator \[ A:f(t) \mapsto tf(t).\] It is clear that \(A\) is self-adjoint. We shall work with the constant function \(u(t) \equiv 1 \in \mathscr{H}\). Since \(A^nu = t^n\), we see \(u \in C^\infty\) (otherwise \(a_n\) would not be defined). On the other hand, we have \[ (A^n u, u) = a_n \implies \lVert A^n u \rVert^2 = (A^{2n} u,u) = a_{2n}.\] But \(a_{2n}^{-1/2n}=\lVert A^n u \rVert^{-1/n}\), and as a result we see \(\sum a_{2n}^{-1/2n}= \sum \lVert A^n u \rVert^{-1/n} = \infty\), hence \(u\) is quasi-analytic. In general, \(t^n = A^n u\) is quasi-analytic for all \(n \geq 0\). Consider the space of polynomials \(\mathcal{P}[t]\) with closure \(\mathscr{H}_1\). It follows from the theorem above that \(A_1 = A|_{\mathcal{P}[t]}\) is essentially self-adjoint in \(\mathscr{H}_1\). Hence \(\mathscr{H}_1\) is invariant under the one-parameter group \(e^{iAs}\). Pick \(y \in \mathcal{P}[t]^{\perp}\); then \[(y,e^{iAs}u) = \int_\mathbb{R}e^{-ist}y(t)d\mu(t) = 0,\] which implies that \(y = 0\) a.e. [\(\mu\)]. It follows that \(\mathscr{H}_1 = \mathscr{H}\), or equivalently that \(\mathcal{P}[t]\) is dense in \(\mathscr{H}\). Suppose now we have another generating measure \(\nu\) of \(\{a_n\}\). With respect to \(\nu\), \(\mathcal{P}[t]\) is still dense. But the norm on \(\mathcal{P}[t]\) is fixed by \(\{a_n\}\), hence we obtain an isometry between \(\mathcal{P}[t]_\mu\) and \(\mathcal{P}[t]_\nu\), which extends to an isometry between \(L^2(\mathbb{R},\mu)\) and \(L^2(\mathbb{R},\nu)\), and this forces \(\mu\) and \(\nu\) to be equal. \(\blacksquare\)
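As an illustration (my own example, not part of the proof): the standard Gaussian has even moments \(a_{2n} = (2n-1)!! = (2n)!/(2^n n!)\), and the partial sums of \(\sum a_{2n}^{-1/2n}\) visibly grow without bound, so Carleman's condition holds and the Gaussian moment problem is determinate:

```python
import math

def log_a2n(n):
    """log of a_{2n} = (2n)!/(2^n n!) = (2n-1)!!, the 2n-th Gaussian moment."""
    return math.lgamma(2 * n + 1) - n * math.log(2.0) - math.lgamma(n + 1)

partial, checkpoints = 0.0, {}
for n in range(1, 2001):
    partial += math.exp(-log_a2n(n) / (2 * n))   # a_{2n} ** (-1/(2n))
    if n in (10, 100, 1000, 2000):
        checkpoints[n] = round(partial, 2)
print(checkpoints)   # the partial sums keep growing
```

Working with `lgamma` avoids overflowing floats with huge factorials.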

Analytic functions, whose class is denoted by \(C^\omega\), enjoy a lot of nice properties. Formally we have the following definition:

If \(f \in C^\omega\) and \(x_0 \in \mathbb{R}\), one can write \[f(x) = a_0+a_1(x-x_0)+a_2(x-x_0)^2+\cdots,\] with the series converging in some neighbourhood of \(x_0\).

Obviously \(f \in C^\infty\) (and hence \(C^\omega \subset C^\infty\)), and alternatively the Taylor series converges to \(f\) around any \(x_0 \in \mathbb{R}\): \[T(x) = \sum_{n=0}^{\infty}\frac{D^nf(x_0)}{n!}(x-x_0)^n.\] One interesting thing is that every \(f \in C^\omega\) is uniquely determined by the sequence \(D^0f(x_0), Df(x_0),D^2f(x_0),\cdots\).

Unfortunately, this property does not hold on \(C^\infty\) in general. For example, consider the bump function \(\varphi\) (a simple example can be found on Wikipedia). In brief, \(\varphi=0\) for all \(x \in (-\infty,-1] \cup [1,+\infty)\) but \(\varphi>0\) on \((-1,1)\); more importantly, \(\varphi \in C^\infty\). Now if we take \(f = \varphi\) and \(g = 2\varphi\), then \(f \neq g\), but \(D^nf(-2)=D^ng(-2)=0\) for all \(n \geq 0\). We get a sequence of derivatives of all orders, but this sequence does not determine a unique \(C^\infty\) function.
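The two functions are easy to realise in code. The sketch below uses the standard bump \(e^{-1/(1-x^2)}\) (an assumption matching the usual Wikipedia example): \(f \neq g\) on \((-1,1)\), yet both vanish identically on a neighbourhood of \(x=-2\), so every derivative there is \(0\) for both:

```python
import numpy as np

def bump(x):
    """The bump function: exp(-1/(1-x^2)) on (-1,1), 0 elsewhere."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    out = np.zeros_like(x)
    m = np.abs(x) < 1.0
    out[m] = np.exp(-1.0 / (1.0 - x[m] ** 2))
    return out

f = lambda x: bump(x)
g = lambda x: 2.0 * bump(x)

print(float(f(0.0)), float(g(0.0)))        # the two functions differ on (-1,1)
xs = np.linspace(-2.5, -1.5, 11)           # a neighbourhood of -2
print(float(np.max(np.abs(f(xs)))), float(np.max(np.abs(g(xs)))))  # both 0 there
```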

The term "uniquely determined" can also be described in an alternative way: if \(f \in C^\omega\) and \(D^kf(x_0)=0\) for all \(k \geq 0\), then \(f=0\) everywhere.

So a question comes up naturally: how many functions are determined by their derivatives of all orders? Does \(C^\omega\) contain all of them? If not, how can we describe them?

The class of analytic functions is our source of motivation, so it makes sense to dig into its properties to find more. It is natural to regard a real-analytic function as the restriction to \(\mathbb{R}\) of a holomorphic function on a region of the complex plane. Let \(\Omega\) be the set of all \(z=x+iy\) such that \(|y| < \delta\), and suppose \(f \in H(\Omega)\) and \(|f(z)|<\beta\) for all \(z \in \Omega\). By Cauchy's estimate, we get \[|D^n f(x)| \leq \beta \delta^{-n}n!\quad n \in \mathbb{N},x\in \mathbb{R}.\] Also the restriction of \(f\) to \(\mathbb{R}\) is real-analytic. Here comes the interesting part: \(\beta\) and \(\frac{1}{\delta}\) are determined only by \(f\) and have nothing to do with \(n\), while \(n!\) is a special sequence that dominates the derivatives of \(f\).

This motivates us to define a special class of functions, which is called the class \(C\{M_n\}\).

Let \(\{M_n\}\) be a sequence of positive numbers; we let \(C\{M_n\}\) denote the class of all \(f \in C^\infty\) such that \[\lVert D^nf\rVert_\infty \leq \beta_f B^n_f M_n \quad (n \in \mathbb{N}),\] where \(\lVert \cdot \rVert_\infty\) is the supremum norm on \(\mathbb{R}\), and \(\beta_f,B_f\) are constants determined by \(f\) but not by \(n\).

In order to equip \(C\{M_n\}\) with some satisfying algebraic structures, which can simplify our work, we need some restrictions.

Indeed, \(B_f\) plays a much more important role, since we have \[\limsup_{n \to \infty}\left(\frac{\lVert D^n f\rVert_\infty}{M_n}\right)^{1/n} \leq B_f\] while \(\beta_f\) disappears in this limit. However, if we eliminate \(\beta_f\) at the beginning, i.e. put \(\beta_f = 1\) for all \(f \in C\{M_n\}\), then for \(n=0\) we would have \[\lVert f \rVert_\infty \leq M_0,\] which prevents \(C\{M_n\}\) from being a vector space. For example, if \(\lVert f \rVert_\infty = M_0\), then \(\lVert 2f \rVert_\infty = 2M_0 > M_0\), hence \(2f \not\in C\{M_n\}\). If instead we keep the constant \(\beta_f\), say \(\lVert f \rVert_\infty \leq \beta_f M_0\), then addition and scalar multiplication merely produce a different constant for the resulting function, which ensures that \(C\{M_n\}\) is closed under both operations, i.e. is a vector space. Without such a constant, our class would contain far too few functions.

Further, we have some restriction on the sequence \(\{M_n\}\):

- \(M_0=1\).
- \(M_n^2 \leq M_{n-1}M_{n+1}\) (\(\{\log M_n\}\) is a convex sequence).

As we will see soon, this makes \(C\{M_n\}\) an algebra over \(\mathbb{R}\), where multiplication is defined pointwise.

*Proof.* If \(f,g \in C\{M_n\}\), then we need to show that \(fg \in C\{M_n\}\). We have the product rule for differentiation: \[D^n(fg) = \sum_{j=0}^{n}{n \choose j}(D^jf)(D^{n-j}g).\] Since \(f,g \in C\{M_n\}\), we have \[|D^n(fg)| \leq \sum_{j=0}^{n}{n \choose j}\beta_fB_f^jM_j\beta_gB_g^{n-j}M_{n-j} = \beta_f\beta_g\sum_{j=0}^{n}{n \choose j}B_f^jB_g^{n-j}M_jM_{n-j}.\] Of course we want to eliminate \(M_jM_{n-j}\) to obtain a binomial expansion. To do this we need the convexity of the sequence \(\{\log M_n\}\). Note \(M_n^2 \leq M_{n-1}M_{n+1}\) implies \[\log M_n - \log M_{n-1} \leq \log M_{n+1} - \log M_n.\] As a result, the line segments connecting \((n-1,\log M_{n-1})\) and \((n,\log M_n)\) get steeper and steeper as \(n\) grows. By connecting these points we actually get a convex function, but let us be more rigorous. For \(0 < j < n\), we have \[\begin{aligned}\log M_n - \log M_j &= \sum_{k=j+1}^{n}\left(\log M_k - \log M_{k-1}\right) \\&\geq \sum_{k = j}^{n-1}\left(\log M_{k} - \log M_{k-1}\right) \\&\geq \sum_{k=1}^{n-j}(\log M_k - \log M_{k-1}) \quad\text{(note $\log M_0=0$)} \\&= \log M_{n-j}.\end{aligned}\] Hence \(M_n \geq M_jM_{n-j}\) for \(0<j<n\). This also holds when \(j=0\) or \(j=n\), hence we get \[|D^n(fg)| \leq \beta_f\beta_g\sum_{j=0}^{n}{n \choose j}B_f^jB_g^{n-j}M_jM_{n-j} \leq \beta_f\beta_g\sum_{j=0}^{n}{n \choose j}B_f^jB_g^{n-j}M_n = \beta_f\beta_g(B_f+B_g)^nM_n.\] Hence \(fg \in C\{M_n\}\). The reason why \(C\{M_n\}\) is a vector space has been stated already. \(\square\)
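The key inequality \(M_n \geq M_jM_{n-j}\) is easy to test on a concrete log-convex sequence; for \(M_n=n!\) (my example) it is just the statement \({n \choose j}\ge 1\):

```python
import math

# M_n = n! satisfies M_0 = 1 and log-convexity: (n!)^2 <= (n-1)!(n+1)!
M = [math.factorial(n) for n in range(25)]

log_convex = all(M[n] ** 2 <= M[n - 1] * M[n + 1] for n in range(1, 24))
supermult = all(M[n] >= M[j] * M[n - j] for n in range(25) for j in range(n + 1))
print(log_convex, supermult)   # True True; here M_n / (M_j M_{n-j}) = C(n, j)
```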

This restriction does not hurt the generality. In fact whenever we are given a positive sequence \(\{M_n\}\), we have another sequence \(\{M'_n\}\) satisfying the two restrictions such that \(C\{M_n\}=C\{M'_n\}\).

A class \(C\{M_n\}\) is said to be *quasi-analytic* if the conditions \[f \in C\{M_n\},\quad (D^nf)(0)=0 \text{ for all } n \in \mathbb{N}\] imply that \(f(x) = 0\) for all \(x \in \mathbb{R}\).

The reason we check whether the function equals \(0\) everywhere, instead of checking whether it is 'uniquely determined' by the sequence of its derivatives of all orders, is that this formulation is much simpler to work with. If a sequence of derivatives of all orders determines two functions, then their difference is identically \(0\).

We have seen that \(C\{n!\}\) contains all functions which are restrictions of bounded holomorphic functions on the strip defined by \(|\Im(z)|<\delta\). Conversely, we show that any function in \(C\{n!\}\) defined on the real axis can be extended to a holomorphic function with the same property. As a result, \(C\{n!\}\) is a quasi-analytic class (which contains all bounded functions of \(C^\omega\)). If we only consider functions defined on a closed and bounded interval \([a,b]\), then \(C\{n!\}\) is exactly \(C^\omega\).

Suppose \(f \in C\{n!\}\). First of all we have \[\lVert D^nf \rVert_\infty \leq \beta B^nn!\] for \(n \in \mathbb{N}\). By Taylor's formula, \[f(x) = \sum_{j=0}^{n-1}\frac{D^jf(a)}{j!}(x-a)^j+\frac{1}{(n-1)!}\int_a^x(x-t)^{n-1}D^nf(t)dt.\] The remainder is therefore dominated by \[\frac{n!}{(n-1)!}\beta B^n\left\vert\int_a^x(x-t)^{n-1}dt\right\vert = \beta|B(x-a)|^n.\] If \(|B(x-a)|<1\), then \(\lim_{n \to \infty}|B(x-a)|^n = 0\), and we can safely write the expansion \[f(x) = \sum_{n=0}^{\infty}\frac{D^nf(a)}{n!}(x-a)^n.\] Pick \(0<\delta<\frac{1}{B}\); we can replace \(x\) in the expansion above with \(z\) such that \(|z-a|<\delta\). This defines a holomorphic function \(F_a\) on \(D(a,\delta)\) (the open disk centred at \(a\) with radius \(\delta\)). If \(x \in D(a,\delta)\) is real, then \(F_a(x)=f(x)\). Therefore \(F_a\) is the analytic continuation of \(f\); all the \(F_a\) together form a holomorphic extension \(F\) of \(f\) in the strip \(|\Im(z)|<\delta\). As a result, for \(z = a+iy\) with \(|y|<\delta\), we have \[|F(z)|=|F_a(z)| = \left\vert\sum_{n=0}^{\infty}\frac{D^nf(a)}{n!}(iy)^n\right\vert \leq \beta \sum_{n=0}^{\infty}(B\delta)^n = \frac{\beta}{1-B\delta}.\] Hence \(F\) is bounded in such a region.

In general, if \(M_n\) does not grow too fast as \(n \to \infty\) (not much faster than \(n!\)), then \(C\{M_n\}\) is quasi-analytic; once \(M_n\) grows fast enough, quasi-analyticity is lost. There are several equivalent statements on whether \(C\{M_n\}\) is a quasi-analytic class, which are given by the Denjoy-Carleman theorem. Here I collect all the conditions that I have found:

(Denjoy-Carleman theorem)The following conditions are equivalent:

- \(C\{M_n\}\) is not quasi-analytic.
- \(\int_0^\infty \log Q(x)\frac{dx}{1+x^2}<\infty\), where \(Q(x)=\sum_{n=0}^{\infty}\frac{x^n}{M_n}\).
- \(\int_0^\infty \log q(x) \frac{dx}{1+x^2}<\infty\), where \(q(x) = \sup_n \frac{x^n}{M_n}\).
- \(\sum_{n=1}^{\infty}\left(\frac{1}{M_n}\right)^{1/n}<\infty\).
- \(\sum_{n=1}^{\infty}\frac{M_{n-1}}{M_n}<\infty\)
- \(C\{M_n\}\) contains nontrivial function with compact support.
- \(\sum_{n=1}^{\infty}\frac{1}{\lambda_n}<\infty\) where \(\lambda_n = \inf_{k \geq n}M_k^{\frac{1}{k}}\).
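Condition 4 is easy to probe numerically. For \(M_n = n!\) the sum diverges (quasi-analytic, consistent with the discussion of \(C\{n!\}\) above), while for \(M_n = (n!)^2\) it converges (not quasi-analytic). A sketch, with these two sequences chosen by me for illustration:

```python
import math

def carleman_partials(log_M, N):
    """Partial sums of sum_{n>=1} (1/M_n)^(1/n), computed via logarithms."""
    s, out = 0.0, []
    for n in range(1, N + 1):
        s += math.exp(-log_M(n) / n)
        out.append(s)
    return out

log_fact = lambda n: math.lgamma(n + 1)              # log n!

qa = carleman_partials(log_fact, 5000)               # M_n = n!
nqa = carleman_partials(lambda n: 2 * log_fact(n), 5000)  # M_n = (n!)^2
print(qa[999], qa[-1])     # still climbing (terms ~ e/n): the sum diverges
print(nqa[999], nqa[-1])   # nearly flat (terms ~ (e/n)^2): the sum converges
```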

You may find condition 7 ridiculous. In fact, in this condition \(\{M_n\}\) is not required to satisfy the two restrictions. This is what Denjoy and Carleman found initially. Later, mathematicians found that for a sequence \(\{M_n\}\) we can obtain its convex minorant \(\{M_n'\}\) such that

- \(M_n \geq M_n'\) for all \(n\).
- \(\{\log M_n'\}\) is convex.
- There is a sequence \(0=n_0<n_1<\cdots\) such that \(M_{n_0} = M'_{n_0}\) and \(\log M'_k\) is linear for \(n_i \leq k \leq n_{i+1}\).

And as you may guess, the convex minorant \(\{M_n'\}\) is what we are using today.

The proof of the Denjoy-Carleman theorem will come in my next blog post. There is quite a lot of work to do to finish it, and it cannot be done within hours; we will be using a fair amount of complex analysis. I will also try to cover some extra properties of quasi-analytic classes, as well as why the convex minorant suffices.
