Let $G$ be a locally compact abelian group (for example, $\mathbb{R}$, $\mathbb{Z}$, $\mathbb{T}$, $\mathbb{Q}_p$). Then every irreducible unitary representation $\pi:G \to U(\mathcal{H}_\pi)$ on a non-zero Hilbert space $\mathcal{H}_\pi$ is one-dimensional, so we may take $\mathcal{H}_\pi=\mathbb{C}$. It follows that $\pi(x)(z)=\xi(x)z$ for all $z \in \mathbb{C}$, where $\xi \in \operatorname{Hom}(G,\mathbb{T})$ is a continuous homomorphism and $\mathbb{T}$ is viewed as the unit circle in the complex plane. Such homomorphisms are called (unitary) **characters**. The set of all characters of $G$ is denoted by $\widehat{G}$ and called the Pontryagin dual group. This should ring a bell about the representation theory of finite groups. For convenience, instead of $\xi(x)$, we often write $\langle x,\xi \rangle$. Writing the group law of $G$ additively, we have $\langle x,\xi\rangle\langle y,\xi \rangle=\langle x+y ,\xi\rangle$, and the following examples show why this notation is natural.

Some easily accessible examples are:

- $\widehat{\mathbb{R}} \cong \mathbb{R}$, with $\langle x,\xi \rangle = e^{2\pi i \xi x}$.
- $\widehat{\mathbb{T}} \cong \mathbb{Z}$, with $\langle z, n \rangle = z^n$.
- $\widehat{\mathbb{Z}} \cong \mathbb{T}$, with $\langle n,z \rangle = z^n$.
- $\widehat{\mathbb{Z}/k\mathbb{Z}} \cong \mathbb{Z}/k\mathbb{Z}$, with $\langle m,n\rangle =e^{2\pi i m n / k}$.

But we want to show that

$$\widehat{\mathbb{Q}}_p \cong \mathbb{Q}_p.$$

The proof is broken down into several steps. To begin with, it should be clear that $\mathbb{Q}_p$ is a topological group with respect to addition.

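Every $p$-adic number $x \in \mathbb{Q}_p$ can be written in the form

$$x=\sum_{j=m}^{\infty}x_jp^j,$$

where $m \in \mathbb{Z}$ and $x_j \in \{0,1,\dots,p-1\}$ for all $j$. We immediately define

$$\langle x,\xi_1\rangle=\exp\left(2\pi i\sum_{j=m}^{\infty}x_jp^j\right)$$

and claim that $\xi_1$ is a character. Notice that the right hand side is always well-defined, because the summands with $j \ge 0$ contribute nothing, as $\exp(2\pi i x_jp^j)=1$. That is to say, the right hand side can be understood as a finite product: when $m \ge 0$, i.e. $x \in \mathbb{Z}_p$, the pairing $\langle x, \xi_1 \rangle = 1$; when $m<0$ however, $\langle x,\xi_1 \rangle = \exp\left( 2\pi i \sum_{j=m}^{-1}x_jp^j\right)$. Therefore it is legitimate to write

$$\langle x,\xi_1\rangle=\exp\left(2\pi i\{x\}_p\right),\qquad \{x\}_p=\sum_{j=m}^{-1}x_jp^j,$$

where the empty sum (the case $m \ge 0$) is $0$. From this it follows immediately that

$$\langle x+y,\xi_1\rangle=\langle x,\xi_1\rangle\langle y,\xi_1\rangle.$$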

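The function $\xi_1$ is continuous because it is constant on the open subgroup $\mathbb{Z}_p$, hence continuous at $0$, and a homomorphism that is continuous at the identity is continuous everywhere. Therefore it is safe to say that $\xi_1$ is a character with kernel $\mathbb{Z}_p$.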
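The definition of $\xi_1$ can be tried out on rational numbers, which are dense in $\mathbb{Q}_p$. The following is a minimal sketch under that restriction; the helper names `fractional_part` and `xi_1` are made up for illustration and are not part of the argument above.

```python
from fractions import Fraction
import cmath


def fractional_part(x: Fraction, p: int) -> Fraction:
    """p-adic fractional part of a rational x: the digits with negative exponent."""
    num, den = x.numerator, x.denominator
    v = 0
    while den % p == 0:  # strip the p-part of the denominator
        den //= p
        v += 1
    if v == 0:  # x lies in Z_p, so there are no digits with negative exponent
        return Fraction(0)
    modulus = p ** v
    # x = num / (den * p^v) with p not dividing den, so den is invertible mod p^v
    residue = (num * pow(den, -1, modulus)) % modulus
    return Fraction(residue, modulus)


def xi_1(x: Fraction, p: int) -> complex:
    """Value <x, xi_1> = exp(2 pi i {x}_p) for a rational x."""
    return cmath.exp(2j * cmath.pi * float(fractional_part(x, p)))


p = 5
x, y = Fraction(7, 25), Fraction(-3, 125)
# the homomorphism property, up to floating point error
print(abs(xi_1(x + y, p) - xi_1(x, p) * xi_1(y, p)))
# 1/3 is a 5-adic integer, so it lies in the kernel
print(xi_1(Fraction(1, 3), p))
```

The modular inverse `pow(den, -1, modulus)` needs Python 3.8 or later. Exact fractions are used for the fractional part so that only the final exponential is subject to rounding.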

A quick thought would be to generate all characters out of $\xi_1$, something like $\xi_p$, $\xi_{1+p+p^2+\dots}$. But that might lead to a nightmare of subscripts. Instead, we try to discover as many characters as possible. For any $y \in \mathbb{Q}_p$, we define

$$\langle x,\xi_y\rangle=\langle xy,\xi_1\rangle.$$

In other words, $\xi_y$ is defined by $x \mapsto \langle xy,\xi_1\rangle$. Since multiplication is continuous, we see immediately that $\xi_y$ is a character, not much more complicated than $\xi_1$. We will show that this is all we need. To do this, we need to *characterise* all characters. The non-trivial characters of this form have the same image but their kernels differ. That is where we attack the problem.

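For $\xi_y$ above with $y \ne 0$, notice that $\langle x,\xi_y\rangle=1$ if and only if $xy \in \ker\xi_1=\mathbb{Z}_p$, i.e. $|xy|_p \le 1$. Therefore

$$\ker\xi_y=\{x \in \mathbb{Q}_p:|x|_p \le |y|_p^{-1}\}=\overline{B}(0,|y|_p^{-1}).$$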

We expect that all characters are of the form $\xi_y$, so their kernels should naturally look like $\ker\xi_y$. Notice that for fixed $y \ne 0$, we have $|y|_p=p^m$ for some $m \in \mathbb{Z}$. As a result $\ker\xi_y = \overline{B}(0,p^{-m})$. For this reason we have the following (more obscure) argument.

**Lemma 1.** If $\xi \in \widehat{\mathbb{Q}}_p$, there exists an integer $k$ such that $\overline{B}(0,p^{-k}) \subset \ker\xi$.

*Proof.* Since $\xi$ is continuous and $\langle 0,\xi\rangle=1$, the preimage $\xi^{-1}\{z \in \mathbb{T}:|z-1| < 1\}$ is an open neighbourhood of $0$, so there exists $k$ such that $\overline{B}(0,p^{-k}) \subset \xi^{-1}\{z \in \mathbb{T}:|z-1| < 1\}$. But $\overline{B}(0,p^{-k})$ is a group (as $|\cdot|_p$ is non-Archimedean), therefore its image is a subgroup of $\mathbb{T}$ contained in $\{z \in \mathbb{T}:|z-1|<1\}$, which can only be $\{1\}$. $\square$

We cannot say the kernel of $\xi$ is exactly of the form $\overline{B}(0,p^{-k})$ yet, but we have a way to formalise this now. If $\overline{B}(0,p^{-k}) \subset \ker\xi$ for all $k \in \mathbb{Z}$, then $\xi=1$ is the unit in $\widehat{\mathbb{Q}}_p$. Otherwise, for each $\xi$, there is a smallest $k_0$ such that $\overline{B}(0,p^{-k_0})\subset \ker\xi$ but $\overline{B}(0,p^{-k}) \not \subset \ker\xi$ whenever $k<k_0$. Put another way, we have $\langle p^{k_0-1},\xi\rangle \ne1$ but $\langle p^k,\xi\rangle=1$ whenever $k \ge k_0$. As one may guess, such $k_0$ measures the “size” of $\xi$. For convenience we study the case when $k_0=0$ first.

**Lemma 2 (“Fourier series”).** Suppose for given $\xi \in \widehat{\mathbb{Q}}_p$, $\langle 1,\xi \rangle = 1$ but $\langle p^{-1},\xi \rangle \ne 1$. There is a sequence $(c_j)_{j \ge 0}$ taking values in $\{0,1,\dots,p-1\}$ such that $\langle p^{-k},\xi \rangle=\exp\left(2\pi i\sum_{j=1}^k c_{k-j}p^{-j}\right)$ for all $k=1,2,\dots$. In particular, $c_0 \ne 0$.

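*Proof.* Put $\omega_k=\langle p^{-k},\xi\rangle$. Then $\omega_0=1$ but $\omega_k \ne 1$ for all $k \ge 1$. Since

$$\omega_{k+1}^p=\langle p^{-k-1},\xi\rangle^p=\langle p\cdot p^{-k-1},\xi\rangle=\langle p^{-k},\xi\rangle=\omega_k,$$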

each $\omega_{k+1}$ is a $p$-th root of $\omega_{k}$, and in particular $\omega_1$ is a $p$-th root of unity different from $1$. There exists $c_0 \in \{1,\dots,p-1\}$ such that

$$\omega_1=\exp\left(2\pi i c_0p^{-1}\right),$$

and the overall formula for $\omega_k$ follows from induction. $\square$

One would guess that for the corresponding $k_0$, the “size” of $\xi$ should be $p^{k_0}$. This looks realistic, but will be tedious. Right now we still only study the case when $k_0=0$.

**Lemma 3.** With the notation of lemma 2, there exists $y \in \mathbb{Q}_p$ with $|y|_p=1$ such that $\xi = \xi_y$.

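*Proof.* From lemma 2 we obtain a series $y=\sum_{j=0}^{\infty}c_jp^j$ with $c_0 \ne 0$. Then in particular $|y|_p=1$. By expanding the term, we see

$$\langle p^{-k},\xi_y\rangle=\langle p^{-k}y,\xi_1\rangle=\exp\left(2\pi i\sum_{j=0}^{k-1}c_jp^{j-k}\right)=\exp\left(2\pi i\sum_{j=1}^{k}c_{k-j}p^{-j}\right)=\langle p^{-k},\xi\rangle$$

for all $k \ge 1$. Both characters are trivial on $\mathbb{Z}_p$, and every $x \in \mathbb{Q}_p$ differs from an integer multiple of some $p^{-k}$ by an element of $\mathbb{Z}_p$. It follows that $\langle x,\xi \rangle = \langle x,\xi_y \rangle$ for all $x \in \mathbb{Q}_p$. $\square$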
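To see lemmas 2 and 3 at work, one can take a rational $p$-adic unit $y$ and read off the digits $c_0,c_1,\dots$ that govern the values $\langle p^{-k},\xi_y\rangle$. The sketch below assumes $y$ is a rational number whose denominator is prime to $p$; the function names `character_digits` and `phase` are illustrative only.

```python
from fractions import Fraction


def character_digits(y: Fraction, p: int, k: int) -> list:
    """Digits c_0, ..., c_{k-1} of a p-adic integer y given as a rational number."""
    if y.denominator % p == 0:
        raise ValueError("y must lie in Z_p")
    modulus = p ** k
    # y mod p^k, computed with a modular inverse of the denominator
    residue = (y.numerator * pow(y.denominator, -1, modulus)) % modulus
    digits = []
    for _ in range(k):
        digits.append(residue % p)
        residue //= p
    return digits


def phase(digits: list, p: int) -> Fraction:
    """The exponent sum_{j=1}^{k} c_{k-j} p^{-j} from lemma 2, as an exact fraction."""
    k = len(digits)
    return sum((Fraction(digits[k - j], p ** j) for j in range(1, k + 1)), Fraction(0))


p, y = 5, Fraction(2, 3)
digits = character_digits(y, p, 3)
print(digits, phase(digits, p))
```

Here `phase(digits, p)` is the fraction $\theta$ with $\langle p^{-k},\xi_y\rangle=\exp(2\pi i\theta)$, so the digits of $y$ and the values of the character determine each other, which is the content of lemma 3.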

Now we are ready to conclude our observation of the dual group.

**Theorem.** The map $\Lambda:\mathbb{Q}_p \to \widehat{\mathbb{Q}}_p$, $y \mapsto \xi_y$, is an isomorphism of topological groups. Hence $\mathbb{Q}_p \cong \widehat{\mathbb{Q}}_p$.

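*Proof.* First we study the algebraic isomorphism. The map $\Lambda$ is a group homomorphism because $\langle x,\xi_{y+y'}\rangle=\langle xy+xy',\xi_1\rangle=\langle x,\xi_y\rangle\langle x,\xi_{y'}\rangle$. If $\xi_y=1$, then

$$xy \in \ker\xi_1=\mathbb{Z}_p\quad\text{for all } x \in \mathbb{Q}_p,$$

which forces $y=0$.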

Hence the map $\Lambda$ is injective. To show that $\Lambda$ is surjective, fix $\xi \in \widehat{\mathbb{Q}}_p$; we may assume $\xi \ne 1$, since $1=\xi_0$. By the comment below lemma 1, there is a smallest integer $k$ such that $\langle p^j,\xi \rangle = 1$ for all $j \ge k$. Then one considers the character $\eta$ defined by

$$\langle x,\eta\rangle=\langle p^kx,\xi\rangle.$$

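It satisfies the condition in lemma 3, since $\langle 1,\eta\rangle=\langle p^k,\xi\rangle=1$ and $\langle p^{-1},\eta\rangle=\langle p^{k-1},\xi\rangle \ne 1$. Therefore there exists $z \in \mathbb{Q}_p$ with $|z|_p=1$ such that $\eta=\xi_z$, and it follows that $\xi=\xi_{p^{-k}z}$.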

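Next we show that $\Lambda$ is a homeomorphism. Consider the following sets

$$N(\ell,k)=\left\{\xi \in \widehat{\mathbb{Q}}_p:|\langle x,\xi\rangle-1|<\ell^{-1}\text{ whenever }|x|_p \le p^k\right\},$$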

ranging over integers $\ell \ge 1$ and $k \in \mathbb{Z}$. These sets constitute a local base at $1$ for $\widehat{\mathbb{Q}}_p$. We need to show that it corresponds to a local base at $0$ of $\mathbb{Q}_p$ under the map $\Lambda$:

$$\xi_y \in N(\ell,k) \iff y \in \overline{B}(0,p^{-k}).$$

The image of the set $\{x:|x|_p \le p^k\}$ under $\xi_1$ is $\{1\}$ if $k \le 0$ and is the group of $p^k$-th roots of unity if $k>0$, and hence is contained in $\{z:|z-1|<\ell^{-1}\}$ if and only if $k \le 0$. It follows that $\xi_y \in N(\ell,k)$ if and only if $|y|_p \le p^{-k}$, i.e., $y \in \overline{B}(0,p^{-k})$. We are done. $\square$

Let $p$ be a prime number. Then the space of $p$-adic numbers $\mathbb{Q}_p$ is a locally compact abelian group. This can be observed through the local basis

$$\overline{B}(0,p^{-k})=\{x \in \mathbb{Q}_p:|x|_p \le p^{-k}\},\qquad k \in \mathbb{Z},$$

where $|\cdot|_p$ is the $p$-adic norm: whenever we write a non-zero rational number as $r=p^mq$, where the numerator and denominator of $q$ are prime to $p$, we have $|r|_p=p^{-m}$.

We remind the reader that every locally compact abelian group $G$ admits a Haar measure, which is unique up to scalar multiplication. In this post, we try to find the Haar measure on $\mathbb{Q}_p$, which makes it possible to do harmonic analysis on it. With the same goal, in future posts, we will also find the dual group of $\mathbb{Q}_p$ as well as the dual measure.

Let us first recall the basic structure of $\mathbb{Q}_p$. Every element is in the form of a Laurent series

$$x=\sum_{j=m}^{\infty}c_jp^j,$$

where $m \in \mathbb{Z}$ and $c_j \in \{0,\dots,p-1\}$. The ring of integers $\mathbb{Z}_p$ is exactly the closed disc of radius $1$ at the origin. That is, $\mathbb{Z}_p=\overline{B}(0,1)$, which is a compact set. Let $\mu$ be an arbitrary Haar measure on $\mathbb{Q}_p$. Then $\mu(\mathbb{Z}_p)$ is non-zero and finite, because $\mathbb{Z}_p$ is compact and open. We can therefore put

$$m_p(E)=\frac{\mu(E)}{\mu(\mathbb{Z}_p)}.$$

Then in particular $m_p(\mathbb{Z}_p)=1$. This is the canonical Haar measure we are looking for. But it would be hilarious to end the post here. We will give a closer look at it, at least on a $p$-adic level.

Recall that when studying the Lebesgue measure on $\mathbb{R}$ we have encountered some definition in the form of

$$m(E)=\inf\sum_{j=1}^{\infty}\ell(I_j),$$

where the infimum is taken over all countable collections of open intervals $\{I_j\}$ such that $\bigcup_j I_j \supset E$, and $\ell(I_j)$ is the length of $I_j$. In fact, we can actually write

$$m(E)=\inf\{m(V):V \supset E,\ V\text{ is open}\}.$$

On $\mathbb{Q}_p$, we write

$$m_p(E)=\inf\{m_p(V):V \supset E,\ V\text{ is open}\}.$$

The point here is how to express $V$. For this reason we need to recall some topology of $\mathbb{Q}_p$.

$\mathbb{Q}_p$ is a separable metric space. Therefore every open set $V$ is a union of open balls.

There is nothing special about this statement. The space has already been equipped with a norm. Besides, as $\mathbb{Q}$ is dense in $\mathbb{Q}_p$, we have nothing to worry about second countability.

Every closed ball of $\mathbb{Q}_p$ is open (hence we call them “balls” thereafter). Every point in the ball is a “centre”. If two balls intersect then one is contained in the other.

This is dramatically different from our understanding of $\mathbb{R}$ or $\mathbb{C}$. Notice that the $p$-adic norm $|\cdot|_p$ only takes the values $p^k$ with $k \in \mathbb{Z}$, or $0$. For any $r>0$, there exists some $\varepsilon>0$ such that

$$\overline{B}(x,r)=\{y:|x-y|_p \le r\}=\{y:|x-y|_p<r+\varepsilon\}=B(x,r+\varepsilon).$$

The clopenness of balls in $\mathbb{Q}_p$ follows.

Next, recall that $|\cdot|_p$ is non-Archimedean. Consider $y \in \overline{B}(x,r)$. It follows that $|x-y|_p=|y-x|_p \le r$. On the other hand, for any $z \in \overline{B}(x,r)$, we have $|x-z|_p \le r$. Therefore $|y-z|_p \le \max\{|y-x|_p,|x-z|_p\} \le r$. Hence $\overline{B}(x,r)\subset \overline{B}(y,r)$. Symmetrically we see $\overline{B}(y,r) \subset \overline{B}(x,r)$. Hence they are equal.
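The claim that every point of a ball is a centre can be tested on a finite sample of rational points. This is only a sketch on a hand-picked sample, with hypothetical helper names `p_adic_norm` and `ball`; it illustrates the statement and does not replace the proof.

```python
from fractions import Fraction


def p_adic_norm(x: Fraction, p: int) -> Fraction:
    """|x|_p for a rational x, returned as an exact fraction."""
    if x == 0:
        return Fraction(0)
    num, den, v = x.numerator, x.denominator, 0
    while num % p == 0:  # powers of p in the numerator raise the valuation
        num //= p
        v += 1
    while den % p == 0:  # powers of p in the denominator lower it
        den //= p
        v -= 1
    return Fraction(1, p ** v) if v >= 0 else Fraction(p ** (-v))


def ball(centre: Fraction, radius: Fraction, points, p: int) -> set:
    """The sample points lying in the closed ball around centre."""
    return {z for z in points if p_adic_norm(z - centre, p) <= radius}


p = 3
points = [Fraction(n, d) for n in range(-9, 10) for d in (1, 2, 3, 9)]
radius = Fraction(1, 3)
reference = ball(Fraction(1), radius, points, p)
# re-centring the ball at any of its own points gives the same set
same_ball = all(ball(y, radius, points, p) == reference for y in reference)
print(same_ball)
```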

Let $\overline{B}(x,r)$ and $\overline{B}(x',r')$ be two balls that intersect, and without loss of generality we assume that $r \le r'$. Let $y$ be a point in the intersection, then we see

$$\overline{B}(x,r)=\overline{B}(y,r) \subset \overline{B}(y,r')=\overline{B}(x',r').$$

So far so good. We next try to compute the Haar measure of every ball.

Every ball of radius $p^k$ has measure $p^k$ ($k \in \mathbb{Z}$).

First of all notice that $\overline{B}(0,1)=\mathbb{Z}_p$, and we defined $m_p$ so that $m_p(\mathbb{Z}_p)=1$. Therefore, by translation invariance, every ball of the form $\overline{B}(x,1)$ has measure $1$. Next, notice that $\overline{B}(0,p^k)=p^{-k}\mathbb{Z}_p$ for all $k \in \mathbb{Z}$; to compute its measure it is necessary to unwind $\mathbb{Z}_p$ a little bit more.

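We have, for $k>0$,

$$\mathbb{Z}_p=\bigsqcup_{j=0}^{p^k-1}\left(j+p^k\mathbb{Z}_p\right)=\bigsqcup_{j=0}^{p^k-1}\overline{B}(j,p^{-k}).$$

Therefore $\mathbb{Z}_p$ is a disjoint union of $p^k$ balls of radius $p^{-k}$ when $k>0$. Hence in this case,

$$1=m_p(\mathbb{Z}_p)=p^k\,m_p\left(\overline{B}(0,p^{-k})\right),\qquad\text{so}\qquad m_p\left(\overline{B}(0,p^{-k})\right)=p^{-k},$$

as expected. In other words, for $k<0$, the ball $\overline{B}(0,p^k)$ has measure $p^k$.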

For the counterpart $k>0$, we notice that

$$p^{-k}\mathbb{Z}_p=\bigsqcup_{j=0}^{p^k-1}\left(jp^{-k}+\mathbb{Z}_p\right),$$

which is to say $\overline{B}(0,p^k)=p^{-k}\mathbb{Z}_p$ is a disjoint union of $p^k$ balls of radius $1$. Hence its measure is $p^k$. This concludes our computation of balls in $\mathbb{Q}_p$.
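The bookkeeping behind these computations is easy to mechanise. The sketch below takes the formula $m_p(\overline{B}(x,p^k))=p^k$ as given and checks two decompositions by adding up ball measures; the names `ball_measure` and `residue_classes` are illustrative, and the unit group $\mathbb{Z}_p^\times$ is an extra example not discussed above.

```python
from fractions import Fraction


def ball_measure(k: int, p: int) -> Fraction:
    """m_p of any ball of radius p^k, as computed in the text."""
    return Fraction(p) ** k


def residue_classes(k: int, p: int) -> list:
    """Centres j of the balls j + p^k Z_p that tile Z_p, for k >= 0."""
    return list(range(p ** k))


p, k = 3, 2
# Z_p is tiled by p^k balls of radius p^(-k), so the measures add up to 1
total = sum(ball_measure(-k, p) for _ in residue_classes(k, p))
# the units of Z_p are the p - 1 classes j + p Z_p with j not divisible by p
units = sum(ball_measure(-1, p) for j in range(p) if j % p != 0)
print(total, units)
```

For $p=3$ the second sum says that the units have measure $1-\frac{1}{p}=\frac{2}{3}$, as one would expect from removing the ball $p\mathbb{Z}_p$ from $\mathbb{Z}_p$.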

Now we come back to the definition of $m_p$. Every open set $V$ can be written in the form

$$V=\bigcup_{j}\overline{B}(x_j,p^{k_j}).$$

The union is countable because $\mathbb{Q}_p$ is second countable. By combining intersecting balls, we can assume that the union is also disjoint. It follows that

$$m_p(V)=\sum_{j}p^{k_j}.$$

Note: this should be understood in the sense of real series, instead of $p$-adic numbers, because $m_p$ takes values in $\mathbb{R}$. So for an arbitrary measurable set $E$, we have

$$m_p(E)=\inf\left\{\sum_{j}p^{k_j}:E \subset \bigsqcup_{j}\overline{B}(x_j,p^{k_j})\right\}.$$

The notion of Cohen-Macaulay ring is sufficiently general to cover a wealth of examples in algebraic geometry, invariant theory and combinatorics; meanwhile it is sufficiently strict to allow a rich theory. It is a workhorse of commutative algebra. In this post, we discover an important subclass of Cohen-Macaulay rings: regular local rings (one may think of $k[[x_1,\dots,x_n]]$). See also “Why Cohen-Macaulay rings have become important in commutative algebra?” on MathOverflow.

It is recommended to be familiar with basic commutative algebra tools such as Nakayama’s lemma and minimal prime ideals.

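The content can be generalised to modules to a good extent, but we do not do so for the sake of quick accessibility. Throughout, $R$ denotes a Noetherian local ring with maximal ideal $\mathfrak{m}$ and residue field $k=R/\mathfrak{m}$.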

**Definition 1.** The *Krull dimension* of $R$, written as $\dim{R}$, is the supremum of the lengths $n$ of chains of prime ideals

$$\mathfrak{p}_0 \subsetneq \mathfrak{p}_1 \subsetneq \cdots \subsetneq \mathfrak{p}_n.$$

This definition was introduced to define dimension of affine varieties, in a global sense. Locally, we have the following definition.

**Definition 2.** The *embedding dimension* of $R$ is the dimension of a vector space

$$emb.\dim{R}=\dim_k(\mathfrak{m}/\mathfrak{m}^2).$$

The right hand side is the dimension of the $k$-vector space $\mathfrak{m}/\mathfrak{m}^2$.

Let $R$ be the local ring of a complex variety $X$ at a point $P$, in other words we write $R=\mathcal{O}_{P,X}$. Then $(\mathfrak{m}/\mathfrak{m}^2)^\ast$ is the Zariski tangent space of $X$ at $P$, whose dimension equals $\dim_k(\mathfrak{m}/\mathfrak{m}^2)=emb.\dim{R}$. The embedding dimension of $R$ is the smallest integer $n$ such that some analytic neighbourhood of $P$ in $X$ embeds into $\mathbb{C}^n$. If this dimension equals the dimension of $X$, then $X$ is “smooth” at $P$. For this reason we define regular local rings.

**Definition 3.** The ring $R$ is called *regular* if $\dim{R}=emb.\dim{R}$.
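As a quick sanity check of the definition, here is a small worked example that is not part of the original discussion: the local ring of a node, $R=K[[x,y]]/(xy)$ with $K$ a field and $\mathfrak{m}=(\overline{x},\overline{y})$. Since $xy$ already lies in $(x,y)^2$, we get

$$\mathfrak{m}/\mathfrak{m}^2 \cong (x,y)/(x,y)^2=K\overline{x} \oplus K\overline{y},\qquad emb.\dim{R}=2,\qquad \dim{R}=\dim K[[x,y]]-1=1.$$

So $\dim{R} \ne emb.\dim{R}$ and $R$ is not regular, which matches the geometric picture of two lines crossing at a singular point. Note also that $R$ is not an integral domain, because $\overline{x}\,\overline{y}=0$.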

The most immediate intuitive example of a regular local ring has to be rings of the form

$$K[[x_1,\dots,x_n]],$$

where $K$ is a field. Rings of this kind are regular local rings of Krull dimension $n$. As one would imagine, this ring contains much more information than $K[x_1,\dots,x_n]$, just as power series in complex analysis are much more powerful than polynomials.

But by working on regular local rings, we are not restricting ourselves to rings of power series over a field. For example, the ring $\mathbb{Z}[X]_{(2,X)}$ is also a regular local ring, but it does not even contain a field.

Nevertheless, our primary model of regular local rings is still a ring of the form $A=K[[x_1,\dots,x_n]]$, which has a maximal ideal $\mathfrak{m}=(x_1,\dots,x_n)$. To study local rings in the flavour of $A$, we develop an analogy of elements $\{x_1,\dots,x_n\}$.

**Definition 4.** A *regular sequence* of $R$, also written as $R$-sequence, is a sequence $[x_1,\dots,x_n]$ of elements in $\mathfrak{m}$ such that $x_1$ is a non-zero-divisor in $R$, and such that for each $i>1$, $x_i$ is a non-zero-divisor in $R/(x_1,\dots,x_{i-1})$.

The *grade* of $R$, $G(R)$, is the longest length of regular sequences. If $G(R)=\dim{R}$, then $R$ is called *Cohen-Macaulay*.

It is quite intuitive that, for $A=K[[x_1,\dots,x_n]]$, the longest $A$-sequence has to be $[x_1,\dots,x_n]$, and therefore $A$ is Cohen-Macaulay. But such an argument does not bring us to the conclusion that quickly. We will show later that, in any case, every regular local ring is a Cohen-Macaulay ring.

Amongst many sequences, we are in particular interested in the sequences that are mapped onto a basis of the $k$-vector space $\mathfrak{m}/\mathfrak{m}^2$. We will show later that this “regular” sequence is indeed a *regular* sequence.

**Proposition 1.** Let $x_1,\dots,x_n$ be elements in $\mathfrak{m} \subset R$ whose images form a basis of $\mathfrak{m}/\mathfrak{m}^2$, then $x_1,\dots,x_n$ generate the maximal ideal $\mathfrak{m}$.

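*Proof.* This is Nakayama’s lemma. Put $I=(x_1,\dots,x_n) \subset \mathfrak{m}$. By assumption $\mathfrak{m}=I+\mathfrak{m}^2$, that is, $\mathfrak{m}\cdot(\mathfrak{m}/I)=\mathfrak{m}/I$. As $R$ is local, the Jacobson radical is $\mathfrak{m}$, and $\mathfrak{m}/I$ is finitely generated, so $\mathfrak{m}/I=0$. $\square$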

**Proposition 2.** If $R$ is a regular local ring of dimension $n$ and $x_1, \dots,x_n \in \mathfrak{m}$ map to a basis of $\mathfrak{m}/\mathfrak{m}^2$, then $R/(x_1,\dots,x_i)$ is a regular local ring of dimension $n-i$.

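*Proof.* By proposition 1, we have $\mathfrak{m}=(x_1,\dots,x_i,x_{i+1},\dots,x_n)$. The dimension of $R/(x_1,\dots,x_i)$ is determined by the chain in $R$:

$$(x_1,\dots,x_i) \subsetneq (x_1,\dots,x_i,x_{i+1}) \subsetneq \cdots \subsetneq (x_1,\dots,x_n)=\mathfrak{m},$$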

which has length $n-i$. That is, $\dim R/(x_1,\dots,x_i)=n-i$. On the other hand, the maximal ideal $\mathfrak{M}$ of $R/(x_1,\dots,x_i)$ is generated by the images of $x_{i+1},\dots,x_n$, and these images map to a basis of $\mathfrak{M}/\mathfrak{M}^2$, which consequently has dimension $n-i$. $\square$

It looks quite promising now that the sequence of basis can get everything down to earth, and we will show that in the following section.

**Proposition 3.** If $R$ is regular, then $R$ is an integral domain.

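*Proof.* We use induction on $\dim R$. When $\dim{R}=0$ and $R$ is regular, we have $\mathfrak{m}/\mathfrak{m}^2=0$, so $\mathfrak{m}=0$ by Nakayama’s lemma and $R$ has to be a field, hence an integral domain by definition. Next we assume that $\dim{R}>0$ and the statement has been proved for regular local rings of dimension $\dim{R}-1$.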

Pick $x \in \mathfrak{m} \setminus \mathfrak{m}^2$. Then this element maps to a nonzero element $\overline{x}$ in $\mathfrak{m}/\mathfrak{m}^2$. There exists a basis of $\mathfrak{m}/\mathfrak{m}^2$ that contains $\overline{x}$. Therefore by proposition 2, $R/(x)$ is a regular local ring of dimension $\dim{R}-1$, which is an integral domain by assumption. It follows that $(x)$ is prime.

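We claim that there exists $x \in \mathfrak{m} \setminus \mathfrak{m}^2$ such that $(x)$ has height $1$. If not, then for all $x \in \mathfrak{m} \setminus \mathfrak{m}^2$, the prime ideal $(x)$ is a minimal prime. Since $R$ is Noetherian, there are only finitely many minimal prime ideals $\mathfrak{p}_1,\dots,\mathfrak{p}_r$, and it follows that

$$\mathfrak{m} \subset \mathfrak{m}^2 \cup \mathfrak{p}_1 \cup \dots \cup \mathfrak{p}_r,$$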

and consequently, by prime avoidance and because $\mathfrak{m} \not\subset \mathfrak{m}^2$, we have $\mathfrak{m} \subset \mathfrak{p}_j$ for some $1 \le j \le r$. It follows that $\dim{R}=0$, contradicting our assumption that $\dim{R}>0$. [Note: the prime avoidance allows at most two ideals to be non-prime. See P. 90 of Eisenbud’s Commutative Algebra, with a View Toward Algebraic Geometry.]

Thus, as our claim is true, we can write $\mathfrak{p} \subsetneq (x)$ with $\mathfrak{p}$ a minimal prime and $x \in \mathfrak{m} \setminus \mathfrak{m}^2$. We see $\mathfrak{p} \subset (x^n)$ for all $n \ge 1$ by induction: if $p=rx^n \in \mathfrak{p}$, then $r \in \mathfrak{p} \subset (x)$ because $\mathfrak{p}$ is prime and $x \notin \mathfrak{p}$, and therefore we write $r=sx$ or equivalently $p = sx^{n+1} \in (x^{n+1})$. By the Krull intersection theorem, we then have $\mathfrak{p} \subset \bigcap_{n=1}^{\infty}(x^n)=0$. Therefore $R/\mathfrak{p}=R/0=R$ is an integral domain. $\square$

We now reach our conclusion of this post.

**Proposition 4.** If $R$ is regular and of Krull dimension $n$, any $x_1,\dots,x_n \in \mathfrak{m}$ mapping to a basis of $\mathfrak{m}/\mathfrak{m}^2$ gives rise to a regular sequence ($R$-sequence). Hence $G(R)=\dim{R}$ and therefore $R$ is Cohen-Macaulay.

*Proof.* As $G(R) \le \dim{R}$, once we have shown that $[x_1,\dots,x_n]$ is a regular sequence, we have $G(R) \ge n=\dim{R}$ and hence equality. To show that it is a regular sequence, first of all notice that $x_1$ is a non-zero-divisor (because $R$ is an integral domain and $x_1 \ne 0$). For any $1 \le i<n$, we see $R/(x_1,\dots,x_i)$ is a regular local ring of dimension $n-i$ by proposition 2, hence again an integral domain by proposition 3. The image of $x_{i+1}$ there is non-zero, being part of a basis of the corresponding $\mathfrak{M}/\mathfrak{M}^2$, and is therefore a non-zero-divisor. $\square$

- Charles A. Weibel, *An Introduction to Homological Algebra*.
- M. F. Atiyah, I. G. MacDonald, *Introduction to Commutative Algebra*.
- David Eisenbud, *Commutative Algebra: with a View Toward Algebraic Geometry*.
- Winfried Bruns, Jürgen Herzog, *Cohen-Macaulay Rings*.

For example, if $f(X)=(X-1)^{100}$, we have $n_0(f)=1$, since $n_0(f)$ counts the distinct roots of $f$. It seems we are diving into calculus but actually there is still a lot of algebra.

**Theorem 1 (Mason-Stothers).** Let $a(X),b(X),c(X) \in K[X]$ be polynomials, not all constant, such that $(a,b,c)=1$ and $a+b=c$. Then

$$\max\{\deg a,\deg b,\deg c\} \le n_0(abc)-1.$$

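*Proof.* Putting $f=a/c$ and $g=b/c$, we have

$$f+g=1.$$

This implies

$$f'+g'=0,\qquad\text{i.e.}\qquad \frac{f'}{f}f+\frac{g'}{g}g=0,\qquad\text{so that}\qquad \frac{b}{a}=\frac{g}{f}=-\frac{f'/f}{g'/g}.$$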

We interrupt the proof here for some good reasons. Rational functions of the form $f'/f$ remind us of the chain rule applied to $\log{x}$. In the context of calculus, we have $\left(\log{f(x)}\right)'=f'/f$. On the ring $K[x]$, we define $D:K[x] \to K[x]$ to be the formal derivative. This $K$-linear map extends to $K(x)$ by

$$D\left(\frac{f}{g}\right)=\frac{D(f)g-fD(g)}{g^2}.$$

On $K(x)^\ast$ (read: the multiplicative group of the rational function field $K(x)$), we define the logarithmic derivative

$$f \mapsto \frac{D(f)}{f}=\frac{f'}{f}.$$

It follows that

$$\frac{(fg)'}{fg}=\frac{f'}{f}+\frac{g'}{g}.$$

Also observe that, just as in calculus, if $f$ is a constant function, then $D(f)=0$. Now we write

$$f(X)=c\prod_{i}(X-\alpha_i)^{m_i},\qquad c \in K^\ast,\ m_i \in \mathbb{Z}.$$

Then it follows that

$$\frac{f'}{f}=\sum_{i}\frac{m_i}{X-\alpha_i}.$$

Now we can go back to the proof.

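*Proof (continued).* Since $K$ is algebraically closed, we can write

$$a(X)=c_1\prod_i(X-\alpha_i)^{m_i},\qquad b(X)=c_2\prod_j(X-\beta_j)^{n_j},\qquad c(X)=c_3\prod_k(X-\gamma_k)^{r_k}.$$

We see, for example

$$\frac{a'}{a}=\sum_i\frac{m_i}{X-\alpha_i}.$$

Therefore

$$\frac{f'}{f}=\frac{a'}{a}-\frac{c'}{c}=\sum_i\frac{m_i}{X-\alpha_i}-\sum_k\frac{r_k}{X-\gamma_k}.$$

Likewise

$$\frac{g'}{g}=\frac{b'}{b}-\frac{c'}{c}=\sum_j\frac{n_j}{X-\beta_j}-\sum_k\frac{r_k}{X-\gamma_k}.$$

Combining both, we obtain

$$\frac{b}{a}=-\frac{f'/f}{g'/g}=-\frac{\sum_i\frac{m_i}{X-\alpha_i}-\sum_k\frac{r_k}{X-\gamma_k}}{\sum_j\frac{n_j}{X-\beta_j}-\sum_k\frac{r_k}{X-\gamma_k}}.$$

Next, we multiply $f'/f$ and $g'/g$ by

$$N_0=\prod_i(X-\alpha_i)\prod_j(X-\beta_j)\prod_k(X-\gamma_k),$$

which has degree $n_0(abc)$ (since $(a,b,c)=1$ and $a+b=c$, no two of these three polynomials share a root). Both $N_0f'/f$ and $N_0g'/g$ are polynomials of degrees at most $n_0(abc)-1$ (this is because $\deg h'=\deg h-1$ for non-constant $h \in K[X]$, while $f$ and $g$ are non-constant (why?); we assume $\operatorname{char} K=0$ for this reason).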

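Next we observe the degrees of $a,b$ and $c$. Since $a+b=c$, we actually have $\deg c \le \max\{\deg a,\deg b\}$. Therefore $\max\{\deg a,\deg b,\deg c\}=\max\{\deg a,\deg b\}$. From the relation

$$\frac{b}{a}=-\frac{N_0f'/f}{N_0g'/g}$$

and the assumption that $(a,b)=1$, one can find a polynomial $h \in K[X]$ such that

$$N_0\frac{g'}{g}=ah,\qquad N_0\frac{f'}{f}=-bh.$$

Taking the degrees of both sides, we see

$$\deg a \le \deg\left(N_0\frac{g'}{g}\right) \le n_0(abc)-1,\qquad \deg b \le \deg\left(N_0\frac{f'}{f}\right) \le n_0(abc)-1.$$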

This proves the theorem. $\square$
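The inequality is easy to test on a concrete triple. The sketch below works over $\mathbb{Q}$ (so characteristic $0$) and computes $n_0(f)$ as $\deg f-\deg\gcd(f,f')$, which counts distinct roots in an algebraic closure. Polynomials are stored as coefficient lists with the constant term first; all function names and the example $X^2+(2X+1)=(X+1)^2$ are choices made here for illustration.

```python
from fractions import Fraction


def trim(f):
    """Copy of f without trailing zero coefficients."""
    f = list(f)
    while f and f[-1] == 0:
        f.pop()
    return f


def degree(f):
    return len(trim(f)) - 1  # the zero polynomial gets degree -1 here


def multiply(f, g):
    if not f or not g:
        return []
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out


def derivative(f):
    return trim([i * c for i, c in enumerate(f)][1:])


def remainder(f, g):
    """Remainder of f modulo a non-zero polynomial g, with exact arithmetic."""
    f = [Fraction(c) for c in trim(f)]
    g = trim(g)
    while len(f) >= len(g):
        factor = f[-1] / g[-1]
        shift = len(f) - len(g)
        for i, c in enumerate(g):
            f[i + shift] -= factor * c
        f = trim(f)  # the leading term cancels exactly
    return f


def gcd(f, g):
    """Euclidean algorithm; the result is a gcd up to a non-zero scalar."""
    f, g = trim(f), trim(g)
    while g:
        f, g = g, remainder(f, g)
    return f


def n0(f):
    """Number of distinct roots of f in an algebraic closure (characteristic 0)."""
    return degree(f) - degree(gcd(f, derivative(f)))


a, b, c = [0, 0, 1], [1, 2], [1, 2, 1]  # X^2 + (2X + 1) = (X + 1)^2
abc = multiply(multiply(a, b), c)
lhs = max(degree(a), degree(b), degree(c))
rhs = n0(abc) - 1
print(lhs, rhs)
```

For this triple both sides agree, so the bound in the theorem cannot be improved in general.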

We present some applications of this theorem.

**Corollary 1 (Fermat’s theorem for polynomials).** Let $a(X),b(X)$ and $c(X)$ be relatively prime polynomials in $K[X]$ such that not all of them are constant, and such that

$$a(X)^n+b(X)^n=c(X)^n.$$

Then $n \le 2$.

Alternatively one can argue the curve $x^n+y^n=1$ on $K(X)$.

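*Proof.* Since $a,b$ and $c$ are relatively prime, $a^n$, $b^n$ and $c^n$ are relatively prime as well. By Mason-Stothers theorem,

$$n\deg a=\deg a^n \le n_0(a^nb^nc^n)-1=n_0(abc)-1 \le \deg a+\deg b+\deg c-1.$$

Replacing $a$ by $b$ and $c$, we see

$$n\deg b \le \deg a+\deg b+\deg c-1,\qquad n\deg c \le \deg a+\deg b+\deg c-1.$$

It follows that

$$n(\deg a+\deg b+\deg c) \le 3(\deg a+\deg b+\deg c)-3.$$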

In this case $n<3$. $\square$

**Corollary 2 (Davenport’s inequality).** Let $f,g \in K[X]$ be non-constant polynomials such that $f^3-g^2 \ne 0$. Then

$$\deg(f^3-g^2) \ge \frac{1}{2}\deg f+1.$$

One could treat separately the cases where $f$ and $g$ are coprime or not, and try to apply the Mason-Stothers theorem in each. Many documents only record the proof of the coprime case, which is a shame, because the case when $f$ and $g$ are not coprime can be a nightmare. Instead, for the sake of accessibility, we offer the elegant proof given by Stothers, starting with a lemma about the degree of the difference of two polynomials.

**Lemma 1.** Suppose $p,q \in K[X]$ are two distinct non-constant polynomials, then

$$\deg(p-q) \ge \deg p-n_0(p)-n_0(q)+1.$$

*Proof.* Let $k(f)$ be the leading coefficient of a polynomial $f$. If $\deg p \ne \deg q$ or $k(p) \ne k(q)$, then $\deg(p-q)\ge \deg p \ge \deg p - n_0(p)-n_0(q)+1$ because $n_0(p) \ge 1$ and $n_0(q) \ge 1$.

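Next suppose $\deg p = \deg q$ and $k(p)=k(q)$. If $(p,q)=1$, then by Mason-Stothers applied to $(p-q)+q=p$,

$$\deg p \le n_0\big(pq(p-q)\big)-1 \le n_0(p)+n_0(q)+\deg(p-q)-1.$$

Otherwise, suppose $(p,q)=r$. Then $p/r$ and $q/r$ are coprime. Again by Mason-Stothers,

$$\deg\frac{p}{r} \le n_0\left(\frac{p}{r}\right)+n_0\left(\frac{q}{r}\right)+\deg\frac{p-q}{r}-1.$$

Therefore

$$\deg p \le n_0\left(\frac{p}{r}\right)+n_0\left(\frac{q}{r}\right)+\deg(p-q)-1.$$

On the other hand,

$$n_0\left(\frac{p}{r}\right) \le n_0(p),\qquad n_0\left(\frac{q}{r}\right) \le n_0(q).$$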

Combining all these inequalities, we obtain what we want. $\square$

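*Proof (of corollary 2).* Put $\deg{f}=m$ and $\deg{g}=n$. If $3m \ne 2n$, then

$$\deg(f^3-g^2)=\max\{3m,2n\} \ge 3m \ge \frac{m}{2}+1$$

because $m \ge 1$. Next we assume that $3m=2n$, or in other words, $m=2r$ and $n=3r$. By lemma 1, we can write

$$\deg(f^3-g^2) \ge \deg f^3-n_0(f^3)-n_0(g^2)+1 \ge 6r-2r-3r+1=r+1=\frac{m}{2}+1.$$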

This proves the inequality. $\square$
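A small computation shows that the bound can be attained. The pair $f=X^2+2$, $g=X^3+3X$ is picked here as an illustration (it is not taken from the text above), and the helpers `multiply` and `subtract` are ad hoc names for integer coefficient lists with the constant term first.

```python
from fractions import Fraction


def multiply(f, g):
    """Product of two non-zero polynomials given as coefficient lists."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out


def subtract(f, g):
    """f - g with trailing zero coefficients removed."""
    n = max(len(f), len(g))
    f = f + [0] * (n - len(f))
    g = g + [0] * (n - len(g))
    out = [x - y for x, y in zip(f, g)]
    while out and out[-1] == 0:
        out.pop()
    return out


f = [2, 0, 1]       # X^2 + 2
g = [0, 3, 0, 1]    # X^3 + 3X
f_cubed = multiply(multiply(f, f), f)
g_squared = multiply(g, g)
difference = subtract(f_cubed, g_squared)  # coefficients of f^3 - g^2
bound = Fraction(len(f) - 1, 2) + 1        # (deg f) / 2 + 1
print(difference, len(difference) - 1, bound)
```

Here $f^3-g^2=3X^2+8$ has degree $2=\frac{1}{2}\deg f+1$, so equality holds for this pair.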

One may also generalise the case to $f^m-g^n$. But we put down some more important remarks. First of all, Mason-Stothers is originally a generalisation of Davenport’s inequality (by Stothers). I personally do not think any mortal can find the original paper of Davenport’s inequality, but in [Shioda 04] there is a reproduced proof using linear algebra (lemma 3.1).

For more geometrical interpretation, one may be interested in [Zannier 95], where Riemann’s existence theorem is also discussed.

In Stothers’s paper [Stothers 81], the author discussed the condition where the equality holds. If you look carefully you will realise his theorem 1.1 is exactly the Mason-Stothers theorem.

- [Davenport 65] H. Davenport, *On $f^3(t)-g^2(t)$*, 1965. (can someone find a digital copy of this paper?)
- [Ma 84] R. C. Mason, *Diophantine Equations over Function Fields*, 1984.
- [Shioda 04] Tetsuji Shioda, *The abc-theorem, Davenport’s inequality and elliptic surfaces*, 2004. (https://www2.rikkyo.ac.jp/web/shioda/papers/esdstadd.pdf)
- [Stothers 81] W. W. Stothers, *Polynomial Identities and Hauptmoduln*, 1981. (https://doi.org/10.1093/qmath/32.3.349)
- [Zannier 95] Umberto Zannier (Venezia), *On Davenport’s bound for the degree of $f^3-g^2$ and Riemann’s Existence Theorem*, 1995. (https://eudml.org/doc/206763)

The **Riemann zeta function** is widely known:

$$\zeta(s)=\sum_{n=1}^{\infty}\frac{1}{n^s}.$$

It is widely known mainly because of the celebrated hypothesis by Riemann that remains unsolved after more than a century’s attempts by mathematicians and 150 million attempts by computers:

**Riemann Hypothesis.** The non-trivial zeros of $\zeta(s)$ lie on the line $\Re(s)=\frac{1}{2}$.

People are told by pop-science how important and mysterious this hypothesis is, or how disastrous it would be if it were solved one day. We can put that aside. A question is, why would Riemann ever think about the zero set of *such* a function? Perhaps something else? According to Riemann, the distribution function of primes

$$\pi(x)=\sum_{p \le x}1$$

may be written as the series

where

and $\rho$ varies over all zeros of $\zeta(s)$. With these being said, once this *hypothesis* is proven true, we may have a much more concrete say of the distribution of prime numbers.

But this is not the topic of this post actually. The author of this post is not trying to prove the Riemann Hypothesis in a few pages, and nobody could. In this post, we investigate the analytic continuation of $\zeta(s)$ step-by-step, so that it will make sense to even think about evaluating the function at $\frac{1}{2}$. For the theory of analytic continuation, I recommend *Real and Complex Analysis* by Walter Rudin, although his book goes into modular functions and Picard’s little theorem rather than $\zeta(s)$ and related functions.

A sketch of our procedure follows. The function $\zeta(s)$ does not ring a bell of power series, although a straightforward observation shows that $\sum_{n=1}^{\infty}\frac{1}{n^{s}}$ represents an analytic function in the half-plane $\Re(s)>1$. We need to develop tools that can easily be utilised into the study of the zeta function. Our two main tools are the Gamma function and Mellin transform.

With these two tools being developed, we will observe the so-called complete zeta function, which will bring us to THE continuation we are looking for.

We will carry out the details of the non-trivial steps more carefully than those of basic complex analysis. The reader may skip our preparation if they are familiar with this content.

The Gamma function should be studied in an analysis course:

$$\Gamma(s)=\int_0^\infty t^{s-1}e^{-t}dt.$$

In an analysis course we have studied some of this function’s important properties:

- $\Gamma(1)=1$.
- $\Gamma(s+1)=s\Gamma(s)$ (as a result $n!=\Gamma(n+1)$).
- $\log\Gamma(s)$ is a convex function.

In this section however, we will study it in the context of complex analysis.

**Theorem 1.** The Gamma function

$$\Gamma(s)=\int_0^\infty t^{s-1}e^{-t}dt$$

is well-defined as an analytic function in the half plane $\Re(s)>0$.

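*Proof.* If we write $s=u+iv$ with $u>0$ and $t=e^c$, then

$$|t^{s-1}|=\left|e^{c(u-1)}e^{icv}\right|=e^{c(u-1)}=t^{u-1}.$$

Therefore

$$|\Gamma(s)| \le \int_0^\infty t^{u-1}e^{-t}dt=\Gamma(u)<\infty.$$

Hence the integral converges absolutely, and the other properties follow. $\square$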

**Theorem 2.** If $\Re(s)>0$, then

$$\Gamma(s+1)=s\Gamma(s),$$

and as a consequence $\Gamma(n+1)=n!$ for $n=0,1,\dots$.

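*Proof.* The second statement follows immediately because $\Gamma(1)=1$. For the first equation, we do an integration by parts:

$$\int_\varepsilon^{1/\varepsilon}\frac{d}{dt}\left(e^{-t}t^s\right)dt=-\int_\varepsilon^{1/\varepsilon}e^{-t}t^sdt+s\int_\varepsilon^{1/\varepsilon}e^{-t}t^{s-1}dt.$$

Taking $\varepsilon \to 0$, the left hand side tends to $0$ and we get what we want. $\square$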

Now we are ready for the analytic continuation for the Gamma function, which builds a bridge to the analytic continuation of $\zeta$.

**Theorem 3.** The function $\Gamma(s)$ defined in theorem 1 admits an analytic continuation to a meromorphic function on the complex plane whose singularities are simple poles at $s=-n$, $n=0,1,\dots$, with corresponding residues $\frac{(-1)^n}{n!}$.

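*Proof.* It suffices to show that we can extend $\Gamma$ to $\Re(s)>-m$, for all integers $m>0$ (hence we can extend it to all of the complex plane). For this reason, we put $\Gamma_0(s)=\Gamma(s)$, which is defined in theorem 1. Then

$$\Gamma_1(s)=\frac{\Gamma_0(s+1)}{s}$$

is THE analytic continuation of $\Gamma_0(s)$ to $\Re(s)>-1$, with the only singularity $s=0$. Then

$$\operatorname{res}_{s=0}\Gamma_1(s)=\Gamma_0(1)=1.$$

Likewise, we can define

$$\Gamma_2(s)=\frac{\Gamma_1(s+1)}{s}=\frac{\Gamma_0(s+2)}{s(s+1)}.$$

Overall, whenever $m \ge 1$ is an integer, we can define

$$\Gamma_m(s)=\frac{\Gamma_0(s+m)}{(s+m-1)(s+m-2)\cdots(s+1)s}.$$

This function is meromorphic in $\Re(s)>-m$ and has simple poles at $s=0,-1,\dots,-m+1$ with residues

$$\operatorname{res}_{s=-n}\Gamma_m(s)=\frac{\Gamma_0(m-n)}{(m-1-n)!\,(-1)(-2)\cdots(-n)}=\frac{(m-n-1)!}{(m-1-n)!\,(-1)^nn!}=\frac{(-1)^n}{n!}.$$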

Successive applications of theorem 2 show that $\Gamma_m(s)=\Gamma(s)$ for $\Re(s)>0$. Therefore we have obtained the analytic continuation through this process. $\square$
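The recipe in the proof can be tried out numerically for real arguments. The sketch below uses `math.gamma` from the Python standard library for the right half line and divides by the product $(s+m-1)\cdots(s+1)s$; the function name `gamma_continued` is made up here, and only real, non-pole values of $s$ are handled.

```python
import math


def gamma_continued(s: float, m: int) -> float:
    """Gamma_m(s) = Gamma(s + m) / ((s + m - 1) ... (s + 1) s) for real s > -m."""
    if s + m <= 0:
        raise ValueError("need s > -m")
    denominator = 1.0
    for j in range(m):
        denominator *= s + j  # vanishes exactly at the poles 0, -1, ..., -m + 1
    return math.gamma(s + m) / denominator


# the continuation agrees with the library value at a negative non-integer
print(gamma_continued(-2.5, 3), math.gamma(-2.5))

# (s + 2) * Gamma(s) near s = -2 approximates the residue (-1)^2 / 2! = 0.5
eps = 1e-6
print(eps * gamma_continued(-2 + eps, 3))
```

The second printed value is only an approximation of the residue, with an error of the order of `eps`.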

Throughout, unless specified, we will call the function obtained in the proof of theorem 3 as THE function $\Gamma$.

For all $s \in \mathbb{C}$, this function satisfies $\Gamma(s+1)=s\Gamma(s)$ as it should be.

Before we proceed, we develop two relationships between the $\Gamma$ function and the $\zeta$ function, in an attempt to convince the reader that we are not doing something for nothing.

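If we perform a change of variable $t=nu$ in the definition of $\Gamma(s)$, we see

$$\Gamma(s)=\int_0^\infty (nu)^{s-1}e^{-nu}n\,du=n^s\int_0^\infty u^{s-1}e^{-nu}du.$$

This is to say,

$$\frac{\Gamma(s)}{n^s}=\int_0^\infty u^{s-1}e^{-nu}du.$$

Taking the sum over all $n \ge 1$, we see, for $\Re(s)>1$,

$$\Gamma(s)\zeta(s)=\int_0^\infty u^{s-1}\sum_{n=1}^{\infty}e^{-nu}du=\int_0^\infty\frac{u^{s-1}}{e^u-1}du.$$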

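This relationship is beautiful, but may make our computation a little bit more complicated. However, if we get our hands dirty earlier, our study will be easier. Thus we will do an “uglier” change of variable $t \mapsto \pi n^2y$ in the integral defining $\Gamma(s/2)$ to obtain

$$\Gamma\left(\frac{s}{2}\right)=\pi^{s/2}n^s\int_0^\infty y^{s/2-1}e^{-\pi n^2y}dy,$$

which implies

$$\pi^{-s/2}\Gamma\left(\frac{s}{2}\right)\zeta(s)=\int_0^\infty y^{s/2-1}\sum_{n=1}^{\infty}e^{-\pi n^2y}dy.$$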

In either case, it is legal to change the order of summation and integration, because of the monotone convergence theorem.
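The first relationship, $\Gamma(s)\zeta(s)=\int_0^\infty u^{s-1}/(e^u-1)\,du$, can be sanity-checked numerically at even integers, where $\zeta(2)=\pi^2/6$ and $\zeta(4)=\pi^4/90$ are classical. The sketch below uses a plain midpoint rule on a truncated interval; the function name `mellin_integral`, the cut-off `upper` and the number of steps are arbitrary choices made here.

```python
import math


def mellin_integral(s: float, upper: float = 60.0, steps: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of u^(s-1) / (e^u - 1) over (0, upper)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h  # midpoints avoid the endpoint u = 0
        total += u ** (s - 1) / math.expm1(u)
    return total * h


# Gamma(2) * zeta(2) = pi^2 / 6 and Gamma(4) * zeta(4) = 6 * pi^4 / 90
print(mellin_integral(2), math.pi ** 2 / 6)
print(mellin_integral(4), math.pi ** 4 / 15)
```

The tail beyond `upper` is of size roughly $e^{-60}$ and is ignored.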

Before we proceed, we need some more properties of the Gamma function.

**Theorem 3 (Euler’s reflection formula).** For all $s \in \mathbb{C}$,

$$\Gamma(s)\Gamma(1-s)=\frac{\pi}{\sin\pi s}.$$

Observe that this identity makes sense at all poles: $\Gamma(s)$ has simple poles at $0,-1,\dots$, while $\Gamma(1-s)$ has simple poles at $1,2,\dots$. As a result, $\Gamma(s)\Gamma(1-s)$ has simple poles at all integers, a property which is shared by $\pi/\sin\pi{s}$.

By analytic continuation, it suffices to prove it for $0<s<1$ because it will be extended to all of $\mathbb{C}$.

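*Proof (real version).* First of all, observe that

$$\Gamma(s)\Gamma(1-s)=\int_0^1t^{s-1}(1-t)^{-s}dt,$$

which is the Beta integral $B(s,1-s)$, since $\Gamma(1)=1$. On the other hand, we have

$$\int_0^1t^{s-1}(1-t)^{-s}dt=\int_0^\infty\frac{y^{-s}}{1+y}dy$$

by taking $t=\frac{1}{1+y}$. Next we compute this integral for both $(0,1]$ and $[1,\infty)$.

$$\int_0^1\frac{y^{-s}}{1+y}dy=\int_0^1\sum_{n=0}^{\infty}(-1)^ny^{n-s}dy=\sum_{n=0}^{\infty}\frac{(-1)^n}{n+1-s}.$$

(One shall be disturbed by our exchange of infinite sum and integration due to his or her study in analysis, but will be relaxed after being informed about Arzelà’s dominated convergence theorem of Riemann integrals.)

On the other hand, taking $y=\frac{1}{u}$, we see

$$\int_1^\infty\frac{y^{-s}}{1+y}dy=\int_0^1\frac{u^{s-1}}{1+u}du=\sum_{n=0}^{\infty}\frac{(-1)^n}{n+s}.$$

Summing up, one has

$$\Gamma(s)\Gamma(1-s)=\sum_{n=0}^{\infty}\frac{(-1)^n}{n+s}+\sum_{n=0}^{\infty}\frac{(-1)^n}{n+1-s}=\frac{1}{s}+\sum_{n=1}^{\infty}(-1)^n\frac{2s}{s^2-n^2}.$$

It remains to show that $\pi\csc{\pi{s}}$ satisfies such an expansion as well, which is not straightforward because neither Fourier series nor Taylor series can drive us there directly. One can start with the infinite product expansion of $\sin{x}$ but here we follow an alternative approach. Notice that for $\alpha \in \mathbb{R} \setminus \mathbb{Z}$,

$$\cos\alpha t=\frac{\sin\pi\alpha}{\pi}\left(\frac{1}{\alpha}+\sum_{n=1}^{\infty}(-1)^n\frac{2\alpha}{\alpha^2-n^2}\cos nt\right),\qquad t \in [-\pi,\pi].$$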

Taking $t=0$ and multiplying both sides by $\pi\csc\pi\alpha$, we obtain what we want. $\square$

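*Proof (complex version).* By definition,

$$\Gamma(1-s)\Gamma(s)=\int_0^\infty e^{-t}t^{-s}\Gamma(s)dt=\int_0^\infty e^{-t}t^{-s}\left(t\int_0^\infty e^{-tu}(tu)^{s-1}du\right)dt=\int_0^\infty\int_0^\infty e^{-t(1+u)}u^{s-1}dt\,du=\int_0^\infty\frac{u^{s-1}}{1+u}du.$$

Here we performed a change of variable $v=tu$ in $\Gamma(s)=\int_0^\infty e^{-v}v^{s-1}dv$. To compute the last integral, we put $u=e^x$, and it follows that

$$\int_0^\infty\frac{u^{s-1}}{1+u}du=\int_{-\infty}^{\infty}\frac{e^{sx}}{1+e^x}dx.$$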

The integral on the right hand side can be computed to be $\frac{\pi}{\sin(1-s)\pi}=\frac{\pi}{\sin\pi s}$. This is an easy consequence of the residue formula (by considering a rectangle with centre $z=\pi i$, height $2\pi$ and one side being the real axis). $\square$

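In particular, by putting $s=1/2$, we obtain

$$\Gamma\left(\frac{1}{2}\right)=\sqrt{\pi}.$$

As a bonus of this, by putting $t=u^2$, we also see

$$\sqrt{\pi}=\Gamma\left(\frac{1}{2}\right)=\int_0^\infty t^{-1/2}e^{-t}dt=2\int_0^\infty e^{-u^2}du.$$

Therefore

$$\int_{-\infty}^{\infty}e^{-u^2}du=\sqrt{\pi}.$$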

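To conclude this section, we mention the following.

**Theorem 4 (Legendre duplication formula).** Wherever both sides are defined,

$$\Gamma(s)\Gamma\left(s+\frac{1}{2}\right)=2^{1-2s}\sqrt{\pi}\,\Gamma(2s).$$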

One can find a proof here.
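The reflection formula, the value $\Gamma(1/2)=\sqrt{\pi}$ and the duplication formula can all be spot-checked for real arguments with `math.gamma` from the Python standard library. This is a numerical sketch only; the function names `reflection_gap` and `duplication_gap` are invented for this check.

```python
import math


def reflection_gap(s: float) -> float:
    """Gamma(s) * Gamma(1 - s) minus pi / sin(pi * s), for real non-integer s."""
    return math.gamma(s) * math.gamma(1 - s) - math.pi / math.sin(math.pi * s)


def duplication_gap(s: float) -> float:
    """Gamma(s) * Gamma(s + 1/2) minus 2^(1 - 2s) * sqrt(pi) * Gamma(2s), for real s > 0."""
    left = math.gamma(s) * math.gamma(s + 0.5)
    right = 2 ** (1 - 2 * s) * math.sqrt(math.pi) * math.gamma(2 * s)
    return left - right


for s in (0.3, 0.75, -1.5):
    print(s, reflection_gap(s))
for s in (0.3, 1.7):
    print(s, duplication_gap(s))
print(math.gamma(0.5) ** 2 - math.pi)
```

All printed gaps should be at the level of floating point rounding.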

Put $Z(s)=\pi^{-s/2}\Gamma(s/2)\zeta(s)$. It looks like we are pretty close to a great property of $\zeta(s)$, if we can figure out $Z$ a little bit more, because $\pi^{-s/2}$ and $\Gamma(s/2)$ behave nicely. Therefore we introduce the Jacobi theta function

$$\theta(s)=\sum_{n=-\infty}^{\infty}e^{-\pi n^2s}$$

and try to deduce its relation with $Z(s)$.

To begin with, we first show the following.

**Proposition 1.** The theta function is holomorphic on the right half plane.

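*Proof.* Let $C$ be a compact subset of the right half plane, and put $y_0=\inf_{s \in C}\Re(s)>0$. Pick any integer $n_0\ge \frac{1}{y_0}$. For $s=u+iv \in C$, we have $u \ge y_0$ and therefore

$$\sum_{|n| \ge n_0}\left|e^{-\pi n^2s}\right|=\sum_{|n| \ge n_0}e^{-\pi n^2u} \le \sum_{|n| \ge n_0}e^{-\pi n^2y_0} \le \sum_{|n| \ge n_0}e^{-\pi|n|}<\infty.$$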

Therefore $\theta(s)$ converges absolutely and uniformly on any compact subset of the right half plane. (Note we have used the fact that $n^2y_0 \ge |n|n_0y_0 \ge |n|$ when we are studying the convergence.) Since each term is holomorphic, we have shown that $\theta(s)$ itself is holomorphic. $\square$

Therefore it is safe to work with the theta function. Now we are ready to deduce a functional equation.

Theorem 4. The theta function satisfies the functional equation on $\{\Re(s)>0\}$:

$$
\theta(s)=\frac{1}{\sqrt{s}}\theta\left(\frac{1}{s}\right).
$$

The square root is chosen to be in the branch with positive real part.

*Proof.* Consider the function $f(x)=e^{-\pi x^2}$. We know that this is a fixed point of the Fourier transform (in this convenient form)

Now we put $g(x)=e^{-\pi u x^2}=f(\sqrt{u}x)$. The Fourier transform of $g$ is easy to deduce:

Since $g(x)$ is a Schwartz function, by Poisson summation formula, we have

By extending with analytic continuation, we are done. $\square$
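The functional equation can be sanity-checked numerically on the positive real axis. The following is a minimal sketch, assuming the theta function is $\theta(t)=\sum_{n\in\mathbb{Z}}e^{-\pi n^2 t}$ and the functional equation reads $\theta(t)=t^{-1/2}\theta(1/t)$; the helper name `theta` and the truncation at 60 terms are illustrative choices, not part of the original argument.

```python
import math

def theta(t, terms=60):
    """Partial sum of sum_{n in Z} exp(-pi * n^2 * t) for real t > 0."""
    tail = sum(math.exp(-math.pi * n * n * t) for n in range(1, terms + 1))
    return 1.0 + 2.0 * tail  # the n = 0 term plus the symmetric pairs +-n

for t in (0.1, 0.5, 2.0, 7.0):
    lhs = theta(t)
    rhs = theta(1.0 / t) / math.sqrt(t)
    print(f"t={t}: theta(t)={lhs:.12f}  theta(1/t)/sqrt(t)={rhs:.12f}")
```

The two columns should agree up to floating-point error, which is only a numerical illustration and of course not a proof.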

For Schwartz functions, also known as rapidly decreasing functions, we refer the reader to chapter 7 of W. Rudin’s *Functional Analysis*.

Next we will study the behaviour of $\theta(s)$ on the positive real half-line, especially near the origin and at infinity. By the functional equation above, once we have a better view around the origin, we can quickly know what will happen at infinity.

Proposition 2. When the real number $t \to 0$, the theta function is equivalent to $\frac{1}{\sqrt{t}}$. More precisely, when $t$ is small enough, the following inequality holds:

*Proof.* Rewrite $\theta(t)$ in the form

Therefore

Pick $t>0$ small enough so that

It follows that

$\square$

As a result, we also know how $\theta(t)$ behaves at infinity. To be precise, we have the following corollary.

Corollary 1. The limit of $\theta(t)$ at infinity is $1$ in the following sense: when $t$ is big enough,

*Proof.* Let $t$ be big enough such that $\frac{1}{t}$ is small enough. That is,

according to proposition 2. The result follows. $\square$

To begin with, we introduce the Mellin transform. In a manner of speaking, this transform can actually be understood as the multiplicative version of the two-sided Laplace transform.

Definition. Given a function $f:\mathbb{R}_+ \to \mathbb{C}$, the Mellin transform of $f$ is defined to be

$$
\mathcal{M}f(s)=\int_0^\infty f(x)x^{s-1}\,dx,
$$

provided that the limit exists.

For example, $\Gamma(s)$ is the Mellin transform of $e^{-x}$. Moreover, for the two-sided Laplace transform

we actually have

where $\tilde{f}(x)=f(e^{-x})$.
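The relation between the two transforms can be illustrated numerically. The sketch below assumes the Mellin transform is $\int_0^\infty f(x)x^{s-1}\,dx$ and uses the substitution $x=e^{-u}$, which turns it into the two-sided integral $\int_{-\infty}^{\infty}e^{-us}f(e^{-u})\,du$; with $f(x)=e^{-x}$ the result should be $\Gamma(s)$. The function name, the integration window and the step size are illustrative assumptions.

```python
import math

def mellin_of_decay(s, lo=-6.0, hi=40.0, step=1e-3):
    """Trapezoidal estimate of the integral of exp(-u*s) * exp(-exp(-u)) over [lo, hi].

    This is the Mellin transform of f(x) = exp(-x) after substituting x = exp(-u).
    The window [lo, hi] is wide enough for s >= 1; the integrand is negligible outside.
    """
    n = int(round((hi - lo) / step))
    total = 0.0
    for k in range(n + 1):
        u = lo + k * step
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * math.exp(-u * s - math.exp(-u))
    return total * step

for s in (1.5, 2.5, 4.0):
    print(s, mellin_of_decay(s), math.gamma(s))
```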

Our goal is to recover $Z(s)$ through the Mellin transform of $\theta(x)$. As we have proved earlier,

It seems we can get our result really quick by studying $\frac{1}{2}(\theta(s)-1)$. However we see $\theta(x)$ goes to $\frac{1}{\sqrt{x}}$ rapidly as $x \to 0$, and goes to $1$ rapidly as $x \to \infty$. Convergence has to be taken care of. Therefore we add error correction terms. For this reason, we study the function

We use $s/2$ in place of $s$ because we do not want $\zeta$ to be evaluated at $2s$ all the time.

The partition $(0,1) \cup (1,\infty)$ immediately inspires one to use the change-of-variable $y=\frac{1}{x}$. As a result,

Now we are ready to compute $\phi(s)$. For the first part,

On the other hand,

Therefore

Therefore

In particular,

Expanding this equation above, we see

This gives

Finally we try to simplify the quotient above. By Legendre’s duplication formula,

By Euler’s reflection formula,

Combining these two equations, we obtain

Proposition 3. The Riemann zeta function $\zeta(s)$ admits an analytic continuation satisfying the functional equation

In particular, since we also have

it is immediate that $\zeta(s)$ admits a simple pole at $s=1$ with residue $1$. Another concern is $s=0$. Nevertheless, since we have

there is no pole at $s=0$ (notice that $\phi(s)$ is entire). We now know a little bit more about the analyticity of $\zeta(s)$.

Corollary 2. The Riemann zeta function $\zeta(s)$ has its analytic continuation defined on $\mathbb{C} \setminus \{1\}$, with a simple pole at $s=1$ with residue $1$.

Now we are safe to compute $\zeta(-1)$.
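As a sketch of this computation, assume the functional equation of Proposition 3 is written in the standard form $\zeta(s)=2^s\pi^{s-1}\sin\left(\frac{\pi s}{2}\right)\Gamma(1-s)\zeta(1-s)$ (the display in the proposition may be normalised differently). Putting $s=-1$ and using $\Gamma(2)=1$ together with Euler's value $\zeta(2)=\frac{\pi^2}{6}$, we get

$$
\zeta(-1)=2^{-1}\pi^{-2}\sin\left(-\frac{\pi}{2}\right)\Gamma(2)\zeta(2)=\frac{1}{2}\cdot\frac{1}{\pi^2}\cdot(-1)\cdot 1\cdot\frac{\pi^2}{6}=-\frac{1}{12}.
$$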

But I believe that, after this long computation of the analytic continuation, we can be confident enough to say that, when $\Re(s) \le 1$, the Riemann zeta function $\zeta(s)$ cannot be explained by its ordinary definition $\sum_{n=1}^{\infty}n^{-s}$, not even remotely. Claiming $1+2+\dots=-\frac{1}{12}$ is a ridiculous abuse of language.

This post ends with Greg Gbur’s criticism of the infamous Numberphile video.

So why is this important? Part of what I’ve tried to show on this blog is that mathematics and physics can be extremely non-intuitive, even bizarre, but that they have their own rules and logic that make perfect sense once you get familiar with them. The original video, in my opinion, acts more like a magic trick than an explanation: it shows a peculiar, non-intuitive result and tries to pass it off as absolute truth without qualification. Making science and math look like incomprehensible magic does not do any favors for the scientists who study it nor for the public who would like to understand it.

Let $K$ be a field (in this post we mostly assume that $K \supset \mathbb{Q}$) and $n$ an integer $>1$ which is not divisible by the characteristic of $K$. Then the polynomial

$$
X^n-1
$$

is separable because its derivative $nX^{n-1}$ is non-zero and vanishes only at $0$, which is not a root of the polynomial. Hence in the algebraic closure $\overline{K}$, the polynomial has $n$ distinct roots, which form a group $U$, and this group is cyclic. In fact, as an exercise, one can show that, for a field $k$, any finite subgroup $U$ of the multiplicative group $k^\ast$ is a cyclic group.

A generator $\zeta_n$ of $U$ is called a primitive $n$-th root of unity. Let $K=\mathbb{Q}$ and $L$ be the smallest extension that contains all elements of $U$, then we have $L=\mathbb{Q}(\zeta_n)$. As a matter of fact, $L/K$ is a Galois extension (to be shown later), and the cyclotomic polynomial $\Phi_n(X)$ is the irreducible polynomial of $\zeta_n$ over $\mathbb{Q}$. We first need to find the degree $[L:K]$.

Proposition 1. With the notation above, $L/K$ is Galois, the Galois group $\operatorname{Gal}(L/K) \cong (\mathbb{Z}/n\mathbb{Z})^\ast$ (the group of units in $\mathbb{Z}/n\mathbb{Z}$) and $[L:K]=\varphi(n)$.

Let’s first elaborate on the fact that $|(\mathbb{Z}/n\mathbb{Z})^\ast|=\varphi(n)$. Let $[0],[1],\dots,[n-1]$ be representatives of $\mathbb{Z}/n\mathbb{Z}$. An element $[x]$ in $\mathbb{Z}/n\mathbb{Z}$ is a unit if and only if there exists $[y]$ such that $[xy]=[1]$, which is to say, $xy \equiv 1 \mod n$. Notice that $xy \equiv 1 \mod n$ if and only if $xy+mn=1$ for some $y,m \in \mathbb{Z}$, if and only if $\gcd(x,n)=1$. Therefore $|(\mathbb{Z}/n\mathbb{Z})^\ast|=\varphi(n)$ is proved.
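The equivalence between being a unit and being coprime to $n$ is easy to check by brute force for small $n$. This is a minimal sketch; the function names `units` and `phi` are illustrative and the search is deliberately naive.

```python
from math import gcd

def units(n):
    """Residues x in [0, n) that have a multiplicative inverse modulo n (brute force)."""
    return [x for x in range(n) if any((x * y) % n == 1 for y in range(n))]

def phi(n):
    """Euler's totient: the number of 1 <= x <= n with gcd(x, n) = 1."""
    return sum(1 for x in range(1, n + 1) if gcd(x, n) == 1)

for n in (2, 9, 12, 30):
    u = units(n)
    # The units are exactly the residues coprime to n, so there are phi(n) of them.
    print(n, u, len(u) == phi(n))
```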

The proof rests on two lemmas, the first of which is independent of the characteristic of the field.

Lemma 1. Let $k$ be a field and $n$ be not divisible by the characteristic $p$. Let $\zeta=\zeta_n$ be a primitive $n$-th root of unity in $\overline{k}$, then $(\mathbb{Z}/n\mathbb{Z})^\ast \supset \operatorname{Gal}(k(\zeta)/k)$ and therefore $[k(\zeta):k] \le \varphi(n)$. Besides, $k(\zeta)/k$ is a normal abelian extension.

*Proof.* Let $\sigma$ be an embedding of $k(\zeta)$ in $\overline{k}$ over $k$, then

$$
(\sigma\zeta)^n=\sigma(\zeta^n)=1
$$

so that $\sigma\zeta$ is also an $n$-th root of unity. Hence $\sigma\zeta=\zeta^i$ for some $i=i(\sigma)$, uniquely determined modulo $n$. It follows that $\sigma$ maps $k(\zeta)$ into itself. This is to say, $k(\zeta)$ is normal over $k$. Let $\tau$ be another automorphism of $k(\zeta)$ over $k$, then

$$
\sigma\tau\zeta=\zeta^{i(\sigma)i(\tau)}=\tau\sigma\zeta.
$$

It follows that $i(\sigma)$ and $i(\tau)$ are prime to $n$ (otherwise, $\sigma\zeta$ would have a period smaller than $n$, implying that the period of $\zeta$ is smaller than $n$, which is absurd). Therefore $\sigma \mapsto i(\sigma)$ is an injective homomorphism of $\operatorname{Gal}(k(\zeta)/k)$ into the abelian group $(\mathbb{Z}/n\mathbb{Z})^\ast$, thus proving our lemma. $\square$

It is easy to find an example with strict inclusion. One only needs to look at $k=\mathbb{R}$ or $\mathbb{C}$.

Lemma 2. Let $\zeta=\zeta_n$ be a primitive $n$-th root of unity and let $f$ be its irreducible polynomial over $\mathbb{Q}$. Then for any prime $p \nmid n$, $\zeta^p$ is also a root of $f$ (and it is again a primitive $n$-th root of unity).

*Proof.* Let $f(X)$ be the irreducible polynomial of $\zeta$ over $\mathbb{Q}$, then $f(X)|(X^n-1)$ by definition. As a result we can write $X^n-1=f(X)h(X)$ where $h(X)$ has leading coefficient $1$. By Gauss’s lemma, both $f$ and $h$ have integral coefficients.

Suppose $\zeta^p$ is not a root of $f$. Since $(\zeta^p)^n-1=(\zeta^n)^p-1=0$, it follows that $\zeta^p$ is a root of $h$, and $\zeta$ is a root of $h(X^p)$. As a result, $f(X)$ divides $h(X^p)$ and we write

$$
h(X^p)=f(X)g(X).
$$

Again by Gauss’s lemma, $g(X)$ has integral coefficients.

Next we reduce these equations in $\mathbf{F}_p=\mathbb{Z}/p\mathbb{Z}$. We firstly have

By Fermat’s little theorem $a^p=a$ for all $a \in \mathbf{F}_p$, we also have

Therefore

which implies that $\overline{f}(X)|\overline{h}(X)^p$. Hence $\overline{f}$ and $\overline{h}$ must have a common factor. As a result, $X^n-\overline{1}=\overline{f}(X)\overline{h}(X)$ has multiple roots in the algebraic closure of $\mathbf{F}_p$, which is impossible because $p \nmid n$ makes $X^n-\overline{1}$ separable. $\square$

Now we are ready for Proposition 1.

*Proof of Proposition 1.* Since $\mathbb{Q}$ is a perfect field, $\mathbb{Q}(\zeta)/\mathbb{Q}$ is automatically separable. This extension is Galois because of lemma 1. By lemma 1, it suffices to show that $[\mathbb{Q}(\zeta):\mathbb{Q}] \ge \varphi(n)$.

Recall from elementary group theory that, if $G$ is a finite cyclic group of order $m$ and $x$ is a generator of $G$, then the set of generators consists of the elements of the form $x^\nu$ where $\gcd(\nu,m)=1$. On this occasion, if $\zeta$ generates $U$, then $\zeta^p$ also generates $U$ because $p \nmid n$. It follows that every primitive $n$-th root of unity can be obtained by raising $\zeta$ to a succession of prime numbers that do not divide $n$ (as a result we obtain exactly $\varphi(n)$ such primitive roots). By lemma 2, all these numbers are roots of $f$ in the proof of lemma 2. Therefore $\deg f = [L:K] \ge \varphi(n)$. Hence the proposition is proved. $\square$

We will show that $f$ in the proof of lemma 2 is actually the cyclotomic polynomial $\Phi_n(X)$ we are looking for. The following procedure works for all fields where the characteristic does not divide $n$, but we assume characteristic to be $0$ for simplicity.

We have

where the product is taken over all $n$-th roots of unity. Collecting all roots with the same period $d$ (i.e., those $\zeta$ whose order in $U$ is exactly $d$), we put

Then

It follows that $\Phi_1(X)=X-1$ and

This presentation makes our computation much easier. But to understand $\Phi_n$, we still should keep in mind that the $n$-th cyclotomic polynomial is defined to be

whose roots are all primitive $n$-th roots of unity. As stated in the proof of proposition 1, there are $\varphi(n)$ primitive $n$-th roots of unity, and therefore $\deg\Phi_n(X)=\varphi(n)$. Besides, $f|\Phi_n$. Since both have the same degree, these two polynomials are equal. It also follows that $\sum_{d|n}\varphi(d)=n$.
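The recursive presentation lends itself to direct computation. Below is a minimal sketch, assuming the identity $X^n-1=\prod_{d|n}\Phi_d(X)$: it obtains $\Phi_n$ by dividing $X^n-1$ by $\Phi_d$ for every proper divisor $d$ of $n$. Polynomials are stored as coefficient tuples with the lowest degree first; the helper names are illustrative.

```python
from functools import lru_cache

def poly_divide_exact(num, den):
    """Quotient of integer polynomials (lowest degree first); den must be monic and divide num."""
    num = list(num)
    quot = [0] * (len(num) - len(den) + 1)
    for i in range(len(quot) - 1, -1, -1):
        quot[i] = num[i + len(den) - 1]        # current leading coefficient
        for j, c in enumerate(den):
            num[i + j] -= quot[i] * c          # subtract quot[i] * X^i * den
    assert not any(num), "division left a remainder"
    return quot

@lru_cache(maxsize=None)
def cyclotomic(n):
    """Coefficients of Phi_n, computed from X^n - 1 = prod_{d | n} Phi_d(X)."""
    poly = [-1] + [0] * (n - 1) + [1]          # X^n - 1
    for d in range(1, n):
        if n % d == 0:
            poly = poly_divide_exact(poly, list(cyclotomic(d)))
    return tuple(poly)

for n in (1, 2, 7, 12, 30):
    print(n, cyclotomic(n))
```

For instance, `cyclotomic(12)` gives the coefficients of $X^4-X^2+1$, and the degrees of $\Phi_d$ over all $d \mid n$ add up to $n$.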

Proposition 2. The cyclotomic polynomial is irreducible and is the irreducible polynomial of $\zeta$ over $\mathbb{Q}$, where $\zeta$ is a primitive $n$-th root of unity.

We end this section by a problem in number fields, making use of what we have studied above.

Problem 0. A number field $F$ contains only finitely many roots of unity.

*Solution.* Let $\zeta \in F$ be a root of unity with period $n$. Then $\Phi_n(\zeta)=0$ and therefore $[\mathbb{Q}(\zeta):\mathbb{Q}]=\varphi(n)$. Since $\mathbb{Q}(\zeta)$ is a subfield of $F$, we also have $\varphi(n) \le [F:\mathbb{Q}]$. Since $\{n:\varphi(n) \le [F:\mathbb{Q}]\}$ is a finite set (because $\varphi(n) \to \infty$ as $n \to \infty$), the number of roots of unity lying in $F$ is finite. $\square$
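To make the finiteness concrete, one can list every $n$ with $\varphi(n)$ at most a given degree. The sketch below relies on the elementary bound $\varphi(n) \ge \sqrt{n/2}$ (not proved in this post), so that $\varphi(n) \le D$ forces $n \le 2D^2$; the function names are illustrative.

```python
from math import gcd

def phi(n):
    """Euler's totient by direct counting."""
    return sum(1 for x in range(1, n + 1) if gcd(x, n) == 1)

def possible_orders(degree):
    """All n with phi(n) <= degree; the search stops at 2 * degree^2 by phi(n) >= sqrt(n/2)."""
    return [n for n in range(1, 2 * degree * degree + 1) if phi(n) <= degree]

# Orders of roots of unity that can occur in a number field of degree 2 or 4.
print(possible_orders(2))
print(possible_orders(4))
```

Only the orders in these lists can occur in a field of that degree; whether a given order does occur depends on the field.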

We will do some dirty computation in this section.

Problem 1. If $p$ is prime, then $\Phi_p(X)=X^{p-1}+X^{p-2}+\dots+1$, and for an integer $\nu \ge 1$, $\Phi_{p^\nu}(X)=\Phi_p(X^{p^{\nu-1}})$.

*Solution.* The only divisors of $p$ are $1$ and $p$, so $X^p-1=\Phi_1(X)\Phi_p(X)$ and we can only have

$$
\Phi_p(X)=\frac{X^p-1}{X-1}=X^{p-1}+X^{p-2}+\dots+1.
$$

For the second statement, we use induction on $\nu$. When $\nu=1$ we have nothing to prove. Suppose now

is proved, then $X^{p^\nu}-1=\prod_{r=0}^{\nu}\Phi_{p^r}(X)$ and therefore

Problem 2. Let $p$ be a prime number. If $p \nmid n$, then

$$
\Phi_{pn}(X)=\frac{\Phi_n(X^p)}{\Phi_n(X)}.
$$

*Solution.* Assume $p \nmid n$ first. It holds clearly for $n=1$. Suppose now the statement holds for all integers $<n$ that are prime to $p$. We see

Problem 3. If $n$ is an odd number $>1$, then $\Phi_{2n}(X)=\Phi_n(-X)$.

*Solution.* By problem 2, $\Phi_{2n}(X)=\Phi_n(X^2)/\Phi_n(X)$. To show the identity it suffices to show that

For $n=3$ we see

Now suppose it holds for all odd numbers $3 \le d < n$, then

The following problem would not be very easy without the Möbius inversion formula so we will use it anyway. Problems above can also be deduced from this formula. Let $f:\mathbb{Z}_{\ge 0} \to \mathbb{Z}_{\ge 0}$ be a function and $F(n)=\prod_{d|n}f(d)$, then the Möbius inversion formula states that

with

Putting $f(d)=\Phi_d(X)$, we see

Now we proceed.

Problem 4. If $p|n$, then $\Phi_{pn}(X)=\Phi_n(X^p)$.

*Solution.* By the Möbius inversion formula, we see

because all $d$ that divides $np$ but not $n$ must be divisible by $p^2$. Problem 2 can also follow from here.

Problem 5. Let $n=p_1^{r_1}\dots p_s^{r_s}$, then

*Solution.* This problem can be solved by induction on the number of primes. For $s=1$ it is problem 1. Suppose it has been proved for $s-1$ primes, then for

and a prime $p_s$, we have

On the other hand,

if we put $Y=X^{p_1^{r_1-1}\dots p_{s-1}^{r_{s-1}-1}}$. When it comes to higher degree of $p_s$, it’s merely problem 2. Therefore we have shown what we want.

Let $\zeta$ be a primitive $n$-th root of unity, put $K=\mathbb{Q}(\zeta)$ and let $G$ be the Galois group. We will compute the norm of $1-\zeta$ with respect to the extension $K/\mathbb{Q}$. Since this extension is separable, we have

Since $G$ acts on the set of primitive roots transitively, $\{\sigma\zeta\}_{\sigma \in G}$ is exactly the set of primitive roots of unity, which are roots of $\Phi_n(X)$. It follows that

If $n=p^r$, then $N_\mathbb{Q}^K(1-\zeta)=\Phi_p(1^{p^{r-1}})=\Phi_p(1)=p$. On the other hand, if

then

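The norm computation above can be sanity-checked numerically. This is a minimal sketch, assuming the norm of $1-\zeta$ equals the product of $1-\zeta^k$ over all $k$ coprime to $n$ (that is, $\Phi_n(1)$); it uses floating-point complex arithmetic, so the results are only approximate, and the function name is illustrative.

```python
import cmath
from math import gcd

def norm_one_minus_zeta(n):
    """Product of (1 - zeta^k) over 1 <= k <= n with gcd(k, n) = 1, zeta = exp(2*pi*i/n)."""
    prod = 1 + 0j
    for k in range(1, n + 1):
        if gcd(k, n) == 1:
            prod *= 1 - cmath.exp(2j * cmath.pi * k / n)
    return prod

for n in (7, 8, 9, 12, 30):
    print(n, round(norm_one_minus_zeta(n).real, 9))
```

For prime powers $n=p^r$ the printed value is close to $p$, in agreement with the case $n=p^r$ treated above.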

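Definition. For a polynomial

$$
f(t_1,\dots,t_n)=\sum_{j}a_jt^j
$$

with coefficients in a number field $K$, the **height** of $f$ is defined to be

$$
h(f)=\sum_{v \in M_K}\log|f|_v,
$$

where $|f|_v=\max_j|a_j|_v$ is the **Gauss norm** for any place $v$.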

Here, $M_K$ refers to the canonical set of non-equivalent places on $K$. See first four pages of this document for a reference.

As one can expect, this can tell us about some complexity of a polynomial, just like how the height of an algebraic number tells us its complexity. Let us compute some examples.

Let us consider the simplest one

$$
f(x)=x^2-1
$$

first. Since $|x^2-1|_v=1$ for all places $v$, the height of $f$ is a sum of $0$, which is still $0$.

Next, we take care of a polynomial that involves prime numbers

We see $|g(x)|_\infty=2$, $|g(x)|_2=2^{-(-2)}=4$, $|g(x)|_3=3^{-(-1)}=3$, and the Gauss norm is $1$ for all other primes. Therefore

$$
h(g)=\log 2+\log 4+\log 3=\log 24.
$$

Put $u(x,y)=\sqrt{2}x^2 + 3\sqrt{2}xy+5y^2+7 \in \mathbb{Q}(\sqrt{2})[x,y]$, we can compute its height carefully. Notice that $|\sqrt{2}|_v=\sqrt{|2|_v}$ for all places $v$ and we therefore have

If $f \in K[s_1,\dots,s_n]$ and $g \in K[t_1,\dots,t_m]$ are two polynomials in different variables, then as a polynomial in $K[s_1,\dots,s_n;t_1,\dots,t_m]$, $fg$ has height $h(f)+h(g)$. This is immediately realised once we notice that the height of a polynomial is equal to the height of the vector of coefficients in appropriate projective space. The identity $h(fg)=h(f)+h(g)$ follows from the Segre embedding.

But if variables coincide, things get different. For example, $h(x+1)=0$ but $h((x+1)^2)=h(x^2+2x+1)=\log 2$. This is because we do not have $|fg|_\infty=|f|_\infty|g|_\infty$. Nevertheless, for non-Archimedean places, things are easier.
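The examples above can be reproduced with a few lines of code. The following is a sketch for $K=\mathbb{Q}$ only, assuming the height is $h(f)=\sum_v\log\max_j|a_j|_v$ with the usual normalisation $|x|_p=p^{-v_p(x)}$; the function names are illustrative, and the last coefficient list is a hypothetical polynomial chosen to have the same Gauss norms as $g$.

```python
from fractions import Fraction
from math import log

def vp(x, p):
    """p-adic valuation of a nonzero rational number x."""
    x = Fraction(x)
    v, num, den = 0, x.numerator, x.denominator
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v

def prime_factors(n):
    """Set of primes dividing the nonzero integer n (trial division)."""
    n, out, p = abs(n), set(), 2
    while p * p <= n:
        while n % p == 0:
            out.add(p)
            n //= p
        p += 1
    if n > 1:
        out.add(n)
    return out

def height(coeffs):
    """Sum over all places of Q of the log of the Gauss norm of a nonzero polynomial."""
    coeffs = [Fraction(c) for c in coeffs if c != 0]
    total = log(max(abs(c) for c in coeffs))            # Archimedean place
    primes = set()
    for c in coeffs:
        primes |= prime_factors(c.numerator) | prime_factors(c.denominator)
    for p in primes:                                    # other primes contribute log 1 = 0
        total += -min(vp(c, p) for c in coeffs) * log(p)
    return total

print(height([-1, 0, 1]))                               # x^2 - 1
print(height([1, 2, 1]), log(2))                        # (x + 1)^2
print(height([2, Fraction(1, 4), Fraction(1, 3)]), log(24))
```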

Gauss’s lemma. If $v$ is not Archimedean, then $|fg|_v=|f|_v|g|_v$.

*Proof.* First of all, it suffices to prove it for univariable cases. If $f$ and $g$ have multiple variables $x_1,\dots,x_n$, let $d$ be an integer greater than the degree of $fg$. Then the Kronecker substitution

$$
x_j=t^{d^{j-1}}
$$

reduces our study into $K[t]$. This is because, with such a $d$, this substitution gives a univariable polynomial with the same set of coefficients.

Therefore we only need to show that $|f(t)g(t)|_v=|f(t)|_v|g(t)|_v$. Without loss of generality we assume that $|f(t)|_v=|g(t)|_v=1$. Write $f(t)=\sum a_k t^k$ and $g(t)=\sum b_k t^k$, we have $f(t)g(t)=\sum c_jt^j$ where $c_j=\sum_{j=k+l}a_kb_l$.

We suppose that $|fg|_v<1$, i.e., $|c_j|_v<1$ for all $j$, and see what contradiction we will get. If $|a_j|_v=1$ for all $j$, then $|c_j|_v<1$ implies that $|b_k|_v<1$ for all $k$ and therefore $|g|_v<1$, a contradiction. Therefore we may assume that, without loss of generality, $|a_0|_v<1$ but $|a_1|_v=1$. Then, since

we have $|a_1b_{j-1}|_v=|b_{j-1}|_v<1$ for all $j \ge 1$. It follows that $|g(t)|_v<1$, still a contradiction. $\square$
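Gauss's lemma is easy to test on integer polynomials. The sketch below works with valuations instead of absolute values, using $|f|_p=p^{-\min_j v_p(a_j)}$, so multiplicativity of the Gauss norm becomes additivity of the minimal valuation. The polynomials `f` and `g` are arbitrary illustrative choices.

```python
def vp(n, p):
    """p-adic valuation of a nonzero integer n."""
    v = 0
    while n % p == 0:
        n //= p
        v += 1
    return v

def gauss_valuation(poly, p):
    """min_j v_p(a_j), so that the Gauss norm is p ** (-gauss_valuation(poly, p))."""
    return min(vp(c, p) for c in poly if c != 0)

def multiply(f, g):
    """Product of two polynomials given as coefficient lists (lowest degree first)."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

f = [6, 3, 12]        # 12x^2 + 3x + 6, every coefficient divisible by 3
g = [9, 18, 2, 27]    # 27x^3 + 2x^2 + 18x + 9, one coefficient prime to 3
p = 3
print(gauss_valuation(f, p), gauss_valuation(g, p), gauss_valuation(multiply(f, g), p))

# At the Archimedean place the analogous statement fails: max|coeff| of (x+1)^2 is 2.
print(max(abs(c) for c in multiply([1, 1], [1, 1])))
```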

So much for non-Archimedean case. For Archimedean case things are more complicated so we do not have enough space to cover that. Nevertheless, we have

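Gelfond’s lemma. Let $f_1,\dots,f_m$ be complex polynomials in $n$ variables and set $f=f_1\cdots f_m$, then

$$
2^{-d}\prod_{j=1}^{m}\ell_\infty(f_j) \le \ell_\infty(f) \le 2^d\prod_{j=1}^{m}\ell_\infty(f_j),
$$

where $d$ is the sum of the partial degrees of $f$, and $\ell_\infty(f)=\max_j|a_j|=|f|_\infty$.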

Combining Gelfond’s lemma and Gauss’s lemma, we obtain

The Mahler measure was not actually introduced by Mahler initially. It was named after Mahler because he successfully extended it to multivariable cases in an elegant way. We will cover the original motivation anyway.

Say we want to find prime numbers large enough. Pierce came up with an idea. Consider $p(x) \in \mathbb{Z}[x]$, which is factored into

Consider $\Delta_n=\prod_i(\alpha^n_i-1)$. Then by some Galois theory, this is indeed an integer. So perhaps we may find some interesting integers in the factors of $\Delta_n$. Also, we expect it to grow slowly. Lehmer studied $\frac{\Delta_{n+1}}{\Delta_n}$ and observed that

So it makes sense to compare all roots of $p(x)$ with $1$. He therefore suggested the following function related to $p(x)$:

This number appears if we consider $\lim_{n \to \infty}\Delta_{n+1}/\Delta_n$.

He also asked the following question, which is now known as **Lehmer’s conjecture**, although in his paper he addressed it as a problem instead of a conjecture:

Is there a constant $c>1$ such that $M(p)>1 \implies M(p)>c$?

It remains open but we can mention some key bounds.

- Lehmer himself found that

and actually this is the finest result that has ever been discovered. It was because of this discovery that he gave his *problem*.

This polynomial has also led to the discovery of a large prime number $\sqrt{\Delta_{379}}=1{,}794{,}327{,}140{,}357$, although by studying $x^3-x-1$, we have found a bigger prime number $\Delta_{127}=3{,}233{,}514{,}251{,}032{,}733$.

- Breusch (and later Smyth) discovered that if $p$ is monic, irreducible and nonreciprocal, i.e. it does not satisfy $p(x)=\pm x^{\deg p}p(1/x)$, then

- E. Dobrowolski found that if $p(x)$ is monic, irreducible and noncyclotomic, and has degree $d$, then

for some $c>0$.

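Definition. For $f \in \mathbb{C}[x_1,\dots,x_n]$, the **Mahler measure** is defined to be

$$
M(f)=\exp\left(\int_{\mathbb{T}^n}\log|f(e^{i\theta_1},\dots,e^{i\theta_n})|\,d\mu_1\cdots d\mu_n\right),
$$

where $d\mu_i=\frac{1}{2\pi}d\theta_i$, i.e., $d\mu_1\dots d\mu_n$ corresponds to the (completion of) Haar measure on $\mathbb{T}^n$ with total measure $1$.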

We see through Jensen’s formula that when $n=1$ this coincides with what we have defined before. Observe first that $M(fg)=M(f)M(g)$. Consider $f(t)=a\prod_{i=1}^{d}(t-\alpha_i)$, then

On the other hand, as an exercise in complex analysis, one can show that

Combining them, we see

Taking the logarithm we also obtain **Jensen’s formula**
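The agreement between the two descriptions of $M(f)$ in one variable can be observed numerically. This is a sketch under the assumptions that $M(f)=|a|\prod_i\max\{1,|\alpha_i|\}$ on one side and $M(f)=\exp\left(\frac{1}{2\pi}\int_0^{2\pi}\log|f(e^{i\theta})|\,d\theta\right)$ on the other; the polynomial is given by its roots to avoid root finding, it must have no root on the unit circle (otherwise the logarithm blows up at a sample point), and the function names are illustrative.

```python
import cmath
import math

def mahler_from_roots(lead, roots):
    """|lead| times the product of max(1, |root|)."""
    m = abs(lead)
    for r in roots:
        m *= max(1.0, abs(r))
    return m

def mahler_from_integral(lead, roots, samples=4096):
    """exp of the average of log|f| over equally spaced points of the unit circle."""
    total = 0.0
    for k in range(samples):
        z = cmath.exp(2j * math.pi * k / samples)
        value = lead
        for r in roots:
            value *= (z - r)
        total += math.log(abs(value))
    return math.exp(total / samples)

roots = [2.0, 0.5, -3.0]          # f(t) = (t - 2)(t - 1/2)(t + 3)
print(mahler_from_roots(1.0, roots), mahler_from_integral(1.0, roots))
```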

We first give a reasonable and useful estimation of $M(f)$, which will be used to prove the Northcott’s theorem.

Definition. For $f(t)=a_dt^d+\dots+a_0$, the $\ell_p$-norm of $f$ is naturally defined to be

$$
\ell_p(f)=\left(\sum_{j=0}^{d}|a_j|^p\right)^{1/p}.
$$

For $p=\infty$, we have $\ell_\infty(f)=\max_j|a_j|$.

Lemma 1.Notation being above, $M(f) \le \ell_1(f)$ and

*Proof.* To begin with, we observe those obvious ones. First of all,

Therefore

Next, by Jensen’s inequality

However, by Parseval’s formula, the last term equals

For the remaining inequality, we use Vieta’s formula

and therefore

for all $0 \le r \le d$. Replacing $|a_{d-r}|$ with $\ell_\infty(f)$, we have finished the proof. $\square$

Before proving Northcott’s theorem, we show the connection between Mahler measure and heights.

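Proposition 1. Let $\alpha \in \overline{\mathbb{Q}}$ and let $f$ be the minimal polynomial of $\alpha$ over $\mathbb{Z}$. Then

$$
\log M(f)=\deg(\alpha)h(\alpha)
$$

and

$$
\log|N_{\mathbb{Q}(\alpha)/\mathbb{Q}}(\alpha)| \le \deg(\alpha)h(\alpha).
$$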

*Proof.* Put $d=\deg(\alpha)$ and write

Choose a number field $K$ that contains $\alpha$ and is a Galois extension of $\mathbb{Q}$, with Galois group $G$. Then $(\sigma\alpha:\sigma \in G)$ contains every conjugate of $\alpha$ exactly $[K:\mathbb{Q}]/d$ times. Since $a_0,\dots,a_d$ are coprime, for any non-Archimedean absolute value $v \in M_K$, we must have $\max_i|a_i|_v=|f|_v=1$. Combining with Gauss’s lemma and Galois theory, we see

Now we are ready to compute the height of $\alpha$ to rediscover the Mahler’s measure. Notice that

We therefore obtain

The last term corresponds to what we have computed above about non-Archimedean absolute values so we break it down a little bit:

for some $u \mid \infty$, according to the product formula. On the other hand, for $v \mid \infty$,

All in all,

The second assertion follows immediately because

The set of non-zero algebraic integers of height $0$ lies on the unit circle, and they are actually roots of unity, by Kronecker’s theorem. However keep in mind that algebraic integers on the unit circle are not necessarily roots of unity. See this short paper.

When it comes to algebraic integers of small heights, things may get complicated, but Northcott’s theorem assures that we will be studying a finite set.

Northcott’s Theorem. Given an integer $N>0$ and a real number $H \ge1$, there are only a finite number of algebraic integers $\alpha$ satisfying $\deg(\alpha) \le N$ and $h(\alpha) \le \log H$.

*Proof.* Let $\alpha$ be an algebraic integer of degree $d \le N$ and height $h(\alpha) \le \log H$. Suppose $f(t)=a_dt^d+\dots+a_0 \in \mathbb{Z}[t]$ is the minimal polynomial of $\alpha$. Then lemma 1 shows us that

On the other hand, by proposition 1,

we have actually

This gives rise to no more than $(2\lfloor (2H)^d \rfloor+1)^{d+1}$ distinct polynomials $f$, which produces at most $d(2\lfloor (2H)^d \rfloor+1)^{d+1}<\infty$ algebraic integers. Ranging through all $d \le N$ we get what we want. $\square$

We also have the **Northcott property**, where we do not care about degrees. A set $L$ of algebraic integers is said to satisfy Northcott property if, for every $T>0$, the set

is finite. Such a set $L$ is said to satisfy **Bogomolov property** if, there exists $T>0$ such that the set

is empty. As a matter of elementary topology, Northcott property implies Bogomolov property. It would be quite interesting if $L$ is a field. This paper can be quite interesting.

- Enrico Bombieri, Walter Gubler, *Heights in Diophantine Geometry*.
- Michel Waldschmidt, *Diophantine Approximation on Linear Algebraic Groups, Transcendence Properties of the Exponential Function in Several Variables*.
- Chris Smyth, *The Mahler Measure of Algebraic Numbers: A Survey*.

Let $F$ be a non-Archimedean local field, meaning that $F$ is complete under the metric induced by a non-Archimedean absolute value $|\cdot|$. Consider the ring of integers

$$
\mathfrak{o}_F=\{\alpha \in F:|\alpha| \le 1\}
$$

and its unique prime (hence maximal) ideal

$$
\mathfrak{p}=\{\alpha \in F:|\alpha|<1\}.
$$

The residue field $k=\mathfrak{o}_F/\mathfrak{p}$ is finite because it is compact and discrete. For compactness notice that $\mathfrak{o}_F$ is compact and the canonical projection $\mathfrak{o}_F \to k$ is continuous. For discreteness, notice that $\mathfrak{p}$ is open and the projection is open, so every point of $k$ (the image of a coset of $\mathfrak{p}$) is open.

Let $f \in \mathfrak{o}_F[x]$ be a polynomial. Hensel’s lemma states that, if $\overline{f} \in k[x]$, the reduction of $f$, has a simple root $a$ in $k$, then the root can be lifted to a root of $f$ in $\mathfrak{o}_F$ and hence $F$. This blog post is intended to offer a well-organised proof of this lemma.

To do this, we need to use Newton’s method of approximating roots of $f(x)=0$, something like

We know that $a_n \to \zeta$ where $f(\zeta)=0$ at a $A^{2^n}$ speed for some constant $A$, in calculus (do Walter Rudin’s exercise 5.25 of *Principles of Mathematical Analysis* if you are not familiar with it, I heartily recommend.). Now we will steal Newton’s method into number theory to find roots in a non-Archimedean field, which is violently different from $\mathbb{R}$, the playground of elementary calculus.

We will also use induction, in the form of which I would like to call “double induction”. Instead of claiming that $P(n)$ is true for all $n$, we claim that $P(n)$ and $Q(n)$ are true for all $n$. When proving $P(n+1)$, we may use $Q(n)$, and vice versa.

This method is inspired by this lecture note, where actually a “quadra induction” is used, and everything is proved altogether. Nevertheless, I would like to argue that, the quadra induction is too dense to expose the motivation and intuition of this proof. Therefore, we reduce the induction into two arguments and derive the rest with more reasonings.

Hensel’s Lemma. Let $F$ be a non-Archimedean local field with ring of integers $\mathfrak{o}_F=\{\alpha \in F:|\alpha| \le 1\}$ and prime ideal $\mathfrak{p}=\{\alpha \in F:|\alpha|<1\}$. Let $f \in \mathfrak{o}_F[x]$ be a polynomial whose reduction $\overline{f} \in k[x]$ has a simple root $a \in k$, then $a$ can be lifted to $\alpha \equiv a \mod \mathfrak{p}$, such that $f(\alpha)=0$.

By simple root we mean $\overline{f}(a)=0$ but $\overline{f}’(a) \ne 0$. Before we prove this lemma, we see some examples.

Put $F=\mathbb{Q}_7$. Then $\mathfrak{o}_F=\mathbb{Z}_7$, $\mathfrak{p}=7\mathbb{Z}_7$ and $k=\mathbb{F}_7$. We show that square roots of $2$ are in $F$. Note $\overline{f}(x)=x^2-2=(x-3)(x+3) \in k[x]$, so we have two simple roots of $\overline{f}$, namely $3$ and $-3$. Lifting to $\mathfrak{o}_F$, we have two roots $\alpha_1 \equiv 3 \mod 7\mathbb{Z}_7$ and $\alpha_2 \equiv -3 \mod 7\mathbb{Z}_7$, of $f$. For $\alpha_1$, we have

Hence we can put $\alpha_1=\sqrt{2}=3+7+2\cdot 7^2+6\cdot 7^3+\cdots\in\mathbb{Z}_7 \subset \mathbb{Q}_7$. Likewise $\alpha_2$ can be understood as $-\sqrt{2}$. This expansion is totally different from our understanding in $\mathbb{Q}$ or $\mathbb{R}$.
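The $7$-adic digits of $\sqrt{2}$ can be reproduced with a short computation. The sketch below runs Newton's iteration $a \mapsto a-f(a)/f'(a)$ with integers modulo growing powers of $7$, squaring the modulus at each step to reflect the quadratic convergence; it assumes $f'(a)$ is a unit modulo $p$ and needs Python 3.8 or later for the modular inverse `pow(x, -1, m)`. The function names are illustrative.

```python
def hensel_lift(f, df, a, p, steps):
    """Lift a root a of f modulo p to a root modulo p^(2^steps) by Newton's iteration."""
    modulus = p
    for _ in range(steps):
        modulus *= modulus                      # precision doubles at every step
        inverse = pow(df(a), -1, modulus)       # f'(a) is a unit because p does not divide it
        a = (a - f(a) * inverse) % modulus
    return a, modulus

def digits(a, p, count):
    """First `count` base-p digits of the non-negative integer a, lowest first."""
    out = []
    for _ in range(count):
        out.append(a % p)
        a //= p
    return out

root, modulus = hensel_lift(lambda x: x * x - 2, lambda x: 2 * x, 3, 7, 2)
print(root, modulus, digits(root, 7, 4))
```

Starting from the other residue, `-3 % 7`, gives the digits of $-\sqrt{2}$ in the same way.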

Since $k$ is a finite field, we see $k^\times$ is a cyclic group of order $q-1$ where $q=p^n=|k|$ for some prime $p$. It follows that $x^{q-1}=1$ for all $x \in k^\times$. Therefore $f(x)=x^{q-1}-1$ has $q-1$ distinct roots in $k$. By Hensel’s lemma, $F$ contains all $(q-1)$st roots of unity. It does not matter whether $F$ is isomorphic to $\mathbb{Q}_p$ or $\mathbb{F}_q((t))$.

Pick any $a_0 \in \mathfrak{o}_F$ that is a lift of $a\mod\mathfrak{p}$. Define

$$
a_n=a_{n-1}-\frac{f(a_{n-1})}{f'(a_{n-1})},
$$

then we claim that $a_n$ converges to the root we are looking for.

First of all, we need to show that $a_n \in \mathfrak{o}_F$, i.e., $|a_n| \le 1$ for all $n$. It suffices to show that $|f(a_{n-1})/f’(a_{n-1})| \le 1$. We firstly observe the case when $n=1$.

Since $\overline{f}(a)=0$ but $\overline{f}’(a) \ne 0$, we have $f(a_0) \in \mathfrak{p}$ but $f’(a_0)\not\in\mathfrak{p}$. As a result, $|f(a_0)|<1$ but $|f’(a_0)|=1$. As a result, $|f(a_0)/f’(a_0)|<1$, which implies that $f(a_0)/f’(a_0) \in \mathfrak{o}_F$ and therefore $a_1 \in \mathfrak{o}_F$.

By Taylor’s theorem.

for some $g_n \in \mathfrak{o}_F[x]$. When $n=1$, we see $g_1(a_1) \in \mathfrak{o}_F$ and as a result $|g_1(a_1)| \le 1$. Therefore

Since $a_1 \in \mathfrak{o}_F$, we also see that $f(a_1) \in \mathfrak{o}_F$ hence its absolute value is not greater than $1$. As a result $|f(a_1)/f’(a_1)| \le 1$, which implies that $a_2 \in \mathfrak{o}_F$.

This inspires us to claim the following *two* statements:

(a) $|f(a_n)| < 1$ for all $n \ge 0$.

(b) $|f’(a_n)|=|f’(a_0)|=1$ for all $n \ge 0$.

We have verified (a) and (b) for $n=0$ and $n=1$. Now assume that (a) and (b) are true for $n-1$, then, for $n$, we will verify as follows.

First of all, by (a) and (b) for $n-1$, we see $a_n \in \mathfrak{o}_F$.

Consider the Taylor’s expansion

where $h_n \in \mathfrak{o}_F[x]$. It follows that $|h_n(a_n)| \le 1$. Since $|f’(a_{n-1})|=1$, by (b) we actually have

To prove (b) for $n$, we consider the Taylor’s expansion

Notice that since $a_n \in \mathfrak{o}_F$, we have $f’’(a_{n-1}),g_n(a_n) \in \mathfrak{o}_F$. By (a) and (b) for $n-1$, we see

Hence

bearing in mind that for a non-Archimedean absolute value, $|x+y|=\max\{|x|,|y|\}$ iff $|x| \ne |y|$. Through this process we have also proved (b).

We need to show that $\{a_n\}$ is a Cauchy sequence. To do this, it suffices to show that $|f(a_n)| \to 0$ sufficiently quick. Recall in the proof of (a) we have shown that $|f(a_n)| \le |f(a_{n-1})|^2$ for all $n$. By applying this relation inductively, we see $|f(a_n)| \le |f(a_0)|^{2^n}$. Since $|f(a_0)|<1$, it follows that $|f(a_n)| \to 0$ as $n \to \infty$.

For any $\varepsilon>0$, there exists $N>0$ such that $|f(a_n)| <\varepsilon$ for all $n \ge N$. As a result, for all $m>n>N$, we have

Therefore $\{a_n\}$ is Cauchy. Since $F$ is complete, $a_n$ converges to some $\alpha \in \mathfrak{o}_F \subset F$ such that $f(\alpha)=\lim_{n \to \infty}f(a_n)=0$.

In local fields, congruence is determined by inequality. In fact, we only need to show that $|\alpha-a_0|<1$, which means that $\alpha-a_0 \in \mathfrak{p}$, and therefore $\alpha \equiv a \mod \mathfrak{p}$ as expected. To do this, we show by induction that $|a_n-a_0|<1$. For $n=1$ we see $|a_1-a_0|=|f(a_0)/f'(a_0)|=|f(a_0)|<1$.

Suppose $|a_{n-1}-a_0|<1$ then

Therefore $|\alpha-a_0|=\lim_{n \to \infty}|a_n-a_0|<1$, from which the result follows. $\square$

In fact we have not explicitly used the fact that $a$ is a simple root. We only used the fact that $|f(a_0)|<1$ but $|f’(a_0)|=1$. Moreover, what really matters here is that $|f(a_n)|$ converges to $0$ quick enough. Therefore $1$ may be replaced by a smaller constant. For this reason we introduce a stronger version of Hensel’s lemma.

Hensel’s lemma, stronger version. Let $F$ be a non-Archimedean local field with ring of integers $\mathfrak{o}_F$ and let $f \in \mathfrak{o}_F[x]$. Suppose there exists $a \in \mathfrak{o}_F$ such that $|f(a)|<|f'(a)|^2$, then there exists some $b \in \mathfrak{o}_F$ such that $f(b)=0$ and $|b-a|<|f'(a)|$.

Instead of asserting $|f’(a_n)|=1$ for all $n$, we claim that $|f’(a_n)|=|f’(a_0)|$ (as it should be!). Instead of asserting $|f(a_n)|<1$, we claim that $|f(a_n)| \le \lambda^{2^n}|f’(a_0)|$ where $\lambda=|f(a_0)|/|f’(a_0)|^2$. The proof will be nearly the same.

For example, we can find a square root of $257$ in $\mathbb{Z}_2 \subset \mathbb{Q}_2$. The polynomial $f(x)=x^2-257$ is reduced to $\overline{f}(x)=x^2-1=(x-1)^2$ in $\mathbb{F}_2[x]$, where $1$ is not a simple root. Therefore we cannot apply the original version of Hensel’s lemma to this polynomial. Nevertheless, we see $f(1)=-256$ and $f’(1)=2$. Therefore $|f(1)|=\frac{1}{2^8}$ while $|f’(1)|=\frac{1}{2}$. We can apply Newton’s method here to find a square root of $257$ without worrying about repeated roots.
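How fast Newton's method converges here can be watched directly. This is a minimal sketch that runs the iteration in exact rational arithmetic starting from $a_0=1$ and records the $2$-adic valuation of $f(a_n)=a_n^2-257$; a growing valuation means $|f(a_n)|_2 \to 0$. The function names are illustrative.

```python
from fractions import Fraction

def v2(x):
    """2-adic valuation of a nonzero rational number x."""
    x = Fraction(x)
    v, num, den = 0, x.numerator, x.denominator
    while num % 2 == 0:
        num //= 2
        v += 1
    while den % 2 == 0:
        den //= 2
        v -= 1
    return v

def f(x):
    return x * x - 257

def df(x):
    return 2 * x

a = Fraction(1)
valuations = []
for _ in range(4):
    valuations.append(v2(f(a)))     # |f(a_n)|_2 = 2 ** (-valuation)
    a -= f(a) / df(a)               # Newton step; the result stays in Z_2 (odd denominators)
print(valuations)
```

The first two entries match the values $|f(1)|=2^{-8}$ from the text and $f(129)=2^{14}$, even though $|f'(a_n)|=\frac{1}{2}$ is not a unit.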

There are a lot of variants of Hensel’s lemma, for example you can do exercise 10.9 of Atiyah-MacDonald. In fact, we later even have Henselian ring and Henselisation of a ring.

There are some other proofs of Hensel’s lemma besides the one in this post. For example, since Newton’s method can also be understood as a contraction mapping, we can also prove it using properties of contraction mappings (see K. Conrad’s note).

The group $GL_2(\mathbb{F}_q)$ consists of invertible $2 \times 2$ matrices with entries in the finite field $\mathbb{F}_q$, where $q=p^n$ for some prime $p$ (throughout we exclude the case when $p=2$ because it can be quite difficult). As a $\mathbb{F}_p$-vector space, $\mathbb{F}_q$ has dimension $n$. The Galois group $G(\mathbb{F}_q/\mathbb{F}_p)$ is cyclic and is generated by the Frobenius map.

The field $\mathbb{F}_q$ itself is already pretty complicated, let alone a matrix group over it. In this post we try to follow Fulton-Harris’ idea on *Representation Theory: A First Course* to classify all irreducible representations of $G=GL_2(\mathbb{F}_q)$. To be specific, we are talking about group homomorphisms $\rho:G \to GL(V)$ where $V$ is a $\mathbb{C}$-vector space.

First of all we determine the cardinality of $G=GL_2(\mathbb{F}_q)$. Along the way, we will introduce some important subgroups.

The cardinality of $G$ is determined by the class formula, applied to the canonical action of $G$ on $\mathbb{P}^1(\mathbb{F}_q)$.

First of all, notice that $|\mathbb{P}^1(\mathbb{F}_q)|=q+1$. There are $q^2$ elements in $\mathbb{F}_q \times \mathbb{F}_q$, excluding the zero, we have $q^2-1$ remaining. Since $(r:s)=a(r:s)$ for all $a \in \mathbb{F}_q^\ast$, we divide $q^2-1$ by $|\mathbb{F}_q^\ast|=q-1$ to obtain the cardinality of the projective space.

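The action of $G$ on $\mathbb{P}^1(\mathbb{F}_q)$ is defined canonically as follows:

$$
\begin{pmatrix} a & b \\ c & d \end{pmatrix}(r:s) = (ar+bs:cr+ds).
$$

In particular, the subgroup $B$ of upper triangular matrices (those with $c=0$) is the isotropy group of the set $\{(1:0)\}$, because in this case, $(ar+bs:cr+ds)=(a:0)=(1:0)$. There are $(q-1)(q-1)q$ elements of $B$.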

Since $G$ clearly acts on $\mathbb{P}^1(\mathbb{F}_q)$ transitively, by the class equation, we have

$$
|G| = |B| \cdot |\mathbb{P}^1(\mathbb{F}_q)| = (q-1)^2q(q+1).
$$

In general, the cardinality of $GL_n(\mathbb{F}_q)$ is $\prod_{k=0}^{n-1}(q^n-q^k)$. One can check this document.
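As a sanity check, the count can be confirmed by brute force for small primes. The sketch below assumes $q=p$ is prime, so that arithmetic modulo $p$ models $\mathbb{F}_q$, and it takes $B$ to be the upper triangular invertible matrices; the function names are made up for this illustration.

```python
from itertools import product

def gl2_order(p: int) -> int:
    """Count invertible 2x2 matrices over F_p (p prime) by brute force."""
    return sum(1 for a, b, c, d in product(range(p), repeat=4)
               if (a * d - b * c) % p != 0)

def borel_order(p: int) -> int:
    """Count upper triangular invertible matrices, the stabiliser of (1:0)."""
    return sum(1 for a, b, d in product(range(p), repeat=3)
               if (a * d) % p != 0)

for p in (3, 5, 7):
    # General formula for n = 2: (q^2 - 1)(q^2 - q).
    assert gl2_order(p) == (p**2 - 1) * (p**2 - p)
    # Orbit-stabiliser count: |G| = |B| * |P^1(F_p)|.
    assert gl2_order(p) == borel_order(p) * (p + 1)
```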

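We next consider the diagonal subgroup

$$
D = \left\{ \begin{pmatrix} a & 0 \\ 0 & d \end{pmatrix} : a,d \in \mathbb{F}_q^\ast \right\},
$$

which has $(q-1)^2$ elements.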

Let $\mathbb{F}'=\mathbb{F}_{q^2}$ be the extension of $\mathbb{F}_q$ of degree $2$. We can certainly identify $GL_2(\mathbb{F}_q)$ as the group of $\mathbb{F}_q$-linear invertible automorphisms of $\mathbb{F}'$. Each $h \in (\mathbb{F}')^\ast$ induces an $\mathbb{F}_q$-linear automorphism by multiplication, hence $h$ can be embedded into $GL_2(\mathbb{F}_q)$. The question is, how. Let $K = (\mathbb{F}')^\ast$ be this subgroup. We write down the matrix representation explicitly.

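Let $\varepsilon$ be a generator of the cyclic group $\mathbb{F}_q^\ast$, then $X^2-\varepsilon$ is irreducible in $\mathbb{F}_q[X]$. We therefore have $\mathbb{F}_{q^2} \cong \mathbb{F}_q[X]/(X^2-\varepsilon)$. We see $\{1,X\}$, or more precisely, $\{1,\sqrt\varepsilon\}$, is a basis of $\mathbb{F}_{q^2}$ as a vector space over $\mathbb{F}_q$. We can then identify $(\mathbb{F}')^\ast$ as a subgroup $K$ of $G$ where

$$
K = \left\{ \begin{pmatrix} x & \varepsilon y \\ y & x \end{pmatrix} : (x,y) \ne (0,0) \right\}.
$$

The isomorphism is given by

$$
x+y\sqrt\varepsilon \mapsto \begin{pmatrix} x & \varepsilon y \\ y & x \end{pmatrix},
$$

which is the matrix of multiplication by $x+y\sqrt\varepsilon$ with respect to the basis $\{1,\sqrt\varepsilon\}$.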

To make $K$ a subgroup of $G$, each entry must be in $\mathbb{F}_q$. That’s why we write $\varepsilon$ instead of $\sqrt\varepsilon$ in the definition of $K$.
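One can check numerically that sending $x+y\sqrt\varepsilon$ to the matrix with rows $(x,\varepsilon y)$ and $(y,x)$ respects multiplication. The sketch below is only an illustration: it assumes $q=p=7$ with $\varepsilon=3$ (a generator of $\mathbb{F}_7^\ast$), and all function names are hypothetical.

```python
def mul_field(u, v, eps, p):
    """Multiply x1 + y1*sqrt(eps) and x2 + y2*sqrt(eps) in F_p[sqrt(eps)]."""
    (x1, y1), (x2, y2) = u, v
    return ((x1 * x2 + eps * y1 * y2) % p, (x1 * y2 + y1 * x2) % p)

def to_matrix(u, eps, p):
    """Matrix of multiplication by x + y*sqrt(eps) in the basis {1, sqrt(eps)}."""
    x, y = u
    return ((x, eps * y % p), (y, x))

def mul_matrix(A, B, p):
    """Product of two 2x2 matrices with entries reduced mod p."""
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2)) % p
                       for j in range(2)) for i in range(2))

p, eps = 7, 3  # assumption: 3 generates F_7^*, so X^2 - 3 is irreducible
elements = [(x, y) for x in range(p) for y in range(p) if (x, y) != (0, 0)]

# The embedding is multiplicative on all pairs of non-zero elements.
multiplicative = all(
    to_matrix(mul_field(u, v, eps, p), eps, p)
    == mul_matrix(to_matrix(u, eps, p), to_matrix(v, eps, p), p)
    for u in elements for v in elements)

# Every image has non-zero determinant x^2 - eps*y^2, hence lies in GL_2(F_p).
invertible = all((x * x - eps * y * y) % p != 0 for x, y in elements)
```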

*At the end of this section one can see a table of the result.*

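For matrices, conjugacy gives rise to eigenvalues and Jordan canonical form. So we immediately come up with the three following forms:

$$
a_x=\begin{pmatrix}x & 0 \\ 0 & x\end{pmatrix}, \quad b_x = \begin{pmatrix}x & 1 \\ 0 & x \end{pmatrix}, \quad c_{x,y} = \begin{pmatrix}x & 0 \\ 0 & y \end{pmatrix} \; (x \ne y).
$$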

For each $x$ and $y$, $a_x$, $b_x$ and $c_{x,y}$ represent three different conjugacy classes, and they do not intersect. We will study these three families of conjugacy classes and see how far we can go. Spoiler: we will miss $\frac{q(q-1)}{2}$ conjugacy classes, which will be found in the subgroup $K$.

Conjugacy classes represented by $a_x$ are the easiest ones. Since scalar matrices commute with any matrix, for any invertible matrix $A$, we have

$$
Aa_xA^{-1} = a_xAA^{-1} = a_x.
$$

Therefore there is only one element in the conjugacy class represented by $a_x$. Ranging through all $x \ne 0$, we obtain $q-1$ such classes.

For Jordan canonical form like $b_x$, the story is different. Ranging through all $x \ne 0$, we again obtain $q-1$ such classes. Nevertheless, to determine the cardinality of each class, it is unrealistic to work only in the scope of matrices.

Let $\mathcal{C}=(b_x)$ be a conjugacy class. Let $G$ act on $\mathcal{C}$ by conjugation. The action is transitive: for $A,B \in \mathcal{C}$, there are invertible matrices $U$ and $V$ such that $U^{-1}AU=b_x=V^{-1}BV$, and therefore $A=(VU^{-1})^{-1}B(VU^{-1})$.

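To determine the cardinality of $\mathcal{C}$, we use the class formula again. Suppose $A=\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ fixes $b_x$, i.e. $A^{-1}b_xA=b_x$, or $b_xA=Ab_x$, then

$$
\begin{pmatrix} xa+c & xb+d \\ xc & xd \end{pmatrix} = \begin{pmatrix} xa & a+xb \\ xc & c+xd \end{pmatrix}.
$$

The equation above implies that $c=0$ and $a=d$. Therefore the isotropy group of $\{b_x\}$ is

$$
J = \left\{ \begin{pmatrix} a & b \\ 0 & a \end{pmatrix} : a \in \mathbb{F}_q^\ast,\ b \in \mathbb{F}_q \right\}.
$$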

It follows that $|\mathcal{C}|=|G|/|J|=(q^2-q)(q^2-1)/(q^2-q)=q^2-1$.

Let $\mathcal{D}=(c_{x,y})$ be a conjugacy class. Ranging through all $x,y \ne 0$ with $y \ne x$ and then dividing by $2$, we obtain $\frac{(q-1)(q-2)}{2}$ conjugacy classes of the same form as $\mathcal{D}$. We divide by $2$ because $c_{x,y}$ is conjugate to $c_{y,x}$, as they share the same eigenvalues.

We determine the cardinality of $\mathcal{D}$ in the same way as $\mathcal{C}$. The isotropy group of $\{c_{x,y}\}$ is $D$ in the introduction. Therefore $|\mathcal{D}|=|G|/|D|=q^2+q$.

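Now let’s count how many elements the conjugacy classes obtained so far account for:

$$
(q-1)\cdot 1 + (q-1)(q^2-1) + \frac{(q-1)(q-2)}{2}(q^2+q) = \frac{q(q-1)^2(q+2)}{2}.
$$

Since $|G|=q(q+1)(q-1)^2$, we still need to find $\frac{q^2(q-1)^2}{2}$ elements. Look at the subgroups we derived in the introduction. Elements of subgroups like $B$, $N$ and $D$ all reduce to Jordan canonical form immediately, but this is not the case for $K$. Consider

$$
d_{x,y}=\begin{pmatrix} x & \varepsilon y \\ y & x \end{pmatrix}, \quad y \ne 0.
$$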

Then the eigenvalues of $d_{x,y}$ are $x\pm \sqrt\varepsilon y$, none of which lies in $\mathbb{F}_q$. Therefore it has nothing to do with Jordan canonical form. We will explore the remainder of conjugacy classes of $G$ here.

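Ranging through all $x$ and $y \ne 0$ and then dividing by $2$, we obtain $\frac{q(q-1)}{2}$ conjugacy classes. We divide by $2$ because $d_{x,y}$ and $d_{x,-y}$ are conjugate by any matrix of the form

$$
\begin{pmatrix} a & -\varepsilon c \\ c & -a \end{pmatrix}, \quad (a,c) \ne (0,0).
$$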

Now let $\mathcal{E}=(d_{x,y})$ be a conjugacy class. Notice the isotropy group of $d_{x,y}$ is $K$, so we can obtain the cardinality $|\mathcal{E}|=|G|/|K|=(q-1)^2q(q+1)/(q^2-1)=q^2-q$. Now our search is complete.

| Representative | Number of elements in class | Number of classes |
|---|---|---|
| $a_x=\begin{pmatrix}x & 0 \\ 0 & x\end{pmatrix}$ | $1$ | $q-1$ |
| $b_x = \begin{pmatrix}x & 1 \\ 0 & x \end{pmatrix}$ | $q^2-1$ | $q-1$ |
| $c_{x,y} = \begin{pmatrix}x & 0 \\ 0 & y \end{pmatrix} (x \ne y)$ | $q^2+q$ | $\frac{(q-1)(q-2)}{2}$ |
| $d_{x,y}=\begin{pmatrix} x & \varepsilon{y} \\ y & x \end{pmatrix}, y \ne 0$ | $q^2-q$ | $\frac{q(q-1)}{2}$ |

*These matrices would not frequently appear in the remainder of the post because it will mess up the format.*
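The class sizes and class counts in the table can be double-checked by brute force for a small field. The following is a sketch assuming $q=p$ is an odd prime (so that integers modulo $p$ model $\mathbb{F}_q$); the function name is made up, and `pow(x, -1, p)` needs Python 3.8 or later.

```python
from collections import Counter
from itertools import product

def conjugacy_class_sizes(p: int) -> Counter:
    """Return {class size: number of classes} for GL_2(F_p), p prime."""
    def mul(A, B):
        (a, b, c, d), (e, f, g, h) = A, B
        return ((a*e + b*g) % p, (a*f + b*h) % p,
                (c*e + d*g) % p, (c*f + d*h) % p)

    def inv(A):
        a, b, c, d = A
        k = pow((a*d - b*c) % p, -1, p)      # inverse of the determinant mod p
        return (d*k % p, -b*k % p, -c*k % p, a*k % p)

    # Matrices are stored as tuples (a, b, c, d) for rows (a, b) and (c, d).
    G = [A for A in product(range(p), repeat=4)
         if (A[0]*A[3] - A[1]*A[2]) % p]
    seen, sizes = set(), Counter()
    for A in G:
        if A in seen:
            continue
        cls = {mul(mul(P, A), inv(P)) for P in G}   # orbit under conjugation
        seen |= cls
        sizes[len(cls)] += 1
    return sizes

print(conjugacy_class_sizes(3))
```

For $q=3$ the table predicts $2$ classes of size $1$, $2$ of size $8$, $1$ of size $12$ and $3$ of size $6$, which add up to $|G|=48$.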

There are $q-1+q-1+\frac{(q-1)(q-2)+q(q-1)}{2}=q^2-1$ conjugacy classes, so we need to find $q^2-1$ irreducible representations. Of course we cannot list down all of them. We will instead classify them with certain reasonings. A character table can be found in the next section.

Some computations are omitted because otherwise this section would be unreadable. However, the author of this post has checked most of them on paper. The reader should find it easy to carry them out by themselves. For the complete computation, one can refer to this note. Note however that the classification there is a little bit different from the one here. Reading this first may help you to get the hang of it.

Recall how we find irreducible representations of $\mathfrak{S}_3$: consider permutation of a basis in a vector space of dimension $3$. We do a similar thing here. Let $G$ act on $\mathbb{P}^1(\mathbb{F}_q)$ by permutation. This induces a $(q+1)$-dimensional representation $W$ because $\mathbb{P}^1(\mathbb{F}_q)$ has $q+1$ elements. It contains the trivial representation $U$. Let $V$ be the complement of $U$, i.e. $W=U \oplus V$, then $V$ has dimension $q$. Now we determine the character of $V$. Since $\chi_V=\chi_W-\chi_U=\chi_W-1$, we only need to calculate $\chi_W$, i.e. to count the fixed points of the permutation on each conjugacy class.

$\chi_W(a_x)=q+1$. It fixes every point.

$\chi_W(b_x)=1$. It only fixes one point: $(1:0)$.

$\chi_W(c_{x,y})=2$. It fixes two points: $(1:0)$ and $(0:1)$.

$\chi_W(d_{x,y})=0$. If $d_{x,y}$ fixes $(a:b)$, then $a^2=\varepsilon b^2$, and this cannot happen.

Therefore we have

| | $a_x$ | $b_x$ | $c_{x,y}$ | $d_{x,y}$ |
|---|---|---|---|---|
| $\chi_V$ | $q$ | $0$ | $1$ | $-1$ |

We see, $(\chi_V,\chi_V)=1$. Therefore $V$ is irreducible and we cannot decompose $W$ further. We have to find different approaches.
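The value $(\chi_V,\chi_V)=1$ can also be confirmed by counting fixed points directly. The sketch below assumes $q=p$ is prime and uses made-up names; it evaluates $\chi_W(g)$ as the number of points of $\mathbb{P}^1(\mathbb{F}_p)$ fixed by $g$ and averages $(\chi_W(g)-1)^2$ over the group.

```python
from fractions import Fraction
from itertools import product

def steinberg_norm(p: int) -> Fraction:
    """Return (chi_V, chi_V) for the action of GL_2(F_p) on P^1(F_p), p prime."""
    # Normalised representatives of the p + 1 points of P^1(F_p).
    points = [(1, s) for s in range(p)] + [(0, 1)]

    def normalise(r, s):
        # Scale so that the first non-zero coordinate becomes 1.
        k = pow(r, -1, p) if r else pow(s, -1, p)
        return (r * k % p, s * k % p)

    G = [A for A in product(range(p), repeat=4)
         if (A[0]*A[3] - A[1]*A[2]) % p]
    total = 0
    for a, b, c, d in G:
        fixed = sum(1 for r, s in points
                    if normalise((a*r + b*s) % p, (c*r + d*s) % p) == (r, s))
        total += (fixed - 1) ** 2          # chi_V = chi_W - 1 is real-valued
    return Fraction(total, len(G))
```

Calling `steinberg_norm(3)` or `steinberg_norm(5)` gives a quick check that $V$ is irreducible in these cases.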

The Pontryagin dual of a group $H$ is defined to be

$$
\hat{H} = \operatorname{Hom}(H,S^1).
$$

If $H$ admits a topology, we may want to eliminate non-continuous homomorphisms, but this is not our concern here because we only care about finite groups for now, which carry the discrete topology. Notice that if $H$ is finite and cyclic, then $\hat{H} \cong H$. We will use this fact right now.

Since $G$ can be pretty big, it is not realistic to study all eigenvalues of representations. Instead, we consider the Pontryagin dual of $H=\mathbb{F}_q^\ast$, which is again a finite cyclic group. For each of the $q-1$ elements in $\hat{H}$, $\alpha:\mathbb{F}_q^\ast \to S^1$, we have a one-dimensional representation $U_\alpha$ of $G$ defined by

$$
U_\alpha(g) = \alpha(\det g).
$$

Note the trivial representation is one of the $U_\alpha$, once one realises that $\alpha$ defined by $\alpha(x)=1$ for all $x \in \mathbb{F}_q^\ast$ is also a homomorphism into $S^1$.

Tensoring $U_\alpha$ with $V$, we obtain another family of irreducible representations $\{V_\alpha = V \otimes U_\alpha\}$. Note $V$ is one of the $V_\alpha$. Their characters are easily computed.

| | $a_x$ | $b_x$ | $c_{x,y}$ | $d_{x,y}$ |
|---|---|---|---|---|
| $U_\alpha$ | $\alpha(x)^2$ | $\alpha(x)^2$ | $\alpha(x)\alpha(y)$ | $\alpha(x^2-\varepsilon y^2)$ |
| $V_\alpha$ | $q\alpha(x)^2$ | $0$ | $\alpha(x)\alpha(y)$ | $-\alpha(x^2-\varepsilon y^2)$ |

We have successfully determined $2(q-1)$ irreducible representations, with $(q-1)^2$ of them still to be found. We now make use of the subgroups we have determined. For each $\alpha,\beta \in \widehat{\mathbb{F}_q^\ast}$, we have a character of the subgroup $B$:

$$
\gamma_{\alpha,\beta}: B \to \mathbb{C}^\ast, \quad \begin{pmatrix} a & b \\ 0 & d \end{pmatrix} \mapsto \alpha(a)\beta(d).
$$

Let $W'_{\alpha,\beta}$ be the representation of $B$ with character $\gamma_{\alpha,\beta}$, and let $W_{\alpha,\beta}=\operatorname{Ind}_B^G W_{\alpha,\beta}'$. We can quite easily (no, with a lot of dirty computation) write down the character table of $W_{\alpha,\beta}$.

| | $a_x$ | $b_x$ | $c_{x,y}$ | $d_{x,y}$ |
|---|---|---|---|---|
| $W_{\alpha,\beta}$ | $(q+1)\alpha(x)\beta(x)$ | $\alpha(x)\beta(x)$ | $\alpha(x)\beta(y)+\alpha(y)\beta(x)$ | $0$ |

If $\alpha=\beta$, then $W_{\alpha,\beta}=U_\alpha \oplus V_\beta$, so it is not irreducible. However, if $\alpha \ne \beta$, then $W_{\alpha,\beta} \cong W_{\beta,\alpha}$ is irreducible, as one sees by computing $(\chi,\chi)$. The dimension of $W_{\alpha,\beta}$ is $[G:B]=q+1$, and there are $\frac{1}{2}(q-1)(q-2)$ of them. Still there are $\frac{1}{2}q(q-1)$ irreducible representations to be found.

We haven’t used the subgroup $K$ yet, so we first explore it in the same vein as attempt 3. We consider the dual of $(\mathbb{F}')^\ast \cong K$. Each $\varphi:(\mathbb{F}')^\ast \to \mathbb{C}^\ast$ also determines a representation of $K$ with character $\varphi$. One immediately thinks about $\operatorname{Ind}_K^G(\varphi)$, which we simply write as $\operatorname{Ind}(\varphi)$. Its character is easy to compute.

| | $a_x$ | $b_x$ | $c_{x,y}$ | $d_{x,y}$ |
|---|---|---|---|---|
| $\operatorname{Ind}(\varphi)$ | $q(q-1)\varphi(x)$ | $0$ | $0$ | $\varphi(\zeta)+\varphi(\zeta)^q$ |

We put $\zeta=x+y\sqrt\varepsilon \in K = (\mathbb{F}’)^\ast$. Since $\operatorname{Ind}(\varphi) \cong \operatorname{Ind}(\varphi^q)$, we obtain $\frac{1}{2}q(q-1)$ representations out of this, with the restriction that $\varphi^q \ne \varphi$. Nevertheless, we have $(\chi,\chi)=q$ if $\varphi^q=\varphi$, $(\chi,\chi)=q-1$ if $\varphi^q \ne \varphi$. We still need to work on it from different direction.

Let’s try to tensor what we have found. It is easy to see that $V_\alpha \otimes U_\gamma = V_{\alpha\gamma}$ and $W_{\alpha,\beta} \otimes U_{\gamma}=W_{\alpha\gamma,\beta\gamma}$. So we cannot find anything new here. But tensoring $W_{\alpha,\beta}$ and $V_\alpha$ gives us something quite different. We see for $\alpha \ne 1$,

| | $a_x$ | $b_x$ | $c_{x,y}$ | $d_{x,y}$ |
|---|---|---|---|---|
| $V \otimes W_{\alpha,1}$ | $q(q+1)\alpha(x)$ | $0$ | $\alpha(x)+\alpha(y)$ | $0$ |

Let $\varphi \in \widehat{(\mathbb{F}’)^\ast}$ be a homomorphism such that $\varphi|_{\mathbb{F}_{q}^\ast}=\alpha$. Computing inner products gives us

We see $W_{\alpha,1}$ is contained in the representation determined by $\operatorname{Ind}(\varphi)$. We also see $W_{\alpha,1}$ is contained in $V \otimes W_{\alpha,1}$. Besides, $\operatorname{Ind}(\varphi)$ and $V \otimes W_{\alpha,1}$ have a lot of subrepresentations in common. The first guess would be that $\operatorname{Ind}(\varphi)$ is a subrepresentation of $V \otimes W_{\alpha,1}$. So, maybe we can find something we have been missing here. For this reason, consider the virtual character

$$
\chi_\varphi = \chi_{V \otimes W_{\alpha,1}} - \chi_{W_{\alpha,1}} - \chi_{\operatorname{Ind}(\varphi)}.
$$

We can compute that $(\chi_\varphi,\chi_\varphi)=1$. To see this is actually a real character, we compute the character table.

| | $a_x$ | $b_x$ | $c_{x,y}$ | $d_{x,y}$ |
|---|---|---|---|---|
| $\chi_\varphi$ | $(q-1)\alpha(x)$ | $-\alpha(x)$ | $0$ | $-(\varphi(\zeta)+\varphi(\zeta)^q)$ |

It follows that $\chi_\varphi(1)=q-1>0$. Hence $\chi_\varphi$ is irreducible. Since each $\chi_\varphi$ is determined by $\varphi$ with $\varphi^q \ne \varphi$, and there are $\frac{1}{2}q(q-1)$ of such $\varphi$, we have actually determined all of the irreducible characters. They are denoted by $X_\varphi$.

| $GL_2(\mathbb{F}_q)$ | $a_x$ | $b_x$ | $c_{x,y}$ | $d_{x,y}\leftrightarrow\zeta$ | $\dim$ |
|---|---|---|---|---|---|
| $U_\alpha$ | $\alpha(x^2)$ | $\alpha(x^2)$ | $\alpha(xy)$ | $\alpha(\zeta^{q+1})$ | $1$ |
| $V_\alpha$ | $q\alpha(x^2)$ | $0$ | $\alpha(xy)$ | $-\alpha(\zeta^{q+1})$ | $q$ |
| $W_{\alpha,\beta}$ | $(q+1)\alpha(x)\beta(x)$ | $\alpha(x)\beta(x)$ | $\alpha(x)\beta(y)+\alpha(y)\beta(x)$ | $0$ | $q+1$ |
| $X_\varphi$ | $(q-1)\varphi(x)$ | $-\varphi(x)$ | $0$ | $-(\varphi(\zeta)+\varphi(\zeta^q))$ | $q-1$ |

Here $\zeta^{q+1}=\zeta\cdot\zeta^q=x^2-\varepsilon y^2$ is the determinant of $d_{x,y}$, which lies in $\mathbb{F}_q^\ast$.

A few remarks are in order. We can call these four classes of irreducible representations in the following way (excerpted from this document):

$U_\alpha$: $1$-dimensional representations. There are $q-1$ of them.

$V_\alpha=V \otimes U_\alpha$: $q$-dimensional representations. Here $V$ is also called the Steinberg representation. There are $q-1$ of them.

$W_{\alpha,\beta}$: ($q+1$)-dimensional irreducible principal series. There are $\frac{1}{2}(q-1)(q-2)$ of them. Some authors may also treat $q$-dimensional representations as principal series.

$X_\varphi$: irreducible cuspidal representations or complementary series representations. There are $\frac{1}{2}q(q-1)$ of them. A representation is cuspidal if the Jacquet module is trivial.

The Segre embedding allows us to define the product of projective varieties reasonably, and we will discuss it right now. To begin with we consider the product of $\mathbb{P}^m$ and $\mathbb{P}^n$. In this section the ground field is an arbitrary algebraically closed field.

**Definition 1.** The **Segre embedding** is defined as follows:

$$
\iota: \mathbb{P}^m \times \mathbb{P}^n \to \mathbb{P}^N, \quad ([X_0:\dots:X_m],[Y_0:\dots:Y_n]) \mapsto [X_0Y_0:X_0Y_1:\dots:X_iY_j:\dots:X_mY_n].
$$

Clearly, $N=(m+1)(n+1)-1=mn+m+n$. The image on the right hand side has $X_iY_j$ ordered lexicographically.

First of all we make sure that this function is well-defined, otherwise our work will be useless.

**Proposition 1.** The Segre embedding is a well-defined injective map.

*Proof.* Assume $X_i'=\lambda X_i$ and $Y_j'=\mu Y_j$ for some $\lambda,\mu \ne 0$, then

$$
[X_i'Y_j'] = [\lambda\mu X_iY_j] = [X_iY_j].
$$

Next suppose that $[X_iY_j]=[X_i'Y_j']$. Without loss of generality we can assume that $X_0 \ne 0$ and $X_0' \ne 0$ so that we can put them to $1$. Then by looking at the first $n+1$ entries we can identify $[Y_0:\dots:Y_n]$ with $[Y_0':\dots:Y_n']$. Then using $X_1Y_i=\lambda X_1'Y_i'$ we can immediately identify $X_1$ and $X_1'$. Likewise, the other components are identified. $\square$

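Next we study the image further using linear algebra.

We can write elements in $\mathbb{P}^N$ as an $(m+1) \times (n+1)$ matrix, which can make things easier:

$$
[Z_{ij}] = \begin{pmatrix} Z_{00} & \cdots & Z_{0n} \\ \vdots & \ddots & \vdots \\ Z_{m0} & \cdots & Z_{mn} \end{pmatrix}.
$$

Therefore the image of $\iota$ is given by $Z_{ij}=X_iY_j$. Through an elementary observation, we see the matrix

$$
[X_iY_j] = \begin{pmatrix} X_0 \\ \vdots \\ X_m \end{pmatrix}\begin{pmatrix} Y_0 & \cdots & Y_n \end{pmatrix}
$$

has rank $1$. The question is, is the converse true? For this reason we study the set

$$
Z = \{[Z_{ij}] \in \mathbb{P}^N : Z_{ij}Z_{kl}-Z_{kj}Z_{il}=0 \text{ for all } i,j,k,l\}.
$$

Note that the polynomials $Z_{ij}Z_{kl}-Z_{kj}Z_{il}$ are the determinants of the $2\times2$ submatrices of the matrix $[Z_{ij}]$. This $Z$ contains all $[Z_{ij}]$ with rank $1$. To show the converse, we consider the standard affine cover. Let $U_{kl}=Z(Z_{kl})$ and put $V_{kl}=\mathbb{P}^N \setminus U_{kl}$. Then $\{V_{kl}\}$ is the standard affine cover of $\mathbb{P}^N$ as we all know. Likewise, we use the affine open subsets $U_k \subset \mathbb{P}^m$ and $U_l' \subset \mathbb{P}^n$ to obtain

$$
Z \cap V_{kl} \to U_k \times U_l', \quad [Z_{ij}] \mapsto ([Z_{0l}:\dots:Z_{ml}],[Z_{k0}:\dots:Z_{kn}]).
$$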

This is indeed the inverse map of $\iota$ on $U_k \times U_l’$. Hence the converse is true.

Therefore, the image of the Segre embedding is a projective variety. As a classic example, the image of $\mathbb{P}^1 \times \mathbb{P}^1 \to \mathbb{P}^3$ is determined by the polynomial $xy-zw=0$.
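The rank one condition is easy to test on examples. The following sketch uses integer coordinates purely for illustration (the ground field in the text is algebraically closed, so this is an assumption of the example), and the function names are hypothetical.

```python
from itertools import product

def segre(x, y):
    """Return the matrix Z with Z[i][j] = x[i] * y[j] (homogeneous coordinates)."""
    return [[xi * yj for yj in y] for xi in x]

def minors_vanish(Z):
    """Check Z_ij * Z_kl - Z_kj * Z_il == 0 for all choices of indices."""
    m, n = len(Z), len(Z[0])
    return all(Z[i][j] * Z[k][l] - Z[k][j] * Z[i][l] == 0
               for i, k in product(range(m), repeat=2)
               for j, l in product(range(n), repeat=2))

# A point in the image of P^2 x P^1 -> P^5 satisfies all the quadrics.
on_image = minors_vanish(segre([1, 2, 3], [4, 5]))

# The identity matrix has rank 2, so it is not in the image of P^1 x P^1 -> P^3.
off_image = minors_vanish([[1, 0], [0, 1]])
```

For $\mathbb{P}^1 \times \mathbb{P}^1$ the only non-trivial condition is $Z_{00}Z_{11}-Z_{10}Z_{01}=0$, which is the quadric $xy-zw=0$ mentioned above.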

In this section we offer a way to understand the Segre embedding in number fields. To begin with, we need some definition.

Height is computed by absolute values on a field so we first normalise all absolute values on $\mathbb{Q}$. Recall that two absolute values $|\cdot|_1$ and $|\cdot|_2$ are equivalent if there exists some $\lambda>0$ such that $|\cdot|_1=|\cdot|_2^\lambda$. The question is, which $\lambda$ should we pick.

The ordinary absolute value $|\cdot|_\infty$ has nothing to worry about. But for $p$-adic absolute values $|\cdot|_p$, instead of using other equivalent ones, we have a restriction that $|p|_p=\frac{1}{p}$. All these absolute values will be denoted by

$$
M_{\mathbb{Q}} = \{|\cdot|_\infty\} \cup \{|\cdot|_p : p \text{ prime}\}.
$$

Likewise we define $M_K$, where throughout $K$ will always be a number field. $M_K$ consists of the ordinary absolute value and extensions of $p$-adic ones defined as follows:

In particular, $M_K$ satisfies the product formula:

$$
\prod_{v \in M_K}|x|_v = 1
$$

for all $x \in K^\times$. This restriction allows us to work fine with projective spaces, as we will see later.

**Definition 2.** The (absolute logarithmic) **height** of $x\in \mathbb{P}^n_{\overline{\mathbb{Q}}}$, with coordinates $(x_0:\dots:x_n)$, $x_j \in K$, is defined by

$$
h(x) = \sum_{v \in M_K}\max_{j}\log|x_j|_v.
$$

Actually, the height function can show the “algebraic complication” of $x$, and is well-defined in many senses.

**Proposition 2.** The height $h(x)$ is independent of the choice of $K$.

*Sketch of the proof.* Let $L$ be another number field containing $x_0,\dots,x_n$. We can assume that $K \subset L$, so that $L/K$ is a finite separable extension. But in this case,

which implies that

Therefore

gives what we want. $\square$

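**Proposition 3.** $h(x)$ is well-defined on $\mathbb{P}^n_{\overline{\mathbb{Q}}}$.

*Proof.* It remains to show that $h(x)$ is independent of the choice of coordinates. For $\lambda \ne 0$, we see

$$
h(\lambda x) = \sum_{v \in M_K}\max_j\log|\lambda x_j|_v = \sum_{v \in M_K}\log|\lambda|_v + \sum_{v \in M_K}\max_j\log|x_j|_v = h(x).
$$

Note $\sum_{v \in M_K}\log|\lambda|_v=0$ because of the product formula. $\square$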

To highlight the ability of height to measure algebraic complication, let’s mention the following theorem of Kronecker.

**Theorem 1 (Kronecker).** The height of $\zeta\in\overline{\mathbb{Q}}^\times$ is $0$ if and only if $\zeta$ is a root of unity.

One direction is straightforward. To prove the converse, one may need some combinatorics, symmetric functions and Dirichlet’s pigeon-hole principle. See theorem 2.4 of this note for a proof.

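Now let’s invite the Segre embedding into the party:

$$
\iota: \mathbb{P}^m_{\overline{\mathbb{Q}}} \times \mathbb{P}^n_{\overline{\mathbb{Q}}} \to \mathbb{P}^N_{\overline{\mathbb{Q}}}, \quad (x,y) \mapsto (x_iy_j)_{i,j}.
$$

Using the fact that $\max_{i,j}|x_iy_j|_v=\max_i|x_i|_v \cdot \max_j|y_j|_v$, we see immediately that

$$
h(\iota(x,y)) = h(x)+h(y).
$$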
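Over $\mathbb{Q}$ the additivity of the height under the Segre embedding can be checked numerically. The sketch below relies on the standard fact that a point of $\mathbb{P}^n(\mathbb{Q})$ with coprime integer coordinates has height $\log\max_i|x_i|$; the function names and the sample points are made up for this illustration.

```python
from functools import reduce
from math import gcd, isclose, log

def height(x):
    """Logarithmic height of a point of P^n(Q) given by integer coordinates."""
    g = reduce(gcd, x)                      # remove the common factor first
    return log(max(abs(c) // g for c in x))

def segre(x, y):
    """Coordinates (x_i * y_j) of the Segre image, in lexicographic order."""
    return [xi * yj for xi in x for yj in y]

x, y = [2, -6, 4], [3, 5]
lhs = height(segre(x, y))
rhs = height(x) + height(y)
print(isclose(lhs, rhs))
```

Rescaling the coordinates of `x` or `y` by a non-zero integer does not change either side, in line with Proposition 3.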

The Segre embedding is immediately used after introducing the height of a polynomial.

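**Definition 3.** For $f(t_1,\dots,t_n) \in K[t_1,\dots,t_n]$, we write

$$
f(t_1,\dots,t_n) = \sum_{j}a_jt^j, \quad t^j=t_1^{j_1}\cdots t_n^{j_n}.
$$

Then the **height** of $f$ is defined to be

$$
h(f) = \sum_{v \in M_K}\log|f|_v,
$$

where

$$
|f|_v = \max_j|a_j|_v.
$$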

Likewise, it can show the complication of $f$ in some way, but we are not interested in it at this moment. Notice that products of multivariable polynomials in **different variables** can be understood as tensor products and therefore we have the following fact

**Proposition 4.** Let $f(t_1,\dots,t_n)$ and $g(s_1,\dots,s_m)$ be polynomials in different sets of variables, then

$$
h(fg) = h(f)+h(g).
$$

Note: if $f$ and $g$ share the same variable, the story is different. Say we have $f_1,\dots,f_m \in \overline{\mathbb{Q}}[X_1,\dots,X_n]$, and put $f=f_1 \cdots f_m$, then

where $d$ is the sum of the partial degrees of $f$.

We want to apply calculus to fields, but tools are needed. For the ordinary calculus, on $\mathbb{R}$ or $\mathbb{C}$, the most important role is played by limit:

However we cannot transplant the absolute value into other fields directly. Indeed, if the field $k$ is a subfield of $\mathbb{C}$, then we may define an absolute value on $k$ to be the restriction of the absolute value of $\mathbb{C}$. But this is not always the case: this method does not work on fields with positive characteristic. For example, $\mathbf{F}_8$ cannot be embedded in $\mathbb{C}$ because its prime field is $\mathbf{F}_2$ rather than $\mathbb{Q}$. Besides, we should not restrict ourselves to the case faithful to ordinary calculus and ignore other possibilities. The most important trait of the ordinary absolute value is the triangle inequality, but perhaps we can omit that and replace it with something different. Maybe there are many more absolute values to be studied. For these reasons, we first define absolute values on fields from scratch.

**Definition 1.** An **absolute value** on a field $K$ is a real valued function $|\cdot|:K \to \mathbb{R}_+$ such that

1. For all $x \in K$, we have $|x| \ge 0$ and $|x|=0$ if and only if $x=0$.
2. $|xy|=|x||y|$ for all $x,y \in K$.
3. There exists $c>0$ such that $|x+y| \le c \max\{|x|,|y|\}$ for all $x,y \in K$.

Before we dive into some technical details of the inequality, let’s see some trivial and non-trivial examples.

- On any field $K$, we can define $|x|=1$ for all $x \ne 0$. This is the most trivial absolute value and it carries little to no information. But whether or not the absolute value is trivial, we always have $|1|=1$ because $|1x|=|1||x|=|x|$.

- If $K=\mathbb{Q}$, we can define $|m/n|$ to be the ordinary absolute value $\sqrt{\left(\frac{m}{n}\right)^2}$. We are familiar with it for sure. It is customary to write $|\cdot|_\infty$.

- However, for $K=\mathbb{Q}$, a prime $p$ and a non-zero $m/n \in K$, we can also write

  $$
  \frac{m}{n} = p^r\frac{m'}{n'}
  $$

  where $r \in \mathbb{Z}$ and $m'$ and $n'$ are integers coprime to $p$. Under this presentation we can put

  $$
  \left|\frac{m}{n}\right|_p = p^{-r}.
  $$

  In this way we obtain an absolute value $|\cdot|_p$ that is totally different from $|\cdot|_\infty$. The “difference” will be discussed later. One can verify that $|\cdot|_p$ is indeed an absolute value and that the constant $c$ in the definition can be taken to be $1$. This is called the $p$-adic absolute value.

- Let $K=\mathbf{F}_q$ be a finite field, then the only absolute value on $K$ is trivial. To see this, notice that $K^\times$ is a cyclic group. For any $x \in K^\times$, we have $|x|^{q-1}=|x^{q-1}|=|1|=1$, hence $|x|=1$.
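The $p$-adic absolute value on $\mathbb{Q}$ from the examples above can be sketched as follows. The function name `padic_abs` is made up for this illustration, and exact rationals are used so that the identities can be compared without rounding.

```python
from fractions import Fraction

def padic_abs(x, p: int) -> Fraction:
    """|x|_p for a rational x, normalised so that |p|_p = 1/p."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    r, m, n = 0, x.numerator, x.denominator
    while m % p == 0:       # powers of p in the numerator raise r
        m //= p
        r += 1
    while n % p == 0:       # powers of p in the denominator lower r
        n //= p
        r -= 1
    return Fraction(1, p) ** r

x, y = Fraction(50, 3), Fraction(7, 10)
# Multiplicativity and the ultrametric inequality (c = 1) on a sample pair.
multiplicative = padic_abs(x * y, 5) == padic_abs(x, 5) * padic_abs(y, 5)
ultrametric = padic_abs(x + y, 5) <= max(padic_abs(x, 5), padic_abs(y, 5))
```

The two flags only test one pair of rationals, so they illustrate the properties rather than prove them.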

It seems we have ignored the triangle inequality for no reason, but actually we didn’t. To see this, we give a refinement of the triangle inequality first.

**Proposition 1.** Let $|\cdot|:K \to \mathbb{R}$ be an absolute value with $|x+y|\le c\max\{|x|,|y|\}$, then the following two statements are equivalent:

1. $c \le 2$.
2. For all $a,b \in K$, we have $|a+b|\le |a|+|b|$. This is the **triangle inequality**.

*Proof.* It is obvious that $|a|+|b| \le 2\max\{|a|,|b|\}$ so we only need to show that 1 implies 2. To do this, we will use a forward-backward induction.

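Assume first that $n=2^m$ for some positive integer $m$ and let $a_1,\dots,a_n$ be a sequence of elements of $K$. Then by induction we immediately have

$$
|a_1+\dots+a_n| \le 2^m\max_{1 \le i \le n}|a_i|.
$$

For all positive integers $n$ satisfying $2^{m-1} < n \le 2^m$, and $a_1,\dots,a_{n} \in K$, we can always put $a_{n+1}=\cdots=a_{2^m}=0$ to obtain

$$
|a_1+\dots+a_n| \le 2^m\max_{1 \le i \le n}|a_i| \le 2n\max_{1 \le i \le n}|a_i|.
$$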

Let $\tilde{n}$ be the image of $n$ in $K$, i.e. $\tilde{n}=\underbrace{1+\dots+1}_{n\text{ times} }$. If we put $a_k=1$ for all $1 \le k \le n$, we in particular have $|\tilde{n}| \le 2n$. Moreover, we also have

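We therefore can write

$$
|a+b|^n = \left|\sum_{k=0}^{n}\binom{n}{k}a^kb^{n-k}\right| \le 2(n+1)\max_k\left|\widetilde{\tbinom{n}{k}}\right||a|^k|b|^{n-k} \le 4(n+1)\max_k\binom{n}{k}|a|^k|b|^{n-k} \le 4(n+1)(|a|+|b|)^n.
$$

It follows that

$$
|a+b| \le \sqrt[n]{4(n+1)}\,(|a|+|b|).
$$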

Since $\lim_{n \to \infty}\sqrt[n]{4(n+1)}=1$, we are done. $\square$

The triangle inequality is always desirable but it does not always hold. To see this, consider $\mathbb{C}((X))$, the field of formal Laurent series, where each non-zero element is of the form

$$
f(X) = \sum_{k \ge n}a_kX^k = a_nX^n+a_{n+1}X^{n+1}+\cdots, \quad n \in \mathbb{Z},
$$

where $a_n \ne 0$. We can define a function on $\mathbb{C}((X))$ by $|f|=|a_n|$. The first two conditions are easily verified but the triangle inequality fails. For example, if $f(X)=1+2X$ and $g(X)=-1+3X$, then $|f+g|=5$ while $|f|=|g|=1$.

For this reason, we are seeking ‘replacements’ of an absolute value.

Notice that an absolute value induces a translation-invariant metric in an obvious way:

$$
d(x,y) = |x-y|.
$$

A topology comes up in the nature of things. We cannot apply theorems in functional analysis directly because we do not have a real or complex vector space. But we can try to import those important concepts. When studying open mapping theorem, we care about equivalent norms or metrics, on whether they induce the same topology. Here we will also do that.

**Definition 2.** Two absolute values $|\cdot|_1$ and $|\cdot|_2$ are **equivalent** if they induce the same topology (this is clearly an equivalence relation). An equivalence class of absolute values is called a **place**.

Clearly, the topology is discrete if and only if the absolute value is trivial. Therefore a trivial absolute value is not equivalent to any non-trivial ones. But let’s see two non-trivial absolute values that are not equivalent.

On $\mathbb{Q}$, consider $|\cdot|_\infty$ and $|\cdot|_2$. The sequence $\left\{\frac{1}{n}\right\}$ converges to $0$ under the first absolute value. However

$$
\left|\frac{1}{2k+1}\right|_2 = 1 \quad \text{for all } k \ge 0
$$

if we take odd numbers into account. On the other hand, $\{2^n\}$ does not converge under $\left|\cdot\right|_\infty$ but $\left|2^n\right|_2=2^{-n} \to 0$ as $n \to \infty$. The topology induced by $|\cdot|_\infty$ is totally different from the one induced by $|\cdot|_p$ for prime $p$.
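The contrast between the two sequences can be made concrete with a few values. This is a small sketch with a made-up helper `v2`; it lists $|1/n|_2=2^{v_2(n)}$ and $|2^n|_2=2^{-n}$ for $n=1,\dots,8$.

```python
def v2(n: int) -> int:
    """Exponent of 2 in a non-zero integer n."""
    k = 0
    while n % 2 == 0:
        n //= 2
        k += 1
    return k

# |1/n|_2 = 2^{v2(n)} never drops below 1, so 1/n does not tend to 0 in Q_2.
inv_n = [2 ** v2(n) for n in range(1, 9)]

# |2^n|_2 = 2^{-n} tends to 0, although 2^n is unbounded for |.|_infinity.
pow_2 = [2.0 ** -v2(2 ** n) for n in range(1, 9)]

print(inv_n)
print(pow_2)
```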

We have an important characterisation of equivalent absolute values.

**Proposition 2.** Let $|\cdot|_1$ and $|\cdot|_2$ be two non-trivial absolute values, then the following statements are equivalent.

1. $|\cdot|_1$ and $|\cdot|_2$ are equivalent.
2. $|x|_1<1$ implies that $|x|_2<1$.
3. There exists $\lambda>0$ such that $|\cdot|_1=|\cdot|_2^\lambda$.

*Proof.* Assume that $|\cdot|_1$ and $|\cdot|_2$ are equivalent. If $|x|_1<1$, then $\lim_{n \to \infty}x^n=0$. Therefore $|x|_2<1$, or otherwise $|x|_2^n$ would not converge to $0$. Likewise $|x|_2<1 \implies |x|_1<1$.

Assume that $|x|_1<1$ always implies that $|x|_2<1$. It follows that $|x|_1>1$ implies that $|x|_2>1$ because $|x^{-1}|_1<1$. Since $|\cdot|_1$ is not trivial, there exists $x_0 \in K$ such that $|x_0|_1>1$. Put $a=|x_0|_1$ and $b=|x_0|_2$ and let $\lambda=\log(b)/\log(a)=\log_a{b}$. Pick $x \in K$ such that $|x|_1 \ge 1$. Then $|x|_1=|x_0|_1^\alpha$ for some $\alpha \ge 0$. We show that $|x|_2=|x_0|_2^\alpha$ by approximating $\alpha$ with rational numbers. If $m,n$ are positive integers such that $m/n>\alpha$, then $|x|_1<|x_0|_1^{m/n}$ and therefore $|x^n/x_0^m|_1<1$. It follows that $|x^n/x_0^m|_2<1$, i.e. $|x|_2<|x_0|^{m/n}_2$. If $m/n<\alpha$, we can similarly get $|x|_2>|x_0|_2^{m/n}$. Therefore $|x|_2=|x_0|_2^\alpha$, and hence

$$
|x|_2 = |x_0|_2^\alpha = b^\alpha = a^{\lambda\alpha} = |x|_1^\lambda.
$$

The case $|x|_1<1$ follows by applying this to $x^{-1}$, so $|\cdot|_2=|\cdot|_1^\lambda$, which is statement 3 with exponent $1/\lambda$.

3 implying 1 is immediate because $f(x)=x^\lambda$ does not change the limit. $\square$

If $|\cdot|_1=|\cdot|_2^\lambda$, $|x+y|_1\le c_1\max\{|x|_1,|y|_1\}$ and $|x+y|_2 \le c_2\max\{|x|_2,|y|_2\}$, then $c_1$ can be replaced by $c_2^\lambda$. If $c_2 >2$, then we can pick $\lambda$ small enough such that $c_2^\lambda \le 2$. Therefore

**Proposition 3.** Each absolute value is equivalent to one that satisfies the triangle inequality.

Bearing this in mind, we can study the case when $c=1$ in the definition of absolute values.

**Proposition 4.** Let $|\cdot|$ be an absolute value on $K$. Then the following statements are equivalent:

1. $|\cdot|$ satisfies the ultrametric inequality: $|x+y|\le\max\{|x|,|y|\}$.
2. $|\tilde{n}|\le 1$ for all $n \in \mathbb{N}$.

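*Proof.* Suppose that $|x+y| \le \max\{|x|,|y|\}$. Then $|\tilde{1}|=|1|=1$ and $|\tilde{2}|=|1+1|\le\max\{|1|,|1|\}=1$. Assume that $|\tilde{n}|\le 1$, then

$$
|\widetilde{n+1}| = |\tilde{n}+1| \le \max\{|\tilde{n}|,|1|\} = 1.
$$

Conversely, suppose that $|\tilde{n}| \le 1$ for all $n$. Replace the absolute value with one satisfying the triangle inequality if necessary. It follows that

$$
|a+b|^n = \left|\sum_{k=0}^{n}\binom{n}{k}a^kb^{n-k}\right| \le \sum_{k=0}^{n}\left|\widetilde{\tbinom{n}{k}}\right||a|^k|b|^{n-k} \le (n+1)\max\{|a|,|b|\}^n.
$$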

Therefore $|a+b| \le \sqrt[n]{n+1}\max\{|a|,|b|\}$. The result follows from the fact that $\sqrt[n]{n+1} \to 1$ as $n \to \infty$. $\square$

**Definition 3.** An absolute value is called **non-Archimedean**, or **ultrametric**, if the condition in proposition 4 is satisfied. Otherwise it is called **Archimedean** or **ordinary**.

For example, trivial absolute values are ultrametric but we are not interested in that. What is interesting is that $p$-adic absolute values are non-Archimedean.

There is a second classification, Ostrowski’s theorem, which states that all nontrivial places on $\mathbb{Q}$ can be represented by $|\cdot|_\infty$ or $|\cdot|_p$ for some prime $p$. For other fields, we have quite some interesting analogues. But we do not have enough space to include these proofs. The reader can check

- Theorem 4.2 of this note for the ordinary theorem of Ostrowski on $\mathbb{Q}$.
- This expository paper for the theorem of Ostrowski on number fields.
- This expository paper for the theorem of Ostrowski on function fields.

When we have a field extension $L/K$, we certainly want to know how an absolute value on $L$ will be restricted to $K$, or conversely, how an absolute value can be extended to $L$. For an absolute value itself, we can also perform a completion just like we did in elementary calculus, where the completion of $\mathbb{Q}$ is $\mathbb{R}$.

**Definition 4.** A field $K$ is **complete** with respect to $|\cdot|$ if $K$ is a complete metric space with respect to the metric $d(x,y)=|x-y|$.

To employ a similar device, we will define completion in a similar style. Let $\mathscr{P}_F$ be the set of all places of a field $F$. Given a field extension $L/K$, each place $v$ on $L$ induces a place $u=v|_K$ on $K$. We therefore have a map induced by restriction:

$$
r: \mathscr{P}_L \to \mathscr{P}_K, \quad v \mapsto v|_K,
$$

from the places of $L$ to places of $K$.

**Definition 5.** Let $L/K$ be a field extension and $u \in \mathscr{P}_K$. If $v \in r^{-1}(u)$, we write $v|u$ and say $v$ **divides** $u$ or $v$ **lies over** $u$.

**Definition 6.** A completion of $K$ with respect to a place $v$ is an extension field $K_v$ with a place $w$ such that

1. $w|v$.
2. The topology of $K_v$ induced by $w$ is complete.
3. $K$ is a dense subset of $K_v$.

The completion exists and is unique up to isomorphism (to see this, modify the proof on the completion of $\mathbb{Q}$). The classic example is of course $K=\mathbb{Q}$ and $v$ being represented by $|\cdot|_\infty$, and $K_v$ would therefore be $\mathbb{R}$ with the ordinary absolute value. With the help of Gelfand-Mazur, one can show that the only Archimedean complete fields are $\mathbb{R}$ and $\mathbb{C}$, which are completions of $\mathbb{Q}$ and $\mathbb{Q}(i)$.

For $|\cdot|_p$ on $\mathbb{Q}$, we have the completion $\mathbb{Q}_p$ ($p$-adic numbers). The compact subset $\mathbb{Z}_p$ ($p$-adic integers) is the closure of $\mathbb{Z}$ in $\mathbb{Q}_p$. If $p,q$ are two distinct primes, then $\mathbb{Q}_p$ is not isomorphic to $\mathbb{Q}_q$ because they are completed using two different places.

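As a striking example, in $\mathbb{Q}_2$, we have

$$
1+2+4+8+\cdots = \sum_{k=0}^{\infty}2^k = -1
$$

because

$$
\left|\sum_{k=0}^{n}2^k-(-1)\right|_2 = |2^{n+1}|_2 = 2^{-(n+1)} \to 0 \quad (n \to \infty).
$$

Unlike that Numberphile video on the “identity” $1+2+\dots=-\frac{1}{12}$, there is nothing sketchy or misleading here.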
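Taking the geometric series $1+2+4+\cdots$ as the example (this choice is an assumption of the sketch), convergence to $-1$ in $\mathbb{Q}_2$ can be observed on the partial sums. The helper `v2` is made up for this illustration.

```python
def v2(n: int) -> int:
    """Exponent of 2 in a non-zero integer n."""
    k = 0
    while n % 2 == 0:
        n //= 2
        k += 1
    return k

# Partial sums s_n = 1 + 2 + ... + 2^n.
partial_sums = [sum(2 ** k for k in range(n + 1)) for n in range(10)]

# 2-adic distance from -1: |s_n - (-1)|_2 = 2^{-v2(s_n + 1)}.
distances = [2.0 ** -v2(s + 1) for s in partial_sums]

print(distances)
```

The distances halve at each step, while the same partial sums diverge for the ordinary absolute value.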

To conclude this post and prepare for future posts, we show that absolute values work fine with norms over a vector space (do not confuse with norms in Galois theory).

**Definition 7.** Let $K$ be a field with absolute value $|\cdot|$ and $E$ be a vector space over $K$. A norm $E \to \mathbb{R}$ compatible with $|\cdot|$ is a function $|\cdot|$ that satisfies

1. $|\xi|\ge 0$ for all $\xi \in E$, and $|\xi|=0$ if and only if $\xi=0$.
2. For all $x \in K$ and $\xi \in E$, $|x\xi|=|x||\xi|$.
3. $|\xi_1+\xi_2| \le |\xi_1|+|\xi_2|$ for all $\xi_1,\xi_2 \in E$.

Two norms $|\cdot|_1$ and $|\cdot|_2$ are **equivalent** if there exist numbers $C_1,C_2>0$ such that for all $\xi \in E$ we have

$$
C_1|\xi|_1 \le |\xi|_2 \le C_2|\xi|_1.
$$

This is an equivalence relation and we have already seen this in elementary linear algebra and functional analysis. This is equivalent to the fact that $|\cdot|_1$ and $|\cdot|_2$ induce the same topology. When the dimension of $E$ is infinite, things are troublesome, as we may need things like the open mapping theorem. For finite dimensional spaces, we can pick a basis $\xi_1,\xi_2,\dots,\xi_n \in E$ so that every $\xi \in E$ can be written in the form

$$
\xi = x_1\xi_1+\dots+x_n\xi_n, \quad x_1,\dots,x_n \in K.
$$

We can define norms like $|\xi|_1=|x_1|+\dots+|x_n|$, $|\xi|_2=\sqrt{|x_1|^2+\dots+|x_n|^2}$ and $|\xi|_\infty=\max\{|x_1|,\dots,|x_n|\}$. In elementary linear algebra, we know that they are equivalent. Now things are the same over a field.

Proposition 5.Let $K$ be a complete field under a non-trivial absolute value $|\cdot|$, and let $E$ be a finite-dimensional space over $K$. Then any two norms on $E$ that are compatible with $|\cdot|$ are equivalent.

*Proof.* It suffices to show that $E \cong K \times \cdots \times K$ topologically under any norm that is compatible with the absolute value. Put $n=\dim E$. If $n=1$ things are trivial. Therefore we assume that $n \ge 2$ and argue by induction on $n$. We need to show that given a basis $\xi_1,\xi_2,\dots,\xi_n$,

$$\xi^{(\nu)}=x_1^{(\nu)}\xi_1+x_2^{(\nu)}\xi_2+\dots+x_n^{(\nu)}\xi_n$$

is a Cauchy sequence (with respect to a norm) in $E$ only if each one of the $n$ sequences $x_i^{(\nu)}$ is a Cauchy sequence in $K$. It suffices to assume that $\xi^{(\nu)}$ converges to $0$ as $\nu \to \infty$ (as we can replace $\xi^{(\nu)}$ with $\xi^{(\nu)}-\xi^{(\mu)}$ for $\nu,\mu \to \infty$ if necessary). Then we must show that each $x_i^{(\nu)}$ converges to $0$ as well.

Suppose this is false for $x_1^{(\nu)}$. Then there exists a number $a>0$ such that $|x_1^{(\nu)}|>a$ for infinitely many $\nu$. Therefore for a subsequence of $(\nu)$, $\xi^{(\nu)}/x_1^{(\nu)}$ converges to $0$, and we can write

$$\frac{\xi^{(\nu)}}{x_1^{(\nu)}}-\xi_1=\frac{x_2^{(\nu)}}{x_1^{(\nu)}}\xi_2+\dots+\frac{x_n^{(\nu)}}{x_1^{(\nu)}}\xi_n.$$

The right hand side lies in the subspace spanned by $\xi_2,\dots,\xi_n$, which has dimension $n-1$ and is therefore complete, hence closed, by the induction hypothesis. Taking the limit, we see $\xi_1$ is a linear combination of $\xi_2,\dots,\xi_n$ and this is absurd. $\square$

We will need this proposition to work with finite field extensions.

- Enrico Bombieri and Walter Gubler, *Heights in Diophantine Geometry*.
- Serge Lang, *Algebra, Revised Third Edition*.
- Dinakar Ramakrishnan and Robert J. Valenza, *Fourier Analysis on Number Fields*.

In our previous post on the irreducible representations of $SU(2)$ and $SO(3)$, the irreducible representations of $SU(2)$ have been determined explicitly: $V_n=\operatorname{Sym}^n\mathbb{C}^2$, and the irreducible representations $W_n$ of $SO(3)$ correspond to $V_{2n}$.

The result is satisfying for $SU(2)$ but not for $SO(3)$. We would hope that the representations of $SO(3)$ have something to do with $\mathbb{R}^3$, but $V_{2n}$ does not. In this post, we deliver a much clearer characterisation of $W_n$.

This post would be relatively easier to read than the previous post. Other than the basic language of representation theory (of Lie groups), only multivariable calculus and linear algebra are needed.

The group $SO(3)$ has a rich background in physics. See, for example, “Why do we look at the representations of $SO(3)$ in QM?“ on Physics Stack Exchange.

Like in the previous post, we first determine a good playground and then show that this is all we need. The playground here is

$$P_\ell=\mathbb{C}\otimes_{\mathbb{R}}\operatorname{Sym}^\ell\mathbb{R}^3.$$

The reason for the symmetric product of $\mathbb{R}^3$ is simple: we will be working on homogeneous polynomials (which will be later reduced to the unit sphere). We complexify this space to make sure that we will not worry about eigenvalues (of $SO(3)$). In other words, $P_\ell$ is the complex vector space of homogeneous polynomials in three variables of degree $\ell$, viewed as functions on $\mathbb{R}^3$.

Recall that

$$\dim_{\mathbb{R}}\operatorname{Sym}^\ell\mathbb{R}^n=\binom{n+\ell-1}{\ell}.$$

Therefore $\dim P_\ell=\frac{(\ell+2)(\ell+1)}{2}$, as a $\mathbb{C}$-vector space. We will extract what we want from spaces of this form.
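The dimension count can be sanity-checked by brute force. The sketch below (the function name `count_monomials` is hypothetical) counts monomials $x_1^{i}x_2^{j}x_3^{k}$ of a given total degree and compares the count with the closed formula $\frac{(\ell+2)(\ell+1)}{2}$.

```python
from math import comb

def count_monomials(num_vars, degree):
    """Count exponent tuples of length num_vars that sum to degree (brute-force recursion)."""
    if num_vars == 1:
        return 1  # the single remaining exponent is forced to equal degree
    return sum(count_monomials(num_vars - 1, degree - first)
               for first in range(degree + 1))

for ell in range(6):
    # Monomials of degree ell in three variables form a basis of P_ell.
    assert count_monomials(3, ell) == comb(ell + 2, 2) == (ell + 2) * (ell + 1) // 2
    print(ell, count_monomials(3, ell))
```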

The action of $SO(3)$, or of $GL(3,\mathbb{R})$ in general, on $P_\ell$ is defined in a similar way. For any $A \in GL(3,\mathbb{R})$ and $f \in P_\ell$, we define

$$(Af)(x)=f(xA).$$

Here, $x=(x_1,x_2,x_3) \in \mathbb{R} \times \mathbb{R} \times \mathbb{R}$, and $xA$ is a product of $x$ and $A$ in the sense of matrix multiplication. It is easy to verify that this indeed gives rise to a group representation.

To study this representation, we need to find some morphisms out of $P_\ell$. The most obvious choice is the Laplacian, which maps $P_\ell$ to $P_{\ell-2}$ and is given by

$$\Delta f=\frac{\partial^2 f}{\partial x_1^2}+\frac{\partial^2 f}{\partial x_2^2}+\frac{\partial^2 f}{\partial x_3^2}.$$

In other words, $\Delta f$ is the trace of the Hessian matrix of $f$. Trace is used in representation theory to define characters, so there is a chance to find a good connection to the representations of $SO(3)$.

We shall also not forget the **kernel** of the Laplacian, whose elements are called **harmonic polynomials** of degree $\ell$ in this context:

$$\mathfrak{H}_\ell=\{f \in P_\ell:\Delta f=0\}.$$

Since functions in $P_\ell$ are homogeneous, the value of $f$ at a point $x$ is determined by the value at $\frac{x}{|x|} \in S^2$, the unit sphere. Therefore we also call $\mathfrak{H}_\ell$ the **spherical harmonics** of degree $\ell$. And we certainly need to know the nullity of $\Delta$.

Lemma 1. The dimension of $\mathfrak{H}_\ell$ is $2\ell+1$.

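*Proof.* To begin with, we need to write a more explicit expression of the Laplacian. First of all we perform a Taylor expansion of $f \in P_\ell$ with respect to the first variable $x_1$:

$$f(x_1,x_2,x_3)=\sum_{k=0}^{\ell}\frac{x_1^k}{k!}f_k(x_2,x_3).$$

Here, $f_k(x_2,x_3)$ is homogeneous of degree $\ell-k$ in $x_2,x_3$. Therefore we only need to study one term of the right hand side:

$$\Delta\left(\frac{x_1^k}{k!}f_k\right)=\frac{x_1^{k-2}}{(k-2)!}f_k+\frac{x_1^k}{k!}\left(\frac{\partial^2 f_k}{\partial x_2^2}+\frac{\partial^2 f_k}{\partial x_3^2}\right),$$

where the first term on the right is understood to be $0$ when $k<2$.

Now we can put them together naturally:

$$\Delta f=\sum_{k=2}^{\ell}\frac{x_1^{k-2}}{(k-2)!}f_k+\sum_{k=0}^{\ell}\frac{x_1^k}{k!}\left(\frac{\partial^2 f_k}{\partial x_2^2}+\frac{\partial^2 f_k}{\partial x_3^2}\right).$$

Let’s try to explore the last term a little bit more. If $k=\ell-1$ or $\ell$, then $f_k$ is of degree $1$ or $0$ respectively, and consequently its second order derivatives are $0$. Therefore we write

$$\Delta f=\sum_{k=0}^{\ell-2}\frac{x_1^k}{k!}\left[f_{k+2}+\frac{\partial^2 f_k}{\partial x_2^2}+\frac{\partial^2 f_k}{\partial x_3^2}\right].$$

Therefore $\Delta{f}=0$ if and only if

$$f_{k+2}=-\left(\frac{\partial^2 f_k}{\partial x_2^2}+\frac{\partial^2 f_k}{\partial x_3^2}\right), \quad 0 \le k \le \ell-2.$$

Therefore, once $f_0$ and $f_1$ are determined, all of the $f_k$ are determined and so is $f$ itself. Therefore

$$\mathfrak{H}_\ell \cong P_\ell^2 \oplus P_{\ell-1}^2, \quad f \mapsto (f_0,f_1),$$

where $P_k^2$ is the space of homogeneous polynomials of degree $k$ in two variables, hence is isomorphic to $\mathbb{C} \otimes_\mathbb{R}\operatorname{Sym}^k \mathbb{R}^2$, and we therefore have

$$\dim P_k^2=k+1.$$

Hence

$$\dim\mathfrak{H}_\ell=(\ell+1)+\ell=2\ell+1.$$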

$\square$
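As an independent check of Lemma 1, the nullity of the Laplacian on degree-$\ell$ polynomials can be computed by plain linear algebra. The following is a sketch under the assumption that monomials are used as bases of $P_\ell$ and $P_{\ell-2}$; all function names are hypothetical, and exact rational arithmetic is used so that the rank is not affected by rounding.

```python
from fractions import Fraction

def monomials(deg):
    """Exponent triples (i, j, k) with i + j + k == deg."""
    return [(i, j, deg - i - j) for i in range(deg + 1) for j in range(deg + 1 - i)]

def laplacian_of_monomial(e):
    """Laplacian of x1^i x2^j x3^k as a dict {exponent triple: coefficient}."""
    out = {}
    for axis in range(3):
        n = e[axis]
        if n >= 2:
            lowered = list(e)
            lowered[axis] -= 2
            out[tuple(lowered)] = n * (n - 1)
    return out

def rank(rows):
    """Rank of a matrix given as a list of rows, by Gaussian elimination over Q."""
    rows = [[Fraction(x) for x in r] for r in rows]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            if rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def harmonic_dimension(deg):
    """Nullity of the Laplacian restricted to homogeneous polynomials of degree deg."""
    source = monomials(deg)
    if deg < 2:
        return len(source)  # the Laplacian kills everything of degree 0 or 1
    target = {m: idx for idx, m in enumerate(monomials(deg - 2))}
    rows = []
    for e in source:
        row = [0] * len(target)
        for m, coeff in laplacian_of_monomial(e).items():
            row[target[m]] = coeff
        rows.append(row)
    return len(source) - rank(rows)

print([harmonic_dimension(ell) for ell in range(6)])
```

The printed list agrees with $2\ell+1$ for $\ell=0,\dots,5$; the rank over $\mathbb{Q}$ equals the rank over $\mathbb{C}$, so working with rational coefficients loses nothing here.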

Recall that $\dim W_n=2n+1$. This should not be a coincidence, and we shall dive into it right now. To do this we immediately establish the connection between $\Delta$ and $SO(3)$.

Lemma 2. The action of the Laplacian on $C^\infty(\mathbb{R}^3;\mathbb{C})$ (which contains $P_\ell$ for all $\ell \ge 0$) commutes with the action of $SO(3)$, i.e. $\Delta$ is $SO(3)$-equivariant.

*Proof.* Really routine verification. $\square$

As a result, we have a very important result:

Theorem 1. The space $\mathfrak{H}_\ell$ is an $SO(3)$-invariant subspace of $P_\ell$.

We start with a direct observation of matrices in $SO(3)$, that ‘downgrades’ the rotation to a plane:

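Lemma 3. Every element in $SO(3)$ is conjugate to $R(t)$ for some $t \in \mathbb{R}$, where

$$R(t)=\begin{pmatrix}1&0&0\\0&\cos t&-\sin t\\0&\sin t&\cos t\end{pmatrix}.$$

*Proof.* Pick any $A \in SO(3)$. First of all we show that $A$ has an eigenvalue $1$. Note

$$\det(I-A)=\det(A^T)\det(I-A)=\det(A^T-I)=\det(A-I)=(-1)^3\det(I-A)=-\det(I-A),$$

we therefore have $\det(I-A)=0$. Hence we can pick $v_1 \in \ker(I-A) \setminus \{0\}$ with norm $1$. Pick $v_2 \in \mathbb{R}^3$ perpendicular to $v_1$ with norm $1$ and $v_3=v_1 \times v_2$. Then $\{v_1,v_2,v_3\}$ is an orthonormal basis and $V=(v_1,v_2,v_3)$ is in $SO(3)$. The matrix $A$ is therefore conjugate to

$$R=V^{-1}AV=\begin{pmatrix}1&0&0\\0&a&b\\0&c&d\end{pmatrix}.$$

Indeed, the first column is determined by $Av_1=v_1$, and the first row follows because $R$ is orthogonal, so each of its rows has norm $1$. In particular, $R \in SO(3)$ also implies

$$a^2+c^2=1,\quad b^2+d^2=1,\quad ab+cd=0,\quad ad-bc=1.$$

Solving this equation system we must have $a=d$, $b=-c$ so that we can assign $a=\cos{t}$ and $c=\sin{t}$, and the result follows. $\square$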
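The properties of $R(t)$ used here are easy to confirm numerically. The sketch below assumes $R(t)$ is the rotation about the first coordinate axis, with lower-right block $\begin{pmatrix}\cos t&-\sin t\\ \sin t&\cos t\end{pmatrix}$ (this matches $a=d=\cos t$, $c=-b=\sin t$ in the proof); the helper names are hypothetical.

```python
import math

def R(t):
    """Rotation by angle t about the first axis (assumed form of R(t))."""
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def det3(A):
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
            - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
            + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))

def close(A, B, tol=1e-12):
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(3) for j in range(3))

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = 0.7
print(close(matmul(R(t), transpose(R(t))), identity))  # orthogonality
print(abs(det3(R(t)) - 1.0) < 1e-12)                   # determinant one
print(close(matmul(R(0.4), R(0.3)), R(0.7)))           # R(s) R(t) = R(s + t)
```

The last check shows that $t \mapsto R(t)$ is a homomorphism, so the matrices $R(t)$ form a subgroup of $SO(3)$ isomorphic to a circle.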

Since characters are invariant under conjugation, the study of the characters of $SO(3)$ is reduced to $T$, the subgroup consisting of matrices of the form $R(t)$. But direct computation is a nightmare so we try our best to do it more elegantly. For this reason, we return to the irreducible representations of $SU(2)$ (there are only two variables, anyway). The canonical map $SU(2) \to SO(3)$ has a specific value:

$$e(t/2) \mapsto R(t), \quad \text{where } e(s)=\begin{pmatrix}e^{is}&0\\0&e^{-is}\end{pmatrix}.$$

One can refer to this document for the map above. Our study of characters is now reduced to $SU(2)$, because $\chi_{W_n}(R(t))=\chi_{V_{2n}}(e(t/2))$, using the facts that characters are invariant under isomorphism and that $V_{2n} \cong W_n$. We can compute that

$$\chi_{W_n}(R(t))=\sum_{k=0}^{2n}e^{i(2n-2k)t/2}=\sum_{k=-n}^{n}e^{ikt}.$$

Now we are ready for the irreducible representations of $SO(3)$.

Since we basically have $\dim \mathfrak{H}_\ell=\dim W_\ell$, it is natural to believe that $\mathfrak{H}_\ell \cong W_\ell$, in the sense of $SO(3)$-modules, and the following theorem answers this question affirmatively.

Theorem 2. The space $\mathfrak{H}_\ell$ is isomorphic to $W_\ell$. In other words, irreducible $SO(3)$-modules are determined by spherical harmonics whose Laplacians are $0$.

*Proof.* We will use the fact that every finite dimensional representation of a compact Lie group is completely reducible. (First of all, $SO(3)$ is compact as it is a closed and bounded subset of the space of $3 \times 3$ real matrices; see p. 3 of this document. On the other hand, the fact that every representation of a compact Lie group is completely reducible can be found in section 3 of this document).

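Therefore we have

$$\mathfrak{H}_\ell \cong \bigoplus_{\nu}W_{n_\nu}$$

where each $W_{n_\nu}$ is an irreducible representation of $SO(3)$. Applying dimension on both sides yields

$$2\ell+1=\sum_{\nu}(2n_\nu+1).$$

To prove that $\mathfrak{H}_\ell \cong W_{\ell}$, it suffices to show that $n_\nu \ge \ell$ for some $n_\nu$, because then $2n_\nu+1 \ge 2\ell+1$ leaves no room for any other summand. On the other hand, applying characters on both sides yields

$$\chi_{\mathfrak{H}_\ell}(R(t))=\sum_{\nu}\sum_{k=-n_\nu}^{n_\nu}e^{ikt}.$$

In other words, the character is a linear combination of $\exp(ikt)$ with $|k| \le \max_\nu n_\nu$ for all $k$. Each $\exp(ikt)$ appearing above is an eigenvalue of the action of $R(t)$. We have no idea about the distribution of $k$ and we don’t have to because our job is done if we can show that the action of $R(t)$ has an eigenvalue $\exp(-i\ell t)$.

To do this, we can consider the vector $f(x_1,x_2,x_3)=(x_2+ix_3)^\ell$, which lies in $\mathfrak{H}_\ell$ because $\Delta f=\ell(\ell-1)(1+i^2)(x_2+ix_3)^{\ell-2}=0$. Since $xR(t)=(x_1,\,x_2\cos t+x_3\sin t,\,-x_2\sin t+x_3\cos t)$, for this vector we have

$$R(t)f(x)=f(xR(t))=\left(e^{-it}(x_2+ix_3)\right)^\ell=e^{-i\ell t}f(x).$$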

$\square$
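The eigenvector computation for $f=(x_2+ix_3)^\ell$ can be checked numerically. This is a sketch assuming the action $(R(t)f)(x)=f(xR(t))$ on row vectors and the rotation $R(t)$ about the first axis with lower-right block $\begin{pmatrix}\cos t&-\sin t\\ \sin t&\cos t\end{pmatrix}$; the function names are hypothetical.

```python
import cmath
import math

def rotate_row(x, t):
    """Return the row vector x R(t) for the assumed rotation R(t) about the first axis."""
    x1, x2, x3 = x
    return (x1,
            x2 * math.cos(t) + x3 * math.sin(t),
            -x2 * math.sin(t) + x3 * math.cos(t))

def f(x, ell):
    """The polynomial (x2 + i x3)^ell evaluated at a real point x."""
    return complex(x[1], x[2]) ** ell

x, t, ell = (0.3, -1.2, 0.7), 0.9, 4
lhs = f(rotate_row(x, t), ell)              # (R(t) f)(x) = f(x R(t))
rhs = cmath.exp(-1j * ell * t) * f(x, ell)  # exp(-i ell t) f(x)
print(abs(lhs - rhs) < 1e-12)
```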

There are no even-dimensional irreducible representations of $SO(3)$. This is what every reader has to take away from this post.

We find the eigenvalue because it shows that $\exp(-i\ell t)$ appears in the summand of $\chi_{\mathfrak{H}_\ell}(R(t))$, hence $|-\ell|=\ell \le \max n_\nu$. Since $\{n_\nu\}$ is finite, the maximum can be attained, and therefore our argument on dimension is done.

The representations of $U(2)$ can be deduced algebraically, for one only needs to notice that $U(2) = (S^1 \times SU(2))/H$, where $H=\{(1,I),(-1,-I)\}$. One will also need an odd-even argument just like the one we used for $SO(3)$.

Likewise, since $O(3)=SO(3) \times \mathbb{Z}/2\mathbb{Z}$, we can deduce irreducible representations of $O(3)$ in a similar fashion.

Representation theory is important in various branches of mathematics and physics. When studying representations of finite groups, we have quite some algebra and combinatorics. When differentiation (more precisely, smoothness) joins the party, we have Lie groups, involving calculus, linear algebra, geometry and much more. In particular, theories around $SU(2)$ and $SO(3)$ are of great importance. On one hand, they are the simplest non-elementary and high-dimensional Lie groups. On the other hand, they describe rotations in $\mathbb{C}^2$ and $\mathbb{R}^3$ respectively, which is “physically realistic”. I believe students in physics have more to say.

In this post we develop a way to study irreducible representations of these two Lie groups, in a mathematician’s way. I try my best to make sure that everything is down-to-earth, and everything can be “reduced” to 19th-century (pre-modern) mathematics.

Nevertheless, the reader is assumed to be familiar with the elementary language of representation theory (and, as you know, there is a lot of abuse of language), which I think is not a problem because otherwise you wouldn’t be reading this post. You need to recall eigenvalue theory in linear algebra, as well as Fourier series. We need the fact that the trigonometric system is complete. In other words, trigonometric polynomials are dense in the space of continuous periodic functions. $\def\sym{\operatorname{Sym}}$

We will first study $SU(2)$ and a first classification of irreducible representations of $SO(3)$ follows at once. This is because we have an isomorphism

$$SU(2)/\{I,-I\} \cong SO(3).$$

This is to say, $SU(2)$ is a “double cover” of $SO(3)$. To see this, notice that $SU(2) \cong S^3$ and $SO(3) \cong \mathbb{R}P^3$ as Lie groups, meanwhile $\mathbb{R}P^3 \cong S^3/\{-1,1\}$ can be considered as the definition.

Of course, by representation we mean finite dimensional and unitary representations.

Indeed it seems we have nowhere to start. Instead of trying to find all of them, we will try to work on seemingly immediate representations and it turns out that they are all we are looking for.

Let $V_0$ be the trivial representation on $\mathbb{C}$ and $V_1$ be the standard representation on $\mathbb{C}^2$, which is given by ordinary matrix multiplication. These representations are irreducible. We want to extend this family to $V_n$ for $n \ge 2$. It is natural to think about generating representations of higher dimensions from $V_1$. Here are several ways available.

- Direct sum: $\bigoplus_{i=1}^{n}V_1$. The dimension is $2n$ and unfortunately, the representation is determined by each component so essentially there is no “new thing”.
- Tensor product: $\bigotimes_{i=1}^{n}V_1$. The dimension is $2^n$ which is way too big.
- Wedge product: $\bigwedge^{n}V_1$. It stops at $n=2$ and we have to deal with $u \wedge v = - v \wedge u$. This can be annoying.
- Symmetric product: $\sym^{n}V_1$. The dimension is $n+1$ and it doesn’t stop. Besides, it can be understood as homogeneous polynomials of degree $n$ in two variables. This is a fantastic choice. Besides we have $\sym^0 V_1=V_0$ so nothing is abruptly excluded.

Put $V_n=\sym^nV_1$, which can be understood as the space of homogeneous polynomials of degree $n$ in variables $z_1$ and $z_2$. $V_n$ therefore has a canonical basis

$$P_k=z_1^kz_2^{n-k}, \quad 0 \le k \le n.$$

And we will make use of it later.

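For each $g \in SU(2)$, we have a left action

$$\rho(g):V_n \to V_n, \quad P \mapsto P \circ g.$$

In other words, $\rho(g)P(z)=P(zg)$ where $z=(z_1,z_2)$ and $zg$ is matrix multiplication. Each $g \in SU(2)$ has matrix representation

$$g=\begin{pmatrix}a&b\\-\overline{b}&\overline{a}\end{pmatrix}, \quad |a|^2+|b|^2=1.$$

Then

$$\rho(g)P(z_1,z_2)=P(az_1-\overline{b}z_2,\,bz_1+\overline{a}z_2).$$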

When there is no confusion, we will write $gP(z)=P(zg)$, viewing $g$ itself as an automorphism of $V_n$. One can also replace $SU(2)$ with $GL(2,\mathbb{C})$ but we are not studying that bigger one.

Since $z \mapsto zg$ is a homogeneous map of degree $1$ as it is linear and is non-degenerate, we have $gP(z) \in V_n$. In other words, $V_n$ are $SU(2)$-invariant. **We now have a well-defined representation.** Note $V_0=\mathbb{C}$ so the representation is trivial, and $V_1=\mathbb{C}^2$ yields linear maps. Again, nothing is abruptly excluded. Even more satisfyingly, those $V_n$ are all irreducible.

Proposition 1. The representations $V_n$ are irreducible.

*Proof.* By Schur’s lemma, we need to show that each $SU(2)$-equivariant automorphism $A$ of $V_n$ is a non-zero multiple of the identity, i.e. $A=\lambda I$ for some $\lambda \ne 0$. By definition, for each $g \in SU(2)$, we have $A\rho(g)P=\rho(g)AP$ for all $P \in V_n$. And for simplicity we write $Ag=gA$, realising $g$ as a linear transform of $V_n$, instead of an element of $SU(2)$.

The group $SU(2)$ can be complicated, but $U(1) \cong S^1$ is simple and can be considered as a subgroup of $SU(2)$ in two ways. We show that these two ways are just enough to expose the irreducibility of $V_n$.

First of all we embed $S^1$ into $SU(2)$ by

$$a \mapsto \begin{pmatrix}a&0\\0&a^{-1}\end{pmatrix}.$$

Call the matrix on the right hand side $g_a$. Then

$$g_aP_k=a^{2k-n}P_k$$

for all $k$. This is to say, $P_k$ is an **eigenvector** corresponding to the **eigenvalue** $a^{2k-n}$. As $g_aA=Ag_a$, information on eigenvalues and eigenvectors can help a lot so we dig into it first.

Since $\{P_k\}$ are linearly independent, under this basis, we have a matrix representation

$$g_a=\operatorname{diag}(a^{-n},a^{-n+2},\dots,a^{n-2},a^{n})$$

but we don’t know how eigenspaces are spanned because we may have $a^j=a^k$ for $j \ne k$. However, the number $a$ can always be chosen so that $a^{-n},a^{-n+2},\dots,a^n$ are pairwise distinct (for example, one can pick $a$ to be a primitive $m$-th root of $1$ with $m$ big enough). As a result, $g_a$ has $n+1$ distinct eigenvalues. Therefore, the $a^{2k-n}$-eigenspace can only be generated by $P_k$.

On the other hand, by definition of $A$, we have

$$g_a(AP_k)=A(g_aP_k)=a^{2k-n}AP_k.$$

Hence $AP_k$ lies in the $a^{2k-n}$-eigenspace. Therefore we have $AP_k=c_kP_k$ for some $c_k \ne 0$. In other words, $P_k$ is a $c_k$-eigenvector of $A$. We obtain another matrix representation under the basis $\{P_k\}$

$$A=\operatorname{diag}(c_0,c_1,\dots,c_n).$$

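We want this matrix to be a scalar matrix. The result follows from another embedding of $U(1)$ into $SU(2)$. Note $a \in S^1$ can be determined by $t \in [0,2\pi)$, and we therefore have a matrix

$$g_t=\begin{pmatrix}\cos t&-\sin t\\ \sin t&\cos t\end{pmatrix} \in SU(2).$$

Still we have $Ag_t=g_tA$. As we can see, writing $P_k=z_1^kz_2^{n-k}$,

$$Ag_tP_n=A(z_1\cos t+z_2\sin t)^n=\sum_{k=0}^{n}\binom{n}{k}\cos^k t\,\sin^{n-k}t\;c_kP_k.$$

This follows from our observation on eigenvalues. Next, we immediately use the eigenvalue $c_n$ to obtain

$$g_tAP_n=c_ng_tP_n=\sum_{k=0}^{n}\binom{n}{k}\cos^k t\,\sin^{n-k}t\;c_nP_k.$$

This is the definition of $g_tP_n$. Picking $t$ with $\cos t\sin t \ne 0$ and comparing coefficients of $P_k$, we must have $c_k=c_n$ for all $0 \le k \le n$. Recall that $\{P_k\}$ is a basis so coefficients must be unique for a given vector. But we have already obtained what we want: $A=c_n I$. $\square$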

So far we have used diagonalisation of representations of $SU(2)$ but the diagonalisation of $SU(2)$ itself is not touched yet. Neither have we made use of character functions. So now we invite them to the party.

Let’s recall diagonalisation in $SU(2)$. Pick $g \in SU(2)$. First of all it is diagonalisable. Let $\lambda_1$ and $\lambda_2$ be its two eigenvalues, then $\det g=\lambda_1\lambda_2=1$. Therefore we have

$$g \sim \begin{pmatrix}\lambda&0\\0&\lambda^{-1}\end{pmatrix}$$

where $\lambda$ is one of the eigenvalues of $g$. Since the diagonalised matrix is still in $SU(2)$, we have $|\lambda|=1$, i.e., $\lambda \in S^1$. We therefore write $g \sim e(t) \sim e(-t)$ where

$$e(t)=\begin{pmatrix}e^{it}&0\\0&e^{-it}\end{pmatrix}.$$

We see, $e(s) \sim e(t)$ if and only if $s = \pm t \mod 2\pi$. By periodicity of $\exp$ function, we also see $e(t)$ is in particular $2\pi$-periodic. If $f:SU(2) \to \mathbb{C}$ is a class function, then $f \circ e:\mathbb{R} \to \mathbb{C}$ is an even $2\pi$-periodic function. Conversely, given an even $2\pi$-periodic function $h:\mathbb{R} \to \mathbb{C}$, we can recover it as a class function, and the process is as follows.

Define $\Lambda:SU(2) \to S^1$ sending $g \in SU(2)$ to the eigenvalue of $g$ with non-negative imaginary part (one can also pick the non-positive one, because $h$ is even). Then $E:SU(2) \to [0,\pi]$ given by $g \mapsto \frac{1}{i}\log\Lambda(g)$ is a well defined function sending $g$ into $\mathbb{R}$ and $h \circ E:SU(2) \to \mathbb{C}$ is a class function. Besides we have $E \circ e(t)= \pm t \mod 2\pi$ and $e \circ E(g)$ is the diagonalisation of $g$. Therefore $h \circ E \circ e(t)=h(t)$ and $f \circ e \circ E(g)=f(g)$ as is expected.

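With the help of $e$ and $E$, we have this correspondence

$$\left\{\text{continuous class functions } SU(2) \to \mathbb{C}\right\} \longleftrightarrow \left\{\text{even } 2\pi\text{-periodic continuous functions } \mathbb{R} \to \mathbb{C}\right\}.$$

Recall that the space on the right hand side has a countable uniform basis

$$1,\ \cos t,\ \cos 2t,\ \dots,\ \cos nt,\ \dots$$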

In other words, $\{\cos{nt}\}_{n \ge 0}$ spans a dense subspace. This is about the completeness of trigonometric system. Since there are only even functions, $\sin{nt}$ are excluded. For a reference to the completeness, one can check 4.25 *Real and Complex Analysis* by W. Rudin.

For class functions, we certainly want to know about characters. Let $\chi_n$ be the character of $V_n$, then

$$\chi_n(e(t))=\sum_{k=0}^{n}e^{i(n-2k)t}.$$

When $t \in \pi\mathbb{Z}$, then $\chi_n(e(t)) \in \mathbb{Z}$. Otherwise, as a classic exercise in calculus, we have

$$\kappa_n(t):=\chi_n(e(t))=\frac{\sin((n+1)t)}{\sin t}.$$

We have $\kappa_0(t)=1$. For $\kappa_n(t)$ when $n >0$, we have

$$\kappa_n(t)=\cos(nt)+\cos t\cdot\kappa_{n-1}(t).$$

We see $\kappa_1(t)=2\cos{t}$. By induction, together with the product-to-sum formula $2\cos t\cos kt=\cos(k+1)t+\cos(k-1)t$, every $\kappa_n(t)$ is a linear combination of $1,\cos{t},\dots,\cos{nt}$. Therefore $\{\kappa_n(t)\}_{n \ge 0}$ spans the same space as $\{\cos{nt}\}_{n \ge 0}$, which is dense in the space of even $2\pi$-periodic functions. Note the $\kappa_n(t)$ are linearly independent, because the leading term of $\kappa_n$ is a non-zero multiple of $\cos{nt}$.
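The closed form of the character can be compared against the defining sum numerically. The sketch below assumes that $e(t)$ acts on $V_n$ with eigenvalues $e^{i(n-2k)t}$ for $k=0,\dots,n$ (as in the proof of Proposition 1) and that $\kappa_n(t)=\sin((n+1)t)/\sin t$; the function names are hypothetical.

```python
import cmath
import math

def character_sum(n, t):
    """Trace of e(t) on V_n: the sum of the eigenvalues exp(i (n - 2k) t), k = 0..n."""
    return sum(cmath.exp(1j * (n - 2 * k) * t) for k in range(n + 1))

def kappa(n, t):
    """Closed form sin((n + 1) t) / sin(t), valid when t is not a multiple of pi."""
    return math.sin((n + 1) * t) / math.sin(t)

t = 0.37
for n in range(6):
    print(n, abs(character_sum(n, t) - kappa(n, t)) < 1e-12)
```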

The argument above shows that $\chi_n$ spans a dense subspace in the space of class functions. In other words, $\chi_n$ is the Fourier basis of class functions. As we all know, Fourier series is powerful. Let’s see how powerful it is in the calculus of the Lie group $SU(2)$ itself.

Proposition 2. For every continuous class function $f:SU(2) \to \mathbb{C}$, we have

$$\int_{SU(2)}f(g)\,dg=\frac{1}{\pi}\int_{-\pi}^{\pi}f(e(t))\sin^2{t}\,dt,$$

where $dg$ is the normalised Haar measure.

*Proof.* On one hand, since the $V_n$ are irreducible, by the fixed point theorem of representations,

$$\int_{SU(2)}\chi_n(g)\,dg=\dim V_n^{SU(2)}=\begin{cases}1,&n=0,\\0,&n>0.\end{cases}$$

Here, for a group $G$ and a representation $V$, $V^G$ is the fixed point set, i.e. the space of elements that are fixed by the action of $G$ on $V$. Since $V_n$ is irreducible, fixed points can only be $0$ unless the representation itself is trivial. Now we move on and check the right hand side.

On the right hand side we are looking at even $2\pi$-periodic continuous functions, reflecting the denseness of $\kappa_n(t)$. However we have $\int_{-\pi}^{\pi}\kappa_2(t)dt=2\pi$, so the plain integral of $\kappa_n$ does not vanish for all $n>0$. However, if we multiply it by $\sin^2{t}$, then it is transformed into the form $\sin{mt}\sin{nt}$ and we are familiar with this orthonormality. More precisely,

$$\frac{1}{\pi}\int_{-\pi}^{\pi}\kappa_n(t)\sin^2{t}\,dt=\frac{1}{\pi}\int_{-\pi}^{\pi}\sin((n+1)t)\sin{t}\,dt=\begin{cases}1,&n=0,\\0,&n>0.\end{cases}$$

Since the functional $h \mapsto \frac{1}{\pi}\int_{-\pi}^{\pi}h\sin^2{t}dt$ is continuous in the uniform topology and $\kappa_n$ spans a dense subspace, the result is now obtained. $\square$
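The orthogonality that drives this proof can be checked with a simple quadrature. The sketch below assumes $\kappa_n(t)=\sin((n+1)t)/\sin t$ and the weight $\frac{1}{\pi}\sin^2 t$ on $[-\pi,\pi]$ (a normalisation chosen so that the constant function $1$ has integral $1$); the function names are hypothetical. The midpoint rule is used because it never samples the points where $\sin t=0$.

```python
import math

def kappa(n, t):
    """sin((n + 1) t) / sin(t), evaluated away from multiples of pi."""
    return math.sin((n + 1) * t) / math.sin(t)

def weighted_integral(n, panels=2000):
    """Midpoint-rule approximation of (1/pi) * integral over [-pi, pi] of kappa_n(t) sin^2(t) dt."""
    h = 2 * math.pi / panels
    total = 0.0
    for j in range(panels):
        t = -math.pi + (j + 0.5) * h  # midpoints avoid t = 0 and t = +-pi
        total += kappa(n, t) * math.sin(t) ** 2
    return total * h / math.pi

for n in range(5):
    print(n, round(weighted_integral(n), 9))
```

Only $n=0$ gives a non-zero value, matching $\dim V_n^{SU(2)}$ on the left hand side of the proposition.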

Finally, surprisingly and satisfyingly enough, the denseness has actually ruled out all other possibilities of irreducible representations. In other words, our search in symmetric products is optimal. We can see this through Parseval’s identity. This is the heart of this blog post.

Proposition 3. Every irreducible representation of $SU(2)$ is isomorphic to one of the $V_n$.

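*Proof.* Suppose we have an irreducible character $\chi$ that is different from all of the $\chi_n$. Then the orthonormality shows that $\langle \chi,\chi_n \rangle = 0$ for all $n \ge 0$ and $\langle \chi,\chi \rangle=1$. Now let’s see why this is absurd.

Since $\{\chi_n\}_{n \ge 0}$ spans a dense subspace in the space of class functions, we actually have Parseval’s identity

$$\langle f,f \rangle=\sum_{n \ge 0}|\langle f,\chi_n \rangle|^2$$

for every continuous class function $f$. Therefore

$$1=\langle \chi,\chi \rangle=\sum_{n \ge 0}|\langle \chi,\chi_n \rangle|^2$$

and

$$\sum_{n \ge 0}|\langle \chi,\chi_n \rangle|^2=\sum_{n \ge 0}0=0.$$

It is impossible to have the sum of $0$ to be $1$. $\square$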

Now we head to $SO(3)$. In fact the result follows immediately from the surjection

$$\pi:SU(2) \to SO(3).$$

We have $\ker\pi=\{-I,I\}$. Let $W$ be a representation of $SO(3)$, i.e., we have a map

$$\rho:SO(3) \to GL(W).$$

Then

$$\pi^\ast\rho=\rho \circ \pi:SU(2) \to GL(W)$$

given by $g \mapsto \rho(\pi(g))$ is an induced representation, and we write $\pi^\ast W$. If $W$ is irreducible, then $\pi^\ast W$ is also irreducible. In particular, $\pi^\ast\rho(-I)=\operatorname{id}_W$.

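On the other hand, if $\vartheta:SU(2) \to GL(V)$ is an irreducible representation where $\vartheta(-I)=\operatorname{id}_V$, then we have an associated representation

$$SO(3) \cong SU(2)/\ker\pi \to GL(V)$$

given by $g\ker\pi \mapsto \vartheta(g)$. Let’s denote it by $\pi_\ast V$. Again, if $V$ is irreducible, then $\pi_\ast V$ is irreducible.

Therefore we have realised a correspondence

$$\left\{\text{irreducible representations of } SO(3)\right\} \longleftrightarrow \left\{\text{irreducible representations of } SU(2) \text{ on which } -I \text{ acts as the identity}\right\}.$$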

So it remains to determine those of $SU(2)$. Let $\rho_n:SU(2) \to GL(V_n)$ be an irreducible representation, then

$$\rho_n(-I)P(z)=P(-z)=(-1)^nP(z)$$

because $P \in \mathbb{C}[z_1,z_2]$ is homogeneous of degree $n$. Therefore $-I$ acts as an identity if and only if $n$ is even. We obtain

Proposition 4. Every irreducible representation of $SO(3)$ is of the form

$$W_n=\pi_\ast V_{2n}, \quad n \ge 0,$$

where $V_{2n}$ is described in Proposition 3.

This is, of course, just a first classification. But to introduce a classification as explicit as what we have done for $SU(2)$, there has to be another post. As a quick overview, here is the result.

Let $P_{\ell}$ be the complex vector space of homogeneous polynomials in three variables of degree $\ell$, which can be considered as functions on $\mathbb{R}^3$. This setting makes sense immediately, just as what we have done for $SU(2)$. Then, in fact,

$$W_\ell \cong \{f \in P_\ell:\Delta f=0\}.$$

This is to say, $W_\ell$ can be understood as harmonic homogeneous polynomials in $\mathbb{R}^3$, which can also be considered to be uniquely determined on the unit sphere $S^2$.

- Theodor Bröcker and Tammo tom Dieck, *Representations of Compact Lie Groups*.
- Walter Rudin, *Real and Complex Analysis, 3rd Edition*.

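We want to compute the Fourier transform

$$\hat{f}_c(t)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}f_c(x)e^{-itx}dx, \quad \text{where } f_c(x)=e^{-cx^2},\ c>0.$$

As one can expect, the computation can be quite interesting, as $f_c(x)$ is related to the Gaussian integral in the following way:

$$\int_{-\infty}^{\infty}f_c(x)dx=\frac{1}{\sqrt{c}}\int_{-\infty}^{\infty}e^{-u^2}du=\sqrt{\frac{\pi}{c}}.$$

Now we dive into this integral and see what we can get.

Let’s admit, trying to compute the integral directly is somewhat unrealistic. So we need to go through an alternative way. For convenience (of writing MathJax codes) we may write $\varphi(t)=\hat{f}_c(t)$.

First of all, $\hat{f}_c(t)$ is always well-defined, this is because

$$|\hat{f}_c(t)| \le \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}|f_c(x)e^{-itx}|dx=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}f_c(x)dx=\frac{1}{\sqrt{2c}}<\infty$$

so we can compute it without worrying about anything.

It is hard to think of, but $\varphi$ does satisfy a differential equation. An integration by parts gives

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}f_c'(x)e^{-itx}dx=\frac{it}{\sqrt{2\pi}}\int_{-\infty}^{\infty}f_c(x)e^{-itx}dx=it\varphi(t), \quad \text{where } f_c'(x)=-2cxf_c(x).$$

On the other hand, we have

$$\varphi'(t)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}(-ix)f_c(x)e^{-itx}dx=\frac{i}{2c}\cdot\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}f_c'(x)e^{-itx}dx.$$

(The well-definedness of the integral can be verified easily.) Combining both, we obtain a differential equation

$$\varphi'(t)=-\frac{t}{2c}\varphi(t).$$

This differential equation corresponds to an integral equation

$$\int\frac{\varphi'(t)}{\varphi(t)}dt=-\frac{1}{2c}\int t\,dt.$$

And we solve it to obtain

$$\log\varphi(t)=-\frac{t^2}{4c}+C$$

or alternatively,

$$\varphi(t)=\varphi(0)\exp\left(-\frac{t^2}{4c}\right).$$

Now put the initial value back in. As we have shown above, this reduces to the Gaussian integral

$$\varphi(0)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}f_c(x)dx=\frac{1}{\sqrt{2c}}.$$

Therefore

$$\hat{f}_c(t)=\frac{1}{\sqrt{2c}}\exp\left(-\frac{t^2}{4c}\right)$$

is exactly what we want.

Before showing another method, we first have a question: can we have $\hat{f}_c=f_c$? Solving an equation in the variable $c$ answers this question affirmatively:

$$\frac{1}{\sqrt{2c}}\exp\left(-\frac{t^2}{4c}\right)=\exp(-ct^2) \text{ for all } t \iff c=\frac{1}{2}.$$

In other words, $f_{1/2}$ is a fixed point of the Fourier transform. For this class of functions, the fixed point is this and only this one.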
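The transform of the Gaussian can be verified numerically. The sketch below assumes $f_c(x)=e^{-cx^2}$ with $c>0$ and the normalisation $\hat{f}(t)=\frac{1}{\sqrt{2\pi}}\int f(x)e^{-itx}dx$ (the only common convention under which $f_{1/2}$ is a fixed point); the function names and the truncation window are illustrative choices. Since $f_c$ is even, only the cosine part of $e^{-itx}$ contributes.

```python
import math

def fourier_gaussian(c, t, half_width=12.0, steps=48000):
    """Trapezoidal approximation of (1/sqrt(2 pi)) * integral of exp(-c x^2) exp(-i t x) dx."""
    h = 2 * half_width / steps
    total = 0.0
    for j in range(steps + 1):
        x = -half_width + j * h
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * math.exp(-c * x * x) * math.cos(t * x)
    return total * h / math.sqrt(2 * math.pi)

def closed_form(c, t):
    """Candidate closed form exp(-t^2 / (4c)) / sqrt(2c)."""
    return math.exp(-t * t / (4 * c)) / math.sqrt(2 * c)

for c, t in [(0.5, 1.3), (2.0, 0.7), (1.0, 0.0)]:
    print(c, t, abs(fourier_gaussian(c, t) - closed_form(c, t)) < 1e-9)
```

For $c=\frac12$ the closed form is $e^{-t^2/2}$, which is $f_{1/2}(t)$ again.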

We can also make use of the Gaussian integral to get what we want.

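As a classic property of the Fourier transform, for $f,g \in L^1$, we have

$$\widehat{f \ast g}=\hat{f}\hat{g}$$

where

$$(f \ast g)(x)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}f(x-y)g(y)dy.$$

By the way, $f \in L^1$ means $\int_{-\infty}^{\infty}|f(x)|dx<\infty$. One can verify that $f \ast g \in L^1$ here as well.

With this result, we can compute $f_a \ast f_b$ easily. Note that, using $\hat{f}_c(t)=\frac{1}{\sqrt{2c}}\exp\left(-\frac{t^2}{4c}\right)$,

$$\hat{f}_a(t)\hat{f}_b(t)=\frac{1}{2\sqrt{ab}}\exp\left(-\frac{a+b}{4ab}t^2\right).$$

We expect that there exist some $\gamma$ and $c$ such that $f_a \ast f_b = \gamma f_c$. In other words, we are looking for $\gamma,c \in \mathbb{R}$ such that

$$\frac{\gamma}{\sqrt{2c}}\exp\left(-\frac{t^2}{4c}\right)=\frac{1}{2\sqrt{ab}}\exp\left(-\frac{a+b}{4ab}t^2\right).$$

We should have

$$c=\frac{ab}{a+b}.$$

We also have

$$\gamma=\frac{\sqrt{2c}}{2\sqrt{ab}}=\frac{1}{\sqrt{2(a+b)}}.$$

Therefore

$$f_a \ast f_b=\frac{1}{\sqrt{2(a+b)}}f_c$$

where $c$ is given above. We do not even have to compute the integral of convolution explicitly.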
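Although the convolution need not be computed by hand, a numerical check is cheap. The sketch below assumes $f_c(x)=e^{-cx^2}$ and the normalised convolution $(f \ast g)(x)=\frac{1}{\sqrt{2\pi}}\int f(x-y)g(y)dy$, and compares it with the candidate $\frac{1}{\sqrt{2(a+b)}}f_{ab/(a+b)}$; the function names and truncation window are illustrative choices.

```python
import math

def gaussian(c, x):
    """f_c(x) = exp(-c x^2)."""
    return math.exp(-c * x * x)

def convolve_at(a, b, x, half_width=12.0, steps=24000):
    """Trapezoidal approximation of (1/sqrt(2 pi)) * integral of f_a(x - y) f_b(y) dy."""
    h = 2 * half_width / steps
    total = 0.0
    for j in range(steps + 1):
        y = -half_width + j * h
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * gaussian(a, x - y) * gaussian(b, y)
    return total * h / math.sqrt(2 * math.pi)

def predicted(a, b, x):
    """Candidate closed form gamma * f_c(x) with c = ab/(a+b), gamma = 1/sqrt(2(a+b))."""
    return gaussian(a * b / (a + b), x) / math.sqrt(2 * (a + b))

for a, b, x in [(1.5, 0.5, 0.8), (1.0, 1.0, -0.3)]:
    print(a, b, x, abs(convolve_at(a, b, x) - predicted(a, b, x)) < 1e-9)
```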

This post is intended to supply a detailed proof of the Riemann mapping theorem.

Riemann mapping theorem. Every simply connected region $\Omega \subsetneq \mathbb{C}$ is conformally equivalent to the open unit disc $U$.

Fortunately the proof can be found in many textbooks of complex analysis, but the proof is fairly technical so it can be painful to read. This post can be considered as a painkiller. In this post you will see the proof being filled with many details. However, the writer still encourages the reader to reproduce the proof with their own pen and paper. The writer also hopes that this post can increase the accessibility of this theorem and the proof.

However, there is a bar. We need to assume some background in complex analysis, although it is quite basic. The minimal prerequisite is being comfortable with the following topics and questions.

- Contour integration, Cauchy’s formula.
- Almost uniform convergence. Let $\Omega \subset \mathbb{C}$ be open and suppose that $f_j \in H(\Omega)$ for all $j=1,2,\dots$, and $f_j \to f$ uniformly on every compact subset $K \subset \Omega$. Is $f \in H(\Omega)$? What is the uniform limit of $f'_j$? Informally, we call the phenomenon that a sequence of functions converges uniformly on every compact subset *almost uniform* convergence. This has nothing to do with *almost everywhere* in integration theory. In fact, this post does not require background in Lebesgue integration theory.
- Open mapping theorem (complex analysis version).
- Maximum modulus principle and some variants.
- Rouché’s theorem. Or even more, the calculus of residues.

Despite the prerequisites, we still need some preparation beforehand.

Definition 1. Let $X$ be a *connected* topological space. Let $\gamma:[0,1] \to X$ be a closed curve, i.e., it is a continuous map such that $\gamma(0)=\gamma(1)$. We say $\gamma$ is *null-homotopic* if it is homotopic to a constant map $\gamma_0:[0,1] \to \{x\}$ with $x \in X$. We say $X$ is *simply connected* if every closed curve in $X$ is null-homotopic.

Intuitively, if $X$ is simply connected, then $X$ contains no “hole”. For example, the unit disc $U$ is simply connected. However, $U \setminus \{0\}$ is not. On the other hand, $U \setminus [0,1)$ is still simply connected. Another satisfying result is that every convex open set is simply connected: any closed curve can be contracted to a point through convex combinations.

There are a lot of good properties of simply connected regions, which will be summarised below.

Proposition 1. For a region $\Omega$ (an open and connected subset of $\mathbb{R}^2$), the following conditions are equivalent. Each one implies the other eight.

- $\Omega$ is homeomorphic to the open unit disc $U$.
- $\Omega$ is simply connected.
- $\operatorname{Ind}_\gamma(\alpha)=0$ for every path $\gamma$ in $\Omega$ and $\alpha \in S^2 \setminus \Omega$, where $S^2$ is the Riemann sphere.
- $S^2 \setminus \Omega$ is connected.
- Every $f \in H(\Omega)$ can be approximated by polynomials, almost uniformly.
- For every $f \in H(\Omega)$ and every closed path $\gamma$ in $\Omega$, $\int_\gamma f(z)\,dz=0$.
- Every $f \in H(\Omega)$ has an antiderivative. That is, there exists an $F \in H(\Omega)$ such that $F'=f$.
- If $f \in H(\Omega)$ and $1/f \in H(\Omega)$, then there exists a $g \in H(\Omega)$ such that $f=\exp{g}$.
- For such $f$, there also exists a $\varphi \in H(\Omega)$ such that $f=\varphi^2$.

Items 5 to 9 are pretty much saying that calculus is fine here and we do not have to worry about nightmare counterexamples, to some extent. Most of the implications $n \implies n+1$ are not that difficult, but some of them deserve a mention. 4 implying 5 is a consequence of Runge’s theorem. In the implication of 7 to 8, one needs to use the fact that $\Omega$ is connected. When we have $f=\exp{g}$, then we can put $\varphi=\exp\frac{g}{2}$ from which we obtain $f=\varphi^2$. 9 implying 1 is partly a consequence of the Riemann mapping theorem. Indeed, if $\Omega$ is the plane then the homeomorphism is easy: $z \mapsto \frac{z}{1+|z|}$ is a homeomorphism of $\Omega$ onto $U$. But we need the Riemann mapping theorem to give the remaining part, when $\Omega$ is a proper subset.
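The homeomorphism between the plane and the disc can be written down together with its inverse. In the sketch below, `to_disc` implements $z \mapsto \frac{z}{1+|z|}$, and `to_plane` implements $w \mapsto \frac{w}{1-|w|}$, which is the inverse one obtains by solving $|w|=\frac{|z|}{1+|z|}$ for $|z|$; both names are hypothetical.

```python
def to_disc(z):
    """Map a point of the plane into the open unit disc: z / (1 + |z|)."""
    return z / (1 + abs(z))

def to_plane(w):
    """Inverse map from the open unit disc back to the plane: w / (1 - |w|)."""
    return w / (1 - abs(w))

for z in (0j, 3 - 4j, 1000 + 0j, -0.25j):
    w = to_disc(z)
    print(z, abs(w) < 1, abs(to_plane(w) - z) < 1e-9 * (1 + abs(z)))
```

Note that this map is continuous with continuous inverse but not holomorphic, which is consistent with the plane not being conformally equivalent to $U$.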

If you know the definition of sheaf, you will realise that $(\mathbb{C},H(\cdot))$ is indeed a sheaf. For each open subset $\Omega \subset \mathbb{C}$, $H(\Omega)$ is a ring, even more precisely, a $\mathbb{C}$-algebra. The exponential map $\exp:g \mapsto e^g$ is a sheaf morphism. However, we now see that it is surjective if and only if $\Omega$ is simply connected. I hope this can help you figure out an exercise in algebraic geometry. You know, that celebrated book by Robin Hartshorne.

Since we haven’t proved the Riemann mapping theorem, we cannot use the equivalence above yet. However, we can use 9 right away. This gives rise to Koebe’s square root trick.

Equicontinuity is quite an important concept. You may have seen it in differential equations, harmonic functions, or maybe just sequences of functions. We will use it to describe a family of functions, where almost uniform convergence can be well established.

Definition 2. Let $\mathscr{F}$ be a family of functions $(X,d) \to \mathbb{C}$ where $(X,d)$ is a metric space.

- We say that $\mathscr{F}$ is *equicontinuous* if, to every $\varepsilon>0$, there corresponds a $\delta>0$ such that whenever $d(x,y)<\delta$, we have $|f(x)-f(y)|<\varepsilon$ for all $f \in \mathscr{F}$. In particular, by definition, all functions in $\mathscr{F}$ are uniformly continuous.
- We say that $\mathscr{F}$ is *pointwise bounded* if, to every $x \in X$, there corresponds some $0 \le M(x) < \infty$ such that $|f(x)| \le M(x)$ for every $f \in \mathscr{F}$.
- We say that $\mathscr{F}$ is *uniformly bounded on each compact subset* if, to each compact $K \subset X$, there corresponds a number $M(K)$ such that $|f(z)| \le M(K)$ for all $f \in \mathscr{F}$ and $z \in K$.

These concepts are talking about “a family of” continuity and boundedness. In our proof of the Riemann mapping theorem, we do not construct the map explicitly, instead, we will use these concepts above to obtain one (which is a limit) that exists. In this post we simply put $X=\Omega \subset \mathbb{C}$, a simply connected region and $d$ is the natural one.

A famous result about equicontinuity is Arzelà-Ascoli, which says that pointwise boundedness and equicontinuity imply almost uniform convergence of a subsequence.

Theorem 1 (Arzelà-Ascoli). Let $\mathscr{F}$ be a family of complex functions on a metric space $X$, which is pointwise bounded and equicontinuous. Suppose $X$ is separable, i.e., it contains a countable dense set. Then every sequence $\{f_n\}$ in $\mathscr{F}$ has a subsequence that converges uniformly on every compact subset of $X$.

Here is a self-contained proof.

Certainly it is OK to let $X$ be a subset of $\mathbb{R}$, $\mathbb{C}$ or their product. We use this in real and complex analysis for this reason. We will need this almost uniform convergence to establish our conformal map. To specify its application in complex analysis, we introduce the concept of normal family.

Definition 3. Suppose $\mathscr{F} \subset H(\Omega)$, for some region $\Omega \subset \mathbb{C}$. We call $\mathscr{F}$ a *normal family* if every sequence of members of $\mathscr{F}$ contains a subsequence which converges uniformly on every compact subset of $\Omega$. The limit function is not required to be in $\mathscr{F}$.

We now apply Arzelà-Ascoli to complex analysis.

Theorem 2 (Montel). Suppose $\mathscr{F} \subset H(\Omega)$ is uniformly bounded on each compact subset of $\Omega$, then $\mathscr{F}$ is a normal family.

*Proof.* We need to show that $\mathscr{F}$ is “almost” equicontinuous. Since uniform boundedness on compact subsets clearly implies pointwise boundedness, we can then apply Arzelà-Ascoli.

Let $\{K_n\}$ be a sequence of compact sets such that (1) $\bigcup_n K_n = \Omega$ and (2) $K_n \subset K^\circ_{n+1} \subset K_{n+1}$, where $K^\circ_{n+1}$ is the interior of $K_{n+1}$. Then for each $n$ there exists a positive number $\delta_n$ such that, for **every** $z \in K_n$,

$$
D(z,2\delta_n) \subset K_{n+1},
$$

where $D(a,r)$ is the disc centred at $a$ with radius $r$. If such a $\delta_n$ did not exist, we could find points $z_j \in K_n$ with $D(z_j,1/j) \setminus K_{n+1} \ne \varnothing$ for every $j$. By compactness of $K_n$, a subsequence of $\{z_j\}$ converges to some $z \in K_n$, and then $D(z,\delta) \setminus K_{n+1} \ne \varnothing$ whenever $\delta>0$, which is to say that $z$ is not an interior point of $K_{n+1}$. But this is impossible because $K_n \subset K^\circ_{n+1}$.

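For such $\delta_n$, we pick $z',z'' \in K_n$ such that $|z'-z''| < \delta_n$. Let $\gamma$ be the positively oriented circle with centre $z'$ and radius $2\delta_n$, i.e. the boundary of $D(z',2\delta_n)$. Recall that the Cauchy formula says

$$
f(z) = \frac{1}{2\pi i}\int_\gamma \frac{f(\zeta)}{\zeta-z}\,d\zeta \quad (z \in D(z',2\delta_n)).
$$

We will make use of this. Applying the formula at $z'$ and $z''$ and subtracting, we have

$$
f(z')-f(z'') = \frac{1}{2\pi i}\int_\gamma f(\zeta)\left(\frac{1}{\zeta-z'}-\frac{1}{\zeta-z''}\right)d\zeta = \frac{z'-z''}{2\pi i}\int_\gamma \frac{f(\zeta)}{(\zeta-z')(\zeta-z'')}\,d\zeta.
$$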

Now we make use of our choice of $z'$, $z''$ and $\gamma$. By definition, for $\zeta \in \gamma^\ast$ (the range of $\gamma$), we have $|\zeta-z'|=2\delta_n$. Since $|z'-z''|<\delta_n$, we have $2\delta_n=|\zeta-z'|=|\zeta-z''+z''-z'|\le |\zeta-z''|+|z''-z'|$. Therefore $|\zeta-z''| \ge 2\delta_n-|z''-z'|>\delta_n$. Bearing this in mind, we see

$$
|f(z')-f(z'')| \le \frac{M(K_{n+1})}{\delta_n}|z'-z''|.
$$

This may look confusing, so we explain it a little more. Since $D(z',2\delta_n) \subset K_{n+1}$ and $K_{n+1}$ is closed, we must have $\overline{D}(z',2\delta_n) \subset K_{n+1}$; therefore whenever $\zeta \in \gamma^\ast=\partial D(z',2\delta_n)$, we have $|f(\zeta)| \le M(K_{n+1})$. This is where we use the hypothesis of uniform boundedness. We also have $|(\zeta-z')(\zeta-z'')|>2\delta_n \cdot \delta_n$. The modulus of the integrand $\frac{f(\zeta)}{(\zeta-z')(\zeta-z'')}$ is therefore bounded by $\frac{M(K_{n+1})}{2\delta_n^2}$ on $\gamma^\ast$. Since $\gamma$ has length $4\pi\delta_n$, the modulus of the integral over $\gamma$ is bounded by $\frac{M(K_{n+1})}{2\delta_n^2} \cdot 4\pi\delta_n$; multiplying by $\frac{|z'-z''|}{2\pi}$ gives the result.

What does this inequality imply? For $\varepsilon>0$, if we pick $\delta=\min\{\delta_n,\frac{\delta_n\varepsilon}{M(K_{n+1})}\}$, then $|f(z')-f(z'')|<\varepsilon$ for every $f \in \mathscr{F}$ and all $z',z'' \in K_n$ with $|z'-z''|<\delta$. That is, for each $K_n$, the **restrictions** of the members of $\mathscr{F}$ to $K_n$ form an equicontinuous family.

Now consider a sequence $\{f_j\}$ in $\mathscr{F}$. For each $n$, we apply the Arzelà-Ascoli theorem to the restrictions of the members of $\mathscr{F}$ to $K_n$: first to the whole sequence on $K_1$, then to the resulting subsequence on $K_2$, and so on. This gives infinite sets $\mathbb{N} \supset S_1 \supset S_2 \supset \cdots$ such that $f_j$ converges uniformly on $K_n$ as $j \to \infty$ within $S_n$. Pick a strictly increasing sequence $\{s_j\}$ with $s_j \in S_j$ (the diagonal process). For each $n$, all but finitely many $s_j$ lie in $S_n$, so $\{f_{s_j}\}$ converges uniformly on every $K_n$. Since every compact $K \subset \Omega$ is covered by the increasing open sets $K_n^\circ$, it lies in some $K_n$, and therefore $\{f_{s_j}\}$ converges uniformly on every compact subset $K$ of $\Omega$. The statement is now proved. $\square$

**Remarks.** We have no idea what the limit is, and this happens in our proof of the Riemann mapping theorem as well.

The sequence $\{K_n\}$ can be constructed explicitly, however. In fact, for every open set $\Omega$ in the plane there is a sequence $\{K_n\}$ of compact sets such that

- $\bigcup_n K_n=\Omega$.
- $K_n \subset K_{n+1}^\circ$.
- For every compact $K \subset \Omega$, there is some $n$ such that $K \subset K_n$.
- Every component of $S^2 \setminus K_n$ contains a component of $S^2 \setminus \Omega$.

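The sets are constructed as follows and can be verified to satisfy the properties above. For each positive integer $n$, define

$$
V_n = D(\infty,n) \cup \bigcup_{a \notin \Omega} D\left(a,\tfrac{1}{n}\right),
$$

where $D(\infty,n)=\{z:|z|>n\} \cup \{\infty\}$. Then $K_n=S^2 \setminus V_n$ is what we want.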

The Schwarz lemma is another important tool for our proof of the Riemann mapping theorem. We need this lemma to establish important inequalities. This lemma, as well as its variants, shows the rigidity of holomorphic maps. We make use of the maximum modulus theorem. For simplicity, let $U$ be the open unit disc and let $H^\infty$ be the Banach space of bounded holomorphic functions on $U$, equipped with the supremum norm $| \cdot |_\infty$.

Theorem 3 (Schwarz lemma). Suppose $f:U \to \mathbb{C}$ is a holomorphic map in $H^\infty$ such that $f(0)=0$ and $|f|_\infty \le 1$. Then

$$
|f(z)| \le |z| \quad (z \in U), \qquad |f'(0)| \le 1.
$$

On the other hand, if $|f(z)|=|z|$ holds for some $z \in U \setminus \{0\}$, or if $|f'(0)|=1$ holds, then $f(z)=\lambda{z}$ for some complex constant $\lambda$ such that $|\lambda|=1$.

*Proof.* Since $f(0)=0$, $f(z)/z$ has a removable singularity at $z=0$. Hence there exists $g \in H(U)$ such that $f(z)=zg(z)$, and $g(0)=f'(0)$. Fix $0<r<1$. For any $z \in U$ such that $|z|<r$, the maximum modulus theorem gives

$$
|g(z)| \le \max_{\theta}\frac{|f(re^{i\theta})|}{r} \le \frac{1}{r}.
$$

Letting $r \to 1$, we see $|g(z)| \le 1$ for all $z \in U$. Therefore $|f(z)| \le |z|$ and $|f'(0)|=|g(0)| \le 1$ follow. On the other hand, if $|g(z)|=1$ at some point of $U$, the maximum modulus theorem forces $g$ to be a constant, say $\lambda$, from which it follows that $|\lambda|=1$ and $f(z)=\lambda{z}$. $\square$

There are many variants of the Schwarz lemma, and we will be using the Schwarz-Pick lemma.

Definition 4. For any $\alpha \in U$, define

$$
\varphi_\alpha(z)=\frac{z-\alpha}{1-\overline{\alpha}z}.
$$

This family is a subfamily of the Möbius transformations, but we are not paying very much attention to this family right now. We need the fact that such a $\varphi_\alpha$ is always a one-to-one mapping which carries $S^1$ (the unit circle) onto $S^1$, $U$ onto $U$ and $\alpha$ to $0$, with inverse $\varphi_{-\alpha}$. This requires another application of the maximum modulus theorem. A direct computation shows that

$$
\varphi_\alpha'(0)=1-|\alpha|^2, \qquad \varphi_\alpha'(\alpha)=\frac{1}{1-|\alpha|^2}.
$$

Theorem 4 (Schwarz-Pick lemma). Suppose $\alpha,\beta \in U$, $f \in H^\infty$, $| f|_\infty \le 1$ and $f(\alpha)=\beta$. Then

$$
|f'(\alpha)| \le \frac{1-|\beta|^2}{1-|\alpha|^2}.
$$

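*Proof.* Consider

$$
g=\varphi_\beta \circ f \circ \varphi_{-\alpha}.
$$

We see $g \in H^\infty$ and $|g|_\infty \le 1$. What’s more important, $g(0)=\varphi_\beta \circ f(\alpha)=\varphi_\beta(\beta)=0$. By the Schwarz lemma, $|g'(0)| \le 1$. On the other hand, by the chain rule we see

$$
g'(0)=\varphi_\beta'(\beta)f'(\alpha)\varphi_{-\alpha}'(0)=\frac{1-|\alpha|^2}{1-|\beta|^2}f'(\alpha),
$$

and therefore

$$
|f'(\alpha)| \le \frac{1-|\beta|^2}{1-|\alpha|^2}.
$$

In particular, equality holds if and only if $g(z)=\lambda{z}$ for some constant $\lambda$ with $|\lambda|=1$. If this is the case, then

$$
f(z)=\varphi_{-\beta}(\lambda\varphi_\alpha(z)) \quad (z \in U).
$$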

The story can go on but we halt here and continue our story of the Riemann mapping theorem.
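The properties of the disc automorphisms used above are easy to check numerically. The following is a minimal sketch, assuming the standard form $\varphi_\alpha(z)=\frac{z-\alpha}{1-\overline{\alpha}z}$; the sample point `alpha` and the finite-difference helper `derivative` are illustrative choices made for this sketch, not part of the argument.

```python
import cmath

def phi(alpha: complex, z: complex) -> complex:
    """The map z -> (z - alpha) / (1 - conj(alpha) z)."""
    return (z - alpha) / (1 - alpha.conjugate() * z)

def derivative(f, z: complex, h: float = 1e-6) -> complex:
    """Central finite-difference approximation of f'(z) (numerical, not exact)."""
    return (f(z + h) - f(z - h)) / (2 * h)

alpha = 0.3 + 0.4j  # hypothetical sample point in the unit disc, |alpha| = 0.5

# phi_alpha should send alpha to 0 and the unit circle to itself.
circle_moduli = [abs(phi(alpha, cmath.exp(1j * t / 10))) for t in range(63)]

# Compare with the closed formulas 1 - |alpha|^2 and 1 / (1 - |alpha|^2).
d_at_zero = derivative(lambda z: phi(alpha, z), 0)
d_at_alpha = derivative(lambda z: phi(alpha, z), alpha)
```

Here `d_at_zero` should be close to $1-|\alpha|^2=0.75$ and `d_at_alpha` close to $1/0.75$, up to the error of the finite-difference approximation.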

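Each $z \ne 0$ determines a *direction* from the origin, which can be described by

$$
A[z]=\frac{z}{|z|}.
$$

Let $f:\Omega \to \mathbb{C}$ be a map with $f(z) \ne f(z_0)$ in some deleted neighbourhood of $z_0 \in \Omega$. We say $f$ *preserves angles* at $z_0$ if

$$
\lim_{r \to 0^+}e^{-i\theta}A[f(z_0+re^{i\theta})-f(z_0)]
$$

exists and is independent of $\theta$.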

Conformal mappings preserve angles in a reasonable way. A function $f$ is **conformal** if it is holomorphic and $f'(z) \ne 0$ everywhere. There is a theorem describing this, but it is pretty elementary so we do not include the proof in this post.

Theorem 5. Let $f$ map a region $\Omega$ into the plane. If $f'(z_0)$ exists at some $z_0 \in \Omega$ and $f'(z_0) \ne 0$, then $f$ preserves angles at $z_0$. Conversely, if the differential $Df$ exists and is different from $0$ at $z_0$, and if $f$ preserves angles at $z_0$, then $f'(z_0)$ exists and is different from $0$.

There is no confusion about $f'(z_0)$. By the differential $Df$ we mean a linear map $L:\mathbb{R}^2 \to \mathbb{R}^2$ such that, writing $z_0=(x_0,y_0)$, we have

$$
f(x_0+x,y_0+y)=f(x_0,y_0)+L(x,y)+(x^2+y^2)^{1/2}\eta(x,y),
$$

where $\eta(x,y) \to 0$ as $x \to 0$ and $y \to 0$. To prove the theorem, one can assume that $z_0=f(z_0)=0$. When the differential exists, one writes

$$
f(z)=\alpha z+\beta\overline{z}+|z|\eta(z) \quad (z \ne 0),
$$

where $\alpha,\beta$ are complex numbers and $\eta(z) \to 0$ as $z \to 0$.

We say that two regions $\Omega_1$ and $\Omega_2$ are **conformally equivalent** if there is a conformal one-to-one mapping of $\Omega_1$ onto $\Omega_2$. The Riemann mapping theorem states that

Theorem 6 (Riemann mapping theorem). Every proper simply connected region $\Omega$ in the plane (that is, $\Omega \ne \mathbb{C}$) is conformally equivalent to the open unit disc $U$.

As a famous example, the upper half-plane $\mathbb{H}$ is conformally equivalent to $U$ via the Cayley transform $z \mapsto \frac{z-i}{z+i}$.
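As a quick illustration of this example, here is a small sketch of the Cayley transform in its usual form $z \mapsto \frac{z-i}{z+i}$. The sample points are arbitrary choices made for this sketch; the code only spot-checks the mapping property and proves nothing.

```python
def cayley(z: complex) -> complex:
    """Cayley transform z -> (z - i) / (z + i)."""
    return (z - 1j) / (z + 1j)

# Hypothetical sample points with positive imaginary part.
samples = [1j, 2 + 3j, -5 + 0.01j, 100 + 1j]
images = [cayley(z) for z in samples]

# A real point should land on the unit circle, the boundary of the disc.
boundary_image = cayley(2.0 + 0j)
```

The point $i$ is sent to the centre of the disc, points of the upper half-plane land strictly inside the disc, and real points land on the unit circle.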

As one may expect, this theorem asserts that the study of a proper simply connected region $\Omega$ can be reduced to that of $U$ to some extent. But a conformal equivalence is more than a homeomorphism. If $\varphi:\Omega_1 \to \Omega_2$ is a conformal one-to-one mapping of $\Omega_1$ onto $\Omega_2$, then $\varphi^{-1}:\Omega_2 \to \Omega_1$ is also a conformal mapping. In the language of algebra, such a mapping $\varphi$ **induces** a ring isomorphism

$$
\varphi^\ast:H(\Omega_2) \to H(\Omega_1), \quad f \mapsto f \circ \varphi.
$$

Therefore, the ring $H(\Omega_2)$ is algebraically the same as $H(\Omega_1)$. The Riemann mapping theorem also states that, if $\Omega$ is a proper simply connected region, then $H(\Omega) \cong H(U)$. From this we can extract much more information than a homeomorphism provides. One can also extend the story to $S^2$, the Riemann sphere, but that’s another story.

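The proof is fairly technical. But it is a good chance to test our skill in complex analysis. The bread and butter of this proof is the following set:

$$
\Sigma=\{\psi \in H(\Omega):\psi \text{ is one-to-one and } \psi(\Omega) \subset U\}.
$$

Our goal is to prove that there is some $\psi \in \Sigma$ such that $\psi(\Omega)=U$. Note that, since $|\psi|<1$ for every $\psi \in \Sigma$, the family $\Sigma$ is uniformly bounded and hence a **normal family** by Montel’s theorem; what has to be proved first is that $\Sigma$ is not empty.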

Pick $w_0 \in \mathbb{C} \setminus \Omega$. Then $g(z)=z-w_0 \in H(\Omega)$ and, what is more important, $\frac{1}{g} \in H(\Omega)$. By 9 of proposition 1, there exists $\varphi \in H(\Omega)$ such that $\varphi^2(z)=g(z)$, i.e., informally, $\varphi(z)=\sqrt{z-w_0}$ in $\Omega$. If $\varphi(z_1)=\varphi(z_2)$, then $z_1-w_0=\varphi(z_1)^2=\varphi(z_2)^2=z_2-w_0$ and then $z_1=z_2$. Therefore $\varphi$ is one-to-one. On the other hand, if $\varphi(z_1)=-\varphi(z_2)$, we still have $z_1-w_0=\varphi^2(z_1)=\varphi^2(z_2)=z_2-w_0$, and $z_1=z_2$, which forces $\varphi(z_1)=0$; this is impossible because $g$ has no zero in $\Omega$. In other words, $\varphi(\Omega)$ never contains both $w$ and $-w$. This shows that the “square root” is well-defined here. This is Koebe’s square root trick.

Since $\varphi$ is an open mapping, there is an open disc $D(a,r) \subset \varphi(\Omega)$, where $a \in \varphi(\Omega)$, $a \ne 0$ and $0<r<|a|$. But by the arguments above we have $-a \not\in \varphi(\Omega)$, and moreover $D(-a,r) \cap \varphi(\Omega) = \varnothing$. For this reason, we can put

$$
\psi(z)=\frac{r}{\varphi(z)+a}.
$$

It follows that

$$
|\psi(z)|=\frac{r}{|\varphi(z)+a|} \le 1 \quad (z \in \Omega),
$$

and, since the nonconstant holomorphic function $\psi$ is an open mapping, $\psi(\Omega)$ is an open subset of $\overline{U}$; therefore $\psi(\Omega) \subset U$. Since $\varphi$ is one-to-one, $\psi$ is one-to-one as well, and we deduce that $\psi \in \Sigma$: this set is not empty.

**Remark.** You may have trouble believing that $D(-a,r) \cap \varphi(\Omega)=\varnothing$. Suppose we could pick some $w \in D(-a,r) \cap \varphi(\Omega)$; then there is some $z' \in \Omega$ such that $\varphi(z')=w$. We also have $|-a-w|<r$, which implies $|a-(-w)|=|a+w|<r$, and therefore $-w \in D(a,r) \subset \varphi(\Omega)$. So there exists some $z'' \in \Omega$ such that $\varphi(z'')=-w=-\varphi(z')$. Then $z'=z''$ by the argument above, hence $-w=w=0$. It follows that $|a|=|-a-w|<r$, and this is a contradiction.

Since $D(-a,r) \cap \varphi(\Omega)=\varnothing$, we have $|\varphi(z)-(-a)| \ge r$ for all $z \in \Omega$, which is exactly the bound on $|\psi(z)|$ used above.

If $\psi \in \Sigma$ and $\psi(\Omega) \subsetneqq U$, and $z_0 \in \Omega$, then there exists a $\psi_1 \in \Sigma$ such that $|\psi_1’(z_0)|>|\psi’(z_0)|$.

This step shows that we can “enlarge” the range in some way.

For convenience we use the Möbius transformation

$$
\varphi_\alpha(z)=\frac{z-\alpha}{1-\overline{\alpha}z}.
$$

Pick $\alpha \in U \setminus \psi(\Omega)$. Then $\varphi_\alpha \circ \psi \in \Sigma$ and $\varphi_\alpha \circ \psi$ has no zero in $\Omega$. Hence there is some $g \in H(\Omega)$ such that

$$
g^2=\varphi_\alpha \circ \psi.
$$

Since $\varphi_\alpha \circ \psi$ is one-to-one, another application of Koebe’s square root trick shows that $g$ is one-to-one. Therefore we have $g \in \Sigma$ as well. If $\psi_1=\varphi_\beta \circ g$ where $\beta=g(z_0)$, we have $\psi_1 \in \Sigma$ (one-to-one). In particular, $\psi_1(z_0)=0$.

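By putting $s(z)=z^2$, we have

$$
\psi=\varphi_{-\alpha} \circ s \circ g=\varphi_{-\alpha} \circ s \circ \varphi_{-\beta} \circ \psi_1.
$$

If we put $F(z)=\varphi_{-\alpha} \circ s \circ \varphi_{-\beta}(z)$, then $\psi=F \circ \psi_1$ and the chain rule shows that

$$
\psi'(z_0)=F'(\psi_1(z_0))\psi_1'(z_0)=F'(0)\psi_1'(z_0).
$$

(Note we used the fact that $\psi_1(z_0)=0$.) If we can prove that $0<|F'(0)|<1$ then this step is complete. Note that $F$ maps $U$ into $U$, so it satisfies the conditions of the Schwarz-Pick lemma, and therefore

$$
|F'(0)| \le 1-|F(0)|^2 \le 1.
$$

Equality does not hold in the first inequality because $F$ is not of the form $\varphi_{-\sigma}(\lambda\varphi_{\eta}(z))$ with $|\lambda|=1$: such a map is one-to-one on $U$, while $F$ is not, because $s$ is not. On the other hand, $\psi$ is one-to-one, so $\psi'(z_0) \ne 0$, and the chain rule identity above gives

$$
F'(0) \ne 0.
$$

Therefore $0<|F'(0)|<1$ and this step is complete.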

We take the contraposition of step 2:

Fix $z_0 \in \Omega$. If $h \in \Sigma$ is an element such that $|h’(z_0)| \ge |\psi’(z_0)|$ for all $\psi \in \Sigma$, then $h(\Omega)=U$.

The proof is complete once we have found such a function! To do this, we use the fact that $\Sigma$ is a normal family. Put

$$
\eta=\sup\{|\psi'(z_0)|:\psi \in \Sigma\}.
$$

By definition of $\eta$, there is a sequence $\{\psi_n\}$ in $\Sigma$ such that $|\psi_n'(z_0)| \to \eta$. By normality of $\Sigma$, we pick a subsequence $\varphi_k=\psi_{n_k}$ that converges uniformly on compact subsets of $\Omega$. Let $h \in H(\Omega)$ be the limit. Since uniform convergence on compact subsets carries over to derivatives, it follows that $|h'(z_0)|=\eta$. Since $\Sigma \ne \varnothing$ and every member of $\Sigma$ is one-to-one, $\eta \ne 0$, so $h$ cannot be a constant. Since $\varphi_k(\Omega) \subset U$, we must have $h(\Omega) \subset \overline{U}$. But since $h$ is open, we are reduced to $h(\Omega) \subset U$.

It remains to show that $h$ is one-to-one. Fix distinct $z_1, z_2 \in \Omega$. Put $\alpha=h(z_1)$ and $\alpha_n=\varphi_n(z_1)$, then $\alpha_n \to \alpha$. Let $\overline{D}$ be a closed disc in $\Omega$ centred at $z_2$ with interior denoted by $D$ such that

- $z_1 \not\in \overline{D}$.
- $h-\alpha$ has no zero point on the boundary of $\overline{D}$.

We see $\varphi_n -\alpha_n$ converges to $h-\alpha$, uniformly on $\overline{D}$. They have no zero in $D$ since they are one-to-one and have a zero at $z_1$. By Rouché’s theorem, $h-\alpha$ has no zero in $D$ either, and in particular $h(z_2)-\alpha = h(z_2)-h(z_1) \ne 0$. This completes the proof. $\square$

**Remark.** First of all, such a $\overline{D}$ exists. This is because the zeros of $h-\alpha$ have no limit point in $\Omega$, i.e., they are discrete (when defining $\overline{D}$, we don’t know how many there are yet).

Our choice of $\overline{D}$ enables us to use Rouché’s theorem (chances are you didn’t get it). Since $h-\alpha$ has no zero on the boundary, we have $\zeta=\inf_{z \in \partial D}|h(z)-\alpha|>0$. When $n$ is big enough, we see

The second inequality is another application of the maximum modulus theorem. Rouché’s theorem applies here naturally as well. $\square$

This proof is a reproduction of W. Rudin’s *Real and Complex Analysis*. For a comprehensive further reading, I highly recommend Tao’s blog post.

In the previous post we saw that the Galois group of a separable irreducible polynomial $f$ can be realised as a subgroup of the symmetric group, the elements of which permute the roots of $f$. We worked on cubic polynomials over a field with characteristic not equal to $2$ and $3$, and this definitely works with $\mathbb{Q}$. In this post we go one step further.

Let $f \in \mathbb{Q}[X]$ be an irreducible polynomial of prime degree $p$. Since it is also separable (see lemma 9.12.1 of the Stacks project), we can safely work with its Galois group $G$. One immediately wants to compare $G$ with $\mathfrak{S}_p$. Indeed we have $G \subset \mathfrak{S}_p$. The question is, when does equality hold? It is not likely to have an immediate answer. However, we have some interesting sufficient conditions, which will be discussed in this post.

We present some handy results in finite group theory that will be used in the main result. One may skip this section until needed. I will collapse the proof in case one wants to treat it as an exercise.

Lemma 1. Let $p$ be a prime number. The symmetric group $\mathfrak{S}_p$ is generated by $[12 \cdots p]$ and an arbitrary transposition $[rs]$.

*Proof.* We prove this by presenting several sets of generators of $\mathfrak{S}_n$ where $n$ is a positive integer.

1. It is generated by cycles. This is a really, really routine verification and sometimes this is assumed as a fact.

2. It is generated by transpositions, i.e., $2$-cycles. It suffices to show that a cycle is a product of transpositions. Indeed, for any cycle $[i_1\cdots i_k]$ in $\mathfrak{S}_n$, we have $[i_1\cdots i_k]=[i_1i_2][i_2i_3]\cdots[i_{k-1}i_k]$. This proves our statement.

3. It is generated by transpositions of the form $[1k]$. It suffices to show that every transposition is generated as such. For any transposition $[rs]$ with $r,s \ne 1$, we have $[rs]=[1r][1s][1r]$.

4. It is generated by adjacent transpositions, i.e. the generators can be taken of the form $[k-1 ,k]$. This follows from the following identity, by induction on $k$:

$$
[1k]=[k-1,k][1,k-1][k-1,k].
$$

5. It is generated by two elements: $\sigma=[12]$ and $\tau=[12\cdots n]$. This follows from the following identity:

$$
\tau^{k-1}\sigma\tau^{-(k-1)}=[k,k+1] \quad (1 \le k \le n-1).
$$

Now, back to the case when $n=p$ is prime. Put $\sigma=[rs]$ and $\tau=[12\cdots p]$, with $r<s$. If $s-r=1$, then conjugating $\sigma$ by a suitable power of $\tau$ gives $[12]$ and we are back to 5. Therefore we may assume that $d=s-r>1$. From now on integers may be read either in $\mathbb{Z}$ or in $\mathbf{F}_p=\mathbb{Z}/p\mathbb{Z}$, depending on the context. Recall that $\mathbf{F}_p$ is a field. Pick the integer $w$ with $1 \le w \le p-1$ such that $dw=1$ in $\mathbf{F}_p$. Conjugating $\sigma$ by powers of $\tau$, we see $\tau$ and $\sigma$ generate

$$
[1,1+d],\ [1+d,1+2d],\ \dots,\ [1+(w-1)d,1+wd].
$$

The entries $1,1+d,\dots,1+wd$ are distinct in $\mathbf{F}_p$, so repeated use of the identity $[a,c]=[b,c][a,b][b,c]$ shows that these elements generate $[1,1+wd]=[12]$. Therefore we are still back to 5. $\square$
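For small cases the lemma can be sanity-checked by brute force. The sketch below computes the subgroup generated by a full cycle and one transposition; permutations are stored as tuples on $0,\dots,n-1$ (a 0-based relabelling chosen for this sketch), and all function names are illustrative. It also shows that primality matters: for $n=4$ the same kind of pair generates only a group of order $8$.

```python
def compose(f, g):
    """Return the permutation f∘g (apply g first), as tuples over 0..n-1."""
    return tuple(f[g[i]] for i in range(len(g)))

def generated_subgroup(gens):
    """Closure of the generators under composition; in a finite group this is the subgroup."""
    identity = tuple(range(len(gens[0])))
    seen = {identity}
    frontier = [identity]
    while frontier:
        new = []
        for x in frontier:
            for g in gens:
                y = compose(g, x)
                if y not in seen:
                    seen.add(y)
                    new.append(y)
        frontier = new
    return seen

def full_cycle(n):
    """The n-cycle i -> i + 1 (mod n)."""
    return tuple((i + 1) % n for i in range(n))

def transposition(n, r, s):
    """The transposition swapping r and s."""
    perm = list(range(n))
    perm[r], perm[s] = perm[s], perm[r]
    return tuple(perm)

# p = 5: a 5-cycle and a non-adjacent transposition.
size_p5 = len(generated_subgroup([full_cycle(5), transposition(5, 0, 2)]))
# n = 4 (not prime): the analogous pair generates a proper subgroup.
size_n4 = len(generated_subgroup([full_cycle(4), transposition(4, 0, 2)]))
```

With these inputs, `size_p5` equals $5!=120$, while `size_n4` is $8$, the order of a dihedral group.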

We have many good reasons to study the Galois group of *something*. It would be great if the group can be written down explicitly. In this section we show that the group can be revealed by the number of nonreal roots.

Proposition 1. Let $f(X) \in \mathbb{Q}[X]$ be an irreducible polynomial of prime degree $p$. If $f$ has precisely two nonreal roots, then the Galois group $G$ of $f$ over $\mathbb{Q}$ is $\mathfrak{S}_p$.

*Proof.* Let $L$ be the splitting field of $f$. By lemma 1, it suffices to show that $G$ contains a transposition and a $p$-cycle, which we may take to be $[12\cdots p]$. If $\alpha$ is a root of $f$, then $[\mathbb{Q}(\alpha):\mathbb{Q}]=p$ divides $[L:\mathbb{Q}]=|G|$. By Sylow’s theorem, $G$ has a subgroup $H$ of order $p$ (note that $p^2$ does not divide $p!$), which can only be cyclic. Say $H=\langle \sigma \rangle$. Suppose $\sigma$ is of cycle type $(k_1,\dots,k_r)$. Then the period of $\sigma$, which equals $p$, is the least common multiple of $k_1,\dots,k_r$, where $k_1+\dots+k_r=p$. This can only happen when $r=1$ and $k_1=p$. Therefore $\sigma$ is a $p$-cycle.

In fact, $\sigma$ can be taken to be $[12\dots p]$. Suppose an ordering of the roots of $f$ is given, for which we have $\sigma=[i_1 i_2 \dots i_p]$. If we re-order these roots, by letting the $k$th root be the original $i_k$th root, then we can write $\sigma=[12\dots p]$. (This re-ordering is, in fact, a conjugation.)

It remains to prove that $G$ contains a transposition. Let $\alpha$ and $\beta$ be the two nonreal roots of $f$. Since $\overline{\alpha}$ is also a root of $f$ (because the coefficients of $f$ are real; if $\sum_{n=0}^{p}a_n\alpha^n=0$, then $\sum_{n=0}^{p}a_n\overline{\alpha}^n=\sum_{n=0}^{p}\overline{a_n\alpha^n}=\overline{0}=0$) we see $\beta=\overline{\alpha}$. Complex conjugation maps $L$ onto itself because $L/\mathbb{Q}$ is normal, so it restricts to an element of $G$. This element swaps $\alpha$ and $\beta$ and fixes the remaining $p-2$ roots, which are real; hence it is a transposition in $G$. This proves our assertion. $\square$

For example, consider the polynomial

With calculus one can show that it has exactly three real roots, hence it has two nonreal roots. Eisenstein’s criterion shows that $f$ is irreducible. Therefore we are allowed to use proposition 1. The Galois group of $f$ is $\mathfrak{S}_5$.
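The calculus argument can be mirrored in a few lines of code. The sketch below uses $X^5-4X+2$, a classical example of this kind that satisfies Eisenstein’s criterion at $2$; this particular polynomial is an assumption made for illustration and may differ from the one intended above. The idea is that $f'$ has only two real zeros, so $f$ is monotone on three intervals and each sign change is exactly one real root.

```python
def f(x: float) -> float:
    # Hypothetical example polynomial X^5 - 4X + 2 (an assumption of this sketch).
    return x**5 - 4 * x + 2

# f'(x) = 5x^4 - 4 vanishes only at x = -c and x = c with c = (4/5)^(1/4),
# so f is monotone on each of the three intervals cut out by these two points.
c = (4 / 5) ** 0.25
checkpoints = [-3.0, -c, c, 3.0]
signs = [f(x) > 0 for x in checkpoints]

# f is increasing outside [-c, c], negative at -3 and positive at 3, so there
# are no roots beyond the checkpoints; count the sign changes in between.
real_roots = sum(1 for s, t in zip(signs, signs[1:]) if s != t)
nonreal_roots = 5 - real_roots
```

For this polynomial the count gives three real roots and therefore one pair of nonreal roots, which is the hypothesis of proposition 1 with $p=5$.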

This also works fine when $p=2$ or $3$. The case when $p=2$ is nothing but working around a quadratic polynomial. When $f(X)$ is irreducible of degree $3$, and it has two nonreal roots, we also know that it has an irrational root. Let the roots be $a+bi,a-bi,c$ where $b \ne 0$ and $c$ is irrational. We see

Therefore the Galois group is $\mathfrak{S}_3$.

It is too restrictive to confine ourselves to a single pair of nonreal roots. Also, it seems we have ignored the alternating group $\mathfrak{A}_p$ for no reason. Oz Ben-Shimol gave us a nice way to work around this (see arXiv:0709.2868). The whole paper is not easy but the result is pretty beautiful and generalises what we said above when $p \ge 5$.

Proposition 2. Let $f \in \mathbb{Q}[X]$ be an irreducible polynomial of prime degree $p \ge 5$. Suppose that $f$ has $k>0$ pairs of nonreal roots. If $p \ge 4k+1$, then the Galois group $G$ is isomorphic to $\mathfrak{A}_p$ or $\mathfrak{S}_p$. If $k$ is odd then $G \cong \mathfrak{S}_p$.

The proof is done by showing that $\mathfrak{A}_p \subset G \subset \mathfrak{S}_p$. As the index of $\mathfrak{A}_p$ is $2$, $G$ can only be one of them. The solvability of $G$ is also concerned here.

Indeed, what we have proved in “the simplest case” is nothing but $k=1$. When $p \ge 5$ we clearly have $p \ge 1+4 \times 1$. This refined the result of A. Bialostocki and T. Shaska (see arXiv:math/0601397), and the inequality used to be

When $k$ is big enough, we have $k(k\log{k}+2\log{k}+3) \ge 4k+1$. Oz Ben-Shimol’s result is a refinement because it says that $p$ does not need to be that big. He also offered a refined algorithm to compute the Galois group, which we will present below. Also, computing $4k+1$ is much easier than computing $k^2\log{k}$ plus something.

Input: An irreducible polynomial f(X) over Q with prime degree p >= 5

Here, $\Delta(f)$ is the discriminant of $f$. We have seen that whether $\Delta$ is a perfect square matters a lot. The discussion of `ReductionMethod` can be found in Oz Ben-Shimol’s paper.

Let $k$ be an arbitrary field and suppose $f(X) \in k[X]$ is separable, i.e., $f$ has no multiple roots in an algebraic closure, and of degree $n \ge 1$. Assuming $f$ monic for simplicity, let

$$
f(X)=(X-x_1)\cdots(X-x_n)
$$

be its factorisation in a splitting field $L$. Put $G=G(L/k)$. We say that $G$ is the Galois group of $f$ over $k$. Let $x_i$ be a root of $f$ and pick any $\sigma \in G$. By definition of the Galois group, we see $\sigma(x_i)$ is still a root of $f$ (consider the map $\tilde\sigma:L[X] \to L[X]$ induced by $\sigma$ naturally; it is the identity when restricted to $k[X]$). This is to say, elements of $G$ permute the roots of $f$.

For example, consider $L=\mathbb{C}$, $k=\mathbb{R}$, $f(X)=X^2+1$. The Galois group $G$ contains two elements and is generated by complex conjugation $\sigma:a+bi \mapsto a-bi$. A root of $f$ is $i$, and $\sigma(i)=-i$ is another root.

Based on this fact, we can consider $G$ as a subgroup of $\mathfrak{S}_n$, where $n$ is the degree of $f$. The structure of $\mathfrak{S}_n$ can be extremely complicated, but for now we assume that they are well-known. The question is, what subgroup is $G$ inside $\mathfrak{S}_n$. Let’s take a look into the case when $n=3$.

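To begin with we note that we can assume that the quadratic term is $0$, provided that the characteristic is not $3$. Let $f(X)=X^3+aX^2+bX+c$ be a polynomial, then

$$
f\left(X-\frac{a}{3}\right)=X^3+\left(b-\frac{a^2}{3}\right)X+\left(\frac{2a^3}{27}-\frac{ab}{3}+c\right),
$$

and as a result $aX^2$ is cancelled. A translation does not change any property of a polynomial except the value of its roots. Therefore we can reduce our study to polynomials in the depressed form

$$
f(X)=X^3+aX+b.
$$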

In fact, for all $g(X)=X^n+a_{n-1}X^{n-1}+\dots+a_0$, we can cancel out $a_{n-1}X^{n-1}$ by a substitution $Y=X-\frac{a_{n-1}}{n}$.
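The cubic case of this substitution can be written out as a small computation. The following is a sketch using exact rational arithmetic; the function names and the sample cubic are illustrative choices, and the formulas for the new coefficients come from expanding $f(X-\frac{a}{3})$ by hand.

```python
from fractions import Fraction

def depress_cubic(a, b, c):
    """Return (p, q) with f(X - a/3) = X^3 + pX + q for f = X^3 + aX^2 + bX + c."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    p = b - a**2 / 3
    q = 2 * a**3 / 27 - a * b / 3 + c
    return p, q

def evaluate_cubic(a, b, c, x):
    """Evaluate X^3 + aX^2 + bX + c at x."""
    return x**3 + a * x**2 + b * x + c

# Hypothetical example: (X + 1)(X + 2)(X + 3) = X^3 + 6X^2 + 11X + 6.
a, b, c = 6, 11, 6
p, q = depress_cubic(a, b, c)

# Spot check of the identity f(x - a/3) = x^3 + p x + q at one rational point.
x = Fraction(5, 7)
lhs = evaluate_cubic(a, b, c, x - Fraction(a, 3))
rhs = x**3 + p * x + q
```

For this example the depressed form is $X^3-X$, whose roots $-1,0,1$ are the original roots $-3,-2,-1$ shifted by $\frac{a}{3}=2$.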

Now back to our main story. First of all we study irreducibility. If $f$ is irreducible, then clearly it has no root in $K$. On the other hand, if $f$ has no root in $K$, does that mean $f$ is irreducible over $K$? This does not hold in general for all polynomials. For example, the polynomial $g(X)=(X^2+1)^2$ is not irreducible yet it has no root in $\mathbb{R}$ or $\mathbb{Q}$. But fortunately, $3$ is a beautiful number and we can proceed. Suppose $f$ has no root in $K$. Were $f$ reducible, there would be a factorisation

$$
f(X)=p_1(X)p_2(X)
$$

with each $p_i(X)$ being a proper factor of $f(X)$ of positive degree. However, since the degrees add up to $3$, this is to say, at least one of the $p_i(X)$ has degree $1$ and hence a root in $K$. A contradiction. We therefore have a result as follows:

Proposition 1. Let $f(X)$ be a cubic polynomial in $K[X]$ where $\operatorname{char}K \ne 2,3$. Then $f$ is irreducible over $K$ if and only if $f$ has no root in $K$.

Notation being above, we assume that $f$ is irreducible. Let $L$ be the splitting field of $f$. We claim that $f$ is separable. Before proving the claim, one should notice that the characteristic matters a lot. For example, $X^3-2$ is irreducible over $\mathbb{Q}$ but $X^3-2=(X+1)^3$ over $\mathbf{F}_3[X]$ and we therefore have a triple root.

$f$ is separable if and only if $\gcd(f,f')=1$. The derivative of $f$, which is simple because $f$ has been simplified, is given by

$$
f'(X)=3X^2+a.
$$

It is not equal to the constant $a$ because the characteristic of $K$ is not $3$. We will show carefully that $f(X)$ is separable by working on these two polynomials.

The first question is the value of $a$ and $b$. If one of them is $0$ then things may be easier or harder. Note first we must have $b \ne 0$ because if not then $f(X)=X(X^2+a)$ and this is not irreducible. If $a=0$, then $f(X)=X^3+b$ and $f'(X)=3X^2 \ne 0$ because $\operatorname{char}K \ne 3$. It follows that $\gcd(f,f')=1$ because neither $X$ nor $X^2$ divides $X^3+b$.

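Now there only remains the most general case: $a \ne 0$ and $b \ne 0$. This is where the Euclidean Algorithm kicks in. Recall that for any three polynomials $p,q,r$ in $K[X]$, we have

$$
\gcd(p,q)=\gcd(q,p-rq).
$$

This is how the Euclidean Algorithm works. Note we can write

$$
f(X)=\frac{X}{3}f'(X)+r_0(X), \quad r_0(X)=\frac{2a}{3}X+b.
$$

It follows that $\gcd(f,f')=\gcd(f',r_0)$. We next work on $f'$ and $r_0$:

$$
f'(X)=\left(\frac{9}{2a}X-\frac{27b}{4a^2}\right)r_0(X)+r_1, \quad r_1=\frac{4a^3+27b^2}{4a^2}.
$$

If $r_1=0$, then $r_0$ divides both $f'$ and $f$, so $f$ has the root $-\frac{3b}{2a} \in K$, contradicting irreducibility. Hence $r_1$ is a nonzero constant, so $r_0(X)$ and $r_1$ have greatest common divisor $1$, which implies that $f$ and $f'$ have greatest common divisor $1$. Whichever the case is, we have $\gcd(f,f')=1$ and therefore $f$ is separable. Note the fact that the characteristic of $K$ is not $2$ or $3$ is frequently used here, otherwise there are a lot of equations making no sense.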
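The Euclidean algorithm used here is easy to run over $\mathbb{Q}$. Below is a minimal sketch with polynomials stored as coefficient lists, highest degree first; the representation and the function names are choices made for this sketch, and the base field is fixed to $\mathbb{Q}$ via exact fractions rather than a general $K$.

```python
from fractions import Fraction

def strip_zeros(poly):
    """Drop leading zero coefficients; [] stands for the zero polynomial."""
    while poly and poly[0] == 0:
        poly = poly[1:]
    return poly

def poly_divmod(num, den):
    """Long division of coefficient lists (highest degree first); den must be nonzero."""
    num = [Fraction(x) for x in num]
    den = [Fraction(x) for x in den]
    quotient = []
    while len(num) >= len(den):
        factor = num[0] / den[0]
        quotient.append(factor)
        # Subtract factor * den aligned at the leading term, then drop that term.
        for i, d in enumerate(den):
            num[i] -= factor * d
        num.pop(0)
    return quotient, num

def poly_gcd(f, g):
    """Monic gcd of two polynomials (f nonzero) via the Euclidean algorithm."""
    f, g = strip_zeros(list(f)), strip_zeros(list(g))
    while g:
        _, r = poly_divmod(f, g)
        f, g = g, strip_zeros(r)
    return [Fraction(x) / f[0] for x in f]

# f = X^3 - X - 1 and its derivative 3X^2 - 1: the gcd is the constant 1.
separable_case = poly_gcd([1, 0, -1, -1], [3, 0, -1])
# f = X^3 - 3X + 2 = (X - 1)^2 (X + 2) has a double root, so the gcd is X - 1.
double_root_case = poly_gcd([1, 0, -3, 2], [3, 0, -3])
```

In the first call the remainders are $-\frac{2}{3}X-1$ and $\frac{23}{4}$, matching $r_0$ and $r_1$ for $a=b=-1$. The second call shows what failure of separability looks like.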

Where are we? We wanted to ensure that $f$ is separable so that working with the Galois group of $f$ is not that troublesome. And $f$ is. We now go back to the study of the Galois group $G=G(L/K)$, where $L$ is the splitting field of $f$. Let $\alpha_1$, $\alpha_2$, $\alpha_3$ be the roots of $f$ and pick one of them as $\alpha$. We see $[K(\alpha):K]=3$.

Since $G$ permutes three elements, $G$ has to be a subgroup of $\mathfrak{S}_3$, so $|G|$ divides $6$. Moreover $|G|=[L:K] \ge [K(\alpha):K]=3$, which implies that $|G|=3$ or $6$. In the first case, $G=\mathfrak{A}_3$, the alternating group. In the second case, $G=\mathfrak{S}_3$ and $K(\alpha)$ is not normal over $K$ because there is an irreducible polynomial, namely $f(X) \in K[X]$, which has a root in $K(\alpha)$ but does not split into linear factors in $K(\alpha)$. This contradicts the definition of a normal extension.

The question now is, when is $G$ equal to $\mathfrak{S}_3$ and when is it $\mathfrak{A}_3$? We get a good chance to review finite group theory. This is answered by the sign of elements in $G$. To be precise, $G=\mathfrak{S}_3$ if and only if $G$ has an odd element. If not then $G=\mathfrak{A}_3$. To work with this, we recall how the sign function works. Put

$$
\delta=\prod_{i<j}(\alpha_i-\alpha_j)=(\alpha_1-\alpha_2)(\alpha_1-\alpha_3)(\alpha_2-\alpha_3).
$$

For any $\sigma \in G$, we have $\sigma(\delta)=\varepsilon(\sigma)\delta$, where $\varepsilon(\sigma)$ is the sign of $\sigma$. If we put $\Delta=\delta^2$, which is the discriminant, we see $\sigma(\Delta)=\Delta$. Therefore $\Delta \in L^G=K$. But wait, since $\sigma(\delta)=\pm\delta$, the sign is not guaranteed, we see $\delta$ is not guaranteed to be in $K$. This is where we crack the problem.

If $\delta \in K$, or more precisely, $\sqrt\Delta \in K$, then $\sigma(\delta)=\delta$ and it follows that $\varepsilon(\sigma)=1$ for all $\sigma \in G$. This can only happen if $G=\mathfrak{A}_3$.

If $\sqrt\Delta \not\in K$, then $\delta$ is not fixed by $G$. There is some $\sigma \in G$ such that $\sigma(\delta)=-\delta$, which is to say that $\varepsilon(\sigma)=-1$. This can only happen when $G=\mathfrak{S}_3$.

We have the following conclusion.

Proposition 2. Notation being above. Assume that $f$ is irreducible. Then the Galois group of $f$ is $\mathfrak{S}_3$ if and only if $\sqrt\Delta \not\in K$. The group is $\mathfrak{A}_3$ if and only if $\sqrt\Delta \in K$.

A dirty calculation shows that $\Delta=-4a^3-27b^2$. One can show this using Vieta’s formulas. You shan’t feel this to be strange because in the quadratic case we have $\Delta=b^2-4ac$ and we did care if $\Delta>0$, which amounts to whether $\sqrt\Delta \in \mathbb{R}$.

Let’s conclude this post by a handy but nontrivial example. Consider

$$
f(X)=X^3-X-1.
$$

It has no rational root, hence it is irreducible over $\mathbb{Q}$. The discriminant is $-4 \cdot(-1)^3-27 \cdot (-1)^2=-23$, whose square root lies in $\mathbb{Q}(\sqrt{-23})$, and therefore the Galois group over $\mathbb{Q}(\sqrt{-23})$ is $\mathfrak{A}_3$. However, when the base field is a subfield not containing $\sqrt{-23}$, for example, $\mathbb{Q}$, then the Galois group is $\mathfrak{S}_3$.
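Over $\mathbb{Q}$ with integer coefficients, the criterion of proposition 2 amounts to a perfect-square test. The sketch below assumes the cubic $X^3+aX+b$ is already known to be irreducible over $\mathbb{Q}$ (this is not checked by the code), and the function names are illustrative.

```python
from math import isqrt

def cubic_discriminant(a: int, b: int) -> int:
    """Discriminant -4a^3 - 27b^2 of the depressed cubic X^3 + aX + b."""
    return -4 * a**3 - 27 * b**2

def is_rational_square(n: int) -> bool:
    """An integer is a square in Q exactly when it is a square in Z."""
    return n >= 0 and isqrt(n) ** 2 == n

def galois_group_over_q(a: int, b: int) -> str:
    """Assumes X^3 + aX + b is irreducible over Q; irreducibility is NOT verified here."""
    return "A3" if is_rational_square(cubic_discriminant(a, b)) else "S3"

example_s3 = galois_group_over_q(-1, -1)   # X^3 - X - 1, discriminant -23
example_a3 = galois_group_over_q(-3, 1)    # X^3 - 3X + 1, discriminant 81
```

The second example, $X^3-3X+1$, has no rational root (only $\pm 1$ need to be tested), and its discriminant $81=9^2$ is a square, so its Galois group over $\mathbb{Q}$ is $\mathfrak{A}_3$.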

The method is presented by Artin: we will be actively using Sylow theory. Recall that for a finite group $G$, if $p$ is a prime dividing $|G|$, then there is a $p$-Sylow subgroup. We do not care about *other* $p$-Sylow subgroups here. However, one needs to also recall that a $p$-group $H$ is always solvable. If $|H|>1$, then $H$ has nontrivial centre. If $|H|=p^n$, then there is a sequence of subgroups

$$
\{e\}=H_0 \subset H_1 \subset \cdots \subset H_n=H
$$

where $H_{i}$ is normal in $H$ for all $i=0,\dots,n$ and $H_{i+1}/H_i$ is cyclic of order $p$. This is to say $|H_i|=p^i$.

On the other hand, we also make use of analysis (which is Gauss’s idea). For every $a>0$, there is a square root $\sqrt{a}>0$. In other words, the equation $X^2-a=0$ has a positive root. Moreover, every polynomial $f(X) \in \mathbb{R}[X]$ of odd degree has a root in $\mathbb{R}$. This is to say, such $f(X)$ is *not* irreducible over $\mathbb{R}$ unless $\deg f=1$.

Next we take a look at $\mathbb{C}=\mathbb{R}(i)$, where $i$ is the imaginary unit, or, algebraically speaking, a root of $g(X)=X^2+1$. Note that every $z \in \mathbb{C}$ has a square root. If we write $z=a+bi$, then choosing real numbers $c$, $d$ with

$$
c^2=\frac{\sqrt{a^2+b^2}+a}{2}, \quad d^2=\frac{\sqrt{a^2+b^2}-a}{2},
$$

and with signs such that $cd$ has the same sign as $b$, gives rise to $(c+di)^2=a+bi$. It follows that all polynomials $f(X) \in \mathbb{C}[X]$ of degree $2$ have a root (if this is not very obvious, complete the square), hence are *not* irreducible. With this being said, $\mathbb{C}$ does not have an extension of degree $2$. Say, if $[E:\mathbb{C}]=2$, then $E=\mathbb{C}[X]/(p(X))$ and $p(X)$ is irreducible. But $p(X)$ can only be of degree $2$, which is absurd already.
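The square root of a complex number can thus be computed with real square roots only. Here is a small sketch of that recipe, assuming the usual formulas $c^2=\frac{|z|+a}{2}$ and $d^2=\frac{|z|-a}{2}$; the function name and the sample value are illustrative, and floating-point arithmetic makes the result approximate in general.

```python
from math import hypot, sqrt

def complex_sqrt(a: float, b: float) -> tuple[float, float]:
    """Return real (c, d) with (c + di)^2 = a + bi, using only real square roots."""
    modulus = hypot(a, b)              # |z| = sqrt(a^2 + b^2)
    c = sqrt((modulus + a) / 2)
    d = sqrt((modulus - a) / 2)
    # We need 2cd = b, so d must carry the sign of b (c is taken nonnegative).
    if b < 0:
        d = -d
    return c, d

c, d = complex_sqrt(3.0, -4.0)         # a square root of 3 - 4i
check = complex(c, d) ** 2
```

For $z=3-4i$ this returns $c=2$, $d=-1$, and indeed $(2-i)^2=3-4i$.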

We also need a part of the following lemma on field extension. In brief, finite separable extension induces a *minimal* Galois extension.

Lemma. Let $E/F$ be a finite separable extension. Then $E$ is contained in an extension $K$ such that $K/F$ is Galois. It is minimal in the sense that, in a fixed algebraic closure $K^\mathrm{a}$ of $K$, any other Galois extension $L$ of $F$ containing $E$ must contain $K$ as well. We have the following tower:

$$
F \subset E \subset K \subset L.
$$

*Proof.* First of all, we can find a finite Galois extension of $F$ containing $E$: for example, the composite of the splitting fields of the minimal polynomials of the elements of a basis of $E$ as an $F$-vector space. The intersection of all Galois extensions of $F$ containing $E$ is exactly what we want. $\square$

The complex field $\mathbb{C}$ is algebraically closed.

The following proof focuses on algebra and tries its best to avoid analysis. If you are a fan of analysis, you can dive into complex analysis and use the maximum modulus theorem to study a polynomial. Or, you can study the behaviour of $\frac{1}{f(z)}$ where $f$ is a polynomial. If $f$ has no root, then perhaps it can only be a constant.

*Proof.* Let’s firstly make it a problem of Galois theory. Since $\mathbb{R} \supset \mathbb{Q}$, it is of characteristic $0$ (hence perfect) and every finite extension is separable. Hence, in particular, $\mathbb{C}/\mathbb{R}$ is finite and separable. Let $L/\mathbb{C}$ be a finite extension. Then $L/\mathbb{R}$ is still a finite and separable extension, since both the class of finite extensions and the class of separable extensions are distinguished.

Applying the lemma above, we can find a finite Galois extension $K/\mathbb{R}$ containing $L$. We need to prove that $K=\mathbb{C}$.

Put $G=G(K/\mathbb{R})$. We want to show that $|G|=2$, hence $[K:\mathbb{R}]=[K:\mathbb{C}][\mathbb{C}:\mathbb{R}]=2$ and our result follows immediately. To do this, we first show that $G$ is a $2$-group. Let $H \subset G$ be a $2$-Sylow subgroup of $G$, so that we can say $|H|=2^n$, $|G|=2^nm$ and $m$ is odd. Now we use the Galois correspondence. Put $F=K^H$. We see $K/F$ is Galois and $[K:F]=2^n$. It follows that $[F:\mathbb{R}]=m$. We claim that $m=1$.

Indeed, $F/\mathbb{R}$ is finite and separable. Hence we may apply the primitive element theorem to obtain $F=\mathbb{R}(\alpha)$, where $\alpha$ is a root of an irreducible polynomial in $\mathbb{R}[X]$ of degree $m$. But $m$ is odd, so we must have $m=1$.

Therefore $G=H$ is a $2$-group. Since a Galois extension remains Galois under lifting, we see $K/\mathbb{C}$ is Galois. Let $G_1=G(K/\mathbb{C}) \subset G$ be the Galois group. We next claim that $G_1$ is trivial. If not, then, being a $2$-group, it has a subgroup $G_2$ of index $2$. Put $F'=K^{G_2}$, then we see $[F':\mathbb{C}]=[G_1:G_2]=2$. However, as mentioned above, $\mathbb{C}$ has no extension of degree $2$. This contradiction implies that $G_1$ is trivial and therefore $K=\mathbb{C}$. $\square$

Why do we have to prove that $K=\mathbb{C}$? If you didn’t get it, let me remind you that a Galois extension is, by definition, an **algebraic** extension which is normal and separable.

Let $G$ be a finite group and $R$ be a commutative ring. The *group algebra* of $G$ over $R$ is denoted by $R[G]$; it is first of all an algebra over $R$. A basis of $R[G]$ as an $R$-module is given by the elements $e_s$ with $s \in G$, and the product is determined by the rule

$$e_se_t=e_{st}, \quad s,t \in G.$$

With this being said, given $u=\sum_{s \in G}a_se_s$ and $v=\sum_{t \in G}b_te_t$, we have

$$uv=\sum_{s,t \in G}a_sb_te_{st}.$$

For example, take $G=C_3=\{1,x,x^2\}$, the cyclic group of three elements. If $u=a_1e_1+a_xe_x$ and $v=b_xe_x+b_{x^2}e_{x^2}$, then

$$uv=a_1b_xe_x+(a_1b_{x^2}+a_xb_x)e_{x^2}+a_xb_{x^2}e_1.$$

As one will notice, the structure of this algebra should be determined by both $G$ and $R$, although we do not know yet what will happen. If we take $R=\mathbb{C}$, then everything is very *simple*, and a lot of elementary linear algebra can be recovered here; that is part of the mission of this blog post. Before we dive in, we need to look into group algebras in a general setting first. Group algebras and representation theory are not often treated together, but let’s try it. While the majority of this post is (non-commutative) ring theory and module theory, we encourage the reader to use representation theory as a source of examples. Standalone examples may drive us too far and we may not have enough space for them.

First of all, we list some very obvious facts that do not even need proof.

- $R[G]$ is a free $R$-module of rank $|G|$.
- $R[G]$ is itself a ring, and it is commutative if and only if $G$ is abelian.

However, one fact is easy to overlook:

Proposition 1. If $G$ is nontrivial, then $R[G]$ is *not* a division ring.

*Proof.* Pick $g \in G$ that is not the identity, and let $m$ be the order of $g$. Then $e_1-e_g$ is a zero-divisor because

$$(e_1-e_g)(e_1+e_g+\cdots+e_{g^{m-1}})=e_1-e_{g^m}=0,$$

while both factors are nonzero.

But in a division ring, there is no zero-divisor. $\square$
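To see the product rule $e_se_t=e_{st}$ and the zero-divisor of Proposition 1 in action, here is a minimal Python sketch, assuming $G$ is the cyclic group $C_m$ generated by $g$ and $R=\mathbb{Z}$. The helper name `multiply` and the dictionary encoding (exponent $j$ of $g$ mapped to the coefficient of $e_{g^j}$) are illustrative choices, not notation from this post.

```python
from itertools import product


def multiply(u, v, m):
    """Multiply two elements of Z[C_m].

    An element is a dict {j: coefficient of e_{g^j}}.
    The rule e_s e_t = e_{st} becomes addition of exponents mod m.
    """
    result = {}
    for (s, a), (t, b) in product(u.items(), v.items()):
        key = (s + t) % m
        result[key] = result.get(key, 0) + a * b
    # Drop zero coefficients so that the zero element is the empty dict.
    return {k: c for k, c in result.items() if c != 0}


m = 3
u = {0: 1, 1: -1}             # e_1 - e_g
v = {j: 1 for j in range(m)}  # e_1 + e_g + e_{g^2}
print(multiply(u, v, m))      # {} : the product is zero

# The C_3 example with a_1=2, a_x=3, b_x=5, b_{x^2}=7.
print(multiply({0: 2, 1: 3}, {1: 5, 2: 7}, m))
```

Both factors in the first product are nonzero, yet the product is the zero element, which is exactly the obstruction to being a division ring.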

Since $R[G]$ is a ring, we can certainly consider modules over it, which brings us to the following section.

Let $R$ be a ring (not assumed to be commutative here). An $R$-module $E$ is called **simple** if it is nonzero and has no submodules other than $0$ and $E$. This may remind you of irreducible or simple representations of a group. We will see the connection later. Following the definition, we immediately have a special version of Schur’s lemma:

Proposition 2 (Schur’s Lemma). Let $E,F$ be two simple $R$-modules. Every nontrivial homomorphism $f:E \to F$ is an isomorphism.

*Proof.* Note $\ker{f}$ and $f(E)$ are submodules of $E$ and $F$ respectively. Since $f$ is nontrivial and $E,F$ are simple, we have $\ker{f}=0$ and $f(E)=F$, which is to say that $f$ is an isomorphism. $\square$

Corollary 1. If $E$ is a simple $R$-module, then $\operatorname{End}_R(E)$ is a division ring.

*Proof.* If $f:E \to E$ is nontrivial, then according to Schur’s lemma, it has an inverse. $\square$

This definitely reminds you of irreducible representations. But not every representation is irreducible, and likewise not every module is simple. Recall Maschke’s theorem in representation theory: *Every representation of a finite group over $\mathbb{C}$ having positive dimension is completely reducible.* For modules, we have a similar statement.

Definition-Proposition 3. Let $E$ be an $R$-module. Then the following three conditions are equivalent:

**SS 1.** $E$ is a sum of simple $R$-modules.

**SS 2.** $E$ is a direct sum of simple $R$-modules.

**SS 3.** For every submodule $E'$ of $E$, there is another submodule $F$ such that $E = E' \oplus F$, i.e. every submodule is a direct summand.

If $E$ satisfies the three conditions above, then $E$ is called **semisimple**. A ring $R$ is semisimple if it is a semisimple module over itself.

*Proof.* Assume **SS 1**, say we have $E=\sum_{i \in I}E_i$ with each $E_i$ simple. Let $J$ be a maximal subset of $I$ such that $E_0=\sum_{j \in J}E_j$ is a direct sum (such a $J$ exists by Zorn’s lemma). Pick any $i \in I$. Then $E_i \cap E_0$ is a submodule of $E_i$, which is either $0$ or $E_i$. If $E_i \cap E_0 = E_i$ then $E_i \subset E_0$. If the intersection is $0$, however, then $E_0 +E_i$ is direct, so $J \cup\{i\} \supsetneq J$ is a strictly larger subset of $I$ yielding a direct sum, contradicting the maximality of $J$. Hence $E_i \subset E_0$ holds for all $i \in I$, i.e. $E_0 = E$.

Next we assume **SS 2** and we have $E = \bigoplus_{i \in I}E_i$. Pick any submodule $E' \subset E$. Let $J$ be a maximal subset of $I$ such that the sum $E_0=E'+\bigoplus_{j \in J}E_j$ is direct. In the same manner we see $E_i \cap E_0=E_i$ for all $i \in I$, so $E_0=E$ and $F=\bigoplus_{j \in J}E_j$ is a complement of $E'$, which proves **SS 3**.

Finally we assume **SS 3**. Let $E_0=\sum_{i \in I}E_i$ be the sum of all simple submodules of $E$. Then there is a submodule $F$ of $E$ such that $E=E_0 \oplus F$. Assume $F \ne 0$; then $F$ has a simple submodule, which contradicts the definition of $E_0$. Hence $F=0$ and $E_0=E$. The reason why a nontrivial $F$ must have a simple submodule is contained in the following lemma. $\square$

Lemma 4. Let $E$ be an $R$-module satisfying **SS 3**. Then every nontrivial submodule $F$ has a simple submodule.

*Proof.* It suffices to show that every nontrivial principal module has a simple submodule. Indeed, for any $F \ne 0$, we pick a nonzero $v \in F$, then $Rv \subset F$.

Let $L$ be the kernel of the morphism

$$R \to Rv, \quad a \mapsto av.$$

Then $L$ is a left ideal with $L \ne R$ (because $v \ne 0$), so it is contained in a maximal left ideal $M$ of $R$. It follows that $Mv$ is a maximal submodule of $Rv$, because $M/L$ is a maximal submodule of $R/L$ and we have the isomorphism

$$R/L \cong Rv.$$

By **SS 3**, we can find a submodule $M'$ such that

$$E=Mv \oplus M',$$

which gives

$$Rv=Mv \oplus (M' \cap Rv).$$

We claim that $M' \cap Rv$ is simple. It is nonzero because $Mv \ne Rv$. Pick any proper submodule $E' \subset M' \cap Rv$; then $Mv \oplus E'$ is a proper submodule of $Rv$ containing $Mv$, which has to be $Mv$ by the maximality of $Mv$, i.e. $E'=0$. This proves our statement. $\square$

Proposition 5. Let $E$ be a semisimple $R$-module. Then every nontrivial submodule and quotient module of $E$ is semisimple.

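*Proof.* Write $E=\bigoplus_{i \in I}E_i$ with each $E_i$ simple, and pick a nontrivial submodule $F$ of $E$. Let $J$ be a maximal subset of $I$ such that

$$F+\bigoplus_{j \in J}E_j$$

is direct. Then, as in the proof of proposition 3, this direct sum is actually $E$. Therefore $F \cong E/\bigoplus_{j \in J}E_j \cong \bigoplus_{k \in K}E_k$ where $K = I \setminus J$, so $F$ is semisimple. For quotients, write $E=F \oplus F'$ by **SS 3**; since $(F \oplus F')/F \cong F'$ and $F'$ is a submodule of $E$, the quotient module $E/F$ is semisimple. $\square$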

Corollary 6. $R$ is a semisimple ring if and only if every $R$-module is semisimple.

*Proof.* By the universal property of free modules, every $R$-module is a factor module of a free $R$-module, while a free $R$-module is a direct sum of some copies of $R$. Hence if $R$ is semisimple then every $R$-module is semisimple. Conversely, if every $R$-module is semisimple, then $R$ is semisimple because it is a left module over itself. $\square$

Let $R$ be a ring. We say it is a finite dimensional algebra if it is also a finite dimensional vector space over some field $K$. In this subsection we study the Jacobson radical $J(R)=\bigcap\{\text{maximal left ideals of }R\}$, which will be used in the next section.

We summarise what we want to prove in the following proposition.

Proposition 7 (Jacobson Radical). Let $R$ be a ring (not necessarily commutative) and $J(R)$ be the Jacobson radical of $R$. Then

1. $J(R)$ is a two-sided ideal containing all nilpotent left ideals (in particular, all central nilpotent elements).
2. For every simple $R$-module $E$ we have $J(R)E=0$. More precisely, $J(R)=\{a \in R:aE=0\text{ for all simple }R\text{-modules }E\}$.
3. Suppose $R$ is a finite dimensional algebra (or more generally, $R$ is Artinian), then $R/J(R)$ is semisimple, and if $I$ is a two-sided ideal such that $R/I$ is semisimple, then $J(R) \subset I$. It follows that $R$ is semisimple if and only if $J(R)$ is trivial.
4. Under the same assumption, $J(R)$ is nilpotent.

*Proof.* We first prove 2. Pick any $a \in R$ that annihilates all simple $R$-modules. For any maximal left ideal $M$, $R/M$ is simple. Therefore $a(R/M)=0$, which implies that $a=a \cdot 1 \in M$. Therefore $a \in J(R)$.

Conversely, suppose $J(R)E \ne 0$ for some simple $E$. Since $J(R)E$ is a submodule of $E$ and $E$ is simple, we have $J(R)E=E$. More precisely, there exists some $x \in E$ such that $J(R)x=E$. Therefore there exists $a \in J(R)$ such that $ax=x$. Then $a-1$ lies in the annihilator $\operatorname{Ann}(x)$, which is a proper left ideal and is therefore contained in a maximal left ideal $M$. But we also have $J(R) \subset M$. Therefore $a \in M$ and $a-1 \in M$, which implies that $1 \in M$, and this is absurd. Hence 2 is proved.

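Next we prove 1. By definition $J(R)$ is a left ideal. Now pick any $a \in J(R)$ and $b \in R$. It follows that $abE=0$ for all simple $E$: indeed, $bE \subset E$ and therefore $abE \subset aE=0$. By 2, we get $ab \in J(R)$, so $J(R)$ is also a right ideal. Next, let $N$ be a nilpotent left ideal, say $N^n=0$, and let $E$ be simple. Then $NE$ is a submodule of $E$. If $NE=E$, then $0=N^nE=N^{n-1}(NE)=N^{n-1}E=\cdots=E$, a contradiction. Hence $NE=0$ for all simple $E$, i.e. $N \subset J(R)$. (For a single nilpotent element $a$, this argument applies whenever the left ideal $Ra$ is nilpotent, for instance when $a$ is central.) Therefore 1 is proved as well.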

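To prove 3, we first note that $R$ is Artinian: every descending chain of left ideals $J_1 \supsetneq J_2 \supsetneq \cdots$ must stop, because the dimension over $K$ strictly decreases along the chain. It follows that $J(R)$ is the intersection of finitely many maximal left ideals, for the descending chain

$$M_1 \supset M_1 \cap M_2 \supset M_1 \cap M_2 \cap M_3 \supset \cdots$$

must be finite. Therefore we can write $J(R)=\bigcap_{i=1}^{n}M_i$ for some maximal left ideals $M_1,\dots,M_n$ of $R$. Now consider the map

$$\phi:R/J(R) \to \bigoplus_{i=1}^{n}R/M_i, \quad a+J(R) \mapsto (a+M_1,\dots,a+M_n).$$

Since $J(R)=\bigcap_{i=1}^{n}M_i$, the map $\phi$ is injective, so $R/J(R)$ is isomorphic to a submodule of a direct sum of simple modules and is therefore semisimple by proposition 5. (If the $M_i$ are chosen without redundancy, the Chinese Remainder Theorem shows that $\phi$ is even an isomorphism.) We are done.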

Now suppose $I$ is a two-sided ideal such that $R/I$ is semisimple. By definition we can write

$$R/I=\bigoplus_{j \in J}L_j$$

for some simple $L_j$. Pick any $a \in J(R)$; we have $aL_j=0$ for all $j$, therefore $a(R/I)=0$, which implies that $a \in I$, i.e. $J(R) \subset I$. (In fact, according to the structure theorem of semisimple rings, the index set $J$ is finite.)

If $J(R)=0$, then $R/J(R)=R$ is semisimple. Conversely, if $R$ is semisimple, then $I=0$ is a two-sided ideal such that $R/I$ is semisimple. Hence $J(R)$ has to be trivial as well.

To prove 4, write $N=J(R)$ and consider the descending chain $N \supset N^2 \supset N^3 \supset \cdots$. Since $R$ is Artinian, the chain stabilises; let $N^\infty$ be the ideal at which it stops shrinking. Then $NN^\infty=N^\infty$, and Nakayama’s lemma implies that $N^\infty=0$. $\square$

Let $R$ be a commutative ring and $G$ a finite group. Let $E$ be an $R$-module. We can study the representation

$$\rho:G \to \operatorname{Aut}_R(E)$$

and we can also study the ring homomorphism

$$\lambda:R[G] \to \operatorname{End}_R(E).$$

We show that they are the same thing. Given $\lambda$, then for any $g \in G$, $\lambda(e_g)$ is an automorphism because $\lambda(e_g)\lambda(e_{g^{-1}})=\lambda(e_1)=1$. Therefore $\lambda$ gives rise to a representation $\rho:g \mapsto \lambda(e_g)$.

Conversely, for a representation $\rho$ and any $g \in G$, $\rho(g)$ is automatically an endomorphism and therefore we have a map

$$\lambda:R[G] \to \operatorname{End}_R(E), \quad \sum_{g \in G}a_ge_g \mapsto \sum_{g \in G}a_g\rho(g).$$

Therefore, the study of group representations can also be transferred into the study of group algebras. For simplicity we call such a module $E$ together with a representation $\rho$ a $G$-module, a notion you may already know. *Note such a $G$-module can also be considered as a module over $R[G]$ in the usual sense. Conversely, an $R[G]$-module is a $G$-module.* When the context is clear, we write $gx$ in place of $\rho(g)x$.

We generalise Maschke’s theorem to an arbitrary field $K$.

Theorem 8 (Maschke). Let $G$ be a finite group of order $n$ and let $K$ be a field. Then $K[G]$ is semisimple if and only if the characteristic of $K$ does not divide $n$ (in particular, characteristic $0$ is allowed).

In introductory representation theory, we study the case when $K=\mathbb{R}$ or $\mathbb{C}$, whose characteristic is definitely $0$.

*Proof.* Let $E$ be a $G$-module, and let $F$ be a $G$-submodule. We show that $F$ is a direct summand of $E$, i.e., there exists some $E’ \subset E$ such that $E = E’ \oplus F$. It is natural to think about the projection $\pi:E \to F$ where $\pi(x)=x$ for all $x \in F$. It is seemingly clear that $E=\ker\pi \oplus F$ is what we want, but we can’t do this: we only know that $\pi$ is a $K$-linear map, but we have no idea if it is a $K[G]$-linear map. To work around this problem, we modify the projection into a $K[G]$-linear map.

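To do this, we *average* $\pi$ over conjugation. To be precise, we consider the map

$$\varphi:E \to F, \quad x \mapsto \frac{1}{n}\sum_{g \in G}g^{-1}\pi(gx).$$

This map is $K[G]$-linear. We can therefore write $E=\ker\varphi \oplus F$ because $\varphi$ is a left inverse of the inclusion $i:F \to E$. Indeed, for any $x \in F$, we have

$$\varphi(x)=\frac{1}{n}\sum_{g \in G}g^{-1}\pi(gx)=\frac{1}{n}\sum_{g \in G}g^{-1}gx=\frac{1}{n}\sum_{g \in G}x=x.$$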

Note, since $F$ is a $G$-module, we have $g(x) \in F$ and therefore $\pi \circ g(x)=g(x)$. Also, the fact that $\operatorname{char}K \nmid n$ is used here: if the characteristic divides $n$, then $\sum_{g \in G}x=0$. Moreover, $n \cdot 1=0$ in $K$ and therefore $\frac{1}{n}$ is not defined.

Next we suppose that $p=\operatorname{char} K$ divides $n$. Consider the element

$$s=\sum_{g \in G}e_g.$$

Note $e_gs=s=se_g$ for all $g \in G$, so $s$ is central and $s^2=(\sum_{g \in G}e_g)s=ns=0$ because $p \mid n$. Therefore $Ks$ is a nonzero nilpotent two-sided ideal, so $J(K[G]) \ne 0$, from which it follows that $K[G]$ is not semisimple according to proposition 7. $\square$
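The averaging trick in the first half of the proof can be tried on a toy example. The following Python sketch assumes $G=C_2$ acting on $\mathbb{Q}^2$ by swapping coordinates, $F$ the line spanned by $(1,1)$, and a deliberately non-equivariant projection; the names `swap`, `pi` and `phi` are illustrative and not part of the post.

```python
from fractions import Fraction


def identity(v):
    return v


def swap(v):
    """The nontrivial element of C_2, acting on Q^2 by swapping coordinates."""
    x, y = v
    return (y, x)


G = [identity, swap]  # in C_2 every element is its own inverse


def pi(v):
    """A K-linear projection onto F = span{(1, 1)} that is NOT G-linear."""
    x, y = v
    return (y, y)


def phi(v):
    """Averaged projection: (1/|G|) * sum over g of g^{-1} pi(g v)."""
    n = len(G)
    total = (Fraction(0), Fraction(0))
    for g in G:
        w = g(pi(g(v)))  # g^{-1} = g in C_2
        total = (total[0] + w[0], total[1] + w[1])
    return (total[0] / n, total[1] / n)


v = (Fraction(3), Fraction(5))
print(pi(swap(v)) == swap(pi(v)))    # False: pi does not commute with G
print(phi(swap(v)) == swap(phi(v)))  # True: phi does
print(phi((Fraction(1), Fraction(1))))  # phi fixes F pointwise
```

Here $\ker\varphi$ is the line spanned by $(1,-1)$, which is stable under the swap, so $\mathbb{Q}^2=\ker\varphi \oplus F$ is a decomposition into $G$-submodules. Over a field of characteristic $2$ the division by $|G|=2$ would not be available.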

In other words, if $E$ is a finite dimensional representation of a group $G$ over $K$, and the characteristic of $K$ does not divide $|G|$, then $E$ is completely reducible. Recall we also have a block decomposition of a matrix representation. But this is not very easy to generalise. To work with it we need a clearer look at semisimple rings.

It would be great if, given a matrix representation, we could decompose it into a block diagonal matrix, with each block being a subrepresentation. But it would not be an easy job: we need to know whether the field is algebraically closed, what its characteristic is, et cetera. Perhaps we need some Galois theory, but that would take us too far from this post. Anyway, we need to see through the structure to know how to work with it.

In this section we study the structure of $R$ in a more detailed way. We say a ring is **simple** if it is semisimple and all of its simple left ideals are isomorphic. A left ideal is called simple if it is a simple left $R$-module.

Theorem 9 (Structure theorem of semisimple rings). Let $R$ be a semisimple ring. Then the number of isomorphism classes of simple left ideals of $R$ is finite. Say they are represented by $L_1,L_2,\dots,L_s$. If $R_i = \sum_{L \cong L_i}L$ (the sum of all simple left ideals isomorphic to $L_i$), then $R_i$ is a two-sided ideal, and is a simple ring. One can write $R$ as a product

$$R=\prod_{i=1}^{s}R_i.$$

Besides, $R$ admits a Peirce decomposition with respect to these $R_i$. There are elements $e_i \in R_i$ such that

$$1=e_1+e_2+\cdots+e_s.$$

The $e_i$ are idempotent ($e_i^2=e_i$) and orthogonal ($e_ie_j=0$ if $i \ne j$). Moreover, $e_i$ is the multiplicative identity of the ring $R_i$, and $R_i=e_iR=Re_i$.

*Proof.* To begin with we first study the behaviour of simple left ideals.

Lemma 10. Let $L$ be a simple left ideal of $R$ and $E$ be a simple $R$-module. Then $LE = 0$ unless $L \cong E$.

*Proof of the lemma.* Since $LE$ is a submodule of $E$ and $E$ is simple, $LE=0$ or $E$. If $LE=E$, then there exists some $y \in E$ such that $Ly=E$ (again by the simplicity of $E$). Therefore the map

$$L \to E, \quad a \mapsto ay$$

is surjective. It is injective because the kernel is a proper submodule of $L$ and therefore has to be trivial. $\blacksquare$

According to this lemma, $R_i R_j=0$ whenever $i \ne j$. This will be frequently used. For the time being we can write $R=\sum_{i \in I}R_i$ although we don’t know whether $I$ is finite. Firstly we show that $R_i$ is also a right ideal (since it is a sum of left ideals, it is by default a left ideal):

$$R_i \subset R_iR=R_iR_i \subset R_i.$$

Therefore $R_i$ is also a right ideal for all $i$. But before we proceed we need to explain the relation above. Since $R$ contains the unit, we must have $R_i \subset R_i R$. We have $R_iR=R_iR_i$ because $R_iR_j=0$ for all $i \ne j$ and $R$ is a sum of all $R_j$ over $j \in I$. Therefore other terms are eliminated. Finally, we have $R_iR_i \subset R_i$ simply because $R_i$ is a left ideal.

Also note that $R_i \cap R_j=0$ for all $i \ne j$: a simple left ideal contained in the intersection would have to be isomorphic to both $L_i$ and $L_j$. Therefore we can write $R=\bigoplus_{i \in I}R_i$ for the time being.

Now consider $1=\sum_{i \in I}e_i$ with $e_i \in R_i$. This sum is finite (by definition of the direct sum, only finitely many components are nonzero). Let $J \subset I$ be the finite subset of indices $j$ with $e_j \ne 0$. It follows that $R_i=0$ for all $i \in I \setminus J$ because $R_i = 1 \cdot R_i = \sum_{j \in J}e_jR_i = 0$. We can therefore write $R=\bigoplus_{i=1}^{n}R_i$; all other direct summands are trivial. Since each $R_i$ represents an isomorphism class of simple left ideals, there are only finitely many such classes.

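Now we study the relation of $e_i$, $R_i$ and $R$. For any $a_i \in R_i$, we have

$$a_i=1 \cdot a_i=\sum_{j}e_ja_i=e_ia_i, \quad a_i=a_i \cdot 1=\sum_{j}a_ie_j=a_ie_i.$$

Therefore $e_i$ is the unit in $R_i$ (it follows automatically that $e_i^2=e_i$). For any $a \in R$, we put $a_i=ae_i$, then there is a unique decomposition

$$a=a \cdot 1=\sum_{i=1}^{n}ae_i=\sum_{i=1}^{n}a_i.$$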

This shows that $R_i=Re_i=e_iR$, with $a \mapsto ae_i$ being the projection onto $R_i$. We also have $e_ie_j=0$ if $i \ne j$. Since $R_iR_j=0$ for $i \ne j$, we can safely write $R=\prod_{i=1}^{n}R_i$. Each $R_i$ is simple because (1) it is semisimple ($R_i=\sum_{L \cong L_i}L$ and each $L$ is also a simple $R_i$-module) and (2) all simple left ideals of $R_i$ are isomorphic. To show (2), assume that $L \subset R_i$ is a simple left ideal of $R_i$ that is not isomorphic to $L_i$. Since we have $L = R_iL = RR_iL = RL$, $L$ is also a simple left ideal of $R$. But this contradicts the definition of $R_i$. $\square$

Let’s extract more information from this theorem. First of all, the decomposition of the identity is again a finite sum inside every $R_i$, hence each $R_i$ is also a finite direct sum. To be precise,

Theorem 11. Every simple ring $R$ admits a decomposition into a finite direct sum of simple left ideals

$$R=\bigoplus_{i=1}^{n}R_i.$$

*Proof.* Since $R$ is semisimple, it is a sum of simple left ideals, the collection of which can be chosen to be direct. Say we have $R=\bigoplus_{i \in I}R_i$.

Consider $1 \in R$:

$$1=\sum_{i \in I}x_i$$

where $x_i \in R_i$. This sum is finite, say we have $1=\sum_{i=1}^{n}x_i$ and $x_i \ne 0$. Then

$$R=R \cdot 1 \subset \sum_{i=1}^{n}Rx_i \subset \bigoplus_{i=1}^{n}R_i.$$

This proves our assertion. $\square$

Combining theorem 9 and 11, we see

Corollary 12. Every semisimple ring $R$ admits a decomposition

$$R=\bigoplus_{i=1}^{r}n_iL_i,$$

where $n_iL_i$ denotes a direct sum of $n_i$ copies of the simple left ideal $L_i$. This direct sum is unique in the following sense: $L_1,\dots,L_r$ are unique up to isomorphism, and the pairs $(n_i,L_i)$ are unique up to a permutation.

This must remind you of the isotypic decomposition of a representation into irreducible representations. They are the same thing: there we used the semisimplicity of $\mathbb{C}[G]$, and here we are talking about the semisimplicity of an arbitrary ring.

We include here an elementary result in ring theory that really doesn’t need a proof.

Proposition 13. Let $R_1, R_2,\cdots, R_n$ be rings with units. The direct product

$$R=\prod_{i=1}^{n}R_i$$

has the following properties. Every ideal (no matter left, right or two-sided) of $R_i$ is an ideal of $R$. Every minimal ideal of $R_i$ is a minimal ideal of $R$. Every minimal ideal of $R$ is an ideal of some $R_i$.

The proof is quite similar to how we prove that $R_i$ is simple in our proof of theorem 9. This actually shows that

Corollary 14. If $R_1,\cdots,R_n$ are semisimple rings, then so is

$$R=\prod_{i=1}^{n}R_i.$$

We want to work with matrices, i.e., we want to work with linear equations. This becomes possible because of Wedderburn-Artin ring theory. We don’t know what can happen yet, so we can only try to generalise things very carefully.

When talking about matrices, we can talk about endomorphisms as well. So our first step is to find a bridge to endomorphisms. We now need to consider $R$ as a left module over itself.

The most immediate one is multiplication. For $a \in R$, we may consider the left multiplication induced by $a$:

$$\lambda_a:R \to R, \quad x \mapsto ax.$$

It may look natural, but unfortunately it is not necessarily an endomorphism of the left $R$-module $R$, simply because $\lambda_a(yx)\ne y\lambda_a(x)$ in general. However we can define

$$\rho_a:R \to R, \quad x \mapsto xa.$$

Now $\rho_a(yx)=y\rho_a(x)$ holds naturally. We can show that every endomorphism is defined in this way. Consider the map $\rho:a \mapsto (x \mapsto xa)$. We have

- $\rho$ is an anti-homomorphism. Indeed, $\rho(ab)=\rho(b)\rho(a)$ for all $a,b \in R$ and $\rho(a+b)=\rho(a)+\rho(b)$.
- $\rho$ is surjective (as a function, not a homomorphism). For any endomorphism $\psi:x \mapsto \psi(x)$, we have $\psi(x)=\psi(x \cdot 1)=x\psi(1)$. Therefore $\rho(\psi(1))=\psi$.
- $\rho$ is injective. If $\rho(a)(x)=xa=0$ for all $x \in R$, then in particular $\rho(a)(1)=a=0$.

We can call $\rho$ an *anti-isomorphism* but that causes headaches. Instead, if we consider the opposite ring $R^{op}$, where addition is the same as in $R$ and multiplication $\ast$ is given by

$$a \ast b=ba,$$

then we have

Proposition 14. Let $R$ be a ring. There is a natural isomorphism $R^{op} \cong \operatorname{End}_R(R)$ given by $a \mapsto (x \mapsto xa)$.
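The difference between left and right multiplication is easy to test numerically. Below is a small Python sketch, assuming $R$ is the ring of $2 \times 2$ integer matrices stored as nested tuples; the helpers `matmul`, `lam` and `rho` are hypothetical names chosen for this illustration.

```python
def matmul(p, q):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def lam(a):
    """Left multiplication x -> a x."""
    return lambda x: matmul(a, x)


def rho(a):
    """Right multiplication x -> x a."""
    return lambda x: matmul(x, a)


a = ((1, 2), (0, 1))
b = ((0, 1), (1, 0))
x = ((1, 0), (3, 1))
y = ((2, 1), (0, 1))

# Right multiplication respects the left module structure, left multiplication need not.
print(rho(a)(matmul(y, x)) == matmul(y, rho(a)(x)))  # True
print(lam(a)(matmul(y, x)) == matmul(y, lam(a)(x)))  # False

# rho reverses products: rho(ab) = rho(b) after rho(a), not rho(a) after rho(b).
print(rho(matmul(a, b))(x) == rho(b)(rho(a)(x)))     # True
print(rho(matmul(a, b))(x) == rho(a)(rho(b)(x)))     # False
```

The last two lines are the reason the opposite ring shows up: composition of right multiplications multiplies the elements in the reverse order.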

Note $(R^{op})^{op}=R$ so we may be able to take the opposite to decompose $\operatorname{End}_R(R)$ and take the opposite again.

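Now write $R=\bigoplus_{i=1}^{r}n_iL_i$ as in corollary 12. Since $\operatorname{Hom}_R(L_i,L_j)=0$ for $i \ne j$ by Schur’s lemma, we therefore have

$$\operatorname{End}_R(R) \cong \prod_{i=1}^{r}\operatorname{End}_R(n_iL_i).$$

However, by Schur’s lemma, $D_i=\operatorname{End}_R(L_i)$ is a division ring (we don’t necessarily have a field here). Therefore

$$\operatorname{End}_R(n_iL_i) \cong \operatorname{Mat}_{n_i}(D_i).$$

For each $f \in \operatorname{End}_R(n_kL_k)$, we have a corresponding matrix $(p_ift_j)$:

$$\begin{pmatrix} p_1ft_1 & \cdots & p_1ft_{n_k} \\ \vdots & \ddots & \vdots \\ p_{n_k}ft_1 & \cdots & p_{n_k}ft_{n_k} \end{pmatrix}$$

where $t_j$ is the inclusion of the $j$-th copy of $L_k$ and $p_i$ is the projection onto the $i$-th copy. This is to say, the isomorphism is given by

$$f \mapsto (p_ift_j)_{1 \le i,j \le n_k}.$$

The verification is a matter of linear algebra and techniques frequently used in this post.

Therefore we have

$$R^{op} \cong \operatorname{End}_R(R) \cong \prod_{i=1}^{r}\operatorname{Mat}_{n_i}(D_i).$$

Taking the opposite again we have

$$R \cong \prod_{i=1}^{r}\operatorname{Mat}_{n_i}(D_i)^{op} \cong \prod_{i=1}^{r}\operatorname{Mat}_{n_i}(D_i^{op}).$$

The isomorphism $\operatorname{Mat}_n(D)^{op} \cong \operatorname{Mat}_n(D^{op})$ is given by the transpose of a matrix. However, the opposite ring of a division ring is still a division ring, so after renaming $D_i^{op}$ as $D_i$ we have a decomposition

$$R \cong \prod_{i=1}^{r}\operatorname{Mat}_{n_i}(D_i)$$

where each $D_i$ is a division ring.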

Conversely, rings of the form above are semisimple. This is easy because for $R=\operatorname{Mat}_n(D)$, which is Artinian, the only proper two-sided ideal is the trivial one; hence $J(R)$ is trivial, and $R=R/J(R)$ is semisimple by proposition 7. See the lemma below.

Lemma. Let $R$ be a ring. All two-sided ideals of $\operatorname{Mat}_n(R)$ are of the form $\operatorname{Mat}_n(I)$ where $I$ is a two-sided ideal of $R$.

*Proof.* If $I$ is a two-sided ideal of $R$, then clearly $\operatorname{Mat}_n(I)$ is a two-sided ideal of $\operatorname{Mat}_n(R)$. Conversely, suppose $J \subset \operatorname{Mat}_n(R)$ is a two-sided ideal; we show that $J=\operatorname{Mat}_n(I)$ for some $I \subset R$. To be precise, put

$$I=\{a \in R: a=a_{11}\text{ for some }A=(a_{ij}) \in J\}.$$

Then $I$ is a two-sided ideal. Let $E_{ij}$ be the matrix whose $(i,j)$-th entry is $1$ and whose other entries are $0$. For any matrix $A=(a_{ij}) \in \operatorname{Mat}_n(R)$, we have

$$E_{ij}AE_{k\ell}=a_{jk}E_{i\ell}.$$

Therefore if $A \in J$, then $a_{11} \in I$ and in particular,

$$a_{jk}E_{11}=E_{1j}AE_{k1} \in J, \quad \text{i.e. } a_{jk} \in I,$$

for all $j,k$. Therefore $J \subset \operatorname{Mat}_n(I)$. Conversely, for any $a \in I$, we can find $A=(a_{ij}) \in J$ such that $a=a_{11}$. Now $aE_{i\ell}=E_{i1}AE_{1\ell} \in J$. Note a matrix $A=(a_{ij}) \in \operatorname{Mat}_n(I)$ can be written in the form $\sum_{i,\ell}a_{i\ell}E_{i\ell}$ where $a_{i\ell} \in I$. This proves that $\operatorname{Mat}_n(I) \subset J$. $\square$
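The whole proof rests on the matrix unit identity $E_{ij}AE_{k\ell}=a_{jk}E_{i\ell}$. Here is a quick brute-force check in Python, assuming $n=3$, integer entries and 0-based indices; the helper names `unit`, `matmul` and `scale` are made up for this sketch.

```python
def unit(n, i, j):
    """The n x n matrix unit with 1 at position (i, j) and 0 elsewhere."""
    return [[1 if (r, c) == (i, j) else 0 for c in range(n)] for r in range(n)]


def matmul(p, q):
    n = len(p)
    return [
        [sum(p[r][k] * q[k][c] for k in range(n)) for c in range(n)]
        for r in range(n)
    ]


def scale(a, m):
    """Multiply every entry of the matrix m by the scalar a."""
    return [[a * entry for entry in row] for row in m]


n = 3
A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Check E_ij * A * E_kl == a_jk * E_il for all index combinations.
identity_holds = all(
    matmul(matmul(unit(n, i, j), A), unit(n, k, l)) == scale(A[j][k], unit(n, i, l))
    for i in range(n)
    for j in range(n)
    for k in range(n)
    for l in range(n)
)
print(identity_holds)  # True
```

In words, sandwiching $A$ between two matrix units extracts one entry of $A$ and moves it to any position we like, which is why a two-sided ideal is determined by the entries it contains.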

It follows that a matrix algebra over a division ring or a field is semisimple. But let’s head back to where we were.

By corollary 14, the direct sum (or product because it is finite) of matrix algebras over division rings

$$\prod_{i=1}^{r}\operatorname{Mat}_{n_i}(D_i)$$

is therefore semisimple as well.

To conclude, we have the Wedderburn-Artin theorem.

Theorem 15 (Wedderburn-Artin). $R$ is a semisimple ring if and only if it can be written as a direct sum (or product, because they are the same when finite) of matrix algebras over some division rings:

$$R \cong \prod_{i=1}^{r}\operatorname{Mat}_{n_i}(D_i).$$

Since the opposite of a division ring is a division ring, we also have

Corollary 16. A ring $R$ is semisimple if and only if $R^{op}$ is.

Now back to representation theory. But it can be extremely hard: we have no idea about the division rings. However, when the field is algebraically closed, there is no problem. Note some authors also use *skew field* in place of division ring.

Proposition 17. Let $K$ be an algebraically closed field and $D$ be a finite dimensional division algebra over $K$. Then $D \cong K$.

*Proof.* Pick $a \in D$ that is not $0$. Note the map $\rho_a:x \mapsto ax$ is a $K$-linear map on the finite dimensional $K$-vector space $D$. Since $K$ is algebraically closed, $\rho_a$ has at least one eigenvalue, say $\lambda$. It follows that

$$(a-\lambda e)x=0$$

for some nonzero $x$, where $e$ is the unit of $D$. Since $D$ is a division ring, we have $a=\lambda{e}$. We have actually established an isomorphism $a \mapsto \lambda$ and therefore $D \cong K$. $\square$

If you have studied Banach algebra theory, you will realise that this is nothing but the Gelfand-Mazur theorem (see any book on functional analysis that discusses Banach algebras, for example, *Functional Analysis* by W. Rudin). In infinite dimensional spaces we have to consider the topology of the field and the algebra.

Therefore we can now state Maschke’s theorem in the finest way possible:

Theorem 18 (Maschke). Let $G$ be a finite group, and $K$ be an algebraically closed field whose characteristic does not divide the order of $G$. Then

$$K[G] \cong \prod_{i=1}^{r}\operatorname{Mat}_{n_i}(K).$$

Those $n_i$ are uniquely determined. In particular, $n_1^2+\cdots+n_r^2=|G|$.

- *Algebra Revised Third Edition*, Serge Lang.
- *Abstract Algebra*, Pierre Antoine Grillet.
- *Linear Representation of Finite Groups*, Jean-Pierre Serre.

in a different style.

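Again, if we consider the map

$$\Phi:\mathbb{C}[X,Y] \to \mathbb{C}[\cos{x},\sin{x}], \quad X \mapsto \cos{x}, \quad Y \mapsto \sin{x},$$

we will see that $\ker\Phi=(X^2+Y^2-1)$ and therefore

$$\mathbb{C}[\cos{x},\sin{x}] \cong \mathbb{C}[X,Y]/(X^2+Y^2-1).$$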

Following the same step as in the previous post, we can show that $R’=\mathbb{C}[\cos{x},\sin{x}]$ is Dedekind. However, the map

shows that

(Proposition 1)

The localisation of a UFD is a UFD, hence we see $\mathbb{C}[\sin{x},\cos{x}]$ is a UFD. There are other ways to do it. For example, we can directly put $\mathbb{C}[\sin{x},\cos{x}]=\mathbb{C}[e^{ix},e^{-ix}]$, and this is even quicker. As another way, since $\cos{x}=\frac{e^{ix}+e^{-ix}}{2}$ and $\sin{x} = \frac{e^{ix}-e^{-ix}}{2i}$, all trigonometric polynomials can be written in the following form

$$e^{-inx}P(e^{ix}),$$

where $P(X) \in \mathbb{C}[X]$ and $n \ge 0$. Conversely, all elements of the form $e^{-inx}P(e^{ix})$ are in $\mathbb{C}[\cos{x},\sin{x}]$, and therefore we have an isomorphism

$$\mathbb{C}[T,T^{-1}] \to \mathbb{C}[\cos{x},\sin{x}], \quad T \mapsto e^{ix}=\cos{x}+i\sin{x}.$$

Note it follows that $T^{-1}$ maps to $\cos{x}-i\sin{x}$.

Now we return to the identity

$$\sin^2{x}=(1-\cos{x})(1+\cos{x}).$$

In $R=\mathbb{R}[\cos{x},\sin{x}]$, since $\sin{x}$, $1-\cos{x}$, $1+\cos{x}$ are all irreducible (more precisely, elements of the form $a+b\sin{x}+c\cos{x}$ with $(b,c) \ne (0,0)$ are irreducible), this identity gives two essentially different factorisations of the same element, and we see that $\mathbb{R}[\cos{x},\sin{x}]$ is not a UFD. In fact, we can also deduce that $R$ is not a UFD from the fact that $Cl(R) \cong \mathbb{Z}/2\mathbb{Z}$, i.e., the ideal class group is nontrivial (corollary 3.22).

However, since $R’$ is a UFD, $\sin^2{x}=(1-\cos{x})(1+\cos{x})$ tells us *nothing*. We need to figure out why and what is going on. To work with it we consider the form $R’=\mathbb{C}[T,T^{-1}]$. What are irreducible elements in this ring? We will make use of the fact that $\mathbb{C}$ is algebraically closed (why not!). Since $T$ and $T^{-1}$ are units in this ring, we can use them to modify the degree of an element. More precisely, as an application of the fundamental theorem of classical algebra,

$P(T)=\sum_{j=m}^{n}a_jT^{j}$ with $m,n \in \mathbb{Z}$ and $a_m,a_n \ne 0$ (you should be reminded of Laurent series!) is irreducible if and only if $Q(T)=T^{-m}P(T)$ is irreducible. However, $Q(T) \in \mathbb{C}[T]$ has nonzero constant term, so it is irreducible in $R'$ if and only if $Q$ is of degree $1$, which is equivalent to saying that $n-m=1$ in $P(T)$.

Therefore the irreducible elements are exactly those of the form $aT^m+bT^{m+1}$ where $a,b \ne 0$. Dropping $bT^m$ because it is a unit, we obtain a finer result:

(Proposition 2) Irreducible elements of $R'$ are, up to a unit, of the form

$$T+a=\cos{x}+i\sin{x}+a, \quad a \in \mathbb{C}^\ast.$$

With this being said, $\sin{x}$, $1-\cos{x}$ and $1+\cos{x}$ are all *not* irreducible in $R'$. For example, for $\sin{x}$ we actually have

$$\sin{x}=\frac{T-T^{-1}}{2i}=\frac{1}{2i}T^{-1}(T-1)(T+1), \quad T=e^{ix}.$$
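These factorisations can be sanity-checked numerically. The Python sketch below assumes the substitution $T=e^{ix}$ and uses the factorisations $\sin{x}=\frac{(T-1)(T+1)}{2iT}$, $1-\cos{x}=-\frac{(T-1)^2}{2T}$ and $1+\cos{x}=\frac{(T+1)^2}{2T}$, which follow from $\cos{x}=\frac{T+T^{-1}}{2}$; the function names are illustrative only.

```python
import cmath
import math


def sin_via_factors(x):
    """sin x written as (T - 1)(T + 1) / (2i T) with T = e^{ix}."""
    T = cmath.exp(1j * x)
    return (T - 1) * (T + 1) / (2j * T)


def one_minus_cos_via_factors(x):
    """1 - cos x written as -(T - 1)^2 / (2T)."""
    T = cmath.exp(1j * x)
    return -((T - 1) ** 2) / (2 * T)


def one_plus_cos_via_factors(x):
    """1 + cos x written as (T + 1)^2 / (2T)."""
    T = cmath.exp(1j * x)
    return (T + 1) ** 2 / (2 * T)


for x in (0.3, 1.0, 2.5):
    assert abs(sin_via_factors(x) - math.sin(x)) < 1e-12
    assert abs(one_minus_cos_via_factors(x) - (1 - math.cos(x))) < 1e-12
    assert abs(one_plus_cos_via_factors(x) - (1 + math.cos(x))) < 1e-12
print("all factorisations agree")
```

Up to units, both sides of $\sin^2{x}=(1-\cos{x})(1+\cos{x})$ are therefore the same product $(T-1)^2(T+1)^2$, which is why the identity says nothing about unique factorisation in $R'$.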

We can find some obvious facts about these two rings. For example, $R$ is a free $\mathbb{R}[\cos{x}]$-algebra with basis $\{1,\sin{x}\}$ (note that every even power of $\sin{x}$ can be rewritten in terms of $\cos{x}$ by the relation $\sin^2{x}=1-\cos^2{x}$). Likewise $R'$ is a free $\mathbb{C}[\cos{x}]$-algebra with basis $\{1,\sin{x}\}$. We can also write $R'$ as $R \oplus iR$ or $R[i]$. That is, $R'$ is a free $R$-algebra with basis $\{1,i\}$. These facts are quite elementary and barely touch the structure of the rings. Now we touch it by studying the quotient fields of $R$ and $R'$ respectively.

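Treating $R$ as a free $\mathbb{R}[\cos{x}]$-algebra, we can write any polynomial $f(\cos{x},\sin{x})$ as

$$f(\cos{x},\sin{x})=P(\cos{x})+Q(\cos{x})\sin{x},$$

where $P,Q \in \mathbb{R}[X]$. For simplicity we write $f=P+Q\sin{x}$. Suppose we now have $f=P_1+Q_1\sin{x}$ and $g=P_2+Q_2\sin{x}$ with $g \ne 0$, then

$$\frac{f}{g}=\frac{(P_1+Q_1\sin{x})(P_2-Q_2\sin{x})}{P_2^2-Q_2^2\sin^2{x}}=\frac{P_1P_2-Q_1Q_2\sin^2{x}}{P_2^2-Q_2^2\sin^2{x}}+\frac{Q_1P_2-P_1Q_2}{P_2^2-Q_2^2\sin^2{x}}\sin{x}.$$

Therefore every element of $K(R)$ can be written in the form $U(\cos{x})+V(\cos{x})\sin{x}$ where $U,V \in \mathbb{R}(\cos{x})$, the field of rational functions in $\cos{x}$ over $\mathbb{R}$. Since $\sin^2{x} \in \mathbb{R}(\cos{x})$, we obtain: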

(Proposition 3) The quotient field of $R$ is

$$K(R)=\mathbb{R}(\cos{x})[\sin{x}].$$

Likewise,

$$K(R')=\mathbb{C}(\cos{x})[\sin{x}]$$

can be proved in exactly the same way.

Since $R$ is Dedekind, it is integrally closed in $K(R)$. But what about its relation with $K(R’)$? For this we have an elegant result:

(Proposition 4) $R'$ is the integral closure of $R$ in $K(R')$.

*Proof.* Let $C$ be the integral closure of $R$ in $K(R')$. Note $K(R')=K(R)[i]$. Pick any $f+ig \in C$ with $f,g \in K(R)$. Its conjugate $f-ig$ is integral over $R$ as well, so $f$ and $g$ are integral over $R$; since $f,g \in K(R)$ and $R$ is integrally closed, we see $f \in R$ and $g \in R$, hence $f+ig \in R'$. Therefore $C \subset R'$. Conversely, any $f+ig \in R'$ is in $C$ because $f,g \in R \subset C$ and $i \in C$. Therefore $R' \subset C$. $\square$

*We are using the notation that Hartshorne used in his book Algebraic Geometry.*

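Put $f(X,Y)=X^2+Y^2-1$, then $Y=Z(f)$ is an irreducible affine curve in the affine space $A^2_{\mathbb{C}}$. This curve is non-singular everywhere because the Jacobian matrix

$$\begin{pmatrix}\dfrac{\partial f}{\partial X} & \dfrac{\partial f}{\partial Y}\end{pmatrix}=\begin{pmatrix}2X & 2Y\end{pmatrix}$$

has rank $1$ at every point of the curve. The coordinate ring $A(Y)$ is exactly $R'$.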

Let $P$ be a point on $Y$, which, by Hilbert’s Nullstellensatz, corresponds to a unique maximal ideal $\mathfrak{m}_P \subset A(Y)\cong R'$. Since $R'$ is a PID, proposition 2 gives $\mathfrak{m}_P=(\cos{x}+i\sin{x}+a)$ where $a \ne 0$. Hence $P$ corresponds to a nonzero complex number $a$.

(Proposition 5) Every point $P$ on the curve $Z(X^2+Y^2-1)$ corresponds to a unique nonzero complex number $a \in \mathbb{C}^\ast$.

Since $Y$ is nonsingular, it also follows that $\dim_{\mathbb{C}}\mathfrak{m}/\mathfrak{m}^2=\dim R'=1$ for all maximal ideals $\mathfrak{m}$ of $R'$. This is to say, the tangent space is always of dimension $1$ as a $\mathbb{C}$-vector space, or $2$ as an $\mathbb{R}$-vector space. Besides, if we localise at $\mathfrak{m}_P$, we see $\mathcal{O}_{P,Y} \cong R'_{\mathfrak{m}_P}$ is always a regular local ring.

- *Introduction to Commutative Algebra*, M. F. Atiyah & I. G. MacDonald.
- *Algebraic Geometry*, Robin Hartshorne.
- *Commutative Ring Theory and Applications*, edited by Marco Fontana, Salah-Eddine Kabbaj and Sylvia Wiegand.