The classic version of the Chinese remainder theorem tells us how to solve a system of congruences whose moduli are pairwise coprime. You may have seen this poem when you were young.

有物不知其數，三三數之剩二，五五數之剩三，七七數之剩二。問物幾何？

Translation:

There are certain things whose number is unknown. If we count them by threes, we have two left over; by fives, we have three left over; and by sevens, two are left over. How many things are there?

This poem can be translated into solving a system of congruences:

$$
x \equiv 2 \pmod 3, \quad x \equiv 3 \pmod 5, \quad x \equiv 2 \pmod 7.
$$

In modern language, we consider the integer ring $\mathbb{Z}$ (Z for Zahlen in German). The ideals $(3)$, $(5)$ and $(7)$ are pairwise coprime (or comaximal), and we consider the map

$$
\phi:\mathbb{Z} \to \mathbb{Z}/(3) \times \mathbb{Z}/(5) \times \mathbb{Z}/(7), \quad x \mapsto (x+(3),x+(5),x+(7)).
$$

The poem asks for an element of the preimage of $(2 +(3), 3+(5), 2+ (7))$. The Chinese remainder theorem tells us that the map is surjective, so such an element exists. In fact $233$ is such an element, and $23$ is the smallest positive one.
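As a quick sanity check of the classical statement, here is a minimal Python sketch that solves the poem's system one congruence at a time. The function name `solve_congruences` is ours (not from any library), and the moduli are assumed to be pairwise coprime.

```python
from math import gcd

def solve_congruences(residues, moduli):
    """Return (x, M) with 0 <= x < M = product of moduli and x ≡ r_i (mod m_i).

    Hypothetical helper for illustration; assumes pairwise coprime moduli.
    """
    x, m = 0, 1
    for r, n in zip(residues, moduli):
        assert gcd(m, n) == 1, "moduli must be pairwise coprime"
        # Find t with x + m*t ≡ r (mod n); pow(m, -1, n) is the inverse of m mod n
        # (three-argument pow with exponent -1 needs Python 3.8+).
        t = ((r - x) * pow(m, -1, n)) % n
        x, m = x + m * t, m * n
    return x, m

x, m = solve_congruences([2, 3, 2], [3, 5, 7])
print(x, m)                # smallest non-negative solution and the combined modulus
print((233 - x) % m == 0)  # 233 differs from it by a multiple of the combined modulus
```

Every other solution differs from the returned one by a multiple of the combined modulus, which is the injectivity half of the theorem for $\mathbb{Z}/(105)$.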

Nevertheless, what really matters here is rings and ideals. That is why we revisit the Chinese remainder theorem in the language of ring theory. The importance of the classic version should never be ignored, but we should also look beyond it.

We will study the Chinese remainder theorem in a ring, assuming the ring is commutative, or assuming something much weaker. We will also see a special case in Dedekind domains. We try our best to make as few assumptions as possible.

We want to impose as few restrictions as possible. Let $A$ be a ring that is not necessarily commutative and does not necessarily contain a unit. A lot of things developed in ring theory will fail here, but we can still consider the **direct product** of rings and **coprime** (or comaximal) ideals. Two ideals $\mathfrak{a}$ and $\mathfrak{b}$ of $A$ are coprime if $\mathfrak{a}+\mathfrak{b}=A$. We will do the Chinese remainder theorem on two levels of abstraction. Throughout, all ideals are two-sided.

When the ring has a unit, we have an easy description of the intersection of two coprime ideals in terms of their products.

Proposition 1. Let $A$ be a ring with unity, $\mathfrak{a}$ and $\mathfrak{b}$ two ideals of $A$. If $\mathfrak{a}$ and $\mathfrak{b}$ are coprime, i.e. $\mathfrak{a}+\mathfrak{b}=A$, then

$$
\mathfrak{a} \cap \mathfrak{b} = \mathfrak{ab}+\mathfrak{ba}.
$$

In particular, when $A$ is commutative, one has $\mathfrak{a} \cap \mathfrak{b} = \mathfrak{ab}$.

*Proof.* The last statement follows from the relation $\mathfrak{ab}=\mathfrak{ba}$, which holds in a commutative ring, therefore it suffices to prove the first relation. Notice that there exist $x \in \mathfrak{a}$ and $y \in \mathfrak{b}$ such that $x+y=1$. As a result, for any $a \in \mathfrak{a} \cap \mathfrak{b}$, one has

$$
a = a(x+y) = ax+ay \in \mathfrak{ba}+\mathfrak{ab},
$$

because $a \in \mathfrak{b}$ and $x \in \mathfrak{a}$ give $ax \in \mathfrak{ba}$, while $a \in \mathfrak{a}$ and $y \in \mathfrak{b}$ give $ay \in \mathfrak{ab}$.

Conversely, since both $\mathfrak{a}$ and $\mathfrak{b}$ are two-sided ideals, we see $\mathfrak{ab} \subset \mathfrak{a} \cap \mathfrak{b}$ and $\mathfrak{ba} \subset \mathfrak{a} \cap \mathfrak{b}$, and so is their sum. $\square$

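Let $A$ be a ring with unity. Consider a finite number of ideals $\mathfrak{a}_1,\dots,\mathfrak{a}_n$. Define a homomorphism

$$
\phi:A \to \prod_{i=1}^{n}A/\mathfrak{a}_i, \quad x \mapsto (x+\mathfrak{a}_1,\dots,x+\mathfrak{a}_n).
$$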

We do not assume that these $\mathfrak{a}_i$ are pairwise coprime just yet. We will see what happens when they are.

Theorem 1. For the homomorphism $\phi$ defined above,

- $\phi$ is injective if and only if $\bigcap_{i=1}^{n}\mathfrak{a}_i=(0)$.
- If the $\mathfrak{a}_i$ are pairwise coprime and $A$ is commutative, then $\prod_{i=1}^{n}\mathfrak{a}_i=\bigcap_{i=1}^{n}\mathfrak{a}_i$.
- $\phi$ is surjective if and only if the $\mathfrak{a}_i$ are pairwise coprime.

*Proof.* The first statement follows from the fact that $\ker\phi=\bigcap_{i=1}^{n}\mathfrak{a}_i$.

For the second statement, according to Proposition 1, this equality holds for $n=2$. Now suppose that $n>2$ and this statement holds for $\mathfrak{a}_1,\dots,\mathfrak{a}_{n-1}$. Let $\mathfrak{b}=\prod_{i=1}^{n-1}\mathfrak{a}_i=\bigcap_{i=1}^{n-1}\mathfrak{a}_i$. We will show that $\mathfrak{a}_n+\mathfrak{b}=A$, and therefore

$$
\prod_{i=1}^{n}\mathfrak{a}_i=\mathfrak{b}\mathfrak{a}_n=\mathfrak{b}\cap\mathfrak{a}_n=\bigcap_{i=1}^{n}\mathfrak{a}_i.
$$

Notice that $\mathfrak{a}_i$ and $\mathfrak{a}_n$ are coprime for all $1 \le i \le n-1$. Therefore, for each of these $i$, we have an equation $x_i+y_i =1$ where $x_i \in \mathfrak{a}_i$ and $y_i \in \mathfrak{a}_n$. Here we use the fact that $1 \in A$. Commutativity of $A$ is also used, because without it Proposition 1 does not even give the case $n=2$.

From this equation we deduce that

$$
\prod_{i=1}^{n-1}x_i=\prod_{i=1}^{n-1}(1-y_i).
$$

Expanding the product on the right hand side, every term other than $1$ contains some $y_i \in \mathfrak{a}_n$, so $\prod_{i=1}^{n-1}x_i \equiv 1 \pmod{\mathfrak{a}_n}$. This implies that there exists $y \in \mathfrak{a}_n$ such that $\prod_{i=1}^{n-1}x_i+y=1$. Since $\prod_{i=1}^{n-1}x_i \in \mathfrak{b}$, we have shown that $\mathfrak{a}_n+\mathfrak{b}=A$.

For the third statement, we first assume that $\phi$ is surjective. It suffices to show, for example, that $\mathfrak{a}_1$ and $\mathfrak{a}_2$ are coprime. There exists $x \in A$ such that $\phi(x)=(1+\mathfrak{a}_1,0+\mathfrak{a}_2,\dots,0+\mathfrak{a}_n)$. This shows us that $x \in \mathfrak{a}_2$ and $1-x \in \mathfrak{a}_1$. As a result,

$$
1=(1-x)+x \in \mathfrak{a}_1+\mathfrak{a}_2,
$$

hence these two ideals are coprime. This procedure applies to all other pairs of ideals by merely modifying the indices.

Conversely, assume the $\mathfrak{a}_i$ are pairwise coprime. It suffices to show that there exists $x \in A$ such that $\phi(x)=(1+\mathfrak{a}_1,0+\mathfrak{a}_2,\dots,0+\mathfrak{a}_n)$: the same procedure applies to every other component, and $1$ can then be replaced with any other element $a$ of $A$ (in this case, we replace $x$ with $ax$). All other elements of the product are sums of elements of this kind.

Since $\mathfrak{a}_1+\mathfrak{a}_i=(1)$ for all $i>1$, we have $u_i+v_i = 1$ where $u_i \in \mathfrak{a}_1$ and $v_i \in \mathfrak{a}_i$. Take

$$
x=\prod_{i=2}^{n}v_i=\prod_{i=2}^{n}(1-u_i),
$$

then we see $x \equiv 0 \pmod{\mathfrak{a}_i}$ for all $i>1$ but $x \equiv 1 \pmod{\mathfrak{a}_1}$. This $x$ will be mapped to $(1+\mathfrak{a}_1,0+\mathfrak{a}_2,\dots,0+\mathfrak{a}_n)$ as expected. $\square$
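The surjectivity argument is constructive: from relations $u_i+v_i=1$ with $u_i \in \mathfrak{a}_1$ and $v_i \in \mathfrak{a}_i$, the product of the $v_i$ is congruent to $1$ modulo $\mathfrak{a}_1$ and to $0$ modulo every other ideal. Below is a small sketch of this construction in $A=\mathbb{Z}$, where the relations come from the extended Euclidean algorithm. The helper names `ext_gcd` and `basis_element` are ours and only serve as an illustration.

```python
def ext_gcd(a, b):
    """Return (g, s, t) with s*a + t*b == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, s, t = ext_gcd(b, a % b)
    return g, t, s - (a // b) * t

def basis_element(moduli):
    """Build x with x ≡ 1 (mod moduli[0]) and x ≡ 0 (mod m) for every other modulus m."""
    m1 = moduli[0]
    x = 1
    for m in moduli[1:]:
        g, s, t = ext_gcd(m1, m)
        assert g == 1, "the ideals (m1) and (m) must be coprime"
        # u = s*m1 lies in (m1), v = t*m lies in (m), and u + v = 1.
        v = t * m
        x *= v
    return x

x = basis_element([3, 5, 7])
print(x % 3, x % 5, x % 7)  # residues of x in Z/(3), Z/(5), Z/(7)
```

Multiplying such an element by an arbitrary integer and summing over the components recovers a preimage of any prescribed tuple of residues.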

It also matters that ideals are two-sided, otherwise these products of $n-1$ terms will make less sense.

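Corollary 1 (Chinese Remainder Theorem). If the $\mathfrak{a}_i$ are pairwise coprime, then $\phi$ induces an isomorphism:

$$
A/\bigcap_{i=1}^{n}\mathfrak{a}_i \cong \prod_{i=1}^{n}A/\mathfrak{a}_i.
$$

If $A$ is commutative, then

$$
A/\prod_{i=1}^{n}\mathfrak{a}_i \cong \prod_{i=1}^{n}A/\mathfrak{a}_i.
$$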

We first need to clarify what we mean by “noncommutative ring”. When we say “let $A$ be a noncommutative ring”, we mean $A$ is not necessarily commutative (it can be but we do not care); when we say “$A$ is noncommutative”, we mean $A$ is not commutative. This is a matter of convenience.

What hurts most on this level is that we cannot use unity anymore this time (there can be a unit, but we should not care here). To work around this, we need to figure out what we essentially did when proving the surjectivity of $\phi$ in theorem 1. We find a suitable element in the first ideal, and a suitable element in the intersection of all other ideals.

For this reason, we replace being pairwise coprime with a different condition. It is easy to see that if all the ideals are pairwise coprime, then the following condition will be satisfied automatically.

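Theorem 2 (Chinese Remainder Theorem). Let $A$ be a noncommutative ring, and let $\mathfrak{a}_1,\dots,\mathfrak{a}_n$ be ideals such that

$$
\mathfrak{a}_i+\bigcap_{j \ne i}\mathfrak{a}_j=A
$$

for all $i=1,2,\dots,n$. Then one has an isomorphism

$$
A/\bigcap_{i=1}^{n}\mathfrak{a}_i \cong \prod_{i=1}^{n}A/\mathfrak{a}_i
$$

induced by the map

$$
\phi:A \to \prod_{i=1}^{n}A/\mathfrak{a}_i, \quad x \mapsto (x+\mathfrak{a}_1,\dots,x+\mathfrak{a}_n).
$$

*Proof.* We have $\ker\phi=\bigcap_{i=1}^{n}\mathfrak{a}_i$. Therefore it remains to show that our improved coprime condition implies that $\phi$ is surjective. Again, it suffices to show that the preimage of $(a+\mathfrak{a}_1,0+\mathfrak{a}_2,\dots,0+\mathfrak{a}_n)$ exists for all $a \in A$ (wait! One should not consider $1$ here!). Since $\mathfrak{a}_1+\bigcap_{i=2}^{n}\mathfrak{a}_i=A$, for any $a \in A$, there exist $a_1 \in \mathfrak{a}_1$ and $a_2 \in \bigcap_{i=2}^{n}\mathfrak{a}_i$ such that

$$
a=a_1+a_2.
$$

As a result,

$$
\phi(a_2)=(a-a_1+\mathfrak{a}_1,a_2+\mathfrak{a}_2,\dots,a_2+\mathfrak{a}_n)=(a+\mathfrak{a}_1,0+\mathfrak{a}_2,\dots,0+\mathfrak{a}_n).
$$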

This proves the surjectivity of $\phi$. $\square$


Theorem 3 (Chinese Remainder Theorem). Let $A$ be a ring, and let $\mathfrak{a}_1,\dots,\mathfrak{a}_n$ be ideals of $A$. If all the $\mathfrak{a}_i$ are pairwise coprime and for all $i$,

$$
A=\mathfrak{a}_i+A^2,
$$

then the Chinese remainder theorem (in the form of Theorem 2) holds.

If $A$ has a unit, then the condition $A=\mathfrak{a}_i+A^2$ is automatically satisfied.

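*Proof.* Assume that the $\mathfrak{a}_i$ are pairwise coprime and satisfy $A=\mathfrak{a}_i+A^2$. We verify the condition of Theorem 2, and it suffices to treat the case $i=1$. Notice that

$$
A=\mathfrak{a}_1+A^2=\mathfrak{a}_1+(\mathfrak{a}_1+\mathfrak{a}_2)(\mathfrak{a}_1+\mathfrak{a}_3) \subset \mathfrak{a}_1+\mathfrak{a}_2\mathfrak{a}_3 \subset \mathfrak{a}_1+\mathfrak{a}_2\cap\mathfrak{a}_3.
$$

Therefore $A=\mathfrak{a}_1 + \mathfrak{a}_2 \cap \mathfrak{a}_3$. Suppose now $3 \le m < n$ and

$$
A=\mathfrak{a}_1+\bigcap_{j=2}^{m}\mathfrak{a}_j,
$$

then

$$
A=\mathfrak{a}_1+A^2=\mathfrak{a}_1+\Big(\mathfrak{a}_1+\bigcap_{j=2}^{m}\mathfrak{a}_j\Big)(\mathfrak{a}_1+\mathfrak{a}_{m+1}) \subset \mathfrak{a}_1+\bigcap_{j=2}^{m+1}\mathfrak{a}_j.
$$

By induction we have $\mathfrak{a}_1+\bigcap_{j=2}^{n}\mathfrak{a}_j = A$ as desired. The rest follows from Theorem 2. $\square$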

Finally we offer an interesting version of the Chinese remainder theorem, involving Dedekind domains.

Theorem 4 (Chinese Remainder Theorem). Let $\mathfrak{a}_1,\dots,\mathfrak{a}_n$ be ideals and let $x_1,\dots,x_n$ be elements in a Dedekind domain $A$. Then the system of congruences $x\equiv x_i \pmod{\mathfrak{a}_i}$ ($1 \le i \le n$) has a solution $x$ in $A$ if and only if $x_i \equiv x_j \pmod{\mathfrak{a}_i+\mathfrak{a}_j}$ whenever $i \ne j$.

*Proof.* Define $\phi:A \to \bigoplus_{i=1}^{n}A/\mathfrak{a}_i$ by $x \mapsto (x+\mathfrak{a}_1,\dots,x+\mathfrak{a}_n)$ and $\psi: \bigoplus_{i=1}^{n}A/\mathfrak{a}_i \to \bigoplus_{i<j}A/(\mathfrak{a}_i+\mathfrak{a}_j)$ such that the $(i,j)$-component of $\psi(x_1+\mathfrak{a}_1,\dots,x_n+\mathfrak{a}_n)$ is $x_i-x_j+\mathfrak{a}_i+\mathfrak{a}_j$, then the statement is equivalent to saying that the sequence of $A$-modules

$$
A \xrightarrow{\phi} \bigoplus_{i=1}^{n}A/\mathfrak{a}_i \xrightarrow{\psi} \bigoplus_{i<j}A/(\mathfrak{a}_i+\mathfrak{a}_j)
$$

is exact. It is clear that $\operatorname{im}\phi \subset \ker \psi$. We need to show the converse. Since exactness is a local property, we can assume that $A$ is a discrete valuation ring, meaning that there is an element $t \in A$ such that every nonzero ideal is of the form $(t^k)$. Therefore we can rearrange the $\mathfrak{a}_i$ so that $\mathfrak{a}_1 = (t^{k_1}) \supset \mathfrak{a}_2=(t^{k_2}) \supset \cdots \supset \mathfrak{a}_n=(t^{k_n})$, that is, $k_1 \le k_2 \le \dots \le k_n$. In this case, we have $\mathfrak{a}_i+\mathfrak{a}_j=\mathfrak{a}_i$ whenever $i<j$.

Now pick $(x_1+\mathfrak{a}_1,\dots,x_n+\mathfrak{a}_n)\in \ker\psi$, then $x_i-x_j \in \mathfrak{a}_i+\mathfrak{a}_j=\mathfrak{a}_i$ for $i<j$. Therefore $x_i \equiv x_j \pmod{\mathfrak{a}_i}$. In particular, taking $j=n$, we see

$$
\phi(x_n)=(x_n+\mathfrak{a}_1,\dots,x_n+\mathfrak{a}_n)=(x_1+\mathfrak{a}_1,\dots,x_n+\mathfrak{a}_n),
$$

as is wanted. $\square$

If we replace $A$ with $\mathbb{Z}$ and the $\mathfrak{a}_i$ with ideals generated by coprime numbers, then we reach the classic version of the Chinese remainder theorem.
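For $A=\mathbb{Z}$ the moduli need not be coprime at all: the compatibility condition of Theorem 4 reads $x_i \equiv x_j \pmod{\gcd(m_i,m_j)}$. The following sketch (with a helper name `crt_general` of our own choosing) merges the congruences one by one and reports failure when the system has no solution.

```python
from math import gcd

def crt_general(residues, moduli):
    """Solve x ≡ r_i (mod m_i) for arbitrary positive moduli.

    Returns (x, lcm of the moduli), or None when the system has no solution.
    Hypothetical helper for illustration.
    """
    x, m = 0, 1
    for r, n in zip(residues, moduli):
        g = gcd(m, n)
        if (r - x) % g != 0:
            return None  # incompatible with the congruences merged so far
        # Solve x + m*t ≡ r (mod n), i.e. (m/g)*t ≡ (r - x)/g (mod n/g).
        step = n // g
        inv = pow(m // g, -1, step) if step > 1 else 0
        t = ((r - x) // g * inv) % step
        x, m = x + m * t, m // g * n
    return x, m

print(crt_general([2, 4], [4, 6]))  # compatible: 2 ≡ 4 (mod 2)
print(crt_general([1, 2], [4, 6]))  # incompatible: 1 and 2 differ modulo 2
```

By the theorem, the step-by-step check succeeds exactly when all pairwise conditions hold.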

For the non-commutative case, the reader can see this post.

For another example, we consider Lagrange interpolation, which is the special case of the Chinese remainder theorem on the ring $\mathbb{R}[X]$ with extra consideration of evaluations. As you may guess, $\mathbb{R}$ can be replaced with other fields.

Find a polynomial $f(X) \in \mathbb{R}[X]$ passing through the three points $(1,2)$, $(2,-1)$ and $(3,2)$.

Consider the ideals $\mathfrak{a}_1=(X-1)$, $\mathfrak{a}_2=(X-2)$ and $\mathfrak{a}_3=(X-3)$, which are pairwise coprime. Then, for example, $f(X) \equiv f_1(X) \pmod{\mathfrak{a}_1}$, where $f_1$ is any real polynomial that takes the value $2$ at the point $1$. The Chinese remainder theorem tells us that such an $f$ exists. This approach may look like overkill, but it allows us to view a theorem in numerical analysis in an algebraic way.
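To make the example concrete, here is a minimal sketch of Lagrange interpolation with exact rational arithmetic. It builds, for each node, the polynomial that is $1$ at that node and $0$ at the others (the analogue of the element used to prove surjectivity) and combines them. The function name `lagrange_coefficients` is ours.

```python
from fractions import Fraction

def lagrange_coefficients(points):
    """Coefficients [c0, c1, ...] (lowest degree first) of the interpolating polynomial."""
    n = len(points)
    coeffs = [Fraction(0)] * n
    for i, (xi, yi) in enumerate(points):
        # Numerator of the basis polynomial: product of (X - xj) over j != i.
        basis = [Fraction(1)]
        denom = Fraction(1)
        for j, (xj, _) in enumerate(points):
            if j == i:
                continue
            basis = [Fraction(0)] + basis          # multiply by X
            for k in range(len(basis) - 1):
                basis[k] -= xj * basis[k + 1]      # subtract xj times the old polynomial
            denom *= xi - xj
        for k, c in enumerate(basis):
            coeffs[k] += yi * c / denom
    return coeffs

coeffs = lagrange_coefficients([(1, 2), (2, -1), (3, 2)])
print([str(c) for c in coeffs])
```

For the three points above this gives $f(X)=3X^2-12X+11$, which indeed satisfies $f(1)=2$, $f(2)=-1$ and $f(3)=2$.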

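We can also turn questions that are not about polynomial rings into questions about polynomial rings. For example, via the Chinese remainder theorem, we can compute $\mathbb{C} \otimes_\mathbb{R} \mathbb{C}$ in the following way:

$$
\mathbb{C} \otimes_\mathbb{R} \mathbb{C} \cong \mathbb{R}[X]/(X^2+1) \otimes_\mathbb{R} \mathbb{C} \cong \mathbb{C}[X]/(X^2+1) \cong \mathbb{C}[X]/(X-i) \times \mathbb{C}[X]/(X+i) \cong \mathbb{C} \times \mathbb{C}.
$$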

There is a proof of Hilbert’s theorem 90 (of cyclic extensions) where the isomorphism above is used, with degrees higher than $2$: https://mathoverflow.net/a/21117/172944

Our last remark shows the geometric interpretation of the Chinese remainder theorem. In geometry, we consider the spectrum of a unitary commutative ring, which gives rise to an affine scheme. If $A=\prod_{i=1}^{n}A_i$ is a direct product of such rings $A_i$, then $\operatorname{Spec}A \cong \coprod_{i=1}^{n}\operatorname{Spec}(A_i)$. Conversely, using the Chinese remainder theorem, we can show that if $\operatorname{Spec}A$ is a disjoint union of two spectra, then $A$ is a direct product of two other rings. To be precise:

Let $A$ be a unitary commutative ring, then the following statements are equivalent:

- $X=\operatorname{Spec}(A)$ is disconnected.
- $A=A_1 \times A_2$ where neither of the two rings is a zero ring.
- There is an element $e \ne 0,1$ such that $e^2=e$, i.e. an idempotent element.

In particular, since in a local ring any idempotent element is $0$ or $1$, we see the spectrum of it has to be connected.


In another post we gave an exposition of irreducible representations of $SO(3)$, where we find ourselves studying harmonic polynomials on a sphere. In this post, we study another category of representations of $SO(3)$ that has its own significance in physics: projective representations. The result will be written as direct sums of irreducible representations of $SU(2)$, so the reader is advised to review the corresponding post. We recall that

Every irreducible unitary representation of $SU(2)$ is of the form $V_n$, where $V_n$ is the space of homogeneous polynomials of degree $n$ in two complex variables:

$$
V_n=\{f \in \mathbb{C}[z_1,z_2]: f \text{ is homogeneous of degree } n\}.
$$

Representation theory has a billion applications in physics. The group $SO(3)$ acts as the group of orientation-preserving orthogonal symmetries in $\mathbb{R}^3$ in an obvious way. The invariance under this action justifies the principle that physical reactions such as those between elementary particles should not depend on the observer’s vantage point.

Nevertheless, applications of representation theory in physics do not end at finite dimensional vector spaces. Putting infinite dimensional vector spaces aside, we sometimes also need a class of vectors, in lieu of a single vector. For example, given a wavefunction $\psi$, we know $|\psi|^2$ has an interpretation as a probability density. But then for any $\lambda \in S^1$, we see $|\lambda\psi|^2=|\psi|^2$, therefore $\lambda\psi$ and $\psi$ should be equivalent in a sense. By considering these equivalence classes, we find ourselves considering the projective space. Hence it makes sense to consider projective representations

$$
G \to PGL(n,\mathbb{C})=GL(n,\mathbb{C})/\mathbb{C}^\ast
$$

where $G$ is compact. In this post we will assume $G=SO(3)$ and see how far we can go.

We begin with a simple group-theoretic lemma:

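Lemma 1. One has

$$
PGL(n,\mathbb{C})=GL(n,\mathbb{C})/\mathbb{C}^\ast \cong SL(n,\mathbb{C})/C_n,
$$

where $C_n$ is the group of $n$th roots of unity, embedded into $SL(n,\mathbb{C})$ via the map $\xi \mapsto \xi I$.

*Proof.* Consider the canonical map

$$
p:SL(n,\mathbb{C}) \to GL(n,\mathbb{C})/\mathbb{C}^\ast, \quad A \mapsto A\mathbb{C}^\ast.
$$

This map is surjective. For any $B\mathbb{C}^\ast \in GL(n,\mathbb{C})/\mathbb{C}^\ast$, pick $\lambda \in \mathbb{C}^\ast$ with $\lambda^n=|B|$. Then $B\mathbb{C}^\ast=\lambda^{-1}B\mathbb{C}^\ast$, and $\lambda^{-1}B \in SL(n,\mathbb{C})$ because $|\lambda^{-1}B|=\lambda^{-n}|B|=1$, so $\lambda^{-1}B$ is a preimage of $B\mathbb{C}^\ast$.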

On the other hand, we see $\ker p$ consists of scalar matrices in $SL(n,\mathbb{C})$. If $\lambda I \in SL(n,\mathbb{C})$, then $|\lambda I|=\lambda^n=1$, thereby $\ker p$ can be identified as $C_n$, proving the isomorphism. $\square$

Therefore, when studying a projective representation $G \to PGL(n,\mathbb{C})$, we are quickly reduced to the special linear group, which is much simpler. Besides, the group of $n$th roots of unity is much simpler than the group of nonzero complex numbers.

However, our simplification has not reached the end. We will see next that the special linear group can in turn be reduced to the special unitary group. Recall that a linear matrix representation of a compact Lie group is similar to a unitary one. The following lemma is a projective analogue.

Lemma 2. Let $G$ be a compact Lie group. Every homomorphism $\varphi:G \to PGL(n,\mathbb{C})=SL(n,\mathbb{C})/C_n$ is conjugate to a homomorphism whose image lies in $SU(n)/C_n$.

*Proof.* Consider the fibre product $H$ of $G$ and $SL(n,\mathbb{C})$ over $PGL(n,\mathbb{C})$:

$$
\begin{array}{ccc}
H & \xrightarrow{\ \tilde\varphi\ } & SL(n,\mathbb{C}) \\
{\scriptstyle \tilde{p}}\downarrow & & \downarrow{\scriptstyle p} \\
G & \xrightarrow{\ \varphi\ } & PGL(n,\mathbb{C})
\end{array}
$$

Here, $p$ is the canonical projection $SL(n,\mathbb{C}) \to SL(n,\mathbb{C})/C_n$. It suffices to show that $\tilde\varphi$ is similar to a unitary representation. Explicitly, one has

$$
H=\{(g,A) \in G \times SL(n,\mathbb{C}):\varphi(g)=p(A)\}
$$

with $\tilde\varphi:(g,A) \mapsto A$ and $\tilde{p}:(g,A) \mapsto g$. Since $G$ is compact and $\tilde{p}$ has finite kernel $C_n$, one sees that $H$ is a compact Lie group. Therefore the matrix representation $\tilde\varphi:H \to SL(n,\mathbb{C})$ is similar to a homomorphism $H \to SU(n)$, from which the lemma follows. $\square$

Therefore we are reduced to considering homomorphisms

$$
\varphi:SO(3) \to SU(n)/C_n
$$

for the sake of this post. But we are not done yet. Having to deal with a quotient group is still not satisfactory.

Since $SU(n)$ is simply connected (see this video), the projections $SU(n) \to SU(n)/C_n$ are universal coverings. In particular, when $n=2$, we see $SU(2) \to SU(2)/C_2 = SO(3)$ is our well-known universal covering. If we lift $\varphi$ to universal coverings, we see ourselves dealing with $SU(2) \to SU(n)$. To be precise, we have the following commutative diagram (universal cover is a functor):

$$
\begin{array}{ccc}
SU(2) & \xrightarrow{\ \tilde\varphi\ } & SU(n) \\
{\scriptstyle p}\downarrow & & \downarrow{\scriptstyle p_n} \\
SO(3) & \xrightarrow{\ \varphi\ } & SU(n)/C_n
\end{array}
$$

Dealing with $\tilde\varphi$ is much simpler. Physicists are more interested in unitary representations of the group of unit quaternions $SU(2) = \operatorname{Spin}(3)$ than in those of $SO(3)$, even though the latter looks more natural.

Now we are interested in finding all unitary representations that can be pushed down to a projective representation of $SO(3)$. We have two questions:

Question 1. Does it suffice to consider maps of the form $\tilde\varphi:SU(2)\to SU(n)$?

The answer is yes. Notice that every homomorphism $f:SU(2) \to U(1)$ has to be trivial. If not, then $\ker f$ is a proper normal subgroup of $SU(2)$, i.e. it is either trivial or $C_2$. Then $SU(2)$ or $SU(2)/C_2 \cong SO(3)$ embeds into the abelian group $U(1)$, which is impossible because neither of them is abelian. A contradiction.

Also recall the exact sequence

$$
1 \to SU(n) \to U(n) \xrightarrow{\det} U(1) \to 1.
$$

Let $g:SU(2) \to U(n)$ be any homomorphism, and consider the canonical projection $\pi:U(n) \to \frac{U(n)}{SU(n)}=U(1)$. We see $\pi \circ g$ sends every element of $SU(2)$ to $1$, meaning the image of $SU(2)$ in $U(n)$ must be in $SU(n)$. Therefore, by considering maps of the form $SU(2) \to SU(n)$, we are not missing anything. $\square$

Question 2. What should be considered in order to determine whether $\tilde\varphi:SU(2) \to SU(n)$ can be pushed down into a morphism $\varphi:SO(3) \to SU(n)/C_n$?

The answer is, one should consider the element $-I$. Let $p:SU(2) \to SO(3)$ be the universal covering, and let $p_n:SU(n) \to SU(n)/C_n$ be the corresponding universal covering. For $\tilde\varphi:SU(2) \to SU(n)$, we want to know when there will be a homomorphism $\varphi:SO(3) \to SU(n)/C_n$ such that $p_n \circ \tilde\varphi = \varphi \circ p$.

Notice that $p(-I)=I$, therefore, should $\varphi$ exist, one has $p_n \circ \tilde\varphi(-I)=e$, the identity in the group $SU(n)/C_n$, because one should have $\varphi(I)=e$. Hence $\tilde\varphi(-I) \in \ker p_n$. Therefore $\tilde\varphi(-I)$ can be identified with an $n$th root of unity. Since $\tilde\varphi(-I)\tilde\varphi(-I)=\tilde\varphi(I)$, we see $\tilde\varphi(-I)$ should also be identified with a square root of $1$. That is, $\tilde\varphi(-I)$ is either $\operatorname{id}$ or $-\operatorname{id}$. We discuss these two cases in the following question.

On the other hand, if $\tilde\varphi(-I)=\pm\operatorname{id}$, then one can verify that $p_n \circ \tilde\varphi \circ p^{-1}$ can be well-defined. Therefore $\tilde\varphi$ can be pushed down into a morphism of $SO(3)$ if and only if $\tilde\varphi(-I)=\pm\operatorname{id}$. $\square$

Question 3. Let $W=\bigoplus_n k_n V_n$ be a representation of $SU(2)$. What will happen if it can be pushed down to a projective representation of $SO(3)$?

Let $\tilde\varphi:SU(2) \to SU(n)$ be the homomorphism corresponding to $W$. We have seen that $\tilde\varphi$ can be pushed down to $SO(3)$ if and only if $\tilde\varphi(-I)=\pm\operatorname{id}$.

Since $V_n$ consists of homogeneous polynomials of degree $n$, the element $-I$ acts on $V_n$ as multiplication by $(-1)^n$. If $\tilde\varphi(-I)=\operatorname{id}$, then all the $n$ with $k_n \ne 0$ have to be even, because the action on the polynomials cannot be the identity when $n$ is odd. If $\tilde\varphi(-I)=-\operatorname{id}$, then all the $n$ with $k_n \ne 0$ have to be odd, because when $n$ is even the action of $-I$ on the polynomials must be the identity.

To be more explicit, $W=\bigoplus_n k_{2n}V_{2n}$ or $W=\bigoplus_{n}k_{2n+1}V_{2n+1}$.

Theorem 1. The projective representations of $SO(3)$ are given, up to conjugation, by the representations of $SU(2)$ of the form

$$
\bigoplus_n k_{2n}V_{2n} \quad \text{or} \quad \bigoplus_{n}k_{2n+1}V_{2n+1},
$$

depending on whether $-I$ acts by $\operatorname{id}$ or $-\operatorname{id}$.

In brief, when thinking about projective representations of $SO(3)$, one thinks about polynomials in two variables whose terms are either all even or all odd.

When studying $\tilde\varphi:SU(2) \to SU(n)$, we see $\tilde\varphi(-I)$ can be identified both with an $n$th root of unity and with a square root of unity. When $n$ is odd however, $-1$ is not an $n$th root of unity, so $\tilde\varphi(-I)$ cannot be identified with $-1$, i.e. $-I$ cannot act as $-\operatorname{id}$. Unexpectedly, number theory plays a small role here.

Historically, thanks to Gauss, the quadratic reciprocity law marked the beginning of algebraic number theory. Therefore it deserves a good dose of attention. However, throwing the definition at a beginner would not work very well.

We consider the equation

$$
x^2+by=a,
$$

one of the simplest non-trivial multi-variable Diophantine equations that can be imagined. Searching for all solutions by brute force without any preparation is not wise. Therefore we consider reductions first. In order that $x^2+by=a$ has a solution, it is necessary that

$$
x^2 \equiv a \pmod b
$$

has a solution.

Then the Chinese remainder theorem inspires us to first look into the case when $b$ is a prime. The case when $b=2$ is excluded because there we can only study whether $x$ is odd or even.

Therefore we study the equation $x^2=a$ in the finite field of order $p$ where $p \ne 2$. We give a very straightforward characterisation, which may look naive. For $a \in \mathbf{F}_p^\ast$, define the Legendre symbol

$$
\left(\frac{a}{p}\right)=\begin{cases} 1 & \text{if } x^2=a \text{ has a solution in } \mathbf{F}_p, \\ -1 & \text{otherwise.} \end{cases}
$$

It is also convenient to define $\left(\frac{0}{p}\right)=0$.

This post will start with an equivalent form that is easier to compute (although less intuitive). Then we will demonstrate how to do basic computations with it, and finally we try to look at it from the point of view of algebraic number theory.

We begin with a simplified formula for the Legendre symbol.

Proposition 1. $\left(\frac{a}{p}\right) = a^\frac{p-1}{2}$ for $a \in \mathbf{F}_p^\ast$.

N.B. The power on the right hand side is taken in the corresponding finite field. For example, $\left(\frac{2}{3}\right)=2=-1$ in $\mathbf{F}_3$. By abuse of language, we identify the integers $1$ and $-1$ with their canonical images in the finite field.

*Proof.* Notice that $\left(\frac{a}{p}\right)=1$ if and only if $a \in \mathbf{F}_p^{\ast 2}$. The rest comes from the following lemma, which deserves to be stated separately in greater generality. $\square$

Lemma 1. Let $p$ be a prime (it can be $2$ this time) and $K$ a finite field of order $q=p^n$ for some $n>0$. Then

- If $p=2$, then all elements of $K$ are squares.
- If $p \ne 2$, then the squares $K^{\ast 2}$ of $K^\ast$ form a subgroup of index $2$ in $K^\ast$; it is the kernel of the map $p:x \mapsto x^{(q-1)/2}$ from $K^\ast $ to $\{-1,1\}$.

To be precise, in the second case one has an exact sequence of cyclic groups:

$$
1 \to K^{\ast 2} \to K^\ast \xrightarrow{p} \{-1,1\} \to 1.
$$

*Proof.* The first case is a restatement of the condition that the Frobenius endomorphism is an automorphism (see nlab). For the second case, let $\overline{K}$ be an algebraic closure of $K$. If $x \in K^\ast$, let $y \in \overline{K}$ be a square root of $x$, i.e. such that $y^2=x$. We have

$$
y^{q-1}=x^{(q-1)/2}=\pm 1 \quad \text{because} \quad x^{q-1}=1.
$$

Since $x \in K^{\ast 2}$ if and only if $y \in K^\ast$, which is equivalent to $y^{q-1}=p(x)=1$, one has $\ker p = K^{\ast 2}$. The rest follows from elementary calculation. $\square$

You need to recall or study basic structures of finite fields. For example, a finite field is always of prime power order. All finite fields of order $p^n$ are isomorphic, uniquely determined as a subfield of an algebraic closure of $\mathbf{F}_p$, being the splitting field of the polynomial $X^{p^n}-X$. Besides, the multiplicative group of a finite field is cyclic.

From proposition 1 it follows that

Corollary 1. For any prime number $p \ne 2$,

- The Legendre symbol is multiplicative, i.e. $\left(\frac{ab}{p}\right)=\left(\frac{a}{p}\right)\left(\frac{b}{p}\right)$.
- $\left(\frac{1}{p}\right)=1$
- $\left(\frac{-1}{p}\right)=(-1)^{\varepsilon(p)}$ where $\varepsilon(p)=\frac{p-1}{2} \pmod{2}$.
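Proposition 1 gives a practical way to evaluate the Legendre symbol. The sketch below computes $a^{(p-1)/2}$ in $\mathbf{F}_p$ with Python's built-in modular exponentiation and compares it with a brute-force search for square roots. The function names are ours, and $p=13$ is just a sample prime.

```python
def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, computed as a^((p-1)/2) in F_p."""
    value = pow(a % p, (p - 1) // 2, p)
    return -1 if value == p - 1 else value  # value is 0, 1 or p - 1

def is_square_mod(a, p):
    """Brute-force test: does x^2 = a have a solution in F_p?"""
    return any((x * x - a) % p == 0 for x in range(p))

p = 13
for a in range(1, p):
    # The formula agrees with the definition ...
    assert (legendre(a, p) == 1) == is_square_mod(a, p)
    for b in range(1, p):
        # ... and the symbol is multiplicative.
        assert legendre(a * b, p) == legendre(a, p) * legendre(b, p)
assert legendre(-1, p) == (-1) ** ((p - 1) // 2 % 2)
```

The same function returns $0$ when $p$ divides $a$, matching the convention $\left(\frac{0}{p}\right)=0$.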

The harder thing to compute is the Legendre symbol when $a=2$.

Proposition 2. One has $\left(\frac{2}{p}\right)=(-1)^{\omega(p)}$ where $\omega(p)=\frac{p^2-1}{8}\pmod{2}$.

We want to find a square root of $2$, i.e. an element $y$ satisfying $y^2=2$ so that computing $2^{(p-1)/2}$ becomes computing $y^{p-1}$. This is not an easy job, and we do not expect to find it inside the field. For example, $\left(\frac{2}{3}\right)=2=-1$ and $\left(\frac{2}{5}\right)=4=-1$, meaning there is no such $y$ in $\mathbf{F}_3$ or $\mathbf{F}_5$. However, there is an easy way to generate a $2$. Consider $y=\alpha+\alpha^{-1}$, then $y^2=2+\alpha^2+\alpha^{-2}$. If we have $\alpha^2+\alpha^{-2}=0$ then we are done. To find such an $\alpha$, notice that $\alpha^2+\alpha^{-2}=0$ implies that $\alpha^4+1=0$. Therefore $\alpha^8=1$. It suffices to use a primitive $8$th root of unity.

*Proof.* Let $\alpha$ be a primitive $8$th root of unity in an algebraic closure $\Omega$ of $\mathbf{F}_p$. Then $y=\alpha+\alpha^{-1}$ verifies $y^2=2$. Since $\Omega$ has characteristic $p$, we have

$$
y^p=\alpha^p+\alpha^{-p}.
$$

Observe that if $p \equiv 1 \pmod{8}$, then $y^p=\alpha+\alpha^{-1}=y$ (we used the fact that $\alpha$ is an $8$th root of unity). Therefore $y^{p-1}=\left(\frac{2}{p}\right)=1$. This inspires us to determine $y^{p-1}$ through the relation between $p$ and $8$. As $p$ is odd, there are four possibilities: $p\equiv 1,3,5,7 \pmod{8}$.

If $p \equiv 7 \pmod{8}$, i.e. $p \equiv -1 \pmod{8}$, we still have $y^p=\alpha^{-1}+\alpha=y$. Therefore $\left(\frac{2}{p}\right)=1$ whenever $p \equiv \pm 1 \pmod{8}$. This discovery inspires us to study $p \equiv \pm 5 \pmod{8}$ together. When this is the case, one finds $y^p=\alpha^5+\alpha^{-5}$. Since $\alpha^4=\alpha^{-4}=-1$ (the primitivity of $\alpha$ matters here), $y^p$ becomes $-(\alpha+\alpha^{-1})=-y$. Cancelling $y$ on both sides, we obtain $y^{p-1}=\left(\frac{2}{p}\right)=-1$. To conclude,

$$
\left(\frac{2}{p}\right)=\begin{cases} 1 & \text{if } p \equiv \pm 1 \pmod 8, \\ -1 & \text{if } p \equiv \pm 5 \pmod 8. \end{cases}
$$

It remains to justify the $\omega$ function as above. We need to find a function $\omega(p)$ such that $\omega(p) \equiv 0 \pmod 2$ when $p \equiv \pm 1 \pmod 8$ and $\omega(p) \equiv 1 \pmod 2$ when $p \equiv \pm 5 \pmod 8$. If we square $p$, we can ignore the difference of the signs:

$$
p=8k\pm 1 \implies \frac{p^2-1}{8}=8k^2\pm 2k, \qquad p=8k\pm 5 \implies \frac{p^2-1}{8}=8k^2\pm 10k+3.
$$

Therefore, whether $(p^2-1)/8$ is odd or even is completely determined by the remainder of $p$ modulo $8$. We therefore put $\omega(p)=(p^2-1)/8$ and this concludes our proof. $\square$

To conclude in simpler form, we have

- $1$ is always a square in a finite field.
- $-1$ is a square in $\mathbf{F}_p$ if and only if $\frac{p-1}{2}$ is even, i.e., $p \equiv 1 \pmod{4}$.
- $2$ is a square in $\mathbf{F}_p$ if and only if $\frac{p^2-1}{8}$ is even, i.e. $p \equiv \pm 1 \pmod{8}$.
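These rules are easy to test numerically. The sketch below checks the statements about $-1$ and $2$ for all odd primes below $200$, using Euler's criterion for the Legendre symbol. The bound $200$ and the helper names are arbitrary choices of ours.

```python
def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p via a^((p-1)/2) mod p."""
    value = pow(a % p, (p - 1) // 2, p)
    return -1 if value == p - 1 else value

def odd_primes(limit):
    """Odd primes below `limit`, by trial division (fine for a small check)."""
    return [n for n in range(3, limit, 2)
            if all(n % d for d in range(3, int(n ** 0.5) + 1, 2))]

for p in odd_primes(200):
    assert (legendre(-1, p) == 1) == (p % 4 == 1)        # rule for -1
    assert (legendre(2, p) == 1) == (p % 8 in (1, 7))    # rule for 2
    assert legendre(2, p) == (-1) ** ((p * p - 1) // 8)  # Proposition 2
```

A finite check is of course no proof, but it is a cheap way to catch a sign error in the formulas.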

Nevertheless, you do not want to compute $\left(\frac{37}{53}\right)$ by hand in the basic way as above. However, granted the following law, things are much easier.

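Proposition 3 (Gauss's Quadratic Reciprocity Law). For two distinct odd prime numbers $p$ and $\ell$, the following identity holds:

$$
\left(\frac{\ell}{p}\right)\left(\frac{p}{\ell}\right)=(-1)^{\varepsilon(\ell)\varepsilon(p)}.
$$

Alternatively,

$$
\left(\frac{\ell}{p}\right)=(-1)^{\varepsilon(\ell)\varepsilon(p)}\left(\frac{p}{\ell}\right).
$$

Instead of computing $37^{(53-1)/2}$ modulo $53$, we obtain $\left(\frac{37}{53}\right)$ in a much easier way: since $37 \equiv 1 \pmod 4$, we have $\varepsilon(37)=0$ and

$$
\left(\frac{37}{53}\right)=\left(\frac{53}{37}\right)=\left(\frac{16}{37}\right)=\left(\frac{4}{37}\right)^2=1.
$$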

In other words, there exist solutions of the equation $x^2+53y=37$.
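The reduction above can be automated. The following sketch evaluates the symbol by repeatedly flipping it with the reciprocity law and pulling out factors of $2$ with Proposition 2. Strictly speaking it computes the Jacobi symbol, a standard extension of the Legendre symbol to odd moduli that is not discussed in this post; for a prime modulus the two agree. The function name `jacobi` is ours.

```python
def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0; equals the Legendre symbol when n is prime."""
    a %= n
    sign = 1
    while a != 0:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):   # (2/n) = -1 exactly when n ≡ ±3 (mod 8)
                sign = -sign
        a, n = n, a               # reciprocity: swap the two entries ...
        if a % 4 == 3 and n % 4 == 3:
            sign = -sign          # ... changing sign when both are ≡ 3 (mod 4)
        a %= n
    return sign if n == 1 else 0

print(jacobi(37, 53))
# An explicit solution of x^2 + 53*y = 37, found by brute force over residues mod 53.
x = next(x for x in range(53) if (x * x - 37) % 53 == 0)
y = (37 - x * x) // 53
print(x, y, x * x + 53 * y)
```

The brute-force search finds $x=14$, $y=-3$, and indeed $14^2-3\cdot 53=37$.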

The proof is carried out by means of a Gauss sum. The proof looks contrived, but one can see a lot of important tricks. We will use corollary 1.1 frequently.

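*Proof.* Again, let $\Omega$ be an algebraic closure of $\mathbf{F}_p$, and let $\omega \in \Omega$ be a primitive $\ell$-th root of unity. If $x \in \mathbf{F}_\ell$, then $\omega^x$ is well-defined. Thus it is legitimate to write the “Gauss sum”:

$$
y=\sum_{x \in \mathbf{F}_\ell}\left(\frac{x}{\ell}\right)\omega^x.
$$

Following the inspiration of what we have done in proposition 2, we study $y^2$ and $y^{p-1}$ again. The second one is quick.

Claim 1. $y^{p-1}=\left(\frac{p}{\ell}\right)$.

To show claim $1$, we notice that, as $\Omega$ is of characteristic $p$, we have

$$
y^p=\sum_{x \in \mathbf{F}_\ell}\left(\frac{x}{\ell}\right)\omega^{xp}=\sum_{z \in \mathbf{F}_\ell}\left(\frac{zp^{-1}}{\ell}\right)\omega^{z}=\left(\frac{p^{-1}}{\ell}\right)y=\left(\frac{p}{\ell}\right)y
$$

and therefore, since $y \ne 0$ by claim 2 below,

$$
y^{p-1}=\left(\frac{p}{\ell}\right).
$$

Claim 2. $y^2 = \left(\frac{-1}{\ell}\right)\ell$ (by abuse of language, $\ell$ (the one outside the Legendre symbol) is used to denote the image of $\ell$ in the field $\mathbf{F}_p$.)

Notice that

$$
y^2=\sum_{x,z \in \mathbf{F}_\ell}\left(\frac{xz}{\ell}\right)\omega^{x+z}=\sum_{u \in \mathbf{F}_\ell}\omega^u\sum_{t \in \mathbf{F}_\ell}\left(\frac{t(u-t)}{\ell}\right).
$$

Terms where $t=0$ are ignored safely. Then we notice that, for $t \ne 0$,

$$
\left(\frac{t(u-t)}{\ell}\right)=\left(\frac{-t^2}{\ell}\right)\left(\frac{1-ut^{-1}}{\ell}\right)=\left(\frac{-1}{\ell}\right)\left(\frac{1-ut^{-1}}{\ell}\right).
$$

For this reason we put

$$
C_u=\sum_{t \in \mathbf{F}_\ell^\ast}\left(\frac{1-ut^{-1}}{\ell}\right).
$$

It follows that

$$
y^2=\left(\frac{-1}{\ell}\right)\sum_{u \in \mathbf{F}_\ell}C_u\omega^u.
$$

It remains to compute the coefficients $C_u$. We see

$$
C_0=\sum_{t \in \mathbf{F}_\ell^\ast}\left(\frac{1}{\ell}\right)=\ell-1.
$$

When $u \ne 0$, the term $s=1-ut^{-1}$ runs over all of $\mathbf{F}_\ell$ except $1$. Therefore

$$
C_u=\sum_{s \in \mathbf{F}_\ell}\left(\frac{s}{\ell}\right)-\left(\frac{1}{\ell}\right)=0-1=-1
$$

since $[\mathbf{F}_\ell^\ast:\mathbf{F}_\ell^{\ast 2}]=2$ (read: exactly half of the elements of $\mathbf{F}_\ell^\ast$ are squares, the rest are not). Therefore

$$
\sum_{u \in \mathbf{F}_\ell}C_u\omega^u=\ell-1-\sum_{u \ne 0}\omega^u=\ell.
$$

Recall that $1-\omega^\ell=(1-\omega)(1+\omega+\dots+\omega^{\ell-1})=0$. As $\omega$ is a primitive root, we see $\omega \ne 1$ and therefore $1+\omega+\dots+\omega^{\ell-1}=0$. The result follows.

Finally, the reciprocity follows because

$$
\left(\frac{p}{\ell}\right)=y^{p-1}=(y^2)^{\frac{p-1}{2}}=\left(\frac{y^2}{p}\right)=\left(\frac{\left(\frac{-1}{\ell}\right)\ell}{p}\right).
$$

We invite the reader to expand the identity above using corollary 1 and see the result. $\square$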

In this section we introduce some observations from the point of view of algebraic number theory without complete proofs.

Let $p$ be an odd prime, and $\zeta_p$ a primitive $p$-th root of unity. We have seen that the Gauss sum

$$
\tau=\sum_{x \in \mathbf{F}_p}\left(\frac{x}{p}\right)\zeta_p^x
$$

satisfies the relation

$$
\tau^2=\left(\frac{-1}{p}\right)p.
$$

Therefore the field $\mathbb{Q}(\sqrt{p})$ is contained in $\mathbb{Q}(\zeta_p)$ or $\mathbb{Q}(\zeta_p,i)$, depending on the sign of $\left(\frac{-1}{p}\right)$. The first one is a cyclotomic extension of $\mathbb{Q}$ by definition. The second one equals $\mathbb{Q}(\zeta_{4p})$ because $p$ is odd, so it is cyclotomic as well. More generally, every finite abelian extension of $\mathbb{Q}$ is a subfield of a cyclotomic field. See this note. To conclude,

Every field of the form $\mathbb{Q}(\sqrt{p})$, with $p$ an odd prime, is a subfield of $\mathbb{Q}(\zeta_m)$ for some $m>1$.
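The relation between the Gauss sum and $\sqrt{\pm p}$ can be checked numerically over $\mathbb{C}$: the square of $\sum_{x}\left(\frac{x}{p}\right)\zeta_p^x$ should equal $\left(\frac{-1}{p}\right)p$. The sketch below does this in floating point with $\zeta_p=e^{2\pi i/p}$. The function names and the tolerance `1e-9` are our own choices.

```python
import cmath

def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p via a^((p-1)/2) mod p."""
    value = pow(a % p, (p - 1) // 2, p)
    return -1 if value == p - 1 else value

def gauss_sum(p):
    """Complex Gauss sum: sum over x in F_p of (x/p) * zeta^x, zeta = exp(2*pi*i/p)."""
    zeta = cmath.exp(2j * cmath.pi / p)
    return sum(legendre(x, p) * zeta ** x for x in range(p))

for p in (3, 5, 7, 11, 13):
    tau = gauss_sum(p)
    expected = legendre(-1, p) * p
    # tau^2 is p when p ≡ 1 (mod 4) and -p when p ≡ 3 (mod 4), up to rounding.
    assert abs(tau * tau - expected) < 1e-9
```

In particular the sum is real when $p \equiv 1 \pmod 4$ and purely imaginary when $p \equiv 3 \pmod 4$, which is where the extra $i$ comes from.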

Solving the equation $x^2 \equiv a \pmod p$ also inspires us to look at the quadratic field $K=\mathbb{Q}(\sqrt{a})$. For simplicity we assume that $a$ is square free. If $\left(\frac{a}{p}\right)=1$, then there exists $\alpha \in \mathbb{Z}$ such that

$$
x^2-a \equiv (x-\alpha)(x+\alpha) \pmod p.
$$

This equation is interesting because on the left hand side we actually have the minimal polynomial of $\sqrt{a}$ over $\mathbb{Q}$, namely $x^2-a$. It splits completely modulo $p$. The relation above actually signifies that there exist prime ideals $\mathfrak{P}_1,\mathfrak{P}_2\subset \mathfrak{o}_K$ such that

$$
p\mathfrak{o}_K=\mathfrak{P}_1^{e_1}\mathfrak{P}_2^{e_2},
$$

where the **ramification indices** are $e_1=e_2=1$. This says the prime ideal $(p)$ is **totally split** in $\mathbb{Q}(\sqrt{a})$. Conversely, if $(p)$ is totally split in $\mathbb{Q}(\sqrt{a})$ (where $(a,p)=1$ for sure), then $\left(\frac{a}{p}\right)=1$. To conclude,

The Legendre symbol $\left(\frac{a}{p}\right)=1$ if and only if $(p)$ totally splits in $\mathbb{Q}(\sqrt{a})$.

In fact, one can have a more profound observation of number fields which will imply the quadratic reciprocity law:

Let $\ell$ and $p$ be two distinct odd primes, $S_\ell=\left(\frac{-1}{\ell}\right)\ell$, then $(p)$ is totally split in $\mathbb{Q}(\sqrt{S_\ell})$ if and only if $(p)$ splits into an even number of prime ideals in $\mathbb{Q}(\zeta_\ell)$.

Besides, you may want to know about Artin's reciprocity law, which generalises Gauss's reciprocity, but that is a quite advanced topic (class field theory). This also shows the significance of the quadratic reciprocity law.


In analysis and probability theory, one studies various sorts of convergence (of random variables) for various reasons. In this post we study vague convergence, which underlies convergence in distribution.

Vaguely speaking, vague convergence is the weakest kind of convergence one can expect (whilst still caring about continuity whenever possible). We do not consider any dependence relation between the sequence of random variables.

Throughout, fix a probability space $(\Omega,\mathscr{F},\mathscr{P})$, where $\Omega$ is the sample space, $\mathscr{F}$ the event space and $\mathscr{P}$ the probability function. Let $(X_n)$ be a sequence of random variables on this space. Each random variable $X_n$ canonically induces a probability space $(\mathbb{R},\mathscr{B},\mu_n)$ where $\mathscr{B}$ is the Borel $\sigma$-algebra. To avoid notation hell we only consider the correspondence $X_n \leftrightarrow \mu_n$ where

$$
\mu_n(B)=\mathscr{P}(X_n \in B), \quad B \in \mathscr{B}.
$$

Here comes the question: if $X_n$ tends to a limit, then we would expect that $\mu_n$ converges to a limit (say $\mu$) in some sense (at least on some intervals). But is that always the case? Even if the sequence converges, can we even have $\mu(\mathbb{R})=1$? We will see through some examples that this is really not the case.

Let $X_n\equiv\frac{(-1)^n}{n}$, then $X_n \to 0$ deterministically. For any $a>0$, the sequence $\mu_n((0,a))$ eventually oscillates between $0$ and $1$, i.e. it ends up in the form

$$\dots,0,1,0,1,0,1,\dots$$

which does not converge at all. Likewise, for any $b<0$, the sequence $\mu_n((b,0))$ oscillates between $1$ and $0$.
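The oscillation can be made concrete with a few lines of code. This is only a sketch: `mu_n_open` is a hypothetical helper that evaluates the point mass $\mu_n$ of $X_n\equiv\frac{(-1)^n}{n}$ on an open interval, using exact rationals.

```python
from fractions import Fraction


def mu_n_open(n: int, lo: Fraction, hi: Fraction) -> int:
    """mu_n((lo, hi)) for the point mass sitting at X_n = (-1)^n / n."""
    x = Fraction((-1) ** n, n)
    return 1 if lo < x < hi else 0


a = Fraction(1, 2)
right = [mu_n_open(n, Fraction(0), a) for n in range(3, 11)]
left = [mu_n_open(n, -a, Fraction(0)) for n in range(3, 11)]
print(right)  # [0, 1, 0, 1, 0, 1, 0, 1]
print(left)   # [1, 0, 1, 0, 1, 0, 1, 0]
```

Neither sequence converges, although the random variables themselves converge to $0$.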

As another example of convergence failure, consider $b_n<0<a_n$ with $a_n \to 0$ and $b_n \to 0$ as $n \to \infty$, and let $X_n$ be a random variable uniformly distributed on $(b_n,a_n)$. We see $X_n \to 0$ a.e., but for fixed $b<0$ the value $\mu_n((b,0))$, which is the area under the density of $X_n$ between $\max\{b,b_n\}$ and $0$ (eventually $\frac{-b_n}{a_n-b_n}$), may not converge at all, or may converge to any number between $0$ and $1$.

We compose an example where $\mu_n$ converges to a measure $\mu$ where $\mu(\mathbb{R})<1$, preventing $\mu$ from being a probability measure. To do this, fix two positive numbers $\alpha$ and $\beta$ such that $\alpha+\beta<1$. Consider the sequence of random variables $X_n$ with

Then $X_n \to X$ where

Then the limiting measure $\mu$ satisfies $\mu(\mathbb{R})=1-\alpha-\beta<1$, even though $\mu_n(\mathbb{R})=1$ for every $n$. Atoms of $\mu_n$ have escaped to $+\infty$ and $-\infty$.
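To see the escape of mass numerically, here is a minimal sketch under an explicit assumption: we take $X_n$ to have atoms at $n$, $-n$ and $0$ with masses $\alpha$, $\beta$ and $1-\alpha-\beta$ respectively. Which of $\alpha$, $\beta$ sits at $+n$ is our own choice for illustration, and `mu_n_half_open` is a hypothetical helper.

```python
from fractions import Fraction


def mu_n_half_open(n, a, b, alpha, beta):
    """mu_n((a, b]) when X_n takes the values n, -n, 0 with
    probabilities alpha, beta, 1 - alpha - beta (assumed assignment)."""
    atoms = {n: alpha, -n: beta, 0: 1 - alpha - beta}
    return sum(w for x, w in atoms.items() if a < x <= b)


alpha, beta = Fraction(1, 4), Fraction(1, 3)
for n in (5, 10, 11, 100):
    # mass seen inside the fixed window (-10, 10]
    print(n, mu_n_half_open(n, -10, 10, alpha, beta))
```

Once $n$ exceeds the window, only the atom at $0$ remains visible, so the value stabilises at $1-\alpha-\beta=\frac{5}{12}$ for every bounded interval containing $0$.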

These examples inspire us to develop a weaker sense of convergence, where we only take intervals into account (because we would expect continuous functions to play a role).

From the examples above, it is clear that we cannot expect the total mass to equal $1$ all the time. Therefore we consider $\le 1$ instead, which leads to the following weakened versions of probability measure and distribution function.

**Definition 1.** A measure $\mu$ on $\mathbb{R}$ is a **subprobability measure** (s.p.m.) if $\mu(\mathbb{R}) \le 1$. Correspondingly, one defines the **subdistribution function** (s.d.f.) with respect to $\mu$ by

$$F(x)=\mu((-\infty,x]).$$

When $\mu(\mathbb{R})=1$, there is nothing new, but even if not, there are not many obstacles. We still see that $F(x)$ is a right continuous function with $F(-\infty)=0$ and $F(+\infty)=\mu(\mathbb{R}) \le 1$. For brevity’s sake, we will write $\mu((a,b])$ as $\mu(a,b]$ from now on, and similarly for other kinds of intervals. We also put $\mu(a,b)=0$ when $a>b$ because why not.

Our examples also warn us that atoms are a big deal, which leads us to the following definition concerning intervals.

**Definition 2.** With the notation above, an interval $(a,b)$ is called a **continuous interval** if neither $a$ nor $b$ is an atom of $\mu$, i.e. if $\mu(a,b)=\mu[a,b]$.

One can test if $(0,1)$ is a continuous interval in our first group of examples. Now we are ready for the definition of vague convergence.

**Definition 3.** A sequence $(\mu_n)$ of s.p.m.’s is said to **converge vaguely** to an s.p.m. $\mu$ if there exists a dense subset $D \subset \mathbb{R}$ such that

$$\mu_n(a,b] \to \mu(a,b] \quad \text{for all } a,b \in D \text{ with } a<b.$$

We write $\mu_n \xrightarrow{v} \mu$.

Let $(F_n)$ be the corresponding s.d.f. of $(\mu_n)$ and $F$ the s.d.f. of $\mu$. Then we say that $F_n$ converges vaguely to $F$ and write $F_n \xrightarrow{v} F$.

It is unfair that we are not building the infrastructure for random variables (r.v.) in this context. We introduce the following concept that you may have already studied in the calculus-based probability theory:

**Definition 4.** Let $(X_n)$ be a sequence of r.v.’s with corresponding cumulative distribution functions (c.d.f.) $(F_n)$. We say $X_n$ converges weakly or in distribution to $X$ (with corresponding c.d.f. $F$) if $F_n \xrightarrow{v} F$.

In calculus-based probability theory, one learns that $F_n(x) \to F(x)$ whenever $F$ is continuous at $x$. This definition is easier to understand but skips a lot of important details.

In this section we study vague convergence from the viewpoint of measure theory, utilising $\varepsilon-\delta$ arguments most of the time. We will see that this convergence looks quite similar to the convergence of sequences in $\mathbb{R}$.

Let $(a_n)$ be a sequence of real numbers, we can recall that

- If $(a_n)$ converges, then the limit is unique.
- If $(a_n)$ is bounded, then it has a convergent subsequence.
- If every subsequence of $(a_n)$ converges to $a$, then $a_n$ converges to $a$.

These results are natural in the context of calculus, but in the world of topology and functional analysis, these are not naturally expected. However, s.p.m.’s enjoy all three of them (for the second point, notice that an s.p.m. is bounded in a sense anyway.) Nevertheless, it would be too ambitious to include everything here and assume that the reader will finish it in one shot.

**Theorem 1.** Let $(\mu_n)$ and $\mu$ be s.p.m.’s. The following conditions are equivalent:

(1) $\mu_n \xrightarrow{v} \mu$.

(2) For every finite interval $(a,b)$ and $\varepsilon>0$, there exists an $n_0(a,b,\varepsilon)$ such that whenever $n \ge n_0$,

$$\mu(a+\varepsilon,b-\varepsilon)-\varepsilon \le \mu_n(a,b) \le \mu(a-\varepsilon,b+\varepsilon)+\varepsilon.$$

(3) For every continuous interval $(a,b]$ of $\mu$, we have

$$\mu_n(a,b] \to \mu(a,b].$$

When $(\mu_n)$ and $\mu$ are p.m.’s, the second condition is equivalent to the following uniform version:

(4) For every $\delta>0$ and $\varepsilon>0$, there exists $n_0(\delta,\varepsilon)$ such that if $n \ge n_0$, then for *every* interval $(a,b)$, possibly infinite:

$$\mu(a+\delta,b-\delta)-\varepsilon \le \mu_n(a,b) \le \mu(a-\delta,b+\delta)+\varepsilon.$$

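*Proof.* We first study the equivalence of the first three statements. Suppose $\mu_n$ converges vaguely to $\mu$. We are given a dense subset $D$ of the real line such that whenever $a,b \in D$ and $a<b$, one has $\mu_n(a,b] \to \mu(a,b]$. Given a finite interval $(a,b)$ and $\varepsilon>0$, there are $a_1,a_2,b_1,b_2 \in D$ satisfying

$$a-\varepsilon<a_1<a<a_2<a+\varepsilon, \quad b-\varepsilon<b_1<b<b_2<b+\varepsilon.$$

By vague convergence, there exists $n_0>0$ such that whenever $n \ge n_0$,

$$|\mu_n(a_i,b_j]-\mu(a_i,b_j]|<\varepsilon$$

for $i=1,2$ and $j=1,2$. It follows that

$$\mu_n(a,b) \ge \mu_n(a_2,b_1] \ge \mu(a_2,b_1]-\varepsilon \ge \mu(a+\varepsilon,b-\varepsilon)-\varepsilon$$

and on the other hand

$$\mu_n(a,b) \le \mu_n(a_1,b_2] \le \mu(a_1,b_2]+\varepsilon \le \mu(a-\varepsilon,b+\varepsilon)+\varepsilon.$$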

Combining both, the implication is clear.

Next, we assume (2), and let $(a,b)$ be a continuous interval of $\mu$, i.e we have $\mu(a,b)=\mu[a,b]$. The relation $\mu(a+\varepsilon,b-\varepsilon)-\varepsilon \le \mu_n(a,b)$ implies that

holds for all $\varepsilon>0$. On the other hand, as $\varepsilon \to 0$ on the left hand side, we see

Likewise, the relation $\mu_n(a,b) \le \mu(a-\varepsilon,b+\varepsilon)+\varepsilon$ yields

As $\varepsilon \to 0$ on the right hand side, we obtain

To conclude both sides, notice that

This forces $\mu_n(a,b)$ to converge to $\mu(a,b)$. This implies that $\mu_n(a,b] \to \mu(a,b]$. To see this, pick another continuous interval $(a,b')$ which properly contains $(a,b)$. Then $(b,b')$ is another continuous interval. It follows that

Assume (3). Notice that the set of atoms $A$ of $\mu$ has to be at most countable, therefore $D=\mathbb{R} \setminus A$ is dense in $\mathbb{R}$. On the other hand, $(a,b]$ is a continuous interval if and only if $a,b \in D$. This implies (1).

The arguments above also show that when discussing vague convergence, one can replace $(a,b]$ with $(a,b)$, $[a,b)$ or $[a,b]$ freely, as long as $(a,b)$ is a continuous interval. It also follows that $\mu_n(\{a\}) \to 0$ whenever $a$ is not an atom of $\mu$.

For (4), as (4) implies (2) (by taking $\delta=\varepsilon>0$), it remains to show that (3) implies (4) assuming that $\mu_n$ and $\mu$ are p.m.’s. Indeed, it suffices to prove it on a finite interval, and we will first justify this reduction. Let $A$ denote the set of atoms of $\mu$. First of all we can pick an integer $N>0$ such that $\mu(-N,N) > 1-\frac{\varepsilon}{4}$ (that is, the interval is so big that its measure is close enough to $1$). Pick $\alpha,\beta \in A^c$ such that $\alpha \le -N$ and $\beta \ge N$ (this is possible because $A^c$ is dense). For the interval $(\alpha,\beta)$, we can put a finite partition

$$\alpha=a_1<a_2<\dots<a_\ell=\beta$$

such that $|a_{j+1}-a_j| \le \delta$ and $a_j \in A^c$ for all $j=1,\dots,\ell-1$. Therefore, we have

By (3), there exists $n_0$ depending on $\varepsilon$ and $\ell$ (thereby $\delta$) such that

for all $n \ge n_0$. Adding over all $j$, replacing the endpoint with open interval, we see

It follows that

(This is where being p.m. matters.) Therefore when $n \ge n_0$ and discussing $\mu(a,b)$ versus $\mu_n(a,b)$, ignoring $(a,b) \setminus (a_1,a_\ell)$ results only in an error of $<\frac{\varepsilon}{2}$. Therefore it suffices to assume that $(a,b) \subset (a_1,a_\ell)$ and show that

Since $(a,b) \subset (a_1,a_\ell)$, there exists $j,k$ with $1 \le j \le k < \ell$ such that

This concludes the proof and demonstrates why our specific choice of $a_j$ is important. $\square$

We cannot give a treatment of all three points above but the first point, the unicity of vague limit, is now clear.

**Corollary 1 (Unicity of vague limit).** With the notation of definition 3, if there is another s.p.m. $\mu'$ and another dense set $D'$ such that whenever $a,b \in D'$ and $a<b$, one has $\mu_n(a,b] \to \mu'(a,b]$, then $\mu$ and $\mu'$ are identical.

*Proof.* Let $A$ be the set of atoms of $\mu$ and $\mu'$; then if $a,b \in A^c$, theorem 1 gives $\mu_n(a,b] \to \mu(a,b]$ and $\mu_n(a,b] \to \mu'(a,b]$. Therefore $\mu(a,b]=\mu'(a,b]$. Since $A^c$ is dense in $\mathbb{R}$, the two measures must be identical. $\square$

Let $G$ be a locally compact abelian group (for example, $\mathbb{R}$, $\mathbb{Z}$, $\mathbb{T}$, $\mathbb{Q}_p$). Then every irreducible unitary representation $\pi:G \to U(\mathcal{H}_\pi)$ is one dimensional, where $\mathcal{H}_\pi$ is a non-zero Hilbert space, which we may therefore take to be $\mathbb{C}$. It follows that $\pi(x)(z)=\xi(x)z$ for all $z \in \mathbb{C}$ where $\xi \in \operatorname{Hom}(G,\mathbb{T})$, viewing $\mathbb{T}$ as the unit circle in the complex plane. Such homomorphisms are called (unitary) **characters**; we denote the set of all characters of $G$ by $\widehat{G}$ and call it the Pontryagin dual group. This should ring a bell about the representation theory of finite groups. For convenience, instead of $\xi(x)$, we often write $\langle x,\xi \rangle$. We then have $\langle x,\xi\rangle\langle y,\xi \rangle=\langle x+y ,\xi\rangle$, and the following examples will remind the reader of the reason for this notation.

Some easily accessible examples are:

- $\widehat{\mathbb{R}} \cong \mathbb{R}$, with $\langle x,\xi \rangle = e^{2\pi i \xi x}$.
- $\widehat{\mathbb{T}} \cong \mathbb{Z}$, with $\langle z, n \rangle = z^n$.
- $\widehat{\mathbb{Z}} \cong \mathbb{T}$, with $\langle n,z \rangle = z^n$.
- $\widehat{\mathbb{Z}/k\mathbb{Z}} \cong \mathbb{Z}/k\mathbb{Z}$, with $\langle m,n\rangle =e^{2\pi i m n / k}$.
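The last example in the list can be checked by machine. The sketch below verifies, up to floating-point error, that $\langle m,n\rangle=e^{2\pi i mn/k}$ is multiplicative in $m$ and takes values on the unit circle; the helper name `pairing` is ours.

```python
import cmath


def pairing(m: int, n: int, k: int) -> complex:
    """<m, n> = exp(2*pi*i*m*n/k) on Z/kZ x Z/kZ (reduce mod k first)."""
    return cmath.exp(2j * cmath.pi * ((m * n) % k) / k)


k = 12
for n in range(k):
    for x in range(k):
        for y in range(k):
            lhs = pairing(x, n, k) * pairing(y, n, k)
            rhs = pairing((x + y) % k, n, k)
            assert abs(lhs - rhs) < 1e-9          # <x,n><y,n> = <x+y,n>
        assert abs(abs(pairing(x, n, k)) - 1) < 1e-12  # values lie on T
```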

But we want to show that

$$\widehat{\mathbb{Q}}_p \cong \mathbb{Q}_p.$$

It is broken down into several steps. But it shall be clear that $\mathbb{Q}_p$ is a topological group with respect to addition.

Every $p$-adic number $x \in \mathbb{Q}_p$ can be written in the form

$$x=\sum_{j=m}^{\infty}x_jp^j$$

where $m \in \mathbb{Z}$ and $x_j \in \{0,1,\dots,p-1\}$ for all $j$. We immediately define

$$\langle x,\xi_1 \rangle=\exp\left(2\pi i\sum_{j=m}^{\infty}x_jp^j\right)$$

and claim that $\xi_1$ is a character. Notice that the right hand side is always well-defined, because the summands with $j \ge 0$ contribute nothing, as $\exp(2\pi i x_jp^j)=1$. That is to say, the right hand side can be understood as a finite product: when $m \ge 0$, i.e. $x \in \mathbb{Z}_p$, the pairing $\langle x, \xi_1 \rangle = 1$; when $m<0$ however, $\langle x,\xi_1 \rangle = \exp\left( 2\pi i \sum_{j=m}^{-1}x_jp^j\right)$. Therefore it is legitimate to write

From this it follows immediately that

The function $\xi_1$ is continuous because it is constant on the open subgroup $\mathbb{Z}_p$, hence continuous at $0$, and a homomorphism that is continuous at $0$ is continuous everywhere. Therefore it is safe to say that $\xi_1$ is a character with kernel $\mathbb{Z}_p$.
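For experimentation, $\xi_1$ can be evaluated on the dense subring $\mathbb{Z}[1/p] \subset \mathbb{Q}_p$ of rationals whose denominator is a power of $p$. The sketch below relies on the observation that for such $x$ the sum $\sum_{j<0}x_jp^j$ equals the ordinary fractional part of $x$ (their difference lies in $\mathbb{Z}_p \cap \mathbb{Z}[1/p]=\mathbb{Z}$). The name `xi_1` and the restriction to $\mathbb{Z}[1/p]$ are assumptions made for this illustration only.

```python
import cmath
from fractions import Fraction


def xi_1(x: Fraction, p: int) -> complex:
    """xi_1 restricted to Z[1/p]: exp(2*pi*i*{x}), {x} the fractional part."""
    d = x.denominator
    while d % p == 0:
        d //= p
    if d != 1:
        raise ValueError("x must have a power of p as denominator")
    frac = x - (x.numerator // x.denominator)   # fractional part in [0, 1)
    return cmath.exp(2j * cmath.pi * float(frac))


p = 3
x, y = Fraction(7, 9), Fraction(-5, 27)
# homomorphism property, and triviality on the integers (inside Z_p)
assert abs(xi_1(x + y, p) - xi_1(x, p) * xi_1(y, p)) < 1e-12
assert abs(xi_1(Fraction(5), p) - 1) < 1e-12
```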

A quick thought would be, generating all characters out of $\xi_1$, something like $\xi_p$, $\xi_{1+p+p^2+\dots}$. But that might lead to a nightmare of subscripts. Instead, we try to discover as many as possible. For any $y \in \mathbb{Q}_p$, we define

In other words, $\xi_y$ is defined by $x \mapsto \langle xy,\xi_1\rangle$. Since multiplication is continuous, we see immediately that $\xi_y$ is a character, not much more complicated than $\xi_1$. We will show that this is all we need. To do this, we need to *characterise* all characters. Characters have the same image but their kernels differ. That is where we attack the problem.

For $\xi_y$ above, notice that $\langle x,\xi_y\rangle=1$ if and only if $xy \in \ker\xi_1=\mathbb{Z}_p$, i.e. $|xy|_p \le 1$. Therefore, for $y \ne 0$,

$$\ker\xi_y=\{x \in \mathbb{Q}_p:|x|_p \le |y|_p^{-1}\}.$$

We expect that all characters are of the form $\xi_y$. Therefore their kernels shall be like $\ker\xi_y$ naturally. Notice that for fixed $y$, we have $|y|_p=p^m$ for some $m \in \mathbb{Z}$. As a result $\ker\xi_y = \overline{B}(0,p^{-m})$. For this reason we have the following (more obscure) argument

**Lemma 1.** If $\xi \in \widehat{\mathbb{Q}}_p$, there exists an integer $k$ such that $\overline{B}(0,p^{-k}) \subset \ker\xi$.

*Proof.* Since $\xi$ is continuous and $\langle 0,\xi\rangle=1$, the set $\xi^{-1}\{z \in \mathbb{T}:|z-1| < 1\}$ is an open neighbourhood of $0$, so there exists $k$ such that $\overline{B}(0,p^{-k})$ is contained in it. But $\overline{B}(0,p^{-k})$ is a group (as $|\cdot|_p$ is non-Archimedean), therefore it maps onto a subgroup of $\mathbb{T}$ contained in $\{z:|z-1|<1\}$, which can only be $\{1\}$. $\square$

We cannot say that the kernel of $\xi$ is exactly of the form $\overline{B}(0,p^{-k})$ yet, but we have a way to formalise this now. If $\overline{B}(0,p^{-k}) \subset \ker\xi$ for all $k$, then $\xi=1$ is the unit in $\widehat{\mathbb{Q}}_p$. Otherwise, there is a smallest $k_0$ such that $\overline{B}(0,p^{-k_0})\subset \ker\xi$ but $\overline{B}(0,p^{-k}) \not \subset \ker\xi$ whenever $k<k_0$. Put another way, we have $\langle p^{k_0-1},\xi\rangle \ne1$ but $\langle p^k,\xi\rangle=1$ whenever $k \ge k_0$. As one may guess, this $k_0$ reflects the “size” of $\xi$. For convenience we study the case when $k_0=0$ first.

**Lemma 2 (“Fourier series”).** Suppose for given $\xi \in \widehat{\mathbb{Q}}_p$, $\langle 1,\xi \rangle = 1$ but $\langle p^{-1},\xi \rangle \ne 1$. There is a sequence $(c_j)$ taking values in $\{0,1,\dots,p-1\}$ such that $\langle p^{-k},\xi \rangle=\exp\left(2\pi i\sum_1^k c_{k-j}p^{-j}\right)$ for all $k=1,2,\dots$. In particular, $c_0 \ne 0$.

*Proof.* Put $\omega_k=\langle p^{-k},\xi\rangle$. Then $\omega_0=1$ but $\omega_k \ne 1$ for all $k \ge 1$. Since

$$\omega_{k+1}^p=\langle p \cdot p^{-(k+1)},\xi\rangle=\omega_k,$$

each $\omega_{k+1}$ is a $p$-th root of $\omega_{k}$, and in particular $\omega_1$ is a $p$-th root of unity. There exists $c_0 \in \{1,\dots,p-1\}$ such that

$$\omega_1=\exp\left(2\pi i c_0p^{-1}\right),$$

and the overall formula for $\omega_k$ follows from induction. $\square$

One would guess that for the corresponding $k_0$, the “size” of $\xi$ should be $p^{k_0}$. This looks realistic, but will be tedious. Right now we still only study the case when $k_0=0$.

**Lemma 3.** With the notation of lemma 2, there exists $y \in \mathbb{Q}_p$ with $|y|_p=1$ such that $\xi = \xi_y$.

*Proof.* From lemma 2 we obtain a series $y=\sum_{j=0}^{\infty}c_jp^j$ with $c_0 \ne 0$. Then in particular $|y|_p=1$. By expanding the term, we see

It follows that $\langle x,\xi \rangle = \langle x,\xi_y \rangle$ for all $x \in \mathbb{Q}_p$. $\square$

Now we are ready to conclude our observation of the dual group.

**Theorem.** The map $\Lambda:y \mapsto \xi_y$ is an isomorphism of topological groups. Hence $\mathbb{Q}_p \cong \widehat{\mathbb{Q}}_p$.

*Proof.* We first study the algebraic isomorphism. If $\xi_y=1$, then $\langle xy,\xi_1\rangle=1$, i.e. $xy \in \mathbb{Z}_p$, for all $x \in \mathbb{Q}_p$, which forces $y=0$.

Hence the map $\Lambda$ is injective. To show that $\Lambda$ is surjective, fix a non-trivial $\xi \in \widehat{\mathbb{Q}}_p$. By the comment below lemma 1, there is a smallest integer $k$ such that $\langle p^j,\xi \rangle = 1$ for all $j \ge k$. Then one considers the character $\eta$ defined by

$$\langle x,\eta\rangle=\langle p^kx,\xi\rangle.$$

It satisfies the condition in lemma 3, therefore there exists $z \in \mathbb{Q}_p$ such that $\eta=\xi_z$, and it follows that $\xi=\xi_{p^{-k}z}$.

Next we show that $\Lambda$ is a homeomorphism. Observe the following sets

ranging over $\ell \ge 1$ and $k \in \mathbb{Z}$. These sets constitute a local base at $1$ for $\widehat{\mathbb{Q}}_p$. We need to show that it corresponds to a local base of $\mathbb{Q}_p$ under the map $\Lambda$:

The image of the set $\{x:|x|_p \le p^k\}$ under $\xi_1$ is $\{1\}$ if $k \le 0$ and is the group of $p^k$-th roots of unity if $k>0$, and hence is contained in $\{z:|z-1|<\ell^{-1}\}$ if and only if $k \le 0$. It follows that $\xi_y \in N(\ell,k)$ if and only if $|y|_p \le p^{-k}$, i.e., $y \in \overline{B}(0,p^{-k})$. We are done. $\square$

Let $p$ be a prime number. Then the space of $p$-adic numbers $\mathbb{Q}_p$ is a locally compact abelian group. This can be observed through the local basis

where $|\cdot|_p$ is the $p$-adic norm such that, whenever we write $r=p^mq$ such that $q$ is prime to $p$, we have $|r|_p=p^{-m}$.
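The $p$-adic norm on rational numbers is easy to compute directly from this description. The following sketch (helper names `vp` and `norm_p` are ours) uses exact rationals and spot-checks the strong triangle inequality $|x+y|_p \le \max\{|x|_p,|y|_p\}$.

```python
from fractions import Fraction


def vp(r: Fraction, p: int) -> int:
    """p-adic valuation m of a nonzero rational r = p^m * q, q prime to p."""
    if r == 0:
        raise ValueError("the valuation of 0 is +infinity")
    m, num, den = 0, r.numerator, r.denominator
    while num % p == 0:
        num //= p
        m += 1
    while den % p == 0:
        den //= p
        m -= 1
    return m


def norm_p(r: Fraction, p: int) -> Fraction:
    """p-adic norm |r|_p = p^(-m), with |0|_p = 0."""
    return Fraction(0) if r == 0 else Fraction(p) ** (-vp(r, p))


p = 5
samples = [Fraction(a, b) for a in range(-6, 7) for b in (1, 2, 5, 25)]
for x in samples:
    for y in samples:
        assert norm_p(x + y, p) <= max(norm_p(x, p), norm_p(y, p))
print(norm_p(Fraction(75), 5), norm_p(Fraction(3, 25), 5))  # 1/25 25
```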

We remind the reader that every locally compact abelian group $G$ admits a Haar measure, which is unique up to multiplication by a positive scalar. In this post, we try to find the Haar measure on $\mathbb{Q}_p$, which makes it possible to do harmonic analysis on it. For this reason, in future posts, we will also find the dual group of $\mathbb{Q}_p$ as well as the dual measure.

Let us first recall the basic structure of $\mathbb{Q}_p$. Every element can be written as a Laurent series

$$x=\sum_{j=m}^{\infty}c_jp^j$$

where $m \in \mathbb{Z}$ and $c_j \in \{0,\dots,p-1\}$. The ring of integers $\mathbb{Z}_p$ is exactly the closed disc of radius $1$ at the origin. That is, $\mathbb{Z}_p=\overline{B}(0,1)$ is a compact set. Let $\mu$ be an arbitrary Haar measure on $\mathbb{Q}_p$. Then $\mu(\mathbb{Z}_p)$ is non-zero and finite. We can therefore put

$$m_p(E)=\frac{\mu(E)}{\mu(\mathbb{Z}_p)}.$$

Then in particular $m_p(\mathbb{Z}_p)=1$. This is the canonical Haar measure we are looking for. But it would be hilarious to end the post here. We will give a closer look at it, at least on a $p$-adic level.

Recall that when studying the Lebesgue measure on $\mathbb{R}$ we encountered a definition of the form

$$m^\ast(E)=\inf\sum_{j}\ell(I_j)$$

where the infimum is taken over all countable collections of open intervals $\{I_j\}$ such that $\bigcup_j I_j \supset E$, and $\ell(I_j)$ is the length of $I_j$. In fact, we can actually write

On $\mathbb{Q}_p$, we write

The point here is how to express $V$. For this reason we need to recall some topology of $\mathbb{Q}_p$.

$\mathbb{Q}_p$ is a separable metric space. Therefore every open set $V$ is a union of open balls.

There is nothing special about this statement. The space has already been equipped with a norm. Besides, as $\mathbb{Q}$ is dense in $\mathbb{Q}_p$, we have nothing to worry about second countability.

Every closed ball of $\mathbb{Q}_p$ is open (hence we call them “balls” thereafter). Every point in the ball is a “centre”. If two balls intersect then one is contained in the other.

This is dramatically different from our understanding of $\mathbb{R}$ or $\mathbb{C}$. Notice that the $p$-adic norm $|\cdot|_p$ only takes the values from $p^k$ with $k \in \mathbb{Z}$ or $0$. For any $r>0$, there exists some $\varepsilon>0$ such that

The clopenness of balls in $\mathbb{Q}_p$ follows.

Next, recall that $|\cdot|_p$ is non-Archimedean. Consider $y \in \overline{B}(x,r)$. It follows that $|x-y|_p=|y-x|_p \le r$. On the other hand, for any $z \in \overline{B}(x,r)$, we have $|x-z|_p \le r$. Therefore $|y-z|_p \le r$. Hence $\overline{B}(x,r)\subset \overline{B}(y,r)$. Symmetrically we see $\overline{B}(y,r) \subset \overline{B}(x,r)$. Hence they are equal.

Let $\overline{B}(x,r)$ and $\overline{B}(x’,r’)$ be two balls that intersect, and without loss of generality we assume that $r \le r’$. Let $y$ be a point in the intersection, then we see

So far so good. We next try to compute the Haar measure of every ball.

Every ball of radius $p^k$ has measure $p^k$ ($k \in \mathbb{Z}$).

First of all notice that $\overline{B}(0,1)=\mathbb{Z}_p$, and we defined $m_p$ so that $m_p(\mathbb{Z}_p)=1$. Therefore every ball of the form $\overline{B}(x,1)$ has measure $1$. Next, notice that $\overline{B}(0,p^k)=p^{-k}\mathbb{Z}_p$ for all $k \in \mathbb{Z}$, it is necessary to unwind $\mathbb{Z}_p$ a little bit more.

We have

Therefore $\mathbb{Z}_p$ is a disjoint union of $p^k$ balls of radius $p^{-k}$ when $k>0$. Hence in this case,

as expected. In other words, for $k<0$, the ball $\overline{B}(0,p^k)$ has measure $p^k$.

For the counterpart, we notice that

which is to say $\overline{B}(0,p^k)=p^{-k}\mathbb{Z}_p$ is a disjoint union of $p^k$ balls of radius $1$. Hence its measure is $p^k$. This concludes our computation of balls in $\mathbb{Q}_p$.
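The decomposition of $\mathbb{Z}_p$ into $p^k$ balls of radius $p^{-k}$ can be illustrated on ordinary integers, which are dense in $\mathbb{Z}_p$. In the sketch below the ball $a+p^k\mathbb{Z}_p$ is identified with the residue class of $a$ modulo $p^k$; the helper name `ball_index` and the sample range are illustrative assumptions.

```python
from fractions import Fraction


def ball_index(x: int, p: int, k: int) -> int:
    """Index a in {0, ..., p^k - 1} of the ball a + p^k Z_p containing x."""
    return x % p**k


p, k = 3, 2
centres = range(p**k)
# distinct centres satisfy |a - b|_p > p^(-k), so the balls are disjoint
assert all((a - b) % p**k != 0 for a in centres for b in centres if a != b)
# every sampled integer lands in exactly one of the p^k balls
assert {ball_index(x, p, k) for x in range(-50, 50)} == set(centres)
# p^k translates of equal measure must add up to m_p(Z_p) = 1
ball_measure = Fraction(1, p**k)
assert len(centres) * ball_measure == 1
print(ball_measure)  # 1/9
```

Translation invariance is what allows all $p^k$ balls to share the same measure, which is then forced to be $p^{-k}$.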

Now we come back to the definition of $m_p$. Now every open set $V$ can be written in the form

The union is countable because $\mathbb{Q}_p$ is second countable. By combining intersecting balls, we can assume that the union is also disjoint. It follows that

Note: this should be understood in the sense of real series, instead of $p$-adic number, because $m_p$ takes the values in $\mathbb{R}$. So for an arbitrary measurable set, we have

The notion of Cohen-Macaulay ring is sufficiently general to admit a wealth of examples in algebraic geometry, invariant theory and combinatorics; meanwhile it is sufficiently strict to allow a rich theory. The notion of Cohen-Macaulay is a workhorse of commutative algebra. In this post, we discover an important subclass of Cohen-Macaulay rings, namely regular local rings (one may think of $k[[x_1,\dots,x_n]]$). See also “Why Cohen-Macaulay rings have become important in commutative algebra?” on MathOverflow.

It is recommended to be familiar with basic commutative algebra tools such as Nakayama’s lemma and minimal prime ideals.

The content can be generalised to modules to a good extent, but we are not doing it for the sake of quick accessibility.

**Definition 1.** The **Krull dimension** of $R$, written as $\dim{R}$, is the supremum of the lengths $n$ of chains of prime ideals

$$\mathfrak{p}_0 \subsetneq \mathfrak{p}_1 \subsetneq \dots \subsetneq \mathfrak{p}_n.$$

This definition was introduced to define dimension of affine varieties, in a global sense. Locally, we have the following definition.

**Definition 2.** The **embedding dimension** of $R$ is the dimension of a vector space:

$$emb.\dim{R}=\dim_k(\mathfrak{m}/\mathfrak{m}^2).$$

The right hand side is the dimension of the $k$-vector space $\mathfrak{m}/\mathfrak{m}^2$.

Let $R$ be the local ring of a complex variety $X$ at a point $P$, in other words we write $R=\mathcal{O}_{P,X}$. Then $(\mathfrak{m}/\mathfrak{m}^2)^\ast$ is the Zariski tangent space of $X$ at $P$, whose dimension equals $\dim_k(\mathfrak{m}/\mathfrak{m}^2)=emb.\dim{R}$. The embedding dimension of $R$ is the smallest integer $n$ such that some analytic neighbourhood of $P$ in $X$ embeds into $\mathbb{C}^n$. If this dimension equals the dimension of $X$, then $X$ is “smooth” at $P$. For this reason we define regular local ring.

**Definition 3.** The ring $R$ is called **regular** if $\dim{R}=emb.\dim{R}$.

The most immediate intuitive example of a regular local ring has to be a ring of the form

$$K[[x_1,\dots,x_n]]$$

where $K$ is a field. Rings of this kind are regular local rings of Krull dimension $n$. As one would imagine, this ring contains much more information than $K[x_1,\dots,x_n]$, just as power series in complex analysis are much more powerful than polynomials.

But by working on regular local rings, we are not essentially restricting ourselves to rings of power series over a field. For example, the ring $\mathbb{Z}[X]_{(2,X)}$ is also a regular local ring, but it does not even contain a field.

Nevertheless, our primary model of regular local rings is still a ring of the form $A=K[[x_1,\dots,x_n]]$, which has a maximal ideal $\mathfrak{m}=(x_1,\dots,x_n)$. To study local rings in the flavour of $A$, we develop an analogy of elements $\{x_1,\dots,x_n\}$.

**Definition 4.** A **regular sequence** of $R$, also written as $R$-sequence, is a sequence $[x_1,\dots,x_n]$ of elements in $\mathfrak{m}$ such that $x_1$ is a non-zero-divisor in $R$, and such that for each $i>1$, $x_i$ is a non-zero-divisor in $R/(x_1,\dots,x_{i-1})$.

The **grade** of $R$, $G(R)$, is the longest length of regular sequences. If $G(R)=\dim{R}$, then $R$ is called **Cohen-Macaulay**.

It is quite intuitive that, for $A=K[[x_1,\dots,x_n]]$, the longest $A$-sequence has to be $[x_1,\dots,x_n]$, and therefore $A$ is Cohen-Macaulay. But such an argument does not bring us to the conclusion that quickly. We will show later that, anyway, every regular local ring is a Cohen-Macaulay ring.

Amongst many sequences, we are in particular interested in the sequence that are mapped onto a basis of the $k$-vector space $\mathfrak{m}/\mathfrak{m}^2$. We will show later that this “regular” sequence is indeed the *regular* sequence.

**Proposition 1.** Let $x_1,\dots,x_n$ be elements in $\mathfrak{m} \subset R$ whose images form a basis of $\mathfrak{m}/\mathfrak{m}^2$, then $x_1,\dots,x_n$ generate the maximal ideal $\mathfrak{m}$.

*Proof.* Nakayama’s lemma (8). Notice that as $R$ is local, the Jacobson radical is $\mathfrak{m}$. Besides, we take $I=M=\mathfrak{m}$. $\square$

**Proposition 2.** If $R$ is a regular local ring of dimension $n$ and $x_1, \dots,x_n \in \mathfrak{m}$ map to a basis of $\mathfrak{m}/\mathfrak{m}^2$, then $R/(x_1,\dots,x_i)$ is a regular local ring of dimension $n-i$.

*Proof.* By proposition 1, we have $\mathfrak{m}=(x_1,\dots,x_i,x_{i+1},\dots,x_n)$. The dimension of $R/(x_1,\dots,x_i)$ is determined by the chain in $R$:

which has length $n-i$. That is, $\dim R/(x_1,\dots,x_i)=n-i$. On the other hand, the maximal ideal $\mathfrak{M}$ in $R/(x_1,\dots,x_i)$ is isomorphic to $(x_{i+1},\dots,x_n)$, and $x_{i+1},\dots,x_n$ map to a basis of $\mathfrak{M}/\mathfrak{M}^2$, which consequently has dimension $n-i$. $\square$

It looks quite promising now that the sequence of basis can get everything down to earth, and we will show that in the following section.

**Proposition 3.** If $R$ is regular, then $R$ is an integral domain.

*Proof.* We use induction on $\dim R$. When $\dim{R}=0$ and $R$ is regular, $R$ has to be a field, hence an integral domain by definition. Next we assume that $\dim{R}>0$ and the statement has been proved for $\dim{R}-1$.

Pick $x \in \mathfrak{m} \setminus \mathfrak{m}^2$. Then this element maps to a nonzero element in $\mathfrak{m}/\mathfrak{m}^2$. There exists a basis of $\mathfrak{m}/\mathfrak{m}^2$ that contains $\overline{x}$. Therefore by proposition 2, $R/(x)$ is a regular local ring of dimension $\dim{R}-1$, which is an integral domain by assumption. It follows that $(x)$ is prime.

We claim that there exists $x \in \mathfrak{m} \setminus \mathfrak{m}^2$ such that $(x)$ has height $1$. If not, then for all $x \in \mathfrak{m} \setminus \mathfrak{m}^2$, $(x)$ is a minimal prime. Since there are only finitely many minimal prime ideals $\mathfrak{p}_1,\dots,\mathfrak{p}_r$, it follows that

$$\mathfrak{m} \subset \mathfrak{m}^2 \cup \mathfrak{p}_1 \cup \dots \cup \mathfrak{p}_r$$

and consequently, since $\mathfrak{m} \not\subset \mathfrak{m}^2$ by Nakayama’s lemma, $\mathfrak{m} \subset \mathfrak{p}_j$ for some $1 \le j \le r$. It follows that $\dim{R}=0$, contradicting our assumption that $\dim{R}>0$. [Note: the prime avoidance allows at most two ideals to be non-prime. See P. 90 of Eisenbud’s Commutative Algebra, with a View Toward Algebraic Geometry.]

Thus, as our claim is true, we can write $\mathfrak{p} \subsetneq (x)$ with $\mathfrak{p}$ prime and $x \in \mathfrak{m} \setminus \mathfrak{m}^2$. We see $\mathfrak{p} \subset (x^n)$ for all $n$: if $p=rx^n \in \mathfrak{p}$, then, as $x \notin \mathfrak{p}$, we get $r \in \mathfrak{p} \subset (x)$ and therefore we can write $r=sx$, or equivalently $p = sx^{n+1} \in (x^{n+1})$. Hence $\mathfrak{p} \subset \bigcap_{n=1}^{\infty}(x^n)=0$ by Krull’s intersection theorem. Therefore $R/\mathfrak{p}=R/0=R$ is an integral domain. $\square$

We now reach our conclusion of this post.

**Proposition 4.** If $R$ is regular and of Krull dimension $n$, any $x_1,\dots,x_n \in \mathfrak{m}$ mapping to a basis of $\mathfrak{m}/\mathfrak{m}^2$ gives rise to a regular sequence ($R$-sequence). Hence $G(R)=\dim{R}$ and therefore $R$ is Cohen-Macaulay.

*Proof.* As $G(R) \le \dim{R}$, it suffices to show that $[x_1,\dots,x_n]$ is a regular sequence, which gives $G(R) \ge \dim{R}$. First of all notice that $x_1$ is a non-zero-divisor (because $R$ is an integral domain). For any $i$ with $1 \le i<n$, proposition 2 shows that $R/(x_1,\dots,x_i)$ is a regular local ring of dimension $n-i$, hence again an integral domain. Since the image of $x_{i+1}$ in this quotient is nonzero (it is part of a basis of $\mathfrak{M}/\mathfrak{M}^2$), $x_{i+1}$ is a non-zero-divisor in $R/(x_1,\dots,x_i)$. $\square$

- Charles A. Weibel, *An Introduction to Homological Algebra*.
- M. F. Atiyah, I. G. MacDonald, *Introduction to Commutative Algebra*.
- David Eisenbud, *Commutative Algebra: with a View Toward Algebraic Geometry*.
- Winfried Bruns, Jürgen Herzog, *Cohen-Macaulay Rings*.

For example, if $f(X)=(X-1)^{100}$, we have $n_0(f)=1$. It seems we are diving into calculus but actually there is still a lot of algebra.

**Theorem 1 (Mason-Stothers).** Let $a(X),b(X),c(X) \in K[X]$ be polynomials such that $(a,b,c)=1$ and $a+b=c$. Then

$$\max\{\deg a,\deg b,\deg c\} \le n_0(abc)-1.$$

*Proof.* Putting $f=a/c$ and $g=b/c$, we have

This implies

We interrupt the proof here for some good reasons. Rational functions of the form $f'/f$ remind us of the chain rule applied to $\log{x}$. In the context of calculus, we have $\left(\log{f(x)}\right)'=f'/f$. On the ring $K[x]$, we define $D:K[x] \to K[x]$ to be the formal derivative morphism. Then this endomorphism extends to $K(x)$ by

$$D\left(\frac{f}{g}\right)=\frac{gD(f)-fD(g)}{g^2}.$$

On $K(x)^\ast$ (read: the multiplicative group of the rational function field $K(x)$), we define the logarithmic derivative

It follows that

Also observe that, just as in calculus, if $f$ is a constant function, then $D(f)=0$. Now we write

Then it follows that

Now we can be back to the proof.

*Proof (continued).* Since $K$ is algebraically closed,

We see, for example

Therefore

Likewise

Combining both, we obtain

Next, multiplying $f'/f$ and $g'/g$ by

which has degree $n_0(abc)$ (since $(a,b,c)=1$, these three polynomials share no root). Both $N_0f'/f$ and $N_0g'/g$ are polynomials of degrees at most $n_0(abc)-1$ (this is because $\deg h'=\deg h-1$ for non-constant $h \in K[X]$, while $f$ and $g$ are non-constant (why?); we assume $\operatorname{char} K=0$ for this reason).

Next we observe the degrees of $a,b$ and $c$. Since $a+b=c$, we actually have $\deg c \le \max\{\deg a,\deg b\}$. Therefore $\max\{\deg a,\deg b,\deg c\}=\max\{\deg a,\deg b\}$. From the relation

and the assumption that $(a,b)=1$, one can find polynomial $h \in K[X]$ such that

Taking the degrees of both sides, we see

This proves the theorem. $\square$
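The inequality is easy to test on examples. The sketch below works over $\mathbb{Q}$ with exact rational arithmetic and assumes that $n_0(f)$ denotes the number of distinct roots of $f$ in an algebraic closure, which in characteristic $0$ equals $\deg f-\deg\gcd(f,f')$. All helper names are ours, and polynomials are stored as coefficient lists, lowest degree first.

```python
from fractions import Fraction


def trim(f):
    """Copy f with rational coefficients and no trailing zeros ([] is 0)."""
    f = [Fraction(c) for c in f]
    while f and f[-1] == 0:
        f.pop()
    return f


def deg(f):
    return len(trim(f)) - 1            # deg(0) = -1 by convention here


def add(f, g):
    n = max(len(f), len(g))
    f, g = list(f) + [0] * (n - len(f)), list(g) + [0] * (n - len(g))
    return trim([x + y for x, y in zip(f, g)])


def mul(f, g):
    f, g = trim(f), trim(g)
    if not f or not g:
        return []
    out = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, x in enumerate(f):
        for j, y in enumerate(g):
            out[i + j] += x * y
    return out


def derivative(f):
    return trim([i * c for i, c in enumerate(f)][1:])


def rem(f, g):
    """Remainder of f on division by a nonzero polynomial g."""
    f, g = trim(f), trim(g)
    while len(f) >= len(g):
        q, shift = f[-1] / g[-1], len(f) - len(g)
        for i, c in enumerate(g):
            f[i + shift] -= q * c
        f = trim(f)
    return f


def gcd(f, g):
    f, g = trim(f), trim(g)
    while g:                           # Euclidean algorithm
        f, g = g, rem(f, g)
    return f


def n0(f):
    """Number of distinct roots of a nonzero f (characteristic 0)."""
    return deg(f) - deg(gcd(f, derivative(f)))


a = [0, 0, 1]                          # X^2
b = [1, 2]                             # 2X + 1
c = add(a, b)                          # (X + 1)^2
abc = mul(mul(a, b), c)
print(max(deg(a), deg(b), deg(c)), n0(abc) - 1)   # 2 2
```

In this example the bound is attained: the maximal degree is $2$ and $abc$ has the three distinct roots $0$, $-\frac{1}{2}$ and $-1$.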

We present some applications of this theorem.

**Corollary 1 (Fermat’s theorem for polynomials).** Let $a(X),b(X)$ and $c(X)$ be relatively prime polynomials in $K[X]$ such that not all of them are constant, and such that

$$a(X)^n+b(X)^n=c(X)^n.$$

Then $n \le 2$.

Alternatively one can argue via the curve $x^n+y^n=1$ over $K(X)$.

*Proof.* Since $a,b$ and $c$ are relatively prime, we also have $a^n$, $b^n$ and $c^n$ to be relatively prime. By Mason-Stothers theorem,

Replacing $a$ by $b$ and $c$, we see

It follows that

In this case $n<3$. $\square$

**Corollary 2 (Davenport’s inequality).** Let $f,g \in K[X]$ be non-constant polynomials such that $f^3-g^2 \ne 0$. Then

$$\deg(f^3-g^2) \ge \frac{1}{2}\deg f+1.$$

One may treat the cases where $f$ and $g$ are coprime or not separately and try to apply the Mason-Stothers theorem in each. Many documents only record the proof of the coprime case, which is a shame, because the case when $f$ and $g$ are not coprime can be a nightmare. Instead, for the sake of accessibility, we offer the elegant proof given by Stothers, starting with a lemma about the degree of the difference of two polynomials.

**Lemma 1.** Suppose $p,q \in K[X]$ are two distinct non-constant polynomials, then

$$\deg(p-q) \ge \deg p-n_0(p)-n_0(q)+1.$$

*Proof.* Let $k(f)$ be the leading coefficient of a polynomial $f$. If $\deg p \ne \deg q$ or $k(p) \ne k(q)$, then $\deg(p-q)\ge \deg p \ge \deg p - n_0(p)-n_0(q)+1$ because $n_0(p) \ge 1$ and $n_0(q) \ge 1$.

Next suppose $\deg p = \deg q$ and $k(p)=k(q)$. If $(p,q)=1$, then by Mason-Stothers,

Otherwise, suppose $(p,q)=r$. Then $p/r$ and $q/r$ are coprime. Again by Mason-Stothers,

Therefore

On the other hand,

Combining all these inequalities, we obtain what we want. $\square$

*Proof (of corollary 2).* Put $\deg{f}=m$ and $\deg{g}=n$. If $3m \ne 2n$, then

because $m \ge 1$. Next we assume that $3m=2n$, or in other words, $m=2r$ and $n=3r$ for some integer $r \ge 1$. By lemma 1, we can write

This proves the inequality. $\square$
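A concrete pair for which Davenport’s bound $\deg(f^3-g^2) \ge \frac{1}{2}\deg f+1$ is attained is $f=X^2+2$, $g=X^3+3X$, where $f^3-g^2=3X^2+8$. The sketch below verifies this with integer coefficient lists (lowest degree first); the helper names are ours.

```python
def mul(f, g):
    """Product of two polynomials given as coefficient lists."""
    out = [0] * (len(f) + len(g) - 1)
    for i, x in enumerate(f):
        for j, y in enumerate(g):
            out[i + j] += x * y
    return out


def sub(f, g):
    """Difference f - g, padding the shorter list with zeros."""
    n = max(len(f), len(g))
    f, g = f + [0] * (n - len(f)), g + [0] * (n - len(g))
    return [x - y for x, y in zip(f, g)]


def deg(f):
    """Degree of a nonzero polynomial."""
    return max(i for i, c in enumerate(f) if c != 0)


f = [2, 0, 1]        # X^2 + 2
g = [0, 3, 0, 1]     # X^3 + 3X
diff = sub(mul(mul(f, f), f), mul(g, g))      # f^3 - g^2
print(diff)          # [8, 0, 3, 0, 0, 0, 0], i.e. 3X^2 + 8
# Davenport: deg(f^3 - g^2) >= deg(f)/2 + 1, here with equality
assert 2 * deg(diff) >= deg(f) + 2
```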

One may also generalise the case to $f^m-g^n$. But we put down some more important remarks. First of all, Mason-Stothers is originally a generalisation of Davenport’s inequality (by Stothers). I personally do not think any mortal can find the original paper of Davenport’s inequality, but in [Shioda 04] there is a reproduced proof using linear algebra (lemma 3.1).

For more geometrical interpretation, one may be interested in [Zannier 95], where Riemann’s existence theorem is also discussed.

In Stothers’s paper [Stothers 81], the author discussed the condition where the equality holds. If you look carefully you will realise his theorem 1.1 is exactly the Mason-Stothers theorem.

- [Davenport 65] H. Davenport, *On $f^3(t)-g^2(t)$*, 1965. (can someone find a digital copy of this paper?)
- [Ma 84] R. C. Mason, *Diophantine Equations over Function Fields*, 1984.
- [Shioda 04] Tetsuji Shioda, *The abc-theorem, Davenport’s inequality and elliptic surfaces*, 2004. (https://www2.rikkyo.ac.jp/web/shioda/papers/esdstadd.pdf)
- [Stothers 81] W. W. Stothers, *Polynomial Identities and Hauptmoduln*, 1981. (https://doi.org/10.1093/qmath/32.3.349)
- [Zannier 95] Umberto Zannier (Venezia), *On Davenport’s bound for the degree of $f^3-g^2$ and Riemann’s Existence Theorem*, 1995. (https://eudml.org/doc/206763)

The **Riemann zeta function** is widely known:

$$\zeta(s)=\sum_{n=1}^{\infty}\frac{1}{n^s}.$$

It is widely known mainly because of the celebrated hypothesis by Riemann that remains unsolved after more than a century’s attempts by mathematicians and 150 million attempts by computers:

Riemann Hypothesis:The non-trivial zeros of $\zeta(s)$ lie on the line $\Re(s)=\frac{1}{2}$.

People are told by pop-science how important and mysterious this hypothesis is. Or how disastrous if this would be solved one day. We can put them aside. A question is, why would Riemann ever think about the zero set of *such* a function? Perhaps something else? According to Riemann, the distribution function of primes

may be written as the series

where

and $\rho$ varies over all zeros of $\zeta(s)$. With these being said, once this *hypothesis* is proven true, we may have a much more concrete say of the distribution of prime numbers.

But this is not actually the topic of this post. The author of this post is not trying to prove the Riemann Hypothesis in a few pages, and nobody could. In this post, we investigate the analytic continuation of $\zeta(s)$ step-by-step, so that it will make sense to even think about evaluating the value at $\frac{1}{2}$. For the theory of analytic continuation, I recommend *Real and Complex Analysis* by Walter Rudin, although his book goes into modular functions and Picard’s little theorem rather than the $\zeta$ function and related topics.

A sketch of our procedure follows. The function $\zeta(s)$ does not ring a bell of power series, although a straightforward observation shows that $\sum_{n=1}^{\infty}\frac{1}{n^{s}}$ represents an analytic function in the half-plane $\Re(s)>1$. We need to develop tools that can easily be utilised into the study of the zeta function. Our two main tools are the Gamma function and Mellin transform.

With these two tools being developed, we will observe the so-called complete zeta function, which will bring us to THE continuation we are looking for.

We will spend more detail on the non-trivial steps than on basic complex analysis. The reader may skip our preparation if they are familiar with this content.

The Gamma function should be studied in an analysis course:

$$\Gamma(s)=\int_0^\infty t^{s-1}e^{-t}\,dt.$$

In an analysis course we have studied some of this function’s important properties:

$\Gamma(1)=1$.

$\Gamma(s+1)=s\Gamma(s)$ (as a result $n!=\Gamma(n+1)$)

$\log\Gamma(s)$ is a convex function.

In this section however, we will study it in the context of complex analysis.

Theorem 1. The Gamma function

$$\Gamma(s)=\int_0^\infty t^{s-1}e^{-t}\,dt$$

is well-defined as an analytic function in the half plane $\Re(s)>0$.

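*Proof.* If we write $s=u+iv$ with $u>0$ and $t=e^c$, then

$$|t^{s-1}|=|e^{c(u-1)}e^{icv}|=e^{c(u-1)}=t^{u-1}.$$

Therefore

$$\int_0^\infty |t^{s-1}e^{-t}|\,dt=\int_0^\infty t^{u-1}e^{-t}\,dt<\infty.$$

The desired properties then follow. $\square$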

Theorem 2. If $\Re(s)>0$, then

$$\Gamma(s+1)=s\Gamma(s)$$

and as a consequence $\Gamma(n+1)=n!$ for $n=0,1,\dots$.

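*Proof.* The second statement follows immediately because $\Gamma(1)=1$. For the first equation, we do an integration by parts:

$$\int_\varepsilon^{1/\varepsilon}t^se^{-t}\,dt=\Big[-t^se^{-t}\Big]_{\varepsilon}^{1/\varepsilon}+s\int_\varepsilon^{1/\varepsilon}t^{s-1}e^{-t}\,dt.$$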

Taking $\varepsilon \to 0$, we get what we want. $\square$

Now we are ready for the analytic continuation for the Gamma function, which builds a bridge to the analytic continuation of $\zeta$.

Theorem 3. The function $\Gamma(s)$ defined in theorem 1 admits an analytic continuation to a meromorphic function on the complex plane whose singularities are simple poles at $s=-n$ for $n=0,1,2,\dots$, with residue $\frac{(-1)^n}{n!}$ at $s=-n$.

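*Proof.* It suffices to show that we can extend $\Gamma$ to $\Re(s)>-m$ for all integers $m>0$ (hence we can extend it to the whole complex plane). For this reason, we put $\Gamma_0(s)=\Gamma(s)$, which is defined in theorem 1. Then

$$\Gamma_1(s)=\frac{\Gamma_0(s+1)}{s}$$

is THE analytic continuation of $\Gamma_0(s)$ to $\Re(s)>-1$, with the only singularity $s=0$. Then

$$\operatorname*{res}_{s=0}\Gamma_1(s)=\Gamma_0(1)=1.$$

Likewise, we can define

$$\Gamma_2(s)=\frac{\Gamma_1(s+1)}{s}=\frac{\Gamma_0(s+2)}{s(s+1)}.$$

Overall, whenever $m \ge 1$ is an integer, we can define

$$\Gamma_m(s)=\frac{\Gamma_0(s+m)}{s(s+1)\cdots(s+m-1)}.$$

This function is meromorphic in $\Re(s)>-m$ and has simple poles at $s=0,-1,\dots,-m+1$ with residues

$$\operatorname*{res}_{s=-n}\Gamma_m(s)=\frac{\Gamma_0(m-n)}{(-n)(-n+1)\cdots(-1)\cdot 1\cdot 2\cdots(m-n-1)}=\frac{(m-n-1)!}{(-1)^nn!\,(m-n-1)!}=\frac{(-1)^n}{n!}.$$

Successive applications of theorem 2 show that $\Gamma_m(s)=\Gamma(s)$ for $\Re(s)>0$. Therefore we have obtained the analytic continuation through this process. $\square$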

Throughout, unless specified, we will call the function obtained in the proof of theorem 3 as THE function $\Gamma$.

For all $s \in \mathbb{C}$, this function satisfies $\Gamma(s+1)=s\Gamma(s)$ as it should be.

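Before we proceed, we develop two relationships between the $\Gamma$ function and the $\zeta$ function, in an attempt to convince the reader that we are not doing something for nothing.

If we perform a change of variable $t=nu$ in the definition of $\Gamma(s)$, we see

$$\Gamma(s)=\int_0^\infty (nu)^{s-1}e^{-nu}\,n\,du=n^s\int_0^\infty u^{s-1}e^{-nu}\,du.$$

This is to say,

$$\frac{\Gamma(s)}{n^s}=\int_0^\infty u^{s-1}e^{-nu}\,du.$$

Taking the sum over all $n \ge 1$, we see

$$\Gamma(s)\zeta(s)=\int_0^\infty u^{s-1}\sum_{n=1}^{\infty}e^{-nu}\,du=\int_0^\infty\frac{u^{s-1}}{e^u-1}\,du.$$

This relationship is beautiful, but may make our computation a little bit more complicated. However, if we get our hands dirty earlier, our study will be easier. Thus we will do an “uglier” change of variable $t \mapsto \pi n^2y$ to obtain

$$\Gamma(s)=\pi^sn^{2s}\int_0^\infty y^{s-1}e^{-\pi n^2y}\,dy,$$

which implies

$$\pi^{-s}\Gamma(s)\zeta(2s)=\int_0^\infty y^{s-1}\sum_{n=1}^{\infty}e^{-\pi n^2y}\,dy.$$

In either case, it is legal to change the order of summation and integration, because of the monotone convergence theorem.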
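As a sanity check of the first relationship, $\Gamma(s)\zeta(s)=\int_0^\infty\frac{u^{s-1}}{e^u-1}\,du$, one can compare both sides numerically at $s=2$ and $s=4$, where $\zeta(2)=\pi^2/6$ and $\zeta(4)=\pi^4/90$ are known. The sketch below is only an illustration: the function name, the midpoint rule, the cut-off at $u=60$ and the number of steps are all assumptions made for this example.

```python
import math


def bose_integral(s, upper=60.0, steps=200_000):
    """Midpoint-rule approximation of the integral of u^(s-1)/(e^u - 1) over (0, upper).

    The tail beyond `upper` is ignored; for upper = 60 it is far below
    double precision for the values of s used here.
    """
    h = upper / steps
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * h                       # midpoint of the k-th subinterval
        total += u ** (s - 1) / math.expm1(u)   # expm1(u) = e^u - 1, accurate near 0
    return total * h


# Left-hand sides from the closed forms of zeta(2) and zeta(4).
lhs_two = math.gamma(2) * math.pi ** 2 / 6
lhs_four = math.gamma(4) * math.pi ** 4 / 90

print(lhs_two, bose_integral(2))
print(lhs_four, bose_integral(4))
```

Both pairs of numbers should agree to several decimal places, which is all a crude quadrature like this can promise.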

Before we proceed, we need some more properties of the Gamma function.

Theorem 3 (Euler’s reflection formula). For all $s \in \mathbb{C}$,

$$\Gamma(s)\Gamma(1-s)=\frac{\pi}{\sin\pi s}.$$

Observe that this identity makes sense at all poles: $\Gamma(s)$ has simple poles at $0,-1,\dots$, while $\Gamma(1-s)$ has simple poles at $1,2,\dots$. As a result, $\Gamma(s)\Gamma(1-s)$ has simple poles at all integers, a property which is shared by $\pi/\sin\pi{s}$.

By analytic continuation, it suffices to prove it for $0<s<1$ because it will be extended to all of $\mathbb{C}$.

*Proof (real version).* First of all, observe that

On the other hand, we have

by taking $t=\frac{1}{1+y}$. Next we compute this integral for both $(0,1]$ and $[1,\infty)$.

(One shall be disturbed by our exchange of infinite sum and integration due to his or her study in analysis, but will be relaxed after being informed of Arzelà’s dominated convergence theorem of Riemann integrals.)

On the other hand, taking $y=\frac{1}{u}$, we see

Summing up, one has

It remains to show that $\pi\csc{\pi{x}}$ satisfies such an expansion as well, which is not straightforward because neither Fourier series nor Taylor series can drive us there directly. One can start with the infinite product expansion of $\sin{x}$ but here we follow an alternative approach. Notice that for $\alpha \in \mathbb{R} \setminus \mathbb{Z}$,

Taking $t=0$ and multiplying both sides by $\pi\csc\pi\alpha$, we obtain what we want. $\square$

*Proof (complex version).* By definition,

Here we performed a change-of-variable on $v=tu$. To compute the last integral, we put $u=e^x$, and it follows that

The integral on the right hand side can be computed to be $\frac{\pi}{\sin(1-s)\pi}=\frac{\pi}{\sin\pi s}$. This is an easy consequence of the residue formula (by considering a rectangle with centre $z=\pi i$, height $2\pi$ and one side being the real axis). $\square$

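In particular, by putting $s=1/2$, we obtain

$$\Gamma\left(\frac{1}{2}\right)=\sqrt{\pi}.$$

As a bonus of this, by putting $t=u^2$, we also see

$$\Gamma\left(\frac{1}{2}\right)=\int_0^\infty t^{-1/2}e^{-t}\,dt=2\int_0^\infty e^{-u^2}\,du.$$

Therefore

$$\int_{-\infty}^{\infty}e^{-u^2}\,du=\sqrt{\pi}.$$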

To conclude this section, we mention the

Theorem 4 (Legendre duplication formula).

$$\Gamma(s)\Gamma\left(s+\frac{1}{2}\right)=2^{1-2s}\sqrt{\pi}\,\Gamma(2s).$$

One can find a proof here.
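Both identities, Euler’s reflection formula $\Gamma(s)\Gamma(1-s)=\pi/\sin\pi s$ and Legendre’s duplication formula $\Gamma(s)\Gamma(s+\frac{1}{2})=2^{1-2s}\sqrt{\pi}\,\Gamma(2s)$, are easy to spot-check on the real line with Python’s standard `math.gamma`. The function names and sample points below are arbitrary choices for illustration; this is a numerical check, not a proof.

```python
import math


def reflection_holds(s, rel_tol=1e-9):
    """Check Gamma(s) * Gamma(1 - s) == pi / sin(pi * s) at a real non-integer s."""
    lhs = math.gamma(s) * math.gamma(1 - s)
    rhs = math.pi / math.sin(math.pi * s)
    return math.isclose(lhs, rhs, rel_tol=rel_tol)


def duplication_holds(s, rel_tol=1e-9):
    """Check Gamma(s) * Gamma(s + 1/2) == 2^(1 - 2s) * sqrt(pi) * Gamma(2s)."""
    lhs = math.gamma(s) * math.gamma(s + 0.5)
    rhs = 2 ** (1 - 2 * s) * math.sqrt(math.pi) * math.gamma(2 * s)
    return math.isclose(lhs, rhs, rel_tol=rel_tol)


# Sample points, including a negative one where the continuation is used.
samples = [0.3, 0.75, 2.5, -0.4]
print([reflection_holds(s) for s in samples])
print([duplication_holds(s) for s in samples])
```

The negative sample point is included on purpose: there `math.gamma` evaluates the analytically continued function, so the check also exercises theorem 3.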

Put $Z(s)=\pi^{-s/2}\Gamma(s/2)\zeta(s)$. It looks like we are pretty close to a great property of $\zeta(s)$, if we can figure out $Z$ a little bit more, because $\pi^{-s/2}$ and $\Gamma(s/2)$ behave nicely. Therefore we introduce the Jacobi theta function

$$\theta(s)=\sum_{n \in \mathbb{Z}}e^{-\pi n^2s}$$

and try to deduce its relation with $Z(s)$.

To begin with, we first show that

Proposition 1.The theta function is holomorphic on the right half plane.

*Proof.* Let $C$ be a compact subset of the right half plane, and put $y_0=\inf_{s \in C}\Re(s)$. Pick any $n_0\ge \frac{1}{y_0}$. For $s=u+iv \in C$, we have $u \ge y_0$ and therefore

Therefore $\theta(s)$ converges absolutely and uniformly on any compact subset of the right half plane. (Note we have used the fact that $n^2y_0 \ge |n|n_0y_0 \ge |n|$ when we are studying the convergence.) Since each term is holomorphic, we have shown that $\theta(s)$ itself is holomorphic. $\square$

Therefore it is safe to work around theta function. Now we are ready to deduce a functional equation.

Theorem 4. The theta function satisfies the functional equation on $\{\Re(s)>0\}$:

$$\theta(s)=\frac{1}{\sqrt{s}}\theta\left(\frac{1}{s}\right).$$

The square root is chosen to be in the branch with positive real part.

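*Proof.* Consider the function $f(x)=e^{-\pi x^2}$. We know that this is a fixed point of the Fourier transform (in this convenient form)

$$\hat{f}(\xi)=\int_{-\infty}^{\infty}f(x)e^{-2\pi ix\xi}\,dx=e^{-\pi\xi^2}=f(\xi).$$

Now for a real number $u>0$ we put $g(x)=e^{-\pi u x^2}=f(\sqrt{u}x)$. The Fourier transform of $g$ is easy to deduce:

$$\hat{g}(\xi)=\frac{1}{\sqrt{u}}\hat{f}\left(\frac{\xi}{\sqrt{u}}\right)=\frac{1}{\sqrt{u}}e^{-\pi\xi^2/u}.$$

Since $g(x)$ is a Schwartz function, by the Poisson summation formula, we have

$$\theta(u)=\sum_{n \in \mathbb{Z}}g(n)=\sum_{n \in \mathbb{Z}}\hat{g}(n)=\frac{1}{\sqrt{u}}\theta\left(\frac{1}{u}\right).$$

The result follows from an analytic continuation. $\square$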

For Schwartz functions, also known as rapidly decreasing functions, we refer the reader to chapter 7 of W. Rudin’s *Functional Analysis*.
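The functional equation $\theta(t)=\frac{1}{\sqrt{t}}\theta(1/t)$ can be observed numerically for real $t>0$ by truncating the series $\theta(t)=\sum_{n \in \mathbb{Z}}e^{-\pi n^2t}$. The following sketch assumes that definition of $\theta$; the truncation at 60 terms and the sample points are arbitrary choices that are more than enough for the values of $t$ used here.

```python
import math


def theta(t, terms=60):
    """Truncated Jacobi theta series sum_{n in Z} exp(-pi * n^2 * t) for real t > 0."""
    tail = sum(math.exp(-math.pi * n * n * t) for n in range(1, terms + 1))
    return 1.0 + 2.0 * tail     # the n = 0 term plus the symmetric pairs +-n


for t in (0.05, 0.5, 1.0, 3.0):
    left = theta(t)
    right = theta(1.0 / t) / math.sqrt(t)
    print(t, left, right)
```

For small $t$ the two columns are close to $1/\sqrt{t}$, and for large $t$ they are close to $1$, which is the behaviour studied next.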

Next we will study the behaviour of $\theta(s)$ on the half real line, especially at the origin and at infinity. By the functional equation above, once we have a better view around the origin, we can quickly know what will happen at infinity.

Proposition 2.When the real number $t \to 0$, the theta function is equivalent to $\frac{1}{\sqrt{t}}$. More precisely, when $t$ is small enough, the following inequality holds:

*Proof.* Rewrite $\theta(t)$ in the form

Therefore

Pick $t>0$ small enough so that

It follows that

$\square$

As a result, we also know how $\theta(t)$ behaves at infinity. To be precise, we have the following corollary.

Corollary 1.The limit of $\theta(t)$ at infinity is $1$ in the following sense: when $t$ is big enough,

*Proof.* Let $t$ be big enough such that $\frac{1}{t}$ is small enough. That is,

according to proposition 2. The result follows. $\square$

To begin with, we introduce the Mellin transform. In a manner of speaking, this transform can actually be understood as the multiplicative version of the two-sided Laplace transform.

Definition. Given a function $f:\mathbb{R}_+ \to \mathbb{C}$, the Mellin transform of $f$ is defined to be

$$\mathcal{M}(f)(s)=\int_0^\infty f(x)x^{s-1}\,dx$$

provided that the limit exists.

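For example, $\Gamma(s)$ is the Mellin transform of $e^{-x}$. Moreover, for the two-sided Laplace transform

$$\mathcal{B}(f)(s)=\int_{-\infty}^{\infty}e^{-sx}f(x)\,dx$$

we actually have

$$\mathcal{M}(f)(s)=\mathcal{B}(\tilde{f})(s)$$

where $\tilde{f}(x)=f(e^{-x})$.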

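Our goal is to recover $Z(s)$ through the Mellin transform of $\theta(x)$. As we have proved earlier,

$$\pi^{-s}\Gamma(s)\zeta(2s)=\int_0^\infty y^{s-1}\sum_{n=1}^{\infty}e^{-\pi n^2y}\,dy=\int_0^\infty\frac{\theta(y)-1}{2}y^{s-1}\,dy.$$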

It seems we can get our result really quick by studying $\frac{1}{2}(\theta(s)-1)$. However we see $\theta(x)$ goes to $\frac{1}{\sqrt{x}}$ rapidly as $x \to 0$, and goes to $1$ rapidly as $x \to \infty$. Convergence has to be taken care of. Therefore we add error correction terms. For this reason, we study the function

We use $s/2$ in place of $s$ because we do not want $\zeta$ to be evaluated at $2s$ all the time.

The partition $(0,1) \cup (1,\infty)$ immediately inspires one to use the change-of-variable $y=\frac{1}{x}$. As a result,

Now we are ready to compute $\phi(s)$. For the first part,

On the other hand,

Therefore

Therefore

In particular,

Expanding this equation above, we see

This gives

Finally we try to simplify the quotient above. By Legendre’s duplication formula,

By Euler’s reflection formula,

Combining these two equations, we obtain

Proposition 3. The Riemann zeta function $\zeta(s)$ admits an analytic continuation satisfying the functional equation

$$\zeta(s)=2^s\pi^{s-1}\sin\left(\frac{\pi s}{2}\right)\Gamma(1-s)\zeta(1-s).$$

In particular, since we also have

it is immediate that $\zeta(s)$ admits a simple pole at $s=1$ with residue $1$. Another concern is $s=0$. Nevertheless, since we have

there is no pole at $s=0$ (notice that $\phi(s)$ is entire). We now know a little bit more about the analyticity of $\zeta(s)$.

Corollary 2.The Riemann zeta function $\zeta(s)$ has its analytic continuation defined on $\mathbb{C} \setminus \{1\}$, with a simple pole at $s=1$ with residue $1$.

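Now we are safe to compute $\zeta(-1)$:

$$\zeta(-1)=2^{-1}\pi^{-2}\sin\left(-\frac{\pi}{2}\right)\Gamma(2)\zeta(2)=-\frac{1}{2\pi^2}\cdot\frac{\pi^2}{6}=-\frac{1}{12}.$$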
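The value of $\zeta(-1)$ can be double-checked with a few lines of Python. The sketch below assumes the functional equation in the asymmetric form $\zeta(s)=2^s\pi^{s-1}\sin(\frac{\pi s}{2})\Gamma(1-s)\zeta(1-s)$ and feeds it the classical values $\zeta(2)=\pi^2/6$ and $\zeta(4)=\pi^4/90$; the function name is a hypothetical helper for this example only.

```python
import math


def zeta_via_functional_equation(s, zeta_at_one_minus_s):
    """Evaluate zeta(s) from a known value of zeta(1 - s).

    Assumes zeta(s) = 2^s * pi^(s-1) * sin(pi*s/2) * Gamma(1-s) * zeta(1-s).
    """
    return (2 ** s) * (math.pi ** (s - 1)) * math.sin(math.pi * s / 2) \
        * math.gamma(1 - s) * zeta_at_one_minus_s


zeta_2 = math.pi ** 2 / 6     # Basel problem
zeta_4 = math.pi ** 4 / 90

zeta_minus_1 = zeta_via_functional_equation(-1, zeta_2)
zeta_minus_3 = zeta_via_functional_equation(-3, zeta_4)
print(zeta_minus_1, zeta_minus_3)
```

The two printed values are floating-point approximations of $-\frac{1}{12}$ and $\frac{1}{120}$, obtained from the continuation and not from the divergent series.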

But I believe, after these long computations of the analytical continuation, we can be confident enough to say that, when $\Re(s) \le 1$, the Riemann zeta function $\zeta(s)$ can not remotely be immediately explained by its ordinary definition $\sum_{n=1}^{\infty}n^{-s}$. Claiming $1+2+\dots=-\frac{1}{12}$ is a ridiculous abuse of language.

This post ends with Greg Gbur’s criticism of the infamous Numberphile video.

So why is this important? Part of what I’ve tried to show on this blog is that mathematics and physics can be extremely non-intuitive, even bizarre, but that they have their own rules and logic that make perfect sense once you get familiar with them. The original video, in my opinion, acts more like a magic trick than an explanation: it shows a peculiar, non-intuitive result and tries to pass it off as absolute truth without qualification. Making science and math look like incomprehensible magic does not do any favors for the scientists who study it nor for the public who would like to understand it.

Let $K$ be a field (in this post we mostly assume that $K \supset \mathbb{Q}$) and $n$ an integer $>1$ which is not divisible by the characteristic of $K$. Then the polynomial

$$X^n-1$$

is separable because its derivative $nX^{n-1} \ne 0$ has $0$ as its only root, which is not a root of $X^n-1$. Hence in the algebraic closure $\overline{K}$, the polynomial has $n$ distinct roots, which form a group $U$, and this group is cyclic. In fact, as an exercise, one can show that, for a field $k$, any finite subgroup $U$ of the multiplicative group $k^\ast$ is a cyclic group.

A generator $\zeta_n$ of $U$ is called a primitive $n$-th root of unity. Let $K=\mathbb{Q}$ and $L$ be the smallest extension that contains all elements of $U$, then we have $L=\mathbb{Q}(\zeta_n)$. As a matter of fact, $L/K$ is a Galois extension (to be shown later), and the cyclotomic polynomial $\Phi_n(X)$ is the irreducible polynomial of $\zeta_n$ over $\mathbb{Q}$. We first need to find the degree $[L:K]$.

Proposition 1.Notation being above, $L/K$ is Galois, the Galois group $\operatorname{Gal}(L/K) \cong (\mathbb{Z}/n\mathbb{Z})^\ast$ (the group of units in $\mathbb{Z}/n\mathbb{Z}$) and $[L:K]=\varphi(n)$.

Let’s first elaborate the fact that $|(\mathbb{Z}/n\mathbb{Z})^\ast|=\varphi(n)$. Let $[0],[1],\dots,[n-1]$ be representatives of $\mathbb{Z}/n\mathbb{Z}$. An element $[x]$ in $\mathbb{Z}/n\mathbb{Z}$ is a unit if and only if there exists $[y]$ such that $[xy]=[1]$, which is to say, $xy \equiv 1 \mod n$. Notice that $xy \equiv 1 \mod n$ if and only if $xy+mn=1$ for some $m \in \mathbb{Z}$, and such $y,m \in \mathbb{Z}$ exist if and only if $\gcd(x,n)=1$. Therefore $|(\mathbb{Z}/n\mathbb{Z})^\ast|=\varphi(n)$ is proved.
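The equivalence between “$[x]$ has an inverse modulo $n$” and “$\gcd(x,n)=1$” is easy to confirm by brute force for small $n$. The two helper functions below are hypothetical names introduced for this illustration; the first one searches for an inverse directly, the second one uses the gcd criterion.

```python
from math import gcd


def units_by_inverse(n):
    """Residues x mod n for which some y satisfies x*y = 1 (mod n), by exhaustive search."""
    return [x for x in range(n) if any((x * y) % n == 1 for y in range(n))]


def units_by_gcd(n):
    """Residues x mod n that are coprime to n."""
    return [x for x in range(n) if gcd(x, n) == 1]


for n in (2, 9, 12, 30):
    print(n, units_by_inverse(n), len(units_by_gcd(n)))
```

Both lists coincide for every $n>1$ tried here, and their common length is $\varphi(n)$.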

The proof can be produced by two lemmas, the first of which is independent of the characteristic of the field.

Lemma 1.Let $k$ be a field and $n$ be not divisible by the characteristic $p$. Let $\zeta=\zeta_n$ be a primitive $n$-th root of unity in $\overline{k}$, then $(\mathbb{Z}/n\mathbb{Z})^\ast \supset \operatorname{Gal}(k(\zeta)/k)$ and therefore $[k(\zeta):k] \le \varphi(n)$. Besides, $k(\zeta)/k$ is a normal abelian extension.

*Proof.* Let $\sigma$ be an embedding of $k(\zeta)$ in $\overline{k}$ over $k$, then

$$(\sigma\zeta)^n=\sigma(\zeta^n)=1$$

so that $\sigma\zeta$ is also an $n$-th root of unity. Hence $\sigma\zeta=\zeta^i$ for some $i=i(\sigma)$, uniquely determined modulo $n$. It follows that $\sigma$ maps $k(\zeta)$ into itself. This is to say, $k(\zeta)$ is normal over $k$. Let $\tau$ be another automorphism of $k(\zeta)$ over $k$ then

$$\sigma\tau\zeta=\zeta^{i(\sigma)i(\tau)}=\tau\sigma\zeta.$$

It follows that $i(\sigma)$ and $i(\tau)$ are prime to $n$ (otherwise, $\sigma\zeta$ would have a period smaller than $n$, implying that the period of $\zeta$ is smaller than $n$, which is absurd). Therefore the map $\sigma \mapsto i(\sigma)$ embeds $\operatorname{Gal}(k(\zeta)/k)$ into $(\mathbb{Z}/n\mathbb{Z})^\ast$, and the Galois group is abelian because its image is, thus proving our lemma. $\square$

It is easy to find an example with strict inclusion. One only needs to look at $k=\mathbb{R}$ or $\mathbb{C}$.

Lemma 2. Let $\zeta=\zeta_n$ be a primitive $n$-th root of unity and $f$ its irreducible polynomial over $\mathbb{Q}$, then for any prime $p \nmid n$, $\zeta^p$ is also a primitive $n$-th root of unity, and moreover a root of $f$.

*Proof.* Let $f(X)$ be the irreducible polynomial of $\zeta$ over $\mathbb{Q}$, then $f(X)|(X^n-1)$ by definition. As a result we can write $X^n-1=f(X)h(X)$ where $h(X)$ has leading coefficient $1$. By Gauss’s lemma, both $f$ and $h$ have integral coefficients.

Suppose $\zeta^p$ is not a root of $f$. Since $(\zeta^p)^n-1=(\zeta^n)^p-1=0$, it follows that $\zeta^p$ is a root of $h$, and $\zeta$ is a root of $h(X^p)$. As a result, $f(X)$ divides $h(X^p)$ and we write

$$h(X^p)=f(X)g(X).$$

Again by Gauss’s lemma, $g(X)$ has integral coefficients.

Next we reduce these equations in $\mathbf{F}_p=\mathbb{Z}/p\mathbb{Z}$. We firstly have

$$\overline{h}(X^p)=\overline{f}(X)\overline{g}(X).$$

By Fermat’s little theorem $a^p=a$ for all $a \in \mathbf{F}_p$, we also have

$$\overline{h}(X^p)=\overline{h}(X)^p.$$

Therefore

$$\overline{h}(X)^p=\overline{f}(X)\overline{g}(X),$$

which implies that $\overline{f}(X)|\overline{h}(X)^p$. Hence $\overline{f}$ and $\overline{h}$ must have a common factor. As a result, $X^n-\overline{1}=\overline{f}(X)\overline{h}(X)$ has multiple roots in an algebraic closure of $\mathbf{F}_p$, which is impossible because $p \nmid n$. $\square$

Now we are ready for Proposition 1.

*Proof of Proposition 1.* Since $\mathbb{Q}$ is a perfect field, $\mathbb{Q}(\zeta)/\mathbb{Q}$ is automatically separable. This extension is Galois because of lemma 1. By lemma 1, it suffices to show that $[\mathbb{Q}(\zeta):\mathbb{Q}] \ge \varphi(n)$.

Recall from elementary group theory that if $G$ is a finite cyclic group of order $m$ and $x$ is a generator of $G$, then the set of generators consists of the elements of the form $x^\nu$ where $\gcd(\nu,m)=1$. In this occasion, if $\zeta$ generates $U$, then $\zeta^p$ also generates $U$ for any prime $p \nmid n$. It follows that every primitive $n$-th root of unity can be obtained by raising $\zeta$ to a succession of prime numbers that do not divide $n$ (as a result we obtain exactly $\varphi(n)$ such primitive roots). By lemma 2, all these numbers are roots of $f$ in the proof of lemma 2. Therefore $\deg f = [L:K] \ge \varphi(n)$. Hence the proposition is proved. $\square$

We will show that $f$ in the proof of lemma 2 is actually the cyclotomic polynomial $\Phi_n(X)$ we are looking for. The following procedure works for all fields where the characteristic does not divide $n$, but we assume the characteristic to be $0$ for simplicity.

We have

$$X^n-1=\prod_{\zeta}(X-\zeta)$$

where the product is taken over all $n$-th roots of unity. Collecting all roots with the same period $d$ (i.e., those $\zeta$ such that $\zeta^d=1$ and no smaller positive power of $\zeta$ equals $1$), we put

$$\Phi_d(X)=\prod_{\text{period }\zeta=d}(X-\zeta).$$

Then

$$X^n-1=\prod_{d|n}\Phi_d(X).$$

It follows that $\Phi_1(X)=X-1$ and

$$\Phi_n(X)=\frac{X^n-1}{\prod_{d|n,\,d<n}\Phi_d(X)}.$$

This presentation makes our computation much easier. But to understand $\Phi_n$, we still should keep in mind that the $n$-th cyclotomic polynomial is defined to be

$$\Phi_n(X)=\prod_{\text{period }\zeta=n}(X-\zeta),$$

whose roots are all primitive $n$-th roots of unity. As stated in the proof of proposition 1, there are $\varphi(n)$ primitive $n$-th roots of unity, and therefore $\deg\Phi_n(X)=\varphi(n)$. Besides, $f|\Phi_n$. Since both are monic and have the same degree, these two polynomials are equal. It also follows that $\sum_{d|n}\varphi(d)=n$.
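The recursive description of $\Phi_n$ as the quotient of $X^n-1$ by the product of $\Phi_d(X)$ over the proper divisors $d$ of $n$ translates directly into code. The following sketch is one possible way to do it with integer coefficient lists (lowest degree first); all function names are hypothetical, and exact division works because every divisor involved is monic.

```python
from math import gcd

_cache = {}


def poly_mul(p, q):
    """Multiply two integer polynomials (coefficient lists, lowest degree first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_divexact(num, den):
    """Exact quotient num / den for a monic divisor den of num."""
    num = num[:]
    quotient = [0] * (len(num) - len(den) + 1)
    for k in range(len(quotient) - 1, -1, -1):
        coeff = num[k + len(den) - 1]       # den is monic, so this is the next quotient term
        quotient[k] = coeff
        for j, b in enumerate(den):
            num[k + j] -= coeff * b
    assert not any(num), "division was not exact"
    return quotient


def cyclotomic(n):
    """Phi_n as (X^n - 1) divided by the product of Phi_d over proper divisors d of n."""
    if n not in _cache:
        numerator = [-1] + [0] * (n - 1) + [1]      # X^n - 1
        denominator = [1]
        for d in range(1, n):
            if n % d == 0:
                denominator = poly_mul(denominator, cyclotomic(d))
        _cache[n] = poly_divexact(numerator, denominator)
    return _cache[n]


def euler_phi(n):
    """Euler's totient function by direct counting."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


print(cyclotomic(6))     # coefficients of X^2 - X + 1, lowest degree first
print(cyclotomic(12))    # coefficients of X^4 - X^2 + 1
```

One can use this to confirm $\deg\Phi_n=\varphi(n)$ and $\sum_{d|n}\varphi(d)=n$ for small $n$, or to experiment with the identities in the problems that follow.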

Proposition 2.The cyclotomic polynomial is irreducible and is the irreducible polynomial of $\zeta$ over $\mathbb{Q}$, where $\zeta$ is a primitive $n$-th root of unity.

We end this section by a problem in number fields, making use of what we have studied above.

Problem 0.A number field $F$ only contains finitely many roots of unity.

*Solution.* Let $\zeta \in F$ be a root of unity with period $n$. Then $\Phi_n(\zeta)=0$ and therefore $[\mathbb{Q}(\zeta):\mathbb{Q}]=\varphi(n)$. Since $\mathbb{Q}(\zeta)$ is also a subfield of $F$, we also have $\varphi(n) \le [F:\mathbb{Q}]$. Since $\{n:\varphi(n) \le [F:\mathbb{Q}]\}$ is certainly a finite set, the number of roots of unity lying in $F$ is finite. $\square$

We will do some dirty computation in this section.

Problem 1.If $p$ is prime, then $\Phi_p(X)=X^{p-1}+X^{p-2}+\dots+1$, and for an integer $\nu \ge 1$, $\Phi_{p^\nu}(X)=\Phi_p(X^{p^{\nu-1}})$.

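*Solution.* The only proper divisor of $p$ is $1$ and we can only have

$$\Phi_p(X)=\frac{X^p-1}{\Phi_1(X)}=\frac{X^p-1}{X-1}=X^{p-1}+X^{p-2}+\dots+1.$$

For the second statement, we use induction on $\nu$. When $\nu=1$ we have nothing to prove. Suppose now

$$\Phi_{p^\nu}(X)=\Phi_p(X^{p^{\nu-1}})$$

is proved, then $X^{p^\nu}-1=\prod_{r=0}^{\nu}\Phi_{p^r}(X)$ and therefore

$$\Phi_{p^{\nu+1}}(X)=\frac{X^{p^{\nu+1}}-1}{\prod_{r=0}^{\nu}\Phi_{p^r}(X)}=\frac{X^{p^{\nu+1}}-1}{X^{p^\nu}-1}=\Phi_p(X^{p^\nu}).$$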

Problem 2. Let $p$ be a prime number. If $p \nmid n$, then

$$\Phi_{pn}(X)=\frac{\Phi_n(X^p)}{\Phi_n(X)}.$$

*Solution.* Assume $p \nmid n$ first. It holds clearly for $n=1$. Suppose now the statement holds for all integers $<n$ that are prime to $p$. We see

Problem 3.If $n$ is an odd number $>1$, then $\Phi_{2n}(X)=\Phi_n(-X)$.

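*Solution.* By problem 2, $\Phi_{2n}(X)=\Phi_n(X^2)/\Phi_n(X)$. To show the identity it suffices to show that

$$\Phi_n(X^2)=\Phi_n(X)\Phi_n(-X).$$

For $n=3$ we see

$$\Phi_3(X^2)=X^4+X^2+1=(X^2+X+1)(X^2-X+1)=\Phi_3(X)\Phi_3(-X).$$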

Now suppose it holds for all odd numbers $3 \le d < n$, then

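The following problem would not be very easy without the Möbius inversion formula so we will use it anyway. Problems above can also be deduced from this formula. Let $f:\mathbb{Z}_{\ge 0} \to \mathbb{Z}_{\ge 0}$ be a function and $F(n)=\prod_{d|n}f(d)$, then the Möbius inversion formula states that

$$f(n)=\prod_{d|n}F(n/d)^{\mu(d)}$$

with

$$\mu(n)=\begin{cases}0 & \text{if } n \text{ is divisible by } p^2 \text{ for some prime } p,\\ (-1)^r & \text{if } n \text{ is a product of } r \ge 0 \text{ distinct primes}.\end{cases}$$

Putting $f(d)=\Phi_d(X)$, we see

$$\Phi_n(X)=\prod_{d|n}(X^{n/d}-1)^{\mu(d)}.$$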

Now we proceed.

Problem 4.If $p|n$, then $\Phi_{pn}(X)=\Phi_n(X^p)$.

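*Solution.* By the Möbius inversion formula, we see

$$\Phi_{pn}(X)=\prod_{d|pn}(X^{pn/d}-1)^{\mu(d)}=\prod_{d|n}\left((X^p)^{n/d}-1\right)^{\mu(d)}=\Phi_n(X^p)$$

because every $d$ that divides $np$ but not $n$ must be divisible by $p^2$, so that $\mu(d)=0$. Problem 2 can also follow from here.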

Problem 5.Let $n=p_1^{r_1}\dots p_s^{r_s}$, then

*Solution.* This problem can be solved by induction on the number of primes. For $s=1$ it is problem 1. Suppose it has been proved for $s-1$ primes, then for

and a prime $p_s$, we have

On the other hand,

if we put $Y=X^{p_1^{r_1-1}\dots p_{s-1}^{r_{s-1}-1}}$. When it comes to higher degree of $p_s$, it’s merely problem 2. Therefore we have shown what we want.

Let $\zeta$ be a primitive $n$-th root of unity, put $K=\mathbb{Q}(\zeta)$ and $G$ the Galois group. We will compute the norm of $1-\zeta$ with respect to the extension $K/\mathbb{Q}$. Since this extension is separable, we have

$$N_\mathbb{Q}^K(1-\zeta)=\prod_{\sigma \in G}(1-\sigma\zeta).$$

Since $G$ acts on the set of primitive roots transitively, $\{\sigma\zeta\}_{\sigma \in G}$ is exactly the set of primitive roots of unity, which are roots of $\Phi_n(X)$. It follows that

$$N_\mathbb{Q}^K(1-\zeta)=\Phi_n(1).$$

If $n=p^r$, then $N_\mathbb{Q}^K(1-\zeta)=\Phi_p(1^{p^{r-1}})=\Phi_p(1)=p$. On the other hand, if $n$ has at least two distinct prime factors, say

$$n=p_1^{r_1}\cdots p_s^{r_s}, \quad s \ge 2,$$

then

$$N_\mathbb{Q}^K(1-\zeta)=\Phi_n(1)=1.$$


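Definition. For a polynomial

$$f(x_1,\dots,x_n)=\sum_{j_1,\dots,j_n}a_{j_1\dots j_n}x_1^{j_1}\cdots x_n^{j_n}$$

with coefficients in a number field $K$, the **height** of $f$ is defined to be

$$h(f)=\sum_{v \in M_K}\log|f|_v$$

where

$$|f|_v=\max_{j_1,\dots,j_n}|a_{j_1\dots j_n}|_v$$

is the **Gauss norm** for any place $v$.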

Here, $M_K$ refers to the canonical set of non-equivalent places on $K$. See first four pages of this document for a reference.

As one can expect, this can tell us about some complexity of a polynomial, just like how the height of an algebraic number tells us its complexity. Let us compute some examples.

Let us consider the simplest one

$$f(x)=x^2-1$$

first. Since $|x^2-1|_v=1$ for all places $v$, the height of $f$ is a sum of $0$, which is still $0$.
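For polynomials over $\mathbb{Q}$ this bookkeeping can be automated: take the usual absolute value at the Archimedean place and the $p$-adic absolute value $|x|_p=p^{-v_p(x)}$ at each prime $p$ that divides some numerator or denominator. The sketch below does exactly that. The function names are hypothetical, and the sample polynomial $\frac{1}{4}x^2+\frac{2}{3}x+2$ is an illustrative choice made here, not one taken from the text.

```python
from fractions import Fraction
from math import log


def prime_factors(m):
    """Set of primes dividing the nonzero integer m (trial division)."""
    m = abs(m)
    factors = set()
    p = 2
    while p * p <= m:
        while m % p == 0:
            factors.add(p)
            m //= p
        p += 1
    if m > 1:
        factors.add(m)
    return factors


def p_adic_abs(x, p):
    """|x|_p for a nonzero rational x, normalised so that |p|_p = 1/p."""
    num, den, valuation = x.numerator, x.denominator, 0
    while num % p == 0:
        num //= p
        valuation += 1
    while den % p == 0:
        den //= p
        valuation -= 1
    return Fraction(p) ** (-valuation)


def height(coeffs):
    """Height of a nonzero polynomial over Q: sum over all places of log of the Gauss norm."""
    coeffs = [Fraction(c) for c in coeffs if c != 0]
    primes = set()
    for c in coeffs:
        primes |= prime_factors(c.numerator) | prime_factors(c.denominator)
    total = log(max(abs(c) for c in coeffs))                  # Archimedean place
    for p in primes:                                          # all other primes contribute log 1 = 0
        total += log(max(p_adic_abs(c, p) for c in coeffs))
    return total


print(height([-1, 0, 1]))                              # x^2 - 1
print(height([2, Fraction(2, 3), Fraction(1, 4)]))     # (1/4)x^2 + (2/3)x + 2
```

For the sample polynomial the Gauss norms are $2$ at infinity, $4$ at the prime $2$ and $3$ at the prime $3$, so its height is $\log 24$.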

Next, we take care of a polynomial that involves prime numbers

We see $|g(x)|_\infty=2$, $|g(x)|_2=2^{-(-2)}=4$, $|g(x)|_3=3^{-(-1)}=3$, and the Gauss norm is $1$ for all other primes. Therefore

$$h(g)=\log 2+\log 4+\log 3=\log 24.$$

Put $u(x,y)=\sqrt{2}x^2 + 3\sqrt{2}xy+5y^2+7 \in \mathbb{Q}(\sqrt{2})[x,y]$, we can compute its height carefully. Notice that $|\sqrt{2}|_v=\sqrt{|2|_v}$ for all places $v$ and we therefore have

If $f \in K[s_1,\dots,s_n]$ and $g \in K[t_1,\dots,t_m]$ are two polynomials in different variables, then as a polynomial in $K[s_1,\dots,s_n;t_1,\dots,t_m]$, $fg$ has height $h(f)+h(g)$. This is immediately realised once we notice that the height of a polynomial is equal to the height of the vector of coefficients in appropriate projective space. The identity $h(fg)=h(f)+h(g)$ follows from the Segre embedding.

But if variables coincide, things get different. For example, $h(x+1)=0$ but $h((x+1)^2)=h(x^2+2x+1)=\log 2$. This is because we do not have $|fg|_\infty=|f|_\infty|g|_\infty$. Nevertheless, for non-Archimedean places, things are easier.

Gauss’s lemma.If $v$ is not Archimedean, then $|fg|_v=|f|_v|g|_v$.

*Proof.* First of all, it suffices to prove it for the univariate case. If $f$ and $g$ have multiple variables $x_1,\dots,x_n$, let $d$ be an integer greater than the degree of $fg$. Then the Kronecker substitution

$$x_j \mapsto t^{d^{j-1}}, \quad j=1,\dots,n$$

reduces our study into $K[t]$. This is because, with such a $d$, this substitution gives a univariable polynomial with the same set of coefficients.

Therefore we only need to show that $|f(t)g(t)|_v=|f(t)|_v|g(t)|_v$. Without loss of generality we assume that $|f(t)|_v=|g(t)|_v=1$. Write $f(t)=\sum a_k t^k$ and $g(t)=\sum b_k t^k$, we have $f(t)g(t)=\sum c_jt^j$ where $c_j=\sum_{j=k+l}a_kb_l$.

We suppose that $|fg|_v<1$, i.e., $|c_j|_v<1$ for all $j$, and see what contradiction we will get. If $|a_j|=1$ for all $j$, then $|c_j|_v<1$ implies that $|b_k|_v<1$ for all $k$ and therefore $|g|_v<1$, a contradiction. Therefore we may assume that, without loss of generality, $|a_0|_v<1$ but $|a_1|_v=1$. Then, since

we have $|a_1b_{j-1}|_v=|b_{j-1}|_v<1$ for all $j \ge 1$. It follows that $|g(t)|_v<1$, still a contradiction. $\square$

So much for non-Archimedean case. For Archimedean case things are more complicated so we do not have enough space to cover that. Nevertheless, we have

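Gelfond’s lemma. Let $f_1,\dots,f_m$ be complex polynomials in $n$ variables and set $f=f_1\cdots f_m$, then

$$2^{-d}\prod_{j=1}^{m}\ell_\infty(f_j) \le \ell_\infty(f) \le 2^{d}\prod_{j=1}^{m}\ell_\infty(f_j)$$

where $d$ is the sum of the partial degrees of $f$, and $\ell_\infty(f)=\max_j|a_j|=|f|_\infty$.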

Combining Gelfond’s lemma and Gauss’s lemma, we obtain

The Mahler measure was not actually introduced by Mahler initially. It was named after Mahler because he successfully extended it to multivariable cases in an elegant way. We will cover the original motivation anyway.

Say we want to find prime numbers large enough. Pierce came up with an idea. Consider a monic polynomial $p(x) \in \mathbb{Z}[x]$, which is factored into

$$p(x)=\prod_{i=1}^{d}(x-\alpha_i).$$

Consider $\Delta_n=\prod_i(\alpha^n_i-1)$. Then by some Galois theory, this is indeed an integer. So perhaps we may find some interesting integers in the factors of $\Delta_n$. Also, we expect it to grow slowly. Lehmer studied $\frac{\Delta_{n+1}}{\Delta_n}$ and observed that

$$\lim_{n \to \infty}\frac{|\alpha^{n+1}-1|}{|\alpha^n-1|}=\begin{cases}|\alpha| & \text{if } |\alpha|>1,\\ 1 & \text{if } |\alpha|<1.\end{cases}$$

So it makes sense to compare all roots of $p(x)$ with $1$. He therefore suggested the following function related to $p(x)$:

$$M(p)=\prod_{i=1}^{d}\max\{1,|\alpha_i|\}.$$

This number appears if we consider $\lim_{n \to \infty}\Delta_{n+1}/\Delta_n$.
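The integers $\Delta_n$ can be computed exactly without finding any root: if $C$ is the companion matrix of a monic $p(x)$, then the eigenvalues of $C^n-I$ are $\alpha_i^n-1$, so $\Delta_n=\det(C^n-I)$. The sketch below does this for $p(x)=x^3-x-1$; the function names are hypothetical, and the determinant identity is a standard linear algebra fact brought in for this illustration, not the method described above.

```python
def mat_mul(a, b):
    """Product of two square integer matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_pow(m, e):
    """m raised to the non-negative integer power e by repeated squaring."""
    n = len(m)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    while e > 0:
        if e & 1:
            result = mat_mul(result, m)
        m = mat_mul(m, m)
        e >>= 1
    return result


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


# Companion matrix of x^3 - x - 1: its characteristic polynomial is x^3 - x - 1.
COMPANION = [[0, 0, 1],
             [1, 0, 1],
             [0, 1, 0]]


def delta(n):
    """Delta_n = prod_i (alpha_i^n - 1), computed as det(C^n - I)."""
    power = mat_pow(COMPANION, n)
    shifted = [[power[i][j] - int(i == j) for j in range(3)] for i in range(3)]
    return det3(shifted)


print([delta(n) for n in range(1, 11)])
print(delta(101) / delta(100))
```

The ratio $\Delta_{101}/\Delta_{100}$ is already very close to the real root $1.3247\ldots$ of $x^3-x-1$, which is the only root of modulus greater than $1$ and hence equals $M(x^3-x-1)$.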

He also asked the following question, which is now understood as **Lehmer conjecture**, although in his paper he addressed it as a problem instead of a conjecture:

Is there a constant $c$ such that, $M(p)>1 \implies M(p)>c$?

It remains open but we can mention some key bounds.

- Lehmer himself found that

$$M(x^{10}+x^9-x^7-x^6-x^5-x^4-x^3+x+1) \approx 1.17628$$

and actually this is the smallest value greater than $1$ that has ever been discovered. It was because of this discovery that he gave his *problem*.

This polynomial has also led to the discovery of a large prime number $\sqrt{\Delta_{379}}=1{,}794{,}327{,}140{,}357$, although by studying $x^3-x-1$, we have found a bigger prime number $\Delta_{127}=3{,}233{,}514{,}251{,}032{,}733$.

- Breusch (and later Smyth) discovered that if $p$ is monic, irreducible and nonreciprocal, i.e. it does not satisfy $p(x)=\pm x^{\deg p}p(1/x)$, then

- E. Dobrowolski found that if $p(x)$ is monic, irreducible and noncyclotomic, and

has degree $d$ then

for some $c>0$.

Definition. For $f \in \mathbb{C}[x_1,\dots,x_n]$, the **Mahler measure** is defined to be

$$M(f)=\exp\left(\int_{\mathbb{T}^n}\log|f(e^{i\theta_1},\dots,e^{i\theta_n})|\,d\mu_1\cdots d\mu_n\right)$$

where $d\mu_i=\frac{1}{2\pi}d\theta_i$, i.e., $d\mu_1\dots d\mu_n$ corresponds to the (completion of) Haar measure on $\mathbb{T}^n$ with total measure $1$.

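We see through Jensen’s formula that when $n=1$ this coincides with what we have defined before. Observe first that $M(fg)=M(f)M(g)$. Consider $f(t)=a\prod_{i=1}^{d}(t-\alpha_i)$, then

$$M(f)=|a|\prod_{i=1}^{d}M(t-\alpha_i).$$

On the other hand, as an exercise in complex analysis, one can show that

$$M(t-\alpha)=\exp\left(\frac{1}{2\pi}\int_0^{2\pi}\log|e^{i\theta}-\alpha|\,d\theta\right)=\max\{1,|\alpha|\}.$$

Combining them, we see

$$M(f)=|a|\prod_{i=1}^{d}\max\{1,|\alpha_i|\}.$$

Taking the logarithm we also obtain **Jensen’s formula**

$$\frac{1}{2\pi}\int_0^{2\pi}\log|f(e^{i\theta})|\,d\theta=\log|a|+\sum_{i=1}^{d}\log\max\{1,|\alpha_i|\}.$$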

We first give a reasonable and useful estimation of $M(f)$, which will be used to prove Northcott’s theorem.

Definition. For $f(t)=a_dt^d+\dots+a_0$, the $\ell_p$-norm of $f$ is naturally defined to be

$$\ell_p(f)=\left(\sum_{j=0}^{d}|a_j|^p\right)^{1/p}.$$

For $p=\infty$, we have $\ell_\infty(f)=\max_j|a_j|$.

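Lemma 1. Notation being above, $M(f) \le \ell_1(f)$ and

$$\binom{d}{\lfloor d/2 \rfloor}^{-1}\ell_\infty(f) \le M(f) \le \ell_2(f) \le \sqrt{d+1}\,\ell_\infty(f).$$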

*Proof.* To begin with, we observe those obvious ones. First of all,

Therefore

Next, by Jensen’s inequality

However, by Parseval’s formula, the last term equals

For the remaining inequality, we use Vieta’s formula

and therefore

for all $0 \le r \le d$. Replacing $|a_{d-r}|$ with $\ell_\infty(f)$, we have finished the proof. $\square$

Before proving Northcott’s theorem, we show the connection between Mahler measure and heights.

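Proposition 1. Let $\alpha \in \overline{\mathbb{Q}}$ and let $f$ be the minimal polynomial of $\alpha$ over $\mathbb{Z}$. Then

$$\log M(f)=\deg(\alpha)h(\alpha)$$

and

$$\log|N_{\mathbb{Q}(\alpha)/\mathbb{Q}}(\alpha)| \le \deg(\alpha)h(\alpha).$$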

*Proof.* Put $d=\deg(\alpha)$ and write

Choose a number field $K$ that contains $\alpha$ and is a Galois extension of $\mathbb{Q}$, with Galois group $G$. Then $(\sigma\alpha:\sigma \in G)$ contains every conjugate of $\alpha$ exactly $[K:\mathbb{Q}]/d$ times. Since $a_0,\dots,a_d$ are coprime, for any non-Archimedean absolute value $v \in M_K$, we must have $\max_i|a_i|_v=|f|_v=1$. Combining with Gauss’s lemma and Galois theory, we see

Now we are ready to compute the height of $\alpha$ to rediscover the Mahler’s measure. Notice that

We therefore obtain

The last term corresponds to what we have computed above about non-Archimedean absolute values so we break it down a little bit:

for some $u \mid \infty$, according to the product formula. On the other hand, for $v \mid \infty$,

All in all,

The second assertion follows immediately because

The set of non-zero algebraic integers of height $0$ lies on the unit circle, and they are actually roots of unity, by Kronecker’s theorem. However keep in mind that algebraic integers on the unit circle are not necessarily roots of unity. See this short paper.

When it comes to algebraic integers of small heights, things may get complicated, but Northcott’s theorem assures that we will be studying a finite set.

Northcott’s Theorem.Given an integer $N>0$ and a real number $H \ge1$, there are only a finite number of algebraic integers $\alpha$ satisfying $\deg(\alpha) \le N$ and $h(\alpha) \le \log H$.

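*Proof.* Let $\alpha$ be an algebraic integer of degree $d \le N$ and height $h(\alpha) \le \log H$. Suppose $f(t)=a_dt^d+\dots+a_0 \in \mathbb{Z}[t]$ is the minimal polynomial of $\alpha$. Then lemma 1 shows us that

$$\ell_\infty(f) \le \binom{d}{\lfloor d/2 \rfloor}M(f) \le 2^dM(f).$$

On the other hand, by proposition 1,

$$\log M(f)=dh(\alpha) \le d\log H,$$

so we have actually

$$\ell_\infty(f) \le (2H)^d.$$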

This gives rise to no more than $(2\lfloor (2H)^d \rfloor+1)^{d+1}$ distinct polynomials $f$, which produces at most $d(2\lfloor (2H)^d \rfloor+1)^{d+1}<\infty$ algebraic integers. Ranging through all $d \le N$ we get what we want. $\square$

We also have the **Northcott property**, where we do not care about degrees. A set $L$ of algebraic integers is said to satisfy the Northcott property if, for every $T>0$, the set

$$\{\alpha \in L:h(\alpha)<T\}$$

is finite. Such a set $L$ is said to satisfy the **Bogomolov property** if there exists $T>0$ such that the set

$$\{\alpha \in L:0<h(\alpha)<T\}$$

is empty. As a matter of elementary topology, the Northcott property implies the Bogomolov property. It would be quite interesting if $L$ is a field. This paper can be quite interesting.

- Enrico Bombieri, Walter Gubler, *Heights in Diophantine Geometry*.
- Michel Waldschmidt, *Diophantine Approximation on Linear Algebraic Groups, Transcendence Properties of the Exponential Function in Several Variables*.
- Chris Smyth, *The Mahler Measure of Algebraic Numbers: A Survey*.

Let $F$ be a non-Archimedean local field, meaning that $F$ is complete under the metric induced by a non-Archimedean absolute value $|\cdot|$. Consider the ring of integers

$$\mathfrak{o}_F=\{\alpha \in F:|\alpha| \le 1\}$$

and its unique prime (hence maximal) ideal

$$\mathfrak{p}=\{\alpha \in F:|\alpha|<1\}.$$

The residue field $k=\mathfrak{o}_F/\mathfrak{p}$ is finite because it is compact and discrete. For compactness notice that $\mathfrak{o}_F$ is compact, and the canonical projection $\mathfrak{o}_F \to k$ is continuous and open. For discreteness, notice that $\mathfrak{p}$ is open and contains $0$, so every coset $a+\mathfrak{p}$ is an open subset of $\mathfrak{o}_F$.

Let $f \in \mathfrak{o}_F[x]$ be a polynomial. Hensel’s lemma states that, if $\overline{f} \in k[x]$, the reduction of $f$, has a simple root $a$ in $k$, then the root can be lifted to a root of $f$ in $\mathfrak{o}_F$ and hence $F$. This blog post is intended to offer a well-organised proof of this lemma.

To do this, we need to use Newton’s method of approximating roots of $f(x)=0$, something like

$$a_n=a_{n-1}-\frac{f(a_{n-1})}{f'(a_{n-1})}.$$

From calculus we know that $a_n \to \zeta$ where $f(\zeta)=0$, at a speed of $A^{2^n}$ for some constant $A$ (do exercise 5.25 of Walter Rudin’s *Principles of Mathematical Analysis* if you are not familiar with it; I heartily recommend it). Now we will borrow Newton’s method for number theory to find roots in a non-Archimedean field, which is violently different from $\mathbb{R}$, the playground of elementary calculus.

We will also use induction, in a form that I would like to call “double induction”. Instead of claiming that $P(n)$ is true for all $n$, we claim that $P(n)$ and $Q(n)$ are true for all $n$. When proving $P(n+1)$, we may use $Q(n)$, and vice versa.

This method is inspired by this lecture note, where actually a “quadra induction” is used, and everything is proved altogether. Nevertheless, I would like to argue that the quadra induction is too dense to expose the motivation and intuition of this proof. Therefore, we reduce the induction to two statements and derive the rest with a little more reasoning.

Hensel’s Lemma. Let $F$ be a non-Archimedean local field with ring of integers $\mathfrak{o}_F=\{\alpha \in F:|\alpha| \le 1\}$ and prime ideal $\mathfrak{p}=\{\alpha \in F:|\alpha|<1\}$. Let $f \in \mathfrak{o}_F[x]$ be a polynomial whose reduction $\overline{f} \in k[x]$ has a simple root $a \in k$, then $a$ can be lifted to $\alpha \equiv a \mod \mathfrak{p}$, such that $f(\alpha)=0$.

By simple root we mean $\overline{f}(a)=0$ but $\overline{f}’(a) \ne 0$. Before we prove this lemma, we see some examples.

Put $F=\mathbb{Q}_7$. Then $\mathfrak{o}_F=\mathbb{Z}_7$, $\mathfrak{p}=7\mathbb{Z}_7$ and $k=\mathbb{F}_7$. We show that square roots of $2$ are in $F$. Put $f(x)=x^2-2$. Since $\overline{f}(x)=x^2-2=(x-3)(x+3) \in k[x]$, we therefore have two simple roots of $\overline{f}$, namely $3$ and $-3$. Lifting to $\mathfrak{o}_F$, we have two roots $\alpha_1 \equiv 3 \mod 7\mathbb{Z}_7$ and $\alpha_2 \equiv -3 \mod 7\mathbb{Z}_7$, of $f$. For $\alpha_1$, we have

Hence we can put $\alpha_1=\sqrt{2}=3+7+2\cdot 7^2+6\cdot 7^3+\cdots\in\mathbb{Z}_7 \subset \mathbb{Q}_7$. Likewise $\alpha_2$ can be understood as $-\sqrt{2}$. This expansion is totally different from our understanding in $\mathbb{Q}$ or $\mathbb{R}$.
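The first few $7$-adic digits of $\sqrt{2}$ can be reproduced by running Newton’s method modulo a power of $7$. The following is a minimal Python sketch under that assumption; the helper names `hensel_lift` and `p_adic_digits` are hypothetical and chosen only for illustration.

```python
def hensel_lift(f, df, a0, p, k):
    """Lift a simple root a0 of f modulo p to a root modulo p**k via Newton's method."""
    modulus = p ** k
    a = a0 % modulus
    precision = 1  # a is currently a root modulo p**precision
    while precision < k:
        # df(a) is a unit modulo p, hence invertible modulo p**k (Python 3.8+).
        a = (a - f(a) * pow(df(a), -1, modulus)) % modulus
        precision *= 2  # quadratic convergence doubles the precision
    return a


def p_adic_digits(n, p, k):
    """Return the first k base-p digits of n, lowest digit first."""
    digits = []
    for _ in range(k):
        n, d = divmod(n, p)
        digits.append(d)
    return digits


root = hensel_lift(lambda x: x * x - 2, lambda x: 2 * x, 3, 7, 4)
print(root, p_adic_digits(root, 7, 4))  # 2166 [3, 1, 2, 6]
```

The digits $3,1,2,6$ agree with the expansion $3+7+2\cdot 7^2+6\cdot 7^3+\cdots$, and $2166^2 \equiv 2 \mod 7^4$.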

Since $k$ is a finite field, we see $k^\times$ is a cyclic group of order $q-1$ where $q=p^n=|k|$ for some prime $p$. It follows that $x^{q-1}=1$ for all $x \in k^\times$. Therefore $f(x)=x^{q-1}-1$ has $q-1$ distinct roots in $k$. By Hensel’s lemma, $F$ contains all $(q-1)$st roots of unity. It does not matter whether $F$ is isomorphic to $\mathbb{Q}_p$ or $\mathbb{F}_q((t))$.

Pick any $a_0 \in \mathfrak{o}_F$ that is a lift of $a\mod\mathfrak{p}$. Define

$$a_n=a_{n-1}-\frac{f(a_{n-1})}{f'(a_{n-1})},\quad n \ge 1,$$

then we claim that $a_n$ converges to the root we are looking for.

First of all, we need to show that $a_n \in \mathfrak{o}_F$, i.e., $|a_n| \le 1$ for all $n$. It suffices to show that $|f(a_{n-1})/f’(a_{n-1})| \le 1$. We firstly observe the case when $n=1$.

Since $\overline{f}(a)=0$ but $\overline{f}’(a) \ne 0$, we have $f(a_0) \in \mathfrak{p}$ but $f’(a_0)\not\in\mathfrak{p}$. As a result, $|f(a_0)|<1$ but $|f’(a_0)|=1$. As a result, $|f(a_0)/f’(a_0)|<1$, which implies that $f(a_0)/f’(a_0) \in \mathfrak{o}_F$ and therefore $a_1 \in \mathfrak{o}_F$.

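By Taylor’s theorem,

$$f'(a_n)=f'(a_{n-1})+f''(a_{n-1})(a_n-a_{n-1})+g_n(a_n)(a_n-a_{n-1})^2$$

for some $g_n \in \mathfrak{o}_F[x]$. When $n=1$, we see $g_1(a_1) \in \mathfrak{o}_F$ and as a result $|g_1(a_1)| \le 1$. Therefore

$$|f'(a_1)-f'(a_0)| \le |a_1-a_0|=\left|\frac{f(a_0)}{f'(a_0)}\right|<1=|f'(a_0)|,$$

which forces $|f'(a_1)|=1$. Since $a_1 \in \mathfrak{o}_F$, we also see that $f(a_1) \in \mathfrak{o}_F$ hence its absolute value is not greater than $1$. As a result $|f(a_1)/f'(a_1)| \le 1$, which implies that $a_2 \in \mathfrak{o}_F$.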

This inspires us to claim the following *two* statements:

(a) $|f(a_n)| < 1$ for all $n \ge 0$.

(b) $|f’(a_n)|=|f’(a_0)|=1$ for all $n \ge 0$.

We have verified (a) and (b) for $n=0$ and $n=1$. Now assume that (a) and (b) are true for $n-1$, then, for $n$, we will verify as follows.

First of all, by (a) and (b) for $n-1$, we see $a_n \in \mathfrak{o}_F$.

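Consider the Taylor expansion

$$f(a_n)=f(a_{n-1})+f'(a_{n-1})(a_n-a_{n-1})+h_n(a_n)(a_n-a_{n-1})^2=h_n(a_n)(a_n-a_{n-1})^2$$

where $h_n \in \mathfrak{o}_F[x]$; the first two terms cancel by the definition of $a_n$. It follows that $|h_n(a_n)| \le 1$. Since $|f'(a_{n-1})|=1$ by (b), we actually have

$$|f(a_n)| \le |a_n-a_{n-1}|^2=\frac{|f(a_{n-1})|^2}{|f'(a_{n-1})|^2}=|f(a_{n-1})|^2<1,$$

which proves (a) for $n$.

To prove (b) for $n$, we consider the Taylor expansion

$$f'(a_n)=f'(a_{n-1})+f''(a_{n-1})(a_n-a_{n-1})+g_n(a_n)(a_n-a_{n-1})^2.$$

Notice that since $a_n \in \mathfrak{o}_F$, we have $f''(a_{n-1}),g_n(a_n) \in \mathfrak{o}_F$. By (a) and (b) for $n-1$, we see

$$\left|f''(a_{n-1})(a_n-a_{n-1})+g_n(a_n)(a_n-a_{n-1})^2\right| \le |a_n-a_{n-1}|=|f(a_{n-1})|<1=|f'(a_{n-1})|.$$

Hence

$$|f'(a_n)|=|f'(a_{n-1})|=1,$$

bearing in mind that for a non-Archimedean absolute value, $|x+y|=\max\{|x|,|y|\}$ whenever $|x| \ne |y|$. Through this process we have also proved (b).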

We need to show that $\{a_n\}$ is a Cauchy sequence. To do this, it suffices to show that $|f(a_n)| \to 0$ sufficiently quick. Recall in the proof of (a) we have shown that $|f(a_n)| \le |f(a_{n-1})|^2$ for all $n$. By applying this relation inductively, we see $|f(a_n)| \le |f(a_0)|^{2^n}$. Since $|f(a_0)|<1$, it follows that $|f(a_n)| \to 0$ as $n \to \infty$.

For any $\varepsilon>0$, there exists $N>0$ such that $|f(a_n)| <\varepsilon$ for all $n \ge N$. As a result, for all $m>n>N$, we have

$$|a_m-a_n| \le \max_{n \le k<m}|a_{k+1}-a_k|=\max_{n \le k<m}|f(a_k)|<\varepsilon.$$

Therefore $\{a_n\}$ is Cauchy. Since $F$ is complete, $a_n$ converges to some $\alpha \in \mathfrak{o}_F \subset F$ such that $f(\alpha)=\lim_{n \to \infty}f(a_n)=0$.

In local fields, congruence is determined by inequality. In fact, we only need to show that $|\alpha-a_0|<1$, which means that $\alpha-a_0 \in \mathfrak{p}$, and therefore $\alpha \equiv a \mod \mathfrak{p}$ as expected. To do this, we show by induction that $|a_n-a_0| \le |f(a_0)|<1$. For $n=1$ we see $|a_1-a_0|=|f(a_0)/f'(a_0)|=|f(a_0)|$.

Suppose $|a_{n-1}-a_0| \le |f(a_0)|$. Since $|a_n-a_{n-1}|=|f(a_{n-1})| \le |f(a_0)|$, we have

$$|a_n-a_0| \le \max\{|a_n-a_{n-1}|,|a_{n-1}-a_0|\} \le |f(a_0)|.$$

Therefore $|\alpha-a_0|=\lim_{n \to \infty}|a_n-a_0| \le |f(a_0)|<1$, from which the result follows. $\square$

In fact we have not explicitly used the fact that $a$ is a simple root. We only used the fact that $|f(a_0)|<1$ but $|f’(a_0)|=1$. Moreover, what really matters here is that $|f(a_n)|$ converges to $0$ quick enough. Therefore $1$ may be replaced by a smaller constant. For this reason we introduce a stronger version of Hensel’s lemma.

Hensel’s lemma, stronger version. Let $F$ be a non-Archimedean local field with ring of integers $\mathfrak{o}_F$ and let $f \in \mathfrak{o}_F[x]$ be a polynomial. Suppose there exists $a \in \mathfrak{o}_F$ such that $|f(a)|<|f'(a)|^2$, then there exists some $b \in \mathfrak{o}_F$ such that $f(b)=0$ and $|b-a|<|f'(a)|$.

Instead of asserting $|f’(a_n)|=1$ for all $n$, we claim that $|f’(a_n)|=|f’(a_0)|$ (as it should be!). Instead of asserting $|f(a_n)|<1$, we claim that $|f(a_n)| \le \lambda^{2^n}|f’(a_0)|$ where $\lambda=|f(a_0)|/|f’(a_0)|^2$. The proof will be nearly the same.

For example, we can find a square root of $257$ in $\mathbb{Z}_2 \subset \mathbb{Q}_2$. The polynomial $f(x)=x^2-257$ is reduced to $\overline{f}(x)=x^2-1=(x-1)^2$ in $\mathbb{F}_2[x]$, where $1$ is not a simple root. Therefore we cannot apply the original version of Hensel’s lemma to this polynomial. Nevertheless, we see $f(1)=-256$ and $f’(1)=2$. Therefore $|f(1)|=\frac{1}{2^8}$ while $|f’(1)|=\frac{1}{2}$. We can apply Newton’s method here to find a square root of $257$ without worrying about repeated roots.
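To make this example concrete, here is a hedged Python sketch of the same Newton iteration carried out modulo $2^k$. The function name `sqrt_257_mod_power_of_two` and the fixed number of steps are assumptions made for illustration. Since $f'(a)=2a$ is not a unit, the sketch divides $f(a)$ by $2$ exactly (it is an even integer because $a$ is odd) and inverts only the odd number $a$.

```python
def sqrt_257_mod_power_of_two(k, steps=8):
    """Approximate a square root of 257 in Z_2 modulo 2**k, starting from a = 1."""
    modulus = 2 ** k
    a = 1
    for _ in range(steps):
        # a is odd, so a*a - 257 is even and the division by 2 is exact.
        half_f = (a * a - 257) // 2
        # Newton step a - f(a) / (2a); only the odd number a has to be inverted.
        a = (a - half_f * pow(a, -1, modulus)) % modulus
    return a


r = sqrt_257_mod_power_of_two(32)
print((r * r - 257) % 2 ** 32)  # 0
```

The result stays congruent to $1$ modulo $4$, in line with the estimate $|b-a|<|f'(a)|=\frac{1}{2}$ in the stronger version of the lemma.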

There are a lot of variants of Hensel’s lemma, for example you can do exercise 10.9 of Atiyah-MacDonald. In fact, we later even have Henselian ring and Henselisation of a ring.

There are some other proofs of Hensel’s lemma in this post, for example, since Newton’s method can also be understood as a contraction mapping, we can also prove it using properties of contraction mapping (see K. Conrad’s note).

The group $GL_2(\mathbb{F}_q)$ consists of invertible $2 \times 2$ matrices with entries in the finite field $\mathbb{F}_q$, where $q=p^n$ for some prime $p$ (throughout we exclude the case when $p=2$ because it can be quite difficult). As an $\mathbb{F}_p$-vector space, $\mathbb{F}_q$ has dimension $n$. The Galois group $G(\mathbb{F}_q/\mathbb{F}_p)$ is cyclic and is generated by the Frobenius map.

The field $\mathbb{F}_q$ itself is already pretty complicated, let alone a matrix group over it. In this post we try to follow Fulton-Harris’ idea on *Representation Theory: A First Course* to classify all irreducible representations of $G=GL_2(\mathbb{F}_q)$. To be specific, we are talking about group homomorphisms $\rho:G \to GL(V)$ where $V$ is a $\mathbb{C}$-vector space.

First of all we determine the cardinality of $G=GL_2(\mathbb{F}_q)$. Along the way, we will introduce some important subgroups.

The cardinality of $G$ is determined by the class formula, applied to the canonical action of $G$ on $\mathbb{P}^1(\mathbb{F}_q)$.

First of all, notice that $|\mathbb{P}^1(\mathbb{F}_q)|=q+1$. There are $q^2$ elements in $\mathbb{F}_q \times \mathbb{F}_q$; excluding zero, we have $q^2-1$ remaining. Since $(r:s)=(ar:as)$ for all $a \in \mathbb{F}_q^\ast$, we divide $q^2-1$ by $|\mathbb{F}_q^\ast|=q-1$ to obtain the cardinality of the projective space.

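The action of $G$ on $\mathbb{P}^1(\mathbb{F}_q)$ is defined canonically as follows:

$$\begin{pmatrix}a & b \\ c & d\end{pmatrix}(r:s)=(ar+bs:cr+ds).$$

Let $B$ be the subgroup of upper triangular matrices,

$$B=\left\{\begin{pmatrix}a & b \\ 0 & d\end{pmatrix}:a,d \in \mathbb{F}_q^\ast,\ b \in \mathbb{F}_q\right\}.$$

In particular, $B$ is the isotropy group of the set $\{(1:0)\}$, because in this case, $(ar+bs:cr+ds)=(a:0)=(1:0)$. There are $(q-1)(q-1)q$ elements of $B$.

Since $G$ clearly acts on $\mathbb{P}^1(\mathbb{F}_q)$ transitively, by the class equation, we have

$$|G|=|B|\cdot|\mathbb{P}^1(\mathbb{F}_q)|=(q-1)^2q(q+1).$$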

In general, the cardinality of $GL_n(\mathbb{F}_q)$ is $\prod_{k=0}^{n-1}(q^n-q^k)$. One can check this document.
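For small prime $q=p$ the count can be checked by brute force. The sketch below is only an illustration restricted to prime fields (so that arithmetic is arithmetic modulo $p$); the function names `gl2` and `borel` are hypothetical, and a matrix is stored as a tuple `(a, b, c, d)`.

```python
from itertools import product


def gl2(p):
    """All invertible 2x2 matrices (a, b, c, d) over F_p, for a prime p."""
    return [m for m in product(range(p), repeat=4)
            if (m[0] * m[3] - m[1] * m[2]) % p != 0]


def borel(p):
    """Upper triangular invertible matrices: the stabiliser of (1:0)."""
    return [m for m in gl2(p) if m[2] == 0]


for p in (3, 5, 7):
    G, B = gl2(p), borel(p)
    # |B| = (q-1)^2 q and |G| = |B| (q+1) = (q^2-1)(q^2-q)
    print(p, len(B), len(G))
```

For $p=3$ this prints $|B|=12$ and $|G|=48$, matching $(q-1)^2q$ and $(q-1)^2q(q+1)$.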

We next consider the diagonal subgroup

$$D=\left\{\begin{pmatrix}a & 0 \\ 0 & d\end{pmatrix}:a,d \in \mathbb{F}_q^\ast\right\}.$$

Let $\mathbb{F}’=\mathbb{F}_{q^2}$ be the extension of $\mathbb{F}_q$ of degree $2$. We can certainly identify $GL_2(\mathbb{F}_q)$ as the group of $\mathbb{F}_q$-linear invertible automorphisms of $\mathbb{F}’$. Each $h \in (\mathbb{F}’)^\ast$ induces a $\mathbb{F}_q$-linear automorphism by multiplication, hence $h$ can be embedded into $GL_2(\mathbb{F}_q)$. The question is, how. Let $K = (\mathbb{F}’)^\ast$ be this subgroup. We write down the matrix representation explicitly.

Let $\varepsilon$ be a generator of the cyclic group $\mathbb{F}_q^\ast$, then $X^2-\varepsilon$ is irreducible in $\mathbb{F}_q[X]$. We therefore have $\mathbb{F}_{q^2} \cong \mathbb{F}_q[X]/(X^2-\varepsilon)$. We see $\{1,X\}$, or more precisely, $\{1,\sqrt\varepsilon\}$, is a basis of $\mathbb{F}_{q^2}$ as a vector space over $\mathbb{F}_q$. We can then identify $(\mathbb{F}')^\ast$ as a subgroup $K$ of $G$ where

$$K=\left\{\begin{pmatrix}x & \varepsilon y \\ y & x\end{pmatrix}:(x,y) \ne (0,0)\right\}.$$

The isomorphism is given by

$$x+y\sqrt\varepsilon \mapsto \begin{pmatrix}x & \varepsilon y \\ y & x\end{pmatrix}.$$

To make $K$ a subgroup of $G$, each entry must be in $\mathbb{F}_q$. That’s why we write $\varepsilon$ instead of $\sqrt\varepsilon$ in the definition of $K$.
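As a sanity check, one can verify numerically that sending $x+y\sqrt\varepsilon$ to the matrix with rows $(x,\varepsilon y)$ and $(y,x)$ respects multiplication. The sketch below assumes $q=p=7$ and $\varepsilon=3$ (a generator of $\mathbb{F}_7^\ast$); all function names are hypothetical.

```python
from itertools import product

p, eps = 7, 3  # assumption: q = p = 7 and eps = 3 generates F_7^*


def to_matrix(x, y):
    """Matrix of multiplication by x + y*sqrt(eps) in the basis {1, sqrt(eps)}."""
    return ((x % p, (eps * y) % p), (y % p, x % p))


def field_mul(u, v):
    """Multiply x1 + y1*sqrt(eps) by x2 + y2*sqrt(eps) in F_{p^2}."""
    (x1, y1), (x2, y2) = u, v
    return ((x1 * x2 + eps * y1 * y2) % p, (x1 * y2 + x2 * y1) % p)


def mat_mul(m, n):
    """Product of two 2x2 matrices over F_p."""
    return tuple(
        tuple(sum(m[i][k] * n[k][j] for k in range(2)) % p for j in range(2))
        for i in range(2)
    )


units = [(x, y) for x, y in product(range(p), repeat=2) if (x, y) != (0, 0)]
is_homomorphism = all(
    to_matrix(*field_mul(u, v)) == mat_mul(to_matrix(*u), to_matrix(*v))
    for u in units for v in units
)
all_invertible = all((x * x - eps * y * y) % p != 0 for x, y in units)
print(len(units), is_homomorphism, all_invertible)  # 48 True True
```

The determinant $x^2-\varepsilon y^2$ never vanishes because $\varepsilon$ is not a square, so all $q^2-1=48$ matrices indeed lie in $G$.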

*At the end of this section one can see a table of the result.*

For matrices, conjugacy gives rise to eigenvalues and Jordan canonical form. So we immediately come up with the three following forms:

$$a_x=\begin{pmatrix}x & 0 \\ 0 & x\end{pmatrix},\quad b_x=\begin{pmatrix}x & 1 \\ 0 & x\end{pmatrix},\quad c_{x,y}=\begin{pmatrix}x & 0 \\ 0 & y\end{pmatrix}\ (x \ne y).$$

For each $x$ and $y$, $a_x$, $b_x$ and $c_{x,y}$ represent three different conjugacy classes respectively, and they do not intersect. We will study these three families of conjugacy classes and see how far we can go. Spoiler: we will miss $\frac{q(q-1)}{2}$ conjugacy classes, which will be found in the subgroup $K$.

Conjugacy classes represented by $a_x$ are the easiest ones. Since scalar matrices commute with any matrix, for any invertible matrix $A$, we have

$$Aa_xA^{-1}=a_xAA^{-1}=a_x.$$

Therefore there is only one element in the conjugacy class represented by $a_x$. Ranging through all $x \ne 0$, we obtain $q-1$ such classes.

For Jordan canonical form like $b_x$, the story is different. Ranging through all $x \ne 0$, we again obtain $q-1$ such classes. Nevertheless, to determine the cardinality of each class, it is unrealistic to work only in the scope of matrices.

Let $\mathcal{C}=(b_x)$ be a conjugacy class. Let $G$ act on $\mathcal{C}$ by conjugation. The action is transitive: for $A,B \in \mathcal{C}$, there are invertible matrices $U$ and $V$ such that $U^{-1}AU=b_x=V^{-1}BV$, and therefore $A=(VU^{-1})^{-1}B(VU^{-1})$.

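To determine the cardinality of $\mathcal{C}$, we use the class formula again. Suppose $A=\begin{pmatrix}a & b \\ c & d\end{pmatrix}$ fixes $b_x$, i.e. $A^{-1}b_xA=b_x$, or $b_xA=Ab_x$, then

$$\begin{pmatrix}xa+c & xb+d \\ xc & xd\end{pmatrix}=\begin{pmatrix}xa & a+xb \\ xc & c+xd\end{pmatrix}.$$

The equation above implies that $c=0$ and $a=d$. Therefore the isotropy group of $\{b_x\}$ is

$$J=\left\{\begin{pmatrix}a & b \\ 0 & a\end{pmatrix}:a \in \mathbb{F}_q^\ast,\ b \in \mathbb{F}_q\right\}.$$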

It follows that $|\mathcal{C}|=|G|/|J|=(q^2-q)(q^2-1)/(q^2-q)=q^2-1$.

Let $\mathcal{D}=(c_{x,y})$ be a conjugacy class. Ranging through all $x,y \ne 0$ with $y \ne x$, then divide it by $2$, we obtain $\frac{(q-1)(q-2)}{2}$ conjugacy classes in the same form of $\mathcal{D}$. We divide it by $2$ because $c_{x,y}$ is conjugate to $c_{y,x}$ as they share the same eigenvalues.

We determine the cardinality of $\mathcal{D}$ in the same way as $\mathcal{C}$. The isotropy group of $\{c_{x,y}\}$ is $D$ in the introduction. Therefore $|\mathcal{D}|=|G|/|D|=q^2+q$.

Now let’s count how many elements the conjugacy classes we have obtained contain:

$$(q-1)\cdot 1+(q-1)(q^2-1)+\frac{(q-1)(q-2)}{2}(q^2+q)=\frac{q(q-1)^2(q+2)}{2}.$$

Since $|G|=q(q-1)^2(q+1)$, we still need to find $\frac{q^2(q-1)^2}{2}$ elements. Look at the subgroups we derived in the introduction. Subgroups like $B$, $N$ and $D$ all go down into Jordan canonical form immediately, but this is not the case for $K$. Consider

$$d_{x,y}=\begin{pmatrix}x & \varepsilon y \\ y & x\end{pmatrix},\quad y \ne 0.$$

Then the eigenvalues of $d_{x,y}$ are $x\pm \sqrt\varepsilon y$, none of which lies in $\mathbb{F}_q$. Therefore it has nothing to do with Jordan canonical form over $\mathbb{F}_q$. We will explore the remainder of conjugacy classes of $G$ here.

Ranging through all $x$ and $y \ne 0$, then dividing by $2$, we obtain $\frac{q(q-1)}{2}$ conjugacy classes. We divide by $2$ because $d_{x,y}$ and $d_{x,-y}$ are conjugate by any

$$\begin{pmatrix}a & -\varepsilon c \\ c & -a\end{pmatrix},\quad (a,c) \ne (0,0).$$

Now let $\mathcal{E}=(d_{x,y})$ be a conjugacy class. Notice the isotropy group of $d_{x,y}$ is $K$, so we can obtain the cardinality $|\mathcal{E}|=|G|/|K|=(q-1)^2q(q+1)/(q^2-1)=q^2-q$. Now our search is complete.

| Representative | Number of Elements in Class | Number of Classes |
|---|---|---|
| $a_x=\begin{pmatrix}x & 0 \\ 0 & x\end{pmatrix}$ | $1$ | $q-1$ |
| $b_x = \begin{pmatrix}x & 1 \\ 0 & x \end{pmatrix}$ | $q^2-1$ | $q-1$ |
| $c_{x,y} = \begin{pmatrix}x & 0 \\ 0 & y \end{pmatrix} (x \ne y)$ | $q^2+q$ | $\frac{(q-1)(q-2)}{2}$ |
| $d_{x,y}=\begin{pmatrix} x & \varepsilon{y} \\ y & x \end{pmatrix}, y \ne 0$ | $q^2-q$ | $\frac{q(q-1)}{2}$ |

*These matrices would not frequently appear in the remainder of the post because it will mess up the format.*

There are $q-1+q-1+\frac{(q-1)(q-2)+q(q-1)}{2}=q^2-1$ conjugacy classes, so we need to find $q^2-1$ irreducible representations. Of course we cannot list down all of them. We will instead classify them with certain reasonings. A character table can be found in the next section.
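The table can be checked by brute force for $q=3$, where it predicts two classes of size $1$, two of size $q^2-1=8$, one of size $q^2+q=12$ and three of size $q^2-q=6$. The following Python sketch is only an illustration for this one prime; the helper names are hypothetical.

```python
from itertools import product

p = 3  # assumption: q = p = 3, so arithmetic is arithmetic modulo 3


def mul(m, n):
    """Product of 2x2 matrices stored as (a, b, c, d)."""
    a, b, c, d = m
    e, f, g, h = n
    return ((a * e + b * g) % p, (a * f + b * h) % p,
            (c * e + d * g) % p, (c * f + d * h) % p)


def inv(m):
    """Inverse of an invertible 2x2 matrix over F_p."""
    a, b, c, d = m
    det_inv = pow((a * d - b * c) % p, -1, p)
    return ((d * det_inv) % p, (-b * det_inv) % p,
            (-c * det_inv) % p, (a * det_inv) % p)


G = [m for m in product(range(p), repeat=4)
     if (m[0] * m[3] - m[1] * m[2]) % p != 0]

seen, class_sizes = set(), []
for g in G:
    if g in seen:
        continue
    orbit = {mul(mul(h, g), inv(h)) for h in G}  # conjugacy class of g
    seen |= orbit
    class_sizes.append(len(orbit))

class_sizes.sort()
print(class_sizes)  # [1, 1, 6, 6, 6, 8, 8, 12]
```

There are $8=q^2-1$ classes in total and their sizes add up to $|G|=48$.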

Some computations are omitted because if not, this section would be unreadable. However, the author of this post has checked most of them on paper. The reader should find it easy to compute by themselves. For completed computation, one refers to this note. Note however the classification is a little bit different from here. Reading this first may help you to get the hang of it.

Recall how we find irreducible representations of $\mathfrak{S}_3$: consider permutation of a basis in a vector space of dimension $3$. We do a similar thing here. Let $G$ act on $\mathbb{P}^1(\mathbb{F}_q)$ by permutation. This induces a $(q+1)$-dimensional representation $W$ because $\mathbb{P}^1(\mathbb{F}_q)$ has $q+1$ elements. It contains the trivial representation $U$. Let $V$ be the complement of $U$, i.e. $W=U \oplus V$, then $V$ has dimension $q$. Now we determine the character of $V$. Since $\chi_V=\chi_W-\chi_U=\chi_W-1$, we only need to calculate $\chi_W$, i.e. to count fixed points of the permutation on each conjugacy class.

$\chi_W(a_x)=q+1$. It fixes every point.

$\chi_W(b_x)=1$. It only fixes one point: $(1:0)$.

$\chi_W(c_{x,y})=2$. It fixes two points: $(1:0)$ and $(0:1)$.

$\chi_W(d_{x,y})=0$. If $d_{x,y}$ fixes $(a:b)$, then $a^2=\varepsilon b^2$, and this cannot happen.

Therefore we have

| | $a_x$ | $b_x$ | $c_{x,y}$ | $d_{x,y}$ |
|---|---|---|---|---|
| $\chi_V$ | $q$ | $0$ | $1$ | $-1$ |

We see, $(\chi_V,\chi_V)=1$. Therefore $V$ is irreducible and we cannot decompose $W$ further. We have to find different approaches.
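The inner product $(\chi_V,\chi_V)$ can be recomputed from the class sizes and class counts in the table of conjugacy classes. Below is a small Python sketch using exact rational arithmetic; the function names `class_data` and `norm_chi_V` are hypothetical.

```python
from fractions import Fraction


def class_data(q):
    """Tuples (class size, number of classes, value of chi_V) for a_x, b_x, c_xy, d_xy."""
    return [
        (1, q - 1, q),
        (q * q - 1, q - 1, 0),
        (q * q + q, Fraction((q - 1) * (q - 2), 2), 1),
        (q * q - q, Fraction(q * (q - 1), 2), -1),
    ]


def norm_chi_V(q):
    """(chi_V, chi_V) = (1/|G|) * sum over group elements of chi_V(g)^2 (chi_V is real)."""
    order = (q * q - 1) * (q * q - q)
    total = sum(size * count * value * value for size, count, value in class_data(q))
    return total / order


print([norm_chi_V(q) for q in (3, 5, 7, 9)])
```

Every entry of the printed list equals $1$, in agreement with the irreducibility of $V$.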

The Pontryagin dual of a group $H$ is defined to be

$$\hat{H}=\operatorname{Hom}(H,S^1).$$

If $H$ admits a topology, we may want to eliminate non-continuous homomorphisms but it’s not our concern here because we only care about finite groups now, which admit the discrete topology. Notice that if $H$ is finite and cyclic, then $\hat{H} \cong H$. We will use this fact right now.

Since $G$ can be pretty big, it is not realistic to study all eigenvalues of representations. Instead, we consider the Pontryagin dual of $\mathbb{F}_q^\ast$, which is again a finite cyclic group. For each of the $q-1$ elements $\alpha:\mathbb{F}_q^\ast \to S^1$ of $\widehat{\mathbb{F}_q^\ast}$, we have a one-dimensional representation $U_\alpha$ of $G$ defined by

$$U_\alpha(g)=\alpha(\det g).$$

Note the trivial representation is one of the $U_\alpha$, once one realises that $\alpha$ defined by $\alpha(x)=1$ for all $x \in \mathbb{F}_q^\ast$ is also a homomorphism into $S^1$.

Tensoring $U_\alpha$ with $V$, we obtain another family of irreducible representations $\{V_\alpha = V \otimes U_\alpha\}$. Note $V$ is one of the $V_\alpha$. The character table of them are easily computed.

| | $a_x$ | $b_x$ | $c_{x,y}$ | $d_{x,y}$ |
|---|---|---|---|---|
| $U_\alpha$ | $\alpha(x)^2$ | $\alpha(x)^2$ | $\alpha(x)\alpha(y)$ | $\alpha(x^2-\varepsilon y^2)$ |
| $V_\alpha$ | $q\alpha(x)^2$ | $0$ | $\alpha(x)\alpha(y)$ | $-\alpha(x^2-\varepsilon y^2)$ |

We have successfully determined $2(q-1)$ irreducible representations, still $(q-1)^2$ of them to be found. We now make use of those subgroups we have determined. For each $\alpha,\beta \in \widehat{\mathbb{F}_q^\ast}$, we have a new character of a representation of $B$:

$$\gamma_{\alpha,\beta}:B \to \mathbb{C}^\ast,\quad \begin{pmatrix}a & b \\ 0 & d\end{pmatrix} \mapsto \alpha(a)\beta(d).$$

Let $W’_{\alpha,\beta}$ be the representation of $B$ with character $\gamma_{\alpha,\beta}$, and let $W_{\alpha,\beta}=\operatorname{Ind}_B^G W_{\alpha,\beta}’$. We can quite easily (no, with a lot of dirty computation) write down the character table of $W_{\alpha,\beta}$.

| | $a_x$ | $b_x$ | $c_{x,y}$ | $d_{x,y}$ |
|---|---|---|---|---|
| $W_{\alpha,\beta}$ | $(q+1)\alpha(x)\beta(x)$ | $\alpha(x)\beta(x)$ | $\alpha(x)\beta(y)+\alpha(y)\beta(x)$ | $0$ |

If $\alpha=\beta$, then $W_{\alpha,\beta}=U_\alpha \oplus V_\beta$ so not irreducible. However, if $\alpha \ne \beta$, then $W_{\alpha,\beta} \cong W_{\beta,\alpha}$ is irreducible, if one computes $(\chi,\chi)$. The dimension of $W_{\alpha,\beta}$ is $[G:B]=q+1$, and there are $\frac{1}{2}(q-1)(q-2)$ of them. Still there are $\frac{1}{2}q(q-1)$ irreducible representations to be found.

We haven’t used the subgroup $K$ yet so we first explore it in the same vein as attempt 3. We consider the dual of $(\mathbb{F}')^\ast \cong K$. Each $\varphi:(\mathbb{F}')^\ast \to \mathbb{C}^\ast$ also determines a representation of $K$ with character $\varphi$. One immediately thinks about $\operatorname{Ind}_K^G(\varphi)$, for which we simply write $\operatorname{Ind}(\varphi)$. It is easy to compute the character table so far.

| | $a_x$ | $b_x$ | $c_{x,y}$ | $d_{x,y}$ |
|---|---|---|---|---|
| $\operatorname{Ind}(\varphi)$ | $q(q-1)\varphi(x)$ | $0$ | $0$ | $\varphi(\zeta)+\varphi(\zeta)^q$ |

We put $\zeta=x+y\sqrt\varepsilon \in K = (\mathbb{F}’)^\ast$. Since $\operatorname{Ind}(\varphi) \cong \operatorname{Ind}(\varphi^q)$, we obtain $\frac{1}{2}q(q-1)$ representations out of this, with the restriction that $\varphi^q \ne \varphi$. Nevertheless, we have $(\chi,\chi)=q$ if $\varphi^q=\varphi$, $(\chi,\chi)=q-1$ if $\varphi^q \ne \varphi$. We still need to work on it from different direction.

Let’s try to tensor what we have found. It is easy to see that $V_\alpha \otimes U_\gamma = V_{\alpha\gamma}$ and $W_{\alpha,\beta} \otimes U_{\gamma}=W_{\alpha\gamma,\beta\gamma}$. So we cannot find anything new here. But tensoring $W_{\alpha,\beta}$ and $V_\alpha$ gives us something quite different. We see for $\alpha \ne 1$,

| | $a_x$ | $b_x$ | $c_{x,y}$ | $d_{x,y}$ |
|---|---|---|---|---|
| $V \otimes W_{\alpha,1}$ | $q(q+1)\alpha(x)$ | $0$ | $\alpha(x)+\alpha(y)$ | $0$ |

Let $\varphi \in \widehat{(\mathbb{F}’)^\ast}$ be a homomorphism such that $\varphi|_{\mathbb{F}_{q}^\ast}=\alpha$. Computing inner products gives us

We see $W_{\alpha,1}$ is contained in the representation determined by $\operatorname{Ind}(\varphi)$. We also see $W_{\alpha,1}$ is contained in $V \otimes W_{\alpha,1}$. Besides, $\operatorname{Ind}(\varphi)$ and $V \otimes W_{\alpha,1}$ have a lot of subrepresentations in common. The first guess would be that $\operatorname{Ind}(\varphi)$ is a subrepresentation of $V \otimes W_{\alpha,1}$. So, maybe we can find something we have been missing here. For this reason, consider the virtual character

$$\chi_\varphi=\chi_{V \otimes W_{\alpha,1}}-\chi_{W_{\alpha,1}}-\chi_{\operatorname{Ind}(\varphi)}.$$

We can compute that $(\chi_\varphi,\chi_\varphi)=1$. To see this is actually a real character, we compute the character table.

| | $a_x$ | $b_x$ | $c_{x,y}$ | $d_{x,y}$ |
|---|---|---|---|---|
| $\chi_\varphi$ | $(q-1)\alpha(x)$ | $-\alpha(x)$ | $0$ | $-(\varphi(\zeta)+\varphi(\zeta)^q)$ |

It follows that $\chi_\varphi(1)=q-1>0$. Hence $\chi_\varphi$ is irreducible. Since each $\chi_\varphi$ is determined by $\varphi$ with $\varphi^q \ne \varphi$, and there are $\frac{1}{2}q(q-1)$ of such $\varphi$, we have actually determined all of the irreducible characters. They are denoted by $X_\varphi$.

| $GL_2(\mathbb{F}_q)$ | $a_x$ | $b_x$ | $c_{x,y}$ | $d_{x,y}\leftrightarrow\zeta$ | $\dim$ |
|---|---|---|---|---|---|
| $U_\alpha$ | $\alpha(x^2)$ | $\alpha(x^2)$ | $\alpha(xy)$ | $\alpha(\zeta^{q+1})$ | $1$ |
| $V_\alpha$ | $q\alpha(x^2)$ | $0$ | $\alpha(xy)$ | $-\alpha(\zeta^{q+1})$ | $q$ |
| $W_{\alpha,\beta}$ | $(q+1)\alpha(x)\beta(x)$ | $\alpha(x)\beta(x)$ | $\alpha(x)\beta(y)+\alpha(y)\beta(x)$ | $0$ | $q+1$ |
| $X_\varphi$ | $(q-1)\varphi(x)$ | $-\varphi(x)$ | $0$ | $-(\varphi(\zeta)+\varphi(\zeta^q))$ | $q-1$ |

A few remarks are in order. We can call these four classes of irreducible representations in the following way (excerpted from this document):

$U_\alpha$: $1$-dimensional representations. There are $q-1$ of them.

$V_\alpha=V \otimes U_\alpha$: $q$-dimensional representations. Here $V$ is also called Steinberg representation. There are $q-1$ of them.

$W_{\alpha,\beta}$: ($q+1$)-dimensional irreducible principal series. There are $\frac{1}{2}(q-1)(q-2)$ of them. Some authors may also treat $q$-dimensional representations as principal series.

$X_\varphi$: irreducible cuspidal representations or complementary series representations. There are $\frac{1}{2}q(q-1)$ of them. A representation is cuspidal if the Jacquet module is trivial.
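A quick consistency check on this classification: the number of irreducible representations should equal the number of conjugacy classes, $q^2-1$, and the squares of the dimensions should add up to $|G|$. The Python sketch below checks both identities for a few values of $q$; the function name `irreducible_dimensions` is hypothetical.

```python
def irreducible_dimensions(q):
    """Pairs (dimension, number of representations) for the families U, V, W, X."""
    return [
        (1, q - 1),
        (q, q - 1),
        (q + 1, (q - 1) * (q - 2) // 2),
        (q - 1, q * (q - 1) // 2),
    ]


for q in (3, 5, 7, 9, 11, 13):
    families = irreducible_dimensions(q)
    order = (q * q - 1) * (q * q - q)
    count = sum(number for _, number in families)
    squares = sum(dim * dim * number for dim, number in families)
    print(q, count == q * q - 1, squares == order)
```

Both comparisons print `True` for every $q$ in the list.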

The Segre embedding allows us to define the product of projective varieties reasonably, and we will discuss it right now. To begin with we consider the product of $\mathbb{P}^m$ and $\mathbb{P}^n$. In this section the ground field is an arbitrary algebraically closed field.

Definition 1. The **Segre embedding** is defined as follows:

$$\iota:\mathbb{P}^m \times \mathbb{P}^n \to \mathbb{P}^N,\quad ([X_0:\dots:X_m],[Y_0:\dots:Y_n]) \mapsto [X_0Y_0:X_0Y_1:\dots:X_iY_j:\dots:X_mY_n].$$

Clearly, $N=(m+1)(n+1)-1=mn+m+n$. The image on the right hand side has $X_iY_j$ ordered lexicographically.

First of all we make sure that this function is well-defined, otherwise our work will be useless.

Proposition 1. The Segre embedding is a well-defined injective map.

*Proof.* Assume $X_i'=\lambda X_i$ and $Y_j'=\mu Y_j$ for some $\lambda,\mu \ne 0$, then

$$[X_i'Y_j']=[\lambda\mu X_iY_j]=[X_iY_j].$$

Next suppose that $[X_iY_j]=[X_i’Y_j’]$. Without loss of generality we can assume that $X_0 \ne 0$ and $X_0’ \ne 0$ so that we can put them to $1$. Then by looking at first $n+1$ elements we can identify $[Y_0:\dots:Y_n]$ with $[Y_0’:\dots:Y_n’]$. Then using $X_1Y_i=\lambda X_1’Y_i’$ we can immediately identify $X_1$ and $X_1’$. Likewise, other components are identified. $\square$

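Next we study the image further using linear algebra.

We can write elements in $\mathbb{P}^N$ as an $(m+1) \times (n+1)$ matrix, which can make things easier:

$$[Z_{ij}]=\begin{pmatrix}Z_{00} & \cdots & Z_{0n} \\ \vdots & \ddots & \vdots \\ Z_{m0} & \cdots & Z_{mn}\end{pmatrix}.$$

Therefore the image of $\iota$ is given by $Z_{ij}=X_iY_j$. Through an elementary observation, we see the matrix

$$\begin{pmatrix}X_0Y_0 & \cdots & X_0Y_n \\ \vdots & \ddots & \vdots \\ X_mY_0 & \cdots & X_mY_n\end{pmatrix}=\begin{pmatrix}X_0 \\ \vdots \\ X_m\end{pmatrix}\begin{pmatrix}Y_0 & \cdots & Y_n\end{pmatrix}$$

has rank $1$. The question is, is the converse true? For this reason we study the set

$$Z=Z\left(\{Z_{ij}Z_{kl}-Z_{kj}Z_{il}\}\right) \subset \mathbb{P}^N.$$

Note that the $Z_{ij}Z_{kl}-Z_{kj}Z_{il}$ are the determinants of the $2\times2$ submatrices of the matrix $[Z_{ij}]$. This $Z$ contains all $[Z_{ij}]$ with rank $1$. To show the converse, we consider the standard affine cover. Let $U_{kl}=Z(Z_{kl})$ and put $V_{kl}=\mathbb{P}^N \setminus U_{kl}$. Then $\{V_{kl}\}$ is the standard affine cover of $\mathbb{P}^N$ as we all know. Then likewise we use the affine open subsets $U_k \subset \mathbb{P}^m$ and $U_l' \subset \mathbb{P}^n$, to obtain

$$Z \cap V_{kl} \to U_k \times U_l',\quad [Z_{ij}] \mapsto ([Z_{0l}:\dots:Z_{ml}],[Z_{k0}:\dots:Z_{kn}]).$$

This is indeed the inverse map of $\iota$ on $U_k \times U_l'$. Hence the converse is true.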

Therefore, the image of the Segre embedding is a projective variety. As a classic example, the image of $\mathbb{P}^1 \times \mathbb{P}^1 \to \mathbb{P}^3$ is determined by the polynomial $xy-zw=0$.
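The rank-one description is easy to test numerically. The sketch below works with integer coordinates purely for illustration (the section itself works over an algebraically closed field); the function names `segre` and `all_minors_vanish` are hypothetical.

```python
from itertools import combinations


def segre(x, y):
    """Matrix [Z_ij] = [x_i * y_j] representing the image of (x, y)."""
    return [[xi * yj for yj in y] for xi in x]


def all_minors_vanish(z):
    """Check Z_ij * Z_kl - Z_kj * Z_il == 0 for every 2x2 submatrix of z."""
    rows, cols = range(len(z)), range(len(z[0]))
    return all(
        z[i][j] * z[k][l] - z[k][j] * z[i][l] == 0
        for i, k in combinations(rows, 2)
        for j, l in combinations(cols, 2)
    )


print(all_minors_vanish(segre([1, 2, 3], [4, 5])))  # True
print(all_minors_vanish([[1, 0], [0, 1]]))          # False
```

For $m=n=1$ there is a single minor, $Z_{00}Z_{11}-Z_{10}Z_{01}$, which is the quadric $xy-zw=0$ mentioned above up to renaming coordinates.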

In this section we offer a way to understand the Segre embedding in number fields. To begin with, we need some definition.

Height is computed by absolute values on a field so we first normalise all absolute values on $\mathbb{Q}$. Recall that two absolute values $|\cdot|_1$ and $|\cdot|_2$ are equivalent if there exists some $\lambda>0$ such that $|\cdot|_1=|\cdot|_2^\lambda$. The question is, which $\lambda$ should we pick.

The ordinary absolute value $|\cdot|_\infty$ has nothing to worry about. But for $p$-adic absolute values $|\cdot|_p$, instead of using other equivalent ones, we have a restriction that $|p|_p=\frac{1}{p}$. All these absolute values will be denoted by

Likewise we define $M_K$, where throughout $K$ will always be a number field. $M_K$ consists of the ordinary absolute value and extensions of $p$-adic ones defined as follows:

In particular, $M_K$ satisfies the product formula:

$$\prod_{v \in M_K}|x|_v=1$$

for all $x \in K^\times$. This restriction allows us to work fine with projective spaces, as we will see later.

Definition 2. The (absolute logarithmic) **height** of $x\in \mathbb{P}^n_{\overline{\mathbb{Q}}}$, with coordinates $(x_0:\dots:x_n)$, $x_j \in K$, is defined by

$$h(x)=\sum_{v \in M_K}\max_{j}\log|x_j|_v.$$

Actually, the height function can show the “algebraic complication” of $x$, and is well-defined in many senses.

Proposition 2. The height $h(x)$ is independent of the choice of $K$.

*Sketch of the proof.* Let $L$ be another number field containing $x_0,\dots,x_n$. We can assume that $K \subset L$, so $L/K$ is a finite separable extension. But in this case,

which implies that

Therefore

gives what we want. $\square$

Proposition 3. $h(x)$ is well-defined on $\mathbb{P}^n_{\overline{\mathbb{Q}}}$.

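*Proof.* It remains to show that $h(x)$ is independent of the choice of coordinates. For $\lambda \ne 0$, we see

$$h(\lambda x)=\sum_{v \in M_K}\max_j\log|\lambda x_j|_v=\sum_{v \in M_K}\log|\lambda|_v+\sum_{v \in M_K}\max_j\log|x_j|_v=h(x).$$

Note $\sum_{v \in M_K}\log|\lambda|_v=0$ because of the product formula. $\square$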

To highlight the ability of height to measure algebraic complication, let’s mention the following theorem of Kronecker.

Theorem 1 (Kronecker). The height of $\zeta\in\overline{\mathbb{Q}}^\times$ is $0$ if and only if $\zeta$ is a root of unity.

One direction is straightforward. To prove the converse, one may need some combinatorics, symmetric functions and Dirichlet’s pigeon-hole principle. See theorem 2.4 of this note for a proof.

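Now let’s invite the Segre embedding into the party:

$$\iota:\mathbb{P}^m_{\overline{\mathbb{Q}}} \times \mathbb{P}^n_{\overline{\mathbb{Q}}} \to \mathbb{P}^N_{\overline{\mathbb{Q}}},\quad (x,y) \mapsto [x_iy_j].$$

Using the fact that $\max_{i,j}|x_iy_j|_v=\max_i|x_i|_v \cdot \max_j|y_j|_v$, we see immediately that

$$h(\iota(x,y))=h(x)+h(y).$$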

The Segre embedding is immediately used after introducing the height of a polynomial.
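Over $K=\mathbb{Q}$ the height is easy to compute: for a point with coprime integer coordinates, the non-Archimedean places contribute nothing and $h(x)=\log\max_j|x_j|_\infty$. The Python sketch below uses this standard fact to check the additivity of the height under the Segre embedding on one example; the function names `height` and `segre` are hypothetical.

```python
from functools import reduce
from math import gcd, isclose, log


def height(coords):
    """Height of a point of P^n(Q) given by integer coordinates, not all zero."""
    g = reduce(gcd, coords)  # dividing by g makes the coordinates coprime
    return log(max(abs(c) for c in coords) // g)


def segre(x, y):
    """Coordinates x_i * y_j of the Segre image, in lexicographic order."""
    return [xi * yj for xi in x for yj in y]


x, y = (2, -3, 6), (4, 10)
lhs = height(segre(x, y))
rhs = height(x) + height(y)
print(isclose(lhs, rhs))  # True
```

Here $h(x)=\log 6$, $h(y)=\log 5$ and the Segre image has height $\log 30$.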

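Definition 3. For $f(t_1,\dots,t_n) \in K[t_1,\dots,t_n]$, we write

$$f(t_1,\dots,t_n)=\sum_{j}a_jt^j,\quad t^j=t_1^{j_1}\cdots t_n^{j_n}.$$

Then the **height** of $f$ is defined to be

$$h(f)=\sum_{v \in M_K}\log|f|_v$$

where

$$|f|_v=\max_j|a_j|_v.$$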

Likewise, it can show the complication of $f$ in some way, but we are not interested in it at this moment. Notice that products of multivariable polynomials in **different variables** can be understood as tensor products and therefore we have the following fact

Proposition 4. Let $f(t_1,\dots,t_n)$ and $g(s_1,\dots,s_m)$ be polynomials in different sets of variables, then

$$h(fg)=h(f)+h(g).$$

Note: if $f$ and $g$ share the same variable, the story is different. Say we have $f_1,\dots,f_m \in \overline{\mathbb{Q}}[X_1,\dots,X_n]$, and put $f=f_1 \cdots f_m$, then

where $d$ is the sum of the partial degrees of $f$.

We want to apply calculus to fields, but tools are needed. For the ordinary calculus, on $\mathbb{R}$ or $\mathbb{C}$, the most important role is played by limit:

However we cannot carry the absolute value over to other fields directly. Indeed, if the field $k$ is an extension of $\mathbb{Q}$ contained in $\mathbb{C}$, then we may define an absolute value on $k$ to be the restriction of the absolute value of $\mathbb{C}$. But this is not always the case: this method does not work on fields with positive characteristic. For example, $\mathbf{F}_8$ is not a subfield of $\mathbb{C}$ because $\mathbf{F}_2$ is not. Besides, we should not restrict ourselves to the case faithful to ordinary calculus and ignore other potentials. The most important trait of the ordinary absolute value is the triangle inequality, but perhaps we can omit that and replace it with something different. Maybe there are many more different absolute values to be studied. For these reasons, we first define absolute values on fields from scratch.

Definition 1. An **absolute value** on a field $K$ is a real valued function $|\cdot|:K \to \mathbb{R}_+$ such that

1. For all $x \in K$, we have $|x| \ge 0$ and $|x|=0$ if and only if $x=0$.
2. $|xy|=|x||y|$ for all $x,y \in K$.
3. There exists $c>0$ such that $|x+y| \le c \max\{|x|,|y|\}$ for all $x,y \in K$.

Before we dive into some technical details of the inequality, let’s see some trivial and non-trivial examples.

On any field $K$, we can define $|x|=1$ for all $x \ne 0$. This is the most trivial absolute value and it carries little to no information. But whether or not the absolute value is trivial, we always have $|1|=1$ because $|1x|=|1||x|=|x|$.

If $K=\mathbb{Q}$, we can define $|m/n|$ to be the ordinary absolute value $\sqrt{\left(\frac{m}{n}\right)^2}$. We are familiar with it for sure. It is customary to write $|\cdot|_\infty$.

However, for $K=\mathbb{Q}$, a prime $p$ and non-zero $m/n \in K$, we can also write

$$\frac{m}{n}=p^r\frac{m'}{n'}$$

where $r$ is an integer and $m'$ and $n'$ are integers coprime to $p$. Under this presentation we can put

$$\left|\frac{m}{n}\right|_p=p^{-r},\quad |0|_p=0.$$

In this way we obtain an absolute value $|\cdot|_p$ that is totally different from $|\cdot|_\infty$. The “difference” will be discussed later. One can verify that $|\cdot|_p$ is indeed an absolute value and the constant $c$ in the definition can be taken to be $1$. This is called the $p$-adic absolute value.
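The definition translates directly into code. Here is a minimal Python sketch with exact rational arithmetic; the function name `p_adic_abs` is hypothetical. It also illustrates, on one example, that $|x|_\infty\prod_p|x|_p=1$ and that the constant $c=1$ works.

```python
from fractions import Fraction


def p_adic_abs(x, p):
    """p-adic absolute value of a rational number x, normalised by |p|_p = 1/p."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    r = 0
    num, den = x.numerator, x.denominator
    while num % p == 0:  # powers of p in the numerator raise r
        num //= p
        r += 1
    while den % p == 0:  # powers of p in the denominator lower r
        den //= p
        r -= 1
    return Fraction(p) ** (-r)


x = Fraction(-50, 21)
product_over_places = abs(x)
for p in (2, 3, 5, 7):  # the only primes dividing numerator or denominator
    product_over_places *= p_adic_abs(x, p)
print(p_adic_abs(x, 5), p_adic_abs(x, 7), product_over_places)  # 1/25 7 1
```

For instance $|{-50/21}|_5=\frac{1}{25}$ because $50=2\cdot 5^2$, while $|{-50/21}|_7=7$ because of the factor $7$ in the denominator.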

- Let $K=\mathbf{F}_q$ be a finite field, then the only absolute value on $K$ is trivial. To see this, notice that $K^\times$ is a cyclic group of order $q-1$. Pick any $x \in K^\times$, we have $|x|^{q-1}=|x^{q-1}|=|1|=1$, hence $|x|=1$.

It seems we have ignored the triangle inequality for no reason, but actually we didn’t. To see this, we give a refinement of the triangle inequality first.

Proposition 1. Let $|\cdot|:K \to \mathbb{R}$ be an absolute value with $|x+y|\le c\max\{|x|,|y|\}$, then the following two statements are equivalent:

1. $c \le 2$.
2. For all $a,b \in K$, we have $|a+b|\le |a|+|b|$. This is the **triangle inequality**.

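*Proof.* It is obvious that $|a|+|b| \le 2\max\{|a|,|b|\}$, so 2 implies 1 and we only need to show that 1 implies 2. To do this, we will use a forward-backward induction.

Assume first that $n=2^m$ for some positive integer $m$ and let $a_1,\dots,a_n$ be a sequence of elements of $K$. Then by induction we immediately have

$$|a_1+\dots+a_n| \le 2^m\max_{1 \le j \le n}|a_j|.$$

For all positive integers $n$ satisfying $2^{m-1} < n \le 2^m$, and $a_1,\dots,a_{n} \in K$, we can always put $a_{n+1}=\cdots=a_{2^m}=0$ to obtain

$$|a_1+\dots+a_n| \le 2^m\max_{1 \le j \le n}|a_j| \le 2n\max_{1 \le j \le n}|a_j|.$$

Let $\tilde{n}$ be the image of $n$ in $K$, i.e. $\tilde{n}=\underbrace{1+\dots+1}_{n\text{ times} }$. If we put $a_k=1$ for all $1 \le k \le n$, we in particular have $|\tilde{n}| \le 2n$. Moreover, we also have

$$|a_1+\dots+a_n| \le 2n\max_{1 \le j \le n}|a_j| \le 2n\sum_{j=1}^{n}|a_j|.$$

We therefore can write

$$|a+b|^n=\left|\sum_{k=0}^{n}\widetilde{\binom{n}{k}}a^kb^{n-k}\right| \le 2(n+1)\sum_{k=0}^{n}\left|\widetilde{\binom{n}{k}}\right||a|^k|b|^{n-k} \le 4(n+1)\sum_{k=0}^{n}\binom{n}{k}|a|^k|b|^{n-k}=4(n+1)(|a|+|b|)^n.$$

It follows that

$$|a+b| \le \sqrt[n]{4(n+1)}\,(|a|+|b|).$$

Since $\lim_{n \to \infty}\sqrt[n]{4(n+1)}=1$, we are done. $\square$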

The triangle inequality is always desirable, but it does not always hold. To see this, consider $\mathbb{C}((X))$, the field of formal Laurent series, where each nonzero element is of the form

$$
f(X)=\sum_{k \ge n}a_kX^k
$$

where $a_n \ne 0$. We can define an absolute value on $\mathbb{C}((X))$ by $|f|=|a_n|$. Three conditions are easily verified but the triangle inequality fails. For example, if $f(X)=1+2X$ and $g(X)=-1+3X$, then $|f+g|=5$ while $|f|=|g|=1$.

For this reason, we are seeking ‘replacements’ of an absolute value.

Notice that an absolute value induces a translation-invariant metric in an obvious way:

$$
d(x,y)=|x-y|.
$$

A topology comes up in the nature of things. We cannot apply theorems in functional analysis directly because we do not have a real or complex vector space. But we can try to import those important concepts. When studying open mapping theorem, we care about equivalent norms or metrics, on whether they induce the same topology. Here we will also do that.

Definition 2. Two absolute values $|\cdot|_1$ and $|\cdot|_2$ are **equivalent** if they induce the same topology (this is clearly an equivalence relation). An equivalence class of absolute values is called a **place**.

Clearly, the topology is discrete if and only if the absolute value is trivial. Therefore a trivial absolute value is not equivalent to any non-trivial ones. But let’s see two non-trivial absolute values that are not equivalent.

On $\mathbb{Q}$, consider $|\cdot|_\infty$ and $|\cdot|_2$. The sequence $\left\{\frac{1}{n}\right\}$ converges to $0$ under the first absolute value. However

$$
\left|\frac{1}{n}\right|_2=1 \not\to 0
$$

if we take odd numbers $n$ into account. On the other hand, $\{2^n\}$ does not converge under $\left|\cdot\right|_\infty$ but $\left|2^n\right|_2=2^{-n} \to 0$ as $n \to \infty$. The topology induced by $|\cdot|_\infty$ is totally different from the one induced by $|\cdot|_p$ for prime $p$.

We have an important characterisation of equivalent absolute values.

Proposition 2. Let $|\cdot|_1$ and $|\cdot|_2$ be two non-trivial absolute values. Then the following statements are equivalent.

1. $|\cdot|_1$ and $|\cdot|_2$ are equivalent.
2. $|x|_1<1$ implies that $|x|_2<1$.
3. There exists $\lambda>0$ such that $|\cdot|_1=|\cdot|_2^\lambda$.

*Proof.* Assume that $|\cdot|_1$ and $|\cdot|_2$ are equivalent. If $|x|_1<1$, then $\lim_{n \to \infty}x^n=0$ in the common topology. Therefore $|x|_2<1$, for otherwise $|x|_2^n$ would not converge to $0$. Likewise $|x|_2<1 \implies |x|_1<1$.

Assume that $|x|_1<1$ always implies that $|x|_2<1$. It follows that $|x|_1>1$ implies that $|x|_2>1$ because $|x^{-1}|_1<1$. Since $|\cdot|_1$ is not trivial, there exists $x_0 \in K$ such that $|x_0|_1>1$. Put $a=|x_0|_1$ and $b=|x_0|_2$ and let $\lambda=\log(b)/\log(a)=\log_a{b}$. Pick $x \in K$ such that $|x|_1 \ge 1$. Then $|x|_1=|x_0|_1^\alpha$ for some $\alpha \ge 0$. We show that $|x|_2=|x_0|_2^\alpha$ by approximating $\alpha$ with rational numbers. If $m,n$ are positive integers such that $m/n>\alpha$, then $|x|_1<|x_0|_1^{m/n}$ and therefore $|x^n/x_0^m|_1<1$. It follows that $|x^n/x_0^m|_2<1$, i.e. $|x|_2<|x_0|^{m/n}_2$. If $m/n<\alpha$, we can similarly get $|x|_2>|x_0|_2^{m/n}$. Therefore $|x|_2=|x_0|_2^\alpha$. Therefore
$$
|x|_2=|x_0|_2^\alpha=|x_0|_1^{\lambda\alpha}=|x|_1^\lambda,
$$

and the case $|x|_1<1$ follows by applying this to $x^{-1}$. Exchanging the roles of the two absolute values (or replacing $\lambda$ by $1/\lambda$) gives statement 3.

3 implying 1 is immediate because $f(x)=x^\lambda$ does not change the limit. $\square$

If $|\cdot|_1=|\cdot|_2^\lambda$, $|x+y|_1\le c_1\max\{|x|_1,|y|_1\}$ and $|x+y|_2 \le c_2\max\{|x|_2,|y|_2\}$, then $c_1$ can be replaced by $c_2^\lambda$. If $c_2 >2$, then we can pick $\lambda$ small enough such that $c_2^\lambda \le 2$. Therefore

Proposition 3.Each absolute value is equivalent to one that satisfies the triangle inequality.

Bearing this in mind, we can study the case when $c=1$ in the definition of absolute values.

Proposition 4.Let $|\cdot|$ be an absolute value on $K$. Then the following statements are equivalent:

$|\cdot|$ satisfies the ultrametric inequality: $|x+y|\le\max\{|x|,|y|\}$.

$|\tilde{n}|\le 1$ for all $n \in \mathbb{N}$.

*Proof.* Suppose that $|x+y| \le \max\{|x|,|y|\}$. Then $|\tilde{1}|=|1|=1$ and $|\tilde{2}|=|1+1|\le\max\{|1|,|1|\}=1$. Assume that $|\tilde{n}|\le 1$, then

$$
|\widetilde{n+1}|=|\tilde{n}+1| \le \max\{|\tilde{n}|,|1|\}=1.
$$

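Conversely, suppose that $|\tilde{n}| \le 1$ for all $n$. Replace the absolute value with an equivalent one satisfying the triangle inequality if necessary. It follows that

$$
|a+b|^n=\left|\sum_{k=0}^{n}\widetilde{\binom{n}{k}}a^kb^{n-k}\right| \le \sum_{k=0}^{n}\left|\widetilde{\binom{n}{k}}\right||a|^k|b|^{n-k} \le (n+1)\max\{|a|,|b|\}^n.
$$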

Therefore $|a+b| \le \sqrt[n]{n+1}\max\{|a|,|b|\}$. The result follows from the fact that $\sqrt[n]{n+1} \to 1$ as $n \to \infty$. $\square$

Definition 3. An absolute value is called **non-Archimedean**, or **ultrametric**, if the condition in proposition 4 is satisfied. Otherwise it is called **Archimedean** or **ordinary**.

For example, trivial absolute values are ultrametric but we are not interested in that. What is interesting is that $p$-adic absolute values are non-Archimedean.

There is a second classification - Ostrowski’s theorem, which states that all nontrivial places on $\mathbb{Q}$ can be represented by $|\cdot|_\infty$ or $|\cdot|_p$ for some prime $p$. For other fields, we have quite some interesting analogues. But we do not have enough space to include these proofs. The reader can check

- Theorem 4.2 of this note for the ordinary theorem of Ostrowski on $\mathbb{Q}$.
- This expository paper for the theorem of Ostrowski on number fields.
- This expository paper for the theorem of Ostrowski on function fields.

When we have a field extension $L/K$, we certainly want to know how an absolute value on $L$ restricts to $K$, or conversely, how an absolute value on $K$ can be extended to $L$. Given an absolute value, we can also complete the field with respect to it, just as in elementary calculus, where $\overline{\mathbb{Q}}=\mathbb{R}$.

Definition 4. A field $K$ is **complete** with respect to $|\cdot|$ if $K$ is a complete metric space with respect to the metric $d(x,y)=|x-y|$.

To employ a similar device, we will define completion in a similar style. Let $\mathscr{P}_F$ be the set of all places of a field $F$. Given a field extension $L/K$, each place $v$ on $L$ induces a place $u=v|_K$ on $K$. We therefore have a map induced by restriction:

$$
r:\mathscr{P}_L \to \mathscr{P}_K, \quad v \mapsto v|_K,
$$

from the places of $L$ to places of $K$.

Definition 5. Let $L/K$ be a field extension and $u \in \mathscr{P}_K$. If $v \in r^{-1}(u)$, we write $v|u$ and say $v$ **divides** $u$ or $v$ **lies over** $u$.

Definition 6. A completion of $K$ with respect to a place $v$ is an extension field $K_v$ with a place $w$ such that

1. $w|v$.
2. The topology of $K_v$ induced by $w$ is complete.
3. $K$ is a dense subset of $K_v$.

The completion exists and is unique up to isomorphism (to see this, modify the proof on the completion of $\mathbb{Q}$). The classic example is of course $K=\mathbb{Q}$ and $v$ being represented by $|\cdot|_\infty$, and $K_v$ would therefore be $\mathbb{R}$ with the ordinary absolute value. With the help of the Gelfand-Mazur theorem, one can show that the only Archimedean complete fields are $\mathbb{R}$ and $\mathbb{C}$, which are completions of $\mathbb{Q}$ and $\mathbb{Q}(i)$.

For $|\cdot|_p$ on $\mathbb{Q}$, we have the completion $\mathbb{Q}_p$ ($p$-adic numbers). The compact subset $\mathbb{Z}_p$ ($p$-adic integers) is the closure of $\mathbb{Z}$ in $\mathbb{Q}_p$. If $p,q$ are two distinct primes, then $\mathbb{Q}_p$ is not isomorphic to $\mathbb{Q}_q$ because they are completed using two different places.

As a striking example, in $\mathbb{Q}_2$, we have

$$
1+2+4+8+\cdots=-1
$$

because

$$
\left|(1+2+\dots+2^{n-1})-(-1)\right|_2=|2^n|_2=2^{-n} \to 0.
$$

There is nothing sloppy or misleading here, unlike that Numberphile video on the “identity” $1+2+\dots=-\frac{1}{12}$.
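Assuming the identity in question is the geometric series $1+2+4+\dots=-1$ in $\mathbb{Q}_2$, the convergence can be watched numerically. The sketch below (with a made-up helper `abs2`) computes the $2$-adic distance between the partial sums and $-1$.

```python
from fractions import Fraction

def abs2(n):
    """2-adic absolute value of an integer n (illustrative helper)."""
    if n == 0:
        return Fraction(0)
    r = 0
    while n % 2 == 0:
        n //= 2
        r += 1
    return Fraction(1, 2 ** r)

partial = 0
distances = []
for k in range(10):
    partial += 2 ** k                       # partial sum 1 + 2 + ... + 2^k
    distances.append(abs2(partial - (-1)))  # 2-adic distance to -1

print(distances)
```

The distances halve at every step, which is exactly what convergence to $-1$ in $\mathbb{Q}_2$ means; under $|\cdot|_\infty$ the same partial sums diverge.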

To conclude this post and prepare for future posts, we show that absolute values work fine with norms over a vector space (do not confuse these with norms in Galois theory).

Definition 7.Let $K$ be a field with absolute value $|\cdot|$ and $E$ be a vector space over $K$. A norm $E \to \mathbb{R}$ compatible with $|\cdot|$ is a function $|\cdot|$ that satisfies

$|\xi|\ge 0$ for all $\xi \in E$, and $|\xi|=0$ if and only if $\xi=0$.

For all $x \in K$ and $\xi \in E$, $|x\xi|=|x||\xi|$.

$|\xi_1+\xi_2| \le |\xi_1|+|\xi_2|$ for all $\xi_1,\xi_2 \in E$.

Two norms $|\cdot|_1$ and $|\cdot|_2$ are **equivalent** if there exist numbers $C_1,C_2>0$ such that for all $\xi \in E$ we have

$$
C_1|\xi|_1 \le |\xi|_2 \le C_2|\xi|_1.
$$

This is an equivalence relation and we have already seen this in elementary linear algebra and functional analysis. This is equivalent to the fact that $|\cdot|_1$ and $|\cdot|_2$ induce the same topology. When the dimension of $E$ is infinite, things are troublesome, as we may need things like open mapping theorem. For finite dimensional spaces, we can pick a basis $\xi_1,\xi_2,\dots,\xi_n \in E$ so that every $\xi \in E$ can be written in the form
$$
\xi=x_1\xi_1+x_2\xi_2+\dots+x_n\xi_n, \quad x_1,\dots,x_n \in K.
$$

We can define norms like $|\xi|_1=|x_1|+\dots+|x_n|$, $|\xi|_2=\sqrt{|x_1|^2+\dots+|x_n|^2}$ and $|\xi|_\infty=\max\{|x_1|,\dots,|x_n|\}$. In elementary linear algebra, we know that they are equivalent. Now things are the same over a field.

Proposition 5.Let $K$ be a complete field under a non-trivial absolute value $|\cdot|$, and let $E$ be a finite-dimensional space over $K$. Then any two norms on $E$ that are compatible with $|\cdot|$ are equivalent.

*Proof.* It suffices to show that $E \cong K \times \cdots \times K$ in topology under a norm that is compatible with the absolute value. Put $n=\dim E$. If $n=1$ things are trivial. Therefore we assume that $n \ge 2$. We need to show that given a basis $\xi_1,\xi_2,\dots,\xi_n$,

$$
\xi^{(\nu)}=x_1^{(\nu)}\xi_1+\dots+x_n^{(\nu)}\xi_n
$$

is a Cauchy sequence (with respect to a norm) in $E$ only if each one of the $n$ sequences $x_i^{(\nu)}$ is a Cauchy sequence in $K$. It suffices to assume that $\xi^{(\nu)}$ converges to $0$ as $\nu \to \infty$ (as we can replace $\xi^{(\nu)}$ with $\xi^{(\nu)}-\xi^{(\mu)}$ for $\nu,\mu \to \infty$ if necessary). Then we must show that each $x_i^{(\nu)}$ converges to $0$ as well.

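Suppose this is false for $x_1^{(\nu)}$. Then there exists a number $a>0$ such that, after passing to a subsequence, $|x_1^{(\nu)}|>a$ for all $\nu$. Along this subsequence, $\xi^{(\nu)}/x_1^{(\nu)}$ converges to $0$, and we can write

$$
\frac{\xi^{(\nu)}}{x_1^{(\nu)}}=\xi_1+\frac{x_2^{(\nu)}}{x_1^{(\nu)}}\xi_2+\dots+\frac{x_n^{(\nu)}}{x_1^{(\nu)}}\xi_n.
$$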

Taking the limit, we see $\xi_1$ is a linear combination of $\xi_2,\dots,\xi_n$ and this is absurd. $\square$

We will need this proposition to work with finite field extensions.

- Enrico Bombieri and Walter Gubler, *Heights in Diophantine Geometry*.
- Serge Lang, *Algebra, Revised Third Edition*.
- Dinakar Ramakrishnan and Robert J. Valenza, *Fourier Analysis on Number Fields*.

In our previous post on the irreducible representations of $SU(2)$ and $SO(3)$, the irreducible representations of $SU(2)$ have been determined explicitly: $V_n=\operatorname{Sym}^n\mathbb{C}^2$, and irreducible representations $W_n$ of $SO(3)$ correspond to $V_{2n}$.

The result is satisfying for $SU(2)$ but not for $SO(3)$. We would hope that $W_n$ has something to do with $\mathbb{R}^3$, but $V_{2n}$ does not. In this post, we are delivering a much clearer characterisation of $W_n$.

This post would be relatively easier to read than the previous post. Other than the basic language of representation theory (of Lie groups), only multivariable calculus and linear algebra are needed.

The group $SO(3)$ has a rich background in physics. See, for example, “Why do we look at the representations of $SO(3)$ in QM?“ on Physics Stack Exchange.

Like in the previous post, we first determine a good playground and then show that this is all we need. The playground here is

$$
P_\ell=\mathbb{C} \otimes_\mathbb{R}\operatorname{Sym}^\ell\mathbb{R}^3.
$$

The reason for the symmetric product of $\mathbb{R}^3$ is simple: we will be working on homogeneous polynomials (these functions can be considered to be defined on the unit sphere). We complexify this space to make sure that we will not worry about eigenvalues (of $SO(3)$). In other words, $P_\ell$ is the complex vector space of homogeneous polynomials in three variables of degree $\ell$, viewed as functions on $\mathbb{R}^3$.

Recall that

$$
\dim\operatorname{Sym}^\ell\mathbb{R}^n=\binom{n+\ell-1}{\ell}.
$$

Therefore $\dim P_\ell=\frac{(\ell+2)(\ell+1)}{2}$, as a $\mathbb{C}$-vector space. We will extract what we want from spaces of this form.

The action of $SO(3)$, or $GL(3,\mathbb{R})$ in general, on $P_\ell$ is defined in a similar way. For any $A \in GL(3,\mathbb{R})$ and $f \in P_\ell$, we define

$$
(Af)(x)=f(xA).
$$

Here, $x=(x_1,x_2,x_3) \in \mathbb{R} \times \mathbb{R} \times \mathbb{R}$, and $xA$ is a product of $x$ and $A$ in the sense of matrix multiplication. It is easy to verify that this indeed gives rise to a group representation.

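To study this representation, we need to find some morphisms defined on $P_\ell$. The most obvious choice is the Laplacian, which is given by

$$
\Delta f=\frac{\partial^2 f}{\partial x_1^2}+\frac{\partial^2 f}{\partial x_2^2}+\frac{\partial^2 f}{\partial x_3^2}.
$$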

In other words, $\Delta$ is the trace of the Hessian matrix of $f$. Trace is used in representation theory to define character so there is a chance to find its good connection to the representation of $SO(3)$.

We shall also not forget the **kernel** of the Laplacian, whose elements are called **harmonic polynomials** of degree $\ell$ in this context:

$$
\mathfrak{H}_\ell=\{f \in P_\ell:\Delta f=0\}.
$$

Since functions in $P_\ell$ are homogeneous, the value of $f$ at a point $x$ is determined by the value at $\frac{x}{|x|} \in S^2$, the unit sphere. Therefore we also call $\mathfrak{H}_\ell$ the **spherical harmonics** of degree $\ell$. And we certainly need to know the nullity of $\Delta$.

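Lemma 1. The dimension of $\mathfrak{H}_\ell$ is $2\ell+1$.

*Proof.* To begin with, we need a more explicit expression of the Laplacian. First of all we perform a Taylor expansion of $f \in P_\ell$ with respect to the first variable $x_1$:

$$
f(x_1,x_2,x_3)=\sum_{k=0}^{\ell}\frac{x_1^k}{k!}f_k(x_2,x_3).
$$

Here, $f_k(x_2,x_3)$ is homogeneous of degree $\ell-k$ in $x_2,x_3$. Therefore we only need to study one term of the right hand side. Writing $\Delta_{2,3}=\frac{\partial^2}{\partial x_2^2}+\frac{\partial^2}{\partial x_3^2}$, we have

$$
\Delta\left(\frac{x_1^k}{k!}f_k\right)=\frac{x_1^{k-2}}{(k-2)!}f_k+\frac{x_1^k}{k!}\Delta_{2,3}f_k,
$$

where the first term is understood to be $0$ when $k<2$.

Now we can put them together naturally:

$$
\Delta f=\sum_{k=0}^{\ell-2}\frac{x_1^k}{k!}f_{k+2}+\sum_{k=0}^{\ell}\frac{x_1^k}{k!}\Delta_{2,3}f_k.
$$

Let’s try to explore the last term a little bit more. If $k=\ell-1$ or $\ell$, then $f_k$ is of degree $1$ or $0$ respectively, and consequently its second order derivatives are $0$. Therefore we write

$$
\Delta f=\sum_{k=0}^{\ell-2}\frac{x_1^k}{k!}\left(f_{k+2}+\Delta_{2,3}f_k\right).
$$

It follows that $\Delta{f}=0$ if and only if

$$
f_{k+2}=-\Delta_{2,3}f_k, \quad 0 \le k \le \ell-2.
$$

Therefore, once $f_0$ and $f_1$ are determined, all of $f_k$ are determined and so is $f$ itself. Translating into the language of linear algebra,

$$
\mathfrak{H}_\ell \cong P_\ell^2 \oplus P_{\ell-1}^2,
$$

where $P_k^2$ is the space of homogeneous polynomials of degree $k$ in two variables, hence is isomorphic to $\mathbb{C} \otimes_\mathbb{R}\operatorname{Sym}^k \mathbb{R}^2$, and we therefore have

$$
\dim P_k^2=k+1.
$$

To conclude,

$$
\dim\mathfrak{H}_\ell=(\ell+1)+\ell=2\ell+1.
$$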

$\square$
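The count $\dim\mathfrak{H}_\ell=2\ell+1$ can be sanity-checked by brute force. The following sketch writes the Laplacian as a matrix from degree-$\ell$ monomials to degree-$(\ell-2)$ monomials and computes its nullity with exact rational arithmetic. All function names are illustrative, and this is a numerical cross-check rather than the argument of the lemma.

```python
from fractions import Fraction

def monomials(degree):
    """Exponent triples (a, b, c) with a + b + c = degree."""
    if degree < 0:
        return []
    return [(a, b, degree - a - b)
            for a in range(degree + 1)
            for b in range(degree - a + 1)]

def laplacian_matrix(degree):
    """Matrix of the Laplacian from P_degree to P_(degree-2) in monomial bases."""
    cols = monomials(degree)
    rows = monomials(degree - 2)
    index = {m: i for i, m in enumerate(rows)}
    matrix = [[Fraction(0)] * len(cols) for _ in rows]
    for j, mono in enumerate(cols):
        for axis in range(3):
            e = mono[axis]
            if e >= 2:  # second derivative of x^e is e(e-1) x^(e-2)
                target = list(mono)
                target[axis] -= 2
                matrix[index[tuple(target)]][j] += e * (e - 1)
    return matrix, len(cols)

def rank(matrix):
    """Rank via Gaussian elimination over the rationals."""
    m = [row[:] for row in matrix]
    rk = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((r for r in range(rk, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(rk + 1, len(m)):
            if m[r][col] != 0:
                factor = m[r][col] / m[rk][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[rk])]
        rk += 1
    return rk

def harmonic_dimension(degree):
    """Nullity of the Laplacian on homogeneous polynomials of the given degree."""
    matrix, dim_source = laplacian_matrix(degree)
    return dim_source - rank(matrix)

for l in range(6):
    print(l, harmonic_dimension(l), 2 * l + 1)
```

Working over $\mathbb{Q}$ instead of $\mathbb{C}$ is harmless here because the matrix has integer entries, so its rank does not change under field extension.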

Recall that $\dim W_n=2n+1$. This should not be a coincidence, and we shall dive into it right now. To do this we immediately establish the connection between $\Delta$ and $SO(3)$.

Lemma 2.The action of the Laplacian on $C^\infty(\mathbb{R}^3;\mathbb{C})$ (which contains $P_\ell$ for all $\ell \ge 0$) commutes with the action of $SO(3)$, i.e. $\Delta$ is $SO(3)$-equivariant.

*Proof.* Really routine verification. $\square$

As a result, we have a very important result:

Theorem 1.The space $\mathfrak{H}_\ell$ is an $SO(3)$-invariant subspace of $P_\ell$.

We start with a direct observation of matrices in $SO(3)$. Roughly speaking, an $SO(3)$ rotation can be “downgraded” into a rotation on a plane:

Lemma 3. Every element in $SO(3)$ is conjugate to $R(t)$ for some $t \in \mathbb{R}$, where

$$
R(t)=\begin{pmatrix}1&0&0\\0&\cos{t}&-\sin{t}\\0&\sin{t}&\cos{t}\end{pmatrix}.
$$

*Proof.* Pick any $A \in SO(3)$. First of all we observe that $A$ has an eigenvalue $1$. Note

$$
\det(I-A)=\det(A^T)\det(I-A)=\det(A^T-I)=\det(A-I)=-\det(I-A),
$$

we therefore have $\det(I-A)=0$. Hence we can pick $v_1 \in \ker(I-A) \setminus \{0\}$ with norm $1$. Pick $v_2 \in \mathbb{R}^3$ perpendicular to $v_1$ with norm $1$ and $v_3=v_1 \times v_2$. Then $\{v_1,v_2,v_3\}$ is an orthonormal basis and $V=(v_1,v_2,v_3)$ is in $SO(3)$. The matrix $A$ is therefore conjugate to

$$
R=V^{-1}AV=\begin{pmatrix}1&0&0\\0&a&b\\0&c&d\end{pmatrix}.
$$

In particular, $R \in SO(3)$ also implies

$$
a^2+c^2=1, \quad b^2+d^2=1, \quad ab+cd=0, \quad ad-bc=1.
$$

Solving this equation system we must have $a=d$, $b=-c$ so that we can assign $a=\cos{t}$ and $c=\sin{t}$, and the result follows. $\square$

Since characters are invariant under conjugation, the study of the character of $SO(3)$ is reduced to $T$, the subgroup generated by matrices of the form $R(t)$. But direct computation is a nightmare so we try our best to do it elegantly. For this reason, we return to the irreducible representations of $SU(2)$ (there are only two variables, anyway). The canonical map $SU(2) \to SO(3)$ has a specific value:

One can refer to this document for the map above. Our study of characters is now reduced to $SU(2)$, because $\chi_{W_n}(R(t))=\chi_{V_{2n}}(e(t/2))$, using the facts that character is invariant under isomorphism and that $V_{2n} \cong W_n$. We can compute that
$$
\chi_{W_n}(R(t))=\chi_{V_{2n}}(e(t/2))=\sum_{k=-n}^{n}e^{ikt}.
$$

Now we are ready for the irreducible representations of $SO(3)$.

Since we basically have $\dim \mathfrak{H}_\ell=\dim W_\ell$, it is natural to guess that $\mathfrak{H}_\ell \cong W_\ell$, in the sense of $SO(3)$-modules, and the following theorem answers this question affirmatively.

Theorem 2.The space $\mathfrak{H}_\ell$ is isomorphic to $W_\ell$. In other words, irreducible $SO(3)$-modules are determined by spherical harmonics whose Laplacians are $0$.

*Proof.* We will use the fact that every finite-dimensional representation of a compact Lie group is completely reducible. (First of all, $SO(3)$ is compact as it is a closed and bounded subset of the space of real $3 \times 3$ matrices; see p. 3 of this document. On the other hand, the fact that every compact Lie group is completely reducible can be found in section 3 of this document.)

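Therefore we have

$$
\mathfrak{H}_\ell \cong \bigoplus_{\nu}W_{n_\nu}
$$

where each $W_{n_\nu}$ is an irreducible representation of $SO(3)$. Applying dimension on both sides yields

$$
2\ell+1=\sum_{\nu}(2n_\nu+1).
$$

To prove that $\mathfrak{H}_\ell \cong W_{\ell}$, it suffices to show that $n_\nu \ge \ell$ for some $n_\nu$. On the other hand, applying characters on both sides yields

$$
\chi_{\mathfrak{H}_\ell}(R(t))=\sum_{\nu}\sum_{k=-n_\nu}^{n_\nu}e^{ikt}.
$$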

In other words, the character is a linear combination of $\exp(ikt)$ with $|k| \le \max_\nu n_\nu$ for all $k$. Each $\exp(ikt)$ appearing above is an eigenvalue of the action of $R(t)$. We have no idea about the distribution of $k$ and we don’t have to, because our job is done if we can show that the action of $R(t)$ has an eigenvalue $\exp(-i\ell t)$.

To do this, consider the vector $f(x_1,x_2,x_3)=(x_2+ix_3)^\ell$. It lies in $\mathfrak{H}_\ell$ because $\frac{\partial^2 f}{\partial x_2^2}=\ell(\ell-1)(x_2+ix_3)^{\ell-2}=-\frac{\partial^2 f}{\partial x_3^2}$, and it is an eigenvector because

$$
(R(t)f)(x)=f(xR(t))=\left(x_2\cos{t}+x_3\sin{t}+i(-x_2\sin{t}+x_3\cos{t})\right)^\ell=e^{-i\ell t}(x_2+ix_3)^\ell.
$$

$\square$
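The eigenvector claim is easy to test numerically. The sketch below assumes that $R(t)$ is the rotation about the $x_1$-axis by angle $t$ and that the action is $(R(t)f)(x)=f(xR(t))$ with $x$ a row vector; both the convention and the helper names are assumptions made for this example.

```python
import cmath
import math

def rotate(x, t):
    """Row vector x times R(t), with R(t) assumed to rotate about the x_1-axis."""
    x1, x2, x3 = x
    c, s = math.cos(t), math.sin(t)
    return (x1, x2 * c + x3 * s, -x2 * s + x3 * c)

def f(x, l):
    """The polynomial (x_2 + i x_3)^l evaluated at x."""
    return complex(x[1], x[2]) ** l

l, t = 4, 0.9
x = (0.3, -1.1, 0.7)
lhs = f(rotate(x, t), l)                 # (R(t) f)(x)
rhs = cmath.exp(-1j * l * t) * f(x, l)   # exp(-i l t) f(x)
print(abs(lhs - rhs) < 1e-12)
```

With the opposite sign convention for $R(t)$ one would find the eigenvalue $\exp(i\ell t)$ instead, which serves the argument equally well since only $|k|=\ell$ matters.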

There are no even-dimensional irreducible representations of $SO(3)$. This is what every reader has to take away from this post.

We find the eigenvalue because it shows that $\exp(-i\ell t)$ appears in the summand of $\chi_{\mathfrak{H}_\ell}(R(t))$, hence $|-\ell|=\ell \le \max n_\nu$. Since $\{n_\nu\}$ is finite, the maximum can be attained, and therefore our argument on dimension is done.

The representations of $U(2)$ can be deduced algebraically, for one only needs to notice that $U(2) = (S^1 \times SU(2))/H$, where $H=\{(1,I),(-1,-I)\}$. One will also need an odd-even argument just like the one we used for $SO(3)$.

Likewise, since $O(3)=SO(3) \times \mathbb{Z}/2\mathbb{Z}$, we can deduce irreducible representations of $O(3)$ in a similar fashion.

Representation theory is important in various branches of mathematics and physics. When studying representations of finite groups, we have quite some algebra and combinatorics. When differentiation (more precisely, smoothness) joins the party, we have Lie groups, involving calculus, linear algebra, geometry and much more. In particular, theories around $SU(2)$ and $SO(3)$ are of great importance. On one hand, they are among the simplest non-elementary, higher-dimensional Lie groups. On the other hand, they describe rotations in $\mathbb{C}^2$ and $\mathbb{R}^3$ respectively, which is “physically realistic”. I believe students in physics have more to say.

In this post we develop a way to study irreducible representations of these two Lie groups, in a mathematician’s way. I try my best to make sure that everything is down-to-earth, and everything can be “reduced” to 19th-century (pre-modern) mathematics.

Nevertheless, the reader is assumed to be familiar with the elementary language of representation theory (and, as you know, there is a lot of abuse of language), which I think is not a problem because otherwise you wouldn’t be reading this post. You need to recall eigenvalue theory in linear algebra, as well as Fourier series. We need the fact that the trigonometric system is complete. In other words, trigonometric polynomials are dense in the space of continuous periodic functions. $\def\sym{\operatorname{Sym}}$

We will first study $SU(2)$ and a first classification of irreducible representations of $SO(3)$ follows at once. This is because we have an isomorphism

$$
SU(2)/\{I,-I\} \cong SO(3).
$$

This is to say, $SU(2)$ is a “double cover” of $SO(3)$. To see this, notice that $SU(2) \cong S^3$ and $SO(3) \cong \mathbb{R}P^3$ as Lie groups, meanwhile $\mathbb{R}P^3 \cong S^3/\{-1,1\}$ can be considered as the definition.

Of course, by representation we mean finite dimensional and unitary representations.

Indeed it seems we have nowhere to start. Instead of trying to find all of them, we will try to work on seemingly immediate representations and it turns out that they are all we are looking for.

Let $V_0$ be the trivial representation on $\mathbb{C}$ and $V_1$ be the standard representation on $\mathbb{C}^2$, which is given by ordinary matrix multiplication. These representations are irreducible. We want to extend this family to $V_n$ for $n \ge 2$. It is natural to think about generating representations of higher dimensions through $V_1$. Here are several ways available.

- Direct sum: $\bigoplus_{i=1}^{n}V_1$. The dimension is $2n$ and unfortunately, the representation is determined by each component so essentially there is no “new thing”.
- Tensor product: $\bigotimes_{i=1}^{n}V_1$. The dimension is $2^n$ which is way too big.
- Wedge product: $\bigwedge^{n}V_1$. It stops at $n=2$ and we have to deal with $u \wedge v = - v \wedge u$. This can be annoying.
- Symmetric product: $\sym^{n}V_1$. The dimension is $n+1$ and it doesn’t stop. Besides, it can be understood as homogeneous polynomials of degree $n$ in two variables. This is a fantastic choice. Besides we have $\sym^0 V_1=V_0$ so nothing is abruptly excluded.

Put $V_n=\sym^nV_1$, which can be understood as the space of homogeneous polynomials of degree $n$ in variables $z_1$ and $z_2$. $V_n$ therefore has a canonical basis

$$
P_k=z_1^kz_2^{n-k}, \quad 0 \le k \le n.
$$

And we will make use of it later.

For each $g \in SU(2)$, we have a left action

In other words, $\rho(g)P(z)=P(zg)$ where $z=(z_1,z_2)$ and $zg$ is matrix multiplication. Each $g \in SU(2)$ has matrix representation

Then

When there is no confusion, we will write $gP(z)=P(zg)$, viewing $g$ itself as an automorphism of $V_n$. One can also replace $SU(2)$ with $GL(2,\mathbb{C})$ but we are not studying that bigger one.

Since $z \mapsto zg$ is a homogeneous map of degree $1$ as it is linear and is non-degenerate, we have $gP(z) \in V_n$. In other words, $V_n$ are $SU(2)$-invariant. **We now have a well-defined representation.** Note $V_0=\mathbb{C}$ so the representation is trivial, and $V_1=\mathbb{C}^2$ yields linear maps. Again, nothing is abruptly excluded. Even more satisfyingly, those $V_n$ are all irreducible.

Proposition 1.The representations $V_n$ are irreducible.

*Proof.* By Schur’s lemma, we need to show that each $SU(2)$-equivariant automorphism $A$ of $V_n$ is a non-zero multiple of the identity, i.e. $A=\lambda I$ for some $\lambda \ne 0$. By definition, for each $g \in SU(2)$, we have $A\rho(g)P=\rho(g)AP$ for all $P \in V_n$. And for simplicity we write $Ag=gA$, realising $g$ as a linear transform of $V_n$, instead of an element of $SU(2)$.

The group $SU(2)$ can be complicated, but $U(1) \cong S^1$ is simple and can be considered as a subgroup of $SU(2)$ in two ways. We show that these two ways are just enough to expose the irreducibility of $V_n$.

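First of all we embed $S^1$ into $SU(2)$ by

$$
a \mapsto \begin{pmatrix}a&0\\0&a^{-1}\end{pmatrix}.
$$

Call the matrix on the right hand side $g_a$. Then

$$
g_aP_k=(az_1)^k(a^{-1}z_2)^{n-k}=a^{2k-n}P_k
$$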

for all $k$. This is to say, $P_k$ is the **eigenvector** corresponding to **eigenvalue** $a^{2k-n}$. As $g_aA=Ag_a$, information on eigenvalues and eigenvectors can help a lot so we dig into it first.

Since $\{P_k\}$ are linearly independent, under this basis, we have a matrix representation

$$
g_a=\operatorname{diag}(a^{-n},a^{-n+2},\dots,a^{n})
$$

but we don’t know how eigenspaces are spanned because we may have $a^j=a^k$ for $j \ne k$. However, the number $a$ can always be chosen so that $a^{-n},a^{-n+2},\dots,a^n$ are pairwise distinct (for example, one can pick $a$ to be a primitive $m$-th root of $1$ with $m>2n$). As a result, $g_a$ has $n+1$ distinct eigenvalues. Therefore, the $a^{2k-n}$-eigenspace is spanned by $P_k$.
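To see how large $m$ has to be, note that for a primitive $m$-th root of unity $a$, the powers $a^{2k-n}$ with $0 \le k \le n$ are pairwise distinct exactly when the exponents $2k-n$ are pairwise distinct modulo $m$. The small sketch below (the function name is made up) checks this with integer arithmetic only.

```python
def exponents_distinct(n, m):
    """True if a^(2k-n), 0 <= k <= n, are pairwise distinct for a primitive m-th root a."""
    residues = {(2 * k - n) % m for k in range(n + 1)}
    return len(residues) == n + 1

n = 4
print([m for m in range(1, 12) if exponents_distinct(n, m)])
```

Any $m>2n$ works, since two exponents differ by at most $2n$; some smaller odd $m$ may work as well.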

On the other hand, since $A$ commutes with $g_a$, we have

$$
g_a(AP_k)=A(g_aP_k)=a^{2k-n}AP_k.
$$

Hence $AP_k$ lies in the $a^{2k-n}$-eigenspace. Therefore we have $AP_k=c_kP_k$ for some $c_k \ne 0$. In other words, $P_k$ is a $c_k$-eigenvector of $A$. We obtain another matrix representation under the basis $\{P_k\}$

$$
A=\operatorname{diag}(c_0,c_1,\dots,c_n).
$$

We want this matrix to be a scalar matrix. The result follows from another embedding of $U(1)$ into $SU(2)$. Note $a \in S^1$ can be determined by $t \in [0,2\pi)$, and we therefore have a matrix

Still we have $Ag_t=g_tA$. As we can see,

This follows from our observation on eigenvalues. Next, we immediately use the eigenvalue $c_n$ to obtain

This is the definition of $g_tP_n$. Comparing coefficients of $P_k$, we must have $c_k=c_n$ for all $0 \le k \le n$. Recall that $\{P_k\}$ is a basis so coefficients must be unique for a given vector. But we have already obtained what we want: $A=c_n I$. $\square$

So far we have used diagonalisation of representations of $SU(2)$ but the diagonalisation of $SU(2)$ itself is not touched yet. Neither have we made use of character functions. So now we invite them to the party.

Let’s recall diagonalisation in $SU(2)$. Pick $g \in SU(2)$. First of all it is diagonalisable. Let $\lambda_1$ and $\lambda_2$ be its two eigenvalues, then $\det{g}=\lambda_1\lambda_2=1$. Therefore we have

$$
g \sim \begin{pmatrix}\lambda&0\\0&\lambda^{-1}\end{pmatrix}
$$

where $\lambda$ is one of the eigenvalues of $g$. Since the diagonalised matrix is still in $SU(2)$, we have $|\lambda|=1$, i.e., $\lambda \in S^1$. We therefore write $g \sim e(t) \sim e(-t)$ where

$$
e(t)=\begin{pmatrix}e^{it}&0\\0&e^{-it}\end{pmatrix}.
$$

We see, $e(s) \sim e(t)$ if and only if $s = \pm t \mod 2\pi$. By periodicity of $\exp$ function, we also see $e(t)$ is in particular $2\pi$-periodic. If $f:SU(2) \to \mathbb{C}$ is a class function, then $f \circ e:\mathbb{R} \to \mathbb{C}$ is an even $2\pi$-periodic function. Conversely, given an even $2\pi$-periodic function $h:\mathbb{R} \to \mathbb{C}$, we can recover it as a class function, and the process is as follows.

Define $\Lambda:SU(2) \to S^1$ sending $g \in SU(2)$ to the eigenvalue of $g$ with non-negative imaginary part (one can also pick the non-positive one, because $h$ is even). Then $E:SU(2) \to [0,\pi]$ given by $g \mapsto \frac{1}{i}\log\Lambda(g)$ is a well defined function sending $g$ into $\mathbb{R}$ and $h \circ E:SU(2) \to \mathbb{C}$ is a class function. Besides we have $E \circ e(t)= \pm t \mod 2\pi$ and $e \circ E(g)$ is the diagonalisation of $g$. Therefore $h \circ E \circ e(t)=h(t)$ and $f \circ e \circ E(g)=f(g)$ as is expected.

With help of this $e(t)$ and $E(t)$, we have this correspondence

Recall that the space on the right hand side has a countable uniform basis

In other words, $\{\cos{nt}\}_{n \ge 0}$ spans a dense subspace. This is about the completeness of trigonometric system. Since there are only even functions, $\sin{nt}$ are excluded. For a reference to the completeness, one can check 4.25 *Real and Complex Analysis* by W. Rudin.

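For class functions, we certainly want to know about characters. Let $\chi_n$ be the character of $V_n$, then

$$
\chi_n(e(t))=\sum_{k=0}^{n}e^{i(2k-n)t}.
$$

When $t \in \pi\mathbb{Z}$, then $\chi_n(e(t)) \in \mathbb{Z}$. Otherwise, as a classic exercise in calculus, we have

$$
\chi_n(e(t))=\frac{\sin((n+1)t)}{\sin{t}}=:\kappa_n(t).
$$

We have $\kappa_0(t)=1$. For $\kappa_n(t)$ when $n >0$, we have

$$
\kappa_n(t)=\cos{nt}+\kappa_{n-1}(t)\cos{t}.
$$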

We see $\kappa_1(t)=2\cos{t}$. By induction, every $\kappa_n(t)$ is a polynomial in variables $1,\cos{t},\dots,\cos{nt}$. Therefore $\{\kappa_n(t)\}_{n \ge 0}$ spans the same space as $\{\cos{nt}\}_{n \ge 0}$, which is dense in the space of even $2\pi$-periodic functions. Note the $\kappa_n(t)$ are linearly independent, because the leading term is $\cos{nt}$.
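The character formula and the recursion can be checked numerically. The sketch below assumes that $e(t)$ acts on the monomial basis of $V_n$ with eigenvalues $e^{i(2k-n)t}$ for $0 \le k \le n$ and that $\kappa_n(t)=\sin((n+1)t)/\sin{t}$; the function names are illustrative.

```python
import cmath
import math

def chi(n, t):
    """Trace of e(t) on V_n, assuming eigenvalues exp(i(2k-n)t), 0 <= k <= n."""
    return sum(cmath.exp(1j * (2 * k - n) * t) for k in range(n + 1))

def kappa(n, t):
    """Closed form sin((n+1)t)/sin(t), valid when t is not a multiple of pi."""
    return math.sin((n + 1) * t) / math.sin(t)

t = 0.7
for n in range(1, 6):
    closed = kappa(n, t)
    recursive = math.cos(n * t) + kappa(n - 1, t) * math.cos(t)
    print(n, abs(chi(n, t) - closed) < 1e-12, abs(closed - recursive) < 1e-12)
```

The recursion is just the addition formula $\sin((n+1)t)=\sin{nt}\cos{t}+\cos{nt}\sin{t}$ divided by $\sin{t}$.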

The argument above shows that the $\chi_n$ span a dense subspace in the space of class functions. In other words, $\{\chi_n\}$ is the Fourier basis of class functions. As we all know, Fourier series are powerful. Let’s see how powerful they are in the calculus of the Lie group $SU(2)$ itself.

Proposition 2.For continuous class function $f:SU(2) \to \mathbb{C}$, we have

*Proof.* On one hand, since the $V_n$ are irreducible, by fixed point theorem of representations,

Here, for a group $G$ and a representation $V$, $V^G$ is the fixed point set, i.e. the space of elements that are fixed by the action of $G$ on $V$. Since $V_n$ is irreducible, fixed points can only be $0$ unless the representation itself is trivial. Now we move on and check the right hand side.

On the right hand side we are looking at even $2\pi$-periodic continuous functions, reflecting the denseness of $\kappa_n(t)$. However, the plain integral $\int_{-\pi}^{\pi}\kappa_n(t)dt$ does not vanish for all $n>0$; for example, $\int_{-\pi}^{\pi}\kappa_2(t)dt=2\pi$. On the other hand, if we multiply $\kappa_m(t)\kappa_n(t)$ by $\sin^2{t}$, then it is transformed into the form $\sin{(m+1)t}\sin{(n+1)t}$ and we are familiar with this orthogonality. More precisely,

Since the functional $h \mapsto \frac{1}{2\pi}\int_{-\pi}^{\pi}h\sin^2{t}dt$ is continuous in the uniform topology and $\kappa_n$ spans a dense subspace, the result is now obtained. $\square$
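The orthogonality behind this argument can be observed numerically. Assuming $\kappa_n(t)=\sin((n+1)t)/\sin{t}$, the product $\kappa_m\kappa_n\sin^2{t}$ equals $\sin((m+1)t)\sin((n+1)t)$, whose integral over $[-\pi,\pi]$ is $\pi$ if $m=n$ and $0$ otherwise. The sketch below uses a midpoint rule and leaves out the normalising constant of the proposition; the names are illustrative.

```python
import math

def kappa(n, t):
    """sin((n+1)t)/sin(t), assumed closed form of the character of V_n at e(t)."""
    return math.sin((n + 1) * t) / math.sin(t)

def weighted_integral(m, n, samples=2000):
    """Midpoint-rule approximation of the integral of kappa_m kappa_n sin^2 over [-pi, pi]."""
    h = 2 * math.pi / samples
    total = 0.0
    for j in range(samples):
        t = -math.pi + (j + 0.5) * h   # midpoints never hit multiples of pi
        total += kappa(m, t) * kappa(n, t) * math.sin(t) ** 2
    return total * h

for m in range(3):
    print([round(weighted_integral(m, n) / math.pi, 6) for n in range(3)])
```

Up to rounding, the printed table is the identity matrix, which is the orthogonality of the $\kappa_n$ with respect to the weight $\sin^2{t}$.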

Finally, surprisingly and satisfyingly enough, the denseness has actually ruled out all other possibilities of irreducible representations. In other words, our search in symmetric products is optimal. We can see this through Parseval’s identity. This is the heart of this blog post.

Proposition 3.Every irreducible representation of $SU(2)$ is isomorphic to one of the $V_n$.

*Proof.* Suppose we have an irreducible character $\chi$ that is different from all of the $\chi_n$. Then the orthonormality shows that $\langle \chi,\chi_n \rangle = 0$ for all $n \ge 0$ and $\langle \chi,\chi \rangle=1$. Now let’s see why this is absurd.

Since $\{\chi_n\}_{n \ge 0}$ spans a dense subspace in the space of class functions, we actually have

Therefore

and

It is impossible to have the sum of $0$ to be $1$. $\square$

Now we head to $SO(3)$. In fact the result follows immediately from the surjection

$$
\pi:SU(2) \to SO(3).
$$

We have $\ker\pi=\{-I,I\}$. Let $W$ be a representation of $SO(3)$, i.e., we have a map

$$
\rho:SO(3) \to GL(W).
$$

Then

$$
\pi^\ast\rho:SU(2) \to GL(W)
$$

by $g \mapsto \rho(\pi(g))$ is an induced representation, and we write $\pi^\ast W$. If $W$ is irreducible, then $\pi^\ast W$ is also irreducible. In particular, $\pi^\ast\rho(-I)=\operatorname{id}_W$.

On the other hand, if $\vartheta:SU(2) \to GL(V)$ is an irreducible representation where $\vartheta(-I)=\operatorname{id}_V$, then we have an associated representation

$$
\pi_\ast\vartheta:SO(3) \cong SU(2)/\ker\pi \to GL(V)
$$

given by $g\ker\pi \mapsto \vartheta(g)$. Let’s denote it by $\pi_\ast V$. Again, if $V$ is irreducible, then $\pi_\ast V$ is irreducible.

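Therefore we have realised a correspondence

$$
\{\text{irreducible representations of }SO(3)\} \longleftrightarrow \{\text{irreducible representations of }SU(2)\text{ on which }-I\text{ acts as the identity}\}.
$$

So it remains to determine those of $SU(2)$. Let $\rho_n:SU(2) \to GL(V_n)$ be an irreducible representation, then

$$
\rho_n(-I)P(z)=P(-z)=(-1)^nP(z)
$$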

because $P \in \mathbb{C}[z_1,z_2]$ is homogeneous of degree $n$. Therefore $-I$ acts as an identity if and only if $n$ is even. We obtain

Proposition 4. Every irreducible representation of $SO(3)$ is of the form $W_n=\pi_\ast V_{2n}$, where $V_{2n}$ is described in proposition 2.

This is, of course, just a first classification. But to introduce a classification as explicit as what we have done for $SU(2)$, there has to be another post. As a quick overview, here is the result.

Let $P_{\ell}$ be the complex vector space of homogeneous polynomials in three variables of degree $\ell$, which can be considered as functions on $\mathbb{R}^3$ immediately. This setting makes sense immediately, just as what we have done for $SU(2)$. Then, in fact,

$$
W_\ell \cong \{f \in P_\ell:\Delta f=0\}, \quad \text{where } \Delta \text{ is the Laplacian}.
$$

This is to say, $W_\ell$ can be understood as harmonic homogeneous polynomials in $\mathbb{R}^3$, which can also be considered to be uniquely determined on the unit sphere $S^2$.

- Theodor Bröcker and Tammo tom Dieck, *Representations of Compact Lie Groups*.
- Walter Rudin, *Real and Complex Analysis, 3rd Edition*.

We want to compute the Fourier transform

As one can expect, the computation can be quite interesting, as $f_c(x)$ is related to the Gaussian integral in the following way:

Now we dive into this integral and see what we can get.

Let’s admit, trying to compute the integral straightforward is somewhat unrealistic. So we need to go through an alternative way. For convenience (of writing MathJax codes) we may write $\varphi(t)=\hat{f}_c(t)$.

First of all, $\hat{f}_c(t)$ is always well-defined, because

so we can compute it without worrying about anything.

It’s hard to think about but we do have it. An integration by parts gives

On the other hand, we have

(The well-definedness of the integral can be verified easily.) Combining both, we obtain a differential equation

This differential equation corresponds to an integral equation

And we solve it to obtain

or alternatively,

Now put the initial value back in. As we have shown above, this reduces to the Gaussian integral

Therefore

is exactly what we want.

Before showing another method, we first ask a question: can we have $\hat{f}_c=f_c$? Solving an equation in the variable $c$ answers this question affirmatively:

In other words, $f_\frac{1}{2}$ is a fixed point of the Fourier transform. For this class of functions, the fixed point is this and only this one.
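A numerical sanity check is easy to write. The sketch below rests on two assumptions that are not visible in the text above: $f_c(x)=e^{-cx^2}$ with $c>0$, and the Fourier transform normalised as $\hat{f}(t)=\frac{1}{\sqrt{2\pi}}\int f(x)e^{-ixt}dx$. Under these assumptions the closed form is $\frac{1}{\sqrt{2c}}e^{-t^2/(4c)}$, which is consistent with $c=\frac{1}{2}$ being the fixed point.

```python
import math


def f(c, x):
    """Assumed form of f_c: a Gaussian exp(-c x^2)."""
    return math.exp(-c * x * x)


def fourier(c, t, half_width=12.0, steps=6000):
    """Trapezoidal approximation of (1/sqrt(2 pi)) * integral of f_c(x) e^{-ixt} dx.

    f_c is even, so only the cosine part of e^{-ixt} contributes.
    """
    h = 2 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        x = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * f(c, x) * math.cos(x * t)
    return total * h / math.sqrt(2 * math.pi)


def closed_form(c, t):
    """Closed form under the assumed normalisation."""
    return math.exp(-t * t / (4 * c)) / math.sqrt(2 * c)
```

Comparing `fourier(c, t)` with `closed_form(c, t)` for a few values of `c` and `t` is a quick way to catch a sign or normalisation slip in the derivation.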

We can also make use of the Gaussian integral to get what we want.

As a classic property of the Fourier transform, for $f,g \in L^1$, we have

where

By the way, $f \in L^1$ means $\int_{-\infty}^{\infty}|f(x)|dx<\infty$. One can verify that $f \ast g \in L^1$ here as well.

With this result, we can compute $f_a \ast f_b$ easily. Note

We expect that there exist some $\gamma$ and $c$ such that $f_a \ast f_b = \gamma f_c$. In other words, we are looking for $\gamma,c \in \mathbb{R}$ such that

We should have

We also have

Therefore

where $c$ is given above. We do not even have to compute the integral of convolution explicitly.
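As a cross-check, the same conclusion can be reached by completing the square. This is a sketch under two assumptions that are not visible in the text above: $f_c(x)=e^{-cx^2}$ with $c>0$, and the convolution normalised as $(f \ast g)(x)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}f(x-y)g(y)dy$. Since

$$a(x-y)^2+by^2=(a+b)\left(y-\frac{ax}{a+b}\right)^2+\frac{ab}{a+b}x^2,$$

the Gaussian integral gives

$$(f_a \ast f_b)(x)=\frac{1}{\sqrt{2\pi}}e^{-\frac{ab}{a+b}x^2}\int_{-\infty}^{\infty}e^{-(a+b)u^2}du=\frac{1}{\sqrt{2(a+b)}}f_{\frac{ab}{a+b}}(x).$$

With a different normalisation of the convolution only the constant in front changes.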

This post is intended to supply a detailed proof of the Riemann mapping theorem.

Riemann mapping theorem. Every simply connected region $\Omega \subsetneq \mathbb{C}$ is conformally equivalent to the open unit disc $U$.

Fortunately the proof can be found in many textbooks of complex analysis, but it is fairly technical and can be painful to read. This post can be considered as a painkiller: you will see the proof filled in with many details. However, the writer still encourages the reader to reproduce the proof with their own pen and paper. The writer also hopes that this post can increase the accessibility of this theorem and its proof.

However, there is a bar. We need to assume some background in complex analysis, although it is very basic. The minimal prerequisite is being comfortable with the following topics and questions.

- Contour integration, Cauchy’s formula.
- Almost uniform convergence. Let $\Omega \subset \mathbb{C}$ be open and suppose that $f_j \in H(\Omega)$ for all $j=1,2,\dots$, and $f_j \to f$ uniformly on every compact subset $K \subset \Omega$. Is $f \in H(\Omega)$? What is the limit of $f_j'$? Informally, we call the phenomenon that a sequence of functions converges uniformly on every compact subset *almost uniform* convergence. This has nothing to do with *almost everywhere* in integration theory. In fact, this post does not require background in Lebesgue integration theory.
- Open mapping theorem (complex analysis version).
- Maximum modulus principle and some variants.
- Rouché’s theorem. Or even more, the calculus of residues.

Despite the prerequisites, we still need some preparation beforehand.

Definition 1. Let $X$ be a *connected* topological space. Let $\gamma:[0,1] \to X$ be a closed curve, i.e., a continuous map such that $\gamma(0)=\gamma(1)$. We say $\gamma$ is null-homotopic if it is homotopic to a constant map $\gamma_0:[0,1] \to \{x\}$ with $x \in X$. We say $X$ is *simply connected* if every closed curve in $X$ is null-homotopic.

Intuitively, if $X$ is simply connected, then $X$ contains no “hole”. For example, the unit disc $U$ is simply connected. However, $U \setminus \{0\}$ is not. On the other hand, $U \setminus [0,1)$ is still simply connected. Another satisfying result is that every convex open set is simply connected: a closed curve $\gamma$ is contracted to a point $x_0$ of the set by the convex combination $(s,t) \mapsto (1-t)\gamma(s)+tx_0$.

There are a lot of good properties of simply connected regions, which are summarised below.

Proposition 1. For a region $\Omega$ (an open and connected subset of $\mathbb{R}^2$), the following nine conditions are equivalent: each one implies the other eight.

1. $\Omega$ is homeomorphic to the open unit disc $U$.
2. $\Omega$ is simply connected.
3. $\operatorname{Ind}_\gamma(\alpha)=0$ for every closed path $\gamma$ in $\Omega$ and every $\alpha \in S^2 \setminus \Omega$, where $S^2$ is the Riemann sphere.
4. $S^2 \setminus \Omega$ is connected.
5. Every $f \in H(\Omega)$ can be approximated by polynomials, almost uniformly.
6. For every $f \in H(\Omega)$ and every closed path $\gamma$ in $\Omega$, $$\int_\gamma f(z)dz=0.$$
7. Every $f \in H(\Omega)$ has an anti-derivative. That is, there exists an $F \in H(\Omega)$ such that $F'=f$.
8. If $f \in H(\Omega)$ and $1/f \in H(\Omega)$, then there exists a $g \in H(\Omega)$ such that $f=\exp{g}$.
9. For such $f$, there also exists a $\varphi \in H(\Omega)$ such that $f=\varphi^2$.

Conditions 5 to 9 are pretty much saying that calculus is fine here and we need not worry about nightmare counterexamples, to some extent. Most of the implications $n \implies n+1$ are not that difficult, but some deserve a mention. 4 implying 5 is a consequence of Runge’s theorem. In the implication of 7 to 8, one needs to use the fact that $\Omega$ is connected. When we have $f=\exp{g}$, we can put $\varphi=\exp\frac{g}{2}$, from which we obtain $f=\varphi^2$. 9 implying 1 is partly a consequence of the Riemann mapping theorem. Indeed, if $\Omega$ is the plane then the homeomorphism is easy: $z \mapsto \frac{z}{1+|z|}$ is a homeomorphism of $\Omega$ onto $U$. But we need the Riemann mapping theorem to give the remaining part, when $\Omega$ is a proper subset.
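The homeomorphism of the plane onto $U$ mentioned above has an explicit inverse, $w \mapsto \frac{w}{1-|w|}$, and both directions can be tried out numerically. A minimal sketch, with function names chosen only for this illustration:

```python
def to_disc(z):
    """Map a point of the plane into the open unit disc: z / (1 + |z|)."""
    return z / (1 + abs(z))


def to_plane(w):
    """Inverse map, defined for |w| < 1: w / (1 - |w|)."""
    return w / (1 - abs(w))


samples = [0j, 3 + 4j, -100j, 1000 + 0j]
images = [to_disc(z) for z in samples]
round_trips = [to_plane(w) for w in images]
```

Note that this map is a homeomorphism but not holomorphic, so it says nothing about conformal equivalence.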

If you know the definition of sheaf, you will realise that $(\mathbb{C},H(\cdot))$ is indeed a sheaf. For each open subset $\Omega \subset \mathbb{C}$, $H(\Omega)$ is a ring, even more precisely, a $\mathbb{C}$-algebra. The exponential map $\exp:g \mapsto e^g$ is a sheaf morphism. However, we now see that it is surjective if and only if $\Omega$ is simply connected. I hope this can help you figure out an exercise in algebraic geometry. You know, that celebrated book by Robin Hartshorne.

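Since we haven’t proved the Riemann mapping theorem yet, we cannot use the full equivalence above. However, we can use 9 right away, because it follows from simple connectedness through the chain of implications from 2 to 9, none of which relies on the Riemann mapping theorem. This gives rise to Koebe’s square root trick.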

Equicontinuity is quite an important concept. You may have seen it in differential equations, harmonic functions, or maybe just sequences of functions. We will use it to describe a family of functions for which almost uniform convergence can be well established.

Definition 2. Let $\mathscr{F}$ be a family of functions $(X,d) \to \mathbb{C}$ where $(X,d)$ is a metric space.

- We say that $\mathscr{F}$ is *equicontinuous* if, to every $\varepsilon>0$, there corresponds a $\delta>0$ such that whenever $d(x,y)<\delta$, we have $|f(x)-f(y)|<\varepsilon$ for all $f \in \mathscr{F}$. In particular, by definition, all functions in $\mathscr{F}$ are uniformly continuous.
- We say that $\mathscr{F}$ is *pointwise bounded* if, to every $x \in X$, there corresponds some $0 \le M(x) < \infty$ such that $|f(x)| \le M(x)$ for every $f \in \mathscr{F}$.
- We say that $\mathscr{F}$ is *uniformly bounded on each compact subset* if, to each compact $K \subset X$, there corresponds a number $M(K)$ such that $|f(z)| \le M(K)$ for all $f \in \mathscr{F}$ and $z \in K$.

These concepts describe continuity and boundedness of a whole family at once. In our proof of the Riemann mapping theorem, we do not construct the map explicitly; instead, we use the concepts above to obtain one as a limit. In this post we simply put $X=\Omega \subset \mathbb{C}$, a simply connected region, with $d$ the usual Euclidean metric.

A famous result on equicontinuity is the Arzelà-Ascoli theorem, which says that pointwise boundedness and equicontinuity imply almost uniform convergence of a subsequence.

Theorem 1 (Arzelà-Ascoli). Let $\mathscr{F}$ be a family of complex functions on a metric space $X$, which is pointwise bounded and equicontinuous. Suppose $X$ is separable, i.e., it contains a countable dense set. Then every sequence $\{f_n\}$ in $\mathscr{F}$ has a subsequence that converges uniformly on every compact subset of $X$.

Here is a self-contained proof.

Certainly it is OK to let $X$ be a subset of $\mathbb{R}$, $\mathbb{C}$ or their product. We use this in real and complex analysis for this reason. We will need this almost uniform convergence to establish our conformal map. To specify its application in complex analysis, we introduce the concept of normal family.

Definition 3. Suppose $\mathscr{F} \subset H(\Omega)$, for some region $\Omega \subset \mathbb{C}$. We call $\mathscr{F}$ a *normal family* if every sequence of members of $\mathscr{F}$ contains a subsequence which converges uniformly on every compact subset of $\Omega$. The limit function is not required to be in $\mathscr{F}$.

We now apply Arzelà-Ascoli to complex analysis.

Theorem 2 (Montel). Suppose $\mathscr{F} \subset H(\Omega)$ is uniformly bounded on each compact subset of $\Omega$. Then $\mathscr{F}$ is a normal family.

*Proof.* Since uniform boundedness on compact subsets clearly implies pointwise boundedness, it suffices to show that $\mathscr{F}$ is “almost” equicontinuous; we can then apply Arzelà-Ascoli.

Let $\{K_n\}$ be a sequence of compact sets such that (1) $\bigcup_n K_n = \Omega$ and (2) $K_n \subset K^\circ_{n+1}$, where $K^\circ_{n+1}$ is the interior of $K_{n+1}$. Then for each $n$ there exists a positive number $\delta_n$ such that, for **every** $z \in K_n$,

$$D(z,2\delta_n) \subset K_{n+1},$$

where $D(a,r)$ is the disc centred at $a$ with radius $r$. If such $\delta_n$ does not exist, then there are points $z_j \in K_n$ such that $D(z_j,1/j) \setminus K_{n+1} \ne \varnothing$ for every $j$. Since $K_n$ is compact, a subsequence of $\{z_j\}$ converges to some $z \in K_n$, and then $D(z,\delta) \setminus K_{n+1} \ne \varnothing$ whenever $\delta>0$, which is to say, $z$ is not an interior point of $K_{n+1}$. But this is impossible because $z$ lies in the interior of $K_{n+1}$ by (2).

For such $\delta_n$, we pick $z',z'' \in K_n$ such that $|z'-z''| < \delta_n$. Let $\gamma$ be the positively oriented circle with centre at $z'$ and radius $2\delta_n$, i.e. the boundary of $D(z',2\delta_n)$. Recall that the Cauchy formula says

$$f(z)=\frac{1}{2\pi i}\int_\gamma\frac{f(\zeta)}{\zeta-z}d\zeta \quad (z \in D(z',2\delta_n)).$$

We will make use of this. By the formula above, we have

$$f(z')-f(z'')=\frac{z'-z''}{2\pi i}\int_\gamma\frac{f(\zeta)}{(\zeta-z')(\zeta-z'')}d\zeta.$$

Now we make use of our choice of $z'$, $z''$ and $\gamma$. By definition, for $\zeta \in \gamma^\ast$ (the range of $\gamma$), we have $|\zeta-z'|=2\delta_n$. Since $|z'-z''|<\delta_n$, we have $2\delta_n=|\zeta-z'|=|\zeta-z''+z''-z'|\le |\zeta-z''|+|z''-z'|$. Therefore $|\zeta-z''| \ge 2\delta_n-|z''-z'|>\delta_n$. Bearing this in mind, we see

$$|f(z')-f(z'')| \le \frac{M(K_{n+1})}{\delta_n}|z'-z''|.$$

This may look confusing so we explain it a little more. Since $D(z',2\delta_n) \subset K_{n+1}$ and $K_{n+1}$ is closed, we must have $\overline{D}(z',2\delta_n) \subset K_{n+1}$, therefore whenever $\zeta \in \gamma^\ast=\partial D(z',2\delta_n)$, we have $|f(\zeta)| \le M(K_{n+1})$. This is where we use the hypothesis of uniform boundedness. We also have $|(\zeta-z')(\zeta-z'')|>2\delta_n \cdot \delta_n$. The modulus of the integrand $\frac{f(\zeta)}{(\zeta-z')(\zeta-z'')}$ is therefore bounded by $\frac{M(K_{n+1})}{2\delta_n^2}$. Since $\gamma$ has length $4\pi\delta_n$, the integral over $\gamma$ is bounded by $\frac{M(K_{n+1})}{2\delta_n^2}$ times $4\pi\delta_n$, and multiplying by $\frac{|z'-z''|}{2\pi}$ gives the result.

What does this inequality imply? For $\varepsilon>0$, if we pick $\delta=\min\{\delta_n,\frac{\delta_n\varepsilon}{M(K_{n+1})}\}$, then $|f(z')-f(z'')|<\varepsilon$ for every $f \in \mathscr{F}$ and all $z',z'' \in K_n$ with $|z'-z''|<\delta$. That is, for each $K_n$, the **restrictions** of the members of $\mathscr{F}$ to $K_n$ form an equicontinuous family.

Now consider a sequence $\{f_j\}$ in $\mathscr{F}$. For each $n$, we apply the Arzelà-Ascoli theorem to the restriction of $\mathscr{F}$ to $K_n$, and it gives us an infinite subset $S_n \subset \mathbb{N}$ such that $f_j$ converges uniformly on $K_n$ as $j \to \infty$ within $S_n$. We can make sure $S_n \supset S_{n+1}$ by extracting $S_{n+1}$ from $S_n$ each time, i.e., by applying the theorem to the sequence $\{f_j\}_{j \in S_n}$. Pick a strictly increasing sequence $\{s_j\}$ where $s_j \in S_j$. For each $n$, all but finitely many $s_j$ lie in $S_n$, so $\{f_{s_j}\}$ converges uniformly on every $K_n$. Since every compact subset $K$ of $\Omega$ is covered by the open sets $K_n^\circ$ and hence lies in some $K_n$, the convergence is uniform on $K$ as well. The statement is now proved. $\square$

**Remarks.** We have no idea what the limit is, and this happens in our proof of the Riemann mapping theorem as well.

The sequence $\{K_n\}$ can be constructed explicitly, however. In fact, for every open set $\Omega$ in the plane there is a sequence $\{K_n\}$ of compact sets such that

- $\bigcup_n K_n=\Omega$.
- $K_n \subset K_{n+1}^\circ$.
- For every compact $K \subset \Omega$, there is some $n$ such that $K \subset K_n$.
- Every component of $S^2 \setminus K_n$ contains a component of $S^2 \setminus \Omega$.

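The set is constructed as follows and can be verified to satisfy what we want above. For each $n$, define

$$V_n=D(\infty,n) \cup \bigcup_{a \not\in \Omega}D\left(a,\frac{1}{n}\right),$$

where $D(\infty,n)=\{z \in S^2:|z|>n\}$ is a disc around $\infty$ and the union is taken over the points $a \in \mathbb{C} \setminus \Omega$. Then $K_n=S^2 \setminus V_n$ is what we want.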

The Schwarz lemma is another important tool for our proof of the Riemann mapping theorem. We need this lemma to establish important inequalities. This lemma as well as its variants shows the rigidity of holomorphic maps. We make use of the maximum modulus theorem. For simplicity, let $H^\infty$ be the Banach space of bounded holomorphic functions on $U$, equipped with the supremum norm $| \cdot |_\infty$.

Theorem 3 (Schwarz lemma). Suppose $f:U \to \mathbb{C}$ is a holomorphic map in $H^\infty$ such that $f(0)=0$ and $|f|_\infty \le 1$. Then

$$|f(z)| \le |z| \quad (z \in U), \qquad |f'(0)| \le 1.$$

On the other hand, if $|f(z)|=|z|$ holds for some $z \in U \setminus \{0\}$, or if $|f'(0)|=1$ holds, then $f(z)=\lambda{z}$ for some complex constant $\lambda$ such that $|\lambda|=1$.

*Proof.* Since $f(0)=0$, $f(z)/z$ has a removable singularity at $z=0$. Hence there exists $g \in H(U)$ such that $f(z)=zg(z)$, and $g(0)=f'(0)$. Fix $0<r<1$. For any $z \in U$ such that $|z|<r$, the maximum modulus theorem gives

$$|g(z)| \le \max_{\theta}\frac{|f(re^{i\theta})|}{r} \le \frac{1}{r}.$$

Therefore when $r \to 1$, we see $|g(z)| \le 1$ for all $z \in U$. Therefore $|f(z)| \le |z|$ and $|f'(0)|=|g(0)| \le 1$ follow. On the other hand, if $|g(z)|=1$ at some point $z \in U$, the maximum modulus theorem forces $g$ to be a constant, say $\lambda$, from which it follows that $|\lambda|=1$ and $f(z)=\lambda{z}$. $\square$

There are many variants of the Schwarz lemma, and we will be using the Schwarz-Pick lemma.

Definition 4. For any $\alpha \in U$, define

$$\varphi_\alpha(z)=\frac{z-\alpha}{1-\overline{\alpha}z}.$$

This family is a subfamily of Möbius transformations, but we are not paying very much attention to this family right now. We need the fact that such $\varphi_\alpha$ is always a one-to-one mapping which carries $S^1$ (the unit circle) onto $S^1$, $U$ onto $U$ and $\alpha$ to $0$, with inverse $\varphi_{-\alpha}$. This requires another application of the maximum modulus theorem. A direct computation shows that

$$\varphi_\alpha'(0)=1-|\alpha|^2, \qquad \varphi_\alpha'(\alpha)=\frac{1}{1-|\alpha|^2}.$$

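Theorem 4 (Schwarz-Pick lemma). Suppose $\alpha,\beta \in U$, $f \in H^\infty$ and $| f|_\infty \le 1$, $f(\alpha)=\beta$. Then

$$|f'(\alpha)| \le \frac{1-|\beta|^2}{1-|\alpha|^2}.$$

*Proof.* Consider

$$g=\varphi_\beta \circ f \circ \varphi_{-\alpha}.$$

We see $g \in H^\infty$ and $|g|_\infty \le 1$. What’s more important, $g(0)=\varphi_\beta \circ f(\alpha)=\varphi_\beta(\beta)=0$. By the Schwarz lemma, $|g'(0)| \le 1$. On the other hand, we see

$$g'(0)=\varphi_\beta'(\beta)f'(\alpha)\varphi_{-\alpha}'(0)=\frac{1-|\alpha|^2}{1-|\beta|^2}f'(\alpha)$$

and therefore

$$|f'(\alpha)| \le \frac{1-|\beta|^2}{1-|\alpha|^2}.$$

In particular, equality holds if and only if $g(z)=\lambda{z}$ for some constant $\lambda$ with $|\lambda|=1$. If this is the case, then

$$f(z)=\varphi_{-\beta}(\lambda\varphi_\alpha(z)) \quad (z \in U).$$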

The story can go on but we halt here and continue our story of the Riemann mapping theorem.
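The properties of $\varphi_\alpha$ and the Schwarz-Pick bound can be probed numerically. The sketch below assumes the usual form $\varphi_\alpha(z)=\frac{z-\alpha}{1-\overline{\alpha}z}$ (the displayed definition is not visible in the text above) and uses $f(z)=z^2$ as a test function; all names are chosen for this illustration only.

```python
import cmath


def phi(alpha, z):
    """Assumed disc automorphism: (z - alpha) / (1 - conj(alpha) * z)."""
    return (z - alpha) / (1 - alpha.conjugate() * z)


def schwarz_pick_gap(f, df, alpha):
    """Return (1 - |f(alpha)|^2) / (1 - |alpha|^2) - |f'(alpha)|, expected to be >= 0."""
    beta = f(alpha)
    return (1 - abs(beta) ** 2) / (1 - abs(alpha) ** 2) - abs(df(alpha))


ALPHA = 0.3 + 0.4j
ON_CIRCLE = cmath.exp(0.7j)          # a point of the unit circle
INSIDE = 0.2 - 0.5j                  # a point of the open unit disc

sends_alpha_to_zero = phi(ALPHA, ALPHA)
circle_image_modulus = abs(phi(ALPHA, ON_CIRCLE))
round_trip = phi(-ALPHA, phi(ALPHA, INSIDE))   # phi_{-alpha} undoes phi_alpha
gap = schwarz_pick_gap(lambda w: w * w, lambda w: 2 * w, ALPHA)
```

For $f(z)=z^2$ the bound reads $2|\alpha| \le 1+|\alpha|^2$, so the gap is $(1-|\alpha|)^2$, which is strictly positive in $U$; this matches the fact that $z^2$ is not of the extremal form.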

Each $z \ne 0$ determines a *direction* from the origin, which can be described by

$$A[z]=\frac{z}{|z|}.$$

Let $f:\Omega \to \mathbb{C}$ be a map. We say $f$ *preserves angles* at $z_0 \in \Omega$ if

$$\lim_{r \to 0^+}e^{-i\theta}A[f(z_0+re^{i\theta})-f(z_0)]$$

exists and is independent of $\theta$.

Conformal mappings preserve angles in a reasonable way. A function $f$ is **conformal** if it is holomorphic and $f'(z) \ne 0$ everywhere. There is a theorem describing this, but it is pretty elementary so we are not including the proof in this post.

Theorem 5. Let $f$ map a region $\Omega$ into the plane. If $f'(z_0)$ exists at some $z_0 \in \Omega$ and $f'(z_0) \ne 0$, then $f$ preserves angles at $z_0$. Conversely, if the differential $Df$ exists and is different from $0$ at $z_0$, and if $f$ preserves angles at $z_0$, then $f'(z_0)$ exists and is different from $0$.

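There is no confusion about $f'(z_0)$. By differential $Df$ we mean a linear map $L:\mathbb{R}^2 \to \mathbb{R}^2$ such that, writing $z_0=(x_0,y_0)$, we have

$$f(x_0+x,y_0+y)=f(x_0,y_0)+L(x,y)+\sqrt{x^2+y^2}\,\eta(x,y)$$

where $\eta(x,y) \to 0$ as $x \to 0$ and $y \to 0$. To prove the theorem, one can assume that $z_0=f(z_0)=0$. When the differential exists, one writes

$$f(z)=\alpha{z}+\beta\overline{z}+|z|\eta(z),$$

where $\alpha$ and $\beta$ are complex constants.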

We say that two regions $\Omega_1$ and $\Omega_2$ are **conformally equivalent** if there is a conformal one-to-one mapping of $\Omega_1$ onto $\Omega_2$. The Riemann mapping theorem states that

Theorem 6 (Riemann mapping theorem). Every proper simply connected region $\Omega$ in the plane is conformally equivalent to the open unit disc $U$.

As a famous example, the upper half-plane $\mathbb{H}$ is conformally equivalent to $U$ by the Cayley transform.
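To make the example concrete, here is a small numerical sketch. It takes the common normalisation $z \mapsto \frac{z-i}{z+i}$ for the Cayley transform, which is an assumption since no formula is fixed above, together with its inverse $w \mapsto i\frac{1+w}{1-w}$.

```python
def cayley(z):
    """Assumed Cayley transform: maps the upper half-plane onto the unit disc."""
    return (z - 1j) / (z + 1j)


def cayley_inverse(w):
    """Inverse map, from the unit disc back to the upper half-plane."""
    return 1j * (1 + w) / (1 - w)


upper_half_plane = [1j, 2 + 0.01j, -5 + 3j, 1000j]
disc_points = [cayley(z) for z in upper_half_plane]
recovered = [cayley_inverse(w) for w in disc_points]
boundary_modulus = abs(cayley(3.7 + 0j))   # the real axis goes to the unit circle
```

The point $i$ is sent to the centre of the disc, and points of the real axis land on the unit circle.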

As one may expect, this theorem asserts that the study of a proper simply connected region $\Omega$ can be reduced to $U$ to some extent. But a conformal equivalence is not just about homeomorphism. If $\varphi:\Omega_1 \to \Omega_2$ is a conformal one-to-one mapping of $\Omega_1$ onto $\Omega_2$, then $\varphi^{-1}:\Omega_2 \to \Omega_1$ is also a conformal mapping. In the language of algebra, such a mapping $\varphi$ **induces** a ring isomorphism

$$\varphi^\ast:H(\Omega_2) \to H(\Omega_1), \quad f \mapsto f \circ \varphi.$$

Therefore, the ring $H(\Omega_2)$ is algebraically the same as $H(\Omega_1)$. The Riemann mapping theorem therefore implies that, if $\Omega$ is a proper simply connected region, then $H(\Omega) \cong H(U)$. From this we can exploit much more information on top of homeomorphism. One can also extend the story to $S^2$, the Riemann sphere, but that’s another story.

The proof is fairly technical. But it is a good chance to test our skill in complex analysis. The bread and butter of this proof is the following set:

$$\Sigma=\{\psi \in H(\Omega):\psi \text{ is one-to-one and } \psi(\Omega) \subset U\}.$$

Our goal is to prove that there is some $\psi \in \Sigma$ such that $\psi(\Omega)=U$. Note, once the non-emptiness is proved, since $|\psi|<1$ for every $\psi \in \Sigma$, Montel’s theorem shows that $\Sigma$ is a **normal family**.

Pick $w_0 \in \mathbb{C} \setminus \Omega$. Then $g(z)=z-w_0 \in H(\Omega)$ and what is more important, $\frac{1}{g} \in H(\Omega)$. By 9 of proposition 1, there exists $\varphi \in H(\Omega)$ such that $\varphi^2(z)=g(z)$, i.e., informally, $\varphi(z)=\sqrt{z-w_0}$ in $\Omega$. If $\varphi(z_1)=\varphi(z_2)$, then $z_1-w_0=\varphi(z_1)^2=\varphi(z_2)^2=z_2-w_0$ and then $z_1=z_2$. Therefore $\varphi$ is one-to-one. On the other hand, if $\varphi(z_1)=-\varphi(z_2)$, we still have $z_1-w_0=\varphi^2(z_1)=\varphi^2(z_2)=z_2-w_0$, and $z_1=z_2$; but then $\varphi(z_1)=-\varphi(z_1)=0$, which is impossible because $\varphi^2=g$ has no zero in $\Omega$. This shows that $\varphi(\Omega)$ never contains both $w$ and $-w$, so the “square root” is well-defined here. This is Koebe’s square root trick.

Since $\varphi$ is an open mapping, there is an open disc $D(a,r) \subset \varphi(\Omega)$, where $a \in \varphi(\Omega)$, $a \ne 0$ and $0<r<|a|$. But by the arguments above we have $-a \not\in \varphi(\Omega)$, and therefore $D(-a,r) \cap \varphi(\Omega) = \varnothing$. For this reason, we can put

$$\psi(z)=\frac{r}{\varphi(z)+a}.$$

It follows that

$$|\psi(z)|=\frac{r}{|\varphi(z)-(-a)|}<1$$

and therefore $\psi(\Omega) \subset U$. Since $\varphi$ is one-to-one, $\psi$ is one-to-one as well and we deduce that $\psi \in \Sigma$. Hence this set is not empty.

**Remark.** You may have trouble believing that $D(-a,r) \cap \varphi(\Omega)=\varnothing$. But if we pick any $w \in D(-a,r) \cap \varphi(\Omega)$, we have some $z’ \in \Omega$ such that $\varphi(z’)=w$. We also have $|-a-w|<r$ but this implies $|a-(-w)|=|a+w|=|-a-w|<r$, and therefore $-w \in D(a,r) \subset \varphi(\Omega)$. There exists some $z’’ \in \Omega$ such that $\varphi(z’’)=-w$. Hence $-w=w=0$. It follows that $|a|<r$ and this is a contradiction.

Since $D(-a,r) \cap \varphi(\Omega)=\varnothing$, we have $|\varphi(z)-(-a)|>r$ for all $z \in \Omega$ and therefore $|\psi(z)|<1$ is not a problem either.
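The construction can be tried on one concrete region. In the sketch below, the choices $\Omega=\mathbb{C} \setminus (-\infty,0]$, $w_0=0$, $a=1$ and $r=\frac{1}{2}$ are illustrative assumptions, and $\psi=\frac{r}{\varphi+a}$ is the usual form of the map. On this $\Omega$ the principal square root serves as $\varphi$, and $\varphi(\Omega)$ is the right half-plane, which contains $D(1,\frac{1}{2})$ and misses $D(-1,\frac{1}{2})$.

```python
import cmath

W0 = 0.0        # a point outside Omega = C \ (-inf, 0]
A, R = 1.0, 0.5  # D(A, R) lies inside the right half-plane phi(Omega)


def phi(z):
    """Holomorphic square root of z - W0 on Omega (principal branch)."""
    return cmath.sqrt(z - W0)


def psi(z):
    """Assumed form of the map into the unit disc: R / (phi(z) + A)."""
    return R / (phi(z) + A)


samples = [1 + 0j, -1 + 0.001j, 100j, 0.01 - 5j]
values = [psi(z) for z in samples]
```

Every sample lands strictly inside $U$, and distinct points have distinct images, as the argument above predicts.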

If $\psi \in \Sigma$ and $\psi(\Omega) \subsetneqq U$, and $z_0 \in \Omega$, then there exists a $\psi_1 \in \Sigma$ such that $|\psi_1’(z_0)|>|\psi’(z_0)|$.

This step shows that we can “enlarge” the range in some way.

For convenience we use the Möbius transformation

$$\varphi_\alpha(z)=\frac{z-\alpha}{1-\overline{\alpha}z}.$$

Pick $\alpha \in U \setminus \psi(\Omega)$. Then $\varphi_\alpha \circ \psi \in \Sigma$ and $\varphi_\alpha \circ \psi$ has no zero in $\Omega$. Hence there is some $g \in H(\Omega)$ such that

$$g^2=\varphi_\alpha \circ \psi.$$

Since $\varphi_\alpha \circ \psi$ is one-to-one, another application of Koebe’s square root trick shows that $g$ is one-to-one. Therefore we have $g \in \Sigma$ as well. If $\psi_1=\varphi_\beta \circ g$ where $\beta=g(z_0)$, we have $\psi_1 \in \Sigma$ (one-to-one). In particular, $\psi_1(z_0)=0$.

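By putting $s(z)=z^2$, we have

$$\psi=\varphi_{-\alpha} \circ s \circ g=\varphi_{-\alpha} \circ s \circ \varphi_{-\beta} \circ \psi_1.$$

If we put $F(z)=\varphi_{-\alpha} \circ s \circ \varphi_{-\beta}(z)$, then the chain rule shows that

$$\psi'(z_0)=F'(\psi_1(z_0))\psi_1'(z_0)=F'(0)\psi_1'(z_0).$$

(Note we used the fact that $\psi_1(z_0)=0$.) Since $\psi$ is one-to-one, $\psi'(z_0) \ne 0$, so if we can prove that $0<|F'(0)|<1$ then this step is complete. Note $F$ satisfies the condition in the Schwarz-Pick lemma and therefore

$$|F'(0)| \le 1-|F(0)|^2 \le 1.$$

The first equality does not hold because $F$ is not one-to-one, hence not of the form $\varphi_{-\sigma}(\lambda\varphi_{\eta}(z))$ for $|\lambda|=1$. On the other hand we have

$$F'(0)=\varphi_{-\alpha}'(\beta^2) \cdot 2\beta \cdot \varphi_{-\beta}'(0) \ne 0,$$

because $\beta=g(z_0) \ne 0$ and the derivative of a Möbius transformation never vanishes. Therefore $0<|F'(0)|<1$ and this step is complete.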

We take the contraposition of step 2:

Fix $z_0 \in \Omega$. If $h \in \Sigma$ is an element such that $|h’(z_0)| \ge |\psi’(z_0)|$ for all $\psi \in \Sigma$, then $h(\Omega)=U$.

The proof is complete once we have found such a function! To do this, we use the fact that $\Sigma$ is a normal family. Put

$$\eta=\sup\{|\psi'(z_0)|:\psi \in \Sigma\}.$$

By definition of $\eta$, there is a sequence $\{\psi_n\}$ in $\Sigma$ such that $|\psi_n'(z_0)| \to \eta$. By normality of $\Sigma$, we pick a subsequence $\varphi_k=\psi_{n_k}$ that converges uniformly on compact subsets of $\Omega$. Let the limit be $h \in H(\Omega)$. Since $\varphi_k' \to h'$ almost uniformly as well, it follows that $|h'(z_0)|=\eta$. Since $\Sigma \ne \varnothing$ and its members are one-to-one, $\eta \ne 0$, so $h$ cannot be a constant. Since $\varphi_k(\Omega) \subset U$, we must have $h(\Omega) \subset \overline{U}$. But since $h$ is an open mapping, we are reduced to $h(\Omega) \subset U$.

It remains to show that $h$ is one-to-one. Fix distinct $z_1, z_2 \in \Omega$. Put $\alpha=h(z_1)$ and $\alpha_n=\varphi_n(z_1)$, then $\alpha_n \to \alpha$. Let $\overline{D}$ be a closed disc in $\Omega$ centred at $z_2$ with interior denoted by $D$ such that

- $z_1 \not\in \overline{D}$.
- $h-\alpha$ has no zero point on the boundary of $\overline{D}$.

We see $\varphi_n -\alpha_n$ converges to $h-\alpha$, uniformly on $\overline{D}$. They have no zero in $D$ since they are one-to-one and have a zero at $z_1$. By Rouché’s theorem, $h-\alpha$ has no zero in $D$ either, and in particular $h(z_2)-\alpha = h(z_2)-h(z_1) \ne 0$. This completes the proof. $\square$

**Remark.** First of all, such a $\overline{D}$ exists. This is because the zeros of $h-\alpha$ have no limit point in $\Omega$, i.e., they are discrete (when defining $\overline{D}$, we don’t know how many there are yet).

Our choice of $\overline{D}$ enables us to use Rouché’s theorem (chances are you didn’t get it). Since $h-\alpha$ has no zero on the boundary, we have $\zeta=\inf_{z \in \partial D}|h(z)-\alpha|>0$. When $n$ is big enough, we see

The second inequality is another application of the maximum modulus theorem. Rouché’s theorem applies here naturally as well. $\square$

This proof is a reproduction of W. Rudin’s *Real and Complex Analysis*. For a comprehensive further reading, I highly recommend Tao’s blog post.

In the previous post we saw that the Galois group of a separable irreducible polynomial $f$ can be realised as a subgroup of the symmetric group, the elements of which permute the roots of $f$. We worked on cubic polynomials over a field with characteristic not equal to $2$ and $3$, and this definitely works with $\mathbb{Q}$. In this post we go one step further.

Let $f \in \mathbb{Q}[X]$ be an irreducible polynomial of prime degree $p$. Since it is also separable (see lemma 9.12.1 on the Stacks project), we can safely work on its Galois group $G$. One immediately wants to compare it with $\mathfrak{S}_p$. Indeed we have $G \subset \mathfrak{S}_p$. The question is, when does the equality hold? It is not likely to have an immediate answer. However, we have some interesting sufficient conditions, which will be discussed in this post.

We present some handy results in finite group theory that will be used in the main result. One may skip this section until needed. I will collapse the proof in case one wants to treat it as an exercise.

Lemma 1. Let $p$ be a prime number. The symmetric group $\mathfrak{S}_p$ is generated by $[12 \cdots p]$ and an arbitrary transposition $[rs]$.

*Proof.* We prove this by presenting several sets of generators of $\mathfrak{S}_n$ where $n$ is a positive integer.

1. It is generated by cycles. This is a really, really routine verification and sometimes this is assumed as a fact.
2. It is generated by transpositions, i.e., $2$-cycles. It suffices to show that a cycle is a product of transpositions. Indeed, for any cycle $[i_1\dots i_k]$ in $\mathfrak{S}_n$, we have $[i_1\cdots i_k]=[i_1i_2][i_2i_3]\cdots[i_{k-1}i_k]$. This proves our statement.
3. It is generated by transpositions of the form $[1k]$. It suffices to show that a transposition is generated as such. For any transposition $[rs]$, we have $[rs]=[1r][1s][1r]$.
4. It is generated by adjacent transpositions, i.e. the generators can be of the form $[k-1 ,k]$. This follows from the following identity: $$[1k]=[k-1,k][1,k-1][k-1,k].$$
5. It is generated by two elements: $\sigma=[12]$ and $\tau=[12\cdots n]$. This follows from the following identity: $$[k,k+1]=\tau^{k-1}\sigma\tau^{-(k-1)}.$$

Now, back to the case when $n=p$ is prime. Put $\sigma=[rs]$ and $\tau=[12\cdots p]$. If $s-r=1$ then it is already proved in 5 by several conjugations. Therefore we may assume that $d=s-r>1$. From now on integers may be a number in either $\mathbb{Z}$ or $\mathbf{F}_p=\mathbb{Z}/p\mathbb{Z}$, depending on the context. Recall that $\mathbf{F}_p$ is a field. Pick the integer $1 \le w \le p-1$ such that $dw=1$ in $\mathbf{F}_p$. By conjugation we see $\tau$ and $\sigma$ generate

$$\tau^{jd}\sigma\tau^{-jd}=[r+jd,r+(j+1)d], \quad j=0,1,\dots,w-1.$$

Conjugating these transpositions by one another successively, as in 3 and 4, yields $[r,r+wd]=[r,r+1]$, which is conjugate to $[12]$ by a power of $\tau$. Therefore we are back to 5. $\square$
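Lemma 1 is small enough to confirm by brute force for small $n$. The sketch below is an illustration, not part of the proof: permutations act on $\{0,\dots,n-1\}$ instead of $\{1,\dots,n\}$, and the subgroup generated by the $n$-cycle and each transposition is enumerated by closing up under multiplication.

```python
from itertools import combinations


def compose(f, g):
    """Return the permutation f∘g (apply g first), permutations stored as tuples."""
    return tuple(f[g[i]] for i in range(len(f)))


def generated_subgroup(generators):
    """Enumerate the subgroup generated by the given permutations.

    In a finite group, closing the identity under left multiplication
    by the generators already yields the whole generated subgroup.
    """
    identity = tuple(range(len(generators[0])))
    seen = {identity}
    frontier = [identity]
    while frontier:
        x = frontier.pop()
        for g in generators:
            y = compose(g, x)
            if y not in seen:
                seen.add(y)
                frontier.append(y)
    return seen


def subgroup_orders(n):
    """Orders of <n-cycle, transposition> over all transpositions of n points."""
    tau = tuple((i + 1) % n for i in range(n))
    orders = set()
    for r, s in combinations(range(n), 2):
        sigma = list(range(n))
        sigma[r], sigma[s] = s, r
        orders.add(len(generated_subgroup([tau, tuple(sigma)])))
    return orders
```

For $n=5$ every transposition gives the full group of order $120$, while for $n=4$ some transpositions only give a group of order $8$, which shows that the primality assumption cannot be dropped.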

We have many good reasons to study the Galois group of *something*. It would be great if the group can be written down explicitly. In this section we show that the group can be revealed by the number of nonreal roots.

Proposition 1. Let $f(X) \in \mathbb{Q}[X]$ be an irreducible polynomial of prime degree $p$. If $f$ has precisely two nonreal roots, then the Galois group $G$ over $\mathbb{Q}$ is $\mathfrak{S}_p$.

*Proof.* Let $L$ be the splitting field of $f$. It suffices to show that $G$ contains a transposition and a $p$-cycle, which can be taken to be $[12\cdots p]$. If $\alpha$ is a root of $f$, then $[\mathbb{Q}(\alpha):\mathbb{Q}]=p$ divides $[L:\mathbb{Q}]=|G|$. By Sylow’s theorem, $G$ has a subgroup $H$ of order $p$ (note that $p^2$ does not divide $|\mathfrak{S}_p|=p!$), which can only be cyclic. Say $H=\langle \sigma \rangle$. Suppose $\sigma$ is of cycle type $(k_1,\dots,k_r)$. Then the period of $\sigma$, which equals $p$, is the least common multiple of $k_1,\dots,k_r$, where $k_1+\dots+k_r=p$. This can only happen when $r=1$ and $k_1=p$. Therefore $\sigma$ is a $p$-cycle.

In fact, $\sigma$ can be considered as $[12\dots p]$. Suppose the order of roots of $f$ is given, for which we have $\sigma=[i_1 i_2 \dots i_p]$. If we re-order these roots, by putting the $k$th root to be the original $i_k$th root, then we can write $\sigma=[12\dots p]$. (This re-ordering is, in fact, a conjugation.)

It remains to prove that $G$ contains a transposition. Let $\alpha$ and $\beta$ be the two nonreal roots of $f$. Since $\overline{\alpha}$ is also a root of $f$ (because coefficients of $f$ are real; if $\sum_{n=0}^{p}a_n\alpha^n=0$, then $\sum_{n=0}^{p}a_n\overline{\alpha}^n=\sum_{n=0}^{p}\overline{a_n\alpha^n}=\overline{0}=0$) we see $\beta=\overline{\alpha}$. Since $L$ is normal over $\mathbb{Q}$, complex conjugation restricts to an automorphism of $L$, which swaps $\alpha$ and $\beta$ and fixes all other roots because they are real. It is therefore a transposition in $G$. This proves our assertion. $\square$

For example, consider the polynomial

With calculus one can show that it has exactly three real roots, hence it has two nonreal roots. Eisenstein’s criterion shows that $f$ is irreducible. Therefore we are allowed to use proposition 1. The Galois group of $f$ is $\mathfrak{S}_5$.
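Both checks are mechanical. The sketch below runs them on $X^5-4X+2$, which is a hypothetical stand-in chosen for illustration and not necessarily the polynomial intended above; the function names are likewise made up. Real roots are located by sign changes at integer points, and the derivative $5X^4-4$ has only two real zeros, so there cannot be more than three real roots.

```python
def eisenstein(coeffs, p):
    """coeffs = [a_0, ..., a_n]; True if Eisenstein's criterion holds at the prime p."""
    *lower, lead = coeffs
    return (lead % p != 0
            and all(a % p == 0 for a in lower)
            and lower[0] % (p * p) != 0)


def evaluate(coeffs, x):
    """Horner evaluation of a_0 + a_1 x + ... + a_n x^n."""
    result = 0
    for a in reversed(coeffs):
        result = result * x + a
    return result


def sign_changes(coeffs, points):
    """Count sign changes of the polynomial along an increasing list of points."""
    values = [evaluate(coeffs, x) for x in points]
    return sum(1 for u, v in zip(values, values[1:]) if u * v < 0)


F = [2, -4, 0, 0, 0, 1]        # hypothetical example: X^5 - 4X + 2
DF = [-4, 0, 0, 0, 5]          # its derivative: 5X^4 - 4

irreducible = eisenstein(F, 2)
real_root_intervals = sign_changes(F, [-2, -1, 0, 1, 2])
critical_point_intervals = sign_changes(DF, [-2, 0, 2])
```

Three sign changes give at least three real roots, and two critical points allow at most three, so exactly two roots are nonreal and proposition 1 applies to this stand-in.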

This also works fine when $p=2$ or $3$. The case when $p=2$ is nothing but working around a quadratic polynomial. When $f(X)$ is irreducible of degree $3$, and it has two nonreal roots, we also know that it has an irrational root. Let the roots be $a+bi,a-bi,c$ where $b \ne 0$ and $c$ is irrational. We see

Therefore the Galois group is $\mathfrak{S}_3$.

It is way too restrictive to confine ourselves to one single pair of nonreal roots. Also, it seems we have ignored the alternating group $\mathfrak{A}_p$ for no reason. Oz Ben-Shimol gave us a nice way to work around this (see arXiv:0709.2868). The whole paper is not easy but the result is pretty beautiful and generalises what we said above when $p \ge 5$.

Proposition 2.Let $f \in \mathbb{Q}[X]$ be an irreducible polynomial of prime degree $p \ge 5$. Suppose that $f$ has $k>0$ pairs of nonreal roots. If $p \ge 4k+1$, then the Galois group $G$ is isomorphic to $\mathfrak{A}_p$ or $\mathfrak{S}_p$. If $k$ is odd then $G \cong \mathfrak{S}_p$.

The proof is done by showing that $\mathfrak{A}_p \subset G \subset \mathfrak{S}_p$. As the index of $\mathfrak{A}_p$ is $2$, $G$ can only be one of them. The solvability of $G$ is also concerned here.

Indeed, what we have proved in “the simplest case” is nothing but $k=1$. When $p \ge 5$ we clearly have $p \ge 1+4 \times 1$. This refined the result of A. Bialostocki and T. Shaska (see arXiv:math/0601397), and the inequality used to be

When $k$ is big enough, we have $k(k\log{k}+2\log{k}+3) \ge 4k+1$. Oz Ben-Shimol’s result is a refinement because it says that $p$ does not need to be that big. He also offered a refined algorithm to compute the Galois group, which we will present below. Also, computing $4k+1$ is much easier than computing $k^2\log{k}$ plus something.
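How big is “big enough”? A quick tabulation settles it, assuming $\log$ denotes the natural logarithm (the text does not say which base is meant); the function names are chosen for this illustration.

```python
import math


def older_bound(k):
    """k (k log k + 2 log k + 3), with log taken as the natural logarithm (assumption)."""
    return k * (k * math.log(k) + 2 * math.log(k) + 3)


def newer_bound(k):
    """The bound 4k + 1 from proposition 2."""
    return 4 * k + 1


# Smallest k at which the older bound is at least the newer one.
threshold = next(k for k in range(1, 100) if older_bound(k) >= newer_bound(k))
```

Under this assumption the older bound is already larger from $k=2$ onwards; only at $k=1$ is it smaller ($3$ against $5$).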

```
Input: An irreducible polynomial f(X) over Q with prime degree p >= 5
```

Here, $\Delta(f)$ is the discriminant of $f$. We have seen that whether $\Delta$ is a perfect square matters a lot. The discussion of `ReductionMethod` can be found in Oz Ben-Shimol’s paper.