Dedekind Domain and Properties in an Elementary Approach

You can find contents about Dedekind domain (or Dedekind ring) in almost all algebraic number theory books. But many properties can be proved inside ring theory. I hope you can find the solution you need in this post, and this post will not go further than elementary ring theory. With that being said, you are assumed to have enough knowledge of ring and ring of fractions (this post serves well), but not too much mathematics maturity is assumed (at the very least you are assumed to be familiar with terminologies in the linked post).\(\def\mb{\mathbb}\) \(\def\mfk{\mathfrak}\)


There are several ways to define Dedekind domain since there are several equivalent statements of it. We will start from the one based on ring of fractions. As a friendly reminder, \(\mb{Z}\) or any principal integral domain is already a Dedekind domain. In fact Dedekind domain may be viewed as a generalization of principal integral domain.

Let \(\mfk{o}\) be an integral domain (a.k.a. entire ring), and \(K\) be its quotient field. A Dedekind domain is an integral domain \(\mfk{o}\) such that the fractional ideals form a group under multiplication. Let's have a breakdown. By a fractional ideal \(\mfk{a}\) we mean a nontrivial additive subgroup of \(K\) such that

  • \(\mfk{o}\mfk{a}=\mfk{a}\),
  • there exists some nonzero element \(c \in \mfk{o}\) such that \(c\mfk{a} \subset \mfk{o}\).

What does the group look like? As you may guess, the unit element is \(\mfk{o}\). For a fractional ideal \(\mfk{a}\), we have the inverse to be another fractional ideal \(\mfk{b}\) such that \(\mfk{ab}=\mfk{ba}=\mfk{o}\). Note we regard \(\mfk{o}\) as a subring of \(K\). For \(a \in \mfk{o}\), we treat it as \(a/1 \in K\). This makes sense because the map \(i:a \mapsto a/1\) is injective. For the existence of \(c\), you may consider it as a restriction that the 'denominator' is bounded. Alternatively, we say that fractional ideal of \(K\) is a finitely generated \(\mfk{o}\)-submodule of \(K\). But in this post it is not assumed that you have learned module theory.

Let's take \(\mb{Z}\) as an example. The quotient field of \(\mb{Z}\) is \(\mb{Q}\). We have a fractional ideal \(P\) where all elements are of the type \(\frac{np}{2}\) with \(p\) prime and \(n \in \mb{Z}\). Then indeed we have \(\mb{Z}P=P\). On the other hand, take \(2 \in \mb{Z}\), we have \(2P \subset \mb{Z}\). For its inverse we can take a fractional ideal \(Q\) where all elements are of the type \(\frac{2n}{p}\). As proved in algebraic number theory, the ring of algebraic integers in a number field is a Dedekind domain.

Before we go on we need to clarify the definition of ideal multiplication. Let \(\mfk{a}\) and \(\mfk{b}\) be two ideals, we define \(\mfk{ab}\) to be the set of all sums \[ x_1y_1+\cdots+x_ny_n \] where \(x_i \in \mfk{a}\) and \(y_i \in \mfk{b}\). Here the number \(n\) means finite but is not fixed. Alternatively we cay say \(\mfk{ab}\) contains all finite sum of products of \(\mfk{a}\) and \(\mfk{b}\).


(Proposition 1) A Dedekind domain \(\mfk{o}\) is Noetherian.

By Noetherian ring we mean that every ideal in a ring is finitely generated. Precisely, we will prove that for every ideal \(\mfk{a} \subset \mfk{o}\) there are \(a_1,a_2,\cdots,a_n \in \mfk{a}\) such that, for every \(r \in \mfk{a}\), we have an expression \[ r = c_1a_1 + c_2a_2 + \cdots + c_na_n \qquad c_1,c_2,\cdots,c_n \in \mfk{o}. \] Also note that any ideal \(\mfk{a} \subset \mfk{o}\) can be viewed as a fractional ideal.

Proof. Since \(\mfk{a}\) is an ideal of \(\mfk{o}\), let \(K\) be the quotient field of \(\mfk{o}\), we see since \(\mfk{oa}=\mfk{a}\), we may also view \(\mfk{a}\) as a fractional ideal. Since \(\mfk{o}\) is a Dedekind domain, and fractional ideals of \(\mfk{a}\) is a group, there is an fractional ideal \(\mfk{b}\) such that \(\mfk{ab}=\mfk{ba}=\mfk{o}\). Since \(1 \in \mfk{o}\), we may say that there exists some \(a_1,a_2,\cdots, a_n \in \mfk{a}\) and \(b_1,b_2,\cdots,b_n \in \mfk{o}\) such that \(\sum_{i = 1 }^{n}a_ib_i=1\). For any \(r \in \mfk{a}\), we have an expression \[ r = rb_1a_1+rb_2a_2+\cdots+rb_na_n. \] On the other hand, any element of the form \(c_1a_1+c_2a_2+\cdots+c_na_n\), by definition, is an element of \(\mfk{a}\). \(\blacksquare\)

From now on, the inverse of an fractional ideal \(\mfk{a}\) will be written like \(\mfk{a}^{-1}\).

(Proposition 2) For ideals \(\mfk{a},\mfk{b} \subset \mfk{o}\), \(\mfk{b}\subset\mfk{a}\) if and only if there exists some \(\mfk{c}\) such that \(\mfk{ac}=\mfk{b}\) (or we simply say \(\mfk{a}|\mfk{b}\))

Proof. If \(\mfk{b}=\mfk{ac}\), simply note that \(\mfk{ac} \subset \mfk{a} \cap \mfk{c} \subset \mfk{a}\). For the converse, suppose that \(a \supset \mfk{b}\), then \(\mfk{c}=\mfk{a}^{-1}\mfk{b}\) is an ideal of \(\mfk{o}\) since \(\mfk{c}=\mfk{a}^{-1}\mfk{b} \subset \mfk{a}^{-1}\mfk{a}=\mfk{o}\), hence we may write \(\mfk{b}=\mfk{a}\mfk{c}\). \(\blacksquare\)

(Proposition 3) If \(\mfk{a}\) is an ideal of \(\mfk{o}\), then there are prime ideals \(\mfk{p}_1,\mfk{p}_2,\cdots,\mfk{p}_n\) such that \[ \mfk{a}=\mfk{p}_1\mfk{p}_2\cdots\mfk{p}_n. \]

Proof. For this problem we use a classical technique: contradiction on maximality. Suppose this is not true, let \(\mfk{A}\) be the set of ideals of \(\mfk{o}\) that cannot be written as the product of prime ideals. By assumption \(\mfk{U}\) is nonempty. Since as we have proved, \(\mfk{o}\) is Noetherian, we can pick an maximal element \(\mfk{a}\) of \(\mfk{A}\) with respect to inclusion. If \(\mfk{a}\) is maximal, then since all maximal ideals are prime, \(\mfk{a}\) itself is prime as well. If \(\mfk{a}\) is properly contained in an ideal \(\mfk{m}\), then we write \(\mfk{a}=\mfk{m}\mfk{m}^{-1}\mfk{a}\). We have \(\mfk{m}^{-1}\mfk{a} \supsetneq \mfk{a}\) since if not, we have \(\mfk{a}=\mfk{ma}\), which implies \(\mfk{m}=\mfk{o}\). But by maximality, \(\mfk{m}^{-1}\mfk{a}\not\in\mfk{U}\), hence it can be written as a product of prime ideals. But \(\mfk{m}\) is prime as well, we have a prime factorization for \(\mfk{a}\), contradicting the definition of \(\mfk{U}\).

Next we show uniqueness up to permutation. If \[ \mfk{p}_1\mfk{p}_2\cdots\mfk{p}_k=\mfk{q}_1\mfk{q}_2\cdots\mfk{q}_j, \] since \(\mfk{p}_1\mfk{p}_2\cdots\mfk{p}_k\subset\mfk{p}_1\) and \(\mfk{p}_1\) is prime, we may assume that \(\mfk{q}_1 \subset \mfk{p}_1\). By the property of fractional ideal we have \(\mfk{q}_1=\mfk{p}_1\mfk{r}_1\) for some fractional ideal \(\mfk{r}_1\). However we also have \(\mfk{q}_1 \subset \mfk{r}_1\). Since \(\mfk{q}_1\) is prime, we either have \(\mfk{q}_1 \supset \mfk{p}_1\) or \(\mfk{q}_1 \supset \mfk{r}_1\). In the former case we get \(\mfk{p}_1=\mfk{q}_1\), and we finish the proof by continuing inductively. In the latter case we have \(\mfk{r}_1=\mfk{q}_1=\mfk{p}_1\mfk{q}_1\), which shows that \(\mfk{p}_1=\mfk{o}\), which is impossible. \(\blacksquare\)

(Proposition 4) Every nontrivial prime ideal \(\mfk{p}\) is maximal.

Proof. Let \(\mfk{m}\) be an maximal ideal containing \(\mfk{p}\). By proposition 2 we have some \(\mfk{c}\) such that \(\mfk{p}=\mfk{mc}\). If \(\mfk{m} \neq \mfk{p}\), then \(\mfk{c} \neq \mfk{o}\), and we may write \(\mfk{c}=\mfk{p}_1\cdots\mfk{p}_n\), hence \(\mfk{p}=\mfk{m}\mfk{p}_1\cdots\mfk{p}_n\), which is a prime factorisation, contradicting the fact that \(\mfk{p}\) has a unique prime factorisation, which is \(\mfk{p}\) itself. Hence any maximal ideal containing \(\mfk{p}\) is \(\mfk{p}\) itself. \(\blacksquare\)

(Proposition 5) Suppose the Dedekind domain \(\mfk{o}\) only contains one prime (and maximal) ideal \(\mfk{p}\), let \(t \in \mfk{p}\) and \(t \not\in \mfk{p}^2\), then \(\mfk{p}\) is generated by \(t\).

Proof. Let \(\mfk{t}\) be the ideal generated by \(t\). By proposition 3 we have a factorisation \[ \mfk{t}=\mfk{p}^n \] for some \(n\) since \(\mfk{o}\) contains only one prime ideal. According to proposition 2, if \(n \geq 3\), we write \(\mfk{p}^n=\mfk{p}^2\mfk{p}^{n-2}\), we see \(\mfk{p}^2 \supset \mfk{p}^n\). But this is impossible since if so we have \(t \in \mfk{p}^n \subset \mfk{p}^2\) contradicting our assumption. Hence \(0<n<3\). But If \(n=2\) we have \(t \in \mfk{p}^2\) which is also not possible. So \(\mfk{t}=\mfk{p}\) provided that such \(t\) exists.

For the existence of \(t\), note if not, then for all \(t \in \mfk{p}\) we have \(t \in \mfk{p}^2\), hence \(\mfk{p} \subset \mfk{p}^2\). On the other hand we already have \(\mfk{p}^2 = \mfk{p}\mfk{p}\), which implies that \(\mfk{p}^2 \subset \mfk{p}\) (proposition 2), hence \(\mfk{p}^2=\mfk{p}\), contradicting proposition 3. Hence such \(t\) exists and our proof is finished. \(\blacksquare\)

Characterisation of Dedekind domain

In fact there is another equivalent definition of Dedekind domain:

A domain \(\mfk{o}\) is Dedekind if and only if

  • \(\mfk{o}\) is Noetherian.
  • \(\mfk{o}\) is integrally closed.
  • \(\mfk{o}\)​ has Krull dimension \(1\)​ (i.e. every non-zero prime ideals are maximal).

This is equivalent to say that faction ideals form a group and is frequently used by mathematicians as well. But we need some more advanced techniques to establish the equivalence. Presumably there will be a post about this in the future.

Several ways to prove Hardy's inequality

Suppose \(1 < p < \infty\) and \(f \in L^p((0,\infty))\) (with respect to Lebesgue measure of course) is a nonnegative function, take \[ F(x) = \frac{1}{x}\int_0^x f(t)dt \quad 0 < x <\infty, \] we have Hardy's inequality \(\def\lrVert[#1]{\lVert #1 \rVert}\) \[ \lrVert[F]_p \leq q\lrVert[f]_p \] where \(\frac{1}{p}+\frac{1}{q}=1\) of course.

There are several ways to prove it. I think there are several good reasons to write them down thoroughly since that may be why you find this page. Maybe you are burnt out since it's left as exercise. You are assumed to have enough knowledge of Lebesgue measure and integration.

Minkowski's integral inequality

Let \(S_1,S_2 \subset \mathbb{R}\) be two measurable set, suppose \(F:S_1 \times S_2 \to \mathbb{R}\) is measurable, then \[ \left[\int_{S_2} \left\vert\int_{S_1}F(x,y)dx \right\vert^pdy\right]^{\frac{1}{p}} \leq \int_{S_1} \left[\int_{S_2} |F(x,y)|^p dy\right]^{\frac{1}{p}}dx. \] A proof can be found at here by turning to Example A9. You may need to replace all measures with Lebesgue measure \(m\).

Now let's get into it. For a measurable function in this place we should have \(G(x,t)=\frac{f(t)}{x}\). If we put this function inside this inequality, we see \[ \begin{aligned} \lrVert[F]_p &= \left[\int_0^\infty \left\vert \int_0^x \frac{f(t)}{x}dt \right\vert^p dx\right]^{\frac{1}{p}} \\ &= \left[\int_0^\infty \left\vert \int_0^1 f(ux)du \right\vert^p dx\right]^{\frac{1}{p}} \\ &\leq \int_0^1 \left[\int_0^\infty |f(ux)|^pdx\right]^{\frac{1}{p}}du \\ &= \int_0^1 \left[\int_0^\infty |f(ux)|^pudx\right]^{\frac{1}{p}}u^{-\frac{1}{p}}du \\ &= \lrVert[f]_p \int_0^1 u^{-\frac{1}{p}}du \\ &=q\lrVert[f]_p. \end{aligned} \] Note we have used change-of-variable twice and the inequality once.

A constructive approach

I have no idea how people came up with this solution. Take \(xF(x)=\int_0^x f(t)t^{u}t^{-u}dt\) where \(0<u<1-\frac{1}{p}\). Hölder's inequality gives us \[ \begin{aligned} xF(x) &= \int_0^x f(t)t^ut^{-u}dt \\ &\leq \left[\int_0^x t^{-uq}dt\right]^{\frac{1}{q}}\left[\int_0^xf(t)^pt^{up}dt\right]^{\frac{1}{p}} \\ &=\left(\frac{1}{1-uq}x^{1-uq}\right)^{\frac{1}{q}}\left[\int_0^xf(t)^pt^{up}dt\right]^{\frac{1}{p}} \end{aligned} \] Hence \[ \begin{aligned} F(x)^p & \leq \frac{1}{x^p}\left\{\left(\frac{1}{1-uq}x^{1-uq}\right)^{\frac{1}{q}}\left[\int_0^xf(t)^pt^{up}dt\right]^{\frac{1}{p}}\right\}^{p} \\ &= \left(\frac{1}{1-uq}\right)^{\frac{p}{q}}x^{\frac{p}{q}(1-uq)-p}\int_0^x f(t)^pt^{up}dt \\ &= \left(\frac{1}{1-uq}\right)^{p-1}x^{-up-1}\int_0^x f(t)^pt^{up}dt \end{aligned} \]

Note we have used the fact that \(\frac{1}{p}+\frac{1}{q}=1 \implies p+q=pq\) and \(\frac{p}{q}=p-1\). Fubini's theorem gives us the final answer: \[ \begin{aligned} \int_0^\infty F(x)^pdx &\leq \int_0^\infty\left[\left(\frac{1}{1-uq}\right)^{p-1}x^{-up-1}\int_0^x f(t)^pt^{up}dt\right]dx \\ &=\left(\frac{1}{1-uq}\right)^{p-1}\int_0^\infty dx\int_0^x f(t)^pt^{up}x^{-up-1}dt \\ &=\left(\frac{1}{1-uq}\right)^{p-1}\int_0^\infty dt\int_t^\infty f(t)^pt^{up}x^{-up-1}dx \\ &=\left(\frac{1}{1-uq}\right)^{p-1}\frac{1}{up}\int_0^\infty f(t)^pdt. \end{aligned} \] It remains to find the minimum of \(\varphi(u) = \left(\frac{1}{1-uq}\right)^{p-1}\frac{1}{up}\). This is an elementary calculus problem. By taking its derivative, we see when \(u=\frac{1}{pq}<1-\frac{1}{p}\) it attains its minimum \(\left(\frac{p}{p-1}\right)^p=q^p\). Hence we get \[ \int_0^\infty F(x)^pdx \leq q^p\int_0^\infty f(t)^pdt, \] which is exactly what we want. Note the constant \(q\) cannot be replaced with a smaller one. We simply proved the case when \(f \geq 0\). For the general case, one simply needs to take absolute value.

Integration by parts

This approach makes use of properties of \(L^p\) space. Still we assume that \(f \geq 0\) but we also assume \(f \in C_c((0,\infty))\), that is, \(f\) is continuous and has compact support. Hence \(F\) is differentiable in this situation. Integration by parts gives \[ \int_0^\infty F^p(x)dx=xF(x)^p\vert_0^\infty- p\int_0^\infty xdF^p = -p\int_0^\infty xF^{p-1}(x)F'(x)dx. \] Note since \(f\) has compact support, there are some \([a,b]\) such that \(f >0\) only if \(0 < a \leq x \leq b < \infty\) and hence \(xF(x)^p\vert_0^\infty=0\). Next it is natural to take a look at \(F'(x)\). Note we have \[ F'(x) = \frac{f(x)}{x}-\frac{\int_0^x f(t)dt}{x^2}, \] hence \(xF'(x)=f(x)-F(x)\). A substitution gives us \[ \int_0^\infty F^p(x)dx = -p\int_0^\infty F^{p-1}(x)[f(x)-F(x)]dx, \] which is equivalent to say \[ \int_0^\infty F^p(x)dx = \frac{p}{p-1}\int_0^\infty F^{p-1}(x)f(x)dx. \] Hölder's inequality gives us \[ \begin{aligned} \int_0^\infty F^{p-1}(x)f(x)dx &\leq \left[\int_0^\infty F^{(p-1)q}(x)dx\right]^{\frac{1}{q}}\left[\int_0^\infty f(x)^pdx\right]^{\frac{1}{p}} \\ &=\left[\int_0^\infty F^{p}(x)dx\right]^{\frac{1}{q}}\left[\int_0^\infty f(x)^pdx\right]^{\frac{1}{p}}. \end{aligned} \] Together with the identity above we get \[ \int_0^\infty F^p(x)dx = q\left[\int_0^\infty F^{p}(x)dx\right]^{\frac{1}{q}}\left[\int_0^\infty f(x)^pdx\right]^{\frac{1}{p}} \] which is exactly what we want since \(1-\frac{1}{q}=\frac{1}{p}\) and all we need to do is divide \(\left[\int_0^\infty F^pdx\right]^{1/q}\) on both sides. So what's next? Note \(C_c((0,\infty))\) is dense in \(L^p((0,\infty))\). For any \(f \in L^p((0,\infty))\), we can take a sequence of functions \(f_n \in C_c((0,\infty))\) such that \(f_n \to f\) with respect to \(L^p\)-norm. Taking \(F=\frac{1}{x}\int_0^x f(t)dt\) and \(F_n = \frac{1}{x}\int_0^x f_n(t)dt\), we need to show that \(F_n \to F\) pointwise, so that we can use Fatou's lemma. For \(\varepsilon>0\), there exists some \(m\) such that \(\lrVert[f_n-f]_p < \frac{1}{n}\). Thus \[ \begin{aligned} |F_n(x)-F(x)| &= \frac{1}{x}\left\vert \int_0^x f_n(t)dt - \int_0^x f(t)dt \right\vert \\ &\leq \frac{1}{x} \int_0^x |f_n(t)-f(t)|dt \\ &\leq \frac{1}{x} \left[\int_0^x|f_n(t)-f(t)|^pdt\right]^{\frac{1}{p}}\left[\int_0^x 1^qdt\right]^{\frac{1}{q}} \\ &=\frac{1}{x^{1/p}}\left[\int_0^x|f_n(t)-f(t)|^pdt\right]^{\frac{1}{p}} \\ &\leq \frac{1}{x^{1/p}}\lrVert[f_n-f]_p <\frac{\varepsilon}{x^{1/p}}. \end{aligned} \] Hence \(F_n \to F\) pointwise, which also implies that \(|F_n|^p \to |F|^p\) pointwise. For \(|F_n|\) we have \[ \begin{aligned} \int_0^\infty |F_n(x)|^pdx &= \int_0^\infty \left\vert\frac{1}{x}\int_0^x f_n(t)dt\right\vert^p dx \\ &\leq \int_0^\infty \left[\frac{1}{x}\int_0^x |f_n(t)|dt\right]^{p}dx \\ &\leq q\int_0^\infty |f_n(t)|^pdt \end{aligned} \] note the third inequality follows since we have already proved it for \(f \geq 0\). By Fatou's lemma, we have \[ \begin{aligned} \int_0^\infty |F(x)|^pdx &= \int_0^\infty \lim_{n \to \infty}|F_n(x)|^pdx \\ &\leq \lim_{n \to \infty} \int_0^\infty |F_n(x)|^pdx \\ &\leq \lim_{n \to \infty}q^p\int_0^\infty |f_n(x)|^pdx \\ &=q^p\int_0^\infty |f(x)|^pdx. \end{aligned} \]

Tensor Product as a Universal Object (Category Theory & Module Theory)


It is quite often to see direct sum or direct product of groups, modules, vector spaces. Indeed, for modules over a ring \(R\), direct products are also direct products of \(R\)-modules as well. On the other hand, the direct sum is a coproduct in the category of \(R\)-modules.

But what about tensor products? It is some different kind of product but how? Is it related to direct product? How do we write a tensor product down? We need to solve this question but it is not a good idea to dig into numeric works.

The category of bilinear or even \(n\)-multilinear maps

From now on, let \(R\) be a commutative ring, and \(M_1,\cdots,M_n\) are \(R\)-modules. Mainly we work on \(M_1\) and \(M_2\), i.e. \(M_1 \times M_2\) and \(M_1 \otimes M_2\). For \(n\)-multilinear one, simply replace \(M_1\times M_2\) with \(M_1 \times M_2 \times \cdots \times M_n\) and \(M_1 \otimes M_2\) with \(M_1 \otimes \cdots \otimes M_n\). The only difference is the change of symbols.

The bilinear maps of \(M_1 \times M_2\) determines a category, say \(BL(M_1 \times M_2)\) or we simply write \(BL\). For an object \((f,E)\) in this category we have \(f: M_1 \times M_2 \to E\) as a bilinear map and \(E\) as a \(R\)-module of course. For two objects \((f,E)\) and \((g,F)\), we define the morphism between them as a linear function making the following diagram commutative: \(\def\mor{\operatorname{Mor}}\)


This indeed makes \(BL\) a category. If we define the morphisms from \((f,E)\) to \((g,F)\) by \(\mor(f,g)\) (for simplicity we omit \(E\) and \(F\) since they are already determined by \(f\) and \(g\)) we see the composition \[ \mor(f,g) \times \mor(h,g) \to \mor(h,f) \] satisfy all axioms for a category:

CAT 1 Two sets \(\mor(f,g)\) and \(\mor(f',g')\) are disjoint unless \(f=f'\) and \(g=g'\), in which case they are equal. If \(g \neq g'\) but \(f = f'\) for example, for any \(h \in \mor(f,g)\), we have \(g = h \circ f = h \circ f' \neq g'\), hence \(h \notin \mor(f,g)\). Other cases can be verified in the same fashion.

CAT 2 The existence of identity morphism. For any \((f,E) \in BL\), we simply take the identity map \(i:E \to E\). For \(h \in \mor(f,g)\), we see \(g = h \circ f = h \circ i \circ f\). For \(h' \in \mor(g,f)\), we see \(f = h' \circ g = i \circ h' \circ g\).

CAT 3 The law of composition is associative when defined.

There we have a category. But what about the tensor product? It is defined to be initial (or universally repelling) object in this category. Let's denote this object by \((\varphi,M_1 \otimes M_2)\).

For any \((f,E) \in BL\), we have a unique morphism (which is a module homomorphism as well) \(h:(\varphi,M_1 \otimes M_2) \to (f,E)\). For \(x \in M_1\) and \(y \in M_2\), we write \(\varphi(x,y)=x \otimes y\). We call the existence of \(h\) the universal property of \((\varphi,M_1 \otimes M_2)\).

The tensor product is unique up to isomorphism. That is, if both \((f,E)\) and \((g,F)\) are tensor products, then \(E \simeq F\) in the sense of module isomorphism. Indeed, let \(h \in \mor(f,g)\) and \(h' \in \mor(g,h)\) be the unique morphisms respectively, we see \(g = h \circ f\), \(f = h' \circ g\), and therefore \[ g = h \circ h' \circ g \\ f = h' \circ h \circ f \] Hence \(h \circ h'\) is the identity of \((g,F)\) and \(h' \circ h\) is the identity of \((f,E)\). This gives \(E \simeq F\).

What do we get so far? For any modules that is connected to \(M_1 \times M_2\) with a bilinear map, the tensor product \(M_1 \oplus M_2\) of \(M_1\) and \(M_2\), is always able to be connected to that module with a unique module homomorphism. What if there are more than one tensor products? Never mind. All tensor products are isomorphic.

But wait, does this definition make sense? Does this product even exist? How can we study the tensor product of two modules if we cannot even write it down? So far we are only working on arrows, and we don't know what is happening inside an module. It is not a good idea to waste our time on 'nonsenses'. We can look into it in an natural way. Indeed, if we can find a module satisfying the property we want, then we are done, since this can represent the tensor product under any circumstances. Again, all tensor products of \(M_1\) and \(M_2\) are isomorphic.

A natural way to define the tensor product

Let \(M\) be the free module generated by the set of all tuples \((x_1,x_2)\) where \(x_1 \in M_1\) and \(x_2 \in M_2\), and \(N\) be the submodule generated by tuples of the following types: \[ (x_1+x_1',x_2)-(x_1,x_2)-(x_1',x_2) \\ (x_1,x_2+x_2')-(x_1,x_2)-(x_1,x_2') \\ (ax_1,x_2)-a(x_1,x_2) \\ (x_1,ax_2) - a(x_1,x_2) \] First we have a inclusion map \(\alpha=M_1 \times M_2 \to M\) and the canonical map \(\pi:M \to M/N\). We claim that \((\pi \circ \alpha, M/N)\) is exactly what we want. But before that, we need to explain why we define such a \(N\).

The reason is quite simple: We want to make sure that \(\varphi=\pi \circ \alpha\) is bilinear. For example, we have \(\varphi(x_1+x_1',x_2)=\varphi(x_1,x_2)+\varphi(x_1',x_2)\) due to our construction of \(N\) (other relations follow in the same manner). This can be verified group-theoretically. Note \[ \varphi(x_1+x_1',x_2)=(x_1+x_1',x_2)+N \\ \varphi(x_1,x_2)+\varphi(x_1',x_2)=(x_1,x_2)+(x_1',x_2)+N \] but \[ \varphi(x_1+x_1',x_2)-\varphi(x_1,x_2)-\varphi(x_1',x_2)=(x_1+x_1',x_2)-(x_1,x_2)-(x_1',x_2) +N = 0+N. \] Hence we get the identity we want. For this reason we can write \[ \begin{aligned} (x_1+x_1')\otimes x_2 &= x_1 \otimes x_2 + x_1' \otimes x_2, \\ x_1 \otimes (x_2 + x_2') &= x_1 \otimes x_2 + x_1 \otimes x_2', \\ (ax_1) \otimes x_2 &= a(x_1 \otimes x_2), \\ x_1 \otimes (ax_2) &= a(x_1 \otimes x_2). \end{aligned} \] Sometimes to avoid confusion people may also write \(x_1 \otimes_R x_2\) if both \(M_1\) and \(M_2\) are \(R\)-modules. But before that we have to verify that this is indeed the tensor product. To verify this, all we need is the universal property of free modules.


By the universal property of \(M\), for any \((f,E) \in BL\), we have a induced map \(f_\ast\) making the diagram inside commutative. However, for elements in \(N\), we see \(f_\ast\) takes value \(0\), since \(f_\ast\) is a bilinear map already. We finish our work by taking \(h[(x,y)+N] = f_\ast(x,y)\). This is the map induced by \(f_\ast\), following the property of factor module.

Trivial tensor product

For coprime integers \(m,n>1\), we have \(\def\mb{\mathbb}\) \[ \mb{Z}/m\mb{Z} \otimes \mb{Z}/n\mb{Z} = O \] where \(O\) means that the module only contains \(0\) and \(\mb{Z}/m\mb{Z}\) is considered as a module over \(\mb{Z}\) for \(m>1\). This suggests that, the tensor product of two modules is not necessarily 'bigger' than its components. Let's see why this is trivial.

Note that for \(x \in \mb{Z}/m\mb{Z}\) and \(y \in \mb{Z}/n\mb{Z}\), we have \[ m(x \otimes y) = (mx) \otimes y = 0 \\ n(x \otimes y) = x \otimes(ny) = 0 \] since, for example, \(mx = 0\) for \(x \in \mb{Z}/m\mb{Z}\) and \(\varphi(0,y)=0\). If you have trouble understanding why \(\varphi(0,y)=0\), just note that the submodule \(N\) in our construction contains elements generated by \((0x,y)-0(x,y)\) already.

By Bézout's identity, for any \(x \otimes y\), we see there are \(a\) and \(b\) such that \(am+bn=1\), and therefore \[ \begin{aligned} x \otimes y &= (am+bn)(x \otimes y) \\ &=am(x \otimes y)+bn (x \otimes y) \\ &= 0. \end{aligned} \] Hence the tensor product is trivial. This example gives us a lot of inspiration. For example, what if \(m\) and \(n\) are not necessarily coprime, say \(\gcd(m,n)=d\)? By Bézout's identity still we have \[ d(x \otimes y) = (am+bn)(x \otimes y) = 0. \] This inspires us to study the connection between \(\mb{Z}/m\mb{Z} \otimes \mb{Z}/n\mb{Z}\) and \(\mb{Z}/d\mb{Z}\). By the universal property, for the bilinear map \(f:\mb{Z}/m\mb{Z} \times \mb{Z}/n\mb{Z} \to \mb{Z}/d\mb{Z}\) defined by \[ (a+m\mb{Z},b+n\mb{Z})\mapsto ab+d\mb{Z} \] (there should be no difficulty to verify that \(f\) is well-defined), there exists a unique morphism \(h:\mb{Z}/m\mb{Z} \otimes \mb{Z}/n\mb{Z} \to \mb{Z}/d\mb{Z}\) such that \[ h \circ \varphi(a+m\mb{Z},b+n\mb{Z}) = h((a+m\mb{Z}) \otimes(b+n\mb{Z})) = ab+d\mb{Z}. \] Next we show that it has a natural inverse defined by \[ \begin{aligned} g:\mb{Z}/d\mb{Z} &\to \mb{Z}/m\mb{Z} \otimes \mb{Z}/n\mb{Z} \\ a+d\mb{Z} &\mapsto (a+m\mb{Z}) \otimes (1+n\mb{Z}). \end{aligned} \] Taking \(a' = a+kd\), we show that \(g(a+d\mb{Z})=g(a'+\mb{Z})\), that is, we need to show that \[ (a+m\mb{Z})\otimes(1+n\mb{Z}) = (a'+m\mb{Z}) \otimes (1+n\mb{Z}). \] By Bézout's identity, there exists some \(r,s\) such that \(rm+sn=d\). Hence \(a' = a + ksn+krm\), which gives \[ \begin{aligned} (a'+m\mb{Z}) \otimes (1+n\mb{Z}) &= (a+ksn+krm+m\mb{Z}) \otimes(1+n\mb{Z}) \\ &= (a+ksn+m\mb{Z}) \otimes (1+n\mb{Z}) \\ &=(a+m\mb{Z}) \otimes(1+n\mb{Z}) + (ksn+m\mb{Z})\otimes(1+n\mb{Z}) \\ &=(a+m\mb{Z}) \otimes (1+n\mb{Z}) \end{aligned} \] since \[ (ksn+m\mb{Z}) \otimes (1+n\mb{Z}) =n(ks+m\mb{Z}) \otimes (1+n\mb{Z}) = (ks+m\mb{Z}) \otimes(n+n\mb{Z}) = 0. \] So \(g\) is well-defined. Next we show that this is the inverse. Firstly \[ \begin{aligned} g \circ h((a+m\mb{Z}) \otimes(b+n\mb{Z})) &= g(ab+d\mb{Z})\\ &= (ab+m\mb{Z}) \otimes (1+n\mb{Z}) \\ &=b(a+m\mb{Z}) \otimes(1+n\mb{Z}) \\ &= (a+m\mb{Z}) \otimes (b+n\mb{Z}). \end{aligned} \] Secondly, \[ \begin{aligned} h \circ g(a+d\mb{Z}) &= h((a+m\mb{Z}) \otimes(1+n\mb{Z})) \\ &= a+d\mb{Z}. \end{aligned} \] Hence \(g = h^{-1}\) and we can say \[ \mb{Z}/m\mb{Z} \otimes \mb{Z} /n\mb{Z} \simeq \mb{Z} /\gcd(m,n)\mb{Z}. \] If \(m,n\) are coprime, then \(\gcd(m,n)=1\), hence \(\mb{Z}/m\mb{Z} \otimes \mb{Z}/n\mb{Z} \simeq \mb{Z}/\mb{Z}\) is trivial. More interestingly, \(\mb{Z}/m\mb{Z}\otimes \mb{Z}/m\mb{Z}=\mb{Z}/m\mb{Z}\). But this elegant identity raised other questions. First of all, \(\gcd(m,n)=\gcd(n,m)\), which implies \[ \mb{Z}/m\mb{Z} \otimes \mb{Z}/n\mb{Z} \simeq \mb{Z}/\gcd(m,n)\mb{Z} \simeq \mb{Z}/\gcd(n,m)\mb{Z} \simeq\mb{Z}/n\mb{Z}\otimes\mb{Z}/m\mb{Z}. \] Further, for \(m,n,r >1\), we have \(\gcd(\gcd(m,n),r)=\gcd(m,\gcd(n,r))=\gcd(m,n,r)\), which gives \[ (\mb{Z}/m\mb{Z}\otimes\mb{Z}/n\mb{Z})\otimes\mb{Z}/r\mb{Z} \simeq \mb{Z}/\gcd(m,n)\mb{Z}\otimes\mb{Z}/r\mb{Z} \simeq \mb{Z}/\gcd(m,n,r)\mb{Z} \\ \mb{Z}/m\mb{Z}\otimes(\mb{Z}/n\mb{Z} \otimes\mb{Z}/r\mb{Z}) \simeq \mb{Z}/m\mb{Z} \otimes\mb{Z}/\gcd(n,r)\mb{Z} \simeq \mb{Z}/\gcd(m,n,r)\mb{Z} \] hence \[ (\mb{Z}/m\mb{Z}\otimes\mb{Z}/n\mb{Z})\otimes\mb{Z}/r\mb{Z} \simeq \mb{Z}/m\mb{Z}\otimes(\mb{Z}/n\mb{Z}\otimes\mb{Z}/r\mb{Z}). \] Hence for modules of the form \(\mb{Z}/m\mb{Z}\), we see the tensor product operation is associative and commutative up to isomorphism. Does this hold for all modules? The universal property answers this question affirmatively. From now on we will be keep using the universal property. Make sure that you have got the point already.

Tensor product as a binary operation

Let \(M_1,M_2,M_3\) be \(R\)-modules, then there exists a unique isomorphism \[ \begin{aligned} (M_1 \otimes M_2) \otimes M_3 &\xrightarrow{\simeq} M_1 \otimes (M_2 \otimes M_3) \\ (x \otimes y) \otimes z &\mapsto x \otimes(y \otimes z) \end{aligned} \] for \(x \in M_1\), \(y \in M_2\), \(z \in M_3\).

Proof. Consider the map \[ \begin{aligned} \lambda_x:M_2 \times M_3 &\to (M_1 \otimes M_2)\otimes M_3 \\ (y,z) &\mapsto (x \otimes y ) \otimes z \end{aligned} \] where \(x \in M_1\). Since \((\cdot\otimes\cdot)\) is bilinear, we see \(\lambda_x\) is bilinear for all \(x \in M_1\). Hence by the universal property there exists a unique map of the tensor product: \[ \overline{\lambda}_x:M_2 \otimes M_3 \to (M_1 \otimes M_2) \otimes M_3. \] Next we have the map \[ \begin{aligned} \mu_x: M_1 \times (M_2 \otimes M_3) &\to (M_1 \otimes M_2) \otimes M_3 \\ (x,y \otimes z) &\mapsto \overline{\lambda}_x(y \otimes z) \end{aligned} \] which is bilinear as well. Again by the universal property we have a unique map \[ \overline{\mu}_x: M_1 \otimes (M_2 \otimes M_3) \to (M_1 \otimes M_2) \otimes M_3. \] This is indeed the isomorphism we want. The reverse is obtained by reversing the process. For the bilinear map \[ \lambda_x':M_1 \times M_2 \to M_1 \otimes (M_2 \otimes M_3) \] we get a unique map \[ \overline{\lambda'}_x: M_1 \otimes M_2 \to M_1 \otimes (M_2 \otimes M_3). \] Then from the bilinear map \[ \mu'_x:(M_1 \otimes M_2) \times M_3 \to M_1 \otimes (M_2 \otimes M_3) \] we get the unique map, which is actually the reverse of \(\overline{\mu}_x\): \[ \overline{\mu'}_x:(M_1 \otimes M_2) \otimes M_3 \to M_1 \otimes (M_2 \otimes M_3). \] Hence the two tensor products are isomorphic. \(\square\)

Let \(M_1\) and \(M_2\) be \(R\)-modules, then there exists a unique isomorphism \[ \begin{aligned} M_1 \otimes M_2 &\xrightarrow{\simeq} M_2 \otimes M_1 \\ x_1 \otimes x_2 &\mapsto x_2 \otimes x_1 \end{aligned} \] where \(x_1 \in M_1\) and \(x_2 \in M_2\).

Proof. The map \[ \begin{aligned} \lambda:M_1 \times M_2 &\to M_2 \otimes M_1 \\ (x,y) &\mapsto y \otimes x \end{aligned} \] is bilinear and gives us a unique map \[ \overline{\lambda}:M_1 \otimes M_2 \to M_2 \otimes M_1 \] given by \(x \otimes y \mapsto y \otimes x\). Symmetrically, the map \(\lambda':M_2 \times M_1 \to M_1 \otimes M_2\) gives us a unique map \[ \overline{\lambda'}:M_2 \otimes M_1 \to M_1 \otimes M_2 \] which is the inverse of \(\overline{\lambda}\). \(\square\)

Therefore, we may view the set of all \(R\)-modules as a commutative semigroup with the binary operation \(\otimes\).

Maps between tensor products

Consider commutative diagram:


Where \(f_i:M_i \to M_i'\) are some module-homomorphism. What do we want here? On the left hand, we see \(f_1 \times f_2\) sends \((x_1,x_2)\) to \((f_1(x_1),f_2(x_2))\), which is quite natural. The question is, is there a natural map sending \(x_1 \otimes x_2\) to \(f_1(x_1) \otimes f_2(x_2)\)? This is what we want from the right hand. We know \(T(f_1 \times f_2)\) exists, since we have a bilinear map by \(\mu = \varphi' \circ (f_1\times f_2)\). So for \((x_1,x_2) \in M_1 \times M_2\), we have \(T(f_1 \times f_2)(x_1 \otimes x_2) = \varphi' \circ (f_1 \times f_2)(x_1,x_2) = f_1(x_1) \otimes f_2(x_2)\) as what we want.

But \(T\) in this graph has more interesting properties. First of all, if \(M_1 = M_1'\) an \(M_2 = M_2'\), both \(f_1\) and \(f_2\) are identity maps, then we see \(T(f_1 \times f_2)\) is the identity as well. Next, consider the following chain \[ \cdots \to M_1 \times M_2 \xrightarrow{(f_1 \times f_2)}M_1' \times M_2' \xrightarrow{(g_1 \times g_2)}M_1'' \times M_2''\to \cdots. \] We can make it a double chain:


It is obvious that \((g_1 \circ f_1 \times g_2 \circ f_2)=(g_1 \times g_2) \circ (f_1 \times f_2)\), which also gives \[ T(g_1 \times g_2) \circ T(f_1 \times f_2) = T(g_1 \circ f_1 \times g_2 \circ f_2). \] Hence we can say \(T\) is functorial. Sometimes for simplicity we also write \(T(f_1,f_2)\) or simply \(f_1 \otimes f_2\), as it sends \(x_1 \otimes x_2\) to \(f_1(x_1) \otimes f_2(x_2)\). Indeed it can be viewed as a map \[ \begin{aligned} T:L(M_1, M_1') \times L(M_2,M_2') &\to L(M_1 \otimes M_2, M_1' \otimes M_2') \\ (f_1 \times f_2) &\mapsto f_1 \otimes f_2. \end{aligned} \]

Why Does a Vector Space Have a Basis (Module Theory)

Module and vector space

First we recall some backgrounds. Suppose \(A\) is a ring with multiplicative identity \(1_A\). A left module of \(A\) is an additive abelian group \((M,+)\), together with an ring operation \(A \times M \to M\) such that \[ \begin{aligned} (a+b)x &= ax+bx \\ a(x+y) &= ax+ay \\ a(bx) &= (ab)x \\ 1_Ax &= x \end{aligned} \] for \(x,y \in M\) and \(a,b \in A\). As a corollary, we see \((0_A+0_A)x=0_Ax=0_Ax+0_Ax\), which shows \(0_Ax=0_M\) for all \(x \in M\). On the other hand, \(a(x-x)=0_M\) which implies \(a(-x)=-(ax)\). We can also define right \(A\)-modules but we are not discussing them here.

Let \(S\) be a subset of \(M\). We say \(S\) is a basis of \(M\) if \(S\) generates \(M\) and \(S\) is linearly independent. That is, for all \(m \in M\), we can pick \(s_1,\cdots,s_n \in S\) and \(a_1,\cdots,a_n \in A\) such that \[ m = a_1s_1+a_2s_2+\cdots+a_ns_n, \] and, for any \(s_1,\cdots,s_n \in S\), we have \[ a_1s_1+a_2s_2+\cdots+a_ns_n=0_M \implies a_1=a_2=\cdots=a_n=0_A. \] Note this also shows that \(0_M\notin S\) (what happens if \(0_M \in S\)?). We say \(M\) is free if it has a basis. The case when \(M\) or \(A\) is trivial is excluded.

If \(A\) is a field, then \(M\) is called a vector space, which has no difference from the one we learn in linear algebra and functional analysis. Mathematicians in functional analysis may be interested in the cardinality of a vector space, for example, when a vector space is of finite dimension, or when the basis is countable. But the basis does not come from nowhere. In fact we can prove that vector spaces have basis, but modules are not so lucky. \(\def\mb{\mathbb}\)

Examples of non-free modules

First of all let's consider the cyclic group \(\mb{Z}/n\mb{Z}\) for \(n \geq 2\). If we define \[ \begin{aligned} \mb{Z} \times \mb{Z}/n\mb{Z} &\to \mb{Z}/n\mb{Z} \\ (m,k+n\mb{Z}) &\mapsto mk+n\mb{Z} \end{aligned} \] which is actually \(m\) copies of an element, then we get a module, which will be denoted by \(M\). For any \(x=k+n\mb{Z} \in M\), we see \(nk+n\mb{Z}=0_M\). Therefore for any subset \(S \subset M\), if \(x_1,\cdots,x_k \in M\), we have \[ nx_1+nx_2+\cdots+nx_k = 0_M, \] which gives the fact that \(M\) has no basis. In fact this can be generalized further. If \(A\) is a ring but not a field, let \(I\) be a nontrivial proper ideal, then \(A/I\) is a module that has no basis.

Following \(\mb{Z}/n\mb{Z}\) we also have another example on finite order. Indeed, any finite abelian group is not free as a module over \(\mb{Z}\). More generally,

Let \(G\) be a abelian group, and \(G_{tor}\) be its torsion subgroup. If \(G_{tor}\) is non-trival, then \(G\) cannot be a free module over \(\mb{Z}\).

Next we shall take a look at infinite rings. Let \(F[X]\) be the polynomial ring over a field \(F\) and \(F'[X]\) be the polynomial sub-ring that have coefficient of \(X\) equal to \(0\). Then \(F[X]\) is a \(F'[X]\)-module. However it is not free.

Suppose we have a basis \(S\) of \(F[X]\), then we claim that \(|S|>1\). If \(|S|=1\), say \(P \in S\), then \(P\) cannot generate \(F[X]\) since if \(P\) is constant then we cannot generate a polynomial contains \(X\) with power \(1\); If \(P\) is not constant, then the constant polynomial cannot be generate. Hence \(S\) contains at least two polynomials, say \(P_1 \neq 0\) and \(P_2 \neq 0\). However, note \(-X^2P_1 \in F'[X]\) and \(X^2P_2 \in F'[X]\), which gives \[ (X^2P_2)P_1-(X^2P_1)P_2=0. \] Hence \(S\) cannot be a basis.

Why does a vector space have a basis

I hope those examples have convinced you that basis is not a universal thing. We are going to prove that every vector space has a basis. More precisely,

Let \(V\) be a nontrivial vector space over a field \(K\). Let \(\Gamma\) be a set of generators of \(V\) over \(K\) and \(S \subset \Gamma\) is a subset which is linearly independent, then there exists a basis of \(V\) such that \(S \subset B \subset \Gamma\).

Note we can always find such \(\Gamma\) and \(S\). For the extreme condition, we can pick \(\Gamma=V\) and \(S\) be a set containing any single non-zero element of \(V\). Note this also gives that we can generate a basis by expanding any linearly independent set. The proof relies on a fact that every non-zero element in a field is invertible, and also, Zorn's lemma. In fact, axiom of choice is equivalent to the statement that every vector has a set of basis. The converse can be found here. \(\def\mfk{\mathfrak}\)

Proof. Define \[ \mfk{T} =\{T \subset \Gamma:S \subset T, \text{ $T$ is linearly independent}\}. \] Then \(\mfk{T}\) is not empty since it contains \(S\). If \(T_1 \subset T_2 \subset \cdots\) is a totally ordered chain in \(\mfk{T}\), then \(T=\bigcup_{i=1}^{\infty}T_i\) is again linearly independent and contains \(S\). To show that \(T\) is linearly independent, note that if \(x_1,x_2,\cdots,x_n \in T\), we can find some \(k_1,\cdots,k_n\) such that \(x_i \in T_{k_i}\) for \(i=1,2,\cdots,n\). If we pick \(k = \max(k_1,\cdots,k_n)\), then \[ x_1,x_2,\cdots,x_n \in \bigcup_{i=1}^{n}T_{k_i}=T_k. \] But we already know that \(T_k\) is linearly independent, so \(a_1x_1+\cdots+a_nx_n=0_V\) implies \(a_1=\cdots=a_n=0_K\).

By Zorn's lemma, let \(B\) be the maximal element of \(\mfk{T}\), then \(B\) is also linearly independent since it is an element of \(\mfk{T}\). Next we show that \(B\) generates \(V\). Suppose not, then we can pick some \(x \in \Gamma\) that is not generated by \(B\). Define \(B'=B \cup \{x\}\), we see \(B'\) is linearly independent as well, because if we pick \(y_1,y_2,\cdots,y_n \in B\), and if \[ \sum_{k=1}^{n}a_ky_k+bx=0_V, \] then if \(b \neq 0\) we have \[ x = -\sum_{k=1}^{n}b^{-1}a_ky_k \in B, \] contradicting the assumption that \(x\) is not generated by \(B\). Hence \(b=0_K\). However, we have proved that \(B'\) is a linearly independent set containing \(B\) and contained in \(S\), contradicting the maximality of \(B\) in \(\mfk{T}\). Hence \(B\) generates \(V\). \(\square\)

Rings of Fractions and Localisation

Is perhaps the most important technical tools in commutative algebra. In this post we are covering definitions and simple properties. Also we restrict ourselves into ring theories and no further than that. Throughout, we let \(A\) be a commutative ring. With extra effort we can also make it to non-commutative rings for some results but we are not doing that here.

In fact the construction of \(\mathbb{Q}\) from \(\mathbb{Z}\) has already been an example. For any \(a \in \mathbb{Q}\), we have some \(m,n \in \mathbb{Z}\) with \(n \neq 0\) such that \(a = \frac{m}{n}\). As a matter of notation we may also say an ordered pair \((m,n)\) determines \(a\). Two ordered pairs \((m,n)\) and \((m',n')\) are equivalent if and only if \[ mn'-m'n=0. \] But we are only using the ring structure of \(\mathbb{Z}\). So it is natural to think whether it is possible to generalize this process to all rings. But we are also using the fact that \(\mathbb{Z}\) is an entire ring (or alternatively integral domain, they mean the same thing). However there is a way to generalize it. \(\def\mfk{\mathfrak}\)

Multiplicatively closed subset

(Definition 1) A multiplicatively closed subset \(S \subset A\) is a set that \(1 \in S\) and if \(x,y \in S\), then \(xy \in S\).

For example, for \(\mathbb{Z}\) we have a multiplicatively closed subset \[ \{1,2,4,8,\cdots\} \subset \mathbb{Z}. \] We can also insert \(0\) here but it may produce some bad result. If \(S\) is also an ideal then we must have \(S=A\) so this is not very interesting. However the complement is interesting.

(Proposition 1) Suppose \(A\) is a commutative ring such that \(1 \neq 0\). Let \(S\) be a multiplicatively closed set that does not contain \(0\). Let \(\mfk{p}\) be the maximal element of ideals contained in \(A \setminus S\), then \(\mfk{p}\) is prime.

Proof. Recall that \(\mfk{p}\) is prime if for any \(x,y \in A\) such that \(xy \in \mfk{p}\), we have \(x \in \mfk{p}\) or \(y \in \mfk{p}\). But now we fix \(x,y \in \mfk{p}^c\). Note we have a strictly bigger ideal \(\mfk{q}_1=\mfk{p}+Ax\). Since \(\mfk{p}\) is maximal in the ideals contained in \(A \setminus S\), we see \[ \mfk{q}_1 \cap S \neq \varnothing. \] Therefore there exist some \(a \in A\) and \(p \in \mfk{p}\) such that \[ p+ax \in S. \] Also, \(\mfk{q}_2=\mfk{p}+Ay\) has nontrivial intersection with \(S\) (due to the maximality of \(\mfk{p}\)), there exist some \(a' \in A\) and \(p' \in \mfk{p}\) such that \[ p' + a'y \in S. \] Since \(S\) is closed under multiplication, we have \[ (p+ax)(p'+a'y) = pp'+p'ax+pa'y+aa'xy \in S. \] But since \(\mfk{p}\) is an ideal, we see \(pp'+p'ax+pa'y \in \mfk{p}\). Therefore we must have \(xy \notin \mfk{p}\) since if not, \((p+ax)(p'+a'y) \in \mfk{p}\), which gives \(\mfk{p} \cap S \neq \varnothing\), and this is impossible. \(\square\)

As a corollary, for an ideal \(\mfk{p} \subset A\), if \(A \setminus \mfk{p}\) is multiplicatively closed, then \(\mfk{p}\) is prime. Conversely, if we are given a prime ideal \(\mfk{p}\), then we also get a multiplicatively closed subset.

(Proposition 2) If \(\mfk{p}\) is a prime ideal of \(A\), then \(S = A \setminus \mfk{p}\) is multiplicatively closed.

Proof. First \(1 \in S\) since \(\mfk{p} \neq A\). On the other hand, if \(x,y \in S\) we see \(xy \in S\) since \(\mfk{p}\) is prime. \(\square\)

Ring of fractions of a ring

We define a equivalence relation on \(A \times S\) as follows: \[ (a,s) \sim (b,t) \iff \exists u \in S, (at-bs)u=0. \]

(Proposition 3) \(\sim\) is an equivalence relation.

Proof. Since \((as-as)1=0\) while \(1 \in S\), we see \((a,s) \sim (a,s)\). For being symmetric, note that \[ (at-bs)u=0 \implies (bs-at)u=0 \implies (b,t) \sim (a,s). \] Finally, to show that it is transitive, suppose \((a,s) \sim (b,t)\) and \((b,t) \sim (c,u)\). There exist \(u,v \in S\) such that \[ (at-bs)v=(bu-ct)w=0. \] This gives \(bsv=atv\) and \(buw = ctw\), which implies \[ bsvuw=atvuw=ctwsv \implies (au-cs)tvw =0. \] But \(tvw \in S\) since \(t,v,w \in S\) and \(S\) is multiplicatively closed. Hence \[ [(a,s) \sim (b,t)] \land [(b,t) \sim (c,u)] \implies (a,s) \sim (c,u). \] \(\square\)

Let \(a/s\) denote the equivalence class of \((a,s)\). Let \(S^{-1}A\) denote the set of equivalence classes (it is not a good idea to write \(A/S\) as it may coincide with the notation of factor group), and we put a ring structure on \(S^{-1}A\) as follows: \[ (a/s)+(b/t)=(at+bs)/st, \\ (a/s)(b/t)=ab/st. \] There is no difference between this one and the one in elementary algebra. But first of all we need to show that \(S^{-1}A\) indeed form a ring.

(Proposition 4) The addition and multiplication are well defined. Further, \(S^{-1}A\) is a commutative ring with identity.

Proof. Suppose \((a,s) \sim (a',s')\) and \((b,t) \sim (b',t')\) we need to show that \[ (a/s)+(b/t)=(a'/s')+(b'/t') \] or \[ (at+bs)/st = (a't'+b's')/s't'. \] There exists \(u,v \in S\) such that \[ (as'-a's)u=0 \quad (bt'-b't)v=0. \] If we multiply the first equation by \(vtt'\) and second equation by \(uss'\), we see \[ as'uvtt'-a'suvtt'+bt'vuss'-b'tvuss'=[(at)s't'+(bs)s't'-(a't')st-(b's')st]uv, \] which is exactly what we want.

On the other hand, we need to show that \[ ab/st = a'b'/s't'. \] That is, \[ \exists y \in S,(abs't'-a'b'st)y=0. \] Again, we have \[ (as'-a's)u=(as'-a's)uvbt'=(abs't'-a'bst')uv=0, \\ (bt'-b't)v=(bt'-b't)vua's=(a'bst'-a'b'st)uv=0. \] Hence \[ (abs't'-a'bst')uv+(a'bst'-a'b'st)uv=(abs't'-a'b'st)uv=0. \] Since \(uv \in S\), we are done.

Next we show that \(S^{-1}A\) has a ring structure. If \(0 \in S\), then \(S^{-1}A\) contains exactly one element \(0/1\) since in this case, all pairs are equivalent: \[ (at-bs)0=0. \] We therefore only discuss the case when \(0 \notin S\). First \(0/1\) is the zero element with respect to addition since \[ 0/1+a/s = (0s+1a)/1s = a/s. \] On the other hand, we have the inverse \(-a/s\): \[ -a/s+a/s = (-as+as)/ss=0/ss=0/1. \] \(1/1\) is the unit with respect to multiplication: \[ (1/1)(a/s)=1a/1s=a/s. \] Multiplication is associative since \[ [(a/s)(b/t)](c/u)=(ab/st)(c/u)=abc/stu. \\ (a/s)[(b/t)(c/u)]=(a/s)(bc/tu)=abc/stu. \] Multiplication is commutative since \[ ab/st+(-ba)/st=(abst-bast)/s^2t^2=0. \] Finally distributivity. \[ (a/s+b/t)(c/u)=(c/u)(a/s+b/t)=[(at+bs)/st](c/u)=(act+bcs)/stu \\ (a/s)(c/u)+(b/t)(c/u)=ac/su+bc/tu=(actu+bcsu)/stu^2=(act/bcs)/stu \] Note \(ab/cb=a/c\) since \((abc-abc)1=0\). \(\square\) \(\def\mb{\mathbb}\)

Cases and examples

First we consider the case when \(A\) is entire. If \(0 \in S\), then \(S^{-1}A\) is trivial, which is not so interesting. However, provided that \(0 \notin S\), we get some well-behaved result:

(Proposition 5) Let \(A\) be an entire ring, and let \(S\) be a multiplicatively closed subset of \(A\) that does not contain \(0\), then the natural map \[ \begin{aligned} \varphi_S: A &\to S^{-1}A \\ x &\mapsto x/1 \end{aligned} \] is injective. Therefore it can be considered as a natural inclusion. Further, every element of \(\varphi_S(S)\) is invertible.

Proof. Indeed, if \(x/1=0/1\), then there exists \(s \in S\) such that \(xs=0\). Since \(A\) is entire and \(s \neq 0\), we see \(x=0\), hence \(\varphi_S\) is entire. For \(s \in S\), we see \(\varphi_S(s)=s/1\). However \((1/s)\varphi_S(s)=(1/s)(s/1)=s/s=1\). \(\square\)

Note since \(A\) is entire we can also conclude that \(S^{-1}A\) is entire. As a word of warning, the ring homomorphism \(\varphi_S\) is not in general injective since, for example, when \(0 \in S\), this map is the zero.

If we go further, making \(S\) contain all non-zero element, we have:

(Proposition 6) If \(A\) is entire and \(S\) contains all non-zero elements of \(A\), then \(S^{-1}A\) is a field, called the quotient field or the field of fractions.

Proof. First we need to show that \(S^{-1}A\) is entire. Suppose \((a/s)(b/t)=ab/st =0/1\) but \(a/s \neq 0/1\), we see however \[ ab/st=0/1 \implies \exists u \in S, (ab-0)u=0 \implies ab=0. \] Since \(A\) is entire, \(b\) has to be \(0\), which implies \(b/t=0/1\). Second, if \(a/s \neq 0/1\), we see \(a \neq 0\) and therefore is in \(S\), hence we've found the inverse \((a/s)^{-1}=s/a\). \(\square\)

In this case we can identify \(A\) as a subset of \(S^{-1}A\) and write \(a/s=s^{-1}a\).

Let \(A\) be a commutative ring, an let \(S\) be the set of invertible elements of \(A\). If \(u \in S\), then there exists some \(v \in S\) such that \(uv=1\). We see \(1 \in S\) and if \(a,b \in S\), we have \(ab \in S\) since \(ab\) has an inverse as well. This set is frequently denoted by \(A^\ast\), and is called the group of invertible elements of \(A\). For example for \(\mb{Z}\) we see \(\mb{Z}^\ast\) consists of \(-1\) and \(1\). If \(A\) is a field, then \(A^\ast\) is the multiplicative group of non-zero elements of \(A\). For example \(\mb{Q}^\ast\) is the set of all rational numbers without \(0\). For \(A^\ast\) we have

If \(A\) is a field, then \((A^\ast)^{-1}A \simeq A\).

Proof. Define \[ \begin{aligned} \varphi_S:A &\to (A^\ast)^{-1}A \\ x &\mapsto x/1. \end{aligned} \] Then as we have already shown, \(\varphi_S\) is injective. Secondly we show that \(\varphi_S\) is surjective. For any \(a/s \in (A^\ast)^{-1}A\), we see \(as^{-1}/1 = a/s\). Therefore \(\varphi_S(as^{-1})=a/s\) as is shown. \(\square\)

Now let's see a concrete example. If \(A\) is entire, then the polynomial ring \(A[X]\) is entire. If \(K = S^{-1}A\) is the quotient field of \(A\), we can denote the quotient field of \(A[X]\) as \(K(X)\). Elements in \(K(X)\) can be naturally called rational polynomials, and can be written as \(f(X)/g(X)\) where \(f,g \in A[X]\). For \(b \in K\), we say a rational function \(f/g\) is defined at \(b\) if \(g(b) \neq 0\). Naturally this process can be generalized to polynomials of \(n\) variables.

Local ring and localization

We say a commutative ring \(A\) is local if it has a unique maximal ideal. Let \(\mfk{p}\) be a prime ideal of \(A\), and \(S = A \setminus \mfk{p}\), then \(A_{\mfk{p}}=S^{-1}A\) is called the local ring of \(A\) at \(\mfk{p}\). Alternatively, we say the process of passing from \(A\) to \(A_\mfk{p}\) is localization at \(\mfk{p}\). You will see it makes sense to call it localization:

(Proposition 7) \(A_\mfk{p}\) is local. Precisely, the unique maximal ideal is \[ I=\mfk{p}A_\mfk{p}=\{a/s:a \in \mfk{p},s \in S\}. \] Note \(I\) is indeed equal to \(\mfk{p}A_\mfk{p}\).

Proof. First we show that \(I\) is an ideal. For \(b/t \in A_\mfk{p}\) and \(a/s \in I\), we see \[ (b/t)(a/s)=ba/ts \in A_\mfk{p} \] since \(a \in \mfk{p}\) implies \(ba \in \mfk{p}\). Next we show that \(I\) is maximal, which is equivalent to show that \(A_\mfk{p}/I\) is a field. For \(b/t \notin I\), we have \(b \in S\), hence it is legit to write \(t/b\). This gives \[ (b/t+I)(t/b+I)=1/1+I. \] Hence we have found the inverse.

Finally we show that \(I\) is the unique maximal ideal. Let \(J\) be another maximal ideal. Suppose \(J \neq I\), then we can pick \(m/n \in J \setminus I\). This gives \(m \in S\) since if not \(m \in \mfk{p}\) and then \(m/n \in I\). But for \(n/m \in A_\mfk{p}\) we have \[ (m/n)(n/m)=1/1 \in J. \] This forces \(J\) to be \(A_\mfk{p}\) itself, contradicting the assumption that \(J\) is a maximal ideal. Hence \(I\) is unique. \(\square\)


Let \(p\) be a prime number, and we take \(A=\mb{Z}\) and \(\mfk{p}=p\mb{Z}\). We now try to determine what do \(A_\mfk{p}\) and \(\mfk{p}A_\mfk{p}\) look like. First \(S = A \setminus \mfk{p}\) is the set of all entire numbers prime to \(p\). Therefore \(A_\mfk{p}\) can be considered as the ring of all rational numbers \(m/n\) where \(n\) is prime to \(p\), and \(\mfk{p}A_\mfk{p}\) can be considered as the set of all rational numbers \(kp/n\) where \(k \in \mb{Z}\) and \(n\) is prime to \(p\).

\(\mb{Z}\) is the simplest example of ring and \(p\mb{Z}\) is the simplest example of prime ideal. And \(A_\mfk{p}\) in this case shows what does localization do: \(A\) is 'expanded' with respect to \(\mfk{p}\). Every member of \(A_\mfk{p}\) is related to \(\mfk{p}\), and the maximal ideal is determined by \(\mfk{p}\).

Let \(k\) be a infinite field. Let \(A=k[x_1,\cdots,x_n]\) where \(x_i\) are independent indeterminates, \(\mfk{p}\) a prime ideal in \(A\). Then \(A_\mfk{p}\) is the ring of all rational functions \(f/g\) where \(g \notin \mfk{p}\). We have already defined rational functions. But we can go further and demonstrate the prototype of the local rings which arise in algebraic geometry. Let \(V\) be the variety defined by \(\mfk{p}\), that is, \[ V=\{x=(x_1,x_2,\cdots,x_n) \in k^n:\forall f \in \mfk{p}, f(x)=0\}. \] Then what about \(A_\mfk{p}\)? We see since for \(f/g \in A_\mfk{p}\) we have \(g \notin \mfk{p}\), therefore for \(g(x)\) is not equal to \(0\) almost everywhere on \(V\). That is, \(A_\mfk{p}\) can be identified with the ring of all rational functions on \(k^n\) which are defined at almost all points of \(V\). We call this the local ring of \(k^n\) along the variety \(V\).

Universal property

Let \(A\) be a ring and \(S^{-1}A\) a ring of fractions, then we shall see that \(\varphi_S:S \to S^{-1}A\) has a universal property.

(Proposition 8) Let \(g:A \to B\) be a ring homomorphism such that \(g(s)\) is invertible in \(B\) for all \(s \in S\), then there exists a unique homomorphism \(h:S^{-1}A \to B\) such that \(g = h \circ \varphi_S\).

Proof. For \(a/s \in S^{-1}A\), define \(h(a/s)=g(a)g(s)^{-1}\). It looks immediate but we shall show that this is what we are looking for and is unique.

Firstly we need to show that it is well defined. Suppose \(a/s=a'/s'\), then there exists some \(u \in S\) such that \[ (as'-a's)u=0. \] Applying \(g\) on both side yields \[ (g(a)g(s')-g(a')g(s))g(u)=0. \] Since \(g(x)\) is invertible for all \(s \in S\), we therefore get \[ g(a)g(s)^{-1}=g(a')g(s')^{-1}. \] It is a homomorphism since \[ \begin{aligned} h[(a/s)(a'/s')]&=g(a)g(a')g(s)^{-1}g(s')^{-1} \\ h(a/s)h(a'/s')&=g(a)g(s)^{-1}g(a')g(s')^{-1}, \end{aligned} \] and \[ h(a/s+a'/s')=h((as'+a's)/ss')=g(as'+a's)g(ss')^{-1} \\ h(a/s)+h(a'/s')=g(a)g(s)^{-1}+g(a')g(s')^{-1} \] they are equal since \[ \begin{aligned} g(as'+a's)g(ss')^{-1}&=g(as')g(ss')^{-1}+g(a's)g(ss')^{-1} \\ &=g(a)g(s')g(s)^{-1}g(s')^{-1}+g(a')g(s)g(s)^{-1}g(s')^{-1} \\ &=g(a)g(s)^{-1}+g(a')g(s')^{-1}. \end{aligned} \] Next we show that \(g=h \circ \varphi_S\). For \(a \in A\), we have \[ h(\varphi_S(a))=h(a/1)=g(a)g(1)^{-1}=g(a). \] Finally we show that \(h\) is unique. Let \(h'\) be a homomorphism satisfying the condition, then for \(a \in A\) we have \[ h'(a/1)=h'(\varphi_S(a))=g(a). \] For \(s \in S\), we also have \[ h'(1/s)=h'((s/1)^{-1})=h'(\varphi_S(s)^{-1})=h'(\varphi_S(s))^{-1}=g(s)^{-1}. \] Since \(a/s = (a/1)(1/s)\) for all \(a/s \in S^{-1}A\), we get \[ h'(a/s)=h'((a/1)(1/s))=g(a)g(s)^{-1}. \] That is, \(h'\) (or \(h\)) is totally determined by \(g\). \(\square\)

Let's restate it in the language of category theory (you can skip it if you have no idea what it is now). Let \(\mfk{C}\) be the category whose objects are ring-homomorphisms \[ f:A \to B \] such that \(f(s)\) is invertible for all \(s \in S\). Then according to proposition 5, \(\varphi_S\) is an object of \(\mfk{C}\). For two objects \(f:A \to B\) and \(f':A \to B'\), a morphism \(g \in \operatorname{Mor}(f,f')\) is a homomorphism \[ g:B \to B' \] such that \(f'=g \circ f\). So here comes the question: what is the position of \(\varphi_S\)?

Let \(\mfk{A}\) be a category. an object \(P\) of \(\mfk{A}\) is called universally attracting if there exists a unique morphism of each object of \(\mfk{A}\) into \(P\), an is called universally repelling if for every object of \(\mfk{A}\) there exists a unique morphism of \(P\) into this object. Therefore we have the answer for \(\mfk{C}\).

(Proposition 9) \(\varphi_S\) is a universally repelling object in \(\mfk{C}\).

Principal and factorial ring

An ideal \(\mfk{o} \in A\) is said to be principal if there exists some \(a \in A\) such that \(Aa = \mfk{o}\). For example for \(\mb{Z}\), the ideal \[ \{\cdots,-2,0,2,4,\cdots\} \] is principal and we may write \(2\mb{Z}\). If every ideal of a commutative ring \(A\) is principal, we say \(A\) is principal. Further we say \(A\) is a PID if \(A\) is also an integral domain (entire). When it comes to ring of fractions, we also have the following proposition:

(Proposition 10) Let \(A\) be a principal ring and \(S\) a multiplicatively closed subset with \(0 \notin S\), then \(S^{-1}A\) is principal as well.

Proof. Let \(I \subset S^{-1}A\) be an ideal. If \(a \in S\) where \(a/s \in I\), then we are done since then \((s/a)(a/s) = 1/1 \in I\), which implies \(I=S^{-1}A\) itself, hence we shall assume \(a \notin S\) for all \(a/s \in I\). But for \(a/s \in I\) we also have \((a/s)(s/1)=a/1 \in I\). Therefore \(J=\varphi_S^{-1}(I)\) is not empty. \(J\) is an ideal of \(A\) since for \(a \in A\) and \(b \in J\), we have \(\varphi_S(ab) =ab/1=(a/1)(b/1) \in I\) which implies \(ab \in J\). But since \(A\) is principal, there exists some \(a\) such that \(Aa = J\). We shall discuss the relation between \(S^{-1}A(a/1)\) and \(I\). For any \((c/u)(a/1)=ca/u \in S^{-1}A(a/1)\), clearly we have \(ca/u \in I\), hence \(S^{-1}A(a/1)\subset I\). On the other hand, for \(c/u \in I\), we see \(c/1=(c/u)(u/1) \in I\), hence \(c \in J\), and there exists some \(b \in A\) such that \(c = ba\), which gives \(c/u=ba/u=(b/u)(a/1) \in I\). Hence \(I \subset S^{-1}A(a/1)\), and we have finally proved that \(I = S^{-1}A(a/1)\). \(\square\)

As an immediate corollary, if \(A_\mfk{p}\) is the localization of \(A\) at \(\mfk{p}\), and if \(A\) is principal, then \(A_\mfk{p}\) is principal as well. Next we go through another kind of rings. A ring is called factorial (or a unique factorization ring or UFD) if it is entire and if every non-zero element has a unique factorization into irreducible elements. An element \(a \neq 0\) is called irreducible if it is not a unit and whenever \(a=bc\), then either \(b\) or \(c\) is a unit. For all non-zero elements in a factorial ring, we have \[ a=u\prod_{i=1}^{r}p_i, \] where \(u\) is a unit (invertible).

In fact, every PID is a UFD (proof here). Irreducible elements in a factorial ring is called prime elements or simply prime (take \(\mathbb{Z}\) and prime numbers as an example). Indeed, if \(A\) is a factorial ring and \(p\) a prime element, then \(Ap\) is a prime ideal. But we are more interested in the ring of fractions of a factorial ring.

(Proposition 11) Let \(A\) be a factorial ring and \(S\) a multiplicatively closed subset with \(0 \notin S\), then \(S^{-1}A\) is factorial.

Proof. Pick \(a/s \in S^{-1}A\). Since \(A\) is factorial, we have \(a=up_1 \cdots p_k\) where \(p_i\) are primes and \(u\) is a unit. But we have no idea what are irreducible elements of \(S^{-1}A\). Naturally our first attack is \(p_i/1\). And we have no need to restrict ourselves to \(p_i\), we should work on all primes of \(A\). Suppose \(p\) is a prime of \(A\). If \(p \in S\), then \(p/1 \in S\) is a unit, not prime. If \(Ap \cap S \neq \varnothing\), then \(rp \in S\) for some \(r \in A\). But then \[ (p/1)(r/rp)=1, \] again \(p/1\) is a unit, not prime. Finally if \(Ap \cap S = \varnothing\), then \(p/1\) is prime in \(S^{-1}A\). For any \[ (a/s)(b/t)=ab/st=p/1, \] we see \(ab=stp \not\in S\). But this also gives \(ab \in Ap\) which is a prime ideal, hence we can assume \(a \in Ap\) and write \(a=rp\) for some \(r \in A\). With this expansion we get \[ ab=stp \implies rbp=stp \implies rb=st \implies (r/s)(b/t)=1/1. \] Hence \(b/t\) is a unit, \(p/1\) is a prime.

Conversely, suppose \(a/s\) is irreducible in \(S^{-1}A\). Since \(A\) is factorial, we may write \(a=u\prod_{i}p_i\). \(a\) cannot be an element of \(S\) since \(a/s\) is not a unit. We write \[ a/s=1/s[(u/1)(p_1/1)(p_2/1)\cdots(p_n/1)] \] We see there is some \(v \in A\) such that \(uv=1\) and accordingly \((u/1)(v/1)=uv/1=1/1\), hence \(u/1\) is a unit. We claim that there exist a unique \(p_k\) such that \(1 \leq k \leq n\) and \(Ap \cap S = \varnothing\). If not exists, then all \(p_j/1\) are units. If both \(p_{k}\) and \(p_{k'}\) satisfy the requirement and \(p_k \neq p_k'\), then we can write \(a/s\) as \[ a/s = \{1/s[(u/1)(p_1/1)\cdots(p_{k-1}/1)(p_{k+1}/1)\cdots(p_{k'-1}/1)(p_{k'+1}/1)\cdots(p_n/1)](p_k/1)\}(p_{k'}/1). \] Neither the one in curly bracket nor \(p_{k'}/1\) is unit, contradicting the fact that \(a/s\) is irreducible. Next we show that \(a/s=p_k/1\). For simplicity we write \[ b = u\prod_{i=1 \\ i \neq k}^{n} p_i, \quad a = bp_k. \] Note \(a/s = bp_k/s = (b/s)(p_k/1)\). Since \(a/s\) is irreducible, \(p_k/1\) is not a unit, we conclude that \(b/s\) is a unit. We are done for the study of irreducible elements of \(S^{-1}A\): it is of the form \(p/1\) (up to a unit) where \(p\) is prime in \(A\) and \(Ap \cap S = \varnothing\).

Now we are close to the fact that \(S^{-1}A\) is also factorial. For any \(a/s \in S^{-1}A\), we have an expansion \[ a/s=1/s[(u/1)(p_1/1)(p_2/1)\cdots(p_n/1)]. \] Let \(p'_1,p'_2,\cdots,p'_j\) be those whose generated prime ideal has nontrivial intersection with \(S\), then \(p'_1/1, p'_2/1,\cdots,p'_j/1\) are units of \(S^{-1}A\). Let \(q_1,q_2,\cdots,q_k\) be other \(p_i\)'s, then \(q_1/1,q_2/1,\cdots,q_k/1\) are irreducible in \(S^{-1}A\). This gives \[ a/s = [(1/s)(p'_1/1)(p'_2/1)\cdots(p'_j/1)]\prod_{i=1}^{k}(q_i/1). \] Hence \(S^{-1}A\) is factorial as well. \(\square\)

We finish the whole post by a comprehensive proposition:

(Proposition 12) Let \(A\) be a factorial ring and \(p\) a prime element, \(\mfk{p}=Ap\). The localization of \(A\) at \(\mfk{p}\) is principal.

Proof. For \(a/s \in S^{-1}A\), we see \(p\) does not divide \(s\) since if \(s = rp\) for some \(r \in A\), then \(s \in \mfk{p}\), contradicting the fact that \(S = A \setminus \mfk{p}\). Since \(A\) is factorial, we may write \(a = cp^n\) for some \(n \geq 0\) and \(p\) does not divide \(c\) as well (which gives \(c \in S\). Hence \(a/s = (c/s)(p^n/1)\). Note \((c/s)(s/c)=1/1\) and therefore \(c/s\) is a unit. For every \(a/s \in S^{-1}A\) we may write it as \[ a/s = u(p^n/1), \] where \(u\) is a unit of \(S^{-1}A\).

Let \(I\) be any ideal in \(S^{-1}A\), and \[ m = \min\{n:u(p^n/1) \in I, u \text{ is a unit }\}. \] Let's discuss the relation between \(S^{-1}A(p^m/1)\) and \(I\). First we see \(S^{-1}A(p^m/1)=S^{-1}A(up^m/1)\) since if \(v\) is the inverse of \(u\), we get \[ vS^{-1}A(up^m/1)=S^{-1}A(p^m/1) \subset S^{-1}A(up^m/1), \\ S^{-1}A(up^m/1)=uS^{-1}A(p^m/1)\subset S^{-1}A(p^m/1). \] Any element of \(S^{-1}A(up^m/1)\) is of the form \[ vup^{m+k}/1=v(p^k/1)up^m/1. \] Since \(up^m/1 \in I\), we see \(vup^{m+k}/1 \in I\) as well, hence \(S^{-1}A(up^m/1) \subset I\). On the other hand, any element of \(I\) is of the form \(wup^{m+n}/1=w(p^n/1)u(p^m/1)\) where \(w\) is a unit and \(n \geq 0\). This shows that \(vup^{m+n}/1 \in S^{-1}A(up^m/1)\). Hence \(S^{-1}A(p^m/1)=S^{-1}A(up^m/1)=I\) as we wanted. \(\square\)

The Grothendienck Group

Free group

Let \(A\) be an abelian group. Let \((e_i)_{i \in I}\) be a family of elements of \(A\). We say that this family is a basis for \(A\) if the family is not empty, and if every element of \(A\) has a unique expression as a linear expression \[ x = \sum_{i \in I} x_i e_i \] where \(x_i \in \mathbb{Z}\) and almost all \(x_i\) are equal to \(0\). This means that the sum is actually finite. An abelian group is said to be free if it has a basis. Alternatively, we may write \(A\) as a direct sum by \[ A \cong \bigoplus_{i \in I}\mathbb{Z}e_i. \]

Free abelian group generated by a set

Let \(S\) be a set. Say we want to get a group out of this for some reason, so how? It is not a good idea to endow \(S\) with a binary operation beforehead since overall \(S\) is merely a set. We shall generate a group out of \(S\) in the most freely way.

Let \(\mathbb{Z}\langle S \rangle\) be the set of all maps \(\varphi:S \to \mathbb{Z}\) such that, for only a finite number of \(x \in S\), we have \(\varphi(x) \neq 0\). For simplicity, we denote \(k \cdot x\) to be some \(\varphi_0 \in \mathbb{Z}\langle S \rangle\) such that \(\varphi_0(x)=k\) but \(\varphi_0(y) = 0\) if \(y \neq x\). For any \(\varphi\), we claim that \(\varphi\) has a unique expression \[ \varphi=k_1 \cdot x_1 + k_2 \cdot x_2 + \cdots + k_n \cdot x_n. \] One can consider these integers \(k_i\) as the order of \(x_i\), or simply the time that \(x_i\) appears (may be negative). For \(\varphi\in\mathbb{Z}\langle S \rangle\), let \(I=\{x_1,x_2,\cdots,x_n\}\) be the set of elements of \(S\) such that \(\varphi(x_i) \neq 0\). If we denote \(k_i=\varphi(x_i)\), we can show that \(\psi=k_1 \cdot x_1 + k_2 \cdot x_2 + \cdots + k_n \cdot x_n\) is equal to \(\varphi\). For \(x \in I\), we have \(\psi(x)=k\) for some \(k=k_i\neq 0\) by definition of the '\(\cdot\)'; if \(y \notin I\) however, we then have \(\psi(y)=0\). This coincides with \(\varphi\). \(\blacksquare\)

By definition the zero map \(\mathcal{O}=0 \cdot x \in \mathbb{Z}\langle S \rangle\) and therefore we may write any \(\varphi\) by \[ \varphi=\sum_{x \in S}k_x\cdot x \] where \(k_x \in \mathbb{Z}\) and can be zero. Suppose now we have two expressions, for example \[ \varphi=\sum_{x \in S}k_x \cdot x=\sum_{x \in S}k_x'\cdot x \] Then \[ \varphi-\varphi=\mathcal{O}=\sum_{x \in S}(k_x-k'_x)\cdot x \] Suppose \(k_y - k_y' \neq 0\) for some \(y \in S\), then \[ \mathcal{O}(y)=k_y-k_y'\neq 0 \] which is a contradiction. Therefore the expression is unique. \(\blacksquare\)

This \(\mathbb{Z}\langle S \rangle\) is what we are looking for. It is an additive group (which can be proved immediately) and, what is more important, every element can be expressed as a 'sum' associated with finite number of elements of \(S\). We shall write \(F_{ab}(S)=\mathbb{Z}\langle S \rangle\), and call it the free abelian group generated by \(S\). For elements in \(S\), we say they are free generators of \(F_{ab}(S)\). If \(S\) is a finite set, we say \(F_{ab}(S)\) is finitely generated.

An abelian group is free if and only if it is isomorphic to a free abelian group \(F_{ab}(S)\) for some set \(S\).

Proof. First we shall show that \(F_{ab}(S)\) is free. For \(x \in M\), we denote \(\varphi = 1 \cdot x\) by \([x]\). Then for any \(k \in \mathbb{Z}\), we have \(k[x]=k \cdot x\) and \(k[x]+k'[y] = k\cdot x + k' \cdot y\). By definition of \(F_{ab}(S)\), any element \(\varphi \in F_{ab}(S)\) has a unique expression \[ \varphi = k_1 \cdot x_1 + \cdots + k_n \cdot x_n =k_1[x_1]+\cdots+k_n[x_n] \] Therefore \(F_{ab}(S)\) is free since we have found the basis \(([x])_{x \in S}\).

Conversely, if \(A\) is free, then it is immediate that its basis \((e_i)_{i \in I}\) generates \(A\). Our statement is therefore proved. \(\blacksquare\)

The connection between an arbitrary abelian group an a free abelian group

(Proposition 1) If \(A\) is an abelian group, then there is a free group \(F\) which has a subgroup \(H\) such that \(A \cong F/H\).

Proof. Let \(S\) be any set containing \(A\). Then we get a surjective map \(\gamma: S \to A\) and a free group \(F_{ab}(S)\). We also get a unique homomorphism \(\gamma_\ast:F_{ab}(S) \to A\) by \[ \begin{aligned} \gamma_\ast:F_{ab}(S) &\to A \\ \varphi=\sum_{x \in S}k_x\cdot x &\mapsto \sum_{x \in S}k_x\gamma(x) \end{aligned} \] which is also surjective. By the first isomorphism theorem, if we set \(H=\ker(\gamma_\ast)\) and \(F_{ab}(S)=F\), then \[ F/H \cong A. \] \(\blacksquare\)

(Proposition 2) If \(A\) is finitely generated, then \(F\) can also be chosen to be finitely generated.

Proof. Let \(S\) be the generator of \(A\), and \(S'\) is a set containing \(S\). Note if \(S\) is finite, which means \(A\) is finitely generated, then \(S'\) can also be finite by inserting one or any finite number more of elements. We have a map from \(S\) and \(S'\) into \(F_{ab}(S)\) and \(F_{ab}(S')\) respectively by \(f_S(x)=1 \cdot x\) and \(f_{S'}(x')=1 \cdot x'\). Define \(g=f_{S'} \circ \lambda:S' \to F_{ab}(S)\) we get another homomorphism by \[ \begin{aligned} g_\ast:F_{ab}(S') &\to F_{ab}(S) \\ \varphi'=\sum_{x \in S'}k_{x} \cdot x &\mapsto \sum_{x \in S'}k_{x}\cdot g(x) \end{aligned} \] This defines a unique homomorphism such that \(g_\ast \circ f_{S'} = g\). As one can also verify, this map is also surjective. Therefore by the first isomorphism theorem we have \[ A \cong F_{ab}(S) \cong F_{ab}(S')/\ker(g_\ast) \] \(\blacksquare\)

It's worth mentioning separately that we have implicitly proved two statements with commutative diagrams:

(Proposition 3 | Universal property) If \(g:S \to B\) is a mapping of \(S\) into some abelian group \(B\), then we can define a unique group-homomorphism making the following diagram commutative:


(Proposition 4) If \(\lambda:S \to S\) is a mapping of sets, there is a unique homomorphism \(\overline{\lambda}\) making the following diagram commutative:


(In the proof of Proposition 2 we exchanged \(S\) an \(S'\).)

The Grothendieck group

(The Grothendieck group) Let \(M\) be a commutative monoid written additively. We shall prove that there exists a commutative group \(K(M)\) with a monoid homomorphism \[ \gamma:M \to K(M) \]

satisfying the following universal property: If \(f:M \to A\) is a homomorphism from \(M\) into a abelian group \(A\), then there exists a unique homomorphism \(f_\gamma:K(M) \to A\) such that \(f=f_\gamma\circ\gamma\). This can be represented by a commutative diagram:


Proof. There is a commutative diagram describes what we are doing.


Let \(F_{ab}(M)\) be the free abelian group generated by \(M\). For \(x \in M\), we denote \(1 \cdot x \in F_{ab}(M)\) by \([x]\). Let \(B\) be the group generated by all elements of the type \[ [x+y]-[x]-[y] \] where \(x,y \in M\). This can be considered as a subgroup of \(F_{ab}(M)\). We let \(K(M)=F_{ab}(M)/B\). Let \(i=x \to [x]\) and \(\pi\) be the canonical map \[ \pi:F_{ab}(M) \to F_{ab}(M)/B. \] We are done by defining \(\gamma: \pi \circ i\). Then we shall verify that \(\gamma\) is our desired homomorphism satisfying the universal property. For \(x,y \in M\), we have \(\gamma(x+y)=\pi([x+y])\) and \(\gamma(x)+\gamma(y) = \pi([x])+\pi([y])=\pi([x]+[y])\). However we have \[ [x+y]-[x]-[y] \in B, \] which implies that \[ \gamma(x)+\gamma(y)=\pi([x]+[y])=\pi([x+y]) = \gamma(x+y). \] Hence \(\gamma\) is a monoid-homomorphism. Finally the universal property. By proposition 3, we have a unique homomorphism \(f_\ast\) such that \(f_\ast \circ i = f\). Note if \(y \in B\), then \(f_\ast(y) =0\). Therefore \(B \subset \ker{f_\ast}\) Therefore we are done if we define \(f_\gamma(x+B)=f_\ast (x)\). \(\blacksquare\)

Comments of the proof

Why such a \(B\)? Note in general \([x+y]\) is not necessarily equal to \([x]+[y]\) in \(F_{ab}(M)\), but we don't want it to be so. So instead we create a new equivalence relation, by factoring a subgroup generated by \([x+y]-[x]-[y]\). Therefore in \(K(M)\) we see \([x+y]+B = [x]+[y]+B\), which finally makes \(\gamma\) a homomorphism. We use the same strategy to generate the tensor product of two modules later. But at that time we have more than one relation to take care of.

Cancellative monoid

If for all \(x,y,z \in M\), \(x+y=x+z\) implies \(y=z\), then we say \(M\) is a cancellative monoid, or the cancellation law holds in \(M\). Note for the proof above we didn't use any property of cancellation. However we still have an interesting property for cancellation law.

(Theorem) The cancellation law holds in \(M\) if and only if \(\gamma\) is injective.

Proof. This proof involves another approach to the Grothendieck group. We consider pairs \((x,y) \in M \times M\) with \(x,y \in M\). Define \[ (x,y) \sim (x',y') \iff \exists \ell \in M, y+x'+\ell=x+y'+\ell. \] Then we get a equivalence relation (try to prove it yourself!). We define the addition component-wise, that is, \((x,y)+(x',y')=(x+x',y+y')\), then the equivalence classes of pairs form a group \(A\), where the zero element is \([(0,0)]\). We have a monoid-homomorphism \[ f:x \mapsto [(x,0)]. \] If cancellation law holds in \(M\), then \[ \begin{aligned} f(x) = f(y) &\implies [(x,0)] = [(y,0)] \\ &\implies 0+y+\ell=x+0+\ell \\ &\implies x=y. \end{aligned} \] Hence \(f\) is injective. By the universal property of the Grothendieck group, we get a unique homomorphism \(f_\gamma\) such that \(f_\gamma \circ \gamma = f\). If \(x \neq 0\) in \(M\), then \(f_\gamma \circ \gamma(x) \neq 0\) since \(f\) is injective. This implies \(\gamma(x) \neq 0\). Hence \(\gamma\) is injective.

Conversely, if \(\gamma\) is injective, then \(i\) is injective (this can be verified by contradiction). Then we see \(f=f_\ast \circ i\) is injective. But \(f(x)=f(y)\) if and only if \(x+\ell = y+\ell\), hence \(x+ \ell = y+ \ell\) implies \(x=y\), the cancellation law holds on \(M\).


Our first example is \(\mathbb{N}\). Elements of \(F_{ab}(\mathbb{N})\) are of the form \[ \varphi=k_1 \cdot n_1 + k_2 \cdot n_2+\cdots + k_m \cdot n_m. \] For elements in \(B\) they are generated by \[ \varphi=1\cdot (m+n)-1\cdot m - 1\cdot n \] which we wish to represent \(0\). Indeed, \(K(\mathbb{N}) \simeq \mathbb{Z}\) since if we have a homomorphism \[ \begin{aligned} f:K(\mathbb{N}) &\to \mathbb{Z} \\ \sum_{j=1}^{m}k_j \cdot n_j +B &\mapsto \sum_{j=1}^{m}k_j n_j. \end{aligned} \] For \(r \in \mathbb{Z}\), we see \(f(1 \cdot r+B)=r\). On the other hand, if \(\sum_{j=1}^{m}k_j \cdot n_j \not\in B\), then its image under \(f\) is not \(0\).

In the first example we 'granted' the natural numbers 'subtraction'. Next we grant the division on multiplicative monoid.

Consider \(M=\mathbb{Z} \setminus 0\). Now for \(F_{ab}(M)\) we write elements in the form \[ \varphi={}^{k_1}n_1{}^{k_2}n_2\cdots{}^{k_m}n_m \] which denotes that \(\varphi(n_j)=k_j\) and has no other differences. Then for elements in \(B\) they are generated by \[ \varphi = {}^1(n_1n_2){}^{-1}(n_1)^{-1}(n_2) \] which we wish to represent \(1\). Then we see \(K(M) \simeq \mathbb{Q} \setminus 0\) if we take the isomorphism \[ \begin{aligned} f:K(M) &\to \mathbb{Q} \setminus 0 \\ \left(\prod_{j=1}^{m}{}^{k_j}n_j\right)B &\mapsto \prod_{j=1}^{m}n_j^{k_j}. \end{aligned} \]

Of course this is not the end of the Grothendieck group. But for further example we may need a lot of topology background. For example, we have the topological \(K\)-theory group of a topological space to be the Grothendieck group of isomorphism classes of topological vector bundles. But I think it is not a good idea to post these examples at this timing.

Study Vector Bundle in a Relatively Harder Way - Tangent Bundle

Tangent line and tangent surface as vector spaces

We begin our study by some elementary Calculus. Now we have the function \(f(x)=x^2+\frac{e^x}{x^2+1}\) as our example. It should not be a problem to find its tangent line at point \((0,1)\), by calculating its derivative, we have \(l:x-y+1=0\) as the tangent line.

\(l\) is not a vector space since it does not get cross the origin, in general. But \(l-\overrightarrow{OA}\) is a vector space. In general, suppose \(P(x,y)\) is a point on the curve determined by \(f\), i.e. \(y=f(x)\), then we obtain a vector space \(l_p-\overrightarrow{OP} \simeq \mathbb{R}\). But the action of moving the tangent line to the origin is superfluous so naturally we consider the tangent line at \(P\) as a vector space determined by \(P\). In this case, the induced vector space (tangent line) is always of dimension \(1\).


Now we move to two-variable functions. We have a function \(a(x,y)=x^2+y^2-x-y+xy\) as our example. Some elementary Calculus work gives us the tangent surface of \(z=a(x,y)\) at \(A(1,1,1)\), which can be identified by \(S:2x+2y-z=3\simeq\mathbb{R}^2\). Again, this can be considered as a vector space determined by \(A\), or roughly speaking it is one if we take \(A\) as the origin. Further we have a base \((\overrightarrow{AB},\overrightarrow{AC})\). Other vectors on \(S\), for example \(\overrightarrow{AD}\), can be written as a linear combination of \(\overrightarrow{AB}\) and \(\overrightarrow{AC}\). In other words, \(S\) is "spanned" by \((\overrightarrow{AB},\overrightarrow{AC})\).


Tangent line and tangent surface play an important role in differentiation. But sometimes we do not have a chance to use it with ease, for example \(S^1:x^2+y^2=1\) cannot be represented by a single-variable function. However the implicit function theorem, which you have already learned in Calculus, gives us a chance to find a satisfying function locally. Here in this post we will try to generalize this concept, trying to find the tangent space at some point of a manifold. (The two examples above have already determined two manifolds and two tangent spaces.)

Definition of tangent vectors

We will introduce the abstract definition of a tangent vector at beginning. You may think it is way too abstract but actually it is not. Surprisingly, the following definition can simplify our work in the future. But before we go, make sure that you have learned about Fréchet derivative (along with some functional analysis knowledge).

Let \(M\) be a manifold of class \(C^p\) with \(p \geq 1\) and let \(x\) be a point of \(M\). Let \((U,\varphi)\) be a chart at \(x\) and \(v\) be a element of the vector space \(\mathbf{E}\) where \(\varphi(U)\) lies (for example, if \(M\) is a \(d\)-dimensional manifold, then \(v \in \mathbb{R}^d\)). Next we consider the triple \((U,\varphi,v)\). Suppose \((U,\varphi,v)\) and \((V,\psi,w)\) are two such triples. We say these two triples are equivalent if the following identity holds: \[ {\color\green{[}}{\color\red{(}}{\color\red{\psi\circ\varphi^{-1}}}{\color\red{)'}}{\color\red{(}}{\color\purple{\varphi(x)}}{\color\red)}{\color\green{]}}(v)=w. \] This identity looks messy so we need to explain how to read it. First we consider the function in red: the derivative of \(\psi\circ\varphi^{-1}\). The derivative of \(\psi\circ\varphi^{-1}\) at point \(\varphi(x)\) (in purple) is a linear transform, and the transform is embraced with green brackets. Finally, this linear transform maps \(v\) to \(w\). In short we read, the derivative of \(\psi\circ\varphi^{-1}\) at \(\varphi(x)\) maps \(v\) on \(w\). You may recall that you have meet something like \(\psi\circ\varphi^{-1}\) in the definition of manifold. It is not likely that these 'triples' should be associated to tangent vectors. But before we explain it, we need to make sure that we indeed defined an equivalent relation.

(Theorem 1) The relation \[ (U,\varphi,v) \sim (V,\psi,w)\\ [(\psi\circ\varphi^{-1})'(\varphi(x))](v)=w \] is an equivalence relation.

Proof. This will not go further than elementary Calculus, in fact, chain rule:

(Chain rule) If \(f:U \to V\) is differentiable at \(x_0 \in U\), if \(g: V \to W\) is differentiable at \(f(x_0)\), then \(g \circ f\) is differentiable at \(x_0\), and \[ (g\circ f)'(x_0)=g'(f(x_0))\circ f'(x_0) \]

  1. \((U,\varphi,v)\sim(U,\varphi,v)\).

Since \(\varphi\circ\varphi^{-1}=\operatorname{id}\), whose derivative is still the identity everywhere, we have \[ [(\varphi\circ\varphi^{-1})'(\varphi(x))](v)=\operatorname{id}(v)=v \]

  1. If \((U,\varphi,v) \sim (V,\psi,w)\), then \((V,\psi,w)\sim(U,\varphi,v)\).

So now we have \[ [(\psi\circ\varphi^{-1})'(\varphi(x))](v)=w. \] To prove that \([(\varphi\circ\psi^{-1})'(\psi(x))]{}(w)=v\), we need some implementation of chain rule.

Note first \[ (\psi\circ\varphi^{-1})'(\varphi(x))=\psi'(\varphi^{-1}(\varphi(x)))\circ\varphi^{-1}{'}(\varphi(x))=\psi'(x)\circ(\varphi^{-1})'(\varphi(x)) \] while \[ (\varphi\circ\psi^{-1})'(\psi(x))=\varphi'(x)\circ(\psi^{-1})'(\psi(x)). \] But also by the chain rule, if \(f\) is a diffeomorphism, we have \[ (f\circ f^{-1})'(x)=(f^{-1})'(f(x))\circ f'(x)=\operatorname{id} \] or equivalently \[ f'(x)=[(f^{-1})'(f(x))]^{-1} \quad (f^{-1})'(f(x))=[f'(x)]^{-1} \]

Therefore \[ \begin{aligned} \{(\psi\circ\varphi^{-1})'(\varphi(x))\}^{-1} &=\{\psi'(x)\circ(\varphi^{-1})'(\varphi(x))\}^{-1} \\ &=\{(\varphi^{-1})'(\varphi(x))\}^{-1}\circ\{\psi'(x)\}^{-1} \\ &=\varphi'(x)\circ(\psi^{-1})'(\psi(x)) \\ &=(\varphi\circ\psi^{-1})'(\psi(x)) \end{aligned} \] which implies \[ (\varphi\circ\psi^{-1})'(\psi(x))(w)=\{(\psi\circ\varphi^{-1})'(\varphi(x))\}^{-1}(w)=v. \]

  1. If \((U,\varphi,v)\sim(V,\psi,w)\) and \((V,\psi,w)\sim(W,\lambda,z)\), then \((U,\varphi,v)\sim(W,\lambda,z)\).

We are given identities \[ [(\psi\circ\varphi^{-1})'(\varphi(x))](v)=w \] and \[ [(\lambda\circ\psi^{-1})'(\psi(x))](w)=z. \] By canceling \(w\), we get \[ \begin{aligned} z = [(\lambda\circ\psi^{-1})'(\psi(x))] \circ [(\psi\circ\varphi^{-1})'(\varphi(x))] (v) \end{aligned}. \] On the other hand, \[ \begin{aligned} (\lambda\circ\varphi^{-1})'(\varphi(x))&=(\lambda\circ\psi^{-1}\circ\psi\circ\varphi^{-1})'(\varphi(x)) \\ &=(\lambda\circ\psi^{-1})'(\psi\circ\varphi^{-1}\circ\varphi(x))\circ(\psi\circ\varphi^{-1})'(\varphi(x)) \\ &=(\lambda\circ\psi^{-1})'(\psi(x))\circ(\psi\circ\varphi^{-1})'(\varphi(x)) \end{aligned} \] which is what we needed. \(\square\)

An equivalence class of such triples \((U,\varphi,v)\) is called a tangent vector of \(X\) at \(x\). The set of such tangent vectors is called the tangent space to \(X\) at \(x\), which is denoted by \(T_x(X)\). But it seems that we have gone too far. Is the triple even a 'vector'? To get a clear view let's see Euclidean submanifolds first.

Definition of tangent vectors of Euclidean submanifolds

Suppose \(M\) is a submanifold of \(\mathbb{R}^n\). We say \(z\) is the tangent vector of \(M\) at point \(x\) if there exists a curve \(\alpha\) of class \(C^1\), which is defined on \(\mathbb{R}\) and where there exists an interval \(I\) such that \(\alpha(I) \subset M\), such that \(\alpha(t_0)=x\) and \(\alpha'(t_0)=z\). (For convenience we often take \(t_0=0\).)

This definition is immediate if we check some examples. For the curve \(M: x^2+1+\frac{e^x}{x^2+1}-y=0\), we can show that \((1,1)^T\) is a tangent vector of \(M\) at \((0,1)\), which is identical to our first example. Taking \[ \alpha(t)=(t,t^2+1+\frac{e^t}{t^2+1}) \] we get \(\alpha(0)=(0,1)\) and \[ \alpha'(t)=(1,2t+\frac{e^t(t-1)^2}{(t^2+1)^2})^T. \] Therefore \(\alpha'(0)=(1,1)^T\). \(\square\)

Coordinate system and tangent vector

Let \(\mathbf{E}\) and \(\mathbf{F}\) be two Banach spaces and \(U\) an open subset of \(\mathbf{E}\). A \(C^p\) map \(f: U \to \mathbf{F}\) is called an immersion at \(x\) if \(f'(x)\) is injective.

For example, if we take \(\mathbf{E}=\mathbf{F}=\mathbb{R}=U\) and \(f(x)=x^2\), then \(f\) is an immersion at almost all point on \(\mathbb{R}\) except \(0\) since \(f'(0)=0\) is not injective. This may lead you to Sard's theorem.

(Theorem 2) Let \(M\) be a subset of \(\mathbb{R}^n\), then \(M\) is a \(d\)-dimensional \(C^p\) submanifold of \(\mathbb{R}^n\) if and only if for every \(x \in M\) there exists an open neighborhood \(U \subset \mathbb{R}^n\) of \(x\), an open neighborhood \(\Omega \subset \mathbb{R}^d\) of \(0\) and a \(C^p\) map \(g: \Omega \to \mathbb{R}^n\) such that \(g\) is immersion at \(0\) such that \(g(0)=x\), and \(g\) is a homeomorphism between \(\Omega\) and \(M \cap U\) with the topology induced from \(\mathbb{R}^n\).

This follows from the definition of manifold and should not be difficult to prove. But it is not what this blog post should cover. For a proof you can check Differential Geometry: Manifolds, Curves, and Surfaces by Marcel Berger and Bernard Gostiaux. The proof is located in section 2.1.

A coordinate system on a \(d\)-dimensional \(C^p\) submanifold \(M\) of \(\mathbb{R}^n\) is a pair \((\Omega,g)\) consisting of an open set \(\Omega \subset \mathbb{R}^d\) and a \(C^p\) function \(g:\Omega \to \mathbb{R}^n\) such that \(g(\Omega)\) is open in \(V\) and \(g\) induces a homeomorphism between \(\Omega\) and \(g(\Omega)\).

For convenience, we say \((\Omega,g)\) is centered at \(x\) if \(g(0)=x\) and \(g\) is an immersion at \(x\). By theorem 2 it is always possible to find such a coordinate system centered at a given point \(x \in M\). The following theorem will show that we can get a easier approach to tangent vector.

(Theorem 3) Let \(\mathbf{E}\) and \(\mathbf{F}\) be two finite-dimensional vector spaces, \(U \subset \mathbf{E}\) an open set, \(f:U \to \mathbf{F}\) a \(C^1\) map, \(M\) a submanifold of \(\mathbf{E}\) contained in \(U\) and \(W\) a submanifold of \(\mathbf{F}\) such that \(f(M) \subset W\). Take \(x \in M\) and set \(y=f(x)\), If \(z\) is a tangent vector to \(M\) at \(x\), the image \(f'(x)(z)\) is a tangent vector to \(W\) at \(y=f(x)\).

Proof. Since \(z\) is a tangent vector, we see there exists a curve \(\alpha: J \to M\) such that \(\alpha(0)=x\) and \(\alpha'(0)=z\) where \(J\) is an open interval containing \(0\). The function \(\beta = f \circ \alpha: J \to W\) is also a curve satisfying \(\beta(0)=f(\alpha(0))=f(x)\) and \[ \beta'(0)=f'(\alpha(0))\alpha'(0)=f'(x)(z), \] which is our desired curve. \(\square\)

Why we use 'equivalence relation'

We shall show that equivalence relation makes sense. Suppose \(M\) is a \(d\)-submanifold of \(\mathbb{R}^n\), \(x \in M\) and \(z\) is a tangent vector to \(M\) at \(x\). Let \((\Omega,g)\) be a coordinate system centered at \(x\). Since \(g \in C^p(\mathbb{R}^d;\mathbb{R}^n)\), we see \(g'(0)\) is a \(n \times d\) matrix, and injectivity ensures that \(\operatorname{rank}(g'(0))=d\).

Every open set \(\Omega \subset \mathbb{R}^d\) is a \(d\)-dimensional submanifold of \(\mathbb{R}^d\) (of \(C^p\)). Suppose now \(v \in \mathbb{R}^d\) is a tangent vector to \(\Omega\) at \(0\) (determined by a curve \(\alpha\)), then by Theorem 3, \(g \circ \alpha\) determines a tangent vector to \(M\) at \(x\), which is \(z_x=g'(0)(v)\). Suppose \((\Lambda,h)\) is another coordinate system centered at \(x\). If we want to obtain \(z_x\) as well, we must have \[ h'(0)(w)=g'(0)(v), \] which is equivalent to \[ w = (h'(0)^{-1} \circ g'(0))(v)=(h^{-1}\circ g)'(0)(v), \] for some \(w \in \mathbb{R}^d\) which is the tangent vector to \(\Lambda\) at \(0 \in \Lambda\). (The inverse makes sense since we implicitly restricted ourself to \(\mathbb{R}^d\))

However, we also have two charts by \((U,\varphi)=(g(\Omega),g^{-1})\) and \((V,\psi) = (h(\Lambda),h^{-1})\), which gives \[ (h^{-1} \circ g)'(0)(v)=[(\psi \circ \varphi^{-1})'(\varphi(x))](v)=w \] and this is just our equivalence relation (don't forget that \(g(0)=x\) hence \(g^{-1}(x)=\varphi(x)=0\)!). There we have our reason for equivalence relation: If \((U,\varphi,v) \sim (V,\psi,w)\), then \((U,\varphi,u)\) and \((V,\psi,v)\) determines the same tangent vector but we do not have to evaluate it manually. In general, all elements in an equivalence class represent a single vector, so the vector is (algebraically) a equivalence class. This still holds when talking about Banach manifold since topological properties of Euclidean spaces do not play a role. The generalized proof can be implemented with little difficulty.

Tangent space

The tangent vectors at \(x \in M\) span a vector space (which is based at \(x\)). We do hope that because if not our definition of tangent vector would be incomplete and cannot even hold for an trivial example (such as what we mentioned at the beginning). We shall show, satisfyingly, the set of tangent vectors to \(M\) at \(x\) (which we write \(T_xM\)) forms a vector space that is toplinearly isomorphic to \(\mathbf{E}\), on which \(M\) is modeled.

(Theorem 4) \(T_xM \simeq \mathbf{E}\). In other words, \(T_xM\) can be given the structure of topological vector space given by the chart.

Proof. Let \((U,\varphi)\) be a chart at \(x\). For \(v \in \mathbf{E}\), we see \((\varphi^{-1})'(x)(v)\) is a tangent vector at \(x\). On the other hand, pick \(\mathbf{w} \in T_xM\), which can be represented by \((V,\psi,w)\). Then \[ v=(\varphi\circ\psi^{-1})'(\psi(x))(w) \] makes \((U,\varphi,v) \sim (V,\psi,w)\) uniquely, and therefore we get some \(v \in \mathbf{E}\). To conclude, \[ T_xM \xrightarrow[(\varphi^{-1})'(x)]{\simeq}\mathbf{E} \] which proves our theorem. Note that this does not depend on the choice of charts. \(\square\)

For many reasons it is not a good idea to identify \(T_xM\) as \(\mathbf{E}\) without mentioning the point \(x\). For example we shouldn't identify the tangent line of a curve as \(x\)-axis. Instead, it would be better to identify or visualize \(T_xM\) as \((x,\mathbf{E})\), that is, a linear space with origin at \(x\).

Tangent bundle

Now we treat all tangent spaces as a vector bundle. Let \(M\) be a manifold of class \(C^p\) with \(p \geq 1\), define the tangent bundle by the disjoint union \[ T(M)=\bigsqcup_{x \in M}T_xM. \] This is a vector bundle if we define the projection by \[ \begin{aligned} \pi: T(M) &\to M \\ y \in T_xM &\mapsto x \end{aligned} \] and we will verify it soon. First let's see an example. Below is a visualization of the tangent bundle of \(\frac{x^2}{4}+\frac{y^2}{3}=1\), denoted by red lines:


Also we can see \(\pi\) maps points on the blue line to a point on the curve, which is \(B\).

To show that a tangent bundle of a manifold is a vector bundle, we need to verify that it satisfies three conditions we mentioned in previous post. Let \((U,\varphi)\) be a chart of \(M\) such that \(\varphi(U)\) is open in \(\mathbf{E}\), then tangent vectors can be represented by \((U,\varphi,v)\). We get a bijection \[ \tau_U:\pi^{-1}(U) = T(U) \to U \times \mathbf{E} \] by definition of tangent vectors as equivalence classes. Let \(z_x\) be a tangent vector to \(U\) at \(x\), then there exists some \(v \in \mathbf{E}\) such that \((U,\varphi,v)\) represents \(z\). On the other hand, for some \(v \in \mathbf{E}\) and \(x \in U\), \((U,\varphi,v)\) represents some tangent vector at \(x\). Explicitly, \[ \tau_{U}(z_x)=(x,v)=(\pi(z_x),[(\varphi^{-1})'(\pi(z_x))]^{-1}(z_x)) \]

Further we get the following diagram commutative (which establishes VB 1):


For VB 2 and VB 3 we need to check different charts. Let \((U_i,\varphi_i)\), \((U_j,\varphi_j)\) be two charts. Define \(\varphi_{ji}=\varphi_j \circ \varphi_i^{-1}\) on \(\varphi_i(U_i \cap U_j)\), and respectively we write \(\tau_{U_i}=\tau_i\) and \(\tau_{U_j}=\tau_j\). Then we get a transition mapping \[ \tau_{ji}:(\tau_j \circ \tau_i^{-1}):(U_i \cap U_j) \times \mathbf{E} \to (U_i \cap U_j) \times \mathbf{E}. \]

One can verify that \[ \tau_{ji}(x,v)=(\varphi_{ji}(x),D\varphi_{ji}(x) \cdot v) \] for \(x \in U_i \cap U_j\) and \(v \in \mathbf{E}\). Since \(D\varphi_{ji} \in C^{p-1}\) and \(D\varphi_{ji}(x)\) is a toplinear isomorphism, we see \[ x \mapsto (\tau_j \circ \tau_i^{-1})_x=(\varphi_{ji}(x),D\varphi_{ji}(x)\cdot(\cdot)) \] is a morphism, which goes for VB 3. It remains to verify VB 2. To do this we need a fact from Banach space theory:

If \(f:U \to L(\mathbf{E},\mathbf{F})\) is a \(C^k\)-morphism, then the map of \(U \times \mathbf{E}\) into \(\mathbf{F}\) given by \[ (x,v) \mapsto [f(x)](v) \] is a \(C^k\)-morphism.

Here, we have \(f(x)=\tau_{ji}(x,\cdot)\) and to conclude, \(\tau_{ji}\) is a \(C^{p-1}\)-morphism. It is also an isomorphism since it has an inverse \(\tau_{ij}\). Following the definition of manifold, we can conclude that \(T(U)\) has a unique manifold structure such that \(\tau_i\) are morphisms (there will be a formal proof in next post about any total space of a vector bundle). By VB 1, we also have \(\pi=\tau_i\circ pr\), which makes it a morphism as well. On each fiber \(\pi^{-1}(x)\), we can freely transport the topological vector space structure of any \(\mathbf{E}\) such that \(x\) lies in \(U_i\), by means of \(\tau_{ix}\). Since \(f(x)\) is a toplinear isomorphism, the result is independent of the choice of \(U_i\). VB 2 is therefore established.

Using some fancier word, we can also say that \(T:M \to T(M)\) is a functor from the category of \(C^p\)-manifolds to the category of vector bundles of class \(C^{p-1}\).

A Continuous Function Sending L^p Functions to L^1

Throughout, let \((X,\mathfrak{M},\mu)\) be a measure space where \(\mu\) is positive.

The question

If \(f\) is of \(L^p(\mu)\), which means \(\lVert f \rVert_p=\left(\int_X |f|^p d\mu\right)^{1/p}<\infty\), or equivalently \(\int_X |f|^p d\mu<\infty\), then we may say \(|f|^p\) is of \(L^1(\mu)\). In other words, we have a function \[ \begin{aligned} \lambda: L^p(\mu) &\to L^1(\mu) \\ f &\mapsto |f|^p. \end{aligned} \] This function does not have to be one to one due to absolute value. But we hope this function to be fine enough, at the very least, we hope it is continuous.

Here, \(f \sim g\) means that \(f-g\) equals \(0\) almost everywhere with respect to \(\mu\). It can be easily verified that this is an equivalence relation.


We still use the \(\varepsilon-\delta\) argument but it's in a metric space. Suppose \((X,d_1)\) and \((Y,d_2)\) are two metric spaces and \(f:X \to Y\) is a function. We say \(f\) is continuous at \(x_0 \in X\) if, for any \(\varepsilon>0\), there exists some \(\delta>0\) such that \(d_2(f(x_0),f(x))<\varepsilon\) whenever \(d_1(x_0,x)<\delta\). Further, we say \(f\) is continuous on \(X\) if \(f\) is continuous at every point \(x \in X\).


For \(1\leq p<\infty\), we already have a metric by \[ d(f,g)=\lVert f-g \rVert_p \] given that \(d(f,g)=0\) if and only if \(f \sim g\). This is complete and makes \(L^p\) a Banach space. But for \(0<p<1\) (yes we are going to cover that), things are much more different, and there is one reason: Minkowski inequality holds reversely! In fact, we have \[ \lVert f+g \rVert_p \geq \lVert f \rVert_p + \lVert g \rVert_p \] for \(0<p<1\). \(L^p\) space has too many weird things when \(0<p<1\). Precisely,

For \(0<p<1\), \(L^p(\mu)\) is locally convex if and only if \(\mu\) assumes finitely many values. (Proof.)

On the other hand, for example, \(X=[0,1]\) and \(\mu=m\) be the Lebesgue measure, then \(L^p(\mu)\) has no open convex subset other than \(\varnothing\) and \(L^p(\mu)\) itself. However,

A topological vector space \(X\) is normable if and only if its origin has a convex bounded neighbourhood. (See Kolmogorov's normability criterion.)

Therefore \(L^p(m)\) is not normable, hence not Banach.

We have gone too far. We need a metric that is fine enough.

Metric of \(L^p\) when \(0<p<1\)

Define \[ \Delta(f)=\int_X |f|^p d\mu \] for \(f \in L^p(\mu)\). We will show that we have a metric by \[ d(f,g)=\Delta(f-g). \] Fix \(y\geq 0\), consider the function \[ f(x)=(x+y)^p-x^p. \] We have \(f(0)=y^p\) and \[ f'(x)=p(x+y)^{p-1}-px^{p-1} \leq px^{p-1}-px^{p-1}=0 \] when \(x > 0\) and hence \(f(x)\) is nonincreasing on \([0,\infty)\), which implies that \[ (x+y)^p \leq x^p+y^p. \] Hence for any \(f\), \(g \in L^p\), we have \[ \Delta(f+g)=\int_X |f+g|^p d\mu \leq \int_X |f|^p d\mu + \int_X |g|^p d\mu=\Delta(f)+\Delta(g). \] This inequality ensures that \[ d(f,g)=\Delta(f-g) \] is a metric. It's immediate that \(d(f,g)=d(g,f) \geq 0\) for all \(f\), \(g \in L^p(\mu)\). For the triangle inequality, note that \[ d(f,h)+d(g,h)=\Delta(f-h)+\Delta(h-g) \geq \Delta((f-h)+(h-g))=\Delta(f-g)=d(f,g). \] This is translate-invariant as well since \[ d(f+h,g+h)=\Delta(f+h-g-h)=\Delta(f-g)=d(f,g) \] The completeness can be verified in the same way as the case when \(p>1\). In fact, this metric makes \(L^p\) a locally bounded F-space.

The continuity of \(\lambda\)

The metric of \(L^1\) is defined by \[ d_1(f,g)=\lVert f-g \rVert_1=\int_X |f-g|d\mu. \] We need to find a relation between \(d_p(f,g)\) and \(d_1(\lambda(f),\lambda(g))\), where \(d_p\) is the metric of the corresponding \(L^p\) space.


As we have proved, \[ (x+y)^p \leq x^p+y^p. \] Without loss of generality we assume \(x \geq y\) and therefore \[ x^p=(x-y+y)^p \leq (x-y)^p+y^p. \] Hence \[ x^p-y^p \leq (x-y)^p. \] By interchanging \(x\) and \(y\), we get \[ |x^p-y^p| \leq |x-y|^p. \] Replacing \(x\) and \(y\) with \(|f|\) and \(|g|\) where \(f\), \(g \in L^p\), we get \[ \int_{X}\lvert |f|^p-|g|^p \rvert d\mu \leq \int_X |f-g|^p d\mu. \] But \[ d_1(\lambda(f),\lambda(g))=\int_{X}\lvert |f|^p-|g|^p \rvert d\mu \\ d_p(f,g)=\Delta(f-g)= d\mu \leq \int_X |f-g|^p d\mu \] and we therefore have \[ d_1(\lambda(f),\lambda(g)) \leq d_p(f,g). \] Hence \(\lambda\) is continuous (and in fact, Lipschitz continuous and uniformly continuous) when \(0<p<1\).

\(1 \leq p < \infty\)

It's natural to think about Minkowski's inequality and Hölder's inequality in this case since they are critical inequality enablers. You need to think about some examples of how to create the condition to use them and get a fine result. In this section we need to prove that \[ |x^p-y^p| \leq p|x-y|(x^{p-1}+y^{p-1}). \] This inequality is surprisingly easy to prove however. We will use nothing but the mean value theorem. Without loss of generality we assume that \(x > y \geq 0\) and define \(f(t)=t^p\). Then \[ \frac{f(x)-f(y)}{x-y}=f'(\zeta)=p\zeta^{p-1} \] where \(y < \zeta < x\). But since \(p-1 \geq 0\), we see \(\zeta^{p-1} < x^{p-1} <x^{p-1}+y^{p-1}\). Therefore \[ f(x)-f(y)=x^p-y^p=p(x-y)\zeta^{p-1}<p(x-y)(x^{p-1}-y^{p-1}). \] For \(x=y\) the equality holds.

Therefore \[ \begin{aligned} d_1(\lambda(f),\lambda(g)) &= \int_X \left||f|^p-|g|^p\right|d\mu \\ &\leq \int_Xp\left||f|-|g|\right|(|f|^{p-1}+|g|^{p-1})d\mu \end{aligned} \] By Hölder's inequality, we have \[ \begin{aligned} \int_X ||f|-|g||(|f|^{p-1}+|g|^{p-1})d\mu & \leq \left[\int_X \left||f|-|g|\right|^pd\mu\right]^{1/p}\left[\int_X\left(|f|^{p-1}+|g|^{p-1}\right)^q\right]^{1/q} \\ &\leq \left[\int_X \left|f-g\right|^pd\mu\right]^{1/p}\left[\int_X\left(|f|^{p-1}+|g|^{p-1}\right)^q\right]^{1/q} \\ &=\lVert f-g \rVert_p \left[\int_X\left(|f|^{p-1}+|g|^{p-1}\right)^q\right]^{1/q}. \end{aligned} \] By Minkowski's inequality, we have \[ \left[\int_X\left(|f|^{p-1}+|g|^{p-1}\right)^q\right]^{1/q} \leq \left[\int_X|f|^{(p-1)q}d\mu\right]^{1/q}+\left[\int_X |g|^{(p-1)q}d\mu\right]^{1/q} \] Now things are clear. Since \(1/p+1/q=1\), or equivalently \(1/q=(p-1)/p\), suppose \(\lVert f \rVert_p\), \(\lVert g \rVert_p \leq R\), then \((p-1)q=p\) and therefore \[ \left[\int_X|f|^{(p-1)q}d\mu\right]^{1/q}+\left[\int_X |g|^{(p-1)q}d\mu\right]^{1/q} = \lVert f \rVert_p^{p-1}+\lVert g \rVert_p^{p-1} \leq 2R^{p-1}. \] Summing the inequalities above, we get \[ \begin{aligned} d_1(\lambda(f),\lambda(g)) \leq 2pR^{p-1}\lVert f-g \rVert_p =2pR^{p-1}d_p(f,g) \end{aligned} \] hence \(\lambda\) is continuous.

Conclusion and further

We have proved that \(\lambda\) is continuous, and when \(0<p<1\), we have seen that \(\lambda\) is Lipschitz continuous. It's natural to think about its differentiability afterwards, but the absolute value function is not even differentiable so we may have no chance. But this is still a fine enough result. For example we have no restriction to \((X,\mathfrak{M},\mu)\) other than the positivity of \(\mu\). Therefore we may take \(\mathbb{R}^n\) as the Lebesgue measure space here, or we can take something else.

It's also interesting how we use elementary Calculus to solve some much more abstract problems.

Study Vector Bundle in a Relatively Harder Way - Definition


Direction is a considerable thing. For example take a look at this picture (by David Gunderman):


The position of the red ball and black ball shows that this triple of balls turns upside down every time they finish one round. This wouldn't happen if this triple were on a normal band, which can be denoted by \(S^1 \times (0,1)\). What would happen if we try to describe their velocity on the Möbius band, both locally and globally? There must be some significant difference from a normal band. If we set some move pattern on balls, for example let them run horizontally or zig-zagly, hopefully we get different set of vectors. those vectors can span some vector spaces as well.

A Formal Construction

Here and in the forgoing posts, we will try to develop purely formally certain functorial constructions having to do with vector bundles. It may be overly generalized, but we will offer some examples to make it concrete.

Let \(M\) be a manifold (of class \(C^p\), where \(p \geq 0\) and can be set to \(\infty\)) modeled on a Banach space \(\mathbf{E}\). Let \(E\) be another topological space and \(\pi: E \to M\) a surjective \(C^p\)-morphism. A vector bundle is a topological construction associated with \(M\) (base space), \(E\) (total space) and \(\pi\) (bundle projection) such that, roughly speaking, \(E\) is locally a product of \(M\) and \(\mathbf{E}\).

We use \(\mathbf{E}\) instead of \(\mathbb{R}^n\) to include the infinite dimensional cases. We will try to distinguish finite-dimensional and infinite-dimensional Banach spaces here. There are a lot of things to do, since, for example, infinite dimensional Banach spaces have no countable Hamel basis, while the finite-dimensional ones have finite ones (this can be proved by using the Baire category theorem).

Next we will show precisely how \(E\) locally becomes a product space. Let \(\mathfrak{U}=(U_i)_i\) be an open covering of \(M\), and for each \(i\), suppose that we are given a mapping \[ \tau_i:\pi^{-1}(U_i)\to U_i \times E \] satisfying the following three conditions.

VB 1 \(\tau_i\) is a \(C^p\) diffeomorphism making the following diagram commutative:


where \(pr\) is the projection of the first component: \((x,y) \mapsto x\). By restricting \(\tau_i\) on one point of \(U_i\), we obtain an isomorphism on each fiber \(\pi^{-1}(x)\): \[ \tau_{ix}:\pi^{-1}(x) \xrightarrow{\simeq} \{x\} \times \mathbf{E} \]

VB 2 For each pair of open sets \(U_i\), \(U_j \in \mathfrak{U}\), we have the map \[ \tau_{jx} \circ \tau_{ix}^{-1}: \mathbf{E} \to \mathbf{E} \] to be a toplinear isomorphism (that is, it preserves \(\mathbf{E}\) for being a topological vector space).

VB 3 For any two members \(U_i\), \(U_j \in \mathfrak{U}\), we have the following function to be a \(C^p\)-morphism: \[ \begin{aligned} \varphi:U_i \cap U_j &\to L(\mathbf{E},\mathbf{E}) \\ x &\mapsto \left(\tau_j\circ \tau_i^{-1}\right)_x \end{aligned} \]

REMARKS. As with manifold, we call the set of 2-tuples \((U_i,\tau_i)_i\) a trivializing covering of \(\pi\), and that \((\tau_i)\) are its trivializing maps. Precisely, for \(x \in U_i\), we say \(U_i\) or \(\tau_i\) trivializes at \(x\).

Two trivializing coverings for \(\pi\) is said to be VB-equivalent if taken together they also satisfy conditions of VB 2 and VB 3. It's immediate that VB-equivalence is an equivalence relation and we leave the verification to the reader. It is this VB-equivalence class of trivializing coverings that determines a structure of vector bundle on \(\pi\). With respect to the Banach space \(\mathbf{E}\), we say that the vector bundle has fiber \(\mathbf{E}\), or is modeled on \(\mathbf{E}\).

Next we shall give some motivations of each condition. Each pair \((U_i,\tau_i)\) determines a local product of 'a part of the manifold' and the model space, on the latter of which we can deploy the direction with ease. This is what VB 1 tells us. But that's far from enough if we want our vectors fine enough. We do want the total space \(E\) to actually be able to qualify our requirements. As for VB 2, it is ensured that using two different trivializing maps will give the same structure of some Banach spaces (with equivalent norms). According to the image of \(\tau_{ix}\), we can say, for each point \(x \in X\), which can be determined by a fiber \(\pi^{-1}(x)\) (the pre-image of \(\tau_{ix}\)), can be given another Banach space by being sent via \(\tau_{jx}\) for some \(j\). Note that \(\pi^{-1}(x) \in E\), the total space. In fact, VB 2 has an equivalent alternative:

VB 2' On each fiber \(\pi^{-1}(x)\) we are given a structure of Banach space as follows. For \(x \in U_i\), we have a toplinear isomorphism which is in fact the trivializing map: \[ \tau_{ix}:\pi^{-1}(x)=E_x \to \mathbf{E}. \] As stated, VB 2 implies VB 2'. Conversely, if VB 2' is satisfied, then for open sets \(U_i\), \(U_j \in \mathfrak{U}\), and \(x \in U_i \cap U_j\), we have \(\tau_{jx} \circ \tau_{ix}^{-1}:\mathbf{E} \to \mathbf{E}\) to be an toplinear isomorphism. Hence, we can consider VB 2 or VB 2' as the refinement of VB 1.

In finite dimensional case, one can omit VB 3 since it can be implied by VB 2, and we will prove it below.

(Lemma) Let \(\mathbf{E}\) and \(\mathbf{F}\) be two finite dimensional Banach spaces. Let \(U\) be open in some Banach space. Let \[ f:U \times \mathbf{E} \to \mathbf{F} \] be a \(C^p\)-morphism such that for each \(x \in U\), the map \[ f_x: \mathbf{E} \to \mathbf{F} \] given by \(f_x(v)=f(x,v)\) is a linear map. Then the map of \(U\) into \(L(\mathbf{E},\mathbf{F})\) given by \(x \mapsto f_x\) is a \(C^p\)-morphism.

PROOF. Since \(L(\mathbf{E},\mathbf{F})=L(\mathbf{E},\mathbf{F_1}) \times L(\mathbf{E},\mathbf{F_2}) \times \cdots \times L(\mathbf{E},\mathbf{F_n})\) where \(\mathbf{F}=\mathbf{F_1} \times \cdots \times \mathbf{F_n}\), by induction on the dimension of \(\mathbf{F}\) and \(\mathbf{E}\), it suffices to assume that \(\mathbf{E}\) and \(\mathbf{F}\) are toplinearly isomorphic to \(\mathbb{R}\). But in that case, the function \(f(x,v)\) can be written \(g(x)v\) for some \(g:U \to \mathbb{R}\). Since \(f\) is a morphism, it follows that as a function of each argument \(x\), \(v\) is also a morphism, Putting \(v=1\) shows that \(g\) is also a morphism, which finishes the case when both the dimension of \(\mathbf{E}\) and \(\mathbf{F}\) are equal to \(1\), and the proof is completed by induction. \(\blacksquare\)

To show that VB 3 is implied by VB 2, put \(\mathbf{E}=\mathbf{F}\) as in the lemma. Note that \(\tau_j \circ \tau_i^{-1}\) maps \(U_i \cap U_j \times \mathbf{E}\) to \(\mathbf{E}\), and \(U_i \cap U_j\) is open, and for each \(x \in U_i \cap U_j\), the map \((\tau_j \circ \tau_i^{-1})_x=\tau_{jx} \circ \tau_{ix}^{-1}\) is toplinear, hence linear. Then the fact that \(\varphi\) is a morphism follows from the lemma.


Trivial bundle

Let \(M\) be any \(n\)-dimensional smooth manifold that you are familiar with, then \(pr:M \times \mathbb{R}^n \to M\) is actually a vector bundle. Here the total space is \(M \times \mathbb{R}^n\) and the base is \(M\) and \(pr\) is the bundle projection but in this case it is simply a projection. Intuitively, on a total space, we can determine a point \(x \in M\), and another component can be any direction in \(\mathbb{R}^n\), hence a vector.

We need to verify three conditions carefully. Let \((U_i,\varphi_i)_i\) be any atlas of \(M\), and \(\tau_i\) is the identity map on \(U_i\) (which is naturally of \(C^p\)). We claim that \((U_i,\tau_i)_i\) satisfy the three conditions, thus we get a vector bundle.

For VB 1 things are clear: since \(pr^{-1}(U_i)=U_i \times \mathbb{R}^n\), the diagram is commutative. Each fiber \(pr^{-1}(x)\) is essentially \((x) \times \mathbb{R}^n\), and still, \(\tau_{jx} \circ \tau_{ix}^{-1}\) is the identity map between \((x) \times \mathbb{R}^n\) and \((x) \times \mathbb{R}^n\), under the same Euclidean topology, hence VB 2 is verified, and we have no need to verify VB 3.

Möbius band

First of all, imagine you have embedded a circle into a Möbius band. Now we try to give some formal definition. As with quotient topology, \(S^1\) can be defined as \[ S^1=I/\sim_1, \]

where \(I\) is the unit interval and \(0 \sim_1 1\) (identifying two ends). On the other hand, the infinite Möbius band can be defined by \[ B= (I \times \mathbb{R})/\sim_2 \] where \((0,v) \sim_2 (1,-v)\) for all \(v \in \mathbb{R}\) (not only identifying two ends of \(I\) but also 'flips' the vertical line). Then all we need is a natural projection on the first component: \[ \pi:B \to S^1. \] And the verification has few difference from the trivial bundle. Quotient topology of Banach spaces follows naturally in this case, but things might be troublesome if we restrict ourself in \(\mathbb{R}^n\).

Tangent bundle of the sphere

The first example is relatively rare in many senses. By \(S^n\) we mean the set in \(\mathbb{R}^{n+1}\) with \[ S^n=\{(x_0,x_1,\dots,x_n):x_0^2+x_1^2+\cdots+x_n^2=1\} \] and the tangent bundle can be defined by \[ TS^n=\{(\mathbf{x},\mathbf{y}):\langle\mathbf{x},\mathbf{y}\rangle=0\} \subset S^{n} \times\mathbb{R}^{n+1}, \] where, of course, \(\mathbf{x} \in S^n\) and \(\mathbf{y} \in \mathbb{R}^{n+1}\). The vector bundle is given by \(pr:TS^n \to S^n\) where \(pr\) is the projection of the first factor. This total space is of course much finer than \(M \times \mathbb{R}^n\) in the first example. Each point in the manifold now is associated with a tangent space \(T_x(M)\) at this point.

More generally, we can define it in any Hilbert space \(H\), for example, \(L^2\) space: \[ TS=\{(x,y):\langle x , y \rangle=0\} \subset S \times H \] where \[ S=\{x:\langle x , x \rangle = 1\}. \] The projection is natural: \[ \begin{aligned} \pi: TM &\to M \\ T_x(M) & \mapsto x \end{aligned} \] But we will not cover the verification in this post since it is required to introduce the abstract definition of tangent vectors. This will be done in the following post.

There are still many things remain undiscovered

We want to study those 'vectors' associated to some manifold both globally and locally. For example we may want to describe the tangent line of some curves at some point without heavily using elementary calculus stuff. Also, we may want to describe the vector bundle of a manifold globally, for example, when will we have a trivial one? Can we classify the manifold using the behavior of the bundle? Can we make it a little more abstract, for example, consider the class of all isomorphism bundles? How do one bundle transform to another? But to do this we need a big amount of definitions and propositions.

The Big Three Pt. 6 - Closed Graph Theorem with Applications

(Before everything: elementary background in topology and vector spaces, in particular Banach spaces, is assumed.)

A surprising result of Banach spaces

We can define several relations between two norms. Suppose we have a topological vector space \(X\) and two norms \(\lVert \cdot \rVert_1\) and \(\lVert \cdot \rVert_2\). One says \(\lVert \cdot \rVert_1\) is weaker than \(\lVert \cdot \rVert_2\) if there is \(K>0\) such that \(\lVert x \rVert_1 \leq K \lVert x \rVert_2\) for all \(x \in X\). Two norms are equivalent if each is weaker than the other (trivially this is a equivalence relation). The idea of stronger and weaker norms is related to the idea of the "finer" and "coarser" topologies in the setting of topological spaces.

So what about their limit? Unsurprisingly this can be verified with elementary \(\epsilon-N\) arguments. Suppose now \(\lVert x_n - x \rVert_1 \to 0\) as \(n \to 0\), we immediately have \[ \lVert x_n - x \rVert_2 \leq K \lVert x_n-x \rVert_1 < K\varepsilon \]

for some large enough \(n\). Hence \(\lVert x_n - x \rVert_2 \to 0\) as well. But what about the converse? We give a new definition of equivalence relation between norms.

(Definition) Two norms \(\lVert \cdot \rVert_1\) and \(\lVert \cdot \rVert_2\) of a topological vector space are compatible if given that \(\lVert x_n - x \rVert_1 \to 0\) and \(\lVert x_n - y \rVert_2 \to 0\) as \(n \to \infty\), we have \(x=y\).

By the uniqueness of limit, we see if two norms are equivalent, then they are compatible. And surprisingly, with the help of the closed graph theorem we will discuss in this post, we have

(Theorem 1) If \(\lVert \cdot \rVert_1\) and \(\lVert \cdot \rVert_2\) are compatible, and both \((X,\lVert\cdot\rVert_1)\) and \((X,\lVert\cdot\rVert_2)\) are Banach, then \(\lVert\cdot\rVert_1\) and \(\lVert\cdot\rVert_2\) are equivalent.

This result looks natural but not seemingly easy to prove, since one find no way to build a bridge between the limit and a general inequality. But before that, we need to elaborate some terminologies.


(Definition) For \(f:X \to Y\), the graph of \(f\) is defined by \[ G(f)=\{(x,f(x)) \in X \times Y:x \in X\}. \]

If both \(X\) and \(Y\) are topological spaces, and the topology of \(X \times Y\) is the usual one, that is, the smallest topology that contains all sets \(U \times V\) where \(U\) and \(V\) are open in \(X\) and \(Y\) respectively, and if \(f: X \to Y\) is continuous, it is natural to expect \(G(f)\) to be closed. For example, by taking \(f(x)=x\) and \(X=Y=\mathbb{R}\), one would expect the diagonal line of the plane to be closed.

(Definition) The topological space \((X,\tau)\) is an \(F\)-space if \(\tau\) is induced by a complete invariant metric \(d\). Here invariant means that \(d(x+z,y+z)=d(x,y)\) for all \(x,y,z \in X\).

A Banach space is easily to be verified to be a \(F\)-space by defining \(d(x,y)=\lVert x-y \rVert\).

(Open mapping theorem) See this post

By definition of closed set, we have a practical criterion on whether \(G(f)\) is closed.

(Proposition 1) \(G(f)\) is closed if and only if, for any sequence \((x_n)\) such that the limits \[ x=\lim_{n \to \infty}x_n \quad \text{ and }\quad y=\lim_{n \to \infty}f(x_n) \] exist, we have \(y=f(x)\).

In this case, we say \(f\) is closed. For continuous functions, things are trivial.

(Proposition 2) If \(X\) and \(Y\) are two topological spaces and \(Y\) is Hausdorff, and \(f:X \to Y\) is continuous, then \(G(f)\) is closed.

Proof. Let \(G^c\) be the complement of \(G(f)\) with respect to \(X \times Y\). Fix \((x_0,y_0) \in G^c\), we see \(y_0 \neq f(x_0)\). By the Hausdorff property of \(Y\), there exists some open subsets \(U \subset Y\) and \(V \subset Y\) such that \(y_0 \in U\) and \(f(x_0) \in V\) and \(U \cap V = \varnothing\). Since \(f\) is continuous, we see \(W=f^{-1}(V)\) is open in \(X\). We obtained a open neighborhood \(W \times U\) containing \((x_0,y_0)\) which has empty intersection with \(G(f)\). This is to say, every point of \(G^c\) has a open neighborhood contained in \(G^c\), hence a interior point. Therefore \(G^c\) is open, which is to say that \(G(f)\) is closed. \(\square\)


REMARKS. For \(X \times Y=\mathbb{R} \times \mathbb{R}\), we have a simple visualization. For \(\varepsilon>0\), there exists some \(\delta\) such that \(|f(x)-f(x_0)|<\varepsilon\) whenever \(|x-x_0|<\delta\). For \(y_0 \neq f(x_0)\), pick \(\varepsilon\) such that \(0<\varepsilon<\frac{1}{2}|f(x_0)-y_0|\), we have two boxes (\(CDEF\) and \(GHJI\) on the picture), namely \[ B_1=\{(x,y):x_0-\delta<x<x_0+\delta,f(x_0)-\varepsilon<y<f(x_0)+\varepsilon\} \] and \[ B_2=\{(x,y):x_0-\delta<x<x_0+\delta,y_0-\varepsilon<y<y_0+\varepsilon\}. \] In this case, \(B_2\) will not intersect the graph of \(f\), hence \((x_0,y_0)\) is an interior point of \(G^c\).

The Hausdorff property of \(Y\) is not removable. To see this, since \(X\) has no restriction, it suffices to take a look at \(X \times X\). Let \(f\) be the identity map (which is continuous), we see the graph \[ G(f)=\{(x,x):x \in X\} \] is the diagonal. Suppose \(X\) is not Hausdorff, we reach a contradiction. By definition, there exists some distinct \(x\) and \(y\) such that all neighborhoods of \(x\) contain \(y\). Pick \((x,y) \in G^c\), then all neighborhoods of \((x,y) \in X \times X\) contain \((x,x)\) so \((x,y) \in G^c\) is not a interior point of \(G^c\), hence \(G^c\) is not open.

Also, as an immediate consequence, every affine algebraic variety in \(\mathbb{C}^n\) and \(\mathbb{R}^n\) is closed with respect to Euclidean topology. Further, we have the Zariski topology \(\mathcal{Z}\) by claiming that, if \(V\) is an affine algebraic variety, then \(V^c \in \mathcal{Z}\). It's worth noting that \(\mathcal{Z}\) is not Hausdorff (example?) and in fact much coarser than the Euclidean topology although an affine algebraic variety is both closed in the Zariski topology and the Euclidean topology.

The closed graph theorem

After we have proved this theorem, we are able to prove the theorem about compatible norms. We shall assume that both \(X\) and \(Y\) are \(F\)-spaces, since the norm plays no critical role here. This offers a greater variety but shall not be considered as an abuse of abstraction.

(The Closed Graph Theorem) Suppose

  1. \(X\) and \(Y\) are \(F\)-spaces,

  2. \(f:X \to Y\) is linear,

  3. \(G(f)\) is closed in \(X \times Y\).

Then \(f\) is continuous.

In short, the closed graph theorem gives a sufficient condition to claim the continuity of \(f\) (keep in mind, linearity does not imply continuity). If \(f:X \to Y\) is continuous, then \(G(f)\) is closed; if \(G(f)\) is closed and \(f\) is linear, then \(f\) is continuous.

Proof. First of all we should make \(X \times Y\) an \(F\)-space by assigning addition, scalar multiplication and metric. Addition and scalar multiplication are defined componentwise in the nature of things: \[ \alpha(x_1,y_1)+\beta(x_2,y_2)=(\alpha x_1+\beta x_2,\alpha y_1 + \beta y_2). \] The metric can be defined without extra effort: \[ d((x_1,y_1),(x_2,y_2))=d_X(x_1,x_2)+d_Y(y_1,y_2). \] Then it can be verified that \(X \times Y\) is a topological space with translate invariant metric. (Potentially the verifications will be added in the future but it's recommended to do it yourself.)

Since \(f\) is linear, the graph \(G(f)\) is a subspace of \(X \times Y\). Next we quote an elementary result in point-set topology, a subset of a complete metric space is closed if and only if it's complete, by the translate-invariance of \(d\), we see \(G(f)\) is an \(F\)-space as well. Let \(p_1: X \times Y \to X\) and \(p_2: X \times Y \to Y\) be the natural projections respectively (for example, \(p_1(x,y)=x\)). Our proof is done by verifying the properties of \(p_1\) and \(p_2\) on \(G(f)\).

For simplicity one can simply define \(p_1\) on \(G(f)\) instead of the whole space \(X \times Y\), but we make it a global projection on purpose to emphasize the difference between global properties and local properties. One can also write \(p_1|_{G(f)}\) to dodge confusion.

Claim 1. \(p_1\) (with restriction on \(G(f)\)) defines an isomorphism between \(G(f)\) and \(X\).

For \(x \in X\), we see \(p_1(x,f(x)) = x\) (surjectivity). If \(p_1(x,f(x))=0\), we see \(x=0\) and therefore \((x,f(x))=(0,0)\), hence the restriction of \(p_1\) on \(G\) has trivial kernel (injectivity). Further, it's trivial that \(p_1\) is linear.

Claim 2. \(p_1\) is continuous on \(G(f)\).

For every sequence \((x_n)\) such that \(\lim_{n \to \infty}x_n=x\), we have \(\lim_{n \to \infty}f(x_n)=f(x)\) since \(G(f)\) is closed, and therefore \(\lim_{n \to \infty}p_1(x_n,f(x_n)) =x\). Meanwhile \(p_1(x,f(x))=x\). The continuity of \(p_1\) is proved.

Claim 3. \(p_1\) is a homeomorphism with restriction on \(G(f)\).

We already know that \(G(f)\) is an \(F\)-space, so is \(X\). For \(p_1\) we have \(p_1(G(f))=X\) is of the second category (since it's an \(F\)-space and \(p_1\) is one-to-one), and \(p_1\) is continuous and linear on \(G(f)\). By the open mapping theorem, \(p_1\) is an open mapping on \(G(f)\), hence is a homeomorphism thereafter.

Claim 4. \(p_2\) is continuous.

This follows the same way as the proof of claim 2 but much easier since there is no need to care about \(f\).

Now things are immediate once one realises that \(f=p_2 \circ p_1|_{G(f)}^{-1}\), which implies that \(f\) is continuous. \(\square\)


Before we go for theorem 1 at the beginning, we drop an application on Hilbert spaces.

Let \(T\) be a bounded operator on the Hilbert space \(L_2([0,1])\) so that if \(\phi \in L_2([0,1])\) is a continuous function so is \(T\phi\). Then the restriction of \(T\) to \(C([0,1])\) is a bounded operator of \(C([0,1])\).

For details please check this.

Now we go for the identification of norms. Define \[ \begin{aligned} f:(X,\lVert\cdot\rVert_1) &\to (X,\lVert\cdot\rVert_2) \\ x &\mapsto x \end{aligned} \] i.e. the identity map between two Banach spaces (hence \(F\)-spaces). Then \(f\) is linear. We need to prove that \(G(f)\) is closed. For the convergent sequence \((x_n)\) \[ \lim_{n \to \infty}\lVert x_n -x \rVert_1=0, \] we have \[ \lim_{n \to \infty} \lVert f(x_n)-x \rVert_2=\lim_{n \to \infty}\lVert x_n -x\rVert_2=\lim_{n \to \infty}\lVert f(x_n)-f(x)\rVert_2=0. \] Hence \(G(f)\) is closed. Therefore \(f\) is continuous, hence bounded, we have some \(K\) such that \[ \lVert x \rVert_2 =\lVert f(x) \rVert_1 \leq K \lVert x \rVert_1. \] By defining \[ \begin{aligned} g:(X,\lVert\cdot\rVert_2) &\to (X,\lVert\cdot\rVert_1) \\ x &\mapsto x \end{aligned} \] we see \(g\) is continuous as well, hence we have some \(K'\) such that \[ \lVert x \rVert_1 =\lVert g(x) \rVert_2 \leq K'\lVert x \rVert_2 \] Hence two norms are weaker than each other.

The series

Since there is no strong reason to write more posts on this topic, i.e. the three fundamental theorems of linear functional analysis, I think it's time to make a list of the series. It's been around half a year.


  • Walter Rudin, Functional Analysis
  • Peter Lax, Functional Analysis
  • Jesús Gil de Lamadrid, Some Simple Applications of the Closed Graph Theorem