To begin with we consider a calculus problem that you may have seen in your exam:

Let $f$ be a continuous function on $[0,\infty)$ such that $\lim_{x \to \infty} f(x)=l$. Prove that for $0<a<b$,

$$\int_0^\infty \frac{f(ax)-f(bx)}{x}\,dx=(f(0)-l)\ln\frac{b}{a}.$$

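And we solve this problem as follows. Put $g(x)=f(x)-l$, then $\lim_{x \to \infty}g(x)=0$. Consider the two variable function $F(x,y)=-g'(xy)$ and the region $D=\{(x,y):x \ge 0, a \le y \le b\}$. Integrating $F$ over $D$ in both orders, we have this result:

$$\int_0^\infty \frac{g(ax)-g(bx)}{x}\,dx=\int_0^\infty\left(\int_a^b -g'(xy)\,dy\right)dx=\int_a^b\left(\int_0^\infty -g'(xy)\,dx\right)dy=g(0)\ln\frac{b}{a}.$$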

Substituting $g(x)$ with $f(x)-l$ gives exactly what we want, doesn’t it? **Well, the more analysis you learn, the more absurd you will realise this proof is.** If you write this in an exam you will get $0$ marks no matter what. There are two major mistakes:

- Can we change the order of integration? We have no idea. It is certain that the order cannot be changed freely, and there are counterexamples.
- Is this function *even* differentiable? We also have no idea. It is *almost certain* that $f$ is not (in a suitable sense, the probability that a continuous function is differentiable is $0$), see this post to learn why if you have some background in functional analysis.

For a good proof, please turn to math.stackexchange. This is not easy at all.
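Even though the proof above is invalid, the identity itself can be sanity-checked numerically. The sketch below assumes the problem is the classical Frullani integral $\int_0^\infty \frac{f(ax)-f(bx)}{x}\,dx=(f(0)-l)\ln\frac{b}{a}$, and it uses the hypothetical test case $f(x)=e^{-x}$ (so $l=0$) with $a=1$, $b=2$. The function name `frullani_lhs`, the truncation point and the step count are illustrative choices, not part of the original problem.

```python
import math

def frullani_lhs(f, a, b, upper=60.0, n=240_000):
    """Midpoint-rule approximation of the integral of (f(a x) - f(b x)) / x over [0, upper]."""
    h = upper / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h  # midpoints never hit x = 0
        total += (f(a * x) - f(b * x)) / x
    return total * h

def f(x):
    # hypothetical test case: f(x) = exp(-x), so f(0) = 1 and the limit l is 0
    return math.exp(-x)

a, b = 1.0, 2.0
lhs = frullani_lhs(f, a, b)
rhs = (f(0.0) - 0.0) * math.log(b / a)
print(lhs, rhs)  # the two numbers should agree to several decimal places
```

Of course a numerical check for one smooth $f$ proves nothing about general continuous $f$; it only shows that the statement is plausible.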

The problem is, it is really *unfair* that in some circumstances we have to give up all properties of differentiation. If you are studying differential equations and a non-differentiable function pops up, you have nowhere to go. Sometimes you may even have *no idea* whether a function is differentiable.

This is why this post is written. We introduce the concept of (Schwartz) **distributions** (a.k.a. **generalised functions**), where differentiation is significantly extended, to obtain a **derivative** in a generalised sense. Roughly speaking, once distributions are introduced, differentiation can be done with absolute ease.

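In fact, physicists had been using distributions long before mathematicians established a formal theory. For example, the $\delta$ *function* introduced by Dirac, which you may have met in the Fourier transform:

$$\delta(x)=\begin{cases}+\infty, & x=0,\\ 0, & x \ne 0.\end{cases}$$

And it is required that

$$\int_{-\infty}^{\infty}\delta(x)\,dx=1.$$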

But this does not make any sense in calculus. Von Neumann, in his book on quantum physics, warned against the theory using this function, and dismissed this function because this was a “fiction”. Not so pleasant. He tried with a lot of effort to demonstrate that, quantum physics could live without such a “fiction”. As you can imagine, this function may have created some bad blood between von Neumann and Dirac.

Laurent Schwartz, however, managed to be a peacemaker. He developed the theory of distributions (which is exactly what we are talking about in this post), and the “fiction” became an easy “fact”. Years later, he became a 1950 Fields Medalist (the Fields Medal is one of the most prestigious awards in mathematics) at the age of 35, with the citation

Developed the theory of distributions, a new notion of generalized function motivated by the Dirac delta-function of theoretical physics. (Source)

As you will see later, thanks to Schwartz, the twisted $\delta$ function is well-defined and is really plain and elegant. So von Neumann did not need to be angry after all.

By *concept* I mean that I will try to include the basic ideas (without many proofs, though they can be supplied), so that the serious study of the subject becomes simpler (it can be really tough!). You should not expect to be able to solve problems on distributions right after reading this post.

There will be two parts. Part 1 focuses on motivation and on what is going on. I will try to make it readable to anyone who has finished calculus, or more ideally undergraduate analysis and linear algebra, though rigour is not always guaranteed. It would be better if you know some differential equation theory, but that is not a must. If you already have the background to read part 2, then part 1 will be much easier for you and will serve as a good source of intuition and motivation.

If you still need to understand differentiation in single-variable calculus, then there is no need to struggle with generalised differentiation at this early point. It does not help. The linear algebra required is vector spaces, subspaces and linear maps. You should know that integration and differentiation are linear maps. This is a graduate course topic, so it is not realistic to write for a reader who has no idea about calculus and linear algebra.

The second part will be much more advanced, and you are expected to have some background in topological vector spaces (functional analysis). Neither part can be considered as lecture notes, but they may help you find where you are when you study this concept seriously.

Throughout, we consider real-valued functions on $\mathbb{R}$. The theory can be generalised to complex-valued functions on $\mathbb{R}^n$, where partial derivatives take part, but we are not doing that here. At the end of the day, this extra work is not a big deal.

In calculus, a lot of the functions we study are smooth (for example, $y=\sin{x}$), and we write $C^\infty$ for the set of such functions, as they are *infinitely differentiable*. This is a vector space, and in this vector space differentiation can be done *with absolute ease*. For given $f \in C^\infty$, we have $f',f'',\cdots,f^{(k)}$ well defined for all $k = 1,2,\cdots$. But in vector spaces like $C^2$, $C^1$, or even $C$, differentiation can only be done with caution: we may only have $f''$ and no $f^{(3)}$, or even $f'$ may not exist. We don’t *like* this kind of caution. Hence we introduce the concept of **distributions**, also known as **generalised functions**. We want a space where we can still do differentiation with absolute ease. We may need to *modify* our definition of differentiation such that it works on every continuous function (but it shall not lose its meaning within $C^\infty$). Bearing these in mind, we have several settings or expectations for distributions:

- Every continuous function should be (considered as) a distribution. (So we can take *derivatives* of all continuous functions without too much worry, unlike in the calculus problem at the beginning.)
- The “modified differentiation” should make sure that the “modified derivative” of a distribution is still a distribution. In other words, distributions are “infinitely differentiable” (which makes differential equation theory much easier). In the language of algebra, the “modified derivative” should be an endomorphism.
- The usual formal rules of calculus should hold. For example in the new sense we should still have $(fg)'=f'g+g'f$. (Our modified differentiation should not go too far.)
- Convergence properties should also be available. (Validating this requires more theory, so this can only be mentioned in part 2.)

Let’s write our desired space of distributions as $\mathscr{D}'$, and the space of all continuous functions as $C$. All of $C$, $C^\infty$, $\mathscr{D}'$ are considered as real vector spaces and we should have

$$C^\infty \subset C \subset \mathscr{D}'$$

in the sense of subspaces.

Here is a breakdown of these concepts. You will see terminologies and definitions later.

- A smooth, continuous or, more generally, locally integrable function gives rise to a bounded linear functional. The converse is not guaranteed to be true, but we *pretend* it to be true, so all *bounded linear functionals* give rise to distributions, a.k.a. generalised functions (this name is nice because we *pretend* the converse to be true). *Whenever you are asked what a generalised function is, you can say it is a linear map, and sometimes it can be determined by a normal function.*
- For these distributions or generalised functions, we modify the derivative with respect to integration by parts. The modified derivative cannot always be written down explicitly but we don’t care, because integration by parts doesn’t give us many problems. *Whenever you are asked how the derivative of a non-differentiable function is given, you can say it is given by pretending nothing is wrong in integration by parts.*

We now try to understand what we really want from distributions. We start our study through integration, **because differentiation does not work**. Given $f \in C \subset \mathscr{D}'$, we first need to make sure $\int f\phi$ is well-defined for *some* $\phi\in C^\infty$, because we want to do integration by parts, which involves **some differentiation**, and we may make use of it.

If $f$ is not even a continuous function, we still need to consider *some* $\phi$ in the same manner, or our extension would be abrupt.

Let’s talk about these $\phi$ a little bit, with respect to integration by parts. Consider the bump function

On $(a,b)$, we have $ \phi\ne 0$. On the boundary $a$ and $b$ we have $\phi(x)=0$ but that shouldn’t be a problem, because they are the alpha and omega. Points outside $[a,b]$ have no contribution to the value of this function. For some obvious reason we call $[a,b]$ the *closure* of $(a,b)$. In general, given a real-valued function $f$, we call the closure of the set of points where $f(x) \ne 0$ the **support** of $f$. As you can tell, the support of $\phi$ is $[a,b]$.
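To make the notion of support concrete, here is a minimal sketch of one common bump function, $\exp\left(-\frac{1}{(x-a)(b-x)}\right)$ on $(a,b)$ and $0$ elsewhere. This particular formula and the name `bump` are assumptions made for illustration; any smooth function that is nonzero exactly on $(a,b)$ would serve the same purpose.

```python
import math

def bump(x, a=-1.0, b=1.0):
    """A smooth function that is positive on (a, b) and zero elsewhere (one common choice)."""
    if x <= a or x >= b:
        return 0.0  # outside (a, b): no contribution
    return math.exp(-1.0 / ((x - a) * (b - x)))

print(bump(0.0), bump(0.5))              # strictly positive inside (a, b)
print(bump(-1.0), bump(1.0), bump(3.0))  # zero on the boundary and outside
```

In floating point the values very close to $a$ and $b$ underflow to `0.0`, although mathematically the function is positive on all of $(a,b)$.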

If $\phi$ has unbounded support, then we may need to discuss limits at infinity. But we don’t want improper integrals at all. Hence the support of $\phi$ is always assumed to be a **closed and bounded** subset of $\mathbb{R}$. It is automatically closed because it is defined to be a closure. Closed and bounded subsets of $\mathbb{R}$ are called *compact* sets. If you are not familiar with topology, it is OK at this moment to think of a compact set as a bounded closed interval $[a,b]$.

The test function space $\mathscr{D}$ is defined to be the set of all $C^\infty$ functions with compact support. This is indeed a vector space and the verification is a good exercise in both linear algebra and calculus. What about $\mathscr{D}'$? Here we demonstrate how things are extended.

For each $f \in C$ (which contains $C^\infty$), we have a functional (a functional is a linear map from a vector space to its base field, which here is $\mathbb{R}$. Nothing special, just a different name that has been used by mathematicians for decades!)

$$\Lambda_f:\mathscr{D} \to \mathbb{R}, \quad \Lambda_f(\phi)=\int_{-\infty}^{\infty}f(x)\phi(x)\,dx.$$

This functional is **bounded** for all $\phi \in \mathscr{D}$ because if $\phi$ has support $K$, then

$$|\Lambda_f(\phi)|=\left|\int_K f(x)\phi(x)\,dx\right| \le \left(\int_K |f(x)|\,dx\right)\sup_{x \in K}|\phi(x)|.$$

A continuous function on a compact set is always bounded (proof), hence the integral on the right hand side is always finite. If it could be infinite, a lot of problems would follow.

In general, a **bounded linear functional** $\Lambda:\mathscr{D} \to \mathbb{R}$ is called a *distribution*, and the set of all distributions is exactly $\mathscr{D}'$. Since every continuous function $f$ gives rise to a bounded linear functional $\Lambda_f$, and different continuous functions give different functionals, we consider $C$ as a subspace of $\mathscr{D}'$. The converse is not generally true: not every distribution comes from a function. But we *pretend* it does (we pretend the functional gives rise to a function anyway), which makes our study easier, hence the name *generalised function* is well-deserved.

The differential operator $D$ on $C^\infty$ should be extended to $\mathscr{D}'$ naturally. There are many ways to extend a linear map. For example the identity map $i:\mathbb{R} \to \mathbb{R}$ has at least two extensions to $\mathbb{R}^2$:

- $I:\mathbb{R}^2 \to \mathbb{R}^2$ by $(x,y) \mapsto (x,y)$.
- $\pi:\mathbb{R}^2 \to \mathbb{R}$ by $(x,y) \mapsto x$.

The restriction of these two maps on $\mathbb{R}$ is the same as $i$.

But if we extend $D$ in several ways, things would be messy. Originally derivative is defined in the sense of limit, but for a non-differentiable function, we cannot do that. We need an extension that makes most sense: it is by validating **integration by parts**. It seems like we are developing some advanced concepts, but still we need to make use of elementary ones.

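For $f(x)=\sin{x}$ and $\phi \in \mathscr{D}$, we have

$$\int_{-\infty}^{\infty}\cos(x)\phi(x)\,dx=\Big[\sin(x)\phi(x)\Big]_{-\infty}^{\infty}-\int_{-\infty}^{\infty}\sin(x)\phi'(x)\,dx=-\int_{-\infty}^{\infty}\sin(x)\phi'(x)\,dx.$$

The boundary term vanishes because $\phi$ has compact support. The derivative of $f$ is passed on to the derivative of $\phi$. Again we are using integration by parts. If $f$ is not assumed to be differentiable, we *pretend* it is, skip the body and jump to the result immediately. For example, $f(x)=|x|$ is not differentiable, but we do that anyway:

$$\int_{-\infty}^{\infty}f'(x)\phi(x)\,dx:=-\int_{-\infty}^{\infty}|x|\phi'(x)\,dx.$$

In general for $f \in C^\infty$, we have (this can be verified by some computation)

$$\int_{-\infty}^{\infty}f^{(k)}(x)\phi(x)\,dx=(-1)^k\int_{-\infty}^{\infty}f(x)\phi^{(k)}(x)\,dx.$$

Differentiation for distributions (on top of $C^\infty$ functions) should be of the same **shape**, hence we define the $k$-th **distribution derivative** of a distribution $\Lambda$ by

$$(D^k\Lambda)(\phi)=(-1)^k\Lambda(\phi^{(k)}).$$

Since all $\phi$ are assumed to be $C^\infty$, there is no problem with this formula and this differentiation is defined for all $\Lambda$. We don’t care about the first order limit of a continuous but not differentiable function. What matters here is the differentiation of test functions.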
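As a rough numerical illustration of the distribution derivative, the sketch below takes $f(x)=|x|$ and compares $-\int f\phi'$ with $\int \operatorname{sign}(x)\phi(x)\,dx$, which is what integration by parts on each half-line predicts. The test function `phi` (a bump supported on $[-1,2]$), its hand-computed derivative `dphi`, and the midpoint rule in `integrate` are all assumptions chosen for the demonstration.

```python
import math

A, B = -1.0, 2.0  # support of the (hypothetical) test function

def phi(x):
    if x <= A or x >= B:
        return 0.0
    return math.exp(-1.0 / ((x - A) * (B - x)))

def dphi(x):
    # exact derivative of phi on (A, B), by the chain rule
    if x <= A or x >= B:
        return 0.0
    u = (x - A) * (B - x)
    return math.exp(-1.0 / u) * (A + B - 2.0 * x) / (u * u)

def integrate(g, lo=A, hi=B, n=30_000):
    """Midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

def sign(x):
    return (x > 0) - (x < 0)

lhs = -integrate(lambda x: abs(x) * dphi(x))  # (D Lambda_f)(phi) = -Lambda_f(phi')
rhs = integrate(lambda x: sign(x) * phi(x))   # Lambda_sign(phi)
print(lhs, rhs)  # the two values should agree up to discretisation error
```

So, at least against this one test function, the distribution derivative of $|x|$ behaves like the sign function, even though $|x|$ has no ordinary derivative at the origin.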

Try to recall what you have learnt about integration by parts. We have

$$\int f'g=fg-\int fg'$$

because

$$(fg)'=f'g+fg'.$$

Therefore, if our generalisation of differentiation pays respect to integration by parts, then the product rule of differentiation still works, hence the usual formal rules of calculus are not pushed too far. If our extension conflicted with integration by parts, then the ordinary meaning of differentiation would be damaged.

Let’s sum up what has happened. We have obtained an inclusion

$$C^\infty \subset C \subset \mathscr{D}'.$$

Every distribution is infinitely differentiable because functions in $\mathscr{D}$ are. If $f \in C^\infty$, then the $k$-th derivative can be understood both in the sense of ordinary differentiation and in the sense of distributions because it is given by

$$(D^k\Lambda_f)(\phi)=(-1)^k\int f\phi^{(k)}=\int f^{(k)}\phi=\Lambda_{f^{(k)}}(\phi).$$

This holds for every choice of $\phi$. Moreover, if $h$ is a continuous function such that $\int h\phi = \int f^{(k)}\phi$ for all $\phi \in \mathscr{D}$, then $h=f^{(k)}$.

If $f$ is merely continuous, still we can write the $k$-th derivative as

$$(D^k\Lambda_f)(\phi)=(-1)^k\int f\phi^{(k)}.$$

At this point, whether $f$ is differentiable or not is not our concern. Since $\phi$ is smooth, the formula above is well-defined. In general we don’t even care whether $f$ is continuous or integrable over $\mathbb{R}$, as long as it gives rise to a **bounded** linear functional, which is guaranteed when $f$ is *locally integrable*. A function is locally integrable if $\int_K |f|<\infty$ for all compact $K \subset \mathbb{R}$. In particular, $K$ can be taken to be any bounded closed interval. **As long as $f$ is locally integrable (for example, differentiable, continuous, or simply bounded and measurable), we can assign a derivative in the new sense (integration by parts).**

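We want something like $(fg)'=f'g+fg'$. To avoid confusion we use $D$ to denote the derivative of a distribution and $f'$ to denote the derivative in the ordinary sense. In general this is pretty hard, but for the product of a $C^\infty$ function and a distribution it is not that hard. Suppose $\Lambda \in \mathscr{D}'$ and $f \in C^\infty$. We define their ‘product’ by

$$(f\Lambda)(\phi)=\Lambda(f\phi).$$

We have another distribution and the derivative follows in a natural way:

$$D(f\Lambda)(\phi)=-(f\Lambda)(\phi')=-\Lambda(f\phi').$$

Meanwhile

$$(f'\Lambda+fD\Lambda)(\phi)=\Lambda(f'\phi)-\Lambda((f\phi)')=\Lambda(f'\phi)-\Lambda(f'\phi+f\phi')=-\Lambda(f\phi').$$

Hence $D(f\Lambda)=f'\Lambda+fD\Lambda$. Things still work in this aspect.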

We haven’t verified convergence yet, but that requires much more knowledge of functional analysis, so we do that in part 2 rather than here. Fortunately, things go in an intuitive way.

Consider the linear functional on $\mathscr{D}$ given by

$$\delta(\phi)=\phi(0).$$

This is bounded and is in fact our rigorous definition of the Dirac $\delta$ function (von Neumann can relax then!). It does have the *required property*. Say, if we realise this functional as an integral (informally) as

$$\delta(\phi)=\int_{-\infty}^{\infty}\delta(x)\phi(x)\,dx=\phi(0),$$

then $\delta$ can indeed be considered as a *function* whose support is the origin, and the integral over $\mathbb{R}$ is $1$.

The *derivative* of $\delta$ is well-presented as well. Note $\delta'(\phi)=-\delta(\phi')$ by the definition of the distribution derivative, hence we have

$$\delta'(\phi)=-\phi'(0), \quad \delta^{(k)}(\phi)=(-1)^k\phi^{(k)}(0).$$
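The definition of $\delta$ and of its distribution derivative translates almost literally into code if a distribution is modelled as a function that takes a test function and returns a number. In this sketch the names `delta`, `D` and `derivative` are illustrative, the test function `phi` is a hypothetical bump supported on $[-1,2]$, and the derivative of a test function is approximated by a central difference rather than computed exactly.

```python
import math

def phi(x, a=-1.0, b=2.0):
    """Hypothetical test function: a bump supported on [a, b]."""
    if x <= a or x >= b:
        return 0.0
    return math.exp(-1.0 / ((x - a) * (b - x)))

def derivative(g, h=1e-5):
    """Central-difference stand-in for the exact derivative of a test function."""
    return lambda x: (g(x + h) - g(x - h)) / (2.0 * h)

def delta(test):
    """The Dirac distribution: evaluate the test function at the origin."""
    return test(0.0)

def D(Lam):
    """Distribution derivative: (D Lam)(phi) = -Lam(phi')."""
    return lambda test: -Lam(derivative(test))

print(delta(phi))     # phi(0)
print(D(delta)(phi))  # approximately -phi'(0)
```

For this `phi` one can compute by hand that $\phi'(0)=\frac{1}{4}e^{-1/2}$, so the second printed value should be close to $-\frac{1}{4}e^{-1/2}$, with the minus sign coming from the definition of the distribution derivative.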

So much for part 1. If you don’t have much background in functional analysis, then part 2 is not recommended, as you would have no idea what is going on. It is not feasible to make part 2 readable to more people.

Here we provide some basic facts about test functions and distributions, assuming the reader has some background in functional analysis. No proofs are delivered, because otherwise this post could become arbitrarily long. I hope by organising facts here I can help you realise what is going on before you drown yourself in the details of a proof.

In brief, test functions are smooth functions with compact support. By the **support** of a function $f$ we mean the *closure* of the set $\{x:f(x) \ne 0\}$. Let $K$ be a compact set in $\mathbb{R}$; then $\mathscr{D}_K$ denotes the subspace of $C^\infty$ consisting of functions whose support lies in $K$. Since a closed subset of a compact set is itself compact, all functions in $\mathscr{D}_K$ have compact support.

The test function space is defined by

$$\mathscr{D}=\bigcup_{K}\mathscr{D}_K,$$

where the union is taken over all compact $K \subset \mathbb{R}$.

And the distribution space $\mathscr{D}'$ is defined to be the dual space of $\mathscr{D}$, i.e. the space of *continuous* linear functionals on $\mathscr{D}$. But if we don’t know the topology of $\mathscr{D}$, we cannot proceed. *Here is how we attempt to establish the topology.*

Consider the norms on $\mathscr{D}$ defined for $N=0,1,2,\cdots$ by

$$\|\phi\|_N=\max\{|D^n\phi(x)|:x \in \mathbb{R},\ 0 \le n \le N\}.$$

This induces a local base

$$V_N=\left\{\phi \in \mathscr{D}:\|\phi\|_N<\frac{1}{N}\right\}, \quad N=1,2,\cdots.$$

And we get a locally convex metrisable topology on $\mathscr{D}$.

If this topology made $\mathscr{D}$ a Banach space, it would be fantastic, since a lot of Banach space techniques could be used. However, this topology is too *small* to be complete. One simply needs to consider this sequence:

$$\psi_m(x)=\phi(x-1)+\frac{1}{2}\phi(x-2)+\cdots+\frac{1}{m}\phi(x-m),$$

where $\phi \in \mathscr{D}_{[0,1]}$ and $\phi>0$ on $(0,1)$. This sequence is Cauchy but the limit has no bounded support, hence does not lie in $\mathscr{D}$.

This time we do an *enhancement* of the previous topology, which makes $\mathscr{D}$ a locally convex topological vector space that is complete and has the Heine-Borel property (every closed and bounded set is compact, and vice versa). We still need the topology defined in our first attempt. The construction is broken into three steps:

- For each compact set $K$, let $\tau_K$ denote the subspace topology of $\mathscr{D}$ defined in attempt 1.
- Let $\beta$ be the collection of all convex balanced set $W \subset \mathscr{D}$ such that $\mathscr{D}_K \cap W \in \tau_K$ for all compact $K$. (A set $W$ is balanced if $\alpha{W} \subset W$ for all $|\alpha| \le 1$.)
- The new topology $\tau$ is defined to be the collection of all unions of sets of the form $\phi + W$ with $\phi \in \mathscr{D}$ and $W \in \beta$.

This is the topology we want, and one can indeed verify that $\tau$ is a topology, with local base $\beta$. This topology has the following properties:

- $\tau$ makes $\mathscr{D}$ a locally convex topological vector space.
- $\mathscr{D}$ has the Heine-Borel property.
- In $\mathscr{D}$, every Cauchy sequence converges.

Locally, **the topology of $\mathscr{D}_K$ is the same as $\tau_K$**. Hence we can still use properties of these norms if we want. In fact, this $\tau_K$ makes $\mathscr{D}_K$ a Fréchet space, i.e. a locally convex space whose topology is induced by a complete translation-invariant metric.

We cannot discuss continuity without topology. But still continuity has to be treated carefully. For example the space $L^p([0,1])$ with $0<p<1$ is weird: the dual space is trivial, due to its topology: the only two open convex sets are empty set and itself. Fortunately we have the following, which is quite intuitive.

Suppose $\Lambda$ is a linear mapping of $\mathscr{D}$ into a locally convex space $Y$ (which can be $\mathbb{R}$, $\mathbb{C}$ or $\mathscr{D}$ itself). Then the following are equivalent:

- $\Lambda$ is continuous. (We care about the behaviour of $\mathscr{D}’$)
- $\Lambda$ is bounded. (You must have learnt the equivalence of 1 and 2 already)
- $\phi_i \to 0$ in $\mathscr{D}$ implies $\Lambda\phi_i \to 0$ in $Y$.
- The restriction of $\Lambda$ to every $\mathscr{D}_K$ is continuous.

In particular, it follows that the differential operator $D^n$ is continuous for all $n$. We also have some knowledge of the behaviour of $\mathscr{D}’$ now:

If $\Lambda$ is a linear functional on $\mathscr{D}$, then the following are equivalent:

- $\Lambda \in \mathscr{D}’$.
- To every compact set $K$ there corresponds a nonnegative integer $N$ and a constant $C<\infty$ such that the inequality $|\Lambda\phi| \le C\|\phi\|_N$ holds for every $\phi \in \mathscr{D}_K$.

Consider the Dirac distribution at $x$ given by

$$\delta_x(\phi)=\phi(x).$$

This is indeed a distribution. The case when $x=0$ gives us the Dirac function in physics. Note

$\mathscr{D}_K$ is a **closed subspace** of $\mathscr{D}$. Each $\mathscr{D}_K$ is also nowhere dense in $\mathscr{D}$, and there is a countable collection of compact sets $K_i \subset \mathbb{R}$ (for example $K_i=[-i,i]$) such that $\mathscr{D} = \bigcup_i \mathscr{D}_{K_i}$, so $\mathscr{D}$ is of the first category in itself. If $\mathscr{D}$ were metrisable, then, being complete, it would be of the second category in itself by Baire’s Category Theorem. Hence $\mathscr{D}$ is not metrisable. This is a flaw of the topology of $\mathscr{D}$, though it is not that troublesome.

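We have shown that every $C^\infty$ function can be considered as a distribution. In general, for a function $f$ one only needs to require that $f$ is **locally integrable**, i.e. for every compact set $K$ we have

$$\int_K|f(x)|\,dx<\infty.$$

If we define $\Lambda_f:\phi \mapsto \int f\phi$, we see

$$|\Lambda_f(\phi)| \le \left(\int_K|f(x)|\,dx\right)\sup_{x \in K}|\phi(x)| \quad \text{for all } \phi \in \mathscr{D}_K,$$

so $\Lambda_f$ is a distribution.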

In particular, at the very least, all $L^1$ functions can be considered as distributions.

On the other hand, if $\mu$ is a positive measure on $\mathbb{R}$ with $\mu(K)<\infty$ for all compact $K$, then

$$\Lambda_\mu(\phi)=\int_{\mathbb{R}}\phi\,d\mu$$

also defines a distribution.

We know the fundamental theorem of calculus in $L^1$ only holds when the function $f$ is *absolutely continuous*. For example, the Cantor function $f$ is differentiable almost everywhere on $[0,1]$ but

$$\int_0^1 f'(x)\,dx=0 \ne 1=f(1)-f(0).$$

This restriction still makes sense here. Pick $f$ to be a left-continuous function of bounded variation. Then it can be shown that

$$D\Lambda_f=\Lambda_\mu, \quad \Lambda_\mu(\phi)=\int_{\mathbb{R}}\phi\,d\mu,$$

where $\mu$ is the measure determined by $\mu([a,b))=f(b)-f(a)$. Hence $D\Lambda_f=\Lambda_{Df}$ if and only if $f$ is *absolutely continuous*.

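We consider the weak*-topology of $\mathscr{D}'$, in which

$$\Lambda_i \to \Lambda \quad \text{means} \quad \lim_{i \to \infty}\Lambda_i\phi=\Lambda\phi \quad \text{for all } \phi \in \mathscr{D}.$$

Then fortunately this limit operation commutes with the differential operator in a natural way, which may remind you of uniform convergence. In fact, if $\Lambda_i \to \Lambda$ in $\mathscr{D}'$, then

$$D^n\Lambda_i \to D^n\Lambda \quad \text{in } \mathscr{D}'.$$

To prove this one needs the Banach-Steinhaus theorem. This concludes our four requirements of distributions.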

Convolution plays an important role in Fourier analysis, and here is how to invite distribution to the party.

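Normally for two $L^1$ functions $f,g$ we define

$$(f \ast g)(x)=\int_{-\infty}^{\infty}f(y)g(x-y)\,dy.$$

We can create more symbols to make life easier:

- $\tau_xu(y)=u(y-x)$.
- $\check{u}(y)=u(-y)$.

It follows that $\tau_x\check{u}(y)=\check{u}(y-x)=u(x-y)$. Hence

$$(f \ast g)(x)=\int_{-\infty}^{\infty}f(y)\,\tau_x\check{g}(y)\,dy=\Lambda_f(\tau_x\check{g}).$$

It shows that $g \mapsto (f \ast g)(x)$ is actually the composition of $g \mapsto \check{g}$, $\tau_x$ and the linear functional $\Lambda_f$. But $\Lambda_f$ can be replaced by any distribution, hence we define the convolution of a distribution $\Lambda$ and a test function $\phi \in \mathscr{D}$ by

$$(\Lambda \ast \phi)(x)=\Lambda(\tau_x\check{\phi}).$$

Convolution can be characterised in a natural way. In fact, for any continuous linear map $T:\mathscr{D} \to C^\infty$, if

$$\tau_xT=T\tau_x \quad \text{for all } x \in \mathbb{R},$$

then there is a unique $L \in \mathscr{D}'$ such that

$$T\phi=L \ast \phi \quad \text{for all } \phi \in \mathscr{D}.$$

As you can imagine, this setting creates a lot of potential for the Fourier transform.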
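The operators $\tau_x$ and $u \mapsto \check{u}$ and the convolution of a distribution with a test function can be sketched in a few lines, assuming the definition $(\Lambda \ast \phi)(x)=\Lambda(\tau_x\check{\phi})$. As before, a distribution is modelled as a Python function acting on test functions; the names `translate`, `reflect`, `convolve` and the bump `phi` are illustrative choices.

```python
import math

def phi(x, a=-1.0, b=2.0):
    """Hypothetical test function: a bump supported on [a, b]."""
    if x <= a or x >= b:
        return 0.0
    return math.exp(-1.0 / ((x - a) * (b - x)))

def translate(x, u):
    """(tau_x u)(y) = u(y - x)."""
    return lambda y: u(y - x)

def reflect(u):
    """u-check(y) = u(-y)."""
    return lambda y: u(-y)

def convolve(Lam, test):
    """(Lam * test)(x) = Lam(tau_x applied to the reflected test function)."""
    return lambda x: Lam(translate(x, reflect(test)))

def delta(test):
    """The Dirac distribution at the origin."""
    return test(0.0)

smoothed = convolve(delta, phi)
print(smoothed(0.3), phi(0.3))  # delta * phi reproduces phi
```

With $\Lambda=\delta$ the definition gives $(\delta \ast \phi)(x)=\phi(x-0)=\phi(x)$, so $\delta$ acts as the identity for convolution.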

- Walter Rudin, *Functional Analysis*, Second Edition. (Part II of the book)
- Peter Lax, *Functional Analysis*. (Appendix B)
- Stanford Encyclopedia of Philosophy Archive (Fall 2018 Edition), Quantum Theory: von Neumann vs. Dirac.

Let us say you are a programmer who has been working in big companies for a decade. How does it feel when you want to help someone who is starting to study programming from scratch? You may find it makes no sense that he or she cannot understand that, by copying several lines of code from the book, they have successfully made a programme that prints “Hello, world!” on the screen. You know what I am talking about: the curse of knowledge.

When one has successfully learnt a certain skill, they may immediately lose the sense of why other people cannot understand and learn it. What is the holdup? It becomes increasingly difficult to teach beginners. Blunt simplification does not do the trick all the time.

This is one of the reasons why becoming a good teacher is so hard. Academia superstars may be super awful in teaching, while teaching superstars may have already ceased focusing on academia.

I am not writing this post to be a guru and give some steps on how to lift the curse. In fact I think I am suffering from this as well.

For example, Tien-Yien Li was famous for lifting the curse of knowledge. When he gave talks, he always tried to start from simple examples (this is adorable of course). When instructing his students, he would ask them to treat him as a fool, as if he knew nothing. He was indeed a good mathematician and a good maths teacher, but I do wonder how practical it is. Can his students do calculus in front of him while assuming he has no idea what calculus is? I have no idea.

Though I am only guessing, I think ‘fool’ is somewhat of an exaggeration. His students worked in a field similar to his, hence it would not be too hard for him to follow them at all. Of course the way he instructed his students is adorable as well.

A reader once emailed me, suggesting that I should write my posts in a simpler way at certain points. But I declined his suggestion in the end. Am I doing some Serge Lang thing? I have no idea.

In his 1983 book Fundamentals of Diophantine Geometry, Lang included L. J. Mordell’s review of Lang’s own book Diophantine Geometry, which ended with

In conclusion, the reader will need no convincing that Lang, as has already been said, is a very learned mathematician, thoroughly familiar with every aspect of the topics he deals with, and their developments. His interesting and valuable historical notes give further evidence of this. Lang assumes that his readers are as knowledgeable as he is, and can grapple with the subject with the same ease that he does. Even if they could, Lang’s style is not such as to make matters easy for them. Lang in writing is not a follower of Gauss, whose motto was “*pauca sed matura*.” Further thought and care about his book, before publication, would have been well worth while. Those who can understand the book will be indebted to him for having brought together in one volume the important results contained in it. How much greater thanks would he have earned if the book had been written in such a way that more of it could have been more easily comprehended by a larger class of readers! It is to be hoped that some one will undertake the task of writing such a book.

And he also included his response:

All my books are meant to be understood by readers having the prerequisites for the level at which the books are written. These prerequisites vary from book to book, depending on the subject matter, my mood, and other aesthetic feelings which I have at the moment of writing. When I write a standard text in Algebra, I attempt something very different from writing a book which for the first time gives a systematic point of view on the relations of Diophantine equations and the advanced contexts of algebraic geometry. The purpose of the latter is to jazz things up as much as possible. The purpose of the former is to educate someone in the first steps which might eventually culminate in his knowing the jazz too, if his tastes allow him that path. And if his tastes don’t, then my blessings to him also. This is known as aesthetic tolerance. But just as a composer of music (be it Bach or the Beatles), I have to take my responsibility as to what I consider to be beautiful, and write my books accordingly, not just with the intent of pleasing one segment of the population. Let pleasure then fall where it may.

With best regards, Serge Lang.

*Refer to this reddit post for a discussion.*

I can say with absolute certainty that my posts are much more detailed than Serge Lang’s books. And Lang never tried to lift the curse. But my posts cannot be readable to everyone. Say, my posts on functional analysis are not prepared for middle school students, unless they are ridiculously exceptional and have already studied all the prerequisites (linear algebra, real analysis, integration theory, topology). Though I shall never make my posts as terse as Lang’s books, it is not my duty to make my posts readable for everyone. So to some extent I fail as well.

If I tried to, I would have to accept over-simplification, and that is against my rule. I do not like over-simplification, so I try to make sure everything makes sense. But one would not understand unless he or she has certain prerequisites. I may remove some obstacles and show the clues, but that is about it. I can only lift the curse with respect to a certain group of people.

It seems I did not give a thoughtful discussion. But I do hope my inbox gives me a good chance for discussion rather than a chance to spark unnecessary quarrels. I have not tried to close myself off, and good evidence is that many of my posts can be found on the first page of Google search.

Throughout we consider the polynomial ring

$$R=\mathbb{R}[\cos{x},\sin{x}].$$

This ring has a lot of non-trivial properties which give us a good chance to study commutative ring theory. First of all note it is immediate that

$$R \cong \mathbb{R}[X,Y]/(X^2+Y^2-1)$$

if the map is given by $X \mapsto \cos x$ and $Y \mapsto \sin x$. Besides, in $R$ we have

$$\sin^2{x}=(1-\cos{x})(1+\cos{x}),$$

which is to say that $R$ is not a factorial ring, although $\mathbb{R}[X,Y]$ is.

This blog post is inspired by an exercise in Serge Lang’s *Algebra*. But when writing this blog post, I ran into some paywalls. It would be absurd of me to direct a random reader to these paywalls. So it is very likely that I will include as many proofs as possible (when there is an absurd paywall, chances are I will rework the proofs for readability). But I can’t remove the assumption that the reader has finished the whole of Atiyah-MacDonald, or an equivalent, at the very least. I will add more topics in the future but that is not an easy job.

By Hilbert’s basis theorem, $\mathbb{R}[\cos{x}]$ and therefore $\mathbb{R}[\cos{x},\sin{x}]$ are Noetherian. Now we are interested in its normality. We have $\mathbb{R}[X,Y]/(X^2+Y^2-1) \cong \mathbb{R}[X][Y]/(Y^2-(1-X^2))$, where $2$ is a unit in $\mathbb{R}[X]$ and $1-X^2$ is square-free but not a unit. Hence we are able to apply the following lemma to show that $R$ is a normal Noetherian ring (integrally closed in its field of fractions). For the definition and properties of normal rings, please refer to the Stacks project.

(Lemma 1) Let $A$ be a factorial ring with field of fractions $K$ in which $2$ is a unit, and let $a \in A$ be a square-free element (i.e., if $p$ is a prime element in $A$, then $a \not\in p^2A$) which is not a unit. Then $A[T]/(T^2-a)$ is normal.

*Proof.* Since $a$ is a square-free non-unit, there is a prime $p$ with $a \in pA$ and $a \not\in p^2A$, so $T^2-a$ is irreducible by Eisenstein’s criterion and $L=K[T]/(T^2-a)$ is a field. Let $t$ be the image of $T$ in $A[T]/(T^2-a)$ and in $L$. Then it is clear that $A[t] \cong A[T]/(T^2-a)$ and we can write $L=K(t)$, the field of fractions of $A[t]$. Every element of $K(t)$ is represented by a polynomial in $t$ of degree at most $1$, which is to say every element in $L$ can be written uniquely as a sum $r+st$ where $r,s \in K$. To prove integral closedness, we need to find the minimal polynomial of $r+st$.

Next we determine when $r+st$ is integral. Since $t^2=a$ in $A[t]$, we have

$$\left[(r+st)-r\right]^2=(st)^2=s^2t^2=as^2.$$

Hence $f(X)=(X-r)^2-as^2=X^2-2rX+(r^2-as^2)$ sends $r+st$ to $0$. When $s \ne 0$, no polynomial of degree $1$ over $K$ can vanish at $r+st$, because $r+st \not\in K$. Hence $f(X)$ is the minimal polynomial of $r+st$ over $K$. Since $A$ is factorial, it is integrally closed, so $r+st$ is integral over $A$ (equivalently over $A[t]$, because $A[t]$ is integral over $A$) if and only if the coefficients of $f$ lie in $A$, i.e. $-2r \in A$ and $r^2-as^2 \in A$. We need to show this implies $r+st \in A[t]$. When $s=0$ this is clear because $A$ is integrally closed, so it suffices to show that $r,s \in A$, provided $-2r \in A$ and $r^2-as^2 \in A$ with $s \ne 0$.

Since $2$ is a unit in $A$, $-2r \in A$ clearly implies $r \in A$, and therefore $h=as^2=r^2-(r^2-as^2) \in A$. It remains to prove that $s \in A$. Write $s=s_1/s_2$ with $s_1,s_2 \in A$ relatively prime. We shall show that $s_2$ is a unit, which implies that $s \in A$. From $as^2=h$ we have $as_1^2=hs_2^2$. Assume $s_2$ is not a unit; then there is a prime $p$ dividing $s_2$, as $A$ is a factorial ring. Hence $as_1^2 = hs_2^2 \in p^2A$. Since $s_1$ and $s_2$ are relatively prime, $p$ does not divide $s_1$, hence $a \in p^2A$, a contradiction (we have assumed $a$ to be square-free). Hence $s_2$ is a unit, $s \in A$ and therefore $r+st \in A[t]$. The proof is complete. $\square$

Of course I shan’t be this lazy. It is clear that in the factorial ring $A=\mathbb{R}[X]$, $2$ is a unit. By square-free, we mean that if $p \in A$ is prime, then $a \not \in p^2A$. For example, in $\mathbb{Z}$, $12$ is not square-free because $12=2^2 \times 3 \in 2^2\mathbb{Z}$, while $14$ is square-free because $14=2 \times 7$ and no square appears. And for $1-X^2$ things are clear because we only have $1-X^2=(1-X)(1+X)$ - there is no square. We require $2$ to be a unit because otherwise the argument becomes much more difficult. We shall return to normality after we study the irreducible elements.
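The integer examples above can be checked mechanically. The following is a minimal sketch for $A=\mathbb{Z}$ only, using trial division; the function name `is_square_free` is an illustrative choice and nothing here applies directly to $\mathbb{R}[X]$.

```python
def is_square_free(n):
    """Return True if no prime p satisfies p*p | n (n a nonzero integer)."""
    n = abs(n)
    if n == 0:
        raise ValueError("0 is divisible by every square")
    p = 2
    while p * p <= n:
        if n % (p * p) == 0:
            return False   # found a square factor
        if n % p == 0:
            n //= p        # p divides n exactly once, so remove it
        p += 1
    return True

print(is_square_free(12), is_square_free(14))  # False True
```

The loop also visits composite values of `p`, which is harmless: a composite square dividing `n` still witnesses that `n` is not square-free.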

To conclude we have got a satisfying result:

(Proposition 1) $R$ is a normal Noetherian ring.

With the help of the Fourier transform or elementary trigonometric relations, every polynomial in $R=\mathbb{R}[\cos{x},\sin{x}]$ can be written in the form

$$P(x)=a_0+\sum_{k=1}^{n}\left(a_k\cos{kx}+b_k\sin{kx}\right)$$

where $a_0,a_k,b_k \in \mathbb{R}$. Define the degree $\delta(P)$ to be the maximum of integers $r,s$ where $a_r,b_s \ne 0$. Then a direct computation shows that $\delta(PQ)=\delta(P)+\delta(Q)$.

If $\delta(P)=0$, then $P(x)=a_0$ is zero or a unit. If $\delta(P)=1$, then if we have $P=P_1P_2$, then $\delta(P_1)+\delta(P_2)=1$. One of them has to be unit, hence $P$ is irreducible. If $\delta(P)=2$, then $P$ is reducible because we can solve equations in the expansion of the product

By induction all polynomials of degree $\ge 2$ are reducible. Hence irreducible elements are of the form

$$a+b\sin{x}+c\cos{x}.$$

But since $R$ is not a UFD, we cannot work on the ideal $(a+b\sin{x}+c\cos{x})$ directly. We need to dive into abstraction for a long time.

We now proceed to another satisfying result.

(Proposition 2) $R$ is a Dedekind domain.

*Proof.* Throughout, we work on the form $R \cong \mathbb{R}[X,Y]/(X^2+Y^2-1)$. Since $\mathbb{R}[X,Y]$ is of Krull dimension $2$ (see Atiyah-MacDonald exercise 11.7, where a solution is almost given) and $X^2+Y^2-1$ is irreducible, we have a prime ideal $(X^2+Y^2-1)$, and all prime ideals $P \subset \mathbb{R}[X,Y]$ strictly containing $(X^2+Y^2-1)$ are maximal. Next, let the canonical map $\pi:\mathbb{R}[X,Y] \to \mathbb{R}[X,Y]/(X^2+Y^2-1)$ be given. By proposition 1.1 of Atiyah-MacDonald, $\pi(P)$ is a maximal ideal in $\mathbb{R}[X,Y]/(X^2+Y^2-1)$ provided that $P \supsetneq (X^2+Y^2-1)$ is prime. If a nonzero ideal $Q \subset \mathbb{R}[X,Y]/(X^2+Y^2-1)$ is prime, then $\pi^{-1}(Q)=Q^c$ is also prime, and it contains $(X^2+Y^2-1)$ strictly, which implies that $Q$ is maximal. Hence $R$ is of Krull dimension $1$. By proposition 1, $R$ is Noetherian and integrally closed, hence it is Dedekind. $\square$

Let $A$ be an integral domain and $P$ be the set of all prime ideals of height $1$, i.e. the set of all nonzero prime ideals that contain no nonzero prime ideal other than themselves. Then $A$ is a Krull domain if

(KD1) $A_{\mathfrak{p}}$ is a discrete valuation ring for all $\mathfrak{p} \in P$.

(KD2) $A$ is the intersection of these discrete valuation rings (all considered as subrings of the field of fractions of $A$).

(KD3) Any nonzero element of $A$ is contained in only a finite number of height $1$ prime ideals.

To proceed with our study of $R$, we need a lemma:

(Lemma 2) If $A$ is a Dedekind domain, then $A$ is also a Krull domain.

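Next we prove (KD3). Pick any nonzero $a \in A$. If $a$ is a unit, then it is contained in no prime ideal. If not, consider the ideal $(a)=aA$. We have a unique factorisation as a product of prime ideals:$$ (a)= \mathfrak{p}_1^{r_1}\cdots\mathfrak{p}_n^{r_n} \subset \bigcap_{j=1}^{n}\mathfrak{p}_j.$$Any prime ideal containing $a$ contains this product, hence contains some $\mathfrak{p}_j$, and so equals $\mathfrak{p}_j$ because nonzero prime ideals of a Dedekind domain are maximal. Hence (KD3) is proved.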

For (KD2), note first $A \subset \bigcap_{\mathfrak{p}}A_{\mathfrak{p}}$ because the natural map $A \to A_{\mathfrak{p}}$ is injective for all $\mathfrak{p}$. Hence it suffices to prove the reverse. But elements in $A_{\mathfrak{p}}$ are of the form $a/s$. Hence we expect those elements of the form $b/1$ to be in $A$. Therefore it suffices to prove that $b/1 \in (a/1)A_{\mathfrak{p}}$ for all prime $\mathfrak{p}$ implies $b \in aA$ for all $a,b \in A$, $a,b \ne 0$. Put$$ (a)=\mathfrak{p}_1^{r_1}\cdots\mathfrak{p}_n^{r_n}$$we see $\mathfrak{q}_j = \mathfrak{p}_j^{r_j}$ is $\mathfrak{p}_j$-primary and we obtain a primary decomposition. Note we in particular have$$ b \in \bigcap_{j=1}^{n}\left(aA_{\mathfrak{p}_j} \cap A \right) = \bigcap_{j=1}^{n}\mathfrak{q}_j = aA$$because each $\mathfrak{p}_j$ has height $1$. $\square$

Which is to say that

(Proposition 3) $R$ is a Krull domain.

We know that since $R$ is Dedekind, its fractional ideals form an abelian group. This gives rise to the ideal class group. By a result of Samuel, we have a shockingly simple fact:

(Proposition 4) The ideal class group $Cl(R) \cong \mathbb{Z}/2\mathbb{Z}$.

Which can be considered as a corollary to this following statement:

(Samuel)Let $F$ be a non-degenerate quadratic form in $k[X_1,X_2,X_3]$. Let $A_F=k[X_1,X_2,X_3]/(F)$. Then $Cl(A_F)=\mathbb{Z}/2\mathbb{Z}$ if and only if there is a nontrivial solution to $F(X_1,X_2,X_3)=0$ in $k$.

One can find this result at this link, and refer to **study of plane conics**.

With these being said, by theorem 8 of Zaks’ paper, one sees that $R$ is a half-factorial domain (HFD). To be precise, for irreducible elements $x_1,x_2,\cdots,x_n$ and $y_1,y_2,\cdots,y_m$ of $R$, if $x_1x_2\cdots x_n=y_1y_2\cdots y_m$, then $m=n$. I may recover the proof here one day, but it would be much more difficult than writing everything you have seen here. This ring $R$ also shows that an HFD is not necessarily a UFD.

Since $Cl(R) \cong \mathbb{Z}/2\mathbb{Z}$, for any maximal ideal $M \subset R$, either it is principal or $M^2$ is principal. If $M$ and $M'$ are two non-principal maximal ideals, then $MM'$ is principal. Conversely, for any irreducible $z \in R$, either $(z)$ is maximal or $(z)=MM'$ for some maximal ideals $M$ and $M'$, where $M$ and $M'$ may coincide. We have given the form of irreducible elements

So we are now interested in these $a,b,c$. We will do some high school trick first. If we put

then $z= \sqrt{a^2+b^2}(\sin(x+\alpha)+k)$ where $b’=\cos\alpha$ and $c’ = \sin\alpha$. Since $\sqrt{a^2+b^2} \in \mathbb{R}$ it suffices to study elements of the form $\sin(x+\alpha)+k$.

Define a shift morphism $h:R \to R$ by

This map is clearly an isomorphism. More importantly, since

the primary decomposition of $(\sin(x+\alpha)+k)$ and $(\sin{x}+k)$ are of the same form. We are interested in the ring $R/(\sin{x}+k)$, where it is natural to study the behaviour of $\cos{x}$. For this reason we consider the substitution morphism

We first compute the inverse image $g^{-1}[(\sin{x}+k)]$. It is natural to think about cancelling $\sin x$ into $\cos x$. Note $(\sin x + k)( \sin x - k) = (\sin^2x -k^2) = (1- \cos^2x-k^2)$, pick whichever $P(X) \in (1-k^2-X^2)$, we have

Hence $(1-k^2-X^2)\subset g^{-1}[(\sin{x}+k)]$. For the converse, note that if nonzero $P \in g^{-1}[(\sin x + k)]$, we have $\deg P > 1$ because a nonzero trigonometric polynomial of the form $a+b\cos x$ is never divisible by $\sin x + k$. By the division algorithm, we find $Q(X)$, $R(X)$ such that

with $\deg R \le 1.$ But when $P \in g^{-1}[(\sin x + k)]$, we must have $R(X)=0$, according to our study of the degree earlier. Hence we must have $P(X) \in (1-k^2-X^2)$, which is to say

This induces an isomorphism

And it is much easier to study the ideal $(1-k^2-X^2)$. To be precise,

- $k^2=1 \iff (1-k^2-X^2)=(X)^2 \iff (k+\sin x)=M^2$ for some maximal ideal $M$, because $(X)$ is a maximal ideal.
- $k^2<1 \iff (1-k^2-X^2)$ is a product of two distinct maximal ideals $\iff (k+\sin x)$ is a product of two distinct maximal ideals $M$ and $M'$.
- $k^2>1 \iff (1-k^2-X^2)$ is maximal $\iff$ $(k+\sin x)$ is maximal.
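The three cases above come down to the sign of $1-k^2$, since the roots of $1-k^2-X^2$ satisfy $X^2=1-k^2$. Here is a small sketch of that case split; the function name `classify` and its return strings are only illustrative.

```python
def classify(k) -> str:
    """Describe the ideal (1 - k^2 - X^2) in R[X], following the three cases above."""
    d = 1 - k * k  # the roots of 1 - k^2 - X^2 satisfy X^2 = d
    if d == 0:
        return "square of the maximal ideal (X)"
    if d > 0:
        return "product of two distinct maximal ideals"
    return "maximal"


for k in (1, 0, 2):
    print(k, classify(k))
```

Exact inputs (integers or `fractions.Fraction`) are the safe choice here, because the first case relies on an exact comparison with $0$.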

Therefore maximal ideals of $R$ are determined by $k$, or more precisely the relation between $c^2$ and $a^2+b^2$. Moreover, let $M$ be a maximal ideal, we have

- If $M$ is principal, then there exists $\alpha$ and $k$ such that

and $R/M \cong \mathbb{C}$.

- If $M$ is not principal, then there exists $\alpha \in \mathbb{R}$ such that

and $R/M \cong \mathbb{R}$.

- Robert M. Fossum, *The Divisor Class Group of a Krull Domain*.
- M. F. Atiyah, FRS & I. G. MacDonald, *Introduction to Commutative Algebra*.
- Marco Fontana, Salah-Eddine Kabbaj, Sylvia Wiegand, *Commutative Ring Theory and Applications*.
- Hideyuki Matsumura, *Commutative Ring Theory*.
- P. Samuel, *Lectures on Unique Factorization Domains*.
- A. Zaks, *Half Factorial Domains*.

Consider a sequence of real or complex numbers $(s_n)$. If $s_n \to s$, then

$$\pi_n=\frac{s_1+s_2+\cdots+s_n}{n} \to s.$$

Here, $\pi_n$ is called the Cesàro sum of $(s_n)$. The proof is rather simple. Given $\varepsilon>0$, there exists some $N>0$ such that $|s_n-s|<\varepsilon$ for all $n > N$. Therefore we can write

For fixed $N$, we can pick $n$ big enough such that $N/n<1/2$ (i.e. $n>2N$) and

Hence $\pi_n$ converges to $s$. But the converse is not true in general. For example, if we put $s_n=(-1)^n$, then it diverges but $\pi_n \to 0$. If $\pi_n$ converges, we say $(s_n)$ is Cesàro summable.
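To see both phenomena numerically, here is a short sketch that computes the averages $\pi_n$ for a convergent sequence and for $s_n=(-1)^n$. The helper `cesaro_means` is a name chosen for this illustration, and a finite computation only suggests the limits, it does not prove them.

```python
from itertools import accumulate


def cesaro_means(seq):
    """Return [pi_1, pi_2, ...] where pi_n is the average of the first n terms."""
    return [total / n for n, total in enumerate(accumulate(seq), start=1)]


N = 10000
convergent = [1 + 1 / n for n in range(1, N + 1)]   # s_n -> 1
oscillating = [(-1) ** n for n in range(1, N + 1)]  # s_n diverges

print(cesaro_means(convergent)[-1])   # close to 1
print(cesaro_means(oscillating)[-1])  # close to 0
```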

If we treat $\pi_n$ as an integration with respect to the counting measure, things become interesting. Why don’t we investigate the operator defined to be

In this blog post we investigate this operator in Hilbert space $L^2(0,\infty)$.

Put $L^2=L^2(0,\infty)$ relative to Lebesgue measure, and the Cesàro operator $C$ is defined as follows:

$$(Cf)(x)=\frac{1}{x}\int_0^x f(t)\,dt.$$

From the example above, we shouldn’t expect $C$ to be too normal or well-behaved. But fortunately it is at the very least continuous: due to Hardy’s inequality, we have $\lVert C \rVert = 2$. I organised several proofs of this. But $C$ is not compact.

Consider a family of functions $(\varphi_A)_{A>0}$ where

(I owe Oliver Diaz for this family of functions.) It’s not hard to show that $\lVert \varphi_A \rVert = 1$. If we apply $C$ on it we see

Hence $\lVert C\varphi_A \rVert = \frac{\sqrt{1+A^2}}{A}$. Meanwhile for $B>A$, we have

It follows that

If we compute the norm on the right hand side we get

As a result, if we pick $f_n=\varphi_{2^n}$, then for any $m>n$ we get

Therefore, we find a sequence $(f_n)$ on the unit ball such that $(Cf_n)$ has no convergent subsequence.

Also we can find its adjoint operator:

Hence the adjoint is given by

$C^\ast$ is not compact either. Further, another application of Fubini’s theorem shows that

Hence $I-C$ is an isometry, $C$ is normal.

In this section we study the spectrum of $C$ and $C^\ast$, which will be derived from properties of bilateral shift, which comes from $\ell^2$ space. For convenience we write $\mathbb{N}=\mathbb{Z}_{\geq 0}$. This section can also help you understand the connection between $L^2(0,1)$ and $L^2(0,\infty)$.

An operator $U$ on a Hilbert space $H$ is called a *simple unilateral shift* if $H$ has an orthonormal basis $(e_n)_{n \in \mathbb{N}}$ such that $U(e_n)=e_{n+1}$ for all $n \in \mathbb{N}$. This is nothing but the right-shift operator in the sense of basis. Besides, we call $U$ a *unilateral shift of multiplicity $m$* if $U$ is a direct sum of $m$ simple unilateral shifts (note: $m$ can be any cardinal number, finite or infinite).

If we consider the difference between $\mathbb{N}$ and $\mathbb{Z}$, we have the definition of *bilateral shift*. An operator $W$ on $K$ is called a *simple bilateral shift* if $K$ has an orthonormal basis $(e_n)_{n \in \mathbb{Z}}$ such that $We_{n}=e_{n+1}$ for all $n \in \mathbb{Z}$. Besides, if we consider the subspace $H$ which is spanned by $(e_n)_{n \in \mathbb{N}}$, we see $W|_H$ is simply a unilateral shift. Before we begin, we investigate some elementary properties of uni/bilateral shifts.

(Proposition 1) A simple unilateral shift $U$ is an isometry.

*Proof.* Note $(Ue_m,Ue_n)=(e_{m+1},e_{n+1})=\delta_{m+1,n+1}=\delta_{mn}=(e_m,e_n)$. $\square$

(Proposition 2) A simple bilateral shift $W$ is unitary, hence is also an isometry.

*Proof.* Note $(We_m,e_n)=(e_{m+1},e_n)=\delta_{m+1,n}=\delta_{m,n-1}=(e_m,W^{-1}e_n)$, from which it follows that $W^\ast=W^{-1}$. $\square$
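Both propositions can be tested on finitely supported vectors. The sketch below stores a vector as a dictionary from the index $n$ to the coefficient of $e_n$; this representation and the names `shift`, `unshift` and `inner` are assumptions made for the illustration.

```python
def shift(x):
    """Apply e_n -> e_{n+1} to a finitely supported vector {index: coefficient}."""
    return {n + 1: c for n, c in x.items()}


def unshift(x):
    """Apply e_n -> e_{n-1}; this only makes sense when indices range over Z."""
    return {n - 1: c for n, c in x.items()}


def inner(x, y):
    """Inner product, linear in the first slot and conjugate-linear in the second."""
    return sum(c * complex(y.get(n, 0)).conjugate() for n, c in x.items())


x = {0: 1 + 2j, 3: -1}
y = {0: 2, 1: 1j}

# Isometry: (Ux, Uy) = (x, y).
print(inner(shift(x), shift(y)) == inner(x, y))
# Adjoint of the bilateral shift is its inverse: (Wx, y) = (x, W^{-1} y).
print(inner(shift(x), y) == inner(x, unshift(y)))
```

Note that `unshift(y)` leaves the index set $\mathbb{N}$, which is one way to see why the unilateral shift is an isometry but not unitary.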

Now let the Hilbert space $K$ and its subspace $H$ (invariant under $W$) be given. Consider the ‘orthonormal’ operator given by $Re_n=e_{-(n+1)}$. It follows that $R$ is a unitary involution and

With these tools, we are ready for the most important theorems.

$W=I-C^\ast$ is a simple bilateral shift on $K=L^2$.

**Step 1 - Obtaining missing subspace, operator and basis**

Here we put $H=L^2(0,1)$, which can be canonically embedded into $L^2(0,\infty)$ in the obvious way (consider all $L^2$ functions that vanish outside $(0,1)$). It is natural to put this, as there are many similarities between $L^2(0,1)$ and $L^2(0,\infty)$.

Explicitly,

Also we claim the basis to be generated by $e_0= \chi_{(0,1)}$, and we write $e_n=W^ne_0$. First of all we show that $(W^ne_0)_{n \geq 0}$ is orthonormal. Note as we have proved, $W^\ast W = (I-C)(I-C^\ast)=I$. Without loss of generality we assume that $m \geq n$ and therefore

If $m=n$, then $(e_m,e_n)=(e_0,e_0)=1$. Hence it is reduced to prove that $(W^ke_0,e_0)=0$ for all $k>0$. First of all we have

meanwhile

Hence $We_0 \perp e_0$. Suppose now we have $(W^ke_0,e_0)=0$, then

Note $W^ke_0$ always vanishes when $x \geq 1$: when we are doing inner product, $[1,\infty)$ is automatically excluded. With these being said, $(W^ne_0)_{n \geq 0}$ forms an orthonormal set. By the Hausdorff maximality theorem, it is contained in a maximal orthonormal set. But since $H=L^2(0,1)$ is separable (if and only if it admits a countable basis) (proof), $(W^ke_0)$ forms a basis of $H$. From now on we write $(e_n)_{n \geq 0}$.

To find the involution $R$, note first $W=I-C^\ast$ is already unitary (also, if it is not unitary, then it cannot be a bilateral shift, we have nothing to prove), whose inverse or adjoint is $W^\ast=I-C$ as we have proved earlier. Hence we have

But we have no idea what $R$ is exactly. We need to find it manually (or we have to guess). First of all it shall be guaranteed that $RH=H^\perp$. Since $H$ contains all $L^2$ functions that vanish on $[1,\infty)$, functions in $RH$ should vanish on $(0,1)$. It is natural to put $R(f)(x)=g(x)f\left( \frac{1}{x}\right)$ for the time being. $g$ should be determined by $e_{-1}$. Note $e_0\left(\frac{1}{x}\right)=\chi_{[1,\infty)}$ almost everywhere, so we shall put $g(x)=-\frac{1}{x}$. It is then clear that $Re_0=W^{-1}e_0$ and $RH=H^\perp$. For the third condition, we need to show that

Note

**Step 2 - With these, $W$ in step 1 has to be a simple bilateral shift**

This is independent of the spaces chosen. To finish the proof, we need a lemma:

Suppose $K$ is a Hilbert space, $H$ is a subspace and $e_0 \in H$. $W$ is a unitary operator such that $W^ne_0 \in H$ for all $n \geq 0$ and $(e_n=W^ne_0)_{n \geq 0}$ forms an orthonormal basis of $H$. $R$ is a unitary involution on $K$ such that

then $W$ is a simple bilateral shift.

Indeed, objects mentioned in step 1 fit in this lemma. To begin with, we write $e_n=W^ne_0$ for all $n \in \mathbb{Z}$. Then $(e_n)_{n \in \mathbb{Z}}$ is an orthonormal set because for arbitrary $m,n \in \mathbb{Z}$, there is a $j \in \mathbb{Z}$ such that $m+j,n+j \geq 0$. Therefore

Since $(e_0,e_1,\cdots)$ spans $H$, $RH=H^{\perp}$, we see $(Re_0,Re_1,\cdots)$ spans $H^{\perp}$. But

hence $(e_{-1},e_{-2},\cdots)$ spans $H^\perp$. By definition, $W$ is indeed a bilateral shift, and our proof is done. $\square$

- Walter Rudin, *Functional Analysis*.
- Arlen Brown, P. R. Halmos, A. L. Shields, *Cesàro operators*.

Throughout we consider the Hilbert space $L^2=L^2(\mathbb{R})$, the space of all complex-valued functions of a real variable such that $f \in L^2$ if and only if

$$\lVert f \rVert_2^2=\int_{-\infty}^{\infty}|f(s)|^2\,dm(s)<\infty,$$

where $m$ denotes the ordinary Lebesgue measure (in fact it’s legitimate to consider Riemann integral in this context).

For each $t \geq 0$, we assign a bounded linear operator $Q(t)$ such that

$$(Q(t)f)(s)=f(s+t).$$

This is indeed bounded since we have $\lVert Q(t)f \rVert_2 = \lVert f \rVert_2$ as the Lebesgue measure is translation-invariant. This is a left translation operator with a single step $t$.

The inner product in $L^2$ is defined by

If we apply $Q(t)$ on $f$, we see

where $Q(t)^\ast$ is the adjoint of $Q(t)$, which happens to be a right translation operator with a single step $t$. Clearly we have $Q(t)Q(t)^\ast=Q(t)^\ast Q(t)=I$, which indicates that $Q(t)$ is unitary. Also we can check in a more manual way:

By operator theory, since $Q(t)$ is unitary and bounded, the spectrum of $Q(t)$ lies in the unit circle $S^1$.

Note $Q(0)=I$ and

for all $f \in L^2$, which is to say that $Q(t+u)=Q(t)Q(u)$. Therefore we say $\{Q(t)\}$ is a *semigroup*. But what’s more important is that it satisfies strong continuity near the origin:

This is not too hard to verify. It suffices to prove that

Note $C_c(\mathbb{R})$ (continuous function with compact support) is dense in $L^2$, and for $f \in C_c(\mathbb{R})$, it follows immediately from properties of continuous functions. Next pick $f \in L^2$. Then for $\varepsilon>0$ there exists some $f_1 \in C_c(\mathbb{R})$ such that $\lVert f-f_1 \rVert_2 < \frac{\varepsilon}{4}$ and $\lVert f_1(s+t)-f_1(s)\rVert_2<\frac{\varepsilon}{2}$ for $t$ small enough. If we put $f_2=f-f_1$ we get

The limit follows as $\varepsilon \to 0$.
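Both the semigroup law and the strong continuity can be probed numerically. The sketch below assumes $(Q(t)f)(s)=f(s+t)$, in line with the expression $f_1(s+t)$ used above, takes a Gaussian as a sample $f$, and replaces the $L^2$ norm by a midpoint rule on $[-10,10]$. The names `Q`, `l2_distance` and `gauss` are made up for this illustration.

```python
import math


def Q(t):
    """Assumed translation operator: (Q(t) f)(s) = f(s + t)."""
    def apply(f):
        return lambda s: f(s + t)
    return apply


def l2_distance(f, g, lo=-10.0, hi=10.0, steps=4000):
    """Midpoint-rule approximation of the L^2 distance between f and g on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        s = lo + (i + 0.5) * h
        total += abs(f(s) - g(s)) ** 2
    return math.sqrt(total * h)


def gauss(s):
    return math.exp(-s * s)


# Semigroup law Q(t + u) = Q(t) Q(u), checked at one sample point.
lhs = Q(0.3 + 0.4)(gauss)(1.0)
rhs = Q(0.3)(Q(0.4)(gauss))(1.0)
print(math.isclose(lhs, rhs))

# Strong continuity: the distance to f shrinks as t decreases to 0.
for t in (0.1, 0.01, 0.001):
    print(t, l2_distance(Q(t)(gauss), gauss))
```

For this particular $f$ the printed distances decay roughly linearly in $t$, which is what one expects from a differentiable function.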

Recall that the infinitesimal generator of $Q(t)$ is defined to be

which is inspired by $\frac{d}{dt}e^{tA}\big|_{t=0}=A$ (thanks to von Neumann). Note if $f \in L^2$ is differentiable, then

The infinitesimal generator of $Q(t)$ being differentiation operator is quite intuitive. But we need to clarify it in $L^2$ which is much larger. So what is the domain $D(A)$? We don’t know yet but we can guess. When talking about differentiation in $L^p$ space, it makes sense to extend our differentiation to absolute continuity. Also we need to make sure that $Af \in L^2$, hence we put

For every $f \in D(A)$ and any fixed $t$ we already have

hence $Af=f'$ for every $f \in D(A)$ and it follows that $D(A) \subset D$. In fact, $A$ is the restriction of the differential operator on $D(A)$. Conversely, by the Hille-Yosida theorem, we see $1 \in \rho(A)$ and also one can show that $1 \in \rho(\frac{d}{dx})$. Therefore

But we also have

Thus

The fact that $(I-\frac{d}{dx})D=L^2$ can be realised by the equation $f-f'=g$, where the existence of a solution can be proved using the Fourier transform. Note $\widehat{f'}(y)=iy\hat{f}(y)$; with some knowledge of distributions, the result can also be given by

By the Hille-Yosida theorem, the half plane $\{z:\Re z>0\} \subset \rho(A)$. But we can give a more precise result of it.

Pick any $f \in D(A)$. It is directly verified that

Put $g=(A-\lambda{I})f$ then

Therefore

Conversely, suppose $h(y)=\frac{\hat{g}(y)}{iy-\lambda} \in L^2$, then $\hat{g}(y)=iyh(y)-\lambda{h}(y)$. If we take its Fourier inverse, we see $g \in R(A-\lambda{I})$.

If $g \in L^2$, then clearly $\hat{g} \in L^2$. It remains to discuss $\hat{g}(y)/(iy-\lambda)$. Note $iy$ is on the imaginary axis, hence if $\lambda$ is not purely imaginary, then $\hat{g}(y)/(iy-\lambda) \in L^2$. If $\lambda$ is purely imaginary however, then we may have $\hat{g}(y)/(iy-\lambda)\not\in L^2$. For example, we can take $\hat{g}=\chi_{[s-1,s+1]}$ where $\lambda = is$. Hence if $\lambda$ is purely imaginary, $R(A-{\lambda}I)$ is a proper subspace of $L^2$. Therefore we conclude:

*This is an exercise on W. Rudin’s Functional Analysis. You can find related theorems in Chapter 13.*

Guided by research in function theory, operator theorists gave the analogue to quasi-analytic classes. Let $A$ be an operator in a Banach space $X$. $A$ is not necessarily bounded, hence the domain $D(A)$ is not necessarily the whole space. We say $x \in X$ is a $C^\infty$ vector if $x \in \bigcap_{n \geq 1}D(A^n)$. This is quite intuitive if we consider the differential operator. A vector is analytic if the series

has a positive radius of convergence. Finally, we say $x$ is quasi-analytic for $A$ provided that

or equivalently its nondecreasing majorant. Interestingly, if $A$ is symmetric, then $\lVert{A^nx}\rVert$ is log convex.

Based on the density of quasi-analytic vectors, we have an interesting result.

(Theorem) Let $A$ be a symmetric operator in a Hilbert space $\mathscr{H}$. If the set of quasi-analytic vectors spans a dense subset, then $A$ is essentially self-adjoint.

This theorem can be considered as a corollary to the fundamental theorem of quasi-analytic classes, by applying suitable Banach space techniques in lieu.

For a positive sequence $\{a_n\}$, we see it is the moment sequence of a positive measure $\mu$, i.e. $a_n = \int_\mathbb{R}t^n d\mu(t)$, if and only if it is positive definite (proof). But the uniqueness is not guaranteed. Here we have a sufficient condition for this, using the concept of quasi-analytic vector. This is an old theorem (1922), but we are using operator theory, which appeared decades later, to prove it.

(Carleman’s condition) Suppose $\{a_n\}$ is the moment sequence of a positive measure $\mu$ on $\mathbb{R}$, then $\mu$ is uniquely determined provided that $\sum a_{2n}^{-1/2n}=\infty$.

**Proof.** Consider the Hilbert space

and the operator

It is clear that $A$ is self-adjoint. We shall work on the constant function $u(t) \equiv 1 \in \mathscr{H}$. Since $A^nu = t^n$, we see $u \in C^\infty$, otherwise $a_n$ is not defined. On the other hand, we have

But $a_{2n}^{-1/2n}=\lVert A^n u \rVert^{-1/n}$ and as a result we see $\sum a_{2n}^{-1/2n}= \sum \lVert A^n u \rVert^{-1/n} = \infty$, hence $u$ is quasi-analytic. In general, $t^n = A^n u$ is quasi-analytic for all $n \geq 0$.

Consider the space of polynomial $\mathcal{P}[t]$ with closure $\mathscr{H}_1$. It follows from the theorem above that $A_1 = A|_{\mathcal{P}[t]}$ is essentially self-adjoint in $\mathscr{H}_1$. Hence $\mathscr{H}_1$ is invariant under the one-parameter group $e^{iAs}$. Pick $y \in \mathcal{P}[t]^{\perp}$, we see

which implies that $y = 0$ a.e. [$\gamma$]. It follows that $\mathscr{H}_1 = \mathscr{H}$ or equivalently $\mathcal{P}[t]$ is dense in $\mathscr{H}$.

Suppose now we have another generating measure $\nu$ of $\{a_n\}$. With respect to $\nu$, $\mathcal{P}[t]$ is still a dense space. But the norm on $\mathcal{P}[t]$ is fixed by $\{a_n\}$, hence we obtain an isometry between $\mathcal{P}[t]_\gamma$ and $\mathcal{P}[t]_\nu$, which extends to an isometry between $L^2(\mathbb{R},\gamma)$ and $L^2(\mathbb{R},\nu)$, which forces $\gamma$ and $\nu$ to be equal. $\blacksquare$
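As a sanity check of Carleman’s condition, consider the standard Gaussian measure. This example is not taken from the argument above: it relies on the standard fact that its even moments are $a_{2n}=(2n-1)!!=\frac{(2n)!}{2^n n!}$. The sketch below computes the terms $a_{2n}^{-1/2n}$ through logarithms to avoid overflow; partial sums only suggest divergence, they do not prove it.

```python
import math


def log_even_moment(n: int) -> float:
    """log a_{2n} for the standard Gaussian, with a_{2n} = (2n)! / (2^n n!)."""
    return math.lgamma(2 * n + 1) - n * math.log(2) - math.lgamma(n + 1)


def carleman_term(n: int) -> float:
    """The n-th term a_{2n}^{-1/(2n)} of Carleman's series."""
    return math.exp(-log_even_moment(n) / (2 * n))


def partial_sum(N: int) -> float:
    """Sum of the first N terms of Carleman's series."""
    return sum(carleman_term(n) for n in range(1, N + 1))


for N in (10, 1000, 100000):
    print(N, partial_sum(N))
```

By Stirling’s formula the terms behave like $\sqrt{e/(2n)}$, so the series diverges like $\sum n^{-1/2}$ and the Gaussian is determined by its moments.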

There are a lot of nice properties of analytic functions, whose class is denoted by $C^\omega$. Formally we have the following definition:

If $f \in C^\omega$ and $x_0 \in \mathbb{R}$, one can write

Obviously $f \in C^\infty$ (and hence $C^\omega \subset C^\infty$) and alternatively we have the Taylor series converges to $f$ for any $x_0 \in \mathbb{R}$:

One interesting thing is, every $f \in C^\omega$ is uniquely determined by a sequence $D^0f(x_0), Df(x_0),D^2f(x_0),\cdots$.

Unfortunately, this property is not generally true on $C^\infty$. For example, we can consider the bump function $\varphi$ (a simple example can be found on wikipedia). In brief, $\varphi=0$ for all $x \in (-\infty,-1] \cup [1,+\infty)$ but $\varphi>0$ on $(-1,1)$. And more importantly, $\varphi \in C^\infty$. However, if we take $f = \varphi$ and $g = 2\varphi$, then $f \neq g$, but $D^nf(-2)=D^ng(-2)=0$ for all $n \geq 0$. We get a sequence of derivatives of different orders, but this sequence does not determine a unique $C^\infty$ function.

The term “uniquely determined” can also be described in an alternative way: If $f \in C^\omega$ and $D^kf(x_0)=0$ for all $k \geq 0$, then $f=0$ everywhere.

So a question comes up naturally: how many functions can be determined by their derivatives of all orders? Does $C^\omega$ contain all we can get? If not, how can we describe them?

The class of analytic functions is our source of motivation, so it makes sense to dig into its properties to find more. For an analytic function it is natural to consider the restriction of a holomorphic function on the complex plane. Let $\Omega$ be the set of all $z=x+iy$ such that $|y| < \delta$ and suppose $f \in H(\Omega)$ and $|f(z)|<\beta$ for all $z \in \Omega$. By Cauchy’s estimate, we get

Also the restriction of $f$ on $\mathbb{R}$ is real-analytic. Here comes the interesting part: $\beta$ and $\frac{1}{\delta}$ are determined only by $f$ and have nothing to do with $n$, meanwhile $n!$ is a special sequence that dominates the derivatives of $f$ to some extent.

This motivates us to define a special class of functions, which is called the class $C\{M_n\}$.

Let $\{M_n\}$ be a sequence of positive numbers, we let $C\{M_n\}$ denote the class of all $f \in C^\infty$ such that

where $\lVert \cdot \rVert_\infty$ is the supremum norm defined on $\mathbb{R}$, and $\beta_f,B_f$ are constants only determined by $f$ but not $n$.

In order to equip $C\{M_n\}$ with some satisfying algebraic structures, which can simplify our work, we need some restrictions.

Indeed, $B_f$ plays a much more important role, since we have

while $\beta_f$ was eliminated to $1$ in this limit. However, if we eliminate $\beta_f$ at the beginning, i.e. put $\beta_f = 1$ for all $f \in C\{M_n\}$, then when $n=0$, we have

which prevents $C\{M_n\}$ from being a vector space. For example, if $\lVert f \rVert_\infty = M_0$, then $\lVert 2f \rVert_\infty = 2M_0 > M_0$, hence $2f \not\in C\{M_n\}$. However, if we add $\beta_f$ no matter what, say $\lVert f \rVert_\infty \leq \beta_f M_0$, then whenever we do addition and scalar multiplication, there is a different constant with respect to the function, which makes sure that $C\{M_n\}$ is closed under addition and scalar multiplication, i.e. is a vector space. If we don’t add such a constant, our class contains way too few functions.

Further, we have some restriction on the sequence $\{M_n\}$:

- $M_0=1$.
- $M_n^2 \leq M_{n-1}M_{n+1}$ ($\{\log M_n\}$ is a convex sequence).

As we will see soon, this makes $C\{M_n\}$ an algebra over $\mathbb{R}$, where multiplication is defined pointwise.

*Proof.* If $f,g \in C\{M_n\}$, then we need to show that $fg \in C\{M_n\}$. We have the product rule for differentiation:

Since $f,g \in C\{M_n\}$, we have

Of course we want to eliminate $M_jM_{n-j}$ to obtain a binomial expansion. To do this we need the convexity of the sequence $\{\log M_n\}$. Note $M_n^2 \leq M_{n-1}M_{n+1}$ implies

As a result, the line segment connecting $(n-1,\log M_{n-1})$ and $(n,\log M_n)$ gets steeper and steeper as $n$ grows. By connecting these points, we actually get a convex function, but we will be more rigorous. For $0 < j < n$, we have

Hence $M_n \geq M_jM_{n-j}$ for $0<j<n$. It also holds when $j=0$ or $j=n$ since $M_0=1$, hence we get

Hence $fg \in C\{M_n\}$. The reason why $C\{M_n\}$ is a vector space has been stated already. $\square$
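The key inequality $M_n \geq M_jM_{n-j}$ is easy to test on concrete sequences. The sketch below uses $M_n=n!$, which satisfies both restrictions, and a small made-up sequence that violates log-convexity; the helper names are assumptions of this illustration.

```python
from math import factorial


def is_log_convex(M):
    """Check M_n^2 <= M_{n-1} M_{n+1} for all interior indices n."""
    return all(M[n] ** 2 <= M[n - 1] * M[n + 1] for n in range(1, len(M) - 1))


def is_supermultiplicative(M):
    """Check M_n >= M_j M_{n-j} for all 0 <= j <= n."""
    return all(M[n] >= M[j] * M[n - j] for n in range(len(M)) for j in range(n + 1))


factorials = [factorial(n) for n in range(30)]
print(is_log_convex(factorials), is_supermultiplicative(factorials))

not_convex = [1, 4, 8]  # 4^2 > 1 * 8, so log-convexity fails
print(is_log_convex(not_convex), is_supermultiplicative(not_convex))
```

The second sequence shows that the product estimate can genuinely fail once convexity is dropped, since $M_2 = 8 < 16 = M_1M_1$.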

This restriction does not hurt the generality. In fact whenever we are given a positive sequence $\{M_n\}$, we have another sequence $\{M'_n\}$ satisfying the two restrictions such that $C\{M_n\}=C\{M'_n\}$.

A class $C\{M_n\}$ is said to be quasi-analytic if the condition

for all $n \in \mathbb{N}$ implies that $f = 0$ for all $x \in \mathbb{R}$.

The reason we check whether $f$ is equal to $0$ everywhere, instead of checking whether it is ‘uniquely determined’ by its sequence of derivatives of all orders, is that the former is much simpler to work with. If two functions in the class share the same sequence of derivatives at a point, then their difference has all derivatives equal to $0$ there, so it suffices to show that this difference is $0$ everywhere.

We have seen that $C\{n!\}$ contains all functions which are restrictions of a holomorphic function in the strip defined by $|\Im(z)|<\delta$. Conversely, we show that any function in $C\{n!\}$ defined on the real axis can be extended to a holomorphic function with the same property. As a result, $C\{n!\}$ is a quasi-analytic class (which contains all bounded functions of $C^\omega$). If we only consider functions defined on a closed and bounded interval $[a,b]$, then $C\{n!\}$ is exactly $C^\omega$.

Suppose $f \in C\{n!\}$. First of all we have

for $n \in \mathbb{N}$. By Taylor’s formula

The remainder is therefore dominated by

If $|B(x-a)|<1$, then $\lim_{n \to \infty}|B(x-a)|^n = 0$, and we can safely write the expansion

Pick $0<\delta<\frac{1}{B}$, we can replace $x$ in the expansion above with $z$ such that $|z-a|<\delta$. This defines a holomorphic function $F_a$ on $D(a,\delta)$ (the open disk centred at $a$ with radius $\delta$). If $x \in D(a,\delta)$ is real, then $F_a(x)=f(x)$. Therefore $F_a$ is the analytic continuation of $f$; all $F_a$ form a holomorphic extension $F$ of $f$ in the strip $|\Im(z)|<\delta$. As a result, for $z = a+iy$ with $|y|<\delta$, we have

Hence $F$ is bounded in such a region.

In general, $C\{M_n\}$ is quasi-analytic as long as $M_n$ does not grow too fast as $n \to \infty$ (roughly speaking, not much faster than $n!$); if $M_n \to \infty$ way too fast, quasi-analyticity fails. There are several equivalent statements on whether $C\{M_n\}$ is a quasi-analytic class, which are given by the Denjoy-Carleman theorem. Here I collect all conditions that I have found:

(Denjoy-Carleman theorem) The following conditions are equivalent:

- $C\{M_n\}$ is not quasi-analytic.
- $\int_0^\infty \log Q(x)\frac{dx}{1+x^2}<\infty$, where $Q(x)=\sum_{n=0}^{\infty}\frac{x^n}{M_n}$.
- $\int_0^\infty \log q(x) \frac{dx}{1+x^2}<\infty$, where $q(x) = \sup \frac{x^n}{M_n}$.
- $\sum_{n=1}^{\infty}\left(\frac{1}{M_n}\right)^{1/n}<\infty$.
- $\sum_{n=1}^{\infty}\frac{M_{n-1}}{M_n}<\infty$
- $C\{M_n\}$ contains nontrivial function with compact support.
- $\sum_{n=1}^{\infty}\frac{1}{\lambda_n}<\infty$ where $\lambda_n = \inf_{k \geq n}M_k^{\frac{1}{k}}$.
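To get a feeling for condition 5, compare two sequences that both satisfy the two restrictions: $M_n=n!$, where $M_{n-1}/M_n=1/n$, and $M_n=(n!)^2$, where $M_{n-1}/M_n=1/n^2$. The sketch below only prints partial sums, which hint at divergence or convergence without proving it; the function names are made up for the illustration.

```python
def ratio_partial_sum(ratio, N):
    """Partial sum of M_{n-1}/M_n for n = 1..N, given ratio(n) = M_{n-1}/M_n."""
    return sum(ratio(n) for n in range(1, N + 1))


def factorial_ratio(n):
    """M_n = n!, so M_{n-1}/M_n = 1/n."""
    return 1 / n


def squared_factorial_ratio(n):
    """M_n = (n!)^2, so M_{n-1}/M_n = 1/n^2."""
    return 1 / n ** 2


for N in (10, 1000, 100000):
    print(N, ratio_partial_sum(factorial_ratio, N),
          ratio_partial_sum(squared_factorial_ratio, N))
```

The first column keeps growing (the harmonic series), so $C\{n!\}$ is quasi-analytic, while the second stays below $\pi^2/6$, so by the theorem $C\{(n!)^2\}$ is not.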

You may find condition 7 ridiculous. In fact, in this condition $\{M_n\}$ is not required to satisfy the two restrictions. This one is what Denjoy and Carleman found initially. Later, mathematicians found that for a sequence $\{M_n\}$ we can obtain its convex minorant $\{M_n'\}$ such that

- $M_n \geq M_n'$ for all $n$.
- $\{\log M_n'\}$ is convex.
- There is a sequence $0=n_0<n_1<\cdots$ such that $M_{n_i} = M'_{n_i}$ for all $i$ and $\log M'_k$ is linear for $n_i \leq k \leq n_{i+1}$.

And as you may guess, the convex minorant $\{M_n'\}$ is what we are using today.

The proof of the Denjoy-Carleman theorem will come out in my next blog post. There is quite a lot of work to do to finish the proof, and it cannot be done within hours. We will be using many results from complex analysis. Also, I will try to cover some extra properties of quasi-analytic classes as well as why the convex minorant is sufficient.

This post is still in progress; it is neither finished nor polished properly. For the coming days there will be new content, until this line is deleted. What I’m planning to add at this moment:

- Transpose is not just about changing indices of its components.
- Norm and topology in vector spaces
- Representing groups using matrices

Since the background of the reader varies a lot, I will try to organise contents depending on topic and required background. For the following section, you are assumed to be familiar with basic abstract algebra terminologies, for example, group, ring, fields.

When learning linear algebra, we were always thinking about real or complex vectors, matrices. This makes sense because $\mathbb{R}$ and $\mathbb{C}$ are the closest number **fields** to our real life. But we should not have the stereotype that linear algebra is all about real and complex spaces, or properties of $\mathbb{R}^n$ and $\mathbb{C}^n$. Never has there been such a restriction. In fact, $\mathbb{R}$ and $\mathbb{C}$ can be replaced with any field $\mathbb{F}$, and there are vast differences depending on the properties of $\mathbb{F}$.

There are already some differences between linear algebra over $\mathbb{R}$ and over $\mathbb{C}$. Since $\mathbb{C}$ is algebraically closed, that is, all polynomials of degree $n \geq 1$ have $n$ roots (counted with multiplicity), dealing with eigenvalues is much ‘safer’. Besides, for example, we can diagonalise the matrix

in $\mathbb{C}$ but not in $\mathbb{R}$.

When $\mathbb{F}$ above is finite, there are a lot more interesting things. It’s not just saying, $\mathbb{F}$ is a field, and is finite. For example, if $\mathbb{F}=\mathbb{R}$, we have

There shouldn’t be any problem. However, on the other hand, if $\mathbb{F}=\mathbb{Z}_5$, we have

In application, when working on applied algebra, one quite often meets finite fields. What if we want to solve a linear equation over a finite field? That’s when linear algebra over finite fields comes in. Realise this before it’s too late! By the way, if we work over rings in lieu of fields, we find ourselves in module theory.
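As a tiny illustration of solving a linear equation over a finite field, here is a sketch for the single equation $ax=b$ in $\mathbb{Z}_p$. The function `solve_linear` is a name chosen for this example, and the modular inverse via `pow(a, -1, p)` needs Python 3.8 or later.

```python
def solve_linear(a: int, b: int, p: int) -> int:
    """Solve a * x = b in Z_p, assuming p is prime and p does not divide a."""
    if a % p == 0:
        raise ValueError("a is not invertible modulo p")
    inverse = pow(a, -1, p)  # multiplicative inverse of a in Z_p
    return (inverse * b) % p


x = solve_linear(2, 1, 5)
print(x, (2 * x) % 5)  # the second number is b = 1 again
```

Over $\mathbb{R}$ the same equation $2x=1$ has the solution $\frac{1}{2}$, which does not exist as an integer; in $\mathbb{Z}_5$ the role of $\frac{1}{2}$ is played by the inverse of $2$.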

The set of all invertible $n \times n$ matrices forms a multiplicative group (and you should have no problem verifying this). The notation won’t go further than $GL(n)$, $GL(n,\mathbb{F})$, $GL_n(\mathbb{F})$ or simply $GL_n$. The set of all orthonormal matrices, which is also a multiplicative group and written as $O(n)$, is obviously a subgroup of $GL(n)$ since for all $A \in O(n)$, we have $\det{A} = \pm 1 \neq 0$ all the time. $O(n)$ contains $SO(n)$ as a subgroup, whose elements have determinant $1$. One should not confuse $SO(n)$ with $SL(n)$, which is the group of all matrices of determinant $1$. In fact $SO(n)$ is a proper subset of $SL(n)$ and $SL(n) \cap O(n) = SO(n)$. In general we have

Now we consider a more detailed group structure between $GL(n)$ and $O(n)$. I met the following problem in a differential topology book, where it was about fibres and structure groups. But for now it’s simply a linear algebra problem. The crux is finding the ‘square root’ of a positive definite matrix.

There is a direct product decomposition

This decomposition is pretty intuitive. For example, if a matrix $A \in GL(n,\mathbb{R})$ has determinant $a$, we may be looking for a positive definite matrix of determinant $|a|$, and another matrix of determinant $\frac{a}{|a|}$, which is expected to be orthogonal as well. We can consider $O(n)$ as a rotation of basis (change the direction), and the positive definite symmetric matrix as scaling (change the size). A similar result holds if we change the order of multiplication. It is worth mentioning that by direct product we mean it’s up to the order of eigenvalues.

**Proof.** For any invertible matrix $A$, we see $AA^T$ is positive definite and symmetric. Therefore there exists some $P \in O(n)$ such that

We assume that $\lambda_1\leq \lambda_2 \leq \cdots \leq \lambda_n$ to preserve uniqueness. Note $\lambda_k>0$ for all $1 \leq k \leq n$ since $AA^T$ is positive definite. We write $\Lambda=\operatorname{diag}(\sqrt\lambda_1,\sqrt\lambda_2,\cdots,\sqrt\lambda_n)$ which gives

Define the square root $B=\sqrt{AA^T}=\sqrt{A^TA}$ by

Then $B^2=P\Lambda P^T P \Lambda P^T = AA^T$. Note $B$ is also a positive definite symmetric matrix and is unique for given $A$. Let $v_1,v_2,\cdots,v_n$ be orthonormal (hence linearly independent) eigenvectors of $B$ with respect to $\sqrt{\lambda_1}, \sqrt{\lambda_2}, \cdots, \sqrt{\lambda_n}$. We first take a look at the following basis:

Note

So the value above is $1$ if $i = j$ and $0$ if $i \neq j$. $\{e_1,e_2,\cdots,e_n\}$ is a basis since $A$ is invertible, and the computation above shows it is orthonormal.

Then we take

We see

since both $\{e_1,e_2,\cdots,e_n\}$ and $\{v_1,v_2,\cdots,v_n\}$ are orthonormal. On the other hand, we need to prove that $A=UB$. First of all,

(Note we used the fact that $\{v_k\}$ are orthonormal.) This yields

Therefore $A=UB$ holds on a basis, and therefore on all of $\mathbb{R}^n$. This gives the desired conclusion. For any invertible $n \times n$ matrix $A$ we have a unique decomposition

where $U \in O(n)$ and $B$ is a positive definite symmetric matrix. $\square$
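The decomposition can be checked numerically in the $2 \times 2$ case, where the square root of a symmetric positive definite matrix $M$ has a closed form: by the Cayley-Hamilton theorem, $\sqrt{M}=(M+sI)/t$ with $s=\sqrt{\det M}$ and $t=\sqrt{\operatorname{tr}M+2s}$. This is a sketch under stated assumptions rather than the construction used in the proof. The helper names are made up for illustration, and $B$ is taken to be the square root of $A^TA$, which is the choice that puts the orthogonal factor on the left in $A=UB$.

```python
import math


def matmul(X, Y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(X):
    return [[X[j][i] for j in range(2)] for i in range(2)]


def polar_2x2(A):
    """Return (U, B) with A = U B, U orthogonal, B symmetric positive definite.

    Assumes A is an invertible real 2x2 matrix.
    """
    M = matmul(transpose(A), A)                            # A^T A
    s = math.sqrt(M[0][0] * M[1][1] - M[0][1] * M[1][0])   # sqrt(det M)
    t = math.sqrt(M[0][0] + M[1][1] + 2 * s)               # trace of sqrt(M)
    B = [[(M[0][0] + s) / t, M[0][1] / t],
         [M[1][0] / t, (M[1][1] + s) / t]]
    det_B = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    B_inv = [[B[1][1] / det_B, -B[0][1] / det_B],
             [-B[1][0] / det_B, B[0][0] / det_B]]
    U = matmul(A, B_inv)                                   # U = A B^{-1}
    return U, B


A = [[1.0, 2.0], [3.0, 4.0]]
U, B = polar_2x2(A)
print(matmul(U, B))             # reproduces A up to rounding
print(matmul(transpose(U), U))  # identity up to rounding
```

Here $\det A=-2$, so the orthogonal factor has determinant $-1$ and the positive definite factor has determinant $2$, matching the intuition given before the proof.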

A basis of a vector space does not come from nowhere. The statement that all vector spaces have a basis is derived from the axiom of choice and the fact that all non-zero elements in a field are invertible. I have written an article proving this already, see here (this is relatively advanced). On the other hand, since elements of a ring are not necessarily invertible, modules over a ring are not equipped with a basis in general.

It is also worth mentioning that a vector space is not necessarily of finite dimension. An infinite dimensional vector space is not some fancy thing. It’s quite simple: a basis is not finite. It can be countable or uncountable. And there is a pretty straightforward example: the set of all continuous functions $f:\mathbb{R} \to \mathbb{R}$.

One of the most important concepts developed in 20th century is, when studying a set, one can study functions defined on it. For example, let’s consider $[0,1]$ and $(0,1)$. If we consider the set of all continuous functions on $[0,1]$, which is written as $C([0,1])$, we see everything is fine. It’s fine to define norm on it, to define distance on it, and the norm and distance are complete. However, things are messy on $C((0,1))$. Defining a norm on it results in abnormal behaviour. If you are interested you can check here.

Now let’s consider the unit circle $S^1$ on the plane. The real continuous functions defined on $S^1$ can be considered as periodic functions defined on $\mathbb{R}$. So we may have a lot to do with it. If we are interested in the torus (the picture below is from wikipedia),

which is homeomorphic to $S^1 \times S^1$, how can we study the functions on it? We may consider $C(S^1) \times C(S^1)$, but as we will show later, there are some problems about that. Anyways, it makes sense to define ‘product’ from two vector spaces, which can ‘expand’ it.

Let’s review direct sum and direct product first. For the direct product of $A$ and $B$, we ask for an algebraic structure on the Cartesian product $A \times B$. For example, $(a,b)+(a',b')=(a+a',b+b')$. That is, the operation is defined componentwise. This works fine for groups since for each group there is only one binary operation. But at this point we don’t care about scalar multiplication.

There are two types of direct sum, inner and outer. For a vector space $V$ over a field $\mathbb{F}$, we consider two (or even more) subspaces $W$ and $W'$. We have a ‘bigger’ subspace generated by adding $W$ and $W'$ together, namely $W+W'$, which contains all elements of the form $w+w'$ where $w \in W$ and $w' \in W'$. The representation is not guaranteed to be unique. That is, for $z=w+w'$, we may have $w_1 \in W$ and $w_1' \in W'$ such that $z=w_1+w_1'$ but $w \neq w_1$. This would be weird. Fortunately, the representation is unique if and only if $W \cap W'$ is trivial. In this case we say the sum of $W$ and $W'$ is direct, and write $W \bigoplus W'$. This is the inner direct sum.

Can we represent the direct sum using an ordered pair? Of course we can. Elements in $W \bigoplus W'$ can be written in the form $(w,w') \in W \times W'$, and the addition is defined componentwise. That is, $(w,w')+(w_1,w_1')=(w+w_1,w'+w_1')$ (which is in fact $(w+w')+(w_1+w_1')=(w+w_1)+(w'+w_1')$). It seems that we don’t go further than direct product. However we need to consider the scalar product. For $\alpha \in \mathbb{F}$, we have $\alpha(w,w') = (\alpha{w},\alpha{w'})$; this is because $\alpha(w+w')=\alpha{w}+\alpha{w'}$. We call this **inner** direct sum because $W$ and $W'$ are *inside* $V$. One may ask, since $w+w'=w'+w$, why is the pair ordered? For $w+w'$ we have the first one to be an element of $W$ and the second one to be an element of $W'$, but for $w'+w$ we can’t.

Outer direct sum is different. To define this one considers two *arbitrary* vector spaces $W$ and $V$ over $\mathbb{F}$. It is not guaranteed that $W$ and $V$ are both subspaces of a bigger vector space. For example it’s legit to take $W$ to be $\mathbb{R}$ over itself and $V$ to be all real functions. $W \bigoplus V$ is defined to be the set of all ordered pairs $(w,v)$ with $w \in W$ and $v \in V$. The addition is defined componentwise, and scalar multiplication is defined to be $\alpha(w,v)=(\alpha{w},\alpha{v})$. One may also write $w+v$ if context is clear.

When the number of vector spaces is finite, we don’t distinguish between direct product and direct sum. When the index is infinite, for example when we consider $\prod_{i=1}^{\infty}X_i$ and $\bigoplus_{i=1}^{\infty}X_i$, things are different. To be precise, in the language of category theory, direct product is the *product*, and direct sum is the *coproduct*.

We are not touching the definition yet; first of all let’s imagine what we want from a multiplication. Let $W$ and $V$ be two vector spaces over $\mathbb{F}$ and let $\cdot$ denote the multiplication for the time being. The distributive law should hold, that is, we have $w \cdot v + w' \cdot v = (w+w') \cdot v$ and $w \cdot v + w \cdot v' = w \cdot (v+v')$. On the other hand, scalar multiplication should act on a single component, that is, $\alpha(w \cdot v)=(\alpha w) \cdot v = w \cdot (\alpha v)$.

It seems illegal to use $\cdot$ so let’s use ordered pair. Under these laws, we have

It makes sense to call it ‘bilinear’. Fixing one component, we have a linear transform. However, direct product and direct sum do not work here at all. If they worked, we would have $(w,v)+(w',v)=(w+w',v+v)$ rather than $(w+w',v)$. This gives rise to the tensor product: we need a legitimate multiplication between vectors.

We have got the spirit of tensor product. A direct product is not OK. There has to be a bilinear operation no matter what. For two vector spaces $V$ and $W$, we write the tensor product as $V \bigotimes W$; for $v \in V$ and $w \in W$, we denote their tensor product by $v \otimes w$, which can be considered as an image or value of a bilinear function $\varphi(\cdot,\cdot):V \times W \to V \bigotimes W$. There are many bilinear maps with domain $V \times W$. We ask the tensor product to be the essential one.

The **tensor product** $V \bigotimes W$ of $V$ and $W$ is the vector space having the following properties.

There exists the canonical bilinear map $\varphi(\cdot,\cdot):V \times W \to V \otimes W$, and we write $\varphi(v,w) = v \otimes w \in V \bigotimes W$.

For any bilinear map $h(\cdot,\cdot):V \times W \to U$, there exists a unique linear map

such that $\lambda(\varphi(v,w)) = h(v,w)$ for all $(v,w) \in V \times W$. This is called the **universal property** of $V \bigotimes W$.

It can be easily verified that, if $V$ and $W$ have two tensor products, then they are isomorphic (hint: use the universal property). Since all tensor products of $V$ and $W$ are isomorphic, we only need to pick the obvious one (as long as it exists). But we don’t have too much space for it. For further study I recommend the following documents:

- Definition and properties of tensor products. This one involves a considerable amount of explicit calculation and is of elementary approach.
- Tensor products and bases. This one proves the existence in an abstract way.
- Tensor Product as a Universal Object (Category Theory & Module Theory). One of my recent blog posts. The topics here are relatively advanced, and I don’t think it’s a good idea to use the language of category theory at this early point.
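For coordinate spaces there is a concrete model worth keeping in mind. For $\mathbb{F}^m$ and $\mathbb{F}^n$ one can take $\mathbb{F}^{mn}$, with $v \otimes w$ given by all pairwise products of coordinates (the Kronecker product). The sketch below only checks the bilinearity laws on sample integer vectors; the function names are illustrative, and this is one model of the tensor product, not a proof of existence.

```python
def tensor(v, w):
    """Coordinates of v ⊗ w in the basis e_i ⊗ f_j, listed in row-major order."""
    return [vi * wj for vi in v for wj in w]


def add(x, y):
    return [a + b for a, b in zip(x, y)]


def scale(c, x):
    return [c * a for a in x]


v, v2 = [1, 2], [3, -1]
w = [4, 5, 6]

# Additivity in the first slot: (v + v2) ⊗ w = v ⊗ w + v2 ⊗ w.
print(tensor(add(v, v2), w) == add(tensor(v, w), tensor(v2, w)))

# A scalar may be moved onto either factor.
print(scale(7, tensor(v, w)) == tensor(scale(7, v), w) == tensor(v, scale(7, w)))

# The dimension is m * n = 6, whereas the direct sum has dimension m + n = 5.
print(len(tensor(v, w)))
```

Compare this with the componentwise pair $(v,w)$ of the direct sum, for which neither law holds.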

Let $\mathbb{F}$ be any field (it can be replaced with a commutative ring if you want to), and $E,F$ be two modules over $\mathbb{F}$. We will have a glance at the definition of dual space and, more importantly, we will see what a transpose is. In general we study the bilinear form

Sometimes for simplicity we also write $f(x,y)=\langle x,y \rangle$. The set of all bilinear forms of $E \times F$ into $\mathbb{F}$ will be denoted by $L^2(E,F;\mathbb{F})$ and you may have seen it earlier.

We define the **kernel** of $f$ on the left to be $F^\perp$ and on the right to be $E^\perp$. Recall that for $S \subset E$, $S^\perp$ consists of all $y$ such that $f(x,y)=0$ whenever $x \in S$; similarly, for $T \subset F$, $T^\perp$ consists of all $x$ such that $f(x,y)=0$ whenever $y \in T$. Respectively, we say $f$ is **non-degenerate** on the left/right if the kernel on the left/right is trivial.

One of the simplest examples is the case when $E=\mathbb{F}^m$ and $F=\mathbb{F}^n$. We take an $m \times n$ matrix $A$ over $\mathbb{F}$. Define $f(x,y) = x^T A y$. This is a classic bilinear form. Whether it is non-degenerate on the left or on the right depends on the linear independence of row vectors and column vectors. $\def\opn{\operatorname}$

The bilinear form $f$ gives rise to a homomorphism of $E$ to a ‘space of essential arrows’:

given by

$\opn{Hom}_\mathbb{F}(F,\mathbb{F})$ contains all linear maps of $F$ into $\mathbb{F}$. One can imagine $\opn{Hom}_\mathbb{F}(F,\mathbb{F})$ to be a set of ‘arrows’ from $F$ to $\mathbb{F}$.

Now let’s see what we can do in analysis and topology.

Let’s consider all complex polynomials of degree $\leq 5$. This is a complex vector space and is in fact isomorphic to $\mathbb{C}^6$ since we have a bijection mapping $a_0+a_1z+a_2z^2+a_3z^3+a_4z^4+a_5z^5$ to $(a_0,a_1,a_2,a_3,a_4,a_5)^T$. Therefore we can simply use matrices and vectors. We represent differentiation via matrices. This is straightforward work. We pick the natural basis $\{1,z,z^2,z^3,z^4,z^5\}$ to begin with and write the differentiation as $\mathscr{D}$. Since $\def\ms{\mathscr}$

We get a matrix corresponding to $\ms{D}$ by

Next we try to obtain the Jordan normal form of $D$. Since the minimal polynomial of $D$ is merely $m(\lambda)=\lambda^6$, we cannot diagonalise it. After some computation we get

where the matrix $J$ in the square bracket is our Jordan normal form. This makes sense since if we consider the basis $\{1,z,\frac{1}{2}z^2,\frac{1}{6}z^3,\frac{1}{24}z^4,\frac{1}{120}z^5\}$, we see under this basis,

which coincides with $J$.

We already know $\ms{D}^6=0$ but we can also get this by considering $D^6=SJ^6S^{-1}=0$ since $J^6=0$. Further, the form of $S$ should make you realise that we have a hidden $e$, that is

and the basis is in fact first $6$ terms of the expansion of $\exp{z}$.

If this cannot fascinate you I don’t know what can!
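The nilpotency of differentiation on polynomials of degree at most $5$ is easy to confirm by machine. The sketch below builds the matrix of $\mathscr{D}$ in the monomial basis $\{1,z,\dots,z^5\}$, assuming coordinates are written as column vectors $(a_0,\dots,a_5)^T$ as above, so that column $k$ carries $z^k \mapsto kz^{k-1}$. The helper names are illustrative.

```python
def diff_matrix(n):
    """Matrix of d/dz on polynomials of degree <= n in the basis 1, z, ..., z^n."""
    D = [[0] * (n + 1) for _ in range(n + 1)]
    for k in range(1, n + 1):
        D[k - 1][k] = k          # z^k is sent to k * z^(k-1)
    return D


def matmul(X, Y):
    size = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def matvec(X, v):
    return [sum(a * b for a, b in zip(row, v)) for row in X]


def matpow(X, e):
    """X raised to the power e >= 1 by repeated multiplication."""
    result = X
    for _ in range(e - 1):
        result = matmul(result, X)
    return result


D = diff_matrix(5)

# Derivative of 1 + z + z^2 + z^3 + z^4 + z^5.
print(matvec(D, [1, 1, 1, 1, 1, 1]))       # [1, 2, 3, 4, 5, 0]

D5, D6 = matpow(D, 5), matpow(D, 6)
print(D5[0][5])                            # 120, the fifth derivative of z^5
print(all(x == 0 for row in D6 for x in row))   # True
```

Since $D^5 \neq 0$ while $D^6=0$, the minimal polynomial is indeed $\lambda^6$.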

Next we consider an example on infinite dimensional vector spaces. Consider $E=C_c^\infty(\mathbb{R})$, the infinite dimensional vector space of $C^\infty$ functions on $\mathbb{R}$ with compact support, namely, for $f \in C_c^\infty(\mathbb{R})$, we have $f \in C^\infty$ and there exists some $0<K<\infty$ such that $f(x)=0$ outside $[-K,K]$. Next consider the bilinear form $E \times E \to \mathbb{R}$ defined by the following inner product:

Note the differential operator $\ms{D}:E \to E$ is a linear map of $E$ into $E$, so let’s find its transpose $\ms{D}^T$. That is, we need to find the unique linear map $\ms{D}^T:E \to E$ such that

This is a simple application of integration by parts:

Hence the **transpose** of differentiation $\ms{D}$ is $-\ms{D}$. So we can say it’s skew-symmetric for some obvious reason. But the matrix of $\ms{D}$ in $n$-polynomial space is not.
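For reference, the computation can be written out as follows. This is a sketch that assumes the pairing is the usual $L^2$ inner product $\langle f,g \rangle=\int_{-\infty}^{\infty}f(x)g(x)dx$; the boundary term vanishes because both functions have compact support.

$$
\begin{aligned}
\langle \mathscr{D}f,g \rangle
&= \int_{-\infty}^{\infty} f'(x)g(x)dx \\
&= \Big[f(x)g(x)\Big]_{-\infty}^{\infty} - \int_{-\infty}^{\infty} f(x)g'(x)dx \\
&= 0 + \int_{-\infty}^{\infty} f(x)\big(-g'(x)\big)dx \\
&= \langle f,-\mathscr{D}g \rangle.
\end{aligned}
$$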

(Perron’s theorem) Let $A$ be an $n \times n$ matrix having all components $a_{ij}>0$, then it must have a positive eigenvalue $\lambda_0$, and a unique corresponding positive eigenvector, i.e., $x=(x_1,x_2,\cdots,x_n)^T$ such that $x_i>0$ for all $i = 1,2,\cdots,n$.

In fact, the positive eigenvalue is the spectral radius of $A$, which is often written as $\rho(A)$. I recommend reading the following documents:

- A short proof of Perron’s theorem. This mentioned more algebraic properties of $\rho(A)$.
- The Perron-Frobenius Theorem. This paper mentioned some real life application (modelling growth of a population) and has some exercises to work on.
- Proof of the Frobenius-Perron Theorem. This paper is more elementary-focused.

But here we are using Brouwer’s fixed point theorem (you may find an elementary proof on project Euclid). In the following proof, we write $D_n$ to denote $n$-disk and $\Delta^n$ to denote $n$-simplex. That is,

Note $D_n$ is homeomorphic to $\Delta^n$. Further we have a lemma:

(Lemma) If $f:X \to X$ is a continuous function and $X$ is homeomorphic to $D_n$, then $f$ has a fixed point as well.

**Proof of the lemma.** Let $\varphi$ be the homeomorphism from $X$ to $D_n$. Then $\varphi \circ f \circ \varphi^{-1}:D_n \to D_n$ has a fixed point according to Brouwer’s fixed point theorem. Suppose we have

Then

and hence $\varphi^{-1}(y) \in X$ is our fixed point. $\square$

Now we are ready to prove Perron’s theorem using Brouwer’s fixed point theorem.

**Proof of Perron’s theorem.** Define $\sigma(x)=\sum_{i=1}^{n}x_i$ where $x = (x_1,x_2,\cdots,x_n)^T$. Since it’s linear, it’s continuous (this is not generally true for infinite dimensional spaces, but it’s safe now, and you can see this question on mathstackexchange for a proof). Similarly $A$ is continuous as well. Also, by definition, $x \in \Delta^{n-1}$ if and only if all components of $x$ are non-negative and $\sigma(x)=1$. Define a function $g:\Delta^{n-1} \to \Delta^{n-1}$ by

We will show that this function is well-defined. Since $x \in \Delta^{n-1}$, not all components of $x$ are equal to $0$, since if so, we get $x_1+x_2+\cdots+x_n=0$, contradicting the assumption that $x \in \Delta^{n-1}$. Note we can write down $Ax$ explicitly (this is an elementary linear algebra thing):

Since $A$ has all components greater than $0$, we see all components of $Ax$ are greater than $0$ as well. Hence $\sigma(Ax)>0$. On the other hand, $g(x) \in \Delta^{n-1}$ since $\sigma(g(x))=\frac{\sigma(Ax)}{\sigma(Ax)}=1$. Since $A$, $\sigma$, $y=\frac{1}{x}$ are continuous, being a composition of continuous functions, $g$ is continuous.

Since $\Delta^{n-1}$ is homeomorphic to $D_{n-1}$, $g$ has a fixed point according to the lemma. Hence there exists some $y \in \Delta^{n-1}$ such that

But as we have already proved, $\lambda_0=\sigma(Ay)$ is positive. On the other hand, all components of $y$ are positive since all components of $Ay$ are positive. The proof is completed. $\square$
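Brouwer’s theorem only tells us that a fixed point of $g$ exists; it gives no way to find it. For a matrix with positive entries, however, simply iterating $g(x)=Ax/\sigma(Ax)$ from a point of the simplex converges to the fixed point. This is the classical power method, named here as a swapped-in numerical technique and not as part of the proof above. The sketch below uses illustrative function names and an arbitrary sample matrix.

```python
def sigma(x):
    """Sum of the components, as in the proof."""
    return sum(x)


def apply(A, x):
    """Matrix-vector product A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def g(A, x):
    """The self-map of the simplex used in the proof: x -> Ax / sigma(Ax)."""
    y = apply(A, x)
    s = sigma(y)
    return [yi / s for yi in y]


def perron(A, iterations=200):
    """Approximate the positive eigenpair by iterating g (power method)."""
    n = len(A)
    x = [1.0 / n] * n            # barycentre of the simplex
    for _ in range(iterations):
        x = g(A, x)
    return sigma(apply(A, x)), x


A = [[1.0, 2.0], [3.0, 4.0]]
lam, y = perron(A)
print(lam, y)
print(apply(A, y), [lam * yi for yi in y])   # the two lists agree up to rounding
```

For this sample matrix the characteristic polynomial is $\lambda^2-5\lambda-2$, so the value found should be close to $\frac{5+\sqrt{33}}{2}$, and the eigenvector has positive components summing to $1$.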

You are assumed to be familiar with multivariable calculus when reading this subsection since we are discussing it right now. But in general this section is much beyond elementary linear algebra. First of all we are presenting the *ultimate* abstract extension of the usual gradient, curl, and divergence. We simply consider the $C^\infty$ functions $\mathbb{R}^3 \to \mathbb{R}^3$. When working on gradient, we consider something like $\def\pf[#1]{\frac{\partial f}{\partial #1}}$

When working on curl, we consider

Finally for divergence we consider

They were connected by Green’s theorem, Gauss’s theorem, Stokes’ theorem. But are they abruptly connected for no reason but numerical equality? Fortunately, no. Let’s see why.

First of all for convenience we write $(x_1,x_2,x_3)$ instead of $(x,y,z)$. Define $dx_idx_j=-dx_jdx_i$ for all $i,j = 1,2,3$. Note this implies that $dx_idx_i=0$. For $d$ we have the definition as follows:

- If $f$ is a $C^\infty$ function, then $df = \sum_{i=1}^{3}\pf[x_i]dx_i$.
- If $\omega$ is of the *form* $\sum f_{i_1 \cdots i_q}dx_{i_1}\dots dx_{i_q}$, then $d\omega=\sum df_{i_1 \cdots i_q}dx_{i_1}\dots dx_{i_q}$.

Then gradient, curl and divergence follow naturally. You can verify that the second one is actually equal to $d(f_1dx+f_2dy+f_3dz)$ and the third one is equal to $d(f_1dydz-f_2dxdz+f_3dxdy)$. We call $d$ the exterior differentiation.
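If you would like to see the verification spelled out, here is the computation for the $1$-form, using only the two rules for $d$ and the relations $dx_idx_j=-dx_jdx_i$, $dx_idx_i=0$ (written with $x,y,z$ for readability):

$$
\begin{aligned}
d(f_1dx+f_2dy+f_3dz)
&= \left(\frac{\partial f_1}{\partial y}dy+\frac{\partial f_1}{\partial z}dz\right)dx
 + \left(\frac{\partial f_2}{\partial x}dx+\frac{\partial f_2}{\partial z}dz\right)dy
 + \left(\frac{\partial f_3}{\partial x}dx+\frac{\partial f_3}{\partial y}dy\right)dz \\
&= \left(\frac{\partial f_3}{\partial y}-\frac{\partial f_2}{\partial z}\right)dydz
 + \left(\frac{\partial f_1}{\partial z}-\frac{\partial f_3}{\partial x}\right)dzdx
 + \left(\frac{\partial f_2}{\partial x}-\frac{\partial f_1}{\partial y}\right)dxdy.
\end{aligned}
$$

The coefficients are exactly the components of the curl. In the same way,

$$
d(f_1dydz-f_2dxdz+f_3dxdy)
= \left(\frac{\partial f_1}{\partial x}+\frac{\partial f_2}{\partial y}+\frac{\partial f_3}{\partial z}\right)dxdydz,
$$

since $-dydxdz=dxdydz$ and $dzdxdy=dxdydz$.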

Linear algebra is not just for $\mathbb{R}^3$ space, so is exterior differentiation. Let $\Omega^\ast$ be the algebra over $\mathbb{R}$ (for algebra over a field, see this), generated by $dx_1,\dots,dx_n$ with the multiplication defined by an **anti-commutative** multiplication $dx_idx_j=-dx_jdx_i$ for all $i,j=1,2,\cdots,n$. As a vector space over $\mathbb{R}$, $\Omega^\ast$ is of dimension $2^n$ with a basis

where $i<j<k$. Let $C^\infty$ itself be the vector space of $C^\infty$ functions on $\mathbb{R}^n$, and we define the $C^\infty$ differential *forms* on $\mathbb{R}^n$ by

For simplicity we omit the tensor product symbol $\otimes$. As a result, for any $\omega \in \Omega^\ast(\mathbb{R}^n)$, we have $\omega$ to be a simple $C^\infty$ function (why don’t we call it a $0$-form?) or we have $\omega = \sum f_{i_1\cdots i_q}dx_{i_1}\dots dx_{i_q}$, and we call it a $q$-form since the maximal degree of $dx_j$ is $q$. Also we can define $\Omega^q(\mathbb{R}^n)$ to be the vector space of $q$-forms. Consider the differential $d$ defined by

- If $f$ is a $C^\infty$ function, then $df = \sum_{i=1}^{n}\pf[x_i]dx_i$.
- If $\omega$ is of the *form* $\sum f_{i_1 \cdots i_q}dx_{i_1}\dots dx_{i_q}$, then $d\omega=\sum df_{i_1 \cdots i_q}dx_{i_1}\dots dx_{i_q}$.

This is what we call the exterior differentiation. It’s the ultimate abstract extension of gradient, curl and divergence. Your calculus teacher may have warned you, that you cannot deal with $dx$ independently. So is it safe to work like this? Yes, there is nothing to worry about. We are doing abstraction algebraically.

There are so many concepts that can be understood in a linear algebra way. For example we also have

In fact Green’s theorem, Gauss’ theorem and Stokes’ theorem have an ultimate abstract extension as well, which is called the general Stokes’ theorem:

If $\omega$ is an $(n-1)$-form with compact support on an oriented manifold $M$ of dimension $n$ and if $\partial M$ is given the induced orientation, then

We are not diving into this theorem but we will conclude this subsection with a glimpse of integration. Recall that the Riemann integral of a differentiable function $f:\mathbb{R}^n \to \mathbb{R}$ can be written as

Here we add the absolute value to $dx_1 \dots dx_n$ to emphasise the distinction between the Riemann integral of a function and the integral of a differential form, since order only matters in the latter case. For the latter case, if $\pi$ is a permutation of $1,2,\cdots,n$ or we simply say $\pi \in S_n$, then

This definition is natural and obvious. Since $\operatorname{sgn} \pi$ is equal to the determinant of the matrix representing $\pi$ (see here), it’s natural to consider the determinant. Consider the function

Then $J(\Pi)=\operatorname{sgn}\pi$. This is quite similar to what we expect from Jacobian determinant in general, which describes change-of-variable essentially. Let $x_1,x_2,\cdots,x_n$ be a basis of $\mathbb{R}^n$ and $T:\mathbb{R}^n \to \mathbb{R}^n$ be a diffeomorphism. We have a new basis $y_1,y_2,\cdots,y_n$ given by

where $\pi_i:(a_1,a_2,\cdots,a_n) \mapsto a_i$ is the $i$th projection. Namely

written in column vectors. We now show that

First we recall that $J(T)$ is the determinant of $(\partial T_i / \partial x_j)$, and the determinant of a matrix $(a_{ij})$ is defined by

where $\epsilon(\sigma)$ is actually $\operatorname{sgn}\sigma$ and $\sigma$ ranges through all permutations of $1,2,\cdots,n$. We need something to coincide. First of all, we compute $dy_i$. Note

Hence

We get, as a result,

After cancelling out so many zeros, we get $J(T)$. You don’t have to expand the identity. Pick a component $\frac{\partial T_1}{\partial x_{j_1}}dx_{j_1}$ from $dy_1$. Then when we pick another component from $dy_2$ to get it multiplied with the first one, say $\frac{\partial T_2}{\partial x_{j_2}}dx_{j_2}$, we must have $j_1 \neq j_2$ since if not, then $dx_{j_1}dx_{j_2}=0$, and we cancel that. The rule remains the same (but even stricter) when we pick components from $dy_3$, $dy_4$, and so on until $dy_n$. In the end, $j_1,j_2,\cdots,j_n$ are pairwise unequal. This corresponds exactly to a permutation of $1,2,\cdots,n$. Hence we get

On the other hand, $dx_{\sigma(1)}dx_{\sigma(2)}\cdots dx_{\sigma(n)}=\epsilon(\sigma)dx_1dx_2\cdots dx_n$, and if we put this inside the expansion of $dy_1dy_2\cdots dy_n$, we get

We answered a calculus question in an algebraic way (and more than that if you review more related concepts in calculus).
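The two ingredients used above, the permutation-sum definition of the determinant and the fact that $\operatorname{sgn}\pi$ equals the determinant of the matrix representing $\pi$, can be checked by brute force for small $n$. A minimal sketch follows; the function names are illustrative, permutations are of $0,1,\dots,n-1$ rather than $1,\dots,n$, and the sign is computed by counting inversions.

```python
from itertools import permutations


def sign(perm):
    """Sign of a permutation of 0..n-1, via the parity of its inversion count."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def det(M):
    """Determinant as a sum over permutations of sign times a product of entries."""
    n = len(M)
    total = 0
    for s in permutations(range(n)):
        term = sign(s)
        for i in range(n):
            term *= M[i][s[i]]
        total += term
    return total


def perm_matrix(perm):
    """0-1 matrix with a single 1 in row i, placed in column perm[i]."""
    n = len(perm)
    return [[1 if perm[i] == j else 0 for j in range(n)] for i in range(n)]


print(det([[1, 2], [3, 4]]))   # -2
print(all(det(perm_matrix(p)) == sign(p) for p in permutations(range(4))))   # True
```

The sum has $n!$ terms, so this is only meant as a check of the definition, not as a practical way to compute determinants.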

There are several ways to define Dedekind domain since there are several equivalent statements of it. We will start from the one based on ring of fractions. As a friendly reminder, $\mb{Z}$ or any principal ideal domain is already a Dedekind domain. In fact Dedekind domain may be viewed as a generalization of principal ideal domain.

Let $\mfk{o}$ be an integral domain (a.k.a. entire ring), and $K$ be its quotient field. A **Dedekind domain** is an integral domain $\mfk{o}$ such that the fractional ideals form a group under multiplication. Let’s have a breakdown. By a **fractional ideal** $\mfk{a}$ we mean a nontrivial additive subgroup of $K$ such that

- $\mfk{o}\mfk{a}=\mfk{a}$,
- there exists some nonzero element $c \in \mfk{o}$ such that $c\mfk{a} \subset \mfk{o}$.

What does the group look like? As you may guess, the unit element is $\mfk{o}$. For a fractional ideal $\mfk{a}$, we have the inverse to be another fractional ideal $\mfk{b}$ such that $\mfk{ab}=\mfk{ba}=\mfk{o}$. Note we regard $\mfk{o}$ as a subring of $K$. For $a \in \mfk{o}$, we treat it as $a/1 \in K$. This makes sense because the map $i:a \mapsto a/1$ is injective. For the existence of $c$, you may consider it as a restriction that the ‘denominator’ is *bounded*. Alternatively, we say that fractional ideal of $K$ is a finitely generated $\mfk{o}$-submodule of $K$. But in this post it is not assumed that you have learned module theory.

Let’s take $\mb{Z}$ as an example. The quotient field of $\mb{Z}$ is $\mb{Q}$. We have a fractional ideal $P$ where all elements are of the type $\frac{np}{2}$ with $p$ prime and $n \in \mb{Z}$. Then indeed we have $\mb{Z}P=P$. On the other hand, take $2 \in \mb{Z}$, we have $2P \subset \mb{Z}$. For its inverse we can take a fractional ideal $Q$ where all elements are of the type $\frac{2n}{p}$. As proved in algebraic number theory, the ring of algebraic integers in a number field is a Dedekind domain.
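Since every fractional ideal of $\mathbb{Z}$ has the form $q\mathbb{Z}$ for a positive rational $q$, the group structure can be played with directly. The class below is a toy model under that assumption: the name `FracIdeal` and its methods are made up for illustration, and the example uses $p=3$. The product of $q\mathbb{Z}$ and $r\mathbb{Z}$ is $qr\mathbb{Z}$, and $q\mathbb{Z} \subset r\mathbb{Z}$ exactly when $q/r$ is an integer.

```python
from fractions import Fraction


class FracIdeal:
    """The fractional ideal qZ of Z, stored by its positive generator q != 0."""

    def __init__(self, q):
        self.q = abs(Fraction(q))

    def __mul__(self, other):
        # (qZ)(rZ) = (qr)Z
        return FracIdeal(self.q * other.q)

    def inverse(self):
        return FracIdeal(1 / self.q)

    def __le__(self, other):
        # qZ is contained in rZ iff q is an integer multiple of r.
        return (self.q / other.q).denominator == 1

    def __eq__(self, other):
        return self.q == other.q


Z = FracIdeal(1)                 # the unit element
P = FracIdeal(Fraction(3, 2))    # elements 3n/2
Q = FracIdeal(Fraction(2, 3))    # elements 2n/3

print(P * Q == Z)                # True: Q is the inverse of P
print(FracIdeal(2) * P <= Z)     # True: 2P lies inside Z
print(P <= Z)                    # False: P itself is not an ideal of Z
```

In this model, containment of ordinary ideals is divisibility of generators: $12\mathbb{Z} \subset 4\mathbb{Z}$ and $12\mathbb{Z}=(4\mathbb{Z})(3\mathbb{Z})$.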

Before we go on we need to clarify the definition of ideal multiplication. Let $\mfk{a}$ and $\mfk{b}$ be two ideals, we define $\mfk{ab}$ to be the set of all sums

where $x_i \in \mfk{a}$ and $y_i \in \mfk{b}$. Here the number $n$ is finite but not fixed. Alternatively we can say $\mfk{ab}$ contains all finite sums of products of elements of $\mfk{a}$ and $\mfk{b}$.

(Proposition 1) A Dedekind domain $\mfk{o}$ is Noetherian.

By Noetherian ring we mean that every ideal in a ring is finitely generated. Precisely, we will prove that for every ideal $\mfk{a} \subset \mfk{o}$ there are $a_1,a_2,\cdots,a_n \in \mfk{a}$ such that, for every $r \in \mfk{a}$, we have an expression

Also note that any ideal $\mfk{a} \subset \mfk{o}$ can be viewed as a fractional ideal.

**Proof.** Let $\mfk{a}$ be an ideal of $\mfk{o}$ and let $K$ be the quotient field of $\mfk{o}$. Since $\mfk{oa}=\mfk{a}$, we may also view $\mfk{a}$ as a fractional ideal. Since $\mfk{o}$ is a Dedekind domain, the fractional ideals of $\mfk{o}$ form a group, so there is a fractional ideal $\mfk{b}$ such that $\mfk{ab}=\mfk{ba}=\mfk{o}$. Since $1 \in \mfk{o}$, there exist some $a_1,a_2,\cdots, a_n \in \mfk{a}$ and $b_1,b_2,\cdots,b_n \in \mfk{b}$ such that $\sum_{i = 1 }^{n}a_ib_i=1$. For any $r \in \mfk{a}$, we have an expression

On the other hand, any element of the form $c_1a_1+c_2a_2+\cdots+c_na_n$, by definition, is an element of $\mfk{a}$. $\blacksquare$

From now on, the inverse of a fractional ideal $\mfk{a}$ will be written as $\mfk{a}^{-1}$.

(Proposition 2) For ideals $\mfk{a},\mfk{b} \subset \mfk{o}$, $\mfk{b}\subset\mfk{a}$ if and only if there exists some $\mfk{c}$ such that $\mfk{ac}=\mfk{b}$ (or we simply say $\mfk{a}|\mfk{b}$).

**Proof.** If $\mfk{b}=\mfk{ac}$, simply note that $\mfk{ac} \subset \mfk{a} \cap \mfk{c} \subset \mfk{a}$. For the converse, suppose that $\mfk{a} \supset \mfk{b}$, then $\mfk{c}=\mfk{a}^{-1}\mfk{b}$ is an ideal of $\mfk{o}$ since $\mfk{c}=\mfk{a}^{-1}\mfk{b} \subset \mfk{a}^{-1}\mfk{a}=\mfk{o}$, hence we may write $\mfk{b}=\mfk{a}\mfk{c}$. $\blacksquare$

(Proposition 3) If $\mfk{a}$ is an ideal of $\mfk{o}$, then there are prime ideals $\mfk{p}_1,\mfk{p}_2,\cdots,\mfk{p}_n$ such that

**Proof.** For this problem we use a classical technique: contradiction on maximality. Suppose this is not true, and let $\mfk{U}$ be the set of ideals of $\mfk{o}$ that cannot be written as a product of prime ideals. By assumption $\mfk{U}$ is nonempty. Since, as we have proved, $\mfk{o}$ is Noetherian, we can pick a maximal element $\mfk{a}$ of $\mfk{U}$ with respect to inclusion. If $\mfk{a}$ is a maximal ideal of $\mfk{o}$, then since all maximal ideals are prime, $\mfk{a}$ itself is prime, which is absurd. Otherwise $\mfk{a}$ is properly contained in a maximal ideal $\mfk{m}$, and we write $\mfk{a}=\mfk{m}\mfk{m}^{-1}\mfk{a}$. We have $\mfk{m}^{-1}\mfk{a} \supsetneq \mfk{a}$ since if not, we have $\mfk{a}=\mfk{ma}$, which implies $\mfk{m}=\mfk{o}$. But by maximality, $\mfk{m}^{-1}\mfk{a}\not\in\mfk{U}$, hence it can be written as a product of prime ideals. But $\mfk{m}$ is prime as well, so we have a prime factorization for $\mfk{a}$, contradicting the definition of $\mfk{U}$.

Next we show uniqueness up to permutation. If

since $\mfk{p}_1\mfk{p}_2\cdots\mfk{p}_k\subset\mfk{p}_1$ and $\mfk{p}_1$ is prime, we may assume that $\mfk{q}_1 \subset \mfk{p}_1$. By the property of fractional ideal we have $\mfk{q}_1=\mfk{p}_1\mfk{r}_1$ for some fractional ideal $\mfk{r}_1$. However we also have $\mfk{q}_1 \subset \mfk{r}_1$. Since $\mfk{q}_1$ is prime, we either have $\mfk{q}_1 \supset \mfk{p}_1$ or $\mfk{q}_1 \supset \mfk{r}_1$. In the former case we get $\mfk{p}_1=\mfk{q}_1$, and we finish the proof by continuing inductively. In the latter case we have $\mfk{r}_1=\mfk{q}_1=\mfk{p}_1\mfk{q}_1$, which shows that $\mfk{p}_1=\mfk{o}$, which is impossible. $\blacksquare$

(Proposition 4) Every nontrivial prime ideal $\mfk{p}$ is maximal.

**Proof.** Let $\mfk{m}$ be a maximal ideal containing $\mfk{p}$. By proposition 2 we have some $\mfk{c}$ such that $\mfk{p}=\mfk{mc}$. If $\mfk{m} \neq \mfk{p}$, then $\mfk{c} \neq \mfk{o}$, and we may write $\mfk{c}=\mfk{p}_1\cdots\mfk{p}_n$, hence $\mfk{p}=\mfk{m}\mfk{p}_1\cdots\mfk{p}_n$, which is a prime factorisation, contradicting the fact that $\mfk{p}$ has a unique prime factorisation, which is $\mfk{p}$ itself. Hence any maximal ideal containing $\mfk{p}$ is $\mfk{p}$ itself. $\blacksquare$

(Proposition 5) Suppose the Dedekind domain $\mfk{o}$ only contains one prime (and maximal) ideal $\mfk{p}$, let $t \in \mfk{p}$ and $t \not\in \mfk{p}^2$, then $\mfk{p}$ is generated by $t$.

**Proof.** Let $\mfk{t}$ be the ideal generated by $t$. By proposition 3 we have a factorisation

for some $n$ since $\mfk{o}$ contains only one prime ideal. According to proposition 2, if $n \geq 3$, we write $\mfk{p}^n=\mfk{p}^2\mfk{p}^{n-2}$, and we see $\mfk{p}^2 \supset \mfk{p}^n$. But this is impossible since if so we have $t \in \mfk{p}^n \subset \mfk{p}^2$, contradicting our assumption. Hence $0<n<3$. But if $n=2$ we have $t \in \mfk{p}^2$, which is also not possible. So $\mfk{t}=\mfk{p}$ provided that such $t$ exists.

For the existence of $t$, note if not, then for all $t \in \mfk{p}$ we have $t \in \mfk{p}^2$, hence $\mfk{p} \subset \mfk{p}^2$. On the other hand we already have $\mfk{p}^2 = \mfk{p}\mfk{p}$, which implies that $\mfk{p}^2 \subset \mfk{p}$ (proposition 2), hence $\mfk{p}^2=\mfk{p}$, contradicting proposition 3. Hence such $t$ exists and our proof is finished. $\blacksquare$

In fact there is another equivalent definition of Dedekind domain:

A domain $\mfk{o}$ is Dedekind if and only if

- $\mfk{o}$ is Noetherian.
- $\mfk{o}$ is integrally closed.
- $\mfk{o}$ has Krull dimension $1$ (i.e. every non-zero prime ideal is maximal).

This is equivalent to saying that fractional ideals form a group and is frequently used by mathematicians as well. But we need some more advanced techniques to establish the equivalence. Presumably there will be a post about this in the future.

we have Hardy’s inequality $\def\lrVert[#1]{\lVert #1 \rVert}$

where $\frac{1}{p}+\frac{1}{q}=1$ of course.

There are several ways to prove it. I think there are several good reasons to write them down thoroughly since that may be why you find this page. Maybe you are burnt out since it’s *left as exercise*. You are assumed to have enough knowledge of Lebesgue measure and integration.
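Before the proofs, a quick sanity check on a concrete function may help. This is a worked example and not part of any of the proofs; it assumes $1<p<\infty$, writes $F(x)=\frac{1}{x}\int_0^x f(t)dt$, and takes the inequality in the form $\lVert F \rVert_p \leq q\lVert f \rVert_p$. Let $f$ be the indicator function of $(0,1]$, so that $\lVert f \rVert_p=1$. Then

$$
F(x)=\begin{cases}1, & 0<x\leq 1,\\[2pt] \dfrac{1}{x}, & x>1,\end{cases}
\qquad
\lVert F \rVert_p^p=\int_0^1 dx+\int_1^\infty x^{-p}dx=1+\frac{1}{p-1}=\frac{p}{p-1}=q.
$$

Hence $\lVert F \rVert_p=q^{1/p}<q=q\lVert f \rVert_p$, because $q>1$ and $\frac{1}{p}<1$.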

Let $S_1,S_2 \subset \mathbb{R}$ be two measurable sets, suppose $F:S_1 \times S_2 \to \mathbb{R}$ is measurable, then

A proof can be found here by turning to Example A9. You may need to replace all measures with Lebesgue measure $m$.

Now let’s get into it. For a measurable function in this place we should have $G(x,t)=\frac{f(t)}{x}$. If we put this function inside this inequality, we see

Note we have used change-of-variable twice and the inequality once.

I have no idea how people came up with this solution. Take $xF(x)=\int_0^x f(t)t^{u}t^{-u}dt$ where $0<u<1-\frac{1}{p}$. Hölder’s inequality gives us

Hence

Note we have used the fact that $\frac{1}{p}+\frac{1}{q}=1 \implies p+q=pq$ and $\frac{p}{q}=p-1$. Fubini’s theorem gives us the final answer:

It remains to find the minimum of $\varphi(u) = \left(\frac{1}{1-uq}\right)^{p-1}\frac{1}{up}$. This is an elementary calculus problem. By taking its derivative, we see when $u=\frac{1}{pq}<1-\frac{1}{p}$ it attains its minimum $\left(\frac{p}{p-1}\right)^p=q^p$. Hence we get

which is exactly what we want. Note the constant $q$ cannot be replaced with a smaller one. We simply proved the case when $f \geq 0$. For the general case, one simply needs to take absolute value.
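The elementary calculus step, minimising $\varphi(u)$ on $0<u<1-\frac{1}{p}$, is easy to double-check numerically. The sketch below evaluates $\varphi$ at $u=\frac{1}{pq}$ and on a grid for one sample exponent. The value $p=3$ and the grid size are arbitrary choices for illustration.

```python
def phi(u, p):
    """phi(u) = (1 / (1 - u q))^(p - 1) * 1 / (u p), where 1/p + 1/q = 1."""
    q = p / (p - 1)
    return (1.0 / (1.0 - u * q)) ** (p - 1) / (u * p)


p = 3.0
q = p / (p - 1)
u_star = 1.0 / (p * q)

# Grid inside the admissible interval 0 < u < 1 - 1/p = 1/q.
grid = [k / 1000.0 / q for k in range(1, 1000)]
grid_min = min(phi(u, p) for u in grid)

print(phi(u_star, p), q ** p)   # the two values agree up to rounding
print(grid_min >= q ** p - 1e-9)
```

Note that $1-\frac{1}{p}=\frac{1}{q}$, so the grid covers the whole admissible interval and no grid value falls below $q^p$.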

This approach makes use of properties of $L^p$ space. Still we assume that $f \geq 0$ but we also assume $f \in C_c((0,\infty))$, that is, $f$ is continuous and has compact support. Hence $F$ is differentiable in this situation. Integration by parts gives

Note since $f$ has compact support, there is some $[a,b]$ such that $f >0$ only if $0 < a \leq x \leq b < \infty$ and hence $xF(x)^p\vert_0^\infty=0$. Next it is natural to take a look at $F'(x)$. Note we have

hence $xF’(x)=f(x)-F(x)$. A substitution gives us

which is equivalent to saying

Hölder’s inequality gives us

Together with the identity above we get

which is exactly what we want since $1-\frac{1}{q}=\frac{1}{p}$ and all we need to do is divide both sides by $\left[\int_0^\infty F^pdx\right]^{1/q}$. So what’s next? Note $C_c((0,\infty))$ is dense in $L^p((0,\infty))$. For any $f \in L^p((0,\infty))$, we can take a sequence of functions $f_n \in C_c((0,\infty))$ such that $f_n \to f$ with respect to the $L^p$-norm. Taking $F=\frac{1}{x}\int_0^x f(t)dt$ and $F_n = \frac{1}{x}\int_0^x f_n(t)dt$, we need to show that $F_n \to F$ pointwise, so that we can use Fatou’s lemma. For $\varepsilon>0$, there exists some $m$ such that $\lrVert[f_n-f]_p < \varepsilon$ whenever $n>m$. Thus

Hence $F_n \to F$ pointwise, which also implies that $|F_n|^p \to |F|^p$ pointwise. For $|F_n|$ we have

Note the third inequality follows since we have already proved it for $f \geq 0$. By Fatou’s lemma, we have

It is quite common to see direct sums or direct products of groups, modules and vector spaces. Indeed, for modules over a ring $R$, direct products are **products** in the category of $R$-modules. On the other hand, the direct sum is a **coproduct** in the category of $R$-modules.

But what about tensor products? It is a different kind of *product*, but how? Is it related to the direct product? How do we write a tensor product down? We need to answer these questions, but it is not a good idea to dig into numerical work.

From now on, let $R$ be a commutative ring, and let $M_1,\cdots,M_n$ be $R$-modules. Mainly we work on $M_1$ and $M_2$, i.e. $M_1 \times M_2$ and $M_1 \otimes M_2$. For the $n$-multilinear case, simply replace $M_1\times M_2$ with $M_1 \times M_2 \times \cdots \times M_n$ and $M_1 \otimes M_2$ with $M_1 \otimes \cdots \otimes M_n$. The only difference is the change of symbols.

The bilinear maps on $M_1 \times M_2$ determine a category, say $BL(M_1 \times M_2)$, or simply $BL$. For an object $(f,E)$ in this category we have a bilinear map $f: M_1 \times M_2 \to E$, where $E$ is an $R$-module of course. For two objects $(f,E)$ and $(g,F)$, we define a morphism between them to be a linear map $h:E \to F$ with $g = h \circ f$, that is, a linear map making the following diagram commutative: $\def\mor{\operatorname{Mor}}$

This indeed makes $BL$ a category. If we define the morphisms from $(f,E)$ to $(g,F)$ by $\mor(f,g)$ (for simplicity we omit $E$ and $F$ since they are already determined by $f$ and $g$) we see the composition

satisfies all axioms for a category:

**CAT 1** Two sets $\mor(f,g)$ and $\mor(f',g')$ are disjoint unless $f=f'$ and $g=g'$, in which case they are equal. If $g \neq g'$ but $f = f'$ for example, for any $h \in \mor(f,g)$, we have $g = h \circ f = h \circ f' \neq g'$, hence $h \notin \mor(f',g')$. Other cases can be verified in the same fashion.

**CAT 2** The existence of identity morphism. For any $(f,E) \in BL$, we simply take the identity map $i:E \to E$. For $h \in \mor(f,g)$, we see $g = h \circ f = h \circ i \circ f$. For $h’ \in \mor(g,f)$, we see $f = h’ \circ g = i \circ h’ \circ g$.

**CAT 3** The law of composition is associative when defined.

There we have a category. But what about the tensor product? It is defined to be an *initial* (or *universally repelling*) object in this category. Let’s denote this object by $(\varphi,M_1 \otimes M_2)$.

For any $(f,E) \in BL$, we have a unique morphism (which is a module homomorphism as well) $h:(\varphi,M_1 \otimes M_2) \to (f,E)$. For $x \in M_1$ and $y \in M_2$, we write $\varphi(x,y)=x \otimes y$. We call the existence of $h$ the **universal property** of $(\varphi,M_1 \otimes M_2)$.

The tensor product is unique up to isomorphism. That is, if both $(f,E)$ and $(g,F)$ are tensor products, then $E \simeq F$ in the sense of module isomorphism. Indeed, let $h \in \mor(f,g)$ and $h' \in \mor(g,f)$ be the unique morphisms respectively, we see $g = h \circ f$, $f = h' \circ g$, and therefore

$$g = h \circ h' \circ g, \qquad f = h' \circ h \circ f.$$

Hence $h \circ h’$ is the identity of $(g,F)$ and $h’ \circ h$ is the identity of $(f,E)$. This gives $E \simeq F$.

What do we get so far? For any module that is connected to $M_1 \times M_2$ by a bilinear map, the tensor product $M_1 \otimes M_2$ of $M_1$ and $M_2$ can always be connected to that module by a unique module homomorphism. What if there is more than one tensor product? Never mind. All tensor products are isomorphic.

But wait, does this definition make sense? Does this product even exist? How can we study the tensor product of two modules if we cannot even write it down? So far we are only working on arrows, and we don’t know what is happening inside a module. It is not a good idea to waste our time on ‘nonsense’. We can look into it in a natural way. Indeed, if we can find a module satisfying the property we want, then we are done, since this can represent the tensor product under any circumstances. Again, all tensor products of $M_1$ and $M_2$ are isomorphic.

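Let $M$ be the free module generated by the set of all tuples $(x_1,x_2)$ where $x_1 \in M_1$ and $x_2 \in M_2$, and $N$ be the submodule generated by tuples of the following types:

$$\begin{aligned} &(x_1+x_1',x_2)-(x_1,x_2)-(x_1',x_2), \\ &(x_1,x_2+x_2')-(x_1,x_2)-(x_1,x_2'), \\ &(ax_1,x_2)-a(x_1,x_2), \\ &(x_1,ax_2)-a(x_1,x_2), \end{aligned}$$

where $x_1,x_1' \in M_1$, $x_2,x_2' \in M_2$ and $a \in R$.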

First we have an inclusion map $\alpha:M_1 \times M_2 \to M$ and the canonical map $\pi:M \to M/N$. We claim that $(\pi \circ \alpha, M/N)$ is exactly what we want. But before that, we need to explain why we define such an $N$.

The reason is quite simple: We want to make sure that $\varphi=\pi \circ \alpha$ is bilinear. For example, we have $\varphi(x_1+x_1’,x_2)=\varphi(x_1,x_2)+\varphi(x_1’,x_2)$ due to our construction of $N$ (other relations follow in the same manner). This can be verified group-theoretically. Note

but

Hence we get the identity we want. For this reason we can write

Sometimes to avoid confusion people may also write $x_1 \otimes_R x_2$ if both $M_1$ and $M_2$ are $R$-modules. But before that we have to verify that this is indeed the tensor product. To verify this, all we need is the universal property of free modules.

By the universal property of $M$, for any $(f,E) \in BL$, we have an induced linear map $f_\ast:M \to E$ making the diagram inside commutative. Moreover, $f_\ast$ takes the value $0$ on every element of $N$, since $f$ is bilinear and hence $f_\ast$ vanishes on each generator of $N$. We finish our work by taking $h[(x,y)+N] = f_\ast(x,y)$. This is the map induced by $f_\ast$, following the property of factor modules.

For coprime integers $m,n>1$, we have $\def\mb{\mathbb}$

$$\mb{Z}/m\mb{Z} \otimes \mb{Z}/n\mb{Z} = O$$

where $O$ means that the module only contains $0$ and $\mb{Z}/m\mb{Z}$ is considered as a module over $\mb{Z}$ for $m>1$. This suggests that, the tensor product of two modules is not necessarily ‘bigger’ than its components. Let’s see why this is trivial.

Note that for $x \in \mb{Z}/m\mb{Z}$ and $y \in \mb{Z}/n\mb{Z}$, we have

since, for example, $mx = 0$ for $x \in \mb{Z}/m\mb{Z}$ and $\varphi(0,y)=0$. If you have trouble understanding why $\varphi(0,y)=0$, just note that the submodule $N$ in our construction contains elements generated by $(0x,y)-0(x,y)$ already.

By Bézout’s identity, for any $x \otimes y$, we see there are $a$ and $b$ such that $am+bn=1$, and therefore

Hence the tensor product is trivial. This example gives us a lot of inspiration. For example, what if $m$ and $n$ are not necessarily coprime, say $\gcd(m,n)=d$? By Bézout’s identity still we have

This inspires us to study the connection between $\mb{Z}/m\mb{Z} \otimes \mb{Z}/n\mb{Z}$ and $\mb{Z}/d\mb{Z}$. By the **universal property**, for the bilinear map $f:\mb{Z}/m\mb{Z} \times \mb{Z}/n\mb{Z} \to \mb{Z}/d\mb{Z}$ defined by

(there should be no difficulty in verifying that $f$ is well-defined), there exists a unique morphism $h:\mb{Z}/m\mb{Z} \otimes \mb{Z}/n\mb{Z} \to \mb{Z}/d\mb{Z}$ such that

Next we show that it has a natural inverse defined by

Taking $a' = a+kd$, we show that $g(a+d\mb{Z})=g(a'+d\mb{Z})$, that is, we need to show that

By Bézout’s identity, there exist some $r,s$ such that $rm+sn=d$. Hence $a' = a + ksn+krm$, which gives

since

So $g$ is well-defined. Next we show that this is the inverse. Firstly

Secondly,

Hence $g = h^{-1}$ and we can say

$$\mb{Z}/m\mb{Z} \otimes \mb{Z}/n\mb{Z} \simeq \mb{Z}/\gcd(m,n)\mb{Z}.$$

If $m,n$ are coprime, then $\gcd(m,n)=1$, hence $\mb{Z}/m\mb{Z} \otimes \mb{Z}/n\mb{Z} \simeq \mb{Z}/\mb{Z}$ is trivial. More interestingly, $\mb{Z}/m\mb{Z}\otimes \mb{Z}/m\mb{Z} \simeq \mb{Z}/m\mb{Z}$. But this elegant identity raises other questions. First of all, $\gcd(m,n)=\gcd(n,m)$, which implies

Further, for $m,n,r >1$, we have $\gcd(\gcd(m,n),r)=\gcd(m,\gcd(n,r))=\gcd(m,n,r)$, which gives

hence

Hence for modules of the form $\mb{Z}/m\mb{Z}$, we see the tensor product operation is associative and commutative up to isomorphism. Does this hold for all modules? The universal property answers this question affirmatively. From now on we will keep using the universal property. Make sure that you have got the point already.
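To make the role of Bézout’s identity in the $\mb{Z}/m\mb{Z}$ example a little more tangible, here is a small Python sketch. It is only an illustration under assumptions of ours: we take the bilinear map to be multiplication, $f(x+m\mb{Z},y+n\mb{Z})=xy+d\mb{Z}$ with $d=\gcd(m,n)$, and the helper names `bezout` and `multiplication_map_is_well_defined` are hypothetical. The first function produces $r,s$ with $rm+sn=d$, which is what forces $d(x \otimes y)=0$; the second brute-force checks that the multiplication map does not depend on the chosen representatives.

```python
from math import gcd


def bezout(m, n):
    """Extended Euclid: return (d, r, s) with r*m + s*n == d == gcd(m, n)."""
    old_r, r = m, n
    old_a, a = 1, 0
    old_b, b = 0, 1
    while r != 0:
        k = old_r // r
        old_r, r = r, old_r - k * r
        old_a, a = a, old_a - k * a
        old_b, b = b, old_b - k * b
    return old_r, old_a, old_b


def multiplication_map_is_well_defined(m, n):
    """Check that (x, y) -> x*y mod gcd(m, n) ignores the choice of representatives.

    Assumption: this multiplication map is the bilinear map f of the text.
    Only a few shifts of each representative are tried, so this is a sanity check.
    """
    d = gcd(m, n)
    for x in range(m):
        for y in range(n):
            base = (x * y) % d
            for i in range(-2, 3):
                for j in range(-2, 3):
                    if ((x + i * m) * (y + j * n)) % d != base:
                        return False
    return True


d, r, s = bezout(12, 18)
print(d, r * 12 + s * 18)                          # both numbers equal gcd(12, 18)
print(multiplication_map_is_well_defined(12, 18))
```

Of course a finite check proves nothing in general, but it shows where $d \mid m$ and $d \mid n$ enter the picture.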

Let $M_1,M_2,M_3$ be $R$-modules, then there exists a unique isomorphism

for $x \in M_1$, $y \in M_2$, $z \in M_3$.

*Proof.* Consider the map

where $x \in M_1$. Since $(\cdot\otimes\cdot)$ is bilinear, we see $\lambda_x$ is bilinear for all $x \in M_1$. Hence by the universal property there exists a unique map of the tensor product:

Next we have the map

which is bilinear as well. Again by the universal property we have a unique map

This is indeed the isomorphism we want. The inverse is obtained by reversing the process. For the bilinear map

we get a unique map

Then from the bilinear map

we get the unique map, which is actually the reverse of $\overline{\mu}_x$:

Hence the two tensor products are isomorphic. $\square$

Let $M_1$ and $M_2$ be $R$-modules, then there exists a unique isomorphism

where $x_1 \in M_1$ and $x_2 \in M_2$.

*Proof.* The map

is bilinear and gives us a unique map

given by $x \otimes y \mapsto y \otimes x$. Symmetrically, the map $\lambda’:M_2 \times M_1 \to M_1 \otimes M_2$ gives us a unique map

which is the inverse of $\overline{\lambda}$. $\square$

Therefore, we may view the collection of all $R$-modules, considered up to isomorphism, as a commutative semigroup with the binary operation $\otimes$.

Consider the commutative diagram:

where the $f_i:M_i \to M_i'$ are module homomorphisms. What do we want here? On the left hand side, we see $f_1 \times f_2$ sends $(x_1,x_2)$ to $(f_1(x_1),f_2(x_2))$, which is quite natural. The question is, is there a natural map sending $x_1 \otimes x_2$ to $f_1(x_1) \otimes f_2(x_2)$? This is what we want from the right hand side. We know $T(f_1 \times f_2)$ exists, since $\mu = \varphi' \circ (f_1\times f_2)$ is a bilinear map. So for $(x_1,x_2) \in M_1 \times M_2$, we have $T(f_1 \times f_2)(x_1 \otimes x_2) = \varphi' \circ (f_1 \times f_2)(x_1,x_2) = f_1(x_1) \otimes f_2(x_2)$, as desired.

But $T$ in this diagram has more interesting properties. First of all, if $M_1 = M_1'$ and $M_2 = M_2'$, and both $f_1$ and $f_2$ are identity maps, then we see $T(f_1 \times f_2)$ is the identity as well. Next, consider the following chain

We can make it a double chain:

It is obvious that $(g_1 \circ f_1 \times g_2 \circ f_2)=(g_1 \times g_2) \circ (f_1 \times f_2)$, which also gives

Hence we can say $T$ is functorial. Sometimes for simplicity we also write $T(f_1,f_2)$ or simply $f_1 \otimes f_2$, as it sends $x_1 \otimes x_2$ to $f_1(x_1) \otimes f_2(x_2)$. Indeed it can be viewed as a map

First we recall some background. Suppose $A$ is a ring with multiplicative identity $1_A$. A **left module** of $A$ is an additive abelian group $(M,+)$, together with an operation $A \times M \to M$ such that

$$(a+b)x=ax+bx, \quad a(x+y)=ax+ay, \quad (ab)x=a(bx), \quad 1_Ax=x$$

for $x,y \in M$ and $a,b \in A$. As a corollary, we see $(0_A+0_A)x=0_Ax=0_Ax+0_Ax$, which shows $0_Ax=0_M$ for all $x \in M$. On the other hand, $a(x-x)=0_M$ which implies $a(-x)=-(ax)$. We can also define right $A$-modules but we are not discussing them here.

Let $S$ be a subset of $M$. We say $S$ is a **basis** of $M$ if $S$ generates $M$ and $S$ is linearly independent. That is, for all $m \in M$, we can pick $s_1,\cdots,s_n \in S$ and $a_1,\cdots,a_n \in A$ such that

and, for any $s_1,\cdots,s_n \in S$, we have

Note this also shows that $0_M\notin S$ (what happens if $0_M \in S$?). We say $M$ is **free** if it has a basis. The case when $M$ or $A$ is trivial is excluded.

If $A$ is a field, then $M$ is called a **vector space**, which is no different from the one we learn in linear algebra and functional analysis. Mathematicians in functional analysis may be interested in the cardinality of a basis of a vector space, for example, when a vector space is of finite dimension, or when the basis is countable. But the basis does not come from nowhere. In fact we can prove that vector spaces have bases, but modules are not so lucky. $\def\mb{\mathbb}$

First of all let’s consider the cyclic group $\mb{Z}/n\mb{Z}$ for $n \geq 2$. If we define

which is actually $m$ copies of an element, then we get a module, which will be denoted by $M$. For any $x=k+n\mb{Z} \in M$, we see $nx=nk+n\mb{Z}=0_M$. Therefore for **any** subset $S \subset M$, if $x_1,\cdots,x_k \in S$, we have

which gives the fact that $M$ has no basis. In fact this can be generalized further. If $A$ is a ring but not a field, and $I$ is a nontrivial proper ideal, then $A/I$ is an $A$-module that has no basis.

Following $\mb{Z}/n\mb{Z}$ we also have another example on finite order. Indeed, **any finite abelian group is not free as a module over $\mb{Z}$.** More generally,

Let $G$ be an abelian group, and $G_{tor}$ be its torsion subgroup. If $G_{tor}$ is non-trivial, then $G$ cannot be a free module over $\mb{Z}$.

Next we shall take a look at infinite rings. Let $F[X]$ be the polynomial ring over a field $F$ and $F'[X]$ be the subring of polynomials whose coefficient of $X$ is $0$. Then $F[X]$ is an $F'[X]$-module. However it is not free.

Suppose we have a basis $S$ of $F[X]$, then we claim that $|S|>1$. If $|S|=1$, say $S=\{P\}$, then $P$ cannot generate $F[X]$: if $P$ is constant then we cannot generate a polynomial containing $X$ with power $1$; if $P$ is not constant, then the constant polynomials cannot be generated. Hence $S$ contains at least two polynomials, say $P_1 \neq 0$ and $P_2 \neq 0$. However, note $-X^2P_1 \in F'[X]$ and $X^2P_2 \in F'[X]$, which gives

$$(X^2P_2)P_1+(-X^2P_1)P_2=0.$$

Hence $S$ cannot be a basis.
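The dependence relation between $P_1$ and $P_2$ is easy to test on a concrete pair. The sketch below is our own illustration, not part of the argument: polynomials are plain coefficient lists (constant term first), the helper names are hypothetical, and the two sample polynomials are arbitrary. It checks that both coefficients $X^2P_2$ and $-X^2P_1$ have vanishing $X$-coefficient, so they lie in $F'[X]$, and that the combination $(X^2P_2)P_1+(-X^2P_1)P_2$ is zero.

```python
def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (constant term first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def poly_add(a, b):
    """Add two polynomials, padding the shorter list with zeros."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return [x + y for x, y in zip(a, b)]


def in_subring(c):
    """True if the coefficient of X is zero, i.e. c lies in F'[X]."""
    return len(c) < 2 or c[1] == 0


X2 = [0, 0, 1]          # the polynomial X^2
P1 = [1, 1]             # sample element 1 + X (arbitrary choice)
P2 = [0, 3, 0, 2]       # sample element 3X + 2X^3 (arbitrary choice)

c1 = poly_mul(X2, P2)                  # coefficient of P1:  X^2 * P2
c2 = [-x for x in poly_mul(X2, P1)]    # coefficient of P2: -X^2 * P1
relation = poly_add(poly_mul(c1, P1), poly_mul(c2, P2))

print(in_subring(c1), in_subring(c2))  # both coefficients lie in F'[X]
print(relation)                        # a list of zeros
```

Multiplying by $X^2$ is what pushes the coefficients into the subring: it removes both the constant term and the $X$ term.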

I hope those examples have convinced you that basis is not a universal thing. We are going to prove that every vector space has a basis. More precisely,

Let $V$ be a nontrivial vector space over a field $K$. Let $\Gamma$ be a set of generators of $V$ over $K$ and let $S \subset \Gamma$ be a subset which is linearly independent, then there exists a basis $B$ of $V$ such that $S \subset B \subset \Gamma$.

Note we can always find such $\Gamma$ and $S$. In the extreme case, we can pick $\Gamma=V$ and let $S$ be a set containing any single non-zero element of $V$. Note this also shows that we can obtain a basis by expanding any linearly independent set. The proof relies on the fact that every non-zero element in a field is invertible, and also on Zorn’s lemma. In fact, the axiom of choice is equivalent to the statement that every vector space has a basis.$\def\mfk{\mathfrak}$

*Proof.* Define

Then $\mfk{T}$ is not empty since it contains $S$. If $T_1 \subset T_2 \subset \cdots$ is a totally ordered chain in $\mfk{T}$, then $T=\bigcup_{i=1}^{\infty}T_i$ is again linearly independent and contains $S$. To show that $T$ is linearly independent, note that if $x_1,x_2,\cdots,x_n \in T$, we can find some $k_1,\cdots,k_n$ such that $x_i \in T_{k_i}$ for $i=1,2,\cdots,n$. If we pick $k = \max(k_1,\cdots,k_n)$, then

But we already know that $T_k$ is linearly independent, so $a_1x_1+\cdots+a_nx_n=0_V$ implies $a_1=\cdots=a_n=0_K$.

By Zorn’s lemma, let $B$ be a maximal element of $\mfk{T}$, then $B$ is also linearly independent since it is an element of $\mfk{T}$. Next we show that $B$ generates $V$. Suppose not, then we can pick some $x \in \Gamma$ that is not generated by $B$. Define $B'=B \cup \{x\}$, we see $B'$ is linearly independent as well, because if we pick $y_1,y_2,\cdots,y_n \in B$, and if

then if $b \neq 0$ we have

contradicting the assumption that $x$ is not generated by $B$. Hence $b=0_K$, and the linear independence of $B$ forces the remaining coefficients to vanish as well. Thus $B'$ is a linearly independent set strictly containing $B$ and contained in $\Gamma$, contradicting the maximality of $B$ in $\mfk{T}$. Hence $B$ generates $V$. $\square$
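When $\Gamma$ is finite, Zorn’s lemma is not needed: we can simply walk through $\Gamma$ and keep every vector that is not yet generated, which mirrors the step $B'=B \cup \{x\}$ in the proof. The following Python sketch does this over $\mathbb{Q}$ with exact arithmetic. It is a finite-dimensional illustration only, the function names `rank` and `extend_to_basis` are our own, and it assumes the given $S$ is already linearly independent.

```python
from fractions import Fraction


def rank(vectors):
    """Rank of a list of vectors over Q, computed by Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for c in range(ncols):
        # Look for a pivot in column c among the rows not yet used.
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def extend_to_basis(S, Gamma):
    """Greedily extend the independent list S by vectors of Gamma.

    A vector x is added exactly when it is not generated by the current list,
    i.e. when adding it increases the rank.
    """
    B = list(S)
    for x in Gamma:
        if rank(B + [x]) > len(B):
            B.append(x)
    return B


S = [(1, 1, 0)]
Gamma = [(1, 1, 0), (2, 2, 0), (0, 1, 0), (0, 0, 1), (1, 0, 0)]
print(extend_to_basis(S, Gamma))
```

The result contains $S$, is contained in $\Gamma$, and has as many elements as the rank of $\Gamma$, so it is a basis of the space generated by $\Gamma$.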

In fact the construction of $\mathbb{Q}$ from $\mathbb{Z}$ has already been an example. For any $a \in \mathbb{Q}$, we have some $m,n \in \mathbb{Z}$ with $n \neq 0$ such that $a = \frac{m}{n}$. As a matter of notation we may also say an ordered pair $(m,n)$ determines $a$. Two ordered pairs $(m,n)$ and $(m',n')$ are *equivalent* if and only if

$$mn'=m'n.$$

But we are only using the ring structure of $\mathbb{Z}$. So it is natural to think whether it is possible to generalize this process to all rings. But we are also using the fact that $\mathbb{Z}$ is an entire ring (or alternatively integral domain, they mean the same thing). However there is a way to generalize it. $\def\mfk{\mathfrak}$

(Definition 1) A **multiplicatively closed subset** $S \subset A$ is a set such that $1 \in S$ and if $x,y \in S$, then $xy \in S$.

For example, for $\mathbb{Z}$ we have a multiplicatively closed subset

We can also insert $0$ here but it may produce some bad result. If $S$ is also an ideal then we must have $S=A$ so this is not very interesting. However the complement is interesting.

(Proposition 1) Suppose $A$ is a commutative ring such that $1 \neq 0$. Let $S$ be a multiplicatively closed set that does not contain $0$. Let $\mfk{p}$ be a maximal element among the ideals contained in $A \setminus S$, then $\mfk{p}$ is prime.

*Proof.* Recall that $\mfk{p}$ is prime if for any $x,y \in A$ such that $xy \in \mfk{p}$, we have $x \in \mfk{p}$ or $y \in \mfk{p}$. But now we fix $x,y \in \mfk{p}^c$. Note we have a strictly bigger ideal $\mfk{q}_1=\mfk{p}+Ax$. Since $\mfk{p}$ is maximal in the ideals contained in $A \setminus S$, we see

Therefore there exist some $a \in A$ and $p \in \mfk{p}$ such that

Also, since $\mfk{q}_2=\mfk{p}+Ay$ has nontrivial intersection with $S$ (due to the maximality of $\mfk{p}$), there exist some $a' \in A$ and $p' \in \mfk{p}$ such that

Since $S$ is closed under multiplication, we have

But since $\mfk{p}$ is an ideal, we see $pp’+p’ax+pa’y \in \mfk{p}$. Therefore we must have $xy \notin \mfk{p}$ since if not, $(p+ax)(p’+a’y) \in \mfk{p}$, which gives $\mfk{p} \cap S \neq \varnothing$, and this is impossible. $\square$

As a corollary, for an ideal $\mfk{p} \subset A$, if $A \setminus \mfk{p}$ is multiplicatively closed, then $\mfk{p}$ is prime. Conversely, if we are given a prime ideal $\mfk{p}$, then we also get a multiplicatively closed subset.

(Proposition 2) If $\mfk{p}$ is a prime ideal of $A$, then $S = A \setminus \mfk{p}$ is multiplicatively closed.

*Proof.* First $1 \in S$ since $\mfk{p} \neq A$. On the other hand, if $x,y \in S$ we see $xy \in S$ since $\mfk{p}$ is prime. $\square$
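For a finite ring both directions can be checked by brute force. The Python sketch below looks at the ideals $d\mb{Z}/12\mb{Z}$ of $\mb{Z}/12\mb{Z}$ and compares “is prime” with “has multiplicatively closed complement”. The ring $\mb{Z}/12\mb{Z}$ and all function names are our own choices for illustration.

```python
def ideal(n, d):
    """The ideal of Z/nZ generated by d, as a set of residues."""
    return {(d * k) % n for k in range(n)}


def is_prime_ideal(n, I):
    """Proper ideal such that xy in I implies x in I or y in I."""
    if len(I) == n:
        return False
    return all(
        x in I or y in I
        for x in range(n)
        for y in range(n)
        if (x * y) % n in I
    )


def complement_is_mult_closed(n, I):
    """Check that S = (Z/nZ) minus I contains 1 and is closed under products."""
    S = set(range(n)) - I
    return 1 in S and all((x * y) % n in S for x in S for y in S)


def prime_ideal_generators(n):
    """Divisors d of n for which the ideal generated by d is prime."""
    divisors = [d for d in range(1, n + 1) if n % d == 0]
    return [d for d in divisors if is_prime_ideal(n, ideal(n, d))]


n = 12
for d in (1, 2, 3, 4, 6, 12):
    I = ideal(n, d)
    print(d, is_prime_ideal(n, I), complement_is_mult_closed(n, I))
print(prime_ideal_generators(n))
```

In every row the two booleans agree, and only the ideals generated by $2$ and $3$ are prime.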

We define an equivalence relation on $A \times S$ as follows:

$$(a,s) \sim (b,t) \iff (at-bs)u=0 \text{ for some } u \in S.$$

(Proposition 3) $\sim$ is an equivalence relation.

*Proof.* Since $(as-as)1=0$ while $1 \in S$, we see $(a,s) \sim (a,s)$. For being symmetric, note that

Finally, to show that it is transitive, suppose $(a,s) \sim (b,t)$ and $(b,t) \sim (c,u)$. There exist $v,w \in S$ such that

This gives $bsv=atv$ and $buw = ctw$, which implies

But $tvw \in S$ since $t,v,w \in S$ and $S$ is multiplicatively closed. Hence

$\square$
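The extra factor from $S$ in the relation really matters when $A$ has zero divisors. As a hedged illustration, the sketch below takes $A=\mathbb{Z}/6\mathbb{Z}$ and $S=\{1,2,4\}$ (our own choice of example, not from the text above), assumes the relation $(a,s) \sim (b,t)$ iff $(at-bs)u=0$ for some $u \in S$, verifies by brute force that it is an equivalence relation, and counts the classes.

```python
from itertools import product

N = 6            # we work in A = Z/6Z (an example chosen for illustration)
A = range(N)
S = (1, 2, 4)    # multiplicatively closed modulo 6, and 0 is not in S


def equivalent(pair1, pair2):
    """(a, s) ~ (b, t) iff (a*t - b*s)*u == 0 in Z/6Z for some u in S."""
    (a, s), (b, t) = pair1, pair2
    return any(((a * t - b * s) * u) % N == 0 for u in S)


def is_equivalence_relation():
    """Brute-force check of reflexivity, symmetry and transitivity."""
    pairs = list(product(A, S))
    reflexive = all(equivalent(x, x) for x in pairs)
    symmetric = all(equivalent(y, x) for x in pairs for y in pairs if equivalent(x, y))
    transitive = all(
        equivalent(x, z)
        for x in pairs for y in pairs for z in pairs
        if equivalent(x, y) and equivalent(y, z)
    )
    return reflexive and symmetric and transitive


def equivalence_classes():
    """Group all pairs (a, s) into classes, comparing with one representative."""
    classes = []
    for pair in product(A, S):
        for cls in classes:
            if equivalent(pair, cls[0]):
                cls.append(pair)
                break
        else:
            classes.append([pair])
    return classes


print(is_equivalence_relation())
print(len(equivalence_classes()))      # number of elements of the ring of fractions
print(equivalent((3, 1), (0, 1)))      # 3/1 equals 0/1, although 3 != 0 in Z/6Z
```

Here $3 \cdot 1 - 0 \cdot 1 = 3 \neq 0$ in $\mathbb{Z}/6\mathbb{Z}$, but multiplying by $2 \in S$ gives $0$, so the pairs $(3,1)$ and $(0,1)$ are still equivalent. Without the factor $u$ the relation would miss this.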

Let $a/s$ denote the equivalence class of $(a,s)$. Let $S^{-1}A$ denote the set of equivalence classes (it is not a good idea to write $A/S$ as it may coincide with the notation of factor group), and we put a ring structure on $S^{-1}A$ as follows:

$$a/s+b/t=(at+bs)/st, \qquad (a/s)(b/t)=ab/st.$$

There is no difference between this one and the one in elementary algebra. But first of all we need to show that $S^{-1}A$ indeed forms a ring.

(Proposition 4) The addition and multiplication are well defined. Further, $S^{-1}A$ is a commutative ring with identity.

*Proof.* Suppose $(a,s) \sim (a’,s’)$ and $(b,t) \sim (b’,t’)$ we need to show that

or

There exist $u,v \in S$ such that

If we multiply the first equation by $vtt’$ and second equation by $uss’$, we see

which is exactly what we want.

On the other hand, we need to show that

That is,

Again, we have

Hence

Since $uv \in S$, we are done.

Next we show that $S^{-1}A$ has a ring structure. If $0 \in S$, then $S^{-1}A$ contains exactly one element $0/1$ since in this case, all pairs are equivalent:

We therefore only discuss the case when $0 \notin S$. First $0/1$ is the zero element with respect to addition since

On the other hand, we have the inverse $-a/s$:

$1/1$ is the unit with respect to multiplication:

Multiplication is associative since

Multiplication is commutative since

Finally distributivity.

Note $ab/cb=a/c$ since $(abc-abc)1=0$. $\square$ $\def\mb{\mathbb}$

First we consider the case when $A$ is entire. If $0 \in S$, then $S^{-1}A$ is trivial, which is not so interesting. However, provided that $0 \notin S$, we get some well-behaved result:

(Proposition 5) Let $A$ be an entire ring, and let $S$ be a multiplicatively closed subset of $A$ that does not contain $0$, then the natural map

$$\varphi_S:A \to S^{-1}A, \quad x \mapsto x/1$$

is injective. Therefore it can be considered as a natural inclusion. Further, every element of $\varphi_S(S)$ is invertible.

*Proof.* Indeed, if $x/1=0/1$, then there exists $s \in S$ such that $xs=0$. Since $A$ is entire and $s \neq 0$, we see $x=0$, hence $\varphi_S$ is injective. For $s \in S$, we see $\varphi_S(s)=s/1$. However $(1/s)\varphi_S(s)=(1/s)(s/1)=s/s=1$. $\square$

Note since $A$ is entire we can also conclude that $S^{-1}A$ is entire. As a word of warning, the ring homomorphism $\varphi_S$ is *not* in general injective since, for example, when $0 \in S$, this map is the zero.

If we go further, making $S$ contain all non-zero elements, we have:

(Proposition 6) If $A$ is entire and $S$ contains all non-zero elements of $A$, then $S^{-1}A$ is a field, called the **quotient field** or the **field of fractions**.

*Proof.* First we need to show that $S^{-1}A$ is entire. Suppose $(a/s)(b/t)=ab/st =0/1$ but $a/s \neq 0/1$, we see however

Since $A$ is entire, $b$ has to be $0$, which implies $b/t=0/1$. Second, if $a/s \neq 0/1$, we see $a \neq 0$ and therefore is in $S$, hence we’ve found the inverse $(a/s)^{-1}=s/a$. $\square$

In this case we can identify $A$ as a subset of $S^{-1}A$ and write $a/s=s^{-1}a$.

Let $A$ be a commutative ring, and let $S$ be the set of invertible elements of $A$. If $u \in S$, then there exists some $v \in S$ such that $uv=1$. We see $1 \in S$ and if $a,b \in S$, we have $ab \in S$ since $ab$ has an inverse as well. This set is frequently denoted by $A^\ast$, and is called the group of **invertible** elements of $A$. For example for $\mb{Z}$ we see $\mb{Z}^\ast$ consists of $-1$ and $1$. If $A$ is a field, then $A^\ast$ is the multiplicative group of non-zero elements of $A$. For example $\mb{Q}^\ast$ is the set of all rational numbers without $0$. For $A^\ast$ we have

If $A$ is a field, then $(A^\ast)^{-1}A \simeq A$.

*Proof.* Define

Then as we have already shown, $\varphi_S$ is injective. Secondly we show that $\varphi_S$ is surjective. For any $a/s \in (A^\ast)^{-1}A$, we see $as^{-1}/1 = a/s$. Therefore $\varphi_S(as^{-1})=a/s$ as is shown. $\square$

Now let’s see a concrete example. If $A$ is entire, then the polynomial ring $A[X]$ is entire. If $K = S^{-1}A$ is the quotient field of $A$, we can denote the quotient field of $A[X]$ as $K(X)$. Elements in $K(X)$ can be naturally called **rational functions**, and can be written as $f(X)/g(X)$ where $f,g \in A[X]$. For $b \in K$, we say a rational function $f/g$ is **defined** at $b$ if $g(b) \neq 0$. Naturally this process can be generalized to polynomials of $n$ variables.

We say a commutative ring $A$ is local if it has a unique maximal ideal. Let $\mfk{p}$ be a prime ideal of $A$, and $S = A \setminus \mfk{p}$, then $A_{\mfk{p}}=S^{-1}A$ is called the **local ring of $A$ at $\mfk{p}$**. Alternatively, we say the process of passing from $A$ to $A_\mfk{p}$ is *localization* at $\mfk{p}$. You will see it makes sense to call it localization:

(Proposition 7) $A_\mfk{p}$ is local. Precisely, the unique maximal ideal is

$$I=\{a/s: a \in \mfk{p}, s \in S\}.$$

Note $I$ is indeed equal to $\mfk{p}A_\mfk{p}$.

*Proof.* First we show that $I$ is an ideal. For $b/t \in A_\mfk{p}$ and $a/s \in I$, we see

since $a \in \mfk{p}$ implies $ba \in \mfk{p}$. Next we show that $I$ is maximal, which is equivalent to showing that $A_\mfk{p}/I$ is a field. For $b/t \notin I$, we have $b \in S$, hence it is legitimate to write $t/b$. This gives

Hence we have found the inverse.

Finally we show that $I$ is the unique maximal ideal. Let $J$ be another maximal ideal. Suppose $J \neq I$, then we can pick $m/n \in J \setminus I$. This gives $m \in S$ since if not $m \in \mfk{p}$ and then $m/n \in I$. But for $n/m \in A_\mfk{p}$ we have

This forces $J$ to be $A_\mfk{p}$ itself, contradicting the assumption that $J$ is a maximal ideal. Hence $I$ is unique. $\square$

Let $p$ be a prime number, and we take $A=\mb{Z}$ and $\mfk{p}=p\mb{Z}$. We now try to determine what $A_\mfk{p}$ and $\mfk{p}A_\mfk{p}$ look like. First $S = A \setminus \mfk{p}$ is the set of all integers prime to $p$. Therefore $A_\mfk{p}$ can be considered as the ring of all rational numbers $m/n$ where $n$ is prime to $p$, and $\mfk{p}A_\mfk{p}$ can be considered as the set of all rational numbers $kp/n$ where $k \in \mb{Z}$ and $n$ is prime to $p$.
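This description is concrete enough to play with. The Python sketch below models elements of $\mb{Z}_{(p)}$ as `fractions.Fraction` objects, which are always kept in lowest terms. The predicate names are our own, and the code is only meant to illustrate the description above: an element of the local ring is a unit exactly when it lies outside the maximal ideal.

```python
from fractions import Fraction


def in_local_ring(x, p):
    """x = m/n in lowest terms lies in the localization iff p does not divide n."""
    return x.denominator % p != 0


def in_maximal_ideal(x, p):
    """x lies in the maximal ideal iff it is in the local ring and p divides m."""
    return in_local_ring(x, p) and x.numerator % p == 0


def is_unit(x, p):
    """x is a unit of the local ring iff both x and 1/x lie in it."""
    return x != 0 and in_local_ring(x, p) and in_local_ring(1 / x, p)


p = 5
for x in (Fraction(3, 4), Fraction(10, 3), Fraction(1, 5), Fraction(0)):
    print(x, in_local_ring(x, p), in_maximal_ideal(x, p), is_unit(x, p))
```

For instance $3/4$ is a unit, $10/3$ lies in the maximal ideal, and $1/5$ is not in the ring at all.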

$\mb{Z}$ is the simplest example of a ring and $p\mb{Z}$ is the simplest example of a prime ideal. And $A_\mfk{p}$ in this case shows what localization does: $A$ is ‘expanded’ with respect to $\mfk{p}$. Every member of $A_\mfk{p}$ is related to $\mfk{p}$, and the maximal ideal is determined by $\mfk{p}$.

Let $k$ be an infinite field. Let $A=k[x_1,\cdots,x_n]$ where $x_i$ are independent indeterminates, $\mfk{p}$ a prime ideal in $A$. Then $A_\mfk{p}$ is the ring of all rational functions $f/g$ where $g \notin \mfk{p}$. We have already defined rational functions. But we can go further and demonstrate the prototype of the local rings which arise in algebraic geometry. Let $V$ be the variety defined by $\mfk{p}$, that is,

$$V=\{x \in k^n: f(x)=0 \text{ for all } f \in \mfk{p}\}.$$

Then what about $A_\mfk{p}$? For $f/g \in A_\mfk{p}$ we have $g \notin \mfk{p}$, therefore $g(x)$ is not equal to $0$ almost everywhere on $V$. That is, $A_\mfk{p}$ can be identified with the ring of all rational functions on $k^n$ which are defined at *almost all* points of $V$. We call this the local ring of $k^n$ **along the variety** $V$.

Let $A$ be a ring and $S^{-1}A$ a ring of fractions, then we shall see that $\varphi_S:A \to S^{-1}A$ has a universal property.

(Proposition 8) Let $g:A \to B$ be a ring homomorphism such that $g(s)$ is invertible in $B$ for all $s \in S$, then there exists a unique homomorphism $h:S^{-1}A \to B$ such that $g = h \circ \varphi_S$.

*Proof.* For $a/s \in S^{-1}A$, define $h(a/s)=g(a)g(s)^{-1}$. It looks immediate but we shall show that this is what we are looking for and is unique.

Firstly we need to show that it is well defined. Suppose $a/s=a’/s’$, then there exists some $u \in S$ such that

Applying $g$ on both sides yields

Since $g(s)$ is invertible for all $s \in S$, we therefore get

It is a homomorphism since

and

they are equal since

Next we show that $g=h \circ \varphi_S$. For $a \in A$, we have

Finally we show that $h$ is unique. Let $h’$ be a homomorphism satisfying the condition, then for $a \in A$ we have

For $s \in S$, we also have

Since $a/s = (a/1)(1/s)$ for all $a/s \in S^{-1}A$, we get

That is, $h’$ (or $h$) is totally determined by $g$. $\square$
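To see the formula $h(a/s)=g(a)g(s)^{-1}$ at work, here is a small Python sketch on an example of our own choosing: $A=\mb{Z}$, $S=\{1,2,4,8,\cdots\}$ and $g:\mb{Z} \to \mb{Z}/5\mb{Z}$ the reduction map, where $g(2)$ is invertible. The names `P`, `g` and `h` are hypothetical, and the three-argument `pow` with exponent `-1` needs Python 3.8 or later.

```python
P = 5  # target ring B = Z/5Z (example chosen because 2 is invertible modulo 5)


def g(a):
    """The ring homomorphism Z -> Z/5Z."""
    return a % P


def h(a, s):
    """h(a/s) = g(a) * g(s)^(-1), where s is assumed to be a power of 2."""
    return (g(a) * pow(g(s), -1, P)) % P


# Well defined: a/s and (2a)/(2s) are the same fraction and get the same image.
print(h(3, 4), h(6, 8))

# Compatible with phi_S: h(a/1) = g(a).
print(h(7, 1), g(7))

# Additive on a sample: 3/4 + 1/2 = (3*2 + 1*4)/(4*2).
print(h(3 * 2 + 1 * 4, 4 * 2), (h(3, 4) + h(1, 2)) % P)
```

Each printed pair should consist of two equal numbers, which is exactly what well-definedness, $g=h \circ \varphi_S$ and additivity predict.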

Let’s restate it in the language of category theory (you can skip it if you have no idea what it is now). Let $\mfk{C}$ be the category whose objects are ring-homomorphisms

such that $f(s)$ is invertible for all $s \in S$. Then according to proposition 5, $\varphi_S$ is an object of $\mfk{C}$. For two objects $f:A \to B$ and $f’:A \to B’$, a morphism $g \in \operatorname{Mor}(f,f’)$ is a homomorphism

such that $f’=g \circ f$. So here comes the question: what is the position of $\varphi_S$?

Let $\mfk{A}$ be a category. An object $P$ of $\mfk{A}$ is called **universally attracting** if there exists a unique morphism of each object of $\mfk{A}$ into $P$, and is called **universally repelling** if for every object of $\mfk{A}$ there exists a unique morphism of $P$ into this object. Therefore we have the answer for $\mfk{C}$.

(Proposition 9) $\varphi_S$ is a universally repelling object in $\mfk{C}$.

An ideal $\mfk{o} \subset A$ is said to be **principal** if there exists some $a \in A$ such that $Aa = \mfk{o}$. For example for $\mb{Z}$, the ideal

is principal and we may write $2\mb{Z}$. If every ideal of a **commutative** ring $A$ is principal, we say $A$ is principal. Further we say $A$ is a **PID** if $A$ is also an integral domain (entire). When it comes to ring of fractions, we also have the following proposition:

(Proposition 10) Let $A$ be a principal ring and $S$ a multiplicatively closed subset with $0 \notin S$, then $S^{-1}A$ is principal as well.

*Proof.* Let $I \subset S^{-1}A$ be an ideal. If $a \in S$ where $a/s \in I$, then we are done since then $(s/a)(a/s) = 1/1 \in I$, which implies $I=S^{-1}A$ itself, hence we shall assume $a \notin S$ for all $a/s \in I$. But for $a/s \in I$ we also have $(a/s)(s/1)=a/1 \in I$. Therefore $J=\varphi_S^{-1}(I)$ is not empty. $J$ is an ideal of $A$ since for $a \in A$ and $b \in J$, we have $\varphi_S(ab) =ab/1=(a/1)(b/1) \in I$ which implies $ab \in J$. But since $A$ is principal, there exists some $a$ such that $Aa = J$. We shall discuss the relation between $S^{-1}A(a/1)$ and $I$. For any $(c/u)(a/1)=ca/u \in S^{-1}A(a/1)$, clearly we have $ca/u \in I$, hence $S^{-1}A(a/1)\subset I$. On the other hand, for $c/u \in I$, we see $c/1=(c/u)(u/1) \in I$, hence $c \in J$, and there exists some $b \in A$ such that $c = ba$, which gives $c/u=ba/u=(b/u)(a/1) \in I$. Hence $I \subset S^{-1}A(a/1)$, and we have finally proved that $I = S^{-1}A(a/1)$. $\square$

As an immediate corollary, if $A_\mfk{p}$ is the localization of $A$ at $\mfk{p}$, and if $A$ is principal, then $A_\mfk{p}$ is principal as well. Next we go through another kind of rings. A ring is called **factorial** (or a **unique factorization ring** or **UFD**) if it is entire and if every non-zero element has a unique factorization into irreducible elements. An element $a \neq 0$ is called **irreducible** if it is not a unit and whenever $a=bc$, then either $b$ or $c$ is a unit. For all non-zero elements in a factorial ring, we have

where $u$ is a unit (invertible).

In fact, every PID is a UFD (proof here). Irreducible elements in a factorial ring are called **prime elements** or simply **primes** (take $\mathbb{Z}$ and prime numbers as an example). Indeed, if $A$ is a factorial ring and $p$ a prime element, then $Ap$ is a prime ideal. But we are more interested in the ring of fractions of a factorial ring.

(Proposition 11) Let $A$ be a factorial ring and $S$ a multiplicatively closed subset with $0 \notin S$, then $S^{-1}A$ is factorial.

*Proof.* Pick $a/s \in S^{-1}A$. Since $A$ is factorial, we have $a=up_1 \cdots p_k$ where $p_i$ are primes and $u$ is a unit. But we have no idea what the irreducible elements of $S^{-1}A$ are. Naturally our first candidate is $p_i/1$. And we have no need to restrict ourselves to $p_i$; we should work on all primes of $A$. Suppose $p$ is a prime of $A$. If $p \in S$, then $p/1$ is a unit of $S^{-1}A$, not prime. If $Ap \cap S \neq \varnothing$, then $rp \in S$ for some $r \in A$. But then

again $p/1$ is a unit, not prime. Finally if $Ap \cap S = \varnothing$, then $p/1$ is prime in $S^{-1}A$. For any

we see $ab=stp \not\in S$. But this also gives $ab \in Ap$ which is a prime ideal, hence we can assume $a \in Ap$ and write $a=rp$ for some $r \in A$. With this expansion we get

Hence $b/t$ is a unit, $p/1$ is a prime.

Conversely, suppose $a/s$ is irreducible in $S^{-1}A$. Since $A$ is factorial, we may write $a=u\prod_{i=1}^{n}p_i$. Note that $a$ cannot be an element of $S$ since $a/s$ is not a unit. We write

$$\frac{a}{s}=\frac{u}{1}\cdot\frac{1}{s}\cdot\prod_{i=1}^{n}\frac{p_i}{1}.$$

We see there is some $v \in A$ such that $uv=1$ and accordingly $(u/1)(v/1)=uv/1=1/1$, hence $u/1$ is a unit. We claim that there exists a unique $k$ with $1 \leq k \leq n$ and $Ap_k \cap S = \varnothing$. If no such $k$ exists, then all $p_j/1$ are units and so is $a/s$, which is absurd. If both $p_{k}$ and $p_{k'}$ satisfy the requirement and $k \neq k'$, then we can write $a/s$ as

$$\frac{a}{s}=\left\{\frac{u}{1}\cdot\frac{1}{s}\cdot\prod_{i \neq k'}\frac{p_i}{1}\right\}\cdot\frac{p_{k'}}{1}.$$

Neither the factor in curly brackets nor $p_{k'}/1$ is a unit, contradicting the fact that $a/s$ is irreducible. Next we show that $a/s$ equals $p_k/1$ up to a unit. For simplicity we write

$$b=u\prod_{i \neq k}p_i.$$

Note $a/s = bp_k/s = (b/s)(p_k/1)$. Since $a/s$ is irreducible and $p_k/1$ is not a unit, we conclude that $b/s$ is a unit. We are done with the study of irreducible elements of $S^{-1}A$: each one is of the form $p/1$ (up to a unit) where $p$ is prime in $A$ and $Ap \cap S = \varnothing$.

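Now we are close to the fact that $S^{-1}A$ is also factorial. For any non-zero $a/s \in S^{-1}A$, we have an expansion

$$\frac{a}{s}=\frac{u}{1}\cdot\frac{1}{s}\cdot\prod_{i=1}^{n}\frac{p_i}{1}.$$

Let $p'_1,p'_2,\cdots,p'_j$ be those $p_i$ whose generated prime ideal has nontrivial intersection with $S$, then $p'_1/1, p'_2/1,\cdots,p'_j/1$ are units of $S^{-1}A$. Let $q_1,q_2,\cdots,q_k$ be the other $p_i$'s, then $q_1/1,q_2/1,\cdots,q_k/1$ are irreducible in $S^{-1}A$. This gives

$$\frac{a}{s}=\left(\frac{u}{1}\cdot\frac{1}{s}\cdot\prod_{i=1}^{j}\frac{p'_i}{1}\right)\prod_{i=1}^{k}\frac{q_i}{1},$$

that is, a unit times a product of irreducible elements. Hence $S^{-1}A$ is factorial as well. $\square$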
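To make the description of irreducible elements concrete, here is a minimal Python sketch for the special case $A=\mathbb{Z}$ with $S$ generated by finitely many primes. The names `prime_factors` and `factor_in_localization` are hypothetical helpers invented for this illustration, and trial division is used only for simplicity. A non-zero element of $S^{-1}\mathbb{Z}$ is split into a unit and the primes $p$ with $\mathbb{Z}p \cap S=\varnothing$.

```python
from fractions import Fraction

def prime_factors(n):
    """Factor |n| by trial division; returns {prime: exponent}."""
    n = abs(n)
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def factor_in_localization(x, inverted_primes):
    """Split a non-zero x in S^{-1}Z into (unit, irreducible part).

    Assumption: S is the multiplicative set generated by `inverted_primes`.
    """
    if x == 0:
        raise ValueError("only non-zero elements have a factorization")
    if any(q not in inverted_primes for q in prime_factors(x.denominator)):
        raise ValueError("denominator is not in S")
    # Primes of the numerator that are not inverted stay irreducible.
    irreducibles = {q: e for q, e in prime_factors(x.numerator).items()
                    if q not in inverted_primes}
    product = 1
    for q, e in irreducibles.items():
        product *= q ** e
    unit = x / product  # everything left over is invertible in S^{-1}Z
    return unit, irreducibles
```

For instance, with $S$ generated by $2$ and $3$, the element $-84=-2^2\cdot 3\cdot 7$ splits into the unit $-12$ and the single irreducible factor $7$.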

We finish the whole post by a comprehensive proposition:

(Proposition 12)Let $A$ be a factorial ring and $p$ a prime element, $\mfk{p}=Ap$. The localization of $A$ at $\mfk{p}$ is principal.

*Proof.* Put $S = A \setminus \mfk{p}$. For a non-zero $a/s \in S^{-1}A$, we see $p$ does not divide $s$, since if $s = rp$ for some $r \in A$, then $s \in \mfk{p}$, contradicting the fact that $s \in S = A \setminus \mfk{p}$. Since $A$ is factorial, we may write $a = cp^n$ for some $n \geq 0$ where $p$ does not divide $c$ either (which gives $c \in S$). Hence $a/s = (c/s)(p^n/1)$. Note $(c/s)(s/c)=1/1$ and therefore $c/s$ is a unit. So every non-zero $a/s \in S^{-1}A$ can be written as

$$\frac{a}{s}=u\cdot\frac{p^n}{1},$$

where $u$ is a unit of $S^{-1}A$.

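Let $I$ be any non-zero ideal in $S^{-1}A$, and

$$m=\min\{n \geq 0 : up^n/1 \in I \text{ for some unit } u \text{ of } S^{-1}A\}.$$

Let’s discuss the relation between $S^{-1}A(p^m/1)$ and $I$. Fix a unit $u$ with $up^m/1 \in I$. First we see $S^{-1}A(p^m/1)=S^{-1}A(up^m/1)$ since if $v$ is the inverse of $u$, we get

$$v\cdot\frac{up^m}{1}=\frac{p^m}{1}.$$

Any non-zero element of $S^{-1}A(up^m/1)$ is of the form

$$\frac{vp^k}{1}\cdot\frac{up^m}{1}=\frac{vup^{m+k}}{1},$$

where $v$ now denotes an arbitrary unit and $k \geq 0$. Since $up^m/1 \in I$, we see $vup^{m+k}/1 \in I$ as well, hence $S^{-1}A(up^m/1) \subset I$. On the other hand, by the minimality of $m$, any non-zero element of $I$ is of the form $wup^{m+n}/1=w(p^n/1)u(p^m/1)$ where $w$ is a unit and $n \geq 0$. This shows that $wup^{m+n}/1 \in S^{-1}A(up^m/1)$. Hence $S^{-1}A(p^m/1)=S^{-1}A(up^m/1)=I$ as we wanted. $\square$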
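The decomposition $a/s=u\cdot p^n/1$ can be tried out in the simplest case $A=\mathbb{Z}$, where the localization at $\mfk{p}=\mathbb{Z}p$ consists of the fractions whose denominator is not divisible by $p$. The following is only a sketch under that assumption; `split_unit_power` and `ideal_generator_exponent` are names made up for this example.

```python
from fractions import Fraction

def split_unit_power(x, p):
    """Write a non-zero x in Z_(p) as u * p**n with u a unit; returns (u, n)."""
    if x == 0:
        raise ValueError("x must be non-zero")
    if x.denominator % p == 0:
        raise ValueError("x does not lie in the localization at (p)")
    n = 0
    c = x.numerator
    while c % p == 0:  # strip every factor p from the numerator
        c //= p
        n += 1
    return Fraction(c, x.denominator), n

def ideal_generator_exponent(elements, p):
    """Exponent m such that the ideal generated by `elements` is (p**m)."""
    return min(split_unit_power(x, p)[1] for x in elements)
```

For example, in $\mathbb{Z}_{(2)}$ we get $12/5=(3/5)\cdot 2^2$, and the ideal generated by $12/5$, $8/3$ and $-6/7$ is generated by $2^1$, the smallest power of $2$ that occurs.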

Let $A$ be an abelian group. Let $(e_i)_{i \in I}$ be a family of elements of $A$. We say that this family is a **basis** for $A$ if the family is not empty, and if every element $x$ of $A$ has a unique expression as a **linear combination**

$$x=\sum_{i \in I}x_ie_i,$$

where $x_i \in \mathbb{Z}$ and almost all $x_i$ are equal to $0$. This means that the sum is actually finite. An abelian group is said to be **free** if it has a basis. Alternatively, we may write $A$ as a direct sum by

$$A=\bigoplus_{i \in I}\mathbb{Z}e_i.$$

Let $S$ be a set. Say we want to get a group out of this for some reason, so how? It is not a good idea to endow $S$ with a binary operation beforehand since overall $S$ is merely a set. We shall **generate** a group out of $S$ in the **freest** possible way.

Let $\mathbb{Z}\langle S \rangle$ be the set of all **maps** $\varphi:S \to \mathbb{Z}$ such that, for only a **finite** number of $x \in S$, we have $\varphi(x) \neq 0$. For simplicity, we denote by $k \cdot x$ the map $\varphi_0 \in \mathbb{Z}\langle S \rangle$ such that $\varphi_0(x)=k$ but $\varphi_0(y) = 0$ if $y \neq x$. For any $\varphi$, we claim that $\varphi$ has a unique expression

$$\varphi=k_1 \cdot x_1+k_2 \cdot x_2+\cdots+k_n \cdot x_n.$$

One can consider these integers $k_i$ as the order of $x_i$, or simply the number of times that $x_i$ appears (which may be negative). For $\varphi\in\mathbb{Z}\langle S \rangle$, let $I=\{x_1,x_2,\cdots,x_n\}$ be the set of elements of $S$ such that $\varphi(x_i) \neq 0$. If we denote $k_i=\varphi(x_i)$, we can show that $\psi=k_1 \cdot x_1 + k_2 \cdot x_2 + \cdots + k_n \cdot x_n$ is equal to $\varphi$. For $x_i \in I$, we have $\psi(x_i)=k_i=\varphi(x_i)\neq 0$ by definition of the ‘$\cdot$’; if $y \notin I$ however, we then have $\psi(y)=0=\varphi(y)$. So $\psi$ coincides with $\varphi$. $\blacksquare$

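By definition the zero map $\mathcal{O}=0 \cdot x \in \mathbb{Z}\langle S \rangle$ and therefore we may write any $\varphi$ by

$$\varphi=\sum_{x \in S}k_x \cdot x,$$

where $k_x \in \mathbb{Z}$ and can be zero (all but finitely many of them are). Suppose now we have two expressions, for example

$$\varphi=\sum_{x \in S}k_x \cdot x=\sum_{x \in S}k'_x \cdot x.$$

Then

$$\mathcal{O}=\varphi-\varphi=\sum_{x \in S}(k_x-k'_x)\cdot x.$$

Suppose $k_y - k'_y \neq 0$ for some $y \in S$, then

$$\mathcal{O}(y)=k_y-k'_y \neq 0,$$

which is a contradiction. Therefore the expression is unique. $\blacksquare$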

This $\mathbb{Z}\langle S \rangle$ is what we are looking for. It is an additive group (which can be proved immediately) and, what is more important, every element can be expressed as a ‘sum’ associated with finite number of elements of $S$. We shall write $F_{ab}(S)=\mathbb{Z}\langle S \rangle$, and call it the **free abelian group generated by $S$**. For elements in $S$, we say they are **free generators** of $F_{ab}(S)$. If $S$ is a finite set, we say $F_{ab}(S)$ is **finitely generated**.
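A finitely supported map $S \to \mathbb{Z}$ is exactly what a dictionary without zero entries stores, so the construction can be sketched in a few lines of Python. The helper names `normalize`, `single`, `add` and `neg` are assumptions made for this illustration only; dropping zero coefficients is what makes the representation of each element unique.

```python
def normalize(phi):
    """Drop zero coefficients so every element has exactly one representation."""
    return {x: k for x, k in phi.items() if k != 0}

def single(k, x):
    """The element k . x: value k at x and 0 everywhere else."""
    return normalize({x: k})

def add(phi, psi):
    """Pointwise sum of two finitely supported maps."""
    out = dict(phi)
    for x, k in psi.items():
        out[x] = out.get(x, 0) + k
    return normalize(out)

def neg(phi):
    """Additive inverse."""
    return {x: -k for x, k in phi.items()}

ZERO = {}  # the zero map

phi = add(single(2, "a"), single(-3, "b"))  # 2 . a + (-3) . b
```

Here the set $S$ can be any collection of hashable Python objects, which mirrors the fact that no structure on $S$ is needed.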

An abelian group is **free** if and only if it is isomorphic to a free abelian group $F_{ab}(S)$ for some set $S$.

**Proof.** First we shall show that $F_{ab}(S)$ is free. For $x \in S$, we denote $\varphi = 1 \cdot x$ by $[x]$. Then for any $k \in \mathbb{Z}$, we have $k[x]=k \cdot x$ and $k[x]+k'[y] = k\cdot x + k' \cdot y$. By definition of $F_{ab}(S)$, any element $\varphi \in F_{ab}(S)$ has a unique expression

$$\varphi=\sum_{x \in S}k_x[x].$$

Therefore $F_{ab}(S)$ is free since we have found the basis $([x])_{x \in S}$.

Conversely, if $A$ is free with basis $(e_i)_{i \in I}$, then every element of $A$ has a unique expression $\sum_i x_ie_i$, and sending it to $\sum_i x_i \cdot i$ gives an isomorphism $A \to F_{ab}(I)$. Our statement is therefore proved. $\blacksquare$

(Proposition 1) If $A$ is an abelian group, then there is a free abelian group $F$ which has a subgroup $H$ such that $A \cong F/H$.

**Proof.** Let $S$ be any set containing $A$, so that there is a surjective map $\gamma: S \to A$ (for instance one that restricts to the identity on $A$), and form the free abelian group $F_{ab}(S)$. We also get a unique homomorphism $\gamma_\ast:F_{ab}(S) \to A$ by

$$\gamma_\ast\left(\sum_{x \in S}k_x \cdot x\right)=\sum_{x \in S}k_x\gamma(x),$$

which is also surjective. By the first isomorphism theorem, if we set $H=\ker(\gamma_\ast)$ and $F_{ab}(S)=F$, then

$$A \cong F/H.$$

$\blacksquare$

(Proposition 2)If $A$ is finitely generated, then $F$ can also be chosen to be finitely generated.

**Proof.** Let $S$ be a generating set of $A$, and let $S'$ be a set containing $S$. Note if $S$ is finite, which means $A$ is finitely generated, then $S'$ can also be chosen finite by inserting one or any finite number of additional elements. We have maps from $S$ and $S'$ into $F_{ab}(S)$ and $F_{ab}(S')$ respectively by $f_S(x)=1 \cdot x$ and $f_{S'}(x')=1 \cdot x'$. Defining $g=f_{S} \circ \lambda:S' \to F_{ab}(S)$ for a map $\lambda: S' \to S$, we get another homomorphism by

This defines a unique homomorphism such that $g_\ast \circ f_{S’} = g$. As one can also verify, this map is also surjective. Therefore by the first isomorphism theorem we have

$\blacksquare$

It’s worth mentioning separately that we have implicitly proved two statements with commutative diagrams:

(Proposition 3 | Universal property)If $g:S \to B$ is a mapping of $S$ into some abelian group $B$, then we can define a unique group-homomorphism making the following diagram commutative:

(Proposition 4) If $\lambda:S \to S'$ is a mapping of sets, there is a unique homomorphism $\overline{\lambda}: F_{ab}(S) \to F_{ab}(S')$ making the following diagram commutative:

(In the proof of Proposition 2 we exchanged $S$ and $S'$.)
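The universal property is easy to experiment with when the target group is $B=(\mathbb{Z},+)$. In the sketch below, elements of $F_{ab}(S)$ are stored as dictionaries `{x: k_x}`; the names `extend`, `add` and `g_star` are hypothetical and only serve the illustration. The induced homomorphism sends $\sum k_x \cdot x$ to $\sum k_x g(x)$.

```python
def extend(g):
    """Return g_*: F_ab(S) -> Z with g_*(sum k_x . x) = sum k_x * g(x)."""
    def g_star(phi):
        return sum(k * g(x) for x, k in phi.items())
    return g_star

def add(phi, psi):
    """Addition in F_ab(S), with elements stored as {x: k_x}."""
    out = dict(phi)
    for x, k in psi.items():
        out[x] = out.get(x, 0) + k
    return {x: k for x, k in out.items() if k != 0}

g = len              # a map from S (strings here) into the group (Z, +)
g_star = extend(g)

phi = {"ab": 2, "xyz": -1}   # 2 . "ab" - 1 . "xyz"
psi = {"ab": -2, "q": 5}
```

Evaluating `g_star` on a generator `{"ab": 1}` returns `g("ab")`, which is the commutativity of the diagram, and `g_star(add(phi, psi))` agrees with `g_star(phi) + g_star(psi)`.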

(The Grothendieck group) Let $M$ be a commutative monoid written additively. We shall prove that there exists a commutative group $K(M)$ with a monoid homomorphism $\gamma: M \to K(M)$ satisfying the following universal property: If $f:M \to A$ is a homomorphism from $M$ into an abelian group $A$, then there exists a unique homomorphism $f_\gamma:K(M) \to A$ such that $f=f_\gamma\circ\gamma$. This can be represented by a commutative diagram:

**Proof.** The following commutative diagram describes what we are doing.

Let $F_{ab}(M)$ be the free abelian group generated by $M$. For $x \in M$, we denote $1 \cdot x \in F_{ab}(M)$ by $[x]$. Let $B$ be the group generated by all elements of the type

$$[x+y]-[x]-[y]$$

where $x,y \in M$. This is a subgroup of $F_{ab}(M)$. We let $K(M)=F_{ab}(M)/B$. Let $i: x \mapsto [x]$ and let $\pi$ be the canonical map

$$\pi: F_{ab}(M) \to F_{ab}(M)/B=K(M).$$

We define $\gamma = \pi \circ i$ and verify that $\gamma$ is our desired homomorphism satisfying the universal property. For $x,y \in M$, we have $\gamma(x+y)=\pi([x+y])$ and $\gamma(x)+\gamma(y) = \pi([x])+\pi([y])=\pi([x]+[y])$. However we have

$$[x+y]-[x]-[y] \in B,$$

which implies that

$$\gamma(x+y)=\pi([x+y])=\pi([x]+[y])=\gamma(x)+\gamma(y).$$

Hence $\gamma$ is a monoid-homomorphism. Finally the universal property. By Proposition 3, we have a unique homomorphism $f_\ast: F_{ab}(M) \to A$ such that $f_\ast \circ i = f$. Note if $y \in B$, then $f_\ast(y) =0$, since $f_\ast$ sends each generator $[x+z]-[x]-[z]$ to $f(x+z)-f(x)-f(z)=0$. Therefore $B \subset \ker{f_\ast}$, and we are done if we define $f_\gamma(x+B)=f_\ast (x)$. $\blacksquare$

Why such a $B$? Note in general $[x+y]$ is not necessarily equal to $[x]+[y]$ in $F_{ab}(M)$, but we want them to be equal. So instead we create a new **equivalence relation**, by quotienting out the subgroup generated by the elements $[x+y]-[x]-[y]$. Therefore in $K(M)$ we see $[x+y]+B = [x]+[y]+B$, which finally makes $\gamma$ a homomorphism. We use the same strategy to generate the **tensor product** of two modules later. But at that time we have more than one relation to take care of.

If for all $x,y,z \in M$, $x+y=x+z$ implies $y=z$, then we say $M$ is a cancellative monoid, or the cancellation law holds in $M$. Note for the proof above we didn’t use any property of cancellation. However we still have an interesting property for cancellation law.

(Theorem)The cancellation law holds in $M$ if and only if $\gamma$ is injective.

**Proof.** This proof involves another approach to the Grothendieck group. We consider pairs $(x,y) \in M \times M$. Define

$$(x,y)\sim(x',y') \iff x+y'+\ell=x'+y+\ell \text{ for some } \ell \in M.$$

Then we get an equivalence relation (try to prove it yourself!). We define the addition component-wise, that is, $(x,y)+(x',y')=(x+x',y+y')$; then the equivalence classes of pairs form a group $A$, where the zero element is $[(0,0)]$ and the inverse of $[(x,y)]$ is $[(y,x)]$. We have a monoid-homomorphism

$$f: M \to A,\quad x \mapsto [(x,0)].$$

If the cancellation law holds in $M$, then

$$f(x)=f(y) \implies x+\ell=y+\ell \text{ for some } \ell \in M \implies x=y.$$

Hence $f$ is injective. By the universal property of the Grothendieck group, we get a unique homomorphism $f_\gamma$ such that $f_\gamma \circ \gamma = f$. If $x \neq y$ in $M$, then $f_\gamma(\gamma(x)) \neq f_\gamma(\gamma(y))$ since $f$ is injective. This implies $\gamma(x) \neq \gamma(y)$. Hence $\gamma$ is injective.

Conversely, suppose $\gamma$ is injective and $x+\ell=y+\ell$ in $M$. Applying $\gamma$ gives $\gamma(x)+\gamma(\ell)=\gamma(y)+\gamma(\ell)$ in the group $K(M)$, where we may cancel $\gamma(\ell)$ to get $\gamma(x)=\gamma(y)$, hence $x=y$. The cancellation law holds in $M$. $\blacksquare$
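For a finite monoid the pair construction can be checked by brute force. The following sketch assumes the standard relation $(x,y)\sim(x',y')$ if and only if $x+y'+\ell=x'+y+\ell$ for some $\ell \in M$; the function names `equivalent` and `gamma_is_injective` are made up for this example.

```python
from itertools import product

def equivalent(p, q, op, elements):
    """(x, y) ~ (x2, y2) iff x + y2 + l == x2 + y + l for some l in the monoid."""
    (x, y), (x2, y2) = p, q
    return any(op(op(x, y2), l) == op(op(x2, y), l) for l in elements)

def gamma_is_injective(op, elements, zero):
    """x -> class of (x, 0) is injective iff distinct x give inequivalent pairs."""
    return all(not equivalent((x, zero), (y, zero), op, elements)
               for x, y in product(elements, repeat=2) if x != y)

# Z/5 under addition is cancellative, so the map should be injective.
cyclic = gamma_is_injective(lambda a, b: (a + b) % 5, range(5), 0)

# ({0,1,2,3}, max) is a monoid with identity 0 but max(1,3) == max(2,3),
# so cancellation fails and the map should not be injective.
maximum = gamma_is_injective(max, range(4), 0)
```

In the second monoid every pair is equivalent to $(0,0)$ (take $\ell=3$), so its Grothendieck group is trivial, which is the extreme way in which injectivity can fail.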

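Our first example is $\mathbb{N}$. Elements of $F_{ab}(\mathbb{N})$ are of the form

$$\sum_{j=1}^{m}k_j \cdot n_j,\quad k_j \in \mathbb{Z},\ n_j \in \mathbb{N}.$$

Elements of $B$ are generated by

$$1 \cdot (m+n)-1 \cdot m-1 \cdot n,$$

which we wish to represent $0$. Indeed, $K(\mathbb{N}) \simeq \mathbb{Z}$, as we see from the homomorphism

$$f: K(\mathbb{N}) \to \mathbb{Z},\quad \sum_{j=1}^{m}k_j \cdot n_j+B \mapsto \sum_{j=1}^{m}k_jn_j.$$

For $r \geq 0$ we see $f(1 \cdot r+B)=r$, and for $r<0$ we have $f((-1) \cdot (-r)+B)=r$, so $f$ is surjective. On the other hand, if $\sum_{j=1}^{m}k_j \cdot n_j \not\in B$, then its image under $f$ is not $0$, so $f$ is injective.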

In the first example we ‘granted’ the natural numbers ‘subtraction’. Next we grant ‘division’ to a multiplicative monoid.

Consider $M=\mathbb{Z} \setminus 0$. Now for $F_{ab}(M)$ we write elements in the form

which denotes that $\varphi(n_j)=k_j$ and has no other differences. Then for elements in $B$ they are generated by

which we wish to represent $1$. Then we see $K(M) \simeq \mathbb{Q} \setminus 0$ if we take the isomorphism

Of course this is not the end of the Grothendieck group. But further examples need a lot of background in topology. For example, the topological $K$-theory group of a topological space is the Grothendieck group of isomorphism classes of topological vector bundles. But I think it is not a good idea to post these examples at this point.

We begin our study with some elementary Calculus. Take the function $f(x)=x^2+\frac{e^x}{x^2+1}$ as our example. It should not be a problem to find its tangent line at the point $(0,1)$: calculating its derivative gives $f'(0)=1$, so the tangent line is $l:x-y+1=0$.

$l$ is not a vector space since, in general, it does not pass through the origin. But $l-\overrightarrow{OA}$ is a vector space, where $A=(0,1)$ is the point of tangency. In general, suppose $P(x,y)$ is a point on the curve determined by $f$, i.e. $y=f(x)$, then we obtain a vector space $l_P-\overrightarrow{OP} \simeq \mathbb{R}$. But the action of moving the tangent line to the origin is superfluous so naturally we consider the tangent line at $P$ as a vector space **determined** by $P$. In this case, the induced vector space (tangent line) is always of dimension $1$.

Now we move to two-variable functions. We have a function $a(x,y)=x^2+y^2-x-y+xy$ as our example. Some elementary Calculus work gives us the tangent surface of $z=a(x,y)$ at $A(1,1,1)$, which can be identified by $S:2x+2y-z=3\simeq\mathbb{R}^2$. Again, this can be considered as a vector space **determined** by $A$, or roughly speaking it is one if we take $A$ as the origin. Further we have a base $(\overrightarrow{AB},\overrightarrow{AC})$. Other vectors on $S$, for example $\overrightarrow{AD}$, can be written as a linear combination of $\overrightarrow{AB}$ and $\overrightarrow{AC}$. In other words, $S$ is “spanned” by $(\overrightarrow{AB},\overrightarrow{AC})$.
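The tangent plane claimed above can be double-checked numerically. The sketch below approximates the partial derivatives of $a$ by central differences; `partials` and `tangent_plane` are hypothetical helper names, and the step size `h` is an arbitrary choice for this illustration.

```python
def a(x, y):
    return x**2 + y**2 - x - y + x*y

def partials(f, x, y, h=1e-6):
    """Central-difference approximations of the two partial derivatives."""
    fx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    fy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return fx, fy

def tangent_plane(f, x0, y0):
    """Coefficients (A, B, C) of the plane A*x + B*y - z = C tangent to z = f(x, y)."""
    fx, fy = partials(f, x0, y0)
    # z - f(x0, y0) = fx*(x - x0) + fy*(y - y0), rearranged
    return fx, fy, fx * x0 + fy * y0 - f(x0, y0)

A, B, C = tangent_plane(a, 1.0, 1.0)
```

Up to rounding error this gives $A \approx 2$, $B \approx 2$ and $C \approx 3$, matching $S:2x+2y-z=3$.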

Tangent lines and tangent surfaces play an important role in differentiation. But sometimes we do not have a chance to use them with ease, for example $S^1:x^2+y^2=1$ cannot be represented by a single-variable function. However the implicit function theorem, which you have already learned in Calculus, gives us a chance to find a satisfying function locally. Here in this post we will try to generalize this concept, trying to find the tangent **space** at some point of a manifold. (The two examples above have already determined two manifolds and two tangent spaces.)

We will introduce the abstract definition of a tangent vector at the beginning. You may think it is way too abstract but actually it is not. Surprisingly, the following definition can simplify our work in the future. But before we go, make sure that you have learned about the Fréchet derivative (along with some functional analysis knowledge).

Let $M$ be a manifold of class $C^p$ with $p \geq 1$ and let $x$ be a point of $M$. Let $(U,\varphi)$ be a chart at $x$ and $v$ be an element of the vector space $\mathbf{E}$ where $\varphi(U)$ lies (for example, if $M$ is a $d$-dimensional manifold, then $v \in \mathbb{R}^d$). Next we consider the triple $(U,\varphi,v)$. Suppose $(U,\varphi,v)$ and $(V,\psi,w)$ are two such triples. We say these two triples are **equivalent** if the following identity holds:

$$[(\psi\circ\varphi^{-1})'(\varphi(x))](v)=w.$$

This identity looks messy so we need to explain how to read it. Start from the transition map $\psi\circ\varphi^{-1}$ and take its derivative. The derivative of $\psi\circ\varphi^{-1}$ at the point $\varphi(x)$ is a linear transform, which is what the square brackets enclose. Finally, this linear transform maps $v$ to $w$. In short we read: the derivative of $\psi\circ\varphi^{-1}$ at $\varphi(x)$ maps $v$ to $w$. You may recall that you have met something like $\psi\circ\varphi^{-1}$ in the definition of a manifold. It may seem unlikely that these ‘triples’ should be associated with tangent vectors. But before we explain it, we need to make sure that we have indeed defined an equivalence relation.

(Theorem 1) The relation $(U,\varphi,v)\sim(V,\psi,w)$ defined above is an equivalence relation.

*Proof.* This will not go further than elementary Calculus, in fact, chain rule:

(Chain rule) If $f:U \to V$ is differentiable at $x_0 \in U$, and if $g: V \to W$ is differentiable at $f(x_0)$, then $g \circ f$ is differentiable at $x_0$, and

$$(g\circ f)'(x_0)=g'(f(x_0))\circ f'(x_0).$$

- $(U,\varphi,v)\sim(U,\varphi,v)$.

Since $\varphi\circ\varphi^{-1}=\operatorname{id}$, whose derivative is still the identity everywhere, we have

$$[(\varphi\circ\varphi^{-1})'(\varphi(x))](v)=\operatorname{id}(v)=v.$$
- If $(U,\varphi,v) \sim (V,\psi,w)$, then $(V,\psi,w)\sim(U,\varphi,v)$.

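So now we have

$$[(\psi\circ\varphi^{-1})'(\varphi(x))](v)=w.$$

To prove that $[(\varphi\circ\psi^{-1})'(\psi(x))](w)=v$, we need the chain rule.

Note first

$$\varphi\circ\psi^{-1}=(\psi\circ\varphi^{-1})^{-1},$$

while

$$(\psi\circ\varphi^{-1})(\varphi(x))=\psi(x).$$

But also by the chain rule, if $f$ is a diffeomorphism, we have

$$(f^{-1}\circ f)'(y)=(f^{-1})'(f(y))\circ f'(y)=\operatorname{id},$$

or equivalently

$$(f^{-1})'(f(y))=[f'(y)]^{-1}.$$

Therefore, taking $f=\psi\circ\varphi^{-1}$ and $y=\varphi(x)$,

$$(\varphi\circ\psi^{-1})'(\psi(x))=[(\psi\circ\varphi^{-1})'(\varphi(x))]^{-1},$$

which implies

$$[(\varphi\circ\psi^{-1})'(\psi(x))](w)=v.$$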
- If $(U,\varphi,v)\sim(V,\psi,w)$ and $(V,\psi,w)\sim(W,\lambda,z)$, then $(U,\varphi,v)\sim(W,\lambda,z)$.

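We are given identities

$$[(\psi\circ\varphi^{-1})'(\varphi(x))](v)=w$$

and

$$[(\lambda\circ\psi^{-1})'(\psi(x))](w)=z.$$

By canceling $w$, we get

$$[(\lambda\circ\psi^{-1})'(\psi(x))]\circ[(\psi\circ\varphi^{-1})'(\varphi(x))](v)=z.$$

On the other hand, by the chain rule,

$$(\lambda\circ\varphi^{-1})'(\varphi(x))=\left((\lambda\circ\psi^{-1})\circ(\psi\circ\varphi^{-1})\right)'(\varphi(x))=(\lambda\circ\psi^{-1})'(\psi(x))\circ(\psi\circ\varphi^{-1})'(\varphi(x)),$$

which is what we needed. $\square$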

An **equivalence class** of such triples $(U,\varphi,v)$ is called a **tangent vector** of $M$ at $x$. The set of such tangent vectors is called the **tangent space** to $M$ at $x$, which is denoted by $T_xM$. But it seems that we have gone too far. Is the triple even a ‘vector’? To get a clear view let’s see Euclidean submanifolds first.

Suppose $M$ is a submanifold of $\mathbb{R}^n$. We say $z$ is a **tangent vector** of $M$ at the point $x$ if there exists a curve $\alpha$ of class $C^1$, defined on an interval $I \subset \mathbb{R}$ containing $t_0$ with $\alpha(I) \subset M$, such that $\alpha(t_0)=x$ and $\alpha'(t_0)=z$. (For convenience we often take $t_0=0$.)

This definition becomes transparent once we check an example. For the curve $M: x^2+\frac{e^x}{x^2+1}-y=0$, we can show that $(1,1)^T$ is a tangent vector of $M$ at $(0,1)$, in agreement with our first example. Taking

$$\alpha(t)=\left(t,\,t^2+\frac{e^t}{t^2+1}\right),$$

we get $\alpha(0)=(0,1)$ and

$$\alpha'(t)=\left(1,\,2t+\frac{e^t(t-1)^2}{(t^2+1)^2}\right).$$

Therefore $\alpha'(0)=(1,1)^T$. $\square$
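The velocity of a curve can also be approximated numerically, which gives a quick sanity check of the tangent vector $(1,1)^T$. This is a sketch assuming the graph parametrization $\alpha(t)=(t,f(t))$ with $f(x)=x^2+\frac{e^x}{x^2+1}$ from the first example; `velocity` is a hypothetical helper using a central difference with an arbitrarily chosen step `h`.

```python
import math

def alpha(t):
    """Graph parametrization t -> (t, f(t)) of the curve y = x^2 + e^x / (x^2 + 1)."""
    return (t, t**2 + math.exp(t) / (t**2 + 1))

def velocity(curve, t0, h=1e-6):
    """Central-difference approximation of curve'(t0), component by component."""
    plus, minus = curve(t0 + h), curve(t0 - h)
    return tuple((a - b) / (2 * h) for a, b in zip(plus, minus))

v = velocity(alpha, 0.0)
```

Both components of `v` agree with $1$ up to the discretization error, so the numerical tangent vector at $(0,1)$ is again $(1,1)^T$.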

Let $\mathbf{E}$ and $\mathbf{F}$ be two Banach spaces and $U$ an open subset of $\mathbf{E}$. A $C^p$ map $f: U \to \mathbf{F}$ is called an **immersion** at $x$ if $f'(x)$ is injective.

For example, if we take $\mathbf{E}=\mathbf{F}=\mathbb{R}=U$ and $f(x)=x^2$, then $f$ is an immersion at every point of $\mathbb{R}$ except $0$, since $f'(0)=0$ is not injective. This may lead you to Sard’s theorem.

(Theorem 2) Let $M$ be a subset of $\mathbb{R}^n$, then $M$ is a $d$-dimensional $C^p$ submanifold of $\mathbb{R}^n$ if and only if for every $x \in M$ there exists an open neighborhood $U \subset \mathbb{R}^n$ of $x$, an open neighborhood $\Omega \subset \mathbb{R}^d$ of $0$ and a $C^p$ map $g: \Omega \to \mathbb{R}^n$ such that $g(0)=x$, $g$ is an immersion at $0$, and $g$ is a homeomorphism between $\Omega$ and $M \cap U$ with the topology induced from $\mathbb{R}^n$.

This follows from the definition of manifold and should not be difficult to prove. But it is not what this blog post should cover. For a proof you can check *Differential Geometry: Manifolds, Curves, and Surfaces* by Marcel Berger and Bernard Gostiaux. The proof is located in section 2.1.

A coordinate system on a $d$-dimensional $C^p$ submanifold $M$ of $\mathbb{R}^n$ is a pair $(\Omega,g)$ consisting of an open set $\Omega \subset \mathbb{R}^d$ and a $C^p$ function $g:\Omega \to \mathbb{R}^n$ such that $g(\Omega)$ is open in $M$ and $g$ induces a homeomorphism between $\Omega$ and $g(\Omega)$.

For convenience, we say $(\Omega,g)$ is centered at $x$ if $g(0)=x$ and $g$ is an immersion at $0$. By Theorem 2 it is always possible to find such a coordinate system centered at a given point $x \in M$. The following theorem will show that we can get an easier approach to tangent vectors.

(Theorem 3) Let $\mathbf{E}$ and $\mathbf{F}$ be two finite-dimensional vector spaces, $U \subset \mathbf{E}$ an open set, $f:U \to \mathbf{F}$ a $C^1$ map, $M$ a submanifold of $\mathbf{E}$ contained in $U$ and $W$ a submanifold of $\mathbf{F}$ such that $f(M) \subset W$. Take $x \in M$ and set $y=f(x)$. If $z$ is a tangent vector to $M$ at $x$, the image $f'(x)(z)$ is a tangent vector to $W$ at $y=f(x)$.

*Proof.* Since $z$ is a tangent vector, we see there exists a curve $\alpha: J \to M$ such that $\alpha(0)=x$ and $\alpha'(0)=z$ where $J$ is an open interval containing $0$. The function $\beta = f \circ \alpha: J \to W$ is also a curve satisfying $\beta(0)=f(\alpha(0))=f(x)$ and

$$\beta'(0)=f'(\alpha(0))(\alpha'(0))=f'(x)(z),$$

which is our desired curve. $\square$

We shall show that the equivalence relation makes sense. Suppose $M$ is a $d$-submanifold of $\mathbb{R}^n$, $x \in M$ and $z$ is a tangent vector to $M$ at $x$. Let $(\Omega,g)$ be a coordinate system centered at $x$. Since $g \in C^p(\Omega;\mathbb{R}^n)$, we see $g'(0)$ is an $n \times d$ matrix, and injectivity ensures that $\operatorname{rank}(g'(0))=d$.

Every open set $\Omega \subset \mathbb{R}^d$ is a $d$-dimensional submanifold of $\mathbb{R}^d$ (of class $C^p$). Suppose now $v \in \mathbb{R}^d$ is a tangent vector to $\Omega$ at $0$ (determined by a curve $\alpha$), then by Theorem 3, $g \circ \alpha$ determines a tangent vector to $M$ at $x$, which is $z_x=g'(0)(v)$. Suppose $(\Lambda,h)$ is another coordinate system centered at $x$. If we want to obtain $z_x$ as well, we must have

$$g'(0)(v)=h'(0)(w),$$

which is equivalent to

$$w=(h^{-1}\circ g)'(0)(v)$$

for some $w \in \mathbb{R}^d$ which is the tangent vector to $\Lambda$ at $0 \in \Lambda$. *(The inverse makes sense since we implicitly restricted ourselves to $\mathbb{R}^d$.)*

However, we also have two charts by $(U,\varphi)=(g(\Omega),g^{-1})$ and $(V,\psi) = (h(\Lambda),h^{-1})$, which gives

$$[(\psi\circ\varphi^{-1})'(\varphi(x))](v)=w,$$

and this is just our equivalence relation (don’t forget that $g(0)=x$ hence $g^{-1}(x)=\varphi(x)=0$!). There we have our reason for the equivalence relation: if $(U,\varphi,v) \sim (V,\psi,w)$, then $(U,\varphi,v)$ and $(V,\psi,w)$ determine the same tangent vector, but we do not have to evaluate it manually. In general, all elements in an equivalence class represent a single vector, so the vector is (algebraically) an equivalence class. This still holds when talking about Banach manifolds since topological properties of Euclidean spaces do not play a role. The generalized proof can be implemented with little difficulty.

The tangent vectors at $x \in M$ should span a vector space (based at $x$). We do hope so, because otherwise our definition of tangent vector would be incomplete and could not even cover a trivial example (such as the ones mentioned at the beginning). We shall show, satisfyingly, that the set of tangent vectors to $M$ at $x$ (which we write $T_xM$) forms a vector space that is toplinearly isomorphic to $\mathbf{E}$, on which $M$ is modeled.

(Theorem 4)$T_xM \simeq \mathbf{E}$. In other words, $T_xM$ can be given the structure of topological vector space given by the chart.

*Proof.* Let $(U,\varphi)$ be a chart at $x$. For $v \in \mathbf{E}$, we see $(\varphi^{-1})'(\varphi(x))(v)$ is a tangent vector at $x$. On the other hand, pick $\mathbf{w} \in T_xM$, which can be represented by $(V,\psi,w)$. Then

$$v=[(\varphi\circ\psi^{-1})'(\psi(x))](w)$$

makes $(U,\varphi,v) \sim (V,\psi,w)$ uniquely, and therefore we get some $v \in \mathbf{E}$. To conclude,

which proves our theorem. Note that this does not depend on the choice of charts. $\square$

For many reasons it is not a good idea to identify $T_xM$ as $\mathbf{E}$ without mentioning the point $x$. For example we shouldn’t identify the tangent line of a curve as $x$-axis. Instead, it would be better to identify or visualize $T_xM$ as $(x,\mathbf{E})$, that is, a linear space with origin at $x$.

Now we treat *all* tangent spaces as a vector bundle. Let $M$ be a manifold of class $C^p$ with $p \geq 1$, define the tangent bundle by the disjoint union

$$T(M)=\bigsqcup_{x \in M}T_xM.$$

This is a vector bundle if we define the projection by

$$\pi: T(M) \to M,\quad \pi(z)=x \text{ whenever } z \in T_xM,$$

and we will verify it soon. First let’s see an example. Below is a visualization of the tangent bundle of $\frac{x^2}{4}+\frac{y^2}{3}=1$, denoted by red lines:

Also we can see $\pi$ maps points on the blue line to a point on the curve, which is $B$.

To show that the tangent bundle of a manifold is a vector bundle, we need to verify that it satisfies the three conditions we mentioned in the previous post. Let $(U,\varphi)$ be a chart of $M$ such that $\varphi(U)$ is open in $\mathbf{E}$, then tangent vectors can be represented by $(U,\varphi,v)$. We get a bijection

$$\tau_U: \pi^{-1}(U)=T(U) \to U \times \mathbf{E}$$

by definition of tangent vectors as equivalence classes. Let $z_x$ be a tangent vector to $U$ at $x$, then there exists some $v \in \mathbf{E}$ such that $(U,\varphi,v)$ represents $z_x$. On the other hand, for any $v \in \mathbf{E}$ and $x \in U$, $(U,\varphi,v)$ represents some tangent vector at $x$. Explicitly,

$$\tau_U(z_x)=(x,v).$$

Further we get the following diagram commutative (which establishes **VB 1**):

For **VB 2** and **VB 3** we need to check different charts. Let $(U_i,\varphi_i)$, $(U_j,\varphi_j)$ be two charts. Define $\varphi_{ji}=\varphi_j \circ \varphi_i^{-1}$ on $\varphi_i(U_i \cap U_j)$, and respectively we write $\tau_{U_i}=\tau_i$ and $\tau_{U_j}=\tau_j$. Then we get a transition mapping

One can verify that

for $x \in U_i \cap U_j$ and $v \in \mathbf{E}$. Since $D\varphi_{ji} \in C^{p-1}$ and $D\varphi_{ji}(x)$ is a toplinear isomorphism, we see

is a morphism, which goes for **VB 3**. It remains to verify **VB 2**. To do this we need a fact from Banach space theory:

If $f:U \to L(\mathbf{E},\mathbf{F})$ is a $C^k$-morphism, then the map of $U \times \mathbf{E}$ into $\mathbf{F}$ given by

$$(x,v) \mapsto f(x)v$$

is a $C^k$-morphism.

Here, we have $f(x)=\tau_{ji}(x,\cdot)$ and to conclude, $\tau_{ji}$ is a $C^{p-1}$-morphism. It is also an isomorphism since it has an inverse $\tau_{ij}$. Following the definition of manifold, we can conclude that $T(U)$ has a unique **manifold structure** such that the $\tau_i$ are morphisms (there will be a formal proof in the next post about any total space of a vector bundle). By **VB 1**, we also have $\pi=pr\circ\tau_i$ on $\pi^{-1}(U_i)$, which makes it a morphism as well. On each fiber $\pi^{-1}(x)$, we can transport the topological vector space structure of $\mathbf{E}$ by means of $\tau_{ix}$, for any $i$ such that $x$ lies in $U_i$. Since $f(x)$ is a toplinear isomorphism, the result is independent of the choice of $U_i$. **VB 2** is therefore established.

Using some fancier word, we can also say that $T:M \to T(M)$ is a **functor** from the category of $C^p$-manifolds to the category of vector bundles of class $C^{p-1}$.

If $f$ is of $L^p(\mu)$, which means $\lVert f \rVert_p=\left(\int_X |f|^p d\mu\right)^{1/p}<\infty$, or equivalently $\int_X |f|^p d\mu<\infty$, then we may say $|f|^p$ is of $L^1(\mu)$. In other words, we have a function

$$\lambda: L^p(\mu) \to L^1(\mu),\quad f \mapsto |f|^p.$$

This function does not have to be one to one due to the absolute value. But we hope this function is ‘fine’ enough; at the very least, we hope it is continuous.

Here, $f \sim g$ means that $f-g$ equals to $0$ almost everywhere with respect to $\mu$.

We still use $\varepsilon-\delta$ argument but it’s in a metric space. Suppose $(X,d_1)$ and $(Y,d_2)$ are two metric spaces and $f:X \to Y$ is a function. We say $f$ is continuous at $x_0 \in X$ if for any $\varepsilon>0$, there exists some $\delta>0$ such that $d_2(f(x_0),f(x))<\varepsilon$ whenever $d_1(x_0,x)<\delta$. Further, we say $f$ is continuous on $X$ if $f$ is continuous at every point $x \in X$.

For $1\leq p<\infty$, we already have a metric by

$$d(f,g)=\lVert f-g \rVert_p,$$

given that $d(f,g)=0$ if and only if $f \sim g$. This is complete and makes $L^p$ a Banach space. But for $0<p<1$ (yes we are going to cover that), things are very different, and there is one reason: the Minkowski inequality is reversed! In fact we have

$$\lVert |f|+|g| \rVert_p \geq \lVert f \rVert_p+\lVert g \rVert_p$$

for $0<p<1$. In fact, the $L^p$ space has too many weird things when $0<p<1$. Precisely,

For $0<p<1$, $L^p(\mu)$ is locally convex if and only if $\mu$ assumes finitely many values. (Proof.)

On the other hand, if for example $X=[0,1]$ and $\mu=m$ is the Lebesgue measure, then $L^p(\mu)$ has *no* open convex subset other than $\varnothing$ and $L^p(\mu)$ itself. However,

A topological vector space $X$ is normable if and only if its origin has a convex bounded neighbourhood. (See Kolmogorov’s normability criterion.)

Therefore $L^p(m)$ is not normable, hence not Banach.

We have gone too far. We need a metric that is fine enough.

*In this subsection we always have $0<p<1$.*

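Define

$$\Delta(f)=\int_X |f|^p d\mu$$

for $f \in L^p(\mu)$. We will show that we have a metric by

$$d(f,g)=\Delta(f-g).$$

Fix $y\geq 0$, consider the function

$$f(x)=(x+y)^p-x^p.$$

We have $f(0)=y^p$ and

$$f'(x)=p\left[(x+y)^{p-1}-x^{p-1}\right] \leq 0$$

when $x > 0$ (since $p-1<0$) and hence $f(x)$ is nonincreasing on $[0,\infty)$, which implies that

$$(x+y)^p \leq x^p+y^p.$$

Hence for any $f$, $g \in L^p$, we have

$$\Delta(f+g)=\int_X |f+g|^p d\mu \leq \int_X (|f|+|g|)^p d\mu \leq \int_X |f|^p d\mu+\int_X |g|^p d\mu=\Delta(f)+\Delta(g).$$

This inequality ensures that

$$d(f,g)=\Delta(f-g)$$

is a metric. It’s immediate that $d(f,g)=d(g,f) \geq 0$ for all $f$, $g \in L^p(\mu)$. For the triangle inequality, note that

$$d(f,h)=\Delta((f-g)+(g-h)) \leq \Delta(f-g)+\Delta(g-h)=d(f,g)+d(g,h).$$

This is translate-invariant as well since

$$d(f+h,g+h)=\Delta(f+h-g-h)=\Delta(f-g)=d(f,g).$$

The completeness can be verified in the same way as the case when $p \geq 1$. In fact, this metric makes $L^p$ a locally bounded F-space.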
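With the counting measure on a finite set, functions are just lists of numbers and $\int_X|f|^p d\mu$ is a finite sum, so both the triangle inequality for $d$ and the failure of the Minkowski inequality can be observed directly. The following is a sketch under that assumption; the function names `delta`, `dist` and `norm` are chosen for this illustration.

```python
def delta(f, p):
    """Integral of |f|^p with respect to the counting measure on a finite set."""
    return sum(abs(v) ** p for v in f)

def dist(f, g, p):
    """d(f, g) = integral of |f - g|^p, a metric for 0 < p < 1."""
    return delta([a - b for a, b in zip(f, g)], p)

def norm(f, p):
    """The quantity (integral of |f|^p)^(1/p), which is not a norm for p < 1."""
    return delta(f, p) ** (1 / p)

p = 0.5
f, g, h = [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]
f_plus_g = [a + b for a, b in zip(f, g)]
```

Here `dist(f, g, p)` equals `dist(f, h, p) + dist(h, g, p)`, so the triangle inequality holds (with equality), while `norm(f_plus_g, p)` is $4$ although `norm(f, p) + norm(g, p)` is only $2$.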

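The metric of $L^1$ is defined by

$$d_1(f,g)=\int_X |f-g| d\mu.$$

We need to find a relation between $d_p(f,g)$ and $d_1(\lambda(f),\lambda(g))$, where $d_p$ is the metric of the corresponding $L^p$ space.

As we have proved,

$$(x+y)^p \leq x^p+y^p.$$

Without loss of generality we assume $x \geq y$ and therefore

$$x^p=(x-y+y)^p \leq (x-y)^p+y^p.$$

Hence

$$x^p-y^p \leq (x-y)^p.$$

By interchanging $x$ and $y$, we get

$$|x^p-y^p| \leq |x-y|^p.$$

Replacing $x$ and $y$ with $|f|$ and $|g|$ where $f$, $g \in L^p$, we get

$$\left||f|^p-|g|^p\right| \leq \left||f|-|g|\right|^p.$$

But

$$\left||f|-|g|\right| \leq |f-g|,$$

and we therefore have

$$d_1(\lambda(f),\lambda(g))=\int_X \left||f|^p-|g|^p\right| d\mu \leq \int_X |f-g|^p d\mu=d_p(f,g).$$

Hence $\lambda$ is continuous (and in fact, Lipschitz continuous and uniformly continuous) when $0<p<1$.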
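The estimate $d_1(\lambda(f),\lambda(g)) \leq d_p(f,g)$ can be checked on a small example, again with the counting measure on a finite set so that integrals become sums. This is only a numerical sketch; `lam`, `d1` and `dp` are hypothetical names for $\lambda$, $d_1$ and $d_p$, and the sample vectors are arbitrary.

```python
def lam(f, p):
    """The map lambda: f -> |f|^p, applied pointwise."""
    return [abs(v) ** p for v in f]

def d1(u, v):
    """L^1 distance with respect to the counting measure."""
    return sum(abs(a - b) for a, b in zip(u, v))

def dp(f, g, p):
    """The metric on L^p for 0 < p < 1: integral of |f - g|^p."""
    return sum(abs(a - b) ** p for a, b in zip(f, g))

p = 0.3
f = [0.3, -2.0, 5.0]
g = [1.0, 0.5, -4.0]
lhs = d1(lam(f, p), lam(g, p))
rhs = dp(f, g, p)
```

For these sample values `lhs` is well below `rhs`, in line with $\lambda$ being Lipschitz with constant $1$.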

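It’s natural to think about Minkowski’s inequality and Hölder’s inequality in this case since they are the key inequalities. You need to think about how to create the conditions to use them and get a fine result. In this section we need to prove that, for $x,y \geq 0$ and $p \geq 1$,

$$|x^p-y^p| \leq p|x-y|(x^{p-1}+y^{p-1}).$$

This inequality is surprisingly easy to prove however. We will use nothing but the mean value theorem. Without loss of generality we assume that $x > y \geq 0$ and define $f(t)=t^p$. Then

$$\frac{f(x)-f(y)}{x-y}=f'(\zeta)=p\zeta^{p-1}$$

where $y < \zeta < x$. But since $p-1 \geq 0$, we see $\zeta^{p-1} \leq x^{p-1} \leq x^{p-1}+y^{p-1}$. Therefore

$$x^p-y^p=p\zeta^{p-1}(x-y) \leq p(x-y)(x^{p-1}+y^{p-1}).$$

For $x=y$ the equality holds.

Therefore, using $\left||f|-|g|\right| \leq |f-g|$,

$$d_1(\lambda(f),\lambda(g))=\int_X \left||f|^p-|g|^p\right| d\mu \leq p\int_X |f-g|\left(|f|^{p-1}+|g|^{p-1}\right)d\mu.$$

By *Hölder’s inequality*, we have

$$\int_X |f-g|\left(|f|^{p-1}+|g|^{p-1}\right)d\mu \leq \lVert f-g \rVert_p \left\lVert |f|^{p-1}+|g|^{p-1} \right\rVert_q.$$

By *Minkowski’s inequality*, we have

$$\left\lVert |f|^{p-1}+|g|^{p-1} \right\rVert_q \leq \left\lVert |f|^{p-1} \right\rVert_q+\left\lVert |g|^{p-1} \right\rVert_q.$$

Now things are clear. Since $1/p+1/q=1$, or equivalently $1/q=(p-1)/p$ (take $p>1$; the case $p=1$ is immediate from $\left||f|-|g|\right| \leq |f-g|$), suppose $\lVert f \rVert_p$, $\lVert g \rVert_p \leq R$, then $(p-1)q=p$ and therefore

$$\left\lVert |f|^{p-1} \right\rVert_q=\left(\int_X |f|^p d\mu\right)^{1/q}=\lVert f \rVert_p^{p-1} \leq R^{p-1},$$

and likewise for $g$. Summing the inequalities above, we get

$$d_1(\lambda(f),\lambda(g)) \leq 2pR^{p-1}\lVert f-g \rVert_p,$$

hence $\lambda$ is continuous.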
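For $p \geq 1$, combining the mean value estimate with Hölder’s and Minkowski’s inequalities suggests a local Lipschitz bound of the form $2pR^{p-1}\lVert f-g \rVert_p$ for the $L^1$ distance between $|f|^p$ and $|g|^p$, whenever $\lVert f \rVert_p, \lVert g \rVert_p \leq R$. The sketch below tests this on the counting measure of a three-point set with $p=3$; the function names and the sample vectors are assumptions made for illustration.

```python
def norm(f, p):
    """L^p norm with respect to the counting measure on a finite set."""
    return sum(abs(v) ** p for v in f) ** (1 / p)

def l1_distance_of_powers(f, g, p):
    """Integral of | |f|^p - |g|^p |, i.e. the L^1 distance of the images."""
    return sum(abs(abs(a) ** p - abs(b) ** p) for a, b in zip(f, g))

def lipschitz_bound(f, g, p):
    """The bound 2 p R^(p-1) ||f - g||_p with R = max(||f||_p, ||g||_p)."""
    R = max(norm(f, p), norm(g, p))
    diff = [a - b for a, b in zip(f, g)]
    return 2 * p * R ** (p - 1) * norm(diff, p)

p = 3
f = [1.0, -2.0, 0.5]
g = [0.5, 1.0, -1.5]
lhs = l1_distance_of_powers(f, g, p)
rhs = lipschitz_bound(f, g, p)
```

For these vectors `lhs` is $11.125$, comfortably below `rhs`; the bound is far from sharp, but continuity only needs that it tends to $0$ with $\lVert f-g \rVert_p$.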

We have proved that $\lambda$ is continuous, and when $0<p<1$, we have seen that $\lambda$ is Lipschitz continuous. It’s natural to think about its differentiability afterwards, but the absolute value function is not even differentiable so we may have no chance. But this is still a fine enough result. For example we have no restriction to $(X,\mathfrak{M},\mu)$ other than the positivity of $\mu$. Therefore we may take $\mathbb{R}^n$ as the Lebesgue measure space here, or we can take something else.

It’s also interesting how we use elementary Calculus to solve some much more abstract problems.

Direction is a considerable thing. For example take a look at this picture (by David Gunderman):

The position of the red ball and black ball shows that this triple of balls turns upside down every time they finish one round. This wouldn’t happen if this triple were on a normal band, which can be denoted by $S^1 \times (0,1)$. What would happen if we try to describe their velocity on the Möbius band, both locally and globally? There must be some significant difference from a normal band. If we set some movement pattern for the balls, for example let them run horizontally or in a zig-zag, we presumably get different *sets* of vectors. Those vectors can span some vector spaces as well.

Here and in the forthcoming posts, we will try to develop purely formally certain functorial constructions having to do with vector bundles. It may be overly generalized, but we will offer some examples to make it concrete.

Let $M$ be a manifold (of class $C^p$, where $p \geq 0$ and can be set to $\infty$) modeled on a Banach space $\mathbf{E}$. Let $E$ be another topological space and $\pi: E \to M$ a surjective $C^p$-morphism. A **vector bundle** is a topological construction associated with $M$ (base space), $E$ (total space) and $\pi$ (bundle projection) such that, roughly speaking, $E$ is locally a product of $M$ and $\mathbf{E}$.

We use $\mathbf{E}$ instead of $\mathbb{R}^n$ to include the infinite-dimensional cases, and we will try to distinguish finite-dimensional and infinite-dimensional Banach spaces here. There is a real difference: for example, an infinite-dimensional Banach space has no countable Hamel basis (this can be proved using the Baire category theorem), while a finite-dimensional one has a finite basis.

Next we will show precisely how $E$ locally becomes a product space. Let $\mathfrak{U}=(U_i)_i$ be an open covering of $M$, and for each $i$, suppose that we are *given* a mapping

$$
\tau_i:\pi^{-1}(U_i) \to U_i \times \mathbf{E}
$$

satisfying the following three conditions.

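**VB 1** $\tau_i$ is a $C^p$ diffeomorphism making the following diagram commutative:

$$
\begin{array}{ccc}
\pi^{-1}(U_i) & \xrightarrow{\;\tau_i\;} & U_i \times \mathbf{E} \\
{\scriptstyle\pi}\searrow & & \swarrow{\scriptstyle pr} \\
 & U_i &
\end{array}
$$

where $pr$ is the projection onto the first component: $(x,y) \mapsto x$. In other words, $pr \circ \tau_i=\pi$ on $\pi^{-1}(U_i)$. By restricting $\tau_i$ to the fiber over a point $x \in U_i$, we obtain an isomorphism on each fiber $\pi^{-1}(x)$:

$$
\tau_{ix}:\pi^{-1}(x) \to \{x\} \times \mathbf{E} \cong \mathbf{E}
$$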

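**VB 2** For each pair of open sets $U_i$, $U_j \in \mathfrak{U}$ and each $x \in U_i \cap U_j$, we have the map

$$
\tau_{jx} \circ \tau_{ix}^{-1}:\mathbf{E} \to \mathbf{E}
$$

to be a toplinear isomorphism (that is, it preserves the structure of $\mathbf{E}$ as a *topological* vector space).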

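**VB 3** For any two members $U_i$, $U_j \in \mathfrak{U}$, we have the following function to be a $C^p$-morphism:

$$
\varphi:U_i \cap U_j \to L(\mathbf{E},\mathbf{E}), \quad x \mapsto \tau_{jx} \circ \tau_{ix}^{-1}
$$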

**REMARKS.** As with manifolds, we call the family of pairs $(U_i,\tau_i)_i$ a **trivializing covering** of $\pi$, and the $\tau_i$ its **trivializing maps**. More precisely, for $x \in U_i$, we say that $U_i$ (or $\tau_i$) trivializes at $x$.

Two trivializing *coverings* for $\pi$ are said to be **VB-equivalent** if taken together they also satisfy conditions **VB 2** and **VB 3**. It’s immediate that **VB-equivalence** is an equivalence relation and we leave the verification to the reader. It is this VB-equivalence *class* of trivializing coverings that determines a structure of **vector bundle** on $\pi$. With respect to the Banach space $\mathbf{E}$, we say that the vector bundle has **fiber** $\mathbf{E}$, or is **modeled on** $\mathbf{E}$.

Next we give some motivation for each condition. Each pair $(U_i,\tau_i)$ determines a local product of ‘a part of the manifold’ and the model space, and on the latter we can describe directions with ease. This is what **VB 1** tells us. But that is far from enough if we want our vectors to behave well: we want the total space $E$ itself to carry the structure we need. **VB 2** ensures that two different trivializing maps give the same Banach space structure on a fiber (up to *equivalent* norms). Indeed, for each point $x \in M$, the fiber $\pi^{-1}(x)$ (the domain of $\tau_{ix}$) receives a Banach space structure from $\mathbf{E}$ via $\tau_{ix}$, and possibly another one via $\tau_{jx}$ for some other $j$; **VB 2** says that the two agree. Note that $\pi^{-1}(x) \subset E$, the total space. In fact, **VB 2** has an equivalent alternative:

**VB 2’** On each fiber $\pi^{-1}(x)$ we are given a structure of Banach space as follows. For $x \in U_i$, we have a toplinear isomorphism which is in fact the trivializing map:

$$
\tau_{ix}:\pi^{-1}(x) \to \mathbf{E}
$$

As stated, **VB 2** implies **VB 2’**. Conversely, if **VB 2’** is satisfied, then for open sets $U_i$, $U_j \in \mathfrak{U}$ and $x \in U_i \cap U_j$, the map $\tau_{jx} \circ \tau_{ix}^{-1}:\mathbf{E} \to \mathbf{E}$ is a toplinear isomorphism. Hence we can regard **VB 2** or **VB 2’** as a refinement of **VB 1**.

In the finite-dimensional case, one can omit **VB 3** since it is implied by **VB 2**, as we prove below.

(Lemma) Let $\mathbf{E}$ and $\mathbf{F}$ be two finite dimensional Banach spaces. Let $U$ be open in some Banach space. Let

$$
f:U \times \mathbf{E} \to \mathbf{F}
$$

be a $C^p$-morphism such that for each $x \in U$, the map

$$
f_x:\mathbf{E} \to \mathbf{F}
$$

given by $f_x(v)=f(x,v)$ is a linear map. Then the map of $U$ into $L(\mathbf{E},\mathbf{F})$ given by $x \mapsto f_x$ is a $C^p$-morphism.

**PROOF.** Since $L(\mathbf{E},\mathbf{F})=L(\mathbf{E},\mathbf{F_1}) \times L(\mathbf{E},\mathbf{F_2}) \times \cdots \times L(\mathbf{E},\mathbf{F_n})$ where $\mathbf{F}=\mathbf{F_1} \times \cdots \times \mathbf{F_n}$, and similarly for a decomposition of $\mathbf{E}$, by induction on the dimensions of $\mathbf{E}$ and $\mathbf{F}$ it suffices to assume that $\mathbf{E}$ and $\mathbf{F}$ are toplinearly isomorphic to $\mathbb{R}$. In that case, the function $f(x,v)$ can be written $g(x)v$ for some $g:U \to \mathbb{R}$. Since $f$ is a morphism, it is also a morphism as a function of each argument $x$ and $v$ separately. Putting $v=1$ shows that $g$ is a morphism, which settles the case when both $\mathbf{E}$ and $\mathbf{F}$ have dimension $1$, and the proof is completed by induction. $\blacksquare$

To show that **VB 3** is implied by **VB 2**, put $\mathbf{E}=\mathbf{F}$ in the lemma. Note that the second component of $\tau_j \circ \tau_i^{-1}$ maps $(U_i \cap U_j) \times \mathbf{E}$ to $\mathbf{E}$, that $U_i \cap U_j$ is open, and that for each $x \in U_i \cap U_j$, the map $(\tau_j \circ \tau_i^{-1})_x=\tau_{jx} \circ \tau_{ix}^{-1}$ is toplinear, hence linear. Then the fact that the map $\varphi$ of **VB 3** is a morphism follows from the lemma.

Let $M$ be any $n$-dimensional smooth manifold that you are familiar with. Then $pr:M \times \mathbb{R}^n \to M$ is a vector bundle. Here the total space is $M \times \mathbb{R}^n$, the base is $M$, and the bundle projection $pr$ is simply the projection onto the first factor. Intuitively, a point of the total space consists of a point $x \in M$ together with a second component that can be any direction in $\mathbb{R}^n$, hence a *vector*.

We need to verify the three conditions carefully. Let $(U_i,\varphi_i)_i$ be any atlas of $M$, and let $\tau_i$ be the identity map on $pr^{-1}(U_i)=U_i \times \mathbb{R}^n$ (which is naturally of class $C^p$). We claim that $(U_i,\tau_i)_i$ satisfies the three conditions, so that we get a vector bundle.

For **VB 1** things are clear: since $pr^{-1}(U_i)=U_i \times \mathbb{R}^n$ and $\tau_i$ is the identity, the diagram is commutative. Each fiber $pr^{-1}(x)$ is essentially $\{x\} \times \mathbb{R}^n$, and $\tau_{jx} \circ \tau_{ix}^{-1}$ is the identity map from $\{x\} \times \mathbb{R}^n$ to itself, under the same Euclidean topology, hence **VB 2** is verified. Since $\mathbb{R}^n$ is finite-dimensional, we have no need to verify **VB 3**.

First of all, imagine you have embedded a circle into a Möbius band. Now we try to give a formal definition. Using the quotient topology, $S^1$ can be defined as

$$
S^1=I/\sim_1
$$

where $I$ is the unit interval and $0 \sim_1 1$ (identifying the two ends). On the other hand, the infinite Möbius band can be defined by

$$
(I \times \mathbb{R})/\sim_2
$$

where $(0,v) \sim_2 (1,-v)$ for all $v \in \mathbb{R}$ (not only identifying the two ends of $I$ but also ‘flipping’ the vertical line). Then all we need is the natural projection onto the first component:

$$
(I \times \mathbb{R})/\sim_2 \to I/\sim_1, \quad [(t,v)] \mapsto [t]
$$

The verification differs little from that of the trivial bundle. The quotient topology on Banach spaces arises naturally in this case, but things might be troublesome if we restricted ourselves to $\mathbb{R}^n$.
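To see how this verification differs from the trivial bundle, here is a minimal Python sketch of two trivializing maps and the transition map between them. The charts `tau1` and `tau2`, the open sets $U_1=S^1\setminus\{[0]\}$ and $U_2=S^1\setminus\{[1/2]\}$, and the convention of representing a point of the band by a pair $(t,v)$ with $0 \le t<1$ are our own assumptions for illustration; they are one possible choice, not the only one.

```python
def tau1_inv(t, v):
    """Inverse of the chart over U_1 = S^1 minus [0]; valid for 0 < t < 1.

    With representatives (t, v), 0 <= t < 1, this chart is the identity.
    """
    return (t, v)


def tau2(t, v):
    """Chart over U_2 = S^1 minus [1/2]; valid for t != 1/2.

    Crossing the seam t = 0 ~ 1 flips the fiber, so we flip on one side
    of the seam to keep the chart continuous across it.
    """
    return (t, v) if t < 0.5 else (t, -v)


def transition(t, v):
    """Fiber component of tau2 o tau1^{-1} at a point t of U_1 ∩ U_2."""
    return tau2(*tau1_inv(t, v))[1]


for t in (0.25, 0.75):
    print(t, transition(t, 1.0))
```

On each of the two components of $U_1 \cap U_2$ the transition map is $v \mapsto v$ or $v \mapsto -v$, a linear isomorphism of $\mathbb{R}$ that is locally constant in $t$, which is what **VB 2** asks for. The sign change on one component is what distinguishes this bundle from the trivial one.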

The first example is relatively rare in many senses. By $S^n$ we mean the unit sphere in $\mathbb{R}^{n+1}$:

$$
S^n=\{\mathbf{x} \in \mathbb{R}^{n+1}:\lVert \mathbf{x} \rVert=1\}
$$

and the tangent bundle can be defined by

$$
TS^n=\{(\mathbf{x},\mathbf{y}):\langle \mathbf{x},\mathbf{y} \rangle=0\}
$$

where, of course, $\mathbf{x} \in S^n$ and $\mathbf{y} \in \mathbb{R}^{n+1}$. The vector bundle is given by $pr:TS^n \to S^n$ where $pr$ is the projection onto the first factor. This total space is of course much finer than $M \times \mathbb{R}^n$ in the first example. Each point in the manifold is now associated with a *tangent space* $T_x(M)$ at this point.

More generally, we can define it in any Hilbert space $H$, for example, an $L^2$ space:

$$
TS=\{(x,y):\langle x,y \rangle=0\}
$$

where

$$
x \in S=\{x \in H:\lVert x \rVert=1\}, \quad y \in H.
$$

The projection is natural:

$$
\pi:TS \to S, \quad (x,y) \mapsto x.
$$

But we will not cover the verification in this post, since it requires the abstract definition of tangent vectors. This will be done in a following post.

We want to study those ‘vectors’ associated to a manifold both globally and locally. For example, we may want to describe the tangent line of a curve at some point without relying heavily on elementary calculus. We may also want to describe the vector bundle of a manifold globally: when is it trivial? Can we classify the manifold using the behavior of the bundle? Can we make it a little more abstract, for example by considering isomorphism classes of bundles? How does one bundle *transform* into another? But to do this we need a large number of definitions and propositions.

We can define several relations between two norms. Suppose we have a vector space $X$ and two norms $\lVert \cdot \rVert_1$ and $\lVert \cdot \rVert_2$ on it. One says $\lVert \cdot \rVert_1$ is *weaker* than $\lVert \cdot \rVert_2$ if there is $K>0$ such that $\lVert x \rVert_1 \leq K \lVert x \rVert_2$ for all $x \in X$. Two norms are *equivalent* if each is weaker than the other (trivially this is an equivalence relation). The idea of stronger and weaker norms is related to the idea of “finer” and “coarser” topologies in the setting of topological spaces.
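As a concrete instance of this definition, the following Python sketch checks numerically that the $1$-norm and the $2$-norm on $\mathbb{R}^n$ are equivalent, using the standard constants $\lVert x \rVert_2 \leq \lVert x \rVert_1 \leq \sqrt{n}\lVert x \rVert_2$. The function names and the random sampling are our own choices for illustration; a numerical check is of course not a proof.

```python
import math
import random


def norm1(x):
    """Sum of absolute values."""
    return sum(abs(t) for t in x)


def norm2(x):
    """Euclidean norm."""
    return math.sqrt(sum(t * t for t in x))


def check_equivalence(dim, trials=1000, seed=0):
    """Check norm2 <= norm1 <= sqrt(dim) * norm2 on random vectors."""
    rng = random.Random(seed)
    K = math.sqrt(dim)
    for _ in range(trials):
        x = [rng.uniform(-10, 10) for _ in range(dim)]
        n1, n2 = norm1(x), norm2(x)
        # Small tolerance for floating point rounding.
        if not (n2 <= n1 + 1e-12 and n1 <= K * n2 + 1e-12):
            return False
    return True


print(check_equivalence(5))
```

Here each norm is weaker than the other with constants $K=1$ and $K=\sqrt{n}$ respectively. In infinite dimensions no such pair of constants needs to exist, which is why the question below is not trivial.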

So what about convergence and limits? Unsurprisingly this can be verified with elementary $\epsilon-N$ arguments. Suppose the two norms are equivalent and $\lVert x_n - x \rVert_1 \to 0$ as $n \to \infty$. Taking $K>0$ such that $\lVert \cdot \rVert_2 \leq K\lVert \cdot \rVert_1$, for any $\varepsilon>0$ we immediately have

$$
\lVert x_n-x \rVert_2 \leq K\lVert x_n-x \rVert_1<K\varepsilon
$$

for all large enough $n$. Hence $\lVert x_n - x \rVert_2 \to 0$ as well. But what about the converse? We give a new relation between norms.

(Definition) Two norms $\lVert \cdot \rVert_1$ and $\lVert \cdot \rVert_2$ of a topological vector space are **compatible** if, given that $\lVert x_n - x \rVert_1 \to 0$ and $\lVert x_n - y \rVert_2 \to 0$ as $n \to \infty$, we have $x=y$.

By the uniqueness of limits, we see that if two norms are equivalent, then they are compatible. And surprisingly, with the help of the closed graph theorem we will discuss in this post, we have

(Theorem 1) If $\lVert \cdot \rVert_1$ and $\lVert \cdot \rVert_2$ are compatible, and both $(X,\lVert\cdot\rVert_1)$ and $(X,\lVert\cdot\rVert_2)$ are Banach, then $\lVert\cdot\rVert_1$ and $\lVert\cdot\rVert_2$ are equivalent.

This result looks natural but is not easy to prove at first sight, since one finds no obvious way to build a bridge between a statement about limits and a general inequality. But before that, we need to explain some terminology.

(Definition) For $f:X \to Y$, the **graph** of $f$ is defined by

$$
G(f)=\{(x,f(x)):x \in X\}
$$

If both $X$ and $Y$ are topological spaces, and the topology of $X \times Y$ is the usual one, that is, the smallest topology that contains all sets $U \times V$ where $U$ and $V$ are open in $X$ and $Y$ respectively, and if $f: X \to Y$ is continuous, it is natural to expect $G(f)$ to be closed. For example, by taking $f(x)=x$ and $X=Y=\mathbb{R}$, one would expect the diagonal line of the plane to be closed.

(Definition) The topological vector space $(X,\tau)$ is an $F$-space if $\tau$ is induced by a complete invariant metric $d$. Here invariant means that $d(x+z,y+z)=d(x,y)$ for all $x,y,z \in X$.

A Banach space is easily verified to be an $F$-space by defining $d(x,y)=\lVert x-y \rVert$.

(Open mapping theorem) See this post

By definition of closed set, we have a practical criterion on whether $G(f)$ is closed.

(Proposition 1) $G(f)$ is closed if and only if, for any sequence $(x_n)$ such that the limits

$$
x=\lim_{n \to \infty}x_n \quad \text{and} \quad y=\lim_{n \to \infty}f(x_n)
$$

exist, we have $y=f(x)$.

In this case, we say $f$ is closed. For continuous functions, things are trivial.

(Proposition 2) If $X$ and $Y$ are two topological spaces and $Y$ is Hausdorff, and $f:X \to Y$ is continuous, then $G(f)$ is closed.

*Proof.* Let $G^c$ be the complement of $G(f)$ with respect to $X \times Y$. Fix $(x_0,y_0) \in G^c$; then $y_0 \neq f(x_0)$. By the Hausdorff property of $Y$, there exist open subsets $U \subset Y$ and $V \subset Y$ such that $y_0 \in U$, $f(x_0) \in V$ and $U \cap V = \varnothing$. Since $f$ is continuous, $W=f^{-1}(V)$ is open in $X$. We have obtained an open neighborhood $W \times U$ of $(x_0,y_0)$ which has empty intersection with $G(f)$: if $x \in W$ then $f(x) \in V$, so $f(x) \notin U$. That is to say, every point of $G^c$ has an open neighborhood contained in $G^c$, hence is an interior point. Therefore $G^c$ is open, which is to say that $G(f)$ is closed. $\square$

**REMARKS.** For $X \times Y=\mathbb{R} \times \mathbb{R}$, we have a simple visualization. For $\varepsilon>0$, there exists some $\delta>0$ such that $|f(x)-f(x_0)|<\varepsilon$ whenever $|x-x_0|<\delta$. For $y_0 \neq f(x_0)$, pick $\varepsilon$ such that $0<\varepsilon<\frac{1}{2}|f(x_0)-y_0|$; then we have two boxes ($CDEF$ and $GHJI$ in the picture), namely

$$
B_1=\{(x,y):|x-x_0|<\delta,\ |y-f(x_0)|<\varepsilon\}
$$

and

$$
B_2=\{(x,y):|x-x_0|<\delta,\ |y-y_0|<\varepsilon\}
$$

In this case, $B_2$ will not intersect the graph of $f$, hence $(x_0,y_0)$ is an interior point of $G^c$.

The Hausdorff property of $Y$ is not removable. To see this, since $X$ has no restriction, it suffices to take a look at $X \times X$. Let $f$ be the identity map (which is continuous); then the graph

$$
G(f)=\{(x,x):x \in X\}
$$

is the diagonal. Suppose $X$ is not Hausdorff; we show that $G(f)$ is not closed. By definition, there exist distinct $x$ and $y$ such that every neighborhood $U$ of $x$ intersects every neighborhood $V$ of $y$. Pick such $(x,y) \in G^c$. Every neighborhood of $(x,y)$ in $X \times X$ contains a box $U \times V$ of this kind, and hence a point $(z,z)$ with $z \in U \cap V$. So $(x,y) \in G^c$ is *not* an interior point of $G^c$, hence $G^c$ is not open.
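The argument above can be tested on small finite spaces. The sketch below is our own illustration (the helper `diagonal_is_closed` and the three example topologies are assumptions, not part of the original argument): a point $(x,y)$ off the diagonal is interior to $G^c$ exactly when some basic box $U \times V$ around it misses the diagonal, that is, when $U \cap V=\varnothing$.

```python
from itertools import product


def diagonal_is_closed(points, open_sets):
    """Return True if the diagonal is closed in X x X (product topology)."""
    for x, y in product(points, repeat=2):
        if x == y:
            continue
        # (x, y) is interior to the complement of the diagonal iff some
        # box U x V containing it misses the diagonal, i.e. U ∩ V is empty.
        separated = any(
            x in U and y in V and not (U & V)
            for U in open_sets
            for V in open_sets
        )
        if not separated:
            return False
    return True


X = frozenset({"a", "b"})
discrete = [frozenset(), frozenset({"a"}), frozenset({"b"}), X]
sierpinski = [frozenset(), frozenset({"a"}), X]
indiscrete = [frozenset(), X]

print(diagonal_is_closed(X, discrete))
print(diagonal_is_closed(X, sierpinski))
print(diagonal_is_closed(X, indiscrete))
```

Only the discrete topology is Hausdorff here, and it is the only one of the three for which the graph of the identity map is closed.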

Also, as an immediate consequence, every affine algebraic variety in $\mathbb{C}^n$ and $\mathbb{R}^n$ is closed with respect to the Euclidean topology. Further, we have the Zariski topology $\mathcal{Z}$, defined by declaring that $V^c \in \mathcal{Z}$ whenever $V$ is an affine algebraic variety. It’s worth noting that $\mathcal{Z}$ is *not* Hausdorff (example?) and is in fact much coarser than the Euclidean topology, although an affine algebraic variety is closed in both the Zariski topology and the Euclidean topology.

After we have proved this theorem, we are able to prove the theorem about compatible norms. We shall assume that both $X$ and $Y$ are $F$-spaces, since the norm plays no critical role here. This offers a greater variety but shall not be considered as an abuse of abstraction.

(The Closed Graph Theorem) Suppose

(a) $X$ and $Y$ are $F$-spaces,

(b) $f:X \to Y$ is linear,

(c) $G(f)$ is closed in $X \times Y$.

Then $f$ is continuous.

In short, the closed graph theorem gives a sufficient condition to claim the continuity of $f$ (keep in mind, linearity does not imply continuity). If $f:X \to Y$ is continuous, then $G(f)$ is closed; if $G(f)$ is closed and $f$ is linear, then $f$ is continuous.

*Proof.* First of all we make $X \times Y$ an $F$-space by equipping it with addition, scalar multiplication and a metric. Addition and scalar multiplication are defined componentwise, as one would expect:

$$
\alpha(x_1,y_1)+\beta(x_2,y_2)=(\alpha x_1+\beta x_2,\alpha y_1+\beta y_2)
$$

The metric can be defined without extra effort:

$$
d((x_1,y_1),(x_2,y_2))=d_X(x_1,x_2)+d_Y(y_1,y_2)
$$

where $d_X$ and $d_Y$ are the invariant metrics of $X$ and $Y$. Then it can be verified that $d$ is a complete translation-invariant metric compatible with the product topology, so $X \times Y$ is an $F$-space. (Potentially the verifications will be added in the future but it’s recommended to do it yourself.)

Since $f$ is linear, the graph $G(f)$ is a subspace of $X \times Y$. Next we quote an elementary result in point-set topology: a subset of a complete metric space is closed if and only if it is complete. Since $G(f)$ is closed and $d$ is translation-invariant, we see $G(f)$ is an $F$-space as well. Let $p_1: X \times Y \to X$ and $p_2: X \times Y \to Y$ be the natural projections (for example, $p_1(x,y)=x$). Our proof is done by verifying the properties of $p_1$ and $p_2$ on $G(f)$.

*For simplicity one can simply define $p_1$ on $G(f)$ instead of the whole space $X \times Y$, but we make it a global projection on purpose to emphasize the difference between global properties and local properties. One can also write $p_1|_{G(f)}$ to dodge confusion.*

**Claim 1.** $p_1$ (with restriction on $G(f)$) defines an isomorphism between $G(f)$ and $X$.

For $x \in X$, we see $p_1(x,f(x)) = x$ (surjectivity). If $p_1(x,f(x))=0$, we see $x=0$ and therefore $(x,f(x))=(0,0)$, hence the restriction of $p_1$ on $G$ has trivial kernel (injectivity). Further, it’s trivial that $p_1$ is linear.

**Claim 2.** $p_1$ is continuous on $G(f)$.

Suppose $(x_n,f(x_n)) \to (x,f(x))$ in $G(f)$. Convergence in $X \times Y$ implies convergence in each coordinate, so $x_n \to x$, that is, $\lim_{n \to \infty}p_1(x_n,f(x_n))=x=p_1(x,f(x))$. Note that this uses nothing about $f$; in particular we do not assume that $f$ is continuous. The continuity of $p_1$ is proved.

**Claim 3.** $p_1$ is a homeomorphism with restriction on $G(f)$.

We already know that $G(f)$ is an $F$-space, and so is $X$. The image $p_1(G(f))=X$ is of the second category in $X$ (since $X$ is an $F$-space, this follows from the Baire category theorem), and $p_1$ is continuous and linear on $G(f)$. By the open mapping theorem, $p_1$ is an open mapping on $G(f)$. Being also bijective and continuous, it is a homeomorphism.

**Claim 4.** $p_2$ is continuous.

This follows the same way as the proof of claim 2 but much easier since we have no need to care about $f$.

Now things are immediate once one realizes that $f=p_2 \circ p_1|_{G(f)}^{-1}$, and hence $f$ is continuous. $\square$

Before we go for theorem 1 at the beginning, we drop an application on Hilbert spaces.

Let $T$ be a bounded operator on the Hilbert space $L_2([0,1])$ so that if $\phi \in L_2([0,1])$ is a continuous function so is $T\phi$. Then the restriction of $T$ to $C([0,1])$ is a bounded operator of $C([0,1])$.

For details please check this.

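Now we go for the identification of norms. Define

$$
\begin{aligned}
f:(X,\lVert\cdot\rVert_1) &\to (X,\lVert\cdot\rVert_2) \\
x &\mapsto x
\end{aligned}
$$

i.e. the identity map between two Banach spaces (hence $F$-spaces). Then $f$ is linear. We need to prove that $G(f)$ is closed. For a sequence $(x_n)$ with

$$
\lim_{n \to \infty}\lVert x_n-x \rVert_1=0, \quad \lim_{n \to \infty}\lVert f(x_n)-y \rVert_2=\lim_{n \to \infty}\lVert x_n-y \rVert_2=0,
$$

we have, since the two norms are compatible,

$$
y=x=f(x).
$$

Hence $G(f)$ is closed. Therefore $f$ is continuous, hence bounded, and we have some $K$ such that

$$
\lVert x \rVert_2 \leq K\lVert x \rVert_1.
$$

By defining

$$
\begin{aligned}
g:(X,\lVert\cdot\rVert_2) &\to (X,\lVert\cdot\rVert_1) \\
x &\mapsto x
\end{aligned}
$$

we see by the same argument that $g$ is continuous as well, hence we have some $K'$ such that

$$
\lVert x \rVert_1 \leq K'\lVert x \rVert_2.
$$

Hence each of the two norms is weaker than the other.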

Since there is no strong reason to write more posts on this topic, i.e. the three fundamental theorems of linear functional analysis, I think it’s time to make a list of the series. It’s been around half a year.

- The Big Three Pt. 1 - Baire Category Theorem Explained
- The Big Three Pt. 2 - The Banach-Steinhaus Theorem
- The Big Three Pt. 3 - The Open Mapping Theorem (Banach Space)
- The Big Three Pt. 4 - The Open Mapping Theorem (F-Space)
- The Big Three Pt. 5 - The Hahn-Banach Theorem (Dominated Extension)
- The Big Three Pt. 6 - Closed Graph Theorem with Applications

- Walter Rudin, *Functional Analysis*
- Peter Lax, *Functional Analysis*
- Jesús Gil de Lamadrid, *Some Simple Applications of the Closed Graph Theorem*

Partitions of unity build a bridge between local properties and global properties. A nice example is Stokes’ theorem on manifolds.

Suppose $\omega$ is an $(n-1)$-form with compact support on an oriented manifold $M$ of dimension $n$. If $\partial{M}$ is given the induced orientation, then

$$
\int_M d\omega=\int_{\partial M}\omega
$$

This theorem can be proved in two steps. First, by Fubini’s theorem, one proves the identity on $\mathbb{R}^n$ and $\mathbb{H}^n$. Second, for the general case, let $(U_\alpha)$ be an oriented atlas for $M$ and $(\rho_\alpha)$ a partition of unity subordinate to $(U_\alpha)$; one naturally writes $\omega=\sum_{\alpha}\rho_\alpha\omega$. Since both sides of $\int_M d\omega=\int_{\partial M}\omega$ are linear with respect to $\omega$, it suffices to prove it only for $\rho_\alpha\omega$. Note that the support of $\rho_\alpha\omega$ is contained in the intersection of the supports of $\rho_\alpha$ and $\omega$, hence is a compact set.

On the other hand, since $U_\alpha$ is diffeomorphic to either $\mathbb{R}^n$ or $\mathbb{H}^n$, it is immediate that

$$
\int_M d(\rho_\alpha\omega)=\int_{U_\alpha}d(\rho_\alpha\omega)=\int_{\partial U_\alpha}\rho_\alpha\omega=\int_{\partial M}\rho_\alpha\omega
$$

which furnishes the proof for the general case.

As is seen, to prove a global thing, we do it locally. If you have trouble with this terminology, never mind. We will go through it right now (in a more abstract way, however). If you are already familiar with it, feel free to skip.

Throughout, we use bold letters like $\mathbf{E}$, $\mathbf{F}$ to denote Banach spaces. We will treat Euclidean spaces as a special case rather than as a restriction. Since Banach spaces are not necessarily of finite dimension, this approach can be troublesome, but the benefit is a better view of the abstraction.

Let $X$ be a set. An **atlas of class $C^p$** ($p \geq 0$) on $X$ is a collection of pairs $(U_i,\varphi_i)$, where $i$ ranges through some indexing set, satisfying the following conditions:

**AT 1.** Each $U_i$ is a subset of $X$ and $\bigcup_{i}U_i=X$.

**AT 2.** Each $\varphi_i$ is a bijection of $U_i$ onto an open subset $\varphi_iU_i$ of some Banach space $\mathbf{E}_i$, and for any $i$ and $j$, $\varphi_i(U_i \cap U_j)$ is open in $\mathbf{E}_i$.

**AT 3.** The map

$$
\varphi_j \circ \varphi_i^{-1}:\varphi_i(U_i \cap U_j) \to \varphi_j(U_i \cap U_j)
$$

is a $C^p$-isomorphism for all $i$ and $j$.

One should be advised that isomorphism here does not come from group theory, but from category theory. Precisely speaking, it’s the isomorphism in the category $\mathfrak{O}$ whose objects are the open subsets of Banach spaces and whose morphisms are the maps of class $C^p$.

Also, there is a unique topology on $X$ such that each $U_i$ is open and each $\varphi_i$ is a topological isomorphism. We see no need to assume that $X$ is Hausdorff unless we start with Hausdorff spaces. Lifting this restriction gives us more freedom (though sometimes also more difficulty).

For condition **AT 2**, we did not require that the vector spaces be the same for all indexes $i$, or even that they be toplinearly isomorphic. If they are all equal to the same space $\mathbf{E}$, then we say that the atlas is an $\mathbf{E}$-atlas.

Suppose that we are given an open subset $U$ of $X$ and a topological isomorphism $\varphi:U \to U'$ onto an open subset of some Banach space $\mathbf{E}$. We shall say that $(U,\varphi)$ is **compatible** with the atlas $(U_i,\varphi_i)_i$ if each map $\varphi_i\circ\varphi^{-1}$ (defined on $\varphi(U \cap U_i)$) is a $C^p$-isomorphism. Two atlases are said to be **compatible** if each chart of one is compatible with the other atlas. It can be verified that this is an equivalence relation. *An equivalence class of atlases of class $C^p$ on $X$ is said to define a structure of $C^p$-manifold on $X$.* If all the vector spaces $\mathbf{E}_i$ in some atlas are toplinearly isomorphic, we can find some universal $\mathbf{E}$ that is equal to all of them. In this case, we say $X$ is an $\mathbf{E}$-manifold or that $X$ is modeled on $\mathbf{E}$.

As we know, $\mathbb{R}^n$ is a Banach space. If $\mathbf{E}=\mathbb{R}^n$ for some fixed $n$, then we say that the manifold is $n$-dimensional. Also we have the **local coordinates**. A chart

$$
\varphi:U \to \mathbb{R}^n
$$

is given by $n$ coordinate functions $\varphi_1,\cdots,\varphi_n$. If $P$ denotes a point of $U$, these functions are often written

$$
x_1(P),\cdots,x_n(P)
$$

or simply $x_1,\cdots,x_n$.

Let $X$ be a topological space. A covering $\mathfrak{U}$ of $X$ is **locally finite** if every point $x$ has a neighborhood $U$ such that all but a finite number of members of $\mathfrak{U}$ do not intersect $U$ (as you will see, this prevents some nonsensical summations). A **refinement** of a covering $\mathfrak{U}$ is a covering $\mathfrak{U}'$ such that for any $U' \in \mathfrak{U}'$, there exists some $U \in \mathfrak{U}$ such that $U' \subset U$. If we write $\mathfrak{U} \leq \mathfrak{U}'$ in this case, we see that the set of open covers of a topological space forms a *directed set*.

A topological space is **paracompact** if it is Hausdorff, and every open covering has a locally finite open refinement. Here are some examples of paracompact spaces.

- Any compact Hausdorff space.
- Any CW complex.
- Any metric space (hence $\mathbb{R}^n$).
- Any regular Hausdorff Lindelöf space.
- Any locally compact Hausdorff $\sigma$-compact space.

These are not too difficult to prove, and one can easily find proofs on the Internet. Below are several key properties of paracompact spaces.

If $X$ is paracompact, then $X$ is normal. (Proof here)

Let $X$ be a paracompact (hence normal) space and $\mathfrak{U}=(U_i)$ a locally finite open cover, then there exists a locally finite open covering $\mathfrak{V}=(V_i)$ such that $\overline{V_i} \subset U_i$. (Proof here. Note the axiom of choice is assumed.)

One can find proofs of the following propositions in *Elements of Mathematics, General Topology, Chapter 1-4* by N. Bourbaki. It’s interesting to compare them to the corresponding ones for compact spaces.

Every closed subspace $F$ of a paracompact space $X$ is paracompact.

The product of a paracompact space and a compact space is paracompact.

Let $X$ be a locally compact paracompact space. Then every open covering $\mathfrak{R}$ of $X$ has a locally finite open refinement $\mathfrak{R}'$ formed of relatively compact sets. If $X$ is $\sigma$-compact then $\mathfrak{R}'$ can be taken to be countable.

A **partition of unity** (of class $C^p$) on a manifold $X$ consists of an open covering $(U_i)$ of $X$ and a family of functions

$$
\psi_i:X \to \mathbb{R}
$$

satisfying the following conditions:

**PU 1.** For all $x \in X$ we have $\psi_i(x) \geq 0$.

**PU 2.** The support of $\psi_i$ is contained in $U_i$.

**PU 3.** The covering is locally finite.

**PU 4.** For each point $x \in X$ we have

$$
\sum_i \psi_i(x)=1
$$

The sum in PU 4 makes sense because for a given point $x$, there are only finitely many $i$ such that $\psi_i(x) >0$, according to PU 3.
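To make PU 1 through PU 4 concrete, here is a small Python sketch of a smooth partition of unity on $\mathbb{R}$ for the covering $U_i=(i-1,i+1)$, $i \in \mathbb{Z}$. The bump function, the radius $3/4$ and the normalization $\psi_i=b_i/\sum_j b_j$ are our own choices for illustration; it is a sketch under these assumptions rather than the construction used in the theorems below.

```python
import math

RADIUS = 0.75  # support radius of each bump; must be < 1 so that PU 2 holds


def bump(t):
    """Smooth function, positive on (-1, 1) and zero elsewhere."""
    if abs(t) >= 1:
        return 0.0
    return math.exp(-1.0 / (1.0 - t * t))


def psi(i, x):
    """i-th function of a partition of unity subordinate to U_i = (i-1, i+1)."""
    # Local finiteness (PU 3): only integers j with |x - j| < RADIUS matter,
    # and they all lie in the short range below.
    lo = math.floor(x) - 1
    total = sum(bump((x - j) / RADIUS) for j in range(lo, lo + 4))
    return bump((x - i) / RADIUS) / total


x = 0.3
print(sum(psi(i, x) for i in range(-5, 6)))
```

Each $\psi_i$ is nonnegative and supported in $[i-3/4,i+3/4] \subset U_i$, and at every point at most two of them are nonzero, so the sum in PU 4 is a finite sum.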

A manifold $X$ will be said to **admit partition of unity** if it is paracompact, and if, given a locally finite open covering $(U_i)$, there exists a partition of unity $(\psi_i)$ such that the support of $\psi_i$ is contained in $U_i$.

The following function will be useful when dealing with the finite-dimensional case.

For every integer $n \geq 1$ and every real number $\delta>0$ there exist maps $\psi_n \in C^{\infty}(\mathbb{R}^n;\mathbb{R})$ which equal $1$ on $B(0,1)$ and vanish in $\mathbb{R}^n\setminus B(0,1+\delta)$.

*Proof.* It suffices to prove it for $\mathbb{R}$, since once we have proved the existence of $\psi_1$, we may write

$$
\psi_n(x)=\psi_1(\lVert x \rVert)
$$

Consider the function $\phi: \mathbb{R} \to \mathbb{R}$ defined by

$$
\phi(x)=\begin{cases} \exp\left(\dfrac{1}{(x-a)(x-b)}\right), & a<x<b, \\ 0, & \text{otherwise}. \end{cases}
$$

The reader may have seen it in some analysis course and should be able to check that $\phi \in C^{\infty}(\mathbb{R};\mathbb{R})$. Integrating $\phi$ from $-\infty$ to $x$ and dividing by $\lVert \phi \rVert_1$ (you may have done this in probability theory), we obtain

$$
\theta(x)=\frac{\int_{-\infty}^{x}\phi(t)dt}{\int_{-\infty}^{\infty}\phi(t)dt}
$$

it is immediate that $\theta(x)=0$ for $x \leq a$ and $\theta(x)=1$ for $x \geq b$. By taking $a=1$ and $b=(1+\delta)^2$, our job is done by letting $\psi_1(x)=1-\theta(x^2)$. Considering $x^2=|x|^2$, one sees that the identity about $\psi_n$ and $\psi_1$ is redundant. $\square$
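The construction can be checked numerically. The sketch below assumes the usual choice $\phi(t)=\exp\left(\frac{1}{(t-a)(t-b)}\right)$ on $(a,b)$ and $0$ elsewhere, and approximates the integrals with a composite Simpson rule; the function names and the number of steps are our own choices, so the values are only approximations of $\theta$ and $\psi_1$.

```python
import math


def phi(t, a, b):
    """Smooth bump supported on [a, b] (assumed form, see the lead-in)."""
    if t <= a or t >= b:
        return 0.0
    return math.exp(1.0 / ((t - a) * (t - b)))


def integrate(f, lo, hi, steps=2000):
    """Composite Simpson rule with an even number of steps."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3


def theta(x, a, b):
    """Normalized integral of phi: 0 for x <= a, 1 for x >= b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    f = lambda t: phi(t, a, b)
    return integrate(f, a, x) / integrate(f, a, b)


def psi1(x, delta):
    """Cut-off function: 1 on [-1, 1], 0 outside (-(1+delta), 1+delta)."""
    return 1.0 - theta(x * x, 1.0, (1.0 + delta) ** 2)


for x in (0.5, 1.0, 1.25, 1.5, 2.0):
    print(x, psi1(x, 0.5))
```

With $\delta=0.5$ the function equals $1$ up to $|x|=1$, decreases on $1<|x|<1.5$, and vanishes from $|x|=1.5$ on.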

In the following blog posts, we will generalize this to Hilbert spaces.

Of course this is desirable. But we will give an example showing that sometimes we cannot find a satisfactory partition of unity.

Let $D$ be a connected bounded open set in $\ell^p$ where $p$ is not an even integer. Assume $f$ is a real-valued function, continuous on $\overline{D}$ and $n$-times differentiable in $D$ with $n \geq p$. Then $f(\overline{D}) \subset \overline{f(\partial D)}$.

(Corollary) Let $f$ be an $n$-times differentiable function on $\ell^p$ space, where $n \geq p$, and $p$ is not an even integer. If $f$ has its support in a bounded set, then $f$ is identically zero.

It follows that for $n \geq p$, $C^n$ partitions of unity do not exist whenever $p$ is not an even integer. For example, $\ell^1[0,1]$ does not have a $C^2$ partition of unity. It is then our duty to find out under what conditions the desired partition of unity is available.

Below are two theorems about the existence of partitions of unity. We are not proving them here but in a future blog post, since that would be rather long. The restrictions on $X$ are acceptable. For example, $\mathbb{R}^n$ is locally compact, and hence so is any manifold modeled on $\mathbb{R}^n$.

Let $X$ be a manifold which is locally compact Hausdorff and whose topology has a countable base. Then $X$ admits partitions of unity.

Let $X$ be a paracompact manifold of class $C^p$, modeled on a separable Hilbert space $E$. Then $X$ admits partitions of unity (of class $C^p$).

- N. Bourbaki, *Elements of Mathematics*
- S. Lang, *Fundamentals of Differential Geometry*
- M. Berger, *Differential Geometry: Manifolds, Curves, and Surfaces*
- R. Bonic and J. Frampton, *Differentiable Functions on Certain Banach Spaces*

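For the $\Gamma$ function, we have a classical limit formula (see ProofWiki for a proof).

$$
\lim_{x \to \infty}\frac{\Gamma(x+1)}{(x/e)^x\sqrt{2\pi x}}=1
$$

With this formula we can immediately compute some limits that are otherwise hard to evaluate. Note that, written for natural numbers, the formula reads

$$
\lim_{n \to \infty}\frac{n!}{(n/e)^n\sqrt{2\pi n}}=1
$$

so we can immediately compute this limit:

But Stirling’s formula offers more than that. In this post we will see several classical estimates.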

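The conclusion we will reach in this section is

If you evaluate the numbers on the right-hand side with a calculator, you will find that $\phi_n=\frac{n!}{(n/e)^n\sqrt{2\pi n}}$ always stays close to $1$.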
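Instead of a calculator, one can tabulate $\phi_n$ with a few lines of Python. This is only a numerical illustration; the function name `stirling_ratio` is ours, and we work with logarithms (through `math.lgamma`) so that large factorials do not overflow.

```python
import math


def stirling_ratio(n):
    """phi_n = n! / ((n/e)^n * sqrt(2*pi*n)), computed through logarithms."""
    log_ratio = (
        math.lgamma(n + 1)                # ln(n!)
        - (n * math.log(n) - n)           # ln((n/e)^n)
        - 0.5 * math.log(2 * math.pi * n)
    )
    return math.exp(log_ratio)


for n in (1, 2, 5, 10, 100, 1000):
    print(n, stirling_ratio(n))
```

The values start at $\phi_1=e/\sqrt{2\pi}$, stay above $1$, and approach $1$ as $n$ grows.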

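For $m=1,2,3,\dots$, define a piecewise linear function below $y=\ln(x)$:

$$
f(x)=(m+1-x)\ln m+(x-m)\ln(m+1)
$$

where $m \leq x \leq m+1$. Define another piecewise linear function above it:

$$
g(x)=\frac{x}{m}-1+\ln m
$$

where $m-1/2 \leq x < m+1/2$. If we draw the graphs of $f$, $\ln{x}$ and $g$, we see that $f$ and $g$ approximate $\ln{x}$, and for $x \geq 1$ we have

$$
f(x) \leq \ln x \leq g(x)
$$

So when computing definite integrals we have

$$
\int_1^n f(x)dx \leq \int_1^n \ln x dx \leq \int_1^n g(x)dx
$$

But the relation between $f$ and $g$ is not that simple. Computing the integral of $f$, we find

$$
\int_1^n f(x)dx=\ln(n!)-\frac{1}{2}\ln n
$$

while for $g$ we have

$$
\int_1^n g(x)dx=\ln(n!)-\frac{1}{2}\ln n+\frac{1}{8}-\frac{1}{8n}
$$

This shows that

Summarizing the inequalities above, we get, for $n>1$:

Subtracting $\int_1^n \ln x dx$ from every term of the inequality, we further have

By Stirling’s formula we know that

The sequence $x_n=-\frac{1}{8n}+\ln(n!)-(\frac{1}{2}+n)\ln{n}+n$ is monotonically increasing and, by the above, converges to $\ln\sqrt{2\pi}$. On the left-hand side of the inequality we take the supremum $\ln\sqrt{2\pi}$. On the right-hand side we take the infimum $x_1+\frac{1}{8}=1$. This gives us

which in turn leads to

This holds for all $n =1,2,3,\dots$.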

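For any $c \in \mathbb{R}$, we have

$$
\lim_{x \to \infty}\frac{\Gamma(x+c)}{x^c\Gamma(x)}=1
$$

This can be read as follows: after shifting $\Gamma(x)$ to the left by $c$, its value is close to $x^c\Gamma(x)$ when $x$ is large enough. The proof of this identity is also fairly simple, although the computation is tedious; we only need Stirling’s formula.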
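Before going through the computation, a quick numerical check may help. The sketch below assumes the limit in question is $\Gamma(x+c)/(x^c\Gamma(x)) \to 1$ and evaluates the ratio through `math.lgamma` to avoid overflow; the function name `shift_ratio` is ours, and we only use arguments with $x>0$ and $x+c>0$.

```python
import math


def shift_ratio(x, c):
    """Gamma(x + c) / (x**c * Gamma(x)) for x > 0 and x + c > 0."""
    return math.exp(math.lgamma(x + c) - c * math.log(x) - math.lgamma(x))


for x in (10.0, 1e3, 1e6):
    print(x, shift_ratio(x, 2.5), shift_ratio(x, -0.5))
```

For $c=1$ the ratio is exactly $1$ because $\Gamma(x+1)=x\Gamma(x)$; for other values of $c$ it tends to $1$ at a rate of roughly $1/x$.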

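Now the limits of these three factors are easy to compute. Clearly we have

and

Finally,

Hence the original limit is $1$. The computation is quite elegant. Note that if we replace $x$ and $c$ by a positive integer $n$ and an integer $k$, we further have

Combining this with Bernoulli’s inequality, we have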

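Next we give a finer estimate. In fact,

By the definition of the function $B(x,y)$,

$$
B(x,y)=\int_0^1 t^{x-1}(1-t)^{y-1}dt
$$

Letting $t=u^2$, we get

$$
B(x,y)=2\int_0^1 u^{2x-1}(1-u^2)^{y-1}du
$$

Substituting $x=\frac{1}{2}$ and $y=n+1$, we are already very close to the desired result:

$$
B\left(\frac{1}{2},n+1\right)=2\int_0^1(1-u^2)^n du
$$

Note that, using the second expression of the $B$ function, we can compute $\Gamma(\frac{1}{2})$. In fact,

$$
\Gamma\left(\frac{1}{2}\right)^2=B\left(\frac{1}{2},\frac{1}{2}\right)=2\int_0^1\frac{du}{\sqrt{1-u^2}}=\pi
$$

Hence $\Gamma(\frac{1}{2})=\sqrt{\pi}$. For $B(\frac{1}{2},n+1)$, we can now use the shift formula above:

and therefore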

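Finally we prove an identity that has nothing to do with Stirling’s formula:

$$
\prod_{k=1}^{n-1}\Gamma\left(\frac{k}{n}\right)=\frac{(2\pi)^{\frac{n-1}{2}}}{\sqrt{n}}
$$

By the classical fundamental theorem of algebra, we immediately have

On the other hand, note that

When $x=1$, we have

that is,

By Euler’s reflection formula, for $1 \leq k \leq n-1$ we have

$$
\Gamma\left(\frac{k}{n}\right)\Gamma\left(1-\frac{k}{n}\right)=\frac{\pi}{\sin\frac{k\pi}{n}}
$$

If $n$ is odd, then by the results above we obtain

Here we have only used half of the values of $k$. To use the other half, we only need to swap $k$ and $n-k$, which gives

as desired. If $n$ is even, we only need to take out the term $1/2$ separately and compute in two parts.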
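As a sanity check, the following Python sketch compares both sides numerically. It assumes that the identity being proved is the classical product formula $\prod_{k=1}^{n-1}\Gamma(k/n)=(2\pi)^{(n-1)/2}/\sqrt{n}$, and that the intermediate step is the sine product $\prod_{k=1}^{n-1}\sin(k\pi/n)=n/2^{n-1}$; the function names are ours.

```python
import math


def gamma_product(n):
    """Product of Gamma(k/n) for k = 1, ..., n-1."""
    return math.prod(math.gamma(k / n) for k in range(1, n))


def closed_form(n):
    """(2*pi)^((n-1)/2) / sqrt(n)."""
    return (2 * math.pi) ** ((n - 1) / 2) / math.sqrt(n)


def sine_product(n):
    """Product of sin(k*pi/n) for k = 1, ..., n-1."""
    return math.prod(math.sin(k * math.pi / n) for k in range(1, n))


for n in range(2, 8):
    print(n, gamma_product(n), closed_form(n), sine_product(n), n / 2 ** (n - 1))
```

For $n=2$ both sides reduce to $\Gamma(\frac{1}{2})=\sqrt{\pi}$, which matches the value computed earlier.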

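We give two limits that look hard to compute.

If we substitute Stirling’s formula for $n!$ directly, the result of this limit is clear.

So we only need to find the limit of $(1+\frac{1}{n})^{n^2}e^{-n}$. But do not take it for granted that this limit is $1$. Using the Taylor expansion, we get

$$
\ln\left[\left(1+\frac{1}{n}\right)^{n^2}e^{-n}\right]=n^2\ln\left(1+\frac{1}{n}\right)-n=-\frac{1}{2}+\frac{1}{3n}+O\left(\frac{1}{n^2}\right) \to -\frac{1}{2}
$$

Hence the original limit is $\sqrt\frac{2\pi}{e}$.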
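The warning about the factor $(1+\frac{1}{n})^{n^2}e^{-n}$ is easy to confirm numerically. In this sketch (the function name `term` is ours) the expression is evaluated in logarithmic form with `math.log1p`, since computing $(1+1/n)^{n^2}$ directly would overflow for large $n$.

```python
import math


def term(n):
    """(1 + 1/n)**(n*n) * exp(-n), evaluated in log form for stability."""
    return math.exp(n * n * math.log1p(1.0 / n) - n)


for n in (10, 100, 10**4, 10**6):
    print(n, term(n))
print(math.exp(-0.5))
```

The values approach $e^{-1/2} \approx 0.6065$ rather than $1$, in agreement with the factor $1/\sqrt{e}$ in the result above.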

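Note that multiplying the numerators of the $n$ terms gives $\exp(n-1-\frac{1}{2}-\cdots-\frac{1}{n})$. The harmonic series diverges, so to obtain convergence it is natural to think of Euler’s constant $\gamma=\lim_{n\to\infty}\left(1+\frac{1}{2}+\cdots+\frac{1}{n}-\ln{n}\right)$. There seems to be no direct way to simplify the denominator: we know that the limit of $(1+1/k)^k$ is $e$, but that does not seem to help here. So let us first expand and simplify the denominator.

So the original limit can be written as

Now Stirling’s formula can be applied directly.

Since $\lim_{n\to\infty}\left(1+\frac{1}{n}\right)^{-n}=e^{-1}$ and $\lim_{n\to\infty}e^{\ln{n}-1-\frac{1}{2}-\frac{1}{3}-\cdots-\frac{1}{n}}=e^{-\gamma}$, the original limit is $\frac{\sqrt{2\pi}}{e^{1+\gamma}}$.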

]]>