# The concept of generalised functions (distributions) and derivatives

# Introduction

To begin with we consider a calculus problem that you may have seen in your exam:

Let \(f\) be a continuous function on \([0,\infty)\) such that \(\lim_{x \to \infty} f(x)=l\), and let \(0<a<b\). Prove that \[ \int_0^\infty \frac{f(ax)-f(bx)}{x}\mathrm{d}x = (f(0)-l)\ln\frac{b}{a} \]

We might try to solve this problem as follows. Put \(g(x)=f(x)-l\), so that \(\lim_{x \to \infty}g(x)=0\). Consider the
two-variable function \(F(x,y)=-g'(xy)\) on the region \(D=\{(x,y):x \ge 0, a \le y \le b\}\). Then we
have: \[
\begin{aligned}
\iint_D F(x,y)\mathrm{d}x\mathrm{d}y &=
\int_0^\infty\mathrm{d}x\int_a^b -g'(xy)\mathrm{d}y \\
&= \int_0^\infty
\frac{g(ax)-g(bx)}{x}\mathrm{d}x \\
&= \int_a^b \mathrm{d}y
\int_0^\infty -g'(xy)\mathrm{d}x \\
&= \int_a^b
\frac{g(0)}{y}\mathrm{d}y \\
&=g(0)\ln\frac{b}{a} \\
\end{aligned}
\] Substituting \(g(x)=f(x)-l\) gives exactly what we want,
doesn't it? **Well, the more analysis you learn, the more absurd
you will realise this proof is.** If you write this in an
exam you will get \(0\) marks no matter
what. There are two major mistakes:

- Can we change the order of integration? We have no idea. Certainly it cannot be done for free: there are counterexamples where the two iterated integrals differ.
- Is this function *even* differentiable? We also have no idea. It is *almost certain* that \(f\) is not (the probability that \(f\) is differentiable is \(0\)); see this post to learn why if you have some background in functional analysis.

For a good proof, please turn to math.stackexchange. This is not easy at all.
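Although the argument above is flawed, the identity itself can be sanity-checked numerically. The following is a minimal sketch, not a proof: it assumes the sample choice \(f(t)=e^{-t}\) (so \(f(0)=1\) and \(l=0\)), the values \(a=1\), \(b=2\), a truncation of the integral at \(x=60\), and a plain midpoint rule. The helper names `frullani_integrand` and `midpoint_integral` are made up for this illustration.

```python
import math

def frullani_integrand(x, a, b):
    # Integrand (f(ax) - f(bx)) / x for the sample choice f(t) = exp(-t).
    return (math.exp(-a * x) - math.exp(-b * x)) / x

def midpoint_integral(func, lower, upper, steps):
    # Composite midpoint rule. It never evaluates func at the endpoints,
    # so the removable singularity at x = 0 causes no division by zero.
    width = (upper - lower) / steps
    return width * sum(func(lower + (i + 0.5) * width) for i in range(steps))

a, b = 1.0, 2.0
# Truncate [0, infinity) at x = 60; the neglected tail is smaller than exp(-60).
numeric = midpoint_integral(lambda x: frullani_integrand(x, a, b), 0.0, 60.0, 100_000)
# Right-hand side (f(0) - l) * ln(b / a) with f(0) = 1 and l = 0.
expected = (1.0 - 0.0) * math.log(b / a)

print(numeric, expected)
```

The two printed numbers should agree to several decimal places, which supports the statement but says nothing about how to prove it for a general continuous \(f\).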

The problem is, it is really *unfair* that in some
circumstances we have to give up all the properties of differentiation. If
you are studying differential equations and a non-differentiable
function pops up, you are stuck. Sometimes you do not
even have *any idea* whether a function is differentiable.

So this post is written. We introduce the concept of (Schwartz)
**distributions** (a.k.a. **generalised
functions**), where differentiation is significantly extended, to
obtain a **derivative** in a generalised sense. Roughly
speaking, once distributions have been introduced, differentiation can be
done with absolute ease.

## A *function* of bad blood among physicists

In fact, physicists had been using distributions long before
mathematicians established a formal theory. For example, consider the \(\delta\) *function* introduced by
Dirac, which you may have met when studying the Fourier transform: \[
\delta(x) = \begin{cases}\infty &\quad x=0, \\ 0&\quad
\text{others} .\end{cases}
\] And it is required that \[
\int_{-\infty}^{\infty}\delta(x)\mathrm{d}x=1.
\] But this does not make any sense in calculus: a function that vanishes away from a single point has integral \(0\). Von Neumann, in
his book on quantum physics, warned against theories built on this
function and dismissed it as a "fiction". Not
so pleasant. He went to great lengths to demonstrate that quantum
physics could live without such a "fiction". As you can imagine, this
function may have created some bad blood between von Neumann and
Dirac.

Laurent Schwartz, however, managed to be a peacemaker. He developed the theory of distributions (which is exactly what we are talking about in this post), and the "fiction" became an easy "fact". Years later, in 1950, he was awarded the Fields Medal (one of the most prestigious awards in mathematics) at the age of 35, with the citation:

> Developed the theory of distributions, a new notion of generalized function motivated by the Dirac delta-function of theoretical physics. (Source)

As you will see later, thanks to Schwartz, the twisted \(\delta\) function is well-defined, plain and elegant. So von Neumann need not have been angry after all.

## On backgrounds

By *concept* I mean that I will try to include the basic ideas
(without many proofs, though they can be supplied), so that the serious
study of the subject can be simpler (it can be really tough!). Do not expect
to be able to solve problems on distributions right after reading this
post.

There will be two parts. Part one focuses on motivation and on what is going on. I will try to make it readable for anyone who has finished calculus or, more ideally, undergraduate analysis and linear algebra, though rigour is not always guaranteed. It would be better if you know some differential equation theory, but that's not a must. If you already have the background to read part 2, then part 1 will be much easier for you and should serve as a good source of intuition and motivation.

If you still need to understand differentiation in single-variable calculus, there is no need to struggle with generalised differentiation this early; it does not help. The linear algebra required is vector spaces, subspaces and linear maps. You should know that integration and differentiation are linear maps. This is a graduate course topic, so it is not realistic to assume that the reader knows nothing about calculus and linear algebra.

The second part will be much more advanced, and you are expected to have some background in topological vector spaces (functional analysis). Neither part can be considered as lecture notes, but they may help you find where you are when you study this concept seriously.

# Part I - Integration by Parts

Throughout, we consider real-valued functions on \(\mathbb{R}\). The theory can be generalised to complex-valued functions on \(\mathbb{R}^n\), where partial derivatives come into play, but we are not doing that here. At the end of the day, the extra work is not a big deal.

## Motivation: the vector space of distributions

In calculus, a lot of the functions we study are smooth (for example,
\(y=\sin{x}\)); we write \(C^\infty\) for the set of such functions, as they are *infinitely
differentiable*. This is a vector space, and in this vector space
differentiation can be done *with absolute ease*. For given \(f \in C^\infty\), we have \(f',f'',\cdots,f^{(k)}\) well
defined for all \(k = 1,2,\cdots\). But
in vector spaces like \(C^2\), \(C^1\), or even \(C\), differentiation can only be done with
caution: we may only have \(f''\) and no \(f^{(3)}\), or even \(f'\) may not exist. We don't *feel
like* exercising this kind of caution. Hence we introduce the concept of a
**distribution**, also known as a **generalised
function**. We want a space where we can still do
differentiation with absolute ease. We may need to *modify* our
definition of differentiation so that it works on every continuous
function (but it shall not lose its meaning within \(C^\infty\)). Bearing this in mind, we have
several expectations for distributions:

- Every continuous function should be (considered as) a distribution. (So we can take *derivatives* of all continuous functions without too much worry, unlike in the calculus problem at the beginning.)
- The "modified differentiation" should make sure that the "modified derivative" of a distribution is still a distribution. In other words, distributions are "infinitely differentiable" (which makes differential equation theory much easier). In the language of algebra, the "modified derivative" should be an endomorphism.
- The usual formal rules of calculus should hold. For example, in the new sense we should still have \((fg)'=f'g+g'f\). (Our modified differentiation should not go too far.)
- Convergence properties should also be available. (Validating this requires more theory, so it can only be mentioned in part 2.)

Let's write the desired space of distributions as \(\mathscr{D}'\), and the space of all continuous functions as \(C\). All of \(C,C^\infty,\mathscr{D}'\) are considered as real vector spaces and we should have \[ C^\infty \subset C \subset \mathscr{D}' \] in the sense of subspaces.

## What is distribution and *extended* differentiation exactly

Here is a breakdown of these concepts. You will see terminologies and definitions later.

- A smooth, continuous or, more generally, locally integrable function gives rise to a bounded linear functional. The converse is not guaranteed to be true, but we *pretend* it to be true, so all *bounded linear functionals* give rise to distributions, a.k.a. generalised functions (this name is nice because we *pretend* the converse to be true). **Whenever you are asked what a generalised function is, you can say: it is a linear map, and sometimes it can be determined by a normal function.**
- For these distributions or generalised functions, we modify the derivative with respect to integration by parts. The modified derivative cannot always be written down explicitly, but we don't care, because integration by parts doesn't give us many problems. **Whenever you are asked how the derivative of a non-differentiable function is given, you can say: it is given by pretending nothing is wrong with integration by parts.**

We now try to understand what we really want from distributions. We
start our study through integration, **because differentiation
does not work**. Given \(f \in C
\subset \mathscr{D}'\), we first need to make sure \(\int f\phi\) is well-defined for
*some* \(\phi\in C^\infty\),
because we want to do integration by parts, which involves **some
differentiation**, and we may make use of it.

If \(f\) is not even a continuous
function, we still need to consider *some* \(\phi\) in the same manner, or our extension
would be abrupt.

Let's talk about these \(\phi\) a
little bit, with respect to integration by parts. Consider the bump
function \[
\phi(x) = \begin{cases} \exp(\frac{1}{(x-a)(x-b)}) & \quad a < x
< b, \\ 0 &\quad \text{ otherwise. }\end{cases}
\] On \((a,b)\), we have \(\phi(x)>0\). On
the boundary points \(a\) and \(b\) we have \(\phi(x)=0\), but that shouldn't be a
problem, because they are just the two endpoints. Points outside \([a,b]\) contribute nothing to the value
of this function. For some obvious reason we call \([a,b]\) the *closure* of \((a,b)\). In general, given a real-valued
function \(f\), we call the closure of
the set of points where \(f(x) \ne 0\)
the **support** of \(f\).
As you can tell, the support of \(\phi\) is \([a,b]\).

If \(\phi\) had unbounded support, then we might need
to discuss limits at infinity. But we don't want improper integrals at
all. Hence the support of \(\phi\) is
always assumed to be a **closed and bounded** subset of \(\mathbb{R}\). (It is automatically closed because it is
defined to be a closure.) Closed and bounded subsets of \(\mathbb{R}\) are called
*compact* sets. If you are not familiar with topology, it is OK
at this moment to think of a compact set as a bounded closed interval \([a,b]\).

The test function space \(\mathscr{D}\) is defined to be the set of all \(C^\infty\) functions with compact support. This is indeed a vector space, and the verification is a good exercise in both linear algebra and calculus. What about \(\mathscr{D}'\)? Here we demonstrate how things are extended.

For each \(f \in C\) (which contains
\(C^\infty\)), we have a functional (a
functional is a linear map from a vector space to its base field,
which here is \(\mathbb{R}\). Nothing
special, just a different name that has been used by mathematicians for
decades!) \[
\begin{aligned}
\Lambda_f: \mathscr{D} &\to \mathbb{R}, \\
\phi &\mapsto \int f\phi.
\end{aligned}
\] This functional is **bounded** for all \(\phi \in \mathscr{D}\) because if \(\phi\) has support \(K\), then \[
|\Lambda_f(\phi)|=\left|\int_K f\phi\right| \le \left(\int_K |f|
\right)\sup_{x \in K}|\phi|.
\] A continuous function on a compact set is always bounded (proof),
hence the integral on the right-hand side is finite. If it
could be infinite, the estimate would tell us nothing.

In general, a **bounded linear functional** \(\Lambda:\mathscr{D} \to \mathbb{R}\) is
called a *distribution*, and the set of all distributions is exactly \(\mathscr{D}'\). Since every
continuous function \(f\) gives rise to
a unique bounded functional \(\Lambda_f\), we consider \(C\) as a subspace of \(\mathscr{D}'\). Such a function gives
rise to a functional, which is called a distribution. The converse is not
generally true, but we *pretend* it to be true (we pretend the
functional gives rise to a function anyway), which makes our study
easier, hence the name *generalised function* is
well-deserved.

### Distribution derivative

The differential operator \(D\) on \(C^\infty\) should be extended naturally to \(\mathscr{D}'\). There are many ways to extend a linear map. For example, the identity map \(i:\mathbb{R} \to \mathbb{R}\) can be extended to \(\mathbb{R}^2\) in at least two ways:

- \(I:\mathbb{R}^2 \to \mathbb{R}^2\) by \((x,y) \mapsto (x,y)\).
- \(\pi:\mathbb{R}^2 \to \mathbb{R}\) by \((x,y) \mapsto x\).

The restriction of each of these two maps to \(\mathbb{R}\) (identified with \(\mathbb{R} \times \{0\}\)) is the same as \(i\).

But if we extend \(D\) in several
ways, things would be messy. Originally derivative is defined in the
sense of limit, but for a non-differentiable function, we cannot do
that. We need an extension that makes most sense: it is by validating
**integration by parts**. It seems like we are developing
some advanced concepts, but still we need to make use of elementary
ones.

For \(f(x)=\sin{x}\) and \(\phi \in \mathscr{D}\), we have \[
\Lambda_{f'}(\phi)=\int f'\phi = \int \phi\cos{x} =
\underbrace{\phi\sin{x}|_{-\infty}^{\infty}}_{\text{zero}} -\int
\phi'\sin x=-\Lambda_f(\phi')
\] The boundary term vanishes because \(\phi\) has compact support, so the derivative has been moved from \(f\) onto \(\phi\).
Again we are using integration by parts. If \(f\) is not assumed to be differentiable, we
*pretend* it is, skip the body and jump to the result
immediately. For example, \(f(x)=|x|\)
is not differentiable at \(0\), but we do that anyway: \[
\int |x|'\phi = -\int |x|\phi'.
\] In general for \(f \in
C^\infty\), we have (this can be verified by some computation)
\[
\Lambda_{D^k f}(\phi)=\int D^k f \phi = (-1)^k \int fD^k\phi = (-1)^k
\Lambda _f(D^k\phi).
\] Differentiation for distributions (on top of \(C^\infty\) functions) should be in the same
**shape**, hence we define the \(k\)-th **distribution
derivative** of a distribution \(\Lambda\) by \[
D^k\Lambda: \phi \mapsto (-1)^{k}\Lambda(D^k\phi).
\] Since all \(\phi\) are
assumed to be \(C^\infty\), there
is no problem with this formula, and this differentiation is defined for
all \(\Lambda\). We don't care about
the difference quotient limit of a continuous but non-differentiable function. What
matters here is the differentiation of test functions.
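As a worked example of this definition, here is a sketch of the computation for \(f(x)=|x|\). The symbol \(\operatorname{sgn}\) for the sign function is notation introduced only for this illustration. Split the integral at \(0\) and integrate by parts on each half; the boundary terms vanish because \(\phi\) has compact support and \(x\phi(x)\) vanishes at \(x=0\): \[
\begin{aligned}
D\Lambda_f(\phi) = -\int |x|\phi'(x)\mathrm{d}x
&= \int_{-\infty}^0 x\phi'(x)\mathrm{d}x - \int_0^\infty x\phi'(x)\mathrm{d}x \\
&= -\int_{-\infty}^0 \phi(x)\mathrm{d}x + \int_0^\infty \phi(x)\mathrm{d}x \\
&= \int \operatorname{sgn}(x)\phi(x)\mathrm{d}x.
\end{aligned}
\] So the distribution derivative of \(|x|\) is the distribution given by the sign function, which agrees with the ordinary derivative wherever the latter exists.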

### Why integration by parts

Try to recall what you have learnt about integration by parts. We have \[ \int uv' = \int (uv)' - \int u'v \] because \[ (uv)' = u'v+uv'. \] Therefore, if our generalisation of differentiation respects integration by parts, then we can still work with the product rule, and the usual formal rules of calculus survive. If our extension conflicted with integration by parts, the ordinary meaning of differentiation would be damaged.

Let's sum up what has happened. We have obtained an inclusion \[ C^\infty \subset C \subset \mathscr{D}'. \] Every distribution is infinitely differentiable because functions in \(\mathscr{D}\) are. If \(f \in C^\infty\), then the \(k\)-th derivative can be understood both in the sense of ordinary differentiation and in the sense of distributions, because it is given by \[ \phi \mapsto (-1)^k\int f \phi^{(k)} = \int f^{(k)}\phi\quad \forall \phi \in \mathscr{D}. \] The two senses agree for every choice of \(\phi\). Moreover, if \(h\) is a continuous function such that \(\int h\phi = \int f^{(k)}\phi\) for all \(\phi \in \mathscr{D}\), then \(h=f^{(k)}\).

If \(f\) is merely continuous, still we can write the \(k\)-th derivative as \[ \phi \mapsto (-1)^{k} \int f \phi^{(k)} \quad \forall \phi \in \mathscr{D}. \]

At this point, whether \(f\) is
differentiable or not is of no concern to us. Since \(\phi\) is smooth, the formula above is
well-defined. In general we don't even care whether \(f\) is continuous or even integrable, as
long as it gives rise to a **bounded** linear functional,
which is guaranteed when \(f\) is *locally integrable*. A function
is locally integrable if \(\int_K
|f|<\infty\) for all compact \(K
\subset \mathbb{R}\). In particular, \(K\) can be taken to be any bounded closed
interval. **As long as \(f\) is
locally integrable (for example, differentiable, continuous, or simply
bounded and measurable), we can assign it a derivative in the new sense
(integration by parts).**
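The new derivative can also be probed numerically. Below is a minimal sketch, under several assumptions that are not part of the text: the test function is the bump function with \(a=-1\), \(b=2\), the candidate derivative of \(|x|\) is the sign function, and integrals are approximated by a plain midpoint rule. The names `phi`, `dphi`, `sign` and `midpoint_integral` are hypothetical helpers for this illustration.

```python
import math

A, B = -1.0, 2.0  # support [A, B] of the sample test function (an arbitrary choice)

def phi(x):
    # Bump function exp(1 / ((x - A)(x - B))) on (A, B), zero elsewhere.
    if x <= A or x >= B:
        return 0.0
    return math.exp(1.0 / ((x - A) * (x - B)))

def dphi(x):
    # Ordinary derivative of phi by the chain rule: phi' = -phi * q' / q^2,
    # where q(x) = (x - A)(x - B) and q'(x) = 2x - A - B.
    if x <= A or x >= B:
        return 0.0
    q = (x - A) * (x - B)
    return -phi(x) * (2.0 * x - A - B) / (q * q)

def sign(x):
    # Sign function: -1, 0 or 1.
    return (x > 0) - (x < 0)

def midpoint_integral(func, lower, upper, steps):
    # Composite midpoint rule on [lower, upper].
    width = (upper - lower) / steps
    return width * sum(func(lower + (i + 0.5) * width) for i in range(steps))

# Left side: the distribution derivative of |x| applied to phi, i.e. -int |x| phi'.
lhs = -midpoint_integral(lambda x: abs(x) * dphi(x), A, B, 30_000)
# Right side: the candidate derivative sign(x) integrated against phi.
rhs = midpoint_integral(lambda x: sign(x) * phi(x), A, B, 30_000)

print(lhs, rhs)
```

The two printed values should agree up to quadrature error. The support is chosen asymmetric about \(0\) on purpose, so that both sides are nonzero and the check is not trivial.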

### Product rule of differentiation

We want something like \((fg)'=f'g+fg'\). To avoid confusion we use \(D\) to denote the derivative of a distribution and \(f'\) to denote the derivative in the ordinary sense. For two arbitrary distributions this is pretty hard, but for the product of a \(C^\infty\) function and a distribution it is not. Suppose \(\Lambda \in \mathscr{D}'\) and \(f \in C^\infty\). We define their 'product' by \[ (f\Lambda)(\phi) = \Lambda(f\phi), \] which makes sense because \(f\phi \in \mathscr{D}\). This is another distribution, and its derivative follows in a natural way: \[ \begin{aligned} D(f\Lambda)(\phi) &=-(f\Lambda)(\phi') \\ &= -\Lambda(f\phi'). \end{aligned} \] Meanwhile \[ \begin{aligned} (f'\Lambda+fD\Lambda)(\phi) &= \Lambda(f'\phi)+D\Lambda(f\phi) \\ &= \Lambda(f'\phi)-\Lambda(f'\phi+f\phi') \\ &=-\Lambda(f\phi'). \end{aligned} \] So \(D(f\Lambda)=f'\Lambda+fD\Lambda\), and things still work in this respect.
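To see the product rule in action, here is a small worked case. It assumes the fact (checkable by splitting the integral at \(0\)) that the distribution derivative of \(\Lambda_{|x|}\) is \(\Lambda_{\operatorname{sgn}}\), where \(\operatorname{sgn}\) denotes the sign function, a symbol introduced only for this illustration. Take \(f(x)=x\) and \(\Lambda=\Lambda_{|x|}\). Then \(f\Lambda=\Lambda_{x|x|}\), and \[
D(f\Lambda) = f'\Lambda + fD\Lambda = \Lambda_{|x|} + x\Lambda_{\operatorname{sgn}} = \Lambda_{|x|}+\Lambda_{x\operatorname{sgn}(x)} = 2\Lambda_{|x|}.
\] This matches ordinary calculus: \(x|x|\) is continuously differentiable with derivative \(2|x|\).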

We haven't verified convergence yet, but that requires much more knowledge of functional analysis, so we defer it to part 2. Fortunately, things go in an intuitive way.

### Dirac delta distribution

Consider the linear functional on \(\mathscr{D}\) by \[
\delta(\phi)=\phi(0).
\] This is bounded, since \(|\delta(\phi)| \le \sup|\phi|\), and is in fact our rigorous definition of the Dirac
\(\delta\) function (von Neumann can
relax then!). It does have the *required property*. Say, if we
(informally) write this functional as an integral, \[
\delta(\phi)=\int \delta\phi=\phi(0) \quad \forall \phi \in \mathscr{D},
\] then \(\delta\) can indeed be
considered as a *function* whose support is the origin, and the
integral over \(\mathbb{R}\) is \(1\).

The *derivative* of \(\delta\) is well-presented as well. By the definition of the distribution derivative,
\(\delta'(\phi)=-\delta(\phi')\),
hence we have \[
\delta'(\phi)=-\phi'(0).
\]

So much for part 1. If you don't have much background in functional analysis, then part 2 is not recommended, as you will have little idea of what is going on. It is not feasible to make part 2 readable to a wider audience.

# Part II - Topology and Calculus - an Overview

Here we provide some basic facts about test functions and distributions, assuming the reader has some background in functional analysis. No proofs are given, because with proofs this post could become arbitrarily long. I hope that by organising the facts here I can help you see what is going on before you drown yourself in the details of a proof.

## Topology

In brief, test functions are smooth functions with compact support.
By the **support** of a function \(f\) we mean the
*closure* of the set \(\{x:f(x) \ne
0\}\). Let \(K\) be a compact
set in \(\mathbb{R}\); then \(\mathscr{D}_K\) denotes the subspace of \(C^\infty\) consisting of functions whose support lies in \(K\). Since a closed subset of a compact set
is itself compact, all functions in \(\mathscr{D}_K\) have compact support.

Test function space is defined by \[
\mathscr{D} := \bigcup_{K \text{ compact}}\mathscr{D}_K.
\] And the distribution space \(\mathscr{D}'\) is defined to be the
dual space of \(\mathscr{D}\), i.e. the
space of *continuous* linear functionals of \(\mathscr{D}\). But if we don't know the
topology of \(\mathscr{D}\), we cannot
proceed. *Here is how we attempt to establish the topology.*

### Topology attempt 1 - norm

Consider the norms, for \(\phi \in \mathscr{D}\) and \(N=0,1,2,\cdots\), given by \[ \| \phi \|_N = \sup_{x \in \mathbb{R};\, n \le N}|D^n\phi(x)|. \] These induce a local base \[ V_N = \left\{ \phi \in \mathscr{D}:\|\phi\|_N \le \frac{1}{N} \right\} \quad (N=1,2,3,\cdots). \]

And we get a locally convex metrisable topology on \(\mathscr{D}\).

If this topology made \(\mathscr{D}\) a Banach space, it would
be fantastic - a lot of Banach space techniques could be used. However,
this topology is too *small* to be complete. One only needs to
consider this sequence: \[
\psi_m(x)=\phi(x-1)+\frac{1}{2}\phi(x-2)+\cdots+\frac{1}{m}\phi(x-m)
\] where \(\phi \in
\mathscr{D}_{[0,1]}\) and \(\phi>0\) on \((0,1)\). This sequence is Cauchy, but the
limit does not have bounded support and hence does not lie in \(\mathscr{D}\).
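The Cauchy property can be checked by a short estimate. This is a sketch that uses only the fact that the translates \(\phi(x-k)\) are supported in \([k,k+1]\), so they overlap at most at endpoints, where \(\phi\) and all its derivatives vanish. For \(m>n\), \[
\|\psi_m-\psi_n\|_N = \Big\| \sum_{k=n+1}^{m}\frac{1}{k}\phi(\cdot-k) \Big\|_N = \max_{n<k\le m}\frac{1}{k}\|\phi\|_N = \frac{\|\phi\|_N}{n+1} \to 0 \quad (n \to \infty).
\] On the other hand, the pointwise limit \(\sum_{k \ge 1}\frac{1}{k}\phi(x-k)\) is positive on every interval \((k,k+1)\), so its support is unbounded.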

### Topology attempt 2 - enhancement

This time we do an *enhancement* on the previous topology,
which makes \(\mathscr{D}\) a locally
convex topological space, which is complete and has the Heine-Borel
property (closed and bounded set is compact and vice versa). We still
need the topology defined in our first attempt. It is broken into three
steps:

- For each compact set \(K\), let \(\tau_K\) denote the topology on \(\mathscr{D}_K\) inherited as a subspace from the topology on \(\mathscr{D}\) defined in attempt 1.
- Let \(\beta\) be the collection of all convex balanced sets \(W \subset \mathscr{D}\) such that \(\mathscr{D}_K \cap W \in \tau_K\) for all compact \(K\). (A set \(W\) is balanced if \(\alpha{W} \subset W\) for all \(|\alpha| \le 1\).)
- The new topology \(\tau\) is defined to be the collection of all unions of sets of the form \(\phi + W\) with \(\phi \in \mathscr{D}\) and \(W \in \beta\).

This is the topology we want, and one can indeed verify that \(\tau\) is a topology, with local base \(\beta\). This topology has the following properties:

- \(\tau\) makes \(\mathscr{D}\) a locally convex topological vector space.
- \(\mathscr{D}\) has the Heine-Borel property.
- In \(\mathscr{D}\), every Cauchy sequence converges.

Locally, **the topology of \(\mathscr{D}_K\) is the same as \(\tau_K\)**. Hence we can still use
properties of these norms if we want. In fact, this \(\tau_K\) makes \(\mathscr{D}_K\) a Fréchet space, i.e.
a locally convex space whose topology is induced by a complete translation-invariant metric.

### Continuity and category

We cannot discuss continuity without topology. But still, continuity has to be treated carefully. For example, the space \(L^p([0,1])\) with \(0<p<1\) is weird: its dual space is trivial because of its topology, in which the only open convex sets are the empty set and the whole space. Fortunately we have the following, which is quite intuitive.

Suppose \(\Lambda\) is a linear mapping of \(\mathscr{D}\) into a locally convex space \(Y\) (which can be \(\mathbb{R}\), \(\mathbb{C}\) or \(\mathscr{D}\) itself). Then the following are equivalent:

- \(\Lambda\) is continuous. (We care about the behaviour of \(\mathscr{D}'\))
- \(\Lambda\) is bounded. (You must have learnt the equivalence of 1 and 2 already)
- \(\phi_i \to 0\) in \(\mathscr{D}\) implies \(\Lambda\phi_i \to 0\) in \(Y\).
- The restriction of \(\Lambda\) to every \(\mathscr{D}_K\) is continuous.

In particular, it follows that the differential operator \(D^n\) is continuous for all \(n\). We also have some knowledge of the behaviour of \(\mathscr{D}'\) now:

If \(\Lambda\) is a linear functional on \(\mathscr{D}\), then the following are equivalent:

- \(\Lambda \in \mathscr{D}'\).
- To every compact set \(K\) there corresponds a nonnegative integer \(N\) and a constant \(C<\infty\) such that the inequality
\[ |\Lambda\phi| \le C \|\phi\|_N \]

holds for every \(\phi \in \mathscr{D}_K\).
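Two quick illustrations of this criterion, sketched with the norms \(\|\cdot\|_N\) from the first topology attempt. If \(f\) is continuous and \(\phi \in \mathscr{D}_K\), then \[
|\Lambda_f\phi| \le \Big(\int_K |f|\Big)\|\phi\|_0,
\] so \(N=0\) and \(C=\int_K|f|\) work, with \(C\) depending on \(K\). For the functional \(\Lambda\phi=\phi'(0)\) we have \[
|\Lambda\phi| = |\phi'(0)| \le \|\phi\|_1
\] for every \(K\), so \(N=1\) and \(C=1\) work. Here \(N=0\) is not enough, since \(|\phi'(0)|\) can be made arbitrarily large while \(\sup|\phi|\) stays bounded by \(1\).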

Consider the Dirac distribution at \(x\), given by \[
\delta_x(\phi)=\phi(x)\quad \phi \in \mathscr{D}.
\] This is indeed a distribution. The case when \(x=0\) gives us the Dirac function in
physics. Note \[
\mathscr{D}_K = \bigcap_{x \in K^c}\ker\delta_x,
\] so \(\mathscr{D}_K\) is a
**closed subspace** of \(\mathscr{D}\). Moreover, each \(\mathscr{D}_K\) is nowhere dense in \(\mathscr{D}\), and
there is a countable collection of compact sets \(K_i
\subset \mathbb{R}\) (for example \(K_i=[-i,i]\)) such that \(\mathscr{D} = \bigcup_i \mathscr{D}_{K_i}\), so \(\mathscr{D}\) is of
the first category in itself. Since \(\mathscr{D}\) is complete, Baire's category theorem shows that \(\mathscr{D}\) is not metrisable. This is a
flaw of the topology of \(\mathscr{D}\), though not a very
troublesome one.

## Calculus of distributions

We have shown that every continuous function, and in particular every \(C^\infty\) function, can be considered as a
distribution. In general, one only needs to require that \(f\) is **locally integrable**,
i.e. for every compact set \(K\) we
have \[
\int_K |f|<\infty.
\] If we define \(\Lambda_f:\phi
\mapsto \int f\phi\), we see \[
|\Lambda_f(\phi)|\le \left( \int_K |f| \right)\sup|\phi|, \quad \phi \in
\mathscr{D}_K.
\]

In particular, at the very least, all \(L^1\) functions can be considered as distributions.

On the other hand, if \(\mu\) is a positive measure on \(\mathbb{R}\) with \(\mu(K)<\infty\) for all compact \(K\), then \[ \Lambda_\mu:\phi \mapsto \int \phi\,\mathrm{d}\mu \] also defines a distribution.

### Absolute continuity

We know that the fundamental theorem of calculus (with \(f' \in L^1\)) holds only when the function \(f\) is *absolutely continuous*. For example, the
Cantor function \(f\) is differentiable
almost everywhere on \([0,1]\), with \(f'=0\) wherever it exists, but
\[
\int_0^1 f'(x)\mathrm{d}x = 0, \quad f(1)-f(0)=1.
\] This restriction still makes sense here. Pick \(f\) to be a left-continuous function with
bounded variation. Then it can be shown that \[
D\Lambda_f = \Lambda_\mu
\] where \(\mu([a,b))=f(b)-f(a)\). Hence \(D\Lambda_f=\Lambda_{Df}\) if and only if
\(f\) is *absolutely
continuous*.
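A standard example, sketched here with a function \(H\) that is introduced only for this illustration: let \(H(x)=0\) for \(x\le 0\) and \(H(x)=1\) for \(x>0\). It is left-continuous and of bounded variation, and \(\mu([a,b))=H(b)-H(a)\) equals \(1\) exactly when \(0 \in [a,b)\), so \(\mu\) is the unit point mass at the origin. Directly from the definition, \[
D\Lambda_H(\phi) = -\int_0^\infty \phi'(x)\mathrm{d}x = \phi(0) = \int \phi\,\mathrm{d}\mu = \delta(\phi).
\] The ordinary derivative of \(H\) is \(0\) almost everywhere, so \(\Lambda_{DH}=0 \ne D\Lambda_H\), in line with the fact that \(H\) is not absolutely continuous.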

### Convergence (uniform?)

We consider the weak*-topology of \(\mathscr{D}'\), in which \[ \Lambda_i \to \Lambda \iff \lim_{i \to \infty}\Lambda_i\phi = \Lambda\phi \quad \forall \phi \in \mathscr{D}. \] Fortunately, this limit operation commutes with the differential operator in a natural way, which may remind you of uniform convergence. In fact, if \(\Lambda_i \in \mathscr{D}'\) and the limit \(\Lambda\phi=\lim_{i \to \infty}\Lambda_i\phi\) exists for every \(\phi \in \mathscr{D}\), then \[ \Lambda \in \mathscr{D}' \text{ and }D^k\Lambda_i \to D^k\Lambda \quad \forall k=1,2,\cdots. \] To prove this one needs the Banach-Steinhaus theorem. This concludes our four requirements of distributions.
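A concrete instance, given as a sketch with a sample sequence that is not from the text: take \(f_i(x)=\frac{1}{i}\sin(ix)\) and \(\Lambda_i=\Lambda_{f_i}\). Since \(|f_i|\le \frac{1}{i}\), we have \(\Lambda_i \to 0\) in \(\mathscr{D}'\). The derivatives \(f_i'(x)=\cos(ix)\) do not converge pointwise (for example at \(x=\pi\)), yet integration by parts gives \[
D\Lambda_i(\phi) = \int \cos(ix)\phi(x)\mathrm{d}x = -\frac{1}{i}\int \sin(ix)\phi'(x)\mathrm{d}x, \qquad |D\Lambda_i(\phi)| \le \frac{1}{i}\int|\phi'| \to 0,
\] so \(D\Lambda_i \to 0\) in \(\mathscr{D}'\), which is the derivative of the limit.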

### Convolution

Convolution plays an important role in Fourier analysis, and here is how to invite distribution to the party.

Normally for two \(L^1\) functions \(f,g\) we define \[ (f \ast g)(x)=\int_\mathbb{R}f(y)g(x-y)\mathrm{d}y. \] We can create more symbols to make life easier:

- \(\tau_xu(y)=u(y-x)\).
- \(\check{u}(y)=u(-y)\).

It follows that \(\tau_x\check{u}(y)=\check{u}(y-x)=u(x-y)\). Hence \[ (f \ast g)(x) = \int_\mathbb{R} f(y)(\tau_x\check{g})(y)\mathrm{d}y. \] This shows that \(g \mapsto (f \ast g)(x)\) is actually the composition of \(g \mapsto \check{g}\), \(\tau_x\) and the linear functional \(\Lambda_f\), that is, \((f \ast g)(x)=\Lambda_f(\tau_x\check{g})\). But \(\Lambda_f\) can be replaced by any distribution, hence we define the convolution of a distribution and a test function by \[ (L \ast \phi)(x) = L(\tau_x\check{\phi}), \quad L \in \mathscr{D}', \phi \in \mathscr{D}. \] Convolution can be characterised in a natural way. In fact, for any continuous linear map \(T:\mathscr{D} \to C^\infty\), if \[ \tau_x T = T\tau_x \quad \forall x \in \mathbb{R}, \] then there is a unique \(L \in \mathscr{D}'\) such that \[ T\phi = L\ast \phi. \] As you can imagine, this setting creates a lot of potential for the Fourier transform.
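As a quick check of the definition (a sketch using the Dirac distribution \(\delta(\phi)=\phi(0)\) from part 1), convolution with \(\delta\) is the identity: \[
(\delta \ast \phi)(x) = \delta(\tau_x\check{\phi}) = (\tau_x\check{\phi})(0) = \check{\phi}(0-x) = \phi(x).
\] This is consistent with the characterisation: the identity map \(T\phi=\phi\) commutes with every translation \(\tau_x\), and the corresponding distribution is \(L=\delta\).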

# References and Further Reading

- Walter Rudin, *Functional Analysis*, Second Edition. (Part II of the book)
- Peter Lax, *Functional Analysis*. (Appendix B)
- Stanford Encyclopedia of Philosophy Archive (Fall 2018 Edition), Quantum Theory: von Neumann vs. Dirac.
