A brief introduction to the Fréchet derivative

The Fréchet derivative is a generalisation of the ordinary derivative. In general we work in Banach spaces, of which \(\mathbb{R}\) is a special case. Indeed, the spaces discussed are not even required to be finite-dimensional.

Recall

A real-valued function \(f(t)\) of a real variable, defined on some neighborhood of \(0\), is said to be \(o(t)\) if \[ \lim_{t \to 0} \frac{f(t)}{t}=0. \] The derivative of a function \(f\) at a point \(a\) is defined by \[ f'(a)=\lim_{h \to 0}\frac{f(a+h)-f(a)}{h}. \] Equivalently, \[ f(a+h)=f(a)+f'(a)h+o(h). \] Now suppose \(f:U \subset \mathbb{R}^n \to \mathbb{R}^m\) where \(U\) is an open set. The function \(f\) is differentiable at \(x_0 \in U\) if it satisfies the following two conditions.

  1. All partial derivatives of \(f\), i.e. \(\frac{\partial f_i}{\partial x_j}\), exist for all \(i=1,\cdots,m\) and \(j = 1,\cdots,n\) at \(x_0\). (This ensures that the Jacobian matrix exists and is well-defined.)

  2. The Jacobian matrix \(J(x_0)\in\mathbb{R}^{m\times n}\) satisfies \[ \lim_{|h| \to 0}\frac{|f(x_0+h)-f(x_0)-J(x_0)h|}{|h|}=0. \] In fact the Jacobian matrix is the derivative of \(f\) at \(x_0\), although it is a matrix rather than a number. In the general case, a number should itself be regarded as a \(1 \times 1\) matrix. In the following definition of the Fréchet derivative, you will see that the derivative should be treated as a bounded linear operator.
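The limit in condition 2 can be checked numerically. The following is a minimal sketch, not part of the original argument: the map `f`, its hand-computed `jacobian`, the base point `x0` and the `direction` are all hypothetical choices made for illustration. It evaluates the ratio \(|f(x_0+h)-f(x_0)-J(x_0)h|/|h|\) for shrinking \(h\).

```python
import math

def f(x):
    """A sample map f: R^2 -> R^2 (hypothetical, chosen for illustration)."""
    x1, x2 = x
    return (x1 * x1 * x2, math.sin(x1) + x2)

def jacobian(x):
    """Jacobian matrix of f at x, computed by hand from the formula above."""
    x1, x2 = x
    return ((2 * x1 * x2, x1 * x1),
            (math.cos(x1), 1.0))

def norm(v):
    """Euclidean norm of a vector given as a tuple."""
    return math.sqrt(sum(c * c for c in v))

def remainder_ratio(x0, h):
    """Return |f(x0 + h) - f(x0) - J(x0) h| / |h|."""
    J = jacobian(x0)
    fx = f(x0)
    fxh = f((x0[0] + h[0], x0[1] + h[1]))
    Jh = (J[0][0] * h[0] + J[0][1] * h[1],
          J[1][0] * h[0] + J[1][1] * h[1])
    r = (fxh[0] - fx[0] - Jh[0], fxh[1] - fx[1] - Jh[1])
    return norm(r) / norm(h)

x0 = (1.0, 2.0)          # base point (assumption)
direction = (0.6, -0.8)  # unit vector along which h shrinks (assumption)
steps = (1e-1, 1e-2, 1e-3)
ratios = [remainder_ratio(x0, (t * direction[0], t * direction[1]))
          for t in steps]

for t, r in zip(steps, ratios):
    print(f"|h| = {t:g}, ratio = {r:.3e}")
```

If the Jacobian is correct, the printed ratios should shrink roughly in proportion to \(|h|\). A single direction is only a sanity check, since the definition requires the limit to hold for all \(h \to 0\).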

Definition

Let \(\mathbf{E}\) and \(\mathbf{F}\) be Banach spaces, and let \(f:U\to\mathbf{F}\) be a function where \(U\) is an open subset of \(\mathbf{E}\). We say \(f\) is Fréchet differentiable at \(x \in U\) if there is a bounded linear operator \(\lambda:\mathbf{E} \to \mathbf{F}\) such that \[ \lim_{\lVert y \rVert \to 0}\frac{\lVert f(x+y)-f(x)-\lambda y \rVert}{\lVert y \rVert}=0. \] We say that \(\lambda\) is the derivative of \(f\) at \(x\), which will be denoted by \(Df(x)\) or \(f'(x)\). Notice that \(\lambda \in L(\mathbf{E},\mathbf{F})\). If \(f\) is differentiable at every point of \(U\), then \(f'\) is a map \[ f':U \to L(\mathbf{E},\mathbf{F}). \]


The definition above stays close to the one for real functions defined on the real axis. Now assume that both \(\mathbf{E}\) and \(\mathbf{F}\) are merely topological vector spaces. We can still define a (generalized) Fréchet derivative.

Let \(\varphi\) be a mapping of a neighborhood of \(0\) of \(\mathbf{E}\) into \(\mathbf{F}\). We say that \(\varphi\) is tangent to \(0\) if given a neighborhood \(W\) of \(0\) in \(\mathbf{F}\), there exists a neighborhood \(V\) of \(0\) in \(\mathbf{E}\) such that \[ \varphi(tV) \subset o(t)W \] for some real function \(o(t)\) satisfying \(\lim_{t \to 0}o(t)/t=0\). For example, if both \(\mathbf{E}\) and \(\mathbf{F}\) are normed (not necessarily Banach), then we get the usual condition \[ \lVert \varphi(x) \rVert \leq \lVert x \rVert \psi(x) \] where \(\psi\) is a real-valued function with \(\lim_{\lVert x \rVert \to 0}\psi(x)=0\).

Still we assume that \(\mathbf{E}\) and \(\mathbf{F}\) are topological vector spaces. Let \(f:U \to \mathbf{F}\) be a continuous map. We say that \(f\) is differentiable at a point \(x \in U\) if there exists some \(\lambda \in L(\mathbf{E},\mathbf{F})\) such that for small \(y\) we have \[ f(x+y)=f(x)+\lambda{y}+\varphi(y) \] where \(\varphi\) is tangent to \(0\). Notice that \(\lambda\) is uniquely determined.

Propositions

You are probably familiar with the basic properties of the derivative. Here we re-establish them in Banach spaces.

Chain rule

If \(f: U \to V\) is differentiable at \(x_0 \in U\), and \(g:V \to W\) is differentiable at \(f(x_0)\), then \(g \circ f\) is differentiable at \(x_0\), and \[ (g \circ f)'(x_0)=g'(f(x_0)) \circ f'(x_0). \]

Proof. We prove this in the setting of topological vector spaces. By definition, there are linear operators \(\lambda\) and \(\mu\) such that \[ \begin{aligned} f(x_0+y)&=f(x_0)+\lambda{y}+\varphi(y), \\ g(f(x_0)+h)&=g(f(x_0))+\mu{h}+\psi(h), \end{aligned} \] where \(\varphi\) and \(\psi\) are tangent to \(0\). Further, \[ f'(x_0)=\lambda, \quad g'(f(x_0))=\mu. \] To evaluate \(g(f(x_0+y))\), notice that \[ \begin{aligned} g(f(x_0+y))&=g[f(x_0)+(\lambda{y}+\varphi(y))] \\ &=g(f(x_0))+\mu(\lambda{y}+\varphi(y))+\psi(\lambda{y}+\varphi(y)) \\ &=g(f(x_0))+\mu\circ\lambda{y}+\mu\circ\varphi(y)+\psi(\lambda{y}+\varphi(y)). \end{aligned} \] It's clear that \(\mu\circ\varphi(y)+\psi(\lambda{y}+\varphi(y))\) is tangent to \(0\), and \(\mu\circ\lambda\) is the linear map we are looking for. That is, \[ (g \circ f)'(x_0)=g'(f(x_0)) \circ f'(x_0). \]
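In finite dimensions the chain rule can be sanity-checked numerically. The sketch below is an illustration only: the maps `f` and `g`, the point `x0` and the direction `y` are hypothetical choices, and the derivatives `df` and `dg` are written as linear maps acting on a vector, mirroring \(\lambda\) and \(\mu\) in the proof. It compares \(\mu(\lambda y)\) with a central finite difference of \(g \circ f\) along \(y\).

```python
import math

def f(x):
    """Sample map f: R^2 -> R^2 (hypothetical)."""
    x1, x2 = x
    return (x1 * x2, x1 + math.exp(x2))

def df(x, y):
    """The linear map f'(x) applied to the vector y (plays the role of lambda)."""
    x1, x2 = x
    y1, y2 = y
    return (x2 * y1 + x1 * y2, y1 + math.exp(x2) * y2)

def g(z):
    """Sample map g: R^2 -> R (hypothetical)."""
    z1, z2 = z
    return math.sin(z1) + z1 * z2

def dg(z, h):
    """The linear map g'(z) applied to the vector h (plays the role of mu)."""
    z1, z2 = z
    h1, h2 = h
    return (math.cos(z1) + z2) * h1 + z1 * h2

def chain_rule_derivative(x, y):
    """(g o f)'(x) y computed as g'(f(x)) applied to f'(x) y."""
    return dg(f(x), df(x, y))

def central_difference(x, y, t=1e-6):
    """Finite-difference approximation of the derivative of g o f at x along y."""
    xp = (x[0] + t * y[0], x[1] + t * y[1])
    xm = (x[0] - t * y[0], x[1] - t * y[1])
    return (g(f(xp)) - g(f(xm))) / (2 * t)

x0 = (0.5, -0.3)  # base point (assumption)
y = (1.0, 2.0)    # direction (assumption)
exact = chain_rule_derivative(x0, y)
approx = central_difference(x0, y)
print(exact, approx)
```

The two printed numbers should agree up to finite-difference and rounding error. The composition of linear maps `dg(f(x), df(x, y))` is exactly the operator \(\mu \circ \lambda\) from the proof, evaluated at one vector.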

Derivative of higher orders

From now on, we are dealing with Banach spaces. Let \(U\) be an open subset of \(\mathbf{E}\), and \(f:U \to \mathbf{F}\) be differentiable at each point of \(U\). If \(f'\) is continuous, then we say that \(f\) is of class \(C^1\). Maps of class \(C^p\) where \(p \geq 1\) are defined inductively. The \(p\)-th derivative \(D^pf\) is defined as \(D(D^{p-1}f)\) and is itself a map of \(U\) into \(L(\mathbf{E},L(\mathbf{E},\cdots,L(\mathbf{E},\mathbf{F})\cdots))\) which is isomorphic to \(L^p(\mathbf{E},\mathbf{F})\). A map \(f\) is said to be of class \(C^p\) if its \(k\)-th derivative \(D^kf\) exists for \(1 \leq k \leq p\), and is continuous. With the help of the chain rule, and the fact that the composition of two continuous functions is continuous, we get the following.

Let \(U,V\) be open subsets of some Banach spaces. If \(f:U \to V\) and \(g: V \to \mathbf{F}\) are of class \(C^p\), then so is \(g \circ f\).

Open subsets of Banach spaces as a category

We in fact get a category \(\{(U,f_U)\}\) where the objects \(U\) are the open subsets of Banach spaces, and the morphisms \(f_U\) are the maps of class \(C^p\) mapping \(U\) into another open set. To verify this, one only has to check that identity maps are of class \(C^p\) and that the composition of two maps of class \(C^p\) is still of class \(C^p\) (as stated above).

We say that \(f\) is of class \(C^\infty\) if \(f\) is of class \(C^p\) for all integers \(p \geq 1\). Meanwhile \(C^0\) maps are the continuous maps.

An example

We are going to evaluate the Fréchet derivative of a nonlinear functional. It is the derivative of a functional mapping an infinite dimensional space into \(\mathbb{R}\) (instead of \(\mathbb{R}\) to \(\mathbb{R}\)).

Consider the functional \[ \begin{aligned} \Gamma:C[0,1] &\to \mathbb{R} \\ u &\mapsto \int_{0}^{1}u^2(x)\sin\pi{x}dx, \end{aligned} \] where the norm is defined by \[ \lVert u \rVert = \sup_{x \in [0,1]}|u(x)|. \]

For \(u\in C[0,1]\), we are going to find a bounded linear operator \(\lambda\) such that \[ \Gamma(u+\eta)=\Gamma(u)+\lambda{\eta}+\varphi(\eta), \] where \(\varphi(\eta)\) is tangent to \(0\).

Solution. By evaluating \(\Gamma(u+\eta)\), we get \[ \begin{aligned} \Gamma(u+\eta)&=\int_{0}^{1}(u+\eta)^2\sin\pi{x}dx \\ &= \Gamma(u)+2\int_{0}^{1}u\eta\sin\pi{x}dx+\int_{0}^{1}\eta^2\sin\pi{x}dx. \end{aligned} \] To prove that \(\int_{0}^{1}\eta^2\sin\pi{x}dx\) is the \(\varphi(\eta)\) desired, notice that \(\sin\pi{x} \geq 0\) on \([0,1]\), so \[ 0 \leq \int_{0}^{1}\eta^2\sin\pi{x}dx \leq \lVert\eta\rVert^2\int_{0}^{1}\sin\pi{x}dx=\frac{2}{\pi}\lVert \eta \rVert^2. \] Therefore we have \[ 0\leq\lim_{\lVert \eta \rVert \to 0}\frac{\int_{0}^{1}\eta^2\sin\pi{x}dx}{\lVert \eta \rVert} \leq \lim_{\lVert\eta\rVert\to0}\frac{2}{\pi}\lVert\eta\rVert=0 \] as desired. The map \(\eta \mapsto 2\int_{0}^{1}u\eta\sin\pi{x}dx\) is linear, and it is bounded since \[ \left|2\int_{0}^{1}u\eta\sin\pi{x}dx\right| \leq \frac{4}{\pi}\lVert u \rVert \lVert \eta \rVert. \] Hence the Fréchet derivative of \(\Gamma\) at \(u\) is given by \[ \begin{aligned} \Gamma'(u):C[0,1] &\to \mathbb{R} \\ \eta &\mapsto 2\int_{0}^{1}u\eta\sin\pi{x}dx. \end{aligned} \] It may be hard to believe, but the derivative is neither a number nor a matrix: it is a linear operator. Conversely, a real or complex number, or a matrix, can naturally be treated as a linear operator.
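The computation above can also be checked numerically. The following is a rough sketch under stated assumptions: integrals are approximated by a midpoint rule with `N` subintervals, the sup norm is approximated on a grid, and the base point `u` and perturbation `shape` are hypothetical choices. It evaluates \(|\Gamma(u+\eta)-\Gamma(u)-\Gamma'(u)\eta| / \lVert\eta\rVert\) for \(\eta = t \cdot \texttt{shape}\) with shrinking \(t\).

```python
import math

N = 2000  # number of midpoint-rule subintervals (arbitrary choice)

def integrate(func):
    """Midpoint-rule approximation of the integral of func over [0, 1]."""
    h = 1.0 / N
    return h * sum(func((k + 0.5) * h) for k in range(N))

def Gamma(u):
    """Gamma(u) = integral of u(x)^2 sin(pi x) over [0, 1]."""
    return integrate(lambda x: u(x) ** 2 * math.sin(math.pi * x))

def dGamma(u, eta):
    """Candidate derivative: Gamma'(u) eta = 2 * integral of u eta sin(pi x)."""
    return 2.0 * integrate(lambda x: u(x) * eta(x) * math.sin(math.pi * x))

def u(x):
    """Base point in C[0,1] (hypothetical)."""
    return math.cos(3 * x)

def shape(x):
    """Direction of the perturbation eta (hypothetical)."""
    return x * (1 - x) + 0.5

# Grid approximation of the sup norm of shape on [0, 1].
sup_shape = max(abs(shape(k / N)) for k in range(N + 1))

steps = (1e-1, 1e-2, 1e-3)
ratios = []
for t in steps:
    eta = lambda x, t=t: t * shape(x)
    perturbed = lambda x, eta=eta: u(x) + eta(x)
    remainder = Gamma(perturbed) - Gamma(u) - dGamma(u, eta)
    ratios.append(abs(remainder) / (t * sup_shape))

for t, r in zip(steps, ratios):
    print(f"t = {t:g}, remainder / ||eta|| = {r:.3e}")
```

The ratios should decrease roughly linearly in \(t\), which is what tangency of the remainder to \(0\) predicts. This checks only one direction `shape`, so it supports the computation rather than proving it.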