Vague Convergence in Measure-theoretic Probability Theory - Equivalent Conditions
Introduction
In analysis and probability theory, one studies various sorts of convergence (of random variables) for various reasons. In this post we study vague convergence, which underlies convergence in distribution.
Vaguely speaking, vague convergence is the weakest kind of convergence one can expect (whilst still caring about continuity whenever possible). We do not assume any dependence relation among the random variables of the sequence.
Throughout, fix a probability space $(\Omega,\mathscr{F},\mathscr{P})$, where $\Omega$ is the sample space, $\mathscr{F}$ the event space and $\mathscr{P}$ the probability measure. Let $(X_n)$ be a sequence of random variables on this space. Each random variable $X_n$ canonically induces a probability space $(\mathbb{R},\mathscr{B},\mu_n)$ where $\mathscr{B}$ is the Borel $\sigma$-algebra. To avoid notation hell we only consider the correspondence $X_n \leftrightarrow \mu_n$ where
$$\mu_n(B)=\mathscr{P}(X_n \in B), \qquad B \in \mathscr{B}.$$
Here comes the question: if $X_n$ tends to a limit, then we would expect that $\mu_n$ converges to a limit (say $\mu$) in some sense (at least on some intervals). But is that always the case? Even if the sequence converges, can we even have $\mu(\mathbb{R})=1$? We will see through some examples that this is really not the case.
Examples: Failure of Convergence on Intervals
Let $X_n\equiv\frac{(-1)^n}{n}$; then $X_n \to 0$ deterministically. For any $a>0$, the sequence $\mu_n((0,a))$ oscillates between $0$ and $1$, i.e. for $n>1/a$ it ends up in the form
$$\mu_n((0,a))=\begin{cases}1, & n \text{ even},\\ 0, & n \text{ odd},\end{cases}$$
which does not converge at all. Likewise, for any $b<0$, the sequence $\mu_n((b,0))$ oscillates between $1$ and $0$.
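The oscillation is easy to see numerically. The following is a minimal sketch; the helper name `mu_n_open` and the choice $a=1/10$ are illustrative assumptions, not part of the original argument.

```python
from fractions import Fraction

def mu_n_open(n, lo, hi):
    """Mass that the point mass at x_n = (-1)^n / n gives to the open interval (lo, hi)."""
    x_n = Fraction((-1) ** n, n)
    return 1 if lo < x_n < hi else 0

a = Fraction(1, 10)  # illustrative choice of a > 0
# Look at n = 11, ..., 20, where 1/n < a already holds.
right = [mu_n_open(n, 0, a) for n in range(11, 21)]   # mu_n((0, a))
left = [mu_n_open(n, -a, 0) for n in range(11, 21)]   # mu_n((-a, 0))
print(right)
print(left)
```

Exact rational arithmetic is used so that the strict inequalities at the endpoints are not blurred by floating-point rounding.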
As another example of convergence failure, consider $b_n<0<a_n$ with $a_n \to 0$ and $b_n \to 0$ as $n \to \infty$, and let $(X_n)$ be a sequence of random variables with $X_n$ uniformly distributed on $(b_n,a_n)$. We see $X_n \to 0$ a.e., but for $b<0$ the quantity $\mu_n((b,0))$, which is the area under the density of $X_n$ between $\max(b,b_n)$ and $0$, may not converge at all, or may converge to any number between $0$ and $1$.
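To make the two behaviours concrete, here is a small sketch. The specific endpoint sequences (`-c/n`, `1/n`, and the alternating `b_alt`) are hypothetical choices made only for illustration.

```python
def mass_uniform(b_n, a_n, lo, hi):
    """Mass of the interval (lo, hi) under the uniform distribution on (b_n, a_n)."""
    overlap = min(hi, a_n) - max(lo, b_n)
    return max(overlap, 0.0) / (a_n - b_n)

b = -1.0  # fixed left endpoint of the test interval (b, 0)

# Hypothetical choice 1: b_n = -c/n, a_n = 1/n. The mass of (b, 0) is c / (1 + c).
c = 3.0
steady = [mass_uniform(-c / n, 1.0 / n, b, 0.0) for n in range(10, 16)]

# Hypothetical choice 2: the left endpoint alternates between -1/n and -3/n,
# so the mass of (b, 0) alternates between 1/2 and 3/4 and has no limit.
def b_alt(n):
    return -(1.0 if n % 2 == 0 else 3.0) / n

wobbly = [mass_uniform(b_alt(n), 1.0 / n, b, 0.0) for n in range(10, 16)]
print(steady)
print(wobbly)
```

Varying `c` over $(0,\infty)$ in the first choice sweeps the limit $c/(1+c)$ through every value strictly between $0$ and $1$.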
Example: Failure of Converging to a Probability Measure
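We compose an example where $\mu_n$ converges to a measure $\mu$ with $\mu(\mathbb{R})<1$, preventing $\mu$ from being a probability measure. To do this, fix two positive numbers $\alpha$ and $\beta$ such that $\alpha+\beta<1$. Consider, for instance, the sequence of random variables $X_n$ with
$$\mathscr{P}(X_n=-n)=\alpha,\qquad \mathscr{P}(X_n=n)=\beta,\qquad \mathscr{P}(X_n=0)=1-\alpha-\beta.$$
Then $X_n \to X$ where
$$\mathscr{P}(X=-\infty)=\alpha,\qquad \mathscr{P}(X=+\infty)=\beta,\qquad \mathscr{P}(X=0)=1-\alpha-\beta.$$
On every bounded interval, $\mu_n$ eventually agrees with the measure $\mu$ that puts mass $1-\alpha-\beta$ at $0$. So although $\mu_n(\mathbb{R})=1$ for every $n$, the limit satisfies $\mu(\mathbb{R})=1-\alpha-\beta<1$. Atoms of $\mu_n$ have escaped to $+\infty$ and $-\infty$.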
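The loss of mass can be watched on a fixed window. The sketch below assumes, purely for illustration, that $X_n$ takes the values $-n$, $n$ and $0$ with probabilities $\alpha$, $\beta$ and $1-\alpha-\beta$, and that $\alpha=1/4$, $\beta=1/3$; these concrete choices are assumptions made for the example.

```python
from fractions import Fraction

ALPHA, BETA = Fraction(1, 4), Fraction(1, 3)  # illustrative values with ALPHA + BETA < 1

def mu_n(n, a, b):
    """Mass of (a, b] when X_n = -n, n, 0 with probabilities ALPHA, BETA, 1 - ALPHA - BETA."""
    atoms = {-n: ALPHA, n: BETA, 0: 1 - ALPHA - BETA}
    return sum(p for x, p in atoms.items() if a < x <= b)

ns = (1, 3, 5, 6, 50)
window = [mu_n(n, -5, 5) for n in ns]            # mass seen on the fixed window (-5, 5]
total = [mu_n(n, -10**9, 10**9) for n in ns]     # mass on a window large enough to see everything
print(window)
print(total)
```

Each $\mu_n$ has total mass $1$, but on any fixed window only the atom at $0$ survives for large $n$, which is why the limit has mass $1-\alpha-\beta$.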
These examples inspire us to develop a weaker sense of convergence, where we only take intervals into account (because we would expect continuous functions to play a role).
Definitions
From the example above, it is clear that we cannot expect the total mass to equal $1$ all the time. Therefore we allow $\le 1$ instead, which leads to the following weakened versions of probability measure and distribution function.
Definition 1. A measure $\mu$ on $\mathbb{R}$ is a subprobability measure (s.p.m.) if $\mu(\mathbb{R}) \le 1$. Correspondingly, one defines the subdistribution function (s.d.f.) with respect to $\mu$ by
$$F(x)=\mu((-\infty,x]), \qquad x \in \mathbb{R}.$$
When $\mu(\mathbb{R})=1$, there is nothing new, but even if not, we do not face many obstacles. We still see that $F(x)$ is a right continuous function with $F(-\infty)=0$ and $F(+\infty)=\mu(\mathbb{R}) \le 1$. For brevity’s sake, we will write $\mu((a,b])$ as $\mu(a,b]$ from now on, and similarly for other kinds of intervals. We also put $\mu(a,b)=0$ when $a>b$ because why not.
Our examples also warn us that atoms are a big deal, which leads us to the following definition concerning intervals.
Definition 2. With the notation above, an interval $(a,b)$ is called a continuous interval (of $\mu$) if neither $a$ nor $b$ is an atom of $\mu$, i.e. if $\mu(a,b)=\mu[a,b]$.
One can test if $(0,1)$ is a continuous interval in our first group of examples. Now we are ready for the definition of vague convergence.
Definition 3. A sequence $(\mu_n)$ of s.p.m.’s is said to converge vaguely to an s.p.m. $\mu$ if there exists a dense subset $D \subset \mathbb{R}$ such that
$$\mu_n(a,b] \to \mu(a,b] \qquad \text{whenever } a,b \in D \text{ and } a<b.$$
We write $\mu_n \xrightarrow{v} \mu$.
Let $(F_n)$ be the corresponding s.d.f. of $(\mu_n)$ and $F$ the s.d.f. of $\mu$. Then we say that $F_n$ converges vaguely to $F$ and write $F_n \xrightarrow{v} F$.
It is unfair that we are not building the infrastructure for random variables (r.v.) in this context. We introduce the following concept that you may have already studied in the calculus-based probability theory:
Definition 4. Let $(X_n)$ be a sequence of r.v.’s with corresponding cumulative distribution functions (c.d.f.) $(F_n)$. We say $X_n$ converges weakly or in distribution to $X$ (with corresponding c.d.f. $F$) if $F_n \xrightarrow{v} F$.
In calculus-based probability theory, one learns this as $F_n(x) \to F(x)$ whenever $F$ is continuous at $x$. That formulation is easier to understand but skips a lot of important details.
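The restriction to continuity points of $F$ is not cosmetic. As a sketch, take the earlier example $X_n \equiv (-1)^n/n$ with limit $X \equiv 0$; the function names below are illustrative assumptions.

```python
from fractions import Fraction

def F_n(n, x):
    """c.d.f. of the constant random variable X_n = (-1)^n / n."""
    return 1 if Fraction((-1) ** n, n) <= x else 0

def F(x):
    """c.d.f. of the limit X = 0; its only discontinuity is at x = 0."""
    return 1 if x >= 0 else 0

ns = range(1000, 1006)
at_atom = [F_n(n, 0) for n in ns]  # evaluated at the discontinuity of F
away = {x: [F_n(n, x) for n in ns] for x in (Fraction(-1, 100), Fraction(1, 100))}
print(at_atom)
print(away)
```

At $x=0$ the values $F_n(0)$ keep alternating, while at the two continuity points $x=\pm 1/100$ they have already settled at $F(x)$.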
Equivalent Conditions
In this section we study vague convergence from the viewpoint of measure theory, utilising $\varepsilon$-$\delta$ arguments most of the time. We will see that this convergence behaves much like the convergence of sequences in $\mathbb{R}$.
Let $(a_n)$ be a sequence of real numbers. Recall that
- If $(a_n)$ converges, then the limit is unique.
- If $(a_n)$ is bounded, then it has a convergent subsequence.
- If $(a_n)$ is bounded and every convergent subsequence of $(a_n)$ converges to $a$, then $a_n$ converges to $a$.
These results are natural in the context of calculus, but in the world of topology and functional analysis, these are not naturally expected. However, s.p.m.’s enjoy all three of them (for the second point, notice that an s.p.m. is bounded in a sense anyway.) Nevertheless, it would be too ambitious to include everything here and assume that the reader will finish it in one shot.
Theorem 1. Let $(\mu_n)$ and $\mu$ be s.p.m.’s. The following conditions are equivalent:
(1) $\mu_n \xrightarrow{v} \mu$.
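(2) For every finite interval $(a,b)$ and $\varepsilon>0$, there exists an $n_0(a,b,\varepsilon)$ such that whenever $n \ge n_0$,
$$\mu(a+\varepsilon,b-\varepsilon)-\varepsilon \le \mu_n(a,b) \le \mu(a-\varepsilon,b+\varepsilon)+\varepsilon.$$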
(3) For every continuous interval $(a,b]$ of $\mu$, we have
$$\mu_n(a,b] \to \mu(a,b].$$
When $(\mu_n)$ and $\mu$ are p.m.’s, the second condition is equivalent to the “uniform” version:
(4) For every $\delta>0$ and $\varepsilon>0$, there exists $n_0(\delta,\varepsilon)$ such that if $n \ge n_0$, then for every interval $(a,b)$, possibly infinite:
$$\mu(a+\delta,b-\delta)-\varepsilon \le \mu_n(a,b) \le \mu(a-\delta,b+\delta)+\varepsilon.$$
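Proof. We first study the equivalence of the first three statements. Suppose $\mu_n$ converges vaguely to $\mu$. We are given a dense subset $D$ of the real line such that whenever $a,b \in D$ and $a<b$, one has $\mu_n(a,b] \to \mu(a,b]$. Given a finite interval $(a,b)$ and $\varepsilon>0$, there are $a_1,a_2,b_1,b_2 \in D$ satisfying
$$a-\varepsilon<a_1<a<a_2<a+\varepsilon, \qquad b-\varepsilon<b_1<b<b_2<b+\varepsilon.$$
By vague convergence, there exists $n_0>0$ such that whenever $n \ge n_0$,
$$|\mu_n(a_i,b_j]-\mu(a_i,b_j]| \le \varepsilon$$
for $i=1,2$ and $j=1,2$. It follows that
$$\mu_n(a,b) \ge \mu_n(a_2,b_1] \ge \mu(a_2,b_1]-\varepsilon \ge \mu(a+\varepsilon,b-\varepsilon)-\varepsilon,$$
and on the other hand
$$\mu_n(a,b) \le \mu_n(a_1,b_2] \le \mu(a_1,b_2]+\varepsilon \le \mu(a-\varepsilon,b+\varepsilon)+\varepsilon.$$
Combining both, the implication from (1) to (2) is clear.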
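Next, we assume (2), and let $(a,b)$ be a continuous interval of $\mu$, i.e. we have $\mu(a,b)=\mu[a,b]$. The relation $\mu(a+\varepsilon,b-\varepsilon)-\varepsilon \le \mu_n(a,b)$ implies that
$$\mu(a+\varepsilon,b-\varepsilon)-\varepsilon \le \liminf_{n}\mu_n(a,b)$$
holds for all $\varepsilon>0$. On the other hand, as $\varepsilon \to 0$ on the left hand side, we see
$$\mu(a,b) \le \liminf_{n}\mu_n(a,b).$$
Likewise, the relation $\mu_n(a,b) \le \mu(a-\varepsilon,b+\varepsilon)+\varepsilon$ yields
$$\limsup_{n}\mu_n(a,b) \le \mu(a-\varepsilon,b+\varepsilon)+\varepsilon.$$
As $\varepsilon \to 0$ on the right hand side, we obtain
$$\limsup_{n}\mu_n(a,b) \le \mu[a,b].$$
To combine both sides, notice that
$$\mu(a,b) \le \liminf_{n}\mu_n(a,b) \le \limsup_{n}\mu_n(a,b) \le \mu[a,b]=\mu(a,b).$$
This forces $\mu_n(a,b)$ to converge to $\mu(a,b)$, which in turn implies that $\mu_n(a,b] \to \mu(a,b]$. To see this, pick another continuous interval $(a,b')$ which properly contains $(a,b)$. Then $(b,b')$ is another continuous interval. It follows that
$$\mu_n(a,b]=\mu_n(a,b')-\mu_n(b,b') \to \mu(a,b')-\mu(b,b')=\mu(a,b].$$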
Assume (3). Notice that the set of atoms $A$ of $\mu$ has to be at most countable, therefore $D=\mathbb{R} \setminus A$ is dense in $\mathbb{R}$. On the other hand, $(a,b]$ is a continuous interval if and only if $a,b \in D$. This implies (1).
The arguments above also show that when discussing vague convergence, one can replace $(a,b]$ with $(a,b)$, $[a,b)$ or $[a,b]$ freely, as long as $(a,b)$ is a continuous interval. It also follows that $\mu_n(\{a\}) \to 0$ whenever $a$ is not an atom of $\mu$.
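For (4), as (4) implies (2) (by taking $\delta=\varepsilon>0$), it remains to show that (3) implies (4) assuming that $\mu_n$ and $\mu$ are p.m.’s. Indeed, it suffices to prove it on a finite interval, and we will first justify this reduction. Let $A$ denote the set of atoms of $\mu$. First of all we can pick an integer $N>0$ such that $\mu(-N,N) > 1-\frac{\varepsilon}{4}$ (that is, the interval is so big that its measure is close enough to $1$). Pick $\alpha,\beta \in A^c$ such that $\alpha \le -N$ and $\beta \ge N$ (this is possible because $A^c$ is dense). For the interval $(\alpha,\beta)$, we can put a finite partition
$$\alpha=a_1<a_2<\cdots<a_\ell=\beta$$
such that $|a_{j+1}-a_j| \le \delta$ for all $j=1,\dots,\ell-1$ and $a_j \in A^c$ for all $j=1,\dots,\ell$. Therefore, we have
$$\mu((a_1,a_\ell)^c) < \frac{\varepsilon}{4}.$$
By (3), there exists $n_0$ depending on $\varepsilon$ and $\ell$ (thereby $\delta$) such that
$$\sup_{1 \le j \le \ell-1}|\mu_n(a_j,a_{j+1}]-\mu(a_j,a_{j+1}]| \le \frac{\varepsilon}{4\ell}$$
for all $n \ge n_0$. Adding over all $j$, and replacing $(a_1,a_\ell]$ with the open interval (this is harmless because $a_\ell$ is not an atom of $\mu$, so $\mu_n(\{a_\ell\}) \to 0$ and we may enlarge $n_0$ if necessary), we see
$$|\mu_n(a_1,a_\ell)-\mu(a_1,a_\ell)| \le \frac{\varepsilon}{4}.$$
It follows that
$$\mu_n((a_1,a_\ell)^c)=1-\mu_n(a_1,a_\ell) \le 1-\mu(a_1,a_\ell)+\frac{\varepsilon}{4}=\mu((a_1,a_\ell)^c)+\frac{\varepsilon}{4}<\frac{\varepsilon}{2}.$$
(This is where being p.m. matters.) Therefore when $n \ge n_0$ and discussing $\mu(a,b)$ versus $\mu_n(a,b)$, ignoring $(a,b) \setminus (a_1,a_\ell)$ results only in an error of $<\frac{\varepsilon}{2}$. Therefore it suffices to assume that $(a,b) \subset (a_1,a_\ell)$ and show that
$$\mu(a+\delta,b-\delta)-\frac{\varepsilon}{2} \le \mu_n(a,b) \le \mu(a-\delta,b+\delta)+\frac{\varepsilon}{2}.$$
Since $(a,b) \subset (a_1,a_\ell)$, there exist $j,k$ with $1 \le j \le k < \ell$ such that
$$a_j \le a < a_{j+1}, \qquad a_k < b \le a_{k+1}.$$
Because consecutive partition points are at most $\delta$ apart, we have $(a+\delta,b-\delta) \subset (a_{j+1},a_k] \subset (a,b) \subset (a_j,a_{k+1}] \subset (a-\delta,b+\delta)$. Adding the estimates above over consecutive subintervals, we obtain for $n \ge n_0$
$$\mu(a+\delta,b-\delta)-\frac{\varepsilon}{4} \le \mu(a_{j+1},a_k]-\frac{\varepsilon}{4} \le \mu_n(a_{j+1},a_k] \le \mu_n(a,b) \le \mu_n(a_j,a_{k+1}] \le \mu(a_j,a_{k+1}]+\frac{\varepsilon}{4} \le \mu(a-\delta,b+\delta)+\frac{\varepsilon}{4}.$$
This concludes the proof and demonstrates why our specific choice of $a_j$ is important. $\square$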
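Condition (2) can be tested by brute force on the first example, $\mu_n=$ point mass at $(-1)^n/n$ and $\mu=$ point mass at $0$. The sketch below checks the two relations $\mu(a+\varepsilon,b-\varepsilon)-\varepsilon \le \mu_n(a,b) \le \mu(a-\varepsilon,b+\varepsilon)+\varepsilon$ used in the proof; the helper names and the particular intervals are assumptions chosen for illustration.

```python
from fractions import Fraction

def point_mass(at, lo, hi):
    """Mass of the open interval (lo, hi) under the unit point mass at `at` (0 if lo >= hi)."""
    return 1 if lo < at < hi else 0

def condition_2_holds(n, a, b, eps):
    """Check mu(a+eps, b-eps) - eps <= mu_n(a, b) <= mu(a-eps, b+eps) + eps."""
    x_n = Fraction((-1) ** n, n)
    lower = point_mass(0, a + eps, b - eps) - eps
    upper = point_mass(0, a - eps, b + eps) + eps
    return lower <= point_mass(x_n, a, b) <= upper

# A small interval around the atom 0: the check fails until x_n has entered it.
a, b, eps = Fraction(-1, 1000), Fraction(1, 1000), Fraction(1, 10000)
fails = [n for n in range(1, 3000) if not condition_2_holds(n, a, b, eps)]
n0 = max(fails) + 1

# The interval (0, 1) has the atom 0 as an endpoint, yet the check never fails.
atom_endpoint_ok = all(condition_2_holds(n, 0, 1, Fraction(1, 100)) for n in range(1, 50))
print(n0, atom_endpoint_ok)
```

The second check illustrates why (2) is phrased with shrunken and enlarged intervals: the slack of $\varepsilon$ at each endpoint absorbs the oscillation of $\mu_n(0,1)$ caused by the atom at $0$.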
We cannot give a treatment of all three points above but the first point, the unicity of vague limit, is now clear.
Corollary 1 (Unicity of vague limit). Keep the notation of Definition 3. If there is another s.p.m. $\mu'$ and another dense set $D'$ such that whenever $a,b \in D'$ and $a<b$, one has $\mu_n(a,b] \to \mu'(a,b]$, then $\mu$ and $\mu'$ are identical.
Proof. Let $A$ be the set of atoms of $\mu$ and $\mu'$; then if $a,b \in A^c$ and $a<b$, condition (3) of Theorem 1 gives $\mu_n(a,b] \to \mu(a,b]$ and $\mu_n(a,b] \to \mu'(a,b]$. Therefore $\mu(a,b]=\mu'(a,b]$. Since $A^c$ is dense in $\mathbb{R}$, the two measures must be identical. $\square$