Posted 2023-02-13 | Updated 2023-07-08 | Analysis / Measure Theory / Probability Theory

Vague Convergence in Measure-theoretic Probability Theory - Equivalent Conditions

Introduction

In analysis and probability theory, one studies various sorts of convergence (of random variables) for various reasons. In this post we study vague convergence, which underlies convergence in distribution.

Vaguely speaking, vague convergence is the weakest kind of convergence one can reasonably expect (whilst still caring about continuity whenever possible). We do not assume any dependence relation among the random variables of the sequence.

Throughout, fix a probability space $(\Omega,\mathscr{F},\mathscr{P})$, where $\Omega$ is the sample space, $\mathscr{F}$ the event space and $\mathscr{P}$ the probability measure. Let $(X_n)$ be a sequence of random variables on this space. Each random variable $X_n$ canonically induces a probability space $(\mathbb{R},\mathscr{B},\mu_n)$, where $\mathscr{B}$ is the Borel $\sigma$-algebra. To avoid notation hell, we only work with the correspondence $X_n \leftrightarrow \mu_n$, where

$\mu_n(B)=\mathscr{P}(X_n^{-1}(B))=\mathscr{P}\{X_n \in B\}.$

Here comes the question: if $X_n$ tends to a limit, we would expect $\mu_n$ to converge to a limit (say $\mu$) in some sense, at least on some intervals. But is that always the case? And even if $\mu_n$ converges, must we have $\mu(\mathbb{R})=1$? The following examples show that neither is guaranteed.

Examples: Failure of Convergence on Intervals

Let $X_n\equiv\frac{(-1)^n}{n}$; then $X_n \to 0$ deterministically. For any $a>0$, the sequence $\mu_n((0,a))$ oscillates between $0$ and $1$: as soon as $\frac1n<a$, it takes the form

$\dots,0,1,0,1,\dots$

which does not converge at all. Likewise, for any $b<0$, the sequence $\mu_n((b,0))$ oscillates between $1$ and $0$.
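Since $X_n$ is deterministic, $\mu_n$ is the point mass at $X_n$, so $\mu_n((0,a))$ is $1$ or $0$ according as $X_n\in(0,a)$ or not. A minimal Python sketch with exact fractions makes the oscillation visible; the helper names `x_n` and `mu_n_open` and the choice $a=-b=\frac1{10}$ are illustrative assumptions.

```python
from fractions import Fraction

def x_n(n):
    """The deterministic random variable X_n = (-1)^n / n."""
    return Fraction((-1) ** n, n)

def mu_n_open(n, lo, hi):
    """mu_n((lo, hi)), where mu_n is the point mass at X_n."""
    return 1 if lo < x_n(n) < hi else 0

a, b = Fraction(1, 10), Fraction(-1, 10)  # illustrative a > 0 and b < 0
ns = range(11, 19)                       # n large enough that 1/n < a
right = [mu_n_open(n, 0, a) for n in ns]
left = [mu_n_open(n, b, 0) for n in ns]
print(right)  # [0, 1, 0, 1, 0, 1, 0, 1]
print(left)   # [1, 0, 1, 0, 1, 0, 1, 0]
```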

As another example of convergence failure, consider $b_n<0<a_n$ with $a_n \to 0$ and $b_n \to 0$ as $n \to \infty$, and let $X_n$ have the uniform distribution on $(b_n,a_n)$. Then $X_n \to 0$ a.e., but for a fixed $b<0$ and all $n$ large enough that $b<b_n$, the quantity $\mu_n((b,0))$ is the area under the density of $X_n$ between $b_n$ and $0$, namely $\frac{-b_n}{a_n-b_n}$. Depending on the choice of $(a_n)$ and $(b_n)$, this ratio may fail to converge, or may converge to any prescribed number between $0$ and $1$.
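The ratio $\frac{-b_n}{a_n-b_n}$ can be explored with a short Python sketch. The endpoint choices below are hypothetical illustrations: $a_n=\frac1n$, $b_n=-\frac3n$ gives the constant $\frac34$, while letting $b_n$ alternate between $-\frac3n$ and $-\frac1n$ gives a sequence that oscillates between $\frac34$ and $\frac12$.

```python
from fractions import Fraction

def uniform_mass(lo, hi, left, right):
    """mu((lo, hi)) for the uniform distribution on (left, right): overlap length / total length."""
    overlap = max(Fraction(0), min(hi, right) - max(lo, left))
    return overlap / (right - left)

b = Fraction(-1, 2)  # a fixed b < 0; b < b_n holds for every n used below
ns = range(10, 14)

# Hypothetical choice 1: a_n = 1/n, b_n = -3/n, so -b_n / (a_n - b_n) = 3/4 for every n.
constant = [uniform_mass(b, 0, Fraction(-3, n), Fraction(1, n)) for n in ns]

# Hypothetical choice 2: a_n = 1/n, b_n = -3/n for even n and -1/n for odd n.
alternating = [
    uniform_mass(b, 0, Fraction(-3 if n % 2 == 0 else -1, n), Fraction(1, n)) for n in ns
]

print([float(v) for v in constant])     # [0.75, 0.75, 0.75, 0.75]
print([float(v) for v in alternating])  # [0.75, 0.5, 0.75, 0.5]
```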

Example: Failure of Converging to a Probability Measure

We construct an example in which $\mu_n$ converges to a measure $\mu$ with $\mu(\mathbb{R})<1$, so that $\mu$ is not a probability measure. To do this, fix two positive numbers $\alpha$ and $\beta$ such that $\alpha+\beta<1$, and consider the sequence of random variables $X_n$ with

$\begin{aligned} X_n = \begin{cases} n & \text{with probability $\alpha$,} \\ 0 & \text{with probability $1-\alpha-\beta$,} \\ -n &\text{with probability $\beta$.} \end{cases} \end{aligned}$

Then $X_n \to X$ where

$\begin{aligned} X = \begin{cases} \infty & \text{with probability $\alpha$,} \\ 0 & \text{with probability $1-\alpha-\beta$,} \\ -\infty &\text{with probability $\beta$.} \end{cases} \end{aligned}$

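Each $\mu_n$ is a probability measure, so $\mu_n(\mathbb{R})=1$ for every $n$. However, for every bounded interval $(a,b]$ we have $\mu_n(a,b]\to\mu(a,b]$, where $\mu=(1-\alpha-\beta)\delta_0$ and $\delta_0$ denotes the point mass at $0$, because the atoms at $\pm n$ eventually leave every bounded interval. Hence $\mu(\mathbb{R})=1-\alpha-\beta<1$: the mass of the atoms at $\pm n$ has escaped to $+\infty$ and $-\infty$.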
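A quick numerical check of the escaping mass, sketched in Python with the illustrative values $\alpha=\frac15$ and $\beta=\frac1{10}$ (any positive pair with $\alpha+\beta<1$ would do): the total mass of $\mu_n$ stays $1$, while the mass of the fixed bounded window $(-10,10]$ settles at $1-\alpha-\beta=\frac7{10}$ once $n>10$.

```python
from fractions import Fraction

alpha, beta = Fraction(1, 5), Fraction(1, 10)  # illustrative values, alpha + beta < 1

def mu_n(n, a, b):
    """mu_n((a, b]) where X_n = n w.p. alpha, 0 w.p. 1 - alpha - beta, -n w.p. beta (n >= 1)."""
    atoms = {n: alpha, 0: 1 - alpha - beta, -n: beta}
    return sum((p for x, p in atoms.items() if a < x <= b), Fraction(0))

inf = float("inf")
totals = [mu_n(n, -inf, inf) for n in (1, 10, 1000)]
window = [mu_n(n, -10, 10) for n in (5, 11, 1000)]
print(totals)  # [Fraction(1, 1), Fraction(1, 1), Fraction(1, 1)]
print(window)  # [Fraction(1, 1), Fraction(7, 10), Fraction(7, 10)]
```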

These examples inspire us to develop a weaker sense of convergence, where we only take intervals into account (because we would expect continuous functions to play a role).

Definitions

From the example above, we cannot expect the limit to have total mass $1$ all the time. Therefore we only require total mass $\le 1$, which leads to the following weakened versions of probability measure and distribution function.

Definition 1. A measure $\mu$ on $\mathbb{R}$ is a subprobability measure (s.p.m.) if $\mu(\mathbb{R}) \le 1$. Correspondingly, one defines the subdistribution function (s.d.f.) with respect to $\mu$ by
$\forall x, F(x)=\mu((-\infty,x]).$

When $\mu(\mathbb{R})=1$ there is nothing new, and even when $\mu(\mathbb{R})<1$ we do not face many obstacles. The function $F$ is still non-decreasing and right continuous, with $F(-\infty)=0$ and $F(+\infty)=\mu(\mathbb{R}) \le 1$. For brevity’s sake, we write $\mu(a,b]$ instead of $\mu((a,b])$ from now on, and similarly for other kinds of intervals. We also put $\mu(a,b)=0$ when $a>b$ as a convention, which saves us from special cases when an interval such as $(a+\varepsilon,b-\varepsilon)$ collapses.
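To make the definition concrete, here is a minimal Python sketch of the s.d.f. of a discrete s.p.m. given by finitely many atoms. The measure used is $\frac7{10}\delta_0$ (the point mass at $0$ scaled by $\frac7{10}$), i.e. $(1-\alpha-\beta)\delta_0$ from the previous example with the illustrative values $\alpha=\frac15$, $\beta=\frac1{10}$; the helper `sdf` is hypothetical.

```python
from fractions import Fraction

# Atoms (position -> mass) of a discrete s.p.m.; here mu = (7/10) * delta_0.
atoms = {0: Fraction(7, 10)}

def sdf(x):
    """F(x) = mu((-inf, x]): total mass of the atoms at positions <= x."""
    return sum((m for p, m in atoms.items() if p <= x), Fraction(0))

# F jumps at the atom 0: right continuous there, not left continuous,
# and F(+inf) = mu(R) = 7/10 < 1.
print(sdf(float("-inf")), sdf(-1e-9), sdf(0), sdf(1e-9), sdf(float("inf")))
# 0 0 7/10 7/10 7/10
```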

Our examples also warn us that atoms are a big deal, which leads us to the following definition concerning intervals.

Definition 2. With notation as above, an interval $(a,b)$ is called a continuous interval (also known as a continuity interval) of $\mu$ if neither $a$ nor $b$ is an atom of $\mu$, i.e. if $\mu(a,b)=\mu[a,b]$.

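One can check whether $(0,1)$ is a continuous interval in our first group of examples: it is one for every $\mu_n$ (none of them has an atom at $0$ or $1$), but not for the point mass $\delta_0$ at the limit $0$. Now we are ready for the definition of vague convergence.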
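For a measure with finitely many atoms, Definition 2 can be tested mechanically by comparing $\mu(a,b)$ with $\mu[a,b]$. The Python sketch below (helper names are hypothetical) carries out the test suggested above for $\delta_{1/2}$, the law of $X_2=\frac12$ in the first example, and for $\delta_0$.

```python
from fractions import Fraction

def open_mass(atoms, a, b):
    """mu((a, b)) for a discrete measure given as {position: mass}."""
    return sum((m for p, m in atoms.items() if a < p < b), Fraction(0))

def closed_mass(atoms, a, b):
    """mu([a, b]) for the same discrete measure."""
    return sum((m for p, m in atoms.items() if a <= p <= b), Fraction(0))

def is_continuous_interval(atoms, a, b):
    """Definition 2: (a, b) is a continuous interval iff mu(a, b) == mu[a, b]."""
    return open_mass(atoms, a, b) == closed_mass(atoms, a, b)

delta_half = {Fraction(1, 2): Fraction(1)}  # law of X_2 = 1/2
delta_zero = {0: Fraction(1)}               # law of the limit 0
print(is_continuous_interval(delta_half, 0, 1))   # True
print(is_continuous_interval(delta_zero, 0, 1))   # False: 0 is an atom
print(is_continuous_interval(delta_zero, -1, 1))  # True
```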

Definition 3. A sequence $(\mu_n)$ of s.p.m. is said to converge vaguely to an s.p.m. $\mu$ if there exists a dense subset $D \subset \mathbb{R}$ such that
$\forall a,b\in D,a<b,\quad \mu_n(a,b] \to \mu(a,b].$
We write $\mu_n \xrightarrow{v} \mu$.

Let $(F_n)$ be the corresponding s.d.f. of $(\mu_n)$ and $F$ the s.d.f. of $\mu$. Then we say that $F_n$ converges vaguely to $F$ and write $F_n \xrightarrow{v} F$.
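Vague convergence is a statement about limits, so no finite computation proves it; the Python sketch below only illustrates Definition 3 on the first example, where $\mu_n$ is the point mass at $X_n=\frac{(-1)^n}{n}$ and the candidate limit is $\mu=\delta_0$. It takes $D=\mathbb{R}\setminus\{0\}$, samples a few endpoints from $D$, and checks $\mu_n(a,b]=\mu(a,b]$ over a range of large $n$; the endpoint $0\notin D$ shows what goes wrong at the atom. The sample grid and the range of $n$ are arbitrary choices.

```python
from fractions import Fraction

def point_mass(x, a, b):
    """delta_x((a, b]): 1 if a < x <= b, else 0."""
    return 1 if a < x <= b else 0

def x_n(n):
    """X_n = (-1)^n / n from the first example."""
    return Fraction((-1) ** n, n)

# A finite sample of endpoints from D = R \ {0}; the candidate limit is mu = delta_0.
sample = [Fraction(k, 7) for k in range(-14, 15) if k != 0]
large_n = range(1000, 1010)  # 1/n is below 1/7, the smallest |endpoint| in the sample

agrees_on_D = all(
    point_mass(x_n(n), a, b) == point_mass(0, a, b)
    for a in sample
    for b in sample
    if a < b
    for n in large_n
)
at_atom = [point_mass(x_n(n), 0, 1) for n in large_n]  # endpoint 0 is NOT in D
print(agrees_on_D)  # True
print(at_atom)      # [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
```

This matches the first example: $\mu_n(0,1]$ keeps oscillating, yet this causes no trouble for vague convergence to $\delta_0$, because the dense set $D$ is allowed to avoid the atom $0$ of the limit.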

It would be unfair not to build the corresponding infrastructure for random variables (r.v.) in this context. We introduce the following concept, which you may have already seen in calculus-based probability theory:

Definition 4. Let $(X_n)$ be a sequence of r.v.’s with corresponding cumulative distribution functions (c.d.f.) $(F_n)$. We say $X_n$ converges weakly, or in distribution, to $X$ (with corresponding c.d.f. $F$) if $F_n \xrightarrow{v} F$.

In calculus-based probability theory, one usually requires instead that $F_n(x) \to F(x)$ at every $x$ where $F$ is continuous. This definition is easier to understand but skips a lot of important details.
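A standard illustration of why only continuity points of $F$ are required, sketched in Python under the assumption $X_n\equiv\frac1n$ (a hypothetical example, not one from the text above): $X_n\to0$, yet $F_n(0)=0$ for every $n$ while $F(0)=1$, because $0$ is exactly where $F$ jumps. At every other point the values agree for large $n$.

```python
def F_n(n, x):
    """c.d.f. of the constant random variable X_n = 1/n (hypothetical example)."""
    return 1.0 if x >= 1 / n else 0.0

def F(x):
    """c.d.f. of the constant limit X = 0; it jumps at x = 0."""
    return 1.0 if x >= 0 else 0.0

large_n = range(1000, 1005)
xs = [-0.5, 0.0, 0.01, 2.0]
# Points x where F_n(x) still differs from F(x) for some large n.
mismatch = [x for x in xs if any(F_n(n, x) != F(x) for n in large_n)]
print(mismatch)  # [0.0]
```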

Equivalent Conditions

In this section we study vague convergence from the point of view of measure theory, using $\varepsilon$-$\delta$ arguments most of the time. We will see that it behaves much like convergence in $\mathbb{R}$.

Let $(a_n)$ be a sequence of real numbers. Recall that:

If $(a_n)$ converges, then the limit is unique.
If $(a_n)$ is bounded, then it has a convergent subsequence.
If $(a_n)$ is bounded and every convergent subsequence of $(a_n)$ converges to $a$, then $a_n$ converges to $a$.

These results are natural in the context of calculus, but in the world of topology and functional analysis they cannot be taken for granted. However, s.p.m.’s enjoy all three of them (for the second point, notice that an s.p.m. has total mass at most $1$, so it is bounded in a sense anyway). Still, it would be too ambitious to include everything here and expect the reader to finish it in one shot.

Theorem 1. Let $(\mu_n)$ and $\mu$ be s.p.m.’s. The following conditions are equivalent:

(1) $\mu_n \xrightarrow{v} \mu$.

(2) For every finite interval $(a,b)$ and $\varepsilon>0$, there exists an $n_0(a,b,\varepsilon)$ such that whenever $n \ge n_0$,
$\mu(a+\varepsilon,b-\varepsilon)-\varepsilon \le \mu_n(a,b) \le \mu(a-\varepsilon,b+\varepsilon)+\varepsilon$
(3) For every continuity interval $(a,b]$ of $\mu$, we have
$\mu_n(a,b] \to \mu(a,b].$
When $(\mu_n)$ and $\mu$ are p.m.’s, the second condition is also equivalent to the following “uniform” version:

(4) For every $\delta>0$ and $\varepsilon>0$, there exists $n_0(\delta,\varepsilon)$ such that if $n \ge n_0$, then for every interval $(a,b)$, possibly infinite:
$\mu(a+\delta,b-\delta)-\varepsilon \le \mu_n(a,b) \le \mu(a-\delta,b+\delta)+\varepsilon.$
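To see why (4) needs the p.m. assumption, here is a Python sketch that tests the inequality of (2) and (4) for the escaping-mass example, where $\mu=(1-\alpha-\beta)\delta_0$ and $\delta_0$ is the point mass at $0$. It uses the illustrative values $\alpha=\frac15$, $\beta=\frac1{10}$ and $\varepsilon=\delta=\frac1{20}$. For the finite interval $(-3,5)$ the inequality holds from $n=5$ on (so $n_0$ genuinely depends on $(a,b)$), whereas for $(a,b)=(-\infty,\infty)$ it fails for every $n$, since $\mu_n(\mathbb{R})=1>\mu(\mathbb{R})+\varepsilon$. Only finitely many $n$ are checked, so this illustrates rather than proves.

```python
from fractions import Fraction

alpha, beta = Fraction(1, 5), Fraction(1, 10)  # illustrative values, alpha + beta < 1
eps = Fraction(1, 20)                           # plays the role of both epsilon and delta
inf = float("inf")

def mu_n(n, a, b):
    """mu_n((a, b)) for X_n = n w.p. alpha, 0 w.p. 1 - alpha - beta, -n w.p. beta (n >= 1)."""
    atoms = {n: alpha, 0: 1 - alpha - beta, -n: beta}
    return sum((p for x, p in atoms.items() if a < x < b), Fraction(0))

def mu(a, b):
    """mu((a, b)) for the limit mu = (1 - alpha - beta) * delta_0."""
    return (1 - alpha - beta) if a < 0 < b else Fraction(0)

def sandwich_holds(n, a, b):
    """The inequality of (2)/(4) with epsilon = delta = eps."""
    return mu(a + eps, b - eps) - eps <= mu_n(n, a, b) <= mu(a - eps, b + eps) + eps

finite_ok = [n for n in range(1, 50) if sandwich_holds(n, -3, 5)]
infinite_ok = [n for n in range(1, 50) if sandwich_holds(n, -inf, inf)]
print(finite_ok[0], len(finite_ok))  # 5 45
print(infinite_ok)                   # []
```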

Proof. We first prove the equivalence of the first three statements. Suppose $\mu_n$ converges vaguely to $\mu$. We are given a dense subset $D$ of the real line such that whenever $a,b \in D$ and $a<b$, we have $\mu_n(a,b] \to \mu(a,b]$. Given a finite interval $(a,b)$ and $\varepsilon>0$, by density of $D$ there are $a_1,a_2,b_1,b_2 \in D$ satisfying

$a-\varepsilon<a_1<a<a_2<a+\varepsilon,\;b-\varepsilon<b_1<b<b_2<b+\varepsilon.$

By vague convergence, there exists $n_0>0$ such that whenever $n \ge n_0$,

$|\mu_n(a_i,b_j]-\mu(a_i,b_j]|<\varepsilon$

for $i=1,2$ and $j=1,2$. It follows that

$\mu_n(a,b) \ge \mu_n(a_2,b_1] > \mu(a_2,b_1]-\varepsilon\ge \mu(a+\varepsilon,b-\varepsilon)-\varepsilon$

and on the other hand

$\mu_n(a,b) \le \mu_n(a_1,b_2] < \mu(a_1,b_2]+\varepsilon \le \mu(a-\varepsilon,b+\varepsilon)+\varepsilon.$

Combining both inequalities, we obtain (2).

Next, we assume (2), and let $(a,b)$ be a continuous interval of $\mu$, i.e. we have $\mu(a,b)=\mu[a,b]$. The relation $\mu(a+\varepsilon,b-\varepsilon)-\varepsilon \le \mu_n(a,b)$ implies that

$\mu(a+\varepsilon,b-\varepsilon) - \varepsilon \le \varliminf_{n\to\infty} \mu_n(a,b)$

holds for all $\varepsilon>0$. Since $(a+\varepsilon,b-\varepsilon)\uparrow(a,b)$ as $\varepsilon\downarrow0$, letting $\varepsilon \to 0$ on the left-hand side gives

$\mu(a,b) \le \varliminf_{n\to\infty}\mu_n(a,b).$

Likewise, the relation $\mu_n(a,b) \le \mu(a-\varepsilon,b+\varepsilon)+\varepsilon$ yields

$\varlimsup_{n\to\infty}\mu_n(a,b) \le \mu(a-\varepsilon,b+\varepsilon)+\varepsilon.$

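Since $(a-\varepsilon,b+\varepsilon)\downarrow[a,b]$ as $\varepsilon\downarrow0$ and $\mu$ is finite, letting $\varepsilon \to 0$ on the right-hand side gives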

$\varlimsup_{n \to \infty}\mu_n(a,b) \le \mu[a,b]=\mu(a,b).$

Putting both sides together, we get

$\mu(a,b) \le \varliminf_{n\to\infty}\mu_n(a,b)\le\varlimsup_{n\to\infty}\mu_n(a,b)\le\mu[a,b]=\mu(a,b).$

This forces $\mu_n(a,b)$ to converge to $\mu(a,b)$. From this we also get $\mu_n(a,b] \to \mu(a,b]$. To see this, pick some $b'>b$ that is not an atom of $\mu$, so that $(a,b')$ is a continuous interval properly containing $(a,b)$. Then $(b,b')$ is another continuous interval, and it follows that

$\mu_n(a,b] = \mu_n(a,b')-\mu_n(b,b') \to \mu(a,b')-\mu(b,b')=\mu(a,b].$

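Assume (3). The set $A$ of atoms of $\mu$ is at most countable: for each integer $k \ge 1$, $\mu$ has fewer than $k$ atoms of mass greater than $\frac1k$, since $\mu(\mathbb{R}) \le 1$. Therefore $D=\mathbb{R} \setminus A$ is dense in $\mathbb{R}$. Moreover, $(a,b]$ is a continuous interval if and only if $a,b \in D$, so (3) says exactly that $\mu_n(a,b] \to \mu(a,b]$ for all $a<b$ in $D$. This is (1).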

The arguments above also show that, when discussing vague convergence, one can freely replace $(a,b]$ with $(a,b)$, $[a,b)$ or $[a,b]$, as long as $(a,b)$ is a continuous interval. In particular, $\mu_n(\{a\}) \to 0$ whenever $a$ is not an atom of $\mu$.

For (4): since (4) implies (2) (take $\delta=\varepsilon$), it remains to show that (3) implies (4) when $\mu_n$ and $\mu$ are p.m.’s. It suffices to prove it on a finite interval, and we first justify this reduction. Let $A$ denote the set of atoms of $\mu$. First pick an integer $N>0$ such that $\mu(-N,N) > 1-\frac{\varepsilon}{4}$ (that is, the interval is so large that its measure is close enough to $1$). Then pick $\alpha,\beta \in A^c$ such that $\alpha \le -N$ and $\beta \ge N$ (this is possible because $A^c$ is dense). For the interval $(\alpha,\beta)$, we can choose a finite partition

$\alpha=a_1<a_2<\dots<a_\ell=\beta$

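such that $|a_{j+1}-a_j| \le \delta$ for $j=1,\dots,\ell-1$ and $a_j \in A^c$ for all $j=1,\dots,\ell$ (again possible by density of $A^c$). Since $(a_1,a_\ell)$ contains the interval chosen above, whose $\mu$-measure exceeds $1-\frac{\varepsilon}{4}$, and $\mu$ is a p.m., we have

$\mu((a_1,a_\ell)^c)<\frac{\varepsilon}{4}.$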

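By (3), there exists $n_0$ depending on $\varepsilon$ and the partition (hence on $\delta$) such that

$\sup_{1 \le j \le \ell -1}|\mu(a_j,a_{j+1}]-\mu_n(a_j,a_{j+1}]| < \frac{\varepsilon}{4\ell} \quad\text{and}\quad \sup_{1 \le j \le \ell}\mu_n(\{a_j\}) < \frac{\varepsilon}{4\ell}$

for all $n \ge n_0$; the second bound is available because no $a_j$ is an atom of $\mu$, so $\mu_n(\{a_j\}) \to 0$. Adding over all $j$ and removing the endpoint $a_\ell$ to pass to the open interval, we see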

$|\mu(a_1,a_\ell)-\mu_n(a_1,a_\ell)|<\frac{\varepsilon}{4}.$

It follows that

$\begin{aligned} \mu_n((a_1,a_\ell)^c) &= 1-\mu_n(a_1,a_\ell) \\ &= \mu((a_1,a_\ell))+\mu((a_1,a_\ell)^c)-\mu_n(a_1,a_\ell) \\ &\le \mu((a_1,a_\ell)^c)+|\mu(a_1,a_\ell)-\mu_n(a_1,a_\ell)| \\ &< \frac{\varepsilon}{2}. \end{aligned}$

(This is where being p.m.’s matters.) Hence, when $n \ge n_0$ and we compare $\mu(a,b)$ with $\mu_n(a,b)$, ignoring $(a,b) \setminus (a_1,a_\ell)$ only introduces an error of less than $\frac{\varepsilon}{2}$. Therefore it suffices to assume that $(a,b) \subset (a_1,a_\ell)$ and to show that

$\mu(a+\delta,b-\delta)-\frac{\varepsilon}{2}\le \mu_n(a,b) \le \mu(a-\delta,b+\delta)+\frac{\varepsilon}{2}.$

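Since $(a,b) \subset (a_1,a_\ell)$, there exist $j,k$ with $1 \le j \le k < \ell$ such that $a_j \le a < a_{j+1}$ and $a_k < b \le a_{k+1}$. As every gap of the partition is at most $\delta$, we have $(a+\delta,b-\delta)\subset(a_{j+1},a_k)\subset(a,b)\subset(a_j,a_{k+1})\subset(a-\delta,b+\delta)$. Moreover, by the choice of $n_0$, the $\mu$- and $\mu_n$-measures of $(a_{j+1},a_k)$, and likewise of $(a_j,a_{k+1})$, differ by less than $\frac{\varepsilon}{4}$. Hence

$\begin{aligned} \mu(a+\delta,b-\delta)-\frac{\varepsilon}{4} &\le \mu(a_{j+1},a_k)-\frac{\varepsilon}{4} \le \mu_n(a_{j+1},a_k) \le \mu_n(a,b) \\ &\le \mu_n(a_j,a_{k+1}) \le \mu(a_j,a_{k+1})+\frac{\varepsilon}{4} \\ &\le \mu(a-\delta,b+\delta) + \frac{\varepsilon}{4}. \end{aligned}$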

This concludes the proof and demonstrates why our specific choice of $a_j$ is important. $\square$
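The only constructive ingredient of the proof that (3) implies (4) is the partition $\alpha=a_1<\dots<a_\ell=\beta$ with mesh at most $\delta$ avoiding the atoms of $\mu$. Here is a Python sketch of one way to build it, under the simplifying assumption that $\mu$ has finitely many atoms (in the proof, $A$ may be countably infinite and one uses the density of $A^c$ instead). The function name `atom_free_partition` and the perturbation scheme are illustrative choices, not the construction from the source.

```python
import math
from fractions import Fraction

def atom_free_partition(alpha, beta, delta, atoms):
    """Return alpha = a_1 < ... < a_l = beta with every gap <= delta and no a_j in `atoms`.

    Simplifying assumption: `atoms` is a finite set, and alpha, beta are not in it.
    """
    alpha, beta, delta = Fraction(alpha), Fraction(beta), Fraction(delta)
    if not alpha < beta or alpha in atoms or beta in atoms:
        raise ValueError("need alpha < beta, both non-atoms")
    # Start from an equally spaced grid with spacing h <= delta / 2.
    steps = max(1, math.ceil((beta - alpha) / (delta / 2)))
    h = (beta - alpha) / steps
    m = len(atoms)
    points = [alpha]
    for i in range(1, steps):
        base = alpha + i * h
        # m + 1 distinct shifts in [0, h/4); at most m of them can land on an atom.
        for k in range(m + 1):
            candidate = base + (h / 4) * Fraction(k, m + 1)
            if candidate not in atoms:
                points.append(candidate)
                break
    points.append(beta)
    return points

# Usage with hypothetical atoms at 0, 1/2 and 1 (they lie on the initial grid).
atoms = {Fraction(0), Fraction(1, 2), Fraction(1)}
pts = atom_free_partition(-3, Fraction(7, 2), Fraction(1, 2), atoms)
gaps = [q - p for p, q in zip(pts, pts[1:])]
print(len(pts), max(gaps) <= Fraction(1, 2), any(p in atoms for p in pts))  # 27 True False
```

Shifting each interior grid point by less than $h/4$ keeps every gap strictly between $0$ and $\frac54h\le\frac58\delta$, so the points stay in increasing order and the mesh stays below $\delta$.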

We cannot treat all three points above here, but the first one, the unicity of the vague limit, is now clear.

Corollary 1 (Unicity of vague limit). With notation as in Definition 3, if there is another s.p.m. $\mu'$ and another dense set $D'$ such that $\mu_n(a,b] \to \mu'(a,b]$ whenever $a,b \in D'$ and $a<b$, then $\mu$ and $\mu'$ are identical.

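Proof. Let $A$ be the union of the sets of atoms of $\mu$ and $\mu'$; it is at most countable, so $A^c$ is dense in $\mathbb{R}$. By Theorem 1, if $a,b \in A^c$ and $a<b$, then $\mu_n(a,b] \to \mu(a,b]$ and $\mu_n(a,b] \to \mu'(a,b]$, so $\mu(a,b]=\mu'(a,b]$. Letting $a \to -\infty$ along $A^c$ shows that the s.d.f.’s of $\mu$ and $\mu'$ agree on the dense set $A^c$; by right continuity they agree everywhere, so the two measures are identical. $\square$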


https://desvl.xyz/2023/02/13/vague-convergence/

Author: Desvl


#Chung
