Joint distribution

Definition: Jointly distributed random variables When we say that some random variables are jointly distributed, we mean that they are all defined on the same probability space.

If we want to combine values of different random variables (e.g., by addition or subtraction, or by a comparison like $\leq$), then they must be jointly distributed. If we have $n$ jointly distributed real-valued random variables, then we may consider them as components of a single ${\mathbb R}^n$-valued random variable. Sometimes we call such a random variable a multivariate random variable, as opposed to a univariate one.

We shall now extend the various familiar concepts about ${\mathbb R}$-valued random variables to ${\mathbb R}^n$-valued random variables.

Definition: Joint CDF Let $X = (X_1,...,X_n)$ be an ${\mathbb R}^n$-valued random variable. Its joint CDF is defined as $F:{\mathbb R}^n\rightarrow{\mathbb R}$ where for all $(x_1,...,x_n)\in{\mathbb R}^n$ $$ F(x_1,...,x_n) = P(X_1\leq x_1~\&~\cdots~\&~X_n\leq x_n). $$

The extension of the concept of discreteness is straightforward.

Definition: Discrete An ${\mathbb R}^n$-valued random variable $X$ is called discrete if there is a countable set $A\subseteq{\mathbb R}^n$ such that $P(X\in A)=1.$

The definition of continuous random variable is slightly more confusing. For ${\mathbb R}$-valued random variables we had two equivalent definitions:

every singleton set has probability zero,
CDF is continuous.

For an ${\mathbb R}^n$-valued random variable, these two conditions are not equivalent (the latter is stronger). We use the stronger condition as the definition of continuity of an ${\mathbb R}^n$-valued random variable.

Caution: Most books take a much stronger definition of continuity for joint distributions. More precisely, that property should be called absolute continuity, which we shall learn about later.

Definition: Continuous An ${\mathbb R}^n$-valued random variable $X$ is called continuous if its joint CDF is continuous.

The following example shows that the first condition is indeed weaker than the second.

EXAMPLE: Consider the function with the following graph:

Clearly it satisfies the 4 conditions of being a CDF. Hence we know that there is a random variable $X$ with this CDF (by the fundamental theorem).

Define an ${\mathbb R}^2$-valued random variable as $Y=(X,1).$ Show that for any $(a,b)\in{\mathbb R}^2$ we have $P(Y=(a,b))=0.$ Also show that the CDF of $Y$ is not continuous.

SOLUTION: $P(Y=(a,b))= P(X=a~\&~1=b)\leq P(X=a)=0,$ since $X$ is a continuous random variable.

Also, the joint CDF is $$ F(a,b) = P(X\leq a~\&~1\leq b) = \left\{\begin{array}{ll}0&\text{if }b < 1\\F(a)&\text{if }b\geq 1.\\\end{array}\right. $$ If we take $(a_n,b_n) =\left( \frac 12, 1-\frac 1n\right),$ then $(a_n,b_n)\rightarrow \left(\frac 12,1\right).$

Now $F(a_n,b_n)\equiv 0,$ and so $F(a_n,b_n)\rightarrow 0.$

But $F\left(\frac 12,1\right) = \frac 12\neq 0.$
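
The jump can also be seen numerically. The following is only a sketch: since the graph of the CDF of $X$ is not reproduced here, the code assumes (purely for illustration) that $X$ is uniform on $(0,1)$, which is continuous and gives the value $\frac 12$ at $\frac 12.$ The names `F_X` and `F_Y` are hypothetical.

```python
def F_X(a):
    """Assumed CDF of X: Uniform(0, 1), chosen only because it is
    continuous and takes the value 1/2 at a = 1/2."""
    return min(max(a, 0.0), 1.0)


def F_Y(a, b):
    """Joint CDF of Y = (X, 1): P(X <= a & 1 <= b)."""
    return F_X(a) if b >= 1 else 0.0


# Approach the point (1/2, 1) along the sequence (1/2, 1 - 1/n).
values_along_sequence = [F_Y(0.5, 1 - 1 / n) for n in range(1, 8)]
limit_point_value = F_Y(0.5, 1.0)

print(values_along_sequence)  # every entry is 0.0
print(limit_point_value)      # 0.5
```

The values along the sequence stay at $0$ while the value at the limit point is $\frac 12,$ which is exactly the discontinuity used in the solution.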

Definition: Joint PMF Let $X$ be an ${\mathbb R}^n$-valued discrete random variable. Then its joint PMF is the function $p:{\mathbb R}^n\rightarrow{\mathbb R}$ defined as $$ p(x_1,...,x_n)= P(X_1=x_1~\&~\cdots~\&~X_n=x_n). $$
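
As a small illustration of a joint PMF and the joint CDF it induces, here is a sketch for a hypothetical pair $(X_1,X_2)$: the outcomes of two fair dice rolled together. The example and the names `joint_pmf`, `p` and `F` are assumptions made for this sketch, not part of the definitions.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: X1, X2 are the outcomes of two fair dice.
joint_pmf = {(a, b): Fraction(1, 36) for a, b in product(range(1, 7), repeat=2)}


def p(x1, x2):
    """Joint PMF: P(X1 = x1 & X2 = x2); zero outside the support."""
    return joint_pmf.get((x1, x2), Fraction(0))


def F(x1, x2):
    """Joint CDF: P(X1 <= x1 & X2 <= x2), summing the PMF over the quadrant."""
    return sum(
        (prob for (a, b), prob in joint_pmf.items() if a <= x1 and b <= x2),
        Fraction(0),
    )


print(p(3, 4))  # 1/36
print(F(2, 3))  # 1/6
print(F(6, 6))  # 1
```

Exact fractions are used so that the probabilities can be compared without rounding error.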

Marginal distributions

If you are given two jointly distributed random variables $X,Y$ and you know their joint distribution, i.e., given any $A\subseteq{\mathbb R}^2$ you know $P((X,Y)\in A),$ then you can work out the probability distributions of $X$ and $Y$ separately from this, i.e., for any given $B\subseteq{\mathbb R}$ you can find $P(X\in B)$ and $P(Y\in B)$ as follows:

$P(X\in B) = P(X\in B~\&~ Y\in{\mathbb R}) = P((X,Y)\in A),$ where $A = B\times{\mathbb R}.$ Similarly, for $Y.$

Definition: Marginal distribution Let $X=(X_1,...,X_n)$ be an ${\mathbb R}^n$-valued random variable. For any $\{i_1,...,i_k\}\subseteq\{1,2,...,n\}$ the joint distribution of $(X_{i_1},...,X_{i_k})$ is called a $k$-dimensional marginal for the joint distribution of $X.$

Theorem Let $(X,Y)$ be an ${\mathbb R}^2$-valued random variable with joint CDF $F(x,y).$ Then the marginal CDF of $X$ is $$ F_X(x) = P(X\leq x) = \lim_{y\rightarrow \infty} F(x,y) $$ and the marginal CDF of $Y$ is $$ F_Y(y) = P(Y\leq y) = \lim_{x\rightarrow \infty} F(x,y). $$
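
For a discrete pair, the same idea amounts to summing the joint PMF over the coordinate we do not care about. The sketch below uses a hypothetical joint PMF table (the numbers are made up for illustration) and also checks the limit formula for the marginal CDF of $X.$

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint PMF of (X, Y), stored as {(x, y): probability}.
joint_pmf = {
    (0, 0): Fraction(1, 10), (0, 1): Fraction(2, 10), (0, 2): Fraction(1, 10),
    (1, 0): Fraction(3, 10), (1, 1): Fraction(1, 10), (1, 2): Fraction(2, 10),
}


def marginal_pmf(axis):
    """Sum the joint PMF over the other coordinate (axis 0 gives X, axis 1 gives Y)."""
    totals = defaultdict(Fraction)
    for point, prob in joint_pmf.items():
        totals[point[axis]] += prob
    return dict(totals)


def joint_cdf(x, y):
    """F(x, y) = P(X <= x & Y <= y)."""
    return sum(
        (prob for (a, b), prob in joint_pmf.items() if a <= x and b <= y),
        Fraction(0),
    )


p_X = marginal_pmf(0)
p_Y = marginal_pmf(1)

# Y never exceeds 2 here, so F(x, y) already equals its limit F_X(x) once y >= 2.
F_X_at_0 = joint_cdf(0, 10**6)

print(p_X[0], p_X[1])          # 2/5 3/5
print(p_Y[0], p_Y[1], p_Y[2])  # 2/5 3/10 3/10
print(F_X_at_0)                # 2/5
```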

Expectation

The definition of expectation is a straightforward extension of the univariate case.

Definition: Expectation Let $X$ be an ${\mathbb R}^n$-valued discrete random variable with PMF $p(x)$. Let $h:{\mathbb R}^n\rightarrow {\mathbb R}$ be any function. Then $E(h(X))$ is defined as follows.

If $\sum_x |h(x)| p(x) < \infty,$ then $$ E(h(X)) = \sum_x h(x) p(x). $$
If $\sum_x |h(x)| p(x) = \infty,$ then
1. if all but finitely many terms in the sum are positive, we define $E(h(X))=\infty.$
2. if all but finitely many terms in the sum are negative, we define $E(h(X))=-\infty.$
3. if there are infinitely many positive and negative terms, then $E(h(X))$ is undefined.

If $X$ is an ${\mathbb R}^n$-valued random variable, and $h:{\mathbb R}^n\rightarrow {\mathbb R}^m$ is any function, then $E(h(X))$ is defined component by component, and is said to exist finitely iff all the component expectations exist finitely.

Theorem If $X, Y$ are jointly distributed real-valued random variables, each with finite expectation, then $X+Y$ also has finite expectation, and $$ E(X+Y) = E(X)+E(Y). $$

Proof: In this course we shall prove this only when $X,Y$ are both discrete random variables.

First, notice that $X+Y$ is again discrete.

Because: If $X$ takes values in the countable set $\{x_1,x_2,...\}$ and $Y$ takes values in the countable set $\{y_1,y_2,...\},$ then each possible value of $X+Y$ must be of the form $x_i+y_j.$ There are only countably many such values.

Let $p_{ij} = P(X=x_i~\&~ Y=y_j).$

Then $P(X=x_i) = \sum_j p_{ij}$ and $P(Y=y_j) = \sum_i p_{ij}.$

So $E(X) = \sum_i x_i P(X=x_i) = \sum_i x_i \sum_j p_{ij},$ and $E(Y) = \sum_j y_j P(Y=y_j) = \sum_j y_j \sum_i p_{ij} .$

By the given condition both these series converge absolutely, and so may be grouped and rearranged in any way without changing the sum.

So $\sum_i\sum_j |x_i p_{ij}|< \infty,$ and $\sum_j\sum_i |y_j p_{ij}|< \infty.$

Now $|x_i+y_j|\leq |x_i|+|y_j|$ by triangle inequality.

Hence $\sum_{i,j} |(x_i+y_j)p_{ij}| <\infty$ and so $E(X+Y)$ exists finitely. Also $$ E(X+Y) = \sum_{i,j} (x_i+y_j)p_{ij} = \sum_i\sum_j x_ip_{ij} + \sum_j\sum_i y_jp_{ij} = E(X)+E(Y), $$ as required. [QED]
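
Note that the theorem needs no independence. As a quick check, here is a sketch with a hypothetical dependent pair: two tickets are drawn without replacement from $\{1,2,3\},$ with $X$ the first ticket and $Y$ the second. The example and the helper name `expect` are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical dependent pair: two draws without replacement from {1, 2, 3}.
# All 6 ordered pairs (x, y) with x != y are equally likely.
joint = {pair: Fraction(1, 6) for pair in permutations([1, 2, 3], 2)}


def expect(h):
    """E(h(X, Y)) = sum of h(x, y) * p(x, y) over the support."""
    return sum(h(x, y) * prob for (x, y), prob in joint.items())


E_X = expect(lambda x, y: x)
E_Y = expect(lambda x, y: y)
E_sum = expect(lambda x, y: x + y)

print(E_X, E_Y, E_sum)  # 2 2 4
```

Here $P(X=1~\&~Y=1)=0$ although $P(X=1)P(Y=1)=\frac 19,$ so $X,Y$ are dependent, and yet the expectations still add.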

This result leads to a simple trick that we discuss next.

Indicator trick

Suppose that you are to find the expected number of something. For example, $n$ letters are randomly put into $n$ addressed envelopes, and you are to find $E(X),$ where $X$ is the number of correctly placed letters. How would you count $X$? In any given arrangement, you can find $X$ by first putting a check mark for each correctly placed letter and then counting the total number of check marks.

Mathematically each check mark is an indicator. For example, the indicator for the $i$-th letter is $$ I_i = \left\{\begin{array}{ll}1&\text{if }i\mbox{-th letter is placed correctly}\\0&\text{otherwise.}\end{array}\right. $$ Counting the number of check marks amounts to summing the $I_i$'s. Thus, $X = \sum I_i.$

Notice that each $I_i$ is a random variable, and $E(X) = \sum E(I_i).$

Since each $I_i$ takes only the values $1$ and $0,$ we have $E(I_i) = P(I_i=1).$

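Now $I_i=1$ means that the $i$-th letter has been placed correctly. This has probability $\frac{(n-1)!}{n!} = \frac 1n,$ since the remaining $n-1$ letters can be arranged in $(n-1)!$ ways out of $n!$ equally likely arrangements.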

So $E(X) = n\times \frac 1n = 1.$

It's a bit surprising that $E(X)$ does not depend on $n.$
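
For small $n$ this can be confirmed by brute force. The sketch below assumes that each envelope receives exactly one letter, so that a placement is a permutation and all $n!$ permutations are equally likely; the function name `expected_matches` is hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def expected_matches(n):
    """Exact E(X) for n letters, averaging the number of correctly
    placed letters over all n! equally likely placements."""
    placements = list(permutations(range(n)))
    total_matches = sum(
        sum(1 for envelope, letter in enumerate(placement) if envelope == letter)
        for placement in placements
    )
    return Fraction(total_matches, len(placements))


for n in range(1, 7):
    print(n, expected_matches(n))  # the second number is 1 for every n
```

The enumeration has to visit $n!$ placements, while the indicator argument gives the answer with no counting at all.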

Independent random variables

An important special case of jointly distributed random variables is that of independent random variables. To state the definition we shall introduce a new terminology: If $X:\Omega\rightarrow S$ is a random variable, then by "an event in terms of $X$" we shall mean $\{w\in\Omega~:~ X(w)\in A\}$ for some $A\subseteq S.$ Similarly, if $X:\Omega\rightarrow S$ and $Y:\Omega\rightarrow T$ are jointly distributed random variables, then "an event in terms of $X,Y$" means $\{w\in\Omega~:~ (X(w),Y(w))\in A\},$ where $A\subseteq S\times T.$

Definition: Independent random variables Let $X_1,...,X_n$ be jointly distributed random variables. We say that they are independent if for all disjoint subsets $A,B\subseteq\{1,...,n\}$ any event in terms of $\{X_i~:~i\in A\}$ is independent of any event in terms of $\{X_i~:~i\in B\}.$

EXAMPLE: If $X,Y,Z$ are independent random variables, then $$ P(X^2+Y^2 \leq 4~\&~ Z\neq 5) = P(X^2+Y^2 \leq 4)P(Z\neq 5). $$
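
To see the factorisation in a concrete case, here is a sketch that assumes (only for illustration) that $X,Y,Z$ are independent and each uniform on $\{0,1,...,5\},$ so that all $216$ triples are equally likely. The helper name `prob` is hypothetical.

```python
from fractions import Fraction
from itertools import product

values = range(6)  # hypothetical support {0, 1, ..., 5} for each of X, Y, Z
outcomes = list(product(values, repeat=3))  # 216 equally likely triples


def prob(event):
    """Exact probability of an event given as a predicate on (x, y, z)."""
    favourable = sum(1 for x, y, z in outcomes if event(x, y, z))
    return Fraction(favourable, len(outcomes))


p_both = prob(lambda x, y, z: x * x + y * y <= 4 and z != 5)
p_xy = prob(lambda x, y, z: x * x + y * y <= 4)
p_z = prob(lambda x, y, z: z != 5)

print(p_both, p_xy * p_z)  # 5/36 5/36
```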

Theorem If $X_1,...,X_n$ are independent random variables, then any function of some of the $X$'s is independent of any function of the remaining $X$'s.

Proof: Split $\{1,...,n\}$ into two disjoint subsets $\{i_1,...,i_k\}$ and $\{j_1,...,j_{n-k}\}.$

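Let $Y = f(X_{i_1},...,X_{i_k})$ and $Z = g(X_{j_1},...,X_{j_{n-k}}),$ where $f,g$ are any two functions.

Take any two sets $A,B.$ Then $\{Y\in A\}$ is an event in terms of $X_{i_1},...,X_{i_k},$ and $\{Z\in B\}$ is an event in terms of $X_{j_1},...,X_{j_{n-k}},$ so these two events are independent by the definition. Hence $$P(Y\in A~\&~Z\in B) = P(f(X_{i_1},...,X_{i_k})\in A~\&~g(X_{j_1},...,X_{j_{n-k}})\in B) = P(f(X_{i_1},...,X_{i_k})\in A)P(g(X_{j_1},...,X_{j_{n-k}})\in B) = P(Y\in A)P(Z\in B). $$ [QED]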

Theorem Let $X,Y$ be jointly distributed discrete random variables, with PMFs $p(x)$ and $q(y).$ If they are independent, then their joint PMF is $h(x,y) = p(x)q(y).$

Proof: Immediate from the definition of independence. [QED]

Theorem Let $X,Y$ be jointly distributed random variables, with CDFs $F(x)$ and $G(y).$ If they are independent, then their joint CDF is $H(x,y) = F(x)G(y).$

Proof: Immediate from the definition of independence. [QED]

Theorem If $X,Y$ are independent random variables with finite expectations, then $E(XY) = E(X)E(Y).$

Proof: We shall prove this for the case where $X,Y$ are both discrete (hence so is $(X,Y)$).

Let $p(x,y), p_X(x)$ and $p_Y(y)$ be the joint and marginal PMFs, respectively.

Then $$ E(XY) = \sum_{x,y} xy p(x,y) = \sum_{x,y} xy p_X(x)p_Y(y) = \sum_x x p_X(x)\times \sum _y yp_Y(y) = E(X)E(Y). $$ The grouping and rearranging were justified since the series were absolutely convergent. [QED]
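
The product rule can be checked on a small example. The sketch below starts from two hypothetical marginal PMFs (the numbers are made up), builds the joint PMF as their product, as independence requires, and compares $E(XY)$ with $E(X)E(Y).$

```python
from fractions import Fraction

# Hypothetical marginal PMFs of X and Y.
p_X = {1: Fraction(1, 2), 2: Fraction(1, 2)}
p_Y = {0: Fraction(1, 3), 3: Fraction(2, 3)}

# Under independence the joint PMF is the product of the marginals.
joint = {(x, y): px * py for x, px in p_X.items() for y, py in p_Y.items()}

E_X = sum(x * px for x, px in p_X.items())
E_Y = sum(y * py for y, py in p_Y.items())
E_XY = sum(x * y * pxy for (x, y), pxy in joint.items())

print(E_X, E_Y, E_XY)  # 3/2 2 3
```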

Covariance

Definition: Covariance If $X,Y$ are jointly distributed random variables, then their covariance is defined as $$ cov(X,Y) = E[(X-E(X))(Y-E(Y))]. $$

Theorem $cov(X,Y) = E(XY)-E(X)E(Y).$

Proof: By direct algebraic expansion. [QED]

Theorem If $X,Y$ are independent and $E(X^2), E(Y^2) < \infty,$ then $cov(X,Y)=0.$ The converse is not true.

Proof: The first part follows immediately from the fact that $E(XY)=E(X)E(Y).$

A counterexample for the second part is as follows.

$X$ takes values $-1,0,1$ with equal probabilities. $Y = |X|.$ Direct computation shows $E(X)=E(XY)=0$ and so $cov(X,Y)=0.$

But $P(X=0~\&~Y=1) = 0 \neq \frac 29 = P(X=0)P(Y=1),$ so $X,Y$ are not independent. [QED]
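
The "direct computation" in this counterexample can be carried out mechanically. The following sketch encodes the joint PMF of $(X,|X|)$ and evaluates both the covariance and the two probabilities being compared; the helper names `expect` and `prob` are hypothetical.

```python
from fractions import Fraction

# X takes the values -1, 0, 1 with equal probabilities, and Y = |X|.
joint = {(x, abs(x)): Fraction(1, 3) for x in (-1, 0, 1)}


def expect(h):
    """E(h(X, Y)) over the joint PMF."""
    return sum(h(x, y) * p for (x, y), p in joint.items())


def prob(event):
    """P((X, Y) satisfies the predicate)."""
    return sum((p for (x, y), p in joint.items() if event(x, y)), Fraction(0))


cov = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)
p_joint = prob(lambda x, y: x == 0 and y == 1)
p_product = prob(lambda x, y: x == 0) * prob(lambda x, y: y == 1)

print(cov)                 # 0
print(p_joint, p_product)  # 0 2/9
```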

The $cov(\cdot,\cdot)$ function behaves much like ordinary multiplication. The following theorems show this.

Theorem $cov(X,Y)=cov(Y,X).$

Theorem $cov(\sum_i a_i X_i, \sum_j b_j Y_j) = \sum_{i,j} a_ib_j\,cov(X_i,Y_j).$

Also we have

Theorem $cov(X,X) = V(X).$

Theorem $cov(aX+b,cY+d) = ac\,cov(X,Y).$

EXAMPLE: The analog of $(a+b)^2 = a^2+2ab+b^2$ here is $V(X+Y) = V(X)+2 cov(X,Y) +V(Y).$ This also shows that if $X,Y$ are independent, then $V(X+Y) = V(X)+V(Y).$
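
Here is a sketch that checks the variance identity on a hypothetical joint PMF in which $X,Y$ are positively correlated, so that the covariance term really matters. The table and the helper names are assumptions made for illustration.

```python
from fractions import Fraction

# Hypothetical joint PMF of (X, Y): four equally likely points.
joint = {
    (0, 0): Fraction(1, 4), (1, 0): Fraction(1, 4),
    (1, 1): Fraction(1, 4), (2, 1): Fraction(1, 4),
}


def expect(h):
    """E(h(X, Y)) over the joint PMF."""
    return sum(h(x, y) * p for (x, y), p in joint.items())


def var(h):
    """V(h(X, Y)) = E(h^2) - E(h)^2."""
    return expect(lambda x, y: h(x, y) ** 2) - expect(h) ** 2


def cov(g, h):
    """cov(g(X, Y), h(X, Y)) = E(gh) - E(g)E(h)."""
    return expect(lambda x, y: g(x, y) * h(x, y)) - expect(g) * expect(h)


def X(x, y):
    return x


def Y(x, y):
    return y


lhs = var(lambda x, y: x + y)
rhs = var(X) + 2 * cov(X, Y) + var(Y)

print(var(X), cov(X, Y), var(Y))  # 1/2 1/4 1/4
print(lhs, rhs)                   # 5/4 5/4
```

In this example $V(X)+V(Y)=\frac 34\neq\frac 54,$ so dropping the covariance term would give the wrong answer when $X,Y$ are not independent.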

Theorem If $X$ or $Y$ is a degenerate random variable, then $cov(X,Y)=0.$

Cauchy-Schwarz inequality $cov(X,Y)^2 \leq V(X)V(Y).$ Equality holds iff $\exists a,b,c\in{\mathbb R}$ with $a,b$ not both zero such that $P(aX+bY=c)=1.$

Proof: The result is obvious if $X$ is degenerate. So let's consider the case where $X$ is not degenerate. Then $V(X)>0.$

Define $Z = Y-\underbrace{\frac{cov(X,Y)}{V(X)}}_\beta X.$

We know that $V(Z)\geq 0.$

Now, $$ V(Z) = V(Y) + V(\beta X) - 2cov(Y,\beta X) = V(Y) + \beta^2 V(X) - 2 \beta cov(X,Y). $$ Since $\beta = \frac{cov(X,Y)}{V(X)},$ this reduces to $$ V(Y) - \frac{cov(X,Y)^2}{V(X)}. $$ Since this is $\geq0,$ the inequality follows immediately.

Also equality holds iff $V(Z)=0$, i.e., $Z$ is degenerate.

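So we have $Z = k$ with probability 1 for some $k\in{\mathbb R},$ i.e., $V(X) Y - cov(X,Y) X = kV(X)$ with probability 1, which is a linear relation of the required form.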

This completes the proof. [QED]

Definition: Correlation If $X,Y$ are jointly distributed random variables with $V(X), V(Y)>0,$ then their correlation is defined as $$ \rho(X,Y)= \frac{ cov(X,Y) }{ \sqrt{V(X)V(Y)} }. $$

By the Cauchy-Schwarz inequality, $\rho(X,Y) \in [-1,1].$ Also, $\rho(X,Y)=-1$ or $\rho(X,Y)=1$ if and only if $X,Y$ are linearly related with probability 1, i.e., $\exists a,b,c\in{\mathbb R}$ with $a,b$ not both zero such that $P(aX+bY=c)=1.$
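
Both cases can be seen in a small computation. The sketch below compares a hypothetical joint PMF with no exact linear relation against a pair satisfying $Y = 3-2X$ exactly; the tables and the function names `moments` and `correlation` are assumptions made for illustration. The inequality is checked with exact fractions, and only the final correlation is converted to floating point.

```python
from fractions import Fraction
from math import sqrt


def moments(joint):
    """Return (cov(X, Y), V(X), V(Y)) for a joint PMF {(x, y): probability}."""
    def expect(h):
        return sum(h(x, y) * p for (x, y), p in joint.items())

    ex, ey = expect(lambda x, y: x), expect(lambda x, y: y)
    cov = expect(lambda x, y: x * y) - ex * ey
    vx = expect(lambda x, y: x * x) - ex * ex
    vy = expect(lambda x, y: y * y) - ey * ey
    return cov, vx, vy


def correlation(joint):
    """rho(X, Y) as a float; requires V(X), V(Y) > 0."""
    cov, vx, vy = moments(joint)
    return float(cov) / sqrt(float(vx * vy))


# Hypothetical joint PMF with no exact linear relation between X and Y.
scattered = {
    (0, 0): Fraction(1, 4), (1, 0): Fraction(1, 4),
    (1, 1): Fraction(1, 4), (2, 1): Fraction(1, 4),
}
# Exact linear relation Y = 3 - 2X, with X uniform on {0, 1, 2}.
linear = {(x, 3 - 2 * x): Fraction(1, 3) for x in (0, 1, 2)}

for name, joint in (("scattered", scattered), ("linear", linear)):
    cov, vx, vy = moments(joint)
    print(name, cov ** 2, vx * vy, correlation(joint))
```

For the scattered table $cov(X,Y)^2=\frac 1{16}$ is strictly less than $V(X)V(Y)=\frac 18,$ while for the linear pair both sides equal $\frac{16}9$ and the correlation is $-1$ up to rounding.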