So far we have seen a number of two-sample tests. These, as well as many
others, use test statistics of a general form called linear rank
statistics, which we discuss below.
Our setup is as before:
X1,...,Xm iid F
Y1,...,Yn iid G
X's and Y's are mutually independent
F, G both continuous and unknown.
Pool the samples together as
X1,...,Xm,Y1,...,Yn
Let the rank vector of the pooled data be
R1,...,Rm,Rm+1,...,Rm+n
Thus,
Ri = rank of Xi in the pooled data for 1 ≤ i ≤ m
and
Rm+j = rank of Yj in the pooled data for 1 ≤ j ≤ n.
Example:
In a controlled experiment to measure the effectiveness of a treatment
suppose that the control group yields the following measurements:
X1 = 2, X2 = 4, X3 = 9,
and the treatment group produces
Y1 = 5, Y2 = 8.
Then here are the pooled data and the rank vector:
pooled data: 2 4 9 5 8
rank vector: 1 2 5 3 4
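The rank vector in this example can be reproduced with a few lines of Python. This is only a minimal sketch: the helper name pooled_ranks is made up for illustration, and the code assumes there are no ties (which holds with probability 1 when F and G are continuous).

```python
def pooled_ranks(xs, ys):
    """Return the rank vector (R1,...,Rm, Rm+1,...,Rm+n) of the pooled sample.

    Hypothetical helper; assumes all pooled values are distinct.
    """
    pooled = list(xs) + list(ys)
    # Indices of the pooled observations, ordered from smallest to largest value.
    order = sorted(range(len(pooled)), key=lambda i: pooled[i])
    ranks = [0] * len(pooled)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


ranks = pooled_ranks([2, 4, 9], [5, 8])
print(ranks)  # [1, 2, 5, 3, 4]
```

The first m = 3 entries are the ranks of the X's and the last n = 2 entries are the ranks of the Y's.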
How do we check that a given rank statistic is indeed a linear rank
statistic? The following theorem helps you to do this.
Proof:
Direct argument.
Exercise 10.1:
Is the Mann-Whitney U-statistic a rank statistic? Is it linear? If so, find A.
[Hint: Explicitly write down the U-statistic in terms of the ranks.]
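As a numerical sanity check on the hint (not a solution), the sketch below compares two ways of computing U on the example data. It assumes the convention that U counts the pairs (i, j) with Xi < Yj; under that convention U equals the sum of the Y ranks minus n(n+1)/2. The function names are made up for illustration, and no ties are assumed.

```python
def ranks_of(values):
    # Ranks 1..N of the values, assuming they are all distinct.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def u_by_counting(xs, ys):
    # Assumed convention: U = number of pairs (i, j) with Xi < Yj.
    return sum(1 for x in xs for y in ys if x < y)


def u_from_ranks(xs, ys):
    m, n = len(xs), len(ys)
    ranks = ranks_of(list(xs) + list(ys))
    y_rank_sum = sum(ranks[m:])  # R(m+1) + ... + R(m+n)
    return y_rank_sum - n * (n + 1) // 2


xs, ys = [2, 4, 9], [5, 8]
print(u_by_counting(xs, ys), u_from_ranks(xs, ys))  # 4 4
```

If the course defines U with the opposite inequality, the same check works with the roles of the X's and Y's swapped.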
Exercise 10.2:
Do the same for the two-sample KS test statistic, D+.
Use the theorem above.
Exercise 10.3:
Do the same for the Wald-Wolfowitz run test.
Use the theorem above.
Exercise 10.4:
To justify the terminology we need to show that a simple linear rank
statistic is indeed a linear rank statistic as defined earlier. In
particular, show that a simple linear rank statistic corresponds to an A
matrix of the form
A = u v'
where u and v are column vectors.
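The rank-one structure can be illustrated numerically. The sketch below is not a proof, and it rests on an assumption about the earlier definition: that a linear rank statistic has the form S = sum over i of A[i, Ri], and that a simple linear rank statistic is the sum of ci a(Ri). The constants c, the scores a, and all function names are hypothetical choices made for illustration.

```python
def outer(u, v):
    # Outer product u v' as a list of rows.
    return [[ui * vj for vj in v] for ui in u]


def linear_rank_stat(A, ranks):
    # Assumed definition: S = sum over i of A[i][Ri], with 1-based ranks.
    return sum(A[i][r - 1] for i, r in enumerate(ranks))


def simple_linear_rank_stat(c, a, ranks):
    # S = sum over i of ci * a(Ri), with a stored as the list [a(1),...,a(N)].
    return sum(ci * a[r - 1] for ci, r in zip(c, ranks))


ranks = [1, 2, 5, 3, 4]   # rank vector from the example above
c = [0, 0, 0, 1, 1]       # hypothetical constants: indicator of the Y sample
a = [1, 2, 3, 4, 5]       # hypothetical scores a(k) = k
A = outer(c, a)           # candidate A = u v' with u = c and v = a

print(linear_rank_stat(A, ranks), simple_linear_rank_stat(c, a, ranks))  # 7 7
```

With these particular choices the statistic is the sum of the Y ranks, 3 + 4 = 7.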
Computing moments of simple linear rank statistics
Let
S = ∑ ci a(Ri),
where the sum runs over i = 1,...,m+n.
We shall compute E(S) and Var(S) under H0: F = G.
Exercise 10.5:
Show that under H0,
each Ri ~ discrete uniform {1,...,m+n}
and
(R1,...,Rm+n) ~ discrete uniform over all
(m+n)! possible permutations of {1,...,m+n}.
Exercise 10.6:
Use the last exercise to show that
E(S) = (m+n) cbar abar,
where cbar and abar are the averages of the ci's and the a(i)'s respectively.
Exercise 10.7:
Show that
cov(a(Ri),a(Rj)) = -σa²/(m+n) for i ≠ j, under H0.
Exercise 10.8:
Show that
Var(S) = (m+n-1) σa² σc²,
where σc² is the variance of the ci's, computed with denominator (m+n-1),
and σa² is defined similarly from the a(i)'s.
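The formulas for E(S) and Var(S) in Exercises 10.6 and 10.8 can be checked by brute force for a small case. The sketch below enumerates all (m+n)! rank vectors, which are equally likely under H0, and compares the exact moments of S with the formulas. The constants c and scores a are hypothetical choices (they make S the sum of the Y ranks when m = 3 and n = 2), and the function names are made up for illustration. This is a check, not a substitute for the proofs.

```python
from fractions import Fraction
from itertools import permutations


def mean(v):
    return Fraction(sum(v), len(v))


def var_nm1(v):
    # Variance with denominator (N - 1), as in the text.
    mu = mean(v)
    return sum((Fraction(x) - mu) ** 2 for x in v) / (len(v) - 1)


def exact_moments(c, a):
    # Exact mean and variance of S = sum ci a(Ri) when the rank vector
    # is uniform over all N! permutations of {1,...,N}.
    N = len(c)
    values = [
        sum(Fraction(c[i]) * a[r - 1] for i, r in enumerate(perm))
        for perm in permutations(range(1, N + 1))
    ]
    es = sum(values) / len(values)
    vs = sum((s - es) ** 2 for s in values) / len(values)
    return es, vs


c = [0, 0, 0, 1, 1]   # hypothetical constants: indicator of the Y sample
a = [1, 2, 3, 4, 5]   # hypothetical scores a(k) = k
N = len(c)

es, vs = exact_moments(c, a)
formula_mean = N * mean(c) * mean(a)
formula_var = (N - 1) * var_nm1(a) * var_nm1(c)

print(es, formula_mean)  # 6 6
print(vs, formula_var)   # 3 3
```

Exact fractions are used instead of floating point so that the comparison is an equality rather than an approximate match. The enumeration is only feasible for small m+n.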