cwave.eu5.org
Also see: http://www.angelfire.com/dragon/letstry
cwave04 at yahoo dot com

Last updated on Fri May 21 11:52:18 IST 2010.

Two sample inference

Rank statistics

So far we have seen a number of two-sample tests. These, as well as many others, all use test statistics of a general form called linear rank statistics, which we discuss below.

Our setup is as before:
X1,...,Xm iid F
Y1,...,Yn iid G
X's and Y's are mutually independent
F, G both continuous and unknown.

Pool the samples together as
X1,...,Xm,Y1,...,Yn
Let the rank vector of the pooled data be
R_1,...,R_m,R_{m+1},...,R_{m+n}
Thus,
R_i = rank of X_i in the pooled data for 1 ≤ i ≤ m
and
R_{m+j} = rank of Y_j in the pooled data for 1 ≤ j ≤ n.

Example: In a controlled experiment to measure the effectiveness of a treatment suppose that the control group yields the following measurements:
X1 = 2, X2 = 4, X3 = 9,
and the treatment group produces
Y1 = 5, Y2 = 8.
Then here are the pooled data and the rank vector:

   pooled data: 2  4  9  5  8
   rank vector: 1  2  5  3  4
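
The rank vector in this example can be reproduced with a few lines of Python. This is only a minimal sketch: the helper name pooled_ranks is made up for illustration, and it assumes there are no ties (which holds with probability 1 when F and G are continuous).

```python
def pooled_ranks(x, y):
    """Return the ranks (1 = smallest) of the pooled sample: x followed by y.

    Assumes no ties among the pooled observations.
    """
    pooled = list(x) + list(y)
    # Indices of the pooled data, ordered from smallest to largest value.
    order = sorted(range(len(pooled)), key=lambda k: pooled[k])
    ranks = [0] * len(pooled)
    for rank, k in enumerate(order, start=1):
        ranks[k] = rank
    return ranks


ranks = pooled_ranks([2, 4, 9], [5, 8])
print(ranks)  # [1, 2, 5, 3, 4]
```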
   

Definition A statistic that is a function of the rank vector is called a rank statistic.

Definition Let A be any known (m+n) by (m+n) matrix with (i,j)-th element denoted by a(i,j). Then the statistic
∑ a(i,Ri), where the sum is over i = 1,...,m+n,
is called a linear rank statistic.
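
To make the definition concrete, here is a small Python sketch that evaluates ∑ a(i,Ri) for the rank vector of the example above. The matrix a(i,j) = (i-j)^2 is a hypothetical choice made purely for illustration, and the function name is likewise made up.

```python
def linear_rank_statistic(A, ranks):
    """Sum of a(i, R_i) over i, with A stored 0-based: A[i-1][j-1] = a(i, j)."""
    return sum(A[i][r - 1] for i, r in enumerate(ranks))


N = 5
# Hypothetical score matrix, chosen only for illustration: a(i, j) = (i - j)^2.
A = [[(i - j) ** 2 for j in range(1, N + 1)] for i in range(1, N + 1)]
R = [1, 2, 5, 3, 4]  # rank vector from the example above

value = linear_rank_statistic(A, R)
print(value)  # 6
```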
How can we check that a given rank statistic is indeed a linear rank statistic? The following theorem helps.

Theorem Suppose that T(R) is a rank statistic, where R denotes the rank vector, i.e., a permutation of 1,...,N (here N = m+n). Let S be any rank vector, and let i,j be in {1,...,N}. Define S(i,j) as the permutation obtained from S by swapping its i-th and j-th entries. Then T is a linear rank statistic iff, for every such S, i and j,
T(S)-T(S(i,j)) is free of Sk for all k ≠ i,j.

Proof: Direct argument.
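
For small N the condition in the theorem can be checked by brute force over all N! rank vectors. The sketch below does this in Python. The function names and the two test statistics are illustrative assumptions: one is a linear rank statistic with a(i,j) = (i-j)^2, the other is the square of ∑ i Si.

```python
from itertools import permutations


def satisfies_swap_condition(T, N):
    """Check that T(S) - T(S with entries i and j swapped) depends on S
    only through (S_i, S_j), for every pair of positions i < j."""
    for i in range(N):
        for j in range(i + 1, N):
            seen = {}  # (S_i, S_j) -> difference observed so far
            for S in permutations(range(1, N + 1)):
                swapped = list(S)
                swapped[i], swapped[j] = swapped[j], swapped[i]
                diff = T(S) - T(tuple(swapped))
                if seen.setdefault((S[i], S[j]), diff) != diff:
                    return False
    return True


def linear(S):
    # Linear rank statistic with the illustrative choice a(i, j) = (i - j)^2.
    return sum((i - r) ** 2 for i, r in enumerate(S, start=1))


def squared(S):
    # Square of sum_i i * S_i; this one fails the condition.
    return sum(i * r for i, r in enumerate(S, start=1)) ** 2


print(satisfies_swap_condition(linear, 4))   # True
print(satisfies_swap_condition(squared, 4))  # False
```

Such a check is no substitute for a proof, but it is a quick way to form a conjecture before attempting the exercises below.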

Exercise 10.1: Is the Mann-Whitney U-statistic a rank statistic? Is it linear? If so, find A.
[Hint: Explicitly write down the U-statistic in terms of the ranks.]

Exercise 10.2: Do the same for the two-sample KS test statistic, D+.
Use the theorem above.

Exercise 10.3: Do the same for the Wald-Wolfowitz run test.
Use the theorem above.

Definition A statistic of the form
∑ ci a(Ri)
is called a simple linear rank statistic, where the ci's are any constants and a(.) is any function.
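
As an illustration, take ci = 1 for the X's and ci = 0 for the Y's, with a(r) = r. Then ∑ ci a(Ri) is just the sum of the ranks of the X's. The Python sketch below evaluates it for the earlier example; the function name and the particular ci's and a(.) are assumptions made for illustration.

```python
def simple_linear_rank_statistic(c, a, ranks):
    """Compute sum_i c_i * a(R_i) for constants c, score function a, ranks R."""
    return sum(ci * a(r) for ci, r in zip(c, ranks))


m, n = 3, 2
R = [1, 2, 5, 3, 4]      # rank vector from the earlier example
c = [1] * m + [0] * n    # illustrative choice: 1 for X's, 0 for Y's

S = simple_linear_rank_statistic(c, lambda r: r, R)
print(S)  # 8, the sum of the ranks of the X's (1 + 2 + 5)
```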

Exercise 10.4: To justify the terminology, we need to show that a simple linear rank statistic is indeed a linear rank statistic as defined earlier. In particular, show that a simple linear rank statistic corresponds to a matrix A of the form
A = u v'
where u and v are column vectors.

Computing moments of simple linear rank statistics

Let
S = ∑ ci a(Ri)
We shall compute E(S) and Var(S) under H0: F = G.

Exercise 10.5: Show that under H0,
each Ri ~ discrete uniform {1,...,m+n}
and
(R_1,...,R_{m+n}) ~ discrete uniform over all (m+n)! possible permutations of {1,...,m+n}.
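
A quick simulation can make the first claim plausible before you prove it. The Python sketch below is only an illustration under stated assumptions: Uniform(0,1) stands in for the common distribution F = G, the sample sizes are m = 3 and n = 2, and the seed is fixed so that the run is reproducible.

```python
import random
from collections import Counter


def rank_of_first(sample):
    """Rank (1 = smallest) of sample[0] within sample, assuming no ties."""
    return 1 + sum(v < sample[0] for v in sample[1:])


rng = random.Random(0)  # fixed seed, for reproducibility only
m, n, trials = 3, 2, 20000
counts = Counter()
for _ in range(trials):
    # Under H0 the pooled observations are iid; Uniform(0,1) stands in for F = G.
    pooled = [rng.random() for _ in range(m + n)]
    counts[rank_of_first(pooled)] += 1

freqs = {r: counts[r] / trials for r in range(1, m + n + 1)}
print(freqs)  # each relative frequency should be close to 1/(m+n) = 0.2
```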

Exercise 10.6: Use the last exercise to show that
E(S) = (m+n)cbar * abar
where cbar and abar are the averages of the ci's and a(i)'s respectively.

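Exercise 10.7: Show that, under H0, for i ≠ j,
cov(a(Ri),a(Rj)) = -σ_a^2/(m+n),
where σ_a^2 is the variance of a(1),...,a(m+n), computed with denominator (m+n-1).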
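
The covariance formula can be verified exactly for a small case by enumerating all permutations. The Python sketch below uses exact fractions; the scores a(1),...,a(4) = 1, 4, 9, 16 are hypothetical values chosen for illustration, and σ_a^2 is computed with denominator (m+n-1).

```python
from fractions import Fraction
from itertools import permutations


def exact_cov(a_vals, i, j):
    """Exact cov(a(R_i), a(R_j)) when R is uniform over all permutations."""
    perms = list(permutations(range(len(a_vals))))  # 0-based ranks index a_vals
    total = len(perms)
    e_i = Fraction(sum(a_vals[p[i]] for p in perms), total)
    e_j = Fraction(sum(a_vals[p[j]] for p in perms), total)
    e_ij = Fraction(sum(a_vals[p[i]] * a_vals[p[j]] for p in perms), total)
    return e_ij - e_i * e_j


a_vals = [1, 4, 9, 16]  # hypothetical scores a(1), ..., a(4)
N = len(a_vals)
a_bar = Fraction(sum(a_vals), N)
var_a = sum((v - a_bar) ** 2 for v in a_vals) / (N - 1)  # denominator N - 1

cov = exact_cov(a_vals, 0, 1)
print(cov == -var_a / N)  # True
```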

Exercise 10.8: Show that
Var(S) = (m+n-1) σ_a^2 σ_c^2,
where σ_c^2 is the variance of the ci's, computed with denominator (m+n-1), and σ_a^2 is the variance of the a(i)'s, defined similarly.
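
Both moment formulas can be checked exactly on a small case by listing S over all (m+n)! rank vectors, each equally likely under H0. The Python sketch below is an illustration under assumed values: m = 3, n = 2, ci = 1 for the X's and 0 for the Y's, and a(r) = r.

```python
from fractions import Fraction
from itertools import permutations


def sample_var(values):
    """Variance with denominator len(values) - 1, as in the text."""
    k = len(values)
    mean = Fraction(sum(values), k)
    return sum((v - mean) ** 2 for v in values) / (k - 1)


c = [1, 1, 1, 0, 0]       # hypothetical constants: 1 for X's (m = 3), 0 for Y's (n = 2)
a_vals = [1, 2, 3, 4, 5]  # hypothetical scores a(r) = r
N = len(c)

# S evaluated at every rank vector; under H0 each is equally likely.
values = [sum(ci * a_vals[r] for ci, r in zip(c, p))
          for p in permutations(range(N))]
mean_S = Fraction(sum(values), len(values))
var_S = sum((v - mean_S) ** 2 for v in values) / len(values)

c_bar, a_bar = Fraction(sum(c), N), Fraction(sum(a_vals), N)
print(mean_S == N * c_bar * a_bar)                            # True
print(var_S == (N - 1) * sample_var(a_vals) * sample_var(c))  # True
```

Here E(S) = 9 and Var(S) = 3.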


© Arnab Chakraborty (2010)