cwave.eu5.org
Also see: http://www.angelfire.com/dragon/letstry
cwave04 at yahoo dot com

Last updated on Fri May 21 11:52:14 IST 2010.

Tests of association

Kendall's tau

Example: Are class 10 and class 12 results associated? In other words, is a student doing well in the first exam also likely to do well in the second? To check this we collect data from n students:
(X1,Y1),...,(Xn,Yn)
where
X = class 10 grade
Y = class 12 grade.

Assume that the (Xi,Yi)'s are iid with some continuous bivariate distribution F(x,y), which is unknown. We could use the sample correlation coefficient to test for association, but it is not distribution-free (i.e., its null distribution involves the unknown F), so we cannot compute critical values for a test based on it. Kendall's τ test is a nonparametric way out. We proceed as follows.

Pick two students i, j (i ≠ j). Call them concordant if
either (Xi < Xj and Yi < Yj)
or (Xi > Xj and Yi > Yj)
Otherwise, call them discordant. Note that due to the continuity assumption on F, we do not need to worry about ties. Let
θ = P(two randomly chosen students are concordant)
= 1-P(they are discordant)

Definition Kendall's τ parameter is defined as
τ = θ - (1-θ) = 2 θ - 1.

Exercise 2.1: If X and Y are independent then show that τ = 0.

Exercise 2.2: Is the converse true?
[Hint: Use the next exercise. Take the distribution of X suitably. Then attach a random sign to X to get Y.]

Exercise 2.3: Show that
τ = E(sign(X1-X2)*sign(Y1-Y2)).
Here
sign(u) = 1 if u > 0
= 0 if u = 0
= -1 if u < 0

If τ > 0 then concordant pairs occur more often than discordant pairs, i.e., if student i did better than student j in class 10, he/she is likely to do better again in class 12. Thus, "τ > 0" means positive association. We want to test
H0: X, Y independent Vs. H1: τ ≠ 0
or
H0: X, Y independent Vs. H1: τ > 0
or
H0: X, Y independent Vs. H1: τ < 0
We can estimate τ by
T = [ ∑_{i<j} sign(Xi-Xj) sign(Yi-Yj) ] / nC2,
where nC2 = n(n-1)/2 is the number of pairs. We shall use this as our test statistic. It is easier to compute this using concomitants than by using the definition directly.
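As a minimal sketch (not the author's code), T can be computed straight from its definition by looping over all pairs i < j. The names `sign` and `kendall_T` are hypothetical helpers introduced here, and the data are the five points used in the concomitant example later in these notes.

```python
from itertools import combinations

def sign(u):
    # sign(u) = 1, 0 or -1 according as u > 0, u = 0 or u < 0
    return (u > 0) - (u < 0)

def kendall_T(x, y):
    # Sum of sign(Xi-Xj)*sign(Yi-Yj) over pairs i < j, divided by nC2
    n = len(x)
    total = sum(sign(x[i] - x[j]) * sign(y[i] - y[j])
                for i, j in combinations(range(n), 2))
    return total / (n * (n - 1) / 2)

x = [3, 2, 1.5, 4, 9]
y = [5, -9, 2, 3.4, 10]
print(kendall_T(x, y))
```

This direct version takes on the order of n^2 operations, which is fine for small samples.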

Definition (Concomitant) Suppose that we have bivariate data
(X1,Y1), ..., (Xn,Yn).
Let the Ri's denote the ranks of the Yi's. Sort the sample by the X's. This permutes the R's in some way. The permuted R's are called the concomitants of Y wrt X.

Example: Suppose that we have the data

   original Xi: 3    2   1.5  4     9
   Original Yi: 5   -9   2    3.4  10
   Y-ranks: Ri: 4    1   2    3     5
   Sorted X(i): 1.5  2   3    4     9
   Permuted Ri: 2    1   4    3     5
   
The last row gives the concomitants of Y wrt X.
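The table can be reproduced with a short sketch. The helper names `ranks` and `concomitants` are hypothetical, and the code assumes there are no ties, in line with the continuity assumption on F.

```python
def ranks(values):
    # Rank 1 goes to the smallest value; assumes all values are distinct.
    order = sorted(range(len(values)), key=lambda k: values[k])
    r = [0] * len(values)
    for rank, k in enumerate(order, start=1):
        r[k] = rank
    return r

def concomitants(x, y):
    # Ranks of the Y's, listed in the order that sorts the X's.
    r = ranks(y)
    order_by_x = sorted(range(len(x)), key=lambda k: x[k])
    return [r[k] for k in order_by_x]

x = [3, 2, 1.5, 4, 9]
y = [5, -9, 2, 3.4, 10]
print(ranks(y))            # [4, 1, 2, 3, 5]
print(concomitants(x, y))  # [2, 1, 4, 3, 5]
```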

Exercise 2.4: Show that
T = - [ ∑_{i<j} sign(Si-Sj) ] / nC2,
where the Si's are the concomitants of Y wrt X (e.g., in the last example S1=2, S2=1 etc.)
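A small sketch of this formula, applied to the concomitants 2, 1, 4, 3, 5 from the example above. The function name `T_from_concomitants` is a hypothetical helper, not something defined in these notes.

```python
from itertools import combinations

def sign(u):
    return (u > 0) - (u < 0)

def T_from_concomitants(s):
    # T = -(sum over i < j of sign(Si - Sj)) / nC2
    n = len(s)
    total = sum(sign(s[i] - s[j]) for i, j in combinations(range(n), 2))
    return -total / (n * (n - 1) / 2)

s = [2, 1, 4, 3, 5]
print(T_from_concomitants(s))
```

Only the two out-of-order pairs (2,1) and (4,3) contribute +1 to the sum; the other eight pairs contribute -1, so the sum is -6 and T = 6/10.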

We reject H0 if |T| is large (for the two-sided alternative), and similarly for the one-sided cases. To quantify how large is large enough, we need to know the null distribution of T.

Exercise 2.5: Show that T is distribution-free under H0.

Exercise 2.6: Explicitly compute the null distribution of T when n = 3.
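For small n the null distribution can also be tabulated by brute force. The sketch below assumes that under H0 (independence, continuous F) every permutation of 1,...,n is equally likely to be the vector of concomitants, which is the idea behind Exercise 2.5. The function names are hypothetical, and n = 4 is used so as not to give away Exercise 2.6.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, permutations

def T_from_concomitants(s):
    # T = (sum over i < j of sign(Sj - Si)) / nC2, kept as an exact fraction
    n = len(s)
    total = sum((s[j] > s[i]) - (s[j] < s[i])
                for i, j in combinations(range(n), 2))
    return Fraction(total, n * (n - 1) // 2)

def null_distribution(n):
    # Enumerate all n! equally likely permutations and tabulate T.
    counts = Counter(T_from_concomitants(p)
                     for p in permutations(range(1, n + 1)))
    total = sum(counts.values())
    return {t: Fraction(c, total) for t, c in sorted(counts.items())}

for t, p in null_distribution(4).items():
    print(t, p)
```

Since the enumeration visits n! permutations, it is only practical for small n.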

Next, let us compute the moments of T.

Exercise 2.7: Show that E(T) = τ.

The formula for Var(T) for general τ is complicated (it is given in the textbook). Here we compute Var(T) under H0 only. Since E(T) = 0 under H0, we have Var(T) = E(T^2). For 1 ≤ i, j ≤ n, define
a(i,j) = sign(Xi-Xj) sign(Yi-Yj)

Exercise 2.8: Show that under H0
E(a(i,j) a(r,s)) = 0 if i, j, r, s are all distinct

Exercise 2.9: Show that under H0
E(a(i,j) a(r,s)) = 1/9 if i = r, j ≠ s

If X ~ F, then E[(F(X)-1)^2] = 1/3, since F(X) ~ Unif(0,1).
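The value 1/3 can be checked in one line. Writing U = F(X), which is Unif(0,1) because F is continuous:

```latex
E\big[(F(X)-1)^2\big] \;=\; \int_0^1 (u-1)^2 \, du
  \;=\; \left[ \frac{(u-1)^3}{3} \right]_0^1 \;=\; \frac{1}{3}.
```

The same kind of integral gives E[(2U-1)^2] = 1/3, which is the form that turns up if one conditions on X1, since E(sign(X1-X2) | X1) = 2F(X1) - 1.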

Exercise 2.10: Show that under H0
E(a(i,j) a(r,s)) = 1 if i = r, j = s

Exercise 2.11: Show that the number of (i,j) and (r,s) with i=r and j ≠ s, is n(n-1)(n-2).

Exercise 2.12: Use the above exercises to show that, under H0,
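Var(T) = 2(2n+5) / (9n(n-1)).
As a check, for n = 2 we have T = ±1 with probability 1/2 each, so Var(T) = 1, which agrees with 2·9/(9·2·1) = 1.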
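The null variance can be checked by brute force for small n. This is a sketch under the assumption that, under H0, all n! orderings of the concomitants are equally likely. The closed form coded in the hypothetical helper `formula` is 2(2n+5)/(9n(n-1)), the standard null variance of Kendall's statistic.

```python
from fractions import Fraction
from itertools import combinations, permutations

def T_from_concomitants(s):
    # Exact value of T for one ordering of the concomitants
    n = len(s)
    total = sum((s[j] > s[i]) - (s[j] < s[i])
                for i, j in combinations(range(n), 2))
    return Fraction(total, n * (n - 1) // 2)

def exact_null_variance(n):
    # Average over all n! equally likely permutations
    values = [T_from_concomitants(p) for p in permutations(range(n))]
    mean = sum(values) / len(values)
    return sum((t - mean) ** 2 for t in values) / len(values)

def formula(n):
    return Fraction(2 * (2 * n + 5), 9 * n * (n - 1))

for n in range(2, 7):
    print(n, exact_null_variance(n), formula(n))
```

The two printed columns should agree for each n.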

Asymptotic distribution

Computing the exact distribution of T is complicated when the sample size is large, so in that case one uses the asymptotic distribution of T.

Theorem For any τ (not just under H0)
(T-E(T))/sqrt(Var(T)) ~ AN(0,1).

Proof: This will follow from a theorem about U-statistics that we shall prove later.
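In practice the theorem is used under H0, where E(T) = 0 and Var(T) = 2(2n+5)/(9n(n-1)), to get an approximate test. The following is a minimal sketch under those assumptions. The function names are hypothetical, and the five-point data from the concomitant example are used only to show the mechanics; a sample this small is far too small for the normal approximation to be trusted.

```python
import math
from itertools import combinations

def kendall_T(x, y):
    # T from the definition: average of sign(Xi-Xj)*sign(Yi-Yj) over i < j
    n = len(x)
    sgn = lambda u: (u > 0) - (u < 0)
    total = sum(sgn(x[i] - x[j]) * sgn(y[i] - y[j])
                for i, j in combinations(range(n), 2))
    return total / (n * (n - 1) / 2)

def kendall_z_test(x, y):
    # Standardise T with its null mean 0 and null variance,
    # then use the N(0,1) approximation for a two-sided p-value.
    n = len(x)
    t = kendall_T(x, y)
    var0 = 2 * (2 * n + 5) / (9 * n * (n - 1))
    z = t / math.sqrt(var0)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))
    return t, z, p_two_sided

x = [3, 2, 1.5, 4, 9]
y = [5, -9, 2, 3.4, 10]
t, z, p = kendall_z_test(x, y)
print(t, z, p)
```

For the one-sided alternative τ > 0 one would use the upper tail of z alone, i.e. half of `math.erfc(z / math.sqrt(2))`.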


© Arnab Chakraborty (2010)