Arnab Chakraborty's classical nonparametric statistics notes

cwave.eu5.org
Also see: http://www.angelfire.com/dragon/letstry
cwave04 at yahoo dot com


Last updated on Fri May 21 11:52:14 IST 2010.


Tests of association

Kendall's tau

Example: Are class 10 and class 12 results associated? In other words, is a student doing well in the first exam also likely to do well in the second? To check this we collect data from n students:
(X₁,Y₁),...,(X_n,Y_n)
where
X = class 10 grade
Y = class 12 grade.

Assume that the (X_i,Y_i) are iid with some continuous bivariate distribution F(x,y), which is unknown. We could use the sample correlation coefficient to test for association, but it is not distribution-free (i.e., its distribution involves the unknown F), so we cannot compute critical values for a test based on it. Kendall's τ test is a nonparametric way out. We proceed as follows.

Pick two students i, j (i≠ j). Call them concordant if

either (X_i < X_j and Y_i < Y_j)
or (X_i > X_j and Y_i > Y_j)

Otherwise, call them discordant. Note that due to the continuity assumption on F, we do not need to worry about ties. Let

θ = P(two randomly chosen students are concordant)

= 1-P(they are discordant)

Definition Kendall's τ parameter is defined as

τ = θ - (1-θ) = 2 θ - 1.
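The concordance check behind θ and τ can be sketched in a few lines of Python. This is only an illustration: the function names and the five pairs of grades are made up, and the code assumes there are no ties, as in the continuity assumption above.

```python
def is_concordant(p, q):
    """True if the pairs p = (x_i, y_i) and q = (x_j, y_j) are concordant."""
    (xi, yi), (xj, yj) = p, q
    # Concordant: the X-order and the Y-order of the two students agree.
    return (xi - xj) * (yi - yj) > 0


def count_pairs(data):
    """Return (concordant, discordant) counts over all i < j, assuming no ties."""
    n = len(data)
    concordant = sum(
        is_concordant(data[i], data[j])
        for i in range(n)
        for j in range(i + 1, n)
    )
    total = n * (n - 1) // 2
    return concordant, total - concordant


# Hypothetical (class 10, class 12) grades for five students.
grades = [(72, 65), (85, 90), (60, 70), (91, 88), (78, 80)]
conc, disc = count_pairs(grades)
print(conc, disc)  # 8 2
```

The sample proportion of concordant pairs, here 8/10, is the natural estimate of θ.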

Exercise 2.1: If X and Y are independent then show that τ = 0.

Exercise 2.2: Is the converse true?
[Hint: Use the next exercise. Take the distribution of X suitably. Then attach a random sign to X to get Y.]

Exercise 2.3: Show that
τ = E(sign(X₁-X₂)*sign(Y₁-Y₂)).
Here

sign(u) = 1 if u > 0

= 0 if u = 0

= -1 if u < 0

If τ > 0 then concordant pairs occur more often than discordant pairs, i.e., if student i did better than student j in class 10, he/she is likely to do better again in class 12. Thus, "τ > 0" means positive association. We want to test

H₀: X, Y independent Vs. H₁: τ ≠ 0

H₀: X, Y independent Vs. H₁: τ > 0

H₀: X, Y independent Vs. H₁: τ < 0

We can estimate τ by

T = [∑_{i<j} sign(X_i-X_j) sign(Y_i-Y_j)] / ⁿC₂.

We shall use this as our test statistic. It is easier to compute this using concomitants than by using the definition directly.
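For reference, a direct computation of T from its definition might look like the following sketch. The names sign and kendall_T and the sample grades are hypothetical, and ties are assumed not to occur.

```python
from itertools import combinations


def sign(u):
    """Return 1, 0 or -1 according to the sign of u."""
    return (u > 0) - (u < 0)


def kendall_T(xs, ys):
    """Kendall's T: sum of sign products over pairs i < j, divided by nC2."""
    n = len(xs)
    total = sum(
        sign(xs[i] - xs[j]) * sign(ys[i] - ys[j])
        for i, j in combinations(range(n), 2)
    )
    return total / (n * (n - 1) / 2)


# Hypothetical class 10 and class 12 grades for five students.
x = [72, 85, 60, 91, 78]
y = [65, 90, 70, 88, 80]
print(kendall_T(x, y))  # 0.6
```

This loops over all ⁿC₂ pairs, so it takes on the order of n² operations.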

Definition (Concomitant) Suppose that we have bivariate data

(X₁,Y₁), ..., (X_n,Y_n)

Let the R_i's denote the ranks of the Y_i's. Sort the sample by the X's. This permutes the R's in some way. The permuted R's are called the concomitants of Y wrt X.

Example: Suppose that we have the data

Original X_i:    3     2     1.5   4     9
Original Y_i:    5    -9     2     3.4  10
Y-ranks R_i:     4     1     2     3     5
Sorted X_(i):    1.5   2     3     4     9
Permuted R_i:    2     1     4     3     5

The last row gives the concomitants of Y wrt X.
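The steps of the definition (rank the Y's, then reorder the ranks by sorting on X) can be sketched as follows. The function names are illustrative, and the code assumes there are no ties.

```python
def ranks(values):
    """Ranks 1..n of the values (1 = smallest), assuming no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def concomitants(xs, ys):
    """Ranks of the Y's, listed in increasing order of the X's."""
    r = ranks(ys)
    by_x = sorted(range(len(xs)), key=lambda i: xs[i])
    return [r[i] for i in by_x]


# The data from the example above.
xs = [3, 2, 1.5, 4, 9]
ys = [5, -9, 2, 3.4, 10]
print(ranks(ys))             # [4, 1, 2, 3, 5]
print(concomitants(xs, ys))  # [2, 1, 4, 3, 5]
```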

Exercise 2.4: Show that
T = - [∑_{i<j} sign(S_i-S_j)] / ⁿC₂,
where the S_i's are the concomitants of Y wrt X (e.g., in the last example S₁=2, S₂=1 etc.)
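As a numerical sanity check (not a proof) of the identity in this exercise, the sketch below computes T from the concomitants of the five-point example. The function name is hypothetical.

```python
from itertools import combinations


def sign(u):
    """Return 1, 0 or -1 according to the sign of u."""
    return (u > 0) - (u < 0)


def T_from_concomitants(s):
    """T = -(sum over i < j of sign(S_i - S_j)) / nC2."""
    n = len(s)
    total = sum(sign(s[i] - s[j]) for i, j in combinations(range(n), 2))
    return -total / (n * (n - 1) / 2)


# Concomitants from the example: S = (2, 1, 4, 3, 5).
s = [2, 1, 4, 3, 5]
print(T_from_concomitants(s))  # 0.6
```

Only the pairs (2,1) and (4,3) are out of order, so 2 of the 10 pairs are discordant and T = (8 - 2)/10 = 0.6.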

We reject H₀ if |T| is large (for the two-sided alternative). The one-sided cases are handled similarly. To quantify how large is large enough, we need to know the null distribution of T.

Exercise 2.5: Show that T is distribution-free under H₀.

Exercise 2.6: Explicitly compute the null distribution of T when n = 3.
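For small n the null distribution can also be tabulated by brute force. The sketch below assumes the fact underlying Exercise 2.5, namely that under H₀ all n! orderings of the concomitants are equally likely, and uses the concomitant form of T. It is run for n = 4 so that the case n = 3 is left to the reader; the function name is illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, permutations
from math import factorial


def sign(u):
    """Return 1, 0 or -1 according to the sign of u."""
    return (u > 0) - (u < 0)


def null_distribution(n):
    """Exact null distribution of T as a dict {value of T: probability}."""
    pairs = n * (n - 1) // 2
    counts = Counter()
    for s in permutations(range(1, n + 1)):  # each ordering equally likely
        total = sum(sign(s[i] - s[j]) for i, j in combinations(range(n), 2))
        counts[Fraction(-total, pairs)] += 1
    return {t: Fraction(c, factorial(n)) for t, c in sorted(counts.items())}


for t, p in null_distribution(4).items():
    print(t, p)
```

The enumeration visits n! permutations, so it is only practical for small n.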

Next, let us compute the moments of T.

Exercise 2.7: Show that E(T) = τ.

The formula for Var(T) for general τ is complicated (it is given in the textbook). Here we compute Var(T) only under H₀. Since E(T) = 0 under H₀, we have Var(T) = E(T²). For 1 ≤ i, j ≤ n with i ≠ j, define

a(i,j) = sign(X_i-X_j) sign(Y_i-Y_j)

Exercise 2.8: Show that under H₀
E(a(i,j) a(r,s)) = 0 if i, j, r, s are all distinct

Exercise 2.9: Show that under H₀
E(a(i,j) a(r,s)) = 1/9 if i = r, j ≠ s

[Hint: If X ~ F, then F(X) ~ Unif(0,1), so E[(F(X)-1)²] = 1/3 and likewise E[(2F(X)-1)²] = 1/3.]

Exercise 2.10: Show that under H₀
E(a(i,j) a(r,s)) = 1 if i = r, j = s

Exercise 2.11: Show that the number of choices of (i,j) and (r,s), with i ≠ j and r ≠ s, such that i = r and j ≠ s is n(n-1)(n-2).

Exercise 2.12: Use the above exercises to show that, under H₀,
Var(T) = 2(2n+5) / [9n(n-1)]
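The closed form 2(2n+5)/[9n(n-1)] for the null variance can be checked numerically for small n. The sketch below computes E(T²) exactly by averaging over all n! concomitant orderings, assuming they are equally likely under H₀. For n = 2, T is ±1 with equal probability, so the variance must be 1, which the closed form reproduces. Function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import factorial


def sign(u):
    """Return 1, 0 or -1 according to the sign of u."""
    return (u > 0) - (u < 0)


def exact_null_variance(n):
    """E(T^2) under H0, averaged over all n! equally likely orderings."""
    pairs = n * (n - 1) // 2
    second_moment = Fraction(0)
    for s in permutations(range(n)):
        total = sum(sign(s[i] - s[j]) for i, j in combinations(range(n), 2))
        second_moment += Fraction(total, pairs) ** 2
    return second_moment / factorial(n)


def closed_form(n):
    """The closed form 2(2n+5) / (9 n (n-1))."""
    return Fraction(2 * (2 * n + 5), 9 * n * (n - 1))


for n in range(2, 7):
    print(n, exact_null_variance(n), closed_form(n))
```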

Asymptotic distribution

Computing the exact distribution of T is complicated when the sample size is large, so one then uses the asymptotic distribution of T.

Theorem For any τ (not just under H₀)

(T-E(T))/sqrt(Var(T)) ~ AN(0,1).

Proof: This will follow from a theorem about U-statistics that we shall prove later.
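In practice the theorem is applied under H₀, where E(T) = 0 and the null variance is 2(2n+5)/[9n(n-1)]. The following is a minimal sketch of the resulting two-sided test. The function name is hypothetical, ties are assumed absent, and the five-point data set is used only to show the mechanics, since n = 5 is far too small for the normal approximation to be trusted.

```python
from itertools import combinations
from math import erfc, sqrt


def sign(u):
    """Return 1, 0 or -1 according to the sign of u."""
    return (u > 0) - (u < 0)


def kendall_z_test(xs, ys):
    """Return (T, z, two-sided p-value) using the normal approximation under H0."""
    n = len(xs)
    total = sum(
        sign(xs[i] - xs[j]) * sign(ys[i] - ys[j])
        for i, j in combinations(range(n), 2)
    )
    T = total / (n * (n - 1) / 2)
    var0 = 2 * (2 * n + 5) / (9 * n * (n - 1))  # Var(T) under H0
    z = T / sqrt(var0)                          # E(T) = 0 under H0
    p_two_sided = erfc(abs(z) / sqrt(2))        # P(|N(0,1)| >= |z|)
    return T, z, p_two_sided


xs = [3, 2, 1.5, 4, 9]
ys = [5, -9, 2, 3.4, 10]
T, z, p = kendall_z_test(xs, ys)
print(T, z, p)
```

For a one-sided alternative τ > 0 one would instead use the upper tail, 0.5 * erfc(z / sqrt(2)).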