Arnab Chakraborty's classical nonparametric statistics notes

cwave.eu5.org
Also see: http://www.angelfire.com/dragon/letstry
cwave04 at yahoo dot com

Last updated on Fri May 21 11:52:14 IST 2010.

Paired sample problem

Introduction

Example: If you take a bar of steel and apply a large enough force to it, it will snap. The minimum force needed to snap it is called the breaking strength of the steel. To increase the breaking strength, manufacturers treat the steel in various ways in the factory, e.g., they add carbon to it, anneal it (i.e., heat it and then cool it slowly), and so on. Suppose that someone has proposed such a treatment, and we want to test whether it really increases the breaking strength. For this we take n steel bars and cut each into two halves. Then we measure the breaking strengths of the first halves. Call these
X₁,...,X_n.
Next we apply the treatment on the other halves, and measure their breaking strengths:
Y₁,...,Y_n.

It is reasonable to assume that
(X₁,Y₁),..., (X_n,Y_n)
are independent but not necessarily identically distributed. This is called a paired sample setup. Define
Z_i = Y_i-X_i.
If the treatment is really effective, we would expect most of the Z's to be positive. If the treatment has no effect, then the Z's are as likely to be positive as negative. In nonparametric statistical inference it is customary to assume that the Z's all have continuous distributions with a common median θ, which is unknown. Then the problem is to test
H₀: θ = 0 vs. H₁: θ > 0
We can of course similarly be interested in the other one-sided test or in a two-sided test.

Wilcoxon's signed rank test

We have already seen one test for this, viz., the sign test. However, it uses nothing but the signs of the Z_i's and hence loses much of the information present in the data. This is partly remedied in Wilcoxon's signed rank test. Here we assume that the Z_i's are iid and symmetric around θ. Define

s_i = 1 if Z_i > 0,
      0 otherwise.

Let R_i be the rank of |Z_i| among |Z₁|,...,|Z_n|, e.g., if |Z₇| is the largest of the |Z_i|'s then R₇ = n, and so on. Then our test statistic is

T = ∑ s_i R_i

It is called Wilcoxon's signed rank statistic.

Example: Suppose that the Z_i's are
1.4 -2.9 2.4 -3.4 5.9 2.5
Then we compute the following:

|Z_i|  1.4  2.9  2.4  3.4  5.9  2.5
s_i    1    0    1    0    1    1
R_i    1    4    2    5    6    3

This produces T = 1+2+6+3 = 12.

We should reject H₀ if T is large.
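
The computation in the example can be reproduced with a short Python sketch. The function name signed_rank_statistic is introduced here only for illustration, and the code assumes there are no ties among the |Z_i|'s and no zeros among the Z_i's (which holds with probability 1 under the continuity assumption).

```python
def signed_rank_statistic(z):
    """Return (signs, ranks, T) with T = sum of s_i * R_i.

    Assumes no ties among |z_i| and no zeros (illustrative sketch only).
    """
    abs_z = [abs(v) for v in z]
    # order[k] is the index of the (k+1)-th smallest |z_i|
    order = sorted(range(len(z)), key=lambda i: abs_z[i])
    ranks = [0] * len(z)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    signs = [1 if v > 0 else 0 for v in z]
    t = sum(s * r for s, r in zip(signs, ranks))
    return signs, ranks, t


z = [1.4, -2.9, 2.4, -3.4, 5.9, 2.5]
signs, ranks, t = signed_rank_statistic(z)
print(signs)  # [1, 0, 1, 0, 1, 1]
print(ranks)  # [1, 4, 2, 5, 6, 3]
print(t)      # 12
```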

Claim: T is distribution-free under H₀.

Under H₀, the Z's are symmetric around 0, so |Z_i| and s_i are independent. [See Exercise 1.1 below.] Since the R's are functions of the |Z|'s, the s's are independent of the R's. Also, under H₀ the s's are iid, taking the values 0 and 1 with probability 1/2 each.

Let's compute the conditional null distribution of T = ∑ s_i R_i given the R_i's. In this conditional distribution, the R_i's are just some fixed permutation of {1,...,n}. Multiplying them by the s_i's and adding amounts to performing iid unbiased coin tosses, one for each i, and then adding the R_i's corresponding to the heads. The distribution of the sum clearly does not depend on the permutation. So T is independent of the R_i's, and T is distribution-free under H₀.
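
The permutation argument can be checked by brute force for a small n. The sketch below (with a hypothetical helper conditional_null_counts, and n = 3 chosen only to keep the output short) fixes each permutation of {1,...,n} as the ranks, runs through all 2ⁿ equally likely 0/1 sign patterns, and confirms that the resulting counts for T are the same for every permutation.

```python
from collections import Counter
from itertools import permutations, product


def conditional_null_counts(ranks):
    """Count how many of the 2^n sign patterns give each value of T,
    for one fixed arrangement of the ranks."""
    counts = Counter()
    for signs in product((0, 1), repeat=len(ranks)):
        counts[sum(s * r for s, r in zip(signs, ranks))] += 1
    return counts


n = 3
reference = conditional_null_counts(range(1, n + 1))
same_for_all = all(
    conditional_null_counts(perm) == reference
    for perm in permutations(range(1, n + 1))
)
print(same_for_all)               # True
print(sorted(reference.items()))  # (value of T, number of sign patterns)
```

Dividing each count by 2ⁿ gives the null probability of the corresponding value of T.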

Exercise 1.1: Show that |Z| is independent of s if Z has a distribution symmetric around 0.

Exercise 1.2: Compute the null distribution of T explicitly for n = 4.

Exercise 1.3: Show that, under H₀, E(T) = n(n+1)/4 and that T has a distribution symmetric around E(T). [Hint: Interchange 0's and 1's.]

Please note that the definition of T here is slightly different from the one given in class. There the signs were -1 and +1. Here they are 0 and 1. However, the two definitions produce equivalent tests, in the sense that T as defined in class is large iff T as defined here is large. Both definitions are used for Wilcoxon's signed rank test.

Next we shall compute Var(T) under H₀. In order to do this it will be helpful to rewrite T as follows.

Define D_i's as "inverse" of R_i's in the following sense:

D_i = j iff R_j = i.

Sometimes D_i's are called antiranks. Define

U_i = 1 if Z_{D_i} > 0,
      0 otherwise.

Exercise 1.4: Show that
T = ∑ i U_i.
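
As a numerical illustration (not a proof) of this identity, the following Python sketch computes the antiranks D_i and the U_i's for the six Z values of the earlier example. Indices are kept 1-based to match the text, and the variable names are chosen here only for illustration.

```python
z = [1.4, -2.9, 2.4, -3.4, 5.9, 2.5]
n = len(z)

# Antiranks, 1-based as in the text: D_i = j iff R_j = i,
# i.e. D_i is the index of the i-th smallest |Z_j|.
antiranks = sorted(range(1, n + 1), key=lambda j: abs(z[j - 1]))

# U_i = 1 if Z_{D_i} > 0, else 0.
u = [1 if z[d - 1] > 0 else 0 for d in antiranks]

# T written as the sum of i * U_i.
t = sum(i * u_i for i, u_i in enumerate(u, start=1))

print(antiranks)  # [1, 3, 6, 2, 4, 5]
print(u)          # [1, 1, 1, 0, 0, 1]
print(t)          # 12
```

The value agrees with ∑ s_i R_i computed directly from the signs and ranks of the same data.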

Exercise 1.5: Show that, under H₀, E(U_i) = 1/2, and Var(U_i) = 1/4.

Exercise 1.6: Show that, under H₀,
E(U_iU_j) = 1/4 if i≠j.
Hence conclude that if i ≠ j, then cov(U_i,U_j)=0.

Exercise 1.7: Use the above exercises to show that
Var(T) = n(n+1)(2n+1)/24
under H₀.
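
The formulas for E(T) and Var(T) can be checked numerically for small n. The sketch below relies on the representation T = ∑ i U_i with independent fair 0/1 coin tosses U_i under H₀, so that the number of sign patterns giving T = t is the coefficient of x^t in the product (1+x)(1+x²)...(1+xⁿ). The helper names null_counts and null_moments are hypothetical, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction


def null_counts(n):
    """counts[t] = number of 0/1 patterns (U_1,...,U_n) with sum of i*U_i = t."""
    counts = [1]  # n = 0: T = 0 with certainty
    for i in range(1, n + 1):
        new = counts + [0] * i       # patterns with U_i = 0
        for t, c in enumerate(counts):
            new[t + i] += c          # patterns with U_i = 1 shift T by i
        counts = new
    return counts


def null_moments(n):
    """Exact mean and variance of T under H0."""
    counts = null_counts(n)
    total = Fraction(2) ** n
    mean = sum(Fraction(t * c) for t, c in enumerate(counts)) / total
    second = sum(Fraction(t * t * c) for t, c in enumerate(counts)) / total
    return mean, second - mean ** 2


for n in range(1, 11):
    mean, var = null_moments(n)
    assert mean == Fraction(n * (n + 1), 4)
    assert var == Fraction(n * (n + 1) * (2 * n + 1), 24)
print("formulas agree for n = 1,...,10")
```

This only confirms the formulas for the values of n tried; the exercises above ask for the general argument.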

Exercise 1.8: Write a computer program to verify the upper-tail probabilities given in Table A.4 of the book by Hollander and Wolfe for n=10. If you cannot find the book, then just write a program to find P(T > a) for any given a. [Hint: Use binary addition to step through all the 2ⁿ patterns of 0's and 1's.]
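
One possible way to follow the hint, given only as a sketch: represent each pattern of 0's and 1's as the binary digits of an integer, so that stepping from one pattern to the next is just adding 1. The function name upper_tail is an assumption made here, and no values from the Hollander and Wolfe table are reproduced.

```python
from fractions import Fraction


def upper_tail(n, a):
    """Exact P(T > a) under H0, by stepping through all 2^n sign patterns."""
    hits = 0
    for pattern in range(2 ** n):  # going from pattern to pattern + 1 is binary addition
        # bit (i - 1) of pattern plays the role of U_i in T = sum of i * U_i
        t = sum(i for i in range(1, n + 1) if (pattern >> (i - 1)) & 1)
        if t > a:
            hits += 1
    return Fraction(hits, 2 ** n)


print(upper_tail(10, 54))  # only T = 55 exceeds 54, so this is 1/1024
print(upper_tail(10, 27))  # 1/2, by symmetry of T around E(T) = 27.5
```

The running time grows like 2ⁿ, which is fine for n = 10 but becomes impractical for much larger n.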

Why is the symmetry assumption reasonable?

Note that Wilcoxon's signed rank test uses more information than the sign test. But it also requires an extra symmetry assumption. This assumption is often a reasonable one as we see below.

Example: Let X be the breaking strength before the treatment, and let Y be that after the treatment. Assume that X and Y are independent and that their distributions differ only in location. In other words, if X ~ F and Y ~ G, then
F(x) = G(x-θ) for some θ

Exercise 1.9: Show that in this case X-Y has distribution symmetric around θ. [Hint: First consider the case θ=0.]