cwave.eu5.org
Also see: http://www.angelfire.com/dragon/letstry
cwave04 at yahoo dot com

Last updated on Fri May 21 11:52:14 IST 2010.

Paired sample problem

Introduction

Example: If you take a bar of steel and apply a large enough force to it, it will snap. The minimum force needed to snap it is called the breaking strength of the steel. To increase the breaking strength, manufacturers treat the steel in various ways in the factory, e.g., they add carbon to it, anneal it (i.e., heat it and then cool it slowly), and so on. Suppose that someone has proposed such a treatment, and we want to test whether it really increases the breaking strength. For this we take n steel bars and cut each into two halves. Then we measure the breaking strengths of the first halves. Call these
X1,...,Xn.
Next we apply the treatment on the other halves, and measure their breaking strengths:
Y1,...,Yn.

It is reasonable to assume that
(X1,Y1),..., (Xn,Yn)
are independent but not necessarily identically distributed. This is called a paired sample setup. Define
Zi = Yi-Xi.
If the treatment is really effective, we would expect most of the Z's to be positive. If the treatment has no effect, then the Z's are as likely to be positive as negative. In nonparametric statistical inference it is customary to assume that the Z's all have continuous distributions with a common median θ, which is unknown. Then the problem is to test
H0: θ = 0 vs H1: θ > 0
We can of course similarly be interested in the other one-sided test or in a two-sided test.

Wilcoxon's signed rank test

We have already seen one test for this, viz., the sign test. However, it uses nothing but the signs of the Zi's, and hence loses much of the information present in the data. This is partly remedied in Wilcoxon's signed rank test. Here we assume that the Zi's are iid and symmetric around θ. Define
si = 1 if Zi > 0
si = 0 otherwise
Let Ri be the rank of |Zi| among |Z1|,...,|Zn|, e.g., if |Z7| is the largest of the |Zi|'s then R7 = n, and so on. Then our test statistic is
T = ∑ si Ri
It is called Wilcoxon's signed rank statistic.

Example: Suppose that the Zi's are
1.4 -2.9 2.4 -3.4 5.9 2.5
Then we compute the following:

   |Zi| 1.4 2.9 2.4 3.4 5.9 2.5
   si    1   0   1   0   1   1
   Ri    1   4   2   5   6   3
   
This produces T = 1+2+6+3 = 12.
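The computation in this example can be sketched in a few lines of Python. This is only an illustrative sketch: the function name signed_rank_statistic is hypothetical, and the code assumes that no Zi is zero and that there are no ties among the |Zi|'s (which holds with probability 1 for continuous distributions).

```python
def signed_rank_statistic(z):
    """Return (ranks, T) for a list of differences z.

    Assumes no zeros and no ties among the absolute values.
    """
    abs_z = [abs(v) for v in z]
    # Indices of z ordered from the smallest |z| to the largest.
    order = sorted(range(len(z)), key=lambda i: abs_z[i])
    ranks = [0] * len(z)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    # T adds up the ranks that belong to the positive differences.
    t = sum(r for r, v in zip(ranks, z) if v > 0)
    return ranks, t


z = [1.4, -2.9, 2.4, -3.4, 5.9, 2.5]
ranks, t = signed_rank_statistic(z)
print("ranks:", ranks)
print("T:", t)
```

The ranks are built from a sort of the indices rather than from a library ranking routine, so that the sketch needs nothing beyond the standard library.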

We should reject H0 if T is large.
Claim: T is distribution-free under H0.

Under H0, the Z's are symmetric around 0. So the |Z|'s and the s's are independent. [See Exercise 1.1 below.] Since the R's are functions of the |Z|'s, the s's are independent of the R's. Also, the s's are iid, taking the values 1 and 0 with probability 1/2 each under H0.

Let's compute the conditional null distribution of T given the Ri's, i.e., of ∑ siRi given the Ri's. In this conditional distribution, the Ri's are just some fixed permutation of {1,...,n}. Multiplying them by the si's and adding amounts to performing an independent unbiased coin toss for each i, and then adding the Ri's corresponding to the heads. The distribution of the sum does not depend on the permutation, since every permutation offers the same set of numbers {1,...,n} to the coin tosses. So T is independent of the Ri's, and T is distribution-free under H0.
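The coin-tossing argument can be checked numerically for a small n. The sketch below (the function name null_distribution and the two permutations are chosen only for illustration) goes through all 2^n patterns of 0's and 1's for two different rank permutations of {1,...,5} and compares the resulting conditional distributions. It is a check for one small case, not a proof.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def null_distribution(ranks):
    """Exact distribution of sum(s_i * R_i) when the s_i are iid fair 0/1 coins."""
    counts = Counter()
    for signs in product((0, 1), repeat=len(ranks)):
        counts[sum(s * r for s, r in zip(signs, ranks))] += 1
    total = 2 ** len(ranks)
    # Each sign pattern has probability 1 / 2^n.
    return {t: Fraction(c, total) for t, c in sorted(counts.items())}


dist_a = null_distribution([1, 2, 3, 4, 5])
dist_b = null_distribution([4, 1, 5, 3, 2])
print("same distribution:", dist_a == dist_b)
for t, p in dist_a.items():
    print(t, p)
```

Exact fractions are used instead of floating point so that the two distributions can be compared with ==.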

Exercise 1.1: Show that |Z| is independent of s if Z has a distribution symmetric around 0.

Exercise 1.2: Compute the null distribution of T explicitly for n = 4.

Exercise 1.3: Show that, under H0, E(T) = n(n+1)/4 and that T has symmetric distribution around E(T).
[Hint: Interchange 0's and 1's.]

Please note that the definition of T here is slightly different from the one given in class. There the signs were -1 and +1; here they are 0 and 1. However, the two definitions produce equivalent tests, in the sense that T as defined in class is large iff T as defined here is large. Both definitions are used for the Wilcoxon signed rank test.

Next we shall compute Var(T) under H0. In order to do this it will be helpful to rewrite T as follows.

Define Di's as "inverse" of Ri's in the following sense:
Di = j iff Rj = i.
Sometimes Di's are called antiranks. Define
Ui = 1 if Z_{Di} > 0
Ui = 0 otherwise
Here Z_{Di} denotes the Z whose subscript is Di, i.e., the Zj with Rj = i.

Exercise 1.4: Show that
T = ∑ i Ui.
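To see what the antiranks look like on actual numbers, here is a small sketch that uses the six Zi's of the earlier example. It computes the Di's and Ui's and compares ∑ si Ri with ∑ i Ui for this one data set. The variable names are chosen only for illustration, and a numerical check of one case is of course not a substitute for the proof asked for in the exercise.

```python
z = [1.4, -2.9, 2.4, -3.4, 5.9, 2.5]
n = len(z)

# 0-based indices of z, ordered from the smallest |z| to the largest.
order = sorted(range(n), key=lambda j: abs(z[j]))

# Ranks: R_j = i when z_j has the i-th smallest absolute value.
ranks = [0] * n
for i, j in enumerate(order, start=1):
    ranks[j] = i

# Antiranks (1-based, as in the text): D_i = j iff R_j = i.
antiranks = [j + 1 for j in order]

# U_i = 1 if the Z with subscript D_i is positive, 0 otherwise.
u = [1 if z[d - 1] > 0 else 0 for d in antiranks]

t_from_ranks = sum(r for r, v in zip(ranks, z) if v > 0)
t_from_antiranks = sum(i * ui for i, ui in enumerate(u, start=1))

print("D:", antiranks)
print("U:", u)
print("sum s_i R_i:", t_from_ranks)
print("sum i U_i:", t_from_antiranks)
```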

Exercise 1.5: Show that, under H0, E(Ui) = 1/2, and Var(Ui) = 1/4.

Exercise 1.6: Show that, under H0,
E(UiUj) = 1/4 if i≠j.
Hence conclude that if i ≠ j, then cov(Ui,Uj)=0.

Exercise 1.7: Use the above exercises to show that
Var(T) = n(n+1)(2n+1)/24
under H0.
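The formulas for the mean and the variance can be checked against exact enumeration for small n. The following sketch (the function name exact_moments is hypothetical) uses the representation of T as a sum of the i's for which an independent fair coin shows 1, goes through all 2^n equally likely patterns, and compares the exact mean and variance with n(n+1)/4 and n(n+1)(2n+1)/24. It checks only the listed values of n.

```python
from fractions import Fraction
from itertools import product


def exact_moments(n):
    """Exact mean and variance of T under H0, by enumerating all 2^n patterns."""
    values = [
        sum(i for i, s in zip(range(1, n + 1), signs) if s)
        for signs in product((0, 1), repeat=n)
    ]
    total = len(values)  # equals 2 ** n
    mean = Fraction(sum(values), total)
    var = Fraction(sum(v * v for v in values), total) - mean ** 2
    return mean, var


for n in range(1, 9):
    mean, var = exact_moments(n)
    mean_ok = mean == Fraction(n * (n + 1), 4)
    var_ok = var == Fraction(n * (n + 1) * (2 * n + 1), 24)
    print(n, mean, var, mean_ok, var_ok)
```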

Exercise 1.8: Write a computer program to verify the upper-tail probabilities given in Table A.4 of the book by Hollander and Wolfe for n=10. If you cannot find the book, then just write a program to find P(T > a) for any given a.
[Hint: Use binary addition to step through all the 2^n patterns of 0's and 1's.]

Why is the symmetry assumption reasonable?
Note that Wilcoxon's signed rank test uses more information than the sign test. But it also requires an extra symmetry assumption. This assumption is often a reasonable one as we see below.

Example: Let X be the breaking strength before the treatment, and let Y be that after the treatment. Assume that X and Y are independent and that their distributions differ only in location. In other words, if X ~ F and Y ~ G, then
F(x) = G(x-θ) for some θ

Exercise 1.9: Show that in this case X-Y has distribution symmetric around θ. [Hint: First consider the case θ=0.]


© Arnab Chakraborty (2010)