 |
Paired sample problem
Introduction
| Example:
If you take a bar of steel, and apply a large enough force on it, it will
snap. The minimum force to snap it is called the breaking strength of the
steel. To increase the breaking strength, manufacturers treat the
steel in
various ways in the factory, e.g., they add carbon to it, anneal it (i.e.,
heat and then slowly cool it) and so on. Suppose that someone has proposed
such a treatment, and we want to test if it really increases the breaking
strength. For this we take n steel bars and cut each into two halves. Then
we measure the breaking strengths of the first halves. Call these
X1,...,Xn.
Next we apply the treatment on the other halves, and measure their
breaking strengths:
Y1,...,Yn.
It is reasonable to assume that
(X1,Y1),...,(Xn,Yn)
are independent but not necessarily identically distributed. This is called a
paired sample setup. Define
Zi = Yi-Xi.
If the treatment is really
effective we would expect most of the Z's to be positive. If the treatment
has no effect then the Z's are as likely to be positive as
negative. In nonparametric statistical inference it is
customary to assume that the Z's all have continuous distributions with a
common median θ, which is unknown.
Then the problem is to test
H0: θ = 0
vs. H1: θ > 0.
We can of course similarly be interested in the other one-sided test or a
two-sided test.
|
Wilcoxon's signed rank test
We have already seen one test for this, viz., the sign test. However, it
uses nothing but the signs of the Zi's and hence loses much of the
information present in the data. This is partly remedied in Wilcoxon's
signed rank test. Here we assume that the Zi's are iid and
symmetric around θ. Define
si = 1 if Zi > 0, and si = 0 otherwise.
Let Ri be the rank of |Zi| among |Z1|,...,|Zn|,
e.g., if |Z7| is the largest |Zi|
then R7 = n,
and so on. Then our test statistic is
T = ∑ si Ri.
It is called Wilcoxon's signed rank statistic.
| Example:
Suppose that the Zi's are
1.4 -2.9 2.4 -3.4 5.9 2.5
Then we compute the following:
|Zi|  1.4  2.9  2.4  3.4  5.9  2.5
si    1    0    1    0    1    1
Ri    1    4    2    5    6    3
This produces T = 1+2+6+3 = 12.
|
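The computation in this example can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `signed_rank_statistic` is made up for illustration, and it assumes that no Zi is zero and that there are no ties among the |Zi|'s.

```python
def signed_rank_statistic(z):
    """Return (s, ranks, T) for the differences z.

    Assumes no zero differences and no ties among the absolute values.
    """
    abs_z = [abs(v) for v in z]
    # Indices of z sorted by increasing |z|; the k-th one gets rank k.
    order = sorted(range(len(z)), key=lambda i: abs_z[i])
    ranks = [0] * len(z)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    # 0/1 sign indicators, as used in these notes.
    s = [1 if v > 0 else 0 for v in z]
    t = sum(si * ri for si, ri in zip(s, ranks))
    return s, ranks, t


z = [1.4, -2.9, 2.4, -3.4, 5.9, 2.5]
s, ranks, t = signed_rank_statistic(z)
print(s)      # [1, 0, 1, 0, 1, 1]
print(ranks)  # [1, 4, 2, 5, 6, 3]
print(t)      # 12, i.e. 1 + 2 + 6 + 3
```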
We should reject H0 if T is large.
Claim:
T is distribution-free under H0.
Under H0, the Z's are symmetric around 0. So |Z| and s are independent.
[See Exercise 1.1 below.] Since the
R's are functions of the |Z|'s, the s's are independent of the R's. Also, the s's are
iid, taking the values 0 and 1 with probability 1/2 each under H0.
Let's compute the conditional null distribution of T given the Ri's, i.e., of
∑ si Ri | R1,...,Rn.
In this conditional distribution, the Ri's are just some fixed
permutation of {1,...,n}. Multiplying them with the si's and adding amounts
to performing iid unbiased coin tosses, one for each i, and then adding
the Ri's corresponding to the heads. The distribution of the sum
does not depend
upon the permutation, since the ranks that get added always form a uniformly
random subset of {1,...,n}. So T is independent of the Ri's and T is
distribution-free under H0.
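The key step of this argument can be checked numerically for a small n. The sketch below is an illustration, not a proof. It enumerates all 2^n equally likely 0/1 sign patterns for every permutation of the ranks and confirms that the resulting distribution of the sum is the same each time. The function name `conditional_null_distribution` and the choice n = 3 are ours.

```python
from collections import Counter
from itertools import permutations, product


def conditional_null_distribution(ranks):
    """Count how often each value of sum(s_i * R_i) occurs over all
    2^n equally likely patterns of 0/1 signs, for a fixed rank vector."""
    ranks = list(ranks)
    counts = Counter()
    for signs in product((0, 1), repeat=len(ranks)):
        counts[sum(s * r for s, r in zip(signs, ranks))] += 1
    return counts


n = 3
reference = conditional_null_distribution(range(1, n + 1))
same = all(
    conditional_null_distribution(p) == reference
    for p in permutations(range(1, n + 1))
)
print(same)                       # True
print(sorted(reference.items()))  # [(0, 1), (1, 1), (2, 1), (3, 2), (4, 1), (5, 1), (6, 1)]
```

Dividing each count by 2^n = 8 gives the conditional (and hence unconditional) null probabilities for n = 3.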
| Exercise 1.1:
Show that |Z| is independent of s if Z has a distribution symmetric around 0.
|
| Exercise 1.2:
Compute the null distribution of T explicitly for n = 4.
|
| Exercise 1.3:
Show that, under H0, E(T) = n(n+1)/4 and that T has a
distribution symmetric around E(T).
[Hint: Interchange 0's and 1's.]
|
Please note that the definition of T here is slightly different from
the one given in class. There the signs were -1 and +1. Here they are 0 and
1. However, both definitions produce equivalent tests in the sense that T
defined in class is large iff T defined here is large. Indeed, since the
Ri's add up to n(n+1)/2, the T defined in class equals 2T - n(n+1)/2, where
T is as defined here. Both
definitions are used for the Wilcoxon signed rank test.
Next we shall compute Var(T) under H0. In order to do this it
will be helpful to rewrite T as follows.
Define Di's as "inverse" of Ri's in the following
sense:
Di = j iff Rj = i.
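Sometimes the Di's are called antiranks. Define
Ui = sj where j = Di,
i.e., Ui = 1 if the Z with the i-th smallest absolute value is positive, and
Ui = 0 otherwise.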
| Exercise 1.4:
Show that
T = ∑ i Ui.
|
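As a numerical sanity check of the identity in Exercise 1.4 (not a proof), the antiranks can be computed for the six differences used in the earlier example. The sketch assumes that Ui denotes the 0/1 sign indicator of the observation whose absolute value has rank i, i.e., Ui = sj with j = Di; this is the reading under which T = ∑ i Ui holds. The variable names are ours.

```python
z = [1.4, -2.9, 2.4, -3.4, 5.9, 2.5]
n = len(z)

# 0/1 sign indicators s_1, ..., s_n (stored 0-based).
s = [1 if v > 0 else 0 for v in z]

# Antiranks, reported 1-based: D[i-1] = j means Z_j has the i-th smallest |Z|.
D = [j + 1 for j in sorted(range(n), key=lambda j: abs(z[j]))]

# Assumed definition: U_i is the sign indicator of the observation with rank i.
U = [s[d - 1] for d in D]

T = sum(i * u for i, u in enumerate(U, start=1))
print(D)  # [1, 3, 6, 2, 4, 5]
print(U)  # [1, 1, 1, 0, 0, 1]
print(T)  # 12, the same value as the sum of s_i * R_i for these data
```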
| Exercise 1.5:
Show that, under H0, E(Ui) = 1/2, and
Var(Ui) = 1/4.
|
| Exercise 1.6:
Show that, under H0,
E(UiUj) = 1/4 if i≠j.
Hence conclude that if i ≠ j, then
cov(Ui,Uj)=0.
|
| Exercise 1.7:
Use the above exercises to show that
Var(T) = n(n+1)(2n+1)/24
under H0.
|
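The formulas for E(T) and Var(T) can be verified for small n by brute force. The following sketch uses exact rational arithmetic and relies on the fact, established above, that under H0 all 2^n patterns of 0's and 1's attached to the ranks 1,...,n are equally likely. The function name `null_moments` is ours, and a check for n up to 8 is of course no substitute for the derivation asked for in the exercises.

```python
from fractions import Fraction
from itertools import product


def null_moments(n):
    """Exact E(T) and Var(T) under H0, by enumerating all 2^n patterns."""
    values = [
        sum(i * u for i, u in enumerate(pattern, start=1))
        for pattern in product((0, 1), repeat=n)
    ]
    mean = Fraction(sum(values), len(values))
    var = Fraction(sum(v * v for v in values), len(values)) - mean ** 2
    return mean, var


for n in range(1, 9):
    mean, var = null_moments(n)
    assert mean == Fraction(n * (n + 1), 4)
    assert var == Fraction(n * (n + 1) * (2 * n + 1), 24)

print(null_moments(4))  # (Fraction(5, 1), Fraction(15, 2))
```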
| Exercise 1.8:
Write a computer program to verify the upper-tail probabilities
mentioned in Table A.4 of the book by Hollander and Wolfe for n=10. If you
cannot find the book, then just write a program to find P(T > a)
for any given a.
[Hint: Use binary addition to step through all the 2^n patterns of 0's
and 1's.]
|
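One possible starting point for this exercise is sketched below. It follows the hint literally: an integer counter runs from 0 to 2^n - 1, and its binary digits are read as the pattern of 0's and 1's attached to the ranks 1,...,n. The function name `upper_tail` is ours, and no values from the Hollander and Wolfe table are reproduced here; comparing against the table is left to the reader.

```python
from fractions import Fraction


def upper_tail(n, a):
    """Exact P(T > a) under H0, stepping through all 2^n patterns of 0's and 1's."""
    favourable = 0
    for pattern in range(2 ** n):  # adding 1 in binary moves to the next pattern
        # Bit (i - 1) of `pattern` is the 0/1 indicator attached to rank i.
        t = sum(i for i in range(1, n + 1) if (pattern >> (i - 1)) & 1)
        if t > a:
            favourable += 1
    return Fraction(favourable, 2 ** n)


print(upper_tail(4, 5))    # 7/16
print(upper_tail(10, 54))  # 1/1024, since only the all-ones pattern gives T = 55
```

The running time grows like n 2^n, which is fine for n = 10 but not for large n.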
Why is the symmetry assumption reasonable?
Note that Wilcoxon's signed rank test uses more information than the sign
test. But it also requires an extra symmetry assumption. This assumption
is often a reasonable one as we see below.
| Example:
Let X be the breaking strength before the treatment, and let
Y be that after the treatment. Assume that X,Y are independent and that
their distributions differ only in location.
In other words, if X ~ F and Y ~ G, then
F(x) = G(x-θ)
for some θ.
|
| Exercise 1.9:
Show that in this case X-Y has distribution symmetric around
θ.
[Hint: First consider the case θ=0.]
|
|