Example:
Suppose that we want to test the efficacy of a sleeping pill. A simple way
to do this would be to administer the drug to a sample of insomnia
patients and then to measure the increase in the amounts of their
sleep. However, it is usually not enough to measure just this
increase. Some patients are so fond of medicines that they "feel better"
whenever they think they have taken a medicine. Even giving them a lump of
sugar in the form of a pill would make them "feel better". This is called
the placebo effect, the sugar pill being called a placebo.
So in order to test the efficacy of a sleeping pill, it is not enough to
check whether it increases sleep; we must also check whether it outperforms
the placebo. So we perform the following blind experiment.
Take m+n insomnia patients. They should represent the population as closely
as possible with respect to characteristics that might affect the
performance of the drug (e.g., gender, age, etc.). Randomly split this sample into a
control and a treatment group of sizes m and n,
respectively. Administer placebo to the control group, and the pill to the
treatment group. Then measure the increase in the amount of sleep of each
of the m+n patients. Let
$X_1, \ldots, X_m$
be the measurements for the control group, and let
$Y_1, \ldots, Y_n$
be the corresponding measurements for the treatment group. We assume that
the X's are iid with some unknown continuous distribution F, and the Y's
are iid with some unknown continuous distribution G. We want to check if F
and G are different.
Digression:
A couple of points
Why not take m=n? Usually such an experiment involves a
follow-up study, e.g., a request like
"Please come back after a month of medication and report your
amount of sleep."
is made to each patient.
Some patients may not honour this request, and just drop out of
the experiment. So the final sample sizes may differ for the two groups.
Why is the experiment called a "blind" experiment?
Because we do not want the patients to know which group they have been
put in (otherwise the control patients would not experience the placebo effect).
Sometimes even the investigators are not told which patients belong to
which group. This is called a "double blind" experiment.
Wald-Wolfowitz Run Test
We have the set up as described above. We want to test
H0: F = G vs. H1: F ≠ G
This test is only for two-sided alternatives. Here is the test procedure:
First combine the two samples and order the (m+n) numbers. Then write the
combined, ordered sample as a string of X's and Y's, e.g.,
X Y Y X X X Y Y Y X X X Y X X
We expect that under H0 the X's and the Y's would be "well
mixed" in this string. To check this we count the number, R, of
runs, a run being a maximal stretch of the same letter. In this example
R = 7, the runs being
X, YY, XXX, YYY, XXX, Y and XX.
This R is our test statistic. We shall reject H0 if R is
"small", since poorly mixed samples produce few, long runs. To quantify
how small is "small" we need to find the null distribution of R.
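The counting procedure can be sketched in a few lines of Python. This is only an illustration: the function name `count_runs` and the numbers in the usage lines are made up here, chosen so that the ordered string is the one displayed in the example; ties are assumed absent because F and G are continuous.

```python
def count_runs(x, y):
    """Number of runs R in the combined, ordered sample of x and y."""
    # Tag every observation with its sample of origin, then sort by value.
    # Ties are assumed not to occur (F and G are continuous).
    tagged = sorted([(v, "X") for v in x] + [(v, "Y") for v in y])
    labels = [tag for _, tag in tagged]
    # The first letter starts a run; every change of letter starts another.
    return 1 + sum(a != b for a, b in zip(labels, labels[1:]))


# Hypothetical data whose ordered string is X Y Y X X X Y Y Y X X X Y X X.
x = [1, 4, 5, 6, 10, 11, 12, 14, 15]
y = [2, 3, 7, 8, 9, 13]
print(count_runs(x, y))  # 7
```

Only the relative order of the observations matters, which is why the test statistic does not depend on the scale of the measurements.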
Exercise 7.1:
What are the possible values that R can take?
Exercise 7.2:
Show that the number of ways in which a positive integer N can be split
into K ordered, strictly positive integer parts is
$\binom{N-1}{K-1}$.
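Before attempting a proof, the count can be checked numerically for small N and K. The sketch below reads "split" as an ordered split (the successive run lengths form such a split); the helper name `count_compositions` is an illustrative choice, and the brute-force search is only practical for small N.

```python
from itertools import product
from math import comb


def count_compositions(N, K):
    """Brute-force count of ordered K-tuples of positive integers summing to N."""
    return sum(
        1
        for parts in product(range(1, N + 1), repeat=K)
        if sum(parts) == N
    )


# Compare the brute-force count with the binomial coefficient for small cases.
for N in range(1, 7):
    for K in range(1, N + 1):
        assert count_compositions(N, K) == comb(N - 1, K - 1)
```

A numerical check of this kind is not a proof, but it guards against an off-by-one error in the statement.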
Exercise 7.3:
Let 2k+1 be a possible value for R. Show that under H0
$$P(R=2k+1) = \frac{\binom{m-1}{k-1}\binom{n-1}{k} + \binom{m-1}{k}\binom{n-1}{k-1}}{\binom{m+n}{m}}.$$
Exercise 7.4:
If 2k is a possible value for R then show that under H0
$$P(R=2k) = \frac{2\binom{m-1}{k-1}\binom{n-1}{k-1}}{\binom{m+n}{m}}.$$
Thus, observe that R is distribution-free under H0, i.e.,
its distribution does not involve F.
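The two formulas from Exercises 7.3 and 7.4 are enough to tabulate the null distribution and hence a p-value for the "reject if R is small" rule. The following is a minimal sketch, assuming m, n ≥ 1; the names `run_pmf` and `lower_tail_p` are illustrative, and exact fractions are used so that the probabilities can be checked to sum to 1.

```python
from fractions import Fraction
from math import comb


def run_pmf(m, n):
    """Null distribution of R as a dict {r: P(R = r)}, for m, n >= 1."""
    total = comb(m + n, m)
    pmf = {}
    for r in range(2, m + n + 1):
        k = r // 2
        if r % 2 == 0:
            # r = 2k: formula of Exercise 7.4
            ways = 2 * comb(m - 1, k - 1) * comb(n - 1, k - 1)
        else:
            # r = 2k + 1: formula of Exercise 7.3
            ways = (comb(m - 1, k - 1) * comb(n - 1, k)
                    + comb(m - 1, k) * comb(n - 1, k - 1))
        if ways:  # keep only values that R can actually take
            pmf[r] = Fraction(ways, total)
    return pmf


def lower_tail_p(m, n, r_obs):
    """P(R <= r_obs) under H0, the p-value when small R is evidence against H0."""
    return sum(p for r, p in run_pmf(m, n).items() if r <= r_obs)


assert sum(run_pmf(9, 6).values()) == 1
print(float(lower_tail_p(9, 6, 7)))  # p-value for the example string (m = 9, n = 6)
```

Note that `math.comb(a, b)` returns 0 when b > a, so impossible values of R drop out without special handling.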
Next we want to find the moments of R. For this, mark each X or Y in the
ordered string with a 1 if it is the start of a new run, and with a 0
otherwise. Call the i-th mark $U_i$.
Thus, $U_1$ is always 1. The
sequence
XYYXXXYYYXXXYXX
is marked as
110100100100110
Clearly, $R = \sum_{i=1}^{m+n} U_i$.
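The marking scheme is easy to reproduce in code. In this small sketch the function name `run_marks` is an illustrative choice; it takes the ordered string of letters and returns the marks, whose sum is R.

```python
def run_marks(labels):
    """Marks U_1, ..., U_{m+n}: 1 where a new run starts, 0 elsewhere."""
    return [
        1 if i == 0 or labels[i] != labels[i - 1] else 0
        for i in range(len(labels))
    ]


marks = run_marks("XYYXXXYYYXXXYXX")
print("".join(map(str, marks)))  # 110100100100110
print(sum(marks))                # 7
```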
Exercise 7.5:
Show that under H0
E(R) = (2mn/(m+n)) + 1
[Hint: Compute E(Ui) and add.]
Exercise 7.6:
Take any i, j in {1,...,m+n} with 1 < i < j-1. Show that under H0
$$E(U_i U_j) = \frac{4\binom{m+n-4}{m-2}}{\binom{m+n}{m}}.$$
[Hint: Look at the (i-1)-th, i-th, (j-1)-th and j-th locations. There are
four patterns at these locations that make Ui and Uj both 1:
XYXY, XYYX, YXYX, YXXY.
Compute the probabilities of these and add.]
Exercise 7.7:
Show that under H0
$$\mathrm{Var}(R) = \frac{2mn(2mn-m-n)}{(m+n)^2(m+n-1)}.$$
[Hint: You need to find E(UiUj) for all
i < j. One case has already been done in the last exercise.]
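The mean and variance formulas can be checked by brute force for small m and n, using the fact that under H0 all arrangements of m X's and n Y's are equally likely. The sketch below is such a check, not a derivation; the function names `exact_moments` and `formula_moments` are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def exact_moments(m, n):
    """Mean and variance of R over all equally likely arrangements of m X's and n Y's."""
    N = m + n
    values = []
    for x_positions in combinations(range(N), m):
        xs = set(x_positions)
        is_x = [i in xs for i in range(N)]
        # R = 1 + number of places where the letter changes
        values.append(1 + sum(a != b for a, b in zip(is_x, is_x[1:])))
    mean = Fraction(sum(values), len(values))
    var = Fraction(sum(v * v for v in values), len(values)) - mean ** 2
    return mean, var


def formula_moments(m, n):
    """Mean and variance of R as stated in Exercises 7.5 and 7.7."""
    mean = Fraction(2 * m * n, m + n) + 1
    var = Fraction(2 * m * n * (2 * m * n - m - n), (m + n) ** 2 * (m + n - 1))
    return mean, var


for m in range(1, 6):
    for n in range(1, 6):
        assert exact_moments(m, n) == formula_moments(m, n)
```

For m = n = 2 both routes give mean 3 and variance 2/3, which can also be confirmed by listing the six arrangements by hand.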