cwave.eu5.org
Also see: http://www.angelfire.com/dragon/letstry
cwave04 at yahoo dot com

Last updated on Fri May 21 11:52:17 IST 2010.

Two sample inference

Introduction

Example: Suppose that we want to test the efficacy of a sleeping pill. A simple way to do this would be to administer the drug to a sample of insomnia patients and then measure the increase in their amount of sleep. However, measuring this increase alone is usually not enough. Some patients are so fond of medicines that they "feel better" whenever they think they have taken one. Even a lump of sugar given in the form of a pill would make them "feel better". This is called the placebo effect, and the sugar pill is called a placebo.

So in order to test the efficacy of a sleeping pill, it is not enough to check whether it increases sleep; we must also check whether it outperforms the placebo. For this we perform the following blind experiment.

Take m+n insomnia patients. They should represent the population as closely as possible with respect to characteristics that might affect the performance of the drug (e.g., gender, age, etc.). Randomly split this sample into a control group of size m and a treatment group of size n. Administer the placebo to the control group and the pill to the treatment group. Then measure the increase in the amount of sleep for each of the m+n patients. Let
$X_1, \ldots, X_m$
be the measurements for the control group, and let
$Y_1, \ldots, Y_n$
be the corresponding measurements for the treatment group. We assume that the X's are iid with some unknown continuous distribution F, and the Y's are iid with some unknown continuous distribution G. We want to check whether F and G are different.
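The random split described above can be sketched in Python as follows. This is only an illustration: the helper `randomize`, the patient labels `P01`, ..., `P12` and the group sizes are hypothetical, and shuffling the whole list and then cutting it at position m is one standard way (not necessarily the author's) to make every subset of size m equally likely to be the control group.

```python
import random


def randomize(patients, m, seed=None):
    """Split `patients` at random into a control group of size m and a
    treatment group of size n = len(patients) - m.

    Shuffling the whole list and cutting it at position m makes every
    subset of size m equally likely to become the control group.
    """
    if not 0 <= m <= len(patients):
        raise ValueError("m must lie between 0 and the number of patients")
    rng = random.Random(seed)
    shuffled = list(patients)  # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    return shuffled[:m], shuffled[m:]


# Hypothetical example: 12 patients, m = 5 get the placebo, n = 7 get the pill.
patients = [f"P{i:02d}" for i in range(1, 13)]
control, treatment = randomize(patients, m=5, seed=1)
print("placebo:", control)
print("pill:   ", treatment)
```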

Digression:

A couple of points
  1. Why not take m = n? Usually such an experiment involves a follow-up study, e.g., a request like
    "Please come back after a month of medication and report your amount of sleep."
    is made to each patient. Some patients may not honour this request, and just drop out of the experiment. So the final sample sizes may differ for the two groups.
  2. Why is the experiment called a "blind" experiment? Because we do not want the patients to know which group they have been put in (otherwise the control patients will not experience the placebo effect). Sometimes even the investigators are not told which patients belong to which group. This is called a "double blind" experiment.

Wald-Wolfowitz Run Test

We have the set up as described above. We want to test
H0: F = G Vs. H1: F ≠ G
This test is only for two-sided alternatives. Here is the test procedure:

First combine the two samples and order the (m+n) numbers. Then write the combined, ordered sample as a string of X's and Y's, e.g.,
X Y Y X X X Y Y Y X X X Y X X
We expect that under H0 the X's and the Y's would be "well mixed" in this string. To check this we count the number, R, of runs (a run is a maximal stretch of the same letter). In this example R = 7, the runs being
X, YY, XXX, YYY, XXX, Y and XX
This R is our test statistic. We shall reject H0 if R is "small". To quantify how small is "small", we need the null distribution of R.
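A minimal Python sketch of the procedure so far, assuming continuous data so that ties do not occur: label each observation, sort the combined sample, and count the runs. The function names are hypothetical.

```python
def combined_string(xs, ys):
    """Return the labels 'X'/'Y' of the combined sample, ordered by value.

    Assumes continuous data, so ties occur with probability zero.
    """
    labelled = [(v, "X") for v in xs] + [(v, "Y") for v in ys]
    labelled.sort(key=lambda pair: pair[0])  # sort by value only
    return "".join(label for _, label in labelled)


def count_runs(s):
    """Number of runs, i.e. maximal stretches of the same letter, in s."""
    if not s:
        return 0
    # Every change of letter between neighbours starts a new run.
    return 1 + sum(1 for a, b in zip(s, s[1:]) if a != b)


print(count_runs("XYYXXXYYYXXXYXX"))  # 7, as in the example string above

# Small made-up data set: control 1.2, 3.5; treatment 2.0, 0.7.
s = combined_string([1.2, 3.5], [2.0, 0.7])
print(s, count_runs(s))  # YXYX 4
```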

Exercise 7.1: What are the possible values that R can take?

Exercise 7.2: Show that the number of ways in which a positive integer N can be split into K strictly positive parts (with the order of the parts taken into account) is
$\binom{N-1}{K-1}$.
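The claim of Exercise 7.2 can be checked numerically (a check, not a proof) by brute force: count the ordered K-tuples of positive integers that add up to N and compare with the binomial coefficient. The helper name is hypothetical; `math.comb` needs Python 3.8 or later.

```python
from itertools import product
from math import comb


def count_compositions(N, K):
    """Brute force: number of ordered K-tuples of positive integers summing to N."""
    return sum(1 for parts in product(range(1, N + 1), repeat=K) if sum(parts) == N)


# Compare with C(N-1, K-1) for all small cases.
for N in range(1, 7):
    for K in range(1, N + 1):
        assert count_compositions(N, K) == comb(N - 1, K - 1)

print(count_compositions(5, 3))  # 6 = C(4, 2): 113 131 311 122 212 221
```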

Exercise 7.3: Let 2k+1 be a possible value for R. Show that under H0
$P(R = 2k+1) = \dfrac{\binom{m-1}{k-1}\binom{n-1}{k} + \binom{m-1}{k}\binom{n-1}{k-1}}{\binom{m+n}{m}}$

Exercise 7.4: If 2k is a possible value for R then show that under H0
$P(R = 2k) = \dfrac{2\binom{m-1}{k-1}\binom{n-1}{k-1}}{\binom{m+n}{m}}$
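The formulas of Exercises 7.3 and 7.4 translate directly into a function returning the exact null distribution of R. The Python sketch below (helper names hypothetical) cross-checks them against brute-force enumeration of all C(m+n, m) equally likely placements of the X's, and then uses them to answer "how small is small" for the example string above (m = 9 X's, n = 6 Y's, R = 7) through the exact lower-tail p-value P(R ≤ 7).

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def runs_pmf(m, n):
    """Exact null pmf {r: P(R = r)} of the number of runs (m, n >= 1),
    using the formulas of Exercises 7.3 (odd r) and 7.4 (even r)."""
    total = comb(m + n, m)
    pmf = {}
    for k in range(1, min(m, n) + 1):
        even = 2 * comb(m - 1, k - 1) * comb(n - 1, k - 1)
        odd = (comb(m - 1, k - 1) * comb(n - 1, k)
               + comb(m - 1, k) * comb(n - 1, k - 1))
        # math.comb(a, b) is 0 when b > a, so impossible values are skipped.
        if even:
            pmf[2 * k] = Fraction(even, total)
        if odd:
            pmf[2 * k + 1] = Fraction(odd, total)
    return pmf


def runs_pmf_brute(m, n):
    """Same pmf by listing every placement of the m X's among m+n slots."""
    counts = {}
    for x_pos in combinations(range(m + n), m):
        s = ["Y"] * (m + n)
        for p in x_pos:
            s[p] = "X"
        r = 1 + sum(1 for a, b in zip(s, s[1:]) if a != b)
        counts[r] = counts.get(r, 0) + 1
    total = comb(m + n, m)
    return {r: Fraction(c, total) for r, c in counts.items()}


def lower_tail_p(r_obs, m, n):
    """Exact p-value P(R <= r_obs) under H0; small R is evidence against H0."""
    return sum((p for r, p in runs_pmf(m, n).items() if r <= r_obs), Fraction(0))


for m, n in [(1, 1), (2, 3), (4, 4), (5, 3), (9, 6)]:
    assert runs_pmf(m, n) == runs_pmf_brute(m, n)
    assert sum(runs_pmf(m, n).values()) == 1

p = lower_tail_p(7, 9, 6)
print(p, float(p))  # 49/143, about 0.343
```

With a p-value of about 0.34, the example string gives no reason to reject H0 at any conventional level.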

Thus R is distribution-free under H0, i.e., its null distribution does not depend on the common distribution F.
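Distribution-freeness can also be seen empirically. The Python sketch below is a Monte Carlo illustration under arbitrary choices (m = 9, n = 6, 20,000 replications, fixed seeds, and normal, exponential and uniform as three example choices of the common F; helper names hypothetical). It simulates R under H0 and tabulates its relative frequencies; apart from simulation noise, the three columns should agree.

```python
import random
from collections import Counter


def runs_from_samples(xs, ys):
    """Number of runs in the ordered combined sample (label 0 = X, 1 = Y)."""
    labels = [lab for _, lab in sorted([(v, 0) for v in xs] + [(v, 1) for v in ys])]
    return 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)


def empirical_pmf(sampler, m, n, reps, seed):
    """Relative frequencies of R when both samples come from the same F."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(reps):
        xs = [sampler(rng) for _ in range(m)]
        ys = [sampler(rng) for _ in range(n)]
        counts[runs_from_samples(xs, ys)] += 1
    return {r: c / reps for r, c in counts.items()}


samplers = {  # three example choices of the common distribution F
    "normal": lambda rng: rng.gauss(0.0, 1.0),
    "exponential": lambda rng: rng.expovariate(1.0),
    "uniform": lambda rng: rng.random(),
}
m, n, reps = 9, 6, 20_000
# A different seed per distribution keeps the three simulations independent.
pmfs = {name: empirical_pmf(f, m, n, reps, seed=2010 + i)
        for i, (name, f) in enumerate(samplers.items())}

print("r   " + "  ".join(f"{name:>11}" for name in samplers))
for r in range(2, 2 * min(m, n) + 2):
    print(f"{r:<3} " + "  ".join(f"{pmfs[name].get(r, 0.0):11.3f}" for name in samplers))
```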

Next we want to find the moments of R. For this, mark each letter of the string with 1 if it starts a new run and with 0 otherwise. Call the i-th mark Ui; thus U1 is always 1. The sequence
XYYXXXYYYXXXYXX
is marked as
110100100100110
Clearly, R = ∑ Ui.
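In Python (function name hypothetical), the marks Ui and the identity R = ∑ Ui look like this for the example string:

```python
def run_marks(s):
    """U_i = 1 if the i-th letter starts a new run, else 0.

    List index 0 holds U_1, which is always 1.
    """
    return [1 if i == 0 or s[i] != s[i - 1] else 0 for i in range(len(s))]


s = "XYYXXXYYYXXXYXX"
u = run_marks(s)
print("".join(map(str, u)))  # 110100100100110
print(sum(u))                # 7, the number of runs R
```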

Exercise 7.5: Show that under H0
E(R) = (2mn/(m+n)) + 1
[Hint: Compute E(Ui) and add.]
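For small m and n the result of Exercise 7.5 can be checked exactly (again a check, not a proof): under H0 every one of the C(m+n, m) placements of the X's is equally likely, so E(R) is simply the average of R over all placements. A Python sketch using exact fractions, with hypothetical helper names:

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def arrangements(m, n):
    """Yield every string of m X's and n Y's (as a list); under H0 each of
    the C(m+n, m) strings is equally likely."""
    for x_pos in combinations(range(m + n), m):
        s = ["Y"] * (m + n)
        for p in x_pos:
            s[p] = "X"
        yield s


def runs(s):
    """Number of runs in s."""
    return 1 + sum(1 for a, b in zip(s, s[1:]) if a != b)


def exact_mean_R(m, n):
    """E(R) under H0, by averaging R over all equally likely arrangements."""
    return Fraction(sum(runs(s) for s in arrangements(m, n)), comb(m + n, m))


for m, n in [(1, 1), (2, 3), (4, 4), (9, 6)]:
    assert exact_mean_R(m, n) == Fraction(2 * m * n, m + n) + 1

print(exact_mean_R(9, 6))  # 41/5 = 2*9*6/15 + 1
```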

Exercise 7.6: Take any i, j in {1,...,m+n} with 1 < i < j-1. Show that under H0
$E(U_i U_j) = \dfrac{4\binom{m+n-4}{m-2}}{\binom{m+n}{n}}$
[Hint: Look at the (i-1)-th, i-th, (j-1)-th and j-th locations. There are four cases that make Ui and Uj both 1:
XYXY, XYYX, YXYX, YXXY.
Compute the probabilities of these and add.]
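The formula of Exercise 7.6 can be checked the same way for small m and n. The sketch below (hypothetical helper names; positions i and j are 1-based, as in the text) enumerates all arrangements and compares the exact E(UiUj) with 4 C(m+n-4, m-2) / C(m+n, n) for every pair with 1 < i < j-1.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def arrangements(m, n):
    """Yield every string of m X's and n Y's; all are equally likely under H0."""
    for x_pos in combinations(range(m + n), m):
        s = ["Y"] * (m + n)
        for p in x_pos:
            s[p] = "X"
        yield s


def run_marks(s):
    """U_i = 1 if letter i starts a new run (list index 0 holds U_1)."""
    return [1 if i == 0 or s[i] != s[i - 1] else 0 for i in range(len(s))]


def exact_E_UiUj(m, n, i, j):
    """E(U_i U_j) under H0 by enumeration; i and j are 1-based positions."""
    total = sum(u[i - 1] * u[j - 1] for u in map(run_marks, arrangements(m, n)))
    return Fraction(total, comb(m + n, m))


m, n = 4, 3
formula = Fraction(4 * comb(m + n - 4, m - 2), comb(m + n, n))
for i in range(2, m + n + 1):          # 1 < i
    for j in range(i + 2, m + n + 1):  # i < j - 1
        assert exact_E_UiUj(m, n, i, j) == formula

print(formula)  # 12/35
```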

Exercise 7.7: Show that under H0
$\mathrm{Var}(R) = \dfrac{2mn(2mn - m - n)}{(m+n)^2 (m+n-1)}$
[Hint: You need to find E(UiUj) for all i < j. One case has already been done in the last exercise.]
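Likewise, the variance formula of Exercise 7.7 can be checked exactly for small m and n by computing E(R^2) - (E(R))^2 over all equally likely arrangements. A Python sketch with hypothetical helper names:

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def arrangements(m, n):
    """Yield every string of m X's and n Y's; all are equally likely under H0."""
    for x_pos in combinations(range(m + n), m):
        s = ["Y"] * (m + n)
        for p in x_pos:
            s[p] = "X"
        yield s


def runs(s):
    """Number of runs in s."""
    return 1 + sum(1 for a, b in zip(s, s[1:]) if a != b)


def exact_var_R(m, n):
    """Var(R) = E(R^2) - E(R)^2 under H0, by exact enumeration."""
    N = comb(m + n, m)
    rs = [runs(s) for s in arrangements(m, n)]
    mean = Fraction(sum(rs), N)
    return Fraction(sum(r * r for r in rs), N) - mean ** 2


def formula_var_R(m, n):
    """The closed form stated in Exercise 7.7."""
    return Fraction(2 * m * n * (2 * m * n - m - n), (m + n) ** 2 * (m + n - 1))


for m, n in [(1, 1), (2, 2), (3, 5), (9, 6)]:
    assert exact_var_R(m, n) == formula_var_R(m, n)

print(exact_var_R(9, 6))  # 558/175
```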

Theorem: As min(m,n) → ∞, we have under H0
(R-E(R))/sqrt(Var(R)) ~ AN(0,1).
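For large samples the theorem suggests the approximate lower-tail p-value Φ(z), where z = (r - E(R))/sqrt(Var(R)) uses the formulas of Exercises 7.5 and 7.7. A Python sketch (helper names hypothetical; Φ is computed from `math.erf`). The theorem is asymptotic, so for small samples the approximation can be rough: for the example string (m = 9, n = 6, R = 7) it gives about 0.25, while the exact value from Exercises 7.3 and 7.4 is 49/143, about 0.343.

```python
from math import erf, sqrt


def runs_z(r, m, n):
    """Standardized number of runs, using E(R) and Var(R) under H0."""
    mean = 2 * m * n / (m + n) + 1
    var = 2 * m * n * (2 * m * n - m - n) / ((m + n) ** 2 * (m + n - 1))
    return (r - mean) / sqrt(var)


def std_normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


# H0 is rejected for small R, so the approximate p-value is the lower tail.
z = runs_z(7, 9, 6)
p_approx = std_normal_cdf(z)
print(round(z, 3), round(p_approx, 3))
```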

Proof: Not to be done in this course.


© Arnab Chakraborty (2010)