Arnab Chakraborty's classical nonparametric statistics notes

cwave.eu5.org
Also see: http://www.angelfire.com/dragon/letstry
cwave04 at yahoo dot com


Last updated on Fri May 21 11:52:17 IST 2010.


Two sample inference

Introduction

Example: Suppose that we want to test the efficacy of a sleeping pill. A simple way to do this would be to administer the drug to a sample of insomnia patients and then to measure the increase in the amounts of their sleep. However, it is usually not enough to measure just this increase. Some patients are so fond of medicines that they "feel better" whenever they think they have taken a medicine. Even giving them a lump of sugar in the form of a pill would make them "feel better". This is called the placebo effect, the sugar pill being called a placebo.
So in order to test the efficacy of a sleeping pill, it is not enough to check whether it increases sleep; we must also check whether it outperforms the placebo. So we perform the following blind experiment.
Take m+n insomnia patients. They should represent the population as much as possible with respect to characteristics that might affect the performance of the drug (e.g., gender, age etc). Randomly split this sample into a control and a treatment group of sizes m and n, respectively. Administer the placebo to the control group, and the pill to the treatment group. Then measure the increase in the amount of sleep of each of the m+n patients. Let
X₁,...,X_m
be the measurements for the control group, and let
Y₁,...,Y_n
be the corresponding measurements for the treatment group. We assume that the X's are iid with some unknown continuous distribution F, and the Y's are iid with some unknown continuous distribution G. We want to check if F and G are different.
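The random split into the two groups can be sketched in Python as follows. This is only an illustration: the helper name `random_split`, the patient ids and the seed are assumptions made here, not part of the experiment described above.

```python
import random

def random_split(patients, m, seed=None):
    """Randomly split `patients` into a control group of size m and a
    treatment group containing the rest (hypothetical helper)."""
    rng = random.Random(seed)      # seed only makes the illustration repeatable
    shuffled = list(patients)      # copy, so the caller's sequence is untouched
    rng.shuffle(shuffled)
    return shuffled[:m], shuffled[m:]

# Example with m = 4 controls and n = 3 treated patients (made-up ids 1..7).
control, treatment = random_split(range(1, 8), m=4, seed=0)
```

Every patient lands in exactly one group, and each subset of size m is equally likely to be the control group.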

Digression:

A couple of points

Why not take m=n? Usually such an experiment involves a follow-up study, e.g. a request like
"Please come back after a month of medication and report your amount of sleep."
is made to each patient. Some patients may not honour this request, and just drop out of the experiment. So the final sample sizes may differ for the two groups.
Why is the experiment called a "blind" experiment? Because we do not want the patients to know which group they have been put in (otherwise the control patients would not experience the placebo effect). Sometimes even the investigators are not told which patients belong to which group. This is called a "double blind" experiment.

Wald-Wolfowitz Run Test

We have the set up as described above. We want to test

H₀: F = G Vs. H₁: F ≠ G

This test is only for two-sided alternatives. Here is the test procedure:

First combine the two samples and order the (m+n) numbers. Then write the combined, ordered sample as a string of X's and Y's, e.g.,

X Y Y X X X Y Y Y X X X Y X X

We expect that under H₀ the X's and the Y's would be "well mixed" in this string. To check this we count the number, R, of runs, where a run is a maximal stretch of the same letter. In this example R = 7, the runs being

X, YY, XXX, YYY, XXX, Y and XX

This R is our test statistic. We shall reject H₀ if R is "small". To quantify how small is "small" we need to find the null distribution of R.
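The computation of R can be sketched in Python as below. The function names `label_string` and `count_runs` are chosen here for illustration, and the sketch assumes there are no ties in the combined sample (which holds with probability 1 when F and G are continuous).

```python
def label_string(x, y):
    """Combine the two samples, sort them, and return the string of X's and Y's.
    Assumes no ties between the observations."""
    tagged = [(v, "X") for v in x] + [(v, "Y") for v in y]
    return "".join(label for _, label in sorted(tagged))

def count_runs(labels):
    """Number of maximal stretches of the same letter in `labels`."""
    if not labels:
        return 0
    # The first letter starts a run; every change of letter starts another.
    return 1 + sum(a != b for a, b in zip(labels, labels[1:]))

r = count_runs("XYYXXXYYYXXXYXX")   # the example string from the text, R = 7
```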

Exercise 7.1: What are the possible values that R can take?

Exercise 7.2: Show that the number of ways in which a positive integer N can be split into K strictly positive parts (the order of the parts matters) is
$$\binom{N-1}{K-1}.$$
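Before attempting a proof, the count in Exercise 7.2 can be checked numerically for small N and K. The following brute-force recursion is only a sanity check, not a solution; the name `count_splits` is made up here, and the parts are treated as ordered.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def count_splits(N, K):
    """Number of ordered ways to write N as a sum of K strictly positive integers."""
    if K == 0:
        return 1 if N == 0 else 0
    # Choose the first part, then split what is left into K-1 parts.
    return sum(count_splits(N - first, K - 1) for first in range(1, N + 1))

# Compare with the claimed binomial coefficient for all small cases.
agrees = all(count_splits(N, K) == comb(N - 1, K - 1)
             for N in range(1, 10) for K in range(1, N + 1))
```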

Exercise 7.3: Let 2k+1 be a possible value for R. Show that under H₀
$$P(R=2k+1) = \frac{\binom{m-1}{k-1}\binom{n-1}{k} + \binom{m-1}{k}\binom{n-1}{k-1}}{\binom{m+n}{m}}$$

Exercise 7.4: If 2k is a possible value for R then show that under H₀
$$P(R=2k) = \frac{2\binom{m-1}{k-1}\binom{n-1}{k-1}}{\binom{m+n}{m}}$$

Thus, observe that R is distribution-free under H₀, i.e., its distribution does not involve F.
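Since the null distribution depends only on m and n, it can be tabulated once and for all. Here is a minimal sketch that evaluates the two formulas from Exercises 7.3 and 7.4 with exact fractions. The names `run_pmf` and `lower_tail` are introduced here, and the sketch assumes m, n ≥ 1.

```python
from fractions import Fraction
from math import comb

def run_pmf(m, n):
    """Exact null pmf of R as a dict {r: P(R = r)}, for m, n >= 1."""
    total = comb(m + n, m)
    pmf = {}
    for r in range(2, m + n + 1):
        k, odd = divmod(r, 2)
        if odd:   # r = 2k+1
            ways = (comb(m - 1, k - 1) * comb(n - 1, k)
                    + comb(m - 1, k) * comb(n - 1, k - 1))
        else:     # r = 2k
            ways = 2 * comb(m - 1, k - 1) * comb(n - 1, k - 1)
        if ways:  # keep only the values that R can actually take
            pmf[r] = Fraction(ways, total)
    return pmf

def lower_tail(r_obs, m, n):
    """P(R <= r_obs) under H0, an exact p-value since we reject for small R."""
    return sum(p for r, p in run_pmf(m, n).items() if r <= r_obs)

pmf_2_2 = run_pmf(2, 2)   # each of the values 2, 3, 4 has probability 1/3
```

For the example string (m = 9, n = 6, R = 7) the exact p-value would be `lower_tail(7, 9, 6)`.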

Next we want to find the moments of R. For this, mark each X or Y with a 1 if it is the start of a new run and with a 0 otherwise. Call the i-th mark U_i. Thus, U₁ is always 1. The sequence

XYYXXXYYYXXXYXX

is marked as

110100100100110

Clearly, R = ∑ U_i.
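The marking is easy to reproduce in code. In this short sketch the function name `run_marks` is made up for illustration.

```python
def run_marks(labels):
    """Return the list of marks U_1, ..., U_N for a string of X's and Y's.
    A mark is 1 at the first position and wherever the letter changes."""
    return [1 if i == 0 or labels[i] != labels[i - 1] else 0
            for i in range(len(labels))]

marks = run_marks("XYYXXXYYYXXXYXX")
as_string = "".join(str(u) for u in marks)   # "110100100100110", as in the text
r = sum(marks)                               # R = 7
```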

Exercise 7.5: Show that under H₀
E(R) = (2mn/(m+n)) + 1
[Hint: Compute E(U_i) and add.]
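The formula for E(R) can be checked for small m and n by brute force, using the fact that under H₀ all arrangements of m X's and n Y's are equally likely. This is a sanity check only, not a proof; the name `exact_mean_runs` is an assumption of this sketch.

```python
from fractions import Fraction
from itertools import combinations

def exact_mean_runs(m, n):
    """E(R) under H0, by enumerating every placement of the m X's."""
    N = m + n
    total_runs = 0
    count = 0
    for x_pos in combinations(range(N), m):
        xs = set(x_pos)
        labels = ["X" if i in xs else "Y" for i in range(N)]
        total_runs += 1 + sum(a != b for a, b in zip(labels, labels[1:]))
        count += 1
    return Fraction(total_runs, count)

mean_4_3 = exact_mean_runs(4, 3)
claimed_4_3 = Fraction(2 * 4 * 3, 4 + 3) + 1   # 2mn/(m+n) + 1
```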

Exercise 7.6: Take any 1 < i < j-1 in {1,...,m+n}. Show that under H₀
$$E(U_i U_j) = \frac{4\binom{m+n-4}{m-2}}{\binom{m+n}{n}}$$
[Hint: Look at the (i-1)-th, i-th, (j-1)-th and j-th locations. There are four cases that will make U_i and U_j both 1:
XYXY, XYYX, YXYX, YXXY.
Compute the probabilities of these and add.]
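The value of E(U_iU_j) in Exercise 7.6 can also be checked by enumeration for a small case. In the sketch below positions are numbered 1,...,m+n as in the text, the name `exact_E_UiUj` is hypothetical, and we assume m, n ≥ 2 and 1 < i < j-1.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def exact_E_UiUj(m, n, i, j):
    """E(U_i U_j) under H0 by enumeration, for 1 < i < j-1 <= m+n-1."""
    N = m + n
    hits = 0
    count = 0
    for x_pos in combinations(range(1, N + 1), m):
        xs = set(x_pos)
        u_i = (i in xs) != (i - 1 in xs)   # letter changes at position i
        u_j = (j in xs) != (j - 1 in xs)   # letter changes at position j
        hits += u_i and u_j
        count += 1
    return Fraction(hits, count)

m, n = 4, 3
enumerated = exact_E_UiUj(m, n, i=2, j=4)
claimed = Fraction(4 * comb(m + n - 4, m - 2), comb(m + n, n))
```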

Exercise 7.7: Show that under H₀
Var(R) = 2mn(2mn-m-n)/((m+n)²(m+n-1))
[Hint: You need to find E(U_iU_j) for all i < j. One case has already been done in the last exercise.]
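As with the mean, the variance formula can be checked against a full enumeration for small sample sizes. The sketch below is a numerical check under H₀ only; `exact_var_runs` and `claimed_var` are names invented here.

```python
from fractions import Fraction
from itertools import combinations

def exact_var_runs(m, n):
    """Var(R) under H0, from the first two moments over all arrangements."""
    N = m + n
    s1 = s2 = count = 0
    for x_pos in combinations(range(N), m):
        xs = set(x_pos)
        labels = [i in xs for i in range(N)]          # True for X, False for Y
        r = 1 + sum(a != b for a, b in zip(labels, labels[1:]))
        s1 += r
        s2 += r * r
        count += 1
    mean = Fraction(s1, count)
    return Fraction(s2, count) - mean ** 2

def claimed_var(m, n):
    """The closed form 2mn(2mn-m-n) / ((m+n)^2 (m+n-1))."""
    N = m + n
    return Fraction(2 * m * n * (2 * m * n - N), N ** 2 * (N - 1))

var_2_2 = exact_var_runs(2, 2)   # equals 2/3
```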

Theorem: As min(m,n) → ∞, we have under H₀

(R-E(R))/sqrt(Var(R)) ~ AN(0,1).

Proof: Not to be done in this course.
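For large samples the theorem suggests the following approximate test, sketched here in Python. The function name `runs_test_normal` is an assumption, no continuity correction is applied, and the p-value is the lower tail because we reject H₀ for small R.

```python
from math import sqrt
from statistics import NormalDist

def runs_test_normal(r, m, n):
    """Standardised run count and approximate lower-tail p-value under H0."""
    N = m + n
    mean = 2 * m * n / N + 1
    var = 2 * m * n * (2 * m * n - N) / (N ** 2 * (N - 1))
    z = (r - mean) / sqrt(var)
    return z, NormalDist().cdf(z)   # P(Z <= z) for a standard normal Z

# The example string from the text has m = 9 X's, n = 6 Y's and R = 7.
z, p = runs_test_normal(7, 9, 6)
```

With samples as small as these the exact null distribution is preferable; the normal approximation is meant for the case where both m and n are large.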