cwave.eu5.org
Also see: http://www.angelfire.com/dragon/letstry
cwave04 at yahoo dot com

Last updated on Fri May 21 11:52:15 IST 2010.

One sample Problem

Introduction

So far we have seen the paired sample problem. However, it was effectively a one sample problem, since we always worked with the differences Z1,...,Zn. We now restate our findings in the one sample problem set up.

In the one sample problem set up we have a single sample
Z1,...,Zn
where the Zi's are independent, but not necessarily identically distributed. Each Zi is a continuous variable. They have a common median θ, which is unknown. We want to test
H0: θ = 0 vs H1: θ > 0
We have already discussed two test procedures for this.
  • Sign test: uses only the signs of Z's.
  • Wilcoxon's signed rank test: uses both the signs of Z's as well as the ranks of the |Z|'s. (We need an extra symmetry assumption for this.)
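As a reminder of how the two statistics are computed, here is a minimal Python sketch. The function names are our own, and the code assumes there are no ties among the |Z|'s and no zeros among the Z's, which holds with probability one for continuous data.

```python
def sign_statistic(z):
    """Sign test statistic: the number of strictly positive observations."""
    return sum(1 for v in z if v > 0)


def signed_rank_statistic(z):
    """Wilcoxon signed rank statistic: sum of the ranks of |z_i| over z_i > 0.

    Rank 1 goes to the smallest |z_i|. Assumes no ties and no zeros.
    """
    order = sorted(range(len(z)), key=lambda i: abs(z[i]))
    return sum(rank for rank, i in enumerate(order, start=1) if z[i] > 0)


z = [1.2, -0.4, 2.5, -3.1, 0.7]
print(sign_statistic(z))         # 3 positive values
print(signed_rank_statistic(z))  # ranks 2 + 3 + 4 = 9
```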

Estimation

Suppose we are interested in estimating θ. The first estimator that comes to mind is the sample median. It is defined as follows. Order the Z's as
Z(1) ≤ ... ≤ Z(n).
Then define the sample median θ̂ as
θ̂ = Z((n+1)/2) if n is odd,
θ̂ = (Z(n/2)+Z(n/2+1))/2 if n is even.
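The definition translates directly into code. The following is a small Python sketch (the function name is ours); the only subtlety is that the order statistics are indexed from 1 while Python lists are indexed from 0.

```python
def sample_median(z):
    """Sample median via order statistics, following the definition above."""
    s = sorted(z)                 # s[0] <= ... <= s[n-1], i.e. Z(1) <= ... <= Z(n)
    n = len(s)
    if n % 2 == 1:
        return s[(n + 1) // 2 - 1]            # Z((n+1)/2)
    return (s[n // 2 - 1] + s[n // 2]) / 2    # (Z(n/2) + Z(n/2+1)) / 2


print(sample_median([3, 1, 2]))     # 2
print(sample_median([4, 1, 3, 2]))  # 2.5
```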

Theorem Suppose that Z1,...,Zn are iid with some common continuous distribution. Then the sample median θ̂ is median unbiased for θ, i.e.,
P(θ̂ < θ) = P(θ̂ > θ).
In other words, θ is the median of θ̂.

Proof: We shall only do the proof when n is odd. The case of even n is similar but notationally more involved. Define
Ui = 1 if Zi > θ,
     0 otherwise.
Then the Ui's are iid Bern(1/2). The U-vector is distributed uniformly over the set A, which consists of all 2^n possible vectors of 0's and 1's.

Exercise 4.1: Suppose that n=2k+1. Let B ⊆ A consist of all the vectors with at least k+1 0's. Similarly define C ⊆ A consisting of all vectors with at least k+1 1's. Show that
A = B ∪ C and B ∩ C = ∅.
Next, show that
P(U-vector in B) = P(U-vector in C)
Hence conclude the theorem for odd n.
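Before attempting the proof, it may help to verify the claims of this exercise by brute force for a small n. The sketch below (our own illustration, with k=2 chosen arbitrarily) enumerates all 2^n vectors; it is a sanity check, not a proof.

```python
from itertools import product

k = 2
n = 2 * k + 1

# A: all 0/1 vectors of length n; the U-vector is uniform over A.
A = set(product((0, 1), repeat=n))
B = {u for u in A if u.count(0) >= k + 1}   # at least k+1 zeros
C = {u for u in A if u.count(1) >= k + 1}   # at least k+1 ones

print(B | C == A)          # B and C together cover A
print(len(B & C) == 0)     # B and C are disjoint
print(len(B), len(C))      # equal sizes, hence equal probabilities under uniformity
```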

For even n, you have to first work with n=2. Then for n=2k, split A into 3 parts: those with at least k+1 1's, those with at least k+1 0's, and those with exactly k 0's and k 1's. The first two parts may be dealt with as in the odd case. The last part needs to be split further into two parts using the n=2 argument. We shall not go into the details in this course.

Theorem (Bahadur's representation) Let Z1,...,Zn be iid with some common continuous density, f. Let θ be its median. Assume that f(θ) > 0. Let θ̂ denote the sample median based on the sample. Then
θ̂ = θ + (0.5-Fn(θ))/f(θ) + Rn,
for some Rn where
sqrt(n)*Rn goes to zero as n goes to infinity.
Here Fn denotes the empirical distribution function of the Zi's.

Proof: Not to be done in this course.

Bahadur's representation has more than one form depending on the assumptions made on the distribution of Z's and the order of Rn.
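The representation can be explored numerically. The following Python sketch assumes N(0,1) data (so θ = 0 and f(θ) = 1/sqrt(2π)); this choice, the seed, and the function name are ours. It computes Rn as the difference between the sample median and θ + (0.5-Fn(θ))/f(θ), and prints sqrt(n)*Rn for a few sample sizes. One run of a simulation proves nothing, but the printed values should typically shrink as n grows.

```python
import math
import random
import statistics


def bahadur_remainder(z, theta, f_theta):
    """Return Rn = (sample median) - theta - (0.5 - Fn(theta)) / f(theta)."""
    n = len(z)
    fn_theta = sum(1 for v in z if v <= theta) / n   # empirical cdf at theta
    return statistics.median(z) - theta - (0.5 - fn_theta) / f_theta


rng = random.Random(2010)                 # arbitrary seed, for reproducibility
theta = 0.0                               # median of N(0,1)
f_theta = 1 / math.sqrt(2 * math.pi)      # N(0,1) density at its median

scaled = {}
for n in (101, 1001, 10001):
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    scaled[n] = math.sqrt(n) * bahadur_remainder(z, theta, f_theta)
    print(n, scaled[n])
```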

Exercise 4.2: Use the above theorem to show that the sample median has an asymptotic normal distribution. Find the mean and variance of this normal distribution.
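One possible line of attack is sketched below, writing \hat\theta for the sample median. It assumes that sqrt(n)*Rn goes to zero in probability (the theorem above does not specify the mode of convergence), and it uses the central limit theorem together with Slutsky's theorem. The details are left to the reader.

```latex
\begin{align*}
\sqrt{n}\,(\hat\theta - \theta)
  &= \frac{\sqrt{n}\,\bigl(0.5 - F_n(\theta)\bigr)}{f(\theta)} + \sqrt{n}\,R_n, \\
n F_n(\theta) \sim \mathrm{Bin}(n, 1/2)
  &\;\Longrightarrow\;
  \sqrt{n}\,\bigl(0.5 - F_n(\theta)\bigr) \xrightarrow{d} N(0,\, 1/4), \\
\sqrt{n}\,(\hat\theta - \theta)
  &\xrightarrow{d} N\!\left(0,\; \frac{1}{4 f(\theta)^2}\right).
\end{align*}
```

Under these assumptions, for large n the sample median is approximately normal with mean θ and variance 1/(4n f(θ)^2).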

Hodges-Lehmann approach

Here is another method of estimation. This is more general in the sense that we do not require the Zi's to be identically distributed. This approach, called the Hodges-Lehmann (HL) approach, is as follows.

Once again consider Wilcoxon's signed rank test. Let us denote its test statistic by
T(Z1,...,Zn)
Define the function
f(x) = T(Z1-x,...,Zn-x)
Note that f(θ) has mean n(n+1)/4. We can interpret this n(n+1)/4 as the ideal value for f(θ). The HL approach suggests estimating θ by a value HL such that f(HL) is as close to n(n+1)/4 as possible.

Exercise 4.3: Show that f(x) = #{(Zi+Zj)/2 > x : i ≤ j }.
Let hi = Zi-x. Define
aik = I{|hi| ≥ |hk|} I{hi > 0}
Show that f(x) = ∑ i,k aik. Check that, for i ≠ k,
{aik+aki = 1} iff {(hi+hk) > 0}
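The identity in this exercise can be checked numerically before proving it. The sketch below (function names, seed, and sample are our own choices) computes f(x) in two ways: as the signed rank statistic of the shifted data, and as the number of averages (Zi+Zj)/2, i ≤ j, that exceed x. It assumes no ties, as is the case with probability one for continuous data.

```python
import random


def signed_rank_statistic(z):
    """Sum of the ranks of |z_i| over those i with z_i > 0 (no ties assumed)."""
    order = sorted(range(len(z)), key=lambda i: abs(z[i]))
    return sum(rank for rank, i in enumerate(order, start=1) if z[i] > 0)


def pairwise_averages(z):
    """All (Zi + Zj)/2 with i <= j; there are n(n+1)/2 of them."""
    n = len(z)
    return [(z[i] + z[j]) / 2 for i in range(n) for j in range(i, n)]


def f_via_ranks(z, x):
    """f(x) = T(Z1 - x, ..., Zn - x)."""
    return signed_rank_statistic([v - x for v in z])


def f_via_averages(z, x):
    """f(x) = #{(Zi + Zj)/2 > x : i <= j}."""
    return sum(1 for a in pairwise_averages(z) if a > x)


rng = random.Random(0)
z = [rng.gauss(0.3, 1.0) for _ in range(15)]
for x in (-1.0, 0.0, 0.3, 1.0):
    print(x, f_via_ranks(z, x), f_via_averages(z, x))   # the two counts agree
```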

Exercise 4.4: Show that HL is the median of the set {(Zi+Zj)/2 : i ≤ j} from the previous exercise.
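Taking the claim of this exercise as given, HL is easy to compute. Here is a minimal Python sketch (the function name is ours). It forms all n(n+1)/2 averages explicitly, which is fine for moderate n but wasteful for very large samples.

```python
import statistics


def hodges_lehmann(z):
    """HL estimate: the median of all (Zi + Zj)/2 with i <= j."""
    n = len(z)
    averages = [(z[i] + z[j]) / 2 for i in range(n) for j in range(i, n)]
    return statistics.median(averages)


# Averages for [1, 2, 6]: 1, 1.5, 3.5, 2, 4, 6. Their median is (2 + 3.5)/2.
print(hodges_lehmann([1, 2, 6]))  # 2.75
```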

Exercise 4.5: For symmetric distributions HL is claimed to outperform the sample median. Do a simple simulation to check this as follows. Generate 1000 samples each of size 100 from N(0,1). Compute sample median as well as HL for each of the 1000 samples. Estimate bias, variance and MSE, and compare. Also, plot the two histograms.
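One way to organise this simulation is sketched below in Python, using only the standard library. The function names and the seed are our own choices, and HL is computed as the median of the averages (Zi+Zj)/2, i ≤ j, as in Exercise 4.4. The histograms are left out; the two lists of estimates built inside simulate can be passed to any plotting tool.

```python
import random
import statistics


def hodges_lehmann(z):
    """HL estimate: the median of all (Zi + Zj)/2 with i <= j."""
    n = len(z)
    return statistics.median((z[i] + z[j]) / 2 for i in range(n) for j in range(i, n))


def summarize(estimates, truth):
    """Estimated bias, variance and MSE of an estimator from its simulated values."""
    bias = statistics.fmean(estimates) - truth
    variance = statistics.pvariance(estimates)
    return {"bias": bias, "variance": variance, "mse": variance + bias ** 2}


def simulate(reps=1000, n=100, seed=0):
    """Draw reps samples of size n from N(0,1); summarize median and HL."""
    rng = random.Random(seed)
    medians, hls = [], []
    for _ in range(reps):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        medians.append(statistics.median(z))
        hls.append(hodges_lehmann(z))
    return summarize(medians, 0.0), summarize(hls, 0.0)


median_summary, hl_summary = simulate()
print("median:", median_summary)
print("HL:    ", hl_summary)
```

The run takes a few seconds in pure Python, since each of the 1000 samples produces 5050 averages.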

The same approach may also be used to obtain confidence intervals. For this, order the m = n(n+1)/2 numbers
(Zi+Zj)/2 for i ≤ j,
as
A1 ≤ ... ≤ Am.
Then
f(x) = #{j : Aj > x}

Exercise 4.6: Show that f(θ) is distribution-free, i.e., the distribution of f(θ) does not depend on the distribution of the Z's (not even on the value of θ).

So we can find an integer k such that
P(k ≤ f(θ) ≤ m-k) = 0.95.
[Since f(θ) is a discrete random variable, the equality may not be exactly achievable.]

Exercise 4.7: Show that for this k we have
P(A_k ≤ θ ≤ A_{m-k+1}) = 0.95.

[Hint: {f(θ) ≥ k} iff #{Ai > θ} ≥ k. In this case, A_{m-k+1} is guaranteed to be above θ.]
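The whole construction can be sketched in Python as follows. The function names are ours. The sketch assumes that f(θ) has the null distribution of the signed rank statistic, namely that of ∑ i*Bi with B1,...,Bn iid Bern(1/2), which requires the symmetry assumption. Since the level 0.95 may not be exactly achievable, the code takes the largest k whose coverage is at least 0.95; that is one possible convention, not the only one.

```python
def signed_rank_null_counts(n):
    """counts[t] = number of subsets of {1,...,n} with sum t (out of 2^n)."""
    m = n * (n + 1) // 2
    counts = [0] * (m + 1)
    counts[0] = 1
    for i in range(1, n + 1):
        for t in range(m, i - 1, -1):
            counts[t] += counts[t - i]
    return counts


def choose_k(n, level=0.95):
    """Largest k with P(k <= f(theta) <= m-k) >= level, or None if none exists."""
    counts = signed_rank_null_counts(n)
    m = n * (n + 1) // 2
    total = 2 ** n
    best = None
    k = 1
    below = counts[0]                 # number of outcomes with f(theta) <= k-1
    # By symmetry, P(k <= f(theta) <= m-k) = 1 - 2*P(f(theta) <= k-1).
    while 2 * k <= m and total - 2 * below >= level * total:
        best = k
        below += counts[k]
        k += 1
    return best


def hl_confidence_interval(z, level=0.95):
    """Return (A_k, A_{m-k+1}) from the ordered averages (Zi + Zj)/2, i <= j."""
    n = len(z)
    k = choose_k(n, level)
    if k is None:
        raise ValueError("sample too small for the requested level")
    a = sorted((z[i] + z[j]) / 2 for i in range(n) for j in range(i, n))
    m = len(a)
    return a[k - 1], a[m - k]         # 1-based A_k and A_{m-k+1}


print(choose_k(10))                                            # 9
print(hl_confidence_interval([0.5, -1.0, 2.0, 1.5, 0.3, 0.9]))  # (-1.0, 2.0)
```

For n=6 the only admissible k is 1, so the interval runs from the smallest to the largest observation; larger samples give shorter intervals.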

Exercise 4.8: Apply the HL approach to the sign test to get another estimator of θ.
[Hint: Define f(x) as the sign test statistic computed based on data shifted by x. Then choose x so that f(x) is as close to Ef(θ) as possible.]


© Arnab Chakraborty (2010)