So far we have seen the paired sample problem. However, it was effectively
a one sample problem, since we always worked with the differences
Z1,...,Zn. We now restate our findings in the one
sample set up.
In the one sample set up we have a single sample
Z1,...,Zn
where the Zi's are
independent, but not necessarily identically distributed. Each
Zi is a continuous random variable. They have
a common median θ, which is unknown. We want to test
H0: θ = 0
vs.
H1: θ > 0
We have already discussed two test procedures for this.
Sign test: uses only the signs of Z's.
Wilcoxon's signed rank test:
uses both the signs of Z's as well as the ranks of the |Z|'s.
(We need an extra symmetry assumption for this.)
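The two statistics can be sketched in a few lines of Python. This is only an illustration: the function names and the data are made up, and the code assumes continuous data, so no Zi is zero and no two |Zi|'s are tied.

```python
def sign_statistic(z):
    """Sign test statistic: the number of strictly positive observations."""
    return sum(1 for zi in z if zi > 0)


def signed_rank_statistic(z):
    """Wilcoxon signed rank statistic: sum of the ranks of |z_i| over positive z_i.

    Assumes no zeros and no ties among the |z_i| (continuous data).
    """
    # order[r] is the index of the observation whose |z| has rank r + 1.
    order = sorted(range(len(z)), key=lambda i: abs(z[i]))
    return sum(r + 1 for r, i in enumerate(order) if z[i] > 0)


z = [1.2, -0.4, 2.5, -3.1, 0.7]   # made-up data
print(sign_statistic(z))          # 3 positive values
print(signed_rank_statistic(z))   # ranks of 1.2, 2.5, 0.7 among the |z|'s: 3 + 4 + 2 = 9
```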
Estimation
Suppose we are interested in estimating θ. The first estimator
that comes
to mind is the sample median. It is defined as follows. Order the
Z's as
Z(1) ≤ ... ≤ Z(n).
Then the sample median is defined as
Z((n+1)/2)
if n is odd,
(Z(n/2)+Z(n/2+1))/2
if n is even.
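As a small illustration, the definition translates directly into code. The function name sample_median is our own choice; the only subtlety is that the order statistics Z(1),...,Z(n) are 1-based while Python lists are 0-based.

```python
def sample_median(z):
    """Sample median computed from the order statistics Z(1) <= ... <= Z(n)."""
    s = sorted(z)
    n = len(s)
    if n % 2 == 1:
        # Z((n+1)/2), shifted by one for 0-based indexing.
        return s[(n + 1) // 2 - 1]
    # Average of Z(n/2) and Z(n/2 + 1), again shifted by one.
    return (s[n // 2 - 1] + s[n // 2]) / 2


print(sample_median([3, 1, 2]))      # n odd: the middle value, 2
print(sample_median([4, 1, 3, 2]))   # n even: (2 + 3)/2 = 2.5
```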
Proof:
We shall only do the proof when n=odd. The n=even case is similar
but notationally more involved. Define
Ui=
1
if Zi > θ
0
else
Then the Ui's are iid Bern(1/2). The U-vector is distributed
uniformly over the set A, which consists of all possible 2^n
vectors of 0's and 1's.
Exercise 4.1:
Suppose that n=2k+1. Let B ⊂ A consist of all the vectors with at
least k+1 0's. Similarly define C ⊂ A consisting of all vectors
with at least k+1 1's. Show that
A = B ∪ C and B ∩ C = ∅.
Next, show that
P(U-vector in B) = P(U-vector in C)
Hence conclude the theorem for n=odd.
For even n, you first have to work with n=2. Then for n=2k, split A into 3
parts: those with at least k+1 1's, those with at least k+1 0's, and those
with exactly k 0's and k 1's. The first two parts may be dealt with as in
the odd case. The last part needs to be split further into two parts using
the n=2 argument. We shall not go into the details in this course.
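The counting part of Exercise 4.1 can be checked by brute force for small odd n. The sketch below is not a proof, and the function name is made up; it simply enumerates A for n=2k+1 and confirms that B and C are disjoint, cover A, and have the same size (so they are equally likely under the uniform distribution on A).

```python
from itertools import product


def check_partition(k):
    """Enumerate A for n = 2k + 1 and compare B (>= k+1 zeros) with C (>= k+1 ones)."""
    n = 2 * k + 1
    A = list(product((0, 1), repeat=n))
    B = [u for u in A if u.count(0) >= k + 1]
    C = [u for u in A if u.count(1) >= k + 1]
    disjoint = not (set(B) & set(C))
    covers = (set(B) | set(C)) == set(A)
    return len(A), len(B), len(C), disjoint, covers


print(check_partition(2))   # n = 5: (32, 16, 16, True, True)
```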
Proof:
Not to be done in this course.
Bahadur's representation has more than one form depending on the
assumptions made on the distribution of Z's and the order of
Rn.
Exercise 4.2:
Use the above theorem to show that the sample median has an asymptotic normal
distribution. Find the mean and variance of this normal distribution.
Hodges-Lehmann approach
Here is another method of estimation. This is more general in the sense that we
do not require the Zi's to be identically distributed. This
approach, called the Hodges-Lehmann (HL) approach, is as follows.
Once again consider Wilcoxon's signed rank test. Let us call its test
statistic
T(Z1,...,Zn)
Define the function
f(x) = T(Z1-x,...,Zn-x)
Note that f(θ) has mean n(n+1)/4. We can interpret this n(n+1)/4
as the ideal value for f(θ). The HL approach suggests
estimating θ by a value HL such that
f(HL) is as close to n(n+1)/4 as possible.
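To get a feel for f, here is a minimal sketch that evaluates it on made-up data at a few shifts x. It assumes T is the sum of the ranks of the |Zi-x|'s over those i with Zi-x > 0, and that there are no ties; the function names are our own.

```python
def signed_rank_statistic(z):
    """Sum of the ranks of |z_i| over the positive z_i (no ties assumed)."""
    order = sorted(range(len(z)), key=lambda i: abs(z[i]))
    return sum(r + 1 for r, i in enumerate(order) if z[i] > 0)


def f(z, x):
    """f(x) = T(Z1 - x, ..., Zn - x)."""
    return signed_rank_statistic([zi - x for zi in z])


z = [1.2, -0.4, 2.5, -3.1, 0.7]   # made-up data
n = len(z)
target = n * (n + 1) / 4          # the "ideal value" 7.5 for n = 5
for x in (-4.0, 0.0, 0.5, 1.0, 3.0):
    print(x, f(z, x), target)
```

On this data f takes the values 15, 9, 7, 5, 0 at the five shifts: it steps down from n(n+1)/2 to 0 as x increases, and the HL estimate is a shift at which it crosses the ideal value.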
Exercise 4.3:
Show that f(x) = #{(Zi+Zj)/2 > x : i ≤ j}.
[Hint: Let hi = Zi-x.
Define
aik = I{|hi| ≥ |hk|} I{hi > 0}.
Show that f(x) = ∑i,k aik. Check that, for i ≠ k,
{aik+aki = 1} iff {(hi+hk) > 0},
and that aii = 1 iff hi > 0.]
Exercise 4.4:
Show that HL is the median of the set {(Zi+Zj)/2 : i ≤ j} from Exercise 4.3.
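Taking the claim of Exercise 4.4 for granted, the HL estimate can be computed as follows. This is a sketch on made-up data; the names walsh_averages and hodges_lehmann are our own (the pairwise averages (Zi+Zj)/2 are often called Walsh averages).

```python
from statistics import median


def walsh_averages(z):
    """All n(n+1)/2 pairwise averages (Zi + Zj)/2 with i <= j."""
    n = len(z)
    return [(z[i] + z[j]) / 2 for i in range(n) for j in range(i, n)]


def hodges_lehmann(z):
    """HL estimate, computed as the median of the pairwise averages."""
    return median(walsh_averages(z))


z = [1.2, -0.4, 2.5, -3.1, 0.7]   # made-up data
hl = hodges_lehmann(z)
above = sum(1 for a in walsh_averages(z) if a > hl)
print(hl, above)   # 7 of the 15 averages lie above the estimate; n(n+1)/4 = 7.5
```

The direct computation costs on the order of n^2 operations, which is fine for the sample sizes used in this course.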
Exercise 4.5:
For symmetric distributions HL
is claimed to outperform the sample median. Do a simple
simulation to check this as follows. Generate 1000 samples, each of size
100, from N(0,1). Compute the sample median as well as
HL for each of the 1000 samples. Estimate the bias,
variance and MSE of each estimator, and compare. Also, plot the two histograms.
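One possible skeleton for this simulation is sketched below, using only the Python standard library. It assumes HL is computed as the median of the pairwise averages (Exercise 4.4); the function names, the seed and the returned dictionary layout are arbitrary choices, and the histograms are left to whatever plotting tool you prefer.

```python
import random
from statistics import mean, median, pvariance


def hodges_lehmann(z):
    """Median of the pairwise averages (Zi + Zj)/2, i <= j."""
    n = len(z)
    return median((z[i] + z[j]) / 2 for i in range(n) for j in range(i, n))


def summarize(estimates, theta=0.0):
    """Monte Carlo estimates of bias, variance and MSE around the true theta."""
    bias = mean(estimates) - theta
    var = pvariance(estimates)
    # With the population variance, MSE = variance + bias^2 holds exactly.
    return {"bias": bias, "variance": var, "mse": var + bias ** 2}


def simulate(n_rep=1000, n=100, seed=0):
    """Draw n_rep samples of size n from N(0,1) and summarize both estimators."""
    rng = random.Random(seed)
    medians, hls = [], []
    for _ in range(n_rep):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        medians.append(median(z))
        hls.append(hodges_lehmann(z))
    return {"median": summarize(medians), "HL": summarize(hls)}
```

Calling simulate() runs the experiment at the sizes stated in the exercise; pure Python takes a few seconds for this, so it may help to start with something like simulate(n_rep=100, n=20) while debugging.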
The same approach may also be used to obtain confidence intervals. For
this, order the m = n(n+1)/2 numbers
(Zi+Zj)/2 for i ≤ j,
as
A1 ≤ ... ≤ Am.
Then
f(x) = #{j : Aj > x}
Exercise 4.6:
Show that f(θ) is distribution-free, i.e., the distribution of
f(θ) does not depend on the distribution of the Z's (not even on
the value of θ).
So we can find an integer k such that
P(k ≤ f(θ) ≤ m-k) = 0.95.
[Since f(θ) is a discrete random variable, the equality may not be
exactly achievable.]
Exercise 4.7:
Show that for this k we have
P(Ak ≤ θ ≤ Am-k+1) = 0.95.
[Hint: {f(θ) ≥ k} iff #{Ai > θ} ≥ k.
In this case, Am-k+1 is guaranteed to be above θ.]
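The construction can be sketched numerically as follows. Two assumptions are made here that go beyond the text: f(θ) is taken to have the null distribution of the signed rank statistic, i.e. that of ∑ i Bi with B1,...,Bn iid Bern(1/2) (this uses the symmetry assumption), and, since exact equality is usually unattainable, k is chosen as the largest integer for which the coverage is at least 0.95. The function names are our own.

```python
def signed_rank_null_pmf(n):
    """Exact pmf of sum_{i=1..n} i * B_i with B_i iid Bern(1/2), on 0..n(n+1)/2."""
    m = n * (n + 1) // 2
    counts = [0] * (m + 1)   # counts[t] = number of subsets of {1..n} summing to t
    counts[0] = 1
    for i in range(1, n + 1):
        for t in range(m, i - 1, -1):
            counts[t] += counts[t - i]
    total = 2 ** n
    return [c / total for c in counts]


def hl_confidence_interval(z, level=0.95):
    """Return (A_k, A_{m-k+1}, k, coverage) with P(k <= f(theta) <= m-k) >= level."""
    n = len(z)
    m = n * (n + 1) // 2
    pmf = signed_rank_null_pmf(n)
    k = 0
    # Increase k while the coverage for k + 1 still reaches the level.
    while 2 * (k + 1) <= m and sum(pmf[k + 1 : m - k]) >= level:
        k += 1
    if k == 0:
        raise ValueError("sample too small for the requested level")
    a = sorted((z[i] + z[j]) / 2 for i in range(n) for j in range(i, n))
    coverage = sum(pmf[k : m - k + 1])
    return a[k - 1], a[m - k], k, coverage   # A_k and A_{m-k+1}, 0-based lists


z = [0.8, -0.3, 1.9, 0.4, -1.1, 2.3, 0.6, 1.4, -0.2, 1.0]   # made-up data, n = 10
print(hl_confidence_interval(z))
```

For n=10 this choice gives k=9, with coverage 1 - 50/1024, slightly above 0.95.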
Exercise 4.8:
Apply the HL approach to the sign test to get another estimator of
θ. [Hint: Define f(x) as the sign test statistic computed based on data
shifted by x. Then choose x so that f(x) is as close
to Ef(θ) as possible.]