Many of the statistics that we have dealt with so far in this course are
related to a special family called U-statistics. We discuss general
properties of this family now.
Example:
If we take h(x) = x (so m = 1), then the corresponding U-statistic is the
sample mean.
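The sample-mean example can be checked numerically. The sketch below assumes the usual definition, in which the U-statistic is the average of h over all ordered m-tuples of distinct observations; the function name `u_statistic` and the small dataset are illustrative choices, not part of these notes.

```python
from itertools import permutations
from statistics import mean


def u_statistic(h, m, data):
    """Average of h over all ordered m-tuples of distinct observations.

    Assumed definition: the tuples range over distinct indices, so
    repeated values in `data` are still treated as separate observations.
    """
    values = [h(*(data[i] for i in idx))
              for idx in permutations(range(len(data)), m)]
    return sum(values) / len(values)


# Hypothetical data, chosen only for illustration.
data = [2.0, 3.0, 5.0, 7.0]

# Kernel h(x) = x with m = 1.
u = u_statistic(lambda x: x, 1, data)
sample_mean = mean(data)
print(u, sample_mean)
```

The same function accepts any kernel and any m ≤ n, so it can be used to experiment with the exercises that follow.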
Exercise .1:
Suppose that we take m = 2 and the kernel
h(x,y) = x^2 - xy.
What is the U-statistic? It should be something familiar.
Suppose that X1,...,Xn are iid F, and
θ is some real-valued parameter of interest.
Exercise .2:
If E(h(X1,...,Xm)) = θ, then show that
E(U) = θ as well. Hence conclude that every estimable parameter
θ has an unbiased estimator which is a U-statistic.
Example:
h(x,y,z) = xy + xz + yz is a symmetric function, but
h(x,y) = x^2 y is not. h(x,y) = xy is symmetric, but
h(x,y,z) = xy is not!
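The four kernels above can be tested for symmetry numerically. This is only a spot check at a few points, not a proof; the helper `is_symmetric` and the test points are hypothetical names and values introduced for illustration.

```python
from itertools import permutations


def is_symmetric(h, points, tol=1e-12):
    """Return True if h is unchanged under every reordering of its
    arguments at each of the given test points (a numerical spot check)."""
    for p in points:
        base = h(*p)
        if any(abs(h(*q) - base) > tol for q in permutations(p)):
            return False
    return True


# Hypothetical test points with distinct coordinates.
points2 = [(1.0, 2.0), (-1.0, 0.5)]
points3 = [(1.0, 2.0, 3.0), (-1.0, 0.5, 4.0)]

sym_a = is_symmetric(lambda x, y, z: x * y + x * z + y * z, points3)
sym_b = is_symmetric(lambda x, y: x ** 2 * y, points2)
sym_c = is_symmetric(lambda x, y: x * y, points2)
sym_d = is_symmetric(lambda x, y, z: x * y, points3)
print(sym_a, sym_b, sym_c, sym_d)
```

The last case shows why the number of arguments matters: xy ignores z, so swapping z with x or y changes its value.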
Exercise .3:
If E(h(X1,...,Xm)) = θ, then show that there
is a symmetric function g: R^m → R such that
E(g(X1,...,Xm)) = θ.
Exercise .4:
Show that for any estimable θ there is a U-statistic with a
symmetric kernel that is an unbiased estimator of θ.
Note that if h is symmetric, then the U-statistic based on h is the same as
U = ∑' h(x_{i_1},...,x_{i_m}) / nCm,
where ∑' is over all i_1,...,i_m such that
1 ≤ i_1 < ... < i_m ≤ n.
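The agreement between the two forms can be checked on a small dataset. The sketch below assumes the general U-statistic averages h over all ordered m-tuples of distinct observations, and compares that with the average over increasing index sets; the function names and the data are illustrative only.

```python
from itertools import combinations, permutations
from math import comb, perm


def u_ordered(h, m, data):
    """Average of h over all ordered m-tuples of distinct observations
    (assumed general definition)."""
    total = sum(h(*t) for t in permutations(data, m))
    return total / perm(len(data), m)


def u_increasing(h, m, data):
    """Sum of h over index sets i_1 < ... < i_m, divided by nCm."""
    total = sum(h(*t) for t in combinations(data, m))
    return total / comb(len(data), m)


# Hypothetical data, chosen only for illustration.
data = [1.0, 2.0, 4.0, 8.0]


def sym_kernel(x, y, z):
    # Symmetric kernel from the example: xy + xz + yz.
    return x * y + x * z + y * z


def asym_kernel(x, y):
    # Non-symmetric kernel from the example: x^2 y.
    return x ** 2 * y


sym_ordered = u_ordered(sym_kernel, 3, data)
sym_increasing = u_increasing(sym_kernel, 3, data)
asym_ordered = u_ordered(asym_kernel, 2, data)
asym_increasing = u_increasing(asym_kernel, 2, data)
print(sym_ordered, sym_increasing)    # equal for the symmetric kernel
print(asym_ordered, asym_increasing)  # differ for the non-symmetric kernel
```

The increasing-index form visits only nCm tuples instead of n!/(n-m)!, which is the practical reason for preferring a symmetric kernel.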
Exercise .5:
Obtain an unbiased U-statistic estimator, with a symmetric kernel, for the
third central moment.
First try to do it for products of raw moments.
Exercise .6:
Show that the sign test statistic is a constant multiple of a
U-statistic.
Exercise .7:
Show that the Wilcoxon signed rank statistic is a linear combination of
two U-statistics. In particular, it has the form
n U1 + nC2 U2.
Two sample U-statistics
This is defined in a way similar to the one-sample case above. Here we
have a kernel
h: R^r × R^s → R.
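Based on a two-sample dataset
X1,...,Xm, Y1,...,Yn
(where m ≥ r and n ≥ s) we define the two-sample U-statistic
with kernel h as
U = ∑' h(X_{i_1},...,X_{i_r}, Y_{j_1},...,Y_{j_s}) / (mCr nCs),
where ∑' is over all i_1,...,i_r and j_1,...,j_s such that
1 ≤ i_1 < ... < i_r ≤ m and 1 ≤ j_1 < ... < j_s ≤ n.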
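A direct, brute-force way to compute a two-sample U-statistic is sketched below. It assumes the sum runs over increasing index sets within each sample, which matches the divisor mCr nCs. The function name `two_sample_u`, the kernel h(x,y) = x - y and the data are illustrative choices, not taken from these notes.

```python
from itertools import combinations
from math import comb


def two_sample_u(h, r, s, xs, ys):
    """Sum h over all r-subsets of xs and s-subsets of ys (taken in
    increasing index order), divided by mCr * nCs."""
    total = 0.0
    for x_tuple in combinations(xs, r):
        for y_tuple in combinations(ys, s):
            total += h(*x_tuple, *y_tuple)
    return total / (comb(len(xs), r) * comb(len(ys), s))


# Hypothetical two-sample data, chosen only for illustration.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0]

# With r = s = 1 and h(x, y) = x - y, the U-statistic reduces to the
# difference of the two sample means.
u = two_sample_u(lambda x, y: x - y, 1, 1, xs, ys)
mean_difference = sum(xs) / len(xs) - sum(ys) / len(ys)
print(u, mean_difference)
```

The cost grows as mCr times nCs kernel evaluations, so this form is meant for small examples rather than large datasets.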
Exercise .8:
Consider the parameter
θ = P(X < Y).
Find a kernel that estimates it unbiasedly, and write down the corresponding
U-statistic. Show that it is a multiple of the Mann-Whitney U-test
statistic.