cwave.eu5.org
Also see: www.angelfire.com/dragon/letstry
cwave04 at yahoo dot com

Last updated on Wed May 19 16:12:02 IST 2010.

U-statistics

Introduction

Many of the statistics that we have dealt with so far in this course are related to a special family called U-statistics. We discuss general properties of this family now.

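Definition Let $h: \mathbb{R}^m \to \mathbb{R}$ be some function. Then the U-statistic with kernel $h$, based on a sample
$X_1, \ldots, X_n$ (where $n \ge m$),
is defined as
$U = \sum{}' h(X_{i_1}, \ldots, X_{i_m}) / {}^nP_m$,
where $\sum'$ is over all $(i_1, \ldots, i_m)$ with distinct entries, each between $1$ and $n$,
and ${}^nP_m = n!/(n-m)!$ is the number of such tuples.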

Example: If we take $h(x) = x$ (for $m=1$) then the corresponding U-statistic is the sample mean.
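The definition can be tried out numerically. The following is a minimal sketch, not part of the original notes: the function name `u_statistic` and the toy data are assumptions made for illustration. It averages the kernel over all ordered tuples of distinct indices, exactly as in the definition, and reproduces the sample mean for the kernel h(x) = x.

```python
from itertools import permutations
from math import perm  # perm(n, m) = n! / (n - m)!, i.e. nPm (Python 3.8+)


def u_statistic(kernel, sample, m):
    """Average `kernel` over all ordered m-tuples of distinct indices.

    Hypothetical helper: a brute-force transcription of the definition,
    only sensible for small n and m.
    """
    n = len(sample)
    if n < m:
        raise ValueError("need n >= m")
    total = sum(
        kernel(*(sample[i] for i in idx))
        for idx in permutations(range(n), m)  # distinct indices i1,...,im
    )
    return total / perm(n, m)


data = [2.0, 4.0, 9.0]  # toy sample, made up for the illustration
print(u_statistic(lambda x: x, data, 1))  # agrees with the sample mean of data
```

The same function accepts kernels with m = 2 or more arguments, so it can be used to experiment with the exercises before attempting them by hand.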

Exercise .1: Suppose that we take m=2, and the kernel
$h(x,y) = x^2 - xy$.
What is the U-statistic? It should be something familiar.

Suppose that $X_1, \ldots, X_n$ are iid $F$, and θ is some real-valued parameter of interest.

Definition We call θ (unbiasedly) estimable if there is $m \ge 1$ and a function $h(X_1, \ldots, X_m)$ such that
$E(h(X_1, \ldots, X_m)) = \theta$ for all θ.
The smallest value of m for which there exists such an h is called the degree of θ.

Exercise .2: If $E(h(X_1, \ldots, X_m)) = \theta$, then show that $E(U) = \theta$ as well. Hence conclude that every estimable parameter θ has an unbiased estimator which is a U-statistic.

Definition A function $h: \mathbb{R}^m \to \mathbb{R}$ is called symmetric if for all permutations $p$ of $\{1, \ldots, m\}$ we have
$h(X_1, \ldots, X_m) = h(X_{p(1)}, \ldots, X_{p(m)})$.

Example: $h(x,y,z) = xy+xz+yz$ is a symmetric function. But $h(x,y) = x^2 y$ is not. $h(x,y) = xy$ is symmetric, but $h(x,y,z) = xy$ is not!
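The four kernels in this example can be checked numerically. The sketch below is an illustration only: the helper name `looks_symmetric` and the test points are assumptions, and passing the check at a few points is evidence of symmetry, not a proof. A single failure, however, does show that a kernel is not symmetric.

```python
from itertools import permutations


def looks_symmetric(h, points, tol=1e-12):
    """Return False as soon as permuting the arguments changes h.

    Hypothetical helper: tests only the supplied points, so True means
    "no counterexample found", not "proved symmetric".
    """
    for args in points:
        base = h(*args)
        for p in permutations(args):
            if abs(h(*p) - base) > tol:
                return False
    return True


pts2 = [(1.0, 2.0), (0.5, -1.0)]            # arbitrary test points, m = 2
pts3 = [(1.0, 2.0, 3.0), (0.5, -1.0, 4.0)]  # arbitrary test points, m = 3

print(looks_symmetric(lambda x, y, z: x*y + x*z + y*z, pts3))  # no counterexample
print(looks_symmetric(lambda x, y: x**2 * y, pts2))            # counterexample found
print(looks_symmetric(lambda x, y: x * y, pts2))               # no counterexample
print(looks_symmetric(lambda x, y, z: x * y, pts3))            # counterexample found
```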

Exercise .3: If $E(h(X_1, \ldots, X_m)) = \theta$, then show that there is a symmetric function
$g: \mathbb{R}^m \to \mathbb{R}$
such that $E(g(X_1, \ldots, X_m)) = \theta$.

Exercise .4: Show that for any estimable θ there is a U-statistic with a symmetric kernel that is an unbiased estimator of θ.

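Note that if $h$ is symmetric then the U-statistic based on $h$ is the same as
$U = \sum{}' h(X_{i_1}, \ldots, X_{i_m}) / {}^nC_m$,
where $\sum'$ is now over all $i_1, \ldots, i_m$ such that
$1 \le i_1 < \cdots < i_m \le n$. Each increasing index set accounts for $m!$ equal terms of the original sum, and ${}^nP_m = m! \, {}^nC_m$.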
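The agreement of the two forms for a symmetric kernel can be seen on a small example. This is a minimal sketch with made-up data; the function names `u_ordered` and `u_unordered` are assumptions for the illustration. It also shows that the two averages generally differ when the kernel is not symmetric.

```python
from itertools import combinations, permutations
from math import comb, perm  # nCm and nPm (Python 3.8+)


def u_ordered(h, xs, m):
    """Average h over all ordered m-tuples of distinct positions (divisor nPm)."""
    return sum(h(*t) for t in permutations(xs, m)) / perm(len(xs), m)


def u_unordered(h, xs, m):
    """Average h over increasing index sets i1 < ... < im (divisor nCm)."""
    return sum(h(*t) for t in combinations(xs, m)) / comb(len(xs), m)


xs = [1.0, 2.0, 3.0, 5.0]  # toy sample, made up for the illustration


def sym(x, y, z):
    return x*y + x*z + y*z  # symmetric kernel from the example


def nonsym(x, y):
    return x**2 * y  # non-symmetric kernel from the example


print(u_ordered(sym, xs, 3), u_unordered(sym, xs, 3))        # the two agree
print(u_ordered(nonsym, xs, 2), u_unordered(nonsym, xs, 2))  # the two differ
```

The unordered form visits m! times fewer tuples, which is the practical reason to symmetrize a kernel before computing.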

Exercise .5: Obtain an unbiased U-statistic estimator, with a symmetric kernel, for the 3rd central moment.
Hint: first try to do it for products of raw moments.

Exercise .6: Show that the sign test statistic is a constant multiple of a U-statistic.

Exercise .7: Show that the Wilcoxon signed rank statistic is a linear combination of two U-statistics. In particular, it has the form
$n U_1 + {}^nC_2 U_2$.

Two sample U-statistics

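This is defined in a way similar to the one-sample case above. Here we have a kernel
$h: \mathbb{R}^r \times \mathbb{R}^s \to \mathbb{R}$
that is symmetric in its first $r$ arguments and in its last $s$ arguments.
Based on a two-sample dataset
$X_1, \ldots, X_m, Y_1, \ldots, Y_n$
(where $m \ge r$ and $n \ge s$) we define the two-sample U-statistic with kernel $h$ as
$U = \sum{}' h(X_{i_1}, \ldots, X_{i_r}, Y_{j_1}, \ldots, Y_{j_s}) / ({}^mC_r \, {}^nC_s)$,
where $\sum'$ is over all $1 \le i_1 < \cdots < i_r \le m$ and $1 \le j_1 < \cdots < j_s \le n$.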
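A two-sample U-statistic can be computed by brute force in the same spirit. The sketch below is an illustration under stated assumptions: the sum is taken over increasing index sets within each sample (which matches the divisor mCr nCs when the kernel is symmetric within each group of arguments), the helper name `two_sample_u` is hypothetical, and the data are made up. The kernel receives the chosen X values and Y values as two tuples.

```python
from itertools import combinations
from math import comb  # comb(m, r) = mCr (Python 3.8+)


def two_sample_u(h, xs, ys, r, s):
    """Average h(x_tuple, y_tuple) over all r-subsets of xs and s-subsets of ys.

    Hypothetical helper; assumes h is symmetric within each tuple.
    """
    if len(xs) < r or len(ys) < s:
        raise ValueError("need m >= r and n >= s")
    total = sum(
        h(xt, yt)
        for xt in combinations(xs, r)  # i1 < ... < ir
        for yt in combinations(ys, s)  # j1 < ... < js
    )
    return total / (comb(len(xs), r) * comb(len(ys), s))


xs = [1.0, 2.0, 3.0]  # toy X sample
ys = [0.0, 1.0]       # toy Y sample

# With r = s = 1 and kernel h(x; y) = x - y, the statistic is the
# difference of the two sample means.
print(two_sample_u(lambda xt, yt: xt[0] - yt[0], xs, ys, 1, 1))
```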

Exercise .8: Consider the parameter
θ = P(X < Y).
Find a kernel to estimate it unbiasedly. Find out the corresponding U-statistic. Show that it is a multiple of the Mann-Whitney U-test statistic.


© Arnab Chakraborty (2010)