cwave.eu5.org
Also see: http://www.angelfire.com/dragon/letstry
cwave04 at yahoo dot com

Last updated on Fri May 21 11:52:13 IST 2010.

Classical nonparametric statistics

Introduction

Any statistical inference problem has the following basic structure.
We have some random data having joint distribution F which is not entirely known. We want to make inference about the unknown aspects of F based on observed data. The inference is typically either an estimation problem or a testing problem.
The difference between nonparametric and parametric problems has to do with how much we already assume known about F.

Example: (Parametric problem) Suppose X1,...,Xn are iid N(μ,σ²). We want to estimate μ and σ². This is a typical problem from parametric statistics. Here we assume that the distribution of the X's is completely known except for two unknown numbers, μ and σ².
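As a minimal sketch of this parametric example, the two unknown numbers can be estimated by the sample mean and the sample variance. The data below are made up purely for illustration, and the function name estimate_normal_params is hypothetical.

```python
import statistics

def estimate_normal_params(xs):
    """Return (mu_hat, sigma2_hat) for an iid N(mu, sigma^2) sample.

    mu_hat is the sample mean; sigma2_hat is the unbiased sample
    variance (divisor n-1). The maximum likelihood estimate would
    use divisor n instead.
    """
    mu_hat = statistics.mean(xs)
    sigma2_hat = statistics.variance(xs)
    return mu_hat, sigma2_hat

# Illustrative data only (an assumption, not taken from the notes).
sample = [4.0, 6.0, 5.0, 7.0, 3.0]
mu_hat, sigma2_hat = estimate_normal_params(sample)
print(mu_hat, sigma2_hat)
```

Whatever the sample, the inference reduces to pinning down just these two numbers, which is what makes the problem parametric.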

Definition: If the distribution of the data is completely known except for finitely many unknown numbers, then the problem is called a parametric problem. Otherwise, we have a nonparametric problem. In a parametric situation each of the finitely many unknown numbers is called a parameter.

Example: (Nonparametric problem) Suppose that we are testing the efficacy of a sleeping pill. Let X1,...,Xn be the amounts of sleep of n patients before taking the pill, and let Y1,...,Yn be the corresponding amounts after taking it. We want to test whether the pill really increases one's amount of sleep. Assuming that the patients behave independently, we may reasonably assume that
(X1,Y1),...,(Xn,Yn)
are independent, but not necessarily identically distributed. We model the effect of the drug as follows. There is an unknown number θ denoting the median increase of sleep, i.e., each
Zi = Yi-Xi
has θ as its median. Note that we are not assuming that the Z's all have the same distribution. We are merely assuming that they have a common median. We want to test
H0: θ = 0 vs. H1: θ > 0

In this example we have not assumed any knowledge about the underlying distribution except for the existence of a common median θ for the Z's. Thus our ignorance cannot be summed up as finitely many unknown numbers. Hence this is a nonparametric statistical inference problem.

Example: Suppose X1,...,Xn are iid with a continuous density f, which is unknown. We want to estimate f.

Exercise 0.1: Why did we need the continuity assumption on f?

Exercise 0.2: Think of a nonparametric inference situation in regression.
In the model Y=α+βX+ε, we may have the ε's iid with some unknown distribution F. Or, we may have the model Y=f(X)+ε, where f itself is some unknown continuous function.

Semiparametric problems

Some people like to call the sleeping pill example a semiparametric problem, because there we are interested in only one unknown number, θ, though θ is not the only unknown quantity.

Distribution-free techniques

At the heart of any nonparametric statistical inference problem sits a distribution-free technique.

Example: (Sign test) We are continuing with the sleeping pill example. Assume that the Zi's are continuous random variables. Let the observed Zi's be
-2.3, 3.9, 2.5, -2.1, -3.4, 1.4, 2.4, 1.9
Consider the signs
-1,+1,+1,-1,-1,+1,+1,+1
Count the +1's: T = 5. Reject H0 for "large" T. How large is "large"? To answer this we need to know the distribution of the test statistic T under H0. Under H0 each Zi is continuous with median 0, so P(Zi > 0) = 1/2, and the Zi's are independent. Hence T is Bin(8,0.5). If n is large, T is approximately N(n/2, n/4) under H0.
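The calculation for the observed data can be sketched as follows. This is only an illustration: the function name sign_test_pvalue is hypothetical, and the normal approximation is used here without a continuity correction.

```python
from math import comb, erf, sqrt

def sign_test_pvalue(z):
    """One-sided sign test of H0: theta = 0 vs H1: theta > 0.

    Returns (T, exact_p, approx_p), where T is the number of positive
    observations, exact_p = P(Bin(n, 0.5) >= T), and approx_p uses the
    N(n/2, n/4) approximation (no continuity correction).
    """
    n = len(z)
    t = sum(1 for zi in z if zi > 0)
    # Exact upper tail of Bin(n, 0.5): sum of C(n, k) / 2^n for k >= t.
    exact_p = sum(comb(n, k) for k in range(t, n + 1)) / 2 ** n
    # Normal approximation: P(N(n/2, n/4) >= t).
    zscore = (t - n / 2) / sqrt(n / 4)
    approx_p = 0.5 * (1 - erf(zscore / sqrt(2)))
    return t, exact_p, approx_p

observed = [-2.3, 3.9, 2.5, -2.1, -3.4, 1.4, 2.4, 1.9]
t, exact_p, approx_p = sign_test_pvalue(observed)
print(t, exact_p)  # prints: 5 0.36328125
```

The exact p-value is 93/256, about 0.36, so T = 5 out of 8 is not "large" at any of the usual levels. With n as small as 8 the normal approximation is crude and is shown only for comparison.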

In this example T is called a distribution-free test statistic under H0. Here H0 is a composite hypothesis. In fact it is infinite dimensional, in the sense that a null distribution cannot be specified completely by specifying just a finite collection of numbers. But still T has a distribution that is free of F. Classical nonparametric inference proceeds by cleverly constructing such distribution-free statistics. Thus classical nonparametric statistical inference is more or less a list of such statistics. However, not all problems have such a handy distribution-free statistic. For these problems one uses computation-intensive modern nonparametric inference, which we shall learn about.
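The distribution-free property can be checked empirically with a small simulation. The sketch below draws the Zi's from three different continuous distributions with median 0 and compares the empirical distribution of T with Bin(8,0.5). The choice of distributions, the seed, the number of replications, and the function names are all assumptions made for illustration.

```python
import math
import random
from math import comb

def simulate_T(draw, n, reps, rng):
    """Empirical pmf of T = #{Z_i > 0} over `reps` samples of size n."""
    counts = [0] * (n + 1)
    for _ in range(reps):
        t = sum(1 for _ in range(n) if draw(rng) > 0)
        counts[t] += 1
    return [c / reps for c in counts]

# Three hypothetical null distributions, each continuous with median 0.
def normal(rng):
    return rng.gauss(0.0, 1.0)

def shifted_exponential(rng):
    # Exp(1) has median ln 2, so subtracting ln 2 puts the median at 0.
    return rng.expovariate(1.0) - math.log(2)

def cauchy(rng):
    # Standard Cauchy via the inverse-CDF transform.
    return math.tan(math.pi * (rng.random() - 0.5))

n, reps = 8, 20000
rng = random.Random(0)
binom_pmf = [comb(n, k) / 2 ** n for k in range(n + 1)]
results = {f.__name__: simulate_T(f, n, reps, rng)
           for f in (normal, shifted_exponential, cauchy)}
for name, pmf in results.items():
    # Largest gap between the empirical pmf and the Bin(8, 0.5) pmf.
    max_err = max(abs(p - q) for p, q in zip(pmf, binom_pmf))
    print(name, round(max_err, 3))
```

In each case the largest discrepancy should be of the order of simulation noise, although the three distributions differ greatly in shape (symmetric, skewed, heavy-tailed). The same would hold if the Zi's within a sample came from different distributions, as long as each is continuous with median 0.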

Exercise 0.3: Compute the power of the sign test for the alternative θ = 5, assuming that the Zi's are iid with some unknown, common, continuous distribution. Sample size, n=1000.
[Hint: Can you do it if the Zi's are iid N(5,1)?]


© Arnab Chakraborty (2010)