Computing the exact distribution of the KS statistic is prohibitively
difficult for large n. In this case we use the asymptotic distributions of
D, D+ and D-.
Proof:
Not to be done in this course.
H(x) is indeed a valid continuous distribution function. It appears quite
difficult to prove that it is indeed so. The simplest proof that I know of
is by Feller. It is a few pages long (however, it actually proves something
stronger). H(x) is an example of what mathematicians call a theta function.
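To get a feel for H(x), one can simply evaluate it numerically. The sketch below assumes H is the usual Kolmogorov limit, H(x) = 1 - 2 Σ_{k≥1} (-1)^(k-1) exp(-2k²x²) for x > 0 and H(x) = 0 otherwise. That series is an assumption on my part, since the statement of the theorem is not reproduced in this section. The function name, tolerance and term cap are illustrative choices.

```python
import math

def kolmogorov_cdf(x, tol=1e-16, max_terms=100_000):
    """Evaluate H(x) = 1 - 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 x^2).

    Assumes the standard Kolmogorov series; tol and max_terms are
    illustrative truncation settings, not part of the notes.
    """
    if x <= 0:
        return 0.0  # H vanishes on the non-positive half-line
    total = 0.0
    for k in range(1, max_terms + 1):
        term = math.exp(-2.0 * k * k * x * x)
        total += term if k % 2 == 1 else -term
        if term < tol:
            break  # remaining terms are below the tolerance
    # Clamp to [0, 1] to absorb rounding error for very small x.
    return min(1.0, max(0.0, 1.0 - 2.0 * total))

# Tabulate H on a grid to see it rise from near 0 towards 1.
grid = [0.25 * j for j in range(1, 13)]
values = [kolmogorov_cdf(x) for x in grid]
```

Such a table is of course no proof that H is a distribution function, but it is a quick check that the series behaves like one on the grid.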
Proof:
Not to be done in this course.
Exercise 6.4:
Show that n(D+)^2 is asymptotically distributed as
Expo(2).
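Before attempting the proof, a small simulation can serve as a sanity check. The sketch below reads Expo(2) as the exponential distribution with rate 2 (mean 1/2); that reading is my assumption about the notation. It uses Uniform(0,1) data, for which F(x) = x, and the sample size, replication count and seed are arbitrary illustrative choices.

```python
import random

def d_plus_uniform(n, rng):
    """One-sided KS statistic D+ for n iid Uniform(0,1) draws, where F(x) = x."""
    u = sorted(rng.random() for _ in range(n))
    # D+ = max over i of i/n - F(U_(i)), with U_(i) the i-th order statistic.
    return max((i + 1) / n - ui for i, ui in enumerate(u))

def simulate_n_dplus_sq(n=300, reps=1000, seed=0):
    """Return reps simulated values of n * (D+)^2."""
    rng = random.Random(seed)
    return [n * d_plus_uniform(n, rng) ** 2 for _ in range(reps)]

samples = simulate_n_dplus_sq()
# An exponential with rate 2 has mean 1/2 and P(value > 1/2) = exp(-1).
sample_mean = sum(samples) / len(samples)
frac_above_half = sum(s > 0.5 for s in samples) / len(samples)
```

If the claim in the exercise holds, `sample_mean` should be close to 1/2 and `frac_above_half` close to exp(-1), up to simulation noise and finite-n effects.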
Exercise 6.5:
The file data.txt contains iid data from some
unknown continuous distribution, F. We want to test
H0: F = N(0,1) vs. H1: F ≠ N(0,1).
Perform the KS test using the asymptotic distribution. Report the P-value.
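One possible way to organise this computation is sketched below, in Python rather than Matlab. It assumes the limiting tail probability of √n D is 2 Σ_{k≥1} (-1)^(k-1) exp(-2k²x²) (the standard Kolmogorov series), and that data.txt holds whitespace-separated numbers; the file layout and all function names are my assumptions.

```python
import math
from statistics import NormalDist

def kolmogorov_sf(x, tol=1e-16, max_terms=100_000):
    """Limiting P(sqrt(n) D > x) = 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 x^2)."""
    if x <= 0:
        return 1.0
    total = 0.0
    for k in range(1, max_terms + 1):
        term = math.exp(-2.0 * k * k * x * x)
        total += term if k % 2 == 1 else -term
        if term < tol:
            break
    return min(1.0, max(0.0, 2.0 * total))  # clamp away rounding error

def ks_statistic(data, cdf):
    """Two-sided D = sup_x |Fn(x) - F(x)|, attained at the order statistics."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fx = cdf(x)
        # Fn jumps from (i-1)/n to i/n at x, so check both sides of the jump.
        d = max(d, i / n - fx, fx - (i - 1) / n)
    return d

def ks_test_normal(data):
    """Return (D, asymptotic P-value) for H0: F = N(0,1)."""
    d = ks_statistic(data, NormalDist(0.0, 1.0).cdf)
    return d, kolmogorov_sf(math.sqrt(len(data)) * d)

def read_numbers(path):
    """Read whitespace-separated numbers (assumed layout of data.txt)."""
    with open(path) as fh:
        return [float(tok) for tok in fh.read().split()]
```

With the data file in place, `d, p = ks_test_normal(read_numbers("data.txt"))` would give the statistic and the asymptotic P-value.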
Confidence intervals using KS procedure
Suppose that X1,...,Xn are iid with unknown
continuous distribution G. We want to get a CI for G with confidence level 1-α. For
this consider the random variable
D = sup_x |Fn(x)-G(x)|
Exercise 6.6:
Can you compute D if G is not known? Does the distribution of D depend on G?
Find a constant c such that
P(D ≤ c) = 1-α,
i.e.,
P(|Fn(x)-G(x)| ≤ c for all x) = 1-α,
i.e.,
P(Fn(x)-c ≤ G(x) ≤ Fn(x)+c for all x) = 1-α.
This provides a CI for G.
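For large n the constant c can be approximated from the asymptotic distribution of √n D. The sketch below assumes the limiting tail probability is the standard Kolmogorov series 2 Σ_{k≥1} (-1)^(k-1) exp(-2k²x²), solves "tail = α" by bisection, and divides the root by √n. The function names and the bracketing interval [0, 5] are illustrative choices of mine.

```python
import math

def kolmogorov_sf(x, tol=1e-16, max_terms=100_000):
    """Limiting P(sqrt(n) D > x) = 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 x^2)."""
    if x <= 0:
        return 1.0
    total = 0.0
    for k in range(1, max_terms + 1):
        term = math.exp(-2.0 * k * k * x * x)
        total += term if k % 2 == 1 else -term
        if term < tol:
            break
    return min(1.0, max(0.0, 2.0 * total))

def ks_cutoff(n, alpha=0.05):
    """Approximate c with P(D <= c) = 1 - alpha, using the asymptotic law."""
    lo, hi = 0.0, 5.0  # the tail is 1 at 0 and negligible at 5
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if kolmogorov_sf(mid) > alpha:
            lo = mid  # tail still too heavy: the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi) / math.sqrt(n)
```

Since the bisection works on the scale of √n D, the same root serves every sample size; only the final division by √n changes.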
Exercise 6.7:
The file data.txt contains iid data from some
unknown continuous distribution, F. Obtain a 95% CI for F using the two-sided KS
statistic. Plot the CI. [Hint: You need to compute the cut-off point c based on the infinite series
given earlier. Use Matlab.]
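Once c is in hand, the band itself is just Fn shifted up and down by c and clipped to [0,1]. A minimal sketch of that step follows, in Python rather than Matlab. It takes c as an input (for 95% the asymptotic series gives roughly 1.358/√n), and the function names are hypothetical. The plotting helper assumes matplotlib is installed and is only one way to draw the band.

```python
def ks_confidence_band(data, c):
    """Return the sorted data and the band limits at each order statistic.

    On [x_(i), x_(i+1)) the empirical cdf Fn equals i/n, so the band
    there is [max(0, i/n - c), min(1, i/n + c)].
    """
    xs = sorted(data)
    n = len(xs)
    ecdf = [i / n for i in range(1, n + 1)]
    lower = [max(0.0, f - c) for f in ecdf]  # a cdf cannot go below 0
    upper = [min(1.0, f + c) for f in ecdf]  # or above 1
    return xs, lower, upper

def plot_band(xs, lower, upper):
    """Draw Fn and the band as step functions (assumes matplotlib is available)."""
    import matplotlib.pyplot as plt
    n = len(xs)
    ecdf = [i / n for i in range(1, n + 1)]
    plt.step(xs, ecdf, where="post", label="Fn")
    plt.step(xs, lower, where="post", linestyle="--", label="lower limit")
    plt.step(xs, upper, where="post", linestyle="--", label="upper limit")
    plt.xlabel("x")
    plt.ylabel("cdf")
    plt.legend()
    plt.show()
```

To the left of the smallest observation Fn is 0, so the band there is [0, min(1, c)]; the sketch leaves that segment out of the returned lists.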