Multivariate Statistical Analysis
(A research-level course)
Class presentations
Please see this link for the class presentation allocation.
Schedule
We shall meet from 11:10am to 12:55pm on Wednesdays and
Fridays.
Reference
See here for an annotated list of texts. Links to supplementary materials will be given in the timeline page, which will
be updated frequently.
Syllabus
Here is a brief outline of the topics I intend to choose from.
Review
We shall start by tidying up various loose ends left from earlier
encounters with multivariate statistics. Most of you have (or at least
should have, but don't panic if you don't have) some familiarity with the following topics:
- Principal Component Analysis: An oft-quoted,
over-rated method from classical statistics with an elegant
mathematical theory to support it. Good practical applications
are not that easily found, though! We shall explore this via a
real-life study of face recognition. We shall also attack a digit
recognition problem.
- Various classification techniques: For most users of
statistics, classification seems to be the holy grail of multivariate
statistical analysis. We shall review discriminant analysis as well as hierarchical clustering
and model-based clustering.
- Classification & Regression Tree (C&RT): One
of the early modern methods to deal with multivariate
classification and regression.
- Different multivariate medians: Not a terribly
important topic, but one that nevertheless deserves a mention.
- Singular Value Decomposition: A very important tool
from linear algebra.
We shall review each of these topics by first giving a
cut-and-dried mathematical description, and then applying it to
real data using R.
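As a small illustration of the "mathematical description, then data" pattern for the first and last items above, here is a minimal sketch of principal component analysis on simulated data. It is written in Python rather than R purely for illustration, and everything in it is an assumption made for this example, not course material: the helper names `covariance_matrix`, `first_principal_component` and `toy_data`, and the simulated data set. It finds the leading eigenvector of the sample covariance matrix by power iteration. This vector coincides (up to sign) with the leading right singular vector of the centred data matrix, which is the link between PCA and the singular value decomposition.

```python
import math
import random


def covariance_matrix(rows):
    """Sample covariance matrix (divisor n - 1) of a list of equal-length rows."""
    n = len(rows)
    p = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def first_principal_component(cov, iterations=500, seed=0):
    """Leading eigenvalue and unit eigenvector of a symmetric matrix.

    Uses plain power iteration from a random start. This assumes the
    matrix is non-zero and its largest eigenvalue is not repeated.
    """
    p = len(cov)
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(p)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v' C v gives the variance along direction v.
    cv = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
    eigenvalue = sum(v[i] * cv[i] for i in range(p))
    return eigenvalue, v


def toy_data(n=200, seed=1):
    """Hypothetical data: points scattered closely around the line y = 2x."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        t = rng.gauss(0.0, 1.0)
        rows.append([t, 2.0 * t + rng.gauss(0.0, 0.1)])
    return rows


if __name__ == "__main__":
    cov = covariance_matrix(toy_data())
    variance, direction = first_principal_component(cov)
    total = cov[0][0] + cov[1][1]
    print("first principal direction:", direction)
    print("share of total variance:", variance / total)
```

For the toy data the first principal direction should lie close to the line y = 2x and carry almost all of the variance. Real data sets such as face or digit images have far more columns, and in practice one would use a library SVD routine instead of hand-written power iteration.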
Fresh topics
Next we shall go into some topics which you may not have
encountered before.
- CHAID: A rather long acronym for CHi-squared Automatic
Interaction Detection. It is a generalisation of C&RT.
- Copula and Vine:
Statistics draws its mathematical strength from probability. The
main connection between the two is the modelling of data using
probability distributions. So we need a rich supply of
multivariate distributions to suit different scenarios. Copulas and
vines are two modern tools for constructing them.
- Structural Equation Modelling (SEM): An emerging
field that seeks to generalise linear models and factor
analysis.
- Discrete Multivariate Data Analysis: The bulk of
multivariate statistics deals with continuous data. But as
genetics is churning out huge amounts of data, we cannot afford
to ignore discrete multivariate data analysis.
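To make the copula idea from the list above concrete, here is a minimal sketch in Python (chosen only for illustration). A copula separates the dependence structure from the marginal distributions: one first draws dependent uniform variables, then pushes each through the quantile function of whatever marginal is wanted. The sketch uses a bivariate Gaussian copula, which is only one of many possible copulas. The function names `gaussian_copula_sample` and `exponential_quantile`, the correlation 0.8 and the exponential rates are all assumptions made for this example.

```python
import math
import random
from statistics import NormalDist


def gaussian_copula_sample(n, rho, seed=0):
    """Draw n pairs (u, v) from a bivariate Gaussian copula.

    rho is the correlation of the underlying normal pair. Each of u and v
    is marginally uniform on (0, 1); all the dependence sits in the pairing.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # Build a second standard normal with correlation rho to z1.
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # The normal CDF maps each coordinate to a uniform variable.
        pairs.append((std_normal.cdf(z1), std_normal.cdf(z2)))
    return pairs


def exponential_quantile(u, rate):
    """Inverse CDF of the exponential distribution with the given rate."""
    u = min(u, 1.0 - 1e-12)  # guard against log(0) if u rounds to 1
    return -math.log(1.0 - u) / rate


if __name__ == "__main__":
    # Dependent pairs whose marginals are exponential with rates 1 and 3.
    uniforms = gaussian_copula_sample(5, rho=0.8, seed=1)
    for u, v in uniforms:
        print(exponential_quantile(u, 1.0), exponential_quantile(v, 3.0))
```

Swapping `exponential_quantile` for any other quantile function changes the marginals without touching the dependence, which is what makes copulas a flexible source of multivariate distributions.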
Presentations
Exams are boring, and not really a good way to test knowledge at
research level. So we shall do away with the midsem exam, and
replace it with a set of presentations. Here are some topics to
choose from. You may also suggest your own topic.
- Conjoint Analysis
- Correspondence Analysis
- Multidimensional scaling
- Functional data analysis
- Projection pursuit
Data mining
Data mining is a buzzword we hear a lot nowadays. We shall learn
a bit about it along the way.