Multivariate Statistical Analysis

(A research-level course)

Class presentations

Please see this link for the class presentation allocation.

Schedule

We shall meet from 11:10am to 12:55pm on Wednesdays and Fridays.

Reference

See here for an annotated list of texts. Links to supplementary materials will be given on the timeline page, which will be updated frequently.

Syllabus

Here is a brief outline of the topics I intend to choose from.

Review

We shall start by tidying up various loose ends left from earlier encounters with multivariate statistics. Most of you have, or at least should have, some familiarity with the following topics (but don't panic if you don't):

Principal Component Analysis: An oft-quoted, overrated method from classical statistics with an elegant mathematical theory to support it. Good practical applications are not that easy to find, though! We shall explore this via a real-life study of face recognition, and we shall also attack a digit recognition problem.
Various classification techniques: For most users of statistics, classification seems to be the holy grail of multivariate statistical analysis. We shall review discriminant analysis as well as hierarchical clustering and model-based clustering.
Classification & Regression Tree (C&RT): One of the early modern methods to deal with multivariate classification and regression.
Different multivariate medians: A not-too-terribly-important topic that nevertheless deserves a mention.
Singular Value Decomposition: A very important tool from linear algebra.
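As a small illustration of why there are "different" multivariate medians, the sketch below contrasts two common definitions: the coordinatewise median (the ordinary median of each variable separately) and the spatial or geometric median (the point minimising the sum of Euclidean distances to the data), computed here with Weiszfeld's iteration. The course itself works in R; this Python version, its function names and its stopping rule are illustrative assumptions only.

```python
import math

def coordinatewise_median(points):
    """Median taken separately in each coordinate."""
    def median(values):
        s = sorted(values)
        n = len(s)
        mid = n // 2
        return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2
    return [median(col) for col in zip(*points)]

def spatial_median(points, tol=1e-9, max_iter=1000):
    """Spatial (geometric) median via Weiszfeld's iteration: the point
    minimising the sum of Euclidean distances to the data points."""
    d = len(points[0])
    # Start from the coordinatewise mean.
    y = [sum(p[j] for p in points) / len(points) for j in range(d)]
    for _ in range(max_iter):
        num, den = [0.0] * d, 0.0
        for p in points:
            dist = math.dist(p, y)
            if dist < tol:
                # Simplification: stop if the iterate lands on a data point
                # (the full algorithm would check whether that point is optimal).
                return list(p)
            den += 1.0 / dist
            for j in range(d):
                num[j] += p[j] / dist
        # New iterate: inverse-distance-weighted average of the points.
        new_y = [v / den for v in num]
        if math.dist(new_y, y) < tol:
            return new_y
        y = new_y
    return y

triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
print(coordinatewise_median(triangle))  # [0.0, 0.0]
print(spatial_median(triangle))
```

For this right-angled triangle the coordinatewise median sits at the corner (0, 0), whereas the spatial median moves into the interior, because it trades off distances to all three points jointly.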

We shall review each of these topics by first giving a cut-and-dried mathematical description, and then applying them to real data using R.
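To connect two items from the review list: principal components can be computed directly from the singular value decomposition of the column-centred data matrix, without forming the covariance matrix. (R's `prcomp` also works through the SVD of the centred data.) The following is a minimal Python/NumPy sketch, assuming an n × p data matrix with observations in rows; the function name `pca_svd` and the simulated data are hypothetical, chosen only for illustration.

```python
import numpy as np

def pca_svd(X, k):
    """PCA of an n x p data matrix X via the SVD of the column-centred data.

    Returns (scores, loadings, variances) for the first k components.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                    # centre each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:k].T                        # p x k principal directions
    scores = Xc @ loadings                     # n x k, equal to U[:, :k] * s[:k]
    variances = s[:k] ** 2 / (X.shape[0] - 1)  # eigenvalues of the sample covariance
    return scores, loadings, variances

# Simulated correlated data (purely illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ np.array([[3.0, 0.0, 0.0],
                                          [1.0, 1.0, 0.0],
                                          [0.0, 0.5, 0.2]])
scores, loadings, variances = pca_svd(X, k=2)
# Cross-check against the eigenvalues of the sample covariance matrix.
eig = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]
print(np.allclose(variances, eig[:2]))  # True
```

The cross-check works because, for centred data, the sample covariance is Xc^T Xc / (n - 1), whose eigenvalues are the squared singular values of Xc divided by n - 1.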

Fresh topics

Next we shall go into some topics which you may not have encountered before.

CHAID: A rather long acronym for CHi-squared Automatic Interaction Detection. It can be viewed as a generalisation of C&RT, since it allows multiway rather than only binary splits.
Copula and Vine: Statistics draws its mathematical strength from probability, and the main connection between the two is the modelling of data using probability distributions. So we need a rich supply of multivariate distributions to suit different scenarios. Copulas and vines are two modern methods for building them.
Structural Equation Modelling (SEM): An emerging field that seeks to generalise linear models and factor analysis.
Discrete Multivariate Data Analysis: The bulk of multivariate statistics deals with continuous data. But as genetics is churning out huge amounts of discrete data, we cannot afford to ignore discrete multivariate data analysis.
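To make the copula idea concrete: a copula separates the dependence structure of a multivariate distribution from its one-dimensional margins. The sketch below uses the simplest case, a bivariate Gaussian copula: draw correlated normals, map each coordinate to Uniform(0, 1) with the standard normal CDF, then apply any quantile functions you like. Vines, which build higher-dimensional dependence out of bivariate copulas, are not shown. The function name, the correlation 0.8 and the exponential/uniform margins are illustrative assumptions, and the course itself uses R rather than Python.

```python
import math
import numpy as np

def gaussian_copula_sample(n, rho, inverse_cdfs, seed=None):
    """Draw n points from a bivariate distribution whose dependence is a
    Gaussian copula with correlation rho and whose margins have the given
    inverse CDFs (quantile functions)."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal(np.zeros(2), cov, size=n)  # correlated N(0, 1) pairs
    # Standard normal CDF maps each coordinate to Uniform(0, 1); the
    # dependence between coordinates is kept (this is the copula).
    u = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    # Apply each quantile function to obtain the desired margins.
    return np.column_stack([q(u[:, j]) for j, q in enumerate(inverse_cdfs)])

def exp_quantile(u):
    """Quantile function of the Exponential(rate 1) distribution."""
    return -np.log1p(-u)

def unif_quantile(u):
    """Quantile function of Uniform(0, 1)."""
    return u

sample = gaussian_copula_sample(5000, 0.8, [exp_quantile, unif_quantile], seed=1)
```

The two margins in `sample` are exponential and uniform respectively, yet they are strongly positively dependent; swapping in other quantile functions changes the margins without touching the dependence.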

Presentations

Exams are boring, and not really a good way to test knowledge at research level. So we shall do away with the mid-semester exam, and replace it with a set of presentations. Here are some topics to choose from. You may also suggest your own topic.

Conjoint Analysis
Correspondence Analysis
Multidimensional Scaling
Functional Data Analysis
Projection Pursuit

Data mining

Data mining is a buzzword we hear a lot nowadays. We shall learn a bit about it along the way.