Date: Mar 28, 2014

Vines

Statistics relies heavily on the availability of probability distributions to model various observed features of a data set. Many such distributions are available for univariate data. For multivariate data, however, the multinormal distribution was for a long time the only useful one.

Copulas provide a rich family of multivariate distributions. Indeed, a copula allows us to combine univariate marginals in an arbitrary way. Sklar's theorem guarantees that any multivariate distribution may be obtained like this. But one great problem remains: given a multivariate data set of dimension >2, how do we choose a copula that captures the salient features of the data set? The main difficulty is that we human beings cannot easily see features in a data set of dimension >2. So it would help to have a technique to design a copula by considering two variables at a time. A vine is just a way to do that.

We shall begin explaining the concept of a vine with a mathematical structure that apparently has no relation to statistics. We shall need a definition from graph theory:

Definition By a tree with n vertices we mean a diagram with n dots (called vertices) joined by lines (called edges) such that all the vertices are connected and there is no cycle.

Typical examples are

[Figure: These are trees on 4 vertices]

But the following two diagrams are not trees:

[Figure: These are not trees]

It is easy to see (and not difficult to prove) that any tree on n vertices has exactly n-1 edges.
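The definition of a tree can be checked mechanically. Here is a minimal sketch in Python, assuming the vertices are labelled 1,...,n and each edge is given as a pair of labels. The function name `is_tree` is our own choice. The sketch relies on the standard fact that a connected diagram with exactly n-1 edges cannot contain a cycle, so it only tests the edge count and connectedness.

```python
def is_tree(n, edges):
    """Return True if vertices 1..n joined by the given edges form a tree."""
    # A tree on n vertices has exactly n-1 edges.
    if n < 1 or len(edges) != n - 1:
        return False
    # Build adjacency lists for the vertices 1..n.
    adjacent = {v: [] for v in range(1, n + 1)}
    for a, b in edges:
        adjacent[a].append(b)
        adjacent[b].append(a)
    # Depth-first search from vertex 1 to test connectedness.
    seen = {1}
    stack = [1]
    while stack:
        v = stack.pop()
        for w in adjacent[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    # Connected with n-1 edges implies there is no cycle.
    return len(seen) == n


path = [(1, 2), (2, 3), (3, 4)]      # a tree on 4 vertices
triangle = [(1, 2), (2, 3), (3, 1)]  # a cycle, with vertex 4 left isolated
print(is_tree(4, path), is_tree(4, triangle))
```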

We shall now define a vine as a set of trees built in a hierarchical manner. We shall start with a set of n variables:

X₁ , ... , X_n .

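Consider them as n vertices and create a tree by joining them with n-1 edges. It does not matter how you do this, as long as you satisfy the conditions of a tree. Next, treat these n-1 edges as vertices and join them with n-2 new edges to form a second tree, and continue in this way until a tree with a single edge remains. Here is one example: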

[Slideshow: building a vine on 4 variables]

You could of course have proceeded in a different way, like the following, for example:

[Figure: Another vine on 4 variables]

We have used different colours for the edges of the different trees. Mathematically the role of colour is played by the order. The red edges (the ones that are drawn first) are called edges of order 0. The blue edges have order 1, and the green edge has order 2.

A vine on n variables is sometimes called an n-dimensional vine. It has n-i-1 edges of order i for i=0,...,n-2.
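As a quick check on this count, we can add up the edges over all orders. The highest order that actually contains an edge is n-2, so the total number of edges in an n-dimensional vine is the following sum (for n=4 it gives 3+2+1=6):

```latex
\sum_{i=0}^{n-2} (n-i-1) \;=\; (n-1) + (n-2) + \cdots + 1 \;=\; \frac{n(n-1)}{2},
\qquad \text{e.g. } n = 4:\quad 3 + 2 + 1 = 6 .
```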

As you can guess, there are many vines possible for a given dimension. It turns out that a subset of "nice" vines is enough for statistical purposes. These are called regular vines. To understand the concept we need a new definition:

Definition Two edges of the same order are called neighbours if they share a common end point. Alternatively, since an edge may be considered as a doubleton set, two edges are called neighbours if the two doubleton sets have an intersection of size 1.

Now we can define a regular vine:

Definition A vine is called regular if the edges of each order ≥ 1 join only neighbouring edges of the previous order.

The literature deals almost exclusively with regular vines. Here is an irregular vine, just as an object of curiosity:

[Figure: An irregular vine. {{1,2},{3,4}} connects {1,2} and {3,4}, which are not neighbours as {1,2}∩{3,4}=φ]
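The neighbour test and the regularity condition can be sketched in a few lines of Python. This is only an illustration under our own assumptions: an edge of order 0 is stored as a frozenset of two integers, an edge of higher order as a frozenset of two edges of the previous order, and the input is assumed to be a valid vine already. The names `are_neighbours` and `is_regular` are hypothetical.

```python
E = frozenset  # short alias for building doubleton sets


def are_neighbours(e1, e2):
    """Two edges of the same order are neighbours if they share exactly one end point."""
    return len(e1 & e2) == 1


def is_regular(trees):
    """trees[i] is the list of edges of order i (assumed to form a valid vine).

    The vine is regular if every edge of order >= 1 joins two
    neighbouring edges of the previous order.
    """
    for order in range(1, len(trees)):
        for edge in trees[order]:
            a, b = edge  # the two edges of the previous order joined by this edge
            if not are_neighbours(a, b):
                return False
    return True


e12, e23, e34 = E({1, 2}), E({2, 3}), E({3, 4})

# A regular vine on 4 variables.
r1, r2 = E({e12, e23}), E({e23, e34})
regular_vine = [[e12, e23, e34], [r1, r2], [E({r1, r2})]]

# An irregular vine: the order 1 edge {{1,2},{3,4}} joins non-neighbours.
s1, s2 = E({e12, e34}), E({e12, e23})
irregular_vine = [[e12, e23, e34], [s1, s2], [E({s1, s2})]]

print(is_regular(regular_vine), is_regular(irregular_vine))
```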

Naming the edges of a regular vine

Expressing an edge as a doubleton set is cumbersome. Fortunately, there is an easier naming system available for regular vines. For this we define two sets for each edge: the conditioned set and the conditioning set. Let's start with an example: Consider the edge {{1,2},{2,3}}, which is an edge of order 1. Ignore all the braces to get a list:

1, 2, 2, 3.

The set of all elements that occur more than once in this list is called the conditioning set. Here it is {2}. The remaining elements (which occur exactly once each) constitute the conditioned set: {1,3}.

As another example consider the 0-th order edge {1,2}. Its conditioned set is {1,2} and conditioning set is φ.

This intuitive description may be difficult to apply for larger vines. So it's good to learn the formal definition, which starts by defining a constraint set for an edge. Intuitively, it is the union of the conditioning set and the conditioned set. The formal definition is recursive:

Definition Let e={a,b} be an edge in a vine. Its constraint set (written CONSTRAINT(e)) is defined as

If order(e)=0, then CONSTRAINT(e) = {a,b},
If order(e)>0, then CONSTRAINT(e) = CONSTRAINT(a) ∪ CONSTRAINT(b).

Now we can define conditioned and conditioning sets as follows.

Definition Let e={a,b} be an edge in a vine. Its conditioning set is

If order(e)=0, then conditioning set is φ,
If order(e)>0, then conditioning set is CONSTRAINT(a)∩ CONSTRAINT(b).

The conditioned set is defined as CONSTRAINT(e) minus the conditioning set.
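The recursive definitions above translate almost directly into code. Here is a minimal sketch in Python, assuming (this representation is our choice, not part of the definition) that an edge of order 0 is a frozenset of two integers and an edge of higher order is a frozenset of two edges of the previous order. The function names are hypothetical, and the order of the edge is passed in explicitly.

```python
def constraint(e, order):
    """Constraint set of the edge e = {a, b} of the given order."""
    a, b = e
    if order == 0:
        return frozenset({a, b})
    # For order > 0, a and b are themselves edges of the previous order.
    return constraint(a, order - 1) | constraint(b, order - 1)


def conditioning(e, order):
    """Conditioning set: empty for order 0, else the intersection of the constraint sets."""
    if order == 0:
        return frozenset()
    a, b = e
    return constraint(a, order - 1) & constraint(b, order - 1)


def conditioned(e, order):
    """Conditioned set: the constraint set minus the conditioning set."""
    return constraint(e, order) - conditioning(e, order)


# The example from the text: the edge {{1,2},{2,3}} of order 1.
e12, e23 = frozenset({1, 2}), frozenset({2, 3})
edge = frozenset({e12, e23})
print(sorted(conditioning(edge, 1)), sorted(conditioned(edge, 1)))
```

For the edge {{1,2},{2,3}} this reproduces the sets found by hand: conditioning set {2} and conditioned set {1,3}.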

It is easy to see that the conditioning and conditioned sets are disjoint. A key observation leading to their usefulness is that, in a regular vine, each edge is characterised by its (conditioned, conditioning) pair. Moreover, the conditioned set is always a doubleton set, and the size of the conditioning set equals the order of the edge.

Here is the standard naming convention for edges in a vine: If an edge has conditioning set {4,5,6} and conditioned set {1,3} then the edge is named

1,3 | 4,5,6 .

Edges of order 0 have empty conditioning sets, so they are written without any vertical bar, like

1,3 .
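The naming convention is easy to mechanise once the two sets are known. Here is a small sketch in Python; the function name `edge_name` is hypothetical, and listing the elements in increasing order is our own assumption, since the convention itself does not prescribe an order.

```python
def edge_name(conditioned, conditioning):
    """Name an edge as 'conditioned | conditioning', omitting the bar if nothing is conditioned on."""
    # Assumption: elements are listed in increasing order.
    left = ",".join(str(v) for v in sorted(conditioned))
    if not conditioning:
        # Edges of order 0 have an empty conditioning set, so no vertical bar.
        return left
    right = ",".join(str(v) for v in sorted(conditioning))
    return f"{left} | {right}"


print(edge_name({1, 3}, {4, 5, 6}))
print(edge_name({1, 3}, set()))
```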

Here is a complete example:

[Figure: The names of edges]
