Date: Feb 14, 2014

SVM in R

Today our aim is to perform SVM using the kernlab package of R. We shall not deal with real life data yet. Working with a toy data set will help to familiarise ourselves with R's way of looking at SVM.

First we create a little bivariate data consisting of two classes.


neg1 = 0; neg2 = 0 # (0,0) is the centre of the neg class
pos1 = 2; pos2 = 2 # (2,2) is the centre of the pos class

#Generate 100 points for the pos class
xpos = cbind(rnorm(100,mean=pos1),rnorm(100,mean=pos2))

#Generate 100 points for the neg class
xneg = cbind(rnorm(100,mean=neg1),rnorm(100,mean=neg2))

#Combine the 100+100 points into a data set of size 200.
x = rbind(xpos,xneg)
colnames(xtrain)=c('X1','X2')

#The true classes: 100 pos (+1) followed by 100 neg (-1)
y = c(rep(1,100),rep(-1,100))

Plot the data to understand what we are doing.


plot(x,col=ifelse(y>0,'red','blue'))

Load the package kernlab (after installing it of course!).


library(kernlab)

The main workhorse is the ksvm function. It has different forms. The line below gives one of the simplest forms. Here x is the measurement matrix, y is the vector of true classes. SVM is a general family of algorithms of which we have discussed only one in the class. This particular case is called C-SVC (SVC stands for support vector classification, while the first C stands for the commonly used name for the cost parameter).


svp = ksvm(x,y,type="C-svc",kernel='vanilladot',C=10)

vanilladot is the Euclidean inner product kernel. Finally we have specified a value for the cost parameter. The ksvm function performs the optimisation we have discussed in class. The resulting classifier is placed in the variable svp.

Try to peep inside svp with

svp

This shows only the tip of the iceberg. We shall take a deeper look later. But for now let's use it to classify some case. Well, we have not created any new case. So we shall try to classify the original data itself. This will give us the so-called "resubstitution error".


ypred = predict(svp, data = x)

The true classes stored in y is just a vector of +1's and -1's. Similarly, ypred is another vector of +1's and -1's. It is difficult to compare them unless we make a 2 × 2 contingency table.


table(y,ypred)

Of course we can also find the misclassifcation proportion:


sum(ypred==y)/length(y)

Since we are in a bivariate set up, a scatterplot is a good way to visualise the classifier. The kernlab package has its own plot function to produce a colourful visualisation.


plot(svp,data=x)

It is not the familiar "strip between classes" plot that we have seen in the theory classes. It shows the two classes with two colours and the "strength" of the class memberships by varying the intensity. This is more appropriate than the "strip between classes" plot as we are doing soft margin classification here.

It is important to understand the plot carefully. The white region is sort of a "no man's zone" separating the two classes. This is where the medial line of the strip lies. There are four types of points: filled/open circles/triangles.

The filled points are the support vectors. These are taken from the training data, and are part of the classifier. Choice of the symbols (circle/triangle) is done by the true classes.
The open points are from the new data set being classified. In our example we are working with resubstitution error, and so the new data set is actually the original data set itself. The choice of the symbols (circle/triangle) is based on the predicted classes.

Now we shall take a deeper look inside the classifier returned by ksvm.


svp = ksvm(x,y,type="C-svc",kernel='vanilladot',C=100,scaled=c(F,F))

This is the same thing as before, except that we have used scaled=c(F,F) to prevent R from studentising the measurements before computation. We are preventing this so that we can easily relate the things we see inside the bowels of ksvm with the x we have passed to it.

As you can guess svp is a bundle of things returned by ksvm. A typical way to "unpack" such a bundle is to use


names(svp)

Unfortunately, it will return NULL! This is because R has three different ways by which it bundles things together. These are called S3, S4 and R5 techniques. Most of the time we work with S3 objects, and then names is the way to unpack. But kernlab uses S4 objects. Here the technique is to use


slotNames(svp)

You can see a slot called alphaindex, for example. You might be tempted to use


svp$alphindex

to see its value. But the $ opertor is for S3 objects only. For S4 objects you need to use the @ opertor:


svp@alphindex

You can also use the following command:


alphaindex(svp)

You might think that it is just a vector of some integers. Look very carefully, and you'll see that it is actually a list of length one, and the sole member of the list is a vector of integers. If you do not get this point then consider the difference between the following two lines:


temp1 = 1:10 #a vector of 10 integers
temp2 = list(1:10) #a list with a single element, which is a vector of integers

temp1 + 1 # OK
temp2 + 1 #error: can't add a list with an integer!
temp2[[1]] + 1 # OK

Let's look at the vector svp@alphaindex[[1]] once again. It gives you the indices for the cases which are the support vectors (the points that land inside the strip). If you want to look at the measurments (the values of X₁ and X₂ in our example) for these cases you may use any of the two commands below:


svp@xmatrix
xmatrix(svp)

Again, it is not just a matrix of X's. It is a one-element list with the matrix as its sole element. Why is ksvm wrapping everything in a one-element list? Because ksvm is actually meant to work with any finite number of classes while SVM in principle deals with only binary classification. So ksvm performs a separate SVM for each pair of classes, and finally produces a list of length ^k C₂ where k is the number of classes. Each element of the list is the complete SVM output for one particular pair of classes. In our case we have just k=2 classes, hence the lists are of length ² C₂ = 1.

Looking at all the slots of the svp object like this is boring. Instead, let's add some meaning to our exploration by setting a goal: we want to produce the familiar "strip between classes" plot using svp. If you recall the theory the equation of the medial line hrough the strip was

∑ α_i y_i < x_i · x > + b = 0 ,

where the sum is effectively only over the support vector cases (as α_i = 0 for the other cases). So we can write it as

< w · x > + b = 0 ,

where

w = ∑ α_i y_i x_i .

So we need to know where R stores the α_i's:


svp@alpha

Only the nonzero α_i's are stored. Note how they are all in (0,C]. We can of course multiply these by y_i's and then add, but R actually makes the product α_i y_i directly available in a slot called coef:


svp@coef

Notice that these are the as α_i's except for sign. That is because y_i's are +1 or -1. So computing w is easy:


w = coef(svp)[[1]]%*%xmatrix(svp)[[1]]

Finally we need b:


b = svp@b[[1]]

Here w = ( w₁ , w₂ ) is 2-dim. So the medial line through the strip has equation

w₁ x₁ + w₂ x₂ = b.

Let's cast it in the slope intercept form so that we may use abline:


slope = -w[1]/w[2]
intercept.mid = b/w[2]

Similarly we need to find the intercepts for the two boundaries:


intercept.neg = (b+1)/w[2]
intercept.pos = (b-1)/w[2]

We do not need to recompute the slope, as the boundaries are parallel to the medial line.

Now at last we are ready to plot. We start by making the scatterplot.


plot(x,col=ifelse(y>0,'red','blue'))

And then we add the three lines:


abline(a=intercept.mid,b=slope)
abline(a=intercept.neg,b=slope,lty=2)
abline(a=intercept.pos,b=slope,lty=2)

Cross validation

We can perform CV explicitly as follows. First we perform stratified sampling to split the data into training and test parts:


posTrn = sample(100,80)
negTrn = 100+sample(100,80)
trnInd = c(posTrn,negTrn)
xtrain = x[trnInd,]
ytrain = y[trnInd]
xtest = x[-trnInd,]
ytest = y[-trnInd]

Then we train the SVM using the training part:


svp = ksvm(xtrain,ytrain,type="C-svc",kernel='vanilladot',C=1)

And assess its performance on the test part:


ypred = predict(svp, data = xtest)
table(ytest,ypred)
sum(ypred==ytest)/length(ytest)

We need to do this over a loop.

But fortunately the ksvm function has CV built into it. To do 10-fold CV, for example, we need to specify cross=10.


svp = ksvm(x,y,cross=10,type="C-svc",kernel='vanilladot',C=1)

The CV error (average proprtion of misclassifcation in the test parts) may be obtained like:


svp@cross

Here the sample size is 200. So you can request k-fold cross validation where 2≤ k ≤ 200. If k=0 (the default), then no CV is performed. If you supply k=1, then an error message is generated. If k=200 then you get Leave-One-Out (LOO) CV. If k > 200, then R complains about some index going out of range (instead of saying something more meaningful like: You cannot perform k-forld CV if k > sample size).

Non-linear

We can replace the linear (vanilladot) kernel by Gaussian Radial Basis Function kernel (rbfdot). A list of other available kernels may be obtained by looking up the help for vanilladot.


svp = ksvm(x,y,type="C-svc",kernel='rbfdot',C=1)
plot(scp,data=x)

Notice how the boundaries are curved now. Also notice that despite the circular contours of the kernel function, the boundaries are not circular.

The Gaussian Radial Basis Function kernel has a parameter Σ. It is automatically estimated by R using a function called sigest. If you want you can specifysome value for the parameter:


svp = ksvm(x,y,cross=10,type="C-svc",kernel=rbfdot(sigma=0.1),C=1)
plot(scp,data=x)

Of course, the main fun lies in using your own kernel. First you have to create a kernel function:


myKernel = function(x,y) {(sum(x*y) +1)*exp(-0.001*sum((x-y)^2))}

And make it an object of class kernel:


class(myKernel) = "kernel"

Now use it just like an standard kernel:


svp = ksvm(x,y,type="C-svc",kernel=myKernel,C=1)

Multiclass problems

So far all our problems involve only two classes. But, as we have already mentioned, the ksvm function can deal with more than two classes as well. It just performs SVM for all possiblepairs of classes, and finally use majority vote. Let's see how this is done using Fisher's Iris Data.


svp = ksvm(Species ~ .,data=iris,type="C-svc",kernel="rbfdot",C=1)

We have used a slightly different form of ksvm here: Instead of specifying the measurements and true classes as separate arguements, we have used the formula

Species ~ .

which means "Classify by Species using all the other variables in the data set".

The predict function will automatically take care of the majority voting. To convince yourself that ksvm has indeed performed pairwise SVM just check:


svp@b

You'll see that it is a vector of length 3 = ³ C₂.

Comment Box is loading comments...