Today our aim is to perform SVM using the kernlab package of R. We shall not deal
with real life data yet. Working with a toy data set will help to familiarise ourselves
with R's way of looking at SVM.
First we create a little bivariate data consisting of two classes.
neg1 = 0; neg2 = 0 # (0,0) is the centre of the neg class
pos1 = 2; pos2 = 2 # (2,2) is the centre of the pos class
#Generate 100 points for the pos class
xpos = cbind(rnorm(100,mean=pos1),rnorm(100,mean=pos2))
#Generate 100 points for the neg class
xneg = cbind(rnorm(100,mean=neg1),rnorm(100,mean=neg2))
#Combine the 100+100 points into a data set of size 200.
x = rbind(xpos,xneg)
colnames(xtrain)=c('X1','X2')
#The true classes: 100 pos (+1) followed by 100 neg (-1)
y = c(rep(1,100),rep(-1,100))
Plot the data to understand what we are doing.
plot(x,col=ifelse(y>0,'red','blue'))
Load the package kernlab (after installing it of course!).
library(kernlab)
The main workhorse is the ksvm function. It has different forms. The line below
gives one of the simplest forms. Here x is the measurement matrix, y is
the vector of true classes. SVM is a general family of algorithms of which we have discussed
only one in the class. This particular case is called C-SVC (SVC stands for
support vector classification, while the first C stands for the commonly used name for the
cost parameter).
vanilladot is the Euclidean inner product kernel. Finally we have specified a value
for the cost parameter. The ksvm function performs the optimisation we have discussed in class.
The resulting classifier is placed in the variable svp.
Try to peep inside svp with
svp
This shows only the tip of the iceberg. We shall take a deeper look later. But for now
let's use it to classify some case. Well, we have not created any new case. So we shall try to classify
the original data itself. This will give us the so-called "resubstitution error".
ypred = predict(svp, data = x)
The true classes stored in y is just a vector of +1's and -1's. Similarly,
ypred is another vector of +1's and -1's. It is difficult to compare them unless
we make a 2 × 2 contingency table.
table(y,ypred)
Of course we can also find the misclassifcation proportion:
sum(ypred==y)/length(y)
Since we are in a bivariate set up, a scatterplot is a good way to visualise the classifier. The
kernlab package has its own plot function to produce a colourful visualisation.
plot(svp,data=x)
It is not the familiar "strip between classes" plot that we have seen in the theory classes.
It shows the two classes with two colours and the "strength" of the class memberships by varying the
intensity. This is more appropriate than the "strip between classes" plot as we are
doing soft margin classification here.
It is important to understand the plot carefully. The white region is sort of a "no man's zone"
separating the two classes. This is where the medial line of the strip lies. There are
four types of points: filled/open circles/triangles.
The filled points are the support vectors. These are taken from the
training data, and are part of the classifier. Choice of the symbols (circle/triangle)
is done by the true classes.
The open points are from the new data set being classified. In our example we are
working with resubstitution error, and so the new data set is actually the original data set itself.
The choice of the symbols (circle/triangle) is based on the predicted classes.
Now we shall take a deeper look inside the classifier returned by ksvm.
This is the same thing as before, except that we have used scaled=c(F,F) to
prevent R from studentising the measurements before computation. We are preventing this
so that we can easily relate the things we see inside the bowels of ksvm with
the x we have passed to it.
As you can guess svp is a bundle of things returned by ksvm. A typical
way to "unpack" such a bundle is to use
names(svp)
Unfortunately, it will return NULL! This is because R has three different ways by which
it bundles things together. These are called S3, S4 and R5 techniques.
Most of the time we work with S3 objects, and then names is the way to
unpack. But kernlab uses S4 objects. Here the technique is to use
slotNames(svp)
You can see a slot called alphaindex, for example. You might be tempted to use
svp$alphindex
to see its value. But the $ opertor is for S3 objects only. For S4
objects you need to use the @ opertor:
svp@alphindex
You can also use the following command:
alphaindex(svp)
You might think that it is just a vector of some integers.
Look very carefully, and you'll see that it is actually a list of length one, and
the sole member of the list is a vector of integers. If you do not get this point then consider
the difference between the following two lines:
temp1 = 1:10 #a vector of 10 integers
temp2 = list(1:10) #a list with a single element, which is a vector of integers
temp1 + 1 # OK
temp2 + 1 #error: can't add a list with an integer!
temp2[[1]] + 1 # OK
Let's look at the vector svp@alphaindex[[1]] once again. It gives you the indices for the cases
which are the support vectors (the points that land inside the strip). If you want to
look at the measurments (the values of X1 and X2 in our example) for these cases you may use any of the two commands below:
svp@xmatrix
xmatrix(svp)
Again, it is not just a matrix of X's. It is a one-element list with the matrix as its
sole element. Why is ksvm wrapping everything in a one-element list? Because ksvm
is actually meant to work with any finite number of classes while SVM in principle deals with
only binary classification. So ksvm performs a separate SVM for each pair of classes, and
finally produces a list of length k C2 where k is the number of classes. Each
element of the list is the complete SVM output for one particular pair of classes. In our case
we have just k=2 classes, hence the lists are of
length 2 C2 = 1.
Looking at all the slots of the svp object like this is boring. Instead, let's add
some meaning to our exploration by setting a goal: we want to produce the familiar "strip between
classes" plot using svp. If you recall the theory the equation of the medial line
hrough the strip was
∑ αi yi < xi · x > + b = 0 ,
where the sum is effectively only over the support vector cases (as αi = 0 for
the other cases). So we can write it as
< w · x > + b = 0 ,
where
w = ∑ αi yi xi .
So we need to know where R stores the αi's:
svp@alpha
Only the nonzero αi's are stored. Note how they are all in (0,C]. We can of course
multiply these by yi's and then add, but R actually makes
the product αi yi directly available in a slot
called coef:
svp@coef
Notice that these are the as αi's except for
sign. That is because yi's are +1
or -1. So computing w is easy:
w = coef(svp)[[1]]%*%xmatrix(svp)[[1]]
Finally we need b:
b = svp@b[[1]]
Here w = ( w1 , w2 ) is 2-dim. So the medial line
through the strip has equation
w1 x1 + w2 x2 = b.
Let's cast it in the slope intercept form so that we may use abline:
slope = -w[1]/w[2]
intercept.mid = b/w[2]
Similarly we need to find the intercepts for the two boundaries:
We do not need to recompute the slope, as the boundaries are
parallel to the medial line.
Now at last we are ready to plot. We start by making the
scatterplot.
The CV error (average proprtion of misclassifcation in the test
parts) may be obtained like:
svp@cross
Here the sample size is 200. So you can
request k-fold cross validation where 2≤ k ≤
200. If k=0 (the default), then no CV is performed. If
you supply k=1, then an error message is
generated. If k=200 then you get Leave-One-Out (LOO)
CV. If k > 200, then R complains about some index going
out of range (instead of saying something more meaningful like:
You cannot perform k-forld CV if k > sample
size).
Non-linear
We can replace the linear (vanilladot) kernel by
Gaussian Radial Basis Function kernel (rbfdot). A
list of other available kernels may be obtained by looking up the
help for vanilladot.
Notice how the boundaries are curved now. Also notice that despite
the circular contours of the kernel function, the boundaries are
not circular.
The Gaussian Radial Basis Function kernel has a
parameter Σ. It is automatically estimated by R using
a function called sigest. If you want you can
specifysome value for the parameter:
So far all our problems involve only two classes. But, as we have
already mentioned, the ksvm function can deal with
more than two classes as well. It just performs SVM for all
possiblepairs of classes, and finally use majority vote. Let's
see how this is done using Fisher's
Iris Data.
We have used a slightly different form of ksvm here:
Instead of specifying the measurements and true classes as
separate arguements, we have used the formula
Species ~ .
which means "Classify by Species using all the other variables in
the data set".
The predict function will automatically take care of
the majority voting. To convince yourself that ksvm
has indeed performed pairwise SVM just check:
svp@b
You'll see that it is a vector of length 3 = 3 C2.