Date: Feb 07, 2014

[Update:[Thu Feb 13 IST 2014]]

SVM (contd)

We ended the last lecture with the convex programming problem where we wanted to minimise
w'w + C ∑ (1 - yi (w1 x1i + w2 x2i + b))+
w.r.t. ( w1 , w2 ) and b.

This is a convex programming problem no doubt, but a pretty nasty one because of two reasons:
  1. It involves the (...)+ function, which is not differentiable, and the point of non-differnatiability depends on the unknown w1 , w2.
  2. We have to do the optimisation in dimension d+1 where d is the dimension of xi's. As we have seen in the face recognition problem, d is usually very large. The sample size n is much smaller in comparison.
We shall now employ a clever trick to eliminate both the problems. The trick will reduce the problem to a quadratic programming problem, where the target function will be differentiable, and we shall be optimising it w.r.t. only n variables. The only price we shall pay is that the optimisation will be a constrained one, but the constraints will be very simple.

The trick is to notice that
C ∑ (1 - yi (w1 x1i + w2 x2i + b))+
is the maximum value of
∑ ai (1 - yi (w1 x1i + w2 x2i + b))
where each ai ∈ [0,C]. So our original unconstrained minimisation problem now becomes
minw,b maxa [ w'w + C ∑ ai (1 - yi (w1 x1i + w2 x2i + b)) ].
But for a convex optimisation problem min max = max min, and so this is the same as
maxa minw,b [ w'w + C ∑ ai (1 - yi (w1 x1i + w2 x2i + b)) ].
We have cheated here a bit. In fact, the function may be unbounded below (w.r.t. b) for certain values of a. This is readily seen because the function is linear in b with coefficient
-∑ ai yi .
If this is nonzero, we can always choose b to make the function arbitrarily small. However, since our aim is to choose a to maximise this minimum, so clearly we shall never choose a for which this happens. In other words we shall confine our search only to the a's for which
∑ ai yi = 0.
So our optimisation now becomes
maxa minw [ w'w + C ∑ ai (1 - yi (w1 x1i + w2 x2i)) ],
where w is unconstrained, but a must satisfy two conditions:
a ∈ [0,C]n and y'a = 0.
Notice that the target function is differentiable as well as convex, and w is unconstrained. So we can use differentiation to get the optimal w as
w = ∑ ai yi xi / 2 .
Substituting we are now left with an optimisation problem only over a:
maxai ai - (1/4) * ∑ij ai aj yi yj xi' xj
subject to the conditions
a ∈ [0,C]n and y'a = 0.
Thus we have indeed achieved our goal.

Of course, we need to know that values of w and b in the original problem. These are given in the following algorithm (from the main reference):



Comment Box is loading comments...