SVM (contd)

We ended the last lecture with the convex programming problem where we wanted to minimise

w'w + C ∑ (1 - y_i (w₁ x_1i + w₂ x_2i + b))₊

w.r.t. ( w₁ , w₂ ) and b.

This is a convex programming problem no doubt, but a pretty nasty one because of two reasons:

It involves the (...)₊ function, which is not differentiable, and the point of non-differnatiability depends on the unknown w₁ , w₂.
We have to do the optimisation in dimension d+1 where d is the dimension of x_i's. As we have seen in the face recognition problem, d is usually very large. The sample size n is much smaller in comparison.

We shall now employ a clever trick to eliminate both the problems. The trick will reduce the problem to a quadratic programming problem, where the target function will be differentiable, and we shall be optimising it w.r.t. only n variables. The only price we shall pay is that the optimisation will be a constrained one, but the constraints will be very simple.

The trick is to notice that

C ∑ (1 - y_i (w₁ x_1i + w₂ x_2i + b))₊

is the maximum value of

∑ a_i (1 - y_i (w₁ x_1i + w₂ x_2i + b))

where each a_i ∈ [0,C]. So our original unconstrained minimisation problem now becomes

min_w,b max_a [ w'w + C ∑ a_i (1 - y_i (w₁ x_1i + w₂ x_2i + b)) ].

But for a convex optimisation problem min max = max min, and so this is the same as

max_a min_w,b [ w'w + C ∑ a_i (1 - y_i (w₁ x_1i + w₂ x_2i + b)) ].

We have cheated here a bit. In fact, the function may be unbounded below (w.r.t. b) for certain values of a. This is readily seen because the function is linear in b with coefficient

-∑ a_i y_i .

If this is nonzero, we can always choose b to make the function arbitrarily small. However, since our aim is to choose a to maximise this minimum, so clearly we shall never choose a for which this happens. In other words we shall confine our search only to the a's for which

∑ a_i y_i = 0.

So our optimisation now becomes

max_a min_w [ w'w + C ∑ a_i (1 - y_i (w₁ x_1i + w₂ x_2i)) ],

where w is unconstrained, but a must satisfy two conditions:

a ∈ [0,C]ⁿ and y'a = 0.

Notice that the target function is differentiable as well as convex, and w is unconstrained. So we can use differentiation to get the optimal w as

w = ∑ a_i y_i x_i / 2 .

Substituting we are now left with an optimisation problem only over a:

max_a ∑_i a_i - (1/4) * ∑_i ∑_j a_i a_j y_i y_j x_i' x_j

subject to the conditions

a ∈ [0,C]ⁿ and y'a = 0.

Thus we have indeed achieved our goal.

Of course, we need to know that values of w and b in the original problem. These are given in the following algorithm (from the main reference):

Comment Box is loading comments...