We ended the last lecture with the convex programming problem
where we wanted to minimise
w'w + C ∑ (1 - yi (w1 x1i + w2 x2i + b))+
w.r.t. ( w1 , w2 ) and b.
This is a convex programming problem no doubt, but a pretty
nasty one because of two reasons:
It involves the (...)+ function, which is not
differentiable, and the point of non-differnatiability depends
on the unknown w1 , w2.
We have to do the optimisation in dimension d+1
where d is the dimension of xi's. As we have seen
in the face recognition problem, d is usually very
large. The sample size n is much smaller in comparison.
We shall now employ a clever trick to eliminate both the
problems. The trick will reduce the problem to a quadratic
programming problem, where the target function will be
differentiable, and we shall be optimising it
w.r.t. only n variables. The only price we shall pay is
that the optimisation will be a constrained one, but the
constraints will be very simple.
The trick is to notice that
C ∑ (1 - yi (w1 x1i + w2 x2i + b))+
is the maximum value of
∑ ai (1 - yi (w1 x1i + w2 x2i + b))
where each ai ∈ [0,C]. So our original unconstrained
minimisation problem now becomes
minw,b maxa [ w'w + C ∑ ai (1 - yi (w1 x1i + w2 x2i +
b)) ].
But for a convex optimisation problem min max = max min, and so
this is the same as
maxa minw,b [ w'w + C ∑ ai (1 - yi (w1 x1i + w2 x2i +
b)) ].
We have cheated here a bit. In fact, the function may be
unbounded below (w.r.t. b) for certain values
of a. This is readily seen because the function is linear
in b with coefficient
-∑ ai yi .
If this is nonzero, we can always choose b to make the
function arbitrarily small. However, since our aim is to
choose a to maximise this minimum, so clearly we shall
never choose a for which this happens. In other words we
shall confine our search only to the a's for which
∑ ai yi = 0.
So our optimisation now becomes
maxa minw [ w'w + C ∑ ai (1 - yi (w1 x1i + w2 x2i)) ],
where w is unconstrained, but a must satisfy two
conditions:
a ∈ [0,C]n and y'a = 0.
Notice that the target function is differentiable as well as convex, and w
is unconstrained. So we can use differentiation to get the
optimal w as
w = ∑ ai yi xi / 2 .
Substituting we are now left with an optimisation problem only
over a:
maxa ∑i ai - (1/4) * ∑i ∑j ai aj yi yj xi' xj
subject to the conditions
a ∈ [0,C]n and y'a = 0.
Thus we have indeed achieved our goal.
Of course, we need to know that values of w and b
in the original problem. These are given in the following
algorithm (from the main reference):