Given training data x with labels y = f(x), you want to separate the two classes with a hyperplane w \cdot x + b = 0. There are many solutions (choices of w and b); which one is better? The one with the maximum margin (margin width).
Support vectors are the data points that the margin pushes up against (they lie on the border of the margin). The maximum margin implies that only the support vectors are important; the other training examples are ignorable.
The margin width is M = 2 / ||w||. Maximizing the margin is the same as minimizing (1/2) ||w||^2 = (1/2) w \cdot w (a dot product).
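As a quick sketch of this relationship, the margin width can be computed directly from w (the weight vector w = [3, 4] below is just an illustrative value, not from any trained model):

```python
import math

def margin_width(w):
    # Margin M = 2 / ||w||: a smaller norm ||w|| gives a wider margin,
    # which is why maximizing M is the same as minimizing w . w.
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

print(margin_width([3.0, 4.0]))  # ||w|| = 5, so M = 0.4
```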
To solve for w and b:
Minimize (1/2) ||w||^2 subject to y_i (w \cdot x_i + b) >= 1 for every training point (x_i, y_i).
This is a quadratic optimization problem. Its dual form is: maximize \sum_i \alpha_i - (1/2) \sum_i \sum_j \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) subject to \alpha_i >= 0 and \sum_i \alpha_i y_i = 0.
The solution has the form: w = \sum_i \alpha_i y_i x_i, and b = y_k - w \cdot x_k for any x_k with \alpha_k != 0.
Each non-zero \alpha_i indicates the corresponding x_i is a support vector. The classifying function is: f(x) = sign(\sum_i \alpha_i y_i (x_i \cdot x) + b).
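A minimal pure-Python sketch of this classifying function. The support vectors, alphas, and bias below are toy values for a 1-D problem (training points x = +1 and x = -1, for which w = 1, b = 0, and \alpha = 0.5 for each point), not a trained model:

```python
def classify(x, support_vectors, alphas, labels, b):
    # f(x) = sign( sum_i alpha_i * y_i * (x_i . x) + b ):
    # only the support vectors (non-zero alpha_i) contribute to the sum.
    s = b
    for x_i, a_i, y_i in zip(support_vectors, alphas, labels):
        s += a_i * y_i * sum(u * v for u, v in zip(x_i, x))
    return 1 if s >= 0 else -1

# Toy 1-D problem: x = +1 (y = +1) and x = -1 (y = -1).
svs, alphas, labels = [[1.0], [-1.0]], [0.5, 0.5], [1, -1]
print(classify([2.0], svs, alphas, labels, 0.0))   # -> 1
print(classify([-2.0], svs, alphas, labels, 0.0))  # -> -1
```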
Soft margin is to minimize
A linear classifier is a separating hyperplane; the support vectors (the most important training points) define the hyperplane.
Non-linear SVM: map the data to a higher-dimensional feature space where the training data is separable.
A linear classifier relies on dot products between vectors; a non-linear SVM relies on a kernel function instead. A kernel function is some function that corresponds to an inner product in a feature space. Kernel function examples:
Linear: K(x_i, x_j) = x_i \cdot x_j
Polynomial of power p: K(x_i, x_j) = (x_i \cdot x_j + 1)^p
Gaussian (radial-basis function): K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 \sigma^2)), where \sigma is often chosen as roughly the distance between the closest points with different classifications.
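The three kernels above can be written directly as small functions (a sketch; parameter defaults p = 2 and sigma = 1.0 are arbitrary choices for illustration):

```python
import math

def linear_kernel(x, z):
    # K(x, z) = x . z
    return sum(xi * zi for xi, zi in zip(x, z))

def polynomial_kernel(x, z, p=2):
    # K(x, z) = (x . z + 1)^p
    return (linear_kernel(x, z) + 1) ** p

def rbf_kernel(x, z, sigma=1.0):
    # K(x, z) = exp(-||x - z||^2 / (2 sigma^2))
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-sq_dist / (2 * sigma ** 2))

print(linear_kernel([1.0, 2.0], [3.0, 4.0]))      # -> 11.0
print(polynomial_kernel([1.0, 2.0], [3.0, 4.0]))  # (11 + 1)^2 -> 144.0
print(rbf_kernel([0.0, 0.0], [0.0, 0.0]))         # identical points -> 1.0
```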
A non-linear SVM locates a separating hyperplane in the feature space and classifies points in that space. It does not need to represent the space explicitly; it simply defines a kernel function, which plays the role of the dot product in the feature space.
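To make "corresponds to an inner product in feature space" concrete: for the degree-2 polynomial kernel in 2-D, the feature map can be written out explicitly, and the kernel value equals a plain dot product of the mapped vectors (the points x and z below are arbitrary example inputs):

```python
import math

def phi(x):
    # Explicit feature map for the 2-D degree-2 polynomial kernel:
    # (x . z + 1)^2 = phi(x) . phi(z)
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [1.0, r2 * x1, r2 * x2, x1 * x1, x2 * x2, r2 * x1 * x2]

def poly_kernel(x, z):
    return (x[0] * z[0] + x[1] * z[1] + 1) ** 2

x, z = [1.0, 2.0], [3.0, -1.0]
lhs = poly_kernel(x, z)                           # kernel in input space
rhs = sum(a * b for a, b in zip(phi(x), phi(z)))  # dot product in feature space
print(lhs, rhs)  # both equal 4 (up to float rounding)
```

The kernel trick is exactly this: the left-hand side never builds the 6-dimensional feature vectors, yet computes the same inner product.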
Weaknesses of SVM
It is sensitive to noise.
It only considers two classes (binary classification). How do we use it to classify multiple categories? If you have m categories, you train m SVMs: SVM 1 learns "output == 1" vs "output != 1", ..., SVM m learns "output == m" vs "output != m". At prediction time, run the new input through each SVM and pick the class with the best (most confident) score.
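This one-vs-rest scheme can be sketched as follows. The three scorers here are made-up linear decision functions standing in for trained SVMs, each returning a signed "class k vs not class k" score:

```python
def one_vs_rest_predict(x, decision_functions):
    # decision_functions: one scorer per class; each returns the signed
    # margin of "output == k" vs "output != k" for input x.
    # Predict the class whose SVM is most confident.
    scores = [f(x) for f in decision_functions]
    return max(range(len(scores)), key=lambda k: scores[k])

# Hypothetical decision functions (not trained SVMs), one per class:
fs = [lambda x: x[0] - 1, lambda x: -x[0], lambda x: x[0] - 3]
print(one_vs_rest_predict([2.0], fs))  # scores 1, -2, -1 -> class 0
```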
--most contents are from: http://www.cs.cmu.edu/~awm/tutorials