Where is it best to use an SVM with a linear kernel?

One more thing to add: a linear SVM is less prone to overfitting than a non-linear one. Which kernel to choose depends on your situation: if the number of features is very large compared to the number of training samples, just use a linear kernel; if the number of features is small but the number of training samples is large, a linear kernel may still be the right choice, but try to add more features; if the number of features is small (10^0 - 10^3) and the number of samples is intermediate (10^1 - 10^4), a Gaussian kernel will usually work better.
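
To make the rule of thumb above concrete, here is a minimal sketch of it as a function. The name `suggest_kernel` is hypothetical, and the cutoff "features at least as many as samples" is my own assumption for "very large compared to the training sample"; the other thresholds are taken from the ranges quoted above. Treat the result as a starting point, not a verdict.

```python
def suggest_kernel(n_features: int, n_samples: int) -> str:
    """Return a starting-point kernel following the rule of thumb above.

    The n_features >= n_samples cutoff is an assumption made for this
    sketch; the answer only says "really large compared to the sample".
    """
    # Many features relative to the number of training samples.
    if n_features >= n_samples:
        return "linear"
    # Few features (up to ~10^3) and an intermediate sample (up to ~10^4).
    if n_features <= 1_000 and n_samples <= 10_000:
        return "rbf"
    # Remaining case: few features relative to a large sample.
    return "linear (consider adding features)"


print(suggest_kernel(10_000, 500))
print(suggest_kernel(20, 5_000))
print(suggest_kernel(20, 1_000_000))
```

In practice you would still cross-validate both kernels whenever training time allows.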

As far as I know, an SVM with a linear kernel usually performs comparably to logistic regression.
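
One way to see why the two are comparable: both fit a linear score w·x + b, and they differ mainly in the loss applied to that score (hinge loss for a soft-margin linear SVM, logistic loss for logistic regression). The sketch below just evaluates the two losses for a positive example at a few scores; the function names are mine, not from any library.

```python
import math


def hinge_loss(y: int, score: float) -> float:
    """Hinge loss for a label y in {-1, +1} and a linear score w.x + b."""
    return max(0.0, 1.0 - y * score)


def logistic_loss(y: int, score: float) -> float:
    """Logistic loss for a label y in {-1, +1} and a linear score w.x + b."""
    return math.log1p(math.exp(-y * score))


# Compare the two losses for a positive example at several scores.
for score in (-2.0, 0.0, 1.0, 3.0):
    print(score, hinge_loss(1, score), round(logistic_loss(1, score), 4))
```

Both losses grow roughly linearly for badly misclassified points. The hinge loss is exactly zero once the margin reaches 1, while the logistic loss only approaches zero.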


Linear kernels are best applied to linearly separable data. Imagine your dataset has only 2 features and 2 classes. If you plot the samples on a chart using the 2 features as the X and Y axes, you'll be able to see how samples from different classes are positioned relative to each other.

If it's easy to draw a line that separates the two classes, then a linear kernel is great for the job.

Of course, this works with many features, not only two; the data then lives in a multi-dimensional space and the separating line becomes a hyperplane. However, if your data is not linearly separable, you'll need to map your samples into another (typically higher-dimensional) space, using kernels like RBF or polynomial.

Also, since the linear kernel does not perform any mapping, training a classifier with it is generally faster than with other kernels.
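
A small sketch of what "no mapping" buys you. With a linear kernel the kernel value is just a dot product, so the sum over support vectors collapses into a single weight vector w; with an RBF kernel no such collapse is available and every support vector has to be visited. The support vectors, labels, coefficients and bias below are made-up values for illustration, not the output of a real training run.

```python
import math


def linear_kernel(x, z):
    """Plain dot product: no feature mapping involved."""
    return sum(a * b for a, b in zip(x, z))


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


# Hypothetical trained model (values invented for this example).
support_vectors = [(1.0, 2.0), (2.0, 0.5), (-1.0, -1.5)]
labels = [1, 1, -1]
alphas = [0.3, 0.2, 0.5]
bias = -0.1


def decision_kernel(x, kernel):
    """Generic kernel decision function: one kernel call per support vector."""
    total = sum(a * y * kernel(sv, x)
                for a, y, sv in zip(alphas, labels, support_vectors))
    return total + bias


# Linear case only: fold all support vectors into one weight vector.
w = [sum(a * y * sv[i] for a, y, sv in zip(alphas, labels, support_vectors))
     for i in range(2)]


def decision_linear(x):
    """Equivalent linear decision function: a single dot product with w."""
    return linear_kernel(w, x) + bias


x_new = (0.5, 1.0)
print(decision_kernel(x_new, linear_kernel), decision_linear(x_new))
print(decision_kernel(x_new, rbf_kernel))
```

The two linear results agree up to floating-point rounding. Being able to work with w directly is, as far as I know, also what dedicated linear solvers exploit during training.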


An SVM with a linear kernel is indeed one of the simplest classifiers, but it is not surprising to get very high accuracy from it when the data is linearly separable.

In this sense, I think your opinion is correct. However, you do need to realize that the power of SVMs lies in their extension to much more complex non-linear kernels (e.g., RBF).

One link on choosing classifiers.


The linear kernel has some advantages, but probably (in my opinion) the most significant one is that it is generally much faster to train than non-linear kernels such as RBF.

If your dataset size is on the order of gigabytes, the difference in training time can be huge (minutes vs. hours).
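
A rough back-of-the-envelope sketch of where that gap comes from. A kernel method works with pairwise kernel values between samples, so a fully materialized kernel (Gram) matrix has n^2 entries, while a linear model only needs one weight per feature. The helper names and the 8 bytes per entry (float64) are assumptions for illustration; real kernel solvers usually cache only part of the matrix, so read this as an indication of quadratic growth rather than an exact memory requirement.

```python
def gram_matrix_bytes(n_samples: int, bytes_per_entry: int = 8) -> int:
    """Size of a fully materialized n x n kernel matrix (float64 assumed)."""
    return n_samples ** 2 * bytes_per_entry


def weight_vector_bytes(n_features: int, bytes_per_entry: int = 8) -> int:
    """Size of the single weight vector a linear model keeps."""
    return n_features * bytes_per_entry


# Kernel matrix size for growing sample counts, in gigabytes.
for n in (10_000, 100_000, 1_000_000):
    print(n, gram_matrix_bytes(n) / 1e9, "GB")

# Weight vector for 1,000 features, in bytes.
print(weight_vector_bytes(1_000), "bytes")
```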