something about kernel

1.The kernel is effectively a similarity measure, so choosing a kernel according to prior knowledge of invariances as suggested by Robin (+1) is a good idea.

In the absence of expert knowledge, the Radial Basis Function kernel makes a good default kernel (once you have established it is a problem requiring a non-linear model).

The choice of the kernel  parameters can be automated by optimising a cross-valdiation based model selection (or use the radius-margin or span bounds). The simplest thing to do is to minimise a continuous model selection criterion using the Nelder-Mead simplex method, which doesn't require gradient calculation and works well for sensible numbers of hyper-parameters. If you have more than a few hyper-parameters to tune, automated model selection is likely to result insevere over-fitting, due to the variance of the model selection criterion. It is possible to use gradient based optimisation, but the performance gain is not ususally worth the effort of coding it up)

Automated choice of kernels parameters is a tricky issue, as it is very easy to overfit the model selection criterion (typically cross-validation based), and you can end up with a worse model than you started with. Automated model selection also can bias performance evaluation, so make sure your performance evaluation evaluates the who process of fitting the model (training and model selection), for details, see

G. C. Cawley and N. L. C. Talbot, Preventing over-fitting in model selection via Bayesian regularisation of the hyper-parameters, Journal of Machine Learning Research, volume 8, pages 841-861, April 2007. (pdf)

and

G. C. Cawley and N. L. C. Talbot, Over-fitting in model selection and subsequent selection bias in performance evaluation, Journal of Machine Learning Research, 2010. Research, vol. 11, pp. 2079-2107, July 2010.(pdf)

2.

If you are not sure what would be best you can use automatic techniques of selection (e.g. cross validation, ... ). In this case you can even use a combination of classifiers (if your problem is classification) obtained with different kernel.

However, the "advantage" of working with a kernel is that you change the usual "Euclidian" geometry so that it fits your own problem. Also, you should really try to understand what is the interest of a kernel for your problem, what is particular to the geometry of your problem. This can include:

  • Invanriance: if there is a familly of transformations that do not change your problem fundamentally, the kernel should reflect that. Invariance by rotation is contained in the gaussian kernel, but you can think of a lot of other things: translation, homothetie, any group representation, ....
  • What is a good separator ? if you have an idea of what a good separator is (i.e. a good classification rule) in your classification problem, this should be included in the choice of kernel. Remmeber that SVM will give you classifiers of the form


if you know that a linear separator would be a good one, then you can uses Kernel that give affine functions (i.e. K(x,xi)=x,Axi+c), if you think smooth boundaries much in the spirit of smooth KNN would be better, then you can take a gaussian kernel...

原文地址:https://www.cnblogs.com/xiangshancuizhu/p/2253648.html