Regularization and model selection

Suppose we are trying to select among several different models for a learning problem. For instance, we might be using a polynomial regression model h_θ(x) = g(θ₀ + θ₁x + θ₂x² + ··· + θ_k x^k), and wish to decide whether k should be 0, 1, ..., or 10. How can we automatically select a model that represents a good tradeoff between the twin evils of bias and variance? Alternatively, suppose we want to automatically choose the bandwidth parameter τ for locally weighted regression, or the parameter C for our ℓ₁-regularized SVM. How can we do that?

For the sake of concreteness, in these notes we assume we have some finite set of models M = {M₁, ..., M_d} that we’re trying to select among. For instance, in our first example above, the model M_i would be an i-th order polynomial regression model. (The generalization to infinite M is not hard.²)

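To make the finite model set concrete, here is a minimal sketch in Python with NumPy. It builds one candidate model per polynomial degree k = 0, ..., 10 and fits each by least squares. Several things here are assumptions for illustration and do not come from the notes: g is taken to be the identity, the data are synthetic, and the names fit_polynomial, predict, models, and train_mse are hypothetical.

```python
import numpy as np


def fit_polynomial(x, y, k):
    """Least-squares fit of a degree-k polynomial; returns theta_0, ..., theta_k."""
    # Design matrix with columns 1, x, x^2, ..., x^k.
    X = np.vander(x, k + 1, increasing=True)
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta


def predict(theta, x):
    """Evaluate h_theta(x) = theta_0 + theta_1 x + ... + theta_k x^k (g = identity)."""
    X = np.vander(x, len(theta), increasing=True)
    return X @ theta


# Synthetic training set (an assumption; any 1-D regression data would do).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=30)
y = np.sin(3.0 * x) + 0.1 * rng.normal(size=30)

# The finite model set: one fitted polynomial per degree k = 0, ..., 10.
degrees = range(0, 11)
models = {k: fit_polynomial(x, y, k) for k in degrees}

# Training mean squared error of each candidate model.
train_mse = {
    k: float(np.mean((predict(theta, x) - y) ** 2))
    for k, theta in models.items()
}

for k in degrees:
    print(f"degree {k:2d}: training MSE = {train_mse[k]:.6f}")
```

Because each degree-k model contains all lower-degree models as special cases, the training error cannot increase as k grows. Training error on its own therefore always favors the largest k, which is why the selection question posed above needs a different criterion.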