regularization

By keeping each \(\theta_i\ (i > 0)\) small, we can avoid overfitting (why?)
Add \(\frac{\lambda}{2m} \sum_{i=1}^{n} \theta_i^2\) to \(J(\theta)\) (note that \(\theta_0\) is not penalized)
Note that the denominator here is \(2m\), not \(2n\) (why?)
But if \(\lambda\) is too large, it causes underfitting (all the \(\theta_i\) are driven toward 0)
As \(\lambda\) increases, the training error increases monotonically (why?)
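
Written out in full, the regularized cost for logistic regression (exactly what the costFunctionReg code below computes) is:

\[
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2
\]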

For gradient descent:

With the penalty term, \(gradient(i)\) (for \(i > 0\)) gains an extra \(\frac{\lambda}{m} \theta_i\), so each update subtracts an additional \(\alpha \frac{\lambda}{m} \theta_i\) (note that \(gradient(0)\) is unchanged); see the update rule written out below
After this adjustment \(J\) is still bowl-shaped (convex)
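
Written out, the update for \(j \ge 1\) becomes:

\[
\theta_j := \theta_j \left(1 - \alpha \frac{\lambda}{m}\right) - \alpha \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}
\]

The factor \(1 - \alpha \frac{\lambda}{m} < 1\) shrinks \(\theta_j\) slightly on every step, which is the "weight decay" view of regularization.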

function [J, grad] = costFunctionReg(theta, X, Y, lambda)
	m = length(Y);              % number of training examples
	n = length(theta);          % number of parameters
	H = sigmoid(X * theta);     % hypothesis h_theta(x) for all examples
	% cross-entropy cost plus the penalty term; theta(1), i.e. theta_0, is excluded
	J = -1 / m * (Y' * log(H) + (1 - Y)' * log(1 - H)) + lambda / (2*m) * sum(theta(2:n).^2);
	% gradient; the leading 0 in [0; ...] keeps grad(1) (for theta_0) unregularized
	grad = 1 / m * (X' * (H - Y) + [0; lambda .* theta(2:n)]);
end
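
A typical way to call this — a sketch, assuming X already has the bias column of ones prepended, sigmoid.m is on the path, and lambda = 1 is just a placeholder value:

initial_theta = zeros(size(X, 2), 1);
lambda = 1;                                    % regularization strength (assumed value)
options = optimset('GradObj', 'on', 'MaxIter', 400);
[theta, J] = fminunc(@(t) costFunctionReg(t, X, Y, lambda), initial_theta, options);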

For the normal equation:

\(\theta = \left( X^T X + \lambda \begin{bmatrix} 0 \\ & 1 \\ & & \ddots \\ & & & 1 \end{bmatrix} \right)^{-1} X^T Y\) (why?)
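
A minimal Octave sketch of this closed-form solution (normalEqnReg is a hypothetical helper name; pinv is used so it still behaves when the matrix is near-singular):

function theta = normalEqnReg(X, Y, lambda)
	% L is the identity with the (1,1) entry zeroed, so theta_0 is not regularized
	n = size(X, 2);
	L = eye(n);
	L(1, 1) = 0;
	theta = pinv(X' * X + lambda * L) * (X' * Y);
end

A side benefit: for \(\lambda > 0\), \(X^T X + \lambda L\) is invertible even when \(X^T X\) alone is not (e.g. when \(m \le n\)).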

Original post: https://www.cnblogs.com/acha/p/11042145.html