Today I worked through ex1 of Stanford's ML course, which consists of two main parts:
1. Linear regression with one variable
2. Linear regression with multiple variables
Compared with the single-variable case, the multivariate part mainly adds feature normalization: different features are measured on different scales, so their values can differ greatly in magnitude and need to be normalized.
The normalization used here (ex1.pdf) subtracts the mean of each feature from the raw value and divides by that feature's standard deviation, i.e. x_norm = (x - mu) / sigma.
The code is as follows:
function [X_norm, mu, sigma] = featureNormalize(X)
X_norm = X;
mu = zeros(1, size(X, 2));
sigma = zeros(1, size(X, 2));

% ====================== YOUR CODE HERE ======================
% Instructions: First, for each feature dimension, compute the mean
%               of the feature and subtract it from the dataset,
%               storing the mean value in mu. Next, compute the
%               standard deviation of each feature and divide
%               each feature by its standard deviation, storing
%               the standard deviation in sigma.
%
%               Note that X is a matrix where each column is a
%               feature and each row is an example. You need
%               to perform the normalization separately for
%               each feature.
%
% Hint: You might find the 'mean' and 'std' functions useful.
%
mu = mean(X_norm);          % column-wise mean of each feature
sigma = std(X_norm, 0, 1);  % column-wise standard deviation of each feature
for i = 1:size(X, 2)
    X_norm(:, i) = (X_norm(:, i) - mu(i)) / sigma(i);
end

end
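Since mean and std already operate column-wise, the per-column loop can also be collapsed into a single vectorized statement. A minimal sketch, assuming automatic broadcasting (built into Octave; available in MATLAB from R2016b):

mu = mean(X);
sigma = std(X);
X_norm = (X - mu) ./ sigma;  % broadcasting subtracts mu and divides by sigma per column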
Beyond that, the linear regression code is essentially the same whether there is a single feature (variable) or multiple features; the main parts are computing the cost function and running gradient descent.
The cost function, J(theta) = (1/(2m)) * sum over i of (theta' * x(i) - y(i))^2, is computed as follows:
function J = computeCost(X, y, theta)
%COMPUTECOST Compute cost for linear regression
%   J = COMPUTECOST(X, y, theta) computes the cost of using theta as the
%   parameter for linear regression to fit the data points in X and y

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta
%               You should set J to the cost.

% Accumulate the squared error over all training examples.
% ('total' is used rather than 'sum' to avoid shadowing the built-in function.)
total = 0;
for i = 1:m
    % single-variable version: total = total + (theta(1) + theta(2)*X(i,2) - y(i))^2;
    total = total + (X(i, :) * theta - y(i))^2;
end
J = total / (2 * m);

end
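The same cost can also be computed without the loop. A vectorized sketch, equivalent to the loop above:

err = X * theta - y;         % m x 1 vector of residuals
J = (err' * err) / (2 * m);  % sum of squared residuals, divided by 2m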
The gradient descent function (identical for multiple features and a single feature) is as follows:
function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
%GRADIENTDESCENTMULTI Performs gradient descent to learn theta
%   theta = GRADIENTDESCENTMULTI(x, y, theta, alpha, num_iters) updates theta by
%   taking num_iters gradient steps with learning rate alpha

% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);

for iter = 1:num_iters

    % ====================== YOUR CODE HERE ======================
    % Instructions: Perform a single gradient step on the parameter vector
    %               theta.
    %
    % Hint: While debugging, it can be useful to print out the values
    %       of the cost function (computeCostMulti) and gradient here.
    %

    % Accumulate the gradient over all training examples.
    % ('grad' is used rather than 'sum' to avoid shadowing the built-in function.)
    grad = zeros(size(theta));
    for i = 1:m
        grad = grad + (X(i, :) * theta - y(i)) * X(i, :)';
    end
    theta = theta - alpha * grad / m;

    % ============================================================

    % Save the cost J in every iteration
    J_history(iter) = computeCostMulti(X, y, theta);

end

end
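The inner loop over examples can likewise be vectorized. A sketch of the equivalent single update step, with the same X, y, theta, and alpha as above:

theta = theta - (alpha / m) * X' * (X * theta - y);  % one gradient step on all examples at once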
That covers the main pieces of code. One more note: to estimate y for given feature values (e.g. to predict a house price), the given features must first be normalized exactly as before; the mu and sigma returned by the normalization function make this straightforward.
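For example, a sketch of predicting the price of a hypothetical 1650-square-foot, 3-bedroom house (assuming theta was learned on normalized features with a leading column of ones for the intercept, and mu and sigma come from featureNormalize above):

x = ([1650 3] - mu) ./ sigma;  % normalize the new example with the training mu and sigma
price = [1, x] * theta;        % prepend the intercept term, then predict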
In addition, there is an algebraic way to solve the regression problem, called the normal equation, which computes theta in closed form:
theta = (X' * X)^(-1) * X' * y
Compared with gradient descent, the normal equation has the following advantages and disadvantages:
Advantages:
1. No need to choose a learning rate in advance
2. No iteration (gradient descent needs many iterations)
3. No feature normalization required (gradient descent needs it to converge well)
Disadvantage:
When the number of features n is very large, the normal equation becomes very expensive to compute (inverting X'*X costs roughly O(n^3)), and gradient descent should be chosen instead. A minimal sketch of the closed-form solution follows.
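This uses pinv rather than inv so that a singular X'*X is still handled:

function theta = normalEqn(X, y)
    % Closed-form least-squares solution: theta = (X'X)^(-1) X'y
    theta = pinv(X' * X) * X' * y;
end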