Stanford Machine Learning Course: Linear Regression Exercise (ex1) Walkthrough

Today I worked through Stanford ML's ex1, which consists of two parts:

1. Linear regression with one variable

2. Linear regression with multiple variables

Compared with the single-variable case, the multivariate part mainly adds feature normalization: different features are measured on different scales, so their values can differ greatly in magnitude and need to be normalized.

The normalization used here (ex1.pdf) subtracts each feature's mean from its values and divides by that feature's standard deviation, i.e. x_norm = (x - mu) / sigma.

The code is as follows:

function [X_norm, mu, sigma] = featureNormalize(X)
X_norm = X;
mu = zeros(1, size(X, 2));
sigma = zeros(1, size(X, 2));

% ====================== YOUR CODE HERE ======================
% Instructions: First, for each feature dimension, compute the mean
%               of the feature and subtract it from the dataset,
%               storing the mean value in mu. Next, compute the
%               standard deviation of each feature and divide
%               each feature by its standard deviation, storing
%               the standard deviation in sigma.
%
%               Note that X is a matrix where each column is a
%               feature and each row is an example. You need
%               to perform the normalization separately for
%               each feature.
%
% Hint: You might find the 'mean' and 'std' functions useful.
%      
mu = mean(X_norm);          % column-wise mean of each feature
sigma = std(X_norm, 0, 1);  % column-wise (sample) standard deviation of each feature
for i = 1:size(X, 2)
    % normalize each feature column: subtract its mean, divide by its std
    X_norm(:, i) = (X_norm(:, i) - mu(i)) / sigma(i);
end

end
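For context, featureNormalize would typically be called like this before running gradient descent (a minimal sketch, assuming a design matrix X and label vector y have already been loaded from the ex1 data; the name X_design is used here only for illustration):

% normalize features, then prepend the intercept column of ones
[X_norm, mu, sigma] = featureNormalize(X);
X_design = [ones(size(X, 1), 1), X_norm];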

Whether there is a single feature or many, the linear regression code is essentially the same; the main work is computing the cost function and running gradient descent.

The cost function is as follows:

function J = computeCost(X, y, theta)
%COMPUTECOST Compute cost for linear regression
%   J = COMPUTECOST(X, y, theta) computes the cost of using theta as the
%   parameter for linear regression to fit the data points in X and y

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta
%               You should set J to the cost.
% accumulate the squared errors over all m examples
% (the accumulator is not named 'sum', to avoid shadowing the built-in function)
sq_err = 0;
for i = 1:m
    % X(i,:)*theta is the hypothesis h_theta(x^(i)) for example i
    % (single-variable special case: theta(1) + theta(2)*X(i,2))
    sq_err = sq_err + (X(i,:) * theta - y(i))^2;
end
J = sq_err / (2 * m);
end
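For reference, the same cost can also be computed without the loop; a minimal vectorized sketch, equivalent to the code above:

err = X * theta - y;          % column vector of residuals
J = (err' * err) / (2 * m);   % sum of squared residuals divided by 2m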

The gradient descent function (the same code handles one feature or many) is as follows:

function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
%GRADIENTDESCENTMULTI Performs gradient descent to learn theta
%   theta = GRADIENTDESCENTMULTI(x, y, theta, alpha, num_iters) updates theta by
%   taking num_iters gradient steps with learning rate alpha

% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);

for iter = 1:num_iters

    % ====================== YOUR CODE HERE ======================
    % Instructions: Perform a single gradient step on the parameter vector
    %               theta.
    %
    % Hint: While debugging, it can be useful to print out the values
    %       of the cost function (computeCostMulti) and gradient here.
    %
    % accumulate the gradient over all m examples
    % (the accumulator is not named 'sum', to avoid shadowing the built-in function)
    grad = zeros(size(theta));
    for i = 1:m
        % (X(i,:)*theta - y(i)) is the scalar error for example i;
        % multiplying by X(i,:)' gives its contribution to each theta_j
        grad = grad + (X(i,:) * theta - y(i)) * X(i,:)';
    end
    % simultaneous update of all parameters
    theta = theta - alpha * grad / m;
    % ============================================================

    % Save the cost J in every iteration   
    J_history(iter) = computeCostMulti(X, y, theta);

end

end
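The inner loop over examples can also be written in vectorized form; a minimal sketch of the per-iteration update, equivalent to the loop above:

theta = theta - (alpha / m) * X' * (X * theta - y);   % batch gradient step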

That covers the main pieces of code. One more note: to predict y for a new example (e.g., estimating a house price from its features), the new feature values must first be normalized with the same transformation used on the training set; the mu and sigma returned by featureNormalize make this easy.
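For example, a minimal sketch of predicting a house price (the concrete values 1650 square feet and 3 bedrooms are only illustrative, following the ex1 data format; mu, sigma, theta come from the training steps above):

x = [1650, 3];                    % raw feature values of the new example
x_norm = (x - mu) ./ sigma;       % apply the training-set normalization
price = [1, x_norm] * theta;      % prepend the intercept term, then predict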

There is also a purely algebraic way to solve the regression problem, called the normal equation, which computes theta in closed form:

theta = (X' * X)^(-1) * X' * y
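A minimal sketch of that computation in Octave/MATLAB (pinv is used instead of an explicit inverse so the result stays well-behaved when X'*X is close to singular):

theta = pinv(X' * X) * X' * y;   % closed-form least-squares solution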

Compared with gradient descent, the normal equation has the following advantages and disadvantages:

Advantages:

1. No learning rate has to be chosen in advance

2. No iterations are needed (gradient descent requires many)

3. No feature normalization is needed (gradient descent benefits from it)

Disadvantage:

When the number of features is large, solving the normal equation (which requires inverting X'X, roughly cubic in the number of features) becomes very expensive; in that case gradient descent is the better choice.

Original post: https://www.cnblogs.com/burness/p/3246465.html