Today I worked through ex1 of Stanford's ML course, which consists of two main parts:
1. Linear regression with one variable
2. Linear regression with multiple variables
Compared with the single-variable case, the multivariate part mainly adds feature normalization: different features are measured on different scales, so their values can differ greatly in magnitude and need to be normalized.
The normalization used here (ex1.pdf) subtracts the mean of each feature from the raw value and divides by that feature's standard deviation, i.e. x_norm = (x - mu) / sigma.
The code is as follows:
function [X_norm, mu, sigma] = featureNormalize(X)
X_norm = X;
mu = zeros(1, size(X, 2));
sigma = zeros(1, size(X, 2));

% ====================== YOUR CODE HERE ======================
% Instructions: First, for each feature dimension, compute the mean
%               of the feature and subtract it from the dataset,
%               storing the mean value in mu. Next, compute the
%               standard deviation of each feature and divide
%               each feature by its standard deviation, storing
%               the standard deviation in sigma.
%
%               Note that X is a matrix where each column is a
%               feature and each row is an example. You need
%               to perform the normalization separately for
%               each feature.
%
% Hint: You might find the 'mean' and 'std' functions useful.
%
mu = mean(X_norm);          % column-wise mean of each feature
sigma = std(X_norm, 0, 1);  % column-wise standard deviation of each feature
for i = 1:size(X, 2)
    X_norm(:, i) = (X_norm(:, i) - mu(i)) / sigma(i);
end

end
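Since mean and std already operate column-wise, the per-column loop can also be collapsed into a single vectorized statement. A minimal sketch, assuming automatic broadcasting (built into Octave; available in MATLAB from R2016b):

mu = mean(X);
sigma = std(X);
X_norm = (X - mu) ./ sigma;  % broadcasting subtracts mu and divides by sigma per column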
Beyond that, the linear regression code is essentially the same whether there is a single feature (variable) or multiple features; the main parts are computing the cost function and running gradient descent.
The cost function, J(theta) = (1/(2m)) * sum over i of (theta' * x(i) - y(i))^2, is computed as follows:
function J = computeCost(X, y, theta)
%COMPUTECOST Compute cost for linear regression
%   J = COMPUTECOST(X, y, theta) computes the cost of using theta as the
%   parameter for linear regression to fit the data points in X and y

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta
%               You should set J to the cost.

% Accumulate the squared error over all training examples.
% ('total' is used rather than 'sum' to avoid shadowing the built-in function.)
total = 0;
for i = 1:m
    % single-variable version: total = total + (theta(1) + theta(2)*X(i,2) - y(i))^2;
    total = total + (X(i, :) * theta - y(i))^2;
end
J = total / (2 * m);

end
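The same cost can also be computed without the loop. A vectorized sketch, equivalent to the loop above:

err = X * theta - y;         % m x 1 vector of residuals
J = (err' * err) / (2 * m);  % sum of squared residuals, divided by 2m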
The gradient descent function (identical for multiple features and a single feature) is as follows:
function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
%GRADIENTDESCENTMULTI Performs gradient descent to learn theta
%   theta = GRADIENTDESCENTMULTI(x, y, theta, alpha, num_iters) updates theta by
%   taking num_iters gradient steps with learning rate alpha

% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);

for iter = 1:num_iters

    % ====================== YOUR CODE HERE ======================
    % Instructions: Perform a single gradient step on the parameter vector
    %               theta.
    %
    % Hint: While debugging, it can be useful to print out the values
    %       of the cost function (computeCostMulti) and gradient here.
    %

    % Accumulate the gradient over all training examples.
    % ('grad' is used rather than 'sum' to avoid shadowing the built-in function.)
    grad = zeros(size(theta));
    for i = 1:m
        grad = grad + (X(i, :) * theta - y(i)) * X(i, :)';
    end
    theta = theta - alpha * grad / m;

    % ============================================================

    % Save the cost J in every iteration
    J_history(iter) = computeCostMulti(X, y, theta);

end

end
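The inner loop over examples can likewise be vectorized. A sketch of the equivalent single update step, with the same X, y, theta, and alpha as above:

theta = theta - (alpha / m) * X' * (X * theta - y);  % one gradient step on all examples at once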
That covers the main pieces of code. One more note: to estimate y for given feature values (e.g. to predict a house price), the given features must first be normalized exactly as before; the mu and sigma returned by the normalization function make this straightforward.
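For example, a sketch of predicting the price of a hypothetical 1650-square-foot, 3-bedroom house (assuming theta was learned on normalized features with a leading column of ones for the intercept, and mu and sigma come from featureNormalize above):

x = ([1650 3] - mu) ./ sigma;  % normalize the new example with the training mu and sigma
price = [1, x] * theta;        % prepend the intercept term, then predict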
In addition, there is an algebraic way to solve the regression problem, called the normal equation, which computes theta in closed form:
theta = (X' * X)^(-1) * X' * y
Compared with gradient descent, the normal equation has the following advantages and disadvantages:
Advantages:
1. No need to choose a learning rate in advance
2. No iteration (gradient descent needs many iterations)
3. No feature normalization required (gradient descent needs it to converge well)
Disadvantage:
When the number of features n is very large, the normal equation becomes very expensive to compute (inverting X'*X costs roughly O(n^3)), and gradient descent should be chosen instead. A minimal sketch of the closed-form solution follows.
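This uses pinv rather than inv so that a singular X'*X is still handled:

function theta = normalEqn(X, y)
    % Closed-form least-squares solution: theta = (X'X)^(-1) X'y
    theta = pinv(X' * X) * X' * y;
end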