CheeseZH: Stanford University: Machine Learning Ex3: Multiclass Logistic Regression and Neural Network Prediction

Handwritten digit recognition (0-9)

Multi-class Logistic Regression

1. Vectorizing Logistic Regression

(1) Vectorizing the cost function

(2) Vectorizing the gradient

(3) Vectorizing the regularized cost function

(4) Vectorizing the regularized gradient

All four formulas above can be found in the previous blog post; they are restated below for reference.
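In the notation of the exercise, with h = g(Xθ) the vector of predictions, g the sigmoid, m the number of training examples, and θ_0 excluded from regularization, the vectorized forms are:

(1) Cost: J(\theta) = \frac{1}{m}\left(-y^{T}\log(h) - (1-y)^{T}\log(1-h)\right)

(2) Gradient: \nabla_{\theta} J = \frac{1}{m} X^{T}(h - y)

(3) Regularized cost: J(\theta) = \frac{1}{m}\left(-y^{T}\log(h) - (1-y)^{T}\log(1-h)\right) + \frac{\lambda}{2m}\sum_{j=1}^{n} \theta_{j}^{2}

(4) Regularized gradient: \frac{\partial J}{\partial \theta_{j}} = \frac{1}{m}\left(X^{T}(h - y)\right)_{j} + \frac{\lambda}{m}\theta_{j} \quad (j \ge 1), with no regularization term for j = 0

where h = g(X\theta) and g(z) = \frac{1}{1 + e^{-z}}.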

lrCostFunction.m

function [J, grad] = lrCostFunction(theta, X, y, lambda)
%LRCOSTFUNCTION Compute cost and gradient for logistic regression with
%regularization
%   J = LRCOSTFUNCTION(theta, X, y, lambda) computes the cost of using
%   theta as the parameter for regularized logistic regression and the
%   gradient of the cost w.r.t. the parameters.

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta
%
% Hint: The computation of the cost function and gradients can be
%       efficiently vectorized. For example, consider the computation
%
%           sigmoid(X * theta)
%
%       Each row of the resulting matrix will contain the value of the
%       prediction for that example. You can make use of this to vectorize
%       the cost function and gradient computations.
%
% Hint: When computing the gradient of the regularized cost function,
%       there're many possible vectorized solutions, but one solution
%       looks like:
%           grad = (unregularized gradient for logistic regression)
%           temp = theta;
%           temp(1) = 0;   % because we don't add anything for j = 0
%           grad = grad + YOUR_CODE_HERE (using the temp variable)
%

hx = sigmoid(X*theta);                              % predictions h(x) for all examples
reg = lambda/(2*m)*sum(theta(2:end).^2);            % regularization term, skipping theta(1)
J = -1/m*(y'*log(hx)+(1-y)'*log(1-hx)) + reg;       % vectorized regularized cost
theta(1) = 0;                                       % do not regularize the bias parameter
grad = 1/m*X'*(hx-y)+lambda/m*theta;                % vectorized regularized gradient

% =============================================================

grad = grad(:);

end
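A quick way to exercise lrCostFunction is to call it on a tiny hand-made problem. The values below are arbitrary illustration data, not part of the exercise:

theta_t  = [-2; -1; 1; 2];
X_t      = [ones(5,1) reshape(1:15, 5, 3)/10];   % 5 examples: intercept column + 3 features
y_t      = [1; 0; 1; 0; 1];                      % binary labels
lambda_t = 3;
[J_t, grad_t] = lrCostFunction(theta_t, X_t, y_t, lambda_t);
fprintf('Cost: %f\n', J_t);
fprintf('Gradient: %f %f %f %f\n', grad_t);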

2. One-vs-all Classification (Training)

Return all the classifier parameters in a matrix Θ (a K x (N+1) matrix, where K is num_labels and N is num_features), where each row of Θ holds the learned logistic regression parameters for one class. You can do this with a for-loop from 1 to K, training each classifier independently.

oneVsAll.m

function [all_theta] = oneVsAll(X, y, num_labels, lambda)
%ONEVSALL trains multiple logistic regression classifiers and returns all
%the classifiers in a matrix all_theta, where the i-th row of all_theta
%corresponds to the classifier for label i
%   [all_theta] = ONEVSALL(X, y, num_labels, lambda) trains num_labels
%   logistic regression classifiers and returns each of these classifiers
%   in a matrix all_theta, where the i-th row of all_theta corresponds
%   to the classifier for label i

% Some useful variables
m = size(X, 1);
n = size(X, 2);

% You need to return the following variables correctly
all_theta = zeros(num_labels, n + 1);

% Add ones to the X data matrix
X = [ones(m, 1) X];

% ====================== YOUR CODE HERE ======================
% Instructions: You should complete the following code to train num_labels
%               logistic regression classifiers with regularization
%               parameter lambda.
%
% Hint: theta(:) will return a column vector.
%
% Hint: You can use y == c to obtain a vector of 1's and 0's that tell us
%       whether the ground truth is true/false for this class.
%
% Note: For this assignment, we recommend using fmincg to optimize the cost
%       function. It is okay to use a for-loop (for c = 1:num_labels) to
%       loop over the different classes.
%
%       fmincg works similarly to fminunc, but is more efficient when we
%       are dealing with a large number of parameters.
%
% Example Code for fmincg:
%
%     % Set Initial theta
%     initial_theta = zeros(n + 1, 1);
%
%     % Set options for fminunc
%     options = optimset('GradObj', 'on', 'MaxIter', 50);
%
%     % Run fmincg to obtain the optimal theta
%     % This function will return theta and the cost
%     [theta] = ...
%         fmincg (@(t)(lrCostFunction(t, X, (y == c), lambda)), ...
%                 initial_theta, options);
%

% Train one regularized logistic regression classifier per class,
% using the c-th row of all_theta (all zeros) as the starting point.
for c = 1:num_labels
  initial_theta = all_theta(c,:)';
  options = optimset('GradObj', 'on', 'MaxIter', 50);
  theta = fmincg(@(t)(lrCostFunction(t, X, (y == c), lambda)), initial_theta, options);
  all_theta(c,:) = theta';
end

% =========================================================================

end
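In the exercise driver the function is called roughly as follows (a minimal sketch; it assumes the ex3 data file ex3data1.mat providing X and y, 10 labels with the digit 0 stored as label 10, and lambda = 0.1):

load('ex3data1.mat');     % X is 5000 x 400, y is 5000 x 1
num_labels = 10;          % digits 1..9, with 0 mapped to label 10
lambda = 0.1;
all_theta = oneVsAll(X, y, num_labels, lambda);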

3. One-vs-all Classification (Prediction)

predictOneVsAll.m

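The body of this file did not survive in the post. Below is a minimal sketch of the prediction step: add the bias column, score every example against every one-vs-all classifier, and return the label whose classifier gives the highest output.

function p = predictOneVsAll(all_theta, X)
%PREDICTONEVSALL Predict the label for a trained one-vs-all classifier
%   p = PREDICTONEVSALL(all_theta, X) returns a vector of predictions for
%   each example in X, where all_theta is the matrix returned by oneVsAll

m = size(X, 1);

% Add ones to the X data matrix
X = [ones(m, 1) X];

% Score each example against every classifier and keep the index of the
% highest score; the sigmoid is monotonic, so max over X * all_theta'
% gives the same labels as max over sigmoid(X * all_theta').
[tmp, p] = max(sigmoid(X * all_theta'), [], 2);

end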

Neural Network Prediction

Feedforward Propagation and Prediction

predict.m

function p = predict(Theta1, Theta2, X)
%PREDICT Predict the label of an input given a trained neural network
%   p = PREDICT(Theta1, Theta2, X) outputs the predicted label of X given the
%   trained weights of a neural network (Theta1, Theta2)

% Useful values
m = size(X, 1);
num_labels = size(Theta2, 1);

% You need to return the following variables correctly
p = zeros(size(X, 1), 1);

% ====================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
%               your learned neural network. You should set p to a
%               vector containing labels between 1 and num_labels.
%
% Hint: The max function might come in useful. In particular, the max
%       function can also return the index of the max element, for more
%       information see 'help max'. If your examples are in rows, then, you
%       can use max(A, [], 2) to obtain the max for each row.
%

a1 = [ones(m, 1), X];            % input layer plus bias unit, 5000 x 401
a2 = sigmoid(a1 * Theta1');      % hidden layer activations, 5000 x 25
a2 = [ones(size(a2,1), 1), a2];  % add bias unit, 5000 x 26
a3 = sigmoid(a2 * Theta2');      % output layer activations, 5000 x 10
[tmp, p] = max(a3, [], 2);       % predicted label = index of the largest output

% =========================================================================

end
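The driver script then loads the pre-trained weights and reports the training-set accuracy, roughly as follows (a minimal sketch; it assumes ex3data1.mat and ex3weights.mat, the latter providing Theta1 of size 25 x 401 and Theta2 of size 10 x 26):

load('ex3data1.mat');     % X is 5000 x 400, y is 5000 x 1
load('ex3weights.mat');   % Theta1 (25 x 401), Theta2 (10 x 26)
pred = predict(Theta1, Theta2, X);
fprintf('Training Set Accuracy: %f\n', mean(double(pred == y)) * 100);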

Other files and the dataset can be downloaded from Coursera.

Original post: https://www.cnblogs.com/CheeseZH/p/4601911.html