Machine Learning No.2: Linear Regression with Multiple Variables

1. Notation:

m = number of training examples

n = number of features

x^(i) = input (features) of the ith training example

x_j^(i) = value of feature j in the ith training example

2. Hypothesis:

h_θ(x) = θ^T x = θ_0 x_0 + θ_1 x_1 + ... + θ_n x_n  (with x_0 = 1)
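In vectorized form the hypothesis is just an inner product. A one-line Octave sketch (variable names are mine; Octave indexes from 1, so x(1) plays the role of x_0 = 1):

    % theta: (n+1) x 1 parameter vector; x: (n+1) x 1 feature vector with x(1) = 1
    h = theta' * x;   % h_θ(x) = θ^T x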

3. Cost function:

J(θ) = (1 / 2m) * Σ_{i=1}^{m} (h_θ(x^(i)) - y^(i))^2
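For all m examples at once, the cost can be computed in vectorized Octave. A minimal sketch, assuming a function name computeCost and a design matrix X with one example per row and a leading column of ones (both conventions are mine, not from the original notes):

    function J = computeCost(X, y, theta)
      m = length(y);                    % number of training examples
      h = X * theta;                    % h_θ(x^(i)) for every example at once
      J = sum((h - y) .^ 2) / (2 * m);  % J(θ) as defined above
    end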

4. Gradient descent:

Repeat {

  θ_j := θ_j - α * ∂J(θ)/∂θ_j

}

Substituting the cost function and differentiating, this becomes:

Repeat {

  θ_j := θ_j - α * (1/m) * Σ_{i=1}^{m} (h_θ(x^(i)) - y^(i)) * x_j^(i)

  (simultaneously update θ_j for j = 0, ..., n)

}
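A vectorized Octave sketch of this loop. The function name, the fixed iteration count, and the m x (n+1) design matrix X with a leading column of ones are my assumptions; computeCost is the sketch from section 3:

    function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
      m = length(y);
      J_history = zeros(num_iters, 1);
      for iter = 1:num_iters
        % one vectorized step updates every θ_j simultaneously
        theta = theta - (alpha / m) * (X' * (X * theta - y));
        J_history(iter) = computeCost(X, y, theta);  % record J(θ) per iteration
      end
    end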

5. Mean normalization

Replace x_i with x_i - µ_i to make features have approximately zero mean (do not apply to x_0 = 1).

e.g. x_1 := (x_1 - µ_1) / s_1, where µ_1 is the mean of x_1 over the training set and s_1 is its range (max - min) or standard deviation.
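A minimal Octave sketch of this normalization (the function name is mine, and it uses the standard deviation for s_j; the range would work as well):

    function [X_norm, mu, sigma] = featureNormalize(X)
      % X holds the raw features only, one example per row (no column of ones)
      mu = mean(X);                  % per-feature means µ_j
      sigma = std(X);                % per-feature standard deviations s_j
      X_norm = (X - mu) ./ sigma;    % automatic broadcasting (Octave >= 3.6)
    end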

6. Convergence and the learning rate α

Declare convergence if J(θ) decreases by less than 10^-3 in one iteration.

If α is too small: slow convergence.

If α is too large: J(θ) may not decrease on every iteration and may not converge.
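A sketch of how the 10^-3 test might be wired into the update loop from section 4 (gradientDescentAuto is a hypothetical name; this naive test also assumes α is small enough that J(θ) really does decrease each iteration):

    function [theta, iters_used] = gradientDescentAuto(X, y, theta, alpha, max_iters)
      m = length(y);
      J_prev = Inf;
      for iter = 1:max_iters
        theta = theta - (alpha / m) * (X' * (X * theta - y));
        J = sum((X * theta - y) .^ 2) / (2 * m);
        if J_prev - J < 1e-3     % J(θ) decreased by less than 10^-3: stop
          break;
        end
        J_prev = J;
      end
      iters_used = iter;
    end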

7. Normal equation

θ = (X'X)^(-1) * X'y  (minimizes J(θ) analytically, no iteration needed)

Octave: pinv(X'*X)*X'*y
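A usage sketch around that one-liner (X_raw and y are assumed names for the raw m x n feature matrix and the m x 1 target vector):

    X = [ones(size(X_raw, 1), 1), X_raw];  % prepend x_0 = 1 for the intercept θ_0
    theta = pinv(X' * X) * X' * y;         % closed form: no α, no iterations
    y_hat = X * theta;                     % fitted values on the training set

Note that feature scaling is unnecessary here, since there is no iterative descent to speed up.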

8. Comparison between gradient descent and normal equation

Gradient descent: need to choose α

                  needs many iterations

                  works well even when n is large

Normal equation:  no need to choose α

                  no need to iterate

                  needs to compute pinv(X'*X), which costs roughly O(n^3)

                  slow if n is very large

9. Some problems

  What if X'X is non-invertible?

    Redundant features (linearly dependent; see the sketch after this list), e.g.:

        x1 = size in feet^2

        x2 = size in m^2

    Too many features (e.g. m <= n):

        delete some features, or use regularization
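A toy Octave illustration of the redundant-feature case above (all numbers made up): x2 is a fixed multiple of x1, so X'*X is singular, yet pinv still returns a minimum-norm solution where inv would fail:

    x1 = [1000; 1500; 2000];        % size in feet^2
    x2 = 0.0929 * x1;               % size in m^2: linearly dependent on x1
    X  = [ones(3, 1), x1, x2];
    y  = [200; 300; 400];           % made-up prices
    rank(X' * X)                    % prints 2 (< 3): X'X is non-invertible
    theta = pinv(X' * X) * X' * y   % pseudo-inverse still yields a solution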

Original post: https://www.cnblogs.com/yingzhongwen/p/3154042.html