Machine Learning No.2: Linear Regression with Multiple Variables

1. Notation:

m = number of training examples

n = number of features

x^(i) = input (features) of the ith training example

x_j^(i) = value of feature j in the ith training example

2. Hypothesis:

h_θ(x) = θ^T x = θ_0 x_0 + θ_1 x_1 + ... + θ_n x_n  (with x_0 = 1)
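In vectorized form the hypothesis is just an inner product. A one-line Octave sketch (variable names are mine; Octave indexes from 1, so x(1) plays the role of x_0 = 1):

    % theta: (n+1) x 1 parameter vector; x: (n+1) x 1 feature vector with x(1) = 1
    h = theta' * x;   % h_θ(x) = θ^T x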

3. Cost function:

J(θ) = (1 / 2m) * Σ_{i=1}^{m} (h_θ(x^(i)) - y^(i))^2
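For all m examples at once, the cost can be computed in vectorized Octave. A minimal sketch, assuming a function name computeCost and a design matrix X with one example per row and a leading column of ones (both conventions are mine, not from the original notes):

    function J = computeCost(X, y, theta)
      m = length(y);                    % number of training examples
      h = X * theta;                    % h_θ(x^(i)) for every example at once
      J = sum((h - y) .^ 2) / (2 * m);  % J(θ) as defined above
    end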

4. Gradient descent:

Repeat {

  θ_j := θ_j - α * ∂J(θ)/∂θ_j

}

Substituting the cost function and differentiating, this becomes:

Repeat {

  θ_j := θ_j - α * (1/m) * Σ_{i=1}^{m} (h_θ(x^(i)) - y^(i)) * x_j^(i)

  (simultaneously update θ_j for j = 0, ..., n)

}
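A vectorized Octave sketch of this loop. The function name, the fixed iteration count, and the m x (n+1) design matrix X with a leading column of ones are my assumptions; computeCost is the sketch from section 3:

    function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
      m = length(y);
      J_history = zeros(num_iters, 1);
      for iter = 1:num_iters
        % one vectorized step updates every θ_j simultaneously
        theta = theta - (alpha / m) * (X' * (X * theta - y));
        J_history(iter) = computeCost(X, y, theta);  % record J(θ) per iteration
      end
    end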

5. Mean normalization

Replace x_i with x_i - µ_i to make features have approximately zero mean (do not apply to x_0 = 1).

e.g. x_1 := (x_1 - µ_1) / s_1, where µ_1 is the mean of x_1 over the training set and s_1 is its range (max - min) or standard deviation.
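A minimal Octave sketch of this normalization (the function name is mine, and it uses the standard deviation for s_j; the range would work as well):

    function [X_norm, mu, sigma] = featureNormalize(X)
      % X holds the raw features only, one example per row (no column of ones)
      mu = mean(X);                  % per-feature means µ_j
      sigma = std(X);                % per-feature standard deviations s_j
      X_norm = (X - mu) ./ sigma;    % automatic broadcasting (Octave >= 3.6)
    end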

6. Convergence and the learning rate α

Declare convergence if J(θ) decreases by less than 10^-3 in one iteration.

If α is too small: slow convergence.

If α is too large: J(θ) may not decrease on every iteration and may not converge.
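A sketch of how the 10^-3 test might be wired into the update loop from section 4 (gradientDescentAuto is a hypothetical name; this naive test also assumes α is small enough that J(θ) really does decrease each iteration):

    function [theta, iters_used] = gradientDescentAuto(X, y, theta, alpha, max_iters)
      m = length(y);
      J_prev = Inf;
      for iter = 1:max_iters
        theta = theta - (alpha / m) * (X' * (X * theta - y));
        J = sum((X * theta - y) .^ 2) / (2 * m);
        if J_prev - J < 1e-3     % J(θ) decreased by less than 10^-3: stop
          break;
        end
        J_prev = J;
      end
      iters_used = iter;
    end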

7. Normal equation

θ = (X'X)^(-1) * X'y  (minimizes J(θ) analytically, no iteration needed)

Octave: pinv(X'*X)*X'*y
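A usage sketch around that one-liner (X_raw and y are assumed names for the raw m x n feature matrix and the m x 1 target vector):

    X = [ones(size(X_raw, 1), 1), X_raw];  % prepend x_0 = 1 for the intercept θ_0
    theta = pinv(X' * X) * X' * y;         % closed form: no α, no iterations
    y_hat = X * theta;                     % fitted values on the training set

Note that feature scaling is unnecessary here, since there is no iterative descent to speed up.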

8. Comparison between gradient descent and normal equation

Gradient descent: need to choose α

                  needs many iterations

                  works well even when n is large

Normal equation:  no need to choose α

                  no need to iterate

                  needs to compute pinv(X'*X), which costs roughly O(n^3)

                  slow if n is very large

9. Some problems

  What if X'X is non-invertible?

    Redundant features (linearly dependent; see the sketch after this list), e.g.:

        x1 = size in feet^2

        x2 = size in m^2

    Too many features (e.g. m <= n):

        delete some features, or use regularization
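A toy Octave illustration of the redundant-feature case above (all numbers made up): x2 is a fixed multiple of x1, so X'*X is singular, yet pinv still returns a minimum-norm solution where inv would fail:

    x1 = [1000; 1500; 2000];        % size in feet^2
    x2 = 0.0929 * x1;               % size in m^2: linearly dependent on x1
    X  = [ones(3, 1), x1, x2];
    y  = [200; 300; 400];           % made-up prices
    rank(X' * X)                    % prints 2 (< 3): X'X is non-invertible
    theta = pinv(X' * X) * X' * y   % pseudo-inverse still yields a solution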

Original post: https://www.cnblogs.com/yingzhongwen/p/3154042.html