L1范式和L2范式的区别

L1 and L2 regularization add a cost to high valued weights to prevent overfitting. L1 regularization is an absolute value cost function and tends to set more weights to 0 (places more mass on zero weights) compared to L2 regularization.

Difference between L1 and L2
L2 (Ridge) shrinks all the coefficient by the same proportions but eliminates none, while L1 (Lasso) can shrink some coefficients to zero, performing variable selection.

Which to use
If all the features are correlated with the label, ridge outperforms lasso, as the coefficients are never zero in ridge. If only a subset of features are correlated with the label, lasso outperforms ridge as in lasso model some coefficient can be shrunken to zero.

reference：http://www.quora.com/What-is-the-difference-between-L1-and-L2-regularization