Andrew Ng machine learning notes: some tips on applying machine learning

to deal with underfitting

Increase the number of features (add polynomial terms such as the squares and cubes of existing features, or add entirely new features)

Decrease the value of λ
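
As a minimal sketch of the first remedy, the snippet below expands one scalar feature into polynomial terms. The function name `add_polynomial_features` and the sample inputs are illustrative assumptions, not part of the course material.

```python
def add_polynomial_features(x, degree):
    """Return [x, x**2, ..., x**degree] for a single scalar feature x.

    Hypothetical helper: it only illustrates adding square and cube terms.
    """
    return [x ** p for p in range(1, degree + 1)]


# Expand each raw value into (x, x^2, x^3) so a linear model can fit curves.
raw_values = [1.0, 2.0, 3.0]
expanded = [add_polynomial_features(x, 3) for x in raw_values]
print(expanded)
```

Higher powers grow quickly, so in practice the expanded features usually need feature scaling before gradient descent.
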
to deal with overfitting

Get more training examples

Reduce the number of features

Use regularization and increase the value of λ
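
To make the role of λ concrete, here is a small sketch of a regularized squared-error cost for linear regression. It assumes the usual convention that the bias term θ_0 is not penalized. The function name and the tiny data set are made up for illustration.

```python
def regularized_cost(theta, X, y, lam):
    """J(theta) = 1/(2m) * [sum((h(x) - y)^2) + lam * sum(theta_j^2 for j >= 1)].

    X is a list of rows whose first entry is the constant 1 (bias feature).
    theta[0] is the bias and is left out of the penalty (an assumed convention).
    """
    m = len(y)
    squared_error = 0.0
    for row, target in zip(X, y):
        prediction = sum(t * v for t, v in zip(theta, row))
        squared_error += (prediction - target) ** 2
    penalty = lam * sum(t ** 2 for t in theta[1:])
    return (squared_error + penalty) / (2 * m)


# Toy data that theta = [0, 1] fits exactly, so only the penalty contributes.
X = [[1.0, 1.0], [1.0, 2.0]]
y = [1.0, 2.0]
print(regularized_cost([0.0, 1.0], X, y, lam=0.0))
print(regularized_cost([0.0, 1.0], X, y, lam=2.0))
```

A larger λ makes large θ values more expensive, which pushes the fit toward simpler hypotheses. A smaller λ does the opposite, which is why decreasing it helps with underfitting.
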
evaluate hypothesis

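Split the data into a training set and a test set. Minimize J_train(θ) on the training data to obtain the parameters θ, then use the test data with those learned weights to compute J_test(θ). The smaller this value, the better the hypothesis generalizes to data it has not seen.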
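
The procedure can be sketched as follows, assuming a one-feature linear hypothesis fitted in closed form, a 70/30 split, and synthetic data. All names and numbers here are illustrative assumptions.

```python
import random


def fit_line(xs, ys):
    """Closed-form least squares for h(x) = theta0 + theta1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    theta1 = sxy / sxx
    theta0 = mean_y - theta1 * mean_x
    return theta0, theta1


def cost(theta, xs, ys):
    """Unregularized squared-error cost J = 1/(2m) * sum((h(x) - y)^2)."""
    theta0, theta1 = theta
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


random.seed(0)
# Synthetic data (an assumption for the demo): y = 2x + 1 plus small noise.
data = [(float(x), 2.0 * x + 1.0 + random.gauss(0, 0.1)) for x in range(20)]
random.shuffle(data)  # shuffle before splitting so both sets are representative

split = int(0.7 * len(data))
train, test = data[:split], data[split:]
x_train, y_train = [p[0] for p in train], [p[1] for p in train]
x_test, y_test = [p[0] for p in test], [p[1] for p in test]

theta = fit_line(x_train, y_train)      # minimize J_train
j_train = cost(theta, x_train, y_train)
j_test = cost(theta, x_test, y_test)    # evaluate on unseen data
print(theta, j_train, j_test)
```
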
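cross validation: mainly used for model selection (to avoid underfitting and overfitting)

a. Split the data into training data, cross validation data, and testing data

b. For each candidate model (hypothesis), minimize J_train(θ) on the training data to obtain that model's θ values

c. Compute J_cv(θ) for each model using the cross validation data and its fitted θ, then select the model with the lowest cost

d. Use the testing data to estimate the generalization error J_test(θ) of the selected model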
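
Steps a to d can be sketched with a deliberately simple family of candidate models. The one-parameter hypotheses h(x) = θ * x^d for d = 1, 2, 3, the 60/20/20 split, and the synthetic data are assumptions chosen to keep the example short. The course itself compares full polynomial models.

```python
import random


def fit(xs, ys, d):
    """Least-squares theta for the one-parameter hypothesis h(x) = theta * x**d."""
    numerator = sum((x ** d) * y for x, y in zip(xs, ys))
    denominator = sum((x ** d) ** 2 for x in xs)
    return numerator / denominator


def cost(theta, xs, ys, d):
    """Squared-error cost J = 1/(2m) * sum((theta * x**d - y)^2)."""
    m = len(xs)
    return sum((theta * x ** d - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def unzip(pairs):
    return [p[0] for p in pairs], [p[1] for p in pairs]


random.seed(1)
# Synthetic data (assumed for the demo): y = 0.5 * x^2 plus small noise.
points = [(i / 10.0, 0.5 * (i / 10.0) ** 2 + random.gauss(0, 0.05))
          for i in range(1, 51)]
random.shuffle(points)

# a. split into training / cross validation / testing data (60/20/20)
x_tr, y_tr = unzip(points[:30])
x_cv, y_cv = unzip(points[30:40])
x_te, y_te = unzip(points[40:])

# b. fit every candidate model on the training data
# c. score each fitted model on the cross validation data
results = {}
for d in (1, 2, 3):
    theta = fit(x_tr, y_tr, d)
    results[d] = (theta, cost(theta, x_cv, y_cv, d))

best_d = min(results, key=lambda d: results[d][1])
best_theta = results[best_d][0]

# d. report the error of the selected model on the untouched testing data
j_test = cost(best_theta, x_te, y_te, best_d)
print(best_d, best_theta, j_test)
```

The testing data is used only once, after the model has been chosen. Because the choice of d was itself fitted to the cross validation data, J_cv of the selected model tends to be an optimistic estimate.
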
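error analysis

Porter stemmer: treats words that share a stem as the same word (for example "discount", "discounts", "discounted")

Manually inspect the misclassified examples, then decide which features to add based on what you find (start with a quick and dirty implementation, then use error analysis to choose new features)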
precision and recall

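Background (example): suppose we predict 1 if a patient has cancer and 0 otherwise, and only 0.5% of the patients in the training set have cancer. A trained algorithm might reach 95% accuracy, yet simply predicting that no patient has cancer reaches 99.5% accuracy (and that is clearly not a good algorithm).

So besides accuracy we also need to look at precision and recall. Precision = true positives / (true positives + false positives). Recall = true positives / (true positives + false negatives).

Use the F score to choose the algorithm with a good balance of precision and recall

F score = 2 * Precision * Recall / (Precision + Recall)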
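
The skewed-class problem can be checked with a short sketch. The helper name, the 1000-patient data set, and the choice to report 0.0 when precision or recall is undefined (no predicted or no actual positives) are assumptions made for this example.

```python
def precision_recall_f(y_true, y_pred):
    """Return (precision, recall, F score) for binary labels, with 1 as positive.

    Undefined ratios (zero denominator) are reported as 0.0, an assumed convention.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# 1000 patients, 5 of whom (0.5%) have cancer.
y_true = [1] * 5 + [0] * 995

# Classifier A: always predicts "no cancer".
always_zero = [0] * 1000

# Classifier B (made up): finds 4 of the 5 cases and raises 4 false alarms.
some_model = [1, 1, 1, 1, 0] + [1] * 4 + [0] * 991

print(accuracy(y_true, always_zero), precision_recall_f(y_true, always_zero))
print(accuracy(y_true, some_model), precision_recall_f(y_true, some_model))
```

Classifier A has the higher accuracy but an F score of zero, because it never identifies a single positive case. Classifier B is slightly less accurate but far more useful.
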
