Machine Learning in Action

Core formula: Bayes' rule

\[ p(c|x) = \frac{p(x|c)\,p(c)}{p(x)} \]

p(c|x) is the probability of c given that x has occurred.
p(x|c) is the probability of x given that c has occurred.
p(c) is the probability of c.
p(x) is the probability of x.
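
As a quick numeric check of the formula, the sketch below plugs made-up values into Bayes' rule. The numbers and the names `posterior`, `p_x_given_not_c` and so on are illustrative assumptions, not taken from the notes; p(x) is obtained from the law of total probability over c and "not c".

```python
def posterior(likelihood, prior, evidence):
    """Return p(c|x) = p(x|c) * p(c) / p(x)."""
    return likelihood * prior / evidence


# Hypothetical numbers: c = "message is spam", x = "message contains 'free'".
p_c = 0.2               # p(c): assumed share of spam messages
p_x_given_c = 0.5       # p(x|c): assumed rate of 'free' in spam
p_x_given_not_c = 0.05  # p(x|not c): assumed rate of 'free' in non-spam

# Law of total probability gives the evidence p(x).
p_x = p_x_given_c * p_c + p_x_given_not_c * (1 - p_c)

p_c_given_x = posterior(p_x_given_c, p_c, p_x)
print(round(p_c_given_x, 4))
```

Under these assumed numbers, seeing x raises the probability of c from the prior 0.2 to roughly 0.71.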

Decision rule

If p(c₁|x) > p(c₂|x), then x belongs to class c₁.
If p(c₁|x) < p(c₂|x), then x belongs to class c₂.

Equivalent form

\[ p(c_1|x) = \frac{p(x|c_1)\,p(c_1)}{p(x)} \]

\[ p(c_2|x) = \frac{p(x|c_2)\,p(c_2)}{p(x)} \]

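Therefore, comparing \(p(c_1|x)\) and \(p(c_2|x)\)
is the same as comparing
\(\frac{p(x|c_1)\,p(c_1)}{p(x)}\) and \(\frac{p(x|c_2)\,p(c_2)}{p(x)}\).
Because both share the same positive denominator \(p(x)\), this is the same as comparing
\(p(x|c_1)\,p(c_1)\) and \(p(x|c_2)\,p(c_2)\).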
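
The equivalence can be checked with a small sketch: the class chosen from the unnormalized scores p(x|c)p(c) matches the class chosen from the full posteriors. The function name `classify` and all numbers are hypothetical, chosen only for illustration.

```python
def classify(likelihoods, priors):
    """Return the index of the class with the largest p(x|c) * p(c)."""
    scores = [lik * pri for lik, pri in zip(likelihoods, priors)]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical values for two classes c1 (index 0) and c2 (index 1).
likelihoods = [0.05, 0.5]  # p(x|c1), p(x|c2)
priors = [0.8, 0.2]        # p(c1), p(c2)

# Decision from the unnormalized scores.
by_score = classify(likelihoods, priors)

# Decision from the full posteriors, dividing each score by p(x).
scores = [lik * pri for lik, pri in zip(likelihoods, priors)]
p_x = sum(scores)
posteriors = [s / p_x for s in scores]
by_posterior = max(range(len(posteriors)), key=posteriors.__getitem__)

print(by_score, by_posterior)  # both indices agree
```

Dividing every score by the same positive p(x) cannot change their order, so p(x) never has to be computed for classification.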

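Extension to multiple independent features

In \(p(x|c_1)\), x consists of several features that are assumed independent of one another given the class, i.e. \(x = (x_0, x_1, \ldots, x_n)\).
Then: \(p(x|c_1) = p(x_0, x_1, \ldots, x_n|c_1)\)
\(p(x|c_1) = p(x_0|c_1)\,p(x_1|c_1) \cdots p(x_n|c_1)\)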
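
Under the independence assumption, the class-conditional likelihood is just a product of per-feature terms. A minimal sketch, where `naive_likelihood` and the three probabilities are hypothetical:

```python
import math


def naive_likelihood(feature_probs):
    """p(x|c) under the independence assumption: the product of p(x_i|c)."""
    return math.prod(feature_probs)


# Hypothetical per-feature conditionals p(x_0|c1), p(x_1|c1), p(x_2|c1).
feature_probs_c1 = [0.5, 0.2, 0.1]

p_x_given_c1 = naive_likelihood(feature_probs_c1)
print(p_x_given_c1)  # approximately 0.01
```

`math.prod` requires Python 3.8 or later; on older versions an explicit loop does the same job.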

Underflow problem
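
Multiplying many probabilities, each smaller than 1, can yield a result too small for a floating-point number to represent, so it rounds to 0 and every class ends up with the same score. A common workaround is to compare sums of logarithms instead: the logarithm is monotonically increasing, so the ordering of the classes is preserved. The sketch below uses 100 made-up feature probabilities of 1e-5 each to show the effect.

```python
import math

# Hypothetical: 100 features, each with conditional probability 1e-5.
probs = [1e-5] * 100

# Direct product: the true value 1e-500 is below the smallest positive
# double (about 5e-324), so the result underflows to 0.0.
product = 1.0
for p in probs:
    product *= p
print(product)

# Sum of logs: log p(x|c) = sum_i log p(x_i|c) stays well within range.
log_score = sum(math.log(p) for p in probs)
print(log_score)  # roughly -1151.29
```

With logs, the quantity to compare across classes becomes \(\log p(c) + \sum_i \log p(x_i|c)\) rather than \(p(c)\prod_i p(x_i|c)\).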

Practical applications

Filtering abusive comments
Filtering spam email
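
Both applications reduce to classifying a bag of words. The following is a minimal sketch, not the book's implementation: the toy corpus, the function names `train` and `predict`, and the add-one smoothing (used so that an unseen word does not give a zero probability) are all assumptions made for illustration. Scores are computed in log space to sidestep underflow.

```python
import math
from collections import Counter


def train(docs, labels):
    """Estimate log p(c) and log p(word|c) from tokenized documents."""
    vocab = {w for doc in docs for w in doc}
    log_prior, log_cond = {}, {}
    for c in set(labels):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(w for d in class_docs for w in d)
        # Add-one smoothing (an assumption of this sketch).
        total = sum(counts.values()) + len(vocab)
        log_cond[c] = {w: math.log((counts[w] + 1) / total) for w in vocab}
    return log_prior, log_cond


def predict(doc, log_prior, log_cond):
    """Return the class maximizing log p(c) + sum_i log p(x_i|c)."""
    def score(c):
        # Words outside the training vocabulary are skipped.
        return log_prior[c] + sum(log_cond[c][w] for w in doc if w in log_cond[c])
    return max(log_prior, key=score)


# Made-up toy corpus: label 1 = spam, label 0 = not spam.
docs = [
    ["win", "free", "prize"],
    ["free", "offer", "now"],
    ["meeting", "at", "noon"],
    ["lunch", "at", "noon"],
]
labels = [1, 1, 0, 0]

log_prior, log_cond = train(docs, labels)
spam_guess = predict(["free", "prize", "now"], log_prior, log_cond)
ham_guess = predict(["meeting", "noon"], log_prior, log_cond)
print(spam_guess, ham_guess)
```

On this toy data the first message is labeled spam and the second is not; a real filter would need a proper tokenizer and far more training data.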