2014-11-09:EM

EM


Structure

loop until the precision (convergence) condition is met:

  • zero_initialize_ss(ss, model): set every entry of *class_word* and *class_total* to 0
  • e_step:
# modeling is done per document
for i in corpus->num_docs:
         # class_word was zeroed before this loop; the per-document iteration ultimately serves to update phi and gamma
         doc_e_step(corpus->doc[i], var_gamma[i], phi, model, ss);
  • m_step
    -- lda_mle() : model->log_prob_w[k][w] = log( ss->class_word[k][w] / ss->class_total[k] )

  • check the precision condition
    stop once the change in likelihood falls below the convergence threshold


doc_e_step (run once for every document)

  • likelihood = lda_inference()
  • update alpha_ss
    ss->alpha_ss += sum[i=1~NTOPICS] digamma(gamma[i]) - NTOPICS * digamma( sum[i=1~NTOPICS] gamma[i] )
  • accumulate class_word and class_total:
    ss->class_word[k][doc->words[n]] += doc->counts[n] * phi[n][k]
    ss->class_total[k] += doc->counts[n] * phi[n][k]   # phi[n][k] : word n ~ topic k
    phi[n][k] here is computed per document, so class_total[k] is the total weight topic k receives: the sum, over every word, of that word's topic-k share.

The heart of the algorithm is in lda_inference.

Original post: https://www.cnblogs.com/cyno/p/4085930.html