Distance dependent Chinese Restaurant Processes

Here are my notes on the distance dependent Chinese Restaurant Process (ddCRP).

Paper link: http://pan.baidu.com/s/1dEk7ZA5

1. Distance dependent CRPs

        In the traditional CRP, the probability of a customer sitting at a table is proportional to the number of other customers already sitting at that table.

        Now we introduce the distance dependent CRP, in which the seating plan is instead described in terms of the probability of each customer sitting with each of the other customers.

        

        Let ci denote the i-th customer assignment, i.e. the index of the customer with whom the i-th customer sits; let dij denote the distance between customers i and j; let D denote the set of all pairwise distances between customers; and let f be a decay function. The customer assignments are then drawn as p(ci = j | D, α) ∝ f(dij) for j ≠ i, and ∝ α for j = i (the self-link case, in which customer i starts its own table).

        Notice that a customer's assignment does not depend on the other customers' assignments, only on the distances between customers.

        This distribution is determined by the nature of the distance measurements and the decay function. For many sets of distance measurements, the resulting distribution over partitions is no longer exchangeable; this makes the ddCRP an appropriate prior when exchangeability is not a reasonable assumption.
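        To make this concrete, here is a minimal sampling sketch (my own illustration, not code from the paper; the name sample_ddcrp, the union-find grouping, and the example distances are assumptions). It draws each customer link ci from the distribution above and then reads the tables off as connected components of the link graph.

```python
import numpy as np

def sample_ddcrp(D, f, alpha, rng=None):
    """Draw customer links from a ddCRP prior and group them into tables.

    D     : (n, n) array of pairwise distances dij (np.inf allowed)
    f     : decay function, applied elementwise to a row of distances
    alpha : self-link parameter (mass on "customer i sits by itself")
    """
    rng = np.random.default_rng() if rng is None else rng
    n = D.shape[0]
    links = np.empty(n, dtype=int)
    for i in range(n):
        # p(ci = j | D, alpha) ∝ f(dij) for j != i, and ∝ alpha for j = i
        weights = np.array(f(D[i]), dtype=float)
        weights[i] = alpha
        links[i] = rng.choice(n, p=weights / weights.sum())

    # A table is a connected component of the link graph (union-find).
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for i in range(n):
        a, b = find(i), find(links[i])
        if a != b:
            parent[a] = b
    tables = np.array([find(i) for i in range(n)])
    return links, tables

# Example: 6 customers in a sequence, dij = i - j for j <= i and ∞ otherwise,
# with exponential decay f(d) = exp(-d).
idx = np.arange(6)
D = np.where(idx[:, None] >= idx[None, :], idx[:, None] - idx[None, :], np.inf)
links, tables = sample_ddcrp(D, lambda d: np.exp(-np.asarray(d, dtype=float)), alpha=1.0)
print("links :", links)
print("tables:", tables)
```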

 

       2. The decay function:

                In general, the decay function mediates how distances between customers affect the resulting distribution over partitions. The function f is non-increasing, takes non-negative finite values, and satisfies f(∞) = 0. (These are the required properties of the decay function.)
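                For reference, here are the window, exponential, and logistic decay functions that I recall the paper using as examples; the exact single-parameter forms below are my own sketch, but all three satisfy the properties above, including f(∞) = 0.

```python
import numpy as np

# Each of these is non-increasing, non-negative, and satisfies f(inf) = 0.

def window_decay(d, a=1.0):
    """f(d) = 1 if d < a else 0: only customers within distance a can be linked to."""
    return (np.asarray(d, dtype=float) < a).astype(float)

def exponential_decay(d, a=1.0):
    """f(d) = exp(-d / a): influence falls off smoothly with distance."""
    return np.exp(-np.asarray(d, dtype=float) / a)

def logistic_decay(d, a=1.0):
    """f(d) = exp(-d + a) / (1 + exp(-d + a)): a smooth step located around d = a."""
    x = -np.asarray(d, dtype=float) + a
    return np.exp(x) / (1.0 + np.exp(x))

print(window_decay(np.array([0.5, 2.0, np.inf])))
print(exponential_decay(np.array([0.5, 2.0, np.inf])))
print(logistic_decay(np.array([0.5, 2.0, np.inf])))
```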

 

       3. Sequential CRPs and the traditional CRP

                A sequential CRP is constructed by assuming that dij = ∞ for j > i, which guarantees that no customer can be assigned to a later customer. When, in addition, f(d) = 1 for d ≠ ∞ and dij < ∞ for j < i, the sequential ddCRP re-expresses the traditional CRP: linking to any one of the m earlier customers at a table has total weight m, so a new customer joins an existing table with probability proportional to its size and self-links (starts a new table) with probability proportional to α. A small simulation check is sketched below.
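                A quick way to convince yourself of this equivalence (my own illustrative check, not from the paper) is to simulate both constructions and compare the distribution of the number of occupied tables.

```python
import numpy as np
from collections import Counter

def crp_num_tables(n, alpha, rng):
    """Traditional CRP: join an existing table w.p. ∝ its size, open a new one w.p. ∝ alpha."""
    sizes = []
    for _ in range(n):
        weights = np.array(sizes + [alpha], dtype=float)
        k = rng.choice(len(weights), p=weights / weights.sum())
        if k == len(sizes):
            sizes.append(1)
        else:
            sizes[k] += 1
    return len(sizes)

def seq_ddcrp_num_tables(n, alpha, rng):
    """Sequential ddCRP with f(d) = 1 for finite d: link to any earlier customer w.p. ∝ 1,
    self-link w.p. ∝ alpha. Every table contains exactly one self-linking customer."""
    num_self_links = 0
    for i in range(n):
        weights = np.ones(i + 1)
        weights[i] = alpha
        c = rng.choice(i + 1, p=weights / weights.sum())
        num_self_links += int(c == i)
    return num_self_links

rng = np.random.default_rng(0)
n, alpha, trials = 5, 1.0, 20000
print(sorted(Counter(crp_num_tables(n, alpha, rng) for _ in range(trials)).items()))
print(sorted(Counter(seq_ddcrp_num_tables(n, alpha, rng) for _ in range(trials)).items()))
```

                With this many trials, the two printed distributions over the number of tables agree up to Monte Carlo noise, which is the sense in which the sequential ddCRP re-expresses the traditional CRP.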

                NOTICE: although these models are the same, the corresponding Gibbs samplers are different. (Why? Presumably because the ddCRP sampler resamples the customer links ci rather than table assignments directly; removing a link can split a table and re-drawing it can merge tables, so the moves are not the same as in the traditional CRP sampler.)

 

       4. Marginal invariance:

                The traditional CRP is marginally invariant: marginalizing over a particular customer gives the same probability distribution as if that customer were not included in the model at all. The ddCRP does not in general have this property, and the paper illustrates the consequences with two examples:

        Language modeling: a fully observed model

        Mixture modeling: a mixture model over latent clusters (see the generative sketch below)
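        Here is a minimal generative sketch of such a ddCRP mixture, assuming a Gaussian base distribution, sequential distances, and exponential decay; the function ddcrp_gaussian_mixture and its parameters base_std and obs_std are my own names, not the paper's model.

```python
import numpy as np

def ddcrp_gaussian_mixture(D, f, alpha, base_std=5.0, obs_std=1.0, rng=None):
    """Generative sketch of a ddCRP Gaussian mixture:
    draw links ci from the ddCRP prior, group linked customers into tables,
    draw one mean per table from the base distribution, then draw the data."""
    rng = np.random.default_rng() if rng is None else rng
    n = D.shape[0]
    links = np.empty(n, dtype=int)
    for i in range(n):
        w = np.array(f(D[i]), dtype=float)   # p(ci = j) ∝ f(dij) for j != i ...
        w[i] = alpha                         # ... and ∝ alpha for the self-link
        links[i] = rng.choice(n, p=w / w.sum())

    # Group customers into tables by following links (union-find over the link graph).
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for i in range(n):
        a, b = find(i), find(links[i])
        if a != b:
            parent[a] = b
    tables = np.array([find(i) for i in range(n)])

    # One parameter per table from the base measure, then per-customer observations.
    means = {t: rng.normal(0.0, base_std) for t in np.unique(tables)}
    data = np.array([rng.normal(means[t], obs_std) for t in tables])
    return tables, data

# Example: 10 customers with sequential distances dij = i - j (∞ for j > i) and exponential decay.
idx = np.arange(10)
D = np.where(idx[:, None] >= idx[None, :], idx[:, None] - idx[None, :], np.inf)
tables, data = ddcrp_gaussian_mixture(D, lambda d: np.exp(-np.asarray(d, dtype=float)), alpha=1.0)
print(tables)
print(np.round(data, 2))
```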

 

      5. Relationship to dependent Dirichlet processes (DDPs): (both are infinite clustering models that model dependencies between the latent component assignments of the data)

                The first difference is inferential: dependent Dirichlet process mixtures typically rely on truncations of the stick-breaking representation for approximate posterior inference, whereas ddCRP mixtures are amenable to Gibbs sampling algorithms. The second difference is the spirit behind them: in the DDP, data are drawn from distributions that are similar to the distributions of nearby data, and the particular values of the nearby data impose softer constraints than those in the ddCRP. (This is what distinguishes the ddCRP from these Bayesian nonparametric models.)

Original post: https://www.cnblogs.com/simayuhe/p/5157839.html