A Summary of the DS Model and Its Extended Settings in Crowdsourcing

2019-01-25

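I have recently come to see the crowdsourcing literature in a new light. My earlier notes were too messy, so here is a reorganized version.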

Error Rate Bounds and Iterative Weighted Majority Voting for Crowdsourcing (Arxiv14)

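This paper gives a very good summary and overview of the Dawid-Skene model in crowdsourcing. My personal view is that it stayed on Arxiv and was never published because the two weight settings in its weighted voting method ($w = 2p-1$ and $w = \log\frac{p}{1-p}$) had already been worked out by earlier researchers, with simpler and more accessible proofs and analyses. The paper reinvents the "wheel" with more complicated theory, which is a bit of a pity. Below are my notes on the summary of the DS model that it provides: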
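To make the two weight settings concrete, here is a minimal sketch of weighted majority voting on a single binary item. It assumes labels in {-1, +1} and that each worker's accuracy $p$ is already known. The function name `weighted_majority_vote` and the tie-breaking rule are my own choices for illustration, not taken from the paper.

```python
import math

def weighted_majority_vote(labels, accuracies, scheme="log_odds"):
    """Aggregate binary votes in {-1, +1} from several workers on one item.

    labels:     one +1 / -1 vote per worker.
    accuracies: each worker's accuracy p in (0, 1), assumed known here.
    scheme:     "linear" uses w = 2p - 1, "log_odds" uses w = log(p / (1 - p)).
    """
    score = 0.0
    for y, p in zip(labels, accuracies):
        if scheme == "linear":
            w = 2 * p - 1
        elif scheme == "log_odds":
            w = math.log(p / (1 - p))
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        score += w * y
    # Ties are broken toward +1, an arbitrary choice made for this sketch.
    return 1 if score >= 0 else -1

votes = [+1, -1, -1]
acc = [0.9, 0.6, 0.6]
print(weighted_majority_vote(votes, acc, "linear"))    # 1
print(weighted_majority_vote(votes, acc, "log_odds"))  # 1
```

In this example plain majority voting would return -1, while both weightings side with the single, more accurate worker.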

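General DS model: the original DS model targets the multi-class problem. With $L$ classes, each worker has an $L \times L$ confusion matrix, so each worker is described by $L^2$ parameters (of which $L(L-1)$ are free, since each row sums to 1).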

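It has two special cases:

Class-conditional DS model: assumes that a worker is equally likely to pick any of the incorrect labels, i.e., the off-diagonal entries within each row of the confusion matrix are all equal. Each worker (each matrix) is described by just its $L$ diagonal entries.
Homogeneous DS model (one-coin model): assumes not only that the off-diagonal entries within each row of the confusion matrix are equal, but also that all diagonal entries are equal. Each worker (each matrix) is described by a single parameter.

When the number of classes is $L = 2$, the General DS model and the Class-conditional DS model coincide; this case is usually called the two-coin model (each worker is described by just two parameters).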
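The special cases above can be sketched by building the confusion matrices explicitly. This is only an illustration: the helper names `class_conditional_matrix` and `homogeneous_matrix` are hypothetical, and I assume row $j$ holds the distribution of the worker's answer given true class $j$.

```python
import numpy as np

def class_conditional_matrix(diag):
    """Build an L x L confusion matrix from its L diagonal entries.

    Row j is the distribution of the worker's label given true class j.
    The remaining mass 1 - diag[j] is split evenly over the L - 1 wrong classes.
    """
    diag = np.asarray(diag, dtype=float)
    L = diag.size
    off = (1.0 - diag) / (L - 1)            # off-diagonal value of each row
    M = np.repeat(off[:, None], L, axis=1)  # fill every row with that value
    np.fill_diagonal(M, diag)               # then overwrite the diagonal
    return M

def homogeneous_matrix(p, L):
    """One-coin model: a single accuracy p shared by all classes."""
    return class_conditional_matrix(np.full(L, p))

M_cc = class_conditional_matrix([0.9, 0.6, 0.75])  # L parameters
M_h = homogeneous_matrix(0.8, 3)                   # 1 parameter
print(M_cc)
print(M_h)
```

With $L = 2$, `class_conditional_matrix([a, b])` already covers every valid $2 \times 2$ confusion matrix, which is why the general and class-conditional models coincide there.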

In signal processing, the one-coin model is also commonly called the random classification noise model.

In addition, the DS model has been extended in crowdsourcing in two further ways:

TrueLabel + confusions: A spectrum of probabilistic models in analyzing multiple ratings (ICML12)

Summary: This paper generalizes the well-known D-S model to a spectrum of probabilistic models under the same "TrueLabel + Confusion" paradigm. The original D-S model has a large number of parameters (each worker has her own confusion matrix), which may lead to overfitting. So the paper proposes a model called SingleConfusion, in which all workers share the same confusion matrix. But SingleConfusion is too rigid for real-world data and may result in underfitting. As a tradeoff between the two models, the paper further proposes a hierarchical Bayesian model called HybridConfusion, which allows each worker to have her own confusion matrix but at the same time regularizes these matrices through Bayesian shrinkage.

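Note: this is a very interesting piece of work! The authors claim that the original D-S model has too many confusion-matrix parameters, which makes the model overly complex and prone to overfitting. In this paper they reduce the number of confusion-matrix parameters by letting multiple workers share, to some degree, a single confusion matrix. This work and the Arxiv paper below, Generative Models for learning from crowds, go in two different directions and can be read side by side.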

Generative Models for learning from crowds (Arxiv17)

Summary: Unlike the classical D-S model, the paper assigns a confusion matrix to each worker-difficulty level pair, so the confusion matrices depend not only on the worker but also on the item difficulty level. It defines a generative probabilistic model called IDBLA that accounts for item difficulty in label aggregation, and uses Gibbs sampling and a novel variational inference algorithm to perform posterior inference.

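Note: in all of the original D-S models mentioned above, each worker has one confusion matrix, which assumes that all tasks are equally difficult for a given worker. This paper instead accounts for tasks of varying difficulty within the DS model: each "worker, task difficulty level" pair gets its own confusion matrix, which amounts to enlarging the number of parameters in the D-S model.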

##### Earlier notes follow ######

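1. Handles only homogeneous tasks (all tasks are assumed equally difficult)

Task difficulty is not modeled, but the worker's bias across classes is. For example, when a picture contains a cat (true label 1), the worker gives 1 with high probability; but when the picture contains no cat (true label 0), the worker may mistake something else for a cat and give the wrong label 1. In other words, the worker's accuracy can change with the true class, so the FP and FN rates differ.

The original DS. In the original paper (1979), a worker's quality is determined by a latent confusion matrix, which defines the probability that the worker responds with each possible label given the correct label: the probability that worker k responds l when j is the correct label.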
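For reference, here is a compact sketch of how such per-worker confusion matrices can be estimated with EM when the true labels are unknown. It is a simplified illustration rather than a faithful reproduction of the 1979 paper: the function name `dawid_skene_em`, the majority-vote initialization, the `-1` code for missing answers, and the small `smooth` constant are all my own assumptions.

```python
import numpy as np

def dawid_skene_em(labels, n_classes, n_iter=50, smooth=1e-6):
    """EM for the general DS model.

    labels: int array of shape (n_items, n_workers); each entry is a label in
            {0, ..., n_classes - 1}, or -1 if the worker skipped the item.
    Returns (posterior, confusion, prior):
      posterior[i, j]    = estimated P(true label of item i is j)
      confusion[k, j, l] = estimated P(worker k answers l | true label is j)
      prior[j]           = estimated class prior
    """
    labels = np.asarray(labels)
    n_items, n_workers = labels.shape

    # One-hot responses: resp[i, k, l] = 1 if worker k gave label l on item i.
    # Missing answers (-1) stay all-zero and so contribute nothing below.
    resp = np.zeros((n_items, n_workers, n_classes))
    for l in range(n_classes):
        resp[:, :, l] = (labels == l)

    # Initialise the posterior with soft majority voting.
    posterior = resp.sum(axis=1) + smooth
    posterior /= posterior.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # M-step: class prior and per-worker confusion matrices.
        prior = posterior.mean(axis=0) + smooth
        prior /= prior.sum()
        confusion = np.einsum("ij,ikl->kjl", posterior, resp) + smooth
        confusion /= confusion.sum(axis=2, keepdims=True)

        # E-step: posterior over true labels, computed in log space.
        log_post = np.log(prior)[None, :] + np.einsum(
            "ikl,kjl->ij", resp, np.log(confusion))
        log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
        posterior = np.exp(log_post)
        posterior /= posterior.sum(axis=1, keepdims=True)

    return posterior, confusion, prior

# Toy data: 4 items, 3 workers; the third worker disagrees on items 2 and 3.
toy = [[0, 0, 0],
       [1, 1, 1],
       [0, 0, 1],
       [1, 1, 0]]
posterior, confusion, prior = dawid_skene_em(toy, n_classes=2)
print(posterior.argmax(axis=1))
```

Like any EM procedure, this only finds a local optimum, so the initialization matters; starting from majority voting is a common and cheap choice.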

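One-coin model: each worker has a single quality parameter p, which is the worker's accuracy.

Two-coin model: sensitivity and specificity correspond to the worker's accuracy on the positive and negative classes, respectively.

From Learning from crowd
Variational Inference for crowdsourcing uses the one-coin model; its approach is to build a joint probability distribution over the worker ability q and the true label z. It also mentions this two-coin model.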
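The difference between the one-coin and two-coin models is easy to see by simulating a worker. The sketch below assumes binary labels in {0, 1}; the function names and the example parameter values are hypothetical.

```python
import numpy as np

def two_coin_labels(truth, sensitivity, specificity, rng):
    """Sample one worker's labels under the two-coin model.

    truth:       array of true labels in {0, 1}.
    sensitivity: P(worker says 1 | truth is 1).
    specificity: P(worker says 0 | truth is 0).
    """
    truth = np.asarray(truth)
    # The probability of answering correctly depends on the true class.
    p_correct = np.where(truth == 1, sensitivity, specificity)
    correct = rng.random(truth.shape) < p_correct
    return np.where(correct, truth, 1 - truth)

def one_coin_labels(truth, p, rng):
    """One-coin model: the same accuracy p on both classes."""
    return two_coin_labels(truth, p, p, rng)

rng = np.random.default_rng(0)
truth = rng.integers(0, 2, size=1000)
# A worker who rarely misses a cat but often "sees" one that is not there.
noisy = two_coin_labels(truth, sensitivity=0.95, specificity=0.7, rng=rng)
print("empirical sensitivity:", (noisy[truth == 1] == 1).mean())
print("empirical specificity:", (noisy[truth == 0] == 0).mean())
```

This mirrors the cat example above: the false-positive and false-negative rates are controlled separately, which a single parameter p cannot express.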

2. Handles heterogeneous tasks (different tasks have different difficulty for the same worker)

generalization of the D-S model
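The main reference is the paper Achieving Budget-optimality with Adaptive Schemes. It says it builds on Dengyong Zhou's paper Regularized minimax, which I find rather odd.

p: quality of worker (describes whether the worker reports their own opinion truthfully). 0: reports the opposite of their opinion; 1/2: reports at random; 1: reports truthfully.
q: difficulty of task (the probability that a worker believes the task's label is 1).

The response matrix entry $A_{i,j} = 1$ with probability $q_i p_j + (1-q_i)(1-p_j)$, where $i$ indexes tasks and $j$ indexes workers.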
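A small sketch of this response model follows. The notes only give the probability of a response being 1; I assume here that the alternative response is coded as -1, and the function names are hypothetical.

```python
import numpy as np

def response_prob(q, p):
    """P(A[i, j] = 1) for every task i (difficulty q[i]) and worker j (quality p[j])."""
    q = np.asarray(q, dtype=float)[:, None]  # tasks along rows
    p = np.asarray(p, dtype=float)[None, :]  # workers along columns
    return q * p + (1 - q) * (1 - p)

def sample_responses(q, p, rng):
    """Sample the response matrix A, with entries +1 or -1 (coding assumed)."""
    prob = response_prob(q, p)
    return np.where(rng.random(prob.shape) < prob, 1, -1)

q = [0.9, 0.5, 0.2]   # three tasks
p = [1.0, 0.5, 0.0]   # truthful, random, and adversarial workers
print(response_prob(q, p))
print(sample_responses(q, p, np.random.default_rng(0)))
```

The three worker types from the notes show up as columns: for p = 1 the probability is q itself, for p = 1/2 it is always 1/2 regardless of the task, and for p = 0 it is 1 - q.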

Definition: the ground truth of a task is the majority vote of the (hypothetical) entire population of workers.
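A short worked step shows what this definition implies under the response probability $q_i p_j + (1-q_i)(1-p_j)$. As an assumption of mine, let workers be drawn independently from a population whose mean quality is $\mu = \mathbb{E}[p_j]$. Averaging over the population, the fraction of workers who respond 1 on task $i$ is

$$
q_i \mu + (1-q_i)(1-\mu) = \frac{1}{2} + \frac{1}{2}(2q_i - 1)(2\mu - 1).
$$

So when the population is better than random on average ($\mu > 1/2$), the population majority vote is 1 exactly when $q_i > 1/2$, and tasks with $q_i$ close to $1/2$ are the hard ones because the margin $(2q_i - 1)(2\mu - 1)$ is small.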

See also papers [9], [18], [22], [15]:
whose vote should count more
The multidimensional wisdom of crowds
Regularized minimax
A permutation-based model for crowd labeling

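In addition, Making better use of .... introduces two or three papers that give a comprehensive treatment of the D-S model and the various models derived from it.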