[Paper Notes] Collaborative workflow for crowdsourcing translation (CSCW, 2012)

Time: 1.5 hours
Timespan: Apr 15 – Apr 16, 2012
Vamshi Ambati, Stephan Vogel, and Jaime Carbonell. 2012. Collaborative workflow for crowdsourcing translation. In Proceedings of the ACM 2012 conference on Computer Supported Cooperative Work (CSCW '12). ACM, New York, NY, USA, 1191-1194.

The paper's first author, Vamshi Ambati (LinkedIn, publications), has been a PhD student at CMU since 2007. His research area is "Active Learning and Crowd-Sourcing techniques for building low-resource Machine Translation systems", and his thesis is titled "Active Learning for Machine Translation in Scarce Data Scenarios".

Notes on the paper follow:

1. The paper proposes a crowdsourcing-based translation workflow with three steps, in order: word translation, assisted sentence translation, and translation synthesis. The output of each step serves as the input to the next.

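Step	Task	Participants (minimum requirement)	Notes
word translation	Translate the words that appear in the sentence (in context)	bilingual	Low cost and repeatable; automatic verification is not hard
assisted sentence translation	Translate the whole sentence	weak bilingual	Multiple translations may be collected for the same sentence
translation synthesis	Combine the translations and fix spelling and grammar errors	monolingual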
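
The three steps in the table above can be sketched as a simple pipeline in which each stage consumes the previous stage's output. This is only an illustrative sketch, not the authors' implementation: every name here (`word_translation`, `run_workflow`, the toy workers, `TOY_LEXICON`) is hypothetical, and crowd workers are modelled as plain Python functions rather than real mTurk tasks.

```python
from typing import Callable, Dict, List

# Hypothetical worker types: each crowd worker is modelled as a function.
WordWorker = Callable[[str, str], str]                  # (word, source sentence) -> word translation
SentenceWorker = Callable[[str, Dict[str, str]], str]   # (source sentence, glosses) -> sentence translation
SynthesisWorker = Callable[[List[str]], str]            # (candidate translations) -> final translation


def word_translation(sentence: str, worker: WordWorker) -> Dict[str, str]:
    """Step 1: a bilingual worker translates each word in its sentence context."""
    return {word: worker(word, sentence) for word in sentence.split()}


def assisted_sentence_translation(
    sentence: str, glosses: Dict[str, str], workers: List[SentenceWorker]
) -> List[str]:
    """Step 2: weak bilingual workers translate the sentence, helped by the glosses."""
    return [worker(sentence, glosses) for worker in workers]


def translation_synthesis(candidates: List[str], worker: SynthesisWorker) -> str:
    """Step 3: a monolingual worker merges and cleans up the candidates."""
    return worker(candidates)


def run_workflow(
    sentence: str,
    word_worker: WordWorker,
    sentence_workers: List[SentenceWorker],
    synthesis_worker: SynthesisWorker,
) -> str:
    """Chain the three steps; each step's output feeds the next."""
    glosses = word_translation(sentence, word_worker)
    candidates = assisted_sentence_translation(sentence, glosses, sentence_workers)
    return translation_synthesis(candidates, synthesis_worker)


# Toy stand-ins for crowd workers (assumptions made up for this example).
TOY_LEXICON = {"el": "the", "gato": "cat", "duerme": "sleeps"}


def toy_word_worker(word: str, sentence: str) -> str:
    return TOY_LEXICON.get(word, word)


def literal_worker(sentence: str, glosses: Dict[str, str]) -> str:
    return " ".join(glosses[w] for w in sentence.split())


def sloppy_worker(sentence: str, glosses: Dict[str, str]) -> str:
    # Simulates a grammar slip by dropping the trailing "s".
    return literal_worker(sentence, glosses).rstrip("s")


def toy_synthesis_worker(candidates: List[str]) -> str:
    # Placeholder for human judgement: simply keeps the longest candidate.
    return max(candidates, key=len)


final = run_workflow(
    "el gato duerme", toy_word_worker, [literal_worker, sloppy_worker], toy_synthesis_worker
)
print(final)
```

The point of the split is visible in the types: only the first step needs a fully bilingual worker, the second step receives the glosses as hints, and the last step never sees the source sentence at all.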

2. The paper lists the following "Challenges in crowdsourcing translation":

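large label space: the output space is so large that translation quality is hard to evaluate.
availability: bilingual users are scarce.
low quality: most participants are non-professionals.
cost: redundancy increases cost.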

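3. The experimental validation is also run on mTurk, as a comparison experiment whose baseline is the traditional crowdsourcing translation workflow.
It draws on the following comparison methods: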

“fuzzy matching algorithm for comparing two sentences and computing majority agreement”: V. Ambati, S. Vogel, and J. Carbonell. Active learning and crowd-sourcing for machine translation. In Proceedings of the LREC 2010, Malta, May 2010.
“automatic translation evaluation metric”: K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. Bleu: a method for automatic evaluation of machine translation. In ACL 2002, pages 311–318, Morristown, NJ, USA, 2002.
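
To make the idea of fuzzy matching with majority agreement concrete, here is a minimal sketch. It is not the algorithm from the LREC 2010 paper, whose details are not given in these notes; it substitutes the standard library's `difflib.SequenceMatcher` as the similarity measure, and the function names and the 0.8 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from typing import List, Optional


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1], ignoring case and extra whitespace."""
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


def majority_translation(candidates: List[str], threshold: float = 0.8) -> Optional[str]:
    """Return a candidate that a strict majority of candidates fuzzily match, else None.

    The threshold is an assumed value, not one taken from the paper.
    """
    best, best_votes = None, 0
    for cand in candidates:
        # A candidate always matches itself, so every candidate gets at least one vote.
        votes = sum(similarity(cand, other) >= threshold for other in candidates)
        if votes > best_votes:
            best, best_votes = cand, votes
    if best_votes * 2 > len(candidates):
        return best
    return None


agreed = majority_translation(["The cat sleeps.", "the cat sleeps", "A dog is barking loudly"])
no_agreement = majority_translation(["red apple", "blue sky", "green tree"])
```

With exact string matching the first two candidates would count as different answers; the fuzzy comparison lets them vote together, which is what makes majority agreement usable over such a large output space.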