Deep Knowledge Tracing Paper Notes: L@S 2017 Deep Knowledge Tracing On Programming Exercises

L@S 2017 Deep Knowledge Tracing On Programming Exercises (Stanford)

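Main goal of the paper: feed embedded program submissions into a recurrent neural network (LSTM), train the model, and predict whether a student will pass the subsequent programming exercise.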

Task definition:

Based on a student’s sequence of code submission attempts over time (hereby, their "trajectory") on a programming exercise, predict whether the student will successfully complete the next programming exercise within the same course.

Dataset:

Source: Code.org research data
The Hour of Code course
Exercise 18

This Exercise 18 data set contains 1,263,360 code submissions, of which 79,553 are unique, made by 263,569 students. 81.0% of these students arrived at the correct solution in their last submission.
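A quick sanity check on the scale of this dataset. This is only arithmetic on the figures quoted above, not additional statistics from the paper; the variable names are illustrative.

```python
# Figures quoted above for Hour of Code Exercise 18
total_submissions = 1_263_360
unique_submissions = 79_553
students = 263_569
solved_fraction = 0.810  # "81.0% ... arrived at the correct solution"

submissions_per_student = total_submissions / students
unique_fraction = unique_submissions / total_submissions
students_solved = round(students * solved_fraction)  # approximate, since 81.0% is rounded

print(f"submissions per student: {submissions_per_student:.2f}")    # 4.79
print(f"unique share of submissions: {unique_fraction:.1%}")        # 6.3%
print(f"students ending on a correct solution: ~{students_solved:,}")  # ~213,491
```

In other words, a typical student submits about five times, and most submissions are repeats of programs other students have already written.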

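Student trajectories on a single exercise:
Each code submission is represented as an abstract syntax tree (AST).
Trajectory length is controlled for: trajectories of different lengths are trained on separately, split into nine datasets covering lengths 2 through 10, with a model trained on each.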
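The length-based split can be sketched as a simple grouping step. This is a minimal illustration, assuming each trajectory is a list of AST ids paired with a 0/1 label for the next exercise; the helper name `bucket_by_length` is hypothetical, and dropping trajectories outside lengths 2 to 10 is an assumption the notes do not state.

```python
from collections import defaultdict


def bucket_by_length(trajectories, min_len=2, max_len=10):
    """Group (trajectory, label) pairs by trajectory length (hypothetical helper).

    Trajectories shorter than min_len or longer than max_len are dropped here,
    which is an assumption for illustration.
    """
    buckets = defaultdict(list)
    for traj, label in trajectories:
        if min_len <= len(traj) <= max_len:
            buckets[len(traj)].append((traj, label))
    return dict(buckets)


# Made-up data: label 1 = solved the next exercise
data = [
    (["ast_3", "ast_7"], 1),
    (["ast_3", "ast_9", "ast_1"], 0),
    (["ast_2"], 1),  # length 1: dropped
    (["ast_4", "ast_7"], 0),
]
buckets = bucket_by_length(data)
print(sorted(buckets))   # [2, 3]
print(len(buckets[2]))   # 2
```

Each bucket (lengths 2, 3, ..., 10, nine in total) would then get its own training run.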

Model

Use a recurrent neural network (LSTM) to process student trajectories

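Suppose a student's trajectory contains k submissions. Each submission is converted into a program embedding, giving a sequence of k embeddings. These k embeddings are fed into the RNN, and its final hidden state is passed through a fully connected layer followed by a softmax layer. The softmax output y is a binary prediction of whether the student will successfully solve the next problem.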
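To make this pipeline concrete, here is a minimal forward-pass sketch in plain Python (no deep-learning library, random untrained weights). The class name `TinyLSTMClassifier`, the dimensions, and the convention that index 1 of the softmax output means "solves the next exercise" are hypothetical; the sketch only mirrors the LSTM → fully connected → softmax structure described above and is not the authors' code.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Matrix (list of rows) times vector
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinyLSTMClassifier:
    """Hypothetical untrained LSTM -> fully connected -> softmax (2 classes)."""

    GATES = "ifoc"  # input, forget, output gates and cell candidate

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        concat_dim = input_dim + hidden_dim
        # One weight matrix per gate, applied to the concatenation [x_t; h_{t-1}]
        self.W = {g: rand_matrix(hidden_dim, concat_dim, rng) for g in self.GATES}
        self.b = {g: [0.0] * hidden_dim for g in self.GATES}
        # Fully connected layer: final hidden state -> 2 logits
        self.W_out = rand_matrix(2, hidden_dim, rng)
        self.b_out = [0.0, 0.0]

    def step(self, x, h, c):
        z = x + h  # list concatenation [x_t; h_{t-1}]
        pre = {g: [a + b for a, b in zip(matvec(self.W[g], z), self.b[g])]
               for g in self.GATES}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        cand = [math.tanh(v) for v in pre["c"]]
        c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, cand)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return h, c

    def predict_proba(self, embeddings):
        """Run the k program embeddings through the LSTM, softmax over the final state."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in embeddings:  # one embedding per submission, in order
            h, c = self.step(x, h, c)
        logits = [a + b for a, b in zip(matvec(self.W_out, h), self.b_out)]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        # Assumed convention: [P(does not solve next), P(solves next)]
        return [e / total for e in exps]


model = TinyLSTMClassifier(input_dim=4, hidden_dim=8)
trajectory = [  # k = 3 made-up program embeddings
    [0.1, -0.2, 0.0, 0.3],
    [0.2, 0.1, -0.1, 0.0],
    [0.0, 0.4, 0.2, -0.3],
]
p_fail, p_success = model.predict_proba(trajectory)
print(f"P(solves next exercise) = {p_success:.3f}")
```

Only the final hidden state reaches the classifier, so the whole trajectory has to be summarized in that one vector; training (backpropagation on a cross-entropy loss) is omitted here.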
Use a recursive neural network to produce program embeddings

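A recursive neural network is trained so that the AST representation of a student's program can be turned into a vector.
The idea follows prior work [9], and the implementation is based on [11].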
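The notes do not spell out the recursive network's architecture, so the following is only an illustrative sketch of the general idea: encode an AST bottom-up, computing each node's vector from its own type vector and the aggregated vectors of its children. The `Node` format, the mean-over-children composition, the dimensions, and the block names are assumptions, not the model of [9] or [11], and the weights are random rather than trained.

```python
import math
import random


class Node:
    """Hypothetical minimal AST node: a block type plus its children."""

    def __init__(self, kind, children=()):
        self.kind = kind
        self.children = list(children)


class RecursiveEncoder:
    """Untrained bottom-up tree encoder (illustrative, not the model of [9]/[11]).

    vec(node) = tanh(W @ [type_vec(node.kind) ; mean of child vectors])
    """

    def __init__(self, kinds, dim, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.type_vec = {k: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for k in kinds}
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(2 * dim)] for _ in range(dim)]

    def encode(self, node):
        if node.children:
            child_vecs = [self.encode(ch) for ch in node.children]
            # Element-wise mean over the children's vectors
            agg = [sum(col) / len(child_vecs) for col in zip(*child_vecs)]
        else:
            agg = [0.0] * self.dim  # leaves have nothing to aggregate
        z = self.type_vec[node.kind] + agg  # list concatenation, length 2 * dim
        return [math.tanh(sum(w * x for w, x in zip(row, z))) for row in self.W]


# Made-up program: repeat { move_forward; turn_left }
ast = Node("program", [Node("repeat", [Node("move_forward"), Node("turn_left")])])
encoder = RecursiveEncoder(["program", "repeat", "move_forward", "turn_left"], dim=4)
vec = encoder.encode(ast)
print(len(vec))  # 4
```

The root vector plays the role of one program embedding; one such vector per submission would form the sequence of k embeddings fed to the LSTM.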
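The authors also define their own baseline model:
pathScore(T) = the sum of the reciprocals of the submission counts over trajectory T
A simple logistic regression model is then trained on this score.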
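A sketch of the baseline under one interpretation: here the "submission count" of a submission is taken to be how many times that exact AST appears across the whole dataset, so trajectories made of rare (unusual) submissions get a high pathScore. That interpretation, the helper names `path_score` and `fit_logistic_1d`, and the toy data are assumptions; the logistic regression is plain gradient descent on the single pathScore feature.

```python
import math
from collections import Counter


def path_score(trajectory, counts):
    """Sum of 1 / count(s) over submissions s in the trajectory.

    Assumption: count(s) = number of times submission s (an AST) occurs in the dataset.
    """
    return sum(1.0 / counts[s] for s in trajectory)


def fit_logistic_1d(xs, ys, lr=0.5, epochs=2000):
    """Batch gradient descent for P(y = 1 | x) = sigmoid(w * x + b)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Toy data (made up): each trajectory is a list of AST ids; label 1 = solved the next exercise
trajectories = [["A", "B"], ["A", "B"], ["A", "C"], ["A", "B"], ["D", "E"], ["F", "G"]]
labels = [1, 1, 1, 1, 0, 0]
counts = Counter(s for t in trajectories for s in t)

scores = [path_score(t, counts) for t in trajectories]
w, b = fit_logistic_1d(scores, labels)
for t, s in zip(trajectories, scores):
    p = 1.0 / (1.0 + math.exp(-(w * s + b)))
    print(t, f"pathScore={s:.2f}", f"P(solve next)={p:.2f}")
```

On this toy data the learned weight w is negative: a higher pathScore (a path through rarely seen programs) lowers the predicted chance of solving the next exercise. In practice the counts would be computed on the training split only.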

Results