Study Notes (3) - BioASQ

The goal this time is to verify how well BioBERT performs on question answering (QA).

A challenge on large-scale biomedical semantic indexing and question answering
http://bioasq.org/
http://participants-area.bioasq.org/Tasks/

Tsatsaronis, G., Balikas, G., Malakasiotis, P., Partalas, I., Zschunke, M., Alvers, M. R., Weissenborn, D., Krithara, A., Petridis, S., Polychronopoulos, D., Almirantis, Y., Pavlopoulos, J., Baskiotis, N., Gallinari, P., Artiéres, T., Ngomo, A. C. N., Heino, N., Gaussier, E., Barrio-Alvers, L., … Paliouras, G. (2015). An overview of the BioASQ large-scale biomedical semantic indexing and question answering competition. BMC Bioinformatics, 16(1), 1–28. https://doi.org/10.1186/s12859-015-0564-6

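Model: biobert_v1.0_pubmed_pmc
Training data: QA.zip, the preprocessed BioASQ-4/5/6b datasets, provided by the Korea University team
Test data: BioASQ-TaskB-testData.zip, provided by the competition organizers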
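
Before training, it can help to check what the preprocessed files contain. The sketch below counts paragraphs and questions in one file. It assumes the files follow the SQuAD-style JSON layout (a top-level "data" list whose entries hold "paragraphs", each with a "qas" list); these notes do not state the format, so treat that as an assumption. The function name summarize_squad_style is made up for illustration.

```python
import json


def summarize_squad_style(path):
    """Count paragraphs and questions in a SQuAD-style JSON file.

    Assumption: the file has the layout
    {"data": [{"paragraphs": [{"context": ..., "qas": [...]}, ...]}, ...]}.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)["data"]

    n_paragraphs = 0
    n_questions = 0
    for entry in data:
        for paragraph in entry["paragraphs"]:
            n_paragraphs += 1
            n_questions += len(paragraph["qas"])
    return n_paragraphs, n_questions
```

For example, summarize_squad_style("BioASQ-train-4b.json") would return a (paragraphs, questions) pair if the file matches the assumed layout, and raise a KeyError otherwise.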

Note: the file names need to be changed to match the actual files.

python run_qa.py \
    --do_train=True \
    --do_predict=True \
    --vocab_file=$BIOBERT_DIR/vocab.txt \
    --bert_config_file=$BIOBERT_DIR/bert_config.json \
    --init_checkpoint=$BIOBERT_DIR/biobert_model.ckpt \
    --max_seq_length=384 \
    --train_batch_size=12 \
    --learning_rate=5e-6 \
    --doc_stride=128 \
    --num_train_epochs=5.0 \
    --do_lower_case=False \
    --train_file=$BIOASQ_DIR/BioASQ-train-4b.json \
    --predict_file=$BIOASQ_DIR/BioASQ-test-4b-1.json \
    --output_dir=QA_output/
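
Since the file names have to match what the command expects, a small pre-flight check can save a failed run. The following is a minimal sketch, not part of the BioBERT code: the helper name missing_inputs is made up, and the default file names are simply the ones used in the command above. It relies on the fact that --init_checkpoint takes a TensorFlow checkpoint prefix rather than a single file, so it looks for any file starting with that prefix (such as biobert_model.ckpt.index).

```python
import glob
import os


def missing_inputs(biobert_dir, bioasq_dir,
                   train_name="BioASQ-train-4b.json",
                   predict_name="BioASQ-test-4b-1.json",
                   ckpt_prefix="biobert_model.ckpt"):
    """Return the paths the run_qa.py command expects but that do not exist.

    The default names are taken from the command in these notes; adjust
    them if your files are named differently.
    """
    expected = [
        os.path.join(biobert_dir, "vocab.txt"),
        os.path.join(biobert_dir, "bert_config.json"),
        os.path.join(bioasq_dir, train_name),
        os.path.join(bioasq_dir, predict_name),
    ]
    missing = [p for p in expected if not os.path.isfile(p)]

    # The checkpoint is a prefix: accept any file named <prefix>.<something>.
    prefix = os.path.join(biobert_dir, ckpt_prefix)
    if not glob.glob(glob.escape(prefix) + ".*"):
        missing.append(prefix + ".*")
    return missing


if __name__ == "__main__":
    # Read the same environment variables the command uses.
    biobert_dir = os.environ.get("BIOBERT_DIR", ".")
    bioasq_dir = os.environ.get("BIOASQ_DIR", ".")
    for path in missing_inputs(biobert_dir, bioasq_dir):
        print("missing:", path)
```

Running the script with BIOBERT_DIR and BIOASQ_DIR set prints one line per missing input and prints nothing when everything is in place.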