| Name | Description | Size | License | Creator | Download |
| --- | --- | --- | --- | --- | --- |
| XQuAD | XQuAD (Cross-lingual Question Answering Dataset) is a benchmark dataset for evaluating cross-lingual question answering performance. | 240 paragraphs and 1,190 question-answer pairs | CC BY-SA 4.0 | DeepMind | GitHub |
| Thai QA | Question answering program from Thai Wikipedia. | 4,000 question-answer pairs | CC BY-NC-SA 3.0 | NECTEC | Dataset: aiforthai (registration required); wiki: copycatch; sample dataset: copycatch |
| TyDi QA | A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages | 200k human-annotated question-answer pairs | Apache-2.0 License | Google Research | GitHub |
| iapp-wiki-qa-dataset | Open Thai Wikipedia QA Dataset made by iApp Technology | 1,961 documents, 9,170 questions | MIT License | iApp Technology | GitHub |
| MKQA | MKQA: Multilingual Knowledge Questions & Answers. MKQA contains 10,000 queries sampled from the Google Natural Questions dataset. | 10,000 queries | | Apple | GitHub |
| Thai WIKI QA | Dataset from National Software Contest (NSC) 2018-2019 | 15,000 factoid question-answer pairs and 2,000 boolean questions | CC BY-NC-SA 3.0 | NECTEC | Dataset: aiforthai |
| Zero-shot multilingual QA from DeepPavlov | DeepPavlov is an open-source conversational AI library built on TensorFlow, Keras and PyTorch. | | | Neural Networks and Deep Learning lab, MIPT | GitHub |