Question Answering
Corpus
Name | Description | Size | License | Creator | Download |
---|---|---|---|---|---|
XQuAD | XQuAD (Cross-lingual Question Answering Dataset) is a benchmark dataset for evaluating cross-lingual question answering performance. | 240 paragraphs and 1,190 question-answer pairs | CC BY-SA 4.0 | DeepMind | GitHub |
Thai QA | Question answering dataset built from Thai Wikipedia. | 4,000 question-answer pairs | CC BY-NC-SA 3.0 | NECTEC | Dataset: aiforthai (registration required), wiki: copycatch, sample dataset: copycatch |
TyDi QA | A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages | 200k human-annotated question-answer pairs | Apache-2.0 License | Google Research | GitHub |
iapp-wiki-qa-dataset | Open Thai Wikipedia QA Dataset made by iApp Technology | 1,961 documents, 9,170 questions | MIT License | iApp Technology | GitHub |
MKQA | MKQA (Multilingual Knowledge Questions & Answers) contains 10,000 queries sampled from the Google Natural Questions dataset. | 10,000 queries | | Apple | GitHub |
Thai WIKI QA | Dataset from the National Software Contest (NSC) 2018-2019 | 15,000 factoid question-answer pairs and 2,000 boolean questions | CC BY-NC-SA 3.0 | NECTEC | Dataset: aiforthai |
Software
Name | Detail | Owner | Download |
---|---|---|---|
Zero-shot multilingual QA from DeepPavlov | DeepPavlov is an open-source conversational AI library built on TensorFlow, Keras and PyTorch. | Neural Networks and Deep Learning lab, MIPT | GitHub, Colab |