| Name | Description | Size | License | Creator | Source |
| --- | --- | --- | --- | --- | --- |
| TALPCo | TUFS Asian Language Parallel Corpus | 1,327 sentences | CC BY 4.0 | Nomoto, Hiroki, Kenji Okano, Sunisa Wittayapanyanon and Junta Nomura | GitHub |
| scb-mt-en-th-2020 | An open English-Thai machine translation dataset, published in collaboration between Vidyasirimedhi Institute of Science and Technology (VISTEC) and Digital Economy Promotion Agency (depa), with sponsorship from Siam Commercial Bank (SCB) | 1,001,752 segment pairs | CC BY-SA 4.0 | AI Research Institute of Thailand (AIResearch) | GitHub |
| Software Documentation Data Set for Machine Translation | A parallel evaluation data set of SAP software documentation with document structure annotation | dev: 2,048 segment pairs; test: 2,050 segment pairs | CC BY-NC 4.0 | SAP | GitHub |
| Thai Lao Parallel corpus | Thai-Lao parallel corpus | | CC0-1.0 License | Wannaphong Phatthiyaphaibun | GitHub |
| Contradictory, My Dear Watson Translated text | Non-English text translated into English | | | | Kaggle |
| Asian Language Treebank Parallel Corpus | The Asian Language Treebank (ALT) Parallel Corpus | train: 1,698 articles, 18,088 sentences; dev: 98 articles, 1,000 sentences; test: 97 articles, 1,018 sentences | CC BY 4.0 | | Website |
| WikiLingua | A multilingual abstractive summarization dataset | 14,770 parallel (for Thai) | CC0-1.0 License | Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown | GitHub |
| Web Inventory of Transcribed & Translated (WIT) TED Talks | A collection of the original TED talks and their translated versions. Translations are available in more than 109 languages, though the distribution is not uniform. | | | | Hugging Face |
| generated_reviews_enth | Created as part of scb-mt-en-th-2020 for the machine translation task | | CC BY-SA 4.0 | AI Research Institute of Thailand (AIResearch) | GitHub |
| FLORES-101 | A many-to-many multilingual translation benchmark dataset for 101 languages | | | Facebook | GitHub |
| thai_usembassy | A collection of all Thai and English news from the U.S. Embassy Bangkok | | CC0 | PyThaiNLP | Hugging Face |