| TALPCo | TUFS Asian Language Parallel Corpus | 1,327 sentences | CC BY 4.0 | Nomoto, Hiroki, Kenji Okano, Sunisa Wittayapanyanon and Junta Nomura | GitHub | 
| scb-mt-en-th-2020 | An open English-Thai machine translation dataset, published through a collaboration between Vidyasirimedhi Institute of Science and Technology (VISTEC) and the Digital Economy Promotion Agency (depa), with sponsorship from Siam Commercial Bank (SCB) | 1,001,752 segment pairs | CC BY-SA 4.0 | AI Research Institute of Thailand (AIResearch) | GitHub | 
| Software Documentation Data Set for Machine Translation | A parallel evaluation data set of SAP software documentation with document structure annotation | dev: 2,048 segment pairs, test: 2,050 segment pairs | CC BY-NC 4.0 | SAP | GitHub | 
| Thai Lao Parallel corpus | Thai Lao Parallel corpus |  | CC0-1.0 License | Wannaphong Phatthiyaphaibun | GitHub | 
| Contradictory, My Dear Watson Translated text | Non-English text translated into English |  |  |  | Kaggle | 
| Asian Language Treebank Parallel Corpus | This is the Asian Language Treebank (ALT) Parallel Corpus. | train: 1,698 articles, 18,088 sentences; dev: 98 articles, 1,000 sentences; test: 97 articles, 1,018 sentences | CC BY 4.0 |  | Website | 
| WikiLingua | A Multilingual Abstractive Summarization Dataset | 14,770 parallel (for Thai) | CC0-1.0 License | Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown | GitHub | 
| Web Inventory of Transcribed & Translated (WIT) TED Talks | The Web Inventory Talk is a collection of original TED talks and their translated versions. Translations are available in more than 109 languages, though the distribution is not uniform. |  |  |  | Hugging Face | 
| generated_reviews_enth | generated_reviews_enth was created as part of scb-mt-en-th-2020 for the machine translation task. |  | CC BY-SA 4.0 | AI Research Institute of Thailand (AIResearch) | GitHub | 
| FLORES-101 | FLORES-101 is a many-to-many multilingual translation benchmark dataset for 101 languages. |  |  | Facebook | GitHub | 
| thai_usembassy | This dataset collects all Thai and English news from the U.S. Embassy Bangkok. |  | CC-0 | PyThaiNLP | Hugging Face |