Treebank

Corpus

| Name | Description | Size | License | Creator | Download |
|---|---|---|---|---|---|
| UD Thai PUD | Part of the Parallel Universal Dependencies (PUD) treebanks created for the CoNLL 2017 shared task on Multilingual Parsing from Raw Text to Universal Dependencies. | 1,000 sentences | CC BY-SA 3.0 | Universal Dependencies | GitHub |
| Thai Treebanks Dataset (thtb) | To enable research opportunities given how few Thai computational linguistics resources exist, we introduce Thai Treebanks, a fundamental high-level language resource built from scratch, with passion, for researchers and enthusiasts. | 5,200 sentences | CC BY 4.0 | Pechlada Seenual, Thodsaporn Chay-intr and Thanaruk Theeramunkong | GitHub |
| Blackboard Treebank | A Thai dependency corpus based on the LST20 Annotation Guideline. It features dependency structures, constituency structures, word boundaries, named entities, clause boundaries, and sentence boundaries. | 122,851 clauses (38,558 sentences) | CC BY 3.0 | Prachya Boonkwan, NECTEC | Bitbucket |
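
Universal Dependencies treebanks such as UD Thai PUD are distributed in the tab-separated CoNLL-U format, with ten columns per token and a blank line between sentences. The following is a minimal reading sketch, not an official loader: the function name `parse_conllu` and the three-token sample are hypothetical, and the sample is hand-written for illustration rather than taken from any of the treebanks listed here. The file formats of the other two corpora are not stated here and may differ.

```python
from typing import Dict, List

# The ten CoNLL-U columns, in the order defined by Universal Dependencies.
CONLLU_FIELDS = [
    "id", "form", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
]


def parse_conllu(text: str) -> List[List[Dict[str, str]]]:
    """Split CoNLL-U text into sentences, each a list of token dicts."""
    sentences: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Comment lines carry sentence-level metadata; skipped here.
            continue
        cols = line.split("\t")
        if len(cols) != len(CONLLU_FIELDS):
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        current.append(dict(zip(CONLLU_FIELDS, cols)))
    if current:
        sentences.append(current)
    return sentences


# Hand-written toy sentence (illustration only, not real treebank data).
SAMPLE = "\n".join([
    "# text = ฉันรักแมว",
    "\t".join(["1", "ฉัน", "_", "PRON", "_", "_", "2", "nsubj", "_", "_"]),
    "\t".join(["2", "รัก", "_", "VERB", "_", "_", "0", "root", "_", "_"]),
    "\t".join(["3", "แมว", "_", "NOUN", "_", "_", "2", "obj", "_", "_"]),
    "",
])

if __name__ == "__main__":
    for sentence in parse_conllu(SAMPLE):
        for token in sentence:
            print(token["id"], token["form"], token["upos"],
                  token["head"], token["deprel"])
```

For real files, read the downloaded `.conllu` file as UTF-8 text and pass its contents to the function. This sketch keeps multiword-token and empty-node lines as ordinary rows rather than handling them specially.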