Dependency Parser
Corpus
Name | Description | Size | License | Creator | Download |
---|---|---|---|---|---|
UD Thai PUD | Part of the Parallel Universal Dependencies (PUD) treebanks, created for the CoNLL 2017 shared task on Multilingual Parsing from Raw Text to Universal Dependencies. | 1,000 sentences | CC BY-SA 3.0 | Universal Dependencies | GitHub |
Blackboard Treebank | Blackboard Treebank is a Thai dependency corpus based on the LST20 Annotation Guideline. It features dependency structures, constituency structures, word boundaries, named entities, clause boundaries, and sentence boundaries. | 122,851 clauses (38,558 sentences) | CC BY 3.0 | Prachya Boonkwan, NECTEC | Bitbucket or GitHub |
Thai Discourse Treebank | The Thai Discourse Treebank (TDTB) is a project at Chulalongkorn University, Bangkok, Thailand. Its annotation adopts the sense inventory of the Penn Discourse Treebank (PDTB) 3.0. | 180 documents | - | Chulalongkorn University | GitHub |
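
Universal Dependencies treebanks such as UD Thai PUD are distributed in the CoNLL-U format, where each token line has ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). The following is a minimal sketch of reading such a file with the Python standard library only. The `Token` type, the `read_conllu` function, and the three-word sample sentence are illustrative assumptions written for this example; they are not taken from any of the corpora or tools listed here.

```python
from typing import Iterator, List, NamedTuple


class Token(NamedTuple):
    """A subset of the ten CoNLL-U columns (hypothetical helper type)."""
    id: int
    form: str
    upos: str
    head: int      # 0 means the token is the root; -1 means unspecified
    deprel: str


def read_conllu(text: str) -> Iterator[List[Token]]:
    """Yield one list of tokens per sentence from CoNLL-U text."""
    sentence: List[Token] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment such as "# text = ..."
        cols = line.split("\t")
        # Skip malformed lines, multiword-token ranges ("1-2"),
        # and empty nodes ("1.1").
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        head = int(cols[6]) if cols[6].isdigit() else -1
        sentence.append(Token(int(cols[0]), cols[1], cols[3], head, cols[7]))
    if sentence:
        yield sentence


# Hand-written illustration ("I love cats"), NOT a sentence from a treebank.
SAMPLE = "\n".join([
    "# text = ฉันรักแมว",
    "\t".join(["1", "ฉัน", "_", "PRON", "_", "_", "2", "nsubj", "_", "SpaceAfter=No"]),
    "\t".join(["2", "รัก", "_", "VERB", "_", "_", "0", "root", "_", "SpaceAfter=No"]),
    "\t".join(["3", "แมว", "_", "NOUN", "_", "_", "2", "obj", "_", "_"]),
    "",
])

if __name__ == "__main__":
    for sent in read_conllu(SAMPLE):
        for tok in sent:
            print(tok.id, tok.form, tok.upos, tok.head, tok.deprel)
```

To read a downloaded treebank instead of the sample, pass the contents of a `.conllu` file (opened with `encoding="utf-8"`) to `read_conllu`.
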
Software
Name | Description | Status | Language | License |
---|---|---|---|---|
spaCy-Thai | Tokenizer, POS tagger, and dependency parser for the Thai language, based on Universal Dependencies. | active | Python 3.X | MIT license |
esupar | Tokenizer, POS tagger, and dependency parser built on Transformers and SuPar. | active | Python 3.X | MIT license |
TowerParse | TowerParse is a Python tool for multilingual dependency parsing, built on top of the HuggingFace Transformers library. Unlike other multilingual dependency parsers (e.g., UDify, UDapter), TowerParse offers a dedicated parsing model for each UD test treebank, so languages with multiple treebanks have multiple parsing models. | ? | Python 3.X | CC0-1.0 license |