Other
Dictionaries
| Name | Description | Size | License | Creator | Download | 
|---|---|---|---|---|---|
| LEXiTRON | Thai<->English Dictionary | Thai-English 83,000 words | CC BY-NC-SA 3.0 | NECTEC | aiforthai (registration required) |
| Yaitron | Yaitron is an English-Thai and Thai-English dictionary based on LEXiTRON, developed since May 2006. Its objective is to build a dictionary in well-formed XML that is easy to manipulate by machine. | | LEXiTRON License | Vee Satayamas | GitHub |
| Volubilis Dict - Thai-English-French | VOLUBILIS - Thai English French Database | | | | sourceforge |
| Ground-truth bilingual dictionaries | 110 large-scale ground-truth bilingual dictionaries | Train: 5,000 words; test: 1,500 words | | Facebook Research | GitHub |
| Thai Wrong words dataset | | | | Wannaphong Phatthiyaphaibun | GitHub |
N-gram
| Name | Description | Size | License | Creator | Download | 
|---|---|---|---|---|---|
| Unigram from OSCAR Corpus | Unigram from OSCAR Corpus | | | Korakot Chaovavanich | |
| TTC | N-grams from Thai textbooks | 3,037,772 words | | | Website |
| Thai National Corpus | Thai National Corpus (Unigram, Bi-gram, Tri-gram) | | | Faculty of Arts, Chulalongkorn University | Website |
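
For readers unfamiliar with the terms in the table above, the following is a minimal sketch of how unigram, bigram, and trigram counts can be derived from tokenized text. The token list is an invented example and `count_ngrams` is a hypothetical helper, not part of any resource listed here. Thai text has no spaces between words, so the sketch assumes word segmentation has already been done.

```python
from collections import Counter


def count_ngrams(tokens, n):
    """Count every contiguous run of n tokens in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Hypothetical, already-segmented tokens (illustrative only).
tokens = ["ฉัน", "กิน", "ข้าว", "ฉัน", "กิน", "ปลา"]

unigrams = count_ngrams(tokens, 1)  # single words
bigrams = count_ngrams(tokens, 2)   # adjacent word pairs
trigrams = count_ngrams(tokens, 3)  # adjacent word triples
```

Each counter maps a tuple of tokens to its frequency, which is the same kind of information the n-gram lists above distribute as precomputed files.
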
Word Similarity
| Name | Description | Size | License | Creator | Download | 
|---|---|---|---|---|---|
| Word Similarity Datasets for Thai Language | This repo contains translated and re-rated datasets for word similarity for Thai language. | | | Ponrudee Netisopakul, Gerhard Wohlgenannt, Aleksei Pulich | GitHub |
Thai Name
| Name | Description | Size | License | Creator | Download | 
|---|---|---|---|---|---|
| Thai Male and Female Names Corpus | The project contains Thai male, female, and family names, intended for Thai language analysis. | 22,058 names | CC BY-SA 4.0 | Korkeat W. | GitHub |
WordNet
| Name | Description | Size | License | Creator | Download | 
|---|---|---|---|---|---|
| Open Multilingual Wordnet | The goal is to make it easy to use wordnets in multiple languages. | 81% | | | Website |
| th-wn-sqlite | Thai wordnet in SQLite | - | | Vee Satayamas | sourceforge |
| ธนนท์ หลีน้อย 2008 | | | | ธนนท์ หลีน้อย | Website |
| ปริศนา อัครพุทธิพร Data 2008 | | | | ปริศนา อัครพุทธิพร | Website |
Word embeddings
| Name | Detail | Download | 
|---|---|---|
| ConceptNet Numberbatch | ConceptNet Numberbatch is a set of semantic vectors (also known as word embeddings) that can be used directly as a representation of word meanings or as a starting point for further machine learning. | GitHub |
| FastText Word vectors | The pre-trained word vectors for 157 languages, trained on Common Crawl and Wikipedia using fastText. | Website | 
| Thai2Fit (old Thai2Vec) | | Homepage; word2vec download: PyThaiNLP |
| LTW2V: The Large Thai Word2Vec | LTW2V is the Large Thai Word2Vec. It was built with oxidized-thainlp from the OSCAR Corpus (Open Super-large Crawled Aggregated coRpus). | GitHub |
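
As a rough illustration of how such vectors are typically used, the sketch below parses the plain-text word2vec format (a `count dim` header line, then one `word v1 ... vd` line per word), which is the layout of fastText's `.vec` files, and compares two words by cosine similarity. The other resources in the table may ship in different formats, so treat this as an assumption to check against each download. `load_vectors`, `cosine`, and the toy 3-dimensional vectors are hypothetical and invented for illustration.

```python
import io
import math


def load_vectors(handle):
    """Parse word2vec text format: 'count dim' header, then 'word v1 ... vd' lines."""
    _count, dim = map(int, handle.readline().split())
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip lines that do not match the declared dimension
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy data standing in for a real vector file (illustrative only).
sample = io.StringIO("3 3\ncat 1.0 0.0 0.0\ndog 0.8 0.6 0.0\ncar 0.0 0.0 1.0\n")
vectors = load_vectors(sample)
similarity = cosine(vectors["cat"], vectors["dog"])
```

With a real file, the `io.StringIO` object would be replaced by `open(path, encoding="utf-8")`. Full vector files are large, so loading only the words needed is often preferable.
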
Sentence Embedding
| Name | Detail | Paper | Owner | Download | 
|---|---|---|---|---|
| LASER | LASER: Language-Agnostic SEntence Representations | Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond | | GitHub |
| MUSE | Multilingual Universal Sentence Encoder for Semantic Retrieval | Multilingual Universal Sentence Encoder for Semantic Retrieval | | TensorFlow Hub |
| LaBSE | Language-Agnostic BERT Sentence Embedding by Google AI. | Language-agnostic BERT Sentence Embedding | | |
Glossary
| Name | Detail | Website | 
|---|---|---|
| Thai Glossary | Thai Glossary for Open Source Software by OpenTLE (backup) | Website | 
| Glossary for Open Source Software by OpenTLE | | Web archive |