Other
Dictionaries
Name | Description | Size | License | Creator | Download |
---|---|---|---|---|---|
LEXiTRON | Thai<->English Dictionary | Thai-English 83,000 words | CC BY-NC-SA 3.0 | NECTEC | aiforthai (registration required) |
Yaitron | Yaitron is an English-Thai and Thai-English dictionary based on LEXiTRON, developed since May 2006. Its objective is to build a dictionary formatted in well-formed XML that is easy for machines to process. | | LEXiTRON License | Vee Satayamas | GitHub |
Volubilis Dict - Thai-English-French | VOLUBILIS - Thai English French Database | | | | sourceforge |
Ground-truth bilingual dictionaries | 110 large-scale ground-truth bilingual dictionaries | 5,000 training words and 1,500 test words | | Facebook Research | GitHub |
Thai Wrong words dataset | | | | Wannaphong Phatthiyaphaibun | GitHub |
N-gram
Name | Description | Size | License | Creator | Download |
---|---|---|---|---|---|
Unigram from OSCAR Corpus | Unigram from OSCAR Corpus | | | Korakot Chaovavanich | |
TTC | N-gram from Thai textbooks | 3,037,772 words | | | Website |
Thai National Corpus | Thai National Corpus (Unigram, Bi-gram, Tri-gram) | | | Faculty of Arts, Chulalongkorn University | Website |
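For readers unfamiliar with the unigram, bi-gram, and tri-gram counts these resources provide, the following is a minimal sketch of how such counts can be derived from text that has already been segmented into words. The sample sentences and the function names `ngrams` and `ngram_counts` are illustrative assumptions; they do not come from any of the corpora listed above, and real Thai text would first need a word tokenizer.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token tuples from a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(sentences, n):
    """Count n-grams over an iterable of pre-tokenized sentences."""
    counts = Counter()
    for tokens in sentences:
        counts.update(ngrams(tokens, n))
    return counts


# Hypothetical, already word-segmented sentences (illustrative data only).
sentences = [["ฉัน", "กิน", "ข้าว"], ["ฉัน", "กิน", "ปลา"]]

unigrams = ngram_counts(sentences, 1)
bigrams = ngram_counts(sentences, 2)
trigrams = ngram_counts(sentences, 3)
```

Counting per sentence, rather than over one concatenated token stream, avoids creating n-grams that span sentence boundaries.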
Word Similarity
Name | Description | Size | License | Creator | Download |
---|---|---|---|---|---|
Word Similarity Datasets for Thai Language | This repo contains translated and re-rated datasets for word similarity for Thai language. | | | Ponrudee Netisopakul, Gerhard Wohlgenannt, Aleksei Pulich | GitHub |
Thai Name
Name | Description | Size | License | Creator | Download |
---|---|---|---|---|---|
Thai Male and Female Names Corpus | The project contains Thai male, female, and family names, intended for Thai language analysis. | 22,058 names | CC BY-SA 4.0 | Korkeat W. | GitHub |
WordNet
Name | Description | Size | License | Creator | Download |
---|---|---|---|---|---|
Open Multilingual Wordnet | The goal is to make it easy to use wordnets in multiple languages. | 81% | | | Website |
th-wn-sqlite | Thai wordnet in SQLite | - | | Vee Satayamas | sourceforge |
ธนนท์ หลีน้อย 2008 | | | | ธนนท์ หลีน้อย | Website |
ปริศนา อัครพุทธิพร Data 2008 | | | | ปริศนา อัครพุทธิพร | Website |
Word embeddings
Name | Detail | Download |
---|---|---|
ConceptNet Numberbatch | ConceptNet Numberbatch is a set of semantic vectors (also known as word embeddings) that can be used directly as a representation of word meanings or as a starting point for further machine learning. | GitHub |
FastText Word vectors | The pre-trained word vectors for 157 languages, trained on Common Crawl and Wikipedia using fastText. | Website |
Thai2Fit (formerly Thai2Vec) | Homepage | word2vec download: PyThaiNLP |
LTW2V: The Large Thai Word2Vec | LTW2V is the large Thai Word2Vec. It was built with oxidized-thainlp from the OSCAR Corpus (Open Super-large Crawled Aggregated coRpus). | GitHub |
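As a rough illustration of how pre-trained vectors like these are typically consumed, the sketch below parses the plain-text word2vec format (a header line with the vocabulary size and dimension, then one word and its values per line) and compares two words by cosine similarity. It assumes the download is in that text format, as fastText `.vec` files are; other resources in the table may ship binary or library-specific formats instead. The three two-dimensional vectors and the helper names `load_text_vectors` and `cosine` are made up for the example.

```python
import io
import math


def load_text_vectors(handle, limit=None):
    """Parse word2vec text format: 'count dim' header, then 'word v1 ... vd' rows."""
    header = handle.readline().split()
    dim = int(header[1])
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        word, values = parts[0], parts[1:]
        if len(values) != dim:
            continue  # skip malformed rows instead of failing
        vectors[word] = [float(v) for v in values]
        if limit is not None and len(vectors) >= limit:
            break
    return vectors


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Tiny in-memory stand-in for a real vector file (values are invented).
sample = io.StringIO("3 2\nแมว 1.0 0.0\nสุนัข 0.8 0.6\nรถ 0.0 1.0\n")
vectors = load_text_vectors(sample)
similarity = cosine(vectors["แมว"], vectors["สุนัข"])
```

For a real file, the `io.StringIO` object would be replaced by `open(path, encoding="utf-8")`, and `limit` can cap how many rows are read, since full vocabularies are large.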
Sentence Embedding
Name | Detail | Paper | Owner | Download |
---|---|---|---|---|
LASER | LASER Language-Agnostic SEntence Representations | Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond | | GitHub |
MUSE | Multilingual Universal Sentence Encoder for Semantic Retrieval | Multilingual Universal Sentence Encoder for Semantic Retrieval | | TensorFlow Hub |
LaBSE | Language-Agnostic BERT Sentence Embedding by Google AI. | Language-agnostic BERT Sentence Embedding | | |
Glossary
Name | Detail | Website |
---|---|---|
Thai Glossary | Thai Glossary for Open Source Software by OpenTLE (backup) | Website |
Glossary for Open Source Software by OpenTLE | | Web archive |