Thai Constitution Corpus |
The Constitution of Thailand Dataset Since 1932 |
|
Public Domain |
Wannaphong Phatthiyaphaibun |
GitHub |
Thai Law |
Thai Law Dataset (Act of Parliament) |
|
Public Domain |
Wannaphong Phatthiyaphaibun |
GitHub |
IO-LM |
Learn how to talk like an Information-Operation-er |
|
|
|
GitHub |
HC corpora |
HC corpora is a collection of freely downloadable corpora for various languages. Homepage: http://corpora.epizy.com/about.html |
|
|
|
MediaFire |
thai-joke-corpus |
Thai jokes scraped from four Thai joke Facebook pages, collected by iApp Technology Co, Ltd. |
449 jokes |
GPL-3.0 License |
iApp Technology Co, Ltd |
GitHub |
Thai Literature Corpora (TLC) |
Texts from the Vajirayana Digital Library, stored by chapter and stanza (non-tokenized). |
34 documents, 292,270 lines, 31,790,734 characters |
|
Jitkapat Sawatphol |
Website |
HSE Thai Corpus |
A 35 Million Word Corpus of Thai |
|
|
|
Kaggle |
ThaiGov corpus |
Data from the Thai government website. |
|
Public Domain |
Wannaphong Phatthiyaphaibun |
GitHub |
ThaiGov V2 Corpus |
Thai news dataset from the Thai government website. |
|
Public Domain |
Wannaphong Phatthiyaphaibun |
GitHub |
OSCAR Corpus |
OSCAR (Open Super-large Crawled Aggregated coRpus) is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture. |
951,743,087 words |
Public Domain |
|
Homepage |
mC4 |
A colossal, cleaned, multilingual version of Common Crawl's web crawl corpus. |
|
|
|
Hugging Face |
Multilingual Open Text 1.0: Public Domain News in 44 Languages |
This is a corpus of public domain news in 44 languages. |
|
Public Domain |
|
GitHub |
Thai depression detection dataset and baseline models |
Detecting Depression in Thai Blog Posts: a Dataset and a Baseline. |
|
|
|
Zenodo |