| Name | Description | Size | License | Creator | Download |
|---|---|---|---|---|---|
| Lotus | Thai speech recognition corpus from NECTEC (not the full corpus) | 12 hours | CC BY-NC-SA 3.0 | NECTEC | aiforthai (registration required) and mirror from @korakot: GitHub |
| Common Voice Corpus | Common Voice Corpus from Mozilla | 171 hours (validated) | CC0-1.0 | Mozilla | Common Voice |
| Gowajee corpus | Collected as a homework assignment in the Automatic Speech Recognition class at Chulalongkorn University | 11 hours | MIT License | Ekapol Chuangsuwanich, Atiwong Suchato, Korrawe Karunratanakul, Burin Naowarat, Chompakorn Chaichot and Penpicha Sangsa-nga | GitHub |
| Lotus BN | Thai news speech recognition corpus from NECTEC (not the full corpus) | 28 minutes | CC BY-NC-SA 3.0 | NECTEC | Mirror from @korakot: GitHub |
| Lotus Cell | Thai speech corpus recorded over the telephone (not the full corpus) | 11 hours | CC BY-NC-SA 3.0 | NECTEC | Mirror from @korakot: GitHub |
| Thai Elderly Speech dataset by Data Wow and VISAI | Thai elderly speech dataset of 19,200 files, divided into two categories: Health care (health issues and services) and Smart Home (using smart home devices in household contexts) | 17 hours 11 minutes | CC BY-SA 4.0 | VISAI AI Company Limited and Data Wow Company Limited | VISAI AI Company Limited and Data Wow Company Limited |
| FLEURS | The speech version of the FLoRes machine translation benchmark, built from 2009 n-way parallel sentences taken from the publicly available FLoRes dev and devtest sets, in 102 languages | | CC BY | Google | Hugging Face |
| XTREME-S | The Cross-lingual TRansfer Evaluation of Multilingual Encoders for Speech (XTREME-S) benchmark evaluates speech representations across languages, tasks, domains and data regimes. It covers 102 languages from 10+ language families, 3 domains and 4 task families: speech recognition, translation, classification and retrieval. | | CC BY | Google | Hugging Face |
| Thai Dialect Corpus | Corpus of Central Thai and three other Thai dialects (Khummuang, Korat and Pattani) | | CC BY-SA 4.0 | Chulalongkorn University | GitHub |