Speech Recognition

Corpus

| Name | Description | Size | License | Creator | Download |
| --- | --- | --- | --- | --- | --- |
| Lotus | Thai speech recognition corpus from NECTEC (not the full corpus) | 12 hours | CC BY-NC-SA 3.0 | NECTEC | aiforthai (registration required); mirror from @korakot: GitHub |
| Common Voice Corpus | Common Voice corpus from Mozilla | 133 hours (validated) | CC0-1.0 | Mozilla | Common Voice |
| Gowajee corpus | Collected as a homework assignment in the Automatic Speech Recognition class at Chulalongkorn University | 11 hours | MIT License | Ekapol Chuangsuwanich, Atiwong Suchato, Korrawe Karunratanakul, Burin Naowarat, Chompakorn Chaichot, and Penpicha Sangsa-nga | GitHub |
| Lotus BN | Thai broadcast news speech recognition corpus from NECTEC (not the full corpus) | 28 minutes | CC BY-NC-SA 3.0 | NECTEC | Mirror from @korakot: GitHub |
| Lotus Cell | Thai telephone speech corpus (not the full corpus) | 11 hours | CC BY-NC-SA 3.0 | NECTEC | Mirror from @korakot: GitHub |

Software

| Name | Description | Status | Language | License |
| --- | --- | --- | --- | --- |
| PyThaiASR | A Python package for automatic speech recognition with a focus on the Thai language. It includes an offline Thai automatic speech recognition model from the Artificial Intelligence Research Institute of Thailand (AIResearch.in.th). | active | Python 3.X | Apache License 2.0 |

Pretrained

| Name | Detail | Owner | Download |
| --- | --- | --- | --- |
| wav2vec2-large-xlsr-53-th | wav2vec2-large-xlsr-53 fine-tuned on Thai Common Voice 7.0 | Artificial Intelligence Research Institute of Thailand (AIResearch.in.th) | Hugging Face |