Speech Recognition

Automatic Speech Recognition

Corpus

| Name | Description | Size | License | Creator | Download |
|---|---|---|---|---|---|
| Lotus | Thai speech recognition corpus from NECTEC (not the full corpus) | 12 hours | CC BY-NC-SA 3.0 | NECTEC | aiforthai (registration required) and mirror from @korakot: GitHub |
| Common Voice Corpus | Common Voice corpus from Mozilla | 171 hours (validated) | CC0-1.0 License | Mozilla | Common Voice |
| Gowajee corpus | Collected as a homework assignment in the Automatic Speech Recognition class offered at Chulalongkorn University. | 11 hours | MIT License | Ekapol Chuangsuwanich, Atiwong Suchato, Korrawe Karunratanakul, Burin Naowarat, Chompakorn CChaichot and Penpicha Sangsa-nga | GitHub |
| Lotus BN | Thai news speech recognition corpus from NECTEC (not the full corpus) | 28 minutes | CC BY-NC-SA 3.0 | NECTEC | Mirror from @korakot: GitHub |
| Lotus Cell | Thai telephone speech corpus (not the full corpus) | 11 hours | CC BY-NC-SA 3.0 | NECTEC | Mirror from @korakot: GitHub |
| Thai Elderly Speech dataset by Data Wow and VISAI | Thai elderly speech dataset of 19,200 files in two categories: Health care (health issues and services) and Smart Home (using smart home devices in household contexts). | 17 hours 11 minutes | CC BY-SA 4.0 | VISAI AI Company Limited and Data Wow Company Limited | VISAI AI Company Limited and Data Wow Company Limited |
| FLEURS | FLEURS is the speech version of the FLoRes machine translation benchmark. It uses 2,009 n-way parallel sentences from the publicly available FLoRes dev and devtest sets, in 102 languages. | | CC BY | Google | Hugging Face |
| XTREME-S | The Cross-lingual TRansfer Evaluation of Multilingual Encoders for Speech (XTREME-S) benchmark is designed to evaluate speech representations across languages, tasks, domains and data regimes. It covers 102 languages from 10+ language families, 3 domains and 4 task families: speech recognition, translation, classification and retrieval. | | CC BY | Google | Hugging Face |

Software

| Name | Description | Status | Language | License |
|---|---|---|---|---|
| PyThaiASR | PyThaiASR is a Python package for automatic speech recognition with a focus on the Thai language. It includes an offline Thai automatic speech recognition model from the Artificial Intelligence Research Institute of Thailand (AIResearch.in.th). | active | Python 3.X | Apache License 2.0 |
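
As a rough illustration of how PyThaiASR might be called, the sketch below wraps its `asr` function in a small batch helper. It assumes the package is installed (`pip install pythaiasr`) and that `asr(path)` returns the transcript as a string. The helper name `transcribe_files` and the `recognizer` parameter are hypothetical and exist only for this example.

```python
from typing import Callable, Dict, Iterable, Optional


def transcribe_files(
    paths: Iterable[str],
    recognizer: Optional[Callable[[str], str]] = None,
) -> Dict[str, str]:
    """Map each audio file path to its transcript.

    `recognizer` is a hypothetical hook: any callable that takes a file
    path and returns text. When omitted, PyThaiASR's `asr` is used.
    """
    if recognizer is None:
        # Imported lazily so the helper can be defined (and tested with a
        # stub) on machines where pythaiasr is not installed.
        # The first call downloads the model weights.
        from pythaiasr import asr
        recognizer = asr
    return {str(path): recognizer(str(path)) for path in paths}
```

Calling `transcribe_files(["a.wav"])` would then return a dictionary with one transcript per file; passing a custom `recognizer` makes it possible to swap in another backend.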

Pretrained

| Name | Detail | Owner | Download |
|---|---|---|---|
| wav2vec2-large-xlsr-53-th | wav2vec2-large-xlsr-53 fine-tuned on Thai Common Voice 7.0 | Artificial Intelligence Research Institute of Thailand (AIResearch.in.th) | Hugging Face |
| Thai Wav2Vec2 with CommonVoice V8 (newmm tokenizer) + language model | Trained on the CommonVoice V8 dataset, which adds data to the CommonVoice V7 dataset used in airesearch/wav2vec2-large-xlsr-53-th. Fine-tuned from wav2vec2-large-xlsr-53. | Wannaphong Phatthiyaphaibun | Hugging Face |
| Thai Wav2Vec2 with CommonVoice V8 (deepcut tokenizer) + language model | Trained on the CommonVoice V8 dataset, which adds data to the CommonVoice V7 dataset used in airesearch/wav2vec2-large-xlsr-53-th. Fine-tuned from wav2vec2-large-xlsr-53. | Wannaphong Phatthiyaphaibun | Hugging Face |
| Whisper | Whisper is a general-purpose speech recognition model. It also supports speech-to-text translation from other languages into English. | OpenAI | GitHub |
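
One possible way to try the wav2vec2 checkpoints listed above is the Hugging Face `transformers` pipeline. This is a minimal sketch, not the model authors' documented recipe: it assumes `transformers` and a backend such as PyTorch are installed, that `ffmpeg` is available for decoding audio files, and that the model id `airesearch/wav2vec2-large-xlsr-53-th` resolves on the Hugging Face Hub. The function names `build_thai_asr` and `transcribe` are hypothetical.

```python
from typing import Any, Callable, Dict, Optional


def build_thai_asr(model_id: str = "airesearch/wav2vec2-large-xlsr-53-th"):
    """Create a speech recognition pipeline for the given checkpoint."""
    # Imported lazily so this module loads even without transformers installed.
    from transformers import pipeline
    return pipeline("automatic-speech-recognition", model=model_id)


def transcribe(
    audio_path: str,
    recognizer: Optional[Callable[[str], Dict[str, Any]]] = None,
) -> str:
    """Return the transcript for one audio file.

    `recognizer` is any callable that returns a dict with a "text" key,
    which is the shape produced by the transformers ASR pipeline.
    """
    if recognizer is None:
        recognizer = build_thai_asr()  # downloads the weights on first use
    return recognizer(audio_path)["text"]
```

Building the pipeline once and reusing it, for example `thai_asr = build_thai_asr()` followed by `transcribe("a.wav", thai_asr)`, avoids reloading the model for every file.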

Language Identification

Corpus

| Name | Description | Size | License | Creator | Download |
|---|---|---|---|---|---|
| VoxLingua107 | VoxLingua107 is a speech dataset for training spoken language identification models. It contains data for 107 languages, including Thai. | 61 hours, 5.8 GB (Thai) | CC-BY 4.0 License | Jörgen Valk, Tanel Alumäe | Website |