
Pre-trained models


Text summarization

Model | Detail | Paper | Download
mT5: Multilingual T5 | Multilingual T5 (mT5) is a massively multilingual pretrained text-to-text transformer model, trained following a recipe similar to T5's. | mT5: A massively multilingual pre-trained text-to-text transformer | GitHub
BertSum | Trained model from Nakhun Chumpolsathien & Tanachat Arayachutinan | Using Knowledge Distillation from Keyword Extraction to Improve the Informativeness of Neural Cross-lingual Summarization | GitHub
ARedSum | Trained model from Nakhun Chumpolsathien & Tanachat Arayachutinan | Using Knowledge Distillation from Keyword Extraction to Improve the Informativeness of Neural Cross-lingual Summarization | GitHub

... [WIP]
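Extractive summarizers such as BertSum and ARedSum score each sentence of a document and build the summary from the best-scoring ones. BertSum applies a simple redundancy filter at inference time called trigram blocking: a candidate sentence is skipped if it shares any three-word sequence with the summary built so far (ARedSum takes a learned, redundancy-aware ranking approach instead). The sketch below shows only that trigram-blocking selection step. It assumes the sentence scores have already been produced by a model (the sentences and scores in the usage lines are made up), and it assumes the text is already word-segmented, since Thai is written without spaces between words and would first need a segmenter such as PyThaiNLP's.

```python
from typing import List, Set, Tuple

Trigram = Tuple[str, ...]


def trigrams(tokens: List[str]) -> Set[Trigram]:
    """Return the set of consecutive token trigrams in a sentence."""
    return {tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)}


def select_sentences(sentences: List[List[str]], scores: List[float],
                     max_sentences: int = 3) -> List[int]:
    """Greedy extractive selection with trigram blocking.

    `sentences` are pre-tokenized; `scores` are assumed to come from an
    extractive model. Returns the chosen indices in document order.
    """
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen: List[int] = []
    seen: Set[Trigram] = set()
    for i in ranked:
        tri = trigrams(sentences[i])
        if tri & seen:
            continue  # repeats a trigram already in the summary: treat as redundant
        chosen.append(i)
        seen |= tri
        if len(chosen) == max_sentences:
            break
    return sorted(chosen)


# Toy, whitespace-tokenized input with invented scores.
docs = [s.split() for s in ["the cat sat on the mat",
                            "the cat sat on the rug",
                            "dogs bark loudly"]]
print(select_sentences(docs, [0.9, 0.8, 0.5], max_sentences=2))  # [0, 2]
```

Trigram blocking is cheap and model-free, but it can also discard a genuinely new sentence that happens to reuse a common three-word phrase.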

Word embeddings

Name | Detail | Download
ConceptNet Numberbatch | ConceptNet Numberbatch is a set of semantic vectors (also known as word embeddings) that can be used directly as a representation of word meanings or as a starting point for further machine learning. | GitHub
fastText word vectors | Pre-trained word vectors for 157 languages, trained on Common Crawl and Wikipedia using fastText. | Website
Thai2Fit (formerly Thai2Vec) | | Homepage; word2vec download: PyThaiNLP
LTW2V: The Large Thai Word2Vec | LTW2V is a large Thai Word2Vec model, built with oxidized-thainlp from the OSCAR corpus (Open Super-large Crawled Aggregated coRpus). | GitHub

... [WIP]
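Most of the word-vector releases above end up being used as a word-to-vector lookup table. The sketch below reads the plain-text word2vec layout that fastText uses for its `.vec` files (a header line with the vocabulary size and vector dimension, then one line per word holding the word followed by its values) and ranks neighbours by cosine similarity. Whether a particular release (Numberbatch, Thai2Fit, LTW2V) ships in exactly this text format, rather than a binary or compressed file, is an assumption to check on its download page. The four toy words and their vectors are invented; real files hold Thai tokens and hundreds of dimensions.

```python
import io
import math
from typing import Dict, List, TextIO, Tuple


def load_vec(f: TextIO) -> Dict[str, List[float]]:
    """Parse word vectors in the word2vec/fastText text format."""
    _, dim = map(int, f.readline().split())  # header: "<vocab_size> <dim>"
    vectors: Dict[str, List[float]] = {}
    for line in f:
        parts = line.rstrip().split(" ")  # rstrip drops trailing space and newline
        if len(parts) != dim + 1:
            continue  # skip malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(vectors: Dict[str, List[float]], word: str,
                 topn: int = 3) -> List[Tuple[str, float]]:
    """Rank every other word by cosine similarity to `word` (brute force)."""
    query = vectors[word]
    scored = [(w, cosine(query, v)) for w, v in vectors.items() if w != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Toy file with made-up 3-dimensional vectors.
toy = io.StringIO(
    "4 3\n"
    "cat 0.9 0.1 0.0\n"
    "dog 0.8 0.2 0.1\n"
    "car 0.0 0.1 0.9\n"
    "train 0.1 0.0 0.8\n"
)
vecs = load_vec(toy)
print([w for w, _ in most_similar(vecs, "cat", topn=1)])  # ['dog']
```

Brute-force search is fine for a sketch; for a full vocabulary, a library such as gensim or an approximate nearest-neighbour index is the usual choice.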

Language model

Name | Detail | Owner | Download
Thai2Fit | ULMFiT language modeling, text feature extraction, and text classification in Thai. Created as part of PyThaiNLP using the ULMFiT implementation from fast.ai. | Charin Polpanumas | GitHub
BERT-th | BERT pre-trained on Thai text | ThAIKeras | GitHub
BERT-Base, Multilingual Cased | 104 languages, 12-layer, 768-hidden, 12-heads, 110M parameters | Google | GitHub
bert-base-th-cased | A smaller version of bert-base-multilingual-cased that handles a custom subset of languages (here, Thai). | Geotrend | Hugging Face
WangchanBERTa | Pretrained transformer-based Thai language models | AI Research Institute of Thailand (AIResearch) | GitHub & Hugging Face

Notebook
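The size figures in the table ("12-layer, 768-hidden, 12-heads") determine most of a BERT model's parameter count, but the vocabulary size matters just as much. The worked count below assumes the standard BERT-Base layout: a 3,072-unit feed-forward layer, 512 positions, 2 segment types, and the pooler included. The head count does not enter the total, because the 12 heads simply split the 768-dimensional projections. Under these assumptions, a 30,522-token vocabulary (English BERT-Base) gives about 109.5M parameters, which matches the quoted 110M. A vocabulary of 119,547 tokens, which is assumed here to be the multilingual cased model's size (taken from its Hugging Face config), gives roughly 178M. This also explains why vocabulary-trimmed variants such as bert-base-th-cased are smaller: the encoder layers are unchanged, and only the embedding table shrinks.

```python
def bert_param_count(vocab_size: int, hidden: int = 768, layers: int = 12,
                     intermediate: int = 3072, max_positions: int = 512,
                     type_vocab: int = 2) -> int:
    """Trainable parameters of a standard BERT encoder, including the pooler."""
    h, i = hidden, intermediate
    # Token, position and segment embedding tables, plus one LayerNorm (gain + bias).
    embeddings = (vocab_size + max_positions + type_vocab) * h + 2 * h
    # Q, K, V and output projections (weights + biases), plus LayerNorm.
    # The number of heads does not appear: heads only partition these projections.
    attention = 4 * (h * h + h) + 2 * h
    # Two dense layers (h -> i -> h) with biases, plus LayerNorm.
    feed_forward = (h * i + i) + (i * h + h) + 2 * h
    pooler = h * h + h  # dense layer applied to the [CLS] vector
    return embeddings + layers * (attention + feed_forward) + pooler


print(bert_param_count(30522))   # 109482240 (English BERT-Base vocabulary)
print(bert_param_count(119547))  # 177853440 (assumed multilingual cased vocabulary)
```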

Sentence Embedding

Name | Detail | Paper | Owner | Download
LASER | Language-Agnostic SEntence Representations | Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond | Facebook | GitHub
MUSE | Multilingual Universal Sentence Encoder for semantic retrieval | Multilingual Universal Sentence Encoder for Semantic Retrieval | Google | TensorFlow Hub
LaBSE | Language-Agnostic BERT Sentence Embedding by Google AI | Language-agnostic BERT Sentence Embedding | Google |
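Sentence encoders such as LASER, MUSE, and LaBSE map sentences from different languages into a shared vector space, so a Thai sentence and its English translation should land close together. A common application is mining translation pairs from two collections of sentences. Instead of raw cosine similarity, the sketch below uses the ratio-margin criterion that LASER's authors (Artetxe and Schwenk) proposed for parallel-corpus mining. Each cosine is divided by the average similarity of both sentences to their k nearest neighbours, which discounts "hub" sentences that are close to everything. The sketch assumes the embeddings were already computed by one of the models above and that both lists are non-empty. The 2-dimensional vectors in the usage line are made up.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def knn_mean(sims: Sequence[float], k: int) -> float:
    """Mean of the k largest similarities (all of them if fewer than k)."""
    top = sorted(sims, reverse=True)[:k]
    return sum(top) / len(top)


def mine_pairs(src: List[Sequence[float]], tgt: List[Sequence[float]],
               k: int = 4) -> List[int]:
    """For each source embedding, return the index of the target embedding
    with the highest ratio-margin score. Assumes both lists are non-empty."""
    s = [normalize(v) for v in src]
    t = [normalize(v) for v in tgt]
    # sim[i][j] = cosine(src_i, tgt_j), since both sides are unit-length.
    sim = [[sum(a * b for a, b in zip(u, w)) for w in t] for u in s]
    src_nn = [knn_mean(row, k) for row in sim]  # neighbourhood density of each source
    tgt_nn = [knn_mean([sim[i][j] for i in range(len(s))], k) for j in range(len(t))]
    best = []
    for i, row in enumerate(sim):
        # margin = cos(x, y) / ((mean kNN sim of x + mean kNN sim of y) / 2)
        margins = [row[j] / (((src_nn[i] + tgt_nn[j]) / 2) or 1e-12)
                   for j in range(len(t))]
        best.append(max(range(len(t)), key=lambda j: margins[j]))
    return best


print(mine_pairs([[1.0, 0.0], [0.0, 1.0]], [[0.1, 0.9], [0.9, 0.1]], k=1))  # [1, 0]
```

In practice the candidate lists are large, so the k-nearest-neighbour step is normally delegated to a vector index rather than a full similarity matrix as here.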

Question Answer

Name | Detail | Owner | Download
Zero-shot multilingual QA from DeepPavlov | DeepPavlov is an open-source conversational AI library built on TensorFlow, Keras, and PyTorch. | Neural Networks and Deep Learning Lab, MIPT | GitHub, Colab
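Zero-shot multilingual QA of this kind is typically extractive. A multilingual encoder fine-tuned on QA data in one language produces a start score and an end score for every token of the context passage, and the answer is the span whose combined score is highest. The sketch below shows only that decoding step, assuming the logits have already been computed. It is not DeepPavlov's API, and the tokens and logits in the usage lines are invented. Real pipelines also map subword tokens back to character offsets and exclude question tokens, which is omitted here.

```python
from typing import Sequence, Tuple


def best_span(start_logits: Sequence[float], end_logits: Sequence[float],
              max_answer_len: int = 30) -> Tuple[int, int, float]:
    """Return (start, end, score) maximising start_logits[s] + end_logits[e]
    over valid spans with s <= e and e - s + 1 <= max_answer_len."""
    best = (0, 0, float("-inf"))
    for s, s_score in enumerate(start_logits):
        # Only consider ends at or after the start, within the length limit.
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_score + end_logits[e]
            if score > best[2]:
                best = (s, e, score)
    return best


# Made-up context tokens and logits for "What is the capital of Thailand?"
tokens = ["Bangkok", "is", "the", "capital", "of", "Thailand"]
start = [5.0, 0.1, 0.2, 0.0, 0.0, 1.0]
end = [4.0, 0.3, 0.1, 2.0, 0.1, 1.5]
s, e, _ = best_span(start, end)
print(" ".join(tokens[s:e + 1]))  # Bangkok
```

The double loop is quadratic in context length but bounded by `max_answer_len`. It is the same search that BERT-style SQuAD decoders perform, usually after pre-filtering to the top few start and end candidates.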