Text Summarization

Corpus

| Name | Description | Size | License | Creator | Download |
|---|---|---|---|---|---|
| ThaiSum | The largest dataset for Thai text summarization. | 350,000 articles (2.9 GB) | MIT License | Nakhun Chumpolsathien | GitHub |
| TR-TPBS | A dataset for Thai text summarization. | 310,000 articles | MIT License | Nakhun Chumpolsathien | GitHub |

Pretrained

| Model | Detail | Paper | Download |
|---|---|---|---|
| mT5: Multilingual T5 | Multilingual T5 (mT5) is a massively multilingual pretrained text-to-text transformer model, trained with a recipe similar to that of T5. | mT5: A massively multilingual pre-trained text-to-text transformer | GitHub |
| BertSum | Trained model by Nakhun Chumpolsathien & Tanachat Arayachutinan. | Using Knowledge Distillation from Keyword Extraction to Improve the Informativeness of Neural Cross-lingual Summarization | GitHub |
| ARedSum | Trained model by Nakhun Chumpolsathien & Tanachat Arayachutinan. | Using Knowledge Distillation from Keyword Extraction to Improve the Informativeness of Neural Cross-lingual Summarization | GitHub |
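
Because mT5 is a text-to-text model, summarization with it amounts to feeding the article in as text and generating the summary as text. The following is a minimal sketch of that flow, assuming the Hugging Face `transformers` library (with PyTorch and `sentencepiece`) is installed. The function name `summarize`, the default checkpoint `google/mt5-small`, and the generation settings are illustrative choices, not something this list prescribes. The public mT5 checkpoints are pretrained only and would most likely need fine-tuning on a summarization corpus, such as the ones listed above, before the output is useful.

```python
def summarize(article: str,
              model_name: str = "google/mt5-small",
              max_new_tokens: int = 64) -> str:
    """Generate a summary for `article` with a seq2seq checkpoint.

    Assumption: `model_name` points to an mT5-style checkpoint that has
    already been fine-tuned for summarization. The default is the raw
    pretrained checkpoint and is used here only as a placeholder.
    """
    # Imported lazily so the module can be loaded without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    # Truncate long articles to a fixed input budget (512 tokens is an assumed value).
    inputs = tokenizer(article, return_tensors="pt",
                       truncation=True, max_length=512)

    # Beam search is one common decoding choice for summarization.
    output_ids = model.generate(**inputs,
                                max_new_tokens=max_new_tokens,
                                num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `summarize(thai_article, model_name=...)` with the path of a fine-tuned checkpoint returns the generated summary as a string. The first call downloads the model weights if they are not already cached.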