Text Summarization

Corpus

| Name | Description | Size | License | Creator | Download |
| --- | --- | --- | --- | --- | --- |
| ThaiSum | The largest dataset for Thai text summarization. | 350,000 articles (2.9 GB) | MIT License | Nakhun Chumpolsathien | GitHub |
| TR-TPBS | A dataset for Thai text summarization. | 310K articles | MIT License | Nakhun Chumpolsathien | GitHub |
| XL-Sum | A dataset of annotated article-summary pairs from BBC News, covering 45 languages ranging from low- to high-resource. | 8,268 (for Thai) | CC BY-NC-SA 4.0 | | GitHub |

Pretrained

| Model | Detail | Paper | Download |
| --- | --- | --- | --- |
| mT5: Multilingual T5 | Multilingual T5 (mT5) is a massively multilingual pretrained text-to-text transformer model, trained following a recipe similar to that of T5. | mT5: A massively multilingual pre-trained text-to-text transformer | GitHub |
| BertSum | Trained model from Nakhun Chumpolsathien & Tanachat Arayachutinan | Using Knowledge Distillation from Keyword Extraction to Improve the Informativeness of Neural Cross-lingual Summarization | GitHub |
| ARedSum | Trained model from Nakhun Chumpolsathien & Tanachat Arayachutinan | Using Knowledge Distillation from Keyword Extraction to Improve the Informativeness of Neural Cross-lingual Summarization | GitHub |