Named Entity Recognition

Corpus

| Name | Description | Size | License | Creator | Download |
| --- | --- | --- | --- | --- | --- |
| Thai-NNER (Thai Nested Named Entity Recognition Corpus) | The first Thai Nested Named Entity Recognition (N-NER) dataset. It contains 264,798 mentions, 104 classes, and a maximum depth of 8 layers, drawn from 4,894 documents (news articles and restaurant reviews). To the best of the authors' knowledge, it is the largest non-English N-NER dataset and the first non-English one with fine-grained classes. | | CC BY-SA 3.0 | IST, VISTEC | GitHub |
| นัชชา ถิระสาโรช | Corpora by Wirote Aroonmanakun's students | ? | นัชชา ถิระสาโรช | นัชชา ถิระสาโรช | Data |
| ศศิวิมล กาลันสีมา | Corpora by Wirote Aroonmanakun's students | ? | ศศิวิมล กาลันสีมา | ศศิวิมล กาลันสีมา | Data |
| ณัฐดาพร เลิศชีวะ | Corpora by Wirote Aroonmanakun's students | ? | ณัฐดาพร เลิศชีวะ | ณัฐดาพร เลิศชีวะ | Data |
| Thai NER | The Thai NER project, part of PyThaiNLP. | | CC BY 3.0 | Wannaphong Phatthiyaphaibun | GitHub |
| THAI-NEST | Thai named entity tagging corpus from NECTEC and Thammasat University | | CC BY-NC-SA 3.0 | NECTEC | aiforthai (registration required) |
| WikiANN | WikiANN (sometimes called PAN-X) is a multilingual named entity recognition dataset consisting of Wikipedia articles annotated with LOC (location), PER (person), and ORG (organisation) tags in the IOB2 format. | | | Rahimi, Afshin and Li, Yuan and Cohn, Trevor | GitHub |
| Crime Named Entity Recognition | NER project with a Thai crime news dataset | | | | GitHub |
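
Several of the corpora above, WikiANN among them, store entities as per-token IOB2 tags, where `B-` opens an entity, `I-` continues it, and `O` marks tokens outside any entity. The snippet below is a minimal sketch of how such tags might be grouped into entity spans. The function name `iob2_to_spans` and the sample sentence are illustrative assumptions, not part of any listed corpus or tool.

```python
from typing import List, Tuple


def iob2_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, List[str]]]:
    """Group IOB2-tagged tokens into (label, tokens) entity spans.

    Hypothetical helper for illustration only.
    """
    spans: List[Tuple[str, List[str]]] = []
    label = None            # label of the entity currently being built
    current: List[str] = []  # tokens of the entity currently being built

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new entity; close any open one first.
            if label is not None:
                spans.append((label, current))
            label, current = tag[2:], [token]
        elif tag.startswith("I-") and label == tag[2:]:
            # An I- tag with the same label extends the open entity.
            current.append(token)
        else:
            # "O", or an I- tag that does not match the open entity,
            # closes the open entity and is otherwise ignored.
            if label is not None:
                spans.append((label, current))
            label, current = None, []

    # Flush an entity that runs to the end of the sentence.
    if label is not None:
        spans.append((label, current))
    return spans


# Made-up sample sentence with the three WikiANN tag types' prefixes.
tokens = ["Barack", "Obama", "visited", "Bangkok"]
tags = ["B-PER", "I-PER", "O", "B-LOC"]
spans = iob2_to_spans(tokens, tags)
```

Tokens are kept as lists rather than joined with spaces because Thai text is written without spaces between words, so how to rejoin them depends on the tokenizer that produced them.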

Software

| Name | Description | Status | Language | License |
| --- | --- | --- | --- | --- |
| PyThaiNLP | NER is included as part of PyThaiNLP. | active | Python 3.X | Apache License 2.0 |
| TLTK | Thai Language Toolkit | active | Python 3.X | BSD License (BSD-3-Clause) |
| Thai-NNER | Thai Nested Named Entity Recognition | active | Python 3.X | MIT License |
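
Nested NER differs from flat NER in that one mention may sit inside another, which is what the "maximum depth of 8 layers" figure for the Thai-NNER corpus refers to. The snippet below is a rough sketch of how nesting depth could be computed from mention offsets. It assumes mentions are given as `(start, end, label)` tuples with exclusive end offsets, have distinct boundaries, and never cross one another; this representation, the function name `nesting_depths`, and the sample labels are assumptions for illustration, not the Thai-NNER data format or API.

```python
from typing import List, Tuple

Mention = Tuple[int, int, str]  # (start, end_exclusive, label); assumed layout


def nesting_depths(mentions: List[Mention]) -> List[int]:
    """Return the nesting depth of each mention (outermost mention = 1).

    Assumes properly nested mentions with distinct boundaries, so the
    mentions containing a given mention form a single chain.
    """
    depths = []
    for start, end, _ in mentions:
        # Count every mention whose span covers this one, itself included.
        depth = sum(
            1 for other_start, other_end, _ in mentions
            if other_start <= start and end <= other_end
        )
        depths.append(depth)
    return depths


# Made-up offsets: one outer mention with two levels nested inside it,
# plus a separate flat mention.
mentions = [(0, 5, "ORG"), (0, 3, "LOC"), (1, 2, "LOC"), (7, 9, "PER")]
depths = nesting_depths(mentions)
max_depth = max(depths)
```

The quadratic scan is kept for readability; it is adequate for a single sentence, while a corpus-scale count would more likely sort the mentions and use a stack.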