Natural Language Understanding
Corpus
Name | Description | Size | License | Creator | Download |
---|---|---|---|---|---|
MASSIVE | MASSIVE is a parallel dataset of >1M utterances across 51 languages, annotated for two Natural Language Understanding tasks: intent prediction and slot annotation. Utterances span 60 intents and 55 slot types. MASSIVE was created by localizing the SLURP dataset, which consists of single-shot interactions with a general-purpose intelligent voice assistant. | ~1M utterances, 51 languages | CC BY 4.0 | Amazon | GitHub |
Thai Winograd | A collection of Winograd schemas in Thai, adapted from the original set of English Winograd schemas proposed by Levesque et al., which was based on Ernest Davis's collection. A Winograd schema is a pair of sentences that differ by only a word or two and contain an ambiguity that is resolved differently in each sentence; resolving it requires world knowledge and reasoning. The concept is named after Terry Winograd, who provided a well-known example. | 285 questions | CC BY 4.0 | Phakphum Artkaew | Hugging Face |
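Slot annotations in MASSIVE are distributed inline with the text. The sketch below assumes the bracketed `[slot_type : value]` form used in MASSIVE's annotated-utterance field (`annot_utt`); check the dataset's own documentation before relying on it. It recovers the plain utterance and the list of labeled slots. The function name `parse_annotated_utterance` is hypothetical and not part of any MASSIVE tooling.

```python
import re
from typing import List, Tuple

# Matches one bracketed span such as "[time : five am]".
# Assumption: the slot type and its value are separated by ":" inside square brackets,
# and the slot type itself contains no colon (the value may).
SLOT_PATTERN = re.compile(r"\[(?P<slot>[^\[\]:]+?)\s*:\s*(?P<value>[^\[\]]+?)\]")


def parse_annotated_utterance(annot_utt: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Return the plain utterance and a list of (slot_type, slot_value) pairs."""
    slots = [
        (m.group("slot").strip(), m.group("value").strip())
        for m in SLOT_PATTERN.finditer(annot_utt)
    ]
    # Replace each bracketed span with just its value to recover the raw text.
    plain = SLOT_PATTERN.sub(lambda m: m.group("value").strip(), annot_utt)
    return plain, slots


if __name__ == "__main__":
    plain, slots = parse_annotated_utterance(
        "wake me up at [time : five am] [date : this week]"
    )
    print(plain)  # wake me up at five am this week
    print(slots)  # [('time', 'five am'), ('date', 'this week')]
```

A regular expression is enough here because the assumed format does not nest brackets. Utterances without annotations pass through unchanged with an empty slot list.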