An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation
[ACL'24] MC^2: A Multilingual Corpus of Minority Languages in China (Tibetan, Uyghur, Kazakh, and Mongolian)
We gather Malaysian datasets! https://malaysian-dataset.readthedocs.io/
KH Coder: for Quantitative Content Analysis or Text Mining
HausaHate is a benchmark dataset for the Hausa hate speech detection task. It was extracted from West African Facebook pages and comprises 2,000 comments annotated according to a binary class (offensive and non-offensive) and hate speech targets (race, gender, and none).
SentiAspect-pt comprises 180 product reviews annotated for implicit and explicit fine-grained opinions, which are hierarchically organized for aspect-based sentiment analysis and opinion summarization applications.
粵文語料篩選器 Cantonese text filter
Thai News Dataset from Thai government website.
EZCAT: an Easy Conversation Annotation Tool
📑 Galician corpus for misogyny detection
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Linguistic search for large annotated text corpora, based on Apache Lucene
A corpus and models for the automated legal assessment of clauses in German consumer contracts.
A very simple news crawler with a funny name
BlackLab Frontend, a feature-rich corpus search interface for BlackLab.