Below is an overview of the corpora created at IMS.
IMS Corpora
Title | Description |
A Survey and Experiments on Annotated Corpora for Emotion Classification in Text | Aggregated corpus of emotion classification datasets |
ANVAN-LS: Lexical Substitution for Evaluating Compositional Distributional Models | ANVAN-LS is a lexical substitution dataset for CDSM evaluation sampled from an English-language corpus with manual “all-words” lexical substitution annotation. |
Analysis of emotion communication channels in fan-fiction | A corpus of fan fiction excerpts, annotated with emotion channels and emotion |
Appraisal-based Emotion Analysis | Corpora and Models for Appraisal-based Emotion Analysis |
Author Regulatory Focus Detection | |
BASHI | BASHI is a corpus consisting of 50 Wall Street Journal (WSJ) articles which adds bridging anaphors and their antecedents to the other gold annotations that have been created as part of the OntoNotes project. Bridging anaphors are context-dependent expressions that do not refer to the same entity as their antecedent, but to a related entity. |
CPM Corpus | An emotion corpus annotated with emotion components |
Chess Dataset | This corpus consists of annotated chess games that were posted on |
Clean Corpus of Historical American English (CCOHA) | Cleaned version of the Corpus of Historical American English (COHA) |
CoInCo: Concepts in Context | An English corpus that adds all-words lexical substitution annotation to a sample of the newswire and fiction genres of the freely available MASC corpus |
DEmaNet | The DEmaNet corpus |
DIRE dataset | Dataset from Boleda et al., IWCS 2017 |
DIRNDL | (D)iskurs-(I)nformations-(R)adio-(N)achrichten-(D)atenbank für (L)inguistische Analysen (discourse information radio news database for linguistic analyses), based on hourly broadcast radio news |
Data and Implementation for "Frowning Frodo, Wincing Leia, and a Seriously Great Friendship: Learning to Classify Emotional Relationships of Fictional Characters" | Data for NAACL 2019 publication of Evgeny Kim and Roman Klinger |
Data and Implementation for German Satire Detection with Adversarial Training | Source with documentation |
Data for the Intensifiers in the context of emotions | Data for two papers: Florian Strohm and Roman Klinger, "An empirical analysis of the role of amplifiers, downtoners, and negations in emotion classification in microblogs", and Laura Ana Maria Bostan and Roman Klinger, "Exploring fine-tuned embeddings that model intensifiers for emotion analysis". |
DeFaBel: A Corpus of Belief-based Deception | DeFaBel: A Corpus of Belief-based Deception |
Determinants of Grader Agreement: An Analysis of Multiple Short Answer Corpora | Determinants of Grader Agreement: An Analysis of Multiple Short Answer Corpora |
Europarl Nominal Compound Database | The Europarl Nominal Compound Database (ENCD) was extracted automatically from Europarl v7 (OPUS). It contains English nominal compounds and their equivalents in up to nine languages. |
Europarl Nominal Compoundhood Ratings | The Europarl Nominal Compoundhood Ratings (ENCR) is a selection of 394 sentences from the English portion of the Europarl corpus (Europarl v7, OPUS), annotated with 824 candidate compounds. |
Event-focused Emotion Corpora for German and English | German and English emotion corpora for emotion classification, annotated with crowdsourcing in the style of the ISEAR resources |
GRAIN | The GRAIN corpus, (G)erman (RA)dio (IN)terviews, is based on weekly broadcast radio interviews and is part of the SFB732 Silver Standard Collection. |
GRAIN-S | GRAIN-S -- Manually annotated (S)yntax for (G)erman (RA)dio (IN)terviews |
GerDraCor-Coref - German Drama Corpus for Coreference | A corpus with coreference annotations for German dramatic texts |
GerSti | A German emotion stimulus corpus of news headlines |
GoodNewsEveryone | A corpus of news headlines annotated with emotions, semantic roles, and reader perception |
Huge German Corpus (HGC) | The "Huge German Corpus" (HGC) is a collection of German-language texts (newspaper articles and legal texts) prepared for use with the IMS Corpus Workbench (CWB). |
IMS Citation Corpus | Online appendix to the COLING 2012 paper "Towards a Generic and Flexible Citation Classifier Based on a Faceted Classification Scheme." |
IMS GECO database | Speech corpus of spontaneous conversations, including the participants' mutual social ratings and personality factors |
IMSCONV database | Corpus for investigating convergence in spontaneous spoken dialogues |
Multi-Modal Emotion Recognition Corpus of Reddit Posts | Multi-Modal Emotion Recognition Corpus of Reddit Posts |
Multilingual parallel TED talk dataset | Multilingual parallel TED talk dataset |
NLI corpora (Stehwien & Pado 2015) | Data for the paper "Generalization in Native Language Identification -- Learners versus Scientists" (Stehwien & Pado, CLiC 2015) |
Nachrufkorpus (Obituary Corpus) | Obituaries annotated by section |
REMAN - Relational Emotion Annotation for Fiction | Relational EMotion ANnotation: a corpus with 1720 fictional text excerpts from Project Gutenberg |
Referential Distributional Semantics: City and Country Datasets | City and Country datasets from Gupta et al., EMNLP 2015 |
Resources for Emotion Analysis | A collection of resources created at IMS related to emotion and sentiment analysis |
Resources for biomedical fact-checking in tweets | Corpora for biomedical fact-checking and bioNER in tweets |
RiQuA – Rich Quotation Analysis Corpus | A corpus of English literary texts, annotated for quotations including their social structures. |
SCARE - The Sentiment Corpus of App Reviews with Fine-grained Annotations in German | Fine-grained annotations for mobile application reviews |
SciCorp | Corpus of full-text English scientific papers from genetics and computational linguistics |
SdeWaC | SdeWaC is based on the deWaC web corpus of the WaCky initiative. For SdeWaC, sentences were selected from deWaC that come from web pages in the .de domain and can be processed by a parser. |
SemEval-2020 Task 1: German test data | German test data for SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection |
SemEval-2020 Task 1: English test data | English test data for SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection |
SemEval-2020 Task 1: Test data | Test data for SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection, from the University of Stuttgart |
Sentiment Relevance Corpus | This corpus contains 3847 sentences, taken from 125 documents annotated for Sentiment Relevance. The data is a subset of the v2.0 movie polarity dataset (Pang & Lee, 2004). |
Sich20 | Annotation and Jupyter notebook for Pado & Hole 2020 (Distributional Analysis of Polysemous Function Words) |
Span ID Meta Learning | Code and Material for Performance Prediction on Meta Learning |
Stance Sentiment Emotion Corpus (SSEC) | An annotation of the SemEval 2016 Twitter stance and sentiment corpus with emotion labels |
Stance and Hate/Offensive Speech Detection during the US2020 elections | Twitter corpus annotated for stance and hate/offensive speech during the 2020 US elections |
TIGER Corpus | The TIGER corpus consists of approx. 900,000 tokens (50,000 sentences) of German newspaper text from the Frankfurter Rundschau. The corpus was annotated semi-automatically with POS tags and syntactic structure. It also contains morphological and lemma information for terminal nodes. |
UNIDECOR | UNIDECOR: A Unified Deception Corpus for Cross-Corpus Deception Detection |
USAGE Corpus | The USAGE corpus consists of annotations of Amazon reviews for different product categories in German and English. The reviews themselves are not part of this data publication. |
Comparisons in product reviews | Sentences from camera reviews, supplemented with comparisons |
Visual Emotion Corpus | Visual Emotion Corpus |
Wind-Of-Change Corpora (WOCC) | This collection contains the corpora (lemma version) for the experiments in Schlechtweg et al. (2019) |
