Corpora at IMS

An overview of the corpora available at IMS

Below is an overview of the corpora created at IMS.

IMS corpora

Title	Description
A Survey and Experiments on Annotated Corpora for Emotion Classification in Text	Aggregated corpus of emotion classification datasets
ANVAN-LS: Lexical Substitution for Evaluating Compositional Distributional Models	ANVAN-LS is a lexical substitution dataset for CDSM evaluation sampled from an English-language corpus with manual “all-words” lexical substitution annotation.
Analysis of emotion communication channels in fan-fiction	A corpus of fan fiction excerpts, annotated with emotion channels and emotion
Appraisal-based Emotion Analysis	Corpora and Models for Appraisal-based Emotion Analysis
Author Regulatory Focus Detection	Event description data and Twitter data labeled with the author's regulatory focus
BASHI	BASHI is a corpus consisting of 50 Wall Street Journal (WSJ) articles which adds bridging anaphors and their antecedents to the other gold annotations that have been created as part of the OntoNotes project. Bridging anaphors are context-dependent expressions that do not refer to the same entity as their antecedent, but to a related entity.
CPM Corpus	An emotion corpus annotated with emotion components
Chess Dataset	This corpus consists of annotated chess games that were posted on chess.com
Clean Corpus of Historical American English (CCOHA)	Cleaned version of the Corpus of Historical American English (COHA)
CoInCo: Concepts in Context	An English corpus that adds all-words lexical substitution annotation to a sample of the newswire and fiction genres of the freely available MASC corpus
DEmaNet	DEmaNet corpus
DIRE dataset	Dataset from Boleda et al., IWCS 2017
DIRNDL	(D)iskurs-(I)nformations-(R)adio-(N)achrichten-(D)atenbank für (L)inguistische Analysen (discourse information radio news database for linguistic analyses), based on hourly broadcast radio news
Data and Implementation for "Frowning Frodo, Wincing Leia, and a Seriously Great Friendship: Learning to Classify Emotional Relationships of Fictional Characters"	Data for NAACL 2019 publication of Evgeny Kim and Roman Klinger
Data and Implementation for German Satire Detection with Adversarial Training	Source with documentation
Data for the Intensifiers in the context of emotions	Data for the papers: "Florian Strohm and Roman Klinger. An empirical analysis of the role of amplifiers, downtoners, and negations in emotion classification in microblogs.", and "Laura Ana Maria Bostan and Roman Klinger. Exploring fine-tuned embeddings that model intensifiers for emotion analysis."
DeFaBel: A Corpus of Belief-based Deception	A dataset for studying belief, factuality and deception in German argumentative texts
Determinants of Grader Agreement: An Analysis of Multiple Short Answer Corpora	Dataset and Jupyter/R Notebook
Europarl Nominal Compound Database	The Europarl Nominal Compound Database (ENCD) was automatically extracted from Europarl v7 from OPUS. The database contains English nominal compounds and their equivalents in up to nine languages.
Europarl Nominal Compoundhood Ratings	The Europarl Nominal Compoundhood Ratings (ENCR) is a selection of 394 sentences from the English portion of the Europarl corpus (Europarl v7, OPUS), annotated with 824 candidate compounds.
Event-focused Emotion Corpora for German and English	German and English emotion corpora for emotion classification, annotated with crowdsourcing in the style of the ISEAR resources
GRAIN	The GRAIN corpus, (G)erman (RA)dio (IN)terviews, is based on weekly broadcast radio interviews. We present GRAIN (German RAdio INterviews) as part of the SFB732 Silver Standard Collection.
GRAIN-S	GRAIN-S -- Manually annotated (S)yntax for (G)erman (RA)dio (IN)terviews
GerDraCor-Coref - German Drama Corpus for Coreference	A corpus with coreference annotations for German dramatic texts
GerSti	A German emotion stimulus corpus of news headlines
GoodNewsEveryone	News headlines annotated with emotion roles
Huge German Corpus (HGC)	The "Huge German Corpus" (HGC) is a collection of German-language texts (newspaper articles and legal texts) prepared for use with the IMS Corpus Workbench (CWB).
IMS Citation Corpus	Online appendix to the COLING 2012 paper "Towards a Generic and Flexible Citation Classifier Based on a Faceted Classification Scheme."
IMS GECO database	Speech corpus of spontaneous conversations, including the participants' mutual social ratings and personality factors
IMSCONV database	Corpus for studying convergence in spontaneous-speech dialogues
Multi-Modal Emotion Recognition Corpus of Reddit Posts	Multi-Modal Emotion Recognition Corpus of Reddit Posts
Multilingual parallel TED talk dataset	A small corpus of parallel TED talks together with models for topic and gender classification
NLI corpora (Stehwien & Pado 2015)	Data for the paper "Generalization in Native Language Identification -- Learners versus Scientists" (Stehwien & Pado, CLiC 2015)
Obituary corpus	Obituaries annotated in sections
REMAN - Relational Emotion Annotation for Fiction	Relational EMotion ANnotation – a corpus with 1720 fictional text excerpts from Project Gutenberg
Referential Distributional Semantics: City and Country Datasets	City and Country datasets from Gupta et al., EMNLP 2015
Resources for Emotion Analysis	A collection of resources created at IMS related to emotion and sentiment analysis
Resources for biomedical fact-checking in tweets	Corpora for biomedical fact-checking and bioNER in tweets
RiQuA – Rich Quotation Analysis Corpus	A corpus of English literary texts, annotated for quotations including their social structures.
SCARE - The Sentiment Corpus of App Reviews with Fine-grained Annotations in German	Fine-grained annotations for mobile application reviews
SciCorp	Corpus of full-text English scientific papers of genetics and computational linguistics
SdeWaC	SdeWaC is based on the deWaC web corpus of the WaCky initiative. For SdeWaC, sentences were selected from deWaC that come from web pages in the .de domain and can be processed by a parser.
SemEval-2020 Task 1: German test data	German test data for SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection
SemEval-2020 Task 1: English test data	English test data for SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection
SemEval-2020 Task 1: Test data	Test data from the University of Stuttgart for SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection
Sentiment Relevance Corpus	This corpus contains 3847 sentences, taken from 125 documents annotated for Sentiment Relevance. The data is a subset of the v2.0 movie polarity dataset (Pang & Lee, 2004).
Sich20	Annotation & Jupyter Notebook for Pado & Hole 2020 (Distributional Analysis of Polysemous Function Words)
Span ID Meta Learning	Code and Material for Performance Prediction on Meta Learning
Stance Sentiment Emotion Corpus (SSEC)	An annotation of the SemEval 2016 Twitter stance and sentiment corpus with emotion labels
Stance and Hate/Offensive Speech Detection during the US2020 elections	Corpus for hate speech detection and stance detection
TIGER Corpus	The TIGER corpus consists of approx. 900,000 tokens (50,000 sentences) of German newspaper text from the Frankfurter Rundschau. The corpus was semi-automatically annotated with POS tags and syntactic structure. It also contains morphological and lemma information for terminal nodes.
UNIDECOR	UNIDECOR: A Unified Deception Corpus for Cross-Corpus Deception Detection
USAGE Corpus	This USAGE corpus consists of annotations of Amazon reviews for different product categories in the languages German and English. The reviews themselves are not part of this data publication.
Comparisons in product reviews	Sentences from camera reviews, supplemented with comparisons
Visual Emotion Corpus	Visual Emotion Corpus
Wind-Of-Change Corpora (WOCC)	This collection contains the corpora (lemma version) for the experiments in Schlechtweg et al. (2019)

Author Regulatory Focus Detection

Event description data and Twitter data labeled with the author's regulatory focus.

Resources for automatic …

Tweets with biomedical claims (BioClaim), tweets with annotated biomedical …

CPM Corpus: An emotion-component-annotated …

A reannotation of Twitter and literature data following Scherer's emotion theory

DeFaBel: A Corpus of Belief-based …

A dataset for studying belief, factuality and deception in German argumentative texts.

Determinants of Grader Agreement: An …

Dataset and Jupyter/R Notebook

GerSti: A German …

A new resource for emotion classification and sequence labeling

GoodNewsEveryone

News headlines annotated with emotion roles

Hate Speech / Offensive Speech in the US …

Corpus for hate speech detection and stance detection

Multilingual TED Talks

A small corpus of parallel TED talks together with models for topic and gender classification.

MMEmo Corpus

Multi-Modal Emotion Recognition Corpus of Reddit Posts

Span ID Meta Learning

Material for Meta Learning for Performance Prediction of Sequence Labeling (span identification) Tasks
