DIRNDL

Type: Corpus
Title: DIRNDL

Description

DIRNDL, the (D)iscourse (I)nformation (R)adio (N)ews (D)atabase for (L)inguistic Analysis, is a corpus resource based on hourly broadcast German radio news. The textual version of the news is annotated with syntactic information, and the syntactic phrases are additionally labeled with information status categories (given vs. new information). The speech version is prosodically annotated with pitch accents and prosodic phrase boundaries. Because the textual and the speech version deviate slightly from each other due to slips of the tongue, fillers and minor modifications, the two versions were linked semi-automatically and the results were stored in the database. Through these links, all annotation layers can be accessed together for exploring the relations between prosody, syntax and information status. The corpus contains several repetitions of the same news items, which are read with slight changes in their prosody on each occasion.


Reference

Eckart, Kerstin; Riester, Arndt; Schweitzer, Katrin (2012). A Discourse Information Radio News Database for Linguistic Analysis. In: Christian Chiarcos, Sebastian Nordhoff and Sebastian Hellmann (eds.), Linked Data in Linguistics. Representing and Connecting Language Data and Language Metadata, pp. 65-75. Springer, Heidelberg.

Björkelund, Anders; Eckart, Kerstin; Riester, Arndt; Schauffler, Nadja; Schweitzer, Katrin (2014). The Extended DIRNDL Corpus as a Resource for Automatic Coreference and Bridging Resolution. In: Proceedings of LREC 2014, Reykjavik, pp. 3222-3228.

Rösiger, Ina; Riester, Arndt (2015). Using prosodic annotations to improve coreference resolution of spoken text. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL-IJCNLP), Beijing, pp. 83-88.

Riester, Arndt; Piontek, Jörn (2015). Anarchy in the NP. When new nouns get deaccented and given nouns don't. Lingua 165(B): 230-253. Adjective-noun data sets: deaccented nouns, accented given nouns.

 


Download

The DIRNDL_anaphora corpus from Björkelund et al. (2014) can be downloaded here.
The DIRNDL version described in Rösiger and Riester (2015) can be downloaded here.