Annotation guidelines

The TIGER project aims to produce a large, syntactically annotated corpus of German newspaper text. To ensure a high-quality and theoretically well-founded annotation of the corpus, detailed annotation guidelines have been developed.

Annotation example

A short extract from the corpus is available as samples.tgz. It contains:

  • sample1.export : Negra export format (Release 1)
  • sample1.xml : TIGER-XML format (Release 1)
  • sample2.export : Negra export format (Release 2)
  • sample2.xml : TIGER-XML format (Release 2)
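As a rough illustration of how a TIGER-XML file such as sample1.xml might be read, the following sketch parses sentences into tokens and labelled constituents using Python's standard library. The embedded document is a hand-written miniature, not an excerpt from samples.tgz, and the function name read_sentences is hypothetical. The element and attribute names (s, graph, t, nt, edge, word, pos, cat, label, idref) are assumed to match the files in the archive and should be checked against them.

```python
import xml.etree.ElementTree as ET

# Hand-written miniature document for illustration only (an assumption,
# not copied from samples.tgz). Real files carry more attributes.
SAMPLE = """<corpus id="demo">
  <body>
    <s id="s1">
      <graph root="s1_500">
        <terminals>
          <t id="s1_1" word="Die" pos="ART"/>
          <t id="s1_2" word="Tagung" pos="NN"/>
        </terminals>
        <nonterminals>
          <nt id="s1_500" cat="NP">
            <edge label="NK" idref="s1_1"/>
            <edge label="NK" idref="s1_2"/>
          </nt>
        </nonterminals>
      </graph>
    </s>
  </body>
</corpus>"""


def read_sentences(xml_text):
    """Return one dict per <s> element with its tokens and constituents."""
    root = ET.fromstring(xml_text)
    sentences = []
    for s in root.iter("s"):
        graph = s.find("graph")
        # Terminals: (id, word form, part-of-speech tag) in document order.
        tokens = [(t.get("id"), t.get("word"), t.get("pos")) for t in s.iter("t")]
        # Nonterminals: id -> (category, list of (edge label, child id)).
        nodes = {}
        for nt in s.iter("nt"):
            edges = [(e.get("label"), e.get("idref")) for e in nt.findall("edge")]
            nodes[nt.get("id")] = (nt.get("cat"), edges)
        sentences.append({
            "id": s.get("id"),
            "root": graph.get("root") if graph is not None else None,
            "tokens": tokens,
            "nodes": nodes,
        })
    return sentences


if __name__ == "__main__":
    for sent in read_sentences(SAMPLE):
        print(sent["id"], [word for _, word, _ in sent["tokens"]])
```

To read a file from the archive instead of the embedded string, the file contents can be passed to read_sentences after opening the file with the encoding declared in its XML header.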

[Figure: graphical representation of a single corpus sentence]

Annotation approaches

The quality (in terms of consistency) and the speed of the manual annotation are improved with the help of automatic annotation tools. For the annotation of the TIGER corpus, we are using two different approaches:

  • Annotate

    Most of the TIGER corpus annotation is carried out with the Annotate software. Annotate is a graphical tool for efficient semi-automatic annotation of corpus data. As used in the TIGER project, the tool includes a partial parser and a part-of-speech tagger that annotate parts of the corpus automatically. Annotate was developed in the NEGRA project at the University of Saarbrücken. For more information about Annotate, see the Annotate homepage and the LREC'2000 paper by Brants/Plaehn (ps.gz, pdf).

  • LFG Annotation

    In parallel to the Annotate tool, a broad-coverage symbolic LFG grammar, developed in the Pargram project at the University of Stuttgart, is used for annotating the TIGER corpus. Annotation with the LFG grammar involves two steps:

      1. LFG parsing: First, the TIGER corpus is parsed by the LFG grammar. The output of the LFG grammar is disambiguated semi-automatically.

      2. TIGER transfer: The selected output is then automatically converted to the TIGER export format.