last modified: Wednesday, 01-Jun-2011 13:26:40 CEST
TIGER Corpus
Annotation
Annotation guidelines
The TIGER project aims to produce a large, syntactically annotated
corpus of German newspaper text. To ensure a high-quality and
theoretically well-founded annotation of the corpus,
detailed annotation guidelines have been developed:
Annotation example
Here is a short extract from the corpus:
Here is the graphical representation of a single corpus sentence:
Annotation approaches
The quality (in terms of consistency) and the speed of the manual
annotation are improved with the help of automatic annotation tools.
For the annotation of the TIGER corpus, we are using two different
approaches:
- Annotate
The major part of the TIGER corpus annotation is carried out with the
Annotate software, a graphical tool for efficient semi-automatic
annotation of corpus data. In the framework of the TIGER project, the
tool includes a partial parser and a part-of-speech tagger that
automatically produce a partial annotation of the corpus. Annotate was
developed in the NEGRA project at the University of Saarbrücken.
For more information about Annotate, see the Annotate homepage and the
LREC'2000 paper by Brants/Plaehn (ps.gz, pdf).
- LFG Annotation
In parallel to the Annotate tool, a broad-coverage symbolic LFG grammar,
developed in the Pargram project at the University of Stuttgart, is
used for annotating the TIGER corpus.
Annotation with the LFG grammar involves two steps, each illustrated by
a linked example:
- LFG parsing: First the TIGER corpus is parsed by the LFG grammar.
The output of the LFG grammar is disambiguated semi-automatically.
- TIGER transfer: The selected output is then automatically converted
to the TIGER export format.
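
The two steps above can be pictured as a small pipeline. The following Python sketch is only an illustration under stated assumptions, not the project's actual implementation: the names `Analysis`, `disambiguate` and `annotate_corpus` are hypothetical, the parser and the transfer component are passed in as placeholder functions, and the way disambiguation is split between an automatic shortlist and a human choice is an assumption, since this page does not describe how the semi-automatic step works.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Analysis:
    """One candidate analysis of a sentence (hypothetical stand-in)."""
    sentence: str
    label: str    # placeholder for the real syntactic structure
    score: float  # hypothetical ranking score, used only for pre-filtering


def disambiguate(
    candidates: Sequence[Analysis],
    choose: Callable[[List[Analysis]], Analysis],
    keep: int = 2,
) -> Analysis:
    """Semi-automatic selection (assumed scheme): rank the candidates
    automatically, keep a shortlist, and let an annotator pick from it."""
    if not candidates:
        raise ValueError("the parser returned no analysis")
    shortlist = sorted(candidates, key=lambda a: a.score, reverse=True)[:keep]
    if len(shortlist) == 1:
        return shortlist[0]      # nothing left for the annotator to decide
    return choose(shortlist)     # manual part of the step


def annotate_corpus(
    sentences: Sequence[str],
    parse: Callable[[str], List[Analysis]],
    choose: Callable[[List[Analysis]], Analysis],
    transfer: Callable[[Analysis], str],
) -> List[str]:
    """Step 1: parse and disambiguate. Step 2: convert the selected analysis."""
    converted = []
    for sentence in sentences:
        selected = disambiguate(parse(sentence), choose)
        converted.append(transfer(selected))
    return converted
```

Passing the parser, the annotator's choice and the converter as functions keeps the sketch independent of any concrete grammar or file format. In a real setting `choose` would be backed by an interactive tool, and `transfer` would emit the TIGER export format.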