The supported data model is based on so-called syntax graphs, i.e. directed acyclic graphs with a single root node. Because the constituents of such a graph need not cover contiguous stretches of the sentence, corpus graphs cannot in general be encoded by nesting XML elements. Instead, all terminal and nonterminal nodes are listed, and edges are encoded explicitly as elements that refer to their target node by ID. The following example illustrates this encoding of a corpus graph.
<body>
  <s id="s5">
    <graph root="s5_504">
      <terminals>
        <t id="s5_1" word="Die" pos="ART" morph="Def.Fem.Nom.Sg"/>
        <t id="s5_2" word="Tagung" pos="NN" morph="Fem.Nom.Sg.*"/>
        <t id="s5_3" word="hat" pos="VVFIN" morph="3.Sg.Pres.Ind"/>
        <t id="s5_4" word="mehr" pos="PIAT" morph="--"/>
        <t id="s5_5" word="Teilnehmer" pos="NN" morph="Masc.Akk.Pl.*"/>
        <t id="s5_6" word="als" pos="KOKOM" morph="--"/>
        <t id="s5_7" word="je" pos="ADV" morph="--"/>
        <t id="s5_8" word="zuvor" pos="ADV" morph="--"/>
      </terminals>
      <nonterminals>
        <nt id="s5_500" cat="NP">
          <edge label="NK" idref="s5_1"/>
          <edge label="NK" idref="s5_2"/>
        </nt>
        <nt id="s5_501" cat="AVP">
          <edge label="CM" idref="s5_6"/>
          <edge label="MO" idref="s5_7"/>
          <edge label="HD" idref="s5_8"/>
        </nt>
        <nt id="s5_502" cat="AP">
          <edge label="HD" idref="s5_4"/>
          <edge label="CC" idref="s5_501"/>
        </nt>
        <nt id="s5_503" cat="NP">
          <edge label="NK" idref="s5_502"/>
          <edge label="NK" idref="s5_5"/>
        </nt>
        <nt id="s5_504" cat="S">
          <edge label="SB" idref="s5_500"/>
          <edge label="HD" idref="s5_3"/>
          <edge label="OA" idref="s5_503"/>
        </nt>
      </nonterminals>
    </graph>
  </s>
</body>
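To make the role of the explicit edges concrete, the following Python sketch reads the example sentence with the standard library's xml.etree.ElementTree, resolves the idref attributes, and reports every nonterminal whose terminals do not form a contiguous stretch of the sentence. It is only an illustration under stated assumptions: the function names (load_graph, terminal_yield, is_continuous) are hypothetical and not part of any TIGER tool, and the morph attributes are dropped to keep the embedded copy of the example short.

```python
import xml.etree.ElementTree as ET

# Copy of the example sentence; morph attributes omitted for brevity.
TIGER_XML = """
<s id="s5">
  <graph root="s5_504">
    <terminals>
      <t id="s5_1" word="Die" pos="ART"/>
      <t id="s5_2" word="Tagung" pos="NN"/>
      <t id="s5_3" word="hat" pos="VVFIN"/>
      <t id="s5_4" word="mehr" pos="PIAT"/>
      <t id="s5_5" word="Teilnehmer" pos="NN"/>
      <t id="s5_6" word="als" pos="KOKOM"/>
      <t id="s5_7" word="je" pos="ADV"/>
      <t id="s5_8" word="zuvor" pos="ADV"/>
    </terminals>
    <nonterminals>
      <nt id="s5_500" cat="NP">
        <edge label="NK" idref="s5_1"/>
        <edge label="NK" idref="s5_2"/>
      </nt>
      <nt id="s5_501" cat="AVP">
        <edge label="CM" idref="s5_6"/>
        <edge label="MO" idref="s5_7"/>
        <edge label="HD" idref="s5_8"/>
      </nt>
      <nt id="s5_502" cat="AP">
        <edge label="HD" idref="s5_4"/>
        <edge label="CC" idref="s5_501"/>
      </nt>
      <nt id="s5_503" cat="NP">
        <edge label="NK" idref="s5_502"/>
        <edge label="NK" idref="s5_5"/>
      </nt>
      <nt id="s5_504" cat="S">
        <edge label="SB" idref="s5_500"/>
        <edge label="HD" idref="s5_3"/>
        <edge label="OA" idref="s5_503"/>
      </nt>
    </nonterminals>
  </graph>
</s>
"""


def load_graph(xml_text):
    """Return (root id, terminal ids in sentence order, words by id, edges by nonterminal id)."""
    graph = ET.fromstring(xml_text).find("graph")
    terminals = list(graph.find("terminals"))
    order = [t.get("id") for t in terminals]
    words = {t.get("id"): t.get("word") for t in terminals}
    edges = {
        nt.get("id"): [(e.get("label"), e.get("idref")) for e in nt.findall("edge")]
        for nt in graph.find("nonterminals")
    }
    return graph.get("root"), order, words, edges


def terminal_yield(node_id, edges):
    """Collect the ids of all terminals dominated by node_id."""
    if node_id not in edges:  # terminals have no outgoing edges
        return {node_id}
    result = set()
    for _label, child in edges[node_id]:
        result |= terminal_yield(child, edges)
    return result


def is_continuous(node_id, order, edges):
    """True if the node's terminals occupy consecutive sentence positions."""
    positions = sorted(order.index(t) for t in terminal_yield(node_id, edges))
    return positions == list(range(positions[0], positions[-1] + 1))


root, order, words, edges = load_graph(TIGER_XML)
discontinuous = [n for n in edges if not is_continuous(n, order, edges)]
print(discontinuous)  # ['s5_502']
```

In this sentence the AP node s5_502 dominates "mehr" and "als je zuvor" but not the intervening "Teilnehmer", so an XML element for the AP could not simply enclose its tokens; listing nodes and referring to them by ID avoids the problem.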
Please note: Feature values, which are represented as attribute-value pairs, must not be omitted. If a feature value or edge label is not meaningful for a token or inner node (in the example sentence, the feature morph is unspecified for several tokens), use a placeholder symbol instead. We recommend the symbol --, which is also used by our import filters. When a matching corpus graph is viewed in the TIGERGraphViewer, the display of a feature value or edge label such as -- can be suppressed (cf. subsection 7.5, chapter IV).
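A converter that writes this format can apply the rule by filling in the placeholder for every feature that has no value. The short Python sketch below shows one way to do so; the helper make_terminal and the fixed feature list (word, pos, morph, taken from the example sentence) are assumptions made for illustration and are not part of the import filters mentioned above.

```python
import xml.etree.ElementTree as ET

UNSPECIFIED = "--"                   # recommended placeholder symbol
FEATURES = ("word", "pos", "morph")  # assumption: features used in the example


def make_terminal(node_id, values):
    """Build a <t> element in which every feature in FEATURES is present."""
    attrs = {"id": node_id}
    for name in FEATURES:
        value = values.get(name)
        # Missing or empty values are written as the placeholder, never omitted.
        attrs[name] = value if value else UNSPECIFIED
    return ET.Element("t", attrs)


# "je" has no morphological description, so morph is filled with "--".
terminal = make_terminal("s5_7", {"word": "je", "pos": "ADV"})
print(ET.tostring(terminal, encoding="unicode"))
```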