11. Appendix: Corpus definition

Strictly speaking, corpora for TIGERSearch have to be defined in TIGER-XML. However, TIGER-XML is a direct translation of the corpus definition sublanguage of the TIGER description language, so the restrictions on corpora can be stated in terms of that sublanguage. They are as follows:

Declaration of required features

A corpus definition must include the feature declarations for the type NT of nonterminal constraints and the type T of terminal constraints.

Please note: If no explicit feature declarations are given, they can in most cases be derived automatically from the corpus.
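
As an illustration of how such declarations might be derived, the following minimal sketch collects the feature names and observed values for each node type. It is not TIGERSearch's own procedure: the dictionary-based node representation and the function name derive_feature_declarations are assumptions made for this example only.

```python
def derive_feature_declarations(nodes):
    """Collect feature names and observed values per node type ('T' or 'NT').

    `nodes` is a hypothetical in-memory representation: each node is a dict
    with a "type" key ("T" or "NT") and a "features" dict.
    """
    declarations = {"T": {}, "NT": {}}
    for node in nodes:
        by_feature = declarations[node["type"]]
        for name, value in node["features"].items():
            # Record every value seen for this feature on this node type.
            by_feature.setdefault(name, set()).add(value)
    return declarations


nodes = [
    {"type": "T", "features": {"word": "the", "pos": "ART"}},
    {"type": "T", "features": {"word": "dog", "pos": "NN"}},
    {"type": "NT", "features": {"cat": "NP"}},
]
decls = derive_feature_declarations(nodes)
```

Here decls maps each type to its feature names, so the terminal features word and pos and the nonterminal feature cat are recovered without an explicit declaration.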

Single Root Node, Connectedness, No Structure Sharing

Every node in a graph except for one distinguished node (root node) has to be directly dominated by exactly one other node.

Please note: A multi-rooted graph (i.e. a set of unconnected subgraphs) can be turned automatically into a graph with a unique root node by adding an 'artificial' root node plus edges that point to the roots of the individual subgraphs.

Please note: A structure sharing mechanism (multi-dominance) is provided by the additional layer of 'secondary edges'.
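
The single-root restriction and the repair by an artificial root node can be sketched as follows. This is an illustrative sketch only, assuming the primary edges are given as (parent, child) pairs and that the graph is acyclic. The function names and the node id "VROOT" are choices made for this example.

```python
def find_roots(nodes, edges):
    """Return the nodes without a parent; reject nodes with several parents."""
    parent_count = {n: 0 for n in nodes}
    for parent, child in edges:
        parent_count[child] += 1
    shared = [n for n, count in parent_count.items() if count > 1]
    if shared:
        # Multi-dominance is not allowed on primary edges.
        raise ValueError(f"node(s) with more than one parent: {shared}")
    return [n for n in nodes if parent_count[n] == 0]


def ensure_single_root(nodes, edges, root_id="VROOT"):
    """Add an artificial root above all parentless nodes if there are several."""
    roots = find_roots(nodes, edges)
    if not roots:
        raise ValueError("no root found; the graph must contain a cycle")
    if len(roots) == 1:
        return list(nodes), list(edges), roots[0]
    # root_id is assumed not to clash with an existing node id.
    new_edges = list(edges) + [(root_id, r) for r in roots]
    return list(nodes) + [root_id], new_edges, root_id


# Two unconnected subgraphs: the NP over w1 and w2, and the lone terminal w3.
nodes = ["np", "w1", "w2", "w3"]
edges = [("np", "w1"), ("np", "w2")]
new_nodes, new_edges, root = ensure_single_root(nodes, edges)
```

After the call, every node except the artificial root is directly dominated by exactly one other node.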

Acyclicity

No node may (indirectly) dominate itself.
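
One possible way to test this restriction is a depth-first search that reports a cycle as soon as it reaches a node that is still being visited. The sketch below assumes the same hypothetical (parent, child) edge representation. It is not taken from TIGERSearch itself.

```python
def has_cycle(nodes, edges):
    """Return True if some node directly or indirectly dominates itself."""
    children = {n: [] for n in nodes}
    for parent, child in edges:
        children[parent].append(child)

    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {n: UNVISITED for n in nodes}

    def visit(node):
        state[node] = IN_PROGRESS
        for child in children[node]:
            if state[child] == IN_PROGRESS:
                return True  # back edge: child dominates the current node
            if state[child] == UNVISITED and visit(child):
                return True
        state[node] = DONE
        return False

    return any(state[n] == UNVISITED and visit(n) for n in nodes)


tree_ok = has_cycle(["s", "np", "w1"], [("s", "np"), ("np", "w1")])
tree_bad = has_cycle(["a", "b"], [("a", "b"), ("b", "a")])
```

The recursion depth equals the depth of the graph, which is small for typical syntax trees. An iterative variant would be needed for very deep structures.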

Full Disambiguation

The graph constraints may include only the two basic node relations: labelled direct dominance (>L) and direct precedence (.).

The feature constraint for a node must be a conjunction of feature-value pairs, either for all features which have been declared for NT or for all features which have been declared for T.

For each node, its arity (i.e. the number of its children) has to be fixed.

The precedence relation on terminal nodes (leaf nodes) must be a total order.

The only logical connective on all structural levels is the conjunction operator &.

Neither types nor template calls nor regular expressions are admitted.
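
Two of the restrictions above lend themselves to a simple mechanical check: that every node specifies exactly the declared features, and that the direct precedence pairs put the terminals into one total order. The following is a minimal sketch under assumed data structures (feature dicts and (left, right) pairs that mention terminals only). The function names are hypothetical.

```python
def is_fully_specified(node_features, declared_features):
    """True if the node gives a value for exactly the declared features."""
    return set(node_features) == set(declared_features)


def precedence_chain(terminals, precedes):
    """Return the terminals in order if the direct precedence pairs form
    a single chain covering all of them, otherwise None."""
    successor, predecessor = {}, {}
    for left, right in precedes:
        if left in successor or right in predecessor:
            return None  # two successors or two predecessors: not a chain
        successor[left] = right
        predecessor[right] = left
    starts = [t for t in terminals if t not in predecessor]
    if len(starts) != 1:
        return None  # no first terminal, or several unordered fragments
    chain = [starts[0]]
    while chain[-1] in successor:
        chain.append(successor[chain[-1]])
    return chain if len(chain) == len(terminals) else None


order = precedence_chain(["w1", "w2", "w3"], [("w2", "w3"), ("w1", "w2")])
gap = precedence_chain(["w1", "w2", "w3"], [("w1", "w2")])
complete = is_fully_specified({"word": "dog", "pos": "NN"}, ["word", "pos"])
```

In this example order lists the terminals from first to last, while gap is None because w3 is not ordered relative to the other terminals.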