Like other formal languages, the TIGER language has been defined
recursively, i.e. by nested layers of formal constructs. The individual
nodes in syntax graphs can be described with feature
constraints, i.e. Boolean expressions over feature-value
pairs. For the sake of computational simplicity, we do not admit
nested feature structures. This means that feature value
descriptions must denote constants or, more precisely, strings.
For this reason, we speak of feature records rather than
feature structures. Here are some sample queries for the
TIGERSampler corpus, which is part of the TIGERSearch distribution
(cf. [Smith2002] for an introduction to the corpus annotation):
[word="Abend" & pos="NN"]
[word=/Ma.*/ & pos= ("NN"|"NE")]
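To make the reading of these two queries concrete, the following Python sketch models a feature record as a flat dictionary of strings and a feature constraint as a Boolean predicate over it. It is only an illustration, not the TIGERSearch implementation: the helper names (eq, rx, all_of, any_of) and the sample token are hypothetical, and the sketch assumes that a regular expression has to match the whole feature value.

```python
import re

# A feature record is flat: every feature maps to a plain string.
node = {"word": "Mann", "pos": "NN"}  # hypothetical token, not taken from the corpus

def eq(feature, value):
    """Constraint feature="value"."""
    return lambda rec: rec.get(feature) == value

def rx(feature, pattern):
    """Constraint feature=/pattern/ (assumed to match the whole value)."""
    compiled = re.compile(pattern)
    return lambda rec: feature in rec and compiled.fullmatch(rec[feature]) is not None

def all_of(*constraints):
    """Conjunction (&) of constraints."""
    return lambda rec: all(c(rec) for c in constraints)

def any_of(*constraints):
    """Disjunction (|) of constraints."""
    return lambda rec: any(c(rec) for c in constraints)

# [word="Abend" & pos="NN"]
q1 = all_of(eq("word", "Abend"), eq("pos", "NN"))
# [word=/Ma.*/ & pos= ("NN"|"NE")]
q2 = all_of(rx("word", "Ma.*"), any_of(eq("pos", "NN"), eq("pos", "NE")))

print(q1(node), q2(node))  # prints: False True
```

Because records are flat, a constraint only ever compares strings. No unification of nested structures is needed, which is the simplification the text refers to.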
There are two elementary node relations, precedence (.) and labelled dominance (>L), and a range of derived node relations such as unlabelled direct dominance (>) and general dominance (>*). Example:
[cat="NP"] > [pos="ART"]
Graph descriptions are built from (restricted) Boolean expressions over node relations. Feature values, feature constraints, and nodes can be referred to by logical variables (e.g. #n), which are bound existentially at the outermost formula level. Example:
(#n:[cat="NP"] > [pos="ART"]) & (#n >* [pos="ADJA"])
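The sketch below illustrates how this query could be read: direct dominance (>) is a single edge, general dominance (>*) is a path of one or more edges, and the variable #n has to be bound to one and the same node in both conjuncts. The small graph, its edge labels, and all function names are assumptions made for the illustration. The code does not reflect how TIGERSearch evaluates queries internally.

```python
# Hypothetical syntax graph: node id -> feature record.
nodes = {
    "np": {"cat": "NP"},
    "ap": {"cat": "AP"},
    "t1": {"word": "der", "pos": "ART"},
    "t2": {"word": "schöne", "pos": "ADJA"},
    "t3": {"word": "Abend", "pos": "NN"},
}
# Labelled edges (parent, label, child); the labels are illustrative.
edges = [
    ("np", "NK", "t1"),
    ("np", "NK", "ap"),
    ("np", "NK", "t3"),
    ("ap", "HD", "t2"),
]

def children(parent):
    return [c for p, _label, c in edges if p == parent]

def dominates(parent, child):
    """Unlabelled direct dominance (>): one edge, whatever its label."""
    return child in children(parent)

def dominates_star(ancestor, descendant):
    """General dominance (>*): a path of one or more edges."""
    return any(c == descendant or dominates_star(c, descendant)
               for c in children(ancestor))

def has(node_id, **features):
    return all(nodes[node_id].get(k) == v for k, v in features.items())

def query():
    """(#n:[cat="NP"] > [pos="ART"]) & (#n >* [pos="ADJA"])"""
    return [
        n for n in nodes  # candidate bindings for #n
        if has(n, cat="NP")
        and any(dominates(n, m) and has(m, pos="ART") for m in nodes)
        and any(dominates_star(n, m) and has(m, pos="ADJA") for m in nodes)
    ]

print(query())  # prints: ['np']
```

In this graph the NP dominates the adjective only indirectly, via the AP node, so the second conjunct would fail with > in place of >*.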
Queries can include template calls and type names. Template definitions help to modularize lengthy queries. Types are a means to structure the universe of feature-value pairs. Type definitions include the declaration of features with domain and range types and the definition of type hierarchies.
In the subsequent sections, we introduce the TIGER language in an
informal manner, i.e. by way of examples. All sample queries
should work on the TIGERSampler corpus, which is
distributed with the TIGERSearch software. A formal definition of the
TIGER language is given in a separate document
(cf. [KoenigLezius2001]).
Section 12 provides a quick reference for all language elements.