Section: Language overview

2. Language overview

Like other formal languages, the TIGER language has been defined recursively, i.e. by nested layers of formal constructs. The invidual nodes in syntax graphs can be described with feature constraints, i.e. Boolean expressions over feature-value pairs. For the sake of computational simplicity, we do not admit nested feature structures. This means that feature value descriptions must denote constants or more precisely, strings. For this reason, we rather talk about feature records instead of feature structures. Here are some sample queries for the TIGERSampler corpus which is part of the TIGERSearch distribution (cf. [Smith2002] for an introduction to the corpus annotation).

[word="Abend" & pos="NN"]

[word=/Ma.*/ & pos= ("NN"|"NE")]

There are two elementary node relations, precedence (.) and labelled dominance (>L), and a range of derived node relations such as unlabelled direct dominance (>) and general dominance (>*). Example:

[cat="NP"] > [pos="ART"]

The graph descriptions are made from (restricted) Boolean expressions over node relations. Feature values, feature constraints, and nodes can be refered to by logical variables (e.g. #n) which are bound existentially at the outmost formula level. Example:

(#n:[cat="NP"] > [pos="ART"]) & (#n >* [pos="ADJA"])

Queries can include template calls and type names. Template definitions help to modularize lengthy queries. Types are a means to structure the universe of feature-value pairs. Type definitions include the declaration of features with domain and range types and the definition of type hierarchies.

In the subsequent sections, we introduce the TIGER language in an informal manner, i.e. by the way of examples. All sample queries should work on the TIGERSampler corpus, which is distributed with the TIGERSearch software. A formal definition of the TIGER language is given in a separate document (cf. [KoenigLezius2001]).

In section 12, the reader will find a quick reference of all language elements.