8. Type definitions

If the user has to declare the symbols which will be used in a corpus and in queries, inconsistencies in the corpus annotation and in the corpus query can be detected much more easily. TIGER allows for the declaration of type hierarchies (cf. subsection 8.2) and features (cf. subsection 8.4).

Type hierarchies have to be linked to a corpus. The linking of type hierarchies is described in subsection 4.3, chapter VI.

8.1 Built-in types

There are the following built-in types in the TIGER language:

String for feature values that cannot be enumerated, such as the values of the features word or lemma

UserDefConstant for user-defined ('listable') feature values

Constant comprises both String and UserDefConstant

NT for feature constraints of nonterminal nodes

T for feature constraints of terminal nodes

FREC stands for all feature records (feature constraints), i.e. NT and T

Node stands for node descriptions

Graph means all graphs

Top means anything (in the world of syntax graphs)

The hierarchy of built-in types is visualized in the following figure. Defining a type hierarchy is introduced in the following subsection.

Figure: Built-in types

The built-in types Top, Node and Graph are only required on the conceptual level. In the current implementation, they cannot be referred to in the description language.
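The subtype relations stated in the list of built-in types can be sketched as a small lookup table. This is only an illustration in Python, not part of TIGERSearch: the names PARENT and is_subtype are hypothetical, and placing Constant, FREC, Node and Graph directly below Top is an assumption derived from the list, not from the figure.

```python
# Parent links reconstructed from the list of built-in types.
# Assumption: types without a stated supertype sit directly below Top.
PARENT = {
    "String": "Constant",
    "UserDefConstant": "Constant",
    "NT": "FREC",
    "T": "FREC",
    "Constant": "Top",
    "FREC": "Top",
    "Node": "Top",
    "Graph": "Top",
}


def is_subtype(sub, sup):
    """Return True if `sub` equals `sup` or lies below it in the hierarchy."""
    current = sub
    while current is not None:
        if current == sup:
            return True
        current = PARENT.get(current)  # None once we step above Top
    return False


print(is_subtype("T", "FREC"))       # True
print(is_subtype("String", "FREC"))  # False
```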

8.2 Definition of a type hierarchy

In the current implementation, the user can only add type definitions for feature values, i.e. for the type UserDefConstant. Let us start with a sample type hierarchy for the pos feature:

<typedeclaration base="pos" version="1.0">
...
  <!-- base type: pos -->
  <type name="pos">
     <subtype nameref="openclass"/>
     <subtype nameref="closedclass"/>
     <subtype nameref="punctuation"/>
     <subtype nameref="misc"/>
  </type>
...
</typedeclaration>

Type declarations are encoded in an XML-based format: the type declarations for a feature constitute the contents of a typedeclaration root element. The base attribute defines the base type (or root type) of the type system.

Type definition rules are encoded by type elements. The value of the name attribute is the type t which is being defined. The (direct) subtypes are given by the nameref attribute values of the individual subtype child elements. An occurrence of a type t' in a subtype element is called a use of t'.

In this way, type hierarchies can be defined. The 'terminal nodes' of a type hierarchy (of constant denoting types) are constants:

  <!-- open word classes-->
  <type name="openclass">
     <subtype nameref="noun"/>
     <subtype nameref="verb"/>
     <subtype nameref="adjective"/>
     <!-- adverb -->
     <constant value="ADV" comment="schon, bald, doch"/>
  </type>

  <!-- noun -->
  <type name="noun">
     <!-- common noun -->
     <constant value="NN" comment="Tisch, Herr, [das] Reisen"/>
     <!-- proper noun -->
     <constant value="NE" comment="Hans, Hamburg, HSV"/>
  </type>

On the basis of these type definitions, some disjunctions of feature values can be written more concisely. For example, the first of the following two queries can now be replaced by the second:

[pos = ("NE"|"NN")]
[pos = noun]

Restrictions for type definitions

A type must be defined at most once.

A type must be used exactly once in a type definition.

The first restriction means that neither recursion nor cross-classification (alternative definitions of the same type symbol) can be expressed. If you think you need cross-classification, template definitions (section 9) might be a way out. The second restriction enforces that every type must be hooked up in the type hierarchy. In total this means that type definitions define a tree-shaped type hierarchy. Undefined types may only occur as the leaves of a type hierarchy.
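The two restrictions can be checked mechanically by counting definitions and uses. The following Python sketch is only an illustration, not the check performed by TIGERSearch. The function name check_type_declaration and the sample declaration are hypothetical, and the sketch assumes that the base type, being the root of the hierarchy, is exempt from the "used exactly once" rule. It only counts occurrences and does not verify that every type is reachable from the base type.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def check_type_declaration(xml_text):
    """Return a list of violations of the two restrictions (empty if none)."""
    root = ET.fromstring(xml_text)
    base = root.get("base")
    defined = Counter(t.get("name") for t in root.iter("type"))
    used = Counter(s.get("nameref") for s in root.iter("subtype"))
    problems = []
    for name, count in sorted(defined.items()):
        if count > 1:
            problems.append(f"type '{name}' is defined {count} times")
        # Assumption: the base type is the root and is therefore never used.
        if name != base and used[name] == 0:
            problems.append(f"type '{name}' is never used")
    for name, count in sorted(used.items()):
        if count > 1:
            problems.append(f"type '{name}' is used {count} times")
    return problems


# Hypothetical declaration that violates both restrictions for 'noun'.
BAD = """
<typedeclaration base="pos" version="1.0">
  <type name="pos">
     <subtype nameref="noun"/>
     <subtype nameref="verb"/>
  </type>
  <type name="noun">
     <constant value="NN"/>
  </type>
  <type name="noun">
     <constant value="NE"/>
  </type>
  <type name="verb">
     <subtype nameref="noun"/>
  </type>
</typedeclaration>
"""

for problem in check_type_declaration(BAD):
    print(problem)
```

Types that are used but never defined are not reported, since undefined types are permitted as leaves of the hierarchy.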

8.3 Type definition example

In this subsection, the part-of-speech type hierarchy used for the TIGER Corpus Sampler (based on a modified version of the STTS tag set) is listed as an example. The file can also be found in the doc/examples/ subdirectory of your TIGERSearch installation.

<typedeclaration base="pos" version="1.0">

  <!-- base type: pos -->
  <type name="pos">
     <subtype nameref="openclass"/>
     <subtype nameref="closedclass"/>
     <subtype nameref="punctuation"/>
     <subtype nameref="misc"/>
  </type>

  <!-- open word classes-->
  <type name="openclass">
     <subtype nameref="noun"/>
     <subtype nameref="verb"/>
     <subtype nameref="adjective"/>
     <!-- adverb -->
     <constant value="ADV" comment="schon, bald, doch"/>
  </type>

  <!-- closed word classes-->
  <type name="closedclass">
     <!-- definite or indefinite article -->
     <constant value="ART" comment="der, die, das, ein, eine"/>
     <subtype nameref="proform"/>
     <!-- cardinal number -->
     <constant value="CARD" comment="zwei [Männer], [im Jahre] 1994"/>
     <subtype nameref="conjunction"/>
     <subtype nameref="adposition"/>
     <!-- interjection -->
     <constant value="ITJ" comment="mhm, ach, tja"/>
     <subtype nameref="particle"/>
  </type>

  <!-- noun -->
  <type name="noun">
     <!-- common noun -->
     <constant value="NN" comment="Tisch, Herr, [das] Reisen"/>
     <!-- proper noun -->
     <constant value="NE" comment="Hans, Hamburg, HSV"/>
  </type>

  <!-- verb -->
  <type name="verb">
     <subtype nameref="finite"/>
     <subtype nameref="nonfinite"/>
  </type>

  <!-- finite verbform -->
  <type name="finite">
     <!-- finite full verb -->
     <constant value="VVFIN" comment="[du] gehst, [wir] kommen [an]"/>
     <!-- finite auxiliary verb -->
     <constant value="VAFIN" comment="[du] bist, [wir] werden"/>
     <!-- finite modal verb --> 
     <constant value="VMFIN" comment="dürfen"/>
  </type>

  <!-- non-finite verbform -->
  <type name="nonfinite">
     <subtype nameref="infinitive"/>
     <subtype nameref="participle"/>
     <subtype nameref="imperative"/>
     <!-- infinitive with zu, full verb -->
     <constant value="VVIZU" comment="anzukommen, loszulassen"/>
  </type>

  <!-- infinitive verbform -->
  <type name="infinitive">
     <!-- infinitive, full verb -->
     <constant value="VVINF" comment="gehen, ankommen"/>
     <!-- infinitive, auxiliary verb -->
     <constant value="VAINF" comment="werden, sein"/>
     <!-- infinitive, modal verb -->
     <constant value="VMINF" comment="wollen"/>
  </type>

  <!-- past participle -->
  <type name="participle">
     <!-- past participle, full verb -->
     <constant value="VVPP" comment="gegangen, angekommen"/>
     <!-- past participle, auxiliary verb -->
     <constant value="VAPP" comment="gewesen"/>
     <!-- past participle, modal verb -->
     <constant value="VMPP" comment="gekonnt, [er hat gehen] können"/>
  </type>

  <!-- imperative -->
  <type name="imperative">
     <!-- imperative, full verb -->
     <constant value="VVIMP" comment="komm [!]"/>
     <!-- imperative, auxiliary verb -->
     <constant value="VAIMP" comment="sei [ruhig !]"/>
  </type>

  <!-- adjective -->
  <type name="adjective">
     <!-- attributive adjective -->
     <constant value="ADJA" comment="[das] große [Haus]"/>
     <!-- adverbial or predicative adjective -->
     <constant value="ADJD" comment="[er fährt] schnell, [er ist] schnell"/>
  </type>

  <!-- proform -->
  <type name="proform">
     <subtype nameref="prodemon"/>
     <subtype nameref="proindef"/>
     <!-- irreflexive personal pronoun -->
     <constant value="PPER" comment="ich, er, ihm, mich, dir"/>
     <subtype nameref="propos"/>
     <subtype nameref="prorel"/>
     <!-- reflexive pronoun -->
     <constant value="PRF" comment="sich, einander, dich, mir"/>
     <subtype nameref="prointer"/>
     <!-- pronominal adverb, bug, should be "PAV" -->
     <constant value="PROAV" comment="dafür, dabei, deswegen, trotzdem"/>
  </type>

  <!-- demonstrative pronoun -->
  <type name="prodemon">
     <!-- substitutive demonstrative pronoun -->
     <constant value="PDS" comment="dieser, jener"/>
     <!-- attributive demonstrative pronoun -->
     <constant value="PDAT" comment="jener [Mensch]"/>
  </type>

  <!-- indefinite pronoun -->
  <type name="proindef">
     <!-- substitutive indefinite pronoun -->
     <constant value="PIS" comment="keiner, viele, man, niemand"/>
     <!-- attributive indefinite pronoun -->
     <constant value="PIAT" comment="kein [Mensch], irgendein [Glas]"/>
  </type>

  <!-- possessive pronoun -->
  <type name="propos">
     <!-- substitutive possessive pronoun -->
     <constant value="PPOSS" comment="meins, deiner"/>
     <!-- attributive possessive pronoun -->
     <constant value="PPOSAT" comment="mein [Buch], deine [Mutter]"/>
  </type>

  <!-- relative pronoun -->
  <type name="prorel">
     <!-- substitutive relative pronoun -->
     <constant value="PRELS" comment="[der Hund,] der"/>
     <!-- attributive relative pronoun -->
     <constant value="PRELAT" comment="[der Mann ,] dessen [Hund]"/>
  </type>

  <!-- interrogative pronoun -->
  <type name="prointer">
     <!-- substitutive interrogative pronoun -->
     <constant value="PWS" comment="wer, was"/>
     <!-- attributive interrogative pronoun -->
     <constant value="PWAT" comment="welche [Farbe], wessen [Hut]"/>
     <!-- interrogative adverb or adverbial relative pronoun -->
     <constant value="PWAV" comment="warum, wo, wann, worüber, wobei"/>
  </type>

  <!-- conjunction -->
  <type name="conjunction">
     <subtype nameref="conjsub"/>
     <!-- coordinating conjunction -->
     <constant value="KON" comment="und, oder, aber"/>
     <!-- comparative conjunction -->
     <constant value="KOKOM" comment="als, wie"/>
  </type>

  <!-- subordinating conjunction -->
  <type name="conjsub">
     <!-- subordinating conjunction with zu-infinitive -->
     <constant value="KOUI" comment="um [zu leben], anstatt [zu fragen]"/>
     <!-- subordinating conjunction with sentence -->
     <constant value="KOUS" comment="weil, daß, damit, wenn, ob"/>
  </type>

  <!-- adposition -->
  <type name="adposition">
     <!-- preposition -->
     <constant value="APPR" comment="in [der Stadt], ohne [mich], von [jetzt an]"/>
     <!-- preposition + article -->
     <constant value="APPRART" comment="im [Haus], zur [Sache]"/>
     <!-- postposition -->
     <constant value="APPO" comment="[ihm] zufolge, [der Sache] wegen"/>
     <!-- circumposition, right part -->
     <constant value="APZR" comment="[von jetzt] an"/>
  </type>

  <!-- particle -->
  <type name="particle">
     <!-- "zu" before infinitive -->
     <constant value="PTKZU" comment="zu [gehen]"/>
     <!-- negating particle -->
     <constant value="PTKNEG" comment="nicht"/>
     <!-- separated verb particle -->
     <constant value="PTKVZ" comment="[er kommt] an, [er fährt] rad"/>
     <!-- answer particle -->
     <constant value="PTKANT" comment="ja, nein, danke, bitte"/>
     <!-- particle with adjective or adverb -->
     <constant value="PTKA" comment="am [schönsten], zu [schnell]"/>
  </type>

  <type name="punctuation">
     <!-- comma -->
     <constant value="$," comment=","/>
     <!-- final punctuation -->
     <constant value="$." comment=". ? ! ; :"/>
     <!-- other punctuation marks -->
     <constant value="$(" comment="- [,]()"/>
  </type>

  <type name="misc">
     <!-- foreign material -->
     <constant value="FM" comment="[Er hat das mit ``] A big fish ['' übersetzt]"/>
     <!-- nonword, with special characters -->
     <constant value="XY" comment="3:7, H2O, D2XW3"/>
     <!-- truncated element -->
     <constant value="TRUNC" comment="An- [und Abreise]"/>
     <!-- untagged -->
     <constant value="--"/>
     <!-- tagging of the token unknown -->
     <constant value="UNKNOWN"/>
  </type>

</typedeclaration>

8.4 Feature declarations

Feature declarations are part of the corpus definition (cf. section 11). A feature declaration states the following information:

It declares the symbol under consideration as a feature name.

It restricts the domain of the feature, i.e. it states for which type the feature may be used.

It restricts the feature values (i.e. the range).

In the current implementation, features can only be declared for the built-in types NT and T. Furthermore, a type cannot be used to restrict the range of a feature; instead, the possible values of a feature have to be enumerated. One reason is that we want to keep the number of dependencies between corpus definition and type definitions as small as possible. The other reason is that such simple feature declarations can also be constructed automatically for those corpora which do not come with feature declarations.

If the value enumeration is omitted from a feature declaration, the default range of that feature is String.

For example, for the type T of feature constraints for terminal nodes, the features word, lemma, and pos may be defined as follows.

<feature name="word" domain="T"/>

<feature name="lemma" domain="T"/>

<feature name="pos" domain="T">
  ...
  <value name="VAFIN">...comment...</value>
  <value name="VAIMP"/>
  <value name="VAINF"/>
  <value name="VAPP"/>
  <value name="VMFIN"/>
  <value name="VMINF"/>
  ...
</feature>
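Since the values of a feature are simply enumerated, a declaration of this form can be generated from the values observed in a corpus. The following Python sketch illustrates the idea with the standard xml.etree.ElementTree module. The function build_feature_declaration and the sample tag list are hypothetical; this is not the procedure used by TIGERSearch.

```python
import xml.etree.ElementTree as ET


def build_feature_declaration(name, domain, observed_values=None):
    """Build a <feature> element; with no values the range defaults to String."""
    feature = ET.Element("feature", {"name": name, "domain": domain})
    # Enumerate each observed value once, in sorted order.
    for value in sorted(set(observed_values or [])):
        ET.SubElement(feature, "value", {"name": value})
    return feature


# Hypothetical pos values collected from the terminal nodes of a corpus.
pos_tags = ["VAFIN", "VMFIN", "VAFIN", "VAINF"]

pos_decl = build_feature_declaration("pos", "T", pos_tags)
word_decl = build_feature_declaration("word", "T")

print(ET.tostring(pos_decl, encoding="unicode"))
print(ET.tostring(word_decl, encoding="unicode"))
```

The declaration for word carries no value elements, which corresponds to the default range String described above.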

Please note: Each feature must be declared exactly once. The exclusion of multiple declarations for the same feature means that polymorphic overloading of a feature symbol is not permitted.

Please note: Since the TIGER description language is a typed language, the following two queries are not equivalent!

[word="das" & !(pos="ART")]
[word="das" & pos != "ART"]

The reason is that !(pos="ART") equals !(T & pos="ART") due to the corresponding feature declaration. The latter formula is in turn equivalent to !(T) | (pos != "ART"), i.e. either the feature constraint is not of type T, so that the feature pos is not defined for it, or pos is defined and its value is not equal to "ART".
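The difference between the two sub-formulas can be made concrete with a small truth-table sketch in Python. The node representation and the function names are hypothetical. The sketch assumes that pos != "ART" implicitly requires type T, in the same way that pos="ART" equals T & pos="ART"; it models only the two sub-formulas, not the full queries or the TIGERSearch evaluation.

```python
def negated_equation(node):
    """!(pos="ART"), read as !(T & pos="ART")."""
    return not (node["type"] == "T" and node.get("pos") == "ART")


def inequation(node):
    """pos != "ART", read as T & pos != "ART" (assumption, see lead-in)."""
    return node["type"] == "T" and node.get("pos") != "ART"


# Hypothetical nodes: two terminals and one nonterminal without a pos feature.
nodes = [
    {"type": "T", "word": "das", "pos": "ART"},
    {"type": "T", "word": "Haus", "pos": "NN"},
    {"type": "NT", "cat": "NP"},
]

for node in nodes:
    print(node["type"], negated_equation(node), inequation(node))
```

Under these assumptions the two formulas agree on terminal nodes and differ only on the nonterminal node, which satisfies the negated equation through the !(T) alternative.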