3. Feature value descriptions

3.1 String

Strings are enclosed in double quotation marks, e.g. "NN". To form an actual TIGERSearch query, the string must be used as the value in a feature-value pair, surrounded by square brackets:

[pos="NN"]

Characters that are TIGER reserved symbols must be escaped with a preceding backslash (\), as in the following query. The reserved symbols are listed in section 12.

[word="\."]

3.2 Types

Constant-denoting type symbols can serve as descriptions of feature values (cf. section 8). If a feature value symbol comes without quotes, it is interpreted as a type name:

[pos=proform]

3.3 Boolean expressions

Descriptions of feature values may be complex in the sense that type symbols and strings can be combined into Boolean expressions. The operators are ! (negation), & (conjunction), and | (disjunction). The following queries use Boolean expressions as feature value descriptions:

[pos = ("NN" | "NE")]
[pos = ! ("NN" | "NE") ]

The second query above can also be written with the != operator:

[pos != ("NN" | "NE") ]

Please note: Boolean expressions for feature values which involve binary operators (conjunction and disjunction) must always be put into parentheses. In the examples above, the parentheses around "NN" | "NE" cannot be omitted.

The operator precedence is defined as follows: !, &, |. This definition is illustrated by the following examples:

Example                  Interpretation
! "NN" & "NE"            (!"NN") & ("NE")
"NN" & "NE" | "PREL"     ("NN" & "NE") | ("PREL")

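The precedence rules above can be checked with a small evaluator. The following Python sketch is not part of TIGERSearch: the function names tokenize and matches are hypothetical, and the code handles only quoted strings, the three operators, and parentheses (no types, variables, or escaped characters). It tests whether one concrete feature value satisfies a Boolean description.

```python
import re

# Hypothetical helper, not part of TIGERSearch. One token is either a
# quoted string or one of the symbols ! & | ( ).
TOKEN = re.compile(r'\s*("[^"]*"|[!&|()])')

def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def matches(description, value):
    """Return True if value satisfies the Boolean description."""
    tokens = tokenize(description)
    index = 0

    def peek():
        return tokens[index] if index < len(tokens) else None

    def advance():
        nonlocal index
        index += 1

    def parse_or():                 # lowest precedence: |
        result = parse_and()
        while peek() == "|":
            advance()
            right = parse_and()     # parse first, then combine
            result = result or right
        return result

    def parse_and():                # binds tighter than |
        result = parse_not()
        while peek() == "&":
            advance()
            right = parse_not()
            result = result and right
        return result

    def parse_not():                # highest precedence: !
        if peek() == "!":
            advance()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        tok = peek()
        if tok is None:
            raise ValueError("unexpected end of description")
        advance()
        if tok == "(":
            result = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            advance()
            return result
        if tok.startswith('"'):
            return tok[1:-1] == value
        raise ValueError(f"unexpected token {tok}")

    result = parse_or()
    if index != len(tokens):
        raise ValueError("trailing input after description")
    return result

print(matches('! "NN" & "NE"', "NN"))            # False: (!"NN") & ("NE")
print(matches('"NN" & "NE" | "PREL"', "PREL"))   # True: ("NN" & "NE") | ("PREL")
```

The two sample values are chosen so that a different grouping would give the opposite result: !("NN" & "NE") would accept "NN", and "NN" & ("NE" | "PREL") would reject "PREL".
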
3.4 Variables

A Boolean feature value description can be referred to by a variable (in the example: #c):

[pos= #c:("NN"|"NE")]

A variable name has to start with a #-symbol. See subsection 7.2 for more meaningful applications of variables.

3.5 Regular expressions

One can also use regular expressions as feature value descriptions. Regular expressions are marked by enclosing slashes /. The syntax of regular expressions in TIGER is compatible with the syntax of regular expressions in Perl 5.003 (cf. [WallEtAl1996]). In our implementation, the following expression types are available:

Single characters

a the character a
. any character

Character classes

[ace] any of the characters a, c, e
[a-z] any lower-case letter
[^a-f] any character except a to f

Special characters

\s whitespace (space, tab, return)
\d digit (0-9)

Alternatives

(abc|de) the string abc, or de

Quantifiers

(ab)* zero or more occurrences of ab (empty string, ab, abab, ...)
(ab)+ at least one ab (ab, abab, ...)
(ab)? zero or one occurrence of ab
(ab){m,n} from m to n occurrences of ab

Grouping

ab+ a followed by at least one b (ab, abb, abbb, ...)
(ab)+ at least one ab (ab, abab, ...)

Please note: In our notation /x/ means /^x$/ in the Perl notation, i.e. the expression must match the entire feature value.

The following example means 'find words which start with spiel':

[word = /spiel.*/]
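
The effect of the implicit anchoring can be tried outside TIGERSearch. The following Python sketch is an approximation only: Python's re module is not the TIGERSearch engine and its dialect is not Perl 5.003, but the constructs listed above behave the same way in it. The helper name tiger_regex_match and the word list are made up for illustration.

```python
import re

def tiger_regex_match(pattern, value):
    # re.fullmatch requires the whole value to match, mirroring the
    # implicit ^...$ anchoring of /pattern/ described above.
    return re.fullmatch(pattern, value) is not None

words = ["spiel", "spielen", "Beispiel", "Spiel"]

# Anchored matching, as in [word = /spiel.*/].
anchored = [w for w in words if tiger_regex_match(r"spiel.*", w)]

# Unanchored search, for contrast: also finds spiel inside a word.
unanchored = [w for w in words if re.search(r"spiel.*", w)]

print(anchored)    # ['spiel', 'spielen']
print(unanchored)  # ['spiel', 'spielen', 'Beispiel']
```

To find spiel anywhere inside a word under anchored matching, the pattern would have to allow for leading characters explicitly, e.g. /.*spiel.*/.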

With the following query, one can locate the words das and der, irrespective of capitalization of the first letter:

[lemma = /[dD](as|er)/]

The following query finds words which contain at least one uppercase letter or digit at a non-initial position, such as hyphenated compounds, potential abbreviations, and product names:

[word = /.+([0-9A-Z])+.*/]
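
The two preceding patterns can be tried on a few sample values. This is a Python sketch under the assumption that re.fullmatch approximates TIGER's whole-value matching; the helper name select and the sample lists are invented for illustration and do not come from any corpus.

```python
import re

def select(pattern, values):
    # Keep only the values that the pattern matches in full.
    return [v for v in values if re.fullmatch(pattern, v)]

lemmas = ["das", "Das", "der", "Der", "des", "dass"]
print(select(r"[dD](as|er)", lemmas))      # ['das', 'Das', 'der', 'Der']

words = ["E-Mail", "GmbH", "A4", "Haus", "haus", "3"]
print(select(r".+([0-9A-Z])+.*", words))   # ['E-Mail', 'GmbH', 'A4']
```

Haus is rejected because its only uppercase letter is in initial position, and 3 is rejected because the leading .+ needs at least one character before the digit.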

Please note: There is a difference between . and \. in the context of regular expressions: an unescaped . matches any character, whereas \. matches a literal full stop. The first of the following queries denotes all strings starting with the prefix sagt, whereas the second denotes all strings consisting of sagt followed by any number (possibly zero) of full stops:

[word = /sagt.*/]
[word = /sagt\.*/]
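
The difference between the two queries shows up on a short word list. The following Python sketch again assumes that re.fullmatch is a fair stand-in for TIGER's whole-value matching; the word list is made up.

```python
import re

words = ["sagt", "sagte", "sagt.", "sagt..."]

# /sagt.*/ : sagt followed by arbitrary characters.
any_suffix = [w for w in words if re.fullmatch(r"sagt.*", w)]

# /sagt\.*/ : sagt followed only by literal full stops.
only_stops = [w for w in words if re.fullmatch(r"sagt\.*", w)]

print(any_suffix)  # ['sagt', 'sagte', 'sagt.', 'sagt...']
print(only_stops)  # ['sagt', 'sagt.', 'sagt...']
```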

Please note: The TIGER language compiler performs only a rough check of the syntax of regular expressions. The fine-grained syntax check is carried out when a query is evaluated. As a result, syntax errors in a regular expression may only surface at evaluation time and can take more effort to track down.
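
One way to catch such errors earlier is to compile the expression in another regex engine before running the query. The following Python sketch uses the standard re module; the function name check_regex is hypothetical. Since Python's dialect is not identical to Perl 5.003, a pattern that passes this check is not guaranteed to be accepted by TIGERSearch, and vice versa.

```python
import re

def check_regex(pattern):
    """Return None if the pattern compiles, else a short error message."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None

for candidate in [r"spiel.*", r"(ab", r"[a-"]:
    problem = check_regex(candidate)
    status = "ok" if problem is None else f"error: {problem}"
    print(f"/{candidate}/ {status}")
```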

3.6 Boolean expressions vs. types vs. regular expressions

Regular expressions for feature values should be reserved for 'open-ended' feature values such as the word values. For features with a restricted range of values, such as syntactic categories, the use of types and Boolean expressions is suggested in order to increase readability and processing efficiency. If possible, types should be used instead of Boolean expressions.