Strings are enclosed in quotation marks, e.g. "NN". To form an actual TIGERSearch query, the string must be used as the value in a feature-value pair, surrounded by square brackets:
[pos="NN"]
Characters which are TIGER reserved symbols must be escaped with a preceding backslash (\), as in the following query. TIGER reserved symbols are listed in section 12.
[word="\."]
Constant-denoting type symbols can serve as descriptions of feature values (cf. section 8). If a feature value symbol comes without quotes, it is interpreted as a type name:
[pos=proform]
Descriptions of feature values may be complex in the sense that type symbols and strings can be combined into Boolean expressions. The operators are ! (negation), & (conjunction), and | (disjunction). Here are two uses of a Boolean expression as a feature value description:
[pos = ("NN" | "NE")]
[pos = ! ("NN" | "NE") ]
The query above can equivalently be written as:
[pos != ("NN" | "NE") ]
Please note: Boolean expressions for feature values which involve binary
operators (conjunction and disjunction) must always be enclosed in
parentheses. In the examples above, the outer parentheses
cannot be omitted.
The operator precedence, from highest to lowest, is defined as follows: !, &, |. This definition is illustrated by the following examples:
Example                     Interpretation
! "NN" & "NE"               (!"NN") & ("NE")
"NN" & "NE" | "PREL"        ("NN" & "NE") | ("PREL")
A Boolean feature value description can be referred to by a variable (in the example: #c):
[pos= #c:("NN"|"NE")]
A variable name has to start with a #-symbol. See subsection 7.2 for more meaningful applications of variables.
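The Boolean operators and their precedence (!, then &, then |) described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the TIGERSearch implementation: it handles quoted strings only (no type names, variables or regular expressions), does not handle escaped quotation marks, and the function names tokenize, parse, matches and show are hypothetical.

```python
import re

# One token: a quoted string or one of the symbols ! & | ( )
TOKEN = re.compile(r'\s*("[^"]*"|[!&|()])')


def tokenize(text):
    """Split a Boolean feature value description into tokens."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"unexpected input: {text[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse(text):
    """Build a nested tuple tree, honouring the precedence !, &, |."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def disjunction():  # lowest precedence: |
        node = conjunction()
        while peek() == "|":
            take()
            node = ("|", node, conjunction())
        return node

    def conjunction():  # middle precedence: &
        node = negation()
        while peek() == "&":
            take()
            node = ("&", node, negation())
        return node

    def negation():  # highest precedence: !
        if peek() == "!":
            take()
            return ("!", negation())
        return atom()

    def atom():
        tok = take()
        if tok == "(":
            node = disjunction()
            if take() != ")":
                raise ValueError("missing closing parenthesis")
            return node
        if tok is None or not tok.startswith('"'):
            raise ValueError(f"expected a string or '(', got {tok!r}")
        return ("str", tok[1:-1])

    tree = disjunction()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return tree


def matches(tree, value):
    """Test whether a feature value satisfies the parsed description."""
    kind = tree[0]
    if kind == "str":
        return tree[1] == value
    if kind == "!":
        return not matches(tree[1], value)
    left, right = matches(tree[1], value), matches(tree[2], value)
    return (left and right) if kind == "&" else (left or right)


def show(tree):
    """Render the tree fully parenthesised to make the grouping visible."""
    kind = tree[0]
    if kind == "str":
        return f'"{tree[1]}"'
    if kind == "!":
        return f"(!{show(tree[1])})"
    return f"({show(tree[1])} {kind} {show(tree[2])})"


print(show(parse('! "NN" & "NE"')))
print(show(parse('"NN" & "NE" | "PREL"')))
print(matches(parse('! ("NN" | "NE")'), "ART"))
```

The two show calls print the groupings that the precedence rule prescribes, and the last call evaluates the negated disjunction from the earlier example against the sample value "ART".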
One can also use regular expressions as feature value descriptions. Regular expressions are marked by enclosing slashes /. The syntax of regular expressions in TIGER is compatible with the syntax of regular expressions in Perl 5.003 (cf. [WallEtAl1996]). In our implementation, the following expression types are available:
Single characters
a | the character a |
. | any character |
Character classes
[ace] | any of the characters a, c, e |
[a-z] | any lower-case letter |
[^a-f] | any character except a to f |
Special characters
\s | whitespace (space, tab, return) |
\d | digit (0-9) |
Alternatives
(abc|de) | the string abc, or de |
Quantifiers
(ab)* | zero or more occurrences of ab (empty string, ab, abab, ...) |
(ab)+ | at least one ab (ab, abab, ...) |
(ab)? | zero or exactly one ab |
(ab){m,n} | from m to n occurrences of ab |
Grouping
ab+ | a followed by at least one b (ab, abb, abbb, ...) |
(ab)+ | at least one ab (ab, abab, ...) |
Please note: In our notation, /x/ means /^x$/ in Perl notation.
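The whole-value matching described in the note above can be imitated in Python with re.fullmatch. This is a minimal sketch under the assumption that Python's re module behaves like the TIGER regular expression engine for these simple patterns. The helper name tiger_match and the sample word are hypothetical.

```python
import re


def tiger_match(pattern, value):
    """Approximate a TIGER /pattern/ test: the whole value must match."""
    return re.fullmatch(pattern, value) is not None


# An unanchored search finds "spiel" inside the word ...
print(re.search("spiel", "Beispiel") is not None)  # True
# ... but a whole-value match fails, because /spiel/ behaves like /^spiel$/.
print(tiger_match("spiel", "Beispiel"))            # False
# To match words that merely end in "spiel", the prefix must be spelled out.
print(tiger_match(".*spiel", "Beispiel"))          # True
```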
The following example means 'find words which start with spiel':
[word = /spiel.*/]
With the following query, one can locate the words das and der, irrespective of capitalization of the first letter:
[lemma = /[dD](as|er)/]
The following query finds words which contain at least one uppercase letter or digit at a non-initial position, e.g. hyphenated compounds and potential abbreviations and product names:
[word = /.+([0-9A-Z])+.*/]
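To see which words the pattern above selects, it can be tried out against a handful of values in Python. This is a rough sketch only: it assumes Python's re.fullmatch is an adequate stand-in for TIGER's whole-value matching, and the sample words and the helper name has_inner_upper_or_digit are invented for illustration.

```python
import re

# The pattern from the query above, without the enclosing slashes.
PATTERN = re.compile(r".+([0-9A-Z])+.*")


def has_inner_upper_or_digit(word):
    """True if an uppercase letter or digit occurs after the first character."""
    return PATTERN.fullmatch(word) is not None


for word in ["US-Dollar", "GmbH", "A4", "Haus", "haus"]:
    print(word, has_inner_upper_or_digit(word))
```

Because .+ must consume at least one character, a word such as Haus, whose only uppercase letter is the initial one, is not selected.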
Please note: There is a difference between . and \. in the context of
regular expressions. The first of the following queries matches all
strings starting with the prefix sagt, whereas the second matches only
strings consisting of sagt followed by any number (possibly zero) of
full stops:
[word = /sagt.*/]
[word = /sagt\.*/]
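The difference between the two queries above can be checked with a small Python sketch. It assumes that Python's re.fullmatch approximates TIGER's whole-value matching for these patterns. The sample values are made up for illustration.

```python
import re

any_suffix = re.compile(r"sagt.*")   # . stands for any character
stops_only = re.compile(r"sagt\.*")  # \. stands for a literal full stop

for word in ["sagt", "sagte", "sagt...", "sagt.a"]:
    # Report, for each value, whether each pattern matches it completely.
    print(
        word,
        any_suffix.fullmatch(word) is not None,
        stops_only.fullmatch(word) is not None,
    )
```

Only sagt and sagt... satisfy the second pattern, while all four values satisfy the first.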
Please note: The TIGER language compiler performs only a rough check of the
syntax of regular expressions. The fine-grained syntax check is
carried out only when a query is evaluated. Syntax errors in a regular
expression may therefore not be reported until evaluation time and can
take more effort to track down.
Regular expressions for feature values should be reserved for 'open-ended' feature values such as the word values. For features with a restricted range of values, such as syntactic categories, the use of types and Boolean expressions is suggested in order to increase readability and processing efficiency. If possible, types should be used instead of Boolean expressions.