3.5 Regular expressions

Regular expressions can also be used as feature value descriptions. They are enclosed in slashes (/). The syntax of regular expressions in TIGER is compatible with the syntax of regular expressions in Perl 5.003 (cf. [WallEtAl1996]). In our implementation, the following expression types are available:

Single characters

a the character a
. any character

Character classes

[ace] any of the characters a, c, e
[a-z] any lower-case letter
[^a-f] any character except a to f

Special characters

\s whitespace (space, tab, return)
\d digit (0-9)

Alternatives

(abc|de) the string abc, or de

Quantifiers

(ab)* zero or more occurrences of ab (empty string, ab, abab, ...)
(ab)+ at least one ab (ab, abab, ...)
(ab)? zero or exactly one ab
(ab){m,n} from m to n occurrences of ab

Grouping

ab+ a followed by at least one b (ab, abb, ...)
(ab)+ at least one ab (ab, abab, ...)
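
The effect of grouping can be checked outside TIGER as well. The following is a minimal sketch in Python, whose re module handles these constructs in the same way as Perl; the helper name matches and the sample strings are assumptions made for illustration and are not part of TIGER. The helper requires the pattern to cover the whole string, which is how TIGER treats feature values.

```python
import re

def matches(pattern: str, value: str) -> bool:
    """Return True if the pattern covers the whole value (hypothetical helper)."""
    return re.fullmatch(pattern, value) is not None

samples = ["a", "ab", "abb", "abab"]

# Without parentheses, + applies only to the preceding character b.
ungrouped = [s for s in samples if matches(r"ab+", s)]

# With parentheses, + applies to the whole group ab.
grouped = [s for s in samples if matches(r"(ab)+", s)]

print(ungrouped)  # ['ab', 'abb']
print(grouped)    # ['ab', 'abab']
```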

Please note: In our notation /x/ means /^x$/ in the Perl notation, i.e. the expression must match the entire feature value and not just a part of it.
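
The difference between anchored and unanchored matching can be sketched in Python. This is only an approximation of TIGER's behaviour: re.search stands in for an unanchored Perl match and re.fullmatch for the implicit /^x$/ anchoring; the sample word is an assumption chosen for illustration.

```python
import re

word = "Beispiel"

# Unanchored search: the pattern may match anywhere inside the string.
unanchored = re.search(r"spiel", word) is not None

# Whole-value match, corresponding to /^spiel$/ in Perl notation.
anchored = re.fullmatch(r"spiel", word) is not None

# To match a value that merely ends in spiel, the prefix must be spelled out.
with_prefix = re.fullmatch(r".*spiel", word) is not None

print(unanchored)   # True
print(anchored)     # False
print(with_prefix)  # True
```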

The following example means 'find words which start with spiel':

[word = /spiel.*/]

With the following query, one can locate the words das and der, irrespective of capitalization of the first letter:

[lemma = /[dD](as|er)/]

The following query finds words that contain at least one uppercase letter or digit at a non-initial position, i.e. hyphenated compounds as well as potential abbreviations and product names:

[word = /.+([0-9A-Z])+.*/]
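
The three example patterns above can be tried on a small word list before running them against a corpus. The sketch below uses Python's re.fullmatch to imitate TIGER's whole-value matching; the word list and the helper name select are assumptions for illustration, and [A-Z] covers only unaccented ASCII capitals here.

```python
import re

# Hypothetical sample values for the word feature.
words = ["spielen", "Spiel", "das", "Der", "die",
         "U-Bahn", "A4", "iPhone", "Haus", "B12"]

def select(pattern, values):
    """Keep the values that the pattern matches in full."""
    return [v for v in values if re.fullmatch(pattern, v)]

starts_with_spiel = select(r"spiel.*", words)
das_or_der = select(r"[dD](as|er)", words)
inner_upper_or_digit = select(r".+([0-9A-Z])+.*", words)

print(starts_with_spiel)     # ['spielen']
print(das_or_der)            # ['das', 'Der']
print(inner_upper_or_digit)  # ['U-Bahn', 'A4', 'iPhone', 'B12']
```

Note that Spiel is not selected by the first pattern because matching is case-sensitive, and Haus is not selected by the third because its only uppercase letter is in initial position.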

Please note: There is a difference between . and \. in the context of regular expressions: an unescaped . matches any character, whereas \. matches only a literal full stop. The first of the following queries denotes all strings starting with the prefix sagt, whereas the second denotes only the string sagt followed by any number of full stops, including none:

[word = /sagt.*/]
[word = /sagt\.*/]
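
The two patterns can be compared on a few sample tokens. This is a sketch in Python, assuming whole-value matching as in TIGER; the token list is invented for illustration.

```python
import re

tokens = ["sagt", "sagte", "sagt.", "sagt...", "sagt.e"]

# Unescaped dot: sagt followed by any characters.
any_suffix = [t for t in tokens if re.fullmatch(r"sagt.*", t)]

# Escaped dot: sagt followed only by full stops (possibly none).
dots_only = [t for t in tokens if re.fullmatch(r"sagt\.*", t)]

print(any_suffix)  # ['sagt', 'sagte', 'sagt.', 'sagt...', 'sagt.e']
print(dots_only)   # ['sagt', 'sagt.', 'sagt...']
```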

Please note: The TIGER language compiler performs only a rough check of the syntax of regular expressions. The fine-grained syntax check is carried out only when a query is evaluated. Syntax errors in a regular expression may therefore not be reported until evaluation and can take more effort to track down.
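
One way to catch such errors earlier is to compile the bare expression in another regex engine before placing it in a query. The following sketch uses Python's re module for this. It is not part of TIGER, the function name check_regex is hypothetical, and Python's dialect is not identical to Perl 5.003, so a pattern that passes here is not guaranteed to be accepted by TIGER (and vice versa). For the basic constructs listed in this section the two agree.

```python
import re

def check_regex(pattern: str):
    """Return None if the pattern compiles, otherwise the error message.

    Hypothetical pre-check; pass the expression without the enclosing slashes.
    """
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None

for candidate in [r"spiel.*", r"(ab", r"[a-"]:
    problem = check_regex(candidate)
    status = "ok" if problem is None else "error: " + problem
    print(candidate, "->", status)
```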