Subsection: Use of variables

7.2 Use of variables

Variables for feature values

Variables for feature values are typically used to express agreement constraints. The following query looks for two adjacent nodes which are labelled with NN or NE.

[pos = #noun] . [pos = #noun:("NN" | "NE")]

Variables for feature constraints

With variables for feature constraints, we can search e.g. for sentences which contain the same preposition (the same word form!), twice:

[#f:(pos="APPR")] .* [#f]

Please note: There is a subtle difference if we used a feature value variable instead. If we only require the identity of the feature value, i.e. of the part-of-speech tag, we get all sentences which contain at least two prepositions (not necessarily the same word form!):

[pos = #v:"APPR"] .* [pos=#v]

Node variables

Node variables are necessary to express multiple node relations with respect to one node, e.g. to list the children of a node like in the example in subsection 7.1:

#np:[cat="NP"] &
#np > [pos="ADJA"] &
#np > [pos="NN"]

Node (in)equality

Two nodes variables #n1 and #n2 may match the same node in the corpus. If this causes problems, the inequality of two node variables can be enforced e.g. by adding the following subformula which requires the variables #n1 and #n2 to match distinct nodes (due to the irreflexivity of the precedence relation):

((#n1 .* #n2) | (#n2 .* #n1))

In the case your corpus contains unary transitions (nonterminal nodes with one single nonterminal daughter), you should use a weaker constraint for node inequality:

((#n1 .* #n2) | (#n2 .* #n1)) | ((#n1 >* #n2) | (#n2 >* #n1))