7. Graph descriptions

7.1 Boolean expressions

Graph descriptions or graph constraints are (restricted) Boolean expressions over node relations and node descriptions. Currently, conjunction & and disjunction | are available as logical connectives. For example, with the help of the &-operator, the following node relations can be joined into a graph constraint which retrieves the tree shown below.

Figure: A simple syntax graph

(#n1 >SB #n2) &
(#n1 >HD #n3) &
(#n2 >NK #n4) &
(#n2 >NK #n5)

Parentheses can be omitted in the usual fashion:

#n1 >SB #n2 &
#n1 >HD #n3 &
#n2 >NK #n4 &
#n2 >NK #n5
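
To make the meaning of the conjunction concrete, the following Python sketch evaluates the four node relations against a toy graph by brute force. It is only an illustration, not how the query processor works internally; the node names (s, np, v, art, noun) and the helper matches are assumptions made for this example.

```python
from itertools import product

# Toy syntax graph as labelled edges (parent, label, child).
# The node names are hypothetical and chosen only for this illustration.
EDGES = {
    ("s", "SB", "np"),
    ("s", "HD", "v"),
    ("np", "NK", "art"),
    ("np", "NK", "noun"),
}
NODES = sorted({node for parent, _, child in EDGES for node in (parent, child)})

# The four conjuncts of the query: (parent variable, edge label, child variable).
CONSTRAINTS = [
    ("n1", "SB", "n2"),
    ("n1", "HD", "n3"),
    ("n2", "NK", "n4"),
    ("n2", "NK", "n5"),
]
VARIABLES = ["n1", "n2", "n3", "n4", "n5"]


def matches(edges, nodes):
    """Return every variable binding that satisfies all conjuncts at once."""
    found = []
    for combo in product(nodes, repeat=len(VARIABLES)):
        binding = dict(zip(VARIABLES, combo))
        if all((binding[p], label, binding[c]) in edges
               for p, label, c in CONSTRAINTS):
            found.append(binding)
    return found


RESULTS = matches(EDGES, NODES)
print(len(RESULTS))  # 4
```

In this sketch four bindings satisfy the constraint. Two of them bind #n4 and #n5 to the same node, because nothing in the conjunction forces the two variables apart.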

The operator precedence, from strongest to weakest binding, is defined as follows: relation operators, &, |. This definition is illustrated by the following examples:

Example           Interpretation
#v > #w & #x      (#v > #w) & #x
#v & #w | #x      (#v & #w) | #x
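
The grouping implied by this precedence order can be reproduced with a small precedence-climbing routine. The Python sketch below is a minimal illustration under stated assumptions: only > stands in for the relation operators, operands are plain variables, and the function name parenthesize is hypothetical. It is not the parser of the query tool.

```python
import re

# Binding strength: a higher number binds tighter. Only ">" stands in for the
# relation operators here; this is a simplification made for the sketch.
PRECEDENCE = {"|": 1, "&": 2, ">": 3}
TOKEN = re.compile(r"#\w+|[>&|]")


def parenthesize(query):
    """Return the query with the implicit grouping of its operators made explicit."""
    tokens = TOKEN.findall(query)
    pos = 0

    def parse(min_prec):
        nonlocal pos
        left = tokens[pos]  # operand: a variable such as #v
        pos += 1
        # Consume operators that bind at least as tightly as min_prec.
        while pos < len(tokens) and PRECEDENCE.get(tokens[pos], 0) >= min_prec:
            op = tokens[pos]
            pos += 1
            right = parse(PRECEDENCE[op] + 1)  # +1 makes operators left-associative
            left = f"({left} {op} {right})"
        return left

    return parse(1)


print(parenthesize("#v > #w & #x"))  # ((#v > #w) & #x)
print(parenthesize("#v & #w | #x"))  # ((#v & #w) | #x)
```

Apart from the additional outermost pair of parentheses, the output agrees with the interpretations listed above.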

7.2 Use of variables

Variables for feature values

Variables for feature values are typically used to express agreement constraints. The following query looks for two adjacent nodes which carry the same part-of-speech tag, where this shared tag must be either NN or NE.

[pos = #noun] . [pos = #noun:("NN" | "NE")]

Variables for feature constraints

With variables for feature constraints, we can search, for example, for sentences which contain the same preposition (the same word form!) twice:

[#f:(pos="APPR")] .* [#f]

Please note: There is a subtle difference when a feature value variable is used instead. If we only require the identity of the feature value, i.e. of the part-of-speech tag, we get all sentences which contain at least two prepositions (not necessarily the same word form!):

[pos = #v:"APPR"] .* [pos=#v]
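
The difference between the two kinds of variables can be sketched in Python. The sketch assumes, in line with the description above, that sharing a feature constraint variable amounts to both tokens having identical feature records (here reduced to word and pos), whereas sharing a feature value variable only requires one feature value to agree. The toy sentence and the function names are made up for this illustration.

```python
from itertools import combinations

# Toy sentence: each token is a feature record. Words and tags are illustrative.
SENTENCE = [
    {"word": "in", "pos": "APPR"},
    {"word": "Haus", "pos": "NN"},
    {"word": "mit", "pos": "APPR"},
    {"word": "Garten", "pos": "NN"},
    {"word": "in", "pos": "APPR"},
]


def same_value_pairs(tokens, feature, value):
    """Pairs (i, j) with i < j whose `feature` has the given value on both tokens."""
    return [(i, j) for i, j in combinations(range(len(tokens)), 2)
            if tokens[i][feature] == value and tokens[j][feature] == value]


def same_record_pairs(tokens, feature, value):
    """Like same_value_pairs, but the two feature records must also be identical.

    Assumption of this sketch: this models a shared feature constraint variable.
    """
    return [(i, j) for i, j in same_value_pairs(tokens, feature, value)
            if tokens[i] == tokens[j]]


VALUE_PAIRS = same_value_pairs(SENTENCE, "pos", "APPR")
RECORD_PAIRS = same_record_pairs(SENTENCE, "pos", "APPR")
print(VALUE_PAIRS)   # [(0, 2), (0, 4), (2, 4)]
print(RECORD_PAIRS)  # [(0, 4)]
```

Every pair of prepositions satisfies the value-based reading, but only the two occurrences of the same word form satisfy the record-based reading.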

Node variables

Node variables are necessary to express multiple node relations with respect to one node, e.g. to list the children of a node as in the example in subsection 7.1:

#np:[cat="NP"] &
#np > [pos="ADJA"] &
#np > [pos="NN"]

Node (in)equality

Two node variables #n1 and #n2 may match the same node in the corpus. If this causes problems, the inequality of the two node variables can be enforced, e.g. by adding the following subformula. Because the precedence relation is irreflexive, it requires the variables #n1 and #n2 to match distinct nodes:

((#n1 .* #n2) | (#n2 .* #n1))

If your corpus contains unary transitions (nonterminal nodes with one single nonterminal daughter), you should use a weaker constraint for node inequality, which also accepts two nodes when one of them dominates the other:

((#n1 .* #n2) | (#n2 .* #n1)) | ((#n1 >* #n2) | (#n2 >* #n1))
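
The following Python sketch illustrates why the dominance disjuncts matter for unary transitions. It rests on an assumption that is not stated in this section: precedence between two nodes is decided by comparing their leftmost leaves, so a mother and her only daughter never precede one another. The toy tree and all function names are hypothetical.

```python
# Toy tree as child lists; vp -> np is a unary transition. Names are hypothetical.
CHILDREN = {"vp": ["np"], "np": ["t0", "t1"]}
POSITION = {"t0": 0, "t1": 1}  # string positions of the leaves


def leaves(node):
    """All leaves at or below the node."""
    if node not in CHILDREN:
        return [node]
    return [leaf for child in CHILDREN[node] for leaf in leaves(child)]


def precedes(a, b):
    """Assumed reading of a .* b: the leftmost leaf of a comes before that of b."""
    return min(POSITION[x] for x in leaves(a)) < min(POSITION[x] for x in leaves(b))


def dominates(a, b):
    """Reading of a >* b used here: b lies strictly below a."""
    return any(c == b or dominates(c, b) for c in CHILDREN.get(a, []))


def distinct_by_precedence(a, b):
    """First inequality formula: one node precedes the other."""
    return precedes(a, b) or precedes(b, a)


def distinct_weak(a, b):
    """Second inequality formula: precedence or dominance in either direction."""
    return distinct_by_precedence(a, b) or dominates(a, b) or dominates(b, a)


print(distinct_by_precedence("t0", "t1"))  # True
print(distinct_by_precedence("vp", "np"))  # False, although the nodes differ
print(distinct_weak("vp", "np"))           # True
print(distinct_weak("np", "np"))           # False
```

Under this assumption the precedence-only formula misses the pair vp and np, while the extended formula accepts it and still rejects a node paired with itself.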

7.3 Graph predicates

In principle, the operators introduced so far are sufficient to describe syntax graphs. For reasons of convenience, and to a certain extent for reasons of completeness, we have added so-called graph predicates, e.g. to designate the root of a graph.

Root predicate

The root of a graph (for a whole sentence) can be identified by the predicate root.

root(#n1)

Arity predicates

The following graph description describes all graphs which contain a certain node #n1 with at least two children #n2 and #n3:

(#n1 > #n2) & (#n1 > #n3)

However, one may want to state that there must be exactly two children. For this reason, we introduce a two-place operator arity which restricts the number of children of a node #n1, e.g. to exactly two children:

(#n1 > #n2) & (#n1 > #n3) & arity(#n1,2)

The arity predicate can also take three arguments in order to specify an interval for the number of children, e.g. from two to four children:

(#n1 > #n2) & (#n1 > #n3) & arity(#n1,2,4)

Similarly, there is a tokenarity operator which constrains the number of leaves dominated by a node. The first example below requires node #n1 to dominate exactly 5 terminal nodes; the second requires #n1 to dominate between 5 and 7 leaves.

tokenarity(#n1,5)
tokenarity(#n1,5,7)
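
As a rough illustration of the arity and tokenarity predicates, the Python sketch below counts children and dominated leaves in a toy tree. The tree, the helper token_count and the convention that a leaf counts as one token are assumptions of this sketch, not part of the query language definition.

```python
# Toy tree as child lists; nodes without an entry are leaves. Names are hypothetical.
CHILDREN = {
    "s": ["np", "vp"],
    "np": ["t0", "t1"],
    "vp": ["t2", "pp"],
    "pp": ["t3", "t4"],
}


def arity(node, low, high=None):
    """True if the node has exactly `low` children, or between low and high."""
    high = low if high is None else high
    return low <= len(CHILDREN.get(node, [])) <= high


def token_count(node):
    """Number of leaves at or below the node (a leaf counts as one token here)."""
    kids = CHILDREN.get(node)
    if not kids:
        return 1
    return sum(token_count(kid) for kid in kids)


def tokenarity(node, low, high=None):
    """True if the node dominates exactly `low` leaves, or between low and high."""
    high = low if high is None else high
    return low <= token_count(node) <= high


print(arity("s", 2))             # True
print(arity("vp", 2, 4))         # True
print(tokenarity("s", 5))        # True
print(tokenarity("vp", 5, 7))    # False: vp dominates three leaves
```

The optional third argument mirrors the two forms of the predicates: two arguments fix an exact number, three arguments give a closed interval.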

Continuity predicates

It may be useful to state whether or not the leaves which are dominated by a node form a continuous string. For this purpose, the two unary operators continuous and discontinuous have been introduced:

continuous(#n1)
discontinuous(#n1)
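
The continuity test can be sketched in Python as a check on the string positions of the dominated leaves. The toy tree, in which vp dominates the first and third token while the second token is attached higher up, and the helper names are assumptions made for this illustration.

```python
# Toy tree with a discontinuous node: vp covers positions 0 and 2,
# while t1 (position 1) is attached directly to s. Names are hypothetical.
CHILDREN = {"s": ["vp", "t1"], "vp": ["t0", "t2"]}
POSITION = {"t0": 0, "t1": 1, "t2": 2}


def leaf_positions(node):
    """Sorted string positions of all leaves at or below the node."""
    if node not in CHILDREN:
        return [POSITION[node]]
    return sorted(p for child in CHILDREN[node] for p in leaf_positions(child))


def continuous(node):
    """True if the dominated leaves form an unbroken stretch of the string."""
    positions = leaf_positions(node)
    return positions[-1] - positions[0] + 1 == len(positions)


def discontinuous(node):
    """True if there is at least one gap among the dominated leaves."""
    return not continuous(node)


print(continuous("s"))      # True: positions 0, 1, 2
print(continuous("vp"))     # False: positions 0 and 2, gap at 1
print(discontinuous("vp"))  # True
```

The check relies on every leaf being reachable only once from a given node, which holds for the tree-shaped toy data used here.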