6. Node relations

6.1 Elementary node relations

Since syntax graphs are two-dimensional objects, we need two operators to express the spatial relations. We simply take the nomenclature from linguistics, and call the vertical dimension the dominance relation and the horizontal dimension the precedence relation.

Labelled direct dominance

The symbol > means direct dominance. It is further specified by an edge label, for example HD. This means that labelled direct dominance is expressed by e.g. >HD. The constraint that an edge which is labelled HD leads from node #n1 to node #n3 (or that #n3 is a direct HD-successor of #n1) is written as follows:

#n1 >HD #n3

As a more comprehensive example, the following labelled dominance constraints encode the vertical dimension of the tree in the graph presented below. Note that labelled dominance is a relation among nodes, not a function like it is the case for feature structures. This means that there may be more than one edge with the same label leading out of one mother node, cf. the NK-edges in the presented graph.

#n1 >SB #n2
#n1 >HD #n3
#n2 >NK #n4
#n2 >NK #n5

Figure: A simple syntax graph

On the basis of the directed edges, which are defined by the dominance relation, the nodes of a syntax graph in the corpus annotation are classified in the following manner:

Inner nodes (nonterminal nodes)

Nodes with outgoing edges, i.e. nodes with children, are called inner nodes or nonterminal nodes. In the presented figure only the nodes #n1 and #n2 are inner nodes.

Leaf nodes (terminal nodes)

Nodes without successors are named leaf nodes or terminal nodes. For example, the nodes #n4, #n5, and #n3 are the leaves of the tree in the presented figure.

Direct precedence of leaf nodes

A syntax graph is not only defined by constraining the vertical arrangement of its nodes, but also by the horizontal order of its leaf nodes, i.e. by the precedence relation among the leaves. We use the . symbol to represent direct precedence, since it serves as the concatenation operator in some programming languages. Now, the horizontal dimension of the tree in the presented figure is determined by the two precedence constraints:

#n4 . #n5
#n5 . #n3

6.2 Derived node relations

In queries, one wants to refer to indeterminate portions of a graph. Therefore, generalized notions of dominance and precedence ('wildcards') are necessary. Furthermore, certain convenient abbreviations should be introduced like a sibling operator.

Dominance

Dominance in general means that there is a path from one node to another one via a connected series of direct dominance relations. The distance of a dominance relation is the length of the path between the two nodes, i.e. the number of direct dominance relations to be traversed. The minimum distance is 1, i.e. in TIGERSearch, a node does not dominate itself.

We introduce the following generalizations of the labelled direct dominance relation >L:

> unlabelled direct dominance
>* dominance (minimum distance 1)
>n dominance with distance n (n>0)
>m,n dominance with distance between m and n (0<m<n)
>@l leftmost terminal successor ('left corner')
>@r rightmost terminal successor ('right corner')
!>L negated labelled direct dominance
!> negated unlabelled direct dominance
!>* negated dominance
!>n negated dominance, distance n
!>m,n negated dominance, distance m...n (0<m<n)
!>@l negated left corner
!>@r negated right corner

For unlabelled edges, the label is considered to be unspecified, i.e. an unlabelled edge can match any edge.

The left corner resp. the right corner relation is reflexive, i.e. the leftmost resp. rightmost successor of a leaf node is the leaf node itself.

For the example tree in subsection 6.1, the following relations hold:

#n1 >* #n4
#n1 >2 #n4
#n1 >@l #n4
#n4 >@l #n4

Concerning the negated node relations, remember that variables are bound existentially at the outermost formula level. For example, the following query has to be read as 'find a syntax graph with an NP node and an NE node which are not connected by an NK-edge.'

[cat="NP"] !>NK [pos="NE"]

Precedence

So far, we have only talked about the precedence of leaf nodes. Whereas, for syntax trees, the precedence relation among its inner nodes is obvious, the situation is not clear at all for syntax graphs, which admit crossing edges. For example, the following two figures are different visualizations of the same syntax graph wrt. the arrangement of the nodes #n2 and #n3.

Figure: Precedence of inner nodes, first layout

Figure: Precedence of inner nodes, second layout

There are several alternatives to define the precedence of inner nodes. One could adopt the strict precedence definition of proper trees: A node #n1 precedes a node #n2 if all nodes dominated by #n1 precede all nodes dominated by #n2 (cf. [SteinerKallmeyer2002]), i.e. if the right corner of #n1 precedes the left corner of #n2. Based on this strict definition, there would be no precedence relation at all between the nodes #n2 and #n3 in the first syntax graph shown above.

We decided for a somewhat weaker definition, which admits precedence statements even for overlapping graphs: A node #n1 is said to precede another node #n2 if the left corner of #n1 precedes the left corner of #n2 (left corner precedence).

According to the left corner based definition, node #n2 precedes node #n3 in the first syntax graph shown above. The user may define his own version of precedence as a template (cf. section 9), e.g. with the help of the right corner operator >@r.

The distance n between two non-identical leaf nodes is the number of leaf nodes between these two nodes increased by 1. For example, the distance between two direct neighbour leaves is 1. The distance between two inner nodes #x and #y is the distance between their respective left corners. Two nodes which share the same left corner, do not precede each other. In particular, a node does not precede itself. For the precedence relation, there are the derived operators listed below:

.* precedence (minimum distance 1)
.n precedence with distance n (n>0)
.m,n precedence with distance between m and n (0>m>n)
!.* negated precedence
!.n negated precedence, distance n
!.m,n negated precedence, distance m...n

Sibling relation

Two nodes #n1 and #n2 are siblings if they are directly dominated by the same node #n0, i.e. if they have the same mother node #n0. We use the following writing conventions:

$ siblings
$.* siblings with precedence
!$ negated siblings

We did not include the negated $.* relation since it seems rather counter-intuitive.

6.3 Secondary edges

Although, in the kernel of the TIGER language, a node must not be dominated by more than one node, we have added so-called secondary edges as an extra layer in order to allow for structure sharing in syntax graphs. A secondary edge defines an additional parent node for a given node.

>~L labelled secondary edge
>~ secondary edge
!>~L negated labelled secondary edge
!>~ negated secondary edge