The Verbmobil Interface Term (VIT) is one of the surviving "inventions" of Verbmobil-1. The same holds for the ADT library package that handles it. This report is an update of Verbmobil report 104 [Dorna (1996)], where you may find the first part of the story.
The VIT is used as a uniform data structure at the interfaces between several software components of the Verbmobil system. It is an encoding of different linguistically motivated pieces of information produced and used in these components.
The contents of a VIT correspond to a segmented utterance in a dialogue turn. This partitioning of turns enables the linguistic components to work incrementally.
The main contents of a VIT are semantic representations. In addition, information such as morpho-syntax, syntactic tense, semantic sorts, scope, prosody, and the analyzed surface string is also part of a VIT. This information is linked to the semantics and can be used for computing semantic tense, for disambiguating underspecified analyses, for guiding semantic evaluation such as anaphora resolution, for determining adjacency or linear precedence, and for many other purposes.
In a large project like Verbmobil, data abstraction is an important basis for the parallel development of different components which must communicate with each other in the end. Hence, there are also many arguments for the VIT from a software engineering perspective. Among them are:
The contents of a VIT are filled into the following slots:
| Slot Name | Description |
|---|---|
| VIT ID | combines a unique tag for a turn segment described by the current VIT and the word lattice path used in its linguistic analysis; |
| Index | a triple consisting of the entry points for traversing the VIT representation; |
| Conditions | labelled conditions describing the possibly underspecified semantic content of an utterance; |
| Constraints | scope and grouping constraints, e.g. used for underspecified quantifier and operator scope representation; |
| Sorts | sortal specifications for instance variables introduced in labelled conditions; |
| Discourse | additional semantic and pragmatic information, e.g. discourse roles for individual instances; |
| Syntax | morpho-syntactic features, e.g. number and gender of individual instances; |
| Tense and Aspect | morpho-syntactic tense combined with aspect and sentence mood information, e.g. used for computing surface tense; |
| Prosody | prosodic information such as accenting and sentence mood. |
Apart from "VIT ID" and "Index", all slots are encoded as variable-free lists of terms. Lists are used because they are very easy to manipulate. In typical AI languages such as Lisp and Prolog they are built in, and they can be ported easily to other programming languages. In general, the list elements do not introduce any further recursive embedding, i.e. the elements are like fixed records with fields containing constants. This minimally recursive representation was chosen for efficient information access.
The information within the individual slots is connected to information in other slots by identical constants, which can be seen as bidirectional links. These constants (we call them holes, labels and instances) can be interpreted as skolemized logical variables, each denoting a node in a graph. Sharing a constant in this way combines pieces of information that differ in nature or source and are therefore stored in different slots of a VIT.
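The linking by shared constants can be illustrated with a minimal sketch, assuming SWI-Prolog. The slot contents and the term names (`arrive/2`, `arg1/3`, `person/2`, `sort_of/2`, `num/2`, `gend/2`) as well as the helper predicates `mentions/2` and `linked_info/3` are hypothetical and not part of the ADT package. The sketch collects every term, in any slot, that mentions a given constant:

```prolog
:- use_module(library(lists)).

% Hypothetical slot contents; all term names are illustrative only.
example_slots([ conditions([arrive(l1, i1), arg1(l2, i1, i2), person(l3, i2)]),
                sorts([sort_of(i1, event), sort_of(i2, human)]),
                syntax([num(i2, sg), gend(i2, fem)]) ]).

% mentions(+Term, +Const): Const occurs as an argument of the flat Term.
mentions(Term, Const) :-
    Term =.. [_ | Args],
    memberchk(Const, Args).

% linked_info(+Const, +Slots, -Found): Found is a list of SlotName-Term
% pairs, one for every term in any slot that shares the constant Const.
linked_info(Const, Slots, Found) :-
    findall(Name-Term,
            ( member(Slot, Slots),
              Slot =.. [Name, Terms],
              member(Term, Terms),
              mentions(Term, Const) ),
            Found).
```

With these assumed data, the query `example_slots(S), linked_info(i2, S, F)` gathers the role and predicate conditions, the sort, and the morpho-syntactic features of the instance `i2` from three different slots. Because the terms are flat, a single pass over each list is enough.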
vitID(SegmentID, WHGString)
SegmentID is a unique marker for each VIT. The Prolog encoding looks like
sid(TurnNo, Channel, SourceLanguage, BeginTime, EndTime, Reading, CurrentLanguage, TurnEnd, Sender)
For example, sid(104,a,ge,7002,7199,1,jp,y,transfer) is a segment identifier with the string representation t104ageb7002e7199r1jpytransfer.
WHGString is a Prolog list which contains edges of a word lattice. This sequence of edges describes the surface string selected by a syntactic/semantic component for producing the current VIT. An edge can have one of the following forms:
word(String, WID, ListOfLs)
filler(String, WID)
String is the label of the edge and WID is a unique edge
identifier. Both String and WID are Prolog atoms. If
String carries linguistic information, the word/3
representation is used. For more technical information such as pauses,
noise, etc., filler/2 is used.
The list of labels ListOfLs links the edges to some labelled
conditions in the Conditions slot. If there is more than one label in
a single list, a word was decomposed into several lexemes in semantic
analysis. If there are identical labels in different lists, the
corresponding words form one labelled condition together. Hence, this
encoding reflects an n-to-m mapping between syntax and semantics
during linguistic analysis.
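The VIT ID encoding can be sketched as follows, assuming SWI-Prolog (`atomic_list_concat/2` is not ISO Prolog). The helper names `sid_atom/2`, `words_for_label/3` and `surface_string/2` are hypothetical and not exported by the ADT package. The prefixes `t`, `b`, `e` and `r` are read off the example string given above, and the sample word lattice path is invented for illustration:

```prolog
:- use_module(library(lists)).

% sid_atom(+Sid, -Atom): concatenate the fields of a sid/9 term, using the
% prefixes t, b, e and r as they appear in the example string in the text.
sid_atom(sid(Turn, Chan, SrcLang, Begin, End, Reading, CurLang, TurnEnd, Sender),
         Atom) :-
    atomic_list_concat([t, Turn, Chan, SrcLang, b, Begin, e, End,
                        r, Reading, CurLang, TurnEnd, Sender], Atom).

% Hypothetical WHGString: two words share the label l1 (one condition),
% one word carries two labels (decomposed into two lexemes).
example_whg([ word(guten, w1, [l1]),
              word(tag,   w2, [l1]),
              filler('<pause>', w3),
              word(zum,   w4, [l2, l3]) ]).

% words_for_label(+Label, +WHG, -Strings): strings of all word/3 edges
% whose label list contains Label; filler/2 edges never match.
words_for_label(Label, WHG, Strings) :-
    findall(S,
            ( member(word(S, _WID, Ls), WHG),
              memberchk(Label, Ls) ),
            Strings).

% surface_string(+WHG, -Strings): labels of the word/3 edges in path order.
surface_string(WHG, Strings) :-
    findall(S, member(word(S, _, _), WHG), Strings).
```

For the segment identifier from the text, `sid_atom(sid(104,a,ge,7002,7199,1,jp,y,transfer), A)` yields `A = t104ageb7002e7199r1jpytransfer`. With the invented path, `words_for_label(l1, WHG, Ws)` returns `[guten, tag]`, which shows the many-words-to-one-condition direction of the n-to-m mapping.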
index(TopHole,MainLabel,MainInst)
where the hole TopHole, the label MainLabel, and the
instance MainInst are the entry points for traversing a VIT.
Functor(Label, Arg2, ..., Argx)
Terms of this form are called labelled conditions. The semantic entities are, e.g., predicates, roles, operators, and quantifiers. The first argument, sometimes called the base label, is always a unique identifier for such an object. The semantic variables for labels and markers, such as events, states and individuals, are skolemized with special constant symbols, e.g. l1 for a label and i1 for a state.
The labeling of semantic conditions is very useful since the recursive embedding of argument structure, operator scope, etc. is no longer syntactically represented in a recursive data structure but achieved through the interpretation of additional labeling constraints. In this respect, label arguments act as pointers to the corresponding arguments. Additionally, all these special constants can be seen as pointers for adding or linking information within and between multiple slots of the VIT.
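How label arguments act as pointers can be sketched as follows, assuming SWI-Prolog. The conditions `neg/2`, `come/2` and `arg1/3` and the predicates `condition_with_label/3`, `expand/3` and `expand_arg/3` are hypothetical. The sketch deliberately ignores holes and scope constraints and assumes that the label references are acyclic; it only shows that a nested structure can be recovered from a flat list by following labels:

```prolog
:- use_module(library(lists)).
:- use_module(library(apply)).

% Hypothetical conditions: neg(l1, l2) takes the label l2 as its argument,
% pointing at the condition come(l2, i1) instead of embedding it.
example_conditions([neg(l1, l2), come(l2, i1), arg1(l3, i1, i2)]).

% condition_with_label(+Label, +Conds, -Cond): Cond is a condition whose
% first argument (its base label) is Label.
condition_with_label(Label, Conds, Cond) :-
    member(Cond, Conds),
    arg(1, Cond, Label).

% expand(+Label, +Conds, -Tree): rebuild the nested term below Label by
% following label arguments; the base label itself is dropped from Tree.
expand(Label, Conds, Tree) :-
    condition_with_label(Label, Conds, Cond),
    Cond =.. [F, Label | Args],
    maplist(expand_arg(Conds), Args, NewArgs),
    Tree =.. [F | NewArgs].

% expand_arg(+Conds, +Arg, -Tree): an argument that is the base label of
% some condition is expanded; any other constant is copied unchanged.
expand_arg(Conds, Arg, Tree) :-
    condition_with_label(Arg, Conds, _), !,
    expand(Arg, Conds, Tree).
expand_arg(_, Arg, Arg).
```

With the assumed conditions, `example_conditions(C), expand(l1, C, T)` gives `T = neg(come(i1))`. The flat list stays cheap to search and modify, while the embedding is available on demand through the labels.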
The set of all possible conditions for one language is defined by its
lexicon. The ADT package supports on-line lexicons for arbitrary
languages. See Appendix
for the expected format and for
hooks to extend a language specific lexicon. Examples of such lexicons
can be found in vitSemLex.pl for English, German, and
Japanese.
The argument values of the terms above and of the predicates described
in the next sections are given in the following table:
The output of each component dealing with VITs has to ensure the
following properties:
For describing the predicates of the ADT package we use the "standard" notation for call patterns (also known as mode information):
The ADT package is realized as a Prolog module named vitADT
which exports the predicates described in the following sections.
For further remarks on the usage see Appendix
.
The rest of this documentation is organized as follows. In
Section
we present the predicates for constructing a
new VIT and filling it with information. In Section
and
Section
we outline predicates for information access and
those for deleting information, respectively. Section
describes predicates for checking the contents of a VIT. In
Section
we show predicates for printing a VIT.
Miscellaneous predicates are described in Section
.
Appendix
explains how to get and install the ADT
package and Appendix
shows how to use it.
Appendix
introduces the term conversion package
atom2term which comes with the library package distribution.
Finally, Appendix
sketches briefly the contents and
possible extensions of the on-line dictionaries contained in
vitSemLex.pl and vitValues.pl.
For each predicate presented in this document we give the call pattern(s) and a brief description sometimes including an example.
Michael Dorna , VM Report 238, 5/18/2000