Introduction

The VIT Library Package

The Verbmobil Interface Term (VIT) is one of the "inventions" of Verbmobil-1 that is still in use. The same holds for the ADT library package which handles it. No more needs to be said. Well, this report is an update of Verbmobil Report 104 [Dorna (1996)], where you may find the first part of the story.

The Verbmobil Interface Term

The VIT is used as a uniform data structure at the interfaces between several software components of the Verbmobil system. It is an encoding of different linguistically motivated pieces of information produced and used in these components.

The contents of a VIT correspond to a segmented utterance in a dialogue turn. This partitioning of turns enables the linguistic components to work incrementally.

The main contents of a VIT are semantic representations. In addition, information such as morpho-syntax, syntactic tense, semantic sorts, scope, prosody, and the analyzed surface string is part of a VIT. This information is linked to the semantics and can be used for computing semantic tense, for disambiguating underspecified analyses, for guiding semantic evaluation such as anaphora resolution, for determining adjacency or linear precedence, and for many other purposes.

In a large project like Verbmobil, data abstraction is an important basis for the parallel development of different components that must eventually communicate with each other. Hence, there are also many arguments for the VIT from a software engineering perspective. Among them are:

Multiple Levels of Information

The contents of a VIT are filled into the following slots:

Slot Name         Description
VIT ID            combines a unique tag for a turn segment described by the current VIT and the word lattice path used in its linguistic analysis;
Index             a triple consisting of the entry points for traversing the VIT representation;
Conditions        labelled conditions describing the possibly underspecified semantic content of an utterance;
Constraints       scope and grouping constraints, e.g. used for underspecified quantifier and operator scope representation;
Sorts             sortal specifications for instance variables introduced in labelled conditions;
Discourse         additional semantic and pragmatic information, e.g. discourse roles for individual instances;
Syntax            morpho-syntactic features, e.g. number and gender of individual instances;
Tense and Aspect  morpho-syntactic tense combined with aspect and sentence mood information, e.g. used for computing surface tense;
Prosody           prosodic information such as accenting and sentence mood.

Apart from "VIT ID" and "Index", all slots are encoded as variable-free lists of terms. Lists are used because they are very easy to manipulate: in typical AI languages such as Lisp and Prolog they are built in, and they can be ported easily to other programming languages. In general, the list elements do not introduce any further recursive embedding, i.e. the elements are like fixed records with fields containing constants. This minimally recursive representation was chosen for efficient information access.

The information in one slot is connected to information in other slots by identical constants, which can be seen as bidirectional links. These constants (we call them holes, labels and instances) can be interpreted as skolemized logical variables, each denoting a node in a graph. Such sharing combines information of different nature or source, which is therefore stored in different slots of a VIT.
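The linking by shared constants can be illustrated with a small sketch. It assumes SWI-Prolog; the slot contents, the condition functors (treffen, arg1, pron) and the predicate names mentions/2 and linked_info/3 are invented for illustration and are not part of the ADT package.

```prolog
:- use_module(library(lists)).

% Hypothetical slot contents for one utterance. The term shapes follow
% this report; the concrete functors and values are invented.
example_slots([ 'Conditions'-[treffen(l1,i1), arg1(l2,i1,i2), pron(l3,i2)],
                'Sorts'-[s_sort(i2,human)],
                'Syntax'-[num(i2,sg), pers(i2,1)],
                'Tense and Aspect'-[ta_tense(i1,pres)] ]).

% mentions(+Term, +Const): Const occurs as an argument of Term.
mentions(Term, Const) :-
    Term =.. [_|Args],
    memberchk(Const, Args).

% linked_info(+Slots, +Const, -Pairs): all SlotName-Term pairs whose
% term mentions Const, i.e. everything linked to that constant.
linked_info(Slots, Const, Pairs) :-
    findall(Name-Term,
            ( member(Name-Terms, Slots),
              member(Term, Terms),
              mentions(Term, Const) ),
            Pairs).
```

Querying `example_slots(S), linked_info(S, i2, Ps)` collects the terms from the Conditions, Sorts and Syntax slots that all talk about the instance i2, without any recursive embedding in the data itself.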

Contents of VIT Slots

This section describes the information which can be found in the slots of a VIT. First we list the terms, and then we explain the possible argument bindings.

VIT ID

The "VIT ID" slot is filled with a VIT identifier of the form
vitID(SegmentID, WHGString)
SegmentID is a unique marker for each VIT. Its Prolog encoding is
sid(TurnNo, Channel, SourceLanguage, BeginTime, EndTime,
    Reading, CurrentLanguage, TurnEnd, Sender)
For example, sid(104,a,ge,7002,7199,1,jp,y,transfer) is a segment identifier whose string representation is t104ageb7002e7199r1jpytransfer.
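The relation between the sid/9 term and its string form can be sketched as follows. This is a minimal sketch, assuming SWI-Prolog (atomic_list_concat/2 is not ISO) and assuming the layout that the single example suggests: the marker letters t, b, e and r interleaved with the fields in argument order. The predicate name sid_string/2 is hypothetical and not part of the ADT package.

```prolog
% sid_string(+Sid, -Atom): build the string form of a segment identifier.
% ASSUMPTION: the layout is inferred from the one example in the text.
sid_string(sid(TurnNo, Channel, SrcLang, Begin, End,
               Reading, CurLang, TurnEnd, Sender),
           Atom) :-
    atomic_list_concat([t, TurnNo, Channel, SrcLang,
                        b, Begin, e, End,
                        r, Reading, CurLang, TurnEnd, Sender],
                       Atom).
```

With this definition, `sid_string(sid(104,a,ge,7002,7199,1,jp,y,transfer), A)` binds A to the atom t104ageb7002e7199r1jpytransfer.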

WHGString is a Prolog list which contains edges of a word lattice. This sequence of edges describes the surface string selected by a syntactic/semantic component for producing the current VIT. An edge has one of the following two forms:

word(String, WID, ListOfLs)
filler(String, WID)
String is the label of the edge and WID is a unique edge identifier. Both String and WID are Prolog atoms. If String carries linguistic information, the word/3 representation is used. For more technical information such as pauses, noise, etc., filler/2 is used.

The list of labels ListOfLs links the edges to some labelled conditions in the Conditions slot. If there is more than one label in a single list, a word was decomposed into several lexemes in semantic analysis. If there are identical labels in different lists, the corresponding words form one labelled condition together. Hence, this encoding reflects an n-to-m mapping between syntax and semantics during linguistic analysis.
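The n-to-m mapping can be made concrete with a small sketch, assuming SWI-Prolog. The word lattice below is invented: two words share the label l1 (they form one condition together), a filler edge carries no labels, and one word is decomposed into the two labels l2 and l3. The predicate names are hypothetical and not part of the ADT package.

```prolog
:- use_module(library(lists)).

% Hypothetical WHGString; the edge contents are invented for illustration.
example_whg([ word(guten,      w1, [l1]),
              word(tag,        w2, [l1]),
              filler('<pause>', w3),
              word(zum,        w4, [l2,l3]) ]).

% labels_of_word(+WHG, +WID, -Labels): labels linked to the edge WID.
labels_of_word(WHG, WID, Labels) :-
    memberchk(word(_, WID, Labels), WHG).

% words_of_label(+WHG, +Label, -Strings): all words contributing to Label.
words_of_label(WHG, Label, Strings) :-
    findall(String,
            ( member(word(String, _, Labels), WHG),
              memberchk(Label, Labels) ),
            Strings).
```

Here `words_of_label(WHG, l1, Ws)` yields the two words guten and tag, while `labels_of_word(WHG, w4, Ls)` yields the two labels of the decomposed word.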

Index

The "Index" slot is filled with a term of the form
index(TopHole,MainLabel,MainInst)
where the hole TopHole, the label MainLabel, and the instance MainInst are the entry points for traversing a VIT.

Conditions

The contents of the "Conditions" slot are language-specific. In general, the slot contains terms of the form
Functor(Label, Arg2, ..., Argx)
called labelled conditions. The semantic entities are, e.g., predicates, roles, operators, and quantifiers. The first argument - sometimes called base label - is always a unique identifier for such an object. The semantic variables for labels and markers, such as events, states and individuals, are skolemized with special constant symbols, e.g. l1 for a label and i1 for a state.

The labeling of semantic conditions is very useful since the recursive embedding of argument structure, operator scope, etc. is no longer syntactically represented in a recursive data structure but achieved through the interpretation of additional labeling constraints. In this respect, label arguments act as pointers to the corresponding arguments. Additionally, all these special constants can be seen as pointers for adding or linking information within and between multiple slots of the VIT.
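How labels act as pointers can be sketched as follows, assuming SWI-Prolog. The conditions neg/2 and kommen/2 are invented, leq/2 is read here as "the label is at or below the hole", and the predicate names are hypothetical rather than part of the ADT package.

```prolog
:- use_module(library(lists)).

% Hypothetical flat encoding: a negation operator whose scope is the
% hole h1, and a predicate that is constrained to lie below that hole.
example_conditions([neg(l1, h1), kommen(l2, i1)]).
example_constraints([leq(l2, h1)]).

% condition_at(+Conditions, ?Label, ?Condition): the first argument of a
% labelled condition is its label, so a label works like a pointer.
condition_at(Conditions, Label, Condition) :-
    member(Condition, Conditions),
    arg(1, Condition, Label).

% below_hole(+Conditions, +Constraints, +Hole, -Found): the conditions
% whose labels are constrained to be at or below Hole.
below_hole(Conditions, Constraints, Hole, Found) :-
    findall(Condition,
            ( member(leq(Label, Hole), Constraints),
              condition_at(Conditions, Label, Condition) ),
            Found).
```

The embedding of kommen under neg is never written as a nested term; it is recovered by following h1 through the constraint list to l2.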

The set of all possible conditions for one language is defined by its lexicon. The ADT package supports on-line lexicons for arbitrary languages. See the appendix on the on-line dictionaries for the expected format and for hooks to extend a language-specific lexicon. Examples of such lexicons can be found in vitSemLex.pl for English, German, and Japanese.


Other Slots

  The information located in the rest of the slots is given in the following table:

Info                             Description                          Slot Name
ana_ante(Inst,ListOfIs)          anaphoric antecedents                Discourse
c_class(Label,CClass)            coordination type                    Discourse
cas(Inst, Case)                  syntactic case                       Syntax
dialog_act(Label, DialogAct)     dialogue act information             Discourse
dialog_phase(DialogPhase)        dialogue phase                       Discourse
disc_honor(Inst,Inst)            honorifics (for Japanese)            Discourse
disc_role(Inst,DiscRole)         discourse role                       Discourse
disc_stat(Label,Status)          discourse status                     Discourse
discourse_function(L,DiscFunc)   pragmatic discourse relation         Sorts
dir(Label, YesNo)                (non)directional preposition         Discourse
eq(Label,LabelOrHole)            equality constraint                  Constraints
gend(Inst, Gender)               morpho-syntactic gender              Syntax
in_g(Label, GroupLabel)          label is in group                    Constraints
leq(Label,Hole)                  scope/subordination constraint       Constraints
nom_ante(Inst,ListOfIs)          anaphoric reference                  Discourse
num(Inst, Number)                morpho-syntactic number              Syntax
pcase(Label,Inst,Atom)           subcategorized preposition           Syntax
pers(Inst, Person)               person                               Syntax
prontype(I, PRef, PType)         type of a pronoun                    Discourse
pros_accent(Label)               prosodic accent                      Prosody
pros_boundary(Label)             prosodic (b3) marker                 Prosody
pros_emph(Label)                 emphasis marking                     Prosody
pros_mood(Label, PMood)          prosodic mood                        Prosody
pros_probab(Label, Prob)         prosodic accent probability          Prosody
ref_ante(Inst, Inst)             anaphora resolution                  Discourse
rel_ante(Inst,ListOfIs)          anaphoric reference                  Discourse
s_class(Label, SClass)           disambiguation class                 Sorts
s_sort(Inst, Sort)               sortal restriction                   Sorts
sem_focus(Label, LabelOrHole)    focus resolution                     Constraints
subj_honor(Inst,Inst,Honor)      subject honorific marker (Japanese)  Syntax
syn_cat(Label,Atom)              syntactic category                   Syntax
ta_aspect(Inst, Aspect)          aspectual information                Tense and Aspect
ta_mood(Inst, TMood)             mood                                 Tense and Aspect
ta_perf(Inst, Perf)              perfect                              Tense and Aspect
ta_tense(Inst, Tense)            surface tense                        Tense and Aspect
topic(Label,GroupLabel,Topic)    topic information                    Discourse
unbound(LabelOrInstOrHole)       unbound markers                      Discourse

The argument values of the terms above and of the predicates described in the next sections are given in the following table:

Argument         Description or list of values
Argx             Hole, Inst, Label, Atom, or a Prolog list of the named Args
ArgType          np, acomp, comp, xcomp
Aspect           progr, nonprogr (plus prospective, terminative for Japanese only)
Atom             Prolog atomic
BeginTime        unsigned integer
Case             ;(Case,Case), nom, gen, dat, acc, noc(ase)
Check            ground, shape
Channel          a, b, c, d, e, f, g, s, t
Class            ambig, disc, grad, mod, mood, noun, quant, verb
CurrentLanguage  see Language
CClass           m, n, q, s, s
DialogAct        ;(DialogAct,DialogAct), ~(DialogAct), accept, backchannel, bye, clarify, close, commit, confirm, control_dialogue, defer, deliberate, deviate_scenario, dialogue_act, digress, exclude, explained_reject, feedback, feedback_negative, feedback_positive, give_reason, greet, inform, init, introduce, manage_task, offer, politeness_formula, promote_task, refer_to_setting, reject, request, request_clarify, request_comment, request_commit, request_suggest, signal_non_understanding, suggest, thank
DialogPhase      hello, opening, negotation, closing, good_bye
DiscFunc         attitude, check, coherence, edit, emphasize, exemplify, hesitate, indifferent, known, negative, nonpragmatic, pop, positive, prefer, push, repair, revised, smoothen, structure, surprise, uptake
DiscRole         he, sp
EndTime          unsigned integer
Functor          Prolog functor
Gender           ;(Gender,Gender), fem, masc, neut
GramFunc         subj, obj, obj2, nof(unction)
Hole or H        Prolog atom of the form h[a-z][0-9]*
Honor            minus, plus
Info             any valid term encoding information in a VIT
Inst or I        Prolog atom of the form i[a-z][0-9]*
Label or L       Prolog atom of the form l[a-z][0-9]*
Language         de, en, ge, jp
ListOfLs         Prolog list of Label elements
ListOfIs         Prolog list of Inst elements
TMood            ind, conj, imp
Number           ;(Number,Number), sg, pl
Perf             perf, nonperf
Person           ;(Person,Person), 1, 2, 3
PRef             sp, he, sp_he, third, third_he, top
Prob             an integer i with $0 \leq i \leq 100$
PType            refl, std, refl_std, recip, imp, event, event_std, demon, demon_event, zero (and intersent for Japanese)
PMood            decl, prog, quest
Reading          integer
SemRole          arg1, arg2, arg3, refl
Sender           Prolog atom which is the name of a Verbmobil component
String           Prolog atom encoding a LaTeX string
SClass           sa, fp, mp, dra
SMood            decl, imp, quest, decl_quest, decl_imp, imp_quest, decl_imp_quest
SlotName         'Conditions', 'Constraints', 'Tense and Aspect', 'Index', 'Prosody', 'Discourse', 'Sorts', 'VIT ID', 'Syntax'
Sort             ;(Sort,Sort) or &(Sort,Sort) or ~(Sort) or abstract, action_sit, agentive, animal, anything, communicat_sit, entity, field, food, geo_location, human, info_bearer, info_content, institution, instrument, location, meeting_sit, mental_sit, move_sit, nongeo_location, object, position_sit, property, situation, space_time, substance, symbol, temp_sit, temporal, thing, time, vehicle
SourceLanguage   see Language
Status           old, new, inferred_old, contrast_new
Stream           a valid Prolog stream object
Tense            pres, past, future
Topic            fr, mo, vf, wa
TurnEnd          n, y
TurnNr           unsigned integer
VIT              Verbmobil Interface Term (not necessarily ground)
YesNo            yes, no, yesno
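
Several value sets in the table (Case, Gender, Number, Person) admit disjunctions written ;(Value,Value). A recursive check for such a value can be sketched as follows, using Number as the example. The predicate name valid_number/1 is hypothetical and not part of the ADT package, and the sketch assumes that disjunctions may nest.

```prolog
% valid_number(+Value): Value is sg, pl, or a disjunction of valid
% Number values. Intended to be called with a ground argument.
valid_number(sg).
valid_number(pl).
valid_number((Left ; Right)) :-
    valid_number(Left),
    valid_number(Right).
```

For instance, `valid_number((sg;pl))` succeeds, whereas `valid_number(dual)` fails because dual is not among the listed values.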
   

The Prolog Implementation

In Prolog, the VIT is implemented as a term of arity 9 named vit. The name and arity of this realization are of no concern as long as access to a VIT is always handled by the ADT package (or a similar abstraction). In that case, we are not restricted to this implementation in the future. As already pointed out, the information is in general encoded in lists whose elements are terms. This data structure makes it easy to add and remove information, i.e. terms.
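Why an access layer hides the concrete term can be shown with a toy abstraction. This is a sketch only: the argument order of vit/9 is an assumption taken from the order of the slot table (the text does not state it), and demo_new_vit/3, demo_slot/3 and demo_add/4 are hypothetical names, not the predicates exported by vitADT.

```prolog
% slot_position(?SlotName, ?Position): ASSUMED argument order of vit/9.
slot_position('VIT ID',           1).
slot_position('Index',            2).
slot_position('Conditions',       3).
slot_position('Constraints',      4).
slot_position('Sorts',            5).
slot_position('Discourse',        6).
slot_position('Syntax',           7).
slot_position('Tense and Aspect', 8).
slot_position('Prosody',          9).

% demo_new_vit(+VitID, +Index, -VIT): a VIT whose list slots are empty.
demo_new_vit(VitID, Index, vit(VitID, Index, [], [], [], [], [], [], [])).

% demo_slot(+SlotName, +VIT, -Value): read one slot by name.
demo_slot(SlotName, VIT, Value) :-
    slot_position(SlotName, N),
    arg(N, VIT, Value).

% demo_add(+SlotName, +Info, +VIT0, -VIT): add Info to a list slot.
demo_add(SlotName, Info, VIT0, VIT) :-
    slot_position(SlotName, N),
    N > 2,                               % slots 1 and 2 are not lists
    VIT0 =.. [Functor|Args0],
    replace_nth(N, Args0, Old, [Info|Old], Args),
    VIT =.. [Functor|Args].

% replace_nth(+N, +List0, -Old, +New, -List): swap the N-th element.
replace_nth(1, [Old|Rest], Old, New, [New|Rest]).
replace_nth(N, [Head|Rest0], Old, New, [Head|Rest]) :-
    N > 1,
    N1 is N - 1,
    replace_nth(N1, Rest0, Old, New, Rest).
```

Client code that only calls the three demo_ predicates never mentions vit/9 or its arity, so only slot_position/2 and demo_new_vit/3 would need to change if the representation did.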

The output of each component dealing with VITs has to ensure the following properties:

For describing the predicates of the ADT package we use the "standard" notation for call patterns (also known as mode information):

+  the argument is expected to be instantiated (not necessarily ground) and will not be changed during processing of the called predicate;

-  the argument is expected to be a variable and will be bound during processing of the called predicate;

?  the argument can be instantiated when calling the predicate and/or will be bound during processing of the called predicate.
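
As an illustration of this notation, the following sketch annotates two small helper predicates with their call patterns. Both predicates are hypothetical and not part of the ADT package; library(lists) is assumed to provide member/2, as in SWI-Prolog.

```prolog
:- use_module(library(lists)).

% condition_label(+Condition, -Label)
% Condition must be instantiated; Label is bound to its first argument.
condition_label(Condition, Label) :-
    arg(1, Condition, Label).

% slot_info(+SlotList, ?Info)
% Info may be fully instantiated (a membership check) or only partly
% instantiated (a lookup that binds the remaining variables).
slot_info(SlotList, Info) :-
    member(Info, SlotList).
```

Calling `slot_info([num(i1,sg), gend(i1,fem)], num(i1, N))` uses the "?" argument as a lookup and binds N to sg, whereas `slot_info([num(i1,sg), gend(i1,fem)], gend(i1,fem))` uses it as a pure check.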

The ADT package is realized as a Prolog module named vitADT which exports the predicates described in the following sections. For further remarks on its usage, see the appendix on using the package.

Overview

The rest of this documentation is organized as follows. First, we present the predicates for constructing a new VIT and filling it with information. We then outline the predicates for information access and those for deleting information. A further section covers predicates for checking the contents of a VIT, followed by the predicates for printing a VIT. Miscellaneous predicates are described last.

The appendices explain how to get and install the ADT package and how to use it, and introduce the term conversion package atom2term which comes with the library package distribution. Finally, the last appendix briefly sketches the contents and possible extensions of the on-line dictionaries contained in vitSemLex.pl and vitValues.pl.

For each predicate presented in this document we give the call pattern(s) and a brief description sometimes including an example.



Michael Dorna, VM Report 238, 5/18/2000