An in-depth analysis of spoken dialogue data typically requires close inspection of multiple levels of description, whereas, e.g., various kinds of data exploitation require access to only one particular level. MATE has looked at both the dimension of individual levels and the cross-level dimension. The individual levels addressed by MATE include prosody, morpho-syntax, dialogue acts, co-reference, and communication problems. These levels, as well as cross-level issues, have been addressed within a single framework to ensure a common approach across levels. This framework makes it easier for the annotator to move from one level to another and facilitates the use of the same set of software tools and the same interface look and feel, independently of the level in question. All of the levels addressed by MATE were found to work within the proposed framework. It is therefore likely that the framework will also work for other levels that follow the same approach.
This report is primarily aimed at people working in the area of markup
of spoken dialogue corpora. It builds on a common standard framework in
terms of a coding module (see below) at the conceptual level and an underlying
representation in XML at the implementational level. For each level considered
by MATE, recommendations are provided on how to encode relevant phenomena,
one or more best practice coding modules are provided, and several examples
are given. This should make it easy for a person from the target group
to apply the coding modules for markup and to design his/her own coding
module for a level following the MATE approach. The MATE workbench, a software
tool also developed in the MATE project to support annotation work,
comes with all the coding modules presented in this report built in,
and provides general support for adding and applying new coding
modules.
| # | Item | Example |
| 1. | Name of the module, including an acronym. | Verbmobil dialogue acts [VM-DA]. |
| 2. | Coding purpose of the module. | To code task-specific dialogue acts for Task T7. |
| 3. | Coding level. | Dialogue acts. |
| 4. | The type of data source scoped by the module. | Spoken dialogue corpora |
| 5. | References to other modules, if any. For transcriptions, the reference is to a resource. | Orthographic transcription module OTM2 + Prosody module PM3 + Semantics module SM5. |
| 6. | A declaration of the markup elements and their attributes. An element is a feature, or type of phenomenon, in the corpus for which a tag is being defined. | - |
| 7. | A supplementary informal description of the elements and their attributes, including: a. Purpose of the element, its attributes, and their values. | - |
| 8. | An example of the use of the elements and their attributes. | - |
| 9. | A coding procedure. | - |
| 10. | Creation notes. | - |
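As an illustration only, the ten items of a coding module could be held in a small record type. The sketch below is not part of MATE or the MATE workbench: the class name `CodingModule`, its field names, and the helper `missing_items` are hypothetical, and the values are taken from the example column of the table above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CodingModule:
    """Hypothetical container mirroring the ten items of a coding module."""
    name: str                 # item 1: name of the module
    acronym: str              # item 1: acronym
    purpose: str              # item 2: coding purpose
    level: str                # item 3: coding level
    data_source: str          # item 4: type of data source
    references: List[str] = field(default_factory=list)           # item 5
    elements: Dict[str, List[str]] = field(default_factory=dict)  # item 6: element -> attributes
    description: str = ""     # item 7: informal description
    example: str = ""         # item 8: example of use
    procedure: str = ""       # item 9: coding procedure
    creation: str = ""        # item 10: creation information

    def missing_items(self) -> List[str]:
        """Return the names of fields that have not been filled in yet."""
        return [key for key, value in vars(self).items() if not value]


# Values copied from the example column of the table; items 6-10 are left empty
# because the table gives no example for them.
vm_da = CodingModule(
    name="Verbmobil dialogue acts",
    acronym="VM-DA",
    purpose="To code task-specific dialogue acts for Task T7.",
    level="Dialogue acts",
    data_source="Spoken dialogue corpora",
    references=["OTM2", "PM3", "SM5"],
)
print(vm_da.missing_items())
```

A check such as `missing_items` is one way a tool might flag an incomplete module before it is applied to a corpus.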
The descriptions given in this document allow a complete separation from the underlying machine representation, for which MATE uses XML. The separation means that, in principle, one could decide to use formats other than XML at the implementational level without affecting the coding module in any way.
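The following minimal sketch illustrates this separation under stated assumptions: one coded unit is described in a format-neutral way and then written out both as XML and as JSON. The element name `da`, its attributes `id` and `type`, and all function names are hypothetical and are not taken from any MATE coding module. JSON stands in here for "some other format".

```python
import json
import xml.etree.ElementTree as ET

# Format-neutral description of one coded unit (all names are hypothetical).
annotation = {
    "element": "da",
    "attributes": {"id": "da_1", "type": "greet"},
    "text": "hello",
}


def to_xml(a: dict) -> str:
    """Serialize the annotation as a single XML element."""
    el = ET.Element(a["element"], a["attributes"])
    el.text = a["text"]
    return ET.tostring(el, encoding="unicode")


def from_xml(s: str) -> dict:
    """Read the XML form back into the format-neutral description."""
    el = ET.fromstring(s)
    return {"element": el.tag, "attributes": dict(el.attrib), "text": el.text}


def to_json(a: dict) -> str:
    """Serialize the same annotation as JSON instead of XML."""
    return json.dumps(a, sort_keys=True)


xml_form = to_xml(annotation)
json_form = to_json(annotation)

# Both representations carry the same information as the neutral description.
assert from_xml(xml_form) == annotation
assert json.loads(json_form) == annotation
```

The coded information is the same in both outputs. Only the serialization functions would need to change if the implementational format changed.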
In this document recommendations will be made that rely on a given markup language, XML, that has already found broad support. This is an important factor as the availability of parsers and other software enhances the integration of this proposal into existing environments.
Aiming at maximal user-friendliness, descriptive appropriateness, integration of information, reuse, and computational efficiency necessarily requires technical description. The focus of this document is therefore not only a discussion of the theoretical concepts of the five levels, but also a proposal for the annotation of the phenomena at these levels, their relations, and the encoding of that information.
The guidelines and the markup schemes proposed in this document are
intended to be flexible. The general framework described above was applied
to the levels investigated for the MATE project. The level-specific coding
modules or coding schemes developed are designed to follow the markup
framework and thereby exemplify its usability. As these coding schemes
are examples, they certainly reflect level-specific theoretical considerations,
but this document does not claim that the markup schemes proposed
for individual linguistic levels are the only theoretical conceptualizations
of the levels that can be encoded in this markup framework. Rather,
they have been developed to give the community best practice annotation
schemes for the individual levels, cf. D1.1, which can be adapted to special
needs by refinement or language-specific additions, or which can be seen as
templates for other levels of description.
Chapter 2, "Level specific markup", contains the level-by-level description of the markup for the individual levels covered in MATE. The beginning of Chapter 2 outlines the internal structure shared by the level descriptions, which has been used to make them as compatible as possible.
Chapter 2 and its sub-sections are relevant for people who want to
Chapter 3 "Annexes" provides information relevant for the implementation
of the approach described. It is intended for people who need to