name: Edited Transcription (ET)
coding purpose: to code disfluency phenomena in speech
coding level: Morphosyntax
data sources: spoken corpora
module references: orthographic transcription module
description: seg elements, together with four refining elements, are used to annotate disfluency phenomena. seg elements are all-purpose elements that mark disfluent portions of dialogue and, when present in context, their repairs. The attribute type identifies the kind of disfluency found in the annotated segment: an interruption, a non-standard use, an omission, or a completion of a previous utterance. The attribute rep lets the annotator give the target or standard form of a non-standard usage, and the attribute ins lets the annotator insert missing elements. seg elements convey the basic, obligatory information. This information can be refined through recommended and optional elements, namely dys, reparandum, signal and repair, which refer to seg elements through inline href links. dys elements specify the type of relationship between two seg elements that mark contiguous disfluencies; their type attribute can be used to further define the type of disfluency. The elements reparandum, signal and repair support a more detailed analysis of the components of a disfluency: by referring to and qualifying seg elements, they specify which previously identified seg element is repaired, which element signals that a repairing sequence is about to be uttered, and which element corresponds to the repair in the strict sense.
example:
<seg id="seg_001" type="broken" href="orth.xml#id(w_001)..id(w_002)"/>
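The href value in the example above addresses a span of elements in another file. As a minimal sketch, assuming that links always take the form file#id(a) or file#id(a)..id(b) (a pattern inferred from the examples in this document, not a stated rule), such a link could be split into its parts as follows. The names HrefTarget and parse_href are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class HrefTarget(NamedTuple):
    """Parts of an inline link (hypothetical helper type)."""
    file: str
    start_id: str
    end_id: Optional[str]  # None when the link points at a single element


# Assumed link syntax: file#id(a) or file#id(a)..id(b).
# Surrounding whitespace is tolerated.
_HREF_RE = re.compile(r"^\s*([^#\s]+)#id\(([^)]+)\)(?:\.\.id\(([^)]+)\))?\s*$")


def parse_href(href: str) -> HrefTarget:
    """Split a link into target file, first id and optional last id."""
    match = _HREF_RE.match(href)
    if match is None:
        raise ValueError(f"unrecognised href: {href!r}")
    file, start_id, end_id = match.groups()
    return HrefTarget(file, start_id, end_id)


target = parse_href("orth.xml#id(w_001)..id(w_002)")
print(target.file, target.start_id, target.end_id)
```

A link to a single element, such as mword.xml#id(mw_001), yields an end_id of None under this sketch.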
markup declaration:
ELEMENT edit_file (seg+, dys+)
ELEMENT seg
ATTRIBUTES:
type (broken | sic | gap | scomp | ocomp)
rep TEXT
ins TEXT
ID
HREF
Recommended extensions to the core scheme:
ELEMENT dys (repair?, reparandum?, signal?)
ATTRIBUTES:
type TEXT
ID
HREF
Optional extensions:
ELEMENT reparandum
ATTRIBUTES:
ID
HREF
ELEMENT signal
ATTRIBUTES:
ID
HREF
ELEMENT repair
ATTRIBUTES:
ID
HREF
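The declaration of seg can be checked mechanically. The sketch below is one possible check, not part of the scheme: it verifies that a type value, when present, belongs to the declared list, and it assumes that id and href are required on every seg element, which the declaration does not state explicitly. The function name check_seg is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

# Values taken from the seg declaration above.
SEG_TYPES = {"broken", "sic", "gap", "scomp", "ocomp"}


def check_seg(seg: ET.Element) -> List[str]:
    """Return a list of problems found in one seg element (empty if none)."""
    problems = []
    if seg.tag != "seg":
        problems.append(f"expected seg, found {seg.tag}")
    seg_type = seg.get("type")
    if seg_type is not None and seg_type not in SEG_TYPES:
        problems.append(f"unknown type: {seg_type!r}")
    # Assumption: id and href are treated as required attributes.
    for name in ("id", "href"):
        if not seg.get(name):
            problems.append(f"missing attribute: {name}")
    return problems


seg = ET.fromstring(
    '<seg id="seg_001" type="broken" href="orth.xml#id(w_001)..id(w_002)"/>'
)
print(check_seg(seg))
```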
coding procedure: Encode by coder 1. Check by coder 2.
creation notes:
Morphosyntactic Annotation Coding Module
name: Morphosyntactic Annotation
coding purpose: identification of morphological words, annotation of part-of-speech categories, annotation of morpho-syntactic features, annotation of interrupted words, annotation of clitics, annotation of compound words, annotation of derivational morphology.
coding level: Morphosyntax
data sources: spoken or written corpora
module references: orthographic transcription module
description: six elements are used to annotate morphological analysis. mw elements identify morphological words. The attribute type is mandatory and specifies the part-of-speech category of an item; in this implementation, it encodes EAGLES-conformant part-of-speech categories. The attribute subtype is optional and may be used to specify additional morphosyntactic features associated with words; in this implementation, it conveys EAGLES-conformant recommended morphosyntactic values. The attribute lemma specifies the lemma of the item in question. An optional attribute broken serves to annotate word partials. cpw elements are used to annotate compounds; their attributes are the same as those of mw elements. A cpw_h element marks the semantic head of a compound. Three elements, namely stem, prefix and suffix, are used to annotate derivational morphology.
example:
markup declaration:
ELEMENT mw (lexit*, stem*, suffix*, prefix*)
ATTRIBUTES
type (N|V|AJ|PD|AT|AV|AP|C|NU|I|U|R|F|DM|PU)
lemma TEXT
subtype TEXT
broken (Y|N)
ID
HREF
ELEMENT cpw (cpw_h?)
ATTRIBUTES
type (N|V|AJ|PD|AT|AV|AP|C|NU|I|U|R|F|DM|PU)
subtype TEXT
broken (Y|N)
ID
HREF
ELEMENT cpw_h
ATTRIBUTES
ID
HREF
ELEMENT stem
ATTRIBUTES
type (N|V)
ID
HREF
ELEMENT suffix
ATTRIBUTES
ID
HREF
ELEMENT prefix
ATTRIBUTES
ID
HREF
The following element is used when a reference lexicon in XML format is available:
ELEMENT lexit
ATTRIBUTES
ID
HREF
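As an illustration of the mw declaration, the sketch below builds a single mw element and rejects values outside the declared lists for type and broken. It is a sketch under assumptions: the helper name make_mw, the id values, and the choice of orth.xml as link target are hypothetical, and subtype is passed through unchecked because its values are not listed in this document.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Values taken from the mw declaration above.
MW_TYPES = {"N", "V", "AJ", "PD", "AT", "AV", "AP", "C",
            "NU", "I", "U", "R", "F", "DM", "PU"}


def make_mw(mw_id: str, pos: str, href: str,
            lemma: Optional[str] = None,
            subtype: Optional[str] = None,
            broken: Optional[str] = None) -> ET.Element:
    """Build an mw element; type is mandatory, the other attributes optional."""
    if pos not in MW_TYPES:
        raise ValueError(f"type not in declared list: {pos!r}")
    if broken is not None and broken not in ("Y", "N"):
        raise ValueError(f"broken must be Y or N, got {broken!r}")
    mw = ET.Element("mw", {"id": mw_id, "type": pos, "href": href})
    # Optional attributes are written only when a value is supplied.
    for name, value in (("lemma", lemma), ("subtype", subtype), ("broken", broken)):
        if value is not None:
            mw.set(name, value)
    return mw


# Hypothetical ids and link target, for illustration only.
mw = make_mw("mw_004", "V", "orth.xml#id(w_004)", lemma="help")
print(ET.tostring(mw, encoding="unicode"))
```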
coding procedure: morphological annotation is almost always performed automatically. Manual checking is recommended.
creation notes:
Chunking Coding Module
name: Chunking
coding purpose: to code syntactic structure in terms of labelled entities corresponding to chunks. Each chunk is further analyzed for its internal structure.
coding level: Morphosyntax
data sources: spoken or written corpora
module references: morphosyntactic annotation module
description: seven elements are used to annotate syntactic analysis. ch elements identify a sequence of adjacent word tokens that are mutually related through dependency links (i.e., a chunk). Two attributes describe chunks: type is mandatory and encodes the syntactic category to which a given chunk belongs; broken is optional and serves to annotate chunk partials. potgov elements identify “potential governors”, namely the lexical heads of chunks. aux, cop, intro, modal and caus elements specify, respectively, the auxiliary verb, the copula, the introducer or preposition, the modal auxiliary verb and the causative verb in a chunk, if applicable.
example:
given this input…:
<mw id="mw_001"> hello </mw>
<mw id="mw_002"> can </mw>
<mw id="mw_003"> I </mw>
<mw id="mw_004"> help </mw>
<mw id="mw_005"> you </mw>
…the following annotation is built:
<ch id="ch_001" type="ADV" href="mword.xml#id(mw_001)">
<potgov id="p_001" href="mword.xml#id(mw_001)"/>
</ch>
<ch id="ch_002" type="FV" href="mword.xml#id(mw_002)">
<potgov id="p_002" href="mword.xml#id(mw_002)"/>
</ch>
<ch id="ch_003" type="N" href="mword.xml#id(mw_003)">
<potgov id="p_003" href="mword.xml#id(mw_003)"/>
</ch>
<ch id="ch_004" type="FV" href="mword.xml#id(mw_004)">
<potgov id="p_004" href="mword.xml#id(mw_004)"/>
</ch>
<ch id="ch_005" type="N" href="mword.xml#id(mw_005)">
<potgov id="p_005" href="mword.xml#id(mw_005)"/>
</ch>
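The annotation above follows a regular pattern: one ch element per word, each containing a potgov that points at the same mw element. A minimal sketch of how such output might be generated is given below. It covers only single-word chunks as in the example, takes the chunk types as hand-supplied input because no chunker is specified here, and uses a hypothetical function name, build_chunks.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def build_chunks(words: List[Tuple[str, str]],
                 source: str = "mword.xml") -> List[ET.Element]:
    """Build one single-word ch element per (mw id, chunk type) pair."""
    chunks = []
    for n, (mw_id, chunk_type) in enumerate(words, start=1):
        href = f"{source}#id({mw_id})"
        ch = ET.Element("ch", {"id": f"ch_{n:03d}", "type": chunk_type, "href": href})
        # In a single-word chunk the only word is also the potential governor.
        ET.SubElement(ch, "potgov", {"id": f"p_{n:03d}", "href": href})
        chunks.append(ch)
    return chunks


# Chunk types copied from the example above.
words = [("mw_001", "ADV"), ("mw_002", "FV"), ("mw_003", "N"),
         ("mw_004", "FV"), ("mw_005", "N")]
for ch in build_chunks(words):
    print(ET.tostring(ch, encoding="unicode"))
```

Chunks that span several words would need a link covering more than one mw element, which this sketch does not attempt.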
markup declaration:
ELEMENT ch (potgov, aux?, cop?, intro?, modal?, caus?)
ATTRIBUTES
type (ADJ|PA|ADV|SUBORD|N|P|FV|G|I|PART|Di|ADJ_PART|COORD|U)
broken (Y | N)
ID
HREF
ELEMENT potgov
ATTRIBUTES
ID
HREF
ELEMENT aux
ATTRIBUTES
ID
HREF
ELEMENT cop
ATTRIBUTES
ID
HREF
ELEMENT intro
ATTRIBUTES
ID
HREF
ELEMENT modal
ATTRIBUTES
ID
HREF
ELEMENT caus
ATTRIBUTES
ID
HREF
coding procedure: Chunking can be performed either automatically or manually. In the first case, manual checking of the chunker output is recommended. In the second case, the standard practice is sufficient (i.e., encode by coder 1, check by coder 2).
creation notes:
Functional Annotation Coding Module
name: Functional Annotation
coding purpose: to encode the functional analysis of data, that is, to provide information about how grammatical relations such as subject, object and indirect object are instantiated in context.
coding level: Morphosyntax
data sources: spoken or written corpora
module references: morphosyntactic annotation module
description: Encoding is carried out by means of funct elements, which point to lexical tokens only indirectly.
example:
<funct id="funct_001" >
markup declaration:
ELEMENT funct (head, dep+)
ATTRIBUTES:
ID
HREF
ELEMENT head
ATTRIBUTES:
head TEXT
diath (active|passive|middle)
person (1|2|3)
number (sg|pl)
gender (m|f|n)
v_type (impers)
ID
HREF
ELEMENT dep
ATTRIBUTES:
type (subj|dobj|obj2|iobj|mod|comp)
intro TEXT
case TEXT
synt_real (n_cl|cl|c|x)
ID
HREF
ELEMENT coord (arg+)
ATTRIBUTES:
type (and|or|comma)
ID
HREF
ELEMENT arg
ATTRIBUTES:
ID
HREF
ELEMENT bind (arg+)
ATTRIBUTES:
ID
HREF
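The content model funct (head, dep+) can be checked with a few lines of code. The sketch below is illustrative only: the sample annotation is invented for the utterance used in the chunking example, its ids and its links to mword.xml are hypothetical, and the check assumes that every dep carries a type attribute, which the declaration does not state. The function name check_funct is likewise hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

# Values taken from the dep declaration above.
DEP_TYPES = {"subj", "dobj", "obj2", "iobj", "mod", "comp"}


def check_funct(funct: ET.Element) -> List[str]:
    """Check a funct element against the content model (head, dep+)."""
    problems = []
    children = [child.tag for child in funct]
    if not children or children[0] != "head":
        problems.append("funct must start with a head element")
    deps = children[1:]
    if not deps or any(tag != "dep" for tag in deps):
        problems.append("head must be followed by one or more dep elements")
    # Assumption: every dep has a type drawn from the declared list.
    for dep in funct.findall("dep"):
        if dep.get("type") not in DEP_TYPES:
            problems.append(f"unknown dep type: {dep.get('type')!r}")
    return problems


# Invented sample; ids and link targets are hypothetical.
SAMPLE = """
<funct id="funct_001">
  <head id="h_001" head="help" diath="active" href="mword.xml#id(mw_004)"/>
  <dep id="d_001" type="subj" href="mword.xml#id(mw_003)"/>
  <dep id="d_002" type="dobj" href="mword.xml#id(mw_005)"/>
</funct>
"""

print(check_funct(ET.fromstring(SAMPLE.strip())))
```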
coding procedure: manual annotation of the functional syntactic analysis of a text is performed through the following steps:
creation notes: