In this section we introduce the core markup distinctions common to all options allowed by the meta-scheme.
4.1 Markup Declaration
The following elements are used in the coreference schemes. As in all other schemes, we use a single element to mark both anaphoric expressions and the NPs that serve as antecedents; the main difference from the MUC-7 scheme and DRAMA is that, following Bruneseaux and Romary (1997) (who, in turn, followed the TEI specification), we have separated the annotation of co-specification from the annotation of discourse entities. We therefore use two main elements: <coref:de>, used to annotate the elements which enter into co-specification relations; and <coref:link>, used for expressing co-specification between discourse entities. This way of annotating relations has the advantage that a discourse entity can be related by links to more than one other discourse entity; this is important because it allows a discourse entity to be related both to an antecedent introduced in the discourse and to an entity in the universe of discourse. In addition, we have elements for specifying objects in the visual situation that can serve as antecedents, and for marking text constituents that introduce elements which participate in anaphoric relations in an indirect way.
Embedded elements: <coref:anchor>
Attributes:
Attributes:
Attributes:
4.2 Description of Elements
Description
The assumption underlying most annotation schemes for coreference is that processing text involves building a discourse model containing discourse entities, and that anaphoric relations are relations between these discourse entities (Webber, 1978; Heim, 1982; Kamp, 1981). We use the <coref:de> tag to annotate the text spans that introduce a discourse entity - that is, that can be subsequently referred to by means of anaphoric expressions. These are commonly noun phrases. Not all noun phrases do this: for example, whereas
John likes Bill
introduces two discourse entities, as can be shown by the fact that a follow-up like
He is crazy
is ambiguous in that he can refer either to John or to Bill, the sentence
John is a policeman
which from a syntactic point of view also contains two NPs, nevertheless
only introduces one discourse entity, as can be seen by the fact that in
this case, the continuation He is crazy is not ambiguous. As a consequence,
the NP a policeman would not get a <coref:de> tag; in
other words, the textual elements given a <coref:de> tag are
a subset of the range of NPs.
Data Source
The annotation for <coref:de>'s should be included in a file with pointers to a base file which has already been XML tagged with information about the structure of the conversation, ideally using TEI coding (http://etext.virginia.edu/TEI.html), suitably converted into XML. A typical dialogue marked up in TEI has a <teiHeader>, <head>, and a <body> which is broken up into utterances (<u>), marked for speaker. Each <pause> is marked. The <u> might be further segmented, for example into prosodic phrases, using the TEI <seg> tags. Gestures and mouse clicks may also be marked, as may notes made by the annotator or the initial transcriber, and more detailed information can be given about pause durations, type of transitions between speakers, and many other features. The French conversation in (4.1), for example (from the Microfusées corpus), might be marked up as in (4.2):
(4.1)
Formateur: Alors donc / vous avez / ici [au milieu de la table] / les
modèles des fusées volé /
[Le formateur dispose le petit paquet de dessins des 9 fusées.]
Mia: Oui
Formateur: Et vous allez essayer de vous mettre d'accord sur un classement
/ hein classer les fusées qui ont bien volé ou qui ont moins
bien volé / [Le formateur montre avec les mains un endroit (bien
volé) puis un autre (moins bien volé).]
Mia: Alors par exemple de celle qui a / le / qui a volé le plus
loin / à à celle qui a volé moins loin(?)
Instructor: OK, then, here you have [in the middle of the table] the
models of the rockets. [The instructor puts down the little packet of 9
rocket designs.]
Mia: Yes
Instructor: And you are going to try to agree on a classification...
to classify the rockets which flew well or which flew less well.. [The
instructor points to one place (those which flew well) then another (those
which flew less well)]
Mia: So for example from the one which.. it.. which flew the furthest...
to the one which flew the least far?
(4.2)
| <u id="u1" who="F">
  <seg id="u1seg1"> Alors donc <pause dur="short"/> vous avez <pause dur="short"/> ici <note place="inline"> au milieu de la table </note> <pause dur="short"/> les modèles des fusées <pause dur="short"/> </seg>
  <note place="outline" type="stage directions"> Le formateur dispose le petit paquet de dessins des 9 fusées. </note>
</u>
<u id="u2" who="M" trans="pause">
  <seg id="u2seg1"> Oui </seg>
</u>
<u id="u3" who="F">
  <seg id="u3seg1"> Et vous allez essayer de vous mettre d'accord sur un classement <pause dur="short"/> </seg>
  <seg id="u3seg2"> hein classer les fusées qui ont bien volé ou qui ont moins bien volé <pause dur="short"/> </seg>
  <note place="outline" type="stage directions"> Le formateur montre avec les mains un endroit (bien volé) puis un autre (moins bien volé). </note>
</u>
<u id="u4" who="M" trans="pause">
  <seg id="u4seg1"> Alors par exemple de celle qui a <pause dur="short"/> le <pause dur="short"/> qui a volé le plus loin <pause dur="short"/> à celle qui a volé moins loin (?) </seg>
</u>

<u id="u1" who="F">
  <seg id="u1seg1"> OK, then, <pause dur="short"/> you have <pause dur="short"/> here <note place="inline"> in the middle of the table </note> <pause dur="short"/> the models of the rockets <pause dur="short"/> </seg>
  <note place="outline" type="stage directions"> The instructor puts down the little packet of 9 rocket designs </note>
</u>
<u id="u2" who="M" trans="pause">
  <seg id="u2seg1"> Yes </seg>
</u>
<u id="u3" who="F">
  <seg id="u3seg1"> And you are going to try to agree on a classification <pause dur="short"/> </seg>
  <seg id="u3seg2"> to classify the rockets which flew well or which flew less well <pause dur="short"/> </seg>
  <note place="outline" type="stage directions"> The instructor points to one place (those which flew well) then another (those which flew less well). </note>
</u>
<u id="u4" who="M" trans="pause">
  <seg id="u4seg1"> So for example from the one which <pause dur="short"/> it <pause dur="short"/> which flew the furthest <pause dur="short"/> to the one which flew the least far (?) </seg>
</u> |
The details of the TEI mark-up may not suit all corpora, depending on the format in which the initial transcription has been presented. For example, in the TRAINS corpus each speaker turn is segmented into a number of different utterances, separated at prosodic phrase boundaries (4.3). This means that the <u> are much shorter than those in true TEI-conformant mark-up, and there is then no TEI tag suitable for grouping the utterances into turns. For the moment, we have adopted the procedure in this case of introducing <turn> tags for a whole turn, and using <u> for each utterance or prosodic phrase:
(4.3)
| 44.1 S: +okay+
44.2 : okay
44.3 : lemme run /
44.4 : lemme make sure I got all this
44.5 : okay
44.6 : you wanna send E2
44.7 : you wanna link
44.8 : uh
44.9 : the boxcar at Elmira to E2
44.10 : and send that to Corning
45.1 M: yeah
46.1 S: and have it load oranges
47.1 M: right
48.1 S: okay |
(4.4)
| <turn id="t44" who="S">
  <u id="u44.1">+okay+</u>
  <u id="u44.2">okay</u>
  <u id="u44.3">lemme run</u>
  <u id="u44.4">lemme make sure I got all this</u>
  <u id="u44.5">okay</u>
  <u id="u44.6">you wanna send E2</u>
  <u id="u44.7">you wanna link</u>
  <u id="u44.8">uh</u>
  <u id="u44.9">the boxcar at Elmira to E2</u>
  <u id="u44.10">and send that to Corning</u>
</turn>
<turn id="t45" who="M">
  <u id="u45.1">yeah</u>
</turn>
<turn id="t46" who="S">
  <u id="u46.1">and have it load oranges</u>
</turn>
<turn id="t47" who="M">
  <u id="u47.1">right</u>
</turn>
<turn id="t48" who="S">
  <u id="u48.1">okay</u>
</turn> |
If one wishes to impose syntactic restrictions on potential markables - which is a good idea for annotation exercises of any complexity - then this basic level must be further annotated with something which allows that constraint to be expressed - word tags, or full syntactic elements, or morpho-syntax tags as defined in the MATE Morpho-syntax scheme (Pirrelli and Soria, 1999). Since different schemes make different choices, the exact data source requirements are left to the individual schemes.
Assignment
The only attributes of <coref:de> that have to be set are id and href, both of which are automatically computed by the MATE workbench, either by making <coref:de> elements match the output of some MATE query on morphosyntactic tagging or by computation from text selected in the coding interface by the human user.
Example
Assuming that chunks with nominal governors are chosen as markables and that the sentence
(4.5) John likes Bill
would get annotated with chunks as follows:
(4.6)
| ch.xml |
| <ch id="ch_001" type="N">
  <potgov id="p_001">John </potgov>
</ch>
<ch id="ch_002" type="V">
  <potgov id="p_002">likes </potgov>
</ch>
<ch id="ch_003" type="N">
  <potgov id="p_003">Bill </potgov>
</ch> |
then the following discourse entities would be annotated:
(4.7)
| coref.xml |
| <coref:de id="de_001" href="ch.xml#id(ch_001)"/>
<coref:de id="de_002" href="ch.xml#id(ch_003)"/> |
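The relation between (4.6) and (4.7) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MATE workbench's own procedure: the wrapping <chunks> root element and the helper names (parse_href, make_discourse_entities, resolve) are hypothetical, and the rule "every chunk of type N becomes a markable" is the assumption made in the example above.

```python
import re
import xml.etree.ElementTree as ET

# Base-level chunk file from (4.6); the <chunks> root is an assumption
# added only so that the fragment parses as a single XML document.
CH_XML = """<chunks>
  <ch id="ch_001" type="N"><potgov id="p_001">John </potgov></ch>
  <ch id="ch_002" type="V"><potgov id="p_002">likes </potgov></ch>
  <ch id="ch_003" type="N"><potgov id="p_003">Bill </potgov></ch>
</chunks>"""

# Pointers in the examples have the shape  file#id(identifier)
HREF_PATTERN = re.compile(r"^(?P<file>[^#]+)#id\((?P<id>[^)]+)\)$")


def parse_href(href):
    """Split a pointer such as 'ch.xml#id(ch_001)' into (file, id)."""
    match = HREF_PATTERN.match(href)
    if match is None:
        raise ValueError(f"not a file#id(...) pointer: {href!r}")
    return match.group("file"), match.group("id")


def make_discourse_entities(ch_root, base_file="ch.xml"):
    """Create one discourse-entity record per nominal chunk (type 'N')."""
    nominal = [ch for ch in ch_root.iter("ch") if ch.get("type") == "N"]
    return [
        {"id": f"de_{n:03d}", "href": f"{base_file}#id({ch.get('id')})"}
        for n, ch in enumerate(nominal, start=1)
    ]


def resolve(href, ch_root):
    """Return the text of the chunk that a pointer refers to."""
    _, target = parse_href(href)
    for ch in ch_root.iter("ch"):
        if ch.get("id") == target:
            return "".join(ch.itertext()).strip()
    raise KeyError(target)


ch_root = ET.fromstring(CH_XML)
entities = make_discourse_entities(ch_root)
```

Here entities holds the two records corresponding to the <coref:de> elements in (4.7), and resolve() maps each href back to the text it annotates.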
Important Note: Since the underlying XML representation is meant to be transparent to the annotator using the MATE tools, in the examples below we have simplified the notation considerably so as to make it easier for non-XML experts to understand the annotation; this would also make it clearer that the meta-scheme does not crucially depend on a particular type of basic level markup. First of all, we give examples in plain text, abstracting away from the chunking level, except in a few cases when this is necessary. Second, instead of representing the markup by means of href pointers as in (4.7), we will adopt a more conventional SGML-style format with tags wrapped around the parts of the text to be annotated with a <coref:de> element, so as to make it clearer to the annotator which part of the text to highlight and to mark; the representation in (4.7) will be automatically constructed by the tool and the annotator need not be aware of it. In our examples, we will generally use the following representation, rather than the format in (4.7):
(4.8)
| <coref:de>John</coref:de>
likes <coref:de>Bill</coref:de> |
Coding Procedure
Left to the individual schemes.
Markup Table
| <coref:de> |
| id | [ASCII] |
| href | <ch> |
4.2.2 Link and Anchor Entities
Description
<coref:link> elements are used to mark anaphoric relations between discourse entities, the most basic of which is the identity relation. This relation obtains between two phrases in a text when they denote the same object in the world; the phrases used to refer to this object can be the same, like 'la surface... la surface' in (4.9), 'orange juice... orange juice' in (4.10), 'les ailerons... les ailerons' in (4.11) or different, as is seen with 'the engine E3... it... it' in (4.12), or 'ces deux fusées... elles' in (4.13). As these last two examples suggest, it is very common for a pronoun to be used to refer to a discourse entity previously referred to by a full noun phrase.
(4.9)
| S: Créer la surface.
W: Opération effectuée
S: Modéliser la surface
W: Quel nom voulez-vous donner à la surface ?

S: Create the surface
W: Done
S: Model the surface
W: What name do you want to give to the surface ? (MF) |
(4.10)
| When do we have orange juice at Elmira?
We have orange juice at Elmira at 6 a.m. (T) |
(4.11)
| 197 F: mmh / Donc qu'est ce que vous allez garder en fait (?) + /
198 M: |la longueur du tube et les ailerons |
199 D:| les ailerons |
200 F: Donc les ailerons vous m'avez dit.

197 F: mmm / Well, what are you going to keep, then ? /
198 M: the length of the tube and the wings |
199 D: | the wings |
200 F: well, the wings, you said (MF) |
(4.12)
| we're gonna take the engine E3 and shove it over to Corning, hook it up to the tanker car... (T) |
(4.13)
| 193 F: Donc qu'est ce qui / qu'est ce qui serait commun à ces deux fusées. Ces deux fusées ont /
194 D: c'est qu'elles ont / elles ont la même...

193 F: What would it be that these two rockets have in common? These two rockets have /
194 D: it's that they have / they have the same... (MF) |
| A group of children perform an intricate dance in a small theatre in the northern Sri Lankan town of Jaffna.
The appreciative audience sit in the open air and applaud their performance. The members of the Centre for Performing Arts in Jaffna are justly proud of their performance...(BBC) |
In this section we discuss only links describing identity relations, but nothing prevents an annotator from using a wider range of relations, as is done in the DRAMA scheme; some suggestions concerning possible relations are given in Section 8.
Data Source
The <coref:link> and <coref:anchor> elements point to <coref:de> elements.
Segmentation/Selection
Not applicable (the information provided by <coref:link> elements comes entirely from their attributes).
Assignment
The HREF attributes of link and anchor elements both refer to the ID of an antecedent, which can be either a <coref:de> element, a <coref:ue> element, or a <coref:seg> element (see below). For the moment, we assume that the antecedent denotes the same object as the <coref:de> element, and the ident relation is used. We assume in the rest of this document that the annotation is contained within a file 'coref.xml' to which the href attributes point.
Coreference chains: It is often the case that more than two discourse entities refer to the same object; in this case, a coreference chain is formed. Because the identity relation is transitive, if A is ident with B and B is ident with C, then A is ident with C; so it doesn't matter which item in a coreference chain is chosen as antecedent for a new phrase. This can be tracked through the markup.
Furthermore, since the identity relation is symmetric, it doesn't matter which <coref:de> element is chosen as 'current element' and which one as 'anchor'. It is often less confusing, however, to adopt the convention that the <coref:link> element should point to the latest discourse entity, whereas the <coref:anchor> element should point to the antecedent.
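Because ident is both transitive and symmetric, coreference chains are simply the connected groups of entities induced by the links. A minimal sketch of such tracking follows, using a union-find structure; the function name coreference_chains and the representation of links as (anaphor, antecedent) pairs of IDs are assumptions made for illustration, not part of the scheme or of the MATE workbench.

```python
def coreference_chains(links):
    """Group entity IDs into chains, given (anaphor, antecedent) ident pairs.

    The direction of each pair is ignored, reflecting the symmetry of the
    identity relation; transitivity is obtained by merging groups.
    """
    parent = {}

    def find(x):
        # Locate the representative of x's group, compressing the path.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for anaphor, antecedent in links:
        root_a, root_b = find(anaphor), find(antecedent)
        if root_a != root_b:
            parent[root_b] = root_a

    chains = {}
    for node in parent:
        chains.setdefault(find(node), set()).add(node)
    return sorted((sorted(chain) for chain in chains.values()),
                  key=lambda chain: chain[0])


# Two hypothetical ways of linking three mentions of the same object:
# the third mention points either to the second or to the first.
via_latest = coreference_chains([("de_99", "de_98"), ("de_100", "de_99")])
via_first = coreference_chains([("de_99", "de_98"), ("de_100", "de_98")])
```

Both choices of antecedent yield the same single chain, which is the sense in which the choice of antecedent within a chain does not matter.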
Participants interpret anaphoric expressions differently: It is also possible to observe that at a certain point in a dialogue the conversational participants had differences of opinion about coreferential links. For this reason, links can contain specifications of which agent or set of agents believes them to hold, via the optional WHO-BELIEVES attribute. The default value for this attribute is SHARED.
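The default behaviour of the who-believes attribute can be illustrated as follows. This is a sketch under assumptions: the namespace URI, the wrapping <annotation> element, the second link and its anchor targets are all hypothetical and serve only to make the fragment parseable and to contrast an explicit value with the default.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI: the scheme itself does not specify one here.
NS = "http://example.org/coref"

COREF_XML = f"""<annotation xmlns:coref="{NS}">
  <coref:link href="coref.xml#id(de_20)" type="ident" who-believes="G">
    <coref:anchor href="coref.xml#id(ue1)"/>
  </coref:link>
  <coref:link href="coref.xml#id(de_21)" type="ident">
    <coref:anchor href="coref.xml#id(de_20)"/>
  </coref:link>
</annotation>"""


def believers(link):
    """Return the value of who-believes, falling back to the default SHARED."""
    return link.get("who-believes", "SHARED")


root = ET.fromstring(COREF_XML)
links = root.findall(f"{{{NS}}}link")
belief_values = [believers(link) for link in links]
```

The first link is held only by agent G, while the second, which carries no who-believes attribute, is treated as shared by all participants.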
Example
We use the <coref:link> and <coref:anchor> elements to mark anaphoric relations, as follows. When two noun phrases marked as <coref:de> elements co-specify, a <coref:link> element is added. The href attribute of this element points to the anaphoric expression, and the element contains at least one <coref:anchor> element specifying the antecedent (by means of a second href pointer). The type of relation that holds between the two discourse entities is specified by the type attribute of the <coref:link> element; the possible values depend on the exact scheme implemented. (As we will see below, specifying antecedents by means of elements embedded in a <coref:link> element allows the annotator to mark ambiguities of co-specification.) Here are some example annotations.
(4.15)
| coref.xml |
| When do we have <coref:de ID="de_01">orange juice</coref:de> at Elmira?
We have <coref:de ID="de_02">orange juice</coref:de> at Elmira at 6 a.m. (T)
<coref:link type="ident" href="coref.xml#id(de_02)">
|
(4.16)
| coref.xml |
| 197 F: mmh / Donc qu'est ce que vous allez garder en fait (?) + /
198 M: |la longueur du tube et <coref:de ID="de_98">les ailerons</coref:de>
199 D: <coref:de ID="de_99">les ailerons</coref:de>
200 F: Donc <coref:de ID="de_100">les ailerons</coref:de> vous m'avez dit.
<coref:link href="coref.xml#id(de_99)" type="ident">
|
(4.17)
| we're gonna take <coref:de ID="de_07">the engine E3</coref:de> and shove <coref:de ID="de_08">it</coref:de> over to Corning, hook <coref:de ID="de_09">it</coref:de> up to the tanker car...
<coref:link href="coref.xml#id(de_08)" type="ident">
|
Ambiguity: The reason why more than one <coref:anchor> element
may be embedded in a <coref:link> element is to annotate ambiguity.
When more than one entity appears to be an equally likely antecedent for
an anaphoric expression, each of the possibilities can be marked by means
of a separate <coref:anchor> element. In the following example, the
pronoun it in 15.16 could refer equally well to engine E3 or to the tanker
car. If the annotator desires to annotate both antecedents, as in DRAMA
or in the Lancaster scheme, this can be done as shown below.
| coref.xml |
| 15.12 : we're gonna take <coref:de ID="de_15">the engine E3</coref:de>
15.13 : and shove <coref:de ID="de_16">it</coref:de> over to Corning
15.14 : hook <coref:de ID="de_17">it</coref:de> up to <coref:de ID="de_18">the tanker car</coref:de>
15.15 : _and_
15.16 : and send <coref:de ID="de_19">it</coref:de> back to Elmira
<coref:link href="coref.xml#id(de_16)" type="ident">
|
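How an ambiguous link with several embedded anchors might be read back can be sketched as follows. The link shown for de_19 is an illustrative reconstruction based on the description above (the pronoun in 15.16 with the engine E3 and the tanker car as candidate antecedents); it is not the markup of the original example, and the namespace URI, the <annotation> wrapper and the helper names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, needed only so that the prefix can be parsed.
NS = "http://example.org/coref"

AMBIGUOUS_LINK = f"""<annotation xmlns:coref="{NS}">
  <coref:link href="coref.xml#id(de_19)" type="ident">
    <coref:anchor href="coref.xml#id(de_15)"/>
    <coref:anchor href="coref.xml#id(de_18)"/>
  </coref:link>
</annotation>"""


def target_id(href):
    """Extract the identifier from a pointer such as 'coref.xml#id(de_19)'."""
    match = re.search(r"#id\(([^)]+)\)$", href)
    if match is None:
        raise ValueError(f"not a file#id(...) pointer: {href!r}")
    return match.group(1)


def candidate_antecedents(link):
    """List the IDs pointed to by the anchors embedded in a link."""
    return [target_id(anchor.get("href"))
            for anchor in link.findall(f"{{{NS}}}anchor")]


root = ET.fromstring(AMBIGUOUS_LINK)
link = root.find(f"{{{NS}}}link")
anaphor = target_id(link.get("href"))
candidates = candidate_antecedents(link)
is_ambiguous = len(candidates) > 1
```

A link with exactly one anchor is unambiguous; a link with more than one anchor records each of the equally likely antecedents.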
Coding Procedure
Left to the individual schemes.
Markup Table
| <coref:link> |
| id | [ASCII] |
| who-believes | [ASCII] |
| type | ident, member, subset, poss, e-rel, argptv, prop, bound, f-v, inst, genrel |
| subtype | attr, part, sposs, cause |
| href | <coref:de> |
| content | <coref:anchor> |
| <coref:anchor> |
| id | [ASCII] |
| href | <coref:de> |
4.2.3 Universe and UE Entities
In face-to-face or human-machine dialogue, participants may make reference to items visible to them at the time of speaking. A simple example of this is Pass the salt, please, where salt may not have been previously mentioned in the conversation, and thus does not corefer with any other <coref:de>, but does refer to an entity which is in the visible situation. Tracking these references is important for multimodal systems (Bruneseaux and Romary, 1997), and they have been annotated reliably in the MapTask. This tracking requires two new elements: a <coref:universe> element (as in the Bruneseaux and Romary scheme) used to specify a 'universe of discourse', that is, a set of objects, each specified by a <coref:ue> element.
The <coref:universe> element may also be used to specify references to items in the non-visible 'universe' of shared knowledge which allows hearers to correctly assign reference to items such as the Eiffel Tower - the so-called `larger-situation' (Hawkins, 1978) or `hearer-old' (Prince, 1981) references; however, annotators should keep in mind that it is often difficult to do such categorizations reliably, as found out by Fraurud (1990) and Poesio and Vieira (1998).
Description
In order to mark up reference to items in the visual situation, the items in the visual situation are listed as universe entities (<coref:ue>), embedded within a <coref:universe> element. Each <coref:ue> element has an ID, as <coref:de> elements do, so that a relation of identity between a noun phrase and an object in the visual situation can be encoded by an ident link between a <coref:de> and a <coref:ue>, just like identity between two <coref:de> elements.
Where feasible, it is suggested that all objects in the visual situation be included in a single <coref:universe> element. In cases like the MapTask dialogues, where the participants in the conversation have two different maps, it is suggested that three universes be created: one with ID common, containing all objects shared between the visual situations, and one universe for each conversational participant, containing the elements known only to that participant and carrying the value modifies="common". This will ensure that the shared elements receive a unique ID.
In some types of dialogues the visual situation may change: new objects may be created and old objects destroyed (e.g., when the visual situation is the screen). These situations may be modeled by allowing for the creation of new universes in the middle of dialogues, although this is not yet supported.
Data Source
There are no additional requirements on source data for the use of universes, unless a scheme implements a restriction on what coreferences are to be annotated based on the types of objects referred to; in this case, the annotator needs a description of the objects to check against. For instance, if the annotator were to mark up only references to Map Task landmarks, then the annotator would need a list of landmarks or copies of the maps. This information may not be enshrined in the data files themselves but in the coding module for the scheme instantiation.
Segmentation
Not applicable.
Assignment
The modifies attribute for all but the common universe should be set to common.
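One possible reading of the modifies attribute is that a participant's universe extends the universe it names, so that the entities available to a participant are those of their own universe plus those of the common one. The sketch below implements that reading only as an assumption; the namespace URI, the <annotation> wrapper, the function name visible_entities and the empty FOLLOWER universe are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, needed only so that the prefix can be parsed.
NS = "http://example.org/coref"

UNIVERSES = f"""<annotation xmlns:coref="{NS}">
  <coref:universe ID="common">
    <coref:ue ID="ue2">gold mine</coref:ue>
  </coref:universe>
  <coref:universe ID="GIVER_universe" modifies="common">
    <coref:ue ID="ue1">diamond mine</coref:ue>
  </coref:universe>
  <coref:universe ID="FOLLOWER_universe" modifies="common"/>
</annotation>"""


def visible_entities(root, universe_id):
    """Collect the entities of a universe and of any universe it modifies."""
    universes = {u.get("ID"): u for u in root.iter(f"{{{NS}}}universe")}
    entities = {}
    seen = set()
    current = universe_id
    # Follow the chain of 'modifies' references, guarding against cycles.
    while current is not None and current not in seen:
        seen.add(current)
        universe = universes[current]
        for ue in universe.findall(f"{{{NS}}}ue"):
            entities.setdefault(ue.get("ID"), ue.text)
        current = universe.get("modifies")
    return entities


root = ET.fromstring(UNIVERSES)
giver_view = visible_entities(root, "GIVER_universe")
follower_view = visible_entities(root, "FOLLOWER_universe")
```

Under this reading, the shared entity ue2 keeps a single ID in every participant's view, while ue1 is available only to the GIVER.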
Example
The following is a simple example of the use of a universe.
(4.18)
| <coref:universe ID="u1">
  <coref:ue ID="ue1">Diamond mine</coref:ue>
  <coref:ue ID="ue2">Graveyard</coref:ue>
  <coref:ue ID="ue3">Fast running creek</coref:ue>
  <coref:ue ID="ue4">Fast flowing river</coref:ue>
  <coref:ue ID="ue5">Canoes</coref:ue>
</coref:universe>
FOLLOWER: Uh-huh. Curve round. To your right.
<coref:link href="coref.xml#id(de_50)" type="ident">
|
Note that <coref:de ID="de_55">, that, could be marked
up as ident with either the universe entity ue4, or with the discourse
entity de_54. One of the advantages of this way of annotating references
to the visual situation is that an extended coreference chain tracking
mechanism should be able to include in a coreference chain both references
to universe elements and references to discourse entities; the annotator
may then choose how he/she wishes to annotate this. If the annotation tool
can't do this type of coreference chain tracking, then the coding manual
should include a disambiguation rule: for the type of multimodal applications
on which Bruneseaux and Romary worked it seems preferable to mark links
with universe entities rather than marking links with previous discourse
entities.
The following is a more complex example which includes multiple universes
encoding different world knowledge and a disagreement about a coreferential
link in the dialogue.
(4.19)
| <coref:universe ID="common">
  <coref:ue ID="ue2">gold mine</coref:ue>
</coref:universe>
<coref:universe ID="GIVER_universe" modifies="common">
  <coref:ue ID="ue1">diamond mine</coref:ue>
</coref:universe>
<coref:universe ID="FOLLOWER_universe" modifies="common">
  .....
</coref:universe>
GIVER: Do_you have <coref:de ID="de_20">diamond_mine.</coref:de>
<coref:link href="coref.xml#id(de_20)" who-believes="G" type="ident">
|
Coding Procedure
The annotation should begin with the creation of a <coref:universe>
element (or a common universe plus one for each participant, if their knowledge
is not the same). This is commonly done before the annotation of discourse
entities if the universe is static.
Markup Table
| <coref:universe> |
| id | [ASCII] |
| modifies | <ch> |
| content | <coref:ue> |
| <coref:ue> |
| id | [ASCII] |
| content | TEXT: description of object |
Description
Even if we only consider anaphoric relations involving nominal elements,
there are at least two situations in which an annotator may wish to mark
an anaphoric relation that also involves other types of constituents. The
first is the case in which the anaphoric element is either unexpressed
or incorporated in the verb. The second is the case of so-called discourse
deixis (Webber, 1991), in which the antecedent of a nominal expression
is an abstract object such as an event or proposition introduced in the
discourse somewhat indirectly by sentences. (DRAMA allows for such relations
to be marked.)
The solution we propose is to use a <coref:seg> element which, like
the TEI <seg> element, can be used to mark up arbitrary pieces
of text. <coref:seg> elements are given an id which can then
be pointed at by a <coref:link> element just like for other
anaphoric relations.
The <coref:seg> element could also be used to annotate anaphoric
relations between non-nominal elements, such as in VP ellipsis.
Data Source
Data source requirements for <coref:seg> elements are the
same as for <coref:de> elements.
Segmentation
To be specified by the coding manual for a given scheme.
Assignment
The id attribute is automatically set by the workbench.
Example
Using <coref:seg> to mark up empty and incorporated constituents:
As seen above, in Italian, Spanish and many other languages, certain nominal
constituents may not be realized; this is especially common for nominals
in subject position, but can also happen in object position, especially
in instructions, as in:
Add the dry yeast to the water and let _ sit for a few minutes. Add the rest of the water and sugar. Stir _
These nominals are present in annotations produced by hand (e.g., in
the Penn Treebank), but the parsers used for parsing spoken dialogues tend
not to produce representations containing empty constituents in this case.
If these nominals are not represented in the base level, the verb can
be marked with a <coref:seg> element, and the anaphoric relation
coded as usual by means of <coref:link> elements, as follows:
(4.20)
| coref.xml |
| A: Dov'e` <coref:de ID="de_157">Gianni?</coref:de> [Where is Gianni?]
B: <coref:seg type="pred" ID="seg_158"> e` andato a mangiare </coref:seg> [_ went to have lunch]
<coref:link href="coref.xml#id(seg_158)" type="ident">
|
This representation can only be used without loss of information when
there is at most one empty element; this is true for Italian, but not
for Japanese or Portuguese. If more precision is needed, the annotator
could define more specific identity relations also specifying which empty
argument of the verb enters in the anaphoric relation: such relations could
be called, e.g., subj-ident, obj-ident, etc. These relations could then
be used instead of ident as the value of the type attribute of the <coref:link>
element; we won't make them part of the annotation scheme discussed here,
however.
A second case in which an argument is not realized by means of a nominal
is that of incorporated clitics, such as daselo in (4.21) below.
Clitic suffixes are also found in transcriptions of spoken English:
| 44.4 : lemme make sure I
got all this
44.5 : okay (T) |
In the case of incorporated clitics, as well, the verb can be marked
with a <coref:seg> element when the parser doesn't produce a morphologically
decomposed representation, and then the anaphoric relations in which the
clitics are involved can be encoded either by means of a single ident relation
or by means of more fine-grained relations such as subj-ident or obj-ident.
(4.21)
| coref.xml |
| A: Mira, te doy <coref:de ID="de_167">este libro</coref:de> [Look, I'm giving you this book]
¿Conoces a <coref:de ID="de_168">mi suegra?</coref:de> [Do you know my mother-in-law?]
B: Sí, claro. [Yes, of course.]
A: Pues <coref:seg ID="seg_169">dáselo</coref:seg> cuando <coref:de ID="de_170">la</coref:de> veas. [Then give it to her when you see her.]
<coref:link href="coref.xml#id(seg_169)" type="obj-ident">
|
Provided that the <coref:seg> elements are identified during
the first pass of markable identification, encoding this information should
not be any harder than in the case of MUCCS. The real question for this
type of annotation is which empty elements to annotate --e.g., in addition
to 'small pro' elements such as those discussed above, the annotator may
also decide to annotate `big PRO' elements that according to some syntactic
theories occupy the subject position of infinitival clauses.
Using SEG to mark the antecedents of discourse deixis: Abstract
objects such as events, actions and propositions can all serve as antecedents
of anaphoric expressions. We are not aware of any reliability results for
this type of annotation, but the <coref:seg> element can be
used to identify the antecedents in this type of anaphora. If desired,
the annotator could use a second attribute type to specify the type of
object introduced by the <coref:seg> element; type would have
values event, prop and action.
(4.22)
| <coref:seg type="event" ID="seg_130"> The 23-year-old had hit his head against another player </coref:seg> during a game of Aussie-rules football.
McGlinn remembered nothing of <coref:de ID="de_131"> the collision </coref:de>, but developed a headache and had several seizures.
<coref:link href="coref.xml#id(de_131)" type="ident">
|
(4.23)
| a. Despite the latest negative results, doctors are still convinced that Tamoxifen can prevent breast cancer. This is because of the way it blocks the action of oestrogen, the female sex hormone that can make the breast cells of some women go out of control.
b. Despite the latest negative results,
<coref:link href="coref.xml#id(de_130)" type="ident">
|
(4.24)
| a.
GIVER: You're sort_of going past stone creek ... but your line's curving up past the ... flat rocks.
FOLLOWER: Right. Okay.
GIVER: and then starting to come down again.
FOLLOWER: Got that
b.
<coref:link href="coref.xml#id(de_136)" type="ident">
|
These examples also illustrate some of the problems to be addressed when
designing a reliable annotation scheme for discourse deixis: these include
deciding what part of the text counts as antecedent as well as deciding
which type of object the antecedent is (see, e.g., (4.24)).
Coding Procedure
Left to the individual schemes.
Markup Table
| <coref:seg> |
| id | [ASCII] |
| type | [ASCII] |
4.3 Integrated Example
See (4.18) and (4.19).
4.4 Joint Coding Procedure
Left to the individual schemes.