EAGLES recommended attributes/values
(quoting from Leech and Wilson, 1996)
These are specified below under part-of-speech headings. Each numbered heading refers to the number assigned under major category. The set of values for each attribute is definitely not a closed set and will need to be augmented to handle peculiar features of individual languages. Not all EU languages will instantiate all attributes or all values of an individual attribute. For each attribute, 0 designates a zero value, meaning "this attribute is not applicable" for the particular language, or for a particular textword in that language. The standard requirement for these recommended attributes/values is that, if they occur in a particular language, then it is advisable that the tagset of that language should encode them.
1. Nouns (N)
| (i) | Type: | 1. Common | 2. Proper | |||
| (ii) | Gender: | 1. Masculine | 2. Feminine | 3. Neuter | ||
| (iii) | Number: | 1. Singular | 2. Plural | |||
| (iv) | Case: | 1. Nominative | 2. Genitive | 3. Dative | 4. Accusative | 5. Vocative |
Inflection type is omitted as an attribute, as it is purely
morphological.
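As a rough illustration of the zero-value convention described above, the noun attributes can be packed into a positional tag: the category letter followed by one digit per attribute, with 0 wherever an attribute does not apply. This is only a sketch. The function name `encode_noun` and the exact string layout are assumptions made for illustration and are not taken from the EAGLES Intermediate Tagset.

```python
# Value tables copied from noun attributes (i)-(iv) above.
NOUN_ATTRIBUTES = [
    ("type", {"common": 1, "proper": 2}),
    ("gender", {"masculine": 1, "feminine": 2, "neuter": 3}),
    ("number", {"singular": 1, "plural": 2}),
    ("case", {"nominative": 1, "genitive": 2, "dative": 3,
              "accusative": 4, "vocative": 5}),
]


def encode_noun(**features):
    """Return a positional tag for a noun (hypothetical layout).

    Attributes that are omitted or set to None receive the zero value,
    meaning "this attribute is not applicable".
    """
    unknown = set(features) - {name for name, _ in NOUN_ATTRIBUTES}
    if unknown:
        raise ValueError(f"unknown attribute(s): {sorted(unknown)}")
    digits = []
    for name, values in NOUN_ATTRIBUTES:
        value = features.get(name)
        digits.append("0" if value is None else str(values[value]))
    return "N" + "".join(digits)


# A plural common noun in a language without gender or case marking:
# the gender and case slots stay at 0.
print(encode_noun(type="common", number="plural"))  # prints N1020
```

A positional layout keeps tags the same length for every language, so a tagset that lacks, say, Case simply leaves that slot at 0 rather than changing the tag format.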
2. Verbs (V)
| (i) | Person: | 1. First | 2. Second | 3. Third | ||
| (ii) | Gender: | 1. Masculine | 2. Feminine | 3. Neuter | ||
| (iii) | Number: | 1. Singular | 2. Plural | |||
| (iv) | Finiteness: | 1. Finite | 2. Non-finite | |||
| (v) | Verb form / Mood: | 1. Indicative | 2. Subjunctive | 3. Imperative | 4. Conditional | 5. Infinitive | 6. Participle | 7. Gerund | 8. Supine |
| (vi) | Tense: | 1. Present | 2. Imperfect | 3. Future | 4. Past | |
| (vii) | Voice: | 1. Active | 2. Passive | |||
| (viii) | Status: | 1. Main | 2. Auxiliary |
Attribute (v) has two names because of different traditions, for different European languages, regarding the use of the term Mood.
In fact, the first four values, (v) 1-4, are applicable to Finite Verbs and the last four, (v) 5-8, to Non-finite Verbs.
Attribute (vii) Voice refers to the morphologically-encoded passive, e.g. in Danish and in Greek. Where the passive is realised by more than one verb, this does not need to be represented in the tagset.
The same applies to compound tenses (attribute (vi)).
In general, compound tenses are not dealt with at the morphosyntactic level,
since they involve the combination of more than one verb in a larger construction.
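The dependency between attributes (iv) and (v) noted above (values 1-4 for finite verbs, values 5-8 for non-finite verbs) lends itself to a simple consistency check. The following is a minimal sketch; the names `finiteness_of` and `is_consistent` are hypothetical helpers, not part of the recommendations.

```python
# Values of attribute (v) Verb form / Mood, split as described in the text.
FINITE_MOODS = {"indicative": 1, "subjunctive": 2,
                "imperative": 3, "conditional": 4}
NON_FINITE_FORMS = {"infinitive": 5, "participle": 6,
                    "gerund": 7, "supine": 8}


def finiteness_of(verb_form):
    """Derive attribute (iv) Finiteness from attribute (v)."""
    if verb_form in FINITE_MOODS:
        return "finite"
    if verb_form in NON_FINITE_FORMS:
        return "non-finite"
    raise ValueError(f"unknown verb form / mood: {verb_form!r}")


def is_consistent(finiteness, verb_form):
    """True if the claimed finiteness agrees with the verb form / mood."""
    return finiteness_of(verb_form) == finiteness


print(is_consistent("finite", "subjunctive"))  # prints True
print(is_consistent("finite", "infinitive"))   # prints False
```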
3. Adjectives (AJ)
| (i) | Degree: | 1. Positive | 2. Comparative | 3. Superlative | |
| (ii) | Gender: | 1. Masculine | 2. Feminine | 3. Neuter | |
| (iii) | Number: | 1. Singular | 2. Plural | ||
| (iv) | Case: | 1. Nominative | 2. Genitive | 3. Dative | 4. Accusative |
Attribute (i) Degree applies only to inflectional comparatives
and superlatives. In some languages, e.g. Spanish, the number of such adjectives
is very small.
4. Pronouns and Determiners (PD)
| (i) | Person: | 1. First | 2. Second | 3. Third | ||
| (ii) | Gender: | 1. Masculine | 2. Feminine | 3. Neuter | ||
| (iii) | Number: | 1. Singular | 2. Plural | |||
| (iv) | Possessive: | 1. Singular | 2. Plural | |||
| (v) | Case: | 1. Nominative | 2. Genitive | 3. Dative | 4. Accusative | 5. Non-genitive | 6. Oblique |
| (vi) | Category: | 1. Pronoun | 2. Determiner | 3. Both | ||
| (vii) | Pron.-Type: | 1. Demonstrative | 2. Indefinite | 3. Possessive | 4. Int./Rel. | 5. Pers./Refl. |
| (viii) | Det.-Type: | 1. Demonstrative | 2. Indefinite | 3. Possessive | 4. Int./Rel. | 5. Partitive |
The parts of speech Pronoun, Determiner and Article heavily overlap in their formal and functional characteristics, and different analyses for different languages entail separating them out in different ways. For the present purpose, we have proposed placing Pronouns and Determiners in one 'super-category', recognising that for some descriptions it may be thought best to treat them as totally different parts of speech.
There is also an argument for subsuming Articles under Determiners. The present guidelines do not prevent such a realignment of categories, but do propose that articles (assuming they exist in a language) should always be recognised as a separate class, whether or not included within determiners. The requirement is that the descriptive scheme adopted should be automatically mappable into the present one via an Intermediate Tagset.
Attribute (iv) accounts for the fact that a possessive pronoun or possessive determiner may have two different numbers. This attribute handles the number which is inherent to the possessive form (e.g. Italian (la) mia, (la) nostra as first-person singular and first-person plural) as contrasted with the number it has by virtue of agreeing with a particular noun (e.g. Italian (la) mia, (le) mie).
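The two numbers carried by a possessive can be kept apart by storing them in separate fields. The sketch below encodes the three Italian forms cited above; the class name `PossessiveAnalysis` and the `LEXICON` table are hypothetical, introduced only to make the distinction concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PossessiveAnalysis:
    person: int      # attribute (i): 1 = First
    gender: int      # attribute (ii): 2 = Feminine
    number: int      # attribute (iii): number agreeing with the noun
    possessive: int  # attribute (iv): number inherent to the possessor


# Hypothetical lexicon entries for the Italian examples in the text.
LEXICON = {
    "mia":    PossessiveAnalysis(person=1, gender=2, number=1, possessive=1),
    "mie":    PossessiveAnalysis(person=1, gender=2, number=2, possessive=1),
    "nostra": PossessiveAnalysis(person=1, gender=2, number=1, possessive=2),
}

# mia vs. nostra: same agreement number, different possessor number.
# mia vs. mie: same possessor number, different agreement number.
for form, analysis in LEXICON.items():
    print(form, analysis.number, analysis.possessive)
```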
Under attribute (v) Case, the value Oblique applies to pronouns such as them and me in English, and equivalent pronouns such as dem and mig in Danish. These occur in object function, and also after prepositions.
Under attributes (vii) and (viii), the subcategories Interrogative and Relative are merged into a single value, Int./Rel. It is often difficult to distinguish these in automatic tagging, but they may be optionally distinguished at a more delicate level of granularity.
Similarly, under attribute (vii), Personal and Reflexive pronouns are brought together as a single value, Pers./Refl. Again, they may be optionally separated at a more delicate level.
5. Articles (AT)
| (i) | Article-Type: | 1. Definite | 2. Indefinite | ||
| (ii) | Gender: | 1. Masculine | 2. Feminine | 3. Neuter | |
| (iii) | Number: | 1. Singular | 2. Plural | ||
| (iv) | Case: | 1. Nominative | 2. Genitive | 3. Dative | 4. Accusative |
6. Adverbs (AV)
| (i) | Degree: | 1. Positive | 2. Comparative | 3. Superlative |
There are many possible subdivisions of adverbs on syntactic
and semantic grounds, but these are regarded as optional rather than recommended.
7. Adpositions (AP)
| (i) | Type: | 1. Preposition |
In practice, the overwhelming majority of cases of adpositions
we have to consider in European languages are prepositions. Hence only
this one value needs to be recognised at the recommended level. Other possibilities,
such as Postpositions and Circumpositions, are dealt with at the optional
level.
8. Conjunctions (C)
| (i) | Type: | 1. Coordinating | 2. Subordinating |
9. Numerals (NU)
| (i) | Type: | 1. Cardinal | 2. Ordinal | ||
| (ii) | Gender: | 1. Masculine | 2. Feminine | 3. Neuter | |
| (iii) | Number: | 1. Singular | 2. Plural | ||
| (iv) | Case: | 1. Nominative | 2. Genitive | 3. Dative | 4. Accusative |
| (v) | Function: | 1. Pronoun | 2. Determiner | 3. Adjective |
In some languages (e.g. Portuguese) this category is not normally considered to be a separate part of speech, because it can be subsumed under others (e.g. cardinal numerals behave like pronouns/determiners; ordinal numerals behave more like adjectives).
We recognise that in some tagsets Numeral may therefore
occur as subcategory within other parts of speech. (Compare the treatment
of articles under 5 above). At the same time, it is possible to indicate
the part-of-speech function of a word within the numeral category by making
use of attribute (v).
10. Interjections (I)
No subcategories are recommended.
11. Unique/Unassigned (U)
No subcategories are recommended, although it is expected
that tagsets for individual languages will need to identify such one-member
word-classes as Negative particle, Existential particle, Infinitive marker,
etc.
12. Residual (R)
| (i) | Type: | 1. Foreign word | 2. Formula | 3. Symbol | 4. Acronym | 5. Abbreviation | 6. Unclassified |
| (ii) | Number: | 1. Singular | 2. Plural | |||
| (iii) | Gender: | 1. Masculine | 2. Feminine | 3. Neuter | ||
The Unclassified category applies to word-like text segments which do not easily fit into any of the foregoing values. For example: incomplete words and pause fillers such as er and erm in transcriptions of speech, or written representations of singing such as dum-de-dum.
Although words in the Residual category are on the periphery
of the lexicon, they may take on some of the grammatical characteristics
of other classes, e.g. of nouns. Acronyms such as IBM are similar to proper nouns;
symbols such as alphabetic characters can vary for singular and plural
(e.g. How many Ps are there in 'psychopath'?), and are in this respect
like common nouns. In some languages (e.g. Portuguese) such symbols also
have gender. It is quite reasonable that in some tagging schemes some of
these classes of word will be classified under other parts of speech.
13. Punctuation marks (PU)
Word-external punctuation marks, if treated as words for
morphosyntactic tagging, are sometimes assigned a separate tag (in effect,
an attribute value) for each main punctuation mark:
| (i) | 1. Period | 2. Comma | 3. Question mark |
An alternative is to group the punctuation marks into
positional classes:
| (i) | 1. Sentence-final | 2. Sentence-medial | 3. Left-Parenthetical | 4. Right-Parenthetical |
Under 1 are grouped . ? and !. Under 2 are grouped , ; : and --.
Under 3 are placed punctuation marks which signal the initiation of a
constituent, such as ( and [ (and ¿ in Spanish). Under 4 are grouped
punctuation marks which conclude a constituent the opening of which is
marked by one of the devices in 3, e.g. ) and ] (and ? in Spanish). We make no
recommendation about choosing between these two sets of punctuation values.
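The positional grouping can be written down as a lookup table. The sketch below is one possible reading of the four classes; the function name `positional_class` and the `language` parameter are assumptions, as is the decision to treat ? as Right-Parenthetical only when the language is Spanish.

```python
# Positional classes: 1 Sentence-final, 2 Sentence-medial,
# 3 Left-Parenthetical, 4 Right-Parenthetical.
POSITIONAL_CLASS = {
    ".": 1, "?": 1, "!": 1,
    ",": 2, ";": 2, ":": 2, "--": 2,
    "(": 3, "[": 3, "¿": 3,
    ")": 4, "]": 4,
}


def positional_class(mark, language="en"):
    """Return the positional class (1-4) of a punctuation mark.

    Assumption: in Spanish, "?" closes a constituent opened by "¿",
    so it is placed in class 4 rather than class 1.
    """
    if language == "es" and mark == "?":
        return 4
    return POSITIONAL_CLASS[mark]


print(positional_class("?"))                 # prints 1
print(positional_class("?", language="es"))  # prints 4
print(positional_class("["))                 # prints 3
```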
Special extensions - Optional generic attributes/values
Here we deal with aspects of morphosyntactic annotation
which are optional, and may be included in the annotation scheme according
to need. Many of them go beyond morphosyntax and are of a syntactic or
semantic nature. There is decidedly no claim to completeness. We do not
recommend any of these features, but simply present them as having illustrative
value. This subsection deals with generic optional features, i.e. those
which are application- or task-specific. See language-specific features
for another class of special extension.
1. Nouns
One might wish to introduce semantically and syntactically
oriented attributes such as countability:
| (v) | Countability: | 1. Countable | 2. Mass |
2. Verbs
Additional optional attributes:
| (ix) | Aspect: | 1. Perfective | 2. Imperfective |
| (x) | Separability: | 1. Non-separable | 2. Separable |
| (xi) | Reflexivity: | 1. Reflexive | 2. Non-reflexive |
| (xii) | Auxiliary: | 1. Have | 2. Be |
Attribute (ix) is needed for Greek and Slavonic languages. It corresponds also to the Past Simple/Imperfect distinction of Romance languages.
Attribute (x) is relevant for German compound verbs (fängt ...an, anfangen) and also for phrasal verbs in Danish and English.
Attribute (xii) is applied to main verbs in French, German, Dutch, etc., and determines the selection of avoir or être, etc., as auxiliary for the Perfect.
Additional optional value for recommended attribute Status:
| (viii) | Status: | 3. Semi-auxiliary |
In addition to main and auxiliary verbs, it may be useful
(e.g. in English) to recognise an intermediate category of semi-auxiliary
for such verbs as be going to, have got to, ought to.
3. Adjectives
Additional optional attributes:
| (v) | Inflection-type: | 1. Weak-Flection | 2. Strong-Flection | 3. Mixed |
| (vi) | Use: | 1. Attributive | 2. Predicative |
| (vii) | NP Function: | 1. Premodifying | 2. Postmodifying | 3. Head-function |
Weak and Strong (attribute (v)) are values for adjectival
inflection in the Germanic languages German, Dutch and Danish. The syntactic
attribute (vi) makes a distinction, for example, between main (Attributive)
and asleep (Predicative) in English.
4. Pronouns and Determiners
Additional optional attributes:
| (ix) | Special Pronoun Type: | 1. Personal | 2. Reflexive | 3. Reciprocal |
| (x) | Wh-Type: | 1. Interrogative | 2. Relative | 3. Exclamatory |
| (xi) | Politeness: | 1. Polite | 2. Familiar |
Attribute (xi) is limited to second-person pronouns. In some languages (e.g. French) it is possible to treat Polite and Familiar simply as pragmatic values encoded through other attributes -- especially person and number. In languages where there are special polite pronoun forms (e.g. Dutch u and Spanish Usted), the additional Politeness attribute is required.
6. Adverbs
| (ii) | Adverb-Type: | 1. General | 2. Degree |
| (iii) | Polarity: | 1. Wh-type | 2. Non-wh-type |
| (iv) | Wh-Type: | 1. Interrogative | 2. Relative | 3. Exclamatory |
Attribute (ii) allows the tagset to distinguish degree
adverbs, which have a distinctive syntactic function, (such as very, so,
too) from others. Attribute (iv) enables the tagset to mark separately
the Wh- or Qu- adverbs which are interrogative, relative or exclamatory
in function. The relevant adverbs (in English) are when, where, how and
why.
7. Adpositions
| (i) | Type: | 2. Fused prep-art |
The additional value Fused prep-art is for the benefit
of those who do not find it practical to split fused words such as French
au (= à + le) into two textwords. This very common phenomenon
of a fused preposition + article in West European languages should preferably,
however, be handled by assigning two tags to the same orthographic word
(one for the preposition and one for the article).
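The two treatments of a fused form can be contrasted in a few lines. This is a sketch only: the tag labels (`AP:type=1`, `AP:type=2`, `AT`) and the function `tags_for` are hypothetical notations chosen for the example, and the table holds just the one French form mentioned above.

```python
# Fused forms and their component words (only the example from the text).
FUSED_FORMS = {"au": ("à", "le")}


def tags_for(word, two_tags=True):
    """Return the tag(s) for a fused preposition + article.

    two_tags=True  -> the preferred treatment: one tag for the
                      preposition and one for the article, both
                      attached to the same orthographic word.
    two_tags=False -> the optional single value Fused prep-art.
    """
    if word not in FUSED_FORMS:
        raise KeyError(f"not a known fused form: {word!r}")
    if two_tags:
        return ["AP:type=1", "AT"]  # Preposition + Article
    return ["AP:type=2"]            # Fused prep-art


print(tags_for("au"))                  # prints ['AP:type=1', 'AT']
print(tags_for("au", two_tags=False))  # prints ['AP:type=2']
```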
8. Conjunctions
| (ii) | Coord-Type: | 1. Simple | 2. Correlative | 3. Initial | 4. Non-initial |
This attribute subclassifies coordinating conjunctions. It is easier to assign one tag to one orthographic word and it is therefore suggested that the four values are assigned as follows: Simple applies to the regular type of coordinator occurring between conjuncts: German und, for example. When the same word is also placed before the first conjunct, as in French ou...ou..., the former occurrence is given the Correlative value and the latter the Simple value. When two distinct words occur, as in German weder...noch..., then the first is given the Initial value and the second the Non-initial value.
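The assignment rule just described can be stated as a small function. This is a minimal sketch under the assumption that the coordinator standing between the conjuncts, and any coordinator placed before the first conjunct, have already been identified; the name `coord_types` is hypothetical.

```python
def coord_types(between, before_first=None):
    """Assign Coord-Type values to the coordinators of one construction.

    between      -- the coordinator occurring between the conjuncts
    before_first -- the coordinator placed before the first conjunct,
                    or None if there is none
    Returns (word, value) pairs in text order.
    """
    if before_first is None:
        return [(between, "Simple")]
    if before_first.lower() == between.lower():
        # Same word repeated: the former is Correlative, the latter Simple.
        return [(before_first, "Correlative"), (between, "Simple")]
    # Two distinct words: Initial followed by Non-initial.
    return [(before_first, "Initial"), (between, "Non-initial")]


print(coord_types("und"))            # prints [('und', 'Simple')]
print(coord_types("ou", "ou"))       # prints [('ou', 'Correlative'), ('ou', 'Simple')]
print(coord_types("noch", "weder"))  # prints [('weder', 'Initial'), ('noch', 'Non-initial')]
```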
For further optional language-specific attributes/values,
the interested reader is referred to the EAGLES document "Recommendations
for the Morphosyntactic Annotation of Corpora" (Leech and Wilson, 1996).