Institute for Natural Language Processing

Semantically Annotated Lexica
An important challenge in computational linguistics concerns the
construction of large-scale computational lexicons for the numerous
natural languages where very large samples of language use are now
available. Most approaches require a fixed taxonomy of semantic
relations as a prerequisite.
This is a problem because (i) entailment hierarchies are presently
available for few languages, and (ii) we regard it as an open
question whether and to what degree existing designs for lexical
hierarchies are appropriate for representing lexical meaning. Both of
these considerations suggest the relevance of inductive and
experimental approaches to the construction of lexicons with semantic
information. In the following papers
- Inducing a Semantically Annotated Lexicon via EM-based
Clustering. Mats Rooth, Stefan Riezler, Detlef Prescher, Glenn Carroll, and
Franz Beil.
In 37th Annual Meeting of the ACL, 1999, Maryland.
(.ps/.ps.gz)
- EM-Based Clustering for NLP Applications. Mats Rooth,
Stefan Riezler, Detlef Prescher, Glenn Carroll, and Franz Beil.
In Inducing Lexicons with the EM Algorithm, AIMS Report
4(3), 1998, IMS, Universität Stuttgart. 97-124.
(.ps/.ps.gz)
we present a method for automatic induction of semantically
annotated subcategorization frames from unannotated corpora.
We use a statistical subcat-induction system which estimates probability
distributions and corpus frequencies for pairs of a head and a
subcat frame.
The statistical parser can also collect frequencies for
the nominal fillers of slots in a subcat frame. The induction of labels
for slots in a frame is based upon estimation of a probability
distribution over tuples consisting of a class label, a selecting
head, a grammatical relation, and a filler head. The class label is
treated as hidden data in the EM framework for statistical estimation.
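
The estimation step described above can be illustrated with a minimal sketch. It assumes the simplest two-dimensional case, in which each observation is a (selecting head, filler head) pair and the grammatical relation is held fixed, so that the joint probability factors as p(v, n) = sum over c of p(c) p(v|c) p(n|c). The function names `em_latent_class` and `class_posterior`, the random initialization, and the toy counts are all hypothetical choices made for illustration; this is not the implementation used in the experiments.

```python
import random
from collections import defaultdict


def em_latent_class(pair_counts, n_classes, n_iter=50, seed=0):
    """Fit p(c), p(v|c), p(n|c) to weighted (v, n) pairs with EM.

    pair_counts maps (head, filler) tuples to corpus frequencies.
    The class label c is never observed; it is the hidden variable.
    """
    rng = random.Random(seed)
    heads = sorted({v for v, _ in pair_counts})
    fillers = sorted({n for _, n in pair_counts})

    def random_dist(items):
        # Random positive weights, normalized to a distribution.
        weights = [rng.random() + 0.1 for _ in items]
        total = sum(weights)
        return {x: w / total for x, w in zip(items, weights)}

    p_c = [1.0 / n_classes] * n_classes
    p_v = [random_dist(heads) for _ in range(n_classes)]
    p_n = [random_dist(fillers) for _ in range(n_classes)]

    for _ in range(n_iter):
        # E-step: distribute each pair's frequency over the classes
        # in proportion to the posterior p(c | v, n).
        exp_c = [0.0] * n_classes
        exp_v = [defaultdict(float) for _ in range(n_classes)]
        exp_n = [defaultdict(float) for _ in range(n_classes)]
        for (v, n), freq in pair_counts.items():
            joint = [p_c[c] * p_v[c][v] * p_n[c][n] for c in range(n_classes)]
            z = sum(joint)
            if z == 0.0:
                continue
            for c in range(n_classes):
                share = freq * joint[c] / z
                exp_c[c] += share
                exp_v[c][v] += share
                exp_n[c][n] += share

        # M-step: re-estimate parameters as relative expected frequencies.
        total = sum(exp_c)
        if total == 0.0:
            break
        for c in range(n_classes):
            p_c[c] = exp_c[c] / total
            if exp_c[c] > 0.0:
                p_v[c] = {v: exp_v[c][v] / exp_c[c] for v in heads}
                p_n[c] = {n: exp_n[c][n] / exp_c[c] for n in fillers}
    return p_c, p_v, p_n


def class_posterior(v, n, p_c, p_v, p_n):
    """Return p(c | v, n) for every class, usable to label a slot filler."""
    joint = [p_c[c] * p_v[c].get(v, 0.0) * p_n[c].get(n, 0.0)
             for c in range(len(p_c))]
    z = sum(joint)
    return [j / z for j in joint] if z > 0.0 else joint


# Invented toy counts, for illustration only.
toy_counts = {
    ("eat", "apple"): 5, ("eat", "bread"): 4,
    ("drink", "water"): 6, ("drink", "tea"): 3,
}
p_c, p_v, p_n = em_latent_class(toy_counts, n_classes=2)
posterior = class_posterior("eat", "apple", p_c, p_v, p_n)
```

In this sketch the class with the highest posterior for a given pair would serve as the induced label for that slot filler. Extending the tuple with a grammatical relation would add one more conditional distribution per class.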
In the following, we report results on
experiments with observations derived from large English and German
corpora:
- English
  Experiments with the British National Corpus (1280715 tokens of
  verb-noun pairs):
  - Latent Semantic Class Model (.ps.gz)
  - Semantically annotated lexicon of intransitive and transitive
    verbs (.ps.gz, 983 pages)
- German
  Experiments with the Huge German Corpus (418290 tokens of verb-noun and
  adj-noun pairs):
  - Latent Semantic Class Model (.ps.gz)
  - Semantically annotated lexicon of intransitive and transitive
    verbs (.ps.gz, 939 pages)
Please contact Stefan Riezler or
Detlef Prescher for more information.