It is widely accepted that lexical semantic information is needed for processing human language. Hand tagged text (e.g., the Penn Treebank) has proven useful for researchers working on assigning non-semantic characteristics, such as part-of-speech tags and syntactic structure, to language expressions. (In hand tagged text we include automatically tagged text that was post-edited by hand.) It is likely that hand tagged text will also be of use for assigning semantic characteristics to words in their context. The aim of this workshop is to address the following questions: to what end should hand tagging be performed, what lexical semantic information should be hand tagged, and how should this tagging be done?
Lexical semantic information is determined in part by the words themselves and in part by the context in which they appear. Such lexical semantic information includes verbal aspect, nominal classification (e.g., count-mass, locative and frequency), modifier classification (e.g., positive-negative, intersective-nonintersective, and eventive-propositional) and relations between participants and events (e.g., sentience and volition). Other examples of lexical semantic information include membership in classes from hierarchies such as WordNet or Beth Levin's verb classification.
Thus, robust NLP systems need to have a large store of lexical semantic information (i.e., a lexicon) and a method for accounting for the effect of context (e.g., modules for handling discrete word sense ambiguity, regular polysemy, semantic coercion, metaphorical extension, etc.).
Given the experience with part-of-speech tagging and robust parsing, it is hoped that hand tagged text will make the comparison of systems possible and provide training data for quantitative approaches. Some semantically tagged texts already exist, including the WordNet 1.4 semantic concordance (wnsemcor). An additional example was discussed at ACL-96: Hwee Tou Ng and Hian Beng Lee made use of a corpus of 192,800 occurrences of 191 words hand tagged with WordNet classes. This corpus was used as a training set for a case-based word sense disambiguation algorithm. Although we are aware of no systems that use hand tagged corpora in the service of acquiring lexical semantics, it seems likely that such corpora would aid the identification of non-semantic cues for lexical semantic information.
Thus, we are soliciting papers that address one or more of the following questions:
Especially desirable are papers that shed light on these questions through the discussion of actual tagging experience, both hand and automatic.
In addition to paper presentations, working sessions are planned that will discuss actual attempts at tagging text, such as the WordNet taggings, the Singapore taggings, and the semantic tagging done as part of the MUC competitions. Samples of tagged text will be sent to participants in advance for careful consideration, with specific issues in mind. A discussion of obstacles to achieving consensus is also planned.
Authors are asked to submit previously unpublished papers only; a workshop proceedings will be published. There is a 2000-word limit (exclusive of references) on the length of submissions. Electronic submission of either self-contained LaTeX or PostScript is strongly preferred. Please use the aclsub.sty LaTeX style file. Hard copy submissions should include 6 copies of the paper. Since the papers will be reviewed anonymously, please do not place the author's name on the paper. Instead, include a separate title page with title, abstract, author, and e-mail address. Unless requested otherwise, notification of acceptance will be sent electronically to the first author. Parallel submission is unproblematic; however, if your paper is accepted to this workshop and you decide to present it here, we will ask you to withdraw it from any other events.