Human judgements play a key role in the development and assessment of linguistic resources and methods in Computational Linguistics. They are commonly used in the creation of lexical resources and corpus annotation, as well as in the evaluation of automatic approaches to linguistic tasks: in the development phase, human judgements help to define an inventory of categories and robust annotation criteria, and in the assessment phase they are used to evaluate the results of automatic systems against existing linguistic standards. Furthermore, systematically collected human judgements provide clues for research on the linguistic issues that underlie the judgement task, offering insights complementary to introspective analysis or evidence gathered from corpora.
The goal of this workshop is to discuss experiments that collect human judgements for purposes in Computational Linguistics. A particular focus of the workshop is human judgements on "controversial" linguistic tasks (those that are not settled from a theoretical point of view, such as many tasks involving semantics or pragmatics), which tend to result in low agreement scores. Such controversial tasks and their sub-optimal results are typically poorly documented in the literature; however, they are especially well-suited as a basis for a fruitful discussion.
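To make the notion of "low agreement scores" concrete: agreement between two annotators is commonly reported as Cohen's kappa, which corrects raw agreement for the agreement expected by chance. Below is a minimal sketch of that computation; the labels and annotator data are invented for illustration and do not come from any workshop submission.

```python
from collections import Counter

def cohens_kappa(ann1, ann2):
    """Chance-corrected agreement between two annotators over the same items."""
    assert len(ann1) == len(ann2) and ann1
    n = len(ann1)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(ann1, ann2)) / n
    # Expected agreement if each annotator labelled independently,
    # following their own marginal label distribution.
    c1, c2 = Counter(ann1), Counter(ann2)
    p_e = sum((c1[lab] / n) * (c2[lab] / n) for lab in set(ann1) | set(ann2))
    return (p_o - p_e) / (1 - p_e)

# Hypothetical labels from two annotators on a semantic task:
a = ["literal", "metaphor", "metaphor", "literal", "literal", "metaphor"]
b = ["literal", "metaphor", "literal",  "literal", "metaphor", "metaphor"]
print(round(cohens_kappa(a, b), 3))  # 4/6 observed vs 0.5 by chance -> 0.333
```

Here the annotators agree on 4 of 6 items (67%), but because half that agreement is expected by chance, kappa is only 0.333, which is the kind of sub-optimal score that controversial semantic and pragmatic tasks tend to produce.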
Organisers: Ron Artstein (University of Southern California), Gemma Boleda (Universitat Politècnica de Catalunya), Frank Keller (University of Edinburgh), Sabine Schulte im Walde (Universität Stuttgart)