Tutorials at ANLP94



Corpus hacking (Tuesday)

Mats Rooth and Oliver Christ, University of Stuttgart

Note that this tutorial will take place at IMS, not at Keplerstr. 17.
We will introduce a family of programs for representing and computing with text corpora in a Unix/C environment. The first session will be devoted to representation of the corpus and linguistic markup and to an associated query language. In the second session, we will look at statistical computations and applications to linguistic and computational linguistic problems.

Mats Rooth obtained a PhD in linguistics from UMass Amherst after studying mathematics at MIT. He has worked at CSLI/Stanford, AT&T Bell Labs, and the Universities of Stuttgart and Tübingen. His research interests include statistical parsing, the semantics of intonation, and methodologies for employing corpus data in theoretical linguistics.

Oliver Christ studied computer science at the University of Stuttgart, Germany. His thesis work, finished in late 1992, dealt with the design and implementation of a CLIM-based graphical user interface for the TFS system. Since late 1992, he has been working as a research assistant in a project that aims to develop tools for exploring large text corpora, and in that project he has developed numerous corpus management and access tools.


Partial Parsing (Tuesday)

Steven Abney, University of Tübingen

Efficient, accurate parsing of unrestricted text is not within the reach of current techniques. Standard algorithms are too expensive for use on very large corpora, and relatively fragile. Partial parsing aims to buy speed and robustness of processing by sacrificing depth of analysis. It can be seen as an application of the principles that motivate stochastic tagging: tagging illustrates how low-level processing can be sliced out of the parsing problem and solved independently, and shallow parsing represents the "next slice". Partial parsing is generally useful as a preprocessing step, either for bootstrapping (extracting information from corpora for use by more sophisticated parsers) or for end-user applications such as data extraction.
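As an illustration of the "next slice" idea (not a method taken from the tutorial), the sketch below assumes a tagger has already produced Penn Treebank-style tags and recognizes flat noun-phrase chunks with a single regular expression over the tag sequence, which makes it a finite-state recognizer. The names `TAG_SYMBOL`, `NP`, and `chunk_nps`, and the NP pattern itself, are hypothetical choices made for this example.

```python
import re
from typing import List, Tuple, Union

# Hypothetical mapping from (assumed) Penn Treebank tags to one symbol per
# token, so character offsets in the symbol string equal token positions.
# Tags not listed here map to "O" (outside any NP).
TAG_SYMBOL = {"DT": "D", "JJ": "A", "NN": "N", "NNS": "N"}

# Hypothetical NP pattern: optional determiner, any adjectives, 1+ nouns.
NP = re.compile(r"D?A*N+")

Token = Tuple[str, str]  # (word, tag)


def chunk_nps(tagged: List[Token]) -> List[Union[Token, List[Token]]]:
    """Group leftmost, non-overlapping NP matches into sublists;
    tokens outside any match are passed through unchanged."""
    symbols = "".join(TAG_SYMBOL.get(tag, "O") for _, tag in tagged)
    out: List[Union[Token, List[Token]]] = []
    pos = 0
    for m in NP.finditer(symbols):
        out.extend(tagged[pos:m.start()])      # unchunked tokens
        out.append(tagged[m.start():m.end()])  # one NP chunk
        pos = m.end()
    out.extend(tagged[pos:])                   # trailing unchunked tokens
    return out
```

For example, the tagged input "the/DT big/JJ dog/NN saw/VBD the/DT cat/NN" yields two NP chunks, "the big dog" and "the cat", with "saw" left unchunked. A cascade would run further recognizers of the same kind over this output, for instance grouping a preposition with a following NP chunk.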

In the tutorial, we will discuss partial-parsing methods, including finite-state recognition, cascaded finite-state recognizers, and HMMs. In general, techniques used in tagging can be readily applied to shallow parsing: in addition to HMMs, regression techniques, including regression trees, are applicable. We will also touch on grammatical inference techniques, and on techniques for recognizing low-level phrases without grammars, on the basis of word-level statistical properties such as mutual information.
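The abstract does not specify how mutual information is computed or used. As a minimal sketch, assuming pointwise mutual information between adjacent words with maximum-likelihood estimates, PMI(x, y) = log2 P(x, y) / (P(x) P(y)) can be computed from a token list as below; word pairs with high scores are candidates for low-level phrases. The function name `bigram_pmi` and the `min_count` parameter are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def bigram_pmi(tokens: List[str], min_count: int = 1) -> Dict[Tuple[str, str], float]:
    """Pointwise mutual information log2(P(x,y) / (P(x) * P(y))) for each
    adjacent word pair, using maximum-likelihood (relative-frequency)
    estimates. Pairs seen fewer than min_count times are skipped."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = n_uni - 1  # number of adjacent pairs
    scores: Dict[Tuple[str, str], float] = {}
    for (x, y), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores
```

PMI estimates from very low counts are unreliable and tend to overrate rare pairs, which is why the sketch lets the caller discard pairs below a frequency threshold.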

Finally, we will discuss methods for assembling low-level phrases into complete parse trees. To do so, we require something like case-frame relations. If a domain model provides semantic frames, it is possible to do semantic interpretation directly on a stream of low-level phrases, making partial parsing useful as a technique for cleaning up after traditional parsers. Where domain models are not available, methods have been developed for inducing syntactic and semantic frames from a corpus, using partial parsing as a preprocessing step.

Steven Abney: PhD, MIT Department of Linguistics, 1987. 1987-1993: Member of Technical Staff, Bell Communications Research. 1993-present: Assistant Professor, Computational Linguistics, University of Tübingen. Areas of research: parsing unrestricted text, stochastic methods, psycholinguistic modelling, phrase structure.


Machine Translation (Wednesday)

Louisa Sadler, University of Essex, UK

The goal of achieving high quality automatic translation has long provided an impetus for work in NLP. There has been much activity in the field in recent years, with a number of developments (such as the use of statistical or mixed approaches) promising significant progress in the development of practical working systems.

This tutorial is directed towards those who would like to be made aware of current research in Machine Translation. The focus will mainly be on the architecture of machine translation systems, surveying the major current approaches (rule-based, statistical, mixed), although issues such as controlled input, user interaction, translation aids, multilingual generation and the evaluation of MT systems will also be touched on.

In looking at approaches based on the formulation of explicit linguistic rules, we will start by considering the traditional distinction between interlingual and transfer approaches. We will consider how an interlingua may be defined and the problems this raises, looking at some proposed interlinguas. We will briefly examine traditional transfer systems, focussing on the problem of how (or whether) bilingual equivalences can be established, before discussing more recent proposals permitting a more flexible view of translational equivalence (flexible transfer, (multilingual) type hierarchies, translation by abduction, correspondence and negotiation).

We will also look at statistical and mixed (hybrid) approaches to MT (translation by analogy, example-based translation, etc.), considering, inter alia, issues such as quality of translation, robustness, and the acquisition and use of large data sets in such systems. This part of the tutorial will also briefly review work on the automatic acquisition of terminological, lexical and grammatical resources for MT.

Louisa Sadler teaches Computational Linguistics and syntax at the University of Essex, UK. She has worked on a number of MT and related projects since 1985 and is currently interested in flexible and correspondence-based approaches to MT. She is the author or co-author of a number of articles and of a recent introductory book on MT.


Context, Information Structure, Focus and Ellipsis (Wednesday)

Stephen Pulman, SRI Cambridge and University of Cambridge

This tutorial will examine some recent approaches to the interpretation of constructs that are sensitive to context and information structure, in particular intonational focus, focus-sensitive particles, and ellipsis.

I will describe some influential linguistic theories of ellipsis and focus, and also survey some recent computationally inspired approaches using notions such as 'higher-order unification', discourse grammar and 'most specific common denominators'.

Finally, I will look at how some of these theories might be implemented so as to achieve reasonable analysis coverage of sentences involving ellipsis or focus. I will also look at how to generate sentences involving ellipsis or focus in appropriate contexts.

Stephen Pulman is a lecturer at the University of Cambridge Computer Laboratory and is Director of SRI International Cambridge Computer Science Research Centre. His current research interests are in computational semantics and dialogue in the context of spoken language understanding systems.


NLP meets Multimedia: Coordinating Language, Graphics, and Gestures (Wednesday)

Wolfgang Wahlster and Elisabeth André, German Research Center for AI (DFKI), Saarbrücken

The goal of this tutorial is to survey a new generation of intelligent multimedia human-computer interfaces with the ability to interpret some forms of multimedia input and to generate coordinated multimedia output. The tutorial is organized into four sections: from images to text, from text to images, coordinating gestures and language, and integrating multiple media in adaptive presentation systems.

Over the past few years, researchers have begun to explore how to translate visual information into natural language. A major practical advantage of describing images in natural language is that the degree to which the visual information is condensed can be chosen to suit the application. There are many promising applications in medical technology, remote sensing, traffic control and other surveillance tasks.

Work in the inverse direction, the generation of images from natural language text, has shown how a physically based semantics of motion verbs and locative prepositions can be seen as conveying spatial, kinematic and temporal constraints, thereby enabling a system to create an animated graphical simulation of events described by natural language utterances. There is an expanding range of exciting applications for these methods such as advanced simulation, entertainment, animation and CAD systems.

The use of deictic gestures in parallel with verbal descriptions is of great importance for multimedia interfaces, because it simplifies and speeds up reference to objects in a visual context. However, natural pointing behavior can be ambiguous and vague, so without a careful analysis of the discourse context of a gesture there is a high risk of reference failure. We will discuss the state of the art of gesture interpretation and generation and show how explicit meanings can be given to pointing behavior in terms of a formal semantics of the visual world.

In the fourth section of this tutorial, we will present a new generation of intelligent multimedia systems that goes beyond the canned text, predesigned graphics and prerecorded images and sounds typically found in today's commercial multimedia systems. Intelligent multimedia presentation systems include a number of key processes: content planning (determining what information should be presented in a given situation), medium selection (apportioning the selected information to text and graphics), presentation design (determining how text and graphics can be used to communicate the selected information), and coordination (resolving conflicts and maintaining consistency between text and graphics). We will show that many of the fundamental concepts developed to date in computational linguistics can be adapted so that they become useful for text-picture combinations as well. We will address key applications such as multimedia helpware, information retrieval and analysis, authoring, training, monitoring, and decision support.

Wolfgang Wahlster is a Professor of Artificial Intelligence in the Department of Computer Science at the University of Saarbrücken, Germany, where he currently serves as a Scientific Director of DFKI. He received his diploma and doctoral degree in computer science from the University of Hamburg. Since 1975 he has been working in the field as a principal investigator in various natural language projects, including HAM-ANS, WISBER, SC, XTRA, VITRA, PRACMA, WIP, and VERBMOBIL. He has published more than 150 technical papers on natural language processing and AI. His current research includes intelligent multimodal interfaces, user modeling, natural language scene description, intelligent help systems, deductive plan recognition, and speech translation. He is an AAAI Fellow and a recipient of the Fritz Winter Award for his research on cooperative user interfaces. Prof. Wahlster served as the Conference Chair for IJCAI-93 in Chambéry and as the Chair of the Board of Trustees of IJCAII from 1991 to 1993. He is currently the Chair of the Association of German AI Institutes (AKI).

Elisabeth André studied computer science at the University of Saarbrücken, Germany. Her thesis work dealt with the generation of natural language scene descriptions in the project VITRA. Since 1988, she has been working as a research scientist in the Intelligent User Interfaces group at DFKI on the WIP and PPP projects. Her current research focuses on multimedia communication, intelligent user interfaces and knowledge-based presentation systems. She is the author of over 40 scientific papers on natural language generation and multimedia communication. In January 1994, she was elected European Representative of the ACL Special Interest Group on Multimedia Language Processing (SIGMEDIA).


IMS Stuttgart / www@ims.uni-stuttgart.de / Fri Sep 30 16:03:04 1994 (oli)