Overview

Note that the latest version of CWB is available at http://cwb.sourceforge.net

(The IMS version of 2003 is still available.)


 
 



To support work in lexicography and terminology, IMS has developed a workbench for full-text retrieval from large textual resources ('corpora'). This work was initiated by the TC Project ('Text Corpora and Tools for their Exploitation').

Applications

The IMS Corpus Workbench is used for

Features

Query language

See the overview of the query syntax and some more sample queries.

Display of results

Corpus Administration and Preparation

Retrieval

The query language is interpreted by the 'Corpus Query Processor' (CQP). CQP requires corpora to be encoded in a specific format and registered before they can be queried.
There used to be a Motif-based graphical user interface, 'xkwic', which made access to CQP more convenient for non-programmers. It has not been updated for several years and does not seem to run on newer operating system versions, so the Corpus Query Processor is now a command-line tool only.

At IMS, the largest corpus currently being handled by the Corpus Workbench is a German newspaper corpus which consists of about 200 million tokens, annotated with lemmata, two different part-of-speech tag sets, and sentence boundaries.
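As a rough illustration of what token-level annotation of this kind makes possible, the following Python sketch models a tiny corpus in memory and searches it for a sequence of attribute constraints. It is a toy model under stated assumptions, not CQP's query syntax or implementation: the names Token and find_matches, the sample sentences, and the tag values are all hypothetical, and only one part-of-speech tag set is shown for brevity.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One corpus position with its annotations (hypothetical layout)."""
    word: str
    lemma: str
    pos: str                  # a single tag set only, for brevity
    sent_start: bool = False  # True if this token opens a sentence


def find_matches(tokens, pattern):
    """Return the start indices where `pattern` matches consecutive tokens.

    `pattern` is a list of dicts, one per token position, mapping an
    attribute name to a regular expression that must match the whole
    attribute value. Matches never cross a sentence boundary.
    """
    hits = []
    for start in range(len(tokens) - len(pattern) + 1):
        window = tokens[start:start + len(pattern)]
        # A sentence start after the first token of the window means the
        # window spans two sentences, so it is skipped.
        if any(tok.sent_start for tok in window[1:]):
            continue
        if all(
            re.fullmatch(regex, getattr(tok, attr))
            for tok, constraints in zip(window, pattern)
            for attr, regex in constraints.items()
        ):
            hits.append(start)
    return hits


# Invented sample data: two short German sentences.
corpus = [
    Token("Die", "die", "ART", sent_start=True),
    Token("Zeitungen", "Zeitung", "NN"),
    Token("berichten", "berichten", "VVFIN"),
    Token(".", ".", "$."),
    Token("Eine", "eine", "ART", sent_start=True),
    Token("Zeitung", "Zeitung", "NN"),
    Token("schweigt", "schweigen", "VVFIN"),
    Token(".", ".", "$."),
]

# An article, then any form of the lemma "Zeitung", then a full verb.
pattern = [{"pos": "ART"}, {"lemma": "Zeitung"}, {"pos": "VV.*"}]
print(find_matches(corpus, pattern))  # prints [0, 4]
```

Because the search is expressed over annotations rather than surface strings, the singular and plural forms of the noun are both found with one constraint on the lemma. A linear scan like this would not scale to a corpus of 200 million tokens; it is meant only to show the kind of question such annotation supports.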


Background papers

Oli Christ: "A modular and flexible architecture for an integrated corpus query system". COMPLEX'94, Budapest, 1994.

Oli Christ and B. M. Schulze: "Ein flexibles und modulares Anfragesystem für Textcorpora". Tagungsbericht des Arbeitstreffen Lexikon + Text. Niemeyer, Tübingen, 1995.

Contact

Dr. Ulrich Heid, University of Stuttgart, Institute for Natural Language Processing, Azenbergstr. 12, 70174 Stuttgart, Germany
Uli.Heid@ims.uni-stuttgart.de, phone: +49-711-685-81373, fax: +49-711-685-81366