Overview

Note that the latest version of CWB is available at http://cwb.sourceforge.net
(The IMS version is still available in its version of 2003 here)


The previous version of this page can be found here

In order to support work in the fields of lexicography and terminology, IMS has developed a workbench for full-text retrieval from large textual resources ('corpora'). This work was initiated by the TC Project ('Text Corpora and Tools for their Exploitation').

Applications

The IMS Corpus Workbench is used for

Data-driven linguistics:
Extraction of linguistic knowledge from textual resources or cross-checking of linguistic assumptions against large texts.
Lexicography:
Corpus-based evidence for lexical descriptions.
Terminology:
Extraction of terms and bootstrapping of terminological resources.

Features

Query language

unrestricted number of attributes per corpus position
regular expressions over attribute values of individual corpus positions (e.g. wild cards for word forms, part-of-speech values)
regular expressions over sequences of corpus positions
(partial) support of structural annotations (e.g. SGML)
incremental concordancing
application of a query to all items of a list
'virtual attributes', i.e. runtime access to external applications (e.g. a thesaurus)
queries on parallel translated texts

See the overview of the query syntax and some more sample queries.
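
To make the first three features more concrete, here is a minimal Python sketch of matching regular expressions over the attribute values of individual corpus positions and over short sequences of positions. It is an illustration only, not CQP's implementation or query syntax: the toy corpus, the attribute names (word, pos, lemma) and the helper names position_matches and find_sequences are all assumptions made for this example, and the pattern is a fixed-length sequence without repetition operators.

```python
import re

# Toy corpus (hypothetical data): every corpus position carries several attributes.
corpus = [
    {"word": "The", "pos": "DT", "lemma": "the"},
    {"word": "old", "pos": "JJ", "lemma": "old"},
    {"word": "houses", "pos": "NNS", "lemma": "house"},
    {"word": "were", "pos": "VBD", "lemma": "be"},
    {"word": "sold", "pos": "VBN", "lemma": "sell"},
    {"word": "quickly", "pos": "RB", "lemma": "quickly"},
]


def position_matches(token, constraints):
    """True if every constrained attribute of this position fully matches its regex."""
    return all(
        re.fullmatch(regex, token.get(attribute, "")) is not None
        for attribute, regex in constraints.items()
    )


def find_sequences(tokens, pattern):
    """Return (start, end) index pairs where consecutive positions satisfy the pattern.

    `pattern` is a list with one constraint dict per corpus position.
    """
    length = len(pattern)
    hits = []
    for start in range(len(tokens) - length + 1):
        if all(position_matches(tokens[start + i], pattern[i]) for i in range(length)):
            hits.append((start, start + length - 1))
    return hits


# An adjective followed by a noun whose word form starts with "hous".
pattern = [{"pos": "JJ"}, {"word": "hous.*", "pos": "NN.*"}]
hits = find_sequences(corpus, pattern)
for start, end in hits:
    print(start, end, " ".join(t["word"] for t in corpus[start:end + 1]))
```

Because each position is a mapping from attribute name to value, the sketch places no limit on the number of attributes per position; a constraint simply names the attributes it cares about.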

Display of results

user-definable size of 'keyword in context' display
'keyword in context' lines can be sorted in various ways
frequency counts, e.g. for word combinations
multilingual concordances from aligned corpora
HTML and LaTeX output supported
query history
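
The display features listed above can be sketched in a few lines of Python: a 'keyword in context' view with a user-definable context size, sorting of the resulting lines, and a frequency count over word combinations. This is a simplified illustration under stated assumptions, not the workbench's own code; the token list and the names kwic and format_line are hypothetical.

```python
from collections import Counter


def kwic(tokens, hits, width=2):
    """Build (left context, match, right context) triples for each (start, end) hit."""
    lines = []
    for start, end in hits:
        left = tokens[max(0, start - width):start]
        match = tokens[start:end + 1]
        right = tokens[end + 1:end + 1 + width]
        lines.append((left, match, right))
    return lines


def format_line(line):
    """Render one concordance line with the match marked by angle brackets."""
    left, match, right = line
    return f"{' '.join(left)} <{' '.join(match)}> {' '.join(right)}"


# Hypothetical sample text; every occurrence of "sat" is treated as a hit.
tokens = "the dog sat on the mat and the cat sat on the rug".split()
hits = [(i, i) for i, token in enumerate(tokens) if token == "sat"]

lines = kwic(tokens, hits, width=2)

# One possible sort order: by the word immediately to the left of the match.
by_left = sorted(lines, key=lambda line: line[0][-1:])

# Frequency count of the match combined with the word that follows it.
counts = Counter(" ".join(match + right[:1]) for _, match, right in lines)

for line in by_left:
    print(format_line(line))
print(counts.most_common())
```

Changing the width argument changes the size of the displayed context, and other sort keys (for example the first word of the right context) can be swapped in the same way.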

Corpus Administration and Preparation

registration of corpora
'encoding' of corpora, i.e. indexing (and compression)
(for text sources in one-word-per-line format, using ISO8859/Latin-1 8bit character sets, and maybe others)
For example, the BNC corpus with part-of-speech and lemma annotation will need about 1 GB of disk space.
incremental addition of types of corpus annotations ('attributes'). E.g. add part-of-speech values to a corpus once you have access to a POS-tagger.
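
As a rough illustration of what 'encoding' a one-word-per-line source involves, the following Python sketch reads such a text and builds, for each attribute, a lexicon of types, the corpus as a sequence of type ids, and an index from type id to corpus positions. It is a sketch under assumptions, not the workbench's actual on-disk format: the tab-separated column layout, the attribute names and the function name encode are hypothetical, and no compression is performed.

```python
from collections import defaultdict

# Hypothetical input: one token per line, attributes separated by tabs.
SAMPLE = "\n".join([
    "The\tDT\tthe",
    "houses\tNNS\thouse",
    "are\tVBP\tbe",
    "old\tJJ\told",
    ".\t.\t.",
    "The\tDT\tthe",
])


def encode(text, attributes=("word", "pos", "lemma")):
    """Index a one-word-per-line text, keeping each attribute in its own store."""
    encoded = {
        name: {"lexicon": {}, "ids": [], "index": defaultdict(list)}
        for name in attributes
    }
    position = 0
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        columns = line.split("\t")
        if len(columns) != len(attributes):
            raise ValueError(f"expected {len(attributes)} columns: {line!r}")
        for name, value in zip(attributes, columns):
            store = encoded[name]
            # Assign the next free id to a value seen for the first time.
            type_id = store["lexicon"].setdefault(value, len(store["lexicon"]))
            store["ids"].append(type_id)
            store["index"][type_id].append(position)
        position += 1
    return encoded


encoded = encode(SAMPLE)
word_store = encoded["word"]
print(word_store["ids"])
print(word_store["index"][word_store["lexicon"]["The"]])
```

Keeping one independent store per attribute is a design choice of this sketch; it mirrors the idea that a further annotation layer, such as part-of-speech values, could be added later without touching the attributes already indexed.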

Retrieval

The query language is interpreted by the 'Corpus Query Processor' (CQP). CQP requires corpora to be registered and encoded in a specific manner.
There used to be a Motif-based graphical user interface, 'xkwic', which made access to CQP more convenient for non-programmers. It has not been updated for a couple of years and does not seem to run with newer versions of the operating systems, so the Corpus Query Processor is now a command line tool only.
At IMS, the largest corpus currently being handled by the Corpus Workbench is a German newspaper corpus which consists of about 200 million tokens, annotated with lemmata, two different part-of-speech tag sets, and sentence boundaries.

Background papers

Oli Christ: "A modular and flexible architecture for an integrated corpus query system". COMPLEX'94, Budapest, 1994. (.ps.gz)
Oli Christ and B.M. Schulze: "Ein flexibles und modulares Anfragesystem für Textcorpora". Tagungsbericht des Arbeitstreffen Lexikon + Text. Niemeyer, Tübingen, 1995. (.ps.gz)

Contact

Dr. Ulrich Heid, University of Stuttgart, Institute for Natural Language Processing, Azenbergstr. 12, 70174 Stuttgart, Germany
Uli.Heid@ims.uni-stuttgart.de, phone: +49-711-685-81373, fax: +49-711-685-81366
