A more detailed description of the TC corpus query processor

The query processor cqp (short for "corpus query processor") is a query interpreter based on a command language. Besides queries, it accepts additional commands which affect the behaviour of the interpreter or set the output format in which a query result is printed.

cqp can be used in two modes:


Xkwic Screenshot (GIF)
cqp does not access the files in which corpus data is stored directly, but only through a physical data layer which abstracts from all aspects of data access.

Queries can be seen as a set of constraints which must be fulfilled by a sequence of corpus "entities" (words, in the most common case). These constraints can be expressed by several means:

cqp supports the concept of subcorpora. A subcorpus is simply the result of a query. A query can be restricted to a subcorpus (that is, to the result of an earlier query), which greatly reduces the search space and therefore improves efficiency. Furthermore, set operators can be used to create new subcorpora as the union, intersection or difference of existing ones. Queries could produce the same results, but set operators work much faster. Subcorpora can be saved to files and reloaded in subsequent cqp sessions.
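The idea of subcorpora as query results can be illustrated with a small Python sketch. This is not cqp's implementation or its query syntax; it only models the concept under simple assumptions: the corpus is a list of tokens, a query is a list of per-token constraints, and a subcorpus is a set of inclusive (start, end) position pairs. The names `run_query`, `pos` and `word_in` are hypothetical helpers introduced for the illustration.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Token = Dict[str, str]               # e.g. {"word": "old", "pos": "ADJ"}
Constraint = Callable[[Token], bool]
Match = Tuple[int, int]              # inclusive (start, end) corpus positions


def run_query(corpus: List[Token],
              constraints: List[Constraint],
              within: Optional[Set[Match]] = None) -> Set[Match]:
    """Return every span whose tokens satisfy the constraints in order.

    If `within` is given, only spans lying inside one of its intervals
    are considered (a query restricted to a subcorpus).
    """
    n = len(constraints)
    if n == 0:
        return set()
    if within is None:
        within = {(0, len(corpus) - 1)}      # search the whole corpus
    matches: Set[Match] = set()
    for lo, hi in within:
        # The last admissible start keeps the whole span inside [lo, hi].
        for start in range(lo, hi - n + 2):
            if all(c(corpus[start + i]) for i, c in enumerate(constraints)):
                matches.add((start, start + n - 1))
    return matches


def pos(tag: str) -> Constraint:
    """Constraint: the token carries the given part-of-speech tag."""
    return lambda tok: tok["pos"] == tag


def word_in(*words: str) -> Constraint:
    """Constraint: the token's word form is one of `words`."""
    return lambda tok: tok["word"] in words


words = "the old man saw the young dog".split()
tags = ["DET", "ADJ", "NOUN", "VERB", "DET", "ADJ", "NOUN"]
corpus = [{"word": w, "pos": t} for w, t in zip(words, tags)]

# A subcorpus is just the result of a query.
noun_phrases = run_query(corpus, [pos("DET"), pos("ADJ"), pos("NOUN")])

# A later query can be restricted to that subcorpus.
young = run_query(corpus, [word_in("young")], within=noun_phrases)

# Set operators build new subcorpora without searching the corpus again.
adjectives = run_query(corpus, [pos("ADJ")])
the_or_old = run_query(corpus, [word_in("the", "old")])
union = adjectives | the_or_old
intersection = adjectives & the_or_old
difference = adjectives - the_or_old
```

In this model the set operations only touch the stored position pairs, while a query has to test tokens against constraints, which gives an intuition for why set operators can be much cheaper than an equivalent query.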

Our largest encoded corpus is currently a 200-million-word German newspaper corpus with part-of-speech tags as well as sentence and article boundaries, but this does not seem to be an upper bound on the corpus size the physical layer can handle. Currently, English, French, Italian, and German corpora are in use with cqp (non-ASCII characters can be entered by their octal code or, in most cases, by their LaTeX code).
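To look up the octal code of a non-ASCII character, a small helper like the following may be useful. It is only a sketch and rests on an assumption not stated here: that the corpora use a single-byte encoding such as ISO 8859-1 (Latin-1). The function name `octal_code` is hypothetical, and the exact notation in which cqp expects the code to be typed is not shown.

```python
def octal_code(ch: str, encoding: str = "latin-1") -> str:
    """Return the three-digit octal code of a single character.

    Assumes a single-byte encoding (Latin-1 by default); raises
    ValueError if the character does not encode to exactly one byte.
    """
    data = ch.encode(encoding)
    if len(data) != 1:
        raise ValueError(f"{ch!r} is not a single byte in {encoding}")
    return format(data[0], "03o")


for ch in "äéß":
    print(ch, octal_code(ch))
```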

You may try out a (very limited) demo version of the query processor. Almost all of the interesting features have been turned off, since we have to take care of our CPUs here.

Here's a description of how to get the system.

More detailed information is available through the IMS Corpus Toolbox Homepage.


IMS Stuttgart, Mon Feb 8 11:00:30 1999 (www@ims.uni-stuttgart.de)