IMS Corpus Workbench

General FAQ

This FAQ refers to Version 2.2 of the Corpus Workbench.

Warning Messages issued by Xkwic and CQP

When I start xkwic or cqp, I get a warning message that ~/.cqprc was not found

The file ~/.cqprc can hold individual settings, such as the default corpus or user-defined registry directories. If you don't need to change the default settings, you can create an empty ~/.cqprc with the command
touch ~/.cqprc
and the warning will no longer appear. Instead, a message will report that ~/.cqprc was read successfully (this message will probably disappear in a future release).

When I start xkwic or cqp, I get a warning message that some corpora are not accessible from the machine I'm using.

If you get an error message such as
Data access error (CDA: No error) Perhaps the corpus BLABLA~ is not accessible from the machine you are using.
then there is a file blabla~ in your registry directory. The registry directory should contain only valid registry files for existing corpora. In particular, remember to remove Emacs backup files after editing registry files: any file in the registry directory that is not a valid registry file will lead to errors or even program crashes.
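Stray backup files are easy to overlook. The following is a minimal sketch of a cleanup for Bourne-style shells, assuming that editor backups are recognizable by a trailing ~ in their names. The function name clean_registry and the default path /corpora/registry are hypothetical placeholders, not part of the Corpus Workbench.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Corpus Workbench): delete Emacs
# backup files, i.e. files whose names end in "~", from a registry directory.
clean_registry() {
  dir=$1
  for f in "$dir"/*~; do
    # If nothing matches, the pattern stays unexpanded; skip it.
    [ -f "$f" ] || continue
    echo "removing backup file: $f"
    rm -f "$f"
  done
}

# Placeholder path; set REGISTRY_DIR to your own registry directory.
clean_registry "${REGISTRY_DIR:-/corpora/registry}"
```

The loop deletes every regular file whose name ends in ~, so check the directory listing first if you keep unusual file names there.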

Core dumps, program crashes and other surprises

In general, it's a good idea to limit the size of core dumps on your system to 0, so that no core files are produced at all. This keeps useless core files from cluttering up your file system. In your shell initialization file (~/.cshrc or ~/.tcshrc), include
limit coredumpsize 0k
(zero k). This is (t)csh syntax; in Bourne-style shells (sh, jsh, bash, ksh) the core file size is limited with the ulimit builtin instead.
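A minimal sketch of the Bourne-shell counterpart to limit coredumpsize 0k, assuming a shell whose ulimit accepts the -c (core file size) option, as bash and ksh do. Place it in your shell initialization file (for example ~/.profile or ~/.bashrc):

```shell
# Disable core files: the maximum core file size becomes 0.
ulimit -c 0

# Print the current core file size limit; after the line above it prints 0.
ulimit -c
```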

Xkwic and cqp dump core when I enter a query such as "?";

This is a known bug, but fortunately not within my responsibility: it is caused by a regular expression library that we use but did not write. The "?" modifies the preceding regular expression, which in this case does not exist. If you are searching for a question mark in your text, escape it with a backslash:
"\?";
Some other characters must also be escaped when searched for literally; see the CQP User's Manual for a complete list.

Character sets other than ISO-8859-Latin-1

Users in Prague have apparently succeeded in using cqp and xkwic with Czech fonts. Both cqp and xkwic are largely independent of the character set: the implementation contains no special treatment of character sets, so using a different character set may work, but sensible handling of its characters cannot be guaranteed. For example, it is not possible to change the collation order of the characters. Nevertheless, both cqp and xkwic seem to cope with Czech, provided you change the relevant system settings outside the tools.

Known bugs

For very large corpora (several hundred million tokens) with many positional attributes, the address space available to the process may be exhausted (mmap: no space). This problem may be circumvented by raising the limit on the memory a program may use, with the limit command (csh/tcsh) or the ulimit command (sh); the limit that has to be changed is datasize. This problem has occurred mainly on Solaris 2.x systems.
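A minimal sketch for Bourne-style shells, to be run before starting cqp or xkwic. It raises the soft data segment limit as far as the hard limit permits (only the superuser can raise the hard limit itself), and assumes a ulimit that supports the -S, -H and -d options, as bash and ksh do.

```shell
# Show the current soft and hard limits for the data segment size.
ulimit -S -d
ulimit -H -d

# Raise the soft limit to the hard limit (often "unlimited").
# csh/tcsh equivalent: limit datasize unlimited
ulimit -S -d "$(ulimit -H -d)"
```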

IMS Stuttgart, Mon Feb 15 15:07:59 1999