Research Interests and Projects
I left the IMS in March 2005. You can always find my current homepage at http://purl.org/stefan.evert.
Statistical Methods in Natural-Language Processing
My main research interest is the application of statistical methods in natural-language processing, with an emphasis on the statistical analysis of cooccurrence data. My current PhD project collects, explains, and discusses the various mathematical equations that have been used to measure the degree of association between the components of bigrams. Some well-known examples of such association measures for word pairs are Mutual Information (MI), t-score, chi-squared, log-likelihood, and Fisher's exact test. The PhD thesis is accompanied by an open-source software package, the UCS toolkit, which includes reference implementations for all measures as well as various tools for empirical analysis and evaluation. Learn more about association measures and the statistics of word cooccurrences on my website:
I am also interested in lexical statistics and its application to (morphological) productivity.
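As a rough illustration of the association measures mentioned above, the following Python sketch computes four of them (MI, t-score, chi-squared, and log-likelihood) from the 2×2 contingency table of a word pair. It is not taken from the UCS toolkit; the function name `association_scores`, the cell names `o11` to `o22`, and the toy counts are assumptions made for this example, and the formulas are the common textbook variants (Pearson's chi-squared without continuity correction, pointwise MI with a base-2 logarithm). Fisher's exact test is omitted to keep the sketch short.

```python
import math


def association_scores(o11, o12, o21, o22):
    """Score a word pair (w1, w2) from its 2x2 contingency table.

    o11: pairs with w1 and w2       o12: pairs with w1 but not w2
    o21: pairs with w2 but not w1   o22: pairs with neither word
    Assumes o11 > 0 and that all row and column totals are non-zero.
    """
    observed = [[o11, o12], [o21, o22]]
    n = o11 + o12 + o21 + o22
    rows = [o11 + o12, o21 + o22]   # row totals
    cols = [o11 + o21, o12 + o22]   # column totals

    # Expected frequencies under the null hypothesis of independence.
    expected = [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]
    e11 = expected[0][0]

    # (Pointwise) Mutual Information: log ratio of observed to expected.
    mi = math.log2(o11 / e11)

    # t-score: difference scaled by an estimate of the standard deviation.
    t_score = (o11 - e11) / math.sqrt(o11)

    # Pearson's chi-squared statistic over all four cells.
    chi_squared = sum(
        (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(2)
        for j in range(2)
    )

    # Log-likelihood ratio (G^2); empty cells contribute nothing.
    log_likelihood = 2 * sum(
        observed[i][j] * math.log(observed[i][j] / expected[i][j])
        for i in range(2)
        for j in range(2)
        if observed[i][j] > 0
    )

    return {
        "MI": mi,
        "t-score": t_score,
        "chi-squared": chi_squared,
        "log-likelihood": log_likelihood,
    }


if __name__ == "__main__":
    # Toy counts, invented for illustration only.
    for name, value in association_scores(30, 70, 70, 830).items():
        print(f"{name}: {value:.3f}")
```

All four measures compare the observed cooccurrence frequency with the frequency expected if the two words were independent, so each of them is zero when the observed table matches the expected one exactly.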
The IMS Corpus Workbench
I have been maintaining and improving the IMS Corpus Workbench (CWB) since 1998. At the moment, I am preparing the official release of Version 3.0, which is about two (make that three) years late. :o)
While the original design was centered around a monolithic corpus query processor (which existed in separate command-line and GUI versions), recent development has emphasised modularity and flexibility, using Perl as a glue language. A good example of this approach is the CQPDemo web front-end that is distributed with the CWB/Perl interface.
The Corpus Query Interface
The Corpus Query Interface (CQi) defines a socket-based client-server protocol providing access to most features of the IMS Corpus Workbench. It will serve as a standardised interface for a variety of programming languages such as Java, Perl, C, Tcl, Prolog, and many more. Since the CQi is still in the early stages of development, no official documentation is available. However, you can take a look at the CQi tutorial (.ps, .pdf) to get an idea of the basic concepts of CQi programming.
The NITE Project
I have been involved in the EU-funded NITE project, which aimed to develop software for the annotation, display, and analysis of multi-modal human-human and human-machine dialogue data. My contributions to the project include the specification of the NXT Object Model (an extension of the XML Document Object Model to a collection of intersecting ordered trees) and the NXT Query Language, which are implemented in the NITE XML Toolkit (NXT).