Research Interests and Projects
I left the IMS in March 2005.
You can always find my current homepage at
http://purl.org/stefan.evert.
Statistical Methods in Natural-Language Processing
My main research interest is the application of statistical
methods in natural-language processing, with an emphasis
on the statistical analysis of cooccurrence data. My current
PhD project collects, explains, and discusses the various
mathematical equations that have been used to measure the degree of
association between the components of bigrams. Some well-known
examples of such association measures for word pairs
are Mutual Information (MI), t-score, chi-squared, log-likelihood,
and Fisher's exact test. The PhD thesis is accompanied by an open-source software package,
the UCS toolkit,
which includes reference implementations for all measures as well as
various tools for empirical analysis and evaluation. Learn more about
association measures and the statistics of word cooccurrences on my website.
I am also interested in lexical statistics and its application to
(morphological) productivity.
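
As a rough illustration of the association measures named above, the following Python sketch computes MI, t-score, chi-squared, and log-likelihood for a single word pair from its 2x2 contingency table. It uses the common textbook definitions and is not taken from the UCS toolkit, whose reference implementations may differ in detail (for example in the choice of logarithm base or in corrections applied). The function name `association_scores` and its parameters are hypothetical and chosen only for this example.

```python
import math


def association_scores(o11, f1, f2, n):
    """Textbook association scores for one word pair (illustrative only).

    o11: cooccurrence frequency of the pair
    f1:  frequency of the first word in first position
    f2:  frequency of the second word in second position
    n:   total number of pairs in the sample
    """
    if not (0 < f1 < n and 0 < f2 < n and 0 < o11 <= min(f1, f2)):
        raise ValueError("need 0 < o11 <= min(f1, f2) and 0 < f1, f2 < n")

    # Observed 2x2 contingency table, derived from the marginal frequencies.
    observed = [o11, f1 - o11, f2 - o11, n - f1 - f2 + o11]
    if observed[3] < 0:
        raise ValueError("frequencies are inconsistent with sample size n")

    # Expected frequencies under the null hypothesis of independence.
    expected = [
        f1 * f2 / n,
        f1 * (n - f2) / n,
        (n - f1) * f2 / n,
        (n - f1) * (n - f2) / n,
    ]
    e11 = expected[0]

    # Mutual Information: base-2 log ratio of observed to expected.
    mi = math.log2(o11 / e11)
    # t-score: difference scaled by an estimate of the standard deviation.
    t_score = (o11 - e11) / math.sqrt(o11)
    # Pearson's chi-squared statistic (no continuity correction).
    chi_squared = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Log-likelihood ratio (G2); empty cells contribute zero.
    log_likelihood = 2 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )

    return {
        "MI": mi,
        "t-score": t_score,
        "chi-squared": chi_squared,
        "log-likelihood": log_likelihood,
    }


if __name__ == "__main__":
    # Made-up frequencies, purely for demonstration.
    for name, score in association_scores(30, 100, 100, 1000).items():
        print(f"{name}: {score:.3f}")
```

When the observed cooccurrence frequency equals the expected frequency, all four scores are zero. Higher values indicate stronger evidence of association, although the measures rank word pairs quite differently in practice.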
The IMS Corpus Workbench
I have been maintaining and improving the
IMS Corpus Workbench
(CWB) since 1998. At the moment, I am preparing the official release of
Version 3.0, which is about two to three years late. :o)
While the original design was centered around a monolithic corpus query processor (which existed
in separate command-line and GUI versions), recent development has emphasised modularity and
flexibility, using Perl as a glue language.
A good example of this approach is the
CQPDemo
web front-end that is distributed with the CWB/Perl interface.
The Corpus Query Interface
The Corpus Query Interface defines a socket-based client-server
protocol providing access to most features of the IMS Corpus
Workbench. It will serve as a standardised interface for a variety of
programming languages such as Java, Perl, C,
Tcl, Prolog, and many more. Since the CQi is still in
the early stages of development, no official documentation is
available. However, you can take a look at the CQi tutorial
(.ps, .pdf)
to get an idea of the basic concepts of CQi programming.
The NITE Project
I have been involved in the EU-funded
NITE project,
which aimed to develop software for the annotation, display, and
analysis of multi-modal human-human and human-machine dialogue data.
My contributions to the project include the specification of the
NXT Object Model
(an extension of the XML Document Object Model to a collection of intersecting ordered trees)
and the NXT Query Language,
which are implemented in the
NITE XML Toolkit (NXT).