Project: Text Corpora and Exploration Tools

Text material collected for German, French and Italian, a representation for texts and markup, and a query language and corpus access system for linguistic exploration of the text material

Textcorpora und Erschliessungswerkzeuge

Duration

04-1993 to 12-1994, extended until 10-1996

Short description

In 1993/1994 the project collected text material for German, French and Italian and developed a representation for texts and markup, along with a query language and a corpus access system for linguistic exploration of the text material. Texts and analysis results are kept separate from each other so that the system stays flexible and extensible; a particular approach to storage and representation makes this possible. The tool components under development, some language-specific and some general, range from morphosyntactic analysis to partial parsing, and from mutual information, t-score, collocation extraction and clustering to HMM-based tagging and n-gram tagging. Research on statistical models for noun phrases, verb-object collocations and similar phenomena is ongoing.

Funding body

The Ministry of Science and Research of the Land Baden-Württemberg (MWF, Stuttgart), in 1993/1994 and 1995/1996, in the framework of the Forschungsschwerpunktprogramm Baden-Württemberg.

Long description

In 1993/1994 the project collected text material for German, French and Italian and developed a representation for texts and markup, along with a query language and a corpus access system for linguistic exploration of the text material. Texts and analysis results are kept separate from each other so that the system stays flexible and extensible; a particular approach to storage and representation makes this possible. The tool components under development, some language-specific and some general, range from morphosyntactic analysis to partial parsing, and from mutual information, t-score, collocation extraction and clustering to HMM-based tagging and n-gram tagging. Research on statistical models for noun phrases, verb-object collocations and similar phenomena is ongoing.
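
As an illustration of the association measures named above, the following minimal sketch computes mutual information and t-score for adjacent word pairs. It is not the project's implementation: the function name `association_scores`, the restriction to adjacent pairs, and the use of the token count as sample size are assumptions made for the example, and the formulas are the common textbook approximations.

```python
import math
from collections import Counter


def association_scores(tokens):
    """Return {(w1, w2): (mutual_information, t_score)} for adjacent pairs.

    Hypothetical helper for illustration only.
    """
    n = len(tokens)  # assumption: corpus size in tokens is the sample size
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (w1, w2), observed in bigrams.items():
        # Expected pair frequency if w1 and w2 occurred independently.
        expected = unigrams[w1] * unigrams[w2] / n
        # Pointwise mutual information: log ratio of observed to expected.
        mi = math.log2(observed / expected)
        # t-score approximation: (O - E) / sqrt(O).
        t_score = (observed - expected) / math.sqrt(observed)
        scores[(w1, w2)] = (mi, t_score)
    return scores


if __name__ == "__main__":
    sample = "a b a b c d".split()
    for pair, (mi, t) in sorted(association_scores(sample).items()):
        print(pair, round(mi, 3), round(t, 3))
```

Mutual information tends to favour rare pairs, while the t-score favours frequent ones, which is one reason the two measures are often reported side by side in collocation extraction.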

Funding:
Funded at 100% by the Ministry of Science and Research of the Land Baden-Württemberg (MWF, Stuttgart), in 1993/1994 and 1995/1996, in the framework of the Forschungsschwerpunktprogramm Baden-Württemberg.
Partners:
Resources

The part-of-speech tagset for German, mainly developed by Anne Schiller and Christine Thielen (University of Tübingen)

The various taggers which have been built by André Kempe and Helmut Schmid

The IMS Corpus Workbench which has been developed by Oliver Christ and Bruno Schulze

Ulrich Heid

Apl. Prof. PD Dr.