SubCat-Extractor - Induction of Verb Subcategorisation from Dependency Parses

A tool to obtain verb subcategorisation data from parsed German corpora

Type
Tool
Author
Silke Scheible, Sabine Schulte im Walde, Marion Weller, Max Kisselew
Description

The SubCat-Extractor is a tool to obtain verb subcategorisation data from parsed German corpora. It is based on a set of detailed rules that go beyond what is directly accessible in the parses. The extracted subcategorisation database uses a compact but linguistically detailed and flexible one-line-per-clause format that combines verb information, complement information and sentence information.

The input format required by the SubCat-Extractor is parsed text produced by Bernd Bohnet’s MATE dependency parser (Bohnet, 2010). The parses are in the tab-separated CoNLL format. The extraction rules are specified for part-of-speech tags from the STTS tagset (Schiller et al., 1999) and syntactic functions from TIGER (Brants et al., 2004).
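To make the input format more concrete, the following minimal Python sketch reads tab-separated CoNLL-style parses and lists the direct dependents of each verb. It is not the SubCat-Extractor and does not reproduce its extraction rules, which go beyond what is directly accessible in the parses. The column positions are an assumption (a CoNLL-2009-style layout with 14 columns, using the predicted lemma, tag, head and label columns) and may need adjusting to your files. The function names, the helper `make_row` and the sample sentence are hypothetical and hand-made for illustration.

```python
from collections import defaultdict

# ASSUMPTION: CoNLL-2009-style layout, 0-based column positions of the
# predicted lemma, POS tag, head and dependency label. Adjust as needed.
COL_ID, COL_FORM, COL_LEMMA, COL_POS, COL_HEAD, COL_DEPREL = 0, 1, 3, 5, 9, 11


def read_sentences(lines):
    """Yield each sentence as a list of token rows (lists of column strings).

    Sentences are assumed to be separated by blank lines.
    """
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        sentence.append(line.split("\t"))
    if sentence:
        yield sentence


def verb_dependents(sentence):
    """Map each verb lemma to the (function, form) pairs of its direct dependents."""
    by_head = defaultdict(list)
    for row in sentence:
        by_head[row[COL_HEAD]].append((row[COL_DEPREL], row[COL_FORM]))
    result = {}
    for row in sentence:
        if row[COL_POS].startswith("V"):  # STTS verb tags start with "V", e.g. VVFIN
            result[row[COL_LEMMA]] = by_head.get(row[COL_ID], [])
    return result


def make_row(tok_id, form, lemma, pos, head, deprel):
    """Build one hand-made 14-column line; unused columns are filled with "_"."""
    row = ["_"] * 14
    row[COL_ID], row[COL_FORM], row[COL_LEMMA] = tok_id, form, lemma
    row[COL_POS], row[COL_HEAD], row[COL_DEPREL] = pos, head, deprel
    return "\t".join(row)


# Hand-made sample ("Der Hund jagt die Katze"), not actual parser output.
SAMPLE = [
    make_row("1", "Der", "der", "ART", "2", "NK"),
    make_row("2", "Hund", "Hund", "NN", "3", "SB"),
    make_row("3", "jagt", "jagen", "VVFIN", "0", "--"),
    make_row("4", "die", "der", "ART", "5", "NK"),
    make_row("5", "Katze", "Katze", "NN", "3", "OA"),
    "",
]

for sent in read_sentences(SAMPLE):
    print(verb_dependents(sent))
```

For the sample sentence, the verb lemma jagen is paired with a subject (SB) Hund and an accusative object (OA) Katze. Reading a real file would only require passing an open file handle to `read_sentences` instead of `SAMPLE`.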

The verb subcategorisation databases that have so far been induced are listed here.

Reference

Silke Scheible, Sabine Schulte im Walde, Marion Weller, Max Kisselew
A Compact but Linguistically Detailed Database for German Verb Subcategorisation relying on Dependency Parses from a Web Corpus: Tool, Guidelines and Resource
In: Proceedings of the 8th Web as Corpus Workshop. Lancaster, UK, July 2013.

Download

The SubCat-Extractor is freely available for education, research and other non-commercial purposes.

Sabine Schulte im Walde

Prof. Dr.

Akademische Rätin (Associate Professor)
