SubCat-Extractor - Induction of Verb Subcategorisation from Dependency Parses

A tool to obtain verb subcategorisation data from parsed German corpora

Silke Scheible, Sabine Schulte im Walde, Marion Weller, Max Kisselew

The SubCat-Extractor is a tool to obtain verb subcategorisation data from parsed German corpora. It is based on a set of detailed rules that go beyond what is directly accessible in the parses. The extracted subcategorisation database is represented in a compact but linguistically detailed and flexible format with one line per clause, comprising verb information, complement information and sentence information.

The input format required by the SubCat-Extractor is parsed text produced by Bernd Bohnet’s MATE dependency parser (Bohnet, 2010). The parses follow the tab-separated CoNLL format. The extraction rules are specified for part-of-speech tags from the STTS tagset (Schiller et al., 1999) and syntactic functions from TIGER (Brants et al., 2004).
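
To give a feel for this kind of input, here is a minimal Python sketch that reads tab-separated CoNLL-style parses and lists the direct dependents of each verb. It is not the SubCat-Extractor's rule set, which is considerably more detailed. The column positions are an assumption based on the CoNLL-2009 layout (predicted lemma, POS, head and relation columns) and should be checked against your own parser output; the helper names and the sample sentence are hypothetical.

```python
from collections import defaultdict

# ASSUMPTION: 0-based column positions following the CoNLL-2009 layout,
# using the predicted columns (PLEMMA, PPOS, PHEAD, PDEPREL).
ID, FORM, LEMMA, POS, HEAD, DEPREL = 0, 1, 3, 5, 9, 11


def read_sentences(lines):
    """Yield one sentence at a time as a list of token rows (lists of strings)."""
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():  # a blank line ends the current sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        sentence.append(line.split("\t"))
    if sentence:
        yield sentence


def verb_dependents(sentence):
    """Map each verb (id, lemma) to the (function, form) pairs of its direct dependents."""
    children = defaultdict(list)
    for row in sentence:
        children[row[HEAD]].append(row)
    result = {}
    for row in sentence:
        if row[POS].startswith("V"):  # STTS verb tags such as VVFIN, VAFIN, VVPP
            result[(row[ID], row[LEMMA])] = [
                (child[DEPREL], child[FORM]) for child in children[row[ID]]
            ]
    return result


def make_row(idx, form, lemma, pos, head, deprel):
    """Build one illustrative 14-column line; gold and predicted columns are identical."""
    return "\t".join(
        [idx, form, lemma, lemma, pos, pos, "_", "_", head, head, deprel, deprel, "_", "_"]
    )


# Hypothetical sample sentence: "Der Hund sieht die Katze ."
SAMPLE = [
    make_row("1", "Der", "der", "ART", "2", "NK"),
    make_row("2", "Hund", "Hund", "NN", "3", "SB"),
    make_row("3", "sieht", "sehen", "VVFIN", "0", "--"),
    make_row("4", "die", "der", "ART", "5", "NK"),
    make_row("5", "Katze", "Katze", "NN", "3", "OA"),
    make_row("6", ".", "--", "$.", "5", "--"),
    "",
]

for sent in read_sentences(SAMPLE):
    for (verb_id, lemma), deps in verb_dependents(sent).items():
        print(verb_id, lemma, deps)
```

For the sample sentence, the verb `sehen` is paired with a subject (`SB`) and an accusative object (`OA`), the TIGER functions from which a transitive frame could be read off. Real input would need further handling, for example of auxiliaries, particles and clause boundaries.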

The verb subcategorisation databases that have so far been induced are listed here.


Silke Scheible, Sabine Schulte im Walde, Marion Weller, Max Kisselew
A Compact but Linguistically Detailed Database for German Verb Subcategorisation relying on Dependency Parses from a Web Corpus: Tool, Guidelines and Resource
In: Proceedings of the 8th Web as Corpus Workshop. Lancaster, UK, July 2013.


Please contact the SemRel group to obtain the tool.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA).


SemRel Research Group

Sabine Schulte im Walde

Prof. Dr.

Academic Councillor