SubCat-Extractor - Induction of Verb Subcategorisation from Dependency Parses
Silke Scheible, Sabine Schulte im Walde, Marion Weller, Max Kisselew
The SubCat-Extractor is a tool for obtaining verb subcategorisation data from parsed German corpora. It is based on a set of detailed rules that go beyond what is directly accessible in the parses. The extracted subcategorisation database uses a compact but linguistically detailed and flexible format that covers verb information, complement information and sentence information, with one line per clause.
The SubCat-Extractor requires as input parsed text produced by Bernd Bohnet’s MATE dependency parser (Bohnet, 2010). The parses must be in the tab-separated CoNLL format. The extraction rules are specified for part-of-speech tags from the STTS tagset (Schiller et al., 1999) and syntactic functions from TIGER (Brants et al., 2004).
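To illustrate the kind of input described above, the following is a minimal sketch of reading tab-separated, CoNLL-style dependency parses and listing the syntactic functions of each verb's direct dependents. It is not the SubCat-Extractor's rule set, which goes well beyond direct dependents. The column layout (`COLUMNS`), the function names and the sample sentence are assumptions made for illustration; real parser output has more columns, so the indices would need to be adjusted.

```python
from typing import Dict, List

# Assumed, simplified column layout (0-based). Real CoNLL files from the
# parser have more columns; adjust these indices to match them.
COLUMNS = {"id": 0, "form": 1, "lemma": 2, "pos": 3, "head": 4, "deprel": 5}

Token = Dict[str, str]


def read_sentences(text: str) -> List[List[Token]]:
    """Split tab-separated CoNLL-style text into sentences of token dicts."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line in text.splitlines():
        if not line.strip():  # a blank line ends the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split("\t")
        current.append({name: fields[i] for name, i in COLUMNS.items()})
    if current:
        sentences.append(current)
    return sentences


def verb_dependents(sentence: List[Token]) -> Dict[str, List[str]]:
    """Map each verb lemma to the syntactic functions of its direct dependents.

    Verbs are identified by STTS tags starting with "V" (e.g. VVFIN, VAFIN).
    """
    result: Dict[str, List[str]] = {}
    for tok in sentence:
        if tok["pos"].startswith("V"):
            deps = [d["deprel"] for d in sentence if d["head"] == tok["id"]]
            result[tok["lemma"]] = deps
    return result


# Hypothetical sample: "Sie liest das Buch" with STTS tags and TIGER functions.
SAMPLE = "\n".join([
    "1\tSie\tsie\tPPER\t2\tSB",
    "2\tliest\tlesen\tVVFIN\t0\t--",
    "3\tdas\tder\tART\t4\tNK",
    "4\tBuch\tBuch\tNN\t2\tOA",
])

if __name__ == "__main__":
    for sent in read_sentences(SAMPLE):
        print(verb_dependents(sent))  # {'lesen': ['SB', 'OA']}
```

In this sample the verb lesen has a subject (SB) and an accusative object (OA) as direct dependents, which is the raw material from which a subcategorisation frame could be derived.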
The verb subcategorisation databases that have so far been induced are listed here.
Silke Scheible, Sabine Schulte im Walde, Marion Weller, Max Kisselew
A Compact but Linguistically Detailed Database for German Verb Subcategorisation relying on Dependency Parses from a Web Corpus: Tool, Guidelines and Resource
In: Proceedings of the 8th Web as Corpus Workshop. Lancaster, UK, July 2013.
The SubCat-Extractor is freely available for education, research and other non-commercial purposes.