University of Stuttgart
Institute for Natural Language Processing

German Chunker

 
 

The chunker for German was developed by Helmut Schmid and Sabine Schulte im Walde. It is based on the German Head-Lexicalised Probabilistic Context-Free Grammar (HL-PCFG). The manually developed grammar was semi-automatically extended by robustness rules in order to allow parsing of unrestricted text. The model parameters were learned from unlabelled training data by the probabilistic parser LoPar.
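As a rough illustration of the idea behind a probabilistic context-free grammar, the following minimal sketch scores a parse tree by multiplying the probabilities of the rules it uses and then reads noun chunks off the tree. Everything in it is an assumption made for illustration: the rules, the probabilities, the category labels and the function names are invented and are not taken from the German grammar, from LoPar, or from the distributed chunker. Head-lexicalisation and the robustness rules are not modelled.

```python
# Toy rule probabilities, invented for illustration only; they are NOT
# taken from the German grammar or from a trained LoPar model.
RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("ART", "N")): 0.6,
    ("NP", ("N",)): 0.4,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("V",)): 0.3,
    ("ART", ("die",)): 1.0,
    ("N", ("Frau",)): 0.5,
    ("N", ("Bücher",)): 0.5,
    ("V", ("liest",)): 1.0,
}


def tree_probability(tree):
    """Return the product of the probabilities of all rules used in the tree."""
    if isinstance(tree, str):  # a word contributes no rule of its own
        return 1.0
    label, children = tree[0], tree[1:]
    # The right-hand side is the sequence of child labels (or words).
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = RULE_PROB[(label, rhs)]
    for child in children:
        p *= tree_probability(child)
    return p


def leaves(tree):
    """Return the words covered by a (sub)tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1:] for w in leaves(child)]


def chunks(tree, label="NP"):
    """Collect the word sequences of all subtrees with the given label."""
    if isinstance(tree, str):
        return []
    found = [leaves(tree)] if tree[0] == label else []
    for child in tree[1:]:
        found += chunks(child, label)
    return found


# Hypothetical parse of "die Frau liest Bücher".
tree = ("S",
        ("NP", ("ART", "die"), ("N", "Frau")),
        ("VP", ("V", "liest"), ("NP", ("N", "Bücher"))))

print(tree_probability(tree))
print(chunks(tree))
```

A parser such as LoPar would search for the most probable tree among all analyses the grammar allows; the sketch only scores one given tree.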

Chunking has been performed on nouns, verbs, adjectives and adverbs. The chunker is freely available via FTP for education, research and other non-commercial purposes. The gzip-compressed file contains the trained chunker, example chunks, chunked newspaper example sentences, and documentation on the chunker.

Documentation is provided on the German probabilistic context-free grammar and on the chunker.



Please send comments, suggestions and bug reports to Helmut Schmid at the address FirstName.LastName@ims.uni-stuttgart.de.