University of Stuttgart
Institute for Natural Language Processing

RFTagger
The RFTagger is a tool for the annotation of text with fine-grained part-of-speech tags. It has been trained on German, Czech, Slovene, and Hungarian data.

The tagger is described in the following paper:

Helmut Schmid and Florian Laws: "Estimation of Conditional Probabilities with Decision Trees and an Application to Fine-Grained POS Tagging" (pdf)

Here is some sample output:

word      part of speech
Das       PRO.Dem.Subst.-3.Nom.Sg.Neut
ist       VFIN.Sein.3.Sg.Pres.Ind
ein       ART.Indef.Nom.Sg.Masc
Testsatz  N.Reg.Nom.Sg.Masc
.         SYM.Pun.Sent
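
Each tag in the sample is a dot-separated sequence: a main part-of-speech category (such as "N", "VFIN" or "ART") followed by finer attributes such as case, number and gender. The following is a minimal sketch of how such output could be split into its parts for further processing. It assumes one token per line with the word and its tag separated by a tab; that format, and the names TaggedToken, parse_line and parse_output, are illustrative assumptions and not part of the RFTagger distribution.

```python
from typing import NamedTuple


class TaggedToken(NamedTuple):
    """Hypothetical container for one tagged token."""
    word: str
    category: str        # first dot-separated field of the tag, e.g. "N" or "VFIN"
    features: list[str]  # remaining fields of the tag, kept verbatim


def parse_line(line: str) -> TaggedToken:
    """Split one 'word<TAB>tag' line into word, main category and features.

    Assumption: a single tab separates the word from its tag.
    """
    word, tag = line.rstrip("\n").split("\t", 1)
    category, *features = tag.split(".")
    return TaggedToken(word, category, features)


def parse_output(text: str) -> list[TaggedToken]:
    """Parse every non-empty line of (assumed) tab-separated tagger output."""
    return [parse_line(line) for line in text.splitlines() if line.strip()]


# The sample sentence from above, written in the assumed tab-separated format.
sample = (
    "Das\tPRO.Dem.Subst.-3.Nom.Sg.Neut\n"
    "ist\tVFIN.Sein.3.Sg.Pres.Ind\n"
    "ein\tART.Indef.Nom.Sg.Masc\n"
    "Testsatz\tN.Reg.Nom.Sg.Masc\n"
    ".\tSYM.Pun.Sent\n"
)

if __name__ == "__main__":
    for token in parse_output(sample):
        print(token.word, token.category, token.features)
```

Keeping the attribute fields as a plain list avoids guessing what each position means, since the set and order of attributes differ between categories (compare the noun and verb tags above).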

Download

The source code of the RFTagger can be downloaded here. It comes with parameter files for German, Czech, Slovene, and Hungarian (Linux PCs only) and is freely available for educational, research, and other non-commercial purposes.

Publication

Please cite the following publication if you want to refer to the RFTagger:

Helmut Schmid and Florian Laws: Estimation of Conditional Probabilities with Decision Trees and an Application to Fine-Grained POS Tagging, COLING 2008, Manchester, Great Britain. (pdf)

Links

  • Java interface to the RFTagger developed by Niels Ott and Ramon Ziai



Please send comments, suggestions and bug reports to Helmut Schmid at the address FirstName.LastName@ims.uni-stuttgart.de.