The RFTagger is a tool for the annotation of text with fine-grained
part-of-speech tags. It has been trained on German, Czech, Slovene,
and Hungarian data.
The tagger is described in the following paper:
Helmut Schmid and Florian Laws: "Estimation of Conditional Probabilities with
Decision Trees and an Application to Fine-Grained POS Tagging" (pdf)
Here is some sample output for the German sentence "Das ist ein Testsatz." ("This is a test sentence."):
| word | part of speech |
|---|---|
| Das | PRO.Dem.Subst.-3.Nom.Sg.Neut |
| ist | VFIN.Sein.3.Sg.Pres.Ind |
| ein | ART.Indef.Nom.Sg.Masc |
| Testsatz | N.Reg.Nom.Sg.Masc |
| . | SYM.Pun.Sent |
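Each fine-grained tag in the sample is a dot-separated sequence whose first element is the main category (PRO, VFIN, ART, N, SYM) and whose remaining elements are further attributes. The following is a minimal sketch for post-processing such output. It assumes the tagger writes one token per line as `word<TAB>tag` (an assumption, so check the format of your installed version). The helper names `parse_tag` and `read_tagged` are hypothetical and are not part of the RFTagger distribution.

```python
from typing import Iterable, List, Tuple

# Sample lines in the assumed "word<TAB>tag" format, copied from the sample output table.
SAMPLE = [
    "Das\tPRO.Dem.Subst.-3.Nom.Sg.Neut",
    "ist\tVFIN.Sein.3.Sg.Pres.Ind",
    "ein\tART.Indef.Nom.Sg.Masc",
    "Testsatz\tN.Reg.Nom.Sg.Masc",
    ".\tSYM.Pun.Sent",
]


def parse_tag(tag: str) -> Tuple[str, List[str]]:
    """Split a dot-separated tag into its main category and remaining attributes."""
    category, *features = tag.split(".")
    return category, features


def read_tagged(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Read 'word<TAB>tag' lines (assumed format) into (word, tag) pairs."""
    pairs = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():  # skip blank lines, e.g. sentence separators
            continue
        word, tag = line.split("\t", 1)
        pairs.append((word, tag))
    return pairs


if __name__ == "__main__":
    # Print each word with its main category and its attribute list.
    for word, tag in read_tagged(SAMPLE):
        category, features = parse_tag(tag)
        print(f"{word}\t{category}\t{' '.join(features)}")
```

The split is purely syntactic. Interpreting each attribute (for example, that `Nom` marks nominative case) requires the documentation of the tagset for the language in question.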
Download
The source code of the RFTagger can be
downloaded here.
It comes with parameter files for German, Czech, Slovene, and
Hungarian (Linux PCs only) and is freely available for education,
research, and other non-commercial purposes.
Publication
Please cite the following publication if you want to refer to the RFTagger:
Helmut Schmid and Florian Laws: Estimation of Conditional Probabilities with
Decision Trees and an Application to Fine-Grained POS Tagging,
COLING 2008, Manchester, Great Britain. (pdf)
Links
Please send comments, suggestions and bug reports to Helmut
Schmid at the address FirstName.LastName@ims.uni-stuttgart.de.