TreeTagger – a language independent part-of-speech tagger
- Helmut Schmid
The TreeTagger is a tool for annotating text with part-of-speech and lemma information. It was developed by Helmut Schmid in the TC project at the Institute for Computational Linguistics of the University of Stuttgart. The TreeTagger has been successfully used to tag various languages including German, English, French, Italian, Dutch, Spanish, Bulgarian, Russian, Greek, Portuguese, Chinese, Swahili, Latin, Estonian and old French texts and is adaptable to other languages if a lexicon and a manually tagged training corpus are available.
word pos lemma The DT the TreeTagger NP TreeTagger is VBZ be easy JJ easy to TO to use VB use . SENT .
The TreeTagger can also be used as a chunker for English, German, and French. The parameter file for the French chunker was kindly provided by Michel Généreux.
- Helmut Schmid (1995): Improvements in Part-of-Speech Tagging with an Application to German. Proceedings of the ACL SIGDAT-Workshop. Dublin, Ireland.
- Helmut Schmid (1994): Probabilistic Part-of-Speech Tagging Using Decision Trees. Proceedings of International Conference on New Methods in Language Processing, Manchester, UK.
The Tree Tagger pages are maintained by Helmut Schmid.