TreeTagger – a Language-Independent Part-of-Speech Tagger

A tool for annotating text with part-of-speech and lemma information

Type: Tool
Author: Helmut Schmid

Description

The TreeTagger is a tool for annotating text with part-of-speech and lemma information. It was developed by Helmut Schmid in the TC project at the Institute for Computational Linguistics of the University of Stuttgart. The TreeTagger has been used successfully to tag texts in many languages, including German, English, French, Italian, Dutch, Spanish, Bulgarian, Russian, Greek, Portuguese, Chinese, Swahili, Latin, Estonian and Old French. It can be adapted to other languages if a lexicon and a manually tagged training corpus are available.

Sample output:

word        pos   lemma
The         DT    the
TreeTagger  NP    TreeTagger
is          VBZ   be
easy        JJ    easy
to          TO    to
use         VB    use
.           SENT  .
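Each output line carries three fields: the token, its part-of-speech tag, and its lemma. As a minimal sketch of consuming this format downstream, the Python snippet below parses lines shaped like the sample above into records. It assumes one token per line with exactly three whitespace-separated columns and an optional `word pos lemma` header row; the names `TaggedToken` and `parse_tagger_output` are hypothetical helpers, not part of TreeTagger itself.

```python
from typing import List, NamedTuple


class TaggedToken(NamedTuple):
    """One line of tagger output: the token, its POS tag, and its lemma."""
    word: str
    pos: str
    lemma: str


def parse_tagger_output(text: str) -> List[TaggedToken]:
    """Parse word/POS/lemma lines into TaggedToken records (hypothetical helper).

    Assumes three whitespace-separated columns per line, as in the sample;
    blank lines and a 'word pos lemma' header row are skipped.
    """
    tokens: List[TaggedToken] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields == ["word", "pos", "lemma"]:
            continue  # skip the header row shown in the sample
        if len(fields) != 3:
            raise ValueError(f"line {line_no}: expected 3 columns, got {len(fields)}")
        tokens.append(TaggedToken(*fields))
    return tokens


sample = """word pos lemma
The DT the
TreeTagger NP TreeTagger
is VBZ be
easy JJ easy
to TO to
use VB use
. SENT .
"""

tokens = parse_tagger_output(sample)
print([t.lemma for t in tokens])
```

Splitting on arbitrary whitespace keeps the sketch tolerant of aligned columns; if the actual output is tab-separated, splitting on `"\t"` instead would also preserve any token that contains a space.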

The TreeTagger can also be used as a chunker for English, German, and French. The parameter file for the French chunker was kindly provided by Michel Généreux.

The TreeTagger pages are maintained by Helmut Schmid.

General Contact IMS

Pfaffenwaldring 5 b, 70569 Stuttgart