POS Tagger for Middle High German Texts
- Sarah Schulz, Nora Echelmeyer, Nils Reiter
The TreeTagger parameter file was created by training on a POS-tagged version of the Mittelhochdeutsche Begriffsdatenbank. The POS annotation was produced semi-automatically: the grammatical annotation included in the database was mapped automatically to Universal Dependencies POS tags, and this mapping was based on manually annotated parts of the texts.
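The mapping step described above can be illustrated with a minimal Python sketch. It is not the authors' actual procedure: the source labels in the table (NOM, NAM, VRB, PRP) are hypothetical placeholders, since the database's real label inventory is not listed here, and the function names are invented for illustration. Only the target tags are genuine Universal Dependencies POS tags. The sketch maps each label through a lookup table and writes one token and tag per line, separated by a tab, which is the kind of input a tagger training tool typically reads.

```python
# Hypothetical source labels mapped to Universal Dependencies POS tags.
# The keys are placeholders; the real inventory of the database is not shown here.
SOURCE_TO_UPOS = {
    "NOM": "NOUN",
    "NAM": "PROPN",
    "VRB": "VERB",
    "ADJ": "ADJ",
    "ADV": "ADV",
    "PRP": "ADP",
    "DET": "DET",
}


def map_tags(tokens, table=SOURCE_TO_UPOS, fallback="X"):
    """Map (word, source_label) pairs to (word, UPOS) pairs.

    Labels missing from the table get the UD tag "X" (other), so that
    unmapped cases stay visible for manual review.
    """
    return [(word, table.get(label, fallback)) for word, label in tokens]


def to_training_lines(tagged):
    """Format (word, tag) pairs as tab-separated lines, one token per line."""
    return [f"{word}\t{tag}" for word, tag in tagged]


if __name__ == "__main__":
    sample = [("der", "DET"), ("künec", "NOM"), ("sprach", "VRB")]
    for line in to_training_lines(map_tags(sample)):
        print(line)
```

In a semi-automatic setup, the tokens that fall back to "X" would be the natural candidates for manual annotation, which could then be used to extend the table.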
The training corpus contains about 10 million tokens, with texts from different periods, genres, and dialects.
You can find the tagger as a web application here.
Nora Echelmeyer, Nils Reiter, Sarah Schulz (2017): “Ein PoS-Tagger für "das" Mittelhochdeutsche”. In: DHd 2017 Konferenzabstracts, pp. 141-147.