Derivational Lexicons for German: DErivBase and DErivCELEX

DErivBase is a large-coverage derivational lexicon for German (Zeller et al., 2013). It consists of derivational families, i.e., groups of lemmas that are derivationally related to one another. Since v2.0, these derivational families are automatically split into semantically consistent clusters (Zeller et al., 2014). Version 2.0 covers 280,336 lemmas; 65,420 of them are grouped into 20,371 non-singleton families, and the remaining 214,916 each form a singleton family. The lexicon was extracted from SDeWAC, a large German web corpus, with HOFM, a rule-based framework written in Haskell.
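To make the family statistics above concrete, here is a minimal Python sketch that counts singleton and non-singleton families. It assumes a hypothetical in-memory mapping from lemma to family ID; the function name `family_stats`, the toy lemmas, and the family IDs are illustrative only and do not reflect the actual DErivBase file format (see the documentation for that). The final lines simply re-derive the singleton count and the average non-singleton family size from the v2.0 figures quoted above.

```python
from collections import defaultdict


def family_stats(lemma_to_family):
    """Summarise a lemma -> family-ID mapping.

    Returns a tuple:
    (total lemmas, lemmas in non-singleton families,
     number of non-singleton families, number of singleton families).
    """
    families = defaultdict(set)
    for lemma, family_id in lemma_to_family.items():
        families[family_id].add(lemma)

    # A family is non-singleton if it contains more than one lemma.
    nonsingleton = [members for members in families.values() if len(members) > 1]
    n_lemmas = len(lemma_to_family)
    n_grouped = sum(len(members) for members in nonsingleton)
    # Every lemma outside a non-singleton family is its own singleton family.
    return n_lemmas, n_grouped, len(nonsingleton), n_lemmas - n_grouped


# Toy data (hypothetical, not taken from the lexicon).
toy = {
    "fahren": 1, "Fahrer": 1, "Fahrt": 1,  # one family of three lemmas
    "lesen": 2, "Leser": 2,                # one family of two lemmas
    "Tisch": 3,                            # a singleton family
}
toy_stats = family_stats(toy)

# Consistency check on the v2.0 figures quoted in the text.
TOTAL_LEMMAS = 280_336
GROUPED_LEMMAS = 65_420
NONSINGLETON_FAMILIES = 20_371

singletons = TOTAL_LEMMAS - GROUPED_LEMMAS
avg_family_size = GROUPED_LEMMAS / NONSINGLETON_FAMILIES

print(toy_stats)
print(singletons, round(avg_family_size, 2))
```

On these figures, a non-singleton family in v2.0 contains a little over three lemmas on average.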

DErivCELEX is a derivational lexicon for German (Shafaei et al., 2017) with a structure similar to that of DErivBase. It was constructed on the basis of German CELEX, a large, manually constructed lexical resource. DErivCELEX covers roughly 47,000 lemmas grouped into 28,000 families. Its lexical coverage is lower than that of DErivBase (in particular with regard to the long tail of infrequent words), but because it is built on a cleaner resource, it contains fewer false positives.

Choose your favorite DErivBase version here:

DErivBase version | Features | Download link
v2.0 | Morphological families are split into semantically coherent subclusters (cf. Zeller et al. 2014) |
v1.4.1 | Morphological families built with 267 derivation rules (incl. meaning-changing prefixations) + bugfix |
v1.4 | Morphological families built with 267 derivation rules (incl. meaning-changing prefixations) |


Choose your favorite DErivCELEX version here (both described in Shafaei et al. 2017):

DErivCELEX version | Features | Download link
V1 | Distinguishes conversion/derivation vs. composition as encoded in CELEX | DErivCelex-v1.txt
V2 | Treats prefix verbs (which are analysed as composition in CELEX) as cases of derivation | DErivCelex-v2.txt

The gold-annotated evaluation datasets described in the DErivBase paper are available here: test-samples.tar

For further details on DErivBase's build process, versions, and data format, please consult the documentation.


DErivBase and DErivCELEX are made available under the Creative Commons license CC BY-SA 3.0. By downloading the software and/or lexicon, you acknowledge the terms and conditions of the CC BY-SA license.

