Derivational Lexicons for German: DErivBase and DErivCELEX

Sebastian Padó

DErivBase is a large-coverage derivational lexicon for German (Zeller et al., 2013). It consists of derivational families, i.e., groups of lemmas that are derivationally related to one another. Since v2.0, these families are automatically split into semantically consistent clusters (Zeller et al., 2014). Version 2.0 covers 280,336 lemmas; 65,420 of them are grouped into 20,371 non-singleton families, and the remaining 214,916 each form a singleton family. The lexicon was extracted from SDeWAC, a large German web corpus, with HOFM, a rule-based framework written in Haskell.
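The relation between lemmas, non-singleton families, and singleton families can be illustrated with a small Python sketch. The toy lemmas and the in-memory list-of-lists layout are assumptions made for illustration only; they are not the actual DErivBase file format, which is described in the documentation.

```python
# Hypothetical toy data: each inner list is one derivational family.
# The lemmas and this layout are illustrative assumptions, not the
# real DErivBase data format.
families = [
    ["lesen", "Leser", "lesbar", "Lesung"],
    ["fahren", "Fahrer", "Fahrt"],
    ["Haus"],
    ["und"],
]


def family_stats(families):
    """Return (total lemmas, non-singleton families,
    lemmas in non-singleton families, singleton families)."""
    total_lemmas = sum(len(family) for family in families)
    non_singleton = [family for family in families if len(family) > 1]
    grouped_lemmas = sum(len(family) for family in non_singleton)
    # Every lemma outside a non-singleton family is its own family.
    singletons = total_lemmas - grouped_lemmas
    return total_lemmas, len(non_singleton), grouped_lemmas, singletons


print(family_stats(families))  # (9, 2, 7, 2)

# The same subtraction applies to the v2.0 figures quoted above.
assert 280_336 - 65_420 == 214_916
```

The singleton count is derived by subtraction rather than counted directly, which mirrors how the v2.0 figures relate to each other.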

DErivCELEX is a derivational lexicon for German (Shafaei et al., 2017) with a structure similar to that of DErivBase. It was constructed on the basis of German CELEX, a large, manually constructed lexical resource. DErivCELEX covers roughly 47,000 lemmas grouped into 28,000 families. Its lexical coverage is lower than that of DErivBase (in particular with regard to the long tail of infrequent words), but it is built on a cleaner resource and thus contains fewer false positives.

Zeller, Britta, Jan Šnajder, and Sebastian Padó (2013). DErivBase: Inducing and Evaluating a Derivational Morphology Resource for German. In Proceedings of ACL 2013, Sofia, Bulgaria, pp. 1201–1211.

Shafaei, Elnaz, Diego Frassinelli, Gabriella Lapesa, and Sebastian Padó (2017). DErivCELEX: Development and Evaluation of a German Derivational Morphology Lexicon based on CELEX. In Proceedings of the DeriMo workshop, Milan, Italy.

Padó, Sebastian, Jan Šnajder, and Britta Zeller (2013). Derivational Smoothing for Syntactic Distributional Semantics. In Proceedings of ACL 2013, Sofia, Bulgaria, pp. 731–735.

Zeller, Britta, Sebastian Padó, and Jan Šnajder (2014). Towards Semantic Validation of a Derivational Lexicon. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, Dublin, Ireland, August 2014, pp. 1728–1739. Dublin City University and Association for Computational Linguistics.

Choose your favourite DErivBase version here:

DErivBase version | Features | Download link
v2.0 | Morphological families are split into semantically coherent subclusters (cf. Zeller et al. 2014) |
v1.4.1 | Morphological families built with 267 derivation rules (incl. meaning-changing prefixations) + bugfix |
v1.4 | Morphological families built with 267 derivation rules (incl. meaning-changing prefixations) |


Choose your favourite DErivCELEX version here (both are described in Shafaei et al. 2017):

DErivCELEX version | Features | Download link
V1 | Distinguishes conversion/derivation vs. composition as encoded in CELEX | DErivCelex-v1.txt
V2 | Treats prefix verbs (which are analysed as composition in CELEX) as cases of derivation | DErivCelex-v2.txt

The gold-annotated evaluation datasets described in the DErivBase paper are available here: test-samples.tar

For further details on DErivBase's build process, versions, and data format, please consult the documentation.


DErivBase and DErivCELEX are made available under the Creative Commons license CC BY-SA 3.0. By downloading the software and/or lexicon, you acknowledge the terms and conditions of the CC BY-SA license.


Sebastian Padó

Prof. Dr.

Chair of Theoretical Computational Linguistics, Managing Director of the IMS
