Derivational Lexicons for German: DErivBase and DErivCELEX
- Type: Lexicon
- Author: Sebastian Padó
- Description
DErivBase is a large-coverage derivational lexicon for German (Zeller et al., 2013). It consists of derivational families, that is, groups of lemmas that are derivationally related to one another. Since v2.0, these derivational families are automatically split into semantically consistent clusters (Zeller et al., 2014). Version 2.0 covers 280,336 lemmas; 65,420 of them are grouped into 20,371 non-singleton families, while each of the remaining 214,916 lemmas forms a singleton family. The lexicon was extracted from SDeWAC, a large German web corpus, with HOFM, a rule-based framework written in Haskell.
DErivCELEX is a derivational lexicon for German (Shafaei et al., 2017) with a structure similar to that of DErivBase. It was built on the basis of German CELEX, a large, manually compiled lexical resource. DErivCELEX covers roughly 47,000 lemmas grouped into 28,000 families. Its lexical coverage is lower than that of DErivBase (in particular for the long tail of infrequent words), but because it is built from a cleaner resource, it contains fewer false positives.
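Both lexicons organize lemmas into derivational families. The following minimal sketch makes that data structure concrete by modelling families as disjoint sets of lemmas, assuming that every lemma belongs to exactly one family (as the singleton/non-singleton figures above imply). The toy families and the helper names `build_index`, `related`, and `family_stats` are hypothetical; they do not reflect the actual DErivBase or DErivCELEX file format or any official API.

```python
# Hypothetical sketch: derivational families as disjoint sets of lemmas.
# The toy families are illustrative; this is not the DErivBase file format.
families = [
    {"kaufen", "Kauf", "Käufer", "käuflich"},
    {"lesen", "Leser", "lesbar"},
    {"Haus"},     # singleton family
    {"schnell"},  # singleton family
]


def build_index(families):
    """Map each lemma to the id of the (single) family that contains it."""
    index = {}
    for family_id, family in enumerate(families):
        for lemma in family:
            index[lemma] = family_id
    return index


def related(index, a, b):
    """Two lemmas count as derivationally related if they share a family."""
    return a in index and b in index and index[a] == index[b]


def family_stats(families):
    """Return (lemmas, lemmas in non-singleton families,
    non-singleton families, singleton families)."""
    sizes = [len(f) for f in families]
    non_singleton = [s for s in sizes if s > 1]
    return sum(sizes), sum(non_singleton), len(non_singleton), sizes.count(1)


index = build_index(families)
print(related(index, "kaufen", "Käufer"))  # True
print(related(index, "kaufen", "lesen"))   # False
print(family_stats(families))              # (9, 7, 2, 2)
```

Because every lemma outside a non-singleton family forms a family of its own, the number of singleton families equals the total number of lemmas minus the lemmas in non-singleton families. The same identity yields the v2.0 figures: 280,336 - 65,420 = 214,916.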
- References
@InProceedings{zellerEtAl:13,
author = {Zeller, Britta and \v{S}najder, Jan and Pad{\'o}, Sebastian},
title = {{DE}riv{B}ase: Inducing and Evaluating a Derivational Morphology Resource for {G}erman},
booktitle = {{Proceedings of ACL 2013}},
year = {2013},
address = {Sofia, Bulgaria},
pages = {1201--1211},
url = {www.aclweb.org/anthology/P13-1118.pdf}}
@InProceedings{shafaei17:_towar_cross_lingual_compar_of_deriv_lexic,
author = {Elnaz Shafaei and Diego Frassinelli and
Gabriella Lapesa and Sebastian Padó},
title = {{DErivCELEX}: Development and Evaluation of a {G}erman
Derivational Morphology Lexicon based on {CELEX}},
booktitle = {Proceedings of the DeriMo workshop},
year = 2017,
address = {Milan, Italy}}
@InProceedings{padoEtAl:13,
author = {Pad{\'o}, Sebastian and \v{S}najder, Jan and Zeller, Britta},
title = {Derivational Smoothing for Syntactic Distributional Semantics},
booktitle = {{Proceedings of ACL 2013}},
year = {2013},
address = {Sofia, Bulgaria},
pages = {731--735},
url = {www.aclweb.org/anthology/P13-2128}}
@InProceedings{zeller-pado-vsnajder:2014:Coling,
author = {Zeller, Britta and Pad\'{o}, Sebastian and \v{S}najder, Jan},
title = {Towards Semantic Validation of a Derivational Lexicon},
booktitle = {Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers},
month = {August},
year = {2014},
address = {Dublin, Ireland},
publisher = {Dublin City University and Association for Computational Linguistics},
pages = {1728--1739},
url = {http://www.aclweb.org/anthology/C14-1163}}
- Download
Choose your favourite DErivBase version here:
- v2.0: Morphological families are split into semantically coherent subclusters (cf. Zeller et al. 2014). Download: DErivBase-v2.0.zip
- v1.4.1: Morphological families built with 267 derivation rules (incl. meaning-changing prefixations), plus a bug fix. Download: DErivBase-v1.4.1.zip
- v1.4: Morphological families built with 267 derivation rules (incl. meaning-changing prefixations). Download: DErivBase-v1.4.zip
Choose your favourite DErivCELEX version here (both are described in Shafaei et al. 2017):
- V1: Distinguishes conversion/derivation vs. composition as encoded in CELEX. Download: DErivCelex-v1.txt
- V2: Treats prefix verbs (which are analysed as composition in CELEX) as cases of derivation. Download: DErivCelex-v2.txt
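The practical difference between V1 and V2 can be illustrated with a small sketch. It rests on several hypothetical assumptions: each CELEX-style analysis is a record (derived lemma, part of speech, base lemma, process label); only derivation and conversion links join lemmas into families; and a prefix verb can be recognized as an inseparable prefix attached to its base verb. Families are then computed as connected components with a union-find structure. The record layout, labels, prefix list, and function names are illustrative only and are not the DErivCELEX file format or the authors' construction procedure.

```python
# Hypothetical sketch: how relabelling prefix verbs (V2) could change families.
# Record layout, labels, and prefix list are illustrative assumptions.

# (derived lemma, part of speech, base lemma, process label)
ANALYSES = [
    ("Käufer", "N", "kaufen", "derivation"),
    ("Kauf", "N", "kaufen", "conversion"),
    ("verkaufen", "V", "kaufen", "composition"),  # prefix verb: ver- + kaufen
    ("Kaufhaus", "N", "Kauf", "composition"),     # genuine compound: Kauf + Haus
]

VERB_PREFIXES = ("be", "ent", "er", "ver", "zer")  # illustrative subset
FAMILY_LABELS = {"derivation", "conversion"}  # assumed: only these links join families


def effective_label(derived, pos, base, label, version):
    """V1 keeps the CELEX label; V2 relabels prefix verbs as derivation."""
    if version == 2 and pos == "V" and label == "composition":
        if any(derived == prefix + base for prefix in VERB_PREFIXES):
            return "derivation"
    return label


def build_families(analyses, version):
    """Group lemmas into families (connected components) via union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for derived, pos, base, label in analyses:
        root_d, root_b = find(derived), find(base)  # registers both lemmas
        if effective_label(derived, pos, base, label, version) in FAMILY_LABELS:
            parent[root_d] = root_b  # merge the two components

    groups = {}
    for lemma in list(parent):
        groups.setdefault(find(lemma), set()).add(lemma)
    return list(groups.values())


def family_of(families, lemma):
    """Return the family containing lemma, or None if it is unknown."""
    return next((f for f in families if lemma in f), None)


for version in (1, 2):
    families = build_families(ANALYSES, version)
    print(version, sorted(family_of(families, "kaufen")))
# 1 ['Kauf', 'Käufer', 'kaufen']
# 2 ['Kauf', 'Käufer', 'kaufen', 'verkaufen']
```

Under these assumptions, V1 leaves verkaufen in a family of its own, while V2 merges it into the family of kaufen; the noun compound Kaufhaus stays separate in both versions.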
The gold-annotated evaluation datasets described in the DErivBase paper are available here: test-samples.tar
For further details on DErivBase's build process, versions, and data format, please consult the documentation.
DErivBase and DErivCELEX are made available under the Creative Commons license CC BY-SA 3.0. By downloading the software and/or lexicon, you acknowledge the terms and conditions of the CC BY-SA license.
Sebastian Padó
Prof. Dr., Chair of Theoretical Computational Linguistics, Managing Director of the IMS