DWUG DE Sense: A data set of historical word sense annotations in German
This data collection contains a subset of the DWUG DE word usage data annotated with classical word sense definitions (DWUG DE Sense, see data/*/judgments_senses.csv). From these annotations, aggregated and cleaned sense labels were derived (labels/*/labels_senses.csv). From these labels, we derived additional binary semantic proximity labels between use pairs ('0' for different sense, '1' for same sense; labels/*/labels_proximity.csv) and change labels reflecting sense changes between the two time periods from which the word usages were sampled.
The sense labels were derived from the sense annotations by removing instances on which fewer than 2/3 of the annotators agree on a label (maj_3). Note that the binary proximity labels were inferred from the sense annotations and were not judged directly by humans (in contrast to other WUG data sets). Consequently, the change scores EARLIER, LATER and COMPARE were also not calculated directly from human judgments, but from the inferred binary proximity labels. The code for aggregating and cleaning the data, deriving the proximity labels and deriving the change labels can be found in the WUG repository.
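The pipeline described above (maj_3 filtering, then binary proximity labels, then change scores) can be illustrated with a minimal Python sketch. It is not the WUG repository's implementation: the column layout of the CSV files is not described here, so the input is a hypothetical in-memory dict of annotator sense labels per use, and all function and variable names are illustrative. It also assumes, following the DURel framework, that EARLIER and LATER are the mean proximity of use pairs within the first and second time period, and COMPARE is the mean proximity of pairs spanning both periods.

```python
from collections import Counter
from itertools import combinations
from statistics import mean


def maj_label(annotations):
    """Return the sense label chosen by at least 2/3 of the annotators, else None."""
    if not annotations:
        return None
    label, count = Counter(annotations).most_common(1)[0]
    # Integer comparison (count / n >= 2/3) avoids floating-point rounding.
    return label if 3 * count >= 2 * len(annotations) else None


def proximity_labels(senses):
    """Binary proximity per use pair: 1 if both uses share a sense label, else 0."""
    return {
        (u, v): int(senses[u] == senses[v])
        for u, v in combinations(sorted(senses), 2)
    }


def change_scores(proximity, period):
    """Mean proximity within period 1 (EARLIER), within period 2 (LATER)
    and across the two periods (COMPARE); None if a group has no pairs."""
    groups = {"EARLIER": [], "LATER": [], "COMPARE": []}
    for (u, v), p in proximity.items():
        if period[u] != period[v]:
            groups["COMPARE"].append(p)
        elif period[u] == 1:
            groups["EARLIER"].append(p)
        else:
            groups["LATER"].append(p)
    return {name: (mean(vals) if vals else None) for name, vals in groups.items()}


# Hypothetical toy data: sense labels from individual annotators, and the
# time period (1 = earlier, 2 = later) each use was sampled from.
annotations = {
    "u1": ["a", "a", "a"],
    "u2": ["a", "a", "b"],
    "u3": ["b", "b", "a"],
    "u4": ["a", "b", "c"],  # no 2/3 majority, so this use is removed
    "u5": ["b", "b"],
}
period = {"u1": 1, "u2": 1, "u3": 2, "u4": 2, "u5": 2}

# Keep only uses with a maj_3 label.
senses = {}
for use, anns in annotations.items():
    label = maj_label(anns)
    if label is not None:
        senses[use] = label

proximity = proximity_labels(senses)
scores = change_scores(proximity, period)
print(scores)
```

On this toy data, u4 is dropped, EARLIER and LATER are both 1 (each period's uses share one sense) and COMPARE is 0 (no cross-period pair shares a sense). Because the proximity labels are binary, each score is simply the fraction of same-sense pairs in its group.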
More information on the provided data can be found in the thesis referenced below.
Dominik Schlechtweg. 2023. Human and Computational Measurement of Lexical Semantic Change. PhD thesis. University of Stuttgart.
The resource is available for download.