Simulating Lexical Semantic Change from Sense-Annotated Data
Type
ExperimentData
Author
Dominik Schlechtweg, Sabine Schulte im Walde
Description
This data collection supplementing the paper referenced below contains:
- a lemmatized English text corpus pair (SEMCOR1, SEMCOR2) based on SemCor in which lexical semantic change has been simulated (corpora/)
- a lexical semantic change detection test set containing 148 lemmas with frequencies >=50 in both SEMCOR1 and SEMCOR2 (testset/)
The file testset.tsv contains the following information:
- lemma: lemma
- T1: sense frequency distribution in SEMCOR1
- T2: sense frequency distribution in SEMCOR2
- freq1: lemma frequency in SEMCOR1
- freq2: lemma frequency in SEMCOR2
- freq_error: relative frequency error of annotated frequency against final lemma frequency
- poly: maximal number of senses in SEMCOR1 and SEMCOR2
- freq: normalized frequency difference between freq1 and freq2
- graded: graded change score of lemma, G(lemma)
- binary: binary change score of lemma, B(lemma)
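The fields listed above can be read with a few lines of Python. This is only a minimal sketch under assumptions not stated on this page: that testset.tsv is tab-separated, that it has a header row using the field names above, and that it sits at testset/testset.tsv. The helper names parse_testset and load_testset are hypothetical, and the sample row is made up purely to show the call. T1 and T2 are kept as raw strings because their serialization is not described here.

```python
import csv
import io

# Column names follow the field list above. The header row and the tab
# delimiter are assumptions about testset.tsv, not documented facts.
NUMERIC_COLUMNS = ("freq1", "freq2", "freq_error", "poly", "freq", "graded", "binary")


def parse_testset(handle):
    """Read tab-separated rows into dicts, converting numeric columns to float."""
    rows = []
    for record in csv.DictReader(handle, delimiter="\t"):
        row = dict(record)
        for name in NUMERIC_COLUMNS:
            if row.get(name) not in (None, ""):
                row[name] = float(row[name])
        # T1 and T2 stay as raw strings: their format is not described
        # on this page, so no parsing is attempted.
        rows.append(row)
    return rows


def load_testset(path="testset/testset.tsv"):
    """Hypothetical convenience wrapper; the path is an assumption."""
    with open(path, encoding="utf-8", newline="") as handle:
        return parse_testset(handle)


# Made-up row for illustration only; not taken from the real test set.
sample = io.StringIO(
    "lemma\tT1\tT2\tfreq1\tfreq2\tfreq_error\tpoly\tfreq\tgraded\tbinary\n"
    "example\t?\t?\t60\t75\t0.0\t3\t0.1\t0.25\t1\n"
)
rows = parse_testset(sample)

# Select the lemmas whose binary change score is 1.
changed = [r["lemma"] for r in rows if r["binary"] == 1.0]
```

If the real file has no header row, passing fieldnames explicitly to csv.DictReader would be the corresponding adjustment.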
The files poly.tsv and freq.tsv contain the scores for the polysemy and frequency baselines from the paper.
Reference
Dominik Schlechtweg and Sabine Schulte im Walde. 2020. Simulating Lexical Semantic Change from Sense-Annotated Data. In A. Ravignani, C. Barbieri, M. Martins, M. Flaherty, Y. Jadoul, E. Lattenkamp, H. Little, K. Mudd, and T. Verhoef (Eds.), The Evolution of Language: Proceedings of the 13th International Conference (EvoLang13).
Download
The resources are freely available. More information can be requested via email to the authors.
Sabine Schulte im Walde
Prof. Dr., Akademische Rätin (Associate Professor)
Dominik Schlechtweg
Dr., Junior research group leader