Simulating Lexical Semantic Change from Sense-Annotated Data

Dominik Schlechtweg, Sabine Schulte im Walde


This data collection supplements the paper referenced below. It contains:

  • a lemmatized English text corpus pair (SEMCOR1, SEMCOR2) based on SemCor in which lexical semantic change has been simulated (corpora/)
  • a lexical semantic change detection testset containing 148 lemmas with frequencies >=50 in both SEMCOR1 and SEMCOR2 (testset/)

    The file testset.tsv contains the following information:

    • lemma: lemma
    • T1: sense frequency distribution in SEMCOR1
    • T2: sense frequency distribution in SEMCOR2
    • freq1: lemma frequency in SEMCOR1
    • freq2: lemma frequency in SEMCOR2
    • freq_error: relative frequency error of annotated frequency against final lemma frequency
    • poly: maximal number of senses in SEMCOR1 and SEMCOR2
    • freq: normalized frequency difference between freq1 and freq2
    • graded: graded change score of lemma, G(lemma)
    • binary: binary change score of lemma, B(lemma)
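
    The column list above can be checked against the file with a short script. The following is a minimal sketch, not part of the original release. It assumes that testset.tsv is tab-separated and starts with a header row using exactly the column names listed above. The function names load_testset and below_threshold are hypothetical, and the two sample rows are invented placeholders, not values from the data.

```python
import csv
import io

# Column names as listed in this README. That they appear as a header row
# in testset.tsv, and that the file is tab-delimited, are assumptions.
EXPECTED_COLUMNS = ["lemma", "T1", "T2", "freq1", "freq2",
                    "freq_error", "poly", "freq", "graded", "binary"]


def load_testset(handle):
    """Read tab-separated rows into a list of dicts keyed by column name."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


def below_threshold(rows, threshold=50):
    """Return lemmas whose frequency is below the threshold in either corpus."""
    return [r["lemma"] for r in rows
            if float(r["freq1"]) < threshold or float(r["freq2"]) < threshold]


# Invented placeholder rows for illustration only. The real format of the
# T1/T2 sense distributions is not specified here, so "-" stands in for it.
SAMPLE_ROWS = [
    EXPECTED_COLUMNS,
    ["alpha", "-", "-", "62", "58", "0.0", "3", "0.0", "0.4", "1"],
    ["beta", "-", "-", "40", "75", "0.0", "2", "0.0", "0.1", "0"],
]
SAMPLE = "\n".join("\t".join(row) for row in SAMPLE_ROWS) + "\n"

rows = load_testset(io.StringIO(SAMPLE))
print(len(rows), below_threshold(rows))

# With the real data (path assumed from the directory layout described above):
# with open("testset/testset.tsv", encoding="utf-8", newline="") as f:
#     rows = load_testset(f)
```

    On the real file, below_threshold should return an empty list if all lemmas meet the stated frequency criterion of >=50 in both corpora.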


    The files poly.tsv and freq.tsv contain the scores for the polysemy and frequency baselines from the paper.


Dominik Schlechtweg and Sabine Schulte im Walde. 2020. Simulating Lexical Semantic Change from Sense-Annotated Data. In Ravignani, A., Barbieri, C., Martins, M., Flaherty, M., Jadoul, Y., Lattenkamp, E., Little, H., Mudd, K., and Verhoef, T. (Eds.): The Evolution of Language: Proceedings of the 13th International Conference (EvoLang13).


The resources are freely available for education, research and other non-commercial purposes. More information can be requested via email to the authors.

