On Tuesday, October 27, 2020, IMS is hosting a small international online workshop on automatic detection of semantic change, with presentations by Haim Dubossarsky (University of Cambridge), Simon Hengchen (University of Gothenburg) and Nina Tahmasebi (University of Gothenburg), as well as Dominik Schlechtweg (University of Stuttgart). The presentations will be streamed live. The workshop is funded by the Center for Reflected Text Analysis (CRETA).
| Time | Speaker | Talk |
|---|---|---|
| 10:00 - 10:30 | Nina Tahmasebi | An introduction to lexical semantic change |
| 10:30 - 11:00 | Dominik Schlechtweg | SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection |
| 11:15 - 11:45 | Simon Hengchen | On the need for diachronic semantic models for downstream tasks: The case of the changing vocabulary of the ‘nation’ in British, Dutch, Swedish and Finnish historical newspapers |
| 11:45 - 12:15 | Haim Dubossarsky | Doubt thy models: rethinking hypothesis testing in NLP |
An introduction to lexical semantic change
Nina Tahmasebi (University of Gothenburg)
In this talk, I will give an overview of the work done over the past decade on the computational detection of semantic change. I will introduce semantic change and its impact on research in fields such as the digital humanities. I will talk about the challenges of both detecting and evaluating lexical semantic change, and about how the newest models are in fact very similar to the oldest ones.
SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection
Dominik Schlechtweg (University of Stuttgart)
Lexical Semantic Change detection, i.e., the task of identifying words that change meaning over time, is a very active research area, with applications in NLP, lexicography, and linguistics. Evaluation is currently the most pressing problem in Lexical Semantic Change detection: no gold standards are available to the community, which hinders progress. I will present the results of the first shared task that addresses this gap by providing researchers with an evaluation framework and manually annotated, high-quality datasets for English, German, Latin, and Swedish. Thirty-three teams submitted 186 systems, which were evaluated on two subtasks: binary classification of words as changed or stable, and ranking of words by their degree of change.
On the need for diachronic semantic models for downstream tasks: The case of the changing vocabulary of the ‘nation’ in British, Dutch, Swedish and Finnish historical newspapers
Simon Hengchen (University of Gothenburg)
Nation and nationhood are among the most studied concepts in intellectual history. At the same time, “nation” and its historical usage are very vague. This work develops a data-driven method that uses dependency parsing and neural word embeddings to clarify some of the vagueness in the evolution of this concept. The method has two steps. First, we use linguistic processing to create a large set of words pertaining to the topic of nation. Second, we train diachronic word embeddings and use them to quantify the semantic similarity between these words and to create meaningful clusters, which are then aligned diachronically. To show that the method is robust across languages and time spans and scales to large datasets, we apply it to the entirety of five historical newspaper archives in Dutch, Swedish, Finnish, and English.
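The second step (similarity, clustering, diachronic alignment) can be illustrated with a minimal sketch. It assumes one set of word vectors per time slice and makes two illustrative choices that are not taken from the abstract: words are clustered as connected components of a cosine-similarity graph above a hypothetical `threshold`, and each earlier cluster is linked to the later cluster with the highest Jaccard overlap of members. The vectors are hypothetical toy values, not trained embeddings.

```python
import math
from typing import Dict, List, Set, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def threshold_clusters(vectors: Dict[str, Vector], threshold: float) -> List[Set[str]]:
    """Illustrative clustering: connected components of the graph linking words
    whose cosine similarity is at least `threshold` (union-find)."""
    words = sorted(vectors)
    parent = {w: w for w in words}

    def find(w: str) -> str:
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)
    groups: Dict[str, Set[str]] = {}
    for w in words:
        groups.setdefault(find(w), set()).add(w)
    return sorted(groups.values(), key=lambda s: sorted(s))


def jaccard(a: Set[str], b: Set[str]) -> float:
    return len(a & b) / len(a | b)


def align(earlier: List[Set[str]], later: List[Set[str]]) -> List[Tuple[Set[str], Set[str], float]]:
    """Link each earlier cluster to the later cluster sharing the most members (Jaccard)."""
    links = []
    for c in earlier:
        best = max(later, key=lambda d: jaccard(c, d))
        links.append((c, best, jaccard(c, best)))
    return links


# Hypothetical toy vectors for two time slices.
slice_a = {"nation": [1.0, 0.1, 0.0], "people": [0.9, 0.2, 0.0],
           "state": [0.0, 1.0, 0.1], "crown": [0.1, 0.9, 0.0]}
slice_b = {"nation": [1.0, 0.0, 0.1], "people": [0.1, 0.0, 1.0],
           "state": [0.9, 0.1, 0.2], "crown": [0.0, 1.0, 0.0]}

clusters_a = threshold_clusters(slice_a, 0.9)
clusters_b = threshold_clusters(slice_b, 0.9)
for old, new, score in align(clusters_a, clusters_b):
    print(sorted(old), "->", sorted(new), round(score, 2))
```

Because the sketch only compares words within a slice and aligns clusters by shared members, it does not require the embedding spaces of different slices to be aligned with each other.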
Doubt thy models: rethinking hypothesis testing in NLP
Haim Dubossarsky (University of Cambridge)
Recent years have seen the rise of machine learning models in NLP research, which are applied, inter alia, to questions motivated by linguistic theory. Indeed, it has now become relatively easy to model and test research problems. The ease with which models can be deployed carries the risk of careless use, which may lead to unreliable findings and ultimately even hinder our ability to extend our knowledge. Such misuse may stem, for example, from unfamiliarity with the assumptions and hypotheses implicit in the models, or from inherent confounds that demand experimental controls. In this talk, I will focus on problems that are specific to computational research on semantic change, where word embeddings are the predominant ML models. I will suggest ways to mitigate some of these problems and offer ideas on how to perform valid scientific research in the age of all-too-easy modeling.
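The abstract does not specify a procedure, but one generic way to use an experimental control is to compare change scores measured on a genuine diachronic corpus with scores from a control condition in which genuine change should be absent (for example, a corpus whose time slices have been shuffled), and to ask how likely the observed difference would be by chance. The sketch below is a plain one-sided permutation test on precomputed scores. The function name, the difference-of-means statistic, and the toy scores are assumptions for illustration, not the method presented in the talk; the two groups are treated as independent samples.

```python
import random
from typing import List


def permutation_test(genuine: List[float], control: List[float],
                     n_resamples: int = 10_000, seed: int = 0) -> float:
    """One-sided p-value for the hypothesis mean(genuine) > mean(control),
    estimated by randomly reassigning scores to the two groups."""
    rng = random.Random(seed)
    k = len(genuine)
    observed = sum(genuine) / k - sum(control) / len(control)
    pooled = genuine + control  # new list, so the inputs are not shuffled in place
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = sum(pooled[:k]) / k - sum(pooled[k:]) / (len(pooled) - k)
        # small tolerance so that floating-point noise does not miss ties with the observed split
        if diff >= observed - 1e-12:
            hits += 1
    # add-one correction keeps the estimate away from an impossible p = 0
    return (hits + 1) / (n_resamples + 1)


# Hypothetical change scores for a set of words, for illustration only.
genuine_scores = [0.42, 0.38, 0.51, 0.47, 0.40]
control_scores = [0.35, 0.37, 0.33, 0.36, 0.39]
print(permutation_test(genuine_scores, control_scores))
```

If the same words are scored in both conditions, a paired test (for example, randomly flipping the sign of each per-word difference) would match the design more closely than this two-sample version.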