Grammaticalization of German Prepositions - Test Set
- Dominik Schlechtweg, Sabine Schulte im Walde
This data collection supplements the paper referenced below. It contains:
- a test set of 206 German prepositions, each assigned one of 4 degrees of grammaticalization as identified by Di Meola (2014). It is provided as a tab-separated csv file in which each line corresponds to one preposition and has the form
word_forms POS_tags degree
Different word forms for a particular preposition are separated by '/', while different words in multi-word prepositions are separated by '_'. POS-tags are separated by ','.
- the predictions of the measures, provided as a tab-separated csv file in which each line corresponds to one preposition and has the form
word_forms entropy frequency types degree
The value '-999' means that a word form was not found in the corpus after preprocessing.
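The two line formats described above could be read along the following lines. This is only a minimal sketch, not part of the released resources: the function and class names are hypothetical, the sample lines in the usage note are made up for illustration, and it assumes that each of the entropy, frequency and types columns holds a single number and that '-999' may appear in any of them. The degree is kept as a string because its exact coding is not described here.

```python
from typing import List, NamedTuple, Optional

MISSING = -999.0  # marker for "word form not found in the corpus"


class TestEntry(NamedTuple):
    word_forms: List[List[str]]  # one list of words per word form
    pos_tags: List[str]
    degree: str


class Prediction(NamedTuple):
    word_forms: List[List[str]]
    entropy: Optional[float]
    frequency: Optional[float]
    types: Optional[float]
    degree: str


def split_word_forms(field: str) -> List[List[str]]:
    """Split alternative forms on '/' and multi-word forms on '_'."""
    return [form.split("_") for form in field.split("/")]


def parse_value(field: str) -> Optional[float]:
    """Convert a numeric field to float, mapping -999 to None."""
    value = float(field.strip())
    return None if value == MISSING else value


def parse_test_line(line: str) -> TestEntry:
    """Parse one line of the form: word_forms, POS_tags, degree."""
    word_forms, pos_tags, degree = line.rstrip("\n").split("\t")
    return TestEntry(split_word_forms(word_forms), pos_tags.split(","), degree)


def parse_prediction_line(line: str) -> Prediction:
    """Parse one line of the form: word_forms, entropy, frequency, types, degree."""
    word_forms, entropy, frequency, types, degree = line.rstrip("\n").split("\t")
    return Prediction(
        split_word_forms(word_forms),
        parse_value(entropy),
        parse_value(frequency),
        parse_value(types),
        degree,
    )
```

For example, with an invented line, parse_test_line("an_Stelle/anstelle\tAPPR\t2") yields the two word forms ["an", "Stelle"] and ["anstelle"], and a prediction line whose numeric columns are all '-999' yields None for entropy, frequency and types.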
Find more information in the paper referenced below.
Dominik Schlechtweg and Sabine Schulte im Walde. 2018. Distribution-based Prediction of the Degree of Grammaticalization for German Prepositions. In Cuskley, C., Flaherty, M., Little, H., McCrohon, L., Ravignani, A. & Verhoef, T. (Eds.): The Evolution of Language: Proceedings of the 12th International Conference (EVOLANGXII).
The resources are freely available for education, research and other non-commercial purposes.