Supplement: Target files and additional plots for the Knupleš et al. CoNLL 2023 paper

In Knupleš et al. (2023), we used the concreteness norms collected by Brysbaert et al. (2014), which cover approximately 40,000 English target words. The resource contains individual ratings by 25 participants on a 5-point scale ranging from 1 (abstract) to 5 (concrete), as well as mean ratings and standard deviations. Participants were given no context or part-of-speech (POS) information; in a post-processing step, Brysbaert et al. (2012) added POS and frequency information from the SUBTLEX-US corpus.
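As a small illustration of how a mean rating and standard deviation summarise one word's individual 1 to 5 ratings, here is a minimal sketch. It assumes the sample standard deviation (the source does not say which variant the norms use), and the function name `summarise_ratings` and the ratings themselves are invented for illustration.

```python
from statistics import mean, stdev


def summarise_ratings(ratings: list[int]) -> tuple[float, float]:
    """Mean and sample standard deviation of one word's ratings (1 = abstract, 5 = concrete).

    Hypothetical helper; the sample SD is an assumption, not taken from the norms.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings for a standard deviation")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    return mean(ratings), stdev(ratings)


# Invented ratings for one hypothetical word, for illustration only.
m, sd = summarise_ratings([5, 5, 4, 5, 4])
print(f"mean={m:.2f}, sd={sd:.2f}")  # mean=4.60, sd=0.55
```

A high mean with a small standard deviation indicates that raters largely agreed the word is concrete.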

We followed a further post-processing step suggested by Schulte im Walde and Frassinelli (2022), who assigned the most frequent POS tag and frequency information to the target words using the ENCOW web corpus (Schäfer and Bildhauer, 2012; Schäfer, 2015). They then reduced the targets to a less ambiguous, higher-frequency subset by discarding words for which (i) the predominant POS accounted for less than 95% of all POS occurrences; (ii) the newly assigned ENCOW POS tag was not identical to the SUBTLEX-US POS tag; or (iii) the ENCOW target frequency was lower than 10,000. Our subset includes 5,448 nouns, 1,280 verbs and 2,205 adjectives (8,933 targets in total) and is available in the three files targets_adjectives_conll_2023.tsv, targets_nouns_conll_2023.tsv and targets_verbs_conll_2023.tsv.
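The three discarding criteria can be expressed as a single keep/discard test per word. The following is a minimal sketch, not the code used for the paper: the `Target` record, its field names and the `keep_target` function are hypothetical, and it assumes that ENCOW POS tags have already been mapped to the same coarse labels as SUBTLEX-US (e.g. "Noun", "Verb", "Adjective") so that criterion (ii) is a plain string comparison.

```python
from dataclasses import dataclass


@dataclass
class Target:
    """Hypothetical record for one candidate target word."""
    word: str
    subtlex_pos: str                   # POS from SUBTLEX-US, e.g. "Noun"
    encow_pos_counts: dict[str, int]   # ENCOW POS label -> count (assumed mapped to SUBTLEX-style labels)
    encow_freq: int                    # ENCOW frequency of the word


def keep_target(t: Target, min_share: float = 0.95, min_freq: int = 10_000) -> bool:
    """Return True if the target passes criteria (i)-(iii)."""
    total = sum(t.encow_pos_counts.values())
    if total == 0:
        return False
    # Predominant ENCOW POS label and how often it occurs.
    encow_pos, top_count = max(t.encow_pos_counts.items(), key=lambda kv: kv[1])
    # (i) The predominant POS must cover at least 95% of all POS occurrences.
    if top_count / total < min_share:
        return False
    # (ii) The ENCOW POS must be identical to the SUBTLEX-US POS.
    if encow_pos != t.subtlex_pos:
        return False
    # (iii) The ENCOW frequency must be at least 10,000.
    return t.encow_freq >= min_freq


# Invented placeholder words and counts, for illustration only.
candidates = [
    Target("word_a", "Noun", {"Noun": 990, "Verb": 10}, 250_000),   # passes all criteria
    Target("word_b", "Verb", {"Verb": 600, "Noun": 400}, 500_000),  # fails (i): 60% share
    Target("word_c", "Adjective", {"Noun": 1_000}, 40_000),         # fails (ii): POS mismatch
    Target("word_d", "Adjective", {"Adjective": 1_000}, 3_000),     # fails (iii): too rare
]
kept = [t.word for t in candidates if keep_target(t)]
print(kept)  # ['word_a']
```

The comparisons mirror the wording of the criteria: a word whose predominant POS covers exactly 95% of its occurrences, or whose frequency is exactly 10,000, is kept, since only shares below 95% and frequencies below 10,000 are discarded.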

Moreover, we provide plots of the characteristics of the verb and adjective targets, created in parallel to the plots of the noun-target characteristics presented in the paper.

See the publication PDF for references and further information.