Vietnamese dataset for similarity and relatedness

This resource consists of two datasets. The first, ViCon, comprises pairs of synonyms and antonyms across the noun, verb, and adjective classes, offering data to distinguish between similarity and dissimilarity. The second, ViSim-400, is a dataset of semantic relation pairs that provides degrees of similarity across five semantic relations, as rated by human judges.

Type
ExperimentData
Authors
Kim Anh Nguyen, Sabine Schulte im Walde, Ngoc Thang Vu

Reference

Kim-Anh Nguyen, Sabine Schulte im Walde, Ngoc Thang Vu (2018)
Introducing Two Vietnamese Datasets for Evaluating Semantic Models of (Dis-)Similarity and Relatedness
In: Proceedings of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT). New Orleans, LA.

Sabine Schulte im Walde

Prof. Dr.

Academic Councillor

Thang Vu

Prof. Dr.

Chair of Digital Phonetics, Endowed Professorship of the Carl Zeiss Foundation
