Vietnamese dataset for similarity and relatedness

This resource consists of two datasets. The first, ViCon, comprises pairs of synonyms and antonyms across noun, verb, and adjective classes, offering data to distinguish between similarity and dissimilarity. The second, ViSim-400, comprises semantic relation pairs annotated with degrees of similarity across five semantic relations, as rated by human judges.
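The reference paper presents these datasets for evaluating semantic models. One common protocol for graded human ratings such as ViSim-400's is to compare them with a model's similarity scores using Spearman's rank correlation. The sketch below assumes that protocol; this page does not specify an evaluation metric or a file format. The helper names (`cosine`, `average_ranks`, `spearman`) are hypothetical and not part of the released resource.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two word vectors (assumed model score)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises if either input has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant inputs")
    return cov / (sd_x * sd_y)


def spearman(model_scores: Sequence[float], human_scores: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of tie-averaged ranks."""
    if len(model_scores) != len(human_scores) or len(model_scores) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(model_scores), average_ranks(human_scores))
```

In use, one would compute `cosine` for each word pair from a model's embeddings, collect the human ratings for the same pairs in the same order, and pass both lists to `spearman`. Rank correlation is a natural fit here because it compares orderings and ignores the differing scales of cosine scores and human rating scales.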


Type
ExperimentData
Authors
Kim Anh Nguyen, Sabine Schulte im Walde, Ngoc Thang Vu


Reference

Kim Anh Nguyen, Sabine Schulte im Walde and Ngoc Thang Vu. Introducing Two Vietnamese Datasets for Evaluating Semantic Models of (Dis-)Similarity and Relatedness. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT). New Orleans, Louisiana, June 2018.

Download

The resources are freely available for education, research, and other non-commercial purposes. To download them, click here.


Sabine Schulte im Walde

Apl. Prof. Dr.

Academic Councillor (Akademische Rätin)


Thang Vu

Prof. Dr.

Chair of Digital Phonetics, Carl Zeiss Foundation Endowed Professorship
