Vietnamese dataset for similarity and relatedness
- Type
-
ExperimentData
- Authors
-
Kim Anh Nguyen, Sabine Schulte im Walde, Ngoc Thang Vu
-
This resource comprises two datasets. The first, ViCon, contains pairs of synonyms and antonyms across the noun, verb, and adjective classes, offering data to distinguish between similarity and dissimilarity. The second, ViSim-400, is a dataset of semantic relation pairs annotated with degrees of similarity across five semantic relations, as rated by human judges.
- Reference
-
Kim Anh Nguyen, Sabine Schulte im Walde and Ngoc Thang Vu. Introducing two Vietnamese Datasets for Evaluating Semantic Models of (Dis-)Similarity and Relatedness. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT). New Orleans, Louisiana, June 2018.
- Download
-
The resources are freely available for education, research and other non-commercial purposes. For download, click here.

Sabine Schulte im Walde
Apl. Prof. Dr., Akademische Rätin (Academic Councillor)

Thang Vu
Prof. Dr., Chair of Digital Phonetics, endowed professorship of the Carl-Zeiss-Stiftung