Vietnamese dataset for similarity and relatedness

Type ExperimentData
Title Vietnamese dataset for similarity and relatedness
Authors Kim Anh Nguyen, Sabine Schulte im Walde, Ngoc Thang Vu

Description

This resource consists of two datasets. The first, ViCon, comprises pairs of synonyms and antonyms across the noun, verb, and adjective classes, offering data for distinguishing between similarity and dissimilarity. The second, ViSim-400, is a dataset of semantic relation pairs with degrees of similarity across five semantic relations, as rated by human judges.
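
Human-rated pairs such as those in ViSim-400 are commonly used to evaluate a semantic model by rank-correlating the model's scores with the human ratings. The sketch below shows one way to compute Spearman's rho in plain Python. It is only an illustration: the function names `average_ranks` and `spearman` are hypothetical, the numbers are made-up placeholders rather than values from the dataset, and the file format of the download is not assumed here.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Placeholder numbers, NOT taken from ViSim-400: one human rating and one
# model score (e.g. a cosine similarity) per word pair, in the same order.
human_ratings = [5.2, 1.1, 3.4, 4.8]
model_scores = [0.81, 0.12, 0.40, 0.77]

rho = spearman(human_ratings, model_scores)
print(f"Spearman's rho: {rho:.3f}")
```

In practice the two lists would be filled from the downloaded rating file and from the model under evaluation; a library routine such as `scipy.stats.spearmanr` could replace the hand-written function.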


Reference

Kim Anh Nguyen, Sabine Schulte im Walde and Ngoc Thang Vu. Introducing two Vietnamese Datasets for Evaluating Semantic Models of (Dis-)Similarity and Relatedness. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT). New Orleans, Louisiana, June 2018.


Download

The resources are freely available for education, research and other non-commercial purposes. For download, click here.