DIRE Dataset
- Type
-
Corpus
- Description
-
This page provides the dataset from Boleda et al. (IWCS 2017). The dataset consists of the following individual files:
- stimuli.train.gz, stimuli.valid.gz, stimuli.test.gz: The stimuli themselves, one sequence per line, for the training set (40K sequences), validation set (5K sequences), and test set (10K sequences). Total size: 4.5 MB.
- image.dm.gz: The corresponding image vectors (from Lazaridou et al., NAACL 2015). Size: 167 MB.
- word.dm.gz: The corresponding word embeddings (from Baroni et al., ACL 2014). Size: 2.5 MB.
The syntax of the stimulus files is as follows:
line = query query_position || entities || stimuli
query = category:modifier:modifier
entities = 6(entity )
entity = category_picindex
stimuli = 12(modifier:entity )
The values of "category" serve as keys in word.dm.gz, and the values of "entity" as keys in image.dm.gz.
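The grammar above can be turned into a small parser. The following is a minimal sketch, not part of the DIRE distribution: the names `Stimulus`, `parse_stimulus_line`, and `read_stimuli` are hypothetical, and the code assumes that fields are separated by whitespace, that the three parts of a line are separated by a literal "||", and that `query_position` can be kept as an uninterpreted string. It does not check that a line has exactly 6 entities and 12 stimuli.

```python
import gzip
from typing import Iterator, List, NamedTuple, Tuple


class Stimulus(NamedTuple):
    """One parsed line of a stimuli.*.gz file (hypothetical container)."""
    category: str                     # key into word.dm.gz
    modifiers: Tuple[str, str]        # the two modifiers of the query
    query_position: str               # kept as a string; its type is an assumption
    entities: List[str]               # "category_picindex" keys into image.dm.gz
    stimuli: List[Tuple[str, str]]    # (modifier, entity) pairs


def parse_stimulus_line(line: str) -> Stimulus:
    """Parse 'query query_position || entities || stimuli'."""
    head, entities_part, stimuli_part = (part.strip() for part in line.split("||"))
    query, query_position = head.split()
    category, mod1, mod2 = query.split(":")
    entities = entities_part.split()
    # Each stimulus token is 'modifier:entity'; split on the first colon only.
    stimuli = [tuple(token.split(":", 1)) for token in stimuli_part.split()]
    return Stimulus(category, (mod1, mod2), query_position, entities, stimuli)


def read_stimuli(path: str) -> Iterator[Stimulus]:
    """Yield one Stimulus per non-empty line of a gzipped stimulus file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield parse_stimulus_line(line)
```

With these assumptions, `read_stimuli("stimuli.train.gz")` would iterate over the training sequences one at a time, so the file never has to be held in memory as a whole.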
The files word.dm.gz and image.dm.gz are simple line-based hash tables with the syntax "key value", which map string keys onto vectors.
The DIRE implementation is available on this page: TBC.
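The "key value" files described above can be read into a dictionary with a few lines of code. This is a sketch under stated assumptions rather than the loader used by the authors: the function names `parse_dm_line` and `load_dm` are hypothetical, and the code assumes that each line holds a key followed by whitespace-separated numeric vector components.

```python
import gzip
from typing import Dict, List, Tuple


def parse_dm_line(line: str) -> Tuple[str, List[float]]:
    """Split one 'key value' line into its key and its vector.

    Assumes the value is a whitespace-separated list of numbers.
    """
    key, *components = line.split()
    return key, [float(component) for component in components]


def load_dm(path: str) -> Dict[str, List[float]]:
    """Load a gzipped .dm file into a dict mapping string keys to vectors."""
    vectors: Dict[str, List[float]] = {}
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue  # skip blank lines
            key, vector = parse_dm_line(line)
            vectors[key] = vector
    return vectors
```

Under these assumptions, `load_dm("word.dm.gz")` would be looked up with a "category" value and `load_dm("image.dm.gz")` with an "entity" value. Note that image.dm.gz is 167 MB compressed, so plain Python lists may use a lot of memory; converting the vectors to a compact array type is a reasonable alternative.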
- Reference
-
Living a discrete life in a continuous world: Reference in cross-modal entity tracking.
Proceedings of IWCS. Montpellier, France, 2017.
Gemma Boleda, Sebastian Padó, Nghia The Pham and Marco Baroni.