Manual: How to use the IMS coreference system

***************** German ***************************

1) Download the IMS coref DE system and the pre-trained models:
http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/HotCorefDe

--> If you only want to use the pre-trained model to annotate coreference in your own texts, go directly to point 3!

2) Training (you can skip this point with the pre-trained TüBa model)

$ java -Xmx20g -cp ... ims.hotcoref.LearnWeights -lang "ger" -in <input> -model <model> -features <features> -cores <cores> -lemmaBased

You need to specify:
<input>    is a full concatenation of your training data,
<model>    is the model that will be output,
<features> is a file containing a list of features to use, and
<cores>    is the number of cores (threads) the system will use.

For non-local features, additionally add the following arguments:

-delayUpdates -beamEarlyIter -beam <beam>

where <beam> is the beam size (20, for example).

Example command (for my linux system):

$ java -Xmx20g -cp "./ims-hotcoref-de.jar:./lib/*" ims.hotcoref.LearnWeights -lang "ger" -in trainingData.txt -model ./de-model.mdl -features listOfFeatures.txt -cores "4" -lemmaBased

3) Testing: annotates coreference in your text files

Testing for the local version:

$ java -Xmx20g -cp ... ims.hotcoref.Test -model <model> -out <out> -cores <cores> -in <in> -lemmaBased

where
<model> is the pre-trained model,
<out>   is the output text file,
<in>    is the file in which you want to annotate coreference, and
<cores> is the number of cores (threads) the system will use.

Example (for my linux system):

$ java -Xmx20g -cp "./ims-hotcoref-de.jar:./lib/*" ims.hotcoref.Test -model ./de-model.mdl -out output.txt -cores "4" -in example.conll -lemmaBased

Testing for the non-local version: additionally specify -beam <beam>.

--> To test with the pre-trained (non-local) model, use something like this:

$ java -Xmx20g -cp "./ims-hotcoref-de.jar:./lib/*" ims.hotcoref.Test -cores "4" -model <model> -in <in> -lemmaBased -beam 20 -out <out>

************* for English **********************

1) Download the IMS coref system and the pre-trained models:
http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/HOTCoref.en.html

2) Download the Bergsma & Lin gender data:
http://conll.cemantix.org/2012/download/gender.data.gz

3) Training (you can skip this point with the pre-trained OntoNotes models)

3a) Only using local features, no beam search or delayed LaSO (the basic version):

$ java -Xmx20g -cp ... ims.hotcoref.LearnWeights -lang <lang> -in <input> -model <model> -features <features> -cores <cores> -gender <gender>

You need to specify:
<lang>     is either {ara,chi,eng,ger},
<input>    is a full concatenation of your training data,
<model>    is the model that will be output,
<features> is a file containing a list of features to use, and
<cores>    is the number of cores (threads) the system will use.

Additionally, you have to pass the following parameter for English:
-gender <gender>, where <gender> is the Bergsma & Lin gender data.

Example (for my linux system):

$ java -Xmx10g -cp "./ims-hotcoref.jar:./lib/*" ims.hotcoref.LearnWeights -lang "eng" -in "ontonotes.conll" -model ./eng-model.mdl -features ./features/eng-fo-bnf -cores "4" -gender ./gender.data.gz

3b) Also using non-local features (for a description see the ACL 2014 paper/talk on Anders's webpage):

Additionally add the following arguments:

-delayUpdates -beamEarlyIter -beam <beam>

where <beam> is the beam size (20, for example).

4) Testing

4a) Testing for the local version: annotates coreference in your text files

$ java -Xmx20g -cp ... ims.hotcoref.Test -model <model> -out <out> -cores <cores> -in <in>

where
<model> is the pre-trained model,
<out>   is the output text file,
<in>    is the file in which you want to annotate coreference, and
<cores> is the number of cores (threads) the system will use.

Example (for my linux system):

$ java -Xmx20g -cp "./ims-hotcoref.jar:./lib/*" ims.hotcoref.Test -model ./train+dev-eng-fo-opt.mdl -out output.txt -cores "4" -in example.conll
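If you want to annotate a whole directory of files rather than a single one, a small shell wrapper around the command above does the job. A minimal sketch, assuming the jar and model from the example and an ./input/ directory of *.conll files (all paths and names here are placeholders for your own setup):

#!/bin/bash
# annotate-all.sh -- hypothetical wrapper; all paths are placeholders,
# adjust them to your own setup.
for f in ./input/*.conll; do
  # Reuse the local-version test command from 4a for each input file;
  # the output goes next to the input with a .coref.conll suffix.
  java -Xmx20g -cp "./ims-hotcoref.jar:./lib/*" ims.hotcoref.Test \
       -model ./train+dev-eng-fo-opt.mdl \
       -out "${f%.conll}.coref.conll" \
       -cores "4" \
       -in "$f"
done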
4b) Testing for the non-local version: additionally specify -beam <beam>.

5) Evaluation & CoNLL Scorer: see the CoNLL shared task software:
http://conll.cemantix.org/2012/software.html
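The scorer distributed on that page is a Perl script; a typical invocation (a sketch with placeholder file names) looks like this:

# Compare gold annotations (key) against the system output (response).
# gold-key.conll and output.txt are placeholder names for your own files.
$ perl scorer.pl all gold-key.conll output.txt

Here "all" runs every metric; you can also pass a single metric name such as muc, bcub, or ceafe.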