Statistical Machine Translation - Decoding with Moses

We will build a state-of-the-art phrase-based SMT system, tune it with minimum error rate training (MERT, described in lecture 4), and then translate a test set and compute its BLEU score.
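As a reminder of what the final number measures: BLEU combines modified (clipped) n-gram precisions for n = 1..4 using a geometric mean, then multiplies by a brevity penalty for output that is shorter than the reference. The sketch below computes unsmoothed corpus-level BLEU for pre-tokenized sentences with one reference each. It is an illustration only, and assumes a single reference per sentence; it is not the scoring tool shipped in scripts.tar.gz, whose tokenization and details may differ, so use that tool for the scores you report.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Unsmoothed corpus-level BLEU, assuming exactly one reference per hypothesis."""
    matches = [0] * max_n  # clipped n-gram matches, summed over the corpus
    totals = [0] * max_n   # n-grams proposed by the hypotheses
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if min(matches) == 0:
        return 0.0  # some precision is zero, so the geometric mean is zero
    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: only output shorter than the reference is penalized.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)


hyp = "the cat sat on the mat".split()
print(corpus_bleu([hyp], [hyp]))  # prints 1.0
```

Statistics are summed over the whole corpus before the precisions are taken, which is why BLEU is reported per test set rather than averaged over sentences.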

We will replicate the shared task from the ACL 2008 Third Workshop on Statistical Machine Translation (WMT08). However, we will use a smaller data set so that the experiment runs quickly.

We will build the baseline system from WMT08 by following the directions here.

The instructions are written for French to English, but you may also build German to English. To build German to English, substitute "de" for "fr" throughout the web page.

You should build the system on your own Linux laptop or on a Linux computer at the IMS. First, transfer wmt08_small.tar.gz to your machine; it contains the data you will use for your experiments. Expand the tar file in a fresh directory (this creates a subdirectory called wmt08 containing the training, dev and devtest data). Then follow the WMT08 directions from the top, with one change: use my scripts.tar.gz rather than their "scripts.tgz" file, because mine contains the BLEU scoring tool you will need. See also "IMPORTANT COMMENTS" below for some bug fixes to the WMT08 directions.
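Most people will simply run tar from the shell. For completeness, here is a minimal sketch of the same unpacking step using Python's standard-library tarfile module: it refuses to reuse an existing directory (so the extraction really happens in a fresh one) and checks that the wmt08 subdirectory appeared. The function name unpack_data and the working-directory path are hypothetical choices for illustration.

```python
import tarfile
from pathlib import Path


def unpack_data(archive, workdir):
    """Extract a .tar.gz into a new directory and return its wmt08 subdirectory.

    Hypothetical helper; raises FileExistsError if workdir already exists.
    """
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=False)  # insist on a fresh directory
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: reject absolute paths and other unsafe members.
            tar.extractall(workdir, filter="data")
        else:
            tar.extractall(workdir)
    data_dir = workdir / "wmt08"
    if not data_dir.is_dir():
        raise FileNotFoundError(f"expected {data_dir} after extraction")
    return data_dir


# Example (hypothetical path): data_dir = unpack_data("wmt08_small.tar.gz", "smt_lab")
```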

Other details

IMPORTANT COMMENTS: