Models of Morphosyntax for Statistical Machine Translation -- Morphosyntaktische Modelle für statistische maschinelle Übersetzung
Statistical approaches to machine translation (MT) have proven effective in recent years. This does not hold, however, when translating into a morphologically rich language, particularly when the two languages also diverge significantly in syntax. In this setting the quality of statistical MT is poor because the models of morphology, syntax and translation rely on independence assumptions that do not reflect linguistic reality.
The project applies advances in automatic linguistic analysis of syntax and morphology to improve statistical MT. It directly models the dependencies between morphology, syntax and translation, leading to translation models and search algorithms that dramatically improve translation quality for morphologically rich languages.
Funded by the German Research Foundation
Principal Investigators
Present Staff
Fabienne Cap (nee Fritzinger)
Past Staff
Patrick Leucht
Renjing Wang
Publications
- Alexander Fraser, Helmut Schmid, Richard Farkas, Renjing Wang, Hinrich Schuetze (2013). Knowledge Sources for Constituent Parsing of German, a Morphologically Rich and Less-Configurational Language. Accepted for publication in Computational Linguistics, to appear.
- Hassan Sajjad, Alexander Fraser, Helmut Schmid (2012). A Statistical Model for Unsupervised and Semi-supervised Transliteration Mining. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL), pages 469-477, Jeju Island, Korea, July.
- Fabienne Braune, Anita Gojun, Alexander Fraser (2012). Long-distance Reordering During Search for Hierarchical Phrase-based SMT. In Proceedings of the 16th Annual Conference of the European Association for Machine Translation (EAMT), pages 177-184, Trento, Italy, May.
- Alexander Fraser, Marion Weller, Aoife Cahill, Fabienne Cap (2012). Modeling Inflection and Word-Formation in SMT. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL), pages 664-674, Avignon, France, April.
- Anita Gojun, Alexander Fraser (2012). Determining the Placement of German Verbs in English-to-German SMT. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL), pages 726-735, Avignon, France, April.
- Hassan Sajjad, Nadir Durrani, Helmut Schmid, Alexander Fraser (2011). Comparing Two Techniques for Learning Transliteration Models Using a Parallel Corpus. In Proceedings of the 5th International Joint Conference on Natural Language Processing (IJCNLP), pages 129-137, Chiang Mai, Thailand, November.
- Nadir Durrani, Helmut Schmid, Alexander Fraser (2011). A Joint Sequence Translation Model with Integrated Reordering. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL), pages 1045-1054, Portland, Oregon, USA, June.
- Hassan Sajjad, Alexander Fraser, Helmut Schmid (2011). An Algorithm for Unsupervised Transliteration Mining with an Application to Word Alignment. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL), pages 430-439, Portland, Oregon, USA, June.
- Fabienne Braune, Alexander Fraser (2010). Improved Unsupervised Sentence Alignment for Symmetrical and Asymmetrical Parallel Corpora. In Proceedings of the 23rd International Conference on Computational Linguistics (COLING) - Posters, pages 81-89, Beijing, China, August.
- Nadir Durrani, Hassan Sajjad, Alexander Fraser, Helmut Schmid (2010). Hindi-to-Urdu Machine Translation Through Transliteration. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL), pages 465-474, Uppsala, Sweden, July.
- Fabienne Fritzinger, Alexander Fraser (2010). How to Avoid Burning Ducks: Combining Linguistic Analysis and Corpus Statistics for German Compound Processing. In Proceedings of the ACL 2010 Fifth Workshop on Statistical Machine Translation and MetricsMATR, pages 224-234, Uppsala, Sweden, July.
- Florian Schwarck, Alexander Fraser, Hinrich Schuetze (2010). Bitext-Based Resolution of German Subject-Object Ambiguities. In Proceedings of the 11th Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), Short Papers, pages 737-740, Los Angeles, California, USA, June.