Diplomarbeit and Studienarbeit Subjects

If you are a computational linguistics or computer science student looking for an advisor for a Diplomarbeit or Studienarbeit, I encourage you to email me (in either English or German). I typically like to get an idea of your background first (computational linguistics, programming languages, human languages spoken). We then have a meeting where I outline three or four research projects with associated papers, and after reading these papers you pick your topic.

Here is a brief list of ideas, though I am likely to present a different list to you based on your background.

The Machine Translation reading group would provide useful background for many of the research topics I am interested in.


Modeling context in phrase-based statistical machine translation: Phrase-based statistical machine translation involves the memorization of long units of consecutive words, which are referred to as phrases. The purpose of this study is to determine where a large amount of context is necessary and where less context is sufficient.
Features for recommender systems: Recommender systems pose an interesting machine learning problem that is just beginning to be studied seriously. This work will involve extracting and applying features useful in this domain.
Negation in German/English statistical machine translation: Negation is often realized quite differently in German than it is in English, and existing machine translation systems do not perform well when translating sentences with negative polarity. This project involves an initial study of simple baseline approaches to handling negation, followed by work on better ways to handle it.
Distributed search algorithms for word alignment: Word alignment is a computationally expensive process. This project will involve implementing and optimizing efficient algorithms in C++, distributed across a large cluster of computers.
A study of language model smoothing for statistical machine translation: Smoothing is a critical component of all statistical systems, but its impact on statistical machine translation performance has not been systematically explored.
Controlling search parameters: Search algorithms used for structured prediction have parameters that control how the search space is pruned, but these are often set in an ad hoc fashion outside of the learning process. This project will involve controlling these parameters directly as part of the learning process.
Spanish-to-German (or English) machine translation: Spanish-to-German machine translation involves overcoming many problems, including pro-drop (pronouns are optional in many contexts in Spanish) and verb placement in German.