Junior Professorship Computational Linguistics (van der Plas)

Junior Professor of Computational Linguistics, Chairholder Lonneke van der Plas

You have reached the page of the former Computational Linguistics group, which was headed by Jun.-Prof. Lonneke van der Plas. She is now a senior lecturer at the Institute of Linguistics at the University of Malta. At the same time, she is co-project leader at the IMS (SFB 732:D11).

The project members' research interests include the following:

Multilingual approaches to CL

  • Transferring annotations from one language to another: Supervised machine learning methods need annotated data, but annotating data is expensive, and for many languages such data does not exist. Can we automatically transfer annotations from languages that have annotated data to languages that do not?
  • Using cross-lingual variation to better analyse monolingual phenomena: Languages differ in surface structure. The German Schokoladentorte (chocolate cake) is written as one word, whereas the French translation, gâteau au chocolat, uses a preposition. Does this tell us something about the relation between chocolate and cake?
  • Finding term variations or synonyms: Autumn and fall are both translated as Herbst in German. From this we can infer that autumn and fall are synonyms.
  • Applications for which we used multilingual approaches: (medical) terminology extraction, semantic role labelling for French, lexicon bootstrapping for the patent domain.
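The synonym idea in the list above can be illustrated with a minimal sketch. It assumes a list of (English, German) translation pairs is already available, for example from word-aligned parallel text; the function name `synonym_candidates` and the toy pairs are hypothetical and are not taken from the group's software.

```python
from collections import defaultdict


def synonym_candidates(translation_pairs):
    """Group source words that share a translation (hypothetical helper)."""
    by_target = defaultdict(set)
    for source, target in translation_pairs:
        by_target[target].add(source)
    # Keep only translations shared by at least two different source words.
    return {t: sorted(s) for t, s in by_target.items() if len(s) > 1}


# Toy data, assumed for illustration only.
pairs = [("autumn", "Herbst"), ("fall", "Herbst"), ("winter", "Winter")]
candidates = synonym_candidates(pairs)
print(candidates)
```

With real alignments, shared translations are noisy evidence, so a frequency threshold or similarity score would likely be needed before treating candidates as synonyms.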

Natural language understanding

  • Distributional semantics: The distribution of words over contexts tells us something about their meaning. By scanning large amounts of data and using statistical methods we can, for example, find that milk and water are the same kind of thing: both are liquids and can be drunk.
  • Semantic role labelling: The structure of sentences can be described with syntactic trees. Semantic role labelling tries to bring syntactic structure a bit closer to semantics by giving arguments a semantic role, such as agent or theme.
  • Applications for which we used NLU: question answering, dialogue systems, text-to-speech systems.
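As a rough illustration of the distributional idea in the list above, the sketch below counts context words within a small window and compares words by cosine similarity. The three-sentence corpus, the window size and the function names are assumptions made for this example; real systems use far larger corpora and often syntactic contexts.

```python
import math
from collections import Counter


def context_vectors(sentences, window=2):
    """Count, for every word, the words occurring within `window` positions."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            context = tokens[lo:i] + tokens[i + 1:hi]
            vectors.setdefault(word, Counter()).update(context)
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy corpus, assumed for illustration only.
corpus = ["she drank cold milk", "she drank cold water", "he read a long book"]
vectors = context_vectors(corpus)
sim_milk_water = cosine(vectors["milk"], vectors["water"])
sim_milk_book = cosine(vectors["milk"], vectors["book"])
print(sim_milk_water > sim_milk_book)
```

In this toy corpus milk and water occur in identical contexts, so they come out as more similar to each other than either is to book.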

Staff

  • Head of group

Jun.-Prof Lonneke van der Plas

 

  • PhD students

Patrick Ziering

Stefan Müller

 

  • Secretary

Sybille Laderer

Projects

Current Projects

 

SFB 732 project D11: A crosslingual approach to the analysis of compound nouns (DFG 2014-2018)

 
This project proposes a compositional approach to noun-noun (N-N) compound analysis with an interdependent three-level model that comprises splitting the compound, capturing the meaning of its components, and identifying the covert relation that holds between them. Ambiguity is found on all levels, with the highest ambiguity on the level where the implicit relation is uncovered. The two possible split points in the German compound Kühlerwartung, Kühl-erwartung (‘cool expectation’) vs. Kühler-wartung (‘radiator maintenance’), illustrate the ambiguity that arises at the level of compound splitting. The endless list of covert relations that can hold between the constituents of a compound becomes apparent when we look at the following examples: a chocolate cake is a cake made of chocolate, a wedding cake is a cake made for a wedding, and a cupcake is a cake made using a metal cup. Crosslingual approaches are promising for semantic analysis due to the regular variation found across languages. For example, whereas English leaves the compound relation covert, in French we find prepositions that correlate with the relation type. Chocolate cake, a cake made of chocolate, is translated as gâteau au chocolat, whereas wedding cake, a cake made for a wedding, is gâteau de mariage. We will use multilingual data throughout the project, in analysis and evaluation. We will work towards a wide-coverage integrated approach, using automatic, knowledge-lean, corpus-based methods.
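The splitting ambiguity described above can be made concrete with a small sketch that enumerates every two-way split of a compound whose parts are both in a word list. It is an illustration only, not the project's method: the tiny lexicon and the function name `split_candidates` are assumed here, and the compound is written with its umlaut (Kühlerwartung).

```python
def split_candidates(compound, lexicon):
    """Return all two-part splits whose parts both occur in `lexicon`."""
    word = compound.lower()
    splits = []
    for i in range(1, len(word)):
        head, tail = word[:i], word[i:]
        if head in lexicon and tail in lexicon:
            splits.append((head, tail))
    return splits


# Toy lexicon, assumed for illustration only.
lexicon = {"kühl", "erwartung", "kühler", "wartung"}
splits = split_candidates("Kühlerwartung", lexicon)
print(splits)
```

Both readings are returned, so a splitter needs further evidence, such as corpus frequencies or translations in parallel text, to choose between them.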

The SFB 732 Incremental Specification in Context is a collaborative research centre (Sonderforschungsbereich) funded by the German Research Foundation (DFG). It was established in 2006 and renewed in 2010 and 2014. In its 16 research projects and the integrated graduate school it brings together more than 40 researchers from the Institute of Natural Language Processing and the Institute of Linguistics at the University of Stuttgart.

The common scientific goal is to achieve a better understanding of the mechanisms that lead to ambiguity control/disambiguation as well as the enrichment of missing/incomplete information and to develop methods that are able to fully describe these mechanisms.

 

Past Projects

 

CLASSiC project: Cross-lingual semantic annotation from English to French (EU FP7 2008-2011)

In the CLASSiC project (Computational Learning in Adaptive Systems for Spoken Conversation) we focused on semantic role labelling for French, and in particular on methods to automatically generate semantic annotations for French. Syntactic annotation is available for French, but semantic annotation is not. Since semantic annotation is available for English and there are parallel corpora for the language pair English-French, we transferred the semantic annotation from English to the French translations using word alignments. Contrary to previous work (Padó and Pitel, TALN 2007; Padó and Lapata, Comp. Ling. 2009; Basili et al., CICLing 2009), we did not use an ontology constructed for the target language, because we wanted to minimize the amount of manual labour and aimed for broad-coverage annotations. We used the PropBank annotation framework constructed for English to annotate French sentences, after having tested the cross-lingual validity of PropBank (Van der Plas et al., LAW 2010). Because we know that there is a high correlation between syntax and semantics (see also Merlo and Van der Plas, ACL 2009), we leveraged the information contained in the syntactic annotations in a second step. In this step we trained a syntactic-semantic parser on the combination of syntactic annotations and the semantic annotations resulting from transfer. The automatically generated semantic annotations for French are close to the upper bound from manual annotations (Van der Plas et al., ACL 2011).
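The transfer step described above can be sketched as a direct projection of labels along word alignments. This is a simplified illustration, not the CLASSiC implementation: the sentence pair, the alignment, the PropBank-style labels and the function name `project_labels` are all assumed for the example.

```python
def project_labels(source_labels, alignment, target_length):
    """Copy each labelled source token's label to its aligned target token."""
    target_labels = ["O"] * target_length  # "O" marks tokens without a label
    for src, tgt in alignment:
        if source_labels[src] != "O":
            target_labels[tgt] = source_labels[src]
    return target_labels


# Toy sentence pair and alignment, assumed for illustration only.
english = ["Mary", "often", "eats", "apples"]
french = ["Marie", "mange", "souvent", "des", "pommes"]
labels = ["A0", "AM-TMP", "PRED", "A1"]
alignment = [(0, 0), (1, 2), (2, 1), (3, 4)]  # (English index, French index)

projected = project_labels(labels, alignment, len(french))
print(list(zip(french, projected)))
```

The unaligned French word des receives no label, which hints at why a second step that exploits syntactic information can help fill gaps left by direct projection.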

Watch a video of the current CLASSiC system.

PhD project: Automatic lexico-semantic acquisition for question answering (NWO IMIX 2004-2008)

(Promotor: John Nerbonne, co-promotor: Gosse Bouma)

Freedom and liberty share the same meaning. Paris denotes a city, and the word party triggers associations of wine and fun for many. People naturally acquire these lexico-semantic relations such as synonyms, categorised named entities, and associations by using language in their daily life.

For many natural language processing applications, such as question answering, this type of information is essential, e.g. to recognise that a particular meaning can be inferred from different text variants or to compensate for the lack of general world knowledge.

This thesis proposes three methods for using large text corpora to acquire lexico-semantic information automatically: a syntax-based method, a multilingual word-alignment-based method and a proximity-based method. The three methods complement each other in the type of data needed, the way they deal with sparse data and, most importantly, the types of lexico-semantic information they provide. This information is then applied to the Groningen question answering system Joost. Among the different types of lexico-semantic information acquired, categorised named entities (e.g. Paris denotes a city) improved the system the most; this information was obtained with the syntax-based method.

Try our demos of semantically related words (in Dutch). The complete text of my thesis can be found here.

 

 

General Contact IMS

Pfaffenwaldring 5 b, 70569 Stuttgart

 
