Lonneke van der Plas

Dr. Lonneke van der Plas
Co-project leader (previously junior professor)

Address
Universität Stuttgart
Institut für Maschinelle Sprachverarbeitung
Pfaffenwaldring 5b
70569 Stuttgart
Deutschland

Since October 2014, I have been a senior lecturer at the Institute of Linguistics of the University of Malta. At the same time, I am co-project leader at the IMS, University of Stuttgart. Before that, I was a junior professor at the IMS in the framework of the collaborative research centre SFB 732. I was a post-doc (maître-assistante) at the University of Geneva, working on cross-lingual transfer of semantic role labelling as part of the CLASSiC project. I earned my PhD from the University of Groningen, where I worked on automatic lexical acquisition from corpora within the Alfa-Informatica group. I was a visiting academic at the Division of Information and Communication Sciences of Macquarie University, Sydney, from January to March 2007. I worked at ISSCO/TIM-ETI (University of Geneva) from 2002 until 2003, and in industry for one year at Systran Translation Systems in 2001-2002. Before that, I completed the M.Phil in Computer Speech and Language Processing at the University of Cambridge, which has since been renamed Computer Speech, Text and Internet Technology.

I have been working on the following subjects: cross-lingual natural language processing, automatic lexical acquisition, text mining, (medical) terminology extraction, computational lexicology, question answering, semantic role labelling, probabilistic modelling, and cross-lingual annotation transfer.


Projects

Current Projects:

 

SFB 732 project D11: A crosslingual approach to the analysis of compound nouns (DFG 2014-2018)

 

This project proposes a compositional approach to noun-noun (N-N) compound analysis with an interdependent three-level model that comprises compound splitting, capturing the meaning of the components, and identifying the covert relation that holds between them. Ambiguity is found on all levels, with the highest ambiguity on the level where the implicit relation is uncovered. The two possible split points in the German compound Kühlerwartung, Kühl-erwartung (‘cool expectation’) vs. Kühler-wartung (‘radiator maintenance’), illustrate the ambiguity that arises at the level of compound splitting. The endless list of covert relations that can hold between the constituents of a compound becomes apparent when we look at the following examples: a chocolate cake is a cake made of chocolate, a wedding cake is a cake made for a wedding, and a cupcake is a cake made using a metal cup. Crosslingual approaches are promising for semantic analysis due to the regular variation found in different languages. For example, whereas English leaves the compound relation covert, in French we find prepositions that correlate with the relation type. Chocolate cake, a cake made of chocolate, is translated as gâteau au chocolat, whereas wedding cake, a cake made for a wedding, is gâteau de mariage. We will use multilingual data throughout the project, in analysis and evaluation. We will work towards a wide-coverage integrational approach, using automatic, knowledge-lean, corpus-based methods.
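
The split-point ambiguity in the Kühlerwartung example can be illustrated with a minimal sketch. It assumes a hand-made toy lexicon and a simple rule (both halves must be known words); the function name `candidate_splits` and the lexicon are hypothetical and are not the project's actual splitting method.

```python
def candidate_splits(compound, lexicon):
    """Return every binary split whose two parts are both in the lexicon."""
    word = compound.lower()
    splits = []
    # Try every internal position as a potential split point.
    for i in range(1, len(word)):
        left, right = word[:i], word[i:]
        if left in lexicon and right in lexicon:
            splits.append((left, right))
    return splits


# Toy lexicon (an assumption for illustration only).
TOY_LEXICON = {"kühl", "erwartung", "kühler", "wartung"}

SPLITS = candidate_splits("Kühlerwartung", TOY_LEXICON)
for left, right in SPLITS:
    print(f"{left} + {right}")
```

Both analyses are returned, so a real system still has to choose between them, for instance with corpus frequencies or crosslingual evidence.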

The SFB 732 Incremental Specification in Context is a collaborative research centre (Sonderforschungsbereich) funded by the German Research Foundation (DFG). It was established in 2006 and renewed in 2010 and again in 2014. In its 16 research projects and the integrated graduate school, it brings together more than 40 researchers from the Institute of Natural Language Processing and the Institute of Linguistics at the University of Stuttgart. The common scientific goal is to achieve a better understanding of the mechanisms that lead to ambiguity control/disambiguation as well as the enrichment of missing/incomplete information, and to develop methods that are able to fully describe these mechanisms.

 

Past Projects

 

CLASSiC project: Cross-lingual semantic annotation from English to French (EU FP7, 2008-2011)

In the CLASSiC project (Computational Learning in Adaptive Systems for Spoken Conversation) we focused on semantic role labelling for French, and in particular on methods to automatically generate semantic annotations for French. Syntactic annotation is available for French, but no semantic information. Since there is semantic annotation available for English and there are parallel corpora for the language pair English-French, we transferred the semantic annotation from English to the French translations using word alignments. Contrary to previous work (Padó and Pitel, TALN 2007; Padó and Lapata, Comp. Ling. 2009; Basili et al., CICLing 2009), we did not use an ontology constructed for the target language. We wanted to minimize the amount of manual labour and aimed for broad-coverage annotations. We used the PropBank annotation framework constructed for English to annotate French sentences, after having tested the cross-lingual validity of PropBank (Van der Plas et al., LAW 2010). Because we know that there is a high correlation between syntax and semantics (see also Merlo and Van der Plas, ACL 2009), we leveraged the information contained in the syntactic annotations in a second step. In this step we trained a syntactic-semantic parser on the combination of syntactic annotations and the semantic annotations resulting from transfer. The automatically generated semantic annotations for French are close to the upper bound from manual annotations (Van der Plas et al., ACL 2011).
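
The transfer step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the CLASSiC implementation: labels are attached to single tokens, alignments are given as (English index, French index) pairs, and the function name `transfer_labels`, the example sentence pair, and its labels are all hypothetical.

```python
def transfer_labels(source_labels, alignments, target_length):
    """Project token-level labels from source to target via word alignments.

    source_labels: one label (or None) per source token.
    alignments: iterable of (source_index, target_index) pairs.
    If several source tokens align to one target token, the first label wins.
    """
    target_labels = [None] * target_length
    for src_idx, tgt_idx in alignments:
        label = source_labels[src_idx]
        if label is not None and target_labels[tgt_idx] is None:
            target_labels[tgt_idx] = label
    return target_labels


# Illustrative sentence pair with PropBank-style labels (assumed, not from the project data).
ENGLISH = ["Mary", "ate", "the", "apple"]
ENGLISH_LABELS = ["A0", "eat.01", None, "A1"]
FRENCH = ["Marie", "a", "mangé", "la", "pomme"]
ALIGNMENTS = [(0, 0), (1, 2), (2, 3), (3, 4)]

PROJECTED = transfer_labels(ENGLISH_LABELS, ALIGNMENTS, len(FRENCH))
for token, label in zip(FRENCH, PROJECTED):
    print(token, label)
```

Unaligned target tokens (here the auxiliary "a") stay unlabelled, which is one reason a second, syntax-informed step is useful for filling gaps and correcting noisy projections.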

Watch a video of the current CLASSiC system.

PhD project: Automatic lexico-semantic acquisition for question answering (NWO IMIX 2004-2008)

(Promotor: John Nerbonne, co-promotor: Gosse Bouma)

Freedom and liberty share the same meaning. Paris denotes a city, and the word party triggers associations of wine and fun for many. People naturally acquire these lexico-semantic relations such as synonyms, categorised named entities, and associations by using language in their daily life.

For many natural language processing applications, such as question answering, this type of information is essential, e.g. to recognise that a particular meaning can be inferred from different text variants or to compensate for the lack of general world knowledge.

This thesis proposes three methods for using large text corpora to acquire lexico-semantic information automatically: a syntax-based method, a multilingual word-alignment-based method and a proximity-based method. The three methods complement each other in the type of data needed, the way they deal with sparse data and most importantly, in the types of lexico-semantic information they provide. This information is then applied to the Groningen question answering system Joost. Among the different types of lexico-semantic information acquired, categorised named entities, e.g. Paris denotes a city, improved the system the most and this information was obtained with the syntax-based method. 
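
One common way to realise a syntax-based method is distributional similarity: words that occur in the same syntactic contexts are taken to be semantically related. The sketch below assumes this reading; the context counts are invented toy values, the relation names are hypothetical, and cosine similarity is only one of several possible measures, so it should not be read as the thesis implementation.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse context-count vectors."""
    shared = set(a) & set(b)
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy counts of (dependency relation, co-occurring word) contexts (assumed values).
CONTEXTS = {
    "freedom": Counter({("obj_of", "defend"): 2, ("mod", "political"): 1}),
    "liberty": Counter({("obj_of", "defend"): 1, ("mod", "political"): 1}),
    "cake": Counter({("obj_of", "bake"): 3}),
}

print(cosine(CONTEXTS["freedom"], CONTEXTS["liberty"]))
print(cosine(CONTEXTS["freedom"], CONTEXTS["cake"]))
```

With these toy counts, freedom and liberty share all their contexts and score high, while freedom and cake share none and score zero.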

Try our demos of semantically related words (in Dutch). The complete text of my thesis can be found here.

Teaching

Parsing (University of Stuttgart, winter semester 2014-2015, 2013-2014)

Algorithmisches Sprachverstehen 'Natural language understanding' (University of Stuttgart, summer semester 2012, 2013, 2014)

Seminar Distributional Semantics (University of Stuttgart, winter semester 2012-2013, 2013-2014)

Méthodes empiriques et langages de script (University of Geneva, 2009-2010, 2010-2011)

Research Master - Corpus Linguistics (University of Groningen, 2007-2008)

Tekstmanipulatie (University of Groningen, 2005-2007)

Talks

2014

Coling, Dublin

  • Global methods for cross-lingual semantic role and predicate labelling

 

2011

ACL, Portland, USA
Scaling up Cross-Lingual Semantic Annotation Transfer
        

2010

Journées de travail à la Fondation Hardt, Vandoeuvres, Switzerland (invited)
Présentation d'outils pour l'analyse lexicale et la recherche du vocabulaire propre à une thématique.

LAW workshop, ACL Uppsala, Sweden
Cross-lingual Validity of PropBank in the Manual Annotation of French.
        
DART workshop, Webster University, Geneva, Switzerland
Automatic acquisition of synonyms for French using parallel corpora.

Mentorat-Relève, University of Geneva, Switzerland
How a computer learns the meaning of words.

Nhumi Technologies, Zürich, Switzerland (invited)
Natural language understanding.


2009

Section d'informatique et méthodes mathématiques, Université de Lausanne, Switzerland (invited)
Les textes, sources inépuisables d'informations: La découverte automatique de mots similaires.

NaTAL09, LORIA, Nancy, France (invited)
Distributional methods for the extraction of semantically related words.

NAACL, Boulder, Colorado, USA
Domain Adaptation with Artificial Data for Semantic Parsing of Speech.
        
NAACL workshop on Unsupervised and Minimally Supervised Learning of Lexical Semantics, Colorado, USA
Combining Syntactic Co-occurrences and Nearest Neighbours in Distributional Methods to Remedy Data Sparseness.
  
CLIN, Groningen, The Netherlands
Using the transitivity of meaning for distributional methods to remedy data sparseness.

CLIN, Groningen, The Netherlands
Training a parser on artificially fragmented data for spoken language understanding.       


2008

ILPS seminar, University of Amsterdam, The Netherlands (invited)
Automatic acquisition of lexico-semantic information for question answering.
Listen to this talk on IslaTV
 
2007

HERA conference. Tallinn, Estonia (invited)
Finding Synonyms Automatically Using Multilingual Parallel Corpora.
Listen to this talk on mms://193.40.5.165/2007/archimedes/hera/5_Doctoral_students_thematic_poster_session.wmv
          
Kick-off Meeting Sem.Metrix. University of Leuven, Belgium (invited)
Finding Semantically Related Words Using Distributional Similarity in Syntactic Contexts.

LATL seminar, University of Geneva, Geneva, Switzerland (invited)
Automatic Acquisition of Semantically Related Words. 

Language Technology Meeting, Macquarie University, Sydney, Australia (invited)
Automatic Acquisition of Lexico-semantic Knowledge in Joost. 
 
Clin17, Leuven, Belgium. 
Finding Synonyms in Movie Subtitles Using Automatic Word Alignment.

Masterclass presentation WISER conference, Maastricht, The Netherlands
Wat leert een computer van krant lezen? 

Q-go Natural Language Search, Diemen, The Netherlands (invited)
Automatic Acquisition of Lexico-semantic Knowledge for QA.

2006

Réunion DES, Lattice-ENS Paris, France
Extraction Automatique d'Information Lexicale et Sémantique.

Language Technology Meeting. Macquarie University. Sydney, Australia
The Question Answering System Joost.

Tabu-dag. Groningen, The Netherlands
Finding Dutch Synonyms by Comparing Translations of the Same Text in Multiple Languages.

Coling/ACL. Sydney, Australia
Finding Synonyms Using Automatic Word Alignment and Measures of Distributional Similarity.

2005

Clin. Leiden, The Netherlands
Syntactic Contexts for finding Semantically Similar Words.

Ontolex, Jeju Island, South Korea
Automatic Acquisition of Lexico-Semantic Knowledge for QA.

2004

LREC. Lisbon, Portugal
Keyword Extraction from Spoken Text. A Comparison of Two Lexical Resources: EDR and WordNet.

IM2 Meeting. Geneva, Switzerland
Keyword Extraction from Spoken Text.

Publications

2014

Lonneke van der Plas, Marianna Apidianaki, and Chenhua Chen
Global methods for cross-lingual semantic role and predicate labelling
Coling

Patrick Ziering and Lonneke van der Plas
What good are ‘Nominalkomposita’ for ‘noun compounds’: Multilingual Extraction and Structure Analysis of Nominal Compositions using Linguistic Restrictors
Coling

Lonneke van der Plas and Marianna Apidianaki
Cross-lingual word sense disambiguation for  predicate labelling of French
TALN
 
2013

Patrick Ziering, Lonneke van der Plas, Hinrich Schütze
Bootstrapping Semantic Lexicons for Technical Domains
IJCNLP
[bib]

Jörg Tiedemann, Lonneke van der Plas, Begoña Villada Moirón
Bitexts as Semantic Mirrors
Workshop on Twenty Years of Bitext in connection with EMNLP 2013

2012
 
Sarah Cruchet, Celia Boyer, Lonneke van der Plas
Trustworthiness and relevance in web-based clinical question answering
In Health Informatics: Building a Healthcare Future Through Trusted Information. Stud Health Technol Inform. 180:863-7.

2011

Lonneke van der Plas, Paola Merlo and James Henderson
Scaling up Cross-Lingual Semantic Annotation Transfer [pdf]
In Proceedings of ACL/HLT, Portland, US, pp 299-304.

Lonneke van der Plas, Jörg Tiedemann, and Jean-Luc Manguin
Synonym acquisition across domains and languages
Chapter in V. Pallotta, A. Soro, and E. Vargiu, ed., Advances in Distributed Agent-based Retrieval Tools, Springer-Verlag, Berlin, pp 41-58.

Lonneke van der Plas, Jörg Tiedemann, and Ismail Fahmi
Automatic extraction of medical term variants from multilingual parallel translations
Chapter in A. van den Bosch and G. Bouma, ed., Interactive Multi-modal Question Answering. Theory and Applications of Natural Language Processing. Springer Verlag, Berlin, ISBN 978-3-642-17524-4, pp 149-170.

2010

Tanja Samardzic, Lonneke van der Plas,  Goljihan Kashaeva, and Paola Merlo
Variation in verbal predicates in English and French [pdf]
In Generative Grammar in Geneva (GG@G), Volume 6, pp 109 - 135.

Tanja Samardzic, Lonneke van der Plas,  Goljihan Kashaeva, and Paola Merlo
The Scope and the Sources of Variation in Verbal Predicates in English and French [pdf]
In Proceedings of the 9th International Workshop on Treebanks and Linguistic Theories, Tartu, Estonia.

Lonneke van der Plas, Jörg Tiedemann
Finding Medical Term Variations using Parallel Corpora and Distributional Similarity [pdf]
In Proceedings of the Coling workshop on ontologies and lexical resources, Beijing, China.

Lonneke van der Plas, Tanja Samardzic, and Paola Merlo
Cross-lingual Validity of PropBank in the Manual Annotation of French [pdf]
In Proceedings of the 4th Linguistic Annotation Workshop (The LAW IV), Uppsala, Sweden.

Lonneke van der Plas, Gosse Bouma, Jori Mur
Automatic Acquisition of Lexico-semantic Knowledge for QA
Chapter in Chu-Ren Huang, ed., Ontology and the Lexicon, Studies in Natural Language Processing, Cambridge University Press, Cambridge, UK, pp 271-287.
ISBN 978-0-521-88659-8.

Lonneke van der Plas, Jörg Tiedemann and Jean-Luc Manguin
Automatic acquisition of synonyms for French using parallel corpora [pdf]
In Proceedings of the 4th International Workshop on Distributed Agent-based Retrieval Tools, Geneva, Switzerland.

2009

Paola Merlo and Lonneke van der Plas
Abstraction and Generalisation in Semantic Role Labels: PropBank, VerbNet or both? [pdf]
In Proceedings of ACL-IJCNLP, Singapore.


Cedric Boidin, Verena Rieser, Lonneke van der Plas, Oliver Lemon, Jonathan Chevelu
Predicting how it sounds: Re-ranking dialogue prompts based on TTS quality for adaptive Spoken Dialogue Systems
In the Interspeech special session on Machine Learning for Adaptivity in Spoken Dialogue Systems.


Lonneke van der Plas
Combining Syntactic Co-occurrences and Nearest Neighbours in Distributional Methods to Remedy Data Sparseness [pdf]
In Proceedings of the NAACL workshop on Unsupervised and Minimally Supervised Learning of Lexical Semantics, Boulder, US.


Lonneke van der Plas, James Henderson, and Paola Merlo
Domain Adaptation with Artificial Data for Semantic Parsing of Speech [pdf]
In Proceedings of NAACL, Boulder, US.



2008

Lonneke van der Plas
Automatic lexico-semantic acquisition for question answering (PhD thesis)
In the GRODIL series.
ISBN 978-90-367-3564-3.

Lonneke van der Plas and Jörg Tiedemann
Using Lexico-Semantic Information for Query Expansion in Passage Retrieval for Question Answering [pdf]
In Coling 2008 Workshop: Information Retrieval for Question Answering, Manchester, UK.

Lonneke van der Plas, Jean-Luc Manguin and Jörg Tiedemann
Extraction de synonymes à partir d'un corpus multilingue aligné [pdf]
In Actes des journées de linguistique de corpus, Lorient, France.

Jean-Luc Manguin, Lonneke van der Plas, Jörg Tiedemann
Le traitement automatique: un moteur pour l'évolution des dictionnaires de synonymes [pdf]
In Actes du colloque "Lexicographie et informatique: bilan et perspectives", Nancy, France.

Lonneke van der Plas and Jörg Tiedemann
Finding Synonyms Automatically in Multilingual Parallel Corpora
In Proceedings of the HERA Conference, Tallinn, Estonia. 

2007

Ismail Fahmi, Gosse Bouma and Lonneke van der Plas
Using Multilingual Terms for Biomedical Term Extraction
In Proceedings of the RANLP Workshop on Acquisition and Management of Multilingual Lexicons, Borovetz, Bulgaria.

2006

Lonneke van der Plas and Jörg Tiedemann 
Finding Synonyms Using Automatic Word Alignment and Measures of Distributional Similarity [pdf]
In Proceedings of ACL/Coling.

Jori Mur and Lonneke van der Plas 
Anaphora Resolution for Off-line Answer Extraction using Instances [pdf]
In Proceedings of the Workshop for Anaphora Resolution (WAR).

Gosse Bouma, Ismail Fahmi, Jori Mur, Gertjan van Noord, Lonneke van der Plas, Jörg Tiedemann
The University of Groningen at QA@CLEF2006. Using Syntactic Knowledge for QA.

Gosse Bouma, Ismail Fahmi, Jori Mur, Gertjan van Noord, Lonneke van der Plas, Jörg Tiedemann
Linguistic Knowledge and Question Answering [pdf]
In Traitement Automatique des Langues, vol 46(3), pp 15-39.

Gosse Bouma, Jori Mur, Gertjan van Noord, Lonneke van der Plas, and Jörg Tiedemann
Question Answering for Dutch using Dependency Relations
In Proceedings of the CLEF2005 workshop. Lecture Notes in Computer Science. Springer.

2005 

Lonneke van der Plas and Gosse Bouma
Automatic Acquisition of Lexico-Semantic Knowledge for QA [pdf]
In Proceedings of the IJCNLP workshop on Ontologies and Lexical Resources, Jeju Island, South Korea.

Lonneke van der Plas and Gosse Bouma 
Syntactic Contexts for finding Semantically Similar Words [pdf]
In Proceedings of CLIN 04.

Gosse Bouma, Jori Mur, Gertjan van Noord, Lonneke van der Plas, and Jörg Tiedemann
Question Answering for Dutch using Dependency Relations [pdf]
In Proceedings of the CLEF 2005 Workshop.

2004

Lonneke van der Plas, Vincenzo Pallotta, Martin Rajman and Hatem Ghorbel
Keyword Extraction from Spoken Text. A Comparison of Two Lexical Resources: EDR and WordNet
In Proceedings of LREC, volume VI, Lisbon, Portugal.

Fabio Rinaldi, James Dowdall, Michael Hess, Kaarel Kaljurand, Andreas Persidis, Babis Theodoulidis, Bill Black, John McNaught, Haralampos Karanikas, Argyris Vasilakopoulos, Kelly Zervanou, Luc Bernard, Gian Piero Zarri, Hilbert Bruins Slot, Chris van der Touw, Margaret Daniel-King, Nancy Underwood, Agnes Lisowska, Lonneke van der Plas, Veronique Sauron, Myra Spiliopoulou, Marko Brunzel, Jeremy Ellman, Giorgos Orphanos, Thomas Mavroudakis, Spiros Taraviras
Parmenides: an opportunity for ISO TC37 SC4? 
In the ACL Workshop on Linguistic Annotation: Getting the Model Right, Sapporo, Japan.

Links
My personal website: http://sites.google.com/site/lonnekenlp/