Universität Stuttgart

IMSLex German Lexicon

 
 

The IMSLex dictionary database is our central lexicon repository. It covers information on inflection, word formation, and valence for several tens of thousands of German base forms (see details below). From the IMSLex database, we derive specialized lexicon data for various applications in natural language processing, information retrieval, and information extraction. Where necessary, semantic information can be added from the Tübingen GermaNet lexical-semantic dictionary.


Technology

To keep the dictionary data as flexible as possible, it is encoded in XML. For efficiency reasons, however, the data is stored in a relational database with the help of XML-DBMS.
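As a rough illustration of the idea of keeping entries in XML while querying them relationally, the following minimal sketch parses a small XML fragment and loads it into an in-memory SQLite database. The element names, attributes, table layout, and the functions `load_entries` and `stems_for` are assumptions invented for this example. They do not reflect the actual IMSLex schema or the way XML-DBMS maps documents to tables.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical entry format, invented for illustration only;
# this is NOT the real IMSLex XML schema.
SAMPLE_XML = """
<lexicon>
  <entry lemma="Haus" pos="noun">
    <stem type="inflection">Haus</stem>
    <stem type="composition">Haus</stem>
  </entry>
  <entry lemma="prüfen" pos="verb">
    <stem type="inflection">prüf</stem>
  </entry>
</lexicon>
"""


def load_entries(xml_text, conn):
    """Parse the XML text and store entries and their stems in two tables."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entry "
        "(id INTEGER PRIMARY KEY, lemma TEXT, pos TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stem "
        "(entry_id INTEGER REFERENCES entry(id), type TEXT, form TEXT)"
    )
    root = ET.fromstring(xml_text)
    for entry in root.iter("entry"):
        cur = conn.execute(
            "INSERT INTO entry (lemma, pos) VALUES (?, ?)",
            (entry.get("lemma"), entry.get("pos")),
        )
        entry_id = cur.lastrowid
        for stem in entry.iter("stem"):
            conn.execute(
                "INSERT INTO stem (entry_id, type, form) VALUES (?, ?, ?)",
                (entry_id, stem.get("type"), stem.text),
            )
    conn.commit()


def stems_for(conn, lemma):
    """Return (type, form) pairs for a lemma, sorted by stem type."""
    return conn.execute(
        "SELECT stem.type, stem.form FROM stem "
        "JOIN entry ON entry.id = stem.entry_id "
        "WHERE entry.lemma = ? ORDER BY stem.type",
        (lemma,),
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    load_entries(SAMPLE_XML, connection)
    print(stems_for(connection, "Haus"))
```

The XML form stays easy to extend with new elements, while lookups such as "all stems of a lemma" become simple indexed queries on the relational side.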

The lexical data itself has been built up semi-automatically with the help of specialized text mining methods from corpus linguistics.

For further information, please refer to Arne Fitschen's dissertation, Ein Computerlinguistisches Lexikon als komplexes System (available as PS and PDF), the IMSLex-related publications, and the list of IMSLex-related projects.


Applications

The IMSLex data can be used in many ways:
  • creation of full-form lexicons

    With the help of the AMOR generator of inflected forms, one can create full-form lexicons such as the one incorporated in the statistical lexicon of the TreeTagger part-of-speech tagger.
    A sample full-form lexicon (adjectives, nouns, and verbs starting with 'p') is available for download.

  • on-line morphological and syntactic analysis

    For example, the German ParGram grammar uses IMSLex as its lexical database for deep syntactic analysis.
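To make the first application above more concrete, here is a minimal sketch of how a full-form lexicon can be derived from inflection stems: each stem is combined with the endings of its inflection class, and every resulting surface form is mapped back to its lemma and a morphological tag. The class name `weak_verb_present`, the tag format, and the functions `full_forms` and `build_lexicon` are hypothetical and greatly simplified. This is not the AMOR generator or its data format.

```python
# Toy inflection class (an assumption for illustration): present-tense
# endings of a regular weak German verb, each paired with a made-up tag.
TOY_CLASSES = {
    "weak_verb_present": [
        ("e", "1.Sg.Pres"),
        ("st", "2.Sg.Pres"),
        ("t", "3.Sg.Pres"),
        ("en", "1.Pl.Pres"),
        ("t", "2.Pl.Pres"),
        ("en", "3.Pl.Pres"),
    ],
}


def full_forms(lemma, stem, pos, infl_class):
    """Expand one stem into (surface form, lemma, tag) triples."""
    return [
        (stem + suffix, lemma, f"{pos}.{tag}")
        for suffix, tag in TOY_CLASSES[infl_class]
    ]


def build_lexicon(entries):
    """Map every surface form to the list of its (lemma, tag) analyses."""
    lexicon = {}
    for lemma, stem, pos, infl_class in entries:
        for form, lem, tag in full_forms(lemma, stem, pos, infl_class):
            lexicon.setdefault(form, []).append((lem, tag))
    return lexicon


if __name__ == "__main__":
    lexicon = build_lexicon([("prüfen", "prüf", "V", "weak_verb_present")])
    for form in sorted(lexicon):
        print(form, lexicon[form])
```

Because several paradigm cells can share one surface form (here "prüft" and "prüfen"), the lexicon stores a list of analyses per form. A tagger that consumes such a lexicon then has to choose among them in context.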


Current lexicon size

(May 2003)

               inflection   derivation   composition   valence
               stems        stems        stems         info
adjectives     11,000       23           80            2,000
adverbs         1,000       n/a          n/a           n/a
nouns          22,500       1,000        12,500        10,000
particles         300       n/a          n/a           n/a
proper nouns   10,000
verbs           6,000       350          160           6,000

In addition, the lexicon contains 167 derivation suffixes.


For the size and contents of the GermaNet lexical-semantic dictionary, please refer to the GermaNet homepage.