IMSLex German Lexicon


The IMSLex dictionary database is our central lexicon repository. It covers information on inflection, word formation, and valence for several tens of thousands of German base forms (see details below). From the IMSLex database, we derive specialized lexicon data for various applications in natural language processing, information retrieval, and information extraction. Where necessary, semantic information can be added from the Tübingen GermaNet lexical-semantic dictionary.



To keep the dictionary data as flexible as possible, it is encoded in XML. For efficiency reasons, however, the data is stored in a relational database with the help of XML-DBMS.
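
As a rough illustration of this design, the following sketch parses a small XML lexicon fragment and stores its fields as rows in a relational table. The element names, attribute names, and table schema are invented for this example and do not reflect the actual IMSLex format. The sketch uses Python's standard sqlite3 and ElementTree modules as stand-ins, not XML-DBMS itself.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical entries: element and attribute names are assumptions,
# not the real IMSLex schema.
SAMPLE_XML = """
<lexicon>
  <entry lemma="packen" pos="verb">
    <inflection class="weak"/>
    <valence frame="subj-obj"/>
  </entry>
  <entry lemma="Paket" pos="noun">
    <inflection class="neuter-s-e"/>
  </entry>
</lexicon>
"""


def load_entries(xml_text, conn):
    """Map each <entry> element to one row of a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entry ("
        "lemma TEXT, pos TEXT, inflection_class TEXT, valence_frame TEXT)"
    )
    root = ET.fromstring(xml_text)
    for entry in root.iter("entry"):
        inflection = entry.find("inflection")
        valence = entry.find("valence")
        conn.execute(
            "INSERT INTO entry VALUES (?, ?, ?, ?)",
            (
                entry.get("lemma"),
                entry.get("pos"),
                # Missing child elements become NULL in the table.
                inflection.get("class") if inflection is not None else None,
                valence.get("frame") if valence is not None else None,
            ),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
load_entries(SAMPLE_XML, conn)

# Once the data is relational, lookups are ordinary SQL queries.
rows = conn.execute(
    "SELECT lemma, valence_frame FROM entry WHERE pos = 'verb'"
).fetchall()
print(rows)
```

The XML form stays easy to extend with new elements, while queries over many entries run against the indexed relational copy.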

The lexical data itself has been built up semi-automatically using specialized text mining methods from corpus linguistics.



The IMSLex data can be used in many ways:

  • creation of full form lexicons

    With the help of the AMOR generator of inflected forms, one can create full form lexicons such as the one incorporated in the statistical lexicon of the TreeTagger part-of-speech tagger.
    A sample full form lexicon (adjectives, nouns, and verbs starting with 'p') can be downloaded here.

  • on-line morphological and syntactic analysis

    For example, the German ParGram grammar uses IMSLex as its lexical database for deep syntactic analysis.
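
To make the first use case more concrete, here is a minimal sketch of how a full form lexicon can be expanded from stems. It is not the AMOR generator: the paradigm table, the feature labels, and the tab-separated output layout are assumptions chosen for illustration, and only the present indicative of regular (weak) verbs is covered.

```python
# Toy paradigm for regular (weak) German verbs, present indicative only.
# Illustrative assumption, not the AMOR rule set; stems that need an
# inserted "e" (e.g. "arbeit") are deliberately not handled.
PRESENT_ENDINGS = [
    ("1.Sg", "e"),
    ("2.Sg", "st"),
    ("3.Sg", "t"),
    ("1.Pl", "en"),
    ("2.Pl", "t"),
    ("3.Pl", "en"),
]


def full_forms(stem):
    """Yield (word form, lemma, feature) triples for one verb stem."""
    lemma = stem + "en"
    for feature, ending in PRESENT_ENDINGS:
        yield stem + ending, lemma, feature


def build_lexicon(stems):
    """Group all analyses by word form, as a full form lexicon does."""
    lexicon = {}
    for stem in stems:
        for form, lemma, feature in full_forms(stem):
            lexicon.setdefault(form, []).append((lemma, feature))
    return lexicon


lexicon = build_lexicon(["pack", "plan"])

# One line per word form, followed by all of its analyses.
for form in sorted(lexicon):
    analyses = " ".join(f"{lemma}:{feat}" for lemma, feat in lexicon[form])
    print(f"{form}\t{analyses}")
```

Ambiguous forms such as "packt" (third person singular and second person plural) end up with several analyses on one line, which is the information a tagger or analyzer needs when it looks up a surface form.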


Current lexicon size (May 2003)

              inflection stem   derivation stem   composition stem   valence info
adjectives             11,000                23                 80          2,000
adverbs                 1,000               n/a                n/a            n/a
nouns                  22,500             1,000             12,500         10,000
particles                 300               n/a                n/a            n/a
proper nouns           10,000
verbs                   6,000               350                160          6,000

167 derivation suffixes


For the size and contents of the GermaNet lexical-semantic dictionary, please refer to the GermaNet homepage.


Please contact CLARIN-D Stuttgart (clarin AT ims.uni-stuttgart.de) if you need more information.


For further details, please refer to the dissertation of Arne Fitschen, "Ein Computerlinguistisches Lexikon als komplexes System" (pdf).

