IMSLex German Lexicon


Type
Lexicon
Description

The IMSLex dictionary database is our central lexicon repository. It covers inflection, word formation, and valence information for tens of thousands of German base forms (see details below). From the IMSLex database, we derive specialized lexicon data for various applications in natural language processing, information retrieval, and information extraction. Where necessary, semantic information can be added from the Tübingen GermaNet lexical-semantic dictionary.

 

Technology

To keep the dictionary data as flexible as possible, it is encoded in XML. For efficiency reasons, however, the data is stored in a relational database with the help of XML-DBMS.
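The idea of keeping entries in XML while querying them from relational tables can be illustrated with a minimal sketch. It is only an illustration under stated assumptions: the element and attribute names (`entry`, `lemma`, `pos`, `inflection_class`) and the class labels are hypothetical, since the actual IMSLex schema is not shown here, and Python's standard `sqlite3` and `xml.etree.ElementTree` modules stand in for XML-DBMS rather than reproducing it.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical entry format; the real IMSLex schema is not shown in this document.
SAMPLE_XML = """
<lexicon>
  <entry lemma="Haus" pos="noun" inflection_class="N_example_class"/>
  <entry lemma="prüfen" pos="verb" inflection_class="V_example_class"/>
</lexicon>
"""


def load_entries(xml_text, conn):
    """Flatten each <entry> element into one row of a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entry "
        "(lemma TEXT, pos TEXT, inflection_class TEXT)"
    )
    root = ET.fromstring(xml_text)
    rows = [
        (e.get("lemma"), e.get("pos"), e.get("inflection_class"))
        for e in root.iter("entry")
    ]
    conn.executemany("INSERT INTO entry VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def lemmas_by_pos(conn, pos):
    """Return all lemmas of one part of speech, sorted alphabetically."""
    cur = conn.execute(
        "SELECT lemma FROM entry WHERE pos = ? ORDER BY lemma", (pos,)
    )
    return [row[0] for row in cur.fetchall()]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory stand-in database
    load_entries(SAMPLE_XML, conn)
    print(lemmas_by_pos(conn, "verb"))
```

The XML form stays easy to extend with new attributes, while lookups by part of speech or lemma run as ordinary SQL queries.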

The lexical data itself has been built up semi-automatically with the help of specialized text mining methods from corpus linguistics. 

 

Applications

The IMSLex data can be used in many ways:

  • creation of full form lexicons

    With the help of the AMOR generator of inflected forms, one can create full form lexicons such as the one incorporated in the statistical lexicon of the TreeTagger part-of-speech tagger.
    A sample full form lexicon (adjectives, nouns, and verbs starting with 'p') can be downloaded here.

  • on-line morphological and syntactic analysis

    For example, the German ParGram grammar uses IMSLex as its lexical database for deep syntactic analysis.
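The first application above, deriving a full form lexicon from stems, can be sketched as follows. This is a toy illustration, not AMOR: the suffix table, the feature labels, and the function names are all assumptions made for the example, and the table covers only the present indicative of regular weak verbs.

```python
# Hypothetical suffix table for a regular weak verb in the present indicative.
# This is a toy paradigm, not AMOR's actual rule set.
WEAK_VERB_PRESENT = {
    "1.Sg": "e",
    "2.Sg": "st",
    "3.Sg": "t",
    "1.Pl": "en",
    "2.Pl": "t",
    "3.Pl": "en",
}


def full_forms(stem, lemma, paradigm):
    """Return (surface form, lemma, features) triples for one stem."""
    return [(stem + suffix, lemma, feats) for feats, suffix in paradigm.items()]


def build_full_form_lexicon(entries, paradigm):
    """Map each surface form to the list of its (lemma, features) analyses."""
    lexicon = {}
    for stem, lemma in entries:
        for form, lem, feats in full_forms(stem, lemma, paradigm):
            lexicon.setdefault(form, []).append((lem, feats))
    return lexicon


if __name__ == "__main__":
    entries = [("prüf", "prüfen"), ("pack", "packen")]
    lexicon = build_full_form_lexicon(entries, WEAK_VERB_PRESENT)
    for form in sorted(lexicon):
        print(form, lexicon[form])
```

Keying the result by surface form means that ambiguous forms, such as a form shared by third person singular and second person plural, carry all of their analyses, which is what a tagger lexicon needs at lookup time.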

 

Current lexicon size (May 2003)

              inflection stem  derivation stem  composition stem  valence info
adjectives             11,000               23                80         2,000
adverbs                 1,000              n/a               n/a           n/a
nouns                  22,500            1,000            12,500        10,000
particles                 300              n/a               n/a           n/a
proper nouns           10,000
verbs                   6,000              350               160         6,000

In addition, the lexicon contains 167 derivation suffixes.

 

For the size and contents of the GermaNet lexical-semantic dictionary, please refer to the GermaNet homepage.

Contact

Please contact CLARIN-D Stuttgart (clarin AT ims.uni-stuttgart.de) if you need more information.

Reference

Please refer to the dissertation of Arne Fitschen: Ein Computerlinguistisches Lexikon als komplexes System (pdf).

 

Project CLARIN-D
