In the Text module, punctuation marks are detached from the words and saved as features in the token relation. Punctuation is used, among other things, to determine sentence breaks; in text preprocessing it mainly helps to identify ordinals and abbreviations. Whitespace is likewise stored as a feature in the token relation.
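The detachment step can be illustrated with a minimal Python sketch. It is not the Text module's actual code: the `Token` class, the feature names `whitespace`, `prepunctuation` and `punc`, and the punctuation set `PUNCT` are all assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass

# Characters treated as punctuation; this set is an assumption for the sketch.
PUNCT = "\"'`.,:;!?()[]{}"


@dataclass
class Token:
    """Hypothetical stand-in for one item of the token relation."""
    name: str                  # the word with surrounding punctuation removed
    whitespace: str = ""       # whitespace that preceded the token
    prepunctuation: str = ""   # punctuation attached before the word
    punc: str = ""             # punctuation attached after the word


def tokenize(text):
    """Split text at whitespace and detach punctuation into features."""
    tokens = []
    # Each match is a (possibly empty) whitespace run followed by a
    # run of non-whitespace characters.
    for ws, chunk in re.findall(r"(\s*)(\S+)", text):
        core = chunk.lstrip(PUNCT)
        pre = chunk[:len(chunk) - len(core)]
        name = core.rstrip(PUNCT)
        post = core[len(name):]
        tokens.append(Token(name, ws, pre, post))
    return tokens


if __name__ == "__main__":
    for tok in tokenize('He said "stop",  on 3. May.'):
        print(tok)
```

In this sketch the token `3.` keeps its trailing period as the `punc` feature, so a later token-to-word rule could inspect it to decide whether `3` should be read as an ordinal, and a sentence-break rule could inspect the same feature on `May.`.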