
Punctuation and whitespace

The Text module detaches punctuation marks from the words and stores them as features in the token relation. Punctuation is used, among other things, to determine sentence breaks; in text preprocessing it mainly serves to identify ordinals and abbreviations. Whitespace is likewise stored as a feature in the token relation.
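The following Python sketch illustrates the idea of detaching punctuation and whitespace into token features. It is not the Text module's actual code: the function names, the punctuation sets, the feature names (`name`, `whitespace`, `prepunctuation`, `punc`) and the ordinal heuristic are all assumptions made for illustration.

```python
import re

# Hypothetical punctuation sets; the real module may use different ones.
PREPUNCTUATION = "\"'`(["
PUNCTUATION = "\"'`.,:;!?)]"


def tokenize(text):
    """Split text on whitespace and detach punctuation into features."""
    tokens = []
    # Each match is the whitespace before a token, then the token itself.
    for match in re.finditer(r"(\s*)(\S+)", text):
        whitespace, raw = match.group(1), match.group(2)

        # Skip over leading punctuation characters.
        start = 0
        while start < len(raw) and raw[start] in PREPUNCTUATION:
            start += 1

        # Skip over trailing punctuation characters.
        end = len(raw)
        while end > start and raw[end - 1] in PUNCTUATION:
            end -= 1

        tokens.append({
            "name": raw[start:end],          # the bare word
            "whitespace": whitespace,        # whitespace preceding the token
            "prepunctuation": raw[:start],   # punctuation before the word
            "punc": raw[end:],               # punctuation after the word
        })
    return tokens


def is_ordinal_candidate(token):
    """Assumed heuristic: digits followed by a single period, e.g. '3.'.

    A sentence-final number looks the same, so this only marks candidates.
    """
    return token["name"].isdigit() and token["punc"] == "."


if __name__ == "__main__":
    for tok in tokenize('Am 3. Mai kam "Dr. Meier".'):
        print(tok, is_ordinal_candidate(tok))
```

Keeping the punctuation as a feature rather than discarding it means later steps can still inspect it, for example to decide whether a period marks an ordinal, an abbreviation, or a sentence break.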



Martin Barbisch
2001-08-28