Prosodically annotated speech databases
The corpus of prosodically annotated speech data used in project C4 consists of the following components:
Material gathered via digital satellite radio at IMS
- news stories recorded from the digital satellite radio at IMS
- a feature series recorded from the digital satellite radio at IMS
Material recorded by others but further processed at IMS
- acted speech from a multimedia CD-ROM
- material recorded elsewhere but processed by the project
Material recorded at IMS
- a short read excerpt of a novel recorded at IMS
- material recorded under controlled lab conditions at IMS
- some read newspaper articles recorded at IMS
Processing of the database
All subcorpora are further processed to different extents:
- automatic word, syllable and phoneme alignment
- automatic POS tagging
- automatic ToBI labelling
- manual ToBI labelling
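To illustrate how these annotation layers can sit on top of one another, here is a minimal Python sketch. It is not the project's actual data format: the `Word` class, its field names, the `accented_words` helper, and the example utterance with its times and labels are all hypothetical, chosen only to show word alignment, POS tags and ToBI labels combined in one record.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    """One aligned word with its annotation layers (all field names are hypothetical)."""
    text: str
    start: float                    # start time in seconds, from the alignment
    end: float                      # end time in seconds, from the alignment
    pos: str                        # part-of-speech tag from the tagger
    accent: Optional[str] = None    # ToBI pitch accent label, if the word carries one
    boundary: Optional[str] = None  # ToBI boundary tone label, if a phrase ends here


def accented_words(words: List[Word], pos_prefix: str = "") -> List[Word]:
    """Return the words that carry a pitch accent, optionally filtered by POS prefix."""
    return [
        w for w in words
        if w.accent is not None and w.pos.startswith(pos_prefix)
    ]


# Invented example utterance; times and labels are illustrative, not corpus data.
utterance = [
    Word("the", 0.00, 0.12, "DT"),
    Word("news", 0.12, 0.48, "NN", accent="H*"),
    Word("starts", 0.48, 0.90, "VBZ"),
    Word("now", 0.90, 1.25, "RB", accent="L+H*", boundary="L-L%"),
]

# List every accented word with its time span, POS tag and accent label.
for w in accented_words(utterance):
    print(f"{w.text}\t{w.start:.2f}-{w.end:.2f}\t{w.pos}\t{w.accent}")
```

Keeping all layers in one time-aligned record makes simple queries possible, for example restricting the accented words to nouns with `accented_words(utterance, "NN")`.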
There are several utilities that ease access to the database:
- selection & search utilities