1. Phonetic Transcription Coding Module
- name: Phonetic Transcription
- coding purpose: segmentation of speech into phonetically labeled segments (SAMPA scheme)
- data sources: spoken corpora (speech files, orthographic transcription)
- module references: orthographic transcription module (optional)
- description: The level defines a base element, the <phone> element, corresponding to a segment in the speech signal, labeled according to its phonetic features. A <syllable> element may be added, consisting of a sequence of <phone> elements. The annotation at this level is both a transcription and a segmentation: it refers directly to the speech signal, recognizes the uttered sounds and splits the speech continuum into phonetic chunks. Each <phone> is then classified with a phonetic label and associated with time information specifying its start and end instants. Higher linguistic levels, such as the phonological prosodic levels or the orthographic word level, may inherit time information from the phonetic level by linking their elements to <phone> or <syllable> elements.
  The scheme adopted here for phonetic transcription is SAMPA [Wells et al., 1992], which is intended for multi-lingual phonetic transcription. In the original SAMPA notation, a transcription is a stream of phonetic labels and diacritics: labels classify phones and diacritics give further specifications about them, with the exception of stress marks, which implicitly refer to the following syllable. In our adaptation, the <syllable> element is made explicit as a second layer built on top of the <phone> layer.
- example: The following example shows the phonetic transcription of the Spanish word 'casa' ('house') and its corresponding syllabic segmentation, using the <phone> and <syllable> elements:
phone.xml
<phone id="phn_01" type="k" start="345" end="390"/>
<phone id="phn_02" type="a" start="390" end="450"/>
<phone id="phn_03" type="s" start="450" end="490"/>
<phone id="phn_04" type="a" start="490" end="540"/>
syllable.xml
<syllable id="sllbl_01" stress='"' href="phone.xml#id(phn_01)..id(phn_02)"/>
<syllable id="sllbl_02" href="phone.xml#id(phn_03)..id(phn_04)"/>
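The way a <syllable> inherits time information from the <phone> elements it links to can be sketched in Python. This is only an illustration under stated assumptions, not part of the MATE tooling: the wrapper elements <phones> and <syllables>, the function name `syllable_spans`, and the regular expression for the href range are all hypothetical, and the href is assumed to name the ids of the first and last phone of the syllable exactly as they appear in phone.xml (phn_01, phn_02, ...).

```python
import re
import xml.etree.ElementTree as ET

# The fragments from the example, wrapped in hypothetical root elements
# so that they parse as well-formed XML documents.
PHONES = """<phones>
<phone id="phn_01" type="k" start="345" end="390"/>
<phone id="phn_02" type="a" start="390" end="450"/>
<phone id="phn_03" type="s" start="450" end="490"/>
<phone id="phn_04" type="a" start="490" end="540"/>
</phones>"""

SYLLABLES = """<syllables>
<syllable id="sllbl_01" stress='"' href="phone.xml#id(phn_01)..id(phn_02)"/>
<syllable id="sllbl_02" href="phone.xml#id(phn_03)..id(phn_04)"/>
</syllables>"""

# Assumed shape of the range pointer: file#id(first)..id(last)
HREF_RANGE = re.compile(r"#\s*id\((\w+)\)\.\.id\((\w+)\)")


def syllable_spans(phones_xml, syllables_xml):
    """Resolve each syllable's href range and derive its label and times."""
    phones = list(ET.fromstring(phones_xml).iter("phone"))
    position = {p.get("id"): i for i, p in enumerate(phones)}
    spans = {}
    for syl in ET.fromstring(syllables_xml).iter("syllable"):
        match = HREF_RANGE.search(syl.get("href"))
        first, last = position[match.group(1)], position[match.group(2)]
        members = phones[first:last + 1]
        spans[syl.get("id")] = {
            "phones": "".join(p.get("type") for p in members),
            # Times are inherited from the first and last linked phone.
            "start": float(members[0].get("start")),
            "end": float(members[-1].get("end")),
            "stress": syl.get("stress"),  # None when the syllable is unstressed
        }
    return spans


spans = syllable_spans(PHONES, SYLLABLES)
```

Under these assumptions, the first syllable covers the phones k and a and takes its start from phn_01 and its end from phn_02.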
ELEMENT <phone>
ATTRIBUTES:
id [ASCII]
type [ASCII]*
start [FLOAT]
end [FLOAT]
* The attribute 'type', although defined as ASCII data, can only contain an allowable (language-dependent) combination of SAMPA symbols and diacritics.
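A check of the 'type' attribute against a language-dependent inventory might look like the following Python sketch. The symbol and diacritic sets below are small placeholders chosen for illustration, not the SAMPA inventory for Spanish or any other language, and the rule that a label is one base symbol followed by zero or more diacritics is an assumption about what counts as an "allowable combination".

```python
# Placeholder inventories for illustration only; a real check would load
# the full language-specific SAMPA symbol and diacritic tables.
SYMBOLS_SUBSET = {"a", "e", "i", "o", "u", "k", "s", "tS"}
DIACRITICS_SUBSET = {":"}


def is_valid_type(label, symbols, diacritics):
    """Return True if label is a base symbol followed only by diacritics."""
    return any(
        label.startswith(base)
        and all(ch in diacritics for ch in label[len(base):])
        for base in symbols
    )


# Labels from the 'casa' example checked against the placeholder inventory.
casa_ok = all(
    is_valid_type(label, SYMBOLS_SUBSET, DIACRITICS_SUBSET)
    for label in ["k", "a", "s", "a"]
)
```

A label that is not in the inventory, or an empty label, is rejected by the same function.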
ELEMENT <syllable>
ATTRIBUTES:
id [ASCII]
stress ", % (the SAMPA primary and secondary stress marks)
start [FLOAT]
end [FLOAT]
- coding procedure:
  - Manual phonetic segmentation:
    1. select the speech file and open the synchronized windows for phonetic segmentation and for waveform and spectrum display
    2. zoom until a detailed inspection of the signal is possible
    3. inspect and listen to the signal portion until the uttered phonemes are recognized
    4. select a phonetic label for the first phone
    5. identify its boundaries according to the segmentation criteria and mark them by placing the cursor on the proper point on the time axis (this should automatically set the corresponding time attribute)
    6. after phonetic segmentation is concluded, define syllables by selecting their component <phone> elements and, if stressed, by assigning the proper stress mark
  - Automatic segmentation:
    1. listen to the speech sound and transcribe it as a sequence of phones
    2. apply the phonetic aligner to the speech signal with its phonetic transcription and obtain its phonetic segmentation
    3. import the phonetic segmentation into the MATE environment
    4. define syllables as in step 6 above.
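The import step of the automatic procedure can be illustrated with a short Python sketch that turns aligner output into <phone> elements. The source does not specify the aligner or its output format, so the input here is assumed to be a list of (label, start, end) tuples; the function name `phones_from_alignment` and the id pattern phn_NN (copied from the example above) are likewise hypothetical.

```python
import xml.etree.ElementTree as ET


def phones_from_alignment(rows):
    """Build <phone> elements from (label, start, end) tuples.

    The tuple layout is an assumed aligner output format, not one
    defined by the source.
    """
    elements = []
    for number, (label, start, end) in enumerate(rows, start=1):
        if end <= start:
            raise ValueError(f"segment {number} ({label!r}) has end <= start")
        elements.append(
            ET.Element(
                "phone",
                {
                    "id": f"phn_{number:02d}",
                    "type": label,
                    "start": str(start),
                    "end": str(end),
                },
            )
        )
    return elements


# Alignment for 'casa', using the times from the example above.
rows = [("k", 345, 390), ("a", 390, 450), ("s", 450, 490), ("a", 490, 540)]
phones = phones_from_alignment(rows)
serialized = [ET.tostring(p, encoding="unicode") for p in phones]
```

Each string in `serialized` is one <phone> element that could be written to phone.xml; syllables are then defined on top of these elements as in the manual procedure.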
- creation notes:
  - Authors: Silvia Quazza, Juan María Garrido
  - Version: 1., October 1999
  - Comments: none
  - Literature:
Phonetic Representation of Intonation -
F0 Coding Module