The .awd format
Files of type .awd (to be found in /data/annot/text/awd on the annotation DVD)
comprise an automatically generated word segmentation in which the words
of the orthographic transcription have been linked to the audio signal. The files
also contain an automatically generated phoneme segmentation in which the individual
phonemes have been linked to the audio signal. The files are in ShortTextGrid
format and can be produced, changed or viewed by means of the PRAAT
software. For a description of the ShortTextGrid format, see the description
of the .ort format. For each speaker, three tiers are provided. The
first tier bears the speaker code as its tier name and is identical to the tier with
the same name in the .ort file. The second tier has the same name with
the suffix _FON (e.g. N98765 and N98765_FON) and contains an automatically
generated phonetic transcription. The time markers on these two tiers are identical.
Finally, there is a third tier with the same name and the suffix _SEG
(N98765_SEG). This tier contains the underlying phoneme segmentation
corresponding to the words on the other two tiers.
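As an illustration of this tier layout, the following Python sketch groups the tiers of a file by speaker code using the _FON and _SEG suffixes, and checks that two tiers share the same time markers. It assumes the ShortTextGrid has already been read into a dict that maps tier names to lists of intervals; the Interval class and the function names are hypothetical and are not part of PRAAT or of any corpus tool.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    xmin: float  # start time in seconds
    xmax: float  # end time in seconds
    text: str    # interval label


def group_tiers(tiers: dict[str, list[Interval]]) -> dict[str, dict[str, list[Interval]]]:
    """Group tiers per speaker code under the keys 'ort', 'fon' and 'seg'."""
    speakers: dict[str, dict[str, list[Interval]]] = {}
    for name, intervals in tiers.items():
        if name.endswith("_FON"):
            code, kind = name[: -len("_FON")], "fon"
        elif name.endswith("_SEG"):
            code, kind = name[: -len("_SEG")], "seg"
        else:
            code, kind = name, "ort"  # bare speaker code: orthographic tier
        speakers.setdefault(code, {})[kind] = intervals
    return speakers


def same_time_markers(a: list[Interval], b: list[Interval]) -> bool:
    """True if both tiers have exactly the same interval boundaries."""
    return [(i.xmin, i.xmax) for i in a] == [(i.xmin, i.xmax) for i in b]
```

Exact comparison is used because the time markers of the orthographic and phonetic tiers are stated to be identical; if boundaries were computed rather than read from the file, a small tolerance would be safer.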
An interval in the tier with the orthographic transcription contains exactly
one word (with or without underscores), a single underscore ("_"),
a pause (empty interval), or a text (multiple words) exactly as it occurs in
the same interval of the orthographic transcription (.ort file). In the latter
case, the tiers with the phonetic transcription and the phoneme segmentation
contain the automatically generated phonetic transcription without segmentation
information. Moreover, in intervals of this type the text in all three tiers is
preceded by an exclamation mark ("!"), which indicates that the segmentation
(which is absent) is unreliable. An exclamation mark "!" can also
occur if a segmentation is present but was judged unreliable (by some criterion).
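The interval types described above, together with the "!" flag, can be told apart mechanically. The sketch below is a minimal illustration with a hypothetical function name; it assumes that the words of a multi-word text are separated by whitespace and that "!" is the very first character of the label.

```python
def classify_ort_label(label: str) -> tuple[str, bool]:
    """Classify a label from the orthographic tier.

    Returns (kind, unreliable), where kind is 'pause', 'underscore',
    'word' or 'text' and unreliable is True if the label starts with '!'.
    """
    unreliable = label.startswith("!")
    body = (label[1:] if unreliable else label).strip()
    if body == "":
        kind = "pause"        # empty interval
    elif body == "_":
        kind = "underscore"   # separate segment for a shared plosive
    elif len(body.split()) == 1:
        kind = "word"         # one word, possibly containing underscores
    else:
        kind = "text"         # several words copied from the .ort interval
    return kind, unreliable
```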
In the tier with the phonetic transcription the following phenomena may occur:
- If the .fon file indicates that a phoneme is
shared by two words, one of the following two situations arises:
- the shared phoneme is not a plosive (for the set of plosives, see the
description of the .fon format). On both
sides of the boundary that separates the two words, an equals sign ("=")
is used to indicate that the two words share the last and first phoneme,
respectively.
- the shared phoneme is a plosive and therefore acoustically indivisible.
A separate segment is defined that contains exactly the shared plosive
and that is labelled with an underscore ("_")
in both the tier with the phonetic transcription and the tier with the
orthographic transcription. If the shared plosive coincides with the entire
transcription of a word, so that the plosive is shared between that word and the word
preceding or following it, then the segment also includes the phonetic label
of the plosive, with the underscore "_" on
the side where the plosive is shared.
- If, for reasons of pronunciation, two words are connected by
an inserted sound, this is indicated in the tier with the phonetic
transcription by a hyphen ("-") on either side of the inserted sound.
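The "=" and hyphen conventions can be decoded from a single label in the _FON tier. In the following sketch, the exact placement of the markers is an assumption: a trailing "=" on the first of the two words, a leading "=" on the second, and an inserted sound written between two hyphens, e.g. "-j-". The FonLabel class and parse_fon_label function are hypothetical, and the separate "_" segment for a shared plosive is not handled.

```python
import re
from dataclasses import dataclass, field

# Assumed notation: an inserted sound is written between two hyphens, e.g. "-j-".
INSERTED_SOUND = re.compile(r"-([^-]+)-")


@dataclass
class FonLabel:
    phones: str               # label with '!', '=' and hyphens removed
    shares_first: bool        # leading '=': first phoneme shared with the previous word
    shares_last: bool         # trailing '=': last phoneme shared with the next word
    inserted: list[str] = field(default_factory=list)  # sounds found between hyphens
    unreliable: bool = False  # leading '!'


def parse_fon_label(label: str) -> FonLabel:
    """Decode the marker conventions of a label in the _FON tier."""
    unreliable = label.startswith("!")
    body = label[1:] if unreliable else label
    shares_first = body.startswith("=")
    shares_last = body.endswith("=")
    body = body.strip("=")
    inserted = INSERTED_SOUND.findall(body)
    phones = INSERTED_SOUND.sub(r"\1", body)  # keep the sound, drop the hyphens
    return FonLabel(phones, shares_first, shares_last, inserted, unreliable)
```

Keeping the inserted sound in `phones` while also listing it separately lets a caller either compare the label with the phone string in the _SEG tier or ignore the inserted material.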
In the tier with the phoneme segmentation, empty intervals or intervals with a
single phoneme symbol occur only where the "_" segment
of the orthographic and phonetic transcriptions is labelled with
the shared phoneme (a plosive). Similarly, a shared phoneme that is
not a plosive is represented by a single interval, and the word boundary in the
orthographic and phonetic tiers falls in the middle of that interval.
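The rule for shared non-plosive phonemes suggests a simple check: a _SEG interval that contains a word boundary strictly inside it should hold a shared phoneme. The sketch below finds such intervals. The Interval class and function name are hypothetical, and word boundaries are taken to be the end times of all word intervals except the last.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    xmin: float  # start time in seconds
    xmax: float  # end time in seconds
    text: str    # interval label


def straddling_segments(words: list[Interval], segs: list[Interval]) -> list[tuple[Interval, float]]:
    """Return (segment, boundary) pairs where a word boundary lies strictly inside a segment."""
    # Interior word boundaries: the end of every word interval except the last one.
    boundaries = sorted({w.xmax for w in words[:-1]})
    hits: list[tuple[Interval, float]] = []
    for seg in segs:
        for b in boundaries:
            if seg.xmin < b < seg.xmax:  # strict: boundaries on a segment edge do not count
                hits.append((seg, b))
    return hits
```

For two hypothetical words "is" and "soms" sharing an /s/, the single /s/ interval that spans the boundary between them would be the one returned.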
For an overview of the phonetic symbols used, see the description
of the .fon format. Like the .wrd format,
the .awd file does not include a BACKGROUND or COMMENT tier.