The .wrd format
Files of type .wrd (found in /data/annot/text/wrd/ on the annotation DVD)
contain a manually verified word segmentation in which the words occurring in
the orthographic transcription have been linked to the audio signal. The files
are in ShortTextGrid format and can be created, edited or viewed with the
PRAAT software. For a description of the ShortTextGrid format, see the
description of the .ort format.
For every speaker two tiers are provided. The name of the first tier is
the speaker ID; it is identical to the name of the corresponding tier in the
.ort file. The second tier receives the same name with the suffix _FON (for
example, N98765 and N98765_FON respectively) and contains the phonetic
transcription that can also be found in the .fon file. The time markers are
the same in both tiers.
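The tier naming convention and the shared time markers can be checked mechanically. The following is a minimal sketch, assuming the tiers have already been read from the ShortTextGrid file into a dictionary that maps tier names to lists of (start, end, label) tuples. The function name, the data layout and the sample labels are hypothetical and not part of the corpus tooling.

```python
from typing import Dict, List, Tuple

# Assumed in-memory layout: one (start, end, label) tuple per interval.
Interval = Tuple[float, float, str]


def pair_speaker_tiers(
    tiers: Dict[str, List[Interval]]
) -> Dict[str, Tuple[List[Interval], List[Interval]]]:
    """Pair each speaker tier with its <speaker ID>_FON tier.

    Raises ValueError if the phonetic tier is missing or if the
    time markers of the two tiers differ.
    """
    pairs = {}
    for name, orth in tiers.items():
        if name.endswith("_FON"):
            continue  # phonetic tiers are looked up via their speaker tier
        fon = tiers.get(name + "_FON")
        if fon is None:
            raise ValueError(f"no phonetic tier for speaker {name}")
        orth_times = [(start, end) for start, end, _ in orth]
        fon_times = [(start, end) for start, end, _ in fon]
        if orth_times != fon_times:
            raise ValueError(f"time markers differ for speaker {name}")
        pairs[name] = (orth, fon)
    return pairs
```

The time markers are compared exactly, on the assumption that both tiers were read from the same file and therefore carry identical values.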
An interval in the tier with the orthographic transcription is filled with
exactly one word (with or without underscores), a single underscore ("_"),
or a pause (empty interval).
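The three possible contents of an orthographic interval can be told apart with a small helper. This is only a sketch: the function name and the returned category names are hypothetical, and treating a whitespace-only label as an empty interval is an assumption.

```python
def classify_orth_interval(label: str) -> str:
    """Classify the label of one interval in the orthographic tier."""
    if label.strip() == "":
        # Assumption: whitespace-only labels count as empty intervals.
        return "pause"
    if label == "_":
        # A single underscore is a separate kind of interval, not a word.
        return "underscore"
    # Anything else is exactly one word, which may itself contain underscores.
    return "word"
```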
In the tier with the phonetic transcription the following phenomena can occur:
- when the .fon file indicates that a phoneme is shared by two words, one of
  the following two situations applies:
    - the shared phoneme is not a plosive (for a description of the class
      of plosives see the description of the .fon format).
      On both sides of the boundary that separates the two words an equal sign
      ("=") is used to indicate that the two words share the last and the
      first phoneme respectively.
    - the shared phoneme is a plosive, and therefore cannot be divided
      acoustically. A separate segment is defined that contains just the
      shared plosive and is labelled with an underscore ("_") in both the
      tier with the orthographic transcription and the tier with the phonetic
      transcription. If the shared plosive constitutes the entire
      transcription of a word, so that the plosive is shared between that
      word and the following or preceding word, the phonetic label of the
      plosive in that segment is marked with an underscore ("_") on the side
      where the plosive is shared.
- when for reasons of pronunciation two words are connected by means of a
  linking sound, this is represented in the tier with the phonetic
  transcription by placing the linking sound between hyphens ("-").
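The special symbols described above can be detected in a phonetic label with a few string tests. The sketch below is illustrative only. It assumes that the equal sign sits at the very edge of the label next to the word boundary and that a linking sound appears as a hyphen-delimited substring of a label; the exact placement should be verified against real .wrd files. The function name, the dictionary keys and the sample labels are hypothetical.

```python
import re
from typing import Dict, List, Union

# Assumption: a linking sound is written as "-x-" somewhere inside a label.
LINKING_SOUND = re.compile(r"-([^-]+)-")


def fon_label_flags(label: str) -> Dict[str, Union[bool, List[str]]]:
    """Report which special symbols occur in one phonetic-tier label."""
    return {
        # "=" at the start: first phoneme shared with the preceding word
        "shared_with_previous": label.startswith("="),
        # "=" at the end: last phoneme shared with the following word
        "shared_with_next": label.endswith("="),
        # "_" marks a shared plosive
        "has_underscore": "_" in label,
        # sounds found between hyphens
        "linking_sounds": LINKING_SOUND.findall(label),
    }
```

For example, `fon_label_flags("_")` reports only an underscore, which corresponds to the separate segment for a shared plosive.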
For an overview of the phonetic symbols that were used, see the description
of the .fon format. Like the .fon format, the .wrd file does not contain a
BACKGROUND and/or COMMENT tier.