The .wrd format

Files of type .wrd (found in /data/annot/text/wrd/ on the annotation DVD) contain a manually verified word segmentation in which the words occurring in the orthographic transcription are linked to the audio signal. The files are in ShortTextGrid format and can be produced, changed or viewed with the PRAAT software. For a description of the ShortTextGrid format, see the description of the .ort format. Two tiers are provided for every speaker. The name of the first tier is the speaker ID; it is the same as the name of the corresponding tier in the .ort file. The second tier receives the same name with the suffix _FON (for example, N98765 and N98765_FON respectively) and contains the phonetic transcription that can also be found in the .fon file. The time markers are the same in both tiers.
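
The tier naming convention and the shared time markers can be checked mechanically. The following is a minimal sketch, assuming the tiers have already been read into a dictionary that maps each tier name to a list of (start, end, label) tuples. That in-memory layout, the function name pair_speaker_tiers and the sample labels are assumptions made for illustration; they are not part of the .wrd specification or of PRAAT.

```python
FON_SUFFIX = "_FON"


def pair_speaker_tiers(tiers):
    """Pair each speaker tier with its _FON tier and compare time markers.

    `tiers` is a hypothetical in-memory layout: tier name -> list of
    (start, end, label) tuples. Returns speaker ID -> bool, where True
    means both tiers have exactly the same interval boundaries.
    """
    result = {}
    for name, intervals in tiers.items():
        if name.endswith(FON_SUFFIX):
            continue  # phonetic tiers are looked up via their speaker tier
        fon_name = name + FON_SUFFIX
        if fon_name not in tiers:
            raise KeyError(f"missing phonetic tier {fon_name!r}")
        ort_times = [(start, end) for start, end, _ in intervals]
        fon_times = [(start, end) for start, end, _ in tiers[fon_name]]
        result[name] = ort_times == fon_times
    return result


# Invented sample data; the labels are placeholders, not corpus content.
example = {
    "N98765": [(0.0, 0.4, ""), (0.4, 0.9, "placeholder")],
    "N98765_FON": [(0.0, 0.4, ""), (0.4, 0.9, "placeholder")],
}
print(pair_speaker_tiers(example))  # {'N98765': True}
```

A result of False for a speaker would indicate that the two tiers do not share the same time markers.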

An interval in the tier with the orthographic transcription is filled with exactly one word (with or without underscores), a single underscore ("_"), or a pause (empty interval).
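
The three permitted kinds of content can be told apart with a few string comparisons. This is a sketch only: the function name classify_orthographic_label and the returned category names are hypothetical, and it assumes that labels are compared exactly as stored, without stripping whitespace.

```python
def classify_orthographic_label(label):
    """Classify one interval label from the orthographic tier.

    Returns 'pause' for an empty interval, 'underscore' for a single
    underscore, and 'word' otherwise. A label containing whitespace is
    rejected, on the assumption that it would hold more than one word.
    """
    if label == "":
        return "pause"
    if label == "_":
        return "underscore"
    if any(ch.isspace() for ch in label):
        raise ValueError(f"expected exactly one word, got {label!r}")
    return "word"


# Placeholder labels, invented for illustration.
for sample in ["", "_", "word", "two_parts"]:
    print(repr(sample), classify_orthographic_label(sample))
```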

In the tier with the phonetic transcription the following phenomena can occur:

- When the .fon file indicates that a phoneme is shared by two words, one of the following two situations applies:
  - The shared phoneme is not a plosive (for a description of the class of plosives, see the description of the .fon format). An equals sign ("=") is used on both sides of the boundary that separates the two words, to indicate that the last phoneme of the first word and the first phoneme of the second word are shared.
  - The shared phoneme is a plosive, and therefore cannot be divided acoustically. A separate segment is defined that contains just the shared plosive and is labelled with an underscore ("_") in both the tier with the orthographic transcription and the tier with the phonetic transcription. If the shared plosive coincides with the transcription of a word, so that the plosive is shared between that word and the following or preceding word, then the phonetic label of this plosive in the segment is given an underscore ("_") on the side where the plosive is shared.
- When, for reasons of pronunciation, two words are connected by a linking sound, this is represented in the tier with the phonetic transcription by placing the linking sound between hyphens ("-").
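
The markers described above can be detected in a phonetic label with simple string tests. The sketch below is not a definitive reading of the format: it assumes that "=" and "_" appear at the very start or end of a label, that a linking sound appears as "-x-", and that "-" is not otherwise used as a phonetic symbol. The function name describe_phonetic_label, the dictionary keys and the sample labels are all hypothetical.

```python
import re

# Assumption: a linking sound is written as "-x-" and "-" has no other use.
LINKING_SOUND = re.compile(r"-([^-]+)-")


def describe_phonetic_label(label):
    """Report which boundary markers occur in one phonetic-tier label."""
    is_plosive_segment = label == "_"
    return {
        # A segment that holds only a shared plosive.
        "plosive_segment": is_plosive_segment,
        # "=" next to a word boundary: a shared non-plosive phoneme.
        "equals_at_start": label.startswith("="),
        "equals_at_end": label.endswith("="),
        # "_" on one side of a longer label: plosive shared on that side.
        "underscore_at_start": not is_plosive_segment and label.startswith("_"),
        "underscore_at_end": not is_plosive_segment and label.endswith("_"),
        # Linking sounds found between hyphens.
        "linking_sounds": LINKING_SOUND.findall(label),
    }


# Placeholder labels, invented for illustration only.
for sample in ["ab=", "=cd", "_", "t_", "ab-j-cd"]:
    print(repr(sample), describe_phonetic_label(sample))
```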

For an overview of the phonetic symbols that were used, we refer to the description of the .fon format. As with the .fon format, the .wrd file does not contain a BACKGROUND and/or COMMENT tier.