The .awd format

Files of type .awd (to be found in /data/annot/text/awd of the annotation DVD) comprise an automatically generated word segmentation in which the words of the orthographic transcription have been linked to the audio signal. The files also contain an automatically generated phoneme segmentation in which the individual phonemes have been linked to the audio signal. The files are in ShortTextGrid format and can be produced, changed or viewed by means of the PRAAT software. For a description of the ShortTextGrid format, see the description of the .ort format.

For each speaker three tiers are provided. The first tier bears the speaker code as tier name and is identical to the tier with the same name in the .ort file. The second tier has the same name with the suffix _FON (for example, N98765 and N98765_FON respectively) and comprises an automatically generated phonetic transcription. The time markers on these two tiers are identical. Finally, there is a third tier with the same name and the suffix _SEG (N98765_SEG). This tier represents the underlying phoneme segmentations that correspond to the words on the other two tiers.
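The tier naming convention can be illustrated with a minimal Python sketch. It assumes that the tiers have already been read into lists of (start, end, label) tuples by some other means; the function names tier_names and same_time_markers are hypothetical and not part of the corpus tooling.

```python
def tier_names(speaker_code):
    """Return the (orthographic, phonetic, segmentation) tier names for a speaker."""
    return (speaker_code, speaker_code + "_FON", speaker_code + "_SEG")


def same_time_markers(ort_tier, fon_tier):
    """Check that two tiers share all interval boundaries.

    Each tier is assumed to be a list of (start, end, label) tuples.
    """
    return [(s, e) for s, e, _ in ort_tier] == [(s, e) for s, e, _ in fon_tier]
```

For the speaker code N98765 this yields the names N98765, N98765_FON and N98765_SEG. The second function can serve as a sanity check that the orthographic and phonetic tiers are aligned as described.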

An interval in the tier with the orthographic transcription contains exactly one word (with or without underscores), a single underscore ("_"), a pause (empty interval), or a text (multiple words) exactly as it occurs in the same interval in the orthographic transcription (.ort file). In the latter case the tiers with the phonetic transcription and the phoneme segmentation are occupied by the automatically generated phonetic transcription without segmentation information. Moreover, in intervals of this type an exclamation mark ("!") precedes the text in all three tiers, indicating that the segmentation (which is absent) is unreliable. An exclamation mark ("!") can also occur if a segmentation is present but was found to be unreliable (by some standard).
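The interval types listed above can be told apart from the label alone. The following Python sketch shows one possible way to do so. The function name classify_ort_label and the returned category names are hypothetical, and the sketch assumes that the exclamation mark is the very first character of the label and that words in a multi-word text are separated by whitespace.

```python
def classify_ort_label(label):
    """Classify one interval label from the orthographic tier.

    Returns (kind, unreliable), where kind is "pause", "underscore",
    "word" or "text", and unreliable is True if the label starts with "!".
    """
    # Assumption: the "!" marker is the first character of the label.
    unreliable = label.startswith("!")
    if unreliable:
        label = label[1:]
    label = label.strip()
    if label == "":
        return ("pause", unreliable)
    if label == "_":
        return ("underscore", unreliable)
    # Assumption: a text of multiple words is separated by whitespace.
    if len(label.split()) > 1:
        return ("text", unreliable)
    return ("word", unreliable)
```

A label such as "!dat is zo" would be classified as an unreliable text, whereas "!huis" would be classified as a single word whose segmentation is present but flagged as unreliable.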

In the tier with the phonetic transcription the following phenomena may occur:

If the .fon file indicates that a phoneme is shared by two words, one of the following two situations arises:
- the shared phoneme is not a plosive (for the set of plosives see the description of the .fon format). On both sides of the boundary that separates the two words an equal sign ("=") is used to indicate that the two words share the last and the first phoneme respectively.
- the shared phoneme is a plosive, and therefore acoustically indivisible. A separate segment is defined that contains exactly the shared plosive and which is labelled by means of an underscore ("_") in both the tier with the phonetic transcription and the tier with the orthographic transcription. If the shared plosive constitutes the entire transcription of a word, so that it is shared between that word and the word preceding or following it, then the segment also includes the phonetic label of the plosive, with the underscore ("_") on the side where the plosive is shared.
If, for reasons of pronunciation, two words are connected by means of an inserted sound, then this is indicated in the tier with the phonetic transcription by means of a hyphen ("-") on either side of the inserted sound.
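The markers described above ("=", "_" and "-") can be detected in a phonetic label with a few string tests. The Python sketch below is only an illustration. The function name describe_fon_label and the dictionary keys are hypothetical, the example labels are made up and do not necessarily use valid symbols from the .fon format, and the exact position of a hyphenated inserted sound within a label is an assumption.

```python
import re


def describe_fon_label(label):
    """Report which sharing or insertion markers a phonetic label carries."""
    return {
        # "=" at the start: the first phoneme is shared with the previous word.
        "shares_first": label.startswith("="),
        # "=" at the end: the last phoneme is shared with the next word.
        "shares_last": label.endswith("="),
        # "_" at either edge: a separate segment for a shared plosive.
        "plosive_segment": label.startswith("_") or label.endswith("_"),
        # Assumption: an inserted sound appears as "-x-" somewhere in the label.
        "inserted_sounds": re.findall(r"-([^-\s]+)-", label),
    }
```

For the made-up labels "Is=" and "=so" the sketch reports a shared last and a shared first phoneme respectively, and for "-j-" it reports the inserted sound "j".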

In the tier with the phoneme segmentation, empty intervals or intervals with a single phoneme symbol occur only in cases where the "_" segment from the orthographic transcription and the phonetic transcription is labelled with the shared phoneme (a plosive). In a similar fashion, a shared phoneme that is not a plosive is represented in a single interval, and the boundaries in the orthographic and phonetic tiers occur in the middle of that interval.
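One way to locate a shared non-plosive phoneme is to look for a word boundary that falls inside, rather than on the edge of, an interval of the segmentation tier. The Python sketch below assumes that the _SEG tier is available as a list of (start, end, label) tuples with times in seconds. The function name seg_interval_spanning and the tolerance parameter tol are hypothetical.

```python
def seg_interval_spanning(boundary, seg_tier, tol=1e-6):
    """Return the _SEG interval that strictly contains a word boundary.

    Returns None if the boundary coincides with an interval edge
    (within the tolerance) or lies outside all intervals.
    """
    for start, end, label in seg_tier:
        if start + tol < boundary < end - tol:
            return (start, end, label)
    return None
```

A result other than None suggests that the phoneme in the returned interval is shared by the words on either side of the boundary. The sketch does not verify that the boundary lies at the exact midpoint of the interval.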

For an overview of the phonetic symbols that have been used, see the description of the .fon format. As with the .wrd format, the .awd file does not comprise a BACKGROUND and/or COMMENT tier.