Prosodic annotation

A small part of the corpus was annotated prosodically.

The prosodic annotation of the data is described below in some detail, along with the aim of and motivation for this type of annotation. Other topics discussed include the protocol that was developed, the procedure that was adopted, and the file types and formats that are available. Finally, an overview is given of the data for which a prosodic annotation is available in version 1.0.

Aim and motivation

A prosodic annotation of part of the data was undertaken in order to make available those aspects (especially of spontaneous speech) that concern the temporal grouping of words, the emphasis put on particular words, the appearance of spontaneous speech effects (e.g. filled pauses) that are not yet encoded in the orthographic transcription, the presence of emotions, etc. To be useful for different types of research, the annotation should be independent of any specific theory and reflect those aspects of prosody that can easily be captured by a naive listener. Therefore, it was decided not to use a ToDI-like labelling (Gussenhoven et al. 1999); instead, the aim was a perceptually based annotation similar to that proposed in, for example, Portele & Heuft (1995) and Grover et al. (1998).

Key elements to be annotated were: (i) prominent syllables, i.e. syllables that carry clear prominence; (ii) breaks, i.e. between-word and within-word interruptions of the normal speech stream; and (iii) segmental lengthenings, i.e. unusual lengthenings of sounds that do not carry any prominence but that, for example, convey an emotion or represent a filled pause (hesitation).

Since the objective was to let naive listeners (i.e. listeners with no phonological or prosodic background) perform the prosodic annotation, it was decided to ask these listeners to mark the prosodic phenomena mentioned above in the orthographic transcription rather than in, for example, the phonetic transcription. This had the advantage that the procedure for the prosodic annotation was independent of the availability of the phonetic transcriptions.

Although several levels of prominence, interruption and lengthening can be distinguished, it was decided to aim for an annotation that would be a good compromise between the level of detail, annotation time and expected inter-transcriber consistency. Thus, it was decided that the annotation would show only one level of prominence, two levels of interruption (strong and weak), and one level of segmental lengthening.

Prior to the actual prosodic annotation of part of the corpus, a pilot study was carried out.

References

Grover, C., J. Fackrell, H. Vereecken, J.P. Martens and B. Van Coile. 1998. Designing prosodic databases for automatic modelling in 6 languages. In Proceedings of the 3rd ESCA/COCOSDA Workshop on Speech Synthesis (Jenolan Caves). pp. 93-98.
Gussenhoven, C., T. Rietveld and J. Terken. 1999. ToDI, Transcription of Dutch Intonation. http://lands.let.kun.nl/todi
Portele, T. and B. Heuft. 1995. Two kinds of stress perceptions. In Proceedings of ICPhS 95 (Stockholm).

Procedure

Before the prosodic annotation was begun, all punctuation marks were removed from the orthographic transcription. This was done so as to avoid a bias towards putting breaks at syntactic boundaries. The orthographic transcription without the punctuation marks was then presented to the transcribers.
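
The punctuation-removal step described above can be sketched as follows. This is only a minimal illustration in Python, not the tooling that was actually used for the corpus: the function name `strip_punctuation` and the set of marks in `PUNCTUATION` are assumptions, and word-internal apostrophes and hyphens are deliberately left in place.

```python
import re

# Assumed set of punctuation marks; the marks actually used in the
# orthographic transcription are not listed in this document.
PUNCTUATION = ".,?!;:"

def strip_punctuation(line: str) -> str:
    """Remove punctuation marks and normalise whitespace (hypothetical helper)."""
    # Delete every character that occurs in PUNCTUATION.
    cleaned = line.translate(str.maketrans("", "", PUNCTUATION))
    # Collapse the gaps left behind by free-standing marks such as "...".
    return re.sub(r"\s+", " ", cleaned).strip()
```

Applied to a line such as "ja, dat is zo. echt waar?", the helper would return the same words without the comma, full stop and question mark, so that a transcriber sees no cue for a syntactic boundary.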

To facilitate the annotation process, the PRAAT program was used. All data selected for prosodic annotation were annotated independently by two transcribers. It is left to the user to decide which annotation to use.

Protocol

The protocol used in the process of annotating the data prosodically is the following:

Martens, J.-P. 2002. Protocol voor prosodische annotatie. (Available in .ps and .pdf format; Dutch only.)

File types and formats

For all data that were annotated prosodically, two versions are available. These were produced independently of each other by two different transcribers. The prosodic annotations have been stored in two formats:

the (short) TextGrid format as generated by the PRAAT software. This format can be imported into PRAAT again. The file extension for files of this type is .pro1 or .pro2;
XML format. The extension for these files is .prx1 or .prx2.

For descriptions of the different formats, see the respective descriptions of the pro format and the prx format.

Files in the TextGrid format can be found in the directories /data/annot/text/pro1/ and /data/annot/text/pro2/ of the annotation DVD.
Files in the XML format can be found in the directories /data/annot/xml/prx1/ and /data/annot/xml/prx2/ of the annotation DVD.
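
The directory and extension conventions above could be used to locate both transcribers' files for one recording, as in the following Python sketch. The function name `annotation_paths` and the file stem passed to it are hypothetical; the actual file naming scheme, and any subdirectories below the directories listed above, are not specified in this section.

```python
from pathlib import PurePosixPath

# Directory layout as described above, relative to the root of the annotation DVD.
FORMAT_DIRS = {
    "pro": PurePosixPath("/data/annot/text"),  # short TextGrid format
    "prx": PurePosixPath("/data/annot/xml"),   # XML format
}

def annotation_paths(basename: str, fmt: str = "pro") -> list[PurePosixPath]:
    """Return the paths of both transcribers' files for one recording.

    `basename` is a hypothetical file stem; it is an assumption that the
    files sit directly in the pro1/pro2 or prx1/prx2 directories.
    """
    if fmt not in FORMAT_DIRS:
        raise ValueError(f"unknown format: {fmt!r}")
    # One path per transcriber: .pro1/.pro2 or .prx1/.prx2.
    return [
        FORMAT_DIRS[fmt] / f"{fmt}{n}" / f"{basename}.{fmt}{n}"
        for n in (1, 2)
    ]
```

Returning both paths together reflects the fact that neither annotation is marked as preferred; the choice between them is left to the user.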

Overview of available data

Table 1 presents an overview of the data for which prosodic annotations are available in version 1.0 of the corpus. For a more detailed description of the design of the corpus and the motivation for this design, see the description of the corpus design.

Table 1. Overview of data for which prosodic annotations are available
(VL = data originating from Flanders; NL = data originating from the Netherlands)

Component		Total number of words	VL	NL
a.	Spontaneous conversations ('face-to-face')	87,394	49,988	37,406
b.	Interviews with teachers of Dutch	15,263	7,667	7,596
c.	Spontaneous telephone dialogues (recorded via a switchboard)	39,944	19,874	20,070
d.	Spontaneous telephone dialogues (recorded on MD via a local interface)	0	0	0
e.	Simulated business negotiations	7,485	0	7,485
f.	Interviews/discussions/debates (broadcast)	17,544	10,007	7,537
g.	(political) Discussions/debates/meetings (non-broadcast)	13,092	5,414	7,678
h.	Lessons recorded in the classroom	0	0	0
i.	Live (e.g. sports) commentaries (broadcast)	11,868	6,002	5,866
j.	News reports/reportages (broadcast)	11,671	6,054	5,617
k.	News (broadcast)	13,685	6,248	7,437
l.	Commentaries/columns/reviews (broadcast)	13,539	5,998	7,541
m.	Ceremonious speeches/sermons	2,102	1,124	978
n.	Lectures/seminars	10,457	3,880	6,577
o.	Read speech	0	0	0
Total		244,044	122,256	121,788
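
Since each total in Table 1 should equal the sum of the VL and NL counts, the figures can be checked mechanically. The Python sketch below shows one way to do this for tab-separated rows; the function names `parse_count` and `inconsistent_rows` are illustrative assumptions, and the sample contains only three rows copied from the table.

```python
def parse_count(text: str) -> int:
    """Convert a count such as '87,394' to an integer."""
    return int(text.replace(",", ""))

def inconsistent_rows(table: str) -> list[str]:
    """Return the labels of rows whose total differs from VL + NL."""
    bad = []
    for line in table.strip().splitlines():
        fields = line.split("\t")
        label = fields[0]
        # The last three fields are taken to be: total, VL, NL.
        total, vl, nl = (parse_count(f) for f in fields[-3:])
        if total != vl + nl:
            bad.append(label)
    return bad

# Three rows copied from Table 1 (tab-separated, header omitted).
sample = (
    "a.\tSpontaneous conversations ('face-to-face')\t87,394\t49,988\t37,406\n"
    "b.\tInterviews with teachers of Dutch\t15,263\t7,667\t7,596\n"
    "e.\tSimulated business negotiations\t7,485\t0\t7,485\n"
)
print(inconsistent_rows(sample))  # prints []
```

An empty list means that every row checked is internally consistent. The same comparison can be applied to the column totals by summing the VL and NL fields over all component rows.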
