This section describes the prosodic annotation of the data in some detail
and touches on the aim of and motivation for this type of annotation.
It also covers the protocol that was developed, the procedure that was
adopted, and the file types and formats that are available. Finally, it
gives an overview of the data for which a prosodic annotation is available
in version 1.0.
Key elements to be annotated were: (i) prominent syllables, i.e. syllables that carry clear prominence; (ii) breaks, i.e. between-word and within-word interruptions of the normal speech stream; and (iii) segmental lengthenings, i.e. unusual lengthenings of sounds that do not carry any prominence but that, for example, convey an emotion or represent a filled pause (hesitation).
Since the objective was to let naive listeners (i.e. listeners with no phonological or prosodic background) perform the prosodic annotation, it was decided to ask these listeners to mark the prosodic phenomena mentioned above in the orthographic transcription rather than in, for example, the phonetic transcription. This had the advantage that the procedure for the prosodic annotation was independent of the availability of the phonetic transcriptions.
Although several levels of prominence, interruption and lengthening can be distinguished, it was decided to aim for an annotation that would be a good compromise between the level of detail, annotation time and expected inter-transcriber consistency. Thus, it was decided that the annotation would show only one level of prominence, two levels of interruption (strong and weak), and one level of segmental lengthening.
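The label inventory described above, and the notion of inter-transcriber consistency, can be illustrated with a minimal sketch. The class names, the `agreement` function and the example labels are hypothetical and are not taken from the protocol; plain percentage agreement is used here only as the simplest possible measure, not as the measure used in the project.

```python
from enum import Enum


class Prominence(Enum):
    NONE = 0
    PROMINENT = 1  # the single level of prominence


class Break(Enum):
    NONE = 0
    WEAK = 1       # weak interruption
    STRONG = 2     # strong interruption


class Lengthening(Enum):
    NONE = 0
    LENGTHENED = 1  # the single level of segmental lengthening


def agreement(labels_a, labels_b):
    """Fraction of positions where two transcribers assigned the same label."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both transcribers must label the same units")
    if not labels_a:
        raise ValueError("no labels to compare")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


# Hypothetical break labels for the four word boundaries of a five-word utterance.
transcriber_1 = [Break.NONE, Break.WEAK, Break.NONE, Break.STRONG]
transcriber_2 = [Break.NONE, Break.STRONG, Break.NONE, Break.STRONG]
print(agreement(transcriber_1, transcriber_2))  # 0.75
```

With fewer levels per phenomenon there are fewer ways for two listeners to disagree, which is one reason a coarse inventory tends to yield higher consistency.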
Prior to the actual prosodic annotation of part of the corpus, a pilot study was carried out.
References

Martens, J.-P. 2002. Protocol voor prosodische annotatie. (Available here in .ps and .pdf format; Dutch only.)

To facilitate the annotation process, the PRAAT program was used. All the
data selected for prosodic annotation was annotated independently by two
transcribers. It is left to the user to decide which of the two annotations
to use.
Files in the TextGrid format can be found in the directories /data/annot/text/pro1/ and /data/annot/text/pro2/ of the annotation DVD.

Files in the XML format can be found in the directories /data/annot/xml/prx1/ and /data/annot/xml/prx2/ of the annotation DVD.
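Because every recording was annotated twice, a user will often want the two versions of the same file side by side. The sketch below pairs files by their path relative to each directory. It assumes, without confirmation from the documentation, that the two directories (for example pro1 and pro2) hold the two transcribers' annotations and that corresponding files share the same relative path; the function names are illustrative.

```python
from pathlib import Path


def index_files(root):
    """Map each file's path relative to root onto its full path."""
    root = Path(root)
    return {p.relative_to(root): p for p in root.rglob("*") if p.is_file()}


def pair_annotations(root_1, root_2):
    """Return {relative path: (file under root_1, file under root_2)}.

    Only files present under both roots are included (assumed naming
    convention: identical relative paths for the same recording).
    """
    first = index_files(root_1)
    second = index_files(root_2)
    shared = sorted(first.keys() & second.keys())
    return {rel: (first[rel], second[rel]) for rel in shared}
```

A call such as `pair_annotations(dvd / "data/annot/text/pro1", dvd / "data/annot/text/pro2")`, where `dvd` is the local mount point of the annotation DVD, would then give one entry per recording that both transcribers annotated.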
Table 1. Overview of data for which prosodic annotations are available
(VL = data originating from Flanders;
NL = data originating from the Netherlands)
| | Component | Total number of words | VL | NL |
|---|---|---|---|---|
| a. | Spontaneous conversations ('face-to-face') | 87,394 | 49,988 | 37,406 |
| b. | Interviews with teachers of Dutch | 15,263 | 7,667 | 7,596 |
| c. | Spontaneous telephone dialogues (recorded via a switchboard) | 39,944 | 19,874 | 20,070 |
| d. | Spontaneous telephone dialogues (recorded on MD via a local interface) | 0 | 0 | 0 |
| e. | Simulated business negotiations | 7,485 | 0 | 7,485 |
| f. | Interviews/discussions/debates (broadcast) | 17,544 | 10,007 | 7,537 |
| g. | (political) Discussions/debates/meetings (non-broadcast) | 13,092 | 5,414 | 7,678 |
| h. | Lessons recorded in the classroom | 0 | 0 | 0 |
| i. | Live (e.g. sports) commentaries (broadcast) | 11,868 | 6,002 | 5,866 |
| j. | News reports/reportages (broadcast) | 11,671 | 6,054 | 5,617 |
| k. | News (broadcast) | 13,685 | 6,248 | 7,437 |
| l. | Commentaries/columns/reviews (broadcast) | 13,539 | 5,998 | 7,541 |
| m. | Ceremonious speeches/sermons | 2,102 | 1,124 | 978 |
| n. | Lectures/seminars | 10,457 | 3,880 | 6,577 |
| o. | Read speech | 0 | 0 | 0 |
| | Total | 244,044 | 122,256 | 121,788 |
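Each row of Table 1 should satisfy a simple identity: the total number of words equals the sum of the two regional counts. The following is a minimal sketch of that check, using a few rows copied from the table; the function name `inconsistent_rows` is illustrative and not part of any corpus tooling.

```python
def inconsistent_rows(rows):
    """Return the labels of rows whose total differs from the sum of the regional counts."""
    return [label for label, total, *regions in rows if total != sum(regions)]


# (component, total number of words, first regional count, second regional count)
rows = [
    ("a", 87_394, 49_988, 37_406),
    ("b", 15_263, 7_667, 7_596),
    ("c", 39_944, 19_874, 20_070),
    ("Total", 244_044, 122_256, 121_788),
]
print(inconsistent_rows(rows))  # []
```

The same function can be run over all rows of the table to flag any transcription error in the figures.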