<ltext> | text with lexicon link-up |
<lau> | an annotation unit. The boundaries of this element are determined by punctuation marks. |
<lw> | a word within the annotation unit <lau>. |
<lmu> | a mark-up unit that may contain COMMENT or BACKGROUND information |
<lm> | a marker within the mark-up unit <lmu>. |
<lkop> | a link-up unit within a word <lw>. |
ref | the identification code, composed of one, two or three parts (depending on the element with which it is associated) separated by full stops. The parts are, in order: <sample number>.<annotation unit, rank number>.<word/marker/punctuation mark, rank number> |
s | speaker identification. In the context of the <lau> element, the possible values of this attribute are "Nxxxxx", "Vxxxxx" or "UNKNOWN", where x denotes a digit. In the context of the <lmu> element, the s attribute has one of two possible values: "COMMENT" or "BACKGROUND". |
w | word form as it occurs in the orthographic transcription (cf. data in the .ort files) |
klem | lemma of the word form. The underscore "_" symbolises the absence of a lemma. |
nlid | lexicon ID of the single or multi-word lemma. The ID refers to the single word lexicon (/data/lexicon/text/cgnlex.txt on the annotation DVD) unless a multi-word expression is involved. In that case the ID refers to the multi-word lexicon (/data/lexicon/text/cgnmlex.txt on the annotation DVD). Multiple references to the lexicon are separated by a vertical bar ("|"), e.g. nlid="16763|16764". nlid="0" is used when there is no corresponding lemma in the lexicon. |
ksize | the number of parts in the multi-word expression. In case of a single word item the value is ksize="1". |
kparts | references to the individual parts of the multi-word expression, in the form <annotation unit, rank number>.<word rank number> |
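The attribute conventions in the table can be illustrated with a small Python sketch. It only covers what the table states: ref has one to three parts separated by full stops, nlid holds one or more lexicon IDs separated by "|" with "0" meaning no lexicon entry, and klem uses "_" for a missing lemma. The function names (parse_ref, parse_nlid, lemma_or_none) and the sample ref value "fn000001.3.2" are made up for illustration and do not come from the corpus tools. The parts are kept as strings because the table does not fix their exact form.

```python
from typing import List, NamedTuple, Optional


class Ref(NamedTuple):
    """Parts of a ref code; unit and item are absent on higher-level elements."""
    sample: str
    unit: Optional[str] = None   # annotation unit, rank number
    item: Optional[str] = None   # word/marker/punctuation mark, rank number


def parse_ref(ref: str) -> Ref:
    """Split a ref code on full stops into one, two or three parts."""
    parts = ref.split(".")
    if not 1 <= len(parts) <= 3 or any(p == "" for p in parts):
        raise ValueError(f"malformed ref: {ref!r}")
    return Ref(*parts)


def parse_nlid(nlid: str) -> List[str]:
    """Return the lexicon IDs in an nlid value; "0" means no lexicon entry."""
    if nlid == "0":
        return []
    return nlid.split("|")


def lemma_or_none(klem: str) -> Optional[str]:
    """Map the "_" placeholder to None, otherwise return the lemma unchanged."""
    return None if klem == "_" else klem


if __name__ == "__main__":
    # "fn000001.3.2" is a hypothetical ref value, used only as an example.
    print(parse_ref("fn000001.3.2"))
    print(parse_nlid("16763|16764"))
    print(lemma_or_none("_"))
```

Whether a given ID belongs to the single word lexicon or the multi-word lexicon depends on ksize, so a caller would need to check that attribute before looking the ID up.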
All characters from the ISO 8859-1 character set that fall outside the 7-bit range have been replaced by the corresponding character entity references for ISO 8859-1 characters. The set of special characters used is listed in ltext.dtd on the annotation DVD. The file entities.htm gives an overview of the various standards for this character (sub)set.
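As a rough illustration of undoing this translation, the sketch below uses Python's standard html.unescape, which knows the standard named entities for ISO 8859-1 characters. It assumes that the entity names declared in ltext.dtd match those standard names, which has not been checked against the DTD here. The function name decode_entities is hypothetical.

```python
import html


def decode_entities(text: str) -> str:
    """Replace character entity references (e.g. &eacute;) by the characters they name."""
    # Assumption: the entities in the data use the standard ISO 8859-1 entity names.
    return html.unescape(text)


if __name__ == "__main__":
    print(decode_entities("caf&eacute;"))  # prints "café"
```

An XML parser that reads the files together with ltext.dtd would normally resolve these entities itself, so a helper like this is mainly of use when attribute values are handled as plain strings.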