The .lxk format

Files of type .lxk (lexicon link-up) represent this type of annotation chronologically in an XML text format. The structure of the format is described in ltext.dtd, which can be found on the annotation DVD.

<?xml version="1.0"?>
<!DOCTYPE ltext SYSTEM "ltext.dtd">
<ltext ref="fn123456">
  <lau ref="fn123456.1" s="N01036">
    <lw ref="fn123456.1.1" w="ga"> <lkop klem="gaan" nlid="30559"
      ksize="1" kparts="265.1"/> </lw>
    <lw ref="fn123456.1.2" w="je"> <lkop klem="je" nlid="135108"
      ksize="1" kparts="265.2"/> </lw>
    <lw ref="fn123456.1.3" w="nou"> <lkop klem="nou" nlid="135232"
      ksize="1" kparts="265.3"/> </lw>
    <lw ref="fn123456.1.4" w="met"> <lkop klem="met" nlid="135170"
      ksize="1" kparts="265.4"/> </lw>
    <lw ref="fn123456.1.5" w="de"> <lkop klem="de" nlid="134796"
      ksize="1" kparts="265.5"/> </lw>
    <lw ref="fn123456.1.6" w="trein"> <lkop klem="trein" nlid="104897"
      ksize="1" kparts="265.6"/> </lw>
    <lw ref="fn123456.1.7" w="naar"> <lkop klem="naar" nlid="135200"
      ksize="1" kparts="265.7"/> </lw>
    <lw ref="fn123456.1.8" w="Loon"> <lkop klem="Loon_Op_Zand" nlid="608839"
      ksize="3" kparts="265.8 265.9 265.10"/> </lw>
    <lw ref="fn123456.1.9" w="Op"> <lkop klem="Loon_Op_Zand" nlid="608839"
      ksize="3" kparts="265.8 265.9 265.10"/> </lw>
    <lw ref="fn123456.1.10" w="Zand"> <lkop klem="Loon_Op_Zand" nlid="608839"
      ksize="3" kparts="265.8 265.9 265.10"/> </lw>
    <lw ref="fn123456.1.11" w="of"> <lkop klem="of" nlid="135234"
      ksize="1" kparts="265.11"/> </lw>
    <lw ref="fn123456.1.12" w="met"> <lkop klem="met" nlid="135170"
      ksize="1" kparts="265.12"/> </lw>
    <lw ref="fn123456.1.13" w="de"> <lkop klem="de" nlid="134796"
      ksize="1" kparts="265.13"/> </lw>
    <lw ref="fn123456.1.14" w="bus"> <lkop klem="bus" nlid="16763|16764"
      ksize="1" kparts="265.14"/> </lw>
    <ll ref="fn123456.1.15" w="?"/>
  </lau>
  <lau ref="fn123456.2" s="N01265">
    <lw ref="fn123456.2.1" w="ja"> <lkop klem="ja" nlid="45366"
      ksize="1" kparts="73.1"/> </lw>
    <lw ref="fn123456.2.2" w="Partij"> <lkop klem="Partij_Van_De_Arbeid" nlid="610975"
      ksize="4" kparts="73.2 73.3 73.4 73.5"/> </lw>
    <lw ref="fn123456.2.3" w="Van"> <lkop klem="Partij_Van_De_Arbeid" nlid="610975"
      ksize="4" kparts="73.2 73.3 73.4 73.5"/> </lw>
    <lw ref="fn123456.2.4" w="De"> <lkop klem="Partij_Van_De_Arbeid" nlid="610975"
      ksize="4" kparts="73.2 73.3 73.4 73.5"/> </lw>
    <lw ref="fn123456.2.5" w="Arbeid"> <lkop klem="Partij_Van_De_Arbeid" nlid="610975"
      ksize="4" kparts="73.2 73.3 73.4 73.5"/> </lw>
    <lw ref="fn123456.2.6" w="is"> <lkop klem="zijn" nlid="122511"
      ksize="1" kparts="73.6"/> </lw>
    <lw ref="fn123456.2.7" w="iets"> <lkop klem="iets" nlid="135089"
      ksize="1" kparts="73.7"/> </lw>
    <lw ref="fn123456.2.8" w="vooruit"> <lkop klem="vooruitgaan" nlid="504346"
      ksize="2" kparts="73.8 73.9"/> </lw>
    <lw ref="fn123456.2.9" w="gegaan"> <lkop klem="vooruitgaan" nlid="504346"
      ksize="2" kparts="73.8 73.9"/> <lkop klem="achteruitgaan" nlid="500431"
      ksize="2" kparts="73.9 73.13"/> </lw>
    <lw ref="fn123456.2.10" w="'t"> <lkop klem="het" nlid="135669"
      ksize="1" kparts="73.10"/> </lw>
    <lw ref="fn123456.2.11" w="CDA"> <lkop klem="CDA" nlid="125724"
      ksize="1" kparts="73.11"/> </lw>
    <lw ref="fn123456.2.12" w="iets"> <lkop klem="iets" nlid="135089"
      ksize="1" kparts="73.12"/> </lw>
    <lw ref="fn123456.2.13" w="achteruit"> <lkop klem="achteruitgaan" nlid="500431"
      ksize="2" kparts="73.9 73.13"/> </lw>
    <lw ref="fn123456.2.14" w="SP"> <lkop klem="SP" nlid="132419"
      ksize="1" kparts="73.14"/> </lw>
    <lw ref="fn123456.2.15" w="verdubbeld"> <lkop klem="verdubbelen" nlid="109296"
      ksize="1" kparts="73.15"/> </lw>
    <ll ref="fn123456.2.16" w="."/>
  </lau>
</ltext>

<ltext> text with lexicon link-up
<lau> an annotation unit. The boundaries of this element are determined by the punctuation mark.
<lw> a word within the annotation unit <lau>.
<lmu> a mark-up unit that may contain COMMENT or BACKGROUND information.
<lm> a marker within the mark-up unit <lmu>.
<lkop> a link-up unit within a word <lw>.
ref the identification code is composed of one, two or three parts (depending on the element with which it is associated) which are separated by a full stop. The meaning is as follows: 
<sample number>.<annotation unit, rank number>.<word/marker/punctuation mark, rank number>
s speaker identification. In the context of the <lau> element possible values of this attribute are: "Nxxxxx", "Vxxxxx" or "UNKNOWN", where x denotes a digit. In the context of the <lmu> element the s attribute may have one of two possible values: "COMMENT" or "BACKGROUND".
w word form as it occurs in the orthographic transcription (cf. data in the .ort files)
klem lemma of the word form. The underscore "_" symbolises the absence of a lemma
nlid lexicon ID of the single or multi-word lemma. The ID refers to the single word lexicon (/data/lexicon/text/cgnlex.txt on the annotation DVD) unless a multi-word expression is involved. In that case the ID refers to the multi-word lexicon (/data/lexicon/text/cgnmlex.txt on the annotation DVD). Multiple references to the lexicon are separated by a vertical bar ("|"), e.g. nlid="16763|16764". nlid="0" is used when there is no corresponding lemma in the lexicon.
ksize the number of parts in the multi-word expression. In case of a single word item the value is ksize="1".
kparts references to the individual parts of the multi-word expression: 
<annotation unit, rank number>.<word rank number>
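
The attribute descriptions above can be illustrated with a short reading routine. What follows is a minimal sketch in Python using only the standard library. It is not part of the corpus tooling. The function names (split_ref, read_linkups, multiword_expressions) are hypothetical, and the sample is a shortened copy of the example above. It assumes the file contains no character entities that would need ltext.dtd to be resolved.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example above (assumption: no DTD-defined entities occur).
SAMPLE = """<?xml version="1.0"?>
<!DOCTYPE ltext SYSTEM "ltext.dtd">
<ltext ref="fn123456">
  <lau ref="fn123456.2" s="N01265">
    <lw ref="fn123456.2.7" w="iets"> <lkop klem="iets" nlid="135089"
      ksize="1" kparts="73.7"/> </lw>
    <lw ref="fn123456.2.8" w="vooruit"> <lkop klem="vooruitgaan" nlid="504346"
      ksize="2" kparts="73.8 73.9"/> </lw>
    <lw ref="fn123456.2.9" w="gegaan"> <lkop klem="vooruitgaan" nlid="504346"
      ksize="2" kparts="73.8 73.9"/> </lw>
    <ll ref="fn123456.2.16" w="."/>
  </lau>
</ltext>
"""


def split_ref(ref):
    """Split a ref into its one, two or three full-stop-separated parts."""
    return ref.split(".")


def read_linkups(xml_text):
    """Return one record per <lkop>, combined with its <lw> and <lau> context."""
    root = ET.fromstring(xml_text)
    rows = []
    for lau in root.iter("lau"):
        for lw in lau.iter("lw"):
            for lkop in lw.iter("lkop"):
                rows.append({
                    "ref": lw.get("ref"),
                    "speaker": lau.get("s"),
                    "word": lw.get("w"),
                    "lemma": lkop.get("klem"),
                    # Multiple lexicon IDs are separated by a vertical bar.
                    "nlid": lkop.get("nlid").split("|"),
                    "ksize": int(lkop.get("ksize")),
                    # kparts holds space-separated <unit>.<word> references.
                    "kparts": lkop.get("kparts").split(),
                })
    return rows


def multiword_expressions(rows):
    """Group the word forms that share one multi-word link-up (ksize > 1)."""
    groups = {}
    for row in rows:
        if row["ksize"] > 1:
            key = (row["lemma"], tuple(row["kparts"]))
            groups.setdefault(key, []).append(row["word"])
    return groups


rows = read_linkups(SAMPLE)
mwe = multiword_expressions(rows)
```

Grouping on the pair (klem, kparts) rather than on klem alone keeps two occurrences of the same multi-word lemma in one annotation unit apart, since each occurrence lists its own parts.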

All characters used from the ISO 8859-1 character set that fall outside the 7-bit range have been replaced by the corresponding character entity references for ISO 8859-1 characters. The set of special characters used can be found in ltext.dtd on the annotation DVD. In entities.htm an overview is presented of the various standards for this character (sub)set.
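
As an illustration of the entity convention, the sketch below expands named entity references back into ISO 8859-1 characters before further processing. It is a hedged example only. The function name expand_latin1_entities is hypothetical, and Python's built-in HTML 4 entity table is used as a stand-in for the set actually declared in ltext.dtd, which is an assumption. The five entities predefined by XML are left untouched so that the text remains well-formed XML.

```python
import re
from html.entities import name2codepoint

# Entities predefined by XML itself; expanding them would break the markup.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def expand_latin1_entities(text):
    """Replace named entities for ISO 8859-1 characters by the characters themselves."""
    def repl(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave unknown or structural entities as they are
        codepoint = name2codepoint[name]
        # Only expand characters that belong to ISO 8859-1.
        return chr(codepoint) if codepoint <= 0xFF else match.group(0)

    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", repl, text)


# Hypothetical attribute value, not taken from the corpus.
decoded = expand_latin1_entities('w="caf&eacute;"')
```

Applying such a step to the raw file text before parsing is one way to read the files without ltext.dtd at hand. Parsing with a validating parser and the DTD itself would be the more faithful route.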