hi,

reading from the XML spec:
http://www.w3.org/TR/2000/REC-xml-20001006#sec-line-ends

     (...) 2.11 End-of-Line Handling (...)
     To simplify the tasks of applications, the characters passed to an
     application by the XML processor must be as if the XML processor
     normalized all line breaks in external parsed entities (including the
     document entity) on input, before parsing, by translating both the
     two-character sequence #xD #xA and any #xD that is not followed by
     #xA to a single #xA character. (...)

i think that, according to this description, the input "#x20 #xA #xD #x20"
should be normalized to "#x20 #xA #xA #x20". however, it seems that Xerces 2
normalizes it to "#x20 #xA #x20", which looks incorrect to me (or maybe it
is correct?)
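
to make my reading of 2.11 concrete, here is a minimal sketch of the rule as a
plain string transformation. the class and method names (LineEndNormalizer,
normalize) are made up for illustration; this is not Xerces code, just how i
understand the spec text:

```java
public class LineEndNormalizer {
    // Hypothetical helper (not part of Xerces): applies the XML 1.0
    // section 2.11 rule to a string, as I read the spec text.
    static String normalize(String in) {
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (c == '\r') {
                // #xD #xA collapses to a single #xA ...
                if (i + 1 < in.length() && in.charAt(i + 1) == '\n') {
                    i++; // skip the #xA that belongs to this pair
                }
                // ... and a #xD not followed by #xA also becomes #xA
                out.append('\n');
            } else {
                // everything else, including a lone #xA, passes through
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // the input from above: #x20 #xA #xD #x20
        for (char c : normalize(" \n\r ").toCharArray()) {
            System.out.printf("#x%X ", (int) c);
        }
        System.out.println(); // prints: #x20 #xA #xA #x20
    }
}
```

the point is that in "#xA #xD" the #xD is not followed by #xA, so under this
reading it becomes its own #xA and is not merged with the #xA in front of it.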

thanks,

alek

ps. for testing i used the attached test2.xml, which contains this:

<t>-#xA-#xD-#xD#xA-#xA#xD-</t>



     $ od --format x1 test2.xml
     0000000 3c 74 3e 2d 0a 2d 0d 2d 0d 0a 2d 0a 0d 2d 3c 2f
     0000020 74 3e
     0000022
     $ od -c test2.xml
     0000000   <   t   >   -  \n   -  \r   -  \r  \n   -  \n  \r   -   <   /
     0000020   t   >
     0000022

so the last sequence of \n\r should be normalized to \n\n ...
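
in case somebody wants to check this without the attachment, here is a small
sketch that feeds the same 18 bytes (copied from the od dump) to whatever SAX
parser JAXP picks up and prints the character data with line ends escaped. the
class and helper names (LineEndCheck, parseText, escape) are my own, and what
the second line prints depends on the parser on your classpath:

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class LineEndCheck {
    // the 18 bytes of test2.xml, copied from the od dump
    static final byte[] TEST2 = {
        0x3c, 0x74, 0x3e, 0x2d, 0x0a, 0x2d, 0x0d, 0x2d, 0x0d, 0x0a,
        0x2d, 0x0a, 0x0d, 0x2d, 0x3c, 0x2f, 0x74, 0x3e
    };

    // Parses the document and concatenates every characters() chunk.
    static String parseText(byte[] xml) throws Exception {
        final StringBuilder text = new StringBuilder();
        SAXParserFactory.newInstance().newSAXParser().parse(
            new ByteArrayInputStream(xml),
            new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            });
        return text.toString();
    }

    // Makes CR and LF visible as \r and \n in the printed result.
    static String escape(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) throws Exception {
        // expectation under my reading of section 2.11
        System.out.println("expected by 2.11: -\\n-\\n-\\n-\\n\\n-");
        System.out.println("parser reported:  " + escape(parseText(TEST2)));
    }
}
```

if the two lines differ only in the last "\n\n" versus "\n", that is the
behavior i am asking about.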

however, when running the sax.DocumentTracer sample i get this:

     >java sax.DocumentTracer test2.xml
     setDocumentLocator(locator=org.apache.xerces.parsers.AbstractSAXParser$LocatorProxy@fd13b5)
     startDocument()
      startElement(uri="",localName="t",qname="t",attributes={})
       characters(text="-")
       characters(text="\n-")
       characters(text="\n-")
       characters(text="\n-")
       characters(text="\n-")
      endElement(uri="",localName="t",qname="t")
     endDocument()

the result is the same for xni.DocumentTracer:


     >java xni.DocumentTracer test2.xml
     startDocument(...)
      startElement(element={prefix=null,localpart="t",rawname="t",uri=null},attributes={})
       characters(text="-")
       characters(text="\n-")
       characters(text="\n-")
       characters(text="\n-")
       characters(text="\n-")
      endElement(element={prefix=null,localpart="t",rawname="t",uri=null})
     endDocument()
<t>-
-
-
-

-</t>