Aleksander Slominski wrote:
>      document entity) on input, before parsing, by translating both the
>      two-character sequence #xD #xA and any #xD that is not followed by
>      #xA to a single #xA character. (...)

According to the wording of the spec and the behavior of Xerces
1.x, this seems to be a bug. It seems strange to me, though, that 
DOS newline sequences are normalized to a single newline character, 
whereas Mac newline sequences are not. (I haven't used a Mac in a 
long time, so could someone confirm for me whether Mac newlines are 
#x0A #x0D, or just #x0D?)
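
For reference, here is a minimal sketch of the normalization rule as the
quoted spec text words it: #xD #xA becomes a single #xA, and any #xD not
followed by #xA also becomes #xA. This is not the Xerces2 code. The class
name LineEndNormalizer and the sample string are made up for illustration,
and a real scanner would do this on buffered input rather than on a whole
String.

```java
// Hypothetical helper, NOT taken from Xerces: a sketch of the line-end
// normalization rule quoted from the XML spec above.
final class LineEndNormalizer {

    // Translates "\r\n" and any lone "\r" to a single "\n".
    static String normalize(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\r') {
                // If the #xD is followed by #xA, consume the #xA as well so
                // the pair yields exactly one newline.
                if (i + 1 < input.length() && input.charAt(i + 1) == '\n') {
                    i++;
                }
                out.append('\n');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed sample input mixing CR LF, lone CR, lone LF, and LF CR.
        String sample = "-\r\n-\r-\n-\n\r-";
        // Show newlines as a visible "\n" escape when printing.
        System.out.println(normalize(sample).replace("\n", "\\n"));
    }
}
```

Note that under this rule a lone #xD is normalized too, while #xA #xD
comes out as two newlines because the #xA is kept and the trailing #xD is
translated separately.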

Anyway, I've fixed the problem and committed the changes to CVS.
Now, the output from Xerces2 using your sample file is the 
following:

 
setDocumentLocator(locator=org.apache.xerces.parsers.AbstractSAXParser$LocatorProxy@b66cc)
  startDocument()
   startElement(uri="",localName="t",qname="t",attributes={})
    characters(text="-")
    characters(text="\n-")
    characters(text="\n-")
    characters(text="\n-")
    characters(text="\n\n-")
   endElement(uri="",localName="t",qname="t")
  endDocument()

-- 
Andy Clark * IBM, TRL - Japan * [EMAIL PROTECTED]
