I am getting a UTFDataFormatException when parsing the attached XML document.
The parser appears to be complaining about the "bullet" character.  Note that
the document contains a hidden character (LATIN A with circumflex) immediately
before the bullet.

If I add encoding="UTF-8" to the XML declaration, there is no
UTFDataFormatException.  However, without specifying any encoding I get the
error shown below.  When I trace through the code, it looks like Xerces 2.3
defaults to UTF-8.  The UTFDataFormatException is thrown in
XMLUTF8Transcoder.cpp, line 222:
        if ((gUTFByteIndicatorTest[trailingBytes] & *srcPtr) !=
            gUTFByteIndicator[trailingBytes]) { /* throws the error here */ }

I checked the values:
gUTFByteIndicatorTest[trailingBytes] = 0
*srcPtr = 183
gUTFByteIndicator[trailingBytes] = 0

So we should not enter this branch.  However, the left-hand side of the
comparison evaluates to:
gUTFByteIndicatorTest[trailingBytes] & *srcPtr = 128  // This should be 0.
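
The arithmetic behind that observation can be checked in isolation.  The
sketch below is not Xerces code: maskByte is a hypothetical helper that
reproduces only the bitwise AND from the quoted check, and the 0x80 mask is
an assumed value chosen for illustration.  It shows that a mask of 0 can
never yield 128, so a result of 128 would suggest that the mask actually used
at run time had its top bit set, whatever value was displayed for it.

```cpp
#include <cstdint>

// Hypothetical helper, not part of Xerces: performs only the bitwise AND
// used on the left-hand side of the quoted comparison.
constexpr unsigned maskByte(std::uint8_t mask, std::uint8_t byte) {
    return static_cast<unsigned>(mask & byte);
}

// With a mask of 0 the AND is 0 for every byte, including 183 (0xB7).
static_assert(maskByte(0x00, 183) == 0, "a zero mask always yields 0");

// A result of 128 (0x80) requires bit 7 to be set in the mask.
// 0x80 is an assumed mask value, used here only for illustration.
static_assert(maskByte(0x80, 183) == 128, "0x80 & 0xB7 == 0x80");
```

The checks run at compile time, so building this file is enough to confirm
both results.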

Another observation: if I use the XML document without specifying an encoding
AND move the bullet character and the hidden character to another element of
the document, the exception does not occur.  I am not sure what is going on.

Fatal Error at file C:\temp\SAXSchemaParser\Debug/personal.xml, line 1, char 22
  Message: An exception occurred! Type:UTFDataFormatException, Message:invalid byte 1 (╖) of a 1-byte sequence.
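
For reference, here is a minimal sketch of how UTF-8 lead bytes are
classified.  It is not taken from Xerces; utf8SequenceLength and
isContinuation are hypothetical helpers, and the check is simplified (it does
not reject overlong forms).  It assumes the hidden LATIN A with circumflex is
stored as byte 0xC2, which is its Latin-1 value.  Under that assumption 0xC2
is a valid two-byte lead, while 183 (0xB7) is a continuation byte that cannot
begin a sequence on its own, which would be consistent with the message above.

```cpp
#include <cstdint>

// Hypothetical helper, not a Xerces function: number of bytes in a UTF-8
// sequence that starts with `lead`, or 0 if `lead` cannot start a sequence.
// Simplified: overlong encodings are not rejected.
constexpr int utf8SequenceLength(std::uint8_t lead) {
    return (lead & 0x80) == 0x00 ? 1   // 0xxxxxxx: single-byte (ASCII)
         : (lead & 0xE0) == 0xC0 ? 2   // 110xxxxx: lead of a 2-byte sequence
         : (lead & 0xF0) == 0xE0 ? 3   // 1110xxxx: lead of a 3-byte sequence
         : (lead & 0xF8) == 0xF0 ? 4   // 11110xxx: lead of a 4-byte sequence
         : 0;                          // 10xxxxxx or otherwise not a lead byte
}

// True for bytes of the form 10xxxxxx, which may only follow a lead byte.
constexpr bool isContinuation(std::uint8_t b) {
    return (b & 0xC0) == 0x80;
}

// 0xC2 (assumed byte for the hidden character) starts a 2-byte sequence.
static_assert(utf8SequenceLength(0xC2) == 2, "0xC2 is a 2-byte lead");

// 183 (0xB7) is a continuation byte and cannot start a sequence.
static_assert(isContinuation(0xB7), "0xB7 is a continuation byte");
static_assert(utf8SequenceLength(0xB7) == 0, "0xB7 is not a lead byte");
```

If the two bytes are decoded together as 0xC2 0xB7, they form one valid
UTF-8 character (U+00B7), so the error would only arise if 0xB7 were examined
as the first byte of a sequence.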

I am running Xerces 2.3 compiled with MSVS 7.0.
Any ideas?

Attachment: personal.xml
Description: Binary data
