Not sure if this is accurate, but I thought some Asian languages could not be represented in UTF-8. Can someone confirm this? Is there a way to escape the problem character(s)?
Regards,
Thom Bentley
Iris Associates, 5 Technology Park Drive, Westford, MA 01886, 617-693-9210
"KELLEHER,KEVIN (Non-HP-Roseville,ex1)" <[EMAIL PROTECTED]>
06/21/2001 05:58 PM
To: "'[EMAIL PROTECTED]'" <[EMAIL PROTECTED]>
cc:
Subject: use of utf-8 with SAX
I am having some trouble with Asian-language data in the SAX parser.
Specifically, some data that is originally in Taiwanese (roc15)
is converted to utf-8 and embedded in an XML message.
All the tags and attributes, etc. are in English, all the data is
in Taiwanese.
The problem occurs when I use the SAX parser to validate the message:
it hits a piece of data that it interprets as end-of-data, and complains
that it can't find the end tag that should follow the data.
I get this error in versions 1.3 and 1.5, in my own code and when I
run my data through the sample programs (i.e., SAXPrint, SAX2Print,
SAXCount, etc.).
Several people familiar with the language have confirmed the fitness of the
data.
My code is modeled after the SAXPrint example - is there anything missing
there for processing Asian language data written in utf-8?
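One way to rule out a bad conversion before the data ever reaches the parser is to check the raw bytes of the message. The sketch below is not Xerces code, just a standalone Python check. The function name first_invalid_utf8_offset and the sample <name> element are made up for illustration, and Big5 stands in for the original encoding here, which is an assumption that may not match roc15 exactly.

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the byte offset of the first malformed UTF-8 sequence, or None."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte the decoder rejected
        return exc.start
    return None


# Hypothetical sample: Traditional Chinese data inside an English tag.
text = "<name>\u53f0\u5317</name>"

good = text.encode("utf-8")  # payload that really was converted to UTF-8
bad = text.encode("big5")    # legacy bytes that were never converted (assumed Big5)

print(first_invalid_utf8_offset(good))  # None: every byte sequence is well-formed
print(first_invalid_utf8_offset(bad))   # offset of the first byte that is not valid UTF-8
```

If a check like this reports an offset for the real message, the conversion step is the likely culprit rather than the parser. If it reports nothing, the bytes are at least well-formed UTF-8 and the problem is probably elsewhere.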
Kevin Kelleher
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
