Never mind. I just looked at the output with a different editor; it looks like the characters are getting put in as entity references. Not sure now what is going on, I'll have to investigate further. Perhaps my end user can't handle US-ASCII, either :(

Christopher Painter-Wakefield <[EMAIL PROTECTED].edu>
12/15/2003 10:28 AM
Please respond to users

To: [EMAIL PROTECTED]
cc:
Subject: XML serializer; handling characters outside the encoding

I have a data consumer who is pulling XML from our Cocoon webapp. They couldn't handle UTF-8 on their end, so I gave them the option to pull data in US-ASCII encoding. However, when I did that, symbol characters such as Greek and math symbols got sent over even though they aren't in the encoding. When I saved a result from our system and opened it with XML Spy, it complained about these characters. On my consumer's end, it makes his software blow up.

I'm not sure exactly how these characters are output (I don't have a good byte-level editor), but I assume it is doing some kind of double-character thing that creates bytes outside the range of defined characters for the encoding, or something similar.

My question is, what should the behavior be when coping with characters outside the encoding, and where does the responsibility lie? My assumption would be that the XML serializer should take characters outside the encoding and turn them into entity references (Δ for Greek delta, for instance). I am on C2.0.3, so maybe that has been done in a later release, but if not, should it?

I am going to explore a change to the serializer for just that purpose, but if it has already been done, I'd like to grab the code for it. I'm assuming you can use character entities in any encoding, regardless of whether the characters thus specified have a code in that encoding.

Thanks,
Christopher

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
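
For illustration, the behavior assumed above (replace any character the target encoding cannot represent with a numeric character reference) might be sketched roughly as follows. This is only a sketch under stated assumptions, not the Cocoon serializer's code: the class name NcrEscaper and its escape method are hypothetical, it uses the standard java.nio.charset API from a modern JDK rather than whatever C2.0.3 runs on, and it only covers element text content. XML does allow a numeric character reference to name any legal XML character regardless of the document's declared encoding, so a Greek delta can be written as &#916; in a US-ASCII document.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Cocoon or any XML API.
final class NcrEscaper {

    // Escapes element text so that every character in the result can be
    // encoded in the given charset. Characters the charset cannot represent
    // become decimal numeric character references such as &#916;.
    // Not handled here: unpaired surrogates and characters XML forbids.
    static String escape(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            // Work on whole code points so surrogate pairs stay together.
            int cp = text.codePointAt(i);
            int len = Character.charCount(cp);
            String piece = text.substring(i, i + len);
            i += len;
            if (cp == '&') {
                out.append("&amp;");
            } else if (cp == '<') {
                out.append("&lt;");
            } else if (cp == '>') {
                out.append("&gt;");
            } else if (encoder.canEncode(piece)) {
                out.append(piece);
            } else {
                out.append("&#").append(cp).append(';');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Greek capital delta (U+0394) and small alpha (U+03B1) are not in US-ASCII.
        String escaped = escape("\u0394T < 5 & \u03b1 > 0", StandardCharsets.US_ASCII);
        System.out.println(escaped); // prints: &#916;T &lt; 5 &amp; &#945; &gt; 0
    }
}
```

With UTF-8 as the target charset the same call leaves the Greek letters untouched, since the encoder reports that it can encode them; only the markup characters are escaped.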
Re: XML serializer; handling characters outside the encoding
Christopher Painter-Wakefield Mon, 15 Dec 2003 08:38:28 -0800