I couldn't find the time to look at this in detail, but
here's a suggestion that may help:
I think TextPad (like Notepad) looks at the first bytes in
the file and, if it sees a byte-order mark such as FF FE, decides that the encoding is
Unicode. But your file, being XML, relies on the encoding="UTF-8" declaration to set
the encoding to UTF-8 rather than on those bytes, and TextPad doesn't pick that up.
So in other words, I think you're fine. Try putting some non-ASCII characters in your
file, opening it in TextPad, setting the encoding manually to UTF-8, and checking
whether the characters are the same.
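A rough way to run the same check in code, offered as a sketch rather than a diagnosis of your setup. The class and method names are made up for illustration, and ISO-8859-1 stands in for whatever non-UTF-8 platform default a given machine might have. It uses StandardCharsets, so it assumes Java 7 or later; on JDK 1.4 the string charset names would be needed instead. The detail worth checking is that ByteArrayOutputStream.toString() with no arguments decodes using the platform default charset, so printing the stream object passes through that charset before the PrintStream re-encodes the result.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RoundTripCheck {
    // Bytes as they would sit in a buffer after a UTF-8 save.
    static byte[] utf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decode with the matching charset: the text should come back unchanged.
    static boolean survivesUtf8RoundTrip(String text) {
        return new String(utf8Bytes(text), StandardCharsets.UTF_8).equals(text);
    }

    // Decode the UTF-8 bytes with a different charset (ISO-8859-1 here, as an
    // assumed stand-in for a platform default), re-encode as UTF-8, and report
    // whether the bytes are still the same as the original UTF-8 bytes.
    static boolean unchangedAfterLatin1Detour(String text) {
        String misread = new String(utf8Bytes(text), StandardCharsets.ISO_8859_1);
        byte[] reencoded = misread.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(utf8Bytes(text), reencoded);
    }

    public static void main(String[] args) {
        String nonAscii = "<name>\u65e5\u672c\u8a9e</name>"; // three Japanese characters
        String ascii = "<name>abc</name>";

        System.out.println(survivesUtf8RoundTrip(nonAscii));
        System.out.println(unchangedAfterLatin1Detour(nonAscii));
        System.out.println(unchangedAfterLatin1Detour(ascii));
    }
}
```

If the last two lines print false and true, then ASCII-only output looks the same either way, while non-ASCII text is altered by the detour through the other charset. In that case, writing the buffered bytes straight to the file (for example with bos.writeTo(fos)) would avoid the decode step entirely.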
The main idea in this story is that there is no "standard"
mechanism to decide whether a set of bytes is text in UTF-8 encoding, text in ASCII
encoding, or a JPEG image (that's why XML needed an "encoding" attribute, by the
way). So as long as you have rules and mechanisms to ensure that the same
encoding is used throughout your system, you are OK.
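To illustrate, here is a small sketch of a byte-order-mark check like the one an editor might perform, plus the same two bytes read under two different charsets. The class and method names are hypothetical, the BOM table is deliberately incomplete, and StandardCharsets assumes Java 7 or later.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingGuess {
    // Guess an encoding from a leading byte-order mark; "unknown" if none matches.
    static String guessFromBom(byte[] b) {
        if (startsWith(b, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(b, 0xFF, 0xFE)) return "UTF-16LE";
        if (startsWith(b, 0xFE, 0xFF)) return "UTF-16BE";
        return "unknown";
    }

    // True if the array begins with the given unsigned byte values.
    private static boolean startsWith(byte[] b, int... prefix) {
        if (b.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((b[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    // Number of characters the bytes turn into under a given charset.
    static int decodedLength(byte[] b, Charset cs) {
        return new String(b, cs).length();
    }

    public static void main(String[] args) {
        // UTF-8 XML written without a BOM: the sniffing heuristic has nothing to go on.
        byte[] xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><a/>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(guessFromBom(xml)); // prints "unknown"

        // The same two bytes are one character in UTF-8 and two in ISO-8859-1.
        byte[] twoBytes = {(byte) 0xC3, (byte) 0xA9};
        System.out.println(decodedLength(twoBytes, StandardCharsets.UTF_8));      // prints 1
        System.out.println(decodedLength(twoBytes, StandardCharsets.ISO_8859_1)); // prints 2
    }
}
```

Nothing in the bytes themselves says which reading is right, which is why the agreement has to come from outside: a declaration, a BOM, or a convention used throughout the system.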
Radu
From: Michael White [mailto:[EMAIL PROTECTED]
Sent: Friday, August 18, 2006 2:52 PM
To: [email protected]
Subject: Cannot encode my XML document output into UTF-8
For example, if I do the following:
<<
ByteArrayOutputStream bos = new ByteArrayOutputStream();
FileOutputStream fos = new FileOutputStream("C:/test.xml");
PrintStream xmlStream = new PrintStream(fos, false, "UTF-8");
XmlOptions printOptions = new XmlOptions();
printOptions.setSavePrettyPrint();
printOptions.setSavePrettyPrintIndent(2);
printOptions.setUseDefaultNamespace();
printOptions.setCharacterEncoding("UTF-8");
paymentDoc.save(bos,printOptions);
xmlStream.print(bos); //xmlStream.print(bos.toString("UTF-8"));
xmlStream.close();
>>
I receive a properly formatted file, with all of the data I require. However, according to TextPad, the encoding is set to ANSI. I've tried numerous combinations of writers and encodings and can't seem to get the output into UTF-8! I'll be dealing with Japanese and Korean characters, so it is a necessity.
The crazy part is that if I perform the following:
<<
ByteArrayOutputStream bos = new ByteArrayOutputStream();
FileOutputStream fos = new FileOutputStream("C:/test.xml");
PrintStream xmlStream = new PrintStream(fos, false, "UTF-8");
bos.write("A?u$(He933u3'u(BaÌ3̇".getBytes("UTF-8"));
xmlStream.print(bos);
xmlStream.close();
>>
The resulting file is listed as properly encoded in UTF-8 format!?
I'm at my wits' end. I'm using the latest XmlBeans release as of today and JDK 1.4.2_12. I set the documentProperties encoding to UTF-8 as well and it just doesn't want to play nice.
Help!
Thanks, Mike

