Java Strings are always encoded as UTF-16; the encoding (ISO-8859-1, UTF-8,
etc.) is only used to decode byte[] data into Java's 2-byte UTF-16 characters.
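To make that concrete, here is a small sketch. It uses Java 7's StandardCharsets constants instead of the charset-name strings, and the class name CharsetRoundTrip is just for illustration. The same one-char String takes a different number of bytes in ISO-8859-1 and UTF-8, but decoding with the matching charset gives back the identical String.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: a Java String holds UTF-16 chars; a charset only matters when
// converting between those chars and bytes.
class CharsetRoundTrip {
    // Encode text with the given charset, then decode it with the same charset.
    static String roundTrip(String text, Charset cs) {
        byte[] bytes = text.getBytes(cs);
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        String s = "\u00f1"; // n with tilde: one UTF-16 char
        System.out.println(s.length());                                      // 1
        System.out.println(s.getBytes(StandardCharsets.ISO_8859_1).length);  // 1
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);       // 2
        System.out.println(roundTrip(s, StandardCharsets.UTF_8).equals(s));  // true
    }
}
```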
1) You are better off changing the xml processing instruction to use the
correct encoding. You should be able to leave it blank, assuming that you are
processing a java string. I suppose if for some reason you don't have control
of the xml, you could dump the string into a byte array using
string.getBytes("ISO-8859-1")
(http://java.sun.com/j2se/1.3/docs/api/java/lang/String.html#getBytes(java.lang.String))
and build your StreamSource from a ByteArrayInputStream. There could be a
better way, but that's what pops off the top of my head...
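Here is a rough sketch of that idea. It assumes Java 7+ and the JAXP API bundled with the JDK, and the class and method names and the sample XML are hypothetical stand-ins for the database value. Option A encodes the String as ISO-8859-1 so the bytes match the declaration. Option B hands the parser a StringReader instead. When a character stream is supplied, the parser reads the characters directly and disregards the encoding declaration. Option B may be the "better way", but this sketch has not been tried against Xalan 2.3.1 specifically.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

// Sketch (hypothetical names): feed XML held in a Java String to a Transformer.
class StringToSource {
    // Option A: encode the String with the same charset the XML declaration
    // names, so the parser decodes the bytes back into the same characters.
    static StreamSource fromIsoBytes(String xml) {
        byte[] bytes = xml.getBytes(StandardCharsets.ISO_8859_1);
        return new StreamSource(new ByteArrayInputStream(bytes));
    }

    // Option B: give the parser characters directly; no byte decoding happens.
    static StreamSource fromReader(String xml) {
        return new StreamSource(new StringReader(xml));
    }

    // Parse via an identity transform and return the root element's text.
    static String rootText(StreamSource src) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            DOMResult result = new DOMResult(); // no node set, so a Document is created
            t.transform(src, result);
            return ((Document) result.getNode()).getDocumentElement().getTextContent();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample standing in for a value read from the database.
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                   + "<name>\u00f1\u00ea\u00f3</name>";
        System.out.println(rootText(fromIsoBytes(xml)).equals("\u00f1\u00ea\u00f3")); // true
        System.out.println(rootText(fromReader(xml)).equals("\u00f1\u00ea\u00f3"));   // true
    }
}
```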
2) With output method html, I'm not sure how you would disable the generation
of the character references. With output method xml you will get the actual
characters, assuming they exist in the output encoding; otherwise you will get
a numeric character reference.
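Here is a sketch of the xml-method behavior. It assumes the JDK's built-in JAXP transformer rather than Xalan 2.3.1 itself, and the class and method names are made up. With ISO-8859-1 as the output encoding, the ñ is written as-is. With US-ASCII it can't be represented, so the serializer falls back to a numeric character reference (&#...;). If numeric references are what IE needs, choosing an output encoding that lacks those characters is one way to force them.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Sketch (hypothetical names): how method="xml" serializes a non-ASCII char
// depending on whether the output encoding can represent it.
class XmlOutputEscaping {
    static String serialize(String encoding) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, "xml");
            t.setOutputProperty(OutputKeys.ENCODING, encoding);
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader("<name>\u00f1</name>")),
                        new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // In ISO-8859-1: the n-tilde is written as the character itself.
        System.out.println(serialize("ISO-8859-1"));
        // In US-ASCII: not encodable, so it becomes a numeric character reference.
        System.out.println(serialize("US-ASCII"));
    }
}
```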
Hope this helps,
Josh
-----Original Message-----
From: George Pieri [mailto:[EMAIL PROTECTED]
Sent: Monday, January 12, 2004 6:32 AM
To: Josh Canfield
Subject: Foreign Characters
Thanks for your help!!!
I've found that if I PHYSICALLY save the file as ISO-8859-1,
it does the conversion correctly .....ñ ê ó ñ &
Changing the encoding in the processing instruction then has an effect.
1) Since our data is actually read from the database and not from a file, is
there any way to do the conversion on a java "STRING" and make sure it is in
ISO-8859-1 format?
2) When ISO-8859-1 escaping/conversion occurs, is it possible for the
entities to use the ACTUAL CHARACTER reference numbers and not the
abbreviations.....&#C3 instead of ñ?
IE can only resolve character references within an xml data island, so
I'm trying to see if that's possible.
Thanks again for your assistance !!
George
Subject: Foreign characters
Date: Thu, 8 Jan 2004 13:05:10 -0800
From: Josh Canfield <[EMAIL PROTECTED]>
The first clue is that there are two character references being created for
each extended character. That makes me believe that your input document is
being decoded using a single byte character set, when it is in fact encoded
with a character set that includes multiple byte characters.
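A minimal illustration of that single-byte decoding follows; the class name is hypothetical. The ñ is two bytes in UTF-8 (0xC3 0xB1). Decoding those bytes as ISO-8859-1 produces two separate characters, U+00C3 and U+00B1, and each of them would then be escaped on its own.

```java
import java.nio.charset.StandardCharsets;

// Sketch (hypothetical name): UTF-8 bytes wrongly decoded with a single-byte charset.
class MismatchedDecoding {
    static String decodeUtf8BytesAsLatin1(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8); // n-tilde -> 0xC3 0xB1
        return new String(utf8, StandardCharsets.ISO_8859_1);    // one char per byte
    }

    public static void main(String[] args) {
        String wrong = decodeUtf8BytesAsLatin1("\u00f1");
        System.out.println(wrong.length()); // 2: U+00C3 and U+00B1
    }
}
```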
I reproduced what you are seeing by saving an xml file as utf-8, but putting
ISO-8859-1 in the xml processing instruction.
You haven't shown us the xml that you are using, so this is just an educated
guess, but try either saving the file as ISO-8859-1, or changing the
encoding in the xml processing instruction in your xml file (not the xsl
file) to utf-8.
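Here is a sketch of that reproduction and of the fix. It uses the JDK's DocumentBuilder as a stand-in for the parse step the transformer performs, and the class and method names are hypothetical. The file bytes are UTF-8 in both cases; only the encoding named in the declaration changes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Sketch (hypothetical names): an XML "file" saved as UTF-8, parsed with
// whatever encoding its declaration claims.
class DeclarationMismatch {
    static String parseUtf8File(String declaredEncoding) {
        String xml = "<?xml version=\"1.0\" encoding=\"" + declaredEncoding + "\"?>"
                   + "<name>\u00f1</name>";
        byte[] savedAsUtf8 = xml.getBytes(StandardCharsets.UTF_8);
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(savedAsUtf8));
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseUtf8File("ISO-8859-1").length()); // 2: mis-decoded
        System.out.println(parseUtf8File("UTF-8").length());      // 1: correct
    }
}
```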
Josh
-----Original Message-----
From: michael UIN [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 08, 2004 12:26 PM
To: [EMAIL PROTECTED]
Subject: Foreign characters
When I process a stylesheet with XALAN 2.3.1 and there are
foreign characters � � � �,
they are escaped as ñ ê ó
ñ,
which is not correct!?!?!!....
What am I doing wrong ?
I've tried setting the outputProperty in the Java code.....
transformer.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");
transformer.transform(xmlSource, new
javax.xml.transform.stream.StreamResult(myFileW));
as well in the XSL code.....
<?xml version="1.0" encoding="ISO-8859-1" ?>
<xsl:output method="html" version="1.0" indent="yes" encoding="ISO-8859-1"
media-type="text/html"/>
I've also tried setting the encoding in java to UTF-16 with no luck.
Why is it escaping them incorrectly ???
Any ideas ?
Thanks in advance!
==============
XSL Snippet
=============
<b>Data is escaped</b>.....<xsl:value-of
select='/DOCUMENT/DATA/INTL/ROW/name' />
<br/><br/><b>Data is NOT escaped</b>.....
<xsl:value-of select='/DOCUMENT/DATA/INTL/ROW/name'
disable-output-escaping="yes"/>
=============
HTML OUTPUT
=============
<b>Data is escaped</b>.....ñ ê ó
ñ &<br>
<b>Data is NOT escaped</b>.....
� � � � &</body>