RE: Foreign Characters

Josh Canfield 12 Jan 2004 20:04:58 -0000

Java Strings are always encoded as UTF-16; the encoding (ISO-8859-1, UTF-8, 
etc.) is only used to decode byte[] data into Java's 2-byte UTF-16 characters.
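
A minimal sketch of that point, assuming a modern JDK 
(java.nio.charset.StandardCharsets arrived in Java 7, well after this thread). 
The class and method names are hypothetical. It shows that the same 
one-character String becomes a different number of bytes depending on the 
charset, and that the charset only matters when crossing between byte[] and 
String.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are for illustration only.
class StringEncodingDemo {

    // Bytes needed to store the text as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes needed to store the text as ISO-8859-1.
    static int latin1Length(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    // Encode and decode with the same charset; the String survives unchanged.
    static String roundTripLatin1(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String s = "\u00F1"; // n with tilde, written as an escape to avoid source-encoding issues
        System.out.println(s.length());                    // one UTF-16 char inside the JVM
        System.out.println(utf8Length(s));                 // two bytes as UTF-8
        System.out.println(latin1Length(s));               // one byte as ISO-8859-1
        System.out.println(roundTripLatin1(s).equals(s));  // true
    }
}
```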


1) You are better off changing the XML declaration (the <?xml ... ?> line at 
the top of the document) to name the correct encoding. You should be able to 
leave the encoding out entirely, assuming that you are processing a Java 
String. If for some reason you don't have control of the XML, you could dump 
the string into a byte array using string.getBytes("ISO-8859-1") 
(http://java.sun.com/j2se/1.3/docs/api/java/lang/String.html#getBytes(java.lang.String))
and build your StreamSource from a ByteArrayInputStream. There could be a 
better way, but that's what comes to mind first.

2) With output method html, I'm not sure how you would disable the generation 
of the character references. With output method xml you will get the actual 
characters, provided they exist in the output encoding; otherwise you will get 
a character reference.
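
Both points can be sketched together. This is only an illustration under 
stated assumptions: it uses the JDK's default TransformerFactory rather than 
Xalan 2.3.1, an identity transform instead of a real stylesheet, and 
hypothetical class and method names. The input String must carry an XML 
declaration that matches the bytes produced by getBytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical class and method names, for illustration only.
class EncodingSketch {

    // Point 1: encode the in-memory String as ISO-8859-1 bytes (matching its
    // XML declaration) and wrap the bytes in a StreamSource.
    static StreamSource toSource(String xml) {
        byte[] bytes = xml.getBytes(StandardCharsets.ISO_8859_1);
        return new StreamSource(new ByteArrayInputStream(bytes));
    }

    // Point 2: identity transform with output method xml and a chosen output
    // encoding. The bytes are decoded again so the result can be inspected.
    static String serialize(String xml, String encoding) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.METHOD, "xml");
            identity.setOutputProperty(OutputKeys.ENCODING, encoding);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            identity.transform(toSource(xml), new StreamResult(out));
            return new String(out.toByteArray(), encoding);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>\u00F1</name>";
        // ISO-8859-1 can represent the character, so it is written as-is.
        System.out.println(serialize(xml, "ISO-8859-1"));
        // US-ASCII cannot, so the serializer falls back to a character reference.
        System.out.println(serialize(xml, "US-ASCII"));
    }
}
```

If the XML is already a String, a StreamSource built from a 
java.io.StringReader is another option worth trying: a parser given a 
character stream does not need to decode bytes at all.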

Hope this helps,
Josh


-----Original Message-----
From: George Pieri [mailto:[EMAIL PROTECTED]
Sent: Monday, January 12, 2004 6:32 AM
To: Josh Canfield
Subject: Foreign Characters



Thanks for your help!!!

I've found that if I PHYSICALLY save the file as ISO-8859-1
it does the conversion correctly .....&ntilde; &ecirc; &oacute; &ntilde;  &amp
Changing the encoding in the processing instruction then has an effect.


1) Since our data is actually read from the database and not from a file, is
   there any way to do the conversion on a Java "STRING" and make sure it is
   in ISO-8859-1 format?

2) When ISO-8859-1 escaping/conversion occurs, is it possible for the
   entities to use the ACTUAL CHARACTER reference numbers and not the
   abbreviations.....&#C3 instead of &ntilde ?

   IE can only resolve the character references within an XML data island, so
   I'm trying to see if that's possible.


Thanks again for your assistance !!

George


Subject: Foreign characters
Date: Thu, 8 Jan 2004 13:05:10 -0800
From: Josh Canfield <[EMAIL PROTECTED]>


The first clue is that there are two character references being created for
each extended character. That makes me believe that your input document is
being decoded using a single-byte character set, when it is in fact encoded
with a character set that includes multi-byte characters.

I reproduced what you are seeing by saving an xml file as utf-8, but putting
ISO-8859-1 in the xml processing instruction.

You haven't shown us the xml that you are using, so this is just an educated
guess, but try either saving the file as ISO-8859-1, or changing the
encoding in the xml processing instruction in your xml file (not the xsl
file) to utf-8.
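
The mismatch described above can be reproduced without any XML at all. The 
following is a small sketch with hypothetical names, assuming a Java 7+ JDK: 
it encodes one character as UTF-8 and then decodes those bytes as ISO-8859-1, 
which yields two characters where there was one.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are for illustration only.
class MisdecodeDemo {

    // Encode as UTF-8, then (wrongly) decode the bytes as ISO-8859-1.
    static String misdecode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "\u00F1";          // n with tilde
        String broken = misdecode(original); // bytes C3 B1 read as two Latin-1 chars
        System.out.println(original.length()); // 1
        System.out.println(broken.length());   // 2
        // The two characters are U+00C3 (A with tilde) and U+00B1 (plus-minus sign).
        System.out.println(broken.equals("\u00C3\u00B1")); // true
    }
}
```

Each of those two characters is then escaped separately on output, which 
matches the two references per extended character seen in the result.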

Josh

-----Original Message-----
From: michael UIN [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 08, 2004 12:26 PM
To: [EMAIL PROTECTED]
Subject: Foreign characters



When I use a stylesheet that I process with XALAN 2.3.1 that has

foreign characters     ñ ê ó ñ

they are escaped as    Ã± Ãª Ã³ Ã±

which is not correct !?!?!!....

What am I doing wrong ?


I've tried setting the outputProperty in the Java code.....

           transformer.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");
           transformer.transform(xmlSource, new javax.xml.transform.stream.StreamResult(myFileW));

as well in the XSL code.....

<?xml version="1.0" encoding="ISO-8859-1" ?>
<xsl:output method="html" version="1.0" indent="yes"  encoding="ISO-8859-1"
media-type="text/html"/>


I've also tried setting the encoding in java to UTF-16 with no luck.

Why is it escaping them incorrectly ???
Any ideas ?

Thanks in advance!


==============
XSL Snippet
=============

<b>Data is escaped</b>.....<xsl:value-of
select='/DOCUMENT/DATA/INTL/ROW/name' />

<br/><br/><b>Data is NOT escaped</b>.....
       <xsl:value-of select='/DOCUMENT/DATA/INTL/ROW/name'
disable-output-escaping="yes"/>


=============
HTML OUTPUT
=============

<b>Data is escaped</b>.....Ã± Ãª Ã³ Ã±  &<br>

<b>Data is NOT escaped</b>.....
              ñ ê ó ñ  &</body>

