SV: Numeric entity problem

Erik Ytterman 24 Sep 2003 14:14:17 -0000

Hello Simon!

Thank you for your quick response.


The reason for sending this message to two groups is the following:

I still think this is a bug/feature in the Xalan package, and if so it
should be patched. Of course I could have written two mails with exactly
the same content.

Look at this example:

I have this element:

<element>I am tired</element>

I extract the text part

"I am tired"

I translate the text part into something like this:

"Ich bin müde"

I replace the German ü, which is an ISO-8859-1 character with a
character code larger than 127, with a proper XML numeric character
entity (&#253;), giving us the following string:

"Ich bin m&#253;de"

I replace the text part of the element, giving (if the API works
properly):

<element>Ich bin m&#253;de</element>

This is then serialized into

<element>Ich bin m&amp;#253;de</element>

Which from my point of view is incorrect behaviour, since the text
content of the previous element was totally correct from an XML point of
view?!

Any good ideas?!

BR 
/Erik

-----Original Message-----
From: Simon Kitching [mailto:[EMAIL PROTECTED] 
Sent: 24 September 2003 11:38
To: Erik Ytterman
Cc: [EMAIL PROTECTED]; 'Beatrice Nilsson'
Subject: Re: Numeric entity problem

Hi Erik,

First of all, a minor note on etiquette: it is generally frowned upon to
post to both user and dev email lists. The user list is certainly the
best place for this sort of question. 

I believe that Xerces is behaving exactly as expected; you told it that
the content of a text node is a string containing the characters:
  '&', '#', '2', '5', '3', ';'

This is *text* to Xerces, and because a literal ampersand cannot appear
unescaped in XML character data, it is escaped when writing the data out.
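
To make the escaping concrete, here is a minimal sketch using only the standard JAXP APIs (javax.xml.parsers and javax.xml.transform). The class name TextNodeEscapeDemo and the helper serialize() are made up for illustration and are not part of the code in this thread. It builds the same <element> and stores the six characters of "&#253;" as plain text in a text node:

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class; not part of the original poster's code.
class TextNodeEscapeDemo {

    /** Builds <element>text</element> and serializes it with an identity transformer. */
    static String serialize(String text) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element element = doc.createElement("element");
        // The DOM stores this string as raw characters; it is not parsed as markup.
        element.appendChild(doc.createTextNode(text));
        doc.appendChild(element);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // The '&' is a literal character in the text node, so the serializer
        // has to write it as &amp; to keep the output well-formed.
        System.out.println(serialize("Ich bin m&#253;de"));
    }
}
```

The serializer cannot tell that the ampersand was meant to start a character reference. From the DOM's point of view the node simply contains an ampersand followed by "#253;", and escaping it is the only way to round-trip that text faithfully.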

I suggest you try this:
  char[] c = {253}; // array of 1 char which is unicode char #253
  String str = new String(c);

Now put this string (containing the unicode character #253) into the
node.

I suspect there is actually a way to specify unicode chars directly in
string literals, maybe something like:
  String s = "\u00FD";
I'm not sure about that, though.
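
For reference, Java string literals accept Unicode escapes of the form \uXXXX (four hex digits). Note also that ü is U+00FC (decimal 252), while 253 is ý. The sketch below is one possible way to get a numeric reference in the output, not necessarily how the original code should be changed: it puts the real character into the text node and asks the serializer for US-ASCII output through the standard OutputKeys.ENCODING property, so that characters the encoding cannot represent should be written as numeric character references. The class name NumericReferenceDemo and the helper serializeAsAscii() are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class; not part of the original poster's code.
class NumericReferenceDemo {

    /** Serializes <element>text</element> with US-ASCII as the output encoding. */
    static String serializeAsAscii(String text) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element element = doc.createElement("element");
        // Store the real character in the DOM, not a hand-built "&#...;" string.
        element.appendChild(doc.createTextNode(text));
        doc.appendChild(element);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // Assumption: the serializer writes characters that US-ASCII cannot
        // represent as numeric character references.
        transformer.setOutputProperty(OutputKeys.ENCODING, "US-ASCII");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        transformer.transform(new DOMSource(doc), new StreamResult(bytes));
        return bytes.toString("US-ASCII");
    }

    public static void main(String[] args) throws Exception {
        // "\u00FC" is the Java escape for the u-umlaut character.
        System.out.println(serializeAsAscii("Ich bin m\u00FCde"));
    }
}
```

With this approach the hand-written "entitify" step is no longer needed, because the choice between a raw character and a character reference is left to the serializer and its output encoding.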

Regards,

Simon

On Wed, 2003-09-24 at 21:11, Erik Ytterman wrote:
> Dear All!
>  
> I'm struggling with a problem that needs to be solved as soon as 
> possible. Hope that you will be able to help me. I will attach parts 
> of the code.
>  
> I'm doing the following:
>  
> 1. Receive a callback with a proper XML document.
> (DocumentHandler.handleDocument())
>  
> 2. Use XPath to find the element to process
> (DocumentHandler.translateDocument())
>  
> 3. Find the text content of this element.
> (DocumentHandler.translateDocument())
>  
> 4. Translate the textual content of the element.
> (OpenB2BUtil.translateString())
>  
> 5. An ugly hack to transform any characters except ASCII into numeric 
> entities. (OpenB2BUtil.etitifyIsoString())
>  
> 6. Replace the textual content of the element, including numeric 
> entities (DocumentHandler.translateDocument())
>  
> 7. Serialize the resulting DOM tree using transformers
> (OpenB2BUtil.documentToStream())
>  
> Problem:
>  
> As can be seen from the code, I replace the textual content of an 
> element with a string that contains numeric entities (&#253;). My 
> problem is that the serialization seems to translate this into 
> (&amp;#253;).
>  
> Questions:
>  
> 1. Is this a bug in Xalan? From my point of view, it should leave the 
> numeric entity in the text payload untouched, since it is proper XML.
>  
> 2. If not, is there a way to disable this "feature" in Xalan, so that 
> these perfectly legal numeric entities are let through in the 
> serialization?
>  
> 3. If not, any suggestions on how to solve the problem?
>  
> /Erik
>  
>  
>  
>  
>
