DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=20456>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=20456

output-method html;href-attribute: url encoding after utf8-encoding

[EMAIL PROTECTED] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID



------- Additional Comments From [EMAIL PROTECTED]  2003-06-13 21:16 -------
Thomas,
The fact that your characters are ISO Latin-1 does not matter here. The capital 
A with an umlaut has a numeric value of 196 or in hex 0xC4. But this does not 
mean that the character should be encoded as %C4.

The XSLT recommendation (http://www.w3.org/TR/xslt) says this in section 16.2:
---------------------------------
The html output method should escape non-ASCII characters in URI attribute 
values using the method recommended in Section B.2.1 of the HTML 4.0 
Recommendation.
---------------------------------

The HTML 4.01 recommendation (http://www.w3.org/TR/html40) says this in 
section B.2.1:
-----------------------------------
B.2.1 Non-ASCII characters in URI attribute values
Although URIs do not contain non-ASCII values (see [URI], section 2.1) authors 
sometimes specify them in attribute values expecting URIs (i.e., defined with 
%URI; in the DTD). For instance, the following href value is illegal: 

<A href="http://foo.org/Håkon">...</A>

We recommend that user agents adopt the following convention for handling 
non-ASCII characters in such cases: 

1. Represent each character in UTF-8 (see [RFC2044]) as one or more bytes. 
2. Escape these bytes with the URI escaping mechanism (i.e., by converting each 
   byte to %HH, where HH is the hexadecimal notation of the byte value). 
--------------------------------------------

So there you have it.  We UTF-8 encode first and %HH escape after that, as the 
recommendation says.  This UTF-8 encoding is not related to any encoding 
attribute, which is why the capital A with an umlaut (0xC4) comes out as 
%C3%84 and not as %C4.
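
As a rough illustration of the two steps quoted above, here is a small Python 
sketch. It is not the serializer's actual code; the function name 
escape_non_ascii is made up for this example, and it assumes that ASCII 
characters are passed through untouched.

```python
def escape_non_ascii(uri: str) -> str:
    """Escape non-ASCII characters as described in HTML 4.01, section B.2.1."""
    out = []
    for ch in uri:
        if ord(ch) < 128:
            # ASCII characters are copied through unchanged (assumption of this sketch).
            out.append(ch)
        else:
            # Step 1: represent the character in UTF-8 as one or more bytes.
            # Step 2: escape each byte as %HH.
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
    return "".join(out)


# Capital A with an umlaut is code point 196 (0xC4), but its UTF-8 form is two bytes.
print(escape_non_ascii("\u00c4"))                     # %C3%84
print(escape_non_ascii("http://foo.org/H\u00e5kon"))  # http://foo.org/H%C3%A5kon
```

Note that the result depends only on the Unicode character, not on the 
encoding the source document happened to be stored in.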

Regards,
Brian Minchau
