http://nagoya.apache.org/bugzilla/show_bug.cgi?id=20456

output-method html;href-attribute: url encoding after utf8-encoding

[EMAIL PROTECTED] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID

------- Additional Comments From [EMAIL PROTECTED]  2003-06-13 21:16 -------
Thomas,

The fact that your characters are ISO Latin-1 does not matter here. The
capital A with an umlaut has a numeric value of 196, or 0xC4 in hex. But this
does not mean that the character should be encoded as %C4.

The XSLT recommendation (http://www.w3.org/TR/xslt) says this in section 16.2:

---------------------------------
The html output method should escape non-ASCII characters in URI attribute
values using the method recommended in Section B.2.1 of the HTML 4.0
Recommendation.
---------------------------------

The HTML 4.01 recommendation (http://www.w3.org/TR/html40) says this in
section B.2.1:

-----------------------------------
B.2.1 Non-ASCII characters in URI attribute values

Although URIs do not contain non-ASCII values (see [URI], section 2.1) authors
sometimes specify them in attribute values expecting URIs (i.e., defined with
%URI; in the DTD). For instance, the following href value is illegal:

<A href="http://foo.org/Håkon">...</A>

We recommend that user agents adopt the following convention for handling
non-ASCII characters in such cases:

1. Represent each character in UTF-8 (see [RFC2044]) as one or more bytes.
2. Escape these bytes with the URI escaping mechanism (i.e., by converting
   each byte to %HH, where HH is the hexadecimal notation of the byte value).
--------------------------------------------

So there you have it. We are UTF-8 encoding first and %HH escaping after that,
as the recommendation says. This UTF-8 encoding is not related to any
encoding attribute.

Regards,
Brian Minchau
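
For illustration, the two steps quoted from section B.2.1 can be sketched in a
few lines of Python. This is only a minimal sketch, not the code the XSLT
processor actually uses; the function name escape_non_ascii is hypothetical,
and it assumes ASCII characters are passed through untouched.

```python
def escape_non_ascii(uri: str) -> str:
    """Escape non-ASCII characters as described in HTML 4 section B.2.1.

    Hypothetical helper for illustration only.
    """
    out = []
    for ch in uri:
        if ord(ch) < 128:
            # ASCII characters are left as they are (assumption of this sketch).
            out.append(ch)
        else:
            # Step 1: represent the character in UTF-8 as one or more bytes.
            # Step 2: escape each byte as %HH.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


# Capital A with umlaut (U+00C4) becomes two UTF-8 bytes, not a single %C4.
print(escape_non_ascii("\u00c4"))                     # %C3%84
print(escape_non_ascii("http://foo.org/H\u00e5kon"))  # http://foo.org/H%C3%A5kon
```

The first call shows why %C4 is not the expected output: 0xC4 is the code
point, while the escaped bytes are the UTF-8 representation 0xC3 0x84.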
