http://nagoya.apache.org/bugzilla/show_bug.cgi?id=26403

           Summary: double UTF-8 encoding of HTTP request parameters
           Product: Struts
           Version: Nightly Build
          Platform: Other
        OS/Version: Other
            Status: NEW
          Severity: Normal
          Priority: Other
         Component: Digester
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]

I'm having a problem with properly processing UTF-8 encoded request
parameters through Struts. The effect is that international characters
(non-ASCII characters, which are multi-byte in UTF-8) are encoded into
UTF-8 twice.

As an example, take the examples webapp included in the jakarta-struts
source tree. It has the registration sample, reachable at

http://localhost:8080/struts-examples/validator/registration.do

if installed on localhost:8080.

Let's suppose I wish to type:

small letter a with acute: á
unicode value hex:         00e1
unicode value binary:      11100001
UTF-8 binary:              11000011 10100001
UTF-8 in hex:              c3a1

into the firstName field of the form. This can be simulated by:

http://localhost:8080/struts-examples/validator/registration-submit.do?firstName=%C3%A1

(typing it manually and submitting via POST has the same effect)

The resulting page shows a lot of form problems, as I didn't fill out
most of the fields, which is OK. But more importantly, it also shows the
entered letter in the firstName input field. What is weird is that a
different letter is shown (actually two letters). Running xxd on the
received page, here's the relevant part:

00003a0: 6e67 7468 3d22 3330 2220 7369 7a65 3d22  ngth="30" size="
00003b0: 3330 2220 7661 6c75 653d 22c3 83c2 a122  30" value="...."
00003c0: 3e0a 2020 2020 3c2f 7464 3e0a 2020 3c2f  >.    </td>.  </

with the important part at value="....", which is:

00003b0: 3330 2220 7661 6c75 653d 22c3 83c2 a122  30" value="...."
                                    ^^^^^^^^^^

The letters presented are:

UTF-8 hex sequence: c383c2a1
UTF-8 binary:       11000011 10000011 11000010 10100001

which is actually two UTF-8 letters by now. What is funny is that if I
'decode' them from UTF-8, I get the original UTF-8 sequence:

first part, as received:  11000011 10000011
de-coded:                 11000011

second part, as received: 11000010 10100001
de-coded:                 10100001

and voila, the parts make up the original UTF-8 sequence:

11000011 10100001

which actually is the UTF-8 sequence for the letter sent.

If I resend this page (which by now contains two UTF-8 letters), I get
four letters, then 8, etc. It seems that the engine doesn't recognize
that the parameters are UTF-8 sequences to begin with, and encodes them
'again'.

I'm using Mozilla as a browser and Tomcat 5.0.16. The encoding of the
pages is UTF-8.
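
The byte sequences above can be reproduced outside the webapp. The following is only a minimal sketch, under the assumption (not confirmed anywhere in this report) that the request bytes are being decoded as ISO-8859-1, one character per byte, and that the resulting string is then written out as UTF-8. The class and method names are hypothetical, and it uses java.nio.charset.StandardCharsets, which requires a newer JDK than the ones current for Tomcat 5.0.

```java
import java.nio.charset.StandardCharsets;

public class DoubleEncodingDemo {

    // Format bytes as lowercase hex, e.g. "c3a1".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // ASSUMED failure mode: UTF-8 bytes are read as ISO-8859-1
    // (each byte becomes one character), then written back as UTF-8.
    static byte[] misdecodeAndReencode(byte[] utf8Bytes) {
        String misread = new String(utf8Bytes, StandardCharsets.ISO_8859_1);
        return misread.getBytes(StandardCharsets.UTF_8);
    }

    // Inverse of the step above: undo one round of double encoding.
    static byte[] undoOnce(byte[] doubleEncoded) {
        String chars = new String(doubleEncoded, StandardCharsets.UTF_8);
        return chars.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // U+00E1, small letter a with acute, as sent by the browser.
        byte[] sent = "\u00e1".getBytes(StandardCharsets.UTF_8);
        System.out.println(hex(sent));           // c3a1

        // After one faulty round trip: the sequence seen in the xxd dump.
        byte[] once = misdecodeAndReencode(sent);
        System.out.println(hex(once));           // c383c2a1

        // Resubmitting the page doubles the character count again.
        byte[] twice = misdecodeAndReencode(once);
        System.out.println(new String(twice, StandardCharsets.UTF_8).length()); // 4

        // 'Decoding' the received bytes gives back the original sequence.
        System.out.println(hex(undoOnce(once))); // c3a1
    }
}
```

If this assumption holds, the one-letter, two-letter, four-letter progression follows directly: every byte of a multi-byte sequence has its high bit set, so each one turns into its own two-byte UTF-8 character on the next round trip.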