[ http://issues.apache.org/struts/browse/STR-1941?page=all ]

David Evans reopened STR-1941:
------------------------------

Assign To: David Evans (was: Struts Developer Mailing List)

> double UTF-8 encoding of HTTP request parameters
> ------------------------------------------------
>
> Key: STR-1941
> URL: http://issues.apache.org/struts/browse/STR-1941
> Project: Struts Action 1
> Type: Bug
> Components: Action
> Versions: Nightly Build
> Environment: Operating System: other
> Platform: Other
> Reporter: Akos Maroy
> Assignee: David Evans
>
> I'm having a problem processing UTF-8 encoded request parameters
> through Struts. The effect is that international characters (non-ASCII
> characters, which are multi-byte in UTF-8) are encoded into UTF-8 twice.
>
> As an example, take the examples webapp included in the jakarta-struts
> source tree. It has the registration sample, reachable at
> http://localhost:8080/struts-examples/validator/registration.do
> if installed on localhost:8080. Suppose I wish to type:
>
> small letter a with acute: á
> unicode value hex: 00e1
> unicode value binary: 11100001
> UTF-8 binary: 11000011 10100001
> UTF-8 in hex: c3a1
>
> into the firstName field of the form. This can be simulated by:
> http://localhost:8080/struts-examples/validator/registration-submit.do?firstName=%C3%A1
> (typing it manually and submitting via POST has the same effect)
>
> The resulting page shows a lot of form problems, as I didn't fill out most
> of the fields, which is OK. More importantly, it also shows the entered
> letter in the firstName input field. What is weird is that a different
> letter is shown (actually two letters). Running xxd on the received page,
> here's the relevant part:
>
> 00003a0: 6e67 7468 3d22 3330 2220 7369 7a65 3d22  ngth="30" size="
> 00003b0: 3330 2220 7661 6c75 653d 22c3 83c2 a122  30" value="...."
> 00003c0: 3e0a 2020 2020 3c2f 7464 3e0a 2020 3c2f  >.    </td>.  </
>
> with the important part at value="....", which is:
>
> 00003b0: 3330 2220 7661 6c75 653d 22c3 83c2 a122  30" value="...."
>                                     ^^^^^^^^^^
>
> The letters presented are:
>
> UTF-8 hex sequence: c383c2a1
> UTF-8 binary: 11000011 10000011 11000010 10100001
>
> which is actually two UTF-8 letters by now. What is funny is that if I
> 'decode' them from UTF-8, I get the original UTF-8 sequence:
>
> first part, as received: 11000011 10000011
> decoded: 11000011
> second part, as received: 11000010 10100001
> decoded: 10100001
>
> and voila, the parts make up the original UTF-8 sequence:
>
> 11000011 10100001
>
> which is the UTF-8 sequence for the letter sent.
>
> If I resend this page (with what are by now two UTF-8 letters), I get four
> letters, then 8, etc. It seems that the engine doesn't recognize that
> these are UTF-8 sequences to begin with, and encodes them 'again'.
>
> I'm using Mozilla as a browser, with Tomcat 5.0.16. The encoding of the
> pages is UTF-8.
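
The byte sequences in the report can be reproduced outside Struts. The sketch below is only an illustration, and it rests on an assumption the report does not confirm: that the incoming UTF-8 bytes are decoded as ISO-8859-1 somewhere on the server side and the resulting text is then written back out as UTF-8. The function names are hypothetical and do not correspond to anything in Struts or Tomcat.

```python
def misdecode_round_trip(text: str) -> str:
    """Encode text as UTF-8, then (wrongly) decode those bytes as ISO-8859-1.

    Assumption: this models a server that reads UTF-8 request bytes
    using ISO-8859-1, turning every byte into its own character.
    """
    return text.encode("utf-8").decode("iso-8859-1")


def page_bytes(text: str) -> bytes:
    """Bytes that end up in a UTF-8 page after one mis-decoded submission."""
    return misdecode_round_trip(text).encode("utf-8")


def growth(text: str, submissions: int) -> list[int]:
    """Character count of the field value after each repeated submission."""
    lengths = [len(text)]
    for _ in range(submissions):
        text = misdecode_round_trip(text)
        lengths.append(len(text))
    return lengths


if __name__ == "__main__":
    a_acute = "\u00e1"                    # small letter a with acute
    print(a_acute.encode("utf-8").hex())  # c3a1, the bytes sent as %C3%A1
    print(page_bytes(a_acute).hex())      # c383c2a1, as seen in the xxd dump
    print(growth(a_acute, 3))             # [1, 2, 4, 8]
```

Under that assumption the output matches both the c383c2a1 sequence in the dump and the doubling on each resubmission (two letters, then four, then 8).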