André,

FYI: after logging, this seems to be one of the most-discussed topics
on the list.

On 3/16/2009 9:54 AM, André Warnier wrote:
> I am about 99% sure of the following, but I would like to be 100% sure.

To sum up:

1. Using <meta> to set the Content-Type of the page to charset
   ISO-8859-2
2. Submitting a POST form with high-ASCII characters (those that will
   only work properly when ISO-8859-2 is respected) and
   enctype="multipart/form-data"
3. Trying to use HttpServletRequest.getParameter()

> then, if this form is submitted, within my servlet the line
>
> String p1 = request.getParameter("param1");
>
> would always return into p1, the proper internal Java Unicode string
> value of the input element "param1" of the form, properly decoded from
> it's original iso-8859-2 encoding.
> Yes ?

No. The servlet spec (SRV 3.1.1) states that POST data will only be read
from the request when all of the following conditions are true (note #3):

"
1. The request is an HTTP or HTTPS request.
2. The HTTP method is POST.
3. The content type is application/x-www-form-urlencoded.
4. The servlet has made an initial call of any of the getParameter
   family of methods on the request object.
"

Since you are using multipart/form-data, condition #3 is not met, so
Tomcat isn't supposed to read the POST parameters. You will have to do
this yourself.

If your client is not sending a Content-Type that includes a character
encoding, then you have a client that isn't playing nicely. :( Most
people give up, set everything to UTF-8, and are done with it.

Mikolaj's experience suggests that his client doesn't send the right
Content-Type (charset, really), and so Tomcat defaults to ISO-8859-1.
Most people use a filter that checks what the request's character
encoding is and, if there is none, sets it to whatever encoding the
pages advertise themselves as (often UTF-8; in your case, ISO-8859-2).
This fixes 90% of the POST encoding problems.

GET is another issue. :(

You asked how the server asks the client to encode a request.
There's really no provision for that in the HTTP spec. Anecdotal
evidence suggests that request (N + 1) is sent using the encoding of
response N, meaning that the client tends to use the encoding of the
server's last response.

Your statement that GET requests are not covered, and that this is a
shortcoming of the HTTP and URL specs, is spot on: you basically can't
count on non-ISO-8859-1 characters arriving correctly in a URL. The
solution? Use POST.

Quick question: multipart/form-data is typically used for file
upload... why not use application/x-www-form-urlencoded instead? I
realize the problem is that certain browsers do not send the proper
charset in the Content-Type, but I'd like to understand your affinity
for multipart/form-data.

-chris
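
P.S. A rough sketch of the "fall back to the page's encoding" idea from
above, in plain Java so it runs without a servlet container. The class
EncodingFallback and the helpers resolveCharset() and decode() are
made-up names for illustration, not part of Tomcat or the servlet API.
It also skips percent-decoding and works on a single raw byte, so treat
it as a demonstration of the charset choice only. In a real filter the
same decision is usually made by calling request.setCharacterEncoding()
when request.getCharacterEncoding() returns null, before any
getParameter() call.

```java
import java.nio.charset.Charset;
import java.util.Locale;

public class EncodingFallback {

    // Hypothetical helper (not a servlet API): return the charset named in a
    // Content-Type value, or the fallback if none is given or it is unusable.
    static Charset resolveCharset(String contentType, String fallbackName) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length())
                                   .replace("\"", "").trim();
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        break; // unknown or malformed charset: use the fallback
                    }
                }
            }
        }
        return Charset.forName(fallbackName);
    }

    // Decode raw request bytes using the resolved charset.
    static String decode(byte[] body, String contentType, String fallbackName) {
        return new String(body, resolveCharset(contentType, fallbackName));
    }

    public static void main(String[] args) {
        // In ISO-8859-2 the byte 0xB3 is U+0142 (l with stroke);
        // in ISO-8859-1 the same byte is U+00B3 (superscript three).
        byte[] body = { (byte) 0xB3 };
        String noCharset = "application/x-www-form-urlencoded";

        String wrong = decode(body, noCharset, "ISO-8859-1");
        String right = decode(body, noCharset, "ISO-8859-2");

        System.out.printf("ISO-8859-1 fallback: U+%04X%n", (int) wrong.charAt(0));
        System.out.printf("ISO-8859-2 fallback: U+%04X%n", (int) right.charAt(0));
    }
}
```

The point is that the same byte yields a different character depending
on which default is assumed, which is why the fallback should match what
the pages advertise rather than the container's ISO-8859-1 default.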