André,

FYI: after logging, this seems to be one of the most-discussed topics on
the list.

On 3/16/2009 9:54 AM, André Warnier wrote:
> I am about 99% sure of the following, but I would like to be 100% sure.

To sum up:

1. Using <meta> to set the Content-Type of the page to
   charset ISO-8859-2
2. Submitting a POST form with non-ASCII characters (those that
   only decode correctly as ISO-8859-2)
   and enctype="multipart/form-data"
3. Trying to use HttpServletRequest.getParameter()

> then, if this form is submitted, within my servlet the line
> 
> String p1 = request.getParameter("param1");
> 
> would always return into p1, the proper internal Java Unicode string
> value of the input element "param1" of the form, properly decoded from
> it's original iso-8859-2 encoding.
> Yes ?

No. The servlet spec (SRV.3.1.1) states that POST form data is only made
available through the parameter methods when all of the following
conditions are true (note #3):

"
1. The request is an HTTP or HTTPS request.
2. The HTTP method is POST.
3. The content type is application/x-www-form-urlencoded.
4. The servlet has made an initial call of any of the getParameter
   family of methods on the request object.
"
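
To make the request-side conditions concrete, here is a minimal Java
sketch of conditions 1-3 as a plain predicate. The class and method
names are hypothetical (they are not part of Tomcat or the Servlet API),
and condition 4, the servlet actually calling getParameter(), is not
modeled here.

```java
import java.util.Locale;

// Hypothetical helper, for illustration only: not a Tomcat or Servlet API class.
class FormParameterConditions {

    /**
     * Returns true when conditions 1-3 quoted above hold, i.e. when a
     * container would be expected to populate parameters from the body.
     */
    static boolean containerParsesBody(String scheme, String method, String contentType) {
        if (scheme == null || method == null || contentType == null) {
            return false;
        }
        String s = scheme.toLowerCase(Locale.ROOT);
        boolean http = s.equals("http") || s.equals("https");

        // HTTP method names are case-sensitive.
        boolean post = "POST".equals(method);

        // Ignore parameters such as "; charset=..." or "; boundary=...".
        int semi = contentType.indexOf(';');
        String mediaType = (semi >= 0 ? contentType.substring(0, semi) : contentType)
                .trim()
                .toLowerCase(Locale.ROOT);

        return http && post && mediaType.equals("application/x-www-form-urlencoded");
    }

    public static void main(String[] args) {
        // A URL-encoded form: the container handles it.
        System.out.println(containerParsesBody(
                "http", "POST", "application/x-www-form-urlencoded; charset=ISO-8859-2"));
        // A multipart form: condition 3 fails.
        System.out.println(containerParsesBody(
                "http", "POST", "multipart/form-data; boundary=xyz"));
    }
}
```

The second call is the multipart/form-data case from this thread:
condition 3 fails, so the body is left for the application to parse.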

Since you are using multipart/form-data, Tomcat isn't supposed to read
the POST parameters, so getParameter() will not return the form fields.
You will have to parse the request body yourself. If your client is not
sending a Content-Type that includes a character encoding, then you have
a client that isn't playing nicely. :( Most people give up, set
everything to UTF-8, and are done with it.

Mikolaj's experience suggests that his client doesn't send the right
Content-Type (charset, really) and so Tomcat defaults to ISO-8859-1.
Most people use a filter that checks what the request's character
encoding is and, if there is none, sets it to whatever encoding the
pages advertise (often UTF-8, in your case ISO-8859-2). This fixes 90%
of the POST encoding problems.
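
The fallback logic such a filter applies can be sketched in plain Java
without the servlet classes. Everything named below is hypothetical and
for illustration only; in a real filter, the equivalent step would be to
check whether request.getCharacterEncoding() returns null and, if so,
call request.setCharacterEncoding(...) before any parameter is read.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, for illustration only: not a Tomcat or Servlet API class.
class RequestCharsetFallback {

    /** Returns the charset parameter of a Content-Type value, or null if absent. */
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // The value may be quoted, e.g. charset="UTF-8".
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    /** Uses the declared charset if there is one, else the encoding the pages advertise. */
    static Charset effectiveCharset(String contentType, Charset pageCharset) {
        String declared = charsetOf(contentType);
        return declared != null ? Charset.forName(declared) : pageCharset;
    }

    public static void main(String[] args) {
        Charset latin2 = Charset.forName("ISO-8859-2");
        // Byte 0xB3 is U+0142 in ISO-8859-2 but U+00B3 in ISO-8859-1.
        byte[] body = { (byte) 0xB3 };
        String contentType = "application/x-www-form-urlencoded"; // no charset sent

        String asLatin1 = new String(body, StandardCharsets.ISO_8859_1);
        String withFallback = new String(body, effectiveCharset(contentType, latin2));

        // Print code points so the result does not depend on the console encoding.
        System.out.println(Integer.toHexString(asLatin1.charAt(0)));
        System.out.println(Integer.toHexString(withFallback.charAt(0)));
    }
}
```

The same byte decodes to two different characters depending on the
assumed charset, which is the mangling you see when the client omits the
charset and the server falls back to ISO-8859-1.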

GET is another issue. :(

You asked how the server asks the client to encode a request. There's
really no provision for that in the HTTP spec. Anecdotal evidence
suggests that request (N + 1) is sent using the encoding of response N,
meaning that the client tends to use the encoding of the server's last
response.

Your statement that GET requests fall into a gap in the HTTP and URL
specs is spot on: you basically can't count on non-ISO-8859-1 characters
in a URL being decoded correctly. The solution? Use POST.
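
A small sketch of why URLs are unreliable here: the percent-encoded
bytes carry no charset label, so the server can only guess. The class
and method names are hypothetical; java.net.URLEncoder and
java.net.URLDecoder stand in for whatever the browser and the container
actually do.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper, for illustration only.
class UrlEncodingAmbiguity {

    /** Percent-encodes a value the way a client using the given charset would. */
    static String encode(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charset, e);
        }
    }

    /** Encodes with the client's charset, then decodes with the server's guess. */
    static String roundTrip(String value, String clientCharset, String serverCharset) {
        try {
            return URLDecoder.decode(encode(value, clientCharset), serverCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + serverCharset, e);
        }
    }

    public static void main(String[] args) {
        String original = "\u0142"; // LATIN SMALL LETTER L WITH STROKE

        // The same character produces different bytes on the wire.
        System.out.println(encode(original, "ISO-8859-2")); // %B3
        System.out.println(encode(original, "UTF-8"));      // %C5%82

        // Nothing in the URL says which charset was used, so a server
        // that assumes ISO-8859-1 decodes %B3 to a different character.
        String guessed = roundTrip(original, "ISO-8859-2", "ISO-8859-1");
        System.out.println(guessed.equals(original)); // false
    }
}
```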

Quick question: multipart/form-data is typically used for file upload...
why not use application/x-www-form-urlencoded instead? I realize the
problem is that certain browsers do not send the proper charset in the
Content-Type, but I'd like to understand your affinity for
multipart/form-data.

-chris
