André,
FYI: after logging, this seems to be one of the most-discussed topics on
the list.
On 3/16/2009 9:54 AM, André Warnier wrote:
> I am about 99% sure of the following, but I would like to be 100% sure.
To sum up:
1. Using <meta> to set the Content-Type of the page to
charset ISO-8859-2
2. Submitting a POST form with non-ASCII characters (ones that only
come through correctly when interpreted as ISO-8859-2)
and enctype="multipart/form-data"
3. Trying to use HttpServletRequest.getParameter()
> then, if this form is submitted, within my servlet the line
>
> String p1 = request.getParameter("param1");
>
> would always return into p1, the proper internal Java Unicode string
> value of the input element "param1" of the form, properly decoded from
> its original iso-8859-2 encoding.
> Yes ?
No. The servlet spec (SRV 3.1.1) states that POST form data will only be
parsed into request parameters when all of the following conditions are
true (note #3):
"
1. The request is an HTTP or HTTPS request.
2. The HTTP method is POST.
3. The content type is application/x-www-form-urlencoded.
4. The servlet has made an initial call of any of the getParameter
family of methods on the request object.
"
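To make the consequence of condition #3 concrete, here is a small sketch that restates the four conditions as a predicate. The class and method names (FormParameterRules, bodyBecomesParameters) are hypothetical and this is only an illustration of the spec text, not how Tomcat actually implements the check. Note that the media type is compared without its parameters, so "application/x-www-form-urlencoded; charset=ISO-8859-2" still qualifies while "multipart/form-data; boundary=..." never does.

```java
import java.util.Locale;

/**
 * Hypothetical helper restating the four SRV 3.1.1 conditions as code.
 * Illustration only; this is not Tomcat's implementation.
 */
final class FormParameterRules {

    static boolean bodyBecomesParameters(String scheme, String method,
                                         String contentType, boolean getParameterCalled) {
        // 1. The request is an HTTP or HTTPS request.
        boolean httpScheme = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
        // 2. The HTTP method is POST (method names are case-sensitive).
        boolean post = "POST".equals(method);
        // 3. The media type, ignoring parameters such as "; charset=ISO-8859-2",
        //    is application/x-www-form-urlencoded.
        boolean urlEncoded = contentType != null
                && contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT)
                        .equals("application/x-www-form-urlencoded");
        // 4. Some getParameter-family method has been called.
        return httpScheme && post && urlEncoded && getParameterCalled;
    }

    public static void main(String[] args) {
        System.out.println(bodyBecomesParameters("http", "POST",
                "application/x-www-form-urlencoded; charset=ISO-8859-2", true)); // true
        System.out.println(bodyBecomesParameters("http", "POST",
                "multipart/form-data; boundary=xyz", true)); // false
    }
}
```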
Since you are using multipart/form-data, Tomcat isn't supposed to parse
the POST body into parameters, so getParameter() won't see them. You will
have to do this yourself. If your client is not sending a Content-Type
that includes a character encoding, then you have a client who isn't
playing nicely. :( Most people give up, set everything to UTF-8, and are
done with it.
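"Doing it yourself" means reading the raw body and splitting it on the boundary named in the request's Content-Type header. The sketch below is a deliberately minimal, hypothetical reader (MultipartFields.parse is an illustrative name, not an existing API): text fields only, whole body in memory, no size limits, no nested multiparts. The key point for this thread is that the caller chooses the charset used to decode each value, since the client may not declare one. A production application would normally use a tested multipart library instead.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, minimal multipart/form-data text-field reader (illustration only). */
final class MultipartFields {
    // \b stops this from matching the tail of filename="..."
    private static final Pattern NAME = Pattern.compile("\\bname=\"([^\"]*)\"");

    static Map<String, String> parse(byte[] body, String boundary, Charset charset) {
        // ISO-8859-1 maps each byte to exactly one char and back, so the raw
        // bytes survive the round trip while we search for delimiters.
        String raw = new String(body, StandardCharsets.ISO_8859_1);
        Map<String, String> fields = new LinkedHashMap<>();
        for (String part : raw.split(Pattern.quote("--" + boundary))) {
            int headerEnd = part.indexOf("\r\n\r\n");
            if (headerEnd < 0) {
                continue; // preamble, or the closing "--" after the last delimiter
            }
            Matcher m = NAME.matcher(part.substring(0, headerEnd));
            if (!m.find()) {
                continue; // no form field name in this part's headers
            }
            String content = part.substring(headerEnd + 4);
            if (content.endsWith("\r\n")) {
                // The CRLF before the next delimiter belongs to the delimiter.
                content = content.substring(0, content.length() - 2);
            }
            // Recover the original bytes, then decode them with the chosen charset.
            byte[] valueBytes = content.getBytes(StandardCharsets.ISO_8859_1);
            fields.put(m.group(1), new String(valueBytes, charset));
        }
        return fields;
    }

    public static void main(String[] args) {
        Charset latin2 = Charset.forName("ISO-8859-2");
        byte[] body = ("--XyZ\r\n"
                + "Content-Disposition: form-data; name=\"param1\"\r\n\r\n"
                + "\u0141\u00f3d\u017a\r\n"
                + "--XyZ--\r\n").getBytes(latin2);
        System.out.println(parse(body, "XyZ", latin2).get("param1"));
    }
}
```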
Mikolaj's experience suggests that his client doesn't send the right
Content-Type (charset, really) and so Tomcat defaults to ISO-8859-1.
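Here is what that default does to ISO-8859-2 data, using only the JDK. The Polish word "Lodz" with its diacritics is sent as the bytes A3 F3 64 BC; decoding them as ISO-8859-1 yields different characters. The redecode step is a workaround often seen in practice (it works only because ISO-8859-1 decoding is lossless); the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Shows ISO-8859-2 bytes being decoded with an ISO-8859-1 default (illustration only). */
final class DefaultCharsetMismatch {
    static final Charset LATIN2 = Charset.forName("ISO-8859-2");

    /** Decode as the container would when the client declared no charset. */
    static String decodeWithDefault(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    /** Undo the wrong (but lossless) ISO-8859-1 decode and decode again correctly. */
    static String redecode(String misdecoded, Charset actual) {
        return new String(misdecoded.getBytes(StandardCharsets.ISO_8859_1), actual);
    }

    public static void main(String[] args) {
        // "Lodz" with Polish diacritics, as the browser sends it: A3 F3 64 BC
        byte[] sent = "\u0141\u00f3d\u017a".getBytes(LATIN2);
        String wrong = decodeWithDefault(sent);   // U+00A3 U+00F3 'd' U+00BC
        String right = redecode(wrong, LATIN2);   // back to the original word
        System.out.println(wrong.equals("\u00a3\u00f3d\u00bc") + " "
                + right.equals("\u0141\u00f3d\u017a")); // true true
    }
}
```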
Most people use a filter that checks whether the request declares a
character encoding and, if it doesn't, sets it to whatever encoding the
pages advertise (often UTF-8; in your case ISO-8859-2). The filter has to
run before anything reads a parameter. This fixes 90% of the POST
encoding problems.
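A sketch of the decision such a filter makes, written without the servlet API so it runs on a plain JDK. In a real javax.servlet.Filter, doFilter(request, response, chain) would call request.getCharacterEncoding() and, if that returns null, request.setCharacterEncoding(fallback) before chain.doFilter(request, response). The names DefaultEncodingDecision, charsetFromContentType and effectiveCharset are hypothetical, and the header parsing here is simplified.

```java
import java.nio.charset.Charset;
import java.util.Locale;

/**
 * Hypothetical, servlet-free model of a "default request encoding" filter.
 * A real filter would get the declared charset from request.getCharacterEncoding().
 */
final class DefaultEncodingDecision {

    /** Returns the charset named in a Content-Type header value, or null if none. */
    static String charsetFromContentType(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // Strip optional quotes, e.g. charset="ISO-8859-2"
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    /** The client's declared charset wins; otherwise use what our pages advertise. */
    static Charset effectiveCharset(String contentType, Charset pageCharset) {
        String declared = charsetFromContentType(contentType);
        return declared != null ? Charset.forName(declared) : pageCharset;
    }

    public static void main(String[] args) {
        Charset pages = Charset.forName("ISO-8859-2");
        System.out.println(effectiveCharset("application/x-www-form-urlencoded", pages)); // ISO-8859-2
        System.out.println(effectiveCharset("application/x-www-form-urlencoded; charset=UTF-8", pages)); // UTF-8
    }
}
```

Charset.forName throws an unchecked exception for an unknown charset name; a real filter would probably want to catch that and fall back as well.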
GET is another issue. :(
You asked how the server asks the client to encode a request. There's
really no provision for that in the HTTP spec. Anecdotal evidence
suggests that request (N + 1) is sent using the encoding of response N,
meaning that the client tends to use the encoding of the server's last
response.
Your point that GET requests are not covered, because of a shortcoming
of the HTTP and URL specs, is spot on: you basically can't count on
correct non-ISO-8859-1 characters in a URL. The solution? Use POST.
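The root of the GET problem is that percent-escapes are just bytes, and the URL never says which charset produced them. This JDK-only sketch encodes the same word two ways and shows a server that guesses the wrong charset silently getting different text. It assumes Java 10 or later for the Charset overloads of URLEncoder.encode and URLDecoder.decode; the class name is hypothetical.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Same text, two different query strings, depending on an undeclared charset. */
final class QueryStringAmbiguity {
    static final Charset LATIN2 = Charset.forName("ISO-8859-2");
    static final String WORD = "\u0141\u00f3d\u017a"; // Polish "Lodz" with diacritics

    public static void main(String[] args) {
        String asUtf8 = URLEncoder.encode(WORD, StandardCharsets.UTF_8);
        String asLatin2 = URLEncoder.encode(WORD, LATIN2);
        System.out.println("param1=" + asUtf8);   // param1=%C5%81%C3%B3d%C5%BA
        System.out.println("param1=" + asLatin2); // param1=%A3%F3d%BC

        // Decoding the ISO-8859-2 escapes as UTF-8 does not throw; it just
        // produces replacement characters instead of the original word.
        String guessed = URLDecoder.decode(asLatin2, StandardCharsets.UTF_8);
        System.out.println(guessed.equals(WORD)); // false
    }
}
```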
Quick question: multipart/form-data is typically used for file upload...
why not use application/x-www-form-urlencoded instead? I realize the
problem is that certain browsers do not send the proper charset in the
Content-Type, but I'd like to understand your affinity for
multipart/form-data.
-chris