Hi,
I am in the process of converting my web site to be UTF-8 compliant,
so that it can better handle scripts used by various languages. To be
clear, I am using a Java application server, and by default it only
accepts ISO-8859-1 for POST requests. To have it treat the request
parameters as UTF-8, I have to call
request.setCharacterEncoding("UTF-8")
on every request, before reading any parameters. I would have preferred
an option to tell my application server to treat all POST content as
UTF-8 by default, but this was rejected based on RFC 2616, sections
3.7.1 and 3.4.1.
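To illustrate why the decoding charset matters, here is a minimal,
standalone sketch (no servlet API involved). The class name
FormCharsetDemo and its helper methods are made up for illustration,
and it assumes Java 10 or later for the Charset overloads of
URLEncoder/URLDecoder. It percent-encodes a Japanese string as UTF-8,
the way a browser submitting a UTF-8 form would, and then decodes it
both ways:

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FormCharsetDemo {
    // Percent-encode a form value as UTF-8 bytes (hypothetical helper).
    static String encodeAsUtf8(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Decode a percent-encoded value using the given charset (hypothetical helper).
    static String decode(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) {
        String original = "\u65e5\u672c\u8a9e"; // the three characters of "Nihongo"
        String wire = encodeAsUtf8(original);

        // The percent-encoded bytes as they would appear in the POST body.
        System.out.println(wire);

        // Decoding with the server's ISO-8859-1 default does not recover the input.
        System.out.println(decode(wire, StandardCharsets.ISO_8859_1).equals(original));

        // Decoding as UTF-8 does.
        System.out.println(decode(wire, StandardCharsets.UTF_8).equals(original));
    }
}
```

The bytes on the wire are identical in both cases; only the charset
the server assumes when decoding them differs.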
I decided to try to specify the charset as part of the form's enctype
attribute:
<form action="" method="post"
      enctype="application/x-www-form-urlencoded; charset=utf-8">
However, having tested with Safari, Firefox and Opera, I found that
only Opera included the "charset=utf-8" component in the Content-Type
header of the request. Additionally, if I specify
<form action="" method="post"
      enctype="application/x-www-form-urlencoded; charset=utf-8"
      accept-charset="utf-8">
I get other strange results with Firefox and Safari. With Opera I just
see question marks when I pass my Japanese character test case.
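One way question marks like these can arise in Java (I have not
confirmed that this is what happens in my setup) is when text is
converted to a charset that cannot represent it: ISO-8859-1 has no
Japanese characters, so each one is replaced with "?". A small sketch,
with the class name QuestionMarkDemo and its helper invented for
illustration:

```java
import java.nio.charset.StandardCharsets;

public class QuestionMarkDemo {
    // Encode to ISO-8859-1 bytes and decode back (hypothetical helper).
    // Characters outside ISO-8859-1 are replaced by '?' during encoding.
    static String roundTripLatin1(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Three Japanese characters, none representable in ISO-8859-1.
        System.out.println(roundTripLatin1("\u65e5\u672c\u8a9e")); // prints "???"
    }
}
```

If that is the cause, the original characters are already lost at that
point and cannot be recovered by re-decoding.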
Now to the questions:
- Should web browsers be honoring the charset parameter specified in
the form's enctype attribute, and sending it to the HTTP server?
- Is it considered wrong to force my application to treat all requests
as UTF-8?
André