Hi, Mark, and thanks for the quick response. You provided some info I
wasn't aware of. Some responses below:
On 1/8/2019 9:57 PM, Mark Thomas wrote:
On 08/01/2019 21:31, Garret Wilson wrote:
<snip/>
But as discussed above, this is completely wrong: the resource
character encoding of a request sent in
`application/x-www-form-urlencoded` should have absolutely no bearing
on how the encoded octets within that resource are decoded.
That is not the correct interpretation of section 3.12 of the Servlet
4.0 specification (note the section numbers do vary between spec
versions). Tomcat implements the correct interpretation - i.e. the
charset from the request content-type defines how encoded octets are
decoded and, if none is specified, ISO-8859-1 is used as the default.
Ah, I hadn't seen that in the servlet spec. Yes, it seems as if Tomcat
is correctly following the spec, but I would still say the servlet spec
is wrong to make any linkage at all between resource encoding and %nn
interpretation. In fact reading the prose it's not clear to me that the
servlet spec is even strongly tying the %nn interpretation to the
encoding. It just seems to say that, unless otherwise specified, the %nn
interpretation should be ISO-8859-1. And actually that's a step up from
the HTML 4.01 spec, which in
https://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.1 indicates
that they should be interpreted as US-ASCII codes. :(
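
To make the difference concrete, here is a minimal sketch of how the same
%nn octets come out under the two interpretations. It uses the JDK's
`java.net.URLDecoder`, not Tomcat's own parameter parsing, and the class
and method names are made up for illustration only.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class; it does not reproduce Tomcat's parameter handling.
public class PercentDecodeDemo {

    // Decodes a form-encoded value, interpreting the %nn octets in the given charset.
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // "caf" followed by e-acute (U+00E9), sent as the UTF-8 octets C3 A9.
        String encoded = "caf%C3%A9";

        // Octets read as UTF-8: one character, U+00E9.
        System.out.println(decode(encoded, "UTF-8"));

        // Octets read as ISO-8859-1: two characters, U+00C3 and U+00A9.
        System.out.println(decode(encoded, "ISO-8859-1"));
    }
}
```

The encoded string is identical in both calls; only the charset chosen by
the decoder changes what the application sees.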
You indicate that this is all out of date, and I think we're in
agreement there. We really, really need to get the next servlet
specification to remove this part. In fact the servlet specification
should defer to the official `application/x-www-form-urlencoded`
specification, which at this point I think is the W3C HTML5 spec, which
in turn defers to the WHATWG spec (which clearly says that UTF-8 should
be used). What makes all of this more of a mess is that there seems to be
no way to work around this from the client side, e.g. by putting
something in the HTML to indicate UTF-8, as
`application/x-www-form-urlencoded` doesn't support a `charset` parameter.
Anyway if there are any openings on the committee to update the servlet
spec, let me know.
...
As of Servlet 4.0 there is a specification compliant configuration
option to change this default to any encoding of your choice.
Obviously, UTF-8 is one of the options. You can do this by adding the
following to your web.xml:
<request-character-encoding>UTF-8</request-character-encoding>
Oh, that is really good to know, thanks!! Still I say that the request
character encoding is orthogonal to the %nn encoding, but, still, it's
good to have an implementation-agnostic way to do it.
Whether Tomcat should ship with this setting present in conf/web.xml
by default is something that should probably be discussed for Tomcat
10. Given the current state of the web, there is a reasonable case for
doing so. I'll add that to the TOMCAT-NEXT discussion list.
Yes please! If I can help in any way, let me know.
The Tomcat Wiki also needs to be updated to take account of this new
configuration option (and the related <response-character-encoding>).
Since it is a wiki and this is clearly an issue you care about would
you like to tackle that?
Yes, I'd love to. Let me know what permissions I need, etc.
I have an international flight boarding right now so I have to go, and I
may not reply for the next few hours, but definitely sign me up.
Thanks,
Garret