On 2/6/2020 12:43 PM, Christopher Schultz wrote:
…
* Therefore `web.xml` settings, HTTP headers, etc. are all
irrelevant, as this is an issue dealing with the file format
itself, and the latest spec for the file format says to use UTF-8,
so everyone should use UTF-8 already.
Except for everyone who already uses something else and expects
everything to be backward-compatible.
I think there comes a time where we have to move forward after some
critical level of usage is reached. I think we've passed that point.
Modern browsers in the sense that you mention are not
backwards-compatible for `application/x-www-form-urlencoded`. So what
are we being compatible with by not using UTF-8 decoding? Do we have
anything besides browsers consuming output from legacy JSP apps? As
noted, the browsers break when we try to be "backwards-compatible" in the
sense you mention.
The problem is that you don't get to declare what's "best" for
everyone and then the whole world does what you want.
But here I would imagine that everyone already agrees on what's best; the
debate is whether we should do something different from what we know is best
because of some outdated specs. (And I say that as a huge proponent of
following standards.)
I'll give you an example that is directly relevant. Over 10 years ago I
strongly advocated to the RDF group that the Internet should abandon the
outdated practice of requiring that `text/*` media types default to
US-ASCII; otherwise there would be no point in using `text/*` for
anything going forward! (That's why we went through a sad phase where
everyone was using `application/*` for text formats because they wanted
to default to something other than US-ASCII.)
* https://www.w3.org/2008/01/rdf-media-types
* https://lists.w3.org/Archives/Public/www-archive/2007Dec/0059.html
Sure enough, eventually someone saw the light (I won't claim I had
anything to do with it, but it is exactly what I was arguing for) and
created https://tools.ietf.org/html/rfc6657, which says that individual
`text/*` types can choose a default other than ASCII. Finally we're not
stuck in the past anymore!
I would say that someone needs to create an updated
`application/x-www-form-urlencoded` specification prescribing UTF-8
decoding of encoded octets, except that the WhatWG has already done
that! So I'm not declaring that everyone should do it "my" way. I'm
saying everyone should follow the latest spec which already exists.
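To make the difference concrete, here is a minimal sketch in Python (not Tomcat code; the sample form body and variable names are made up for illustration). It decodes the same percent-encoded octets once as UTF-8, as the WhatWG spec prescribes, and once as ISO-8859-1, which is what a legacy default would do:

```python
from urllib.parse import parse_qs

# Hypothetical form body: "é" and "ö" are sent as their two-octet
# UTF-8 encodings (0xC3 0xA9 and 0xC3 0xB6), as modern browsers do.
body = "name=caf%C3%A9&city=K%C3%B6ln"

# Decode the percent-encoded octets as UTF-8.
utf8_fields = parse_qs(body, encoding="utf-8")

# Decode the very same octets as ISO-8859-1: every octet becomes
# its own character, so each non-ASCII letter turns into two.
latin1_fields = parse_qs(body, encoding="iso-8859-1")

print(utf8_fields["name"][0])
print(latin1_fields["name"][0])
```

With UTF-8 decoding the value round-trips as the original text; with ISO-8859-1 the two octets of each non-ASCII character are split into two separate characters, which is the kind of breakage I mean.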
Anyway, thanks for listening. I think it's a fun discussion, and I
wasn't being combative---I just wanted to tell a bit of the story. I
need to get back to work now. :)
Thanks again for the change in Tomcat 10!
Garret