At 02:28 PM 8/4/2009 +1000, Graham Dumpleton wrote:
2009/8/4 P.J. Eby <>:
> I'm not clear on your logic here.  If I request foo/bar/baz (where baz
> actually has an accent over the 'a') in latin-1 encoding, and foo/bar is the
> script, then the (accented) baz is legitimate for pass-through to the
> application, no?

Technically, but what I am pointing out is that Apache pretty well
says that foo/bar needs to be UTF-8.

Which doesn't change the fact that you haven't yet proposed what a WSGI server should *do* with such non-UTF8 bytes in PATH_INFO and QUERY_STRING. Apache can and does pass through such bytes, so the spec needs to say what we do with them.
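
To make the problem concrete, here is a minimal Python sketch of the bytes in question. It is only an illustration, not a proposal for what the spec should say: the helper name decode_path_bytes and the example path are hypothetical, and the sketch assumes the server has already percent-decoded the path the way Apache does.

```python
from urllib.parse import unquote_to_bytes


def decode_path_bytes(raw_path):
    """Percent-decode raw_path and try two ways of turning the bytes into text.

    Hypothetical helper for illustration only; not part of WSGI or Apache.
    Returns (bytes, utf8_text_or_None, latin1_text).
    """
    path_bytes = unquote_to_bytes(raw_path)
    try:
        # Strict UTF-8 decoding fails on a lone latin-1 accented byte.
        as_utf8 = path_bytes.decode("utf-8")
    except UnicodeDecodeError:
        as_utf8 = None
    # latin-1 maps every byte to a code point, so this never fails
    # and can always be reversed with .encode("latin-1").
    as_latin1 = path_bytes.decode("latin-1")
    return path_bytes, as_utf8, as_latin1


# "baz" with an accented 'a', sent in latin-1 (0xE1) rather than UTF-8.
raw, utf8_text, latin1_text = decode_path_bytes("/foo/bar/b%E1z")

# The same character sent as UTF-8 (0xC3 0xA1) decodes cleanly as UTF-8.
raw_u, utf8_text_u, latin1_text_u = decode_path_bytes("/foo/bar/b%C3%A1z")
```

In the first case strict UTF-8 decoding simply fails, so a server that assumes UTF-8 has to pick some behavior (reject, replace, or pass the bytes through some other way), which is exactly the part that needs to be specified.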

If you are going to have
different parts of the one URL needing a different encoding to be
understood, personally I would say you are asking for trouble. So, I am
saying that UTF-8 really needs to apply, more for the sake of sanity and

So what, precisely, are you proposing should happen when such bytes are present?

So I guess the problem is more where URLs are already %-encoded when
coming back as an href or form action, because they may be in an encoding
incompatible with UTF-8 if they were to be clicked on.

Yep, that's the case with "standard" browsers and servers; less-standard situations such as spiders and scripts generating or following URLs are also relevant, as are deliberate hack attempts. So having the result of this behavior be undefined is a bad thing.

The Apache server at least will decode those % escape sequences, and I
believe it is the result of that which is used in stuff like rewrite
rule matches, not the raw URL. The only exception would be if a rewrite
rule explicitly matched against the REQUEST_URI variable, which still
contains % escape sequences. So if they are not in UTF-8, it effectively
means that you can't match them with Apache rewrite rules.

That's got nothing to do with what you propose for WSGI to do with the rest of it, though.

(However, your belief may be incorrect in any event, as this page:

claims that mod_rewrite can RewriteCond on THE_REQUEST in order to match still-encoded paths.)
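
The distinction between matching the decoded path and matching the still-encoded request line can be sketched in Python. This is not how mod_rewrite is implemented; the request line, the patterns, and the variable names are all made up for illustration, and the sketch only shows that the two forms need different patterns.

```python
import re
from urllib.parse import unquote_to_bytes

# Hypothetical raw request line, as the client sent it.
request_line = "GET /foo/bar/b%E1z HTTP/1.1"

# The still-encoded request target, and the bytes after percent-decoding.
raw_target = request_line.split(" ")[1]
decoded_path = unquote_to_bytes(raw_target)

# A pattern written against the still-encoded form (text with % escapes).
encoded_pattern = re.compile(r"^/foo/bar/b%E1z$")

# A pattern written against the decoded form (raw bytes, 0xE1 for the accent).
decoded_pattern = re.compile(rb"^/foo/bar/b\xe1z$")

matches_encoded = encoded_pattern.match(raw_target) is not None
matches_decoded = decoded_pattern.match(decoded_path) is not None

# The encoded pattern does not match once the escapes have been decoded.
encoded_vs_decoded = encoded_pattern.match(decoded_path.decode("latin-1")) is not None
```

Either form can be matched, but only by a pattern written for that form, so which one a rule sees matters independently of what WSGI does with the bytes afterwards.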
