On 09/17/2010 04:21 AM, Ian Bicking wrote:
> Yes, if we get rid of SCRIPT_NAME/PATH_INFO then the problem goes away. For
> servers without access to the unencoded value, reencoding those values
> doesn't actually lose any information over what we have now, and avoids any
> encoding issues.
It doesn't lose any information, but it also makes script_name/path_info
inherently unreliable. My fear is that if gateways are allowed to create
a reconstructed script_name/path_info without clearly signalling they
have done so, those values will continue to be unreliable at all times
and server authors won't feel the need to get it right since it's broken
everywhere anyway: the unhappy status quo.
This is why I am continuing to plead for a 'script_name/path_info are
authoritative' flag in environ that applications can use to detect
situations where it is safe to go ahead and rely on them. I want to say
"Unicode paths are supported if your server/gateway does", not "Unicode
paths might sometimes work, depending on how you configure your server
and application".
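To make the proposal concrete, here is a minimal sketch of how an application might consult such a flag. The key name `wsgi.path_info_authoritative` is purely hypothetical (neither PEP 333 nor PEP 3333 defines it), as is the helper name. The sketch assumes PEP 3333 conventions, where PATH_INFO is a native string carrying the raw bytes as ISO-8859-1 code points, and assumes the application expects UTF-8 URLs.

```python
# Hypothetical key: neither PEP 333 nor PEP 3333 defines such a flag.
AUTHORITATIVE_KEY = 'wsgi.path_info_authoritative'


def unicode_path_info(environ):
    """Return PATH_INFO as Unicode only when the gateway marks it authoritative.

    Assumes PEP 3333 conventions: PATH_INFO is a native str holding the raw
    bytes as ISO-8859-1 code points, and the application uses UTF-8 URLs.
    Returns None when the value can't be trusted or isn't valid UTF-8.
    """
    if not environ.get(AUTHORITATIVE_KEY, False):
        # The gateway may have reconstructed or re-encoded the value
        return None
    raw = environ.get('PATH_INFO', '').encode('iso-8859-1')
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        return None
```

When the function returns None, the application knows it is in "might sometimes work" territory and can refuse non-ASCII paths or fall back to its own handling, rather than silently serving the wrong resource.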
It is not just CGI that is affected here! IIS does not provide the
original undecoded path at all, even through ISAPI.
At the moment I am using a 'fixPathInfo' method in my form-reading layer
to try to compensate as much as possible for the problems of CGI:
- on Python 2 on Windows, re-read the environment variables using
ctypes if available, to avoid the mangling caused by reading
os.environ using mbcs. (This didn't use to work, as old versions
of IIS deliberately mbcs-filtered values before putting them in the
environment, but it does now.)
- on Python 3 on POSIX, re-read the environment variables using
environb if available. Otherwise try to reverse the faulty decoding
of environ using the surrogateescape error handler, where available.
- on Windows, encode the Unicode environment to bytes using
ISO-8859-1 if the server is Apache, or UTF-8 if the server is
IIS. (IIS tries to decode path bytes using UTF-8, falling back
to mbcs where the input is not valid UTF-8. Unfortunately there
is no way to tell that this has happened.)
- when the server is Microsoft-IIS, remove the erroneously repeated
SCRIPT_NAME components from the front of PATH_INFO. (This is a
long-standing bug that can be configured away using the
allowPathInfo/AllowPathInfoForScriptMappings configs, but no one
does, as it breaks ASP.)
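The Python 3 steps in the list above could be sketched roughly as follows. This is not the actual fixPathInfo code; the helper names are hypothetical, the Python 2 ctypes step is omitted, and treating every non-IIS Windows server like Apache is an assumption of this sketch. `os.environb` and the `surrogateescape` error handler are standard Python 3 features.

```python
import os
import sys


def recover_cgi_bytes(name, environ=None, environb=None, windows=None):
    """Best-effort recovery of the raw bytes of CGI variable `name` (Python 3).

    Hypothetical helper mirroring the Python 3 steps listed above.
    """
    if environ is None:
        environ = os.environ
        if environb is None:
            # os.environb exists only on POSIX builds of Python 3
            environb = getattr(os, 'environb', None)
    if windows is None:
        windows = sys.platform == 'win32'

    if windows:
        value = environ.get(name, '')
        server = environ.get('SERVER_SOFTWARE', '')
        if server.startswith('Microsoft-IIS'):
            # IIS decodes path bytes as UTF-8 (silently falling back to mbcs),
            # so re-encoding as UTF-8 is only a best guess
            return value.encode('utf-8')
        # Assumption: any non-IIS server passes bytes through like Apache,
        # i.e. one byte per character
        return value.encode('iso-8859-1')

    if environb is not None:
        # Best case: read the undecoded bytes directly
        return environb.get(name.encode('ascii'), b'')
    # Fallback: undo Python's filesystem-encoding decode of os.environ
    return environ.get(name, '').encode(sys.getfilesystemencoding(),
                                        'surrogateescape')


def strip_iis_repeated_script_name(environ):
    """Return PATH_INFO with the SCRIPT_NAME prefix IIS wrongly repeats removed."""
    path_info = environ.get('PATH_INFO', '')
    if not environ.get('SERVER_SOFTWARE', '').startswith('Microsoft-IIS'):
        return path_info
    script_name = environ.get('SCRIPT_NAME', '')
    if script_name and path_info.startswith(script_name):
        rest = path_info[len(script_name):]
        # Only strip on a segment boundary, in case the bug was configured away
        # and PATH_INFO merely happens to share a prefix with SCRIPT_NAME
        if rest == '' or rest.startswith('/'):
            return rest
    return path_info
```

The segment-boundary check is a defensive choice: without it, a server that has been configured correctly would have a PATH_INFO such as `/app.pyx` chopped to `x`.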
However, the form layer is not really the right place to be doing these
hacks. It would be better done in the stdlib CGI handler.
> Servers with REQUEST_URI can at least attempt to reconstruct the
> encoded values.
This is slightly unsafe. It's something an application might want to do
(or at least provide as an option), but a gateway probably couldn't get
away with it for the general case because REQUEST_URI doesn't reflect
the redirections done by a RewriteRule or an ErrorDocument.
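A sketch of the cautious, application-side version: use REQUEST_URI only when its path decodes to exactly SCRIPT_NAME + PATH_INFO, and otherwise give up, since a RewriteRule or ErrorDocument redirection makes the two disagree. The function names are hypothetical; the sketch assumes PEP 3333 native strings (bytes as ISO-8859-1 code points) and relies on `urllib.parse.unquote_to_bytes` from the Python 3 standard library.

```python
from urllib.parse import unquote_to_bytes

_HEX = frozenset('0123456789abcdefABCDEF')


def _encoded_prefix_length(encoded, nbytes):
    """Index into `encoded` after which `nbytes` decoded bytes have been consumed."""
    i = 0
    for _ in range(nbytes):
        if (encoded[i] == '%' and len(encoded) >= i + 3
                and encoded[i + 1] in _HEX and encoded[i + 2] in _HEX):
            i += 3  # a valid %XX escape decodes to one byte
        else:
            i += 1  # any other character (including a stray '%') is one byte
    return i


def reconstruct_encoded_paths(environ):
    """Return (encoded_script_name, encoded_path_info), or None if unsafe."""
    request_uri = environ.get('REQUEST_URI')
    if request_uri is None:
        return None
    script_name = environ.get('SCRIPT_NAME', '')
    path_info = environ.get('PATH_INFO', '')
    encoded = request_uri.split('?', 1)[0]  # drop the query string
    expected = (script_name + path_info).encode('iso-8859-1')
    if unquote_to_bytes(encoded.encode('iso-8859-1')) != expected:
        # The path was changed after the request arrived (e.g. RewriteRule,
        # ErrorDocument), so REQUEST_URI can't be trusted: fall back
        return None
    cut = _encoded_prefix_length(encoded, len(script_name.encode('iso-8859-1')))
    return encoded[:cut], encoded[cut:]
```

For example, with REQUEST_URI `/app/caf%C3%A9%2Fx?q=1`, SCRIPT_NAME `/app` and the decoded PATH_INFO, this yields `('/app', '/caf%C3%A9%2Fx')`, preserving the encoded slash that PATH_INFO alone has lost. Returning None on any mismatch is what keeps this acceptable as an application option rather than a general gateway behaviour.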
> Cookie is also the one header that can't be safely folded.
There are others, e.g. Authorization. Anyway: folding doesn't happen in
the HTTP world, so it can be forgotten about.
--
And Clover
mailto:a...@doxdesk.com
http://www.doxdesk.com/
_______________________________________________
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig