On 09/17/2010 04:21 AM, Ian Bicking wrote:

> Yes, if we get rid of SCRIPT_NAME/PATH_INFO then the problem goes away.  For
> servers without access to the unencoded value, reencoding those values
> doesn't actually lose any information over what we have now, and avoids any
> encoding issues.

It doesn't lose any information, but it also makes script_name/path_info inherently unreliable. My fear is that if gateways are allowed to create a reconstructed script_name/path_info without clearly signalling they have done so, those values will continue to be unreliable at all times and server authors won't feel the need to get it right since it's broken everywhere anyway: the unhappy status quo.

This is why I am continuing to plead for a 'script_name/path_info are authoritative' flag in environ that applications can use to detect situations where it is safe to go ahead and rely on them. I want to say "Unicode paths are supported if your server/gateway does", not "Unicode paths might sometimes work, depending on how you configure your server and application".
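
To illustrate how an application might consume such a flag, here is a minimal sketch. The key name `x-wsgi.path_authoritative` is purely hypothetical (nothing like it exists in the WSGI spec), and refusing non-ASCII paths when the flag is absent is just one possible policy:

```python
# Hypothetical environ key: an assumption for illustration, not part of WSGI.
PATH_AUTHORITATIVE_KEY = "x-wsgi.path_authoritative"


def get_path_info(environ):
    """Return (path_info, reliable) for a WSGI environ dict."""
    path_info = environ.get("PATH_INFO", "")
    # Treat a missing flag as "not authoritative".
    reliable = bool(environ.get(PATH_AUTHORITATIVE_KEY, False))
    return path_info, reliable


def application(environ, start_response):
    path_info, reliable = get_path_info(environ)
    # Only rely on non-ASCII paths when the gateway vouches for them.
    if not reliable and any(ord(ch) > 127 for ch in path_info):
        start_response("400 Bad Request", [("Content-Type", "text/plain")])
        return [b"Non-ASCII paths are not supported by this server setup.\n"]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [path_info.encode("utf-8", "replace")]
```

An application could equally fall back to its own repair heuristics instead of rejecting the request; the point is only that it can tell the two situations apart.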

It is not just CGI that is affected here! IIS does not provide the original undecoded path at all, even through ISAPI.

At the moment I am using a 'fixPathInfo' method in my form-reading layer to try to compensate as much as possible for the problems of CGI:

  - on Python 2 on Windows, re-read the environment variables using
    ctypes if available, to avoid the mangling caused by reading
    os.environ using mbcs. (This didn't use to work, as old versions
    of IIS deliberately mbcs-filtered values before putting them in the
    environment, but it does now.)

  - on Python 3 on POSIX, re-read the environment variables using
    environb if available. Otherwise try to reverse the faulty decoding
    of environ using surrogateescape, where available.

  - on Windows, encode the Unicode environment to bytes using
    ISO-8859-1 if the server is Apache, or UTF-8 if the server is
    IIS. (IIS tries to decode path bytes using UTF-8, falling back
    to mbcs where the input is not valid UTF-8. Unfortunately there
    is no way to tell this has happened.)

  - when server is Microsoft-IIS, remove the erroneously repeated
    SCRIPT_NAME components from the front of PATH_INFO. (This is a
    long-standing bug that can be configured away using the
    allowPathInfo/AllowPathInfoForScriptMappings configs, but no one
    does, as it breaks ASP.)
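
Two of the steps above (the Python 3 POSIX re-read and the IIS prefix removal) can be sketched roughly as follows. This is not the actual fixPathInfo code; the function names are made up for illustration, and the Windows-specific ctypes and per-server encoding steps are left out:

```python
import os
import sys


def recover_bytes(value, encoding="utf-8"):
    """Undo a surrogateescape decode, returning the original bytes."""
    return value.encode(encoding, "surrogateescape")


def read_cgi_bytes(name):
    """Read a CGI variable as bytes, preferring os.environb where it exists."""
    if os.supports_bytes_environ:
        return os.environb.get(os.fsencode(name), b"")
    # No bytes environment: try to reverse the decoding applied to os.environ.
    return recover_bytes(os.environ.get(name, ""), sys.getfilesystemencoding())


def strip_repeated_script_name(script_name, path_info, server_software):
    """Drop SCRIPT_NAME from the front of PATH_INFO when running under IIS."""
    if not server_software.startswith("Microsoft-IIS"):
        return path_info
    if script_name and path_info.startswith(script_name):
        rest = path_info[len(script_name):]
        # Only strip on a path-segment boundary, so that a SCRIPT_NAME of
        # "/app" does not eat the start of "/application".
        if rest == "" or rest.startswith("/"):
            return rest
    return path_info
```

The prefix removal is a heuristic: a PATH_INFO that legitimately begins with the same segments as SCRIPT_NAME would be stripped too.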

However, the form layer is not really the right place to be doing these hacks. It would be better done in the stdlib CGI handler.

> Servers with REQUEST_URI can at least attempt to
> reconstruct the encoded values.

This is slightly unsafe. It's something an application might want to do (or at least provide as an option), but a gateway probably couldn't get away with it for the general case because REQUEST_URI doesn't reflect the redirections done by a RewriteRule or an ErrorDocument.
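
As an application-level option, that might look something like the sketch below. The function name and the `trust_request_uri` switch are hypothetical; the switch defaults to off precisely because REQUEST_URI can describe a different URL from the one actually being served:

```python
from urllib.parse import unquote_to_bytes


def path_bytes_from_request_uri(environ, trust_request_uri=False):
    """Return the undecoded request path as bytes, or None.

    Returns None unless the caller has opted in and the server supplied
    REQUEST_URI. The result is wrong after an internal redirect such as a
    RewriteRule or ErrorDocument.
    """
    uri = environ.get("REQUEST_URI")
    if not trust_request_uri or not uri:
        return None
    # Drop the query string, then percent-decode without guessing a charset.
    path = uri.split("?", 1)[0]
    return unquote_to_bytes(path)
```

The result covers SCRIPT_NAME and PATH_INFO together, so splitting it back into the two parts is still left to the application.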

> Cookie is also the one header that can't be safely folded.

There are others, e.g. Authorization. Anyway: folding doesn't happen in the HTTP world. It can be forgotten about.

--
And Clover
mailto:a...@doxdesk.com
http://www.doxdesk.com/