On Saturday, July 17, 2010, Ian Bicking <i...@colorstudy.com> wrote:
> On Fri, Jul 16, 2010 at 4:33 AM, And Clover <and...@doxdesk.com> wrote:
>
> > On 07/14/2010 06:43 AM, Ian Bicking wrote:
> >
> > > There's only a couple tricky keys: SCRIPT_NAME, PATH_INFO,
> > > and HTTP_COOKIE.
> >
> > (And of those, PATH_INFO is the only one that really matters, in that
> > no-one really uses non-ASCII script filenames, and non-ASCII characters
> > in Cookie/Set-Cookie are still handled so differently/brokenly across
> > browsers that you can't rely on them at all.)
> >
> > > * I (re)propose we eliminate SCRIPT_NAME and PATH_INFO and replace
> > >   them exclusively with encoded versions
> >
> > For compatibility with existing apps, how about keeping the existing
> > SCRIPT_NAME and PATH_INFO as-is (with all their problems), and
> > specifying that the new 'raw' versions (whatever they are called) are
> > added only if they really are raw, not reconstructed.
>
> Having two ways of expressing the same information will lead to bugs
> related to which data is canonical. If an application is using
> SCRIPT_NAME/PATH_INFO and then updates those values in any way, and
> wsgi.raw_script_name/wsgi.raw_path_info are present, then there will be
> weird bugs and code will disagree about which one is correct. Since %2f
> can exist in the raw versions, there isn't even a way to chunk the two
> variables in the same way.
>
> > Then existing scripts that don't care about non-ASCII and slashes can
> > carry on as before, and for apps that do care about them, they'll be
> > able to be *sure* the input is correct. Or they can fall back to
> > PATH_INFO when not present, and avoid producing these kind of URLs in
> > response.
>
> I don't think it works to imagine you can just not care about non-ASCII.
> Requests come in. WSGI should represent those requests. If a request
> comes in with non-ASCII bytes then WSGI needs to do *something* with it.
> I don't want to have to configure servers with application policy;
> servers should just work.
>
> And this doesn't help with Python 3: either we have byte values of
> SCRIPT_NAME and PATH_INFO in Python 3, or we have text values. I think
> bytes will be more awkward to port to than text, and inconsistent with
> other WSGI values. If we have text then we have to choose an encoding.
> Latin1 will work, but it will be the exact wrong encoding most of the
> time as UTF-8 is the typical (unlike other headers, where Latin1 will
> mostly be an okay encoding, or as good a guess as we have). If we firmly
> remove these keys then we can avoid this choice entirely... and we
> conveniently also get a better representation of the request.
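
To make the encoding and %2f points quoted above concrete, here is a
minimal sketch in Python 3. The sample path is made up for illustration,
and the snippet only shows what the two decodings do to the same bytes;
it is not how any particular server builds its environ.

```python
from urllib.parse import unquote_to_bytes

# Hypothetical raw path, as it might appear on the request line.
raw_path = "/caf%C3%A9/a%2Fb"

# Percent-decode to the bytes the client actually sent.
path_bytes = unquote_to_bytes(raw_path)

# Latin-1 maps every byte to a code point, so decoding never fails,
# but UTF-8 input comes out as mojibake.
as_latin1 = path_bytes.decode("latin-1")

# UTF-8 is the typical encoding for paths, but arbitrary request bytes
# may not be valid UTF-8, hence errors="replace" in this sketch.
as_utf8 = path_bytes.decode("utf-8", errors="replace")

# The Latin-1 text is lossless: re-encoding it recovers the bytes,
# which can then be decoded with the encoding the application expects.
recovered = as_latin1.encode("latin-1").decode("utf-8")

# %2F can only be told apart from a real separator before decoding,
# so the raw and decoded forms do not chunk the same way.
raw_segments = raw_path.split("/")
decoded_segments = as_utf8.split("/")

print(repr(as_latin1), repr(as_utf8))
print(len(raw_segments), len(decoded_segments))
```

The differing segment counts are the "can't chunk the two variables in
the same way" problem: once %2F has been decoded, the information about
which slashes were separators is gone.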
One reason I don't want to see the existing keys removed is for debugging
purposes. In Apache, various modules such as mod_rewrite operate on the
translated path. I am concerned that if only the raw one is available in
the WSGI application, confusion may arise when something goes wrong with
rewrites, because the only path an application can dump for debugging
will differ from the one other Apache modules operate on.

If you aren't going to make use of the CGI versions, I would still like
to see them present, perhaps renamed. That way no information is lost
when trying to debug things.

I could also just put this in an Apache/mod_wsgi-specific key, given that
the issue is particular to it. Thus we might have apache.path_info or
cgi.path_info.

Graham

> Note that libraries can smooth over this change; WebOb for instance will
> certainly still support req.script_name/req.path_info by decoding the
> raw values. Admittedly lots of code use these values directly... but at
> least if they get a KeyError the port/fix will be obvious (as opposed to
> out of sync values, which will only emerge as a problem occasionally --
> I'd rather not invite more occasional bugs).
>
> --
> Ian Bicking  |  http://blog.ianbicking.org
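
As a rough sketch of how a library could smooth over the key change, the
helper below prefers a raw path when one is present and otherwise falls
back to PATH_INFO. The key names wsgi.raw_path_info and apache.path_info
are the ones floated in this thread, not part of any published WSGI
specification, and both function names are hypothetical.

```python
from urllib.parse import unquote_to_bytes

# Key names proposed in this thread; assumptions, not standard WSGI keys.
RAW_KEY = "wsgi.raw_path_info"
SERVER_KEY = "apache.path_info"
CGI_KEY = "PATH_INFO"


def path_segments(environ, encoding="utf-8"):
    """Return decoded path segments, preferring the raw path if present.

    The raw path is split before decoding so that an encoded slash (%2F)
    stays inside its segment. Raises KeyError if neither key exists.
    """
    raw = environ.get(RAW_KEY)
    if raw is not None:
        return [
            unquote_to_bytes(part).decode(encoding, errors="replace")
            for part in raw.split("/")
        ]
    # Fallback: the already-translated CGI value, used as-is.
    return environ[CGI_KEY].split("/")


def debug_paths(environ):
    """Collect both views of the path for a debug dump."""
    return {
        "raw": environ.get(RAW_KEY),
        "translated": environ.get(SERVER_KEY, environ.get(CGI_KEY)),
    }


if __name__ == "__main__":
    env = {RAW_KEY: "/caf%C3%A9/a%2Fb", SERVER_KEY: "/caf\u00e9/a/b"}
    print(path_segments(env))
    print(debug_paths(env))
```

Dumping both values side by side is what keeps a mod_rewrite problem
diagnosable: the application can show the path it parsed next to the path
the other Apache modules saw.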