Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread Alan Kennedy
[Ian] >> OK, another proposal entirely: we kill SCRIPT_NAME and PATH_INFO >> introduce two equivalent variables that hold the NOT url-decoded values. [Graham] > That may be fine for pure Python web servers where you control the > split of REQUEST_URI into SCRIPT_NAME and PATH_INFO in the first pla

Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread Alan Kennedy
[Ian] > When things get messed up I recommend people use a middleware > (paste.deploy.config.PrefixMiddleware, though I don't really care what they > use) to fix up the request to be correct.  Pulling it from REQUEST_URI would > be fine. That would be unworkable under java servlet containers, sinc
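A fix-up middleware of the kind Ian describes could look roughly like this. It is a hypothetical sketch: the name prefix_middleware is invented, it is not the actual paste.deploy.config.PrefixMiddleware implementation, and it assumes the mount prefix is known ahead of time and was left at the front of PATH_INFO by the server.

```python
def prefix_middleware(app, prefix):
    """Hypothetical middleware: move a known mount prefix from PATH_INFO to SCRIPT_NAME."""
    prefix = prefix.rstrip("/")

    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "")
        # Only rewrite when the prefix matches on a path-segment boundary,
        # so "/blog" does not swallow the start of "/blogger".
        if path == prefix or path.startswith(prefix + "/"):
            environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + prefix
            environ["PATH_INFO"] = path[len(prefix):]
        return app(environ, start_response)

    return wrapped
```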

Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread Graham Dumpleton
2009/9/22 Mark Nottingham : > You're twisting my words; nowhere did I say I wasn't willing to read the > PEP. What I did say was that a proposal can and should be made in less than > eleven pages; I'd like to give my feedback, both because I use Python and > because I have some interest in HTTP. Ho

Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread Mark Nottingham
No worries, and apologies for the manner of asking; I just wanted to provide feedback from an HTTP perspective before it got too far down the road. I'm happy to wait a bit longer for it to bake if that's more helpful. Cheers, On 22/09/2009, at 5:10 PM, Graham Dumpleton wrote: 2009/9/22

Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread Armin Ronacher
Hi, P.J. Eby wrote: > Actually, latin-1 bytes encoding is the *simplest* thing that could > possibly work, since it works already in e.g. Jython, and is actually > in the spec already... and any framework that wants unicode URIs > already has to decode them, so the code is already written.
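In Python 3 terms, the latin-1 approach amounts to something like the following minimal sketch. The function names are hypothetical, and the application is assumed to have chosen UTF-8.

```python
def to_native(raw: bytes) -> str:
    # Server side: latin-1 maps byte 0xNN to code point U+00NN, so decoding
    # never fails and no information is lost.
    return raw.decode("latin-1")


def app_decode(native: str, charset: str = "utf-8") -> str:
    # Application side: recover the original bytes, then decode them with
    # the charset the application has chosen.
    return native.encode("latin-1").decode(charset)


raw_path = b"/caf\xc3\xa9"      # "/café" as UTF-8 bytes on the wire
native = to_native(raw_path)    # "/cafÃ©": mojibake, but lossless
decoded = app_decode(native)    # "/café"
```

The server never has to guess a charset; the one extra encode/decode pair happens in the framework, which already has to decode anyway.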

Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread Armin Ronacher
Hi, Mark Nottingham wrote: > HTTP headers *are* ASCII; RFC2616 defined them to be ISO-8859-1, but > HTTPbis currently takes the stance that they're ASCII, as in practice > Latin-1 isn't used and may introduce interop problems. In practise non-ascii data ends up in headers. > What does it me

Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread Mark Nottingham
On 22/09/2009, at 6:11 PM, Armin Ronacher wrote: Hi, Mark Nottingham wrote: HTTP headers *are* ASCII; RFC2616 defined them to be ISO-8859-1, but HTTPbis currently takes the stance that they're ASCII, as in practice Latin-1 isn't used and may introduce interop problems. In practise non-asci

Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread Armin Ronacher
Hi, Ian Bicking wrote: > Request headers, which you didn't split out... those I'm not sure. I'd > *like* them to be native. But damn, I'm just not sure quite how. > surrogateescape? Latin1? Latin1 as a kind of poor man's surrogateescape > isn't so bad. And the headers *should* be ASCII for
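The trade-off between the two options Ian mentions shows up on two spellings of the same path. This sketch uses only Python 3 built-in codecs; decode_both is a hypothetical helper.

```python
def decode_both(raw: bytes):
    """Return (surrogateescape_str, latin1_str) for the same raw bytes."""
    # surrogateescape: valid UTF-8 decodes normally; each undecodable byte
    # becomes a lone surrogate U+DC80..U+DCFF and encodes back to itself.
    via_surrogates = raw.decode("utf-8", "surrogateescape")
    # latin-1: every byte maps to the code point of the same value.
    via_latin1 = raw.decode("latin-1")
    return via_surrogates, via_latin1


utf8_raw = b"/caf\xc3\xa9"   # "/café" sent as UTF-8
latin1_raw = b"/caf\xe9"     # "/café" sent as Latin-1 (invalid UTF-8)

# UTF-8 input: surrogateescape gives real text, latin-1 gives mojibake.
assert decode_both(utf8_raw) == ("/caf\u00e9", "/caf\u00c3\u00a9")
# Latin-1 input: surrogateescape keeps the stray byte as a lone surrogate.
assert decode_both(latin1_raw) == ("/caf\udce9", "/caf\u00e9")
```

Both are lossless, which is why latin-1 can serve as a "poor man's surrogateescape"; the difference is only in which inputs come out as readable text.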

Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread Alan Kennedy
[P.J. Eby] >> Actually, latin-1 bytes encoding is the *simplest* thing that could >> possibly work, since it works already in e.g. Jython, and is actually >> in the spec already...  and any framework that wants unicode URIs >> already has to decode them, so the code is already written. [Armin] > E

Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread Ian Bicking
On Tue, Sep 22, 2009 at 3:16 AM, Armin Ronacher wrote: > Hi, > > Ian Bicking schrieb: > > Request headers, which you didn't split out... those I'm not sure. I'd > > *like* them to be native. But damn, I'm just not sure quite how. > > surrogateescape? Latin1? Latin1 as a kind of poor man's sur

Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread Armin Ronacher
Hi, Alan Kennedy wrote: > So, if nobody implements that, then why are we trying to standardise it? I think that was just one of the ideas that were discussed. Just to sum it up a bit where we already went: - my initial plan was going bytes everywhere. Turns out, on Python 3 this is nearly i

Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread Armin Ronacher
Hi, Ian Bicking wrote: > Tell doesn't have particular overhead except to keep track of how many bytes > have been read. That would allow libraries to at least detect contention > for wsgi.input. I wish seek were detectable, though I agree it shouldn't be > required at all. Ah right. Thought t
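Ian's point that tell() needs nothing more than a running byte count could be sketched as a thin wrapper. CountingInput is a hypothetical name, and the sketch covers only read, readline and iteration, not the full wsgi.input interface (readlines is omitted).

```python
import io


class CountingInput:
    """Hypothetical wsgi.input wrapper that tracks how many bytes were consumed."""

    def __init__(self, stream):
        self._stream = stream
        self._pos = 0

    def read(self, size=-1):
        data = self._stream.read(size)
        self._pos += len(data)
        return data

    def readline(self, size=-1):
        line = self._stream.readline(size)
        self._pos += len(line)
        return line

    def __iter__(self):
        # Iterate line by line until readline() returns an empty bytes object.
        return iter(self.readline, b"")

    def tell(self):
        # No seeking needed: the position is just the running count.
        return self._pos


stream = CountingInput(io.BytesIO(b"a=1&b=2"))
first = stream.read(3)     # b"a=1"
position = stream.tell()   # 3
```

A library that finds tell() already non-zero when it expects a fresh body can at least detect that someone else has consumed part of wsgi.input.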

Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread Alan Kennedy
[Alan] >> Is there a real need out there? [Armin] > In python 3, yes. Because the stdlib no longer works with bytes and the > bytes object has few string semantics left. Why can't we just do the same as the Java servlet spec? I.e. 1. Ignore the encoding issues being discussed 2. Give the progra

Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread Alan Kennedy
[Armin] > Because that problem was solved long ago in applications themselves. > Webob, Werkzeug, Paste, Pylons, Django, you name it, all are operating > on unicode. And the way they do that is straightforward. So what are we all discussing? Those frameworks obviously have solved all of the pr

Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread Armin Ronacher
Hi, Alan Kennedy wrote: > 2. Give the programmer (possibly mojibake) unicode strings in the WSGI > environ anyway > 3. And let them solve their problems themselves, using server > configuration or bespoke middleware Because that problem was solved long ago in applications themselves. Webob, We

Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread Armin Ronacher
Hi, Alan Kennedy wrote: > from miscellaneous unknown character sets into unicode, without any > mistakes, under all possible WSGI environments, e.g. No, they know the character sets. You tell them what character set you want to use. For example you can specify "utf-8", and they will decode/en

Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread Alan Kennedy
[Armin] > No, they know the character sets. Hmmm, define "know" ;-) [Armin] > You tell them what character set you > want to use. For example you can specify "utf-8", and they will > decode/encode from/to utf-8. But there is no way for the application to > send information to the server before

Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread René Dudfield
On Tue, Sep 22, 2009 at 10:06 AM, Alan Kennedy wrote: > [Alan] >>> Is there a real need out there? > > [Armin] >> In python 3, yes.  Because the stdlib no longer works with bytes and the >> bytes object has few string semantics left. > > Why can't we just do the same as the java servlet spec? I.E.

Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread Armin Ronacher
Hi, Alan Kennedy wrote: > Hmmm, define "know" ;-) The charset of incoming data, the charset of URLs, the charset of outgoing data, the charset of whatever the application uses, is what the application decides it to be. Most new applications go with utf-8 for everything these days. > I see this

Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread Alan Kennedy
[Armin] > Of course a server configuration variable would be a solution for many > of these problems, but I don't like the idea of changing application > behavior based on server configuration. So you don't like the way that Django, Werkzeug, WebOb, etc, do it now, even though they appear to be mo

Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread Armin Ronacher
Hi, Alan Kennedy wrote: > So you don't like the way that Django, Werkzeug, WebOb, etc, do it > now, even though they appear to be mostly successful, and you're happy > to cite them as such? Server != Application. > From the application's point of view, a framework-level configuration > variable

Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread René Dudfield
On Tue, Sep 22, 2009 at 12:12 PM, Armin Ronacher wrote: > Hi, > > Alan Kennedy wrote: >> So you don't like the way that Django, Werkzeug, WebOb, etc, do it >> now, even though they appear to be mostly successful, and you're happy >> to cite them as such? > Server != Application. > >> From the ap

Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread Massimo Di Pierro
Thank you, Armin, this makes things clear to me (a newbie here). On Sep 22, 2009, at 3:29 AM, Armin Ronacher wrote: - my initial plan was going bytes everywhere. Turns out, on Python 3 this is nearly impossible to do because the majority of the standard library went a unicode path, even where

Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread P.J. Eby
At 04:44 PM 9/22/2009 +1000, Graham Dumpleton wrote: 2009/9/22 Mark Nottingham : > That blog entry is eleven printed pages. Given that PEP 333 also prints as > eleven pages from my browser, I suspect there's some extraneous information > in there. > > Could you please summarise? Requiring all com

Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread P.J. Eby
At 09:23 AM 9/22/2009 +0100, Alan Kennedy wrote: [P.J. Eby] >> Actually, latin-1 bytes encoding is the *simplest* thing that could >> possibly work, since it works already in e.g. Jython, and is actually >> in the spec already... and any framework that wants unicode URIs >> already has to decode

Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread P.J. Eby
At 11:28 AM 9/22/2009 +0200, Armin Ronacher wrote: Hi, Alan Kennedy schrieb: > 2. Give the programmer (possibly mojibake) unicode strings in the WSGI > environ anyway > 3. And let them solve their problems themselves, using server > configuration or bespoke middleware Because that problem was so

Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread P.J. Eby
At 11:30 AM 9/22/2009 +0100, Alan Kennedy wrote: I see this as being the same as Graham's suggested approach of a per-server configurable charset, which is then stored in the WSGI dictionary, so that applications that have problems, i.e. that detect mojibake in the unicode SCRIPT_NAME or PATH_INF

Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread P.J. Eby
At 03:22 PM 9/22/2009 +0100, René Dudfield wrote: On Tue, Sep 22, 2009 at 3:07 PM, P.J. Eby wrote: > At 11:30 AM 9/22/2009 +0100, Alan Kennedy wrote: >> >> I see this as being the same as Graham's suggested approach of a >> per-server configurable charset, which is then stored in the WSGI >> dic

Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread Armin Ronacher
Hi, P.J. Eby wrote: > If they are putting objects of type 'unicode' under WSGI-defined > environ keys on Python 2.x, they are *not WSGI compliant*. Who is doing that? They are using bytestrings when talking to WSGI and only expose unicode to the user of the framework. That happens on top of WS

Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread Armin Ronacher
Hi, P.J. Eby wrote: > What roundtrips? If they're operating on unicode, either they're in > violation of the spec (in which case, f*** them), or they're already > running a decode every time they pull something out of the > environ... and using latin-1 or surrogates is only one encoding cal

Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread Robert Brewer
P.J. Eby [mailto:p...@telecommunity.com] > At 07:40 PM 9/21/2009 -0700, Robert Brewer wrote: > > Yes; you have to transcode to the "correct" encoding. Once. > > Then every other WSGI application interface "below" that one > > doesn't have to care. > > You can only do that if you *break encapsulati

Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread And Clover
Ian Bicking wrote: OK, another proposal entirely: we kill SCRIPT_NAME and PATH_INFO, and introduce two equivalent variables that hold the NOT url-decoded values. Yes, that was my preferred option, it makes all the worries about encodings quite moot: everything is effectively ASCII; they'll wo
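With the path left percent-encoded, an application can split on "/" before decoding, so an encoded slash (%2F) stays inside its segment. A minimal sketch, assuming the raw path arrives as an ASCII str; the thread does not settle on a key name for it, so the hypothetical helper simply takes the string as an argument.

```python
from urllib.parse import unquote


def split_raw_path(raw_path: str, charset: str = "utf-8") -> list:
    """Split a NOT url-decoded path into decoded segments (hypothetical helper)."""
    # Split first, then percent-decode each segment: "%2F" becomes "/" only
    # inside its own segment and never acts as a separator.
    segments = raw_path.lstrip("/").split("/")
    # errors="strict" raises UnicodeDecodeError if the bytes are not valid in
    # the chosen charset, instead of silently inserting U+FFFD.
    return [unquote(seg, encoding=charset, errors="strict") for seg in segments]
```

Because the decode happens in the application, the charset is the application's choice rather than a server configuration setting.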

Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread And Clover
Alan Kennedy wrote: Why can't we just do the same as the java servlet spec? Because Servlet is a walking, stinking demonstration of how *not* to handle encodings. Every servlet container has its own different method of selecting input character sets, and the default encoding is almost neve

Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread Armin Ronacher
Hi, And Clover wrote: > This is absolutely the opposite of what I want as an application author. > I want to hand out my WSGI application that uses UTF-8 and know that > wherever it is deployed the non-ASCII characters will go through without > getting mangled. I could not agree more. Probab

Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread Massimo Di Pierro
+1 On Sep 22, 2009, at 10:45 AM, Armin Ronacher wrote: Hi, And Clover wrote: This is absolutely the opposite of what I want as an application author. I want to hand out my WSGI application that uses UTF-8 and know that wherever it is deployed the non-ASCII characters will go through wit

Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread And Clover
Graham wrote: > Armin is fast asleep now, so my shift. Heh. It's a multiple-man job keeping up with this monster thread! The URLs don't break. Not in themselves. Just the language of the PEP implies that to fix them up would contravene the spec: >> The application MUST use [the encoding

Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread René Dudfield
On Tue, Sep 22, 2009 at 3:07 PM, P.J. Eby wrote: > At 11:30 AM 9/22/2009 +0100, Alan Kennedy wrote: >> >> I see this as being the same as Graham's suggested approach of a >> per-server configurable charset, which is then stored in the WSGI >> dictionary, so that applications that have problems, i.

[Web-SIG] Just to cheer you up

2009-09-22 Thread Armin Ronacher
Hey, After all those discussions about unicode and path info and all related problems I would love to remind everybody how well we are doing. I just had a brief discussion with Christian Neukirchen (the Rack developer) about the state of URL quoting and unicode and this is how it looks in Ruby lan

Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread Philip Jenvey
On Sep 22, 2009, at 2:28 AM, Armin Ronacher wrote: Hi, Alan Kennedy wrote: 2. Give the programmer (possibly mojibake) unicode strings in the WSGI environ anyway 3. And let them solve their problems themselves, using server configuration or bespoke middleware Because that problem was solv

[Web-SIG] Proposal to remove SCRIPT_NAME/PATH_INFO

2009-09-22 Thread Ian Bicking
OK, I mentioned this in the last thread, but... I can't keep up with all this discussion, and I bet you can't either. So, here's a rough proposal for WSGI and unicode: I propose we switch primarily to "native" strings: str on both Python 2 and 3. Specifically: environ keys: native environ CGI v

Re: [Web-SIG] Proposal to remove SCRIPT_NAME/PATH_INFO

2009-09-22 Thread Massimo Di Pierro
Hello Ian, I really like your proposal. Massimo On Sep 22, 2009, at 9:22 PM, Ian Bicking wrote: OK, I mentioned this in the last thread, but... I can't keep up with all this discussion, and I bet you can't either. So, here's a rough proposal for WSGI and unicode: I propose we switch prim

Re: [Web-SIG] Proposal to remove SCRIPT_NAME/PATH_INFO

2009-09-22 Thread P.J. Eby
At 09:22 PM 9/22/2009 -0500, Ian Bicking wrote: OK, I mentioned this in the last thread, but... I can't keep up with all this discussion, and I bet you can't either. So, here's a rough proposal for WSGI and unicode: I propose we switch primarily to "native" strings: str on both Python 2 and 3.

Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread P.J. Eby
At 05:12 PM 9/22/2009 -0700, Philip Jenvey wrote: Because our request container is a plain, pre-fabricated dict that doesn't permit the lazy behavior. Not quite true; you can always write a library function, get_foo(environ) that does the lazy caching in a private environ key, at the cost of
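P.J. Eby's get_foo(environ) idea could look like the following. Everything here is hypothetical: the function name, the private key, and the assumption that PATH_INFO is a latin-1 "native" string as on Python 3. The cache remembers the raw value it was computed from, so it is recomputed if middleware later rewrites PATH_INFO.

```python
def get_decoded_path(environ, charset="utf-8"):
    """Hypothetical get_foo(environ): decode PATH_INFO lazily and cache it."""
    key = "myframework.decoded_path_info"   # private environ key, name assumed
    raw = environ.get("PATH_INFO", "")
    cached = environ.get(key)
    # Recompute on first use, or if PATH_INFO changed since it was cached.
    if cached is None or cached[0] != raw:
        # Assumes PATH_INFO holds the request bytes transcoded as latin-1.
        cached = (raw, raw.encode("latin-1").decode(charset))
        environ[key] = cached
    return cached[1]
```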

Re: [Web-SIG] Proposal to remove SCRIPT_NAME/PATH_INFO

2009-09-22 Thread James Y Knight
On Sep 22, 2009, at 10:22 PM, Ian Bicking wrote: I propose we switch primarily to "native" strings: str on both Python 2 and 3. [...] All [...] headers will be treated as Latin1. I like this. I think it would be "cleaner" to use bytes for all these things, but it's not really important. Giv

[Web-SIG] Getting back to WSGI grass roots.

2009-09-22 Thread Graham Dumpleton
Sorry, after having had a bit of a think while eating lunch, I am going to throw up another point of view on this whole issue. So, sit back and be just a little bit concerned. WSGI stands for Web Server GATEWAY Interface. My understanding is that right back at the beginning WSGI was purely intended

Re: [Web-SIG] Proposal to remove SCRIPT_NAME/PATH_INFO

2009-09-22 Thread Henry Precheur
On Tue, Sep 22, 2009 at 11:26:15PM -0400, P.J. Eby wrote: > +1, if you mean the strings have the same content, > character-for-character on Python 2.3. That is, a \x80 byte in a > Python 2 'str' is matched by an \x80 character in the Python 3 > 'str'. (I presume that's what we mean by "native"

Re: [Web-SIG] Proposal to remove SCRIPT_NAME/PATH_INFO

2009-09-22 Thread Henry Precheur
On Tue, Sep 22, 2009 at 09:22:48PM -0500, Ian Bicking wrote: > Well, the biggie: is it right to use native strings for the environ values, > and response status/headers? Specifically, tricks like the latin1 > transcoding won't work in Python 2, but will in Python 3. Is this weird? > Or just somet

Re: [Web-SIG] Proposal to remove SCRIPT_NAME/PATH_INFO

2009-09-22 Thread Armin Ronacher
Hi, Ian Bicking wrote: > I propose we switch primarily to "native" strings: str on both Python 2 and > 3. I'm starting to think that this is the best idea. > I then propose that we eliminate SCRIPT_NAME and PATH_INFO. Instead we > have: IMO they should stick around for compatibility with older