Re: [Web-SIG] WSGI 2.0 Round 2: requirements and call for interest
Hi,

I just want to reply to this because I think many people seem to be missing why things are done in a certain way, especially if they appear to be odd.

On 05/01/2016 12:26, Cory Benfield wrote:
> 1. WSGI is prone to header injection vulnerabilities by design, due to the conversion of HTTP headers to CGI-style environment variables: if the server doesn’t specifically prevent it, X-Foo and X_Foo both become HTTP_X_FOO. I don’t believe it’s a good choice to destructively encode headers, expect applications to undo the damage somehow, and introduce security vulnerabilities in the process. If mimicking CGI is still considered a must-have (1% of current Python web programmers may have heard about it, most of them from PEP), then that burden should be pushed onto the server, not the application.

Headers will always have to be encoded destructively if you want any form of generic processing. We need header joining, and we need to normalize the keys at least to the extent of the HTTP specification. I'm happy to not perform the conversion of dashes to underscores, but you will work in environments where this conversion was already done, so the spec will need to deal with that case anyway.

The WSGI spec currently also does not sufficiently explain how to join headers. In particular the cookie header was written without header joining in mind, which is why it needs to be joined differently than all other headers. Header joining also comes up as a big topic in HTTP 2, so the spec will need to cover this.

> 2. More generally, I fail to see how mixing HTTP headers, server-related inputs, and environment variables in a dict adds value. It prevents iterating on each collection separately. It only makes sense if not offering more features than CGI is a design goal; in that case, this discussion doesn’t serve a purpose anyway.
> It would be nicer and possibly more secure if the application received separately:

I think this is largely a nice-to-have, not something that has any overall benefits. I would rather just clean up the actual stupid things such as CONTENT_TYPE and CONTENT_LENGTH, which cause a lot more real world friction than just the names of keys in general. This really should not turn into meaningless bikeshedding about what information should be called. Also consider how much code out there already assumes CGI/WSGI variables, so any move off that really should have good reasons or we all will just waste enormous amounts just to transpose between the two representations.

> a. Configuration information, which servers could read from environment variables by default for backwards compatibility, but could also get through more secure channels and restrict to what the application needs in order to better isolate it from the entire OS.

What WSGI traditionally lacked was a setup phase where data could be passed to the application that was server specific but not request bound. For instance there is no reason an application cannot get hold of wsgi.errors before a request comes in. I would like to see this fixed in a new specification.

> 3. Stop pretending that HTTP is a unicode protocol, or at least stop ignoring reality when doing so. WSGI enforces ISO-8859-1-decoded str objects in the environ, which is just wrong. It’s all the more a surprising choice since this change was driven by Python 3, that UTF-8 is the correct choice, and that Python 3 defaults to UTF-8. Django has to re-encode and re-decode before doing anything with HTTP headers:

I agree with this but you will have to have that fight with others. I said many times before that values should never have been unicode values in the first place, but certain decisions in the Python 3 standard library at the time prevented that. In particular until 3.2 or so it was impossible to parse byte URLs.

> 5.
> Improve request / response length handling and connection closure. Armin and Graham have talked about it in the past and know the topic better than I do. There’s also a rejected PEP by Armin which made sense to me.

I think last time I discussed that with Graham it was not clear what the solution is in the context of WSGI. The idea that there is a content-length is laughable in the context of a real application where the server is performing conversions on the input and output stream. We would need many more than just one content length and an automatically terminated input stream. However at that point you will quickly realize that you can't have it both ways: you either have a WSGI-like protocol, or raw access to sockets, but certainly not both. This topic has caused a lot of bikeshedding in the past and I fail to see how it will be different this time.

My current thinking is that the most realistic approach to most of those problems will be the concept of framing on both the input and output side. That's somewhat compatible with both chunked transports as well as websockets. But if we do go down
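A minimal sketch of the key collision from point 1 and of the header joining mentioned in the reply, for illustration only. The helper names `cgi_key` and `headers_to_environ` are hypothetical, and the joining rule (", " for ordinary headers, "; " for Cookie) is an assumption about what a spec might prescribe, not something WSGI currently states.

```python
def cgi_key(name):
    """Convert an HTTP header name to a CGI-style environ key."""
    return "HTTP_" + name.upper().replace("-", "_")


def headers_to_environ(headers):
    """Fold a list of (name, value) pairs into CGI-style keys.

    Repeated keys are joined; Cookie is joined with '; ' (assumed rule),
    everything else with ', '.
    """
    environ = {}
    for name, value in headers:
        key = cgi_key(name)
        if key in environ:
            sep = "; " if key == "HTTP_COOKIE" else ", "
            environ[key] = environ[key] + sep + value
        else:
            environ[key] = value
    return environ


# Two distinct header names end up under the same key.
collided = headers_to_environ([("X-Foo", "trusted"), ("X_Foo", "spoofed")])
cookies = headers_to_environ([("Cookie", "a=1"), ("Cookie", "b=2")])
```

After the conversion the application can no longer tell which of the two original headers a value came from, which is the destructive part of the encoding.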
Re: [Web-SIG] WSGI 2.0 Round 2: requirements and call for interest
Hi,

On 05/01/2016 13:09, Luke Plant wrote:
> Just to add my 2c - as another Django developer, I agree completely with Aymeric here. My own experience was that the HTTP handling done by WSGI (especially URL handling, HTTP header mangling, os.environ as a destination - all due to CGI compatibility - and semi-broken unicode handling) only made things harder for us. We would much rather have dealt with raw streams of bytes and done all HTTP parsing ourselves. Like Graham said, for HTTP/2 let's ignore the history of WSGI and start from scratch with an API that actually serves us well.

Alright, I'll bite: if it were not done that way you would have different problems. In particular a problem that comes up very often is that people want the PATH_INFO and SCRIPT_NAME to not be encoded. That however completely breaks any form of routing you would want to do the moment they contain unicode characters.

I keep having the argument about PATH_INFO and the header semantics constantly with people and I'm absolutely convinced (from the theory behind it as well as playing around with ideas for PEP 444 a few years ago) that it only gets worse the moment you leave the WSGI territory too far.

Likewise I wonder how many people that ask for more low level access concerned themselves with chunked requests/responses, transport encodings and all the complexity that servers do for you. Yes, quite a bit of this is broken in WSGI but it would have been trivial to fix without throwing the whole specification into the toilet :)

Regards,
Armin

___
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: https://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
Re: [Web-SIG] WSGI 2.0 Round 2: requirements and call for interest
Hi,

On 04/01/2016 16:15, Cory Benfield wrote:
> I don’t believe that will work.

Correct. This cannot be done except for very simplistic servers.

Regards,
Armin
Re: [Web-SIG] WSGI 2.0 Round 2: requirements and call for interest
Hi,

I personally probably do not want to participate in this discussion much but I want to leave some thoughts in case someone finds them useful.

I personally think that fundamentally "concurrent programming" and just getting access to a socket is not something that fits into a generically deployable container specification, which is what WSGI largely is. WSGI was quite trivially specified as what happens from request to response, and even in that area it already suffered from significant limitations in regards to where the specification did not consider what servers would do with it.

I do not want to go into detail too much but WSGI the spec never really concerned itself with the vast complexity that is HTTP in practice (chunked requests, transfer encodings, stream termination signalling etc.). I heavily doubt that dragging concurrency into the spec will make it any less problematic for real world situations.

Why do we need concurrency on the spec level? I honestly do not see the point, because in practical terms we might just make a spec that then cannot really be deployed in practice just because nobody would want to. Making a server that gracefully shuts down when things are purely request/response is already tricky enough, but finding a method to shut down a server with active stream connections is something that does not even have enough agreement between implementations yet (and which also needs a lot of client support), so I don't think it will fit into a specification.

I honestly do not think that you can have it both ways: a WSGI specification and a raw socket. Maybe we reached the point where WSGI should just be deprecated and frameworks themselves will fill the gap. We would only specify a data exchange layer so that frameworks can interoperate in some way or another.

Regards,
Armin
Re: [Web-SIG] WSGI 2.0 Round 2: requirements and call for interest
Hi,

On 04/01/2016 16:30, Cory Benfield wrote:
> Your core question seems to be: “why do we need a spec that specifies concurrency?” I think this is reasonable. One way out might be to take the route of ASGI[0], which essentially uses a message broker to act as the interface between server and application. This lets applications handle their own concurrency without needing to co-ordinate with the server. From there the spec doesn’t need to handle concurrency, as the application and server are in separate processes.

I think the *only* way to scale websockets is to use an RPC system to dispatch commands and to handle fan out somewhere centralized. This for instance works really well with redis as a broker. All larger deployments of websockets I have worked with so far involved a simple redis-to-websocket server that barely restarts and dispatches commands (and receives messages) via redis.

That's a simple and straightforward way that still keeps deployments working well, because you never restart the actual connections unless you need to pull a cable or you have a bug in the websocket server.

That's why I'm personally also not really interested in this topic, as for large scale deployments this is not really an issue and for toy applications I do not use websockets ;)

Regards,
Armin
Re: [Web-SIG] Is PEP 3333 the final solution for WSGI on Python 3?
Hi,

On 10/23/10 7:43 PM, P.J. Eby wrote:
> I don't think it's an either-or case. PEP just means that there's a clear path to port WSGI 1 apps. If somebody wants to champion a WSGI 1.1, a 2.0, and whatever else, that's great!

Oh, I was not denying that. The original post on reddit to which I commented was called "Is PEP the final solution for WSGI on Python 3" :)

> I'm really trying to step *down* from involvement in this; the only reason I stepped up to do this now is because of the pending 3.2 release and the open question(s) over stdlib APIs that have to stabilize in this release.

I think the main problem is that we are all incredibly happy with Python 2 currently and Python 3 is not very convincing at the moment. Unladen Swallow also did not exactly deliver on its promises lately, but PyPy seems to be doing quite well, and due to its nature it would be unrealistic to assume it switches to Python 3 anytime soon. (It's one of the largest Python 2 codebases.)

I have to admit that my interest in Python 3 is not very high and I am most likely not the most reliable person when it comes to driving PEP 444 :)

Regards,
Armin
Re: [Web-SIG] Is PEP 3333 the final solution for WSGI on Python 3?
Hi,

On 10/22/10 2:35 AM, Graham Dumpleton wrote:
> has said: Hopefully not. WSGI could do better and there is a proposal for that (444).

Just to give this some more context: I think WSGI (even in Python 2) is faulty and we have the possibility now to fix it. Nobody in the web community is really eager to use Python 3 currently as far as I can see, so we have some extra time where we can actually introduce some value into web development on Python 3. An improved WSGI specification could be a key to that.

If PEP is what we end up with, that is fine with me as well.

Regards,
Armin
Re: [Web-SIG] PEP 444 (aka Web3)
Hi,

On 9/20/10 6:31 PM, Matt Goodall wrote:
> Servers should definitely not transform a HEAD to a GET.

There are some good reasons why it currently has to. I haven't read the link in question but I had a discussion with Graham a few days ago on Skype and he outlined the issue in detail. I will write a summary to the list in a few days, just too busy to do that right now :(

Regards,
Armin
Re: [Web-SIG] PEP 444 (aka Web3)
Hi,

On 9/17/10 11:40 AM, And Clover wrote:
> This is why I am continuing to plead for a 'script_name/path_info are authoritative' flag in environ that applications can use to detect situations where it is safe to go ahead and rely on them. I want to say "Unicode paths are supported if your server/gateway does", not "Unicode paths might sometimes work, depending on how you configure your server and application".

In case there is no raw value with the current spec, you can see SCRIPT_NAME and PATH_INFO as unreliable. In case we change the spec as Ian mentioned above, I am all for a wsgi.guessed_encoding = True flag or something like that.

> It is not just CGI that is affected here! IIS does not provide the original undecoded path at all, even through ISAPI.

Unless I am mistaken, the same is true for CGI scripts running on Apache2 on Windows.

> - on Python 2 on Windows, re-read the environment variables using ctypes if available, to avoid the mangling caused by reading os.environ using mbcs. (This didn't used to work, as old versions of IIS deliberately mbcs-filtered values before putting them in the environment, but it does now.)

I did some tests a while ago and was pretty sure that Apache2 on Windows did the same. Might be wrong though.

> However, the form layer is not really the right place to be doing these hacks. It would be better done in the stdlib CGI handler.

The correct place for these hacks would be the appropriate WSGI/Web3 handler of the webserver. Certainly not a particular WSGI/Web3 implementation or even the CGI module of the standard library.

Regards,
Armin
Re: [Web-SIG] PEP 444 (aka Web3)
Hi,

On 9/17/10 5:42 PM, P.J. Eby wrote:
> So, do you have an example of what some real-world code is going to *do* with this information? i.e., what's the use case for knowing the precise degree of messed-uppedness of the path? ;-)

Actually, I can see a couple of use cases. I have a blog that by default only produces ASCII-safe slugs for the URLs, which means that if you are a Chinese person you will only get the ID based fallback there. If I could safely detect whether the setup supports unicode identifiers in URLs in a way that works, I could give a good default and warn the user if they change the setting.

Regards,
Armin
Re: [Web-SIG] PEP 444 (aka Web3)
Hi,

On 9/17/10 7:43 PM, Ian Bicking wrote:
> I'm also not sure what motivated this particular change, but I don't have any opinion one way or the other.

Motivation is that WSGI wants servers to do something like this:

    if len(iterable) == 1 and content_length_header_missing:
        headers.append(('Content-Length', str(len(iterable[0]))))

However not everybody was doing that, and some applications were setting a content length header and some were not. If a content length header was not set, some middlewares that changed content worked properly even though they did not check the header. The idea is that with web3 every tool in the chain is supposed to look for that header and update it appropriately. Even the piglatin middleware from PEP 333 did not check the content length if I remember correctly.

Regards,
Armin
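The rule described above can be sketched as follows. This is only an illustration: `ensure_content_length` and `replace_body` are hypothetical helper names, and the second function shows one way a content-changing middleware could keep the header in step with the body, which is the behaviour the message says web3 would expect from every tool in the chain.

```python
def ensure_content_length(headers, body_chunks):
    """Add Content-Length when the body is a single chunk and none is set."""
    has_length = any(name.lower() == "content-length" for name, _ in headers)
    if len(body_chunks) == 1 and not has_length:
        headers.append(("Content-Length", str(len(body_chunks[0]))))
    return headers


def replace_body(headers, new_chunks):
    """Drop a stale Content-Length and recompute it for the new body."""
    kept = [(n, v) for n, v in headers if n.lower() != "content-length"]
    return ensure_content_length(kept, new_chunks), new_chunks


headers = ensure_content_length([("Content-Type", "text/plain")], [b"hello"])
new_headers, new_body = replace_body(headers, [b"ellohay"])
```

A middleware that rewrites the body but leaves the old header in place would send a length of 5 for a 7-byte body, which is the failure mode being described.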
Re: [Web-SIG] PEP 444 (aka Web3)
Hi,

On 9/16/10 1:44 PM, Tarek Ziadé wrote:
> I propose to write in the PEP that a middleware should provide an app attribute to get the wrapped application or middleware. It seems to be the most common name used out there.

What about middlewares that encapsulate more than one application?

Regards,
Armin
Re: [Web-SIG] PEP 444 (aka Web3)
Hi,

On 9/16/10 6:19 PM, Robert Brewer wrote:
> 1. Hooray for all-byte output.

Hooray for agreeing :)

> 3. +1 on (status, headers, body) in that order. Your own example code composed them in that order, and then re-arranged them for output! One of the benefits of a new spec is the opportunity to coerce rewrites in existing codebases that undo their poor design choices and make them more readable. By the way, the Specification Details and Values Returned sections have this in the (s, h, b) order in your draft.

I suppose it makes sense to word the spec in that order then, seems like the majority wants it that way round.

> 4. The web3 spec says, "In case a content length header is absent the stream must not return anything on read. It must never request more data than specified from the client." but later it says, "Web3 servers must handle any supported inbound hop-by-hop headers on their own, such as by decoding any inbound Transfer-Encoding, including chunked encoding if applicable.". I would be sad if web3 did not support streaming uploads via Transfer-Encoding. One way to implement that would be to make the origin server handle read() transparently by returning '' on EOF, regardless of whether a Content-Length or a Transfer-Encoding header was provided.

I was toying with the idea to have a websocket extension for web3 which would have solved my usecase for requests without a content-length header. The problem with the content length of incoming data is quite complex and that seemed to be the solution that was easiest for everybody involved.

> 5. Conversely, streaming output is nice to have and should be explicitly supported in the web3 spec. One way would be to require servers to respect a 'Transfer-Encoding: chunked' header emitted by the application. However, the WSGI and web3 specs specifically deny this approach by saying, "Applications and middleware are forbidden from using HTTP/1.1 hop-by-hop features or headers."
> A workaround would be for the application to signal Transfer-Encoding by omitting any Content-Length header in its response headers (this is what CherryPy currently does).

I am fine improving that, but it would require a very good reference implementation with enough comments so that people have an idea of how it's supposed to behave. wsgiref is nice in WSGI already, but it has its faults, which we should try to keep in mind for web3. (Like that it sets the multithreaded flag despite being single threaded, or that it always appends a Date header, breaking some applications.)

> 6. I'd personally like to see it be OK for apps and middleware to emit Connection: close too, or have some other way of communicating that desire to the server.

I would like to see this feature as well, but you will have to fight for this feature with Phillip and Graham I suppose.

> 7. "it is presumed that Web3 middleware will be created which can be used in front of existing WSGI 1.0 applications, allowing those existing WSGI 1.0 applications to run under a Web3 stack. This middleware will require, when under Python 3, an equivalence to be drawn between Python 3 str types and the bytes values represented by the HTTP request and all the attendant encoding-guessing (or configuration) it implies." Just some field experience: that's not hard. CherryPy 3.2 does this now between various WSGI proposals.

I suppose we will see some adapters that have some configuration parameters to adapt to different usage patterns.

Regards,
Armin
Re: [Web-SIG] PEP 444 (aka Web3)
Hi,

On 9/16/10 7:56 PM, Ty Sarna wrote:
> Agreed. Among many other reasons, it seems poor from a Python 3 marketing perspective to introduce a name change that implies something totally different from WSGI that will require major rewrites to port to. It's also a poor choice as a rebranding even if one were desirable, I think. It's terribly generic, and suggests it's somehow a successor to Web 2.0. Nor is it very search engine friendly, and there may be trademark issues (http://www.networkedplanet.com/Products/Web3/)

The name is not set in stone. I am very happy to accept WSGI 2 as a name for that, but we did not want to totally bypass the discussions on web-sig here and announce something that clearly says it will be WSGI 2 when only a small set of the people here participated directly in the writing of that PEP.

> * It makes sense to me that the error stream should accept both bytes and unicode, and should do a best effort to handle either. Getting encoding errors or type errors when logging an error is very distracting.
>
> I think I agree with this too.

There are no such stream objects on Python 3 unless I am missing something. Furthermore there are no libraries on Python 3 that would emit string information as text, so I don't see the reason for considering bytes and unicode for that stream.

Regards,
Armin
Re: [Web-SIG] PEP 444 (aka Web3)
Hi,

Here are some comments summarized and how things will change:

- The order of the response tuple. The majority of this list wants it to be changed to the standard (status, headers, body) format, and we agree. The original motivation was passing it to the constructor of a common response object, but there is no reason this shouldn't be changed. Will update the PEP and implementation appropriately.

- The async part. It was added in the hope that someone would step up and come up with something better as a replacement. I asked in the #twisted IRC channel but they did not see any value in supporting a common specification that was shared with the synchronous world, and it looks like it will be harder to find someone that does care about this particular issue. The motivation was that Facebook's tornado framework is currently attracting a lot of users and creating an environment besides the WSGI one, which means that it might be quite hard to share some code between those two worlds. I also remember hearing a lot of backlash from the nginx mod_wsgi maintainer the last time removing start_response was considered. If I can't find someone that is willing to provide some input on that I will remove that section.

- Bytes values in the environment: HTTP transmits bytes, that's a fact we can't change. When we go with native strings we will go with unicode on 3.x. This has the following implications:

  - getting the right path info requires a decode + an encode unless you are assuming latin1.
  - same as above for the script name and cookie header

  When going with unicode strings on 3.x for environ values, we would have to do the same for outgoing values, which makes middlewares a lot harder to write:

  - header keys and values might then be bytes and unicode strings. Because of this all middlewares would have to convert to either str objects or bytes, which might mean a lot of extra encoding and decoding depending on how the middleware is implemented.
  - We can't change the fact that a large percentage of Python developers is living in an ASCII-only world which would never have to deal with encodings that way and might be encouraged to just assume ASCII as encoding.

  For implementations not based on the standard library the bytes-only approach seems to be easier in any way as far as I can see. The only real issue appears to be urllib for the moment, and until that is resolved one could easily do an encode/decode around the calls to that particular library.

- web3.errors: I think Ian raised concern that it's specified to support unicode only. I don't think changing that to accept either bytes or unicode is a good idea on Python 3, where there is no stream in the language or standard library that accepts both at the same time. An implementation for 2.x could support both, but I don't know if there is a usecase for that. In general though I have to say that very few people use wsgi.errors currently, so I don't think this is a real issue anyway.

- the web3 name: If there is any value in this PEP and we find something to decide on, there is no reason this couldn't be WSGI 2. But as long as it's just something a small part of the web-sig community worked on directly, a separate name is a good thing I think, because it does not reserve the name WSGI 2 for something that might not actually become WSGI 2 in case this PEP gets rejected.

Regards,
Armin
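The "decode + encode" round trip mentioned under the native-strings option can be sketched like this. It assumes the server hands the application a str that was produced by decoding the raw bytes as latin1, and that the application knows the real charset is UTF-8; `recode` is a hypothetical helper name.

```python
def recode(value, charset="utf-8"):
    """Undo the server's latin1 decoding, then decode with the real charset."""
    return value.encode("latin1").decode(charset, "replace")


# Bytes as they would arrive after URL-unquoting a UTF-8 path.
wire_bytes = "/caf\u00e9".encode("utf-8")

# What a native-string environ would contain under the latin1 assumption.
environ = {"PATH_INFO": wire_bytes.decode("latin1")}

path = recode(environ["PATH_INFO"])
```

Because latin1 maps every byte to exactly one code point, the first step is lossless; the cost is that every consumer of the path, script name or cookie header has to perform it.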
Re: [Web-SIG] PEP 444 (aka Web3)
Hi,

On 9/17/10 3:43 AM, Ian Bicking wrote:
> Not if you are working with the URL-encoded paths.

SCRIPT_NAME / PATH_INFO will always stay unencoded and the current spec requires the web3.script_name thing to only be provided if the server can safely provide that. So at least for the fallback, we are dealing with (properly latin1 decoded) non-URL encoded things. Can be changed of course.

> Cookie is weird. If that one header could be bytes, that'd be great... but special-casing Cookie/Set-Cookie is too hard/weird.

Special casing one header is indeed weird.

> I don't know of any other header (or the status) that would reasonably cause a problem. And I'm not glossing over corner cases -- I'm generally very aware and concerned with legacy issues, and interacting with legacy systems. There just aren't any here except for the resolvable issues I've listed.

Technically speaking it would affect etags too, but I doubt anyone is using non-ASCII quoted strings there. A very funny header is btw the Warning header which actually can have any encoding:

    The warn-text SHOULD be in a natural language and character set that is most likely to be intelligible to the human user receiving the response. This decision MAY be based on any available knowledge, such as the location of the cache or user, the Accept-Language field in a request, the Content-Language field in a response, etc. The default language is English and the default character set is ISO-8859-1.

    If a character set other than ISO-8859-1 is used, it MUST be encoded in the warn-text using the method described in RFC 2047 [14].

Doubt anyone is using that header though.

> It's more of an issue under Python 2, it could probably be ignored with Python 3. Under Python 2 when you have some error condition it's really frustrating to encounter some unicode error with the logging of that error (often covering up the original error).

I guess there it would be fine to have a stderr-like stream that accepts unicode and bytes.
Regards,
Armin
Re: [Web-SIG] PEP 444 (aka Web3)
Hi,

On 9/17/10 4:21 AM, Ian Bicking wrote:
> The Title header (in Atompub) also suggests 2047, but that's essentially an ASCII conversion like URL quoting. It looks something like =?iso-8859-1?q?p=F6stal?=

Yep. That was merely a fun fact I wanted to share. I was not aware of HTTP specifying a non-latin1 header anywhere. I suppose the authors of the HTTP specification were aware of encoding issues, just that the people that made the Cookie specification didn't have non-ASCII payloads in mind. Not too surprising, after all it's called Cookie and not arbitrary data-store :)

Regards,
Armin
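For readers unfamiliar with the RFC 2047 form quoted above, here is a small sketch that decodes it with the standard library's `email.header.decode_header`. The wrapper name `decode_encoded_word` is hypothetical, and using the email package for an HTTP header value is an assumption made for illustration, not something the thread proposes.

```python
from email.header import decode_header


def decode_encoded_word(value):
    """Decode RFC 2047 encoded-words in a header value into text."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Encoded-words come back as bytes plus their declared charset.
            parts.append(chunk.decode(charset or "ascii"))
        else:
            # Values without encoded-words are returned as plain str.
            parts.append(chunk)
    return "".join(parts)


title = decode_encoded_word("=?iso-8859-1?q?p=F6stal?=")
```

The encoded form itself is pure ASCII, which is why it passes through header handling unharmed regardless of how the surrounding layer treats non-ASCII bytes.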
Re: [Web-SIG] WSGI for Python 3
Hi,

On 2010-08-27 6:05 PM, Christoph Zwerschke wrote:
> Btw, another problem with this is that the lower() method does not know that it has to use latin1 when lowercasing.

That is not a problem insofar as case insensitive HTTP tokens are limited to ASCII only.

Regards,
Armin
Re: [Web-SIG] WSGI for Python 3
Hi,

On 7/17/10 9:15 AM, Ian Bicking wrote:
> This is an Apache-specific issue. It definitely doesn't apply to paste.httpserver, I doubt CherryPy or wsgiref. I don't really know how Nginx or other servers work.

This will be an issue for every server that...

* supports unicode filesystems
* decides to do internal mapping based on URIs and not IRIs

In fact, this will be an issue for things like middlewares that want to map applications to paths. In fact, this is an issue on Python 2 already, just that nobody cares.

Regards,
Armin
Re: [Web-SIG] WSGI for Python 3
Hi,

On 7/17/10 12:57 PM, Armin Ronacher wrote:
> In fact, this will be an issue for things like middlewares that want to map applications to paths. In fact, this already is an issue on Python 2 already, just that nobody cares.

s/applications/serving static files from folders/

Regards,
Armin
Re: [Web-SIG] WSGI for Python 3
Hi,

On 7/17/10 1:20 AM, Chris McDonough wrote:
> Let me know if I'm missing something.

The only thing you miss is that the bytes type of Python 3 is badly supported in the stdlib (not an issue if we reimplement everything in our libraries, not an issue for me) and that the bytes type has no string formatting, which makes us do the encode/decode dance in our own implementations of the missing stdlib functions.

So I am pretty sure we can't totally bypass the encoding/decoding. We might however require fewer encodes/decodes if we leave bytes on the WSGI layer.

Regards,
Armin
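The "encode/decode dance" can be illustrated with a response header block. This is a sketch under the assumption that bytes cannot be formatted directly, as was the case when this message was written (bytes %-formatting arrived later, in Python 3.5); `header_block` is a hypothetical helper name.

```python
def header_block(status, headers):
    """Build the raw header bytes by formatting text and encoding once."""
    text = "HTTP/1.1 %s\r\n" % status
    text += "".join("%s: %s\r\n" % (name, value) for name, value in headers)
    # The single encode step at the end is the 'dance': the formatting
    # has to happen on str because bytes offers no equivalent.
    return (text + "\r\n").encode("latin1")


raw = header_block("200 OK", [("Content-Length", "5")])
```

If the values arrive as bytes from the WSGI layer, each one first has to be decoded before it can be interpolated, which is where the extra decode calls come from.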
Re: [Web-SIG] Proposal to remove SCRIPT_NAME/PATH_INFO
Hi, Ian Bicking wrote: I propose we switch primarily to native strings: str on both Python 2 and 3. I'm starting to think that this is the best idea. I then propose that we eliminate SCRIPT_NAME and PATH_INFO. Instead we have: IMO they should stick around for compatibility with older applications and be latin1 encoded on Python 3. But the use is discouraged. Again, it would be better to do: parse_cookie(urllib.unquote(environ['HTTP_COOKIE']).decode('utf8')) That will only work in Python 2; in Python 3 urllib.unquote already yields unicode strings and assumes a utf-8 quoted string. Other variables like environ['wsgi.url_scheme'], environ['CONTENT_TYPE'], etc, will be native strings. A Python 3 hello world app will then look like: def hello_world(environ): return ('200 OK', [('Content-type', 'text/html; charset=utf8')], ['Hello World!'.encode('utf8')]) start_response and changes to wsgi.input are incidental to what I'm proposing here (except that wsgi.input will be bytes); we can decide about them separately. If we go about dropping start_response, can we move the app iter to the beginning? That would be consistent with the signature of common response objects, making it possible to do this: response = Response(*hello_world(environ)) In general I think doing too many changes at once is harmful so I'm happy to stick with start_response for another iteration of WSGI. Well, the biggie: is it right to use native strings for the environ values, and response status/headers? Specifically, tricks like the latin1 transcoding won't work in Python 2, but will in Python 3. Is this weird? Or just something you have to think about when using the two Python versions? The WSGI PEP should standardize a way for the application to figure out the environment it runs in. And I think that should *not* be checking sys.version_info but rather comparing string features. What happens if you give unicode text in the response headers that cannot be encoded as Latin1? 
Undefined behavior, the example server should raise an assertion error. Should some things specifically be ASCII? E.g., status. No, HTTP specifies the status as TEXT and TEXT is specified as (any 8-bit sequence of data except any US-ASCII control character but including CR, LF, space and tabs). Should some things be unicode on Python 2? I don't think so. Is there a common case here that would be inefficient? Don't think so. Regards, Armin
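To make the tuple-returning application from the message above concrete, here is a minimal sketch of how such an app could be bridged to the existing start_response calling convention. The adapter name `as_wsgi1` is an assumption made for illustration; it is not taken from any PEP draft.

```python
def hello_world(environ):
    # The tuple-returning app from the message: (status, headers, app_iter).
    return ('200 OK',
            [('Content-type', 'text/html; charset=utf8')],
            ['Hello World!'.encode('utf8')])


def as_wsgi1(app):
    # Hypothetical adapter: wraps a tuple-returning app so that it can be
    # called with the classic (environ, start_response) signature.
    def wrapper(environ, start_response):
        status, headers, app_iter = app(environ)
        start_response(status, headers)
        return app_iter
    return wrapper


# Usage with a stand-in start_response that just records its arguments.
recorded = []
body = as_wsgi1(hello_world)({}, lambda status, headers: recorded.append((status, headers)))
```

The order of the tuple matters for the `Response(*hello_world(environ))` idea: the app would have to return its items in whatever order the response class expects.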
Re: [Web-SIG] Getting back to WSGI grass roots.
Hi, Graham Dumpleton wrote: So, rather than throw away completely the idea of bytes everywhere, and rewrite the WSGI specification, we could instead say that the existing conceptual idea of WSGI 1.0 is still valid, and just build on top of it a translation interface to present that as unicode. I could live with that as well. Regards, Armin
Re: [Web-SIG] Request for Comments on upcoming WSGI Changes
Hi, P.J. Eby wrote: Actually, latin-1 bytes encoding is the *simplest* thing that could possibly work, since it works already in e.g. Jython, and is actually in the spec already... and any framework that wants unicode URIs already has to decode them, so the code is already written. Except that nobody implements that and that Jython has a standard Python 2.x byte string. Regards, Armin
Re: [Web-SIG] Request for Comments on upcoming WSGI Changes
Hi, Ian Bicking wrote: Request headers, which you didn't split out... those I'm not sure. I'd *like* them to be native. But damn, I'm just not sure quite how. surrogateescape? Latin1? Latin1 as a kind of poor man's surrogateescape isn't so bad. And the headers *should* be ASCII for sane requests, so it's not a horrible compromise. Except for cookie headers. Thanks to advertising and all the other systems putting headers on your page you can't even properly control that one. Another thing to consider: in Python 3.1, the HTTP server internally decodes to latin1 and there is no simple way to change that, unless you replace the implementation. Ugh. wsgi.input could remain. I think at least it should become a file-like interface (i.e., giving an empty string when the content is exhausted) and I might even ask that it implement .tell() (.seek() would be nice of course, but optional). If there was some other idea, I think there's room for improvement on wsgi.input and the file interface. -1 on seek and tell. This could be impossible to implement and what we really want to do is to not have the data in memory but on disk or wherever you put big-ass uploads. Also it will be hard to test for an available seek or not, because even if it's a noop, the method could be there. Regards, Armin
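The two options weighed above (latin1 versus surrogateescape) can be compared in a small sketch. The cookie value is made up for illustration; the point is only that both approaches let the application get the original bytes back.

```python
# A non-ASCII cookie header as it might arrive on the wire (made-up value).
original = "session=žluťoučký"
wire_bytes = original.encode("utf-8")

# Latin1 as a "poor man's surrogateescape": the server decodes with latin1,
# the application re-encodes with latin1 and decodes with the real charset.
environ_value = wire_bytes.decode("latin1")
recovered = environ_value.encode("latin1").decode("utf-8")

# The surrogateescape error handler (Python 3.1+) gives the same guarantee:
# undecodable bytes are smuggled through as lone surrogates.
escaped = wire_bytes.decode("ascii", "surrogateescape")
restored = escaped.encode("ascii", "surrogateescape")
```

One practical difference: the latin1 string contains only ordinary code points and can be printed or logged, while the surrogateescape string raises an error if it is encoded without the same error handler.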
Re: [Web-SIG] Request for Comments on upcoming WSGI Changes
Hi, Alan Kennedy wrote: So, if nobody implements that, then why are we trying to standardise it? I think that was just one of the ideas that were discussed. Just to sum it up a bit where we already went: - my initial plan was going bytes everywhere. Turns out, on Python 3 this is nearly impossible to do because the majority of the standard library went down a unicode path, even where bytes would be more appropriate (like cgi.FieldStorage, urllib.parse etc.) - Graham, Robert (and now me as well) try to get charset guessing for URLs going, decide on latin1 for the HTTP headers. latin1 could be re-decoded by the application if it really thinks it wanted utf-8 for instance. (Like cookie headers, and I guess only there) - One idea is enforcing unicode for all Python versions - One idea is going unicode for Python 3 and bytestrings for Python 2 - New (and old) discussions bring up the surrogate escapes. So it's quite hard to follow because different people talk about different ideas at the same time. And so far none of them looks really compelling. Is there a real need out there? In Python 3, yes. Because the stdlib no longer works with bytes and the bytes object has few string semantics left. Which is a worthy goal, IMHO. Java has been there since the very start, since java strings have always been unicode. Take a look at the java docs for HttpServlet: no methods return bytes/bytearrays. And people appear to have problems with that, because what they are doing is using a specified charset that is by default iso-8859-1: http://wiki.apache.org/tomcat/FAQ/CharacterEncoding Java programmers just tolerate this, although they may curse the developers of the servlet spec for not having solved their specific problem for them. Many Java apps are also still using latin1 only or have all kinds of problems with charsets. 
Regards, Armin
Re: [Web-SIG] Request for Comments on upcoming WSGI Changes
Hi, Alan Kennedy wrote: Hmmm, define know ;-) The charset of incoming data, the charset of URLs, the charset of outgoing data, the charset of whatever the application uses, is what the application decides it to be. Most new applications go with utf-8 for everything these days. I see this as being the same as Graham's suggested approach of a per-server configurable charset, which is then stored in the WSGI dictionary. SCRIPT_NAME and PATH_INFO are different because URLs as entered by the user will always be utf-8 in modern browsers. Even if the application decides to have latin1 URLs. Of course a server configuration variable would be a solution for many of these problems, but I don't like the idea of changing application behavior based on server configuration. At that point we will finally have successfully killed the idea of nested WSGI applications, because those could depend on different charsets. Regards, Armin
Re: [Web-SIG] Request for Comments on upcoming WSGI Changes
Hi, And Clover wrote: This is absolutely the opposite of what I want as an application author. I want to hand out my WSGI application that uses UTF-8 and know that wherever it is deployed the non-ASCII characters will go through without getting mangled. I could not agree more. Probably the best way is indeed using native strings for each Python version: where native strings are unicode, the server should latin1-decode the values, and SCRIPT_NAME / PATH_INFO will be called wsgi.raw_script_name and wsgi.raw_path_info and be properly quoted. Regards, Armin
[Web-SIG] Just to cheer you up
Hey, After all those discussions about unicode and path info and all related problems I would love to remind everybody how well we are doing. I just had a brief discussion with Christian Neukirchen (the Rack developer) about the state of URL quoting and unicode and this is how it looks in Ruby land: - Whether PATH_INFO or SCRIPT_NAME is quoted is not known. It may be, or it may not. The specification does not say and in practice both are in use. - Unicode is not specified at all; it's an unwritten law that strings in Rack are encodingless, but that is not written down either. Hope that makes you feel better :) Regards, Armin
Re: [Web-SIG] Request for Comments on upcoming WSGI Changes
Hi, Robert Brewer wrote: urllib.unquote, for one. We had to make a version which accepts bytes (and outputs bytes). But it's only 8 lines of code. Here is a patch for urllib.parse that restores Python 2.x behavior. Because it also changes behavior for Python 3.x I have not yet submitted it for discussion: http://paste.pocoo.org/show/140739/ This adds byte support for all unquoting functions and URL parsing and joining. It also changes the quoting functions to return bytes when passed bytes. The latter is something that most likely does not survive a review on python-dev. Regards, Armin
Re: [Web-SIG] Request for Comments on upcoming WSGI Changes
Hi, James Bennett wrote: Well, ordinarily I'd be inclined to agree: HTTP deals in bytes, so an interface to HTTP should deal in bytes as well. If it was just that I would be happy to stay with bytes. But unless the standard library changes in the way it works on Python 3 there is not much but unicode we can use. bytes no longer behave like strings, and it's not very comfortable to work with them. Regards, Armin
Re: [Web-SIG] PEP 0333 and PEP XXXX Updated
Hi, Graham Dumpleton wrote: Regardless of the details of changes being made to the PEP and the creation of any new ones, do we need to first agree on the overall direction we are going to take. Ie., the grand plan at a high level. Indeed. The 0333 changes are mostly uncontroversial and can be discussed separately. So far the discussions on this mailing list in the last days only covered what would be a new WSGI version which is in the file. What I am getting at here is that the likes of PJE has indicated a preference for skipping any WSGI 1.1 altogether and going straight to WSGI 2.0. If there isn't going to be support all round for even coming out with WSGI 1.1, then don't want to see time wasted trying to come up with a new PEP only for what is needed to change. The time wasted on is not that much, it's just your #3 written down to text with the unicode return values. So, I am starting to get nervous that we could go to a great deal of work to try and resolve the various issues for a specific definition, only to find that people don't even agree that such a version is warranted and we get a deadlock. WSGI 1.1 as currently specified in would be pretty uncontroversial on Python 2.x because of the str/unicode coercion that Python implicitly applies and that this is basically the only change. 1. Clarifications and corrections to existing WSGI for Python 2.X Is already in 0333 in the repository. 2. Come up with a version of WSGI for Python 3.X. The whole bytes versus unicode discussion. That is in , just that this new version of WSGI also works in Python 2.x and is unicode based. 3. Drop the start_response() function and ability to use its write() function returned as result. What people have been calling WSGI 2.0. That would be too many changes at the same time. We can specify WSGI 2.0 at the same time based on and just change the return value to ``(app_iter, status, headers)`` and drop the `start_response`. 
But that really breaks applications and workflows and I don't think everybody would switch over to that right away. The first question is, should Python 2.X forever be bytes everywhere, or if we start introducing unicode [...] Latest version of specifies it as unicode for 2.x and 3.x except where native strings still make sense. In my definitions I introduced 'native' string along with 'bytes' and 'unicode' string in an attempt to try and be able to use one set of language which would describe WSGI and be interpretable in the context of both Python 2.X and Python 3.X. is basically that. The second question is, do we want to try and come up with something for Python 3.X, ie., (2) above, while still preserving the current start_response() callback, or do we instead want to jump direct to WSGI (Python 3.X) 2.0, ie., combine (2) and (3) above, and say that there is no WSGI 1.X for Python 3.X at all? does not drop start_response. That would break too much code (all middlewares, and it's not straightforward to write middlewares that work both with start_response and without it). For example, one option for a roadmap would keep bytes everywhere in Python 2.X and jump direct to WSGI 2.0 in Python 3.X. IMO WSGI 1.0 should just fix the small problem it has, and WSGI 1.1 goes to unicode in both versions. WSGI (Python 2.X) 1.1 - Clarify existing WSGI by adding (1) above. WSGI (Python 2.X) 2.0 - Drop start_response() from WSGI (Python 2.X) 1.1. Keep bytes everywhere. WSGI (Python 3.X) 2.0 - Adapt WSGI (Python 2.X) 2.0 to Python 3.X. Use definition #4 (or more likely a variation on it). For that I would rather go like this: - WSGI 1.0 stays the same as PEP 0333 currently is - WSGI 1.1 becomes what Ian and I added to PEP 0333 - WSGI 2.0 becomes a modified version of PEP - WSGI 3.0 like XXX but drops start_response One reason for still keeping bytes everywhere in Python 2.X is that that is how it is now, and if unicode is introduced it possibly would just be ignored by people anyway. 
If WSGI 2.0 based on the list above introduces unicode to both Python 2.x and Python 3.x not much would change for the user. Frameworks are using unicode everywhere already; if the decoding step happens in the webserver they just would have to make their own decoding a NOOP if they detect version (2, 0). Second reason is whereby Ian is promoting PEP 0383 as way of resolving transcoding issues. If we want WSGI to still be Python 2.x compliant for all versions (which I hope we do), that is out of the question. Also latin1 is fine because it's actually what HTTP speaks and does not drop any information. For URIs we do what browsers do already, which also does not lose any information. Don't see what 0383 gives us that we can't have with what you and Robert are already doing. So, perhaps we can step back for a minute and ask those couple of major questions. To state them again, they were: 1. Do we keep bytes everywhere forever
Re: [Web-SIG] PEP 0333 and PEP XXXX Updated
Hi, Armin Ronacher wrote: WSGI 1.1 as currently specified in would be pretty uncontroversial on Python 2.x because of the str/unicode coercion that Python implicitly applies and that this is basically the only change. Based on the table, is 2.0 now. That would be too many changes at the same time. We can specify WSGI 2.0 at the same time based on Would be 3.0 then. IMO WSGI 1.0 should just fix the small problem it has, and WSGI 1.1 goes to unicode in both versions. Based on the table. 1.0 is 1.1 and 1.1 is 2.0. I hope that unconfuses my mail, but I'm pretty sure it did not :) Regards, Armin
[Web-SIG] Request for Comments on upcoming WSGI Changes
Hello everybody, Thanks to Graham Dumpleton and Robert Brewer there is some serious progress on WSGI currently. I proposed a roadmap with some PEP changes now that need some input. Summary: - WSGI 1.0 stays the same as PEP 0333 currently is - WSGI 1.1 becomes what Ian and I added to PEP 0333 - WSGI 2.0 becomes a unicode powered version of WSGI 1.1 - WSGI 3.0 becomes WSGI 2.0 just without start_response WSGI 1.0 and 1.1 are byte based and nearly impossible to use on Python 3 because of changes in the standard library that no longer work with a byte-only approach. The PEPs themselves are here: http://bitbucket.org/ianb/wsgi-peps/ Neither the wording nor the changes in there are anywhere near final. Graham wrote down two questions he wants every major framework developer to answer. These should guide the way to new WSGI standards: 1. Do we keep bytes everywhere forever in Python 2.X, or try to introduce unicode there at all to at least mirror what changes might be made to make WSGI workable in Python 3.X? 2. Do we skip WSGI 1.X completely for Python 3.X and go straight to WSGI 2.0 for Python 3.X? I added a new question I think should be asked too: 3. Do we skip WSGI 2.0 as specified in the PEP and go straight to WSGI 3.0 and drop start_response? The following things became pretty clear when playing around with various specifications on Python 3: - Python 3 no longer implicitly converts between unicode and byte strings. This covers comparisons, the regular expression engine, all string functions and many modules in the stdlib. - The Python 3 stdlib radically moved to unicode for non unicode things as well (the http servers, http clients, url handling etc.) - A byte only version of WSGI appears unrealistic on Python 3 because it would require server and middleware implementors to reimplement parts of the standard library to work on bytes again. - unicode support can be added for WSGI on both Python 2.x and Python 3.x without removing functionality. 
Browsers are already doing a similar encoding trick as proposed by Graham Dumpleton to handle URLs. - Python 2.x already accepts unicode strings for many things such as URL handling thanks to the fact that unicode and byte strings are surprisingly interchangeable. - cgi.FieldStorage and some other parts are now totally broken on Python 3 and should no longer be used in 3.0 and 3.1 because it reads the request body into memory. This currently affects WebOb, Pylons and TurboGears. I sent this mail to every major framework / WSGI implementor so that we get input even if you're missing the discussion on web-sig. Regards, Armin
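The first point in the list above (no implicit conversion between unicode and byte strings on Python 3) can be checked with a short sketch. The helper `mixing_fails` is a hypothetical name used only to keep the example compact.

```python
import re


def mixing_fails(operation):
    # Hypothetical helper: returns True if the operation raises TypeError,
    # which is how Python 3 rejects mixing str and bytes.
    try:
        operation()
    except TypeError:
        return True
    return False


# Concatenation of bytes and str is refused.
concat_fails = mixing_fails(lambda: b"abc" + "def")

# The regular expression engine refuses a str pattern on bytes input.
regex_fails = mixing_fails(lambda: re.match("a", b"abc"))

# Comparison does not raise, but bytes and str never compare equal.
compare_equal = (b"abc" == "abc")
```

On Python 2 the same three operations succeed silently for ASCII data, which is why byte-based WSGI code written there does not carry over unchanged.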
Re: [Web-SIG] Request for Comments on upcoming WSGI Changes
Hi, P.J. Eby wrote: This discussion has been going on for so long that I've already forgotten what the problem was with just using the original 1.0 spec for 3.X, i.e., using native strings for everything, using latin-1 encoding. The only things I can recall off the top of my head are that the input stream would still be bytes, and that the environment might've used a different encoding. Django, Pylons, SQLAlchemy, Mako, Jinja2, Genshi, Werkzeug, WebOb and many more technologies are based on unicode, even in Python 2.x. They are currently doing decoding of byte data internally. In Python 2.x if we stick to native strings for WSGI 2.0 / 1.5 whatever we suddenly have different code paths for Python 3 and Python 2. Because in Python 3 we suddenly already have unicode data. You're assuming a situation where the application in Python 2.x was byte based, but in the majority of cases this is never the situation. IMO, this strongly suggests that it's the stdlib or Python 3 that's broken here. How much of the stdlib are we talking about needing to reimplement, aside from cgi.FieldStorage? I'm already creating a patch for urllib which currently requires unicode. I'm not sure about what to do with cgi.FieldStorage, in general I would not recommend using the cgi module for WSGI applications at all! If we would go with bytes for the WSGI 1.0 spec on Python 3 a WSGI server also has to decode that data from the Server again. Also (something I haven't yet filed as a bug because I guess there will be more changes involved) the HTTP server in Python 3.1 does not support non-ASCII headers. Regards, Armin
Re: [Web-SIG] Request for Comments on upcoming WSGI Changes
Hi, Chris McDonough wrote: Personally, I find it a bit hard to get excited about Python 3 as a web application deployment platform. Everybody feels that way currently. But if we don't fix WSGI that will never change. Given this point of view, it would be extremely helpful if someone could explain to people with the same outlook why we should want to deal with Unicode strings in any WSGI specification. I summarized the reasons in my mail. Also have a look at the discussions on this mailing list that led to that. Regards, Armin
Re: [Web-SIG] python3 wsgi. Re: WSGI 1 Changes [ianb's and my changes]
Hi, Massimo Di Pierro wrote: I liked your idea very much Rene' , so I made this Can you please stop that before you do any more damage? Your code is not even anywhere close to what was discussed and has tons of errors and ugly bits and pieces in there. Again. An example does not bring us anything because we already know the implications of each proposal. Regards, Armin
Re: [Web-SIG] Unicode in Python 3
Hi, Armin Ronacher wrote: urllib.parse appears to be buggy with bytestrings: I'm pretty sure the latter is a bug and I will file one, however if there is broken behavior with bytestrings in Python 3.1 that's another thing we have to keep in mind. I have to correct myself, there are separate functions for byte quoting. (parse.unquote_to_bytes, parse.quote_from_bytes). Regards, Armin
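For reference, a minimal sketch of the two byte-oriented functions named in the correction above, as they exist in Python 3's urllib.parse. The sample path is made up for illustration.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes

# Unquote a percent-encoded path without guessing a charset: the result is
# raw bytes, and the application decides how to decode them.
raw = unquote_to_bytes("/caf%C3%A9")
text = raw.decode("utf-8")

# Quote raw bytes back into a percent-encoded (ASCII-only) str.
requoted = quote_from_bytes(raw)
```

Note that the two functions are not symmetric in their types: `unquote_to_bytes` returns bytes, while `quote_from_bytes` returns a str.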
Re: [Web-SIG] Unicode in Python 3
Hi, René Dudfield wrote: I think that shows that they are being handled differently depending on type. Which is against polymorphism... but some people prefer to have separate functions for different types(in and out). I don't think other python functions do this though. So maybe this is a one off, and could be considered a bug... I'm not sure why they did it this way. The fact that urldecode and urlparse do not provide a byte-only implementation is something I would consider a bug. After all that module is called urlparse and not iriparse. Here is a snippet from the compat.py we used to port pygame to support python2.3 through 3.1 How is that related? Arguments against using bytes (and using unicode instead). So I'm -1 on using b'' all over the place since it's not in both versions of python, and makes it impossible for code bases to share the same code for multiple versions of python. That would not matter much because the high-level applications never see what's under the hood. Apart from web2py all frameworks and libraries I know about are using unicode internally anyways. Argument for using bytes: There are many more. It's supposed to be byte based everywhere because that's how these protocols work. There is no magic unicode layer in HTTP that solves all of our problems. - URLs are byte based, URLs are untrusted - WSGI 1.0 was byte based, API wise that means the smallest change - Frameworks don't have to be totally rewritten because they already have their own unicode conversion functions. - Except the application, nothing knows about the real encoding information. Graham's suggestion for URL encodings means that the URL encoding would have to be passed to the WSGI server from outside (he proposed the apache config as an example). This means that the application behavior will change based on the server configuration, causing even more confusion. Let us ignore 2to3 and syntax problems for a minute. These are a lot less complex than the actual encoding problems. 
Also it is very, very unlikely that applications will be able to go through 2to3 and continue to work because there is just too much stuff that changes. b'' vs '' is really the smallest issue we have with WSGI currently. Changed behavior of the bytes object and a semi-unicode aware standard library are the biggest problems in my opinion. Regards, Armin
Re: [Web-SIG] Unicode in Python 3
Hi, Graham Dumpleton wrote: So, no strict need to make the WSGI adapter do it differently. You may want to only do that if concerned about overhead of transcoding. Transcoding just these is most probably going to be less overhead than the WSGI adapter having to set up both unicode and raw values in a dictionary for everything. So if I understand you correctly the wsgi.uri_encoding would be used *only* as information about what the URI encoding was, while the application should use the internal encoding it wants? That sounds right, but then let's make that should a MUST. Your query_string example is flawed as the query string is always quoted and encoding/decoding an ASCII only string will not change much if the encoding is a superset of ASCII which is required anyways for various reasons. I would go with this wording for the spec then: wsgi.uri_encoding holds the encoding of the URI that was used to decode the SCRIPT_NAME and PATH_INFO. If the application decodes the query string it MUST obey the encoding here. If REQUEST_URI is available, the server will use the URI encoding to decode this value as well. However for encoding of URIs it MUST NOT use the wsgi.uri_encoding information but MUST use UTF-8 to encode the URI. Backwards compatibility for URIs: If the application depends on non UTF-8 URIs and the fallback encoding is NOT latin1 the application will have to check the wsgi.uri_encoding for latin1 and if it detects it, it has to encode back to latin1 and decode from the fallback encoding (eg: iso-8859-7). WSGI 2.0 however requires the application to use UTF-8 for generated URIs. I checked the browser implementations now and for arbitrary URIs (not generated URIs in a page) the browser will always try UTF-8. RFC 3987 also recommends UTF-8 for URIs. 
Even with your iso-8859-4 example, can't see how you can without knowing lose what original characters are, as wsgi.uri_encoding being provided always allows you to transcode to what you needed it to be when what was supplied didn't match. Assuming the only possible values for wsgi.uri_encoding are latin1/iso-8859-1 and utf-8 when the application is invoked, I'm totally fine with that. Because if the application's fallback URI encoding is something like iso-8859-4, the application can itself check for latin1 and reencode the data. I could live with that. What I don't want to see in WSGI is that the fallback encoding (latin1) could be changed in the server configuration. Now you can go back to monologue, as definitely sleeping now. ;-) \o/ Regards, Armin
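The wsgi.uri_encoding wording discussed above can be sketched roughly as follows. Both function names are hypothetical, and the sketch assumes the server only ever reports utf-8 or latin1, as stated in the message; it is an illustration of the idea, not the text of any PEP draft.

```python
def decode_uri_path(raw_path: bytes):
    # Hypothetical server side: try UTF-8 first, fall back to latin1, and
    # report which one was used (the value for wsgi.uri_encoding).
    try:
        return raw_path.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return raw_path.decode("latin1"), "latin1"


def application_path(environ, fallback="iso-8859-7"):
    # Hypothetical application side: if the server had to fall back to
    # latin1, undo that step and decode with the application's own legacy
    # encoding instead.
    path = environ["PATH_INFO"]
    if environ["wsgi.uri_encoding"] == "latin1":
        path = path.encode("latin1").decode(fallback)
    return path


# A legacy Greek URL that is not valid UTF-8.
legacy_path, legacy_enc = decode_uri_path("/αβγ".encode("iso-8859-7"))
legacy_result = application_path(
    {"PATH_INFO": legacy_path, "wsgi.uri_encoding": legacy_enc})

# A modern UTF-8 URL passes through untouched.
utf8_path, utf8_enc = decode_uri_path("/café".encode("utf-8"))
utf8_result = application_path(
    {"PATH_INFO": utf8_path, "wsgi.uri_encoding": utf8_enc})
```

One caveat of any try-then-fall-back scheme: a legacy byte sequence that happens to be valid UTF-8 would be decoded as UTF-8, so the detection is a heuristic.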
Re: [Web-SIG] Unicode in Python 3
Hi, René Dudfield wrote: Rather than using a 2to3 tool - which then makes you have two versions of your code, making the code work in python 2.x and 3.x. 2to3 outputs python2.x incompatible code - when it doesn't have to. 2to3 is intended to be run automatically for each release. You would not maintain two versions. It would mean code bases need to support b'' - which is not compatible with python2. This makes it harder to port, as it restricts people to having separate code bases for each language. This is not possible for some code bases since it double the maintenance burden. Convincing people to port to python3 is already hard enough. Byte literals are available in Python 2.6. As far as I'm concerned I don't see a real reason to port to Python 3 at the moment. We should rather get our stuff ready so that once Python 2.6 is the standard the porting becomes as simple as possible. Supporting Python 2.4, 2.5, 2.6 and 3.x is a very complex task that does not work for every library (due to changed APIs for example). Well, this thread is about python3 issues. I think there's enough people who want to consider the python3 issues to not ignore it. We cannot fight on too many fronts at the same time. This thread is about unicode and encodings, not about Python 3 syntax. 2to3 tackles the latter; if it does not work for you, consider writing about that to the porting mailing list. Regards, Armin
Re: [Web-SIG] Unicode in Python 3
Hi, René Dudfield schrieb: What is proposed: Where was that proposed? 1. Default utf-8 to be used. That's a possibility yes, but it has to be carefully be considered. 2. A buffer to be used for raw data. What is raw data? If you mean we keep the unencoded data around, I would strongly argue against that. Otherwise it makes middlewares even harder to write. 3. New keys which are callables to request the encoding you want. Did I miss something? Why are we requesting encodings now? 4. Encoding keys are specified. 4.a URI encoding key 'wsgi.uri_encoding' 4.b Form data encoding key 'wsgi.form_encoding' 4.c Page encoding key 'wsgi.page_encoding' 4.d Header encoding key 'wsgi.header_encoding' I don't know where you are getting that from. The only WSGI key would be `wsgi.uri_encoding` and that is only set by the server and only used for legacy non UTF-8 URLs. 5. For next version of wsgi (1.1 or 2.0), using an adapter for backwards compat for wsgi 1.0 apps on wsgi2 server. No decision about WSGI versioning was made so far. If WSGI in Python 3 is based on unicode, then the version is raised to 1.1, 2.0 is not yet discussed as far as I'm concerned. 2.c Avoiding bytes type and syntax for compatibility with = python 2.5.4 (buffer, and unicode) If WSGI for Python 3 is based on Unicode it will use '' for textual context and b'' for bytes. If it's based on bytes it will obviously use the byte literals. 3. Transcoding to only happen if needed. I can't see how that would work if it's based on unicode, if it's based on bytes that's already what happens in WSGI 1. 4. URI encoding can be explicitly stated in a URI key This value is only *set* by the server on decode, the value is to be ignored by the actual application or middleware except for QUERY_STRING and REQUEST_URI decoding. Everything else makes things a lot more complicated without improving anything. 5. Backwards compat for wsgi 1.0 apps on wsgi 2 server. Also wsgi 2.0 apps on wsgi 1.0 server with an adapter. 
Again, WSGI 2.0 is something that has to be discussed separately, otherwise we totally lose track. Issues with proposal? Things this proposal did not consider? Yes you did: - it has no real world advantage over either WSGI based on unicode that is utf-8 with latin1 fallback or a WSGI based on bytes. - it's backwards incompatible in every way, even to CGI. - it is slow because every dict access would also cause a function call. Furthermore middlewares would most likely start causing circular dependencies when they replace the callable with a new callable and they do not alias the value as a local in the frame that created it. Regards, Armin ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
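For illustration, the "utf-8 with latin1 fallback" rule and the server-set `wsgi.uri_encoding` key discussed in this message could look roughly like the sketch below. The helper names `decode_uri_value` and `fill_environ` are hypothetical, and the exact keys a server would decode this way are an assumption, not something the thread settles.

```python
def decode_uri_value(raw):
    """Decode raw URL bytes; return (text, name of the encoding used)."""
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # latin1 maps every byte to a code point, so this cannot fail,
        # and the original bytes can be recovered by re-encoding.
        return raw.decode("latin-1"), "iso-8859-1"


def fill_environ(environ, raw_path):
    """Hypothetical server-side step: decode once, record the encoding."""
    text, encoding = decode_uri_value(raw_path)
    environ["PATH_INFO"] = text
    # Set by the server only; applications just read it when they
    # need to re-encode a legacy non-UTF-8 URL.
    environ["wsgi.uri_encoding"] = encoding
    return environ


utf8_environ = fill_environ({}, "/caf\u00e9".encode("utf-8"))
legacy_environ = fill_environ({}, b"/caf\xe9")
```

Because the latin1 fallback is lossless, an application that sees `iso-8859-1` in the key can always get the original bytes back with `.encode("latin-1")`.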
[Web-SIG] PEP 0333 and PEP XXXX Updated
Hi, I know I pretty much spam the list here now, which is why I added all the changes of WSGI 1.0 and what could become WSGI 1.1 into a repo on bitbucket as two PEPs: http://bitbucket.org/ianb/wsgi-peps/src/ pep-0333.txt This is basically just a new revision for PEP 333 changing the following things: - removing Jython and Python 2.2 compatibility. Jython is close enough to modern Python versions now that this does not make any difference. - fixing wsgi.input by adding a proper readline(). The current version still requires the user to care about not reading past the content length, but if all server implementors agree that could be changed so that the stream provides an end of file marker. - mentioning that WSGI 1.0 is not supported by Python 3. - made WSGI 1.0 depend on bytes. - fixed example code - servers may no longer add a date or server header if that header is already present. (This MUST may become a SHOULD for the server header as it's probably hard to control for things like mod_wsgi) - weakened the rules for buffering and streaming. Everybody does it, so it should be allowed. - added middleware warning for `wsgi.file_wrapper` pep-.txt This specifies WSGI 1.1 based on #3/#4 in Graham Dumpleton's blog post. The differences to his proposal: - the application iterator must be byte based. I would really require that, so that people explicitly encode their stuff as utf-8 instead of yielding latin1. If we want to allow unicode return values I strongly encourage using utf-8 for the return value because we already require UTF-8 URLs. - clarified wsgi.uri_encoding: that algorithm should not be the default but the only one, to make it easier for applications to reencode URIs. - Stick to `start_response` and `exc_info` but add deprecation warnings for `exc_info` and `write()`. This should make it easier to port applications over. Breaking too many APIs at the same time is probably not the best idea.
If we really want to get rid of `start_response` at the same time, I would suggest using ``(appiter, status, headers)`` instead of ``(status, headers, appiter)``. The former is the current common signature of response objects, which would make it possible to convert from a WSGI application response to a response object by doing something like this: ``response = Response(*wsgi_app(request.environ))`` The PEP is currently missing any copyright information and headers and should only be considered as a draft. Regards, Armin
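To make the ``(appiter, status, headers)`` suggestion concrete, here is a minimal sketch. The `Response` class and the `start_response`-free `wsgi_app` are hypothetical stand-ins written for this example; they are not the API of any real framework or of the draft PEP.

```python
class Response:
    """Hypothetical response object taking (appiter, status, headers)."""

    def __init__(self, appiter, status="200 OK", headers=None):
        self.appiter = appiter
        self.status = status
        self.headers = list(headers or [])


def wsgi_app(environ):
    # Hypothetical application without start_response: it simply
    # returns the tuple in the order suggested above.
    body = [b"Hello World"]
    return body, "200 OK", [("Content-Type", "text/plain")]


# With matching argument order the conversion is a plain unpacking.
response = Response(*wsgi_app({"PATH_INFO": "/"}))
```

With the other order, ``(status, headers, appiter)``, the same unpacking would only work if response objects changed their constructor signature.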
Re: [Web-SIG] WSGI 1 Changes [ianb's and my changes]
Hi, René Dudfield wrote: It says Because of this future revisions of WSGI will most likely switch away from a raw CGI environment to require the server to provide these values to be quoted and available on a different key. This information would be additional information of course! Also on that link, why not explicitly state that python 2.x should use str or StringType there? (line 977). Probably a good idea. Once I'm sure that this is no longer an issue, I will add that. It was definitely 2.2. So I think that needs to be changed in your changes - and related changes double checked. See http://docs.python.org/whatsnew/2.2.html Will do. It looks like python3 issues are being addressed in your changes anyway. But it should be discussed separately and then be integrated. The changes in the PEP currently reflect #1 of Graham's proposal. Regards, Armin
Re: [Web-SIG] String Types in WSGI [Graham's WSGI for py3]
Hi, Graham Dumpleton wrote: On top of the issues above, Armin believes 2to3 gives better results where bytes everywhere interpretation is used. Has anyone else actually tried 2to3 and have the experience with it? You slightly misquoted me. I said that 2to3 gives good results on high level transformations (e.g. a django app between 2 and 3) because both 'foo' and u'foo' become 'foo'. Werkzeug, WebOb, Django all use unicode by default, so the application will not notice any changes. That would not change if we had unicode in the WSGI dict and the framework were changed to treat it properly and do an encode/decode dance if necessary. The reason I brought it up is that 2to3 does not work at all on the raw WSGI layer currently because it converts bytes to unicode, which in my opinion is just wrong. Regards, Armin
Re: [Web-SIG] String Types in WSGI [Graham's WSGI for py3]
Hi, Let me back up a bit here. We have to focus on two different use cases for WSGI on Python 3. The one is the application that should continue to work on Python 3, the other one is the application that was designed for Python 3. In both cases let's just assume that this application is using WebOb/Werkzeug/Django or whatever library is in use. 2to3 converts 'foo' and u'foo' to 'foo'. However in Python 3 'foo' is unicode, so that's fine if the library exposes unicode data only. This is the case for all the frameworks and libraries. Template engines, database adapters, frameworks, they all use unicode internally, which is great. Whether the WSGI server or the library figures out charsets, the data forwarded to the application is always unicode. So what would we gain from doing the decoding in the server? On the bright side, 2to3 would probably start working for some raw WSGI applications but would still break many. On the other hand, the frameworks would still have to perform encoding detection for stuff like multipart or form encoded form data. Even worse: they would have to apply different decode rules for form data and stuff like path info. It already caused confusion that path info was unquoted in the past, with many people quoting that value; it would be even worse in the future if path info was proper unicode, query string looked like unicode but is actually url encoded data with a different encoding etc. I can see some major confusion coming up there, and it would not remove any complexity for real-world implementations of WSGI. Regards, Armin
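The "different decode rules" point can be illustrated with a small sketch. It assumes an environ whose values are latin1-decoded native strings (the approach Python 3 WSGI servers later took); the sample values and the helper names `decode_path` and `decode_query` are made up for the example.

```python
from urllib.parse import parse_qs

# Assumed sample environ: PATH_INFO is already unquoted, QUERY_STRING
# is still percent-encoded and, in this example, uses a legacy charset.
environ = {
    "PATH_INFO": "/caf\u00c3\u00a9",  # UTF-8 bytes of "/café" read as latin1
    "QUERY_STRING": "q=caf%E9",       # "café" percent-encoded in latin1
}


def decode_path(environ, charset="utf-8"):
    # Rule 1: undo the latin1 decoding, then apply the real charset.
    return environ["PATH_INFO"].encode("latin-1").decode(charset)


def decode_query(environ, charset):
    # Rule 2: unquote first, decoding the resulting bytes with a
    # charset the framework has to know or guess.
    return parse_qs(environ["QUERY_STRING"], encoding=charset)


path = decode_path(environ)
args = decode_query(environ, "latin-1")
```

Both values look like ordinary text in the environ, yet each needs its own treatment, which is the kind of confusion the message warns about.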
Re: [Web-SIG] python3 wsgi. Re: WSGI 1 Changes [ianb's and my changes]
Hi, René Dudfield wrote: Perhaps a good way to test that, is to make a smallish example wsgi program to port to python3, using the various proposals... or the proposal most liked. Not a good idea, because a small WSGI application directly on top of WSGI behaves completely differently than a big WSGI application on top of an existing system. The interfaces the implementations (WebOb, Werkzeug, Django) expose would not change either way because they are already unicode aware. 2to3 would go the unicode way because that's what it was written for. But that is also the one that causes the most problems. Then we could see how easy it would be to port to a given implementation that supports that proposal. I'm not sure which of the proposals Graham's mod_wsgi branch is for... or for the cherrypy branch... but those ones would be easier to test since they're already done. A byte-only WSGI server based on a simple one like wsgiref can be written in a couple of minutes. You just have to take the existing sources and make sure a `b` is in front of all strings that should be byte strings. Regards, Armin
Re: [Web-SIG] Sketching a WSGI 2-to-1 adapter with greenlets
Hi, P.J. Eby wrote: newer spec. On CPython at least, this can be implemented using greenlets, and on other Python implementations it could be done with threads. Here's a quick and dirty, untested sketch (no error checking, no version handling) of how it could be done with greenlets: Greenlets are one solution, but I don't think there are any applications out there using write() that are worth supporting in WSGI 2.0. Such applications should rather use an internal buffer and write to that. Regards, Armin
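A minimal sketch of the "internal buffer" alternative, assuming a plain WSGI 1.0 style callable; the body text is made up for the example. Instead of calling the `write()` callable returned by `start_response`, the application writes into its own `io.BytesIO` and returns the result as the response body.

```python
import io


def application(environ, start_response):
    # Collect output in a private buffer instead of using write().
    buffer = io.BytesIO()
    buffer.write(b"first part\n")
    buffer.write(b"second part\n")
    body = buffer.getvalue()

    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    # The whole body goes out as one chunk of the response iterable.
    return [body]
```

The trade-off is that nothing reaches the client until the application has finished producing output.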
Re: [Web-SIG] Sketching a WSGI 2-to-1 adapter with greenlets
Hi, Ian Bicking wrote: What's wrong with this simpler approach to the conversion? It buffers, so you can no longer do this: ``request.write('processing data'); request.flush(); ...; request.write('data processed'); request.flush()`` But that's not too common, and people should rather rewrite their applications to use generators for these cases. Regards, Armin
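The generator rewrite suggested here could look roughly like this. It is a sketch of a WSGI 1.0 style application, with byte strings and a placeholder where the long-running work would go.

```python
def application(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])

    def generate():
        # Each yielded chunk can be sent to the client before the next
        # one is produced, replacing the write()/flush() pairs.
        yield b"processing data\n"
        # ... long-running work would happen here ...
        yield b"data processed\n"

    return generate()
```

Whether a chunk actually reaches the client immediately still depends on the server and any middleware not buffering the iterable.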
[Web-SIG] Strings in Jython [Graham's WSGI for py3]
Hi, This is my first reply in a list of replies for Graham's lengthy blog post about WSGI 3 [1]. I break it up into multiple separate threads so that this can be discussed more easily. What should be highlighted is that for Jython, as I understand it at least, when reading from a socket connection it returns a unicode string. That unicode string will only have characters in the range \u0000 through \u00FF, inclusive. Further, it is possible to transcode that unicode string without needing to go through a separate byte string type. On Jython 2.5 (the only one I tested) there is a 'str' and 'unicode' type and sockets return strings. I can't see much difference to CPython here. Is the Jython unicode issue really (still) relevant? I can see that IronPython has only one string type, but they are doing fine handling binary data in their unicode? ones. Regards, Armin [1]: http://blog.dscpl.com.au/2009/09/roadmap-for-python-wsgi-specification.html
[Web-SIG] WSGI 1 Changes [ianb's and my changes]
Hi, Graham mentioned that the WSGI development might further drift apart based on the changes Ian Bicking and I made at DjangoCon in a separate hg repository [1] for the WSGI PEP. I just want to point out that these are in no way final and are only intended to clarify some of the wrong wordings for Python 2, give us a real readline() function on the input stream and get rid of useless old cruft such as Python 2.2 support and Jython compatibility, which no longer appears to be a problem. My personal idea would be making that PEP WSGI 1.1 and having a separate one for Python 3. The reason for pushing up the number would be that frameworks can then figure out whether they have to safely process the input stream because there is no useful readline function. Regards, Armin [1]: http://bitbucket.org/ianb/wsgi-peps/
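As an illustration of what "safely process the input stream" tends to mean in practice, here is a sketch of a wrapper that never reads past the content length. `LimitedInput` is a hypothetical name for this example, not a class from the PEP or from a particular framework.

```python
import io


class LimitedInput:
    """Hypothetical wrapper that stops reading at CONTENT_LENGTH."""

    def __init__(self, stream, content_length):
        self._stream = stream
        self._remaining = content_length

    def _clamp(self, size):
        # Never ask the underlying stream for more than is left.
        if size < 0 or size > self._remaining:
            return self._remaining
        return size

    def read(self, size=-1):
        data = self._stream.read(self._clamp(size))
        self._remaining -= len(data)
        return data

    def readline(self, size=-1):
        line = self._stream.readline(self._clamp(size))
        self._remaining -= len(line)
        return line


# Stand-in for wsgi.input: the request body is 8 bytes, followed by
# data that belongs to something else on the same connection.
raw = io.BytesIO(b"a=1\nb=2\nNEXT REQUEST")
stream = LimitedInput(raw, 8)
lines = [stream.readline(), stream.readline(), stream.readline()]
```

A framework would typically build the wrapper from `environ["wsgi.input"]` and `int(environ["CONTENT_LENGTH"])`; a version number in the spec would tell it when the server already guarantees this behaviour.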
Re: [Web-SIG] WSGI 1 Changes [ianb's and my changes]
Hi, René Dudfield wrote: I don't like yours and Ian's changes with regard to cgi. cgi exists. Breaking wsgi apps on cgi is silly. Can you give an example of where we break CGI compatibility? I think you mean pre-2.2 support, not python 2.2? iterators came about in python 2.2. That might be. That was before my time. I'm pretty sure the first Python version I used was 2.3, but don't quote me on that. Fixing the python3 wsgi situation needs to happen very soon (it's been a year already!). I don't think delaying it any longer is a good idea for python 3 and for python as a whole. So making a separate wsgi version will not be good if a new wsgi comes out for python3. I agree that WSGI for Python 3 has to be fixed, I'm just not yet convinced that Python 3 is what will be relevant anytime soon. From my current perspective there is still too much left unanswered in Python 3. Regards, Armin