Re: [Web-SIG] PEP 333 and gzipping of responses
2009/8/11 James Y Knight:
> On Aug 10, 2009, at 10:11 PM, James Bennett wrote:
>>
>> Earlier today I posted an article on my blog following up on some
>> discussions of WSGI
>
> I find it a bit odd that you again claim WSGI doesn't support chunked
> transfers after that was thoroughly explained here, already.

WSGI applications themselves shouldn't deal with chunked transfer encoding. For a response, that means a WSGI application should not itself format the response body in chunked form as defined by the HTTP specification. This doesn't stop the underlying web server from doing so when no content length is supplied, but that has nothing to do with WSGI; it is a separate concern relevant only to the web server layer and so is out of scope of the WSGI specification.

Robert has already indicated that the web server underlying the CherryPy WSGI server does this, and I can say that Apache does as well, so by virtue of that mod_wsgi can also generate chunked response content, although it isn't actually a feature of mod_wsgi.

As for request content, it is also the concern of the underlying web server and not the WSGI application. That said, the way the WSGI specification is drafted makes it impossible for a WSGI application to handle a request which uses chunked content directly. This is because wsgi.input isn't required to return an empty string as an end of input sentinel, so one cannot just read until all request content is exhausted. Instead, an application is required to rely on CONTENT_LENGTH to determine how much it can actually read. With chunked request content, though, there is no CONTENT_LENGTH. The WSGI specification follows CGI, so if CONTENT_LENGTH is not supplied you are supposed to assume that it is 0. As such, there is no way to indicate that input can be present but is of unknown length, and so chunked request content cannot be handled directly by a WSGI compliant application.
In the web server that underlies the CherryPy WSGI server, Robert tries to address this by reading in all input for a chunked request up front and determining CONTENT_LENGTH before passing it to the WSGI application. This prevents the WSGI application from streaming request content directly and raises the question of what to do if the request content is large. If the WSGI application were streaming the content itself, it could decide to halt on finding more than it wants to deal with. When the buffering is done in the web server, the WSGI application doesn't have that level of control.

In Apache/mod_wsgi, versions before 3.0 reject chunked requests outright. In 3.0+ you will be able to optionally specify a directive which will allow chunked request content, but you have to consciously step outside the bounds of WSGI, ignore CONTENT_LENGTH and instead read to end of input if you want to handle chunked request content. Thus, your application wouldn't be WSGI compliant. Some users accept this though, as it is the only way to handle uploads from some mobile phones, which use chunked request content for large uploads.

This issue of there being no way to handle content of unknown length also means you cannot have mutating input filters. For example, you cannot compress request content and have mod_deflate in Apache uncompress it, as the resulting content will normally be of a different length to that specified by CONTENT_LENGTH, which will be the compressed length.

Now, I have described the CherryPy WSGI server as being layered, i.e., web server and then WSGI adapter. I know that it may not be that clear cut and they may be one and the same, but logically there is a split, even if the code is much intertwined. I am sure Robert will correct me if my understanding is wrong. :-)

Graham
___
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
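For comparison, the "read to end of input" approach mentioned in Graham's message might look roughly like the sketch below. It is not WSGI compliant: it ignores CONTENT_LENGTH and assumes the server returns an empty result at end of input, which PEP 333 does not guarantee. The names read_until_exhausted and RequestTooLarge are invented for this example and are not part of mod_wsgi or CherryPy.

```python
import io


class RequestTooLarge(Exception):
    """Raised when the body exceeds what the application will accept."""


def read_until_exhausted(stream, max_bytes, block_size=8192):
    """Read to end of input, ignoring CONTENT_LENGTH (outside the WSGI spec)."""
    parts = []
    total = 0
    while True:
        block = stream.read(block_size)
        if not block:
            # Assumption: the server signals end of input with an empty read.
            break
        total += len(block)
        if total > max_bytes:
            # The application, not the server, decides when to give up.
            raise RequestTooLarge("request body exceeds %d bytes" % max_bytes)
        parts.append(block)
    return b"".join(parts)


body = read_until_exhausted(io.BytesIO(b"x" * 20000), max_bytes=65536)
print(len(body))  # 20000
```

Because the loop runs inside the application, the size cap is enforced while streaming, which is the level of control that is lost when the server buffers the whole chunked request up front.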
Re: [Web-SIG] PEP 333 and gzipping of responses
On Aug 10, 2009, at 10:11 PM, James Bennett wrote:
> Earlier today I posted an article on my blog following up on some
> discussions of WSGI

I find it a bit odd that you again claim WSGI doesn't support chunked transfers after that was thoroughly explained here, already.

And add to that false claims about forbidding Content-Encoding, strange claims about its character support being insufficient... I'm getting the feeling that you don't actually understand HTTP. HTTP really *is* hard, but WSGI didn't screw it up; you just seem to misunderstand either what WSGI allows or else what is correct with regard to HTTP.

I'd tend to agree with much of what you wrote in the last two sections of your post, but the first section is just completely confused and wrong.

James
Re: [Web-SIG] PEP 333 and gzipping of responses
On Mon, Aug 10, 2009 at 9:11 PM, James Bennett wrote:
> Earlier today I posted an article on my blog following up on some
> discussions of WSGI; one criticism presented was of language in PEP
> 333 regarding gzipping of responses by WSGI applications. Ian posted a
> comment which stated that the criticism was not correct, but I'm at a
> loss to figure out what *is* correct, so I'll bring up the question
> here.
>
> In a parenthetical at the end of the section entitled "Handling the
> Content-Length Header", PEP 333 states:
>
> > Note: applications and middleware must not apply any kind of
> > Transfer-Encoding to their output, such as chunking or gzipping; as
> > "hop-by-hop" operations, these encodings are the province of the
> > actual web server/gateway. See Other HTTP Features below, for more
> > details.
>
> In the section "Other HTTP Features", PEP 333 states, in part:
>
> > However, because WSGI servers and applications do not communicate
> > via HTTP, what RFC 2616 calls "hop-by-hop" headers do not apply to
> > WSGI internal communications. WSGI applications must not generate
> > any "hop-by-hop" headers [4], attempt to use HTTP features that
> > would require them to generate such headers, or rely on the content
> > of any incoming "hop-by-hop" headers in the environ dictionary.
>
> My criticism of this is that this is at best ambiguous, and quite
> possibly openly misleading to readers of the PEP.
>
> The ambiguity here is that "gzip" is a valid value for the
> Transfer-Encoding header in HTTP (RFC 2616, Sections 3.6 and 14.41),
> but is also a valid value for the Content-Encoding header (RFC 2616,
> Sections 3.5 and 14.11).
>

I just don't get the confusion. Transfer-Encoding is not allowed in WSGI (a hop-by-hop header, like several other Transfer-* headers). Content-Encoding is allowed, because everything not specifically mentioned is allowed. Clearly "Content-Encoding" and "Transfer-Encoding" are different strings.
And, as you mention, the normal thing that people currently do is use Content-Encoding anyway, so since people aren't using Transfer-Encoding, why is this controversial? There are some weird implications to using Content-Encoding, specifically ETags and range requests, but eh... those exist in mod_deflate and just about everywhere, and are mostly outside the scope of WSGI.

--
Ian Bicking | http://blog.ianbicking.org | http://topplabs.org/civichacker
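The distinction drawn in this reply can be sketched as a small WSGI application that gzips its own body and labels it with Content-Encoding, leaving Transfer-Encoding entirely to the server. This is only an illustration under stated assumptions: the names gzipped_app and fake_start_response are hypothetical, the Accept-Encoding check is a deliberately naive substring test, and the code assumes Python 3's gzip.compress.

```python
import gzip


def gzipped_app(environ, start_response):
    """Hypothetical app that compresses its body using Content-Encoding."""
    payload = b"Hello, Web-SIG!\n" * 100
    headers = [("Content-Type", "text/plain")]
    # Naive check; a real application would parse q-values properly.
    if "gzip" in environ.get("HTTP_ACCEPT_ENCODING", ""):
        payload = gzip.compress(payload)
        # End-to-end header describing the entity, so not forbidden by PEP 333.
        headers.append(("Content-Encoding", "gzip"))
        headers.append(("Vary", "Accept-Encoding"))
    # Length of the bytes actually returned (compressed or not).
    headers.append(("Content-Length", str(len(payload))))
    # No Transfer-Encoding header: that is hop-by-hop and left to the server.
    start_response("200 OK", headers)
    return [payload]


captured = {}


def fake_start_response(status, headers):
    # Stand-in for the server's start_response, used only to inspect headers.
    captured["status"] = status
    captured["headers"] = dict(headers)


chunks = gzipped_app({"HTTP_ACCEPT_ENCODING": "gzip, deflate"}, fake_start_response)
print(captured["headers"]["Content-Encoding"])  # gzip
print(gzip.decompress(b"".join(chunks))[:16])   # b'Hello, Web-SIG!\n'
```

The ETag and range-request complications mentioned above are not handled here; the sketch only shows which header the application is allowed to set.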
[Web-SIG] PEP 333 and gzipping of responses
Earlier today I posted an article on my blog following up on some discussions of WSGI; one criticism presented was of language in PEP 333 regarding gzipping of responses by WSGI applications. Ian posted a comment which stated that the criticism was not correct, but I'm at a loss to figure out what *is* correct, so I'll bring up the question here.

In a parenthetical at the end of the section entitled "Handling the Content-Length Header", PEP 333 states:

> Note: applications and middleware must not apply any kind of
> Transfer-Encoding to their output, such as chunking or gzipping; as
> "hop-by-hop" operations, these encodings are the province of the
> actual web server/gateway. See Other HTTP Features below, for more
> details.

In the section "Other HTTP Features", PEP 333 states, in part:

> However, because WSGI servers and applications do not communicate
> via HTTP, what RFC 2616 calls "hop-by-hop" headers do not apply to
> WSGI internal communications. WSGI applications must not generate
> any "hop-by-hop" headers [4], attempt to use HTTP features that
> would require them to generate such headers, or rely on the content
> of any incoming "hop-by-hop" headers in the environ dictionary.

My criticism of this is that this is at best ambiguous, and quite possibly openly misleading to readers of the PEP.

The ambiguity here is that "gzip" is a valid value for the Transfer-Encoding header in HTTP (RFC 2616, Sections 3.6 and 14.41), but is also a valid value for the Content-Encoding header (RFC 2616, Sections 3.5 and 14.11). Web frameworks and libraries (in many languages, not just Python) which support gzipping of responses all seem to opt for the latter method. Additionally, Apache's mod_deflate -- which so far as I know is overwhelmingly the most common mechanism for enabling gzipping at the server level -- also opts for this method, and uses the Content-Encoding header.
Given this, gzipping of responses seems to be rather universally associated, in the minds of web developers, with the Content-Encoding header, which is not a "hop-by-hop" header (RFC 2616, Section 13.5.1). As such, the immediate (and misleading) impression given to readers of PEP 333 will likely be one of:

1. PEP 333 forbids applications using Content-Encoding to signal gzipped response bodies (since it mentions gzipping as something applications specifically must not do), or

2. PEP 333 is ambiguous or contradictory on account of mentioning Transfer-Encoding and "hop-by-hop" headers in a context in which no-one uses Transfer-Encoding or a "hop-by-hop" header, or

3. This text in PEP 333 is based upon a misunderstanding of this feature of HTTP or of its use in the real world.

None of these seem particularly good, and this is why I took that section of the spec to task (albeit in a much briefer and more cursory fashion, since this message is already starting to run a bit long).

If I'm misreading or misunderstanding either PEP 333 or RFC 2616, I'd appreciate it if someone would explain where I've gone astray. But as it stands, I believe the text of PEP 333 quoted above is problematic and likely to lead to confusion, and (if I'm not misreading or misunderstanding it) should probably be revised to address these concerns.

--
"Bureaucrat Conrad, you are technically correct -- the best kind of correct."