Re: [Web-SIG] PEP 333 and gzipping of responses

2009-08-10 Thread Graham Dumpleton
2009/8/11 James Y Knight :
> On Aug 10, 2009, at 10:11 PM, James Bennett wrote:
>>
>> Earlier today I posted an article on my blog following up on some
>> discussions of WSGI
>
> I find it a bit odd that you again claim WSGI doesn't support chunked
> transfers after that was thoroughly explained here, already.

WSGI applications themselves shouldn't deal with chunked transfer
encoding. In other words, for a response, a WSGI application should
not format a response in chunked form as per HTTP specification. This
doesn't though stop the underlying web server from doing that where no
content length is supplied, but that is nothing to do with WSGI and a
completely separate concern only relevant to the web server layer. In
other words, out of scope of the WSGI specification. Robert has
already indicated that the web server underlying the CherryPy WSGI server
does this, and I can say that Apache also does that, so mod_wsgi by
virtue of that can also generate chunked response content, albeit that it
isn't actually a feature of mod_wsgi.
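
As a rough illustration of the response side, here is a minimal sketch of a WSGI application that streams a body without declaring a Content-Length and without touching Transfer-Encoding. The application name and the body content are made up for the example; how the body is framed on the wire (chunked or otherwise) is left entirely to whatever server hosts it.

```python
def streaming_app(environ, start_response):
    # No Content-Length and no Transfer-Encoding header is set here.
    # Framing the body (for example as chunked) is the server's concern.
    start_response("200 OK", [("Content-Type", "text/plain")])
    for i in range(3):
        # Each yielded block is plain body data, not an HTTP chunk.
        yield ("line %d\n" % i).encode("ascii")
```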

As for request content, it is also the concern of the underlying web
server and not the WSGI application. That said, the way the WSGI
specification is drafted makes it impossible for a WSGI application to
handle a request which uses chunked content directly. This is because
wsgi.input isn't required to use an empty string as end of input
sentinel. This means one cannot just read until all request content is
exhausted. Instead, it is required to rely on CONTENT_LENGTH to
determine how much an application can actually read. With chunked
request content though, there is no CONTENT_LENGTH. The WSGI
specification follows CGI though and so if CONTENT_LENGTH is not
supplied you are supposed to assume that CONTENT_LENGTH is 0. As such,
there is no way to indicate that input can be present but is of
unknown length and so chunked request content cannot be handled
directly by a WSGI compliant application.
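
To make the constraint concrete, here is a small sketch of how a compliant application is expected to read request content, bounded by CONTENT_LENGTH. The helper name read_request_body is hypothetical, and io.BytesIO stands in for whatever wsgi.input object a real server would supply.

```python
import io

def read_request_body(environ):
    """Read request content bounded by CONTENT_LENGTH, as PEP 333 expects."""
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0
    if length <= 0:
        # A missing CONTENT_LENGTH is treated as zero, following CGI,
        # so any chunked body that may be present is never read.
        return b""
    return environ["wsgi.input"].read(length)

# Stand-in environ for illustration only.
environ = {"CONTENT_LENGTH": "5",
           "wsgi.input": io.BytesIO(b"hello, and more")}
body = read_request_body(environ)
```

With no CONTENT_LENGTH in environ the helper returns an empty body even if data is waiting on the stream, which is exactly the problem for chunked requests.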

In the web server that underlies the CherryPy WSGI server, Robert tries to
address this by reading in all input for a chunked request up front and
determining CONTENT_LENGTH before passing it to the WSGI application.
This prevents the WSGI application from streaming request content
directly and raises the question of what to do if the request content is
large. If the WSGI application were streaming the content itself, it could
decide to halt on finding more than it wants to deal with. When the
web server does the buffering, the WSGI application doesn't have that
level of control.

In Apache/mod_wsgi, versions before 3.0 reject chunked requests outright.
In 3.0+ you will be able to optionally specify a directive which will
allow chunked request content, but you have to consciously step
outside the bounds of WSGI, ignore CONTENT_LENGTH, and instead read
to end of input if you want to handle chunked request content. Thus,
your application wouldn't be WSGI compliant. Some number of users
accept this though, as it is the only way to handle uploads from some
mobile phones, which use chunked request content for large uploads.
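
A sketch of what that non-compliant reading style might look like follows. It assumes the server's wsgi.input returns an empty string at end of input, which PEP 333 does not guarantee. The names read_until_eof and BodyTooLarge and the size limit are hypothetical. The limit shows how an application streaming the input itself could halt once it has seen more than it wants to deal with.

```python
import io

class BodyTooLarge(Exception):
    """Raised when the request content exceeds the application's limit."""

def read_until_eof(stream, limit, block_size=8192):
    # Not WSGI compliant: ignores CONTENT_LENGTH and relies on an empty
    # read result as the end-of-input sentinel.
    chunks = []
    total = 0
    while True:
        data = stream.read(block_size)
        if not data:
            break
        total += len(data)
        if total > limit:
            # The application decides for itself when to give up.
            raise BodyTooLarge("more than %d bytes of input" % limit)
        chunks.append(data)
    return b"".join(chunks)

# io.BytesIO stands in for a server-provided wsgi.input.
content = read_until_eof(io.BytesIO(b"x" * 20000), limit=50000)
```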

This issue of there being no way to handle content of unknown length
also means you cannot have mutating input filters. This means you
cannot use compression on request content and use mod_deflate in
Apache to uncompress it as the resulting content will normally be of
different length to that specified by CONTENT_LENGTH, which will be
the compressed length.
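
The mismatch is easy to demonstrate outside of any server. In this sketch the gzip module stands in for an input filter such as mod_deflate, and the payload is made up for the example.

```python
import gzip

original = b"field=value&" * 100           # what the client meant to send
compressed = gzip.compress(original)        # what actually goes on the wire
content_length = len(compressed)            # CONTENT_LENGTH describes this

# An input filter hands the application the uncompressed content.
decompressed = gzip.decompress(compressed)

# An application that trusts CONTENT_LENGTH reads only this much,
# and so sees a truncated body.
seen_by_app = decompressed[:content_length]
```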

Now, I have described the CherryPy WSGI server as being layered, i.e., web
server and then WSGI adapter. I know that it may not be that clear cut
and they are one and the same, but logically, there is a split, even if
the code is much intertwined. I am sure Robert will correct me if my
understanding is wrong. :-)

Graham
___
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: 
http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com


Re: [Web-SIG] PEP 333 and gzipping of responses

2009-08-10 Thread James Y Knight

On Aug 10, 2009, at 10:11 PM, James Bennett wrote:

> Earlier today I posted an article on my blog following up on some
> discussions of WSGI


I find it a bit odd that you again claim WSGI doesn't support chunked  
transfers after that was thoroughly explained here, already. And add  
to that false claims about forbidding Content-Encoding, strange claims
about its character support being insufficient... I'm getting the
feeling that you don't actually understand HTTP.


HTTP really *is* hard, but WSGI didn't screw it up, you just seem to  
misunderstand either what WSGI allows or else what is correct with  
regards to HTTP.


I'd tend to agree with much of what you wrote in the last two sections  
of your post, but the first section is just completely confused and  
wrong.


James


Re: [Web-SIG] PEP 333 and gzipping of responses

2009-08-10 Thread Ian Bicking
On Mon, Aug 10, 2009 at 9:11 PM, James Bennett wrote:

> Earlier today I posted an article on my blog following up on some
> discussions of WSGI; one criticism presented was of language in PEP
> 333 regarding gzipping of responses by WSGI applications. Ian posted a
> comment which stated that the criticism was not correct, but I'm at a
> loss to figure out what *is* correct, so I'll bring up the question
> here.
>
> In a parenthetical at the end of the section entitled "Handling the
> Content-Length Header", PEP 333 states:
>
> > Note: applications and middleware must not apply any kind of
> > Transfer-Encoding to their output, such as chunking or gzipping; as
> > "hop-by-hop" operations, these encodings are the province of the
> > actual web server/gateway. See Other HTTP Features below, for more
> > details.
>
> In the section "Other HTTP Features", PEP 333 states, in part:
>
> > However, because WSGI servers and applications do not communicate
> > via HTTP, what RFC 2616 calls "hop-by-hop" headers do not apply to
> > WSGI internal communications. WSGI applications must not generate
> > any "hop-by-hop" headers [4], attempt to use HTTP features that
> > would require them to generate such headers, or rely on the content
> > of any incoming "hop-by-hop" headers in the environ dictionary.
>
> My criticism of this is that this is at best ambiguous, and quite
> possibly openly misleading to readers of the PEP.
>
> The ambiguity here is that "gzip" is a valid value for the
> Transfer-Encoding header in HTTP (RFC 2616, Sections 3.6 and 14.41),
> but is also a valid value for the Content-Encoding header (RFC 2616,
> Sections 3.5 and 14.11).
>

I just don't get the confusion. Transfer-Encoding is not allowed in WSGI (a
hop-by-hop header, like several other Transfer-* headers). Content-Encoding
is allowed, because everything not specifically mentioned is allowed.
Clearly "Content-Encoding" and "Transfer-Encoding" are different strings.
And, as you mention, the normal thing that people currently do is use
Content-Encoding anyway, so since people aren't using Transfer-Encoding, why
is this controversial?

There are some weird implications to using Content-Encoding, specifically
ETags and range requests, but eh... those exist in mod_deflate and just
about everywhere, and are mostly outside the scope of WSGI.
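
For illustration, a minimal sketch of middleware that gzips a response using Content-Encoding and never touches Transfer-Encoding. The name gzip_middleware is hypothetical. The sketch buffers the whole body, uses a naive substring check on Accept-Encoding, assumes the wrapped application does not set Content-Encoding itself, and ignores the ETag and range request issues mentioned above.

```python
import gzip

def gzip_middleware(app):
    """Wrap a WSGI app, gzipping its body when the client accepts gzip."""
    def wrapper(environ, start_response):
        # Naive check: does not parse q-values such as "gzip;q=0".
        if "gzip" not in environ.get("HTTP_ACCEPT_ENCODING", ""):
            return app(environ, start_response)

        captured = {}
        written = []

        def capture(status, headers, exc_info=None):
            captured["status"] = status
            captured["headers"] = list(headers)
            return written.append  # stands in for the write() callable

        result = app(environ, capture)
        try:
            body = b"".join(result)
        finally:
            if hasattr(result, "close"):
                result.close()
        body = gzip.compress(b"".join(written) + body)

        # Content-Encoding is an end-to-end header, so it is allowed.
        headers = [(name, value) for name, value in captured["headers"]
                   if name.lower() != "content-length"]
        headers += [("Content-Encoding", "gzip"),
                    ("Content-Length", str(len(body))),
                    ("Vary", "Accept-Encoding")]
        start_response(captured["status"], headers)
        return [body]
    return wrapper
```

Buffering keeps the sketch short and lets it set an accurate Content-Length. A streaming variant would omit Content-Length and leave the framing to the server.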

-- 
Ian Bicking  |  http://blog.ianbicking.org  |
http://topplabs.org/civichacker


[Web-SIG] PEP 333 and gzipping of responses

2009-08-10 Thread James Bennett
Earlier today I posted an article on my blog following up on some
discussions of WSGI; one criticism presented was of language in PEP
333 regarding gzipping of responses by WSGI applications. Ian posted a
comment which stated that the criticism was not correct, but I'm at a
loss to figure out what *is* correct, so I'll bring up the question
here.

In a parenthetical at the end of the section entitled "Handling the
Content-Length Header", PEP 333 states:

> Note: applications and middleware must not apply any kind of
> Transfer-Encoding to their output, such as chunking or gzipping; as
> "hop-by-hop" operations, these encodings are the province of the
> actual web server/gateway. See Other HTTP Features below, for more
> details.

In the section "Other HTTP Features", PEP 333 states, in part:

> However, because WSGI servers and applications do not communicate
> via HTTP, what RFC 2616 calls "hop-by-hop" headers do not apply to
> WSGI internal communications. WSGI applications must not generate
> any "hop-by-hop" headers [4], attempt to use HTTP features that
> would require them to generate such headers, or rely on the content
> of any incoming "hop-by-hop" headers in the environ dictionary.

My criticism of this is that this is at best ambiguous, and quite
possibly openly misleading to readers of the PEP.

The ambiguity here is that "gzip" is a valid value for the
Transfer-Encoding header in HTTP (RFC 2616, Sections 3.6 and 14.41),
but is also a valid value for the Content-Encoding header (RFC 2616,
Sections 3.5 and 14.11).

Web frameworks and libraries (in many languages, not just Python)
which support gzipping of responses all seem to opt for the latter
method. Additionally, Apache's mod_deflate -- which so far as I know
is overwhelmingly the most common mechanism for enabling gzipping at
the server level -- also opts for this method, and uses the
Content-Encoding header.

Given this, gzipping of responses seems to be rather universally
associated, in the minds of web developers, with the Content-Encoding
header, which is not a "hop-by-hop" header (RFC 2616, Section
13.5.1). As such, the immediate (and misleading) impression given to
readers of PEP 333 will likely be one of:

1. PEP 333 forbids applications using Content-Encoding to signal
   gzipped response bodies (since it mentions gzipping as something
   applications specifically must not do), or

2. PEP 333 is ambiguous or contradictory on account of mentioning
   Transfer-Encoding and "hop-by-hop" headers in a context in which
   no-one uses Transfer-Encoding or a "hop-by-hop" header, or

3. This text in PEP 333 is based upon a misunderstanding of this
   feature of HTTP or of its use in the real world.

None of these seem particularly good, and this is why I took that
section of the spec to task (albeit in a much briefer and more cursory
fashion, since this message is already starting to run a bit long).

If I'm misreading or misunderstanding either PEP 333 or RFC 2616, I'd
appreciate it if someone would explain where I've gone astray. But as
it stands, I believe the text of PEP 333 quoted above is problematic
and likely to lead to confusion, and (if I'm not misreading or
misunderstanding it) should probably be revised to address these
concerns.


-- 
"Bureaucrat Conrad, you are technically correct -- the best kind of correct."