Re: [Web-SIG] HEAD requests, WSGI gateways, and middleware

2008-01-27 Thread Graham Dumpleton
On 25/01/2008, Brian Smith <[EMAIL PROTECTED]> wrote:
> My application correctly responds to HEAD requests as-is. However, it doesn't 
> work with middleware that sets headers based on the content of the response 
> body.
>
> For example, a gateway or middleware that sets ETag based on a checksum, 
> Content-Encoding, Content-Length and/or Content-MD5 will all result in wrong 
> results by default. Right now, my applications assume that any such gateway 
> or the first such middleware will change environ["REQUEST_METHOD"] from 
> "HEAD" to "GET" before the application is invoked, and discard the response 
> body that the application generates.
>
> However, many gateways and middleware do not do this, and PEP 333 doesn't 
> have anything to say about it. As a result, a 100% WSGI 1.0-compliant 
> application is not portable between gateways.
>
> I suggest that a revision of PEP 333 should require the following behavior:
>
> 1. WSGI gateways must always set environ["REQUEST_METHOD"] to "GET" for HEAD 
> requests. Middleware and applications will not be able to detect the 
> difference between GET and HEAD requests.
>
> 2. For a HEAD request, A WSGI gateway must not iterate through the response 
> iterable, but it must call the response iterable's close() method, if any. It 
> must not send any output that was written via start_response(...).write() 
> either. Consequently, WSGI applications must work correctly, and must not 
> leak resources, when their output is not iterated; an application should not 
> signal or log an error if the iterable's close() method is invoked without 
> any iteration taking place.

On this discussion, which I see had no further follow-ups, I see no
choice in Apache mod_wsgi but to do number 1 above. It is the only way
to guarantee that things will work properly, because Apache has its
own output filtering system whereby output headers can be set based on
the actual response content. If this is not done, then the results of
GET and HEAD may not be the same.

As to number 2 (with the later clarification), I will defer trying to
optimise by skipping processing of the iterable. This is partly
because of the open question of whether a WSGI adapter is allowed to
skip processing the iterable, but also because it gets a bit tricky in
Apache mod_wsgi daemon mode: you need to pass information from the
Apache child process to the daemon process indicating whether any
output filters are registered in the Apache child process. Only by
knowing that could you skip processing the iterable in the daemon
process and not generate any content.

Overall I think the basic problem here is that WSGI likes to think it
is the sole arbiter of what the response headers will be. In practice
this may not be the case when one is bridging from a true web server
which is capable of doing a lot of other stuff. For a WSGI adapter
where this can occur, there seems to be no choice but for it to change
all HEAD requests to GET requests.

So, although I can fix Apache mod_wsgi so that HEAD works, this will
not help with other Apache solutions such as CGI, SCGI, FastCGI, AJP,
etc. For those, the WSGI adapters used will have to be fixed
separately to do a similar thing.
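
A minimal sketch of what such a fix might look like for a generic adapter, written as a plain WSGI wrapper. The name head_as_get is hypothetical and this is not how mod_wsgi itself is implemented. The sketch ignores output sent through the write() callable and assumes a Python 3 style environ.

```python
def head_as_get(app):
    """Wrap a WSGI app so that it never sees HEAD; the body is dropped last."""
    def wrapper(environ, start_response):
        is_head = environ.get("REQUEST_METHOD") == "HEAD"
        if is_head:
            # Copy, so the caller's environ is left untouched.
            environ = dict(environ, REQUEST_METHOD="GET")
        result = app(environ, start_response)
        if not is_head:
            return result
        try:
            # Consume the body so that anything which delays calling
            # start_response until it has seen content still sets its
            # headers, exactly as it would for a GET.
            for _ in result:
                pass
        finally:
            close = getattr(result, "close", None)
            if close is not None:
                close()
        return []  # the response entity is discarded at the very end
    return wrapper
```

With this wrapper outermost, the application and any middleware inside it generate the same headers for HEAD as for GET, at the cost of producing a body that is then thrown away.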

Graham
___
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: 
http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com


Re: [Web-SIG] HEAD requests, WSGI gateways, and middleware

2008-01-24 Thread Graham Dumpleton
On 25/01/2008, Brian Smith <[EMAIL PROTECTED]> wrote:
> Graham Dumpleton wrote:
> > If Apache mod_wsgi (the WSGI gateway) does then do this, ie.,
> > didn't iterate through the iterable and therefore didn't
> > return the content through to Apache, it would as explained
> > cause traditional Apache output filters to potentially yield
> > incorrect results. This is what I am highlighting.
> >
> > So Apache mod_wsgi couldn't avoid processing the iterable,
> > unless as you allude to with how internals of how Apache is
> > used to implement wsgi.file_wrapper support, that mod_wsgi
> > similarly detected when no Apache output filters are
> > registered that could add additional headers and skip the processing.
>
> Right, my idea was that mod_wsgi could implement a new bucket type,
> where the iteration is done if and only if some output filter reads from
> the bucket. But, if no output filters read from the bucket, then the
> iteration would never happen.

Unfortunately as I think I mentioned on mod_wsgi list previously, that
may not be trivial. :-)

> > Pardon me if I am not understanding very well, I did not get
> > much sleep last night because of baby and my head hurts. :-(
>
> Not your (or your daughter's) fault; I wrote something different from
> what I meant.

Okay, clearer now.

> I hope tonight is easier on you. Good luck!

I hope so too. Am going home early now, but the boss will probably not
allow me to read email for a couple of days until I am fully
recovered, so you'll probably not hear from me more on this issue. I
certainly understand what you are saying and the potential need for
it, so will be interesting to see what final consensus is.

Graham


Re: [Web-SIG] HEAD requests, WSGI gateways, and middleware

2008-01-24 Thread Brian Smith
Graham Dumpleton wrote:
> To quote, in 2 you said:
> 
> """For a HEAD request, A WSGI gateway must not iterate 
> through the response iterable"""
> 
> I was presuming that this was saying that the WSGI gateway 
> should do this as well as changing the REQUEST_METHOD 
> actually sent to the WSGI application to GET.

I misstated it. It should be "For a HEAD request, A WSGI gateway *may*
skip iterating through the response iterable". That is, if the gateway
can detect that the response entity isn't going to change the final set
of headers in any way, it can skip the iteration.
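
A rough sketch of that corrected rule from the gateway's side. The function finish_response and its parameters (send, is_head, filters_need_body) are hypothetical names used only for illustration, not part of PEP 333 or of any real gateway.

```python
def finish_response(result, send, is_head, filters_need_body=False):
    """Deliver a WSGI response body, skipping it for HEAD when allowed."""
    try:
        if is_head and not filters_need_body:
            return  # no iteration at all; close() still runs in finally
        for chunk in result:
            if not is_head:
                send(chunk)
            # For HEAD with filters present, the chunk would be handed to
            # the filters here and then dropped instead of being sent.
    finally:
        close = getattr(result, "close", None)
        if close is not None:
            close()
```

The point of the finally clause is that close() is called in every case, including the one where the iterable was never touched.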

> If Apache mod_wsgi (the WSGI gateway) does then do this, ie., 
> didn't iterate through the iterable and therefore didn't 
> return the content through to Apache, it would as explained 
> cause traditional Apache output filters to potentially yield 
> incorrect results. This is what I am highlighting.
> 
> So Apache mod_wsgi couldn't avoid processing the iterable, 
> unless as you allude to with how internals of how Apache is 
> used to implement wsgi.file_wrapper support, that mod_wsgi 
> similarly detected when no Apache output filters are 
> registered that could add additional headers and skip the processing.

Right, my idea was that mod_wsgi could implement a new bucket type,
where the iteration is done if and only if some output filter reads from
the bucket. But, if no output filters read from the bucket, then the
iteration would never happen.

> > def application(env, start_response):
> >     start_response("200 OK",
> >                    [("Content-Length", "1")])
> >     if env["REQUEST_METHOD"] == "HEAD":
> >         return []
> >     else:
> >         return ["a"*1]
> >
> > I tested this in mod_wsgi and mod_wsgi gets it wrong. mod_wsgi sets 
> > env["REQUEST_METHOD"] to "HEAD" for HEAD requests.
> 
> It just passes whatever Apache sets up as the CGI environment.
> 
> > When mod_deflate is
> > enabled, a HEAD request returns "Content-Length: 20", and a GET 
> > request returns "Content-Length: 46". However, it is supposed to be
> > "Content-Length: 46" in both cases.
> 
> Is this with your sample application which detects HEAD and 
> doesn't return anything if it is found. In other words, it is 
> driven by what your application is actually returning?

Yes, these results are from the program above. Those 10,000 A's compress
down to 26 bytes, plus the 20 byte header. For the HEAD case,
mod_deflate compresses 0 bytes to 0 bytes and adds a 20 byte header.
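
The effect can be reproduced outside Apache with a small sketch. It uses Python's gzip module as a stand-in for mod_deflate, which is an assumption, so the exact byte counts may differ from the 46 and 20 quoted above; compressed_length is a hypothetical helper.

```python
import gzip

def compressed_length(body_chunks):
    """Content-Length that a gzip-style output filter would report."""
    return len(gzip.compress(b"".join(body_chunks)))

get_length = compressed_length([b"a" * 10000])  # body returned for GET
head_length = compressed_length([])             # body suppressed for HEAD

# The two lengths differ, which is the GET/HEAD header mismatch.
print(get_length, head_length)
```

Even an empty body produces a non-zero compressed length, because the container format adds its own framing.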

> > Note also that in mod_wsgi, use of wsgi.file_wrapper is a huge 
> > optimization for this: if no Apache output filters need the 
> > response entity, and wsgi.file_wrapper is used, then the file
> > will never be read off the disk.
> 
> Hmmm, I didn't actually look under the covers of what Apache 
> did when I used its file bucket for that. Worked out better 
> than I expected then. :-)

I will double-check, but I believe that in the embedded mode, the file
never gets read at all, when there are no output filters processing the
output. I will bring it up on the mod_wsgi list. 

> Except as pointed out that 2 suggests I should never pass on 
> content from iterable for HEAD, where in practice I still 
> have to if there are output filters.
>
> Pardon me if I am not understanding very well, I did not get 
> much sleep last night because of baby and my head hurts. :-(

Not your (or your daughter's) fault; I wrote something different from
what I meant. I hope tonight is easier on you. Good luck!

Regards,
Brian



Re: [Web-SIG] HEAD requests, WSGI gateways, and middleware

2008-01-24 Thread Graham Dumpleton
On 25/01/2008, Ian Bicking <[EMAIL PROTECTED]> wrote:
> Brian Smith wrote:
> > Graham Dumpleton wrote:
> >> The issue here is that Apache has its own output filtering
> >> system where filters can set headers based on the actual
> >> content. Because of this, any output filter must always
> >> receive the response content regardless of whether the
> >> request is a GET or HEAD. If an application handler tries to
> >> optimise things and not return the content, then these output
> >> filters may generate different headers for a HEAD request
> >> than a GET request, thereby violating the requirement that
> >> they should actually be the same.
> >>
> >> Note that response content is still thrown away for a HEAD
> >> request, it is just done at the very last moment after all
> >> Apache output filters have processed the data.
> >
> > Right, that is exactly what I am saying. In Apache's documentation, it
> > says that every handler should include the response entity for HEAD
> > requests, so that output filters can process the output. However, there
> > is nothing in PEP 333 that talks about this behavior.
>
> Unlike Apache there are no output filters in WSGI;

Well, the concept of output filters does exist in WSGI, they are just
called something different. ;-)

Anyway, the end result is the same; it is just that they are modeled
differently at the interface level in the worlds of Apache and WSGI.

Graham

> all middleware gets
> to adjust the request as well as the response.  So middleware that can't
> handle a real HEAD request has an opportunity to turn it into a GET
> request.  I don't see why PEP 333 needs to talk about this, to me it
> seems straight forward enough in a WSGI context, and PEP 333 can't cover
> every possible bug someone might introduce into their middleware.
>
>Ian


Re: [Web-SIG] HEAD requests, WSGI gateways, and middleware

2008-01-24 Thread Graham Dumpleton
On 25/01/2008, Brian Smith <[EMAIL PROTECTED]> wrote:
> Graham Dumpleton wrote:
> > The issue here is that Apache has its own output filtering
> > system where filters can set headers based on the actual
> > content. Because of this, any output filter must always
> > receive the response content regardless of whether the
> > request is a GET or HEAD. If an application handler tries to
> > optimise things and not return the content, then these output
> > filters may generate different headers for a HEAD request
> > than a GET request, thereby violating the requirement that
> > they should actually be the same.
> >
> > Note that response content is still thrown away for a HEAD
> > request, it is just done at the very last moment after all
> > Apache output filters have processed the data.
>
> Right, that is exactly what I am saying.

To quote, in 2 you said:

"""For a HEAD request, A WSGI gateway must not iterate through the
response iterable"""

I was presuming that this was saying that the WSGI gateway should do
this as well as changing the REQUEST_METHOD actually sent to the WSGI
application to GET.

If Apache mod_wsgi (the WSGI gateway) did this, i.e., didn't iterate
through the iterable and therefore didn't pass the content through to
Apache, it would, as explained, cause traditional Apache output
filters to potentially yield incorrect results. This is what I am
highlighting.

So Apache mod_wsgi couldn't avoid processing the iterable unless, as
you allude to with how Apache's internals are used to implement
wsgi.file_wrapper support, mod_wsgi similarly detected when no Apache
output filters that could add additional headers are registered, and
skipped the processing in that case.

Some clarification in 2 is perhaps required.

> In Apache's documentation, it
> says that every handler should include the response entity for HEAD
> requests, so that output filters can process the output. However, there
> is nothing in PEP 333 that talks about this behavior. So, the only
> reasonable thing to do is to assume that, when environ["REQUEST_METHOD"]
> == "HEAD", no response entity should be generated. Do we all agree that
> the following application is correct?:
>
> def application(env, start_response):
>     start_response("200 OK",
>                    [("Content-Length", "1")])
>     if env["REQUEST_METHOD"] == "HEAD":
>         return []
>     else:
>         return ["a"*1]
>
> Because of web servers' output filters, if the WSGI gateway is a web
> server module or a [Fast]CGI script, then it needs to lie and tell the
> application that the request is a "GET", not a "HEAD." Otherwise, the
> application will see that the request method is "HEAD" and suppress its
> own response entity, as the HTTP specification requires, and the output
> filters will fail. The only time it is reasonable for the gateway to
> pass "HEAD" as the request method is when it knows that there are not
> any output filters/middleware that depend on the response entity.
> Usually that is only possible in standalone web servers like CherryPy's
> or Paste's.
>
> I tested this in mod_wsgi and mod_wsgi gets it wrong. mod_wsgi sets
> env["REQUEST_METHOD"] to "HEAD" for HEAD requests.

It just passes whatever Apache sets up as the CGI environment.

> When mod_deflate is
> enabled, a HEAD request returns "Content-Length: 20", and a GET request
> returns "Content-Length: 46". However, it is supposed to be
> "Content-Length: 46" in both cases.

Is this with your sample application, which detects HEAD and doesn't
return anything if it is found? In other words, is it driven by what
your application is actually returning?

I am not saying your application is wrong or right; I am just trying
to determine whether you are saying that there is a problem in Apache
mod_wsgi, separate from what it is passing as REQUEST_METHOD, that
causes that.

> The CGI WSGI gateway in PEP 333 gets
> it wrong too when mod_deflate is used.
>
> Note also that in mod_wsgi, use of wsgi.file_wrapper is a huge
> optimization for this: if no Apache output filters need the response
> entity, and wsgi.file_wrapper is used, then the file will never be read
> off the disk.

Hmmm, I didn't actually look under the covers of what Apache did when
I used its file bucket for that. Worked out better than I expected
then. :-)

> But, if wsgi.file_wrapper is not used, then the entire
> file has to be read off the disk through the application's output
> iterable for no reason. It would be nice if the non-file_wrapper case
> worked as well as the file_wrapper case.
>
> If you put all this together, you end up with the rules that I outlined
> in my previous message:

Except, as pointed out, that 2 suggests I should never pass on content
from the iterable for HEAD, whereas in practice I still have to if
there are output filters.

Pardon me if I am not understanding very well, I did not get much
sleep last night because of baby and my head hurts. :-(

Re: [Web-SIG] HEAD requests, WSGI gateways, and middleware

2008-01-24 Thread Ian Bicking
Brian Smith wrote:
> Graham Dumpleton wrote:
>> The issue here is that Apache has its own output filtering 
>> system where filters can set headers based on the actual 
>> content. Because of this, any output filter must always 
>> receive the response content regardless of whether the 
>> request is a GET or HEAD. If an application handler tries to 
>> optimise things and not return the content, then these output 
>> filters may generate different headers for a HEAD request 
>> than a GET request, thereby violating the requirement that 
>> they should actually be the same.
>>
>> Note that response content is still thrown away for a HEAD 
>> request, it is just done at the very last moment after all 
>> Apache output filters have processed the data.
> 
> Right, that is exactly what I am saying. In Apache's documentation, it
> says that every handler should include the response entity for HEAD
> requests, so that output filters can process the output. However, there
> is nothing in PEP 333 that talks about this behavior. 

Unlike Apache there are no output filters in WSGI; all middleware gets 
to adjust the request as well as the response.  So middleware that can't 
handle a real HEAD request has an opportunity to turn it into a GET 
request.  I don't see why PEP 333 needs to talk about this, to me it 
seems straight forward enough in a WSGI context, and PEP 333 can't cover 
every possible bug someone might introduce into their middleware.

   Ian


Re: [Web-SIG] HEAD requests, WSGI gateways, and middleware

2008-01-24 Thread Brian Smith
Chris McDonough wrote:
> I have applications that do detect the difference between a 
> GET and a HEAD (they do slightly less work if the request is 
> a HEAD request), so I suspect this is not a totally 
> reasonable thing to add to the spec.

Yes, of course. In order to avoid doing unnecessary work for a HEAD
request, the extra work needs to be transferred to the response
iterable; for a HEAD request, the gateway would skip the iterable except
for its close() method, and so all the extra work is skipped as well.
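
As an illustration of transferring the work to the iterable, here is a minimal sketch. The helper expensive_render and the calls list are hypothetical, standing in for whatever costly step the application performs only for GET.

```python
calls = []

def expensive_render():
    calls.append(1)  # stands in for the costly part of building the body
    return b"x" * 1000

def application(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])

    def body():
        # Nothing in this generator runs until the gateway asks for the
        # first chunk, so a gateway that only calls close() skips it all.
        yield expensive_render()

    return body()
```

If the gateway calls close() without iterating, expensive_render is never invoked; if it iterates, the work happens exactly once.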

> Maybe instead the middleware that does what you're describing should
> be changed instead to deal with HEAD requests.

I agree. But, this problem is often overlooked by middleware, which
indicates that we at least need an explanation of the problem in the
specification. But, when the middleware are corrected, then applications
like yours will only work efficiently if they transfer the extra work
they do for GET (vs. HEAD) requests to the response iterable.

> In general, I don't think there is (or should be) any guarantee 
> that an arbitrary middleware stack will work with an 
> arbitrary application.  Although that would be nice in 
> theory, I suspect it would require a very complex protocol 
> (more complex than what WSGI requires now).

That is exactly what WSGI is designed for. There is no point to having a
standard if there is no interoperability amongst compliant components.

- Brian



Re: [Web-SIG] HEAD requests, WSGI gateways, and middleware

2008-01-24 Thread Brian Smith
Graham Dumpleton wrote:
> The issue here is that Apache has its own output filtering 
> system where filters can set headers based on the actual 
> content. Because of this, any output filter must always 
> receive the response content regardless of whether the 
> request is a GET or HEAD. If an application handler tries to 
> optimise things and not return the content, then these output 
> filters may generate different headers for a HEAD request 
> than a GET request, thereby violating the requirement that 
> they should actually be the same.
> 
> Note that response content is still thrown away for a HEAD 
> request, it is just done at the very last moment after all 
> Apache output filters have processed the data.

Right, that is exactly what I am saying. In Apache's documentation, it
says that every handler should include the response entity for HEAD
requests, so that output filters can process the output. However, there
is nothing in PEP 333 that talks about this behavior. So, the only
reasonable thing to do is to assume that, when environ["REQUEST_METHOD"]
== "HEAD", no response entity should be generated. Do we all agree that
the following application is correct?:

def application(env, start_response):
    start_response("200 OK",
                   [("Content-Length", "1")])
    if env["REQUEST_METHOD"] == "HEAD":
        return []
    else:
        return ["a"*1]

Because of web servers' output filters, if the WSGI gateway is a web
server module or a [Fast]CGI script, then it needs to lie and tell the
application that the request is a "GET", not a "HEAD." Otherwise, the
application will see that the request method is "HEAD" and suppress its
own response entity, as the HTTP specification requires, and the output
filters will fail. The only time it is reasonable for the gateway to
pass "HEAD" as the request method is when it knows that there are not
any output filters/middleware that depend on the response entity.
Usually that is only possible in standalone web servers like CherryPy's
or Paste's.

I tested this in mod_wsgi and mod_wsgi gets it wrong. mod_wsgi sets
env["REQUEST_METHOD"] to "HEAD" for HEAD requests. When mod_deflate is
enabled, a HEAD request returns "Content-Length: 20", and a GET request
returns "Content-Length: 46". However, it is supposed to be
"Content-Length: 46" in both cases. The CGI WSGI gateway in PEP 333 gets
it wrong too when mod_deflate is used.

Note also that in mod_wsgi, use of wsgi.file_wrapper is a huge
optimization for this: if no Apache output filters need the response
entity, and wsgi.file_wrapper is used, then the file will never be read
off the disk. But, if wsgi.file_wrapper is not used, then the entire
file has to be read off the disk through the application's output
iterable for no reason. It would be nice if the non-file_wrapper case
worked as well as the file_wrapper case.
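
For reference, a sketch of an application using wsgi.file_wrapper when the gateway offers it, with a fallback when it does not. The names send_file and BlockReader are hypothetical, and the content type is chosen arbitrarily for the example.

```python
import os

class BlockReader:
    """Fallback iterable with a close() the gateway can always call."""
    def __init__(self, fileobj, block_size):
        self.fileobj = fileobj
        self.block_size = block_size

    def __iter__(self):
        # Read fixed-size blocks until read() returns an empty result.
        return iter(lambda: self.fileobj.read(self.block_size), b"")

    def close(self):
        self.fileobj.close()

def send_file(environ, start_response, path, block_size=8192):
    start_response("200 OK", [
        ("Content-Type", "application/octet-stream"),
        ("Content-Length", str(os.path.getsize(path))),
    ])
    fileobj = open(path, "rb")
    wrapper = environ.get("wsgi.file_wrapper")
    if wrapper is not None:
        # The gateway decides how, and whether, to read the file.
        return wrapper(fileobj, block_size)
    # Fallback: every block must be pulled through this iterable.
    return BlockReader(fileobj, block_size)
```

In the fallback case nothing is read until the gateway iterates, so a gateway that is allowed to skip iteration for HEAD would get the same benefit as with the file wrapper.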

If you put all this together, you end up with the rules that I outlined
in my previous message:

> 1. WSGI gateways must always set environ["REQUEST_METHOD"] to
>    "GET" for HEAD requests. Middleware and applications will
>    not be able to detect the difference between GET and HEAD
>    requests.
>
> 2. For a HEAD request, A WSGI gateway must not iterate
>    through the response iterable, but it must call the
>    response iterable's close() method, if any. It must not
>    send any output that was written via
>    start_response(...).write() either. Consequently,
>    WSGI applications must work correctly, and must not
>    leak resources, when their output is not iterated;
>    an application should not signal or log an error if
>    the iterable's close() method is invoked without any
>    iteration taking place.

- Brian



Re: [Web-SIG] HEAD requests, WSGI gateways, and middleware

2008-01-24 Thread Graham Dumpleton
On 25/01/2008, Brian Smith <[EMAIL PROTECTED]> wrote:
> 1. WSGI gateways must always set environ["REQUEST_METHOD"] to "GET" for HEAD 
> requests. Middleware and applications will not be able to detect the 
> difference between GET and HEAD requests.
>
> 2. For a HEAD request, A WSGI gateway must not iterate through the response 
> iterable, but it must call the response iterable's close() method, if any. It 
> must not send any output that was written via start_response(...).write() 
> either. Consequently, WSGI applications must work correctly, and must not 
> leak resources, when their output is not iterated; an application should not 
> signal or log an error if the iterable's close() method is invoked without 
> any iteration taking place.
>
> Please add this issue to http://wsgi.org/wsgi/WSGI_2.0.

This would go against how things are done with Apache and could cause
Apache to generate incorrect response headers for a HEAD request.

The issue here is that Apache has its own output filtering system
where filters can set headers based on the actual content. Because of
this, any output filter must always receive the response content
regardless of whether the request is a GET or HEAD. If an application
handler tries to optimise things and not return the content, then
these output filters may generate different headers for a HEAD request
than a GET request, thereby violating the requirement that they should
actually be the same.

Note that response content is still thrown away for a HEAD request, it
is just done at the very last moment after all Apache output filters
have processed the data.

Graham


Re: [Web-SIG] HEAD requests, WSGI gateways, and middleware

2008-01-24 Thread Ian Bicking
Brian Smith wrote:
> My application correctly responds to HEAD requests as-is. However, it
> doesn't work with middleware that sets headers based on the content
> of the response body.
> 
> For example, a gateway or middleware that sets ETag based on a
> checksum, Content-Encoding, Content-Length and/or Content-MD5 will
> all result in wrong results by default. Right now, my applications
> assume that any such gateway or the first such middleware will change
> environ["REQUEST_METHOD"] from "HEAD" to "GET" before the application
> is invoked, and discard the response body that the application
> generates.

Then the middleware is just wrong.  It shouldn't overwrite ETag values 
generated by the application, and if it is set to generate ETags from 
hashes of the content then it should change HEAD to GET.
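
A sketch of middleware that behaves this way. The name etag_middleware is hypothetical, SHA-256 is an arbitrary choice of hash, the whole body is buffered in memory, and exc_info handling is left out for brevity.

```python
import hashlib

def etag_middleware(app):
    """Add a content-hash ETag unless the application already set one."""
    def wrapper(environ, start_response):
        is_head = environ.get("REQUEST_METHOD") == "HEAD"
        if is_head:
            # The hash needs the body, so ask the application for a GET.
            environ = dict(environ, REQUEST_METHOD="GET")

        captured = {}
        chunks = []

        def capture(status, headers, exc_info=None):
            captured["status"] = status
            captured["headers"] = list(headers)
            return chunks.append  # write() output is buffered with the body

        result = app(environ, capture)
        try:
            chunks.extend(result)
        finally:
            close = getattr(result, "close", None)
            if close is not None:
                close()

        body = b"".join(chunks)
        headers = captured["headers"]
        # Leave an ETag generated by the application alone.
        if not any(name.lower() == "etag" for name, _ in headers):
            digest = hashlib.sha256(body).hexdigest()
            headers.append(("ETag", '"%s"' % digest))
        start_response(captured["status"], headers)
        return [] if is_head else [body]
    return wrapper
```

A HEAD request through this wrapper yields the same headers as a GET and an empty body, while an application that sets its own ETag keeps it.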

> However, many gateways and middleware do not do this, and PEP 333
> doesn't have anything to say about it. As a result, a 100% WSGI
> 1.0-compliant application is not portable between gateways.

Nothing in WSGI says that all middleware is sensible or correct.  In 
this case it just seems like there's a bad middleware involved that 
isn't respecting basic HTTP semantics.  WSGI doesn't specify HTTP 
semantics but of course they are a basic foundation for any kind of 
interaction, and it's assumed they'll be respected.

   Ian



Re: [Web-SIG] HEAD requests, WSGI gateways, and middleware

2008-01-24 Thread Manlio Perillo
Brian Smith wrote:
> My application correctly responds to HEAD requests as-is. However, it doesn't 
> work with middleware that sets headers based on the content of the response 
> body.
> 
> For example, a gateway or middleware that sets ETag based on a checksum, 
> Content-Encoding, Content-Length and/or Content-MD5 will all result in wrong 
> results by default. Right now, my applications assume that any such gateway 
> or the first such middleware will change environ["REQUEST_METHOD"] from 
> "HEAD" to "GET" before the application is invoked, and discard the response 
> body that the application generates. 
> 
> However, many gateways and middleware do not do this, and PEP 333 doesn't 
> have anything to say about it. As a result, a 100% WSGI 1.0-compliant 
> application is not portable between gateways.
> 
> I suggest that a revision of PEP 333 should require the following behavior:
> 
> 1. WSGI gateways must always set environ["REQUEST_METHOD"] to "GET" for HEAD 
> requests. Middleware and applications will not be able to detect the 
> difference between GET and HEAD requests.
> 

-1.

> 2. For a HEAD request, A WSGI gateway must not iterate through the response 
> iterable, but it must call the response iterable's close() method, if any. It 
> must not send any output that was written via start_response(...).write() 
> either. Consequently, WSGI applications must work correctly, and must not 
> leak resources, when their output is not iterated; an application should not 
> signal or log an error if the iterable's close() method is invoked without 
> any iteration taking place.
> 

This is done in the WSGI implementation for Nginx, as an example; and 
some time ago there was a discussion about this.

Moreover, if the response iterable is a generator, no iteration (and 
content generation) is done.

> Please add this issue to http://wsgi.org/wsgi/WSGI_2.0.
> 
> Regards,
> Brian
> 



Regards  Manlio Perillo


Re: [Web-SIG] HEAD requests, WSGI gateways, and middleware

2008-01-24 Thread Chris McDonough
I have applications that do detect the difference between a GET and a HEAD
(they do slightly less work if the request is a HEAD request), so I suspect
this is not a totally reasonable thing to add to the spec.  Maybe the
middleware that does what you're describing should instead be changed to deal
with HEAD requests.

In general, I don't think there is (or should be) any guarantee that an
arbitrary middleware stack will work with an arbitrary application.  Although
that would be nice in theory, I suspect it would require a very complex
protocol (more complex than what WSGI requires now).

- C

Brian Smith wrote:
> My application correctly responds to HEAD requests as-is. However, it doesn't 
> work with middleware that sets headers based on the content of the response 
> body.
> 
> For example, a gateway or middleware that sets ETag based on a checksum, 
> Content-Encoding, Content-Length and/or Content-MD5 will all result in wrong 
> results by default. Right now, my applications assume that any such gateway 
> or the first such middleware will change environ["REQUEST_METHOD"] from 
> "HEAD" to "GET" before the application is invoked, and discard the response 
> body that the application generates. 
> 
> However, many gateways and middleware do not do this, and PEP 333 doesn't 
> have anything to say about it. As a result, a 100% WSGI 1.0-compliant 
> application is not portable between gateways.
> 
> I suggest that a revision of PEP 333 should require the following behavior:
> 
> 1. WSGI gateways must always set environ["REQUEST_METHOD"] to "GET" for HEAD 
> requests. Middleware and applications will not be able to detect the 
> difference between GET and HEAD requests.
> 
> 2. For a HEAD request, A WSGI gateway must not iterate through the response 
> iterable, but it must call the response iterable's close() method, if any. It 
> must not send any output that was written via start_response(...).write() 
> either. Consequently, WSGI applications must work correctly, and must not 
> leak resources, when their output is not iterated; an application should not 
> signal or log an error if the iterable's close() method is invoked without 
> any iteration taking place.
> 
> Please add this issue to http://wsgi.org/wsgi/WSGI_2.0.
> 
> Regards,
> Brian
> 