Graham Dumpleton wrote:
> The issue here is that Apache has its own output filtering
> system where filters can set headers based on the actual
> content. Because of this, any output filter must always
> receive the response content regardless of whether the
> request is a GET or HEAD. If an application handler tries to
> optimise things and not return the content, then these output
> filters may generate different headers for a HEAD request
> than a GET request, thereby violating the requirement that
> they should actually be the same.
>
> Note that response content is still thrown away for a HEAD
> request, it is just done at the very last moment after all
> Apache output filters have processed the data.

Right, that is exactly what I am saying. In Apache's documentation, it
says that every handler should include the response entity for HEAD
requests, so that output filters can process the output. However,
there is nothing in PEP 333 that talks about this behavior. So, the
only reasonable thing to do is to assume that, when
environ["REQUEST_METHOD"] == "HEAD", no response entity should be
generated. Do we all agree that the following application is correct?:

    def application(env, start_response):
        start_response("200 OK", [("Content-Length", "10000")])
        if env["REQUEST_METHOD"] == "HEAD":
            return []
        else:
            return ["a"*10000]

Because of web servers' output filters, if the WSGI gateway is a web
server module or a [Fast]CGI script, then it needs to lie and tell the
application that the request is a "GET", not a "HEAD". Otherwise, the
application will see that the request method is "HEAD" and suppress
its own response entity, as the HTTP specification requires, and the
output filters will fail.

The only time it is reasonable for the gateway to pass "HEAD" as the
request method is when it knows that there are not any output
filters/middleware that depend on the response entity. Usually that is
only possible in standalone web servers like CherryPy's or Paste's.

I tested this in mod_wsgi and mod_wsgi gets it wrong. mod_wsgi sets
env["REQUEST_METHOD"] to "HEAD" for HEAD requests. When mod_deflate is
enabled, a HEAD request returns "Content-Length: 20", and a GET
request returns "Content-Length: 46". However, it is supposed to be
"Content-Length: 46" in both cases. The CGI WSGI gateway in PEP 333
gets it wrong too when mod_deflate is used.

Note also that in mod_wsgi, use of wsgi.file_wrapper is a huge
optimization for this: if no Apache output filters need the response
entity, and wsgi.file_wrapper is used, then the file will never be
read off the disk.
But, if wsgi.file_wrapper is not used, then the entire file has to be
read off the disk through the application's output iterable for no
reason. It would be nice if the non-file_wrapper case worked as well
as the file_wrapper case.

If you put all this together, you end up with the rules that I
outlined in my previous message:

> 1. WSGI gateways must always set environ["REQUEST_METHOD"] to
>    "GET" for HEAD requests. Middleware and applications will
>    not be able to detect the difference between GET and HEAD
>    requests.
>
> 2. For a HEAD request, a WSGI gateway must not iterate
>    through the response iterable, but it must call the
>    response iterable's close() method, if any. It must not
>    send any output that was written via
>    start_response(...).write() either. Consequently,
>    WSGI applications must work correctly, and must not
>    leak resources, when their output is not iterated;
>    an application should not signal or log an error if
>    the iterable's close() method is invoked without any
>    iteration taking place.

- Brian
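
For illustration, here is a rough sketch of how a gateway might apply
the two rules quoted above. It is not taken from mod_wsgi or PEP 333;
run_application, send_headers and send_body are hypothetical names
invented for this example. It assumes the application calls
start_response() before returning its iterable (a generator-based
application would not have done so without being iterated, and the
sketch does not handle that case), and it uses byte strings for the
body.

```python
def run_application(application, environ, send_headers, send_body):
    """Hypothetical gateway helper applying the two proposed rules.

    send_headers(status, headers) and send_body(data) are assumed
    callbacks that hand the response on to the server.
    """
    is_head = environ["REQUEST_METHOD"] == "HEAD"
    if is_head:
        # Rule 1: the application only ever sees "GET".
        environ = dict(environ, REQUEST_METHOD="GET")

    state = {"sent": False}

    def ensure_headers():
        # Send the headers once, before any body data.
        if not state["sent"]:
            send_headers(state["status"], state["headers"])
            state["sent"] = True

    def write(data):
        # Rule 2: output passed to write() is dropped for HEAD.
        if not is_head:
            ensure_headers()
            send_body(data)

    def start_response(status, headers, exc_info=None):
        state["status"] = status
        state["headers"] = headers
        return write

    result = application(environ, start_response)
    try:
        if not is_head:
            # Normal GET handling: stream the iterable.
            for chunk in result:
                if chunk:
                    ensure_headers()
                    send_body(chunk)
        ensure_headers()
    finally:
        # Rule 2: close() is called even when nothing was iterated.
        close = getattr(result, "close", None)
        if close is not None:
            close()
```

With a helper like this, an application that always returns its full
body still produces the same headers for HEAD and GET, and the body is
never pulled from the iterable for HEAD.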