[Web-SIG] wsgiorg.routing_args and original SCRIPT_NAME
Hi.

I have implemented the wsgiorg.routing_args specification, using the code in the example. However, I have a problem, and I can't see a good solution.

Suppose that an application is mounted (embedded in a web server) at location /example. The application script executed by the server simply sets up the routing, database connections and so on.

Let's suppose that the request URI is /example/login/. For the main application, SCRIPT_NAME is /example. For the application at /login, SCRIPT_NAME is /example/login.

My problem is that I want the page generated by the login application to contain an anchor of the form /example/logout/. The usual solution is environ['SCRIPT_NAME'] + '/logout', but this returns /example/login/logout/, not /example/logout/.

This seems to be impossible with the current specification, since the original SCRIPT_NAME is lost. What is the best solution?

1) Do not change SCRIPT_NAME, and instead add wsgiorg.consumed_path, a list. This means that request URI reconstruction must be changed: SCRIPT_NAME = SCRIPT_NAME + '/'.join(wsgiorg.consumed_path)
2) Store wsgiorg.original_script_name, with the value seen by the routing application.
3) Simply don't change SCRIPT_NAME and PATH_INFO. However, I usually need the updated PATH_INFO.

Thanks, Manlio Perillo

___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
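The loss of the mount point can be reproduced with the standard library's wsgiref.util.shift_path_info, which moves one segment from PATH_INFO to SCRIPT_NAME much as a path-dispatching router does. This is a minimal sketch: the environ is hand-built rather than produced by a real server, and it stands in for whatever router the routing_args example uses.

```python
from wsgiref.util import shift_path_info

# Hand-built environ for a request to /example/login/ on an app mounted at /example.
environ = {"SCRIPT_NAME": "/example", "PATH_INFO": "/login/"}

# The top-level router consumes one path segment, as a dispatcher would.
segment = shift_path_info(environ)

print(segment)                 # login
print(environ["SCRIPT_NAME"])  # /example/login
print(environ["PATH_INFO"])    # /

# A link built from the shifted SCRIPT_NAME points one level too deep.
print(environ["SCRIPT_NAME"] + "/logout/")  # /example/login/logout/
```

After the shift, nothing in the environ still records /example on its own, which is exactly the information the login application needs.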
Re: [Web-SIG] wsgiorg.routing_args and original SCRIPT_NAME
At 03:22 PM 1/24/2008 +0100, Manlio Perillo wrote:
> [...]
> What is the best solution?
> 1) Do not change SCRIPT_NAME, and instead add wsgiorg.consumed_path, a list. [...]
> 2) Store wsgiorg.original_script_name, with the value seen by the routing application.
> 3) Simply don't change SCRIPT_NAME and PATH_INFO. However, I usually need the updated PATH_INFO.

4) Use a relative link, with href=logout.
Re: [Web-SIG] wsgiorg.routing_args and original SCRIPT_NAME
Phillip J. Eby wrote:
> At 03:22 PM 1/24/2008 +0100, Manlio Perillo wrote:
>> [...]
> 4) Use a relative link, with href=logout.

But since the base URL is /example/login/, this relative link is resolved to /example/login/logout.

Manlio Perillo
Re: [Web-SIG] wsgiorg.routing_args and original SCRIPT_NAME
> Phillip J. Eby wrote:
>> At 03:22 PM 1/24/2008 +0100, Manlio Perillo wrote:
>>> [...]
>> 4) Use a relative link, with href=logout.
> But since the base URL is /example/login/, this relative link is resolved to /example/login/logout.

In that case, use href=../logout/.

> Manlio Perillo

-- Sven
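How a browser resolves the two relative hrefs against the login page can be checked with urllib.parse.urljoin, which follows RFC 3986 reference resolution. The host name below is a placeholder assumption; only the path matters.

```python
from urllib.parse import urljoin

page = "http://www.example.com/example/login/"  # hypothetical page URL

print(urljoin(page, "logout"))      # http://www.example.com/example/login/logout
print(urljoin(page, "../logout/"))  # http://www.example.com/example/logout/

# The "../" form depends on how deep the login page sits below the mount point:
deeper = "http://www.example.com/example/account/login/"
print(urljoin(deeper, "../logout/"))  # http://www.example.com/example/account/logout/
```

The last line shows why "../logout/" is fragile: the same href breaks as soon as the login application is mounted one level deeper.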
Re: [Web-SIG] wsgiorg.routing_args and original SCRIPT_NAME
Sven Berkvens-Matthijsse wrote:
>> [...]
>> But since the base URL is /example/login/, this relative link is resolved to /example/login/logout.
> In that case, use href=../logout/.

I would not call this a solution! It's only a workaround.

Manlio Perillo
Re: [Web-SIG] URL quoting in WSGI (or the lack therof)
Ian Bicking wrote:
> We encountered it with GData too, as it uses URLs like /{http:%2f%2fexample.com}term/. But if you balance the {}'s you can parse it out.

Unquoted curly braces are illegal in any kind of URI or IRI. Does GData really require them to be unquoted?

- Brian
[Web-SIG] HEAD requests, WSGI gateways, and middleware
My application correctly responds to HEAD requests as-is. However, it doesn't work with middleware that sets headers based on the content of the response body. For example, a gateway or middleware that sets ETag based on a checksum, or that sets Content-Encoding, Content-Length and/or Content-MD5, will produce wrong results by default.

Right now, my applications assume that any such gateway, or the first such middleware, will change environ['REQUEST_METHOD'] from HEAD to GET before the application is invoked, and discard the response body that the application generates. However, many gateways and middleware do not do this, and PEP 333 doesn't say anything about it. As a result, a 100% WSGI 1.0-compliant application is not portable between gateways.

I suggest that a revision of PEP 333 should require the following behavior:

1. WSGI gateways must always set environ['REQUEST_METHOD'] to GET for HEAD requests. Middleware and applications will not be able to detect the difference between GET and HEAD requests.

2. For a HEAD request, a WSGI gateway must not iterate through the response iterable, but it must call the response iterable's close() method, if any. It must not send any output that was written via the write() callable returned by start_response() either. Consequently, WSGI applications must work correctly, and must not leak resources, when their output is not iterated; an application should not signal or log an error if the iterable's close() method is invoked without any iteration taking place.

Please add this issue to http://wsgi.org/wsgi/WSGI_2.0.

Regards, Brian
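The two proposed rules can be approximated as a piece of middleware placed directly beneath the server. This is a hypothetical sketch (head_as_get is an invented name; a real gateway would do this internally): it presents HEAD to the application as GET, never iterates the body, discards write() output, and still calls close(). It assumes the application calls start_response before returning its iterable, which PEP 333 does not strictly require.

```python
def head_as_get(app):
    """Hypothetical middleware implementing the two proposed HEAD rules."""
    def wrapper(environ, start_response):
        if environ.get("REQUEST_METHOD") != "HEAD":
            return app(environ, start_response)

        # Rule 1: the wrapped application only ever sees GET.
        environ = dict(environ, REQUEST_METHOD="GET")

        def head_start_response(status, headers, exc_info=None):
            start_response(status, headers, exc_info)
            # Rule 2: anything passed to write() is silently dropped.
            return lambda data: None

        body = app(environ, head_start_response)
        # Rule 2: never iterate the body, but always release it.
        close = getattr(body, "close", None)
        if close is not None:
            close()
        return []
    return wrapper
```

Because the body is never iterated, any application that defers expensive work to iteration time gets the HEAD savings automatically.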
Re: [Web-SIG] HEAD requests, WSGI gateways, and middleware
I have applications that do detect the difference between a GET and a HEAD (they do slightly less work if the request is a HEAD request), so I suspect this is not a totally reasonable thing to add to the spec. Maybe the middleware that does what you're describing should be changed instead to deal with HEAD requests.

In general, I don't think there is (or should be) any guarantee that an arbitrary middleware stack will work with an arbitrary application. Although that would be nice in theory, I suspect it would require a very complex protocol (more complex than what WSGI requires now).

- C

Brian Smith wrote:
> My application correctly responds to HEAD requests as-is. [...]
Re: [Web-SIG] HEAD requests, WSGI gateways, and middleware
Brian Smith wrote:
> [...]
> 1. WSGI gateways must always set environ['REQUEST_METHOD'] to GET for HEAD requests. Middleware and applications will not be able to detect the difference between GET and HEAD requests.

-1.

> 2. For a HEAD request, a WSGI gateway must not iterate through the response iterable, but it must call the response iterable's close() method, if any. It must not send any output that was written via the write() callable returned by start_response() either. Consequently, WSGI applications must work correctly, and must not leak resources, when their output is not iterated; an application should not signal or log an error if the iterable's close() method is invoked without any iteration taking place.

This is done in the WSGI implementation for Nginx, for example, and there was a discussion about this some time ago. Moreover, if the response iterable is a generator, no iteration (and no content generation) is done.

> Please add this issue to http://wsgi.org/wsgi/WSGI_2.0.

Regards, Manlio Perillo
Re: [Web-SIG] URL quoting in WSGI (or the lack therof)
Brian Smith wrote:
> Ian Bicking wrote:
>> We encountered it with GData too, as it uses URLs like /{http:%2f%2fexample.com}term/. But if you balance the {}'s you can parse it out.
> Unquoted curly braces are illegal in any kind of URI or IRI. Does GData really require them to be unquoted?

No, quoted is fine. Of course, parsing PATH_INFO, I couldn't tell anyway ;)

Ian
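Servers typically decode PATH_INFO before the application sees it, so both the %2f escapes and any quoting of the braces disappear. The sketch below uses urllib.parse.unquote to mimic that decoding, and a hypothetical split_balanced helper to implement the "balance the {}'s" idea by splitting on '/' only outside braces.

```python
from urllib.parse import unquote

raw = "/{http:%2f%2fexample.com}term/"
path_info = unquote(raw)  # roughly what a server would put in PATH_INFO
print(path_info)          # /{http://example.com}term/

# Quoted and unquoted braces look identical after decoding.
print(unquote("%7Bterm%7D") == unquote("{term}"))  # True

# Naive splitting cuts the braced URL apart.
print(path_info.strip("/").split("/"))  # ['{http:', '', 'example.com}term']

def split_balanced(path):
    """Hypothetical helper: split on '/' only when not inside {...}."""
    segments, current, depth = [], [], 0
    for ch in path:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth = max(depth - 1, 0)
        if ch == "/" and depth == 0:
            segments.append("".join(current))
            current = []
        else:
            current.append(ch)
    segments.append("".join(current))
    # Drop the empty segments produced by leading/trailing slashes.
    return [s for s in segments if s]

print(split_balanced(path_info))  # ['{http://example.com}term']
```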
Re: [Web-SIG] HEAD requests, WSGI gateways, and middleware
Brian Smith wrote:
> My application correctly responds to HEAD requests as-is. However, it doesn't work with middleware that sets headers based on the content of the response body. [...] Right now, my applications assume that any such gateway, or the first such middleware, will change environ['REQUEST_METHOD'] from HEAD to GET before the application is invoked, and discard the response body that the application generates.

Then the middleware is just wrong. It shouldn't overwrite ETag values generated by the application, and if it is set to generate ETags from hashes of the content, then it should change HEAD to GET.

> However, many gateways and middleware do not do this, and PEP 333 doesn't say anything about it. As a result, a 100% WSGI 1.0-compliant application is not portable between gateways.

Nothing in WSGI says that all middleware is sensible or correct. In this case it just seems like there's a bad middleware involved that isn't respecting basic HTTP semantics. WSGI doesn't specify HTTP semantics, but of course they are a basic foundation for any kind of interaction, and it's assumed they'll be respected.

Ian
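Ian's description of correct middleware can be sketched as follows. This is a hypothetical example: the name etag_middleware and the use of an MD5 hex digest are assumptions, and exc_info handling is omitted for brevity. It leaves an application-supplied ETag alone; when it must hash the body, it turns HEAD into GET, hashes the full body, and then returns no body for the original HEAD request.

```python
import hashlib

def etag_middleware(app):
    """Hypothetical middleware that adds a content-hash ETag."""
    def wrapper(environ, start_response):
        is_head = environ.get("REQUEST_METHOD") == "HEAD"
        if is_head:
            # The hash needs the real entity, so ask the application for it.
            environ = dict(environ, REQUEST_METHOD="GET")

        captured = {}
        chunks = []

        def capture_start_response(status, headers, exc_info=None):
            captured["status"] = status
            captured["headers"] = list(headers)
            return chunks.append  # write() output is buffered as well

        body = app(environ, capture_start_response)
        try:
            for chunk in body:
                chunks.append(chunk)
        finally:
            if hasattr(body, "close"):
                body.close()

        headers = captured["headers"]
        # Never overwrite an ETag the application chose itself.
        if not any(name.lower() == "etag" for name, _ in headers):
            digest = hashlib.md5(b"".join(chunks)).hexdigest()
            headers.append(("ETag", '"%s"' % digest))

        start_response(captured["status"], headers)
        # Same headers as GET, but no entity for HEAD.
        return [] if is_head else chunks
    return wrapper
```

Since the rewrite happens inside the middleware, an application that shortcuts HEAD keeps working, and the ETag for HEAD matches the one for GET.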
Re: [Web-SIG] wsgiorg.routing_args and original SCRIPT_NAME
Manlio Perillo wrote:
> I have implemented the wsgiorg.routing_args specification, using the code in the example. [...]
> 1) Do not change SCRIPT_NAME, and instead add wsgiorg.consumed_path, a list. This means that request URI reconstruction must be changed: SCRIPT_NAME = SCRIPT_NAME + '/'.join(wsgiorg.consumed_path)

I suppose you could leave stuff on PATH_INFO. But that doesn't seem to fit with the idea of PATH_INFO. Also, will it be strictly SCRIPT_NAME/consumed_path/PATH_INFO, or could it be SCRIPT_NAME/consumed_path/some_other_parsing/consumed_path/PATH_INFO? After all, there are cases where stuff gets pushed from PATH_INFO to SCRIPT_NAME, and if consumed_path is in between, which one do you push stuff to?

> 2) Store wsgiorg.original_script_name, with the value seen by the routing application.

I guess I usually do something like this, typically storing myapp.base_path for use when I am generating application-absolute URLs (like /logout). Then at the first chance (before running any kind of routing) I do environ['myapp.base_path'] = environ['SCRIPT_NAME']. This technique works fine, but it is very ad hoc. I'm not sure what the best way to handle this is, really. I'm not sure there's a singular root for an entire request if you are nesting applications, so a single key (wsgiorg.original_script_name) doesn't seem quite right.

I can't remember what Routes does for URL generation. Maybe it leaves SCRIPT_NAME alone? I think so.

Ian
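Ian's ad hoc technique can be sketched as a wrapper applied once, outside any routing. The key myapp.base_path follows his example; record_base_path, app_url and router are hypothetical names, and wsgiref.util.shift_path_info stands in for a real router. Because the value is captured before any segment is shifted, links built from it still point at the real mount point.

```python
from wsgiref.util import shift_path_info

def record_base_path(app, key="myapp.base_path"):
    """Hypothetical wrapper: remember SCRIPT_NAME before routing changes it."""
    def wrapper(environ, start_response):
        environ[key] = environ.get("SCRIPT_NAME", "")
        return app(environ, start_response)
    return wrapper

def app_url(environ, path, key="myapp.base_path"):
    """Build an application-absolute URL such as /example/logout/."""
    return environ[key] + path

def router(environ, start_response):
    # Toy dispatcher that consumes one segment, as a routing layer would.
    shift_path_info(environ)
    link = app_url(environ, "/logout/")
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [link.encode("utf-8")]

application = record_base_path(router)
```

Using an application-specific key rather than one shared wsgiorg key sidesteps the nesting concern above: each nested application records its own root under its own name.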
Re: [Web-SIG] HEAD requests, WSGI gateways, and middleware
On 25/01/2008, Brian Smith [EMAIL PROTECTED] wrote:
> 1. WSGI gateways must always set environ['REQUEST_METHOD'] to GET for HEAD requests. [...]
> 2. For a HEAD request, a WSGI gateway must not iterate through the response iterable, but it must call the response iterable's close() method, if any. [...]

This would go against how things are done with Apache and could cause Apache to generate incorrect response headers for a HEAD request.

The issue here is that Apache has its own output filtering system where filters can set headers based on the actual content. Because of this, any output filter must always receive the response content regardless of whether the request is a GET or HEAD. If an application handler tries to optimise things by not returning the content, then these output filters may generate different headers for a HEAD request than for a GET request, thereby violating the requirement that they should actually be the same.

Note that response content is still thrown away for a HEAD request; it is just done at the very last moment, after all Apache output filters have processed the data.

Graham
Re: [Web-SIG] HEAD requests, WSGI gateways, and middleware
Chris McDonough wrote:
> I have applications that do detect the difference between a GET and a HEAD (they do slightly less work if the request is a HEAD request), so I suspect this is not a totally reasonable thing to add to the spec.

Yes, of course. In order to avoid doing unnecessary work for a HEAD request, the extra work needs to be transferred to the response iterable; for a HEAD request, the gateway would skip the iterable except for its close() method, and so all the extra work is skipped as well.

> Maybe the middleware that does what you're describing should be changed instead to deal with HEAD requests.

I agree. But this problem is often overlooked by middleware authors, which indicates that we at least need an explanation of the problem in the specification. And once the middleware is corrected, applications like yours will only work efficiently if they transfer the extra work they do for GET (vs. HEAD) requests to the response iterable.

> In general, I don't think there is (or should be) any guarantee that an arbitrary middleware stack will work with an arbitrary application. Although that would be nice in theory, I suspect it would require a very complex protocol (more complex than what WSGI requires now).

That is exactly what WSGI is designed for. There is no point to having a standard if there is no interoperability amongst compliant components.

- Brian
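Moving the HEAD savings into the response iterable can be illustrated with a generator-based application. In this sketch (render_page and work_done are hypothetical), the expensive step runs only inside the generator, so a gateway that calls close() without iterating never triggers it, and nothing is acquired that would need releasing.

```python
work_done = []  # records each time the expensive step actually runs

def render_page():
    """Stand-in for expensive body generation (hypothetical)."""
    work_done.append("render")
    return b"<html>...</html>"

def application(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/html")])

    def body():
        # Nothing in here runs until the gateway starts iterating.
        yield render_page()

    return body()

# A gateway honouring HEAD closes the iterable without iterating it.
result = application({"REQUEST_METHOD": "GET"}, lambda s, h, e=None: None)
result.close()
print(work_done)  # []

# A normal GET iterates, so the work happens exactly once.
result = application({"REQUEST_METHOD": "GET"}, lambda s, h, e=None: None)
print(b"".join(result))  # b'<html>...</html>'
print(work_done)         # ['render']
```

Calling close() on a generator that was never started is legal and simply marks it finished, which is why such an application does not leak resources when its output is not iterated.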
Re: [Web-SIG] HEAD requests, WSGI gateways, and middleware
Graham Dumpleton wrote:
> The issue here is that Apache has its own output filtering system where filters can set headers based on the actual content. [...] Note that response content is still thrown away for a HEAD request; it is just done at the very last moment, after all Apache output filters have processed the data.

Right, that is exactly what I am saying. Apache's documentation says that every handler should include the response entity for HEAD requests, so that output filters can process the output. However, there is nothing in PEP 333 that talks about this behavior. So the only reasonable thing for an application to do is to assume that, when environ['REQUEST_METHOD'] == 'HEAD', no response entity should be generated. Do we all agree that the following application is correct?

    def application(env, start_response):
        start_response("200 OK", [("Content-Length", "10000")])
        if env["REQUEST_METHOD"] == "HEAD":
            return []
        else:
            return [b"a" * 10000]

Because of web servers' output filters, if the WSGI gateway is a web server module or a [Fast]CGI script, then it needs to lie and tell the application that the request is a GET, not a HEAD. Otherwise, the application will see that the request method is HEAD and suppress its own response entity, as the HTTP specification requires, and the output filters will fail. The only time it is reasonable for the gateway to pass HEAD as the request method is when it knows that there are no output filters/middleware that depend on the response entity. Usually that is only possible in standalone web servers like CherryPy's or Paste's.

I tested this in mod_wsgi and mod_wsgi gets it wrong. mod_wsgi sets env['REQUEST_METHOD'] to HEAD for HEAD requests. When mod_deflate is enabled, a HEAD request returns Content-Length: 20, and a GET request returns Content-Length: 46. However, it is supposed to be Content-Length: 46 in both cases. The CGI WSGI gateway in PEP 333 gets it wrong too when mod_deflate is used.

Note also that in mod_wsgi, use of wsgi.file_wrapper is a huge optimization for this: if no Apache output filters need the response entity, and wsgi.file_wrapper is used, then the file will never be read off the disk. But if wsgi.file_wrapper is not used, then the entire file has to be read off the disk through the application's output iterable for no reason. It would be nice if the non-file_wrapper case worked as well as the file_wrapper case.

If you put all this together, you end up with the rules that I outlined in my previous message:

1. WSGI gateways must always set environ['REQUEST_METHOD'] to GET for HEAD requests. Middleware and applications will not be able to detect the difference between GET and HEAD requests.

2. For a HEAD request, a WSGI gateway must not iterate through the response iterable, but it must call the response iterable's close() method, if any. It must not send any output that was written via the write() callable returned by start_response() either. Consequently, WSGI applications must work correctly, and must not leak resources, when their output is not iterated; an application should not signal or log an error if the iterable's close() method is invoked without any iteration taking place.

- Brian
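The mismatch can be reproduced without Apache by putting a hypothetical gzip-encoding middleware in front of the example application. gzip_filter is an invented stand-in for an output filter such as mod_deflate, not a model of its actual implementation: it recomputes Content-Length from the compressed body, so an application that suppresses its body for HEAD makes HEAD and GET disagree. Exact byte counts depend on the compressor, so none are claimed here.

```python
import gzip

def application(env, start_response):
    start_response("200 OK", [("Content-Length", "10000")])
    if env["REQUEST_METHOD"] == "HEAD":
        return []
    return [b"a" * 10000]

def gzip_filter(app):
    """Hypothetical stand-in for a compressing output filter."""
    def wrapper(environ, start_response):
        captured = {}

        def capture(status, headers, exc_info=None):
            captured["status"], captured["headers"] = status, headers
            return lambda data: None  # write() is not supported in this sketch

        result = app(environ, capture)
        try:
            body = b"".join(result)
        finally:
            if hasattr(result, "close"):
                result.close()

        compressed = gzip.compress(body)
        headers = [(k, v) for k, v in captured["headers"]
                   if k.lower() != "content-length"]
        headers += [("Content-Encoding", "gzip"),
                    ("Content-Length", str(len(compressed)))]
        start_response(captured["status"], headers)
        # A server would drop this body for HEAD, but only after filtering.
        return [compressed]
    return wrapper

def content_length(method):
    seen = {}
    def start_response(status, headers, exc_info=None):
        seen.update(headers)
        return lambda data: None
    gzip_filter(application)({"REQUEST_METHOD": method}, start_response)
    return int(seen["Content-Length"])

print(content_length("GET"), content_length("HEAD"))  # two different values
```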
Re: [Web-SIG] HEAD requests, WSGI gateways, and middleware
Brian Smith wrote:
> Graham Dumpleton wrote:
>> The issue here is that Apache has its own output filtering system where filters can set headers based on the actual content. [...]
> Right, that is exactly what I am saying. Apache's documentation says that every handler should include the response entity for HEAD requests, so that output filters can process the output. However, there is nothing in PEP 333 that talks about this behavior.

Unlike Apache, WSGI has no output filters; all middleware gets to adjust the request as well as the response. So middleware that can't handle a real HEAD request has an opportunity to turn it into a GET request. I don't see why PEP 333 needs to talk about this; to me it seems straightforward enough in a WSGI context, and PEP 333 can't cover every possible bug someone might introduce into their middleware.

Ian
Re: [Web-SIG] HEAD requests, WSGI gateways, and middleware
Graham Dumpleton wrote:
> To quote, in 2 you said:
>> For a HEAD request, A WSGI gateway must not iterate through the response iterable
> I was presuming that this was saying that the WSGI gateway should do this as well as changing the REQUEST_METHOD actually sent to the WSGI application to GET.

I misstated it. It should be: "For a HEAD request, a WSGI gateway *may* skip iterating through the response iterable." That is, if the gateway can detect that the response entity isn't going to change the final set of headers in any way, it can skip the iteration.

> If Apache mod_wsgi (the WSGI gateway) did do this, i.e., didn't iterate through the iterable and therefore didn't pass the content through to Apache, it would, as explained, cause traditional Apache output filters to potentially yield incorrect results. This is what I am highlighting. So Apache mod_wsgi couldn't avoid processing the iterable unless, as you allude to with the way Apache internals are used to implement wsgi.file_wrapper support, mod_wsgi similarly detected when no Apache output filters that could add additional headers are registered, and skipped the processing.

Right, my idea was that mod_wsgi could implement a new bucket type, where the iteration is done if and only if some output filter reads from the bucket. If no output filters read from the bucket, then the iteration would never happen.

>>     def application(env, start_response):
>>         start_response("200 OK", [("Content-Length", "10000")])
>>         if env["REQUEST_METHOD"] == "HEAD":
>>             return []
>>         else:
>>             return [b"a" * 10000]
>>
>> I tested this in mod_wsgi and mod_wsgi gets it wrong. mod_wsgi sets env['REQUEST_METHOD'] to HEAD for HEAD requests.
> It just passes whatever Apache sets up as the CGI environment.
>> When mod_deflate is enabled, a HEAD request returns Content-Length: 20, and a GET request returns Content-Length: 46. However, it is supposed to be Content-Length: 46 in both cases.
> Is this with your sample application, which detects HEAD and doesn't return anything if it is found? In other words, is it driven by what your application is actually returning?

Yes, these results are from the program above. Those 10,000 A's compress down to 26 bytes, plus the 20-byte header. For the HEAD case, mod_deflate compresses 0 bytes to 0 bytes and adds a 20-byte header.

>> Note also that in mod_wsgi, use of wsgi.file_wrapper is a huge optimization for this: if no Apache output filters need the response entity, and wsgi.file_wrapper is used, then the file will never be read off the disk.
> Hmmm, I didn't actually look under the covers of what Apache did when I used its file bucket for that. Worked out better than I expected then. :-)

I will double-check, but I believe that in embedded mode the file never gets read at all when there are no output filters processing the output. I will bring it up on the mod_wsgi list.

> Except, as pointed out, 2 suggests I should never pass on content from the iterable for HEAD, whereas in practice I still have to if there are output filters.
> Pardon me if I am not understanding very well; I did not get much sleep last night because of the baby and my head hurts. :-(

Not your (or your daughter's) fault; I wrote something different from what I meant. I hope tonight is easier on you. Good luck!

Regards, Brian
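The relaxed rule ("may skip iterating") and the bucket idea can be approximated in Python with a lazily read body. LazyBody and finish_head_request are hypothetical names, and representing output filters as plain functions of (headers, body) is an assumption made only for illustration; this is not how Apache buckets or mod_wsgi actually work. The body is iterated only if some filter reads it, and close() is always called.

```python
class LazyBody:
    """Hypothetical analogue of the proposed bucket: iterate only on demand."""
    def __init__(self, iterable):
        self._iterable = iterable
        self._data = None  # filled on the first read

    def read(self):
        if self._data is None:
            self._data = b"".join(self._iterable)
        return self._data

    def close(self):
        close = getattr(self._iterable, "close", None)
        if close is not None:
            close()

def finish_head_request(status, headers, body_iterable, filters):
    """Hypothetical gateway step for HEAD under the 'may skip' rule."""
    lazy = LazyBody(body_iterable)
    try:
        for output_filter in filters:
            # Each filter may read the body and adjust the headers.
            headers = output_filter(headers, lazy)
    finally:
        lazy.close()
    return status, headers  # no entity is ever sent for HEAD

# Example filter that needs the entity, in the spirit of mod_deflate.
def length_filter(headers, body):
    return headers + [("Content-Length", str(len(body.read())))]
```

With an empty filter list the application's iterable is closed without being iterated; with length_filter present it is iterated exactly once, so HEAD and GET headers stay identical.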
Re: [Web-SIG] HEAD requests, WSGI gateways, and middleware
On 25/01/2008, Brian Smith [EMAIL PROTECTED] wrote:
> Graham Dumpleton wrote:
>> [...] So Apache mod_wsgi couldn't avoid processing the iterable unless [...] mod_wsgi similarly detected when no Apache output filters that could add additional headers are registered, and skipped the processing.
> Right, my idea was that mod_wsgi could implement a new bucket type, where the iteration is done if and only if some output filter reads from the bucket. If no output filters read from the bucket, then the iteration would never happen.

Unfortunately, as I think I mentioned on the mod_wsgi list previously, that may not be trivial. :-)

>> Pardon me if I am not understanding very well; I did not get much sleep last night because of the baby and my head hurts. :-(
> Not your (or your daughter's) fault; I wrote something different from what I meant.

Okay, clearer now.

> I hope tonight is easier on you. Good luck!

I hope so too. I am going home early now, but the boss will probably not allow me to read email for a couple of days until I am fully recovered, so you probably won't hear more from me on this issue. I certainly understand what you are saying and the potential need for it, so it will be interesting to see what the final consensus is.

Graham