[Web-SIG] wsgiorg.routing_args and original SCRIPT_NAME

2008-01-24 Thread Manlio Perillo
Hi.

I have implemented the wsgiorg.routing_args specification, using the 
code in the example.

However I have a problem, and I can't see a good solution.

Suppose that an application is mounted (embedded in a web server) at 
location /example.

The application script executed by the server simply sets up the 
routing, database connections, and so on.

Let's suppose that the request uri is /example/login/.

For the main application, SCRIPT_NAME is /example.
For the application at /login, SCRIPT_NAME is /example/login.


My problem is that, in the page generated by the login application, I want 
to return an anchor of the form /example/logout/.

The usual solution is to do environ['SCRIPT_NAME'] + '/logout', but this 
will return /example/login/logout/, and not /example/logout/.

This does not seem to be possible with the current specifications, since the 
original SCRIPT_NAME is lost.
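
A minimal sketch of the situation described above, assuming the router moves one path segment from PATH_INFO to SCRIPT_NAME the way the standard library helper wsgiref.util.shift_path_info does. The environ values are the ones from this message; the variable names are illustrative only.

```python
from wsgiref.util import shift_path_info

# Environ as seen by the main application mounted at /example.
environ = {"SCRIPT_NAME": "/example", "PATH_INFO": "/login/"}

# A router dispatching to the login application shifts one segment
# from PATH_INFO onto SCRIPT_NAME.
consumed = shift_path_info(environ)

# From here on the mount point /example is no longer available, so the
# usual SCRIPT_NAME-based link lands below the login application.
naive_logout = environ["SCRIPT_NAME"] + "/logout/"
```

After the shift, nothing in the environ records that the application as a whole was mounted at /example, which is the information the logout link needs.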

What is the best solution?

1) Do not change SCRIPT_NAME, and instead add a wsgiorg.consumed_path, a
list.

This means that the request URI reconstruction must be changed:
SCRIPT_NAME = SCRIPT_NAME + '/'.join(wsgiorg.consumed_path)

2) Store a wsgiorg.original_script_name, with the value seen by the
routing application.

3) Simply don't change SCRIPT_NAME and PATH_INFO.
However I usually need the updated PATH_INFO.
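
Option 1 could be sketched as follows. The wsgiorg.consumed_path key is only the name proposed in this message, not part of any published specification, and the sketch assumes it holds the consumed segments without slashes (for example ['login']).

```python
def reconstruct_path(environ):
    """Rebuild the request path when the router leaves SCRIPT_NAME untouched.

    Assumes a hypothetical 'wsgiorg.consumed_path' key holding the
    segments already consumed by routing, without slashes.
    """
    consumed = environ.get("wsgiorg.consumed_path", [])
    middle = "".join("/" + segment for segment in consumed)
    return environ.get("SCRIPT_NAME", "") + middle + environ.get("PATH_INFO", "")


environ = {
    "SCRIPT_NAME": "/example",              # never modified by the router
    "wsgiorg.consumed_path": ["login"],     # hypothetical key
    "PATH_INFO": "/",
}

request_path = reconstruct_path(environ)
logout_link = environ["SCRIPT_NAME"] + "/logout/"
```

The cost of this option is visible in reconstruct_path: every piece of code that rebuilds the request URI has to know about the extra key.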



Thanks  Manlio Perillo
___
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: 
http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com


Re: [Web-SIG] wsgiorg.routing_args and original SCRIPT_NAME

2008-01-24 Thread Phillip J. Eby
At 03:22 PM 1/24/2008 +0100, Manlio Perillo wrote:
 Let's suppose that the request uri is /example/login/.

 For the main application, SCRIPT_NAME is /example.
 For the application at /login, SCRIPT_NAME is /example/login.

 My problem is that I want, in the page generated by login application,
 return an anchor in the form /example/logout/.

 The usual solution is to do environ['SCRIPT_NAME'] + '/logout', but this
 will return /example/login/logout/, and not /example/logout/.

 This seems to be not possible with the current specifications, since the
 original SCRIPT_NAME is lost.

 What is the best solution?

 1) Do not change SCRIPT_NAME, and instead add a wsgiorg.consumed_path, a
  list.

  This means that the request uri reconstruction must be changed:
  SCRIPT_NAME = SCRIPT_NAME + '/'.join(wsgiorg.consumed_path)

 2) Store a wsgiorg.original_script_name, with the value seen by the
  routing application.

 3) Simply don't change SCRIPT_NAME and PATH_INFO.
  However I usually need the updated PATH_INFO.

4) Use a relative link, with href=logout.



Re: [Web-SIG] wsgiorg.routing_args and original SCRIPT_NAME

2008-01-24 Thread Manlio Perillo
Phillip J. Eby wrote:
 At 03:22 PM 1/24/2008 +0100, Manlio Perillo wrote:
 Let's suppose that the request uri is /example/login/.

 For the main application, SCRIPT_NAME is /example.
 For the application at /login, SCRIPT_NAME is /example/login.

 My problem is that I want, in the page generated by login application,
 return an anchor in the form /example/logout/.

 The usual solution is to do environ['SCRIPT_NAME'] + '/logout', but this
 will return /example/login/logout/, and not /example/logout/.

 This seems to be not possible with the current specifications, since the
 original SCRIPT_NAME is lost.

 What is the best solution?

 1) Do not change SCRIPT_NAME, and instead add a wsgiorg.consumed_path, a
 list.

 This means that the request uri recostruction must be changed:
 SCRIPT_NAME = SCRIPT_NAME + '/'.join(wsgiorg.consumed_path)

 2) Store a wsgiorg.original_script_name, with the value seen by the
 routing application.

 3) Simply don't change SCRIPT_NAME and PATH_INFO.
 However I usually need the updated PATH_INFO.
 
 4) Use a relative link, with href=logout.
 

But since the base url is /example/login/, this relative link is 
resolved to /example/login/logout/.




Manlio Perillo



Re: [Web-SIG] wsgiorg.routing_args and original SCRIPT_NAME

2008-01-24 Thread Sven Berkvens-Matthijsse
 Phillip J. Eby wrote:
  At 03:22 PM 1/24/2008 +0100, Manlio Perillo wrote:
  Let's suppose that the request uri is /example/login/.
 
  For the main application, SCRIPT_NAME is /example.
  For the application at /login, SCRIPT_NAME is /example/login.
 
  My problem is that I want, in the page generated by login application,
  return an anchor in the form /example/logout/.
 
  The usual solution is to do environ['SCRIPT_NAME'] + '/logout', but this
  will return /example/login/logout/, and not /example/logout/.
 
  This seems to be not possible with the current specifications, since the
  original SCRIPT_NAME is lost.
 
  What is the best solution?
 
  1) Do not change SCRIPT_NAME, and instead add a wsgiorg.consumed_path, a
  list.
 
  This means that the request uri recostruction must be changed:
  SCRIPT_NAME = SCRIPT_NAME + '/'.join(wsgiorg.consumed_path)
 
  2) Store a wsgiorg.original_script_name, with the value seen by the
  routing application.
 
  3) Simply don't change SCRIPT_NAME and PATH_INFO.
  However I usually need the updated PATH_INFO.
  
  4) Use a relative link, with href=logout.
  
 
 But since the base url is /example/login/, this relative link is 
 resolved to /example/login/logout/.

In that case, use href=../logout/.
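
A small sketch of how these relative links resolve, using urllib.parse.urljoin from the standard library. The host name example.org is an assumption made only so that the URLs are complete.

```python
from urllib.parse import urljoin

# Base URL of the login page, with a trailing slash.
base = "http://example.org/example/login/"

same_level = urljoin(base, "logout/")        # stays below /example/login/
parent_level = urljoin(base, "../logout/")   # climbs one level first

# The same relative link behaves differently if the page is served
# without the trailing slash.
no_slash = urljoin("http://example.org/example/login", "../logout/")
```

The last case shows why the relative link depends on the exact form of the request URI: without the trailing slash, "../logout/" resolves above /example.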

 Manlio Perillo

-- 
Sven


Re: [Web-SIG] wsgiorg.routing_args and original SCRIPT_NAME

2008-01-24 Thread Manlio Perillo
Sven Berkvens-Matthijsse wrote:
 Phillip J. Eby wrote:
 At 03:22 PM 1/24/2008 +0100, Manlio Perillo wrote:
 Let's suppose that the request uri is /example/login/.

 For the main application, SCRIPT_NAME is /example.
 For the application at /login, SCRIPT_NAME is /example/login.

 My problem is that I want, in the page generated by login application,
 return an anchor in the form /example/logout/.

 The usual solution is to do environ['SCRIPT_NAME'] + '/logout', but this
 will return /example/login/logout/, and not /example/logout/.

 This seems to be not possible with the current specifications, since the
 original SCRIPT_NAME is lost.

 What is the best solution?

 1) Do not change SCRIPT_NAME, and instead add a wsgiorg.consumed_path, a
 list.

 This means that the request uri recostruction must be changed:
 SCRIPT_NAME = SCRIPT_NAME + '/'.join(wsgiorg.consumed_path)

 2) Store a wsgiorg.original_script_name, with the value seen by the
 routing application.

 3) Simply don't change SCRIPT_NAME and PATH_INFO.
 However I usually need the updated PATH_INFO.
 4) Use a relative link, with href=logout.

 But since the base url is /example/login/, this relative link is 
 resolved to /example/login/logout/.
 
 In that case, use href=../logout/.
 


I would not call this a solution!
It's only a workaround.



Manlio Perillo




Re: [Web-SIG] URL quoting in WSGI (or the lack thereof)

2008-01-24 Thread Brian Smith
Ian Bicking wrote:

 We encountered it with GData too, as it uses URLs like 
 /{http:%2f%2fexample.com}term/.  But if you balance the {}'s 
 you can parse it out.

Unquoted curly braces are illegal in any kind of URI or IRI. Does GData
really require them to be unquoted?

- Brian



[Web-SIG] HEAD requests, WSGI gateways, and middleware

2008-01-24 Thread Brian Smith
My application correctly responds to HEAD requests as-is. However, it doesn't 
work with middleware that sets headers based on the content of the response 
body.

For example, a gateway or middleware that sets ETag based on a checksum, or sets 
Content-Encoding, Content-Length and/or Content-MD5, will produce wrong 
results by default. Right now, my applications assume that any such gateway or 
the first such middleware will change environ["REQUEST_METHOD"] from "HEAD" to 
"GET" before the application is invoked, and discard the response body that the 
application generates. 

However, many gateways and middleware do not do this, and PEP 333 doesn't have 
anything to say about it. As a result, a 100% WSGI 1.0-compliant application is 
not portable between gateways.

I suggest that a revision of PEP 333 should require the following behavior:

1. WSGI gateways must always set environ["REQUEST_METHOD"] to "GET" for HEAD 
requests. Middleware and applications will not be able to detect the difference 
between GET and HEAD requests.

2. For a HEAD request, a WSGI gateway must not iterate through the response 
iterable, but it must call the response iterable's close() method, if any. It 
must not send any output that was written via start_response(...).write() 
either. Consequently, WSGI applications must work correctly, and must not leak 
resources, when their output is not iterated; an application should not signal 
or log an error if the iterable's close() method is invoked without any 
iteration taking place.
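
The two rules above might look like this inside a gateway. This is a sketch only: run_head_request is a hypothetical helper, not part of any existing gateway, and it assumes the application calls start_response before returning (an application written as a generator function would not have called it yet).

```python
def run_head_request(application, environ):
    """Serve a HEAD request without consuming the response body.

    Hypothetical gateway helper. Assumes the application calls
    start_response before it returns its iterable.
    """
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = list(headers)
        # Rule 2: anything passed to write() is dropped.
        return lambda data: None

    # Rule 1: the application only ever sees GET.
    environ = dict(environ, REQUEST_METHOD="GET")
    result = application(environ, start_response)

    # Rule 2: no iteration over `result`, but close() is still called
    # so the application can release its resources.
    close = getattr(result, "close", None)
    if close is not None:
        close()
    return captured.get("status"), captured.get("headers")
```

The status and headers returned here are whatever the application declared up front; any header that can only be known after producing the body is outside what this sketch can deliver.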

Please add this issue to http://wsgi.org/wsgi/WSGI_2.0.

Regards,
Brian



Re: [Web-SIG] HEAD requests, WSGI gateways, and middleware

2008-01-24 Thread Chris McDonough
I have applications that do detect the difference between a GET and a HEAD 
(they do slightly less work if the request is a HEAD request), so I suspect this 
is not a totally reasonable thing to add to the spec.  Maybe the middleware 
that does what you're describing should be changed instead to deal with HEAD 
requests.

In general, I don't think there is (or should be) any guarantee that an arbitrary 
middleware stack will work with an arbitrary application.  Although that would 
be nice in theory, I suspect it would require a very complex protocol (more 
complex than what WSGI requires now).

- C

Brian Smith wrote:
 My application correctly responds to HEAD requests as-is. However, it doesn't 
 work with middleware that sets headers based on the content of the response 
 body.
 
 For example, a gateway or middleware that sets ETag based on an checksum, 
 Content-Encoding, Content-Length and/or Content-MD5 will all result in wrong 
 results by default. Right now, my applications assume that any such gateway 
 or the first such middleware will change environ[REQUEST_METHOD] from 
 HEAD to GET before the application is invoked, and discard the response 
 body that the application generates. 
 
 However, many gateways and middleware do not do this, and PEP 333 doesn't 
 have anything to say about it. As a result, a 100% WSGI 1.0-compliant 
 application is not portable between gateways.
 
 I suggest that a revision of PEP 333 should require the following behavior:
 
 1. WSGI gateways must always set environ[REQUEST_METHOD] to GET for HEAD 
 requests. Middleware and applications will not be able to detect the 
 difference between GET and HEAD requests.
 
 2. For a HEAD request, A WSGI gateway must not iterate through the response 
 iterable, but it must call the response iterable's close() method, if any. It 
 must not send any output that was written via start_response(...).write() 
 either. Consequently, WSGI applications must work correctly, and must not 
 leak resources, when their output is not iterated; an application should not 
 signal or log an error if the iterable's close() method is invoked without 
 any iteration taking place.
 
 Please add this issue to http://wsgi.org/wsgi/WSGI_2.0.
 
 Regards,
 Brian
 
 



Re: [Web-SIG] HEAD requests, WSGI gateways, and middleware

2008-01-24 Thread Manlio Perillo
Brian Smith wrote:
 My application correctly responds to HEAD requests as-is. However, it doesn't 
 work with middleware that sets headers based on the content of the response 
 body.
 
 For example, a gateway or middleware that sets ETag based on an checksum, 
 Content-Encoding, Content-Length and/or Content-MD5 will all result in wrong 
 results by default. Right now, my applications assume that any such gateway 
 or the first such middleware will change environ[REQUEST_METHOD] from 
 HEAD to GET before the application is invoked, and discard the response 
 body that the application generates. 
 
 However, many gateways and middleware do not do this, and PEP 333 doesn't 
 have anything to say about it. As a result, a 100% WSGI 1.0-compliant 
 application is not portable between gateways.
 
 I suggest that a revision of PEP 333 should require the following behavior:
 
 1. WSGI gateways must always set environ[REQUEST_METHOD] to GET for HEAD 
 requests. Middleware and applications will not be able to detect the 
 difference between GET and HEAD requests.
 

-1.

 2. For a HEAD request, A WSGI gateway must not iterate through the response 
 iterable, but it must call the response iterable's close() method, if any. It 
 must not send any output that was written via start_response(...).write() 
 either. Consequently, WSGI applications must work correctly, and must not 
 leak resources, when their output is not iterated; an application should not 
 signal or log an error if the iterable's close() method is invoked without 
 any iteration taking place.
 

This is done in the WSGI implementation for Nginx, as an example; and 
some time ago there was a discussion about this.

Moreover, if the response iterable is a generator, no iteration (and 
content generation) is done.

 Please add this issue to http://wsgi.org/wsgi/WSGI_2.0.
 
 Regards,
 Brian
 



Regards  Manlio Perillo


Re: [Web-SIG] URL quoting in WSGI (or the lack thereof)

2008-01-24 Thread Ian Bicking
Brian Smith wrote:
 Ian Bicking wrote:
 
 We encountered it with GData too, as it uses URLs like 
 /{http:%2f%2fexample.com}term/.  But if you balance the {}'s 
 you can parse it out.
 
 Unquoted curly braces are illegal in any kind of URI or IRI. Does GData
 really require them to be unquoted?

No, quoted is fine.  Of course parsing PATH_INFO I couldn't tell anyway ;)

   Ian


Re: [Web-SIG] HEAD requests, WSGI gateways, and middleware

2008-01-24 Thread Ian Bicking
Brian Smith wrote:
 My application correctly responds to HEAD requests as-is. However, it
 doesn't work with middleware that sets headers based on the content
 of the response body.
 
 For example, a gateway or middleware that sets ETag based on an
 checksum, Content-Encoding, Content-Length and/or Content-MD5 will
 all result in wrong results by default. Right now, my applications
 assume that any such gateway or the first such middleware will change
 environ[REQUEST_METHOD] from HEAD to GET before the application
 is invoked, and discard the response body that the application
 generates.

Then the middleware is just wrong.  It shouldn't overwrite ETag values 
generated by the application, and if it is set to generate ETags from 
hashes of the content then it should change HEAD to GET.
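
A sketch of middleware that behaves as described above. ETagMiddleware is a hypothetical name, SHA-256 is an arbitrary choice of hash, and the whole body is buffered in memory for simplicity, which a production middleware would probably want to avoid.

```python
import hashlib


class ETagMiddleware:
    """Hypothetical middleware deriving an ETag from the response body.

    It needs the body, so it shows the application a GET even when the
    client sent HEAD, and drops the body again for HEAD responses.
    An ETag already set by the application is left alone.
    """

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        is_head = environ.get("REQUEST_METHOD") == "HEAD"
        if is_head:
            environ = dict(environ, REQUEST_METHOD="GET")

        captured = {}
        chunks = []

        def capture(status, headers, exc_info=None):
            captured["status"] = status
            captured["headers"] = list(headers)
            return chunks.append  # write() callable: buffer the data

        result = self.app(environ, capture)
        try:
            chunks.extend(result)
        finally:
            close = getattr(result, "close", None)
            if close is not None:
                close()

        body = b"".join(chunks)
        headers = captured["headers"]
        if not any(name.lower() == "etag" for name, _ in headers):
            digest = hashlib.sha256(body).hexdigest()
            headers.append(("ETag", '"%s"' % digest))

        start_response(captured["status"], headers)
        return [] if is_head else [body]
```

With this arrangement the application never has to know whether the client sent HEAD, and the headers for HEAD and GET are computed from the same body.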

 However, many gateways and middleware do not do this, and PEP 333
 doesn't have anything to say about it. As a result, a 100% WSGI
 1.0-compliant application is not portable between gateways.

Nothing in WSGI says that all middleware is sensible or correct.  In 
this case it just seems like there's a bad middleware involved that 
isn't respecting basic HTTP semantics.  WSGI doesn't specify HTTP 
semantics but of course they are a basic foundation for any kind of 
interaction, and it's assumed they'll be respected.

   Ian



Re: [Web-SIG] wsgiorg.routing_args and original SCRIPT_NAME

2008-01-24 Thread Ian Bicking
Manlio Perillo wrote:
 I have implemented the wsgiorg.routing_args specification, using the 
 code in the example.
 
 However I have a problem, and I can't see a good solution.
 
 Suppose that an application is mounted (embedded in a web server) at 
 location /example.
 
 The application script executed by the server simply setups the 
 routings, database connections and so.
 
 Let's suppose that the request uri is /example/login/.
 
 For the main application, SCRIPT_NAME is /example.
 For the application at /login, SCRIPT_NAME is /example/login.
 
 
 My problem is that I want, in the page generated by login application, 
 return an anchor in the form /example/logout/.
 
 The usual solution is to do environ['SCRIPT_NAME'] + '/logout', but this 
 will return /example/login/logout/, and not /example/logout/.
 
 This seems to be not possible with the current specifications, since the 
 original SCRIPT_NAME is lost.
 
 What is the best solution?
 
 1) Do not change SCRIPT_NAME, and instead add a wsgiorg.consumed_path, a
 list.
 
 This means that the request uri recostruction must be changed:
 SCRIPT_NAME = SCRIPT_NAME + '/'.join(wsgiorg.consumed_path)

I suppose you could leave stuff on PATH_INFO.  But that doesn't seem to 
fit with the idea of PATH_INFO.  Also, will it be strictly 
SCRIPT_NAME/consumed_path/PATH_INFO, or could it be 
SCRIPT_NAME/consumed_path/some_other_parsing/consumed_path/PATH_INFO -- 
after all, there's cases where stuff gets pushed from PATH_INFO to 
SCRIPT_NAME, and if consumed_path is in between, which one do you push 
stuff to?

 2) Store a wsgiorg.original_script_name, with the value seen by the
 routing application.

I guess I usually do something like this, typically storing 
myapp.base_path for use when I am generating application-absolute URLs 
(like /logout).  Then at the first chance (before running any kind of 
routing) I do environ['myapp.base_path'] = environ['SCRIPT_NAME'].
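
That ad hoc approach might be wrapped up as follows. The key myapp.base_path comes from the paragraph above; the class and function names are hypothetical, and using setdefault so that the outermost wrapper wins is an assumption about what is wanted when applications are nested.

```python
class RecordBasePath:
    """Store SCRIPT_NAME under an application-specific key before routing."""

    def __init__(self, app, key="myapp.base_path"):
        self.app = app
        self.key = key

    def __call__(self, environ, start_response):
        # setdefault keeps the outermost value if the wrapper is nested.
        environ.setdefault(self.key, environ.get("SCRIPT_NAME", ""))
        return self.app(environ, start_response)


def app_url(environ, path, key="myapp.base_path"):
    """Build an application-absolute URL such as /example/logout/."""
    return environ[key] + path
```

Inside the login application, app_url(environ, "/logout/") then yields /example/logout/ no matter how many segments routing has since moved onto SCRIPT_NAME.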

This ad hoc technique works fine, but is very ad hoc.  I'm not sure what 
the best way to handle this is, really.  I'm not sure there's a singular 
root for an entire request, if you are nesting applications, so a single 
key (wsgiorg.original_script_name) doesn't seem quite right.

I can't remember what Routes does for URL generation.  Maybe it leaves 
SCRIPT_NAME alone?  I think so.

   Ian


Re: [Web-SIG] HEAD requests, WSGI gateways, and middleware

2008-01-24 Thread Graham Dumpleton
On 25/01/2008, Brian Smith wrote:
 1. WSGI gateways must always set environ[REQUEST_METHOD] to GET for HEAD 
 requests. Middleware and applications will not be able to detect the 
 difference between GET and HEAD requests.

 2. For a HEAD request, A WSGI gateway must not iterate through the response 
 iterable, but it must call the response iterable's close() method, if any. It 
 must not send any output that was written via start_response(...).write() 
 either. Consequently, WSGI applications must work correctly, and must not 
 leak resources, when their output is not iterated; an application should not 
 signal or log an error if the iterable's close() method is invoked without 
 any iteration taking place.

 Please add this issue to http://wsgi.org/wsgi/WSGI_2.0.

This would go against how things are done with Apache and could cause
Apache to generate incorrect response headers for a HEAD request.

The issue here is that Apache has its own output filtering system
where filters can set headers based on the actual content. Because of
this, any output filter must always receive the response content
regardless of whether the request is a GET or HEAD. If an application
handler tries to optimise things and not return the content, then
these output filters may generate different headers for a HEAD request
than a GET request, thereby violating the requirement that they should
actually be the same.

Note that response content is still thrown away for a HEAD request, it
is just done at the very last moment after all Apache output filters
have processed the data.

Graham


Re: [Web-SIG] HEAD requests, WSGI gateways, and middleware

2008-01-24 Thread Brian Smith
Chris McDonough wrote:
 I have applications that do detect the difference between a 
 GET and a HEAD (they do slightly less work if the request is 
 a HEAD request), so I suspect this is not a totally 
 reasonable thing to add to the spec.

Yes, of course. In order to avoid doing unnecessary work for a HEAD
request, the extra work needs to be transferred to the response
iterable; for a HEAD request, the gateway would skip the iterable except
for its close() method, and so all the extra work is skipped as well.
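
A sketch of what transferring the work to the response iterable could look like. expensive_render and work_log are stand-ins invented for this example; the point is only that the costly step runs when the gateway iterates, not when the application is called.

```python
work_log = []


def expensive_render():
    """Stand-in for costly body generation (hypothetical)."""
    work_log.append("rendered")
    return b"<html>...</html>"


def application(environ, start_response):
    # Headers are cheap here, so they are sent eagerly.
    start_response("200 OK", [("Content-Type", "text/html")])

    def body():
        # Runs only if the gateway actually iterates over the result.
        yield expensive_render()

    return body()
```

A gateway that only calls close() on the returned generator never triggers expensive_render, while a gateway that iterates gets the full body as usual.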

 Maybe the middleware that does what you're describing should be changed
 instead to deal with HEAD requests.

I agree. But, this problem is often overlooked by middleware, which
indicates that we at least need an explanation of the problem in the
specification. But, when the middleware are corrected, then applications
like yours will only work efficiently if they transfer the extra work
they do for GET (vs. HEAD) requests to the response iterable.

 In general, I don't think there is (or should be) any guarantee 
 that an arbitrary middleware stack will work with an 
 arbitrary application.  Although that would be nice in 
 theory, I suspect it would require a very complex protocol 
 (more complex than what WSGI requires now).

That is exactly what WSGI is designed for. There is no point to having a
standard if there is no interoperability amongst compliant components.

- Brian



Re: [Web-SIG] HEAD requests, WSGI gateways, and middleware

2008-01-24 Thread Brian Smith
Graham Dumpleton wrote:
 The issue here is that Apache has its own output filtering 
 system where filters can set headers based on the actual 
 content. Because of this, any output filter must always 
 receive the response content regardless of whether the 
 request is a GET or HEAD. If an application handler tries to 
 optimise things and not return the content, then these output 
 filters may generate different headers for a HEAD request 
 than a GET request, thereby violating the requirement that 
 they should actually be the same.
 
 Note that response content is still thrown away for a HEAD 
 request, it is just done at the very last moment after all 
 Apache output filters have processed the data.

Right, that is exactly what I am saying. In Apache's documentation, it
says that every handler should include the response entity for HEAD
requests, so that output filters can process the output. However, there
is nothing in PEP 333 that talks about this behavior. So, the only
reasonable thing to do is to assume that, when environ["REQUEST_METHOD"]
== "HEAD", no response entity should be generated. Do we all agree that
the following application is correct?:

    def application(env, start_response):
        start_response("200 OK",
                       [("Content-Length", "10000")])
        if env["REQUEST_METHOD"] == "HEAD":
            return []
        else:
            return ["a"*10000]

Because of web servers' output filters, if the WSGI gateway is a web
server module or a [Fast]CGI script, then it needs to lie and tell the
application that the request is a GET, not a HEAD. Otherwise, the
application will see that the request method is HEAD and suppress its
own response entity, as the HTTP specification requires, and the output
filters will fail. The only time it is reasonable for the gateway to
pass "HEAD" as the request method is when it knows that there are no
output filters or middleware that depend on the response entity.
Usually that is only possible in standalone web servers like CherryPy's
or Paste's.

I tested this in mod_wsgi and mod_wsgi gets it wrong. mod_wsgi sets
env["REQUEST_METHOD"] to "HEAD" for HEAD requests. When mod_deflate is
enabled, a HEAD request returns Content-Length: 20, and a GET request
returns Content-Length: 46. However, it is supposed to be
Content-Length: 46 in both cases. The CGI WSGI gateway in PEP 333 gets
it wrong too when mod_deflate is used.

Note also that in mod_wsgi, use of wsgi.file_wrapper is a huge
optimization for this: if no Apache output filters need the response
entity, and wsgi.file_wrapper is used, then the file will never be read
off the disk. But, if wsgi.file_wrapper is not used, then the entire
file has to be read off the disk through the application's output
iterable for no reason. It would be nice if the non-file_wrapper case
worked as well as the file_wrapper case.

If you put all this together, you end up with the rules that I outlined
in my previous message:

 1. WSGI gateways must always set environ["REQUEST_METHOD"] to
"GET" for HEAD requests. Middleware and applications will
not be able to detect the difference between GET and HEAD 
requests.

 2. For a HEAD request, A WSGI gateway must not iterate
through the response iterable, but it must call the
response iterable's close() method, if any. It must not
send any output that was written via
start_response(...).write() either. Consequently,
WSGI applications must work correctly, and must not
leak resources, when their output is not iterated;
an application should not signal or log an error if
the iterable's close() method is invoked without any
iteration taking place.

- Brian



Re: [Web-SIG] HEAD requests, WSGI gateways, and middleware

2008-01-24 Thread Ian Bicking
Brian Smith wrote:
 Graham Dumpleton wrote:
 The issue here is that Apache has its own output filtering 
 system where filters can set headers based on the actual 
 content. Because of this, any output filter must always 
 receive the response content regardless of whether the 
 request is a GET or HEAD. If an application handler tries to 
 optimise things and not return the content, then these output 
 filters may generate different headers for a HEAD request 
 than a GET request, thereby violating the requirement that 
 they should actually be the same.

 Note that response content is still thrown away for a HEAD 
 request, it is just done at the very last moment after all 
 Apache output filters have processed the data.
 
 Right, that is exactly what I am saying. In Apache's documentation, it
 says that every handler should include the response entity for HEAD
 requests, so that output filters can process the output. However, there
 is nothing in PEP 333 that talks about this behavior. 

Unlike Apache there are no output filters in WSGI; all middleware gets 
to adjust the request as well as the response.  So middleware that can't 
handle a real HEAD request has an opportunity to turn it into a GET 
request.  I don't see why PEP 333 needs to talk about this, to me it 
seems straight forward enough in a WSGI context, and PEP 333 can't cover 
every possible bug someone might introduce into their middleware.

   Ian


Re: [Web-SIG] HEAD requests, WSGI gateways, and middleware

2008-01-24 Thread Brian Smith
Graham Dumpleton wrote:
 To quote, in 2 you said:
 
 For a HEAD request, A WSGI gateway must not iterate 
 through the response iterable
 
 I was presuming that this was saying that the WSGI gateway 
 should do this as well as changing the REQUEST_METHOD 
 actually sent to the WSGI application to GET.

I misstated it. It should be "For a HEAD request, a WSGI gateway *may*
skip iterating through the response iterable." That is, if the gateway
can detect that the response entity isn't going to change the final set
of headers in any way, it can skip the iteration.

 If Apache mod_wsgi (the WSGI gateway) does then do this, ie., 
 didn't iterate through the iterable and therefore didn't 
 return the content through to Apache, it would as explained 
 cause traditional Apache output filters to potentially yield 
 incorrect results. This is what I am highlighting.
 
 So Apache mod_wsgi couldn't avoid processing the iterable, 
 unless as you allude to with how internals of how Apache is 
 used to implement wsgi.file_wrapper support, that mod_wsgi 
 similarly detected when no Apache output filters are 
 registered that could add additional headers and skip the processing.

Right, my idea was that mod_wsgi could implement a new bucket type,
where the iteration is done if and only if some output filter reads from
the bucket. But, if no output filters read from the bucket, then the
iteration would never happen.

  def application(env, start_response):
      start_response("200 OK",
                     [("Content-Length", "10000")])
      if env["REQUEST_METHOD"] == "HEAD":
          return []
      else:
          return ["a"*10000]
 
  I tested this in mod_wsgi and mod_wsgi gets it wrong. mod_wsgi sets 
  env[REQUEST_METHOD] to HEAD for HEAD requests.
 
 It just passes whatever Apache sets up as the CGI environment.
 
  When mod_deflate is
  enabled, a HEAD request returns Content-Length: 20, and a GET 
  request returns Content-Length: 46. However, it is supposed to be
  Content-Length: 46 in both cases.
 
 Is this with your sample application which detects HEAD and 
 doesn't return anything if it is found. In other words, it is 
 driven by what your application is actually returning?

Yes, these results are from the program above. Those 10,000 A's compress
down to 26 bytes, plus the 20 byte header. For the HEAD case,
mod_deflate compresses 0 bytes to 0 bytes and adds a 20 byte header.
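
The effect can be reproduced in outline without Apache. The sketch below uses gzip from the standard library as a stand-in for a compressing output filter; this is an assumption, and the exact byte counts depend on the compressor and its settings, so they need not match the figures reported for mod_deflate.

```python
import gzip

full_body = b"a" * 10000   # what the application returns for GET
empty_body = b""           # what it returns when it sees HEAD

# A compressing output filter sets Content-Length from whatever it is given.
get_length = len(gzip.compress(full_body))
head_length = len(gzip.compress(empty_body))

# The two lengths differ, so HEAD and GET would advertise different
# Content-Length values for the same resource.
lengths_match = (get_length == head_length)
```

Feeding the filter the full body for both methods, and discarding it only at the very end for HEAD, is what keeps the two sets of headers identical.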

  Note also that in mod_wsgi, use of wsgi.file_wrapper is a huge 
  optimization for this: if no Apache output filters need the 
  response entity, and wsgi.file_wrapper is used, then the file
  will never be read off the disk.
 
 Hmmm, I didn't actually look under the covers of what Apache 
 did when I used its file bucket for that. Worked out better 
 than I expected then. :-)

I will double-check, but I believe that in the embedded mode, the file
never gets read at all, when there are no output filters processing the
output. I will bring it up on the mod_wsgi list. 

 Except as pointed out that 2 suggests I should never pass on 
 content from iterable for HEAD, where in practice I still 
 have to if there are output filters.

 Pardon me if I am not understanding very well, I did not get 
 much sleep last night because of baby and my head hurts. :-(

Not your (or your daughter's) fault; I wrote something different from
what I meant. I hope tonight is easier on you. Good luck!

Regards,
Brian



Re: [Web-SIG] HEAD requests, WSGI gateways, and middleware

2008-01-24 Thread Graham Dumpleton
On 25/01/2008, Brian Smith wrote:
 Graham Dumpleton wrote:
  If Apache mod_wsgi (the WSGI gateway) does then do this, ie.,
  didn't iterate through the iterable and therefore didn't
  return the content through to Apache, it would as explained
  cause traditional Apache output filters to potentially yield
  incorrect results. This is what I am highlighting.
 
  So Apache mod_wsgi couldn't avoid processing the iterable,
  unless as you allude to with how internals of how Apache is
  used to implement wsgi.file_wrapper support, that mod_wsgi
  similarly detected when no Apache output filters are
  registered that could add additional headers and skip the processing.

 Right, my idea was that mod_wsgi could implement a new bucket type,
 where the iteration is done if and only if some output filter reads from
 the bucket. But, if no output filters read from the bucket, then the
 iteration would never happen.

Unfortunately as I think I mentioned on mod_wsgi list previously, that
may not be trivial. :-)

  Pardon me if I am not understanding very well, I did not get
  much sleep last night because of baby and my head hurts. :-(

 Not your (or your daughter's) fault; I wrote something different from
 what I meant.

Okay, clearer now.

 I hope tonight is easier on you. Good luck!

I hope so too. Am going home early now, but the boss will probably not
allow me to read email for a couple of days until I am fully
recovered, so you'll probably not hear from me more on this issue. I
certainly understand what you are saying and the potential need for
it, so will be interesting to see what final consensus is.

Graham