[Web-SIG] Prototype of wsgi.input.readline().

2008-01-30 Thread Graham Dumpleton
As I think we all know, no one implements readline() for wsgi.input as
defined in the WSGI specification. The reason for this is that stuff
like cgi.FieldStorage would refuse to work and would just generate an
exception. This is because cgi.FieldStorage expects to pass an
argument to readline().

So, although this is linked in the issues list for possible amendments
to the WSGI specification, there hasn't, as far as I recall, been a
discussion on how readline() would be defined in any amendment or
future version.

In particular, would the specification be changed to either:

1. readline(size) where size argument is mandatory, or:

2. readline(size=-1) where size argument is optional.

If the size argument is made mandatory, then it would parallel how
the read() function is defined, but this in itself would mean
cgi.FieldStorage would break.

This is because cgi.FieldStorage actually calls readline() with no
argument as well as an argument in different places in the code.

If we allow the argument to be optional however, we run into the same
portability problems that would exist with some WSGI adapters which do
not simulate EOF on input when all request content is read.

Specifically, if user code calls readline() with no argument but the
last line of the file wasn't terminated with an EOL, then it would
hang.

As it is, cgi.FieldStorage only works on systems which do not simulate
EOF because the content format it is decoding has its own concept of
end of stream marker and cgi.FieldStorage implementation specifically
looks for that. The cgi.FieldStorage implementation certainly doesn't
track how much input it has read in and progressively change the size
argument to readline() on that basis.

Any other code which uses readline() with no argument would similarly
have to depend on some concept of an end of stream marker in the
content, because one can't rely on getting an empty string when input
is exhausted.
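
To make the hang concrete, here is a minimal sketch of how an adapter
might simulate EOF by capping every read at CONTENT_LENGTH. The class
name LimitedInput is hypothetical and not taken from any particular
WSGI adapter; the sketch assumes the underlying stream returns bytes
and that the content length is known up front.

```python
import io


class LimitedInput:
    """Hypothetical wrapper: returns b"" once content_length bytes are consumed."""

    def __init__(self, stream, content_length):
        self._stream = stream
        self._remaining = max(int(content_length or 0), 0)

    def read(self, size=-1):
        if size is None or size < 0 or size > self._remaining:
            size = self._remaining
        if size == 0:
            return b""
        data = self._stream.read(size)
        self._remaining -= len(data)
        return data

    def readline(self, size=-1):
        # Never ask the underlying stream for more than what is left, so an
        # unterminated last line cannot block waiting for an EOL.
        if size is None or size < 0 or size > self._remaining:
            size = self._remaining
        if size == 0:
            return b""
        line = self._stream.readline(size)
        self._remaining -= len(line)
        return line


# The connection may hold more bytes than this request body; only 11 belong here.
raw = io.BytesIO(b"a=1\nb=2\nc=3GET /next")
body = LimitedInput(raw, 11)
lines = []
while True:
    line = body.readline()
    if not line:
        break
    lines.append(line)
```

With a wrapper along these lines, readline() with no argument ends with
an empty string even though the last line has no EOL.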

In some respects this highlights the inconsistency of the read()
argument not being optional. One of the reasons for not allowing the
read() argument to be optional is that it would be problematic for
implementations that do not simulate EOF, yet the same issue exists
with readline(), and an optional argument has to be allowed there
because of how cgi.FieldStorage is implemented.

Graham
___
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: 
http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com


Re: [Web-SIG] Prototype of wsgi.input.readline().

2008-01-30 Thread Chris McDonough
Graham Dumpleton wrote:
 As I think we all know, no one implements readline() for wsgi.input as
 defined in the WSGI specification. The reason for this is that stuff
 like cgi.FieldStorage would refuse to work and would just generate an
 exception. This is because cgi.FieldStorage expects to pass an
 argument to readline().

I haven't been keeping up on the issues this has caused wrt WSGI, but note that 
the reason that cgi.FieldStorage passes a size argument to readline is in order 
to prevent memory exhaustion when reading files that don't have any linebreaks 
(denial of service).  See http://bugs.python.org/issue1112549 .
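
As a rough illustration of why the size argument matters (the helper name
and the 64 KiB cap below are assumptions made for this example, not values
taken from the cgi module), a reader that always passes a size never holds
more than one bounded chunk at a time, even when the input has no
linebreaks. The sketch assumes the stream signals EOF with an empty string.

```python
import io

MAX_LINE = 1 << 16  # assumed cap for this sketch


def iter_bounded_lines(stream, max_line=MAX_LINE):
    """Yield chunks of at most max_line bytes, ending at a newline when one is found."""
    while True:
        chunk = stream.readline(max_line)
        if not chunk:
            return
        yield chunk


# A body with no line breaks at all is still consumed in bounded pieces.
body = io.BytesIO(b"x" * 200000)
sizes = [len(chunk) for chunk in iter_bounded_lines(body)]
```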

 
 So, although this is linked in the issues list for possible amendments
 to WSGI specification, there hasn't that I recall been a discussion on
 how readline() would be defined in any amendment or future version.
 
 In particular, would the specification be changed to either:
 
 1. readline(size) where size argument is mandatory, or:
 
 2. readline(size=-1) where size argument is optional.
 
 If the size argument is made mandatory, then it would parallel how
 read() function is defined, but this in itself would mean
 cgi.FieldStorage would break.
 
 This is because cgi.FieldStorage actually calls readline() with no
 argument as well as an argument in different places in the code.

cgi.FieldStorage doesn't call readline() without an argument. 
cgi.parse_multipart does, but this function is not used by cgi.FieldStorage.  I 
don't know if this changes anything.

- C



Re: [Web-SIG] HTTP 1.1 Expect/Continue handling

2008-01-30 Thread Brian Smith
Graham Dumpleton wrote:
 On 29/01/2008, James Y Knight [EMAIL PROTECTED] wrote:
 a) One is to clarify this as a requirement upon the WSGI gateway.
  Something like the following:
  If the client requests Expect: 100-continue, and the application 
  yields data before reading from the input, and the response 
  code is a success (2xx) code, then the gateway MUST send a
  100 continue response, before writing any other response headers
  in order to comply with RFC 2616 §8.2.3 and to allow the WSGI
  application to read from the input stream later on in request
  processing.

This requirement goes too far. I think the part of the specification
that says the server must not perform the requested operation is
over-reaching. It fails to consider the case where the server can
successfully perform the operation without reading the request body. For
example, consider a TOUCH method that updates the ETag and Last-Modified
date of a resource. Or, a DELETE (a DELETE request shouldn't have a
request body, but should the server really be required to check for one
and refuse to delete the resource if it finds one?).

The WSGI gateway MAY send a 100 continue response in this situation, but
it shouldn't be required to. If the application wants the stricter
semantics then it should be coded to handle it.

  This should handle most real-world cases. Now, only sending 
  100 when the response code is 2xx may be potentially a bit
  fragile, and won't help e.g. your dummy app above.
  (maybe some real app really did want the input data even
  for an error response too?). But, on the other hand, you
  really *don't* want to force the transmission of a 100 
  continue when the server is sending e.g. a 400 Bad 
  Request response and likely won't ever read input data.

Exactly, if you always send 100 continue then you defeat the purpose of
it entirely. I would like to see the specification revised so that it is
obvious that my example program is invalid when an Expect: 100-continue
request header is present.

  b) Alternatively, the WSGI gateway could raise an exception 
  when you attempt to respond with a success code without having
  read the input.

For the same reasons I mentioned above, this is too strict. 

  c) Another option is to clarify this as a requirement for a WSGI
  application: An application must not read from wsgi.input after 
  yielding its first non-empty string unless it has already read from 
  wsgi.input before having yielded its first non-empty string.

This is the requirement that I want to see. But, I prefer to have it
qualified with "when environ['HTTP_EXPECT'] contains the '100-continue'
token".
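
A minimal sketch of that qualification follows. The function name is
hypothetical; it assumes the Expect header may carry a comma-separated
list and that the token is compared case-insensitively.

```python
def expects_continue(environ):
    """Return True if the Expect request header carries the 100-continue token."""
    value = environ.get("HTTP_EXPECT", "")
    tokens = [token.strip().lower() for token in value.split(",")]
    return "100-continue" in tokens
```

An application could consult this before deciding whether it is still
allowed to touch wsgi.input after yielding its first non-empty string.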

  (environ['wsgi.input'].read(0) may be used to indicate the 
  desire to read the input in the future and satisfy this
  requirement, without actually reading any data.)

Nice in theory, but if the specification is going to change to support
this, I would rather see the specification change to allow the
application to generate its own 100 continue response.

 A clarification in the specification may be required to the 
 extent of saying that where a zero length read is done, that 
 no WSGI middleware which wraps wsgi.input, nor even the WSGI 
 adapter itself may optimise it away. In other words a zero 
 length read must always be passed through unless specifically 
 not appropriate for what the WSGI middleware is doing.

 This would be required to ensure that zero length read always 
 propagates down to the web server layer itself such that it 
 may trigger the 100-continue.

The statement "An application must not read from wsgi.input after..."
would already apply to middleware, because middleware are applications.
If the middleware causes no request data to be read, it should not be
required to cause a 100 continue to be sent.

- Brian



[Web-SIG] Prohibiting reading from wsgi.input in an application iterable's close method

2008-01-30 Thread Brian Smith
I would like to see the following requirement added to the WSGI specification:

An application may only call methods on environ['wsgi.input'] before it returns
its response iterable, or from within an execution of its iterable's next()
method. In particular, the application iterable's close() method MUST NOT read
from wsgi.input.
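
One way such a rule could be checked during development is sketched below.
All names here (GuardedInput, GuardedIterable, guard_input) are hypothetical
and not part of any WSGI server, and the code is written for Python 3, where
the iterator method is __next__() rather than next().

```python
class GuardedInput:
    """Hypothetical wsgi.input wrapper that refuses reads once close() has begun."""

    def __init__(self, stream):
        self._stream = stream
        self.closing = False

    def _check(self):
        if self.closing:
            raise RuntimeError("wsgi.input used from the iterable's close()")

    def read(self, size=-1):
        self._check()
        return self._stream.read(size)

    def readline(self, size=-1):
        self._check()
        return self._stream.readline(size)


class GuardedIterable:
    """Wraps the application iterable so the guard is armed during close()."""

    def __init__(self, iterable, guard):
        self._iterable = iterable
        self._iterator = iter(iterable)
        self._guard = guard

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._iterator)

    def close(self):
        # From here on, any read on the wrapped input raises.
        self._guard.closing = True
        close = getattr(self._iterable, "close", None)
        if close is not None:
            close()


def guard_input(app):
    """Middleware sketch: install the guard around an application."""

    def wrapper(environ, start_response):
        guard = GuardedInput(environ["wsgi.input"])
        environ["wsgi.input"] = guard
        return GuardedIterable(app(environ, start_response), guard)

    return wrapper
```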

Thoughts?

- Brian



Re: [Web-SIG] Prototype of wsgi.input.readline().

2008-01-30 Thread Graham Dumpleton
On 31/01/2008, Chris McDonough [EMAIL PROTECTED] wrote:
 Graham Dumpleton wrote:
  As I think we all know, no one implements readline() for wsgi.input as
  defined in the WSGI specification. The reason for this is that stuff
  like cgi.FieldStorage would refuse to work and would just generate an
  exception. This is because cgi.FieldStorage expects to pass an
  argument to readline().

 I haven't been keeping up on the issues this has caused wrt WSGI, but note
 that the reason that cgi.FieldStorage passes a size argument to readline is
 in order to prevent memory exhaustion when reading files that don't have any
 linebreaks (denial of service).  See http://bugs.python.org/issue1112549 .

The interesting comment in that bug is:

The input data
is not required by the RFC 822/1521/1522/1867
specifications to contain any newline characters.

If that can occur, then a WSGI adapter which didn't simulate EOF would
fail in that the read would block and never return. All the more
reason that simulating EOF needs to be mandatory.

  So, although this is linked in the issues list for possible amendments
  to WSGI specification, there hasn't that I recall been a discussion on
  how readline() would be defined in any amendment or future version.
 
  In particular, would the specification be changed to either:
 
  1. readline(size) where size argument is mandatory, or:
 
  2. readline(size=-1) where size argument is optional.
 
  If the size argument is made mandatory, then it would parallel how
  read() function is defined, but this in itself would mean
  cgi.FieldStorage would break.
 
  This is because cgi.FieldStorage actually calls readline() with no
  argument as well as an argument in different places in the code.

 cgi.FieldStorage doesn't call readline() without an argument.
 cgi.parse_multipart does, but this function is not used by cgi.FieldStorage.
 I don't know if this changes anything.

Not really, I should have said 'cgi' module as a whole rather than
specifically cgi.FieldStorage. Given that people might be using
cgi.parse_multipart in standard CGI, there would probably still be an
expectation that it worked for WSGI. We can't really say that you can
use cgi.FieldStorage but not cgi.parse_multipart. People will just
expect all the normal tools people would use for this to work.

Graham


Re: [Web-SIG] Prototype of wsgi.input.readline().

2008-01-30 Thread Chris McDonough
Graham Dumpleton wrote:
 

 If the size argument is made mandatory, then it would parallel how
 read() function is defined, but this in itself would mean
 cgi.FieldStorage would break.

 This is because cgi.FieldStorage actually calls readline() with no
 argument as well as an argument in different places in the code.
 cgi.FieldStorage doesn't call readline() without an argument.
 cgi.parse_multipart does, but this function is not used by cgi.FieldStorage. 
  I
 don't know if this changes anything.
 
 Not really, I should have said 'cgi' module as a whole rather than
 specifically cgi.FieldStorage. Given that people might be using
 cgi.parse_multipart in standard CGI, there would probably still be an
 expectation that it worked for WSGI. We can't really say that you can
 use cgi.FieldStorage but not cgi.parse_multipart. People will just
 expect all the normal tools people would use for this to work.

Personally, I think parse_multipart should go away.  It's not suitable for 
anything but toy usage.

If people use it, and they expose their site to the world, arbitrary anonymous 
visitors can cause their Python process's size to grow arbitrarily.  I don't 
think any existing well-known framework uses it, for this very reason.

If it can't go away, and there's a problem due to the non-parity between 
parse_multipart's use and FieldStorage's use, I suspect the right answer is to 
change cgi.parse_multipart to pass in a size value for readline too.  I probably 
should have done that when I made the patch. :-(

- C


Re: [Web-SIG] Prototype of wsgi.input.readline().

2008-01-30 Thread Graham Dumpleton
On 31/01/2008, Chris McDonough [EMAIL PROTECTED] wrote:
 Graham Dumpleton wrote:
 
 
  If the size argument is made mandatory, then it would parallel how
  read() function is defined, but this in itself would mean
  cgi.FieldStorage would break.
 
  This is because cgi.FieldStorage actually calls readline() with no
  argument as well as an argument in different places in the code.
  cgi.FieldStorage doesn't call readline() without an argument.
  cgi.parse_multipart does, but this function is not used by 
  cgi.FieldStorage.  I
  don't know if this changes anything.
 
  Not really, I should have said 'cgi' module as a whole rather than
  specifically cgi.FieldStorage. Given that people might be using
  cgi.parse_multipart in standard CGI, there would probably still be an
  expectation that it worked for WSGI. We can't really say that you can
  use cgi.FieldStorage but not cgi.parse_multipart. People will just
  expect all the normal tools people would use for this to work.

 Personally, I think parse_multipart should go away.  It's not suitable for
 anything but toy usage.

Not necessarily. Someone may see it as a trade off. The code itself says:

This is easy to use but not
much good if you are expecting megabytes to be uploaded -- in that case,
use the FieldStorage class instead which is much more flexible.

So the comment implies it is easier to use, and some may think it is
simpler for what they are doing if they are only dealing with small
requests.

Of course, if you know your requests are always going to be small, it
would probably be prudent to use LimitRequestBody in Apache, or a
specific check on content length if handled in Python code, to block
someone intentionally sending oversized requests to try and break
things. Provided you did this, it may be quite reasonable to use it in
specific circumstances.
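
For the "check on content length in Python code" option, a minimal sketch
might look like the following. The middleware name and the limit are
assumptions for illustration, and it only inspects the declared
CONTENT_LENGTH, so it does nothing for request bodies sent without one.

```python
def limit_request_body(app, max_bytes):
    """Hypothetical middleware: reject requests whose declared body exceeds max_bytes."""

    def wrapper(environ, start_response):
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
        except ValueError:
            length = -1  # malformed header, rejected below
        if length < 0:
            start_response("400 Bad Request", [("Content-Type", "text/plain")])
            return [b"Invalid Content-Length\n"]
        if length > max_bytes:
            start_response("413 Request Entity Too Large",
                           [("Content-Type", "text/plain")])
            return [b"Request body too large\n"]
        return app(environ, start_response)

    return wrapper


def demo_app(environ, start_response):
    # Stand-in for an application that would go on to parse the request body.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok\n"]


guarded = limit_request_body(demo_app, max_bytes=1024)
```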

 If people use it, and they expose their site to the world, arbitrary anonymous
 visitors can cause their Python's process size to grow to arbitrarily.  I 
 don't
 think any existing well-known framework uses it, for this very reason.

 If it can't go away, and there's a problem due to the non-parity between
 parse_multipart's use and FieldStorage's use, I suspect the right answer is to
 change cgi.parse_multipart to pass in a size value for readline too.  I 
 probably
 should have done that when I made the patch. :-(

Graham


Re: [Web-SIG] Prototype of wsgi.input.readline().

2008-01-30 Thread Chris McDonough
Graham Dumpleton wrote:
 On 31/01/2008, Chris McDonough [EMAIL PROTECTED] wrote:
 Graham Dumpleton wrote:
 If the size argument is made mandatory, then it would parallel how
 read() function is defined, but this in itself would mean
 cgi.FieldStorage would break.

 This is because cgi.FieldStorage actually calls readline() with no
 argument as well as an argument in different places in the code.
 cgi.FieldStorage doesn't call readline() without an argument.
 cgi.parse_multipart does, but this function is not used by 
 cgi.FieldStorage.  I
 don't know if this changes anything.
 Not really, I should have said 'cgi' module as a whole rather than
 specifically cgi.FieldStorage. Given that people might be using
 cgi.parse_multipart in standard CGI, there would probably still be an
 expectation that it worked for WSGI. We can't really say that you can
 use cgi.FieldStorage but not cgi.parse_multipart. People will just
 expect all the normal tools people would use for this to work.
 Personally, I think parse_multipart should go away.  It's not suitable for
 anything but toy usage.
 
 Not necessarily. Someone may see it as a trade off. The code itself says:
 
 This is easy to use but not
 much good if you are expecting megabytes to be uploaded -- in that case,
 use the FieldStorage class instead which is much more flexible.
 
 So comment implies it is easier to use and so some may think it is
 simpler for what they are doing if they are only dealing with small
 requests.
 
 Of course, it would probably be prudent if you know your requests are
 always going to be small to use LimitRequestBody in Apache, or a
 specific check on content length if handled in Python code, to block
 someone sending over sized requests intentionally to try and break
 things. Provided you did this, may be quite reasonable to use it in
 specific circumstances.

Indeed.  But then again, I doubt the casual user would be able to make this 
judgment and take the necessary precautions.  This kind of user is likely the 
same class of user for whom cgi.FieldStorage is too hard (which it really 
isn't).

- C



Re: [Web-SIG] Reading of input after headers sent and 100-continue.

2008-01-30 Thread Brian Smith
Graham Dumpleton wrote:
 Effectively, if a 200 response came back, it seems to suggest 
 that the client still should send the request body, just that 
 it 'SHOULD NOT wait for an indefinite period'. It doesn't say 
 explicitly for the client that it shouldn't still send the 
 request body if another response code comes back.

This behavior is to support servers that don't understand the Expect:
header. 

Basically, if the server responds with a 100, the client must send the
request body. If the server responds with a 4xx or 5xx, the client must
not send the request body. If the server responds with a 2xx or a 3xx,
then the client must send (the rest of) the request body, on the
assumption that the server doesn't understand Expect:. To be
completely compliant, a server should always respond with a 100 in front
of a 2xx or 3xx, I guess. Thanks for clarifying that for me. I guess the
rules make sense after all.

 So technically, if the client has to still send the request 
 content, something could still read it. It would not be ideal 
 that there is a delay depending on what the client does, but 
 would still be possible from what I read of this section.

You are right. To avoid confusion, you should probably force mod_wsgi to
send a 100-continue in front of any 2xx or 3xx response.

 It MUST NOT perform the requested method if it returns a final status
 code.

The implication is that the only time it will avoid sending a 100 is
when it is sending a 4xx, and it should never perform the requested
method if it already said the method failed. The only excuse for not
sending a 100 is that you don't know about Expect: 100-continue. But,
that can't be true if you are reading this part of the spec!

 If it responds with a final status
 code, it MAY close the transport connection or it MAY continue
 to read and discard the rest of the request.

If the client receives a 2xx or 3xx without a 100 first, it has to send
the request body (well, depending on which 3xx it is, that is not true).
But, the server doesn't have to read it! But, again, the assumption is
that the server will only send a response without a 100 if it is a 4xx
or 5xx.

 It seems by what you are saying that if 100-continue is 
 present this wouldn't be allowed, and that to ensure correct 
 behaviour the handler would have to read at least some of the 
 request body before sending back the response headers.

You are right, I was wrong. 

  Since ap_http_filter is an input filter only, it should be enough to
  just avoid reading from the input brigade. (AFAICT, anyway.)
 
 In other words block the handler from reading, potentially 
 raise an error in the process. Except to be fair and 
 consistent, you would have to apply the same rule even if 
 100-continue isn't present. Whether that would break some 
 existing code in doing that is the concern I have, even if it 
 is some simple test program that just echos back the request 
 body as the response body.

Technically, even if the server returns a 4xx, it can still read the
request body, but it might not get anything or it might only get part of
it. I guess, the change to the WSGI spec that is needed is to say that
the gateway must not send the 100 continue if it has already sent some
headers, and that it should send a 100 continue before any 2xx or 3xx
code, which is basically what James Knight suggested (sorry James). The
gateway must indicate EOF if only a partial request body was received. I
don't think the gateway should be required to provide any of the partial
request content on a 4xx, though.
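
As a sketch of the gateway-side rule described in this paragraph (the
function name and its parameters are hypothetical, and this is a reading
of the proposal rather than text from the WSGI specification):

```python
def should_send_100_continue(expect_header, status_code, headers_sent):
    """Decide whether a gateway sends '100 Continue' ahead of the final response."""
    tokens = [t.strip().lower() for t in (expect_header or "").split(",")]
    if "100-continue" not in tokens:
        return False  # the client did not ask for it
    if headers_sent:
        return False  # too late: part of the final response is already on the wire
    return 200 <= status_code < 400  # only ahead of 2xx or 3xx responses
```

Under this reading a 4xx or 5xx goes out without a preceding 100, so the
client is not invited to send a body that will be discarded.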

- Brian



Re: [Web-SIG] Reading of input after headers sent and 100-continue.

2008-01-30 Thread Graham Dumpleton
For those on the Python web sig who might be thinking they missed part
of the conversation, you have. This is the second half of a
conversation started on Apache modules-dev list about Apache
100-continue processing. If interested, you can see the first half of
the conversation at:

  http://mail-archives.apache.org/mod_mbox/httpd-modules-dev/200801.mbox/browser

Graham

On 31/01/2008, Brian Smith [EMAIL PROTECTED] wrote:
 Graham Dumpleton wrote:
  Effectively, if a 200 response came back, it seems to suggest
  that the client still should send the request body, just that
  it 'SHOULD NOT wait for an indefinite period'. It doesn't say
  explicitly for the client that it shouldn't still send the
  request body if another response code comes back.

 This behavior is to support servers that don't understand the Expect:
 header.

 Basically, if the server responds with a 100, the client must send the
 request body. If the server responds with a 4xx or 5xx, the client must
 not send the request body. If the server responds with a 2xx or a 3xx,
 then the client should must send (the rest of) the request body, on the
 assumption that the server doesn't understand Expect:. To be
 completely compliant, a server should always respond with a 100 in front
 of a 2xx or 3xx, I guess. Thanks for clarifying that for me. I guess the
 rules make sense after all.

  So technically, if the client has to still send the request
  content, something could still read it. It would not be ideal
  that there is a delay depending on what the client does, but
  would still be possible from what I read of this section.

 You are right. To avoid confusion, you should probably force mod_wsgi to
 send a 100-continue in front of any 2xx or 3xx response.

  It MUST NOT perform the requested method if it returns a final status
 code.

 The implication is that the only time it will avoid sending a 100 is
 when it is sending a 4xx, and it should never perform the requested
 method if it already said the method failed. The only excuse for not
 sending a 100 is that you don't know about Expect: 100-continue. But,
 that can't be true if you are reading this part of the spec!

 If it responds with a final status
  code, it MAY close the transport connection or it MAY continue
  to read and discard the rest of the request.

 If the client receives a 2xx or 3xx without a 100 first, it has to send
 the request body (well, depending on which 3xx it is, that is not true).
 But, the server doesn't have to read it! But, again, the assumption is
 that the server will only send a response without a 100 if it is a 4xx
 or 5xx.

  It seems by what you are saying that if 100-continue is
  present this wouldn't be allowed, and that to ensure correct
  behaviour the handler would have to read at least some of the
  request body before sending back the response headers.

 You are right, I was wrong.

   Since ap_http_filter is an input filter only, it should be
  enough to
   just avoid reading from the input brigade. (AFAICT, anyway.)
 
  In other words block the handler from reading, potentially
  raise an error in the process. Except to be fair and
  consistent, you would have to apply the same rule even if
  100-continue isn't present. Whether that would break some
  existing code in doing that is the concern I have, even if it
  is some simple test program that just echos back the request
  body as the response body.

 Technically, even if the server returns a 4xx, it can still read the
 request body, but it might not get anything or it might only get part of
 it. I guess, the change to the WSGI spec that is needed is to say that
 the gateway must not send the 100 continue if it has already sent some
 headers, and that it should send a 100 continue before any 2xx or 3xx
 code, which is basically what James Knight suggested (sorry James). The
 gateway must indicate EOF if only a partial request body was received. I
 don't think the gateway should be required to provide any of the partial
 request content on a 4xx, though.

 - Brian

