[Web-SIG] Prototype of wsgi.input.readline().
As I think we all know, no one implements readline() for wsgi.input as defined in the WSGI specification. The reason for this is that code like cgi.FieldStorage would refuse to work and would just generate an exception. This is because cgi.FieldStorage expects to pass an argument to readline().

So, although this is linked in the issues list for possible amendments to the WSGI specification, there hasn't, that I recall, been a discussion on how readline() would be defined in any amendment or future version. In particular, would the specification be changed to either:

1. readline(size) where the size argument is mandatory, or:
2. readline(size=-1) where the size argument is optional.

If the size argument is made mandatory, then it would parallel how the read() function is defined, but this in itself would mean cgi.FieldStorage would break. This is because cgi.FieldStorage actually calls readline() with no argument as well as with an argument in different places in the code.

If we allow the argument to be optional however, we run into the same portability problems that would exist with some WSGI adapters which do not simulate EOF on input when all request content is read. Specifically, if user code calls readline() with no argument but the last line of the file wasn't terminated with an EOL, then it would hang.

As it is, cgi.FieldStorage only works on systems which do not simulate EOF because the content format it is decoding has its own concept of an end of stream marker and the cgi.FieldStorage implementation specifically looks for that. The cgi.FieldStorage implementation certainly doesn't track how much input it has read in and progressively change the size argument to readline() on that basis.

Any other code which uses readline() with no argument would similarly have to depend on some concept of an end of stream marker in the content, because one can't rely on getting an empty string when input is exhausted.

In some respects this highlights the inconsistency of the read() argument not being optional. This is because one of the reasons for not allowing the read() argument to be optional is that it would be problematical for implementations that do not simulate EOF, yet the same issue exists with readline() and an optional argument has to be allowed for that because of how cgi.FieldStorage is implemented.

Graham

___
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
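A minimal sketch of the EOF simulation discussed in this message, assuming the adapter knows the declared CONTENT_LENGTH. The class name LimitedInput is hypothetical and is not taken from any WSGI adapter; it only illustrates how readline(size=-1) can be made safe by never asking the underlying stream for more than the bytes that remain in the request body.

```python
import io


class LimitedInput:
    """Hypothetical wrapper: simulate EOF after `content_length` bytes."""

    def __init__(self, stream, content_length):
        self._stream = stream
        self._remaining = max(content_length, 0)

    def _clamp(self, size):
        # Treat a missing or negative size as "everything that is left".
        if size is None or size < 0 or size > self._remaining:
            return self._remaining
        return size

    def read(self, size=-1):
        size = self._clamp(size)
        if size == 0:
            return b""
        data = self._stream.read(size)
        self._remaining -= len(data)
        return data

    def readline(self, size=-1):
        # The request is capped at the remaining body length, so a final
        # line without an EOL cannot block waiting for bytes that never come.
        size = self._clamp(size)
        if size == 0:
            return b""
        line = self._stream.readline(size)
        self._remaining -= len(line)
        return line


# The body is "a\nb"; the trailing bytes stand in for a later request.
raw = io.BytesIO(b"a\nbNEXT-REQUEST")
body = LimitedInput(raw, content_length=3)
lines = [body.readline(), body.readline(), body.readline()]
```

With a wrapper of this kind, code that calls readline() with no argument gets an empty string once the body is exhausted, regardless of whether the content has its own end of stream marker.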
Re: [Web-SIG] Prototype of wsgi.input.readline().
Graham Dumpleton wrote:

As I think we all know, no one implements readline() for wsgi.input as defined in the WSGI specification. The reason for this is that stuff like cgi.FieldStorage would refuse to work and would just generate an exception. This is because cgi.FieldStorage expects to pass an argument to readline().

I haven't been keeping up on the issues this has caused wrt WSGI, but note that the reason that cgi.FieldStorage passes a size argument to readline is in order to prevent memory exhaustion when reading files that don't have any linebreaks (denial of service). See http://bugs.python.org/issue1112549 .

So, although this is linked in the issues list for possible amendments to WSGI specification, there hasn't that I recall been a discussion on how readline() would be defined in any amendment or future version. In particular, would the specification be changed to either:

1. readline(size) where size argument is mandatory, or:
2. readline(size=-1) where size argument is optional.

If the size argument is made mandatory, then it would parallel how read() function is defined, but this in itself would mean cgi.FieldStorage would break. This is because cgi.FieldStorage actually calls readline() with no argument as well as an argument in different places in the code.

cgi.FieldStorage doesn't call readline() without an argument. cgi.parse_multipart does, but this function is not used by cgi.FieldStorage. I don't know if this changes anything.

- C
Re: [Web-SIG] HTTP 1.1 Expect/Continue handling
Graham Dumpleton wrote:

On 29/01/2008, James Y Knight [EMAIL PROTECTED] wrote:

a) One is to clarify this as a requirement upon the WSGI gateway. Something like the following: If the client requests Expect: 100-continue, and the application yields data before reading from the input, and the response code is a success (2xx) code, then the gateway MUST send a 100 continue response, before writing any other response headers in order to comply with RFC 2616 §8.2.3 and to allow the WSGI application to read from the input stream later on in request processing.

This requirement goes too far. I think the part of the specification that says the server must not perform the requested operation is over-reaching. It fails to consider the case where the server can successfully perform the operation without reading the request body. For example, consider a TOUCH method that updates the ETag and Last-Modified date of a resource. Or, a DELETE (a DELETE request shouldn't have a request body, but should the server really be required to check for one and refuse to delete the resource if it finds one?). The WSGI gateway MAY send a 100 continue response in this situation, but it shouldn't be required to. If the application wants the stricter semantics then it should be coded to handle it.

This should handle most real-world cases. Now, only sending 100 when the response code is 2xx may be potentially a bit fragile, and won't help e.g. your dummy app above. (maybe some real app really did want the input data even for an error response too?). But, on the other hand, you really *don't* want to force the transmission of a 100 continue when the server is sending e.g. a 400 Bad Request response and likely won't ever read input data.

Exactly, if you always send 100 continue then you defeat the purpose of it entirely. I would like to see the specification revised so that it is obvious that my example program is invalid when an Expect: 100-continue request header is present.

b) Alternatively, the WSGI gateway could raise an exception when you attempt to respond with a success code without having read the input.

For the same reasons I mentioned above, this is too strict.

c) Another option is to clarify this as a requirement for a WSGI application: An application must not read from wsgi.input after yielding its first non-empty string unless it has already read from wsgi.input before having yielded its first non-empty string.

This is the requirement that I want to see. But, I prefer to have it qualified with "when environ['HTTP_EXPECT'] contains the '100-continue' token."

(environ['wsgi.input'].read(0) may be used to indicate the desire to read the input in the future and satisfy this requirement, without actually reading any data.)

Nice in theory, but if the specification is going to change to support this, I would rather see the specification change to allow the application to generate its own 100 continue response.

A clarification in the specification may be required to the extent of saying that where a zero length read is done, that no WSGI middleware which wraps wsgi.input, nor even the WSGI adapter itself may optimise it away. In other words a zero length read must always be passed through unless specifically not appropriate for what the WSGI middleware is doing. This would be required to ensure that zero length read always propagates down to the web server layer itself such that it may trigger the 100-continue.

The statement "An application must not read from wsgi.input after..." would already apply to middleware, because middleware are applications. If the middleware causes no response data to be read, it should not be required to cause a 100 continue to be sent.

- Brian
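The zero length read idea in option (c) can be sketched as an input wrapper that sends the interim response on the first read call of any size. This is only an illustration under stated assumptions: ContinueOnRead, wrap_input and the send_continue callback are hypothetical names, not part of WSGI or of any gateway, and a real gateway would write "HTTP/1.1 100 Continue" to the client socket where this sketch appends to a list.

```python
import io


class ContinueOnRead:
    """Hypothetical wrapper: emit the interim response on the first read."""

    def __init__(self, stream, send_continue, expects_continue):
        self._stream = stream
        self._send_continue = send_continue  # assumed gateway callback
        self._pending = expects_continue

    def _trigger(self):
        if self._pending:
            self._pending = False
            self._send_continue()

    def read(self, size=-1):
        # A read(0) is passed through like any other read so that it can
        # still trigger the interim response; it is never optimised away.
        self._trigger()
        return self._stream.read(size)

    def readline(self, size=-1):
        self._trigger()
        return self._stream.readline(size)


def wrap_input(environ, send_continue):
    # The Expect header may carry several comma-separated tokens.
    tokens = [t.strip().lower() for t in environ.get("HTTP_EXPECT", "").split(",")]
    environ["wsgi.input"] = ContinueOnRead(
        environ["wsgi.input"], send_continue, "100-continue" in tokens
    )
    return environ


sent = []
environ = {"HTTP_EXPECT": "100-continue", "wsgi.input": io.BytesIO(b"payload")}
wrap_input(environ, lambda: sent.append("100 Continue"))
environ["wsgi.input"].read(0)  # signals the intent to read later
data = environ["wsgi.input"].read()
```

An application that never touches wsgi.input never triggers the callback, which matches the point that a 100 continue should not be forced when no input is read.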
[Web-SIG] Prohibiting reading from wsgi.input in an application iterable's close method
I would like to see the following requirement added to the WSGI specification:

An application may only call methods on environ['wsgi.input'] before it returns its response iterable, or from within an execution of its iterable's next() method. In particular, the application iterable's close() method MUST NOT read from wsgi.input.

Thoughts?

- Brian
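One way a gateway might detect violations of the proposed rule is to disable the input stream before it calls close(). The sketch below is an assumption for illustration only: the proposal does not ask gateways to enforce anything, and GuardedInput and run are hypothetical names.

```python
import io


class GuardedInput:
    """Hypothetical wrapper that refuses reads once the gateway disables it."""

    def __init__(self, stream):
        self._stream = stream
        self._enabled = True

    def disable(self):
        self._enabled = False

    def read(self, size=-1):
        if not self._enabled:
            raise RuntimeError("wsgi.input read after response iteration ended")
        return self._stream.read(size)


def run(app, environ):
    guard = GuardedInput(environ["wsgi.input"])
    environ["wsgi.input"] = guard
    # The start_response stand-in ignores its arguments; a real gateway
    # would record the status and headers and return a write callable.
    result = app(environ, lambda status, headers: None)
    try:
        chunks = list(result)  # reads are still allowed while iterating
    finally:
        guard.disable()  # from here on, close() must not read
        if hasattr(result, "close"):
            result.close()
    return chunks


def echo_app(environ, start_response):
    start_response("200 OK", [])
    return [environ["wsgi.input"].read()]


echoed = run(echo_app, {"wsgi.input": io.BytesIO(b"abc")})
```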
Re: [Web-SIG] Prototype of wsgi.input.readline().
On 31/01/2008, Chris McDonough [EMAIL PROTECTED] wrote:

Graham Dumpleton wrote:

As I think we all know, no one implements readline() for wsgi.input as defined in the WSGI specification. The reason for this is that stuff like cgi.FieldStorage would refuse to work and would just generate an exception. This is because cgi.FieldStorage expects to pass an argument to readline().

I haven't been keeping up on the issues this has caused wrt WSGI, but note that the reason that cgi.FieldStorage passes a size argument to readline is in order to prevent memory exhaustion when reading files that don't have any linebreaks (denial of service). See http://bugs.python.org/issue1112549 .

The interesting comment in that bug is: "The input data is not required by the RFC 822/1521/1522/1867 specifications to contain any newline characters."

If that can occur, then a WSGI adapter which didn't simulate EOF would fail in that the read would block and never return. All the more reason that simulating EOF needs to be mandatory.

So, although this is linked in the issues list for possible amendments to WSGI specification, there hasn't that I recall been a discussion on how readline() would be defined in any amendment or future version. In particular, would the specification be changed to either:

1. readline(size) where size argument is mandatory, or:
2. readline(size=-1) where size argument is optional.

If the size argument is made mandatory, then it would parallel how read() function is defined, but this in itself would mean cgi.FieldStorage would break. This is because cgi.FieldStorage actually calls readline() with no argument as well as an argument in different places in the code.

cgi.FieldStorage doesn't call readline() without an argument. cgi.parse_multipart does, but this function is not used by cgi.FieldStorage. I don't know if this changes anything.

Not really, I should have said 'cgi' module as a whole rather than specifically cgi.FieldStorage.

Given that people might be using cgi.parse_multipart in standard CGI, there would probably still be an expectation that it worked for WSGI. We can't really say that you can use cgi.FieldStorage but not cgi.parse_multipart. People will just expect all the normal tools people would use for this to work.

Graham
Re: [Web-SIG] Prototype of wsgi.input.readline().
Graham Dumpleton wrote:

If the size argument is made mandatory, then it would parallel how read() function is defined, but this in itself would mean cgi.FieldStorage would break. This is because cgi.FieldStorage actually calls readline() with no argument as well as an argument in different places in the code.

cgi.FieldStorage doesn't call readline() without an argument. cgi.parse_multipart does, but this function is not used by cgi.FieldStorage. I don't know if this changes anything.

Not really, I should have said 'cgi' module as a whole rather than specifically cgi.FieldStorage.

Given that people might be using cgi.parse_multipart in standard CGI, there would probably still be an expectation that it worked for WSGI. We can't really say that you can use cgi.FieldStorage but not cgi.parse_multipart. People will just expect all the normal tools people would use for this to work.

Personally, I think parse_multipart should go away. It's not suitable for anything but toy usage. If people use it, and they expose their site to the world, arbitrary anonymous visitors can cause their Python process size to grow arbitrarily. I don't think any existing well-known framework uses it, for this very reason.

If it can't go away, and there's a problem due to the non-parity between parse_multipart's use and FieldStorage's use, I suspect the right answer is to change cgi.parse_multipart to pass in a size value for readline too. I probably should have done that when I made the patch. :-(

- C
Re: [Web-SIG] Prototype of wsgi.input.readline().
On 31/01/2008, Chris McDonough [EMAIL PROTECTED] wrote:

Graham Dumpleton wrote:

If the size argument is made mandatory, then it would parallel how read() function is defined, but this in itself would mean cgi.FieldStorage would break. This is because cgi.FieldStorage actually calls readline() with no argument as well as an argument in different places in the code.

cgi.FieldStorage doesn't call readline() without an argument. cgi.parse_multipart does, but this function is not used by cgi.FieldStorage. I don't know if this changes anything.

Not really, I should have said 'cgi' module as a whole rather than specifically cgi.FieldStorage.

Given that people might be using cgi.parse_multipart in standard CGI, there would probably still be an expectation that it worked for WSGI. We can't really say that you can use cgi.FieldStorage but not cgi.parse_multipart. People will just expect all the normal tools people would use for this to work.

Personally, I think parse_multipart should go away. It's not suitable for anything but toy usage.

Not necessarily. Someone may see it as a trade off. The code itself says: "This is easy to use but not much good if you are expecting megabytes to be uploaded -- in that case, use the FieldStorage class instead which is much more flexible."

So the comment implies it is easier to use, and some may think it is simpler for what they are doing if they are only dealing with small requests.

Of course, if you know your requests are always going to be small, it would probably be prudent to use LimitRequestBody in Apache, or a specific check on content length if handled in Python code, to block someone sending oversized requests intentionally to try and break things. Provided you did this, it may be quite reasonable to use it in specific circumstances.

If people use it, and they expose their site to the world, arbitrary anonymous visitors can cause their Python process size to grow arbitrarily. I don't think any existing well-known framework uses it, for this very reason.

If it can't go away, and there's a problem due to the non-parity between parse_multipart's use and FieldStorage's use, I suspect the right answer is to change cgi.parse_multipart to pass in a size value for readline too. I probably should have done that when I made the patch. :-(

Graham
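The "specific check on content length if handled in Python code" could look something like the following WSGI middleware. It is a sketch only: limit_request_body and the 64 KiB default are hypothetical, and it assumes the request declares a CONTENT_LENGTH (a request without one is treated here as having an empty body).

```python
def limit_request_body(app, max_bytes=64 * 1024):
    """Hypothetical middleware: reject requests whose declared body is too large."""

    def middleware(environ, start_response):
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
        except ValueError:
            length = -1  # malformed header
        if length < 0:
            start_response("400 Bad Request", [("Content-Length", "0")])
            return []
        if length > max_bytes:
            # Refuse before any parser gets a chance to buffer the body.
            start_response("413 Request Entity Too Large", [("Content-Length", "0")])
            return []
        return app(environ, start_response)

    return middleware


def small_form_app(environ, start_response):
    # Stand-in for an application that parses a small form body.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


guarded = limit_request_body(small_form_app, max_bytes=10)
statuses = []
record = lambda status, headers: statuses.append(status)
guarded({"CONTENT_LENGTH": "5"}, record)
guarded({"CONTENT_LENGTH": "5000"}, record)
```

The check only bounds what the client declares, so it is most useful together with an input stream that stops at the declared length.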
Re: [Web-SIG] Prototype of wsgi.input.readline().
Graham Dumpleton wrote:

On 31/01/2008, Chris McDonough [EMAIL PROTECTED] wrote:

Graham Dumpleton wrote:

If the size argument is made mandatory, then it would parallel how read() function is defined, but this in itself would mean cgi.FieldStorage would break. This is because cgi.FieldStorage actually calls readline() with no argument as well as an argument in different places in the code.

cgi.FieldStorage doesn't call readline() without an argument. cgi.parse_multipart does, but this function is not used by cgi.FieldStorage. I don't know if this changes anything.

Not really, I should have said 'cgi' module as a whole rather than specifically cgi.FieldStorage.

Given that people might be using cgi.parse_multipart in standard CGI, there would probably still be an expectation that it worked for WSGI. We can't really say that you can use cgi.FieldStorage but not cgi.parse_multipart. People will just expect all the normal tools people would use for this to work.

Personally, I think parse_multipart should go away. It's not suitable for anything but toy usage.

Not necessarily. Someone may see it as a trade off. The code itself says: "This is easy to use but not much good if you are expecting megabytes to be uploaded -- in that case, use the FieldStorage class instead which is much more flexible."

So the comment implies it is easier to use, and some may think it is simpler for what they are doing if they are only dealing with small requests.

Of course, if you know your requests are always going to be small, it would probably be prudent to use LimitRequestBody in Apache, or a specific check on content length if handled in Python code, to block someone sending oversized requests intentionally to try and break things. Provided you did this, it may be quite reasonable to use it in specific circumstances.

Indeed. But then again, I doubt the casual user would be able to make this judgment and take the necessary precautions. This kind of user is likely the same class of user for whom cgi.FieldStorage is too hard (which it really isn't).

- C
Re: [Web-SIG] Reading of input after headers sent and 100-continue.
Graham Dumpleton wrote:

Effectively, if a 200 response came back, it seems to suggest that the client still should send the request body, just that it 'SHOULD NOT wait for an indefinite period'. It doesn't say explicitly for the client that it shouldn't still send the request body if another response code comes back.

This behavior is to support servers that don't understand the Expect: header.

Basically, if the server responds with a 100, the client must send the request body. If the server responds with a 4xx or 5xx, the client must not send the request body. If the server responds with a 2xx or a 3xx, then the client must send (the rest of) the request body, on the assumption that the server doesn't understand Expect:. To be completely compliant, a server should always respond with a 100 in front of a 2xx or 3xx, I guess. Thanks for clarifying that for me. I guess the rules make sense after all.

So technically, if the client has to still send the request content, something could still read it. It would not be ideal that there is a delay depending on what the client does, but would still be possible from what I read of this section.

You are right. To avoid confusion, you should probably force mod_wsgi to send a 100-continue in front of any 2xx or 3xx response.

"It MUST NOT perform the requested method if it returns a final status code."

The implication is that the only time it will avoid sending a 100 is when it is sending a 4xx, and it should never perform the requested method if it already said the method failed. The only excuse for not sending a 100 is that you don't know about Expect: 100-continue. But, that can't be true if you are reading this part of the spec!

"If it responds with a final status code, it MAY close the transport connection or it MAY continue to read and discard the rest of the request."

If the client receives a 2xx or 3xx without a 100 first, it has to send the request body (well, depending on which 3xx it is, that is not true). But, the server doesn't have to read it! But, again, the assumption is that the server will only send a response without a 100 if it is a 4xx or 5xx.

It seems by what you are saying that if 100-continue is present this wouldn't be allowed, and that to ensure correct behaviour the handler would have to read at least some of the request body before sending back the response headers.

You are right, I was wrong.

Since ap_http_filter is an input filter only, it should be enough to just avoid reading from the input brigade. (AFAICT, anyway.)

In other words block the handler from reading, potentially raise an error in the process. Except to be fair and consistent, you would have to apply the same rule even if 100-continue isn't present. Whether that would break some existing code in doing that is the concern I have, even if it is some simple test program that just echoes back the request body as the response body.

Technically, even if the server returns a 4xx, it can still read the request body, but it might not get anything or it might only get part of it.

I guess, the change to the WSGI spec that is needed is to say that the gateway must not send the 100 continue if it has already sent some headers, and that it should send a 100 continue before any 2xx or 3xx code, which is basically what James Knight suggested (sorry James). The gateway must indicate EOF if only a partial request body was received. I don't think the gateway should be required to provide any of the partial request content on a 4xx, though.

- Brian
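The rule proposed at the end of this message can be written down as a small decision function. This is a sketch of one reading of that proposal, not text from the WSGI specification; needs_interim_continue and its parameters are hypothetical names.

```python
def needs_interim_continue(expect_header, status, headers_sent, continue_sent):
    """Decide whether a gateway should send 100 Continue before `status`.

    Hypothetical helper encoding the proposed rule: send the interim
    response ahead of any 2xx or 3xx final status, never after response
    headers have gone out, and never twice.
    """
    tokens = [t.strip().lower() for t in (expect_header or "").split(",")]
    if "100-continue" not in tokens:
        return False  # the client did not ask for it
    if headers_sent or continue_sent:
        return False  # too late, or already done
    code = int(status.split(" ", 1)[0])
    return 200 <= code < 400


checks = [
    needs_interim_continue("100-continue", "200 OK", False, False),
    needs_interim_continue("100-continue", "404 Not Found", False, False),
    needs_interim_continue("100-continue", "302 Found", True, False),
    needs_interim_continue(None, "200 OK", False, False),
]
```

For a 4xx or 5xx status the function returns False, so the client is told the request failed without first being asked to send the body.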
Re: [Web-SIG] Reading of input after headers sent and 100-continue.
For those on the Python web sig who might be thinking they missed part of the conversation, you have. This is the second half of a conversation started on the Apache modules-dev list about Apache 100-continue processing. If interested, you can see the first half of the conversation at:

http://mail-archives.apache.org/mod_mbox/httpd-modules-dev/200801.mbox/browser

Graham

On 31/01/2008, Brian Smith [EMAIL PROTECTED] wrote:

Graham Dumpleton wrote:

Effectively, if a 200 response came back, it seems to suggest that the client still should send the request body, just that it 'SHOULD NOT wait for an indefinite period'. It doesn't say explicitly for the client that it shouldn't still send the request body if another response code comes back.

This behavior is to support servers that don't understand the Expect: header.

Basically, if the server responds with a 100, the client must send the request body. If the server responds with a 4xx or 5xx, the client must not send the request body. If the server responds with a 2xx or a 3xx, then the client must send (the rest of) the request body, on the assumption that the server doesn't understand Expect:. To be completely compliant, a server should always respond with a 100 in front of a 2xx or 3xx, I guess. Thanks for clarifying that for me. I guess the rules make sense after all.

So technically, if the client has to still send the request content, something could still read it. It would not be ideal that there is a delay depending on what the client does, but would still be possible from what I read of this section.

You are right. To avoid confusion, you should probably force mod_wsgi to send a 100-continue in front of any 2xx or 3xx response.

"It MUST NOT perform the requested method if it returns a final status code."

The implication is that the only time it will avoid sending a 100 is when it is sending a 4xx, and it should never perform the requested method if it already said the method failed. The only excuse for not sending a 100 is that you don't know about Expect: 100-continue. But, that can't be true if you are reading this part of the spec!

"If it responds with a final status code, it MAY close the transport connection or it MAY continue to read and discard the rest of the request."

If the client receives a 2xx or 3xx without a 100 first, it has to send the request body (well, depending on which 3xx it is, that is not true). But, the server doesn't have to read it! But, again, the assumption is that the server will only send a response without a 100 if it is a 4xx or 5xx.

It seems by what you are saying that if 100-continue is present this wouldn't be allowed, and that to ensure correct behaviour the handler would have to read at least some of the request body before sending back the response headers.

You are right, I was wrong.

Since ap_http_filter is an input filter only, it should be enough to just avoid reading from the input brigade. (AFAICT, anyway.)

In other words block the handler from reading, potentially raise an error in the process. Except to be fair and consistent, you would have to apply the same rule even if 100-continue isn't present. Whether that would break some existing code in doing that is the concern I have, even if it is some simple test program that just echoes back the request body as the response body.

Technically, even if the server returns a 4xx, it can still read the request body, but it might not get anything or it might only get part of it.

I guess, the change to the WSGI spec that is needed is to say that the gateway must not send the 100 continue if it has already sent some headers, and that it should send a 100 continue before any 2xx or 3xx code, which is basically what James Knight suggested (sorry James). The gateway must indicate EOF if only a partial request body was received. I don't think the gateway should be required to provide any of the partial request content on a 4xx, though.

- Brian