[Graham Dumpleton]
> How does one implement in WSGI an input filter that manipulates the request
> body in such a way that the effective content length would be changed?
>
> The problem I am trying to address here is how one might implement using WSGI
> a decompression filter for the body of a request. I.e., where
> "Content-Encoding: gzip" has been specified.
>
> So, how is one meant to deal with this in WSGI?

The usual approach to modifying something in the WSGI environment, in
this case the wsgi.input file-like object, is to wrap it or replace it
with an object that behaves as desired.

In this case, the approach I would take would be to wrap the wsgi.input
object with a gzip.GzipFile object, which should only read the input
stream data on demand. Note that the stream must be passed as the
fileobj argument, since the first positional argument of GzipFile is a
filename. The code would look like this:

    import gzip
    wsgi_env['wsgi.input'] = gzip.GzipFile(fileobj=wsgi_env['wsgi.input'], mode='rb')

Notes.

1. The application should be completely unaware that it is dealing with
   a compressed stream: it simply reads from wsgi.input, unaware that
   reading from what it thinks is the input stream is actually causing
   cascading reads down a series of file-like objects.

2. The GzipFile object will decompress on the fly, meaning that it will
   only read from the wrapped input stream when it needs input. This
   means that if the application does not read data from wsgi.input,
   then no data will be read from the client connection.

3. The GzipFile should not be responsible for enforcement of the
   incoming Content-Length boundary. Instead, this should be enforced by
   the original server-provided file-like input stream that it wraps. So
   if the application attempts to read past Content-Length bytes, the
   server-provided input stream "is allowed to simulate an end-of-file
   condition". This would cause the GzipFile to return an EOF to the
   application, or possibly raise an exception.

4. Because of the on-the-fly nature of the GzipFile decompression, it
   would not be possible to provide a meaningful Content-Length value to
   the application. To do so would require buffering and decompressing
   the entire input data stream. But the application should still be
   able to operate without knowing Content-Length.

5. The wrapping can NOT be done in middleware. PEP 333, Section "Other
   HTTP Features" has this to say:

   "WSGI applications must not generate any "hop-by-hop" headers [4],
   attempt to use HTTP features that would require them to generate such
   headers, or rely on the content of any incoming "hop-by-hop" headers
   in the environ dictionary. WSGI servers must handle any supported
   inbound "hop-by-hop" headers on their own, such as by decoding any
   inbound Transfer-Encoding, including chunked encoding if applicable."

   So the wrapping and replacement of wsgi.input should happen in the
   server or gateway, NOT in middleware.

6. Exactly the same principles should apply to decoding incoming
   Transfer-Encoding: chunked.

HTH,

Alan.

P.S. Thanks for all your great work on mod_python Graham!
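
For anyone who wants to try the wrapping described in the notes above, here is a minimal sketch of how a server or gateway might do it before calling the application. The helper name wrap_gzip_input is hypothetical, io.BytesIO merely stands in for the server-provided input stream, and dropping CONTENT_LENGTH and HTTP_CONTENT_ENCODING from the environ is an assumption of this sketch (following note 4), not something PEP 333 prescribes.

```python
import gzip
import io


def wrap_gzip_input(environ):
    """Hypothetical helper: wrap wsgi.input so reads yield decompressed data."""
    encoding = environ.get('HTTP_CONTENT_ENCODING', '').strip().lower()
    if encoding != 'gzip':
        return environ  # nothing to do for other encodings

    # fileobj= is required here: the first positional argument of
    # GzipFile is a filename, not a stream.
    environ['wsgi.input'] = gzip.GzipFile(fileobj=environ['wsgi.input'], mode='rb')

    # Assumption of this sketch: the decompressed length is unknown without
    # reading everything, so the now-misleading values are removed.
    environ.pop('CONTENT_LENGTH', None)
    environ.pop('HTTP_CONTENT_ENCODING', None)
    return environ


# Stand-in request: io.BytesIO plays the role of the server-provided stream.
body = b'hello world' * 3
compressed = gzip.compress(body)
environ = {
    'HTTP_CONTENT_ENCODING': 'gzip',
    'CONTENT_LENGTH': str(len(compressed)),
    'wsgi.input': io.BytesIO(compressed),
}
wrap_gzip_input(environ)

# The application just reads; decompression happens on demand.
first = environ['wsgi.input'].read(5)
rest = environ['wsgi.input'].read()
```

In a real server the wrapped object would be the server's own length-limited input stream rather than a BytesIO, so that the Content-Length boundary is still enforced underneath the GzipFile, as in note 3.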