On Thu, Apr 15, 2010 at 10:08 PM, Graham Dumpleton <graham.dumple...@gmail.com> wrote:
> On 16 April 2010 11:41, Graham Dumpleton <graham.dumple...@gmail.com> wrote:
>> I haven't read what you have done yet
>
> And still haven't. Don't know when I will get a chance to do so.
>
> Two points from a quick scan of emails.
>
> 1. The following section of PEP needs to be updated:
>
> """
> 1417 Apart from the handling of ``close()``, the semantics of returning a
> 1418 file wrapper from the application should be the same as if the
> 1419 application had returned ``iter(filelike.read, '')``. In other words,
> 1420 transmission should begin at the current position within the "file"
> 1421 at the time that transmission begins, and continue until the end is
> 1422 reached.
> """
>
> It can't say read until 'end is reached' of file as Content-Length
> must be honoured and less returned if Content-Length is less than what
> is available in the remainder of the file as per descriptive changes
> (3) and (4).
>
> In respect of question about readline() arguments and whether -1 or
> None is allowed. I would say no they are not. Must be positive integer
> or no argument supplied at all.
>
> Different implementations use -1 or None as value of a default
> argument to know when an argument wasn't supplied. One cant rely
> though on one or the other being used and so that supplying those
> arguments explicitly means the same thing as no argument supplied. In
> other words, supplying anything but positive integer or no argument at
> all is undefined.
>
> Same issue arises with read() except that only positive integer can
> technically be supplied and argument is not optional. Although, any
> implementation which implements wsgi.input as a proper file like
> argument is going to accept no argument to mean read all input, this
> is outside of WSGI specification and calling with no argument is
> undefined.
>
> Graham

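For concreteness, here is a rough sketch of the two rules described in the quoted message, written for Python 3 with byte strings. The helper names read_request_body and limited_file_iter and the 8192-byte block size are made up for illustration; they are not part of PEP 333 or of any server. The first helper only ever calls read() with a positive integer bounded by CONTENT_LENGTH, so it never depends on read() with no argument, -1, None or 0. The second is a file-wrapper-style iterator that stops at Content-Length instead of at end of file.

```python
import io


def read_request_body(environ, block_size=8192):
    """Read the request body using only positive-integer read() sizes.

    Hypothetical helper: stops at CONTENT_LENGTH or at the first empty
    string, whichever comes first.
    """
    try:
        remaining = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        remaining = 0
    stream = environ["wsgi.input"]
    chunks = []
    while remaining > 0:
        # The size passed here is always >= 1, never 0, -1 or None.
        data = stream.read(min(block_size, remaining))
        if not data:
            # Empty string: the input ended before CONTENT_LENGTH bytes.
            break
        chunks.append(data)
        remaining -= len(data)
    return b"".join(chunks)


def limited_file_iter(filelike, content_length, block_size=8192):
    """Yield blocks from filelike, but no more than content_length bytes.

    Hypothetical helper: like iter(filelike.read, b''), except that it
    stops once the declared Content-Length has been produced.
    """
    remaining = content_length
    while remaining > 0:
        data = filelike.read(min(block_size, remaining))
        if not data:
            break
        remaining -= len(data)
        yield data


# Usage with in-memory streams standing in for a real server.
environ = {
    "CONTENT_LENGTH": "5",
    "wsgi.input": io.BytesIO(b"hello, extra bytes"),
}
body = read_request_body(environ)
sent = b"".join(limited_file_iter(io.BytesIO(b"0123456789"), 4, block_size=3))
```

Because neither loop ever asks for zero bytes, this sidesteps the size-hint-of-zero question rather than answering it.
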
I happened to have just started hitting the body reading functions on an HTTP parser I've been working on. I'd be interested to hear a response on what happens when the various read functions are called with a size hint of zero. I realize that zero is not a positive integer, but I'm not quite sure what the recommended return value would be. I can see None and -1 being obvious flags for "no size hint", but zero is a tad weird. I want to say that it'd either return "" (which could sorta kinda violate descriptive change #2, where an empty string marks the end of input) or raise an exception. I really haven't got any reason to prefer one over the other, though.

As an aside, I think that "honoring Content-Length" should probably be rephrased to "middleware should not break HTTP", coupled with a page that lists common ways that middleware breaks HTTP. I reckon it's the same reasoning as for 333's dictation that hop-by-hop headers are server only, though there are plenty of other ways I could violate RFC 2616 as a middleware author without violating WSGI. Pie in the sky, the common ways would be included with wsgiref's validate decorator.

Paul

>> but if you have done so
>> already, ensure you read:
>>
>> http://bitbucket.org/ianb/wsgi-peps/src/
>>
>> This is Ian's and Armin's previous go at new specification. It though
>> tried to go further than what you are doing.
>>
>> Also read:
>>
>> http://blog.dscpl.com.au/2009/09/roadmap-for-python-wsgi-specification.html
>>
>> I explain what I mean by native strings in that.
>>
>> Graham
>>
>> On 15 April 2010 22:54, Dirkjan Ochtman <dirk...@ochtman.nl> wrote:
>>> Mostly taking Graham's list of issues and incorporating it into PEP 333.
>>>
>>> Latest revision: http://hg.xavamedia.nl/peps/file/tip/wsgi-1.1.txt
>>>
>>> Let's have comments here (comments in the form of diffs are
>>> particularly welcome, of course). Remember, the idea is not to change
>>> or improve WSGI right now, but only to improve the spec, improving
>>> interoperability and enabling Python 3 support.
>>>
>>> Graham, I hope I did a good job with your suggestions. (Since so much
>>> of this is yours, I've just listed you as the second author.) I tried
>>> to clarify exactly what you meant by "native strings", can you check
>>> that out?
>>>
>>> Cheers,
>>>
>>> Dirkjan
>>>
>>> --- pep-0333.txt 2010-04-15 14:46:02.000000000 +0200
>>> +++ wsgi-1.1.txt 2010-04-15 14:51:39.000000000 +0200
>>> @@ -1,114 +1,124 @@
>>> -PEP: 333
>>> -Title: Python Web Server Gateway Interface v1.0
>>> +PEP: 0000
>>> +Title: Python Web Server Gateway Interface 1.1
>>>  Version: $Revision$
>>>  Last-Modified: $Date$
>>> -Author: Phillip J. Eby <p...@telecommunity.com>
>>> +Author: Dirkjan Ochtman <dirk...@ochtman.nl>,
>>> +        Graham Dumpleton <graham.dumple...@gmail.com>
>>>  Discussions-To: Python Web-SIG <web-sig@python.org>
>>>  Status: Draft
>>>  Type: Informational
>>>  Content-Type: text/x-rst
>>> -Created: 07-Dec-2003
>>> -Post-History: 07-Dec-2003, 08-Aug-2004, 20-Aug-2004, 27-Aug-2004
>>> +Created: 15-04-2010
>>> +Post-History: Not yet
>>>
>>>
>>>  Abstract
>>>  ========
>>>
>>> -This document specifies a proposed standard interface between web
>>> -servers and Python web applications or frameworks, to promote web
>>> -application portability across a variety of web servers.
>>> +This document specifies a revision of the proposed standard interface
>>> +between web servers and Python web applications or frameworks, to
>>> +promote web application portability across a variety of web servers.
>>>
>>>
>>>  Rationale and Goals
>>>  ===================
>>>
>>> -Python currently boasts a wide variety of web application frameworks,
>>> -such as Zope, Quixote, Webware, SkunkWeb, PSO, and Twisted Web -- to
>>> -name just a few [1]_. This wide variety of choices can be a problem
>>> -for new Python users, because generally speaking, their choice of web
>>> -framework will limit their choice of usable web servers, and vice
>>> -versa.
>>> -
>>> -By contrast, although Java has just as many web application frameworks
>>> -available, Java's "servlet" API makes it possible for applications
>>> -written with any Java web application framework to run in any web
>>> -server that supports the servlet API.
>>> -
>>> -The availability and widespread use of such an API in web servers for
>>> -Python -- whether those servers are written in Python (e.g. Medusa),
>>> -embed Python (e.g. mod_python), or invoke Python via a gateway
>>> -protocol (e.g. CGI, FastCGI, etc.) -- would separate choice of
>>> -framework from choice of web server, freeing users to choose a pairing
>>> -that suits them, while freeing framework and server developers to
>>> -focus on their preferred area of specialization.
>>> -
>>> -This PEP, therefore, proposes a simple and universal interface between
>>> -web servers and web applications or frameworks: the Python Web Server
>>> -Gateway Interface (WSGI).
>>> -
>>> -But the mere existence of a WSGI spec does nothing to address the
>>> -existing state of servers and frameworks for Python web applications.
>>> -Server and framework authors and maintainers must actually implement
>>> -WSGI for there to be any effect.
>>> -
>>> -However, since no existing servers or frameworks support WSGI, there
>>> -is little immediate reward for an author who implements WSGI support.
>>> -Thus, WSGI **must** be easy to implement, so that an author's initial
>>> -investment in the interface can be reasonably low.
>>> -
>>> -Thus, simplicity of implementation on *both* the server and framework
>>> -sides of the interface is absolutely critical to the utility of the
>>> -WSGI interface, and is therefore the principal criterion for any
>>> -design decisions.
>>> -
>>> -Note, however, that simplicity of implementation for a framework
>>> -author is not the same thing as ease of use for a web application
>>> -author. WSGI presents an absolutely "no frills" interface to the
>>> -framework author, because bells and whistles like response objects and
>>> -cookie handling would just get in the way of existing frameworks'
>>> -handling of these issues. Again, the goal of WSGI is to facilitate
>>> -easy interconnection of existing servers and applications or
>>> -frameworks, not to create a new web framework.
>>> -
>>> -Note also that this goal precludes WSGI from requiring anything that
>>> -is not already available in deployed versions of Python. Therefore,
>>> -new standard library modules are not proposed or required by this
>>> -specification, and nothing in WSGI requires a Python version greater
>>> -than 2.2.2. (It would be a good idea, however, for future versions
>>> -of Python to include support for this interface in web servers
>>> -provided by the standard library.)
>>> -
>>> -In addition to ease of implementation for existing and future
>>> -frameworks and servers, it should also be easy to create request
>>> -preprocessors, response postprocessors, and other WSGI-based
>>> -"middleware" components that look like an application to their
>>> -containing server, while acting as a server for their contained
>>> -applications.
>>> -
>>> -If middleware can be both simple and robust, and WSGI is widely
>>> -available in servers and frameworks, it allows for the possibility
>>> -of an entirely new kind of Python web application framework: one
>>> -consisting of loosely-coupled WSGI middleware components. Indeed,
>>> -existing framework authors may even choose to refactor their
>>> -frameworks' existing services to be provided in this way, becoming
>>> -more like libraries used with WSGI, and less like monolithic
>>> -frameworks. This would then allow application developers to choose
>>> -"best-of-breed" components for specific functionality, rather than
>>> -having to commit to all the pros and cons of a single framework.
>>> -
>>> -Of course, as of this writing, that day is doubtless quite far off.
>>> -In the meantime, it is a sufficient short-term goal for WSGI to
>>> -enable the use of any framework with any server.
>>> -
>>> -Finally, it should be mentioned that the current version of WSGI
>>> -does not prescribe any particular mechanism for "deploying" an
>>> -application for use with a web server or server gateway. At the
>>> -present time, this is necessarily implementation-defined by the
>>> -server or gateway. After a sufficient number of servers and
>>> -frameworks have implemented WSGI to provide field experience with
>>> -varying deployment requirements, it may make sense to create
>>> -another PEP, describing a deployment standard for WSGI servers and
>>> -application frameworks.
>>> +WSGI 1.0, specified in PEP 333, did a great job in making it easier
>>> +for web applications and web servers to interface with each other.
>>> +It has become very much the standard it was meant to be and an
>>> +important part of the Python web development infrastructure.
>>> +
>>> +After several implementations were built by different developers,
>>> +it inevitably turned out that the specification wasn't perfect. It
>>> +left out some details that were implemented by all the web server
>>> +interfaces because they were critical for many applications (or
>>> +application frameworks). Additionally, the specification was written
>>> +before Python 3.x was specified, resulting in a lack of clear
>>> +specification on what to do with unicode strings.
>>> +
>>> +While there are some ideas around to improve WSGI further in less
>>> +compatible ways, we feel that there is value to be had in first
>>> +specifying a minor revision of the specification, which is largely
>>> +compatible with existing implementations. Further simplification
>>> +and experimentation are therefore deferred to a 2.0 version.
>>> +
>>> +
>>> +Differences with WSGI 1.0
>>> +=========================
>>> +
>>> +Descriptive changes
>>> +-------------------
>>> +
>>> +The following changes were made to realign the spec with
>>> +implementations 'in the wild'.
>>> +
>>> +1. The 'readline()' function of 'wsgi.input' must optionally take
>>> +   a size hint. This is required because many applications use
>>> +   cgi.FieldStorage, which uses this functionality.
>>> +
>>> +2. The 'wsgi.input' functions for reading input must return an empty
>>> +   string as end of input stream marker. This is required for support
>>> +   of HTTP 1.1 request pipelining. A correctly implemented WSGI
>>> +   middleware already has to cope with an empty string as end
>>> +   sentinel anyway to detect premature end of input.
>>> +
>>> +3. Any WSGI application or middleware should not itself return, or
>>> +   consume from a wrapped WSGI component, more data than specified by
>>> +   the Content-Length response header if defined. Middleware that
>>> +   does this is arguably broken and can generate incorrect data.
>>> +   This is just a clarification of obligations.
>>> +
>>> +4. The WSGI adapter must not pass on to the server any data above
>>> +   what the Content-Length response header defines, if supplied.
>>> +   Doing this is technically a violation of HTTP. This is another
>>> +   clarification of obligations.
>>> +
>>> +
>>> +String handling changes
>>> +-----------------------
>>> +
>>> +The following changes were made to make WSGI work on Python 3.x.
>>> +
>>> +1. The application is passed an instance of a Python dictionary
>>> +   containing what is referred to as the WSGI environment. All keys
>>> +   in this dictionary are native strings. For CGI variables, all names
>>> +   are going to be ISO-8859-1 and so where native strings are
>>> +   unicode strings, that encoding is used for the names of CGI
>>> +   variables.
>>> +
>>> +2. For the WSGI variable 'wsgi.url_scheme' contained in the WSGI
>>> +   environment, the value of the variable should be a native string.
>>> +
>>> +3. For the CGI variables contained in the WSGI environment, the values
>>> +   of the variables are native strings. Where native strings are
>>> +   unicode strings, ISO-8859-1 encoding would be used such that the
>>> +   original character data is preserved and as necessary the unicode
>>> +   string can be converted back to bytes and thence decoded to unicode
>>> +   again using a different encoding.
>>> +
>>> +4. The WSGI input stream 'wsgi.input' contained in the WSGI environment
>>> +   and from which request content is read, should yield byte strings.
>>> +
>>> +5. The status line specified by the WSGI application should be a byte
>>> +   string. Where native strings are unicode strings, the native string
>>> +   type can also be returned in which case it would be encoded as
>>> +   ISO-8859-1.
>>> +
>>> +6. The list of response headers specified by the WSGI application should
>>> +   contain tuples consisting of two values, where each value is a byte
>>> +   string. Where native strings are unicode strings, the native string
>>> +   type can also be returned in which case it would be encoded as
>>> +   ISO-8859-1.
>>> +
>>> +7. The iterable returned by the application and from which response
>>> +   content is derived, should yield byte strings. Where native strings
>>> +   are unicode strings, the native string type can also be returned in
>>> +   which case it would be encoded as ISO-8859-1.
>>> +
>>> +8. The value passed to the 'write()' callback returned by
>>> +   'start_response()' should be a byte string. Where native strings
>>> +   are unicode strings, a native string type can also be supplied, in
>>> +   which case it would be encoded as ISO-8859-1.
>>>
>>>
>>>  Specification Overview
>>> @@ -447,6 +457,13 @@
>>>  Streaming`_ section below for more on how application output must be
>>>  handled.)
>>>
>>> +Further on, several places specify constraints upon string types used
>>> +in the WSGI API. The term native string is used to mean the 'str' class
>>> +in both Python 2.x and 3.x. The spec tries to ensure optimal
>>> +compatibility and ease of use by allowing implementations running on
>>> +Python 3.x to encode strings (which are Unicode strings with no
>>> +specified encoding) as ISO-8859-1 where a 3.x string is passed in.
>>> +
>>>  The server or gateway should treat the yielded strings as binary byte
>>>  sequences: in particular, it should ensure that line endings are
>>>  not altered. The application is responsible for ensuring that the
>>> @@ -489,12 +506,22 @@
>>>  ``environ`` Variables
>>>  ---------------------
>>>
>>> +All keys in this dictionary are native strings. For CGI variables,
>>> +all names are going to be ISO-8859-1 and so where native strings are
>>> +unicode strings, that encoding is used for the names of CGI variables.
>>> +
>>>  The ``environ`` dictionary is required to contain these CGI
>>>  environment variables, as defined by the Common Gateway Interface
>>>  specification [2]_. The following variables **must** be present,
>>>  unless their value would be an empty string, in which case they
>>>  **may** be omitted, except as otherwise noted below.
>>>
>>> +The values for CGI variables are native strings. Where native strings
>>> +are unicode strings, ISO-8859-1 encoding would be used such that the
>>> +original character data is preserved and as necessary the unicode
>>> +string can be converted back to bytes and thence decoded to unicode
>>> +again using a different encoding.
>>> +
>>>  ``REQUEST_METHOD``
>>>    The HTTP request method, such as ``"GET"`` or ``"POST"``. This
>>>    cannot ever be an empty string, and so is always required.
>>> @@ -575,13 +602,14 @@
>>>  ===================== ===============================================
>>>  Variable              Value
>>>  ===================== ===============================================
>>> -``wsgi.version``      The tuple ``(1,0)``, representing WSGI
>>> +``wsgi.version``      The tuple ``(1, 0)``, representing WSGI
>>>                        version 1.0.
>>>
>>>  ``wsgi.url_scheme``   A string representing the "scheme" portion of
>>>                        the URL at which the application is being
>>>                        invoked. Normally, this will have the value
>>> -                      ``"http"`` or ``"https"``, as appropriate.
>>> +                      ``"http"`` or ``"https"``, as appropriate. The
>>> +                      value is a native string.
>>>
>>>  ``wsgi.input``        An input stream (file-like object) from which
>>>                        the HTTP request body can be read. (The server
>>> @@ -646,7 +674,7 @@
>>>  Method              Stream     Notes
>>>  =================== ========== ========
>>>  ``read(size)``      ``input``  1
>>> -``readline()``      ``input``  1,2
>>> +``readline(hint)``  ``input``  1,2
>>>  ``readlines(hint)`` ``input``  1,3
>>>  ``__iter__()``      ``input``
>>>  ``flush()``         ``errors`` 4
>>> @@ -661,11 +689,12 @@
>>>     ``Content-Length``, and is allowed to simulate an end-of-file
>>>     condition if the application attempts to read past that point.
>>>     The application **should not** attempt to read more data than is
>>> -   specified by the ``CONTENT_LENGTH`` variable.
>>> +   specified by the ``CONTENT_LENGTH`` variable. All read functions
>>> +   are required to return an empty string as the end of input stream
>>> +   marker. They must yield byte strings.
>>>
>>> -2. The optional "size" argument to ``readline()`` is not supported,
>>> -   as it may be complex for server authors to implement, and is not
>>> -   often used in practice.
>>> +2. The optional "size" argument to ``readline()`` is required for
>>> +   the implementer, but optional for callers.
>>>
>>>  3. Note that the ``hint`` argument to ``readlines()`` is optional for
>>>     both caller and implementer. The application is free not to
>>> @@ -692,12 +721,15 @@
>>>  ---------------------------------
>>>
>>>  The second parameter passed to the application object is a callable
>>> -of the form ``start_response(status,response_headers,exc_info=None)``.
>>> +of the form ``start_response(status, response_headers, exc_info=None)``.
>>>  (As with all WSGI callables, the arguments must be supplied
>>>  positionally, not by keyword.) The ``start_response`` callable is
>>>  used to begin the HTTP response, and it must return a
>>>  ``write(body_data)`` callable (see the `Buffering and Streaming`_
>>> -section, below).
>>> +section, below). Values passed to the ``write(body_data)`` callable
>>> +should be byte strings. Where native strings are unicode strings, a
>>> +native strings type can also be supplied, in which case it would be
>>> +encoded as ISO-8859-1.
>>>
>>>  The ``status`` argument is an HTTP "status" string like ``"200 OK"``
>>>  or ``"404 Not Found"``. That is, it is a string consisting of a
>>> @@ -705,14 +737,20 @@
>>>  single space, with no surrounding whitespace or other characters.
>>>  (See RFC 2616, Section 6.1.1 for more information.) The string
>>>  **must not** contain control characters, and must not be terminated
>>> -with a carriage return, linefeed, or combination thereof.
>>> +with a carriage return, linefeed, or combination thereof. This
>>> +value should be a byte string. Where native strings are unicode
>>> +strings, the native string type can also be returned, in which
>>> +case it would be encoded as ISO-8859-1.
>>>
>>>  The ``response_headers`` argument is a list of ``(header_name,
>>>  header_value)`` tuples. It must be a Python list; i.e.
>>> -``type(response_headers) is ListType``, and the server **may** change
>>> +``type(response_headers) is list``, and the server **may** change
>>>  its contents in any way it desires. Each ``header_name`` must be a
>>>  valid HTTP header field-name (as defined by RFC 2616, Section 4.2),
>>> -without a trailing colon or other punctuation.
>>> +without a trailing colon or other punctuation. Both the header_name
>>> +and the header_value should be byte strings. Where native strings
>>> +are unicode strings, the native string type can also be returned,
>>> +in which case it would be encoded as ISO-8859-1.
>>>
>>>  Each ``header_value`` **must not** include *any* control characters,
>>>  including carriage returns or linefeeds, either embedded or at the end.
>>> @@ -809,6 +847,14 @@
>>>  Handling the ``Content-Length`` Header
>>>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>
>>> +If an application or middleware layer chooses to return a
>>> +Content-Length header, it should not return more data than specified
>>> +by the header value. Any wrapping middleware layer should not
>>> +consume more data than specified in the header value from the
>>> +wrapped component (either middleware or application). Any WSGI
>>> +adapter must similarly not pass on data above what the
>>> +Content-Length response header value defines.
>>> +
>>>  If the application does not supply a ``Content-Length`` header, a
>>>  server or gateway may choose one of several approaches to handling
>>>  it. The simplest of these is to close the client connection when
>>> @@ -1569,55 +1615,13 @@
>>>  developers.
>>>
>>>
>>> -Proposed/Under Discussion
>>> -=========================
>>> -
>>> -These items are currently being discussed on the Web-SIG and elsewhere,
>>> -or are on the PEP author's "to-do" list:
>>> -
>>> -* Should ``wsgi.input`` be an iterator instead of a file? This would
>>> -  help for asynchronous applications and chunked-encoding input
>>> -  streams.
>>> -
>>> -* Optional extensions are being discussed for pausing iteration of an
>>> -  application's ouptut until input is available or until a callback
>>> -  occurs.
>>> -
>>> -* Add a section about synchronous vs. asynchronous apps and servers,
>>> -  the relevant threading models, and issues/design goals in these
>>> -  areas.
>>> -
>>> -
>>>  Acknowledgements
>>>  ================
>>>
>>> -Thanks go to the many folks on the Web-SIG mailing list whose
>>> -thoughtful feedback made this revised draft possible. Especially:
>>> +Thanks go to many folks on the Web-SIG mailing list for helping the work
>>> +on clarifying and improving this specification. In particular:
>>>
>>> -* Gregory "Grisha" Trubetskoy, author of ``mod_python``, who beat up
>>> -  on the first draft as not offering any advantages over "plain old
>>> -  CGI", thus encouraging me to look for a better approach.
>>> -
>>> -* Ian Bicking, who helped nag me into properly specifying the
>>> -  multithreading and multiprocess options, as well as badgering me to
>>> -  provide a mechanism for servers to supply custom extension data to
>>> -  an application.
>>> -
>>> -* Tony Lownds, who came up with the concept of a ``start_response``
>>> -  function that took the status and headers, returning a ``write``
>>> -  function. His input also guided the design of the exception handling
>>> -  facilities, especially in the area of allowing for middleware that
>>> -  overrides application error messages.
>>> -
>>> -* Alan Kennedy, whose courageous attempts to implement WSGI-on-Jython
>>> -  (well before the spec was finalized) helped to shape the "supporting
>>> -  older versions of Python" section, as well as the optional
>>> -  ``wsgi.file_wrapper`` facility.
>>> -
>>> -* Mark Nottingham, who reviewed the spec extensively for issues with
>>> -  HTTP RFC compliance, especially with regard to HTTP/1.1 features that
>>> -  I didn't even know existed until he pointed them out.
>>> -
>>> +* Phillip J. Eby, for writing/editing the 1.0 specification.
>>>
>>>  References
>>>  ==========
>>> @@ -1643,8 +1647,6 @@
>>>
>>>  This document has been placed in the public domain.
>>>
>>> -
>>> -
>>>  ..
>>>     Local Variables:
>>>     mode: indented-text
>>>
>>
> _______________________________________________
> Web-SIG mailing list
> Web-SIG@python.org
> Web SIG: http://www.python.org/sigs/web-sig
> Unsubscribe:
> http://mail.python.org/mailman/options/web-sig/paul.joseph.davis%40gmail.com