On 16 April 2010 13:29, Paul Davis <paul.joseph.da...@gmail.com> wrote:
> On Thu, Apr 15, 2010 at 10:08 PM, Graham Dumpleton
> <graham.dumple...@gmail.com> wrote:
>> On 16 April 2010 11:41, Graham Dumpleton <graham.dumple...@gmail.com> wrote:
>>> I haven't read what you have done yet
>>
>> And still haven't. Don't know when I will get a chance to do so.
>>
>> Two points from a quick scan of emails.
>>
>> 1. The following section of PEP needs to be updated:
>>
>> """
>> 1417 Apart from the handling of ``close()``, the semantics of returning a
>> 1418 file wrapper from the application should be the same as if the
>> 1419 application had returned ``iter(filelike.read, '')``. In other words,
>> 1420 transmission should begin at the current position within the "file"
>> 1421 at the time that transmission begins, and continue until the end is
>> 1422 reached.
>> """
>>
>> It can't say read until 'end is reached' of file as Content-Length
>> must be honoured and less returned if Content-Length is less than what
>> is available in the remainder of the file as per descriptive changes
>> (3) and (4).
>>
>> In respect of question about readline() arguments and whether -1 or
>> None is allowed. I would say no they are not. Must be positive integer
>> or no argument supplied at all.
>>
>> Different implementations use -1 or None as value of a default
>> argument to know when an argument wasn't supplied. One can't rely
>> though on one or the other being used and so that supplying those
>> arguments explicitly means the same thing as no argument supplied. In
>> other words, supplying anything but positive integer or no argument at
>> all is undefined.
>>
>> Same issue arises with read() except that only positive integer can
>> technically be supplied and argument is not optional. Although, any
>> implementation which implements wsgi.input as a proper file like
>> argument is going to accept no argument to mean read all input, this
>> is outside of WSGI specification and calling with no argument is
>> undefined.
>>
>> Graham
>
> I happened to have just started hitting the body reading functions on
> an HTTP parser I've been working on. I'd be interested to hear a
> response on what happens when the various read functions are called
> with a size hint of zero.
>
> I realize that zero is not a positive integer but I'm not quite sure
> on what the recommended return value would be. I can see None and -1
> being obvious flags for "no size hint", but zero is a tad weird. I
> want to say that it'd either return "" (which could sorta kinda
> violate #2) or raise an exception. I really haven't got any reason to
> prefer one over the other though.

I almost mentioned 0 as argument in my previous email, but I got a bit
scared off by it also.

In all these things, one has to be guided by what a standard file like
object does in Python. i.e.,

    >>> import sys
    >>> sys.stdin.read(0)
    ''

So, although an empty string would normally indicate no more content
can be read, an argument of 0 has to be seen as a special exception to
that rule, with no choice but that empty string is returned.

Graham

> As an aside, I think that "honoring Content-Length" should probably be
> rephrased to a "middleware should not break HTTP" coupled with a page
> that lists common ways that middleware breaks HTTP. I reckon it's the
> same reasoning for 333's dictation that hop-by-hop headers are server
> only, though there are plenty of other ways I could violate RFC 2616
> as a middleware author without violating WSGI. Pie in the sky, the
> common ways would be included with wsgiref's validate decorator.
>
> Paul
>
>>> but if you have done so
>>> already, ensure you read:
>>>
>>> http://bitbucket.org/ianb/wsgi-peps/src/
>>>
>>> This is Ian's and Armin's previous go at new specification. It though
>>> tried to go further than what you are doing.
>>>
>>> Also read:
>>>
>>> http://blog.dscpl.com.au/2009/09/roadmap-for-python-wsgi-specification.html
>>>
>>> I explain what I mean by native strings in that.
>>>
>>> Graham
>>>
>>> On 15 April 2010 22:54, Dirkjan Ochtman <dirk...@ochtman.nl> wrote:
>>>> Mostly taking Graham's list of issues and incorporating it into PEP 333.
>>>>
>>>> Latest revision: http://hg.xavamedia.nl/peps/file/tip/wsgi-1.1.txt
>>>>
>>>> Let's have comments here (comments in the form of diffs are
>>>> particularly welcome, of course). Remember, the idea is not to change
>>>> or improve WSGI right now, but only to improve the spec, improving
>>>> interoperability and enabling Python 3 support.
>>>>
>>>> Graham, I hope I did a good job with your suggestions. (Since so much
>>>> of this is yours, I've just listed you as the second author.) I tried
>>>> to clarify exactly what you meant by "native strings", can you check
>>>> that out?
>>>>
>>>> Cheers,
>>>>
>>>> Dirkjan
>>>>
>>>> --- pep-0333.txt 2010-04-15 14:46:02.000000000 +0200
>>>> +++ wsgi-1.1.txt 2010-04-15 14:51:39.000000000 +0200
>>>> @@ -1,114 +1,124 @@
>>>> -PEP: 333
>>>> -Title: Python Web Server Gateway Interface v1.0
>>>> +PEP: 0000
>>>> +Title: Python Web Server Gateway Interface 1.1
>>>>  Version: $Revision$
>>>>  Last-Modified: $Date$
>>>> -Author: Phillip J. Eby <p...@telecommunity.com>
>>>> +Author: Dirkjan Ochtman <dirk...@ochtman.nl>,
>>>> +        Graham Dumpleton <graham.dumple...@gmail.com>
>>>>  Discussions-To: Python Web-SIG <web-sig@python.org>
>>>>  Status: Draft
>>>>  Type: Informational
>>>>  Content-Type: text/x-rst
>>>> -Created: 07-Dec-2003
>>>> -Post-History: 07-Dec-2003, 08-Aug-2004, 20-Aug-2004, 27-Aug-2004
>>>> +Created: 15-04-2010
>>>> +Post-History: Not yet
>>>>
>>>>
>>>>  Abstract
>>>>  ========
>>>>
>>>> -This document specifies a proposed standard interface between web
>>>> -servers and Python web applications or frameworks, to promote web
>>>> -application portability across a variety of web servers.
>>>> +This document specifies a revision of the proposed standard interface
>>>> +between web servers and Python web applications or frameworks, to
>>>> +promote web application portability across a variety of web servers.
>>>>
>>>>
>>>>  Rationale and Goals
>>>>  ===================
>>>>
>>>> -Python currently boasts a wide variety of web application frameworks,
>>>> -such as Zope, Quixote, Webware, SkunkWeb, PSO, and Twisted Web -- to
>>>> -name just a few [1]_. This wide variety of choices can be a problem
>>>> -for new Python users, because generally speaking, their choice of web
>>>> -framework will limit their choice of usable web servers, and vice
>>>> -versa.
>>>> -
>>>> -By contrast, although Java has just as many web application frameworks
>>>> -available, Java's "servlet" API makes it possible for applications
>>>> -written with any Java web application framework to run in any web
>>>> -server that supports the servlet API.
>>>> -
>>>> -The availability and widespread use of such an API in web servers for
>>>> -Python -- whether those servers are written in Python (e.g. Medusa),
>>>> -embed Python (e.g. mod_python), or invoke Python via a gateway
>>>> -protocol (e.g. CGI, FastCGI, etc.) -- would separate choice of
>>>> -framework from choice of web server, freeing users to choose a pairing
>>>> -that suits them, while freeing framework and server developers to
>>>> -focus on their preferred area of specialization.
>>>> -
>>>> -This PEP, therefore, proposes a simple and universal interface between
>>>> -web servers and web applications or frameworks: the Python Web Server
>>>> -Gateway Interface (WSGI).
>>>> -
>>>> -But the mere existence of a WSGI spec does nothing to address the
>>>> -existing state of servers and frameworks for Python web applications.
>>>> -Server and framework authors and maintainers must actually implement
>>>> -WSGI for there to be any effect.
>>>> -
>>>> -However, since no existing servers or frameworks support WSGI, there
>>>> -is little immediate reward for an author who implements WSGI support.
>>>> -Thus, WSGI **must** be easy to implement, so that an author's initial
>>>> -investment in the interface can be reasonably low.
>>>> -
>>>> -Thus, simplicity of implementation on *both* the server and framework
>>>> -sides of the interface is absolutely critical to the utility of the
>>>> -WSGI interface, and is therefore the principal criterion for any
>>>> -design decisions.
>>>> -
>>>> -Note, however, that simplicity of implementation for a framework
>>>> -author is not the same thing as ease of use for a web application
>>>> -author. WSGI presents an absolutely "no frills" interface to the
>>>> -framework author, because bells and whistles like response objects and
>>>> -cookie handling would just get in the way of existing frameworks'
>>>> -handling of these issues. Again, the goal of WSGI is to facilitate
>>>> -easy interconnection of existing servers and applications or
>>>> -frameworks, not to create a new web framework.
>>>> -
>>>> -Note also that this goal precludes WSGI from requiring anything that
>>>> -is not already available in deployed versions of Python. Therefore,
>>>> -new standard library modules are not proposed or required by this
>>>> -specification, and nothing in WSGI requires a Python version greater
>>>> -than 2.2.2. (It would be a good idea, however, for future versions
>>>> -of Python to include support for this interface in web servers
>>>> -provided by the standard library.)
>>>> -
>>>> -In addition to ease of implementation for existing and future
>>>> -frameworks and servers, it should also be easy to create request
>>>> -preprocessors, response postprocessors, and other WSGI-based
>>>> -"middleware" components that look like an application to their
>>>> -containing server, while acting as a server for their contained
>>>> -applications.
>>>> -
>>>> -If middleware can be both simple and robust, and WSGI is widely
>>>> -available in servers and frameworks, it allows for the possibility
>>>> -of an entirely new kind of Python web application framework: one
>>>> -consisting of loosely-coupled WSGI middleware components. Indeed,
>>>> -existing framework authors may even choose to refactor their
>>>> -frameworks' existing services to be provided in this way, becoming
>>>> -more like libraries used with WSGI, and less like monolithic
>>>> -frameworks. This would then allow application developers to choose
>>>> -"best-of-breed" components for specific functionality, rather than
>>>> -having to commit to all the pros and cons of a single framework.
>>>> -
>>>> -Of course, as of this writing, that day is doubtless quite far off.
>>>> -In the meantime, it is a sufficient short-term goal for WSGI to
>>>> -enable the use of any framework with any server.
>>>> -
>>>> -Finally, it should be mentioned that the current version of WSGI
>>>> -does not prescribe any particular mechanism for "deploying" an
>>>> -application for use with a web server or server gateway. At the
>>>> -present time, this is necessarily implementation-defined by the
>>>> -server or gateway. After a sufficient number of servers and
>>>> -frameworks have implemented WSGI to provide field experience with
>>>> -varying deployment requirements, it may make sense to create
>>>> -another PEP, describing a deployment standard for WSGI servers and
>>>> -application frameworks.
>>>> +WSGI 1.0, specified in PEP 333, did a great job in making it easier
>>>> +for web applications and web servers to interface with each other.
>>>> +It has become very much the standard it was meant to be and an
>>>> +important part of the Python web development infrastructure.
>>>> +
>>>> +After several implementations were built by different developers,
>>>> +it inevitably turned out that the specification wasn't perfect. It
>>>> +left out some details that were implemented by all the web server
>>>> +interfaces because they were critical for many applications (or
>>>> +application frameworks). Additionally, the specification was written
>>>> +before Python 3.x was specified, resulting in a lack of clear
>>>> +specification on what to do with unicode strings.
>>>> +
>>>> +While there are some ideas around to improve WSGI further in less
>>>> +compatible ways, we feel that there is value to be had in first
>>>> +specifying a minor revision of the specification, which is largely
>>>> +compatible with existing implementations. Further simplification
>>>> +and experimentation are therefore deferred to a 2.0 version.
>>>> +
>>>> +
>>>> +Differences with WSGI 1.0
>>>> +=========================
>>>> +
>>>> +Descriptive changes
>>>> +-------------------
>>>> +
>>>> +The following changes were made to realign the spec with
>>>> +implementations 'in the wild'.
>>>> +
>>>> +1. The 'readline()' function of 'wsgi.input' must optionally take
>>>> +   a size hint. This is required because many applications use
>>>> +   cgi.FieldStorage, which uses this functionality.
>>>> +
>>>> +2. The 'wsgi.input' functions for reading input must return an empty
>>>> +   string as end of input stream marker. This is required for support
>>>> +   of HTTP 1.1 request pipelining. A correctly implemented WSGI
>>>> +   middleware already has to cope with an empty string as end
>>>> +   sentinel anyway to detect premature end of input.
>>>> +
>>>> +3. Any WSGI application or middleware should not itself return, or
>>>> +   consume from a wrapped WSGI component, more data than specified by
>>>> +   the Content-Length response header if defined. Middleware that
>>>> +   does this is arguably broken and can generate incorrect data.
>>>> +   This is just a clarification of obligations.
>>>> +
>>>> +4. The WSGI adapter must not pass on to the server any data above
>>>> +   what the Content-Length response header defines, if supplied.
>>>> +   Doing this is technically a violation of HTTP. This is another
>>>> +   clarification of obligations.
>>>> +
>>>> +
>>>> +String handling changes
>>>> +-----------------------
>>>> +
>>>> +The following changes were made to make WSGI work on Python 3.x.
>>>> +
>>>> +1. The application is passed an instance of a Python dictionary
>>>> +   containing what is referred to as the WSGI environment. All keys
>>>> +   in this dictionary are native strings. For CGI variables, all names
>>>> +   are going to be ISO-8859-1 and so where native strings are
>>>> +   unicode strings, that encoding is used for the names of CGI
>>>> +   variables.
>>>> +
>>>> +2. For the WSGI variable 'wsgi.url_scheme' contained in the WSGI
>>>> +   environment, the value of the variable should be a native string.
>>>> +
>>>> +3. For the CGI variables contained in the WSGI environment, the values
>>>> +   of the variables are native strings. Where native strings are
>>>> +   unicode strings, ISO-8859-1 encoding would be used such that the
>>>> +   original character data is preserved and as necessary the unicode
>>>> +   string can be converted back to bytes and thence decoded to unicode
>>>> +   again using a different encoding.
>>>> +
>>>> +4. The WSGI input stream 'wsgi.input' contained in the WSGI environment
>>>> +   and from which request content is read, should yield byte strings.
>>>> +
>>>> +5. The status line specified by the WSGI application should be a byte
>>>> +   string. Where native strings are unicode strings, the native string
>>>> +   type can also be returned in which case it would be encoded as
>>>> +   ISO-8859-1.
>>>> +
>>>> +6. The list of response headers specified by the WSGI application should
>>>> +   contain tuples consisting of two values, where each value is a byte
>>>> +   string. Where native strings are unicode strings, the native string
>>>> +   type can also be returned in which case it would be encoded as
>>>> +   ISO-8859-1.
>>>> +
>>>> +7. The iterable returned by the application and from which response
>>>> +   content is derived, should yield byte strings. Where native strings
>>>> +   are unicode strings, the native string type can also be returned in
>>>> +   which case it would be encoded as ISO-8859-1.
>>>> +
>>>> +8. The value passed to the 'write()' callback returned by
>>>> +   'start_response()' should be a byte string. Where native strings
>>>> +   are unicode strings, a native string type can also be supplied, in
>>>> +   which case it would be encoded as ISO-8859-1.
>>>>
>>>>
>>>>  Specification Overview
>>>> @@ -447,6 +457,13 @@
>>>>  Streaming`_ section below for more on how application output must be
>>>>  handled.)
>>>>
>>>> +Further on, several places specify constraints upon string types used
>>>> +in the WSGI API. The term native string is used to mean the 'str' class
>>>> +in both Python 2.x and 3.x. The spec tries to ensure optimal
>>>> +compatibility and ease of use by allowing implementations running on
>>>> +Python 3.x to encode strings (which are Unicode strings with no
>>>> +specified encoding) as ISO-8859-1 where a 3.x string is passed in.
>>>> +
>>>>  The server or gateway should treat the yielded strings as binary byte
>>>>  sequences: in particular, it should ensure that line endings are
>>>>  not altered. The application is responsible for ensuring that the
>>>> @@ -489,12 +506,22 @@
>>>>  ``environ`` Variables
>>>>  ---------------------
>>>>
>>>> +All keys in this dictionary are native strings. For CGI variables,
>>>> +all names are going to be ISO-8859-1 and so where native strings are
>>>> +unicode strings, that encoding is used for the names of CGI variables.
>>>> +
>>>>  The ``environ`` dictionary is required to contain these CGI
>>>>  environment variables, as defined by the Common Gateway Interface
>>>>  specification [2]_. The following variables **must** be present,
>>>>  unless their value would be an empty string, in which case they
>>>>  **may** be omitted, except as otherwise noted below.
>>>>
>>>> +The values for CGI variables are native strings. Where native strings
>>>> +are unicode strings, ISO-8859-1 encoding would be used such that the
>>>> +original character data is preserved and as necessary the unicode
>>>> +string can be converted back to bytes and thence decoded to unicode
>>>> +again using a different encoding.
>>>> +
>>>>  ``REQUEST_METHOD``
>>>>    The HTTP request method, such as ``"GET"`` or ``"POST"``. This
>>>>    cannot ever be an empty string, and so is always required.
>>>> @@ -575,13 +602,14 @@
>>>>  =====================  ===============================================
>>>>  Variable               Value
>>>>  =====================  ===============================================
>>>> -``wsgi.version``       The tuple ``(1,0)``, representing WSGI
>>>> +``wsgi.version``       The tuple ``(1, 0)``, representing WSGI
>>>>                         version 1.0.
>>>>
>>>>  ``wsgi.url_scheme``    A string representing the "scheme" portion of
>>>>                         the URL at which the application is being
>>>>                         invoked. Normally, this will have the value
>>>> -                       ``"http"`` or ``"https"``, as appropriate.
>>>> +                       ``"http"`` or ``"https"``, as appropriate. The
>>>> +                       value is a native string.
>>>>
>>>>  ``wsgi.input``         An input stream (file-like object) from which
>>>>                         the HTTP request body can be read. (The server
>>>> @@ -646,7 +674,7 @@
>>>>  Method               Stream      Notes
>>>>  ===================  ==========  ========
>>>>  ``read(size)``       ``input``   1
>>>> -``readline()``       ``input``   1,2
>>>> +``readline(hint)``   ``input``   1,2
>>>>  ``readlines(hint)``  ``input``   1,3
>>>>  ``__iter__()``       ``input``
>>>>  ``flush()``          ``errors``  4
>>>> @@ -661,11 +689,12 @@
>>>>     ``Content-Length``, and is allowed to simulate an end-of-file
>>>>     condition if the application attempts to read past that point.
>>>>     The application **should not** attempt to read more data than is
>>>> -   specified by the ``CONTENT_LENGTH`` variable.
>>>> +   specified by the ``CONTENT_LENGTH`` variable. All read functions
>>>> +   are required to return an empty string as the end of input stream
>>>> +   marker. They must yield byte strings.
>>>>
>>>> -2. The optional "size" argument to ``readline()`` is not supported,
>>>> -   as it may be complex for server authors to implement, and is not
>>>> -   often used in practice.
>>>> +2. The optional "size" argument to ``readline()`` is required for
>>>> +   the implementer, but optional for callers.
>>>>
>>>>  3. Note that the ``hint`` argument to ``readlines()`` is optional for
>>>>     both caller and implementer. The application is free not to
>>>> @@ -692,12 +721,15 @@
>>>>  ---------------------------------
>>>>
>>>>  The second parameter passed to the application object is a callable
>>>> -of the form ``start_response(status,response_headers,exc_info=None)``.
>>>> +of the form ``start_response(status, response_headers, exc_info=None)``.
>>>>  (As with all WSGI callables, the arguments must be supplied
>>>>  positionally, not by keyword.) The ``start_response`` callable is
>>>>  used to begin the HTTP response, and it must return a
>>>>  ``write(body_data)`` callable (see the `Buffering and Streaming`_
>>>> -section, below).
>>>> +section, below). Values passed to the ``write(body_data)`` callable
>>>> +should be byte strings. Where native strings are unicode strings, a
>>>> +native strings type can also be supplied, in which case it would be
>>>> +encoded as ISO-8859-1.
>>>>
>>>>  The ``status`` argument is an HTTP "status" string like ``"200 OK"``
>>>>  or ``"404 Not Found"``. That is, it is a string consisting of a
>>>> @@ -705,14 +737,20 @@
>>>>  single space, with no surrounding whitespace or other characters.
>>>>  (See RFC 2616, Section 6.1.1 for more information.) The string
>>>>  **must not** contain control characters, and must not be terminated
>>>> -with a carriage return, linefeed, or combination thereof.
>>>> +with a carriage return, linefeed, or combination thereof. This
>>>> +value should be a byte string. Where native strings are unicode
>>>> +strings, the native string type can also be returned, in which
>>>> +case it would be encoded as ISO-8859-1.
>>>>
>>>>  The ``response_headers`` argument is a list of ``(header_name,
>>>>  header_value)`` tuples. It must be a Python list; i.e.
>>>> -``type(response_headers) is ListType``, and the server **may** change
>>>> +``type(response_headers) is list``, and the server **may** change
>>>>  its contents in any way it desires. Each ``header_name`` must be a
>>>>  valid HTTP header field-name (as defined by RFC 2616, Section 4.2),
>>>> -without a trailing colon or other punctuation.
>>>> +without a trailing colon or other punctuation. Both the header_name
>>>> +and the header_value should be byte strings. Where native strings
>>>> +are unicode strings, the native string type can also be returned,
>>>> +in which case it would be encoded as ISO-8859-1.
>>>>
>>>>  Each ``header_value`` **must not** include *any* control characters,
>>>>  including carriage returns or linefeeds, either embedded or at the end.
>>>> @@ -809,6 +847,14 @@
>>>>  Handling the ``Content-Length`` Header
>>>>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>
>>>> +If an application or middleware layer chooses to return a
>>>> +Content-Length header, it should not return more data than specified
>>>> +by the header value. Any wrapping middleware layer should not
>>>> +consume more data than specified in the header value from the
>>>> +wrapped component (either middleware or application). Any WSGI
>>>> +adapter must similarly not pass on data above what the
>>>> +Content-Length response header value defines.
>>>> +
>>>>  If the application does not supply a ``Content-Length`` header, a
>>>>  server or gateway may choose one of several approaches to handling
>>>>  it. The simplest of these is to close the client connection when
>>>> @@ -1569,55 +1615,13 @@
>>>>  developers.
>>>>
>>>>
>>>> -Proposed/Under Discussion
>>>> -=========================
>>>> -
>>>> -These items are currently being discussed on the Web-SIG and elsewhere,
>>>> -or are on the PEP author's "to-do" list:
>>>> -
>>>> -* Should ``wsgi.input`` be an iterator instead of a file? This would
>>>> -  help for asynchronous applications and chunked-encoding input
>>>> -  streams.
>>>> -
>>>> -* Optional extensions are being discussed for pausing iteration of an
>>>> -  application's ouptut until input is available or until a callback
>>>> -  occurs.
>>>> -
>>>> -* Add a section about synchronous vs. asynchronous apps and servers,
>>>> -  the relevant threading models, and issues/design goals in these
>>>> -  areas.
>>>> -
>>>> -
>>>>  Acknowledgements
>>>>  ================
>>>>
>>>> -Thanks go to the many folks on the Web-SIG mailing list whose
>>>> -thoughtful feedback made this revised draft possible. Especially:
>>>> +Thanks go to many folks on the Web-SIG mailing list for helping the work
>>>> +on clarifying and improving this specification. In particular:
>>>>
>>>> -* Gregory "Grisha" Trubetskoy, author of ``mod_python``, who beat up
>>>> -  on the first draft as not offering any advantages over "plain old
>>>> -  CGI", thus encouraging me to look for a better approach.
>>>> -
>>>> -* Ian Bicking, who helped nag me into properly specifying the
>>>> -  multithreading and multiprocess options, as well as badgering me to
>>>> -  provide a mechanism for servers to supply custom extension data to
>>>> -  an application.
>>>> -
>>>> -* Tony Lownds, who came up with the concept of a ``start_response``
>>>> -  function that took the status and headers, returning a ``write``
>>>> -  function. His input also guided the design of the exception handling
>>>> -  facilities, especially in the area of allowing for middleware that
>>>> -  overrides application error messages.
>>>> -
>>>> -* Alan Kennedy, whose courageous attempts to implement WSGI-on-Jython
>>>> -  (well before the spec was finalized) helped to shape the "supporting
>>>> -  older versions of Python" section, as well as the optional
>>>> -  ``wsgi.file_wrapper`` facility.
>>>> -
>>>> -* Mark Nottingham, who reviewed the spec extensively for issues with
>>>> -  HTTP RFC compliance, especially with regard to HTTP/1.1 features that
>>>> -  I didn't even know existed until he pointed them out.
>>>> -
>>>> +* Phillip J. Eby, for writing/editing the 1.0 specification.
>>>>
>>>>  References
>>>>  ==========
>>>> @@ -1643,8 +1647,6 @@
>>>>
>>>>  This document has been placed in the public domain.
>>>>
>>>> -
>>>> -
>>>>  ..
>>>>     Local Variables:
>>>>     mode: indented-text
>>>>
>>>
>> _______________________________________________
>> Web-SIG mailing list
>> Web-SIG@python.org
>> Web SIG: http://www.python.org/sigs/web-sig
>> Unsubscribe:
>> http://mail.python.org/mailman/options/web-sig/paul.joseph.davis%40gmail.com
>>
>
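
For anyone wanting to try the two reading rules discussed at the top of this thread, here is a minimal sketch, assuming Python 3 and using io.BytesIO as a stand-in for a file-like object (either wsgi.input or a file handed to a file wrapper). The helper name iter_limited and its block_size parameter are hypothetical, made up for illustration; they are not part of WSGI or of any server. It shows that read(0) gives back an empty byte string without consuming anything, and that a Content-Length-style limit stops transmission before the end of the file is reached.

```python
import io


def iter_limited(filelike, length, block_size=8192):
    """Yield blocks from filelike, stopping after `length` bytes in total.

    Hypothetical helper: bounds iteration by a declared length instead of
    reading until the end of the file.
    """
    remaining = length
    while remaining > 0:
        # Always pass a positive integer size, never -1, None or nothing.
        block = filelike.read(min(block_size, remaining))
        if not block:
            # Empty result from a positive-size read: the input ended
            # before `length` bytes were available.
            break
        remaining -= len(block)
        yield block


source = io.BytesIO(b"hello world")

# A size of 0 returns an empty byte string and consumes nothing, so an
# empty result only signals end of input when the requested size was > 0.
zero_read = source.read(0)
position_after_zero_read = source.tell()

# Only the declared 5 bytes are taken, although 11 are available.
body = b"".join(iter_limited(source, length=5, block_size=2))
leftover = source.read()
```

The loop never issues a read of size 0 itself, which sidesteps the ambiguity raised above: inside iter_limited an empty result can only mean the input ran out early.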