(A version of this is available at http://web-core.org/2.0/pep-0444/, where links
are clickable and code may be easier to read.)
PEP 444 is quite exciting to me. So much so that I’ve been spending a few days
writing a high-performance (C10K, 10K requests/sec) Py2.6+/3.1+ HTTP/1.1 server
which implements much of the proposed standard. The server is functional (except
for web3.input, at the time of writing), but differs from PEP 444 in several
ways. It also adds several features I feel should be part of the spec.
Source for the server is available on GitHub:
https://github.com/pulp/marrow.server.http
I have made several notes about the PEP 444 specification while implementing the
above, and have some concerns over implementation details:
First, async is poorly defined:
> If the origin server advertises that it has the web3.async capability, a Web3
> application callable used by the server is permitted to return a callable
> that accepts no arguments. When it does so, this callable is to be called
> periodically by the origin server until it returns a non-None response, which
> must be a normal Web3 response tuple.
Polling is not true async. I believe that it should be up to the server to
define how async is utilized, and that the specification should be clarified on
this point. (“Called periodically” is too vague.) “Callable” should likely be
redefined as “generator” (a callable that yields), as most applications need to
hold on to state, and wrapping everything in functools.partial() is somewhat
ugly. Utilizing generators would improve support for existing Python async
frameworks, and allow four modes of operation: yield None (no response, keep
waiting), yield response_tuple (standard response), return / raise
StopIteration (close the async connection), and passing data back into the
async callable from the higher-level async framework (via generator.send()).
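To make the generator idea concrete, here is a minimal, synchronous sketch of how a server might drive such an application. The names async_app and drive are hypothetical, and the events list merely stands in for whatever a real async framework would deliver. The application signals completion by returning (falling off the end) rather than raising StopIteration explicitly, since Python 3.7+ converts a StopIteration raised inside a generator into RuntimeError.

```python
# Hypothetical sketch: none of these names come from PEP 444 or m.s.http.

def async_app(environ):
    """Example application: waits until the server feeds it some data."""
    data = None
    while data is None:
        # Mode 1: yield None -> no response yet, keep waiting.
        data = yield None
    # Mode 2: yield a normal (status, headers, body) response tuple.
    yield (b'200 OK', [(b'Content-Type', b'text/plain')], [data])
    # Mode 3: returning here raises StopIteration -> close the connection.


def drive(gen, events):
    """Feed events into the generator; collect any responses it yields."""
    responses = []
    try:
        result = next(gen)  # prime the generator
        for event in events:
            if result is not None:
                responses.append(result)
            # Mode 4: pass data from the async framework back into the app.
            result = gen.send(event)
        if result is not None:
            responses.append(result)
    except StopIteration:
        pass  # application finished: the server would close the connection
    return responses
```

A real server would call gen.send() from its event loop whenever data arrives, rather than iterating over a prepared list.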
Second, WSGI middleware, while impressive in capability, is somewhat…
heavy-weight. Heavily nested function calls waste CPU and RAM, especially if
the middleware decides it can’t operate, for example, GZip compression
disabling itself for non-text/* mimetypes. The majority of WSGI middleware can,
and probably should, be implemented as linear ingress or egress filters. For
example, on-disk static file serving could be an ingress filter, and GZip
compression an egress filter. m.s.http supports this filtering and demonstrates
one possible API for it. I am also in the process of writing an example egress
CompressionFilter.
An example API and filter implementation (paraphrased from
marrow.server.http):
> # Hypothetical wrapper function; the name and signature are illustrative only.
> def handle(env, application, ingress_filters=(), egress_filters=()):
>     # No filters, near 0 overhead.
>     for filter_ in ingress_filters:
>         # Can mutate the environment.
>         result = filter_(env)
>
>         # Allow the filter to return a response rather than continuing.
>         if result:
>             # result is a status, headers, body_iter tuple
>             return result[0], result[1], result[2]
>
>     status, headers, body = application(env)
>
>     for filter_ in egress_filters:
>         # Can mutate the environment, status, headers, body, or
>         # return completely new status, headers, and body.
>         status, headers, body = filter_(env, status, headers, body)
>
>     return status, headers, body
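As an illustration of the egress side, a compression filter fitting the filter_(env, status, headers, body) signature above might look like the following. This is a hypothetical Python 3 sketch, not the CompressionFilter mentioned earlier: it assumes headers are a list of (bytes, bytes) pairs and the body is an iterable of bytestrings, and it uses a naive substring check on Accept-Encoding.

```python
import gzip


def gzip_filter(env, status, headers, body):
    """Hypothetical egress filter: gzip text/* responses for clients that accept it."""
    accept = env.get('HTTP_ACCEPT_ENCODING', b'')
    content_type = b''
    for name, value in headers:
        if name.lower() == b'content-type':
            content_type = value

    # Disable cheaply: hand the response back untouched.
    # (Naive substring check; a real filter should parse q-values.)
    if b'gzip' not in accept or not content_type.startswith(b'text/'):
        return status, headers, body

    compressed = gzip.compress(b''.join(body))
    # Drop any stale Content-Length before adding the new headers.
    headers = [(n, v) for n, v in headers if n.lower() != b'content-length']
    headers.append((b'Content-Encoding', b'gzip'))
    headers.append((b'Content-Length', str(len(compressed)).encode('ascii')))
    return status, headers, [compressed]
```

When the filter opts out it costs one header scan and a comparison, rather than a full extra layer of middleware calls.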
The environment has some minor issues. I’ll write up my changes in RFC-style:
SERVER_NAME is REQUIRED and MUST contain the DNS name of the server OR virtual
server name for the web server if available OR an empty bytestring if DNS
resolution is unavailable. SERVER_ADDR is REQUIRED and MUST contain the web
server’s bound IP address. URL reconstruction SHOULD use HTTP_HOST if
available, SERVER_NAME if there is no HTTP_HOST, and fall back on SERVER_ADDR
if SERVER_NAME is an empty bytestring.
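The proposed fallback order for the host part of a reconstructed URL can be sketched as follows, assuming environ keys are native strings and values are bytestrings. The function name is hypothetical, and port handling is left out.

```python
def request_host(environ):
    """Return the host to use when reconstructing the request URL."""
    # 1. Prefer the client-supplied Host header, if present.
    host = environ.get('HTTP_HOST')
    if host:
        return host
    # 2. SERVER_NAME is REQUIRED, but is b'' when DNS resolution is unavailable.
    if environ['SERVER_NAME']:
        return environ['SERVER_NAME']
    # 3. SERVER_ADDR (the bound IP address) is the last resort.
    return environ['SERVER_ADDR']
```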
CONTENT_LENGTH is REQUIRED and MUST be None if not defined by the client.
Testing explicitly for None is more efficient than armoring against missing
values; also, explicit is better than implicit. (Paste’s WSGI1 server defines
CONTENT_LENGTH as 0, but this implies the client explicitly declared it as
zero, which is not the case.)
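With CONTENT_LENGTH always present, an application can distinguish “no length declared” from an explicit zero with a single identity test. A minimal sketch, assuming the value is a bytestring when set; read_body is a hypothetical helper, and stream stands in for the request input stream.

```python
import io


def read_body(environ, stream):
    """Read the request body, or return None if the client sent no length."""
    length = environ['CONTENT_LENGTH']  # REQUIRED key under the proposal
    if length is None:
        # No Content-Length from the client; distinct from an explicit 0.
        return None
    return stream.read(int(length))  # int() accepts bytes such as b'12'


body = read_body({'CONTENT_LENGTH': b'2'}, io.BytesIO(b'abc'))  # → b'ab'
```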
FRAGMENT and PARAMETERS are REQUIRED and are parsed out of the URL in the same
way as QUERY_STRING. FRAGMENT is the text after a hash mark (a.k.a. an
“anchor” to browsers, e.g. /foo#bar). PARAMETERS follow PATH_INFO, separated
from it by a semicolon, and precede QUERY_STRING, e.g. /foo;bar?baz. Both values
MUST be empty bytestrings if not present in the URL. (Rarely used, and I’ve only
seen it in Java and ColdFusion applications, but still useful.)
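A minimal sketch of that split, assuming the request target arrives as a bytestring and that the first semicolon starts PARAMETERS. The function name is hypothetical, and percent-decoding and SCRIPT_NAME handling are omitted. Note that browsers generally do not transmit the fragment in requests, so in practice FRAGMENT will often be empty.

```python
def split_target(target):
    """Split a request target into PATH_INFO, PARAMETERS, QUERY_STRING, FRAGMENT."""
    # Peel parts off from the right: fragment first, then query, then params,
    # so a ';' inside the query string is not mistaken for PARAMETERS.
    rest, _, fragment = target.partition(b'#')
    rest, _, query = rest.partition(b'?')
    path, _, params = rest.partition(b';')
    # partition() yields b'' for absent parts, matching "MUST be empty bytestrings".
    return {
        'PATH_INFO': path,
        'PARAMETERS': params,
        'QUERY_STRING': query,
        'FRAGMENT': fragment,
    }
```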
Points of contention:
Changing the namespace seems needless. Using the wsgi.* namespace with a
wsgi.version of (2, 0) will allow applications to easily armor themselves
against incompatible use. That’s what wsgi.version is for! I consider this a
strong point of contention. m.s.http keeps the wsgi namespace and uses a
version of (2, 0).
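Under that scheme, the armoring an application needs is a one-line version check. A hypothetical helper it could call before doing anything else; treating a missing key as (1, 0) is an assumption.

```python
def is_compatible(environ, major=2):
    """True if the server's wsgi.version has the required major version."""
    # WSGI 1 servers report (1, 0); a missing key is assumed to mean the same.
    version = environ.get('wsgi.version', (1, 0))
    return version[0] == major
```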
That’s it so far. I may occasionally write in with additional ideas as I
continue with my HTTP server implementation.
— Alice.
_______________________________________________
Web-SIG mailing list
Web SIG: http://www.python.org/sigs/web-sig