Re: [Web-SIG] Collating follow-up on the future of WSGI

2016-01-20 Thread Robert Brewer
CherryPy's wsgiserver chunks the write if the application returns no 
Content-Length header at all (and certain other conditions don't intrude). See 
https://bitbucket.org/cherrypy/cherrypy/src/tip/cherrypy/wsgiserver/wsgiserver2.py?#wsgiserver2.py-928
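As a rough illustration of what "chunks the write" means on the wire, here is a minimal sketch of chunked transfer coding (RFC 7230 section 4.1). It is not CherryPy's code. `chunked_frames` and `needs_chunking` are hypothetical helpers: each non-empty body chunk gets a hex size line and a trailing CRLF, and a zero-size chunk ends the body. The "other conditions" that can suppress chunking, and trailers, are ignored here.

```python
from typing import Iterable, Iterator, List, Tuple


def chunked_frames(body: Iterable[bytes]) -> Iterator[bytes]:
    """Frame a WSGI body iterable using HTTP/1.1 chunked transfer coding."""
    for chunk in body:
        if not chunk:
            # A zero-size chunk would signal end-of-body, so skip empty writes.
            continue
        yield b"%x\r\n" % len(chunk) + chunk + b"\r\n"
    # Last chunk (size 0) followed by an empty trailer section.
    yield b"0\r\n\r\n"


def needs_chunking(http_version: str, headers: List[Tuple[str, str]]) -> bool:
    """Hypothetical rule: chunk only HTTP/1.1 responses lacking Content-Length."""
    names = {name.lower() for name, _ in headers}
    return http_version == "HTTP/1.1" and "content-length" not in names
```

The application never sees any of this; it just yields bytes, and the framing stays a server concern.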


Robert Brewer
fuman...@aminus.org


From: Web-SIG [web-sig-bounces+fumanchu=aminus@python.org] on behalf of 
André Malo [n...@perlig.de]
Sent: Wednesday, January 20, 2016 3:25 AM
To: web-sig@python.org
Subject: Re: [Web-SIG] Collating follow-up on the future of WSGI

* Cory Benfield wrote:

> > On 20 Jan 2016, at 06:04, Graham Dumpleton 
> > wrote:
> >
> > For response content, if a WSGI application currently doesn’t set a
> > Content-Length on response, an HTTP/1.1 web server is at liberty to chunk
> > the response.
> >
> > So I am not sure what is missing.
>
> My specific concern is the distinction between “at liberty to” and
> “required to”. Certain behaviours that make sense with chunked transfer
> encoding do not make sense without it: for example, streaming API endpoints
> that return events as they arrive. Sending this kind of response with a
> HTTP/1.0-style content-length absent response (framed by connection
> termination) is utterly confusing, especially as some APIs consider the
> chunk framing to be semantic.

Those APIs are just broken then. The HTTP RFCs state very clearly [1] that
any hop may modify the transfer encoding. In other words: the transfer
encoding is transparent to the representation layer.

> This can and does bite people, because while all major production WSGI
> servers use chunked transfer encoding in this situation, not all WSGI
> implementations do: in fact, wsgiref does not. This means that if an
> application has a production design requirement to use chunked transfer
> encoding in its responses it cannot rely on the server actually providing
> it.
>
> I see two solutions to this problem: we could mandate that HTTP/1.1
> responses that have no content length must be chunked, rather than falling
> back to HTTP/1.0 style connection-termination-framed responses, or we could
> have servers stuff something in the environ dictionary that can be checked
> by applications. Or, I suppose, we can conclude that this problem is not
> large enough, and that it’s “caveat developer”.

WSGI is a gateway working with the representation layer. I think it should
not concern itself with underlying transport issues that much.

Regarding chunked requests - in my own WSGI implementation I went the most
pragmatic way and simply provided a CONTENT_LENGTH of -1 for unknown request
sizes (it maps very well to file.read(size)). Something like this would be my
suggestion for a future WSGI spec.
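A minimal sketch of the CONTENT_LENGTH = -1 convention, assuming the server has already decoded any transfer coding and presents EOF at the end of the request entity. `read_request_body` is a hypothetical helper. The point is that -1 passes straight through to `read(size)`, where a negative size already means "read until EOF".

```python
import io


def read_request_body(environ: dict) -> bytes:
    """Read the body, treating CONTENT_LENGTH == -1 as "length unknown"."""
    try:
        size = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        size = 0  # malformed header: treat as no body
    if size == 0:
        return b""
    # file.read(-1) reads to EOF, so an unknown length needs no special case,
    # provided the server presents EOF at the end of the request entity.
    return environ["wsgi.input"].read(size)


# Example: an unknown-length (e.g. de-chunked) upload, using an in-memory
# stand-in for the server's input stream.
unknown = {"CONTENT_LENGTH": "-1", "wsgi.input": io.BytesIO(b"streamed body")}
body = read_request_body(unknown)
```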

Cheers,
nd

[1] https://tools.ietf.org/html/rfc7230#section-3.3.1
--
If God intended people to be naked, they would be born that way.
  -- Oscar Wilde
___
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: 
https://mail.python.org/mailman/options/web-sig/fumanchu%40aminus.org


Re: [Web-SIG] wsgi server...

2011-12-26 Thread Robert Brewer
Chris McDonough wrote:
> Does anyone know of a pure-Python WSGI server that:
> - Is distributed independently from a web framework or larger whole.
> - Runs on UNIX and Windows.
> - Runs on both Python 2 and Python 3.
> - Has good test coverage.
> - Is useful in production.

I know you know, Chris, but partially to announce it in general:
"Cheroot" [1] is the HTTP server from CherryPy that is now being
developed and distributed independently. It meets most of your
requirements. I currently give it a "C-" on test coverage, partly
because it's so recently been split off. It might get a "C+" if you
include the rest of the CherryPy test suite against it. We're working on
improving that (and welcome some help in that direction). I can't say
it's been tested as an independent project in production, but it carries
with it a long history of stability from its CherryPy ancestry. It
should run equally well on UNIX and Windows, and Pythons 2.4 to 3.2.


Robert Brewer
fuman...@aminus.org

[1] https://bitbucket.org/cherrypy/cheroot


Re: [Web-SIG] PEP 444 / WSGI 2 Async

2011-01-06 Thread Robert Brewer
Alice Bevan–McGregor wrote:
> chris.d...@gmail.com said:
> > I can't get my head around filters yet...
> 
> It isn't necessary; it is, however, an often re-implemented feature of
> a framework on top of WSGI.  CherryPy, Paste, Django, etc. all
> implement some form of non-WSGI (or, hell, Paste uses WSGI middleware)
> thing they call a 'filter'.

Or, if you had actually read what I wrote weeks ago, you'd say "CherryPy used 
to have a thing they call a 'filter', but then replaced it with a much better 
mechanism ("hooks and tools") once the naïve categories of ingress/egress were 
shown in practice to be inadequate." Not to mention that, even when CherryPy 
had something called a 'filter', it not only predated WSGI but ran at the 
innermost WSGI layer, not the outermost. It's apples and oranges at best, or 
reinventing the square wheel at worst.

We don't need Yet Another Way of hooking in processing components; if anything, 
we need a standard mechanism to compose existing middleware graphs so that 
invariant orderings are explicit and guaranteed. For example, "encode, then 
gzip, then cache". By introducing egress filters as described in PEP 444 (which 
mentions gzip as a candidate for an egress filter), you're then stuck in a 
tug-of-war as to whether to build a new caching component as middleware, as an 
egress filter, or (most likely, in order to compete) both.
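A minimal sketch of what "explicit and guaranteed" ordering could look like: the stack is built from an ordered list, so "encode, then gzip, then cache" is written down once instead of being implied by hand-nested wrappers. `compose` and `tagging_middleware` are hypothetical names. Each list entry is a factory that takes a WSGI app and returns a wrapped one. The first entry ends up innermost, so it is the first to process the response.

```python
from typing import Callable, Iterable, List

WSGIApp = Callable[[dict, Callable], Iterable[bytes]]
Factory = Callable[[WSGIApp], WSGIApp]


def compose(app: WSGIApp, egress_order: List[Factory]) -> WSGIApp:
    """Wrap app so response processing runs in the listed order.

    The first factory wraps the application directly (innermost), so it is
    the first to see the response body; the last factory is outermost.
    """
    for make_middleware in egress_order:
        app = make_middleware(app)
    return app


def tagging_middleware(tag: bytes) -> Factory:
    """Hypothetical stand-in for encode/gzip/cache: appends a tag to the body."""
    def factory(app: WSGIApp) -> WSGIApp:
        def wrapped(environ, start_response):
            body = b"".join(app(environ, start_response))
            return [body + b"|" + tag]
        return wrapped
    return factory


def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"body"]


stack = compose(hello_app, [tagging_middleware(b"encode"),
                            tagging_middleware(b"gzip"),
                            tagging_middleware(b"cache")])
result = b"".join(stack({}, lambda status, headers: None))
```

Because the order is plain data, it can be inspected or validated before the stack is built, which hand-nested wrappers do not allow.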


Robert Brewer
fuman...@aminus.org


Re: [Web-SIG] PEP 444 / WSGI2 Proposal: Filters to supplement middleware.

2010-12-13 Thread Robert Brewer
Alice Bevan–McGregor wrote:
> There's one issue I've seen repeated a lot in working with WSGI1 and
> that is the use of middleware to process incoming data, but not
> outgoing, and vice-versa; middleware which filters the output in some
> way, but cares not about the input.
> 
> Wrapping middleware around an application is simple and effective, but
> costly in terms of stack allocation overhead; it also makes debugging a
> bit more of a nightmare as the stack trace can be quite deep.
> 
> My updated draft PEP 444[1] includes a section describing Filters, both
> ingress (input filtering) and egress (output filtering).  The API is
> trivially simple, optional (as filters can be easily adapted as
> middleware if the host server doesn't support filters) and easy to
> implement in a server.  (The Marrow HTTP/1.1 server implements them as
> two for loops.)
> 
> Basically an input filter accepts the environment dictionary and can
> mutate it.  Ingress filters take a single positional argument that is
> the environ.  The return value is ignored.  (This is questionable; it
> may sometimes be good to have ingress filters return responses.  Not
> sure about that, though.)
> 
> An egress filter accepts the status, headers, body tuple from the
> application and returns a status, headers, and body tuple of its own
> which then replaces the response.  An example implementation is:
> 
>   for filter_ in ingress_filters:
>       filter_(environ)
> 
>   response = application(environ)
> 
>   for filter_ in egress_filters:
>       response = filter_(*response)

That looks amazingly like the code for CherryPy Filters circa 2005. In version 
2 of CherryPy, "Filters" were the canonical extension method (for the 
framework, not WSGI, but the same lessons apply). It was still expensive in 
terms of stack allocation overhead, because you had to call each filter just to 
see whether it was "on". It would be much better to find a way to write something 
like:

for f in ingress_filters:
    if f.on:
        f(environ)

It was also fiendishly difficult to get executed in the right order: if you had 
a filter that was both ingress and egress, the natural tendency for core 
developers and users alike was to append each to each list, but this is almost 
never the correct order. But even if you solve the issue of static composition, 
there's still a demand for programmatic composition ("if X then add Y after 
it"), and even decomposition ("find the caching filter my framework added 
automatically and turn it off"), and list.insert()/remove() isn't stellar at 
that. Calling the filter to ask it whether it is "on" also leads filter 
developers down the wrong path; you really don't want to have Filter A trying 
to figure out if some other, conflicting Filter B has already run (or will run 
soon) that demands Filter A return without executing anything. You really, 
really want the set of filters to be both statically defined and statically 
analyzable.
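One way to get an ordering that is statically defined and analyzable is to register each callback at a named hook point with a numeric priority. The run order then comes from sorting plain data, not from a sequence of append calls. This is loosely modelled on the priority-sorted hooks of CherryPy 3 but is a simplified sketch, not CherryPy's code. `HookMap`, the point name, and the default priority of 50 are assumptions for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


class HookMap:
    """Hypothetical registry mapping hook points to priority-ordered callbacks."""

    def __init__(self) -> None:
        self._hooks: Dict[str, List[Tuple[int, int, Callable]]] = defaultdict(list)
        self._registered = 0  # tie-breaker: equal priorities keep attach order

    def attach(self, point: str, callback: Callable, priority: int = 50) -> None:
        self._hooks[point].append((priority, self._registered, callback))
        self._registered += 1

    def _ordered(self, point: str) -> List[Callable]:
        entries = sorted(self._hooks[point], key=lambda entry: entry[:2])
        return [callback for _, _, callback in entries]

    def order(self, point: str) -> List[str]:
        """Report the run order as data, without running anything."""
        return [callback.__name__ for callback in self._ordered(point)]

    def run(self, point: str, environ: dict) -> None:
        for callback in self._ordered(point):
            callback(environ)


def check_auth(environ: dict) -> None:
    environ.setdefault("trace", []).append("auth")


def decode_body(environ: dict) -> None:
    environ.setdefault("trace", []).append("decode")


hooks = HookMap()
hooks.attach("before_handler", decode_body, priority=70)
hooks.attach("before_handler", check_auth, priority=20)
env: dict = {}
hooks.run("before_handler", env)
```

A component that acts on both input and output attaches one callback at each point, each with its own priority, rather than being appended to two lists.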

Finally, you want the execution of filters to be configurable per URI and also 
configurable per controller. So the above should be rewritten again to 
something like:

for f in ingress_filters(controller):
    if f.on(environ['PATH_INFO']):
        f(environ)
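A runnable version of that loop, under stated assumptions. Filters are hypothetical objects with an `on(path)` predicate driven by static configuration. `ingress_filters(controller)` looks up the list configured for one controller. The `Filter` class and the config shape are illustrative only; they are not CherryPy 2's actual filter API.

```python
from typing import Dict, List


class Filter:
    """Hypothetical filter that is enabled only under certain URI prefixes."""

    def __init__(self, name: str, prefixes: List[str]) -> None:
        self.name = name
        self.prefixes = prefixes

    def on(self, path: str) -> bool:
        # Cheap check against static config; does not run the filter itself.
        return any(path.startswith(prefix) for prefix in self.prefixes)

    def __call__(self, environ: dict) -> None:
        environ.setdefault("filters.ran", []).append(self.name)


# Hypothetical per-controller configuration.
FILTERS_BY_CONTROLLER: Dict[str, List[Filter]] = {
    "admin": [Filter("auth", ["/admin"]), Filter("gzip", ["/"])],
    "static": [Filter("gzip", ["/static"])],
}


def ingress_filters(controller: str) -> List[Filter]:
    return FILTERS_BY_CONTROLLER.get(controller, [])


def run_ingress(controller: str, environ: dict) -> None:
    for f in ingress_filters(controller):
        if f.on(environ["PATH_INFO"]):
            f(environ)


env = {"PATH_INFO": "/admin/users"}
run_ingress("admin", env)
```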

It was for these reasons that CherryPy 3 ditched its version 2 "filters" and 
replaced them with "hooks and tools" in version 3. You might find more insight 
by studying the latest cherrypy/_cptools.py.


Robert Brewer
fuman...@aminus.org


Re: [Web-SIG] Most WSGI servers close connections too early.

2010-09-22 Thread Robert Brewer
Benoit Chesneau wrote:
> On Wed, Sep 22, 2010 at 5:34 PM, Robert Brewer 
> wrote:
> > However, the caveat requires a caveat: servers must still be able to
> protect themselves from malicious clients. In practice, that means
> allowing servers to close the connection without reading the entire
> request body if a certain number of bytes is exceeded.
>
> I don't see how it could be the responsibility of the server. Can you
> elaborate a little? The server shouldn't interfere in the HTTP request,
> imo.

Well, since the "origin server" is the only component in the architecture
that's *actually* having an HTTP conversation with the client, calling
it "interference" seems a bit skewed. ;) RFC 2616 section 8.2.3 says:

"If an origin server receives a request that does not include an
Expect request-header field with the "100-continue" expectation,
the request includes a request body, and the server responds
with a final status code before reading the entire request body
from the transport connection, then the server SHOULD NOT close
the transport connection until it has read the entire request,
or until the client closes the connection. Otherwise, the client
might not reliably receive the response message. However, this
requirement is not be construed as preventing a server from
defending itself against denial-of-service attacks, or from
badly broken client implementations."

The way CherryPy implements this is to wrap the socket file before
handing it to wsgi.input. That wrapper understands Content-Length (and
another understands Transfer-Encoding), and won't allow any component
that calls wsgi.input.read(n) to read past the Content-Length limit.
[This also allows components to call read() without a size argument yet
not time out on the socket, as specified in recent proposals.]

The server can be configured to have a maximum number of bytes it will
allow to be read--if Content-Length exceeds that number, the server
immediately responds with 413 Request Entity Too Large. It doesn't read
the rest of the request entity, because it's too big and could cause a
DoS. If clients can't read the response because they're still blocked
sending a request that's too big, there's not really any way to get
around that if the client didn't send an Expect request header.

If the Content-Length is not too large, and the application returns
(normally or exceptionally), and the wrapper has not recorded that the
bytes read equals the Content-Length, then the server will consume the
remaining bytes and throw them away before sending the response headers.

I just noticed it doesn't do that if it's going to close the connection. Not
sure why. Maybe it should.
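The scheme described above can be sketched roughly as follows. This is not CherryPy's implementation. The socket file is any binary file-like object, `MAX_REQUEST_BODY_SIZE` is a hypothetical server setting, and `serve_body` compresses the request cycle into one function whose "app" is a simplified callable that takes an environ and returns a status line. `KnownLengthReader` caps reads at the declared Content-Length, so `read()` with no size stops at the end of the entity rather than waiting on the socket. Whatever the application leaves unread is discarded before the response goes out.

```python
import io
from typing import Callable

MAX_REQUEST_BODY_SIZE = 1024  # hypothetical server setting, in bytes


class KnownLengthReader:
    """Expose at most `length` bytes of `rfile` as wsgi.input."""

    def __init__(self, rfile, length: int) -> None:
        self.rfile = rfile
        self.remaining = length

    def read(self, size: int = -1) -> bytes:
        if size is None or size < 0 or size > self.remaining:
            size = self.remaining
        data = self.rfile.read(size)
        self.remaining -= len(data)
        return data

    def drain(self, blocksize: int = 8192) -> None:
        """Consume and discard whatever the application did not read."""
        while self.remaining and self.read(blocksize):
            pass


def serve_body(rfile, content_length: int, app: Callable) -> str:
    """Return the status line the server would send for this request."""
    if content_length > MAX_REQUEST_BODY_SIZE:
        # Too big: reply immediately without reading the entity (DoS guard).
        return "413 Request Entity Too Large"
    wsgi_input = KnownLengthReader(rfile, content_length)
    try:
        status = app({"wsgi.input": wsgi_input})
    finally:
        # Runs whether the app returned normally or raised.
        wsgi_input.drain()
    return status


def lazy_app(environ: dict) -> str:
    environ["wsgi.input"].read(3)  # reads only part of the body
    return "200 OK"


# Example: a 6-byte body followed by the next pipelined request.
sock = io.BytesIO(b"abcdefGET /next HTTP/1.1\r\n")
status = serve_body(sock, 6, lazy_app)
```

After `serve_body` returns, the stream is positioned at the start of the next request, which is what keeps a persistent connection usable.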


Robert Brewer
fuman...@aminus.org


Re: [Web-SIG] Most WSGI servers close connections too early.

2010-09-22 Thread Robert Brewer
Marcel Hellkamp wrote:
> I just discovered a problem that affects most WSGI server
> implementations and most current web-browsers (tested with wsgiref,
> paste, firefox, chrome, wget and curl):
> 
> If the server closes the connection while the client is still uploading
> data via POST or PUT, the browser displays an error message
> ('Connection closed') and does not display the response sent by the server.
> 
> The error occurs if an application chooses not to process a form
> submission before returning to the WSGI server. This is quite rare in
> real world scenarios, but hard to debug because the server logs the
> request as successfully sent to the client.
> 
> To reproduce the problem, run the following script, visit
> http://localhost:8080/ and upload a big file::
> 
> from wsgiref.simple_server import make_server
> 
> def application(environ, start_response):
>     start_response('200 OK', [('Content-Type', 'text/html')])
>     return [b"""
>       <form method="post" enctype="multipart/form-data">
>         Upload big file:
>         <input type="file" name="upload" />
>         <input type="submit" />
>       </form>
>     """]
> 
> server = make_server('localhost', 8080, application)
> server.serve_forever()
> 
> I would like to add a warning to the WSGI/web3 specification to address
> this issue:
> 
> "An application should read all available data from
> `environ['wsgi.input']` on POST or PUT requests, even if it does not
> process that data. Otherwise, the client might fail to complete the
> request and not display the response."

Indeed. CherryPy has protected against this for some time. But it shouldn't be 
the burden of *applications* to do this; the WSGI "origin" server can do so 
quite easily.

However, the caveat requires a caveat: servers must still be able to protect 
themselves from malicious clients. In practice, that means allowing servers to 
close the connection without reading the entire request body if a certain 
number of bytes is exceeded.
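For comparison, the application-side workaround from the quoted warning, with that caveat applied, might look like the sketch below. It reads and discards the request body in blocks but gives up past a limit. `drain_input` and its 1 MiB default are hypothetical. It assumes the application has not read any of the body yet, because it trusts `CONTENT_LENGTH` to say how much remains, and reading `wsgi.input` past that length is not safe under WSGI 1.0.

```python
import io


def drain_input(environ: dict, limit: int = 1 << 20, blocksize: int = 8192) -> bool:
    """Read and discard an unread request body of at most `limit` bytes.

    Returns True only if the whole declared body was consumed. A body larger
    than `limit` is left unread, so a client cannot make us read an arbitrarily
    large upload just to throw it away.
    """
    try:
        remaining = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        return False
    if remaining > limit:
        return False
    stream = environ["wsgi.input"]
    while remaining > 0:
        data = stream.read(min(blocksize, remaining))
        if not data:
            break  # the client stopped sending early
        remaining -= len(data)
    return remaining == 0


# Example with an in-memory stand-in for the request stream.
example = {"CONTENT_LENGTH": "10", "wsgi.input": io.BytesIO(b"0123456789tail")}
drained = drain_input(example)
```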


Robert Brewer
fuman...@aminus.org


Re: [Web-SIG] PEP 444 (aka Web3)

2010-09-20 Thread Robert Brewer
> On Sun, 2010-09-19 at 21:52 -0400, Chris McDonough wrote:
> 
> > I'm -0 on the server trying to guess the Content-Length header.  It just
> > doesn't seem like much of a burden to place on an application and it's
> > easier to specify that an application must do this than it is to specify
> > how a server should behave in the face of a missing Content-Length.  I
> > also believe Graham has argued against making the server guess; I
> > presume this causes him some pain somehow (probably underspecification
> > in WSGI).
> 
> Graham's issues with requiring the server to set Content-Length are
> detailed here:
> 
> http://blog.dscpl.com.au/2009/10/wsgi-issues-with-http-head-requests.html

Chris,

Thanks for that link. I had completely forgotten about that issue. I'd
really appreciate it if your web3 spec made some definitive decision on
whether applications and middleware are responsible for correctly
differentiating HEAD from GET, or whether servers should transform HEAD
to GET before invoking the first application callable. I'd personally
prefer the former.
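If applications are responsible for HEAD, as preferred above, the minimum is an application that sends the same status and headers for HEAD as for GET but no body. This is a hypothetical sketch; `app` and its page content are invented for illustration.

```python
def app(environ, start_response):
    body = b"<h1>Hello</h1>"
    headers = [("Content-Type", "text/html"),
               ("Content-Length", str(len(body)))]
    start_response("200 OK", headers)
    if environ.get("REQUEST_METHOD") == "HEAD":
        # Same headers as GET, including Content-Length, but no entity body.
        return []
    return [body]


# Example: call the app as a server would, recording what start_response got.
seen = []
head_body = app({"REQUEST_METHOD": "HEAD"}, lambda status, headers: seen.append(headers))
get_body = app({"REQUEST_METHOD": "GET"}, lambda status, headers: seen.append(headers))
```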


Robert Brewer
fuman...@aminus.org


Re: [Web-SIG] PEP 444 (aka Web3)

2010-09-18 Thread Robert Brewer
Marcel Hellkamp wrote:
> On Thursday, 16.09.2010, at 22:58 +0200, Armin Ronacher wrote:
> > - The async part.
> > If I can't find someone that is willing to provide some input on that
> > I will remove that section.
> 
> I see a problem here: The response tuple must be returned synchronously
> according to web3. Once returned, the values are final. If an
> application needs to wait for some background task to finish in order
> to decide about headers or the status code, it is now forced to block
> completely.
> 
> A common use case for this is a web service that itself queries other
> web services (e.g. an ajax proxy to work around "same origin policy").
> 
> With WSGI it was possible to yield empty strings as long as the
> application is waiting for data and call start_response once the
> headers are final. Not perfect, but at least non-blocking. Web3
> removes this possibility. The headers must be returned before the
> body iterable yielded its first element, empty or not.
> 
> Removing any support for this type of asynchronism would render web3
> useless for all but completely synchronous and trivial applications.
> Even frameworks would have no way to work around this anymore.
> 
> I do understand that the start_response callable is inconvenient for
> middleware to implement, but it totally made sense.

I don't follow. What is the benefit of yielding empty strings instead of just 
waiting for the status and headers to be available? Do you then run off and do 
other things with that server thread?

I've run a few businesses now on WSGI without doing what you describe, so I 
don't see why blocking makes an application 'trivial'.


Robert Brewer
fuman...@aminus.org


Re: [Web-SIG] PEP 444 (aka Web3)

2010-09-16 Thread Robert Brewer
Chris McDonough wrote:
> A PEP was submitted and accepted today for a WSGI successor protocol
> named Web3:
> 
> http://python.org/dev/peps/pep-0444/
> 
> I'd encourage other folks to suggest improvements to that spec or to
> submit a competing spec, so we can get WSGI-on-Python3 settled soon.

Thanks Chris, a few comments:

 1. Hooray for all-byte output.
 2. Hardly anybody implements RFC 2047, and httpbis is phasing it out.
In addition, since folded and/or 2047-encoded lines are equivalent
to their non-folded-nor-encoded variants, applications have no
business emitting folded or encoded versions of these; that decision
should be left up to the origin server. So keep the text about
control characters, carriage returns and linefeeds, please.
 3. +1 on (status, headers, body) in that order. Your own example code
composed them in that order, and then re-arranged them for output!
One of the benefits of a new spec is the opportunity to coerce
rewrites in existing codebases that undo their poor design choices
and make them more readable. By the way, the "Specification Details"
and "Values Returned" sections have this in the (s, h, b) order in
your draft.
 4. The web3 spec says, "In case a content length header is absent the
stream must not return anything on read. It must never request more
data than specified from the client." but later it says, "Web3
servers must handle any supported inbound "hop-by-hop" headers on
their own, such as by decoding any inbound Transfer-Encoding,
including chunked encoding if applicable.". I would be sad if web3
did not support streaming uploads via Transfer-Encoding. One way to
implement that would be to make the origin server handle read()
transparently by returning '' on EOF, regardless of whether a
Content-Length or a Transfer-Encoding header was provided.
 5. Conversely, streaming output is nice to have and should be explicitly
supported in the web3 spec. One way would be to require servers
to respect a 'Transfer-Encoding: chunked' header emitted by the
application. However, the WSGI and web3 specs specifically deny
this approach by saying, "Applications and middleware are forbidden
from using HTTP/1.1 "hop-by-hop" features or headers". A workaround
would be for the application to signal Transfer-Encoding by omitting
any Content-Length header in its response headers (this is what
CherryPy currently does).
 6. I'd personally like to see it be OK for apps and middleware to
emit "Connection: close" too, or have some other way of communicating
that desire to the server.
 7. "it is presumed that Web3 middleware will be created which can
be used "in front" of existing WSGI 1.0 applications, allowing
those existing WSGI 1.0 applications to run under a Web3 stack.
This middleware will require, when under Python 3, an equivalence
to be drawn between Python 3 str types and the bytes values
represented by the HTTP request and all the attendant encoding-
guessing (or configuration) it implies." Just some field experience:
that's not hard. CherryPy 3.2 does this now between various WSGI
proposals.
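Item 4's suggestion, an origin server that hides the transfer coding and simply returns `b''` at the end of the entity, can be sketched as a small reader that de-chunks a request body (RFC 7230 section 4.1). `ChunkedReader` is hypothetical. It discards chunk extensions and trailer fields, raises `ValueError` on a malformed size line, and does no size limiting.

```python
import io


class ChunkedReader:
    """Present a chunked request body as a plain stream ending in b''."""

    def __init__(self, rfile) -> None:
        self.rfile = rfile
        self.buffer = b""
        self.done = False

    def _next_chunk(self) -> bytes:
        size_line = self.rfile.readline()
        # Drop any chunk extension after ';' and parse the hex size.
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            # Skip trailer fields up to the blank line that ends the message.
            while self.rfile.readline() not in (b"\r\n", b"\n", b""):
                pass
            self.done = True
            return b""
        data = self.rfile.read(size)
        self.rfile.readline()  # CRLF after the chunk data
        return data

    def read(self, size: int = -1) -> bytes:
        while not self.done and (size < 0 or len(self.buffer) < size):
            self.buffer += self._next_chunk()
        if size < 0:
            size = len(self.buffer)
        data, self.buffer = self.buffer[:size], self.buffer[size:]
        return data


# Example: a chunked upload followed by the next pipelined request.
raw = io.BytesIO(b"5\r\nhello\r\n7;ext=1\r\n, world\r\n0\r\nX-Trailer: a\r\n\r\nNEXT")
body = ChunkedReader(raw).read()
```

With a reader like this behind `wsgi.input`, applications see the same EOF behaviour whether the client sent Content-Length or chunked Transfer-Encoding.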


Robert Brewer
fuman...@aminus.org



Re: [Web-SIG] WSGI for Python 3

2010-08-27 Thread Robert Brewer
Paul Davis wrote:
> > Since the major stumbling block, irrespective of other changes,
> > to any sort of agreement is still bytes vs unicode
>
> I ran into this while I was attempting to put together enough code to
> play with a wsgiref2 that ran on both 2.x and 3.x. As Graham has
> deftly pointed out, it's a pretty big pain in the rear.
> 
> Specifically, if we specify that all keys in the environ dictionary
> are byte strings, then there's a noticeable amount of pain in trying
> to write code that runs on both platforms. I object to 2to3.py on
> religious grounds, so when I was implementing this I was doing so with
> code that would run unmodified on both 2 and 3.

Religion is what gets us into this mess. Pragmatism will get us out. We
have two options:

 1. Continue to try to write code that runs unmodified on Python 2 and
3, or that runs when 2to3 is applied. There is a morass of corner cases
and state machines that behave differently depending on when you look at
them lurking here. You can all see where that is getting us: nowhere. By
the time you all discover how to write a spec that deals with all the
pain points which 2to3 introduces, Python 2 will be dead and you will
have wasted your time.
 2. Write a Python 3 version of your code. Yes, it's more drudge work.
Suck it up. To ameliorate that, make the Python 3 version the default as
soon as possible. Deprecate the Python 2 branch. Backport features as
necessary to the Python 2 branch (just as Python itself has been doing,
if you notice). If you do that, we can write a WSGI for Python 3 now
that doesn't suffer from any of the complexities of 2to3.


Robert Brewer
fuman...@aminus.org


Re: [Web-SIG] Future of WSGI

2009-11-25 Thread Robert Brewer
Sylvain Hellegouarch wrote:
> Personally, I would favor the idea that WSGI2 specifies the way headers
> should be mapped to object attributes (e.g. Content-Type would become
> content_type) and then let duck typing magic happen rather than
> specifying a class from which to inherit for instance.

How would you handle HTTP extension headers like
X-MyEnterprise-Metadata?

Cook [1] might be appropriate to read here: "...abstract data types
facilitate adding new operations, while [objects] facilitate adding new
representations... Abstract data types define operations that collect
together the behaviors for a given action. Objects organize the matrix
the other way, collecting together all the actions associated with a
given representation. It is easier to add new operations in an ADT, and
new representations using objects."

IMO, it's quite appropriate that we essentially use an ADT (a dict) at
the lowest level, precisely because it constrains the representation.
This is the essence of The Zen of CherryPy [2] #8 "Subclassed builtins are
better than custom types" (really, custom _classes_) and #9 "But builtin
types are even better". People can then objectify those ADTs to their
representational taste.
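To make the last point concrete: keeping the dict at the bottom does not stop a framework from layering attribute access on top. Extension headers come along for free, because WSGI (following CGI) stores request headers as `HTTP_` plus the upper-cased name with hyphens turned into underscores. Content-Type and Content-Length are the exceptions, stored as `CONTENT_TYPE` and `CONTENT_LENGTH`. `HeaderView` is a hypothetical sketch, not a proposed API.

```python
class HeaderView:
    """Read-only attribute access to request headers stored in a WSGI environ."""

    _UNPREFIXED = {"CONTENT_TYPE", "CONTENT_LENGTH"}

    def __init__(self, environ: dict) -> None:
        self._environ = environ

    def __getattr__(self, name: str) -> str:
        if name.startswith("_"):
            raise AttributeError(name)  # avoid recursion on private lookups
        key = name.upper()
        if key not in self._UNPREFIXED:
            key = "HTTP_" + key
        try:
            return self._environ[key]
        except KeyError:
            raise AttributeError(name) from None


environ = {
    "CONTENT_TYPE": "text/plain",
    "HTTP_X_MYENTERPRISE_METADATA": "rev=42",
}
headers = HeaderView(environ)
```

Here `headers.x_myenterprise_metadata` resolves without any per-header declaration, while the environ itself stays a plain dict.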


Robert Brewer
fuman...@aminus.org

[1] http://www.cs.utexas.edu/~wcook/Drafts/2009/essay.pdf
[2] http://www.cherrypy.org/wiki/ZenOfCherryPy#a8.Subclassedbuiltinsarebetterthancustomtypes.



Re: [Web-SIG] Session events

2009-10-05 Thread Robert Brewer
Alastair "Bell" Turner wrote:
> I've been looking through the range of choices for Python web
> [application] frameworks/libraries (Just to have all the bases
> covered) for a new build project and standardisation of some small
> utilities. There's one feature that I'm not finding and was just
> wanting to check on before considering the joys of rolling my own: I'm
> not finding any support for user session events, I'm particularly
> interested in being able to register a handler on session expiry or
> cleanup. I've mainly been looking at the lighter weight frameworks
> since my requirement for the new build is mainly aggregate and list
> operations, so the least suitable load for ORMs.

I hope, for your own sanity, that by "rolling my own" you mean "my own
session extension", not "my own web framework." ;)

> Have I missed the feature session event somewhere?

You haven't missed it in CherryPy because we actually took it out a few
years ago on purpose--it was a request rare enough to warrant favoring
simplicity of the code base over feature creep. These days, the standard
approach in CP 3.x is to subclass cherrypy.lib.sessions.FileSession (or
one of the others), and add your own calls where you want them, then
just stuff your new class into the sessions module via
"cherrypy.lib.sessions.MyFileSession = MyFileSession" (and the config
system will automatically pick it up).
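The subclass-and-add-your-own-calls pattern for expiry events might look roughly like this. To keep the sketch self-contained, it uses a hypothetical in-memory `BaseSession` with a `clean_up()` method in place of one of CherryPy's session classes. The real `cherrypy.lib.sessions` internals differ, so treat every method name here as an assumption.

```python
import time
from typing import Callable, Dict, List, Tuple


class BaseSession:
    """Hypothetical stand-in for a session backend: id -> (data, expiration)."""

    def __init__(self) -> None:
        self.store: Dict[str, Tuple[dict, float]] = {}

    def save(self, session_id: str, data: dict, timeout: float) -> None:
        self.store[session_id] = (data, time.time() + timeout)

    def expired_ids(self, now: float) -> List[str]:
        return [sid for sid, (_, expires) in self.store.items() if expires <= now]

    def clean_up(self) -> None:
        for sid in self.expired_ids(time.time()):
            del self.store[sid]


class NotifyingSession(BaseSession):
    """Subclass that calls registered handlers for each expired session."""

    def __init__(self) -> None:
        super().__init__()
        self.on_expire: List[Callable[[str, dict], None]] = []

    def clean_up(self) -> None:
        for sid in self.expired_ids(time.time()):
            data, _ = self.store.pop(sid)
            for handler in self.on_expire:
                handler(sid, data)


expired = []
sessions = NotifyingSession()
sessions.on_expire.append(lambda sid, data: expired.append(sid))
sessions.save("abc", {"user": "alice"}, timeout=-1)  # already expired
sessions.save("def", {"user": "bob"}, timeout=3600)
sessions.clean_up()
```

In CherryPy 3.x, the final step would be the registration quoted above, along the lines of `cherrypy.lib.sessions.MyFileSession = MyFileSession`.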


Robert Brewer
fuman...@aminus.org



Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread Robert Brewer
P.J. Eby [mailto:p...@telecommunity.com]
> At 07:40 PM 9/21/2009 -0700, Robert Brewer wrote:
> > Yes; you have to transcode to the "correct" encoding. Once.
> > Then every other WSGI application interface "below" that one
> > doesn't have to care.
> 
> You can only do that if you *break encapsulation*, which as I said
> earlier is voiding the entire point of having a modular interface.

Requiring one component to run before another to achieve a correct
result does not void modularity. Unix pipes employ a modular interface,
but "cat /etc/fstab | wc | head" produces a very different result than
"cat /etc/fstab | head | wc". In such a system, encapsulation requires
that the components not share state, but rather trust that they are
composed correctly (yes, by some "invisible hand") and that the given
input is the intended one, even if that means a previous component
transformed it.

If, on the other hand, only utf-8-decoded strings can be passed as input
to each WSGI component, then each WSGI component must be prepared to
re-decode its inputs; in that case, each must be configured identically
with the same logic to determine the correct decoding, since the correct
decoding does not differ from one component to the next. That repeated
configuration of the correct decoding is shared state, and breaks
encapsulation; one-time transformation of inputs is not and does not.

> Having a configurable encoding just means that *every* WSGI
> application *must* verify the encoding in order to be safe.

No, each can trust its inputs and do its intended job instead, if your
idempotency requirement is relaxed.

> I'm all
> in favor of making everyone suffer equally, but all else being equal,
> I'd prefer them to suffer idempotently rather than conditionally.  ;-)

I know you do, but I don't see the community following your lead in that
preference. Any middleware that alters the environ breaks idempotency.
Any middleware that alters the output breaks idempotency. Most routing
middleware breaks idempotency. There's a lot of all of those already in
the wild.

CherryPy doesn't care, because we marginalized WSGI middleware into near
obscurity. We did that in large part because of the idempotency
requirements of WSGI 1.0. We may have the only routing middleware that
you could mistakenly put in your stack twice and get the same result! So
I'm not fighting for myself/my framework on this; surrogateescape would
work just fine for us since we ship very little middleware.

But I don't think it would work fine for Paste, Pylons, Turbogears,
Repoze, etcetera etcetera who have lots of WSGI middleware to port and
more they want to build, and have been chafing for years now against
this requirement. I believe they want full unicode SCRIPT_NAME and
PATH_INFO, and would prefer a single, new, modular WSGI component be
inserted in their component graphs than to build that logic into every
WSGI component. They already have to deal with correct ordering in their
WSGI component graphs, because they've already abandoned strict
idempotency. Ben, Ian, Mark, Chris, et al, please confirm or deny that;
I could be way off base.


Robert Brewer
fuman...@aminus.org



Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-21 Thread Robert Brewer
Henry Precheur wrote:
> On Mon, Sep 21, 2009 at 03:26:35PM -0700, Robert Brewer wrote:
> > It looks simpler until you have a site that is not primarily utf-8.
> > In that case, you multiply your (1 line * number of middlewares in the
> > WSGI stack * each request).
> > With wsgi.uri_encoding you get either (1 line * 1
> > middleware designed to transcode * each request), or even 0 if your
> > whole site uses just one charset.
> 
> I am not sure I understand your point.
> 
> The 0 lines hold true if the whole site is using latin-1 or utf-8 and
> you write your applications/middlewares only for this site. But if it's
> using any other encoding you still have to transcode.
> 
> def middleware(environ, start_response):
>     value = environ['some_key'].\
>         encode('utf8', 'surrogateescape').\
>         decode(SITE_ENCODING)
>     ...

Yes; you have to transcode to the "correct" encoding. Once. Then every
other WSGI application interface "below" that one doesn't have to care.

> With wsgi.uri_encoding you would still have to do the following:
> 
> def middleware(environ, start_response):
>     value = environ['some_key'].\
>         encode(environ['some_key.encoding']).\
>         decode(SITE_ENCODING)
>     ...
> 
> Of course you can directly use `environ['some_key']` if you know you'll
> get the 'right' encoding all the time. But when the encoding changes,
> you'll have to fix all your middlewares.

The decoding doesn't change spontaneously. You either get the correct
one or you get an incorrect one. If it's incorrect, you fix it, one
time, via a WSGI component which you've configured to determine the
"correct" decoding. Then every other WSGI component "below" that one can
go back to trusting the decoding was correct. In fact, if you do that
transcoding right away, no other WSGI components need to be rewritten to
take advantage of unicode. You just have to deploy a single transcoder,
that's 6 lines of code max. I know PJE will chime in here and say you
can't deploy a website that works differently if you happen to forget to
turn on a given piece of middleware, but I also know the rest of you
will drown him out from personal experience because you've *never* done
that. ;)
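A minimal sketch of the single transcoder described above, assuming Python 3 and assuming the server records the charset it used in the proposed 'wsgi.uri_encoding' key (a proposal in this thread, not part of PEP 333). The factory name `transcoder` and its `site_encoding` parameter are hypothetical, and bytes that the site charset cannot decode are not handled.

```python
import codecs


def transcoder(app, site_encoding):
    """Wrap a WSGI app so SCRIPT_NAME/PATH_INFO are re-decoded exactly once.

    Assumes the server stored the charset it used in the proposed
    'wsgi.uri_encoding' key (hypothetical; not part of PEP 333).
    site_encoding is a hypothetical deployer-chosen setting.
    """
    target = codecs.lookup(site_encoding).name  # normalize e.g. 'utf8' -> 'utf-8'

    def middleware(environ, start_response):
        used = environ.get('wsgi.uri_encoding', 'utf-8')
        if codecs.lookup(used).name != target:
            for key in ('SCRIPT_NAME', 'PATH_INFO'):
                if key in environ:
                    # Recover the original bytes, then decode them correctly.
                    raw = environ[key].encode(used)
                    environ[key] = raw.decode(site_encoding)
            # Components below this one can now trust the decoded values.
            environ['wsgi.uri_encoding'] = site_encoding
        return app(environ, start_response)

    return middleware
```

Deployed nearly first in the stack, it lets routing and every component below it work with the corrected PATH_INFO without re-decoding.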

With utf8+surrogateescape, you don't transcode once, you transcode in
every WSGI component in your stack that needs to "correct" the decoding.
You have to do it more than once because, each time you
encode/re-decode, you use the result and then throw it away. Any
subsequent WSGI components have to encode/re-decode--you cannot store
the redecoded URI in SCRIPT_NAME/PATH_INFO, because the
utf8+surrogateescape scheme says...well, it's always utf8-decoded. In
addition, *every* component that needs to compare URI's then has to be
configured with the same logic, however convoluted, to perform the
"correct" decoding again. It's not just routing middleware: caches need
to reliably compare decoded URI's; so do sessions; so does auth
(especially!); so do static files. And Heaven forfend you actually
decode differently in two different components!


Robert Brewer
fuman...@aminus.org



Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-21 Thread Robert Brewer
I've never proposed that WSGI make choices for people. I'm simply saying that a 
configurable server, with a sane, perfectly-reversible default, is the simplest 
thing that could possibly work.


Robert Brewer
fuman...@aminus.org

> -----Original Message-----
> From: Mark Nottingham [mailto:m...@mnot.net]
> Sent: Monday, September 21, 2009 6:28 PM
> To: P.J. Eby
> Cc: Robert Brewer; René Dudfield; Web SIG
> Subject: Re: [Web-SIG] Request for Comments on upcoming WSGI Changes
> 
> +1. There is no one answer for these issues (e.g., URI->IRI conversion
> can lose information), so low-level infrastructure like WSGI shouldn't
> be making choices for people.
> 
> 
> On 22/09/2009, at 5:31 AM, P.J. Eby wrote:
> 
> > At 11:23 AM 9/21/2009 -0700, Robert Brewer wrote:
> >> I still don't see why the environ should have multiple versions of
> >> anything. It's not as if the HTTP request gives us multiple Request-
> >> URI's. There's a single processing step that has to happen
> >> somewhere: decoding the bytes of the Request-URI to unicode. For
> >> the vast majority of apps, it should only happen once. Twice is
> >> acceptable to me for some apps. As I pointed out in the linked
> >> email, doing that as soon as possible (i.e. in the WSGI origin
> >> server) allows URI's to be compared as character strings more
> >> easily. If you deploy a piece of middleware that transcodes (based
> >> on more information than servers want to deal with), it had better
> >> be nearly first in the stack so routing works reliably.
> >
> > The problem with this whole approach is that it's not composable.
> > You can't stick in an application under a router that uses a
> > different method for grokking its subtree of the URI space, unless
> > it knows what's been done to the URI and can un-do it.
> >
> > Maybe I'm missing something here, but the only way I see to preserve
> > composability here is to use latin-1 or bytes.
> >
> > The fundamental problem is that, like it or not, HTTP headers are
> > actually byte strings.  The *only* reason we ever supported unicode
> > in WSGI was to handle platforms where there's no such thing as a non-
> > unicode string, and there we made it explicit that it's just a way
> > of manipulating *bytes*, not unicode.
> >
> > ISTM that very few (if any) of the proposals floating around for
> > modifying WSGI are taking this concept into account.  Most of them
> > sound to me like people saying, "yeah, but this particular hack will
> > work for *my* apps...  so everybody else must be doing something
> > stupid."
> >
> > But WSGI was built on the principle of *equally inconveniencing
> > everyone*, specifically to avoid an impossible attempt at consensus
> > between incompatible ways of doing things.  (E.g., nine million
> > request/response APIs.)
> >
> > So, if the only problem we're going to cause by using bytes
> > everywhere is to make everyone need to change their routing code on
> > Python 3, I vote +1000.  ;-)
> >
> 
> 
> --
> Mark Nottingham http://www.mnot.net/



Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-21 Thread Robert Brewer
Henry Precheur wrote:
> On Mon, Sep 21, 2009 at 09:14:13PM +0200, Armin Ronacher wrote:
> > So the same standard should have different behavior on different
> > Python versions?  That would make framework code a lot more complicated.
> 
> I don't understand why it would be 'a lot more' complicated.
> 
> (The following code snippets are Python 3 only, and assume we're using
> 'native strings' everywhere)
> 
> In the gateway, environ would be populated this way:
> 
>   environ['some_key'] = some_value.decode('utf8', 'surrogateescape')
> 
> Compare that to the utf-8-then-latin-1 alternative:
> 
>   try:
>       environ['some_key'] = some_value.decode('utf-8')
>       environ['some_key.encoding'] = 'utf-8'
>   except UnicodeError:
>       environ['some_key'] = some_value.decode('latin-1')
>       environ['some_key.encoding'] = 'latin-1'
> 
> 
> What you would have in the application to get the original value:
> 
>   environ['some_key'].encode('utf8', 'surrogateescape')
> 
> With utf8-then-latin1:
> 
>   environ['some_key'].encode(environ['some_key.encoding'])
> 
> 
> The 'surrogateescape' way is clearly simpler.

It looks simpler until you have a site that is not primarily utf-8. In
that case, you multiply your (1 line * number of middlewares in the WSGI
stack * each request). With wsgi.uri_encoding you get either (1 line * 1
middleware designed to transcode * each request), or even 0 if your
whole site uses just one charset.
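To make the trade-off concrete, here is a small Python 3 sketch (surrogateescape needs 3.1+, per PEP 383) of the two gateway-side decodings Henry compared. The helper names are illustrative only. Both round-trip to the original bytes; the difference is what a component on a Latin-1 site sees before anyone transcodes.

```python
def decode_surrogateescape(raw):
    # PEP 383: undecodable bytes become lone surrogates U+DC80..U+DCFF,
    # so encoding with the same error handler restores the bytes.
    return raw.decode('utf-8', 'surrogateescape')


def decode_utf8_then_latin1(raw):
    # Returns (text, encoding); ISO-8859-1 accepts every byte value,
    # so the fallback cannot fail.
    try:
        return raw.decode('utf-8'), 'utf-8'
    except UnicodeDecodeError:
        return raw.decode('latin-1'), 'latin-1'


raw = b'/caf\xe9'                            # 'café' encoded as ISO-8859-1
text_se = decode_surrogateescape(raw)        # '/caf\udce9'
text_fb, enc = decode_utf8_then_latin1(raw)  # ('/café', 'latin-1')

# Both schemes are reversible...
assert text_se.encode('utf-8', 'surrogateescape') == raw
assert text_fb.encode(enc) == raw
# ...but only the fallback hands a Latin-1 site readable text directly.
assert text_fb == '/café' and text_se != '/café'
```

With surrogateescape, every component on such a site that needs real characters has to repeat the encode/re-decode step itself.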


Robert Brewer
fuman...@aminus.org



Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-21 Thread Robert Brewer
P.J. Eby wrote:
> At 11:23 AM 9/21/2009 -0700, Robert Brewer wrote:
> >I still don't see why the environ should have multiple versions of
> >anything. It's not as if the HTTP request gives us multiple
> >Request-URI's. There's a single processing step that has to happen
> >somewhere: decoding the bytes of the Request-URI to unicode. For the
> >vast majority of apps, it should only happen once. Twice is
> >acceptable to me for some apps. As I pointed out in the linked
> >email, doing that as soon as possible (i.e. in the WSGI origin
> >server) allows URI's to be compared as character strings more
> >easily. If you deploy a piece of middleware that transcodes (based
> >on more information than servers want to deal with), it had better
> >be nearly first in the stack so routing works reliably.
> 
> The problem with this whole approach is that it's not
> composable.  You can't stick in an application under a router that
> uses a different method for grokking its subtree of the URI space,
> unless it knows what's been done to the URI and can un-do it.

I don't understand. If SCRIPT_NAME/PATH_INFO/QUERY_STRING are unicode, the only 
answer to "what's been done to the URI?" can be "wsgi.uri_encoding", which 
allows someone to un-do it. What more do you want?

1. Bytes arrive. The server decodes them with utf-8 and sets 'wsgi.uri_encoding' to 'utf-8'.
2. Middleware says "oops, that's wrong", encodes back to bytes using 'utf-8', 
and re-decodes with koi8-r, changing wsgi.uri_encoding to 'koi8-r'.
3. Further middlewares and the app use the unicode value, and don't really care 
what encoding was used.

> Maybe I'm missing something here, but the only way I see to preserve
> composability here is to use latin-1 or bytes.
> 
> The fundamental problem is that, like it or not, HTTP headers are
> actually byte strings.  The *only* reason we ever supported unicode
> in WSGI was to handle platforms where there's no such thing as a
> non-unicode string, and there we made it explicit that it's just a
> way of manipulating *bytes*, not unicode.
> 
> ISTM that very few (if any) of the proposals floating around for
> modifying WSGI are taking this concept into account.  Most of them
> sound to me like people saying, "yeah, but this particular hack will
> work for *my* apps...  so everybody else must be doing something
> stupid."
> 
> But WSGI was built on the principle of *equally inconveniencing
> everyone*, specifically to avoid an impossible attempt at consensus
> between incompatible ways of doing things.  (E.g., nine million
> request/response APIs.)
> 
> So, if the only problem we're going to cause by using bytes
> everywhere is to make everyone need to change their routing code on
> Python 3, I vote +1000.  ;-)

That's not the only problem. Using native strings wherever possible makes web 
programming in Python easier, regardless of version. In Python 3, that happens 
to be unicode, for good reasons.

For HTTP, there's a more specific reason: URI's should be compared for 
equivalence character by character, not byte by byte. See 
http://tools.ietf.org/html/rfc3986#section-6.2.1. That includes routing 
middleware.
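A short sketch of what comparing by character buys, using the pair '/a%c3%bf' and '/a%ff' (both U+00FF after decoding, one via UTF-8 and one via ISO-8859-1). It assumes the charset of each URI is known or guessed; the helper `same_path` is hypothetical.

```python
from urllib.parse import unquote

utf8_form = '/a%c3%bf'    # '/a\u00ff' encoded as UTF-8, then %-encoded
latin1_form = '/a%ff'     # '/a\u00ff' encoded as ISO-8859-1, then %-encoded


def same_path(a, charset_a, b, charset_b):
    """Compare two %-encoded paths codepoint by codepoint.

    Assumes the charset each URI was encoded with is known; errors='strict'
    makes a wrong guess fail loudly instead of silently replacing bytes.
    """
    return (unquote(a, encoding=charset_a, errors='strict')
            == unquote(b, encoding=charset_b, errors='strict'))


print(utf8_form == latin1_form)                                  # False
print(same_path(utf8_form, 'utf-8', latin1_form, 'iso-8859-1'))  # True
```

A byte-wise router would treat these as two different resources; after decoding they are the same character string.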


Robert Brewer
fuman...@aminus.org



Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-21 Thread Robert Brewer
René Dudfield wrote:
> On Mon, Sep 21, 2009 at 6:05 PM, Robert Brewer 
> wrote:
> > Armin Ronacher wrote:
> >> WSGI will demand UTF-8 URLs and only
> >> provide iso-XXX support for backwards compatibility.
> >
> > WSGI cannot demand that; a recommendation for utf-8 in a few draft
> > specifications is at least a decade removed from ubiquitous
> > implementation. We can default to utf-8 at best. I discussed this at
> > length in
> > http://mail.python.org/pipermail/web-sig/2009-August/003948.html
> >
> 
> that post does have good arguments why "a single encoding is not
> acceptable".  utf-8 seems the most common at this point to be the
> default... but we do need a way to specify encoding.
> 
> Is that what you're saying Robert?  Do you have a suggestion for
> specifying encodings?

CherryPy 3.2 does this (pseudocode):

    try:
        decode_uri(userdefault or 'utf-8')
    except UnicodeDecodeError:
        decode_uri('iso-8859-1')
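A runnable version of that pseudocode, as a sketch only: `decode_path` is a hypothetical name, not CherryPy's API, and it assumes the input is the raw, still-%-encoded path bytes from the Request-Line. It returns the charset that succeeded so the server can record it.

```python
from urllib.parse import unquote_to_bytes


def decode_path(raw_path, userdefault=None):
    """Decode a %-encoded path (bytes) to str: configured charset or UTF-8
    first, then ISO-8859-1. Hypothetical helper, not CherryPy's code.

    Returns (text, charset) so the successful charset can be recorded.
    """
    raw = unquote_to_bytes(raw_path)
    charset = userdefault or 'utf-8'
    try:
        return raw.decode(charset), charset
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a code point, so this cannot fail.
        return raw.decode('iso-8859-1'), 'iso-8859-1'
```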

> I think surrogateescape will handle the issues with allowing bytes to
> be stored in utf-8.
> http://www.python.org/dev/peps/pep-0383/
> 
> However, I think that is only implemented in python 3.1?... but maybe
> there is someway to have it work on other pythons too?

As Henry Prêcheur says, "that's not an issue if the 'new' WSGI sticks to native 
strings." Which I'd be happy with.

> How about...
> 
> Being able to request which encoding you want has the benefit of only
> having to store one representation before 'baking' the result into the
> environ.  So if someone only ever wants utf-8 they can get it...
> however if they choose to 'bake' the environ then they can request
> something else.  This is similar to a per server setting, but I think
> should work with middleware too?

As noted above, it *is* a per-server setting in CherryPy 3.2. And any 
middleware can certainly be configured as its authors see fit; I don't see a 
need for a generic mechanism to specify what encodings middleware should try. 
However, we still need a generic mechanism declaring which encoding was 
successfully used; this is 'wsgi.uri_encoding'.

> As multiple things should be
> available, and if baked middleware (if it wants to modify things, will
> need to change each version of things).
> 
> These 'baking' methods could live in wsgi to simplify modifying the
> environ's multiple versions of things. It would just have some get/set
> functions to put correct handling of encodings in one place.  Of
> course middleware is still free to change things as it wants.

I still don't see why the environ should have multiple versions of anything. 
It's not as if the HTTP request gives us multiple Request-URI's. There's a 
single processing step that has to happen somewhere: decoding the bytes of the 
Request-URI to unicode. For the vast majority of apps, it should only happen 
once. Twice is acceptable to me for some apps. As I pointed out in the linked 
email, doing that as soon as possible (i.e. in the WSGI origin server) allows 
URI's to be compared as character strings more easily. If you deploy a piece of 
middleware that transcodes (based on more information than servers want to deal 
with), it had better be nearly first in the stack so routing works reliably.


Robert Brewer
fuman...@aminus.org




Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-21 Thread Robert Brewer
Armin Ronacher wrote:
> WSGI will demand UTF-8 URLs and only
> provide iso-XXX support for backwards compatibility.

WSGI cannot demand that; a recommendation for utf-8 in a few draft
specifications is at least a decade removed from ubiquitous
implementation. We can default to utf-8 at best. I discussed this at
length in
http://mail.python.org/pipermail/web-sig/2009-August/003948.html


Robert Brewer
fuman...@aminus.org



Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-21 Thread Robert Brewer
P.J. Eby wrote:
> At 07:57 AM 9/21/2009 +0200, Armin Ronacher wrote:
> >Chris McDonough schrieb:
> > > Personally, I find it a bit hard to get excited about Python 3 as a
> > > web application deployment platform.
> > Everybody feels that way currently.  But if we don't fix WSGI that
> > will never change.
> 
> This is only compounding the errors introduced by the "make the tests
> pass" philosophy of "porting" the stdlib.  We should not make them
> worse.
> 
> At the moment (AFAIK) nobody has gone through the web bits of the
> stdlib and asked, "Should this work on strings, bytes, or both, and
> if both, how should that API be expressed?"

Perhaps not, but I wrote unquote_bytes at PyCon 2009, after discussing
urllib in the python-dev room and being told no bytes-compatible version
was desired in the stdlib. So *some* thought has gone into it.


Robert Brewer
fuman...@aminus.org


Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-21 Thread Robert Brewer
And Clover wrote:
> > A middleware might re-decode the values if the `wsgi.uri_encoding`
> > is `iso-8859-1` and only then.
> 
> Seems like a mistake. If the middleware knows iso-8859-7 is in use, it
> would need to transcode the charset regardless of whether the
> initially-submitted bytes were a valid UTF-8 sequence or not. Otherwise
> the application would break when fed with eg. Greek words that happened
> to encode to valid UTF-8 bytes.

If the entire site expects iso-8859-7 Request-URI's then the deployer
should tell the WSGI server to decode using iso-8859-7 instead of utf-8.

If only part of the site expects iso-8859-7 then...yeah, it needs to
transcode. So what?

> > The application MUST use this value to decode the ``'QUERY_STRING'``
> > as well.
> 
> This will break all use of non-UTF-8 encodings in QUERY_STRING, where
> the path part of the URL does not contain non-UTF-8 sequences. That
> includes the very common case where the path part contains only ASCII.
> 
>  http://greek.example.com/myscript.cgi?x=%C2
> 
> will fail, as the given UTF-8 sniffer only looks at the path part to
> determine what encoding to use for both of the path part and the query
> string.

No, it won't fail. WSGI servers do not perform %-decoding of the
QUERY_STRING. In the example given, a WSGI 1.1 server will set the
Python 3 environ values:

{'SCRIPT_NAME': '',
 'PATH_INFO': '/myscript.cgi',
 'QUERY_STRING': 'x=%C2'}
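Because QUERY_STRING stays %-encoded, the application chooses the charset when it parses it. A sketch using the standard library's `urllib.parse.parse_qs`, assuming the Greek site knows it serves ISO-8859-7:

```python
from urllib.parse import parse_qs

environ = {'QUERY_STRING': 'x=%C2'}

# The application, not the server, picks the charset for %-decoding.
greek = parse_qs(environ['QUERY_STRING'], encoding='iso-8859-7')
assert greek == {'x': ['\u0392']}       # GREEK CAPITAL LETTER BETA

# parse_qs defaults to UTF-8 with errors='replace'; a lone 0xC2 byte is
# not valid UTF-8, so it would come back as U+FFFD instead.
fallback = parse_qs(environ['QUERY_STRING'])
assert fallback == {'x': ['\ufffd']}    # REPLACEMENT CHARACTER
```

Nothing the server did to PATH_INFO constrains this choice.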


Robert Brewer
fuman...@aminus.org



Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-20 Thread Robert Brewer
P.J. Eby wrote:
> At 03:06 PM 9/20/2009 +0200, Armin Ronacher wrote: 
> >The following things became pretty clear when playing around with
> >various specifications on Python 3:
> >
> >-  Python 3 no longer implicitly converts between unicode and byte
> >strings.  This covers comparisons, the regular expression engine,
> >all string functions and many modules in the stdlib.
> >-  The Python 3 stdlib radically moved to unicode for non unicode things
> >as well (the http servers, http clients, url handling etc.)
> >
> >-  A byte only version of WSGI appears unrealistic on Python 3 because
> >it would require server and middleware implementors to reimplement
> >parts of the standard library to work on bytes again.
> 
> IMO, this strongly suggests that it's the stdlib or Python 3 that's
> broken here.  How much of the stdlib are we talking about needing to
> reimplement, aside from cgi.FieldStorage?

urllib.unquote, for one. We had to make a version which accepts bytes
(and outputs bytes). But it's only 8 lines of code.
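For illustration, a bytes-in, bytes-out unquote in roughly that many lines. This is a hypothetical reconstruction, not the CherryPy code; current Python 3 also ships `urllib.parse.unquote_to_bytes`, which does the same job.

```python
import re

# One %XX escape: '%' followed by exactly two hex digits (bytes pattern).
_ESCAPE = re.compile(rb'%([0-9A-Fa-f]{2})')


def unquote_bytes(path):
    """%-decode a bytes path and return bytes; no charset is involved.

    Malformed escapes such as b'%zz' or a trailing b'%' are left as-is.
    """
    return _ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), path)
```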


Robert Brewer
fuman...@aminus.org



Re: [Web-SIG] PEP 0333 and PEP XXXX Updated

2009-09-20 Thread Robert Brewer
Graham Dumpleton wrote:
> Looking at the bigger picture, there are three overall goals that I
> can see that we would want to address.
> 
> 1. Clarifications and corrections to existing WSGI for Python 2.X to
> allow readline() with size hint, mandatory end of stream sentinel for
> wsgi.input, support for chunked request content and rules on amount of
> data that should be returned by WSGI applications and how much data
> wsgi.file_wrapper should send back from a file when Content-Length is
> defined. These were the points (11) to (16) that I tacked onto my
> definition #4, in my blog post. They are applicable though to any
> update to WSGI for any version of Python.
> 
> 2. Come up with a version of WSGI for Python 3.X. The whole bytes
> versus unicode discussion.
> 
> 3. Drop the start_response() function and ability to use its write()
> function returned as result. What people have been calling WSGI 2.0.

My goals, in priority order, for the next version(s) of WSGI:

1. Full unicode (not just \x00-\xFF) in Python 3 for the environ keys and
most values (not wsgi.input, for example).
2. Points 11-16 as you described.
3. The ability to upgrade a WSGI1.0/CPython2.x application to CPython3
using 2to3, minimizing ancillary changes, even if that means requiring
an upgrade to the WSGI version in the process.
4. Minimize the special cases in any new spec. Note this is at the
lowest priority.

> To go along with that, there are a couple major questions I think
> need to be answered and this will dictate to a degree what any
> roadmap will be.
> 
> The first question is, should Python 2.X forever be bytes everywhere,
> or if we start introducing unicode into parts of the definition for
> Python 3.X, should those versions of the WSGI specification map those
> unicode parts back in to the Python 2.X of an equivalent version of
> the specification?

CherryPy 3.x on Python 2.x will always use bytes everywhere, as we have
always done. So I understand completely if Django, Pylons, etc have
"always used" unicode and want to keep doing that. If y'all decide to
make a version of WSGI which requires unicode because you think it's
easier or more popular, no problem--CherryPy 3.2+ on Python 2 will just
convert back to bytes before handing off that data to CherryPy apps.
This is one reason why a new "wsgi.uri_encoding" entry would be required
if SCRIPT_NAME/PATH_INFO/QUERY_STRING become unicode.

> In my definitions I introduced 'native' string along with 'bytes' and
> 'unicode' string in an attempt to try and be able to use one set of
> language which would describe WSGI and be interpretable in the context
> of both Python 2.X and Python 3.X.
> 
> For definition #4, this meant defining SCRIPT_NAME, PATH_INFO and
> QUERY_STRING as 'unicode' string. This meant that for Python 2.X, they
> would as such also be unicode string. The other option was to define
> them as 'native' string, which means the whole 'wsgi.uri_encoding'
> flag was only relevant to Python 3.X, as in Python 2.X the native
> string is 'bytes' and so the whole encoding issue would still be up to
> the WSGI application as it is now for bytes everywhere WSGI in Python
> 2.X. In effect, if they were 'native' strings and 'wsgi.uri_encoding'
> went away, we just have existing WSGI 1.0. The only actual difference
> was that I was adding on top of definition #4 the clarifications as
> per (1) above.

I'd be happy if WSGI 1.1 said "use native" and the "wsgi.uri_encoding"
entry was only required on versions of Python where the native string
type is unicode. That's an extra paragraph in the spec, yes, so violates
my goal 4 a bit, but IMO should not outweigh my goals 1, 2, and 3.

> The second question is, do we want to try and come up with something
> for Python 3.X, ie., (2) above, while still preserving the current
> start_response() callback, or do we instead want to jump direct to
> WSGI (Python 3.X) 2.0, ie., combine (2) and (3) above, and say that
> there is no WSGI 1.X for Python 3.X at all?

I want something in between so I don't have to wait months or years for
WSGI 2. I want to ship a version of CherryPy with Python 3 support last
week.


Robert Brewer
fuman...@aminus.org



Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-20 Thread Robert Brewer
Armin Ronacher wrote:
> Thanks to Graham Dumpleton and Robert Brewer there is some serious
> progress on WSGI currently.  I proposed a roadmap with some PEP changes
> now that need some input.
> 
> Summary:
> 
>   WSGI 1.0   stays the same as PEP 0333 currently is
>   WSGI 1.1   becomes what Ian and I added to PEP 0333
>   WSGI 2.0   becomes a unicode powered version of WSGI 1.1
>   WSGI 3.0   becomes WSGI 2.0 just without start_response
> 
>   WSGI 1.0 and 1.1 are byte based and nearly impossible to use on Python
>   3 because of changes in the standard library that no longer work with
>   a byte-only approach.
> 
> 
> The PEPs themselves are here: http://bitbucket.org/ianb/wsgi-peps/
> Neither the wording nor the changes in there are anywhere near final.
> 
> 
> Graham wrote down two questions he wants every major framework developer
> to answer.  These should guide the way to new WSGI standards:
> 
> 1. Do we keep bytes everywhere forever in Python 2.X, or try to
>    introduce unicode there at all to at least mirror what changes might
>    be made to make WSGI workable in Python 3.X?

I'm happy either way, since CherryPy abstracts it all away. Decide
already and I'll implement it.

> 2. Do we skip WSGI 1.X completely for Python 3.X and go straight to
>    WSGI 2.0 for Python 3.X?

+1 for skipping straight to unicode in Python 3. But call it "1.1" not
"2.0".

> I added a new question I think should be asked too:
> 
> 3. Do we skip WSGI 2.0 as specified in the PEP and go straight to
>    WSGI 3.0 and drop start_response?

No. We need more time to discuss and try to implement the large
architectural changes in that. I need to ship CP 3.2 soon and would like
it to have a better Python 3 story than the "bytes-everywhere" (or
"unicode pretending to be bytes") of WSGI 1.0. We have working code,
which uses unicode in Python 3. Maybe I'll call it "wsgi.version = (1,
'cp32')" and let the spec come later if we can't see the trees for the
forest.


Robert Brewer
fuman...@aminus.org


Re: [Web-SIG] String Types in WSGI [Graham's WSGI for py3]

2009-09-19 Thread Robert Brewer
René Dudfield wrote:
> No, slash encoding and normalising are not the only issues.
> As mentioned before sometimes you need the exact bytes.
>
> 1. buggy clients.  If a client sends something that doesn't work
> correctly, you can still sometimes make sense of it in the raw version
> of the url.
> 2. client APIs that require the server to know the exact url.
> 3. buggy servers that don't do their job properly.
> 4. extensibility.  A url scheme changes a tiny bit, and you want to
> support the change.  Having the raw url allows you to support it
> on old servers.
>
> In all APIs it's handy to go to lower levels when the higher levels
> don't work right.  Especially when wsgi only handles one side of
> things, and urls can be generated by anything.

and Graham Dumpleton replied:
> This is where it all comes down to me not having the real-world
> experience in writing web applications to know best.
> 
> What I would like is to hear from PJE (who tends towards #3) and Robert
> Brewer (who tends towards #4). Can you guys give counter explanations
> as to why their arguments for bytes aren't valid? Ian, I don't think
> you have yet expressed your leaning, but would like to hear your point
> as well.

No; in fact, I agree that REQUEST_URI should be mandated as bytes. IIRC, I'm 
the one who proposed it ;)


Robert Brewer
fuman...@aminus.org


Re: [Web-SIG] WSGI 2: Decoding the Request-URI

2009-08-17 Thread Robert Brewer
I wrote:
> Applications do produce URI's (and IRI's, etc. that need to be
> converted into URI's) and do transfer them in media types like
> HTML, which define how to encode a.href's and form.action's
> before %-encoding them [4]. But these are not the only vectors
> by which clients obtain or generate Request-URI's.
> ...
> As someone (Alan Kennedy?) noted at PyCon, static resources may
> depend upon a filename encoding defined by the OS which is
> different than that of the rest of the URI's generated/understood
> by even the most coherent application.
> ...
> "In practical terms, character-by-character comparisons should be
> done codepoint-by-codepoint after conversion to a common character
> encoding." In other words, the URI spec seems to imply that the
> two URI's "/a%c3%bf" and "/a%ff" may be equivalent, if the former
> is u"/a\u00FF" encoded in UTF-8 and the latter is u"/a\u00FF"
> encoded in ISO-8859-1. Note that WSGI 1.0 cannot speak about
> this, since all environ values must be byte strings. IMO WSGI
> 2 should do better in this regard.
> ...
> For the three reasons above, I don't think we can assume that the
> application will always receive equivalent URI's encoded in a
> single, foreseen encoding.

Did I say 3 reasons? I meant 4: Accept-Charset.


Robert Brewer
fuman...@aminus.org


[Web-SIG] WSGI 2: Decoding the Request-URI

2009-08-16 Thread Robert Brewer
ed, since the application really does care about 
bytes and not characters. Falling back to ISO-8859-1 (and minting a new WSGI 
environ entry to declare the charset which was used to decode) can handle all 
of these cases. Server configuration options cannot, at least not without their 
specification becoming unwieldy.


Robert Brewer
fuman...@aminus.org

[1] http://markmail.org/message/r6qzszybsk5pwzbt
[2] http://markmail.org/message/47cekkpvdjaectvi
[3] http://markmail.org/message/3bsxo7q6eztcp3yo
[4] http://www.w3.org/TR/html4/interact/forms.html#idx-character_encoding
[5] http://tools.ietf.org/html/rfc3986#section-6


Re: [Web-SIG] WSGI 2

2009-08-11 Thread Robert Brewer
Graham Dumpleton wrote:
> So, for WSGI 1.0 style of interface and Python 3.0, the following is
> what I was going to implement.

FWIW, I'll answer with what we've implemented for CherryPy 3.2.

> 1. When running under Python 3, applications SHOULD produce bytes
> output, status line and headers.

Yup.

> 2. When running under Python 3, servers and gateways MUST accept
> strings for output, status line and headers. Such strings must be
> converted to bytes output using 'latin-1'. If string cannot be
> converted then is treated as an error.

Yes.
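A sketch of point 2 on the server side, assuming Python 3: accept bytes (point 1) or str for the status line and headers, encode str with ISO-8859-1, and treat anything unencodable as an error. `to_wire` is a hypothetical helper name.

```python
def to_wire(status, headers):
    """Return (status_bytes, [header_line_bytes, ...]) for the response.

    str values are encoded as ISO-8859-1; bytes pass through unchanged.
    A str that ISO-8859-1 cannot represent raises ValueError.
    """
    def as_bytes(value):
        if isinstance(value, bytes):
            return value
        try:
            return value.encode('latin-1')
        except UnicodeEncodeError:
            raise ValueError('not latin-1 encodable: %r' % (value,)) from None

    status_line = as_bytes(status)
    header_lines = [as_bytes(name) + b': ' + as_bytes(value)
                    for name, value in headers]
    return status_line, header_lines
```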

> 3. When running under Python 3, servers MUST provide wsgi.input as a
> binary (byte) input stream.

Boy howdy.

> 4. When running under Python 3, servers MUST provide a text stream for
> wsgi.errors. In converting this to a byte stream for writing to a
> file, the default encoding would be applied.

I'll look into it.
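One way a server might satisfy point 4, sketched with the standard `io.TextIOWrapper`: wrap the binary log stream so applications write str and the wrapper applies the locale's preferred encoding. The function name and the errors/newline choices are assumptions, not part of any spec.

```python
import io
import locale


def make_errors_stream(raw_binary):
    """Return a text stream over a binary log stream, for wsgi.errors.

    Writes of str are encoded with the locale's preferred encoding,
    following the "default encoding" wording above.
    """
    return io.TextIOWrapper(
        raw_binary,
        encoding=locale.getpreferredencoding(False),
        errors='backslashreplace',  # logging should never raise on odd text
        newline='\n',               # write '\n' as-is, no os.linesep translation
        line_buffering=True,        # flush to the binary stream on each newline
    )
```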

> 5. When running under Python 3, servers MUST provide CGI HTTP and
> server variables as strings. Where such values are sourced from a byte
> string, be that a Python byte string or C string, they should be
> converted as 'UTF-8'. If a specific web server infrastructure is able
> to support different encodings, then the WSGI adapter MAY provide a
> way for a user of the WSGI adapter to customise on a global basis, or
> on a per value basis what encoding is used, but this is entirely
> optional. Note that there is no requirement to deal with RFC 2047.

We're passing unicode for almost everything.

REQUEST_METHOD and wsgi.url_scheme are parsed from the Request-Line, and must 
be ascii-decodable. So are SERVER_PROTOCOL and our custom 
ACTUAL_SERVER_PROTOCOL entries.

The original bytes of the Request-URI are stored in REQUEST_URI. However, 
PATH_INFO and QUERY_STRING are parsed from it, and decoded via a configurable 
charset, defaulting to UTF-8. If the path cannot be decoded with that charset, 
ISO-8859-1 is tried. Whichever is successful is stored at 
environ['REQUEST_URI_ENCODING'] so middleware and apps can transcode if needed. 
Our origin server always sets SCRIPT_NAME to '', but if we populated it, we 
would make it decoded by the same charset.

All request headers are decoded via ISO-8859-1, which can't fail. Applications 
are expected to transcode these values if they believe them to be in another 
encoding.
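The layout above, sketched as a Python 3 function. This is not CherryPy's code: `build_environ` and its `uri_charset` parameter are hypothetical, absolute-form Request-URIs and wsgi.url_scheme are ignored, and header parsing is assumed to have happened already.

```python
from urllib.parse import unquote_to_bytes


def build_environ(request_line, raw_headers, uri_charset='utf-8'):
    """Sketch of the environ layout described above (not CherryPy's code).

    request_line: e.g. b'GET /caf%C3%A9?q=1 HTTP/1.1'
    raw_headers: list of (name, value) byte-string pairs
    uri_charset: hypothetical name for the configurable charset
    """
    method, uri, protocol = request_line.split(b' ', 2)
    path, _, query = uri.partition(b'?')
    raw_path = unquote_to_bytes(path)
    try:
        path_info, used = raw_path.decode(uri_charset), uri_charset
    except UnicodeDecodeError:
        path_info, used = raw_path.decode('iso-8859-1'), 'iso-8859-1'

    environ = {
        'REQUEST_METHOD': method.decode('ascii'),
        'SERVER_PROTOCOL': protocol.decode('ascii'),
        'REQUEST_URI': uri,                 # original bytes, untouched
        'SCRIPT_NAME': '',                  # origin server: always ''
        'PATH_INFO': path_info,
        # Left %-encoded; assumes a compliant client sent only ASCII here.
        'QUERY_STRING': query.decode(used),
        'REQUEST_URI_ENCODING': used,       # lets components transcode
    }
    for name, value in raw_headers:
        key = 'HTTP_' + name.decode('ascii').upper().replace('-', '_')
        if key in ('HTTP_CONTENT_TYPE', 'HTTP_CONTENT_LENGTH'):
            key = key[len('HTTP_'):]        # CGI convention: no HTTP_ prefix
        environ[key] = value.decode('iso-8859-1')  # ISO-8859-1 cannot fail
    return environ
```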

> This is where I am going to diverge from what has been discussed before.
> 
> The reason I am going to pass as UTF-8 and not latin-1 is that it
> looks like Apache effectively only supports use of UTF-8. Since this
> means that mod_wsgi, and Apache modules for FASTCGI, SCGI, AJP and
> even CGI likely cannot handle anything besides UTF-8 then I really
> can't see the point of trying to cater for a theoretical possibility
> that some HTTP client could use something besides UTF-8. In other
> words, the predominant case will be UTF-8, so let us target that.

That is predominant for the Request-URI, and we are defaulting to utf-8 for 
that as I mentioned above. I believe I demonstrated in 
http://mail.python.org/pipermail/web-sig/2009-April/003755.html that UTF-8 
cannot be the predominant encoding for request headers, which are instead 
mostly ASCII with a few ISO-8859-1's, which is why we are defaulting to 
ISO-8859-1.

> So, rather than burden every WSGI application with the need to convert
> from latin-1 back to bytes and then to UTF-8, let the server deal with
> it, with server using sensible default, and where server
> infrastructure can handle a different encoding, then it can provide
> option to use that encoding and WSGI application doesn't need to
> change.

If there are indeed more headers which are ISO-8859-1, then that same argument 
cuts both ways.

I have no problem doing the same thing here as we do for PATH_INFO: a 
configurable charset, or better yet a list of charsets to try in order, with a 
sensible default, even UTF-8 would be fine. Regardless of the default, if it is 
configurable, then the successful encoding should be put in a canonical environ 
entry so apps can transcode it if the server got it wrong.

Re: bytes. We really do not want the server to set any of the above environ
entries (except REQUEST_URI) to bytes. I'm surprised those of you who have 
substantial numbers of WSGI middleware aren't fighting this; it would mean 
decoding the same environ entries every time you switched middleware providers. 
Some of you said as much at PyCon: 
http://mail.python.org/pipermail/web-sig/2009-March/003701.html


Robert Brewer
fuman...@aminus.org
___
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: 
http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com


Re: [Web-SIG] WSGI 2

2009-08-04 Thread Robert Brewer
James Bennett wrote:
> On Tue, Aug 4, 2009 at 11:54 AM, James Y Knight wrote:
>> But that works just fine today. Your WSGI app sends streaming data back
>> using the iterator functionality, and the server automatically turns it into
>> chunks if it's talking to an HTTP 1.1 client. What's the problem?
> 
> No, it doesn't work just fine today. Either the server has to assume
> that every response from that application should be chunked (which is
> wrong), or the application needs a way to tell the server to chunk.
> Turns out HTTP has a way to indicate that, but WSGI outright forbids
> its use. So instead you have to invent out-of-band mechanisms for the
> application to tell the server what to do, and in the process reinvent
> part of HTTP.

It doesn't have to be out of band; CherryPy's wsgiserver will send a response 
chunked if the application provides no Content-Length response header.

    if status == 413:
        # Request Entity Too Large. Close conn to avoid garbage.
        self.close_connection = True
    elif "content-length" not in hkeys:
        # "All 1xx (informational), 204 (no content),
        # and 304 (not modified) responses MUST NOT
        # include a message-body." So no point chunking.
        if status < 200 or status in (204, 205, 304):
            pass
        else:
            if self.response_protocol == 'HTTP/1.1':
                # Use the chunked transfer-coding
                self.chunked_write = True
                self.outheaders.append(("Transfer-Encoding", "chunked"))
            else:
                # Closing the conn is the only way to determine len.
                self.close_connection = True


Robert Brewer
fuman...@aminus.org


Re: [Web-SIG] Trac-like Query Builder

2009-06-30 Thread Robert Brewer
Gustavo Narea wrote:
> Randy said:
> > Does anyone know of a Python package that would provide query building
> > functionality like Trac has when doing a custom query on the tickets?
> 
> Are you talking about a front-end that allows end-users "assemble" the
> query, or a back-end that turns user-provided data into a result filter
> (e.g., SQL `WHERE` clause)?
> 
> If it's the later, you may want to see this:
> https://launchpad.net/booleano
> Unfortunately it's a rather new piece of software and should get the first
> alpha/usable release this week.

Heck, if all you want is the back end, Geniusql (and therefore Dejavu) allows 
you to build comparison expressions and combine them with "+", "&", and "|":

>>> from geniusql import logic
>>> a = logic.comparison('Name', 6, ['Dave', 'Jerry', 'Sue'])
>>> a
logic.Expression(lambda x: x.Name in ['Dave', 'Jerry', 'Sue'])
>>> b = logic.comparison('Size', 2, 30)
>>> b
logic.Expression(lambda x: x.Size == 30)

>>> a + b
logic.Expression(lambda x: (x.Name in ['Dave', 'Jerry', 'Sue']) and (x.Size == 30))
>>> a & b
logic.Expression(lambda x: (x.Name in ['Dave', 'Jerry', 'Sue']) and (x.Size == 30))
>>> a | b
logic.Expression(lambda x: (x.Name in ['Dave', 'Jerry', 'Sue']) or (x.Size == 30))

>>> c = logic.Expression(lambda g, h: g.Name == h.Name)
>>> c
logic.Expression(lambda g, h: g.Name == h.Name)
>>> a + c
logic.Expression(lambda x, h: (x.Name in ['Dave', 'Jerry', 'Sue']) and (x.Name == h.Name))

You then pass those to the storage layer where they are automatically converted 
to backend-specific SQL etc. Been runnin' like a champ for years, not days. [1]
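To show how composable expressions like these can work, here is a much-simplified, hypothetical predicate class. It overloads `&`, `|` and `+` and renders itself as a parameterized SQL `WHERE` fragment. None of these names are Geniusql's API. It is only a sketch of the general technique.

```python
class Expr:
    """A tiny predicate usable in Python or as SQL (hypothetical, not Geniusql)."""

    def __init__(self, sql, func, params=()):
        self.sql = sql            # SQL fragment using ? placeholders
        self.func = func          # Python predicate over a row-like object
        self.params = tuple(params)

    def __and__(self, other):
        return Expr("(%s) AND (%s)" % (self.sql, other.sql),
                    lambda x: self.func(x) and other.func(x),
                    self.params + other.params)

    __add__ = __and__  # mirror the "+" spelling shown above

    def __or__(self, other):
        return Expr("(%s) OR (%s)" % (self.sql, other.sql),
                    lambda x: self.func(x) or other.func(x),
                    self.params + other.params)


def equals(column, value):
    return Expr("%s = ?" % column, lambda x: getattr(x, column) == value, [value])


def one_of(column, values):
    marks = ", ".join("?" for _ in values)
    return Expr("%s IN (%s)" % (column, marks),
                lambda x: getattr(x, column) in values, values)
```

The resulting `sql` and `params` could then be passed to a DB-API cursor that uses `?` placeholders (sqlite3, for example). A call would look like `cursor.execute("SELECT * FROM t WHERE " + expr.sql, expr.params)`.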


Robert Brewer
fuman...@aminus.org

[1]  OK, I did have to commit a patch just now for comparison() in py2.5+--but 
it's a short one ;)
http://www.aminus.net/geniusql/changeset/280


Re: [Web-SIG] py3k, cgi, email, and form-data

2009-05-12 Thread Robert Brewer
Graham Dumpleton wrote:
> 2009/5/12 Robert Brewer :
> > There's a major change in functionality in the cgi module between
> > Python 2 and Python 3 which I've just run across: the behavior of
> > FieldStorage.read_multi, specifically when an HTTP app accepts a file
> > upload within a multipart/form-data payload.
> >
> > In Python 2, each part would be read in sequence within its own
> > FieldStorage instance. This allowed file uploads to be shunted to a
> > TemporaryFile (via make_file) as needed:
> >
> >     klass = self.FieldStorageClass or self.__class__
> >     part = klass(self.fp, {}, ib,
> >                  environ, keep_blank_values, strict_parsing)
> >     # Throw first part away
> >     while not part.done:
> >         headers = rfc822.Message(self.fp)
> >         part = klass(self.fp, headers, ib,
> >                      environ, keep_blank_values, strict_parsing)
> >         self.list.append(part)
> >
> > In Python 3 (svn revision 72466), the whole request body is read into
> > memory first via fp.read(), and then broken into separate parts in a
> > second step:
> >
> >     klass = self.FieldStorageClass or self.__class__
> >     parser = email.parser.FeedParser()
> >     # Create bogus content-type header for proper multipart parsing
> >     parser.feed('Content-Type: %s; boundary=%s\r\n\r\n' % (self.type, ib))
> >     parser.feed(self.fp.read())
> >     full_msg = parser.close()
> >     # Get subparts
> >     msgs = full_msg.get_payload()
> >     for msg in msgs:
> >         fp = StringIO(msg.get_payload())
> >         part = klass(fp, msg, ib, environ, keep_blank_values,
> >                      strict_parsing)
> >         self.list.append(part)
> >
> > This makes the cgi module in Python 3 somewhat crippled for handling
> > multipart/form-data file uploads of any significant size (and since
> > the client is the one determining the size, opens a server up for an
> > unexpected Denial of Service vector).
> >
> > I *think* the FeedParser is designed to accept incremental writes,
> > but I haven't yet found a way to do any kind of incremental reads
> > from it in order to shunt the fp.read out to a tempfile again.
> > I'm secretly hoping Barry has a one-liner fix for this. ;)
> 
> FWIW, Werkzeug gave up on 'cgi' module for form parsing and implements
> its own.
> 
> Not sure whether this issue in Python 3.0 was one of the reasons or
> not. I know one of the reasons was because cgi.FieldStorage is not
> WSGI 1.0 compliant. One of the main reasons that no one actually
> adheres to WSGI 1.0 is because of the 'cgi' module. This still hasn't
> been addressed by a proper amendment to WSGI 1.0 specification or a
> new WSGI 1.1 specification to allow a hint to readline().
> 
> The Werkzeug form processing module is properly WSGI 1.0 compliant,
> meaning that Werkzeug is possibly the only major WSGI framework to be
> WSGI compliant.

FWIW, I just added a replacement for the cgi module to CherryPy over the 
weekend for the same reasons. It's in the python3 branch but will get 
backported to CherryPy 3.2 for Python 2.x.


Robert Brewer
fuman...@aminus.org


[Web-SIG] py3k, cgi, email, and form-data

2009-05-11 Thread Robert Brewer
There's a major change in functionality in the cgi module between Python
2 and Python 3 which I've just run across: the behavior of
FieldStorage.read_multi, specifically when an HTTP app accepts a file
upload within a multipart/form-data payload.

In Python 2, each part would be read in sequence within its own
FieldStorage instance. This allowed file uploads to be shunted to a
TemporaryFile (via make_file) as needed:

    klass = self.FieldStorageClass or self.__class__
    part = klass(self.fp, {}, ib,
                 environ, keep_blank_values, strict_parsing)
    # Throw first part away
    while not part.done:
        headers = rfc822.Message(self.fp)
        part = klass(self.fp, headers, ib,
                     environ, keep_blank_values, strict_parsing)
        self.list.append(part)

In Python 3 (svn revision 72466), the whole request body is read into
memory first via fp.read(), and then broken into separate parts in a
second step:

    klass = self.FieldStorageClass or self.__class__
    parser = email.parser.FeedParser()
    # Create bogus content-type header for proper multipart parsing
    parser.feed('Content-Type: %s; boundary=%s\r\n\r\n' % (self.type, ib))
    parser.feed(self.fp.read())
    full_msg = parser.close()
    # Get subparts
    msgs = full_msg.get_payload()
    for msg in msgs:
        fp = StringIO(msg.get_payload())
        part = klass(fp, msg, ib, environ, keep_blank_values,
                     strict_parsing)
        self.list.append(part)

This makes the cgi module in Python 3 somewhat crippled for handling
multipart/form-data file uploads of any significant size (and since
the client is the one determining the size, opens a server up for an
unexpected Denial of Service vector).

I *think* the FeedParser is designed to accept incremental writes,
but I haven't yet found a way to do any kind of incremental reads
from it in order to shunt the fp.read out to a tempfile again.
I'm secretly hoping Barry has a one-liner fix for this. ;)
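One way around an unbounded `fp.read()` is to copy the body in fixed-size reads into a `tempfile.SpooledTemporaryFile`. That file stays in memory while small and rolls over to disk as it grows. Bodies above a cap are refused. This is a sketch under those assumptions: `spool_body`, `RequestTooLarge` and the size limits are hypothetical. A multipart parser would still need to read parts from the spooled file incrementally.

```python
import tempfile


class RequestTooLarge(Exception):
    """Raised when the declared body size exceeds the configured cap."""


def spool_body(fp, content_length, max_size=10 * 1024 * 1024,
               chunk_size=64 * 1024, memory_limit=512 * 1024):
    """Copy a request body into a SpooledTemporaryFile using bounded reads.

    Memory stays near chunk_size + memory_limit regardless of body size;
    data beyond memory_limit rolls over to a real temporary file.
    """
    if content_length > max_size:
        raise RequestTooLarge(content_length)
    spool = tempfile.SpooledTemporaryFile(max_size=memory_limit)
    remaining = content_length
    while remaining > 0:
        data = fp.read(min(chunk_size, remaining))
        if not data:
            break  # client sent less than it promised
        spool.write(data)
        remaining -= len(data)
    spool.seek(0)
    return spool
```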


Robert Brewer
fuman...@aminus.org


Re: [Web-SIG] Python 3.0 and WSGI 1.0.

2009-05-08 Thread Robert Brewer
P.J. Eby wrote:
> At 08:07 AM 5/8/2009 -0700, Robert Brewer wrote:
>> I decided that that single type should be byte strings because I want
>> WSGI middleware and applications to be able to choose what encoding
>> their output is. Passing unicode to the server would require some
>> out-of-band method of telling the server which encoding to use per
>> response, which seemed unacceptable.
> 
> I find the above baffling, since PEP 333 explicitly states that
> when using unicode types, they're not actually supposed to *be*
> unicode -- they're just bytes decoded with latin-1.

It also explicitly states that "HTTP does not directly support Unicode,
and neither does this interface. All encoding/decoding must be handled
by the application; all strings passed to or from the server must be
standard Python BYTE STRINGS (emphasis mine), not Unicode objects. The
result of using a Unicode object where a string object is required, is
undefined."

PEP 333 is difficult to interpret because it uses the name "str"
synonymously with the concept "byte string", which Python 3000 defies. I
believe the intent was to differentiate unicode from bytes, not elevate
whatever type happens to be called "str" on your Python du jour. It was
and is a mistake to standardize on type names ("str") across platforms
and not on type behavior ("byte string").

If Python3 WSGI apps emit unicode strings (py3k type 'str'), you're
effectively saying the server will always call
"chunk.encode('latin-1')". That negates any benefit of using unicode as
the type for the response. That's not "supporting unicode"; that's using
unicode exactly as if it were an opaque byte string. That seems silly
to me when there is a perfectly useful byte string type.

> So, the server doesn't need to know "what encoding to use" -- it's
> latin-1, plain and simple.  (And it's an error for an application to
> produce a unicode string that can't be encoded as latin-1.)
>
> To be even more specific: an application that produces strings can
> "choose what encoding to use" by encoding in it, then decoding those
> bytes via latin-1.  (This is more or less what Jython and IronPython
> users are doing already, I believe.)

That may make sense for Jython and IronPython if they truly do not have
a usable byte string type. But it doesn't make as much sense for Python3
which has a usable byte string type. My way:

App                                 Server
---                                 ------
bchunk = uchunk.encode('utf-8')
yield bchunk
                                    write(bchunk)

Your way:

App                                 Server
---                                 ------
bchunk = uchunk.encode('utf-8')
uchunk = bchunk.decode('latin-1')
yield uchunk
                                    bchunk = uchunk.encode('latin-1')
                                    write(bchunk)

I don't see any benefit to that.
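The two pipelines put identical bytes on the wire, because the latin-1 leg is a lossless no-op. A small sketch that checks this (the function names are illustrative):

```python
def app_way(text):
    # Application encodes once; the server writes the bytes unchanged.
    return text.encode("utf-8")


def latin1_way(text):
    # Application encodes, then disguises the bytes as a str via latin-1...
    uchunk = text.encode("utf-8").decode("latin-1")
    # ...and the server undoes the disguise before writing.
    return uchunk.encode("latin-1")
```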


Robert Brewer
fuman...@aminus.org


Re: [Web-SIG] Python 3.0 and WSGI 1.0.

2009-05-08 Thread Robert Brewer
Graham Dumpleton wrote:
> Robert, do you have any comments on the restricting of response
> content to bytes and not allow fallback to conversion per latin-1?
> 
> I heard that in CherryPy WSGI server you are only allowing bytes. What
> is your rationale for that at the moment?


In Python 2.x, one could easily mix unicode strings and byte strings in
the same interface, because they mostly supported the same operations.
Not so in Python 3.x--byte strings are missing everything from
capitalize() to zfill() [1]. I feel that choosing one type or the other
is required in order to avoid mountains of if-statements in middleware
(and lots of 'pass' statements if bytes are found).

I decided that that single type should be byte strings because I want
WSGI middleware and applications to be able to choose what encoding
their output is. Passing unicode to the server would require some
out-of-band method of telling the server which encoding to use per
response, which seemed unacceptable.

The down side, already alluded to, is that middleware cannot then call
e.g. response.capitalize() or any of a number of other methods without
first decoding the response. And it cannot do that reliably unless
(again) the encoding which was used to produce bytes is communicated
down the stack out of band.

The python3 branch of CherryPy is by no means complete. I'd be happy to
explore emitting unicode if we could decide on a method whereby apps
could inform the server which encoding they want. Middleware which
transcoded the response would need a means of overriding that. But of
course, that opens a whole new can of worms if something goes wrong,
because application authors want control over the error response; if the
server is encoding the response, and an error occurs, there would have
to be a way to pass control back up the stack to...what? whichever
component last set the encoding? That road starts to get complicated
very quickly.

If some middleware needs to treat the response as unicode, I'd rather
emit bytes and somehow return the encoding as part of the response.
Perhaps WSGI 2's mythical "return (status, headers, body-iterable,
encoding)". Middleware could then decode/transcode as desired. I can't
think of a downside to that, other than some lost cycles spent
de/encoding, but perhaps there are some I don't yet foresee.
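Here is a sketch of how middleware might use such a returned encoding. The four-item return value is the hypothetical WSGI 2 shape mentioned above, not anything a current spec defines. `app` and `capitalize_middleware` are illustrative names.

```python
def app(environ):
    """Hypothetical WSGI-2-style app: returns (status, headers, body, encoding)."""
    body = ["Hello, ", "w\u00f6rld"]
    return ("200 OK", [("Content-Type", "text/plain; charset=utf-8")],
            [chunk.encode("utf-8") for chunk in body], "utf-8")


def capitalize_middleware(wrapped_app):
    """Decode with the declared encoding, transform as text, re-encode."""
    def wrapper(environ):
        status, headers, body, encoding = wrapped_app(environ)
        text = b"".join(body).decode(encoding)
        return status, headers, [text.upper().encode(encoding)], encoding
    return wrapper
```

A middleware that re-encoded to a different charset would also need to update the `charset` parameter of `Content-Type`.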


Robert Brewer
fuman...@aminus.org

[1] See http://docs.python.org/dev/py3k/library/stdtypes.html#string-methods


Re: [Web-SIG] Python 3.0 and WSGI 1.0.

2009-05-05 Thread Robert Brewer
Graham Dumpleton wrote:
> 2009/5/5 Armin Ronacher :
>> Graham Dumpleton wrote:
>>> I can't see but have choice but to pass such settings through as
>>> strings, else more than likely would cause problems for applications.
>>> Problem is it isn't clear what encoding stuff can be in Apache
>>> configuration. At the moment latin-1 is assumed.
>> Because those information does not have a specified encoding I can see
>> nothing wrong with it passing that information as bytestrings.  I would
>> have no problem passing *all* values as bytestrings.
> 
> At what point does that become an inconvenience though? I guess that
> is my concern, because if one has to do too many manual conversions in
> an application, people will start to complain it becomes unwieldy to
> use. In other words, you make it easier or more logical for
> frameworks, but do you end up putting more burden on applications for
> stuff outside those core values.
> 
> So, for those core CGI values which the framework is going to modify
> even before an application sees them, then fine. Is the framework also
> going to set the rules as to what encoding is used for other values in
> the WSGI environment and convert them per that encoding when an
> application requests them, or is the application always going to have
> to deal with them as bytes?
> 
> As I keep saying, you guys who write the frameworks and applications
> are going to know better than I, I am just challenging the notions as
> a way of making people think about it so the end result is what is the
> most logical thing to do. ;-)

In short: it's pretty easy for a framework to default to utf-8 for 
everything, yet give application developers ways to override that. See, 
for example, the cherrypy.tools.encoding Tool in our python3 
branch--it's moved from running "sometime" after the page handler, to 
wrapping the page handler so all page handlers emit bytes. That makes it 
possible for everyone to use unicode strings everywhere, yet still allow 
some to specify exact bytes as necessary. In shorter: don't worry about 
that part, we've got it covered. ;)
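The idea of wrapping the page handler so that everything leaving it is bytes can be sketched as a decorator. This is a simplified, hypothetical illustration, not the code of `cherrypy.tools.encoding`. Handlers here return a single string, whereas real ones may return iterables of chunks.

```python
import functools


def encode_output(encoding="utf-8"):
    """Wrap a page handler so its return value always leaves as bytes.

    str results are encoded with `encoding`; bytes results pass through
    untouched, so a handler can still emit exact bytes when it needs to.
    """
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(*args, **kwargs):
            result = handler(*args, **kwargs)
            if isinstance(result, bytes):
                return result  # handler chose exact bytes itself
            return result.encode(encoding)
        return wrapper
    return decorator


@encode_output()
def index():
    return "caf\u00e9"


@encode_output("iso-8859-1")  # per-handler override of the default
def legacy():
    return "caf\u00e9"
```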


Robert Brewer
fuman...@aminus.org




Re: [Web-SIG] Python 3.0 and WSGI 1.0.

2009-04-16 Thread Robert Brewer
On Fri, 2009-04-17 at 09:37 +1000, Graham Dumpleton wrote:
> >> I am not sure we ended up with a final answer on all of this, but I
> >> don't want to hold up mod_wsgi 3.0, which includes Python 3.0 support,
> >> any longer. As such, am implementing things as per:
> >>
> >>   http://www.wsgi.org/wsgi/Amendments_1.0
> >>
> >> with exception that will not be attempting to do decoding per RFC
> >> 2047. Any CGI variables not related to HTTP headers will also be
> >> handled as latin-1, including SCRIPT_NAME, PATH_INFO and QUERY_STRING.
> >> This should be equivalent with what wsgiref does in Python 3.X and
> >> basically keeps the status quo.
> >>
> > That sounds fine to me, Graham, and is what I'll be implementing in my
> > python3 branch for CherryPy barring any unforeseen impediments.
> 
> Are you moving to use of empty string as end of input sentinel for
> wsgi.input for case where code does actually read more than
> CONTENT_LENGTH?

Sure; I think that's reasonable. It's supposed to be 'file-like'.
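Here is a sketch of the empty-bytes sentinel behaviour. It assumes the server wraps the socket file in an object that counts down from CONTENT_LENGTH. `LimitedInput` is a hypothetical name, not any particular server's class.

```python
class LimitedInput:
    """File-like wrapper that stops at CONTENT_LENGTH and then returns b''.

    Reads past the declared length get the empty-bytes sentinel instead of
    blocking on a connection that has nothing more to send for this request.
    """

    def __init__(self, fp, content_length):
        self.fp = fp
        self.remaining = content_length

    def _clamp(self, size):
        # Never ask the underlying stream for more than is left.
        if size is None or size < 0 or size > self.remaining:
            return self.remaining
        return size

    def read(self, size=-1):
        if self.remaining <= 0:
            return b""
        data = self.fp.read(self._clamp(size))
        self.remaining -= len(data)
        return data

    def readline(self, size=-1):
        if self.remaining <= 0:
            return b""
        line = self.fp.readline(self._clamp(size))
        self.remaining -= len(line)
        return line
```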


Robert Brewer
fuman...@aminus.org



Re: [Web-SIG] Python 3.0 and WSGI 1.0.

2009-04-16 Thread Robert Brewer
On Thu, 2009-04-16 at 00:12 -0700, Graham Dumpleton wrote:
> > So, from where I sit, we have:
> >
> >  1. Many header values which are ASCII.
> >  2. A few header values which are ISO-8859-1 plus RFC 2047.
> >  3. A few header values which are URI's (no specified encoding) or
> >     IRI's (UTF-8).
> >
> > I understand the desire to decode ASAP, and I agree with Guido that
> > we should use a default encoding which the app can override. Looking
> > at the above, ISO-8859-1 is the best encoding I know of for all three
> > header cases. ASCII can be used as a valid subset without transcoding;
> > headers which are ISO-8859-1 are decoded perfectly; URI/IRI headers can
> > be transcoded by the app if needed, but mangled opaquely by middleware.
> >
> > If we make *that* call, then IMO there's no reason not to do the
> > same to SCRIPT_NAME, PATH_INFO, and QUERY_STRING.
> 
> I am not sure we ended up with a final answer on all of this, but I
> don't want to hold up mod_wsgi 3.0, which includes Python 3.0 support,
> any longer. As such, am implementing things as per:
> 
>   http://www.wsgi.org/wsgi/Amendments_1.0
> 
> with exception that will not be attempting to do decoding per RFC
> 2047. Any CGI variables not related to HTTP headers will also be
> handled as latin-1, including SCRIPT_NAME, PATH_INFO and QUERY_STRING.
> This should be equivalent with what wsgiref does in Python 3.X and
> basically keeps the status quo.
> 
> If anyone has any last things to say on all of this, please speak up
> now.
> 
That sounds fine to me, Graham, and is what I'll be implementing in my
python3 branch for CherryPy barring any unforeseen impediments.


Robert Brewer
fuman...@aminus.org



Re: [Web-SIG] FW: Closing #63: RFC2047 encoded words

2009-04-08 Thread Robert Brewer
Brian Smith wrote:
> Here is the change that removes the use of RFC 2047 from HTTP in
> HTTPbis.

Yes, but parsers need to continue decoding them for many years to come.
IMO WSGI origin servers should do this so we can write the decoding
logic once and forget about it (assuming middleware and apps far
outnumber origin servers).


Robert Brewer
fuman...@aminus.org



Re: [Web-SIG] Python 3.0 and WSGI 1.0.

2009-04-03 Thread Robert Brewer
James Y Knight wrote:
> On Apr 2, 2009, at 7:33 AM, Graham Dumpleton wrote:
> 
> > """When running under Python 3, servers MUST provide CGI HTTP
> > variables as strings, decoded from the headers using HTTP standard
> > encodings (i.e. latin-1 + RFC 2047)"""
> >
> > Which is fair enough and basically what the RFCs say. At the moment I
> > don't apply RFC 2047 rules in Python 3.0 support in mod_wsgi, so just
> > need to do that.
> 
> I'd really *really* like to recommend that any mention of RFC 2047 is
> stricken from the WSGI server requirements. I cannot imagine that
> decoding actually accomplishing anything other than opening security
> holes (think a filter in an upstream proxy that doesn't know how to do
> 2047-decoding passing something through that you now decode.)
> 
> Also, you have to only do the decoding on TEXT words according to the
> spec, so the WSGI container now needs an HTTP header parser just in
> order to determine where it should decode RFC2047 words and where not
> to? I don't think so...

Something needs to decode RFC2047 words, at least until http-bis is
widespread. I'd be OK with making the app do it as needed (since only it
might know whether extension headers are token/quoted-string/TEXT).


Robert Brewer
fuman...@aminus.org



Re: [Web-SIG] Python 3.0 and WSGI 1.0.

2009-04-03 Thread Robert Brewer
Graham Dumpleton wrote:
> I am slowly working through what I think I at least need to do for
> Apache/mod_wsgi. I'll give a summary of what I have worked out so far
> based on the discussions and my own research.
> ...
> Next HTTP header to worry about is HTTP_REFERRER.
> 
> There would be two parts to this, there would be the host name
> component and then the path component.
> 
> We already know from above that for unicode host name it should be the
> IDNA name.
> 
> For the path component, if the client follows the rules properly, then
> if the path uses a non latin-1 encoding, then it should be using RFC
> 2047 to indicate this so shouldn't have to do anything different and
> use same rule as other HTTP headers. For this header we are actually
> in a better situation than for URL in actual HTTP request line which
> isn't so specific about encodings.

I don't think that's true. Referer must be absoluteURI or relativeURI,
neither of which have defined encodings. RFC 2047 only applies to
headers of type TEXT, of which there are surprisingly few.


Robert Brewer
fuman...@aminus.org



Re: [Web-SIG] Python 3.0 and WSGI 1.0.

2009-04-03 Thread Robert Brewer
9-1] character encoding (allowing other character sets
   through use of [RFC2047] encoding).  In practice, most HTTP header
   field-values use only a subset of the US-ASCII charset [USASCII].
   Newly defined header fields SHOULD constrain their field-values to
   US-ASCII characters.  Recipients SHOULD treat other (obs-text) octets
   in field-content as opaque data.

So, from where I sit, we have:

 1. Many header values which are ASCII.
 2. A few header values which are ISO-8859-1 plus RFC 2047.
 3. A few header values which are URI's (no specified encoding) or IRI's
(UTF-8).

I understand the desire to decode ASAP, and I agree with Guido that we
should use a default encoding which the app can override. Looking at the
above, ISO-8859-1 is the best encoding I know of for all three header
cases. ASCII can be used as a valid subset without transcoding; headers
which are ISO-8859-1 are decoded perfectly; URI/IRI headers can be
transcoded by the app if needed, but mangled opaquely by middleware.

If we make *that* call, then IMO there's no reason not to do the same to
SCRIPT_NAME, PATH_INFO, and QUERY_STRING.


Robert Brewer
fuman...@aminus.org

[1]
http://www.ietf.org/internet-drafts/draft-ietf-httpbis-p1-messaging-06.txt



Re: [Web-SIG] Python 3.0 and WSGI 1.0.

2009-04-02 Thread Robert Brewer
Bill Janssen wrote:
> Alan Kennedy  wrote:
> > [Bill]
> > > I think the controlling reference here is RFC 3875.
> >
> > I think the controlling references are RFC 2616, RFC 2396
> > and RFC 3987.
> 
> I see what you're saying, but it's darn near impossible, as a practical
> matter, to get any guidance on encoding matters from those.
> 
> The question is where those names come from, and they come from CGI,
> and that is (practically speaking) defined these days by RFC 3875,
> as much as anything.

If so, then PEP 333 really should be updated to point at a version of
the CGI "spec" that doesn't reference e.g. RFC 1808 for URI's. As it is,
one could easily come to the conclusion that, for example, path
parameters like /path;a=3 aren't supported (because the CGI draft that
PEP 333 mentions disallows them). I'd be much happier referring to 3875,
and even happier diverging from strict compliance to what was always a
shaky spec.
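Path parameters are ordinary URI syntax. As a quick check, Python's `urllib.parse.urlparse` splits them out of the last path segment, while `urlsplit` leaves them in the path. This shows URI parsing only, not what any CGI server does with them.

```python
from urllib.parse import urlparse, urlsplit

# urlparse separates ";a=3" from the last path segment into .params ...
parsed = urlparse("/path;a=3?x=1")
# ... while urlsplit keeps the parameters as part of the path.
split = urlsplit("/path;a=3?x=1")
```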


Robert Brewer
fuman...@aminus.org



Re: [Web-SIG] Python 3.0 and WSGI 1.0.

2009-04-01 Thread Robert Brewer
Graham Dumpleton wrote:
> 2009/4/2 Robert Brewer :
> > Alan Kennedy wrote:
> >> Hi Graham,
> >>
> >> I think yours is a good solution to the problem.
> >>
> >> [Graham]
> >> > In other words, leave all the existing CGI variables to come through
> >> > as latin-1 decode
> >>
> >> As latin-1 or rfc-2047 decoded, to unicode.
> >>
> >> > and do anything new in 'wsgi' variable namespace,
> >>
> >> So the server provides
> >>
> >> "wsgi.server_decoded_SCRIPT_NAME" == u"whatever"
> >> "wsgi.server_decoded_PATH_INFO" == u"whatever"
> >> "wsgi.server_decode_charset" == u"utf-8"
> >
> > I think everyone at the sprint today acquiesced to having
> > SCRIPT_NAME/PATH_INFO/QUERY_STRING be set in the environ as unicode.
> > The server can decide (probably subject to configuration). I've
> > implemented this in the python3 branch of CherryPy and it seems to
> > work brilliantly.
> > Assuming the server *is* configurable, deployers should be able to
> > choose Latin-1 if they need to recover the original bytes, without
> > having to support a separate set of encoded-byte entries.
> 
> Seems to me that you can't have it be configurable and it must always
> be latin-1 interpretation. The problem is where you are composing
> multiple WSGI applications. If they each have different expectations
> or requirements as to how it is handled, aren't you going to have a
> problem. Or am I missing something in the way you are explaining it?

I would not expect multiple middlewares to want to decode the same URI
differently. But I would assume you'd run into problems when multiple
URI's in the same site had different encodings. Mark Ramm gave the use
case of exposing Unix filenames-as-bytes in URL's--the encoding is
unknown but a human may know better.

Allowing/forcing the human to stick that information in the app or in
the server is the same work, IMO. A server could be configurable to the
point of using different encodings for different URI's via regex
matching or  sections or some other means. I'd be happy with a
spec that said, "servers MUST always decode these 3 entries, but SHOULD
allow the encoding used to be configurable." I'd be equally happy with a
spec that said, "servers MUST always decode these 3 as Latin-1" and
explain why. Both have their manageable pros and cons. But delaying the
decoding to the app by setting those 3 entries as bytes has more cons
than pros.
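Here is a server-side sketch of the regex-per-URI option. It assumes the path has already been percent-unquoted to bytes. `ENCODING_RULES` and `decode_path_info` are hypothetical configuration and helper names, not an existing server's API.

```python
import re

# Hypothetical deployer configuration: the first matching pattern wins.
ENCODING_RULES = [
    (re.compile(rb"^/files/"), "iso-8859-1"),  # opaque Unix filenames as bytes
    (re.compile(rb"^/"), "utf-8"),             # everything else
]


def decode_path_info(raw_path, rules=ENCODING_RULES):
    """Return (PATH_INFO, encoding) using the first rule whose regex matches."""
    for pattern, encoding in rules:
        if pattern.match(raw_path):
            return raw_path.decode(encoding), encoding
    raise ValueError("no encoding rule matches %r" % (raw_path,))
```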


Robert Brewer
fuman...@aminus.org



Re: [Web-SIG] Python 3.0 and WSGI 1.0.

2009-04-01 Thread Robert Brewer
Graham Dumpleton wrote:
> Has anyone actually got an example of code for doing RFC-2047
> decoding. Are there even any systems which make use of that encoding
> for web requests anyway. I still haven't really addressed that
> decoding requirement and I haven't seen any existing Python web stuff
> that tries to.

http://www.cherrypy.org/browser/trunk/cherrypy/lib/http.py#L196

Currently, CP apps call that. We can move that to the server if desired.
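For comparison, on Python 3 the standard library's `email.header.decode_header` already parses RFC 2047 encoded-words. The sketch below is a hypothetical helper built on it, not the CherryPy function linked above, and it assumes any unencoded runs are ISO-8859-1.

```python
from email.header import decode_header

def decode_rfc2047(value):
    """Hypothetical helper: decode RFC 2047 encoded-words in a header value to str."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Unencoded runs come back with charset None; treat them as ISO-8859-1.
            parts.append(chunk.decode(charset or "iso-8859-1"))
        else:
            # With no encoded-words at all, decode_header returns the input str as-is.
            parts.append(chunk)
    return "".join(parts)
```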


Robert Brewer
fuman...@aminus.org


Re: [Web-SIG] Python 3.0 and WSGI 1.0.

2009-04-01 Thread Robert Brewer
Alan Kennedy wrote:
> Hi Graham,
> 
> I think yours is a good solution to the problem.
> 
> [Graham]
> > In other words, leave all the existing CGI variables to come through
> > as latin-1 decode
> 
> As latin-1 or rfc-2047 decoded, to unicode.
> 
> > and do anything new in 'wsgi' variable namespace,
> 
> So the server provides
> 
> "wsgi.server_decoded_SCRIPT_NAME" == u"whatever"
> "wsgi.server_decoded_PATH_INFO" == u"whatever"
> "wsgi.server_decode_charset" == u"utf-8"

I think everyone at the sprint today acquiesced to having
SCRIPT_NAME/PATH_INFO/QUERY_STRING be set in the environ as unicode. The
server can decide (probably subject to configuration). I've implemented
this in the python3 branch of CherryPy and it seems to work brilliantly.
Assuming the server *is* configurable, deployers should be able to
choose Latin-1 if they need to recover the original bytes, without
having to support a separate set of encoded-byte entries.

Side note: wrapping the wsgi.input fp in a DecodingWrapper before
handing it to cgi works great, too. No need to rewrite the cgi module to
support bytes as I feared.
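`DecodingWrapper` is CherryPy's own class and its API isn't reproduced here. In the current standard library, `io.TextIOWrapper` plays the same role of turning a byte stream into a text stream; the sketch assumes `wsgi.input` behaves like a buffered binary file (as `io.BytesIO` does), and `text_input` is a hypothetical helper. The `cgi` module itself was deprecated in Python 3.11 and removed in 3.13.

```python
import io

def text_input(environ, encoding="latin-1"):
    """Hypothetical helper: wrap the binary wsgi.input so callers read str, not bytes."""
    # newline="" leaves \r\n untouched, which matters for multipart bodies.
    return io.TextIOWrapper(environ["wsgi.input"], encoding=encoding, newline="")
```

Closing or garbage-collecting the wrapper also closes the underlying stream; calling `detach()` when finished hands the byte stream back instead.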


Robert Brewer
fuman...@aminus.org


Re: [Web-SIG] Python 3.0 and WSGI 1.0.

2009-04-01 Thread Robert Brewer
Graham Dumpleton wrote:
> 2009/4/2 Guido van Rossum :
> > On Wed, Apr 1, 2009 at 12:15 PM, Ian Bicking wrote:
> >> On Wed, Apr 1, 2009 at 11:34 AM, Guido van Rossum wrote:
> >>> On Wed, Apr 1, 2009 at 5:18 AM, Robert Brewer wrote:
> >>>> Good timing. We had been thinking to make everything strings except for
> >>>> SCRIPT_NAME, PATH_INFO, and QUERY_STRING, since these few are pulled
> >>>> from the Request-URI, which may be in any encoding. It was thought that
> >>>> the app would be best-qualified to decode those three.
> >>>
> >>> Argh. The *meaning* of these fields is clearly text. It would be most
> >>> unfortunately if all apps were required to deal with decoding bytes
> >>> for these (there is no choice any more, unlike in 2.x). I appreciate
> >>> the sentiment that the encoding is unknown, but I would much prefer it
> >>> if there was a default encoding that the app could override, or if
> >>> there was some other mechanism whereby the app would not have to be
> >>> bothered with decoding bytes unless it cared.
> >>
> >> This might be fine, except it is hard.  You can't just take arbitrary
> >> bytes and do script_name.decode('utf8'), and then when you realize you
> >> had it wrong do script_name.encode('utf8').decode('latin1').
> >
> > Well you could make the bytes versions available under different keys.
> > I think you do something a bit similar this in webob, e.g. req.params
> > vs. req.str_params. (Perhaps you could have QUERY_STRING and
> > QUERY_BYTES.) The decode() call used to create the text strings could
> > use 'replace' as the error handler and the app could check for the
> > presence of the replacement character ('\ufffd') in the string to see
> > if there was a problem; or it could just work with the string
> > containing that character and report the user some kind of 40x or 50x
> > error. Frameworks (like webob) would of course do the right thing and
> > look for QUERY_BYTES before QUERY_STRING. QUERY_BYTES should probably
> > be optional.
> 
> Can we please not invent new names at global context in WSGI
> environment dictionary, especially ones that mutate existing names
> rather than using a prefix or suffix.
> 
> If we are going to carry values in two different formats, then use the
> 'wsgi' name space. Thus, for byte versions of values perhaps use:
> 
>   wsgi.request_uri
>   wsgi.script_name
>   wsgi.path_info
>   wsgi.query_string
>   etc
> 
> In other words, leave all the existing CGI variables to come through
> as latin-1 decode and do anything new in 'wsgi' variable namespace,
> identifying only the minimal set which needs to be made available as
> bytes.

Some thoughts:

 1. If we always decode as Latin-1, the decoding is lossless, and consumers could
retrieve the original bytes with val.encode('Latin-1'), thus removing the need
for separate entries.

 2. CGI says, "REMOTE_USER = *OCTET" :(

 3. Bikeshed: "wsgi.xyz" is too close to "XYZ" in my opinion.
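Point 1 can be checked directly, and contrasted with the quoted suggestion of decoding as UTF-8 with the 'replace' error handler: Latin-1 round-trips all 256 byte values, while 'replace' turns undecodable bytes into U+FFFD, which cannot be turned back into the original bytes. Both helpers are illustrative names, not proposed API.

```python
from typing import Tuple

def decode_lossless(raw: bytes) -> str:
    """Illustrative helper: Latin-1 maps all 256 byte values to U+0000..U+00FF."""
    return raw.decode("latin-1")

def decode_with_replace(raw: bytes, charset: str = "utf-8") -> Tuple[str, bool]:
    """Illustrative helper: decode leniently and report whether U+FFFD appears."""
    text = raw.decode(charset, errors="replace")
    return text, "\ufffd" in text
```

Note that the U+FFFD check can't tell a substituted character apart from a genuine U+FFFD that was in the input.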


Robert Brewer
fuman...@aminus.org


Re: [Web-SIG] Python 3.0 and WSGI 1.0.

2009-04-01 Thread Robert Brewer
Guido van Rossum wrote:
> Sent: Wednesday, April 01, 2009 9:34 AM
> To: Robert Brewer
> Cc: Web SIG
> Subject: Re: [Web-SIG] Python 3.0 and WSGI 1.0.
> 
> On Wed, Apr 1, 2009 at 5:18 AM, Robert Brewer wrote:
> > Good timing. We had been thinking to make everything strings except for
> > SCRIPT_NAME, PATH_INFO, and QUERY_STRING, since these few are pulled
> > from the Request-URI, which may be in any encoding. It was thought that
> > the app would be best-qualified to decode those three.
> 
> Argh. The *meaning* of these fields is clearly text. It would be most
> unfortunately if all apps were required to deal with decoding bytes
> for these (there is no choice any more, unlike in 2.x). I appreciate
> the sentiment that the encoding is unknown, but I would much prefer it
> if there was a default encoding that the app could override, or if
> there was some other mechanism whereby the app would not have to be
> bothered with decoding bytes unless it cared.
> 
> Note that Py3k also treats filenames as text, with an optional escape
> hatch for using bytes that only very few apps will need to use.

Understood. I think we have plenty of options here for returning text.
We'll discuss this ASAP in the room.


Robert Brewer
fuman...@aminus.org


Re: [Web-SIG] Python 3.0 and WSGI 1.0.

2009-04-01 Thread Robert Brewer
Graham Dumpleton wrote:
> Based on any discussions at PyCon, can someone give a summary of any
> conclusions drawn about how WSGI 1.0 should be implemented in Python
> 3.0.
> 
> The previous analysis of this is at:
> 
>   http://www.wsgi.org/wsgi/Amendments_1.0
> 
> I realise it may be work in progress, but I note that work being done
> on WSGI server associated with CherryPy for Python 3.0 by Robert isn't
> necessarily following that and is perhaps starting to do things in a
> way that I understood were only being speculated upon for WSGI 2.0,
> not for WSGI 1.0. For example:
> 
>   http://www.cherrypy.org/changeset/2199
> 
> In particular, it has:
> 
>   environ["SCRIPT_NAME"] = b""
> 
> The bit from prior analysis which is relevant is:
> 
> """When running under Python 3, servers MUST provide CGI HTTP
> variables as strings, decoded from the headers using HTTP standard
> encodings (i.e. latin-1 + RFC 2047) (Open question: are there any CGI
> or WSGI variables that should NOT be strings?)"""
> 
> Since mod_wsgi has used the prior analysis as basis of Python 3.0
> support, would want to know pretty soon what direction WSGI 1.0 under
> Python 3.0 is going to take, else I am going to have to delay
> releasing mod_wsgi 3.0 or simply yank the support for Python 3.0.
> 
> Robert, yes I know I could have asked you direct, but want a consensus
> from all who were present at PyCon and discussed these things.

Good timing. We had been thinking to make everything strings except for
SCRIPT_NAME, PATH_INFO, and QUERY_STRING, since these few are pulled
from the Request-URI, which may be in any encoding. It was thought that
the app would be best-qualified to decode those three.

I hope to discuss that further this morning at the sprints. It turns out
the cgi module in Python 3 only does text, not bytes. I considered
submitting a patch to make it handle bytes for fp/environ but that
became difficult quickly and may complicate the cgi module needlessly if
we can instead use unicode for those 3 environ entries. I'll report back
here.


Robert Brewer
fuman...@aminus.org


Re: [Web-SIG] WSGI Open Space @ PyCon.

2009-03-29 Thread Robert Brewer
We had a smaller third meeting and answered more issues.

Those present at the third meeting:

 * Mark Ramm (TG)
 * Mike Orr (Pylons)
 * Bob Brewer (CherryPy)
 * Glyph Lefkowitz (Twisted)
 * David Reid (Twisted)
 * Jean-Paul Calderone (Twisted)

Continuing Topic: string type for PATH_INFO and SCRIPT_NAME
-----------------------------------------------------------

Much discussion on how to safely decode the Request-URI. Several options
were put forth, including schemes where both unicode and bytes are stuck
in the environ. Final rough consensus was that, even though request
headers MUST be unicode in the environ, SCRIPT_NAME and PATH_INFO
probably MUST be byte strings in order to not "guess wrong" about their
encoding. In addition, a new environ key which indicates whether
%2F-slashes were decoded improperly or not would be beneficial.
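The %2F problem is easy to reproduce: once a server percent-decodes the path, an encoded slash inside one segment becomes indistinguishable from a real path separator. The sketch returns the flag such an environ key might carry; the key's name was left open, so `decode_path` and its return shape are hypothetical.

```python
from urllib.parse import unquote

def decode_path(raw_path):
    """Hypothetical server step: percent-decode a path and flag collapsed %2F slashes."""
    had_encoded_slash = "%2f" in raw_path.lower()
    return unquote(raw_path), had_encoded_slash
```

Both "/files/a%2Fb" (one segment) and "/files/a/b" (two segments) decode to the same string; only the flag tells them apart.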


Continuing Topic: wsgi.input
----------------------------

Glyph: iterable is good; file-like is also OK.

Big issue: need a way for the app to tell the server that it is waiting
on output from some other source, possibly running in the same event
loop.

  ___________ Reactor ____________
 /                                \
+--------+  +--------+  +----------+
|  IMAP  |==|  App   |==|  Server  |
+--------+  +--------+  +----------+

Yielding an empty string (as WSGI 1.0 does) does not provide enough
information; the app needs a way to yield a token which tells the server
"don't call my next() method again until my other source has given me
more input on which to operate."
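A toy model of that token, under loud assumptions: `WAIT` is a hypothetical sentinel (nothing like it exists in WSGI 1.0), and the synchronous `feed` callback stands in for the reactor delivering more input. The point is only that the server can tell "park me" apart from an empty chunk.

```python
from collections import deque

WAIT = object()  # hypothetical sentinel: "don't call next() until my source has data"

def relay_app_body(source):
    """Hypothetical app iterable relaying items from an external source (e.g. IMAP)."""
    while True:
        if not source:
            yield WAIT          # nothing to send yet; ask the server to park this iterator
            continue
        item = source.popleft()
        if item is None:        # None marks the end of the external stream
            return
        yield item

def make_feeder(batches):
    """Hypothetical stand-in for the reactor: each call delivers the next batch."""
    pending = deque(batches)
    def feed(source):
        source.extend(pending.popleft())
    return feed

def drive(body, source, feed):
    """Toy server loop: on WAIT, resume the iterator only after more input arrives."""
    out = []
    for chunk in body:
        if chunk is WAIT:
            feed(source)        # a real server would return to its event loop here
            continue
        out.append(chunk)
    return b"".join(out)
```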


Asynchronous WSGI support
-------------------------

Mostly non-existent. Fix it? Fork it? Drop it? Glyph seemed to think
we're really close if we fix wsgi.input.


Response value type
-------------------

Glyph suggested as the response tuple grows (e.g. by adding a "close"
method), we should consider returning an object with .status,
.headers, .body, and .close attributes. Packing/unpacking tuples becomes
tedious. Everyone agreed. If changing to an object is not possible, then
a tuple should not have a variable length; that is, no members would be
optional. Returning a dict would be another option (which would allow
optional keys).
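A minimal sketch of the object-style return value, assuming the attribute names from these notes. The `Response` class is hypothetical and not part of any WSGI draft; here `close` is a method that forwards to the body.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple

@dataclass
class Response:
    """Hypothetical response object with .status, .headers, .body and .close."""
    status: str
    headers: List[Tuple[str, str]] = field(default_factory=list)
    body: Iterable[bytes] = ()

    def close(self) -> None:
        # Forward to the body's close() if it has one (e.g. a generator or file).
        closer = getattr(self.body, "close", None)
        if closer is not None:
            closer()

def hello_app(environ):
    return Response("200 OK", [("Content-Type", "text/plain")], [b"Hello world!\n"])
```

Unlike a fixed-length tuple, optional members can be added later with defaults without breaking callers that access them by name.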


Continuing deferred issues
--------------------------

 * Lots of little changes: the server's supported HTTP version,
   file_wrapper edge cases, etc.
 * Python 3, and the scheduling of WSGI improvements (version roadmap)
 * Lifecycle methods (start/stop/etc event API driven by the container)


Robert Brewer
fuman...@aminus.org


Re: [Web-SIG] WSGI Open Space @ PyCon.

2009-03-29 Thread Robert Brewer
I wrote:
> We had a good second meeting and answered more issues. My understanding
> is that there is another BoF scheduled for tomorrow (Sunday). Check the
> Open Space board for details.

My mistake. I'll put up an Open Space reservation for 5pm today ASAP.


Robert Brewer
fuman...@aminus.org


Re: [Web-SIG] WSGI Open Space @ PyCon.

2009-03-28 Thread Robert Brewer
Hi all,

We had a good second meeting and answered more issues. My understanding
is that there is another BoF scheduled for tomorrow (Sunday). Check the
Open Space board for details.

Those present at the second meeting:

 * Mark Ramm (TG)
 * Mike Orr (Pylons)
 * Bob Brewer (CherryPy)
 * Ian Bicking (Paste, etc)
 * Alan Kennedy (WSGI gateway servlets/Jython)
 * Rick Copeland (TG)
 * James Bennett (Django)
 * Gary Poster (Launchpad)
 * Chris McDonough (Zope, repoze, etc)
 * Garrett Smith (async WSGI server and middleware)
 * Kumar McMillan (Pylons)
 * Alex Morega (WSGI user)
 * Andrew Sawyer (lurker)
 * Marcus Cavanaugh (Pylons)
 * David Reed (used to be Twisted.web2 maintainer)
 * 8+ others, mostly lurking


Revisited Topic: Unicode values in the WSGI environ
---------------------------------------------------

Consensus: Response status and headers MUST be unicode. Doing otherwise
(handling both unicode and byte strings) would unnecessarily complicate
the construction of middleware components. Origin HTTP servers MUST
encode these to the appropriate bytestrings (all ISO-8859-1?) before
writing them out to the socket.
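Encoding on the way out might look like the sketch below, assuming ISO-8859-1 for everything (the open question above); `serialize_head` is a hypothetical server helper. Strict encoding means a value outside Latin-1 raises `UnicodeEncodeError` rather than being silently mangled.

```python
def serialize_head(status, headers, version="HTTP/1.1"):
    """Hypothetical helper: turn a unicode status line and header list into bytes."""
    lines = ["%s %s" % (version, status)]
    lines.extend("%s: %s" % (name, value) for name, value in headers)
    # A blank line terminates the header block.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("iso-8859-1")
```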


Revisited Topic: wsgi.input
---------------------------

I raised the issue that, if wsgi.input were an iterable, many apps would
just have to take the extra step of wrapping it in a file-like object
anyway to pass to cgi.FieldStorage. Others reopened the desire to allow
the app to determine the size of each read().
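The wrapping step is short if the iterable yields bytes chunks: an `io.RawIOBase` subclass plus `io.BufferedReader` supplies `read()` and `readline()`. `IterableInput` is a hypothetical name; this is a sketch, not a proposal from the meeting.

```python
import io

class IterableInput(io.RawIOBase):
    """Hypothetical adapter: expose an iterable of bytes chunks as a readable binary file."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self._pending = b""

    def readable(self):
        return True

    def readinto(self, buf):
        # Pull chunks until there is data or the iterable is exhausted (skips empty chunks).
        while not self._pending:
            try:
                self._pending = next(self._chunks)
            except StopIteration:
                return 0  # 0 bytes read signals EOF
        n = min(len(buf), len(self._pending))
        buf[:n] = self._pending[:n]
        self._pending = self._pending[n:]
        return n

# BufferedReader adds read(), readline(), etc. on top of readinto().
stream = io.BufferedReader(IterableInput([b"a=1&", b"", b"b=2\nrest"]))
```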

We didn't reach consensus, IMO. Alan argued for an iterable to more
easily support asynchronous servers. The counter-argument was that
servers could use non-blocking sockets to allow apps which read() to
yield in the case of no immediate data rather than block indefinitely.
If a file-like object were retained, it would help to publish a
chainable file example to help middleware re-stream files they read any
part of.
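One possible shape for such a chainable file, assuming the middleware kept the bytes it consumed: the hypothetical `ChainedInput` replays them first and then continues from the original stream, so a downstream app sees the full body.

```python
import io

class ChainedInput:
    """Hypothetical file-like wrapper: replay consumed bytes, then read the rest."""

    def __init__(self, consumed, rest):
        self._head = io.BytesIO(consumed)
        self._rest = rest

    def read(self, size=-1):
        if size is None or size < 0:
            return self._head.read() + self._rest.read()
        data = self._head.read(size)
        if len(data) < size:
            data += self._rest.read(size - len(data))
        return data

    def readline(self, size=-1):
        if size is None:
            size = -1
        line = self._head.readline(size)
        # Done if the replay buffer supplied a full line or filled the size limit.
        if line.endswith(b"\n") or (size >= 0 and len(line) >= size):
            return line
        remaining = -1 if size < 0 else size - len(line)
        return line + self._rest.readline(remaining)
```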


Response iterable type
----------------------

The current spec says "all strings referred to in this specification
must be of type str or StringType". James asked if this could be
loosened to str-like objects. Perhaps we could replace strict typing
with an ABC requirement? General consensus: -0.


Continuing deferred issues
--------------------------

 * Lots of little changes: the server's supported HTTP version,
   file_wrapper edge cases, etc.
 * Python 3, and the scheduling of WSGI improvements
 * Asynchronous WSGI support. Mostly non-existent. Fix it? Fork it?
   Drop it?
 * Lifecycle methods (start/stop/etc event API driven by the container)
 * Remove app_iter.close?


Robert Brewer
fuman...@aminus.org


Re: [Web-SIG] thoughts on an iterator

2009-03-28 Thread Robert Brewer
ntage
> that if a middleware piece wants to keep the first 100MB and last 100MB
> from a stream but throw out the middle, it's got no way to do so without
> dropping back to its own caching scheme that the framework can't
> coordinate with other schemes; but it seems to cover the majority of
> cases that I can think of.

Those seem like strategies for individual middleware components to
implement; there's no need to burden the general case with them.

> Anyway: no unlimited caching, no unlimited rewind; that's my argument.
> Iterators were just one way of cleaning getting there, but probably, in
> the light of the next day, not a powerful enough way.

I'd vote to stick with the file-like approach for no other reason than
that FieldStorage expects one.


Robert Brewer
fuman...@aminus.org


Re: [Web-SIG] WSGI Open Space @ PyCon.

2009-03-28 Thread Robert Brewer
Alan Kennedy wrote:
> For those of you at PyCon, there is a WSGI Open Space @ 5pm today
> (Friday).
> 
> The sub-title of the open space is "Does WSGI need revision"?

Hi all,

We had a good meeting but it was too short. We plan on having another
Open Space meeting at 5pm today (Saturday) to continue the discussion.

Those present at the first meeting:

 * Mark Ramm (TG)
 * Frank Wierzbicki (Jython)
 * Mike Orr (Pylons)
 * Bob Brewer (CherryPy)
 * Ian Bicking (Paste, etc)
 * Eric Larson (CherryPy, WSGI advocate)
 * Alan Kennedy (WSGI gateway servlets/Jython)
 * Michael Twomey (WSGI user + HTTP servers + Twisted)
 * Jorge Vargas (TG)
 * Ian Charnas (TG CMS)
 * Phil Jenvey (Pylons + Jython)
 * 8+ others, mostly lurking

Topic: Unicode values in the WSGI environ
-----------------------------------------

Request:

We discussed any blockers to this. Request headers are pretty easy since
the spec requires falling back to ISO-8859-1. HOST might be
IDNA-encoded, but that can be consistent, too. What to do with
SCRIPT_NAME and PATH_INFO is more difficult, since there is no consensus
on encoding--there's percent encoding, of course, but that doesn't cover
multi-byte character encodings. The request body (wsgi.input) would not
be decoded.
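The two easy cases can be sketched with standard-library codecs. Python's built-in `idna` codec implements IDNA 2003 (the third-party `idna` package covers IDNA 2008). Both helpers are hypothetical and assume the port has already been split off the Host value.

```python
def decode_host(raw_host: bytes) -> str:
    """Hypothetical helper: decode a Host value (port already removed) from IDNA."""
    return raw_host.decode("idna")

def decode_header_value(raw: bytes) -> str:
    """Hypothetical helper: ISO-8859-1 decoding never fails and is reversible."""
    return raw.decode("iso-8859-1")
```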

The general consensus was that we could specify the decoding of the
request metadata (request line and headers) to be the responsibility of
WSGI servers, and leave it at that--different servers may offer
different means of configuring the decoding. There are pitfalls to this
approach, which the spec should address; in particular, some decoding
strategies may not be reversible. In addition, apps should at least know
which encoding the server chose via a new WSGI environ entry. Name
suggestions welcome.
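A sketch of server-side decoding plus a reporting key, assuming a configured charset with a Latin-1 fallback when it doesn't fit. `decode_request_path` is hypothetical, and `wsgi.server_decode_charset` is only one of the names floated on this list, not a standard key.

```python
from urllib.parse import unquote_to_bytes

def decode_request_path(raw_path, charset="utf-8"):
    """Hypothetical server step: percent-decode, then decode with a configured charset."""
    raw = unquote_to_bytes(raw_path)
    try:
        text, used = raw.decode(charset), charset
    except UnicodeDecodeError:
        # The configured charset does not fit these bytes; Latin-1 always succeeds
        # and stays reversible, so fall back to it and say so.
        text, used = raw.decode("iso-8859-1"), "iso-8859-1"
    # Key name is a proposal from the list, not part of any spec.
    return {"PATH_INFO": text, "wsgi.server_decode_charset": used}
```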

There was some discussion, but no agreement, on including both unicode
and str (str and bytes in Python 3.x) versions of these values in the
environ.


Response:

I *think* the general consensus was that applications could return
unicode status and headers. But we also noted that servers should be
able to encode any unicode value to bytes using IDNA/ISO-8859-1/RFC-2047/utf-8 where
appropriate. Not sure where this ended up.
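One way a server might combine those encodings for a header value, assuming ISO-8859-1 first and an RFC 2047 encoded-word only when that fails. `encode_header_value` is hypothetical; `email.header.Header` is the standard-library encoder. Later HTTP/1.1 revisions no longer recommend RFC 2047 in HTTP headers, so treat this as a period sketch.

```python
from email.header import Header, decode_header

def encode_header_value(value: str) -> bytes:
    """Hypothetical helper: Latin-1 if possible, otherwise an RFC 2047 encoded-word."""
    try:
        return value.encode("iso-8859-1")
    except UnicodeEncodeError:
        # Header(...).encode() returns an ASCII-only str such as "=?utf-8?b?...?=".
        return Header(value, "utf-8").encode().encode("ascii")
```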


Topic: Return a tuple of (status, headers, body)
------------------------------------------------

That is, get rid of the start_response callable. The general consensus
was that this is a simple, but powerful improvement, which Rack/Jack
have demonstrated already. The "simplest possible application object"
would change from this:

  def simple_app(environ, start_response):
      """Simplest possible application object"""
      status = '200 OK'
      response_headers = [('Content-type','text/plain')]
      start_response(status, response_headers)
      return ['Hello world!\n']

...to this:

  def simple_app(environ):
      """Simplest possible application object"""
      status = '200 OK'
      response_headers = [('Content-type','text/plain')]
      body = ['Hello world!\n']
      return (status, response_headers, body)
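Apps in the new style could still run on existing WSGI 1.0 servers through a small adapter. A minimal sketch, assuming the tuple shape shown above; `to_wsgi1` is a hypothetical name, and the body is bytes, as Python 3 servers require.

```python
def to_wsgi1(tuple_app):
    """Hypothetical adapter: run a (status, headers, body) app under start_response."""
    def wsgi_app(environ, start_response):
        status, headers, body = tuple_app(environ)
        start_response(status, headers)
        return body
    return wsgi_app

def simple_app(environ):
    """Tuple-style app from the notes, with a bytes body."""
    return ("200 OK", [("Content-type", "text/plain")], [b"Hello world!\n"])

application = to_wsgi1(simple_app)
```

`application` could then be served by any WSGI 1.0 server, e.g. `wsgiref.simple_server.make_server`.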


Topic: wsgi.input
-----------------

Some authors have found problems with the file-like design of
wsgi.input, specifically, that the rfile is not rewindable/seekable; if
a middleware component reads part of the stream, a later app that tries
to read the full stream will not see the bytes already consumed.

Discussion centered around replacing the current file-like wsgi.input
with an iterable. Origin servers would be responsible for chunking the
request body, and each consumer would be required to re-stream each
chunk.
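Under that proposal, a middleware that needs to inspect the body would wrap the iterable and yield each chunk onward. The sketch assumes `wsgi.input` is an iterable of bytes and apps return `(status, headers, body)`; `body_size_middleware`, `restream`, and the `X-Body-Bytes` header are all hypothetical.

```python
def restream(chunks, counter):
    """Hypothetical helper: inspect each request-body chunk, then pass it on unchanged."""
    for chunk in chunks:
        counter["bytes"] += len(chunk)
        yield chunk

def body_size_middleware(app):
    """Hypothetical middleware: count request-body bytes while re-streaming them."""
    def wrapper(environ):
        counter = {"bytes": 0}
        environ = dict(environ)
        environ["wsgi.input"] = restream(environ["wsgi.input"], counter)
        status, headers, body = app(environ)
        # Only accurate if the app consumed the whole input before returning.
        return status, headers + [("X-Body-Bytes", str(counter["bytes"]))], body
    return wrapper

def echo_app(environ):
    data = b"".join(environ["wsgi.input"])
    return "200 OK", [("Content-Type", "application/octet-stream")], [data]
```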

This topic didn't seem to have the strong consensus that the previous
ones did. My personal feeling was that we need more time to tease out
the new problems this approach would raise (perhaps with an
implementation or two).

Making this change would, however, solve a related issue: how apps can
safely read the full wsgi.input when the client did not supply a
Content-Length (i.e. the servers would handle all that).
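For contrast, this is roughly what an app must do with the current file-like `wsgi.input`: trust CONTENT_LENGTH and never read past it, since reading further may block. `read_request_body` is a hypothetical helper; a missing or malformed CONTENT_LENGTH is treated as an empty body, which is exactly the case that leaves bodies sent without a Content-Length unreadable.

```python
import io

def read_request_body(environ, chunk_size=8192):
    """Hypothetical helper: read at most CONTENT_LENGTH bytes from wsgi.input."""
    try:
        remaining = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        remaining = 0  # malformed header: treat as no body
    stream = environ["wsgi.input"]
    parts = []
    while remaining > 0:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:  # client went away before sending everything
            break
        parts.append(chunk)
        remaining -= len(chunk)
    return b"".join(parts)

# Usage: an in-memory stream standing in for the socket, with extra bytes after the body.
env = {"CONTENT_LENGTH": "5", "wsgi.input": io.BytesIO(b"hello, next request")}
body = read_request_body(env)  # b"hello"
```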


Other topics raised but deferred
--------------------------------

 * Standard Request/Response objects. There was one call for, and many
   against, this.
 * Lots of little changes: the server's supported HTTP version,
   file_wrapper edge cases, etc.
 * Python 3, and the scheduling of WSGI improvements
 * Asynchronous WSGI support. Mostly non-existent. Fix it? Fork it?
   Drop it?
 * Lifecycle methods (start/stop/etc event API driven by the container)
 * Remove app_iter.close?


Robert Brewer
fuman...@aminus.org


Re: [Web-SIG] how to test hunging socket ?

2009-01-31 Thread Robert Brewer
William Dodé wrote:
> On 30-01-2009, Ian Bicking wrote:
> > On Fri, Jan 30, 2009 at 3:48 PM, William Dode wrote:
> >
> >> Fine, i should definitely give it a try.
> >>
> >> If my app is not thread safe but respond in a decent time, can i benefit
> >> from a multithread server (for a socket problem) if i use a lock for
> >> every page like that :
> >>
> >> I use webob...
> >>
> >
> > If your app isn't threadsafe, you should use a multiprocess server.
> > mod_wsgi has options for this, and flup has forking options (you'd use flup
> > behind Apache or another server).
> 
> Yes, i also could use an async server. But i would like to identify (and
> reproduce) exactly the problem.
> I also use a lot of cached data in my app. Anyway i have to make it
> thread-safe...



Try http://www.aminus.net/wiki/PyConquer to help identify the problem.
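On the lock question quoted above: a lock around each request does make a non-thread-safe app safe under a threaded server, though the app itself then runs one request at a time. A minimal sketch; `serialized` is a hypothetical helper. Because it buffers the response body inside the lock, sending that body to a slow client happens after the lock is released.

```python
import threading

def serialized(app):
    """Hypothetical wrapper: let only one thread at a time run a non-thread-safe WSGI app."""
    lock = threading.Lock()

    def wrapper(environ, start_response):
        with lock:
            result = app(environ, start_response)
            try:
                # Iterate inside the lock too: producing the body may touch shared state.
                return list(result)
            finally:
                close = getattr(result, "close", None)
                if close is not None:
                    close()

    return wrapper
```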




Robert Brewer
fuman...@aminus.org


Re: [Web-SIG] Implementing File Upload Size Limits

2008-11-27 Thread Robert Brewer
Graham Dumpleton wrote:
> 2008/11/28 Robert Brewer <[EMAIL PROTECTED]>:
> > CherryPy's wsgiserver will read any remaining request body (which the
> > application hasn't read) before sending response headers.
> 
> A WSGI application could technically want to send response headers and
> only then read remaining request content. I don't believe there is
> anything in the WSGI specification which prevents that. If you are
> discarding the request content as soon as response headers are
> generated, that could technically be a problem for some use cases,
> even if they may be obscure.

I'll look into that further.

> I cant tell from looking at latest CherryPy WSGI server code as has
> been changed since last I looked at it and haven't yet had time to
> grok it and run some tests, but previously in respect of where WSGI
> specification says:
> 
> """The server is not required to read past the client's specified
> Content-Length, and is allowed to simulate an end-of-file condition if
> the application attempts to read past that point."""
> 
> the CherryPy WSGI server code chose NOT to simulate an end-of-file
> condition. This was the case as the amount of data read from
> wsgi.input was never tracked. This meant that if application did try
> and read more content than available and request pipelining occurring
> then the read would hang as would not get an empty string returned as
> would be normal for end-of-file condition for file like object.
> 
> If the code is still behaving this way, then it wouldn't be possible
> for it to discard remaining input as how much was read wasn't tracked.
> 
> Looking at latest code I do note the presence of a wrapper around
> socket used for wsgi.input, but haven't been able to work out yet
> whether it returns a traditional empty string as end-of-file
> condition, or whether it is going to instead raise your
> MaxSizeExceeded exception and thus not be file like in it behaviour.

It still raises MaxSizeExceeded.

> Can you perhaps explain what is going to happen when an attempt is
> made to read more content than what was available and whether it is
> actually going to raise an exception rather than just return an empty
> string like file like objects would.
> 
> Personally I think that that part of WSGI specification should be
> amended such that it is required that an end-of-file condition MUST be
> indicated using an empty string just like with normal file like
> objects. Just this one change would mean that one could call read()
> with no arguments and have it return all input, whereas at the moment
> WSGI specification does allow argument to read() be optional.
> 
> This would actually negate the whole need for applications to even
> check/use CONTENT_LENGTH except for situations where it mattered such
> as 413 response or where how it decided to process it was dependent on
> size. That is, to get all request content you would just call read()
> with no argument. If you wanted to process it in chunks, then it would
> just loop reading a set chunk size until empty string returned and it
> wouldn't need to track how much it read and short read the last chunk.
> If applications worked this way then one could handle mutating input
> filters that changed amount of request content, ie., decompression of
> data, plus could handle chunked transfer encoding on request content
> in a reasonable way without having to read it all in and buffer it
> just to work out CONTENT_LENGTH.
> 
> Up till now, the only major WGSI server (ignoring wsgiref perhaps) I
> knew of which didn't allow read() with no argument or which didn't
> simulate end-of-file through empty string being returned was CherryPy
> WSGI server. Now its code has been changed, but not sure if it still
> does that or whether it has done something totally different to
> everything else by raising an exception instead.

I'd be open to changing it to EOF instead of error; amending the WSGI
spec would be nice too.
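A sketch of the EOF behaviour under discussion: a hypothetical server-side `LimitedInput` that returns `b""` once Content-Length bytes have been read (instead of raising), and the plain read loop an app could then use without tracking sizes. Neither is CherryPy's actual code.

```python
import io

class LimitedInput:
    """Hypothetical server-side wrapper: simulate EOF (b"") after Content-Length bytes."""

    def __init__(self, stream, content_length):
        self._stream = stream
        self._remaining = content_length

    def read(self, size=-1):
        if self._remaining <= 0:
            return b""  # EOF, even if the socket holds a pipelined next request
        if size is None or size < 0 or size > self._remaining:
            size = self._remaining
        data = self._stream.read(size)
        self._remaining -= len(data)
        return data

def iter_body(stream, chunk_size=8192):
    """Hypothetical app-side loop: read until read() returns b""."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk

# Usage: the socket stand-in holds this request's 5-byte body plus a pipelined request.
sock = io.BytesIO(b"helloGET /next HTTP/1.1\r\n")
body = b"".join(iter_body(LimitedInput(sock, 5)))  # b"hello"
```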


Robert Brewer
[EMAIL PROTECTED]


Re: [Web-SIG] Implementing File Upload Size Limits

2008-11-27 Thread Robert Brewer
Brian Smith wrote:
> Randy Syring wrote:
> > Hopefully you can clarify something for me.  Lets assume that the
> > client does not use '100 Continue' but sends data immediately, after
> > sending the headers.  If the server never reads the request content,
> > what does that mean exactly?  Does the data get transferred over the
> > wire but then discarded or does the client not get to send the data
> > until the server reads the request body?  I.e. the client tries to
> > "send" it, but the content isn't actually transferred across the
> > wire until the server reads it.  I am just wondering if there
> > is a buffer or queue or something between the server and the client
> > that allows data to be transferred even if the server doesn't
> > "read" the request body.  Or, is it just like a straight pipe
> > where one end (the client) can't push data through until the other
> > end (the server) reads it.
> 
> Under Apache CGI or mod_wsgi, in many situations you will get a deadlock in
> this scenario. The input and the output are buffered separately both of
> those buffers can fill up. Neither mod_wsgi nor mod_cgid implement the
> non-blocking I/O logic needed to prevent deadlocks. I heard (but did not
> verify) that mod_fastcgi does not have this deadlocking problem. The sizes
> of the buffers determines the size of the inputs and outputs needed to cause
> a deadlock. On some platforms (e.g. Mac OS X), they are only 8K by default.
> 
> Therefore, for maximum portability, a WSGI application should ALWAYS consume
> the *whole* request body if it wants to avoid the deadlock using the
> reference WSGI adapter in PEP 333 or mod_wsgi.

Indeed. This is covered in RFC 2616 Section 8.2.3:

If an origin server receives a request that does not include an
Expect request-header field with the "100-continue" expectation,
the request includes a request body, and the server responds
with a final status code before reading the entire request body
from the transport connection, then the server SHOULD NOT close
the transport connection until it has read the entire request,
or until the client closes the connection. Otherwise, the client
might not reliably receive the response message. However, this
requirement is not be construed as preventing a server from
defending itself against denial-of-service attacks, or from
badly broken client implementations.

CherryPy's wsgiserver will read any remaining request body (which the
application hasn't read) before sending response headers.
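Draining might look like the sketch below. The `max_drain` cap is an assumption reflecting the RFC's allowance for defending against denial-of-service: past the cap, the function reports failure so the server can close the connection instead of reading forever. `drain_unread_body` is hypothetical, not CherryPy's implementation.

```python
import io

def drain_unread_body(stream, remaining, max_drain=1 << 20, chunk_size=8192):
    """Hypothetical server helper: discard unread body bytes, up to max_drain.

    Returns True if the body was fully consumed, False if the client stopped
    early or the cap was hit (the server should then close the connection).
    """
    while remaining > 0 and max_drain > 0:
        chunk = stream.read(min(chunk_size, remaining, max_drain))
        if not chunk:
            return False
        remaining -= len(chunk)
        max_drain -= len(chunk)
    return remaining == 0

# Usage: 100 unread bytes, comfortably under the cap.
drained = drain_unread_body(io.BytesIO(b"x" * 100), remaining=100)  # True
```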


Robert Brewer
[EMAIL PROTECTED]


Re: [Web-SIG] Revising environ['wsgi.input'].readline in the WSGI specification

2008-11-17 Thread Robert Brewer
Ian Bicking wrote:
> Manlio Perillo wrote:
> > Ian Bicking wrote:
> >> [...]
> >> We need to propose a change to the WSGI specification.  I propose, in
> >> "Input and Error Streams"
> >> (http://www.python.org/dev/peps/pep-0333/#input-and-error-streams) we
> >> change it to have "readline(hint)" and expand Note 3 to include
> >> readline as well as readlines, removing Note 2.  Also I suppose some
> >> sort of change note in the specification?
> >>
> >> Does this sound like a sufficient change to the spec, and are there
> >> any objections to the change?
> >>
> >
> > Fine for me, but of course we need to do this as:
> > 1) Errata to WSGI 1.0
> > or
> > 2) WSGI 1.1
> > or
> > 3) WSGI 2.0
> >
> > You can't just modify the current WSGI 1.0 spec.
> >
> > I'm for 2), with the other clarifications about WSGI we have discussed
> > in the past.
> 
> I'm for 1.  What other clarifications were you thinking of?

PLEASE don't ask, don't tell. Let's not complicate this change by
conflating it with others yet again.
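For context on why the size argument matters: `cgi.FieldStorage` reads bodies line by line and passes a size to `readline`, so a single enormous "line" can't exhaust memory. The sketch imitates that pattern against any stream whose `readline` accepts the argument, such as `io.BytesIO`; `read_lines_bounded` is hypothetical and the 64 KiB default is an assumption.

```python
import io

def read_lines_bounded(stream, max_line=1 << 16):
    """Hypothetical helper: read every line, never asking readline() for more than max_line."""
    lines = []
    while True:
        line = stream.readline(max_line)  # the size argument the proposed change would require
        if not line:
            break
        lines.append(line)  # an over-long line comes back in max_line-sized pieces
    return lines

# Usage: with max_line=4, the 7-byte second line is split.
pieces = read_lines_bounded(io.BytesIO(b"ab\ncdefgh\n"), max_line=4)
# [b"ab\n", b"cdef", b"gh\n"]
```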


Robert Brewer
[EMAIL PROTECTED]


Re: [Web-SIG] Revising environ['wsgi.input'].readline in the WSGI specification

2008-11-16 Thread Robert Brewer
+1

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:web-sig-[EMAIL PROTECTED] On Behalf Of Ian Bicking
> Sent: Sunday, November 16, 2008 10:06 AM
> To: Graham Dumpleton
> Cc: Web SIG
> Subject: Re: [Web-SIG] Revising environ['wsgi.input'].readline in the WSGI specification
> 
> Graham Dumpleton wrote:
> > 2008/11/16 Ian Bicking <[EMAIL PROTECTED]>:
> >> We need to make a revision to the WSGI spec to say that
> >> environ['wsgi.input'].readline takes an optional size argument.  It always
> >> does in practice (except in wsgiref.validate.validator, rendering that
> >> validator useless), and is required to in practice, because everyone uses
> >> cgi.FieldStorage, and it passes in that argument.
> >
> > This has been brought up numerous times before. There are other things
> > about wsgi.input that really need to be changed as well to make it
> > more useful. When I have pushed for revised specification before I
> > could never get enough interest in it from the people that most would
> > perceive are the ones who oversee the PEP.
> 
> Yes, this has been passed over before.  To resolve this, let's just not
> pass it over this time?  This is a relatively small change to the WSGI
> spec, because it represents standard practice -- this change is simply
> getting the spec in line with implementations.
> 
> --
> Ian Bicking : [EMAIL PROTECTED] : http://blog.ianbicking.org

Re: [Web-SIG] passing data to python script to generate chart dynamically...

2008-08-10 Thread Robert Brewer
[EMAIL PROTECTED] wrote:
> Is it possible to pass data from an html command to a python script as
> an argument, to create a chart from the provided data?

http://en.wikipedia.org/wiki/Query_string is a good overview of how to do this 
via GET in the URL, with a mention of how to do it in the request body with 
POST. If you have further questions, you should ask at comp.lang.python: 
http://mail.python.org/mailman/listinfo/python-list
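A minimal sketch of the GET/POST approach as a Python 3 WSGI application. The `values` parameter name and its comma-separated format are hypothetical, and the app only echoes the parsed numbers where a real one would render the chart.

```python
from urllib.parse import parse_qs


def chart_data_app(environ, start_response):
    if environ.get("REQUEST_METHOD") == "POST":
        # POST: the same key=value pairs arrive in the request body.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        params = parse_qs(environ["wsgi.input"].read(length).decode("latin-1"))
    else:
        # GET: data arrives in the query string, e.g. /chart?values=3,1,4
        params = parse_qs(environ.get("QUERY_STRING", ""))
    raw = params.get("values", [""])[0]
    values = [float(v) for v in raw.split(",") if v]
    # A real app would draw the chart here; this just reports the data.
    body = repr(values).encode("ascii")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]
```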



Robert Brewer
[EMAIL PROTECTED]



Re: [Web-SIG] Fwd: wsgiref.simple_server slow on slow network

2008-07-22 Thread Robert Brewer
A tcpdump would be more helpful at this point, but I'm not sure the ML
is the right place for that.


Robert Brewer
[EMAIL PROTECTED]


> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:web-sig-
> [EMAIL PROTECTED] On Behalf Of Tibor Arpas
> Sent: Tuesday, July 22, 2008 8:51 AM
> To: Jean-Paul Calderone
> Cc: web-sig@python.org
> Subject: Re: [Web-SIG] Fwd: wsgiref.simple_server slow on slow network
> 
> Mhm.. No, That doesn't seem to be THE reason. Paste is HTTP/1.0 too.
> See the detailed server-client communication below.  BTW the VPN is
> not that slow. It's 4Mb/s with pings of 5-7 ms. Thanks guys for the
> suggestions, I appreciate it. If you run out of them, the most
> effective way would probably be to strip down the script even further
> and use the underlying lower level libraries directly. I'll try to get
> back to it later once I have more time...
> 
> Benchmarking 10.0.0.230 (be patient)...INFO: POST header ==
> ---
> GET / HTTP/1.0
> Host: 10.0.0.230:8079
> User-Agent: ApacheBench/2.3
> Accept: */*
> 
> 
> ---
> LOG: header received:
> HTTP/1.0 200 OK
> Server: PasteWSGIServer/0.5 Python/2.5.1
> Date: Tue, 22 Jul 2008 15:36:53 GMT
> content-type: text/html
> Content-Length: 1
> 
> *
> LOG: Response code = 200
> ..done
> 
> ===
> Benchmarking 10.0.0.230 (be patient)...INFO: POST header ==
> ---
> GET / HTTP/1.0
> Host: 10.0.0.230:8078
> User-Agent: ApacheBench/2.3
> Accept: */*
> 
> 
> ---
> LOG: header received:
> HTTP/1.0 200 OK
> 
> LOG: header received:
> HTTP/1.0 200 OK
> Date: Tue, 22 Jul 2008 15:33:57 GMT
> Server: WSGIServer/0.1 Python/2.5.1
> content-type: text/html
> Content-Length: 1
> 
> *
> LOG: Response code = 200
> ..done
> 
> 
> On Tue, Jul 22, 2008 at 3:54 PM, Jean-Paul Calderone
> <[EMAIL PROTECTED]> wrote:
> > On Tue, 22 Jul 2008 09:58:06 +0200, Tibor Arpas <[EMAIL PROTECTED]> wrote:
> >>
> >> I added the Content-Length and no difference. Important thing I
> >> noticed is that I get the same request/response rate with only ONE
> >> byte of content. So it looks like a constant delay of 3 seconds per
> >> request..
> >
> > wsgiref seems to run an HTTP 1.0 server without persistent connections.
> > Perhaps paste is running an HTTP server with persistent connections.
> > High latency will tank performance of TCP connections.
> >
> > Jean-Paul


Re: [Web-SIG] Fwd: wsgiref.simple_server slow on slow network

2008-07-21 Thread Robert Brewer
Tibor Arpas wrote:
> I'm quite new to python and I ran into a performance problem with
> wsgiref.simple_server. I'm running this little program.
> 
> from wsgiref import simple_server
> 
> def app(environ, start_response):
>     start_response('200 OK', [('content-type', 'text/html')])
>     return ['*'*5]
> 
> httpd = simple_server.make_server('',8080,app)
> try:
>     httpd.serve_forever()
> except KeyboardInterrupt:
>     pass
> 
> 
> I get many hundreds of responses/second on my local computer, which is
> fine.
> But when I access this server through our VPN it performs very badly.
> 
> I get 0.33 requests/second as compared to 7 responses/second when
> accessing 50kB static file served by IIS.
> 
> I also tried the same little program using paste.httpserver and that
> version works fast as expected.
> 
> I cannot really understand this behavior. My only thought is that the
> wsgiref version is sending the data in many chunks, and therefore the
> latency of the VPN comes into play. But I don't really know how to
> test this.

One possible answer is that wsgiref doesn't disable the Nagle algorithm
[1].
Try changing WSGIServer.server_bind to read:

    def server_bind(self):
        """Override server_bind to store the server name."""
        import socket
        self.socket.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        HTTPServer.server_bind(self)
        self.setup_environ()
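A sketch of the same idea that avoids editing wsgiref in place, assuming Python 3: a subclass handed to make_server() via its server_class argument. As a variation on the patch above, it sets TCP_NODELAY on each accepted connection rather than on the listening socket, so it does not rely on the option being inherited. NoDelayWSGIServer is a hypothetical name.

```python
import socket
from wsgiref.simple_server import WSGIServer, make_server


class NoDelayWSGIServer(WSGIServer):
    """WSGIServer that disables Nagle on every accepted connection."""

    def get_request(self):
        conn, addr = super().get_request()
        # TCP_NODELAY: send small writes immediately instead of coalescing them.
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        return conn, addr


def app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"*" * 5]


if __name__ == "__main__":
    httpd = make_server("", 8080, app, server_class=NoDelayWSGIServer)
    try:
        httpd.serve_forever()
    except KeyboardInterrupt:
        pass
```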



Robert Brewer
[EMAIL PROTECTED]

[1] http://en.wikipedia.org/wiki/Nagle's_algorithm



Re: [Web-SIG] Alternative to threading.local, based on the stack

2008-07-08 Thread Robert Brewer
Donovan Preston wrote:
> On Jul 8, 2008, at 2:31 PM, Phillip J. Eby wrote:
> > Er, and how do you propose people *access* that interface rather
> > than a specific implementation of it?  Wouldn't we need to pass it
> > in the environ, thereby rendering the whole thing even more
> > obviously moot?  :)
> 
> You're right. A standard specific implementation is what I am
> suggesting. Here, code should help:
> 
> 
> ## requestlocal.py
> 
> ## use thread-local storage as the default
> from threading import local
> 
> def set_local_implementation(imp):
>     global local
>     local = imp
> 
> 
> If a wsgi server wants to implement request-local storage by using the
> environ, it would call set_local_implementation with an imp function
> that closes over the environ for each request.

And what package does requestlocal.py live in?


Robert Brewer
[EMAIL PROTECTED]



Re: [Web-SIG] Alternative to threading.local, based on the stack

2008-07-07 Thread Robert Brewer
Matt Goodall wrote:
> Yes, it can be tedious but I believe explicit arg passing
> is necessary to make code readable, testable and reusable.
> ...
> I've made the mistake of relying on magic contexts in the
> past. I'm still trying to fix things.

Can you elaborate?


Robert Brewer
[EMAIL PROTECTED]



Re: [Web-SIG] Alternative to threading.local, based on the stack

2008-07-04 Thread Robert Brewer
Benji York wrote:
> On Fri, Jul 4, 2008 at 9:23 AM, Iwan Vosloo <[EMAIL PROTECTED]> wrote:
> > On Fri, 2008-07-04 at 13:39 +0100, Matt Goodall wrote:
> >> The ideal solution is, of course, to pass everything around to whatever
> >> needs it. However, that's really tedious at times.
> >>
> >> Whatever the architecture of the web server there is always a request
> >> or, in case of WSGI, an env dict. Therefore, request-scope objects
> >> should be associated with the request.
> >
> > True, but even passing a request or env dict around to everyone gets
> > tedious don't you think?
> 
> It can.  Zope 3 makes a pretty good compromise here.  The "top level"
> object involved in handling the request -- a view -- gets the request
> object explicitly passed as a parameter.  If the view wants to pass the
> request to function calls or other objects, then it's free to do so.
> 
> But, if at some point you find yourself without a reference to the
> current request and really need it, you can get it "out of thin air" by
> calling (essentially) get_request().
> 
> The Zope 3 publisher processes requests using a thread pool, so
> get_request() is implemented by stashing the request object in the
> thread-local storage prior to processing the request and digging it back
> out if requested.
> 
> Other implementations could store the request somewhere else, but the
> idea is the same.

CherryPy does something similar. The "top level" object involved in
handling the request -- cherrypy.serving -- gets the request and response
objects set as attributes. But instead of calling get_request() as in
Zope 3, there are proxy objects sitting at cherrypy.request and
cherrypy.response which shuttle getattr and setattr to
cherrypy.serving.request/response. That allows app code to just "import
cherrypy" and have access everywhere.

Now, cherrypy.serving _is_ a threadlocal object. But I don't imagine it
would be difficult for a non-threaded HTTP server to replace
cherrypy.serving with some other-context-local if they liked.
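A rough sketch of that proxy pattern, assuming a plain threading.local as the per-request store. The names serving, request and _ThreadLocalProxy only mimic CherryPy's; this is not its actual implementation.

```python
import threading

# Stand-in for cherrypy.serving: attributes set here are per-thread.
serving = threading.local()


class _ThreadLocalProxy:
    """Forward getattr/setattr to getattr(serving, attrname)."""

    def __init__(self, attrname):
        # Use object.__setattr__ so this bypasses the forwarding below.
        object.__setattr__(self, "_attrname", attrname)

    def __getattr__(self, name):
        # Only called for names not found on the proxy itself.
        return getattr(getattr(serving, self._attrname), name)

    def __setattr__(self, name, value):
        setattr(getattr(serving, self._attrname), name, value)


# Module-level object that app code can import once and use everywhere.
request = _ThreadLocalProxy("request")


class Request:
    def __init__(self, path):
        self.path = path


def handle(path):
    serving.request = Request(path)  # what the server does per request
    return request.path              # what app code does
```

Swapping `serving` for some other context-local object (per greenlet, per coroutine) would leave the app-facing `request` proxy unchanged.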


Robert Brewer
[EMAIL PROTECTED]



Re: [Web-SIG] urlparse method behaviour when handing abs/rel urls

2008-06-27 Thread Robert Brewer
Fred Drake wrote:
> On Fri, Jun 27, 2008 at 3:01 PM, O.R.Senthil Kumaran
> <[EMAIL PROTECTED]> wrote:
> > BTW, commonly when someone writes 'www.python.org', we tend to
> > understand that he is referring to net_loc. Is it not?
> > And also, when we type 'www.python.org' at Address Location in the
> > Browser, it automatically gets translated to http://www.python.org as
> > the full url and www.python.org becomes net_loc in this case.
> 
> There are two cases here:
> 
> 1. Relative URLs in a context that has a base URL (inside a resource
> loaded from a URL, or in an (X)HTML document that includes a <base>
> element).
> 
> 2. Abbreviated URLs in a user interface that implies no context with a
> base URL (like the browser's address bar).
> 
> I'd suggest that these are completely different.  urlsplit and
> urlparse support 1.  If we want the second, that should be a separate
> function.  It would be reasonable to add that to the urlparse module
> (urllib.parse in Python 3).

There's even a 3rd case: HTTP's Request-URI. For example, '//path' must
be treated as an abs_path consisting of two path_segments ['', 'path'],
not a net_loc, since the Request-URI must be one of ("*" | absoluteURI |
abs_path | authority).
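A small sketch of why this matters, using Python 3's urllib.parse: urlsplit() reads '//path' as a network location, so a server splitting a Request-URI has to branch on its form first. split_request_uri is a hypothetical helper that tells the four Request-URI forms apart only by their leading characters.

```python
from urllib.parse import urlsplit


def split_request_uri(uri):
    """Return (scheme, authority, path, query) for an HTTP Request-URI."""
    if uri == "*":
        return ("", "", "*", "")
    if uri.startswith("/"):
        # abs_path: never has a net_loc, even when it begins with '//'.
        path, _, query = uri.partition("?")
        return ("", "", path, query)
    if "://" in uri:
        # absoluteURI
        parts = urlsplit(uri)
        return (parts.scheme, parts.netloc, parts.path, parts.query)
    # Otherwise treat it as the authority form (used by CONNECT).
    return ("", uri, "", "")
```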


Robert Brewer
[EMAIL PROTECTED]

See http://www.cherrypy.org/browser/branches/815-urljoin/cherrypy/wsgiserver/__init__.py#L247
for an implementation.


Re: [Web-SIG] [proposal] merging jsonrpc into xmlrpc

2008-04-08 Thread Robert Brewer
Ian Bicking wrote:
> Alan Kennedy wrote:
> > Perhaps some pythonista from Web-SIG is most appropriate to advise
> > how JSON-RPC should move forward? After all, we're more accustomed to
> > server-side stuff than those javascript folks ;-)
> 
> Let it die?  It is more complicated than necessary, when instead you
> could just make each function a URL of its own, and POST the arguments
> and get back the response, with 500 Server Error for errors.  It's hard
> to spec that up because it's too simple.

Yup. We just built one of those at work, with the added bonus that a GET
of the same URI returns an HTML form for submitting the right JSON in a
POST. The HTML also shows function metadata (args and return type). A
GET on the parent path returns a bunch of links to GET the children.
Handy and almost RESTful if you squint and call each function a
"resource". I'd still rather have real resources that align with
application state, but it makes for a good transition strategy from
existing RPC mechanisms.
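A minimal Python 3 sketch of the one-URL-per-function idea: POST a JSON object of keyword arguments, get JSON back, and get a 500 on any error; GET describes the function as HTML. The FUNCTIONS registry and the add() function are hypothetical, and the GET page only shows the signature where a fuller version would also render a submission form.

```python
import html
import inspect
import json


def add(a, b):
    """Return a + b."""
    return a + b


# Hypothetical registry: each exposed function gets its own URL.
FUNCTIONS = {"/add": add}


def rpc_app(environ, start_response):
    func = FUNCTIONS.get(environ.get("PATH_INFO", ""))
    if func is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not Found"]
    if environ.get("REQUEST_METHOD") == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        try:
            kwargs = json.loads(environ["wsgi.input"].read(length) or b"{}")
            body, status = json.dumps(func(**kwargs)), "200 OK"
        except Exception as exc:
            # Any failure maps to a plain 500 with a JSON error message.
            body = json.dumps({"error": str(exc)})
            status = "500 Internal Server Error"
        ctype = "application/json"
    else:
        # GET describes the function (name and arguments) as HTML.
        sig = html.escape(func.__name__ + str(inspect.signature(func)))
        body = ("<html><body><h1>%s</h1>"
                "<p>POST a JSON object of arguments here.</p></body></html>" % sig)
        status, ctype = "200 OK", "text/html"
    data = body.encode("utf-8")
    start_response(status, [("Content-Type", ctype),
                            ("Content-Length", str(len(data)))])
    return [data]
```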


Robert Brewer
[EMAIL PROTECTED]



Re: [Web-SIG] Time a for JSON parser in the standard library?

2008-04-05 Thread Robert Brewer
John Millikin wrote:
> On Sat, Apr 5, 2008 at 7:01 PM, Robert Brewer <[EMAIL PROTECTED]> wrote:
> > Re: Representation of Fractional Numbers, there are two solutions. If you
> > return decimals, people using JS on the other end are going to call 
> > float(d).
> > If you return floats, people not using JS on the other end are going to go
> > use a different library. I suggest the former is more acceptable than the
> > latter for a stdlib offering. Allowing the caller of parse() to choose
> > would be even better.
> 
> I don't understand what you mean, here. generate ([decimal.Decimal ('1.1')])
> -> '[1.1]', so a JavaScript user calling eval() on it would get a standard
> JavaScript float object without having to call float() explicitly.

Sorry, I wasn't describing what anyone would do in Javascript. Pythonistas 
receiving JSON numbers from a JS *sender*, who want Python floats, can call 
float(d) if they like if you hand them a Decimal object. Annoying but easy. 
People receiving JSON numbers from, say, a Python sender, can't call Decimal(f) 
if you hand them a float instance, at least not reliably. So they'll either go 
use some other jsonlib (bad) or start passing numbers in strings (worse).
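The asymmetry is easy to see in Python 3: turning a Decimal into a float is one call, while turning a float back into a Decimal exposes the binary rounding the float already carries.

```python
from decimal import Decimal

received = Decimal("1.1")   # what a Decimal-returning parser would hand back
as_float = float(received)  # trivial for anyone who wants a float

# The other direction: Decimal(float) keeps the float's exact binary value,
# which is not the decimal numeral that was originally sent.
from_float = Decimal(1.1)
```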


Robert Brewer
[EMAIL PROTECTED]



Re: [Web-SIG] Time a for JSON parser in the standard library?

2008-04-05 Thread Robert Brewer
John Millikin wrote:
> I've written a rough draft of a PEP for standard library inclusion,
> attached to this email. Comments/improvements welcome - I tried to
> leave most of the differences between modules in the "Issues" section.

Re: Representation of Fractional Numbers, there are two solutions. If you 
return decimals, people using JS on the other end are going to call float(d). 
If you return floats, people not using JS on the other end are going to go use 
a different library. I suggest the former is more acceptable than the latter 
for a stdlib offering. Allowing the caller of parse() to choose would be even 
better.
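The json module that later entered the standard library (Python 2.6+) does let the caller choose, through its parse_float hook. A short Python 3 sketch (the payload is made up):

```python
import json
from decimal import Decimal

payload = '{"price": 1.10, "qty": 3}'

default = json.loads(payload)                     # JSON fractions -> float
exact = json.loads(payload, parse_float=Decimal)  # JSON fractions -> Decimal
```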


Robert Brewer
[EMAIL PROTECTED]



Re: [Web-SIG] Clarifications on Python 3.0 and WSGI.

2008-03-25 Thread Robert Brewer
Graham Dumpleton wrote:
> 3. When running under Python 3, servers MUST provide CGI HTTP
> variables as strings, decoded from the headers using HTTP standard
> encodings (i.e. latin-1 + RFC 2047)
> 
> Can someone give a practical example of where RFC 2047 fits into this
> and how one is meant to handle it?

Sure. According to RFC 2616 sec 2.2:

   Words of *TEXT MAY contain characters from character sets other than
   ISO-8859-1 only when encoded according to the rules of RFC 2047.

From CP's test suite [1]:

    def ifmatch(self):
        val = cherrypy.request.headers['If-Match']
        cherrypy.response.headers['ETag'] = val
        return repr(val)

    ...

        # Test RFC-2047-encoded request and response header values
        c = "=E2=84=ABngstr=C3=B6m"
        self.getPage("/headers/ifmatch", [('If-Match', '=?utf-8?q?%s?=' % c)])
        self.assertBody("u'\\u212bngstr\\xf6m'")
        self.assertHeader("ETag", '=?utf-8?b?4oSrbmdzdHLDtm0=?=')

That is, CherryPy-the-app-framework decodes the request header
'If-Match' from '=?utf-8?q?=E2=84=ABngstr=C3=B6m?=' to
u'\u212bngstr\xf6m'. See [2] for where that happens. PEP 333 only
talks about 2047 encoding, not decoding, and also says "All
encoding/decoding must be handled by the application", so we made the CP
WSGI server pass 2047-encoded request headers through unmodified.
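On the application side, Python 3's standard email.header module can do this decoding, plus the matching encoding for response headers. The helper names below are hypothetical; the sample values are the ones from the CherryPy test above.

```python
from email.header import Header, decode_header, make_header


def decode_rfc2047(value):
    """Turn a header value containing RFC 2047 encoded-words into a str."""
    return str(make_header(decode_header(value)))


def encode_rfc2047(text, charset="utf-8"):
    """Encode a str as RFC 2047 encoded-word(s) for a response header."""
    return Header(text, charset).encode()


if __name__ == "__main__":
    raw = "=?utf-8?q?=E2=84=ABngstr=C3=B6m?="
    print(decode_rfc2047(raw))
    print(encode_rfc2047(decode_rfc2047(raw)))
```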

FYI, there's been a lot of talk lately on the http-bis WG about using
some mechanism other than RFC 2047 in the future.


Robert Brewer
[EMAIL PROTECTED]

[1] http://www.cherrypy.org/browser/trunk/cherrypy/test/test_core.py#L867
[2] http://www.cherrypy.org/browser/trunk/cherrypy/_cprequest.py#L620



Re: [Web-SIG] Time a for JSON parser in the standard library?

2008-03-24 Thread Robert Brewer
Bob Ippolito wrote: 
> I chose to only support the basic types out of the box, but you can
> specialize the decoder and encoder any way you want, e.g. to provide a
> JSON serialization scheme where you get a deque out if you put one in.
> I can imagine scenarios where you would want to encode decimal as a
> string for example, because the other end is probably going to parse
> JSON numbers as doubles.

Thanks for allowing that; the last 3 JSON projects I've been involved
with have had Python on both ends, and always passed Decimal, never
floating-point.
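With the stdlib json module's default= hook, shipping Decimals as strings looks roughly like this. encode_decimal is a hypothetical helper, and the strings-on-the-wire convention is an assumption both ends have to agree on.

```python
import json
from decimal import Decimal


def encode_decimal(obj):
    # Ship Decimals as JSON strings so no side rounds them through a double.
    if isinstance(obj, Decimal):
        return str(obj)
    raise TypeError("%s is not JSON serializable" % type(obj).__name__)


wire = json.dumps({"total": Decimal("19.99")}, default=encode_decimal)
total = Decimal(json.loads(wire)["total"])  # receiver converts back explicitly
```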


Robert Brewer
[EMAIL PROTECTED]



Re: [Web-SIG] Time a for JSON parser in the standard library?

2008-03-20 Thread Robert Brewer
Bob Ippolito wrote:
> On Thu, Mar 20, 2008 at 5:50 PM, Robert Brewer <[EMAIL PROTECTED]> wrote:
> > Deron Meranda wrote:
> > > And even then, we're not just talking about a JSON parser.
> > > We're all doing more than that; we're mapping Python to JSON.
> > > And there is no definitive spec for that.  Just look at my
> > > numbers tests; there are a lot of differences in how numeric
> > > mappings are done, but yet many of them can be arguably
> > > "correct" while still doing things differently.
> >
> >  ...which IMO argues that any json implementation that goes
> > in the stdlib needs to at least allow access to the raw bytes
> > in both directions. For example, if you really want JSON
> > numerals to become Python decimals, you shouldn't be forced
> > to lose information just because the json decoder was only
> > designed to hand you a float. Arbitrary converter plugins would
> > be icing on the cake. A built in decimal converter would be
> > heaven. :)
> 
> That can be easily done, but at the expense of speed or clarity in the
> implementation... I'd be willing to add some hooks to simplejson that
> allow people to pass in their own functions that turn JSON terms (as
> strings) into Python objects.

That'd be great! I expect a speed penalty of course, and IMO most of that 
should be pushed onto anyone passing in functions, rather than making everyone 
pay.
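The stdlib json module's parse_float and parse_int hooks take this shape: each receives the numeral's original text, so a caller can keep it verbatim, and callers who pass nothing get the fast default path. keep_raw is a hypothetical example hook.

```python
import json

seen = []


def keep_raw(numeral):
    # Record and return the untouched numeral text instead of a float.
    seen.append(numeral)
    return numeral


data = json.loads("[1.10, 2e3, 7]", parse_float=keep_raw, parse_int=int)
```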


Robert Brewer
[EMAIL PROTECTED]



Re: [Web-SIG] Time a for JSON parser in the standard library?

2008-03-20 Thread Robert Brewer
Deron Meranda wrote:
> And even then, we're not just talking about a JSON parser.  We're all
> doing more than that; we're mapping Python to JSON.  And there is
> no definitive spec for that.  Just look at my numbers tests; there are
> a lot of differences in how numeric mappings are done, but yet many
> of them can be arguably "correct" while still doing things differently.

...which IMO argues that any json implementation that goes in the stdlib
needs to at least allow access to the raw bytes in both directions. For
example, if you really want JSON numerals to become Python decimals, you
shouldn't be forced to lose information just because the json decoder
was only designed to hand you a float. Arbitrary converter plugins would
be icing on the cake. A built in decimal converter would be heaven. :)


Robert Brewer
[EMAIL PROTECTED]



Re: [Web-SIG] Web Dev Pad

2008-03-13 Thread Robert Brewer
Just an update and reminder. We're in room 1021 of the Marriott Renaissance 
through Sunday night. Looks like we have a good-sized crew already and it's 
only Thursday. :) Call anytime!
 
Robert Brewer
[EMAIL PROTECTED]



From: [EMAIL PROTECTED] on behalf of Robert Brewer
Sent: Sat 3/8/2008 12:58 PM
To: web-sig@python.org
Subject: [Web-SIG] Web Dev Pad



Hello, all you Python web tool developers! Like last year, Chad Whitacre
(author of Aspen) and I (lead dev of CherryPy) are going to get a suite
and run the Web Dev Pad again. Come on by any evening from Thursday
night to Sunday night--we'll be up late serving the three M's
(mudslides, margaritas, and martinis) and plenty of camaraderie in our
living room at the Marriott Renaissance [1]. It's a few blocks from the
conference hotel but well worth the trip; try to call if it's before 8pm
to make sure we're not both out to dinner. We're open as late as you
like to anyone who works on web libraries, servers, or frameworks (or
their friends; don't let domain boundaries stop you ;).


Robert Brewer
[EMAIL PROTECTED]
619 374 1117

[1] http://www.marriott.com/hotels/travel/chibr-renaissance-chicago-ohare-suites-hotel/




[Web-SIG] Web Dev Pad

2008-03-08 Thread Robert Brewer
Hello, all you Python web tool developers! Like last year, Chad Whitacre
(author of Aspen) and I (lead dev of CherryPy) are going to get a suite
and run the Web Dev Pad again. Come on by any evening from Thursday
night to Sunday night--we'll be up late serving the three M's
(mudslides, margaritas, and martinis) and plenty of camaraderie in our
living room at the Marriott Renaissance [1]. It's a few blocks from the
conference hotel but well worth the trip; try to call if it's before 8pm
to make sure we're not both out to dinner. We're open as late as you
like to anyone who works on web libraries, servers, or frameworks (or
their friends; don't let domain boundaries stop you ;).


Robert Brewer
[EMAIL PROTECTED]
619 374 1117

[1] http://www.marriott.com/hotels/travel/chibr-renaissance-chicago-ohare-suites-hotel/


Re: [Web-SIG] Cookie, cookielib; what to do?

2008-02-28 Thread Robert Brewer
Brett Cannon wrote:
> So my question is what do people see as a possible naming scheme for
> these modules? Cookie has to be renamed because of its PEP 8
> violation.  Here are some ideas::
> 
>  cookielib -> cookielib
>  Cookie -> cookielib2 (with plans to move what needs to go from Cookie
> into cookielib at some point and to deprecate cookielib2 in 3.x).
> 
>   cookielib -> http.cookies
>   Cookie -> http.cookies2 (same thinking as above).
> 
>   cookielib -> cookies.client
>   Cookie -> cookies.parsing

I'd propose:

 Cookie -> http.cookies
 cookielib -> http.cookiejar
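These are the names Python 3 eventually adopted. A quick sketch of parsing a request's Cookie header and building a Set-Cookie value with http.cookies (the cookie names and values are made up):

```python
from http.cookies import SimpleCookie

# Parse an incoming Cookie request header.
incoming = SimpleCookie()
incoming.load("session=abc123; theme=dark")

# Build an outgoing cookie for a Set-Cookie response header.
outgoing = SimpleCookie()
outgoing["session"] = "xyz789"
outgoing["session"]["path"] = "/"
set_cookie_value = outgoing["session"].OutputString()
```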


Robert Brewer
[EMAIL PROTECTED]



Re: [Web-SIG] Choosing one of two options for url* in the stdlib reorg

2008-02-28 Thread Robert Brewer
Brett Cannon wrote:
> With PyCon approaching and having other stuff on my plate to deal with
> I want to try to reach a consensus on the whole
> urllib/urllib2/urlparse situation for the stdlib reorg in Python 3.0
> and get it settled.
> 
> So, two options for people to show support for.  One is to keep
> everything and get cute with the naming::
> 
>   urlparse -> url.parse
>   urllib -> url.fetch
>   urllib2 -> url.request
> 
> The second option is to ditch urllib, move the handy quoting tools
> into either their own module or into what is currently urllib2::
> 
>   urlparse -> url.parse
>   urllib -> GONE
>   urllib's utility functions -> url.quote
>   urllib2 -> url.request

+0.5 for this second option. But what will happen to all the other names
urllib2 currently imports from urllib that aren't "quotey"?

    from urllib import (unwrap, unquote, splittype, splithost,
                        addinfourl, splitport, splitgophertype, splitquery,
                        splitattr, ftpwrapper, noheaders, splituser,
                        splitpasswd, splitvalue)

I wouldn't mind if "urllib's utility functions" just moved into
url.parse. Then we'd have one module for parsing (and unparsing) URLs
and one for actually using them.
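In Python 3 the quoting and splitting helpers did end up together in urllib.parse, separate from the fetching code. A short sketch of the parse-side operations (the URL is just an example):

```python
from urllib.parse import quote, unquote, urlsplit

parts = urlsplit("http://www.python.org/a%20b?x=1")

path = unquote(parts.path)             # decode %XX escapes in the path
requoted = quote(path)                 # re-encode; '/' is safe by default
segment = quote("a/b", safe="")        # encode a '/' inside one segment
```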


Robert Brewer
[EMAIL PROTECTED]



Re: [Web-SIG] Dealing with urllib, urllib2, and urlparse

2008-02-20 Thread Robert Brewer
Brett Cannon wrote:
> So the conundrum here is that urllib and urllib2 are both used
> extensively and both have their key function named urlopen().  MAL has
> suggested:
> 
> urllib -> url.fetch
> urllib2 -> url.request
> urlparser -> url.parse
> 
> which I am liking. But I figured I would ask if there is any remote
> chance the this SIG has plans to either merge urllib and urllib2 or
> come up with a new module, or something before 3.0 comes out.
> Otherwise MAL's names will probably be the suggested new names and one
> can hope at some point one of the urllib* modules can go away.

I like the above too, and I can't recall anyone talking about merging or
replacing urllib(2) in the multiple years I've been on this list. :) So
+1.


Robert Brewer
[EMAIL PROTECTED]



Re: [Web-SIG] Merge Cookie and cookielib?

2008-02-05 Thread Robert Brewer
Brett Cannon wrote:
> So my question is whether you all would be up for handling a merging
> of Cookie and cookielib for 2.6?

I appreciate the thought and effort for a smooth transition, but -1 on
this idea if I understand it correctly.

We have no plans to write a version of CherryPy which runs on both 2.x
and 3.x, and even fewer plans to try autogenerating any part of
CherryPy-for-Python-3 from the existing code, despite all the hard work
on the 2to3 project. It's all going to be ported to Python 3 by hand to
ensure our invariants are maintained. Therefore, rearranging any modules
from 2.5 to 2.6 just makes more work for our team in the short term
(lots of sys.version checking since we're still supporting Python 2.3)
with zero gain in the long term. I'd rather just make one static name
change (e.g. from "import Cookie" to "import cookie.server") in the new
cp-python-3 branch and be done.


Robert Brewer
[EMAIL PROTECTED]



Re: [Web-SIG] URL quoting in WSGI (or the lack therof)

2008-01-23 Thread Robert Brewer
James Y Knight wrote:
> ...as there is simply no way to represent "some%2Fthing/shallow/" with
> PATH_INFO, as specified in the CGI spec, the only alternative is to
> reject the request. This is what the major servers do today.
>
> > Anyone else thinks it's a bug in WSGI too?
> 
> WSGI is based upon CGI and inherits this behavior. I suppose a WSGI-
> specific fix could be done. However, there are good reasons for
> inheriting behavior from CGI, most importantly, ease of integration.
> Servers already implement this behavior for CGI SCGI FastCGI PHP, and
> now, WSGI. None of the previous have seen it as important enough an
> issue to change this behavior, and neither do I think it important
> enough for WSGI.
> 
> So, no, I don't consider it a bug in WSGI. You could call it a bug in
> CGI if you like. Good luck getting it changed.

I consider it a bug in both, and the difficulty level of changing the
CGI behavior really has no bearing on our decision to do better with
WSGI. I think it's important that we allow the full range of URIs to be
accepted. If you go and stick Apache in front of your WSGI app, it will
still 404, sure; but that's your choice to use Apache or not. There's no
sense making WSGI a least common denominator, inheriting all the
limitations of all the existing web servers.


Robert Brewer
[EMAIL PROTECTED]



Re: [Web-SIG] URL quoting in WSGI (or the lack therof)

2008-01-21 Thread Robert Brewer
Luis Bruno wrote:
> Robert Brewer wrote:
> > > IMHO [changing CP's wsgiserver to do decoding] is the wrong answer
> > Why?
> >
> Because then I'm stuck monkey patching every WSGI server (and/or stuck
> using my own URL dispatcher) so that I don't lose the information that
> one of the forward slashes is NOT a path delimiter. You said that
> %-encoding is meant for slashes not participating in hierarchy
> semantics, if I read you correctly; so I think you'll agree with me on
> this.

Ah. Now I see. We've had a test case for this since Nov 2005 [1]. FWIW,
CherryPy took the option of special-casing forward slashes; those are
the only characters which are *not* decoded--they are left as %2F
characters in SCRIPT_NAME and PATH_INFO [2]:

# Unquote the path+params (e.g. "/this%20path" -> "this path").
# http://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html#sec5.1.2
#
# But note that "...a URI must be separated into its components
# before the escaped characters within those components can be
# safely decoded." http://www.ietf.org/rfc/rfc2396.txt, sec 2.4.2
atoms = [unquote(x) for x in quoted_slash.split(path)]
path = "%2F".join(atoms)
environ["PATH_INFO"] = path

...and CherryPy then decodes these on the WSGI-app-side, only after the
dispatching step (to produce "virtual path" atoms) [3]:

    if func:
        # Decode any leftover %2F in the virtual_path atoms.
        vpath = [x.replace("%2F", "/") for x in vpath]
        request.handler = LateParamPageHandler(func, *vpath)
    else:
        request.handler = cherrypy.NotFound()

You're absolutely right; it would be nice to standardize a solution to
this. Of course, I'm going to propose we standardize *our* solution. ;)
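A self-contained Python 3 sketch of that two-step scheme, with urllib.parse.unquote standing in for the Python 2 function. server_side_path and app_side_atoms are hypothetical names, not CherryPy functions.

```python
import re
from urllib.parse import unquote

# Case-insensitive match for an encoded slash.
quoted_slash = re.compile("(?i)%2F")


def server_side_path(raw_path):
    """Unquote everything except encoded slashes, which stay as %2F."""
    atoms = [unquote(atom) for atom in quoted_slash.split(raw_path)]
    return "%2F".join(atoms)


def app_side_atoms(path_info):
    """After dispatch: split on real slashes, then decode leftover %2F."""
    return [atom.replace("%2F", "/") for atom in path_info.strip("/").split("/")]
```

One trade-off of this scheme: a client that sends %252F gets a literal "%2F" after the server-side unquote, which the application then cannot tell apart from an encoded slash.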

> I'll see your CGI draft and raise you the URI spec.

Heh. Quoted in the code comments above.


Robert Brewer
[EMAIL PROTECTED]

[1] cf http://www.cherrypy.org/ticket/393
[2] http://www.cherrypy.org/browser/trunk/cherrypy/wsgiserver/__init__.py#L314
[3] http://www.cherrypy.org/browser/trunk/cherrypy/_cpdispatch.py#L71



Re: [Web-SIG] URL quoting in WSGI (or the lack therof)

2008-01-19 Thread Robert Brewer
Luis Bruno wrote:
> I'm using a /-delimited path, %-encoding each literal '/' appearing in
> the path segments. I was not amused to see egg:Paste#http urldecoding
> the whole PATH_INFO.

All HTTP URIs are /-delimited, and any '/' appearing in a single segment
that is not intended to participate in the hierarchy semantics must be
%-encoded before transmitting it over HTTP. I think that's what you're
saying above, but I don't understand why decoding on the server or
gateway is a problem. Perhaps you could expand on that: when you say
"I'm using", where is that? Inside a WSGI application?

> Ben Bangert wrote:
> > This recently became an issue, when a user noticed that the %2B URL
> > encoding for a + sign, had turned into a space when it hit their
> > app.
> 
> A swift monkey-patch to paste.httpserver.py:WSGIHandlerMixin.wsgi_setup()
> later, and ORIGINAL_PATH_INFO is part of the WSGI spec in my world.
> The following URL now Does The Right Thing:
> 
> http://127.0.0.1:5000/catalog/NEC/Computers/Laptops/LN500%2F9DW/

Platonic Capital Letters won't get you very far with this crowd. You
have to explain why you think the application should receive %XX-encoded
URIs instead of decoded ones. What's the benefit? I only see a con:
every piece of middleware that cares has to repeat the decoding of
PATH_INFO and SCRIPT_NAME, wasting CPU and memory.

> Robert Brewer wrote:
> > I changed CP's wsgiserver to do decoding that very day.
> > So I think the answer is "yes".
> 
> IMHO "yes" is the wrong answer

Why?

> I am also very unsure about what is the right answer.

According to [1], the right answer is "yes":

The PATH_INFO metavariable specifies a path to be interpreted
by the CGI script. It identifies the resource or sub-resource
to be returned by the CGI script, and it is derived from the
portion of the URI path following the script name but preceding
any query data. The syntax and semantics are similar to a
decoded HTTP URL 'path' token (defined in RFC 2396 [4]), with
the exception that a PATH_INFO of "/" represents a single void
path segment.


Robert Brewer
[EMAIL PROTECTED]

[1] http://cgi-spec.golux.com/draft-coar-cgi-v11-03-clean.html#6.1.6


Re: [Web-SIG] URL quoting in WSGI (or the lack thereof)

2008-01-18 Thread Robert Brewer
Ben Bangert wrote:
> I unfortunately couldn't find anything in the WSGI spec to indicate
> whether or not I could expect environ variables relating to the URL to
> be URL decoded when I get them or whether they reflect the raw URL
> that was sent to the browser.
> 
> This recently became an issue, when a user noticed that the %2B URL
> encoding for a + sign, had turned into a space when it hit their app.
> Sure enough, Paste was doing URL un-quoting, then Routes, and the
> double URL un-quote resulted in the + being a space.
> 
> Is there some definitive word on whether a WSGI application should
> expect to have it URL un-quoted or not?

The last time I asked that question here [1], Phillip kindly pointed out
to me that that's defined by the CGI spec. I could go through the agony
of distributed English-obfuscated BNF analysis again, but I'll just note
that I changed CP's wsgiserver to do decoding that very day. So I think
the answer is "yes".
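Here is a way to see how a "+" can turn into a space when two layers both decode. This sketch assumes that the gateway applied `unquote` and the routing layer then applied the form-style `unquote_plus` to the already-decoded path. That is an assumption: the original report does not say which function Routes used. The example path is made up.

```python
from urllib.parse import unquote, unquote_plus

raw_path = "/tags/C%2B%2B"          # what the browser sent for "/tags/C++"

# Step 1: the gateway decodes once, as the CGI spec asks for PATH_INFO.
path_info = unquote(raw_path)        # "/tags/C++"

# Step 2: a second, form-style decode treats "+" as an encoded space.
double_decoded = unquote_plus(path_info)   # "/tags/C  "

print(path_info, repr(double_decoded))
```

Decoding exactly once, at the gateway, keeps the "+" characters intact. Everything downstream then has to agree that PATH_INFO is already decoded.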


Robert Brewer
[EMAIL PROTECTED]

[1] http://mail.python.org/pipermail/web-sig/2006-August/002230.html


Re: [Web-SIG] serving (potentially large) files through wsgi?

2007-12-23 Thread Robert Brewer
Graham Dumpleton wrote:
> On 22/12/2007, Brian Smith <[EMAIL PROTECTED]> wrote:
> > Manlio Perillo wrote:
> > > Instead of using sys.stderr, a better solution is to add a new log
> > > object to the WSGI environment dictionary, so that each
> > > application can have its error log redirected to different files.
> >
> > I agree, but (a) that would have to be standardized somewhere to be
> > useful, and (b) you still have to deal with code that isn't aware of
> > this new functionality--especially libraries that are not WSGI-
> > specific, and existing WSGI 1.0 applications.
> 
> The more and more that this discussion goes on, the conclusion I am
> coming to is that WSGI applications should simply not be using the web
> server log files for application logging at all.

I still say the answer to "should logging be done by the application or
server?" is "neither". We need a component that covers the "everything
else" of WSGI; that is, the environment in which servers and
applications are instantiated, connected, started, stopped, and shut
down. Logging should be offered by that component.

http://www.cherrypy.org/wiki/WSPBSpec


Robert Brewer
[EMAIL PROTECTED]

Re: [Web-SIG] serving (potentially large) files through wsgi?

2007-12-17 Thread Robert Brewer
Chris Withers wrote:
> Robert Brewer wrote:
> > Apache will interfere, and try to re-apply the range to whatever you
> > emit. The only solution we've found so far is to tell the app to ignore
> > any 'Range' request header when running behind Apache, and just let
> > Apache have its way. See http://www.cherrypy.org/changeset/1319
> 
> I've never had any problems with Apache proxying to Zope for this
> stuff...
> 
> I wonder why the proxy setup seems to make it safe?

Because proxying bypasses a lot of the Apache internals. The
re-application of Range headers I described was in a mod_python setup. I
would guess mod_wsgi would exhibit the same problem.


Robert Brewer
[EMAIL PROTECTED]

Re: [Web-SIG] serving (potentially large) files through wsgi?

2007-12-17 Thread Robert Brewer
Chris Withers wrote:
> Manlio Perillo wrote:
> > 2) handle the range request in the WSGI application.
> >It's not hard as long as you do not implement multiple ranges
> >support.
> >
> >If your object database supports seeks, this should be the most
> >efficient solution.
> 
> This is probably what's wanted. So, if a wsgi app does its own range
> handling, the wsgi server won't interfere?

Apache will interfere, and try to re-apply the range to whatever you
emit. The only solution we've found so far is to tell the app to ignore
any 'Range' request header when running behind Apache, and just let
Apache have its way. See http://www.cherrypy.org/changeset/1319
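The following is a minimal sketch of option 2, where the application handles a single byte range itself. It adds a hypothetical `honor_range` switch for deployments (such as the mod_python setup mentioned here) where the front-end server re-applies the range anyway. Multiple ranges, suffix ranges ("bytes=-500") and If-Range are deliberately left out; anything the sketch cannot parse falls back to a full 200 response. All names are made up for illustration, and this is not the code from the CherryPy changeset.

```python
import re

# Only "bytes=START-" or "bytes=START-END"; suffix ranges ("bytes=-500")
# and multiple ranges ("bytes=0-1,5-6") deliberately don't match.
SINGLE_RANGE = re.compile(r"^bytes=(\d+)-(\d*)$")


def parse_single_range(header, total):
    """Return (start, end) inclusive, "unsatisfiable", or None.

    None means "ignore the Range header and send the whole body", which is
    also what happens for forms this sketch does not support.
    """
    match = SINGLE_RANGE.match(header.strip())
    if not match:
        return None
    start = int(match.group(1))
    if match.group(2):
        end = int(match.group(2))
        if end < start:
            return None  # syntactically invalid range: ignore it
    else:
        end = total - 1
    if start >= total:
        return "unsatisfiable"
    return start, min(end, total - 1)


def make_file_app(data, honor_range=True):
    """WSGI app serving `data`; pass honor_range=False behind a front end
    (e.g. Apache/mod_python) that re-applies Range on its own."""

    def app(environ, start_response):
        total = len(data)
        rng = None
        if honor_range:
            rng = parse_single_range(environ.get("HTTP_RANGE", ""), total)
        if rng == "unsatisfiable":
            start_response("416 Requested Range Not Satisfiable",
                           [("Content-Range", "bytes */%d" % total),
                            ("Content-Length", "0")])
            return [b""]
        if rng is not None:
            start, end = rng
            body = data[start:end + 1]
            start_response("206 Partial Content", [
                ("Content-Type", "application/octet-stream"),
                ("Content-Length", str(len(body))),
                ("Content-Range", "bytes %d-%d/%d" % (start, end, total)),
            ])
            return [body]
        start_response("200 OK", [
            ("Content-Type", "application/octet-stream"),
            ("Content-Length", str(total)),
        ])
        return [data]

    return app
```

The slicing here is on an in-memory `bytes` object. With a seekable object database, you would seek to `start` and read `end - start + 1` bytes instead.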


Robert Brewer
[EMAIL PROTECTED]

Re: [Web-SIG] multi-threaded or multi-process wsgi apps

2007-11-26 Thread Robert Brewer
Graham Dumpleton wrote:
> On 27/11/2007, Robert Brewer <[EMAIL PROTECTED]> wrote:
> > Chris Withers wrote:
> > > Right, I'm curious as to how wsgi applications end up being
> > > multi-threaded or multi-process and if they are, how they share
> > > resources such as databases and configuration.
> > >
> > > There's a couple of reasons I'm asking...
> > >
> > > The first was something Chris McDonough said about one of the
> > > issues they're having with the repoze project: when using something
> > > like mod_wsgi, it's the first person to hit each thread that takes
> > > the hit of loading the configuration and opening up the zodb.
> > > Opening the ZODB, in particular, can take a lot of time. How should
> > > repoze be structured such that all the threads load their config and
> > > open their databases when apache is restarted rather than when each
> > > thread is first hit?
> >
> > If I were coding it, repoze would use a database connection pool that
> > is populated at (sub)process startup.
> 
> The issue with running under Apache, whether it be mod_wsgi or
> mod_python, is that the server itself doesn't necessarily know
> anything about what applications may actually need to be loaded. This
> is because both support the concept of sticking the file representing
> the entry point to the application in some file system directory. The
> first that the server knows about the application is when a URL
> arrives which maps to that application file.
> 
> Thus, in the general case one can't have pre-initialisation at
> (sub)process startup. To have pre-initialisation means providing an
> explicit means of configuring the server to say that it is possible
> that some application may get invoked through a URL and so just in
> case it should preload the application.
> 
> Because it involves changing main server configuration, it can obviously
> only be used as an option where you control the actual web server.
> There would be no way you could use such an option if you were just a
> user in a paid shared web hosting environment. In that case you can't
> avoid doing delayed initialisation at time that first request arrives.
> 
> This is the big difference between Apache and pure Python hosting
> solutions. That is that Apache has to deal with potential shared
> hosting issues. Pure Python hosting solutions would probably always be
> under direct control of the user and be only running their own code.

True, but that doesn't change my recommendation. Even if you're willing
to live with delays on the first request, you still should do as much as
possible as early as possible. Any server, application, or framework
which *requires* me to live with those delays even though I've taken
pains to deploy in a capable, controllable environment would make me
seriously question its utility.


Robert Brewer
[EMAIL PROTECTED]

Re: [Web-SIG] multi-threaded or multi-process wsgi apps

2007-11-26 Thread Robert Brewer
Chris Withers wrote:
> Right, I'm curious as to how wsgi applications end up being
> multi-threaded or multi-process and if they are, how they share
> resources such as databases and configuration.
> 
> There's a couple of reasons I'm asking...
> 
> The first was something Chris McDonough said about one of the issues
> they're having with the repoze project: when using something like
> mod_wsgi, it's the first person to hit each thread that takes the hit
> of loading the configuration and opening up the zodb. Opening the ZODB,
> in particular, can take a lot of time. How should repoze be structured
> such that all the threads load their config and open their databases
> when apache is restarted rather than when each thread is first hit?

If I were coding it, repoze would use a database connection pool that is
populated at (sub)process startup. The main thread is the only one
"loading config". That avoids any waits during the HTTP request, so your
req/sec rate will go way up. It also allows the process to fail fast in
the event of unreachable databases, so such errors during deployment
will be found sooner and will be easier to debug if they occur outside
of an HTTP request.
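Here is a minimal sketch of "populate the pool at (sub)process startup". It uses a thread-safe `queue.Queue` as the pool, with in-memory `sqlite3` connections standing in for the real database (ZODB in the repoze case). `ConnectionPool` and `make_app` are hypothetical names. The pool is filled before the WSGI callable exists, so a bad connection setting raises during deployment rather than on the first request.

```python
import queue
import sqlite3


class ConnectionPool:
    """Fixed-size pool that opens every connection up front."""

    def __init__(self, factory, size):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            # Any error here surfaces at startup (fail fast), not mid-request.
            self._pool.put(factory())

    def acquire(self, timeout=None):
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        self._pool.put(conn)


def make_app(pool):
    def app(environ, start_response):
        conn = pool.acquire()
        try:
            (value,) = conn.execute("SELECT 1 + 1").fetchone()
        finally:
            pool.release(conn)
        body = str(value).encode("ascii")
        start_response("200 OK", [("Content-Type", "text/plain"),
                                  ("Content-Length", str(len(body)))])
        return [body]
    return app


# Done once, in the main thread, before the server accepts requests.
pool = ConnectionPool(
    lambda: sqlite3.connect(":memory:", check_same_thread=False), size=4)
application = make_app(pool)
```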

It's like a stage production: you don't ask your actors to buy props and
build the set during the show--instead, you buy/build all that and
script/debug/automate the hell out of it before you have an audience.
All long-running servers are a lot like that; do everything you can
before the first request to make absolutely sure nothing slows or stops
you during showtime.

> The second is a problem I see an app I'm working on heading towards.
> The app has web-alterable configuration, so in a multi-threaded and
> particular multi-process environment, I need some way to get the other
> threads or processes to re-read their configuration when it has
> changed.

In a multithreaded environment, I recommend apps read config only at
process startup, parse the entries and use them to modify live objects,
and then throw away the config. Then, if you need to make changes to
settings while live, you just modify the live objects in the same way
the config parsing step did (and then modify the config file only if
desired). That avoids having to re-read the whole config file for each
potential change. In a multiprocess environment, you can notify other
processes with any of various forms of IPC or shared-state mechanisms.
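Here is a sketch of "parse once, then modify live objects" using the standard `configparser` module. The section and key names (`[db] pool_size`, `[site] debug`) and the `Settings`/`apply_setting` names are hypothetical. The point is that startup parsing and later live changes both go through the same `apply_setting` function, so nothing re-reads the file. The multiprocess notification step is not shown.

```python
import configparser
import threading


class Settings:
    """Live settings shared by all request threads (hypothetical fields)."""

    def __init__(self):
        self.pool_size = 4
        self.debug = False
        self.lock = threading.Lock()   # serializes writers


def as_bool(value):
    return value.strip().lower() in ("1", "true", "yes", "on")


# (section, key) -> (attribute name, converter). Unknown keys raise KeyError,
# so a typo in the config file fails at startup rather than being ignored.
CONVERTERS = {
    ("db", "pool_size"): ("pool_size", int),
    ("site", "debug"): ("debug", as_bool),
}


def apply_setting(settings, section, key, raw_value):
    """The single code path for both startup parsing and live changes."""
    attr, convert = CONVERTERS[(section, key)]
    value = convert(raw_value)
    with settings.lock:
        setattr(settings, attr, value)


def load_config(settings, text):
    """Run once at process startup; the parser is discarded afterwards."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    for section in parser.sections():
        for key, raw_value in parser.items(section):
            apply_setting(settings, section, key, raw_value)


settings = Settings()
load_config(settings, "[db]\npool_size = 8\n\n[site]\ndebug = true\n")
# Later, e.g. from an admin page, without re-reading any file:
apply_setting(settings, "site", "debug", "off")
```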


Robert Brewer
[EMAIL PROTECTED]

Re: [Web-SIG] about the status line in WSGI

2007-10-19 Thread Robert Brewer
Manlio Perillo wrote:
> Is a WSGI gateway allowed to ignore the Reason-Phrase part of the
> status line returned by the WSGI application, and to use a server
> defined phrase?

I would be sad if a WSGI gateway did that to me. Why deny a web
application developer the right to control that part of the output?


Robert Brewer
[EMAIL PROTECTED]


Re: [Web-SIG] Multiple message-header fields handling

2007-10-02 Thread Robert Brewer
Manlio Perillo wrote:
> The HTTP 1.1 protocol (section 4.2) says that:
> """Multiple message-header fields with the same field-name MAY be 
> present in a message if and only if the entire field-value for that 
> header field is defined as a comma-separated list [i.e., #(values)]."""
> 
> This can happen, as an example, with the Cookie header.
> 
> My question is: how should this be handled in WSGI?
> 
> As an example, Nginx stores all the headers in an associative array,
> where, of course, only the "last seen" header appears.
> 
> However, common multiple message-headers are stored in the request struct.
> 
> Since the WSGI environment is a dictionary with keys and values of type 
> str, should an implementation:
> """combine the multiple header fields into one "field-name: field-value" 
> pair, without changing the semantics of the message, by appending each 
> subsequent field-value to the first, each separated by a comma."""
> ?

Yes, it should. As you note, it's part of the HTTP spec that such headers
can be combined without changing the semantics. Here's a list of the
headers that need to be folded:

comma_separated_headers = ['ACCEPT', 'ACCEPT-CHARSET', 'ACCEPT-ENCODING',
    'ACCEPT-LANGUAGE', 'ACCEPT-RANGES', 'ALLOW', 'CACHE-CONTROL',
    'CONNECTION', 'CONTENT-ENCODING', 'CONTENT-LANGUAGE', 'EXPECT',
    'IF-MATCH', 'IF-NONE-MATCH', 'PRAGMA', 'PROXY-AUTHENTICATE', 'TE',
    'TRAILER', 'TRANSFER-ENCODING', 'UPGRADE', 'VARY', 'VIA', 'WARNING',
    'WWW-AUTHENTICATE']

The only tricky one is Cookie, because e.g. Konqueror sends them on
multiple lines, but they're not foldable.

See http://kristol.org/cookie/errata.html
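Here is a sketch of folding repeated request headers into a CGI-style environ, using the comma_separated_headers list above (repeated so the block runs on its own). Two choices here are assumptions rather than anything settled in this thread. Repeated Cookie lines are joined with "; ", borrowed from later cookie and HTTP/2 practice. Other repeated, non-listed headers keep the last value seen, like the Nginx behaviour described above.

```python
comma_separated_headers = [
    'ACCEPT', 'ACCEPT-CHARSET', 'ACCEPT-ENCODING', 'ACCEPT-LANGUAGE',
    'ACCEPT-RANGES', 'ALLOW', 'CACHE-CONTROL', 'CONNECTION',
    'CONTENT-ENCODING', 'CONTENT-LANGUAGE', 'EXPECT', 'IF-MATCH',
    'IF-NONE-MATCH', 'PRAGMA', 'PROXY-AUTHENTICATE', 'TE', 'TRAILER',
    'TRANSFER-ENCODING', 'UPGRADE', 'VARY', 'VIA', 'WARNING',
    'WWW-AUTHENTICATE',
]


def headers_to_environ(header_lines):
    """Fold (name, value) request header pairs into CGI-style environ keys."""
    environ = {}
    for name, value in header_lines:
        upper = name.strip().upper()
        if upper in ("CONTENT-TYPE", "CONTENT-LENGTH"):
            key = upper.replace("-", "_")          # no HTTP_ prefix for these
        else:
            key = "HTTP_" + upper.replace("-", "_")
        value = value.strip()
        if key in environ and upper in comma_separated_headers:
            environ[key] += ", " + value           # RFC 2616 sec 4.2 folding
        elif key in environ and upper == "COOKIE":
            environ[key] += "; " + value           # assumed cookie separator
        else:
            environ[key] = value                   # first value, or last wins
    return environ
```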

> Nginx does not do this (and I don't know what Apache does).
> 
> 
> Another question: when an header has an empty field value, what should 
> be set in the environment: an empty string or None?

An empty string, or omit them entirely:

"""The following variables must be present, unless their value would
be an empty string, in which case they may be omitted, except as
otherwise noted below...

HTTP_ Variables
""".


Robert Brewer
[EMAIL PROTECTED]

Re: [Web-SIG] Another WSGI gotcha to be aware of

2007-08-25 Thread Robert Brewer
Phillip J. Eby wrote:
> At 11:45 PM 8/24/2007 -0700, Robert Brewer wrote:
> >However, and here's the rub, if nextapp() raises an
> >exception, **self.response is never bound**, and
> >therefore we have no handle to the object we need
> >to close. Note that this is not a middleware-only
> >problem; servers can run into this too.
> >
> >The spec says, "In general, applications *should* try to
> >trap their own, internal errors"; we might want to make
> >that a MUST in future versions. Alternately, we could
> >require that every application provide its resource-
> >releasing endpoint via some means other than a successful
> >response. I'm sure you all can come up with other solutions.
> 
> I don't see a problem here to solve.  If the application
> didn't return a response, the middleware naturally isn't
> obligated to call close() on it.

Sorry; I didn't mean to imply that WSGI server interfaces
need to be fixed in any way.

As the author of a WSGI application interface, it means
you have to call your own resource cleanup code if an
exception is raised while you're being called (as opposed
to having the response iterated over):

def __call__(self, environ, start_response):
    try:
        return self.respond(environ, start_response)
    except:
        self.cleanup()
        raise

Some applications don't do this, so my primary message
is to authors of WSGI application interfaces (including
all middleware) to check your code and make sure you do
it, rather than just let errors propagate out.

Some application authors may have chosen to not do this
because the spec says, "should try", not "must". So I'm
thinking of ways to improve that situation, all the way
from "do nothing" to "clarify the language in the spec"
to different conversation models in future specs.
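Here is a runnable version of the application-side pattern above. The `respond`, `cleanup` and `ClosingBody` names and the `resource_open` flag are hypothetical stand-ins for whatever a real application acquires per call. Per-request state is kept on the instance only to keep the demo short. `except BaseException` behaves like the bare `except:` above: it catches everything and re-raises. On success, cleanup is deferred to the response's `close()`, which the server calls after iterating.

```python
class ClosingBody:
    """Iterable response whose close() runs the app's cleanup."""

    def __init__(self, chunks, on_close):
        self.chunks = chunks
        self.on_close = on_close

    def __iter__(self):
        return iter(self.chunks)

    def close(self):
        self.on_close()


class Application:
    """WSGI app that releases its own resources if it fails mid-call."""

    def __init__(self, fail=False):
        self.fail = fail
        self.resource_open = False

    def respond(self, environ, start_response):
        self.resource_open = True        # e.g. a DB connection checked out
        if self.fail:
            raise RuntimeError("boom")
        start_response("200 OK", [("Content-Type", "text/plain")])
        return ClosingBody([b"ok"], self.cleanup)

    def cleanup(self):
        self.resource_open = False

    def __call__(self, environ, start_response):
        try:
            return self.respond(environ, start_response)
        except BaseException:
            # No response object reaches the caller, so nobody else can
            # call close(); release resources here, then re-raise.
            self.cleanup()
            raise
```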


Robert Brewer
[EMAIL PROTECTED]

[Web-SIG] Another WSGI gotcha to be aware of

2007-08-24 Thread Robert Brewer
Hi all,

I just found another corner case in the WSGI spec
that I thought I'd share so you all can check your
WSGI components for similar problems. Basically,
it comes down to error handling. Here's a simple
example:


class Middleware(object):

    def __init__(self, nextapp, environ, start_response):
        try:
            self.response = nextapp(environ, start_response)
            self.iter_response = iter(self.response)
            return
        except SomeException:
            self.close()

    def close(self):
        if hasattr(self.response, "close"):
            self.response.close()

    def __iter__(self):
        return self

    def next(self):
        try:
            return self.iter_response.next()
        except AnotherException:
            self.close()
            raise StopIteration

As you know, all WSGI middleware (that doesn't just pass
through the response from "nextapp") must itself possess
a "close" method so that the response from "nextapp" can
have *its* close method called. In this example, that
would be "self.response.close".

However, and here's the rub, if nextapp() raises an
exception, **self.response is never bound**, and
therefore we have no handle to the object we need
to close. Note that this is not a middleware-only
problem; servers can run into this too.

The spec says, "In general, applications *should* try to
trap their own, internal errors"; we might want to make
that a MUST in future versions. Alternately, we could
require that every application provide its resource-
releasing endpoint via some means other than a successful
response. I'm sure you all can come up with other solutions.
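One way to keep middleware like the example above from tripping over the unbound attribute is to bind `self.response = None` before calling `nextapp`, so `close()` is always safe. This is a rewrite sketch in Python 3 style (`__next__` instead of `next`), not the original class. Note that it still cannot release anything held inside a `nextapp` that raised. That is exactly why the inner application has to clean up after itself.

```python
class Middleware:
    """Wraps another WSGI app's response; close() is safe in every state."""

    def __init__(self, nextapp, environ, start_response):
        self.response = None          # bound before anything can fail
        try:
            self.response = nextapp(environ, start_response)
            self.iter_response = iter(self.response)
        except Exception:
            self.close()              # no-op if nextapp itself raised
            raise

    def close(self):
        close = getattr(self.response, "close", None)
        if close is not None:
            close()

    def __iter__(self):
        return self

    def __next__(self):
        return next(self.iter_response)
```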


Robert Brewer
[EMAIL PROTECTED]

[Web-SIG] Web Bus event graphs

2007-06-28 Thread Robert Brewer
Graham Dumpleton wrote:
> A question about the idea of a bus.start()-like event to
> indicate startup.
> 
> Problem with this is that under mod_wsgi the actual web server child
> process has possibly started long before a request may come in which
> targets a specific WSGI application. This is because loading of a WSGI
> application is effectively done by lazy loading, i.e., the code file only
> gets loaded when URL for a request maps to it.
> 
> This is different to where a Python based web server is used as
> generally one would in the program script load in the WSGI application
> before you even start the web server, as you would need to get the
> application entry point to be able to construct the fixed URL entry
> point for the root. Pylons and Paste may be an exception to this as
> not sure at what point it actually will load things.
> 
> How do you see being able to handle a startup like event in that case
> for a WSGI application when they aren't effectively being preloaded?
> How would you notify just that one application when it does finally
> get loaded, or do you?

In terms of the "site event bus" model, I would just say that lazy
applications join the start/stop cycle a bit later. They miss the first
"start" notification, so they'd either have to not subscribe to the
'start' channel at all, or would have to call their start listeners
manually on load/first request.
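Here is what "joining the cycle late" might look like with a toy in-process bus. `Bus`, `subscribe`, `publish` and `on_lazy_load` are hypothetical names for this illustration, not the WSPBus spec itself. A lazily loaded application subscribes as usual. If the site has already started, it calls its own start listener once by hand.

```python
class Bus:
    """Toy site bus: channels map to lists of listener callables."""

    def __init__(self):
        self.state = "STOPPED"
        self.listeners = {"start": [], "stop": []}

    def subscribe(self, channel, callback):
        self.listeners[channel].append(callback)

    def publish(self, channel):
        for callback in list(self.listeners[channel]):
            callback()

    def start(self):
        self.publish("start")
        self.state = "STARTED"

    def stop(self):
        self.publish("stop")
        self.state = "STOPPED"


def on_lazy_load(bus, start_listener, stop_listener):
    """Called when mod_wsgi (or similar) first imports the application."""
    bus.subscribe("start", start_listener)   # for any later restart
    bus.subscribe("stop", stop_listener)
    if bus.state == "STARTED":
        start_listener()                      # we missed the first "start"
```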

> ...the actual web server child process has possibly started
> long before a request may come in...

That reminds me, I wanted to also discuss another potential channel pair
for managing per-thread resources. CherryPy has an (on_start_thread,
on_stop_thread) pair for registering such callbacks.

Currently, CP invokes *_thread events by checking thread IDs on each
request. If the thread ID has been seen before (there's a set of "seen
thread IDs"), nothing happens; if it hasn't been seen, then
on_start_thread listeners are invoked. Since that chunk of code has to
work with various multithread schemes, the on_stop_thread listeners
aren't called until server shutdown (!).
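The per-request check described above can be sketched roughly as follows. This is a simplified illustration, not CherryPy's actual code; `ThreadTracker` and its methods are hypothetical. The sketch shows why stop listeners can only run at shutdown: nothing tells the tracker when a thread really exits.

```python
import threading


class ThreadTracker:
    """Fire start-thread listeners the first time a thread is seen."""

    def __init__(self):
        self.seen = set()
        self.lock = threading.Lock()
        self.on_start_thread = []
        self.on_stop_thread = []

    def check_thread(self):
        """Called at the top of every request."""
        ident = threading.get_ident()
        with self.lock:
            if ident in self.seen:
                return
            self.seen.add(ident)
        for listener in self.on_start_thread:
            listener(ident)

    def shutdown(self):
        """Stop listeners only run here, for every thread ever seen."""
        with self.lock:
            idents, self.seen = list(self.seen), set()
        for ident in idents:
            for listener in self.on_stop_thread:
                listener(ident)
```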

That's pretty inefficient on its own, but when several WSGI components
in the stack all maintain their own map of seen threads, it becomes
unwieldy pretty quickly. If "the site" could notify such listeners, it
would be more accurate ("thread stop" events would fire when the thread
actually stops) and take less memory, since the site controller would be
the only code with a thread map (and probably already has one anyway).

This isn't limited to threads, by the way. When people talk about
"per-thread" resources, that can usually be safely commuted to
"per-logical-process", where "logical process" encompasses threads,
processes (since they have a main thread, at least), and even tasklets
(Arnar Birgisson is working on a Stackless WSGI server as we speak).


Robert Brewer
System Architect
Amor Ministries
[EMAIL PROTECTED]

Re: [Web-SIG] Web Site Process Bus

2007-06-27 Thread Robert Brewer
Chris McDonough wrote:
> I think I'm mostly confused by the name "process bus" because it  
> seems like the primary use case for something like this is where all  
> of the applications share the same process space and are all written  
> in Python.  Am I right?  If so, maybe a different name is in order?
> "Application Bus"?  Or even "WSGI Bus", if it's presumed that all of
> the applications will be WSGI applications?

Thinking about this some more, you're right that "process" is not the
appropriate term. This is about coordination between application/server
components and site-wide services, and I was using "process" redundantly
to mean "site". How about Web Site Event Bus instead?


Robert Brewer
System Architect
Amor Ministries
[EMAIL PROTECTED]

Re: [Web-SIG] Web Site Process Bus

2007-06-27 Thread Robert Brewer
Phillip J. Eby wrote:
> Meanwhile, if you get a start call, you must be starting, right?
> So why worry about the state?  It'd be simpler to just use 
> "before/during/after" messages the way Twisted does.  Your "block" 
> example could be replaced by waiting for the "after" message of the 
> desired state, for example.

and I replied:
> That's a possible way to go. My intention was to support both 1)
> examination of the state by external components (for operations other
> than 'block'--progress meters spring to mind) and 2) restrict some state
> transitions if necessary; for example, make bus.start() do nothing (or
> block) unless the state is "STOPPED".

Would it be helpful to just re-use the terms that Twisted does (in
IReactorCore)? The two structures are very similar:

Twisted                  WSPBus
-------                  ------
core.running             bus.state == states.STARTED
stop()                   stop()
'shutdown' events        'stop' channel listeners
'startup' events         'start' channel listeners
run()                    start()

The only big difference being that IReactorCore.run also starts the main
loop, but that's assumed to be a separate step for WSPBus. Note that
IReactorCore.stop raises an error if not core.running, too.

I'll also note in passing that Twisted "during" and "after" triggers log
on error but don't crash...


Robert Brewer
System Architect
Amor Ministries
[EMAIL PROTECTED]

Re: [Web-SIG] Web Site Process Bus

2007-06-26 Thread Robert Brewer
Phillip J. Eby wrote:
> At 02:17 PM 6/25/2007 -0700, Robert Brewer wrote:
> > Phillip J. Eby wrote:
> > > At 01:51 PM 6/25/2007 -0700, Robert Brewer wrote:
> > > For example, if an error occurs, isn't that an indication that the
> > > component is broken?  Masking the failure just makes it 
> > > less likely the component will get fixed in a timely manner.
> >
> > Yes, the component is broken. However, at runtime, breakage in a
> > CherryPy component shouldn't keep a Quixote component from, say,
> > correctly freeing its DB connections.
> 
> In theory that makes sense, but in practice if you're using 
> priorities because there is a dependency sequence involved, then you 
> now have a new problem, since a component you're relying on having 
> started or stopped first, is now violating its invariants.
> 
> I'm not opposed to logging or catching errors, but I am opposed (in 
> the absence of more specific evidence) to allowing callbacks to 
> propagate unhandled exceptions in the spec, or encouraging event 
> senders to make heroic efforts in the face of unhandled 
> exceptions.  Trying to recover from brokenness is generally not very 
> likely to result in *less* breaking, IMO.

I agree with that in the general case, and specifically for site
startup, which is prone to dependencies and priorities. The specific
case where I feel we need different behavior is when trying to shut down
a site, which rarely involves dependencies and priorities, but can often
lead to increasing damage if an early component errors and remaining
resource cleanup routines are not allowed to run. Maybe we should just
special-case the latter and let the rest fail fast.
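Here is a sketch of that special case using a hypothetical `publish` helper, which is not part of any spec. On the "start" channel, the first listener error propagates immediately. On "stop", every listener still runs and each error is logged via the standard `logging` module. A summary exception is then raised at the end, so the broken component is not masked.

```python
import logging

log = logging.getLogger("site.bus")


def publish(listeners, channel):
    """Call every listener for `channel`.

    "start" (and anything else) fails fast; "stop" keeps going so later
    cleanup still runs, then reports all failures together.
    """
    if channel != "stop":
        for listener in listeners:
            listener()          # first error propagates immediately
        return
    errors = []
    for listener in listeners:
        try:
            listener()
        except Exception as exc:
            log.exception("stop listener %r failed", listener)
            errors.append(exc)
    if errors:
        raise RuntimeError("%d stop listener(s) failed" % len(errors)) from errors[0]
```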

> > > Meanwhile, if you get a start call, you must be starting, 
> > > right?  So why worry about the state?  It'd be simpler to
> > > just use "before/during/after" messages the way Twisted does.
> > > Your "block" example could be replaced by waiting for the "after" 
> > > message of the desired state, for example.
> >
> > That's a possible way to go. My intention was to support both
> > 1) examination of the state by external components (for
> > operations other than 'block'--progress meters spring to mind)
> > and 2) restrict some state transitions if necessary; for example,
> > make bus.start() do nothing (or block) unless the state is
> > "STOPPED".
> 
> Progress meters can be handled by callbacks, too.

With sufficient complexity, yes.

> As for the restrictions, who benefits?  ISTM that components need
> to manage their own lifecycles anyway and should be idempotent
> with respect to repeated transitions.

Good point. Requiring idempotent operations would allow fun sequences
like:

>>> bus.exit()
>>> bus.exit()
>>> help!
SyntaxError: invalid syntax
>>> os.unlink(errant_lock_file)
>>> bus.exit()
...process finally exits...

That would also allow someone to call bus.exit() in the middle of
another thread executing bus.start()...if each component manages its own
state, that would minimize shutdown errors like closing DB connections
that were never opened.
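Here is a sketch of a component that manages its own state, so repeated or out-of-order transitions are harmless. `DBComponent` and the `connect` factory are hypothetical stand-ins. Calling `stop()` before `start()`, or calling either one twice, does nothing. In particular, it never closes a connection that was never opened.

```python
import threading


class DBComponent:
    """Start/stop listener pair that is safe to call any number of times."""

    def __init__(self, connect):
        self._connect = connect
        self._conn = None
        self._lock = threading.Lock()

    def start(self):
        with self._lock:
            if self._conn is None:          # already started: no-op
                self._conn = self._connect()

    def stop(self):
        with self._lock:
            if self._conn is not None:      # never opened or already closed: no-op
                self._conn.close()
                self._conn = None
```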

Hmm...


Robert Brewer
System Architect
Amor Ministries
[EMAIL PROTECTED]

Re: [Web-SIG] Web Site Process Bus

2007-06-26 Thread Robert Brewer
Chris McDonough wrote:
> On Jun 26, 2007, at 2:39 PM, Robert Brewer wrote:
> 
> > Chris McDonough wrote:
> >> There are also non-webbish processes like postgres, mysql, etc. that
> >> need to be treated as "part of the application".
> >>
> >> I handle this currently by running all of the processes related to a
> >> specific project under a process controller (which happens to be
> >> implemented in Python, but that's beside the point, see
> >> http://www.plope.com/software/supervisor2/).  The process controller is
> >> responsible for execing the child processes upon its own startup.
> >> It is also responsible for restarting children if they die,
> >> capturing their output (if any), and allowing sufficiently
> >> privileged users to start and stop each one independently.
> >> The only promise a subprocess must make to be managed is that
> >> it must be possible to start the process "in the foreground"
> >> (not under its own daemon manager).
> >>
> >> If a "process bus" is implemented I suspect it should be 
> >> implemented at this kind of level.
> >
> > Ah, but there's the rub: we all have different ideas about how to
> > *implement* IPC and control.
> 
> I'm confused by this in your earlier message, describing example  
> scenarios:
> 
> """
> If I'm primarily a Zope user instead, I might start my website with
> zdaemon. This would work exactly like the above, but the Bus object
> would be instantiated and started by the zdaemon package. If I'm using
> Graham's new mod_wsgi with Apache, I might expect it to create and
> control the Bus.
> """
> 
> I think I'm mostly confused by the name "process bus" because it  
> seems like the primary use case for something like this is where all  
> of the applications share the same process space

I don't see why it should be limited by that. The primary use case is
anywhere site components and application components are interacting,
that could benefit from a shared understanding (and control) of the
state of the site. To me, that requires a common set of messages, but
the transport mechanism for those messages should be flexible so that
it's useful in both multithread and multiprocess architectures.

> ...and are all written in Python.  Am I right?

That's the initial target market, yes. But I think we can design the
messaging spec to be useful with non-Python application components.

> If so, maybe a different name is in order?
> "Application Bus"?  Or even "WSGI Bus", if it's presumed that all of
> the applications will be WSGI applications?

Sure, "application bus" is fine, although it's just the other side of
the same coin: "applications" on one side, "site" on the other.

I wouldn't want this to be "WSGI Bus", simply because there's no benefit
for that relationship; the two specs should be useful independently of
each other. In particular, we should be able to design a site-messaging
bus which works for WSGI 1.0, 1.1, 2.0, and whatever might obsolete the
current WSGI in the future.

> I'm confused because zdaemon is a generic process controller, it  
> knows nothing in particular about the application running under it  
> except that it's a UNIX process. It could start postgres instead of  
> Zope if you configured it to.

Sorry, I was really thinking of zopectl when I wrote that. I'm not sure
how zdaemon itself would fit into this whole scheme--it's the process
which zdaemon invokes that should be directly involved. If that process
is a good provider of bus-aware services, you might not need zdaemon
anymore.

> If zdaemon creates a Bus object,
> nothing will be able to send messages to the bus except zdaemon  
> itself, and there can't be any useful listeners because it doesn't  
> share the same process space as its child.

If you use the example Bus implementation I posted, then yes. That's why
I'm pitching WSPBus as a spec, not an implementation. Multiprocess
controllers could implement the bus using any of various forms of IPC;
they just need to arrange for each application component to get a Bus
object that, behind the scenes, is specific to the chosen method of IPC.

So, yes, interprocess communication is more complicated than
intraprocess. But that's true whether you standardize on a bus spec or
not.


Robert Brewer
System Architect
Amor Ministries
[EMAIL PROTECTED]
___
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: 
http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com


Re: [Web-SIG] Web Site Process Bus

2007-06-26 Thread Robert Brewer
Chris McDonough wrote:
> There are also non-webbish processes like postgres, mysql, etc. that  
> need to be treated as "part of the application".
> 
> I handle this currently by running all of the processes related to a  
> specific project under a process controller (which happens to be  
> implemented in Python, but that's beside the point; see
> http://www.plope.com/software/supervisor2/).  The process controller is
> responsible for execing the child processes upon its own startup.
> It is also responsible for restarting children if they die,
> capturing their output (if any), and allowing sufficiently
> privileged users to start and stop each one independently.
> The only promise a subprocess must make to be managed is that
> it must be possible to start the process "in the foreground"
> (not under its own daemon manager).
> 
> If a "process bus" is implemented I suspect it should be implemented  
> at this kind of level.

Ah, but there's the rub: we all have different ideas about how to
*implement* IPC and control. Which is why the WSPBus I outlined says
nothing about the mechanisms of message transport, RPC/IPC, or process
or thread boundaries. Instead, it defines the messages themselves, a set
of states for a given site, and a singleton message broker and state
machine. That's it.
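The following rough Python sketch makes "messages, states, and a singleton broker/state machine" concrete. Several details are assumptions for illustration: the state names (`STOPPED` standing in for the idle state, then `STARTED` and `EXITING`), the transition table, and the choice to run listeners before the state changes. Nothing here says how messages cross process boundaries.

```python
from collections import defaultdict
from enum import Enum


class State(Enum):
    STOPPED = "stopped"   # the "idle" state: process alive, server components down
    STARTED = "started"
    EXITING = "exiting"


class SiteBus:
    """Hypothetical message broker plus state machine; no transport details."""

    # States from which each transition may be requested (an assumption).
    ALLOWED = {
        "start": {State.STOPPED},
        "stop": {State.STARTED},
        "exit": {State.STOPPED},
    }

    def __init__(self):
        self.state = State.STOPPED
        self.listeners = defaultdict(list)

    def subscribe(self, channel, callback):
        self.listeners[channel].append(callback)

    def publish(self, channel, *args):
        return [callback(*args) for callback in self.listeners[channel]]

    def _transition(self, name, new_state):
        if self.state not in self.ALLOWED[name]:
            raise RuntimeError(f"cannot {name} from {self.state.name}")
        self.publish(name)      # listeners run before the state changes
        self.state = new_state

    def start(self):
        self._transition("start", State.STARTED)

    def stop(self):
        self._transition("stop", State.STOPPED)

    def exit(self):
        if self.state is State.STARTED:
            self.stop()         # always shut components down before exiting
        self._transition("exit", State.EXITING)


# One shared instance per site: the singleton every component subscribes to.
bus = SiteBus()

events = []
bus.subscribe("start", lambda: events.append("start"))
bus.subscribe("stop", lambda: events.append("stop"))
bus.start()
bus.stop()      # back to idle without leaving the process
bus.start()
bus.exit()      # implies a stop first
print(events)  # ['start', 'stop', 'start', 'stop']
```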

If we can get that one piece into place, I think it can be a focal point
for interop between 1) application components, 2) HTTP servers, and 3)
"process controllers" (great term, I think I'll use it from now on). We
can achieve that without specifying how any process controller is
implemented, I think. It's difficult to discuss and reason about,
because CherryPy, Apache, Django, Zope, etcetera etcetera all provide
all three. A common bus should make it easier to decouple, say,
CherryPy's app+server from its process controller, allowing more people
to try out supervisor2 more easily.

> "Actions" could be registered for specific subprocess types
> to send some input to a pipe file descriptor, send a signal to
> the process, etc.

Yes, and a supervisor2-wspb package could provide a Bus which does that,
then hand a reference to it to each child's components. The "reference"
in this case would be a child-side proxy object, which knows how to send
signals back to the parent process. But the children don't need to know
those transport details--all they have to do is call bus.subscribe,
bus.publish('log'), etc.
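The sketch below shows roughly what such a child-side proxy could look like, assuming the transport is any object with `put()` and `get_nowait()`. Here `queue.Queue` stands in for a real IPC channel such as `multiprocessing.Queue`. `ChildBusProxy` and `drain` are hypothetical names, not part of supervisor2 or any published spec.

```python
import queue
from collections import defaultdict


class ChildBusProxy:
    """Hypothetical child-side Bus: same methods as a local bus, but
    publish() ships the message to the parent instead of calling listeners."""

    def __init__(self, transport):
        self.transport = transport   # assumption: anything with .put(), e.g. an IPC queue
        self.local_listeners = defaultdict(list)

    def subscribe(self, channel, callback):
        # Listeners live in the child; the transport calls deliver() when
        # the parent forwards a message (that direction is elided here).
        self.local_listeners[channel].append(callback)

    def publish(self, channel, *args):
        self.transport.put((channel, args))

    def deliver(self, channel, *args):
        """Called when the parent forwards a message to this child."""
        return [callback(*args) for callback in self.local_listeners[channel]]


def drain(transport, handlers):
    """Parent side: pull queued messages and dispatch them to real handlers."""
    handled = []
    while True:
        try:
            channel, args = transport.get_nowait()
        except queue.Empty:
            return handled
        handled.append(handlers[channel](*args))


ipc = queue.Queue()                        # stand-in for a real IPC channel
child_bus = ChildBusProxy(ipc)
child_bus.publish("log", "worker 3 ready")
print(drain(ipc, {"log": lambda msg: "parent logged: " + msg}))
# ['parent logged: worker 3 ready']
```

The child's code is identical whichever transport the controller picks. Only the object it was handed differs.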

> It would also be possible to create some sort of dependency
> map between processes in a configuration, that relate the
> actions of one process to another (restart process A if
> process B is restarted, send a signal S to process C if
> signal T is sent to process D, etc).

Dependencies are a layer that can be built on top of the basic bus.
Since it's the process controller that's calling bus.start, stop and
restart, there's nothing about the WSPBus that stops supervisor2 from
handling dependency graphs on its own. If process A has to *know* that
process B has been restarted, that's a problem (which could be addressed
via custom bus channels), but if only the process controller has to
know, then there's no need to add that to the bus spec, IMO.
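For instance, the process controller could keep its own dependency map and work out the restart cascade before anything bus-related is involved. This sketch rests on assumptions: the `dependents` mapping format and `restart_order` are hypothetical, and breadth-first order is just one reasonable choice.

```python
from collections import deque


def restart_order(target, dependents):
    """Return the processes to restart, in order, when `target` is restarted.

    `dependents` maps a process name to the names that must be restarted
    after it (e.g. {"db": ["app"]} means "restart app if db restarts").
    Breadth-first, each process at most once, so cycles cannot loop forever.
    """
    order, seen, pending = [], {target}, deque([target])
    while pending:
        name = pending.popleft()
        order.append(name)
        for dep in dependents.get(name, []):
            if dep not in seen:
                seen.add(dep)
                pending.append(dep)
    return order


# Hypothetical controller-level configuration; the bus spec never sees it.
deps = {"postgres": ["zope", "pylons"], "zope": ["cache"]}
print(restart_order("postgres", deps))  # ['postgres', 'zope', 'pylons', 'cache']
```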


Robert Brewer
System Architect
Amor Ministries
[EMAIL PROTECTED]


Re: [Web-SIG] Web Site Process Bus

2007-06-26 Thread Robert Brewer
Graham, thanks for this and the previous post. I've had these issues in
mind while designing the current bus, but you raised them far more
eloquently than I could have.

Graham Dumpleton wrote:
> The only way we may know is to start stepping through specific use
> cases one at a time, not even worrying about the mechanisms of how the
> bus may actually work, and discuss what the intention of each would
> be.
>
> > My particular use case for keeping SystemExit around is that
> > I have an app that allows the user to upload a new SSL
> > certificate.  Without a restart, or perhaps, given a WSPBus,
> > just a drop to IDLE state, the new SSL certificate would
> > not be applied to new incoming connections.  The thread that
> > puts the new SSL Certificate in place needs to be able to
> > tell the entire server to reload.
> 
> In Apache changing the certificates would need a complete restart of
> everything. Because the child processes aren't privileged they would
> not be able to trigger the main server to do so. This actually gets to
> one of my reservations about some of the stuff being discussed. That
> is, whether WSGI applications should have any ability to control
> the underlying web server. In a shared web hosting environment using
> Apache, allowing such control is not practical as you don't want
> arbitrary users doing things to the server. If you are running Apache
> as a dedicated server for a single application, that is a different
> matter, however. Thus some aspects of what can be done via the bus
> would have to be controllable depending on the environment in which
> one is running.

Right. My expectation was that Apache interfaces (like mod_wsgi and
modpython_gateway) would supply custom Bus objects which deny certain
behaviors (like calling bus state-transition methods from a WSGI
application). I think there's room for Apache site admins to choose
whether applications are allowed to do dangerous things, much like how
AllowOverride works for .htaccess.
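Here is a Python sketch of how such a restricted Bus might look. `ApplicationBus`, the `allow` parameter, and the set of guarded names are assumptions for illustration; the AllowOverride comparison is only an analogy. Subscribing and ordinary publishing pass through. Transition methods, and publishing directly to their channels, raise unless the admin opted in.

```python
from collections import defaultdict


class SiteBus:
    """Minimal hypothetical site bus: pub/sub plus transition methods."""

    def __init__(self):
        self.listeners = defaultdict(list)

    def subscribe(self, channel, callback):
        self.listeners[channel].append(callback)

    def publish(self, channel, *args):
        return [callback(*args) for callback in self.listeners[channel]]

    def start(self):
        self.publish("start")

    def stop(self):
        self.publish("stop")

    def restart(self):
        self.publish("restart")


class ApplicationBus:
    """Hypothetical view a server (mod_wsgi, say) might hand to applications."""

    TRANSITIONS = ("start", "stop", "restart")

    def __init__(self, site_bus, allow=()):
        self._bus = site_bus
        self._allow = set(allow)   # chosen by the site admin, not the app

    def subscribe(self, channel, callback):
        self._bus.subscribe(channel, callback)

    def publish(self, channel, *args):
        # Block the back door of publishing a transition message directly.
        if channel in self.TRANSITIONS and channel not in self._allow:
            raise PermissionError(f"publishing '{channel}' is not allowed")
        return self._bus.publish(channel, *args)

    def __getattr__(self, name):
        # Only reached for names not found normally, i.e. the transitions.
        if name in self.TRANSITIONS:
            if name not in self._allow:
                raise PermissionError(f"bus.{name}() is disabled by the site admin")
            return getattr(self._bus, name)
        raise AttributeError(name)
```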

> At least with Apache, even initiating this sort of stuff from inside
> of a WSGI application may not make a great deal of sense even then. It
> would be far easier and preferable in Apache to use a suexec CGI
> script to accept the upload of the SSL certificate and then trigger a
> restart of Apache.

That's not contrary to the bus concept. If there's a preferred way of
doing things, then a function can be written to do that, supplied by the
site interface, and be subscribed to the appropriate channel.
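For example, a site interface could wrap its preferred mechanism in an ordinary listener. In this sketch `make_external_restart` is hypothetical, and a plain `defaultdict` stands in for the bus. The command is a harmless placeholder, not a real restart script.

```python
import subprocess
import sys
from collections import defaultdict


def make_external_restart(argv):
    """Build a listener that hands 'restart' off to an external program.

    The site interface chooses `argv` (for Apache, perhaps a privileged
    helper script); the bus only sees an ordinary callable on a channel.
    """
    def on_restart():
        result = subprocess.run(argv, capture_output=True, text=True)
        return result.returncode
    return on_restart


listeners = defaultdict(list)   # minimal stand-in for a bus
# Hypothetical: a harmless Python one-liner stands in for the real helper.
listeners["restart"].append(
    make_external_restart([sys.executable, "-c", "print('restarting')"])
)
print([callback() for callback in listeners["restart"]])  # [0]
```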

> So in the end the bus concept may be great for pure
> Python systems, but I'm not so sure about a complicated mixed-code system
> like Apache, especially where there may be better ways of handling it
> through other features of Apache.

Can't those "other features" be componentized? The only thing the Bus
tries to do is make a common interface for such behaviors--if Apache has
native methods to achieve the desired behavior, then great! Wrapping
them in bus listeners (and subscribing a safe set of them by default)
allows deployers who aren't familiar with Apache to get their site up
and running faster. However, mod_wsgi can still use Apache directives
for attaching/detaching the listeners if it likes, providing a more
Apache-like look-and-feel.
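Below is a sketch of the pattern just described: wrap native features as listeners, subscribe a safe set by default, and let configuration attach or detach the rest. The action functions, channel names, and `enable`/`disable` parameters are hypothetical. They stand in for whatever a server interface and its configuration directives would actually provide.

```python
from collections import defaultdict


# Hypothetical native actions a server interface might already expose.
def rotate_logs():
    return "logs rotated"


def graceful_restart():
    return "graceful restart"


def full_restart():
    return "full restart"


# (channel, listener, safe to subscribe by default?)
NATIVE_LISTENERS = [
    ("log-rotate", rotate_logs, True),
    ("graceful", graceful_restart, True),
    ("restart", full_restart, False),
]


def build_listeners(enable=(), disable=()):
    """Subscribe the safe defaults, then apply admin overrides.

    `enable`/`disable` mimic configuration directives that attach or
    detach individual listeners by channel name.
    """
    listeners = defaultdict(list)
    for channel, func, safe in NATIVE_LISTENERS:
        if (safe or channel in enable) and channel not in disable:
            listeners[channel].append(func)
    return listeners


defaults = build_listeners()
print(sorted(defaults))  # ['graceful', 'log-rotate']
custom = build_listeners(enable={"restart"}, disable={"graceful"})
print(sorted(custom))    # ['log-rotate', 'restart']
```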


Robert Brewer
System Architect
Amor Ministries
[EMAIL PROTECTED]


Re: [Web-SIG] Web Site Process Bus

2007-06-25 Thread Robert Brewer
Phillip J. Eby wrote:
> At 02:41 PM 6/25/2007 -0700, Robert Brewer wrote:
> >Phillip J. Eby wrote:
> > > Meanwhile, if you get a start call, you must be starting,
> > > right?  So why worry about the state?  It'd be simpler to just use
> > > "before/during/after" messages the way Twisted does.  Your "block"
> > > example could be replaced by waiting for the "after" message of
> > > the desired state, for example.
> >
> >I just realized I haven't really explained what "start" and "stop" mean.
> >I think you might expect them to mean "beginning of process" and "exit the
> >process". But instead, I'm envisioning an FSM that has an "idle" state
> >in between process init/exit and "server" start/stop, so that, without
> >restarting the process, you can stop and restart (un/bind the socket,
> >etc.) the server components. This should also facilitate a daemon parent
> >process having a single site Bus and starting/stopping child processes
> >that contain the WSGI app and server components, if that's the way you
> >want to compose your site.
> 
> Now I'm really confused.  What is the idle state *for*?

One concrete use case could be a test suite that swaps out applications
between tests without exiting the entire process. But there are other
situations where it's useful. The startup script might need a chance to
run mandatory code in-between stop and exit, or a debugger might stop a
live server (stop accepting connections), run a debug session, fix the
problem, then start up again.
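The test-suite case might look roughly like this. `SiteBus` and `FakeServer` are hypothetical minimal stand-ins, and the server only records which app it would serve. The point is that stop returns the bus to the idle state, so the process can swap applications and start again without exiting. The debugger case would use the same stop/start pair.

```python
from collections import defaultdict


class SiteBus:
    """Tiny hypothetical bus: STOPPED (idle) <-> STARTED, then EXITING."""

    def __init__(self):
        self.state = "STOPPED"
        self.listeners = defaultdict(list)

    def subscribe(self, channel, callback):
        self.listeners[channel].append(callback)

    def _publish(self, channel):
        for callback in self.listeners[channel]:
            callback()

    def start(self):
        self._publish("start")
        self.state = "STARTED"

    def stop(self):
        self._publish("stop")
        self.state = "STOPPED"   # idle: the process is still alive

    def exit(self):
        if self.state == "STARTED":
            self.stop()
        self._publish("exit")
        self.state = "EXITING"


class FakeServer:
    """Hypothetical HTTP server that 'binds' on start and 'unbinds' on stop."""

    def __init__(self):
        self.app = None
        self.serving = None

    def start(self):
        self.serving = self.app   # bind the socket and serve the current app

    def stop(self):
        self.serving = None       # unbind the socket


bus = SiteBus()
server = FakeServer()
bus.subscribe("start", server.start)
bus.subscribe("stop", server.stop)

# A test runner swaps applications between tests without exiting the process.
for app_name in ["app_one", "app_two"]:
    server.app = app_name
    bus.start()
    assert server.serving == app_name
    bus.stop()        # back to idle; safe to swap the app now
bus.exit()
```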

> This seems to imply that you need these states to exist for
> distinct components in a single process -- which would be a
> finer-grained sort of "bus" than has been discussed at this
> point, at least as far as I understand it.

That's not what I had in mind, but I wouldn't be opposed to it.


Robert Brewer
System Architect
Amor Ministries
[EMAIL PROTECTED]


Re: [Web-SIG] Web Site Process Bus (re-send)

2007-06-25 Thread Robert Brewer
Phillip J. Eby wrote:
> Sent: Monday, June 25, 2007 3:09 PM
> To: Robert Brewer; web-sig@python.org
> Subject: RE: [Web-SIG] Web Site Process Bus (re-send)
> 
> At 02:47 PM 6/25/2007 -0700, Robert Brewer wrote:
> >If I'm primarily a Pylons user, I'm used to starting my websites with
> >"paster serve". In this case, paste.script would create a WSPBus object.
> >[It's up to the Paste developers whether to distribute their own
> >wspbus.py module, or to require one via setuptools.] When Paste parses
> >the paste.deploy file and composes the WSGI stack, it somehow hands a
> >Bus reference to Pylons. Exactly how is up for debate; Pylons might
> >provide a hook for it (say, in app_globals.py), or Pylons components
> >might take the bus as a constructor arg.
> >
> >Paste's ServeCommand object would load the WSGI components and server
> >(as it does now), but then, instead of calling server(app), it would
> >call bus.subscribe('start', server, app). Pylons and Zope would also
> >have the opportunity to subscribe listeners to the 'start' channel (and
> >other channels), such as database connections, access log files, etc.
> >Then, instead of calling server(app) directly, Paste would call
> >bus.start(). It could then call bus.block(states=states.STOPPED) instead
> >of trapping KeyboardInterrupt itself. If block() traps a KBInt or
> >SysExit, it sends the 'stop' message to all subscribers, regardless of
> >whether they were created by Paste, Pylons, or Zope. They all shut down
> >synchronously, and then Paste could feel much safer about calling
> >bus.exit(). If KBInt or SysExit are trapped by a Pylons or Zope
> >component, they would be expected to call bus.stop or bus.exit (which
> >would unblock Paste).
> >
> >If I'm primarily a Zope user instead, I might start my website with
> >zdaemon. This would work exactly like the above, but the Bus object
> >would be instantiated and started by the zdaemon package. If I'm using
> >Graham's new mod_wsgi with Apache, I might expect it to create and
> >control the Bus.
> 
> I sort of understand the above scenarios -- except the part where the 
> bus is actually doing anything useful.  What is it that you get that 
> you can't do almost as easily some other way?
> 
> (The other piece that throws me is the idea of using block to *run* 
> the main process.  Huh?  Where's the event loop then?)

It differs by framework. CP runs the socket-listening event loop in a
separate thread and blocks the main thread. If you were running the HTTP
server in the main thread, you'd have to make it the last subscriber to
'start' and let that block for you (that design isn't in the example
code on my blog, but it could be done fairly easily).
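Here is a sketch of the threaded arrangement. The server loop runs in a worker thread subscribed to 'start', and the main thread parks in `block()` until the bus reaches the stopped state. `SiteBus`, `run_server`, and the `block(state=...)` signature are simplified assumptions; the scenario quoted above uses `bus.block(states=states.STOPPED)`. A `threading.Timer` stands in for Ctrl-C.

```python
import threading
import time
from collections import defaultdict


class SiteBus:
    """Hypothetical bus whose block() parks the main thread until a target state."""

    def __init__(self):
        self.state = "STOPPED"
        self.listeners = defaultdict(list)
        self._changed = threading.Condition()

    def subscribe(self, channel, callback, *args):
        # Extra args go to the callback, as in subscribe('start', server, app).
        self.listeners[channel].append((callback, args))

    def publish(self, channel):
        for callback, args in self.listeners[channel]:
            callback(*args)

    def _set_state(self, state):
        with self._changed:
            self.state = state
            self._changed.notify_all()

    def start(self):
        self.publish("start")
        self._set_state("STARTED")

    def stop(self):
        self.publish("stop")
        self._set_state("STOPPED")

    def exit(self):
        if self.state == "STARTED":
            self.stop()
        self.publish("exit")
        self._set_state("EXITING")

    def block(self, state="STOPPED", interval=0.1):
        """Wait in the calling (main) thread until the bus reaches `state`.

        Ctrl-C or SystemExit while waiting becomes an orderly bus.stop().
        """
        try:
            with self._changed:
                while self.state != state:
                    self._changed.wait(interval)
        except (KeyboardInterrupt, SystemExit):
            self.stop()


def run_server(app, stop_event):
    """Hypothetical HTTP server: its accept loop runs in a worker thread."""
    def loop():
        while not stop_event.is_set():
            time.sleep(0.01)  # a real server would accept() and call app here
    threading.Thread(target=loop, daemon=True).start()


stop_event = threading.Event()
bus = SiteBus()
bus.subscribe("start", run_server, "wsgi_app", stop_event)  # instead of server(app)
bus.subscribe("stop", stop_event.set)
bus.start()
threading.Timer(0.2, bus.stop).start()  # stands in for Ctrl-C or an admin command
bus.block(state="STOPPED")              # main thread waits here
bus.exit()
```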

> >If I'm using paste.script but I want autoreload to use execv instead of
> >the default child-process style of autoreload, this is much easier with
> >the WSPBus. Instead of hacking paste.script, I unsubscribe Paste's
> >default autoreloader from the 'restart' channel and subscribe my own.
> 
> This sounds like something useful, or it would if I had any use for 
> code reloaders, which IMO are more appropriate for a development 
> environment than a production one.  Nonetheless, I'm not sure it 
> deserves an entire event bus specification just to be able to have 
> pluggable autoreload.  :)

The point is that *any* site-wide behavior would be pluggable, of
course, from the site log to platform-specific needs (daemonization
etc.), to "what does SIGUSR1 do?" These all seem to me like they'd
benefit from decoupling in order to foster a development market for
them. The CherryPy trunk even swaps out the entire Bus in order to use
win32events on that platform [1].
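A small sketch of that pluggability follows. It swaps the 'restart' listener and turns "what does SIGUSR1 do?" into a table the deployer controls. The reloader functions and the `SIGNAL_CHANNELS` mapping are hypothetical. A real handler would be registered with `signal.signal()` on POSIX; the sketch skips that step so it runs anywhere.

```python
from collections import defaultdict


class SiteBus:
    """Minimal hypothetical bus with subscribe/unsubscribe/publish."""

    def __init__(self):
        self.listeners = defaultdict(list)

    def subscribe(self, channel, callback):
        self.listeners[channel].append(callback)

    def unsubscribe(self, channel, callback):
        self.listeners[channel].remove(callback)

    def publish(self, channel, *args):
        return [callback(*args) for callback in self.listeners[channel]]


def spawn_child_reloader():
    # Hypothetical framework default: restart by re-spawning a child process.
    return "respawned child"


def execv_reloader():
    # Hypothetical replacement; a real one might call os.execv(sys.executable, ...).
    return "re-executed in place"


bus = SiteBus()
bus.subscribe("restart", spawn_child_reloader)   # framework default

# The deployer swaps the behavior without patching the framework.
bus.unsubscribe("restart", spawn_child_reloader)
bus.subscribe("restart", execv_reloader)
print(bus.publish("restart"))  # ['re-executed in place']

# "What does SIGUSR1 do?" becomes a lookup table the deployer controls.
SIGNAL_CHANNELS = {"SIGHUP": "restart", "SIGUSR1": "graceful"}


def on_signal(signame):
    # A real handler would receive (signum, frame) via signal.signal();
    # here the signal name is passed in directly so the sketch is portable.
    return bus.publish(SIGNAL_CHANNELS[signame])
```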


Robert Brewer
System Architect
Amor Ministries
[EMAIL PROTECTED]

[1] http://www.cherrypy.org/browser/trunk/cherrypy/restsrv/win32.py

