Re: [Web-SIG] Server-side async API implementation sketches

2011-01-10 Thread chris . dent

On Sun, 9 Jan 2011, Alice Bevan–McGregor wrote:


On 2011-01-09 09:03:38 -0800, P.J. Eby said:
Hm.  I'm not sure if I like that.  The typical app developer really 
shouldn't be yielding multiple body strings in the first place.


Wait; what?  So you want the app developer to load a 40MB talkcast MP3 into 
memory before sending it?


My reaction too. I've read this elsewhere on this list, in other
topics: a general statement that the correct way to make an
efficient WSGI (1) app is to return just one body string.

This runs contrary to everything I've ever understood about making
web apps that appear performant to the user: get the first byte out to
the browser as soon as possible.

This came up in discussions of wanting to have a cascading series of
generators (to save memory and improve responsiveness): the store
generates data, the serializer generates strings, and the handler
generates (sends out in chunks) the web page from those strings.

So, this is me saying: I'm in favor of a post-wsgi1 world where apps
are encouraged to be generators. To me they are just as useful in
sync and async contexts.
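
For concreteness, a rough sketch of that cascade under plain WSGI 1
(all names invented for illustration):

    def store():
        # Yield raw records as they become available.
        for i in range(3):
            yield {'id': i, 'title': 'Item %d' % i}

    def serialize(records):
        # Turn each record into an HTML fragment.
        for record in records:
            yield '<li>%(title)s</li>\n' % record

    def handler(environ, start_response):
        # WSGI app that streams the page in chunks as they are produced,
        # so the first bytes reach the browser immediately.
        start_response('200 OK',
                       [('Content-Type', 'text/html; charset=utf-8')])
        yield b'<html><body><ul>\n'
        for fragment in serialize(store()):
            yield fragment.encode('utf-8')
        yield b'</ul></body></html>\n'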

--
Chris Dent   http://burningchrome.com/


[Web-SIG] Generator-Based Applications: Marrow HTTPd Example

2011-01-10 Thread Alice Bevan–McGregor

Howdy!

Here's a rewritten (and incomplete, but GET and HEAD requests work 
fine) marrow.server.http branch [1] that illustrates a simple 
application [2] and protocol implementation [3].  Most notably, examine 
the 'resume' method [4].


The 'basic' example yields a future instance and uses the data as the 
response body.
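
Loosely, the shape of that style of application is something like the
following sketch; the submit() helper, the (status, headers, body)
yield, and the little driver are assumptions for illustration, *not*
the actual marrow.server.http interface:

    from concurrent.futures import ThreadPoolExecutor

    def slow_lookup():
        # Stand-in for blocking work done off the I/O loop.
        return 'hello from a worker thread'

    def application(environ, submit):
        # Yield a future; the server resumes the generator with its result.
        data = yield submit(slow_lookup)
        body = data.encode('utf-8')
        yield '200 OK', [('Content-Type', 'text/plain')], [body]

    if __name__ == '__main__':
        # Minimal driver standing in for the server's resume logic.
        executor = ThreadPoolExecutor(max_workers=1)
        app = application({}, executor.submit)
        future = next(app)                        # the app yields a future
        status, headers, body = app.send(future.result())
        print(status, headers, body)
        executor.shutdown()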


Note that this particular rewrite is not complete, nor has it been 
profiled and optimized; initial benchmarks (using the 'benchmark' 
example) show a reduction of ~600 RSecs from the 'draft' branch, which 
is substantial, but hasn't been traced to a particular segment of code 
or design decision yet.


The server is now -extremely- easy to read and follow, with all code 
acting in a linear way.  (Application worker threading has been removed 
from this branch as well; the server is once again purely async.)


- Alice.

[1] https://github.com/pulp/marrow.server.http/tree/generator

[2] https://github.com/pulp/marrow.server.http/blob/generator/examples/basic.py

[3] 
https://github.com/pulp/marrow.server.http/blob/generator/marrow/server/http/protocol.py


[4] 

https://github.com/pulp/marrow.server.http/blob/generator/marrow/server/http/protocol.py#L177-226 






Re: [Web-SIG] Generator-Based Applications: Marrow HTTPd Example

2011-01-10 Thread Massimo Di Pierro
I like this a lot!

On Jan 10, 2011, at 6:25 AM, Alice Bevan–McGregor wrote:

 Howdy!
 
 Here's a rewritten (and incomplete, but GET and HEAD requests work fine) 
 marrow.server.http branch [1] that illustrates a simple application [2] and 
 protocol implementation [3].  Most notably, examine the 'resume' method [4].
 
 The 'basic' example yields a future instance and uses the data as the 
 response body.
 
 Note that this particular rewrite is not complete, nor has it been profiled 
 and optimized; initial benchmarks (using the 'benchmark' example) show a 
 reduction of ~600 RSecs from the 'draft' branch, which is substantial, but 
 hasn't been traced to a particular segment of code or design decision yet.
 
 The server is now -extremely- easy to read and follow, with all code acting 
 in a linear way.  (Application worker threading has been removed from this 
 branch as well; the server is once again purely async.)
 
   - Alice.
 
 [1] https://github.com/pulp/marrow.server.http/tree/generator
 
 [2] 
 https://github.com/pulp/marrow.server.http/blob/generator/examples/basic.py
 
 [3] 
 https://github.com/pulp/marrow.server.http/blob/generator/marrow/server/http/protocol.py
 
 [4] 
 https://github.com/pulp/marrow.server.http/blob/generator/marrow/server/http/protocol.py#L177-226
  
 
 



Re: [Web-SIG] Server-side async API implementation sketches

2011-01-10 Thread James Y Knight

On Jan 10, 2011, at 4:48 AM, chris.d...@gmail.com wrote:

 My reaction too. I've read this elsewhere on this list too, in other
 topics. A general statement that the correct way to make an
 efficient WSGI (1) app is to return just one body string.
 
 This runs contrary to everything I've ever understood about making
 web apps that appear performant to the user: get the first byte out to
 the browser as soon as possible.

Wee. You want to get the earliest byte *which is required to display the 
page* out as soon as possible. The browser usually has to parse a whole lot of 
the response before it starts displaying anything useful.

And in order to do that, you really want to minimize the number of 
round trips, which (when the data is small) depends heavily on the 
number of packets sent, not the amount of data! Using a generator in 
WSGI forces the server to push out partial data as soon as possible, 
so it can end up using many more packets than if you buffered 
everything and sent it at once, and thus will be slower.

As the buffering and streaming section of WSGI1 already says...:
 Generally speaking, applications will achieve the best throughput by 
 buffering their (modestly-sized) output and sending it all at once. This is a 
 common approach in existing frameworks such as Zope: the output is buffered 
 in a StringIO or similar object, then transmitted all at once, along with the 
 response headers.
 
 [...]
 
 For large files, however, or for specialized uses of HTTP streaming (such as 
 multipart server push), an application may need to provide output in 
 smaller blocks (e.g. to avoid loading a large file into memory). It's also 
 sometimes the case that part of a response may be time-consuming to produce, 
 but it would be useful to send ahead the portion of the response that 
 precedes it.
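
For concreteness, the two styles described there look roughly like this
(illustrative sketch only; the file path is a placeholder):

    def buffered_app(environ, start_response):
        # Modest-sized page: build it in one buffer and send it at once,
        # so the server can set Content-Length and use few packets.
        body = b''.join([b'<html><body>',
                         b'<p>Hello, world.</p>',
                         b'</body></html>'])
        start_response('200 OK', [('Content-Type', 'text/html'),
                                  ('Content-Length', str(len(body)))])
        return [body]

    def streaming_app(environ, start_response):
        # Large file: hand it out in blocks rather than loading it all
        # into memory.
        start_response('200 OK', [('Content-Type', 'audio/mpeg')])
        def blocks(path, size=64 * 1024):
            with open(path, 'rb') as f:
                chunk = f.read(size)
                while chunk:
                    yield chunk
                    chunk = f.read(size)
        return blocks('/path/to/talkcast.mp3')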

James


Re: [Web-SIG] Server-side async API implementation sketches

2011-01-10 Thread P.J. Eby

At 04:39 PM 1/9/2011 -0800, Alice Bevan–McGregor wrote:

On 2011-01-09 09:26:19 -0800, P.J. Eby said:

If wsgi.input offers any synchronous methods...


Regardless of whether or not wsgi.input is implemented in an async 
way, wrap it in a future and eventually get around to yielding 
it.  Problem /solved/.


Not the API problem.  If I'm accustomed to writing synchronous code, 
the async version looks ridiculous.  Also, an existing WSGI web 
framework isn't going to be portable to this API without putting it 
in a future.


My hope was for an API that would be a simple enough translation that 
*everybody* could be persuaded to use it, but having to use futures 
just to write a normal application simply isn't going to work for 
the core WSGI API.  As a separate WSGI-A profile, sure, it works fine.



If it offers only asynchronous methods, OTOH, then you can't pass 
wsgi.input to any existing libraries (e.g. the cgi module).


Describe to me how a function can be suspended (other than magical 
greenthreads) if it does not yield; if I knew this, maybe I wouldn't 
be so confused.


I'm not sure what you're confused about.  I'm the one who forgot you 
have to read from wsgi.input in a blocking way to write a normal app.  ;-)


(Mainly, because I was so excited about the potential in your 
sketched API, and I got sucked into the process of implementing/improving it.)



I've deviated from your sketch, obviously, and any semblance of 
yielding a 3-tuple.  Stop thinking of my example code as conforming 
to your ideas; it's a new idea, or, worst case, a narrowing of an 
idea into its simplest form.


What I'm trying to point out is that you've missed two important API 
enhancements in my sketch, that make it so that app and middleware 
authors don't have to explicitly manage any generator methods or even 
future methods.



The mechanics of yielding future instances allow you to (in your 
server) implement the necessary async code however you wish, while 
providing a uniform interface to both sync and async applications 
running on sync and async servers.  In fact, you would be able to 
safely run a sync application on an async server and 
vice-versa.  You can, on an async server:


:: Add a callback to the yielded future to re-schedule the 
application generator.


:: If using greenthreads, just block on future.result() then 
immediately wake up the application generator.


:: Do other things I can't think of because I'm still waking up.


I am not sure why you're reiterating these things.  The sample code I 
posted shows precisely where you'd *do* them in a sync or async 
server.  That's not where the problem lies.



That is not optimal, because now you have an optional API that 
applications wanting to be compatible will need to detect and choose between.


It wasn't supposed to be optional, but it's beside the point since 
the presence of a blocking API means the application can block.


The issue might be addressable by having an environment key like 
'wsgi.canblock' (indicating whether the application is already in a 
separate thread/process), and a piece of middleware that simply 
spawns its child app to a future if wsgi.canblock isn't set.  Then 
people who write blocking applications could use the decorator.
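
A loose sketch of that idea follows; the 'wsgix.executor' key and the
yield-based calling convention are assumptions made for illustration,
and nothing here is specified anywhere:

    def canblock(app):
        def middleware(environ):
            if environ.get('wsgi.canblock'):
                # Already running somewhere it is safe to block.
                response = yield app(environ)
            else:
                # Spawn the blocking child app onto the server's executor
                # and suspend (by yielding the future) until it finishes.
                submit = environ['wsgix.executor'].submit
                child_environ = dict(environ, **{'wsgi.canblock': True})
                response = yield submit(run_to_completion, app, child_environ)
            yield response
        return middleware

    def run_to_completion(app, environ):
        # Run on the worker thread: drive the wrapped (blocking) app and
        # hand back whatever response it produces.
        return next(app(environ))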




Mostly, though, it seems to me that the need to be able to write 
blocking code does away with most of the benefit of trying to have 
a single API in the first place.


You have artificially created this need, ignoring the semantics of 
using the server-specific executor to detect async-capable requests 
and the yield mechanics I suggested, which together form a single, 
coherent API across sync and async servers and applications.


I haven't ignored them.  I'm simply representing the POV of existing 
WSGI apps and frameworks, which currently block, and are unlikely to 
be rewritten so as not to block.  I thought, briefly, that it was 
possible to make an API with a low-enough conceptual overhead to 
allow that porting to occur, and let my enthusiasm carry me away.


I was wrong, though: even the extremely minimalist version isn't 
going to be usable for ported code, which relegates the async version 
to a niche role.


I would note, though, that this is *still* better than my previous 
position, which was that there was no point making an async API *at all*.  ;-)




Re: [Web-SIG] Server-side async API implementation sketches

2011-01-10 Thread P.J. Eby

At 05:06 PM 1/9/2011 -0800, Alice Bevan–McGregor wrote:

On 2011-01-09 09:03:38 -0800, P.J. Eby said:
Hm.  I'm not sure if I like that.  The typical app developer really 
shouldn't be yielding multiple body strings in the first place.


Wait; what?  So you want the app developer to load a 40MB talkcast 
MP3 into memory before sending it?


Statistically speaking, the typical app is producing a web page, 
made of HTML and severely limited in size by the short attention span 
of the human user reading it.  ;-)


Obviously, the spec should allow and support streaming.


  You want to completely eliminate the ability to stream an HTML 
page to the client in chunks (e.g. head block, headers + search 
box, search results, advertisements, footer -- the exact thing 
Google does with every search result)?  That sounds like 
artificially restricting application developers, to me.


First, I don't want to eliminate it.   Second, Google is hardly the 
typical app developer.  If you need the capability, it'll still be there.





In your approach, the above samples have to be rewritten as:
 return app(environ)
[snip]


My code does not use return.  At all.  Only yield.


If you return the calling of a generator, then you pass the original 
generator through to the caller, and it is the equivalent of writing 
a loop in place that iterates over the subgenerator, only without the 
additional complexity of needing to send/throw.
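
In middleware terms, the equivalence is roughly this (illustrative
sketch):

    def middleware_return(app):
        def wrapper(environ):
            # Hand the subgenerator straight to the caller; send()/throw()
            # from the caller reach app's generator directly.
            return app(environ)
        return wrapper

    def middleware_loop(app):
        def wrapper(environ):
            # The in-place loop this stands in for -- except that a bare
            # for-loop can't forward send()/throw() into the subgenerator.
            for item in app(environ):
                yield item
        return wrapper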



The above middleware pattern works with the sketches I gave on the 
PEAK wiki, and I've now updated the wiki to include an example app 
and middleware for clarity.


I'll need to re-read the code on your wiki; I find it incredibly 
difficult to grok.  However, you can help me out a bit by answering a 
few questions about it: how does middleware trap exceptions raised 
by the application?


With try/except around the yield app(environ) call (main app run), 
or with try/except around the yield body_iter call (body iterator run).
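
A rough sketch of the first placement, assuming the yield-based
convention under discussion (the 3-tuple response shape is likewise an
assumption); wrapping the later yield of the body iterator in
try/except works the same way:

    def error_trapping(app):
        def middleware(environ):
            try:
                # "Main app run": exceptions raised while the application
                # builds its response surface here, at this yield.
                status, headers, body_iter = yield app(environ)
            except Exception:
                yield ('500 Internal Server Error',
                       [('Content-Type', 'text/plain')], [b'oops'])
                return
            yield status, headers, body_iter
        return middleware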



 (Specifically how does the server pass the buck with 
exceptions?  And how does the exception get to the application to 
bubble out towards the server, through middleware, as it does now?)


All that is in the Coroutine class, which is a generator-based green 
thread implementation.


Remember how you were saying that your sketch would benefit from PEP 380?

The Coroutine class is a pure-Python implementation of PEP 380, minus 
the syntactic sugar.  It turns yield into yield from whenever the 
value you yield is itself a geniter.


So, if you pretend that yield app(environ) and yield body_iter 
are actually yield froms instead, then the mechanics should become clearer.
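
For reference, here is the PEP 380 sugar being emulated (this small
example needs a Python that actually implements PEP 380's yield from
and generator return values):

    def subtask():
        yield 'working'
        return 'finished'          # PEP 380: generators gain return values

    def caller():
        result = yield from subtask()    # what "yield subtask()" emulates
        yield 'subtask said: %s' % result

    print(list(caller()))    # ['working', 'subtask said: finished']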


Coroutine runs a generator by sending or throwing into it.  It then 
takes the result (either a value or an exception) and decides where 
to send that.  If it's an object with send/throw methods, it pushes 
it on the stack, and passes None into it to start it running, thereby 
calling the subgenerator.  If it's an exception or a return value 
(e.g. StopIteration(value=None)), it pops the stack and propagates 
the exception or return value to the calling generator.


If it's a future or some other object the server cares about, then 
the server can pause the coroutine (by returning 'routine.PAUSE' when 
the coroutine asks it what to do).


Coroutine accepts a trampoline function and a completion callback as 
parameters: the trampoline function inspects a value yielded by a 
generator and then tells the coroutine whether it should PAUSE, CALL, 
RETURN, RESUME, or RAISE in response to that particular 
yield.  RESUME is used for synchronous replies, where the yield 
returns immediately.  RETURN means pop the current generator off the 
stack and return a value to the calling generator.  RAISE raises an 
error immediately in the top-of-stack generator.  CALL pushes a 
geniter on the stack.
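
A heavily simplified, standalone sketch of that kind of trampoline, for
flavour only -- this is not the Coroutine class itself; PAUSE and RAISE
are omitted, the decision function only distinguishes 'call' from
'resume', and it relies on Python 3's generator return values:

    import types

    def run(gen, decide):
        # Drive a stack of generators.  decide(value) returns ('call', g)
        # to push a subgenerator, or ('resume', v) to send a value straight
        # back into the generator that yielded it.
        stack = [gen]
        value, exc = None, None
        while stack:
            top = stack[-1]
            try:
                yielded = top.throw(exc) if exc else top.send(value)
                exc = None
            except StopIteration as stop:
                # Subgenerator finished: pop it and hand its return value
                # (PEP 380 style) back to the generator that called it.
                stack.pop()
                value, exc = getattr(stop, 'value', None), None
                continue
            except Exception as e:
                # Propagate the error into the calling generator, or out.
                stack.pop()
                if not stack:
                    raise
                value, exc = None, e
                continue
            action, value = decide(yielded)
            if action == 'call':
                stack.append(value)
                value = None   # a fresh generator is started with send(None)
        return value

    def decide(value):
        if isinstance(value, types.GeneratorType):
            return 'call', value
        return 'resume', value

    # Tiny demonstration: a parent generator "calling" a child generator.
    def child():
        doubled = (yield 21) * 2      # resumed with 21 by the trampoline
        return doubled

    def parent():
        result = yield child()        # pushed onto the stack, like yield from
        return 'child returned %r' % result

    print(run(parent(), decide))      # prints: child returned 42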


IOW, the Coroutine class lets you write servers with just a little 
glue code to tell it how you want the control to flow.  It's actually 
entirely independent of WSGI or any particular WSGI protocol...  I'm 
thinking that I should probably wrap it up into a PyPI package with 
some docs and tests, though I'm not sure when I'd get around to it.


(Heck, it's the sort of thing that probably ought to be in the stdlib 
-- certainly PEP 380 can be implemented in terms of it.)


Anyway, both the sync and async server examples have trampolines that 
detect futures and process them accordingly.  If you yield to a 
future, you get back its result -- either a value or an exception at 
the point where you yielded it.  You don't have to explicitly call 
.result() (in fact, you *can't*); it's already been called before 
control gets back to the place that yielded it.


IOW, in my sketch, yielding to a future looks like this:

data = yield submit(wsgi_input.read, 4096)

without the '.result()' on the end.
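
In a synchronous server, the corresponding trampoline policy can be as
small as the sketch below (using concurrent.futures, and the same
'call'/'resume' convention as the simplified trampoline sketched
above); an async server would instead register an add_done_callback()
on the future and park the coroutine until it fires:

    import types
    from concurrent.futures import Future, ThreadPoolExecutor

    executor = ThreadPoolExecutor(max_workers=4)
    submit = executor.submit    # what the application calls in its yield

    def sync_decide(value):
        # Synchronous flavour: block on a yielded future and resume the
        # generator with its finished result, so the application never
        # calls .result() itself.  (A fuller version would throw() the
        # future's exception into the generator instead of letting it
        # escape here.)
        if isinstance(value, Future):
            return 'resume', value.result()
        if isinstance(value, types.GeneratorType):
            return 'call', value
        return 'resume', value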

Re: [Web-SIG] Declaring PEP 3333 accepted (was: PEP 444 != WSGI 2.0)

2011-01-10 Thread Guido van Rossum
Ok, now that we've had a week of back and forth about this, let me
repeat my threat. Unless more concerns are brought up in the next 24
hours, can PEP 3333 be accepted? It seems a lot of people are waiting
for a decision that enables implementers to go ahead and claim PEP
333[3] compatibility. PEP 444 can take longer.

--Guido

On Fri, Jan 7, 2011 at 4:56 PM, Graham Dumpleton
graham.dumple...@gmail.com wrote:
 On 8 January 2011 02:55, P.J. Eby p...@telecommunity.com wrote:
 At 05:27 PM 1/7/2011 +1100, Graham Dumpleton wrote:

 Another thing, though.  For output we changed to sys.stdout.buffer.  For
 input, should we be using sys.stdin.buffer as well if we want bytes?

 %$*()%!!!  Sorry, still getting used to this whole Python 3 thing.
  (Honestly, I don't even use Python 2.6 for anything real yet.)


 Good thing I tried running this. Did we all assume that someone else
 was actually running it to check it? :-)

 Well, I only recently started changing the examples to actual Python 3, vs
 being the old Python 2 examples.  Though, I'm not sure anybody ever ran the
 Python 2 ones.  ;-)

 Latest CGI/WSGI bridge example extract from PEP 3333 seems to work
 okay for my simple test.

 So, if no more technical problems (vs cosmetic) that anyone else sees,
 that is probably it and we can toss this baby out the door.

 Graham




-- 
--Guido van Rossum (python.org/~guido)


Re: [Web-SIG] Declaring PEP 3333 accepted (was: PEP 444 != WSGI 2.0)

2011-01-10 Thread Alice Bevan–McGregor

On 2011-01-10 13:12:57 -0800, Guido van Rossum said:

Ok, now that we've had a week of back and forth about this, let me 
repeat my threat. Unless more concerns are brought up in the next 24 
hours, can PEP 3333 be accepted?


+9001 (> 9000)

It seems a lot of people are waiting for a decision that enables 
implementers to go ahead and claim PEP 333[3] compatibility.


Django, mod_wsgi, CherryPy, etc. all have solutions that would need, 
AFAIK, minor tweaking before going live, which would make adoption of 
PEP 3333 the fastest of any PEP I've ever seen. ;)



PEP 444 can take longer.


Indeed it will!  :D

I have the conversion from Textile to ReST about half completed; I'll 
continue to poke it now that mailing list traffic seems to have died 
down and won't be consuming the majority of my Copious Spare Time™.  
ReST just doesn't jive with my neural net.  :/


- Alice.




Re: [Web-SIG] Generator-Based Applications: Marrow HTTPd Example

2011-01-10 Thread Alice Bevan–McGregor

On 2011-01-10 04:25:40 -0800, Alice Bevan–McGregor said:

Note that this particular rewrite is not complete, nor has it been 
profiled and optimized; initial benchmarks (using the 'benchmark' 
example) show a reduction of ~600 RSecs from the 'draft' branch, which 
is substantial, but hasn't been traced to a particular segment of code 
or design decision yet.


Ignore that number; I had some runaway processes eating up my CPU.  
That's what I get for going weeks or months between reboots.  ;)


The drop (benchmarking the current 'draft' branch against the 
'generator' branch) is now ~200 RSecs against the draft branch's 
~3.2 KRSecs.  Much more reasonable, and subject to enough stddev 
across runs to make the difference negligible at best.


*phew*

- Alice.




Re: [Web-SIG] PEP 444 feature request - Futures executor

2011-01-10 Thread Timothy Farrell
- Original Message -
From: P.J. Eby p...@telecommunity.com
To: Timothy Farrell tfarr...@owassobible.org, web-sig@python.org
Sent: Friday, January 7, 2011 2:14:20 PM
Subject: Re: [Web-SIG] PEP 444 feature request - Futures executor

 There are some other issues that might need to be addressed, like 
 maybe adding an attribute or two for the level of reliability 
 guaranteed by the executor, or allowing the app to request a given 
 reliability level.  Specifically, it might be important to distinguish 
 between:

 * this will be run exactly once as long as the server doesn't crash
 * this will eventually be run once, even if the server suffers a 
 fatal error between now and then

 IOW, to indicate whether the thing being done is transactional, so to speak.

I understand why this would be good (credit card transactions particularly), 
but how would this play out in the real world?  All servers will do their best 
to run the jobs given them.  

Are you suggesting that there would be a property of the executor that would 
change based on the load of the server or some other metric?  Say the server 
has 100 queued jobs and only 2 worker threads: would it then have a way of 
saying, "I'll get to this eventually, but I'm pretty swamped"?

Is that what you're getting at, or something more like database 
transactions: "I guarantee that I won't stop halfway through this process"?
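
Purely as a hypothetical illustration of that distinction (none of
these names come from PEP 444, PEP 3148, or any real server), the
guarantee could be surfaced as an attribute the application checks:

    from concurrent.futures import ThreadPoolExecutor

    class BestEffortExecutor(ThreadPoolExecutor):
        # "Run exactly once as long as the server doesn't crash."
        reliability = 'best-effort'

    class DurableExecutor(ThreadPoolExecutor):
        # "Eventually run once, even if the server suffers a fatal error"
        # -- i.e. jobs would first be journalled to persistent storage
        # (not actually implemented here).
        reliability = 'durable'

    def capture_payment(executor, charge_fn):
        # A transactional task can refuse a merely best-effort executor.
        if getattr(executor, 'reliability', 'best-effort') != 'durable':
            raise RuntimeError('payment capture needs a durable executor')
        return executor.submit(charge_fn)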

Thanks,
-t