Re: [Web-SIG] Server-side async API implementation sketches
At 05:06 PM 1/9/2011 -0800, Alice Bevan-McGregor wrote: On 2011-01-09 09:03:38 -0800, P.J. Eby said: Hm. I'm not sure if I like that. The typical app developer really shouldn't be yielding multiple body strings in the first place. Wait; what? So you want the app developer to load a 40MB talkcast MP3 into memory before sending it? Statistically speaking, the "typical app" is producing a web page, made of HTML and severely limited in size by the short attention span of the human user reading it. ;-) Obviously, the spec should allow and support streaming. You want to completely eliminate the ability to stream an HTML page to the client in chunks (e.g. block, headers + search box, search results, advertisements, footer -- the exact thing Google does with every search result)? That sounds like artificially restricting application developers, to me. First, I don't want to eliminate it. Second, Google is hardly the "typical app developer". If you need the capability, it'll still be there. In your approach, the above samples have to be rewritten as: return app(environ) [snip] My code does not use return. At all. Only yield. If you return the calling of a generator, then you pass the original generator through to the caller, and it is the equivalent of writing a loop in place that iterates over the subgenerator, only without the additional complexity of needing to send/throw. The above middleware pattern works with the sketches I gave on the PEAK wiki, and I've now updated the wiki to include an example app and middleware for clarity. I'll need to re-read the code on your wiki; I find it incredibly difficult to grok, however, you can help me out a bit by answering a few questions about it: How does middleware trap exceptions raised by the application? With try/except around the "yield app(environ)" call (main app run), or with try/except around the "yield body_iter" call (body iterator run)? (Specifically, how does the server pass the buck with exceptions? 
And how does the exception get to the application to bubble out towards the server, through middleware, as it does now?) All that is in the Coroutine class, which is a generator-based "green thread" implementation. Remember how you were saying that your sketch would benefit from PEP 380? The Coroutine class is a pure-Python implementation of PEP 380, minus the syntactic sugar. It turns "yield" into "yield from" whenever the value you yield is itself a geniter. So, if you pretend that "yield app(environ)" and "yield body_iter" are actually "yield from"s instead, then the mechanics should become clearer. Coroutine runs a generator by sending or throwing into it. It then takes the result (either a value or an exception) and decides where to send that. If it's an object with send/throw methods, it pushes it on the stack, and passes None into it to start it running, thereby "calling" the subgenerator. If it's an exception or a return value (e.g. StopIteration(value=None)), it pops the stack and propagates the exception or return value to calling generator. If it's a future or some other object the server cares about, then the server can pause the coroutine (by returning 'routine.PAUSE' when the coroutine asks it what to do). Coroutine accepts a trampoline function and a completion callback as parameters: the trampoline function inspects a value yielded by a generator and then tells the coroutine whether it should PAUSE, CALL, RETURN, RESUME, or RAISE in response to that particular yield. RESUME is used for synchronous replies, where the yield returns immediately. RETURN means pop the current generator off the stack and return a value to the calling generator. RAISE raises an error immediately in the top-of-stack generator. CALL pushes a geniter on the stack. IOW, the Coroutine class lets you write servers with just a little glue code to tell it how you want the control to flow. It's actually entirely independent of WSGI or any particular WSGI protocol... 
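The stack-based push/pop flow described here is easy to miniaturize. The following is a toy reconstruction of the idea, not the actual Coroutine class from the PEAK wiki; it handles only the CALL, RETURN, and RESUME cases, with illustrative names:

```python
import types

def run(gen):
    # Drive a generator stack: yielded generators are CALLs (push),
    # StopIteration is a RETURN (pop), anything else is a synchronous
    # RESUME that is sent straight back in.
    stack = [gen]
    value = None
    while stack:
        try:
            yielded = stack[-1].send(value)
        except StopIteration as stop:
            stack.pop()                          # RETURN to the caller
            value = getattr(stop, 'value', None)
            continue
        if isinstance(yielded, types.GeneratorType):
            stack.append(yielded)                # CALL: push the sub-generator
            value = None
        else:
            value = yielded                      # RESUME: synchronous reply
    return value

def inner():
    x = yield 21          # a synchronous yield; the runner echoes it back
    return x * 2          # becomes StopIteration(value=42) for the caller

def outer():
    result = yield inner()   # reads like "yield from inner()"
    return result
```

With this, `run(outer())` evaluates `inner()` as a sub-call exactly as if the yield were a `yield from`.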
I'm thinking that I should probably wrap it up into a PyPI package with some docs and tests, though I'm not sure when I'd get around to it. (Heck, it's the sort of thing that probably ought to be in the stdlib -- certainly PEP 380 can be implemented in terms of it.) Anyway, both the sync and async server examples have trampolines that detect futures and process them accordingly. If you yield to a future, you get back its result -- either a value or an exception at the point where you yielded it. You don't have to explicitly call .result() (in fact, you *can't*); it's already been called before control gets back to the place that yielded it. IOW, in my sketch, yielding to a future looks like this: data = yield submit(wsgi_input.read, 4096) without the '.result()' on the end. ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
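As a rough illustration of that convention, here is a minimal synchronous driver (hypothetical names; `submit` stands in for the server-provided helper) that resolves a yielded future before resuming the application, so the app never calls .result():

```python
from concurrent.futures import Future, ThreadPoolExecutor
from io import BytesIO

executor = ThreadPoolExecutor(max_workers=2)

def submit(fn, *args):
    # stands in for the server-supplied submit() from the sketch
    return executor.submit(fn, *args)

def drive(gen):
    # Resolve each yielded future and send its value (or throw its
    # exception) back in, so the app never touches .result() itself.
    value, exc = None, None
    while True:
        try:
            yielded = gen.throw(exc) if exc else gen.send(value)
        except StopIteration:
            return
        value, exc = None, None
        if isinstance(yielded, Future):
            try:
                value = yielded.result()   # already resolved when the app resumes
            except Exception as e:
                exc = e

wsgi_input = BytesIO(b'hello world')   # stand-in for the real input stream
out = []

def app(environ):
    data = yield submit(wsgi_input.read, 4)   # no .result() at the yield site
    out.append(data)

drive(app({}))
```

An async server would do the same dance from a completion callback instead of looping inline.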
Re: [Web-SIG] Server-side async API implementation sketches
At 04:39 PM 1/9/2011 -0800, Alice Bevan-McGregor wrote: On 2011-01-09 09:26:19 -0800, P.J. Eby said: If wsgi.input offers any synchronous methods... Regardless of whether or not wsgi.input is implemented in an async way, wrap it in a future and eventually get around to yielding it. Problem /solved/. Not the API problem. If I'm accustomed to writing synchronous code, the async version looks ridiculous. Also, an existing WSGI web framework isn't going to be able to be ported to this API without putting it in a future. My hope was for an API that would be a simple enough translation that *everybody* could be persuaded to use it, but having to use futures just to write a "normal" application simply isn't going to work for the core WSGI API. As a separate "WSGI-A" profile, sure, it works fine. If it offers only asynchronous methods, OTOH, then you can't pass wsgi.input to any existing libraries (e.g. the cgi module). Describe to me how a function can be suspended (other than magical greenthreads) if it does not yield; if I knew this, maybe I wouldn't be so confused. I'm not sure what you're confused about. I'm the one who forgot you have to read from wsgi.input in a blocking way to write a normal app. ;-) (Mainly, because I was so excited about the potential in your sketched API, and I got sucked into the process of implementing/improving it.) I've deviated from your sketch, obviously, and any semblance of yielding a 3-tuple. Stop thinking of my example code as conforming to your ideas; it's a new idea, or, worst case, a narrowing of an idea into its simplest form. What I'm trying to point out is that you've missed two important API enhancements in my sketch, that make it so that app and middleware authors don't have to explicitly manage any generator methods or even future methods. 
The mechanics of yielding future instances allow you to (in your server) implement the necessary async code however you wish while providing a uniform interface to both sync and async applications running on sync and async servers. In fact, you would be able to safely run a sync application on an async server and vice-versa. You can, on an async server:
:: Add a callback to the yielded future to re-schedule the application generator.
:: If using greenthreads, just block on future.result() then immediately wake up the application generator.
:: Do other things I can't think of because I'm still waking up.
I am not sure why you're reiterating these things. The sample code I posted shows precisely where you'd *do* them in a sync or async server. That's not where the problem lies. That is not optimum, because now you have an optional API that applications that want to be compatible will need to detect and choose between. It wasn't supposed to be optional, but it's beside the point since the presence of a blocking API means the application can block. The issue might be addressable by having an environment key like 'wsgi.canblock' (indicating whether the application is already in a separate thread/process), and a piece of middleware that simply spawns its child app to a future if wsgi.canblock isn't set. Then people who write blocking applications could use the decorator. Mostly, though, it seems to me that the need to be able to write blocking code does away with most of the benefit of trying to have a single API in the first place. You have artificially created this need, ignoring the semantics of using the server-specific executor to detect async-capable requests and the yield mechanics I suggested, which happens to be a single, coherent API across sync and async servers and applications. I haven't ignored them. I'm simply representing the POV of existing WSGI apps and frameworks, which currently block, and are unlikely to be rewritten so as not to block. 
I thought, briefly, that it was possible to make an API with a low-enough conceptual overhead to allow that porting to occur, and let my enthusiasm carry me away. I was wrong, though: even the extremely minimalist version isn't going to be usable for ported code, which relegates the async version to a niche role. I would note, though, that this is *still* better than my previous position, which was that there was no point making an async API *at all*. ;-)
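The 'wsgi.canblock' idea floated earlier in the thread could be sketched roughly like this; every key name here ('wsgi.canblock', 'wsgi.executor') is hypothetical, and the wrapped app is assumed to be a plain blocking callable returning a response tuple:

```python
from concurrent.futures import ThreadPoolExecutor

def blocking(app):
    # If the server says blocking is safe ('wsgi.canblock' set), call the
    # app inline; otherwise run the whole call on the server-provided
    # executor, yield the future, and then yield the response the driver
    # sends back in.
    def wrapper(environ):
        if environ.get('wsgi.canblock'):
            yield app(environ)                 # already in a worker thread
        else:
            future = environ['wsgi.executor'].submit(app, environ)
            response = yield future            # driver resumes us when done
            yield response
    return wrapper

def slow_app(environ):
    return '200 OK', [('Content-Type', 'text/plain')], [b'hello']

wrapped = blocking(slow_app)

# On a server that already gave us a worker thread:
g = wrapped({'wsgi.canblock': True})
response = next(g)

# On an async server: the first yield is the future; the driver sends the
# result back once the future completes (simulated here by .result()).
g2 = wrapped({'wsgi.executor': ThreadPoolExecutor(max_workers=1)})
future = next(g2)
response2 = g2.send(future.result())
```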
Re: [Web-SIG] Server-side async API implementation sketches
On Jan 10, 2011, at 4:48 AM, chris.d...@gmail.com wrote: > My reaction too. I've read this elsewhere on this list too, in other > topics. A general statement that the correct way to make an > efficient WSGI (1) app is to return just one body string. > > This runs contrary to everything I've ever understood about making > web apps that appear performant to the user: get the first byte out to > the browser as soon as possible. Wee. You want to get the earliest byte *which is required to display the page* out as soon as possible. The browser usually has to parse a whole lot of the response before it starts displaying anything useful. And in order to do that, you really want to minimize the number of round-trip-times, which is heavily dependent upon the number of packets sent (not the amount of data!), when the data is small. Using a generator in WSGI forces the server to push out partial data as soon as possible, so it could end up using many more packets than if you buffered everything and sent it at once, and thus, will be slower. As the buffering and streaming section of WSGI1 already says...: > Generally speaking, applications will achieve the best throughput by > buffering their (modestly-sized) output and sending it all at once. This is a > common approach in existing frameworks such as Zope: the output is buffered > in a StringIO or similar object, then transmitted all at once, along with the > response headers. > > [...] > > For large files, however, or for specialized uses of HTTP streaming (such as > multipart "server push"), an application may need to provide output in > smaller blocks (e.g. to avoid loading a large file into memory). It's also > sometimes the case that part of a response may be time-consuming to produce, > but it would be useful to send ahead the portion of the response that > precedes it. 
James 
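The tradeoff James describes is the same one the quoted spec text draws; in plain WSGI 1 terms it is the difference between these two illustrative apps (the page content is invented):

```python
from io import BytesIO

def buffered_app(environ, start_response):
    # Buffer the whole page and send one bytestring: fewest packets,
    # best throughput for modestly-sized pages.
    buf = BytesIO()
    for fragment in (b'<html>', b'<body>hello</body>', b'</html>'):
        buf.write(fragment)
    start_response('200 OK', [('Content-Type', 'text/html')])
    return [buf.getvalue()]

def streaming_app(environ, start_response):
    # Stream chunk by chunk: the first byte goes out sooner, but each
    # chunk may travel in its own packet.
    start_response('200 OK', [('Content-Type', 'text/html')])
    def body():
        yield b'<html>'
        yield b'<body>hello</body>'
        yield b'</html>'
    return body()

def collect(app):
    # tiny harness: join whatever body the app returns
    return b''.join(app({}, lambda status, headers: None))
```

Both produce the same bytes; they differ only in how many writes hit the wire.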
Re: [Web-SIG] Server-side async API implementation sketches
On Sun, 9 Jan 2011, Alice Bevan–McGregor wrote: On 2011-01-09 09:03:38 -0800, P.J. Eby said: Hm. I'm not sure if I like that. The typical app developer really shouldn't be yielding multiple body strings in the first place. Wait; what? So you want the app developer to load a 40MB talkcast MP3 into memory before sending it? My reaction too. I've read this elsewhere on this list too, in other topics. A general statement that the correct way to make an efficient WSGI (1) app is to return just one body string. This runs contrary to everything I've ever understood about making web apps that appear performant to the user: get the first byte out to the browser as soon as possible. This came up in discussions of wanting to have a cascading series of generators (to save memory and improve responsiveness): store generates data, serializers generate strings, handler generates (sends out in chunks) the web page from those strings. So, this is me saying: I'm in favor of a post-wsgi1 world where apps are encouraged to be generators. To me they are just as useful in sync and async contexts. -- Chris Dent http://burningchrome.com/ [...]
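Chris's cascading-generator idea might look something like this (names and the record shape are invented for illustration):

```python
def store():
    # data source: yields records lazily, one at a time
    yield {'title': 'Hello'}
    yield {'title': 'World'}

def serialize(records):
    # turn each record into an HTML fragment, lazily
    for record in records:
        yield '<li>%s</li>' % record['title']

def handler(fragments):
    # assemble the page in chunks; this is the app's body iterable
    yield b'<ul>'
    for fragment in fragments:
        yield fragment.encode('utf-8')
    yield b'</ul>'
```

No stage materializes the whole page; each chunk flows through the pipeline on demand.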
Re: [Web-SIG] Server-side async API implementation sketches
On 2011-01-09 17:06:28 -0800, Alice Bevan-McGregor said: On 2011-01-09 09:03:38 -0800, P.J. Eby said: The elephant in the room here is that while it's easy to write these example applications so they don't block, in practice people read files and do database queries and whatnot in their requests, and those APIs are generally synchronous. So, unless they somehow fold their entire application into a future, it doesn't work. Actually, that's how multithreading support in marrow.server[.http] was implemented. Overhead? 40-60 RSecs. Clarification here, that's less than 2% of total RSecs. - Alice. 
Re: [Web-SIG] Server-side async API implementation sketches
On 2011-01-09 09:03:38 -0800, P.J. Eby said: Hm. I'm not sure if I like that. The typical app developer really shouldn't be yielding multiple body strings in the first place. Wait; what? So you want the app developer to load a 40MB talkcast MP3 into memory before sending it? You want to completely eliminate the ability to stream an HTML page to the client in chunks (e.g. block, headers + search box, search results, advertisements, footer -- the exact thing Google does with every search result)? That sounds like artificially restricting application developers, to me. I much prefer that the canonical example of a WSGI app just return a list with a single bytestring... Why is it wrapped in a list, then? IOW, I want it to look like the normal way to do things is to just return the whole response at once, and use the additional difficulty of creating a second iterator to discourage people writing iterated bodies when they should just write everything to a BytesIO and be done with it. It sounds to me like your "should" doesn't cover an extremely large range of common use cases. In your approach, the above samples have to be rewritten as: return app(environ) [snip] My code does not use return. At all. Only yield. Try actually making some code that runs on this protocol and yields to futures during the body iteration. Sure. I'll also implement my actual proposal of not having a separate body iterable. The above middleware pattern works with the sketches I gave on the PEAK wiki, and I've now updated the wiki to include an example app and middleware for clarity. I'll need to re-read the code on your wiki; I find it incredibly difficult to grok, however, you can help me out a bit by answering a few questions about it: How does middleware trap exceptions raised by the application? (Specifically, how does the server pass the buck with exceptions? And how does the exception get to the application to bubble out towards the server, through middleware, as it does now?) 
Really, the only hole in this approach is dealing with applications that block. That's what the executor in the environ is for. If you have image scaling or something else that will block, you submit it. All networking calls? You submit them. The elephant in the room here is that while it's easy to write these example applications so they don't block, in practice people read files and do database queries and whatnot in their requests, and those APIs are generally synchronous. So, unless they somehow fold their entire application into a future, it doesn't work. Actually, that's how multithreading support in marrow.server[.http] was implemented. Overhead? 40-60 RSecs. The option is provided for those who can do nothing about their application blocking, while still maintaining the internally async nature of the server. That you could never *call* the .read() method outside of a future, or else you would block the server, thereby obliterating the point of having the async API in the first place. See above re: your confusion over the calling semantics of wsgi.input in regards to my (and Alex's) proposal. Specifically: data = (yield submit(wsgi_input.read, 4096)).result() This would work on sync and async servers, and with sync and async applications, with no difference in the code. - Alice. 
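A toy driver for Alice's pattern might look like this; the assumptions are a synchronous server, and the convention that the server sends the *completed future itself* back into the generator, which unwraps it with .result():

```python
from concurrent.futures import Future, ThreadPoolExecutor
from io import BytesIO

executor = ThreadPoolExecutor(max_workers=1)

def drive(gen):
    # When the app yields a future, wait for it (a sync server blocks
    # here; an async one would attach a completion callback), then send
    # the future back in so the app can call .result() on it.
    value = None
    try:
        while True:
            yielded = gen.send(value)
            if isinstance(yielded, Future):
                yielded.result()          # ensure it has completed
                value = yielded           # hand the completed future back
            else:
                value = None              # e.g. a body chunk; ignored here
    except StopIteration:
        pass

wsgi_input = BytesIO(b'x' * 8192)   # stand-in for the real input stream
chunks = []

def app(environ):
    data = (yield executor.submit(wsgi_input.read, 4096)).result()
    chunks.append(len(data))

drive(app({}))
```

The app body is identical whether the driver blocked or used a callback, which is the point Alice is making.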
Re: [Web-SIG] Server-side async API implementation sketches
On 2011-01-09 09:26:19 -0800, P.J. Eby said: By the way, I don't really see the point of the new sketches you're doing... I'm sorry. ...as they aren't nearly as general as the one I've already done, but still have the same fundamental limitation: wsgi.input. You missed the point entirely, then. If wsgi.input offers any synchronous methods... Regardless of whether or not wsgi.input is implemented in an async way, wrap it in a future and eventually get around to yielding it. Problem /solved/. Identical APIs for both sync and async, and if you have an async server but haven't gotten around to implementing your own executor yet, wrapping the blocking read call in a future also solves the problem (albeit not in the most efficient way). I.e. wrap every call to a wsgi.input method by passing it to wsgi.submit. ...then they must be used from a future and must somehow raise an error when called from within the application -- otherwise it would block, nullifying the point of having a generator-based API. See above. No extra errors, nothing really that insane. If it offers only asynchronous methods, OTOH, then you can't pass wsgi.input to any existing libraries (e.g. the cgi module). Describe to me how a function can be suspended (other than magical greenthreads) if it does not yield; if I knew this, maybe I wouldn't be so confused. The latter problem is the worse one, because it means that the translation of an app between my original WSGI2 API and the current sketch is no longer just "replace 'return' with 'yield'". I've deviated from your sketch, obviously, and any semblance of yielding a 3-tuple. Stop thinking of my example code as conforming to your ideas; it's a new idea, or, worst case, a narrowing of an idea into its simplest form. The only way this would work is if WSGI applications are still allowed to be written in a blocking style. 
Greenlet-based frameworks would have no problem with this, of course, but servers like Twisted would still have to run WSGI apps in a worker thread pool, just because they *might* block. Then that is not acceptable and "would not work". The mechanics of yielding future instances allow you to (in your server) implement the necessary async code however you wish while providing a uniform interface to both sync and async applications running on sync and async servers. In fact, you would be able to safely run a sync application on an async server and vice-versa. You can, on an async server:
:: Add a callback to the yielded future to re-schedule the application generator.
:: If using greenthreads, just block on future.result() then immediately wake up the application generator.
:: Do other things I can't think of because I'm still waking up.
The first solution is how Marrow HTTPd would operate. If we're okay with this as a limitation, then adding _async method variants that return futures might work, and we can proceed from there. That is not optimum, because now you have an optional API that applications that want to be compatible will need to detect and choose between. Mostly, though, it seems to me that the need to be able to write blocking code does away with most of the benefit of trying to have a single API in the first place. You have artificially created this need, ignoring the semantics of using the server-specific executor to detect async-capable requests and the yield mechanics I suggested, which happens to be a single, coherent API across sync and async servers and applications. - Alice. 
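The first option listed here -- re-scheduling the application generator from a future callback -- can be sketched with a toy scheduler (this is not Marrow's actual implementation; all names are illustrative):

```python
from concurrent.futures import Future

def schedule(gen, send_value=None):
    # Step the app generator once; if it yields a future, arrange to be
    # re-scheduled when that future completes.
    try:
        yielded = gen.send(send_value)
    except StopIteration:
        return
    if isinstance(yielded, Future):
        yielded.add_done_callback(lambda f: schedule(gen, f))
    else:
        schedule(gen)            # a plain value (e.g. a body chunk); keep going

log = []
pending = []

def app(environ):
    future = Future()
    environ['pending'].append(future)   # pretend the server completes it later
    result = (yield future).result()    # the completed future is sent back in
    log.append(result)

schedule(app({'pending': pending}))
pending[0].set_result('done')           # completing the future resumes the app
```

Between the yield and the callback, the server's event loop is free to service other connections.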
Re: [Web-SIG] Server-side async API implementation sketches
09.01.2011 22:56, P.J. Eby wrote: At 08:09 PM 1/9/2011 +0200, Alex Grönholm wrote: Asynchronous applications may not be ready to send the status line as the first thing coming out of the generator. So? In the sketches that are the subject of this thread, it doesn't have to be the first thing. If the application yields a future first, it will be paused... and so will the middleware. When this line is executed in the middleware: status, headers, body = yield app(environ) ...the middleware is paused until the application actually yields its response tuple. Specifically, this yield causes the app iterator to be pushed on the Coroutine object's .stack attribute, then iterated. If the application yields a future, the server suspends the whole thing until it gets called back, at which point it .send()s the result back into the app iterator. The app iterator then yields its response, which is tagged as a return value, so the app is popped off the .stack, and the response is sent via .send() into the middleware, which then proceeds as if nothing happened in the meantime. It then yields *its* response, and whatever body iterator is given gets put into a second coroutine that proceeds similarly. When the process_response() part of the middleware does a "yield body_iter", the body iterator is pushed, and the middleware is paused until the body iterator yields a chunk. If the body yields a future, the whole process is suspended and resumed. The middleware won't be resumed until the body yields another chunk, at which point it is resumed. If it yields a chunk of its own, then that's passed up to any response-processing middleware further up the stack. In contrast, middleware based on the 2+body protocol cannot process a body without embedding coroutine management into the middleware itself. For example, you can't write a standalone body processor function, and reuse it inside of two pieces of middleware, without doing a bunch of send()/throw() logic to make it work. 
Some boilerplate code was necessary in WSGI 1 middleware too. Alice's cleaned-up example didn't look too bad, and it would not require that Coroutine stack at all. I think that at this point both sides need to present some code that really works, and those implementations could then be compared. The examples so far have been a bit too abstract to be fairly evaluated. Outside of the application/middleware you mean? I hope there isn't any more confusion left about what a future is. The fact is that you cannot use synchronous API calls directly from an async app no matter what. Some workaround is always necessary. Which pretty much kills the whole idea as being a single, universal WSGI protocol, since most people don't care about async. I'm confused. Did you not know this? If so, why then were you at least initially receptive to the idea? Personally I don't think that this is a big problem. Async apps will always have to take care not to block the reactor unreasonably long, and that is never going to change. Synchronous apps just need to follow the protocol, but beyond that they shouldn't have to care about the async side of things. 
Re: [Web-SIG] Server-side async API implementation sketches
At 08:09 PM 1/9/2011 +0200, Alex Grönholm wrote: Asynchronous applications may not be ready to send the status line as the first thing coming out of the generator. So? In the sketches that are the subject of this thread, it doesn't have to be the first thing. If the application yields a future first, it will be paused... and so will the middleware. When this line is executed in the middleware: status, headers, body = yield app(environ) ...the middleware is paused until the application actually yields its response tuple. Specifically, this yield causes the app iterator to be pushed on the Coroutine object's .stack attribute, then iterated. If the application yields a future, the server suspends the whole thing until it gets called back, at which point it .send()s the result back into the app iterator. The app iterator then yields its response, which is tagged as a return value, so the app is popped off the .stack, and the response is sent via .send() into the middleware, which then proceeds as if nothing happened in the meantime. It then yields *its* response, and whatever body iterator is given gets put into a second coroutine that proceeds similarly. When the process_response() part of the middleware does a "yield body_iter", the body iterator is pushed, and the middleware is paused until the body iterator yields a chunk. If the body yields a future, the whole process is suspended and resumed. The middleware won't be resumed until the body yields another chunk, at which point it is resumed. If it yields a chunk of its own, then that's passed up to any response-processing middleware further up the stack. In contrast, middleware based on the 2+body protocol cannot process a body without embedding coroutine management into the middleware itself. For example, you can't write a standalone body processor function, and reuse it inside of two pieces of middleware, without doing a bunch of send()/throw() logic to make it work. 
Outside of the application/middleware you mean? I hope there isn't any more confusion left about what a future is. The fact is that you cannot use synchronous API calls directly from an async app no matter what. Some workaround is always necessary. Which pretty much kills the whole idea as being a single, universal WSGI protocol, since most people don't care about async. 
Re: [Web-SIG] Server-side async API implementation sketches
09.01.2011 19:03, P.J. Eby wrote: At 06:06 AM 1/9/2011 +0200, Alex Grönholm wrote: A new feature here is that the application itself yields a (status, headers) tuple and then chunks of the body (or futures). Hm. I'm not sure if I like that. The typical app developer really shouldn't be yielding multiple body strings in the first place. I much prefer that the canonical example of a WSGI app just return a list with a single bytestring -- preferably in a single statement for the entire return operation, whether it's a yield or a return. Uh, so don't yield multiple body strings then? How is that so difficult? IOW, I want it to look like the normal way to do things is to just return the whole response at once, and use the additional difficulty of creating a second iterator to discourage people writing iterated bodies when they should just write everything to a BytesIO and be done with it. I fail to understand why a second iterator is necessary when we can get away with just one. Also, it makes middleware simpler: the last line can just yield the result of calling the app, or a modified version, i.e.:

    yield app(environ)

or:

    s, h, b = app(environ)
    # ... modify or replace s, h, b
    yield s, h, b

Asynchronous applications may not be ready to send the status line as the first thing coming out of the generator. Consider an app that receives a file. The first thing coming out of the app is a future. The app needs to receive the entire file until it can determine what status line to send. Maybe there was an I/O error writing the file, so it needs to send a 500 response instead of a 200. This is not possible with a body iterator, and if we are already iterating the application generator, I really don't understand why the body needs to be an iterator as well. In your approach, the above samples have to be rewritten as:

    return app(environ)

or:

    result = app(environ)
    s, h = yield result
    # ... modify or replace s, h
    yield s, h
    for data in result:
        # modify data as we go
        yield data

Only that last bit doesn't actually work, because you have to be able to send future results back *into* the result. Try actually making some code that runs on this protocol and yields to futures during the body iteration. Did you miss the gist posted by myself (and improved by Alice)? Really, this modified protocol can't work with a full async API the way my coroutine-based version does, AND the middleware is much more complicated. In my version, your do-nothing middleware looks like this:

    class NullMiddleware(object):
        def __init__(self, app):
            self.app = app
        def __call__(self, environ):
            # ACTION: pre-application environ mangling
            s, h, body = yield self.app(environ)
            # modify or replace s, h, body here
            yield s, h, body

If you want to actually process the body in some way, it looks like:

    class NullMiddleware(object):
        def __init__(self, app):
            self.app = app
        def __call__(self, environ):
            # ACTION: pre-application environ mangling
            s, h, body = yield self.app(environ)
            # modify or replace s, h, body here
            yield s, h, self.process(body)
        def process(self, body_iter):
            while True:
                chunk = yield body_iter
                if chunk is None:
                    break
                # process/modify chunk here
                yield chunk

And that's still a lot simpler than your sketch. Personally, I would write both of the above as:

    def null_middleware(app):
        def wrapped(environ):
            # ACTION: pre-application environ mangling
            s, h, body = yield app(environ)
            # modify or replace s, h, body here
            yield s, h, process(body)
        def process(body_iter):
            while True:
                chunk = yield body_iter
                if chunk is None:
                    break
                # process/modify chunk here
                yield chunk
        return wrapped

But that's just personal taste. Even as a class, it's much easier to write. The above middleware pattern works with the sketches I gave on the PEAK wiki, and I've now updated the wiki to include an example app and middleware for clarity. Really, the only hole in this approach is dealing with applications that block. 
The elephant in the room here is that while it's easy to write these example applications so they don't block, in practice people read files and do database queries and whatnot in their requests, and those APIs are generally synchronous. So, unless they somehow fold their entire application into a future, it doesn't work. I liked the idea of having a separate async_read() method in wsgi.input, which would set the underlying socket in nonblocking mode and return a future. The event loop would watch the socket and read data into a buffer and trigger the callback when the given amount of data has been read.
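A sketch of that hypothetical async_read() idea; the event-loop hookup is assumed rather than shown, with feed() standing in for the loop's read callback:

```python
import socket
from concurrent.futures import Future

class AsyncInput:
    # Hypothetical wsgi.input whose async_read() returns a future that the
    # event loop completes once enough bytes have been buffered.
    def __init__(self, sock):
        self.sock = sock
        self.sock.setblocking(False)      # the event loop now owns readiness
        self._buffer = b''
        self._want = 0
        self._future = None

    def async_read(self, size):
        self._want, self._future = size, Future()
        future = self._future
        self._try_complete()              # may already have enough buffered
        return future

    def feed(self, data):
        # Called by the (assumed) event loop when the socket is readable.
        self._buffer += data
        self._try_complete()

    def _try_complete(self):
        if self._future and len(self._buffer) >= self._want:
            chunk = self._buffer[:self._want]
            self._buffer = self._buffer[self._want:]
            future, self._future = self._future, None
            future.set_result(chunk)

left, right = socket.socketpair()
inp = AsyncInput(left)
fut = inp.async_read(4)
inp.feed(b'abcdef')        # as if 6 bytes arrived on the wire
left.close(); right.close()
```

Any surplus bytes stay buffered for the next async_read() call, mirroring the behavior described above.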
Re: [Web-SIG] Server-side async API implementation sketches
At 04:25 AM 1/9/2011 -0800, Alice Bevan-McGregor wrote: On 2011-01-08 13:16:52 -0800, P.J. Eby said: In the limit case, it appears that any WSGI 1 server could provide an (emulated) async WSGI2 implementation, simply by wrapping WSGI2 apps with a finished version of the decorator in my sketch. Or, since users could do it themselves, this would mean that WSGI2 deployment wouldn't be dependent on all server implementers immediately turning out their own WSGI2 implementations. This, if you'll pardon my language, is bloody awesome. :D That would strongly drive adoption of WSGI2. Note that adapting a WSGI1 application to a WSGI2 server would likewise be very handy, and, I suspect, even easier to implement. I very much doubt that. You'd need greenlets or a thread with a communication channel in order to support WSGI 1 apps that use write() calls. By the way, I don't really see the point of the new sketches you're doing, as they aren't nearly as general as the one I've already done, but still have the same fundamental limitation: wsgi.input. If wsgi.input offers any synchronous methods, then they must be used from a future and must somehow raise an error when called from within the application -- otherwise it would block, nullifying the point of having a generator-based API. If it offers only asynchronous methods, OTOH, then you can't pass wsgi.input to any existing libraries (e.g. the cgi module). The latter problem is the worse one, because it means that the translation of an app between my original WSGI2 API and the current sketch is no longer just "replace 'return' with 'yield'". The only way this would work is if WSGI applications are still allowed to be written in a blocking style. Greenlet-based frameworks would have no problem with this, of course, but servers like Twisted would still have to run WSGI apps in a worker thread pool, just because they *might* block. 
If we're okay with this as a limitation, then adding _async method variants that return futures might work, and we can proceed from there. Mostly, though, it seems to me that the need to be able to write blocking code does away with most of the benefit of trying to have a single API in the first place. Either everyone ends up putting their whole app into a future, or else the server has to accept that the app could block... and put it into a future for them. ;-) So, the former case will be unacceptable to app developers who don't feel a need for async code, and the latter doesn't seem to offer anything to the developers of non-blocking servers. (The exception to these conditions, of course, are greenlet-based servers, but they can run WSGI *1* apps in a non-blocking way, and so have no need for a new protocol.) ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
Re: [Web-SIG] Server-side async API implementation sketches
At 06:06 AM 1/9/2011 +0200, Alex Grönholm wrote: A new feature here is that the application itself yields a (status, headers) tuple and then chunks of the body (or futures). Hm. I'm not sure if I like that. The typical app developer really shouldn't be yielding multiple body strings in the first place. I much prefer that the canonical example of a WSGI app just return a list with a single bytestring -- preferably in a single statement for the entire return operation, whether it's a yield or a return. IOW, I want it to look like the normal way to do things is to just return the whole response at once, and use the additional difficulty of creating a second iterator to discourage people writing iterated bodies when they should just write everything to a BytesIO and be done with it. Also, it makes middleware simpler: the last line can just yield the result of calling the app, or a modified version, i.e.:

    yield app(environ)

or:

    s, h, b = app(environ)
    # ... modify or replace s, h, b
    yield s, h, b

In your approach, the above samples have to be rewritten as:

    return app(environ)

or:

    result = app(environ)
    s, h = yield result
    # ... modify or replace s, h
    yield s, h
    for data in result:
        # modify b as we go
        yield result

Only that last bit doesn't actually work, because you have to be able to send future results back *into* the result. Try actually making some code that runs on this protocol and yields to futures during the body iteration. Really, this modified protocol can't work with a full async API the way my coroutine-based version does, AND the middleware is much more complicated.
In my version, your do-nothing middleware looks like this:

    class NullMiddleware(object):
        def __init__(self, app):
            self.app = app
        def __call__(self, environ):
            # ACTION: pre-application environ mangling
            s, h, body = yield self.app(environ)
            # modify or replace s, h, body here
            yield s, h, body

If you want to actually process the body in some way, it looks like:

    class NullMiddleware(object):
        def __init__(self, app):
            self.app = app
        def __call__(self, environ):
            # ACTION: pre-application environ mangling
            s, h, body = yield self.app(environ)
            # modify or replace s, h, body here
            yield s, h, self.process(body)
        def process(self, body_iter):
            while True:
                chunk = yield body_iter
                if chunk is None:
                    break
                # process/modify chunk here
                yield chunk

And that's still a lot simpler than your sketch. Personally, I would write both of the above as:

    def null_middleware(app):
        def wrapped(environ):
            # ACTION: pre-application environ mangling
            s, h, body = yield app(environ)
            # modify or replace s, h, body here
            yield s, h, process(body)
        def process(body_iter):
            while True:
                chunk = yield body_iter
                if chunk is None:
                    break
                # process/modify chunk here
                yield chunk
        return wrapped

But that's just personal taste. Even as a class, it's much easier to write. The above middleware pattern works with the sketches I gave on the PEAK wiki, and I've now updated the wiki to include an example app and middleware for clarity. Really, the only hole in this approach is dealing with applications that block. The elephant in the room here is that while it's easy to write these example applications so they don't block, in practice people read files and do database queries and whatnot in their requests, and those APIs are generally synchronous. So, unless they somehow fold their entire application into a future, it doesn't work. I liked the idea of having a separate async_read() method in wsgi.input, which would set the underlying socket in nonblocking mode and return a future.
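To make the process()-style protocol concrete, here is a minimal sketch of the driver loop a server might use: the generator alternates between yielding the source iterator (a request for a chunk) and yielding the processed chunk. The name run_body and the list-collecting return value are assumptions for illustration, not part of the wiki sketches.

```python
def run_body(gen):
    """Drive a process()-style body generator: it first yields the
    source iterator (asking us to pull a chunk), then yields the
    processed chunk, alternating until we send None for exhaustion."""
    out = []
    source = iter(next(gen))        # first yield hands back the source iterator
    while True:
        chunk = next(source, None)  # None tells the generator we're done
        try:
            out.append(gen.send(chunk))   # resume with the pulled chunk
        except StopIteration:
            break                         # generator saw None and returned
        try:
            next(gen)                     # generator yields the source again
        except StopIteration:
            break
    return out

def process(body_iter):             # the filter shape from the email above
    while True:
        chunk = yield body_iter
        if chunk is None:
            break
        yield chunk.upper()         # "process/modify chunk here"

print(run_body(process(iter([b'spam', b'eggs']))))  # [b'SPAM', b'EGGS']
```

An async-aware driver would additionally check whether the pulled value is a future and suspend, which is exactly the complication the thread goes on to discuss.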
The event loop would watch the socket and read data into a buffer and trigger the callback when the given amount of data has been read. Conversely, .read() would set the socket in blocking mode. What kinds of problems would this cause? That you could never *call* the .read() method outside of a future, or else you would block the server, thereby obliterating the point of having the async API in the first place.
Re: [Web-SIG] Server-side async API implementation sketches
On 2011-01-09 07:04:49 -0800, exar...@twistedmatrix.com said: I think this effort would benefit from more thought on how exactly accessing this external library support will work. If async wsgi is limited to performing a single read asynchronously, then it hardly seems compelling. Apologies if the last e-mail was too harsh; I'm about to go to bed, and it's been a long night/morning. ;) Here's a proposed solution: a generator API on top of futures. If the async server implementing the executor can detect a generator being submitted, then:

:: The executor accepts the generator and begins iteration (passing the executor and the arguments supplied to submit).
:: The generator is expected to be /fast/.
:: The generator does work until it needs an operation over a file descriptor, at which point it yields the fd and the operation (say, 'r', or 'w').
:: The executor schedules with the async reactor the generator to be re-called when the operation is possible.
:: The Future is considered complete when the generator raises GeneratorExit and the first argument is used as the return value of the Future.

Yielding a 2-tuple of readers/writers would work, too, and allow for more concurrent utilization of sockets, though I'm not sure of the use cases for this. If so, the generator would be woken up when any of the readers or writers are available and sent() a 2-tuple of available_readers, available_writers. The executor is passed along for any operations the generator cannot accomplish safely without threads, and the executor, as it's running through the generator, will accomplish the same semantics as iterating the WSGI application: if a future instance is yielded, the generator is suspended until the future is complete, allowing heavy processing to be mixed with async calls in a fully async server. The wsgi.input operations can be implemented this way, as can database operations and pretty much anything that uses sockets, pipes, or on-disk files.
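A toy driver for this yield-(fd, operation) protocol might look like the following; select() stands in for the reactor, and for simplicity completion is signalled by a plain return (StopIteration carrying the value) rather than GeneratorExit. All names here are illustrative assumptions, not part of the proposal.

```python
import os
import select

def run_fd_generator(gen):
    """Resume a generator that yields (fd, 'r' | 'w') whenever it needs
    to wait for I/O; the generator's return value is the 'future' result."""
    try:
        fd, op = next(gen)
        while True:
            if op == 'r':
                select.select([fd], [], [])   # wait until readable
            else:
                select.select([], [fd], [])   # wait until writable
            fd, op = gen.send(None)           # resume; maybe another wait
    except StopIteration as stop:
        return stop.value                     # result handed to the future

# Usage: read from a pipe, suspending until the descriptor is ready.
r, w = os.pipe()
os.write(w, b'hi')

def reader():
    yield (r, 'r')            # suspend until readable
    return os.read(r, 2)

print(run_fd_generator(reader()))   # b'hi'
```

A real reactor would register the fd with its select/epoll/kqueue loop and resume the generator from a callback instead of blocking inline as this sketch does.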
In fact, the WSGI application -itself- could be called in this way (with the omission of the executor or a simple wrapper that saves the executor into the environ). Just a quick thought before running off to bed. - Alice.
Re: [Web-SIG] Server-side async API implementation sketches
On 2011-01-09 07:04:49 -0800, exar...@twistedmatrix.com said: Don't say it if it's not true. Deferreds aren't tied to a reactor, and Marrow doesn't appear to have anything called "deferred". So this parallel to Twisted's Deferred is misleading and confusing. It was merely a comparison to the "you schedule something, attach some callbacks to it, and when it's finished your callbacks get executed" feature. I did not mention Twisted; also: :: defer - postpone: hold back to a later time; "let's postpone the exam" :: deferred - postponed: put off until a later time; "surgery has been postponed" Futures are very similar to deferreds with the one difference you mention: future instances are created by the executor/reactor and are (possibly) the internal representation instead of Twisted treating the Deferred as the executor in terms of registering calls. In most other ways, they share the same goals, and similar methods, even. Marrow's "deferred calls" code is buried in marrow.io, with IOStreams accepting callbacks as part of the standard read/write calls and registering these internally. IOStream then performs read/writes across the raw sockets utilizing callbacks from the IOLoop reactor. When an IOStream meets its criteria (e.g. written all of the requested data, read a number of bytes >= the requested count, or read until a marker has appeared in the stream, e.g. \r\n) IOLoop then executes the callbacks registered with it, passing the data, if any. I will likely expand this to include additional criteria and callback hooks. IOStream, in this way, acts more like Twisted Deferreds than Futures. I think this effort would benefit from more thought on how exactly accessing this external library support will work. If async wsgi is limited to performing a single read asynchronously, then it hardly seems compelling. There appears to be a misunderstanding over how futures work. Please read PEP 3148 [1] carefully. 
While there's not much there, here's the gist: the executor schedules the callable passed to submit. If the "worker pool" is full, the underlying pooling mechanism will delay execution of the callable until a slot is freed. Pool and slot are defined, by example only, as thread or process pools, but are not restricted to such. (There are three relevant classes defined by concurrent.futures: Executor, ProcessPoolExecutor, and ThreadPoolExecutor. Again, as long as you implement the Executor duck-typed interface, you're good to go and compliant with PEP 3148, regardless of underlying mechanics.) If a "slot" is available at the moment of submission, the callable has a reasonable expectation of being immediately executed. The future.result() method merely blocks awaiting completion of the already running, not yet running, or already completed future. If already completed (à la the future sent back up to the application after yielding it) the call to result is non-blocking / immediate. Yielding the future is simply a way of safely "blocking" (usually done by calling .result() before the future is complete), not some absolute requirement for the future itself to run. The future (and thus async socket calls et al.) can, and should, be scheduled with the underlying async reactor in the call to submit(). - Alice. [1] http://www.python.org/dev/peps/pep-3148/
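The behaviour described here is plain PEP 3148: submit() schedules the callable (running it immediately if a worker is free), result() blocks only until completion, and a completed future answers at once. A minimal illustration with the stdlib executor:

```python
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=2) as pool:
    future = pool.submit(pow, 2, 10)   # scheduled (likely running) right away
    value = future.result()            # blocks at most until the call finishes
    print(value)                       # 1024
    print(future.done())               # True: later result() calls are immediate
```

An async server would supply an executor with the same interface whose submit() registers fd-backed work with the reactor instead of a thread pool.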
Re: [Web-SIG] Server-side async API implementation sketches
On 11:36 am, al...@gothcandy.com wrote: On 2011-01-08 19:34:41 -0800, P.J. Eby said: At 04:40 AM 1/9/2011 +0200, Alex Grönholm wrote: 09.01.2011 04:15, Alice Bevan-McGregor kirjoitti: I hope that clearly identifies my idea on the subject. Since async servers will /already/ be implementing their own executors, I don't see this as too crazy. -1 on this. Those executors are meant for executing code in a thread pool. Mandating a magical socket operation filter here would considerably complicate server implementation. Actually, the *reverse* is true. If you do it the way Alice proposes, my sketches don't get any more complex, because the filtering goes in the executor facade or submit function. Indeed; the executor is what then adds the file descriptor to the underlying server async reactor (select/epoll/kqueue/other). In the case of the Marrow server, this would utilize a reactor callback (some might say "deferred") to Don't say it if it's not true. Deferreds aren't tied to a reactor, and Marrow doesn't appear to have anything called "deferred". So this parallel to Twisted's Deferred is misleading and confusing. Since each async server will either implement or utilize a specific async framework, each will offer its own "async-supported" featureset. What I mean is that all servers should make wsgi.input calls async-able, some would go further to make all socket calls async. Some might go even further than that and define an API for external libraries (e.g. DBs) to be truly cooperatively async. I think this effort would benefit from more thought on how exactly accessing this external library support will work. If async wsgi is limited to performing a single read asynchronously, then it hardly seems compelling. Jean-Paul
Re: [Web-SIG] Server-side async API implementation sketches
On 2011-01-08 13:16:52 -0800, P.J. Eby said: In the limit case, it appears that any WSGI 1 server could provide an (emulated) async WSGI2 implementation, simply by wrapping WSGI2 apps with a finished version of the decorator in my sketch. Or, since users could do it themselves, this would mean that WSGI2 deployment wouldn't be dependent on all server implementers immediately turning out their own WSGI2 implementations. This, if you'll pardon my language, is bloody awesome. :D That would strongly drive adoption of WSGI2. Note that adapting a WSGI1 application to a WSGI2 server would likewise be very handy, and I suspect, even easier to implement. - Alice.
Re: [Web-SIG] Server-side async API implementation sketches
On 2011-01-08 19:34:41 -0800, P.J. Eby said: At 04:40 AM 1/9/2011 +0200, Alex Grönholm wrote: 09.01.2011 04:15, Alice Bevan-McGregor kirjoitti: I hope that clearly identifies my idea on the subject. Since async servers will /already/ be implementing their own executors, I don't see this as too crazy. -1 on this. Those executors are meant for executing code in a thread pool. Mandating a magical socket operation filter here would considerably complicate server implementation. Actually, the *reverse* is true. If you do it the way Alice proposes, my sketches don't get any more complex, because the filtering goes in the executor facade or submit function. Indeed; the executor is what then adds the file descriptor to the underlying server async reactor (select/epoll/kqueue/other). In the case of the Marrow server, this would utilize a reactor callback (some might say "deferred") to update the Future instance with the data, setting completion status, executing callbacks, etc. One might even be able to use a threading.Event (or whatever is the opposite of a lock) to wake up blocking .result() calls, even if not multi-threaded (greenthreads, etc.). Of course, adding the file descriptor to a pure async reactor then .result() blocking on it from your application would result in a deadlock; the .result() would never complete as the reactor would never get a chance to perform the pending request. (This is why Marrow requires threading be enabled globally before adding an executor to the environment; this requires rather explicit documentation.) This problem is solved completely by yielding the future instance (pausing the application) to let the reactor do its thing. (Yielding the future becomes a replacement for the blocking behaviour of future.result().) Effectively what I propose adds emulation of threading on top of async by mutating an Executor. (The Executor would be a mixed threading+async executor.)
I suggest bubbling a future back up the yield stack instead of the actual result to allow the application (or middleware, or whatever happened to yield the future) to capture exceptions generated by the future'd request. Bubbling the future instance avoids excessive exception handling cruft in each middleware layer; and I see no real issue with this. AFAIK, you can use a shorthand (possibly wrapped in a try: block) if all you care about is the result: data = (yield my_future).result() Truthfully, I don't really see the point of exposing the map() method (which is the only other executor method we'd expose), so it probably makes more sense to just offer a 'wsgi.submit' key... which can be a function as follows: [snip] True; the executor itself could easily be hidden behind the filter. In a multi-threaded environment, however, the map call poses no problem, and can be quite useful. (E.g. with one of my use cases for inclusion of an executor in the environment: image scaling.) Granted, this might be a rather long function. However, since it's essentially an optimization, a given server can decide how many functions can be shortcut in this way. The spec may wish to offer a guarantee or recommendation for specific methods of certain stdlib-provided types (sockets in particular) and wsgi.input. +1 Personally, I do think it might be *better* to offer extended operations on wsgi.input that could be used via yield, e.g. "yield input.nb_read()". But of course then the trampoline code has to recognize those values instead of futures. Because wsgi.input is provided by the server, and the executor is provided by the server, is there a reason why these extended functions couldn't return... futures? :) Note, too, that this complexity also only affects servers that want to offer a truly async API. A synchronous server has no reason to pay particular attention to what's in a future, since it can't offer any performance improvement.
I feel a sync server and async server should provide the same API for accessing the input. E.g. the application/middleware must be agnostic to the server in this regard. This is why a little bit of magic goes a long way. The following code would work on any WSGI2 stack that offers an executor (sync, async, or provided by middleware): data = (yield env['wsgi.submit'](env['wsgi.input'].read, 4096)).result() In a sync server, the blocking read would execute in another thread. In an async one appropriate actions would be taken to request a socket read from the client. Both cases pause the application pending the result. (If you don't immediately yield the future the behaviour between servers is the same!) I do think that this sort of API discussion, though, is the most dangerous part of trying to do an async spec. That is, I don't expect that everyone will spontaneously agree on the exact same API. Alice's proposal (simply submitting object methods) has the advantage of severely limiting the scope of API discussions. ;-)
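A sketch of how a synchronous server could honour this contract: it backs 'wsgi.submit' with a thread pool and, when the application yields a future, simply blocks on it and sends the completed future back in. Everything here (run_sync, the sample app, using len as a stand-in for a blocking read) is an illustrative assumption, not proposed spec text.

```python
from concurrent.futures import Future, ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=4)

def run_sync(app, environ):
    """Drive a generator app: yielded futures are waited on (a sync
    server just blocks), then sent back in, completed, for .result()."""
    environ['wsgi.submit'] = pool.submit
    gen = app(environ)
    value = next(gen)
    while isinstance(value, Future):
        value.result()               # wait; an async server would suspend instead
        try:
            value = gen.send(value)  # hand the *completed* future back
        except StopIteration:
            return None
    return value                     # first non-future yield is the response

def app(environ):
    # Stand-in for env['wsgi.input'].read: any blocking callable works here.
    future = yield environ['wsgi.submit'](len, b'hello')
    data = future.result()           # future already complete: non-blocking
    yield ('200 OK', [], [str(data).encode()])

print(run_sync(app, {}))             # ('200 OK', [], [b'5'])
```

Because the future sent back in is already complete, the app-side idiom (yield the future, then call .result()) behaves identically under both kinds of server, which is the point being made above.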
Re: [Web-SIG] Server-side async API implementation sketches
On 2011-01-08 20:06:19 -0800, Alex Grönholm said: I liked the idea of having a separate async_read() method in wsgi.input, which would set the underlying socket in nonblocking mode and return a future. The event loop would watch the socket and read data into a buffer and trigger the callback when the given amount of data has been read. Conversely, .read() would set the socket in blocking mode. What kinds of problems would this cause? Manipulating the underlying socket is potentially dangerous (pipelining) and, in fact, not possible AFAIK while being PEP444-compliant. When the request body is fully consumed, additional attempts to read _must_ return empty strings. Thus raw sockets are right out at a high level; internal to the reactor this may be possible, however. It'd be interesting to adapt marrow.io to using futures in this way as an experiment. OTOH, if you utilize callbacks extensively (as m.s.http does) you run into the problem of data passing. Your application is called (wrapped in middleware), sets up some futures and callbacks, then returns. No returned data. Middleware just got shot in the foot. The server, also, got shot in the foot. How can it get a response tuple back from a callback? How can middleware be utilized? That's a weird problem to wrap my head around. Blocking the application pending the results of various socket operations is something that would have to be mandated to avoid this issue. :/ Multiple in-flight reads would also be problematic; you may end up with buffer interleaving issues. (e.g. job A reads 128 bytes at a time and has been requested to return 4KB, job B does the same... what happens to the data?) Then you begin to involve locking... Notice that my write_body method [1], writes using async, passing the iterable to the callback which is itself. This is after-the-fact (after the request has been returned) and is A-OK, though would need to be updated heavily to support the ideas of async floating around right now.
I'm also extremely careful to never have multiple async callbacks pending (and thus never have multiple "jobs" for a single connection working at once). - Alice. [1] https://github.com/pulp/marrow.server.http/blob/draft/marrow/server/http/protocol.py#L313-332
Re: [Web-SIG] Server-side async API implementation sketches
09.01.2011 05:45, P.J. Eby kirjoitti: At 06:15 PM 1/8/2011 -0800, Alice Bevan-McGregor wrote: On 2011-01-08 17:22:44 -0800, Alex Grönholm said: On 2011-01-08 13:16:52 -0800, P.J. Eby said: I've written the sketches dealing only with PEP 3148 futures, but sockets were also proposed, and IMO there should be simple support for obtaining data from wsgi.input. I'm a bit unclear as to how this will work with async. How do you propose that an asynchronous application receives the request body? In my example https://gist.github.com/770743 (which has been simplified greatly by P.J. Eby in the "Future- and Generator-Based Async Idea" thread) for dealing with wsgi.input, I have: future = environ['wsgi.executor'].submit(environ['wsgi.input'].read, 4096) yield future While ugly, if you were doing this, you'd likely: submit = environ['wsgi.executor'].submit input_ = environ['wsgi.input'] future = yield submit(input_.read, 4096) data = future.result() I don't quite understand the above -- in my sketch, the above would be: data = yield submit(input_.read, 4096) It looks like your original sketch wants to call .result() on the future, whereas in my version, the return value of yielding a future is the result (or an error is thrown if the result was an error). I cooked up a simple do-nothing middleware example which Alice decorated with some comments: https://gist.github.com/771398 A new feature here is that the application itself yields a (status, headers) tuple and then chunks of the body (or futures). Is there some reason I'm missing, for why you'd want to explicitly fetch the result in a separate step? Meanwhile, thinking about Alex's question, ISTM that if WSGI 2 is asynchronous, then the wsgi.input object should probably just have read(), readline() etc. methods that simply return (possibly-mock) futures. That's *much* better than having to do all that submit() crud just to read data from wsgi.input().
OTOH, if you want to use the cgi module to parse a form POST from the input, you're going to need to write an async version of it in that case, or else feed the entire operation to an executor... but then the methods would need to be synchronous... *argh*. I'm starting to not like this idea at all. Alex has actually pinpointed a very weak spot in the scheme, which is that if wsgi.input is synchronous, you destroy the asynchrony, but if it's asynchronous, you can't use it with any normal code that operates on a stream. I liked the idea of having a separate async_read() method in wsgi.input, which would set the underlying socket in nonblocking mode and return a future. The event loop would watch the socket and read data into a buffer and trigger the callback when the given amount of data has been read. Conversely, .read() would set the socket in blocking mode. What kinds of problems would this cause? I don't see any immediate fixes for this problem, so I'll let it marinate in the back of my mind for a while. This might be the Achilles heel for the whole idea of a low-rent async WSGI.
Re: [Web-SIG] Server-side async API implementation sketches
At 06:15 PM 1/8/2011 -0800, Alice Bevan-McGregor wrote: On 2011-01-08 17:22:44 -0800, Alex Grönholm said: On 2011-01-08 13:16:52 -0800, P.J. Eby said: I've written the sketches dealing only with PEP 3148 futures, but sockets were also proposed, and IMO there should be simple support for obtaining data from wsgi.input. I'm a bit unclear as to how this will work with async. How do you propose that an asynchronous application receives the request body? In my example https://gist.github.com/770743 (which has been simplified greatly by P.J. Eby in the "Future- and Generator-Based Async Idea" thread) for dealing with wsgi.input, I have: future = environ['wsgi.executor'].submit(environ['wsgi.input'].read, 4096) yield future While ugly, if you were doing this, you'd likely: submit = environ['wsgi.executor'].submit input_ = environ['wsgi.input'] future = yield submit(input_.read, 4096) data = future.result() I don't quite understand the above -- in my sketch, the above would be: data = yield submit(input_.read, 4096) It looks like your original sketch wants to call .result() on the future, whereas in my version, the return value of yielding a future is the result (or an error is thrown if the result was an error). Is there some reason I'm missing, for why you'd want to explicitly fetch the result in a separate step? Meanwhile, thinking about Alex's question, ISTM that if WSGI 2 is asynchronous, then the wsgi.input object should probably just have read(), readline() etc. methods that simply return (possibly-mock) futures. That's *much* better than having to do all that submit() crud just to read data from wsgi.input(). OTOH, if you want to use the cgi module to parse a form POST from the input, you're going to need to write an async version of it in that case, or else feed the entire operation to an executor... but then the methods would need to be synchronous... *argh*. I'm starting to not like this idea at all.
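For contrast, here is a sketch of the variant described here, where the trampoline unwraps the future and sends the bare result back in (or throws the error into the generator), so the app writes "data = yield submit(...)" with no explicit .result() call. The names drive and app are illustrative assumptions.

```python
from concurrent.futures import Future, ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=2)

def drive(gen):
    """Resolve yielded futures on the generator's behalf: results are
    sent back in directly, and exceptions surface at the yield point."""
    value = next(gen)
    while isinstance(value, Future):
        try:
            result = value.result()
        except Exception as exc:
            value = gen.throw(exc)    # error raised inside the app's yield
        else:
            value = gen.send(result)  # send the bare result, not the future
    return value

def app():
    # Stand-in for a wsgi.input read submitted to the executor.
    data = yield pool.submit(bytes.upper, b'post body')  # no .result() needed
    yield ('200 OK', [], [data])

print(drive(app()))   # ('200 OK', [], [b'POST BODY'])
```

The trade-off relative to sending the future back in is exactly the one debated in this thread: the app loses direct access to the future (and its exception object), but gains a one-line read idiom.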
Alex has actually pinpointed a very weak spot in the scheme, which is that if wsgi.input is synchronous, you destroy the asynchrony, but if it's asynchronous, you can't use it with any normal code that operates on a stream. I don't see any immediate fixes for this problem, so I'll let it marinate in the back of my mind for a while. This might be the Achilles heel for the whole idea of a low-rent async WSGI.
Re: [Web-SIG] Server-side async API implementation sketches
At 04:40 AM 1/9/2011 +0200, Alex Grönholm wrote: 09.01.2011 04:15, Alice Bevan-McGregor kirjoitti: I hope that clearly identifies my idea on the subject. Since async servers will /already/ be implementing their own executors, I don't see this as too crazy. -1 on this. Those executors are meant for executing code in a thread pool. Mandating a magical socket operation filter here would considerably complicate server implementation. Actually, the *reverse* is true. If you do it the way Alice proposes, my sketches don't get any more complex, because the filtering goes in the executor facade or submit function. Truthfully, I don't really see the point of exposing the map() method (which is the only other executor method we'd expose), so it probably makes more sense to just offer a 'wsgi.submit' key... which can be a function as follows:

    def submit(callable, *args, **kw):
        ob = getattr(callable, '__self__', None)
        if isinstance(ob, ServerProvidedSocket):  # could be an ABC
            future = MockFuture()
            if callable == ob.read:
                ...  # set up read callback to fire future
            elif callable == ob.write:
                ...  # set up write callback to fire future
            return future
        else:
            return real_executor.submit(callable, *args, **kw)

Granted, this might be a rather long function. However, since it's essentially an optimization, a given server can decide how many functions can be shortcut in this way. The spec may wish to offer a guarantee or recommendation for specific methods of certain stdlib-provided types (sockets in particular) and wsgi.input. Personally, I do think it might be *better* to offer extended operations on wsgi.input that could be used via yield, e.g. "yield input.nb_read()". But of course then the trampoline code has to recognize those values instead of futures. Either way works, but somewhere there is going to be some type-testing (explicit or implicit) taking place to determine how to suspend and resume the app.
Note, too, that this complexity also only affects servers that want to offer a truly async API. A synchronous server has no reason to pay particular attention to what's in a future, since it can't offer any performance improvement. I do think that this sort of API discussion, though, is the most dangerous part of trying to do an async spec. That is, I don't expect that everyone will spontaneously agree on the exact same API. Alice's proposal (simply submitting object methods) has the advantage of severely limiting the scope of API discussions. ;-)
Re: [Web-SIG] Server-side async API implementation sketches
09.01.2011 04:15, Alice Bevan–McGregor kirjoitti: On 2011-01-08 17:22:44 -0800, Alex Grönholm said: On 2011-01-08 13:16:52 -0800, P.J. Eby said: I've written the sketches dealing only with PEP 3148 futures, but sockets were also proposed, and IMO there should be simple support for obtaining data from wsgi.input. I'm a bit unclear as to how this will work with async. How do you propose that an asynchronous application receives the request body? In my example https://gist.github.com/770743 (which has been simplified greatly by P.J. Eby in the "Future- and Generator-Based Async Idea" thread) for dealing with wsgi.input, I have: future = environ['wsgi.executor'].submit(environ['wsgi.input'].read, 4096) yield future While ugly, if you were doing this, you'd likely: submit = environ['wsgi.executor'].submit input_ = environ['wsgi.input'] future = yield submit(input_.read, 4096) data = future.result() That's a bit nicer to read, and simplifies things if you need to make a number of async calls. The idea here is that:

:: Your async server subclasses ThreadPoolExecutor.
:: The subclass overloads the submit method.
:: Your submit method detects bound methods on wsgi.input, sockets, and files.
:: If one of the above is detected, create a mock future that defines 'fd' and 'operation' attributes or similar.
:: When yielding the mock future, your async reactor can detect 'fd' and do the appropriate thing for your async framework. (Generally adding the fd to the appropriate select/epoll/kqueue readers/writers lists.)
:: When the condition is met, set_running_or_notify_cancel (when internally reading or writing data), set_result, saving the value, and return the future (filled with its data) back up to the application.
:: The application accepts the future instance as the return value of yield, and calls result across it to get the data. (Obviously writes, if allowed, won't have data, but reads will.)

I hope that clearly identifies my idea on the subject.
Since async servers will /already/ be implementing their own executors, I don't see this as too crazy. - Alice.

-1 on this. Those executors are meant for executing code in a thread pool. Mandating a magical socket-operation filter here would considerably complicate server implementation.

___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/alex.gronholm%40nextday.fi
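For concreteness, the "submit filter" being argued over above could look something like the following sketch. The class and attribute names (MockFuture, fd, operation) are illustrative only, taken from Alice's description rather than any real server; the fileno() call on wsgi.input is an additional assumption.

```python
from concurrent.futures import Future, ThreadPoolExecutor

class MockFuture(Future):
    """A future that carries the fd and operation for the reactor,
    instead of representing work already submitted to a thread pool."""
    def __init__(self, fd, operation, args):
        super().__init__()
        self.fd = fd
        self.operation = operation
        self.args = args

class FilteringExecutor(ThreadPoolExecutor):
    """Overloads submit() to intercept bound methods on wsgi.input,
    as described in the ':: ' list above."""
    def __init__(self, wsgi_input, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._input = wsgi_input

    def submit(self, fn, *args, **kwargs):
        # If fn is a bound method of wsgi.input, don't run it in the
        # pool; return a mock future the reactor can recognize and
        # register with select/epoll/kqueue instead.
        if getattr(fn, '__self__', None) is self._input:
            return MockFuture(self._input.fileno(), fn.__name__, args)
        return super().submit(fn, *args, **kwargs)
```

This is exactly the per-call type dispatch Alex objects to: every submit() now has to inspect its callable before deciding where the work goes.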
Re: [Web-SIG] Server-side async API implementation sketches
On 2011-01-08 17:22:44 -0800, Alex Grönholm said: On 2011-01-08 13:16:52 -0800, P.J. Eby said: I've written the sketches dealing only with PEP 3148 futures, but sockets were also proposed, and IMO there should be simple support for obtaining data from wsgi.input. I'm a bit unclear as to how this will work with async. How do you propose that an asynchronous application receives the request body? In my example https://gist.github.com/770743 (which has been simplified greatly by P.J. Eby in the "Future- and Generator-Based Async Idea" thread) for dealing with wsgi.input, I have:

future = environ['wsgi.executor'].submit(environ['wsgi.input'].read, 4096)
yield future

While ugly, if you were doing this, you'd likely write:

submit = environ['wsgi.executor'].submit
input_ = environ['wsgi.input']
future = yield submit(input_.read, 4096)
data = future.result()

That's a bit nicer to read, and it simplifies things if you need to make a number of async calls. The idea here is that:

:: Your async server subclasses ThreadPoolExecutor.
:: The subclass overloads the submit method.
:: Your submit method detects bound methods on wsgi.input, sockets, and files.
:: If one of the above is detected, it creates a mock future that defines 'fd' and 'operation' attributes or similar.
:: When the mock future is yielded, your async reactor can detect 'fd' and do the appropriate thing for your async framework. (Generally, adding the fd to the appropriate select/epoll/kqueue readers/writers list.)
:: When the condition is met, the reactor calls set_running_or_notify_cancel (when internally reading or writing data), then set_result to save the value, and returns the future (filled with its data) back up to the application.
:: The application accepts the future instance as the return value of yield, and calls result() on it to get the data. (Obviously writes, if allowed, won't have data, but reads will.)

I hope that clearly identifies my idea on the subject.
Since async servers will /already/ be implementing their own executors, I don't see this as too crazy. - Alice.
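The application side of the protocol Alice describes can be sketched as below. The (status, headers, body) response tuple is an assumed WSGI2-style convention for illustration, not something the thread has pinned down, and `echo_app` is a hypothetical name.

```python
def echo_app(environ):
    """A generator app following the pattern above: yield a future for
    each wsgi.input read; the server resumes us with the completed
    future as the value of the yield expression."""
    submit = environ['wsgi.executor'].submit
    input_ = environ['wsgi.input']
    # Park here until the server has finished the read.
    future = yield submit(input_.read, 4096)
    data = future.result()  # the future is already done; never blocks
    # Assumed response convention: (status, headers, body).
    yield ('200 OK', [('Content-Type', 'text/plain')], [data])
```

A server (or trampoline) drives this with next()/send(): it takes the yielded future, arranges for it to complete, and sends the finished future back in.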
Re: [Web-SIG] Server-side async API implementation sketches
On 08.01.2011 23:16, P.J. Eby wrote: As a semi-proof-of-concept, I whipped these up: http://peak.telecommunity.com/DevCenter/AsyncWSGISketch It's an expanded version of my Coroutine concept, updated with sample server code for both a synchronous server and an asynchronous one. The synchronous "server" is really just a decorator that wraps a WSGI2 async app with futures support, and handles pauses by simply waiting for the future to finish. The asynchronous server is a bit more hand-wavy, in that there are some bits (clearly marked) that will be server/framework dependent. However, they should be straightforward for a specialist in any given async framework to implement. What is *most* handwavy at the moment, however, is the details of precisely what one is allowed to "yield to". I've written the sketches dealing only with PEP 3148 futures, but sockets were also proposed, and IMO there should be simple support for obtaining data from wsgi.input. However, even this part is pretty easy to extrapolate: both server examples just add more type-testing branches in their "base_trampoline()" function, copying and modifying the existing branches that deal with futures. The entire result is surprisingly compact -- each server weighed in at about 40 lines, and the common Coroutine class used by both adds another 60-something lines. In the limit case, it appears that any WSGI 1 server could provide an (emulated) async WSGI2 implementation, simply by wrapping WSGI2 apps with a finished version of the decorator in my sketch. Or, since users could do it themselves, this would mean that WSGI2 deployment wouldn't be dependent on all server implementers immediately turning out their own WSGI2 implementations. True async API implementations would be more involved, of course -- using a WSGI2 decorator on, say, Twisted's WSGI1 implementation would give you no performance advantages vs. using Twisted's APIs directly. 
But, as soon as someone wrote a Twisted-specific translation of my async-server sketch, such an app would be portable. More discussion is still needed, but at this point I'm convinced the concept is *technically* feasible. (Whether there's enough need in the "market" to make it worthwhile, is a separate question.) I'm a bit unclear as to how this will work with async. How do you propose that an asynchronous application receives the request body?
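The synchronous "server" decorator P.J. Eby describes (wrap the generator app, and handle pauses by simply waiting for each future to finish) can be sketched roughly as follows. This is not the code from the PEAK wiki, just a minimal illustration of the driving loop; the `synchronous` name and the (status, headers, body) response convention are assumptions.

```python
from concurrent.futures import Future

def synchronous(app):
    """Drive a generator-based async app synchronously: whenever the
    app yields a future, block until it completes, then resume the app
    with the completed future; anything else is taken as the response."""
    def wrapper(environ):
        coroutine = app(environ)
        sent = None
        while True:
            yielded = coroutine.send(sent)
            if isinstance(yielded, Future):
                yielded.result()   # block; an async server would park instead
                sent = yielded     # resume the app with the completed future
            else:
                coroutine.close()  # got the response; shut the generator down
                return yielded     # assumed (status, headers, body) tuple
    return wrapper
```

A true async server replaces only the `yielded.result()` line: instead of blocking, it registers a done-callback (or, for the mock futures discussed earlier, an fd with its reactor) and resumes the coroutine later.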