On Thu, 8 Apr 2010, P.J. Eby wrote:
> This is also a good time for people to learn that generators are usually
> a *very bad* way to write WSGI apps - yielding is for server push or
> sending blocks of large files, not tiny strings. In general, if you're
> yielding more than one block, you're almost certainly doing WSGI wrong.
> The typical HTML, XML, or JSON output that's 99% of a webapp's requests
> should be transmitted as a single string, rather than as a series of
> snippets.
Now that the thread containing the quoted bit above has died down, I wanted to get back to this. I was surprised when I read it: it's counterintuitive, different from what I do in practical day-to-day WSGI app creation, and contrary to my old-school network services instinct (start getting stuff queued for the pipe as soon as possible).

The apps I create tend to be HTTP APIs trying to be RESTful. As such they have singular resources I call entities, and aggregates of those entities I call collections. The APIs provide access to GETting and PUTting entities and GETting collections. Whenever a GET request is made on an entity or collection, the entity or entities involved are serialized to some string form. When there are many entities in a collection, yielding their serialized forms makes semantic sense as well as (it appears) resource-utilization sense.

I realize I'm able to build up a complete string, or yield via a generator, or accomplish this a whole bunch of other ways (which is part of why I like WSGI: that content is just an iterator is a good thing), so I'm not looking for a statement of what is or isn't possible, but rather opinions. Why is yielding lots of moderately sized strings *very bad*? Why is it _not_ very bad (as presumably others think)?

The model I have in mind is an application with a fair amount of layering and separation between the request handling, the persistence layer, and the serialization system. When a GET for a collection happens, the handler calls the persistence layer, which returns a generator of entities; that generator is passed to the serializer, which yields a block of output per entity.
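A minimal sketch of that layering (the store, the serializer, and the entity shape here are all invented for illustration, not my real code):

```python
def store_get_collection():
    # Persistence layer: hand back entities one at a time instead of
    # materializing the whole collection in memory.
    for i in range(3):
        yield {'id': i, 'name': 'entity-%d' % i}

def generate_html_from_entities(entities):
    # Serialization layer: one block of HTML per entity.
    for entity in entities:
        yield '<div id="%(id)d">%(name)s</div>' % entity
```

Chained together, each entity is pulled through on demand, so peak memory stays at roughly one entity's worth no matter how big the collection is.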
Here's some pseudo code:

    def get_collection(environ, start_response):
        try:
            entities = store.get_collection('something')
        except NoSomething:
            start_response('404 Not Found', [])
            return ['sorry']
        start_response('200 OK', [('Content-Type', 'text/html')])
        # yield a block of html per entity
        return serializer.generate_html_from_entities(entities)

"In general, if you're yielding more than one block, you're almost certainly doing WSGI wrong."

I don't understand how this is wrong. It appears to allow nice conceptual separation between the store and the serializer while still allowing the memory (and sometimes CPU) efficiencies of generators. It may be that I'm a special case (some of the serializations can be quite expansive and expensive), but I would be surprised if that were so. So what's going on?

P.S. Speaking of these things, can anyone point me to a JSON tool that can yield a string of JSON as a series of blocks? Assuming a data structure that is a long list of anonymous dicts, json.dumps(the_list) returns a single string. It would be nice if, to fit the model above, I could yield each dict. Better still if I could pass the_list as a generator. I can think of ways to create such a tool myself, but I'd like to use an existing one if it exists.

_______________________________________________
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
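Re the P.S. above: the stdlib gets partway there -- json.JSONEncoder has an iterencode() method that yields the JSON for an object as a series of chunks, though it still wants a concrete list rather than a generator. For the generator case, a hand-rolled wrapper might look like this (iter_json_list is a name I made up):

```python
import json

def iter_json_list(items):
    # Emit a JSON array one block at a time; items may be any
    # iterable of JSON-serializable values, including a generator.
    yield '['
    first = True
    for item in items:
        if not first:
            yield ', '
        first = False
        yield json.dumps(item)
    yield ']'
```

Joining the blocks gives the same string as json.dumps on the equivalent list, but nothing larger than one item's serialization is held in memory at a time.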