On Sat, Mar 9, 2013 at 5:50 AM, Vincent Pelletier <plr.vinc...@gmail.com> wrote:
> On Friday 08 March 2013 18:50:09, Laurence Rowe wrote:
>> It would be great if there was a way to advise ZODB in advance that
>> certain objects would be required so it could fetch multiple object
>> states in a single request to the storage server.
>
> +1
>
> I can see this used to process a large tree, objects being processed as
> they are loaded (loads being pipelined).
>
> Pseudo-code interface suggestion:
>
> class IPipelinedStorage:
>     def loadMany(oid_list, callback, tid=None, before_tid=None):
>         callback being along the lines of:
>             def callback(oid, data_record, tid, next_tid):
>                 if stop_condition:
>                     raise ... (StopIteration ? just anything ?)
>                 return more_oids_to_queue_for_loading
>         tid and before_tid (mutually exclusive) specify the snapshot to use, to
>         implement equivalent of loadSerial and loadBefore.
>
> class IPipelinedConnection:
>     def walk(ob, callback):
>         callback being along the lines of:
>             def callback(just_loaded_object, referee_list):
>                 # do something on just_loaded_object
>                 return filtered_referee_list
>         referee_list would expose at least referee's class (name ?), and hold their
>         oid for Connection.walk internal use (only ?).
>         Or maybe just ghosts, but callback would have to take care of not
>         unghostifying them - it would void the purpose of pipelining loads.
>
> Above ZODB (persistent containers with internal persistent objects, like
> BTree):
>     Implement an iterator over subobjects ignoring intermediate internal
>     structure (think BTree.*Bucket classes).
>
> Specific iteration order could probably be specified to be able to implement
> iterkeys and such in BTree for example, but storages may have to implement load
> reordering when loads happen in parallel (like NEO, and as could probably be
> implemented for zeoraid and RelStorage configured with multiple mirrored
> databases), limiting latency/processing parallelism and possibly leading to
> memory footprint explosion.
> So I think it should be possible to also request no special loading order to
> get the lowest latency the backend can provide and a somewhat constant memory
> footprint.
>
> Any thoughts/comments?

I think this is more complicated than necessary. I think a simple method
on a storage that gives a hint that a set of object ids will be loaded is
enough. A network storage could then issue a pipelined request for those
oids. The application can then proceed as usual.

I think I've proposed such an API before, but am too lazy to look it up.
Something like:

    load_hint(*oids)

I'd like to see this functionality, but I don't have time to do it soon.

I must say that I think this API is more likely to be abused than used
effectively. Prefetching catalog indexes is a sort of anti-pattern that
only makes sense for small catalogs. It would likely make more sense to
have a dedicated catalog server that returned oids and possibly object
records in response to queries (or, whimper, use Solr).

Jim

-- 
Jim Fulton
http://www.linkedin.com/in/jimfulton
Jerky is better than bacon! http://zo.pe/Kqm
_______________________________________________
For more information about ZODB, see http://zodb.org/

ZODB-Dev mailing list - ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev
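
For anyone reading this in the archive, here is a rough sketch of how the load_hint(*oids) idea from the reply might behave. It is not ZODB code: FakeServer, HintingStorage, fetch and the round_trips counter are all hypothetical names made up for illustration, and the "pipelined request" is modelled as one synchronous batched call rather than real asynchronous I/O.

```python
class FakeServer:
    """Hypothetical stand-in for a remote storage server; counts round trips."""

    def __init__(self, records):
        self._records = dict(records)
        self.round_trips = 0

    def fetch(self, oids):
        # One call models one network round trip, however many oids it carries.
        self.round_trips += 1
        return {oid: self._records[oid] for oid in oids}


class HintingStorage:
    """Hypothetical client-side storage with a load_hint method (an assumption)."""

    def __init__(self, server):
        self._server = server
        self._cache = {}

    def load_hint(self, *oids):
        # Fetch every hinted oid that is not cached yet in a single request.
        missing = [oid for oid in dict.fromkeys(oids) if oid not in self._cache]
        if missing:
            self._cache.update(self._server.fetch(missing))

    def load(self, oid):
        # The application loads as usual; hinted oids are already local.
        if oid not in self._cache:
            self._cache.update(self._server.fetch([oid]))
        return self._cache[oid]


server = FakeServer({n: f"state-{n}".encode() for n in range(5)})
storage = HintingStorage(server)

storage.load_hint(0, 1, 2)                         # one batched request
states = [storage.load(oid) for oid in (0, 1, 2)]  # served from the cache
trips_after_hint = server.round_trips

storage.load(3)                                    # unhinted oid: its own request
trips_total = server.round_trips
```

The point of the sketch is only that the application code after the hint is unchanged: it still calls load per object, and the hint decides how many round trips that costs.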