On Mon, Apr 28, 2014 at 9:30 AM, Sung Hwan Chung <coded...@cs.stanford.edu> wrote:

> Actually, I do not know how to do something like this or whether this is
> possible - thus my suggestive statement.
>
> Can you already declare persistent memory objects per worker? I tried
> something like constructing a singleton object within map functions, but
> that didn't work as it seemed to actually serialize singletons and pass it
> back and forth in a weird manner.
>

Does it need to persist across operations, or only for the lifetime of processing one partition in one mapPartitions call? The latter is quite easy and might give most of the speedup. Maybe that's 'enough', even if it means you re-cache values several times in a repeated iterative computation. It would certainly avoid the complexity of trying to keep that state alive remotely across operations.

I'd also be interested to know if there is any reliable way to do that, though it seems hard, since it means embedding assumptions about where particular data is going to be processed.
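A minimal sketch of the per-partition approach, assuming PySpark (the thread doesn't name a language). `make_partition_processor` and `compute` are hypothetical names; the only idea taken from the thread is that the cache is an ordinary dict created inside the function handed to `mapPartitions`, so it lives exactly as long as one partition's iterator.

```python
from typing import Callable, Dict, Hashable, Iterable, Iterator, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")
R = TypeVar("R")


def make_partition_processor(
    compute: Callable[[K], R],
) -> Callable[[Iterable[Tuple[K, V]]], Iterator[Tuple[K, V, R]]]:
    """Build a function suitable for RDD.mapPartitions.

    `compute` is a hypothetical expensive, deterministic per-key function.
    """

    def process_partition(rows: Iterable[Tuple[K, V]]) -> Iterator[Tuple[K, V, R]]:
        # Created once per call, i.e. once per partition. It is discarded when
        # the partition has been processed, so nothing here survives across
        # Spark operations (or across retries of the same task).
        cache: Dict[K, R] = {}
        for key, value in rows:
            if key not in cache:
                cache[key] = compute(key)  # pay the cost once per key per partition
            yield key, value, cache[key]

    return process_partition
```

With PySpark this would be used as `rdd.mapPartitions(make_partition_processor(expensive_fn))`. Each partition, and each rerun of the stage in an iterative job, starts with an empty cache; that is the repeated re-caching cost mentioned above, traded for not having to manage state on the workers.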