On Mon, Apr 28, 2014 at 9:30 AM, Sung Hwan Chung
<coded...@cs.stanford.edu> wrote:

> Actually, I do not know how to do something like this or whether this is
> possible - thus my suggestive statement.
>
> Can you already declare persistent memory objects per worker? I tried
> something like constructing a singleton object within map functions, but
> that didn't work as it seemed to actually serialize singletons and pass it
> back and forth in a weird manner.
>
>
Does it need to be persistent across operations, or just persist while one
partition is being processed inside a single mapPartitions call? The latter
is quite easy and might give most of the speedup.
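
To make the per-partition idea concrete, here is a rough sketch in Python of
the shape I have in mind. It assumes the function is what you would hand to
mapPartitions (it takes an iterator of rows and yields results);
expensive_lookup and the lookup_calls counter are made-up stand-ins for
illustration, not anything from your code.

```python
lookup_calls = 0  # demo-only counter; a driver-side global like this is not shared with Spark executors


def expensive_lookup(key):
    """Hypothetical stand-in for whatever is costly to compute or fetch."""
    global lookup_calls
    lookup_calls += 1
    return key * key


def process_partition(rows):
    """Process one partition, reusing a cache that lives only for this call."""
    cache = {}  # created once per partition, dropped when the partition is done
    for key in rows:
        if key not in cache:
            cache[key] = expensive_lookup(key)
        yield key, cache[key]


# With PySpark this would be passed as: rdd.mapPartitions(process_partition)
# Here a plain iterator stands in for the rows of a single partition.
result = list(process_partition(iter([2, 3, 2, 2, 3])))
```

The cache is an ordinary local variable, so nothing has to be serialized or
kept alive between operations; the cost is that it gets rebuilt for every
partition on every pass.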

Maybe that's 'enough', even if it means you rebuild the cached values
several times in a repeated iterative computation. It would certainly avoid
a lot of the complexity of trying to keep that state alive remotely across
operations. I'd also be interested if there is any reliable way to do that,
though it seems hard since it means you embed assumptions about where
particular data is going to be processed.
