On Mon, Apr 28, 2014 at 8:22 AM, Sung Hwan Chung <coded...@cs.stanford.edu> wrote:
>
> e.g. something like
>
> rdd.mapPartition((rows : Iterator[String]) => {
>   var idx = 0
>   rows.map((row: String) => {
>     val valueMap = SparkWorker.getMemoryContent("valMap")
>     val prevVal = valueMap(idx)
>     idx += 1
>     ...
>   })
>   ...
> })
>
> The developer can implement their own fault recovery mechanism if the
> worker has crashed and lost the memory content.
>
Yeah, you can always just declare your own per-partition data structures in a function block like that, right? valueMap can be initialized to an empty map, loaded from somewhere, or even a value broadcast from the driver. That's certainly better than tacking data onto RDDs. Of course it isn't restored if the computation is lost, but in this and many other cases that's fine, since it's just cached intermediate results. So this already works, or did I misunderstand the original use case?
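To make that concrete, here's a minimal sketch of the pattern I mean, using a plain Scala Iterator to stand in for the rows of one partition (so it runs without Spark; the `processPartition` function, its name, and the map contents are all just for illustration). In a real job this function body would be the argument to `rdd.mapPartitions`:

```scala
object PerPartitionState {
  // In Spark this would be: rdd.mapPartitions(processPartition)
  def processPartition(rows: Iterator[String]): Iterator[String] = {
    // Declared once per partition: start it empty, load it from somewhere,
    // or capture a value broadcast from the driver via the closure.
    val valueMap = scala.collection.mutable.Map.empty[Int, String]
    var idx = 0
    rows.map { row =>
      // Look up any cached intermediate result for this index,
      // then cache the current row for later use.
      val prevVal = valueMap.getOrElse(idx, "")
      valueMap(idx) = row
      idx += 1
      prevVal + row
    }
  }

  def main(args: Array[String]): Unit = {
    // First pass: the map starts empty, so prevVal is "" for every row.
    println(processPartition(Iterator("a", "b", "c")).toList)
  }
}
```

If the task is recomputed after a failure, `valueMap` is simply rebuilt from an empty state, which is exactly the "fine for cached intermediate results" behavior above.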