On Mon, Apr 28, 2014 at 8:22 AM, Sung Hwan Chung <coded...@cs.stanford.edu> wrote:
>
> e.g. something like
>
> rdd.mapPartition((rows : Iterator[String]) => {
>   var idx = 0
>   rows.map((row: String) => {
>     val valueMap = SparkWorker.getMemoryContent("valMap")
>     val prevVal = valueMap(idx)
>     idx += 1
>     ...
>   })
>   ...
> })
>
> The developer can implement their own fault recovery mechanism if the
> worker has crashed and lost the memory content.
>
Yeah, you can always just declare your own per-partition data structures in a function block like that, right? valueMap can be initialized to an empty map, loaded from somewhere, or even a value broadcast from the driver. That's certainly better than tacking data onto RDDs. Of course it isn't restored if the computation is lost, but in this and many other cases that's fine, since it's just cached intermediate results. So this already works, or did I misunderstand the original use case?
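To make that concrete, here's a minimal sketch of the pattern I mean, using a plain Scala Iterator to stand in for the rows of one partition (so it runs without Spark; the `processPartition` function, its name, and the map contents are all just for illustration). In a real job this function body would be the argument to `rdd.mapPartitions`:

```scala
object PerPartitionState {
  // In Spark this would be: rdd.mapPartitions(processPartition)
  def processPartition(rows: Iterator[String]): Iterator[String] = {
    // Declared once per partition: start it empty, load it from somewhere,
    // or capture a value broadcast from the driver via the closure.
    val valueMap = scala.collection.mutable.Map.empty[Int, String]
    var idx = 0
    rows.map { row =>
      // Look up any cached intermediate result for this index,
      // then cache the current row for later use.
      val prevVal = valueMap.getOrElse(idx, "")
      valueMap(idx) = row
      idx += 1
      prevVal + row
    }
  }

  def main(args: Array[String]): Unit = {
    // First pass: the map starts empty, so prevVal is "" for every row.
    println(processPartition(Iterator("a", "b", "c")).toList)
  }
}
```

If the task is recomputed after a failure, `valueMap` is simply rebuilt from an empty state, which is exactly the "fine for cached intermediate results" behavior above.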