Sung Hwan, yes, I'm saying exactly what you interpreted: if you tried it,
it would (mostly) work, but I'm uncertain what guarantees you would get on
the semantics. There would definitely be no fault tolerance if the
mutations depend on state that is not captured in the RDD lineage.
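
To make the contrast concrete, here is a rough sketch, not something I have verified against any particular Spark release. The sample data, the object name, the local master setting, and the app name are all made up for illustration; the only Spark pieces assumed are SparkContext, parallelize, map, foreach, cache, collect, and the MutablePair class linked in this thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.util.MutablePair

object MutablePairSketch {
  def main(args: Array[String]): Unit = {
    // Local master and app name are illustrative choices only.
    val conf = new SparkConf().setMaster("local[2]").setAppName("mutable-pair-sketch")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical sample rows.
      val rows = sc.parallelize(Seq(("a", 1), ("b", 2), ("c", 3)))

      // Canonical approach: derive a new RDD. The update is part of the
      // lineage, so a lost partition is recomputed with the update applied.
      val updated = rows.map { case (k, v) => if (k == "b") (k, v + 10) else (k, v) }
      updated.collect().foreach(println)

      // Mutable-row approach: cache an RDD[MutablePair] and mutate in place.
      val mutableRows = rows.map { case (k, v) => new MutablePair(k, v) }.cache()
      mutableRows.count() // materialize the cached partitions
      mutableRows.foreach { p => if (p._1 == "b") p._2 = p._2 + 10 }

      // The mutation above is NOT in the lineage. It is visible here only
      // while the mutated objects are still held in the cache; an evicted
      // or lost partition is recomputed from `rows` without it.
      mutableRows.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

The first variant is the one with well-defined semantics. The second is the "(mostly) works" case: nothing in Spark tracks the in-place change, so it survives only as long as the cached objects do.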
DD
Thanks Chris,
I'm not exactly sure what you mean by MutablePair, but are you saying
that we could create an RDD[MutablePair] and modify individual rows?
If so, will that play nicely with RDD lineage and fault tolerance?
As for the alternatives, I don't think 1 is something we want to do, since
Sung Hwan, strictly speaking, RDDs are immutable, so the canonical way to
get what you want is to transform to another RDD. But you might look at
MutablePair (
https://github.com/apache/spark/blob/60abc252545ec7a5d59957a32e764cd18f6c16b4/core/src/main/scala/org/apache/spark/util/MutablePair.scala)