Re: Mutable tagging RDD rows ?

2014-03-28 Thread Christopher Nguyen
Sung Hwan, yes, I'm saying exactly what you interpreted, including that if you tried it, it would (mostly) work, and that I'm uncertain about any guarantees on the semantics. There would definitely be no fault tolerance if the mutations depend on state that is not captured in the RDD lineage. DD
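
To make the lineage point concrete, here is a toy model in plain Python, not Spark. `ToyRDD`, `rows` and `lose_partition` are hypothetical names invented for this sketch; it assumes only that a lost partition is rebuilt by re-running the function that produced it, so any in-place tags that function does not know about disappear.

```python
# Toy model only: ToyRDD is a hypothetical stand-in, not a Spark class.
class ToyRDD:
    def __init__(self, compute):
        self._compute = compute   # the "lineage": how to rebuild the rows
        self._cache = None        # materialized rows, if any

    def rows(self):
        # Materialize from lineage on first access or after a loss.
        if self._cache is None:
            self._cache = self._compute()
        return self._cache

    def lose_partition(self):
        # Simulate a failure: the materialized rows are gone.
        self._cache = None


# Each row is a mutable [value, tag] pair; every tag starts out False.
base = ToyRDD(lambda: [[i, False] for i in range(5)])

# Mutate in place: tag the even values. The lineage does not record this.
for row in base.rows():
    if row[0] % 2 == 0:
        row[1] = True
tags_before = [row[1] for row in base.rows()]

# After a simulated failure the rows are recomputed from lineage alone.
base.lose_partition()
tags_after = [row[1] for row in base.rows()]
```

In this sketch `tags_before` holds the in-place tags, while `tags_after` is all `False` again, because the recomputation only knows about the original function.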

Re: Mutable tagging RDD rows ?

2014-03-28 Thread Sung Hwan Chung
Thanks Chris, I'm not exactly sure what you mean by MutablePair, but are you saying that we could create an RDD[MutablePair] and modify individual rows? If so, will that play nicely with RDD's lineage and fault tolerance? As for the alternatives, I don't think 1 is something we want to do, since

Re: Mutable tagging RDD rows ?

2014-03-28 Thread Christopher Nguyen
Sung Hwan, strictly speaking, RDDs are immutable, so the canonical way to get what you want is to transform to another RDD. But you might look at MutablePair ( https://github.com/apache/spark/blob/60abc252545ec7a5d59957a32e764cd18f6c16b4/core/src/main/scala/org/apache/spark/util/MutablePair.scala)
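
A minimal sketch of the "transform to another dataset" approach, written in plain Python rather than against Spark. `Tagged` and `tag_rows` are hypothetical names for illustration; in Spark the equivalent step would be a `map` over the RDD that yields a new RDD of tagged rows.

```python
from typing import Callable, NamedTuple, Tuple


class Tagged(NamedTuple):
    # Hypothetical immutable row type: a value plus its tag.
    value: int
    tag: bool


def tag_rows(rows: Tuple[int, ...],
             predicate: Callable[[int], bool]) -> Tuple[Tagged, ...]:
    """Return a new tuple of tagged rows; the input is left untouched."""
    return tuple(Tagged(v, predicate(v)) for v in rows)


rows = (3, 8, 15, 4)

# First "transformation": derive a tagged dataset from the original.
tagged = tag_rows(rows, lambda v: v > 5)

# Changing tags later is another transformation, not an in-place update.
retagged = tuple(r._replace(tag=not r.tag) for r in tagged)
```

Because every step produces a new dataset from the previous one, each result can be rebuilt by replaying the same functions, which is the property in-place mutation gives up.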