Mutable tagging RDD rows ?

2014-03-28 Thread Sung Hwan Chung
Hey guys, I need to tag individual RDD lines with some values, and the tag value would change at every iteration. Is this possible with an RDD? (I suppose this is something like a mutable RDD, but it's more than that.) If not, what would be the best way to do something like this? Basically, we need to keep mutable i

Re: Mutable tagging RDD rows ?

2014-03-28 Thread Christopher Nguyen
Sung Hwan, strictly speaking, RDDs are immutable, so the canonical way to get what you want is to transform to another RDD. But you might look at MutablePair (https://github.com/apache/spark/blob/60abc252545ec7a5d59957a32e764cd18f6c16b4/core/src/main/scala/org/apache/spark/util/MutablePair.scala)
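
As a rough illustration of the "transform to another RDD" approach, here is a minimal sketch in plain Python. It does not use Spark: an immutable tuple stands in for the RDD, and the names `retag` and `bump_if_large` and the (value, tag) row layout are assumptions made up for this example. In Spark the per-iteration step would be a `map` over the previous RDD, which likewise returns a new RDD and leaves the old one untouched.

```python
from typing import Callable, Tuple

# Hypothetical row layout for this sketch: (value, tag).
Row = Tuple[float, int]
Dataset = Tuple[Row, ...]  # immutable stand-in for an RDD


def retag(rows: Dataset, new_tag: Callable[[float, int], int]) -> Dataset:
    """Build a new dataset with updated tags; the input is never modified."""
    return tuple((value, new_tag(value, tag)) for value, tag in rows)


def bump_if_large(value: float, tag: int) -> int:
    """Hypothetical tagging rule: increment the tag of rows whose value exceeds 1.0."""
    return tag + 1 if value > 1.0 else tag


rows: Dataset = ((0.5, 0), (1.5, 0), (2.5, 0))

# Each iteration derives a new dataset from the previous one,
# so every version is a pure function of its predecessor.
history = [rows]
for _ in range(3):
    history.append(retag(history[-1], bump_if_large))

final = history[-1]
print(final)
```

Because each version depends only on the one before it, any version can be recomputed from the original rows, which is the property that lineage-based recovery relies on. In-place mutation of the rows would give that up.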

Re: Mutable tagging RDD rows ?

2014-03-28 Thread Sung Hwan Chung
Thanks Chris, I'm not exactly sure what you mean by MutablePair, but are you saying that we could create an RDD[MutablePair] and modify individual rows? If so, will that play nicely with RDD's lineage and fault tolerance? As for the alternatives, I don't think 1 is something we want to do, since

Re: Mutable tagging RDD rows ?

2014-03-28 Thread Christopher Nguyen
Sung Hwan, yes, I'm saying exactly what you interpreted, including that if you tried it, it would (mostly) work, and that I'm uncertain about what guarantees the semantics would carry. There would definitely be no fault tolerance if the mutations depend on state that is not captured in the RDD lineage. DD