Thanks Chris,

I'm not exactly sure what you mean by MutablePair. Are you saying that we
could create an RDD[MutablePair] and modify individual rows in place?

If so, will that play nicely with RDD's lineage and fault tolerance?
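
To make the comparison concrete, here is roughly how I picture the
canonical "transform to another RDD" approach. This is only a sketch in
plain Python, with lists standing in for RDDs so it runs without Spark;
`retag` and the update rule are names I made up for illustration, not
anything from the Spark API.

```python
from typing import Callable, List, Tuple

Row = str
Tagged = Tuple[Row, float]  # (data row, small per-row tag)


def retag(pairs: List[Tagged], update: Callable[[Row, float], float]) -> List[Tagged]:
    """Return a NEW collection of (row, tag) pairs; the input is not mutated.

    On a real RDD of pairs this would be a map over the pairs.
    """
    return [(row, update(row, tag)) for row, tag in pairs]


rows = ["a", "bb", "ccc"]
tagged = [(row, 0.0) for row in rows]  # initial tag for every row

history = [tagged]  # earlier versions stay intact, like parent RDDs in a lineage
for _ in range(3):
    # Hypothetical update rule: add the row length to the tag each iteration.
    tagged = retag(tagged, lambda row, tag: tag + len(row))
    history.append(tagged)
```

Each iteration yields a fresh collection and leaves the previous one
untouched, which is the property I would expect lineage-based recovery to
rely on. What I can't tell is whether mutating rows in place keeps that
property.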

As for the alternatives, I don't think option 1 is something we want to do,
since it would mean implementing another complex system ourselves. Is DDF
going to be an alternative to RDD?

Thanks again!



On Fri, Mar 28, 2014 at 7:02 PM, Christopher Nguyen <c...@adatao.com> wrote:

> Sung Hwan, strictly speaking, RDDs are immutable, so the canonical way to
> get what you want is to transform to another RDD. But you might look at
> MutablePair (
> https://github.com/apache/spark/blob/60abc252545ec7a5d59957a32e764cd18f6c16b4/core/src/main/scala/org/apache/spark/util/MutablePair.scala)
> to see if the semantics meet your needs.
>
> Alternatively you can consider:
>
>    1. Build & provide a fast lookup service that stores and returns the
>    mutable information keyed by the RDD row IDs, or
>    2. Use DDF (Distributed DataFrame) which we'll make available in the
>    near future, which will give you fully mutable-row table semantics.
>
>
> --
> Christopher T. Nguyen
> Co-founder & CEO, Adatao <http://adatao.com>
> linkedin.com/in/ctnguyen
>
>
>
> On Fri, Mar 28, 2014 at 5:16 PM, Sung Hwan Chung <coded...@cs.stanford.edu> wrote:
>
>> Hey guys,
>>
>> I need to tag individual RDD rows with some values. The tag value would
>> change at every iteration. Is this possible with RDDs? (I suppose this is
>> sort of like a mutable RDD, but it's more than that.)
>>
>> If not, what would be the best way to do something like this? Basically,
>> we need to keep mutable information per data row (though this would be
>> much smaller than the actual data row).
>>
>> Thanks
>>
>
>
