So, the data structure looks like:

D consists of D1, D2, D3 (each DX is a partition)

and 

DX consists of d1, d2, d3 (each dx is a part, in your context)?

And what you want to do is to transform each

DX into (d1 + d2, d1 + d3, d2 + d3)?
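
If so, mapPartitions should do what you need: it hands your function the
whole partition as an iterator, so you can buffer the parts and combine
them however you like, much like your Hadoop buffer approach, and the
result is a new RDD built partition-for-partition from the input. A rough
sketch (assuming each partition holds exactly three Int parts; the object
and variable names are just illustrative):

import org.apache.spark.{SparkConf, SparkContext}

object PartsTransform {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("parts-transform").setMaster("local[2]"))

    // Toy input: two partitions, each holding three "parts" (A, B, C).
    val t1 = sc.parallelize(Seq(1, 2, 3, 10, 20, 30), numSlices = 2)

    // mapPartitions gives your function the whole partition as an
    // iterator, so parts can be combined freely within the partition.
    val t2 = t1.mapPartitions { iter =>
      // Buffer the partition locally; assumes exactly three parts.
      val Array(a, b, c) = iter.toArray
      Iterator(a + b, a + c, b + c) // D = A+B, E = A+C, F = B+C
    }

    // Print each output partition to show the per-partition results.
    t2.glom().collect().foreach(p => println(p.mkString(", ")))
    sc.stop()
  }
}

mapPartitions itself returns the new RDD, so there is no makeRDD-style
step needed to redistribute anything: each output partition is computed
directly from the corresponding input partition.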



Best, 

-- 
Nan Zhu



On Tuesday, April 8, 2014 at 8:09 AM, wxhsdp wrote:

> In my application, data parts inside an RDD partition have relations, so I
> need to do some operations between them.
> 
> For example:
> RDD T1 has several partitions, and each partition has three parts: A, B, and
> C. Then I transform T1 into T2. After the transform, each partition of T2
> also has three parts, D, E, and F, where D = A+B, E = A+C, F = B+C. As far
> as I know, Spark only supports operations that traverse the RDD and call a
> function on each element. How can I do such a transform?
> 
> In Hadoop, I copy the data in each partition into a user-defined buffer, do
> any operations I like in the buffer, and finally call output.collect() to
> emit the data. But how can I construct a new RDD with distributed partitions
> in Spark? makeRDD only distributes a local Scala collection to form an RDD.
> 
> 

