I didn’t group the integers; instead, I process them in groups of two within each partition.

First, partition the input:

scala> val a = sc.parallelize(List(1, 2, 3, 4), 2)  
a: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at 
<console>:12


Then process each partition, summing its elements in groups of two:

scala> import scala.collection.mutable.ListBuffer
import scala.collection.mutable.ListBuffer

scala> a.mapPartitions(p => { val l = p.toList
     | val ret = new ListBuffer[Int]
     | // assumes each partition holds an even number of elements
     | for (i <- 0 until l.length by 2) {
     |   ret += l(i) + l(i + 1)
     | }
     | ret.toList.iterator
     | })
res7: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[1] at mapPartitions at 
<console>:16



scala> res7.collect

res10: Array[Int] = Array(3, 7)
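For what it's worth, the same per-partition logic can be written with `Iterator.grouped(2)`, which also avoids an out-of-bounds error if a partition ends up with an odd number of elements (the trailing element just forms a group of one). A minimal pure-Scala sketch of what `mapPartitions` would apply to each partition's iterator — `sumPairs` is a name I'm introducing here, not anything from the thread:

```scala
// Pairwise summing logic, as mapPartitions would apply it to each
// partition's Iterator[Int]. grouped(2) yields chunks of at most 2,
// so an odd trailing element is summed on its own rather than throwing.
def sumPairs(it: Iterator[Int]): Iterator[Int] =
  it.grouped(2).map(_.sum)

// Simulating the two partitions of List(1, 2, 3, 4) above:
val result = List(List(1, 2), List(3, 4)).flatMap(p => sumPairs(p.iterator))
println(result)  // List(3, 7) — matches the collect output above
```

In Spark this would just be `a.mapPartitions(sumPairs)`.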

Best,

--  
Nan Zhu



On Monday, March 24, 2014 at 8:40 PM, Nan Zhu wrote:

> partition your input so that each partition holds an even number of elements  
>  
> use mapPartition to operate on Iterator[Int]
>  
> maybe there is a more efficient way….
>  
> Best,  
>  
> --  
> Nan Zhu
>  
>  
>  
> On Monday, March 24, 2014 at 7:59 PM, yh18190 wrote:
>  
> > Hi, I have a large data set of numbers (an RDD) and want to perform a 
> > computation on groups of two values at a time. For example, given an RDD 
> > of 1,2,3,4,5,6,7..., can I group it into (1,2),(3,4),(5,6)... and perform 
> > the respective computations efficiently? Since we don't have a way to 
> > index elements directly (e.g., a for loop over (i, i+1)), is there a way 
> > to solve this problem? Please suggest; I would be really thankful.  
>  
