I didn’t group the integers; instead, I process them in groups of two within each partition:
scala> import scala.collection.mutable.ListBuffer
import scala.collection.mutable.ListBuffer

scala> val a = sc.parallelize(List(1, 2, 3, 4), 2)
a: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:12

Process each partition, handling its elements in groups of two:

scala> a.mapPartitions(p => { val l = p.toList
     |   val ret = new ListBuffer[Int]
     |   for (i <- 0 until l.length by 2) {
     |     ret += l(i) + l(i + 1)
     |   }
     |   ret.toList.iterator
     | })
res7: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[1] at mapPartitions at <console>:16

scala> res7.collect
res10: Array[Int] = Array(3, 7)

Best,

--
Nan Zhu

On Monday, March 24, 2014 at 8:40 PM, Nan Zhu wrote:

> Partition your input so that each partition holds an even number of elements.
>
> Then use mapPartitions to operate on each partition's Iterator[Int].
>
> There may be a more efficient way….
>
> Best,
>
> --
> Nan Zhu
>
>
> On Monday, March 24, 2014 at 7:59 PM, yh18190 wrote:
>
> > Hi, I have a large data set of numbers (an RDD) and want to perform a
> > computation on groups of two values at a time. For example, given the RDD
> > 1, 2, 3, 4, 5, 6, 7, ..., can I group it into (1, 2), (3, 4), (5, 6), ...
> > and perform the respective computations efficiently? Since there is no way
> > to index RDD elements directly (e.g. with a for loop over (i, i+1)), is
> > there a way to solve this problem? Any suggestions would be appreciated.
> >
> > View this message in context: Splitting RDD and Grouping together to
> > perform computation
> > (http://apache-spark-user-list.1001560.n3.nabble.com/Splitting-RDD-and-Grouping-together-to-perform-computation-tp3153.html)
> > Sent from the Apache Spark User List mailing list archive
> > (http://apache-spark-user-list.1001560.n3.nabble.com/) at Nabble.com.
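As a side note (a sketch, not from the thread above): the same pairwise sum can be written more compactly with Scala's Iterator.grouped(2), which pairs elements lazily and also tolerates a partition with an odd number of elements by leaving the last element in a group of one. The object and method names below are made up for illustration; the function is plain Scala and runs without Spark, but has the exact shape expected by rdd.mapPartitions.

```scala
object PairwiseSum {
  // The function you would pass to rdd.mapPartitions(sumPairs):
  // grouped(2) yields Seq(1, 2), Seq(3, 4), ... and _.sum adds each pair.
  def sumPairs(p: Iterator[Int]): Iterator[Int] =
    p.grouped(2).map(_.sum)

  def main(args: Array[String]): Unit = {
    // Simulate one partition's iterator, as mapPartitions would supply it.
    println(sumPairs(Iterator(1, 2, 3, 4)).toList) // List(3, 7)
    // Odd-sized partition: the trailing element becomes its own group,
    // instead of throwing IndexOutOfBoundsException as l(i + 1) would.
    println(sumPairs(Iterator(1, 2, 3)).toList)    // List(3, 3)
  }
}
```

Note that this only pairs elements *within* a partition; if logical pairs must never straddle a partition boundary, you still need to arrange even-sized partitions first, as suggested in the quoted reply.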