From the gist of it, it seems like you need to override the default partitioner to control how your data is distributed among partitions. Take a look at the different partitioners available (Hash, which is the default, and Range); if none of these gets you the desired result, you might want to provide your own.
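As a minimal sketch of what "provide your own" could look like: a custom partitioner extends org.apache.spark.Partitioner and decides the target partition in getPartition. The class name, the two-partition scheme, and the even/odd rule below are all illustrative assumptions, not anything from your job:

```scala
import org.apache.spark.Partitioner

// Sketch: send even integer keys to partition 0 and odd keys to
// partition 1. Adapt getPartition to whatever grouping your
// computation actually needs.
class EvenOddPartitioner extends Partitioner {
  override def numPartitions: Int = 2

  override def getPartition(key: Any): Int = key match {
    case k: Int => math.abs(k % 2) // 0 for even keys, 1 for odd keys
    case _      => 0               // fallback for non-Int keys
  }
}
```

You would then apply it to a key-value RDD with something like pairRdd.partitionBy(new EvenOddPartitioner), which reshuffles the data according to getPartition.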
On Fri, Mar 28, 2014 at 2:08 PM, Adrian Mocanu <amoc...@verticalscope.com> wrote:

> I say you need to remap so you have a key for each tuple that you can sort
> on. Then call rdd.sortByKey(true), like this:
>
>     mystream.transform(rdd => rdd.sortByKey(true))
>
> For this function to be available you need to import
> org.apache.spark.rdd.OrderedRDDFunctions.
>
> -----Original Message-----
> From: yh18190 [mailto:yh18...@gmail.com]
> Sent: March-28-14 5:02 PM
> To: u...@spark.incubator.apache.org
> Subject: RE: Splitting RDD and Grouping together to perform computation
>
> Hi,
>
> Here is my code for the given scenario. Could you please let me know where
> to sort, and on what basis, so that the elements keep the order of the
> original sequence within each partition?
>
>     val res2 = reduced_hccg.map(_._2) // gives an RDD of numbers
>     res2.foreach(println)
>
>     val result = res2.mapPartitions(p => {
>       val l = p.toList
>       val approx = new ListBuffer[Int]
>       val detail = new ListBuffer[Double]
>       for (i <- 0 until l.length - 1 by 2) {
>         println(l(i), l(i + 1))
>         approx += (l(i), l(i + 1))
>       }
>       approx.toList.iterator
>       detail.toList.iterator
>     })
>     result.foreach(println)
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Splitting-RDD-and-Grouping-together-to-perform-computation-tp3153p3450.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
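One thing worth noting about the quoted mapPartitions body: a Scala block evaluates to its last expression only, so it returns detail.toList.iterator (which is empty, since nothing is ever added to detail) and the approx results are silently discarded. A sketch of the per-partition pairing as a standalone function, under the assumption that the goal is to pair up consecutive elements while preserving their order (pairUp is a hypothetical name, not part of any API):

```scala
// Pairs consecutive elements within a partition, preserving order.
// grouped(2) yields chunks of up to 2 elements; the collect pattern
// keeps only complete pairs, dropping a trailing odd element.
// Returning one combined iterator avoids the pitfall in the quoted
// code, where only the last expression of the block is returned.
def pairUp(p: Iterator[Double]): Iterator[(Double, Double)] =
  p.grouped(2).collect { case Seq(a, b) => (a, b) }
```

Applied to the quoted example it would be something like res2.mapPartitions(pairUp), assuming res2 is an RDD[Double]; whether the pairs line up with the original sequence still depends on sorting the data before it is partitioned, as suggested above.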