I thought that's what sort-based shuffle did: sort the keys going to the same partition.
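As an aside: since Spark 1.2 there is repartitionAndSortWithinPartitions on pair RDDs, which partitions by a custom Partitioner and then sorts records by key inside each partition in a single shuffle, which is exactly the secondary-sort pattern being attempted here. The behaviour under discussion can also be sketched without Spark at all; the following plain-Python illustration (not Spark code; the partitioner function and partition count are just stand-ins mimicking the StraightPartitioner example from the thread) shows that routing rows by c1 alone preserves arrival order, and that an explicit per-partition sort on the full (c1, c2) key is what produces the expected output:

```python
# Plain-Python sketch (no Spark). Partitioning by c1 only routes rows to
# partitions; it does not reorder them. The secondary sort comes from an
# explicit sort on the composite (c1, c2) key within each partition.

rows = [(1, 2, 4), (1, 3, 6), (2, 4, 7), (2, 6, 8),
        (3, 5, 5), (3, 1, 8), (3, 2, 0)]

NUM_PARTITIONS = 10  # illustrative, like StraightPartitioner(10)

def partition_by_c1(rows, num_partitions):
    """Route each row to a partition based on c1 only."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[row[0] % num_partitions].append(row)
    return partitions

# Step 1: partitioning alone keeps the original arrival order.
parts = partition_by_c1(rows, NUM_PARTITIONS)
print(parts[3])  # [(3, 5, 5), (3, 1, 8), (3, 2, 0)] -- still unsorted

# Step 2: sort each partition by (c1, c2) -- the extra step that
# repartitionAndSortWithinPartitions performs for you in Spark.
sorted_parts = [sorted(p, key=lambda r: (r[0], r[1])) for p in parts]
print(sorted_parts[3])  # [(3, 1, 8), (3, 2, 0), (3, 5, 5)]
```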
I have tried (c1, c2) as an (Int, Int) tuple as well. I don't think the
ordering of c2's type is the problem here.

On Tue, Feb 3, 2015 at 5:21 PM, Sean Owen <so...@cloudera.com> wrote:
> Hm, I don't think the sort partitioner is going to cause the result to
> be ordered by c1,c2 if you only partitioned on c1. I mean, it's not
> even guaranteed that the type of c2 has an ordering, right?
>
> On Tue, Feb 3, 2015 at 3:38 PM, nitinkak001 <nitinkak...@gmail.com> wrote:
> > I am trying to implement secondary sort in Spark as we do in MapReduce.
> >
> > Here is my data (tab-separated; the header row c1, c2, c3 is not part
> > of the file):
> >
> > c1  c2  c3
> > 1   2   4
> > 1   3   6
> > 2   4   7
> > 2   6   8
> > 3   5   5
> > 3   1   8
> > 3   2   0
> >
> > To do the secondary sort, I create a paired RDD as
> >
> >     ((c1 + "," + c2), row)
> >
> > and then use a custom partitioner to partition only on c1. I have set
> > spark.shuffle.manager = SORT so the keys per partition are sorted. For
> > the key "3" I am expecting to get
> >
> >     (3, 1)
> >     (3, 2)
> >     (3, 5)
> >
> > but I am still getting the original order:
> >
> >     3,5
> >     3,1
> >     3,2
> >
> > Here is the custom partitioner code:
> >
> >     class StraightPartitioner(p: Int) extends org.apache.spark.Partitioner {
> >       def numPartitions = p
> >       def getPartition(key: Any) = {
> >         key.asInstanceOf[String].split(",")(0).toInt
> >       }
> >     }
> >
> > and the driver code -- please tell me what I am doing wrong:
> >
> >     val conf = new SparkConf().setAppName("MapInheritanceExample")
> >     conf.set("spark.shuffle.manager", "SORT")
> >     val sc = new SparkContext(conf)
> >     val pF = sc.textFile(inputFile)
> >
> >     val log = LogFactory.getLog("MapFunctionTest")
> >     val partitionedRDD = pF.map { x =>
> >       val arr = x.split("\t")
> >       (arr(0) + "," + arr(1), null)
> >     }.partitionBy(new StraightPartitioner(10))
> >
> >     val outputRDD = partitionedRDD.mapPartitions(p => {
> >       p.map { case (o, n) => o }
> >     })
> >
> > --
> > View this message in context:
> > http://apache-spark-user-list.1001560.n3.nabble.com/Sort-based-shuffle-not-working-properly-tp21487.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.