Anyone?
On Tue, Apr 21, 2015 at 3:38 PM Marius Danciu <marius.dan...@gmail.com> wrote:

> Hello anyone,
>
> I have a question regarding the sort shuffle. Roughly, I'm doing something
> like:
>
> rdd.mapPartitionsToPair(f1).groupByKey().mapPartitionsToPair(f2)
>
> The problem is that in f2 I don't see the keys being sorted. The keys are
> Java Comparable, not scala.math.Ordered or scala.math.Ordering (it would be
> weird for each key to implement Ordering, as mentioned in the JIRA item
> https://issues.apache.org/jira/browse/SPARK-2045).
>
> Questions:
> 1. Do I need to explicitly sortByKey? (If I do this, I can see the keys
> correctly sorted in f2.) But I'm worried about the extra cost, since
> Spark 1.3.0 is supposed to use the SORT shuffle manager by default, right?
> 2. Does each key need to be a scala.math.Ordered? Is Java Comparable
> used at all?
>
> ... btw, I'm using Spark from Java ... don't ask me why :)
>
> Best,
> Marius
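To illustrate the distinction the question is getting at, here is a small plain-Java sketch (not Spark itself, just an analogy using java.util.stream): grouping by key, like groupByKey, carries no ordering guarantee on the keys, and an explicit sort step, like sortByKey, is what applies Comparable ordering. The data values here are made up for the example.

```java
import java.util.*;
import java.util.stream.*;

public class GroupThenSort {
    public static void main(String[] args) {
        // Pairs with Comparable (String) keys, as in the Spark pipeline.
        List<Map.Entry<String, Integer>> pairs = List.of(
            Map.entry("b", 2), Map.entry("a", 1),
            Map.entry("b", 3), Map.entry("a", 4));

        // Grouping alone (analogous to groupByKey) makes no promise
        // about the iteration order of the keys:
        Map<String, List<Integer>> grouped = pairs.stream()
            .collect(Collectors.groupingBy(
                Map.Entry::getKey,
                HashMap::new,
                Collectors.mapping(Map.Entry::getValue, Collectors.toList())));

        // An explicit sort (analogous to sortByKey) is what uses
        // Comparable.compareTo to order the keys:
        SortedMap<String, List<Integer>> sorted = new TreeMap<>(grouped);

        System.out.println(sorted); // keys now iterate in Comparable order
    }
}
```

The analogy is only about the ordering contract: in Spark, too, the sort-based shuffle manager is an internal implementation detail of the shuffle, and a sorted iteration order in the downstream function is only guaranteed by an explicit sorting transformation.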