Oh wow, that is a really interesting observation, Marco & Jerry. I wonder if this is worth exposing in combineByKey()? I think Jerry's proposed workaround is all you can do for now -- use reflection to side-step the fact that the methods you need are private.
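
For comparison, the "combination of transformations" route that Jerry describes might look roughly like the sketch below. It is only an illustration, not a recommendation: the sample data, app name, local master, and partition count are all made up, and it assumes Spark 1.3 or later so that the pair-RDD implicits resolve without extra imports. It gives the same ordering across and within partitions, but pays for a second shuffle.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ReduceThenSort {
  def main(args: Array[String]): Unit = {
    // App name and local master are placeholders for illustration only.
    val conf = new SparkConf().setAppName("reduce-then-sort").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical sample data standing in for the real input.
      val pairs = sc.parallelize(Seq(("b", 1), ("a", 2), ("b", 3), ("c", 4), ("a", 5)))

      // First shuffle: combine the values of each key.
      val reduced = pairs.reduceByKey(_ + _)

      // Second shuffle: range-partition by key and sort within each partition.
      val sorted = reduced.sortByKey(ascending = true, numPartitions = 2)

      sorted.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

The reduce and the sort end up in separate stages here, which is exactly the extra cost that pushing a key ordering down into the reduce-side shuffle would avoid.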

On Mon, Apr 27, 2015 at 8:07 AM, Saisai Shao <sai.sai.s...@gmail.com> wrote:
> Hi Marco,
>
> As I know, current combineByKey() does not expose the related argument
> where you could set keyOrdering on the ShuffledRDD, since ShuffledRDD is
> package private, if you can get the ShuffledRDD through reflection or other
> way, the keyOrdering you set will be pushed down to shuffle. If you use a
> combination of transformations to do it, the result will be same but the
> efficiency may be different, some transformations will separate into
> different stages, which will introduce additional shuffle.
>
> Thanks
> Jerry
>
>
> 2015-04-27 19:00 GMT+08:00 Marco <marcope...@gmail.com>:
>
>> Hi,
>>
>> I'm trying, after reducing by key, to get data ordered among partitions
>> (like RangePartitioner) and within partitions (like sortByKey or
>> repartitionAndSortWithinPartition) pushing the sorting down to the
>> shuffles machinery of the reducing phase.
>>
>> I think, but maybe I'm wrong, that the correct way to do that is that
>> combineByKey call setKeyOrdering function on the ShuflleRDD that it
>> returns.
>>
>> Am I wrong? Can be done by a combination of other transformations with
>> the same efficiency?
>>
>> Thanks,
>> Marco