Hi, I would like to sort historical data using the DataSet API.
env.setParallelism(10)
val dataset = [(Long, String)] ..
.partitionByRange(_._1)
.sortPartition(_._1, Order.ASCENDING)
.writeAsCsv("mydata.csv").setParallelism(1)
With this, the data in the file is out of order (it is only sorted locally, within each partition),
but
.print()
prints the data in the correct order. I have run a small toy sample multiple
times.
Is there a way to sort the entire dataset with parallelism > 1 and write it
to a single file in ascending order?
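
To make clear what I mean by "local order", here is a minimal sketch using plain Scala collections only (no Flink involved). The toy data, the split boundary at key 3 and the object name LocalOrderSketch are all made up for illustration. The interleaving is only my guess at what happens when a single writer receives records from several sorted partitions; I have not confirmed that this is what Flink actually does.

```scala
object LocalOrderSketch {
  // Hypothetical toy data standing in for my (Long, String) records.
  val data: Seq[(Long, String)] =
    Seq(5L -> "e", 1L -> "a", 4L -> "d", 2L -> "b", 6L -> "f", 3L -> "c")

  // Two "range partitions" split at a made-up boundary (keys <= 3, keys > 3),
  // each sorted on its own, mimicking partitionByRange + sortPartition.
  val (low, high) = data.partition(_._1 <= 3L)
  val sortedPartitions: Seq[Seq[(Long, String)]] =
    Seq(low.sortBy(_._1), high.sortBy(_._1))

  // Reading the partitions one after another, in range order, is globally sorted.
  val concatenated: Seq[Long] = sortedPartitions.flatten.map(_._1)

  // Taking records alternately from both partitions keeps each partition's
  // internal order but breaks the global order (assumed failure mode).
  val interleaved: Seq[Long] =
    sortedPartitions(0).zip(sortedPartitions(1)).flatMap { case (a, b) =>
      Seq(a._1, b._1)
    }

  def main(args: Array[String]): Unit = {
    println(concatenated.mkString(",")) // 1,2,3,4,5,6
    println(interleaved.mkString(","))  // 1,4,2,5,3,6
  }
}
```

The second line of output is the kind of ordering I see in mydata.csv: every partition is sorted by itself, but the file as a whole is not.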
