It's because you are printing directly on the RDD. foreach runs on the executors, one task per partition, so the order in which the elements get printed can change from run to run.

You can convert to a DataFrame and sort it, like below:

input.toDF().sort("value").collect()

or, if you do not want to convert to a DataFrame, you can sort the RDD itself. sortByKey([ascending], [numTasks]) only applies to RDDs of key-value pairs, so for a plain RDD[Int] like yours use sortBy instead:

input.sortBy(x => x).collect()

Regards
Sam

On Tue, Feb 14, 2017 at 11:41 AM, 萝卜丝炒饭 <1427357...@qq.com> wrote:

> Hi all,
> below is my test code. I found that the output of val
> input is different each time. How do I fix the order, please?
>
> scala> val input = sc.parallelize( Array(1,2,3))
> input: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[13] at
> parallelize at <console>:24
>
> scala> input.foreach(print)
> 132
> scala> input.foreach(print)
> 213
> scala> input.foreach(print)
> 312
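
For anyone following the thread, here is a minimal sketch of the two ways to get a stable print order on the driver. It assumes a local session purely for illustration (the master URL and app name are made up); in spark-shell the existing session is reused and `sc` is already available.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration only. In spark-shell,
// getOrCreate() returns the session that already exists.
val spark = SparkSession.builder()
  .master("local[2]")
  .appName("ordered-print-sketch")
  .getOrCreate()
val sc = spark.sparkContext

val input = sc.parallelize(Array(1, 2, 3))

// collect() brings the elements back to the driver in partition order,
// so printing the resulting local array is deterministic.
val asGiven = input.collect()
asGiven.foreach(print)
println()

// sortBy works on any RDD; pass ascending = false for descending order.
val ascending = input.sortBy(x => x).collect()
val descending = input.sortBy(x => x, ascending = false).collect()
println(ascending.mkString(","))
println(descending.mkString(","))
```

Note that collect() pulls the whole RDD into driver memory, so this only makes sense for small data such as the test above.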