cloud-fan commented on a change in pull request #25751: [SPARK-29042][Core] Sampling-based RDD with unordered input should be INDETERMINATE
URL: https://github.com/apache/spark/pull/25751#discussion_r323810510

 ##########
 File path: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala
 ##########
 @@ -2779,6 +2779,45 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with TimeLi
        .contains("Spark cannot rollback the ShuffleMapStage 1"))
    }
 
 +  test("SPARK-29042: Sampled RDD with unordered input should be indeterminate") {
 +    val shuffleMapRdd1 = new MyRDD(sc, 2, Nil, indeterminate = false)
 +
 +    val shuffleDep1 = new ShuffleDependency(shuffleMapRdd1, new HashPartitioner(2))
 +    val shuffleId1 = shuffleDep1.shuffleId
 +    val shuffleMapRdd2 = new MyRDD(sc, 2, List(shuffleDep1), tracker = mapOutputTracker)
 +
 +    assert(shuffleMapRdd2.outputDeterministicLevel == DeterministicLevel.UNORDERED)
 +
 +    val sampledRdd = shuffleMapRdd2.sample(true, 0.3, 1000L)
 +    assert(sampledRdd.outputDeterministicLevel == DeterministicLevel.INDETERMINATE)

 Review comment:
 I think we can stop here. We have enough test coverage for test rerun when the RDD is INDETERMINATE. We just need to prove that the sampled RDD with unordered input is INDETERMINATE.
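
 For context on why the assertion above is the interesting one, here is a rough, Spark-free sketch of the underlying effect. It is not the code from this pull request: the object `SampleOrderDemo` and its `sample` helper are hypothetical, and it uses simple Bernoulli-style sampling rather than the with-replacement sampling that `sample(true, 0.3, 1000L)` requests. The point it tries to illustrate is that a seeded sampler decides by position, so the same seed applied to the same records in a different order will generally keep different records.

```scala
import scala.util.Random

// Hypothetical, Spark-free illustration; not the sampler Spark uses.
object SampleOrderDemo {
  // Keep each element whose random draw falls below `fraction`.
  // With a fixed seed, the i-th draw is always the same, so the decision
  // depends on an element's position in `input`, not on its value.
  def sample[T](input: Seq[T], fraction: Double, seed: Long): Seq[T] = {
    val rng = new Random(seed)
    input.filter(_ => rng.nextDouble() < fraction)
  }

  def main(args: Array[String]): Unit = {
    // Same records, two arrival orders, as could happen when an
    // UNORDERED shuffle output is recomputed after a failure.
    val firstAttempt = Seq("a", "b", "c", "d", "e", "f", "g", "h")
    val retryAttempt = firstAttempt.reverse

    val keptFirst = sample(firstAttempt, 0.3, 1000L)
    val keptRetry = sample(retryAttempt, 0.3, 1000L)

    // The same positions are kept in both runs, so the two samples have
    // the same size but usually contain different records.
    println(s"first attempt kept: $keptFirst")
    println(s"retry attempt kept: $keptRetry")
  }
}
```

 Under these assumptions, the set of records can change between attempts even though the seed and the input records are identical, which is the behavior that INDETERMINATE is meant to describe.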