[ https://issues.apache.org/jira/browse/SPARK-27164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793222#comment-16793222 ]
Ajith S commented on SPARK-27164:
---------------------------------

I will be working on this.

> RDD.countApprox on empty RDDs schedules jobs which never complete
> ------------------------------------------------------------------
>
>                 Key: SPARK-27164
>                 URL: https://issues.apache.org/jira/browse/SPARK-27164
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.2.3, 2.4.0
>         Environment: macOS, Spark-2.4.0 with Hadoop 2.7 running on Java 11.0.1
> Also observed on:
> macOS, Spark-2.2.3 with Hadoop 2.7 running on Java 1.8.0_151
>            Reporter: Ryan Moore
>            Priority: Major
>         Attachments: Screen Shot 2019-03-14 at 1.49.19 PM.png
>
> When calling `countApprox` on an RDD which has no partitions (such as those created by `sparkContext.emptyRDD`), a job is scheduled with 0 stages and 0 tasks. That job appears under "Active Jobs" in the Spark UI until it is either killed or the Spark context is shut down.
>
> {code:java}
> Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 11.0.1)
> Type in expressions to have them evaluated.
> Type :help for more information.
>
> scala> val ints = sc.makeRDD(Seq(1))
> ints: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at makeRDD at <console>:24
>
> scala> ints.countApprox(1000)
> res0: org.apache.spark.partial.PartialResult[org.apache.spark.partial.BoundedDouble] = (final: [1.000, 1.000])
> // PartialResult is returned; the scheduled job completed
>
> scala> ints.filter(_ => false).countApprox(1000)
> res1: org.apache.spark.partial.PartialResult[org.apache.spark.partial.BoundedDouble] = (final: [0.000, 0.000])
> // PartialResult is returned; the scheduled job completed
>
> scala> sc.emptyRDD[Int].countApprox(1000)
> res5: org.apache.spark.partial.PartialResult[org.apache.spark.partial.BoundedDouble] = (final: [0.000, 0.000])
> // PartialResult is returned; the scheduled job is ACTIVE but never completes
>
> scala> sc.union(Nil : Seq[org.apache.spark.rdd.RDD[Int]]).countApprox(1000)
> res16: org.apache.spark.partial.PartialResult[org.apache.spark.partial.BoundedDouble] = (final: [0.000, 0.000])
> // PartialResult is returned; the scheduled job is ACTIVE but never completes
> {code}
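The symptom suggests the approximate-count job is submitted even when the RDD has zero partitions, so no task ever completes and the job entry stays active in the UI. Until that is fixed in the scheduler, a user-side guard avoids submitting a job at all for an empty RDD, since the exact answer is trivially 0. A minimal sketch, assuming the public `PartialResult(initialVal, isFinal)` and `BoundedDouble(mean, confidence, low, high)` constructors; the helper name `countApproxSafe` is hypothetical, not part of the Spark API:

{code:java}
import org.apache.spark.rdd.RDD
import org.apache.spark.partial.{BoundedDouble, PartialResult}

// Hypothetical workaround: skip the approximate-count job entirely for RDDs
// with no partitions, which would otherwise never complete (SPARK-27164).
def countApproxSafe(rdd: RDD[_], timeoutMillis: Long): PartialResult[BoundedDouble] =
  if (rdd.partitions.isEmpty) {
    // An RDD with no partitions has exactly 0 elements: return a final
    // result directly (mean 0 with full confidence) without submitting a job.
    new PartialResult(new BoundedDouble(0.0, 1.0, 0.0, 0.0), true)
  } else {
    rdd.countApprox(timeoutMillis)
  }
{code}

For example, `countApproxSafe(sc.emptyRDD[Int], 1000)` returns `(final: [0.000, 0.000])` without leaving an active job behind, while non-empty RDDs take the normal `countApprox` path.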