Liechuan Ou created SPARK-32735:
-----------------------------------

             Summary: RDD actions in DStream.transform don't show at batch page
                 Key: SPARK-32735
                 URL: https://issues.apache.org/jira/browse/SPARK-32735
             Project: Spark
          Issue Type: Bug
          Components: DStreams, Web UI
    Affects Versions: 3.0.0
            Reporter: Liechuan Ou

h4. Issue
{code:java}
val lines = ssc.socketTextStream("localhost", 9999)
val words = lines.flatMap(_.split(" "))
val mappedStream = words.transform(rdd => {
  val c = rdd.count() // RDD action run while the batch's jobs are generated
  rdd.map(x => s"$c x")
})
mappedStream.foreachRDD(rdd => rdd.foreach(x => println(x)))
{code}
Every batch creates two Spark jobs. Only the second one is associated with the streaming output operation and shown on the batch page.

h4. Investigation
The first action, rdd.count(), is invoked by JobGenerator.generateJobs. At that point the batch time and output op id are not available in the Spark context, because JobScheduler sets them only later.

h4. Proposal
Delegate dstream.getOrCompute to JobScheduler so that all RDD actions run in the Spark context with the correct local properties.
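
The ordering problem described under Investigation, and the effect of the Proposal, can be illustrated with a small toy model. This is only a sketch and not Spark code: `LocalPropsSketch`, `runAction`, `withBatchProps`, `currentFlow`, `proposedFlow`, and the `"batchTime"` key are all hypothetical names invented for illustration. It assumes only what the report states: the properties are set after job generation, and an action sees whatever properties are set at the moment it is submitted.

```scala
import scala.collection.mutable

// Toy model only: none of these names exist in Spark.
object LocalPropsSketch {
  // Stand-in for the Spark context's local properties.
  private val localProps = new ThreadLocal[Map[String, String]] {
    override def initialValue(): Map[String, String] = Map.empty
  }

  // Each submitted "job" with the batch time it saw at submission, if any.
  val submitted = mutable.ArrayBuffer.empty[(String, Option[String])]

  // Stand-in for an RDD action: records the properties visible right now.
  def runAction(name: String): Unit =
    submitted += ((name, localProps.get().get("batchTime")))

  // Stand-in for the scheduler setting properties around a job's execution.
  def withBatchProps[T](batchTime: Long)(body: => T): T = {
    val saved = localProps.get()
    localProps.set(saved + ("batchTime" -> batchTime.toString))
    try body finally localProps.set(saved)
  }

  // Reported behaviour: the transform function (and its count) runs during
  // job generation, before the batch properties are set.
  def currentFlow(batchTime: Long): Unit = {
    runAction("count")
    withBatchProps(batchTime) { runAction("foreach") }
  }

  // Proposed behaviour: the RDD computation is deferred so that every
  // action runs with the batch properties already set.
  def proposedFlow(batchTime: Long): Unit =
    withBatchProps(batchTime) {
      runAction("count")
      runAction("foreach")
    }
}
```

In this model, `currentFlow` records the `count` job without a batch time, which mirrors why it cannot be linked to a batch on the batch page, while `proposedFlow` records both jobs with it.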