Hello! We're running Spark 2.3.0 on Scala 2.11. We have a number of Spark Streaming jobs that use mapWithState. We've observed that these jobs complete some set of stages and then never schedule the next set of stages. The DAGScheduler appears to correctly identify the stages that still need to run:

    19/08/27 15:29:48 INFO YarnClusterScheduler: Removed TaskSet 79.0, whose tasks have all completed, from pool
    19/08/27 15:29:48 INFO DAGScheduler: ShuffleMapStage 79 (map at SomeCode.scala:121) finished in 142.985 s
    19/08/27 15:29:48 INFO DAGScheduler: looking for newly runnable stages
    19/08/27 15:29:48 INFO DAGScheduler: running: Set()
    19/08/27 15:29:48 INFO DAGScheduler: waiting: Set(ShuffleMapStage 81, ResultStage 82, ResultStage 83, ShuffleMapStage 54, ResultStage 61, ResultStage 55, ShuffleMapStage 48, ShuffleMapStage 84, Result Stage 49, ShuffleMapStage 85, ShuffleMapStage 56, ResultStage 86, ShuffleMapStage 57, ResultStage 58, ResultStage 80)
    19/08/27 15:29:48 INFO DAGScheduler: failed: Set()

However, none of the waiting stages ever begins execution. This happens only occasionally (roughly every couple of days), which makes it difficult to reproduce. I checked the known bugs fixed in the 2.3.x releases and nothing stood out.

Has anyone else seen this behavior? Any thoughts on how to debug it?

Regards,
Bryan Jeffrey