Liyin Tang created SPARK-16244:
----------------------------------

             Summary: Failed job/stage couldn't stop JobGenerator immediately.
                 Key: SPARK-16244
                 URL: https://issues.apache.org/jira/browse/SPARK-16244
             Project: Spark
          Issue Type: Bug
          Components: Streaming
    Affects Versions: 1.5.2
            Reporter: Liyin Tang


This streaming job has a very simple DAG. Each batch has only one job, and each 
job has only one stage.

Based on the following logs, we observed a potential race condition. Stage 1 
failed due to some task failures, which triggered failJobAndIndependentStages.

In the meantime, the next stage (job), 2, was submitted and was able to 
successfully run a few tasks before the JobGenerator was stopped via the 
shutdown hook.

Since the next job was able to run a few tasks successfully after the earlier 
one had failed, it messed up the checkpoint / offset management.
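
The ordering problem described above can be illustrated with a small toy model. 
This is only a sketch under stated assumptions, not Spark's code: ToyGenerator, 
stop_is_deferred, run_shutdown_hook and simulate are hypothetical names made up 
for illustration, and the model is single-threaded so that the interleaving is 
fixed. It shows that if a failure only schedules the stop for later, a batch 
submitted in between can still record progress, whereas an immediate stop 
prevents that.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGenerator:
    """Toy stand-in for a batch generator (hypothetical, not Spark's JobGenerator)."""
    stop_is_deferred: bool
    stopped: bool = False
    stop_pending: bool = False
    committed: list = field(default_factory=list)

    def on_batch_failed(self):
        if self.stop_is_deferred:
            # The stop only takes effect later, e.g. when a shutdown hook runs.
            self.stop_pending = True
        else:
            # The stop takes effect before any further batch is accepted.
            self.stopped = True

    def run_shutdown_hook(self):
        if self.stop_pending:
            self.stopped = True

    def submit(self, batch_id, fails):
        if self.stopped:
            return  # generator already stopped, batch is dropped
        if fails:
            self.on_batch_failed()
        else:
            # Stands in for checkpoint / offset progress being recorded.
            self.committed.append(batch_id)


def simulate(stop_is_deferred):
    gen = ToyGenerator(stop_is_deferred)
    gen.submit(1, fails=True)    # batch 1 fails
    gen.submit(2, fails=False)   # batch 2 arrives before the hook has run
    gen.run_shutdown_hook()
    gen.submit(3, fails=False)   # arrives after the stop, always dropped
    return gen.committed


print(simulate(stop_is_deferred=True))   # batch 2 slips through
print(simulate(stop_is_deferred=False))  # nothing is committed after the failure
```

In this model the deferred variant commits batch 2 even though batch 1 failed, 
which corresponds to the behaviour reported here; the immediate variant commits 
nothing.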

I will attach the logs to the JIRA as well.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
