Manu Zhang created SPARK-43510:
----------------------------------

             Summary: Spark application hangs when YarnAllocator processing completed containers before updating internal state
                 Key: SPARK-43510
                 URL: https://issues.apache.org/jira/browse/SPARK-43510
             Project: Spark
          Issue Type: Bug
          Components: YARN
    Affects Versions: 3.4.0
            Reporter: Manu Zhang


I see the application hang when containers are preempted immediately after allocation, as in the following log:
{code:java}
23/05/14 09:11:33 INFO YarnAllocator: Launching container container_e3812_1684033797982_57865_01_000382 on host hdc42-mcc10-01-0910-4207-015-tess0028.stratus.rno.ebay.com for executor with ID 277 for ResourceProfile Id 0
23/05/14 09:11:33 WARN YarnAllocator: Cannot find executorId for container: container_e3812_1684033797982_57865_01_000382
23/05/14 09:11:33 INFO YarnAllocator: Completed container container_e3812_1684033797982_57865_01_000382 (state: COMPLETE, exit status: -102)
23/05/14 09:11:33 INFO YarnAllocator: Container container_e3812_1684033797982_57865_01_000382 was preempted.
{code}
Note the warning: while processing completed containers, YarnAllocator cannot find the executorId for this container. The only plausible cause is that YarnAllocator processed the completed container before updating its internal state to record the executorId. That update happens in a separate thread after the executor is launched.
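The ordering described above can be illustrated with a minimal, hypothetical Java model. This is a sketch under stated assumptions, not Spark's code. `ToyAllocator`, `updateInternalState`, `processCompletedContainer`, `runningExecutors`, and `unmatchedCompletions` are illustrative names loosely echoing the report, not the real `YarnAllocator` API. The idea that an unmatched completion leaves a stale "running" executor behind, so that nothing gets replaced, is this sketch's own assumption about how the reported hang could arise. In the model, the completion is processed first and the launcher thread records the executor afterwards.

{code:java}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical, heavily simplified model of the race; not Spark's YarnAllocator.
final class ToyAllocator {
    // containerId -> executorId, written only after the executor launch succeeds
    private final Map<String, String> containerToExecutor = new HashMap<>();
    // executors the allocator believes are alive
    private final Set<String> runningExecutors = new HashSet<>();
    // completions that could not be matched to any executor
    private final Set<String> unmatchedCompletions = new HashSet<>();

    // Hypothetical: called from the launcher thread once the executor has started.
    synchronized void updateInternalState(String containerId, String executorId) {
        containerToExecutor.put(containerId, executorId);
        runningExecutors.add(executorId);
    }

    // Hypothetical: called from the allocation loop when YARN reports a completed container.
    synchronized void processCompletedContainer(String containerId) {
        String executorId = containerToExecutor.remove(containerId);
        if (executorId == null) {
            // Analogue of the "Cannot find executorId for container" warning:
            // no executor is cleaned up, so the completion is effectively lost.
            System.out.println("WARN Cannot find executorId for container: " + containerId);
            unmatchedCompletions.add(containerId);
            return;
        }
        runningExecutors.remove(executorId);
    }

    synchronized Set<String> runningExecutors() { return new HashSet<>(runningExecutors); }

    synchronized Set<String> unmatchedCompletions() { return new HashSet<>(unmatchedCompletions); }

    public static void main(String[] args) throws InterruptedException {
        ToyAllocator allocator = new ToyAllocator();
        String container = "container_e3812_1684033797982_57865_01_000382";
        ExecutorService launcherPool = Executors.newSingleThreadExecutor();

        // 1. The preempted container's completion is processed first...
        allocator.processCompletedContainer(container);
        // 2. ...and only afterwards does the launcher thread record executor 277.
        launcherPool.submit(() -> allocator.updateInternalState(container, "277"));
        launcherPool.shutdown();
        launcherPool.awaitTermination(5, TimeUnit.SECONDS);

        // Executor 277 is now tracked as running although its container is gone.
        System.out.println("running executors: " + allocator.runningExecutors());
    }
}
{code}

In the normal order (`updateInternalState` before `processCompletedContainer`), the model removes the executor from `runningExecutors`. Only the reversed order leaves the stale entry behind.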



--
This message was sent by Atlassian Jira
(v8.20.10#820010)