[ 
https://issues.apache.org/jira/browse/YARN-11251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18048461#comment-18048461
 ] 

ASF GitHub Bot commented on YARN-11251:
---------------------------------------

github-actions[bot] commented on PR #4877:
URL: https://github.com/apache/hadoop/pull/4877#issuecomment-3700916251

   We're closing this stale PR because it has been open for 100 days with no 
activity. This isn't a judgement on the merit of the PR in any way. It's just a 
way of keeping the PR queue manageable.
   If you feel like this was a mistake, or you would like to continue working 
on it, please feel free to re-open it and ask for a committer to remove the 
stale tag and review again.
   Thanks all for your contribution.




> Separate ThreadPool for AMLauncher Launch and Clean Events
> ----------------------------------------------------------
>
>                 Key: YARN-11251
>                 URL: https://issues.apache.org/jira/browse/YARN-11251
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: yarn
>    Affects Versions: 3.4.0
>            Reporter: Prabhu Joseph
>            Assignee: Samrat Deb
>            Priority: Major
>              Labels: pull-request-available
>
> We have seen many AM launch failures caused by Token Expired or Container 
> Liveliness Expiry errors when the AM launch threads are busy retrying 
> connections to AM hosts (Spot Instances) that are down. Using separate thread 
> pools for cleanup and launch events would reduce these AM launch failures.
> *Token Expired*
> {code}
> 2022-07-19 14:56:33,486 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl
>  (IPC Server handler 39 on 8041): Unauthorized request to start container.
> This token is expired. current time is 1658242593486 found 1658242289457
> Note: System times on machines may be out of sync. Check system time and time 
> zones.
> {code}
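
To put a number on the log line above, the following sketch computes how late the start-container request was. It assumes that the "found" value in the message is the token's expiry timestamp in epoch milliseconds; the class and method names are hypothetical and not part of Hadoop.

```java
import java.time.Duration;
import java.time.Instant;

public class TokenExpiryGap {
    // Assumption: "current time" and "found" are both epoch milliseconds,
    // and "found" is the expiry timestamp carried by the container token.
    static Duration gap(long currentTimeMs, long expiryMs) {
        return Duration.between(Instant.ofEpochMilli(expiryMs),
                                Instant.ofEpochMilli(currentTimeMs));
    }

    public static void main(String[] args) {
        // Values copied from the log line quoted above.
        Duration late = gap(1658242593486L, 1658242289457L);
        System.out.println(late.toMillis() + " ms ("
                + late.getSeconds() + " s) past expiry");
    }
}
```

Under that assumption, the request reached the NodeManager roughly five minutes after the token expired, which fits a launch event that sat waiting for a free launcher thread.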
> *Container Liveliness Expiry*
> {code}
> 2022-07-19 16:06:48,663 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl 
> (ResourceManager Event Processor): container_1656573205571_2357731_01_000001 
> Container Transitioned from ACQUIRED to EXPIRED
> 2022-07-19 16:10:08,663 INFO 
> org.apache.hadoop.yarn.util.AbstractLivelinessMonitor (Ping Checker): 
> Expired:<container=container_1656573205571_2357773_01_000001, increase=false> 
> Timed out after 600 secs
> {code}
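
The proposal in the description can be illustrated with a minimal sketch. This is not the code from PR #4877 or from the ResourceManager; the class, enum, and method names are hypothetical, and the pool sizes are arbitrary. It also assumes, for illustration only, that the stalled work is cleanup events aimed at unreachable hosts. The point it shows is that once launch and cleanup events run on separate executors, blocked cleanup tasks can no longer occupy the threads that launch events need.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class SeparatePoolsSketch {
    // Hypothetical event types, loosely mirroring launch and cleanup events.
    enum EventType { LAUNCH, CLEANUP }

    private final ExecutorService launchPool;
    private final ExecutorService cleanupPool;

    SeparatePoolsSketch(int launchThreads, int cleanupThreads) {
        this.launchPool = Executors.newFixedThreadPool(launchThreads);
        this.cleanupPool = Executors.newFixedThreadPool(cleanupThreads);
    }

    // Route each event to the pool dedicated to its type.
    Future<?> handle(EventType type, Runnable work) {
        ExecutorService pool =
                (type == EventType.LAUNCH) ? launchPool : cleanupPool;
        return pool.submit(work);
    }

    void shutdown() {
        launchPool.shutdown();
        cleanupPool.shutdown();
    }

    // Saturate the cleanup pool with tasks that block (standing in for
    // connection retries against a host that is down), then check that a
    // launch task still finishes.
    static boolean launchCompletesWhileCleanupBlocked() throws Exception {
        SeparatePoolsSketch dispatcher = new SeparatePoolsSketch(2, 2);
        CountDownLatch hostUnreachable = new CountDownLatch(1);
        try {
            for (int i = 0; i < 4; i++) {
                dispatcher.handle(EventType.CLEANUP, () -> {
                    try {
                        hostUnreachable.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            Future<?> launch = dispatcher.handle(EventType.LAUNCH, () -> { });
            // Throws TimeoutException if the launch task never gets a thread.
            launch.get(5, TimeUnit.SECONDS);
            return launch.isDone();
        } finally {
            hostUnreachable.countDown(); // release the blocked cleanup tasks
            dispatcher.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("launch completed: "
                + launchCompletesWhileCleanupBlocked());
    }
}
```

With a single shared pool of the same total size, the four blocked tasks could hold every thread and the launch task would wait behind them, which is the situation the description attributes to the expired tokens and containers.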



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
