[ https://issues.apache.org/jira/browse/YARN-11251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18048461#comment-18048461 ]
ASF GitHub Bot commented on YARN-11251:
---------------------------------------
github-actions[bot] commented on PR #4877:
URL: https://github.com/apache/hadoop/pull/4877#issuecomment-3700916251
We're closing this stale PR because it has been open for 100 days with no
activity. This isn't a judgement on the merit of the PR in any way. It's just a
way of keeping the PR queue manageable.
If you feel like this was a mistake, or you would like to continue working
on it, please feel free to re-open it and ask for a committer to remove the
stale tag and review again.
Thanks all for your contribution.
> Separate ThreadPool for AMLauncher Launch and Clean Events
> ----------------------------------------------------------
>
> Key: YARN-11251
> URL: https://issues.apache.org/jira/browse/YARN-11251
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: yarn
> Affects Versions: 3.4.0
> Reporter: Prabhu Joseph
> Assignee: Samrat Deb
> Priority: Major
> Labels: pull-request-available
>
> We have seen many AM launch failures caused by token expiry or container
> liveliness expiry while the AM launcher threads are busy retrying connections
> to AM hosts (spot instances) that are down. Having separate thread pools for
> cleanup and launch events will reduce these AM launch failures.
> *Token Expired*
> {code}
> 2022-07-19 14:56:33,486 ERROR
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl
> (IPC Server handler 39 on 8041): Unauthorized request to start container.
> This token is expired. current time is 1658242593486 found 1658242289457
> Note: System times on machines may be out of sync. Check system time and time
> zones.
> {code}
> *Container Liveliness Expiry*
> {code}
> 2022-07-19 16:06:48,663 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl
> (ResourceManager Event Processor): container_1656573205571_2357731_01_000001
> Container Transitioned from ACQUIRED to EXPIRED
> 2022-07-19 16:10:08,663 INFO
> org.apache.hadoop.yarn.util.AbstractLivelinessMonitor (Ping Checker):
> Expired:<container=container_1656573205571_2357773_01_000001, increase=false>
> Timed out after 600 secs
> {code}
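The proposal in the description can be illustrated with a minimal Java sketch. It assumes, as the issue describes, that launch and cleanup events currently share one fixed-size pool, so a cleanup stuck retrying a dead spot-instance host holds a thread that a new AM launch needs. `SplitLauncherPools`, its `EventType` enum and `launchNotStarved` are hypothetical names for illustration; this is not the actual `ApplicationMasterLauncher` code in the ResourceManager. A long sleep stands in for the stuck cleanup.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch, not the real YARN ApplicationMasterLauncher:
// LAUNCH and CLEANUP events go to separate fixed-size pools so that a
// cleanup stuck retrying an unreachable host cannot hold a launch thread.
public class SplitLauncherPools implements AutoCloseable {
  enum EventType { LAUNCH, CLEANUP }

  private final ExecutorService launchPool;
  private final ExecutorService cleanupPool;

  // separate=false models a single shared pool, for comparison.
  SplitLauncherPools(int threads, boolean separate) {
    launchPool = Executors.newFixedThreadPool(threads);
    cleanupPool = separate ? Executors.newFixedThreadPool(threads) : launchPool;
  }

  Future<?> submit(EventType type, Runnable task) {
    return (type == EventType.LAUNCH ? launchPool : cleanupPool).submit(task);
  }

  @Override
  public void close() throws InterruptedException {
    // Interrupt still-running tasks, such as the simulated stuck cleanup.
    launchPool.shutdownNow();
    cleanupPool.shutdownNow();
    launchPool.awaitTermination(5, TimeUnit.SECONDS);
    cleanupPool.awaitTermination(5, TimeUnit.SECONDS);
  }

  // Returns true if a LAUNCH submitted right after a blocked CLEANUP
  // completes within timeoutMs, using one thread per pool.
  public static boolean launchNotStarved(boolean separate, long timeoutMs) throws Exception {
    try (SplitLauncherPools pools = new SplitLauncherPools(1, separate)) {
      // Simulated cleanup that keeps retrying a host that is down.
      pools.submit(EventType.CLEANUP, () -> {
        try {
          Thread.sleep(60_000);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
      // Simulated AM launch; a real one would call startContainers on the NM.
      Future<?> launch = pools.submit(EventType.LAUNCH, () -> { });
      try {
        launch.get(timeoutMs, TimeUnit.MILLISECONDS);
        return true;
      } catch (TimeoutException e) {
        return false;
      }
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println("shared pool:    launch completed = " + launchNotStarved(false, 500));
    System.out.println("separate pools: launch completed = " + launchNotStarved(true, 500));
  }
}
```

With a shared single-thread pool the launch waits behind the stuck cleanup and times out; with separate pools it completes immediately. The design choice is that each pool is sized independently, so dead hosts can tie up at most the cleanup threads while launch capacity is preserved.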
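The two numbers in the *Token Expired* log are epoch milliseconds. Assuming "current time" is the NodeManager's clock when it checked the start request and "found" is the expiry timestamp carried in the container token, a few lines of java.time show how late the request arrived. `TokenExpiryGap` and `lateBy` are hypothetical names used only for this illustration.

```java
import java.time.Duration;
import java.time.Instant;

// Decodes the epoch-millisecond values from the "This token is expired" log.
// Assumption: "current time" = when the NM checked the request,
// "found" = the expiry timestamp inside the container token.
public class TokenExpiryGap {
  static Duration lateBy(long currentTimeMs, long expiryMs) {
    return Duration.ofMillis(currentTimeMs - expiryMs);
  }

  public static void main(String[] args) {
    long currentTimeMs = 1658242593486L; // "current time is ..."
    long expiryMs = 1658242289457L;      // "found ..."
    System.out.println("checked at:   " + Instant.ofEpochMilli(currentTimeMs)); // 2022-07-19T14:56:33.486Z
    System.out.println("token expiry: " + Instant.ofEpochMilli(expiryMs));      // 2022-07-19T14:51:29.457Z
    System.out.println("late by:      " + lateBy(currentTimeMs, expiryMs));     // PT5M4.029S
  }
}
```

The decoded check time matches the log line's own timestamp (14:56:33,486), which suggests the NodeManager logs in UTC. The start request reached the NodeManager roughly five minutes after the token had expired, which fits the description of launches delayed behind busy launcher threads.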
--
This message was sent by Atlassian Jira
(v8.20.10#820010)