[ https://issues.apache.org/jira/browse/YARN-11737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
zeekling updated YARN-11737:
----------------------------
Description:
The following code was modified in YARN-4771:
!image-2024-10-17-09-41-28-276.png!
Since this change, containers of running applications are no longer cleared
from the NM. When Spark requests a large number of containers, this increases
the load on each NM heartbeat, and as a result the heartbeat between the NM
and the RM times out.
Adam Binford has already raised this problem on YARN-4771, and we have hit it
in our production environment.
!image-2024-10-17-09-44-36-918.png!
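A toy Python model of the reported effect follows. It is not YARN code, and every name in it (ToyNodeManager, simulate, clear_finished_of_running_apps) is hypothetical. It assumes, for illustration only, that the containers left uncleared are ones that have already finished, and that each container still tracked by the NM adds one entry to every NM-to-RM heartbeat. It compares a long-running application, such as a Spark Streaming job, under the two behaviors.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNodeManager:
    """Hypothetical toy model of an NM's container bookkeeping (not YARN code).

    Assumption for illustration: every container still tracked here adds one
    status entry to each NM -> RM heartbeat.
    """

    clear_finished_of_running_apps: bool
    containers: dict = field(default_factory=dict)  # app id -> set of container ids

    def launch(self, app_id: str, container_id: str) -> None:
        self.containers.setdefault(app_id, set()).add(container_id)

    def container_finished(self, app_id: str, container_id: str) -> None:
        # Assumed earlier behavior: drop the finished container right away.
        # Assumed behavior after the change: keep it while the app is running.
        if self.clear_finished_of_running_apps:
            self.containers[app_id].discard(container_id)

    def app_finished(self, app_id: str) -> None:
        # In both modes, everything for the app is dropped once it finishes.
        self.containers.pop(app_id, None)

    def heartbeat_entries(self) -> int:
        return sum(len(ids) for ids in self.containers.values())


def simulate(clear: bool, waves: int, per_wave: int) -> int:
    """A long-running app (e.g. a streaming job) cycles through containers."""
    nm = ToyNodeManager(clear_finished_of_running_apps=clear)
    next_id = 0
    for _ in range(waves):
        batch = [f"container_{next_id + i}" for i in range(per_wave)]
        next_id += per_wave
        for cid in batch:
            nm.launch("app_1", cid)
        for cid in batch:
            nm.container_finished("app_1", cid)
    # The app is still running, so app_finished() is never called here.
    return nm.heartbeat_entries()


print(simulate(clear=True, waves=100, per_wave=50))   # 0
print(simulate(clear=False, waves=100, per_wave=50))  # 5000
```

Under these assumptions, the heartbeat stays small when finished containers are cleared, but grows with every container the running application has ever used when they are kept, until the application finishes.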
> When Spark Streaming applies for a large number of containers, the heartbeat
> times out after the NM is restarted.
> -----------------------------------------------------------------------------------------------------------------
>
> Key: YARN-11737
> URL: https://issues.apache.org/jira/browse/YARN-11737
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: zeekling
> Priority: Major
> Attachments: image-2024-10-17-09-41-28-276.png,
> image-2024-10-17-09-44-36-918.png
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)