[jira] [Updated] (YARN-11737) When Spark Streaming applies for a large number of containers, the heartbeat times out after the NM is restarted.

zeekling (Jira) Wed, 16 Oct 2024 18:48:59 -0700


     [ 
https://issues.apache.org/jira/browse/YARN-11737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


zeekling updated YARN-11737:
----------------------------
    Description: 
The following code was modified in YARN-4771:

!image-2024-10-17-09-41-28-276.png!

 

As a result of that change, containers that belong to still-running jobs are 
not cleared. A Spark Streaming job applies for a large number of containers, 
which increases the heartbeat burden of the NM. After the NM is restarted, the 
heartbeat between the NM and the RM times out.
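
The effect described above can be illustrated with a small toy model. This is 
a minimal sketch, not the actual NodeManager code: the class 
StoppedContainerCache and all of its methods are hypothetical names, and it 
assumes that an expired entry is evicted only once its application has 
finished.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Toy model of a stopped-container cache (hypothetical, not YARN code). */
class StoppedContainerCache {
    /** One tracked container: its owning application and its expiry time. */
    private static final class Stopped {
        final String appId;
        final long expiresAtMs;

        Stopped(String appId, long expiresAtMs) {
            this.appId = appId;
            this.expiresAtMs = expiresAtMs;
        }
    }

    // containerId -> tracked entry, kept in insertion order
    private final Map<String, Stopped> recentlyStopped = new LinkedHashMap<>();
    private final Set<String> finishedApps = new HashSet<>();

    public void containerStopped(String containerId, String appId, long expiresAtMs) {
        recentlyStopped.put(containerId, new Stopped(appId, expiresAtMs));
    }

    public void applicationFinished(String appId) {
        finishedApps.add(appId);
    }

    /** Evicts expired entries, but only if the owning application has finished. */
    public int evictExpired(long nowMs) {
        int removed = 0;
        Iterator<Map.Entry<String, Stopped>> it = recentlyStopped.entrySet().iterator();
        while (it.hasNext()) {
            Stopped s = it.next().getValue();
            boolean expired = s.expiresAtMs < nowMs;
            boolean appDone = finishedApps.contains(s.appId);
            if (expired && appDone) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    public int size() {
        return recentlyStopped.size();
    }

    public static void main(String[] args) {
        StoppedContainerCache cache = new StoppedContainerCache();
        // A long-running streaming job that has cycled through many containers.
        for (int i = 0; i < 10_000; i++) {
            cache.containerStopped("container_" + i, "app_streaming", 0L);
        }
        // A short batch job that has already finished.
        cache.containerStopped("container_batch", "app_batch", 0L);
        cache.applicationFinished("app_batch");

        int removed = cache.evictExpired(1_000L);
        System.out.println("removed=" + removed + " remaining=" + cache.size());
    }
}
```

Running main prints removed=1 remaining=10000: the container of the finished 
batch job is evicted, while every container of the still-running streaming job 
stays in the cache, so anything sized by that cache keeps growing for as long 
as the job runs.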

 

Adam Binford has already reported this problem under YARN-4771, and we are 
hitting it in our production environment.

 

!image-2024-10-17-09-44-36-918.png!

> When Spark Streaming applies for a large number of containers, the heartbeat 
> times out after the NM is restarted.
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-11737
>                 URL: https://issues.apache.org/jira/browse/YARN-11737
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: zeekling
>            Priority: Major
>         Attachments: image-2024-10-17-09-41-28-276.png, 
> image-2024-10-17-09-44-36-918.png



