[jira] [Commented] (YARN-6523) Optimize system credentials sent in node heartbeat responses

2021-04-25 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17331735#comment-17331735
 ] 

Qi Zhu commented on YARN-6523:
--

[~maniraj...@gmail.com] [~Naganarasimha] [~jlowe]

Can we backport this to 3.2.2?

Thanks.

> Optimize system credentials sent in node heartbeat responses
> 
>
> Key: YARN-6523
> URL: https://issues.apache.org/jira/browse/YARN-6523
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: RM
>Affects Versions: 2.8.0, 2.7.3
>Reporter: Naganarasimha G R
>Assignee: Manikandan R
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-6523.001.patch, YARN-6523.002.patch, 
> YARN-6523.003.patch, YARN-6523.004.patch, YARN-6523.005.patch, 
> YARN-6523.006.patch, YARN-6523.007.patch, YARN-6523.008.patch, 
> YARN-6523.009.patch, YARN-6523.010.patch, YARN-6523.011.patch, 
> YARN-6523.012.patch, YARN-6523.013.patch, YARN-6523.014.patch, 
> YARN-6523.015.patch
>
>
> Currently, as part of the heartbeat response, the RM sets all applications' 
> tokens even though not all applications may be active on the node. On top of 
> that, NodeHeartbeatResponsePBImpl converts the tokens for each app into 
> SystemCredentialsForAppsProto. Hence, for each node and each heartbeat, too 
> many SystemCredentialsForAppsProto objects were being created.
> We hit an OOM while testing 2000 concurrent apps on a 500-node cluster with 
> 8 GB RAM configured for the RM.
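
To illustrate the problem described above, here is a minimal sketch of one way to shrink the heartbeat payload: return only the credentials of applications that are active on the reporting node. This is not taken from the attached patches. The class and method names (HeartbeatCredentialsSketch, putCredentials, credentialsForNode) are hypothetical, and application IDs are plain strings for simplicity.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the YARN-6523 patch: all names are illustrative.
public class HeartbeatCredentialsSketch {

    // Stand-in for the RM-side map of application ID -> serialized tokens.
    private final Map<String, ByteBuffer> systemCredentials = new HashMap<>();

    public void putCredentials(String appId, ByteBuffer tokens) {
        systemCredentials.put(appId, tokens);
    }

    // Build the credentials for one heartbeat response, restricted to the
    // applications that are active on the node that sent the heartbeat.
    public Map<String, ByteBuffer> credentialsForNode(Set<String> activeAppsOnNode) {
        Map<String, ByteBuffer> result = new HashMap<>();
        for (String appId : activeAppsOnNode) {
            ByteBuffer tokens = systemCredentials.get(appId);
            if (tokens != null) {
                // duplicate() shares the underlying bytes, so the token data
                // itself is not copied for every heartbeat.
                result.put(appId, tokens.duplicate());
            }
        }
        return result;
    }
}
```

With 2000 applications and only a handful active per node, the per-heartbeat map in this sketch holds a handful of entries rather than 2000.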



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7769) FS QueueManager should not create default queue at init

2021-04-25 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17331726#comment-17331726
 ] 

Wilfred Spiegelenburg commented on YARN-7769:
-

Are we pushing the documentation in this Jira or are we opening a new one? 
If we open a new Jira to fix the docs, we're OK to commit; otherwise we need to 
get the documentation update added in this Jira before we commit.

[~bteke] & [~snemeth] any preference from your side?

> FS QueueManager should not create default queue at init
> ---
>
> Key: YARN-7769
> URL: https://issues.apache.org/jira/browse/YARN-7769
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Benjamin Teke
>Priority: Major
> Attachments: YARN-7769.001.patch, YARN-7769.002.patch, 
> YARN-7769.003.patch
>
>
> Currently the FairScheduler QueueManager automatically creates the default 
> queue. However, the default queue does not need to exist. There are two 
> cases that we should handle:
> * Based on the "Default" placement rule, the default queue might have a name 
> other than "default", and it should be created under that name.
> * There might not be a "Default" placement rule at all, which removes the need 
> to create the queue.
> We should defer creating the default queue until we can assess whether it is 
> needed.
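
The two cases above can be sketched roughly as follows. This is an illustration only, not the FairScheduler code or the attached patches: the class LazyDefaultQueueSketch, its methods, and the queue names are hypothetical. Nothing beyond the root queue is created at init, and the queue named by the "Default" placement rule is created only when that rule is actually applied.

```java
import java.util.LinkedHashSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of deferred default-queue creation; names are illustrative.
public class LazyDefaultQueueSketch {

    private final Set<String> queues = new LinkedHashSet<>();

    // Queue named by the "Default" placement rule, or empty if no such rule exists.
    private final Optional<String> defaultRuleQueue;

    public LazyDefaultQueueSketch(Optional<String> defaultRuleQueue) {
        this.defaultRuleQueue = defaultRuleQueue;
        // Only the root queue exists after init; no default queue is created here.
        queues.add("root");
    }

    // Apply the "Default" rule: create its queue on first use, if the rule exists.
    public Optional<String> placeByDefaultRule() {
        defaultRuleQueue.ifPresent(queues::add);
        return defaultRuleQueue;
    }

    public boolean hasQueue(String name) {
        return queues.contains(name);
    }
}
```

A configuration without a "Default" rule never gets a default queue in this sketch, and a rule pointing at a differently named queue creates only that queue.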






[jira] [Commented] (YARN-10125) In Federation, kill application from client does not kill Unmanaged AM's and containers launched by Unmanaged AM

2021-04-25 Thread Brahma Reddy Battula (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17331510#comment-17331510
 ] 

Brahma Reddy Battula commented on YARN-10125:
-

[~dmmkr] thanks for updating the patch. +1 on the latest patch.

> In Federation, kill application from client does not kill Unmanaged AM's and 
> containers launched by Unmanaged AM
> 
>
> Key: YARN-10125
> URL: https://issues.apache.org/jira/browse/YARN-10125
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client, federation, router
>Reporter: D M Murali Krishna Reddy
>Assignee: D M Murali Krishna Reddy
>Priority: Major
> Attachments: YARN-10125.001.patch, YARN-10125.002.patch
>
>
> In Federation, killing an application from the client using "bin/yarn application 
> -kill " kills only the containers of the home subcluster; 
> the Unmanaged AM and the containers launched in other subclusters are not 
> killed, which blocks resources.
> The containers are killed after the task completes, and the Unmanaged AM 
> is killed 10 minutes after the application is killed, which kills any 
> remaining running containers in that subcluster.






[jira] [Comment Edited] (YARN-10637) We should support fs to cs support for auto refresh queues when conf changed, after YARN-10623 finished.

2021-04-25 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316917#comment-17316917
 ] 

Qi Zhu edited comment on YARN-10637 at 4/25/21, 9:35 AM:
-

cc  [~gandras]

Could you take a look at this when you are free? :D

Thanks.


was (Author: zhuqi):
  [~gandras]

Could you take a look at this when you are free? :D

Thanks.

> We should support fs to cs support for auto refresh queues when conf changed, 
> after YARN-10623 finished.
> 
>
> Key: YARN-10637
> URL: https://issues.apache.org/jira/browse/YARN-10637
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10637.001.patch, YARN-10637.002.patch, 
> YARN-10637.003.patch, YARN-10637.004.patch
>
>
> cc [~pbacsko] [~gandras] [~bteke]
> We should also fill this in once YARN-10623 is finished.






[jira] [Commented] (YARN-7713) Add parallel copying of directories into FSDownload

2021-04-25 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17331460#comment-17331460
 ] 

Qi Zhu commented on YARN-7713:
--

[~ChrisKarampeazis]

Thanks for your work here. 

cc [~ebadger] [~epayne] [~Jim_Brennan] [~pbacsko] [~gandras] 

I think it's a very good improvement.

Could you take a look at this when you are free?

Thanks.

> Add parallel copying of directories into FSDownload
> ---
>
> Key: YARN-7713
> URL: https://issues.apache.org/jira/browse/YARN-7713
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Miklos Szegedi
>Priority: Major
>  Labels: newbie, pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> YARN currently copies directories sequentially when localizing. This could be 
> improved by copying in parallel, since the source blocks are normally on 
> different nodes.
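
As a rough illustration of the idea in the description, the sketch below copies the files of a directory tree in parallel with a fixed thread pool. It is not the FSDownload implementation or the linked pull request: it uses the local file system through java.nio.file rather than the Hadoop FileSystem API, and the class ParallelCopySketch and its copyDirectory method are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: local-file-system stand-in for parallel localization.
public class ParallelCopySketch {

    public static void copyDirectory(Path source, Path target, int threads)
            throws IOException, InterruptedException {
        // Walk the tree once; create directories up front, collect regular files.
        List<Path> entries;
        try (Stream<Path> walk = Files.walk(source)) {
            entries = walk.collect(Collectors.toList());
        }
        List<Path> files = new ArrayList<>();
        for (Path entry : entries) {
            if (Files.isDirectory(entry)) {
                Files.createDirectories(target.resolve(source.relativize(entry).toString()));
            } else {
                files.add(entry);
            }
        }

        // Submit one copy task per file to a fixed-size pool.
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (Path file : files) {
                Path dest = target.resolve(source.relativize(file).toString());
                pending.add(pool.submit(() -> {
                    try {
                        Files.copy(file, dest, StandardCopyOption.REPLACE_EXISTING);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                }));
            }
            // Wait for every copy and surface the first failure as an IOException.
            for (Future<?> task : pending) {
                try {
                    task.get();
                } catch (ExecutionException e) {
                    throw new IOException("copy failed", e.getCause());
                }
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

Creating all directories before submitting any copy keeps the tasks independent of one another, so the thread count can be tuned without ordering concerns.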


