[jira] [Resolved] (YARN-11616) Fast fail when multiple attribute kvs are specified

2023-11-23 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu resolved YARN-11616.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

> Fast fail when multiple attribute kvs are specified
> ---
>
> Key: YARN-11616
> URL: https://issues.apache.org/jira/browse/YARN-11616
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodeattibute
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> In the {{NodeConstraintParser}}, no exception is thrown when multiple 
> attribute key-value pairs are specified. Instead it returns an incorrect 
> placement constraint, which misleads users. For example, 
> {{rm.yarn.io/foo=1,rm.yarn.io/bar=2}} is parsed to 
> {{node,EQ,rm.yarn.io/bar=[1:2]}}
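Below is a minimal, self-contained sketch (not the actual NodeConstraintParser 
code) of the fast-fail behaviour requested here: reject the expression as soon 
as more than one attribute key=value pair is found, instead of silently 
collapsing the pairs into a misleading constraint.

{code:java}
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

public class AttributeExpressionValidator {

  // Parses a single node-attribute expression such as "rm.yarn.io/foo=1" and
  // fails fast when more than one key=value pair is supplied.
  public static Map.Entry<String, String> parseSingleAttribute(String expr) {
    String[] pairs = expr.split(",");
    if (pairs.length > 1) {
      // Fast fail instead of returning an incorrect placement constraint.
      throw new IllegalArgumentException(
          "Only one attribute key=value pair is allowed, got: " + expr);
    }
    String[] kv = pairs[0].split("=", 2);
    if (kv.length != 2 || kv[0].isEmpty() || kv[1].isEmpty()) {
      throw new IllegalArgumentException("Invalid attribute expression: " + expr);
    }
    return new SimpleEntry<>(kv[0], kv[1]);
  }

  public static void main(String[] args) {
    System.out.println(parseSingleAttribute("rm.yarn.io/foo=1"));
    // Throws IllegalArgumentException, which is the desired fast-fail behaviour.
    parseSingleAttribute("rm.yarn.io/foo=1,rm.yarn.io/bar=2");
  }
}
{code}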






[jira] [Assigned] (YARN-11616) Fast fail when multiple attribute kvs are specified

2023-11-23 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu reassigned YARN-11616:
-

Assignee: Junfan Zhang

> Fast fail when multiple attribute kvs are specified
> ---
>
> Key: YARN-11616
> URL: https://issues.apache.org/jira/browse/YARN-11616
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodeattibute
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
>
> In the {{NodeConstraintParser}}, no exception is thrown when multiple 
> attribute key-value pairs are specified. Instead it returns an incorrect 
> placement constraint, which misleads users. For example, 
> {{rm.yarn.io/foo=1,rm.yarn.io/bar=2}} is parsed to 
> {{node,EQ,rm.yarn.io/bar=[1:2]}}






[jira] [Commented] (YARN-10590) Fix legacy auto queue creation absolute resource calculation loss

2022-01-06 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469794#comment-17469794
 ] 

Qi Zhu commented on YARN-10590:
---

Thanks [~gandras] for offering to take this, feel free to take it over. 

> Fix legacy auto queue creation absolute resource calculation loss
> -
>
> Key: YARN-10590
> URL: https://issues.apache.org/jira/browse/YARN-10590
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10590.001.patch, YARN-10590.002.patch
>
>
> "Because as we discussed in YARN-10504 , the initialization of auto created 
> queues from template was changed (see comment and comment)."
> 1. As the comment we discussed, we found the effective core is different(the 
> gap), because the update effective  will override the absolute auto created 
> leaf queue.
> 2. But actually, the new logic in YARN-10504 override is right, the 
> difference is caused by test case , don't consider the calculation loss of 
> multi resource type, the cap/absolute are all calculated by one type, 
> (memory) in DefaultResourceCalculator, (dominant type) in 
> DominantResourceCalculator. As we known in the comment, the absolute auto 
> created leaf queue will merge the effective resource by cap/absolute 
> calculated result, this caused the gap.
> 2. In other case(not absolute case) in the auto created leaf queue, the merge 
> will not cause the gap, in update effective resource override will also use 
> the one type calculated result. 
> 3. So this jira just make the test right, the calculation result is already 
> right.
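As a plain-Java illustration of the multi-resource calculation loss described 
above (this is not the scheduler code, just arithmetic): deriving a capacity 
percentage from a single resource type and then re-applying it to another type 
does not reproduce the configured absolute value exactly.

{code:java}
public class CapacityGapDemo {

  public static void main(String[] args) {
    long clusterMemoryMb = 100_000;
    long clusterVcores = 37;

    // Absolute resource requested for an auto-created leaf queue.
    long queueMemoryMb = 12_345;
    long queueVcores = 7;

    // Single-type (memory-based) capacity percentage, the kind of value a
    // DefaultResourceCalculator-style calculation produces.
    double capByMemory = (double) queueMemoryMb / clusterMemoryMb;

    // Re-deriving the vcore share from the memory-based percentage loses
    // precision compared to the configured absolute vcores.
    long derivedVcores = Math.round(capByMemory * clusterVcores);

    System.out.printf("configured vcores = %d, derived vcores = %d, gap = %d%n",
        queueVcores, derivedVcores, queueVcores - derivedVcores);
  }
}
{code}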






[jira] [Commented] (YARN-10178) Global Scheduler async thread crash caused by 'Comparison method violates its general contract'

2021-12-10 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17457519#comment-17457519
 ] 

Qi Zhu commented on YARN-10178:
---

Hi [~epayne] [~gandras], thanks for looking into this problem. [~gandras], 
feel free to assign this to yourself; I have had no free time recently, and I 
previously tested the latest patch. Thanks a lot.

 

> Global Scheduler async thread crash caused by 'Comparison method violates its 
> general contract'
> ---
>
> Key: YARN-10178
> URL: https://issues.apache.org/jira/browse/YARN-10178
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.2.1
>Reporter: tuyu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10178.001.patch, YARN-10178.002.patch, 
> YARN-10178.003.patch, YARN-10178.004.patch, YARN-10178.005.patch
>
>
> Global Scheduler Async Thread crash stack
> {code:java}
> ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received 
> RMFatalEvent of type CRITICAL_THREAD_CRASH, caused by a critical thread, 
> Thread-6066574, that exited unexpectedly: java.lang.IllegalArgumentException: 
> Comparison method violates its general contract!  
>at 
> java.util.TimSort.mergeHi(TimSort.java:899)
> at java.util.TimSort.mergeAt(TimSort.java:516)
> at java.util.TimSort.mergeForceCollapse(TimSort.java:457)
> at java.util.TimSort.sort(TimSort.java:254)
> at java.util.Arrays.sort(Arrays.java:1512)
> at java.util.ArrayList.sort(ArrayList.java:1462)
> at java.util.Collections.sort(Collections.java:177)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.policy.PriorityUtilizationQueueOrderingPolicy.getAssignmentIterator(PriorityUtilizationQueueOrderingPolicy.java:221)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.sortAndGetChildrenAllocationIterator(ParentQueue.java:777)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:791)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:623)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1635)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1629)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1732)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1481)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.schedule(CapacityScheduler.java:569)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$AsyncScheduleThread.run(CapacityScheduler.java:616)
> {code}
> Java 8 Arrays.sort uses the TimSort algorithm by default, and TimSort places a 
> few requirements on the comparator:
> {code:java}
> 1. sgn(x.compareTo(y)) == -sgn(y.compareTo(x))
> 2. x > y and y > z implies x > z
> 3. x == y implies sgn(x.compareTo(z)) == sgn(y.compareTo(z))
> {code}
> If the array elements do not satisfy these requirements, TimSort throws 
> 'java.lang.IllegalArgumentException'.
> Looking at the PriorityUtilizationQueueOrderingPolicy.compare function, we can 
> see that the Capacity Scheduler compares queues by these resource usage values:
> {code:java}
> AbsoluteUsedCapacity
> UsedCapacity
> ConfiguredMinResource
> AbsoluteCapacity
> {code}
> In the Capacity Scheduler, the global scheduler AsyncThread uses 
> PriorityUtilizationQueueOrderingPolicy to choose a queue to assign a container 
> to, constructs a CSAssignment, and adds the CSAssignment to the backlog via 
> the submitResourceCommitRequest function.
> ResourceCommitterService will then tryCommit this CSAssignment; looking at the 
> tryCommit function, the queue resource usage is updated there:
> {code:java}
> public boolean tryCommit(Resource cluster, ResourceCommitRequest r,
> boolean updatePending) {
>   long commitStart = System.nanoTime();
>   ResourceCommitRequest request =
>   (ResourceCommitRequest) r;
>  
>   ...
>   boolean isSuccess = false;
>   if (attemptId != null) {
> FiCaSchedulerApp app = getApplicationAttempt(attemptId);
> // Required sanity check for attemptId - when async-scheduling enabled,
> // proposal might be outdated if AM failover just finished
> // and proposal queue was not be consumed in time
> i
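For context, here is a minimal, hypothetical reproduction of the failure mode 
described in this issue (it is not the PriorityUtilizationQueueOrderingPolicy 
code): when the values a comparator reads can change while Collections.sort() 
is running, TimSort may observe an inconsistent ordering and throw 'Comparison 
method violates its general contract!'. Snapshotting the sort keys before 
sorting is the usual way to avoid this; the actual fix in the patches may 
differ.

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

public class UnstableComparatorDemo {

  // Stand-in for a queue whose used capacity is updated concurrently
  // (e.g. by the ResourceCommitterService) while the async scheduler sorts.
  static class Queue {
    volatile double usedCapacity = ThreadLocalRandom.current().nextDouble();
  }

  public static void main(String[] args) {
    List<Queue> queues = new ArrayList<>();
    for (int i = 0; i < 10_000; i++) {
      queues.add(new Queue());
    }

    // Mutator thread: keeps changing the values the comparator depends on.
    Thread mutator = new Thread(() -> {
      while (!Thread.currentThread().isInterrupted()) {
        for (Queue q : queues) {
          q.usedCapacity = ThreadLocalRandom.current().nextDouble();
        }
      }
    });
    mutator.setDaemon(true);
    mutator.start();

    // BAD: the comparator reads mutable state directly, so TimSort may see an
    // inconsistent ordering. A safe variant would copy usedCapacity into an
    // immutable snapshot per element before sorting.
    Comparator<Queue> unstable = Comparator.comparingDouble(q -> q.usedCapacity);

    try {
      for (int i = 0; i < 100; i++) {
        Collections.sort(queues, unstable);
      }
      System.out.println("No contract violation this run (it is timing-dependent).");
    } catch (IllegalArgumentException e) {
      System.out.println("Reproduced: " + e.getMessage());
    }
  }
}
{code}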

[jira] [Commented] (YARN-10632) Make maximum depth allowed to be configurable

2021-10-25 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17433775#comment-17433775
 ] 

Qi Zhu commented on YARN-10632:
---

Sure [~gandras] , feel free to assign this issue to yourself.

> Make maximum depth allowed to be configurable
> -
>
> Key: YARN-10632
> URL: https://issues.apache.org/jira/browse/YARN-10632
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10632.001.patch, YARN-10632.002.patch, 
> YARN-10632.003.patch, YARN-10632.004.patch
>
>
> Currently the maximum depth allowed is fixed to 2, but I think this should be 
> configurable.
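A hedged sketch of the proposed direction; the property key below is 
hypothetical and only illustrates replacing the hard-coded limit of 2 with a 
configurable value.

{code:java}
import org.apache.hadoop.conf.Configuration;

public class AutoCreatedQueueDepthCheck {

  // Hypothetical property key for illustration only; the key introduced by the
  // actual patch may be named differently.
  static final String MAX_DEPTH_KEY =
      "yarn.scheduler.capacity.max-auto-created-queue-depth";
  static final int DEFAULT_MAX_DEPTH = 2; // the currently hard-coded limit

  /** Fails when the requested nesting depth exceeds the configured maximum. */
  static void checkDepth(Configuration conf, int requestedDepth) {
    int maxDepth = conf.getInt(MAX_DEPTH_KEY, DEFAULT_MAX_DEPTH);
    if (requestedDepth > maxDepth) {
      throw new IllegalArgumentException("Auto-created queue nesting depth "
          + requestedDepth + " exceeds the configured maximum of " + maxDepth);
    }
  }
}
{code}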






[jira] [Commented] (YARN-10974) CS UI: queue filter and openQueues param do not work as expected

2021-10-12 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17427513#comment-17427513
 ] 

Qi Zhu commented on YARN-10974:
---

Thanks [~chengbing.liu] for this patch, LGTM +1.
cc [~gandras] [~bteke]  [~pbacsko]

> CS UI: queue filter and openQueues param do not work as expected
> 
>
> Key: YARN-10974
> URL: https://issues.apache.org/jira/browse/YARN-10974
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.3.1
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
>Priority: Major
> Attachments: YARN-10974.001.patch
>
>
> With YARN-9879, the Capacity Scheduler now uses the full queue path instead of 
> the leaf queue's name, and we should reflect this change in the scheduler UI 
> page as well.
> This issue addresses two changes:
>  # *Fixed and refined queue filter logic*: instead of exact matching on the 
> leaf queue's name, now use prefix matching for non-leaf queues and exact 
> matching of the full queue path for leaf queues (see the sketch after this 
> message).
> This change conforms to the logic of FS.
>  # *Fixed the openQueues parameter behavior in the URL during queue expansion 
> and collapsing*, by no longer including the "Queue: " prefix with each leaf 
> and non-leaf queue.
> Prior to this change, collapsing a queue did not remove the corresponding 
> queue name from the openQueues parameter, because the "Queue: " prefix 
> contains a space, which is encoded as '%20' and therefore not matched cleanly.
> This change also conforms to the logic of FS.
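A rough sketch of the matching rule from point 1, written in plain Java rather 
than the UI's JavaScript and not taken from the attached patch: exact full-path 
matching for leaf queues, prefix matching for non-leaf queues.

{code:java}
public class QueueFilter {

  /**
   * Returns true when the queue identified by its full path should be shown
   * for the given filter: leaf queues must match the full path exactly,
   * non-leaf queues match when they are a path prefix of the filter.
   */
  static boolean matches(String fullQueuePath, boolean isLeaf, String filter) {
    if (isLeaf) {
      return fullQueuePath.equals(filter);
    }
    return filter.equals(fullQueuePath)
        || filter.startsWith(fullQueuePath + ".");
  }

  public static void main(String[] args) {
    // Filtering for "root.a.b": the leaf "root.a.b" matches exactly, its
    // ancestors "root" and "root.a" match by prefix, and the unrelated leaf
    // "root.b" does not match even though its short name is "b".
    System.out.println(matches("root.a.b", true, "root.a.b"));  // true
    System.out.println(matches("root.a", false, "root.a.b"));   // true
    System.out.println(matches("root", false, "root.a.b"));     // true
    System.out.println(matches("root.b", true, "root.a.b"));    // false
  }
}
{code}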






[jira] [Commented] (YARN-10178) Global Scheduler async thread crash caused by 'Comparison method violates its general contract'

2021-09-14 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17415305#comment-17415305
 ] 

Qi Zhu commented on YARN-10178:
---

Thanks [~wangda] for reply.

cc [~MatthewSharp] [~it_singer] 

It would be great if you could apply the patch to confirm the solution.

> Global Scheduler async thread crash caused by 'Comparison method violates its 
> general contract'
> ---
>
> Key: YARN-10178
> URL: https://issues.apache.org/jira/browse/YARN-10178
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.2.1
>Reporter: tuyu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10178.001.patch, YARN-10178.002.patch, 
> YARN-10178.003.patch, YARN-10178.004.patch, YARN-10178.005.patch
>
>
> Global Scheduler Async Thread crash stack
> {code:java}
> ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received 
> RMFatalEvent of type CRITICAL_THREAD_CRASH, caused by a critical thread, 
> Thread-6066574, that exited unexpectedly: java.lang.IllegalArgumentException: 
> Comparison method violates its general contract!  
>at 
> java.util.TimSort.mergeHi(TimSort.java:899)
> at java.util.TimSort.mergeAt(TimSort.java:516)
> at java.util.TimSort.mergeForceCollapse(TimSort.java:457)
> at java.util.TimSort.sort(TimSort.java:254)
> at java.util.Arrays.sort(Arrays.java:1512)
> at java.util.ArrayList.sort(ArrayList.java:1462)
> at java.util.Collections.sort(Collections.java:177)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.policy.PriorityUtilizationQueueOrderingPolicy.getAssignmentIterator(PriorityUtilizationQueueOrderingPolicy.java:221)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.sortAndGetChildrenAllocationIterator(ParentQueue.java:777)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:791)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:623)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1635)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1629)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1732)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1481)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.schedule(CapacityScheduler.java:569)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$AsyncScheduleThread.run(CapacityScheduler.java:616)
> {code}
> Java 8 Arrays.sort uses the TimSort algorithm by default, and TimSort places a 
> few requirements on the comparator:
> {code:java}
> 1. sgn(x.compareTo(y)) == -sgn(y.compareTo(x))
> 2. x > y and y > z implies x > z
> 3. x == y implies sgn(x.compareTo(z)) == sgn(y.compareTo(z))
> {code}
> If the array elements do not satisfy these requirements, TimSort throws 
> 'java.lang.IllegalArgumentException'.
> Looking at the PriorityUtilizationQueueOrderingPolicy.compare function, we can 
> see that the Capacity Scheduler compares queues by these resource usage values:
> {code:java}
> AbsoluteUsedCapacity
> UsedCapacity
> ConfiguredMinResource
> AbsoluteCapacity
> {code}
> In the Capacity Scheduler, the global scheduler AsyncThread uses 
> PriorityUtilizationQueueOrderingPolicy to choose a queue to assign a container 
> to, constructs a CSAssignment, and adds the CSAssignment to the backlog via 
> the submitResourceCommitRequest function.
> ResourceCommitterService will then tryCommit this CSAssignment; looking at the 
> tryCommit function, the queue resource usage is updated there:
> {code:java}
> public boolean tryCommit(Resource cluster, ResourceCommitRequest r,
> boolean updatePending) {
>   long commitStart = System.nanoTime();
>   ResourceCommitRequest request =
>   (ResourceCommitRequest) r;
>  
>   ...
>   boolean isSuccess = false;
>   if (attemptId != null) {
> FiCaSchedulerApp app = getApplicationAttempt(attemptId);
> // Required sanity check for attemptId - when async-scheduling enabled,
> // proposal might be outdated if AM failover just finished
> // and proposal queue was not be consumed in time
> if (app != null && attemptId.equals(app.getApplicationAttemptId())) {
>   if (app.ac

[jira] [Commented] (YARN-10844) Add Leveldb Statestore metrics to NM

2021-08-20 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17402130#comment-17402130
 ] 

Qi Zhu commented on YARN-10844:
---

Hi [~chaosju], there are still some checkstyle problems that should be handled.

> Add Leveldb Statestore metrics to NM
> 
>
> Key: YARN-10844
> URL: https://issues.apache.org/jira/browse/YARN-10844
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.4.0
>Reporter: chaosju
>Assignee: chaosju
>Priority: Minor
> Attachments: YARN-10844.001.patch, YARN-10844.003.patch, 
> YARN-10844.004.patch, YARN-10844.005.patch, YARN-10844.006.patch
>
>
> It is necessary to capture the performance metrics of Leveldb StateStore in 
> NodeManager
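A hedged sketch of what such a metrics source could look like using Hadoop's 
metrics2 annotations; the class and metric names are illustrative and not the 
ones from the attached patches. The LevelDB store/load calls in the NM state 
store (NMLeveldbStateStoreService) could then be timed and fed into these 
rates.

{code:java}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableRate;

@Metrics(about = "NM LevelDB state store metrics", context = "yarn")
public class NMLeveldbStateStoreMetrics {

  // Instantiated by the metrics system when the source is registered.
  @Metric("Time spent in state store DB writes") MutableRate dbWriteRate;
  @Metric("Time spent in state store DB reads")  MutableRate dbReadRate;

  static NMLeveldbStateStoreMetrics create() {
    return DefaultMetricsSystem.instance().register(
        "NMLeveldbStateStoreMetrics", "NM LevelDB state store metrics",
        new NMLeveldbStateStoreMetrics());
  }

  void addDbWriteTime(long millis) { dbWriteRate.add(millis); }
  void addDbReadTime(long millis)  { dbReadRate.add(millis); }
}
{code}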






[jira] [Commented] (YARN-10844) Add Leveldb Statestore metrics to NM

2021-08-18 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17401447#comment-17401447
 ] 

Qi Zhu commented on YARN-10844:
---

Hi [~chaosju]

I think it should be fixed in a new class.

> Add Leveldb Statestore metrics to NM
> 
>
> Key: YARN-10844
> URL: https://issues.apache.org/jira/browse/YARN-10844
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.4.0
>Reporter: chaosju
>Assignee: chaosju
>Priority: Minor
> Attachments: YARN-10844.001.patch, YARN-10844.003.patch, 
> YARN-10844.004.patch, YARN-10844.005.patch, YARN-10844.006.patch
>
>
> It is necessary to capture the performance metrics of Leveldb StateStore in 
> NodeManager






[jira] [Commented] (YARN-10844) Add Leveldb Statestore metrics to NM

2021-08-17 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17400732#comment-17400732
 ] 

Qi Zhu commented on YARN-10844:
---

Hi [~chaosju], please fix the checkstyle issues.

Thanks

> Add Leveldb Statestore metrics to NM
> 
>
> Key: YARN-10844
> URL: https://issues.apache.org/jira/browse/YARN-10844
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.4.0
>Reporter: chaosju
>Assignee: chaosju
>Priority: Minor
> Attachments: YARN-10844.001.patch, YARN-10844.003.patch, 
> YARN-10844.004.patch, YARN-10846.003.patch
>
>
> It is necessary to capture the performance metrics of Leveldb StateStore in 
> NodeManager






[jira] [Updated] (YARN-10844) Add Leveldb Statestore metrics to NM

2021-08-12 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-10844:
--
Affects Version/s: 3.4.0

> Add Leveldb Statestore metrics to NM
> 
>
> Key: YARN-10844
> URL: https://issues.apache.org/jira/browse/YARN-10844
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.4.0
>Reporter: chaosju
>Assignee: chaosju
>Priority: Minor
> Attachments: YARN-10844.001.patch, YARN-10844.003.patch, 
> YARN-10846.003.patch
>
>
> It is necessary to capture the performance metrics of Leveldb StateStore in 
> NodeManager






[jira] [Updated] (YARN-10844) Add Leveldb Statestore metrics to NM

2021-08-10 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-10844:
--
Attachment: YARN-10844.003.patch

> Add Leveldb Statestore metrics to NM
> 
>
> Key: YARN-10844
> URL: https://issues.apache.org/jira/browse/YARN-10844
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: chaosju
>Assignee: chaosju
>Priority: Minor
> Attachments: YARN-10844.001.patch, YARN-10844.003.patch, 
> YARN-10846.003.patch
>
>
> It is necessary to capture the performance metrics of Leveldb StateStore in 
> NodeManager






[jira] [Updated] (YARN-10844) Add Leveldb Statestore metrics to NM

2021-08-10 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-10844:
--
Attachment: (was: YARN-10844.003.patch)

> Add Leveldb Statestore metrics to NM
> 
>
> Key: YARN-10844
> URL: https://issues.apache.org/jira/browse/YARN-10844
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: chaosju
>Assignee: chaosju
>Priority: Minor
> Attachments: YARN-10844.001.patch, YARN-10846.003.patch
>
>
> It is necessary to capture the performance metrics of Leveldb StateStore in 
> NodeManager






[jira] [Updated] (YARN-10844) Add Leveldb Statestore metrics to NM

2021-08-10 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-10844:
--
Attachment: YARN-10844.003.patch

> Add Leveldb Statestore metrics to NM
> 
>
> Key: YARN-10844
> URL: https://issues.apache.org/jira/browse/YARN-10844
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: chaosju
>Assignee: chaosju
>Priority: Minor
> Attachments: YARN-10844.001.patch, YARN-10844.003.patch, 
> YARN-10846.003.patch
>
>
> It is necessary to capture the performance metrics of Leveldb StateStore in 
> NodeManager






[jira] [Updated] (YARN-10844) Add Leveldb Statestore metrics to NM

2021-08-10 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-10844:
--
Attachment: (was: YARN-10844.003.patch)

> Add Leveldb Statestore metrics to NM
> 
>
> Key: YARN-10844
> URL: https://issues.apache.org/jira/browse/YARN-10844
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: chaosju
>Assignee: chaosju
>Priority: Minor
> Attachments: YARN-10844.001.patch, YARN-10844.003.patch, 
> YARN-10846.003.patch
>
>
> It is necessary to capture the performance metrics of Leveldb StateStore in 
> NodeManager






[jira] [Commented] (YARN-10844) Add Leveldb Statestore metrics to NM

2021-08-10 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17396650#comment-17396650
 ] 

Qi Zhu commented on YARN-10844:
---

Hi [~chaosju] ,

The 003 patch should be named YARN-10844; I resubmitted it to trigger Jenkins.

> Add Leveldb Statestore metrics to NM
> 
>
> Key: YARN-10844
> URL: https://issues.apache.org/jira/browse/YARN-10844
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: chaosju
>Assignee: chaosju
>Priority: Minor
> Attachments: YARN-10844.001.patch, YARN-10844.003.patch, 
> YARN-10846.003.patch
>
>
> It is necessary to capture the performance metrics of Leveldb StateStore in 
> NodeManager






[jira] [Commented] (YARN-10844) Add Leveldb Statestore metrics to NM

2021-08-10 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17396552#comment-17396552
 ] 

Qi Zhu commented on YARN-10844:
---

Of course, [~snemeth] you can skip it.

 

> Add Leveldb Statestore metrics to NM
> 
>
> Key: YARN-10844
> URL: https://issues.apache.org/jira/browse/YARN-10844
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: chaosju
>Assignee: chaosju
>Priority: Minor
> Attachments: YARN-10844.001.patch, YARN-10846.003.patch
>
>
> It is necessary to capture the performance metrics of Leveldb StateStore in 
> NodeManager






[jira] [Commented] (YARN-10844) Add Leveldb Statestore metrics to NM

2021-08-10 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17396541#comment-17396541
 ] 

Qi Zhu commented on YARN-10844:
---

Hi [~snemeth],

I want to review and commit this after Jenkins passes; I would appreciate it if 
you could help review it.

Thanks.

> Add Leveldb Statestore metrics to NM
> 
>
> Key: YARN-10844
> URL: https://issues.apache.org/jira/browse/YARN-10844
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: chaosju
>Assignee: chaosju
>Priority: Minor
> Attachments: YARN-10844.001.patch, YARN-10846.003.patch
>
>
> It is necessary to capture the performance metrics of Leveldb StateStore in 
> NodeManager






[jira] [Updated] (YARN-10844) Add Leveldb Statestore metrics to NM

2021-08-09 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-10844:
--
Fix Version/s: (was: 3.4.0)

> Add Leveldb Statestore metrics to NM
> 
>
> Key: YARN-10844
> URL: https://issues.apache.org/jira/browse/YARN-10844
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: chaosju
>Assignee: chaosju
>Priority: Minor
> Attachments: YARN-10844.001.patch, YARN-10846.003.patch
>
>
> It is necessary to capture the performance metrics of Leveldb StateStore in 
> NodeManager






[jira] [Comment Edited] (YARN-10844) Add Leveldb Statestore metrics to NM

2021-08-09 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395866#comment-17395866
 ] 

Qi Zhu edited comment on YARN-10844 at 8/10/21, 2:06 AM:
-

Thanks [~chaosju] generally LGTM, let's trigger the Jenkins now.

You should resubmit the patch.


was (Author: zhuqi):
Thanks [~chaosju] generally LGTM, let's trigger the Jenkins now.

> Add Leveldb Statestore metrics to NM
> 
>
> Key: YARN-10844
> URL: https://issues.apache.org/jira/browse/YARN-10844
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: chaosju
>Assignee: chaosju
>Priority: Minor
> Attachments: YARN-10844.001.patch
>
>
> It is necessary to capture the performance metrics of Leveldb StateStore in 
> NodeManager






[jira] [Commented] (YARN-10693) Add document for YARN-10623 auto refresh queue conf in CS

2021-08-09 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395930#comment-17395930
 ] 

Qi Zhu commented on YARN-10693:
---

[~pbacsko] [~bteke] 

I have updated in latest PR, thanks.

> Add document for YARN-10623 auto refresh queue conf in CS
> -
>
> Key: YARN-10693
> URL: https://issues.apache.org/jira/browse/YARN-10693
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>  Labels: pull-request-available
> Attachments: YARN-10693.001.patch, YARN-10693.002.patch, 
> YARN-10693.003.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>







[jira] [Commented] (YARN-10844) Add Leveldb Statestore metrics to NM

2021-08-09 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395866#comment-17395866
 ] 

Qi Zhu commented on YARN-10844:
---

Thanks [~chaosju] generally LGTM, let's trigger the Jenkins now.

> Add Leveldb Statestore metrics to NM
> 
>
> Key: YARN-10844
> URL: https://issues.apache.org/jira/browse/YARN-10844
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: chaosju
>Assignee: chaosju
>Priority: Minor
> Attachments: YARN-10844.001.patch
>
>
> It is necessary to capture the performance metrics of Leveldb StateStore in 
> NodeManager






[jira] [Commented] (YARN-10693) Add document for YARN-10623 auto refresh queue conf in CS

2021-08-07 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395234#comment-17395234
 ] 

Qi Zhu commented on YARN-10693:
---

Thanks a lot [~bteke] for the review, your suggestion is very helpful; I will 
update on Monday. 

> Add document for YARN-10623 auto refresh queue conf in CS
> -
>
> Key: YARN-10693
> URL: https://issues.apache.org/jira/browse/YARN-10693
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10693.001.patch, YARN-10693.002.patch, 
> YARN-10693.003.patch
>
>







[jira] [Commented] (YARN-10880) nodelabels update log is too noisy

2021-08-06 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17394609#comment-17394609
 ] 

Qi Zhu commented on YARN-10880:
---

Thanks [~LuoGe] for patch.

LGTM +1.

 

> nodelabels update log is too noisy
> -
>
> Key: YARN-10880
> URL: https://issues.apache.org/jira/browse/YARN-10880
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.3.1
>Reporter: LuoGe
>Priority: Minor
> Attachments: wx20210806-093...@2x.png, YARN-10880.001.patch
>
>
> When using the YARN *Distributed* NodeLabel setup, every time a node updates, 
> the RM logs at INFO level: "INFO 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: No Modified Node 
> label Mapping to replace". The log is too noisy (see the attached picture), so 
> can we change it to DEBUG or remove it?
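A minimal sketch of the suggested change (the surrounding code in 
CommonNodeLabelsManager may differ): demote the message to DEBUG so it only 
shows up when debug logging is enabled.

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class NodeLabelUpdateLogging {
  private static final Logger LOG =
      LoggerFactory.getLogger(NodeLabelUpdateLogging.class);

  void onNoModifiedMapping() {
    // Was logged at INFO on every distributed node-label heartbeat report,
    // which floods the RM log; DEBUG keeps it available when needed.
    LOG.debug("No Modified Node label Mapping to replace");
  }
}
{code}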






[jira] [Commented] (YARN-10854) Support marking inactive node as untracked without configured include path

2021-08-02 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391496#comment-17391496
 ] 

Qi Zhu commented on YARN-10854:
---

Thanks [~Tao Yang] for contribution, and [~templedf] [~prabhujoseph] for review!

Committed to trunk. 

> Support marking inactive node as untracked without configured include path
> --
>
> Key: YARN-10854
> URL: https://issues.apache.org/jira/browse/YARN-10854
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-10854.001.patch, YARN-10854.002.patch, 
> YARN-10854.003.patch, YARN-10854.004.patch, YARN-10854.005.patch
>
>
> Currently, inactive nodes which have been decommissioned/shutdown/lost for a 
> while (the expiration time is defined via 
> {{yarn.resourcemanager.node-removal-untracked.timeout-ms}}, 60 seconds by 
> default) and which exist in neither the include nor the exclude file can be 
> marked as untracked nodes and removed from RM state (YARN-4311). This is very 
> useful when auto-scaling is enabled in an elastic cloud environment, since it 
> avoids an unlimited increase of inactive nodes (mostly decommissioned nodes).
> But this only works when the include path is configured, which does not match 
> most of our cloud environments: they have no configured white list of nodes, 
> so that the auto-scaling of nodes can be controlled easily without further 
> security requirements.
> So I propose to support marking inactive nodes as untracked without a 
> configured include path; to stay compatible with former versions, we can add a 
> switch config for this.
> Any thoughts/suggestions/feedback are welcome!
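A hedged sketch of the proposed behaviour; the switch property name below is 
hypothetical and only illustrates allowing untracked removal when no include 
file is configured.

{code:java}
import org.apache.hadoop.conf.Configuration;

public class UntrackedNodeCheck {

  // Hypothetical switch name, used only for illustration; the key in the
  // actual patch may differ.
  static final String ALLOW_UNTRACKED_WITHOUT_INCLUDE =
      "yarn.resourcemanager.node-removal-untracked.without-include-path.enabled";

  /**
   * An inactive node may be treated as untracked (and eventually removed from
   * RM state) when it is listed in neither the include nor the exclude file,
   * and either an include path is configured (current behaviour) or the new
   * switch explicitly allows it without one (proposed behaviour).
   */
  static boolean isUntracked(Configuration conf, boolean inIncludeFile,
      boolean inExcludeFile, boolean includePathConfigured) {
    boolean allowWithoutInclude =
        conf.getBoolean(ALLOW_UNTRACKED_WITHOUT_INCLUDE, false);
    if (inIncludeFile || inExcludeFile) {
      return false;
    }
    return includePathConfigured || allowWithoutInclude;
  }
}
{code}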






[jira] [Commented] (YARN-10854) Support marking inactive node as untracked without configured include path

2021-07-30 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390536#comment-17390536
 ] 

Qi Zhu commented on YARN-10854:
---

[~Tao Yang] It seems some Jenkins-related problems need to be solved.

> Support marking inactive node as untracked without configured include path
> --
>
> Key: YARN-10854
> URL: https://issues.apache.org/jira/browse/YARN-10854
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-10854.001.patch, YARN-10854.002.patch, 
> YARN-10854.003.patch, YARN-10854.004.patch
>
>
> Currently, inactive nodes which have been decommissioned/shutdown/lost for a 
> while (the expiration time is defined via 
> {{yarn.resourcemanager.node-removal-untracked.timeout-ms}}, 60 seconds by 
> default) and which exist in neither the include nor the exclude file can be 
> marked as untracked nodes and removed from RM state (YARN-4311). This is very 
> useful when auto-scaling is enabled in an elastic cloud environment, since it 
> avoids an unlimited increase of inactive nodes (mostly decommissioned nodes).
> But this only works when the include path is configured, which does not match 
> most of our cloud environments: they have no configured white list of nodes, 
> so that the auto-scaling of nodes can be controlled easily without further 
> security requirements.
> So I propose to support marking inactive nodes as untracked without a 
> configured include path; to stay compatible with former versions, we can add a 
> switch config for this.
> Any thoughts/suggestions/feedback are welcome!






[jira] [Commented] (YARN-10854) Support marking inactive node as untracked without configured include path

2021-07-30 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390426#comment-17390426
 ] 

Qi Zhu commented on YARN-10854:
---

Thanks [~Tao Yang] for the update, LGTM; I will help commit this patch after 
Jenkins passes.

> Support marking inactive node as untracked without configured include path
> --
>
> Key: YARN-10854
> URL: https://issues.apache.org/jira/browse/YARN-10854
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-10854.001.patch, YARN-10854.002.patch, 
> YARN-10854.003.patch, YARN-10854.004.patch
>
>
> Currently, inactive nodes which have been decommissioned/shutdown/lost for a 
> while (the expiration time is defined via 
> {{yarn.resourcemanager.node-removal-untracked.timeout-ms}}, 60 seconds by 
> default) and which exist in neither the include nor the exclude file can be 
> marked as untracked nodes and removed from RM state (YARN-4311). This is very 
> useful when auto-scaling is enabled in an elastic cloud environment, since it 
> avoids an unlimited increase of inactive nodes (mostly decommissioned nodes).
> But this only works when the include path is configured, which does not match 
> most of our cloud environments: they have no configured white list of nodes, 
> so that the auto-scaling of nodes can be controlled easily without further 
> security requirements.
> So I propose to support marking inactive nodes as untracked without a 
> configured include path; to stay compatible with former versions, we can add a 
> switch config for this.
> Any thoughts/suggestions/feedback are welcome!






[jira] [Commented] (YARN-10854) Support marking inactive node as untracked without configured include path

2021-07-28 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17389198#comment-17389198
 ] 

Qi Zhu commented on YARN-10854:
---

Thanks [~Tao Yang] for this, the patch LGTM and makes sense to me.

The code only touches the NodesListManager, so there is no risk to the 
scheduling process. And it is a good improvement for cloud environments with 
frequent auto-scaling operations.

The Decommission test case is clear; one minor thing, we'd better cover all the 
test cases, such as lost, shutdown, graceful decommission, etc.

 

 

> Support marking inactive node as untracked without configured include path
> --
>
> Key: YARN-10854
> URL: https://issues.apache.org/jira/browse/YARN-10854
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-10854.001.patch, YARN-10854.002.patch, 
> YARN-10854.003.patch
>
>
> Currently, inactive nodes which have been decommissioned/shutdown/lost for a 
> while (the expiration time is defined via 
> {{yarn.resourcemanager.node-removal-untracked.timeout-ms}}, 60 seconds by 
> default) and which exist in neither the include nor the exclude file can be 
> marked as untracked nodes and removed from RM state (YARN-4311). This is very 
> useful when auto-scaling is enabled in an elastic cloud environment, since it 
> avoids an unlimited increase of inactive nodes (mostly decommissioned nodes).
> But this only works when the include path is configured, which does not match 
> most of our cloud environments: they have no configured white list of nodes, 
> so that the auto-scaling of nodes can be controlled easily without further 
> security requirements.
> So I propose to support marking inactive nodes as untracked without a 
> configured include path; to stay compatible with former versions, we can add a 
> switch config for this.
> Any thoughts/suggestions/feedback are welcome!






[jira] [Commented] (YARN-10727) ParentQueue does not validate the queue on removal

2021-07-27 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388469#comment-17388469
 ] 

Qi Zhu commented on YARN-10727:
---

Thanks [~gandras]  LGTM +1.

> ParentQueue does not validate the queue on removal
> --
>
> Key: YARN-10727
> URL: https://issues.apache.org/jira/browse/YARN-10727
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
> Attachments: YARN-10727.001.patch
>
>
> With the addition of YARN-10532, ParentQueue has a public method, removeQueue, 
> which allows the deletion of a queue at runtime. However, there is no 
> validation of the queue which is to be removed, therefore it is possible to 
> remove a queue from the CSQueueManager that is not a child of the ParentQueue. 
> Since it is a public method, there must be validations (see the sketch after 
> this message) such as:
>  * check if the parent of the queue to be removed is the current ParentQueue
>  * check if the parent actually contains the queue in its childQueues 
> collection
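A rough sketch of the two validations listed above, with simplified types 
rather than the real CSQueue/ParentQueue classes:

{code:java}
import java.util.List;

public class ParentQueueRemovalCheck {

  // Simplified stand-in for the scheduler's queue interface.
  interface CSQueue {
    String getQueuePath();
    CSQueue getParent();
  }

  /** Validates that the queue really is a child of this parent before removal. */
  static void validateRemoval(CSQueue parent, List<CSQueue> childQueues,
      CSQueue toRemove) {
    // Check 1: the queue's parent must be the current ParentQueue.
    if (toRemove.getParent() != parent) {
      throw new IllegalArgumentException(toRemove.getQueuePath()
          + " is not a child of " + parent.getQueuePath());
    }
    // Check 2: the parent must actually hold the queue in its childQueues.
    if (!childQueues.contains(toRemove)) {
      throw new IllegalArgumentException(toRemove.getQueuePath()
          + " is missing from the childQueues of " + parent.getQueuePath());
    }
  }
}
{code}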






[jira] [Commented] (YARN-10833) RM logs endpoint vulnerable to clickjacking

2021-07-23 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17386636#comment-17386636
 ] 

Qi Zhu commented on YARN-10833:
---

Thanks [~bteke] for contribution and [~gandras] [~snemeth] for review.

Committed the PR to trunk.

 

 

> RM logs endpoint vulnerable to clickjacking
> ---
>
> Key: YARN-10833
> URL: https://issues.apache.org/jira/browse/YARN-10833
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: YARN-10833.001.patch, YARN-10833.002.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The /logs endpoint is missing the X-FRAME-OPTIONS header in the response, 
> even though YARN is configured to include it. This makes it vulnerable to 
> clickjacking.
> {code:java}
> Request URL: http://{{rm_host}}:8088/logs/
> Request Method: GET
> Status Code: 200 OK
> Remote Address: [::1]:8088
> Referrer Policy: strict-origin-when-cross-origin
> HTTP/1.1 200 OK
> Date: Fri, 25 Jun 2021 17:38:38 GMT
> Cache-Control: no-cache
> Expires: Fri, 25 Jun 2021 17:38:38 GMT
> Date: Fri, 25 Jun 2021 17:38:38 GMT
> Pragma: no-cache
> Content-Type: text/html;charset=utf-8
> X-Content-Type-Options: nosniff
> X-XSS-Protection: 1; mode=block
> Content-Length: 469 
> {code}
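For illustration only (the actual patch reuses YARN's existing HTTP header 
handling rather than introducing a new filter), a generic servlet filter that 
adds the missing X-Frame-Options header looks like this. The point of the issue 
is that the /logs endpoint does not pass through whatever adds the header for 
the other endpoints.

{code:java}
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletResponse;

/** Adds clickjacking protection to every response passing through it. */
public class XFrameOptionsFilter implements Filter {

  @Override
  public void init(FilterConfig filterConfig) {
  }

  @Override
  public void doFilter(ServletRequest request, ServletResponse response,
      FilterChain chain) throws IOException, ServletException {
    if (response instanceof HttpServletResponse) {
      // SAMEORIGIN allows framing only from the same origin as the RM UI.
      ((HttpServletResponse) response).setHeader("X-Frame-Options", "SAMEORIGIN");
    }
    chain.doFilter(request, response);
  }

  @Override
  public void destroy() {
  }
}
{code}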






[jira] [Commented] (YARN-10657) We should make max application per queue to support node label.

2021-07-22 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385471#comment-17385471
 ] 

Qi Zhu commented on YARN-10657:
---

Thanks a lot [~gandras] for patch !

Committed to trunk.

> We should make max application per queue to support node label.
> ---
>
> Key: YARN-10657
> URL: https://issues.apache.org/jira/browse/YARN-10657
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Andras Gyori
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10657.001.patch, YARN-10657.002.patch, 
> YARN-10657.003.patch, YARN-10657.004.patch, YARN-10657.005.patch
>
>
> https://issues.apache.org/jira/browse/YARN-10641?focusedCommentId=17291708&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17291708
> As we discussed in the above comment:
> We should dig into the label-related max applications per queue.
> I think when node labels are enabled on a queue, max applications should 
> consider the max capacity across all labels.
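A simplified sketch of the idea stated above, not the committed implementation: 
scale the system-wide max applications by the largest absolute capacity the 
queue has across its accessible labels.

{code:java}
import java.util.HashMap;
import java.util.Map;

public class LabelAwareMaxApps {

  /**
   * maxApplications for a queue = system max-applications scaled by the
   * highest absolute capacity the queue has on any accessible node label.
   */
  static int computeMaxApps(int systemMaxApplications,
      Map<String, Float> absoluteCapacityByLabel) {
    float maxAbsoluteCapacity = 0f;
    for (float cap : absoluteCapacityByLabel.values()) {
      maxAbsoluteCapacity = Math.max(maxAbsoluteCapacity, cap);
    }
    return (int) (systemMaxApplications * maxAbsoluteCapacity);
  }

  public static void main(String[] args) {
    Map<String, Float> absCapacityByLabel = new HashMap<>();
    absCapacityByLabel.put("", 0.1f);    // default partition: 10%
    absCapacityByLabel.put("gpu", 0.6f); // label "gpu": 60%
    // Prints 6000 instead of the 1000 that the default partition alone allows.
    System.out.println(computeMaxApps(10000, absCapacityByLabel));
  }
}
{code}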






[jira] [Commented] (YARN-10833) RM logs endpoint vulnerable to clickjacking

2021-07-22 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385306#comment-17385306
 ] 

Qi Zhu commented on YARN-10833:
---

Thanks [~bteke] for patch, LGTM +1.

cc [~aajisaka] [~snemeth]

If there are no other comments, I will help commit this.

> RM logs endpoint vulnerable to clickjacking
> ---
>
> Key: YARN-10833
> URL: https://issues.apache.org/jira/browse/YARN-10833
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
>  Labels: pull-request-available
> Attachments: YARN-10833.001.patch, YARN-10833.002.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The /logs endpoint is missing the X-FRAME-OPTIONS header in the response, 
> even though YARN is configured to include it. This makes it vulnerable to 
> clickjacking.
> {code:java}
> Request URL: http://{{rm_host}}:8088/logs/
> Request Method: GET
> Status Code: 200 OK
> Remote Address: [::1]:8088
> Referrer Policy: strict-origin-when-cross-origin
> HTTP/1.1 200 OK
> Date: Fri, 25 Jun 2021 17:38:38 GMT
> Cache-Control: no-cache
> Expires: Fri, 25 Jun 2021 17:38:38 GMT
> Date: Fri, 25 Jun 2021 17:38:38 GMT
> Pragma: no-cache
> Content-Type: text/html;charset=utf-8
> X-Content-Type-Options: nosniff
> X-XSS-Protection: 1; mode=block
> Content-Length: 469 
> {code}






[jira] [Commented] (YARN-10860) Make max container per heartbeat configs refreshable

2021-07-21 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385195#comment-17385195
 ] 

Qi Zhu commented on YARN-10860:
---

Thanks [~ebadger] for reminder.:)

I have cherry-picked back to the other active 3.x branches, which are 
branch-3.3 and branch-3.2. 

> Make max container per heartbeat configs refreshable
> 
>
> Key: YARN-10860
> URL: https://issues.apache.org/jira/browse/YARN-10860
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Fix For: 3.4.0, 2.10.2, 3.2.3, 3.3.2
>
> Attachments: YARN-10860.001.patch, YARN-10860.branch-2.10.001.patch
>
>
> {{yarn.scheduler.capacity.per-node-heartbeat.maximum-container-assignments}} 
> and 
> {{yarn.scheduler.capacity.per-node-heartbeat.multiple-assignments-enabled}} 
> are currently *not* refreshable configs, but I believe they should be. This 
> JIRA is to turn these into refreshable configs, just like 
> {{yarn.scheduler.capacity.per-node-heartbeat.maximum-offswitch-assignments}} 
> is.
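A hedged sketch of what "refreshable" means in practice (simplified; the real 
change lives in the CapacityScheduler reinitialization path, and the defaults 
shown are illustrative): re-read the two properties from the new configuration 
on every refresh instead of caching them only at start-up.

{code:java}
import org.apache.hadoop.conf.Configuration;

public class HeartbeatAssignmentSettings {

  static final String MULTI_ASSIGN_ENABLED =
      "yarn.scheduler.capacity.per-node-heartbeat.multiple-assignments-enabled";
  static final String MAX_ASSIGN_PER_HEARTBEAT =
      "yarn.scheduler.capacity.per-node-heartbeat.maximum-container-assignments";

  private volatile boolean multipleAssignmentsEnabled = true;
  private volatile int maxAssignPerHeartbeat = 100;

  /** Called at start-up and again on every 'yarn rmadmin -refreshQueues'. */
  void reinitialize(Configuration conf) {
    multipleAssignmentsEnabled = conf.getBoolean(MULTI_ASSIGN_ENABLED, true);
    maxAssignPerHeartbeat = conf.getInt(MAX_ASSIGN_PER_HEARTBEAT, 100);
  }
}
{code}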






[jira] [Updated] (YARN-10860) Make max container per heartbeat configs refreshable

2021-07-21 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-10860:
--
Fix Version/s: 3.3.2
   3.2.3

> Make max container per heartbeat configs refreshable
> 
>
> Key: YARN-10860
> URL: https://issues.apache.org/jira/browse/YARN-10860
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Fix For: 3.4.0, 2.10.2, 3.2.3, 3.3.2
>
> Attachments: YARN-10860.001.patch, YARN-10860.branch-2.10.001.patch
>
>
> {{yarn.scheduler.capacity.per-node-heartbeat.maximum-container-assignments}} 
> and 
> {{yarn.scheduler.capacity.per-node-heartbeat.multiple-assignments-enabled}} 
> are currently *not* refreshable configs, but I believe they should be. This 
> JIRA is to turn these into refreshable configs, just like 
> {{yarn.scheduler.capacity.per-node-heartbeat.maximum-offswitch-assignments}} 
> is.






[jira] [Commented] (YARN-10860) Make max container per heartbeat configs refreshable

2021-07-21 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17384715#comment-17384715
 ] 

Qi Zhu commented on YARN-10860:
---

Thanks [~ebadger] for patch and [~gandras] for review.

Committed to trunk and 2.10.

> Make max container per heartbeat configs refreshable
> 
>
> Key: YARN-10860
> URL: https://issues.apache.org/jira/browse/YARN-10860
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Fix For: 3.4.0, 2.10.2
>
> Attachments: YARN-10860.001.patch, YARN-10860.branch-2.10.001.patch
>
>
> {{yarn.scheduler.capacity.per-node-heartbeat.maximum-container-assignments}} 
> and 
> {{yarn.scheduler.capacity.per-node-heartbeat.multiple-assignments-enabled}} 
> are currently *not* refreshable configs, but I believe they should be. This 
> JIRA is to turn these into refreshable configs, just like 
> {{yarn.scheduler.capacity.per-node-heartbeat.maximum-offswitch-assignments}} 
> is.






[jira] [Commented] (YARN-10860) Make max container per heartbeat configs refreshable

2021-07-20 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17384603#comment-17384603
 ] 

Qi Zhu commented on YARN-10860:
---

LGTM, if [~gandras] has no other comments.

> Make max container per heartbeat configs refreshable
> 
>
> Key: YARN-10860
> URL: https://issues.apache.org/jira/browse/YARN-10860
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-10860.001.patch, YARN-10860.branch-2.10.001.patch
>
>
> {{yarn.scheduler.capacity.per-node-heartbeat.maximum-container-assignments}} 
> and 
> {{yarn.scheduler.capacity.per-node-heartbeat.multiple-assignments-enabled}} 
> are currently *not* refreshable configs, but I believe they should be. This 
> JIRA is to turn these into refreshable configs, just like 
> {{yarn.scheduler.capacity.per-node-heartbeat.maximum-offswitch-assignments}} 
> is.






[jira] [Commented] (YARN-10657) We should make max application per queue to support node label.

2021-07-20 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17384594#comment-17384594
 ] 

Qi Zhu commented on YARN-10657:
---

Thanks [~gandras] for the update, LGTM now; just fix the checkstyle issues.

> We should make max application per queue to support node label.
> ---
>
> Key: YARN-10657
> URL: https://issues.apache.org/jira/browse/YARN-10657
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Andras Gyori
>Priority: Major
> Attachments: YARN-10657.001.patch, YARN-10657.002.patch, 
> YARN-10657.003.patch, YARN-10657.004.patch
>
>
> https://issues.apache.org/jira/browse/YARN-10641?focusedCommentId=17291708&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17291708
> As we discussed in the above comment:
> We should dig into the label-related max applications per queue.
> I think when node labels are enabled on a queue, max applications should 
> consider the max capacity across all labels.






[jira] [Commented] (YARN-10630) [UI2] Ambiguous queue name resolution

2021-07-20 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17384573#comment-17384573
 ] 

Qi Zhu commented on YARN-10630:
---

Committed to trunk, thanks [~gandras] !

 

> [UI2] Ambiguous queue name resolution
> -
>
> Key: YARN-10630
> URL: https://issues.apache.org/jira/browse/YARN-10630
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screenshot 2021-07-19 at 15.30.38.png
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Yarn UIv2 uses queueName instead of queuePath (which was added in the 
> scheduler response in YARN-10610), which makes the queue resolution ambiguous 
> in case of identical queue short names (eg. root.a.b <-> root.b). This causes 
> invalid behaviour in multiple places.






[jira] [Resolved] (YARN-10630) [UI2] Ambiguous queue name resolution

2021-07-20 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu resolved YARN-10630.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

> [UI2] Ambiguous queue name resolution
> -
>
> Key: YARN-10630
> URL: https://issues.apache.org/jira/browse/YARN-10630
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: Screenshot 2021-07-19 at 15.30.38.png
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Yarn UIv2 uses queueName instead of queuePath (which was added in the 
> scheduler response in YARN-10610), which makes the queue resolution ambiguous 
> in case of identical queue short names (eg. root.a.b <-> root.b). This causes 
> invalid behaviour in multiple places.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10867) YARN should expose a ENV used to map a custom device into docker container

2021-07-20 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17384155#comment-17384155
 ] 

Qi Zhu commented on YARN-10867:
---

cc [~ebadger] [~epayne] [~gandras] [~Jim_Brennan] 

Could you take a look at this issue when you are free?

Thanks.

> YARN should expose a ENV used to map a custom device into docker container
> --
>
> Key: YARN-10867
> URL: https://issues.apache.org/jira/browse/YARN-10867
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Chi Heng
>Priority: Major
>
> In some scenarios, like mounting a FUSE filesystem in Docker, a user needs to map 
> a custom device (e.g. /dev/fuse) into the Docker container. I notice that an addDevice 
> method is defined in [ 
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/DockerRunCommand.java
>  
> |https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/DockerRunCommand.java]
>  , so I suppose that an ENV or config property should be exposed to the user to 
> invoke this method.
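A rough sketch of how such an ENV could be wired to DockerRunCommand#addDevice (the environment variable name and surrounding plumbing below are assumptions for illustration only, not an existing YARN interface):

{code:java}
// Hypothetical ENV name; expected value format: "/dev/fuse:/dev/fuse,/dev/foo:/dev/bar".
String devicesEnv = environment.get("YARN_CONTAINER_RUNTIME_DOCKER_DEVICES");
if (devicesEnv != null && !devicesEnv.isEmpty()) {
  for (String mapping : devicesEnv.split(",")) {
    String[] parts = mapping.split(":");
    String source = parts[0].trim();
    String destination = parts.length > 1 ? parts[1].trim() : source;
    // addDevice maps the host device into the container on the docker run command
    runCommand.addDevice(source, destination);
  }
}
{code}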



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-10860) Make max container per heartbeat configs refreshable

2021-07-19 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383789#comment-17383789
 ] 

Qi Zhu edited comment on YARN-10860 at 7/20/21, 5:55 AM:
-

Thanks [~ebadger] for filing this.

I totally agree with you that these should be refreshable. 

The patch LGTM; we do not need a unit test, as the change is clear.


was (Author: zhuqi):
Thanks [~ebadger] for filing this.

I totally agree with you that these should be refreshable.

> Make max container per heartbeat configs refreshable
> 
>
> Key: YARN-10860
> URL: https://issues.apache.org/jira/browse/YARN-10860
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-10860.001.patch, YARN-10860.branch-2.10.001.patch
>
>
> {{yarn.scheduler.capacity.per-node-heartbeat.maximum-container-assignments}} 
> and 
> {{yarn.scheduler.capacity.per-node-heartbeat.multiple-assignments-enabled}} 
> are currently *not* refreshable configs, but I believe they should be. This 
> JIRA is to turn these into refreshable configs, just like 
> {{yarn.scheduler.capacity.per-node-heartbeat.maximum-offswitch-assignments}} 
> is.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10860) Make max container per heartbeat configs refreshable

2021-07-19 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383789#comment-17383789
 ] 

Qi Zhu commented on YARN-10860:
---

Thanks [~ebadger] for filing this.

I totally agree with you that these should be refreshable.

> Make max container per heartbeat configs refreshable
> 
>
> Key: YARN-10860
> URL: https://issues.apache.org/jira/browse/YARN-10860
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-10860.001.patch, YARN-10860.branch-2.10.001.patch
>
>
> {{yarn.scheduler.capacity.per-node-heartbeat.maximum-container-assignments}} 
> and 
> {{yarn.scheduler.capacity.per-node-heartbeat.multiple-assignments-enabled}} 
> are currently *not* refreshable configs, but I believe they should be. This 
> JIRA is to turn these into refreshable configs, just like 
> {{yarn.scheduler.capacity.per-node-heartbeat.maximum-offswitch-assignments}} 
> is.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10858) [UI2] YARN-10826 breaks Queue view

2021-07-19 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383778#comment-17383778
 ] 

Qi Zhu commented on YARN-10858:
---

Thanks [~aajisaka] for backport.

> [UI2] YARN-10826 breaks Queue view
> --
>
> Key: YARN-10858
> URL: https://issues.apache.org/jira/browse/YARN-10858
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn-ui-v2
>Reporter: Andras Gyori
>Assignee: Masatake Iwasaki
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
> Attachments: Screenshot 2021-07-19 at 11.40.57.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> With YARN-10826, UIv2 was upgraded to EmberJS 2.11.3. However, the Queues tab 
> is broken and loads an empty page. After reverting the commit, the page is 
> working as intended.
> We need to investigate what causes this issue, and how we could mitigate it 
> without reverting the commit back.
> cc.
> [~iwasakims] [~aajisaka]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-10858) [UI2] YARN-10826 breaks Queue view

2021-07-19 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu resolved YARN-10858.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

> [UI2] YARN-10826 breaks Queue view
> --
>
> Key: YARN-10858
> URL: https://issues.apache.org/jira/browse/YARN-10858
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn-ui-v2
>Reporter: Andras Gyori
>Assignee: Masatake Iwasaki
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: Screenshot 2021-07-19 at 11.40.57.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> With YARN-10826, UIv2 was upgraded to EmberJS 2.11.3. However, the Queues tab 
> is broken and loads an empty page. After reverting the commit, the page is 
> working as intended.
> We need to investigate what causes this issue, and how we could mitigate it 
> without reverting the commit back.
> cc.
> [~iwasakims] [~aajisaka]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10858) [UI2] YARN-10826 breaks Queue view

2021-07-19 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383726#comment-17383726
 ] 

Qi Zhu commented on YARN-10858:
---

Thanks [~iwasakims] for fix, and [~gandras] [~aajisaka] for review.

Committed to trunk.

> [UI2] YARN-10826 breaks Queue view
> --
>
> Key: YARN-10858
> URL: https://issues.apache.org/jira/browse/YARN-10858
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn-ui-v2
>Reporter: Andras Gyori
>Assignee: Masatake Iwasaki
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screenshot 2021-07-19 at 11.40.57.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> With YARN-10826, UIv2 was upgraded to EmberJS 2.11.3. However, the Queues tab 
> is broken and loads an empty page. After reverting the commit, the page is 
> working as intended.
> We need to investigate what causes this issue, and how we could mitigate it 
> without reverting the commit back.
> cc.
> [~iwasakims] [~aajisaka]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-10630) [UI2] Ambiguous queue name resolution

2021-07-19 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383720#comment-17383720
 ] 

Qi Zhu edited comment on YARN-10630 at 7/20/21, 3:37 AM:
-

Thanks [~gandras] for the work.

Patch LGTM, I will commit it if there are no other comments.


was (Author: zhuqi):
Thanks [~zhuqi] for the work.

Patch LGTM, I will commit it if there are no other comments.

> [UI2] Ambiguous queue name resolution
> -
>
> Key: YARN-10630
> URL: https://issues.apache.org/jira/browse/YARN-10630
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screenshot 2021-07-19 at 15.30.38.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Yarn UIv2 uses queueName instead of queuePath (which was added in the 
> scheduler response in YARN-10610), which makes the queue resolution ambiguous 
> in case of identical queue short names (eg. root.a.b <-> root.b). This causes 
> invalid behaviour in multiple places.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10630) [UI2] Ambiguous queue name resolution

2021-07-19 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383720#comment-17383720
 ] 

Qi Zhu commented on YARN-10630:
---

Thanks [~zhuqi] for the work.

Patch LGTM, I will commit it if there are no other comments.

> [UI2] Ambiguous queue name resolution
> -
>
> Key: YARN-10630
> URL: https://issues.apache.org/jira/browse/YARN-10630
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screenshot 2021-07-19 at 15.30.38.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Yarn UIv2 uses queueName instead of queuePath (which was added in the 
> scheduler response in YARN-10610), which makes the queue resolution ambiguous 
> in case of identical queue short names (eg. root.a.b <-> root.b). This causes 
> invalid behaviour in multiple places.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-1187) Add discrete event-based simulation to yarn scheduler simulator

2021-07-18 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382957#comment-17382957
 ] 

Qi Zhu edited comment on YARN-1187 at 7/19/21, 2:25 AM:


Thanks [~108anup] for the patch.

Could you rebase it on trunk so that Jenkins can be triggered?

But the patch is quite big and hard to review; we had better split it into some 
subtasks for review.


was (Author: zhuqi):
Thanks [~108anup] for the patch.

Could you rebase it on trunk so that Jenkins can be triggered?

> Add discrete event-based simulation to yarn scheduler simulator
> ---
>
> Key: YARN-1187
> URL: https://issues.apache.org/jira/browse/YARN-1187
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Wei Yan
>Assignee: Andrew Chung
>Priority: Major
> Attachments: YARN-1187 design doc.pdf, 
> YARN-1187-branch-2.1.3.001.patch, YARN-1187-trunk.001.patch
>
>
> Follow the discussion in YARN-1021.
> Discrete event simulation decouples the running from any real-world clock. 
> This allows users to step through the execution, set debug points, and 
> reliably get a deterministic re-execution. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-1187) Add discrete event-based simulation to yarn scheduler simulator

2021-07-18 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382957#comment-17382957
 ] 

Qi Zhu commented on YARN-1187:
--

Thanks [~108anup] for the patch.

Could you rebase it on trunk so that Jenkins can be triggered?

> Add discrete event-based simulation to yarn scheduler simulator
> ---
>
> Key: YARN-1187
> URL: https://issues.apache.org/jira/browse/YARN-1187
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Wei Yan
>Assignee: Andrew Chung
>Priority: Major
> Attachments: YARN-1187 design doc.pdf, 
> YARN-1187-branch-2.1.3.001.patch, YARN-1187-trunk.001.patch
>
>
> Follow the discussion in YARN-1021.
> Discrete event simulation decouples the running from any real-world clock. 
> This allows users to step through the execution, set debug points, and 
> reliably get a deterministic re-execution. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10855) yarn logs cli fails to retrieve logs if any TFile is corrupt or empty

2021-07-18 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382955#comment-17382955
 ] 

Qi Zhu commented on YARN-10855:
---

Thanks [~Jim_Brennan] for patch!

Committed to trunk. 

> yarn logs cli fails to retrieve logs if any TFile is corrupt or empty
> -
>
> Key: YARN-10855
> URL: https://issues.apache.org/jira/browse/YARN-10855
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.2.2, 2.10.1, 3.4.0, 3.3.1
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Major
> Attachments: YARN-10855.001.patch, YARN-10855.002.patch, 
> YARN-10855.003.patch
>
>
> When attempting to retrieve yarn logs via the CLI command, it failed with the 
> following stack trace (on branch-2.10):
> {noformat}
> yarn logs -applicationId application_1591017890475_1049740 > logs
> 20/06/05 19:15:50 INFO client.RMProxy: Connecting to ResourceManager 
> 20/06/05 19:15:51 INFO client.AHSProxy: Connecting to Application History 
> server 
> Exception in thread "main" java.io.EOFException: Cannot seek to negative 
> offset
>   at org.apache.hadoop.hdfs.DFSInputStream.seek(DFSInputStream.java:1701)
>   at 
> org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:65)
>   at org.apache.hadoop.io.file.tfile.BCFile$Reader.<init>(BCFile.java:624)
>   at org.apache.hadoop.io.file.tfile.TFile$Reader.<init>(TFile.java:804)
>   at 
> org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.<init>(AggregatedLogFormat.java:503)
>   at 
> org.apache.hadoop.yarn.logaggregation.LogCLIHelpers.dumpAllContainersLogs(LogCLIHelpers.java:227)
>   at org.apache.hadoop.yarn.client.cli.LogsCLI.run(LogsCLI.java:333)
>   at org.apache.hadoop.yarn.client.cli.LogsCLI.main(LogsCLI.java:367) 
> {noformat}
> The problem was that there was a zero-length TFile for one of the containers 
> in the application aggregated log directory in hdfs.  When we removed the 
> zero length file, {{yarn logs}} was able to retrieve the logs.
> A corrupt or zero length TFile for one container should not prevent loading 
> logs for the rest of the application.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-10855) yarn logs cli fails to retrieve logs if any TFile is corrupt or empty

2021-07-16 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382403#comment-17382403
 ] 

Qi Zhu edited comment on YARN-10855 at 7/17/21, 2:42 AM:
-

Thanks [~Jim_Brennan] for the update.

cc [~epayne]

If there are no other comments, I will commit it.


was (Author: zhuqi):
Thanks [~Jim_Brennan] for the update.

If there are no other comments, I will commit it.

> yarn logs cli fails to retrieve logs if any TFile is corrupt or empty
> -
>
> Key: YARN-10855
> URL: https://issues.apache.org/jira/browse/YARN-10855
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.2.2, 2.10.1, 3.4.0, 3.3.1
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Major
> Attachments: YARN-10855.001.patch, YARN-10855.002.patch, 
> YARN-10855.003.patch
>
>
> When attempting to retrieve yarn logs via the CLI command, it failed with the 
> following stack trace (on branch-2.10):
> {noformat}
> yarn logs -applicationId application_1591017890475_1049740 > logs
> 20/06/05 19:15:50 INFO client.RMProxy: Connecting to ResourceManager 
> 20/06/05 19:15:51 INFO client.AHSProxy: Connecting to Application History 
> server 
> Exception in thread "main" java.io.EOFException: Cannot seek to negative 
> offset
>   at org.apache.hadoop.hdfs.DFSInputStream.seek(DFSInputStream.java:1701)
>   at 
> org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:65)
>   at org.apache.hadoop.io.file.tfile.BCFile$Reader.<init>(BCFile.java:624)
>   at org.apache.hadoop.io.file.tfile.TFile$Reader.<init>(TFile.java:804)
>   at 
> org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.<init>(AggregatedLogFormat.java:503)
>   at 
> org.apache.hadoop.yarn.logaggregation.LogCLIHelpers.dumpAllContainersLogs(LogCLIHelpers.java:227)
>   at org.apache.hadoop.yarn.client.cli.LogsCLI.run(LogsCLI.java:333)
>   at org.apache.hadoop.yarn.client.cli.LogsCLI.main(LogsCLI.java:367) 
> {noformat}
> The problem was that there was a zero-length TFile for one of the containers 
> in the application aggregated log directory in hdfs.  When we removed the 
> zero length file, {{yarn logs}} was able to retrieve the logs.
> A corrupt or zero length TFile for one container should not prevent loading 
> logs for the rest of the application.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10855) yarn logs cli fails to retrieve logs if any TFile is corrupt or empty

2021-07-16 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382403#comment-17382403
 ] 

Qi Zhu commented on YARN-10855:
---

Thanks [~Jim_Brennan] for the update.

If there are no other comments, I will commit it.

> yarn logs cli fails to retrieve logs if any TFile is corrupt or empty
> -
>
> Key: YARN-10855
> URL: https://issues.apache.org/jira/browse/YARN-10855
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.2.2, 2.10.1, 3.4.0, 3.3.1
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Major
> Attachments: YARN-10855.001.patch, YARN-10855.002.patch, 
> YARN-10855.003.patch
>
>
> When attempting to retrieve yarn logs via the CLI command, it failed with the 
> following stack trace (on branch-2.10):
> {noformat}
> yarn logs -applicationId application_1591017890475_1049740 > logs
> 20/06/05 19:15:50 INFO client.RMProxy: Connecting to ResourceManager 
> 20/06/05 19:15:51 INFO client.AHSProxy: Connecting to Application History 
> server 
> Exception in thread "main" java.io.EOFException: Cannot seek to negative 
> offset
>   at org.apache.hadoop.hdfs.DFSInputStream.seek(DFSInputStream.java:1701)
>   at 
> org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:65)
>   at org.apache.hadoop.io.file.tfile.BCFile$Reader.<init>(BCFile.java:624)
>   at org.apache.hadoop.io.file.tfile.TFile$Reader.<init>(TFile.java:804)
>   at 
> org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.<init>(AggregatedLogFormat.java:503)
>   at 
> org.apache.hadoop.yarn.logaggregation.LogCLIHelpers.dumpAllContainersLogs(LogCLIHelpers.java:227)
>   at org.apache.hadoop.yarn.client.cli.LogsCLI.run(LogsCLI.java:333)
>   at org.apache.hadoop.yarn.client.cli.LogsCLI.main(LogsCLI.java:367) 
> {noformat}
> The problem was that there was a zero-length TFile for one of the containers 
> in the application aggregated log directory in hdfs.  When we removed the 
> zero length file, {{yarn logs}} was able to retrieve the logs.
> A corrupt or zero length TFile for one container should not prevent loading 
> logs for the rest of the application.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10855) yarn logs cli fails to retrieve logs if any TFile is corrupt or empty

2021-07-15 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17381791#comment-17381791
 ] 

Qi Zhu commented on YARN-10855:
---

Thanks [~Jim_Brennan] for the patch.

Could we change the LogAggregationTFileController to close the resource with 
try-with-resources instead of in a finally block?

It's a minor suggestion; the patch LGTM generally.
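A minimal sketch of that suggestion, assuming the reader is AutoCloseable and using a hypothetical dumpContainerLogsFromReader helper for the existing per-file dump logic (not the actual patch):

{code:java}
for (FileStatus thisNodeFile : nodeFiles) {
  // try-with-resources closes the reader even if dumping fails part-way
  try (AggregatedLogFormat.LogReader reader =
      new AggregatedLogFormat.LogReader(conf, thisNodeFile.getPath())) {
    dumpContainerLogsFromReader(reader);
  } catch (IOException e) {
    // a corrupt or zero-length TFile for one container should not abort
    // the dump for the rest of the application
    System.err.println("Skipping unreadable aggregated log file "
        + thisNodeFile.getPath() + ": " + e.getMessage());
  }
}
{code}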

 

> yarn logs cli fails to retrieve logs if any TFile is corrupt or empty
> -
>
> Key: YARN-10855
> URL: https://issues.apache.org/jira/browse/YARN-10855
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.2.2, 2.10.1, 3.4.0, 3.3.1
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Major
> Attachments: YARN-10855.001.patch
>
>
> When attempting to retrieve yarn logs via the CLI command, it failed with the 
> following stack trace (on branch-2.10):
> {noformat}
> yarn logs -applicationId application_1591017890475_1049740 > logs
> 20/06/05 19:15:50 INFO client.RMProxy: Connecting to ResourceManager 
> 20/06/05 19:15:51 INFO client.AHSProxy: Connecting to Application History 
> server 
> Exception in thread "main" java.io.EOFException: Cannot seek to negative 
> offset
>   at org.apache.hadoop.hdfs.DFSInputStream.seek(DFSInputStream.java:1701)
>   at 
> org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:65)
>   at org.apache.hadoop.io.file.tfile.BCFile$Reader.<init>(BCFile.java:624)
>   at org.apache.hadoop.io.file.tfile.TFile$Reader.<init>(TFile.java:804)
>   at 
> org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.<init>(AggregatedLogFormat.java:503)
>   at 
> org.apache.hadoop.yarn.logaggregation.LogCLIHelpers.dumpAllContainersLogs(LogCLIHelpers.java:227)
>   at org.apache.hadoop.yarn.client.cli.LogsCLI.run(LogsCLI.java:333)
>   at org.apache.hadoop.yarn.client.cli.LogsCLI.main(LogsCLI.java:367) 
> {noformat}
> The problem was that there was a zero-length TFile for one of the containers 
> in the application aggregated log directory in hdfs.  When we removed the 
> zero length file, {{yarn logs}} was able to retrieve the logs.
> A corrupt or zero length TFile for one container should not prevent loading 
> logs for the rest of the application.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-10456) RM PartitionQueueMetrics records are named QueueMetrics in Simon metrics registry

2021-07-15 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17380826#comment-17380826
 ] 

Qi Zhu edited comment on YARN-10456 at 7/15/21, 10:40 AM:
--

[~Jim_Brennan], [~ebadger], [~zhuqi] , [~prabhujoseph], [~BilwaST], [~snemeth] :
 Would someone be willing to review this? Thanks!


was (Author: eepayne):
[~Jim_Brennan], [~ebadger], [~edfi202], [~prabhujoseph], [~BilwaST], [~snemeth] 
:
Would someone be willing to review this? Thanks!

> RM PartitionQueueMetrics records are named QueueMetrics in Simon metrics 
> registry
> -
>
> Key: YARN-10456
> URL: https://issues.apache.org/jira/browse/YARN-10456
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 3.3.0, 3.2.1, 3.1.4, 2.10.1
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-10456.001.patch
>
>
> Several queue metrics (such as AppsRunning, PendingContainers, etc.) stopped 
> working after we upgraded to 2.10.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10456) RM PartitionQueueMetrics records are named QueueMetrics in Simon metrics registry

2021-07-15 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17381238#comment-17381238
 ] 

Qi Zhu commented on YARN-10456:
---

Thanks [~epayne] for this work.

The patch LGTM +1.

> RM PartitionQueueMetrics records are named QueueMetrics in Simon metrics 
> registry
> -
>
> Key: YARN-10456
> URL: https://issues.apache.org/jira/browse/YARN-10456
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 3.3.0, 3.2.1, 3.1.4, 2.10.1
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-10456.001.patch
>
>
> Several queue metrics (such as AppsRunning, PendingContainers, etc.) stopped 
> working after we upgraded to 2.10.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10657) We should make max application per queue to support node label.

2021-07-08 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377715#comment-17377715
 ] 

Qi Zhu commented on YARN-10657:
---

Thanks [~gandras] for digging into this.

I agree with you that it's hard to have a perfect solution to handle this. I 
think we should use the limit for the node label set in 
default-node-label-expression.

But if the user sets the limit for another node label, should we disable this, or 
take the maximum of the max-application limits across all labels, just like my original patch?

 

 

> We should make max application per queue to support node label.
> ---
>
> Key: YARN-10657
> URL: https://issues.apache.org/jira/browse/YARN-10657
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Andras Gyori
>Priority: Major
> Attachments: YARN-10657.001.patch, YARN-10657.002.patch, 
> YARN-10657.003.patch
>
>
> https://issues.apache.org/jira/browse/YARN-10641?focusedCommentId=17291708&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17291708
> As we discussed in the above comment:
> We should look deeper into the label-related max applications per queue.
> I think when node labels are enabled in a queue, max applications should consider 
> the max capacity of all labels.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10846) Add dispatcher metrics to NM

2021-07-06 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17376168#comment-17376168
 ] 

Qi Zhu commented on YARN-10846:
---

Thanks [~chaosju] for this patch.

We had better add a unit test for this.

> Add dispatcher metrics to NM
> 
>
> Key: YARN-10846
> URL: https://issues.apache.org/jira/browse/YARN-10846
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: chaosju
>Assignee: chaosju
>Priority: Major
> Attachments: YARN-10846.001.patch, screenshot-1.png
>
>
> Based on [YARN-9615|https://issues.apache.org/jira/browse/YARN-9615], add 
> dispatcher metrics to the NM. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-10657) We should make max application per queue to support node label.

2021-06-09 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu reassigned YARN-10657:
-

Assignee: Andras Gyori

> We should make max application per queue to support node label.
> ---
>
> Key: YARN-10657
> URL: https://issues.apache.org/jira/browse/YARN-10657
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Andras Gyori
>Priority: Major
> Attachments: YARN-10657.001.patch, YARN-10657.002.patch
>
>
> https://issues.apache.org/jira/browse/YARN-10641?focusedCommentId=17291708&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17291708
> As we discussed in the above comment:
> We should look deeper into the label-related max applications per queue.
> I think when node labels are enabled in a queue, max applications should consider 
> the max capacity of all labels.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10657) We should make max application per queue to support node label.

2021-06-09 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17360109#comment-17360109
 ] 

Qi Zhu commented on YARN-10657:
---

[~gandras] Of course you can take it, and I will help review. :D

Assigned it to you.

> We should make max application per queue to support node label.
> ---
>
> Key: YARN-10657
> URL: https://issues.apache.org/jira/browse/YARN-10657
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10657.001.patch, YARN-10657.002.patch
>
>
> https://issues.apache.org/jira/browse/YARN-10641?focusedCommentId=17291708&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17291708
> As we discussed in the above comment:
> We should look deeper into the label-related max applications per queue.
> I think when node labels are enabled in a queue, max applications should consider 
> the max capacity of all labels.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-10657) We should make max application per queue to support node label.

2021-06-09 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu reassigned YARN-10657:
-

Assignee: (was: Qi Zhu)

> We should make max application per queue to support node label.
> ---
>
> Key: YARN-10657
> URL: https://issues.apache.org/jira/browse/YARN-10657
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Priority: Major
> Attachments: YARN-10657.001.patch, YARN-10657.002.patch
>
>
> https://issues.apache.org/jira/browse/YARN-10641?focusedCommentId=17291708&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17291708
> As we discussed in the above comment:
> We should look deeper into the label-related max applications per queue.
> I think when node labels are enabled in a queue, max applications should consider 
> the max capacity of all labels.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10801) Fix Auto Queue template to properly set all configuration properties

2021-06-09 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17360017#comment-17360017
 ] 

Qi Zhu commented on YARN-10801:
---

Thanks [~gandras] for update.

The latest patch LGTM.

> Fix Auto Queue template to properly set all configuration properties
> 
>
> Key: YARN-10801
> URL: https://issues.apache.org/jira/browse/YARN-10801
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
> Attachments: YARN-10801.001.patch, YARN-10801.002.patch, 
> YARN-10801.003.patch, YARN-10801.004.patch
>
>
> Currently Auto Queue templates set configuration properties only on the 
> Configuration object passed in the constructor. Due to the fact that a lot 
> of configuration values are read from the Configuration object in csContext, 
> template properties are not set in every case. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-10801) Fix Auto Queue template to properly set all configuration properties

2021-06-08 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17359366#comment-17359366
 ] 

Qi Zhu edited comment on YARN-10801 at 6/8/21, 1:42 PM:


Thanks [~gandras] for the patch, LGTM now.

I have a question about the code: since the user limit factor is 
already unlimited, should we also set 
MaximumApplicationMasterResourcePerQueuePercent to 100%, i.e. make it unlimited as well?

Any other value such as 0.5, 0.6 or 0.7 is arbitrary, and we can't define an accurate value. What do you 
think about this?
cc [~bteke] [~gandras] 
{code:java}
if (isDynamicQueue()) {
  // set to -1, to disable it
  configuration.setUserLimitFactor(getQueuePath(), -1);
  // Set Max AM percentage to a higher value
  configuration.setMaximumApplicationMasterResourcePerQueuePercent(
  getQueuePath(), 0.5f);
}
{code}
Thanks.


was (Author: zhuqi):
Thanks [~gandras] for the patch, LGTM now.


I have a question about the code: since the user limit factor is 
already unlimited, should we also set 
MaximumApplicationMasterResourcePerQueuePercent to 100%, i.e. make it unlimited as well?
{code:java}
if (isDynamicQueue()) {
  // set to -1, to disable it
  configuration.setUserLimitFactor(getQueuePath(), -1);
  // Set Max AM percentage to a higher value
  configuration.setMaximumApplicationMasterResourcePerQueuePercent(
  getQueuePath(), 0.5f);
}
{code}
Thanks.

> Fix Auto Queue template to properly set all configuration properties
> 
>
> Key: YARN-10801
> URL: https://issues.apache.org/jira/browse/YARN-10801
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
> Attachments: YARN-10801.001.patch, YARN-10801.002.patch, 
> YARN-10801.003.patch
>
>
> Currently Auto Queue templates set configuration properties only on the 
> Configuration object passed in the constructor. Due to the fact that a lot 
> of configuration values are read from the Configuration object in csContext, 
> template properties are not set in every case. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10801) Fix Auto Queue template to properly set all configuration properties

2021-06-08 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17359366#comment-17359366
 ] 

Qi Zhu commented on YARN-10801:
---

Thanks [~gandras] for the patch, LGTM now.


I have a question about the code: since the user limit factor is 
already unlimited, should we also set 
MaximumApplicationMasterResourcePerQueuePercent to 100%, i.e. make it unlimited as well?
{code:java}
if (isDynamicQueue()) {
  // set to -1, to disable it
  configuration.setUserLimitFactor(getQueuePath(), -1);
  // Set Max AM percentage to a higher value
  configuration.setMaximumApplicationMasterResourcePerQueuePercent(
  getQueuePath(), 0.5f);
}
{code}
Thanks.

> Fix Auto Queue template to properly set all configuration properties
> 
>
> Key: YARN-10801
> URL: https://issues.apache.org/jira/browse/YARN-10801
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
> Attachments: YARN-10801.001.patch, YARN-10801.002.patch, 
> YARN-10801.003.patch
>
>
> Currently Auto Queue templates set configuration properties only on the 
> Configuration object passed in the constructor. Due to the fact that a lot 
> of configuration values are read from the Configuration object in csContext, 
> template properties are not set in every case. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10807) Parents node labels are incorrectly added to child queues in weight mode

2021-06-08 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17359343#comment-17359343
 ] 

Qi Zhu commented on YARN-10807:
---

Thanks [~bteke] for patch and [~gandras] for review.

Committed to trunk.

> Parents node labels are incorrectly added to child queues in weight mode 
> -
>
> Key: YARN-10807
> URL: https://issues.apache.org/jira/browse/YARN-10807
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
> Attachments: YARN-10807.001.patch, YARN-10807.002.patch
>
>
> In ParentQueue.updateClusterResource, when calculating the normalized weights, 
> CS will iterate through the parent's node labels. If the parent has a node 
> label that a specific child doesn't, it will incorrectly be added to the child's 
> node label list through the queueCapacities.setNormalizedWeight(label, 
> weight) call:
> {code:java}
> // Normalize weight of children
>   if (getCapacityConfigurationTypeForQueues(childQueues)
>   == QueueCapacityType.WEIGHT) {
> for (String nodeLabel : queueCapacities.getExistingNodeLabels()) {
>   float sumOfWeight = 0;
>   for (CSQueue queue : childQueues) {
> float weight = Math.max(0,
> queue.getQueueCapacities().getWeight(nodeLabel));
> sumOfWeight += weight;
>   }
>   // When sum of weight == 0, skip setting normalized_weight (so
>   // normalized weight will be 0).
>   if (Math.abs(sumOfWeight) > 1e-6) {
> for (CSQueue queue : childQueues) {
> queue.getQueueCapacities().setNormalizedWeight(nodeLabel,
> queue.getQueueCapacities().getWeight(nodeLabel) / 
> sumOfWeight);
> }
>   }
> }
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10807) Parents node labels are incorrectly added to child queues in weight mode

2021-06-07 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17358980#comment-17358980
 ] 

Qi Zhu commented on YARN-10807:
---

Thanks [~bteke] for update.

The patch LGTM.

 

> Parents node labels are incorrectly added to child queues in weight mode 
> -
>
> Key: YARN-10807
> URL: https://issues.apache.org/jira/browse/YARN-10807
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
> Attachments: YARN-10807.001.patch, YARN-10807.002.patch
>
>
> In ParentQueue.updateClusterResource, when calculating the normalized weights, 
> CS will iterate through the parent's node labels. If the parent has a node 
> label that a specific child doesn't, it will incorrectly be added to the child's 
> node label list through the queueCapacities.setNormalizedWeight(label, 
> weight) call:
> {code:java}
> // Normalize weight of children
>   if (getCapacityConfigurationTypeForQueues(childQueues)
>   == QueueCapacityType.WEIGHT) {
> for (String nodeLabel : queueCapacities.getExistingNodeLabels()) {
>   float sumOfWeight = 0;
>   for (CSQueue queue : childQueues) {
> float weight = Math.max(0,
> queue.getQueueCapacities().getWeight(nodeLabel));
> sumOfWeight += weight;
>   }
>   // When sum of weight == 0, skip setting normalized_weight (so
>   // normalized weight will be 0).
>   if (Math.abs(sumOfWeight) > 1e-6) {
> for (CSQueue queue : childQueues) {
> queue.getQueueCapacities().setNormalizedWeight(nodeLabel,
> queue.getQueueCapacities().getWeight(nodeLabel) / 
> sumOfWeight);
> }
>   }
> }
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-10807) Parents node labels are incorrectly added to child queues in weight mode

2021-06-07 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17358486#comment-17358486
 ] 

Qi Zhu edited comment on YARN-10807 at 6/7/21, 9:54 AM:


Thanks [~bteke] for this work.

We had better also skip node labels that do not exist on the child in the sum logic, even though 
we will add 0 when the label does not exist:
{code:java}
for (CSQueue queue : childQueues) {
float weight = Math.max(0,
queue.getQueueCapacities().getWeight(nodeLabel));
sumOfWeight += weight;
 }
{code}
Other things LGTM.

Thanks.


was (Author: zhuqi):
Thanks [~bteke] for this work.

Could we also skip node labels that do not exist on the child in the sum logic, even though we will add 0 
when the label does not exist:
{code:java}
for (CSQueue queue : childQueues) {
float weight = Math.max(0,
queue.getQueueCapacities().getWeight(nodeLabel));
sumOfWeight += weight;
   }
{code}
Other things LGTM.

Thanks.
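A minimal sketch of that skip (illustrative only, not the committed fix; it assumes QueueCapacities#getExistingNodeLabels returns the set of labels the child queue actually has):

{code:java}
for (CSQueue queue : childQueues) {
  QueueCapacities childCapacities = queue.getQueueCapacities();
  // skip labels the child queue does not configure, instead of counting them as weight 0
  if (!childCapacities.getExistingNodeLabels().contains(nodeLabel)) {
    continue;
  }
  sumOfWeight += Math.max(0, childCapacities.getWeight(nodeLabel));
}
{code}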

> Parents node labels are incorrectly added to child queues in weight mode 
> -
>
> Key: YARN-10807
> URL: https://issues.apache.org/jira/browse/YARN-10807
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
> Attachments: YARN-10807.001.patch
>
>
> In ParentQueue.updateClusterResource, when calculating the normalized weights, 
> CS will iterate through the parent's node labels. If the parent has a node 
> label that a specific child doesn't, it will incorrectly be added to the child's 
> node label list through the queueCapacities.setNormalizedWeight(label, 
> weight) call:
> {code:java}
> // Normalize weight of children
>   if (getCapacityConfigurationTypeForQueues(childQueues)
>   == QueueCapacityType.WEIGHT) {
> for (String nodeLabel : queueCapacities.getExistingNodeLabels()) {
>   float sumOfWeight = 0;
>   for (CSQueue queue : childQueues) {
> float weight = Math.max(0,
> queue.getQueueCapacities().getWeight(nodeLabel));
> sumOfWeight += weight;
>   }
>   // When sum of weight == 0, skip setting normalized_weight (so
>   // normalized weight will be 0).
>   if (Math.abs(sumOfWeight) > 1e-6) {
> for (CSQueue queue : childQueues) {
> queue.getQueueCapacities().setNormalizedWeight(nodeLabel,
> queue.getQueueCapacities().getWeight(nodeLabel) / 
> sumOfWeight);
> }
>   }
> }
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10807) Parents node labels are incorrectly added to child queues in weight mode

2021-06-07 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17358486#comment-17358486
 ] 

Qi Zhu commented on YARN-10807:
---

Thanks [~bteke] for this work.

Could we also skip node labels that do not exist on the child in the sum logic, even though we will add 0 
when the label does not exist:
{code:java}
for (CSQueue queue : childQueues) {
float weight = Math.max(0,
queue.getQueueCapacities().getWeight(nodeLabel));
sumOfWeight += weight;
   }
{code}
Other things LGTM.

Thanks.

> Parents node labels are incorrectly added to child queues in weight mode 
> -
>
> Key: YARN-10807
> URL: https://issues.apache.org/jira/browse/YARN-10807
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
> Attachments: YARN-10807.001.patch
>
>
> In ParentQueue.updateClusterResource, when calculating the normalized weights, 
> CS will iterate through the parent's node labels. If the parent has a node 
> label that a specific child doesn't, it will incorrectly be added to the child's 
> node label list through the queueCapacities.setNormalizedWeight(label, 
> weight) call:
> {code:java}
> // Normalize weight of children
>   if (getCapacityConfigurationTypeForQueues(childQueues)
>   == QueueCapacityType.WEIGHT) {
> for (String nodeLabel : queueCapacities.getExistingNodeLabels()) {
>   float sumOfWeight = 0;
>   for (CSQueue queue : childQueues) {
> float weight = Math.max(0,
> queue.getQueueCapacities().getWeight(nodeLabel));
> sumOfWeight += weight;
>   }
>   // When sum of weight == 0, skip setting normalized_weight (so
>   // normalized weight will be 0).
>   if (Math.abs(sumOfWeight) > 1e-6) {
> for (CSQueue queue : childQueues) {
> queue.getQueueCapacities().setNormalizedWeight(nodeLabel,
> queue.getQueueCapacities().getWeight(nodeLabel) / 
> sumOfWeight);
> }
>   }
> }
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10789) RM HA startup can fail due to race conditions in ZKConfigurationStore

2021-06-03 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17356504#comment-17356504
 ] 

Qi Zhu commented on YARN-10789:
---

Thanks [~tarunparimi] for this work.

The latest patch LGTM. +1

> RM HA startup can fail due to race conditions in ZKConfigurationStore
> -
>
> Key: YARN-10789
> URL: https://issues.apache.org/jira/browse/YARN-10789
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Major
> Attachments: YARN-10789.001.patch, YARN-10789.002.patch
>
>
> We are observing the below error randomly during Hadoop install and RM initial 
> startup when HA is enabled and yarn.scheduler.configuration.store.class=zk is 
> configured. This causes one of the RMs to fail to start up.
> {code:java}
> 2021-05-26 12:59:18,986 INFO org.apache.hadoop.service.AbstractService: 
> Service RMActiveServices failed in state INITED
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: 
> org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = 
> NodeExists for /confstore/CONF_STORE
> {code}
> We are trying to create the znode /confstore/CONF_STORE when we initialize 
> the ZKConfigurationStore. But the problem is that the ZKConfigurationStore is 
> initialized when CapacityScheduler does a serviceInit. This serviceInit is 
> done by both Active and Standby RM. So we can run into a race condition when 
> both Active and Standby try to create the same znode when both RM are started 
> at same time.
> ZKRMStateStore on the other hand avoids such race conditions, by creating the 
> znodes only after serviceStart. serviceStart only happens for the active RM 
> which won the leader election, unlike serviceInit which happens irrespective 
> of leader election.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10796) Capacity Scheduler: dynamic queue cannot scale out properly if its capacity is 0%

2021-06-02 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17356214#comment-17356214
 ] 

Qi Zhu commented on YARN-10796:
---

Thanks [~pbacsko], the latest patch LGTM, +1.

And I agree with you that for capacity 0 we also need to relax it up to the max capacity. 

 

> Capacity Scheduler: dynamic queue cannot scale out properly if its capacity 
> is 0%
> -
>
> Key: YARN-10796
> URL: https://issues.apache.org/jira/browse/YARN-10796
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10796-001.patch, YARN-10796-002.patch, 
> YARN-10796-003.patch
>
>
> If we have a dynamic queue (AutoCreatedLeafQueue) with capacity = 0%, then it 
> cannot properly scale even if it's max-capacity and the parent's max-capacity 
> would allow it.
> Example:
> {noformat}
> Cluster Capacity:  16 GB / 16cpu (2 nodes, each with 8 GB / 8 cpu )
> Container allocation size: 1G / 1 vcore
> root.dynamic 
> Effective Capacity:   ( 50.0%)
> Effective Max Capacity:   (100.0%) 
> Template:
> Capacity:   40%
> Max Capacity:   100%
> User Limit Factor:  4
>  {noformat}
> leaf-queue-template.capacity = 40%
>  leaf-queue-template.maximum-capacity = 100%
>  leaf-queue-template.maximum-am-resource-percent = 50%
>  leaf-queue-template.minimum-user-limit-percent =100%
>  leaf-queue-template.user-limit-factor = 4
> "root.dynamic" has a maximum capacity of 100% and a capacity of 50%.
> Let's assume there are running containers in these dynamic queues (MR sleep 
> jobs):
>  root.dynamic.user1 = 1 AM + 3 container (capacity = 40%)
>  root.dynamic.user2 = 1 AM + 3 container (capacity = 40%)
>  root.dynamic.user3 = 1 AM + 15 container (capacity = 0%)
> This scenario will result in an underutilized cluster. There will be approx 
> 18% unused capacity. On the other hand, it's still possible to submit a new 
> application to root.dynamic.user1 or root.dynamic.user2 and reaching a 100% 
> utilization is possible.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10522) Document for Flexible Auto Queue Creation in Capacity Scheduler.

2021-06-02 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17356213#comment-17356213
 ] 

Qi Zhu commented on YARN-10522:
---

Thanks [~bteke] for taking this.

I assigned it to you.

 

> Document for Flexible Auto Queue Creation in Capacity Scheduler.
> 
>
> Key: YARN-10522
> URL: https://issues.apache.org/jira/browse/YARN-10522
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Benjamin Teke
>Priority: Major
>
> We should update document to support this feature.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-10522) Document for Flexible Auto Queue Creation in Capacity Scheduler.

2021-06-02 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu reassigned YARN-10522:
-

Assignee: Benjamin Teke

> Document for Flexible Auto Queue Creation in Capacity Scheduler.
> 
>
> Key: YARN-10522
> URL: https://issues.apache.org/jira/browse/YARN-10522
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Benjamin Teke
>Priority: Major
>
> We should update document to support this feature.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-10522) Document for Flexible Auto Queue Creation in Capacity Scheduler.

2021-06-02 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu reassigned YARN-10522:
-

Assignee: (was: Ankit Kumar)

> Document for Flexible Auto Queue Creation in Capacity Scheduler.
> 
>
> Key: YARN-10522
> URL: https://issues.apache.org/jira/browse/YARN-10522
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Priority: Major
>
> We should update document to support this feature.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10795) Improve Capacity Scheduler reinitialisation performance

2021-05-31 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17354809#comment-17354809
 ] 

Qi Zhu commented on YARN-10795:
---

Thanks [~gandras] for this work.

It will be very helpful to clusters with many queues. :D

> Improve Capacity Scheduler reinitialisation performance
> ---
>
> Key: YARN-10795
> URL: https://issues.apache.org/jira/browse/YARN-10795
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Andras Gyori
>Priority: Major
>
> Mostly due to CapacitySchedulerConfiguration#getPropsWithPrefix or similar 
> methods, the CapacityScheduler#reinit method has a part with quadratic complexity 
> with respect to the number of queues. With over 1000 queues, it takes a matter of 
> minutes, which is too slow to be a viable option when it is used in the mutation 
> API.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10781) The Thread of the NM aggregate log is exhausted and no other Application can aggregate the log

2021-05-25 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17351456#comment-17351456
 ] 

Qi Zhu commented on YARN-10781:
---

[~zhangxiping]

Have you enabled rolling log aggregation for long-running jobs? 

It can ease your problem.

> The Thread of the NM aggregate log is exhausted and no other Application can 
> aggregate the log
> --
>
> Key: YARN-10781
> URL: https://issues.apache.org/jira/browse/YARN-10781
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.9.2, 3.3.0
>Reporter: Xiping Zhang
>Priority: Major
> Attachments: applications.png, containers.png, containers.png
>
>
> We observed more than 100 applications running on one NM. Most of these 
> applications are Spark Streaming applications, but these applications do not 
> have running containers. When an offline application running on the node finishes, 
> its log cannot be uploaded to HDFS. When we killed a large number of 
> Spark Streaming applications, we found that a large number of log files were 
> being created on the NN side, causing the read and write performance on the 
> NN side to degrade significantly. This caused the business applications to time out.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10786) Federation:We can't access the AM page while using federation

2021-05-25 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17350939#comment-17350939
 ] 

Qi Zhu commented on YARN-10786:
---

Thanks [~Song Jiacheng] for the contribution.

The patch LGTM. +1

cc [~pbacsko] [~gandras] [~bilwa_st]

Could you help double check this?

Thanks.

> Federation:We can't access the AM page while using federation
> -
>
> Key: YARN-10786
> URL: https://issues.apache.org/jira/browse/YARN-10786
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 3.2.1
>Reporter: Song Jiacheng
>Priority: Major
>  Labels: federation
> Fix For: 3.2.1
>
> Attachments: YARN-10786.v1.patch, 
> n_v25156273211c049f8b396dcf15fcd9a84.png, 
> v1.1_dataplat_Hadoop平台_HDP3_2_1版本升级_YARN_26_Federation严重BUG--无法查看AM_WebHome_1621584160759-478.png
>
>
> The reason for this is that the AM gets the proxy URI from the config 
> yarn.web-proxy.address, and if it does not exist, it gets the URI from 
> yarn.resourcemanager.webapp.address.
> But in federation, we don't know which RM will be the home cluster of an 
> application, so I made this fix:
> 1. Add this config to the yarn-site.xml on the client.
> <property>
>   <name>yarn.web-proxy.address</name>
>   <value>rm1:9088,rm2:9088</value>
> </property>
> 2. Change the way the config is read from Configuration#get to 
> Configuration#getStrings in WebAppUtils#getProxyHostsAndPortsForAmFilter, 
> so that I can access the AM page now.
> This config needs to be added on the client side, so it will affect 
> applications only.
> Before the fix, clicking the AM link in RM or Router:
> !v1.1_dataplat_Hadoop平台_HDP3_2_1版本升级_YARN_26_Federation严重BUG--无法查看AM_WebHome_1621584160759-478.png!
>  And after the fix, we can access the AM page as normal...
>  
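
As a reference for the change described in the quoted description, here is a minimal sketch showing the difference between Configuration#get and Configuration#getStrings for a comma-separated yarn.web-proxy.address value; the helper method below is made up for illustration and is not the actual WebAppUtils#getProxyHostsAndPortsForAmFilter code:
{code:java}
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ProxyAddressSketch {

  // Hypothetical helper: collects every proxy host:port for the AM filter.
  static List<String> getProxyHostsAndPorts(Configuration conf) {
    List<String> hostsAndPorts = new ArrayList<>();
    // getStrings() splits the value on commas, so "rm1:9088,rm2:9088"
    // yields two entries; plain get() would return it as one opaque string.
    String[] proxyAddresses = conf.getStrings(YarnConfiguration.PROXY_ADDRESS);
    if (proxyAddresses != null && proxyAddresses.length > 0) {
      for (String proxyAddress : proxyAddresses) {
        hostsAndPorts.add(proxyAddress.trim());
      }
    } else {
      // Fall back to the RM web app address when no proxy is configured.
      hostsAndPorts.add(conf.get(YarnConfiguration.RM_WEBAPP_ADDRESS,
          YarnConfiguration.DEFAULT_RM_WEBAPP_ADDRESS));
    }
    return hostsAndPorts;
  }
}
{code}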



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10786) Federation:We can't access the AM page while using federation

2021-05-25 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17350861#comment-17350861
 ] 

Qi Zhu commented on YARN-10786:
---

Thanks [~Song Jiacheng] for reporting this.

Can you add some images to confirm that your patch fixes this?
 # Before this patch.
 # After this patch.

> Federation:We can't access the AM page while using federation
> -
>
> Key: YARN-10786
> URL: https://issues.apache.org/jira/browse/YARN-10786
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 3.2.1
>Reporter: Song Jiacheng
>Priority: Major
>  Labels: federation
> Fix For: 3.2.1
>
> Attachments: YARN-10786.v1.patch
>
>
> The reason for this is that the AM gets the proxy URI from the config 
> yarn.web-proxy.address, and if it does not exist, it gets the URI from 
> yarn.resourcemanager.webapp.address.
> But in federation, we don't know which RM will be the home cluster of an 
> application, so I made this fix:
> 1. Add this config to the yarn-site.xml on the client.
> <property>
>   <name>yarn.web-proxy.address</name>
>   <value>rm1:9088,rm2:9088</value>
> </property>
> 2. Change the way the config is read from Configuration#get to 
> Configuration#getStrings in WebAppUtils#getProxyHostsAndPortsForAmFilter, 
> so that I can access the AM page now.
> This config needs to be added on the client side, so it will affect 
> applications only.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10770) container-executor permission is wrong in SecureContainer.md

2021-05-24 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17350807#comment-17350807
 ] 

Qi Zhu commented on YARN-10770:
---

Thanks [~aajisaka] for the good finding and [~sahuja] for the patch.

The patch LGTM +1.

> container-executor permission is wrong in SecureContainer.md
> 
>
> Key: YARN-10770
> URL: https://issues.apache.org/jira/browse/YARN-10770
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Reporter: Akira Ajisaka
>Assignee: Siddharth Ahuja
>Priority: Major
>  Labels: newbie
> Attachments: YARN-10770.001.patch
>
>
> {noformat}
>   The `container-executor` program must be owned by `root` and have the 
> permission set `---sr-s---`.
> {noformat}
> It should be 6050 {noformat}---Sr-s---{noformat}
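
For readers decoding the octal value: 6050 sets the setuid (4000) and setgid (2000) bits on top of mode 050 (group read and execute only), so there is no owner execute bit; ls therefore shows a capital S in the owner position and a lowercase s in the group position, i.e. ---Sr-s---.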



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-10785) Yarn NodeManager aux-services should support trim.

2021-05-24 Thread Qi Zhu (Jira)
Qi Zhu created YARN-10785:
-

 Summary: Yarn NodeManager aux-services should support trim.
 Key: YARN-10785
 URL: https://issues.apache.org/jira/browse/YARN-10785
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Qi Zhu






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10771) Add cluster metric for size of SchedulerEventQueue and RMEventQueue

2021-05-24 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17350482#comment-17350482
 ] 

Qi Zhu commented on YARN-10771:
---

Thanks [~chaosju] for the contribution and [~pbacsko] for the review.

The test failure is not related; it passed locally. Merged 005 to trunk.

> Add cluster metric for size of SchedulerEventQueue and RMEventQueue
> ---
>
> Key: YARN-10771
> URL: https://issues.apache.org/jira/browse/YARN-10771
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: chaosju
>Assignee: chaosju
>Priority: Major
> Attachments: YARN-10763.001.patch, YARN-10771.002.patch, 
> YARN-10771.003.patch, YARN-10771.004.patch, YARN-10771.005.patch
>
>
> Add cluster metrics for the size of the scheduler event queue and the RM event 
> queue. This lets us know the load of the RM and makes it convenient to monitor 
> these metrics.
>  
>  
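
A minimal sketch of how such gauges could be exposed through the Hadoop metrics2 library; this is illustrative only and is not the code of the attached patches (the class name, metric names and the update hook are assumptions):
{code:java}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableGaugeInt;

@Metrics(about = "Event queue metrics", context = "yarn")
public class EventQueueMetrics {

  @Metric("Size of the scheduler event queue")
  MutableGaugeInt schedulerEventQueueSize;

  @Metric("Size of the RM event queue")
  MutableGaugeInt rmEventQueueSize;

  static EventQueueMetrics create() {
    // Registers the gauges with the default metrics system so they show up
    // alongside the other RM cluster metrics.
    return DefaultMetricsSystem.instance()
        .register("EventQueueMetrics", "RM/scheduler event queue sizes",
            new EventQueueMetrics());
  }

  // Called periodically (or on every dispatch) with the current queue sizes,
  // assuming the dispatchers expose their queue sizes to the caller.
  void update(int schedulerQueueSize, int rmQueueSize) {
    schedulerEventQueueSize.set(schedulerQueueSize);
    rmEventQueueSize.set(rmQueueSize);
  }
}
{code}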



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10783) Allow definition of auto queue template properties in root

2021-05-23 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17350262#comment-17350262
 ] 

Qi Zhu commented on YARN-10783:
---

Thanks [~gandras] for this.

The patch LGTM +1.

> Allow definition of auto queue template properties in root
> --
>
> Key: YARN-10783
> URL: https://issues.apache.org/jira/browse/YARN-10783
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
> Attachments: YARN-10783.001.patch
>
>
> YARN-10564 introduced template properties set on auto-queue-creation-eligible 
> queues; however, root does not take them into consideration.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-10781) The Thread of the NM aggregate log is exhausted and no other Application can aggregate the log

2021-05-23 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17350255#comment-17350255
 ] 

Qi Zhu edited comment on YARN-10781 at 5/24/21, 6:02 AM:
-

[~zhangxiping]

It only initializes the app and creates the thread pool when the AM launches.

But the removed dynamic executor is not the AM (driver). Do you mean that the 
AM of a Spark job with dynamic allocation will not exit the doAppLogAggregation 
loop in the NodeManager when all the containers (non-driver executors) have 
been released?

Could you confirm how this happened? I don't see a case that can cause this.

 


was (Author: zhuqi):
[~zhangxiping]

It only init app and create the thread pool, when AM launch.

But the dynamic executor removed is not the AM(driver) , if you mean that the 
AM of dynamic resource spark will not exit the 

doAppLogAggregation loop in nodemanager when all the containers (non driver 
executor) have been released?

 

> The Thread of the NM aggregate log is exhausted and no other Application can 
> aggregate the log
> --
>
> Key: YARN-10781
> URL: https://issues.apache.org/jira/browse/YARN-10781
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.9.2, 3.3.0
>Reporter: Xiping Zhang
>Priority: Major
>
> We observed more than 100 applications running on one NM. Most of these 
> applications are SparkStreaming tasks, but these applications do not have 
> running containers. When an offline application running on the NM finishes, its 
> log cannot be uploaded to HDFS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10781) The Thread of the NM aggregate log is exhausted and no other Application can aggregate the log

2021-05-23 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17350255#comment-17350255
 ] 

Qi Zhu commented on YARN-10781:
---

[~zhangxiping]

It only initializes the app and creates the thread pool when the AM launches.

But the removed dynamic executor is not the AM (driver). Do you mean that the 
AM of a Spark job with dynamic allocation will not exit the doAppLogAggregation 
loop in the NodeManager when all the containers (non-driver executors) have 
been released?

 

> The Thread of the NM aggregate log is exhausted and no other Application can 
> aggregate the log
> --
>
> Key: YARN-10781
> URL: https://issues.apache.org/jira/browse/YARN-10781
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.9.2, 3.3.0
>Reporter: Xiping Zhang
>Priority: Major
>
> We observed more than 100 applications running on one NM. Most of these 
> applications are SparkStreaming tasks, but these applications do not have 
> running containers. When an offline application running on the NM finishes, its 
> log cannot be uploaded to HDFS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10324) Fetch data from NodeManager may case read timeout when disk is busy

2021-05-21 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17349124#comment-17349124
 ] 

Qi Zhu commented on YARN-10324:
---

[~yaoguangdong] I'm not sure whether you removed the original 003 and resubmitted 
it?

Waiting for Jenkins now; if it is not triggered within a few hours, you should 
attach the patch again to trigger a build.

> Fetch data from NodeManager may case read timeout when disk is busy
> ---
>
> Key: YARN-10324
> URL: https://issues.apache.org/jira/browse/YARN-10324
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: auxservices
>Affects Versions: 2.7.0, 3.2.1
>Reporter: Yao Guangdong
>Assignee: Yao Guangdong
>Priority: Minor
>  Labels: patch
> Attachments: YARN-10324.001.patch, YARN-10324.002.patch, 
> YARN-10324.003.patch, image-2021-05-21-17-48-03-476.png
>
>
>  As the cluster size becomes bigger and bigger, the time Reduce spends fetching 
> Map results from the NodeManager becomes longer and longer. We often see WARN 
> logs like the following in the reduce logs.
> {quote}2020-06-19 15:43:15,522 WARN [fetcher#8] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: Failed to connect to 
> TX-196-168-211.com:13562 with 5 map outputs
> java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> at java.net.SocketInputStream.read(SocketInputStream.java:171)
> at java.net.SocketInputStream.read(SocketInputStream.java:141)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
> at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735)
> at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1587)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
> at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
> at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.verifyConnection(Fetcher.java:434)
> at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.setupConnectionsWithRetry(Fetcher.java:400)
> at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.openShuffleUrl(Fetcher.java:271)
> at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:330)
> at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:198)
> {quote}
>  We checked the NodeManager server and found that the disk IO util and the 
> number of connections became very high when the read timeouts happened. We 
> analyzed that with 20,000 maps and 1,000 reduces, the NodeManager performs 20 
> million IO stream operations in the shuffle phase. If the data each reduce 
> fetches from the map output files is very small, the disk IO util becomes very 
> high in a big cluster, read timeouts happen frequently, and the application 
> finish time becomes longer.
> We found that ShuffleHandler has an IndexCache for caching the file.out.index 
> file. We want to turn the small IOs into big IOs, which reduces the number of 
> small disk IOs. So we try to cache all the small file data (file.out) in memory 
> when the first fetch request comes; the other fetch requests then only need to 
> read the data from memory, avoiding disk IO. After we cached the data in 
> memory, the read timeouts disappeared.
>  
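
To illustrate the idea in the description (serve repeated fetches of small file.out data from memory instead of disk), here is a rough, illustrative sketch of a bounded in-memory cache; it is not the actual ShuffleHandler change, and the class name and size limits are made-up parameters:
{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class MapOutputDataCache {

  private final ConcurrentHashMap<String, byte[]> cache = new ConcurrentHashMap<>();
  private final AtomicLong cachedBytes = new AtomicLong();
  private final long maxBytes;     // total memory budget for cached outputs
  private final long maxFileSize;  // only cache files smaller than this

  public MapOutputDataCache(long maxBytes, long maxFileSize) {
    this.maxBytes = maxBytes;
    this.maxFileSize = maxFileSize;
  }

  // The first fetch reads file.out from disk; later fetches of the same file
  // are served from memory, turning many small disk IOs into one big read.
  public byte[] get(String path) throws IOException {
    byte[] data = cache.get(path);
    if (data != null) {
      return data;
    }
    data = Files.readAllBytes(Paths.get(path));
    if (data.length <= maxFileSize
        && cachedBytes.get() + data.length <= maxBytes) {
      if (cache.putIfAbsent(path, data) == null) {
        cachedBytes.addAndGet(data.length);
      }
    }
    return data;
  }
}
{code}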



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10324) Fetch data from NodeManager may case read timeout when disk is busy

2021-05-21 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17349111#comment-17349111
 ] 

Qi Zhu commented on YARN-10324:
---

[~yaoguangdong] You should submit it and make the patch available; then 
Jenkins will trigger.

With the button:

!image-2021-05-21-17-48-03-476.png|width=88,height=45!

> Fetch data from NodeManager may case read timeout when disk is busy
> ---
>
> Key: YARN-10324
> URL: https://issues.apache.org/jira/browse/YARN-10324
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: auxservices
>Affects Versions: 2.7.0, 3.2.1
>Reporter: Yao Guangdong
>Assignee: Yao Guangdong
>Priority: Minor
>  Labels: patch
> Attachments: YARN-10324.001.patch, YARN-10324.002.patch, 
> YARN-10324.003.patch, image-2021-05-21-17-48-03-476.png
>
>
>  As the cluster size becomes bigger and bigger, the time Reduce spends fetching 
> Map results from the NodeManager becomes longer and longer. We often see WARN 
> logs like the following in the reduce logs.
> {quote}2020-06-19 15:43:15,522 WARN [fetcher#8] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: Failed to connect to 
> TX-196-168-211.com:13562 with 5 map outputs
> java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> at java.net.SocketInputStream.read(SocketInputStream.java:171)
> at java.net.SocketInputStream.read(SocketInputStream.java:141)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
> at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735)
> at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1587)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
> at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
> at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.verifyConnection(Fetcher.java:434)
> at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.setupConnectionsWithRetry(Fetcher.java:400)
> at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.openShuffleUrl(Fetcher.java:271)
> at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:330)
> at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:198)
> {quote}
>  We checked the NodeManager server and found that the disk IO util and the 
> number of connections became very high when the read timeouts happened. We 
> analyzed that with 20,000 maps and 1,000 reduces, the NodeManager performs 20 
> million IO stream operations in the shuffle phase. If the data each reduce 
> fetches from the map output files is very small, the disk IO util becomes very 
> high in a big cluster, read timeouts happen frequently, and the application 
> finish time becomes longer.
> We found that ShuffleHandler has an IndexCache for caching the file.out.index 
> file. We want to turn the small IOs into big IOs, which reduces the number of 
> small disk IOs. So we try to cache all the small file data (file.out) in memory 
> when the first fetch request comes; the other fetch requests then only need to 
> read the data from memory, avoiding disk IO. After we cached the data in 
> memory, the read timeouts disappeared.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10324) Fetch data from NodeManager may case read timeout when disk is busy

2021-05-21 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-10324:
--
Attachment: image-2021-05-21-17-48-03-476.png

> Fetch data from NodeManager may case read timeout when disk is busy
> ---
>
> Key: YARN-10324
> URL: https://issues.apache.org/jira/browse/YARN-10324
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: auxservices
>Affects Versions: 2.7.0, 3.2.1
>Reporter: Yao Guangdong
>Assignee: Yao Guangdong
>Priority: Minor
>  Labels: patch
> Attachments: YARN-10324.001.patch, YARN-10324.002.patch, 
> YARN-10324.003.patch, image-2021-05-21-17-48-03-476.png
>
>
>  As the cluster size becomes bigger and bigger, the time Reduce spends fetching 
> Map results from the NodeManager becomes longer and longer. We often see WARN 
> logs like the following in the reduce logs.
> {quote}2020-06-19 15:43:15,522 WARN [fetcher#8] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: Failed to connect to 
> TX-196-168-211.com:13562 with 5 map outputs
> java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> at java.net.SocketInputStream.read(SocketInputStream.java:171)
> at java.net.SocketInputStream.read(SocketInputStream.java:141)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
> at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735)
> at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1587)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
> at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
> at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.verifyConnection(Fetcher.java:434)
> at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.setupConnectionsWithRetry(Fetcher.java:400)
> at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.openShuffleUrl(Fetcher.java:271)
> at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:330)
> at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:198)
> {quote}
>  We checked the NodeManager server and found that the disk IO util and the 
> number of connections became very high when the read timeouts happened. We 
> analyzed that with 20,000 maps and 1,000 reduces, the NodeManager performs 20 
> million IO stream operations in the shuffle phase. If the data each reduce 
> fetches from the map output files is very small, the disk IO util becomes very 
> high in a big cluster, read timeouts happen frequently, and the application 
> finish time becomes longer.
> We found that ShuffleHandler has an IndexCache for caching the file.out.index 
> file. We want to turn the small IOs into big IOs, which reduces the number of 
> small disk IOs. So we try to cache all the small file data (file.out) in memory 
> when the first fetch request comes; the other fetch requests then only need to 
> read the data from memory, avoiding disk IO. After we cached the data in 
> memory, the read timeouts disappeared.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10657) We should make max application per queue to support node label.

2021-05-21 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17349094#comment-17349094
 ] 

Qi Zhu commented on YARN-10657:
---

Thanks [~gandras] for the reply.

We can close this for now, until we can discuss a better solution for 
node-label-based max applications.

Thanks.

> We should make max application per queue to support node label.
> ---
>
> Key: YARN-10657
> URL: https://issues.apache.org/jira/browse/YARN-10657
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10657.001.patch, YARN-10657.002.patch
>
>
> https://issues.apache.org/jira/browse/YARN-10641?focusedCommentId=17291708&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17291708
> As we discussed in the above comment:
> We should look deeper into label-related max applications per queue.
> I think when node labels are enabled on a queue, max applications should consider 
> the max capacity of all labels.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10779) Add option to disable lowercase conversion in GetApplicationsRequestPBImpl and ApplicationSubmissionContextPBImpl

2021-05-21 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17349073#comment-17349073
 ] 

Qi Zhu commented on YARN-10779:
---

Thanks [~pbacsko] for the reply.

I also agree that it only affects the ResourceManager when the RM is restarted, 
not the CS-related reinitialisation.

[~gandras] And if we need to reinitialize the RM-related property in the 
future, just like reconfiguring the NameNode in HDFS, we can make it 
non-static, but for now I think it is fine to be static.

Thanks.

> Add option to disable lowercase conversion in GetApplicationsRequestPBImpl 
> and ApplicationSubmissionContextPBImpl
> -
>
> Key: YARN-10779
> URL: https://issues.apache.org/jira/browse/YARN-10779
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: resourcemanager
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10779-001.patch, YARN-10779-002.patch, 
> YARN-10779-003.patch, YARN-10779-POC.patch
>
>
> In both {{GetApplicationsRequestPBImpl}} and 
> {{ApplicationSubmissionContextPBImpl}}, there is a forced lowercase 
> conversion:
> {noformat}
> checkTags(tags);
> // Convert applicationTags to lower case and add
> this.applicationTags = new TreeSet<>();
> for (String tag : tags) {
>   this.applicationTags.add(StringUtils.toLowerCase(tag));
> }
>   }
> {noformat}
> However, we encountered some cases where this is not desirable for "userid" 
> tags. 
> Proposed solution: since both classes are pretty low-level and can be often 
> instantiated, a {{Configuration}} object which loads {{yarn-site.xml}} should 
> be cached inside them. A new property should be created which tells whether 
> lowercase conversion should occur or not.
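
A minimal sketch of the proposed behaviour, assuming a hypothetical property key (the real key is whatever the patch defines) and a YarnConfiguration cached in a static field, as the description suggests; this is not the code of the attached patches:
{code:java}
import java.util.Set;
import java.util.TreeSet;

import org.apache.hadoop.util.StringUtils;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ApplicationTagsSketch {

  // Hypothetical property name, used only for illustration.
  static final String FORCE_LOWERCASE_TAGS =
      "yarn.resourcemanager.application-tag.force-lowercase-conversion";

  // Cached once because the PBImpl classes are instantiated very often.
  private static final boolean FORCE_LOWERCASE =
      new YarnConfiguration().getBoolean(FORCE_LOWERCASE_TAGS, true);

  static Set<String> normalizeTags(Set<String> tags) {
    Set<String> applicationTags = new TreeSet<>();
    for (String tag : tags) {
      // Only lowercase the tag when the cluster keeps the old behaviour.
      applicationTags.add(FORCE_LOWERCASE ? StringUtils.toLowerCase(tag) : tag);
    }
    return applicationTags;
  }
}
{code}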



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10781) The Thread of the NM aggregate log is exhausted and no other Application can aggregate the log

2021-05-21 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17349065#comment-17349065
 ] 

Qi Zhu commented on YARN-10781:
---

Thanks [~zhangxiping] for this.

If you mean when Spark dynamic resource allocation is enabled:

I think Spark will remove the idle executors, and after the idle executors are 
removed, the aggregation thread will not exit?

How does Spark handle this? Can you add the related Spark code?

Thanks.

 

> The Thread of the NM aggregate log is exhausted and no other Application can 
> aggregate the log
> --
>
> Key: YARN-10781
> URL: https://issues.apache.org/jira/browse/YARN-10781
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.9.2, 3.3.0
>Reporter: Xiping Zhang
>Priority: Major
>
> We observed more than 100 applications running on one NM. Most of these 
> applications are SparkStreaming tasks, but these applications do not have 
> running containers. When an offline application running on the NM finishes, its 
> log cannot be uploaded to HDFS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10779) Add option to disable lowercase conversion in GetApplicationsRequestPBImpl and ApplicationSubmissionContextPBImpl

2021-05-21 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17349052#comment-17349052
 ] 

Qi Zhu commented on YARN-10779:
---

Thanks [~gandras] for the reminder.

Should we let users reinitialize this to false, and in which case would CS 
users need to change it to false?

And would just changing the field from static to non-static solve this?

Thanks.

> Add option to disable lowercase conversion in GetApplicationsRequestPBImpl 
> and ApplicationSubmissionContextPBImpl
> -
>
> Key: YARN-10779
> URL: https://issues.apache.org/jira/browse/YARN-10779
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: resourcemanager
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10779-001.patch, YARN-10779-002.patch, 
> YARN-10779-003.patch, YARN-10779-POC.patch
>
>
> In both {{GetApplicationsRequestPBImpl}} and 
> {{ApplicationSubmissionContextPBImpl}}, there is a forced lowercase 
> conversion:
> {noformat}
> checkTags(tags);
> // Convert applicationTags to lower case and add
> this.applicationTags = new TreeSet<>();
> for (String tag : tags) {
>   this.applicationTags.add(StringUtils.toLowerCase(tag));
> }
>   }
> {noformat}
> However, we encountered some cases where this is not desirable for "userid" 
> tags. 
> Proposed solution: since both classes are pretty low-level and can be often 
> instantiated, a {{Configuration}} object which loads {{yarn-site.xml}} should 
> be cached inside them. A new property should be created which tells whether 
> lowercase conversion should occur or not.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10779) Add option to disable lowercase conversion in GetApplicationsRequestPBImpl and ApplicationSubmissionContextPBImpl

2021-05-20 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348941#comment-17348941
 ] 

Qi Zhu commented on YARN-10779:
---

Thanks [~pbacsko] for this work.

The patch LGTM; just fix the one remaining checkstyle issue.

> Add option to disable lowercase conversion in GetApplicationsRequestPBImpl 
> and ApplicationSubmissionContextPBImpl
> -
>
> Key: YARN-10779
> URL: https://issues.apache.org/jira/browse/YARN-10779
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: resourcemanager
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10779-001.patch, YARN-10779-002.patch, 
> YARN-10779-003.patch, YARN-10779-POC.patch
>
>
> In both {{GetApplicationsRequestPBImpl}} and 
> {{ApplicationSubmissionContextPBImpl}}, there is a forced lowercase 
> conversion:
> {noformat}
> checkTags(tags);
> // Convert applicationTags to lower case and add
> this.applicationTags = new TreeSet<>();
> for (String tag : tags) {
>   this.applicationTags.add(StringUtils.toLowerCase(tag));
> }
>   }
> {noformat}
> However, we encountered some cases where this is not desirable for "userid" 
> tags. 
> Proposed solution: since both classes are pretty low-level and can be often 
> instantiated, a {{Configuration}} object which loads {{yarn-site.xml}} should 
> be cached inside them. A new property should be created which tells whether 
> lowercase conversion should occur or not.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10543) Timeline Server V1.5 not supporting audit log

2021-05-19 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348066#comment-17348066
 ] 

Qi Zhu commented on YARN-10543:
---

Thanks [~gb.ana...@gmail.com] for the patch.

The patch generally LGTM.

But we'd better add a simple unit test that intercepts the audit log.

> Timeline Server V1.5 not supporting audit log
> -
>
> Key: YARN-10543
> URL: https://issues.apache.org/jira/browse/YARN-10543
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: timelineserver
>Affects Versions: 3.1.1
>Reporter: ANANDA G B
>Assignee: ANANDA G B
>Priority: Major
>  Labels: TimeLine
> Attachments: YARN-10543-001.patch, YARN-10543-002.patch
>
>
> Like the JHS, TS V1.5 can also support audit logging when the Timeline REST APIs 
> are accessed. This will help to know the operations performed on the TS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-10771) Add cluster metric for size of SchedulerEventQueue and RMEventQueue

2021-05-19 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347313#comment-17347313
 ] 

Qi Zhu edited comment on YARN-10771 at 5/20/21, 2:29 AM:
-

Thanks [~chaosju] for update.

The patch LGTM now.

Waiting [~pbacsko] [~ebadger]  for double check.

Thanks.


was (Author: zhuqi):
Thanks [~chaosju] for update.

The patch LGTM now.

Waiting [~pbacsko] for double check.

Thanks.

> Add cluster metric for size of SchedulerEventQueue and RMEventQueue
> ---
>
> Key: YARN-10771
> URL: https://issues.apache.org/jira/browse/YARN-10771
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: chaosju
>Assignee: chaosju
>Priority: Major
> Attachments: YARN-10763.001.patch, YARN-10771.002.patch, 
> YARN-10771.003.patch, YARN-10771.004.patch
>
>
> Add cluster metrics for the size of the scheduler event queue and the RM event 
> queue. This lets us know the load of the RM and makes it convenient to monitor 
> these metrics.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10701) The yarn.resource-types should support multi types without trimmed.

2021-05-19 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347679#comment-17347679
 ] 

Qi Zhu commented on YARN-10701:
---

The test failure is not related to this jira.

Committed to branch-3.3.

Thanks [~weichiu] for the reminder.

> The yarn.resource-types should support multi types without trimmed.
> ---
>
> Key: YARN-10701
> URL: https://issues.apache.org/jira/browse/YARN-10701
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10701-branch-3.3.001.patch, YARN-10701.001.patch, 
> YARN-10701.002.patch
>
>
> {code:java}
> <property>
>   <name>yarn.resource-types</name>
>   <value>yarn.io/gpu, yarn.io/fpga</value>
> </property>
>  {code}
>  When I configured the resource types above with gpu and fpga, the following 
> error happened:
>  
> {code:java}
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: ' yarn.io/fpga' is 
> not a valid resource name. A valid resource name must begin with a letter and 
> contain only letters, numbers, and any of: '.', '_', or '-'. A valid resource 
> name may also be optionally preceded by a name space followed by a slash. A 
> valid name space consists of period-separated groups of letters, numbers, and 
> dashes.{code}
>   
>  The resource types should support trim.
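
For illustration only (not the actual fix), the difference boils down to Configuration#getStrings, which keeps the leading space after the comma, versus Configuration#getTrimmedStrings, which trims each token:
{code:java}
import org.apache.hadoop.conf.Configuration;

public class ResourceTypesTrimSketch {

  static String[] parseResourceTypes(Configuration conf) {
    // With a value of "yarn.io/gpu, yarn.io/fpga":
    //   conf.getStrings("yarn.resource-types")        -> ["yarn.io/gpu", " yarn.io/fpga"]
    //   conf.getTrimmedStrings("yarn.resource-types") -> ["yarn.io/gpu", "yarn.io/fpga"]
    // The leading space in the first variant is what triggers the
    // "not a valid resource name" error quoted above.
    return conf.getTrimmedStrings("yarn.resource-types");
  }
}
{code}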



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10701) The yarn.resource-types should support multi types without trimmed.

2021-05-19 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347445#comment-17347445
 ] 

Qi Zhu commented on YARN-10701:
---

Submitted the branch-3.3 backport patch to trigger Jenkins.

> The yarn.resource-types should support multi types without trimmed.
> ---
>
> Key: YARN-10701
> URL: https://issues.apache.org/jira/browse/YARN-10701
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10701-branch-3.3.001.patch, YARN-10701.001.patch, 
> YARN-10701.002.patch
>
>
> {code:java}
> <property>
>   <name>yarn.resource-types</name>
>   <value>yarn.io/gpu, yarn.io/fpga</value>
> </property>
>  {code}
>  When I configured the resource types above with gpu and fpga, the following 
> error happened:
>  
> {code:java}
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: ' yarn.io/fpga' is 
> not a valid resource name. A valid resource name must begin with a letter and 
> contain only letters, numbers, and any of: '.', '_', or '-'. A valid resource 
> name may also be optionally preceded by a name space followed by a slash. A 
> valid name space consists of period-separated groups of letters, numbers, and 
> dashes.{code}
>   
>  The resource types should support trim.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10701) The yarn.resource-types should support multi types without trimmed.

2021-05-19 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347438#comment-17347438
 ] 

Qi Zhu commented on YARN-10701:
---

Thanks [~weichiu] for the reminder.

I will help backport this to branch-3.3.

> The yarn.resource-types should support multi types without trimmed.
> ---
>
> Key: YARN-10701
> URL: https://issues.apache.org/jira/browse/YARN-10701
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10701.001.patch, YARN-10701.002.patch
>
>
> {code:java}
> <property>
>   <name>yarn.resource-types</name>
>   <value>yarn.io/gpu, yarn.io/fpga</value>
> </property>
>  {code}
>  When I configured the resource types above with gpu and fpga, the following 
> error happened:
>  
> {code:java}
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: ' yarn.io/fpga' is 
> not a valid resource name. A valid resource name must begin with a letter and 
> contain only letters, numbers, and any of: '.', '_', or '-'. A valid resource 
> name may also be optionally preceded by a name space followed by a slash. A 
> valid name space consists of period-separated groups of letters, numbers, and 
> dashes.{code}
>   
>  The resource types should support trim.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Reopened] (YARN-10701) The yarn.resource-types should support multi types without trimmed.

2021-05-19 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu reopened YARN-10701:
---

> The yarn.resource-types should support multi types without trimmed.
> ---
>
> Key: YARN-10701
> URL: https://issues.apache.org/jira/browse/YARN-10701
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10701.001.patch, YARN-10701.002.patch
>
>
> {code:java}
> <property>
>   <name>yarn.resource-types</name>
>   <value>yarn.io/gpu, yarn.io/fpga</value>
> </property>
>  {code}
>  When I configured the resource types above with gpu and fpga, the following 
> error happened:
>  
> {code:java}
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: ' yarn.io/fpga' is 
> not a valid resource name. A valid resource name must begin with a letter and 
> contain only letters, numbers, and any of: '.', '_', or '-'. A valid resource 
> name may also be optionally preceded by a name space followed by a slash. A 
> valid name space consists of period-separated groups of letters, numbers, and 
> dashes.{code}
>   
>  The resource types should support trim.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-10774) Federation: Normalize the yarn federation queue name

2021-05-19 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347427#comment-17347427
 ] 

Qi Zhu edited comment on YARN-10774 at 5/19/21, 9:01 AM:
-

[~luoyuan] Now FS supports both root.XXX and XXX, but CS still does not support this.

See YARN-10728.

Thanks.


was (Author: zhuqi):
[~luoyuan] Now FS supports both root.XXX and XXX, but CS still does not support this.

> Federation: Normalize the yarn federation queue name
> 
>
> Key: YARN-10774
> URL: https://issues.apache.org/jira/browse/YARN-10774
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation, yarn
>Reporter: Yuan LUO
>Priority: Major
> Attachments: YARN-10774.001.patch
>
>
> Since in YARN root.abc is equivalent to the abc queue, the routing 
> behavior of both should be consistent in YARN federation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10774) Federation: Normalize the yarn federation queue name

2021-05-19 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347427#comment-17347427
 ] 

Qi Zhu commented on YARN-10774:
---

[~luoyuan] Now FS supports both root.XXX and XXX, but CS still does not support this.

> Federation: Normalize the yarn federation queue name
> 
>
> Key: YARN-10774
> URL: https://issues.apache.org/jira/browse/YARN-10774
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation, yarn
>Reporter: Yuan LUO
>Priority: Major
> Attachments: YARN-10774.001.patch
>
>
> Since in YARN root.abc is equivalent to the abc queue, the routing 
> behavior of both should be consistent in YARN federation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10771) Add cluster metric for size of SchedulerEventQueue and RMEventQueue

2021-05-18 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347313#comment-17347313
 ] 

Qi Zhu commented on YARN-10771:
---

Thanks [~chaosju] for update.

The patch LGTM now.

Waiting [~pbacsko] for double check.

Thanks.

> Add cluster metric for size of SchedulerEventQueue and RMEventQueue
> ---
>
> Key: YARN-10771
> URL: https://issues.apache.org/jira/browse/YARN-10771
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: chaosju
>Assignee: chaosju
>Priority: Major
> Attachments: YARN-10763.001.patch, YARN-10771.002.patch, 
> YARN-10771.003.patch, YARN-10771.004.patch
>
>
> Add cluster metrics for the size of the scheduler event queue and the RM event 
> queue. This lets us know the load of the RM and makes it convenient to monitor 
> these metrics.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10771) Add cluster metric for size of SchedulerEventQueue and RMEventQueue

2021-05-18 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347005#comment-17347005
 ] 

Qi Zhu commented on YARN-10771:
---

Thanks [~chaosju] for update.

The patch LGTM; please go on and fix the checkstyle issue.

> Add cluster metric for size of SchedulerEventQueue and RMEventQueue
> ---
>
> Key: YARN-10771
> URL: https://issues.apache.org/jira/browse/YARN-10771
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: chaosju
>Assignee: chaosju
>Priority: Major
> Attachments: YARN-10763.001.patch, YARN-10771.002.patch, 
> YARN-10771.003.patch
>
>
> Add cluster metrics for the size of the scheduler event queue and the RM event 
> queue. This lets us know the load of the RM and makes it convenient to monitor 
> these metrics.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org


