[jira] [Commented] (YARN-9858) Optimize RMContext getExclusiveEnforcedPartitions

2019-09-26 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16939147#comment-16939147
 ] 

Jonathan Hung commented on YARN-9858:
-

Thanks, good point.

I think instead we can just initialize this set in DefaultAMSProcessor / 
RMAppManager constructor since there's only one of each. Then we can avoid 
making changes to RMContext completely. I think it's best to avoid coupling 
RMContextImpl creation with setting exclusiveEnforcedPartitions, otherwise if 
another RMContextImpl is initialized in the future, it's not obvious this field 
needs to be set.

I'll post a patch for this shortly.
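
For illustration, a minimal sketch of that idea (the class name and config key below are invented for the example, not the actual patch): read the partitions from the configuration once in the constructor and keep an immutable set, so the hot allocate path never touches the synchronized Configuration.

{code}
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.conf.Configuration;

// Sketch only: models the constructor-initialization approach described above.
public class PartitionCachingProcessor {
  private final Set<String> exclusiveEnforcedPartitions;

  public PartitionCachingProcessor(Configuration conf) {
    // Pay the Configuration lookup exactly once, at construction time.
    String[] partitions =
        conf.getTrimmedStrings("yarn.scheduler.exclusive-enforced-partitions"); // hypothetical key
    this.exclusiveEnforcedPartitions =
        Collections.unmodifiableSet(new HashSet<>(Arrays.asList(partitions)));
  }

  public Set<String> getExclusiveEnforcedPartitions() {
    // Hot path: no Configuration access, no lock contention.
    return exclusiveEnforcedPartitions;
  }
}
{code}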

> Optimize RMContext getExclusiveEnforcedPartitions 
> --
>
> Key: YARN-9858
> URL: https://issues.apache.org/jira/browse/YARN-9858
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>  Labels: release-blocker
> Attachments: YARN-9858.001.patch
>
>
> Follow-up from YARN-9730. RMContextImpl#getExclusiveEnforcedPartitions is a
> hot code path and needs to be optimized.
> Since AMS allocate is invoked by multiple handlers, locking on conf will occur:
> {code}
> java.lang.Thread.State: BLOCKED (on object monitor)
>  at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2841)
>  - waiting to lock <0x7f1f8107c748> (a 
> org.apache.hadoop.yarn.conf.YarnConfiguration)
>  at org.apache.hadoop.conf.Configuration.get(Configuration.java:1214)
>  at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1268)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9858) Optimize RMContext getExclusiveEnforcedPartitions

2019-09-26 Thread Bibin Chundatt (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16939136#comment-16939136
 ] 

Bibin Chundatt edited comment on YARN-9858 at 9/27/19 5:46 AM:
---

[~jhung]

The patch could cause *exclusiveEnforcedPartitions* to be set multiple times
in case of concurrent execution.
 It is a possibility since it is invoked by multiple handlers.

An alternative could be to set *exclusiveEnforcedPartitions* after the
creation of RMContext at
 # ResourceManager#serviceInit
 # ResourceManager#resetRMContext

All active services would be in NEW STATE when we set it too.  Thoughts?


was (Author: bibinchundatt):
[~jhung]

The patch could cause *exclusiveEnforcedPartitions* to be set multiple times
in case of concurrent execution.
 It is a possibility since it is invoked by multiple handlers.

An alternative could be to set *exclusiveEnforcedPartitions* after the
creation of RMContext at
 # ResourceManager#serviceInit
 # ResourceManager#resetRMContext

All the active services would be stopped when we set it too. Thoughts?

> Optimize RMContext getExclusiveEnforcedPartitions 
> --
>
> Key: YARN-9858
> URL: https://issues.apache.org/jira/browse/YARN-9858
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>  Labels: release-blocker
> Attachments: YARN-9858.001.patch
>
>
> Follow-up from YARN-9730. RMContextImpl#getExclusiveEnforcedPartitions is a
> hot code path and needs to be optimized.
> Since AMS allocate is invoked by multiple handlers, locking on conf will occur:
> {code}
> java.lang.Thread.State: BLOCKED (on object monitor)
>  at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2841)
>  - waiting to lock <0x7f1f8107c748> (a 
> org.apache.hadoop.yarn.conf.YarnConfiguration)
>  at org.apache.hadoop.conf.Configuration.get(Configuration.java:1214)
>  at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1268)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9858) Optimize RMContext getExclusiveEnforcedPartitions

2019-09-26 Thread Bibin Chundatt (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16939136#comment-16939136
 ] 

Bibin Chundatt commented on YARN-9858:
--

[~jhung]

The patch could cause *exclusiveEnforcedPartitions* to be set multiple times
in case of concurrent execution.
 It is a possibility since it is invoked by multiple handlers.

An alternative could be to set *exclusiveEnforcedPartitions* after the
creation of RMContext at
 # ResourceManager#serviceInit
 # ResourceManager#resetRMContext

All the active services would be stopped when we set it too. Thoughts?

> Optimize RMContext getExclusiveEnforcedPartitions 
> --
>
> Key: YARN-9858
> URL: https://issues.apache.org/jira/browse/YARN-9858
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>  Labels: release-blocker
> Attachments: YARN-9858.001.patch
>
>
> Follow-up from YARN-9730. RMContextImpl#getExclusiveEnforcedPartitions is a
> hot code path and needs to be optimized.
> Since AMS allocate is invoked by multiple handlers, locking on conf will occur:
> {code}
> java.lang.Thread.State: BLOCKED (on object monitor)
>  at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2841)
>  - waiting to lock <0x7f1f8107c748> (a 
> org.apache.hadoop.yarn.conf.YarnConfiguration)
>  at org.apache.hadoop.conf.Configuration.get(Configuration.java:1214)
>  at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1268)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9859) Refactor OpportunisticContainerAllocator

2019-09-26 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9859:

Attachment: YARN-9859.002.patch

> Refactor OpportunisticContainerAllocator
> 
>
> Key: YARN-9859
> URL: https://issues.apache.org/jira/browse/YARN-9859
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9859.001.patch, YARN-9859.002.patch
>
>
> Right now OpportunisticContainerAllocator is written mainly for Distributed
> Scheduling and schedules Opportunistic containers on a limited set of nodes. As
> part of this jira, we are going to make OpportunisticContainerAllocator an
> abstract class and DistributedOpportunisticContainerAllocator the actual
> implementation. This would be a prerequisite for YARN-9697.
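
As a rough illustration of the proposed split (a sketch with placeholder names and signatures, not the actual YARN classes):

{code}
import java.util.Collections;
import java.util.List;

// Placeholder skeleton: common allocation logic stays in the abstract base,
// while the existing distributed-scheduling behaviour moves to the subclass.
abstract class OpportunisticAllocatorSketch {
  abstract List<String> allocate(List<String> outstandingRequests);
}

class DistributedOpportunisticAllocatorSketch extends OpportunisticAllocatorSketch {
  @Override
  List<String> allocate(List<String> outstandingRequests) {
    // Existing behaviour: schedule opportunistic containers only on the
    // limited set of nodes this allocator tracks.
    return Collections.emptyList(); // placeholder
  }
}
{code}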



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9859) Refactor OpportunisticContainerAllocator

2019-09-26 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16939113#comment-16939113
 ] 

Abhishek Modi commented on YARN-9859:
-

Thanks [~elgoiri] for review.

Changed the title of the jira.
{quote}we should tune the indentation for 237 and adding extra indents to the 
following lines of the constructor.
{quote}
I checked the indentation and it seems correct to me. Could you please
check it again and let me know if I am missing something?

Attached v2 patch with the rest of the fixes.

> Refactor OpportunisticContainerAllocator
> 
>
> Key: YARN-9859
> URL: https://issues.apache.org/jira/browse/YARN-9859
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9859.001.patch, YARN-9859.002.patch
>
>
> Right now OpportunisticContainerAllocator is written mainly for Distributed
> Scheduling and schedules Opportunistic containers on a limited set of nodes. As
> part of this jira, we are going to make OpportunisticContainerAllocator an
> abstract class and DistributedOpportunisticContainerAllocator the actual
> implementation. This would be a prerequisite for YARN-9697.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9859) Refactor OpportunisticContainerAllocator

2019-09-26 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9859:

Summary: Refactor OpportunisticContainerAllocator  (was: Code cleanup of 
OpportunisticContainerAllocator)

> Refactor OpportunisticContainerAllocator
> 
>
> Key: YARN-9859
> URL: https://issues.apache.org/jira/browse/YARN-9859
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9859.001.patch
>
>
> Right now OpportunisticContainerAllocator is written mainly for Distributed
> Scheduling and schedules Opportunistic containers on a limited set of nodes. As
> part of this jira, we are going to make OpportunisticContainerAllocator an
> abstract class and DistributedOpportunisticContainerAllocator the actual
> implementation. This would be a prerequisite for YARN-9697.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9751) Separate queue and app ordering policy capacity scheduler configs

2019-09-26 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16939064#comment-16939064
 ] 

Jonathan Hung commented on YARN-9751:
-

Upon more thought, this is technically an incompatible change. Dropping it from 
2.10.0.

> Separate queue and app ordering policy capacity scheduler configs
> -
>
> Key: YARN-9751
> URL: https://issues.apache.org/jira/browse/YARN-9751
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>  Labels: release-blocker
> Attachments: YARN-9751.001.patch, YARN-9751.002.patch, 
> YARN-9751.003.patch
>
>
> Right now it's not possible to specify distinct app and queue ordering 
> policies since they share the same {{ordering-policy}} suffix.
> There's already a TODO in CapacitySchedulerConfiguration for this. This Jira 
> intends to fix it.
> {noformat}
> // TODO (wangda): We need to better distinguish app ordering policy and queue
> // ordering policy's classname / configuration options, etc. And dedup code
> // if possible.{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9736) Recursively configure app ordering policies

2019-09-26 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-9736:

Labels:   (was: release-blocker)

> Recursively configure app ordering policies
> ---
>
> Key: YARN-9736
> URL: https://issues.apache.org/jira/browse/YARN-9736
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-9736.001.patch
>
>
> Currently the app ordering policy will find confs with the prefix
> {{.ordering-policy}}. For queues with the same ordering policy
> configuration, it's easier to have a queue inherit confs from its parent.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9751) Separate queue and app ordering policy capacity scheduler configs

2019-09-26 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-9751:

Labels:   (was: release-blocker)

> Separate queue and app ordering policy capacity scheduler configs
> -
>
> Key: YARN-9751
> URL: https://issues.apache.org/jira/browse/YARN-9751
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-9751.001.patch, YARN-9751.002.patch, 
> YARN-9751.003.patch
>
>
> Right now it's not possible to specify distinct app and queue ordering 
> policies since they share the same {{ordering-policy}} suffix.
> There's already a TODO in CapacitySchedulerConfiguration for this. This Jira 
> intends to fix it.
> {noformat}
> // TODO (wangda): We need to better distinguish app ordering policy and queue
> // ordering policy's classname / configuration options, etc. And dedup code
> // if possible.{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9858) Optimize RMContext getExclusiveEnforcedPartitions

2019-09-26 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16939051#comment-16939051
 ] 

Jonathan Hung commented on YARN-9858:
-

Uploaded 001 patch which caches the value from conf. [~bibinchundatt] mind 
taking a look? Thanks!
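
For context, a rough sketch of the caching idea (not the actual 001 patch; the field and key names are illustrative, and {{conf}} stands for the YarnConfiguration already held by the context): compute the set once and reuse it, instead of hitting the synchronized Configuration on every allocate call.

{code}
// Illustrative fragment only (not the real RMContextImpl code):
private volatile Set<String> exclusiveEnforcedPartitions;

public Set<String> getExclusiveEnforcedPartitions() {
  Set<String> partitions = exclusiveEnforcedPartitions;
  if (partitions == null) {
    // Only the first caller pays the Configuration lookup and its lock.
    partitions = new HashSet<>(conf.getTrimmedStringCollection(
        "yarn.scheduler.exclusive-enforced-partitions")); // hypothetical key
    exclusiveEnforcedPartitions = partitions;
  }
  return partitions;
}
{code}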

> Optimize RMContext getExclusiveEnforcedPartitions 
> --
>
> Key: YARN-9858
> URL: https://issues.apache.org/jira/browse/YARN-9858
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>  Labels: release-blocker
> Attachments: YARN-9858.001.patch
>
>
> Follow-up from YARN-9730. RMContextImpl#getExclusiveEnforcedPartitions is a
> hot code path and needs to be optimized.
> Since AMS allocate is invoked by multiple handlers, locking on conf will occur:
> {code}
> java.lang.Thread.State: BLOCKED (on object monitor)
>  at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2841)
>  - waiting to lock <0x7f1f8107c748> (a 
> org.apache.hadoop.yarn.conf.YarnConfiguration)
>  at org.apache.hadoop.conf.Configuration.get(Configuration.java:1214)
>  at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1268)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9858) Optimize RMContext getExclusiveEnforcedPartitions

2019-09-26 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-9858:

Attachment: YARN-9858.001.patch

> Optimize RMContext getExclusiveEnforcedPartitions 
> --
>
> Key: YARN-9858
> URL: https://issues.apache.org/jira/browse/YARN-9858
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>  Labels: release-blocker
> Attachments: YARN-9858.001.patch
>
>
> Follow-up from YARN-9730. RMContextImpl#getExclusiveEnforcedPartitions is a
> hot code path and needs to be optimized.
> Since AMS allocate is invoked by multiple handlers, locking on conf will occur:
> {code}
> java.lang.Thread.State: BLOCKED (on object monitor)
>  at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2841)
>  - waiting to lock <0x7f1f8107c748> (a 
> org.apache.hadoop.yarn.conf.YarnConfiguration)
>  at org.apache.hadoop.conf.Configuration.get(Configuration.java:1214)
>  at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1268)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping

2019-09-26 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938988#comment-16938988
 ] 

Peter Bacsko commented on YARN-9841:


[~maniraj...@gmail.com] no problem, I'm fine with a separate JIRA.

I haven't had the chance to examine the mapping behaviour; hopefully I'll have
some time tomorrow. I also asked [~Prabhu Joseph] to verify it because he's
knowledgeable about CS.

> Capacity scheduler: add support for combined %user + %primary_group mapping
> ---
>
> Key: YARN-9841
> URL: https://issues.apache.org/jira/browse/YARN-9841
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9841.001.patch, YARN-9841.001.patch, 
> YARN-9841.002.patch, YARN-9841.junit.patch
>
>
> Right now in CS, using {{%primary_group}} with a parent queue is only 
> possible this way:
> {{u:%user:parentqueue.%primary_group}}
> Looking at 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java,
>  we cannot do something like:
> {{u:%user:%primary_group.%user}}
> Fair Scheduler supports a nested rule where such a placement/mapping rule is 
> possible. This improvement would reduce this feature gap.
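
For reference, a sketch of how the desired rule might eventually be expressed, assuming it is wired through the existing {{yarn.scheduler.capacity.queue-mappings}} property (the final syntax may differ once the feature lands):

{code}
<property>
  <name>yarn.scheduler.capacity.queue-mappings</name>
  <!-- place each user's apps under the parent queue named after the user's
       primary group, in a leaf queue named after the user -->
  <value>u:%user:%primary_group.%user</value>
</property>
{code}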



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping

2019-09-26 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938934#comment-16938934
 ] 

Hadoop QA commented on YARN-9841:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
49s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 22s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 30s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 11 new + 23 unchanged - 0 fixed = 34 total (was 23) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 18s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 92m 40s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
39s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}149m 42s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerOvercommit |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.2 Server=19.03.2 Image:yetus/hadoop:efed4450bf1 |
| JIRA Issue | YARN-9841 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12981457/YARN-9841.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 782cbe466612 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 06998a1 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/24845/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
| unit | 

[jira] [Comment Edited] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping

2019-09-26 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938834#comment-16938834
 ] 

Manikandan R edited comment on YARN-9841 at 9/26/19 5:29 PM:
-

Thanks [~pbacsko] for review. Addressed all of your comments. Attached 
.002.patch. 
{quote}If we have this for {{%primary_group}}, can't we just handle 
{{%secondary_group}} as well?
{quote}
Initially I thought about this, but then preferred to take it up in a separate Jira
for ease of tracking and to avoid confusion with the description, discussions, etc.
Hope you are fine with that.

Also, have you had a chance to look at the observations raised earlier? We can track
those issues in a separate JIRA.
{quote}Can {{ctx}} ever be null? I assume this test should produce the same 
behavior each time, provided the code-under-test doesn't change. So to me it 
seems more logical to make sure that {{ctx}} is not null, which practically 
means a new assertion. Btw this applies to the piece of code above, too.
{quote}
Made changes in {{TestCapacitySchedulerQueueMappingFactory}}, but not in
{{TestUserGroupMappingPlacementRule}}, as it is commonly used by various asserts
where in some cases ctx is null.


was (Author: maniraj...@gmail.com):
Thanks [~pbacsko] for review. Addressed all of your comments. Attached 
.002.patch. 
{quote}If we have this for {{%primary_group}}, can't we just handle 
{{%secondary_group}} as well?
{quote}
Initially I thought about this, but then preferred to take it up separately for
ease of tracking and to avoid confusion with the description, etc. Hope you are
fine with that.

Also, have you had a chance to look at the observations raised earlier? We can track
those issues in a separate JIRA.
{quote}Can {{ctx}} ever be null? I assume this test should produce the same 
behavior each time, provided the code-under-test doesn't change. So to me it 
seems more logical to make sure that {{ctx}} is not null, which practically 
means a new assertion. Btw this applies to the piece of code above, too.
{quote}
Made changes in {{TestCapacitySchedulerQueueMappingFactory}}, but not in
{{TestUserGroupMappingPlacementRule}}, as it is commonly used by various asserts
where in some cases ctx is null.

> Capacity scheduler: add support for combined %user + %primary_group mapping
> ---
>
> Key: YARN-9841
> URL: https://issues.apache.org/jira/browse/YARN-9841
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9841.001.patch, YARN-9841.001.patch, 
> YARN-9841.002.patch, YARN-9841.junit.patch
>
>
> Right now in CS, using {{%primary_group}} with a parent queue is only 
> possible this way:
> {{u:%user:parentqueue.%primary_group}}
> Looking at 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java,
>  we cannot do something like:
> {{u:%user:%primary_group.%user}}
> Fair Scheduler supports a nested rule where such a placement/mapping rule is 
> possible. This improvement would reduce this feature gap.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping

2019-09-26 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938834#comment-16938834
 ] 

Manikandan R commented on YARN-9841:


Thanks [~pbacsko] for review. Addressed all of your comments. Attached 
.002.patch. 
{quote}If we have this for {{%primary_group}}, can't we just handle 
{{%secondary_group}} as well?
{quote}
Initially I thought about this, but then preferred to take it up separately for
ease of tracking and to avoid confusion with the description, etc. Hope you are
fine with that.

Also, have you had a chance to look at the observations raised earlier? We can track
those issues in a separate JIRA.
{quote}Can {{ctx}} ever be null? I assume this test should produce the same 
behavior each time, provided the code-under-test doesn't change. So to me it 
seems more logical to make sure that {{ctx}} is not null, which practically 
means a new assertion. Btw this applies to the piece of code above, too.
{quote}
Made changes in {{TestCapacitySchedulerQueueMappingFactory}}, but not in
{{TestUserGroupMappingPlacementRule}}, as it is commonly used by various asserts
where in some cases ctx is null.

> Capacity scheduler: add support for combined %user + %primary_group mapping
> ---
>
> Key: YARN-9841
> URL: https://issues.apache.org/jira/browse/YARN-9841
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9841.001.patch, YARN-9841.001.patch, 
> YARN-9841.002.patch, YARN-9841.junit.patch
>
>
> Right now in CS, using {{%primary_group}} with a parent queue is only 
> possible this way:
> {{u:%user:parentqueue.%primary_group}}
> Looking at 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java,
>  we cannot do something like:
> {{u:%user:%primary_group.%user}}
> Fair Scheduler supports a nested rule where such a placement/mapping rule is 
> possible. This improvement would reduce this feature gap.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9859) Code cleanup of OpportunisticContainerAllocator

2019-09-26 Thread Jira


[ 
https://issues.apache.org/jira/browse/YARN-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938831#comment-16938831
 ] 

Íñigo Goiri commented on YARN-9859:
---

I would rename the JIRA to Refactor rather than cleanup.
A minor comment in OpportunisticContainerAllocatorAMService: we should tune the
indentation at line 237, adding extra indents to the following lines of the
constructor.
For OpportunisticContainerAllocator, we should change the javadoc.

As we are moving DistributedOpportunisticContainerAllocator, would you mind 
doing a pass fixing indentation all over that class?

> Code cleanup of OpportunisticContainerAllocator
> ---
>
> Key: YARN-9859
> URL: https://issues.apache.org/jira/browse/YARN-9859
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9859.001.patch
>
>
> Right now OpportunisticContainerAllocator is written mainly for Distributed
> Scheduling and schedules Opportunistic containers on a limited set of nodes. As
> part of this jira, we are going to make OpportunisticContainerAllocator an
> abstract class and DistributedOpportunisticContainerAllocator the actual
> implementation. This would be a prerequisite for YARN-9697.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping

2019-09-26 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-9841:
---
Attachment: YARN-9841.002.patch

> Capacity scheduler: add support for combined %user + %primary_group mapping
> ---
>
> Key: YARN-9841
> URL: https://issues.apache.org/jira/browse/YARN-9841
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9841.001.patch, YARN-9841.001.patch, 
> YARN-9841.002.patch, YARN-9841.junit.patch
>
>
> Right now in CS, using {{%primary_group}} with a parent queue is only 
> possible this way:
> {{u:%user:parentqueue.%primary_group}}
> Looking at 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java,
>  we cannot do something like:
> {{u:%user:%primary_group.%user}}
> Fair Scheduler supports a nested rule where such a placement/mapping rule is 
> possible. This improvement would reduce this feature gap.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9768) RM Renew Delegation token thread should timeout and retry

2019-09-26 Thread Jira


[ 
https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938819#comment-16938819
 ] 

Iñigo Goiri commented on YARN-9768:
---

Thanks [~maniraj...@gmail.com] for [^YARN-9768.002.patch].
As we are using getTimeDuration(), the variables should also be time durations;
I usually do:
{code}
public static final long
    DEFAULT_RM_DELEGATION_TOKEN_RENEWER_THREAD_RETRY_INTERVAL =
        TimeUnit.SECONDS.toMillis(60);
{code}

Regarding reading these variables, I prefer using the following indentation:
{code}
tokenRenewerThreadRetryInterval = conf.getTimeDuration(
    YarnConfiguration.RM_DELEGATION_TOKEN_RENEWER_THREAD_RETRY_INTERVAL,
    YarnConfiguration.DEFAULT_RM_DELEGATION_TOKEN_RENEWER_THREAD_RETRY_INTERVAL,
    TimeUnit.MILLISECONDS);
{code}

DelegationTokenRenewer#215 should be a single line.
In DelegationTokenRenewer#227, you should do {{catch(TimeOutException toe)}} 
then add an extra {{catch(Exception e)}}.
I also think DelegationTokenRenewer#234 can be a lambda.
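
For illustration, the rough shape being suggested, written as a small self-contained sketch (names like {{renewWithTimeout}} and {{retryIntervalMs}} are invented here and are not the actual DelegationTokenRenewer code):

{code}
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch of a timed renew call: TimeoutException is handled separately from
// other failures, and the renew task itself can be passed as a lambda.
class TimedRenewSketch {
  private final ExecutorService pool = Executors.newSingleThreadExecutor();

  long renewWithTimeout(Callable<Long> renewCall, long timeoutMs, long retryIntervalMs)
      throws InterruptedException {
    while (true) {
      Future<Long> future = pool.submit(renewCall); // e.g. () -> token.renew(conf)
      try {
        return future.get(timeoutMs, TimeUnit.MILLISECONDS);
      } catch (TimeoutException toe) {
        future.cancel(true);           // stuck NN/Router call: give up and retry
        Thread.sleep(retryIntervalMs);
      } catch (Exception e) {
        throw new RuntimeException(e); // other renewal failures propagate as before
      }
    }
  }
}
{code}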

Avoid DelegationTokenRenewer#442, it just adds churn in an unrelated patch. 
Same for #691 and #508.

Why are we making DelegationTokenRenewer#551 a debug message? If we change 
that, let's also use logger style with {}.

DelegationTokenRenewer#1107 should be a single line. Same as 1129 and 1047 with 
the end of file.


For TestDelegationTokenRenewer, let's also avoid the changes like #169.
#1567 should be a single line. Same for 1571 and 1573.

I'm not a big fan of these long sleeps (#1591).

You have a print in 1593, which could be done properly by adding a message to the
assertTrue (which could use an extracted version of the conf).

> RM Renew Delegation token thread should timeout and retry
> -
>
> Key: YARN-9768
> URL: https://issues.apache.org/jira/browse/YARN-9768
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: CR Hota
>Priority: Major
> Attachments: YARN-9768.001.patch, YARN-9768.002.patch
>
>
> The delegation token renewer thread in RM (DelegationTokenRenewer.java) renews
> received HDFS tokens to check their validity and expiration time.
> This call is made to an underlying HDFS NN or Router node (which has the same
> APIs as the HDFS NN). If one of the nodes is bad and the renew call gets stuck,
> the thread remains stuck indefinitely. The thread should ideally time out the
> renewToken call and retry from the client's perspective.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9699) Migration tool that help to generate CS config based on FS config

2019-09-26 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-9699:
-
Summary: Migration tool that help to generate CS config based on FS config  
(was: Migration tool that help to generate CS configs based on FS)

> Migration tool that help to generate CS config based on FS config
> -
>
> Key: YARN-9699
> URL: https://issues.apache.org/jira/browse/YARN-9699
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wanqiang Ji
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: FS_to_CS_migration_POC.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9857) TestDelegationTokenRenewer throws NPE but tests pass

2019-09-26 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938781#comment-16938781
 ] 

Hudson commented on YARN-9857:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17395 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17395/])
YARN-9857. TestDelegationTokenRenewer throws NPE but tests pass. (ebadger: rev 
18a8c2404e10f18e3a0024753d3f8f558fe604af)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java


> TestDelegationTokenRenewer throws NPE but tests pass
> 
>
> Key: YARN-9857
> URL: https://issues.apache.org/jira/browse/YARN-9857
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: YARN-9857.001.patch
>
>
> {{TestDelegationTokenRenewer}} throws some NPEs:
> {code:bash}
> 2019-09-25 12:51:23,446 WARN  [pool-19-thread-2] 
> security.DelegationTokenRenewer 
> (DelegationTokenRenewer.java:handleDTRenewerAppSubmitEvent(945)) - Unable to 
> add the application to the delegation token renewer.
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:942)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:918)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> 2019-09-25 12:51:23,446 DEBUG [main] util.MBeans 
> (MBeans.java:unregister(138)) - Unregistering 
> Hadoop:service=ResourceManager,name=CapacitySchedulerMetrics
> Exception in thread "pool-19-thread-2" java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:951)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:918)
> 2019-09-25 12:51:23,447 DEBUG [main] util.MBeans 
> (MBeans.java:unregister(138)) - Unregistering 
> Hadoop:service=ResourceManager,name=MetricsSystem,sub=Stats
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> 2019-09-25 12:51:23,447 INFO  [main] impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:stop(216)) - ResourceManager metrics system stopped.
> {code}
> The RMContext dispatcher is not set for the RMMock, which results in an NPE when
> accessing the event handler of the dispatcher inside
> {{DelegationTokenRenewer}}.
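
For reference, roughly the kind of wiring the fix needs in the test (a Mockito sketch, assuming the mocked RMContext is available as {{mockRMContext}}; the committed patch may set this up differently):

{code}
// Give the mocked RMContext a dispatcher so DelegationTokenRenewer can reach
// an event handler instead of hitting null (illustrative only).
Dispatcher dispatcher = mock(Dispatcher.class);
when(dispatcher.getEventHandler()).thenReturn(mock(EventHandler.class));
when(mockRMContext.getDispatcher()).thenReturn(dispatcher);
{code}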



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9857) TestDelegationTokenRenewer throws NPE but tests pass

2019-09-26 Thread Eric Badger (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-9857:
--
Fix Version/s: 3.3.0

> TestDelegationTokenRenewer throws NPE but tests pass
> 
>
> Key: YARN-9857
> URL: https://issues.apache.org/jira/browse/YARN-9857
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: YARN-9857.001.patch
>
>
> {{TestDelegationTokenRenewer}} throws some NPEs:
> {code:bash}
> 2019-09-25 12:51:23,446 WARN  [pool-19-thread-2] 
> security.DelegationTokenRenewer 
> (DelegationTokenRenewer.java:handleDTRenewerAppSubmitEvent(945)) - Unable to 
> add the application to the delegation token renewer.
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:942)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:918)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> 2019-09-25 12:51:23,446 DEBUG [main] util.MBeans 
> (MBeans.java:unregister(138)) - Unregistering 
> Hadoop:service=ResourceManager,name=CapacitySchedulerMetrics
> Exception in thread "pool-19-thread-2" java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:951)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:918)
> 2019-09-25 12:51:23,447 DEBUG [main] util.MBeans 
> (MBeans.java:unregister(138)) - Unregistering 
> Hadoop:service=ResourceManager,name=MetricsSystem,sub=Stats
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> 2019-09-25 12:51:23,447 INFO  [main] impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:stop(216)) - ResourceManager metrics system stopped.
> {code}
> The RMContext dispatcher is not set for the RMMock, which results in an NPE when
> accessing the event handler of the dispatcher inside
> {{DelegationTokenRenewer}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9857) TestDelegationTokenRenewer throws NPE but tests pass

2019-09-26 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938766#comment-16938766
 ] 

Eric Badger commented on YARN-9857:
---

Couldn't recreate the failure initially because apparently there are 2
different TestDelegationTokenRenewer classes for some reason, and I was running the
wrong one.

+1 lgtm. Thanks for the patch, [~ahussein] and for the review [~Jim_Brennan]

I've committed this to trunk

> TestDelegationTokenRenewer throws NPE but tests pass
> 
>
> Key: YARN-9857
> URL: https://issues.apache.org/jira/browse/YARN-9857
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: YARN-9857.001.patch
>
>
> {{TestDelegationTokenRenewer}} throws some NPEs:
> {code:bash}
> 2019-09-25 12:51:23,446 WARN  [pool-19-thread-2] 
> security.DelegationTokenRenewer 
> (DelegationTokenRenewer.java:handleDTRenewerAppSubmitEvent(945)) - Unable to 
> add the application to the delegation token renewer.
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:942)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:918)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> 2019-09-25 12:51:23,446 DEBUG [main] util.MBeans 
> (MBeans.java:unregister(138)) - Unregistering 
> Hadoop:service=ResourceManager,name=CapacitySchedulerMetrics
> Exception in thread "pool-19-thread-2" java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:951)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:918)
> 2019-09-25 12:51:23,447 DEBUG [main] util.MBeans 
> (MBeans.java:unregister(138)) - Unregistering 
> Hadoop:service=ResourceManager,name=MetricsSystem,sub=Stats
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> 2019-09-25 12:51:23,447 INFO  [main] impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:stop(216)) - ResourceManager metrics system stopped.
> {code}
> The RMContext dispatcher is not set for the RMMock, which results in an NPE when
> accessing the event handler of the dispatcher inside
> {{DelegationTokenRenewer}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9768) RM Renew Delegation token thread should timeout and retry

2019-09-26 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938759#comment-16938759
 ] 

Manikandan R commented on YARN-9768:


[~inigoiri] [~crh] Can you review?

> RM Renew Delegation token thread should timeout and retry
> -
>
> Key: YARN-9768
> URL: https://issues.apache.org/jira/browse/YARN-9768
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: CR Hota
>Priority: Major
> Attachments: YARN-9768.001.patch, YARN-9768.002.patch
>
>
> The delegation token renewer thread in RM (DelegationTokenRenewer.java) renews
> received HDFS tokens to check their validity and expiration time.
> This call is made to an underlying HDFS NN or Router node (which has the same
> APIs as the HDFS NN). If one of the nodes is bad and the renew call gets stuck,
> the thread remains stuck indefinitely. The thread should ideally time out the
> renewToken call and retry from the client's perspective.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9699) Migration tool that help to generate CS configs based on FS

2019-09-26 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938683#comment-16938683
 ] 

Szilard Nemeth edited comment on YARN-9699 at 9/26/19 2:41 PM:
---

Hi [~pbacsko]!
Looked into your PoC. Your approach looks good to me.
Please start to resolve the TODOs and try to reach a state of completion that 
we can commit!
Thanks!


was (Author: snemeth):
Hi [~pbacsko]!
Looked into your PoC. Your approach looks good to me.
Please start to resolve the TODOs and try to reach a state that we can commit!
Thanks!

> Migration tool that help to generate CS configs based on FS
> ---
>
> Key: YARN-9699
> URL: https://issues.apache.org/jira/browse/YARN-9699
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wanqiang Ji
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: FS_to_CS_migration_POC.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9699) Migration tool that help to generate CS configs based on FS

2019-09-26 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938683#comment-16938683
 ] 

Szilard Nemeth commented on YARN-9699:
--

Hi [~pbacsko]!
Looked into your PoC. Your approach looks good to me.
Please start to resolve the TODOs and try to reach a state that we can commit!
Thanks!

> Migration tool that help to generate CS configs based on FS
> ---
>
> Key: YARN-9699
> URL: https://issues.apache.org/jira/browse/YARN-9699
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wanqiang Ji
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: FS_to_CS_migration_POC.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9860) Enable service mode for Docker containers on YARN

2019-09-26 Thread Prabhu Joseph (Jira)
Prabhu Joseph created YARN-9860:
---

 Summary: Enable service mode for Docker containers on YARN
 Key: YARN-9860
 URL: https://issues.apache.org/jira/browse/YARN-9860
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 3.3.0
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph


This task is to add support to YARN for running Docker containers in "Service 
Mode". 

Service Mode - Run the container as defined by the image, but still allow for 
injecting configuration. 

Background:
* Entrypoint mode helped - now able to use the ENV and ENTRYPOINT/CMD as
defined in the image. However, it still requires modification to official images
due to user propagation.
* User propagation is problematic for running a secure cluster with sssd.

Implementation (see the config sketch after this list):
* Must be enabled via c-e.cfg (example: docker.service-mode.allowed=true)
* Must be requested at runtime (example:
YARN_CONTAINER_RUNTIME_DOCKER_SERVICE_MODE=true)
* Entrypoint mode is enabled by default for this mode (if Service Mode is
requested, YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE should be set to
true)
* Writable log mount will not be added - stdout logging may still work
with entrypoint mode - remove the writable bind mounts
* User and groups will not be propagated (now: docker run --user nobody
--group-add=nobody  , after: docker run  )
* Read-only resources are mounted at the file level, files get chmod 777, and the
parent directory is only accessible by the run-as user.
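
Putting the two switches above together, a minimal sketch of enabling service mode (these are the example names from this description; the final property and variable names may differ):

{code}
# container-executor.cfg (admin side)
docker.service-mode.allowed=true

# requested at container launch time (client side)
YARN_CONTAINER_RUNTIME_DOCKER_SERVICE_MODE=true
YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE=true
{code}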


cc [~shaneku...@gmail.com]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9760) Support configuring application priorities on a workflow level

2019-09-26 Thread Varun Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938635#comment-16938635
 ] 

Varun Saxena commented on YARN-9760:


MAPREDUCE-7231 tracks hadoop-mapreduce-client-jobclient timeout. Related to 
TestPipeApplication. Going by MAPREDUCE-7036, the ASF license error may also be 
related.
TestFairSchedulerPreemption failure is tracked by YARN-9333.

Will fix some of the checkstyle issues post the review.

> Support configuring application priorities on a workflow level
> --
>
> Key: YARN-9760
> URL: https://issues.apache.org/jira/browse/YARN-9760
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Jonathan Hung
>Assignee: Varun Saxena
>Priority: Major
>  Labels: release-blocker
> Attachments: YARN-9760.01.patch, YARN-9760.02.patch
>
>
> Currently priorities are submitted on an application level, but for end users 
> it's common to submit workloads to YARN at a workflow level. This jira 
> proposes a feature to store workflow id + priority mappings on RM (similar to 
> queue mappings). If an app is submitted with a certain workflow id (as set in
> the application submission context), RM will override this app's priority with
> the one defined in the mapping.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9699) Migration tool that help to generate CS configs based on FS

2019-09-26 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938634#comment-16938634
 ] 

Szilard Nemeth commented on YARN-9699:
--

Thanks [~pbacsko] for summing this up. 
A couple of things to add:
1. All related newly filed jiras should be added either as a sub-jira or as a
related jira of YARN-9698.
2. Another area where CS behaves differently is the uniqueness of leaf queue names
instead of queue paths. Will file a jira for this soon.
3. I also agree with your points about yarn-site.xml vs. capacity-scheduler.xml.
4. [~sunilg] also mentioned that Stage #1 should obviously contain mapping the 
queue hierarchies and this should be the first priority. AFAIK, your tool can 
already do that and others on top as well.
5. We also talked about when to show warnings and when to stop the conversion, 
so we should define these criteria.

Thanks!



> Migration tool that help to generate CS configs based on FS
> ---
>
> Key: YARN-9699
> URL: https://issues.apache.org/jira/browse/YARN-9699
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wanqiang Ji
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: FS_to_CS_migration_POC.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9859) Code cleanup of OpportunisticContainerAllocator

2019-09-26 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938613#comment-16938613
 ] 

Abhishek Modi commented on YARN-9859:
-

[~elgoiri] could you please review this whenever you get some time. Thanks.

> Code cleanup of OpportunisticContainerAllocator
> ---
>
> Key: YARN-9859
> URL: https://issues.apache.org/jira/browse/YARN-9859
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9859.001.patch
>
>
> Right now OpportunisticContainerAllocator is written mainly for Distributed
> Scheduling and schedules Opportunistic containers on a limited set of nodes. As
> part of this jira, we are going to make OpportunisticContainerAllocator an
> abstract class and DistributedOpportunisticContainerAllocator the actual
> implementation. This would be a prerequisite for YARN-9697.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9859) Code cleanup of OpportunisticContainerAllocator

2019-09-26 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938607#comment-16938607
 ] 

Hadoop QA commented on YARN-9859:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
53s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
42s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 32s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
20s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
22s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 51s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server: The patch generated 4 new + 
30 unchanged - 2 fixed = 34 total (was 32) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m  7s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
28s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
52s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 21m 
33s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 85m 
52s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}183m 34s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.2 Server=19.03.2 Image:yetus/hadoop:efed4450bf1 |
| JIRA Issue | YARN-9859 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12981428/YARN-9859.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 1bb30b051950 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh 

[jira] [Commented] (YARN-9362) Code cleanup in TestNMLeveldbStateStoreService

2019-09-26 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938577#comment-16938577
 ] 

Hadoop QA commented on YARN-9362:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
51s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 45s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 0 new + 7 unchanged - 6 fixed = 7 total (was 13) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 18s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 21m 
25s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
37s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 77m 59s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.2 Server=19.03.2 Image:yetus/hadoop:efed4450bf1 |
| JIRA Issue | YARN-9362 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12981432/YARN-9362.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 8bfcb939d540 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 7b6219a |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24843/testReport/ |
| Max. process+thread count | 307 (vs. ulimit of 5500) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24843/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |

[jira] [Commented] (YARN-2442) ResourceManager JMX UI does not give HA State

2019-09-26 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938550#comment-16938550
 ] 

Hadoop QA commented on YARN-2442:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  8s{color} 
| {color:red} YARN-2442 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-2442 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12913203/YARN-2442.02.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24844/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> ResourceManager JMX UI does not give HA State
> -
>
> Key: YARN-2442
> URL: https://issues.apache.org/jira/browse/YARN-2442
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.5.0, 2.6.0, 2.7.0
>Reporter: Nishan Shetty
>Assignee: Rohith Sharma K S
>Priority: Major
>  Labels: oct16-easy
> Attachments: 0001-YARN-2442.patch, YARN-2442.02.patch
>
>
> ResourceManager JMX UI can show the haState (INITIALIZING, ACTIVE, STANDBY, 
> STOPPED)
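
As a purely illustrative sketch of exposing such a state over JMX (the bean and 
attribute names below are made up for this example, and the ResourceManager 
would normally go through Hadoop's own metrics/JMX plumbing rather than direct 
registration like this):

{code}
import java.lang.management.ManagementFactory;

import javax.management.MBeanServer;
import javax.management.ObjectName;

// Minimal standalone sketch of publishing an haState attribute over JMX.
// The bean name, attribute name and registration style are assumptions for
// illustration only, not the approach taken by the attached patches.
public class RMHAStateJmxSketch {

  public interface HAStateInfoMBean {
    String getHAState();
  }

  public static class HAStateInfo implements HAStateInfoMBean {
    private volatile String haState = "INITIALIZING";

    @Override
    public String getHAState() {
      return haState;
    }

    void setHAState(String state) {
      haState = state; // e.g. ACTIVE, STANDBY, STOPPED
    }
  }

  public static void main(String[] args) throws Exception {
    MBeanServer server = ManagementFactory.getPlatformMBeanServer();
    HAStateInfo info = new HAStateInfo();
    server.registerMBean(info,
        new ObjectName("Hadoop:service=ResourceManager,name=HAStateInfo"));
    info.setHAState("ACTIVE");
    System.out.println("haState visible over JMX: " + info.getHAState());
  }
}
{code}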



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2442) ResourceManager JMX UI does not give HA State

2019-09-26 Thread Cyrus Jackson (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938545#comment-16938545
 ] 

Cyrus Jackson commented on YARN-2442:
-

[~rohithsharma] if you are not actively working on this, could I create a patch 
for the latest trunk? Thanks.

> ResourceManager JMX UI does not give HA State
> -
>
> Key: YARN-2442
> URL: https://issues.apache.org/jira/browse/YARN-2442
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.5.0, 2.6.0, 2.7.0
>Reporter: Nishan Shetty
>Assignee: Rohith Sharma K S
>Priority: Major
>  Labels: oct16-easy
> Attachments: 0001-YARN-2442.patch, YARN-2442.02.patch
>
>
> ResourceManager JMX UI can show the haState (INITIALIZING, ACTIVE, STANDBY, 
> STOPPED)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9362) Code cleanup in TestNMLeveldbStateStoreService

2019-09-26 Thread Denes Gerencser (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denes Gerencser updated YARN-9362:
--
Attachment: YARN-9362.001.patch

> Code cleanup in TestNMLeveldbStateStoreService
> --
>
> Key: YARN-9362
> URL: https://issues.apache.org/jira/browse/YARN-9362
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Denes Gerencser
>Priority: Minor
> Attachments: YARN-9362.001.patch
>
>
> There are many ways to improve TestNMLeveldbStateStoreService: 
> 1. RecoveredContainerState fields are asserted repeatedly. Some 
> simple method extractions would definitely make this more readable.
> 2. The tests are very long and hard to read in general: again, extracting 
> methods to avoid code repetition would help.
> 3. You name it.
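
Purely as an illustration of the method extraction suggested in point 1 (the 
accessor names and nested-class locations below are assumptions based on the 
fields mentioned in this issue and may not match the real API), a shared 
assertion helper could look roughly like this:

{code}
import static org.junit.Assert.assertEquals;

import org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.RecoveredContainerState;
import org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.RecoveredContainerStatus;

// Hypothetical helper for TestNMLeveldbStateStoreService. The getters used
// here are assumed; verify against the actual RecoveredContainerState.
final class RecoveredContainerAssert {

  private RecoveredContainerAssert() {
  }

  static void assertRecoveredContainer(RecoveredContainerState rcs,
      RecoveredContainerStatus expectedStatus,
      int expectedExitCode,
      boolean expectedKilled) {
    assertEquals(expectedStatus, rcs.getStatus());
    assertEquals(expectedExitCode, rcs.getExitCode());
    assertEquals(expectedKilled, rcs.getKilled());
  }
}
{code}

Each repeated block of asserts in the test would then collapse to a single call.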



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-9362) Code cleanup in TestNMLeveldbStateStoreService

2019-09-26 Thread Denes Gerencser (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denes Gerencser reassigned YARN-9362:
-

Assignee: Denes Gerencser

> Code cleanup in TestNMLeveldbStateStoreService
> --
>
> Key: YARN-9362
> URL: https://issues.apache.org/jira/browse/YARN-9362
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Denes Gerencser
>Priority: Minor
>
> There are many ways to improve TestNMLeveldbStateStoreService: 
> 1. RecoveredContainerState fields are asserted repeatedly. Some 
> simple method extractions would definitely make this more readable.
> 2. The tests are very long and hard to read in general: again, extracting 
> methods to avoid code repetition would help.
> 3. You name it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9859) Code cleanup of OpportunisticContainerAllocator

2019-09-26 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9859:

Attachment: YARN-9859.001.patch

> Code cleanup of OpportunisticContainerAllocator
> ---
>
> Key: YARN-9859
> URL: https://issues.apache.org/jira/browse/YARN-9859
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9859.001.patch
>
>
> Right now OpportunisticContainerAllocator is written mainly for Distributed 
> Scheduling and schedules Opportunistic containers on a limited set of nodes. As 
> part of this jira, we are going to make OpportunisticContainerAllocator an 
> abstract class and DistributedOpportunisticContainerAllocator the actual 
> implementation. This would be a prerequisite for YARN-9697.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9859) Code cleanup of OpportunisticContainerAllocator

2019-09-26 Thread Abhishek Modi (Jira)
Abhishek Modi created YARN-9859:
---

 Summary: Code cleanup of OpportunisticContainerAllocator
 Key: YARN-9859
 URL: https://issues.apache.org/jira/browse/YARN-9859
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Abhishek Modi
Assignee: Abhishek Modi


Right now OpportunisticContainerAllocator is written mainly for Distributed 
Scheduling and schedules Opportunistic containers on a limited set of nodes. As 
part of this jira, we are going to make OpportunisticContainerAllocator an 
abstract class and DistributedOpportunisticContainerAllocator the actual 
implementation. This would be a prerequisite for YARN-9697.
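
A minimal sketch of the class split described above, using placeholder names 
and a toy node-selection method (the "Sketch" classes and selectNodes() are 
invented for illustration; the real OpportunisticContainerAllocator API is 
much larger):

{code}
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy sketch of the proposed split: a policy-agnostic abstract base class plus
// a distributed-scheduling implementation.
abstract class OpportunisticContainerAllocatorSketch {

  /** Pick the nodes on which OPPORTUNISTIC containers may be placed. */
  abstract List<String> selectNodes(int numContainers);
}

class DistributedOpportunisticContainerAllocatorSketch
    extends OpportunisticContainerAllocatorSketch {

  private final List<String> leastLoadedNodes;

  DistributedOpportunisticContainerAllocatorSketch(List<String> leastLoadedNodes) {
    this.leastLoadedNodes = leastLoadedNodes;
  }

  @Override
  List<String> selectNodes(int numContainers) {
    // Distributed scheduling only looks at the limited node list it was given.
    if (leastLoadedNodes.isEmpty() || numContainers <= 0) {
      return Collections.emptyList();
    }
    return leastLoadedNodes.subList(0,
        Math.min(numContainers, leastLoadedNodes.size()));
  }

  public static void main(String[] args) {
    OpportunisticContainerAllocatorSketch allocator =
        new DistributedOpportunisticContainerAllocatorSketch(
            Arrays.asList("nm-1:8042", "nm-2:8042", "nm-3:8042"));
    System.out.println(allocator.selectNodes(2)); // [nm-1:8042, nm-2:8042]
  }
}
{code}

Other allocation strategies could then subclass the same base without carrying 
distributed-scheduling assumptions.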



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9858) Optimize RMContext getExclusiveEnforcedPartitions

2019-09-26 Thread Bibin Chundatt (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin Chundatt updated YARN-9858:
-
Description: 
Follow-up from YARN-9730. RMContextImpl#getExclusiveEnforcedPartitions is a hot 
code path and needs to be optimized.

Since AMS allocate is invoked by multiple handlers, locking on the conf will occur:

{code}
java.lang.Thread.State: BLOCKED (on object monitor)
 at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2841)
 - waiting to lock <0x7f1f8107c748> (a 
org.apache.hadoop.yarn.conf.YarnConfiguration)
 at org.apache.hadoop.conf.Configuration.get(Configuration.java:1214)
 at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1268)
{code}

  was:Follow-up from YARN-9730. RMContextImpl#getExclusiveEnforcedPartitions is 
a hot code path, need to optimize it.


> Optimize RMContext getExclusiveEnforcedPartitions 
> --
>
> Key: YARN-9858
> URL: https://issues.apache.org/jira/browse/YARN-9858
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>  Labels: release-blocker
>
> Follow-up from YARN-9730. RMContextImpl#getExclusiveEnforcedPartitions is a 
> hot code path and needs to be optimized.
> Since AMS allocate is invoked by multiple handlers, locking on the conf will occur:
> {code}
> java.lang.Thread.State: BLOCKED (on object monitor)
>  at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2841)
>  - waiting to lock <0x7f1f8107c748> (a 
> org.apache.hadoop.yarn.conf.YarnConfiguration)
>  at org.apache.hadoop.conf.Configuration.get(Configuration.java:1214)
>  at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1268)
> {code}
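
The trace above ends in Configuration.getProps(), which is synchronized, so 
every allocate call that re-reads the property contends on the same monitor. 
As a generic sketch of the remedy (parse once, serve an immutable set 
afterwards; the configuration key below is a placeholder, not the real 
YarnConfiguration constant, and this is not the attached patch):

{code}
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.conf.Configuration;

// Sketch only: parse the partition list once instead of touching the
// Configuration (and its synchronized getProps()) on every allocate() call.
class ExclusiveEnforcedPartitionsCache {

  // Placeholder key for illustration; the real key is a YarnConfiguration constant.
  static final String EXCLUSIVE_ENFORCED_PARTITIONS =
      "yarn.scheduler.exclusive-enforced-partitions";

  private final Set<String> exclusiveEnforcedPartitions;

  ExclusiveEnforcedPartitionsCache(Configuration conf) {
    // One-time read, outside the hot path.
    exclusiveEnforcedPartitions = Collections.unmodifiableSet(new HashSet<>(
        conf.getTrimmedStringCollection(EXCLUSIVE_ENFORCED_PARTITIONS)));
  }

  /** Hot path: no Configuration access, no lock contention. */
  Set<String> getExclusiveEnforcedPartitions() {
    return exclusiveEnforcedPartitions;
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    conf.set(EXCLUSIVE_ENFORCED_PARTITIONS, "label1,label2");
    ExclusiveEnforcedPartitionsCache cache = new ExclusiveEnforcedPartitionsCache(conf);
    System.out.println(cache.getExclusiveEnforcedPartitions());
  }
}
{code}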



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping

2019-09-26 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938379#comment-16938379
 ] 

Hadoop QA commented on YARN-9841:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
59s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 38s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 29s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 11 new + 23 unchanged - 0 fixed = 34 total (was 23) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m  8s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 85m 
19s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}144m 52s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.2 Server=19.03.2 Image:yetus/hadoop:efed4450bf1 |
| JIRA Issue | YARN-9841 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12981398/YARN-9841.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 0852d266cdcc 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 587a8ee |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/24841/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24841/testReport/ |
| Max. process+thread count | 822 (vs. ulimit of 5500) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 

[jira] [Commented] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping

2019-09-26 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938360#comment-16938360
 ] 

Hadoop QA commented on YARN-9841:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
28s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 55s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 30s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 8 new + 5 unchanged - 0 fixed = 13 total (was 5) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 12 line(s) that end in whitespace. Use 
git apply --whitespace=fix <>. Refer 
https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 57s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 80m 
46s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}141m 18s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:efed4450bf1 |
| JIRA Issue | YARN-9841 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12981220/YARN-9841.junit.patch 
|
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 69f4f248f9ad 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 587a8ee |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/24840/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/24840/artifact/out/whitespace-eol.txt
 |
|  Test Results | 

[jira] [Assigned] (YARN-9858) Optimize RMContext getExclusiveEnforcedPartitions

2019-09-26 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung reassigned YARN-9858:
---

Assignee: Jonathan Hung

> Optimize RMContext getExclusiveEnforcedPartitions 
> --
>
> Key: YARN-9858
> URL: https://issues.apache.org/jira/browse/YARN-9858
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>  Labels: release-blocker
>
> Follow-up from YARN-9730. RMContextImpl#getExclusiveEnforcedPartitions is a 
> hot code path and needs to be optimized.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9858) Optimize RMContext getExclusiveEnforcedPartitions

2019-09-26 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-9858:

Target Version/s: 2.10.0

> Optimize RMContext getExclusiveEnforcedPartitions 
> --
>
> Key: YARN-9858
> URL: https://issues.apache.org/jira/browse/YARN-9858
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Hung
>Priority: Major
>  Labels: release-blocker
>
> Follow-up from YARN-9730. RMContextImpl#getExclusiveEnforcedPartitions is a 
> hot code path and needs to be optimized.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9858) Optimize RMContext getExclusiveEnforcedPartitions

2019-09-26 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-9858:

Labels: release-blocker  (was: )

> Optimize RMContext getExclusiveEnforcedPartitions 
> --
>
> Key: YARN-9858
> URL: https://issues.apache.org/jira/browse/YARN-9858
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Hung
>Priority: Major
>  Labels: release-blocker
>
> Follow-up from YARN-9730. RMContextImpl#getExclusiveEnforcedPartitions is a 
> hot code path and needs to be optimized.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9858) Optimize RMContext getExclusiveEnforcedPartitions

2019-09-26 Thread Jonathan Hung (Jira)
Jonathan Hung created YARN-9858:
---

 Summary: Optimize RMContext getExclusiveEnforcedPartitions 
 Key: YARN-9858
 URL: https://issues.apache.org/jira/browse/YARN-9858
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jonathan Hung


Follow-up from YARN-9730. RMContextImpl#getExclusiveEnforcedPartitions is a hot 
code path and needs to be optimized.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9730) Support forcing configured partitions to be exclusive based on app node label

2019-09-26 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938312#comment-16938312
 ] 

Jonathan Hung commented on YARN-9730:
-

Sure. Thanks [~bibinchundatt] for the comment. I will address it in YARN-9858.

> Support forcing configured partitions to be exclusive based on app node label
> -
>
> Key: YARN-9730
> URL: https://issues.apache.org/jira/browse/YARN-9730
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>  Labels: release-blocker
> Fix For: 2.10.0, 3.3.0, 3.2.2, 3.1.4
>
> Attachments: YARN-9730-branch-2.001.patch, YARN-9730.001.addendum, 
> YARN-9730.001.patch, YARN-9730.002.addendum, YARN-9730.002.patch, 
> YARN-9730.003.patch
>
>
> Use case: queue X has all of its workload in non-default (exclusive) 
> partition P (by setting the app submission context's node label to P). A node 
> in partition Q != P heartbeats to the RM. The capacity scheduler loops through 
> every application in X, and every scheduler key in each application, and fails 
> to allocate each time since the app's requested label and the node's label don't 
> match. This causes huge performance degradation when the number of apps in X is 
> large.
> To fix the issue, allow RM to configure partitions as "forced-exclusive". If 
> partition P is "forced-exclusive", then:
>  * 1a. If app sets its submission context's node label to P, all its resource 
> requests will be overridden to P
>  * 1b. If app sets its submission context's node label to Q, any of its resource 
> requests whose labels are P will be overridden to Q
>  * 2. In the scheduler, we add apps with node label expression P to a 
> separate data structure. When a node in partition P heartbeats to scheduler, 
> we only try to schedule apps in this data structure. When a node in partition 
> Q heartbeats to scheduler, we schedule the rest of the apps as normal.
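
As a tiny sketch of override rules 1a/1b above, reduced to plain strings (the 
class and method names are invented; the real change rewrites the node label 
expressions on the app's resource requests inside the RM):

{code}
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch of the label-override rules 1a/1b from this JIRA.
final class ForcedExclusiveLabelOverride {

  private final Set<String> forcedExclusivePartitions;

  ForcedExclusiveLabelOverride(Set<String> forcedExclusivePartitions) {
    this.forcedExclusivePartitions = forcedExclusivePartitions;
  }

  String overrideRequestLabel(String appSubmissionLabel, String requestLabel) {
    if (forcedExclusivePartitions.contains(appSubmissionLabel)) {
      // Rule 1a: app submitted to forced-exclusive P -> every request uses P.
      return appSubmissionLabel;
    }
    if (forcedExclusivePartitions.contains(requestLabel)) {
      // Rule 1b: app submitted to Q, but a request asks for forced-exclusive P
      // -> the request is overridden to Q.
      return appSubmissionLabel;
    }
    return requestLabel;
  }

  public static void main(String[] args) {
    ForcedExclusiveLabelOverride override = new ForcedExclusiveLabelOverride(
        new HashSet<>(Arrays.asList("P")));
    System.out.println(override.overrideRequestLabel("P", "default")); // P (rule 1a)
    System.out.println(override.overrideRequestLabel("Q", "P"));       // Q (rule 1b)
    System.out.println(override.overrideRequestLabel("Q", "default")); // default
  }
}
{code}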



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9699) Migration tool that help to generate CS configs based on FS

2019-09-26 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938306#comment-16938306
 ] 

Peter Bacsko edited comment on YARN-9699 at 9/26/19 6:25 AM:
-

Had a discussion with [~sunilg], [~Prabhu Joseph], [~snemeth].

A couple of things to add:
* Development of this tool should happen in stages. The Stage #1 converter will not 
support every FS feature/property, simply because some of them are missing from CS. Those 
which are missing should be added gradually (see YARN-9840 and YARN-9841 for 
example). What we have in the POC is already a good starting point.

* Users should be able to define a "rule" file (simple property file), which 
imposes certain limits on various things (e.g. no more than 100 queues are 
allowed on the same level) and also defines what should happen if the tool 
encounters an unsupported feature. For example, CS does not support max running 
apps per user, so we can have the following settings:
{noformat}
maximumQueuesPerLevel=100
maxAppsPerUser=warning
{noformat}
In this case, "warning" means that the user will be warned that this particular 
setting is not supported in CS and won't be migrated. Another possible setting 
could be "error", which aborts the conversion immediately with an error message.

* We also need strict validation of certain things: the sum of capacities is 
100.0 (unless a capacity is defined as a mem/vcore pair) and no two leaf queues 
are allowed with the same name.

* [~sunilg]'s idea is that only {{capacity-scheduler.xml}} should be generated 
and not {{yarn-site.xml}} (only the scheduler class should be changed in this 
file). Looking at the current settings and mappings, I believe this is not 
possible, because there are properties that should be placed in the 
{{yarn-site.xml}} - see the {{convertSiteProperties()}} method in the POC. Even 
if those properties can reside in {{capacity-scheduler.xml}} (someone must 
confirm this), their FS counterpart should be removed from the site config.

* The output of the tool is most likely going to be a file or files, but having 
stdout as an option is nice to have.

Not sure if I missed something, feel free to correct me if I'm wrong.
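
To illustrate the rule-file idea only (the two property names come from the 
{noformat} example above; the enum, loader and method names are invented for 
this sketch, not an agreed design):

{code}
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

// Hypothetical sketch of consuming the proposed "rule" property file.
class FsToCsConversionRules {

  enum UnsupportedPropertyAction { WARNING, ERROR }

  private final int maximumQueuesPerLevel;
  private final UnsupportedPropertyAction maxAppsPerUserAction;

  FsToCsConversionRules(Properties props) {
    maximumQueuesPerLevel =
        Integer.parseInt(props.getProperty("maximumQueuesPerLevel", "100"));
    maxAppsPerUserAction = UnsupportedPropertyAction.valueOf(
        props.getProperty("maxAppsPerUser", "warning").toUpperCase());
  }

  static FsToCsConversionRules load(String path) throws IOException {
    Properties props = new Properties();
    try (InputStream in = new FileInputStream(path)) {
      props.load(in);
    }
    return new FsToCsConversionRules(props);
  }

  void onMaxAppsPerUser(String queue) {
    String msg = "maxRunningApps per user on queue " + queue
        + " has no Capacity Scheduler equivalent and will not be migrated";
    if (maxAppsPerUserAction == UnsupportedPropertyAction.ERROR) {
      throw new IllegalStateException(msg); // abort the conversion
    }
    System.err.println("WARNING: " + msg);  // warn and continue
  }

  int getMaximumQueuesPerLevel() {
    return maximumQueuesPerLevel;
  }
}
{code}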


was (Author: pbacsko):
Had a discussion with [~sunilg], [~Prabhu Joseph], [~snemeth].

A couple of things to add:
* Development of this tool should happen in stages. Stage #1 converter will not 
support every FS feature/property, simply because it's missing in CS. Those 
which are missing should be added gradually (see YARN-9840 and YARN-9841 for 
example). What we have in the POC is already a good starting point.

* Users should be able to define a "rule" file (simple property file), which 
imposes certain limits on various things (eg. no more than 100 queue are 
allowed on the same level) and also defines what should happen if the tool 
encounters an unsupported feature. For example, CS does not support max running 
apps per user, so we can have the following settings:
{noformat}
maximumQueuesPerLevel=100
maxAppsPerUser=warning
{noformat}
In this case, "warning" means that the user will be warned that this particular 
setting is not supported in CS and won't be migrated. Another possible setting 
could be "error", which aborts the conversion immediately with an error message.

* We also need strict validation of certain things: the sum of capacities are 
100.0 (unless a capacity is defined in mem/vcore pair) and no two leaf queue is 
allowed with the same name.

* [~sunilg]'s idea is that only {{capacity-scheduler.xml}} should be generated 
and not {{yarn-site.xml}} (only the scheduler class should be changed in this 
file). Looking at the current settings and mappings, I believe this is not 
possible, because there are properties that should be placed in the 
{{yarn-site.xml}} - see the {{convertSiteProperties()}} method in the POC. Even 
if those properties can reside in {{capacity-scheduler.xml}} (someone must 
confirm this), their FS counterpart should be removed from the site config.

* Output of the tool is most likely going to be file/files, but having stdout 
as an option is preferred.

Not sure if I missed something, feel free to correct me if I'm wrong.

> Migration tool that help to generate CS configs based on FS
> ---
>
> Key: YARN-9699
> URL: https://issues.apache.org/jira/browse/YARN-9699
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wanqiang Ji
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: FS_to_CS_migration_POC.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9849) Leaf queues not inheriting parent queue status after adding status as “RUNNING” and thereafter, commenting the same in capacity-scheduler.xml

2019-09-26 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938311#comment-16938311
 ] 

Hadoop QA commented on YARN-9849:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
28s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 48s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 51s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 81m  2s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
31s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}133m 41s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueState |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:efed4450bf1 |
| JIRA Issue | YARN-9849 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12981391/YARN-9849.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 8a4b8b777bd4 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 606e341 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/24839/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 

[jira] [Comment Edited] (YARN-9699) Migration tool that help to generate CS configs based on FS

2019-09-26 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938306#comment-16938306
 ] 

Peter Bacsko edited comment on YARN-9699 at 9/26/19 6:24 AM:
-

Had a discussion with [~sunilg], [~Prabhu Joseph], [~snemeth].

A couple of things to add:
* Development of this tool should happen in stages. Stage #1 converter will not 
support every FS feature/property, simply because it's missing in CS. Those 
which are missing should be added gradually (see YARN-9840 and YARN-9841 for 
example). What we have in the POC is already a good starting point.

* Users should be able to define a "rule" file (simple property file), which 
imposes certain limits on various things (eg. no more than 100 queue are 
allowed on the same level) and also defines what should happen if the tool 
encounters an unsupported feature. For example, CS does not support max running 
apps per user, so we can have the following settings:
{noformat}
maximumQueuesPerLevel=100
maxAppsPerUser=warning
{noformat}
In this case, "warning" means that the user will be warned that this particular 
setting is not supported in CS and won't be migrated. Another possible setting 
could be "error", which aborts the conversion immediately with an error message.

* We also need strict validation of certain things: the sum of capacities are 
100.0 (unless a capacity is defined in mem/vcore pair) and no two leaf queue is 
allowed with the same name.

* [~sunilg]'s idea is that only {{capacity-scheduler.xml}} should be generated 
and not {{yarn-site.xml}} (only the scheduler class should be changed in this 
file). Looking at the current settings and mappings, I believe this is not 
possible, because there are properties that should be placed in the 
{{yarn-site.xml}} - see the {{convertSiteProperties()}} method in the POC. Even 
if those properties can reside in {{capacity-scheduler.xml}} (someone must 
confirm this), their FS counterpart should be removed from the site config.

* Output of the tool is most likely going to be file/files, but having stdout 
as an option is preferred.

Not sure if I missed something, feel free to correct me if I'm wrong.


was (Author: pbacsko):
Had a discussion with [~sunilg], [~Prabhu Joseph], [~snemeth].

A couple of things to add:
* Development of this tool should happen in stages. Stage #1 converter will not 
support every FS feature/property, simply because it's missing in CS. Those 
which are missing should be added gradually (see YARN-9840 and YARN-9841 for 
example). What we have in the POC is already a good starting point.

* Users should be able to define a "rule" file, which imposes certain limits on 
various things (eg. no more than 100 queue are allowed on the same level) and 
also defines what should happen if the tool encounters an unsupported feature. 
For example, CS does not support max running apps per user, so we can have the 
following settings:
{noformat}
maximumQueuesPerLevel=100
maxAppsPerUser=warning
{noformat}
In this case, "warning" means that the user will be warned that this particular 
setting is not supported in CS and won't be migrated. Another possible setting 
could be "error", which aborts the conversion immediately with an error message.

* We also need strict validation of certain things: the sum of capacities are 
100.0 (unless a capacity is defined in mem/vcore pair) and no two leaf queue is 
allowed with the same name.

* [~sunilg]'s idea is that only {{capacity-scheduler.xml}} should be generated 
and not {{yarn-site.xml}} (only the scheduler class should be changed in this 
file). Looking at the current settings and mappings, I believe this is not 
possible, because there are properties that should be placed in the 
{{yarn-site.xml}} - see the {{convertSiteProperties()}} method in the POC. Even 
if those properties can reside in {{capacity-scheduler.xml}} (someone must 
confirm this), their FS counterpart should be removed from the site config.

* Output of the tool is most likely going to be file/files, but having stdout 
as an option is preferred.

Not sure if I missed something, feel free to correct me if I'm wrong.

> Migration tool that help to generate CS configs based on FS
> ---
>
> Key: YARN-9699
> URL: https://issues.apache.org/jira/browse/YARN-9699
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wanqiang Ji
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: FS_to_CS_migration_POC.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9699) Migration tool that help to generate CS configs based on FS

2019-09-26 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938306#comment-16938306
 ] 

Peter Bacsko edited comment on YARN-9699 at 9/26/19 6:14 AM:
-

Had a discussion with [~sunilg], [~Prabhu Joseph], [~snemeth].

A couple of things to add:
* Development of this tool should happen in stages. Stage #1 converter will not 
support every FS feature/property, simply because it's missing in CS. Those 
which are missing should be added gradually (see YARN-9840 and YARN-9841 for 
example). What we have in the POC is already a good starting point.

* Users should be able to define a "rule" file, which imposes certain limits on 
various things (eg. no more than 100 queue are allowed on the same level) and 
also defines what should happen if the tool encounters an unsupported feature. 
For example, CS does not support max running apps per user, so we can have the 
following settings:
{noformat}
maximumQueuesPerLevel=100
maxAppsPerUser=warning
{noformat}
In this case, "warning" means that the user will be warned that this particular 
setting is not supported in CS and won't be migrated. Another possible setting 
could be "error", which aborts the conversion immediately with an error message.

* We also need strict validation of certain things: the sum of capacities are 
100.0 (unless a capacity is defined in mem/vcore pair) and no two leaf queue is 
allowed with the same name.

* [~sunilg]'s idea is that only {{capacity-scheduler.xml}} should be generated 
and not {{yarn-site.xml}} (only the scheduler class should be changed in this 
file). Looking at the current settings and mappings, I believe this is not 
possible, because there are properties that should be placed in the 
{{yarn-site.xml}} - see the {{convertSiteProperties()}} method in the POC. Even 
if those properties can reside in {{capacity-scheduler.xml}} (someone must 
confirm this), their FS counterpart should be removed from the site config.

* Output of the tool is most likely going to be file/files, but having stdout 
as an option is preferred.

Not sure if I missed something, feel free to correct me if I'm wrong.


was (Author: pbacsko):
Had a discussion with [~sunilg], [~Prabhu Joseph], [~snemeth].

A couple of things to add:
* Development of this tool should happen in stages. Stage #1 converter will not 
support every FS feature/property, simply because it's missing in CS. Those 
which are missing should be added gradually (see YARN-9840 and YARN9841 for 
example). What we have in the POC is already a good starting point.

* Users should be able to define a "rule" file, which imposes certain limits on 
various things (eg. no more than 100 queue are allowed on the same level) and 
also defines what should happen if the tool encounters an unsupported feature. 
For example, CS does not support max running apps per user, so we can have the 
following settings:
{noformat}
maximumQueuesPerLevel=100
maxAppsPerUser=warning
{noformat}
In this case, "warning" means that the user will be warned that this particular 
setting is not supported in CS and won't be migrated. Another possible setting 
could be "error", which aborts the conversion immediately with an error message.

* We also need strict validation of certain things: the sum of capacities are 
100.0 (unless a capacity is defined in mem/vcore pair) and no two leaf queue is 
allowed with the same name.

* [~sunilg]'s idea is that only {{capacity-scheduler.xml}} should be generated 
and not {{yarn-site.xml}} (only the scheduler class should be changed in this 
file). Looking at the current settings and mappings, I believe this is not 
possible, because there are properties that should be placed in the 
{{yarn-site.xml}} - see the {{convertSiteProperties()}} method in the POC. Even 
if those properties can reside in {{capacity-scheduler.xml}} (someone must 
confirm this), their FS counterpart should be removed from the site config.

* Output of the tool is most likely going to be file/files, but having stdout 
as an option is preferred.

Not sure if I missed something, feel free to correct me if I'm wrong.

> Migration tool that help to generate CS configs based on FS
> ---
>
> Key: YARN-9699
> URL: https://issues.apache.org/jira/browse/YARN-9699
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wanqiang Ji
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: FS_to_CS_migration_POC.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9699) Migration tool that help to generate CS configs based on FS

2019-09-26 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938306#comment-16938306
 ] 

Peter Bacsko edited comment on YARN-9699 at 9/26/19 6:08 AM:
-

Had a discussion with [~sunilg], [~Prabhu Joseph], [~snemeth].

A couple of things to add:
* Development of this tool should happen in stages. Stage #1 converter will not 
support every FS feature/property, simply because it's missing in CS. Those 
which are missing should be added gradually (see YARN-9840 and YARN9841 for 
example). What we have in the POC is already a good starting point.

* Users should be able to define a "rule" file, which imposes certain limits on 
various things (eg. no more than 100 queue are allowed on the same level) and 
also defines what should happen if the tool encounters an unsupported feature. 
For example, CS does not support max running apps per user, so we can have the 
following settings:
{noformat}
maximumQueuesPerLevel=100
maxAppsPerUser=warning
{noformat}
In this case, "warning" means that the user will be warned that this particular 
setting is not supported in CS and won't be migrated. Another possible setting 
could be "error", which aborts the conversion immediately with an error message.

* We also need strict validation of certain things: the sum of capacities are 
100.0 (unless a capacity is defined in mem/vcore pair) and no two leaf queue is 
allowed with the same name.

* [~sunilg]'s idea is that only {{capacity-scheduler.xml}} should be generated 
and not {{yarn-site.xml}} (only the scheduler class should be changed in this 
file). Looking at the current settings and mappings, I believe this is not 
possible, because there are properties that should be placed in the 
{{yarn-site.xml}} - see the {{convertSiteProperties()}} method in the POC. Even 
if those properties can reside in {{capacity-scheduler.xml}} (someone must 
confirm this), their FS counterpart should be removed from the site config.

* Output of the tool is most likely going to be file/files, but having stdout 
as an option is preferred.

Not sure if I missed something, feel free to correct me if I'm wrong.


was (Author: pbacsko):
Had a discussion with [~sunilg], [~Prabhu Joseph], [~snemeth].

A couple of things to add:
* Development of this tool should happen in stages. Stage #1 converter will not 
support every FS feature/property, simply because it's missing in CS. Those 
which are missing should be added gradually (see YARN-9840 and YARN9841 for 
example). What we have in the POC is already a good starting point.

* Users should be able to define a "rule" file, which imposes certain limits on 
various things (eg. no more than 100 queue are allowed on the same level) and 
also defines what should happen if the tool encounters an unsupported feature. 
For example, CS does not support max running apps per user, so we can have the 
following settings:
{noformat}
maximumQueuesPerLevel=100
maxAppsPerUser=warning
{noformat}
In this case, "warning" means that the user will be warned that this particular 
setting is not supported in CS and won't be migrated. Another possible setting 
could be "error", which aborts the conversion immediately with an error message.

* We also need strict validation of certain things: the sum of capacities must be 
100.0 (unless a capacity is defined as a mem/vcore pair) and no two leaf queues are 
allowed with the same name.

* [~sunilg]'s idea is that only {{capacity-scheduler.xml}} should be generated 
and not {{yarn-site.xml}} (only the scheduler class should be changed in this 
file). Looking at the current settings and mappings, I believe this is not 
possible, because there are properties that should be placed in 
{{yarn-site.xml}} - see the {{convertSiteProperties()}} method in the POC. Even 
if those properties can reside in {{capacity-scheduler.xml}} (someone must 
confirm this), their FS counterparts should be removed from the site config.

I'm not sure if I missed something; feel free to correct me if I'm wrong.

> Migration tool that help to generate CS configs based on FS
> ---
>
> Key: YARN-9699
> URL: https://issues.apache.org/jira/browse/YARN-9699
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wanqiang Ji
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: FS_to_CS_migration_POC.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9699) Migration tool that help to generate CS configs based on FS

2019-09-26 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938306#comment-16938306
 ] 

Peter Bacsko commented on YARN-9699:


Had a discussion with [~sunilg], [~Prabhu Joseph], [~snemeth].

A couple of things to add:
* Development of this tool should happen in stages. The stage #1 converter will not 
support every FS feature/property, simply because some of them are missing in CS. Those 
which are missing should be added gradually (see YARN-9840 and YARN-9841 for 
example). What we have in the POC is already a good starting point.

* Users should be able to define a "rule" file, which imposes certain limits on 
various things (e.g. no more than 100 queues are allowed on the same level) and 
also defines what should happen if the tool encounters an unsupported feature. 
For example, CS does not support max running apps per user, so we can have the 
following settings:
{noformat}
maximumQueuesPerLevel=100
maxAppsPerUser=warning
{noformat}
In this case, "warning" means that the user will be warned that this particular 
setting is not supported in CS and won't be migrated. Another possible value 
could be "error", which aborts the conversion immediately with an error message.

* We also need strict validation of certain things: the sum of capacities must be 
100.0 (unless a capacity is defined as a mem/vcore pair) and no two leaf queues are 
allowed with the same name.

* [~sunilg]'s idea is that only {{capacity-scheduler.xml}} should be generated 
and not {{yarn-site.xml}} (only the scheduler class should be changed in this 
file). Looking at the current settings and mappings, I believe this is not 
possible, because there are properties that should be placed in 
{{yarn-site.xml}} - see the {{convertSiteProperties()}} method in the POC. Even 
if those properties can reside in {{capacity-scheduler.xml}} (someone must 
confirm this), their FS counterparts should be removed from the site config.

I'm not sure if I missed something; feel free to correct me if I'm wrong.

> Migration tool that help to generate CS configs based on FS
> ---
>
> Key: YARN-9699
> URL: https://issues.apache.org/jira/browse/YARN-9699
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wanqiang Ji
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: FS_to_CS_migration_POC.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9730) Support forcing configured partitions to be exclusive based on app node label

2019-09-26 Thread Bibin Chundatt (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938302#comment-16938302
 ] 

Bibin Chundatt edited comment on YARN-9730 at 9/26/19 6:00 AM:
---

[~jhung]

Thank you for working on this. Sorry to come in really late, too.

{code}
240   if (ResourceRequest.ANY.equals(req.getResourceName())) {
241 SchedulerUtils.enforcePartitionExclusivity(req,
242 getRmContext().getExclusiveEnforcedPartitions(),
243 asc.getNodeLabelExpression());
244   }
{code}

A configuration query on the AM allocation flow is going to be costly, which I 
observed while evaluating performance.
Could you optimize {{getRmContext().getExclusiveEnforcedPartitions()}}, since 
this is going to be invoked for every *request*?
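
Just to illustrate the direction (this is only a sketch of the caching idea, not a 
concrete proposal for the patch): the partition set could be computed once and served 
from an immutable field, so the hot allocation path never touches {{Configuration}}. The 
class name and the property key below are made up for the example:
{code}
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.conf.Configuration;

public class ExclusiveEnforcedPartitionsCache {

  // Placeholder key for the sketch, not the actual configuration property name.
  private static final String PARTITIONS_KEY = "yarn.exclusive-enforced-partitions";

  private final Set<String> exclusiveEnforcedPartitions;

  public ExclusiveEnforcedPartitionsCache(Configuration conf) {
    // Read (and lock on) the Configuration exactly once, at construction time.
    Set<String> partitions = new HashSet<>();
    Collections.addAll(partitions, conf.getTrimmedStrings(PARTITIONS_KEY));
    this.exclusiveEnforcedPartitions = Collections.unmodifiableSet(partitions);
  }

  // Hot path: no Configuration access, just a final field read of an immutable set.
  public Set<String> getExclusiveEnforcedPartitions() {
    return exclusiveEnforcedPartitions;
  }
}
{code}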






was (Author: bibinchundatt):
[~jhung]

Thank you for working on this. Sorry to come in really late, too.

{quote}
240   if (ResourceRequest.ANY.equals(req.getResourceName())) {
241 SchedulerUtils.enforcePartitionExclusivity(req,
242 getRmContext().getExclusiveEnforcedPartitions(),
243 asc.getNodeLabelExpression());
244   }
{quote}

A configuration query on the AM allocation flow is going to be costly, which I 
observed while evaluating performance.
Could you optimize {{getRmContext().getExclusiveEnforcedPartitions()}}, since 
this is going to be invoked for every *request*?





> Support forcing configured partitions to be exclusive based on app node label
> -
>
> Key: YARN-9730
> URL: https://issues.apache.org/jira/browse/YARN-9730
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>  Labels: release-blocker
> Fix For: 2.10.0, 3.3.0, 3.2.2, 3.1.4
>
> Attachments: YARN-9730-branch-2.001.patch, YARN-9730.001.addendum, 
> YARN-9730.001.patch, YARN-9730.002.addendum, YARN-9730.002.patch, 
> YARN-9730.003.patch
>
>
> Use case: queue X has all of its workload in a non-default (exclusive) 
> partition P (by setting the app submission context's node label to P). A node 
> in partition Q != P heartbeats to the RM. The capacity scheduler loops through every 
> application in X, and every scheduler key in this application, and fails to 
> allocate each time since the app's requested label and the node's label don't 
> match. This causes huge performance degradation when the number of apps in X is 
> large.
> To fix the issue, allow RM to configure partitions as "forced-exclusive". If 
> partition P is "forced-exclusive", then:
>  * 1a. If an app sets its submission context's node label to P, all its resource 
> requests will be overridden to P
>  * 1b. If an app sets its submission context's node label to Q, any of its resource 
> requests whose labels are P will be overridden to Q
>  * 2. In the scheduler, we add apps with node label expression P to a 
> separate data structure. When a node in partition P heartbeats to the scheduler, 
> we only try to schedule apps in this data structure. When a node in partition 
> Q heartbeats to the scheduler, we schedule the rest of the apps as normal.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org